Jake Tuero
rl
17 items with this tag.
Jun 23, 2026
A2C
rl
policy-gradient
actor-critic
on-policy
a2c
Jun 23, 2026
Actor Critic Methods
rl
actor-critic
Jun 23, 2026
Deterministic Policy Gradient Methods
rl
actor-critic
off-policy
policy-gradient
ddpg
td3
Jun 23, 2026
Generalized Advantage Estimation
rl
policy-gradient
actor-critic
on-policy
Jun 23, 2026
LLMs for RL
rl
llm
Jun 23, 2026
Markov Decision Process
rl
Jun 23, 2026
Model-Based RL
rl
mbrl
dyna
Jun 23, 2026
Off-Policy Methods
rl
off-policy
impala
Jun 23, 2026
Policy Gradient Methods
rl
policy-gradient
reinforce
Jun 23, 2026
Policy Improvement Methods
rl
policy-gradient
policy-improvement
on-policy
ppo
trpo
Jun 23, 2026
Q Learning
rl
off-policy
value
Jun 23, 2026
REINFORCE
rl
on-policy
policy-gradient
Jun 23, 2026
RL as Inference
rl
off-policy
Jun 23, 2026
RL for LLMs
rl
llm
ppo
grpo
dapo
gspo
drgrpo
rlft
rlvr
rlhf
cot
Jun 23, 2026
Soft Actor Critic (SAC)
rl
off-policy
actor-critic
Jun 23, 2026
Value Based RL
rl
value
on-policy
Jun 23, 2026
World Models
rl
mbrl