Jake Tuero

rl

17 items with this tag.

  • Jun 23, 2026

    A2C

    • rl
    • policy-gradient
    • actor-critic
    • on-policy
    • a2c
  • Jun 23, 2026

    Actor Critic Methods

    • rl
    • actor-critic
  • Jun 23, 2026

    Deterministic Policy Gradient Methods

    • rl
    • actor-critic
    • off-policy
    • policy-gradient
    • ddpg
    • td3
  • Jun 23, 2026

    Generalized Advantage Estimation

    • rl
    • policy-gradient
    • actor-critic
    • on-policy
  • Jun 23, 2026

    LLMs for RL

    • rl
    • llm
  • Jun 23, 2026

    Markov Decision Process

    • rl
  • Jun 23, 2026

    Model-Based RL

    • rl
    • mbrl
    • dyna
  • Jun 23, 2026

    Off-Policy Methods

    • rl
    • off-policy
    • impala
  • Jun 23, 2026

    Policy Gradient Methods

    • rl
    • policy-gradient
    • reinforce
  • Jun 23, 2026

    Policy Improvement Methods

    • rl
    • policy-gradient
    • policy-improvement
    • on-policy
    • ppo
    • trpo
  • Jun 23, 2026

    Q Learning

    • rl
    • off-policy
    • value
  • Jun 23, 2026

    REINFORCE

    • rl
    • on-policy
    • policy-gradient
  • Jun 23, 2026

    RL as Inference

    • rl
    • off-policy
  • Jun 23, 2026

    RL for LLMs

    • rl
    • llm
    • ppo
    • grpo
    • dapo
    • gspo
    • drgrpo
    • rlft
    • rlvr
    • rlhf
    • cot
  • Jun 23, 2026

    Soft Actor Critic (SAC)

    • rl
    • off-policy
    • actor-critic
  • Jun 23, 2026

    Value Based RL

    • rl
    • value
    • on-policy
  • Jun 23, 2026

    World Models

    • rl
    • mbrl

Created with Quartz v5.0.0 © 2026
