We propose Q2RL (Q-Estimation and Q-Gating from Behavior Cloning for Reinforcement Learning), an algorithm for efficient offline-to-online learning.
Accelerating Residual Reinforcement Learning with Uncertainty Estimation
We propose two improvements to Residual RL that further enhance its sample efficiency and make it suitable for stochastic base policies. First, we leverage uncertainty...