RobotLearning: Scaling Deep Q-Learning Part1

Name: RobotLearning: Scaling Deep Q-Learning Part1
Uploaded: Mar 10, 2025
Duration: 2245 s

Montreal Robotics (3.2K subscribers)

285 views

In this lecture segment, I explained the progression from simple bandits to Q-learning, outlining the challenges and solutions in reinforcement learning. I began with multi-armed bandits, emphasizing the exploration-exploitation dilemma and introducing epsilon-greedy and upper confidence bound (UCB) methods for balancing the two. I then moved to contextual bandits, which incorporate state information, and finally to Q-learning, which learns a state-dependent policy. I highlighted the advantages of Q-learning over policy gradients, such as its ability to learn from off-policy data and its lower variance. I then turned to approximate dynamic programming, explaining how value iteration and policy iteration can be used to train a Q-function. I discussed the computational cost of these methods, particularly the need to perform an argmax over all possible actions, and how policy iteration can reduce this cost by bootstrapping on previous policies. I concluded by hinting that policy evaluation and policy improvement can be combined into a single step for further efficiency.
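
As a rough illustration of the value-iteration idea summarized above, here is a minimal tabular sketch in plain Python. The two-state toy MDP, the discount factor, and every name in the block (`TRANSITIONS`, `q_value_iteration`, `greedy_policy`) are assumptions made up for this example and are not taken from the lecture. The sketch only shows where the max over actions enters the backup, which is the cost the description refers to.

```python
# Hypothetical toy MDP (an assumption for illustration, not from the lecture):
# 2 states, 2 actions, deterministic transitions.
# TRANSITIONS[state][action] = (next_state, reward)
GAMMA = 0.9
TRANSITIONS = {
    0: {0: (0, 0.0), 1: (1, 0.0)},  # state 0: action 0 stays, action 1 moves to state 1
    1: {0: (1, 1.0), 1: (0, 0.0)},  # state 1: action 0 stays (reward 1), action 1 moves back
}


def q_value_iteration(transitions, gamma, sweeps=500):
    """Repeatedly apply the Bellman optimality backup to a tabular Q-function."""
    q = {s: {a: 0.0 for a in acts} for s, acts in transitions.items()}
    for _ in range(sweeps):
        new_q = {}
        for s, acts in transitions.items():
            new_q[s] = {}
            for a, (s_next, reward) in acts.items():
                # The max over next-state actions is the step whose cost
                # grows with the size of the action space.
                new_q[s][a] = reward + gamma * max(q[s_next].values())
        q = new_q
    return q


def greedy_policy(q):
    """Pick the argmax action in every state."""
    return {s: max(acts, key=acts.get) for s, acts in q.items()}


Q = q_value_iteration(TRANSITIONS, GAMMA)
POLICY = greedy_policy(Q)
print(Q)
print(POLICY)
```

In this toy problem the greedy policy moves from state 0 to state 1 and then stays there to keep collecting the reward. With a large or continuous action space the inner `max` is no longer a cheap table lookup, which is the motivation for the policy-iteration variant mentioned in the description.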
