RobotLearning: Scaling Deep Q-Learning Part 2
I discussed why training a Q-function with deep learning is difficult: the contraction property that makes tabular Q-learning converge is not guaranteed under function approximation, and each update changes both the predicted and the target Q-values, which can lead to instability and divergence. To address this, I explained the concept of a target network, a delayed copy of the Q-network that keeps the target values fixed for a period and so stabilizes learning. I also covered the overestimation that the maximization operation introduces in Q-learning and introduced double Q-learning as a solution: the online Q-function selects the best action and the target network evaluates it, which reduces overestimation. I then delved into the "deadly triad" of off-policy learning, bootstrapping, and function approximation, emphasizing how difficult it is to combine all three, and briefly discussed n-step returns as a way to reduce bias and improve training.

Finally, I turned to more modern applications of Q-learning: the QT-Opt algorithm for robotic grasping, which uses multiple robot arms to collect experience and the cross-entropy method to handle continuous action spaces, and the PQ-N algorithm, which aims to reduce the need for target networks and replay buffers.
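The target-network and double Q-learning ideas summarized above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code used in the lecture: actions are discrete, next-state Q-values are given as plain Python lists, and the function names (`dqn_target`, `double_dqn_target`, `hard_update`) and all numbers are hypothetical.

```python
def dqn_target(reward, done, gamma, q_target_next):
    """Standard target: the target network both selects and evaluates the action."""
    if done:
        return reward
    return reward + gamma * max(q_target_next)


def double_dqn_target(reward, done, gamma, q_online_next, q_target_next):
    """Double Q-learning target: the online network selects the action,
    the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_action]


def hard_update(online_params, target_params, step, period):
    """Return a fresh copy of the online parameters every `period` steps,
    otherwise keep the (delayed) target parameters unchanged."""
    if step % period == 0:
        return list(online_params)
    return target_params


# Hypothetical next-state Q-values over three actions (illustrative numbers only).
q_online_next = [1.0, 2.5, 2.0]   # online network prefers action 1
q_target_next = [1.2, 1.8, 3.0]   # target network's largest value is action 2

standard = dqn_target(1.0, False, 0.9, q_target_next)
double = double_dqn_target(1.0, False, 0.9, q_online_next, q_target_next)
print(standard, double)
```

Because the double Q-learning target evaluates the online network's chosen action with the target network, it can never exceed the standard target computed from the same target-network values, which is the mechanism behind the reduced overestimation.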