Policy iteration is a method for solving a Markov decision process (MDP) by alternating between policy evaluation and policy improvement. Deep Q-learning offers a practical, model-free way to estimate the action-value function in off-policy settings.
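
As a rough illustration of policy iteration, here is a minimal sketch on a made-up three-state chain MDP. The transition table `P`, the action names "stay" and "go", the discount `GAMMA`, and the helper names are all assumptions chosen for this example and do not come from the course material. It covers only the tabular, model-based case and does not attempt deep Q-learning.

```python
# Hypothetical 3-state chain MDP (illustration only): states 0, 1, 2.
# P[s][a] is a list of (probability, next_state, reward) tuples.
# Moving from state 1 to state 2 pays reward 1; state 2 is absorbing.
P = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(1.0, 1, 0.0)]},
    1: {"stay": [(1.0, 1, 0.0)], "go": [(1.0, 2, 1.0)]},
    2: {"stay": [(1.0, 2, 0.0)], "go": [(1.0, 2, 0.0)]},
}
GAMMA = 0.9  # assumed discount factor


def q_value(s, a, V):
    """Expected return of taking action a in state s, then following V."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])


def evaluate(policy, tol=1e-10):
    """Policy evaluation: sweep the Bellman expectation backup until stable."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            v_new = q_value(s, policy[s], V)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            return V


def policy_iteration():
    """Alternate evaluation and greedy improvement until the policy is stable."""
    policy = {s: "stay" for s in P}  # arbitrary starting policy
    while True:
        V = evaluate(policy)
        stable = True
        for s in P:
            best = max(P[s], key=lambda a: q_value(s, a, V))
            # Switch only on a strict improvement, so ties cannot cause cycling.
            if q_value(s, best, V) > q_value(s, policy[s], V) + 1e-12:
                policy[s] = best
                stable = False
        if stable:
            return policy, V


if __name__ == "__main__":
    policy, V = policy_iteration()
    print(policy)
    print(V)
```

In this toy problem the loop settles on "go" in states 0 and 1, since that is the only way to collect the reward. Unlike Q-learning, the sketch reads the transition model `P` directly, which is exactly what a model-free method avoids.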
Learn more about the Duckietown massive open online course (MOOC) "Self-Driving Cars with Duckietown" at https://www.duckietown.org/mooc