This lecture uses the excellent MDP example from David Silver.
Slides: https://cwkx.github.io/data/teaching/dl-and-rl/rl-lecture2.pdf
Colab: https://colab.research.google.com/gist/cwkx/ba6c44031137575d2445901ee90454da/mrp.ipynb
Twitter: https://twitter.com/cwkx
Next video: https://www.youtube.com/playlist?list=PLMsTLcO6ettgmyLVrcPvFLYi2Rs-R4JOE
Content:
Markov Chains
- Markov property
- state transition matrix
- definition and example
Markov Reward Process
- definition and example
- the return
- state value function
- the Bellman equation
Markov Decision Process
- definition and example
- policies
- state and action value functions
- the Bellman equation for MDPs
- optimal state and action value functions
- the Bellman optimality equations
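
As a rough illustration of the MRP Bellman equation from the outline above, here is a minimal sketch in plain Python. It assumes the convention v(s) = R(s) + gamma * sum over s' of P(s, s') * v(s'), with the reward received on leaving a state. The three-state chain, its rewards, and the function name evaluate_mrp are made up for illustration; they are not the example from the lecture or the Colab.

```python
def evaluate_mrp(P, R, gamma, tol=1e-10, max_iters=10_000):
    """Iteratively apply the Bellman equation v = R + gamma * P v until it settles.

    P is a row-stochastic transition matrix (list of lists), R is a list of
    per-state rewards, gamma is the discount factor (assumed < 1 here).
    """
    n = len(R)
    v = [0.0] * n
    for _ in range(max_iters):
        # One Bellman backup for every state.
        new_v = [
            R[s] + gamma * sum(P[s][t] * v[t] for t in range(n))
            for s in range(n)
        ]
        # Stop once the largest change across states is below the tolerance.
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


# Hypothetical 3-state MRP: A (index 0), B (index 1), terminal T (index 2).
P = [
    [0.5, 0.5, 0.0],  # A stays in A or moves to B
    [0.0, 0.0, 1.0],  # B always moves to T
    [0.0, 0.0, 1.0],  # T is absorbing
]
R = [-1.0, 2.0, 0.0]

values = evaluate_mrp(P, R, gamma=0.9)
print(values)
```

For a small MRP like this the same values can also be found in closed form by solving the linear system (I - gamma * P) v = R; the iterative version is shown because it carries over more directly to the MDP case.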
#MDPs #MRPs #markovchains #reinforcementlearning #BellmanEquations #BellmanOptimality