03 Reinforcement Learning with Markov Decision Processes

Jan 5, 2026
17:38

In this video, we build a solid foundation for Reinforcement Learning by modeling sequential decision-making problems as Markov Decision Processes (MDPs). You’ll learn what makes an environment “Markov,” how to define states, actions, transition dynamics, rewards, and discounting, and how these pieces fit together to describe an RL task.

We then work through a practical example (a Grid World–style setup) and explore how to solve MDPs using both:
- Planning methods (e.g., value-based dynamic programming like value iteration) when the model is known
- Model-free RL methods like TD learning, Q-learning, and SARSA when the model is unknown

Finally, we take the crucial next step: real-world environments are often partially observable. That’s where POMDPs (Partially Observable MDPs) come in. We explain why full-state observability is a strong assumption, what changes when you only get noisy/incomplete observations, and how this motivates memory and state estimation (e.g., belief-based approaches) to make better decisions.

✅ By the end, you’ll understand when an MDP is enough, when you need a POMDP, and how this shift sets the stage for more advanced topics like state estimation and filtering.

Topics covered
- MDP definition: states, actions, transitions, rewards, discount factor
- Grid World example + policy/value intuition
- Planning vs learning (model-based vs model-free)
- Value iteration, TD learning, Q-learning, SARSA (high-level intuition)
- Why partial observability breaks the MDP assumption
- From MDPs to POMDPs: observations, hidden state, belief/memory
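
To make the planning side of the description concrete, here is a minimal sketch of value iteration on a small Grid World with a known model. It is not the example used in the video: the 3x4 layout, the goal/pit/wall positions, the step reward of -0.04, the discount factor of 0.9, and the deterministic moves are all assumptions chosen for illustration, and the names `step`, `value_iteration`, and `greedy_policy` are hypothetical helpers.

```python
# Illustrative Grid World (assumed layout, not taken from the video).
ROWS, COLS = 3, 4
GOAL, PIT, WALL = (0, 3), (1, 3), (1, 1)
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
GAMMA = 0.9          # discount factor (assumed)
STEP_REWARD = -0.04  # small cost per move (assumed)


def step(state, action):
    """Known, deterministic model: return (next_state, reward)."""
    dr, dc = ACTIONS[action]
    nxt = (state[0] + dr, state[1] + dc)
    # Bumping into the wall or the grid edge leaves the agent in place.
    if nxt == WALL or not (0 <= nxt[0] < ROWS and 0 <= nxt[1] < COLS):
        nxt = state
    if nxt == GOAL:
        return nxt, 1.0
    if nxt == PIT:
        return nxt, -1.0
    return nxt, STEP_REWARD


def value_iteration(tol=1e-8):
    """Repeat the Bellman optimality backup until values stop changing."""
    states = [(r, c) for r in range(ROWS) for c in range(COLS) if (r, c) != WALL]
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            if s in (GOAL, PIT):
                continue  # terminal states keep value 0
            outcomes = [step(s, a) for a in ACTIONS]
            best = max(r + GAMMA * V[s2] for s2, r in outcomes)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V


def greedy_policy(V):
    """Pick, in each non-terminal state, the action with the best one-step lookahead."""
    policy = {}
    for s in V:
        if s in (GOAL, PIT):
            continue

        def lookahead(a, s=s):
            s2, r = step(s, a)
            return r + GAMMA * V[s2]

        policy[s] = max(ACTIONS, key=lookahead)
    return policy
```

Calling `greedy_policy(value_iteration())` returns a dictionary mapping each non-terminal cell to a move. Because the sketch calls `step` directly, it only works when the transition model is known; model-free methods such as Q-learning or SARSA would instead estimate action values from sampled transitions.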
