
Reinforcement Learning (Adaptive Dynamic Programming (ADP), Temporal Difference Learning)

2.0K views
May 1, 2021
57:08

ADP is a smarter method than Direct Utility Estimation: it runs trials to learn a model of the environment, and it estimates the utility of a state as the sum of the reward for being in that state and the expected discounted utility of the next state. Instead of calculating the exact utility of a state, can we approximate it and possibly make the computation less expensive? Yes, we can, using Temporal Difference (TD) learning.
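
To make the TD idea concrete, here is a minimal sketch of a passive TD(0) utility update, U(s) <- U(s) + alpha * (R(s) + gamma * U(s') - U(s)). It is an illustration under stated assumptions, not the code from the video: the three-state chain A -> B -> C, its rewards, the learning rate alpha, the discount gamma, and the helper names td_update and run_trials are all hypothetical.

```python
def td_update(U, s, reward, s_next, alpha=0.1, gamma=0.9):
    """Nudge U[s] toward the one-step target reward + gamma * U[s_next]."""
    current = U.get(s, 0.0)                       # unseen states start at 0
    target = reward + gamma * U.get(s_next, 0.0)  # sampled one-step estimate
    U[s] = current + alpha * (target - current)   # move a fraction alpha toward it


def run_trials(trial, rewards, terminal, n_trials=500, alpha=0.1, gamma=0.9):
    """Replay one fixed trial (a list of visited states) many times.

    Assumption: the terminal state's utility is simply its own reward.
    No transition model is learned or used, unlike in ADP.
    """
    U = {terminal: rewards[terminal]}
    for _ in range(n_trials):
        for s, s_next in zip(trial, trial[1:]):
            td_update(U, s, rewards[s], s_next, alpha, gamma)
    return U


# Hypothetical deterministic chain: A -> B -> C, where C is terminal.
rewards = {"A": -1.0, "B": -1.0, "C": 10.0}
U = run_trials(["A", "B", "C"], rewards, terminal="C")
print({state: round(value, 3) for state, value in U.items()})
```

In this toy chain the estimates settle near the exact values U(B) = -1 + 0.9 * 10 = 8 and U(A) = -1 + 0.9 * 8 = 6.2. Each update touches only the state just visited, which is why TD can be cheaper per step than re-solving the utility equations with a learned model as ADP does.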

