Reinforcement Learning (Passive Learning- Direct Utility Estimation)

3.0K views
May 1, 2021
1:03:17

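Direct Utility Estimation: In this method, the agent executes a sequence of trials (also called runs), each a sequence of state-action transitions that continues until the agent reaches a terminal state. Each trial provides a sample value (the observed reward-to-go) for every state it visits, and the agent estimates each state's utility from these samples, typically as a running average of the sample values. The main drawback is that the method wrongly treats state utilities as independent. In reality they are linked: because the environment is Markovian, a state's utility equals its own reward plus the expected utility of its successor states (the Bellman equations for the fixed policy). Because it ignores these links, the method is also slow to converge.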
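To make the running-average idea concrete, here is a minimal sketch in Python. It assumes a hypothetical trial format, an ordered list of (state, reward) pairs ending at the terminal state, with the reward attached to being in a state. It also assumes every visit to a state counts as one sample. The function name `direct_utility_estimation`, the `gamma` discount parameter, and the example states `A`, `B`, `T` are illustrative only and do not come from the video.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

# Assumed (hypothetical) trial format: an ordered list of (state, reward) pairs
# observed while following the fixed policy, ending with the terminal state.
Trial = List[Tuple[Hashable, float]]


def direct_utility_estimation(trials: List[Trial], gamma: float = 1.0) -> Dict[Hashable, float]:
    """Estimate U(s) as the running average of observed reward-to-go samples."""
    utilities: Dict[Hashable, float] = defaultdict(float)
    counts: Dict[Hashable, int] = defaultdict(int)

    for trial in trials:
        # Walk the trial backwards so each state's reward-to-go is its own
        # reward plus the (discounted) reward-to-go of the rest of the trial.
        reward_to_go = 0.0
        for state, reward in reversed(trial):
            reward_to_go = reward + gamma * reward_to_go
            # Every visit contributes one sample; fold it into the running
            # average incrementally: U <- U + (sample - U) / n.
            counts[state] += 1
            utilities[state] += (reward_to_go - utilities[state]) / counts[state]

    return dict(utilities)


if __name__ == "__main__":
    # Hypothetical two-trial example: step reward -0.04, terminal reward +1.
    trials = [
        [("A", -0.04), ("B", -0.04), ("T", 1.0)],
        [("A", -0.04), ("T", 1.0)],
    ]
    for state, u in sorted(direct_utility_estimation(trials).items()):
        print(state, round(u, 4))
```

In this example, state A receives the samples 0.92 and 0.96 and averages them to 0.94. The estimate for A is built only from A's own samples. Nothing ties it to the estimates for its successors B or T, which is exactly the independence assumption that makes the method slow to converge.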
