Direct Utility Estimation
In this method, the agent executes a sequence of trials, or runs (sequences of state-action transitions that continue until the agent reaches a terminal state). Each trial provides a sample of the utility (the observed reward-to-go) for every state it visits, and the agent estimates each state's utility as the running average of its samples. The main drawback is that this method wrongly treats state utilities as independent, when in reality they are linked through the Markov structure of the environment: the utility of a state depends on the utilities of its successor states. Because it ignores these links, it is also slow to converge.
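The running-average estimate described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the video's own implementation: the function name `direct_utility_estimation`, the representation of a trial as a list of `(state, reward)` pairs, the `gamma` discount parameter, and the every-visit averaging are all choices made here for the example.

```python
from collections import defaultdict


def direct_utility_estimation(trials, gamma=1.0):
    """Estimate state utilities as running averages of observed reward-to-go.

    trials: list of trials; each trial is a list of (state, reward) pairs,
            ordered from the start state to the terminal state (assumed format).
    gamma:  discount factor (assumed parameter; 1.0 means no discounting).
    """
    utilities = defaultdict(float)  # current running average per state
    counts = defaultdict(int)       # number of samples seen per state

    for trial in trials:
        reward_to_go = 0.0
        # Walk the trial backward so the reward-to-go accumulates in one pass.
        for state, reward in reversed(trial):
            reward_to_go = reward + gamma * reward_to_go
            counts[state] += 1
            # Incremental mean: new_avg = old_avg + (sample - old_avg) / n
            utilities[state] += (reward_to_go - utilities[state]) / counts[state]

    return dict(utilities)


# Two hypothetical trials over states A, B and a terminal state T.
example_trials = [
    [("A", -1.0), ("B", -1.0), ("T", 10.0)],
    [("A", -1.0), ("T", -10.0)],
]
estimates = direct_utility_estimation(example_trials)
print(estimates)
```

Note that each state's average is updated in isolation. Nothing in the update uses the estimate of a successor state, which is exactly the independence assumption criticized above.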
Reinforcement Learning (Passive Learning- Direct Utility Estimation) | NatokHD