Talk in the ZHAW Datalab Seminar series of lunch-time lectures, November 15, 2018.
Outline:
- Learning to act
- Example: DeepMind’s AlphaZero
- Training the policy/value network
Slides: https://stdm.github.io/downloads/talks/2018-11-15_AlphaZero-LearningGamesFromSelfplay.pdf
Issues:
- Sorry, no audio for the last 1.5 minutes of the Q&A part.
- More background on the question discussed at minute 52:02 (the equivalent duration of a training run on a single machine) can be found here: http://computer-go.org/pipermail/computer-go/2017-October/010307.html. Note that this reference speaks of 1,700 years, not 1.7 million years as implied by one of the discussants.
More information: https://www.zhaw.ch/datalab