The Upper Confidence Bound (UCB) multi-armed bandit algorithm is a statistically principled way to balance exploration and exploitation when making decisions under uncertainty. In this video, I explain and implement UCB.
Notebook: https://colab.research.google.com/drive/1egLv7viZQXfqynh6bPli5V_pnwBQ-SQG?usp=sharing
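
For a quick feel of how UCB trades off exploration and exploitation, here is a minimal sketch of the common UCB1 variant. It is not the code from the notebook; the function name `ucb1`, the Bernoulli (0/1) rewards, the `seed` parameter, and the exploration bonus sqrt(2 ln t / n) are assumptions chosen for illustration.

```python
import math
import random


def ucb1(arm_means, n_rounds, seed=0):
    """Simulate UCB1 on Bernoulli arms (hypothetical helper, for illustration).

    arm_means: true success probability of each arm (unknown to the algorithm).
    Returns (counts, values): pulls per arm and empirical mean reward per arm.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms    # how many times each arm has been pulled
    values = [0.0] * n_arms  # running average reward of each arm

    for t in range(1, n_rounds + 1):
        if t <= n_arms:
            # Pull every arm once so each count is non-zero.
            arm = t - 1
        else:
            # Score = empirical mean (exploitation) + confidence bonus (exploration).
            # The bonus shrinks as an arm is pulled more often.
            scores = [
                values[a] + math.sqrt(2.0 * math.log(t) / counts[a])
                for a in range(n_arms)
            ]
            arm = max(range(n_arms), key=scores.__getitem__)

        # Assumed reward model: 1 with probability arm_means[arm], else 0.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0

        # Incremental update of the running mean.
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]

    return counts, values


if __name__ == "__main__":
    counts, values = ucb1([0.1, 0.5, 0.9], n_rounds=2000)
    print("pulls per arm:", counts)
    print("estimated means:", [round(v, 3) for v in values])
```

Arms that have been tried only a few times get a large bonus and are revisited, while arms with a high observed average are picked because of the first term. Over time most pulls should go to the best arm.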