Adam Optimization from Scratch in Python
To get started with AI engineering, check out this Scrimba course: https://scrimba.com/the-ai-engineer-path-c02v?via=yacineMahdid

Adam is yet another stochastic gradient descent technique. Building on Adadelta and RMSProp, it fixes the shortcoming of Adagrad by using two running averages in its calculation.

You can find the code for this video over here: https://github.com/yacineMahdid/artificial-intelligence-and-machine-learning

## Credit
Check out this blog post for more gradient descent explanations: https://ruder.io/optimizing-gradient-descent/index.html#adam
The music is taken from YouTube Music!

## Table of Contents
Introduction: 0:00
Theory: 0:21
Python Implementation: 3:49
Conclusion: 12:04

Here is an explanation of Adam from the blog post mentioned above which I find very intuitive:

"Adaptive Moment Estimation (Adam) is another method that computes adaptive learning rates for each parameter. In addition to storing an exponentially decaying average of past squared gradients vt like Adadelta and RMSprop, Adam also keeps an exponentially decaying average of past gradients mt, similar to momentum. Whereas momentum can be seen as a ball running down a slope, Adam behaves like a heavy ball with friction, which thus prefers flat minima in the error surface."

A minimal Python sketch of this update rule is included at the end of this description.

## Reference
Kingma, D. P., & Ba, J. L. (2015). Adam: A Method for Stochastic Optimization. International Conference on Learning Representations, 1-13.

----
Join the Discord for general discussion: https://discord.gg/QpkxRbQBpf

----
Follow Me Online Here:
Twitter: https://twitter.com/CodeThisCodeTh1
GitHub: https://github.com/yacineMahdid
LinkedIn: https://www.linkedin.com/in/yacine-mahdid-809425163/
Instagram: https://www.instagram.com/yacine_mahdid/
___
Have a great week!
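## Code Sketch
The full implementation lives in the GitHub repository linked above; the snippet below is only a minimal sketch of the Adam update rule described in the quote, following Kingma & Ba (2015). The function name `adam` and its parameters are illustrative, not the exact names used in the video's code.

```python
import numpy as np

def adam(grad, theta0, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8, n_iter=1000):
    """Minimize a function given its gradient `grad`, starting from `theta0`."""
    theta = np.asarray(theta0, dtype=float)
    m = np.zeros_like(theta)  # exponentially decaying average of past gradients (1st moment)
    v = np.zeros_like(theta)  # exponentially decaying average of past squared gradients (2nd moment)
    for t in range(1, n_iter + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g        # update biased first moment estimate
        v = beta2 * v + (1 - beta2) * g ** 2   # update biased second moment estimate
        m_hat = m / (1 - beta1 ** t)           # bias-corrected first moment
        v_hat = v / (1 - beta2 ** t)           # bias-corrected second moment
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta

# Toy usage: minimize f(x, y) = x^2 + 10*y^2, whose gradient is (2x, 20y)
if __name__ == "__main__":
    grad = lambda p: np.array([2 * p[0], 20 * p[1]])
    print(adam(grad, [5.0, -3.0], lr=0.1, n_iter=2000))  # converges toward [0, 0]
```

The two running averages `m` and `v` are exactly the per-parameter quantities mentioned in the quoted explanation, and the bias-correction terms compensate for their initialization at zero.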