Hi guys! This is the first video in a three-video series where we dive deep into transformers and how they work.
In this video, we go over the general idea of the architecture, how each component works, and what it is responsible for.
We also build a deeper intuitive understanding of self-attention, one of the ideas that was revolutionary at the time.
This is the blog I took inspiration from: https://goyalpramod.github.io/blogs/Transformers_laid_out/
To read more about normalization, check this out: https://www.pinecone.io/learn/batch-layer-normalization/
Consider connecting with me on my socials!!
X: https://x.com/goyal__pramod
LinkedIn: https://www.linkedin.com/in/goyalpramod/
Happy learning! I hope you have as much fun understanding these ideas as I had making the video.