Vision Transformer

1.0K views
Apr 28, 2024
20:18

Following its success in NLP, the transformer architecture has been adapted for image recognition as the Vision Transformer (ViT).

Video Contents:
00:00 Introduction
02:20 Extracting Embedding Vectors
05:13 Self-Attention
12:58 Multi-Head Attention
15:46 MLP
16:28 Classification Head
17:36 Comparison of CNN and ViT

In this video, all animations and images, except those taken from the reference papers, belong to me.

References

Attention Is All You Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
https://arxiv.org/abs/1706.03762

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby
https://arxiv.org/abs/2010.11929

#machinelearning #computervision #deeplearning #ai #aitutorial #education #transformer #visiontransformer #vit #selfattention #multiheadattention #imageprocessing #datascience #computervisionwithhuseyinozdemir
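For readers who want something concrete to go with the "Extracting Embedding Vectors" and "Self-Attention" chapters, here is a minimal NumPy sketch of those two steps. It is not taken from the video: the helper names (`extract_patches`, `self_attention`), the 32x32 input, the embedding size of 8, and the random weights are all assumptions chosen only to keep the example small. A real ViT learns these weights and stacks many multi-head layers.

```python
import numpy as np

def extract_patches(image, patch_size):
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    gh, gw = h // patch_size, w // patch_size
    # (gh, ph, gw, pw, c) -> (gh, gw, ph, pw, c) -> (num_patches, ph*pw*c)
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(gh * gw, patch_size * patch_size * c)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a token sequence."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))              # hypothetical toy image
patch_size, dim = 16, 8                      # hypothetical sizes

patches = extract_patches(image, patch_size)             # (4, 768)
w_embed = rng.normal(scale=0.02, size=(patches.shape[1], dim))
cls_token = np.zeros((1, dim))                           # learnable in a real model
pos_embed = rng.normal(scale=0.02, size=(patches.shape[0] + 1, dim))

# Prepend the class token, then add position embeddings.
tokens = np.concatenate([cls_token, patches @ w_embed], axis=0) + pos_embed

w_q, w_k, w_v = (rng.normal(scale=0.1, size=(dim, dim)) for _ in range(3))
attended, weights = self_attention(tokens, w_q, w_k, w_v)
print(tokens.shape, attended.shape)
```

With 16x16 patches, a 32x32 image yields 4 patch tokens plus one class token, so the attention matrix here is 5x5. In the ViT paper the classification head reads the final representation of the class token.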

