
Vision Transformers (ViT) pytorch code

5.0K views
Nov 29, 2023
16:11

I implemented a vision transformer (ViT) model based on the paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale". This video focuses on (1) building a PyTorch vision transformer (ViT) model, (2) training the model on the MNIST dataset imported from torchvision, and (3) feeding test samples to the transformer and visualizing its responses.

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
- notes+code: https://mashaan14.github.io/YouTube-channel/vision_transformers/2023_11_29_VisionTransformer_MNIST
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
- website: https://mashaan14.github.io/mashaan/
- github: https://github.com/mashaan14
- X: https://twitter.com/mashaan_14
- linkedin: https://linkedin.com/in/mashaan
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
Chapters:
0:00 start
0:15 acknowledgement
0:47 importing datasets
2:53 model parameters
3:57 converting an image to patches
5:23 class AttentionBlock
6:11 class VisionTransformer
7:32 model printout
8:15 training loop
10:21 inference
10:39 attention map for a test sample
14:43 plotting the attention map
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
References:
- Dosovitskiy, Alexey, et al. "An image is worth 16x16 words: Transformers for image recognition at scale." arXiv preprint arXiv:2010.11929 (2020).
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
#attention #transformers #VisionTransformer #imageclassification #mnist #computervision #pytorch #DeepLearningTutorial #MachineLearningProject #AIResearch #CodingTutorial
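The "converting an image to patches" step can be sketched in a few lines of PyTorch. This is a minimal illustration, not the video's exact code: it assumes a patch size of 7 on 28x28 MNIST images (giving a 4x4 grid of 16 patches); the parameters used in the video may differ.

```python
import torch

def image_to_patches(x, patch_size=7):
    """Split a batch of images (B, C, H, W) into flattened non-overlapping
    patches of shape (B, num_patches, C * patch_size * patch_size)."""
    B, C, H, W = x.shape
    # Slide a patch_size window (no overlap) over height, then width:
    # (B, C, H, W) -> (B, C, H/p, W/p, p, p)
    x = x.unfold(2, patch_size, patch_size).unfold(3, patch_size, patch_size)
    # Reorder so each patch's channels and pixels flatten together:
    # (B, H/p, W/p, C, p, p) -> (B, num_patches, C*p*p)
    x = x.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * patch_size * patch_size)
    return x

imgs = torch.randn(8, 1, 28, 28)        # a batch of MNIST-sized images
patches = image_to_patches(imgs)
print(patches.shape)                    # torch.Size([8, 16, 49])
```

Each 49-dimensional patch vector would then be projected by a linear layer into the transformer's embedding dimension before the class token and positional embeddings are added.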

