In this video, we implement MobileViT (Mobile Vision Transformer) from scratch in TensorFlow. MobileViT was proposed in the paper "MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer".
MobileViT is a light-weight, general-purpose, low-latency network for mobile vision tasks. It combines the strengths of CNNs and ViTs. The paper proposes a MobileViT block that encodes both local and global information in a single block: convolutions capture local features, and a transformer applied across patches captures global context.
Code: https://github.com/nikhilroxtomar/MobileViT-Implementation/tree/main/tensorflow
Research paper: https://arxiv.org/pdf/2110.02178.pdf
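For a feel of how the MobileViT block lets a transformer see global context, here is a minimal NumPy sketch of the patch "unfold" and "fold" steps described in the paper. It is not the TensorFlow code from the video or the linked repository; the function names unfold and fold and the 2x2 patch size are assumptions chosen for illustration.

```python
import numpy as np


def unfold(x, ph, pw):
    """Rearrange an (H, W, C) feature map into (P, N, C).

    P = ph * pw is the number of pixels per patch and N is the number
    of non-overlapping patches. Hypothetical helper, for illustration.
    """
    H, W, C = x.shape
    assert H % ph == 0 and W % pw == 0, "H and W must be divisible by the patch size"
    nh, nw = H // ph, W // pw
    x = x.reshape(nh, ph, nw, pw, C)   # split rows and columns into patches
    x = x.transpose(1, 3, 0, 2, 4)     # (ph, pw, nh, nw, C)
    return x.reshape(ph * pw, nh * nw, C)


def fold(x, H, W, ph, pw):
    """Inverse of unfold: (P, N, C) back to (H, W, C)."""
    P, N, C = x.shape
    nh, nw = H // ph, W // pw
    x = x.reshape(ph, pw, nh, nw, C)
    x = x.transpose(2, 0, 3, 1, 4)     # (nh, ph, nw, pw, C)
    return x.reshape(H, W, C)


# A tiny 4x4 feature map with 2 channels, split into 2x2 patches.
feature_map = np.arange(32, dtype=np.float32).reshape(4, 4, 2)
tokens = unfold(feature_map, 2, 2)     # shape (4, 4, 2): P=4, N=4, C=2
restored = fold(tokens, 4, 4, 2, 2)    # same values as feature_map
```

Each row tokens[p] holds the pixel at position p of every patch, so a transformer run along the N axis mixes information across patches, and fold puts the result back on the spatial grid for the convolutions that follow.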
Timeline:
00:00 - Introduction
00:53 - What is MobileViT?
01:18 - What is the need for MobileViT?
01:45 - Blocks used in the MobileViT
04:55 - Importing the required libraries
05:30 - Implementing the Inverted Residual block
12:42 - Implementing the MobileViT block
27:56 - Implementing the MobileViT architecture
38:37 - Variants of MobileViT
41:41 - Ending
Support:
- https://www.youtube.com/channel/UClkqp31PHke-f8b8mjiiY-Q/join
- https://www.buymeacoffee.com/nikhilroxtomar
Follow Me:
BLOG: https://idiotdeveloper.com https://sciencetonight.com
TELEGRAM: https://t.me/idiotdeveloper
FACEBOOK: https://www.facebook.com/idiotdeveloper
TWITTER: https://twitter.com/nikhilroxtomar
INSTAGRAM: https://instagram.com/nikhilroxtomar
PATREON: https://www.patreon.com/idiotdeveloper