Back to Browse

ImageBERT

4.9K views
Feb 7, 2020
11:40

This video explores the ImageBERT model from Microsoft Research! This is a really interesting combination of vision and language tokens to achieve state of the art results on MSCOCO and Flickr30k image and text retrieval tasks! I hope this video helped you get a better sense of how image and text tokens can be combined in the transformer architecture and how self-attention uses visual tokens to inform the text task output of BERT's masked language modeling! Paper Links: ImageBERT: https://arxiv.org/pdf/2001.07966.pdf Conceptual Captions: https://ai.googleblog.com/2018/09/conceptual-captions-new-dataset-and.html Thanks for watching! Please Subscribe!

Download

0 formats

No download links available.

ImageBERT | NatokHD