Back to Browse

RAG with vision models

1.8K views
Streamed live on Sep 9, 2024
59:05

RAG (Retrieval Augmented Generation) is a way to get LLMs to answer questions grounded in a particular knowledge base. What do you do when your knowledge base includes images, like graphs or photos? You first need to generate embeddings using a multimodal model, like the one available from Azure Computer Vision, search those embeddings using a powerful vector search like Azure AI Search, and then send any retrieved text and images to a multimodal LLM like GPT-4o. Learn how to get started quickly with a RAG on multimodal documents in this session. Presented by Pamela Fox, Python Advocate at Microsoft ** Part of RAGHack, a free global hackathon to developer RAG applications. Join at https://aka.ms/raghack ** 📌 Check out the RAGHack 2024 series here! https://aka.ms/RAGHack2024 #MicrosoftReactor #RAGHack [eventID:23336]

Download

1 formats

Video Formats

360pmp4109.9 MB

Right-click 'Download' and select 'Save Link As' if the file opens in a new tab.

RAG with vision models | NatokHD