RAG (Retrieval Augmented Generation) is a way to get LLMs to answer questions grounded in a particular knowledge base. But what do you do when your knowledge base includes images, like graphs or photos? You first need to generate embeddings using a multimodal model, like the one available from Azure Computer Vision, then search those embeddings using a powerful vector search service like Azure AI Search, and finally send any retrieved text and images to a multimodal LLM like GPT-4o. Learn how to get started quickly with RAG on multimodal documents in this session.
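The pipeline above (embed, search, then prompt a multimodal LLM) can be sketched in a few lines of Python. This is a toy, self-contained illustration, not the session's actual code: `embed` is a fake stand-in for a real multimodal embedding API (such as Azure Computer Vision's), `TinyVectorIndex` stands in for a vector search service like Azure AI Search, and `build_chat_messages` only shapes a GPT-4o-style message list without calling any model.

```python
from dataclasses import dataclass
import math

# Toy stand-in for a multimodal embedding model (in a real app, call the
# Azure Computer Vision multimodal embeddings API for both text and images).
# Here we just hash characters into a small fixed-size normalized vector.
def embed(content: str, dim: int = 8) -> list[float]:
    vec = [0.0] * dim
    for i, ch in enumerate(content):
        vec[i % dim] += ord(ch)
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

@dataclass
class Doc:
    id: str            # e.g. a text chunk or an image filename
    content: str       # text, or a caption/reference for an image
    embedding: list[float]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already normalized, so the dot product is cosine similarity.
    return sum(x * y for x, y in zip(a, b))

class TinyVectorIndex:
    """In-memory stand-in for a vector search service like Azure AI Search."""
    def __init__(self) -> None:
        self.docs: list[Doc] = []

    def add(self, doc_id: str, content: str) -> None:
        self.docs.append(Doc(doc_id, content, embed(content)))

    def search(self, query: str, top: int = 2) -> list[Doc]:
        q = embed(query)
        ranked = sorted(self.docs, key=lambda d: cosine(q, d.embedding), reverse=True)
        return ranked[:top]

def build_chat_messages(query: str, results: list[Doc]) -> list[dict]:
    # Retrieved text and image references become grounding context for a
    # multimodal LLM like GPT-4o (a real payload would attach image URLs
    # or base64 image data alongside the text).
    sources = "\n".join(f"[{d.id}]: {d.content}" for d in results)
    return [
        {"role": "system", "content": "Answer using only these sources:\n" + sources},
        {"role": "user", "content": query},
    ]

index = TinyVectorIndex()
index.add("chart1.png", "Bar chart of quarterly revenue by region")
index.add("page3.txt", "The return policy allows refunds within 30 days")
messages = build_chat_messages("What does the revenue chart show?",
                               index.search("revenue chart"))
```

The `messages` list is what you would pass to a multimodal chat completion call; the real session uses Azure services for each stage rather than these in-memory stand-ins.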
Presented by Pamela Fox, Python Advocate at Microsoft
** Part of RAGHack, a free global hackathon to develop RAG applications. Join at https://aka.ms/raghack **
📌 Check out the RAGHack 2024 series here! https://aka.ms/RAGHack2024
#MicrosoftReactor #RAGHack