Benchmark embedding models #5 - Embed text chunks and questions
Hi everyone! In this video, we'll continue our series on benchmarking embedding models. We'll take the text chunks and question-answer pairs we generated in the previous steps and use several different embedding models to convert them into numerical representations (dense vectors).

First, I'll give a quick overview of what text embeddings are and why they're so important for tasks like semantic search and RAG (Retrieval-Augmented Generation). We'll look at how embedding models convert text into dense vectors, so that texts with similar meanings end up close to each other. We'll also discuss how the dimension (size) of these vectors impacts performance, computational cost, and storage needs.

After covering the theory, we'll jump straight into the code. I'll show you how to generate embeddings using popular proprietary (closed-source) models. We'll write Python code in a Jupyter notebook to call the Google Gemini API (using gemini-embedding-001) and the OpenAI API (using both text-embedding-3-small and text-embedding-3-large). I'll also show you how to handle API rate limiting and calculate the total cost of the OpenAI embeddings.

Next, we'll explore open-source alternatives that you can run locally. You'll learn how to use the popular sentence-transformers library to generate embeddings from models like all-MiniLM-L6-v2 and Qwen3-Embedding-0.6B. We'll also cover how to run quantized GGUF models with llama.cpp's llama-server, which lets us get high-quality embeddings from the larger Qwen3-Embedding-4B and Qwen3-Embedding-8B models running entirely on a local GPU.

Throughout the video, we'll process all our text chunks and questions and save the resulting embedding vectors into a single, merged JSON file. This file will be the foundation for the next video, where we'll finally benchmark all these models against each other!
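
To make the point about embedding dimension and storage a bit more concrete, here is a minimal Python sketch (not code from the video). It estimates the raw size of the vectors if each value is stored as a 32-bit float. The dimensions listed are the commonly documented default output sizes for these models, so check each model card before relying on them. The number of vectors and the `storage_megabytes` helper are hypothetical, chosen only for illustration.

```python
# Rough storage estimate for embeddings stored as 32-bit floats.
BYTES_PER_FLOAT32 = 4

# Assumption: commonly documented default output dimensions.
MODEL_DIMENSIONS = {
    "all-MiniLM-L6-v2": 384,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}

# Hypothetical number of chunks and questions to embed.
NUM_VECTORS = 10_000


def storage_megabytes(num_vectors: int, dimension: int) -> float:
    """Return the raw size in megabytes (10^6 bytes) of float32 vectors."""
    return num_vectors * dimension * BYTES_PER_FLOAT32 / 1_000_000


for model_name, dimension in MODEL_DIMENSIONS.items():
    size_mb = storage_megabytes(NUM_VECTORS, dimension)
    print(f"{model_name}: {dimension} dimensions, about {size_mb:.2f} MB")
```

Doubling the dimension doubles the raw storage, and it also roughly doubles the work needed for each similarity comparison. A JSON file stores numbers as text, so a merged JSON file will be larger than this raw estimate.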
Here is the link to the GitHub repository: https://github.com/ImadSaddik/Benchmark_Embedding_Models

⭐️ Timestamps
(00:00) Intro
(00:16) What are Text Embeddings?
(00:38) The Purpose of Embeddings (t-SNE Visualization)
(02:30) Does Embedding Dimension (Size) Matter?
(04:21) Embedding Models & Output Dimensions
(05:32) Proprietary vs. Open-Source Models
(06:14) MTEB Leaderboard
(06:23) Updated Pipeline Diagram
(07:26) Coding: Generate Embeddings with Google Gemini
(17:30) Coding: Generate Embeddings with OpenAI (text-embedding-3-small)
(24:43) Coding: Generate Embeddings with OpenAI (text-embedding-3-large)
(25:21) Coding: Generate Embeddings with all-MiniLM-L6-v2
(27:26) Coding: Generate Embeddings with Qwen3-Embedding-0.6B
(29:24) Coding: Running Qwen3-Embedding-4B with llama-server
(33:37) Coding: Running Qwen3-Embedding-8B with llama-server
(36:28) The End