Why Adding the Right Document Can Make AI Dumber

Uploaded: May 12, 2026
Duration: 616 s

VisualCompute (31 subscribers)

29 views

Most RAG systems retrieve the right document… and still fail. Researchers showed that placing the correct chunk in the middle of a long prompt can make GPT-3.5-Turbo perform *worse* than having no retrieval at all. That’s not a model bug; it’s a structural failure that every tutorial RAG pipeline hides. This video shows the three real failure modes documented in the literature and the exact production components that fix them.

──────────────────────────────────────────
CHAPTERS
──────────────────────────────────────────
00:00 Hook: the Liu contradiction
00:15 Tutorial RAG walkthrough (Phase B lifecycle)
01:30 Failure Mode 1: Lost in the Middle (Liu et al. 2023)
03:00 Production Fix 1: Reranking (cross-encoder)
03:20 Pattern interrupt: what the baseline is hiding
03:50 Failure Mode 2: Lexical/Semantic Mismatch (BEIR benchmark)
05:08 Production Fix 2: Hybrid Search (BM25 + dense + RRF)
06:07 Failure Mode 3: Context-less Chunks (Anthropic, 2024)
07:30 Production Fix 3: Contextual Retrieval
08:11 Stack synthesis: 5.7% → 1.9% failure rate

──────────────────────────────────────────
SOURCES
──────────────────────────────────────────
• Lost in the Middle (Liu et al. 2023): https://arxiv.org/abs/2307.03172
• RAG original paper (Lewis et al. 2020): https://arxiv.org/abs/2005.11401
• BEIR benchmark (Thakur et al. 2021): https://arxiv.org/abs/2104.08663
• Contextual Retrieval (Anthropic, Sept 2024): https://www.anthropic.com/news/contextual-retrieval
• Passage Re-ranking with BERT (Nogueira & Cho 2019): https://arxiv.org/abs/1901.04085
• Reciprocal Rank Fusion (Cormack et al. 2009): https://dl.acm.org/doi/10.1145/1571941.1572114

──────────────────────────────────────────
SIMPLIFICATIONS NAMED IN THE VIDEO
──────────────────────────────────────────
• Joint retriever-generator training omitted (Lewis et al. §2.4)
• Embedding model internals abstracted
• LLM internals abstracted (KV-cache covered in next episode)
• Vector index internals abstracted (HNSW, IVF, quantization)
• Single corpus, single index assumed
• No query rewriting, HyDE, or multi-query shown

RAG production, retrieval augmented generation, lost in the middle, contextual retrieval, hybrid search, BM25 dense retrieval, reranking LLM, RAG failure modes, vector database, AI engineering, LLM production, semantic search, BEIR benchmark, reciprocal rank fusion, RAG tutorial vs production, AI engineering explained, LLM retrieval, RAG pipeline, machine learning engineering, llm engineering
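
For readers who want to see the fusion step behind the hybrid search fix (BM25 + dense + RRF), here is a minimal sketch of Reciprocal Rank Fusion in the sense of Cormack et al. 2009. It is not the code used in the video. The function name, the document ids, and the two input rankings are hypothetical, and k=60 is the constant used in that paper rather than a value stated in this description.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 follows Cormack et al. 2009 (assumption:
    the video may use a different value).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical outputs of a lexical (BM25) and a dense retriever.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

RRF uses only rank positions, so BM25 scores and embedding similarities never need to be put on a common scale. Documents that both retrievers rank highly (doc_a and doc_c here) rise above documents found by only one of them.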
