Do LLMs Know When They're Wrong?

Name: Do LLMs Know When They're Wrong?
Uploaded: Sep 5, 2025
Duration: 848 s

Martin Andrews (1.59K subscribers)


We're moving past LLMs that just predict the next word. Discover a new frontier: models that can gauge their own uncertainty to improve reasoning. This video explores two brand-new papers that turn the "Entropix" meme into practical, working code.

Current methods like Chain-of-Thought are powerful, but they are essentially a model "thinking out loud." What if a model could recognize when it's on a bad path and correct itself? This is the core idea behind using token entropy and logprobs as a "confidence" signal.

This video is for the AI builder, developer, and enthusiast who wants to look under the hood. We break down the history of this idea (from OpenAI's o1 hints to Twitter theories) and then dive into the mechanics of two pivotal papers:

1. **ARPO**: Agentic Reinforced Policy Optimization
2. **Deep Think with Confidence**: A practical vLLM implementation from Meta

By the end, you'll understand not just *what* LLM confidence is, but *how* it works, and *why* it's a compelling direction for building more capable and efficient agentic systems.
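
As a rough illustration of the token entropy and logprob "confidence" signal described above, here is a minimal Python sketch. It is not the formulation used in either paper: the function names, the renormalization over top-k candidates, and the entropy threshold are all assumptions made for this example.

```python
import math


def token_entropy(logprobs):
    """Shannon entropy (in nats) of one next-token distribution.

    `logprobs` holds log-probabilities for the candidate tokens at one step.
    If only the top-k candidates are available, they are renormalized to sum
    to 1, which is an assumption of this sketch.
    """
    m = max(logprobs)
    weights = [math.exp(lp - m) for lp in logprobs]
    total = sum(weights)
    probs = [w / total for w in weights]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_logprob(chosen_logprobs):
    """Average log-probability of the sampled tokens (closer to 0 = more confident)."""
    return sum(chosen_logprobs) / len(chosen_logprobs)


def uncertain_steps(per_step_logprobs, threshold):
    """Indices of generation steps whose entropy exceeds a hypothetical threshold."""
    return [
        i
        for i, step in enumerate(per_step_logprobs)
        if token_entropy(step) > threshold
    ]


# Toy data: one confident step and one step spread evenly over four tokens.
confident = [math.log(0.97), math.log(0.01), math.log(0.01), math.log(0.01)]
unsure = [math.log(0.25)] * 4
flagged = uncertain_steps([confident, unsure], threshold=1.0)
```

Here `flagged` contains only the index of the evenly spread step. A low-entropy step means the model put most of its probability on one token, while a high-entropy step is a candidate point for branching or discarding a rollout.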

---

### Papers & Resources Mentioned

* [ARPO: Agentic Reinforced Policy Optimization (Dong et al., 2025)](https://arxiv.org/abs/2507.19849)
  + [ARPO GitHub Repo](https://github.com/dongguanting/ARPO)
* [Deep Think with Confidence (Fu et al., 2025)](https://arxiv.org/abs/2508.15260)
  + [DeepThink Project Page (Meta AI)](https://jiaweizzhao.github.io/deepconf/)
  + [DeepThink Pull Request for vLLM](https://github.com/vllm-project/vllm/pull/23201)
* [OpenAI o1 Blog Post](https://openai.com/index/learning-to-reason-with-llms/)
  + [Let's Verify Step-by-Step (OpenAI, 2023)](https://arxiv.org/abs/2305.20050)
* [ICML 2024 Tutorial: Physics of Language Models](https://www.youtube.com/watch?v=yBL7J0kgldU)

---

### Chapters

00:00 - Introduction: The Idea of LLM Confidence
00:31 - Background: From OpenAI's o1 to the "Entropix" Meme
05:26 - Paper 1: ARPO & Agentic Rollout Confidence
07:55 - Paper 2: Meta's "Deep Think with Confidence"
09:17 - How It Works: Implementation in vLLM
11:25 - The Catch: Is 512 Rollouts "Real Reasoning"?
12:22 - Wrap-Up: What This Means for AI Builders

---

### About The Channel

My channel is for "The AI Builder": the developer, tinkerer, and hands-on enthusiast. We go beyond the headlines to understand the *mechanisms* behind the latest research, empowering you to build the future. From the Lab to Your Laptop.

### Social Links

* GitHub: https://github.com/mdda
* LinkedIn: https://sg.linkedin.com/in/martinandrews
* X (Twitter): https://x.com/mdda123

#AI #LLM #MachineLearning #Research #AIExplained #OpenAI #Meta
