
Why are diffusion LLMs so fast?

24.7K views
Feb 9, 2026
22:15

This video discusses techniques for making diffusion LLMs faster, including:
• Self-Distillation Through Time
• Curriculum Learning
• Confidence scores for unmasking
• Guided diffusion (FlashDLM)
• Approximate KV caching (dLLM-Cache, dKV-Cache)
• Block diffusion

🔗 Inception:
Home: https://www.inceptionlabs.ai/
API: https://docs.inceptionlabs.ai/
X: https://x.com/_inception_ai
Stefano Ermon, cofounder & CEO: https://cs.stanford.edu/~ermon/

📚 Papers
Self-Distillation Through Time: https://arxiv.org/abs/2410.21035
FlashDLM (Guided Diffusion): https://arxiv.org/abs/2505.21467
dLLM-Cache: https://arxiv.org/abs/2506.06295
dKV-Cache: https://arxiv.org/abs/2505.15781
Block Diffusion: https://arxiv.org/abs/2503.09573
LLaDA: https://arxiv.org/abs/2502.09992
LLaDA 2.0: https://arxiv.org/abs/2512.15745
Seed Diffusion from ByteDance: https://arxiv.org/abs/2508.02193
Mercury from Inception: https://arxiv.org/abs/2506.17298

▶️ Other videos on diffusion: https://youtube.com/playlist?list=PL4bm2lr9UVG3SN79Y6WBe4OOlEiO88vie&si=LuSlceom29bz9-WG

00:00 Intro
02:00 Auto-regressive vs diffusion LLMs
03:06 Reducing refinement steps
05:54 Self-Distillation Through Time
07:15 Curriculum learning
08:17 Speeding up sampling
09:40 Confidence scores
11:35 Guided diffusion (FlashDLM)
13:24 Approximate KV caching (dLLM-Cache, dKV-Cache)
19:03 Block diffusion
21:19 Where to find diffusion models

