This video breaks down the DeepSeek V4 attention architecture and shows how it balances long-context compression against exact local attention. It covers heavily compressed attention, compressed sparse attention with its lightning indexer, hybrid attention, shared key-value compressed tokens, attention sinks, and how these mechanisms are scheduled across layers. A toy code sketch of the indexer idea follows the chapter list below.
Become AI Researcher (Skool) - https://skool.com/become-ai-researcher-2669/about
- 7+ hours of from-scratch video courses covering math fundamentals, PyTorch, neural networks, transformers, reinforcement learning, and LLMs
- Every lesson is code-first: you build the thing, not just watch it
- Implementation notebooks, exercises, and walkthroughs
- Advanced breakdowns that go deeper than the YouTube tutorials
Chapters:
0:00 DeepSeek V4 attention overview
0:06 Heavily compressed attention
1:14 Compressed sparse attention and lightning indexer
2:49 Hybrid attention and shared key-value tokens
3:22 Attention sink and layer scheduling
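Bonus: to make the compressed sparse attention, lightning indexer, and attention sink chapters concrete, here is a minimal PyTorch sketch. This is not DeepSeek's code; it assumes a lightning-indexer-style scorer that cheaply ranks past tokens, a top-k gather before standard attention, and a token-0 "sink" that every query keeps. All names, shapes, and the top_k value are illustrative.

import torch
import torch.nn.functional as F

def sparse_attention(q, k, v, q_idx, k_idx, top_k=4):
    """Toy top-k sparse attention: an indexer ranks past tokens, then full
    attention runs only over each query's top-k picks plus a token-0 sink."""
    t, d = q.shape
    # Lightning-indexer-style scores: cheap dot products between small indexer
    # projections (a real indexer would use fewer heads / lower precision).
    scores = q_idx @ k_idx.T                                   # (t, t)
    causal = torch.tril(torch.ones(t, t, dtype=torch.bool))
    scores = scores.masked_fill(~causal, float("-inf"))        # no future tokens
    k_eff = min(top_k, t)
    top_scores, top_idx = scores.topk(k_eff, dim=-1)           # (t, k_eff)
    # Attention sink: always include token 0 (duplicates with the top-k picks
    # are possible; ignored here for simplicity).
    top_idx = torch.cat([torch.zeros(t, 1, dtype=torch.long), top_idx], dim=-1)
    valid = torch.cat([torch.ones(t, 1, dtype=torch.bool),
                       torch.isfinite(top_scores)], dim=-1)    # (t, 1+k_eff)
    k_sel, v_sel = k[top_idx], v[top_idx]                      # (t, 1+k_eff, d)
    logits = (q.unsqueeze(1) * k_sel).sum(-1) / d ** 0.5       # (t, 1+k_eff)
    logits = logits.masked_fill(~valid, float("-inf"))         # drop masked picks
    return (F.softmax(logits, dim=-1).unsqueeze(-1) * v_sel).sum(1)  # (t, d)

t, d = 8, 16
out = sparse_attention(torch.randn(t, d), torch.randn(t, d), torch.randn(t, d),
                       q_idx=torch.randn(t, 8), k_idx=torch.randn(t, 8))
print(out.shape)  # torch.Size([8, 16])

The point of the indexer split is that ranking is done with small, cheap projections (q_idx, k_idx here) so the expensive full-dimension attention only touches the selected subset; the video covers how the real architecture schedules this across layers.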