DeepSeek Explained - Design, Training and Key Concepts
DeepSeek-V3, DeepSeek-R1-Zero, DeepSeek-R1
#deepseek #ai #llm #artificialintelligence
DeepSeek-R1 Paper: https://arxiv.org/abs/2501.12948
Timestamps:
00:10 Models
00:26 Training
00:39 DeepSeek-R1-Zero
00:53 GRPO
02:22 DeepSeek-R1
05:47 Distillation
06:31 Transformer Blocks
07:09 Multi-Head Latent Attention
07:47 Mixture of Experts
08:23 Multi-Token Prediction
08:48 Chain of Thought (CoT)