What is pre-training?
A clear, technical intro to pre-training: how a transformer with random weights becomes a model that can write and reason, the simple loss function at the heart of it, and why it costs $100M+.

▶ Watch the full series in order: https://www.youtube.com/playlist?list=PL3k41AsXtY9vSBXqvUU7T_QIfV1SZN5NT
📚 New here? Start at episode 1: https://www.youtube.com/watch?v=GsKplQ_5Pak

Concept Stack — one AI concept a day, each one stacks on the last. We're building the framework for AI, together. Subscribe: https://www.youtube.com/@conceptstackai

Chapters:
0:00 Hook
0:40 Picking up from transformers
1:14 The data: trillions of tokens
2:22 The task: predict the next token
2:57 The loss function
3:39 Why this is self-supervised
4:17 The actual training loop
5:15 The three things that scale
5:52 The dollar cost
6:33 What you get at the end
7:34 Why predicting the next word works at all
8:14 Outro