What is fine-tuning?
A clear, technical intro to fine-tuning: supervised fine-tuning and RLHF, why a base model needs a second pass, and how the same parameters become an assistant that follows instructions.

Watch the full series in order: https://www.youtube.com/playlist?list=PL3k41AsXtY9vSBXqvUU7T_QIfV1SZN5NT

New here? Start at episode 1: https://www.youtube.com/watch?v=GsKplQ_5Pak

Concept Stack: one AI concept a day, each one stacks on the last. We're building the framework for AI, together. Subscribe: https://www.youtube.com/@conceptstackai

Chapters:
0:00 Hook
0:38 Picking up from pre-training
1:09 Two passes: SFT, then RLHF
1:51 Supervised fine-tuning, in concrete terms
2:28 Why SFT alone isn't enough
3:16 RLHF: the reward model trick
4:00 What reward-model data looks like
4:46 How the model actually learns from rewards
5:30 Constitutional AI and RLAIF
6:20 How much does fine-tuning cost
6:56 What you get at the end
7:46 Outro