
Working with text data | Chapter 2 — Build a Large Language Model (From Scratch)

May 11, 2026
8:59

This chapter dives into the essential data preparation steps required before training an LLM. You will learn how to split text into word and subword tokens, implement advanced tokenization using byte pair encoding, sample training examples using a sliding window approach, and convert these tokens into the vector embeddings that feed into the model.
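As a rough illustration of two of the steps named above (splitting text into tokens and sampling training examples with a sliding window), here is a minimal sketch in plain Python. It is not taken from the video or the book: the function names `tokenize`, `build_vocab`, and `sliding_windows`, the regular expression, and the sample sentence are all assumptions made for this example, and it uses simple word-level tokens rather than byte pair encoding.

```python
import re

def tokenize(text):
    # Hypothetical word-level tokenizer: split on whitespace and keep
    # common punctuation marks as separate tokens.
    parts = re.split(r'([,.:;?_!"()\']|--|\s)', text)
    return [p for p in parts if p.strip()]

def build_vocab(tokens):
    # Map each unique token to an integer ID (sorted for determinism).
    return {tok: idx for idx, tok in enumerate(sorted(set(tokens)))}

def sliding_windows(token_ids, context_length, stride):
    # Each input window is paired with a target window shifted one
    # position to the right, so the target holds the "next token"
    # for every position in the input.
    pairs = []
    for start in range(0, len(token_ids) - context_length, stride):
        inputs = token_ids[start:start + context_length]
        targets = token_ids[start + 1:start + context_length + 1]
        pairs.append((inputs, targets))
    return pairs

text = "Hello, world. Hello again."  # illustrative sample text
tokens = tokenize(text)
vocab = build_vocab(tokens)
ids = [vocab[t] for t in tokens]
pairs = sliding_windows(ids, context_length=4, stride=1)

for inputs, targets in pairs:
    print(inputs, "->", targets)
```

In a full pipeline, the integer IDs in each window would then be looked up in an embedding table to produce the vectors that feed into the model; that step is omitted here to keep the sketch free of external dependencies.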

