Working with text data | Chapter 2 — Build a Large Language Model (From Scratch)

Name: Working with text data | Chapter 2 — Build a Large Language Model (From Scratch)
Uploaded: May 11, 2026
Duration: 539 s
Description: This chapter dives into the essential data preparation steps required before training an LLM. You will learn how to split text into word and subword tokens, implement advanced tokenization using byte pair encoding, sample training examples using a sliding window approach, and convert these tokens into the vector embeddings that feed into the model.
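
The sliding window sampling named in the description can be sketched as follows. This is a minimal illustration, not the chapter's own code: the helper names `tokenize` and `sliding_windows`, the sample sentence, and the `context_size` and `stride` values are all assumptions chosen for the example, and a simple regex word tokenizer stands in for byte pair encoding.

```python
import re

def tokenize(text):
    # Hypothetical helper: split on whitespace and common punctuation,
    # keeping punctuation marks as their own tokens.
    parts = re.split(r'([,.:;?_!"()\']|--|\s)', text)
    return [p for p in parts if p.strip()]

def sliding_windows(token_ids, context_size, stride):
    # Hypothetical helper: slide a fixed-size window over the token IDs.
    # Each target sequence is the input sequence shifted right by one
    # position, so the model learns to predict the next token.
    pairs = []
    for start in range(0, len(token_ids) - context_size, stride):
        inputs = token_ids[start:start + context_size]
        targets = token_ids[start + 1:start + context_size + 1]
        pairs.append((inputs, targets))
    return pairs

text = "Hello, world. Is this a test?"  # sample sentence (assumption)
tokens = tokenize(text)

# Build a toy vocabulary that maps each unique token to an integer ID.
vocab = {tok: idx for idx, tok in enumerate(sorted(set(tokens)))}
ids = [vocab[t] for t in tokens]

pairs = sliding_windows(ids, context_size=4, stride=1)
for inputs, targets in pairs:
    print(inputs, "->", targets)
```

With a stride of 1, consecutive windows overlap by all but one token; a larger stride yields fewer, less overlapping training pairs. The resulting ID sequences are what an embedding layer would then map to vectors.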

Channel: BookSpokify (144 subscribers)

22 views
