🔤 Tokenizing Text - Live Coding with Sebastian Raschka (Chapter 2.2)
Check out Sebastian Raschka's book 📖 Build a Large Language Model (From Scratch) | https://hubs.la/Q03l0mSf0

📖 Watch LLM Research Engineer and bestselling author @SebastianRaschka dive into one of the most critical steps in building large language models (LLMs): tokenization. This hands-on session walks through Chapter 2.2 of his Manning book "Build a Large Language Model (From Scratch)", showing exactly how to convert raw text into tokens a model can learn from.

0:00 - Introduction to Chapter Two
0:40 - Overview of LLM Building Process
1:03 - Overview of the Data Preparation
4:14 - Setting Up the Dataset
8:49 - Tokenization with Regular Expressions
12:44 - Applying Tokenization to Dataset
13:46 - Conclusion and Next Steps

📕 About the Book
Build a Large Language Model (From Scratch) is a practical and eminently satisfying hands-on journey into the foundations of generative AI. Without relying on any existing LLM libraries, you’ll code a base model, evolve it into a text classifier, and ultimately create a chatbot that can follow your conversational instructions. And you’ll really understand it because you built it yourself!

🎓 Whether you're building your own GPT-style model or just want to demystify tokenization, this chapter walkthrough is a must-watch.

🔗 Get the Book: https://hubs.la/Q03l0mSf0

📺 Subscribe for more deep learning tutorials and walkthroughs from top ML authors.

#SebastianRaschka #LLM #Tokenization #NLP #MachineLearning #DeepLearning #ManningPublications #LiveCoding #Transformers #PyTorch