This video will teach you everything there is to know about the WordPiece tokenization algorithm: how it's trained on a text corpus and how it's applied to tokenize texts.
This video is part of the Hugging Face course: http://huggingface.co/course
Related videos:
- Byte Pair Encoding Tokenization: https://youtu.be/HEikzVL-lZU
- Unigram Tokenization: https://youtu.be/TGZfZVuF9Yc
Don't have a Hugging Face account? Join now: http://huggingface.co/join
Have a question? Check out the forums: https://discuss.huggingface.co/c/course/20
Subscribe to our newsletter: https://huggingface.curated.co/