
L11.1-2: NLP and Preparing Text Data

43 views
Jun 17, 2025
29:50

In this video I begin our discussion of how we can process natural human languages using deep neural networks. Natural human languages are sequence data: the order of words in a sentence or paragraph can matter to meaning. So they share some properties with time series in that they are sequences, though not ones naturally captured at a regular time interval. Because they are sequences, things like the recurrent layers we looked at in the last section can be useful for building models of natural language.

In this video I first give some background on processing natural languages with machine learning systems. I then concentrate on the basic preprocessing steps that all natural language input must go through before it can be used in a deep neural network. This is known as text vectorization, the process of transforming text into numeric tensors suitable for training a neural network. The three basic steps all text must go through are: standardize it to make it easier to process; split the text into units, known as tokenization; and build a vocabulary index of the text so you can convert each token into a numerical vector.

Resources:
Textbook: Chollet (2022). "Deep Learning with Python (2ed)". Manning. https://www.amazon.com/dp/1617296864/?bestFormat=true&k=deep%20learning%20with%20python&ref_=nb_sb_ss_w_scx-ent-pd-bk-d_de_k0_1_15
CSci 560 Class Repository: https://github.com/csci560-nndl/nndl
Contains video slides and IPython notebooks for this course.

00:00 Introduction
00:56 Natural language processing (NLP): The bird's eye view
05:09 Text vectorization process
08:52 Text standardization, feature engineering
11:51 Tokenization, splitting text into units
13:38 Text processing model types: sequence models vs. bag-of-words models
18:15 Vocabulary indexing
21:12 Example of standardization, tokenization and vocabulary indexing by hand
24:38 Using the Keras TextVectorization layer
28:00 Summary
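
The three text vectorization steps described above (standardize, tokenize, build a vocabulary index) can be sketched by hand in plain Python. This is only a minimal illustration, not the code from the video or the class repository: the function names, the two-sentence corpus, and the choice to reserve index 0 for padding and index 1 for unknown tokens are all assumptions made for the example.

```python
import string


def standardize(text):
    """Lowercase the text and strip ASCII punctuation (one simple standardization choice)."""
    text = text.lower()
    return "".join(ch for ch in text if ch not in string.punctuation)


def tokenize(text):
    """Standardize the text, then split it into word-level tokens on whitespace."""
    return standardize(text).split()


def build_vocabulary(corpus):
    """Map each distinct token to an integer index, in order of first appearance.

    Assumption for this sketch: index 0 is reserved for padding and
    index 1 for unknown (out-of-vocabulary) tokens.
    """
    vocab = {"": 0, "[UNK]": 1}
    for text in corpus:
        for token in tokenize(text):
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def encode(text, vocab):
    """Convert a text into a list of token indices, using 1 for unknown tokens."""
    return [vocab.get(token, 1) for token in tokenize(text)]


# Hypothetical toy corpus, used only to build the vocabulary.
corpus = ["The cat sat on the mat.", "The dog ate my homework!"]
vocab = build_vocabulary(corpus)

encoded = encode("The cat ate my mat?", vocab)
print(encoded)  # [2, 3, 8, 9, 6]
```

A word that never appeared in the corpus maps to the unknown index, so `encode("A bird sat", vocab)` gives `[1, 1, 4]`. The integer sequences produced this way are what would then be turned into numeric tensors for a network.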

