In this video we discuss multi-class classification, using the softmax function to model class probabilities. We define the likelihood over the full dataset and then discuss maximum likelihood estimation of the class-specific weight vectors using gradient descent on the negative log-likelihood.
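
As a rough illustration of the pipeline described above, here is a minimal NumPy sketch; it is not the video's own code. It assumes one weight vector per class, stored as the columns of a matrix `W`, integer class labels, and plain batch gradient descent on the mean negative log-likelihood (which is equivalent to maximizing the likelihood). The function names, learning rate, and step count are hypothetical choices made for illustration.

```python
import numpy as np


def softmax(scores):
    """Row-wise softmax: turns a (n, k) matrix of scores into class probabilities."""
    # Subtracting the row maximum avoids overflow and leaves the result unchanged.
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def negative_log_likelihood(W, X, Y):
    """Negative log-likelihood of one-hot labels Y (n, k) given inputs X (n, d)."""
    probs = softmax(X @ W)
    # The small constant guards against log(0); it is an illustrative safeguard.
    return -np.sum(Y * np.log(probs + 1e-12))


def fit_softmax(X, y, num_classes, lr=0.1, steps=500):
    """Estimate class-specific weight vectors (columns of W) by gradient descent."""
    n, d = X.shape
    Y = np.eye(num_classes)[y]        # one-hot encode the integer labels
    W = np.zeros((d, num_classes))    # one weight vector per class
    for _ in range(steps):
        probs = softmax(X @ W)
        # Gradient of the mean negative log-likelihood with respect to W.
        grad = X.T @ (probs - Y) / n
        W -= lr * grad
    return W
```

Calling `fit_softmax(X, y, num_classes)` returns a `(d, num_classes)` matrix, and `softmax(X @ W).argmax(axis=1)` gives the predicted class for each row of `X`. A bias term can be included by appending a constant column of ones to `X`.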