In this video we introduce logistic regression, an important classification method in which the probability of class membership is modelled as a logistic sigmoid applied to a linear combination of the features. We discuss why this discriminative approach to classification can be more efficient than the generative approach, derive the cross-entropy loss as the negative log likelihood of the model, show that minimising this loss by gradient descent gives an update of the same form as in linear regression (which has no link function), and discuss why linearly separable data can lead to overfitting.
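
As a rough illustration of the ideas summarised above, here is a minimal sketch in plain Python. It is not taken from the video: the function names (`sigmoid`, `predict`, `cross_entropy`, `gradient_step`), the toy data, the learning rate and the number of steps are all assumptions chosen for illustration. The sketch shows the sigmoid of a linear weighting of the features, the cross-entropy loss as an averaged negative log likelihood, and a gradient step whose "prediction minus target, times feature" form matches the one used in linear regression.

```python
import math

def sigmoid(a):
    """Logistic sigmoid, written to avoid overflow for large |a|."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)

def predict(w, x):
    """Modelled probability of class 1: sigmoid of a linear weighting of the features."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))

def cross_entropy(w, X, t):
    """Average negative log likelihood of the targets under the model."""
    eps = 1e-12  # assumption: clip probabilities to keep log() finite
    total = 0.0
    for x, tn in zip(X, t):
        y = min(max(predict(w, x), eps), 1.0 - eps)
        total -= tn * math.log(y) + (1 - tn) * math.log(1.0 - y)
    return total / len(X)

def gradient_step(w, X, t, lr):
    """One gradient descent step on the averaged cross-entropy loss.

    The gradient is the average of (prediction - target) * feature, the same
    form as for linear regression, except that the prediction passes
    through the sigmoid.
    """
    grad = [0.0] * len(w)
    for x, tn in zip(X, t):
        err = predict(w, x) - tn
        for j, xj in enumerate(x):
            grad[j] += err * xj
    return [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]

# Hypothetical toy data: a constant bias feature of 1.0 plus one input.
# The classes overlap, so the data are NOT linearly separable.
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 0.5], [1.0, -0.5], [1.0, 1.0], [1.0, 2.0]]
t = [0, 0, 0, 1, 1, 1]

w = [0.0, 0.0]
loss_before = cross_entropy(w, X, t)
for _ in range(200):
    w = gradient_step(w, X, t, lr=0.5)
loss_after = cross_entropy(w, X, t)

print(loss_before, loss_after, w)
```

Because the two classes overlap in this toy set, the loss has a finite minimiser and the weights settle. If the overlapping points were removed so that the data became linearly separable, the loss could always be lowered further by scaling the weights up, which is the mechanism behind the overfitting mentioned above.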