
3.2 - The Bias-Variance Decomposition (with Code!) - Pattern Recognition and Machine Learning

Nov 13, 2024
44:47

In this section, we return to the problem of overfitting and how to choose the regularization parameter. We discuss a frequentist approach to this problem, which considers the average performance of our predictions, measured with squared loss, over multiple datasets drawn from the same distribution. This yields a decomposition of the expected loss into a sum of three terms: a squared bias term that measures how far our average prediction is from the optimal prediction (the conditional mean of the target), a variance term that measures how much our predictions vary across datasets, and an irreducible noise term. Using a simple toy example, we see that low regularization leads to models with low bias and high variance, while high regularization reduces the complexity of our models, producing high bias but low variance; the optimal regularization lies somewhere in between. At 35:38 we finish by going over code I've provided that qualitatively reproduces the figures in the section, showing how the various terms of the decomposition are computed for toy data. The repository with the code is https://github.com/stootoon/prml-bias-variance-decomposition
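To make the decomposition concrete, here is a minimal sketch of how the squared bias and variance terms might be estimated on toy data. It is not the code from the repository above. The sinusoidal target, the Gaussian basis functions, the dataset sizes, the noise level, and the names `design_matrix` and `bias_variance` are all assumptions chosen for illustration.

```python
import numpy as np

def design_matrix(x, centers, width):
    """Gaussian basis functions plus a constant column (illustrative choice)."""
    phi = np.exp(-0.5 * ((x[:, None] - centers[None, :]) / width) ** 2)
    return np.hstack([np.ones((x.size, 1)), phi])

def bias_variance(lam, n_datasets=100, n_points=25, noise_std=0.3, seed=0):
    """Estimate (bias^2, variance, noise) for ridge regression on toy data.

    All settings here are assumed values, not taken from the video's code.
    """
    rng = np.random.default_rng(seed)
    centers = np.linspace(0.0, 1.0, 24)
    width = 0.1
    x_test = np.linspace(0.0, 1.0, 200)
    h_test = np.sin(2 * np.pi * x_test)  # assumed true function h(x)
    phi_test = design_matrix(x_test, centers, width)

    preds = np.empty((n_datasets, x_test.size))
    for i in range(n_datasets):
        # Draw a fresh dataset from the same distribution.
        x = rng.uniform(0.0, 1.0, n_points)
        t = np.sin(2 * np.pi * x) + rng.normal(0.0, noise_std, n_points)
        phi = design_matrix(x, centers, width)
        # Regularized least squares: w = (lam * I + Phi^T Phi)^{-1} Phi^T t.
        # For simplicity the constant term is regularized as well.
        a = lam * np.eye(phi.shape[1]) + phi.T @ phi
        w = np.linalg.solve(a, phi.T @ t)
        preds[i] = phi_test @ w

    mean_pred = preds.mean(axis=0)                # average prediction over datasets
    bias_sq = np.mean((mean_pred - h_test) ** 2)  # squared bias, averaged over x
    variance = np.mean(preds.var(axis=0))         # variance across datasets, averaged over x
    return bias_sq, variance, noise_std ** 2

for lam in [1e-3, 1.0, 100.0]:
    b, v, n = bias_variance(lam)
    print(f"lambda={lam:g}: bias^2={b:.4f}, variance={v:.4f}, noise={n:.4f}")
```

Under these assumptions, the smallest regularization value should show low squared bias and high variance, and the largest should show the reverse, matching the trade-off described above.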

