One way to understand a machine learning model is to ask a very simple question: uf we slightly change an input feature, how does the prediction change? Instead of dividing responsibility across features, or fitting local surrogate models, gradient-based methods focus on sensitivity. They look at how responsive the model is to small changes in each feature at a specific point:
- If changing a feature slightly leads to a large change in prediction, that feature is locally influential.
- If changing it has little effect, it is locally unimportant.
The core idea is straightforward: importance can be measured by how much the prediction reacts to small changes in each feature.