* Convolutions (generally) don't learn sample-specific solutions
* Linear layers, by contrast, learn them readily
* With shared weights, gradients point to minima that must be consistent across samples and locations
* Intuition: convolutions use fewer weights, so they are biased towards simpler solutions
Here's an example of linear layers fitting to a single data point, even with L2 regularization.
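
As a rough numerical illustration (a minimal sketch, not the example from the original slide), consider a single bias-free linear layer trained with squared error plus an L2 penalty on exactly one sample. The input `x`, the target `y`, and the penalty strength `weight_decay` are hypothetical values chosen for illustration. For this objective the minimizer has a closed form, so no training loop is needed.

```python
# Objective: 0.5 * (w . x - y)^2 + 0.5 * weight_decay * ||w||^2, for ONE sample.
# Setting the gradient to zero gives w = x * y / (||x||^2 + weight_decay).

# One sample with 100 features (hypothetical, deterministic values in [-1, 1]).
x = [((i % 7) - 3) / 3 for i in range(100)]
y = 1.0
weight_decay = 0.1  # assumed L2 strength

# Closed-form minimizer of the regularized objective.
squared_norm = sum(xi * xi for xi in x)
w = [xi * y / (squared_norm + weight_decay) for xi in x]

# The layer's prediction on the sample it was fit to.
prediction = sum(wi * xi for wi, xi in zip(w, x))

# Gradient of the objective at w; it should vanish at the minimizer.
error = prediction - y
gradient = [error * xi + weight_decay * wi for xi, wi in zip(x, w)]

print(f"target={y}, prediction={prediction:.4f}")
```

With many input features, `||x||^2` is much larger than `weight_decay`, so the prediction stays very close to the target: the penalty shrinks the weights but does not stop the layer from fitting the single point.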
---
## Regularization
* What does this mean for regularization?
* Regularization tends to "improve" a model's behavior in regions of the input space that have no data points
* So we still need to regularize our convolutional networks
* And (most) convolutional networks still use linear layers for classifiers
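
To make the last point concrete, here is a small parameter-count sketch. The layer sizes are assumptions picked for illustration (a 3x3 convolution with 64 input and 64 output channels, and a linear classifier mapping a flattened 64x7x7 feature map to 1000 classes); they are not taken from any particular network.

```python
def conv2d_params(in_channels, out_channels, kernel_size):
    """Weights of a square 2D convolution plus one bias per output channel."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels


def linear_params(in_features, out_features):
    """Weights of a fully connected layer plus one bias per output feature."""
    return in_features * out_features + out_features


# Hypothetical layer sizes, for illustration only.
conv_count = conv2d_params(in_channels=64, out_channels=64, kernel_size=3)
head_count = linear_params(in_features=64 * 7 * 7, out_features=1000)

print(f"conv layer:  {conv_count:,} parameters")
print(f"linear head: {head_count:,} parameters")
print(f"ratio:       {head_count / conv_count:.1f}x")
```

Under these assumed sizes the linear classifier holds far more weights than the convolutional layer, which is one reason the classifier is a natural place to apply regularization.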
---
## Recall