* Old NNs (like LeNet) had two distinct sections
    * Convolutional layers at the beginning
        * Turn raw image pixels into semantic features
    * Linear layers at the end
        * Pull those features apart for classification
* FYI: LeNet was used to read zip codes in the 1990s
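
The two-section layout can be sketched as a shape walk: a convolutional stack shrinks the image into feature maps, then linear layers map the flattened features to classes. This is a minimal sketch, not a trainable model. The layer sizes (32x32 input, 5x5 convolutions, 2x2 pooling, 120/84/10 linear widths) are the commonly cited LeNet-5 values and are assumptions here, as are the helper names `conv_out` and `lenet_shapes`.

```python
# Shape walk through a LeNet-5-style network.
# Layer sizes are the commonly cited ones (assumed, not taken from these notes).

def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a square convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def lenet_shapes(input_size=32):
    """Return a list of (layer_name, output_shape) pairs."""
    shapes = []

    # Section 1: convolutions at the beginning (feature extractor).
    size, channels = input_size, 1
    conv_section = [
        # (name, kernel, stride, output channels)
        ("conv1", 5, 1, 6),
        ("pool1", 2, 2, 6),
        ("conv2", 5, 1, 16),
        ("pool2", 2, 2, 16),
    ]
    for name, kernel, stride, out_channels in conv_section:
        size = conv_out(size, kernel, stride)
        channels = out_channels
        shapes.append((name, (channels, size, size)))

    # Section 2: linear layers at the end (classifier).
    features = channels * size * size
    shapes.append(("flatten", (features,)))
    for name, width in [("fc1", 120), ("fc2", 84), ("out", 10)]:
        shapes.append((name, (width,)))

    return shapes


if __name__ == "__main__":
    for name, shape in lenet_shapes():
        print(f"{name:8s} {shape}")
```

The split is visible in the shapes: everything before `flatten` is a 3-D feature map, everything after it is a flat vector ending in one score per class.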
---
## LeNet-5 (1998)
[Earlier 1989 NeurIPS paper](https://proceedings.neurips.cc/paper/1989/hash/53c3bce66e43be4f209556518c2fcb54-Abstract.html)