* This is the residual block, as first described
* Preserves original input
* So even if one layer learns nothing, the next layer still receives the original input and can learn from it
* Layers are only learning *diffs* to apply, not entirely new feature maps
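
A minimal sketch of the idea above, in plain Python rather than a deep learning framework. The names `residual_block` and `zero_layer` are made up for illustration; `f` stands in for whatever stack of layers sits inside the block.

```python
def residual_block(x, f):
    """Return x + f(x): the input plus the diff computed by the inner layers."""
    return [xi + fi for xi, fi in zip(x, f(x))]


def zero_layer(x):
    """Hypothetical inner layers that have learned nothing (output all zeros)."""
    return [0.0 for _ in x]


x = [1.0, -2.0, 3.0]
y = residual_block(x, zero_layer)  # the skip connection passes x through unchanged
```

With a do-nothing `f`, the block reduces to the identity, so later layers still see the original input.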
---
## Details
* When the skip goes over an increase in feature maps, use 1x1 convolution to add dimensions
* Or save parameters and use an identity, zero-padding the extra feature maps
* When the skip goes over a reduction in feature map size, give the skip the same stride as the reduction
* e.g. stride 2 to halve the feature map's height and width
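
The two shortcut options can be sketched in NumPy as follows. This is an illustration under assumptions, not a framework implementation: the function names are hypothetical, tensors are laid out as `(channels, height, width)`, and the strided 1x1 convolution is written as subsampling followed by a per-pixel channel mix, which is equivalent for a 1x1 kernel with no padding.

```python
import numpy as np


def projection_shortcut(x, w, stride=1):
    """1x1 convolution shortcut: x is (C_in, H, W), w is (C_out, C_in)."""
    x = x[:, ::stride, ::stride]            # stride picks every stride-th pixel
    return np.einsum("oc,chw->ohw", w, x)   # mix channels at each pixel


def identity_shortcut(x, c_out, stride=1):
    """Parameter-free shortcut: subsample, then zero-pad the extra channels."""
    x = x[:, ::stride, ::stride]
    pad = c_out - x.shape[0]
    return np.pad(x, ((0, pad), (0, 0), (0, 0)))


rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8, 8))   # 16 feature maps of size 8x8
w = rng.standard_normal((32, 16))     # assumed 1x1 kernel: 16 -> 32 channels

proj = projection_shortcut(x, w, stride=2)
ident = identity_shortcut(x, c_out=32, stride=2)
```

Both shortcuts produce a `(32, 4, 4)` tensor here, so either can be added to the output of a block that doubles the feature maps and halves their size.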
---
## Unravelled