# CS 462 - Lecture 01

## Introduction to Deep Learning

Bernhard Firner

2026-01-20

---

## Course Details

* My email: `bfirner@cs.rutgers.edu`
* Canvas
* Office: Hill 273
* Office hours: TBD

---

## Syllabus: Book

* Understanding Deep Learning by Simon J. D. Prince
  * [udlbook.com](https://udlbook.com/)
* We will stay close to the text so that you have an easy reference
* We will also cover a few readable and interesting academic papers

---

## Syllabus: Tools

* Python and PyTorch
  * [https://pytorch.org/get-started/locally/](https://pytorch.org/get-started/locally/)
  * [https://docs.pytorch.org/docs/stable/index.html](https://docs.pytorch.org/docs/stable/index.html)

---

## Resources

* All examples will run on iLab machines (on CPU, though this may be slow)
  * [https://resources.cs.rutgers.edu/docs/instructional-lab/](https://resources.cs.rutgers.edu/docs/instructional-lab/)
  * [https://resources.cs.rutgers.edu/docs/using-python-on-cs-linux-machines/](https://resources.cs.rutgers.edu/docs/using-python-on-cs-linux-machines/)
* Slurm will allow GPU access
  * [https://resources.cs.rutgers.edu/docs/scheduler-for-gpu-jobs/](https://resources.cs.rutgers.edu/docs/scheduler-for-gpu-jobs/)
  * [https://resources.cs.rutgers.edu/docs/limitation-enforced-on-cs-linux-machines/](https://resources.cs.rutgers.edu/docs/limitation-enforced-on-cs-linux-machines/)

---

## Structure

* Lectures
  * Introductions to concepts
  * Mostly theoretical
* Recitations
  * Starting with a crash course in Python
  * Will usually go over example code and sample questions

---

## Syllabus: Grading

* 30% Final
* 20% Midterm
* 10% Recitation Assignments
* 40% Homeworks

---

## Academic Integrity

* [CS Academic Integrity Policy](https://www.cs.rutgers.edu/academics/undergraduate/academic-integrity-policy)
* Academic Integrity applies to both exams and assignments
* Violating the integrity policy negatively impacts your classmates, and current and future Rutgers students
  * So enforcement will be strict

---

## Assistance on Assignments

* Your work must be your own
  * But you may ask for advice from other students, on Canvas, in recitation, and in office hours
    * And you can use online material, including LLMs, to clarify any confusion
* Afraid of being punished just because you looked something up?
  * Don't be!

---

## Documenting Work

* Note any assistance in a comment at the top of the file
  * I'll give you a format to follow
  * If you cross over the line from "assistance" to "copying" but have noted your sources, then I can assume no ill intent
    * At least the first time it happens
  * If you note *no* references but hand in a copy of another student's work, I will assume ill intent

---

## Absences & Late assignments

* Use the self-reporting tool
  * [https://sims.rutgers.edu/ssra/](https://sims.rutgers.edu/ssra/)
* Late assignments
  * 10% per day, down to 50%
  * Accept assignments up to 7 days late
  * Exceptions for major illnesses
* Missed exams
  * You must have a valid excuse to schedule a make up exam

---

## Exceptions

* Inform me ahead of time of special circumstances
  * Meaning non-emergency situations
  * Do not tell me the day of an exam that you have a club activity or that you've run off to Cancun
* But if you have an emergency, please don't stop to email me
  * You can self-report after the fact

---

## Appropriate Conversations

* I can help with problems related to this class
* Can probably help with professional questions
* I'm not a psychologist. I probably can't help with other issues.
  * I am also required to report some topics, so don't assume everything you tell me will remain private

---

## What is Deep Learning?

* Depending upon your definition, it's either been around since the 90s or is more recent
* Nowadays, we generally mean a neural network with multiple layers
* Some of the first profitable uses were for automated check cashing
  * That was in the 90s

---

## AI Winter

* There was little visible progress from the 90s to the 2010s
  * That's not to say that people weren't making progress, but they garnered little attention
* So what happened?
  * Progress was harder in the 90s, and people were disappointed
  * Datasets are larger and easier to manage now
  * And hardware speed and parallelism have also vastly improved

---

## Deep Learning, 90s View


---

## Deep Learning, Today


---

## Reality

* In reality, deep learning is mostly about data
  * How can we get data?
  * How can we get labels?
    * Which labels are the best ones?
  * What tricks let us use fewer labels or less data?
* Training the neural network is straightforward compared to the data science part

---

## Course Outline

* This course is a shallow look at deep learning
* Notations
* Fundamental concepts to build up an intuition
  * First $\frac{1}{2}$ of the course
* A few applications and specific networks
  * Second $\frac{1}{2}$ of the course

---


## Prerequisite Knowledge

* This course does not have many prerequisites
* However, there are some key concepts that you should understand
  * Derivatives and maximization/minimization
  * Linear algebra and matrices
* There are also some things that will make the course content easier
  * Linear regression and other ML fundamentals

---

## Uses of Deep Learning

* Deep learning can be used for just about anything
* Generally, most outputs are one of:
  * Regression
  * Classification
  * Generation

---

## Regression

* Regression just means that we create a **model** with a continuous output
* Examples
  * Predict someone's height or weight
  * Predict the image coordinates of a bounding box for an object
  * Predict real world coordinates of a path
  * Assign a score to a suggested action in a game

---

## Classification

* Classification gives scores to different possible classes
* Examples
  * Classify the letter in an image
  * Classify the object in an image
  * Classify if an action is something that a human would do
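A minimal sketch of turning class scores into a prediction. The class names and scores are made up for illustration, and softmax is only one common way to normalize scores into probabilities (an assumption here, not something these slides have introduced yet).

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for three letter classes.
classes = ["a", "b", "c"]
scores = [2.0, 0.5, -1.0]

probs = softmax(scores)
# The predicted class is the one with the highest probability.
predicted = classes[probs.index(max(probs))]
print(predicted, [round(p, 3) for p in probs])
```

The model outputs one score per class, and the prediction is simply the class with the largest score.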

---

## Supervised Learning

* Those were all examples of supervised learning
  * Meaning that we have target labels for the model to learn
    * Either values for regression or class labels for classification
* But the world has far more unlabelled data than (correctly) labelled data
  * So how can we use all of that?

---

## Unsupervised Learning

* Unsupervised tasks don't use labels
* Clustering is an example
  * K-means, etc (you've heard of this somewhere, right?)
* For neural networks, this is usually training a model for a generative task
  * Fill in the missing words/pixels, match an image to a caption, etc
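For reference, a rough sketch of K-means in one dimension. The points and starting centers are made up, and the function name `kmeans_1d` is hypothetical. Note that no labels are used anywhere.

```python
def kmeans_1d(points, centers, steps=10):
    """Alternate two steps: assign each point to its nearest center,
    then move each center to the mean of its assigned points."""
    centers = list(centers)
    for _ in range(steps):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            groups[nearest].append(p)
        # A center with no assigned points stays where it is.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers

# Made-up data with two obvious groups, near 1 and near 5.
points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
centers = kmeans_1d(points, centers=[0.0, 6.0])
print(centers)
```

The centers settle near the middle of each group, which is the "structure" the algorithm found without being told what the groups mean.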

---

## Practical Experience

* We will spend more time on supervised learning
* In the real world, systems tend to be a mashup of unsupervised and supervised techniques, ML and deep learning
* Building such mixed systems is beyond the scope of this class
  * But some homeworks will try to give a small insight into working with data

---

## Data

* If you try to make a deep learning model and wind up disappointed, it is usually because of data
  * You don't have enough data
  * You didn't want to pay for the right data
    * Collection, transfer, storage, labelling, etc
  * The data you need is unobtainable
    * e.g. high fidelity 3D models of everything

---

## Progress Through Data

* Many advances are related to data
  * New ways to use existing data
  * New ways to generate data
  * New ways to collect data
* So let's spend a few moments reviewing datasets

---

## Data Qualities

* Sample Vs Population Statistics
  * Variance
  * Bias
  * Noise
* Separability

---

## Bias

* Imagine you are collecting data from a dashcam
  * If you collect a frame once every 5 miles, you will mostly end up with highway images
  * If you collect a frame once every minute, you may find a lot of highways and traffic lights
* The way that data is collected often leads to **bias**
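A small sketch of the dashcam example. The trip below (miles and average speeds) is invented for illustration, and each road segment is sampled independently to keep the arithmetic simple.

```python
# Hypothetical trip: (road type, miles driven, average speed in mph).
trip = [("highway", 60.0, 60.0), ("city", 10.0, 20.0)]

def frames_by_distance(trip, miles_per_frame):
    """Frames per road type when one frame is saved every `miles_per_frame` miles."""
    return {road: int(miles // miles_per_frame) for road, miles, _ in trip}

def frames_by_time(trip, minutes_per_frame):
    """Frames per road type when one frame is saved every `minutes_per_frame` minutes."""
    return {
        road: int((miles / mph * 60.0) // minutes_per_frame)
        for road, miles, mph in trip
    }

def highway_share(frames):
    """Fraction of all saved frames that show the highway."""
    return frames["highway"] / sum(frames.values())

by_distance = frames_by_distance(trip, miles_per_frame=5.0)
by_time = frames_by_time(trip, minutes_per_frame=1.0)
print(by_distance, round(highway_share(by_distance), 2))
print(by_time, round(highway_share(by_time), 2))
```

With these made-up numbers, the same drive is about 86% highway frames when sampled by distance and about 67% when sampled by time. Neither dataset is wrong, but each one bakes in a different balance.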

---

## Bias Vs Variance

* Don't confuse bias and variance in a dataset
* Let's take handwritten digits as an example
  * Stroke width, stroke angles, or character size are examples of variance
  * A dataset that is missing 7s written with the European line through the center is an example of bias

---

## Overfitting

* When we say that our trained model is **overfitting**, we usually mean that it is fitting our training data, but not the real world
  * Why?
  * Training data is a **sample** of the entire data population
  * It is nearly always biased, so strong **regularization** is required
* Regularization will get lots of attention throughout this course
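A toy sketch of overfitting, under made-up assumptions: the "population" is $y = 2x$ plus noise, and the model is a hypothetical `memorizer` that just looks up the closest training point. It fits the training sample perfectly, but not new samples from the same population.

```python
import random

rng = random.Random(0)

def sample(n):
    """Draw n points from a made-up population: y = 2x plus Gaussian noise."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    return [(x, 2.0 * x + rng.gauss(0.0, 1.0)) for x in xs]

train, test = sample(20), sample(200)

def memorizer(x):
    """Predict the y of the closest training x (this memorizes the training set)."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def mse(predict, data):
    """Mean squared error of a prediction function on a list of (x, y) pairs."""
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)

train_error = mse(memorizer, train)
test_error = mse(memorizer, test)
print("train:", train_error, "test:", test_error)
```

The training error is exactly zero because every training point is its own nearest neighbor, while the test error is not. The gap between the two is what we mean by overfitting.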

---

## Testing Set

* We train on a training set of data, and test on a testing set
* The testing set is generally smaller than the training set
  * It should be as reflective of true population statistics as possible
  * Capturing all of the variance is impossible, but the statistics should be representative
  * Most importantly, it should avoid bias
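One simple way to build the two sets is a shuffled split, sketched below. The 80/20 ratio and the name `train_test_split` are assumptions for illustration. A random split keeps the two sets from overlapping, but it cannot remove bias that was already introduced when the data was collected.

```python
import random

def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the samples, then hold out a fraction for testing."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    # The first n_test shuffled samples become the testing set.
    return shuffled[n_test:], shuffled[:n_test]

samples = list(range(100))  # stand-in for 100 data samples
train, test = train_test_split(samples)
print(len(train), len(test))
```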

---

## Noise

* How are bias and variance different from noise?
* Noise means things that aren't in the original signal
  * Maybe there is grease on a camera lens
    * Always there, but different in different pictures
  * Or perhaps some bounding boxes are always wrong by a few pixels because labelling accurately on a track pad is too hard

---

## Noise Vs Variance

* Consider a training set of handwritten digits
  * If strokes are different because some pens have more ink than others, we could call that variance
  * If strokes are different because someone leaned up against the writing, that is noise
    * Meaning it is not reflective of anything in the true data

<div class="col">
<img style="width: 100%" class="r-stretch" src="./figures/Larry_OriginalDigitsCropped.png" />

</div>

---

## Gaussian Vs Non-Gaussian Noise

* Zero-mean Gaussian noise can almost always be ignored
  * Why? It is unbiased. Any algorithm that estimates the mean of the observations will average the noise away.
    * Example: a bad trackpad that leads to bounding boxes off by up to 5 pixels in any direction
* Biased noise cannot be ignored
  * Example: a bad user interface that leads to bounding boxes off by 5 pixels to the left
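A quick sketch of why this matters, with made-up numbers: a true bounding box edge at x = 100 pixels, zero-mean Gaussian labelling noise with a 2 pixel standard deviation, and a second set of labels with the same noise plus a constant 5 pixel shift to the left.

```python
import random
import statistics

rng = random.Random(0)
true_left_edge = 100.0  # hypothetical true x-coordinate of a box edge, in pixels

# Labels with zero-mean Gaussian noise (sloppy but unbiased labelling).
unbiased = [true_left_edge + rng.gauss(0.0, 2.0) for _ in range(10_000)]
# The same labels with a constant 5 pixel shift to the left (biased noise).
biased = [x - 5.0 for x in unbiased]

mean_unbiased = statistics.fmean(unbiased)
mean_biased = statistics.fmean(biased)
print(round(mean_unbiased, 2), round(mean_biased, 2))
```

Averaging recovers the true edge from the unbiased labels, but the biased labels average to a value about 5 pixels off no matter how many we collect.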

---

## Separability

* Let's say we are classifying between class A and class B
  * Can we draw a line or hyperplane to divide them perfectly?
  * If yes, the data is linearly separable
  * If not, the data is not linearly separable
* This concept will show up many times, so scream if you don't understand it

<div class="col">
<img style="width: 80%" class="r-stretch" src="./figures/separable_classes.png" />

</div>
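One way to probe separability in code is the classic perceptron update rule, sketched here on four made-up 2D points. This is an illustration, not a method from these slides. If the data is linearly separable, the rule eventually finds a dividing line. If it is not, the rule never stops making mistakes, so `False` only means "no line found within `epochs` passes".

```python
def perceptron_separates(points, labels, epochs=100):
    """Return True if the perceptron rule finds a line that classifies
    every point correctly within `epochs` passes over the data."""
    w0, w1, b = 0.0, 0.0, 0.0
    for _ in range(epochs):
        mistakes = 0
        for (x0, x1), y in zip(points, labels):  # y is +1 or -1
            if y * (w0 * x0 + w1 * x1 + b) <= 0:  # wrong side of (or on) the line
                w0 += y * x0
                w1 += y * x1
                b += y
                mistakes += 1
        if mistakes == 0:
            return True
    return False

corners = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_labels = [-1, -1, -1, 1]  # one corner against three: separable
xor_labels = [-1, 1, 1, -1]   # opposite corners share a class: not separable

print(perceptron_separates(corners, and_labels))
print(perceptron_separates(corners, xor_labels))
```

The first labelling can be split with a single line, while no line can put opposite corners of a square on the same side.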

---

## Optional Background

* `The Two Cultures` by Leo Breiman
  * [Paper in statistical science](https://projecteuclid.org/journals/statistical-science/volume-16/issue-3/Statistical-Modeling--The-Two-Cultures-with-comments-and-a/10.1214/ss/1009213726.full)
* Neural networks are robust to overfitting
* This allows us to throw a huge model at problems
  * Without fully grasping the problem space
* This can make them *unbiased* learners, free from human assumptions

---

## DL Parts

* Model
* Loss function
* Training algorithm
* Regularization
  * Often incorporated into the model and the loss function
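A rough sketch showing all four parts in a few lines. The choices here are assumptions for illustration only: a straight-line model, mean squared error as the loss, plain gradient descent as the training algorithm, and an L2 penalty on the slope (`weight_decay`, a hypothetical name) as the regularization.

```python
def model(x, phi):
    """Model: a line with intercept phi[0] and slope phi[1]."""
    return phi[0] + phi[1] * x

def loss(phi, xs, ys, weight_decay):
    """Loss function: mean squared error plus an L2 penalty on the slope."""
    mse = sum((model(x, phi) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    return mse + weight_decay * phi[1] ** 2

def train(xs, ys, weight_decay=0.0, lr=0.05, steps=2000):
    """Training algorithm: gradient descent on the loss above."""
    phi = [0.0, 0.0]
    n = len(xs)
    for _ in range(steps):
        errors = [model(x, phi) - y for x, y in zip(xs, ys)]
        grad_0 = 2 * sum(errors) / n
        grad_1 = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_1 += 2 * weight_decay * phi[1]  # gradient of the regularization term
        phi = [phi[0] - lr * grad_0, phi[1] - lr * grad_1]
    return phi

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # made-up data lying exactly on y = 1 + 2x

plain = train(xs, ys)
regularized = train(xs, ys, weight_decay=1.0)
print(plain, loss(plain, xs, ys, 0.0))
print(regularized)
```

Without the penalty, training recovers the line that generated the data. With the penalty, the slope is pulled toward zero, which is the kind of trade-off regularization makes.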

---

## Building Blocks

* The simplest building block of a neural network is a line
  * $y = \phi_0 + \phi_1x$
  * This is called a **neuron**
  * $\phi$ is the set of all model parameters (here, $\phi_0$ and $\phi_1$)
* Training a single neuron is the same as solving linear regression
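To make the connection to linear regression concrete, here is a minimal sketch that fits $\phi_0$ and $\phi_1$ with the standard closed-form least-squares solution. The data points and the name `fit_line` are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = phi_0 + phi_1 * x. Returns (phi_0, phi_1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    phi_1 = cov_xy / var_x
    # Intercept: the fitted line passes through the point of means.
    phi_0 = mean_y - phi_1 * mean_x
    return phi_0, phi_1

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # made-up data lying exactly on y = 1 + 2x

phi_0, phi_1 = fit_line(xs, ys)
print(phi_0, phi_1)
```

For this data the fit recovers an intercept of 1 and a slope of 2, the same parameters a single trained neuron would converge to.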

---

## Multiple Neurons

* Any neural network beyond a toy example has multiple neurons
* There are other methods that ensemble linear regression
  * AdaBoost, for example
  * But neural network structures have more expressive power

---

## Next Topics

* We'll begin talking about actual neural networks next time
  * Start off with linear regression
* We'll begin by building up our notation and then work up to a real network in a few lectures

---

## Your TODOs

* Get the book
* Didn't take 461? Never heard of regression?
  * It's a good idea to look that up
* First recitation will be a brief Python tutorial
  * Will also cover some matrix math concepts in PyTorch
* If you have trouble picking up new programming languages, it will not hurt to go through an online tutorial