hw03

CS 462 Homework 3

This is a “hello world” for convolutional neural networks. You will load some numbers, put them into matrices, and perform the 2D convolution, as per PyTorch’s definition.

The model you will be working with is a small convolutional network:

net = torch.nn.Sequential(
        torch.nn.Conv2d(in_channels=1, out_channels=12, kernel_size=(3,3), padding=0, stride=2),
        # 28x28 input becomes 13x13
        torch.nn.ReLU(),
        torch.nn.Conv2d(in_channels=12, out_channels=24, kernel_size=(3,3), padding=0, stride=2),
        # 13x13 input becomes 6x6
        torch.nn.ReLU(),
        torch.nn.Conv2d(in_channels=24, out_channels=48, kernel_size=(3,3), padding=0, stride=2),
        # 6x6 input becomes 2x2
        torch.nn.ReLU(),
        torch.nn.Flatten(),
        torch.nn.Linear(2*2*48, 60),
        torch.nn.Linear(60, 10),
        )
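
The sizes in the comments above follow from the output-size rule for a convolution without dilation: floor((n + 2*padding - kernel) / stride) + 1. A minimal sketch that checks the three layers; the helper name conv_out_size is made up for this illustration and is not part of PyTorch:

```python
def conv_out_size(n, kernel=3, stride=2, padding=0):
    """Output size along one spatial dimension of a convolution (no dilation)."""
    return (n + 2 * padding - kernel) // stride + 1

size = 28
sizes = [size]
for _ in range(3):  # three Conv2d layers, each with kernel 3, stride 2, padding 0
    size = conv_out_size(size)
    sizes.append(size)

print(sizes)  # [28, 13, 6, 2]
```

The final 2x2 map with 48 channels is what gives the first linear layer its 2*2*48 inputs.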

You will not use the Conv2d class in this assignment. You are being forced to multiply values yourself, to ensure that you understand the operation taking place.

An archive (in tgz format) is provided. It has the following contents:

weight0.pt, weight1.pt, weight2.pt: These are the weights of the convolutional layers.
weight3.pt, weight4.pt: These are the weights of the linear layers.
bias0.pt, bias1.pt, bias2.pt: These are the bias values of the convolutional layers.
bias3.pt, bias4.pt: These are the bias values of the linear layers.
example_test_digit_0.png through example_test_digit_19.png
validate_test_digit_0.png through validate_test_digit_9.png

The sizes of the weights may be confusing. The first convolution has weights with size (12, 1, 3, 3). The second convolution has weights with size (24, 12, 3, 3). The first dimension is the number of kernels in the convolution (the output channels), the second is the number of input channels, and the last two are the kernel height and width. Each kernel in the second convolution is multiplied elementwise by a 12 by 3 by 3 region of the input feature map, and the products are summed to give one output value.
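
To make the indexing concrete, here is a rough sketch of one way to apply a weight tensor of that shape with explicit loops. It assumes a single unbatched input of shape (channels, height, width), no padding, and stride 2 as in the model above. The function name conv2d_manual and the random tensors are placeholders for illustration, not the provided weights. Note that PyTorch's convolution does not flip the kernel (it is technically a cross-correlation), so the patch and kernel are multiplied position by position.

```python
import torch

def conv2d_manual(x, weight, bias, stride=2):
    """Convolve x (C_in, H, W) with weight (C_out, C_in, kH, kW); no padding."""
    c_out, c_in, kh, kw = weight.shape
    h_out = (x.shape[1] - kh) // stride + 1
    w_out = (x.shape[2] - kw) // stride + 1
    out = torch.zeros(c_out, h_out, w_out)
    for k in range(c_out):                  # one output channel per kernel
        for i in range(h_out):
            for j in range(w_out):
                r, c = i * stride, j * stride
                patch = x[:, r:r + kh, c:c + kw]   # (C_in, kH, kW) region
                out[k, i, j] = (patch * weight[k]).sum() + bias[k]
    return out

# Random stand-ins with the shapes of the second convolution (not the real weights).
torch.manual_seed(0)
x = torch.rand(12, 13, 13)
w = torch.rand(24, 12, 3, 3)
b = torch.rand(24)
y = conv2d_manual(x, w, b)
print(y.shape)  # torch.Size([24, 6, 6])
```

The loops are written for readability rather than speed. The ReLU that follows each convolution in the model is not part of this function.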

Load the weights as in homework 1.

Your program will accept a single argument, which specifies an image file to load. Load it with the PIL Image class.

from PIL import Image
X = torch.tensor(numpy.array(Image.open(sys.argv[1]))/255).view(1, 28, 28).float()

The arrays were saved with torch.save, so they can be loaded with torch.load. When your program is invoked, you may assume that the .pt files are in the same directory as your program (for your own testing, just extract the provided archive into that directory).
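
Homework 1 is not reproduced here, so the following is only one possible way to read the ten files, assuming they sit in the given directory under the names listed above. The helper name load_parameters is invented for this sketch.

```python
import os
import torch

def load_parameters(directory="."):
    """Load weight0..4.pt and bias0..4.pt from `directory` into two lists."""
    weights = [torch.load(os.path.join(directory, f"weight{i}.pt")) for i in range(5)]
    biases = [torch.load(os.path.join(directory, f"bias{i}.pt")) for i in range(5)]
    return weights, biases
```

Called as load_parameters() from the directory holding the extracted archive, weights[0] should then have the size (12, 1, 3, 3) described above.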

Desired Output

Your program will print the prediction and confidence, as follows:

print(f"Prediction is {prediction} with confidence {confidence:3f}")

The neural network output is turned into a prediction with the softmax function. The index of the maximum value is the prediction. Confidence is the softmax output corresponding to the predicted class.

decision = torch.nn.Softmax(dim=0)
y_hat = decision(out)
prediction = torch.argmax(y_hat)
confidence = y_hat[prediction]
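
For reference, softmax exponentiates each output and divides by the sum, so the results are positive and add up to 1. The sketch below computes it by hand on a made-up 10-element output vector (the numbers are invented, not produced by the model) and prints the result in the required format.

```python
import torch

# Made-up 10-element network output, for illustration only.
out = torch.tensor([0.5, -1.0, 2.0, 0.0, 6.0, 1.0, -2.0, 0.3, 0.1, -0.5])

# Softmax by hand: exponentiate, then normalize so the values sum to 1.
exp_out = torch.exp(out - out.max())   # subtracting the max avoids overflow
y_hat = exp_out / exp_out.sum()

prediction = torch.argmax(y_hat).item()   # index of the largest probability
confidence = y_hat[prediction].item()     # probability of the predicted class
print(f"Prediction is {prediction} with confidence {confidence:3f}")
```

Here the largest input is at index 4, so the prediction is 4. Subtracting the maximum before exponentiating does not change the result, because the common factor cancels in the division.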

Deliverables

Name your program “hw03.py” and submit it through Canvas.

Author Block

Your code must contain an author block at the top of the file. The author block must match the following format:

# Author: your name
# Netid: your netid
# Aid: If in doubt, list anything you referenced: a Stack Overflow post, ChatGPT, Gemini, your friend Bob, etc. If referring to an LLM, you must include a link to the particular session. Just saying "I asked ChatGPT" isn't a get-out-of-trouble-free statement. If two submissions are nearly identical and nothing is listed, I will have to assume that there was copying with an attempt at obfuscation.

Examples:

$ python3 hw03.py example_test_digit_4.png 
Prediction is 4 with confidence 0.999797
$ python3 hw03.py example_test_digit_10.png 
Prediction is 0 with confidence 0.999800
$ for I in `seq 0 9`; do python3 hw03.py validate_test_digit_${I}.png; done
Prediction is 5 with confidence 0.410526
Prediction is 0 with confidence 0.999762
Prediction is 4 with confidence 0.998778
Prediction is 1 with confidence 0.995912
Prediction is 9 with confidence 0.998438
Prediction is 9 with confidence 0.903832
Prediction is 1 with confidence 0.999699
Prediction is 9 with confidence 0.997828
Prediction is 1 with confidence 0.999607
Prediction is 9 with confidence 0.988948

Validation class labels are actually [5, 0, 4, 1, 9, 2, 1, 3, 1, 4], so this model (trained with only 20 samples) has a bias to predict 9. Notice that high confidence does not mean the prediction is correct: the three misclassified digits all have confidence above 0.90. This is typical.
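
The comparison can be tallied directly from the output and labels above. This is a small bookkeeping sketch and is not part of the required program.

```python
labels      = [5, 0, 4, 1, 9, 2, 1, 3, 1, 4]
predictions = [5, 0, 4, 1, 9, 9, 1, 9, 1, 9]
confidences = [0.410526, 0.999762, 0.998778, 0.995912, 0.998438,
               0.903832, 0.999699, 0.997828, 0.999607, 0.988948]

# Indices where the predicted class differs from the true label.
wrong = [i for i, (y, p) in enumerate(zip(labels, predictions)) if y != p]
accuracy = 1 - len(wrong) / len(labels)

print(f"accuracy = {accuracy:.1f}")   # accuracy = 0.7
for i in wrong:
    print(i, labels[i], predictions[i], confidences[i])
```

Files 5, 7, and 9 are misclassified as 9 with confidences of 0.903832, 0.997828, and 0.988948, while the correctly classified file 0 has the lowest confidence of the set, 0.410526.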

Grading Script

Due to the class size, grading will be automated whenever possible. In this case, you can run the grade_hw3.sh script to verify that your program produces the expected outputs. As with the first homework, your code will likely either work or fail to work, but do not forget the comments with your name and netid.