CS480/680 Fall 2020 - Introduction to Machine Learning

There will be six assignments, each worth 10% of the final mark (7% for CS680). Assignments are done individually (i.e., no teams). The assignments will consist of a mixture of theoretical questions and programming questions. Some assignments may make use of TensorFlow or PyTorch. For GPU and TPU acceleration, feel free to use Google's Colaboratory environment. This is a free cloud service where you can run Python code (including TensorFlow and PyTorch, which are pre-installed) with GPU or TPU acceleration. A virtual machine with two CPUs and one GPU or TPU will run for up to 12 hours, after which it must be restarted. The following steps are recommended:

Create a Python notebook in Google Colab
Click on "edit", then "notebook settings" and select "None" (CPU), "GPU" or "TPU" for hardware acceleration.

The approximate out and due dates are:

A1: out Sept 18, due Sept 29 (11:59 pm)
A2: out Sept 29, due Oct 6 (11:59 pm)
A3: out Oct 6, due Oct 19 (11:59 pm)
A4: out Oct 19, due Oct 29 (11:59 pm)
A5: out Oct 29, due Nov 9 (11:59 pm)
A6: out Nov 9, due Nov 20 (11:59 pm)

On the due date of an assignment, the work done to date should be submitted electronically on the LEARN website; further material may be submitted with a 2% penalty for every rounded up hour past the deadline. For example, an assignment submitted 5 hours and 15 min late will receive a penalty of ceiling(5.25) * 2% = 12%. Assignments submitted more than 50 hours late will not be marked.
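The late penalty rule above can be checked with a short calculation. This is only a sketch of the stated rule; the function name late_penalty_percent is hypothetical, and returning None for submissions that are not marked is an illustrative choice.

```python
import math


def late_penalty_percent(hours_late):
    """Return the penalty in percentage points, or None if the work is not marked.

    Rule from the text: 2% for every rounded-up hour past the deadline,
    and nothing is marked after more than 50 hours.
    """
    if hours_late <= 0:
        return 0  # submitted on time
    if hours_late > 50:
        return None  # too late to be marked
    return math.ceil(hours_late) * 2


print(late_penalty_percent(5.25))  # 12
```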

Assignment 1: due Sept 29 (11:59 pm)

In this assignment, you will implement k-nearest neighbours and linear regression. Then you will test your implementations on some small datasets.

Step 1: Download the datasets

Dataset for K-nearest neighbours: knn-dataset.zip

Origin: this data is a modified version of the Optical Recognition of Handwritten Digits Dataset from the UCI repository. It contains pre-processed black and white images of the digits 5 and 6. Each feature indicates how many pixels are black in a patch of 4 x 4 pixels.
Format: there is one row per image and one column per feature. The class labels are 5 and 6. The label on line n in train_labels.csv is the label for the data point on line n in train_inputs.csv.

Dataset for linear regression: regression-dataset.zip (Dataset corrected Sept 19, 4pm: if you downloaded the dataset before this time, re-download it.)

Origin: this data consists of samples from a 2D surface that you can plot to visualize how linear regression is working.
Format: there is one row per data point and one column per feature. The targets are real values. The target on line n in train_targets.csv is the target for the data point on line n in train_inputs.csv.

Step 2: Implement k-nearest neighbours and linear regression by filling in the functions in the skeleton code. The skeleton code consists of Python Jupyter notebooks:

K-nearest neighbour skeleton: cs480_fall20_asst1_knn_skeleton.ipynb
Linear regression skeleton: cs480_fall20_asst1_linear_regression_skeleton.ipynb
Do not import any additional library. Feel free to run the Jupyter notebooks on any machine or Google Colab. Google Colab is a free cloud environment provided by Google that allows you to run Jupyter notebooks very easily. Python and all necessary libraries are already installed.

Step 3: Submit your assignment via LEARN.

Once you are done filling in all the functions, run each Jupyter notebook entirely and save the results. Make sure that the following results are saved:

K-nearest neighbour results:

A graph that shows the average accuracy based on 10-fold cross validation when varying the number of neighbours from 1 to 30.
The best number of neighbours found by 10-fold cross validation and its cross-validation accuracy.
The test accuracy based on the best number of neighbours
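As a rough illustration of 10-fold cross validation, the sketch below splits the data into folds and averages a validation score over them. It is not the skeleton code: the names k_fold_indices, cross_validation_accuracy and the train_and_score callback are hypothetical, and the folds here are contiguous blocks, which may differ from how the skeleton assigns points to folds.

```python
import numpy as np


def k_fold_indices(n_points, n_folds=10):
    """Split indices 0..n_points-1 into n_folds contiguous, nearly equal folds."""
    return np.array_split(np.arange(n_points), n_folds)


def cross_validation_accuracy(inputs, labels, n_folds, train_and_score):
    """Average validation accuracy over all folds.

    train_and_score is a hypothetical callback with signature
    (train_x, train_y, val_x, val_y) -> accuracy in [0, 1].
    """
    folds = k_fold_indices(len(inputs), n_folds)
    scores = []
    for i, val_idx in enumerate(folds):
        # Train on every fold except fold i, validate on fold i.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        scores.append(train_and_score(inputs[train_idx], labels[train_idx],
                                      inputs[val_idx], labels[val_idx]))
    return float(np.mean(scores))
```

Calling cross_validation_accuracy once per candidate number of neighbours (1 to 30) gives the values needed for the graph.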

Linear regression results:

A graph that shows the average mean squared error based on 10-fold cross validation when varying the lambda hyperparameter from 0 to 3 in increments of 0.1.
The best lambda found by 10-fold cross validation and its cross validation mean squared error.
The test mean squared error based on the best lambda.
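The following is a minimal sketch of regularized linear regression, under the assumption that lambda is the weight of a squared L2 penalty on the weights (the skeleton may define the objective differently). The function names are hypothetical, and for simplicity the bias weight is regularized along with the others.

```python
import numpy as np


def add_bias_column(inputs):
    """Prepend a column of ones so that the first weight acts as the bias."""
    return np.hstack([np.ones((inputs.shape[0], 1)), inputs])


def fit_ridge(inputs, targets, lam):
    """Closed-form solution w = (X^T X + lam * I)^{-1} X^T y."""
    X = add_bias_column(inputs)
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ targets)


def mean_squared_error(inputs, targets, weights):
    """Average squared difference between predictions and targets."""
    predictions = add_bias_column(inputs) @ weights
    return float(np.mean((predictions - targets) ** 2))


# Candidate values 0.0, 0.1, ..., 3.0 (built from integers to avoid float drift).
lambdas = np.arange(0, 31) / 10
```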

Upload to LEARN the two Jupyter notebooks with the results saved. Do not submit a zip file or pdf file. The TAs will run some of the Jupyter notebooks to verify the results.

Assignment 2: due Oct 6 (11:59 pm)

In this assignment, you will implement logistic regression. Then you will test your implementations on some small datasets.

Dataset: Use the same dataset as for the K-nearest neighbour in Assignment 1
Algorithm implementation: Implement logistic regression by filling in the functions in the skeleton code. The skeleton code consists of a Python Jupyter notebook:

Logistic regression skeleton: cs480_fall20_asst2_logistic_regression_skeleton.ipynb
Do not import any additional library. Feel free to run the Jupyter notebooks on any machine or Google Colab. Google Colab is a free cloud environment provided by Google that allows you to run Jupyter notebooks very easily. Python and all necessary libraries are already installed.

Submission via LEARN: Jupyter notebook

Part I: Once you are done filling in all the functions, run the Jupyter notebook entirely and save the results. Make sure that the following results are saved:

A graph that shows the negative log probabilities based on 10-fold cross validation when varying the lambda hyperparameter from 0 to 200 in increments of 1.
The best lambda found by 10-fold cross validation and its cross validation negative log probability.
The test negative log probability and the test accuracy based on the best lambda
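For reference, the negative log probability of a set of labels under a logistic model can be computed as sketched below. This is an illustration rather than the skeleton's interface: the function name is hypothetical, labels are assumed to be recoded as 0/1 (for example, 5 as 0 and 6 as 1), the bias is assumed to be folded into the weights, and no regularization term is included.

```python
import numpy as np


def negative_log_probability(inputs, labels01, weights):
    """Sum over data points of -log P(label | input) under a logistic model.

    inputs:   array of shape (n, d)
    labels01: array of shape (n,) with entries 0 or 1
    weights:  array of shape (d,)
    """
    logits = inputs @ weights
    # -log sigmoid(z)       = log(1 + exp(-z))
    # -log (1 - sigmoid(z)) = log(1 + exp(z))
    # np.logaddexp evaluates these without overflow for large |z|.
    per_point = (labels01 * np.logaddexp(0.0, -logits)
                 + (1 - labels01) * np.logaddexp(0.0, logits))
    return float(np.sum(per_point))
```

With all weights equal to zero, every point is assigned probability 0.5, so the result is n * log(2).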

Part II: Discussion of the results. At the end of the Jupyter notebook, answer the following questions:

Logistic regression finds a linear separator whereas k-Nearest Neighbours (in Assignment 1) finds a non-linear separator. Add some text at the end of the Jupyter Notebook that compares the expressivity of the separators. Discuss under what circumstances each type of separator is expected to perform best. What could explain the results obtained with KNN in comparison to the results obtained with logistic regression?
Is the training set used in this assignment linearly separable? To answer this question, add some code to the Jupyter Notebook that uses a logistic regression classifier to determine whether the training set is linearly separable. Add some text that explains why this code can determine the linear separability of a dataset. Indicate whether the training set is linearly separable based on the results.

Assignment 3: due Oct 19 (11:59 pm)

In this assignment, you will implement generalized linear regression and Gaussian process regression. Then you will test your implementations on a small dataset.

Dataset: non_linear_regression_dataset.zip
Algorithm implementation: Implement generalized linear regression and Gaussian process regression by filling in the functions in the skeleton code. The skeleton code consists of Python Jupyter notebooks:

Generalized linear regression skeleton: cs480_fall20_asst3_generalized_linear_regression_skeleton.ipynb
Gaussian process skeleton: cs480_fall20_asst3_gaussian_processs_skeleton.ipynb

October 8, 10:30 am: Typo corrected in the description of the Gaussian kernel. Minus sign added in the exponent of the Gaussian kernel.
October 13, 1 pm: Changed prior_variance to measurement_variance and added measurement_variance as a parameter to the function cross_validation. For the purpose of the assignment, set the measurement variance to 1.
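To make the role of the minus sign concrete, here is a sketch of a Gaussian kernel. The exact parameterization used in the skeleton is not reproduced on this page; the common form exp(-||x - x'||^2 / (2 * width^2)) is assumed here, and the function name is hypothetical.

```python
import numpy as np


def gaussian_kernel(x1, x2, width):
    """Gaussian kernel exp(-||x1 - x2||^2 / (2 * width^2)) (assumed form).

    The minus sign makes the similarity decay with distance:
    identical points give 1 and distant points give values close to 0.
    """
    diff = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
    return float(np.exp(-np.sum(diff ** 2) / (2.0 * width ** 2)))
```

Without the minus sign the kernel value would grow with distance, which would not be a similarity measure.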

Do not import any additional library. Feel free to run the Jupyter notebooks on any machine or Google Colab. Google Colab is a free cloud environment provided by Google that allows you to run Jupyter notebooks very easily. Python and all necessary libraries are already installed.

Submission via LEARN: Jupyter notebooks

Part I: Generalized linear regression Fill in all the functions. For the purpose of this assignment, use lambda=1 (no need to optimize lambda by cross validation). Run the Jupyter notebook entirely and save the results. Make sure that the following results are saved:

A graph that shows the mean squared error based on 10-fold cross validation when varying the max degree of the monomial basis functions from 1 to 20 in increments of 1.
The best degree found by 10-fold cross validation and its cross validation mean squared error.
The test mean squared error based on the best degree.
Discussion: Add some text at the end of the Jupyter notebook to answer the following question. What is the training time complexity of generalized linear regression as a function of the amount of training data, the dimensionality of the data in the original feature space and the maximum degree of the monomial basis functions?

Part II: Gaussian process Fill in all the functions. For the purpose of this assignment, use prior_variance=1. Run the Jupyter notebook entirely and save the results. Make sure that the following results are saved:

Identity kernel: test mean squared error
Polynomial kernel: a) A graph that shows the mean squared error based on 10-fold cross validation when varying the degree of the polynomial kernel from 1 to 20 in increments of 1. b) The best degree found by 10-fold cross validation and its cross validation mean squared error. c) The test mean squared error based on the best degree.
Gaussian kernel: a) A graph that shows the mean squared error based on 10-fold cross validation when varying the width of the Gaussian kernel from 0.1 to 1 in increments of 0.1. b) The best width found by 10-fold cross validation and its cross validation mean squared error. c) The test mean squared error based on the best width.
Discussion: Add some text at the end of the Jupyter notebook to answer the following question. What is the training time complexity of Gaussian process regression as a function of the amount of training data, the dimensionality of the data in the original feature space and the dimensionality of the data in the new feature space induced by the kernel?
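As a rough guide to what Gaussian process regression computes, the sketch below evaluates the predictive mean k_*^T (K + measurement_variance * I)^{-1} y for any kernel function. It is not the skeleton code: the function names are hypothetical, and dot_product_kernel is only one assumed reading of the "identity kernel".

```python
import numpy as np


def dot_product_kernel(a, b):
    """k(a, b) = a . b (assumed form of the identity kernel)."""
    return float(np.dot(a, b))


def gp_predict_mean(train_inputs, train_targets, test_inputs, kernel,
                    measurement_variance=1.0):
    """Predictive mean of GP regression: K_star (K + s^2 I)^{-1} y."""
    # Gram matrix between all pairs of training points.
    K = np.array([[kernel(a, b) for b in train_inputs] for a in train_inputs])
    # Kernel values between each test point and each training point.
    K_star = np.array([[kernel(a, b) for b in train_inputs] for a in test_inputs])
    noise = measurement_variance * np.eye(len(train_inputs))
    alpha = np.linalg.solve(K + noise, train_targets)
    return K_star @ alpha
```

The explicit loops over pairs of points are written for readability; a vectorized version would be faster.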

Assignment 4: due Oct 29 (11:59 pm)

In this assignment, you will experiment with fully connected neural networks and convolutional neural networks, using the Keras open source package. Keras is one of the simplest deep learning packages that serves as a wrapper on top of TensorFlow. Preliminary steps:

Familiarize yourself with Keras. Click on "Guides" and read the first two guides: "The functional API" and "The Sequential Model".
Download and install Keras on a machine with a GPU or use Google's Colaboratory environment, which allows you to run Keras code on a GPU in the cloud. Colab already has Keras pre-installed. To enable GPU acceleration, click on "edit", then "notebook settings" and select "GPU" for hardware acceleration. It is also possible to select "TPU", but the Keras code provided with this assignment will need to be modified in a non-trivial way to take advantage of TPU acceleration.
Download the base code for this assignment: cs480_fall20_asst4_cnn_cifar10.ipynb.

Answer the following questions by modifying the base code in cs480_fall20_asst4_cnn_cifar10.ipynb. Submit the modified Jupyter notebook via LEARN.

Part 1 (3 points): Compare the accuracy of the convolutional neural network in the file cs480_fall20_asst4_cnn_cifar10.ipynb on the cifar10 dataset to the accuracy of simple dense neural networks with 0, 1, 2, 3 and 4 hidden layers of 512 rectified linear units each. Run the code in the file cs480_fall20_asst4_cnn_cifar10.ipynb without changing the parameters to train a convolutional neural network. Then, modify the code in cs480_fall20_asst4_cnn_cifar10.ipynb to obtain simple dense neural networks with 0, 1, 2, 3 and 4 hidden layers of 512 rectified linear units (with a dropout rate of 0.5). Produce two graphs that contain 6 curves (one for the convolutional neural net and one for each dense neural net of 0-4 hidden layers). The y-axis is the accuracy and the x-axis is the number of epochs (# of passes through the training set). Since neural networks take a while to train, cross-validation is not practical. Instead, produce one graph where all the curves correspond to the training accuracy and a second graph where all the curves correspond to the validation accuracy. Train the neural networks for 20 epochs. Although 20 epochs is not sufficient to reach convergence, it is sufficient to see the trend. Among the models obtained after each epoch, save the model that achieves the best validation accuracy and report its test accuracy. Save the following results in your Jupyter notebook:

The two graphs for training and validation accuracy.
For each architecture, print the test accuracy of the model that achieved the best validation accuracy among all epochs (i.e., one best test accuracy per network architecture).
Add some text to the Jupyter notebook to explain the results (i.e., why some models perform better or worse than other models).
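Selecting the epoch with the best validation accuracy can be sketched as follows. The example assumes a dictionary of per-epoch metrics with a "val_accuracy" key, which is the layout Keras typically stores in History.history when accuracy is tracked; the function name best_epoch is hypothetical and the numbers are made up for illustration.

```python
def best_epoch(history, monitor="val_accuracy"):
    """Return (1-based epoch number, value) of the highest monitored metric.

    Ties are resolved in favour of the earliest epoch.
    """
    values = history[monitor]
    best_index = max(range(len(values)), key=values.__getitem__)
    return best_index + 1, values[best_index]


# Illustrative numbers only, not real results.
example_history = {
    "accuracy": [0.41, 0.52, 0.58, 0.63],
    "val_accuracy": [0.45, 0.55, 0.54, 0.56],
}
epoch, val_acc = best_epoch(example_history)
print(epoch, val_acc)  # 4 0.56
```

In Keras, the ModelCheckpoint callback with monitor="val_accuracy" and save_best_only=True is one way to keep the corresponding model on disk during training.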

Part 2 (1 point): Compare the accuracy achieved by rectified linear units and sigmoid units in the convolutional neural network in cs480_fall20_asst4_cnn_cifar10.ipynb. Modify the code in cs480_fall20_asst4_cnn_cifar10.ipynb to use sigmoid units. Produce two graphs (one for training accuracy and one for validation accuracy) that each contain 2 curves (one for rectified linear units and another one for sigmoid units). The y-axis is the accuracy and the x-axis is the number of epochs. Train the neural networks for 20 epochs. Although 20 epochs is not sufficient to reach convergence, it is sufficient to see the trend. Save the following results in your Jupyter notebook:

The two graphs for training and validation accuracy.
For each activation function, print the test accuracy of the model that achieved the best validation accuracy among all epochs (i.e., one best test accuracy per activation function).
Add some text to the Jupyter notebook to explain the results (i.e., why one model performs better or worse than the other model).

Part 3 (2 points): Compare the accuracy achieved with and without dropout as well as with and without data augmentation in the convolutional neural network in cs480_fall20_asst4_cnn_cifar10.ipynb. Modify the code in cs480_fall20_asst4_cnn_cifar10.ipynb to turn dropout and data augmentation on and off. Produce two graphs (one for training accuracy and the other one for validation accuracy) that each contain 4 curves (dropout with data augmentation, dropout with no data augmentation, no dropout with data augmentation, no dropout with no data augmentation). The y-axis is the accuracy and the x-axis is the number of epochs. Produce curves for as many epochs as you can up to 100 epochs. No marks will be deducted for doing fewer than 100 epochs; however, make sure to explain what you expect to see in the curves as the number of epochs reaches 100.

The two graphs for training and validation accuracy.
For each combination of dropout and data augmentation, print the test accuracy of the model that achieved the best validation accuracy among all epochs (i.e., one best test accuracy per combination of dropout and data augmentation).
Add some text to the Jupyter notebook to explain the results (i.e., why did some models perform better or worse than other models and are the results consistent with the theory).

Part 4 (1 point): Compare the accuracy achieved when training the convolutional neural network in cs480_fall20_asst4_cnn_cifar10.ipynb with three different optimizers: RMSprop, Adagrad and Adam. Modify the code in cs480_fall20_asst4_cnn_cifar10.ipynb to use the Adagrad and Adam optimizers (with default parameters). Produce two graphs (one for training accuracy and the other one for validation accuracy) that each contain 3 curves (for RMSprop, Adagrad and Adam). The y-axis is the accuracy and the x-axis is the number of epochs. Produce curves for as many epochs as you can up to 100 epochs.

The two graphs for training and validation accuracy.
For each optimizer, print the test accuracy of the model that achieved the best validation accuracy among all epochs (i.e., one best test accuracy per optimizer).
Add some text to the Jupyter notebook to explain the results (i.e., why did some optimizers perform better or worse than other optimizers).

Part 5 (3 points): Compare the accuracy of the convolutional neural network in cs480_fall20_asst4_cnn_cifar10.ipynb with a modified version that replaces each stack of (CONV2D, Activation, CONV2D, Activation) layers with 3x3 filters by a smaller stack of (CONV2D, Activation) layers with 5x5 filters. Produce two graphs (one for training accuracy and the other one for validation accuracy) that each contain 2 curves (for 3x3 filters and 5x5 filters). The y-axis is the accuracy and the x-axis is the number of epochs. Produce curves for as many epochs as you can up to 100 epochs.

The two graphs for training and validation accuracy.
For each filter configuration, print the test accuracy of the model that achieved the best validation accuracy among all epochs (i.e., one best test accuracy per filter configuration).
Add some text to the Jupyter notebook to explain the results (i.e., why did one architecture perform better or worse than the other architecture).
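When comparing the two filter configurations, it can help to count parameters and receptive fields. The sketch below does this arithmetic for stride-1 convolutions; the function names and the channel count of 32 are hypothetical and are not taken from the base code.

```python
def conv2d_params(kernel_size, in_channels, out_channels):
    """Number of weights plus biases in one 2D convolutional layer."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels


def stacked_receptive_field(kernel_sizes):
    """Receptive field (in pixels per side) of stride-1 convolutions in sequence."""
    field = 1
    for k in kernel_sizes:
        field += k - 1
    return field


channels = 32  # hypothetical number of input and output channels
two_3x3 = 2 * conv2d_params(3, channels, channels)
one_5x5 = conv2d_params(5, channels, channels)
print(two_3x3, one_5x5)  # 18496 25632
print(stacked_receptive_field([3, 3]), stacked_receptive_field([5]))  # 5 5
```

Under these assumptions both configurations see a 5x5 patch of the input, but the stack of two 3x3 layers uses fewer parameters and applies an extra non-linearity.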

Assignment 5: due Nov 9 (11:59 pm)

In this assignment, you will experiment with various types of recurrent neural networks (RNNs) in PyTorch. PyTorch is an alternative to Keras and TensorFlow that has become quite popular in recent years. It is more intuitive than TensorFlow, while giving the programmer greater control than Keras. Preliminary steps:

Familiarize yourself with PyTorch by going through the tutorial "Get familiar with PyTorch: a 60 minute blitz".
Download and install PyTorch on a machine with a GPU or use Google's Colaboratory environment, which allows you to run PyTorch code on a GPU in the cloud. Colab already has PyTorch pre-installed. The first two parts below do not require a GPU, but the third part can be accelerated with a GPU. To enable GPU acceleration, click on "edit", then "notebook settings" and select "GPU" for hardware acceleration. It is also possible to select "TPU", but the PyTorch code provided with this assignment will need to be modified in a non-trivial way to take advantage of TPU acceleration.
Download the base code for this assignment:

Part 1: cs480_fall20_asst5_char_rnn_classification.ipynb
Part 2: cs480_fall20_asst5_char_rnn_generation.ipynb
Part 3: cs480_fall20_asst5_seq2seq_translation.ipynb

Answer the following questions by modifying the base code in each notebook. Submit the modified Jupyter notebooks via LEARN.

Part 1 (4 points): Encoder implementation in cs480_fall20_asst5_char_rnn_classification.ipynb. Compare the accuracy of the encoder when varying the type of hidden units: linear units, gated recurrent units (GRUs) and long short term memory (LSTM) units. For linear hidden units, just run the script of the Jupyter notebook as it is. For GRUs and LSTMs, modify the base code. Save the following results in your Jupyter notebook:

Two graphs that each contain 3 curves (linear hidden units, GRUs and LSTM units). The first graph displays the training loss and the second graph displays the validation loss. In both graphs, the y-axis is the negative log likelihood and the x-axis is the number of thousands of iterations.
For each type of hidden unit, print the test loss and the test confusion matrix of the model that achieved the best validation loss among all iterations (i.e., one best test loss and test confusion matrix per type of hidden unit).
Explanation of the results (i.e., why some hidden units perform better or worse than other units).
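A confusion matrix counts, for each true class, how often each class was predicted. The sketch below is a minimal version that assumes classes are encoded as integers from 0 to n_classes - 1; the function name is hypothetical and the base code may compute or display the matrix differently.

```python
import numpy as np


def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Entry [t, p] counts the points of true class t predicted as class p."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true_labels, predicted_labels):
        matrix[t, p] += 1
    return matrix


# Illustrative labels only, not real results.
true_labels = [0, 0, 1, 2, 2, 2]
predicted_labels = [0, 1, 1, 2, 2, 0]
print(confusion_matrix(true_labels, predicted_labels, 3))
```

The diagonal holds the correctly classified points, so the accuracy is the trace divided by the total count.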

Part 2 (4 points): Decoder implementation in cs480_fall20_asst5_char_rnn_generation.ipynb. Compare the accuracy of the decoder when varying the information fed as input to the hidden units at each time step: i) previous hidden unit, previous character and category; ii) previous hidden unit and previous character; iii) previous hidden unit and category; iv) previous hidden unit. For i), just run the Python notebook as it is. For ii) and iv) modify the code to feed the category as input to the hidden unit(s) of the first time steps only. For iii) and iv), modify the code to avoid feeding the previous character as input to each hidden unit. Save the following results in your Jupyter notebook:

Two graphs that each contain 4 curves (i, ii, iii, iv). The first graph displays the training loss and the second graph displays the validation loss. In both graphs, the y-axis is the negative log likelihood and the x-axis is the number of iterations in units of 500.
For each architecture, print the test loss of the model that achieved the best validation loss among all iterations (i.e., one best test loss per architecture).
Explanation of the results (i.e., how does the type of information fed to the hidden units affect the results).

Part 3 (2 points): Seq2seq implementation in cs480_fall20_asst5_seq2seq_translation.ipynb. Compare the accuracy of the seq2seq model with and without attention. For the seq2seq model with attention, just run the base code as it is. For the seq2seq model without attention, modify the base code. Save the following results in your Jupyter notebook:

Two graphs that each contain 2 curves (with attention and without attention). The first graph displays the training loss and the second graph displays the validation loss. In both graphs, the y-axis is the negative log likelihood and the x-axis is the number of thousands of iterations.
For each architecture, print the test loss of the model that achieved the best validation loss among all iterations (i.e., one best test loss per architecture).
Explanation of the results (i.e., how attention affects the results).

Assignment 6: due Nov 20 (11:59 pm)

In this assignment, you will implement a variational auto-encoder (VAE) and a generative adversarial network (GAN) in PyTorch to generate images similar to those in the MNIST dataset. As a starting point, the code for a deterministic auto-encoder (DAE) is provided. While DAEs achieve good reconstruction of the original images, they struggle to generate new images that are similar to those in MNIST. Implement a VAE and GAN to generate better images. Download the skeleton code for this assignment:

Part 1: cs480_fall20_asst6_vae_skeleton.ipynb
Part 2: cs480_fall20_asst6_gan_skeleton.ipynb

Fill in the functions in each skeleton notebook and answer the following questions in each notebook. Submit the Jupyter notebooks via LEARN.

Part 1 (4 points): VAE implementation in cs480_fall20_asst6_vae_skeleton.ipynb. Fill in the functions and save the following results in your Jupyter notebook:

Two graphs that each contain 2 curves (DAE and VAE). The first graph displays the training reconstruction loss and the second graph displays the testing reconstruction loss. In both graphs, the y-axis is binary cross entropy and the x-axis is the number of epochs.
Print a sample of generated images after each epoch of training for both DAEs and VAEs.
Explanation of the results (i.e., compare and explain the binary cross entropy and the quality of the sampled images generated by DAEs and VAEs).
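The two terms that usually make up a VAE training loss can be sketched as below. This assumes the standard setup of a diagonal Gaussian encoder and a standard normal prior, which may not match the skeleton exactly; the function names are hypothetical and NumPy is used instead of PyTorch to keep the example short.

```python
import numpy as np


def binary_cross_entropy(x, x_hat, eps=1e-7):
    """Reconstruction loss summed over pixels; x_hat holds probabilities."""
    x_hat = np.clip(x_hat, eps, 1 - eps)  # avoid log(0)
    return float(-np.sum(x * np.log(x_hat) + (1 - x) * np.log(1 - x_hat)))


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) in closed form."""
    return float(-0.5 * np.sum(1 + log_var - mu ** 2 - np.exp(log_var)))
```

A deterministic auto-encoder is trained with the reconstruction term only, whereas a VAE adds the KL term, which pulls the encoder's distribution toward the prior that is sampled from at generation time.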

Part 2 (6 points): GAN implementation in cs480_fall20_asst6_gan_skeleton.ipynb. Check out the following tutorial for GANs. Fill in the functions and save the following results in your Jupyter notebook:

Two graphs that each contain 2 curves (Generator and Discriminator losses). The first graph displays the training loss and the second graph displays the testing loss. In both graphs, the y-axis is binary cross entropy and the x-axis is the number of epochs.
Print a sample of generated images after each epoch of training for your GAN.
Explanation of the results (i.e., compare and explain the quality of the sampled images generated by VAEs and GANs).