CS489/698 - Assignments
There will be four assignments given in the course, each worth 12.5% of the
final mark (7.5% for CS698). Each assignment will have a theoretical part and
a programming part. Assignments are done individually (i.e., no teams). You
are free to program in the language of your choice; however, Matlab is
recommended since it provides a convenient high-level programming environment
for matrix operations. If you decide to program in Matlab, the IST group
maintains a nice set of online references for Matlab, including a tutorial.
The approximate out and due dates are:
- A1: out Jan 21, due Feb 9
- A2: out Feb 9, due Feb 25
- A3: out March 2, due March 16
- A4: out March 18, due April 1
For each assignment, a hard copy must be handed in on the due date, either in
class or in slot X of the assignment drop-off box #X (3rd floor of the math
building near the bridge to DC). No late assignments will be accepted.
NB: Unclaimed assignments can be picked up at Jessica Miranda's
office (DC3516) between 8:30-12 and 1-4:30pm.
Assignment 1: due Feb 9
- Assignment handout
- Datasets: trainData.txt and testData.txt. There is one row per record and
one column per attribute (the first column is the class label). These
datasets are modified versions of the mushroom dataset published on the UCI
repository. The task is to classify mushrooms as edible "e" or poisonous "p"
based on 22 attributes that describe various characteristics of mushrooms.
Background information regarding the dataset and a description of each
attribute can be found in the file readme.txt.
- Matlab script to load the data: loadScript.m.
N.B. Feel free to write your own parser to load the data in any
language.
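For example, here is a minimal Matlab sketch of such a parser. It assumes the
records are comma-separated single-character values with the class label ("e"
or "p") in the first column; check readme.txt and loadScript.m for the actual
format.

    % Minimal sketch of a parser for trainData.txt / testData.txt, assuming
    % comma-separated values with the class label in the first column.
    function [labels, features] = loadMushroomData(filename)
      fid = fopen(filename, 'r');
      rows = {};
      line = fgetl(fid);
      while ischar(line)
        rows{end+1} = strsplit(strtrim(line), ',');   % one cell array per record
        line = fgetl(fid);
      end
      fclose(fid);
      n = numel(rows);
      d = numel(rows{1}) - 1;                         % 22 attributes expected
      labels = cell(n, 1);
      features = cell(n, d);
      for i = 1:n
        labels{i} = rows{i}{1};                       % 'e' (edible) or 'p' (poisonous)
        features(i, :) = rows{i}(2:end);
      end
    end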
- Notes for submission
- No electronic submission: submit a printout of your code with
your assignment.
- The decision tree found before any pruning may be too large to print or
draw by hand. In that case it is fine to print or draw only the first three
levels of your decision tree.
- When pruning the tree, feel free to use n(log(d)+1) as the description
length (as used in Equation 13 of the lecture notes) or the more concise
description length n(log(d+2)) that we derived in class. Also, the log can be
either the natural log or the base-2 log (the bound holds in both cases), but
the natural log gives a tighter bound. You also need to specify the parameter
delta. I recommend delta=0.2, but feel free to try other values. Whatever you
choose, make sure to indicate the precise bound on error that you use to
prune the tree.
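To make the role of these quantities concrete, here is a hypothetical Matlab
sketch of one common Occam's-razor-style bound that combines training error,
description length and delta. It is only an illustration of how the pieces
fit together; the precise bound from Equation 13 of the lecture notes may
differ, so use whichever bound you state in your write-up.

    % Hypothetical illustration of an error bound for pruning. Here m is the
    % number of training examples reaching the subtree, nNodes the number of
    % nodes in the subtree, and d the number of attributes (22 for the
    % mushroom data). The description length uses the natural log.
    function bound = pruningErrorBound(trainErr, nNodes, d, m, delta)
      dl = nNodes * (log(d) + 1);                      % description length n(log(d)+1)
      bound = trainErr + sqrt((dl + log(1/delta)) / (2*m));
    end

Under this assumed form, a subtree would be replaced by a leaf whenever the
leaf's bound is no worse than the subtree's bound.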
- Correction: In question 1b), the domain of X should be the union of [0,0.5)
with (0.5,1] instead of [0,1]. Otherwise, Hmirror is empty since there is no
h that satisfies the conditions of Hmirror at 0.5.
- Click here for the solutions
Assignment 2: due Feb 25
Assignment 3: due March 16
- Assignment handout
- Correction:
In question 1a, the last sentence should read: "Show that if their
convex hulls intersect, the two sets of points cannot be linearly
separable, and conversely that if
they are not linearly separable, their convex hulls intersect."
- Q3 Datasets: trainData.csv, trainLabels.csv, testData.csv, testLabels.csv
- Problem: this data is a modified version of the Optical
Recognition of Handwritten Digits Dataset from the UCI
repository. It contains preprocessed black and white images of
the digits 5 and 6. Each attribute indicates how many pixels are
black in a patch of 4 x 4 pixels.
- Format: there is one row per image and one column per
attribute. The class labels are 5 and 6.
- Q3: what to hand in
- report the accuracy of mixtures of Gaussians and logistic regression for
the training and testing sets. For logistic regression, report the accuracy
after each iteration for the first 3 iterations. Measure the accuracy by
adding the probability of the correct label for each image (a sketch of this
measure appears after this list).
- brief discussion of the results
- printout of the parameters found for each model
- printout of your code
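The accuracy measure above can be computed as follows; this is only a minimal
sketch for logistic regression, assuming X is an n-by-d matrix of images, w
and b are the learned weights and bias, and y is an n-by-1 vector of 0/1
labels (the actual label encoding is up to you).

    % Accuracy as the summed probability of the correct label under logistic
    % regression (divide by n for an average instead of a sum).
    function acc = probAccuracy(X, y, w, b)
      p1 = 1 ./ (1 + exp(-(X*w + b)));           % P(label = 1 | x)
      pCorrect = y .* p1 + (1 - y) .* (1 - p1);  % probability of the true label
      acc = sum(pCorrect);
    end

The same measure can be used for the mixture of Gaussians by replacing p1
with the posterior probability of class 1 given the image.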
- Click here for the solutions
Assignment 4: due April 1
- Assignment handout
- Corrections:
- In Question 1, the middle factor of the kernel expansion should not be
divided by 2: k(x,x') = exp(-x^T x / (2 sigma^2)) exp(x^T x' / sigma^2)
exp(-x'^T x' / (2 sigma^2)). A worked expansion appears after these
corrections.
- In Question 3, the condition for the first case of the perceptron learning
rule should be: if y_n w^T phi(x_n) <= 0. A sketch applying this rule appears
after the suggested parameters below.
- In Question 4, the parameter d for the polynomial kernel is not
the dimensionality of the input space, but simply the degree of the
polynomial. See below for a range of values to experiment with
for d.
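The corrected factorization in Question 1 is just the expansion of the
squared norm inside the Gaussian kernel:

    \[
    k(x, x') = \exp\!\left(-\frac{\|x - x'\|^2}{2\sigma^2}\right)
             = \exp\!\left(-\frac{x^\top x - 2\,x^\top x' + x'^\top x'}{2\sigma^2}\right)
             = \exp\!\left(-\frac{x^\top x}{2\sigma^2}\right)
               \exp\!\left(\frac{x^\top x'}{\sigma^2}\right)
               \exp\!\left(-\frac{x'^\top x'}{2\sigma^2}\right).
    \]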
- Additional information regarding Q4:
- Datasets: use the same datasets as for Assignment 3
- Suggested parameters for kernel perceptron
- alpha initialization: 0
- sigma in Gaussian kernel: 10 to 20
- degree d in polynomial kernel: 2 to 6
- Suggested parameters for the neural network
- learning rate: 0.001
- number of hidden nodes: 5 to 15
- number of iterations in backprop: 1000
- weight initialization: random number in [-0.05,0.05]
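Putting the corrected perceptron update condition together with the suggested
parameters above, here is a hypothetical Matlab sketch of a kernel perceptron
with a Gaussian kernel; the assignment handout remains the authoritative
specification. Labels y are assumed to be +1/-1.

    % Kernel perceptron sketch: alpha initialized to 0, Gaussian kernel with
    % width sigma (suggested range 10 to 20), update on any mistake.
    function alpha = kernelPerceptron(X, y, sigma, nEpochs)
      n = size(X, 1);
      sq = sum(X.^2, 2);
      D2 = repmat(sq, 1, n) + repmat(sq', n, 1) - 2*(X*X');  % squared distances
      K = exp(-D2 / (2*sigma^2));                            % Gaussian kernel matrix
      alpha = zeros(n, 1);                                   % dual weights
      for epoch = 1:nEpochs
        for i = 1:n
          f = sum(alpha .* y .* K(:, i));                    % kernelized w'*phi(x_i)
          if y(i) * f <= 0                                   % corrected condition
            alpha(i) = alpha(i) + 1;
          end
        end
      end
    end

A polynomial kernel can be swapped in by replacing K, for example with
(X*X' + 1).^d, though the exact form of the polynomial kernel to use is the
one given in the handout.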
- What to hand in for Q4
- train and test accuracy for a perceptron trained with the
identity kernel.
- graph of the train and test accuracy for a perceptron trained
with a polynomial kernel when the degree of the polynomial varies from
2 to 6
- graph of the train and test accuracy for a perceptron trained
with a Gaussian kernel when sigma varies from 10 to 20
- graph of the train and test accuracy for a neural network with 5 to 15
hidden nodes. Since the sigmoid outputs a number between 0 and 1 that can be
interpreted as a probability, measure accuracy by computing the average
probability of the correct class over all data points (a sketch appears at
the end of this list).
- brief discussion of the results
- printout of the best parameters found for each model
- printout of your code
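For the neural network, the following hypothetical sketch shows one way to
combine the suggested parameters (learning rate 0.001, 5 to 15 hidden nodes,
1000 iterations, weights initialized uniformly in [-0.05, 0.05]) with the
probabilistic accuracy measure. It assumes a single hidden layer of sigmoid
units, a single sigmoid output read as the probability of one class, 0/1
labels in y, and batch gradient descent on the cross-entropy loss; the
handout's exact architecture and loss may differ.

    % Single-hidden-layer network trained by batch gradient descent (sketch).
    function [W1, b1, w2, b2, acc] = trainNet(X, y, nHidden, lr, nIter)
      [n, d] = size(X);
      sig = @(z) 1 ./ (1 + exp(-z));
      W1 = 0.1*rand(d, nHidden) - 0.05;   b1 = 0.1*rand(1, nHidden) - 0.05;
      w2 = 0.1*rand(nHidden, 1) - 0.05;   b2 = 0.1*rand - 0.05;
      for it = 1:nIter
        H = sig(X*W1 + repmat(b1, n, 1));               % hidden activations
        p = sig(H*w2 + b2);                             % predicted P(class 1)
        dOut = p - y;                                   % cross-entropy gradient
        gw2 = H' * dOut;              gb2 = sum(dOut);
        dHid = (dOut * w2') .* H .* (1 - H);            % backprop to hidden layer
        gW1 = X' * dHid;              gb1 = sum(dHid, 1);
        W1 = W1 - lr*gW1;   b1 = b1 - lr*gb1;
        w2 = w2 - lr*gw2;   b2 = b2 - lr*gb2;
      end
      H = sig(X*W1 + repmat(b1, n, 1));                 % final forward pass
      p = sig(H*w2 + b2);
      acc = mean(y .* p + (1 - y) .* (1 - p));          % average prob. of correct class
    end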
- Click here for the solutions