CS489/698 - Assignments
There will be four assignments given in the course, each worth 12.5% of the
final mark (7.5% for CS698). Each assignment will have a theoretical part and
a programming part. Assignments are done individually (i.e., no teams). You
are free to program in the language of your choice; however, Matlab is
recommended since it provides a convenient high-level programming environment
for matrix operations. If you decide to program in Matlab, the IST group
maintains a nice set of online references for Matlab, including a tutorial.
The approximate out and due dates are:
- A1: out Jan 21, due Feb 9
- A2: out Feb 9, due Feb 25
- A3: out March 2, due March 16
- A4: out March 18, due April 1
For each assignment, a hard copy must be handed in on the due date, either in
class or in slot X of the assignment drop-off box #X (3rd floor of the math
building near the bridge to DC). No late assignments will be accepted.
NB: Unclaimed assignments can be picked up at Jessica Miranda's
office (DC3516) between 8:30-12 and 1-4:30pm.
Assignment 1: due Feb 9
- Assignment handout
- Datasets: trainData.txt and testData.txt. There is one row per record and
one column per attribute (the first column is the class label). These
datasets are modified versions of the mushroom dataset published on the UCI
repository. The task is to classify mushrooms as edible "e" or poisonous "p"
based on 22 attributes that describe various characteristics of mushrooms.
Background information regarding the dataset and a description of each
attribute can be found in the file readme.txt.
- Matlab script to load the data: loadScript.m.
N.B. Feel free to write your own parser to load the data in any
language.
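For example, here is a minimal Matlab sketch of such a parser. It assumes the
records are comma-separated single-character values with the class label ("e"
or "p") in the first column; check readme.txt and loadScript.m for the actual
format.

    % Minimal sketch of a parser for trainData.txt / testData.txt, assuming
    % comma-separated values with the class label in the first column.
    function [labels, features] = loadMushroomData(filename)
      fid = fopen(filename, 'r');
      rows = {};
      line = fgetl(fid);
      while ischar(line)
        rows{end+1} = strsplit(strtrim(line), ',');   % one cell array per record
        line = fgetl(fid);
      end
      fclose(fid);
      n = numel(rows);
      d = numel(rows{1}) - 1;                         % 22 attributes expected
      labels = cell(n, 1);
      features = cell(n, d);
      for i = 1:n
        labels{i} = rows{i}{1};                       % 'e' (edible) or 'p' (poisonous)
        features(i, :) = rows{i}(2:end);
      end
    end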
- Notes for submission
- No electronic submission: submit a printout of your code with
your assignment.
- The decision tree found before any pruning may be too large to print or
draw by hand. In that case it is fine to print or draw only the first three
levels of your decision tree.
- When pruning the tree, feel free to use n(log(d)+1) as the description
length (as used in Equation 13 of the lecture notes) or the more concise
description length n(log(d+2)) that we derived in class. Also, the log can be
either the natural log or the base-2 log (the bound holds in both cases), but
the natural log gives a tighter bound. You also need to specify the parameter
delta. I recommend delta=0.2, but feel free to try other values. Whatever you
choose, make sure to indicate the precise bound on error that you use to
prune the tree.
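To make the role of these quantities concrete, here is a hypothetical Matlab
sketch of one common Occam's-razor-style bound that combines training error,
description length and delta. It is only an illustration of how the pieces
fit together; the precise bound from Equation 13 of the lecture notes may
differ, so use whichever bound you state in your write-up.

    % Hypothetical illustration of an error bound for pruning. Here m is the
    % number of training examples reaching the subtree, nNodes the number of
    % nodes in the subtree, and d the number of attributes (22 for the
    % mushroom data). The description length uses the natural log.
    function bound = pruningErrorBound(trainErr, nNodes, d, m, delta)
      dl = nNodes * (log(d) + 1);                      % description length n(log(d)+1)
      bound = trainErr + sqrt((dl + log(1/delta)) / (2*m));
    end

Under this assumed form, a subtree would be replaced by a leaf whenever the
leaf's bound is no worse than the subtree's bound.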
- Correction: In question 1b), the domain of X should be the union of [0,0.5)
with (0.5,1] instead of [0,1]. Otherwise, Hmirror is empty since there is no
h that satisfies the conditions of Hmirror at 0.5.
- Click here for the solutions
Assignment 2: due Feb 25
Assignment 3: due March 16
- Assignment handout
- Correction:
In question 1a, the last sentence should read: "Show that if their
convex hulls intersect, the two sets of points cannot be linearly
separable, and conversely that if
they are not linearly separable, their convex hulls intersect."
- Q3 Datasets: trainData.csv, trainLabels.csv, testData.csv, testLabels.csv
- Problem: this data is a modified version of the Optical
Recognition of Handwritten Digits Dataset from the UCI
repository. It contains preprocessed black and white images of
the digits 5 and 6. Each attribute indicates how many pixels are
black in a patch of 4 x 4 pixels.
- Format: there is one row per image and one column per
attribute. The class labels are 5 and 6.
- Q3: what to hand in
- report the accuracy of mixtures of Gaussians and logistic regression for
the training and testing sets. For logistic regression, report the accuracy
after each iteration for the first 3 iterations. Measure the accuracy by
adding the probability of the correct label for each image (a sketch of this
measure appears after this list).
- brief discussion of the results
- printout of the parameters found for each model
- printout of your code
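The accuracy measure above can be computed as follows; this is only a minimal
sketch for logistic regression, assuming X is an n-by-d matrix of images, w
and b are the learned weights and bias, and y is an n-by-1 vector of 0/1
labels (the actual label encoding is up to you).

    % Accuracy as the summed probability of the correct label under logistic
    % regression (divide by n for an average instead of a sum).
    function acc = probAccuracy(X, y, w, b)
      p1 = 1 ./ (1 + exp(-(X*w + b)));           % P(label = 1 | x)
      pCorrect = y .* p1 + (1 - y) .* (1 - p1);  % probability of the true label
      acc = sum(pCorrect);
    end

The same measure can be used for the mixture of Gaussians by replacing p1
with the posterior probability of class 1 given the image.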
- Click here for the solutions
Assignment 4: due April 1
- Assignment handout
- Corrections:
- In Question 1, the middle factor of the kernel expansion should not be
divided by 2: k(x,x') = exp(-x^T x / (2 sigma^2)) exp(x^T x' / sigma^2)
exp(-x'^T x' / (2 sigma^2)). A worked expansion appears after these
corrections.
- In Question 3, the condition for the first case of the perceptron learning
rule should be: if y_n w^T phi(x_n) <= 0. A sketch applying this rule appears
after the suggested parameters below.
- In Question 4, the parameter d for the polynomial kernel is not
the dimensionality of the input space, but simply the degree of the
polynomial. See below for a range of values to experiment with
for d.
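The corrected factorization in Question 1 is just the expansion of the
squared norm inside the Gaussian kernel:

    \[
    k(x, x') = \exp\!\left(-\frac{\|x - x'\|^2}{2\sigma^2}\right)
             = \exp\!\left(-\frac{x^\top x - 2\,x^\top x' + x'^\top x'}{2\sigma^2}\right)
             = \exp\!\left(-\frac{x^\top x}{2\sigma^2}\right)
               \exp\!\left(\frac{x^\top x'}{\sigma^2}\right)
               \exp\!\left(-\frac{x'^\top x'}{2\sigma^2}\right).
    \]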
- Additional information regarding Q4:
- Datasets: use the same datasets as for Assignment 3
- Suggested parameters for kernel perceptron
- alpha initialization: 0
- sigma in Gaussian kernel: 10 to 20
- degree d in polynomial kernel: 2 to 6
- Suggested parameters for the neural network
- learning rate: 0.001
- number of hidden nodes: 5 to 15
- number of iterations in backprop: 1000
- weight initialization: random number in [-0.05,0.05]
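Putting the corrected perceptron update condition together with the suggested
parameters above, here is a hypothetical Matlab sketch of a kernel perceptron
with a Gaussian kernel; the assignment handout remains the authoritative
specification. Labels y are assumed to be +1/-1.

    % Kernel perceptron sketch: alpha initialized to 0, Gaussian kernel with
    % width sigma (suggested range 10 to 20), update on any mistake.
    function alpha = kernelPerceptron(X, y, sigma, nEpochs)
      n = size(X, 1);
      sq = sum(X.^2, 2);
      D2 = repmat(sq, 1, n) + repmat(sq', n, 1) - 2*(X*X');  % squared distances
      K = exp(-D2 / (2*sigma^2));                            % Gaussian kernel matrix
      alpha = zeros(n, 1);                                   % dual weights
      for epoch = 1:nEpochs
        for i = 1:n
          f = sum(alpha .* y .* K(:, i));                    % kernelized w'*phi(x_i)
          if y(i) * f <= 0                                   % corrected condition
            alpha(i) = alpha(i) + 1;
          end
        end
      end
    end

A polynomial kernel can be swapped in by replacing K, for example with
(X*X' + 1).^d, though the exact form of the polynomial kernel to use is the
one given in the handout.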
- What to hand in for Q4
- train and test accuracy for a perceptron trained with the
identity kernel.
- graph of the train and test accuracy for a perceptron trained
with a polynomial kernel when the degree of the polynomial varies from
2 to 6
- graph of the train and test accuracy for a perceptron trained
with a Gaussian kernel when sigma varies from 10 to 20
- graph of the train and test accuracy for a neural network with 5 to 15
hidden nodes. Since the sigmoid outputs a number between 0 and 1 that can be
interpreted as a probability, measure accuracy by computing the average
probability of the correct class over all data points (a sketch appears at
the end of this list).
- brief discussion of the results
- printout of the best parameters found for each model
- printout of your code
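For the neural network, the following hypothetical sketch shows one way to
combine the suggested parameters (learning rate 0.001, 5 to 15 hidden nodes,
1000 iterations, weights initialized uniformly in [-0.05, 0.05]) with the
probabilistic accuracy measure. It assumes a single hidden layer of sigmoid
units, a single sigmoid output read as the probability of one class, 0/1
labels in y, and batch gradient descent on the cross-entropy loss; the
handout's exact architecture and loss may differ.

    % Single-hidden-layer network trained by batch gradient descent (sketch).
    function [W1, b1, w2, b2, acc] = trainNet(X, y, nHidden, lr, nIter)
      [n, d] = size(X);
      sig = @(z) 1 ./ (1 + exp(-z));
      W1 = 0.1*rand(d, nHidden) - 0.05;   b1 = 0.1*rand(1, nHidden) - 0.05;
      w2 = 0.1*rand(nHidden, 1) - 0.05;   b2 = 0.1*rand - 0.05;
      for it = 1:nIter
        H = sig(X*W1 + repmat(b1, n, 1));               % hidden activations
        p = sig(H*w2 + b2);                             % predicted P(class 1)
        dOut = p - y;                                   % cross-entropy gradient
        gw2 = H' * dOut;              gb2 = sum(dOut);
        dHid = (dOut * w2') .* H .* (1 - H);            % backprop to hidden layer
        gW1 = X' * dHid;              gb1 = sum(dHid, 1);
        W1 = W1 - lr*gW1;   b1 = b1 - lr*gb1;
        w2 = w2 - lr*gw2;   b2 = b2 - lr*gb2;
      end
      H = sig(X*W1 + repmat(b1, n, 1));                 % final forward pass
      p = sig(H*w2 + b2);
      acc = mean(y .* p + (1 - y) .* (1 - p));          % average prob. of correct class
    end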
- Click here for the solutions