CS486/686 - Assignments
There will be four assignments in this course, each worth 10% of the final mark (7% for CS686). Each assignment will have a theoretical part and a programming part. Assignments are done individually (i.e., no teams). You are free to program in the language of your choice; however, Matlab is recommended since it provides a convenient high-level programming environment for matrix operations. If you decide to program in Matlab, the IST group maintains a nice set of online references for Matlab, including a tutorial.
The approximate out and due dates are:
- A1: out Sept 16, due Oct 2
- A2: out Oct 2, due Oct 23
- A3: out Oct 28, due Nov 17
- A4: out Nov 18, due Dec 1
For each assignment, a hard copy must be handed in on the due date, either in class or in slots 6 or 7 of assignment drop-off box #2 (3rd floor of the Math building, near the bridge to DC). No late assignments will be accepted.
Assignment 1: due Oct 2
- Click here for the assignment
- Test problems. NB: The handout requests that you report the average time and average number of nodes searched for each category (easy, medium, and hard); however, there is only one test problem per category, so just report the time and number of nodes (no averages).
- Jie Zhang will hold special office hours on Wednesday, Oct 15, 3-5pm in the AI lab (DC2306C) to answer questions regarding the marking scheme.
Assignment 2: due Oct 23
- Click here for the assignment
- Farheen Omar will hold special office hours on Friday, Oct 31, 10am-12pm in the AI lab (DC2306C) to answer questions regarding the marking scheme.
Assignment 3: due Nov 17 (extended from Nov 13)
- Clarification: as explained in class, learning a decision tree with a depth-first construction (as described in the lecture slides and the textbook) will yield an unbalanced tree with a few long skinny branches. To avoid this, use a priority queue to select the next leaf to split based on information gain, or do an iterative deepening construction. Iterative deepening permits you to use the depth-first approach described in the lecture slides and the textbook, but limits the depth of the tree and gradually increases it. (A sketch of the priority-queue idea appears after this list.) To give you some time to adjust your decision tree learning algorithm, the deadline has been extended to Monday, Nov 17. You can submit your assignment anytime on Monday in the drop-off box.
- Click here for the assignment
- Train and test your algorithms with a subset of the 20 newsgroups dataset. More precisely, you will use the documents posted on the alt.atheism and comp.graphics newsgroups. To save you the trouble of writing a parser for arbitrary text, I converted the relevant documents to a simple encoding (files below). Each line of the files trainData.txt and testData.txt is formatted "docId wordId", which indicates that word wordId is present in document docId. The files trainLabel.txt and testLabel.txt indicate the label/category (1=alt.atheism or 2=comp.graphics) for each document (docId = line#). The file words.txt indicates which word corresponds to each wordId (denoted by the line#). If you are using Matlab, the file loadScript.m provides a simple script to load the files into appropriate matrices. At the Matlab prompt, just type "loadScript" to execute the script. Feel free to use any other language and to build your own parser if you prefer. (A minimal loading example appears right after this list.)
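For reference, here is a rough idea of how the encoding above maps to matrices. This is only a sketch, not the actual contents of loadScript.m; the variable names are mine.

```matlab
% Load the simple "docId wordId" encoding into a binary doc-by-word matrix.
train  = load('trainData.txt');        % two columns per line: docId wordId
yTrain = load('trainLabel.txt');       % one label per line: 1 or 2
words  = textread('words.txt', '%s');  % word for each wordId (= line number)

% Xtrain(d,w) is true iff word w occurs in document d.
Xtrain = sparse(train(:,1), train(:,2), 1, numel(yTrain), numel(words)) > 0;

% testData.txt and testLabel.txt load the same way.
```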
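And here is a sketch of the priority-queue (best-first) construction mentioned in the clarification above. It is not the required solution: the function names (bestFirstTree, splitGain, ent), the node layout, and the linear frontier scan standing in for a real priority queue are all my own choices. It assumes the Xtrain/yTrain matrices from the loading example; save it as bestFirstTree.m.

```matlab
function tree = bestFirstTree(X, y, maxSplits)
  % Root covers all documents; feat = 0 marks a leaf.
  node = struct('docs', 1:size(X,1), 'feat', 0, 'kids', [0 0], 'label', mode(y));
  tree = node;
  frontier = 1;                        % indices of leaves eligible to split
  for s = 1:maxSplits
    % Pick the frontier leaf whose best split has maximum information gain.
    % (A real priority queue would avoid rescanning; this keeps it short.)
    bestG = 0; bestN = 0; bestF = 0;
    for n = frontier
      [g, f] = splitGain(X, y, tree(n).docs);
      if g > bestG, bestG = g; bestN = n; bestF = f; end
    end
    if bestN == 0, break; end          % no leaf has positive gain: stop
    d = tree(bestN).docs;
    for v = 0:1                        % two children: word absent / present
      node.docs  = d(X(d, bestF) == v);
      node.label = mode(y(node.docs));
      tree(end+1) = node;
      tree(bestN).kids(v+1) = numel(tree);
    end
    tree(bestN).feat = bestF;
    frontier = [frontier(frontier ~= bestN), tree(bestN).kids];
  end
end

function [gain, feat] = splitGain(X, y, docs)
  % Information gain of the best single word test on this leaf's documents.
  H = @(idx) ent([mean(y(idx) == 1), mean(y(idx) == 2)]);
  base = H(docs); n = numel(docs);
  g = zeros(1, size(X,2));
  for f = 1:size(X,2)
    on = docs(X(docs,f) == 1); off = docs(X(docs,f) == 0);
    if isempty(on) || isempty(off), continue; end   % skip useless splits
    g(f) = base - numel(on)/n*H(on) - numel(off)/n*H(off);
  end
  [gain, feat] = max(g);
end

function h = ent(p)
  p = p(p > 0);                        % drop zero-probability terms
  h = -sum(p .* log2(p));
end
```

For example, `tree = bestFirstTree(Xtrain, yTrain, 100);` grows the tree one best split at a time; classifying a document is then just a walk from the root following the feat tests.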
Assignment 4: due Dec 1
- Click here for the assignment
- Clarifications
- Question 1b: Assume that the Wi variables can take W values (e.g., W words) and the Ti variables can take T values (i.e., T tags) when indicating the number of parameters.
- Question 2: The "+" in the first rule should be before "c" (e.g., HasWord(+w,p) => Topic(+c,p)).