CS486/686 - Assignments
There will be four assignments in this course, each worth 10% of the final mark (7% for CS686). Each assignment will have a theoretical part and a programming part. Assignments are done individually (i.e., no teams). You are free to program in the language of your choice; however, Matlab is recommended since it provides a convenient high-level programming environment for matrix operations. If you decide to program in Matlab, the IST group maintains a nice set of online references for Matlab, including a tutorial.
The approximate out and due dates are:
- A1: out Sept 16, due Oct 2
- A2: out Oct 2, due Oct 23
- A3: out Oct 28, due Nov 17
- A4: out Nov 18, due Dec 1
For each assignment, a hard copy must be handed in on the due date, either in class or in slots 6 or 7 of assignment drop-off box #2 (3rd floor of the Math building, near the bridge to DC). No late assignments will be accepted.
Assignment 1: due Oct 2
- Click here for the assignment
- Test problems. NB: The handout requests that you report the average time and average number of nodes searched for each category (easy, medium, and hard); however, there is only one test problem per category, so just report the time and number of nodes (no averages).
- Jie Zhang will hold special office hours on Wednesday, Oct 15, 3-5pm in the AI lab (DC2306C) to answer questions regarding the marking scheme.
Assignment 2: due Oct 23
- Click here for the assignment
- Farheen Omar will hold special office hours on Friday, Oct 31, 10am-12pm in the AI lab (DC2306C) to answer questions regarding the marking scheme.
Assignment 3: due Nov 17 (extended from Nov 13)
- Clarification: as explained in class, learning a decision tree with a depth-first construction (as described in the lecture slides and the textbook) will yield an unbalanced tree with a few long skinny branches. To avoid this, use a priority queue to select the next leaf to split based on information gain, or do an iterative deepening construction. Iterative deepening permits you to use the depth-first approach described in the lecture slides and the textbook, but limits the depth of the tree and gradually increases it. (A sketch of the priority-queue idea appears after this list.) To give you some time to adjust your decision tree learning algorithm, the deadline has been extended to Monday, Nov 17. You can submit your assignment anytime on Monday in the drop-off box.
- Click here for the assignment
- Train and test your algorithms with a subset of the 20 newsgroups dataset. More precisely, you will use the documents posted on the alt.atheism and comp.graphics newsgroups. To save you the trouble of writing a parser for arbitrary text, I converted the relevant documents to a simple encoding (files below). Each line of the files trainData.txt and testData.txt is formatted "docId wordId", which indicates that word wordId is present in document docId. The files trainLabel.txt and testLabel.txt indicate the label/category (1=alt.atheism or 2=comp.graphics) for each document (docId = line#). The file words.txt indicates which word corresponds to each wordId (denoted by the line#). If you are using Matlab, the file loadScript.m provides a simple script to load the files into appropriate matrices. At the Matlab prompt, just type "loadScript" to execute the script. Feel free to use any other language and to build your own parser if you prefer. (A minimal loading example appears right after this list.)
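For reference, here is a rough idea of how the encoding above maps to matrices. This is only a sketch, not the actual contents of loadScript.m; the variable names are mine.

```matlab
% Load the simple "docId wordId" encoding into a binary doc-by-word matrix.
train  = load('trainData.txt');        % two columns per line: docId wordId
yTrain = load('trainLabel.txt');       % one label per line: 1 or 2
words  = textread('words.txt', '%s');  % word for each wordId (= line number)

% Xtrain(d,w) is true iff word w occurs in document d.
Xtrain = sparse(train(:,1), train(:,2), 1, numel(yTrain), numel(words)) > 0;

% testData.txt and testLabel.txt load the same way.
```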
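And here is a sketch of the priority-queue (best-first) construction mentioned in the clarification above. It is not the required solution: the function names (bestFirstTree, splitGain, ent), the node layout, and the linear frontier scan standing in for a real priority queue are all my own choices. It assumes the Xtrain/yTrain matrices from the loading example; save it as bestFirstTree.m.

```matlab
function tree = bestFirstTree(X, y, maxSplits)
  % Root covers all documents; feat = 0 marks a leaf.
  node = struct('docs', 1:size(X,1), 'feat', 0, 'kids', [0 0], 'label', mode(y));
  tree = node;
  frontier = 1;                        % indices of leaves eligible to split
  for s = 1:maxSplits
    % Pick the frontier leaf whose best split has maximum information gain.
    % (A real priority queue would avoid rescanning; this keeps it short.)
    bestG = 0; bestN = 0; bestF = 0;
    for n = frontier
      [g, f] = splitGain(X, y, tree(n).docs);
      if g > bestG, bestG = g; bestN = n; bestF = f; end
    end
    if bestN == 0, break; end          % no leaf has positive gain: stop
    d = tree(bestN).docs;
    for v = 0:1                        % two children: word absent / present
      node.docs  = d(X(d, bestF) == v);
      node.label = mode(y(node.docs));
      tree(end+1) = node;
      tree(bestN).kids(v+1) = numel(tree);
    end
    tree(bestN).feat = bestF;
    frontier = [frontier(frontier ~= bestN), tree(bestN).kids];
  end
end

function [gain, feat] = splitGain(X, y, docs)
  % Information gain of the best single word test on this leaf's documents.
  H = @(idx) ent([mean(y(idx) == 1), mean(y(idx) == 2)]);
  base = H(docs); n = numel(docs);
  g = zeros(1, size(X,2));
  for f = 1:size(X,2)
    on = docs(X(docs,f) == 1); off = docs(X(docs,f) == 0);
    if isempty(on) || isempty(off), continue; end   % skip useless splits
    g(f) = base - numel(on)/n*H(on) - numel(off)/n*H(off);
  end
  [gain, feat] = max(g);
end

function h = ent(p)
  p = p(p > 0);                        % drop zero-probability terms
  h = -sum(p .* log2(p));
end
```

For example, `tree = bestFirstTree(Xtrain, yTrain, 100);` grows the tree one best split at a time; classifying a document is then just a walk from the root following the feat tests.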
Assignment 4: due Dec 1
- Click here for the assignment
- Clarifications
- Question 1b: Assume that the Wi variables can take W values (e.g., W words) and the Ti variables can take T values (i.e., T tags) when indicating the number of parameters.
- Question 2: The "+" in the first rule should be before "c" (e.g., HasWord(+w,p) => Topic(+c,p)).