CS486/686 - Assignments
There will be four assignments, each worth 10% of the final mark
(7% for CS686). Each assignment will have a theoretical part and a
programming part. Assignments are done individually (i.e., no
teams). The approximate out and due dates are:
- A1: out May 8, due May 19 (11:59 pm)
- A2: out May 23, due June 2 (11:59 pm)
- A3: out June 19, due June 30 (11:59 pm)
- A4: out July 5, due July 21 (11:59 pm)
On the due date of an assignment, the work done to date should be
submitted electronically on the LEARN website; further material may
be submitted with a 2% penalty for every rounded-up hour past the
deadline. For example, an assignment submitted 5 hours and 15
minutes late will receive a penalty of ceiling(5.25) * 2% = 12%.
Assignments submitted more than 50 hours late will not be marked.
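The late-penalty rule above can be expressed as a short calculation; a minimal sketch (the function name is illustrative, and treating "not marked" after 50 hours as a 100% penalty is an assumption for the sake of the example):

```python
import math

def late_penalty(hours_late):
    """Penalty in percent: 2% per rounded-up hour late.

    Assumption for this sketch: submissions more than 50 hours late
    are not marked, which we represent as a 100% penalty.
    """
    if hours_late <= 0:
        return 0
    if hours_late > 50:
        return 100  # not marked
    return math.ceil(hours_late) * 2

# The example from the policy: 5 hours and 15 minutes late.
print(late_penalty(5.25))  # 12
```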
Assignment 1: due May 19 (11:59 pm)
- Click here for the assignment
- Click here for the test
problems. The problems are taken from www.websudoku.com
and are labeled "easy", "medium", "hard" and "evil" to reflect
the category that each problem is taken from. Note that it
is not clear how those labels are assigned and therefore they
may not reflect the level of difficulty encountered by search
algorithms.
- Timmy Tse (trttse [at] uwaterloo [dot] ca) and Edward Cheung
(eycheung [at] uwaterloo [dot] ca) are the TAs responsible for
Assignment 1. You can meet them by going to the TAs'
office hours on Thursdays 10:30-12:30 in the AI lab (DC2306C).
- The TAs will hold office hours on Thursday June 1, 10:30-12:30
in the AI lab (DC2306C) to answer questions about the marking
scheme.
Assignment 2: due June 2 (11:59 pm)
- Click here for the assignment
- Paulo Pacheco (ppacheco [at] uwaterloo [dot] ca) and Jianqiao
Shen (j26shen [at] uwaterloo [dot] ca) are the TAs responsible
for Assignment 2. You can meet them by going to the TAs'
office hours on Thursdays 10:30-12:30 in the AI lab (DC2306C).
- The TAs will hold office hours on Thursday June 15,
10:30-12:30 in the AI lab (DC2306C) to answer questions about
the marking scheme.
Assignment 3: due June 30 (11:59 pm)
- Click here for the assignment
- Train and test your algorithms with a subset of the 20 newsgroups
dataset. More precisely, you will use the documents posted on the
alt.atheism and comp.graphics newsgroups. To save you the trouble
of writing a parser for arbitrary text, I converted the relevant
documents to a simple encoding (files below). Each line of the
files trainData.txt and testData.txt is formatted "docId wordId",
which indicates that word wordId is present in document docId. The
files trainLabel.txt and testLabel.txt indicate the label/category
(1=alt.atheism or 2=comp.graphics) for each document (docId =
line#). The file words.txt indicates which word corresponds to
each wordId (denoted by the line#).
- Daniel Patrick Recoskie (dprecosk [at] uwaterloo [dot] ca) and
Mohamed Sabri (mmsabri [at] uwaterloo [dot] ca) are the TAs in
charge of Assignment 3. You can meet them by going to the
TAs' office hours on Thursdays 10:30-12:30 in the AI lab
(DC2306C).
- The TAs will hold office hours on Thursday July 13,
10:30-12:30 in the AI lab (DC2306C) to answer questions about
the marking scheme.
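The "docId wordId" encoding described above takes only a few lines of Python to parse; a minimal sketch (the function names and the tiny inline example are illustrative, not taken from the actual data files):

```python
def parse_data(lines):
    """Parse "docId wordId" lines into a dict mapping each docId
    to the set of wordIds present in that document."""
    docs = {}
    for line in lines:
        doc_id, word_id = map(int, line.split())
        docs.setdefault(doc_id, set()).add(word_id)
    return docs

def parse_labels(lines):
    """Labels are listed one per line; the docId is the 1-based line number."""
    return {i + 1: int(label) for i, label in enumerate(lines)}

# Tiny illustrative example (not real entries from trainData.txt):
train = parse_data(["1 10", "1 12", "2 10"])
labels = parse_labels(["1", "2"])
print(train[1])   # {10, 12}
print(labels[2])  # 2
```

To use this on the real files, pass `open("trainData.txt")` and `open("trainLabel.txt")` as the line iterables.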
Assignment 4: due July 21 (11:59 pm)
- Click here for the assignment
- Description of the grid world: gridWorld.py
- Instructions for Question 3:
- Construct a deep Q-network with the following configuration:
- Input layer of 4 nodes (corresponding to the 4 state
features)
- Two hidden layers of 10 rectified linear units (fully
connected)
- Output layer of 2 identity units (fully connected) that
compute the Q-values of the two actions
- Train this neural network by gradient Q-learning with the
following parameters:
- Discount factor: gamma=0.99
- Exploration strategy: epsilon-greedy with epsilon=0.05
- Use AdagradOptimizer(learningRate=0.1),
AdamOptimizer(learningRate=0.1) or
GradientDescentOptimizer(learningRate=0.01). The
Adagrad and Adam optimizers automatically adjust the
learning rate in gradient descent and therefore perform
better in practice.
- Maximum horizon of 500 steps per episode (An episode may
terminate earlier if the pole falls before 500 steps.
The gym simulator will set the flag "done" to true when the
pole has fallen.)
- Train for a maximum of 1000 episodes
- Produce a graph that shows the discounted total reward
(y-axis) earned in each training episode as a function of the
number of training episodes. Produce 4 curves for the
following 4 scenarios:
- Q-learning (no experience replay and no target network)
- Q-learning with experience replay (no target
network). Use a replay buffer of size 1000 and replay
a mini-batch of size 50 after each new experience.
- Q-learning with a target network (no experience
replay). Update the target network after every 2
episodes.
- Q-learning with experience replay and a target
network. Use a replay buffer of size 1000 and replay a
mini-batch of size 50 after each new experience. Update
the target network after every 2 episodes.
- Discuss the results. Explain the impact of experience
replay and the target network based on what you observed in
your experiments.
- Submit your code
- Michael Cormier (m4cormie [at] uwaterloo [dot] ca) and Paulo
Pacheco (ppacheco [at] uwaterloo [dot] ca) are the TAs
responsible for Assignment 4. You can meet them by going
to the TAs' office hours on Thursdays 10:30-12:30 in the AI
lab (DC2306C).
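The experience-replay, exploration and evaluation pieces of the instructions above are framework-independent and can be sketched in plain Python. The buffer capacity (1000), mini-batch size (50), epsilon (0.05) and gamma (0.99) follow the assignment's parameters; the class and function names themselves are illustrative, not prescribed:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size buffer of (state, action, reward, next_state, done)
    experiences; the oldest experience is dropped once capacity is reached."""
    def __init__(self, capacity=1000):
        self.buffer = deque(maxlen=capacity)

    def add(self, experience):
        self.buffer.append(experience)

    def sample(self, batch_size=50):
        # Replay a mini-batch of stored experiences (fewer if the
        # buffer does not yet hold batch_size experiences).
        k = min(batch_size, len(self.buffer))
        return random.sample(self.buffer, k)

def epsilon_greedy(q_values, epsilon=0.05):
    """Pick a uniformly random action with probability epsilon,
    otherwise the greedy (highest-Q) action."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def discounted_return(rewards, gamma=0.99):
    """Discounted total reward of one episode (the y-axis of the graph)."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total
```

For the target-network variants, keep a second copy of the network's weights and overwrite it with the online network's weights every 2 episodes; the Q-learning target is then computed from that frozen copy rather than the network being trained.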