CS486/686 - Assignments
There will be four assignments, each worth 10% of the final mark
(7% for CS686). Each assignment will have a theoretical part and a
programming part. Assignments are done individually (i.e., no
teams). The approximate out and due dates are:
- A1: out May 8, due May 19 (11:59 pm)
- A2: out May 23, due June 2 (11:59 pm)
- A3: out June 19, due June 30 (11:59 pm)
- A4: out July 5, due July 21 (11:59 pm)
On the due date of an assignment, the work done to date should be
submitted electronically on the LEARN website; further material may
be submitted with a 2% penalty for every rounded-up hour past the
deadline. For example, an assignment submitted 5 hours and 15
minutes late will receive a penalty of ceiling(5.25) * 2% = 12%.
Assignments submitted more than 50 hours late will not be marked.
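The late-penalty rule above can be expressed as a short calculation; a minimal sketch (the function name is illustrative, and treating "not marked" after 50 hours as a 100% penalty is an assumption for the sake of the example):

```python
import math

def late_penalty(hours_late):
    """Penalty in percent: 2% per rounded-up hour late.

    Assumption for this sketch: submissions more than 50 hours late
    are not marked, which we represent as a 100% penalty.
    """
    if hours_late <= 0:
        return 0
    if hours_late > 50:
        return 100  # not marked
    return math.ceil(hours_late) * 2

# The example from the policy: 5 hours and 15 minutes late.
print(late_penalty(5.25))  # 12
```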
Assignment 1: due May 19 (11:59 pm)
- Click here for the assignment
- Click here for the test
problems. The problems are taken from www.websudoku.com
and are labeled "easy", "medium", "hard" and "evil" to reflect
the category that each problem is taken from. Note that it
is not clear how those labels are assigned and therefore they
may not reflect the level of difficulty encountered by search
algorithms.
- Timmy Tse (trttse [at] uwaterloo [dot] ca) and Edward Cheung
(eycheung [at] uwaterloo [dot] ca) are the TAs responsible for
Assignment 1. You can meet them by going to the TAs'
office hours on Thursdays 10:30-12:30 in the AI lab (DC2306C).
- The TAs will hold office hours on Thursday June 1, 10:30-12:30
in the AI lab (DC2306C) to answer questions about the marking
scheme.
Assignment 2: due June 2 (11:59 pm)
- Click here for the assignment
- Paulo Pacheco (ppacheco [at] uwaterloo [dot] ca) and Jianqiao
Shen (j26shen [at] uwaterloo [dot] ca) are the TAs responsible
for Assignment 2. You can meet them by going to the TAs'
office hours on Thursdays 10:30-12:30 in the AI lab (DC2306C).
- The TAs will hold office hours on Thursday June 15,
10:30-12:30 in the AI lab (DC2306C) to answer questions about
the marking scheme.
Assignment 3: due June 30 (11:59 pm)
- Click here for the assignment
- Train and test your algorithms with a subset of the 20 newsgroups
dataset. More precisely, you will use the documents posted on the
alt.atheism and comp.graphics newsgroups. To save you the trouble
of writing a parser for arbitrary text, I converted the relevant
documents to a simple encoding (files below). Each line of the
files trainData.txt and testData.txt is formatted "docId wordId",
which indicates that word wordId is present in document docId. The
files trainLabel.txt and testLabel.txt indicate the label/category
(1=alt.atheism or 2=comp.graphics) for each document (docId =
line#). The file words.txt indicates which word corresponds to
each wordId (denoted by the line#).
- Daniel Patrick Recoskie (dprecosk [at] uwaterloo [dot] ca) and
Mohamed Sabri (mmsabri [at] uwaterloo [dot] ca) are the TAs in
charge of Assignment 3. You can meet them by going to the
TAs' office hours on Thursdays 10:30-12:30 in the AI lab
(DC2306C).
- The TAs will hold office hours on Thursday July 13,
10:30-12:30 in the AI lab (DC2306C) to answer questions about
the marking scheme.
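The "docId wordId" encoding described above takes only a few lines of Python to parse; a minimal sketch (the function names and the tiny inline example are illustrative, not taken from the actual data files):

```python
def parse_data(lines):
    """Parse "docId wordId" lines into a dict mapping each docId
    to the set of wordIds present in that document."""
    docs = {}
    for line in lines:
        doc_id, word_id = map(int, line.split())
        docs.setdefault(doc_id, set()).add(word_id)
    return docs

def parse_labels(lines):
    """Labels are listed one per line; the docId is the 1-based line number."""
    return {i + 1: int(label) for i, label in enumerate(lines)}

# Tiny illustrative example (not real entries from trainData.txt):
train = parse_data(["1 10", "1 12", "2 10"])
labels = parse_labels(["1", "2"])
print(train[1])   # {10, 12}
print(labels[2])  # 2
```

To use this on the real files, pass `open("trainData.txt")` and `open("trainLabel.txt")` as the line iterables.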
Assignment 4: due July 21 (11:59 pm)
- Click here for the assignment
- Description of the grid world: gridWorld.py
- Instructions for Question 3:
- Construct a deep Q-network with the following configuration:
- Input layer of 4 nodes (corresponding to the 4 state
features)
- Two hidden layers of 10 rectified linear units (fully
connected)
- Output layer of 2 identity units (fully connected) that
compute the Q-values of the two actions
- Train this neural network by gradient Q-learning with the
following parameters:
- Discount factor: gamma=0.99
- Exploration strategy: epsilon-greedy with epsilon=0.05
- Use AdagradOptimizer(learningRate=0.1),
AdamOptimizer(learningRate=0.1) or
GradientDescentOptimizer(learningRate=0.01). The
Adagrad and Adam optimizers automatically adjust the
learning rate in gradient descent and therefore perform
better in practice.
- Maximum horizon of 500 steps per episode (An episode may
terminate earlier if the pole falls before 500 steps.
The gym simulator will set the flag "done" to true when the
pole has fallen.)
- Train for a maximum of 1000 episodes
- Produce a graph that shows the discounted total reward
(y-axis) earned in each training episode as a function of the
number of training episodes. Produce 4 curves for the
following 4 scenarios:
- Q-learning (no experience replay and no target network)
- Q-learning with experience replay (no target
network). Use a replay buffer of size 1000 and replay
a mini-batch of size 50 after each new experience.
- Q-learning with a target network (no experience
replay). Update the target network after every 2
episodes.
- Q-learning with experience replay and a target
network. Use a replay buffer of size 1000 and replay a
mini-batch of size 50 after each new experience. Update
the target network after every 2 episodes.
- Discuss the results. Explain the impact of experience
replay and the target network based on what you observed in
your experiments.
- Submit your code
- Michael Cormier (m4cormie [at] uwaterloo [dot] ca) and Paulo
Pacheco (ppacheco [at] uwaterloo [dot] ca) are the TAs
responsible for Assignment 4. You can meet them by going
to the TAs' office hours on Thursdays 10:30-12:30 in the AI
lab (DC2306C).
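The experience-replay, exploration and evaluation pieces of the instructions above are framework-independent and can be sketched in plain Python. The buffer capacity (1000), mini-batch size (50), epsilon (0.05) and gamma (0.99) follow the assignment's parameters; the class and function names themselves are illustrative, not prescribed:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size buffer of (state, action, reward, next_state, done)
    experiences; the oldest experience is dropped once capacity is reached."""
    def __init__(self, capacity=1000):
        self.buffer = deque(maxlen=capacity)

    def add(self, experience):
        self.buffer.append(experience)

    def sample(self, batch_size=50):
        # Replay a mini-batch of stored experiences (fewer if the
        # buffer does not yet hold batch_size experiences).
        k = min(batch_size, len(self.buffer))
        return random.sample(self.buffer, k)

def epsilon_greedy(q_values, epsilon=0.05):
    """Pick a uniformly random action with probability epsilon,
    otherwise the greedy (highest-Q) action."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def discounted_return(rewards, gamma=0.99):
    """Discounted total reward of one episode (the y-axis of the graph)."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total
```

For the target-network variants, keep a second copy of the network's weights and overwrite it with the online network's weights every 2 episodes; the Q-learning target is then computed from that frozen copy rather than the network being trained.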