CS886 - Assignments

There will be five assignments, each worth 7% of the final mark. Each assignment must be done individually (no teams) and will consist entirely of programming questions. More precisely, you will be asked to program some algorithms for sequential decision making and reinforcement learning, and to test them on some datasets. Programs must be written in Python and submitted via Marmoset, an automated system that runs and evaluates programs.

Assignment 2: out July 11, due July 25

In this assignment, you will program two reinforcement learning algorithms (Q-learning and model-based active RL) and three bandit algorithms (epsilon-greedy, Thompson sampling and UCB) in Python (Python 2.6 is sufficient). More specifically, fill in the functions in the skeleton code of the file RL.py. This file requires the file MDP.py that you programmed for assignment 1 so make sure to include it in the same directory. The file TestRL.py contains simple RL and bandit problems to test your functions (i.e. the output of each function will be printed to the screen). You can verify that your code compiles properly by running "python TestRL.py".

We will not use Marmoset for this assignment due to the stochastic nature of RL, which makes it difficult to ensure that everyone has the same output. Instead, you will produce some graphs showing the convergence rate of each algorithm on some test problems. Instructions to that effect will be posted shortly.

Skeleton code: RL.py (requires MDP.py from assignment 1)
Simple RL and bandit problems to test your code: TestRL.py

Hand in two graphs and a brief discussion that compare the convergence rate of the RL and bandit techniques. Submit your graphs and discussion by email. Suggestion: use matplotlib in Python to produce graphs. More precisely, use the function pyplot.errorbar to produce graphs with error bars.

RL graph:

X axis: episode # (from 0 to 100)
Y axis: average reward per episode (20 steps) with error bars based on 100 trials. For the error bars, report the standard deviation of the empirical mean (a.k.a. standard error). The standard deviation of the empirical mean is std/sqrt(n) where n is the sample size (i.e. # trials)
Two curves: Q-learning, model-based active RL

Problem: Advertising/saving problem in TestRL.py
Initial State: 0
Initial Q function (in Q-learning): all zeros
Initial R (in model-based RL): all zeros
default T (in model-based RL): all uniform distributions
Epsilon: 0.3
Temperature (in Q-learning): 0
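
As a rough illustration of the error bars described above, the following sketch computes the per-episode mean and standard error (std/sqrt(n)) across trials and passes them to pyplot.errorbar. The names mean_and_stderr, plot_curve and rewards_per_trial are hypothetical and are not part of RL.py. The sketch assumes the sample standard deviation (dividing by n - 1), since the text only says "std", and it needs at least two trials.

```python
import math


def mean_and_stderr(rewards_per_trial):
    """rewards_per_trial[t][e] is the average reward of trial t in episode e.

    Returns two lists indexed by episode: the mean over trials and the
    standard error of that mean (std / sqrt(n), with n = number of trials).
    """
    n_trials = len(rewards_per_trial)
    n_episodes = len(rewards_per_trial[0])
    means, stderrs = [], []
    for e in range(n_episodes):
        column = [trial[e] for trial in rewards_per_trial]
        mean = sum(column) / float(n_trials)
        # Assumption: sample variance (n - 1 in the denominator).
        var = sum((x - mean) ** 2 for x in column) / float(n_trials - 1)
        means.append(mean)
        stderrs.append(math.sqrt(var) / math.sqrt(n_trials))
    return means, stderrs


def plot_curve(means, stderrs, label):
    """Draw one curve with error bars; call once per algorithm."""
    import matplotlib.pyplot as plt  # imported here so the helper above works without it
    plt.errorbar(range(len(means)), means, yerr=stderrs, label=label)
```

After calling plot_curve once per algorithm, the usual pyplot calls (xlabel, ylabel, legend, savefig) finish the graph.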

Bandit graph:

X axis: episode # (from 0 to 100)
Y axis: average reward per episode (1 step) with error bars based on 1000 trials. For the error bars, report the standard deviation of the empirical mean (a.k.a. standard error). The standard deviation of the empirical mean is std/sqrt(n) where n is the sample size (i.e. # trials).
Three curves: epsilon-greedy, Thompson sampling, UCB

Problem: 3 arms with probabilities 0.3, 0.5, 0.7 as described in TestRL.py
Prior (Thompson sampling): all ones
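
One possible way to organize the bandit experiment is sketched below: a simulated 3-armed Bernoulli bandit with the probabilities listed above, and a loop that records one reward per step for each trial. Everything here (pull, random_policy, run_trials, the counts/sums bookkeeping) is hypothetical and unrelated to the interface in RL.py. The placeholder policy simply picks arms uniformly at random and would be replaced by your own algorithm. The default of 100 steps per trial is an assumption based on the X axis range.

```python
import random

ARM_PROBS = [0.3, 0.5, 0.7]  # success probability of each arm (from the problem statement)


def pull(arm, rng):
    """Return a Bernoulli reward (1 or 0) for the chosen arm."""
    return 1 if rng.random() < ARM_PROBS[arm] else 0


def random_policy(counts, sums, rng):
    """Placeholder policy: ignores its statistics and picks an arm uniformly at random."""
    return rng.randrange(len(ARM_PROBS))


def run_trials(policy, n_trials=1000, n_steps=100, seed=0):
    """Return rewards[t][s]: the reward obtained in trial t at step s."""
    rng = random.Random(seed)
    rewards = []
    for _trial in range(n_trials):
        counts = [0] * len(ARM_PROBS)  # number of pulls per arm so far
        sums = [0] * len(ARM_PROBS)    # total reward per arm so far
        trial_rewards = []
        for _step in range(n_steps):
            arm = policy(counts, sums, rng)
            reward = pull(arm, rng)
            counts[arm] += 1
            sums[arm] += reward
            trial_rewards.append(reward)
        rewards.append(trial_rewards)
    return rewards
```

The nested list returned by run_trials has one row per trial, so averaging each column over the rows gives the curve to plot and the spread needed for the error bars.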

Discussion: briefly explain the results obtained in each graph. Are the results surprising? Do they match what is expected based on the theory?

Assignment 1: out June 3, due June 19

In this assignment, you will program value iteration, policy iteration and modified policy iteration for Markov decision processes in Python (Python 2.6 is sufficient). More specifically, fill in the functions in the skeleton code of the file MDP.py. The file TestMDP.py contains the simple MDP example that we saw in Module 5. You can verify that your code compiles properly with TestMDP.py by running "python TestMDP.py". Add print statements to this file to verify that the output of each function makes sense.

We will use Marmoset for automated grading of the assignments. Instructions on how to submit your code to Marmoset will be posted shortly.

Skeleton code: MDP.py
Simple MDP from Module 5 to test your code: TestMDP.py

Using Marmoset to submit assignments

Basic Logistics

1. Go to https://marmoset.student.cs.uwaterloo.ca/ and log in using your WatIAM info. This should be the same as the info you use to log in to Quest.
2. Click the "as" button under "Authenticate". You should only have one choice here.
3. Click "CS886 (Fall 2013):".
Note: If you don't see what you're supposed to see in steps 2 or 3, then you are probably not yet added as a student to CS886 on Marmoset. Contact Pascal to resolve this issue.
4. You should now be able to see the assignments that have been set up on Marmoset for CS886. You can submit your assignment files to Marmoset via the "web submission" page for each assignment question.

Submitting Assignments

If only one file is required for an assignment question, you can submit only that file. If multiple files are required for an assignment question, you must zip all of the required files and submit the zip file. Make sure that every required file has exactly the required name; otherwise, you will receive "did not compile" as the test result.
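
If you prefer to build the zip file from Python rather than with a desktop tool, a minimal sketch using the standard zipfile module is shown below. The function name make_submission_zip and the file names in the usage note are hypothetical. Storing each file at the top level of the archive (no directories) is an assumption about what Marmoset expects.

```python
import os
import zipfile


def make_submission_zip(zip_path, file_paths):
    """Write the given files into zip_path, each stored under its bare file name."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in file_paths:
            # arcname drops any directory prefix so the file sits at the archive root.
            zf.write(path, arcname=os.path.basename(path))
    return zip_path
```

For example, make_submission_zip("submission.zip", ["MDP.py", "RL.py"]) would produce an archive containing exactly those two names.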

Tests on Marmoset

There are two types of tests that we will use on Marmoset in CS886: public tests and release tests.

Public tests are designed to help you verify that your program works well. They are simple tests that test a variety of cases.

Release tests are designed to test whether your program generalizes as expected to the full range of problems.

After Submitting a Program

When you make a submission to an assignment question on Marmoset, your submission will be automatically tested on a CSCF server. After a while (a few seconds to a few minutes, depending on server load and program and test complexity), the tests should finish and a result will be available.

Interpreting Test Results

If your submitted program does not compile or run successfully on its own, your submission will receive a result of "did not compile" and the detailed test results will contain something similar to the error message you get if you ran your program yourself. In this case, your submission will not be tested with any of the tests.

If your submitted program runs successfully on its own, it will be tested with all of the public and release tests.

If it fails any public test, the detailed test results will display an error message for that public test. In this case, you won't be able to see how your program did on the release tests (although it was run on the release tests anyway).

If it passes all of the public tests, you will have the option to see how your program did on the release tests. If you do so, you will use up one of your "release tokens" for that question. Normally, for every assignment question, you will be initially given 3 release tokens. If you use up one or more of them, one release token will regenerate once every 12 hours, until you have 3 release tokens again. Start your work early if you want to have more chances to see the results of the release tests. If the deadline passes before your token regenerates, you can still submit, though you will not be able to tell how your submission did on the release tests.

Marmoset automatically tests each submission with all of the release tests. If your submission fails a release test and you use a token to see the results, you will only see that test and one more test in the detailed test results. If your submission passes all the release tests, you will not see any release tests in the detailed test results, but you will be credited with full marks for that question.

If you fail a release test, Marmoset won't tell you what the correct answer is, but there will be enough information for you to know what the test case was about.

Common Marmoset Test Messages

Here are the possible results of requesting a release test:

"Success: test X passed", where X is the name of the test. You can pat yourself on the back for this one.

"Error: wrong output for test X". This means that your program did not produce the output that we expected.

"Error: program ran out of resources while running test X". This means that your program took too much time or too much memory during the test.

"Error: program encountered error while running test X". This means that your program halted with an error during the test.

"Error: program ran out of time while initializing". This happens if your program takes too much time when it is evaluated by Marmoset, before any of our tests are run.

"Error: program ran out of memory while initializing" (same as above, but with memory).

"Error: program failed to initialize". This happens if you use the wrong language level, if you have non-text elements in your program (such as values snipped out of the Interactions window and pasted into the Definitions window), or if you are doing something that is not permitted, such as file I/O.

Grading

Your grade for each assignment will be the number of release tests that your program passed. Public tests do not count, since Marmoset tells you the correct answer when you fail a public test.

There is no penalty for multiple submissions. Your best submission counts. The only thing stopping you from spamming Marmoset with many submissions is your own conscience, so don't do it. Please remember that the server is a shared resource; out of courtesy to your fellow students, do not do anything that overloads it, especially close to deadlines.