There will be five assignments, each worth 7% of the final mark. Each assignment must be done individually (i.e., no teams) and will consist entirely of programming questions. More precisely, you will be asked to program some algorithms for sequential decision making and reinforcement learning, and to test them on some datasets. Programs must be written in Python and submitted via Marmoset, which is an automated system that runs and evaluates programs.
In this assignment, you will program two reinforcement learning
algorithms (Q-learning and model-based active RL) and three bandit
algorithms (epsilon-greedy, Thompson sampling and UCB) in Python
(Python 2.6 is sufficient). More specifically, fill in the
functions in the skeleton code of the file RL.py. This
file requires the file MDP.py that you programmed for assignment 1
so make sure to include it in the same directory. The file
TestRL.py contains simple RL and bandit problems to test your
functions (i.e. the output of each function will be printed to the
screen). You can verify that your code runs without errors by
running "python TestRL.py".
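To make the bandit part concrete, here is a minimal, self-contained sketch of epsilon-greedy and UCB on a Bernoulli bandit. It does not follow the interface of the skeleton in RL.py: the function names (pull, update, epsilon_greedy, ucb1), their arguments and the Bernoulli reward model are all assumptions made for illustration, and the UCB variant shown is the standard UCB1 rule, which may differ in details from what the assignment expects. It is written for Python 3.

```python
import math
import random


def pull(arm_probs, arm, rng):
    """Sample a Bernoulli reward (0 or 1) from the chosen arm (assumed reward model)."""
    return 1 if rng.random() < arm_probs[arm] else 0


def update(means, counts, arm, reward):
    """Update the empirical mean of one arm with an incremental average."""
    counts[arm] += 1
    means[arm] += (reward - means[arm]) / float(counts[arm])


def epsilon_greedy(arm_probs, n_steps, epsilon, rng):
    """Explore uniformly with probability epsilon, otherwise pull the best arm so far."""
    n_arms = len(arm_probs)
    means = [0.0] * n_arms
    counts = [0] * n_arms
    for _ in range(n_steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                       # explore
        else:
            arm = max(range(n_arms), key=lambda a: means[a])  # exploit
        update(means, counts, arm, pull(arm_probs, arm, rng))
    return means, counts


def ucb1(arm_probs, n_steps, rng):
    """Pull each arm once, then pick the arm with the highest upper confidence bound."""
    n_arms = len(arm_probs)
    means = [0.0] * n_arms
    counts = [0] * n_arms
    for t in range(1, n_steps + 1):
        if t <= n_arms:
            arm = t - 1  # initial round: every arm gets one pull
        else:
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        update(means, counts, arm, pull(arm_probs, arm, rng))
    return means, counts
```

For example, epsilon_greedy([0.1, 0.9], 2000, 0.1, random.Random(0)) returns the empirical mean reward and the pull count of each arm. Passing an explicit random.Random instance keeps runs reproducible, which helps when averaging over many runs.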
We will not use Marmoset for this assignment due to the
stochastic nature of RL, which makes it difficult to ensure that
everyone has the same output. Instead, you will produce some
graphs showing the convergence rate of each algorithm on some
test problems. Instructions to that effect will be posted
shortly.
Hand in two graphs and a brief discussion that compare the
convergence rate of the RL and bandit techniques. Submit
your graphs and discussion by email. Suggestion: use
matplotlib in Python to produce graphs. More precisely, use
the function pyplot.errorbar to produce graphs with error bars.
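As a rough illustration of the suggestion above, the following sketch averages reward curves over several independent runs and draws them with pyplot.errorbar. The helper names (mean_and_std, plot_with_error_bars), the data layout (one list of per-step rewards per run), the axis labels and the choice of the sample standard deviation for the error bars are assumptions, not requirements of the assignment. matplotlib is imported inside the plotting function so that the averaging helper can be used without it.

```python
import math


def mean_and_std(runs):
    """Per-step mean and sample standard deviation across independent runs.

    runs: list of equal-length lists, one list of per-step rewards per run.
    """
    n_runs = len(runs)
    n_steps = len(runs[0])
    means, stds = [], []
    for t in range(n_steps):
        values = [run[t] for run in runs]
        m = sum(values) / float(n_runs)
        if n_runs > 1:
            var = sum((v - m) ** 2 for v in values) / float(n_runs - 1)
        else:
            var = 0.0  # a single run has no spread to report
        means.append(m)
        stds.append(math.sqrt(var))
    return means, stds


def plot_with_error_bars(curves, filename):
    """Plot one error-bar curve per algorithm and save the figure to a file.

    curves: dict mapping a label (e.g. an algorithm name) to a list of runs.
    """
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display needed
    from matplotlib import pyplot

    for label, runs in sorted(curves.items()):
        means, stds = mean_and_std(runs)
        steps = list(range(1, len(means) + 1))
        pyplot.errorbar(steps, means, yerr=stds, label=label)
    pyplot.xlabel("step")               # assumed axis labels
    pyplot.ylabel("average reward")
    pyplot.legend()
    pyplot.savefig(filename)
    pyplot.close()
```

A call such as plot_with_error_bars({"epsilon-greedy": runs_a, "UCB": runs_b}, "bandits.png"), where runs_a and runs_b are hypothetical lists of reward curves, puts both algorithms on one graph so their convergence can be compared directly.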
Assignment 1: out June 3, due June 19
Note: If you don't see what you're supposed to see in steps 2 or 3, then you are probably not yet added as a student to CS886 on Marmoset. Contact Pascal to resolve this issue.
If only one file is required for an assignment question, you can submit only that file. If multiple files are required for an assignment question, you must zip all of the required files and submit the zip file. Make sure that every required file has exactly the required name; otherwise, you will receive "did not compile" as the test result.
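If you prefer to build the zip file from Python rather than with a desktop tool, the standard zipfile module is enough. The sketch below is only an illustration: the function name make_submission_zip is hypothetical, and storing the files at the top level of the archive (without directories) is an assumption about what Marmoset expects, so check the instructions for each assignment question.

```python
import os
import zipfile


def make_submission_zip(zip_path, file_paths):
    """Write the given files into zip_path and return the names stored in it.

    Files are stored flat, under their base names only (an assumption about
    the layout the submission system expects).
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in file_paths:
            archive.write(path, arcname=os.path.basename(path))
    # Re-open the archive to confirm which names were actually stored.
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(archive.namelist())
```

Comparing the returned list of names against the required file names before submitting is a quick way to catch a misnamed or missing file.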
There are two types of tests that we will use on Marmoset in CS886: public tests and release tests.
Public tests are designed to help you verify that your program works well. They are simple tests that test a variety of cases.
Release tests are designed to test whether your program
generalizes as expected to the full range of problems.
When you make a submission to an assignment question on Marmoset, your submission will be automatically tested on a CSCF server. After a while (a few seconds to a few minutes, depending on server load and program and test complexity), the tests should finish and a result will be available.
If your submitted program does not compile or run successfully on its own, your submission will receive a result of "did not compile" and the detailed test results will contain something similar to the error message you get if you ran your program yourself. In this case, your submission will not be tested with any of the tests.
If your submitted program runs successfully on its own, it will be tested with all of the public and release tests.
If it fails any public test, the detailed test results will
display an error message for that public test. In this case, you
won't be able to see how your program did on the release tests
(although it was run on the release tests anyway).
If it passes all of the public tests, you will have the option to see how your program did on the release tests. If you do so, you will use up one of your "release tokens" for that question. Normally, for every assignment question, you will initially be given 3 release tokens. If you use up one or more of them, one release token will regenerate every 12 hours until you have 3 release tokens again. Start your work early if you want to have more chances to see the results of the release tests. If the deadline passes before your token regenerates, you can still submit, though you will not be able to tell how your submission did on the release tests.
Marmoset automatically tests each submission with all of the release tests. If your submission fails a release test and you use a token to see the results, you will only see that test and one more test in the detailed test results. If your submission passes all the release tests, you will not see any release tests in the detailed test results, but you will be credited with full marks for that question.
If you fail a release test, Marmoset won't tell you what the
correct answer is, but there will be enough information for you to
know what the test case was about.
Here are the possible results of requesting a release test:
"Success: test X passed", where X is the name of the test. You can pat yourself on the back for this one.
"Error: wrong output for test X". This means that your program did not produce the output that we expected.
"Error: program ran out of resources while running test X". This
means that your program took too much time or too much memory
during the test.
"Error: program encountered error while running test X". This means that your program halted with an error during the test.
"Error: program ran out of time while initializing". This happens if your program takes too much time when it is evaluated by Marmoset, before any of our tests are run.
"Error: program ran out of memory while initializing" (same as above, but with memory).
"Error: program failed to initialize". This happens if you use the wrong language level, if you have non-text elements in your program (such as values snipped out of the Interactions window and pasted into the Definitions window), or if you are doing something that is not permitted, such as file I/O.
Your grade for each assignment will be the number of release
tests that your program passes. Public tests do not count,
since Marmoset tells you what the correct answer is when you fail
a public test.
There is no penalty for multiple submissions. Your best submission counts. The only thing stopping you from spamming Marmoset with many submissions is your own conscience, so don't do it. Please remember that the server is a shared resource; out of courtesy to your fellow students, do not do anything that overloads it, especially close to deadlines.