Home Goals Textbook Schedule Assignments Critiques Presentation Project Marks Policies Pascal's Homepage

CS885 Spring 2018 - Reinforcement Learning

There will be three assignments, each worth 10% of the final grade. Assignments are done individually (i.e., no team). Each assignment will have a programming part to be done in Python. Some assignments will make use of TensorFlow and OpenAI Gym. For GPU acceleration, feel free to use Google's Colaboratory environment. This is a free cloud service where you can run Python code (including TensorFlow, which is pre-installed) with GPU acceleration. A virtual machine with two CPUs and one Nvidia K80 GPU will run up to 12 hours after which it must be restarted. The following steps are recommended:

The approximate out and due dates are:

On the due date of an assignment, the work done to date should be submitted electronically on the LEARN website; further material may be submitted with a 2% penalty for every rounded up hour past the deadline. For example, an assignment submitted 5 hours and 15 min late will receive a penalty of ceiling(5.25) * 2% = 12%. Assignments submitted more than 50 hours late will not be marked.

Assignment 1: out May 23, due June 5 (11:59 pm)

This assignment has three parts.

Part I

In the first part, you will program value iteration, policy iteration and modified policy iteration for Markov decision processes in Python. More specifically, fill in the functions in the skeleton code of the file MDP.py. The file TestMDP.py contains the simple MDP example that we saw Lecture 2a Slides 13-14. You can verify that your code compiles properly with TestMDP.py by running "python TestMDP.py". Add print statements to this file to verify that the output of each function makes sense.

Submit the following material via LEARN:

Part II

In the second part, you will program the Q-learning algorithm in Python. More specifically, fill in the functions in the skeleton code of the file RL.py. This file requires the file MDP.py that you programmed for part I so make sure to include it in the same directory. The file TestRL.py contains a simple RL problem to test your functions (i.e. the output of each function will be printed to the screen). You can verify that your code compiles properly by running "python TestRL.py".

Submit the following material via LEARN:

Part III

In the third part, you will train a deep Q-network in OpenAI Gym to solve problems with continuous states that prevent the use of a tabular representation. Instead, you will use a neural network to represent the Q-function. Follow these steps to get started:

Submit the following material via LEARN:

Assignment 2: out June 15, due June 29 (11:59 pm)

This assignment has two parts.

Part I

In the first part, you will program three bandit algorithms (epsilon-greedy, Thompson sampling and UCB) and two RL algorithms (REINFORCE and model-based RL) in Python. More specifically, fill in the functions in the skeleton code of the file RL2.py. This file requires the file MDP.py that you programmed in Assignment 1 so make sure to include it in the same directory. The file TestRL2.py contains a simple bandit and RL problems to test your functions (i.e. the output of each function will be printed to the screen). You can verify that your code compiles properly by running "python TestRL2.py".

Submit the following material via LEARN:

Part II

In the second part, you will use deep deterministic policy gradient (DDPG) in OpenAI Gym to solve problems with continuous actions. Follow these steps to get started:

Submit the following material via LEARN: