University of Waterloo

Term and Year of Offering: Spring 2013

Course Number and Title: CS886, Sequential Decision Making and Reinforcement Learning


Discussion Forum:

Video Conferencing: contact Mike Willson for an account

Component  Section  Campus Location  Time         Days  Building  Room
LEC        002      UW U             11:00-12:20  MW    E5        3052

Instructor's Name  Office Location  Office Hours
Pascal Poupart     DC2514           by appointment

Course Description:

With the proliferation of sensors, organizations are now collecting streams of data about all kinds of processes (e.g., physiological measurements, financial transactions, energy consumption, text messages).  This data needs to be processed so that intelligent decisions can be made to optimize the underlying processes (e.g., assistive technologies, portfolio management, energy optimization, dialog management, robotic control).  Hence, this course will cover the theory and practice of sequential decision making.  More precisely, we will focus on Markov decision processes, which provide a general framework to model and optimize a wide range of decision processes in health informatics, robotics, computational finance, human-computer interaction, computational sustainability, operations research, and other areas.  Since the dynamics of a process are usually only partially known at the time decisions are made, we will also cover reinforcement learning, which provides a framework to simultaneously learn about a process while making decisions.

Course Objectives:

At the end of the course, students should have the ability to:

Course Overview:

The topics we will cover include:


There is no required textbook.  However, optional complementary readings from several references will be recommended (see the course schedule).

NB: The textbooks by Csaba Szepesvári [Sze] and by Mausam and Andrey Kolobov [MauKo] can be accessed electronically on campus (through UW's subscription) but not from home.


The grading scheme for the course is as follows:

NB: For an audit mark, you need to submit the assignments.


There will be five assignments.  Each assignment must be done individually (i.e., no teams) and will consist entirely of programming questions.  More precisely, you will be asked to implement algorithms for sequential decision making and reinforcement learning and to test them on datasets.  Programs must be written in Python and submitted via Marmoset, an automated system that runs and evaluates programs.

You will also write five paper critiques.  They must be done individually (i.e., no teams).  Paper critiques must be saved in pdf format and submitted by email to the instructor.


There is no midterm and no final exam.

Rules for Group Work:

Assignments, paper critiques and the project must be done individually. 

Indication of how late submission of assignments and missed assignments will be treated

On the due date of an assignment, programs should be submitted via Marmoset.  Late programs may be submitted for half credit within 24 hours. Programs submitted more than 24 hours late will not be marked.

Paper critiques must be submitted by email to the instructor in pdf format.  Late critiques may be submitted for half credit within 24 hours.

Indication of where students are to submit assignments and pick up marked assignments

Assignments must be submitted electronically via Marmoset.  Only a mark based on the performance of the program is returned, and it will be made available via Marmoset.

Paper critiques must be submitted by email in pdf format.  Marked critiques will be returned by email.

Academic Integrity: In order to maintain a culture of academic integrity, members of the University of Waterloo community are expected to promote honesty, trust, fairness, respect and responsibility. [Check for more information.]

Grievance: A student who believes that a decision affecting some aspect of his/her university life has been unfair or unreasonable may have grounds for initiating a grievance. Read Policy 70, Student Petitions and Grievances, Section 4. When in doubt, please contact the department's administrative assistant, who will provide further assistance.

Discipline: A student is expected to know what constitutes academic integrity to avoid committing an academic offence, and to take responsibility for his/her actions. A student who is unsure whether an action constitutes an offence, or who needs help in learning how to avoid offences (e.g., plagiarism, cheating) or about 'rules' for group work/collaboration, should seek guidance from the course instructor, academic advisor, or the undergraduate Associate Dean. For information on categories of offences and types of penalties, students should refer to Policy 71, Student Discipline. For typical penalties, check the Guidelines for the Assessment of Penalties.

Appeals: A decision made or penalty imposed under Policy 70 (Student Petitions and Grievances) (other than a petition) or Policy 71 (Student Discipline) may be appealed if there is a ground. A student who believes he/she has a ground for an appeal should refer to Policy 72 (Student Appeals).

Note for Students with Disabilities: The Office for Persons with Disabilities (OPD), located in Needles Hall, Room 1132, collaborates with all academic departments to arrange appropriate accommodations for students with disabilities without compromising the academic integrity of the curriculum. If you require academic accommodations to lessen the impact of your disability, please register with the OPD at the beginning of each academic term.