Course Outline for CS886, Spring 2013

University of Waterloo

Term and Year of Offering: Spring 2013

Course Number and Title: CS886, Sequential Decision Making and Reinforcement Learning

Website: http://www.cs.uwaterloo.ca/~ppoupart/teaching/cs886-spring13/cs886-spring13.html

Discussion Forum: piazza.com/uwaterloo.ca/spring2013/cs886b

Video Conferencing: contact Mike Willson (mike.willson@uwaterloo.ca) for an account

Section	Campus Location	Time/Days	Building/Room	Instructor
LEC 002	UW U	11:00-12:20 MW	E5 3052	Pascal Poupart

Instructor's Name	Office Location	Contact	Office Hours
Pascal Poupart	DC2514	ppoupart@uwaterloo.ca	by appointment

Course Description:

With the proliferation of sensors, organizations now collect streams of data about all kinds of processes (e.g., physiological measurements, financial transactions, energy consumption, text messages). This data needs to be processed so that intelligent decisions can be made from it to optimize desired processes (e.g., assistive technologies, portfolio management, energy optimization, dialog management, robotic control). Hence, this course will cover the theory and practice of sequential decision making. More precisely, we will focus on Markov decision processes, which provide a general framework to model and optimize a wide range of decision processes in health informatics, robotics, computational finance, human-computer interaction, computational sustainability, operations research and other fields. Since the dynamics of a process are usually only partially known at the time decisions are made, we will also cover reinforcement learning, which provides a framework to learn about a process while simultaneously making decisions.

Course Objectives:

At the end of the course, students should have the ability to:

Model sequential decision making tasks
Design algorithms for automated decision making and reinforcement learning

Course Overview:

The topics we will cover include:

Reasoning under uncertainty
Decision Theory
Sequential Decision Making
Markov decision processes

Offline optimization techniques
Online optimization techniques
Partially observable domains
Decentralized decision making
Multi-agent systems

Reinforcement Learning

Model-based techniques
Model-free techniques
Non-parametric techniques
Bayesian reinforcement learning

Potential applications:

Robotic control
Dialog management
Operations research
Health informatics and assistive technologies
Intelligent tutoring systems
Computational finance
Computational sustainability

Textbook:

There is no required textbook. However, optional complementary readings from the following references will be recommended (see the course schedule).

[SiBuf] Olivier Sigaud and Olivier Buffet (2010) Markov Decision Processes in Artificial Intelligence
[Po] Warren B. Powell (2011, 2nd edition) Approximate Dynamic Programming: Solving the Curses of Dimensionality
[SutBar] Richard Sutton and Andy Barto (1998) Reinforcement Learning: An Introduction
[Sze] Csaba Szepesvari (2010) Algorithms for Reinforcement Learning (freely available electronically through UW)
[MauKo] Mausam and Andrey Kolobov (2012) Planning with Markov Decision Processes: An AI Perspective (freely available electronically through UW)
[Pu] Martin L. Puterman (2009, 2nd edition) Markov Decision Processes: Discrete Stochastic Dynamic Programming
[Ber] Dimitri Bertsekas (2012, 4th edition) Dynamic Programming and Optimal Control

NB: The textbooks by Csaba Szepesvari [Sze] and by Mausam and Andrey Kolobov [MauKo] can be accessed electronically on campus (UW subscription), but not from home.

Evaluation:

The grading scheme for the course is as follows:

Course project (30%)
Five programming assignments (7% each)
Five paper critiques (7% each)

NB: For an audit mark, you need to submit the assignments.

Assignments

There will be five assignments. Each assignment must be done individually (i.e., no teams) and will consist entirely of programming questions. More precisely, you will be asked to program some algorithms for sequential decision making and reinforcement learning and to test them on some datasets. Programs must be written in Python and submitted via Marmoset, an automated system that runs and evaluates programs.

You will also write five paper critiques. They must be done individually (i.e., no teams). Paper critiques must be saved in PDF format and submitted by email to the instructor.

Tests

There is no midterm and no final exam.

Rules for Group Work:

Assignments, paper critiques and the project must be done individually.

Indication of how late submission of assignments and missed assignments will be treated

Programs must be submitted via Marmoset by the assignment's due date. Late programs may be submitted for half credit within 24 hours. Programs submitted more than 24 hours late will not be marked.

Paper critiques must be submitted by email to the instructor in PDF format. Late critiques may be submitted for half credit within 24 hours.

Indication of where students are to submit assignments and pick up marked assignments

Assignments must be submitted electronically via Marmoset. All that is returned is a mark based on the performance of the program. The marks will be made available via Marmoset.

Paper critiques must be submitted by email in PDF format. Marked critiques will be returned by email.

Academic Integrity: In order to maintain a culture of academic integrity, members of the University of Waterloo community are expected to promote honesty, trust, fairness, respect and responsibility. [Check www.uwaterloo.ca/academicintegrity/ for more information.]

Grievance: A student who believes that a decision affecting some aspect of his/her university life has been unfair or unreasonable may have grounds for initiating a grievance. Read Policy 70, Student Petitions and Grievances, Section 4, www.adm.uwaterloo.ca/infosec/Policies/policy70.htm. When in doubt, please contact the department's administrative assistant, who will provide further assistance.

Discipline: A student is expected to know what constitutes academic integrity [check www.uwaterloo.ca/academicintegrity/] to avoid committing an academic offence, and to take responsibility for his/her actions. A student who is unsure whether an action constitutes an offence, or who needs help in learning how to avoid offences (e.g., plagiarism, cheating) or about 'rules' for group work/collaboration should seek guidance from the course instructor, academic advisor, or the undergraduate Associate Dean. For information on categories of offences and types of penalties, students should refer to Policy 71, Student Discipline, www.adm.uwaterloo.ca/infosec/Policies/policy71.htm. For typical penalties check Guidelines for the Assessment of Penalties, www.adm.uwaterloo.ca/infosec/guidelines/penaltyguidelines.htm.

Appeals: A decision made or penalty imposed under Policy 70 (Student Petitions and Grievances) (other than a petition) or Policy 71 (Student Discipline) may be appealed if there is a ground. A student who believes he/she has a ground for an appeal should refer to Policy 72 (Student Appeals) www.adm.uwaterloo.ca/infosec/Policies/policy72.htm.

Note for Students with Disabilities: The Office for Persons with Disabilities (OPD), located in Needles Hall, Room 1132, collaborates with all academic departments to arrange appropriate accommodations for students with disabilities without compromising the academic integrity of the curriculum. If you require academic accommodations to lessen the impact of your disability, please register with the OPD at the beginning of each academic term.