University of Waterloo
Term and Year of Offering: Spring 2013
Course Number and Title: CS886, Sequential Decision Making and Reinforcement Learning
Website: http://www.cs.uwaterloo.ca/~ppoupart/teaching/cs886-spring13/cs886-spring13.html
Discussion Forum: piazza.com/uwaterloo.ca/spring2013/cs886b
Video Conferencing: contact Mike Willson (mike.willson@uwaterloo.ca) for an account
Instructor's Name | Office Location | Contact | Office Hours
Pascal Poupart | DC2514 | ppoupart@uwaterloo.ca | By appointment
Course Description:
With the proliferation of sensors, organizations are now collecting streams of data about all kinds of processes (e.g., physiological measurements, financial transactions, energy consumption, text messages). There is a need to process this data and to make intelligent decisions based on it in order to optimize the underlying processes (e.g., assistive technologies, portfolio management, energy optimization, dialog management, robotic control). Hence, this course will cover the theory and practice of sequential decision making. More precisely, we will focus on Markov decision processes, which provide a general framework to model and optimize a wide range of decision processes in health informatics, robotics, computational finance, human-computer interaction, computational sustainability, operations research, etc. Since the dynamics of a process are usually only partially known at the time decisions are made, we will also cover reinforcement learning, which provides a framework to simultaneously learn about a process while making decisions.
Course Objectives:
At the end of the course, students should have the ability to:
- Model sequential decision making tasks
- Design algorithms for automated decision making and reinforcement learning
Course Overview:
The topics we will cover include:
- Reasoning under uncertainty
- Decision theory
- Sequential decision making
  - Markov decision processes
  - Offline optimization techniques
  - Online optimization techniques
  - Partially observable domains
  - Decentralized decision making
  - Multi-agent systems
- Reinforcement learning
  - Model-based techniques
  - Model-free techniques
  - Non-parametric techniques
  - Bayesian reinforcement learning
- Potential applications:
  - Robotic control
  - Dialog management
  - Operations research
  - Health informatics and assistive technologies
  - Intelligent tutoring systems
  - Computational finance
  - Computational sustainability
Textbook:
There is no required textbook. However, optional complementary readings from the following references will be recommended (see course schedule):
- [SiBuf] Olivier Sigaud and Olivier Buffet (2010) Markov Decision Processes in Artificial Intelligence
- [Po] Warren B. Powell (2011, 2nd edition) Approximate Dynamic Programming: Solving the Curses of Dimensionality
- [SutBar] Richard Sutton and Andy Barto (1998) Reinforcement Learning: An Introduction
- [Sze] Csaba Szepesvari (2010) Algorithms for Reinforcement Learning
- [MauKo] Mausam and Andrey Kolobov (2012) Planning with Markov Decision Processes: An AI Perspective
- [Pu] Martin L. Puterman (2009, 2nd edition) Markov Decision Processes: Discrete Stochastic Dynamic Programming
- [Ber] Dimitri Bertsekas (2012, 4th edition) Dynamic Programming and Optimal Control
NB: The textbooks by Csaba Szepesvari [Sze] and by Mausam and Andrey Kolobov [MauKo] can be accessed electronically from campus (UW subscription) but not from home.
Evaluation:
The grading scheme for the course is as follows:
- Course project (30%)
- Five programming assignments (7% each)
- Five paper critiques (7% each)
NB: For an audit mark, you need to submit the assignments.
Assignments
There will be five assignments. Each assignment must be done individually (i.e., no teams) and will consist entirely of programming questions. More precisely, you will be asked to program some algorithms for sequential decision making and reinforcement learning and to test them on some datasets. Programs must be written in Python and submitted via Marmoset, an automated system that runs and evaluates programs.
You will also write five paper critiques. They must be done individually (i.e., no teams). Paper critiques must be saved in PDF format and submitted by email to the instructor.
Tests
There is no midterm and no final exam.
Rules for Group Work:
Assignments, paper critiques and the project must be done individually.
How late and missed assignments will be treated:
Programs must be submitted via Marmoset by the assignment's due date. Late programs may be submitted for half credit within 24 hours of the due date. Programs submitted more than 24 hours late will not be marked.
Paper critiques must be submitted by email to the instructor in PDF format. Late critiques may be submitted for half credit within 24 hours of the due date.
Where to submit assignments and pick up marked assignments:
Assignments must be submitted electronically via Marmoset. The only thing returned is a mark based on the performance of the program; marks will be made available via Marmoset.
Paper critiques must be submitted by email in PDF format. Marked critiques will be returned by email.
Academic Integrity: In order to maintain a culture of academic integrity, members of the University of Waterloo community are expected to promote honesty, trust, fairness, respect and responsibility. [Check www.uwaterloo.ca/academicintegrity/ for more information.]
Grievance: A student who believes that a decision affecting some aspect of his/her university life has been unfair or unreasonable may have grounds for initiating a grievance. Read Policy 70, Student Petitions and Grievances, Section 4, www.adm.uwaterloo.ca/infosec/Policies/policy70.htm. When in doubt, please contact the department's administrative assistant, who will provide further assistance.
Discipline: A student is expected to know what constitutes academic integrity [check www.uwaterloo.ca/academicintegrity/] to avoid committing an academic offence, and to take responsibility for his/her actions. A student who is unsure whether an action constitutes an offence, or who needs help in learning how to avoid offences (e.g., plagiarism, cheating) or about 'rules' for group work/collaboration, should seek guidance from the course instructor, academic advisor, or the undergraduate Associate Dean. For information on categories of offences and types of penalties, students should refer to Policy 71, Student Discipline, www.adm.uwaterloo.ca/infosec/Policies/policy71.htm. For typical penalties, check Guidelines for the Assessment of Penalties, www.adm.uwaterloo.ca/infosec/guidelines/penaltyguidelines.htm.
Appeals: A decision made or penalty imposed under Policy 70 (Student Petitions and Grievances) (other than a petition) or Policy 71 (Student Discipline) may be appealed if there is a ground. A student who believes he/she has a ground for an appeal should refer to Policy 72 (Student Appeals), www.adm.uwaterloo.ca/infosec/Policies/policy72.htm.
Note for Students with Disabilities: The Office for Persons with Disabilities (OPD), located in Needles Hall, Room 1132, collaborates with all academic departments to arrange appropriate accommodations for students with disabilities without compromising the academic integrity of the curriculum. If you require academic accommodations to lessen the impact of your disability, please register with the OPD at the beginning of each academic term.