CS885 Spring 2018 - Reinforcement Learning

This is a tentative schedule only; it will be adjusted as the course progresses. All lectures are recorded, and the videos are available as a playlist on Pascal's YouTube channel.

[SutBar] Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction (2nd edition, 2018) freely available online
[Sze] Csaba Szepesvari, Algorithms for Reinforcement Learning freely available online
[SigBuf] Olivier Sigaud and Olivier Buffet (editors), Markov Decision Processes in Artificial Intelligence (2010) freely available online
[GBC] Ian Goodfellow, Yoshua Bengio and Aaron Courville, Deep Learning (2016) freely available online
[Put] Martin L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming (2008)
[Ber] Dimitri P. Bertsekas, Dynamic Programming and Optimal Control (2017)
[Pow] Warren B. Powell, Approximate Dynamic Programming: Solving the Curses of Dimensionality (2015)
[RusNor] Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach (3rd edition, 2010)

Lecture	Date	Topic	Readings (textbooks)
1a	May 2, 10:30-11:20 am	Course introduction (slides) (video)	[SutBar] Chapter 1, [Sze] Chapter 1
1b	May 2, 11:30-12:20 pm	Markov Processes (slides) (video)	[RusNor] Section 15.1
2a	May 4, 10:30-11:20 am	Markov Decision Processes (slides (slides 5,6,9 revised May 5)) (video)	[SutBar] Chap. 3, [Sze] Chap. 2, [RusNor] Sec. 17.1-17.2, 17.4, [Put] Chap. 2, 4, 5
2b	May 4, 11:30-12:20 pm	Value Iteration (slides (slides 13, 14 revised May 14)) (video)	[SutBar] Sec. 4.1, 4.4, [Sze] Sec. 2.2, 2.3, [Put] Sec. 6.1-6.3, [SigBuf] Chap. 1
3a	May 9, 10:30-11:20 am	Policy Iteration (slides) (video)	[SutBar] Sec. 4.3, [Put] Sec. 6.4-6.5, [SigBuf] Sec. 1.6.2.3, [RusNor] Sec. 17.3
3b	May 9, 11:30-12:20 pm	Introduction to RL (slides) (video)	[SutBar] Sec. 5.1-5.3, 6.1-6.3, 6.5, [Sze] Sec. 3.1, 4.3, [SigBuf] Sec. 2.1-2.5, [RusNor] Sec. 21.1-21.3
4a	May 11, 10:30-11:20 am	Deep neural networks (slides) (video)	[GBC] Chap. 6, 7, 8
4b	May 11, 11:30-12:20 pm	Deep Q-Networks (slides) (video)	[SutBar] Sec. 9.4, 9.7, [Sze] Sec. 4.3.2
5	May 16, 10:30-12:20 pm	Guest lecture by Nabiha Asghar on RL for dialog systems (slides) (video)
6a	May 18, 10:30-11:20 am	Guest lecture by Mike Rudd on OpenAI environments (slides) (video)
6b	May 18, 11:30-12:20 pm	Guest lecture by Timmy Tse on DQN and TensorFlow (slides) (video) (demo.zip)
7a	May 23, 10:30-11:20 am	Policy Gradient (slides (slide 9 revised May 24)) (video)	[SutBar] Sec. 13.1-13.3, 13.7, [SigBuf] Sec. 5.1-5.2, [RusNor] Sec. 21.5
7b	May 23, 11:30-12:20 pm	Actor Critic (slides) (video)	[SutBar] Sec. 13.4-13.5, [Sze] Sec. 4.4, [SigBuf] Sec. 5.3
8a	May 25, 10:30-11:20 am	Multi-armed bandits (slides) (video)	[SutBar] Sec. 2.1-2.7, [Sze] Sec. 4.2.1-4.2.2
8b	May 25, 11:30-12:20 pm	Bayesian and contextual bandits (slides) (video)	[SutBar] Sec. 2.9
9	May 30, 10:30-12:20 pm	Model-based RL (slides (slide 13 revised June 1)) (video)	[SutBar] Chap. 8
10	June 1, 10:30-12:20 pm	Bayesian RL (slides) (video)	Michael O’Gordon Duff’s PhD thesis (2002)
11a	June 6, 10:30-11:20 am	Hidden Markov models (slides) (video)	[RusNor] Sec. 15.3, [SutBar] Sec. 17.3
11b	June 6, 11:30-12:20 pm	Partially observable RL (slides) (video)	[RusNor] Sec. 17.4, [SigBuf] Chap. 7
12	June 8, 10:30-12:20 pm	Deep recurrent Q-networks (slides) (video)	[GBC] Chap. 10
13a	June 13, 10:30-10:50 am	RL for video games (slides) (video)	Playing FPS Games with Deep Reinforcement Learning (Presenter: Mark Iwanchyshyn)
13b	June 13, 11:10-11:30 am	RL for video games (slides) (video)	A Deep Hierarchical Approach to Lifelong Learning in Minecraft (Presenter: Yetian Wang)
13c	June 13, 11:50-12:20 pm	Adversarial Search (slides) (video)	[RusNor] Sec. 5.1-5.4
14a	June 15, 10:30-10:50 am	RL for computer Go (slides) (video)	Mastering the Game of Go without Human Knowledge (Presenter: Henry Chen)
14b	June 15, 11:10-11:30 am	RL for board games (slides) (video)	Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm (Presenter: Kira Selby)
14c	June 15, 11:50-12:20 pm	Trust Region Methods (slides) (video)	Nocedal and Wright, Numerical Optimization, Chap. 4
	June 17	Project proposal due (11:59 pm)
15a	June 20, 10:30-10:50 am	Policy optimization (slides) (video)	Trust Region Policy Optimization (Presenter: Shivam Kalra)
15b	June 20, 11:10-11:30 am	Policy optimization (slides) (video)	Proximal Policy Optimization Algorithms (Presenter: Ruifan Yu)
15c	June 20, 11:50-12:20 pm	Semi-Markov Decision Processes (slides) (video)	[Put] Sec. 11.1-11.3
16a	June 22, 10:30-10:50 am	Hierarchical RL (slides) (video)	The Option-Critic Architecture (Presenter: Zebin Kang)
16b	June 22, 11:10-11:30 am	Hierarchical RL (slides) (video)	FeUdal Networks for Hierarchical Reinforcement Learning (Presenter: Rene Bidart)
17a	June 27, 10:30-10:50 am	RL for robotics (slides) (video)	Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning (Presenter: James Cagalawan)
17b	June 27, 11:10-11:30 am	RL for robotics (slides) (video)	Control of a Quadrotor with Reinforcement Learning (Presenter: Nicole McNabb)
17c	June 27, 11:50-12:20 pm	Inverse Reinforcement Learning (slides) (video)	Abbeel and Ng, Apprenticeship Learning via Inverse Reinforcement Learning, ICML 2004
18a	June 29, 10:30-10:50 am	RL for autonomous vehicles (slides) (video)	Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving (Presenter: Ashish Gaurav)
18b	June 29, 11:10-11:30 am	RL for autonomous vehicles (slides) (video)	Learning Driving Styles for Autonomous Vehicles from Demonstration (Presenter: Marko Ilievski)
19a	July 4, 10:30-10:50 am	RL for conversational agents (slides) (video)	End-to-end LSTM-based dialog control optimized with supervised and reinforcement learning (Presenter: Hamidreza Shahidi)
19b	July 4, 11:10-11:30 am	RL for conversational agents (slides) (video)	Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning (Presenter: Nalin Chhibber)
19c	July 4, 11:50-12:20 pm	Memory Augmented Networks (slides) (video)	[GBC] Chap. 10
20a	July 6, 10:30-10:50 am	Memory based RL (slides) (video)	Neural Map: Structured Memory for Deep Reinforcement Learning (Presenter: Andreas Stöckel)
20b	July 6, 11:10-11:30 am	Memory based RL (slides) (video)	Memory Augmented Control Networks (Presenter: Aravind Balakrishnan)
	August 1	Project report due (11:59 pm)