CS885 Spring 2018 - Reinforcement Learning

This schedule is tentative and will be adjusted as the course progresses. All lectures are recorded, and the videos are available as a playlist on Pascal's YouTube channel.

Lecture Date Topic Readings (textbooks)
1a May 2, 10:30-11:20 am Course introduction (slides) (video) [SutBar] Chapter 1, [Sze] Chapter 1
1b May 2, 11:30-12:20 pm Markov Processes (slides) (video) [RusNor] Section 15.1
2a May 4, 10:30-11:20 am Markov Decision Processes (slides (slides 5,6,9 revised May 5)) (video) [SutBar] Chap. 3, [Sze] Chap. 2, [RusNor] Sec. 17.1-17.2, 17.4, [Put] Chap. 2, 4, 5
2b May 4, 11:30-12:20 pm Value Iteration (slides (slides 13,14 revised May 14)) (video) [SutBar] Sec. 4.1, 4.4, [Sze] Sec. 2.2, 2.3, [Put] Sec. 6.1-6.3, [SigBuf] Chap. 1
3a May 9, 10:30-11:20 am Policy Iteration (slides) (video) [SutBar] Sec. 4.3, [Put] Sec. 6.4-6.5, [SigBuf] Sec., [RusNor] Sec. 17.3
3b May 9, 11:30-12:20 pm Introduction to RL (slides) (video) [SutBar] Sec. 5.1-5.3, 6.1-6.3, 6.5, [Sze] Sec. 3.1, 4.3, [SigBuf] Sec. 2.1-2.5, [RusNor] Sec. 21.1-21.3
4a May 11, 10:30-11:20 am Deep neural networks (slides) (video) [GBC] Chap. 6, 7, 8
4b May 11, 11:30-12:20 pm Deep Q-Networks (slides) (video) [SutBar] Sec. 9.4, 9.7, [Sze] Sec. 4.3.2
5 May 16, 10:30-12:20 pm Guest lecture by Nabiha Asghar on RL for dialog systems (slides) (video)
6a May 18, 10:30-11:20 am Guest lecture by Mike Rudd on OpenAI environments (slides) (video)
6b May 18, 11:30-12:20 pm Guest lecture by Timmy Tse on DQN and TensorFlow (slides) (video) (demo.zip)
7a May 23, 10:30-11:20 am Policy Gradient (slides (slide 9 revised May 24)) (video) [SutBar] Sec. 13.1-13.3, 13.7, [SigBuf] Sec. 5.1-5.2, [RusNor] Sec. 21.5
7b May 23, 11:30-12:20 pm Actor Critic (slides) (video) [SutBar] Sec. 13.4-13.5, [Sze] Sec. 4.4, [SigBuf] Sec. 5.3
8a May 25, 10:30-11:20 am Multi-armed bandits (slides) (video) [SutBar] Sec. 2.1-2.7, [Sze] Sec. 4.2.1-4.2.2
8b May 25, 11:30-12:20 pm Bayesian and contextual bandits (slides) (video) [SutBar] Sec. 2.9
9 May 30, 10:30-12:20 pm Model-based RL (slides (slide 13 revised June 1)) (video) [SutBar] Chap. 8
10 June 1, 10:30-12:20 pm Bayesian RL (slides) (video) Michael O’Gordon Duff’s PhD Thesis (2002)
11a June 6, 10:30-11:20 am Hidden Markov models (slides) (video) [RusNor] Sec. 15.3, [SutBar] Sec. 17.3
11b June 6, 11:30-12:20 pm Partially observable RL (slides) (video) [RusNor] Sec. 17.3, [SigBuf] Chap. 7
12 June 8, 10:30-12:20 pm Deep recurrent Q-networks (slides) (video) [GBC] Chap. 10
13a June 13, 10:30-10:50 am RL for video games (slides) (video) Playing FPS Games with Deep Reinforcement Learning
(Presenter: Mark Iwanchyshyn)
13b June 13, 11:10-11:30 am RL for video games (slides) (video) A Deep Hierarchical Approach to Lifelong Learning in Minecraft
(Presenter: Yetian Wang)
13c June 13, 11:50-12:20 pm Adversarial Search (slides) (video) [RusNor] Sec. 5.1-5.4
14a June 15, 10:30-10:50 am RL for computer Go (slides) (video) Mastering the Game of Go without Human Knowledge
(Presenter: Henry Chen)
14b June 15, 11:10-11:30 am RL for board games (slides) (video) Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
(Presenter: Kira Selby)
14c June 15, 11:50-12:20 pm Trust Region Methods (slides) (video) Nocedal and Wright, Numerical Optimization, Chapter 4
June 17 Project proposal due (11:59 pm)
15a June 20, 10:30-10:50 am Policy optimization (slides) (video) Trust region policy optimization
(Presenter: Shivam Kalra)
15b June 20, 11:10-11:30 am Policy optimization (slides) (video) Proximal policy optimization algorithms
(Presenter: Ruifan Yu)
15c June 20, 11:50-12:20 pm Semi-Markov Decision Processes (slides) (video) [Put] Sec. 11.1-11.3
16a June 22, 10:30-10:50 am Hierarchical RL (slides) (video) The Option-Critic Architecture
(Presenter: Zebin Kang)
16b June 22, 11:10-11:30 am Hierarchical RL (slides) (video) FeUdal Networks for Hierarchical Reinforcement Learning
(Presenter: Rene Bidart)
17a June 27, 10:30-10:50 am RL for robotics (slides) (video) Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning
(Presenter: James Cagalawan)
17b June 27, 11:10-11:30 am RL for robotics (slides) (video) Control of a Quadrotor with Reinforcement Learning
(Presenter: Nicole McNabb)
17c June 27, 11:50-12:20 pm Inverse Reinforcement Learning (slides) (video) Abbeel, Ng, Apprenticeship Learning via Inverse Reinforcement Learning, ICML-2004
18a June 29, 10:30-10:50 am RL for autonomous vehicles (slides) (video) Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving
(Presenter: Ashish Gaurav)
18b June 29, 11:10-11:30 am RL for autonomous vehicles (slides) (video) Learning Driving Styles for Autonomous Vehicles from Demonstration
(Presenter: Marko Ilievski)
19a July 4, 10:30-10:50 am RL for conversational agents (slides) (video) End-to-end LSTM-based dialog control optimized with supervised and reinforcement learning
(Presenter: Hamidreza Shahidi)
19b July 4, 11:10-11:30 am RL for conversational agents (slides) (video) Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning
(Presenter: Nalin Chhibber)
19c July 4, 11:50-12:20 pm Memory Augmented Networks (slides) (video) [GBC] Chap. 10
20a July 6, 10:30-10:50 am Memory based RL (slides) (video) Neural Map: Structured Memory for Deep Reinforcement Learning
(Presenter: Andreas Stöckel)
20b July 6, 11:10-11:30 am Memory based RL (slides) (video) Memory Augmented Control Networks
(Presenter: Aravind Balakrishnan)
August 1 Project report due (11:59 pm)