This is a tentative schedule only. As the course progresses, the schedule will be adjusted. All lectures are recorded and the videos are available as a playlist on Pascal's YouTube channel.
Lecture | Date | Topic | Readings (textbooks) |
---|---|---|---|
1a | May 2, 10:30-11:20 am | Course introduction (slides) (video) | [SutBar] Chapter 1, [Sze] Chapter 1 |
1b | May 2, 11:30-12:20 pm | Markov Processes (slides) (video) | [RusNor] Section 15.1 |
2a | May 4, 10:30-11:20 am | Markov Decision Processes (slides (slides 5,6,9 revised May 5)) (video) | [SutBar] Chap. 3, [Sze] Chap. 2, [RusNor] Sec. 17.1-17.2, 17.4, [Put] Chap. 2, 4, 5 |
2b | May 4, 11:30-12:20 pm | Value Iteration (slides (slides 13,14 revised May 14)) (video) | [SutBar] Sec. 4.1, [Sze] Sec. 2.2, 2.3, [Put] Sec. 6.1-6.3, [SigBuf] Chap. 1 |
3a | May 9, 10:30-11:20 am | Policy Iteration (slides) (video) | [SutBar] Sec. 4.3, [Put] Sec. 6.4-6.5, [SigBuf] Sec. 1.6.2.3, [RusNor] Sec. 17.3 |
3b | May 9, 11:30-12:20 pm | Introduction to RL (slides) (video) | [SutBar] Sec. 5.1-5.3, 6.1-6.3, 6.5, [Sze] Sec. 3.1, 4.3, [SigBuf] Sec. 2.1-2.5, [RusNor] Sec. 21.1-21.3 |
4a | May 11, 10:30-11:20 am | Deep neural networks (slides) (video) | [GBC] Chap. 6, 7, 8 |
4b | May 11, 11:30-12:20 pm | Deep Q-Networks (slides) (video) | [SutBar] Sec. 9.4, 9.7, [Sze] Sec. 4.3.2 |
5 | May 16, 10:30-12:20 pm | Guest lecture by Nabiha Asghar on RL for dialog systems (slides) (video) | |
6a | May 18, 10:30-11:20 am | Guest Lecture by Mike Rudd on OpenAI environments (slides) (video) | |
6b | May 18, 11:30-12:20 pm | Guest Lecture by Timmy Tse on DQN and TensorFlow (slides) (video) (demo.zip) | |
7a | May 23, 10:30-11:20 am | Policy Gradient (slides (slide 9 revised May 24)) (video) | [SutBar] Sec. 13.1-13.3, 13.7, [SigBuf] Sec. 5.1-5.2, [RusNor] Sec. 21.5 |
7b | May 23, 11:30-12:20 pm | Actor Critic (slides) (video) | [SutBar] Sec. 13.4-13.5, [Sze] Sec. 4.4, [SigBuf] Sec. 5.3 |
8a | May 25, 10:30-11:20 am | Multi-armed bandits (slides) (video) | [SutBar] Sec. 2.1-2.7, [Sze] Sec. 4.2.1-4.2.2 |
8b | May 25, 11:30-12:20 pm | Bayesian and contextual bandits (slides) (video) | [SutBar] Sec. 2.9 |
9 | May 30, 10:30-12:20 pm | Model-based RL (slides (slide 13 revised June 1)) (video) | [SutBar] Chap. 8 |
10 | June 1, 10:30-12:20 pm | Bayesian RL (slides) (video) | Michael O’Gordon Duff’s PhD Thesis (2002) |
11a | June 6, 10:30-11:20 am | Hidden Markov models (slides) (video) | [RusNor] Sec. 15.3, [SutBar] Sec. 17.3 |
11b | June 6, 11:30-12:20 pm | Partially observable RL (slides) (video) | [RusNor] Sec. 17.3, [SigBuf] Chap. 7 |
12 | June 8, 10:30-12:20 pm | Deep recurrent Q-networks (slides) (video) | [GBC] Chap. 10 |
13a | June 13, 10:30-10:50 am | RL for video games (slides) (video) | Playing FPS Games with Deep Reinforcement Learning (Presenter: Mark Iwanchyshyn) |
13b | June 13, 11:10-11:30 am | RL for video games (slides) (video) | A Deep Hierarchical Approach to Lifelong Learning in Minecraft (Presenter: Yetian Wang) |
13c | June 13, 11:50-12:20 pm | Adversarial Search (slides) (video) | [RusNor] Sec. 5.1-5.4 |
14a | June 15, 10:30-10:50 am | RL for computer Go (slides) (video) | Mastering the Game of Go without Human Knowledge (Presenter: Henry Chen) |
14b | June 15, 11:10-11:30 am | RL for board games (slides) (video) | Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm (Presenter: Kira Selby) |
14c | June 15, 11:50-12:20 pm | Trust Region Methods (slides) (video) | Nocedal and Wright, Numerical Optimization, Chapter 4 |
| | June 17 | Project proposal due (11:59 pm) | |
15a | June 20, 10:30-10:50 am | Policy optimization (slides) (video) | Trust region policy optimization (Presenter: Shivam Kalra) |
15b | June 20, 11:10-11:30 am | Policy optimization (slides) (video) | Proximal policy optimization algorithms (Presenter: Ruifan Yu) |
15c | June 20, 11:50-12:20 pm | Semi-Markov Decision Processes (slides) (video) | [Put] Sec. 11.1-11.3 |
16a | June 22, 10:30-10:50 am | Hierarchical RL (slides) (video) | The Option-Critic Architecture (Presenter: Zebin Kang) |
16b | June 22, 11:10-11:30 am | Hierarchical RL (slides) (video) | FeUdal Networks for Hierarchical Reinforcement Learning (Presenter: Rene Bidart) |
17a | June 27, 10:30-10:50 am | RL for robotics (slides) (video) | Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning (Presenter: James Cagalawan) |
17b | June 27, 11:10-11:30 am | RL for robotics (slides) (video) | Control of a Quadrotor with Reinforcement Learning (Presenter: Nicole McNabb) |
17c | June 27, 11:50-12:20 pm | Inverse Reinforcement Learning (slides) (video) | Abbeel, Ng, Apprenticeship Learning via Inverse Reinforcement Learning, ICML-2004 |
18a | June 29, 10:30-10:50 am | RL for autonomous vehicles (slides) (video) | Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving (Presenter: Ashish Gaurav) |
18b | June 29, 11:10-11:30 am | RL for autonomous vehicles (slides) (video) | Learning Driving Styles for Autonomous Vehicles from Demonstration (Presenter: Marko Ilievski) |
19a | July 4, 10:30-10:50 am | RL for conversational agents (slides) (video) | End-to-end LSTM-based dialog control optimized with supervised and reinforcement learning (Presenter: Hamidreza Shahidi) |
19b | July 4, 11:10-11:30 am | RL for conversational agents (slides) (video) | Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning (Presenter: Nalin Chhibber) |
19c | July 4, 11:50-12:20 pm | Memory Augmented Networks (slides) (video) | [GBC] Chap. 10 |
20a | July 6, 10:30-10:50 am | Memory based RL (slides) (video) | Neural Map: Structured Memory for Deep Reinforcement Learning (Presenter: Andreas Stöckel) |
20b | July 6, 11:10-11:30 am | Memory based RL (slides) (video) | Memory Augmented Control Networks (Presenter: Aravind Balakrishnan) |
| | August 1 | Project report due (11:59 pm) | |