The schedule below includes two tables: one for concepts (material taught by Pascal) and one for applications (papers presented by students). For the first table (concepts), you are expected to watch the videos and understand the material covered in the slides listed for each week by the end of that week. The readings are complementary and optional.
Table of online modules
| Week | Module | Topic | Readings (textbooks) |
|---|---|---|---|
| May 11-15 | 2020: Logistics & Website | Welcome & Logistics (video), Goals (video), Textbook (video), Schedule (video), Marks (video), Assignments (video), Critiques (video), Presentation (video), Project (video) | |
| | 2018: 1a | Course introduction (slides) (video) | [SutBar] Chap. 1, [Sze] Chap. 1 |
| | 2018: 1b | Markov Processes (slides) (video) | [RusNor] Sec. 15.1 |
| | 2018: 2a | Markov Decision Processes (slides) (video) | [SutBar] Chap. 3, [Sze] Chap. 2, [RusNor] Sec. 17.1-17.2, 17.4, [Put] Chap. 2, 4, 5 |
| | 2018: 2b | Value Iteration (slides) (video) (a minimal code sketch follows this table) | [SutBar] Sec. 4.1, 4.4, [Sze] Sec. 2.2-2.3, [Put] Sec. 6.1-6.3, [SigBuf] Chap. 1 |
| May 18-22 | 2018: 3a | Policy Iteration (slides) (video) | [SutBar] Sec. 4.3, [Put] Sec. 6.4-6.5, [SigBuf] Sec. 1.6.2.3, [RusNor] Sec. 17.3 |
| | 2018: 3b | Introduction to RL (slides) (video) | [SutBar] Sec. 5.1-5.3, 6.1-6.3, 6.5, [Sze] Sec. 3.1, 4.3, [SigBuf] Sec. 2.1-2.5, [RusNor] Sec. 21.1-21.3 |
| | 2018: 4a | Deep neural networks (slides) (video) | [GBC] Chap. 6-8 |
| | 2018: 4b | Deep Q-Networks (slides) (video) | [SutBar] Sec. 9.4, 9.7, [Sze] Sec. 4.3.2 |
| May 25-29 | 2018: 7a | Policy Gradient (slides; slides 8-9 revised June 11) (video) | [SutBar] Sec. 13.1-13.3, 13.7, [SigBuf] Sec. 5.1-5.2, [RusNor] Sec. 21.5 |
| | 2018: 7b | Actor Critic (slides) (video) | [SutBar] Sec. 13.4-13.5, [Sze] Sec. 4.4, [SigBuf] Sec. 5.3 |
| | 2018: 14c | Trust Region Methods (slides) (video) | Nocedal & Wright, Numerical Optimization, Chap. 4 |
| | 2020: 1 | Trust Region & Proximal Policy Optimization (slides) (video) | Schulman, Levine, Moritz, Jordan, Abbeel (2015). Trust Region Policy Optimization. ICML. Schulman, Wolski, Dhariwal, Radford, Klimov (2017). Proximal Policy Optimization Algorithms. arXiv. |
| | 2020: 2 | Maximum Entropy Reinforcement Learning (slides; slides 14 and 22 modified June 23) (video) | Haarnoja, Tang, Abbeel, Levine (2017). Reinforcement Learning with Deep Energy-Based Policies. ICML. Haarnoja, Zhou, Abbeel, Levine (2018). Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. ICML. |
| May 29 | | Assignment 1 due (11:59 pm) | |
| June 1-5 | 2018: 8a | Multi-armed bandits (slides) (video) | [SutBar] Sec. 2.1-2.7, [Sze] Sec. 4.2.1-4.2.2 |
| | 2018: 8b | Bayesian and contextual bandits (slides) (video) | [SutBar] Sec. 2.9 |
| | 2018: 9 | Model-based RL (slides) (video) | [SutBar] Chap. 8 |
| | 2018: 10 | Bayesian RL (slides) (video) | Michael O’Gordon Duff’s PhD thesis (2002) |
| June 8-12 | 2018: 11a | Hidden Markov models (slides) (video) | [RusNor] Sec. 15.3, [SutBar] Sec. 17.3 |
| | 2018: 11b | Partially observable RL (slides) (video) | [RusNor] Sec. 17.3, [SigBuf] Chap. 7 |
| | 2018: 12 | Deep recurrent Q-networks (slides) (video) | [GBC] Chap. 10 |
| | 2020: 3 | Imitation Learning (slides) (video) | Ho, J., & Ermon, S. (2016). Generative adversarial imitation learning. In NeurIPS (pp. 4565-4573). Torabi, F., Warnell, G., & Stone, P. (2018). Behavioral cloning from observation. In IJCAI (pp. 4950-4957). |
| June 24 | | Assignment 2 due (11:59 pm) | |
| June 26 | | Project proposal due (11:59 pm) | |
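The first two weeks build up to exact dynamic-programming solutions of MDPs (modules 2a, 2b, and 3a). As a reference point, here is a minimal value-iteration sketch in Python; the toy 2-state, 2-action MDP, its rewards, and the discount factor are made-up illustrative assumptions, not an example taken from the course materials.

```python
import numpy as np

# Illustrative only: a minimal value-iteration sketch for the tabular MDPs
# covered in modules 2a/2b. The toy MDP below is a made-up assumption.

# P[a][s][s']: probability of reaching state s' when taking action a in state s.
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.0, 1.0]]])
# R[s][a]: immediate reward for taking action a in state s.
R = np.array([[0.0, 1.0],
              [2.0, 0.0]])

gamma = 0.95        # discount factor
V = np.zeros(2)     # value estimates, initialized to zero

# Repeat Bellman optimality backups until the values stop changing.
for _ in range(10_000):
    Q = R + gamma * np.einsum('ast,t->sa', P, V)  # Q[s][a]
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

policy = Q.argmax(axis=1)  # greedy policy w.r.t. the converged values
print("V* =", V, "greedy policy:", policy)
```

Policy iteration (module 3a) solves the same problem but alternates full policy evaluation with greedy policy improvement instead of backing up values directly.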
The following table provides the schedule for paper presentations. The list of papers and assigned presenters will be determined by the end of May.
Table of paper presentations
| Week | Presenter | Topic | Papers |
|---|---|---|---|
| June 29 - July 3 | Siliang Huang (slides) (video) | RL for video games | Goel, V., Weng, J., & Poupart, P. (2018). Unsupervised video object segmentation for deep reinforcement learning. In Advances in Neural Information Processing Systems (pp. 5683-5694). |
| | Runsheng Guo (slides) (video) | RL for video games | Justesen, N., Bontrager, P., Togelius, J., & Risi, S. (2019). Deep learning for video game playing. IEEE Transactions on Games. |
| | Enamul Haque (slides) (video) | RL for query optimization in data systems | Krishnan, S., Yang, Z., Goldberg, K., Hellerstein, J., & Stoica, I. (2018). Learning to optimize join queries with deep reinforcement learning. arXiv preprint arXiv:1808.03196. |
| | | Upside-down RL | Srivastava, R. K., Shyam, P., Mutz, F., Jaśkowski, W., & Schmidhuber, J. (2019). Training agents using upside-down reinforcement learning. arXiv preprint arXiv:1912.02877. |
| July 6-10 | Samuel Yigzaw (slides) (video) | Hierarchical RL | Nachum, O., Gu, S. S., Lee, H., & Levine, S. (2018). Data-efficient hierarchical reinforcement learning. In Advances in Neural Information Processing Systems (pp. 3303-3313). |
| | David Thomas Radke (slides) (video) | Hierarchical RL | Wang, X., Chen, W., Wu, J., Wang, Y. F., & Wang, W. Y. (2018). Video captioning via hierarchical reinforcement learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 4213-4222). |
| | Scott Larter (slides) (video) | RL for robotics | Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., & Abbeel, P. (2018, May). Overcoming exploration in reinforcement learning with demonstrations. In 2018 IEEE International Conference on Robotics and Automation (ICRA) (pp. 6292-6299). IEEE. |
| | Egill Gudmundsson (slides) (video) | RL for robotics | Rakelly, K., Zhou, A., Quillen, D., Finn, C., & Levine, S. (2019). Efficient off-policy meta-reinforcement learning via probabilistic context variables. arXiv preprint arXiv:1903.08254. |
| July 13-17 | Laura Graves (slides) (video) | RL for autonomous vehicles | You, C., Lu, J., Filev, D., & Tsiotras, P. (2019). Advanced planning for autonomous vehicles using reinforcement learning and deep inverse reinforcement learning. Robotics and Autonomous Systems, 114, 1-18. |
| | Neel Bhatt (slides) (video) | RL for autonomous vehicles | Zhang, P., Xiong, L., Yu, Z., Fang, P., Yan, S., Yao, J., & Zhou, Y. (2019). Reinforcement learning-based end-to-end parking for automatic parking system. Sensors, 19(18), 3996. |
| | Hytham Farah (slides) (video) | RL for conversational agents | Zhao, T., & Eskenazi, M. (2016, September). Towards end-to-end learning for dialog state tracking and management using deep reinforcement learning. In Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue (pp. 1-10). |
| | Mojtaba Valipour (slides) (video) | RL for conversational agents | Zhao, T., Xie, K., & Eskenazi, M. (2019, June). Rethinking action spaces for reinforcement learning in end-to-end dialog agents with latent variable models. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (pp. 1208-1218). |
| July 20-24 | Saif Zabarah (slides) (video) | RL for finance | Nevmyvaka, Y., Feng, Y., & Kearns, M. (2006, June). Reinforcement learning for optimized trade execution. In Proceedings of the 23rd International Conference on Machine Learning (pp. 673-680). |
| | Zhongwen Zhang (slides) (video) | RL for healthcare | Futoma, J., Hughes, M. C., & Doshi-Velez, F. (2020). POPCORN: Partially Observed Prediction COnstrained ReiNforcement Learning. arXiv preprint arXiv:2001.04032. |
| | Yan Shi (slides) (video) | RL for combinatorial optimization | Bello, I., Pham, H., Le, Q. V., Norouzi, M., & Bengio, S. (2016). Neural combinatorial optimization with reinforcement learning. arXiv preprint arXiv:1611.09940. |
| | Lizhe Chen (slides) (video) | RL for neural architecture search | Zoph, B., Vasudevan, V., Shlens, J., & Le, Q. V. (2018). Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 8697-8710). |
| August 9 | | Project report due (11:59 pm) | |