The schedule below includes two tables: one for concepts (material taught by Pascal) and one for applications (papers presented by students). Readings are complementary and optional.
Table of online modules
Week | Module | Topic | Readings (textbooks) |
---|---|---|---|
Sept 9 | 1a | Course introduction (slides) | [SutBar] Chapter 1, [Sze] Chapter 1 |
Sept 9 | 1b | Markov Decision Processes, Value Iteration (slides) | [SutBar] Chap. 3, [Sze] Chap. 2, [RusNor] Sec. 15.1, 17.1-17.2, 17.4, [Put] Chap. 2, 4, 5 |
Sept 12 | 2a | Convergence Properties (slides) | [SutBar] Sec. 4.1, 4.4, [Sze] Sec. 2.2, 2.3, [Put] Sec. 6.1-6.3, [SigBuf] Chap. 1 |
Sept 12 | 2b | Policy Iteration (slides) | [SutBar] Sec. 4.3, [Put] Sec. 6.4-6.5, [SigBuf] Sec. 1.6.2.3, [RusNor] Sec. 17.3 |
Sept 16 | 3a | Intro to Reinforcement Learning, Q-learning (slides) | [SutBar] Sec. 5.1-5.3, 6.1-6.3, 6.5, [Sze] Sec. 3.1, 4.3, [SigBuf] Sec. 2.1-2.5, [RusNor] Sec. 21.1-21.3 |
Sept 16 | 3b | Deep Q-networks (slides) | [GBC] Chap. 6, 7, 8, [SutBar] Sec. 9.4, 9.7, [Sze] Sec. 4.3.2 |
Sept 19 | 4a | Policy gradient (slides) | [SutBar] Sec. 13.1-13.3, 13.7, [SigBuf] Sec. 5.1-5.2, [RusNor] Sec. 21.5 |
Sept 19 | 4b | Actor-critic (slides) (corrected typos in Slides 6, 9, 11, 13 on Oct 6) | [SutBar] Sec. 13.4-13.5, [Sze] Sec. 4.4, [SigBuf] Sec. 5.3 |
Sept 23 | None | No new material | |
Sept 26 | 5a | Multi-armed bandits (slides) | [SutBar] Sec. 2.1-2.7, [Sze] Sec. 4.2.1-4.2.2 |
Sept 26 | 5b | Bayesian and Contextual bandits (slides) | [SutBar] Sec. 2.9 |
Sept 28 | | Assignment 1 due (11:59 pm) | |
Sept 30 | 6a | Trust Regions and Proximal Policies (slides) (added Slide 15 and corrected typos on Slides 11, 14 on Oct 6) | Schulman, Levine, Moritz, Jordan, Abbeel (2015) Trust Region Policy Optimization, ICML. Schulman, Wolski, Dhariwal, Radford, Klimov (2017) Proximal Policy Optimization Algorithms, arXiv. |
Sept 30 | 6b | Maximum entropy RL (slides) | Haarnoja, Tang, Abbeel, Levine (2017) Reinforcement Learning with Deep Energy-Based Policies, ICML. Haarnoja, Zhou, Abbeel, Levine (2018) Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, ICML. |
Oct 3 | 7 | Offline RL (slides) | Levine, Kumar, Tucker, Fu (2021) Offline reinforcement learning: Tutorial, review, and perspectives on open problems, arXiv. Kumar, Zhou, Tucker, Levine (2020) Conservative Q-Learning for Offline Reinforcement Learning, NeurIPS. |
Oct 7 | 8a | Model-based RL (slides) | [SutBar] Chap. 8 |
Oct 7 | 8b | Partially observable RL, DRQN (slides) | Hausknecht, M., & Stone, P. Deep recurrent Q-learning for partially observable MDPs. In 2015 AAAI Fall Symposium Series. |
Oct 10-14 | | Reading break | |
Oct 17 | 9 | Distributional RL (slides) | Bellemare, Dabney, Munos. A distributional perspective on reinforcement learning. ICML, 2017. Bellemare, Dabney, Rowland. Distributional Reinforcement Learning, MIT Press, 2023. |
Oct 19 | | Assignment 2 due (11:59 pm) | |
Oct 21 | 10 | Constrained RL (slides) | Ray, Achiam, Amodei, Benchmarking Safe Exploration in Deep Reinforcement Learning. Liu, Halev, Liu, Policy Learning with Constraints in Model-free Reinforcement Learning: A Survey, IJCAI, 2021. |
Oct 24 | 11a | Bayesian RL (slides) | Michael O’Gordon Duff’s PhD Thesis (2002). Vlassis, Ghavamzadeh, Mannor, Poupart, Bayesian Reinforcement Learning (Chapter in Reinforcement Learning: State-of-the-Art), Springer Verlag, 2012. |
Oct 24 | 11b | Multi-task RL (slides) | Vithayathil Varghese, N., & Mahmoud, Q. H. (2020). A survey of multi-task deep reinforcement learning. Electronics, 9(9), 1363. |
Oct 26 | | Project proposal due (11:59 pm) | |
Oct 28 | 12 | Imitation Learning (slides) | Ho, J., & Ermon, S. (2016). Generative adversarial imitation learning. In NeurIPS (pp. 4565-4573). Torabi, F., Warnell, G., & Stone, P. (2018). Behavioral cloning from observation. In IJCAI (pp. 4950-4957). |
Oct 31 | 13 | Inverse RL (slides) | Ziebart, B. D., Bagnell, J. A., & Dey, A. K. (2010). Modeling interaction via the principle of maximum causal entropy. In ICML. Finn, C., Levine, S., & Abbeel, P. (2016). Guided cost learning: Deep inverse optimal control via policy optimization. In ICML (pp. 49-58). |
Nov 2 | | Assignment 3 due (11:59 pm) | |
Nov 4 | 14 | RL with Sequence Modeling (slides) | Esslinger, Platt & Amato (2022). Deep Transformer Q-Networks for Partially Observable Reinforcement Learning. arXiv. Chen et al. (2021). Decision transformer: Reinforcement learning via sequence modeling. NeurIPS, 34, 15084-15097. Gu, Goel, & Ré (2022). Efficiently modeling long sequences with structured state spaces. ICLR. Gu, Dao, Ermon, Rudra & Ré (2020). HiPPO: Recurrent memory with optimal polynomial projections. NeurIPS, 33, 1474-1487. |
Table of paper presentations
Date | Presenter | Discussants | Topic | Papers |
---|---|---|---|---|
Nov 7 | Wen Cui | Angelo Arvind Rajendram, Ahmed Hussein Salamah, Marty Mukherjee, Vijay Ravi, Christopher Risi | RL for Math | Alhussein Fawzi, Matej Balog, Aja Huang, Thomas Hubert, Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Francisco J. R. Ruiz, Julian Schrittwieser, Grzegorz Swirszcz, David Silver, Demis Hassabis & Pushmeet Kohli (2022), Discovering faster matrix multiplication algorithms with reinforcement learning, Nature volume 610, pages 47–53. |
Nov 7 | Nimmi Rashinika Weeraddana | | RL for Health | Mariya Popova, Olexandr Isayev, Alexander Tropsha (2018), Deep reinforcement learning for de novo drug design, Science Advances, Vol 4, Issue 7. |
Nov 11 | Eli Henry Dykhne | Sonja Linghui Shan, Yulin Xue, Christopher Risi, Stephanie Maaz, Remy El Sabeh, Vijay Ravi, Sherman Siu | RL for Games | Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap & David Silver (2020), Mastering Atari, Go, chess and shogi by planning with a learned model, Nature volume 588, pages 604–609. |
Nov 11 | Marty Mukherjee | Shirley Chen, Stephanie Maaz, Edward Poon, Yandong Zhu, Vijay Ravi, Liwei Alan Wu, Siqing Huo | RL for Games | Linus Gisslén, Andy Eakins, Camilo Gordillo, Joakim Bergdahl, Konrad Tollmar (2021), Adversarial Reinforcement Learning for Procedural Content Generation, IEEE Conference on Games, pages 1-8. |
Nov 14 | Zheng Ma | | RL for Finance | Zhengyao Jiang, Jinjun Liang (2017), Cryptocurrency Portfolio Management with Deep Reinforcement Learning, Intelligent Systems Conference. |
Nov 14 | Lufan Wang | | RL for Finance | Berend Jelmer Dirk Gort, Xiao-Yang Liu, Xinghang Sun, Jiechao Gao, Shuaiyu Chen, and Christina Dan Wang (2022), Deep Reinforcement Learning for Cryptocurrency Trading: Practical Approach to Address Backtest Overfitting, In Proceedings of the 3rd ACM International Conference on AI in Finance (ICAIF). ACM, New York, NY, USA, 9 pages. |
Nov 18 | Shirley Chen | | RL for Data Systems | Ryan Marcus, Parimarjan Negi, Hongzi Mao, Chi Zhang, Mohammad Alizadeh, Tim Kraska, Olga Papaemmanouil, Nesime Tatbul (2019), Neo: A Learned Query Optimizer, PVLDB, 12(11): 1705-1718. |
Nov 18 | Yueheng Zhang | | RL for Optimization | Iddo Drori, Anant Kharkar, William R. Sickinger, Brandon Kates, Qiang Ma, Suwen Ge, Eden Dolev, Brenda Dietrich, David P. Williamson, Madeleine Udell (2020), Learning to Solve Combinatorial Optimization Problems on Real-World Graphs in Linear Time, IEEE International Conference on Machine Learning and Applications (ICMLA). |
Nov 21 | Angelo Arvind Rajendram | | Credit Assignment in RL | Thomas Mesnard, Théophane Weber, Fabio Viola, Shantanu Thakoor, Alaa Saade, Anna Harutyunyan, Will Dabney, Tom Stepleton, Nicolas Heess, Arthur Guez, Éric Moulines, Marcus Hutter, Lars Buesing, Rémi Munos, Counterfactual Credit Assignment in Model-Free Reinforcement Learning, International Conference on Machine Learning, PMLR 139. |
Nov 21 | Chi-Chung Cheung | | Explainability | Stratis Tsirtsis, Abir De, Manuel Gomez Rodriguez (2021), Counterfactual Explanations in Sequential Decision Making Under Uncertainty, NeurIPS. |
Nov 25 | Ipsita Mohanty | | RL in Software Verification | Konstantin Böttinger, Patrice Godefroid, Rishabh Singh (2018), Deep Reinforcement Fuzzing, IEEE Symposium on Security and Privacy Workshops. |
Nov 25 | Brian Zimmerman | | RL for Recommender Systems | Lixin Zou, Long Xia, Zhuoye Ding, Jiaxing Song, Weidong Liu, Dawei Yin (2019), Reinforcement Learning to Optimize Long-term User Engagement in Recommender Systems, KDD. |
Nov 28 | Kaixiang Zheng | | RL as sequence modeling | Michael Janner, Qiyang Li, Sergey Levine (2021), Offline Reinforcement Learning as One Big Sequence Modeling Problem, NeurIPS. |
Nov 28 | Martin Ethier | | RL for Continual Learning | Ju Xu, Zhanxing Zhu (2018), Reinforced Continual Learning, NeurIPS. |
Dec 2 | Ahmed Hussein Salamah | | RL for Computer Vision | Li, H., Guo, Y., Wang, Z., Xia, S., & Zhu, W. (2019). AdaCompress: Adaptive compression for online computer vision services. In Proceedings of the 27th ACM International Conference on Multimedia (pp. 2440-2448). |
Dec 2 | Thomas Humphries | | RL for Traffic Control | Tianshu Chu, Jie Wang, Lara Codecà, and Zhaojian Li (2020), Multi-Agent Deep Reinforcement Learning for Large-Scale Traffic Signal Control, IEEE Transactions on Intelligent Transportation Systems, Vol 21, No 3. |
Dec 12 | | | Project report due (11:59 pm) | |