
CS885 Fall 2022 - Reinforcement Learning

The schedule below includes two tables: one for concepts (material taught by Pascal) and one for applications (papers presented by students). Readings are complementary and optional.

Table of online modules

Week Module Topic Readings (textbooks)
Sept 9 1a Course introduction (slides) [SutBar] Chapter 1, [Sze] Chapter 1
1b Markov Decision Processes, Value Iteration (slides) [SutBar] Chap. 3, [Sze] Chap. 2, [RusNor] Sec. 15.1, 17.1-17.2, 17.4, [Put] Chap. 2, 4, 5
Sept 12 2a Convergence Properties (slides) [SutBar] Sec. 4.1, 4.4, [Sze] Sec. 2.2, 2.3, [Put] Sec. 6.1-6.3, [SigBuf] Chap. 1
2b Policy Iteration (slides) [SutBar] Sec. 4.3, [Put] Sec. 6.4-6.5, [SigBuf] Sec. 1.6.2.3, [RusNor] Sec. 17.3
Sept 16 3a Intro to Reinforcement Learning, Q-learning (slides) [SutBar] Sec. 5.1-5.3, 6.1-6.3, 6.5, [Sze] Sec. 3.1, 4.3, [SigBuf] Sec. 2.1-2.5, [RusNor] Sec. 21.1-21.3
3b Deep Q-networks (slides) [GBC] Chap. 6, 7, 8, [SutBar] Sec. 9.4, 9.7, [Sze] Sec. 4.3.2
Sept 19 4a Policy gradient (slides) [SutBar] Sec. 13.1-13.3, 13.7, [SigBuf] Sec. 5.1-5.2, [RusNor] Sec. 21.5
4b Actor critic (slides) (corrected typos in Slides 6, 9, 11, 13 on Oct 6) [SutBar] Sec. 13.4-13.5, [Sze] Sec. 4.4, [SigBuf] Sec. 5.3
Sept 23 None No new material
None No new material
Sept 26 5a Multi-armed bandits (slides) [SutBar] Sec. 2.1-2.7, [Sze] Sec. 4.2.1-4.2.2
5b Bayesian and Contextual bandits (slides) [SutBar] Sec. 2.9
Sept 28 Assignment 1 due (11:59 pm)
Sept 30 6a Trust Regions and Proximal Policies (slides) (added Slide 15 and corrected typos on Slides 11, 14 on Oct 6) Schulman, Levine, Moritz, Jordan, Abbeel (2015) Trust Region Policy Optimization, ICML.
Schulman, Wolski, Dhariwal, Radford, Klimov (2017) Proximal Policy Optimization, arXiv.
6b Maximum entropy RL (slides) Haarnoja, Tang, Abbeel, Levine (2017) Reinforcement Learning with Deep Energy-Based Policies, ICML.
Haarnoja, Zhou, Abbeel, Levine (2018) Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, ICML.
Oct 3 7 Offline RL (slides) Levine, Kumar, Tucker, Fu (2021) Offline reinforcement learning: Tutorial, review, and perspectives on open problems, arXiv.
Kumar, Zhou, Tucker, Levine (2020) Conservative Q-Learning for Offline Reinforcement Learning, NeurIPS.
Oct 7 8a Model-based RL (slides) [SutBar] Chap. 8
8b Partially observable RL, DRQN (slides) Hausknecht, M., & Stone, P. Deep recurrent Q-learning for partially observable MDPs. In 2015 AAAI fall symposium series.
Oct 10-14 Reading break
Oct 17 9 Distributional RL (slides) Bellemare, Dabney, Munos. A distributional perspective on reinforcement learning. ICML. 2017.
Bellemare, Dabney, Rowland. Distributional Reinforcement Learning, MIT Press, 2023.
Oct 19 Assignment 2 due (11:59 pm)
Oct 21 10 Constrained RL (slides) Ray, Achiam, Amodei, Benchmarking Safe Exploration in Deep Reinforcement Learning.
Liu, Halev, Liu, Policy Learning with Constraints in Model-free Reinforcement Learning: A Survey, IJCAI, 2021.
Oct 24 11a Bayesian RL (slides) Michael O’Gordon Duff’s PhD Thesis (2002)
Vlassis, Ghavamzadeh, Mannor, Poupart, Bayesian Reinforcement Learning (Chapter in Reinforcement Learning: State-of-the-Art), Springer Verlag, 2012
11b Multi-task RL (slides) Vithayathil Varghese, N., & Mahmoud, Q. H. (2020). A survey of multi-task deep reinforcement learning. Electronics, 9(9), 1363.
Oct 26 Project proposal due (11:59 pm)
Oct 28 12 Imitation Learning (slides) Ho, J., & Ermon, S. (2016). Generative adversarial imitation learning. In NeurIPS (pp. 4565-4573).
Torabi, F., Warnell, G., & Stone, P. (2018). Behavioral cloning from observation. In IJCAI (pp. 4950-4957).
Oct 31 13 Inverse RL (slides) Ziebart, B. D., Bagnell, J. A., & Dey, A. K. (2010). Modeling interaction via the principle of maximum causal entropy. In ICML.
Finn, C., Levine, S., & Abbeel, P. (2016). Guided cost learning: Deep inverse optimal control via policy optimization. In ICML (pp. 49-58).
Nov 2 Assignment 3 due (11:59 pm)
Nov 4 14 RL with Sequence Modeling (slides) Esslinger, Platt & Amato (2022). Deep Transformer Q-Networks for Partially Observable Reinforcement Learning. arXiv.
Chen et al. (2021). Decision transformer: Reinforcement learning via sequence modeling. NeurIPS, 34, 15084-15097.
Gu, Goel, & Ré (2022). Efficiently modeling long sequences with structured state spaces. ICLR.
Gu, Dao, Ermon, Rudra & Ré (2020). Hippo: Recurrent memory with optimal polynomial projections. NeurIPS, 33, 1474-1487.

Table of paper presentations

Date Presenter Discussants Topic Papers
Nov 7 Wen Cui Angelo Arvind Rajendram, Ahmed Hussein Salamah, Marty Mukherjee, Vijay Ravi, Christopher Risi RL for Math Alhussein Fawzi, Matej Balog, Aja Huang, Thomas Hubert, Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Francisco J. R. Ruiz, Julian Schrittwieser, Grzegorz Swirszcz, David Silver, Demis Hassabis & Pushmeet Kohli (2022), Discovering faster matrix multiplication algorithms with reinforcement learning, Nature volume 610, pages 47–53.
Nimmi Rashinika Weeraddana RL for Health Mariya Popova, Olexandr Isayev, Alexander Tropsha (2018), Deep reinforcement learning for de novo drug design, Science Advances, Vol 4, Issue 7.
Nov 11 Eli Henry Dykhne Sonja Linghui Shan, Yulin Xue, Christopher Risi, Stephanie Maaz, Remy El Sabeh, Vijay Ravi, Sherman Siu RL for Games Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap & David Silver (2020), Mastering Atari, Go, chess and shogi by planning with a learned model, Nature volume 588, pages 604–609.
Marty Mukherjee Shirley Chen, Stephanie Maaz, Edward Poon, Yandong Zhu, Vijay Ravi, Liwei Alan Wu, Siqing Huo RL for Games Linus Gisslén, Andy Eakins, Camilo Gordillo, Joakim Bergdahl, Konrad Tollmar (2021), Adversarial Reinforcement Learning for Procedural Content Generation, IEEE Conference on Games, pages 1-8.
Nov 14 Zheng Ma RL for Finance Zhengyao Jiang, Jinjun Liang (2017), Cryptocurrency Portfolio Management with Deep Reinforcement Learning, Intelligent Systems Conference.
Lufan Wang RL for Finance Berend Jelmer Dirk Gort, Xiao-Yang Liu, Xinghang Sun, Jiechao Gao, Shuaiyu Chen, and Christina Dan Wang (2022), Deep Reinforcement Learning for Cryptocurrency Trading: Practical Approach to Address Backtest Overfitting, In Proceedings of 3rd ACM International Conference on AI in Finance (ICAIF). ACM, New York, NY, USA, 9 pages.
Nov 18 Shirley Chen RL for Data Systems Ryan Marcus, Parimarjan Negi, Hongzi Mao, Chi Zhang, Mohammad Alizadeh, Tim Kraska, Olga Papaemmanouil, Nesime Tatbul (2019), Neo: A Learned Query Optimizer, PVLDB, 12(11): 1705-1718.
Yueheng Zhang RL for Optimization Iddo Drori, Anant Kharkar, William R. Sickinger, Brandon Kates, Qiang Ma, Suwen Ge, Eden Dolev, Brenda Dietrich, David P. Williamson, Madeleine Udell (2020), Learning to Solve Combinatorial Optimization Problems on Real-World Graphs in Linear Time, IEEE International Conference on Machine Learning and Applications (ICMLA).
Nov 21 Angelo Arvind Rajendram Credit Assignment in RL Thomas Mesnard, Théophane Weber, Fabio Viola, Shantanu Thakoor, Alaa Saade, Anna Harutyunyan, Will Dabney, Tom Stepleton, Nicolas Heess, Arthur Guez, Éric Moulines, Marcus Hutter, Lars Buesing, Rémi Munos (2021), Counterfactual Credit Assignment in Model-Free Reinforcement Learning, International Conference on Machine Learning, PMLR 139.
Chi-Chung Cheung Explainability Stratis Tsirtsis, Abir De, Manuel Gomez Rodriguez (2021), Counterfactual Explanations in Sequential Decision Making Under Uncertainty, NeurIPS.
Nov 25 Ipsita Mohanty RL in Software Verification Konstantin Böttinger, Patrice Godefroid, Rishabh Singh (2018), Deep Reinforcement Fuzzing, IEEE Symposium on Security and Privacy Workshops.
Brian Zimmerman RL for Recommender Systems Lixin Zou, Long Xia, Zhuoye Ding, Jiaxing Song, Weidong Liu, Dawei Yin (2019), Reinforcement Learning to Optimize Long-term User Engagement in Recommender Systems, KDD.
Nov 28 Kaixiang Zheng RL as sequence modeling Michael Janner, Qiyang Li, Sergey Levine (2021), Offline Reinforcement Learning as One Big Sequence Modeling Problem, NeurIPS.
Martin Ethier RL for Continual Learning Ju Xu, Zhanxing Zhu (2018), Reinforced Continual Learning, NeurIPS.
Dec 2 Ahmed Hussein Salamah RL for Computer Vision Li, H., Guo, Y., Wang, Z., Xia, S., & Zhu, W. (2019). AdaCompress: Adaptive compression for online computer vision services. In Proceedings of the 27th ACM International Conference on Multimedia (pp. 2440-2448).
Thomas Humphries RL for Traffic Control Tianshu Chu, Jie Wang, Lara Codecà, and Zhaojian Li (2020), Multi-Agent Deep Reinforcement Learning for Large-Scale Traffic Signal Control, IEEE Transactions on Intelligent Transportation Systems, Vol 21, No 3.
Dec 12 Project report due (11:59 pm)