The schedule below includes two tables: one for concepts (material taught by Pascal) and one for applications (papers presented by students). For the first table (concepts), you are expected to watch the videos and understand the material covered in the slides listed for each week by the end of that week. The readings are complementary and optional.
Table of online modules
Week | Module | Topic | Readings (textbooks) |
---|---|---|---|
Jan 3-7 | 2018: 1a | Course introduction (slides) (video) | [SutBar] Chap. 1, [Sze] Chap. 1
 | 2018: 1b | Markov Processes (slides) (video) | [RusNor] Sec. 15.1
Jan 10-14 | 2018: 2a | Markov Decision Processes (slides) (video) | [SutBar] Chap. 3, [Sze] Chap. 2, [RusNor] Sec. 17.1-17.2, 17.4, [Put] Chap. 2, 4, 5 |
 | 2018: 2b | Value Iteration (slides) (video) | [SutBar] Sec. 4.1, 4.4, [Sze] Sec. 2.2, 2.3, [Put] Sec. 6.1-6.3, [SigBuf] Chap. 1
 | 2018: 3a | Policy Iteration (slides) (video) | [SutBar] Sec. 4.3, [Put] Sec. 6.4-6.5, [SigBuf] Sec. 1.6.2.3, [RusNor] Sec. 17.3
 | 2018: 3b | Introduction to RL (slides) (video) | [SutBar] Sec. 5.1-5.3, 6.1-6.3, 6.5, [Sze] Sec. 3.1, 4.3, [SigBuf] Sec. 2.1-2.5, [RusNor] Sec. 21.1-21.3
Jan 17-21 | 2018: 4a | Deep neural networks (slides) (video) | [GBC] Chap. 6, 7, 8 |
 | 2018: 4b | Deep Q-Networks (slides) (video) | [SutBar] Sec. 9.4, 9.7, [Sze] Sec. 4.3.2
 | 2018: 7a | Policy Gradient (slides) (video) | [SutBar] Sec. 13.1-13.3, 13.7, [SigBuf] Sec. 5.1-5.2, [RusNor] Sec. 21.5
 | 2018: 7b | Actor Critic (slides) (video) | [SutBar] Sec. 13.4-13.5, [Sze] Sec. 4.4, [SigBuf] Sec. 5.3
Jan 21 | Assignment 1 due (11:59 pm) | |
Jan 24-28 | 2018: 14c | Trust Region Methods (slides) (video) | Nocedal and Wright, Numerical Optimization, Chap. 4
 | 2020: 1 | Trust Region & Proximal Policy Optimization (slides) (video) | Schulman, J., Levine, S., Moritz, P., Jordan, M., & Abbeel, P. (2015). Trust region policy optimization. In ICML. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. arXiv.
 | 2020: 2 | Maximum Entropy Reinforcement Learning (slides) (video) | Haarnoja, T., Tang, H., Abbeel, P., & Levine, S. (2017). Reinforcement learning with deep energy-based policies. In ICML. Haarnoja, T., Zhou, A., Abbeel, P., & Levine, S. (2018). Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In ICML.
 | 2018: 8a | Multi-armed bandits (slides) (video) | [SutBar] Sec. 2.1-2.7, [Sze] Sec. 4.2.1-4.2.2
 | 2018: 8b | Bayesian and contextual bandits (slides) (video) | [SutBar] Sec. 2.9
Jan 31 - Feb 4 | 2018: 9 | Model-based RL (slides) (video) | [SutBar] Chap. 8 |
 | 2018: 10 | Bayesian RL (slides) (video) | Duff, M. O. (2002). Optimal Learning: Computational Procedures for Bayes-Adaptive Markov Decision Processes. PhD thesis, University of Massachusetts Amherst.
 | 2021: 4 | Partially observable RL (slides) (video) | Hausknecht, M., & Stone, P. (2015). Deep recurrent Q-learning for partially observable MDPs. In AAAI Fall Symposium Series.
 | 2021: 5 | Distributional RL (slides) (video) | Bellemare, M. G., Dabney, W., & Munos, R. (2017). A distributional perspective on reinforcement learning. In International Conference on Machine Learning.
Feb 4 | Assignment 2 due (11:59 pm) | |
Feb 7-11 | 2020: 3 | Imitation Learning (slides) (video) | Ho, J., & Ermon, S. (2016). Generative adversarial imitation learning. In NeurIPS (pp. 4565-4573). Torabi, F., Warnell, G., & Stone, P. (2018). Behavioral cloning from observation. In IJCAI (pp. 4950-4957). |
 | 2021: 6 | Inverse Reinforcement Learning (slides) (video) | Ziebart, B. D., Bagnell, J. A., & Dey, A. K. (2010). Modeling interaction via the principle of maximum causal entropy. In ICML. Finn, C., Levine, S., & Abbeel, P. (2016). Guided cost learning: Deep inverse optimal control via policy optimization. In ICML (pp. 49-58).
 | 2022: 7 | Constrained RL |
 | 2022: 8 | Inverse Constrained RL |
Feb 14 | Bid for paper presentations due (11:59 pm) | |
Feb 16 | Project proposal due (11:59 pm) | |
Feb 18 | Assignment 3 due (11:59 pm) | |
Feb 21-25 | Reading break | |
The following table provides the schedule for paper discussions. Each date refers to the online session in which the corresponding papers will be discussed. Paper presentations consist of pre-recorded videos that can be watched at any time.
Table of paper discussions
Date | Presenter | Discussants | Topic | Papers |
---|---|---|---|---|
Feb 28 | Jack Xu (slides) (video) | Roy Qu | Multi-Agent RL | Iqbal, S., & Sha, F. (2019, May). Actor-attention-critic for multi-agent reinforcement learning. In International Conference on Machine Learning (pp. 2961-2970). PMLR. |
 | Yuxiang Huang (slides) (video) | | Multi-Agent RL | Subramanian, S. G., Taylor, M. E., Crowley, M., & Poupart, P. (2022). Decentralized Mean Field Games. AAAI.
Mar 2 | Shuhui Zhu (slides) (video) | William Loh, Amar Sarang | E-commerce | Cai, H., Ren, K., Zhang, W., Malialis, K., Wang, J., Yu, Y., & Guo, D. (2017, February). Real-time bidding by reinforcement learning in display advertising. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (pp. 661-670). |
 | Jared Feng (slides) (video) | William Loh | E-commerce | Hu, Y., Da, Q., Zeng, A., Yu, Y., & Xu, Y. (2018, July). Reinforcement learning to rank in e-commerce search engine: Formalization, analysis, and application. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 368-377).
Mar 7 | Haoye Lu (slides) (video) | Michael Karras | Finance | Gašperov, B., Begušić, S., Posedel Šimović, P., & Kostanjčar, Z. (2021). Reinforcement Learning Approaches to Optimal Market Making. Mathematics, 9(21), 2689. |
 | Erik Huebner (slides) (video) | | Finance | Cao, J., Chen, J., Hull, J., & Poulos, Z. (2021). Deep hedging of derivatives using reinforcement learning. The Journal of Financial Data Science, 3(1), 10-27.
Mar 9 | Iara Santelices (slides) (video) | Haoye Lu, Ahmad Rashid, Aaron Propp, Mohammad Aali, Daniel Herman | Multitask RL | Guo, Z. D., Pires, B. A., Piot, B., Grill, J. B., Altché, F., Munos, R., & Azar, M. G. (2020, November). Bootstrap latent-predictive representations for multitask reinforcement learning. In International Conference on Machine Learning (pp. 3875-3886). PMLR. |
 | Rowan Dempster (slides) (video) | Joel Prabhu, Daniel Herman, Jarvis Xie, Rasoul Mahdavi, Mohammad Aali, Abdelrahman Ahmed, Abhinav Bora, Aayush Wadhwa, Amar Sarang, Roy Qu | Multitask RL | Yu, T., Quillen, D., He, Z., Julian, R., Hausman, K., Finn, C., & Levine, S. (2020, May). Meta-world: A benchmark and evaluation for multi-task and meta reinforcement learning. In Conference on Robot Learning (pp. 1094-1100). PMLR.
Mar 14 | Wei Hu (slides) (video) | Daniel Herman, Youssef Fathi, Abdelrahman Ahmed, Tharindu Kodippili, Rasoul Mahdavi, Michael Karras, Helen Wu, Liam Hebert, Abhinav Bora, Roy Qu, Mohammad Khalaji, William Loh | Combinatorial Optimization | Nazari, M., Oroojlooy, A., Snyder, L., & Takác, M. (2018). Reinforcement learning for solving the vehicle routing problem. Advances in Neural Information Processing Systems, 31.
 | Benyamin Jamialahmadi (slides) (video) | Youssef Fathi, Tharindu Kodippili, Aayush Wadhwa, Iara Santelices, Aaron Propp, Helen Wu, Liam Hebert | Biology | Angermueller, C., Dohan, D., Belanger, D., Deshpande, R., Murphy, K., & Colwell, L. (2020). Model-based reinforcement learning for biological sequence design. In International Conference on Learning Representations.
Mar 16 | William Dawkins (slides) (video) | Shuhui Zhu, Mohammad Aali, Jarvis Xie, Weijie Zhou, Wei Hu, Ahmad Rashid, Zhenyang Xu, Mattie Nejati, Rowan Dempster, Amar Sarang | Safe RL | Wen, L., Duan, J., Li, S. E., Xu, S., & Peng, H. (2020, September). Safe reinforcement learning for autonomous vehicles through parallel constrained policy optimization. In 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC) (pp. 1-7). IEEE.
 | Pouya Kananian (slides) (video) | Mohammad Aali, Seyed Shushtari, Jack Xu | Safe RL | Donti, P. L., Roderick, M., Fazlyab, M., & Kolter, J. Z. (2020, September). Enforcing robust control guarantees within neural network policies. In International Conference on Learning Representations.
Mar 21 | Mattie Nejati (slides) (video) | Michael Karras, Abhinav Bora, Jeffery Liu, Weijie Zhou, Joel Prabhu, Yaoxin Li, Erik Huebner | Systems | Marcus, R., Negi, P., Mao, H., Tatbul, N., Alizadeh, M., & Kraska, T. (2021, June). Bao: Making learned query optimization practical. In Proceedings of the 2021 International Conference on Management of Data (pp. 1275-1288). |
 | Mohammad Khalaji (slides) (video) | Weijie Zhou, Pouya Kananian | Systems | Yan, Z., Ge, J., Wu, Y., Li, L., & Li, T. (2020). Automatic virtual network embedding: A deep reinforcement learning approach with graph convolutional networks. IEEE Journal on Selected Areas in Communications, 38(6), 1040-1057.
Mar 23 | Francis Kiwon (slides) (video) | Yuxiang Huang, Yaoxin Li, Erik Huebner, Jeffery Liu, Seyed Shushtari, Zhenyang Xu, Helen Wu | Explainability | Liu, G., Sun, X., Schulte, O., & Poupart, P. (2021). Learning Tree Interpretation from Object Representation for Deep Reinforcement Learning. Advances in Neural Information Processing Systems, 34. |
 | Youssef Fathi (slides) (video) | Francis Kiwon, Ahmad Rashid, Yaoxin Li, Liam Hebert, Jarvis Xie, Joel Prabhu, Benyamin Jamialahmadi, Jeffery Liu, Weijie Zhou, William Dawkins, Aayush Wadhwa, Abdelrahman Ahmed, Seyed Shushtari, Rasoul Mahdavi, Tharindu Kodippili, Aaron Propp, Zhenyang Xu, Jared Feng, Helen Wu | Offline RL | Chen, L., Lu, K., Rajeswaran, A., Lee, K., Grover, A., Laskin, M., ... & Mordatch, I. (2021). Decision transformer: Reinforcement learning via sequence modeling. Advances in Neural Information Processing Systems, 34.
Apr 11 | Project report due (11:59 pm) | | |