CS885 Winter 2022 - Reinforcement Learning

The schedule below includes two tables: one for concepts (material taught by Pascal) and one for applications (papers presented by students). For the first table (concepts), you are expected to watch and understand the material covered in the slides and videos listed for each week by the end of that week. Readings are complementary and optional.

[SutBar] Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction (2nd edition, 2018) freely available online
[Sze] Csaba Szepesvari, Algorithms for Reinforcement Learning freely available online
[ZB] Alex Zai and Brandon Brown, Deep Reinforcement Learning in Action (2nd edition, 2020) freely available online
[GBC] Ian Goodfellow, Yoshua Bengio and Aaron Courville, Deep Learning (2016) freely available online
[L] Maxim Lapan, Deep Reinforcement Learning Hands-On (2020)
[GK] Laura Graesser and Wah Loon Keng, Foundations of Deep Reinforcement Learning: Theory and Practice in Python (2020)
[SigBuf] Olivier Sigaud and Olivier Buffet (editors), Markov Decision Processes in Artificial Intelligence (2013)
[Put] Martin L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming (2008)
[Ber] Dimitri P. Bertsekas, Dynamic Programming and Optimal Control (2017)
[Pow] Warren B. Powell, Approximate Dynamic Programming: Solving the Curses of Dimensionality (2015)
[RusNor] Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach (4th edition, 2020)

Table of online modules

Week	Module	Topic	Readings (textbooks)
Jan 3-7	2018: 1a	Course introduction (slides) (video)	[SutBar] Chap. 1, [Sze] Chap. 1
Jan 3-7	2018: 1b	Markov Processes (slides) (video)	[RusNor] Sec. 15.1
Jan 10-14	2018: 2a	Markov Decision Processes (slides) (video)	[SutBar] Chap. 3, [Sze] Chap. 2, [RusNor] Sec. 17.1-17.2, 17.4, [Put] Chap. 2, 4, 5
	2018: 2b	Value Iteration (slides) (video)	[SutBar] Sec. 4.1, 4.4, [Sze] Sec. 2.2, 2.3, [Put] Sec. 6.1-6.3, [SigBuf] Chap. 1
	2018: 3a	Policy Iteration (slides) (video)	[SutBar] Sec. 4.3, [Put] Sec. 6.4-6.5, [SigBuf] Sec. 1.6.2.3, [RusNor] Sec. 17.3
	2018: 3b	Introduction to RL (slides) (video)	[SutBar] Sec. 5.1-5.3, 6.1-6.3, 6.5, [Sze] Sec. 3.1, 4.3, [SigBuf] Sec. 2.1-2.5, [RusNor] Sec. 21.1-21.3
Jan 17-21	2018: 4a	Deep neural networks (slides) (video)	[GBC] Chap. 6, 7, 8
	2018: 4b	Deep Q-Networks (slides) (video)	[SutBar] Sec. 9.4, 9.7, [Sze] Sec. 4.3.2
	2018: 7a	Policy Gradient (slides) (video)	[SutBar] Sec. 13.1-13.3, 13.7, [SigBuf] Sec. 5.1-5.2, [RusNor] Sec. 21.5
	2018: 7b	Actor Critic (slides) (video)	[SutBar] Sec. 13.4-13.5, [Sze] Sec. 4.4, [SigBuf] Sec. 5.3
Jan 21	Assignment 1 due (11:59 pm)
Jan 24-28	2018: 14c	Trust Region Methods (slides) (video)	Nocedal and Wright, Numerical Optimization, Chapter 4
	2020: 1	Trust Region & Proximal Policy Optimization (slides) (video)	Schulman, Levine, Moritz, Jordan, Abbeel (2015) Trust Region Policy Optimization, ICML. Schulman, Wolski, Dhariwal, Radford, Klimov (2017) Proximal Policy Optimization Algorithms, arXiv.
	2020: 2	Maximum Entropy Reinforcement Learning (slides) (video)	Haarnoja, Tang, Abbeel, Levine (2017) Reinforcement Learning with Deep Energy-Based Policies, ICML. Haarnoja, Zhou, Abbeel, Levine (2018) Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, ICML.
	2018: 8a	Multi-armed bandits (slides) (video)	[SutBar] Sec. 2.1-2.7, [Sze] Sec. 4.2.1-4.2.2
	2018: 8b	Bayesian and contextual bandits (slides) (video)	[SutBar] Sec. 2.9
Jan 31 - Feb 4	2018: 9	Model-based RL (slides) (video)	[SutBar] Chap. 8
	2018: 10	Bayesian RL (slides) (video)	Michael O’Gordon Duff’s PhD Thesis (2002)
	2021: 4	Partially observable RL (slides) (video)	Hausknecht, M., & Stone, P. (2015). Deep recurrent Q-learning for partially observable MDPs. In AAAI Fall Symposium Series.
	2021: 5	Distributional RL (slides) (video)	Bellemare, M. G., Dabney, W., & Munos, R. (2017). A distributional perspective on reinforcement learning. In ICML.
Feb 4	Assignment 2 due (11:59 pm)
Feb 7-11	2020: 3	Imitation Learning (slides) (video)	Ho, J., & Ermon, S. (2016). Generative adversarial imitation learning. In NeurIPS (pp. 4565-4573). Torabi, F., Warnell, G., & Stone, P. (2018). Behavioral cloning from observation. In IJCAI (pp. 4950-4957).
	2021: 6	Inverse Reinforcement Learning (slides) (video)	Ziebart, B. D., Bagnell, J. A., & Dey, A. K. (2010). Modeling interaction via the principle of maximum causal entropy. In ICML. Finn, C., Levine, S., & Abbeel, P. (2016). Guided cost learning: Deep inverse optimal control via policy optimization. In ICML (pp. 49-58).
	2022: 7	Constrained RL
	2022: 8	Inverse Constrained RL
Feb 14	Bid for paper presentations due (11:59 pm)
Feb 16	Project proposal due (11:59 pm)
Feb 18	Assignment 3 due (11:59 pm)
Feb 21-25	Reading break

The following table provides the schedule for paper discussions. Each date refers to the online session when we will discuss that paper. Paper presentations consist of pre-recorded videos that can be watched at any time.

Table of paper discussions

Date	Presenter	Discussants	Topic	Papers
Feb 28	Jack Xu (slides) (video)	Roy Qu	Multi-Agent RL	Iqbal, S., & Sha, F. (2019, May). Actor-attention-critic for multi-agent reinforcement learning. In International Conference on Machine Learning (pp. 2961-2970). PMLR.
Feb 28	Yuxiang Huang (slides) (video)		Multi-Agent RL	Subramanian, S. G., Taylor, M. E., Crowley, M., & Poupart, P. (2022). Decentralized Mean Field Games. AAAI.
Mar 2	Shuhui Zhu (slides) (video)	William Loh, Amar Sarang	E-commerce	Cai, H., Ren, K., Zhang, W., Malialis, K., Wang, J., Yu, Y., & Guo, D. (2017, February). Real-time bidding by reinforcement learning in display advertising. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (pp. 661-670).
Mar 2	Jared Feng (slides) (video)	William Loh	E-commerce	Hu, Y., Da, Q., Zeng, A., Yu, Y., & Xu, Y. (2018, July). Reinforcement learning to rank in e-commerce search engine: Formalization, analysis, and application. In Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining (pp. 368-377).
Mar 7	Haoye Lu (slides) (video)	Michael Karras	Finance	Gašperov, B., Begušić, S., Posedel Šimović, P., & Kostanjčar, Z. (2021). Reinforcement Learning Approaches to Optimal Market Making. Mathematics, 9(21), 2689.
Mar 7	Erik Huebner (slides) (video)		Finance	Cao, J., Chen, J., Hull, J., & Poulos, Z. (2021). Deep hedging of derivatives using reinforcement learning. The Journal of Financial Data Science, 3(1), 10-27.
Mar 9	Iara Santelices (slides) (video)	Haoye Lu, Ahmad Rashid, Aaron Propp, Mohammad Aali, Daniel Herman	Multitask RL	Guo, Z. D., Pires, B. A., Piot, B., Grill, J. B., Altché, F., Munos, R., & Azar, M. G. (2020, November). Bootstrap latent-predictive representations for multitask reinforcement learning. In International Conference on Machine Learning (pp. 3875-3886). PMLR.
Mar 9	Rowan Dempster (slides) (video)	Joel Prabhu, Daniel Herman, Jarvis Xie, Rasoul Mahdavi, Mohammad Aali, Abdelrahman Ahmed, Abhinav Bora, Aayush Wadhwa, Amar Sarang, Roy Qu	Multitask RL	Yu, T., Quillen, D., He, Z., Julian, R., Hausman, K., Finn, C., & Levine, S. (2020, May). Meta-world: A benchmark and evaluation for multi-task and meta reinforcement learning. In Conference on Robot Learning (pp. 1094-1100). PMLR.
Mar 14	Wei Hu (slides) (video)	Daniel Herman, Youssef Fathi, Abdelrahman Ahmed, Tharindu Kodippili, Rasoul Mahdavi, Michael Karras, Helen Wu, Liam Hebert, Abhinav Bora, Roy Qu, Mohammad Khalaji, William Loh	Combinatorial Optimization	Nazari, M., Oroojlooy, A., Snyder, L., & Takáč, M. (2018). Reinforcement learning for solving the vehicle routing problem. Advances in neural information processing systems, 31.
Mar 14	Benyamin Jamialahmadi (slides) (video)	Youssef Fathi, Tharindu Kodippili, Aayush Wadhwa, Iara Santelices, Aaron Propp, Helen Wu, Liam Hebert	Biology	Angermueller, C., Dohan, D., Belanger, D., Deshpande, R., Murphy, K., & Colwell, L. (2020). Model-based reinforcement learning for biological sequence design. In International conference on learning representations.
Mar 16	William Dawkins (slides) (video)	Shuhui Zhu, Mohammad Aali, Jarvis Xie, Weijie Zhou, Wei Hu, Ahmad Rashid, Zhenyang Xu, Mattie Nejati, Rowan Dempster, Amar Sarang	Safe RL	Wen, L., Duan, J., Li, S. E., Xu, S., & Peng, H. (2020, September). Safe reinforcement learning for autonomous vehicles through parallel constrained policy optimization. In 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC) (pp. 1-7). IEEE.
Mar 16	Pouya Kananian (slides) (video)	Mohammad Aali, Seyed Shushtari, Jack Xu	Safe RL	Donti, P. L., Roderick, M., Fazlyab, M., & Kolter, J. Z. (2020, September). Enforcing robust control guarantees within neural network policies. In International Conference on Learning Representations.
Mar 21	Mattie Nejati (slides) (video)	Michael Karras, Abhinav Bora, Jeffery Liu, Weijie Zhou, Joel Prabhu, Yaoxin Li, Erik Huebner	Systems	Marcus, R., Negi, P., Mao, H., Tatbul, N., Alizadeh, M., & Kraska, T. (2021, June). Bao: Making learned query optimization practical. In Proceedings of the 2021 International Conference on Management of Data (pp. 1275-1288).
Mar 21	Mohammad Khalaji (slides) (video)	Weijie Zhou, Pouya Kananian	Systems	Yan, Z., Ge, J., Wu, Y., Li, L., & Li, T. (2020). Automatic virtual network embedding: A deep reinforcement learning approach with graph convolutional networks. IEEE Journal on Selected Areas in Communications, 38(6), 1040-1057.
Mar 23	Francis Kiwon (slides) (video)	Yuxiang Huang, Yaoxin Li, Erik Huebner, Jeffery Liu, Seyed Shushtari, Zhenyang Xu, Helen Wu	Explainability	Liu, G., Sun, X., Schulte, O., & Poupart, P. (2021). Learning Tree Interpretation from Object Representation for Deep Reinforcement Learning. Advances in Neural Information Processing Systems, 34.
Mar 23	Youssef Fathi (slides) (video)	Francis Kiwon, Ahmad Rashid, Yaoxin Li, Liam Hebert, Jarvis Xie, Joel Prabhu, Benyamin Jamialahmadi, Jeffery Liu, Weijie Zhou, William Dawkins, Aayush Wadhwa, Abdelrahman Ahmed, Seyed Shushtari, Rasoul Mahdavi, Tharindu Kodippili, Aaron Propp, Zhenyang Xu, Jared Feng, Helen Wu	Offline RL	Chen, L., Lu, K., Rajeswaran, A., Lee, K., Grover, A., Laskin, M., ... & Mordatch, I. (2021). Decision transformer: Reinforcement learning via sequence modeling. Advances in neural information processing systems, 34.
Apr 11	Project report due (11:59 pm)