CS885 Fall 2021 - Reinforcement Learning

The schedule below includes two tables: one for concepts (material taught by Pascal) and one for applications (papers presented by students). For the first table (concepts), you are expected to watch and understand the material covered in the slides and videos listed for each week by that week. Readings are complementary and optional.

Table of online modules

Week Module Topic Readings (textbooks)
Sept 6-10 2018: 1a Course introduction (slides) (video) [SutBar] Chapter 1, [Sze] Chapter 1
2018: 1b Markov Processes (slides) (video) [RusNor] Section 15.1
Sept 13-17 2018: 2a Markov Decision Processes (slides) (video) [SutBar] Chap. 3, [Sze] Chap. 2, [RusNor] Sec. 17.1-17.2, 17.4, [Put] Chap. 2, 4, 5
2018: 2b Value Iteration (slides) (video) [SutBar] Sec. 4.1, 4.4, [Sze] Sec. 2.2, 2.3, [Put] Sec. 6.1-6.3, [SigBuf] Chap. 1
2018: 3a Policy Iteration (slides) (video) [SutBar] Sec. 4.3, [Put] Sec. 6.4-6.5, [SigBuf] Sec., [RusNor] Sec. 17.3
2018: 3b Introduction to RL (slides) (video) [SutBar] Sec. 5.1-5.3, 6.1-6.3, 6.5, [Sze] Sec. 3.1, 4.3, [SigBuf] Sec. 2.1-2.5, [RusNor] Sec. 21.1-21.3
Sept 20-24 2018: 4a Deep neural networks (slides) (video) [GBC] Chap. 6, 7, 8
2018: 4b Deep Q-Networks (slides) (video) [SutBar] Sec. 9.4, 9.7, [Sze] Sec. 4.3.2
2018: 7a Policy Gradient (slides) (video) [SutBar] Sec. 13.1-13.3, 13.7, [SigBuf] Sec. 5.1-5.2, [RusNor] Sec. 21.5
2018: 7b Actor Critic (slides) (video) [SutBar] Sec. 13.4-13.5, [Sze] Sec. 4.4, [SigBuf] Sec. 5.3
Sept 24 Assignment 1 due (11:59 pm)
Sept 27 - Oct 1 2018: 14c Trust Region Methods (slides) (video) Nocedal and Wright, Numerical Optimization, Chapter 4
2020: 1 Trust Region & Proximal Policy Optimization (slides) (video) Schulman, Levine, Moritz, Jordan, Abbeel (2015) Trust Region Policy Optimization, ICML.
Schulman, Wolski, Dhariwal, Radford, Klimov (2017) Proximal Policy Optimization, arXiv.
2020: 2 Maximum Entropy Reinforcement Learning (slides (Slides 14, 18 corrected on Sept 27, 30)) (video) Haarnoja, Tang, Abbeel, Levine (2017) Reinforcement Learning with Deep Energy-Based Policies, ICML.
Haarnoja, Zhou, Abbeel, Levine (2018) Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, ICML.
2018: 8a Multi-armed bandits (slides) (video) [SutBar] Sec. 2.1-2.7, [Sze] Sec. 4.2.1-4.2.2
2018: 8b Bayesian and contextual bandits (slides) (video) [SutBar] Sec. 2.9
Oct 4-8 2018: 9 Model-based RL (slides) (video) [SutBar] Chap. 8
2018: 10 Bayesian RL (slides) (video) Michael O’Gordon Duff’s PhD Thesis (2002)
2021: 4 Partially observable RL (slides) (video) Hausknecht, M., & Stone, P. Deep recurrent Q-learning for partially observable MDPs. In 2015 AAAI fall symposium series.
2021: 5 Distributional RL (slides) (video) Bellemare, Marc G., Will Dabney, and Rémi Munos. A distributional perspective on reinforcement learning. International Conference on Machine Learning. 2017.
Oct 8 Assignment 2 due (11:59 pm)
Oct 11-15 Reading break
Oct 18-22 2020: 3 Imitation Learning (slides) (video) Ho, J., & Ermon, S. (2016). Generative adversarial imitation learning. In NeurIPS (pp. 4565-4573).
Torabi, F., Warnell, G., & Stone, P. (2018). Behavioral cloning from observation. In IJCAI (pp. 4950-4957).
2021: 6 Inverse Reinforcement Learning (slides) (video) Ziebart, B. D., Bagnell, J. A., & Dey, A. K. (2010). Modeling interaction via the principle of maximum causal entropy. In ICML.
Finn, C., Levine, S., & Abbeel, P. (2016). Guided cost learning: Deep inverse optimal control via policy optimization. In ICML (pp. 49-58).
Oct 25 Bid for paper presentations due (11:59 pm)
Oct 27 Assignment 3 due (11:59 pm)
Nov 3 Project proposal due (11:59 pm)

The following table provides the schedule for paper discussions. Each date refers to the online session in which we will discuss that paper. Paper presentations consist of pre-recorded videos that can be watched at any time.

Table of paper discussions

Date Presenter Discussants Topic Papers
Nov 8 Rory Soiffer (slides) (video) Theo Vanderkooy, Logan Mosier, Muhammad Hassan, Kanav Mehra, Newsha Seyedi, Muhammad Sulaiman, Amin Khodaee, Gin Suarez, Anbo Wang Multi-agent RL Lowe, Ryan, et al. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments. Advances in Neural Information Processing Systems 30 (2017): 6379-6390.
Muhammad Hassan (slides) (video) Yiping Wang, Blake Paul Allen Vanberlo, Partha Chakraborty, Leonard Zhao Multi-agent RL He, He, et al. Opponent modelling in deep reinforcement learning. International conference on machine learning. PMLR, 2016.
Nov 10 Lucas Fenaux (slides) (video) Elbert Lai, Joe Sun RL for Finance Li, Yuxi, Csaba Szepesvari, and Dale Schuurmans. Learning exercise policies for American options. Artificial Intelligence and Statistics. PMLR, 2009.
Xinyi Yan (slides) (video) Wei Zhong, Muhammad Sulaiman RL for Autonomous Driving Wu, Zheng, et al. Efficient sampling-based maximum entropy inverse reinforcement learning with application to autonomous driving. IEEE Robotics and Automation Letters 5.4 (2020): 5355-5362.
Nov 15 Arash Moayyedi (slides) (video) Xiaoyu Wen, Yuan Chen, Hongfeng Huang, Navid Malekghaini, Gengyi Sun, Nanda Kishore Sreenivas, Niloy Saha, Joseph Musleh, Wei Zhong, Shadi Ghasemitaheri RL for Recommender Systems Zhao, Xiangyu, et al. Deep reinforcement learning for page-wise recommendations. Proceedings of the 12th ACM Conference on Recommender Systems. 2018.
Mohammad Zangooei (slides) (video) Connor Stewart, Leonard Zhao, Hongfeng Huang, Gengyi Sun, Cameron Seth, Xuejun Du, Nanda Kishore Sreenivas, Kanav Mehra, Niloy Saha, Soheil Johari, Haudi Ghiassi Nejad, Renee Leung, Soroosh Baselizadeh, Shadi Ghasemitaheri, Anbo Wang, Navid Malekghaini RL for Recommender Systems Zheng, Guanjie, et al. DRN: A deep reinforcement learning framework for news recommendation. Proceedings of the 2018 World Wide Web Conference. 2018.
Nov 17 Joel Rorseth (slides) (video) Xiaoyu Wen, Aruth Kandage, Yongqiang Tian RL for Computer Systems Mirhoseini, Azalia, et al. Device placement optimization with reinforcement learning. International Conference on Machine Learning. PMLR, 2017.
Aruth Kandage (slides) (video) Theo Vanderkooy, Ende Jin, Haudi Ghiassi Nejad RL for Computer Systems Mao, Hongzi, et al. Learning scheduling algorithms for data processing clusters. Proceedings of the ACM Special Interest Group on Data Communication. 2019. 270-288.
Nov 22 Yun Zhi (Judy) Lin (slides) (video) Benjamin Therien, Theo Vanderkooy, Mohammad Dehghan, Yuan Chen, Jianlin Li, Navid Malekghaini, Xueyan Zhang, Elbert Lai, Arash Moayyedi, Haudi Ghiassi Nejad, Wei Zhong, Lucas Fenaux, Shadi Ghasemitaheri, Dheeraj Vagavolu, Soroosh Baselizadeh RL for Energy Lazic, Nevena, et al. Data center cooling using model-predictive control. Advances in Neural Information Processing Systems 31 (2018): 3814-3823.
Alexander James (slides) (video) Jianlin Li, Niloy Saha, Benjamin Therien RL for Energy Chung, Hwei-Ming, et al. Distributed deep reinforcement learning for intelligent load scheduling in residential smart grids. IEEE Transactions on Industrial Informatics 17.4 (2020): 2752-2763.
Nov 24 Partha Chakraborty (slides) (video) Nanda Kishore Sreenivas, Yiping Wang, Blake Paul Allen Vanberlo, Sherman Siu, Cameron Seth, Yonghan Yu, Gustavo Sutter Pessurno De Carvalho, Lena Podina, Gin Suarez, Alexander Michael James, Leonard Zhao RL for Healthcare Komorowski, Matthieu, et al. The artificial intelligence clinician learns optimal treatment strategies for sepsis in intensive care. Nature medicine 24.11 (2018): 1716-1720.
Gustavo Sutter (slides) (video) Blake Paul Allen Vanberlo, Connor Stewart, Yiping Wang, Renee Leung, Aref Jafari, Mohammad Dehghan, Mohsin Hasan, Xuejun Du, Yue Lyu, Judy Lin, Partha Chakraborty, Joel Rorseth, Lena Podina, Anbo Wang, Junteng Zheng, Yongqiang Tian, Muhammad Sulaiman, Soroosh Baselizadeh, Nicole Yan, Wei Zhong, Kanav Mehra RL for Healthcare Li, Yuan, et al. Hybrid Retrieval-Generation Reinforced Agent for Medical Image Report Generation. NeurIPS. 2018.
Nov 29 Xueyan Zhang (slides) (video) Steven Lawrence, Yue Lyu, Gin Suarez, Daniel Herman, Xiaoyu Wen, Reza Bigdeli, Gengyi Sun, Deepak Singh Kalhan, Shuyang Zhang, Soheil Johari, Yuan Chen, Anthony Boyko, Dheeraj Vagavolu, Luke Rowe, Renee Leung, Yonghan Yu, Hossam ElAtali, YanTing Miao, Niloy Saha, Junteng Zheng RL for Robotics Haarnoja, Tuomas, et al. Composable deep reinforcement learning for robotic manipulation. 2018 IEEE international conference on robotics and automation (ICRA). IEEE, 2018.
Steven Lawrence (slides) (video) Junteng Zheng, Wei Pang, Wei Zhong, Deepak Singh Kalhan, Logan Mosier, Aref Jafari, Elbert Lai, Xuejun Du, Dheeraj Vagavolu, YanTing Miao, Hossam ElAtali, Joseph Musleh RL for Robotics Zhang, Marvin, et al. Solar: Deep structured representations for model-based reinforcement learning. International Conference on Machine Learning. PMLR, 2019.
Dec 1 Newsha Seyedi (slides) (video) Wei Pang, Joe Sun, Ende Jin, Sherman Siu, Deepak Singh Kalhan, Mohammad Dehghan, Aref Jafari, Reza Bigdeli, Anthony Boyko, Cameron Seth, Yue Lyu, Lena Podina, Partha Chakraborty, Joseph Musleh, Yonghan Yu, Shuyang Zhang, Luke Rowe, Hossam ElAtali, Logan Mosier, Yongqiang Tian, Amin Khodaee RL for Combinatorial Optimization Huang, Jiayi, Mostofa Patwary, and Gregory Diamos. Coloring big graphs with AlphaGoZero. arXiv preprint arXiv:1902.10162 (2019).
Mohsin Hasan (slides) (video) Benjamin Therien, YanTing Miao, Joe Sun, Ende Jin, Sherman Siu, Rory Soiffer, Reza Bigdeli, Shuyang Zhang, Deepak Singh Kalhan, Hongfeng Huang, Soheil Johari, Mohammad Zangooei, Luke Rowe, Anthony Boyko, Amin Khodaee, Jianlin Li, Wei Pang, Connor Stewart RL for Combinatorial Optimization Tang, Yunhao, Shipra Agrawal, and Yuri Faenza. Reinforcement learning for integer programming: Learning to cut. International Conference on Machine Learning. PMLR, 2020.
Dec 10 Project report due (11:59 pm)