The schedule below includes two tables: one for concepts (material taught by Pascal) and one for applications (papers presented by students). For the first table (concepts), you are expected to watch the videos and understand the material covered in the slides listed for each week by the end of that week. The readings are complementary and optional.
Table of online modules
| Date | Module | Topic | Readings |
|---|---|---|---|
| Sept 6-10 | 2018: 1a | Course introduction (slides) (video) | [SutBar] Chap. 1, [Sze] Chap. 1 |
| | 2018: 1b | Markov Processes (slides) (video) | [RusNor] Sec. 15.1 |
| Sept 13-17 | 2018: 2a | Markov Decision Processes (slides) (video) | [SutBar] Chap. 3, [Sze] Chap. 2, [RusNor] Sec. 17.1-17.2, 17.4, [Put] Chap. 2, 4, 5 |
| | 2018: 2b | Value Iteration (slides) (video) | [SutBar] Sec. 4.1, 4.4, [Sze] Sec. 2.2, 2.3, [Put] Sec. 6.1-6.3, [SigBuf] Chap. 1 |
| | 2018: 3a | Policy Iteration (slides) (video) | [SutBar] Sec. 4.3, [Put] Sec. 6.4-6.5, [SigBuf], [RusNor] Sec. 17.3 |
| | 2018: 3b | Introduction to RL (slides) (video) | [SutBar] Sec. 5.1-5.3, 6.1-6.3, 6.5, [Sze] Sec. 3.1, 4.3, [SigBuf] Sec. 2.1-2.5, [RusNor] Sec. 21.1-21.3 |
| Sept 20-24 | 2018: 4a | Deep neural networks (slides) (video) | [GBC] Chap. 6, 7, 8 |
| | 2018: 4b | Deep Q-Networks (slides) (video) | [SutBar] Sec. 9.4, 9.7, [Sze] Sec. 4.3.2 |
| | 2018: 7a | Policy Gradient (slides) (video) | [SutBar] Sec. 13.1-13.3, 13.7, [SigBuf] Sec. 5.1-5.2, [RusNor] Sec. 21.5 |
| | 2018: 7b | Actor Critic (slides) (video) | [SutBar] Sec. 13.4-13.5, [Sze] Sec. 4.4, [SigBuf] Sec. 5.3 |
| Sept 24 | | Assignment 1 due (11:59 pm) | |
| Sept 27 - Oct 1 | 2018: 14c | Trust Region Methods (slides) (video) | Nocedal and Wright, Numerical Optimization, Chap. 4 |
| | 2020: 1 | Trust Region & Proximal Policy Optimization (slides) (video) | Schulman, Levine, Moritz, Jordan, Abbeel (2015) Trust Region Policy Optimization, ICML; Schulman, Wolski, Dhariwal, Radford, Klimov (2017) Proximal Policy Optimization, arXiv |
| | 2020: 2 | Maximum Entropy Reinforcement Learning (slides; slides 14 and 18 corrected on Sept 27 and 30) (video) | Haarnoja, Tang, Abbeel, Levine (2017) Reinforcement Learning with Deep Energy-Based Policies, ICML; Haarnoja, Zhou, Abbeel, Levine (2018) Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, ICML |
| | 2018: 8a | Multi-armed bandits (slides) (video) | [SutBar] Sec. 2.1-2.7, [Sze] Sec. 4.2.1-4.2.2 |
| | 2018: 8b | Bayesian and contextual bandits (slides) (video) | [SutBar] Sec. 2.9 |
| Oct 4-8 | 2018: 9 | Model-based RL (slides) (video) | [SutBar] Chap. 8 |
| | 2018: 10 | Bayesian RL (slides) (video) | Michael O'Gordon Duff's PhD thesis (2002) |
| | 2021: 4 | Partially observable RL (slides) (video) | Hausknecht, M., & Stone, P. Deep recurrent Q-learning for partially observable MDPs. In 2015 AAAI Fall Symposium Series. |
| | 2021: 5 | Distributional RL (slides) (video) | Bellemare, Marc G., Will Dabney, and Rémi Munos. A distributional perspective on reinforcement learning. International Conference on Machine Learning. 2017. |
| Oct 8 | | Assignment 2 due (11:59 pm) | |
| Oct 11-15 | | Reading break | |
| Oct 18-22 | 2020: 3 | Imitation Learning (slides) (video) | Ho, J., & Ermon, S. (2016). Generative adversarial imitation learning. In NeurIPS (pp. 4565-4573); Torabi, F., Warnell, G., & Stone, P. (2018). Behavioral cloning from observation. In IJCAI (pp. 4950-4957) |
| | 2021: 6 | Inverse Reinforcement Learning (slides) (video) | Ziebart, B. D., Bagnell, J. A., & Dey, A. K. (2010). Modeling interaction via the principle of maximum causal entropy. In ICML; Finn, C., Levine, S., & Abbeel, P. (2016). Guided cost learning: Deep inverse optimal control via policy optimization. In ICML (pp. 49-58) |
| Oct 25 | | Bid for paper presentations due (11:59 pm) | |
| Oct 27 | | Assignment 3 due (11:59 pm) | |
| Nov 3 | | Project proposal due (11:59 pm) | |
The following table provides the schedule for paper discussions. Each date refers to the online session in which we will discuss that paper. Paper presentations consist of pre-recorded videos that can be watched at any time.
Table of paper discussions
| Date | Presenter | Discussants | Topic | Paper |
|---|---|---|---|---|
| Nov 8 | Rory Soiffer (slides) (video) | Theo Vanderkooy, Logan Mosier, Muhammad Hassan, Kanav Mehra, Newsha Seyedi, Muhammad Sulaiman, Amin Khodaee, Gin Suarez, Anbo Wang | Multi-agent RL | Lowe, Ryan, et al. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments. Advances in Neural Information Processing Systems 30 (2017): 6379-6390. |
| | Muhammad Hassan (slides) (video) | Yiping Wang, Blake Paul Allen Vanberlo, Partha Chakraborty, Leonard Zhao | Multi-agent RL | He, He, et al. Opponent modeling in deep reinforcement learning. International Conference on Machine Learning. PMLR, 2016. |
| Nov 10 | Lucas Fenaux (slides) (video) | Elbert Lai, Joe Sun | RL for Finance | Li, Yuxi, Csaba Szepesvari, and Dale Schuurmans. Learning exercise policies for American options. Artificial Intelligence and Statistics. PMLR, 2009. |
| | Xinyi Yan (slides) (video) | Wei Zhong, Muhammad Sulaiman | RL for Autonomous Driving | Wu, Zheng, et al. Efficient sampling-based maximum entropy inverse reinforcement learning with application to autonomous driving. IEEE Robotics and Automation Letters 5.4 (2020): 5355-5362. |
| Nov 15 | Arash Moayyedi (slides) (video) | Xiaoyu Wen, Yuan Chen, Hongfeng Huang, Navid Malekghaini, Gengyi Sun, Nanda Kishore Sreenivas, Niloy Saha, Joseph Musleh, Wei Zhong, Shadi Ghasemitaheri | RL for Recommender Systems | Zhao, Xiangyu, et al. Deep reinforcement learning for page-wise recommendations. Proceedings of the 12th ACM Conference on Recommender Systems. 2018. |
| | Mohammad Zangooei (slides) (video) | Connor Stewart, Leonard Zhao, Hongfeng Huang, Gengyi Sun, Cameron Seth, Xuejun Du, Nanda Kishore Sreenivas, Kanav Mehra, Niloy Saha, Soheil Johari, Haudi Ghiassi Nejad, Renee Leung, Soroosh Baselizadeh, Shadi Ghasemitaheri, Anbo Wang, Navid Malekghaini | RL for Recommender Systems | Zheng, Guanjie, et al. DRN: A deep reinforcement learning framework for news recommendation. Proceedings of the 2018 World Wide Web Conference. 2018. |
| Nov 17 | Joel Rorseth (slides) (video) | Xiaoyu Wen, Aruth Kandage, Yongqiang Tian | RL for Computer Systems | Mirhoseini, Azalia, et al. Device placement optimization with reinforcement learning. International Conference on Machine Learning. PMLR, 2017. |
| | Aruth Kandage (slides) (video) | Theo Vanderkooy, Ende Jin, Haudi Ghiassi Nejad | RL for Computer Systems | Mao, Hongzi, et al. Learning scheduling algorithms for data processing clusters. Proceedings of the ACM Special Interest Group on Data Communication. 2019. 270-288. |
| Nov 22 | Yun Zhi (Judy) Lin (slides) (video) | Benjamin Therien, Theo Vanderkooy, Mohammad Dehghan, Yuan Chen, Jianlin Li, Navid Malekghaini, Xueyan Zhang, Elbert Lai, Arash Moayyedi, Haudi Ghiassi Nejad, Wei Zhong, Lucas Fenaux, Shadi Ghasemitaheri, Dheeraj Vagavolu, Soroosh Baselizadeh | RL for Energy | Lazic, Nevena, et al. Data center cooling using model-predictive control. Advances in Neural Information Processing Systems 31 (2018): 3814-3823. |
| | Alexander James (slides) (video) | Jianlin Li, Niloy Saha, Benjamin Therien | RL for Energy | Chung, Hwei-Ming, et al. Distributed deep reinforcement learning for intelligent load scheduling in residential smart grids. IEEE Transactions on Industrial Informatics 17.4 (2020): 2752-2763. |
| Nov 24 | Partha Chakraborty (slides) (video) | Nanda Kishore Sreenivas, Yiping Wang, Blake Paul Allen Vanberlo, Sherman Siu, Cameron Seth, Yonghan Yu, Gustavo Sutter Pessurno De Carvalho, Lena Podina, Gin Suarez, Alexander Michael James, Leonard Zhao | RL for Healthcare | Komorowski, Matthieu, et al. The artificial intelligence clinician learns optimal treatment strategies for sepsis in intensive care. Nature Medicine 24.11 (2018): 1716-1720. |
| | Gustavo Sutter (slides) (video) | Blake Paul Allen Vanberlo, Connor Stewart, Yiping Wang, Renee Leung, Aref Jafari, Mohammad Dehghan, Mohsin Hasan, Xuejun Du, Yue Lyu, Judy Lin, Partha Chakraborty, Joel Rorseth, Lena Podina, Anbo Wang, Junteng Zheng, Yongqiang Tian, Muhammad Sulaiman, Soroosh Baselizadeh, Nicole Yan, Wei Zhong, Kanav Mehra | RL for Healthcare | Li, Yuan, et al. Hybrid Retrieval-Generation Reinforced Agent for Medical Image Report Generation. NeurIPS. 2018. |
| Nov 29 | Xueyan Zhang (slides) (video) | Steven Lawrence, Yue Lyu, Gin Suarez, Daniel Herman, Xiaoyu Wen, Reza Bigdeli, Gengyi Sun, Deepak Singh Kalhan, Shuyang Zhang, Soheil Johari, Yuan Chen, Anthony Boyko, Dheeraj Vagavolu, Luke Rowe, Renee Leung, Yonghan Yu, Hossam ElAtali, YanTing Miao, Niloy Saha, Junteng Zheng | RL for Robotics | Haarnoja, Tuomas, et al. Composable deep reinforcement learning for robotic manipulation. 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018. |
| | Steven Lawrence (slides) (video) | Junteng Zheng, Wei Pang, Wei Zhong, Deepak Singh Kalhan, Logan Mosier, Aref Jafari, Elbert Lai, Xuejun Du, Dheeraj Vagavolu, YanTing Miao, Hossam ElAtali, Joseph Musleh | RL for Robotics | Zhang, Marvin, et al. SOLAR: Deep structured representations for model-based reinforcement learning. International Conference on Machine Learning. PMLR, 2019. |
| Dec 1 | Newsha Seyedi (slides) (video) | Wei Pang, Joe Sun, Ende Jin, Sherman Siu, Deepak Singh Kalhan, Mohammad Dehghan, Aref Jafari, Reza Bigdeli, Anthony Boyko, Cameron Seth, Yue Lyu, Lena Podina, Partha Chakraborty, Joseph Musleh, Yonghan Yu, Shuyang Zhang, Luke Rowe, Hossam ElAtali, Logan Mosier, Yongqiang Tian, Amin Khodaee | RL for Combinatorial Optimization | Huang, Jiayi, Mostofa Patwary, and Gregory Diamos. Coloring big graphs with AlphaGoZero. arXiv preprint arXiv:1902.10162 (2019). |
| | Mohsin Hasan (slides) (video) | Benjamin Therien, YanTing Miao, Joe Sun, Ende Jin, Sherman Siu, Rory Soiffer, Reza Bigdeli, Shuyang Zhang, Deepak Singh Kalhan, Hongfeng Huang, Soheil Johari, Mohammad Zangooei, Luke Rowe, Anthony Boyko, Amin Khodaee, Jianlin Li, Wei Pang, Connor Stewart | RL for Combinatorial Optimization | Tang, Yunhao, Shipra Agrawal, and Yuri Faenza. Reinforcement learning for integer programming: Learning to cut. International Conference on Machine Learning. PMLR, 2020. |
| Dec 10 | Project report due (11:59 pm) | | | |