The schedule below includes two tables: one for concepts (material taught by Pascal) and one for applications (papers presented by students). For the first table (concepts), you are expected to watch the videos and understand the material covered in the slides listed for each week by the end of that week. The readings are complementary and optional.
Table of online modules
| Date | Module | Topic | Readings |
|---|---|---|---|
| Sept 6-10 | 2018: 1a | Course introduction (slides) (video) | [SutBar] Chap. 1, [Sze] Chap. 1 |
| | 2018: 1b | Markov Processes (slides) (video) | [RusNor] Sec. 15.1 |
| Sept 13-17 | 2018: 2a | Markov Decision Processes (slides) (video) | [SutBar] Chap. 3, [Sze] Chap. 2, [RusNor] Sec. 17.1-17.2, 17.4, [Put] Chap. 2, 4, 5 |
| | 2018: 2b | Value Iteration (slides) (video) | [SutBar] Sec. 4.1, 4.4, [Sze] Sec. 2.2, 2.3, [Put] Sec. 6.1-6.3, [SigBuf] Chap. 1 |
| | 2018: 3a | Policy Iteration (slides) (video) | [SutBar] Sec. 4.3, [Put] Sec. 6.4-6.5, [SigBuf], [RusNor] Sec. 17.3 |
| | 2018: 3b | Introduction to RL (slides) (video) | [SutBar] Sec. 5.1-5.3, 6.1-6.3, 6.5, [Sze] Sec. 3.1, 4.3, [SigBuf] Sec. 2.1-2.5, [RusNor] Sec. 21.1-21.3 |
| Sept 20-24 | 2018: 4a | Deep neural networks (slides) (video) | [GBC] Chap. 6, 7, 8 |
| | 2018: 4b | Deep Q-Networks (slides) (video) | [SutBar] Sec. 9.4, 9.7, [Sze] Sec. 4.3.2 |
| | 2018: 7a | Policy Gradient (slides) (video) | [SutBar] Sec. 13.1-13.3, 13.7, [SigBuf] Sec. 5.1-5.2, [RusNor] Sec. 21.5 |
| | 2018: 7b | Actor Critic (slides) (video) | [SutBar] Sec. 13.4-13.5, [Sze] Sec. 4.4, [SigBuf] Sec. 5.3 |
| Sept 24 | | Assignment 1 due (11:59 pm) | |
| Sept 27 - Oct 1 | 2018: 14c | Trust Region Methods (slides) (video) | Nocedal and Wright, Numerical Optimization, Chap. 4 |
| | 2020: 1 | Trust Region & Proximal Policy Optimization (slides) (video) | Schulman, Levine, Moritz, Jordan, Abbeel (2015) Trust Region Policy Optimization, ICML; Schulman, Wolski, Dhariwal, Radford, Klimov (2017) Proximal Policy Optimization, arXiv |
| | 2020: 2 | Maximum Entropy Reinforcement Learning (slides; slides 14 and 18 corrected on Sept 27 and 30) (video) | Haarnoja, Tang, Abbeel, Levine (2017) Reinforcement Learning with Deep Energy-Based Policies, ICML; Haarnoja, Zhou, Abbeel, Levine (2018) Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, ICML |
| | 2018: 8a | Multi-armed bandits (slides) (video) | [SutBar] Sec. 2.1-2.7, [Sze] Sec. 4.2.1-4.2.2 |
| | 2018: 8b | Bayesian and contextual bandits (slides) (video) | [SutBar] Sec. 2.9 |
| Oct 4-8 | 2018: 9 | Model-based RL (slides) (video) | [SutBar] Chap. 8 |
| | 2018: 10 | Bayesian RL (slides) (video) | Michael O'Gordon Duff's PhD thesis (2002) |
| | 2021: 4 | Partially observable RL (slides) (video) | Hausknecht, M., & Stone, P. Deep recurrent Q-learning for partially observable MDPs. In 2015 AAAI Fall Symposium Series. |
| | 2021: 5 | Distributional RL (slides) (video) | Bellemare, Marc G., Will Dabney, and Rémi Munos. A distributional perspective on reinforcement learning. International Conference on Machine Learning. 2017. |
| Oct 8 | | Assignment 2 due (11:59 pm) | |
| Oct 11-15 | | Reading break | |
| Oct 18-22 | 2020: 3 | Imitation Learning (slides) (video) | Ho, J., & Ermon, S. (2016). Generative adversarial imitation learning. In NeurIPS (pp. 4565-4573); Torabi, F., Warnell, G., & Stone, P. (2018). Behavioral cloning from observation. In IJCAI (pp. 4950-4957) |
| | 2021: 6 | Inverse Reinforcement Learning (slides) (video) | Ziebart, B. D., Bagnell, J. A., & Dey, A. K. (2010). Modeling interaction via the principle of maximum causal entropy. In ICML; Finn, C., Levine, S., & Abbeel, P. (2016). Guided cost learning: Deep inverse optimal control via policy optimization. In ICML (pp. 49-58) |
| Oct 25 | | Bid for paper presentations due (11:59 pm) | |
| Oct 27 | | Assignment 3 due (11:59 pm) | |
| Nov 3 | | Project proposal due (11:59 pm) | |
The following table provides the schedule for paper discussions. Each date refers to the online session in which we will discuss that paper. Paper presentations consist of pre-recorded videos that can be watched at any time.
Table of paper discussions
| Date | Presenter | Discussants | Topic | Paper |
|---|---|---|---|---|
| Nov 8 | Rory Soiffer (slides) (video) | Theo Vanderkooy, Logan Mosier, Muhammad Hassan, Kanav Mehra, Newsha Seyedi, Muhammad Sulaiman, Amin Khodaee, Gin Suarez, Anbo Wang | Multi-agent RL | Lowe, Ryan, et al. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments. Advances in Neural Information Processing Systems 30 (2017): 6379-6390. |
| | Muhammad Hassan (slides) (video) | Yiping Wang, Blake Paul Allen Vanberlo, Partha Chakraborty, Leonard Zhao | Multi-agent RL | He, He, et al. Opponent modeling in deep reinforcement learning. International Conference on Machine Learning. PMLR, 2016. |
| Nov 10 | Lucas Fenaux (slides) (video) | Elbert Lai, Joe Sun | RL for Finance | Li, Yuxi, Csaba Szepesvari, and Dale Schuurmans. Learning exercise policies for American options. Artificial Intelligence and Statistics. PMLR, 2009. |
| | Xinyi Yan (slides) (video) | Wei Zhong, Muhammad Sulaiman | RL for Autonomous Driving | Wu, Zheng, et al. Efficient sampling-based maximum entropy inverse reinforcement learning with application to autonomous driving. IEEE Robotics and Automation Letters 5.4 (2020): 5355-5362. |
| Nov 15 | Arash Moayyedi (slides) (video) | Xiaoyu Wen, Yuan Chen, Hongfeng Huang, Navid Malekghaini, Gengyi Sun, Nanda Kishore Sreenivas, Niloy Saha, Joseph Musleh, Wei Zhong, Shadi Ghasemitaheri | RL for Recommender Systems | Zhao, Xiangyu, et al. Deep reinforcement learning for page-wise recommendations. Proceedings of the 12th ACM Conference on Recommender Systems. 2018. |
| | Mohammad Zangooei (slides) (video) | Connor Stewart, Leonard Zhao, Hongfeng Huang, Gengyi Sun, Cameron Seth, Xuejun Du, Nanda Kishore Sreenivas, Kanav Mehra, Niloy Saha, Soheil Johari, Haudi Ghiassi Nejad, Renee Leung, Soroosh Baselizadeh, Shadi Ghasemitaheri, Anbo Wang, Navid Malekghaini | RL for Recommender Systems | Zheng, Guanjie, et al. DRN: A deep reinforcement learning framework for news recommendation. Proceedings of the 2018 World Wide Web Conference. 2018. |
| Nov 17 | Joel Rorseth (slides) (video) | Xiaoyu Wen, Aruth Kandage, Yongqiang Tian | RL for Computer Systems | Mirhoseini, Azalia, et al. Device placement optimization with reinforcement learning. International Conference on Machine Learning. PMLR, 2017. |
| | Aruth Kandage (slides) (video) | Theo Vanderkooy, Ende Jin, Haudi Ghiassi Nejad | RL for Computer Systems | Mao, Hongzi, et al. Learning scheduling algorithms for data processing clusters. Proceedings of the ACM Special Interest Group on Data Communication. 2019. 270-288. |
| Nov 22 | Yun Zhi (Judy) Lin (slides) (video) | Benjamin Therien, Theo Vanderkooy, Mohammad Dehghan, Yuan Chen, Jianlin Li, Navid Malekghaini, Xueyan Zhang, Elbert Lai, Arash Moayyedi, Haudi Ghiassi Nejad, Wei Zhong, Lucas Fenaux, Shadi Ghasemitaheri, Dheeraj Vagavolu, Soroosh Baselizadeh | RL for Energy | Lazic, Nevena, et al. Data center cooling using model-predictive control. Advances in Neural Information Processing Systems 31 (2018): 3814-3823. |
| | Alexander James (slides) (video) | Jianlin Li, Niloy Saha, Benjamin Therien | RL for Energy | Chung, Hwei-Ming, et al. Distributed deep reinforcement learning for intelligent load scheduling in residential smart grids. IEEE Transactions on Industrial Informatics 17.4 (2020): 2752-2763. |
| Nov 24 | Partha Chakraborty (slides) (video) | Nanda Kishore Sreenivas, Yiping Wang, Blake Paul Allen Vanberlo, Sherman Siu, Cameron Seth, Yonghan Yu, Gustavo Sutter Pessurno De Carvalho, Lena Podina, Gin Suarez, Alexander Michael James, Leonard Zhao | RL for Healthcare | Komorowski, Matthieu, et al. The artificial intelligence clinician learns optimal treatment strategies for sepsis in intensive care. Nature Medicine 24.11 (2018): 1716-1720. |
| | Gustavo Sutter (slides) (video) | Blake Paul Allen Vanberlo, Connor Stewart, Yiping Wang, Renee Leung, Aref Jafari, Mohammad Dehghan, Mohsin Hasan, Xuejun Du, Yue Lyu, Judy Lin, Partha Chakraborty, Joel Rorseth, Lena Podina, Anbo Wang, Junteng Zheng, Yongqiang Tian, Muhammad Sulaiman, Soroosh Baselizadeh, Nicole Yan, Wei Zhong, Kanav Mehra | RL for Healthcare | Li, Yuan, et al. Hybrid Retrieval-Generation Reinforced Agent for Medical Image Report Generation. NeurIPS. 2018. |
| Nov 29 | Xueyan Zhang (slides) (video) | Steven Lawrence, Yue Lyu, Gin Suarez, Daniel Herman, Xiaoyu Wen, Reza Bigdeli, Gengyi Sun, Deepak Singh Kalhan, Shuyang Zhang, Soheil Johari, Yuan Chen, Anthony Boyko, Dheeraj Vagavolu, Luke Rowe, Renee Leung, Yonghan Yu, Hossam ElAtali, YanTing Miao, Niloy Saha, Junteng Zheng | RL for Robotics | Haarnoja, Tuomas, et al. Composable deep reinforcement learning for robotic manipulation. 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018. |
| | Steven Lawrence (slides) (video) | Junteng Zheng, Wei Pang, Wei Zhong, Deepak Singh Kalhan, Logan Mosier, Aref Jafari, Elbert Lai, Xuejun Du, Dheeraj Vagavolu, YanTing Miao, Hossam ElAtali, Joseph Musleh | RL for Robotics | Zhang, Marvin, et al. SOLAR: Deep structured representations for model-based reinforcement learning. International Conference on Machine Learning. PMLR, 2019. |
| Dec 1 | Newsha Seyedi (slides) (video) | Wei Pang, Joe Sun, Ende Jin, Sherman Siu, Deepak Singh Kalhan, Mohammad Dehghan, Aref Jafari, Reza Bigdeli, Anthony Boyko, Cameron Seth, Yue Lyu, Lena Podina, Partha Chakraborty, Joseph Musleh, Yonghan Yu, Shuyang Zhang, Luke Rowe, Hossam ElAtali, Logan Mosier, Yongqiang Tian, Amin Khodaee | RL for Combinatorial Optimization | Huang, Jiayi, Mostofa Patwary, and Gregory Diamos. Coloring big graphs with AlphaGoZero. arXiv preprint arXiv:1902.10162 (2019). |
| | Mohsin Hasan (slides) (video) | Benjamin Therien, YanTing Miao, Joe Sun, Ende Jin, Sherman Siu, Rory Soiffer, Reza Bigdeli, Shuyang Zhang, Deepak Singh Kalhan, Hongfeng Huang, Soheil Johari, Mohammad Zangooei, Luke Rowe, Anthony Boyko, Amin Khodaee, Jianlin Li, Wei Pang, Connor Stewart | RL for Combinatorial Optimization | Tang, Yunhao, Shipra Agrawal, and Yuri Faenza. Reinforcement learning for integer programming: Learning to cut. International Conference on Machine Learning. PMLR, 2020. |
| Dec 10 | Project report due (11:59 pm) | | | |