The schedule below includes two tables: one for concepts (material taught by Pascal) and one for applications (papers presented by students). For the first table (concepts), you are expected to watch the videos and understand the material covered in the slides listed for each week by the end of that week. The readings are complementary and optional.
Table of online modules
| Week | Module | Topic | Readings (textbooks) |
|---|---|---|---|
| May 11-15 | 2020: Logistics & Website | Welcome & Logistics (video), Goals (video), Textbook (video), Schedule (video), Marks (video), Assignments (video), Critiques (video), Presentation (video), Project (video) | |
| | 2018: 1a | Course introduction (slides) (video) | [SutBar] Chap. 1, [Sze] Chap. 1 |
| | 2018: 1b | Markov Processes (slides) (video) | [RusNor] Sec. 15.1 |
| | 2018: 2a | Markov Decision Processes (slides) (video) | [SutBar] Chap. 3, [Sze] Chap. 2, [RusNor] Sec. 17.1-17.2, 17.4, [Put] Chap. 2, 4, 5 |
| | 2018: 2b | Value Iteration (slides) (video) (a minimal code sketch follows this table) | [SutBar] Sec. 4.1, 4.4, [Sze] Sec. 2.2-2.3, [Put] Sec. 6.1-6.3, [SigBuf] Chap. 1 |
| May 18-22 | 2018: 3a | Policy Iteration (slides) (video) | [SutBar] Sec. 4.3, [Put] Sec. 6.4-6.5, [SigBuf] Sec. 1.6.2.3, [RusNor] Sec. 17.3 |
| | 2018: 3b | Introduction to RL (slides) (video) | [SutBar] Sec. 5.1-5.3, 6.1-6.3, 6.5, [Sze] Sec. 3.1, 4.3, [SigBuf] Sec. 2.1-2.5, [RusNor] Sec. 21.1-21.3 |
| | 2018: 4a | Deep neural networks (slides) (video) | [GBC] Chap. 6-8 |
| | 2018: 4b | Deep Q-Networks (slides) (video) | [SutBar] Sec. 9.4, 9.7, [Sze] Sec. 4.3.2 |
| May 25-29 | 2018: 7a | Policy Gradient (slides; slides 8-9 revised June 11) (video) | [SutBar] Sec. 13.1-13.3, 13.7, [SigBuf] Sec. 5.1-5.2, [RusNor] Sec. 21.5 |
| | 2018: 7b | Actor Critic (slides) (video) | [SutBar] Sec. 13.4-13.5, [Sze] Sec. 4.4, [SigBuf] Sec. 5.3 |
| | 2018: 14c | Trust Region Methods (slides) (video) | Nocedal & Wright, Numerical Optimization, Chap. 4 |
| | 2020: 1 | Trust Region & Proximal Policy Optimization (slides) (video) | Schulman, Levine, Moritz, Jordan, Abbeel (2015). Trust Region Policy Optimization. ICML. Schulman, Wolski, Dhariwal, Radford, Klimov (2017). Proximal Policy Optimization Algorithms. arXiv. |
| | 2020: 2 | Maximum Entropy Reinforcement Learning (slides; slides 14 and 22 modified June 23) (video) | Haarnoja, Tang, Abbeel, Levine (2017). Reinforcement Learning with Deep Energy-Based Policies. ICML. Haarnoja, Zhou, Abbeel, Levine (2018). Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. ICML. |
| May 29 | | Assignment 1 due (11:59 pm) | |
| June 1-5 | 2018: 8a | Multi-armed bandits (slides) (video) | [SutBar] Sec. 2.1-2.7, [Sze] Sec. 4.2.1-4.2.2 |
| | 2018: 8b | Bayesian and contextual bandits (slides) (video) | [SutBar] Sec. 2.9 |
| | 2018: 9 | Model-based RL (slides) (video) | [SutBar] Chap. 8 |
| | 2018: 10 | Bayesian RL (slides) (video) | Michael O’Gordon Duff’s PhD thesis (2002) |
| June 8-12 | 2018: 11a | Hidden Markov models (slides) (video) | [RusNor] Sec. 15.3, [SutBar] Sec. 17.3 |
| | 2018: 11b | Partially observable RL (slides) (video) | [RusNor] Sec. 17.3, [SigBuf] Chap. 7 |
| | 2018: 12 | Deep recurrent Q-networks (slides) (video) | [GBC] Chap. 10 |
| | 2020: 3 | Imitation Learning (slides) (video) | Ho, J., & Ermon, S. (2016). Generative adversarial imitation learning. In NeurIPS (pp. 4565-4573). Torabi, F., Warnell, G., & Stone, P. (2018). Behavioral cloning from observation. In IJCAI (pp. 4950-4957). |
| June 24 | | Assignment 2 due (11:59 pm) | |
| June 26 | | Project proposal due (11:59 pm) | |
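The first two weeks build up to exact dynamic-programming solutions of MDPs (modules 2a, 2b, and 3a). As a reference point, here is a minimal value-iteration sketch in Python; the toy 2-state, 2-action MDP, its rewards, and the discount factor are made-up illustrative assumptions, not an example taken from the course materials.

```python
import numpy as np

# Illustrative only: a minimal value-iteration sketch for the tabular MDPs
# covered in modules 2a/2b. The toy MDP below is a made-up assumption.

# P[a][s][s']: probability of reaching state s' when taking action a in state s.
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.0, 1.0]]])
# R[s][a]: immediate reward for taking action a in state s.
R = np.array([[0.0, 1.0],
              [2.0, 0.0]])

gamma = 0.95        # discount factor
V = np.zeros(2)     # value estimates, initialized to zero

# Repeat Bellman optimality backups until the values stop changing.
for _ in range(10_000):
    Q = R + gamma * np.einsum('ast,t->sa', P, V)  # Q[s][a]
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

policy = Q.argmax(axis=1)  # greedy policy w.r.t. the converged values
print("V* =", V, "greedy policy:", policy)
```

Policy iteration (module 3a) solves the same problem but alternates full policy evaluation with greedy policy improvement instead of backing up values directly.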
The following table provides the schedule for paper presentations. The list of papers and assigned presenters will be determined by the end of May.
Table of paper presentations
| Week | Presenter | Topic | Papers |
|---|---|---|---|
| June 29 - July 3 | Siliang Huang (slides) (video) | RL for video games | Goel, V., Weng, J., & Poupart, P. (2018). Unsupervised video object segmentation for deep reinforcement learning. In Advances in Neural Information Processing Systems (pp. 5683-5694). |
| | Runsheng Guo (slides) (video) | RL for video games | Justesen, N., Bontrager, P., Togelius, J., & Risi, S. (2019). Deep learning for video game playing. IEEE Transactions on Games. |
| | Enamul Haque (slides) (video) | RL for query optimization in data systems | Krishnan, S., Yang, Z., Goldberg, K., Hellerstein, J., & Stoica, I. (2018). Learning to optimize join queries with deep reinforcement learning. arXiv preprint arXiv:1808.03196. |
| | | Upside-down RL | Srivastava, R. K., Shyam, P., Mutz, F., Jaśkowski, W., & Schmidhuber, J. (2019). Training agents using upside-down reinforcement learning. arXiv preprint arXiv:1912.02877. |
| July 6-10 | Samuel Yigzaw (slides) (video) | Hierarchical RL | Nachum, O., Gu, S. S., Lee, H., & Levine, S. (2018). Data-efficient hierarchical reinforcement learning. In Advances in Neural Information Processing Systems (pp. 3303-3313). |
| | David Thomas Radke (slides) (video) | Hierarchical RL | Wang, X., Chen, W., Wu, J., Wang, Y. F., & Wang, W. Y. (2018). Video captioning via hierarchical reinforcement learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 4213-4222). |
| | Scott Larter (slides) (video) | RL for robotics | Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., & Abbeel, P. (2018, May). Overcoming exploration in reinforcement learning with demonstrations. In 2018 IEEE International Conference on Robotics and Automation (ICRA) (pp. 6292-6299). IEEE. |
| | Egill Gudmundsson (slides) (video) | RL for robotics | Rakelly, K., Zhou, A., Quillen, D., Finn, C., & Levine, S. (2019). Efficient off-policy meta-reinforcement learning via probabilistic context variables. arXiv preprint arXiv:1903.08254. |
| July 13-17 | Laura Graves (slides) (video) | RL for autonomous vehicles | You, C., Lu, J., Filev, D., & Tsiotras, P. (2019). Advanced planning for autonomous vehicles using reinforcement learning and deep inverse reinforcement learning. Robotics and Autonomous Systems, 114, 1-18. |
| | Neel Bhatt (slides) (video) | RL for autonomous vehicles | Zhang, P., Xiong, L., Yu, Z., Fang, P., Yan, S., Yao, J., & Zhou, Y. (2019). Reinforcement learning-based end-to-end parking for automatic parking system. Sensors, 19(18), 3996. |
| | Hytham Farah (slides) (video) | RL for conversational agents | Zhao, T., & Eskenazi, M. (2016, September). Towards end-to-end learning for dialog state tracking and management using deep reinforcement learning. In Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue (pp. 1-10). |
| | Mojtaba Valipour (slides) (video) | RL for conversational agents | Zhao, T., Xie, K., & Eskenazi, M. (2019, June). Rethinking action spaces for reinforcement learning in end-to-end dialog agents with latent variable models. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (pp. 1208-1218). |
| July 20-24 | Saif Zabarah (slides) (video) | RL for finance | Nevmyvaka, Y., Feng, Y., & Kearns, M. (2006, June). Reinforcement learning for optimized trade execution. In Proceedings of the 23rd International Conference on Machine Learning (pp. 673-680). |
| | Zhongwen Zhang (slides) (video) | RL for healthcare | Futoma, J., Hughes, M. C., & Doshi-Velez, F. (2020). POPCORN: Partially Observed Prediction COnstrained ReiNforcement Learning. arXiv preprint arXiv:2001.04032. |
| | Yan Shi (slides) (video) | RL for combinatorial optimization | Bello, I., Pham, H., Le, Q. V., Norouzi, M., & Bengio, S. (2016). Neural combinatorial optimization with reinforcement learning. arXiv preprint arXiv:1611.09940. |
| | Lizhe Chen (slides) (video) | RL for neural architecture search | Zoph, B., Vasudevan, V., Shlens, J., & Le, Q. V. (2018). Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 8697-8710). |
| August 9 | | Project report due (11:59 pm) | |