
CS885 Spring 2020 - Reinforcement Learning

The schedule below includes two tables: one for concepts (material taught by Pascal) and one for applications (papers presented by students). For the first table (concepts), you are expected to watch and understand the material covered in the slides and videos listed for each week by that week. Readings are complementary and optional.

Table of online modules

Week Module Topic Readings (textbooks)
May 11-15 2020: Logistics & Website Welcome & Logistics (video)
Goals (video)
Textbook (video)
Schedule (video)
Marks (video)
Assignments (video)
Critiques (video)
Presentation (video)
Project (video)
2018: 1a Course introduction (slides) (video) [SutBar] Chapter 1, [Sze] Chapter 1
2018: 1b Markov Processes (slides) (video) [RusNor] Section 15.1
2018: 2a Markov Decision Processes (slides) (video) [SutBar] Chap. 3, [Sze] Chap. 2, [RusNor] Sec. 17.1-17.2, 17.4, [Put] Chap. 2, 4, 5
2018: 2b Value Iteration (slides) (video) [SutBar] Sec. 4.1, 4.4, [Sze] Sec. 2.2, 2.3, [Put] Sec. 6.1-6.3, [SigBuf] Chap. 1
May 18-22 2018: 3a Policy Iteration (slides) (video) [SutBar] Sec. 4.3, [Put] Sec. 6.4-6.5, [SigBuf] Sec. 1.6.2.3, [RusNor] Sec. 17.3
2018: 3b Introduction to RL (slides) (video) [SutBar] Sec. 5.1-5.3, 6.1-6.3, 6.5, [Sze] Sec. 3.1, 4.3, [SigBuf] Sec. 2.1-2.5, [RusNor] Sec. 21.1-21.3
2018: 4a Deep neural networks (slides) (video) [GBC] Chap. 6, 7, 8
2018: 4b Deep Q-Networks (slides) (video) [SutBar] Sec. 9.4, 9.7, [Sze] Sec. 4.3.2
May 25-29 2018: 7a Policy Gradient (slides (slides 8 and 9 revised June 11)) (video) [SutBar] Sec. 13.1-13.3, 13.7, [SigBuf] Sec. 5.1-5.2, [RusNor] Sec. 21.5
2018: 7b Actor Critic (slides) (video) [SutBar] Sec. 13.4-13.5, [Sze] Sec. 4.4, [SigBuf] Sec. 5.3
2018: 14c Trust Region Methods (slides) (video) Nocedal and Wright, Numerical Optimization, Chapter 4
2020: 1 Trust Region & Proximal Policy Optimization (slides) (video) Schulman, Levine, Moritz, Jordan, Abbeel (2015) Trust Region Policy Optimization, ICML.
Schulman, Wolski, Dhariwal, Radford, Klimov (2017) Proximal Policy Optimization, arXiv.
2020: 2 Maximum Entropy Reinforcement Learning (slides (slides 14 and 22 modified June 23)) (video) Haarnoja, Tang, Abbeel, Levine (2017) Reinforcement Learning with Deep Energy-Based Policies, ICML.
Haarnoja, Zhou, Abbeel, Levine (2018) Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, ICML.
May 29 Assignment 1 due (11:59 pm)
June 1-5 2018: 8a Multi-armed bandits (slides) (video) [SutBar] Sec. 2.1-2.7, [Sze] Sec. 4.2.1-4.2.2
2018: 8b Bayesian and contextual bandits (slides) (video) [SutBar] Sec. 2.9
2018: 9 Model-based RL (slides) (video) [SutBar] Chap. 8
2018: 10 Bayesian RL (slides) (video) Michael O’Gordon Duff’s PhD Thesis (2002)
June 8-12 2018: 11a Hidden Markov models (slides) (video) [RusNor] Sec. 15.3, [SutBar] Sec. 17.3
2018: 11b Partially observable RL (slides) (video) [RusNor] Sec. 17.3, [SigBuf] Chap. 7
2018: 12 Deep recurrent Q-networks (slides) (video) [GBC] Chap. 10
2020: 3 Imitation Learning (slides) (video) Ho, J., & Ermon, S. (2016). Generative adversarial imitation learning. In NeurIPS (pp. 4565-4573).
Torabi, F., Warnell, G., & Stone, P. (2018). Behavioral cloning from observation. In IJCAI (pp. 4950-4957).
June 24 Assignment 2 due (11:59 pm)
June 26 Project proposal due (11:59 pm)

The following table provides the schedule for paper presentations. The list of papers and assigned presenters will be determined by the end of May.

Table of paper presentations

Week Presenter Topic Papers
June 29 - July 3 Siliang Huang (slides) (video) RL for video games Goel, V., Weng, J., & Poupart, P. (2018). Unsupervised video object segmentation for deep reinforcement learning. In Advances in Neural Information Processing Systems (pp. 5683-5694).
Runsheng Guo (slides) (video) RL for video games Justesen, N., Bontrager, P., Togelius, J., & Risi, S. (2019). Deep learning for video game playing. IEEE Transactions on Games.
Enamul Haque (slides) (video) RL for query optimization in data systems Krishnan, S., Yang, Z., Goldberg, K., Hellerstein, J., & Stoica, I. (2018). Learning to optimize join queries with deep reinforcement learning. arXiv preprint arXiv:1808.03196.
Upside down RL Srivastava, R. K., Shyam, P., Mutz, F., Jaśkowski, W., & Schmidhuber, J. (2019). Training Agents using Upside-Down Reinforcement Learning. arXiv preprint arXiv:1912.02877.
July 6-10 Samuel Yigzaw (slides) (video) Hierarchical RL Nachum, O., Gu, S. S., Lee, H., & Levine, S. (2018). Data-efficient hierarchical reinforcement learning. In Advances in Neural Information Processing Systems (pp. 3303-3313).
David Thomas Radke (slides) (video) Hierarchical RL Wang, X., Chen, W., Wu, J., Wang, Y. F., & Yang Wang, W. (2018). Video captioning via hierarchical reinforcement learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 4213-4222).
Scott Larter (slides) (video) RL for robotics Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., & Abbeel, P. (2018, May). Overcoming exploration in reinforcement learning with demonstrations. In 2018 IEEE International Conference on Robotics and Automation (ICRA) (pp. 6292-6299). IEEE.
Egill Gudmundsson (slides) (video) RL for robotics Rakelly, K., Zhou, A., Quillen, D., Finn, C., & Levine, S. (2019). Efficient off-policy meta-reinforcement learning via probabilistic context variables. arXiv preprint arXiv:1903.08254.
July 13-17 Laura Graves (slides) (video) RL for autonomous vehicles You, C., Lu, J., Filev, D., & Tsiotras, P. (2019). Advanced planning for autonomous vehicles using reinforcement learning and deep inverse reinforcement learning. Robotics and Autonomous Systems, 114, 1-18.
Neel Bhatt (slides) (video) RL for autonomous vehicles Zhang, P., Xiong, L., Yu, Z., Fang, P., Yan, S., Yao, J., & Zhou, Y. (2019). Reinforcement Learning-Based End-to-End Parking for Automatic Parking System. Sensors, 19(18), 3996.
Hytham Farah (slides) (video) RL for conversational agents Zhao, T., & Eskenazi, M. (2016, September). Towards End-to-End Learning for Dialog State Tracking and Management using Deep Reinforcement Learning. In Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue (pp. 1-10).
Mojtaba Valipour (slides) (video) RL for conversational agents Zhao, T., Xie, K., & Eskenazi, M. (2019, June). Rethinking Action Spaces for Reinforcement Learning in End-to-end Dialog Agents with Latent Variable Models. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (pp. 1208-1218).
July 20-24 Saif Zabarah (slides) (video) RL for finance Nevmyvaka, Y., Feng, Y., & Kearns, M. (2006, June). Reinforcement learning for optimized trade execution. In Proceedings of the 23rd International Conference on Machine Learning (pp. 673-680).
Zhongwen Zhang (slides) (video) RL for healthcare Futoma, J., Hughes, M. C., & Doshi-Velez, F. (2020). POPCORN: Partially Observed Prediction COnstrained ReiNforcement Learning. arXiv preprint arXiv:2001.04032.
Yan Shi (slides) (video) RL for combinatorial optimization Bello, I., Pham, H., Le, Q. V., Norouzi, M., & Bengio, S. (2016). Neural combinatorial optimization with reinforcement learning. arXiv preprint arXiv:1611.09940.
Lizhe Chen (slides) (video) RL for neural architecture search Zoph, B., Vasudevan, V., Shlens, J., & Le, Q. V. (2018). Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 8697-8710).
August 9 Project report due (11:59 pm)