The schedule below includes two tables: one for concepts (material taught by Pascal) and one for applications (papers presented by students). For the first table (concepts), you are expected to watch and understand the material covered in the slides and videos listed for each week by that week. Readings are complementary and optional.
Table of online modules
Week  Module  Topic  Readings (textbooks) 

May 11-15  2020: Logistics & Website  Welcome & Logistics (video) Goals (video) Textbook (video) Schedule (video) Marks (video) Assignments (video) Critiques (video) Presentation (video) Project (video) 

2018: 1a  Course introduction (slides) (video)  [SutBar] Chapter 1, [Sze] Chapter 1  
2018: 1b  Markov Processes (slides) (video)  [RusNor] Section 15.1  
2018: 2a  Markov Decision Processes (slides) (video)  [SutBar] Chap. 3, [Sze] Chap. 2, [RusNor] Sec. 17.1-17.2, 17.4, [Put] Chap. 2, 4, 5  
2018: 2b  Value Iteration (slides) (video)  [SutBar] Sec. 4.1, 4.4, [Sze] Sec. 2.2, 2.3, [Put] Sec. 6.1-6.3, [SigBuf] Chap. 1  
May 18-22  2018: 3a  Policy Iteration (slides) (video)  [SutBar] Sec. 4.3, [Put] Sec. 6.4-6.5, [SigBuf] Sec. 1.6.2.3, [RusNor] Sec. 17.3 
2018: 3b  Introduction to RL (slides) (video)  [SutBar] Sec. 5.1-5.3, 6.1-6.3, 6.5, [Sze] Sec. 3.1, 4.3, [SigBuf] Sec. 2.1-2.5, [RusNor] Sec. 21.1-21.3  
2018: 4a  Deep neural networks (slides) (video)  [GBC] Chap. 6, 7, 8  
2018: 4b  Deep Q-Networks (slides) (video)  [SutBar] Sec. 9.4, 9.7, [Sze] Sec. 4.3.2  
May 25-29  2018: 7a  Policy Gradient (slides (Slides 8, 9 revised June 11)) (video)  [SutBar] Sec. 13.1-13.3, 13.7, [SigBuf] Sec. 5.1-5.2, [RusNor] Sec. 21.5 
2018: 7b  Actor Critic (slides) (video)  [SutBar] Sec. 13.4-13.5, [Sze] Sec. 4.4, [SigBuf] Sec. 5.3  
2018: 14c  Trust Region Methods (slides) (video)  Nocedal and Wright, Numerical Optimization, Chapter 4  
2020: 1  Trust Region & Proximal Policy Optimization (slides) (video)  Schulman, Levine, Moritz, Jordan, Abbeel (2015) Trust Region Policy Optimization, ICML. Schulman, Wolski, Dhariwal, Radford, Klimov (2017) Proximal Policy Optimization, arXiv. 

2020: 2  Maximum Entropy Reinforcement Learning (slides (slides 14 and 22 modified June 23)) (video)  Haarnoja, Tang, Abbeel, Levine (2017) Reinforcement Learning with Deep Energy-Based Policies, ICML. Haarnoja, Zhou, Abbeel, Levine (2018) Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, ICML. 

May 29  Assignment 1 due (11:59 pm)  
June 1-5  2018: 8a  Multi-armed bandits (slides) (video)  [SutBar] Sec. 2.1-2.7, [Sze] Sec. 4.2.1-4.2.2 
2018: 8b  Bayesian and contextual bandits (slides) (video)  [SutBar] Sec. 2.9  
2018: 9  Model-based RL (slides) (video)  [SutBar] Chap. 8  
2018: 10  Bayesian RL (slides) (video)  Michael O’Gordon Duff’s PhD Thesis (2002)  
June 8-12  2018: 11a  Hidden Markov models (slides) (video)  [RusNor] Sec. 15.3, [SutBar] Sec. 17.3 
2018: 11b  Partially observable RL (slides) (video)  [RusNor] Sec. 17.3, [SigBuf] Chap. 7  
2018: 12  Deep recurrent Q-networks (slides) (video)  [GBC] Chap. 10  
2020: 3  Imitation Learning (slides) (video)  Ho, J., & Ermon, S. (2016). Generative adversarial imitation learning. In NeurIPS (pp. 4565-4573). Torabi, F., Warnell, G., & Stone, P. (2018). Behavioral cloning from observation. In IJCAI (pp. 4950-4957). 

June 15-19  2020: 4  Inverse Reinforcement Learning  
2020: 5  Meta Learning  
June 24  Assignment 2 due (11:59 pm)  
June 26  Project proposal due (11:59 pm) 
The following table provides the schedule for paper presentations. The list of papers and assigned presenters will be determined by the end of May.
Table of paper presentations
Week  Presenter  Topic  Papers 

June 29 - July 3  Siliang Huang (slides) (video)  RL for video games  Goel, V., Weng, J., & Poupart, P. (2018). Unsupervised video object segmentation for deep reinforcement learning. In Advances in Neural Information Processing Systems (pp. 5683-5694). 
Runsheng Guo (slides) (video)  RL for video games  Justesen, N., Bontrager, P., Togelius, J., & Risi, S. (2019). Deep learning for video game playing. IEEE Transactions on Games.  
Enamul Haque (slides) (video)  RL for query optimization in data systems  Krishnan, S., Yang, Z., Goldberg, K., Hellerstein, J., & Stoica, I. (2018). Learning to optimize join queries with deep reinforcement learning. arXiv preprint arXiv:1808.03196.  
Upside down RL  Srivastava, R. K., Shyam, P., Mutz, F., Jaśkowski, W., & Schmidhuber, J. (2019). Training Agents using Upside-Down Reinforcement Learning. arXiv preprint arXiv:1912.02877.  
July 6-10  Samuel Yigzaw (slides) (video)  Hierarchical RL  Nachum, O., Gu, S. S., Lee, H., & Levine, S. (2018). Data-efficient hierarchical reinforcement learning. In Advances in Neural Information Processing Systems (pp. 3303-3313). 
David Thomas Radke (slides) (video)  Hierarchical RL  Wang, X., Chen, W., Wu, J., Wang, Y. F., & Yang Wang, W. (2018). Video captioning via hierarchical reinforcement learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 4213-4222).  
Scott Larter (slides) (video)  RL for robotics  Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., & Abbeel, P. (2018, May). Overcoming exploration in reinforcement learning with demonstrations. In 2018 IEEE International Conference on Robotics and Automation (ICRA) (pp. 6292-6299). IEEE.  
Egill Gudmundsson (slides) (video)  RL for robotics  Rakelly, K., Zhou, A., Quillen, D., Finn, C., & Levine, S. (2019). Efficient off-policy meta-reinforcement learning via probabilistic context variables. arXiv preprint arXiv:1903.08254.  
July 13-17  Laura Graves (slides) (video)  RL for autonomous vehicles  You, C., Lu, J., Filev, D., & Tsiotras, P. (2019). Advanced planning for autonomous vehicles using reinforcement learning and deep inverse reinforcement learning. Robotics and Autonomous Systems, 114, 1-18. 
Neel Bhatt (slides) (video)  RL for autonomous vehicles  Zhang, P., Xiong, L., Yu, Z., Fang, P., Yan, S., Yao, J., & Zhou, Y. (2019). Reinforcement Learning-Based End-to-End Parking for Automatic Parking System. Sensors, 19(18), 3996.  
Hytham Farah (slides) (video)  RL for conversational agents  Zhao, T., & Eskenazi, M. (2016, September). Towards End-to-End Learning for Dialog State Tracking and Management using Deep Reinforcement Learning. In Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue (pp. 1-10).  
Mojtaba Valipour (slides) (video)  RL for conversational agents  Zhao, T., Xie, K., & Eskenazi, M. (2019, June). Rethinking Action Spaces for Reinforcement Learning in End-to-end Dialog Agents with Latent Variable Models. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (pp. 1208-1218).  
July 20-24  Saif Zabarah (slides) (video)  RL for finance  Nevmyvaka, Y., Feng, Y., & Kearns, M. (2006, June). Reinforcement learning for optimized trade execution. In Proceedings of the 23rd international conference on Machine learning (pp. 673-680). 
Zhongwen Zhang (slides) (video)  RL for healthcare  Futoma, J., Hughes, M. C., & Doshi-Velez, F. (2020). POPCORN: Partially Observed Prediction COnstrained ReiNforcement Learning. arXiv preprint arXiv:2001.04032.  
Yan Shi (slides) (video)  RL for combinatorial optimization  Bello, I., Pham, H., Le, Q. V., Norouzi, M., & Bengio, S. (2016). Neural combinatorial optimization with reinforcement learning. arXiv preprint arXiv:1611.09940.  
Lizhe Chen (slides) (video)  RL for neural architecture search  Zoph, B., Vasudevan, V., Shlens, J., & Le, Q. V. (2018). Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 8697-8710).  
August 9  Project report due (11:59 pm) 