CS 886-002: Winter 2020: Home

CS 886: Deep Learning and Natural Language Processing
Winter 2020

INSTRUCTOR:

Ming Li DC 3355, x84659 mli@uwaterloo.ca

Course time and location: Mondays 3:00-5:50pm, DC 2585 (Starting from Jan 20)

Office hours: I will try to do office hours by phone (please call me at 519-500-3026 any time), or by appointment
Reference Materials: Papers listed below.


Deep learning has brought truly revolutionary changes in NLP research. This course intends to review the recent progress of this exciting development.

The course will be run as follows. I will give some lectures at the beginning introducing recent breakthrough results that have fundamentally changed NLP research. These include word2vec, pretraining models such as GPT and BERT, and the single-headed attention RNN. Then, during the second part of the course, each student will present one research paper or a group of research papers from the paper lists I give below (mainly from the first two lists; please discuss your choices with me). The paper you choose should represent important progress in NLP, or address the shortcomings of current approaches and how we can solve the fundamental problem in NLP: understanding. Additionally, each student will need to do one course project of their own choice and present it to the class at the end of the term. I expect students to already know the basics of deep learning, such as different types of gates, pooling, backpropagation, gradient descent methods, fully connected networks, recurrent networks such as LSTM, convolutional networks, and more specialized structures such as residual networks and Grid LSTM, recursive structures, memory networks, sequence-to-sequence structures, and generative adversarial nets (GANs). If you do not already know about these, you can read about them online or go to my lecture notes at: https://cs.uwaterloo.ca/~mli/cs898-2017.html

GPUs: Students who need GPUs for their experiments can sign up at https://www.awseducate.com/application (Amazon will take a couple of days to review the application). More information can be found at: https://aws.amazon.com/cn/education/awseducate/ SHARCNET might be another GPU resource. It is also possible to apply for a TPU from Google: https://heartbeat.fritz.ai/step-by-step-use-of-google-colab-free-tpu-75f8629492b3

Marking Scheme: Each student is evaluated according to the following three components:

[30 marks] Present a paper that represents one aspect of recent progress in NLP (30 minutes). You need to demonstrate a thorough understanding of the literature relevant to your topic. Presentations should be educational, in-depth surveys of that literature. Each week, we will have three students presenting.
[65 marks] Do a project in one NLP direction, and present your own project in class at the end of the term for 20 minutes. Hand in a project report of about 10 pages at the end of the term. I will be very happy to discuss projects with you.
[5 marks] Class attendance and participation -- These marks will be given for free.

Presentations and relevant papers will be posted on this website (the presenters should provide these materials to me) several days before class.

Course announcements and lecture notes will appear on this page. Please look at this page regularly.

For presenting in class, please choose one paper (or a group of papers) from the following sites: https://github.com/tomohideshibata/BERT-related-papers
https://www.topbots.com/top-ai-nlp-research-papers-2019/
In principle, we want to hear about the latest results: (a) papers related to BERT, GPT, and Transformers, (b) papers from 2018/2019 or newer. If you wish to present something else, please discuss it with me.
T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, Distributed representations of words and phrases and their compositionality, in Advances in neural information processing systems, 2013, pp. 3111-3119.
Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin, A neural probabilistic language model, Journal of machine learning research, vol. 3, no. Feb, pp. 1137-1155, 2003.
T. Mikolov, K. Chen, G. Corrado, and J. Dean, Efficient estimation of word representations in vector space, arXiv preprint arXiv:1301.3781, 2013.
J. Pennington, R. Socher, and C. D. Manning, Glove: Global vectors for word representation. in EMNLP, vol. 14, 2014, pp. 1532-1543.
X. Rong, word2vec parameter learning explained, arXiv preprint arXiv:1411.2738, 2014.
R. Johnson and T. Zhang, Semi-supervised convolutional neural networks for text categorization via region embedding, in Advances in neural information processing systems, 2015, pp. 919-927.
R. Socher, J. Pennington, E. H. Huang, A. Y. Ng, and C. D. Manning, Semi-supervised recursive autoencoders for predicting sentiment distributions, in Proceedings of the conference on empirical methods in natural language processing. Association for Computational Linguistics, 2011, pp. 151-161.
X. Wang, Y. Liu, C. Sun, B. Wang, and X. Wang, Predicting polarities of tweets by composing word embeddings with long short-term memory. in ACL (1), 2015, pp. 1343-1353.
D. Tang, F. Wei, N. Yang, M. Zhou, T. Liu, and B. Qin, Learning sentiment-specific word embedding for twitter sentiment classification. in ACL (1), 2014, pp. 1555-1565.
I. Labutov and H. Lipson, Re-embedding words. in ACL (2), 2013, pp. 489-493.
S. Upadhyay, K.-W. Chang, M. Taddy, A. Kalai, and J. Zou, Beyond bilingual: Multi-sense word embeddings using multilingual context, arXiv preprint arXiv:1706.08160, 2017.
Y. Kim, Y. Jernite, D. Sontag, and A. M. Rush, Character-aware neural language models, in AAAI, 2016, pp. 2741-2749.
C. N. Dos Santos and M. Gatti, Deep convolutional neural networks for sentiment analysis of short texts. in COLING, 2014, pp. 69-78.
C. N. d. Santos and V. Guimaraes, Boosting named entity recognition with neural character embeddings, arXiv preprint arXiv:1505.05008, 2015.
C. D. Santos and B. Zadrozny, Learning character-level representations for part-of-speech tagging, in Proceedings of the 31st International Conference on Machine Learning (ICML-14), 2014, pp. 1818-1826.
Y. Ma, E. Cambria, and S. Gao, Label embedding for zero-shot fine-grained named entity typing, in COLING, Osaka, 2016, pp. 171-180.
X. Chen, L. Xu, Z. Liu, M. Sun, and H. Luan, Joint learning of character and word embeddings, in Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015.
H. Peng, E. Cambria, and X. Zou, Radical-based hierarchical embeddings for chinese sentiment analysis at sentence level, in FLAIRS, 2017, pp. 347-352.
P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov, Enriching word vectors with subword information, arXiv preprint arXiv:1607.04606, 2016.
A. Herbelot and M. Baroni, High-risk learning: acquiring new word vectors from tiny data, arXiv preprint arXiv:1707.06556, 2017.
Y. Pinter, R. Guthrie, and J. Eisenstein, Mimicking word embeddings using subword rnns, arXiv preprint arXiv:1707.06961, 2017.
L. Lucy and J. Gauthier, Are distributional representations ready for the real world? evaluating word vectors for grounded perceptual meaning, arXiv preprint arXiv:1705.11168, 2017.
M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer, Deep contextualized word representations, arXiv preprint arXiv:1802.05365, 2018 (ELMO)
A. Mousa and B. Schuller, Contextual bidirectional long short-term memory recurrent neural network language models: A generative approach to sentiment analysis, in Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, vol. 1, 2017, pp. 1023-1032.
A. M. Dai and Q. V. Le, Semi-supervised sequence learning, in Advances in neural information processing systems, 2015, pp. 3079-3087.
A. Vaswani, N. Shazeer, N. Parmar, and J. Uszkoreit, Attention is all you need, arXiv preprint arXiv:1706.03762, 2017
A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, Improving language understanding by generative pretraining, URL https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf, 2018. (OpenAI-GPT)
J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, Bert: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805, 2018.
Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut, ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Mandar Joshi*, Danqi Chen*, Yinhan Liu, Daniel S. Weld, Luke Zettlemoyer, Omer Levy. SpanBERT: Improving Pre-training by Representing and Predicting Spans, 2019.
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov. RoBERTa: A Robustly Optimized BERT Pretraining Approach.
Zhengyan Zhang, Xu Han, Zhiyuan Liu, Xin Jiang, Maosong Sun, Qun Liu, ERNIE: Enhanced Language Representation with Informative Entities. 2019.
Stephen Merity, Single Headed Attention RNN: Stop Thinking With Your Head 2019.
P. Rajpurkar, J. Zhang, K. Lopyrev, and P. Liang, Squad: 100,000+ questions for machine comprehension of text, arXiv preprint arXiv:1606.05250, 2016.
A. Bordes, J. Weston, and N. Usunier, Open question answering with weakly supervised embedding models, in Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 2014, pp. 165-180.
D. Chen, A. Fisch, J. Weston, and A. Bordes, Reading wikipedia to answer open-domain questions, arXiv preprint arXiv:1704.00051, 2017.
B. McCann, J. Bradbury, C. Xiong, and R. Socher, Learned in translation: Contextualized word vectors, in NIPS, 2017.
A. Wang, A. Singh, J. Michael, F. Hill, O. Levy, and S. R. Bowman, Glue: A multi-task benchmark and analysis platform for natural language understanding, arXiv preprint arXiv:1804.07461, 2018.
K. M. Hermann and P. Blunsom, The role of syntax in vector space models of compositional semantics, in Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 2013.
Y. LeCun, Y. Bengio, G. Hinton, Deep learning. Nature 521, 436-444(2015).
R. Socher, Y. Bengio, C. Manning, Deep learning for NLP, ACL 2012
R. Sutton and A. Barto: Reinforcement Learning: an introduction. MIT Press (1998).
K. Cho, B. van Merrienboer, D. Bahdanau, Y. Bengio, On the properties of neural machine translation: encoder-decoder approaches, Oct. 2014.
K. Cho, B. van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, Y. Bengio, Learning phrase representations using RNN encoder-decoder for statistical machine translation. Jun. 2014.
I.J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial networks. 2014.
Alex Graves, Generating sequences with recurrent neural networks. 2013-2014 (this paper generates handwritten characters with an LSTM).
P. D. Turney and P. Pantel, From frequency to meaning: Vector space models of semantics, Journal of artificial intelligence research, vol. 37, pp. 141-188, 2010.
A. Nguyen, J. Clune, Y. Bengio, A. Dosovitskiy, J. Yosinski, Plug and play generative networks: conditional iterative generation of images in latent space. 2016.
F. Ture and O. Jojic, Simple and effective question answering with recurrent neural networks. 2016.
J. Li, W. Monroe, T. Shi, S. Jean, A. Ritter, D. Jurafsky, Adversarial learning for neural dialogue generation. arXiv: 1701.06547v4, 2017.

Lecture Notes:

Announcements:

This Monday (Jan. 13)'s class is moved to Jan 17, 5pm, same classroom.

We will move to DC 2585, a larger classroom, starting with the Jan 20 class. The class time will be 3pm to 5:50pm, Mondays. This way, all qualified students on the waiting list will be able to enroll in this course.

For all those who would like to be added to this course, please come to class today (Jan 20) and give me your student ID.

Announcement Jan 26, 2020. Attention: If you have visited a city in China in the past 10 days, please wear a mask to attend the class, to protect other students.

Announcement Feb. 3, 2020: If you have had contact with anybody who might have had contact with people with nCoV in the past 15 days, or if you have any symptoms, please do not come to class (and I do not need a doctor's note). This class will no longer have the attendance requirement. Everybody will receive the 5% attendance mark automatically. Of course, if you stay home, please read through the presented materials on your own. You are expected to know the material.

It's time to talk to me about your course (research) project. Note that this is independent of your presentation (you can of course extend the paper you have presented).

Please make sure you send me your presentation ppt or pdf file 1 day before the presentation.

The final projects are due on April 10th, by emailing me a PDF. (Please submit your 10-minute presentation together with the final project.)

The final project presentations will be on March 30 and April 1, 10 minutes per person. Please let me know which day you would like to present (first come, first served). You will use your own laptop to present. As time will be very tight, please test your computer connection beforehand.

Announcement March 13, 2020. Due to the university's coronavirus closure, we will cancel the last day of presentations on March 16. Please see the detailed arrangement below. The 5 class attendance marks will now depend on your two half-page reviews, one for each of two final project presentations.

Announcement March 21, 2020. Course evaluation, attention all students. Please go to Evaluation Website to evaluate our course CS886 Section 002 (SEM), open Sun. Mar 22 11:59 to Fri. Apr 3, 11:59pm. I hope you are all safe!

Announcement March 22, 2020. If any student has any questions, please call me any time at: 519-500-3026. Stay safe!

Announcement March 26, 2020. Deadlines postponed. The new deadline for submitting the final project report is April 15 (any time on that day). On the same day, submit your voice-over-ppt (10 minutes) presenting your project. You no longer need to write reviews for two other student presentations.

Announcement March 28, 2020. GPUs. Several students asked about GPU resources. There are 3 GPU servers: gpu1, gpu2, and gpu3. First ssh to datasci.cs.uwaterloo.ca using your linux.student.cs userid and password. From there, ssh to one of the GPU servers. However, I do not think these are sufficient to train big models like BERT. Please use them only for light training.
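The two-hop login described above can be sketched as a small Python helper that only assembles the ssh command line. This is an illustration under assumptions, not an official procedure: the helper name `build_gpu_ssh_command` and the placeholder `your_userid` are hypothetical, and the use of OpenSSH's `-J` (ProxyJump) option assumes a reasonably recent OpenSSH client and that the GPU host names resolve from the jump host. Running two plain `ssh` commands in sequence works just as well.

```python
import shlex

# Host names are taken from the announcement above.
JUMP_HOST = "datasci.cs.uwaterloo.ca"
GPU_SERVERS = ("gpu1", "gpu2", "gpu3")


def build_gpu_ssh_command(userid, gpu_server):
    """Return the argument list for logging in to a GPU server via the jump host.

    Hypothetical helper: it only builds the command and does not run it.
    """
    if gpu_server not in GPU_SERVERS:
        raise ValueError(f"unknown GPU server: {gpu_server!r}")
    # -J (ProxyJump) tells OpenSSH to connect through JUMP_HOST first.
    return ["ssh", "-J", f"{userid}@{JUMP_HOST}", f"{userid}@{gpu_server}"]


if __name__ == "__main__":
    # "your_userid" is a placeholder for your linux.student.cs userid.
    print(shlex.join(build_gpu_ssh_command("your_userid", "gpu1")))
```

Paste the printed command into a terminal to log in; you will be asked for your password at each hop.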

Announcement April 11, 2020 April 15 is the deadline for submitting the project report (approximately 10 pages) and voice-over-ppt (10 minutes). All projects and voice-over-ppt will be posted on this website unless you explicitly tell me not to do so with a good reason (for example, the material will be published in a paper).

Announcement April 15, 2020 Today is the deadline. Please email me your project report and voice-over-ppt. I will acknowledge each email. For submission: please send one email with the 2 files attached, if possible. To make things easier on my side, do not do anything else, such as zipping the files.

Announcement April 16, 2020 As many people do not wish to publicize their work because they wish to continue it as part of larger research programs, I have decided that we will not publish anybody's ppt or final project.

Announcement April 26, 2020 This is our last announcement of the class. I have now finished marking all the reports and have watched all the presentations. I will enter these marks into the university system. I would like to take this chance to thank every student in this class. COVID-19 stopped me from saying goodbye to you in person. I wish all of you good health, and I hope you carry on to do great research in NLP and deep learning.

Presentation Schedule:

Please start picking topics/papers. Then I will put the topics or papers behind each name. For each topic/paper, we will have only one student presenting. Therefore, it is a good idea to pick the topics you like early.
Jan. 27: Omar Attia (Attention), Omar Presentation. Bowen Yang (word embedding, matrix factorization), Bowen Presentation.
Feb. 3: He Bai (Electra), Bai He Presentation. Owain West (spanBERT, XLM), Owain Presentation. Rasoul Akhavan Mahdavi (Positional Encoding), Rasoul Presentation. Kira Selby (What does BERT learn), Kira Presentation.
Feb. 10: Anup Deshmulch (Text Summarization), Anup Presentation. Hussam Kaka (Use of BERT for text and doc classification), Hussam Presentation. Ruifeng Chen (XLNet), Ruifeng Presentation. Priyabrata Senapati (graph networks), Priyabrata Presentation. Yuqing Xie (The reversible residual network: Backpropagation without storing activations), Yuqing Presentation. Natalie Zhang (Reformer: The Efficient Transformer), Natalie Presentation.
Feb. 24: Sidharth Singla (ViLBERT), Sidharth Presentation. Sheik Shameer (Sentiment Analysis), Sheik Presentation. Mohammadali Niknamian, Mohammadali Presentation, pdf file. Utsav Tushar Das (Style Transformer: unpaired text style transfer without disentangled latent representation), Utsav Presentation. Yin Ki Ng (Bridging the gap between training and inference for NMT), Ng presentation.
Mar. 2: Avery Hiebert (Turing completeness), Avery presentation. Shreyance Jain (BERTQA -- attention on steroids), Jain presentation. Vedanshi Kataria, Kataria presentation. Genseric Ghiro (Adaptive Transformer), Ghiro presentation. Haonan Duan (Gender-preserving debiasing for pre-trained word embedding), Duan presentation. Bin Zhan (RoBERTa), Bin presentation.
Mar. 9: Shiqi Xiao (BART), Shiqi presentation. Udhav Sethi (Multi-stage document ranking with BERT), Udhav presentation. Archit Shah (On extractive neural document summarization with transformer language models), Shah presentation. Karthik Ramesh (controlled text generation), Ramesh presentation. Pascale Brunelle Walters (VideoBERT), Pascale presentation.
March 16: The presentations below are cancelled. However, all the presenters are still required to submit their PPT files to me by email before March 16. Marks for this part of the course (just for the 6 people here) will depend only on the ppt file (I will read each one). The ppt files will still be posted here online, and all other students are still required to go through them to gain an understanding of these topics.
Joshua Lemmon (Transferable multi-domain state generator for task-oriented dialogue systems), Josh presentation. Josh presentation video: https://www.dropbox.com/s/cd50c9hylc91ss6/TRADE-video.mp4?dl=0
Egill Ian Gudmundsson (Apply MLM to Sentiment transfer), Ian presentation. Ian's video presentation: https://www.youtube.com/watch?v=9vG4hs_L01Q
Ali Saheb Pasand (Emotion-Cause pair extraction: a new task to emotion analysis in texts), Ali presentation. Ali presentation video: https://www.dropbox.com/s/bdml2t9u57cv624/presentation4.mp4?dl=0
Futian Zhang (ERNIE), Zhang presentation.
Zhiying Jiang (knowledge graph embedding), Jiang presentation. Zhiying's presentation video is at: https://www.dropbox.com/s/xg3hgz1od62rrpo/Knowledge%20Graph%20Embeddding.mov?dl=0
M. Valipour (BERTology?), Moji presentation. Moji video presentation.
Mar. 23: No class. All presentations have been merged into previous days.
Mar. 30 No class. See below

Final Project Presentation (10 Minutes Each Person):

Due to the COVID-19 outbreak, we will change the format of the final project presentations. We will not do face-to-face, in-class presentations. Each student is still required to prepare a 10-minute voice-over PPT (see How to create voice-over for PPT presentation) for their project (in addition to the course project paper), making sure that it is understandable to other people. All of you will email your 10-minute project ppt to me on April 15 (any time on that day) together with your final project report, and I will post them on this website for other students to read. Students are no longer required to write reviews for two other student project presentations.
March 30, 3pm - 6pm, DC 2585: Cancelled. See above.
April 1, 3pm-6pm, DC 2585: Cancelled. See above. On this day, you are no longer required to submit anything.
April 15 (any time on that day): Deadline for submitting the final project paper and your 10-minute presentation.

Maintained by Ming Li