CS 886: Deep Learning and Natural Language Processing
Instructor: Ming Li, DC 3355, x84659
Course time and location: Mondays 3:00-5:50pm, DC 2585 (starting from Jan 20)
Office hours: I will try to do office hours by phone (please call me at 519-500-3026 any time).
Reference materials: papers listed below.
Deep learning has brought truly revolutionary changes to NLP research. This course reviews the recent progress of this exciting development.
The course will be run as follows. I will give some lectures at the beginning, introducing recent breakthrough results that have fundamentally changed NLP research. These include word2vec, pretraining models such as GPT and BERT, and the single-headed attention RNN.
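As a taste of what is underneath GPT and BERT, the core attention operation reduces to a few lines of numpy. The sketch below is my own illustration of single-head scaled dot-product attention (the dimensions and variable names are arbitrary, and it omits masking and multiple heads):

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (n_q, n_k) similarity scores
    weights = softmax(scores, axis=-1)  # each query's weights sum to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.standard_normal((2, 8))  # 2 queries, d_k = 8
K = rng.standard_normal((5, 8))  # 5 keys
V = rng.standard_normal((5, 8))  # 5 values
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape, w.shape)  # → (2, 8) (2, 5)
```

Each output row is a weighted average of the value vectors, with weights determined by query-key similarity; the Transformer papers below stack many such heads and layers.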
Then, during the second part of the course, each student will present one paper, or a group of research papers, from the lists I give below (mainly from the first two lists; please discuss your choices with me). The paper you choose should represent important progress in NLP, or address the shortcomings of current approaches and how we can solve the fundamental problem in NLP: understanding. Additionally, each student will do one course project of their own choice and present it to the class at the end of the term.
I expect students to already know the basics of deep learning, such as the different types of gates, pooling, backpropagation, gradient descent methods, fully connected networks, recurrent networks such as LSTMs, convolutional networks, and more specialized structures such as residual networks, Grid LSTM, recursive structures, memory networks, sequence-to-sequence models, and generative adversarial networks (GANs). If you do not already know these, you can read about them online or go to my lecture notes at:
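For a quick refresher on the gate structure mentioned above, here is a minimal numpy sketch of one LSTM cell step (the stacked weight layout and names are my own convention, not from any particular lecture or library):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell_step(x, h_prev, c_prev, W, b):
    """One LSTM step with input, forget, and output gates.
    W: (4*d, d_in + d) stacked gate weights; b: (4*d,) stacked biases."""
    d = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[0*d:1*d])   # input gate
    f = sigmoid(z[1*d:2*d])   # forget gate
    o = sigmoid(z[2*d:3*d])   # output gate
    g = np.tanh(z[3*d:4*d])   # candidate cell state
    c = f * c_prev + i * g    # new cell state: forget old, admit new
    h = o * np.tanh(c)        # new hidden state
    return h, c

rng = np.random.default_rng(1)
d_in, d = 3, 4
W = rng.standard_normal((4 * d, d_in + d)) * 0.1
b = np.zeros(4 * d)
h, c = lstm_cell_step(rng.standard_normal(d_in), np.zeros(d), np.zeros(d), W, b)
print(h.shape, c.shape)  # → (4,) (4,)
```

Running this step over a sequence, one token at a time, is exactly the recurrent computation that the Transformer papers below replace with attention.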
GPUs: For those of you who want to run experiments, you can sign up at https://www.awseducate.com/application; Amazon will review the application within a couple of days. More information can be found at https://aws.amazon.com/cn/education/awseducate/. Sharcnet may be another resource for GPUs. It is also possible to apply for a TPU from Google.
Each student is evaluated according to the following three components:
Presentations and relevant papers will be posted on this website
(the presenters should provide these materials to me) several days before class.
[30 marks] Present a paper that represents one aspect of recent progress in NLP (30 minutes). You need to demonstrate a thorough understanding of the relevant literature on your topic. Presentations should be an in-depth, educational survey of that literature. Each week, we will have three students presenting.
[65 marks] Do a project in one NLP direction, and present your own
project in class at the end of the term for 20 minutes. Hand in a
report of about 10 pages at the end of the term.
I will be very happy to discuss projects with you.
[5 marks] Class attendance and participation --
These marks will be given for free.
Course announcements and lecture notes will appear on this page.
Please look at this page regularly.
For presenting in class, please choose one (group) of the papers from the following lists. In principle, we want to hear about the most frontier results: (a) papers related to BERT, GPT, and Transformers; (b) papers from 2018/2019 or newer. If you wish to present something else, please discuss with me.
T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean,
Distributed representations of words and phrases and
their compositionality, in Advances in neural information processing systems,
2013, pp. 3111-3119.
Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin, A neural probabilistic language model, Journal of Machine Learning Research, vol. 3, pp. 1137-1155, 2003.
T. Mikolov, K. Chen, G. Corrado, and J. Dean,
Efficient estimation of word representations in vector space, arXiv
preprint arXiv:1301.3781, 2013.
J. Pennington, R. Socher, and C. D. Manning, Glove: Global vectors for word representation. in EMNLP, vol. 14,
2014, pp. 1532-1543.
X. Rong, word2vec parameter learning explained, arXiv preprint arXiv:1411.2738, 2014.
R. Johnson and T. Zhang, Semi-supervised convolutional neural networks for text categorization via region embedding,
in Advances in neural information processing systems, 2015, pp. 919-927.
R. Socher, J. Pennington, E. H. Huang, A. Y. Ng, and C. D. Manning, Semi-supervised recursive autoencoders
for predicting sentiment distributions, in Proceedings of the conference on empirical methods in natural language
processing. Association for Computational Linguistics, 2011, pp. 151-161.
X. Wang, Y. Liu, C. Sun, B. Wang, and X. Wang, Predicting polarities of tweets by composing word embeddings with long short-term memory, in ACL (1), 2015, pp. 1343-1353.
D. Tang, F. Wei, N. Yang, M. Zhou, T. Liu, and B. Qin, Learning sentiment-specific word embedding for twitter
sentiment classification. in ACL (1), 2014, pp. 1555-1565.
I. Labutov and H. Lipson, Re-embedding words, in ACL (2), 2013, pp. 489-493.
S. Upadhyay, K.-W. Chang, M. Taddy, A. Kalai, and J. Zou, Beyond bilingual: Multi-sense word embeddings using
multilingual context, arXiv preprint arXiv:1706.08160, 2017.
Y. Kim, Y. Jernite, D. Sontag, and A. M. Rush, Character-aware neural language models, in AAAI, 2016, pp. 2741-2749.
C. N. Dos Santos and M. Gatti, Deep convolutional neural networks for sentiment analysis of short texts, in COLING, 2014, pp. 69-78.
C. N. d. Santos and V. Guimaraes, Boosting named entity recognition with neural character embeddings, arXiv preprint
C. D. Santos and B. Zadrozny, Learning character-level representations for part-of-speech tagging, in Proceedings of
the 31st International Conference on Machine Learning (ICML-14), 2014, pp. 1818-1826.
Y. Ma, E. Cambria, and S. Gao, Label embedding for zero-shot fine-grained named entity typing, in COLING, Osaka,
2016, pp. 171-180.
X. Chen, L. Xu, Z. Liu, M. Sun, and H. Luan, Joint learning of character and word embeddings, in Twenty-Fourth
International Joint Conference on Artificial Intelligence, 2015.
H. Peng, E. Cambria, and X. Zou, Radical-based hierarchical embeddings for chinese sentiment analysis at sentence
level, in FLAIRS, 2017, pp. 347-352.
P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov, Enriching word vectors with subword information, arXiv preprint arXiv:1607.04606, 2016.
A. Herbelot and M. Baroni, High-risk learning: acquiring new word vectors from tiny data, arXiv preprint
Y. Pinter, R. Guthrie, and J. Eisenstein, Mimicking word embeddings using subword rnns, arXiv preprint
L. Lucy and J. Gauthier, Are distributional representations ready for the real world? evaluating word vectors for grounded
perceptual meaning, arXiv preprint arXiv:1705.11168, 2017.
M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer, Deep contextualized word representations, arXiv preprint arXiv:1802.05365, 2018. (ELMo)
A. Mousa and B. Schuller, Contextual bidirectional long short-term memory recurrent neural network language models:
A generative approach to sentiment analysis, in Proceedings of the 15th Conference of the European Chapter of the
Association for Computational Linguistics: Volume 1, Long Papers, vol. 1, 2017, pp. 1023-1032.
A. M. Dai and Q. V. Le, Semi-supervised sequence learning, in Advances in neural information processing systems, 2015, pp. 3079-3087.
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, Attention is all you need, arXiv preprint arXiv:1706.03762, 2017.
A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, Improving language understanding by generative pretraining,
URL: https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf, 2018. (OpenAI GPT)
J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, Bert: Pre-training of deep bidirectional transformers for language
understanding, arXiv preprint arXiv:1810.04805, 2018.
Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma,
Radu Soricut, ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Mandar Joshi*, Danqi Chen*, Yinhan Liu, Daniel S. Weld, Luke Zettlemoyer,
SpanBERT: Improving Pre-training by Representing and Predicting Spans, 2019.
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
RoBERTa: A Robustly Optimized BERT Pretraining Approach.
Zhengyan Zhang, Xu Han, Zhiyuan Liu, Xin Jiang, Maosong Sun, Qun Liu,
ERNIE: Enhanced Language Representation with Informative Entities.
Stephen Merity, Single Headed Attention RNN: Stop Thinking With Your Head
P. Rajpurkar, J. Zhang, K. Lopyrev, and P. Liang,
Squad: 100,000+ questions for machine comprehension of text,
arXiv preprint arXiv:1606.05250, 2016.
A. Bordes, J. Weston, and N. Usunier, Open question answering with
weakly supervised embedding models, in Joint
European Conference on Machine Learning and Knowledge Discovery in
Databases. Springer, 2014, pp. 165-180.
D. Chen, A. Fisch, J. Weston, and A. Bordes, Reading wikipedia
to answer open-domain questions, arXiv preprint arXiv:1704.00051, 2017.
B. McCann, J. Bradbury, C. Xiong, and R. Socher, Learned in translation: Contextualized word vectors, in NIPS, 2017.
A. Wang, A. Singh, J. Michael, F. Hill, O. Levy, and S. R. Bowman,
Glue: A multi-task benchmark and analysis
platform for natural language understanding,
arXiv preprint arXiv:1804.07461, 2018.
K. M. Hermann and P. Blunsom, The role of syntax in vector space
models of compositional semantics, in Proceedings
of the 51st Annual Meeting of the Association for
Computational Linguistics (Volume 1: Long Papers). Association
for Computational Linguistics, 2013.
Y. LeCun, Y. Bengio, and G. Hinton, Deep learning, Nature, vol. 521, pp. 436-444, 2015.
R. Socher, Y. Bengio, C. Manning, Deep learning for NLP, ACL 2012
R. Sutton and A. Barto: Reinforcement Learning: an introduction.
MIT Press (1998).
K. Cho, B. van Merrienboer, D. Bahdanau, Y. Bengio,
On the properties of neural machine translation: encoder-decoder approaches,
K. Cho, B. van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, Learning phrase representations using RNN encoder-decoder for statistical machine translation, Jun. 2014.
I.J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair,
A. Courville, Y. Bengio, Generative adversarial networks. 2014.
Alex Graves, Generating sequences with recurrent neural networks, 2013-2014 (this paper generates handwriting characters with an LSTM).
P. D. Turney and P. Pantel, From frequency to meaning:
Vector space models of semantics, Journal of artificial
intelligence research, vol. 37, pp. 141-188, 2010
A. Nguyen, J. Clune, Y. Bengio, A. Dosovitskiy, and J. Yosinski, Plug and play generative networks: Conditional iterative generation of images in latent space, 2016.
F. Ture and O. Jojic,
Simple and effective question answering with recurrent neural networks. 2016.
J. Li, W. Monroe, T. Shi, S. Jean, A. Ritter, D. Jurafsky,
Adversarial learning for neural dialogue generation.
arXiv: 1701.06547v4, 2017.
This Monday's (Jan. 13) class is moved to Jan. 17, 5pm, same location.
We will move to DC 2585, a larger classroom, starting from the Jan. 20 class. The class time will be 3pm to 5:50pm, Mondays. This way, all qualified students on the waiting list will be able to enroll in this course.
For all those who would like to be added to this course,
please come to class today (Jan 20) and give me your student ID.
Announcement Jan 26, 2020. Attention: if you have visited a city in China in the past 10 days, please wear a mask to attend class, to protect other students.
Announcement Feb. 3, 2020: If you have had contact with anybody who might have had contact with people with nCoV in the past 15 days, or if you have any symptoms, please do not come to class (and I do not need a doctor's note). This class will no longer have the attendance requirement; everybody will get the 5% attendance mark automatically. Of course, if you stay home, please read through the presented materials on your own. You are expected to know them.
It's time to talk to me about your course (research) project. Note that this is independent of your presentation (you can, of course, extend the paper you have presented).
Please make sure you send me your presentation ppt or pdf file one day before the presentation.
The final projects are due on April 10th, via email to me in pdf. (Please submit your 10-minute presentation together with the report.)
The final project presentations will be on March 30 and April 1, 10 minutes per person. Please let me know which day you would like to present (first come, first choose).
You will use your own laptop to present. As time will be very tight, please test your computer connection beforehand.
Announcement March 13, 2020. Due to the school's coronavirus closure, we will cancel the last day of presentations, March 16. Please see the detailed arrangement below. The 5 class-attendance marks will now depend on your two half-page reviews of two final-project presentations.
Announcement March 21, 2020. Course evaluation, attention all students: please go to the Evaluation Website to evaluate our course, CS886 Section 002 (SEM), open Sun. Mar 22, 11:59 to Fri. Apr 3, 11:59pm. I hope you are all safe!
Announcement March 22, 2020. If any student has any questions, please call me any time at 519-500-3026. Stay safe!
Announcement March 26, 2020. Deadlines postponed. The new deadline for submitting the final project report is April 15 (any time that day). On the same day, submit your voice-over PPT (10 minutes) presenting your project. You no longer need to write reviews of two other students' presentations.
Announcement March 28, 2020. GPUs. Several students asked about GPU resources. There are 3 GPU servers: gpu1, gpu2, and gpu3. First ssh to datasci.cs.uwaterloo.ca using your linux.student.cs userid and password. From there, ssh to one of the GPU servers. But I think these are not sufficient to train big models like BERT; please use them only for light training.
Please start picking topics/papers. Then I will put the topics or papers beside each name. For each topic/paper, we will have only one student presenting, so it is a good idea to pick the topics you like early.
Jan. 27, Omar Attia (Attention) Omar
Bowen Yang (word embedding,
Feb. 3: He Bai (Electra), Bai He Presentation; Owain West (SpanBERT, XLM), Owain Presentation.
Rasoul Akhavan Mahdavi (Positional
Encoding), Rasoul Presentation
Kira Selby (What does BERT learn), Kira Presentation.
Feb. 10, Anup Deshmulch (Text Summarization), Anup Presentation.
(Use of BERT for text and document classification), Hussam Presentation.
Ruifeng Chen (XLNet), Ruifeng Presentation.
Priyabrata Senapati (graph networks), Priyabrata Presentation.
Yuqing Xie (The reversible residual network: Backpropagation without storing
Natalie Zhang (Reformer: The Efficient Transformer)
Feb 24, Sidharth Singla (ViLBERT) Sidharth Presentation,
Sheik Shameer Sheik Presentation.
Utsav Tushar Das (Style Transformer: unpaired text style transfer without disentangled latent representation),
Ki Ng (Bridging the gap between training and inference for NMT).
Mar. 2, Avery Hiebert (Turing completeness),
Shreyance Jain (BERTQA -- attention to
Genseric Ghiro (Adaptive Transformer),
Haonan Duan (Gender-preserving debiasing
for pre-trained word embedding),
Bin Zhan (RoBERTa)
Mar. 9, Shiqi Xiao (BART),
Udhav Sethi (Multi-stage document ranking
Archit Shah (On extractive neural
summarization with transformer language models),
(controlled text generation),
Pascale Brunelle Walters (VideoBERT).
The March 16 presentations below are cancelled. However, all the presenters are still required to submit your PPT files to me before March 16, by email. Your marks (just for the 6 people here) for this part of the course will depend only on your ppt file (I will read each one). The ppt files will still be posted here online, and all other students are still required to go through these ppt files to gain an understanding of the material.
Joshua Lemmon (Transferable multi-domain state generator for task-oriented dialogue systems),
Josh presentation video:
Egill Ian Gudmundsson (Apply MLM to Sentiment transfer),
Ian presentation. Ian's video presentation:
Ali Saheb Pasand (Emotion-Cause pair extraction: a new task to emotion
analysis in texts),
Ali presentation video:
Futian Zhang (ERNIE),
Zhiying Jiang (representing knowledge graph embeddings),
Zhiying's presentation video is at:
M. Valipour (BERTology?).
Moji video presentation.
Mar. 23 No class. All have been merged to previous days.
Mar. 30 No class. See below
Final Project Presentation (10 Minutes Each Person):
Due to the COVID-19 outbreak, we will change the format for final project presentations. We will not do face-to-face, in-class presentations. Each student will still be required to prepare a 10-minute voice-over PPT (see how to create a voice-over for a PPT presentation) for your project (in addition to your course project paper), making sure that it is understandable for other people. All of you will email your 10-minute project ppt to me on April 15 (any time on that day), together with your final project report, and I will post them on this website for other students to read. Students are no longer required to write reviews of two other students' project presentations.
March 30, 3pm - 6pm, DC 2585: Cancelled. See above.
April 1, 3pm-6pm, DC 2585: Cancelled. See above. On this day, you
are no longer required to submit anything.
April 15 (any time on that day), Deadline for submitting the final
projects paper and your 10 minute presentation.
Maintained by Ming Li