CS 886: Deep Learning and Natural Language Processing
Winter 2020

INSTRUCTOR: Ming Li, DC 3355, x84659, mli@uwaterloo.ca
Course time and location: Mondays 3:00-5:50pm, DC 2585 (starting from Jan 20)
Office hours: I will try to do office hours by phone (please call me at 519-500-3026 any time), or by appointment
Reference materials: Papers listed below.
Deep learning has brought truly revolutionary changes to NLP research. This course intends
to review the recent progress of this exciting development.
The course will be run as follows.
I will give some lectures at the beginning introducing recent breakthrough results
that have fundamentally changed NLP research. These include word2vec, pretraining models
such as GPT and BERT, and the single headed attention RNN.
Then, during the second part of the course,
each student will present one paper or a group of
research papers from the paper list I
give below (mainly from the first two lists; please discuss
your choices with me).
The paper you choose should represent important progress in NLP,
or address shortcomings of current approaches and how we can solve the
fundamental problem in NLP: understanding. Additionally, each student will need
to do one course project of their own choice and present
it to the class at the end of the term.
I expect students to already know the basics of
deep learning, such as different types of
gates, pooling, backpropagation, gradient descent methods,
fully connected networks,
recurrent networks such as LSTM,
convolutional networks, and more specialized
structures such as residual networks, Grid LSTM, recursive structures,
memory networks, sequence-to-sequence models, and generative adversarial nets
(GANs). If you do not already know about these, you can read about
them online or go to my lecture notes at:
https://cs.uwaterloo.ca/~mli/cs898-2017.html
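Since much of the course centers on attention-based models such as Transformers, BERT, and GPT, here is a minimal sketch of the scaled dot-product attention at their core. This is illustrative only (plain NumPy, a single head, no masking or learned projections; all names are my own):

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        """Single-head attention: softmax(Q K^T / sqrt(d)) V."""
        d = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d)                    # (n_query, n_key) similarities
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
        return weights @ V                               # weighted sum of value vectors

    # Toy self-attention: 3 tokens with 4-dimensional embeddings attend to each other.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((3, 4))
    print(scaled_dot_product_attention(X, X, X).shape)   # (3, 4)

In a real Transformer, Q, K, and V are learned linear projections of the token embeddings, and several such heads run in parallel.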
GPUs: To allow some of you to do experiments,
students can go to https://www.awseducate.com/application
to sign up. Amazon will review the application within a couple of days.
More information can be
found at: https://aws.amazon.com/cn/education/awseducate/
SHARCNET may be another GPU resource.
It is also possible to apply for a TPU from Google:
https://heartbeat.fritz.ai/step-by-step-use-of-google-colab-free-tpu-75f8629492b3
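For reference, attaching a Colab notebook to the free TPU with TensorFlow 2.x typically looks like the sketch below (based on the standard Colab TPU setup; exact API names have shifted between TF releases, so treat this as a starting point, not a definitive recipe):

    import tensorflow as tf

    # Locate the TPU provided by the Colab runtime (tpu='' selects Colab's default).
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='')
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)

    # Any model built and trained inside strategy.scope() runs on the TPU cores.
    strategy = tf.distribute.experimental.TPUStrategy(resolver)
    print("TPU cores available:", strategy.num_replicas_in_sync)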
Marking Scheme:
Each student is evaluated according to the following three components:
- [30 marks] Present a paper that represents one aspect of recent progress in NLP (30 minutes). You need to demonstrate thorough understanding of the relevant literature on your topic: presentations should be an in-depth and educational survey of that literature. Each week, we will have three students presenting.
- [65 marks] Do a project in one NLP direction, and present your own project in class at the end of the term for 20 minutes. Hand in a project report of about 10 pages at the end of the term. I will be very happy to discuss projects with you.
- [5 marks] Class attendance and participation -- these marks will be given for free.
Presentations and relevant papers will be posted on this website
(the presenters should provide these materials to me) several days before class.
Course announcements and lecture notes will appear on this page.
Please check this page regularly.
Reading Materials:
-
For presenting in class, please choose one (or a group) of the papers
from the following sites:
https://github.com/tomohideshibata/BERT-related-papers
https://www.topbots.com/top-ai-nlp-research-papers-2019/
In principle, we want to hear about the most cutting-edge results: (a) papers
related to BERT, GPT, and Transformers; (b) papers from 2018/2019 or newer. If you
wish to present something else, please discuss it with me.
-
T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean,
Distributed representations of words and phrases and
their compositionality, in Advances in neural information processing systems,
2013, pp. 3111-3119.
-
Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin, A neural probabilistic language model, Journal of Machine Learning Research, vol. 3, pp. 1137-1155, 2003.
-
T. Mikolov, K. Chen, G. Corrado, and J. Dean,
Efficient estimation of word representations in vector space, arXiv
preprint arXiv:1301.3781, 2013.
-
J. Pennington, R. Socher, and C. D. Manning, Glove: Global vectors for word representation. in EMNLP, vol. 14,
2014, pp. 1532-1543.
-
X. Rong, word2vec parameter learning explained, arXiv preprint arXiv:1411.2738, 2014.
-
R. Johnson and T. Zhang, Semi-supervised convolutional neural networks for text categorization via region embedding,
in Advances in neural information processing systems, 2015, pp. 919-927.
-
R. Socher, J. Pennington, E. H. Huang, A. Y. Ng, and C. D. Manning, Semi-supervised recursive autoencoders
for predicting sentiment distributions, in Proceedings of the conference on empirical methods in natural language
processing. Association for Computational Linguistics, 2011, pp. 151-161.
- X. Wang, Y. Liu, C. Sun, B. Wang, and X. Wang, Predicting polarities of tweets by composing word embeddings with
long short-term memory. in ACL (1), 2015, pp. 1343-1353.
-
D. Tang, F. Wei, N. Yang, M. Zhou, T. Liu, and B. Qin, Learning sentiment-specific word embedding for twitter
sentiment classification. in ACL (1), 2014, pp. 1555-1565.
- I. Labutov and H. Lipson, Re-embedding words. in ACL (2), 2013, pp. 489-493.
-
S. Upadhyay, K.-W. Chang, M. Taddy, A. Kalai, and J. Zou, Beyond bilingual: Multi-sense word embeddings using
multilingual context, arXiv preprint arXiv:1706.08160, 2017.
-
Y. Kim, Y. Jernite, D. Sontag, and A. M. Rush, Character-aware neural language models, in AAAI, 2016, pp. 2741-2749.
-
C. N. Dos Santos and M. Gatti, Deep convolutional neural networks for sentiment analysis of short texts, in COLING,
2014, pp. 69-78.
-
C. N. d. Santos and V. Guimaraes, Boosting named entity recognition with neural character embeddings, arXiv preprint
arXiv:1505.05008, 2015.
-
C. D. Santos and B. Zadrozny, Learning character-level representations for part-of-speech tagging, in Proceedings of
the 31st International Conference on Machine Learning (ICML-14), 2014, pp. 1818-1826.
-
Y. Ma, E. Cambria, and S. Gao, Label embedding for zero-shot fine-grained named entity typing, in COLING, Osaka,
2016, pp. 171-180.
-
X. Chen, L. Xu, Z. Liu, M. Sun, and H. Luan, Joint learning of character and word embeddings, in Twenty-Fourth
International Joint Conference on Artificial Intelligence, 2015.
-
H. Peng, E. Cambria, and X. Zou, Radical-based hierarchical embeddings for chinese sentiment analysis at sentence
level, in FLAIRS, 2017, pp. 347-352.
-
P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov, Enriching word vectors with subword information, arXiv preprint arXiv:1607.04606, 2016.
-
A. Herbelot and M. Baroni, High-risk learning: acquiring new word vectors from tiny data, arXiv preprint
arXiv:1707.06556, 2017.
-
Y. Pinter, R. Guthrie, and J. Eisenstein, Mimicking word embeddings using subword rnns, arXiv preprint
arXiv:1707.06961, 2017.
-
L. Lucy and J. Gauthier, Are distributional representations ready for the real world? evaluating word vectors for grounded
perceptual meaning, arXiv preprint arXiv:1705.11168, 2017.
-
M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer, Deep contextualized word
representations, arXiv preprint arXiv:1802.05365, 2018. (ELMo)
-
A. Mousa and B. Schuller, Contextual bidirectional long short-term memory recurrent neural network language models:
A generative approach to sentiment analysis, in Proceedings of the 15th Conference of the European Chapter of the
Association for Computational Linguistics: Volume 1, Long Papers, vol. 1, 2017, pp. 1023-1032.
-
A. M. Dai and Q. V. Le, Semi-supervised sequence learning, in Advances in neural information processing systems, 2015, pp. 3079-3087.
-
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, et al.,
Attention is all you need, arXiv preprint arXiv:1706.03762, 2017.
-
A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, Improving language understanding by generative pretraining,
URL https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf, 2018. (OpenAI GPT)
-
J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, Bert: Pre-training of deep bidirectional transformers for language
understanding, arXiv preprint arXiv:1810.04805, 2018.
-
Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma,
Radu Soricut, ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, 2019.
-
Mandar Joshi*, Danqi Chen*, Yinhan Liu, Daniel S. Weld, Luke Zettlemoyer,
Omer Levy.
SpanBERT: Improving Pre-training by Representing and Predicting Spans, 2019.
-
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019.
-
Zhengyan Zhang, Xu Han, Zhiyuan Liu, Xin Jiang, Maosong Sun, Qun Liu,
ERNIE: Enhanced Language Representation with Informative Entities.
2019.
-
Stephen Merity, Single Headed Attention RNN: Stop Thinking With Your Head,
2019.
-
P. Rajpurkar, J. Zhang, K. Lopyrev, and P. Liang,
Squad: 100,000+ questions for machine comprehension of text,
arXiv preprint arXiv:1606.05250, 2016.
-
A. Bordes, J. Weston, and N. Usunier, Open question answering with
weakly supervised embedding models, in Joint
European Conference on Machine Learning and Knowledge Discovery in
Databases. Springer, 2014, pp. 165-180.
-
D. Chen, A. Fisch, J. Weston, and A. Bordes, Reading wikipedia
to answer open-domain questions, arXiv preprint arXiv:1704.00051, 2017.
-
B. McCann, J. Bradbury, C. Xiong, and R. Socher, Learned in translation:
Contextualized word vectors, in NIPS 2017.
-
A. Wang, A. Singh, J. Michael, F. Hill, O. Levy, and S. R. Bowman,
Glue: A multi-task benchmark and analysis
platform for natural language understanding,
arXiv preprint arXiv:1804.07461, 2018.
-
K. M. Hermann and P. Blunsom, The role of syntax in vector space
models of compositional semantics, in Proceedings
of the 51st Annual Meeting of the Association for
Computational Linguistics (Volume 1: Long Papers). Association
for Computational Linguistics, 2013.
-
Y. LeCun, Y. Bengio, G. Hinton, Deep learning. Nature 521, 436-444 (2015).
-
R. Socher, Y. Bengio, C. Manning, Deep learning for NLP, ACL 2012
-
R. Sutton and A. Barto: Reinforcement Learning: an introduction.
MIT Press (1998).
-
K. Cho, B. van Merrienboer, D. Bahdanau, Y. Bengio,
On the properties of neural machine translation: encoder-decoder approaches,
Oct. 2014.
-
K. Cho, B. van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk,
Y. Bengio,
Learning phrase representations using RNN encoder-decoder for statistical
machine translation. Jun. 2014
-
I.J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair,
A. Courville, Y. Bengio, Generative adversarial networks. 2014.
-
Alex Graves, Generating sequences with recurrent neural networks.
2013-2014 (this paper generates handwriting with an LSTM)
-
P. D. Turney and P. Pantel, From frequency to meaning:
Vector space models of semantics, Journal of Artificial
Intelligence Research, vol. 37, pp. 141-188, 2010.
-
A. Nguyen, J. Clune, Y. Bengio, A. Dosovitskiy, J. Yosinski,
Plug and play generative networks: conditional iterative generation
of images in latent space. 2016.
-
F. Ture and O. Jojic,
Simple and effective question answering with recurrent neural networks. 2016.
-
J. Li, W. Monroe, T. Shi, S. Jean, A. Ritter, D. Jurafsky,
Adversarial learning for neural dialogue generation.
arXiv:1701.06547v4, 2017.
Lecture Notes:
Announcements:
This Monday (Jan. 13)'s class is moved to Jan 17, 5pm, same
classroom.
We will move to DC 2585, a larger classroom, starting from the Jan 20
class. The class time will be 3pm to 5:50pm, Mondays.
This way, all qualified students on the waiting list
will be able to enroll in this course.
If you would like to be added to this course,
please come to class today (Jan 20) and give me your student ID.
Announcement Jan 26, 2020. Attention: If you have visited a
city in China in the past 10 days, please wear a mask to attend the
class, to protect other students.
Announcement Feb. 3, 2020: If you have had contact with
anybody who might have had contact with people with nCoV in the
past 15 days, or if you have any
symptoms, please do not come to class (I do not need a doctor's
note). This class will no longer have
an attendance requirement. Everybody will get the 5% attendance
mark automatically. Of course, if you stay home, please read
through the presented materials on your own. You are expected to know
the material.
It's time to talk to me about your course (research) project. Note
that this is independent of your presentation (you can of course extend
the paper you have presented).
Please make sure you send me your presentation ppt or pdf file 1 day
before the presentation.
The final projects are due on April 10, by emailing me a
pdf. (Please submit your 10-minute presentation together with the
final project.)
The final project presentations will be on March 30 and April 1,
10 minutes per person. Please let me know which day you would like to
present (first come, first choose).
You will use your own laptop to
present. As the time will be very tight,
please test your computer connection beforehand.
Announcement March 13, 2020. Due to the school's coronavirus closure,
we will cancel the last day of
presentations on March 16. Please see the detailed arrangement below.
The 5 attendance marks will now depend on your two
half-page reviews of two
final project presentations.
Announcement March 21, 2020. Course evaluation; attention all
students.
Please go to the Evaluation Website to
evaluate our course, CS886 Section 002 (SEM), open
Sun. Mar 22, 11:59 to Fri. Apr 3, 11:59pm. I hope you are all safe!
Announcement March 22, 2020. If any student has any questions,
please call me any time at 519-500-3026. Stay safe!
Announcement March 26, 2020. Deadlines postponed.
The new deadline for submitting the final project report is
April 15 (any time on that day). On the same day, submit your voice-over-ppt (10 minutes)
presenting your project. You no longer need to write reviews of
two other students' presentations.
Announcement March 28, 2020. GPUs
Several students asked about GPU resources. There are 3 GPU
servers: gpu1, gpu2, and gpu3. First ssh to
datasci.cs.uwaterloo.ca using your linux.student.cs userid
and password. From there, ssh to one of the GPU servers. But I
think these are not sufficient to train big models like
BERT. Please use them only for light training.
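Once logged in to one of the GPU servers, a quick sanity check like the following (a sketch assuming PyTorch is installed there) confirms that a GPU is visible before you launch a job:

    import torch

    # Report whether CUDA sees a device, and which one, before starting training.
    if torch.cuda.is_available():
        print("GPU:", torch.cuda.get_device_name(0))
    else:
        print("No GPU visible; check that you are on gpu1, gpu2, or gpu3.")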
Announcement April 11, 2020
April 15 is the deadline for submitting the project report
(approximately 10 pages) and voice-over-ppt (10 minutes).
All projects and voice-over-ppts will be posted on this website
unless you explicitly tell me not to and have a good reason
(for example, the material will be published in a paper).
Announcement April 15, 2020
Today is the deadline. Please email me your project report and
voice-over-ppt. I will acknowledge each email.
For submission: please send one email attaching the 2 files, if possible. Do not do
anything else, such as zipping them, to make things easier on my
side.
Announcement April 16, 2020. As many people do not wish
to publicize their work because they wish to continue it
as part of larger research programs, I have decided we will not
post anybody's ppt or final project.
Announcement April 26, 2020. This is our last
announcement for the class. I have now finished marking
all the reports and watching all the presentations. I will enter these marks into the university
system.
I would like to take this chance to thank every student in this
class.
COVID-19 stopped me from saying goodbye to you in person. I wish all of
you good health; carry on and do great research in NLP and
deep learning.
Presentation Schedule:
-
Please start picking topics/papers. I will then put the topics or
papers next to each name. For each topic/paper, we will have only
one student presenting. Therefore, it is a good idea to pick
topics you like early.
-
Jan. 27: Omar Attia (Attention), Omar Presentation;
Bowen Yang (word embedding, matrix factorization), Bowen Presentation.
-
Feb. 3: He Bai (Electra), Bai He Presentation;
Owain West (SpanBERT, XLM), Owain Presentation;
Rasoul Akhavan Mahdavi (Positional Encoding), Rasoul Presentation;
Kira Selby (What does BERT learn), Kira Presentation.
-
Feb. 10: Anup Deshmulch (Text Summarization), Anup Presentation;
Hussam Kaka (Use of BERT for text and doc classification), Hussam Presentation;
Ruifeng Chen (XLNet), Ruifeng Presentation;
Priyabrata Senapati (graph networks), Priyabrata Presentation;
Yuqing Xie (The reversible residual network: backpropagation without storing activations), Yuqing Presentation;
Natalie Zhang (Reformer: The Efficient Transformer), Natalie Presentation.
-
Feb. 24: Sidharth Singla (ViLBERT), Sidharth Presentation;
Sheik Shameer (Sentiment Analysis), Sheik Presentation;
Mohammadali Niknamian, Mohammadali Presentation (pdf file);
Utsav Tushar Das (Style Transformer: unpaired text style transfer without disentangled latent representation), Utsav Presentation;
Yin Ki Ng (Bridging the gap between training and inference for NMT), Ng Presentation.
-
Mar. 2: Avery Hiebert (Turing completeness), Avery Presentation;
Shreyance Jain (BertQA -- attention on steroids), Jain Presentation;
Vedanshi Kataria, Kataria Presentation;
Genseric Ghiro (Adaptive Transformer), Ghiro Presentation;
Haonan Duan (Gender-preserving debiasing for pre-trained word embeddings), Duan Presentation;
Bin Zhan (RoBERTa), Bin Presentation.
-
Mar. 9: Shiqi Xiao (BART), Shiqi Presentation;
Udhav Sethi (Multi-stage document ranking with BERT), Udhav Presentation;
Archit Shah (On extractive neural document summarization with transformer language models), Shah Presentation;
Karthik Ramesh (controlled text generation), Ramesh Presentation;
Pascale Brunelle Walters (VideoBERT), Pascale Presentation.
-
March 16: the presentations below are cancelled.
However, all the presenters are still required to submit their PPT
files to me before March 16, by email. Your marks (just for the 6
people here) for this part of the course
will depend only on your ppt file (I will read each one). The ppt
files will still be posted here online, and all other students are
still required to go through these ppt files to gain an understanding of
these topics.
Joshua Lemmon (Transferable
multi-domain state generator for task-oriented dialogue systems),
Josh presentation.
Josh presentation video:
https://www.dropbox.com/s/cd50c9hylc91ss6/TRADE-video.mp4?dl=0
Egill Ian Gudmundsson (Applying MLM to sentiment transfer),
Ian presentation. Ian's video presentation:
https://www.youtube.com/watch?v=9vG4hs_L01Q
Ali Saheb Pasand (Emotion-Cause pair extraction: a new task to emotion
analysis in texts),
Ali presentation.
Ali presentation video:
https://www.dropbox.com/s/bdml2t9u57cv624/presentation4.mp4?dl=0
Futian Zhang (ERNIE),
Zhang presentation.
Zhiying Jiang (knowledge graph embedding),
Jiang presentation.
Zhiying's presentation video is at:
https://www.dropbox.com/s/xg3hgz1od62rrpo/Knowledge%20Graph%20Embeddding.mov?dl=0
M. Valipour (BERTology?),
Moji presentation.
Moji video presentation.
-
Mar. 23: No class. All presentations have been merged into previous days.
-
Mar. 30: No class. See below.
-
Final Project Presentation (10 Minutes Each Person):
-
Due to the COVID-19 outbreak, we will change the format of the final
project presentations. We will not do face-to-face, in-class
presentations. Each student will still be required to
prepare a 10-minute voice-over PPT (see
How to
create a voice-over for a PPT presentation)
for your project (in addition to your
course project paper), making sure that it
is understandable for other people to read.
All of you will email your 10-minute project ppt to me on April 15
(any time on that day) together with your final project report, and I will post them on this website
for other students to read.
Students are no longer required to write reviews of
two other students' project presentations.
-
March 30, 3pm - 6pm, DC 2585: Cancelled. See above.
-
April 1, 3pm-6pm, DC 2585: Cancelled. See above. On this day, you
are no longer required to submit anything.
-
April 15 (any time on that day): deadline for submitting the final
project paper and your 10-minute presentation.
Maintained by Ming Li