CS 784: Computational Linguistics, Winter 2025

Welcome to the course!

In this course, we will discuss topics in computational linguistics, including morphology, syntax, semantics, and pragmatics. We will also discuss computational tools and methods for analyzing and processing natural language data, and how linguistic insights can inform the design of better computational systems. The course will consist of lectures, discussions, and hands-on exercises. There will be a term project, in which you will apply the methodologies you have learned to a real research problem, and an open-book, open-note final exam, in which you will use those methodologies to analyze research results or design a follow-up study.

People

  • Instructor: Freda Shi (fhs at uwaterloo dot ca)
    Office hours: T/Th 5:20pm-5:50pm, DC 2568
  • TA: TBA

Time and Location

  • Time: T/Th 4pm-5:20pm
  • Location: DC 2568
  • Final Exam: TBD

Tentative Course Schedule and Related Material

See this page. Slides will be made available before each lecture.

Piazza forum: https://piazza.com/uwaterloo.ca/winter2025/cs784

Textbook

There is no official textbook, but we will use the following resources for reference:

  • Jurafsky, D., & Martin, J. H. (2019). Speech and language processing (3rd ed.). Draft available at https://web.stanford.edu/~jurafsky/slp3/.
  • Clyde, M., et al. (2022). An introduction to Bayesian thinking: A companion to the Statistics with R course. Available at https://statswithr.github.io/book/.
    Note: you don't need to use R in this course. The book is just for reference on statistical inference.

Additionally, the books below are recommended for a more comprehensive understanding of the course material:

Topics

Note: the schedule is tentative and may change depending on our progress.

  • Introduction (1 lecture)
  • Basic methods (2 lectures): probability, information theory, regular expressions, statistical inference, Bayesian statistics
  • Words (1 lecture): definition, tokenization, morphology
  • Lexical semantics (1 lecture): word senses, distributional semantics, word embeddings, word clustering
  • Text classification (2 lectures): classifiers, linear models, features, training linear classifiers via loss function optimization, loss functions, stochastic gradient descent
  • Neural networks (1 lecture): MLPs, CNNs, RNNs, and Transformers; fine-tuning
  • Language modeling (2 lectures): n-gram models, smoothing, neural-network-based language modeling
  • Sequence labeling (2 lectures): part-of-speech tagging, named entity recognition, hidden Markov models, dynamic programming, the forward-backward algorithm, Viterbi, conditional random fields
  • Syntax (2 lectures): weighted context-free grammars, dependency syntax, inference algorithms
  • Semantics (2 lectures): compositionality, semantic role labeling, frame semantics, lambda calculus, semantic parsing, grounded semantics
  • Pragmatics (1 lecture): phenomena, rational speech act model
  • Linguistic typology and cross-lingual NLP (1 lecture): translation, decoding, lexicon induction, unsupervised translation
  • Large language models (2 lectures): challenges, prompting, LLMs as cognitive models
  • Critical review of research papers (1 lecture)
  • Project discussion (1-2 lectures, tentative)
  • Final exam review (1 lecture)

Grade Breakdown

  • Assignments: 30%. There will be 2 assignments, each worth 15%. Each assignment includes a coding exercise and a paper-review exercise.

  • Project: 40%. A survey/meta-analysis of the ML/AI literature on one of the keywords below, of your choice. The project must be done individually.

    • Alignment
    • Attention
    • Bias
    • Causal
    • Classifier
    • Grounding
    • Reality
    • Valence
    • Any keyword that you have observed in the ML/AI literature with a clear ambiguity, vagueness, or semantic shift over time (please check with Freda first).

    Project will be evaluated based on the following criteria:

    • Midterm Check-in (5%): Clearly describe the chosen keyword and the methodology.
    • Literature Review (15%): Demonstrate thorough identification and accurate summarization of at least two senses of the chosen keyword, supported by appropriate scholarly references.
    • Meta Analysis (15%): Execute a comprehensive meta-analysis with clear methodology and reproducible results.
    • Written Presentation (5%): Exhibit clear, coherent, and professional academic writing and adhere to the formatting requirements throughout the report.

    Authors of excellent reports will be invited to co-author a meta-analysis paper.
    Please refer to Lecture 1 slides for more detailed project requirements.

  • Project review: 10%. A review of another student's project report.

  • Final exam: 20%. Open book, open notes. You will be presented with some research results and asked to analyze them using the methodologies you have learned in the course, or to design a follow-up study based on the results.

Prerequisites

  • There is no hard requirement for this course, but some background in probability, linear algebra, calculus, and Python programming would be helpful.

FAQs

  • Q: I am an undergraduate student. Can I take this course?

    A: Yes, you can take the course with permission from the instructor. Please complete this form and send it to Freda, along with your transcripts, via email. Freda will sign it and return it to the Grad Office for processing. For non-CS and/or non-grad students, we maintain a waitlist, so be sure to attend the first lecture. There will be a quiz at the end of the first lecture to determine everyone's rank on the waitlist.

  • Q: I am a non-CS grad student. Can I take this course?

    A: Yes. Please email Freda with your name and student number, and she will add you to the waitlist and work with the CS Grad Office to enroll you. For non-CS and/or non-grad students, we maintain a waitlist, so be sure to attend the first lecture. There will be a quiz at the end of the first lecture to determine everyone's rank on the waitlist.

  • Q: I have no background in linguistics. Can I take the course?

    A: Yes. This course is designed for CS students who are interested in studying language in a computational and scientific framework. We will cover the basics of linguistics.