Freda Shi
  石昊悦

Greetings! I am an Assistant Professor in the David R. Cheriton School of Computer Science at the University of Waterloo and a Faculty Member at the Vector Institute, where I also hold a Canada CIFAR AI Chair. I received my Ph.D. in Computer Science from the Toyota Technological Institute at Chicago in 2024, where I was advised by Professors Karen Livescu and Kevin Gimpel, and was supported by a Google Ph.D. Fellowship. I completed my Bachelor's degree in Intelligence Science and Technology (Computer Science Track) in 2018 at Peking University, with a minor in Sociology.

Research

My research interests are in computational linguistics and natural language processing. I work toward a deeper understanding of natural language and the human language processing mechanism, and toward using these insights to design more efficient, effective, safe, and trustworthy NLP systems. I am particularly interested in learning language through grounding, computational multilingualism, and related machine learning questions. For more details, check out my publications and the CompLING Lab at the University of Waterloo.

Prospective students and visitors: please read this.

Publications


Siren's song in the AI ocean: A survey on hallucination in large language models
Yue Zhang, Yafu Li, Leyang Cui, Deng Cai, Lemao Liu, Tingchen Fu, Xinting Huang, Enbo Zhao, Yu Zhang, Yulong Chen, Longyue Wang, Anh Tuan Luu, Wei Bi, Freda Shi, Shuming Shi

Computational Linguistics 2025    

How Tokenization Limits Phonological Knowledge Representation in Language Models and How to Improve Them
Disen Liao, Freda Shi

Tokenization Workshop, ICML 2025    

FORG3D: Flexible Object Rendering for Generating Vision-Language Spatial Reasoning Data from 3D Scenes
Oscar Pang, Freda Shi

ACL Demo 2025     Code / Project Page

Knowledge Distillation for Language Models
Yuqiao Wen, Freda Shi, Lili Mou

NAACL-HLT Tutorial 2025     Paper

Learning Language through Grounding
Freda Shi, Ziqiao Ma, Jiayuan Mao, Parisa Kordjamshidi, Joyce Chai

NAACL-HLT Tutorial 2025     Paper

SpaRE: Enhancing Spatial Reasoning in Vision-Language Models with Synthetic Data
Michael Ogezi, Freda Shi

ACL 2025     arXiv
Oral Presentation

Blessing of Multilinguality: A Systematic Analysis of Multilingual In-Context Learning
Yilei Tu, Andrew Xue, Freda Shi

Findings of ACL 2025     arXiv

Logical forms complement probability in understanding language model (and human) performance
Yixuan Wang, Freda Shi

ACL 2025     arXiv
Oral Presentation; Abridged version presented at the 2025 Workshop on Cognitive Modeling and Computational Linguistics (CMCL)

Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Under Ambiguities
Zheyuan Zhang, Fengyuan Hu, Jayjun Lee, Freda Shi, Parisa Kordjamshidi, Joyce Chai, Ziqiao Ma

ICLR 2025     Paper / Code / arXiv / Project Page / Data
Oral Presentation; Abridged version presented at the 2025 NeurIPS Workshop on Pluralistic Alignment

Gated slot attention for efficient linear-time sequence modeling
Yu Zhang, Songlin Yang, Ruijie Zhu, Yue Zhang, Leyang Cui, Yiqiao Wang, Bolun Wang, Freda Shi, Bailin Wang, Wei Bi, Peng Zhou, Guohong Fu

NeurIPS 2024     Paper / arXiv

Learning Language Structures through Grounding
Haoyue Freda Shi

PhD Thesis, Toyota Technological Institute at Chicago 2024     Paper
Thesis of Distinction
AAAI 2025 New Faculty Highlight

Structured Tree Alignment for Evaluation of (Speech) Constituency Parsing
Freda Shi, Kevin Gimpel, Karen Livescu

ACL 2024     Paper / arXiv

LogogramNLP: Comparing Visual and Textual Representations of Ancient Logographic Writing Systems for NLP
Danlu Chen, Freda Shi, Aditi Agarwal, Jacobo Myerston, Taylor Berg-Kirkpatrick

ACL 2024     Paper
Best Paper Nominee

Large language models can be easily distracted by irrelevant context
Freda Shi, Xinyun Chen, Kanishka Misra, Nathan Scales, David Dohan, Ed H. Chi, Nathanael Schärli, Denny Zhou

ICML 2023     Paper / arXiv

Language models are multilingual chain-of-thought reasoners
Freda Shi, Mirac Suzgun, Markus Freitag, Xuezhi Wang, Suraj Srivats, Soroush Vosoughi, Hyung Won Chung, Yi Tay, Sebastian Ruder, Denny Zhou, et al.

ICLR 2023     Paper / arXiv

Audio-Visual Neural Syntax Acquisition
Cheng-I Jeff Lai, Freda Shi, Puyuan Peng, Yoon Kim, Kevin Gimpel, Shiyu Chang, Yung-Sung Chuang, Saurabhchand Bhati, David Cox, David Harwath, Yang Zhang, Karen Livescu, James Glass

ASRU 2023     Paper / arXiv

InCoder: A Generative Model for Code Infilling and Synthesis
Daniel Fried, Armen Aghajanyan, Jessy Lin, Sida Wang, Eric Wallace, Freda Shi, Ruiqi Zhong, Wen-tau Yih, Luke Zettlemoyer, Mike Lewis

ICLR 2023     Paper / arXiv
Spotlight Presentation

NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation
Kaustubh D. Dhole, Varun Gangal, Sebastian Gehrmann, Aadesh Gupta, Zhenhao Li, Saad Mahamood, Abinaya Mahendiran, Simon Mille, Ashish Srivastava, Samson Tan, Tongshuang Wu, Jascha Sohl-Dickstein, Jinho D. Choi, Eduard Hovy, Ondrej Dusek, Sebastian Ruder, Sajant Anand, Nagender Aneja, Rabin Banjade, Lisa Barthe, Hanna Behnke, Ian Berlot-Attwell, Connor Boyle, Caroline Brun, Marco Antonio Sobrevilla Cabezudo, Samuel Cahyawijaya, Emile Chapuis, Wanxiang Che, Mukund Choudhary, Christian Clauss, Pierre Colombo, Filip Cornell, Gautier Dagan, Mayukh Das, Tanay Dixit, Thomas Dopierre, Paul-Alexis Dray, Suchitra Dubey, Tatiana Ekeinhor, Marco Di Giovanni, Rishabh Gupta, Rishabh Gupta, Louanes Hamla, Sang Han, Fabrice Harel-Canada, Antoine Honore, Ishan Jindal, Przemyslaw K. Joniak, Denis Kleyko, Venelin Kovatchev, Kalpesh Krishna, Ashutosh Kumar, Stefan Langer, Seungjae Ryan Lee, Corey James Levinson, Hualou Liang, Kaizhao Liang, Zhexiong Liu, Andrey Lukyanenko, Vukosi Marivate, Gerard de Melo, Simon Meoni, Maxime Meyer, Afnan Mir, Nafise Sadat Moosavi, Niklas Muennighoff, Timothy Sum Hon Mun, Kenton Murray, Marcin Namysl, Maria Obedkova, Priti Oli, Nivranshu Pasricha, Jan Pfister, Richard Plant, Vinay Prabhu, Vasile Pais, Libo Qin, Shahab Raji, Pawan Kumar Rajpoot, Vikas Raunak, Roy Rinberg, Nicolas Roberts, Juan Diego Rodriguez, Claude Roux, Vasconcellos P. H. S., Ananya B. Sai, Robin M. Schmidt, Thomas Scialom, Tshephisho Sefara, Saqib N. Shamsi, Xudong Shen, Haoyue Shi, Yiwen Shi, Anna Shvets, Nick Siegel, Damien Sileo, Jamie Simon, Chandan Singh, Roman Sitelew, Priyank Soni, Taylor Sorensen, William Soto, Aman Srivastava, KV Aditya Srivatsa, Tony Sun, Mukund Varma T, A Tabassum, Fiona Anting Tan, Ryan Teehan, Mo Tiwari, Marie Tolkiehn, Athena Wang, Zijian Wang, Gloria Wang, Zijie J. Wang, Fuxuan Wei, Bryan Wilie, Genta Indra Winata, Xinyi Wu, Witold Wydmański, Tianbao Xie, Usama Yaseen, M. Yee, Jing Zhang, Yue Zhang

NEJLT 2023     Paper

Substructure Distribution Projection for Zero-Shot Cross-Lingual Dependency Parsing
Freda Shi, Kevin Gimpel, Karen Livescu

ACL 2022     Paper / arXiv

Natural Language to Code Translation with Execution
Freda Shi, Daniel Fried, Marjan Ghazvininejad, Luke Zettlemoyer, Sida I. Wang

EMNLP 2022     Paper / arXiv

Deep Clustering of Text Representations for Supervision-Free Probing of Syntax
Vikram Gupta, Haoyue Shi, Kevin Gimpel, Mrinmaya Sachan

AAAI 2022     Paper / arXiv

Substructure Substitution: Structured Data Augmentation for NLP
Haoyue Shi, Karen Livescu, Kevin Gimpel

Findings of ACL 2021     Paper / arXiv

Bilingual Lexicon Induction via Unsupervised Bitext Construction and Word Alignment
Haoyue Shi, Luke Zettlemoyer, Sida I. Wang

ACL 2021     Paper / arXiv
Oral Presentation
Best Paper Nominee

Grammar-Based Grounded Lexicon Learning
Jiayuan Mao, Freda Shi, Jiajun Wu, Roger P. Levy, Joshua B. Tenenbaum

NeurIPS 2021     Paper / arXiv

On the Role of Supervision in Unsupervised Constituency Parsing
Haoyue Shi, Karen Livescu, Kevin Gimpel

EMNLP 2020     Paper / arXiv
Oral Presentation

A Cross-Task Analysis of Text Span Representations
Shubham Toshniwal, Haoyue Shi, Bowen Shi, Lingyu Gao, Karen Livescu, Kevin Gimpel

RepL4NLP 2020     Paper / arXiv

Visually Grounded Neural Syntax Acquisition
Haoyue Shi, Jiayuan Mao, Kevin Gimpel, Karen Livescu

ACL 2019     Paper / arXiv
Oral Presentation
Best Paper Nominee

Learning Visually-Grounded Semantics from Contrastive Adversarial Samples
Haoyue Shi, Jiayuan Mao, Tete Xiao, Yuning Jiang, Jian Sun

COLING 2018     Paper / arXiv

On Tree-Based Neural Sentence Modeling
Haoyue Shi, Hao Zhou, Jiaze Chen, Lei Li

EMNLP 2018     Paper / arXiv

On Multi-Sense Word Embeddings via Matrix Factorization and Matrix Multiplication
Haoyue Shi

Bachelor's Thesis, Peking University 2018     Paper
Best Dissertation Award

Constructing High Quality Sense-specific Corpus and Word Embedding via Unsupervised Elimination of Pseudo Multi-sense
Haoyue Shi, Xihao Wang, Yuqi Sun, Junfeng Hu

LREC 2018     Paper

Implicit Subjective and Sentimental Usages in Multi-sense Word Embeddings
Yuqi Sun, Haoyue Shi, Junfeng Hu

WASSA 2018     Paper

Joint Saliency Estimation and Matching using Image Regions for Geo-Localization of Online Video
Haoyue Shi, Jia Chen, Alexander G. Hauptmann

ICMR 2017     Paper

Real Multi-Sense or Pseudo Multi-Sense: An Approach to Improve Word Representation
Haoyue Shi, Caihua Li, Junfeng Hu

CL4LC 2016     Paper

Cardiovascular Risk Prediction Method Based on Test Analysis and Data Mining Ensemble System
Shan Xu, Haoyue Shi, Xiaohui Duan, Tiangang Zhu, Peihua Wu, Dongyue Liu

ICBDA 2016     Paper