Master’s Thesis Presentation • Machine Learning • Unsupervised Syntactic Structure Induction in Natural Language Processing

Tuesday, August 24, 2021 1:30 pm - 1:30 pm EDT (GMT -04:00)

Please note: This master’s thesis presentation will be given online.

Anup Anand Deshmukh, Master’s candidate
David R. Cheriton School of Computer Science

Supervisors: Professors Ming Li, Jimmy Lin

This work addresses unsupervised chunking as a task for syntactic structure induction, which could help us understand the linguistic structures of human languages, especially low-resource languages. In chunking, the words of a sentence are grouped into phrases (also known as chunks) in a non-hierarchical fashion. Understanding text fundamentally requires finding noun and verb phrases, which makes unsupervised chunking an important step in several real-world applications.
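Because chunks are flat and non-overlapping, a chunking of a sentence is often encoded as one tag per word. The abstract does not say which encoding the thesis uses. The sketch below assumes a common B/I convention: "B" marks the first word of a chunk and "I" marks a word that continues it. The function names are hypothetical and are used only to show the round trip between chunks and tags.

```python
from typing import List


def chunks_to_bi(chunks: List[List[str]]) -> List[str]:
    """Tag the first word of each (non-empty) chunk 'B' and the rest 'I'."""
    tags: List[str] = []
    for chunk in chunks:
        tags.extend(["B"] + ["I"] * (len(chunk) - 1))
    return tags


def bi_to_chunks(words: List[str], tags: List[str]) -> List[List[str]]:
    """Rebuild flat chunks from a word sequence and its B/I tags."""
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")
    chunks: List[List[str]] = []
    for word, tag in zip(words, tags):
        # A 'B' tag (or a stray leading 'I') opens a new chunk.
        if tag == "B" or not chunks:
            chunks.append([word])
        else:
            chunks[-1].append(word)
    return chunks


words = ["the", "cat", "sat", "on", "the", "mat"]
chunks = [["the", "cat"], ["sat"], ["on"], ["the", "mat"]]
tags = chunks_to_bi(chunks)
print(tags)
```

With this encoding, predicting a chunking reduces to one binary "start a new chunk here?" decision per word.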

In this thesis, we establish several baselines and present a three-step knowledge transfer approach for unsupervised chunking. In the first step, we take advantage of state-of-the-art unsupervised parsers; in the second, we heuristically induce chunk labels from their output. We propose a simple heuristic that requires no supervision from annotated grammars and generates reasonable (albeit noisy) chunks. In the third step, we design a hierarchical recurrent neural network (HRNN) that learns from these pseudo ground-truth labels. The HRNN explicitly models the composition of words into chunks and smooths out the noise in the heuristically induced labels. Our HRNN (a) maintains both word-level and phrase-level representations and (b) explicitly makes its chunking decisions autoregressively, one step at a time. Furthermore, we make a case for exploring self-supervised learning objectives for unsupervised chunking. Finally, we discuss our attempt to transfer knowledge from chunking back to parsing in an unsupervised setting.
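The abstract does not define the heuristic that turns a parse into chunks, although the results paragraph names maximal left-branching as the best performer. The following is a minimal sketch of one plausible reading, not the thesis's exact procedure. It assumes unlabeled binary parse trees written as nested Python tuples with words as leaves. It also assumes that a subtree is "left-branching" when every right child along its left spine is a single word. The hypothetical function `maximal_left_branching_chunks` emits each maximal such subtree as one chunk, and any leftover word becomes a one-word chunk.

```python
from typing import List, Tuple, Union

# Assumed tree format: a word (str) or a pair (left_subtree, right_subtree).
Tree = Union[str, Tuple["Tree", "Tree"]]


def leaves(tree: Tree) -> List[str]:
    """Return the words of a tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def is_left_branching(tree: Tree) -> bool:
    """Assumed definition: each right child on the left spine is a single word."""
    if isinstance(tree, str):
        return True
    left, right = tree
    return isinstance(right, str) and is_left_branching(left)


def maximal_left_branching_chunks(tree: Tree) -> List[List[str]]:
    """Emit the largest left-branching subtrees, scanning top-down, as chunks."""
    if is_left_branching(tree):
        return [leaves(tree)]
    left, right = tree
    return maximal_left_branching_chunks(left) + maximal_left_branching_chunks(right)


parse = (("the", "cat"), ("sat", ("on", ("the", "mat"))))
print(maximal_left_branching_chunks(parse))
```

On this toy parse, the sketch yields the chunks [the cat] [sat] [on] [the mat]. Under this reading, a sentence whose entire parse is left-branching would become a single chunk.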

We carry out thorough experiments showing that our HRNN improves on our teacher models by more than five percentage points. This significant improvement shows that the HRNN can smooth out the noise in the induced chunk labels and accurately capture chunking patterns. We evaluate different chunking heuristics and show that maximal left-branching performs the best, supporting the view that left-branching structures indicate closely related words. We also present a rigorous analysis of the HRNN's architecture and discuss the performance of language models and vanilla recurrent neural networks.
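The abstract reports gains in percentage points but does not name the metric. Chunking is commonly scored with phrase-level F1, where a predicted chunk counts as correct only if its span exactly matches a gold chunk. The sketch below assumes that metric and the B/I tag encoding, and computes it for a single sentence. Standard evaluations pool counts over a whole corpus rather than averaging per sentence, and the function names here are hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) word indices, end exclusive


def chunk_spans(tags: List[str]) -> Set[Span]:
    """Convert B/I tags into the set of chunk spans they describe."""
    spans: Set[Span] = set()
    start = 0
    for i in range(1, len(tags) + 1):
        # A chunk closes at the end of the sentence or just before the next 'B'.
        if i == len(tags) or tags[i] == "B":
            spans.add((start, i))
            start = i
    return spans


def chunk_f1(gold: List[str], pred: List[str]) -> float:
    """Phrase-level F1: harmonic mean of span precision and recall."""
    gold_spans, pred_spans = chunk_spans(gold), chunk_spans(pred)
    correct = len(gold_spans & pred_spans)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_spans)
    recall = correct / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


gold = ["B", "I", "B", "B", "B", "I"]  # [the cat] [sat] [on] [the mat]
pred = ["B", "I", "B", "B", "I", "I"]  # [the cat] [sat] [on the mat]
print(round(100 * chunk_f1(gold, pred), 2))
```

Here two of three predicted spans match four gold spans (precision 2/3, recall 1/2), giving F1 = 4/7, or about 57.14 when expressed as a percentage.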

To join this master’s thesis presentation on Zoom, please go to