Please note: This PhD seminar will take place online.
Wei Zhong, PhD candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Jimmy Lin
Neural retrievers have been introduced into existing math-aware search methods with great success. Their ability to cope with math symbol mismatches, to represent highly contextualized semantics, and to learn effective in-domain retrieval representations end to end is critical to improving math information retrieval. However, the most effective retriever for math remains impractical because it relies on a dense representation for every math token. This leads to prohibitive storage demands, especially since math content generally consumes more tokens.
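A back-of-the-envelope estimate helps show why per-token vectors are costly: an uncompressed token-level index stores one vector per token, so its size grows linearly with the number of tokens. The sketch below is a hypothetical calculation. The corpus size, average token count, vector dimensions, fp16 storage, and the 1.5x token inflation for math-heavy text are all assumptions chosen for illustration, not figures from this work. It also ignores any vector compression a real system might apply.

```python
def index_size_bytes(num_docs: int, vectors_per_doc: int, dim: int,
                     bytes_per_value: int = 2) -> int:
    """Raw size of an uncompressed dense index (fp16 = 2 bytes per value)."""
    return num_docs * vectors_per_doc * dim * bytes_per_value


# All numbers below are hypothetical, chosen only for illustration.
NUM_DOCS = 1_000_000
AVG_TOKENS = 200   # assumed average tokens per document
TOKEN_DIM = 128    # assumed per-token vector size (late-interaction style)
DOC_DIM = 768      # assumed single-vector document embedding size

# One vector per token versus one vector per document.
token_level = index_size_bytes(NUM_DOCS, AVG_TOKENS, TOKEN_DIM)
# Assumption: math-heavy documents need 1.5x as many tokens.
math_heavy = index_size_bytes(NUM_DOCS, int(AVG_TOKENS * 1.5), TOKEN_DIM)
single_vector = index_size_bytes(NUM_DOCS, 1, DOC_DIM)

for label, size in [("token-level", token_level),
                    ("token-level, math-heavy", math_heavy),
                    ("single-vector", single_vector)]:
    print(f"{label}: {size / 1e9:.1f} GB")
# token-level: 51.2 GB
# token-level, math-heavy: 76.8 GB
# single-vector: 1.5 GB
```

Under these assumptions the token-level index is roughly 33 times larger than a single-vector index, and the gap grows in direct proportion to the number of tokens per document.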
In this work, we aim to alleviate this efficiency bottleneck while also boosting math information retrieval effectiveness through hybrid search. To this end, we propose MABOWDOR, a Math-Aware Best-of-Worlds Domain Optimized Retriever. It combines an unsupervised structure search component, a dense retriever, and, optionally, a sparse retriever, built on a domain-adapted backbone learned through context-enhanced pretraining. Each component addresses a different need in retrieving heterogeneous data from math documents.
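The abstract does not say how MABOWDOR merges the results of its components. As a generic illustration of hybrid search, the sketch below shows one common fusion technique: weighted linear interpolation of min-max-normalized scores from several retrievers. The retriever names, weights, document IDs, and scores are all hypothetical, and this should not be read as the method used in this work.

```python
from typing import Dict, List, Mapping, Tuple


def min_max_normalize(scores: Mapping[str, float]) -> Dict[str, float]:
    """Rescale one retriever's scores into [0, 1] so runs are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:  # all scores equal: treat every hit as maximally relevant
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(runs: Mapping[str, Mapping[str, float]],
         weights: Mapping[str, float],
         k: int = 10) -> List[Tuple[str, float]]:
    """Weighted sum of normalized scores; a doc missing from a run gets 0 there."""
    fused: Dict[str, float] = {}
    for name, run in runs.items():
        w = weights.get(name, 0.0)
        for doc, s in min_max_normalize(run).items():
            fused[doc] = fused.get(doc, 0.0) + w * s
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Hypothetical runs from three retrievers (raw scores on different scales).
runs = {
    "structure": {"d1": 12.0, "d2": 8.0, "d3": 4.0},
    "dense": {"d2": 0.91, "d3": 0.85, "d4": 0.60},
    "sparse": {"d1": 15.2, "d4": 9.1},
}
weights = {"structure": 0.5, "dense": 0.3, "sparse": 0.2}  # assumed weights

ranking = fuse(runs, weights)
print([doc for doc, _ in ranking])  # ['d1', 'd2', 'd3', 'd4']
```

Normalizing first keeps one retriever's larger raw score range from dominating the sum. The weights would normally be tuned on held-out queries, and the optional sparse run can simply be dropped from `runs` when it is not used.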