PhD Seminar • Data Systems | Information Retrieval • Efficient First-Stage Formula Retrieval via Structure MaxScore Dynamic Pruning

Thursday, May 11, 2023 10:30 am - 11:30 am EDT (GMT -04:00)

Please note: This PhD seminar will take place online.

Wei Zhong, PhD candidate
David R. Cheriton School of Computer Science

Supervisor: Professor Jimmy Lin

Formula retrieval systems using substructure matching are effective, but they suffer from slow retrieval times caused by the complexity of structure matching. We present a specialized inverted index and a rank-safe dynamic pruning algorithm for faster substructure retrieval. Formulas are indexed from their Operator Tree (OPT) representations. Our model is evaluated using the NTCIR-12 Wikipedia Formula Browsing Task and a new formula corpus produced from Math StackExchange posts. The proposed approach preserves the effectiveness of structure matching while allowing queries to be executed in real time.