Tangent-L
Math-aware search engine built on Lucene

Tangent-L evolved from initial designs at the Rochester Institute of Technology (Tangent and Tangent-2) and a joint effort with the University of Waterloo (Tangent-3). The underlying approach is to represent each mathematical formula by a bag of math tokens that form search units similar to word tokens in a text search engine.

To create the math tokens, an input formula in Presentation MathML is first converted into a Symbol Layout Tree (SLT), in which nodes represent the math symbols and edges represent the spatial relationships between them. This tree is then traversed to extract several sets of math tokens that capture local characteristics of the formula's appearance.

[Figure: Tangent-L's math tokens from a converted Symbol Layout Tree.]
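As a rough illustration of the traversal described above, the following Python sketch builds a toy SLT for x^2 + y and emits one token per edge. The `SLTNode` class, the edge labels ("n" for next, "a" for above), and the (parent, child, edge) token shape are assumptions made for this example; Tangent-L's actual token types and encoding are not specified in this text.

```python
from dataclasses import dataclass, field


@dataclass
class SLTNode:
    """Hypothetical SLT node: a symbol plus labelled edges to child nodes."""
    symbol: str
    # Edge label -> child node. Labels are illustrative only:
    # "n" = next symbol to the right, "a" = above (superscript).
    children: dict = field(default_factory=dict)


def symbol_pair_tokens(root):
    """Walk the tree and emit one (parent, child, edge) token per edge."""
    tokens = []
    stack = [root]
    while stack:
        node = stack.pop()
        for label, child in node.children.items():
            tokens.append((node.symbol, child.symbol, label))
            stack.append(child)
    return tokens


# Toy SLT for the formula x^2 + y
slt = SLTNode("x", {
    "a": SLTNode("2"),
    "n": SLTNode("+", {"n": SLTNode("y")}),
})
tokens = symbol_pair_tokens(slt)
```

Each token describes only a local fragment of the layout, so two formulas that share sub-structures also share tokens, which is what lets a bag-of-tokens index match them partially.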

The math tokens replace the input formula in the document (or the query) during indexing and searching. Unlike its predecessors, Tangent-L relies on a single combined index for both math tokens and word tokens, and ranks results with a traditional text-based IR method, BM25+. This initial design of Tangent-L is documented in Dallas Fraser's thesis and in the “best paper” from Document Engineering 2018. Support for wildcard matching was added in later development.
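To make the combined-index idea concrete, here is a minimal sketch of BM25+ scoring over documents whose token lists mix word tokens and math tokens. It follows the commonly stated BM25+ formulation (BM25 term weight plus a lower-bound constant delta) and is not Tangent-L's Lucene-based implementation. The parameter values `k1=1.2`, `b=0.75`, `delta=1.0` and the string form of the math tokens are assumptions for illustration.

```python
import math
from collections import Counter


def bm25_plus(query, docs, k1=1.2, b=0.75, delta=1.0):
    """Score each document (a list of tokens) against the query tokens."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each token.
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue  # only terms present in the document contribute
            idf = math.log((n_docs + 1) / df[term])
            norm = k1 * (1 - b + b * len(d) / avg_len)
            score += idf * ((k1 + 1) * tf[term] / (norm + tf[term]) + delta)
        scores.append(score)
    return scores


# Word tokens and (hypothetical) math tokens share one bag per document.
docs = [
    ["pythagorean", "theorem", "(a,2,a)", "(a,+,n)"],
    ["quadratic", "formula", "(x,2,a)"],
    ["theorem", "proof"],
]
query = ["theorem", "(a,2,a)"]
scores = bm25_plus(query, docs)
```

Because math tokens are treated like any other term, the first document is rewarded for matching both the keyword and the formula fragment, while the second, which matches neither, scores zero.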

Various improvements to Tangent-L are described in the ARQMath Lab working papers of 2020 and 2021. With Tangent-L, the MathDowsers team produced the best participant run for the Math-Answer Retrieval task in both years of the Lab, as well as the best automatic run for the Formula Retrieval task in 2021. More information about the submissions and results can be found here.

The BrushSearch system provides an interface for entering arbitrary formulas together with keywords or phrases and for searching various corpora using Tangent-L. Users can apply wildcard matching by specifying the type of a given symbol in a formula. They can also set a custom weight for the formula terms within a query, which can help achieve better search results. The MathDowser's Browser allows users to explore Tangent-L's search results for the MathCQA task in ARQMath-1 and ARQMath-2.
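One plausible way to picture the custom formula weight is as a multiplier applied to the score contributions of math tokens before they are summed with those of keywords. The sketch below assumes exactly that; the function name, the `alpha` parameter, and the convention that math tokens start with "(" are all hypothetical, and the text does not state how BrushSearch or Tangent-L actually applies the weight.

```python
def weighted_query_score(term_scores, alpha=1.0):
    """Sum per-term scores, scaling math-token contributions by alpha.

    term_scores maps each matched query term to its (already computed)
    retrieval score. Math tokens are recognised here by a leading "(",
    which is an assumption of this example only.
    """
    total = 0.0
    for term, score in term_scores.items():
        weight = alpha if term.startswith("(") else 1.0
        total += weight * score
    return total


# Hand-made per-term scores for one document (illustrative values).
term_scores = {"theorem": 1.0, "(a,2,a)": 2.0}
keyword_leaning = weighted_query_score(term_scores, alpha=0.5)
formula_leaning = weighted_query_score(term_scores, alpha=2.0)
```

Lowering `alpha` lets the keywords dominate the ranking, while raising it favours documents that match the formula structure.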