Overview

Scientific and engineering corpora (such as published papers, white papers, and manuals) typically contain both text and mathematical formulas. Conventional search engines are of limited use on such corpora because they search only the text that describes the formulas, not the formulas themselves. Keyword search therefore cannot exploit the rich information carried by the mathematical content, which is why math-aware search engines are preferred.

The primary objective of the BrushSearch project is to establish a state-of-the-art math-aware search system for scientific documents. The project also aims to provide an intuitive user interface that accepts handwritten mathematical formulas and supports natural gestures to specify constraints, wildcards, and user search preferences.

You can access the latest version of the web interface, learn more about the research project below, or read about the features and usage of the BrushSearch system.

User Interface

The web user interface lets users write mathematical expressions with a pen or a mouse. The handwriting is passed to the recognizer, which generates the corresponding LaTeX expression. Users can also type LaTeX directly and view the MathML for any LaTeX expression (converted with the LaTeXML library). The interface supports natural gestures to make writing easier.

Users can choose alternatives for any subexpression to replace the top recognition result. They can also provide samples of their own handwriting to personalize the recognizer.

Through the interface, users can also specify wildcards and other preferences to be applied during the search.

Users can choose which corpus to search, and can save and load previous queries. Please see the guide for more information about the features of the user interface.

Math Recognizer

The current version of the recognizer uses a grammar-based approach. It generates recognition alternatives from which the user can choose when they want to replace the top suggestion. It also lets users provide samples of individual symbols so that their own handwriting style is recognized more accurately.
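The alternative-selection behaviour described above can be pictured as a tree of recognized subexpressions, each holding a ranked list of candidates. The sketch below is a minimal illustration under that assumption only; the `Node` class, its `#1`/`#2` placeholder convention, and the `choose` method are hypothetical and are not BrushSearch's actual data model. Choosing a different candidate at any node regenerates the LaTeX for the whole expression.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical recognized subexpression with ranked LaTeX alternatives.

    `alternatives` is ordered best-first. In a template, `#1`, `#2`, ...
    are replaced by the LaTeX of the first, second, ... child.
    """
    alternatives: list[str]
    children: list["Node"] = field(default_factory=list)
    chosen: int = 0  # 0 selects the top recognition result

    def to_latex(self) -> str:
        parts = [child.to_latex() for child in self.children]
        template = self.alternatives[self.chosen]
        # Single-pass substitution so child LaTeX is never re-scanned.
        return re.sub(r"#(\d)", lambda m: parts[int(m.group(1)) - 1], template)

    def choose(self, index: int) -> None:
        """Select another alternative for this subexpression."""
        if not 0 <= index < len(self.alternatives):
            raise IndexError("no such alternative")
        self.chosen = index


if __name__ == "__main__":
    base = Node(["x", "X", r"\chi"])
    exponent = Node(["2", "z"])
    # Layout alternatives: superscript vs. two symbols side by side.
    root = Node(["#1^{#2}", "#1#2"], [base, exponent])
    print(root.to_latex())  # x^{2}
    base.choose(2)
    print(root.to_latex())  # \chi^{2}
```

Keeping alternatives per node, rather than one ranked list of whole-expression strings, means a user correction to one symbol does not discard the choices already made elsewhere in the expression.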

We are developing a trainable, data-driven recognizer. To support it, we created a handwritten math generator that can produce multiple handwritten variations of a given LaTeX expression.
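The generator's internals are not described here. As one common way to obtain such variations, the sketch below assumes each expression is already available as pen strokes (lists of `(x, y)` points) and applies a random affine distortion (scale, rotation, horizontal shear) plus small per-point jitter. The function names and parameter ranges are illustrative assumptions, not the project's settings.

```python
import math
import random

Point = tuple[float, float]
Stroke = list[Point]


def distort(strokes: list[Stroke], rng: random.Random,
            max_rotate: float = 0.1, max_shear: float = 0.15,
            scale_range: tuple[float, float] = (0.9, 1.1),
            jitter: float = 0.01) -> list[Stroke]:
    """Return one randomly distorted copy of `strokes` (hypothetical helper).

    A single affine transform (uniform scale, rotation in radians, horizontal
    shear) is applied to every point, then independent Gaussian noise is added.
    """
    s = rng.uniform(*scale_range)
    theta = rng.uniform(-max_rotate, max_rotate)
    k = rng.uniform(-max_shear, max_shear)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = []
    for stroke in strokes:
        new_stroke = []
        for x, y in stroke:
            xs = x + k * y                 # horizontal shear (slant)
            xr = cos_t * xs - sin_t * y    # rotation about the origin
            yr = sin_t * xs + cos_t * y
            new_stroke.append((s * xr + rng.gauss(0, jitter),
                               s * yr + rng.gauss(0, jitter)))
        out.append(new_stroke)
    return out


def variations(strokes: list[Stroke], n: int, seed: int = 0) -> list[list[Stroke]]:
    """Produce `n` reproducible variations of the same handwritten strokes."""
    rng = random.Random(seed)
    return [distort(strokes, rng) for _ in range(n)]


if __name__ == "__main__":
    plus = [[(0.0, 0.5), (1.0, 0.5)], [(0.5, 0.0), (0.5, 1.0)]]  # a "+" sign
    for v in variations(plus, 3, seed=42):
        print(v)
```

Seeding the generator makes a synthetic training set reproducible; a real generator would likely also vary symbol spacing and baseline, which this sketch leaves out.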

Search Engine

Our search engine is built on Tangent-L, a math-aware search engine based on the Lucene framework. Tangent-L took part in the ARQMath Lab in 2020 and 2021 as the system of the MathDowsers team, with promising results. Tangent-L is described here, and more information about the MathDowsers' submissions and results can be found here.
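Math-aware engines in the Tangent family index a formula as a set of tuples taken from its symbol layout tree, so that formulas sharing structure share index terms. The sketch below illustrates only that general idea: it emits `ancestor|descendant|path` tokens for symbol pairs within a small edge window and scores a match by counting shared tokens. The `SLTNode` class, the edge labels (`n` next, `a` above, `b` below), the token format, and the overlap score are simplifying assumptions; Tangent-L's actual feature set and ranking differ.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SLTNode:
    """Hypothetical symbol layout tree node: a symbol plus labelled edges."""
    symbol: str
    # Assumed edge labels: "n" = next, "a" = above (superscript), "b" = below.
    edges: dict[str, "SLTNode"] = field(default_factory=dict)


def symbol_pairs(root: SLTNode, window: int = 2) -> list[str]:
    """Emit 'ancestor|descendant|path' tokens for pairs at most `window` edges apart."""
    tokens: list[str] = []

    def descend(anchor: SLTNode, node: SLTNode, path: str) -> None:
        for label, child in node.edges.items():
            p = path + label
            tokens.append(f"{anchor.symbol}|{child.symbol}|{p}")
            if len(p) < window:
                descend(anchor, child, p)

    def visit(node: SLTNode) -> None:
        descend(node, node, "")
        for child in node.edges.values():
            visit(child)

    visit(root)
    return tokens


def tuple_overlap(query_tokens: list[str], doc_tokens: list[str]) -> int:
    """Number of tokens shared by query and document (multiset intersection)."""
    return sum((Counter(query_tokens) & Counter(doc_tokens)).values())


if __name__ == "__main__":
    # x^2 + y: "2" is above "x"; "+" follows "x"; "y" follows "+".
    expr = SLTNode("x", {"a": SLTNode("2"), "n": SLTNode("+", {"n": SLTNode("y")})})
    print(symbol_pairs(expr))  # ['x|2|a', 'x|+|n', 'x|y|nn', '+|y|n']
    query = SLTNode("x", {"a": SLTNode("2")})
    print(tuple_overlap(symbol_pairs(query), symbol_pairs(expr)))  # 1
```

Because each tuple is an ordinary string, tokens like these can be indexed as terms by a text engine such as Lucene, which is what lets an existing text-retrieval framework handle formula search.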

We are working on incorporating proximity to improve search results: query keywords and formulas (or math tokens) that appear close together in a document can be a strong signal that the document is relevant. Part of this investigation is documented in the ARQMath Lab Working Paper 2021.
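One simple way to turn the proximity idea into a score is to find the smallest token distance between any query keyword and any matching math token, then map it to a boost that decreases with distance. The sketch below does this with a two-pointer merge over sorted positions. The `1 / (1 + d)` boost, the `MATH_` placeholder tokens, and the function names are illustrative assumptions, not the method described in the working paper.

```python
from typing import Optional


def min_distance(positions_a: list[int], positions_b: list[int]) -> Optional[int]:
    """Smallest |i - j| with i from a and j from b, via a merge over sorted lists."""
    a, b = sorted(positions_a), sorted(positions_b)
    if not a or not b:
        return None
    i = j = 0
    best = abs(a[0] - b[0])
    while i < len(a) and j < len(b):
        best = min(best, abs(a[i] - b[j]))
        # Advance the smaller position; the other pointer cannot get closer to it.
        if a[i] < b[j]:
            i += 1
        else:
            j += 1
    return best


def proximity_boost(doc_tokens: list[str], keywords: set[str],
                    math_tokens: set[str]) -> float:
    """Hypothetical boost in (0, 1]: 1 when adjacent, 0 when either side is absent."""
    kw = [i for i, t in enumerate(doc_tokens) if t in keywords]
    fm = [i for i, t in enumerate(doc_tokens) if t in math_tokens]
    d = min_distance(kw, fm)
    return 0.0 if d is None else 1.0 / (1.0 + d)


if __name__ == "__main__":
    # Formulas are assumed to have been replaced by placeholder tokens.
    doc = ["the", "derivative", "of", "MATH_x^2", "is", "MATH_2x"]
    print(proximity_boost(doc, {"derivative"}, {"MATH_x^2"}))  # 0.333...
```

The merge runs in time linear in the number of occurrences, so a feature like this can be computed for each candidate document after the initial retrieval step without much cost.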