Haotian Zhang, PhD candidate
David R. Cheriton School of Computer Science
The problem of high-recall information retrieval (HRIR) is to find all, or nearly all, relevant documents while ensuring reasonable assessment effort. Achieving high recall is a key problem for applications such as electronic discovery, systematic review, and the construction of test collections for information retrieval tasks.
The current state-of-the-art HRIR systems commonly rely on iterative relevance feedback, in which human assessors repeatedly assess documents selected by a machine-learned model. The relevance of the assessed documents is then fed back to improve the model, which in turn selects the next documents most likely to be relevant for assessment. In many instances, thousands of human assessments may be required to achieve high recall, and these assessments represent the main cost of such HRIR applications. The effectiveness of such HRIR methods in achieving high recall is therefore limited by their reliance on human input, i.e., on the assessment of document relevance.
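This feedback loop can be summarized in a few lines of code. The following Python sketch is only an illustration of the protocol under stated assumptions, a TF-IDF representation, a logistic-regression classifier, and hypothetical names such as cal_loop and assess; it is not the implementation described in the thesis.

```python
# Minimal sketch of the continuous active learning (CAL) feedback loop.
# All names and parameters here are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def cal_loop(corpus, assess, seed_labels, batch_size=10, max_assessments=1000):
    """corpus: list of document texts; assess(doc_id) -> bool relevance label.
    seed_labels is assumed to contain both relevant and non-relevant examples."""
    vectorizer = TfidfVectorizer(sublinear_tf=True)
    X = vectorizer.fit_transform(corpus)
    labels = dict(seed_labels)  # doc_id -> True/False, grown by human feedback
    while len(labels) < max_assessments:
        ids = list(labels)
        clf = LogisticRegression(max_iter=1000)
        clf.fit(X[ids], [labels[i] for i in ids])
        # Score all unjudged documents and take the highest-scoring batch.
        scores = clf.predict_proba(X)[:, 1]
        unjudged = [i for i in range(len(corpus)) if i not in labels]
        batch = sorted(unjudged, key=lambda i: scores[i], reverse=True)[:batch_size]
        if not batch:
            break
        for i in batch:
            labels[i] = assess(i)  # human relevance feedback refines the model
    return labels
```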
In this thesis, we explore different methods to improve the effectiveness and efficiency with which the current state-of-the-art HRIR system finds relevant documents. With regard to effectiveness, we aim to build a machine-learned model that retrieves relevant documents more accurately. With regard to efficiency, we aim to help human assessors make relevance assessments more easily and quickly through our HRIR system. Furthermore, we try to establish a stopping criterion for the assessment process so as to avoid excessive assessment. In particular, we hypothesize that the total assessment effort to achieve high recall can be reduced by using shorter document excerpts, e.g., extractive summaries, in place of full documents for relevance assessment in a high-recall retrieval system based on continuous active learning (CAL).
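As a rough illustration of how such an excerpt might be produced, the sketch below reuses the vectorizer and classifier from the previous sketch to select the single sentence the current model scores as most likely relevant. The naive sentence splitting and the function name best_sentence are assumptions for illustration, not the summarizer actually used in the thesis.

```python
def best_sentence(doc_text, vectorizer, clf):
    # Split the document into sentences; splitting on '.' is a deliberate
    # simplification of real sentence segmentation.
    sentences = [s.strip() for s in doc_text.split(".") if s.strip()]
    # Score each sentence with the current relevance model and return the
    # highest-scoring one as the excerpt shown to the assessor.
    scores = clf.predict_proba(vectorizer.transform(sentences))[:, 1]
    return max(zip(scores, sentences))[1]
```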
To test this hypothesis, we implemented a high-recall retrieval system based on the current state-of-the-art implementation of CAL. The system can display either full documents or short document excerpts for relevance assessment. A search engine was also integrated into the system to give assessors the option of conducting interactive search and judging.
We conducted a simulation study and a separate 50-person controlled user study to test our hypothesis. The results of the simulation study show that judging even a single extracted sentence for relevance feedback may be adequate for CAL to achieve high recall. The results of the controlled user study further confirmed that, within a limited time, human assessors found a significantly larger number of relevant documents when they used a system with paragraph-length document excerpts rather than full documents. In addition, we found that allowing participants to compose and execute their own search queries did not improve their ability to find relevant documents and, by some measures, impaired performance. Finally, we found that integrating sampling methods with active learning can yield accurate estimates of the number of relevant documents, and thus further avoid excessive assessment.
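The final finding can be illustrated with a simple uniform-sampling estimator. This is a minimal sketch of the general idea under stated assumptions, not the specific estimator developed in the thesis; the names estimate_total_relevant and assess are hypothetical.

```python
import random

def estimate_total_relevant(found_relevant, unjudged_ids, assess, sample_size=100):
    # Estimate the total number of relevant documents as: relevant documents
    # already found by active learning + estimated relevant documents
    # remaining in the unjudged pool.
    if not unjudged_ids:
        return found_relevant
    # Uniformly sample unjudged documents and assess them to estimate the
    # prevalence of relevant documents among those not yet judged.
    sample = random.sample(unjudged_ids, min(sample_size, len(unjudged_ids)))
    hits = sum(assess(i) for i in sample)
    prevalence = hits / len(sample)
    return found_relevant + prevalence * len(unjudged_ids)
```

A stopping criterion can then, for example, compare the number of relevant documents found so far against such an estimate and end the assessment process once the gap becomes small.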