Master’s Thesis Presentation • Data Systems • MIRAGE-ANNS: Mixed Approach Graph-based Indexing for Approximate Nearest Neighbor Search

Wednesday, April 9, 2025 12:30 pm - 1:30 pm EDT (GMT -04:00)

Please note: This master’s thesis presentation will take place in DC 3301.

Sairaj Voruganti, Master’s candidate
David R. Cheriton School of Computer Science

Supervisor: Professor Tamer Özsu

Approximate nearest neighbor search (ANNS) on high dimensional vectors is important for numerous applications, such as search engines, recommendation systems, and more recently, large language models (LLMs), where Retrieval Augmented Generation (RAG) is used to add context to an LLM query. Graph-based indexes built on these vectors have been shown to perform best but have challenges. These indexes can either employ refinement-based construction strategies such as K-Graph and NSG, or increment-based strategies such as HNSW. Refinement-based approaches have fast construction times, but worse search performance and do not allow for incremental inserts, requiring a full reconstruction each time new vectors are added to the index. Increment-based approaches have good search performance and allow for incremental inserts, but suffer from slow construction.

This work presents MIRAGE-ANNS (Mixed Incremental Refinement Approach Graph-based Exploration for Approximate Nearest Neighbor Search) that constructs the index as fast as refinement-based approaches while retaining search performance comparable or better than increment-based ones. It also allows incremental inserts. We show that MIRAGE achieves state of the art construction and query performance, outperforming existing methods by up to 2x query throughput on real-world datasets.