Please note: This master’s thesis presentation will take place in DC 3317.
Gaurav Sehgal, Master’s candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Semih Salihoğlu
There is an increasing demand for extending existing DBMSs with vector indices to become unified systems that can support modern predictive applications, which require joint querying of vector embeddings and structured properties and connections of objects. We present NaviX, a Native vector indeX for graph DBMSs (GDBMSs) that has two main design goals.
First, we aim to implement a disk-based vector index that leverages the core storage and query processing capabilities of the underlying GDBMS. To this end, NaviX is a hierarchical navigable small world (HNSW) index, which is itself a graph-based structure.
Second, we aim to evaluate predicate-agnostic filtered vector search queries, where the k nearest neighbors (kNNs) of a query vector 𝑣𝑄 are searched across an arbitrary subset 𝑆 of vectors that is specified by an ad-hoc selection sub-query 𝑄𝑆. We adopt a prefiltering-based approach that evaluates 𝑄𝑆 first and passes the full information about 𝑆 to the kNN search operator. We study how to design a prefiltering-based search algorithm that is robust under different selectivities of 𝑆 as well as different correlations of 𝑆 with 𝑣𝑄. We propose an adaptive algorithm that uses the local selectivity of each vector in the HNSW graph to pick a suitable heuristic at each iteration of the kNN search algorithm. We demonstrate NaviX’s robustness and efficiency through extensive experiments against existing prefiltering- and postfiltering-based baselines, which include specialized vector databases (Weaviate and Milvus) as well as DBMS extensions (PGVectorScale and VBase).
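To make the prefiltering semantics concrete, the following Python sketch may help. It is not NaviX's implementation: `prefiltered_knn` is a brute-force reference that returns the exact kNNs of a query restricted to an already-computed set 𝑆, and `local_selectivity` assumes, purely for illustration, that the local selectivity of a vector is the fraction of its graph neighbors that belong to 𝑆. All function and variable names here are hypothetical.

```python
import heapq
import math


def prefiltered_knn(vectors, selected, query, k):
    """Exact kNN of `query` among the vector ids in `selected`.

    `selected` plays the role of S, i.e. the result of evaluating the
    selection sub-query before the search starts (prefiltering).
    """
    def dist(a, b):
        # Euclidean distance; the metric is an assumption of this sketch.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    candidates = ((dist(vectors[i], query), i) for i in selected)
    return [i for _, i in heapq.nsmallest(k, candidates)]


def local_selectivity(neighbors, selected, node):
    """Fraction of `node`'s graph neighbors that are in `selected`.

    This definition is an illustrative assumption, not taken from the thesis.
    """
    adj = neighbors[node]
    if not adj:
        return 0.0
    return sum(1 for n in adj if n in selected) / len(adj)


# Toy data: four 2-d vectors and a small adjacency list standing in for
# one layer of a graph index.
vectors = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (0.0, 2.0), 3: (3.0, 3.0)}
neighbors = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
selected = {1, 2, 3}  # vector 0 is filtered out by the predicate

print(prefiltered_knn(vectors, selected, (0.0, 0.0), k=2))  # [1, 2]
print(local_selectivity(neighbors, selected, 1))            # 0.5
```

In this toy example the vector closest to the query (id 0) is excluded because it is not in 𝑆, which is the behavior any filtered search must reproduce. An adaptive search could consult a quantity like `local_selectivity` at each visited vector to decide how to explore its neighborhood; the specific heuristics NaviX chooses between are not described in this abstract.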