Master’s Thesis Presentation • Data Systems • Disk-based Indexing for NIR-Trees using Polygon Overlays

Friday, January 19, 2024 4:00 pm - 5:00 pm EST (GMT -05:00)

Please note: This master’s thesis presentation will take place online.

Fadhil Abubaker, Master’s candidate
David R. Cheriton School of Computer Science

Supervisor: Professor Khuzaima Daudjee

This thesis presents the NIR+-Tree, a disk-resident R-Tree variant that eliminates overlap among its MBRs. The NIR+-Tree is an extension of the main-memory NIR-Tree, adopting techniques for efficient storage and retrieval on disk. By employing non-intersecting polygons instead of rectangles for data partitioning, the NIR+-Tree minimizes the number of spurious disk accesses incurred due to MBR overlap. To stabilize the height of the NIR+-Tree, the dynamically-sized polygons are stored in main-memory using an efficient encoding. Experimental results show that the NIR+-Tree is efficient at point queries and selective range queries, using 2× to 5× fewer disk accesses than its closest competitors, the R+-Tree and the R*-Tree.

Additionally, this thesis investigates bulk-loading algorithms for the NIR+-Tree. Bulk-loading can be used to efficiently construct an index from a pre-defined set of data. Bulk-loading algorithms that generate MBRs with significant overlap create NIR+-Trees with undesirable, complex polygons. This thesis shows that top-down bulk-loading algorithms are better suited for the NIR+-Tree than bottom-up algorithms, due to their overlap minimizing properties. These techniques enable the NIR+-Tree to be a complete, disk-based indexing solution for spatial data.