Hicham
El-Zein,
PhD
candidate
David
R.
Cheriton
School
of
Computer
Science
In modern computation the volume of data-sets has increased dramatically. Since the majority of these data-sets are stored in internal memory, reducing their storage requirement is an important research topic. One way of reducing storage is using compression schemes. However, standard compression techniques alone are useless as a tool in the context of handling data in main memory since the whole data has to be entirely decompressed to answer queries about a small fraction of the data. Succinct and compact data structures attempt to deal with this shortcoming by maintaining the data in compressed form with extra data structures over it in a way that allows efficient access and query of the data.
Succinct and compact data structures are data structures that emphasize space efficiency. The goal is to occupy as little space as possible while maintaining an efficient query time. More precisely, succinct data structures are data structures whose size is within the information theoretic lower bound plus a lower order term, while compact data structures are data structures whose size is constant factor from the information theoretic lower bound.
In general, the goal is to dramatically reduce the storage cost for structural information. For example, a suffix tree is a data structure that permits us to find all occurrences of a query string in time linear to the query. This is crucial in many applications including queries about the human genome. In that particular case, the suffix tree, naively implemented, takes about 80 times as much space as the raw data. Succinct methods reduces this to a factor of about 2. Moreover, this enables data structures to be stored in a faster (smaller) level of memory and so permit queries to be answered much more quickly.
In this thesis, we study succinct and compact data structures for various combinatorial objects.