Daniel
Gibney,
Department
of
Computer
Science
University
of
Central
Florida
First introduced in 1994, the Burrows-Wheeler Transform (BWT) went on to provide the backbone for the first encoding of the classic suffix tree data structure in space close to the entropy-based lower bound. Within the last decade, it has seen its role further enhanced with the development of suffix trees in space proportional to “r”, the number of runs in the BWT of the text. Moreover, r is appearing increasingly often in the time complexity of new algorithms, making having the smallest value of r of great importance. Unlike other popular measures of compression, the parameter r is sensitive to the lexicographic ordering given to the text’s alphabet. Despite several past attempts, a provably efficient algorithm for finding an optimal alphabet ordering minimizing r has been an open problem for many years.
We help to explain this lack of progress by presenting the first set of results on the computational complexity of minimizing BWT-runs via alphabet reordering. We prove that the decision version of this problem is NP-complete and cannot be solved in time (2^o(σ))n unless the Exponential Time Hypothesis fails, where σ is the size of the alphabet. Additionally, we provide an efficient algorithm for the more restricted problem of finding an optimal ordering on a subset of symbols (occurring only once) under ordering constraints. The algorithm runs in optimal time for small values of σ. We also look at a version of the problem on the newly discovered class of graphs with BWT like properties called Wheeler graphs. Here also we show NP-hardness results on a related problem which we call Source Ordering.
This is joint work with Jason Bentley and Sharma V. Thankachan.