# Efficient Fully-Compressed Sequence Representations - Computer Science > Data Structures and Algorithms

Efficient Fully-Compressed Sequence Representations - Computer Science > Data Structures and Algorithms - Descarga este documento en PDF. Documentación en PDF para descargar gratis. Disponible también para leer online.

Abstract: We present a data structure that stores a sequence $s1 n$ over alphabet$1 \sigma$ in $n\Hos + on\Hos{+}1$ bits, where $\Hos$ is thezero-order entropy of $s$. This structure supports the queries \access, ank\and \select, which are fundamental building blocks for many other compresseddata structures, in worst-case time $\Oh{\lg\lg\sigma}$ and average time$\Oh{\lg \Hos}$. The worst-case complexity matches the best previous results,yet these had been achieved with data structures using $n\Hos+on\lg\sigma$bits. On highly compressible sequences the $on\lg\sigma$ bits of theredundancy may be significant compared to the the $n\Hos$ bits that encodethe data. Our representation, instead, compresses the redundancy as well.Moreover, our average-case complexity is unprecedented. Our technique is basedon partitioning the alphabet into characters of similar frequency. Thesubsequence corresponding to each group can then be encoded using fastuncompressed representations without harming the overall compression ratios,even in the redundancy. The result also improves upon the best currentcompressed representations of several other data structures. For example, weachieve $i$ compressed redundancy, retaining the best time complexities, forthe smallest existing full-text self-indexes; $ii$ compressed permutations$\pi$ with times for $\pi$ and $\pii$ improved to loglogarithmic; and$iii$ the first compressed representation of dynamic collections of disjointsets. We also point out various applications to inverted indexes, suffixarrays, binary relations, and data compressors.

.

Autor: Jeremy Barbay, Francisco Claude, Travis Gagie, Gonzalo Navarro, Yakov Nekrich

Fuente: https://arxiv.org/