1 SYMBIOSE - Biological systems and models, bioinformatics and sequences, IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
2 LaBRI - Laboratoire Bordelais de Recherche en Informatique
3 HELIX - Computer science and genomics, Inria Grenoble - Rhône-Alpes, LBBE - Laboratoire de Biométrie et Biologie Evolutive

Abstract: We present a data structure for indexing a specific kind of factors (that is, substrings) called gapped-factors. A gapped-factor is a factor containing a gap that is ignored during indexing. The data structure is based on the suffix tree and indexes all the gapped-factors of a text for a fixed gap size, and only those. It is constructed online, in linear time and space. Such a data structure may play an important role in various pattern matching and motif inference problems, for instance in text filtration.
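To make the notion of a gapped-factor concrete, here is a minimal sketch in Python. It is not the suffix-tree-based structure described in the abstract: it is a naive dictionary index, and it does not achieve the online linear-time construction claimed for the real structure. The function name `gapped_factor_index` and the parameters `k`, `d` and `k2` (length of the part before the gap, gap size, length of the part after the gap) are assumptions introduced for illustration only; the abstract states only that the gap size is fixed.

```python
from collections import defaultdict


def gapped_factor_index(text, k, d, k2):
    """Naively index gapped-factors of `text` (illustrative assumption).

    A gapped-factor is modelled here as a pair (left, right): `left` has
    length k, `right` has length k2, and the d characters between them
    are the gap, which is ignored by the index.
    Returns a dict mapping each pair to its list of start positions.
    """
    index = defaultdict(list)
    span = k + d + k2  # total window covered by one gapped-factor
    for i in range(len(text) - span + 1):
        left = text[i:i + k]
        right = text[i + k + d:i + span]  # skip the d gap characters
        index[(left, right)].append(i)
    return dict(index)


# Hypothetical usage: gap of size 1 between two factors of length 2.
index = gapped_factor_index("abcabcab", k=2, d=1, k2=2)
print(index[("ab", "ab")])
```

In this example the pair ("ab", "ab") occurs at two positions of the text, whatever character fills the gap. Under the same assumptions, a tree-based index would answer such queries without storing every pair explicitly.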

Keywords: suffix tree; k-factor tree; string index; gapped-factor; gapped-factor tree

Authors: Pierre Peterlongo, Julien Allali, Marie-France Sagot



