BMC Bioinformatics, 9:466

First Online: 03 November 2008; Received: 12 March 2008; Accepted: 03 November 2008


Background

Identification of approximate tandem repeats is an important task of broad significance and remains a challenging problem in computational genomics. Often there is no single best approach to periodicity detection, and a combination of different methods may improve prediction accuracy. The discrete Fourier transform (DFT) has been used extensively to study primary periodicities in DNA sequences. Here we investigate the application of the DFT method to identify and study alphoid higher order repeats.

Results

We used a method based on the DFT, with a mapping of the symbolic sequence into a numerical sequence, to identify and study alphoid higher order repeats (HORs). For HORs, the power spectrum shows an equidistant frequency pattern with a characteristic two-level hierarchical organization as the signature of a HOR. Our case study was the 16-mer HOR tandem in AC017075.8 from human chromosome 7. A very long array of equidistant peaks at multiple frequencies (more than a thousand higher harmonics) is based on the fundamental frequency of the 16-mer HOR. A pronounced subset of equidistant peaks is based on multiples of the fundamental HOR frequency (multiplication factor n for an n-mer) and their higher harmonics. In general, an n-mer HOR pattern contains equidistant secondary periodicity peaks, with a pronounced subset of equidistant primary periodicity peaks. This hierarchical pattern, as a signature for HOR detection, is robust with respect to monomer insertions and deletions, random sequence insertions, etc. For a monomeric alphoid sequence, only primary periodicity peaks are present. The 1/f noise and the periodicity-three pattern are missing from the power spectra in alphoid regions, in accordance with expectations.
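The two-level pattern described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which symbolic-to-numerical mapping was used, so the sketch assumes a binary indicator mapping (one 0/1 sequence per base), a naive DFT instead of an FFT, and a small synthetic HOR array built from four diverged copies of a random 12-bp monomer. All names and parameter values (`power_spectrum`, `diverge`, `monomer_len`, `copies`, and so on) are hypothetical.

```python
import cmath
import random


def power_spectrum(seq):
    """Sum of squared DFT magnitudes over the four base-indicator sequences.

    Assumption: binary indicator mapping, one 0/1 sequence per base.
    Index k corresponds to frequency k / len(seq); k = 0 is left at zero
    because it only reflects base composition.
    """
    n = len(seq)
    half = n // 2
    spectrum = [0.0] * (half + 1)
    for base in "ACGT":
        positions = [pos for pos, ch in enumerate(seq) if ch == base]
        for k in range(1, half + 1):
            amplitude = sum(cmath.exp(-2j * cmath.pi * k * pos / n) for pos in positions)
            spectrum[k] += abs(amplitude) ** 2
    return spectrum


def diverge(seq, rate, rng):
    """Return a copy of seq in which each base is redrawn at random with probability rate."""
    return "".join(rng.choice("ACGT") if rng.random() < rate else ch for ch in seq)


# Synthetic, idealized HOR array (hypothetical sizes, far smaller than real alphoid DNA).
rng = random.Random(0)
monomer_len, n_monomers, copies = 12, 4, 6
ancestor = "".join(rng.choice("ACGT") for _ in range(monomer_len))
hor_unit = "".join(diverge(ancestor, 0.2, rng) for _ in range(n_monomers))
sequence = hor_unit * copies

spectrum = power_spectrum(sequence)
hor_index = copies                   # N / HOR length: fundamental HOR frequency
monomer_index = copies * n_monomers  # N / monomer length: first primary peak

# Peaks can only occur at multiples of the fundamental HOR frequency.
for k in range(hor_index, len(spectrum), hor_index):
    label = "primary" if k % monomer_index == 0 else "secondary"
    print(k, label, round(spectrum[k], 1))
```

Because the synthetic array repeats the HOR unit exactly, all power falls on multiples of `hor_index`; every `n_monomers`-th of those multiples is a primary (monomer) periodicity peak. Real alphoid arrays are only approximately periodic, so the peaks there sit on a nonzero background.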

Conclusion

The DFT provides a robust detection method for higher order periodicity. The easily recognizable HOR power spectrum is characterized by a hierarchical two-level equidistant pattern: higher harmonics of the fundamental HOR frequency (secondary periodicity) and a subset of pronounced peaks corresponding to the constituent monomers (primary periodicity). The number of lower-frequency peaks (secondary periodicity) below the frequency of the first primary periodicity peak reveals the size of the n-mer HOR, i.e., the number n of monomers contained in the consensus HOR.
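The counting rule in the last sentence can be made concrete with a short Python sketch. It rests on one reading of the abstract: if every harmonic of the fundamental HOR frequency is present, the peaks below the first primary peak are harmonics 1 to n-1, so n is that count plus one. The function name `infer_hor_size`, the idealized peak list, and the 171-bp monomer length (a typical alpha satellite value, not a figure taken from this paper) are assumptions for illustration.

```python
from fractions import Fraction


def infer_hor_size(peak_frequencies, first_primary_frequency):
    """Estimate n for an n-mer HOR from spectral peak positions.

    Assumption: every harmonic of the fundamental HOR frequency shows up
    as a peak, so the peaks strictly below the first primary peak are
    harmonics 1 .. n-1 and n is their count plus one.
    """
    secondary_below = [f for f in peak_frequencies if 0 < f < first_primary_frequency]
    return len(secondary_below) + 1


# Idealized example: a 16-mer HOR built from 171-bp monomers (hypothetical sizes).
monomer_len = 171
n_true = 16
hor_len = n_true * monomer_len

# Peaks at the first 48 harmonics of the fundamental HOR frequency 1 / hor_len.
peaks = [Fraction(k, hor_len) for k in range(1, 3 * n_true + 1)]
first_primary = Fraction(1, monomer_len)  # equals n_true / hor_len

n_estimated = infer_hor_size(peaks, first_primary)
print(n_estimated)
```

Exact fractions keep the comparison with the primary frequency free of rounding issues; with measured spectra, peak positions would have to be matched within a tolerance instead.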

Abbreviations

HOR: Higher Order Repeat

KSA: Key String Algorithm

DFT: Discrete Fourier Transform

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-9-466) contains supplementary material, which is available to authorized users.

Authors: Vladimir Paar, Nenad Pavin, Ivan Basar, Marija Rosandić, Matko Glunčić, Nils Paar

Source: https://link.springer.com/


