Hybrid de novo tandem repeat detection using short and long reads


BMC Medical Genomics, 8:S5

First Online: 23 September 2015


Background: As one of the most studied genome rearrangements, tandem repeats have a considerable impact on the genetic background of inherited diseases. Many methods designed for tandem repeat detection on reference sequences obtain high-quality results. However, in a de novo context, where no reference sequence is available, tandem repeat detection remains a difficult problem. Short reads obtained with second-generation sequencing methods are not long enough to span regions that contain long repeats. This length limitation was addressed by the long reads obtained with third-generation sequencing platforms such as Pacific Biosciences technologies. Nevertheless, the gain in read length came with a significant increase in the error rate. The main objective of current studies on long reads is to handle this high error rate, which reaches up to 16%.

Methods: In this paper we present MixTaR, the first de novo method for tandem repeat detection that combines the high quality of short reads with the greater length of long reads. Our hybrid algorithm uses the set of short reads to detect tandem repeat patterns based on a de Bruijn graph. These patterns are then validated using the long reads, and the tandem repeat sequences are constructed using local greedy assemblies.
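The first two steps described above can be illustrated with a minimal Python sketch. It is not the MixTaR implementation: the function names (`build_de_bruijn`, `candidate_patterns`, `supported_by_long_reads`) are hypothetical, the pattern search is a plain bounded depth-first search for cycles, and validation uses exact substring matching, which assumes error-free long reads. Real long reads would need approximate matching, and the local greedy assembly step is not shown.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges come from k-mers."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def canonical_rotation(s):
    """Return the lexicographically smallest rotation, so that the same
    cycle found from different start nodes maps to a single pattern."""
    return min(s[i:] + s[:i] for i in range(len(s)))


def candidate_patterns(graph, max_len):
    """Collect candidate repeat patterns as simple cycles of at most
    max_len edges (illustrative search, not the authors' algorithm)."""
    patterns = set()

    def dfs(start, node, path_chars, visited):
        for nxt in graph.get(node, ()):
            if nxt == start:
                # Closing the cycle: each edge contributes one character.
                patterns.add(canonical_rotation(path_chars + nxt[-1]))
            elif nxt not in visited and len(path_chars) + 1 < max_len:
                visited.add(nxt)
                dfs(start, nxt, path_chars + nxt[-1], visited)
                visited.discard(nxt)

    for start in list(graph):
        dfs(start, start, "", {start})
    return patterns


def supported_by_long_reads(pattern, long_reads, min_copies=2):
    """Assumption: error-free long reads, so exact matching is enough.
    A pattern is kept if some rotation occurs min_copies times in a row."""
    rotations = {pattern[i:] + pattern[:i] for i in range(len(pattern))}
    return any(rot * min_copies in read
               for read in long_reads for rot in rotations)


if __name__ == "__main__":
    # Toy data: a short flank, six copies of "ACG", and another flank.
    genome = "TTGCAT" + "ACG" * 6 + "GTTCAA"
    short_reads = [genome[i:i + 10] for i in range(len(genome) - 9)]
    graph = build_de_bruijn(short_reads, k=4)
    for pattern in candidate_patterns(graph, max_len=10):
        print(pattern, supported_by_long_reads(pattern, [genome]))
```

On this toy input the only cycle in the graph corresponds to the pattern `ACG`, which the single error-free "long read" supports. Bounding the cycle length keeps the search tractable, at the cost of missing patterns longer than `max_len`.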

Results: MixTaR is tested with both simulated and real reads from complex organisms. For a complete analysis of its robustness to errors, we use short and long reads with different error rates. The results are then analysed in terms of the number of tandem repeats detected and the length of their patterns.

Conclusions: Our method shows high precision and sensitivity. With low false positive rates even for highly erroneous reads, MixTaR is able to detect accurate tandem repeats with pattern lengths spanning a wide interval.

Keywords: tandem repeat; second-generation sequencing; third-generation sequencing; de Bruijn graph

List of abbreviations

SGS: Second Generation Sequencing

TGS: Third Generation Sequencing

TR: Tandem repeat

ETR: Exact tandem repeat

ATR: Approximate tandem repeat

SR: set of Short Reads obtained with an SGS technology

LR: set of Long Reads obtained with Pacific Biosciences sequencing technology

Electronic supplementary material: The online version of this article (doi:10.1186/1755-8794-8-S3-S5) contains supplementary material, which is available to authorized users.


Authors: Guillaume Fertin, Géraldine Jean, Andreea Radulescu, Irena Rusu

Source: https://link.springer.com/

