WhatsHap: Weighted Haplotype Assembly for Future-Generation Sequencing Reads


1 LBBE - Laboratoire de Biométrie et Biologie Evolutive
2 Center for Bioinformatics Saarbrücken
3 MPII - Max Planck Institut für Informatik
4 ERABLE - Equipe de recherche européenne en algorithmique et biologie formelle et expérimentale, Inria Grenoble - Rhône-Alpes
5 DI - Dipartimento di Informatica, Pisa
6 Department of Mathematics and Statistics, Christchurch
7 Department of mathematics and computing science, Eindhoven
8 MAC4 - Life Sciences, Amsterdam

Abstract: The human genome is diploid, which requires assigning heterozygous single nucleotide polymorphisms (SNPs) to the two copies of the genome. The resulting haplotypes, lists of SNPs belonging to each copy, are crucial for downstream analyses in population genetics. Currently, statistical approaches, which are oblivious to direct read information, constitute the state of the art. Haplotype assembly, which addresses phasing directly from sequencing reads, suffers from the fact that sequencing reads of the current generation are too short to serve the purposes of genome-wide phasing. While future-technology sequencing reads will contain enough SNPs per read for phasing, they are also likely to suffer from higher sequencing error rates. Currently, no haplotype assembly approaches exist that take both increasing read length and sequencing error information into account. Here, we suggest WhatsHap, the first approach that yields provably optimal solutions to the weighted minimum error correction problem in runtime linear in the number of SNPs. WhatsHap is a fixed-parameter tractable (FPT) approach with coverage as the parameter. We demonstrate that WhatsHap can handle datasets of coverage up to 20×, and that 15× is generally enough for reliably phasing long reads, even at significantly elevated sequencing error rates. We also find that the switch and flip error rates of the haplotypes we output compare favorably with those of state-of-the-art statistical phasers.
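To make the weighted minimum error correction objective named in the abstract concrete, here is a minimal Python sketch. It is not the WhatsHap algorithm: it simply enumerates every bipartition of the reads, so it is exponential in the number of reads, whereas the abstract describes a method that is exponential only in coverage and linear in the number of SNPs. The function names, the string encoding of reads ("0", "1", and "-" for an uncovered SNP) and the integer weights are assumptions made for illustration. The sketch also lets each side of the partition pick its consensus allele independently per SNP, which is one common formulation of the problem and is assumed here.

```python
from itertools import product

def column_cost(reads, weights, group, col):
    """Cheapest total flip weight that makes all reads in `group` agree at `col`."""
    cost = {"0": 0, "1": 0}
    for r in group:
        allele = reads[r][col]
        if allele != "-":  # "-" means the read does not cover this SNP
            cost[allele] += weights[r][col]
    # Either flip every 0 entry or every 1 entry; take the cheaper option.
    return min(cost["0"], cost["1"])

def wmec_brute_force(reads, weights):
    """Return (minimum cost, read-to-haplotype assignment) by exhaustive search."""
    n, m = len(reads), len(reads[0])
    best_cost, best_assignment = None, None
    # Pin read 0 to side 0 so mirror-image partitions are not tried twice.
    for rest in product((0, 1), repeat=n - 1):
        assignment = (0,) + rest
        groups = [[r for r in range(n) if assignment[r] == side] for side in (0, 1)]
        total = sum(column_cost(reads, weights, g, c)
                    for g in groups for c in range(m))
        if best_cost is None or total < best_cost:
            best_cost, best_assignment = total, assignment
    return best_cost, best_assignment

if __name__ == "__main__":
    # Toy instance (made up): five reads over three SNPs, unit weights.
    reads = ["010", "01-", "101", "-01", "011"]
    weights = [[1, 1, 1] for _ in reads]
    print(wmec_brute_force(reads, weights))
```

In the toy instance, the last read disagrees with the first read at the third SNP, so at least one entry has to be corrected. In practice the weights would come from sequencing error information such as base qualities, which makes corrections at low-confidence positions cheaper than corrections at high-confidence ones.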

Keywords: dynamic programming algorithms, combinatorial optimization, haplotypes, next generation sequencing

Authors: Murray Patterson, Tobias Marschall, Nadia Pisanti, Leo Van Iersel, Leen Stougie, Gunnar W Klau, Alexander Schönhuth

Source: https://hal.archives-ouvertes.fr/

