Efficient and accurate whole genome assembly and methylome profiling of E. coli


BMC Genomics, 14:675

Prokaryote microbial genomics


Background

With the price of next-generation sequencing steadily decreasing, bacterial genome assembly is now accessible to a wide range of researchers. It is therefore necessary to understand the best methods for generating a genome assembly, specifically, which combination of sequencing and bioinformatics strategies results in the most accurate assemblies. Here, we sequence three E. coli strains on the Illumina MiSeq, Life Technologies Ion Torrent PGM, and Pacific Biosciences RS. We then perform genome assemblies on all three datasets, alone or in combination, to determine the best methods for the assembly of bacterial genomes.

Results

Three E. coli strains (BL21(DE3), Bal225, and DH5α) were sequenced to a depth of 100× on the MiSeq and Ion Torrent machines and to at least 125× on the PacBio RS. Four assembly methods were examined and compared. The previously published BL21(DE3) genome (GenBank: AM946981.2) allowed us to evaluate the accuracy of each of the BL21(DE3) assemblies. BL21(DE3) PacBio-only assemblies resulted in a 90% reduction in contigs versus short-read-only assemblies, while N50 values increased by over 7-fold. Strikingly, the number of SNPs in PacBio-only assemblies was less than half that seen with short-read assemblies (~20 SNPs vs. ~50 SNPs), and indels also saw dramatic reductions (~2 indels >5 bp in PacBio-only assemblies vs. ~12 for short-read-only assemblies). Assemblies that used a mixture of PacBio and short-read data generally fell in between these two extremes. Use of PacBio sequencing reads also allowed us to call covalent base modifications for the three strains. Each of the strains used here had a known covalent base modification genotype, which was confirmed by PacBio sequencing.
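For readers unfamiliar with the N50 statistic cited above, the following is a minimal sketch of how it is commonly computed: the length of the contig at which the running sum of contig lengths, taken from longest to shortest, first reaches half of the total assembly length. The function name `n50` and the contig lengths are illustrative assumptions, not values or code from the study.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembled bases.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating length.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Stop at the first contig that brings us to half the total.
        if 2 * running >= total:
            return length


# Toy, made-up contig lengths (arbitrary units), NOT data from the paper.
fragmented = [80, 70, 50, 40, 30, 20]  # many small contigs
contiguous = [250, 40]                 # same total, fewer contigs

print(n50(fragmented))
print(n50(contiguous))
```

With the same total length, the assembly made of fewer, longer contigs yields the larger N50, which is why a drop in contig count and a rise in N50 tend to go together.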

Conclusion

Using data generated solely from the Pacific Biosciences RS, we were able to generate the most complete and accurate de novo assemblies of E. coli strains. We found that the addition of other sequencing technology data offered no improvements over use of PacBio data alone. In addition, the sequencing data from the PacBio RS allowed for sensitive and specific calling of covalent base modifications.

Keywords

Genome assembly, Illumina MiSeq, Ion Torrent PGM, PacBio RS, Base modifications, E. coli, Hybrid assembly, 5mC

Abbreviations

PacBio: Pacific Biosciences

RSAHA, A Hybrid Scaffolder

DAM: DNA adenine methylase

DCM: DNA cytosine methylase

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-14-675) contains supplementary material, which is available to authorized users.


Authors: Jason G Powers, Victor J Weigman, Jenny Shu, John M Pufky, Donald Cox, Patrick Hurban

Source: https://link.springer.com/
