Removal of redundant contigs from de novo RNA-Seq assemblies via homology search improves accurate detection of differentially expressed genes





BMC Genomics, 16:1031

Transcriptomic methods

Abstract

Background: For plant species with unsequenced genomes, cDNA contigs created by de novo assembly of RNA-Seq reads are used as reference sequences for comparative analysis of RNA-Seq datasets and for the detection of differentially expressed genes (DEGs). Redundancy in such contigs is evident in previous RNA-Seq studies and can complicate subsequent analysis. Nevertheless, the effect of removing redundancy from contig assemblies on comparative RNA-Seq analysis has not been evaluated.

Results: Here we describe a method for removing redundancy from raw contigs that were primarily created by de novo assembly of Arabidopsis thaliana RNA-Seq reads. Specifically, the contigs with the highest bit scores were selected from the raw contigs by a homology search against the gene dataset in the TAIR10 database. Two existing methods for removing redundancy, based on contig length or on clustering analysis, were also used to eliminate redundancies from the raw contigs. Contig number was reduced most effectively by the method based on homology search. In a comparative analysis of RNA-Seq datasets, DEGs detected using contigs that underwent redundancy removal via the homology search method showed the highest identity to the DEGs detected when the TAIR10 gene dataset was used as an exact reference. Redundancy in raw contigs could also be removed by a homology search against integrated protein datasets from several plant species other than A. thaliana. DEGs detected using contigs that underwent such redundancy removal also showed high homology to DEGs detected using the TAIR10 gene dataset.
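The selection step described above can be sketched in a few lines of Python. This is only an illustration, not the authors' pipeline. It assumes the homology search results are available as a BLAST-style 12-column tabular file (`-outfmt 6`), and it assumes a two-step reading of the method: each contig is first assigned to its best-scoring reference gene, and then only the highest-scoring contig is kept per gene. The function names, contig names, gene names, and scores are all hypothetical.

```python
import csv
import io

# Default column order of BLAST tabular output (-outfmt 6).
BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def parse_blast6(handle):
    """Read tab-separated hits into a list of dicts keyed by column name."""
    reader = csv.DictReader(handle, fieldnames=BLAST6_COLUMNS, delimiter="\t")
    return list(reader)


def select_best_contigs(rows):
    """Return {reference gene: (contig, bit score)} with one contig per gene.

    Assumed reading of the method (not confirmed by the abstract):
    1. assign every contig to the reference gene it hits with the highest bit score;
    2. among contigs assigned to the same gene, keep the one with the highest bit score.
    """
    best_hit = {}  # contig -> (gene, bit score)
    for row in rows:
        contig, gene, score = row["qseqid"], row["sseqid"], float(row["bitscore"])
        if contig not in best_hit or score > best_hit[contig][1]:
            best_hit[contig] = (gene, score)

    best_contig = {}  # gene -> (contig, bit score)
    for contig, (gene, score) in best_hit.items():
        if gene not in best_contig or score > best_contig[gene][1]:
            best_contig[gene] = (contig, score)
    return best_contig


# Made-up toy hits: contig_1 and contig_2 are redundant copies of gene_A.
EXAMPLE = (
    "contig_1\tgene_A\t99.0\t900\t9\t0\t1\t900\t1\t900\t0.0\t1600\n"
    "contig_2\tgene_A\t98.0\t400\t8\t0\t1\t400\t1\t400\t1e-150\t700\n"
    "contig_3\tgene_B\t97.5\t600\t15\t0\t1\t600\t1\t600\t0.0\t1000\n"
    "contig_3\tgene_A\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-10\t60\n"
)

if __name__ == "__main__":
    hits = parse_blast6(io.StringIO(EXAMPLE))
    for gene, (contig, score) in sorted(select_best_contigs(hits).items()):
        print(gene, contig, score)
```

In this toy input, `contig_2` is dropped because `contig_1` matches the same reference gene with a higher bit score. Contigs with no hit at all never appear in the search output, so they are absent from the result as well. Whether such contigs should be discarded is a design choice that the abstract does not settle.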

Conclusion: Here we describe a method for removing redundant contigs from raw contigs; this method involves a homology search against a gene or protein database. In principle, this method can be used with unsequenced plant genomes that lack a well-developed gene database. Redundant contigs were not removed adequately by either of the two existing methods, but our method allowed for removal of all redundant contigs. To our knowledge, this is the first reported improvement in accurate detection of DEGs via comparative RNA-Seq analysis that involved preparation of a non-redundant reference sequence. This method could be used to rapidly and cost-effectively detect useful genes in unsequenced plants.

Keywords: Plant breeding; Unsequenced plant genomes; de novo assembly; RNA-Seq; Redundant contigs; Genome editing; Genetic modification; Transcriptome; Trinity

Abbreviations

agriGO: a GO analysis toolkit for the agricultural community

DEGs: Differentially expressed genes

TAIR: The Arabidopsis Information Resource




Authors: Hanako Ono, Kazuo Ishii, Toshinori Kozaki, Isao Ogiwara, Motoki Kanekatsu, Tetsuya Yamada

Source: https://link.springer.com/






