Predicting RNA-Protein Interactions Using Only Sequence Information





BMC Bioinformatics, 12:489

Sequence analysis applications

Abstract

Background: RNA-protein interactions (RPIs) play important roles in a wide variety of cellular processes, ranging from transcriptional and post-transcriptional regulation of gene expression to host defense against pathogens. High-throughput experiments to identify RNA-protein interactions are beginning to provide valuable information about the complexity of RNA-protein interaction networks, but they are expensive and time-consuming. Hence, there is a need for reliable computational methods for predicting RNA-protein interactions.

Results: We propose RPISeq, a family of classifiers for predicting RNA-protein interactions using only sequence information. Given the sequences of an RNA and a protein as input, RPISeq predicts whether or not the RNA-protein pair interacts. The RNA sequence is encoded as a normalized vector of its ribonucleotide 4-mer composition, and the protein sequence is encoded as a normalized vector of its 3-mer composition, based on a 7-letter reduced alphabet representation. Two variants of RPISeq are presented: RPISeq-SVM, which uses a Support Vector Machine (SVM) classifier, and RPISeq-RF, which uses a Random Forest classifier. On two non-redundant benchmark datasets extracted from the Protein-RNA Interface Database (PRIDB), RPISeq achieved an AUC (Area Under the Receiver Operating Characteristic (ROC) curve) of 0.96 and 0.92. On a third dataset containing only mRNA-protein interactions, the performance of RPISeq was competitive with that of a published method that requires information regarding many different features (e.g., mRNA half-life, GO annotations) of the putative RNA and protein partners. In addition, RPISeq classifiers trained using the PRIDB data correctly predicted the majority (57-99%) of non-coding RNA-protein interactions in NPInter-derived networks from E. coli, S. cerevisiae, D. melanogaster, M. musculus, and H. sapiens.
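The sequence encoding described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions the abstract does not state: the particular 7-group amino acid partition (`REDUCED_GROUPS`), normalization by total k-mer count, concatenation of the RNA and protein vectors into one feature vector, and all function names, which are hypothetical.

```python
from collections import Counter
from itertools import product

RNA_ALPHABET = "ACGU"

# ASSUMPTION: the abstract only says "7-letter reduced alphabet".
# This particular grouping of the 20 amino acids is illustrative.
REDUCED_GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_GROUP = {aa: str(i) for i, group in enumerate(REDUCED_GROUPS) for aa in group}
REDUCED_ALPHABET = "0123456"


def kmer_vector(seq, alphabet, k):
    """Return the k-mer composition of seq as frequencies over all |alphabet|^k k-mers."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only k-mers made entirely of alphabet symbols are counted.
    total = sum(counts[km] for km in kmers)
    # ASSUMPTION: "normalized" is taken to mean dividing by the total count.
    return [counts[km] / total if total else 0.0 for km in kmers]


def encode_rna(seq):
    """4-mer composition of an RNA sequence (4^4 = 256 features)."""
    return kmer_vector(seq.upper().replace("T", "U"), RNA_ALPHABET, 4)


def encode_protein(seq):
    """3-mer composition over the reduced alphabet (7^3 = 343 features)."""
    # Residues outside the 20 standard amino acids map to "X" and are ignored.
    reduced = "".join(AA_TO_GROUP.get(aa, "X") for aa in seq.upper())
    return kmer_vector(reduced, REDUCED_ALPHABET, 3)


def encode_pair(rna_seq, protein_seq):
    """ASSUMPTION: the pair is represented by concatenating both vectors (599 features)."""
    return encode_rna(rna_seq) + encode_protein(protein_seq)
```

A vector such as `encode_pair("ACGUACGU", "MKVLAAGE")` could then be passed to any standard SVM or Random Forest implementation, together with an interaction label, to train a classifier of the kind described.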

Conclusions: Our experiments with RPISeq demonstrate that RNA-protein interactions can be reliably predicted using only sequence-derived information. RPISeq offers an inexpensive method for computational construction of RNA-protein interaction networks, and should provide useful insights into the function of non-coding RNAs. RPISeq is freely available as a web-based server at http://pridb.gdcb.iastate.edu/RPISeq/.

Electronic supplementary material: The online version of this article (doi:10.1186/1471-2105-12-489) contains supplementary material, which is available to authorized users.




Authors: Usha K Muppirala, Vasant G Honavar, Drena Dobbs

Source: https://link.springer.com/






