Identification of properties important to protein aggregation using feature selection

BMC Bioinformatics, 14:314

Sequence analysis applications


Background

Protein aggregation is a significant problem in the biopharmaceutical industry (protein drug stability) and is associated medically with over 40 human diseases. Although a number of computational models have been developed for predicting aggregation propensity and identifying aggregation-prone regions in proteins, little systematic research has been done to determine the physicochemical properties relevant to aggregation and their relative importance to this process. Such studies may not only improve the prediction of peptide aggregation propensities and the identification of aggregation-prone regions in proteins, but also aid in discovering additional underlying mechanisms governing this process.

Results

We use two feature selection algorithms to identify 16 features, out of a total of 560 physicochemical properties, that are presumably important to protein aggregation. Two predictors, ProA-SVM and ProA-RF, are built from the selected features for predicting peptide aggregation propensity and identifying aggregation-prone regions in proteins. Both methods compare favourably with other state-of-the-art algorithms in cross-validation. The identified important properties are fairly consistent with previous studies and bring some new insights into protein and peptide aggregation. One interesting new finding is that aggregation-prone peptide sequences have properties similar to those of signal peptide and signal anchor sequences.
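The recursive feature elimination idea behind one of the two selection algorithms (SVM-RFE) can be illustrated with a minimal sketch. This is not the authors' implementation: a small logistic regression trained by gradient descent stands in for the linear SVM, the data are synthetic (6 toy features instead of 560 physicochemical properties), and all function names and parameters are hypothetical. The loop refits the model, ranks features by squared weight, and drops the weakest one until the requested number remains.

```python
import math
import random


def fit_logistic(X, y, features, epochs=100, lr=0.2):
    """Fit logistic regression by batch gradient descent using only `features`.

    Stand-in for the linear SVM of SVM-RFE (an assumption of this sketch).
    Returns a dict mapping feature index -> learned weight.
    """
    w = {j: 0.0 for j in features}
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = {j: 0.0 for j in features}
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = b + sum(w[j] * xi[j] for j in features)
            z = max(-30.0, min(30.0, z))  # keep exp() in a safe range
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j in features:
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in features:
            w[j] -= lr * grad_w[j] / n
        b -= lr * grad_b / n
    return w


def recursive_feature_elimination(X, y, n_keep):
    """Refit repeatedly, each time dropping the feature with the smallest w^2."""
    features = list(range(len(X[0])))
    while len(features) > n_keep:
        w = fit_logistic(X, y, features)
        weakest = min(features, key=lambda j: w[j] ** 2)
        features.remove(weakest)
    return features


def make_toy_data(n_samples=100, n_features=6, informative=(0, 3), seed=0):
    """Synthetic data: only the `informative` columns depend on the label."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] = (2.0 if label else -2.0) + rng.gauss(0.0, 0.5)
        X.append(row)
        y.append(label)
    return X, y


X, y = make_toy_data()
selected = recursive_feature_elimination(X, y, n_keep=2)
print(sorted(selected))
```

Eliminating one feature per round is the simplest variant. With hundreds of candidate properties, removing several features per round would reduce the number of refits at the cost of a coarser ranking.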

Conclusions

Both predictors are implemented in a freely available web application. We suggest that the quaternary structure of protein aggregates, especially soluble oligomers, may allow the formation of new molecular recognition signals that guide aggregate targeting to specific cellular sites.

Keywords

Aggregation, Amyloid, Peptide, Prediction, Feature selection, Machine learning

Abbreviations

ProA-SVM: Protein Aggregation SVM Predictor

ProA-RF: Protein Aggregation RF Predictor

SVM: Support Vector Machine

RF: Random Forest

SVM-RFE: SVM-based recursive feature elimination

RF-IS: Random Forest importance spectrum based feature selection

GBM: Generalized Boosted Model

RPART: Recursive Partitioning And Regression Tree

NNet: Neural Network

PLS: Partial Least Squares

KNN: K-Nearest Neighbour

NB: Naive Bayes

TP: The number of true positive samples

FN: The number of false negative samples

FP: The number of false positive samples

TN: The number of true negative samples

MCC: Matthews correlation coefficient
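As a brief illustration of how the four counts TP, FN, FP and TN listed above combine into the Matthews correlation coefficient, the sketch below uses the standard definition, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The function name and the example counts are hypothetical and are not taken from the paper's results. Returning 0 when the denominator vanishes is a common convention, assumed here.

```python
import math


def matthews_corrcoef(tp, fn, fp, tn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Undefined when a whole row or column of the matrix is empty;
        # returning 0.0 is a convention, not part of the definition.
        return 0.0
    return numerator / denominator


# Hypothetical counts, for illustration only.
example_mcc = matthews_corrcoef(tp=40, fn=10, fp=5, tn=45)
print(round(example_mcc, 3))
```

The value ranges from -1 (every prediction wrong) through 0 (no better than chance) to +1 (every prediction correct), which makes it more informative than plain accuracy when the two classes are unbalanced.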

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-14-314) contains supplementary material, which is available to authorized users.

Yaping Fang and Shan Gao contributed equally to this work.

Authors: Yaping Fang, Shan Gao, David Tai, C Russell Middaugh, Jianwen Fang

