iPPBS-Opt: A Sequence-Based Ensemble Classifier for Identifying Protein-Protein Binding Sites by Optimizing Imbalanced Training Datasets





1 Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333403, China

2 Gordon Life Science Institute, Boston, MA 02478, USA

3 Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah 21589, Saudi Arabia





* Authors to whom correspondence should be addressed.



Academic Editor: Derek J. McPhee

Abstract: Knowledge of protein-protein interactions and their binding sites is indispensable for an in-depth understanding of the networks in living cells. With the avalanche of protein sequences generated in the postgenomic age, it is critical to develop computational methods that can identify protein-protein binding sites (PPBSs) in a timely fashion from sequence information alone, because the information obtained in this way can be used for both biomedical research and drug development. To address this challenge, we have proposed a new predictor, called iPPBS-Opt, in which we have used: (1) the K-Nearest Neighbors Cleaning (KNNC) and Inserting Hypothetical Training Samples (IHTS) treatments to optimize the training dataset; (2) the ensemble voting approach to select the most relevant features; and (3) the stationary wavelet transform to formulate the statistical samples. Cross-validation tests targeting the experiment-confirmed results have demonstrated that the new predictor is very promising, implying that the aforementioned practices are indeed very effective. In particular, the approach of using wavelets to express protein-peptide sequences might be the key to grasping the problem's essence, fully consistent with the findings that many important biological functions of proteins can be elucidated with their low-frequency internal motions. For the convenience of most experimental scientists, we have provided a step-by-step guide on how to use the predictor's web server (http://www.jci-bioinfo.cn/iPPBS-Opt) to get the desired results without the need to go through the complicated mathematical equations involved.
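The abstract names the KNNC and IHTS treatments but does not spell out their rules, so the following is only a minimal Python sketch of one plausible reading, not the authors' implementation. It assumes that KNNC drops a majority-class (non-binding) sample whenever any of its K nearest neighbors belongs to the minority (binding) class, and that IHTS adds synthetic minority samples by interpolating between two existing ones. The function names `knn_clean` and `insert_hypothetical`, the toy 2-D points, and the parameter values are all hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_clean(majority, minority, k=3):
    """Assumed KNNC rule: keep a majority sample only if all of its
    k nearest neighbours (excluding itself) are majority samples."""
    labelled = [(p, 0) for p in majority] + [(p, 1) for p in minority]
    kept = []
    for i, p in enumerate(majority):
        # Distances to every other sample, paired with that sample's label.
        others = [
            (euclidean(p, q), label)
            for j, (q, label) in enumerate(labelled)
            if j != i
        ]
        others.sort(key=lambda pair: pair[0])
        if all(label == 0 for _, label in others[:k]):
            kept.append(p)
    return kept


def insert_hypothetical(minority, n_new, rng):
    """Assumed IHTS rule: create n_new synthetic minority samples, each a
    random point on the segment between two distinct minority samples."""
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)
        t = rng.random()
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic


# Toy data: five non-binding (majority) points, two binding (minority) points.
majority = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
minority = [(5.0, 6.0), (6.0, 5.0)]

cleaned = knn_clean(majority, minority, k=3)
synthetic = insert_hypothetical(minority, n_new=2, rng=random.Random(0))
balanced_minority = minority + synthetic
```

In this toy case the majority point (5.0, 5.0) sits next to the minority samples and is removed, while the two interpolated points bring the minority class up to the size of the cleaned majority class. A real pipeline would apply the same idea to the wavelet-derived feature vectors rather than to 2-D points.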

Keywords: protein-protein binding sites; physicochemical property; stationary wavelet transform; PseAAC; optimize training dataset; KNNC; IHTS; target cross-validation





Authors: Jianhua Jia 1,2,*, Zi Liu 1, Xuan Xiao 1,2,*, Bingxiang Liu 1 and Kuo-Chen Chou 2,3

Source: http://mdpi.com/






