Analysis and prediction of single-stranded and double-stranded DNA binding proteins based on protein sequences


BMC Bioinformatics, 18:300

Sequence analysis methods


Background

DNA-binding proteins perform important functions in many biological activities. They can interact with single-stranded DNA (ssDNA) or double-stranded DNA (dsDNA), and can accordingly be categorized as single-stranded DNA-binding proteins (SSBs) or double-stranded DNA-binding proteins (DSBs). Identifying DNA-binding proteins from amino acid sequences can help to annotate protein functions and to understand binding specificity.

In this study, we systematically consider a variety of schemes to represent protein sequences: overall amino acid composition (OAAC) features, dipeptide compositions, position-specific scoring matrix (PSSM) profiles and split amino acid (SAA) composition. We then adopt support vector machine (SVM) and random forest (RF) classification models to distinguish SSBs from DSBs.
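To make the two simplest representations concrete, the following is a minimal sketch of overall amino acid composition and dipeptide composition as fixed-length feature vectors. It is not the authors' code: the function names, the alphabet ordering, and the choice to skip non-standard residues are assumptions made for illustration. PSSM profiles and split amino acid composition are omitted because they need details the abstract does not give.

```python
from itertools import product

# The 20 standard residues; the ordering is an arbitrary assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def overall_composition(seq):
    """Return 20 values: the fraction of each standard amino acid in seq."""
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    for residue in seq.upper():
        if residue in counts:  # non-standard symbols (e.g. X) are skipped
            counts[residue] += 1
    total = sum(counts.values())
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return 400 values: the fraction of each adjacent residue pair in seq."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs containing non-standard symbols are skipped
            counts[pair] += 1
    total = sum(counts.values())
    return [counts[dp] / total if total else 0.0 for dp in DIPEPTIDES]


def sequence_features(seq):
    """Concatenate both representations into one 420-dimensional vector."""
    return overall_composition(seq) + dipeptide_composition(seq)
```

A vector built this way has the same length for every protein, which is what lets sequences of different lengths be passed to a classifier such as an SVM or a random forest.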

Results

Our results suggest that some sequence features can significantly differentiate DSBs from SSBs. Evaluated by 10-fold cross-validation on the benchmark datasets, our prediction method achieves an accuracy of 88.7% and an area under the curve (AUC) of 0.919. Moreover, our method performs well in independent testing.
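As an illustration of the evaluation terms used here, the sketch below splits sample indices into k folds and computes accuracy, the Matthews correlation coefficient and AUC for a binary task. It rests on several assumptions that are not stated in the abstract: labels are coded 1 for one class (say SSBs) and 0 for the other, the folds are neither shuffled nor stratified, and AUC is computed as the fraction of positive-negative pairs that are ranked correctly. All function names are hypothetical.

```python
import math


def k_fold_indices(n_samples, k=10):
    """Split indices 0..n_samples-1 into k folds (no shuffling, no stratification)."""
    return [list(range(n_samples))[i::k] for i in range(k)]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for labels coded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true, scores):
    """Probability that a random positive scores above a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In a 10-fold run, each fold in turn would serve as the test set while a model is trained on the other nine, and the metrics would then be computed over the held-out predictions.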

Conclusions

Using various sequence-derived features, we propose a novel method to distinguish DSBs from SSBs accurately. The method also explores novel features, which could help to uncover the binding specificity of DNA-binding proteins.

Keywords

SSBs; Single-stranded DNA-binding proteins; DSBs; Double-stranded DNA-binding proteins; Binding specificity; Protein sequence

Abbreviations

Acc: Accuracy
AUC: Area under the curve
DSBs: Double-stranded DNA binding proteins
dsDNA: Double-stranded DNA
MCC: Matthews correlation coefficient
OAAC: Overall amino acid composition
PSSM: Position-specific scoring matrix
RF: Random forest
ROC: Receiver operating characteristics
SAA: Split amino acid
SSBs: Single-stranded DNA binding proteins
ssDNA: Single-stranded DNA
SVM: Support vector machine

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-017-1715-8) contains supplementary material, which is available to authorized users.

Authors: Wei Wang, Lin Sun, Shiguang Zhang, Hongjun Zhang, Jinling Shi, Tianhe Xu, Keliang Li

