Estimates of statistical significance for comparison of individual positions in multiple sequence alignments

BMC Bioinformatics, 5:106

First Online: 05 August 2004
Received: 01 April 2004
Accepted: 05 August 2004


Background
Profile-based analysis of multiple sequence alignments (MSA) allows for accurate comparison of protein families. Here, we address the problems of detecting statistically confident dissimilarities (1) between an MSA position and a set of predicted residue frequencies, and (2) between two MSA positions. These problems are important for (i) evaluation and optimization of methods predicting residue occurrence at protein positions; (ii) detection of potentially misaligned regions in automatically produced alignments and their further refinement; and (iii) detection of sites that determine functional or structural specificity in two related families.

Results
For problems (1) and (2), we propose analytical estimates of P-value and apply them to the detection of significant positional dissimilarities in various experimental situations. (a) We compare structure-based predictions of residue propensities at a protein position to the actual residue frequencies in the MSA of homologs. (b) We evaluate our method by its ability to detect erroneous position matches produced by an automatic sequence aligner. (c) We compare MSA positions that correspond to residues aligned by automatic structure aligners. (d) We compare MSA positions that are aligned by high-quality manual superposition of structures. Detected dissimilarities reveal shortcomings of the automatic methods for residue frequency prediction and alignment construction. For the high-quality structural alignments, the dissimilarities suggest sites of potential functional or structural importance.
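To make the two comparison problems concrete, the sketch below estimates a P-value for each by simulation. It is an illustrative stand-in only: the abstract does not give the authors' analytical estimates, so a Pearson chi-square statistic with Monte Carlo sampling is substituted for problem (1) and a permutation test for problem (2). The function names, the gap symbol `-`, and the example frequencies are assumptions made for illustration, and the sketch treats sequences as independent, with no sequence weighting.

```python
import random
from collections import Counter

GAP = "-"  # assumed gap symbol; not specified in the source


def chi_square(residues, predicted):
    """Pearson chi-square of observed residues against predicted frequencies."""
    counts = Counter(residues)
    n = len(residues)
    return sum((counts[aa] - n * f) ** 2 / (n * f)
               for aa, f in predicted.items() if f > 0)


def position_vs_prediction_p(column, predicted, n_samples=2000, seed=0):
    """Problem (1): Monte Carlo P-value for one MSA column vs. predicted frequencies."""
    residues = [aa for aa in column if aa != GAP]
    if not residues:
        raise ValueError("column has no residues")
    if any(predicted.get(aa, 0.0) <= 0.0 for aa in residues):
        raise ValueError("observed residue has zero predicted frequency")
    rng = random.Random(seed)
    alphabet = list(predicted)
    weights = [predicted[aa] for aa in alphabet]
    observed = chi_square(residues, predicted)
    hits = 0
    for _ in range(n_samples):
        # Draw a column of the same depth from the predicted distribution.
        simulated = rng.choices(alphabet, weights=weights, k=len(residues))
        if chi_square(simulated, predicted) >= observed:
            hits += 1
    return (1 + hits) / (1 + n_samples)


def frequency_distance(col_a, col_b):
    """Sum of squared differences between the residue frequencies of two columns."""
    ca, cb = Counter(col_a), Counter(col_b)
    return sum((ca[aa] / len(col_a) - cb[aa] / len(col_b)) ** 2
               for aa in set(ca) | set(cb))


def two_positions_p(column_a, column_b, n_samples=2000, seed=0):
    """Problem (2): permutation P-value for the dissimilarity of two MSA columns."""
    a = [aa for aa in column_a if aa != GAP]
    b = [aa for aa in column_b if aa != GAP]
    if not a or not b:
        raise ValueError("both columns need at least one residue")
    rng = random.Random(seed)
    observed = frequency_distance(a, b)
    pooled = a + b
    hits = 0
    for _ in range(n_samples):
        # Reassign residues to the two columns at random, keeping column depths.
        rng.shuffle(pooled)
        if frequency_distance(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    return (1 + hits) / (1 + n_samples)


# Hypothetical predicted frequencies for a hydrophobic position.
predicted = {"L": 0.5, "V": 0.3, "I": 0.2}
print(position_vs_prediction_p("LLLLLVVVII", predicted))
print(two_positions_p("LLLLLLVVII", "DDDDEEEEKK"))
```

A small P-value flags a position whose residue content is unlikely under the prediction (problem 1) or unlikely to share a distribution with the other position (problem 2). Simulation is used here only because it needs no distributional assumptions; an analytical estimate such as the one the paper proposes avoids the sampling cost.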

Conclusion
The proposed computational method is of significant potential value for the analysis of protein families.

Electronic supplementary material
The online version of this article (doi:10.1186/1471-2105-5-106) contains supplementary material, which is available to authorized users.


Authors: Ruslan I Sadreyev, Nick V Grishin

