Variant effect prediction tools assessed using independent, functional assay-based datasets: implications for discovery and diagnostics

Human Genomics, 11:10

First Online: 16 May 2017
Received: 03 April 2017
Accepted: 04 May 2017
DOI: 10.1186/s40246-017-0104-8

Cite this article as: Mahmood, K., Jung, C., Philip, G. et al. Hum Genomics 2017, 11:10. doi:10.1186/s40246-017-0104-8


Background

Genetic variant effect prediction algorithms are used extensively in clinical genomics and research to determine the likely consequences of amino acid substitutions on protein function. It is vital that we better understand their accuracies and limitations because published performance metrics are confounded by serious problems of circularity and error propagation. Here, we derive three independent, functionally determined human mutation datasets, UniFun, BRCA1-DMS and TP53-TA, and employ them, alongside previously described datasets, to assess the pre-eminent variant effect prediction tools.

Results

Apparent accuracies of variant effect prediction tools were influenced significantly by the benchmarking dataset. Benchmarking with the assay-determined datasets UniFun and BRCA1-DMS yielded areas under the receiver operating characteristic curves in the modest ranges of 0.52 to 0.63 and 0.54 to 0.75, respectively, considerably lower than observed for other, potentially more conflicted datasets.
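To put the reported ranges in context, the area under the ROC curve (AUC) equals the probability that a randomly chosen damaging variant receives a higher prediction score than a randomly chosen neutral one, so 0.5 corresponds to random ranking and 1.0 to perfect separation. The following is a minimal Python sketch of that pairwise formulation, not the authors' benchmarking code. The function name `roc_auc`, the label convention (1 = damaging, 0 = neutral), and the toy scores are all assumptions made for illustration.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney U) formulation.

    labels: 1 = functionally damaging (positive), 0 = neutral (negative).
            This convention is an assumption for the example.
    scores: predictor output, where a higher score means "more likely damaging".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative variant")

    # Count positive/negative pairs that are ranked correctly;
    # a tie contributes half a correctly ranked pair.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the paper.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.5, 0.2, 0.6]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"toy AUC = {toy_auc:.3f}")
```

The quadratic pairwise loop is chosen for readability; for datasets with many variants a rank-based or library implementation would be more practical.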

Conclusions

These results raise concerns about how such algorithms should be employed, particularly in a clinical setting. Contemporary variant effect prediction tools are unlikely to be as accurate at the general prediction of functional impacts on proteins as previously reported. Use of functional assay-based datasets that avoid prior dependencies promises to be valuable for the ongoing development and accurate benchmarking of such tools.

Keywords

Variant effect prediction; Functional datasets; Benchmarking; Mutation assessment; Pathogenicity prediction; Protein function; Functional assays; Genomic screening

Abbreviations

AUC: Area under the curve
BED: Browser extensible data
BRCA1-DMS: BRCA1 deep mutational scanning
ClinvarHC: Clinvar high confidence
FPR: False positive rate
HDR: Homology-directed DNA repair
MAF: Minor allele frequency
MCC: Matthews correlation coefficient
OMIM: Online Mendelian Inheritance in Man
ROC: Receiver operating characteristic
SPARQL: SPARQL protocol and RDF query language
TP53-TA: TP53 transactivation assay
TPR: True positive rate
UniFun: UniProt-derived, functionally characterised
VCF: Variant call format

Electronic supplementary material

The online version of this article (doi:10.1186/s40246-017-0104-8) contains supplementary material, which is available to authorized users.

Authors: Khalid Mahmood, Chol-hee Jung, Gayle Philip, Peter Georgeson, Jessica Chung, Bernard J. Pope, Daniel J. Park

