Classifying publications from the clinical and translational science award program along the translational research spectrum: a machine learning approach

Journal of Translational Medicine, 14:235

First Online: 05 August 2016; Received: 01 June 2016; Accepted: 27 July 2016; DOI: 10.1186/s12967-016-0992-8

Cite this article as: Surkis, A., Hogle, J.A., DiazGranados, D. et al. J Transl Med 2016 14: 235. doi:10.1186/s12967-016-0992-8


Background: Translational research is a key area of focus of the National Institutes of Health (NIH), as demonstrated by the substantial investment in the Clinical and Translational Science Award (CTSA) program. The goal of the CTSA program is to accelerate the translation of discoveries from the bench to the bedside and into communities. Different classification systems have been used to capture the spectrum of basic to clinical to population health research, with substantial differences in the number of categories and their definitions. Evaluation of the effectiveness of the CTSA program, and of translational research in general, is hampered by the lack of rigor in these definitions and their application. This study adds rigor to the classification process by creating a checklist to evaluate publications across the translational spectrum, and it operationalizes these classifications by building machine learning-based text classifiers to categorize these publications.

Methods: Based on collaboratively developed definitions, we created a detailed checklist for categories along the translational spectrum from T0 to T4. We applied the checklist to CTSA-linked publications to construct a set of coded publications for training machine learning-based text classifiers to assign publications to these categories. The training sets combined the T1 and T2 categories into T1-T2 and the T3 and T4 categories into T3-T4, because these publication types were infrequent compared with T0 publications. We then compared classifier performance across different algorithms and feature sets and applied the classifiers to all publications in PubMed indexed to CTSA grants. To validate the classifiers, we manually classified the articles with the top 100 scores from each classifier.
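The category merging and the top-100 validation step described in the Methods can be sketched in a few lines of Python. This is only an illustration: the abstract does not state how the classifiers were trained, so the one-vs-rest binary targets are an assumption, and the names `MERGED`, `binary_targets`, and `top_n`, as well as the sample codes, are hypothetical.

```python
from collections import Counter

# Hypothetical mapping from the five checklist categories to the three
# combined training classes (T1 with T2, T3 with T4).
MERGED = {"T0": "T0", "T1": "T1-T2", "T2": "T1-T2", "T3": "T3-T4", "T4": "T3-T4"}


def binary_targets(codes, positive_class):
    """Return 1/0 targets for one classifier (assumed one-vs-rest)."""
    return [1 if MERGED[c] == positive_class else 0 for c in codes]


def top_n(scored, n=100):
    """Return the n highest-scoring (pmid, score) pairs for manual review."""
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]


# Made-up codes for six publications, purely for illustration.
codes = ["T0", "T0", "T1", "T2", "T4", "T0"]
print(Counter(MERGED[c] for c in codes))   # class counts after merging
print(binary_targets(codes, "T1-T2"))      # targets for the T1-T2 classifier
```

Merging before building the targets gives each rarer category more positive examples, which is the motivation the Methods give for combining them.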

Results: The definitions and checklist facilitated classification and resulted in good inter-rater reliability for coding publications for the training set. The classifiers performed very well as measured by the area under the receiver operating characteristic curve (AUC), with an AUC of 0.94 for the T0 classifier, 0.84 for T1-T2, and 0.92 for T3-T4.
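For readers unfamiliar with the metric, the AUC can be read as the probability that a randomly chosen positive example receives a higher classifier score than a randomly chosen negative one. The following minimal sketch computes it directly from that definition; the function name `roc_auc` and the sample labels and scores are illustrative assumptions, not the study's code or data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up labels and scores, not data from the study.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(roc_auc(labels, scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which puts the reported values of 0.84 to 0.94 in context. The pairwise loop is quadratic in the number of examples; a rank-based formulation would be preferable for large sets.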

Conclusions: The combination of definitions agreed upon by five CTSA hubs, a checklist that facilitates more uniform interpretation of those definitions, and algorithms that perform well in classifying publications along the translational spectrum provides a basis for establishing and applying uniform definitions of translational research categories. The classification algorithms allow publication analyses that would not be feasible with manual classification, such as assessing the distribution and trends of publications across the CTSA network and comparing the categories of publications and their citations to assess knowledge transfer across the translational research spectrum.

Keywords: Machine learning; Translational research; Knowledge translation; Text classification

Abbreviations

NIH: National Institutes of Health
CTSA: Clinical and Translational Science Award
NCATS: National Center for Advancing Translational Sciences
IOM: Institute of Medicine
AUC: area under the receiver operating characteristic curve
MeSH: Medical Subject Headings
PMID: PubMed Identifier
SVM: support vector machine
FPR: false positive rate
TPR: true positive rate

Electronic supplementary material: The online version of this article (doi:10.1186/s12967-016-0992-8) contains supplementary material, which is available to authorized users.

Authors: Alisa Surkis, Janice A. Hogle, Deborah DiazGranados, Joe D. Hunt, Paul E. Mazmanian, Emily Connors, Kate Westaby

