AucPR: An AUC-based approach using penalized regression for disease prediction with high-dimensional omics data





BMC Genomics, 15:S1

First Online: 12 December 2014

DOI: 10.1186/1471-2164-15-S10-S1

Cite this article as: Yu, W. & Park, T. BMC Genomics (2014) 15(Suppl 10): S1. doi:10.1186/1471-2164-15-S10-S1

Abstract

Motivation: When multiple markers are available, it is common to seek an optimal combination of them for disease classification and prediction. Many approaches based on the area under the receiver operating characteristic curve (AUC) have been proposed. Existing AUC-based work in a high-dimensional context depends mainly on a non-parametric, smooth approximation of the AUC; no work has used a parametric AUC-based approach for high-dimensional data.

Results: We propose an AUC-based approach using penalized regression (AucPR), a parametric method for obtaining the linear combination of markers that maximizes the AUC. To obtain the AUC maximizer in a high-dimensional context, we transform a classical parametric AUC maximizer, which is used in a low-dimensional context, into a regression framework, so that penalized regression can be applied directly. Two kinds of penalization, lasso and elastic net, are considered. The parametric approach avoids some of the difficulties of a conventional non-parametric AUC-based approach, such as the lack of an appropriate concave objective function and the need for a prudent choice of the smoothing parameter. We apply the proposed AucPR to gene selection and classification using four real microarray data sets and synthetic data. In numerical studies, AucPR performs better than penalized logistic regression and the non-parametric AUC-based method in terms of AUC and sensitivity for a given specificity, particularly when there are many correlated genes.
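The two evaluation criteria named above, AUC and sensitivity for a given specificity, can be computed for any linear combination of markers. The following is a minimal Python sketch of those two metrics only; it does not implement AucPR itself. The function names, the coefficient vector `beta`, and the toy data are hypothetical illustrations and are not taken from the paper.

```python
import math


def linear_score(beta, x):
    """Score one sample as the linear combination sum_j beta_j * x_j."""
    return sum(b * v for b, v in zip(beta, x))


def empirical_auc(pos, neg):
    """Mann-Whitney estimate of the AUC: the fraction of (case, control)
    pairs in which the case scores higher, counting ties as one half."""
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_at_specificity(pos, neg, specificity):
    """Sensitivity at the smallest threshold whose specificity reaches the
    target. A sample is called positive when its score exceeds the threshold."""
    ranked = sorted(neg)
    k = math.ceil(specificity * len(ranked))  # controls that must fall at or below the threshold
    threshold = ranked[k - 1] if k > 0 else float("-inf")
    return sum(s > threshold for s in pos) / len(pos)


# Hypothetical coefficients and toy expression values (assumptions, not from the paper).
beta = [1.0, -0.5]
cases = [[2.0, 0.2], [1.5, 0.4], [0.9, 1.0]]
controls = [[1.0, 0.6], [0.5, 0.4], [0.3, 0.2], [0.2, 0.2]]

pos = [linear_score(beta, x) for x in cases]
neg = [linear_score(beta, x) for x in controls]

print("AUC:", empirical_auc(pos, neg))
print("Sensitivity at 75% specificity:", sensitivity_at_specificity(pos, neg, 0.75))
```

In practice the coefficients would come from a fitted model, and both metrics would be computed on held-out samples rather than on the data used for fitting.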

Conclusion: We propose AucPR, a powerful, parametric, and easily implemented linear classifier for gene selection and disease prediction with high-dimensional data. AucPR is recommended for its good prediction performance. Besides gene expression microarray data, AucPR can be applied to other types of high-dimensional omics data, such as miRNA and protein data.

Keywords: AUC; high-dimensional data; penalized regression; ROC curve



Authors: Wenbao Yu, Taesung Park

Source: https://link.springer.com/






