An application in identifying high-risk populations in alternative tobacco product use utilizing logistic regression and CART: a heuristic comparison

BMC Public Health, 15:341

Biostatistics and methods


Background

Other forms of tobacco use are increasing in prevalence, yet most tobacco control efforts are aimed at cigarettes. In light of this, it is important to identify individuals who use both cigarettes and alternative tobacco products (ATPs). Most previous studies have used regression models. We fit a traditional logistic regression model and a classification and regression tree (CART) model to illustrate and discuss the added advantages of CART for identifying high-risk subgroups of ATP users among cigarette smokers.

Methods

The data were collected from an online cross-sectional survey administered by Survey Sampling International between July 5, 2012 and August 15, 2012. Eligible participants self-identified as current smokers and as African American, White, or Latino (of any race), were English-speaking, and were at least 25 years old. The study sample included 2,376 participants and was divided into independent training and validation samples for hold-out validation. Logistic regression and CART models were used to identify the important predictors of using both cigarettes and ATPs.
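To make the hold-out step concrete, the following is a minimal sketch of splitting a sample into disjoint training and validation sets. The abstract does not state the split proportion or the randomization procedure, so the 50/50 fraction, the seed, the function name `holdout_split`, and the integer participant IDs are all assumptions chosen for illustration.

```python
import random


def holdout_split(records, train_fraction=0.5, seed=0):
    """Shuffle a copy of `records` and split it into two disjoint lists.

    `train_fraction` and `seed` are illustrative assumptions; the
    abstract does not report how the sample was actually divided.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(records)               # copy so the input is not mutated
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical IDs standing in for the 2,376 survey participants.
participant_ids = list(range(2376))
train_ids, valid_ids = holdout_split(participant_ids)
```

Models are then fit on the training records only, and the validation records are used solely to check discrimination and calibration.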

Results

The logistic regression model identified nine important factors: gender, age, race, nicotine dependence, whether participants bought or borrowed cigarettes, whether the price of cigarettes influenced the brand purchased, whether participants set limits on cigarettes per day, alcohol use score, and discrimination frequency. The c-index of the logistic regression model was 0.74, indicating good discriminatory capability. The model also performed well in the validation cohort, with good discrimination (c-index = 0.73) and excellent calibration (R-square = 0.96 in the calibration regression). The parsimonious CART model identified gender, age, alcohol use score, race, and discrimination frequency as the most important factors, and it revealed interesting partial interactions. Its c-index was 0.70 for the training sample and 0.69 for the validation sample, and its misclassification rate was 0.342 for the training sample and 0.346 for the validation sample. The CART model was easier to interpret and identified target populations of clinical significance.
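The two performance measures reported above can be illustrated with a short sketch. For a binary outcome, the c-index is the proportion of (user, non-user) pairs in which the user receives the higher predicted risk, with ties counted as one half. The misclassification rate is the share of participants whose predicted class differs from the observed one. The toy labels and scores, the 0.5 classification threshold, and the function names are assumptions for illustration and are not taken from the study.

```python
def c_index(labels, scores):
    """Concordance index for a binary outcome (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


def misclassification_rate(labels, scores, threshold=0.5):
    """Share of cases whose thresholded prediction differs from the label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    errors = sum(1 for y, p in zip(labels, predictions) if y != p)
    return errors / len(labels)


# Toy data (hypothetical): 1 = cigarette + ATP user, 0 = cigarette only.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]

toy_c = c_index(labels, scores)                   # 7 of 9 pairs concordant
toy_err = misclassification_rate(labels, scores)  # 2 of 6 misclassified
```

A c-index of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is the scale on which values such as 0.74 and 0.70 are read.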

Conclusion

This study suggests that the non-parametric CART model is parsimonious, is potentially easier to interpret, and provides additional information for identifying subgroups at high risk of ATP use among cigarette smokers.

Keywords

Survey sampling; Stratified samples; Logistic regression; CART; Partial interaction

Abbreviations

CART: Classification and Regression Trees

ATP: Alternative tobacco product

AIC: Akaike Information Criterion


Authors: Yang Lei, Nikki Nollen, Jasjit S Ahluwahlia, Qing Yu, Matthew S Mayo

