Node harvest - Statistics > Machine Learning


Abstract: When choosing a suitable technique for regression and classification with multivariate predictor variables, one is often faced with a tradeoff between interpretability and high predictive accuracy. To give a classical example, classification and regression trees are easy to understand and interpret. Tree ensembles like Random Forests usually provide more accurate predictions. Yet tree ensembles are also more difficult to analyze than single trees and are often criticized, perhaps unfairly, as "black box" predictors. Node harvest tries to reconcile the two aims of interpretability and predictive accuracy by combining positive aspects of trees and tree ensembles. Results are very sparse and interpretable, and predictive accuracy is extremely competitive, especially for low signal-to-noise data. The procedure is simple: an initial set of a few thousand nodes is generated randomly. If a new observation falls into just a single node, its prediction is the mean response of all training observations within this node, identical to a tree-like prediction. A new observation typically falls into several nodes, and its prediction is then the weighted average of the mean responses across all these nodes. The only role of node harvest is to "pick" the right nodes from the initial large ensemble of nodes by choosing node weights, which in the proposed algorithm amounts to a quadratic programming problem with linear inequality constraints. The solution is sparse in the sense that only very few nodes are selected with a nonzero weight. This sparsity is not explicitly enforced. Perhaps surprisingly, it is not necessary to select a tuning parameter for optimal predictive accuracy. Node harvest can handle mixed data and missing values and is shown to be simple to interpret and competitive in predictive accuracy on a variety of data sets.
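The prediction rule described in the abstract can be illustrated with a minimal Python sketch. It covers only the prediction step, not the quadratic program that chooses the weights. The `Node` class, the `predict` function, the membership tests, and the example weights are all hypothetical names and values chosen for illustration. Normalizing by the sum of the weights of the nodes that contain the observation is an assumption about how the weighted average is formed.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Node:
    # Hypothetical container; field names are illustrative, not from the paper.
    contains: Callable[[Sequence[float]], bool]  # does observation x fall into this node?
    mean_response: float  # mean response of training observations in the node
    weight: float  # node weight (assumed nonnegative; zero means "not selected")


def predict(nodes: List[Node], x: Sequence[float]) -> float:
    """Weighted average of node means over all selected nodes containing x."""
    active = [n for n in nodes if n.weight > 0 and n.contains(x)]
    total = sum(n.weight for n in active)
    if total == 0:
        raise ValueError("observation falls into no node with nonzero weight")
    # Assumption: weights are renormalized over the nodes that contain x.
    return sum(n.weight * n.mean_response for n in active) / total


# Toy ensemble: a root node covering everything and one node for x[0] > 1.
nodes = [
    Node(contains=lambda x: True, mean_response=2.0, weight=0.5),
    Node(contains=lambda x: x[0] > 1.0, mean_response=4.0, weight=0.5),
    Node(contains=lambda x: x[0] > 5.0, mean_response=9.0, weight=0.0),  # unselected
]

single = predict(nodes, [0.0])    # only the root node applies: tree-like prediction
several = predict(nodes, [2.0])   # two nodes apply: weighted average of their means
```

With a single matching node, the result reduces to that node's mean response, which matches the tree-like case in the abstract. Nodes with zero weight never contribute, which reflects the sparsity of the selected solution.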

Author: Nicolai Meinshausen

