Robust nearest-neighbor methods for classifying high-dimensional data - Mathematics > Statistics Theory





Abstract: We suggest a robust nearest-neighbor approach to classifying high-dimensional data. The method enhances sensitivity by employing a threshold and truncates to a sequence of zeros and ones in order to reduce the deleterious impact of heavy-tailed data. Empirical rules are suggested for choosing the threshold. They require the bare minimum of data; only one data vector is needed from each population. Theoretical and numerical aspects of performance are explored, paying particular attention to the impacts of correlation and heterogeneity among data components. On the theoretical side, it is shown that our truncated, thresholded, nearest-neighbor classifier enjoys the same classification boundary as more conventional, nonrobust approaches, which require finite moments in order to achieve good performance. In particular, the greater robustness of our approach does not come at the price of reduced effectiveness. Moreover, when both training sample sizes equal 1, our new method can have performance equal to that of optimal classifiers that require independent and identically distributed data with known marginal distributions; yet, our classifier does not itself need conditions of this type.
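
The idea of thresholding and then truncating to zeros and ones can be illustrated with a minimal sketch. The abstract does not give the exact form of the classifier, so everything below is an assumption made for illustration: the componentwise rule |z_j - ref_j| > threshold, the tie-breaking convention, the function names exceedance_count and classify, and the toy data. It uses one training vector per population, as in the minimal-data setting the abstract describes, and it is not the authors' implementation.

```python
def exceedance_count(z, ref, threshold):
    """Count components where |z_j - ref_j| exceeds the threshold.

    Each component contributes 0 or 1, so a single extreme value
    cannot dominate the total (assumed form of the truncation).
    """
    return sum(1 for zj, rj in zip(z, ref) if abs(zj - rj) > threshold)


def classify(z, x, y, threshold):
    """Assign z to the population whose training vector it exceeds least often.

    x and y are single training vectors from populations 'X' and 'Y'.
    Ties go to 'X' (an arbitrary convention chosen for this sketch).
    """
    count_x = exceedance_count(z, x, threshold)
    count_y = exceedance_count(z, y, threshold)
    return "X" if count_x <= count_y else "Y"


# Hypothetical toy data: y differs from x in two components,
# and z carries one very large outlier in the fourth component.
x = [0.0, 0.0, 0.0, 0.0, 0.0]
y = [0.0, 3.0, 0.0, 3.0, 0.0]
z = [0.2, 2.9, -0.1, 1000.0, 0.3]

label = classify(z, x, y, threshold=1.0)
print(label)
```

In this sketch the outlying component adds at most 1 to each count, which is the sense in which truncation limits the influence of heavy tails. The threshold is fixed by hand here; the paper's empirical rules for choosing it are not reproduced.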



Authors: Yao-ban Chan, Peter Hall

Source: https://arxiv.org/






