Approximating quantiles in very large datasets - Statistics > Computation


Abstract: Very large datasets are often encountered in climatology, either from a multiplicity of observations over time and space or from the outputs of deterministic models (sometimes in petabytes; 1 petabyte = 1 million gigabytes). Loading a large data vector and sorting it is sometimes impossible due to limits on memory or computing power. We show that a proposed algorithm for approximating the median, the "median of the median", performs poorly. Instead, we develop an algorithm to approximate quantiles of very large datasets which works by partitioning the data or by using existing partitions (possibly of non-equal size). We show the deterministic precision of this algorithm and how it can be adjusted to get customized precisions.
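
To make the "median of the median" idea concrete, the following minimal Python sketch computes the median of the per-partition medians and compares it with the exact median of the pooled data. The function names and the toy partitions are illustrative assumptions, not taken from the paper, and the sketch does not implement the paper's own quantile algorithm.

```python
from statistics import median


def median_of_medians(partitions):
    """Naive approximation: the median of the per-partition medians."""
    return median([median(part) for part in partitions])


def exact_median(partitions):
    """Exact median, obtained by pooling all partitions in memory.

    Feasible only for toy data; this is the step that very large
    datasets rule out.
    """
    pooled = [x for part in partitions for x in part]
    return median(pooled)


# Hypothetical toy partitions (not data from the paper).
equal_parts = [[1, 2, 9], [3, 4, 10], [5, 6, 11]]
unequal_parts = [[1, 2, 3, 4, 5, 6, 7], [8], [9]]

for name, parts in [("equal", equal_parts), ("unequal", unequal_parts)]:
    print(name, median_of_medians(parts), exact_median(parts))
```

In both toy cases the approximation misses the exact median of 5: it gives 4 for the equal-size partitions and 8 for the unequal ones, where two single-element partitions carry as much weight as the seven-element one.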

Author: Reza Hosseini

