Finding Associations and Computing Similarity via Biased Pair Sampling - Computer Science > Data Structures and Algorithms





This version is ***superseded*** by a full version that can be found at this http URL, which contains stronger theoretical results and fixes a mistake in the reporting of experiments.

Abstract: Sampling-based methods have previously been proposed for the problem of finding interesting associations in data, even for low-support items. While these methods do not guarantee precise results, they can be vastly more efficient than approaches that rely on exact counting. However, for many similarity measures no such methods have been known. In this paper we show how a wide variety of measures can be supported by a simple biased sampling method. The method also extends to find high-confidence association rules. We demonstrate theoretically that our method is superior to exact methods when the threshold for "interesting similarity/confidence" is above the average pairwise similarity/confidence, and the average support is not too low. Our method is particularly good when transactions contain many items. We confirm in experiments on standard association mining benchmarks that this gives a significant speedup on real data sets, sometimes much larger than the theoretical guarantees. Reductions in computation time of over an order of magnitude, and significant savings in space, are observed.
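To make the contrast between exact counting and sampling concrete, here is a minimal Python sketch. It is not the authors' algorithm, which the abstract does not spell out. It assumes cosine similarity as the measure and assumes one plausible bias: each co-occurrence of a pair is kept with probability proportional to `1 / sqrt(support(a) * support(b))`, so the expected number of kept samples grows with the pair's similarity. The function names and the parameter `mu` are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations
from math import sqrt


def item_supports(transactions):
    """Count how many transactions contain each item."""
    support = Counter()
    for t in transactions:
        support.update(set(t))
    return support


def exact_cosine(transactions):
    """Exact cosine similarity for every co-occurring pair (full counting)."""
    support = item_supports(transactions)
    co = Counter()
    for t in transactions:
        for a, b in combinations(sorted(set(t)), 2):
            co[(a, b)] += 1
    return {(a, b): c / sqrt(support[a] * support[b]) for (a, b), c in co.items()}


def sampled_cosine(transactions, mu, seed=0):
    """Estimate cosine similarity from a biased sample of pair occurrences.

    Assumption for illustration: a co-occurrence of (a, b) is kept with
    probability p = min(1, mu / sqrt(support[a] * support[b])).
    """
    rng = random.Random(seed)
    support = item_supports(transactions)
    hits = Counter()
    for t in transactions:
        for a, b in combinations(sorted(set(t)), 2):
            p = min(1.0, mu / sqrt(support[a] * support[b]))
            if rng.random() < p:
                hits[(a, b)] += 1
    estimates = {}
    for (a, b), h in hits.items():
        norm = sqrt(support[a] * support[b])
        p = min(1.0, mu / norm)
        # h / p is an unbiased estimate of the co-occurrence count.
        estimates[(a, b)] = (h / p) / norm
    return estimates


transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "d"}]
exact = exact_cosine(transactions)
approx = sampled_cosine(transactions, mu=0.5)
```

Note that this sketch still enumerates every pair in every transaction, so it only illustrates the estimator and the reduced counter size, not the running-time savings the abstract reports. Pairs with low similarity are likely to be missing from `approx` altogether, which is the point when only pairs above a threshold are of interest.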



Authors: Andrea Campagna, Rasmus Pagh

Source: https://arxiv.org/






