State-of-the-art on clustering data streams


Big Data Analytics, 1:13

Scalable, Intelligent Data Analytics and Learning


Clustering is a key data mining task: partitioning a set of observations into clusters such that observations within a cluster are similar and observations in different clusters are dissimilar. The traditional setting, in which a static dataset is available in its entirety for random access, does not apply when the entire dataset is not available at the start of learning, the data continue to arrive at a rapid rate, the data cannot be accessed randomly, and only one pass, or at most a small number of passes, can be made over the data to generate the clustering results. Data of this type are referred to as data streams. The data stream clustering problem requires a process capable of partitioning observations continuously under memory and time restrictions. In the data stream clustering literature, a large number of algorithms use a two-phase scheme consisting of an online component, which processes the points of the data stream and produces summary statistics, and an offline component, which uses the summary data to generate the clusters. An alternative class of algorithms generates the final clusters without an offline phase. This paper presents a comprehensive survey of data stream clustering methods and an overview of the best-known streaming platforms that implement clustering.
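The two-phase scheme described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper and does not reproduce any specific surveyed algorithm: the names `MicroCluster`, `online_phase`, `offline_phase` and the `radius` threshold are hypothetical, the summary statistics are assumed to be a point count and a per-dimension linear sum, and the offline step is assumed to be a weighted k-means over the summaries.

```python
import math
import random


class MicroCluster:
    """Hypothetical summary of nearby points: a count and a linear sum."""

    def __init__(self, point):
        self.n = 1
        self.linear_sum = list(point)

    def centroid(self):
        return [s / self.n for s in self.linear_sum]

    def absorb(self, point):
        # Update the summary in place; the raw point is not stored.
        self.n += 1
        self.linear_sum = [s + x for s, x in zip(self.linear_sum, point)]


def online_phase(stream, radius):
    """Single pass over the stream: a point joins the nearest summary if it
    lies within `radius` of its centroid, otherwise it starts a new summary."""
    summaries = []
    for point in stream:
        nearest = min(
            summaries,
            key=lambda mc: math.dist(mc.centroid(), point),
            default=None,
        )
        if nearest is not None and math.dist(nearest.centroid(), point) <= radius:
            nearest.absorb(point)
        else:
            summaries.append(MicroCluster(point))
    return summaries


def offline_phase(summaries, k, iterations=20, seed=0):
    """Weighted k-means over summary centroids (assumed offline step)."""
    rng = random.Random(seed)
    points = [(mc.centroid(), mc.n) for mc in summaries]
    centers = [c for c, _ in rng.sample(points, k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for c, w in points:
            idx = min(range(k), key=lambda i: math.dist(centers[i], c))
            groups[idx].append((c, w))
        for i, group in enumerate(groups):
            if group:  # keep the old center if no summary was assigned
                total = sum(w for _, w in group)
                dim = len(group[0][0])
                centers[i] = [
                    sum(c[d] * w for c, w in group) / total for d in range(dim)
                ]
    return centers


# Synthetic stream: points alternate between two blobs near (0, 0) and (10, 10).
rng = random.Random(42)
stream = [
    [base + rng.uniform(-0.5, 0.5), base + rng.uniform(-0.5, 0.5)]
    for i in range(200)
    for base in [0.0 if i % 2 == 0 else 10.0]
]
summaries = online_phase(stream, radius=2.0)
centers = offline_phase(summaries, k=2)
```

In this sketch the online phase keeps only constant-size summaries, so memory grows with the number of summaries rather than with the length of the stream, and the offline phase never touches the raw points.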

Keywords: Data stream clustering, Streaming platforms, State-of-the-art

Authors: Mohammed Ghesmoune, Mustapha Lebbah, Hanene Azzag

