# Scaling Analysis of Affinity Propagation

1 TAO - Machine Learning and Optimisation LRI - Laboratoire de Recherche en Informatique, UP11 - Université Paris-Sud - Paris 11, Inria Saclay - Ile de France, CNRS - Centre National de la Recherche Scientifique : UMR8623

Abstract : We analyze and exploit some scaling properties of the {\em Affinity Propagation} AP clustering algorithm proposed by Frey and Dueck 2007. Following a divide and conquer strategy we setup an exact renormalization-based approach to address the question of clustering consistency, in particular, how many cluster are present in a given data set. We first observe that the divide and conquer strategy, used on a large data set hierarchically reduces the complexity ${\cal O}N^2$ to ${\cal O}N^{h+2-h+1}$, for a data-set of size $N$ and a depth $h$ of the hierarchical strategy. For a data-set embedded in a $d$-dimensional space, we show that this is obtained without notably damaging the precision except in dimension $d=2$. In fact, for $d$ larger than $2$ the relative loss in precision scales like $N^{2-d-h+1d}$. Finally, under some conditions we observe that there is a value $s^*$ of the penalty coefficient, a free parameter used to fix the number of clusters, which separates a fragmentation phase for $ss^*$ of the underlying hidden cluster structure. At this precise point holds a self-similarity property which can be exploited by the hierarchical strategy to actually locate its position, as a result of an exact decimation procedure. From this observation, a strategy based on \AP\ can be defined to find out how many clusters are present in a given dataset.

Keywords : Clustering Belief-Propagation Extrem value statistics Scale invariance Renormalization

Author: Cyril Furtlehner - Michèle Sebag - Zhang Xiangliang -

Source: https://hal.archives-ouvertes.fr/