An Incremental Reseeding Strategy for Clustering
We propose an easy-to-implement and highly parallelizable algorithm for multiway graph partitioning. The algorithm proceeds by alternating three simple routines in an iterative fashion: diffusion, thresholding, and random sampling. We demonstrate experimentally that the proper combination of these ingredients leads to an algorithm that achieves state-of-the-art performance in terms of cluster purity on standard benchmark data sets. We also describe a coarsen, cluster, and refine approach similar to Dhillon et al. (IEEE Trans Pattern Anal Mach Intell 29(11):1944–1957, 2007) and Karypis and Kumar (SIAM J Sci Comput 20(1):359–392, 1998) that removes an order of magnitude from the runtime of our algorithm while still maintaining competitive accuracy.
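To make the alternation of diffusion, thresholding, and random sampling concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the diffusion operator, the seed schedule, or any parameter values, so everything below is an assumption chosen for illustration: the function name `incremental_reseeding_sketch`, the damped random-walk averaging step with damping `alpha`, the argmax form of thresholding, and the rule of growing the seed count by `seed_increment` each round.

```python
import random


def incremental_reseeding_sketch(adj, k, rounds=10, diffusion_steps=20,
                                 alpha=0.9, seed_increment=1, rng_seed=0):
    """Partition a graph into k clusters by diffusion, thresholding, reseeding.

    adj: list of dicts; adj[i][j] is the symmetric edge weight between i and j.
    Every parameter name and default here is an illustrative assumption.
    """
    rng = random.Random(rng_seed)
    n = len(adj)
    degree = [sum(nbrs.values()) for nbrs in adj]

    # Initial random sampling: one distinct seed vertex per cluster (needs k <= n).
    n_seeds = 1
    seeds = [[v] for v in rng.sample(range(n), k)]
    labels = [0] * n

    for _round in range(rounds):
        # Diffusion: spread each cluster's seed mass with a damped random-walk
        # averaging step (an assumed choice of diffusion operator).
        heat = []
        for c in range(k):
            start = [0.0] * n
            for v in seeds[c]:
                start[v] = 1.0 / len(seeds[c])
            u = start[:]
            for _step in range(diffusion_steps):
                u = [
                    (1 - alpha) * start[i]
                    + (alpha * sum(w * u[j] for j, w in adj[i].items()) / degree[i]
                       if degree[i] > 0 else 0.0)
                    for i in range(n)
                ]
            heat.append(u)

        # Thresholding: each vertex joins the cluster with the largest value
        # (ties go to the lowest cluster index).
        labels = [max(range(k), key=lambda c: heat[c][i]) for i in range(n)]

        # Random sampling: draw a slightly larger seed set from each cluster.
        n_seeds += seed_increment
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if members:  # keep the old seeds if a cluster came up empty
                seeds[c] = rng.sample(members, min(n_seeds, len(members)))

    return labels
```

The function takes an adjacency structure such as a list of per-vertex neighbor dictionaries and returns one integer label per vertex. The diffusion for each cluster is independent of the others, which is where a parallel implementation would most naturally split the work.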
XB is supported by NRF Fellowship NRFF2017-10.