# An Incremental Reseeding Strategy for Clustering

## Abstract

We propose an easy-to-implement and highly parallelizable algorithm for multiway graph partitioning. The algorithm iterates three simple routines: diffusion, thresholding, and random sampling. We demonstrate experimentally that the proper combination of these ingredients yields an algorithm that achieves state-of-the-art cluster purity on standard benchmark data sets. We also describe a coarsen, cluster, and refine approach, similar to Dhillon et al. (IEEE Trans Pattern Anal Mach Intell 29(11):1944–1957, 2007) and Karypis and Kumar (SIAM J Sci Comput 20(1):359–392, 1998), that removes an order of magnitude from the runtime of our algorithm while still maintaining competitive accuracy.
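To make the three routines concrete, here is a minimal sketch of one way a diffusion, thresholding, and random-sampling loop could be arranged. It is an illustration under stated assumptions, not the paper's algorithm: the function name `diffuse_threshold_reseed`, the random walk with restart used for diffusion, the arg-max rule used for thresholding, and all parameters (`alpha`, `n_steps`, `n_rounds`, `n_seeds`) are hypothetical choices, and the toy graph is made up for the example.

```python
import numpy as np

def diffuse_threshold_reseed(W, k, init_labels=None, n_rounds=20, n_steps=10,
                             alpha=0.9, n_seeds=1, seed=0):
    """Hypothetical sketch: split a graph with dense symmetric weights W into k parts.

    Assumes every vertex has positive degree (no all-zero rows in W).
    """
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    P = W / W.sum(axis=1, keepdims=True)      # row-stochastic random-walk matrix
    if init_labels is None:
        labels = rng.integers(0, k, size=n)   # random initial partition
    else:
        labels = np.asarray(init_labels).copy()
    for _ in range(n_rounds):
        # Random sampling: draw seed vertices from each current cluster.
        F0 = np.zeros((n, k))
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:             # empty cluster: reseed from any vertex
                members = np.arange(n)
            chosen = rng.choice(members, size=min(n_seeds, members.size),
                                replace=False)
            F0[chosen, c] = 1.0
        # Diffusion: a few averaging steps with restart at the seeds (assumed form).
        F = F0.copy()
        for _ in range(n_steps):
            F = alpha * (P @ F) + (1.0 - alpha) * F0
        # Thresholding: each vertex joins the cluster with the largest score.
        labels = F.argmax(axis=1)
    return labels

# Toy graph (not from the paper): two triangles joined by one weak edge.
W = np.zeros((6, 6))
W[:3, :3] = 1.0
W[3:, 3:] = 1.0
np.fill_diagonal(W, 0.0)
W[2, 3] = W[3, 2] = 0.01
labels = diffuse_threshold_reseed(W, k=2)
```

Each column of `F` is updated independently, so the diffusion step parallelizes over clusters. The sketch keeps the number of seeds per cluster fixed; a schedule that changes it across rounds would only alter the `n_seeds` argument.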

## Notes

### Acknowledgements

XB is supported by NRF Fellowship NRFF2017-10.

## References

1. R. Andersen, F. Chung, K. Lang, Local graph partitioning using pagerank vectors, in *Proceedings of the 47th Annual Symposium on Foundations of Computer Science (FOCS ’06)* (2006), pp. 475–486
2. R. Arora, M. Gupta, A. Kapila, M. Fazel, Clustering by left-stochastic matrix factorization, in *International Conference on Machine Learning (ICML)* (2011), pp. 761–768
3. X. Bresson, T. Laurent, D. Uminsky, J. von Brecht, Multiclass total variation clustering, in *Advances in Neural Information Processing Systems (NIPS)* (2013)
4. X. Bresson, T. Laurent, A. Szlam, J.H. von Brecht, The product cut, in *Advances in Neural Information Processing Systems (NIPS)* (2016)
5. J. Bruna, S. Mallat, Invariant scattering convolution networks. IEEE Trans. Pattern Anal. Mach. Intell. **35**(8), 1872–1886 (2013)
6. I.S. Dhillon, Y. Guan, B. Kulis, Weighted graph cuts without eigenvectors: a multilevel approach. IEEE Trans. Pattern Anal. Mach. Intell. **29**(11), 1944–1957 (2007)
7. C. Garcia-Cardona, E. Merkurjev, A.L. Bertozzi, A. Flenner, A.G. Percus, Multiclass data segmentation using diffuse interface methods on graphs. IEEE Trans. Pattern Anal. Mach. Intell. **99**, 1 (2014)
8. G. Karypis, V. Kumar, A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput. **20**(1), 359–392 (1998)
9. S. Lafon, A.B. Lee, Diffusion maps and coarse-graining: a unified framework for dimensionality reduction, graph partitioning, and data set parameterization. IEEE Trans. Pattern Anal. Mach. Intell. **28**(9), 1393–1403 (2006)
10. F. Lin, W.W. Cohen, Power iteration clustering, in *ICML* (2010), pp. 655–662
11. L. Lovász, M. Simonovits, Random walks in a convex body and an improved volume algorithm. Random Struct. Algorithms **4**(4), 359–412 (1993)
12. A.K. McCallum, Bow: a toolkit for statistical language modeling, text retrieval, classification and clustering (1996). http://www.cs.cmu.edu/~mccallum/bow
13. D.A. Spielman, S.-H. Teng, Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems, in *Proceedings of the Thirty-Sixth Annual ACM Symposium on Theory of Computing* (2004), pp. 81–90
14. D.A. Spielman, S.-H. Teng, A local clustering algorithm for massive graphs and its application to nearly linear time graph partitioning. SIAM J. Comput. **42**(1), 1–26 (2013)
15. M. Stephane, Group invariant scattering. Commun. Pure Appl. Math. **65**(10), 1331–1398 (2012)
16. Z. Yang, T. Hao, O. Dikmen, X. Chen, E. Oja, Clustering by nonnegative matrix factorization using graph random walk, in *Advances in Neural Information Processing Systems (NIPS)* (2012), pp. 1088–1096
17. S.X. Yu, J. Shi, Multiclass spectral clustering, in *International Conference on Computer Vision* (2003)
18. X. Zhu, Z. Ghahramani, J. Lafferty, Semi-supervised learning using Gaussian fields and harmonic functions, in *ICML* (2003), pp. 912–919