
Statistics and Computing, Volume 29, Issue 2, pp 391–414

Minimum spectral connectivity projection pursuit

Divisive clustering using optimal projections for spectral clustering
  • David P. Hofmeyr
  • Nicos G. Pavlidis
  • Idris A. Eckley

Abstract

We study the problem of determining the optimal low-dimensional projection for maximising the separability of a binary partition of an unlabelled dataset, as measured by spectral graph theory. This is achieved by finding projections which minimise the second eigenvalue of the graph Laplacian of the projected data, which corresponds to a non-convex, non-smooth optimisation problem. We show that the optimal univariate projection based on spectral connectivity converges to the vector normal to the maximum margin hyperplane through the data as the scaling parameter is reduced to zero. This establishes a connection between connectivity as measured by spectral graph theory and maximal Euclidean separation. The computational cost associated with each eigenproblem is quadratic in the number of data points. To mitigate this cost, we propose an approximation method using microclusters with provable approximation error bounds. Combining multiple binary partitions within a divisive hierarchical model allows us to construct clustering solutions admitting clusters with varying scales and lying within different subspaces. We evaluate the performance of the proposed method on a large collection of benchmark datasets and find that it compares favourably with existing methods for projection pursuit and dimension reduction for data clustering. Applying the proposed approach for a decreasing sequence of scaling parameters allows us to obtain large margin clustering solutions, which are found to be competitive with those from dedicated maximum margin clustering algorithms.
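
As a concrete illustration of the objective described above, the following minimal sketch (not the authors' implementation) projects the data onto a candidate unit vector, builds a similarity graph on the projected points, and minimises the second-smallest eigenvalue of the resulting graph Laplacian over the projection direction. The Gaussian kernel, the scaling parameter sigma, the synthetic two-cluster dataset, and the use of Nelder–Mead as a generic stand-in for a dedicated non-smooth optimiser are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import pdist, squareform

def lambda2(v, X, sigma=1.0):
    """Second-smallest eigenvalue of the graph Laplacian of the projected data."""
    v = v / np.linalg.norm(v)             # optimise over the unit sphere
    p = X @ v                             # univariate projection of the data
    D = squareform(pdist(p[:, None]))     # pairwise distances: O(n^2) cost per evaluation
    W = np.exp(-D**2 / (2 * sigma**2))    # Gaussian similarity graph (illustrative choice)
    L = np.diag(W.sum(axis=1)) - W        # unnormalised graph Laplacian
    return np.linalg.eigvalsh(L)[1]       # eigenvalues in ascending order; index 1 is lambda_2

# Two well-separated Gaussian clusters in two dimensions (synthetic example).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])

# Nelder-Mead is a crude stand-in for dedicated non-smooth optimisers;
# the objective is non-convex and non-smooth in v.
res = minimize(lambda2, x0=rng.standard_normal(2), args=(X,), method="Nelder-Mead")
print("optimal projection direction:", res.x / np.linalg.norm(res.x))
```

Per the abstract, repeating this minimisation for a decreasing sequence of sigma values (e.g. initialising each run at the previous solution) drives the recovered direction towards the normal of the maximum margin hyperplane.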

Keywords

Spectral clustering · Dimension reduction · Projection pursuit · Maximum margin


Acknowledgements

The authors would like to thank the anonymous reviewers for their insightful recommendations, which helped improve the quality of the paper. They would also like to thank Dr. Teemu Roos for his valuable comments on this work. Finally, they are very grateful to Dr. Kai Zhang for providing code to implement the iSVR algorithm.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • David P. Hofmeyr (1), corresponding author
  • Nicos G. Pavlidis (2)
  • Idris A. Eckley (3)

  1. Department of Statistics and Actuarial Science, Stellenbosch University, Stellenbosch, South Africa
  2. Department of Management Science, Lancaster University, Lancaster, UK
  3. Department of Mathematics and Statistics, Lancaster University, Lancaster, UK
