Knowledge and Information Systems, Volume 51, Issue 3, pp 991–1021

Finding multiple stable clusterings

  • Juhua Hu
  • Qi Qian
  • Jian Pei
  • Rong Jin
  • Shenghuo Zhu
Regular Paper

Abstract

Multi-clustering, which aims to find multiple independent ways to partition a data set into groups, has many applications, such as customer relationship management, bioinformatics and healthcare informatics. This paper addresses two fundamental questions in multi-clustering: how to model the quality of clusterings and how to find multiple stable clusterings (MSC). We introduce to multi-clustering the notion of clustering stability based on the Laplacian eigengap, which was originally used by the regularized spectral learning method for similarity matrix learning. We mathematically prove that the larger the eigengap, the more stable the clustering. Furthermore, we propose a novel multi-clustering method, MSC. One advantage of our method over state-of-the-art multi-clustering methods is that it provides users with a feature subspace to understand each clustering solution. Another advantage is that MSC does not require users to specify the number of clusters or the number of alternative clusterings, which is usually difficult to do without guidance. Our method can heuristically estimate the number of stable clusterings in a data set. We also discuss a practical way to make MSC applicable to large-scale data. We report an extensive empirical study that clearly demonstrates the effectiveness of our method.
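
To make the stability criterion concrete, the following is a minimal sketch (not the authors' implementation) of how a Laplacian eigengap score could be computed for a candidate feature subspace. It assumes a Gaussian similarity matrix and the symmetric normalized Laplacian; the function name, the choice of similarity, and the bandwidth parameter are illustrative only.

import numpy as np
from scipy.spatial.distance import pdist, squareform

def laplacian_eigengap(X, k, sigma=1.0):
    """Eigengap of the normalized graph Laplacian built on X.

    Per the stability notion above, a larger gap between the k-th and
    (k+1)-th smallest eigenvalues suggests a more stable k-way clustering.
    The Gaussian similarity and sigma are assumptions, not the paper's exact setup.
    """
    # Pairwise Gaussian similarities (affinity matrix W), diagonal zeroed
    D2 = squareform(pdist(X, 'sqeuclidean'))
    W = np.exp(-D2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(X)) - (W * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]

    eigvals = np.sort(np.linalg.eigvalsh(L))
    return eigvals[k] - eigvals[k - 1]  # lambda_{k+1} - lambda_k

# Usage sketch: score a hypothetical feature subspace by its eigengap
# X_sub = data[:, subspace_indices]
# stability = laplacian_eigengap(X_sub, k=3)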

Keywords

Multi-clustering · Clustering stability · Laplacian eigengap · Feature subspace

Copyright information

© Springer-Verlag London 2016

Authors and Affiliations

  • Juhua Hu (1)
  • Qi Qian (2)
  • Jian Pei (1)
  • Rong Jin (2)
  • Shenghuo Zhu (2)

  1. School of Computing Science, Simon Fraser University, Burnaby, Canada
  2. Alibaba Group, Bellevue, USA
