Knowledge and Information Systems, Volume 52, Issue 1, pp 179–219

rFILTA: relevant and nonredundant view discovery from collections of clusterings via filtering and ranking

  • Yang Lei
  • Nguyen Xuan Vinh
  • Jeffrey Chan
  • James Bailey
Regular Paper

Abstract

Meta-clustering is a popular approach for finding multiple clusterings in a dataset, taking a large number of base clusterings as input for further user navigation and refinement. However, the effectiveness of meta-clustering is highly dependent on the distribution of the base clusterings, and open challenges exist with regard to its stability and noise tolerance. In addition, the clustering views returned may not all be relevant; hence, there is an open challenge regarding how to rank those clustering views. In this paper, we propose a simple and effective filtering algorithm that can be flexibly used in conjunction with any meta-clustering method. In addition, we propose an unsupervised method to rank the returned clustering views. We evaluate the framework (rFILTA) on both synthetic and real-world datasets, and show how its use can enhance clustering view discovery in complex scenarios.
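
To make the idea of filtering base clusterings concrete, the following is a minimal illustrative sketch and not the rFILTA algorithm itself, whose criteria are not given in this abstract. It assumes that each base clustering is a list of labels over the same points, that a per-clustering quality score is supplied by the caller, and that redundancy is measured by normalised mutual information. The function names and the trade-off weight `beta` are hypothetical.

```python
from collections import Counter
from math import log, sqrt


def entropy(labels):
    """Shannon entropy (in nats) of a cluster labelling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two labellings of the same points."""
    n = len(a)
    count_a, count_b, joint = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def nmi(a, b):
    """Normalised mutual information; 0.0 if either labelling is trivial."""
    denom = sqrt(entropy(a) * entropy(b))
    return mutual_information(a, b) / denom if denom > 0 else 0.0


def filter_clusterings(clusterings, quality, k, beta=1.0):
    """Greedily select up to k clusterings, trading quality against redundancy.

    clusterings: list of label lists over the same points.
    quality: one score per clustering (assumed to be provided by the caller).
    beta: hypothetical weight on the redundancy penalty.
    Returns the indices of the selected clusterings in selection order.
    """
    # Start from the single highest-quality clustering.
    selected = [max(range(len(clusterings)), key=lambda i: quality[i])]
    while len(selected) < min(k, len(clusterings)):

        def score(i):
            # Average similarity to what has already been selected.
            redundancy = sum(
                nmi(clusterings[i], clusterings[j]) for j in selected
            ) / len(selected)
            return quality[i] - beta * redundancy

        candidates = [i for i in range(len(clusterings)) if i not in selected]
        selected.append(max(candidates, key=score))
    return selected


base = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 1]]
print(filter_clusterings(base, quality=[0.9, 0.85, 0.6], k=2))
```

In this toy input the second clustering is a relabelling of the first, so the redundancy penalty steers the selection towards the third clustering despite its lower quality score. The selection order could also serve as a crude ranking, although the paper proposes its own unsupervised ranking method.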

Keywords

Clustering, Meta-clustering, Multiple clusterings, Clustering visualization, Clustering filtering, Clustering ranking

Copyright information

© Springer-Verlag London 2016

Authors and Affiliations

  1. Department of Computing and Information Systems, University of Melbourne, Parkville, Australia
  2. School of Science (Computer Science), RMIT University, Melbourne, Australia
