# rFILTA: relevant and nonredundant view discovery from collections of clusterings via filtering and ranking

## Abstract

Meta-clustering is a popular approach for finding multiple clusterings in a dataset. It takes a large number of base clusterings as input for further user navigation and refinement. However, its effectiveness depends heavily on the distribution of the base clusterings, and open challenges remain regarding its stability and noise tolerance. In addition, the clustering views returned may not all be relevant, so how to rank those views is also an open challenge. In this paper, we propose a simple and effective filtering algorithm that can be used flexibly in conjunction with any meta-clustering method. We also propose an unsupervised method to rank the returned clustering views. We evaluate the framework (rFILTA) on both synthetic and real-world datasets and show how its use can enhance clustering view discovery in complex scenarios.

## Keywords

Clustering, Meta-clustering, Multiple clusterings, Clustering visualization, Clustering filtering, Clustering ranking