Abstract
Ensemble and consensus clustering address the problem of unifying multiple clustering results into a single output that best reflects the agreement among the input methods. They can yield more stable and robust results than a single clustering approach. In this study, we propose a novel subset selection method that controls the number of clustering inputs and datasets in an efficient way. We propose a number of manual selection and heuristic search techniques to perform the selection. Our investigation and experiments show promising results. These techniques can support better selection of methods and datasets for ensemble and consensus clustering, and thus more efficient clustering.
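To make the subset selection framing concrete, the following is a minimal sketch, not the method used in the chapter. It assumes the inputs are label vectors over the same objects, scores a candidate subset by its mean pairwise Rand index, and searches with a simple swap-based hill climb. The agreement measure, the search strategy, and all function names are illustrative assumptions.

```python
from itertools import combinations


def rand_index(a, b):
    """Fraction of object pairs on which labelings a and b agree."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def mean_agreement(subset, clusterings):
    """Average pairwise Rand index among the selected clusterings."""
    pairs = list(combinations(sorted(subset), 2))
    total = sum(rand_index(clusterings[i], clusterings[j]) for i, j in pairs)
    return total / len(pairs)


def hill_climb_select(clusterings, k):
    """Steepest-ascent swap search for k inputs with high mutual agreement.

    Assumes 2 <= k <= len(clusterings). The starting subset is arbitrary.
    """
    everything = set(range(len(clusterings)))
    current = set(range(k))
    best = mean_agreement(current, clusterings)
    improved = True
    while improved:
        improved = False
        best_swap = None
        # Try every single swap (one input out, one input in).
        for out in current:
            for inn in everything - current:
                candidate = (current - {out}) | {inn}
                score = mean_agreement(candidate, clusterings)
                if score > best:
                    best, best_swap = score, candidate
        if best_swap is not None:
            current, improved = best_swap, True
    return sorted(current), best


# Hypothetical toy inputs: three relabelings of one partition plus two others.
clusterings = [
    [0, 0, 0, 1, 1, 1],
    [0, 1, 0, 1, 0, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 2, 2],
    [2, 2, 2, 0, 0, 0],
]
selected, score = hill_climb_select(clusterings, k=3)
print(selected, score)
```

On this toy input the search keeps the three mutually consistent labelings. A hill climb of this kind can stop at a local optimum, which is one reason other heuristic searches may be preferred on larger input sets.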
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Ayed, S., Arzoky, M., Swift, S., Counsell, S., Tucker, A. (2019). An Exploratory Study of the Inputs for Ensemble Clustering Technique as a Subset Selection Problem. In: Arai, K., Kapoor, S., Bhatia, R. (eds) Intelligent Systems and Applications. IntelliSys 2018. Advances in Intelligent Systems and Computing, vol 868. Springer, Cham. https://doi.org/10.1007/978-3-030-01054-6_72
DOI: https://doi.org/10.1007/978-3-030-01054-6_72
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01053-9
Online ISBN: 978-3-030-01054-6
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)