An Exploratory Study of the Inputs for Ensemble Clustering Technique as a Subset Selection Problem

Ayed, Samy; Arzoky, Mahir; Swift, Stephen; Counsell, Steve; Tucker, Allan

doi:10.1007/978-3-030-01054-6_72


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 868)

Included in the following conference series:

Proceedings of SAI Intelligent Systems Conference

Abstract

Ensemble and Consensus Clustering unify multiple clustering results into a single output that best reflects the agreement among the input methods. They can yield more stable and robust clustering results than a single clustering approach. In this study, we propose a novel subset selection method that controls the number of clustering inputs and datasets efficiently. We propose several manual selection and heuristic search techniques to perform the selection. Our experiments show promising results: these techniques can improve the selection of methods and datasets for Ensemble and Consensus Clustering, and thus lead to more efficient clustering.
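
As a rough illustration of what treating the inputs as a subset selection problem could look like, the sketch below picks a fixed-size subset of input clusterings by heuristic search. The abstract does not state the objective function or the search procedure used in the chapter, so everything here is an assumption: the Rand index as the agreement measure, mean pairwise agreement as the objective, steepest-ascent hill climbing as the heuristic, and the names `rand_index`, `mean_agreement` and `hill_climb_subset` are hypothetical.

```python
from itertools import combinations


def rand_index(a, b):
    """Fraction of item pairs that two clusterings treat the same way
    (both put the pair together, or both keep it apart)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def mean_agreement(clusterings, subset):
    """Assumed objective: average pairwise Rand index over the selected inputs."""
    pairs = list(combinations(sorted(subset), 2))
    if not pairs:
        return 0.0
    total = sum(rand_index(clusterings[i], clusterings[j]) for i, j in pairs)
    return total / len(pairs)


def hill_climb_subset(clusterings, size):
    """Steepest-ascent hill climbing over subsets of a fixed size.
    Each move swaps one selected input for one unselected input."""
    n = len(clusterings)
    current = set(range(size))  # arbitrary deterministic starting subset
    best = mean_agreement(clusterings, current)
    improved = True
    while improved:
        improved = False
        best_candidate = None
        for out_idx in sorted(current):
            for in_idx in range(n):
                if in_idx in current:
                    continue
                candidate = (current - {out_idx}) | {in_idx}
                score = mean_agreement(clusterings, candidate)
                if score > best:  # keep the best strictly improving swap
                    best, best_candidate = score, candidate
                    improved = True
        if improved:
            current = best_candidate
    return sorted(current), best


# Toy data (made up for illustration): five clusterings of six items.
# Inputs 1, 3 and 4 describe the same partition under different labels.
inputs = [
    [0, 1, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [1, 1, 1, 0, 0, 0],
    [2, 2, 2, 5, 5, 5],
]
selected, score = hill_climb_subset(inputs, size=3)
print(selected, score)
```

On this toy data the search moves from the starting subset to the three mutually consistent inputs. A different objective (for example one that rewards diversity as well as agreement) or a different heuristic would slot into the same structure by replacing `mean_agreement` or the swap loop.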

Author information

Authors and Affiliations

Brunel University London, Middlesex, UK
Samy Ayed, Mahir Arzoky, Stephen Swift, Steve Counsell & Allan Tucker

Corresponding author

Correspondence to Samy Ayed.

Editor information

Editors and Affiliations

Faculty of Science and Engineering, Saga University, Saga, Japan
Kohei Arai
The Science and Information (SAI) Organization, Bradford, UK
Supriya Kapoor
The Science and Information (SAI) Organization, Bradford, UK
Rahul Bhatia

About this paper

Cite this paper

Ayed, S., Arzoky, M., Swift, S., Counsell, S., Tucker, A. (2019). An Exploratory Study of the Inputs for Ensemble Clustering Technique as a Subset Selection Problem. In: Arai, K., Kapoor, S., Bhatia, R. (eds) Intelligent Systems and Applications. IntelliSys 2018. Advances in Intelligent Systems and Computing, vol 868. Springer, Cham. https://doi.org/10.1007/978-3-030-01054-6_72

DOI: https://doi.org/10.1007/978-3-030-01054-6_72
Published: 09 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01053-9
Online ISBN: 978-3-030-01054-6
eBook Packages: Intelligent Technologies and Robotics (R0)