An Exploratory Study of the Inputs for Ensemble Clustering Technique as a Subset Selection Problem

  • Conference paper
Intelligent Systems and Applications (IntelliSys 2018)

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 868))

Abstract

Ensemble and Consensus Clustering address the problem of unifying multiple clustering results into a single output that best reflects the agreement among the input methods. They can yield more stable and robust clustering results than a single clustering approach. In this study, we propose a novel subset selection method that controls the number of clustering inputs and datasets in an efficient way. We propose a number of manual selection and heuristic search techniques to perform the selection. Our investigation and experiments show promising results. These techniques can improve the selection of methods and datasets for Ensemble and Consensus Clustering, and thus lead to more efficient clustering results.
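
The idea of treating the ensemble inputs as a subset selection problem can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not state the objective or the search procedure, so the Rand index as the agreement measure, the exhaustive search, the function names (`rand_index`, `mean_agreement`, `select_subset`) and the toy data are all assumptions made for illustration.

```python
from itertools import combinations


def rand_index(a, b):
    """Fraction of item pairs that two clusterings treat the same way
    (both put the pair together, or both keep it apart)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def mean_agreement(clusterings, subset):
    """Average pairwise Rand index over the clusterings indexed by `subset`."""
    scores = [
        rand_index(clusterings[p], clusterings[q])
        for p, q in combinations(subset, 2)
    ]
    return sum(scores) / len(scores)


def select_subset(clusterings, size):
    """Assumed objective: keep the `size` inputs (size >= 2) that agree most.
    Exhaustive search is only feasible for small ensembles; a heuristic
    search would replace it for larger ones."""
    candidates = combinations(range(len(clusterings)), size)
    return max(candidates, key=lambda s: mean_agreement(clusterings, s))


# Hypothetical toy ensemble: four clusterings of six items.
clusterings = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same partition as the first, labels swapped
    [0, 0, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1],
]
best = select_subset(clusterings, 2)
print(best, mean_agreement(clusterings, best))
```

The selected inputs would then be passed to whichever consensus function is in use. Maximising agreement is only one possible criterion; a selection that balances agreement against diversity could be substituted by changing `mean_agreement`.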

Author information

Corresponding author

Correspondence to Samy Ayed.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Ayed, S., Arzoky, M., Swift, S., Counsell, S., Tucker, A. (2019). An Exploratory Study of the Inputs for Ensemble Clustering Technique as a Subset Selection Problem. In: Arai, K., Kapoor, S., Bhatia, R. (eds) Intelligent Systems and Applications. IntelliSys 2018. Advances in Intelligent Systems and Computing, vol 868. Springer, Cham. https://doi.org/10.1007/978-3-030-01054-6_72
