
Annals of Operations Research, Volume 276, Issue 1–2, pp 229–247

A weighted framework for unsupervised ensemble learning based on internal quality measures

  • Ramazan Ünlü
  • Petros Xanthopoulos
S.I.: Computational Biomedicine

Abstract

Unsupervised ensemble learning, or consensus clustering, consists of finding the optimal strategy for combining individual clusterings in a way that is robust with respect to the selection of the algorithmic clustering pool. Recently, an approach based on the concept of a consensus graph was proposed that has profound advantages over its predecessors. Despite its robustness, this approach assigns the same weight to the contribution of each clustering to the final solution. In this paper, we propose a weighting policy for this problem that is based on internal clustering quality measures, and we compare it against other popular approaches. Results on publicly available datasets show that the weights can significantly improve accuracy while retaining the robustness properties.
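
The idea of weighting each clustering by an internal quality measure can be illustrated with a minimal sketch. It is not the authors' consensus-graph method: it assumes the mean silhouette width as the internal measure, rescales it to [0, 1] to obtain non-negative weights, and accumulates a weighted co-association matrix. The function names `silhouette` and `weighted_coassociation`, the rescaling, and the toy data are all hypothetical choices made for illustration.

```python
from math import dist


def silhouette(points, labels):
    """Mean silhouette width of one clustering (an internal quality measure)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        return 0.0  # not defined for a single cluster; treat as neutral
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def weighted_coassociation(points, labelings):
    """Return (weights, matrix), where matrix[i][j] is the weighted share of
    clusterings that place points i and j in the same cluster."""
    n = len(points)
    # Assumption: shift silhouette from [-1, 1] to [0, 1] so weights are non-negative.
    raw = [(silhouette(points, lab) + 1) / 2 for lab in labelings]
    total = sum(raw)
    if total > 0:
        weights = [r / total for r in raw]
    else:
        weights = [1 / len(labelings)] * len(labelings)  # fall back to equal weights
    matrix = [[0.0] * n for _ in range(n)]
    for w, lab in zip(weights, labelings):
        for i in range(n):
            for j in range(n):
                if lab[i] == lab[j]:
                    matrix[i][j] += w
    return weights, matrix


# Toy data: two well-separated pairs of points and two candidate clusterings.
points = [(0, 0), (0, 1), (5, 5), (5, 6)]
good = [0, 0, 1, 1]  # groups nearby points together
bad = [0, 1, 0, 1]   # groups distant points together
weights, matrix = weighted_coassociation(points, [good, bad])
```

In this sketch the clustering with the higher silhouette receives the larger weight, so the pairs it groups together dominate the co-association matrix. A final consensus partition would then be obtained by clustering or partitioning that matrix, a step omitted here.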

Keywords

Consensus clustering, Unsupervised learning, Weighted consensus clustering, Graph partitioning

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2017

Authors and Affiliations

  1. University of Central Florida, Orlando, USA
  2. Decision and Information Science Department, Stetson University, DeLand, USA
