A weighted framework for unsupervised ensemble learning based on internal quality measures
Unsupervised ensemble learning, or consensus clustering, consists of finding the optimal strategy for combining individual clusterings so that the result is robust to the choice of algorithms in the clustering pool. A recently proposed approach based on the concept of a consensus graph has substantial advantages over its predecessors. Despite its robustness, this approach assigns the same weight to each clustering's contribution to the final solution. In this paper, we propose a weighting policy for this problem based on internal clustering quality measures and compare it against other popular approaches. Results on publicly available datasets show that weighting can significantly improve accuracy while retaining the robustness properties.
Keywords: Consensus clustering, Unsupervised learning, Weighted consensus clustering, Graph partitioning
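As a rough illustration of weighting clusterings in a consensus graph, the sketch below builds a weighted co-association matrix in plain Python. It is not the paper's algorithm: the function name `weighted_coassociation`, the normalization of scores into weights, and the toy labelings and scores are all assumptions made for this example. The scores stand in for any internal quality measure where higher is better.

```python
from itertools import combinations


def weighted_coassociation(labelings, scores):
    """Build a weighted co-association (consensus graph) matrix.

    labelings: list of label lists, one per base clustering, equal length.
    scores: one non-negative internal quality score per clustering
            (assumed "higher is better"; hypothetical values here).
    """
    if len(labelings) != len(scores):
        raise ValueError("need exactly one score per clustering")
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must sum to a positive value")
    # Assumption: weights are simply the scores normalized to sum to 1.
    weights = [s / total for s in scores]

    n = len(labelings[0])
    graph = [[0.0] * n for _ in range(n)]
    for labels, w in zip(labelings, weights):
        # Each clustering adds its weight to every pair it co-clusters.
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                graph[i][j] += w
                graph[j][i] += w
    return graph


# Toy example: three clusterings of four points, with made-up scores.
labelings = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
scores = [6, 1, 3]
graph = weighted_coassociation(labelings, scores)
```

Setting all scores equal recovers the unweighted case, where every clustering contributes the same amount to each edge. The final consensus partition would then come from partitioning this graph, a step this sketch leaves out.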