Machine Learning, Volume 98, Issue 1–2, pp 57–91

A flexible cluster-oriented alternative clustering algorithm for choosing from the Pareto front of solutions

Abstract

Supervised alternative clustering is the problem of finding a set of clusterings which are of high quality and different from a given negative clustering. The task is therefore a clear multi-objective optimization problem. Optimizing two conflicting objectives at the same time requires dealing with trade-offs. Most approaches in the literature optimize these objectives sequentially (one objective after the other) or indirectly (by some heuristic combination of the objectives). Solving a multi-objective optimization problem in these ways can result in solutions that are dominated rather than Pareto-optimal. We develop a direct algorithm, called COGNAC, which fully acknowledges the multiple objectives, optimizes them directly and simultaneously, and produces solutions approximating the Pareto front. COGNAC performs recombination at the cluster level instead of at the object level, as in traditional genetic algorithms. It accepts arbitrary clustering-quality and dissimilarity objectives and provides solutions dominating those obtained by other state-of-the-art algorithms. Based on COGNAC, we propose another algorithm, called SGAC, for the sequential generation of alternative clusterings, where each newly found alternative clustering is guaranteed to be different from all previous ones. Experimental results on widely used benchmarks demonstrate the advantages of our approach.
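The two core ideas named above, judging candidate clusterings by Pareto dominance over a quality objective and a dissimilarity objective, and recombining parent clusterings cluster-by-cluster rather than object-by-object, can be illustrated with a minimal sketch. The Python code below is not the authors' implementation: the 50% inheritance rate, the repair policy for uncovered objects, and all function names are illustrative assumptions, and the paper defines the actual operators.

```python
# Minimal sketch (illustrative assumptions, not the paper's exact operators) of:
#  - a Pareto-dominance test over (quality, dissimilarity) objectives, and
#  - a cluster-oriented crossover that recombines whole clusters instead of
#    individual object assignments.
import random

def dominates(a, b):
    """True if objective vector `a` Pareto-dominates `b` (maximization assumed)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def cluster_crossover(parent1, parent2, n_objects, rng=random):
    """Build a child clustering by inheriting whole clusters from both parents.

    `parent1`, `parent2`: partitions of {0, ..., n_objects-1}, each given as a
    list of sets of object indices.
    """
    child = []
    assigned = set()
    # Inherit roughly half of parent1's clusters intact (rate is an assumption).
    for cluster in parent1:
        if rng.random() < 0.5:
            child.append(set(cluster))
            assigned |= cluster
    # Cover remaining objects with parent2's clusters restricted to the
    # still-unassigned objects.
    for cluster in parent2:
        remainder = cluster - assigned
        if remainder:
            child.append(remainder)
            assigned |= remainder
    # Repair step: any object still unassigned joins a random existing cluster.
    for obj in range(n_objects):
        if obj not in assigned:
            rng.choice(child).add(obj)
    return child

if __name__ == "__main__":
    p1 = [{0, 1, 2}, {3, 4}, {5, 6, 7}]
    p2 = [{0, 3, 5}, {1, 4, 6}, {2, 7}]
    print(cluster_crossover(p1, p2, n_objects=8))
    print(dominates((0.9, 0.8), (0.7, 0.8)))  # True: better in one objective, no worse in the other
```

In a complete multi-objective loop, `dominates` would drive the selection of non-dominated solutions approximating the Pareto front, while `cluster_crossover` would replace the standard object-level crossover of a traditional genetic algorithm.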

Keywords

Alternative clustering · Multi-objective optimization · Cluster-oriented recombination · Genetic algorithms


Copyright information

© The Author(s) 2013

Authors and Affiliations

  1. Department of Information Engineering and Computer Science, University of Trento, Trento, Italy
