A flexible cluster-oriented alternative clustering algorithm for choosing from the Pareto front of solutions
Supervised alternative clustering is the problem of finding a set of clusterings that are of high quality and different from a given negative clustering. The task is therefore inherently a multi-objective optimization problem, and optimizing two conflicting objectives at the same time requires dealing with trade-offs. Most approaches in the literature optimize these objectives sequentially (one objective after the other) or indirectly (through some heuristic combination of the objectives). Solving a multi-objective optimization problem in these ways can yield dominated solutions, that is, solutions that are not Pareto-optimal. We develop a direct algorithm, called COGNAC, which fully acknowledges the multiple objectives, optimizes them directly and simultaneously, and produces solutions approximating the Pareto front. COGNAC applies the recombination operator at the cluster level rather than at the object level, as traditional genetic algorithms do. It can accept arbitrary clustering quality and dissimilarity objectives, and it provides solutions that dominate those obtained by other state-of-the-art algorithms. Based on COGNAC, we propose a second algorithm, called SGAC, for the sequential generation of alternative clusterings, in which each newly found alternative clustering is guaranteed to be different from all previous ones. Experimental results on widely used benchmarks demonstrate the advantages of our approach.
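The Pareto-dominance criterion described in the abstract can be made concrete with a small sketch. It assumes two objectives that are not taken from the paper: quality is measured as the negative within-cluster sum of squared errors (SSE), and dissimilarity is one minus the Rand index against the negative clustering. The abstract only says that COGNAC accepts arbitrary objectives. The helpers `sse`, `rand_index`, `objectives`, `dominates` and `pareto_front` are hypothetical names. The sketch filters a fixed list of candidate clusterings and does not evolve them, so it shows which solutions count as non-dominated. It does not implement COGNAC's cluster-level recombination.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Labels = Sequence[int]


def rand_index(a: Labels, b: Labels) -> float:
    """Fraction of object pairs on which two labelings agree (same vs. different cluster)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum(1 for i, j in pairs if (a[i] == a[j]) == (b[i] == b[j]))
    return agree / len(pairs)


def sse(points: Sequence[Point], labels: Labels) -> float:
    """Within-cluster sum of squared Euclidean distances to each cluster centroid."""
    clusters: Dict[int, List[Point]] = {}
    for p, k in zip(points, labels):
        clusters.setdefault(k, []).append(p)
    total = 0.0
    for members in clusters.values():
        cx = sum(x for x, _ in members) / len(members)
        cy = sum(y for _, y in members) / len(members)
        total += sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in members)
    return total


def objectives(points, labels, negative) -> Tuple[float, float]:
    """Assumed objective pair, both maximized:
    quality = -SSE, dissimilarity = 1 - Rand index w.r.t. the negative clustering."""
    return (-sse(points, labels), 1.0 - rand_index(labels, negative))


def dominates(u, v) -> bool:
    """True if u is no worse than v in every objective and strictly better in at least one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def pareto_front(candidates, points, negative):
    """Keep only the candidate clusterings that no other candidate dominates."""
    scores = [objectives(points, c, negative) for c in candidates]
    return [
        c
        for i, c in enumerate(candidates)
        if not any(dominates(scores[j], scores[i]) for j in range(len(candidates)) if j != i)
    ]


if __name__ == "__main__":
    # Four points in two columns (x = 0, x = 10) and two rows (y = 0, y = 1).
    points = [(0, 0), (0, 1), (10, 0), (10, 1)]
    negative = [0, 0, 1, 1]  # the given negative clustering: split by column
    candidates = [
        [0, 0, 1, 1],  # identical to the negative: best SSE, zero dissimilarity
        [0, 1, 0, 1],  # split by row: worse SSE, but different from the negative
        [0, 1, 1, 0],  # diagonal split: dominated by the row split
        [0, 0, 0, 0],  # single cluster: dominated by the row split
    ]
    print(pareto_front(candidates, points, negative))
    # → [[0, 0, 1, 1], [0, 1, 0, 1]]
```

Neither front member dominates the other. One is best in quality and the other is best in dissimilarity, and this is the trade-off the abstract describes. The `dominates` test only compares objective vectors, so you could replace the assumed SSE and Rand-based measures with other objectives without changing `pareto_front`. The pairwise filter costs O(n²) in the number of candidates, which is acceptable for an illustration.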
Keywords: Alternative clustering, Multi-objective optimization, Cluster-oriented recombination, Genetic algorithms