Evolutionary Multiobjective Clustering

  • Julia Handl
  • Joshua Knowles
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3242)

Abstract

A new approach to data clustering is proposed, in which two or more measures of cluster quality are simultaneously optimized using a multiobjective evolutionary algorithm (EA). For this purpose, the PESA-II EA is adapted for the clustering problem by the incorporation of specialized mutation and initialization procedures, described herein. Two conceptually orthogonal measures of cluster quality are selected for optimization, enabling, for the first time, a clustering algorithm to explore and improve different compromise solutions during the clustering process. Our results, on a diverse suite of 15 real and synthetic data sets – where the correct classes are known – demonstrate a clear advantage to the multiobjective approach: solutions in the discovered Pareto set are objectively better than those obtained when the same EA is applied to optimize just one measure. Moreover, the multiobjective EA exhibits a far more robust level of performance than both the classic k-means and average-link agglomerative clustering algorithms, outperforming them substantially on aggregate.
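The idea of keeping a Pareto set of clusterings, rather than a single optimum, can be illustrated with a minimal sketch. The abstract does not name the two quality measures, so the ones below (a compactness-style deviation and a nearest-neighbour connectivity penalty, both minimized) are stand-ins assumed for illustration. The function names are hypothetical, and the code only filters a fixed list of candidate partitions for non-dominated ones. It does not implement PESA-II or the specialized mutation and initialization procedures.

```python
from math import dist

def overall_deviation(points, labels):
    """Assumed measure 1: summed distance of each point to its cluster centroid."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        centroid = tuple(sum(axis) / len(members) for axis in zip(*members))
        total += sum(dist(p, centroid) for p in members)
    return total

def connectivity(points, labels, n_neighbors=2):
    """Assumed measure 2: penalty when a point's nearest neighbours sit in another cluster."""
    penalty = 0.0
    for i, point in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: dist(point, points[j]))
        for rank, j in enumerate(others[:n_neighbors], start=1):
            if labels[j] != labels[i]:
                penalty += 1.0 / rank  # closer neighbours weigh more
    return penalty

def dominates(a, b):
    """True if score a is no worse than b on every objective and better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points, candidates):
    """Keep only the candidate partitions that no other candidate dominates."""
    scored = [(labels, (overall_deviation(points, labels), connectivity(points, labels)))
              for labels in candidates]
    return [(labels, score) for labels, score in scored
            if not any(dominates(other, score) for _, other in scored)]

# Toy data: two well-separated pairs of points (illustrative only).
points = [(0, 0), (0, 1), (5, 5), (5, 6)]
candidates = [
    [0, 0, 1, 1],  # the two natural groups
    [0, 1, 0, 1],  # groups that cut across the natural ones
    [0, 0, 0, 0],  # one big cluster
    [0, 1, 2, 3],  # every point on its own
]
front = pareto_front(points, candidates)
for labels, (dev, conn) in front:
    print(labels, round(dev, 2), round(conn, 2))
```

On this toy data the partition that cuts across the natural groups is dominated and dropped. The single-cluster and all-singletons partitions remain on the front alongside the natural grouping, because each of them is best on one of the two assumed measures. That is the kind of compromise set a multiobjective search explores.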



Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Julia Handl (1)
  • Joshua Knowles (1)
  1. Department of Chemistry, UMIST, Manchester, UK
