An Investigation of Representations and Operators for Evolutionary Data Clustering with a Variable Number of Clusters
This paper analyses the properties of four alternative representation/operator combinations suitable for data clustering algorithms that keep the number of clusters variable. These representations are compared by their performance when used in a multiobjective evolutionary clustering algorithm (MOCK), which we have described previously. To shed light on the observed performance differences, we consider the relative size of the search space and the heuristic bias inherent to each representation, as well as its locality and heritability under the associated variation operators. We find that the representation that performs worst under random initialization is nevertheless the best overall performer given the heuristic initialization normally used in MOCK. This suggests that there are strong interaction effects between initialization, representation and operators in this problem.
Keywords: Pareto Front, Data Item, Cluster Solution, Cluster Membership, Initialization Scheme