An Innovative Approach to Genetic Programming-based Clustering

  • I. De Falco
  • E. Tarantino
  • A. Della Cioppa
  • F. Fontanella
Part of the Advances in Soft Computing book series (AINSC, volume 34)


Most classical clustering algorithms depend strongly on, and are sensitive to, parameters such as the number of expected clusters and the resolution level. To overcome this drawback, a Genetic Programming framework capable of performing automatic data clustering is presented. Moreover, a novel way of representing clusters, which provides intelligible information on patterns, is introduced together with an innovative clustering process. The effectiveness of the implemented partitioning system is estimated on a medical domain by means of evaluation indices.


Genetic Program; Evaluation Index; Data Cluster; Logical Formula; Genetic Program System
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer 2006

Authors and Affiliations

  • I. De Falco (1)
  • E. Tarantino (1)
  • A. Della Cioppa (2)
  • F. Fontanella (3)
  1. Institute of High Performance Computing and Networking – CNR, Naples, Italy
  2. Dept. of Computer Science and Electrical Engineering, University of Salerno, Fisciano, Italy
  3. Dept. of Information Engineering and Systems, University of Naples, Naples, Italy
