Abstract

Evolutionary Algorithms (EAs) are stochastic search algorithms inspired by the process of Darwinian evolution. The motivation for applying EAs to Data Mining is that they are robust, adaptive search techniques that perform a global search in the solution space. This chapter mainly reviews two kinds of EAs, namely Genetic Algorithms (GAs) and Genetic Programming (GP), and discusses how EAs can be applied to several Data Mining tasks: discovery of classification rules, clustering, attribute selection and attribute construction. It also discusses the basic idea of Multi-Objective EAs, which are based on the concept of Pareto dominance and likewise have applications in Data Mining.
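
As an illustration of the Pareto dominance concept named in the abstract, here is a minimal sketch. It is not taken from the chapter. The function names `dominates` and `pareto_front`, the convention that every objective is minimised, and the sample objective vectors (error rate, number of selected attributes) are all assumptions chosen for the example.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a Pareto-dominates b, assuming all objectives are minimised.

    a dominates b when a is no worse than b on every objective
    and strictly better on at least one.
    """
    if len(a) != len(b):
        raise ValueError("objective vectors must have the same length")
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Return the points that no other point dominates (quadratic-time scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidate solutions: (error rate, number of selected attributes).
candidates = [(0.10, 8), (0.12, 5), (0.10, 9), (0.20, 5), (0.15, 3)]
front = pareto_front(candidates)
print(front)
```

With these sample values, (0.10, 9) is dominated by (0.10, 8) and (0.20, 5) is dominated by (0.12, 5), so the remaining three candidates form the non-dominated set. A multi-objective EA would typically use a comparison of this kind during selection instead of a single scalar fitness value.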

Copyright information

© 2005 Springer Science+Business Media, Inc.

About this chapter

Cite this chapter

Freitas, A.A. (2005). Evolutionary Algorithms for Data Mining. In: Maimon, O., Rokach, L. (eds) Data Mining and Knowledge Discovery Handbook. Springer, Boston, MA. https://doi.org/10.1007/0-387-25465-X_20

  • DOI: https://doi.org/10.1007/0-387-25465-X_20

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-24435-8

  • Online ISBN: 978-0-387-25465-4

  • eBook Packages: Computer Science, Computer Science (R0)
