Abstract
This chapter discusses the use of evolutionary algorithms, particularly genetic algorithms and genetic programming, in data mining and knowledge discovery. We focus on the data mining task of classification. In addition, we discuss some preprocessing and postprocessing steps of the knowledge discovery process, focusing on attribute selection and pruning of an ensemble of classifiers. We show how the requirements of data mining and knowledge discovery influence the design of evolutionary algorithms. In particular, we discuss how individual representation, genetic operators and fitness functions have to be adapted for extracting high-level knowledge from data.
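The abstract notes that individual representation and fitness functions must be adapted to discover classification knowledge. The sketch below illustrates one way to do this. It is a minimal example under stated assumptions, not the chapter's method. It assumes that each individual encodes a single conjunctive rule (IF all conditions hold THEN predict a class). It also assumes that fitness is the product of the rule's sensitivity and specificity on the training data, which is one possible choice. The `Rule` class, the `rule_fitness` function, the (attribute, operator, value) condition encoding and the toy data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# Hypothetical encoding: a condition is (attribute, operator, value).
Condition = Tuple[str, str, object]


@dataclass
class Rule:
    """One individual (hypothetical): IF all conditions hold THEN predict target_class."""
    conditions: List[Condition]
    target_class: str

    def covers(self, example: Dict[str, object]) -> bool:
        # Conjunctive antecedent: every condition must be satisfied.
        for attr, op, value in self.conditions:
            x = example[attr]
            if op == "==":
                ok = x == value
            elif op == "<=":
                ok = x <= value
            elif op == ">":
                ok = x > value
            else:
                raise ValueError(f"unsupported operator: {op}")
            if not ok:
                return False
        return True


def rule_fitness(rule: Rule, examples: Sequence[Dict[str, object]],
                 labels: Sequence[str]) -> float:
    """Assumed fitness: sensitivity * specificity of the rule on the data."""
    tp = fp = fn = tn = 0
    for example, label in zip(examples, labels):
        predicted = rule.covers(example)        # the rule fires on this example
        actual = label == rule.target_class     # the example belongs to the rule's class
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity * specificity


if __name__ == "__main__":
    # Toy, made-up data for illustration only.
    examples = [
        {"age": 30, "smoker": "yes"},
        {"age": 60, "smoker": "yes"},
        {"age": 45, "smoker": "no"},
        {"age": 70, "smoker": "no"},
        {"age": 25, "smoker": "no"},
    ]
    labels = ["sick", "sick", "healthy", "sick", "healthy"]
    rule = Rule([("smoker", "==", "yes")], "sick")
    print(rule_fitness(rule, examples, labels))  # Se = 2/3, Sp = 1 → 0.6666666666666666
```

The product form penalizes overly general rules. A rule with no conditions covers every example, so its specificity, and therefore its fitness, is zero. In a full genetic algorithm, the genetic operators would add, remove or modify conditions, and this function would rank the resulting rules.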
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Freitas, A.A. (2003). A Survey of Evolutionary Algorithms for Data Mining and Knowledge Discovery. In: Ghosh, A., Tsutsui, S. (eds) Advances in Evolutionary Computing. Natural Computing Series. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-18965-4_33
DOI: https://doi.org/10.1007/978-3-642-18965-4_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-62386-8
Online ISBN: 978-3-642-18965-4
eBook Packages: Springer Book Archive