Part of the book series: Natural Computing Series (NCS)

Abstract

Clearly, the quality of discovered knowledge depends strongly on the quality of the data being mined. This has motivated the development of several algorithms for data preparation tasks, as discussed in Chapter 4.

“In reality, the boundary between pre-processor and classifier is arbitrary. If the pre-processor generated the predicted class label as a feature [attribute], then the classifier would be trivial. Similarly, the pre-processor could be trivial and the classifier do all the work.” [Sherrah et al. 1997, p. 305]

References

  1. J. Bala, K. De Jong, J. Huang, H. Vafaie and H. Wechsler. Hybrid learning using genetic algorithms and decision trees for pattern classification. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI ‘95), 719–724. 1995.

  2. J. Bala, K. De Jong, J. Huang, H. Vafaie and H. Wechsler. Using learning to facilitate the evolution of features for recognizing visual concepts. Evolutionary Computation 4(3), 297–312, 1996.

  3. K. Chen and H. Liu. Towards an evolutionary algorithm: a comparison of two feature selection algorithms. Proceedings of the Congress on Evolutionary Computation (CEC ‘99), 1309–1313. Washington D.C., 1999.

  4. S. Chen, C. Guerra-Salcedo and S.F. Smith. Non-standard crossover for a standard representation - commonality-based feature subset selection. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO ‘99), 129–134. Morgan Kaufmann, 1999.

  5. K.J. Cherkauer and J.W. Shavlik. Growing simpler decision trees to facilitate knowledge discovery. Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD ‘96), 315–318. AAAI Press, 1996.

  6. C. Emmanouilidis, A. Hunter and J. MacIntyre. A multiobjective evolutionary setting for feature selection and a commonality-based crossover operator. Proceedings of the 2000 Congress on Evolutionary Computation (CEC ‘2000), 309–316. IEEE, 2000.

  7. A.A. Freitas. The principle of transformation between efficiency and effectiveness: towards a fair evaluation of the cost-effectiveness of KDD techniques. Proceedings of the 1st European Symposium on Principles of Data Mining and Knowledge Discovery (PKDD ‘97). Lecture Notes in Artificial Intelligence 1263, 299–306. Springer, 1997.

  8. A.A. Freitas. A survey of evolutionary algorithms for data mining and knowledge discovery. To appear in: A. Ghosh and S. Tsutsui (Eds.) Advances in Evolutionary Computation. Springer, 2002.

  9. D. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, 1989.

  10. C. Guerra-Salcedo and D. Whitley. Genetic search for feature subset selection: a comparison between CHC and GENESIS. Genetic Programming 1998: Proceedings of the 3rd Annual Conference, 504–509. Morgan Kaufmann, 1998.

  11. C. Guerra-Salcedo and D. Whitley. Genetic approach to feature selection for ensemble creation. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO ‘99), 236–243. Morgan Kaufmann, 1999.

  12. C. Guerra-Salcedo and D. Whitley. Feature selection mechanisms for ensemble creation: a genetic search perspective. In: A.A. Freitas (Ed.) Data Mining with Evolutionary Algorithms: Research Directions - Papers from the AAAI ‘99/GECCO ‘99 Workshop. Technical Report WS-99-06, 13–17. AAAI Press, 1999.

  13. C. Guerra-Salcedo, S. Chen, D. Whitley and S. Smith. Fast and accurate feature selection using hybrid genetic strategies. Proceedings of the Congress on Evolutionary Computation (CEC ‘99), 177–184. Washington D.C., USA, 1999.

  14. Y-J. Hu. A genetic programming approach to constructive induction. Genetic Programming 1998: Proceedings of the 3rd Annual Conference, 146–151. Morgan Kaufmann, 1998.

  15. Y-J. Hu and D. Kibler. Generation of attributes for learning algorithms. Proceedings of the 1996 National Conference on Artificial Intelligence (AAAI ‘96), 806–811. AAAI Press, 1996.

  16. H. Ishibuchi and T. Nakashima. Multi-objective pattern and feature selection by a genetic algorithm. Proceedings of the 2000 Genetic and Evolutionary Computation Conference (GECCO ‘2000), 1069–1076. Morgan Kaufmann, 2000.

  17. Y. Kim, W.N. Street and F. Menczer. Feature selection in unsupervised learning via evolutionary search. Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ‘2000), 365–369. ACM, 2000.

  18. M. Kudo and J. Sklansky. Comparison of algorithms that select features for pattern classifiers. Pattern Recognition 33, 25–41, 2000.

  19. I. Kuscu. Evolution of learning rules for hard learning problems. Proceedings of the 5th Annual Evolutionary Programming Conference. MIT Press, 1996.

  20. I. Kuscu. A genetic constructive induction model. Proceedings of the Congress on Evolutionary Computation (CEC ‘99), 212–217. Washington D.C., 1999.

  21. I. Kuscu. Generalisation and domain specific functions in genetic programming. Proceedings of the Congress on Evolutionary Computation (CEC ‘2000), 1393–1400. IEEE, 2000.

  22. X. Llora and J.M. Garrell. Inducing partially-defined instances with evolutionary algorithms. Proceedings of the 18th International Conference on Machine Learning (ICML ‘2001), 337–344. Morgan Kaufmann, 2001.

  23. M.J. Martin-Bautista and M.-A. Vila. A survey of genetic feature selection in mining issues. Proceedings of the Congress on Evolutionary Computation (CEC ‘99), 1314–1321. IEEE, 1999.

  24. F. Menczer, M. Degeratu and W.N. Street. Efficient and scalable Pareto optimization by evolutionary local selection algorithms. Evolutionary Computation 8(2), 223–247, 2000.

  25. A. Moser and M.N. Murty. On the scalability of genetic algorithms to very large-scale feature selection. Proceedings of the Real-World Applications of Evolutionary Computing (EvoWorkshops 2000). Lecture Notes in Computer Science 1803, 77–86. Springer, 2000.

  26. W.F. Punch, E.D. Goodman, M. Pei, L. Chia-Shun, P. Hovland and R. Enbody. Further research on feature selection and classification using genetic algorithms. Proceedings of the 5th International Conference on Genetic Algorithms (ICGA ‘93), 557–564. 1993.

  27. J.R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, 1993.

  28. H. Ragavan, L. Rendell, M. Shaw and A. Tessmer. Complex concept acquisition through direct search and feature caching. Proceedings of the 13th International Joint Conference on Artificial Intelligence (IJCAI ‘93), 946–951. 1993.

  29. M.L. Raymer, W.F. Punch, E.D. Goodman and L.A. Kuhn. Genetic programming for improved data mining - application to the biochemistry of protein interactions. Genetic Programming 1996: Proceedings of the 1st Annual Conference, 375–380. Morgan Kaufmann, 1996.

  30. M.L. Raymer, W.F. Punch, E.D. Goodman, P.C. Sanschagrin and L.A. Kuhn. Simultaneous feature scaling and selection using a genetic algorithm. Proceedings of the 7th International Conference on Genetic Algorithms (ICGA ‘97), 561–567. Morgan Kaufmann, 1997.

  31. M.L. Raymer, W.F. Punch, E.D. Goodman, L.A. Kuhn and A.K. Jain. Dimensionality reduction using genetic algorithms. IEEE Transactions on Evolutionary Computation 4(2), 164–171, 2000.

  32. P.K. Sharpe and R.P. Glover. Efficient GA based techniques for classification. Applied Intelligence 11, 277–284, 1999.

  33. J.R. Sherrah, R.E. Bogner and A. Bouzerdoum. The evolutionary pre-processor: automatic feature extraction for supervised classification using genetic programming. Genetic Programming 1997: Proceedings of the 2nd Annual Conference (GP ‘97), 304–312. Morgan Kaufmann, 1997.

  34. T. Terano and Y. Ishino. Interactive genetic algorithm based feature selection and its application to marketing data analysis. In: H. Liu and H. Motoda (Eds.) Feature Extraction, Construction and Selection, 393–406. Kluwer, 1998.

  35. S. Thompson. Pruning boosted classifiers with a real valued genetic algorithm. Research and Development in Expert Systems XV - Proceedings of ES’98, 133–146. Springer, 1998.

  36. S. Thompson. Genetic algorithms as postprocessors for data mining. In: A.A. Freitas (Ed.) Data Mining with Evolutionary Algorithms: Research Directions - Papers from the AAAI Workshop, 18–22. Technical Report WS-99-06. AAAI Press, 1999.

  37. H. Vafaie and K. De Jong. Evolutionary feature space transformation. In: H. Liu and H. Motoda (Eds.) Feature Extraction, Construction and Selection, 307–323. Kluwer, 1998.

  38. K. Wang and S. Sundaresh. Selecting features by vertical compactness of data. In: H. Liu and H. Motoda (Eds.) Feature Extraction, Construction and Selection, 71–84. Kluwer, 1998.

  39. J. Yang and V. Honavar. Feature subset selection using a genetic algorithm. Genetic Programming 1997: Proceedings of the 2nd Annual Conference (GP ‘97), 380–385. Morgan Kaufmann, 1997.

  40. J. Yang and V. Honavar. Feature subset selection using a genetic algorithm. In: H. Liu and H. Motoda (Eds.) Feature Extraction, Construction and Selection, 117–136. Kluwer, 1998.

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Freitas, A.A. (2002). Evolutionary Algorithms for Data Preparation. In: Data Mining and Knowledge Discovery with Evolutionary Algorithms. Natural Computing Series. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-04923-5_9

  • DOI: https://doi.org/10.1007/978-3-662-04923-5_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-07763-0

  • Online ISBN: 978-3-662-04923-5

  • eBook Packages: Springer Book Archive
