Computational Results on the Automatic Design of Full Rule Induction Algorithms

Automating the Design of Data Mining Algorithms

Part of the book series: Natural Computing Series (NCS)

Abstract

Chapter 6, “Computational Results on the Automatic Design of Full Rule Induction Algorithms”, reports the results of extensive experiments performed to evaluate the proposed genetic programming system. The chapter is divided into two parts.

In the first part, the experiments evaluate the system’s effectiveness in producing rule induction algorithms that are robust across different application domains. This part covers six issues: (a) we investigate the system’s sensitivity to the values of some parameters (such as the crossover and mutation rates, and the number of datasets used to “train” the system); (b) we compare the predictive accuracies obtained by the automatically evolved rule induction algorithms with those obtained by well-known, human-designed baseline rule induction algorithms; (c) we discuss to what extent the automatically evolved algorithms differ from the manually designed ones; (d) we investigate the system’s sensitivity to variations in the grammar; (e) we compare the effectiveness of genetic programming and hill-climbing search as methods for searching the space of candidate rule induction algorithms; and (f) we evaluate a multiobjective version of the proposed genetic programming system, based on the concept of Pareto optimality.

In the second part, the experiments evaluate the system’s effectiveness in producing rule induction algorithms tailored to a particular application domain. We first report the results of experiments evolving rule induction algorithms tailored to individual datasets from the well-known UCI (University of California at Irvine) dataset repository, and then report the results of experiments on more challenging bioinformatics datasets involving protein function prediction. A brief overview of the protein function prediction problem is included in this part to make the chapter more self-contained.
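Item (f) of the abstract rests on the concept of Pareto optimality. As a rough illustration only, the sketch below shows how Pareto dominance can be used to filter a set of candidates scored on two objectives. The two objectives used here (predictive accuracy to be maximised, rule-set size to be minimised), the function names, and the sample scores are all assumptions made for this example; they are not taken from the chapter and do not reproduce the authors' system.

```python
from typing import List, Tuple

# Hypothetical candidate score: (accuracy, rule_set_size).
# Assumption for this sketch: accuracy is maximised, size is minimised.
Candidate = Tuple[float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b on both objectives
    and strictly better on at least one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Made-up scores, purely illustrative.
candidates = [(0.90, 12), (0.85, 5), (0.80, 9), (0.90, 15)]
front = pareto_front(candidates)
print(front)
```

With these made-up scores, the candidate `(0.80, 9)` is dominated by `(0.85, 5)` and `(0.90, 15)` is dominated by `(0.90, 12)`, so neither appears in the front. The two remaining candidates trade accuracy against size, and neither is better than the other on both objectives.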


Author information

Correspondence to Gisele L. Pappa.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Pappa, G.L., Freitas, A.A. (2010). Computational Results on the Automatic Design of Full Rule Induction Algorithms. In: Automating the Design of Data Mining Algorithms. Natural Computing Series. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02541-9_6

  • DOI: https://doi.org/10.1007/978-3-642-02541-9_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-02540-2

  • Online ISBN: 978-3-642-02541-9

  • eBook Packages: Computer Science, Computer Science (R0)
