Computational Results on the Automatic Design of Full Rule Induction Algorithms

Pappa, Gisele L.; Freitas, Alex A.

doi:10.1007/978-3-642-02541-9_6


Part of the book series: Natural Computing Series (NCS)


Abstract

Chapter 6 (entitled “Computational Results on the Automatic Design of Full Rule Induction Algorithms”) reports the results of extensive experiments performed to evaluate the proposed genetic programming system. The chapter is divided into two parts. In the first part, the experiments evaluate the system’s effectiveness in producing rule induction algorithms that are robust across different application domains. This part of the chapter reports results on a number of issues, namely: (a) the system’s sensitivity to the values of some parameters (such as the crossover and mutation rates, and the number of datasets used to “train” the system); (b) a comparison of the predictive accuracies obtained by the automatically evolved rule induction algorithms with those obtained by well-known, human-designed baseline rule induction algorithms; (c) a discussion of the extent to which the automatically evolved algorithms differ from the manually designed ones; (d) the system’s sensitivity to variations in the grammar; (e) a comparison of genetic programming and hill-climbing search as methods for searching the space of candidate rule induction algorithms; and (f) an evaluation of a multiobjective version of the proposed genetic programming system, based on the concept of Pareto optimality.

In the second part of the chapter, the experiments evaluate the system’s effectiveness in producing rule induction algorithms tailored to a particular application domain. We first report the results of experiments evolving rule induction algorithms tailored to individual datasets from the well-known UCI (University of California at Irvine) dataset repository, and then report the results of experiments on more challenging bioinformatics datasets involving protein function prediction. To make this part of the chapter more self-contained, it includes a brief overview of the protein function prediction problem.
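The multiobjective variant in item (f) rests on Pareto optimality: a candidate algorithm is kept only if no other candidate is at least as good on every objective and strictly better on at least one. The following is a minimal sketch of that dominance test and of extracting the non-dominated set, not the authors' implementation. The two objectives used here (maximize predictive accuracy, minimize rule-set size) are an assumption for illustration, and the candidate names and values are hypothetical.

```python
from typing import List, Tuple

# Hypothetical candidate: (name, predictive_accuracy, rule_set_size).
# Assumed objectives: maximize accuracy, minimize rule-set size.
Candidate = Tuple[str, float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a Pareto-dominates b: no worse on both objectives
    and strictly better on at least one of them."""
    _, acc_a, size_a = a
    _, acc_b, size_b = b
    no_worse = acc_a >= acc_b and size_a <= size_b
    strictly_better = acc_a > acc_b or size_a < size_b
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the candidates that no other candidate dominates.
    A candidate never dominates itself, so no self-check is needed."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


if __name__ == "__main__":
    evolved = [
        ("alg_A", 0.85, 12),
        ("alg_B", 0.83, 5),
        ("alg_C", 0.80, 9),   # dominated by alg_B: less accurate and larger
        ("alg_D", 0.85, 15),  # dominated by alg_A: same accuracy, larger
    ]
    for name, acc, size in pareto_front(evolved):
        print(name, acc, size)  # prints alg_A and alg_B
```

Neither alg_A nor alg_B dominates the other (one is more accurate, the other smaller), so both survive as trade-off points. This is why a Pareto-based search returns a set of algorithms rather than a single winner.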



Author information

Authors and Affiliations

Dr. Gisele L. Pappa, Dept. de Ciência da Computação, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
Dr. Alex A. Freitas, School of Computing, University of Kent, Canterbury, UK


Corresponding author

Correspondence to Gisele L. Pappa.


About this chapter

Cite this chapter

Pappa, G.L., Freitas, A.A. (2010). Computational Results on the Automatic Design of Full Rule Induction Algorithms. In: Automating the Design of Data Mining Algorithms. Natural Computing Series. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02541-9_6


DOI: https://doi.org/10.1007/978-3-642-02541-9_6
Published: 28 October 2009
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02540-2
Online ISBN: 978-3-642-02541-9
eBook Packages: Computer Science, Computer Science (R0)
