Abstract
Chapter 6 (entitled “Computational Results on the Automatic Design of Full Rule Induction Algorithms”) reports the results of extensive experiments performed to evaluate the proposed genetic programming system. The chapter is divided into two parts. In the first part, the experiments evaluate the system’s effectiveness in producing rule induction algorithms that are robust across different application domains. This part addresses six issues: (a) we investigate the system’s sensitivity to the values of some parameters (such as the crossover and mutation rates, and the number of datasets used to “train” the system); (b) we compare the predictive accuracies obtained by the automatically evolved rule induction algorithms with those obtained by well-known human-designed baseline rule induction algorithms; (c) we discuss to what extent the automatically evolved algorithms differ from the manually designed ones; (d) we investigate the system’s sensitivity to variations in the grammar; (e) we compare the effectiveness of genetic programming and hill-climbing search as methods for searching the space of candidate rule induction algorithms; and (f) we evaluate a multiobjective version of the proposed genetic programming system, based on the concept of Pareto optimality. In the second part, the experiments evaluate the system’s effectiveness in producing rule induction algorithms tailored to a particular application domain. We first report the results of experiments evolving rule induction algorithms tailored to individual datasets from the well-known UCI (University of California at Irvine) dataset repository, and then report the results of experiments on more challenging bioinformatics datasets involving protein function prediction. A brief overview of the protein function prediction problem is included in this part to make the chapter more self-contained.
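The Pareto optimality concept mentioned in item (f) can be illustrated with a minimal sketch. The two objectives used here (predictive accuracy, to be maximised, and rule-set size, to be minimised), the candidate names, and all numeric values are assumptions made purely for illustration; they are not taken from the chapter, and the functions below are hypothetical helpers rather than the authors' implementation.

```python
from typing import List, Tuple

# Hypothetical candidate: (name, accuracy to maximise, rule-set size to minimise).
Candidate = Tuple[str, float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b on both objectives
    and strictly better on at least one of them."""
    no_worse = a[1] >= b[1] and a[2] <= b[2]
    strictly_better = a[1] > b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Illustrative values only (assumed, not experimental results).
candidates = [("A", 0.85, 12), ("B", 0.80, 5), ("C", 0.78, 9), ("D", 0.85, 15)]
front = pareto_front(candidates)
print([name for name, _, _ in front])
```

In this toy example, candidate C is dominated by B (lower accuracy and a larger rule set) and D is dominated by A (equal accuracy, larger rule set), so only A and B remain as non-dominated trade-offs between the two objectives.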
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Pappa, G.L., Freitas, A.A. (2010). Computational Results on the Automatic Design of Full Rule Induction Algorithms. In: Automating the Design of Data Mining Algorithms. Natural Computing Series. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02541-9_6
DOI: https://doi.org/10.1007/978-3-642-02541-9_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02540-2
Online ISBN: 978-3-642-02541-9
eBook Packages: Computer Science (R0)