Integrating memetic search into the BioHEL evolutionary learning system for large-scale datasets

Abstract

Local search methods are widely used to improve the performance of evolutionary computation algorithms across many domains. Advanced and efficient exploration mechanisms become crucial in complex problems with very large search spaces, such as when applying evolutionary algorithms to large-scale data mining tasks. Recently, the GAssist Pittsburgh evolutionary learning system was extended with memetic operators for discrete representations that use information from the supervised learning process to heuristically edit classification rules and rule sets. In this paper we first adapt some of these operators to BioHEL, a different evolutionary learning system that applies the iterative rule learning approach, and then propose versions of these operators designed for continuous attributes and for coping with noise. The performance of all these operators and their combinations is extensively evaluated on a broad range of synthetic large-scale datasets to identify the settings that offer the best balance between efficiency and accuracy. Finally, the best configurations identified are compared with other classes of machine learning methods on both synthetic and real-world large-scale datasets and show highly competitive performance.
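The memetic operators described above heuristically edit rules using information gathered during supervised learning. As an illustration only, the sketch below shows one plausible flavour of such an operator for continuous attributes: a greedy local search that widens the intervals of a conjunctive rule and keeps each edit only while training accuracy does not drop. The rule representation, the `generalize` routine, and all names here are hypothetical simplifications for exposition, not BioHEL's actual operators.

```python
def covers(rule, example):
    """A rule maps attribute -> (lo, hi) interval; it covers an example
    (attribute -> value) if every value lies inside its interval."""
    return all(lo <= example[a] <= hi for a, (lo, hi) in rule.items())

def accuracy(rule, label, data):
    """Fraction of examples matched by the rule that carry its class
    label (1.0 if the rule matches nothing)."""
    matched = [y for x, y in data if covers(rule, x)]
    return sum(y == label for y in matched) / len(matched) if matched else 1.0

def generalize(rule, label, data, step=0.1):
    """Greedy memetic-style edit: try widening each interval bound by
    `step` and accept the edit only if training accuracy does not drop."""
    base = accuracy(rule, label, data)
    improved = dict(rule)
    for a in rule:
        for side in ("lo", "hi"):
            lo, hi = improved[a]
            widened = (lo - step, hi) if side == "lo" else (lo, hi + step)
            cand = {**improved, a: widened}
            if accuracy(cand, label, data) >= base:
                improved = cand
    return improved
```

In an iterative rule learning loop of the kind BioHEL uses, an operator like this would be applied to candidate rules between generations, with the examples covered by an accepted rule then removed from the training set.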



Notes

  1. Briefly described in the next subsection.


Acknowledgments

We acknowledge the support of the UK Engineering and Physical Sciences Research Council (EPSRC) under grant EP/H016597/1. We are grateful for the use of the University of Nottingham’s High Performance Computing Facility.

Author information

Corresponding author

Correspondence to Jaume Bacardit.

Appendix


See Tables 16, 17, 18 and 19.

Table 16 ES1: Full cross-validation accuracy results on the Checkerboard datasets
Table 17 ES1: Full run-time (in s) results on the Checkerboard datasets
Table 18 ES2: Full training accuracy results on the large-scale synthetic datasets
Table 19 ES2: Full run-time (in s) results on the large-scale synthetic datasets


Cite this article

Calian, D.A., Bacardit, J. Integrating memetic search into the BioHEL evolutionary learning system for large-scale datasets. Memetic Comp. 5, 95–130 (2013). https://doi.org/10.1007/s12293-013-0108-4

Keywords

  • Memetic algorithms
  • Evolutionary algorithms
  • Evolutionary rule learning