Genetic Programming Representations for Multi-dimensional Feature Learning in Biomedical Classification

  • William La Cava
  • Sara Silva
  • Leonardo Vanneschi
  • Lee Spector
  • Jason Moore
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10199)


We present a new classification method that uses genetic programming (GP) to evolve feature transformations for a deterministic, distance-based classifier. This method, called M4GP, differs from common approaches to classifier representation in GP in that it does not enforce arbitrary decision boundaries and it allows individuals to produce multiple outputs via a stack-based GP system. In comparison to typical methods of classification, M4GP can be advantageous in its ability to produce readable models. We conduct a comprehensive study of M4GP, first in comparison to other GP classifiers, and then in comparison to six common machine learning classifiers. We conduct full hyper-parameter optimization for all of the methods on a suite of 16 biomedical data sets, ranging in size and difficulty. The results indicate that M4GP outperforms other GP methods for classification. M4GP performs competitively with other machine learning methods in terms of the accuracy of the produced models for most problems. M4GP also exhibits the ability to detect epistatic interactions better than the other methods.
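The general idea of pairing a stack-based program with a distance-based classifier can be illustrated with a minimal sketch. This is not the authors' implementation: the postfix token format (`x0`, `x1`, `+`, `-`, `*`), the function names, the nearest-centroid rule, and the use of Euclidean distance are all assumptions made for illustration. Whatever values remain on the stack after execution are treated as the transformed feature vector, which is how a single program can yield several outputs.

```python
import math

# Binary operators available to the hypothetical postfix programs.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def run_program(program, row):
    """Run a postfix program on one sample and return the final stack.

    Every value left on the stack is one output, i.e. one new feature.
    """
    stack = []
    for token in program:
        if token in OPS:
            if len(stack) < 2:
                continue  # assumption: skip operators that lack arguments
            b = stack.pop()
            a = stack.pop()
            stack.append(OPS[token](a, b))
        else:
            # Tokens such as "x0" push the feature with that index.
            stack.append(row[int(token[1:])])
    return stack


def fit_centroids(program, X, y):
    """Compute one centroid per class in the transformed feature space."""
    grouped = {}
    for row, label in zip(X, y):
        grouped.setdefault(label, []).append(run_program(program, row))
    return {
        label: [sum(col) / len(col) for col in zip(*rows)]
        for label, rows in grouped.items()
    }


def predict(program, centroids, row):
    """Assign the class whose centroid is nearest (Euclidean, an assumption)."""
    z = run_program(program, row)
    return min(centroids, key=lambda label: math.dist(z, centroids[label]))


# Toy interaction problem: the label depends only on the product x0 * x1.
X = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
y = [1, 0, 0, 1]

# This program leaves two outputs on the stack: x0 and x0 * x1.
program = ["x0", "x0", "x1", "*"]
centroids = fit_centroids(program, X, y)
predictions = [predict(program, centroids, row) for row in X]
```

On the raw features both class centroids of this toy data coincide at the origin, so a nearest-centroid rule cannot separate the classes; the evolved product feature moves them apart. The toy problem is only a loose analogue of the epistatic interactions mentioned in the abstract.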


Keywords: Genetic programming, Feature learning, Classification



This work was supported by the Warren Center for Network and Data Science, as well as NIH grants P30-ES013508, AI116794 and LM009012. S. Silva acknowledges project PERSEIDS (PTDC/EMS-SIS/0642/2014) and BioISI RD unit, UID/MULTI/04046/2013, funded by FCT/MCTES/PIDDAC, Portugal. This material is based upon work supported by the National Science Foundation under Grants Nos. 1617087, 1129139 and 1331283. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the National Science Foundation.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • William La Cava (1)
  • Sara Silva (2, 3)
  • Leonardo Vanneschi (4)
  • Lee Spector (5)
  • Jason Moore (1)
  1. Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, USA
  2. Faculdade de Ciências, Departamento de Informática, BioISI - Biosystems and Integrative Sciences Institute, Universidade de Lisboa, Lisboa, Portugal
  3. CISUC, Department of Informatics Engineering, University of Coimbra, Coimbra, Portugal
  4. NOVA IMS, Universidade Nova de Lisboa, Lisbon, Portugal
  5. School of Cognitive Science, Hampshire College, Amherst, USA
