A Genetic Programming Approach Applied to Feature Selection from Medical Data

  • José A. Castellanos-Garzón
  • Juan Ramos
  • Yeray Mezquita Martín
  • Juan F. de Paz
  • Ernesto Costa
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 803)


Genetic programming is a flexible and powerful evolutionary technique in machine learning, and its use for rule induction has produced interesting results in classification problems. This paper proposes an evolutionary approach to logical rule induction and applies it to clinical data. Since logical rules reveal knowledge contained in the analyzed data, we use that knowledge to filter features from the target dataset. The results obtained on the dataset are very promising, both in classification tasks and in comparison with other methods.
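The idea of using induced logical rules to filter features can be illustrated with a minimal Python sketch. It rests on assumptions that are ours, not the authors' published encoding: each rule is a Boolean expression tree of threshold tests joined by AND/OR/NOT, a rule's fitness is its training accuracy when it predicts class 1 whenever it fires, and the selected features are those appearing in the best-ranked rules. The evolutionary search itself (selection, crossover, mutation) is omitted. The three rules are hand-written stand-ins for evolved individuals, the data are a toy example rather than the clinical dataset used in the paper, and names such as `Cond`, `fitness` and `select_features` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Sequence, Union

# Hypothetical rule encoding (an assumption, not the paper's exact one):
# a rule is a Boolean expression tree whose leaves are threshold tests.


@dataclass(frozen=True)
class Cond:
    """Leaf: x[feature] <= threshold (le=True) or x[feature] > threshold."""
    feature: int
    threshold: float
    le: bool = True

    def eval(self, x: Sequence[float]) -> bool:
        v = x[self.feature]
        return v <= self.threshold if self.le else v > self.threshold


@dataclass(frozen=True)
class And:
    left: "Node"
    right: "Node"

    def eval(self, x: Sequence[float]) -> bool:
        return self.left.eval(x) and self.right.eval(x)


@dataclass(frozen=True)
class Or:
    left: "Node"
    right: "Node"

    def eval(self, x: Sequence[float]) -> bool:
        return self.left.eval(x) or self.right.eval(x)


@dataclass(frozen=True)
class Not:
    child: "Node"

    def eval(self, x: Sequence[float]) -> bool:
        return not self.child.eval(x)


Node = Union[Cond, And, Or, Not]


def features_in(rule: Node) -> List[int]:
    """Collect the feature indices tested anywhere in the rule tree."""
    if isinstance(rule, Cond):
        return [rule.feature]
    if isinstance(rule, Not):
        return features_in(rule.child)
    return features_in(rule.left) + features_in(rule.right)


def fitness(rule: Node, X, y) -> float:
    """Assumed fitness: accuracy when the rule predicts class 1 iff it fires."""
    hits = sum(int(rule.eval(x)) == label for x, label in zip(X, y))
    return hits / len(y)


def select_features(rules, X, y, top_k_rules: int = 2) -> List[int]:
    """Keep only the features that occur in the top_k_rules fittest rules."""
    ranked = sorted(rules, key=lambda r: fitness(r, X, y), reverse=True)
    counts = Counter()
    for rule in ranked[:top_k_rules]:
        counts.update(features_in(rule))
    return sorted(counts)  # sorted feature indices


# Toy data (not the paper's clinical dataset): class 1 iff x0 > 5 and x2 <= 1.
X = [[6, 0, 0.5, 3], [7, 9, 0.2, 1], [2, 0, 0.5, 3],
     [8, 1, 3.0, 0], [1, 5, 2.0, 9], [9, 2, 0.9, 4]]
y = [1, 1, 0, 0, 0, 1]

# Hand-written stand-ins for individuals a GP run might evolve.
r_good = And(Cond(0, 5.0, le=False), Cond(2, 1.0))   # x0 > 5 AND x2 <= 1
r_half = Cond(0, 5.0, le=False)                      # x0 > 5
r_noise = Or(Cond(1, 0.5), Cond(3, 2.0))             # x1 <= 0.5 OR x3 <= 2

selected = select_features([r_noise, r_half, r_good], X, y, top_k_rules=2)
X_reduced = [[row[j] for j in selected] for row in X]
print(selected)  # -> [0, 2]
```

Because the noisy rule over features 1 and 3 ranks last, only features 0 and 2 survive, and `X_reduced` keeps just those two columns. A real GP run would replace the hand-written rules with an evolved population; the fitness measure and the `top_k_rules` cutoff are tuning choices here, not values taken from the paper.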


Keywords: Medical data, Feature selection, Genetic programming, Machine learning, Data mining, Evolutionary computation



This work has been carried out under the iCIS project (CENTRO-07-ST24-FEDER-002003), which has been co-financed by QREN, within the scope of the Mais Centro Program and the European Union’s FEDER.

This work has also been partially supported by the Interreg V-A Spain-Portugal Program (PocTep) and the European Regional Development Fund (ERDF) under the IOTEC project (grant 0123_IOTEC_3_E).

The research of Juan Ramos González has been co-financed by the European Social Fund and Junta de Castilla y León (Operational Programme 2014–2020 for Castilla y León, BOCYL EDU/602/2016).



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • José A. Castellanos-Garzón (1, 2)
  • Juan Ramos (1)
  • Yeray Mezquita Martín (1)
  • Juan F. de Paz (1)
  • Ernesto Costa (2)
  1. IBSAL/BISITE Research Group, Edificio I+D+i USAL, University of Salamanca, Salamanca, Spain
  2. CISUC, ECOS Research Group, Pólo II - Pinhal de Marrocos, University of Coimbra, Coimbra, Portugal
