M3GP – Multiclass Classification with GP

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9025)

Abstract

Data classification is one of the most ubiquitous machine learning tasks in science and engineering. However, Genetic Programming is still not a popular classification methodology, partially due to its poor performance in multiclass problems. The recently proposed M2GP (Multidimensional Multiclass Genetic Programming) algorithm achieved promising results in this area by evolving mappings of the \(p\)-dimensional data into a \(d\)-dimensional space and applying a minimum Mahalanobis distance classifier. Despite its good performance, M2GP employs a greedy strategy to set the number of dimensions \(d\) for the transformed data and fixes it at the start of the search, an approach that is prone to locally optimal solutions. This work presents the M3GP algorithm, which stands for M2GP with multidimensional populations. M3GP extends M2GP by allowing the search process to progressively search for the number of new dimensions \(d\) that maximizes the classification accuracy. Experimental results show that M3GP can automatically determine a good value for \(d\) depending on the problem, and that it achieves excellent performance when compared to state-of-the-art methods such as Random Forests, Random Subspaces and Multilayer Perceptron on several benchmark and real-world problems.
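The classification step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each class is summarized by its centroid and covariance matrix in the transformed space, and that a sample is assigned to the class with the smallest squared Mahalanobis distance. The function names, the hand-written `transform` (a stand-in for an evolved mapping, here with \(p = 3\) and \(d = 2\)), the synthetic data and the use of a pseudo-inverse are all assumptions made for illustration.

```python
import numpy as np


def transform(X):
    # Hypothetical hand-written mapping from p = 3 features to d = 2 dimensions.
    # In M2GP/M3GP this mapping would be evolved; here it is fixed for illustration.
    return np.column_stack([X[:, 0] + X[:, 1], X[:, 2] - X[:, 0]])


def fit_class_stats(Z, y):
    # For each class, store the centroid and the (pseudo-)inverse covariance
    # of its samples in the transformed space.
    stats = {}
    for label in np.unique(y):
        Zc = Z[y == label]
        mean = Zc.mean(axis=0)
        cov = np.atleast_2d(np.cov(Zc, rowvar=False))
        # Assumption: the pseudo-inverse guards against singular covariances.
        stats[label] = (mean, np.linalg.pinv(cov))
    return stats


def predict(Z, stats):
    # Squared Mahalanobis distance of every sample to every class centroid;
    # each sample is assigned to the class with the minimum distance.
    labels = np.array(list(stats))
    dists = np.stack(
        [
            np.einsum("ij,jk,ik->i", Z - mean, inv_cov, Z - mean)
            for mean, inv_cov in stats.values()
        ],
        axis=1,
    )
    return labels[dists.argmin(axis=1)]


# Synthetic two-class data (an assumption, not data from the paper).
rng = np.random.default_rng(0)
X = np.vstack(
    [
        rng.normal(0.0, 0.1, size=(50, 3)),
        rng.normal(5.0, 0.1, size=(50, 3)),
    ]
)
y = np.repeat([0, 1], 50)

Z = transform(X)
stats = fit_class_stats(Z, y)
accuracy = float((predict(Z, stats) == y).mean())
```

In this sketch the number of output columns of `transform` plays the role of \(d\); changing it only requires adding or removing expressions in the mapping, since the classifier itself is independent of the dimensionality.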

Keywords

Genetic programming, Classification, Multiple classes, Multidimensional clustering

Acknowledgments

This work was partially supported by FCT funds (Portugal) under contract UID/Multi/04046/2013 and projects PTDC/EEI-CTP/2975/2012 (MaSSGP), PTDC/DTP-FTO/1747/2012 (InteleGen) and EXPL/EMS-SIS/1954/2013 (CancerSys). Funding was also provided by CONACYT (Mexico) Basic Science Research Project No. 178323, DGEST (Mexico) Research Projects No. 5149.13-P and 5414.11-P, and FP7-Marie Curie-IRSES 2013 project ACoBSEC. Finally, the first author is supported by scholarship No. 372126 from CONACYT.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Tree-Lab, Posgrado en Ciencias de la Ingeniería, Instituto Tecnológico de Tijuana, Tijuana, Mexico
  2. BioISI – Biosystems and Integrative Sciences Institute, Faculty of Sciences, University of Lisbon, Lisbon, Portugal
  3. NOVA IMS, Universidade Nova de Lisboa, Lisboa, Portugal
  4. CISUC, Department of Informatics Engineering, University of Coimbra, Coimbra, Portugal
