M3GP – Multiclass Classification with GP
Data classification is one of the most ubiquitous machine learning tasks in science and engineering. However, Genetic Programming is still not a popular classification methodology, partially due to its poor performance in multiclass problems. The recently proposed M2GP (Multidimensional Multiclass Genetic Programming) algorithm achieved promising results in this area by evolving mappings of the \(p\)-dimensional data into a \(d\)-dimensional space and applying a minimum Mahalanobis distance classifier. Despite its good performance, M2GP employs a greedy strategy to set the number of dimensions \(d\) for the transformed data and fixes it at the start of the search, an approach that is prone to locally optimal solutions. This work presents the M3GP algorithm, which stands for M2GP with multidimensional populations. M3GP extends M2GP by allowing the search process to progressively find the number of new dimensions \(d\) that maximizes the classification accuracy. Experimental results show that M3GP can automatically determine a good value for \(d\) depending on the problem, and achieves excellent performance when compared to state-of-the-art methods like Random Forests, Random Subspaces and Multilayer Perceptron on several benchmark and real-world problems.
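The classification step described above (map the \(p\)-dimensional data into \(d\) dimensions, then assign each sample to the class with the smallest Mahalanobis distance) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `fit_class_stats`, `predict` and `example_mapping` are hypothetical, the mapping is written by hand rather than evolved by GP, the data are synthetic, and using a pseudo-inverse of the class covariance is an assumption made here to tolerate singular matrices.

```python
import numpy as np

def fit_class_stats(Z, y):
    """Return {label: (centroid, inverse covariance)} in the transformed space."""
    stats = {}
    for label in np.unique(y):
        Zc = Z[y == label]
        mean = Zc.mean(axis=0)
        cov = np.atleast_2d(np.cov(Zc, rowvar=False))
        # Assumption of this sketch: pseudo-inverse, so singular covariances do not fail.
        stats[label] = (mean, np.linalg.pinv(cov))
    return stats

def predict(Z, stats):
    """Assign each row of Z to the class with minimum squared Mahalanobis distance."""
    labels = list(stats)
    dists = np.empty((Z.shape[0], len(labels)))
    for j, label in enumerate(labels):
        mean, inv_cov = stats[label]
        diff = Z - mean
        # Row-wise quadratic form diff_i^T * inv_cov * diff_i
        dists[:, j] = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return np.array(labels)[dists.argmin(axis=1)]

def example_mapping(X):
    """Hypothetical hand-written mapping from p = 3 to d = 2 dimensions.

    In M2GP/M3GP each output dimension would be an evolved expression tree.
    """
    return np.column_stack([X[:, 0] * X[:, 1], X[:, 2] - X[:, 0]])

# Synthetic two-class data, for illustration only.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 3)),
    rng.normal(loc=3.0, scale=0.5, size=(50, 3)),
])
y = np.array([0] * 50 + [1] * 50)

Z = example_mapping(X)           # p-dimensional data mapped to d dimensions
stats = fit_class_stats(Z, y)    # per-class centroid and inverse covariance
accuracy = (predict(Z, stats) == y).mean()
print(f"training accuracy: {accuracy:.2f}")
```

In a GP setting, a training accuracy computed this way would be a natural fitness value for a candidate mapping, and changing the number of columns returned by the mapping corresponds to changing \(d\).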
Keywords: Genetic programming, Classification, Multiple classes, Multidimensional clustering
This work was partially supported by FCT funds (Portugal) under contract UID/Multi/04046/2013 and projects PTDC/EEI-CTP/2975/2012 (MaSSGP), PTDC/DTP-FTO/1747/2012 (InteleGen) and EXPL/EMS-SIS/1954/2013 (CancerSys). Funding was also provided by CONACYT (Mexico) Basic Science Research Project No. 178323, DGEST (Mexico) Research Projects No. 5149.13-P and 5414.11-P, and FP7-Marie Curie-IRSES 2013 project ACoBSEC. Finally, the first author is supported by scholarship No. 372126 from CONACYT.