
An Evolutionary Algorithm for Big Data Multi-Class Classification Problems

  • Michael F. Korns
Chapter
Part of the Genetic and Evolutionary Computation book series (GEVO)

Abstract

As symbolic regression (SR) has advanced into the early stages of commercial exploitation, the poor accuracy of SR still plagues even advanced commercial packages and has become an issue for industrial users. Users expect a correct formula to be returned, especially in cases with zero noise and only one basis function with minimal complexity. At a minimum, users expect the response surface of the SR tool to be easily understood, so that they know a priori on which classes of problems to expect excellent, average, or poor accuracy. Poor or unknown accuracy is a hindrance to greater academic and industrial acceptance of SR tools. In several previous papers, we presented a complex algorithm for modern SR that is extremely accurate for a large class of SR problems on noiseless data. Further research has shown that these extremely accurate SR algorithms also improve accuracy in noisy circumstances, albeit not to the point of extreme accuracy. Armed with these SR successes, we naively thought that achieving extreme accuracy by applying genetic programming (GP) to symbolic multi-class classification would be an easy goal. However, it seems that algorithms achieving extreme accuracy in SR do not translate directly into symbolic multi-class classification. Furthermore, others have encountered serious issues applying GP to symbolic multi-class classification (Castelli et al., Applications of Evolutionary Computing, EvoApplications 2013: EvoCOMNET, EvoCOMPLEX, EvoENERGY, EvoFIN, EvoGAMES, EvoIASP, EvoINDUSTRY, EvoNUM, EvoPAR, EvoRISK, EvoROBOT, EvoSTOC, vol. 7835, pp. 334–343, Springer, Vienna, 2013). This is the first paper in a planned series developing the algorithms necessary for extreme accuracy in GP applied to symbolic multi-class classification. We develop an evolutionary algorithm for optimizing a single symbolic multi-class classification candidate. It is designed for big-data situations, where the computational effort grows linearly as the numbers of features and training points increase.
The algorithm’s behavior is demonstrated on theoretical problems, UCI benchmarks, and industry test cases.
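The abstract states only that the algorithm optimizes a single multi-class classification candidate and that its cost grows linearly with features and training points. As a rough illustration of that scaling, the Python sketch below assumes a candidate made of one discriminant function per class (a weighted sum of shared basis functions). Each point is assigned to the class with the largest discriminant, and the candidate is scored by its misclassification rate in a single pass over the data. The names `predict`, `error_rate`, and `evolve`, the argmax rule, and the (1+1) mutation loop are hypothetical stand-ins chosen for illustration. They are not the chapter's published method, and the sketch does not implement the particle swarm or genetic algorithm optimizers named in the keywords.

```python
import random
from typing import Callable, List, Sequence, Tuple

# HYPOTHETICAL representation for illustration (not the chapter's method):
# a candidate holds one weight row per class; class k's discriminant is
# D_k(x) = sum_j weights[k][j] * basis[j](x).
Basis = Callable[[Sequence[float]], float]
Weights = List[List[float]]


def predict(weights: Weights, basis: List[Basis], x: Sequence[float]) -> int:
    """Return the index of the class with the largest discriminant D_k(x)."""
    feats = [f(x) for f in basis]  # each basis function evaluated once per point
    scores = [sum(w * v for w, v in zip(row, feats)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)


def error_rate(weights: Weights, basis: List[Basis],
               X: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    """Misclassification rate from one pass over the data: O(N * K * B) work."""
    wrong = sum(1 for x, label in zip(X, y) if predict(weights, basis, x) != label)
    return wrong / len(y)


def evolve(basis: List[Basis], X, y, n_classes: int, generations: int = 500,
           sigma: float = 0.5, seed: int = 0) -> Tuple[Weights, float]:
    """(1+1)-style evolution of one candidate's weights (an illustrative choice):
    mutate a single weight and keep the child if its error is no worse."""
    rng = random.Random(seed)
    parent = [[rng.gauss(0.0, 1.0) for _ in basis] for _ in range(n_classes)]
    best = error_rate(parent, basis, X, y)
    for _ in range(generations):
        if best == 0.0:
            break  # perfect training accuracy; nothing left to improve
        child = [row[:] for row in parent]
        k, j = rng.randrange(n_classes), rng.randrange(len(basis))
        child[k][j] += rng.gauss(0.0, sigma)
        err = error_rate(child, basis, X, y)
        if err <= best:
            parent, best = child, err
    return parent, best


if __name__ == "__main__":
    # Toy 3-class problem: the label is the index of the largest feature.
    data_rng = random.Random(1)
    X = [[data_rng.random() for _ in range(3)] for _ in range(60)]
    y = [max(range(3), key=x.__getitem__) for x in X]
    basis = [lambda x: x[0], lambda x: x[1], lambda x: x[2], lambda x: 1.0]
    weights, err = evolve(basis, X, y, n_classes=3)
    print(f"training error rate: {err:.3f}")
```

Each fitness evaluation touches every training point once and evaluates each basis function once per point. Its cost is therefore proportional to N·K·B (points × classes × basis functions), which is linear in the data size in the sense the abstract describes. The number of generations multiplies that per-evaluation cost.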

Keywords

Genetic programming; Symbolic classification; Particle swarm; Abstract expression grammar; Grammar template genetic programming; Genetic algorithms

References

  1. Castelli, M., Silva, S., Vanneschi, L., Cabral, A., Vasconcelos, M.J., Catarino, L., Carreiras, J.M.B.: Land cover/land use multiclass classification using GP with geometric semantic operators. In: Esparcia-Alcazar, A.I., Cioppa, A.D., De Falco, I., Tarantino, E., Cotta, C., Schaefer, R., Diwold, K., Glette, K., Tettamanzi, A., Agapitos, A., Burrelli, P., Merelo, J.J., Cagnoni, S., Zhang, M., Urquhart, N., Sim, K., Ekart, A., Fernandez de Vega, F., Silva, S., Haasdijk, E., Eiben, G., Simoes, A., Rohlfshagen, P. (eds.) Applications of Evolutionary Computing, EvoApplications 2013: EvoCOMNET, EvoCOMPLEX, EvoENERGY, EvoFIN, EvoGAMES, EvoIASP, EvoINDUSTRY, EvoNUM, EvoPAR, EvoRISK, EvoROBOT, EvoSTOC. Lecture Notes in Computer Science, vol. 7835, pp. 334–343. Springer, Vienna (2013). https://doi.org/10.1007/978-3-642-37192-9_34
  2. Gandomi, A.H., Alavi, A.H., Ryan, C. (eds.): Handbook of Genetic Programming Applications. Springer, Berlin (2015). https://doi.org/10.1007/978-3-319-20883-1
  3. Ingalalli, V., Silva, S., Castelli, M., Vanneschi, L.: A multi-dimensional genetic programming approach for multi-class classification problems. In: Nicolau, M., Krawiec, K., Heywood, M.I., Castelli, M., Garcia-Sanchez, P., Merelo, J.J., Rivas Santos, V.M., Sim, K. (eds.) 17th European Conference on Genetic Programming. Lecture Notes in Computer Science, vol. 8599, pp. 48–60. Springer, Granada (2014). https://doi.org/10.1007/978-3-662-44303-3_5
  4. Karaboga, D., Akay, B.: A survey: algorithms simulating bee swarm intelligence. Artif. Intell. Rev. 31(1–4), 61–85 (2009)
  5. Korns, M.F.: Abstract expression grammar symbolic regression. In: Riolo, R., McConaghy, T., Vladislavleva, E. (eds.) Genetic Programming Theory and Practice VIII. Genetic and Evolutionary Computation, vol. 8, chap. 7, pp. 109–128. Springer, Ann Arbor (2010). http://www.springer.com/computer/ai/book/978-1-4419-7746-5
  6. Korns, M.F.: Accuracy in symbolic regression. In: Riolo, R., Vladislavleva, E., Moore, J.H. (eds.) Genetic Programming Theory and Practice IX, Genetic and Evolutionary Computation, chap. 8, pp. 129–151. Springer, Ann Arbor (2011). https://doi.org/10.1007/978-1-4614-1770-5_8
  7. Korns, M.F.: A baseline symbolic regression algorithm. In: Riolo, R., Vladislavleva, E., Ritchie, M.D., Moore, J.H. (eds.) Genetic Programming Theory and Practice X, Genetic and Evolutionary Computation, chap. 9, pp. 117–137. Springer, Ann Arbor (2012). https://doi.org/10.1007/978-1-4614-6846-2_9
  8. Korns, M.F.: Extreme accuracy in symbolic regression. In: Riolo, R., Moore, J.H., Kotanchek, M. (eds.) Genetic Programming Theory and Practice XI, Genetic and Evolutionary Computation, chap. 1, pp. 1–30. Springer, Ann Arbor (2013). https://doi.org/10.1007/978-1-4939-0375-7_1
  9. Korns, M.F.: Extremely accurate symbolic regression for large feature problems. In: Riolo, R., Worzel, W.P., Kotanchek, M. (eds.) Genetic Programming Theory and Practice XII, Genetic and Evolutionary Computation, pp. 109–131. Springer, Ann Arbor (2014). https://doi.org/10.1007/978-3-319-16030-6_7
  10. Korns, M.: Highly accurate symbolic regression with noisy training data. In: Riolo, R., Worzel, W.P., Kotanchek, M., Kordon, A. (eds.) Genetic Programming Theory and Practice XIII, Genetic and Evolutionary Computation. Springer, Ann Arbor (2015). https://doi.org/10.1007/978-3-319-34223-8. http://www.springer.com/us/book/9783319342214
  11. Kotanchek, M., Smits, G., Vladislavleva, E.: Trustable symbolic regression models: using ensembles, interval arithmetic and pareto fronts to develop robust and trust-aware models. In: Riolo, R.L., Soule, T., Worzel, B. (eds.) Genetic Programming Theory and Practice V, Genetic and Evolutionary Computation, chap. 12, pp. 201–220. Springer, Ann Arbor (2007). https://doi.org/10.1007/978-0-387-76308-8_12. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.457.5272
  12. Koza, J.R.: Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, MA (1992). http://mitpress.mit.edu/books/genetic-programming
  13. Koza, J.R.: Genetic Programming II: Automatic Discovery of Reusable Programs. MIT Press, Cambridge, MA (1994). http://www.genetic-programming.org/gpbook2toc.html
  14. Koza, J.R., Andre, D., Bennett III, F.H., Keane, M.: Genetic Programming III: Darwinian Invention and Problem Solving. Morgan Kaufmann (1999). http://www.genetic-programming.org/gpbook3toc.html
  15. Langdon, W.B., Poli, R.: Foundations of Genetic Programming. Springer, Berlin (2002). https://doi.org/10.1007/978-3-662-04726-2. http://www.cs.ucl.ac.uk/staff/W.Langdon/FOGP/
  16. McConaghy, T.: FFX: Fast, scalable, deterministic symbolic regression technology. In: Riolo, R., Vladislavleva, E., Moore, J.H. (eds.) Genetic Programming Theory and Practice IX, Genetic and Evolutionary Computation, chap. 13, pp. 235–260. Springer, Ann Arbor (2011). https://doi.org/10.1007/978-1-4614-1770-5_13. http://trent.st/content/2011-GPTP-FFX-paper.pdf
  17. Nelder, J., Wedderburn, R.: Generalized linear models. Stat. Soc 135, 370–383
  18. Platt, J.: Sequential minimal optimization: A fast algorithm for training support vector machines. Microsoft Research Technical Report MSR-TR-98-14 (1998)
  19. Poli, R., McPhee, N.F., Vanneschi, L.: Analysis of the effects of elitism on bloat in linear and tree-based genetic programming. In: Riolo, R.L., Soule, T., Worzel, B. (eds.) Genetic Programming Theory and Practice VI, Genetic and Evolutionary Computation, chap. 7, pp. 91–111. Springer, Ann Arbor (2008). https://doi.org/10.1007/978-0-387-87623-8_7
  20. Smits, G., Kotanchek, M.: Pareto-front exploitation in symbolic regression. In: O’Reilly, U.M., Yu, T., Riolo, R.L., Worzel, B. (eds.) Genetic Programming Theory and Practice II, chap. 17, pp. 283–299. Springer, Ann Arbor (2004). https://doi.org/10.1007/0-387-23254-0_17

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Michael F. Korns
    Analytic Research Foundation, Henderson, USA
