Strong Typing, Swarm Enhancement, and Deep Learning Feature Selection in the Pursuit of Symbolic Regression-Classification
Symbolic Classification (SC), an offshoot of Genetic Programming (GP), can play an important role in any well-rounded predictive analytics toolkit, especially because of its so-called "WhiteBox" properties. Recently developed algorithms, including GP-assisted Linear Discriminant Analysis (LDA), have raised the basic classification accuracy of SC to a level competitive with existing commercially available classification tools. In this paper we add a number of important enhancements to our basic SC system and demonstrate their accuracy improvements on a set of theoretical problems and on a banking industry problem. We enhance GP-assisted LDA with swarm optimization techniques and with a modified version of Platt's Sequential Minimal Optimization algorithm, which we call MSMO. We also add a user-defined typing system and deep learning feature selection to our basic SC system. This extended algorithm (LDA++) is highly competitive with the best commercially available M-class classification techniques on both the set of theoretical problems and a real-world banking industry problem. The new LDA++ algorithm moves genetic programming classification solidly into the top rank of commercially available classification tools.
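For readers unfamiliar with the LDA step mentioned above, the following is a minimal sketch of classic two-class Fisher LDA, the textbook technique that GP-assisted LDA builds on. It is not the authors' GP-assisted LDA, MSMO, or LDA++. It assumes the GP search has already produced numeric feature values for each training row; the function names `fisher_lda` and `predict` and the sample data are hypothetical. The discriminant direction is w = S_W^{-1}(mu_1 - mu_0), and the decision threshold sits midway between the projected class means.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def mean(rows: Sequence[Sequence[float]]) -> Vector:
    """Column-wise mean of a list of feature rows."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def solve(a: List[Vector], b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fisher_lda(class0: Sequence[Sequence[float]],
               class1: Sequence[Sequence[float]],
               ridge: float = 1e-9) -> Tuple[Vector, float]:
    """Return (w, threshold) for a two-class Fisher linear discriminant."""
    mu0, mu1 = mean(class0), mean(class1)
    d = len(mu0)
    # Pooled within-class scatter matrix S_W.
    sw = [[0.0] * d for _ in range(d)]
    for rows, mu in ((class0, mu0), (class1, mu1)):
        for r in rows:
            diff = [r[j] - mu[j] for j in range(d)]
            for i in range(d):
                for j in range(d):
                    sw[i][j] += diff[i] * diff[j]
    for i in range(d):
        sw[i][i] += ridge  # tiny ridge keeps S_W invertible
    w = solve(sw, [mu1[j] - mu0[j] for j in range(d)])

    def project(v: Sequence[float]) -> float:
        return sum(wi * vi for wi, vi in zip(w, v))

    # Class 1 projects above class 0, so split at the midpoint.
    threshold = 0.5 * (project(mu0) + project(mu1))
    return w, threshold


def predict(w: Vector, threshold: float, x: Sequence[float]) -> int:
    """Classify one feature row as 0 or 1."""
    return int(sum(wi * xi for wi, xi in zip(w, x)) > threshold)
```

In a GP setting, the feature columns passed to `fisher_lda` would be the values of evolved expressions, so the linear fit supplies the coefficients while the evolutionary search supplies the nonlinear features; the M-class case generalizes this to several discriminants.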
We thank Thomas May of Lantern Credit for assisting with the KNIME Learner training and scoring on all ten artificial classification problems.