EuroGP 2017: Genetic Programming pp 114-130 | Cite as
RANSAC-GP: Dealing with Outliers in Symbolic Regression with Genetic Programming
Abstract
Genetic programming (GP) has been shown to be a powerful tool for automatic modeling and program induction. It is often used to solve difficult symbolic regression tasks, with many examples in real-world domains. However, the robustness of GP-based approaches has not been substantially studied. In particular, the present work deals with the issue of outliers, data in the training set that represent severe errors in the measuring process. In general, a datum is considered an outlier when it sharply deviates from the true behavior of the system of interest. GP practitioners know that such data points usually bias the search and produce inaccurate models. Therefore, this work presents a hybrid methodology based on the RAndom SAmpling Consensus (RANSAC) algorithm and GP, which we call RANSAC-GP. RANSAC is an approach to deal with outliers in parameter estimation problems, widely used in computer vision and related fields. On the other hand, this work presents the first application of RANSAC to symbolic regression with GP, with impressive results. The proposed algorithm is able to deal with extreme amounts of contamination in the training set, evolving highly accurate models even when the amount of outliers reaches 90%.
Keywords
Genetic programming RANSAC Robust regression OutliersNotes
Acknowledgments
First author was supported by CONACYT (México) scholarships No. 573397. This research was partially supported by CONACYT Basic Science Research Project No. 178323, CONACYT Fronteras de la Ciencia 2015-2 No. 944, as well as by FP7- Marie Curie-IRSES 2013 European Commission program with project ACoBSEC with contract No. 612689. Sara Silva acknowledges project PERSEIDS (PTDC/EMS-SIS/0642/2014) and BioISI RD unit, UID/MULTI/04046/2013, funded by FCT/MCTES/PIDDAC, Portugal.
References
- 1.Alfons, A., Croux, C., Gelper, S., et al.: Sparse least trimmed squares regression for analyzing high-dimensional large data sets. Annals Appl. Stat. 7(1), 226–248 (2013)MathSciNetCrossRefMATHGoogle Scholar
- 2.Derpanis, K.G.: Overview of the RANSAC algorithm. Image Rochester NY 4(1), 2–3 (2010)Google Scholar
- 3.Fischler, M.A., Bolles, R.C.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24(6), 381–395 (1981)MathSciNetCrossRefGoogle Scholar
- 4.Fortin, F.A., De Rainville, F.M., Gardner, M.A., Parizeau, M., Gagné, C.: DEAP: evolutionary algorithms made easy. J. Mach. Learn. Res. 13, 2171–2175 (2012)MathSciNetMATHGoogle Scholar
- 5.Giloni, A., Padberg, M.: Least trimmed squares regression, least median squares regression, and mathematical programming. Math. Comput. Model. 35(9), 1043–1060 (2002)MathSciNetCrossRefMATHGoogle Scholar
- 6.Gonçalves, I., Silva, S.: Balancing learning and overfitting in genetic programming with interleaved sampling of training data. In: Krawiec, K., Moraglio, A., Hu, T., Etaner-Uyar, A.Ş., Hu, B. (eds.) EuroGP 2013. LNCS, vol. 7831, pp. 73–84. Springer, Heidelberg (2013). doi: 10.1007/978-3-642-37207-0_7 CrossRefGoogle Scholar
- 7.Hast, A., Nysjö, J., Marchetti, A.: Optimal RANSAC-towards a repeatable algorithm for finding the optimal set. J. WSCG 21(1), 21–30 (2013)Google Scholar
- 8.Kotanchek, M.E., Vladislavleva, E.Y., Smits, G.F.: Symbolic regression via genetic programming as a discovery engine: insights on outliers and prototypes. In: Riolo, R., O’Reilly, U.-M., McConaghy, T. (eds.) Genetic Programming Theory and Practice VII, pp. 55–72. Springer, Heidelberg (2010)CrossRefGoogle Scholar
- 9.La Cava, W., Spector, L., Danai, K.: Epsilon-lexicase selection for regression. In: GECCO 2016 Proceedings of the Genetic and Evolutionary Computation Conference 2016, pp. 741–748. ACM, New York (2016)Google Scholar
- 10.Lacey, A., Pinitkarn, N., Thacker, N.A.: An evaluation of the performance of RANSAC algorithms for stereo camera calibrarion. In: BMVC, pp. 1–10 (2000)Google Scholar
- 11.Martínez, Y., Trujillo, L., Naredo, E., Legrand, P.: A comparison of fitness-case sampling methods for symbolic regression with genetic programming. In: Tantar, A.-A. (ed.) EVOLVE - A Bridge between Probability, Set Oriented Numerics, and Evolutionary Computation V. AISC, vol. 288, pp. 201–212. Springer, Heidelberg (2014). doi: 10.1007/978-3-319-07494-8_14 Google Scholar
- 12.Martnez, Y., Naredo, E., Trujillo, L., Legrand, P., Lpez, U.: A comparison of fitness-case sampling methods for genetic programming. Journal of Experimental and Theoretical Artificial Intelligence (accepted to appear 2016)Google Scholar
- 13.McDermott, J., White, D.R., Luke, S., Manzoni, L., Castelli, M., Vanneschi, L., Jaskowski, W., Krawiec, K., Harper, R., De Jong, K., O’Reilly, U.M.: Genetic programming needs better benchmarks. In: GECCO 2012 Proceedings of the Fourteenth International Conference on Genetic and Evolutionary Computation Conference, pp. 791–798. ACM, New York (2012)Google Scholar
- 14.Nunkesser, R., Morell, O.: An evolutionary algorithm for robust regression. Comput. Stat. & Data Anal. 54(12), 3242–3248 (2010)MathSciNetCrossRefMATHGoogle Scholar
- 15.Pearson, R.K.: Mining imperfect data: dealing with contamination and incomplete records. SIAM (2005)Google Scholar
- 16.Rousseeuw, P.J.: Least median of squares regression. J. Am. Stat. Assoc. 79(388), 871–880 (1984)MathSciNetCrossRefMATHGoogle Scholar
- 17.Spector, L.: Assessment of problem modality by differential performance of lexicase selection in genetic programming: a preliminary report. In: GECCO Companion 2012 Proceedings of the Fourteenth International Conference on Genetic and Evolutionary Computation Conference Companion, pp. 401–408. ACM (2012)Google Scholar
- 18.Tarsha-Kurdi, F., Landes, T., Grussenmeyer, P., et al.: Hough-transform and extended RANSAC algorithms for automatic detection of 3D building roof planes from Lidar data. In: Proceedings of the ISPRS Workshop on Laser Scanning. vol. 36, pp. 407–412 (2007)Google Scholar
- 19.Torr, P.H., Zisserman, A.: MLESAC: a new robust estimator with application to estimating image geometry. Comput. Vis. Image Underst. 78(1), 138–156 (2000)CrossRefGoogle Scholar
- 20.Zuliani, M.: RANSAC for Dummies. Vision Research Lab, University of California, Santa Barbara (2009)Google Scholar