RANSAC-GP: Dealing with Outliers in Symbolic Regression with Genetic Programming

  • Uriel López
  • Leonardo Trujillo
  • Yuliana Martinez
  • Pierrick Legrand
  • Enrique Naredo
  • Sara Silva
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10196)


Genetic programming (GP) has been shown to be a powerful tool for automatic modeling and program induction. It is often used to solve difficult symbolic regression tasks, with many examples in real-world domains. However, the robustness of GP-based approaches has not been substantially studied. In particular, the present work deals with the issue of outliers: data in the training set that represent severe errors in the measuring process. In general, a datum is considered an outlier when it deviates sharply from the true behavior of the system of interest. GP practitioners know that such data points usually bias the search and produce inaccurate models. Therefore, this work presents a hybrid methodology based on the RANdom SAmple Consensus (RANSAC) algorithm and GP, which we call RANSAC-GP. RANSAC is an approach for dealing with outliers in parameter estimation problems, and is widely used in computer vision and related fields. This work presents the first application of RANSAC to symbolic regression with GP, with impressive results. The proposed algorithm is able to deal with extreme amounts of contamination in the training set, evolving highly accurate models even when the proportion of outliers reaches 90%.
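To make the idea concrete, the following is a minimal sketch of a generic RANSAC loop with a pluggable model fitter. It is not the authors' implementation: the GP run is replaced here by an ordinary least-squares line fit so that the example stays short and dependency-free, and the names `fit_line`, `ransac`, `sample_size`, `threshold`, and `iterations`, as well as the synthetic data, are illustrative assumptions. In a RANSAC-GP style setup, the `fit` argument would presumably be a GP run that evolves a model on the sampled points.

```python
import random


def fit_line(points):
    """Least-squares fit of y = a*x + b.

    Assumption: this is a simple stand-in for a GP run, used only to
    keep the sketch self-contained.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx if sxx > 0 else 0.0
    b = mean_y - a * mean_x
    return lambda x: a * x + b


def ransac(data, fit, sample_size, threshold, iterations, rng):
    """Return the model with the largest consensus set, and that set."""
    best_model, best_inliers = None, []
    for _ in range(iterations):
        # 1. Fit a candidate model on a small random sample.
        sample = rng.sample(data, sample_size)
        candidate = fit(sample)
        # 2. Consensus set: points the candidate predicts within the threshold.
        inliers = [(x, y) for x, y in data if abs(candidate(x) - y) <= threshold]
        # 3. Keep the largest consensus set and refit the model on it.
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
            best_model = fit(inliers)
    return best_model, best_inliers


# Synthetic data (hypothetical): y = 2x + 1, with 90% of the points
# shifted upward by a large random offset to act as outliers.
rng = random.Random(0)
data = []
for i in range(100):
    x = i / 10
    y = 2 * x + 1
    if i % 10 != 0:
        y += rng.uniform(20, 100)
    data.append((x, y))

model, inliers = ransac(data, fit_line, sample_size=2,
                        threshold=0.1, iterations=5000, rng=rng)
print(len(inliers), model(0.0), model(1.0))
```

The number of iterations matters at high contamination levels: with 90% outliers, only about 1% of two-point samples are outlier-free, so many random samples are needed before a clean one is likely to be drawn.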


Genetic programming, RANSAC, Robust regression, Outliers



The first author was supported by CONACYT (México) scholarship No. 573397. This research was partially supported by CONACYT Basic Science Research Project No. 178323, CONACYT Fronteras de la Ciencia 2015-2 No. 944, as well as by the FP7-Marie Curie-IRSES 2013 European Commission program through project ACoBSEC, contract No. 612689. Sara Silva acknowledges project PERSEIDS (PTDC/EMS-SIS/0642/2014) and BioISI RD unit, UID/MULTI/04046/2013, funded by FCT/MCTES/PIDDAC, Portugal.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Uriel López (1)
  • Leonardo Trujillo (1)
  • Yuliana Martinez (1)
  • Pierrick Legrand (2, 3, 4)
  • Enrique Naredo (5)
  • Sara Silva (6, 7)

  1. Posgrado en Ciencias de la Ingenieria, Instituto Tecnológico de Tijuana, Unidad Otay, Blvd. Industrial, Tijuana, Mexico
  2. University of Bordeaux, Bordeaux, France
  3. IMB, UMR CNRS 5251, Talence, France
  4. INRIA Bordeaux Sud-Ouest, Talence, France
  5. Laboratorio Nacional de Geointeligencia (GeoINT), Centro de Investigación en Geografía y Geomática (CentroGeo), Aguascalientes, Mexico
  6. BioISI - Biosystems & Integrative Sciences Institute, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal
  7. CISUC, Department of Informatics Engineering, University of Coimbra, Coimbra, Portugal
