A Study on the Noise Label Influence in Boosting Algorithms: AdaBoost, GBM and XGBoost

  • Anabel Gómez-Ríos
  • Julián Luengo
  • Francisco Herrera
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10334)

Abstract

In classification, class noise refers to the incorrect labelling of instances, and it degrades classifier performance. In this contribution, we test the noise resistance of the most influential boosting algorithms. We explain the fundamentals of these state-of-the-art algorithms, providing a unified notation to facilitate their comparison. We analyse how they carry out the classification, which loss functions they use, and which techniques they employ under the boosting scheme.

Keywords

Class noise, Boosting, Classification

Notes

Acknowledgments

This work was supported by the National Research Project TIN2014-57251-P and Andalusian Research Plan P11-TIC-7765.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Anabel Gómez-Ríos (1)
  • Julián Luengo (1)
  • Francisco Herrera (1)
  1. Department of Computer Science and Artificial Intelligence, University of Granada, CITIC-UGR, Granada, Spain