A First Study on the Use of Boosting for Class Noise Reparation

  • Pablo Morales Álvarez
  • Julián Luengo
  • Francisco Herrera
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9648)

Abstract

Class noise refers to the incorrect labeling of examples in classification, and is known to negatively affect the performance of classifiers. In this contribution, we propose a boosting-based hybrid algorithm that combines data removal and data reparation to deal with noisy instances. An experimental procedure to compare its performance against no preprocessing is developed and analyzed, laying the foundations for future work.
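The abstract does not spell out the algorithm, so the following is only an illustrative sketch of how boosting could drive a remove-or-repair decision, not the authors' method. It assumes binary labels in {-1, +1}, decision stumps as weak learners, and hypothetical parameters (`flag_factor`, `repair_margin`) that do not come from the paper. Instances whose AdaBoost weight, averaged over the rounds, stays well above the uniform weight are treated as suspects; a second ensemble trained on the remaining instances then either relabels a suspect (when it confidently contradicts the given label) or drops it.

```python
import math

def stump_predict(stump, x):
    """Predict +1/-1 with a one-feature threshold rule."""
    feature, threshold, polarity = stump
    return polarity if x[feature] >= threshold else -polarity

def best_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best_err, best = None, None
    for feature in range(len(X[0])):
        for threshold in sorted({x[feature] for x in X}):
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(stump, xi) != yi)
                if best_err is None or err < best_err:
                    best_err, best = err, stump
    return best_err, best

def adaboost(X, y, rounds):
    """Discrete AdaBoost; also returns each instance's weight averaged over rounds."""
    n = len(X)
    w = [1.0 / n] * n
    avg_w = [0.0] * n
    ensemble = []
    for _ in range(rounds):
        err, stump = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Misclassified instances gain weight, correctly classified ones lose it.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
        avg_w = [a + wi / rounds for a, wi in zip(avg_w, w)]
    return ensemble, avg_w

def margin(ensemble, x, label):
    """Normalised vote margin in [-1, 1]; negative means the ensemble disagrees."""
    total = sum(alpha for alpha, _ in ensemble)
    score = sum(alpha * stump_predict(s, x) for alpha, s in ensemble)
    return label * score / total

def repair_or_remove(X, y, rounds=3, flag_factor=2.0, repair_margin=0.5):
    """Hypothetical hybrid: flag by boosting weight, then relabel or drop."""
    n = len(X)
    _, avg_w = adaboost(X, y, rounds)
    suspect = [a > flag_factor / n for a in avg_w]
    clean_X = [xi for xi, s in zip(X, suspect) if not s]
    clean_y = [yi for yi, s in zip(y, suspect) if not s]
    ensemble, _ = adaboost(clean_X, clean_y, rounds)
    result = []
    for xi, yi, s in zip(X, y, suspect):
        if not s:
            result.append((xi, yi))       # trusted instance, kept as is
        elif margin(ensemble, xi, yi) <= -repair_margin:
            result.append((xi, -yi))      # confident disagreement: repair
        # otherwise the suspect is removed
    return result

# Toy data: the true class is +1 for x >= 5; the label at x = 1 is flipped.
X = [[v] for v in range(10)]
y = [-1, 1, -1, -1, -1, 1, 1, 1, 1, 1]
cleaned = repair_or_remove(X, y)
```

On this toy set only the instance at x = 1 is flagged, and it is relabeled to -1. Raising `repair_margin` above 1 makes repair impossible, so the same instance is removed instead. The sketch flags suspects by averaged weight rather than by the margin of the full ensemble because boosting tends to eventually fit mislabeled points, which would hide them from a margin test.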

Keywords

Data reparation, Data filtering, Class noise, Boosting, Classification

Notes

Acknowledgments

This work was supported by the National Research Project TIN2014-57251-P and Andalusian Research Plans P10-TIC-6858 and P11-TIC-7765. P.M. Álvarez also holds ICARO contract 135849 from the University of Granada.

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Pablo Morales Álvarez (1)
  • Julián Luengo (1)
  • Francisco Herrera (1)
  1. Department of Computer Science and Artificial Intelligence, CITIC-UGR, University of Granada, Granada, Spain
