Abstract
The success of machine learning relies on high-quality labeled training data. Incorrectly labeled examples in the training set can greatly reduce the performance of even the best classifier across a wide range of classification problems, and label noise is often more harmful than attribute noise. Unfortunately, large datasets almost always contain incorrect or inaccurate labels. This paper proposes a simple and effective method that can, to a certain extent, identify noisy data in text classification datasets. We analyze the characteristics of noisy data and design the method around the idea of ensemble learning, combining majority voting with an iterative procedure to pick out the noisy data hidden in a dataset. Under the same conditions, our method selects more noisy data, and we evaluate it by the recall and precision of the detected noise. The experimental results show that it outperforms some previous methods.
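The majority-voting step and the noise precision and recall mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy labels, and the strict "more than half disagree" rule are assumptions made for illustration, and the ensemble predictions are given directly instead of being produced by trained classifiers.

```python
from typing import Sequence, Set, Tuple


def majority_vote_noise(labels: Sequence[int],
                        predictions: Sequence[Sequence[int]]) -> Set[int]:
    """Return indices of samples suspected to be mislabeled.

    Assumed rule (hypothetical): sample i is flagged when more than half
    of the ensemble members predict a class different from labels[i].
    """
    n_models = len(predictions)
    suspected = set()
    for i, given in enumerate(labels):
        disagree = sum(1 for model_pred in predictions if model_pred[i] != given)
        if disagree * 2 > n_models:
            suspected.add(i)
    return suspected


def noise_precision_recall(suspected: Set[int],
                           true_noise: Set[int]) -> Tuple[float, float]:
    """Precision and recall of the flagged set against known noisy indices."""
    hits = len(suspected & true_noise)
    precision = hits / len(suspected) if suspected else 0.0
    recall = hits / len(true_noise) if true_noise else 0.0
    return precision, recall


# Toy data (made up): five samples, three ensemble members.
labels = [0, 1, 1, 0, 1]          # possibly noisy labels as given in the dataset
predictions = [
    [0, 1, 0, 0, 1],              # predictions of model 1
    [0, 1, 0, 1, 1],              # predictions of model 2
    [0, 0, 0, 0, 1],              # predictions of model 3
]
true_noise = {2, 3}               # indices whose labels were deliberately corrupted

suspected = majority_vote_noise(labels, predictions)
precision, recall = noise_precision_recall(suspected, true_noise)
print(suspected, precision, recall)
```

In an iterative variant, one would presumably remove or relabel the flagged samples, retrain the ensemble on the remaining data, and repeat until no new samples are flagged; that loop is omitted here because the abstract does not specify its stopping rule.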
This work was supported by the National Key R&D Program of China (Grant No. 2017YFB0803203) and the Shanghai Municipal Natural Science Foundation (Grant No. 15ZR1403700).
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chai, Y., Wu, C., Zeng, J. (2021). Detect Noisy Label Based on Ensemble Learning. In: Meng, H., Lei, T., Li, M., Li, K., Xiong, N., Wang, L. (eds) Advances in Natural Computation, Fuzzy Systems and Knowledge Discovery. ICNC-FSKD 2020. Lecture Notes on Data Engineering and Communications Technologies, vol 88. Springer, Cham. https://doi.org/10.1007/978-3-030-70665-4_199
DOI: https://doi.org/10.1007/978-3-030-70665-4_199
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-70664-7
Online ISBN: 978-3-030-70665-4
eBook Packages: Intelligent Technologies and Robotics (R0)