Detect Noisy Label Based on Ensemble Learning

Chai, Ying; Wu, Chengrong; Zeng, Jianping

doi:10.1007/978-3-030-70665-4_199

Part of the book series: Lecture Notes on Data Engineering and Communications Technologies ((LNDECT,volume 88))

Included in the following conference series:

The International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery

Abstract

The success of machine learning relies on high-quality labeled training data. Incorrectly labeled examples in the training data can greatly reduce the performance of even the best classifier across a wide range of classification problems, and noisy labels are often more harmful than noisy attributes. Unfortunately, large datasets almost always contain incorrect or inaccurate labels. This paper proposes a simple and effective method that can identify, to a certain extent, noisy data in text classification datasets. We analyze the characteristics of noisy data and design the new method around the idea of ensemble learning. The method combines majority voting with an iterative procedure to select the noisy data hidden in the dataset. Under the same conditions, our method selects more noisy data than previous methods, and we evaluate it by the recall and precision of noise detection. The experimental results show that this method performs better than some previous methods.
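
The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea it names: an ensemble of classifiers votes on each held-out example, examples that a majority of the ensemble disagrees with are flagged as noisy, and the procedure repeats on the remaining data. It is a generic cross-validated majority-vote filter, not the authors' method. The three toy classifiers, the names `vote_filter` and `noise_precision_recall`, the fold scheme, and the tiny dataset are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Multinomial naive Bayes with add-one smoothing (illustrative)."""
    counts, prior = defaultdict(Counter), Counter(labels)
    for doc, lab in zip(docs, labels):
        counts[lab].update(doc)
    vocab = {w for c in counts.values() for w in c}
    totals = {lab: sum(c.values()) + len(vocab) + 1 for lab, c in counts.items()}
    def predict(doc):
        def score(lab):
            return math.log(prior[lab]) + sum(
                math.log((counts[lab][w] + 1) / totals[lab]) for w in doc)
        return max(sorted(prior), key=score)
    return predict

def train_knn(docs, labels, k=3):
    """k-nearest neighbours on Jaccard similarity of token sets (illustrative)."""
    sets = [set(d) for d in docs]
    def predict(doc):
        q = set(doc)
        def sim(i):
            return len(q & sets[i]) / (len(q | sets[i]) or 1)
        nearest = sorted(range(len(sets)), key=lambda i: (-sim(i), i))[:k]
        return Counter(labels[i] for i in nearest).most_common(1)[0][0]
    return predict

def train_keyword(docs, labels):
    """Each token votes for the class whose documents contain it most often."""
    by_word = defaultdict(Counter)
    for doc, lab in zip(docs, labels):
        for w in set(doc):
            by_word[w][lab] += 1
    default = Counter(labels).most_common(1)[0][0]
    def predict(doc):
        votes = Counter(by_word[w].most_common(1)[0][0] for w in doc if w in by_word)
        return votes.most_common(1)[0][0] if votes else default
    return predict

def vote_filter(docs, labels, trainers, folds=3, max_rounds=5):
    """Iteratively flag examples whose label a majority of the ensemble rejects."""
    active, flagged = list(range(len(docs))), set()
    for _ in range(max_rounds):
        new = set()
        for f in range(folds):
            test = set(active[f::folds])           # held-out fold
            train = [i for i in active if i not in test]
            models = [t([docs[i] for i in train], [labels[i] for i in train])
                      for t in trainers]
            for i in test:
                wrong = sum(m(docs[i]) != labels[i] for m in models)
                if wrong * 2 > len(models):        # strict majority disagrees
                    new.add(i)
        if not new:                                # nothing new: stop iterating
            break
        flagged |= new
        active = [i for i in active if i not in new]
    return flagged

def noise_precision_recall(flagged, true_noisy):
    """Precision and recall of the flagged set against known noisy indices."""
    hits = len(flagged & true_noisy)
    precision = hits / len(flagged) if flagged else 0.0
    recall = hits / len(true_noisy) if true_noisy else 0.0
    return precision, recall

# Toy data (an assumption for illustration): index 4 is a sport text labeled "tech".
DOCS = [["goal", "match", "team"], ["code", "bug", "server"],
        ["team", "score", "goal"], ["server", "cpu", "code"],
        ["match", "score", "team"], ["bug", "cpu", "server"],
        ["goal", "score", "match"], ["cpu", "code", "bug"],
        ["team", "match", "score"], ["code", "server", "cpu"],
        ["score", "goal", "team"], ["bug", "code", "cpu"]]
LABELS = ["sport", "tech", "sport", "tech", "tech", "tech",
          "sport", "tech", "sport", "tech", "sport", "tech"]

flagged = vote_filter(DOCS, LABELS, [train_nb, train_knn, train_keyword])
print(flagged, noise_precision_recall(flagged, {4}))
```

On this toy data the only flagged index is the deliberately mislabeled example 4. Real text corpora would need proper feature extraction and stronger base classifiers, and the voting threshold trades precision against recall: requiring unanimous disagreement flags fewer examples than a simple majority.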

This work was supported by the National Key R&D Program of China (Grant No: 2017YFB0803203) and Shanghai Municipal Natural Science Foundation (Grant No. 15ZR1403700).



Author information

Authors and Affiliations

School of Computer Science, Fudan University, Shanghai, 200433, China
Ying Chai, Chengrong Wu & Jianping Zeng
Engineering Research Center of Cyber Security Auditing and Monitoring, Ministry of Education, Shanghai, 200433, China
Ying Chai, Chengrong Wu & Jianping Zeng

Corresponding author

Correspondence to Ying Chai.

Editor information

Editors and Affiliations

College of Engineering, Design and Physical Sciences, Brunel University London, Uxbridge, UK
Hongying Meng
School of Electronical Information and Artificial Engineering, Shaanxi University of Science and Technology, Xi’an, China
Tao Lei
College of Engineering, Design and Physical Sciences, Brunel University London, Uxbridge, UK
Maozhen Li
College of Electrical and Information, Hunan University, Changsha, China
Kenli Li
Division of Intelligent Future Technologies, Mälardalen University, Västerås, Västmanlands Län, Sweden
Ning Xiong
School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, Singapore
Lipo Wang

About this paper

Cite this paper

Chai, Y., Wu, C., Zeng, J. (2021). Detect Noisy Label Based on Ensemble Learning. In: Meng, H., Lei, T., Li, M., Li, K., Xiong, N., Wang, L. (eds) Advances in Natural Computation, Fuzzy Systems and Knowledge Discovery. ICNC-FSKD 2020. Lecture Notes on Data Engineering and Communications Technologies, vol 88. Springer, Cham. https://doi.org/10.1007/978-3-030-70665-4_199

DOI: https://doi.org/10.1007/978-3-030-70665-4_199
Published: 27 June 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-70664-7
Online ISBN: 978-3-030-70665-4
eBook Packages: Intelligent Technologies and Robotics (R0)
