A real-valued label noise cleaning method based on ensemble iterative filtering with noise score

  • Original Article
  • Published in: International Journal of Machine Learning and Cybernetics

Abstract

Real-world data inevitably contain noise for a variety of reasons. In regression tasks, noisy labels interfere with the construction of an accurate model, degrading prediction accuracy. Methods for handling continuous label noise remain limited compared with class noise cleaning techniques. To address this gap, we propose a novel noise filter that cleans instances with real-valued label noise by combining several filtering strategies. First, an iterative filtering process is carried out, which avoids using potentially noisy examples in each new filtering iteration. Second, we develop a noise score to assess the noise level of each detected noisy instance: the higher the noise score, the more likely the instance is noisy. Finally, an ensemble filtering scheme is implemented; fusing the detections of different models makes the identification of noisy examples more reliable. The validity of the proposed method is verified through extensive experiments. We discuss the selection of the best hyperparameters and compare the developed method with several state-of-the-art noise filters on public regression datasets. The outcomes show that our method not only achieves a good balance between eliminating noisy samples and retaining clean samples but also outperforms all the compared methods, especially at higher noise levels. A case study of temperature prediction in an electric arc furnace further suggests that training a domain-related regressor on a dataset preprocessed with the proposed noise filter greatly improves prediction accuracy.


Data availability

The data used in the experiments mainly come from the KEEL and UCI repositories; please refer to references [38] and [39] for details.

References

  1. Kang Z, Pan H, Hoi SCH et al (2020) Robust graph learning from noisy data. IEEE Trans Cybernet 50(5):1833–1843

  2. Sáez JA, Corchado E (2019) KSUFS: a novel unsupervised feature selection method based on statistical tests for standard and big data problems. IEEE Access 7:99754–99770

  3. Frenay B, Verleysen M (2014) Classification in the presence of label noise: a survey. IEEE Trans Neural Netw Learn Syst 25(5):845–869

  4. Zhu X, Wu X (2004) Class noise vs. attribute noise: a quantitative study. Artif Intell Rev 22:177–210

  5. Sáez JA, Galar M, Luengo J et al (2014) Analyzing the presence of noise in multi-class problems: alleviating its influence with the one-vs-one decomposition. Knowl Inf Syst 38:179–206

  6. Gamberger D, Lavrac N, Dzeroski S (1996) Noise elimination in inductive concept learning: a case study in medical diagnosis. In: proceedings of the 7th international workshop on algorithmic learning theory, pp 199–212

  7. García S, Luengo J, Herrera F (2016) Tutorial on practical tips of the most influential data preprocessing algorithms in data mining. Knowl Based Syst 98:1–29

  8. Sáez JA, Galar M, Luengo J et al (2016) INFFC: an iterative class noise filter based on the fusion of classifiers with noise sensitivity control. Inform Fusion 27:19–32

  9. Luengo J, Shim SO, Alshomrani S et al (2018) CNC-NOS: class noise cleaning by ensemble filtering and noise scoring. Knowl Based Syst 140:27–49

  10. Nematzadeh Z, Ibrahim R, Selamat A (2020) Improving class noise detection and classification performance: a new two-filter CNDC model. Appl Soft Comput 94:106428

  11. Li C, Sheng VS, Jiang L et al (2016) Noise filtering to improve data and model quality for crowdsourcing. Knowl Based Syst 107:96–103

  12. Jeatrakul P, Wong KW, Fung CC (2010) Data cleaning for classification using misclassification analysis. J Adv Comput Intell Intell Inf 14:297–302

  13. Algan G, Ulusoy I (2020) Image classification with deep learning in the presence of noisy labels: a survey. Knowl Based Syst 215:106771

  14. Wang Y, Liu W, Ma X, et al (2018) Iterative learning with open-set noisy labels. In: proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 8688–8696

  15. Tanaka D, Ikami D, Yamasaki T et al (2018) Joint optimization framework for learning with noisy labels. In: proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 5552–5560

  16. Yu X, Han B, Yao J et al (2019) How does disagreement help generalization against label corruption? In: international conference on machine learning, pp 7164–7173

  17. Kordos M, Blachnik M (2012) Instance selection with neural networks for regression problems. In: international conference on artificial neural networks, pp 263–270

  18. Martín J, Sáez JA, Corchado E (2021) On the regressand noise problem: model robustness and synergy with regression-adapted noise filters. IEEE Access 9:145800–145816

  19. González AA, Pastor JFD, Rodríguez JJ et al (2016) Instance selection for regression by discretization. Expert Syst Appl 54:340–350

  20. González AA, Pastor JFD, Rodríguez JJ et al (2016) Instance selection for regression: adapting DROP. Neurocomputing 201:66–81

  21. Verbaeten S, Van Assche A (2003) Ensemble methods for noise elimination in classification problems. In: Multiple classifier systems. Springer, Berlin, pp 317–325

  22. Khoshgoftaar TM, Rebours P (2007) Improving software quality prediction by noise filtering techniques. J Comput Sci Technol 22:387–396

  23. Gamberger D, Lavrac N, Dzeroski S (2000) Noise detection and elimination in data preprocessing: experiments in medical domains. Appl Artif Intell 14(2):205–223

  24. Berghout T, Mouss LH, Kadri O et al (2020) Aircraft engines remaining useful life prediction with an adaptive denoising online sequential extreme learning machine. Eng Appl Artif Intel 96:103936

  25. Lv M, Hong Z, Chen L et al (2020) Temporal multi-graph convolutional network for traffic flow prediction. IEEE Trans Intell Transp Syst 22(6):3337–3348

  26. Ge L, Wu K, Zeng Y et al (2020) Multi-scale spatiotemporal graph convolution network for air quality prediction. Appl Intell 51:3491–3505

  27. Shine P, Scully T, Upton J et al (2019) Annual electricity consumption prediction and future expansion analysis on dairy farms using a support vector machine. Appl Energy 250:1110–1119

  28. Kara F, Aslantaş K, Çiçek A (2016) Prediction of cutting temperature in orthogonal machining of AISI 316L using artificial neural network. Appl Soft Comput 38:64–74

  29. Wang RY, Storey VC, Firth CP (1995) A framework for analysis of data quality research. IEEE Trans Knowl Data Eng 7:623–640

  30. Fernandez JMM, Cabal VA, Montequin VR et al (2008) Online estimation of electric arc furnace tap temperature by using fuzzy neural networks. Eng Appl Artif Intel 21(7):1001–1012

  31. Brodley CE, Friedl MA (1999) Identifying mislabeled training data. J Artif Intell Res 11(1):131–167

  32. Sun J, Zhao F, Wang C et al (2007) Identifying and correcting mislabeled training instances. In: proceedings of the future generation communication and networking, pp 244–250

  33. Tomek I (1976) An experiment with the edited nearest-neighbor rule. IEEE Trans Syst Man Cybernet 6(6):448–452

  34. Aha DW, Kibler D, Albert MK (1991) Instance-based learning algorithms. Mach Learn 6:37–66

  35. Jiang G, Wang W, Qian Y et al (2021) A unified sample selection framework for output noise filtering: an error-bound perspective. J Mach Learn Res 22:1–66

  36. González AA, Blachnik M, Kordos M et al (2016) Fusion of instance selection methods in regression tasks. Inform Fusion 30:69–79

  37. Angelova A, Abu-Mostafa Y, Perona P (2005) Pruning training sets for learning of object categories. In: proceedings of the IEEE computer society conference on computer vision and pattern recognition, pp 494–501

  38. Fdez JA, Fernandez A, Luengo J et al (2011) KEEL data-mining software tool: data set repository, integration of algorithms and experimental analysis framework. J Mult Valued Logic Soft Comput 17(2–3):255–287

  39. Dheeru D, Graff C (2017) UCI Machine Learning Repository. http://archive.ics.uci.edu/ml. Accessed 2017

  40. Zhao L, Gkountouna O, Pfoser D (2019) Spatial auto-regressive dependency interpretable learning based on spatial topological constraints. ACM Trans Spat Algorithms Syst 5(3):1–28

  41. Acı CI, Akay MF (2015) A hybrid congestion control algorithm for broadcast-based architectures with multiple input queues. J Supercomput 71:1907–1931

  42. Zhou F, Claire Q, King RD (2014) Predicting the geographical origin of music. In: proceedings of the IEEE international conference on data mining, pp 1115–1120

  43. Kaya H, Tüfekci P, Uzun E (2019) Predicting CO and NOx emissions from gas turbines: novel data and a benchmark PEMS. Turk J Electr Eng Comput Sci 27(6):4783–4796

  44. Moro S, Rita P, Vala B (2016) Predicting social media performance metrics and evaluation of the impact on brand building: a data mining approach. J Bus Res 69(9):3341–3351

  45. Vergara A, Vembu S, Ayhan T et al (2012) Chemical gas sensor drift compensation using classifier ensembles. Sens Actuators B Chem 166:320–329

  46. Lujan IR, Fonollosa J, Vergara A et al (2014) On the calibration of sensor arrays for pattern recognition using the minimal number of experiments. Chemom Intell Lab Syst 130:123–134

  47. Hoseinzade E, Haratizadeh S (2019) CNNpred: CNN-based stock market prediction using a diverse set of variables. Expert Syst Appl 129:273–285

  48. Rafiei MH, Adeli H (2016) A novel machine learning model for estimation of sale prices of real estate units. J Constr Eng Manag 142(2):04015066

  49. Vito SDE, Massera E, Piga M et al (2008) On field calibration of an electronic nose for benzene estimation in an urban pollution monitoring scenario. Sens Actuators B Chem 129(2):750–757

  50. Fanaee TH, Gama J (2014) Event labeling combining ensemble detectors and background knowledge. Prog Artif Intell 2(2):113–127

  51. Huang GB, Zhu QY, Siew CK (2006) Extreme learning machine: theory and applications. Neurocomputing 70(1):489–501

  52. Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30

  53. García S, Fernández A, Luengo J et al (2010) Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: experimental analysis of power. Inform Sci 180(10):2044–2064

  54. Holm S (1979) A simple sequentially rejective multiple test procedure. Scand J Stat 6:65–70

  55. Hay T, Visuri VV, Aula M et al (2020) A review of mathematical process models for the electric arc furnace process. Steel Res Int 92(3):2000395

  56. Li C, Mao Z (2022) Generative adversarial network–based real-time temperature prediction model for heating stage of electric arc furnace. Trans Inst Meas Control 44(8):1669–1684

  57. Yuan P, Wang F, Mao Z (2006) Endpoint prediction of EAF based on G-SVM. J Iron Steel Res Int 18(10):7–10

  58. Fernandez JMM, Menendez C, Ortega FA et al (2009) A smart modelling for the casting temperature prediction in an electric arc furnace. Int J Comput Math 86(7):1182–1193

  59. Sismanis P (2019) Prediction of productivity and energy consumption in a consteel furnace using data-science models. In: proceedings of the 22nd international conference on business information systems, pp 85–99

Acknowledgements

This work was supported by the Key Program of National Natural Science Foundation of China (No. 51634002) and National Natural Science Foundation of China (No. 61773101).

Author information

Corresponding author

Correspondence to Zhizhong Mao.

Ethics declarations

Conflict of interest

The authors declare that there are no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, C., Mao, Z. & Jia, M. A real-valued label noise cleaning method based on ensemble iterative filtering with noise score. Int. J. Mach. Learn. & Cyber. (2024). https://doi.org/10.1007/s13042-024-02137-z

  • DOI: https://doi.org/10.1007/s13042-024-02137-z
