Abstract
Soft computing techniques are effective in detecting noise in datasets, which otherwise causes misclassification. Classification assumes perfectly labeled data, but noise in real-world data can corrupt both the class labels assigned to instances and their input feature values. The presence of such noise complicates learning and degrades classifier performance. The present study provides a quantitative assessment of the impact of label noise and feature noise on classification performance in medical datasets, as noise handling has become an important aspect of research in data mining and its applications. Boosting of weak classifiers achieves high accuracy in classification problems. This study explores the performance of recent soft computing techniques in machine learning, namely weak learner-based boosting algorithms: adaptive boosting (AdaBoost), generalized tree boosting and extreme gradient boosting (XGBoost). These boosting algorithms are compared and analyzed at different label noise and feature noise levels (5%, 10%, 15% and 20%) on several medical datasets. The performance of the weak learners is measured in terms of accuracy and the equalized loss of accuracy (ELA).
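The experimental protocol the abstract describes (inject label noise at fixed rates, retrain boosting classifiers, report accuracy and the equalized loss of accuracy of Sáez et al. 2016, ELA_x% = (100 − A_x%) / A_0%) can be sketched as follows. This is a minimal illustration, not the authors' exact setup: the dataset, classifiers, split and the helper names `inject_label_noise` and `equalized_loss_of_accuracy` are assumptions for the sketch.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def inject_label_noise(y, rate, rng):
    """Flip the binary labels of a randomly chosen `rate` fraction of instances."""
    y_noisy = y.copy()
    idx = rng.choice(len(y), size=int(rate * len(y)), replace=False)
    y_noisy[idx] = 1 - y_noisy[idx]  # flip 0 <-> 1
    return y_noisy

def equalized_loss_of_accuracy(acc_noisy, acc_clean):
    """ELA_x% = (100 - A_x%) / A_0% with accuracies in percent; lower is better."""
    return (100.0 - 100.0 * acc_noisy) / (100.0 * acc_clean)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, y = load_breast_cancer(return_X_y=True)  # a stand-in medical dataset
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    for name, clf in [("AdaBoost", AdaBoostClassifier(random_state=0)),
                      ("GradientBoosting", GradientBoostingClassifier(random_state=0))]:
        # Baseline accuracy A_0% on the clean training labels.
        acc_clean = accuracy_score(y_te, clf.fit(X_tr, y_tr).predict(X_te))
        for rate in (0.05, 0.10, 0.15, 0.20):
            y_noisy = inject_label_noise(y_tr, rate, rng)
            acc = accuracy_score(y_te, clf.fit(X_tr, y_noisy).predict(X_te))
            ela = equalized_loss_of_accuracy(acc, acc_clean)
            print(f"{name} @ {int(rate * 100):2d}% label noise: "
                  f"acc={acc:.3f}, ELA={ela:.3f}")
```

Feature noise can be handled analogously, e.g. by replacing a fraction of feature values with draws from the feature's observed range before training.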
Ethics declarations
Conflict of interest
The authors declare they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent
Not applicable.
Additional information
Communicated by Sahul Smys.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Shanthini, A., Vinodhini, G., Chandrasekaran, R.M. et al. A taxonomy on impact of label noise and feature noise using machine learning techniques. Soft Comput 23, 8597–8607 (2019). https://doi.org/10.1007/s00500-019-03968-7