A taxonomy on impact of label noise and feature noise using machine learning techniques
Soft computing techniques are effective at predicting the noise in a dataset that causes misclassification. Classification assumes perfectly labeled data, but in practice noise corrupts both the class labels assigned to instances and the values of their input features. Such noise complicates prediction on real-world data and degrades classifier performance. Because noise handling has become an important concern in data mining research and its applications, the present study uses machine learning to quantitatively assess the impact of label noise and feature noise on classification performance in medical datasets. Boosting weak classifiers generally yields high accuracy on classification problems. This study therefore examines recent soft computing techniques in machine learning, namely weak-learner-based boosting algorithms: adaptive boosting, generalized tree boosting and extreme gradient boosting. These boosting algorithms are compared and analyzed at different label and feature noise levels (5%, 10%, 15% and 20%) on distinct medical datasets, and their performance is measured in terms of accuracy and equalized loss of accuracy.
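The abstract does not state how noise was injected, so the following is only a minimal sketch of one common experimental setup, under stated assumptions. Label noise is assumed to be uniform class noise (a chosen fraction of instances get a different, randomly picked class). Feature noise is assumed to be uniform attribute noise (for each feature, a chosen fraction of values is replaced by a random value from that feature's observed range). The robustness score uses the equalized loss of accuracy as defined in the noise-robustness literature, ELA_x = (1 - A_x) / A_0, where A_0 is accuracy on clean data and A_x is accuracy at noise level x (fractions; lower means more robust). The function names are hypothetical and not taken from the study.

```python
import random
from typing import List, Sequence


def inject_label_noise(labels: Sequence[int], rate: float,
                       classes: Sequence[int], rng: random.Random) -> List[int]:
    """Assumed uniform class noise: round(rate * n) distinct instances
    receive a different class drawn uniformly from the remaining classes."""
    noisy = list(labels)
    n_flip = round(rate * len(noisy))
    for i in rng.sample(range(len(noisy)), n_flip):
        alternatives = [c for c in classes if c != noisy[i]]
        noisy[i] = rng.choice(alternatives)
    return noisy


def inject_feature_noise(rows: Sequence[Sequence[float]], rate: float,
                         rng: random.Random) -> List[List[float]]:
    """Assumed uniform attribute noise: for each feature column, round(rate * n)
    randomly chosen values are replaced by a value drawn uniformly from that
    column's observed [min, max] range. The input rows are not modified."""
    noisy = [list(r) for r in rows]
    n = len(noisy)
    n_corrupt = round(rate * n)
    for j in range(len(noisy[0])):
        col = [r[j] for r in rows]  # range taken from the clean data
        lo, hi = min(col), max(col)
        for i in rng.sample(range(n), n_corrupt):
            noisy[i][j] = rng.uniform(lo, hi)
    return noisy


def equalized_loss_of_accuracy(acc_noisy: float, acc_clean: float) -> float:
    """ELA_x = (1 - A_x) / A_0 with accuracies given as fractions in [0, 1]."""
    if acc_clean <= 0:
        raise ValueError("clean accuracy must be positive")
    return (1.0 - acc_noisy) / acc_clean


if __name__ == "__main__":
    rng = random.Random(0)
    y = [0, 1] * 10  # toy labels, 20 instances
    for rate in (0.05, 0.10, 0.15, 0.20):  # the noise levels named in the study
        y_noisy = inject_label_noise(y, rate, classes=[0, 1], rng=rng)
        changed = sum(a != b for a, b in zip(y, y_noisy))
        print(f"label noise {rate:.0%}: {changed} of {len(y)} labels flipped")
```

In a full experiment, each boosting model would be trained on the noisy copy of a dataset and scored on clean test data, and `equalized_loss_of_accuracy` would then compare its accuracy at each noise level with its clean-data accuracy. Sampling distinct indices guarantees that exactly `round(rate * n)` labels change, which keeps the nominal noise level and the actual corruption rate identical.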
Keywords: Feature noise · Label noise · Machine learning · Boosting
Compliance with ethical standards
Conflict of interest
The authors declare they have no conflict of interest.
This article does not contain any studies with human participants or animals performed by any of the authors.