Artificial Intelligence Review, Volume 33, Issue 4, pp 275–306

A study of the effect of different types of noise on the precision of supervised learning techniques

  • David F. Nettleton
  • Albert Orriols-Puig
  • Albert Fornells

Abstract

Machine learning techniques often have to deal with noisy data, which may affect the accuracy of the resulting data models. Effectively handling noise is therefore a key aspect of supervised learning if reliable models are to be obtained from data. Although several authors have studied the effect of noise on particular learners, comparisons of its effect across different learners are lacking. In this paper, we address this issue by systematically comparing how different degrees of noise affect four supervised learners that belong to different paradigms: the Naïve Bayes probabilistic classifier, the C4.5 decision tree, the IBk instance-based learner and the SMO support vector machine. These four methods allow us to contrast different learning paradigms, and all four are considered to be among the top ten algorithms in data mining (Wu et al. 2007). We test them on a collection of data sets that are perturbed with noise in the input attributes and noise in the output class. As an initial hypothesis, we assign the techniques to two groups, Naïve Bayes with C4.5 and IBk with SMO, according to their expected sensitivity to noise, the first group being the less sensitive. The analysis enables us to extract key observations about the effect of different types and degrees of noise on these learning techniques. In general, we find that Naïve Bayes is the most robust algorithm and SMO the least, relative to the other two techniques. However, the underlying empirical behavior of the techniques is more complex, and varies depending on the noise type and the specific data set being processed. In general, noise in the training data set is found to cause the learners the most difficulty.
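To make the perturbation protocol concrete, below is a minimal sketch of this kind of experiment. It assumes scikit-learn analogues of the four learners (GaussianNB for Naïve Bayes, DecisionTreeClassifier for C4.5, KNeighborsClassifier for IBk, and SVC for SMO) rather than the WEKA implementations used in the paper; the dataset, noise model and noise levels are illustrative assumptions, and it simplifies the study's design by injecting both noise types at once into the whole data set rather than separately into training and test partitions.

```python
# Illustrative sketch of a noise-perturbation experiment: scikit-learn
# stand-ins for the four learners compared in the paper. Dataset and
# noise levels are assumptions, not the paper's exact setup.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def add_class_noise(y, rate):
    """Flip a fraction `rate` of labels to a different class chosen at random."""
    y = y.copy()
    classes = np.unique(y)
    flip = rng.random(len(y)) < rate
    for i in np.flatnonzero(flip):
        y[i] = rng.choice(classes[classes != y[i]])
    return y

def add_attribute_noise(X, rate):
    """Perturb a fraction `rate` of attribute values with Gaussian noise,
    scaled by each attribute's standard deviation."""
    X = X.copy()
    mask = rng.random(X.shape) < rate
    X[mask] += rng.normal(0.0, X.std(axis=0), X.shape)[mask]
    return X

X, y = load_iris(return_X_y=True)
learners = {
    "NB": GaussianNB(),
    "C4.5-like tree": DecisionTreeClassifier(),
    "IBk (k=3)": KNeighborsClassifier(n_neighbors=3),
    # Rough analogue of SMO; WEKA's SMO defaults to a polynomial kernel.
    "SMO-like SVM": SVC(kernel="poly"),
}
for rate in (0.0, 0.1, 0.2, 0.4):
    Xn, yn = add_attribute_noise(X, rate), add_class_noise(y, rate)
    for name, clf in learners.items():
        acc = cross_val_score(clf, Xn, yn, cv=10).mean()
        print(f"noise={rate:.0%}  {name:15s} accuracy={acc:.3f}")
```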

Keywords

Attribute noise · Class noise · Machine learning techniques · Noise impacts


References

  1. Aha DW, Kibler D, Albert MK (1991) Instance-based learning algorithms. Mach Learn 6:37–66
  2. Aha DW (1992) Tolerating noisy, irrelevant and novel attributes in instance-based learning algorithms. Int J Man–Mach Stud 36:267–287
  3. Angluin D, Laird P (1988) Learning from noisy examples. Mach Learn 2(4):343–370
  4. Asuncion A, Newman DJ (2007) UCI repository of machine learning databases. University of California, Irvine. Available by anonymous ftp to ics.uci.edu in the pub/machine-learning-databases directory
  5. Friedman M (1937) The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J Am Stat Assoc 32:675–701
  6. Friedman M (1940) A comparison of alternative tests of significance for the problem of m rankings. Ann Math Stat 11:86–92
  7. Fürnkranz J (1997) Noise-tolerant windowing. In: Proceedings of the 15th international joint conference on artificial intelligence (IJCAI-97), Nagoya, Japan. Morgan Kaufmann, pp 852–857
  8. Goldman SA, Sloan RH (1995) Can PAC learning algorithms tolerate random attribute noise? Algorithmica 14(1):70–84
  9. Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. SIGKDD Explor 11(1):10–18
  10. Hunt EB, Marin J, Stone PJ (1966) Experiments in induction. Academic Press, New York
  11. John GH, Langley P (1995) Estimating continuous distributions in Bayesian classifiers. In: Proceedings of the eleventh conference on uncertainty in artificial intelligence, San Mateo, pp 338–345
  12. Kearns M (1998) Efficient noise-tolerant learning from statistical queries. J ACM 45(6):983–1006
  13. Meeson S, Blott BH, Killingback ALT (1996) EIT data noise evaluation in the clinical environment. Physiol Meas 17:A33–A38
  14. Nelson R (2005) Overcoming noise in data-acquisition systems (webcast). Test Meas World. http://www.tmworld.com/article/319648-Overcomming_noise_in_data_acquisition_systems.php
  15. Nettleton D, Torra V (2001) A comparison of active set method and genetic algorithm approaches for learning weighting vectors in some aggregation operators. Int J Intell Syst 16(9):1069–1083
  16. Nettleton D, Muñiz J (2001) Processing and representation of meta-data for sleep apnea diagnosis with an artificial intelligence approach. Int J Med Inform 63(1–2):77–89
  17. Platt J (1998) Fast training of support vector machines using sequential minimal optimization. In: Schölkopf B, Burges CJC, Smola AJ (eds) Advances in kernel methods: support vector learning, chap 12. MIT Press, pp 169–185
  18. Quinlan JR (1986) Induction of decision trees. Mach Learn 1:81–106
  19. Quinlan JR (1993) C4.5: programs for machine learning. Morgan Kaufmann, San Mateo
  20. Sloan R (1988) Types of noise in data for concept learning. In: Proceedings of the first annual workshop on computational learning theory. ACM SIGART, pp 91–96
  21. Sloan RH (1995) Four types of noise in data for PAC learning. Inform Process Lett 54(3):157–162
  22. Torra V (1997) The weighted OWA operator. Int J Intell Syst 12(2):153–166
  23. Vapnik VN (1995) The nature of statistical learning theory. Springer-Verlag, New York
  24. Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco
  25. Wu X, Kumar V, Quinlan JR, Ghosh J, Yang Q, Motoda H, McLachlan GJ, Ng A, Liu B, Yu PS, Zhou ZH, Steinbach M, Hand DJ, Steinberg D (2007) Top 10 algorithms in data mining. Knowl Inf Syst 14(1):1–37
  26. Zhu X, Wu X, Chen Q (2003) Eliminating class noise in large datasets. In: Proceedings of the 20th international conference on machine learning (ICML 2003), Washington, DC, pp 920–927
  27. Zhu X, Wu X (2004) Class noise vs. attribute noise: a quantitative study of their impacts. Artif Intell Rev 22:177–210

Copyright information

© Springer Science+Business Media B.V. 2010

Authors and Affiliations

  • David F. Nettleton (1, 2)
  • Albert Orriols-Puig (2)
  • Albert Fornells (2)
  1. Department of Technology, Pompeu Fabra University, Barcelona, Spain
  2. Grup de Recerca en Sistemes Intel·ligents, Enginyeria i Arquitectura La Salle, Universitat Ramon Llull, Barcelona, Spain
