Label denoising based on Bayesian aggregation

  • Parsa Bagherzadeh
  • Hadi Sadoghi Yazdi
Original Article


Label noise is a common problem that affects supervised learning and can produce misleading results. It has been shown that switching as few as \(5\,\%\) of labels can noticeably degrade performance. Therefore, the true class of an instance must be distinguished from its observed label. Over the past decade, classification in the presence of label noise has attracted considerable interest. Several scholars have focused on kNN-based approaches for data cleansing. These approaches are often susceptible to high label noise rates: when a batch of instances with noisy labels is present, they may deteriorate the results. The problem arises because such methods have only a local view of the instances. An alternative is to take a global view, in which instances far from their respective classes are detected as noisy. A potential problem, however, is determining a threshold; an inappropriate threshold may cause a correctly labeled instance to be detected as noisy. In this paper a new method for label denoising based on Bayesian aggregation is proposed, which addresses the shortcomings of kNN-based approaches by aggregating the local and global views of the instances. This aggregation of local and global information leads to more robust and accurate detection of instances with noisy labels and estimation of their true labels. The experimental results show the capability and robustness of the proposed method.
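The idea of aggregating a local (kNN) view with a global (distance-to-class) view can be sketched as follows. This is an illustrative sketch only, assuming a kNN vote for the local evidence, a Gaussian distance-to-centroid likelihood for the global evidence, and a simple product-of-experts combination; the function names and formulas are assumptions for illustration, not the authors' exact derivation.

```python
import numpy as np

def local_posterior(X, y, i, k=5, n_classes=2):
    # Local view: Laplace-smoothed fraction of the k nearest
    # neighbours of instance i that carry each class label.
    d = np.linalg.norm(X - X[i], axis=1)
    d[i] = np.inf                       # exclude the instance itself
    nn = np.argsort(d)[:k]
    counts = np.bincount(y[nn], minlength=n_classes)
    return (counts + 1) / (k + n_classes)

def global_posterior(X, y, i, n_classes=2):
    # Global view: isotropic-Gaussian likelihood based on the
    # squared distance of instance i to each class centroid.
    logp = np.empty(n_classes)
    for c in range(n_classes):
        mu = X[y == c].mean(axis=0)
        logp[c] = -0.5 * np.sum((X[i] - mu) ** 2)
    p = np.exp(logp - logp.max())       # stable softmax
    return p / p.sum()

def aggregated_label(X, y, i, k=5, n_classes=2):
    # Combine the two views multiplicatively and renormalise
    # (a product-of-experts assumption, not the paper's formula).
    post = local_posterior(X, y, i, k, n_classes) \
         * global_posterior(X, y, i, n_classes)
    post /= post.sum()
    return post.argmax(), post

# Toy data: two well-separated clusters with one flipped label.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)),
               rng.normal(3, 0.3, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
y[5] = 1                                # inject label noise
label, post = aggregated_label(X, y, 5)
print(label)                            # prints 0 (the estimated true class)
```

Because both views are expressed as posteriors over classes, the aggregation needs no hand-tuned distance threshold: an instance is relabeled only when the combined evidence contradicts its observed label.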


Label noise Mislabeled data Bayesian aggregation Data cleansing Supervised learning 



The authors would like to thank anonymous reviewers for their valuable comments and suggestions.



Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  1. Department of Computer Engineering, Faculty of Engineering, Ferdowsi University of Mashhad, Mashhad, Iran
