
Dealing with Noisy Data

  • Salvador García
  • Julián Luengo
  • Francisco Herrera
Chapter
Part of the Intelligent Systems Reference Library book series (ISRL, volume 72)

Abstract

This chapter focuses on the noise imperfections of the data. The presence of noise in data is a common problem that produces several negative consequences in classification. Noise is unavoidable: it affects the data collection and data preparation processes of Data Mining applications, where errors commonly occur. The performance of the models built under such circumstances depends heavily on the quality of the training data, but also on the robustness of the learning algorithm itself against noise. Hence, problems containing noise are complex, and accurate solutions are often difficult to achieve without specialized techniques, particularly when the learners are noise-sensitive. Identifying the noise is a complex task that is developed in Sect. 5.1. Once the noise has been identified, the different kinds of this imperfection are described in Sect. 5.2. From this point on, the two main approaches followed in the literature are described: on the one hand, modifying and cleaning the data is studied in Sect. 5.3; on the other, designing noise-robust Machine Learning algorithms is tackled in Sect. 5.4. Finally, an empirical comparison of the latest approaches in the specialized literature is presented in Sect. 5.5.
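
The data cleaning route mentioned above (Sect. 5.3) is often implemented as a noise filter that removes training instances suspected of carrying label noise before a model is learned. Below is a minimal illustrative sketch of one common family of such filters, a majority-vote classification filter in the spirit of ensemble-based filtering; it is not the chapter's exact algorithm, and it assumes scikit-learn and NumPy are available, with the base classifiers chosen arbitrarily for illustration.

```python
# Illustrative sketch only (not the algorithms evaluated in this chapter):
# a majority-vote classification noise filter in the spirit of ensemble-based
# filtering. Assumes scikit-learn and NumPy; the base classifiers below are
# arbitrary example choices.
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

def majority_vote_filter(X, y, cv=5):
    """Remove instances whose label disagrees with the out-of-fold prediction
    of a majority of the base classifiers; return the cleaned data and the
    indices of the discarded (suspected noisy) instances."""
    X, y = np.asarray(X), np.asarray(y)
    base_learners = [
        DecisionTreeClassifier(random_state=0),
        KNeighborsClassifier(n_neighbors=3),
        LogisticRegression(max_iter=1000),
    ]
    # One "noise vote" per learner: 1 if the instance is misclassified
    # when predicted out-of-fold via cross-validation.
    votes = np.zeros(len(y), dtype=int)
    for clf in base_learners:
        preds = cross_val_predict(clf, X, y, cv=cv)
        votes += (preds != y).astype(int)
    noisy = votes > len(base_learners) // 2  # majority-vote criterion
    return X[~noisy], y[~noisy], np.flatnonzero(noisy)
```

In this sketch an instance is discarded when most of the base learners misclassify it in cross-validation; switching the criterion to require unanimous misclassification would give the more conservative consensus variant.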

Keywords

Class label, Noisy data, Attribute noise, Noise filter, Multiclass problem

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Salvador García (1)
  • Julián Luengo (2)
  • Francisco Herrera (3)

  1. Department of Computer Science, University of Jaén, Jaén, Spain
  2. Department of Civil Engineering, University of Burgos, Burgos, Spain
  3. Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
