A Method for Class Noise Detection Based on K-means and SVM Algorithms

  • Conference paper
Intelligent Software Methodologies, Tools and Techniques (SoMeT 2015)

Abstract

Noise filtering is one technique for improving the accuracy of an induced classifier. A classifier's prediction performance suffers when the dataset used to induce it is noisy, so detecting and removing noise is important for increasing classification accuracy. This paper proposes a model for detecting noise in datasets using k-means and support vector machine (SVM) techniques. The proposed model was tested on datasets from the University of California, Irvine (UCI) machine learning repository. Experimental results indicate that the proposed model can improve data quality and increase classification accuracy.
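
The abstract does not spell out how k-means and SVM are combined, so the following is only a generic sketch of one plausible way to pair them for class noise detection, not the authors' procedure. It assumes scikit-learn and NumPy are available; `flag_suspect_labels` and `make_demo_data` are hypothetical helper names, and the rule "flag an instance when both the cluster majority label and an out-of-fold SVM prediction contradict its recorded label" is an assumption made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC


def flag_suspect_labels(X, y, n_clusters, random_state=0):
    """Return indices whose label disagrees with both k-means and an SVM.

    Hypothetical helper for illustration; not the method from the paper.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    # Step 1: cluster without labels, then give each cluster its majority label.
    clusters = KMeans(n_clusters=n_clusters, n_init=10,
                      random_state=random_state).fit_predict(X)
    cluster_label = np.empty_like(y)
    for c in np.unique(clusters):
        members = clusters == c
        values, counts = np.unique(y[members], return_counts=True)
        cluster_label[members] = values[np.argmax(counts)]

    # Step 2: out-of-fold SVM predictions, so no instance is scored by a
    # model that was trained on its own (possibly wrong) label.
    svm = SVC(kernel="rbf", C=1.0, gamma="scale")
    svm_label = cross_val_predict(svm, X, y, cv=5)

    # Step 3 (assumed rule): flag only when both views contradict the label.
    suspect = (cluster_label != y) & (svm_label != y)
    return np.flatnonzero(suspect)


def make_demo_data(seed=0):
    """Two well-separated synthetic blobs with five deliberately flipped labels."""
    rng = np.random.default_rng(seed)
    X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
                   rng.normal(5.0, 0.5, size=(50, 2))])
    y_noisy = np.repeat([0, 1], 50)
    flipped = np.array([3, 17, 42, 61, 88])
    y_noisy[flipped] = 1 - y_noisy[flipped]
    return X, y_noisy, flipped


if __name__ == "__main__":
    X, y_noisy, flipped = make_demo_data()
    suspects = flag_suspect_labels(X, y_noisy, n_clusters=2)
    print("injected noise:", flipped.tolist())
    print("flagged:       ", suspects.tolist())
```

Requiring agreement between the two views is a conservative choice: it trades some recall for fewer clean instances being discarded. On real data such as the UCI datasets, the number of clusters and the SVM hyperparameters would need tuning, and flagged instances would be removed before retraining the final classifier.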

Acknowledgement

This work is supported by the Ministry of Education and Research Management Centre at the Universiti Teknologi Malaysia under the Research University Grant Scheme (Vote No. Q.J130000.2528.05H84).

Author information

Corresponding author

Correspondence to Zahra Nematzadeh.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Nematzadeh, Z., Ibrahim, R., Selamat, A. (2015). A Method for Class Noise Detection Based on K-means and SVM Algorithms. In: Fujita, H., Guizzi, G. (eds) Intelligent Software Methodologies, Tools and Techniques. SoMeT 2015. Communications in Computer and Information Science, vol 532. Springer, Cham. https://doi.org/10.1007/978-3-319-22689-7_23

  • DOI: https://doi.org/10.1007/978-3-319-22689-7_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-22688-0

  • Online ISBN: 978-3-319-22689-7

  • eBook Packages: Computer Science (R0)
