Data Preprocessing and Finding Optimal Value of K for KNN Model

  • Conference paper
Soft Computing and Signal Processing (ICSCSP 2021)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1413)

Abstract

K-nearest neighbor (KNN) is a simple classifier used to classify medical data. Its performance depends on the data used for classification and on the number of neighbors considered (K). Data preprocessing is an important step in data mining because it improves the quality of the data; it includes data cleaning (removing duplicates and noise), data normalization, and feature selection, among other steps. In this paper, the data is preprocessed by removing the irrelevant attributes present in the dataset using a correlation matrix, and a suitable value of K is chosen for the KNN algorithm. Together, these steps help improve the performance of the KNN model.
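
The two steps named in the abstract, correlation-based attribute removal and the choice of K, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the dataset, the correlation threshold, or the procedure used to pick K. The sketch assumes that "irrelevant" means weakly correlated with the class label, uses a hypothetical threshold of 0.3, scores each candidate K by leave-one-out accuracy, and runs on made-up toy data. All function names are illustrative.

```python
import math
from collections import Counter

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def select_features(X, y, threshold=0.3):
    """Indices of columns whose |correlation| with the label reaches the threshold.
    The threshold is an assumption, not a value taken from the paper."""
    return [j for j, col in enumerate(zip(*X)) if abs(pearson(col, y)) >= threshold]

def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k training points nearest to query (Euclidean distance)."""
    nearest = sorted((math.dist(row, query), label) for row, label in zip(train_X, train_y))
    return Counter(label for _, label in nearest[:k]).most_common(1)[0][0]

def loo_accuracy(X, y, k):
    """Leave-one-out accuracy: classify each point using all the other points."""
    hits = 0
    for i in range(len(X)):
        hits += knn_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], k) == y[i]
    return hits / len(X)

def best_k(X, y, candidates):
    """Candidate K with the highest leave-one-out accuracy; ties go to the smaller K."""
    return max(candidates, key=lambda k: (loo_accuracy(X, y, k), -k))

# Toy data (made up): columns 0 and 1 separate the classes, column 2 is noise.
X = [
    [1.0, 2.0, 5.0], [1.2, 1.8, 1.0], [0.9, 2.2, 3.0], [1.1, 2.1, 4.0],
    [3.0, 4.0, 5.0], [3.2, 3.9, 1.0], [2.9, 4.2, 3.0], [3.1, 4.1, 4.0],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

kept = select_features(X, y)                        # drop weakly correlated columns
X_reduced = [[row[j] for j in kept] for row in X]   # keep only the selected columns
chosen_k = best_k(X_reduced, y, [1, 3, 5, 7])       # odd K avoids ties with two classes
print("kept columns:", kept, "chosen K:", chosen_k)
```

On real data, the candidate range for K and the threshold would need tuning, and a held-out test set should be kept separate from the data used to select K.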

Corresponding author

Correspondence to Roopashri Shetty.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Cite this paper

Shetty, R., Geetha, M., Acharya, D.U., Shyamala, G. (2022). Data Preprocessing and Finding Optimal Value of K for KNN Model. In: Reddy, V.S., Prasad, V.K., Wang, J., Reddy, K. (eds) Soft Computing and Signal Processing. ICSCSP 2021. Advances in Intelligent Systems and Computing, vol 1413. Springer, Singapore. https://doi.org/10.1007/978-981-16-7088-6_1
