A heuristic hybrid instance reduction approach based on adaptive relative distance and k-means clustering

The Journal of Supercomputing

A Correction to this article was published on 29 April 2024

This article has been updated

Abstract

The k nearest neighbor (KNN) classifier is a well-known instance-based classifier. However, noisy and redundant samples make the KNN classifier and its improved variants inefficient in both running time and memory usage. Hybrid instance reduction approaches have been proposed as a solution, but they still suffer from the following issues: (a) the edition methods they adopt are susceptible to harmful samples around the tested sample; (b) they retain many internal samples, which contribute little to classification accuracy and/or lead to a low reduction rate; (c) they rely on more than one parameter. The chief contributions of this article are as follows: (a) a novel heuristic hybrid instance reduction approach based on adaptive relative distance and k-means clustering (HIRRDKM) is proposed to address the above issues; (b) a novel concept, the adaptive relative distance, is proposed and calculated for each sample; (c) a novel edition method based on adaptive relative distance is proposed in HIRRDKM to filter out harmful samples; (d) a novel condensing method based on adaptive relative distance and k-means clustering is proposed in HIRRDKM to obtain condensed borderline samples from the training set after harmful samples have been removed. Experiments show that (a) HIRRDKM outperforms 6 state-of-the-art hybrid instance reduction methods on real data sets from various fields in balancing the reduction rate against the classification accuracy of KNN-based classifiers; (b) the running time of HIRRDKM is competitive.
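To make the edition-then-condensing structure of a hybrid instance reduction method concrete, the following is a minimal Python sketch. It is not HIRRDKM: the abstract does not define the adaptive relative distance, so the edition step here is swapped for a Wilson-style edited nearest neighbor rule, and the condensing step simply keeps the real sample closest to each per-class k-means center (which tends to keep internal prototypes rather than the borderline samples the article targets). All function names, the parameters `k` and `clusters_per_class`, and the synthetic two-class data are assumptions made for illustration.

```python
import numpy as np

def pairwise_dist(A, B):
    """Euclidean distances between every row of A and every row of B."""
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)

def majority(labels):
    """Majority label per row; labels are assumed to be non-negative integers."""
    return np.array([np.bincount(row).argmax() for row in labels])

def knn_predict(X_train, y_train, X_query, k=1):
    """Plain KNN classification by majority vote."""
    nn = np.argsort(pairwise_dist(X_query, X_train), axis=1)[:, :k]
    return majority(y_train[nn])

def edit_enn(X, y, k=3):
    """Edition stand-in (Wilson-style ENN, not the article's method): drop a
    sample when its k nearest neighbors, excluding itself, outvote its label."""
    d = pairwise_dist(X, X)
    np.fill_diagonal(d, np.inf)          # a sample is not its own neighbor
    nn = np.argsort(d, axis=1)[:, :k]
    keep = majority(y[nn]) == y
    return X[keep], y[keep]

def kmeans(X, n_clusters, n_iter=20, seed=0):
    """Small Lloyd-style k-means; returns the cluster centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        labels = pairwise_dist(X, centers).argmin(axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):      # leave empty clusters where they are
                centers[c] = X[labels == c].mean(axis=0)
    return centers

def condense_kmeans(X, y, clusters_per_class=5):
    """Condensing stand-in: per class, keep the real sample nearest to each
    k-means center."""
    keep = []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        centers = kmeans(X[idx], min(clusters_per_class, len(idx)))
        keep.extend(idx[pairwise_dist(X[idx], centers).argmin(axis=0)])
    keep = np.unique(keep)
    return X[keep], y[keep]

# Synthetic two-class data (an assumption, not one of the article's data sets).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(4.0, 1.0, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

X_ed, y_ed = edit_enn(X, y)                   # step 1: filter harmful samples
X_red, y_red = condense_kmeans(X_ed, y_ed)    # step 2: condense what remains

# The two quantities the abstract trades off against each other.
reduction_rate = 1.0 - len(X_red) / len(X)
accuracy = (knn_predict(X_red, y_red, X, k=1) == y).mean()
print(f"reduction rate: {reduction_rate:.2f}, accuracy: {accuracy:.2f}")
```

Note that this sketch has two parameters (`k` and `clusters_per_class`), which is exactly the kind of multi-parameter dependence the abstract lists as issue (c) of existing hybrid approaches.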

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant 62306050.

Author information

Contributions

Junnan Li: Software, Conceptualization, Methodology, Formal analysis, Funding acquisition, Supervision, Project administration, and Writing - review & editing.

Qing Zhao: Methodology, Formal analysis, and Project administration.

Shuang Liu: Project administration, Methodology, and Writing - review & editing.

Corresponding author

Correspondence to Qing Zhao.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original online version of this article was revised: Junnan Li was incorrectly denoted as the corresponding author; the corresponding author should have been Qing Zhao.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (RAR 4089 kb)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, J., Zhao, Q. & Liu, S. A heuristic hybrid instance reduction approach based on adaptive relative distance and k-means clustering. J Supercomput (2024). https://doi.org/10.1007/s11227-023-05885-x

  • DOI: https://doi.org/10.1007/s11227-023-05885-x
