Abstract
The k-nearest neighbor (KNN) classifier is one of the best-known instance-based classifiers. Nevertheless, noisy and redundant samples make low efficiency in both running time and memory usage a great challenge for the KNN classifier and its improvements. Although hybrid instance reduction approaches have been proposed as a good solution, they still suffer from the following issues: (a) the edition methods adopted in existing hybrid instance reduction approaches are susceptible to harmful samples around the tested sample; (b) existing approaches retain many internal samples, which contribute little to classification accuracy and/or lead to a low reduction rate; (c) existing approaches rely on more than one parameter. The chief contributions of this article are as follows: (a) a novel heuristic hybrid instance reduction approach based on adaptive relative distance and k-means clustering (HIRRDKM) is proposed to address the above issues; (b) a novel concept, the adaptive relative distance, is proposed and calculated for each sample; (c) a novel edition method based on the adaptive relative distance is proposed in HIRRDKM to filter out harmful samples; (d) a novel condensing method based on the adaptive relative distance and k-means clustering is proposed in HIRRDKM to obtain condensed borderline samples from the training set after harmful samples have been removed. Experiments show that (a) HIRRDKM outperforms six state-of-the-art hybrid instance reduction methods on real data sets from various fields when weighing reduction rate against the classification accuracy of KNN-based classifiers, and (b) its running time is competitive.
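The abstract does not define the adaptive relative distance itself, so the following is only a minimal sketch of the generic edit-then-condense pipeline the paper belongs to: a Wilson-style edited-nearest-neighbor pass stands in for the paper's edition step, and a crude nearest-opposite-class criterion stands in for its borderline-sample condensing. The function names (`edit_enn`, `kmeans`, `condense_borderline`) and all parameter choices are illustrative assumptions, not the authors' method.

```python
# Hedged sketch of an edit-then-condense instance reduction pipeline.
# NOT the paper's HIRRDKM: the adaptive relative distance is replaced by
# Wilson's ENN editing, and "borderline" is approximated crudely.
import numpy as np

def edit_enn(X, y, k=3):
    """Wilson-style editing: drop samples misclassified by their k NNs."""
    n = len(X)
    keep = np.ones(n, dtype=bool)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    for i in range(n):
        nn = np.argsort(d[i])[1:k + 1]            # exclude the sample itself
        votes = np.bincount(y[nn], minlength=y.max() + 1)
        if votes.argmax() != y[i]:                # majority of NNs disagrees
            keep[i] = False
    return X[keep], y[keep]

def kmeans(X, k, iters=20, seed=0):
    """Minimal Lloyd's k-means; returns a cluster label per sample."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(
            np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def condense_borderline(X, y, k=2):
    """Per class, cluster with k-means and keep each cluster's sample
    closest to the opposite class (a crude 'borderline' criterion).
    Assumes at least two classes are present."""
    keep_idx = []
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        other = X[y != c]
        labels = kmeans(X[idx], min(k, len(idx)))
        for j in np.unique(labels):
            members = idx[labels == j]
            # distance from each member to its nearest opposite-class sample
            d = np.linalg.norm(
                X[members][:, None] - other[None], axis=2).min(axis=1)
            keep_idx.append(members[d.argmin()])
    return X[keep_idx], y[np.array(keep_idx)]
```

Running the two stages in sequence (edit first, then condense the edited set) mirrors the hybrid structure the abstract describes; the retained set can then be used as the KNN reference set in place of the full training data.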
Change history
29 April 2024
A Correction to this paper has been published: https://doi.org/10.1007/s11227-024-06139-0
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grant 62306050.
Author information
Contributions
Junnan Li: Software, Conceptualization, Methodology, Formal analysis, Funding acquisition, Supervision, Writing - review & editing, Project administration. Qing Zhao: Methodology, Formal analysis, Project administration. Shuang Liu: Project administration, Methodology, Writing - review & editing.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The original online version of this article was revised: in this article, Junnan Li was incorrectly denoted as the corresponding author; the corresponding author should have been Qing Zhao.
Supplementary Information
Below is the link to the electronic supplementary material.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, J., Zhao, Q. & Liu, S. A heuristic hybrid instance reduction approach based on adaptive relative distance and k-means clustering. J Supercomput (2024). https://doi.org/10.1007/s11227-023-05885-x