Abstract
The k-nearest neighbor (KNN) classifier is one of the best-known instance-based classifiers. Nevertheless, noisy and redundant samples make low efficiency in both running time and memory usage a great challenge for the KNN classifier and its improvements. Although hybrid instance reduction approaches have been proposed as a good solution, they still suffer from the following issues: (a) the edition methods adopted in existing hybrid instance reduction approaches are susceptible to harmful samples around the tested sample; (b) existing approaches retain many internal samples, which contribute little to classification accuracy and/or lead to a low reduction rate; (c) existing approaches rely on more than one parameter. The chief contributions of this article are as follows: (a) a novel heuristic hybrid instance reduction approach based on adaptive relative distance and k-means clustering (HIRRDKM) is proposed to address the above issues; (b) a novel concept, the adaptive relative distance, is proposed and calculated for each sample; (c) a novel edition method based on the adaptive relative distance is proposed in HIRRDKM to filter out harmful samples; (d) a novel condensing method based on the adaptive relative distance and k-means clustering is proposed in HIRRDKM to obtain condensed borderline samples from the training set after harmful samples have been removed. Experiments show that (a) HIRRDKM outperforms six state-of-the-art hybrid instance reduction methods on real data sets from various fields when weighing reduction rate against the classification accuracy of KNN-based classifiers, and (b) its running time is competitive.
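The abstract does not define the adaptive relative distance itself, so the following is only a minimal sketch of the generic edit-then-condense pipeline the paper belongs to: a Wilson-style edited-nearest-neighbor pass stands in for the paper's edition step, and a crude nearest-opposite-class criterion stands in for its borderline-sample condensing. The function names (`edit_enn`, `kmeans`, `condense_borderline`) and all parameter choices are illustrative assumptions, not the authors' method.

```python
# Hedged sketch of an edit-then-condense instance reduction pipeline.
# NOT the paper's HIRRDKM: the adaptive relative distance is replaced by
# Wilson's ENN editing, and "borderline" is approximated crudely.
import numpy as np

def edit_enn(X, y, k=3):
    """Wilson-style editing: drop samples misclassified by their k NNs."""
    n = len(X)
    keep = np.ones(n, dtype=bool)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    for i in range(n):
        nn = np.argsort(d[i])[1:k + 1]            # exclude the sample itself
        votes = np.bincount(y[nn], minlength=y.max() + 1)
        if votes.argmax() != y[i]:                # majority of NNs disagrees
            keep[i] = False
    return X[keep], y[keep]

def kmeans(X, k, iters=20, seed=0):
    """Minimal Lloyd's k-means; returns a cluster label per sample."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(
            np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def condense_borderline(X, y, k=2):
    """Per class, cluster with k-means and keep each cluster's sample
    closest to the opposite class (a crude 'borderline' criterion).
    Assumes at least two classes are present."""
    keep_idx = []
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        other = X[y != c]
        labels = kmeans(X[idx], min(k, len(idx)))
        for j in np.unique(labels):
            members = idx[labels == j]
            # distance from each member to its nearest opposite-class sample
            d = np.linalg.norm(
                X[members][:, None] - other[None], axis=2).min(axis=1)
            keep_idx.append(members[d.argmin()])
    return X[keep_idx], y[np.array(keep_idx)]
```

Running the two stages in sequence (edit first, then condense the edited set) mirrors the hybrid structure the abstract describes; the retained set can then be used as the KNN reference set in place of the full training data.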
Change history
29 April 2024
A Correction to this paper has been published: https://doi.org/10.1007/s11227-024-06139-0
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grant 62306050.
Author information
Contributions
Junnan Li: Software, Conceptualization, Methodology, Formal analysis, Funding acquisition, Supervision, Writing - review & editing, Project administration. Qing Zhao: Methodology, Formal analysis, Project administration. Shuang Liu: Project administration, Methodology, Writing - review & editing.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The original online version of this article was revised: in this article, Junnan Li was incorrectly denoted as the corresponding author; the corresponding author should have been Qing Zhao.
Supplementary Information
Below is the link to the electronic supplementary material.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, J., Zhao, Q. & Liu, S. A heuristic hybrid instance reduction approach based on adaptive relative distance and k-means clustering. J Supercomput (2024). https://doi.org/10.1007/s11227-023-05885-x