Abstract
The \(k\)-NN classifier is a widely used classification algorithm. However, exhaustively searching the whole dataset for the nearest neighbors is prohibitive for large datasets because of its high computational cost. This paper proposes an efficient model for fast and accurate nearest neighbor classification. The model consists of a non-parametric, cluster-based preprocessing algorithm that constructs a two-level speed-up data structure, together with algorithms that access this structure to perform the classification. Furthermore, the paper demonstrates how the proposed model can improve performance on reduced sets built by various data reduction techniques. The proposed classification model was evaluated on eight real-life datasets and compared to known speed-up methods. The experimental results show that it is a fast and accurate classifier that, in addition, involves low preprocessing computational cost.
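To make the two-level idea concrete, the sketch below shows one simple way such a structure can work: level one holds cluster centroids, level two holds the items of each cluster, and a query is classified by exact \(k\)-NN restricted to its nearest cluster. This is a minimal illustration only, not the authors' algorithm: it uses plain k-means with a fixed number of clusters (the paper's preprocessing is non-parametric and builds homogeneous clusters), and all names and parameters here are illustrative.

```python
# Minimal sketch of a two-level cluster-based k-NN speed-up structure.
# NOT the paper's algorithm: uses fixed-k k-means and searches only the
# single nearest cluster, trading exactness for speed.
import numpy as np
from collections import Counter

def build_index(X, y, n_clusters=16, n_iter=20, seed=0):
    """X: (n, d) float array, y: (n,) labels.
    Returns level-1 centroids and level-2 per-cluster (items, labels)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), n_clusters, replace=False)]
    for _ in range(n_iter):
        # assign every item to its nearest centroid (squared Euclidean)
        assign = np.argmin(
            ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1), axis=1)
        for c in range(n_clusters):
            if np.any(assign == c):
                centroids[c] = X[assign == c].mean(axis=0)
    # drop empty clusters so every centroid has members at level 2
    nonempty = [c for c in range(n_clusters) if np.any(assign == c)]
    clusters = [(X[assign == c], y[assign == c]) for c in nonempty]
    return centroids[nonempty], clusters

def classify(q, centroids, clusters, k=3):
    # level 1: find the nearest centroid; level 2: exact k-NN in its cluster
    c = int(np.argmin(((centroids - q) ** 2).sum(-1)))
    Xc, yc = clusters[c]
    nearest = np.argsort(((Xc - q) ** 2).sum(-1))[:min(k, len(Xc))]
    return Counter(yc[nearest].tolist()).most_common(1)[0][0]

# toy usage
X = np.random.default_rng(1).random((1000, 4))
y = (X.sum(axis=1) > 2).astype(int)
centroids, clusters = build_index(X, y)
print(classify(X[0], centroids, clusters, k=3))
```

Restricting the search to a single cluster avoids scanning the whole dataset but can miss true neighbors near cluster boundaries; avoiding that accuracy loss without giving up the speed-up is precisely what motivates the paper's more careful, homogeneous-cluster construction.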
Notes
Data reduction techniques (DRTs) can be viewed from two perspectives: (1) item reduction and (2) dimensionality reduction. We consider them from the first perspective.
Detailed experimental results are available at http://users.uom.gr/~stoug/AIRJ_experiments.zip.
Additional information
Stefanos Ougiaroglou is supported by the Greek State Scholarships Foundation (IKY).
Cite this article
Ougiaroglou, S., Evangelidis, G. Efficient \(k\)-NN classification based on homogeneous clusters. Artif Intell Rev 42, 491–513 (2014). https://doi.org/10.1007/s10462-013-9411-1