Efficient \(k\)-NN classification based on homogeneous clusters

Published in Artificial Intelligence Review

Abstract

The \(k\)-NN classifier is a widely used classification algorithm. However, exhaustively searching the whole dataset for the nearest neighbors is prohibitively expensive for large datasets. This paper proposes an efficient model for fast and accurate nearest neighbor classification. The model consists of a non-parametric, cluster-based preprocessing algorithm that constructs a two-level speed-up data structure, together with algorithms that access this structure to perform the classification. Furthermore, the paper demonstrates how the proposed model can improve performance on reduced sets built by various data reduction techniques. The proposed classification model was evaluated on eight real-life datasets and compared to known speed-up methods. The experimental results show that it is a fast and accurate classifier that, in addition, involves low preprocessing cost.
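The general idea behind cluster-based speed-up structures of this kind can be illustrated with a minimal sketch: recursively partition the training set with k-means until every cluster is class-homogeneous, keep one centroid per homogeneous cluster, and label a query with the class of its nearest centroid so only a small structure is searched instead of the full dataset. This is an illustrative simplification under assumed details (Euclidean distance, plain Lloyd's k-means, nearest-centroid labeling), not the authors' exact two-level algorithm; all function names here are hypothetical.

```python
import numpy as np

def build_homogeneous_clusters(X, y, k_clusters=2, max_iter=20, seed=0):
    """Recursively split the training set with k-means until every
    cluster is class-homogeneous. Returns a flat list of
    (centroid, class_label, member_points) tuples."""
    rng = np.random.default_rng(seed)
    leaves = []
    queue = [(X, y)]
    while queue:
        Xc, yc = queue.pop()
        if len(np.unique(yc)) == 1:          # already homogeneous: keep as leaf
            leaves.append((Xc.mean(axis=0), yc[0], Xc))
            continue
        k = min(k_clusters, len(Xc))
        centroids = Xc[rng.choice(len(Xc), size=k, replace=False)].astype(float)
        for _ in range(max_iter):            # Lloyd's k-means iterations
            assign = np.linalg.norm(Xc[:, None] - centroids[None], axis=2).argmin(axis=1)
            for j in range(k):
                if np.any(assign == j):
                    centroids[j] = Xc[assign == j].mean(axis=0)
        # final assignment with the converged centroids
        assign = np.linalg.norm(Xc[:, None] - centroids[None], axis=2).argmin(axis=1)
        for j in range(k):
            mask = assign == j
            if not mask.any():
                continue
            if mask.sum() == len(Xc):        # could not split further: majority-label leaf
                vals, counts = np.unique(yc, return_counts=True)
                leaves.append((centroids[j], vals[counts.argmax()], Xc))
            elif len(np.unique(yc[mask])) == 1:
                leaves.append((centroids[j], yc[mask][0], Xc[mask]))
            else:                            # mixed cluster: split it again
                queue.append((Xc[mask], yc[mask]))
    return leaves

def classify(leaves, q):
    """Label a query with the class of the nearest homogeneous cluster."""
    dists = [np.linalg.norm(q - c) for c, _, _ in leaves]
    return leaves[int(np.argmin(dists))][1]
```

On two well-separated point groups, `build_homogeneous_clusters` produces one leaf per class, and `classify` then answers queries by comparing against a handful of centroids rather than every training point, which is the source of the speed-up the abstract describes.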


Notes

  1. DRTs can be considered from two points of view: (1) item reduction and (2) dimensionality reduction. Here, we consider them from the first point of view.

  2. Detailed experimental results are available at http://users.uom.gr/~stoug/AIRJ_experiments.zip.

  3. http://sci2s.ugr.es/keel/datasets.php.


Author information

Corresponding author

Correspondence to Stefanos Ougiaroglou.

Additional information

Stefanos Ougiaroglou is supported by the Greek State Scholarships Foundation (IKY).


About this article

Cite this article

Ougiaroglou, S., Evangelidis, G. Efficient \(k\)-NN classification based on homogeneous clusters. Artif Intell Rev 42, 491–513 (2014). https://doi.org/10.1007/s10462-013-9411-1

