
Artificial Intelligence Review, Volume 42, Issue 3, pp 491–513

Efficient \(k\)-NN classification based on homogeneous clusters

  • Stefanos Ougiaroglou
  • Georgios Evangelidis

Abstract

The \(k\)-NN classifier is a widely used classification algorithm. However, exhaustively searching the whole dataset for the nearest neighbors is prohibitive for large datasets because of the high computational cost involved. This paper proposes an efficient model for fast and accurate nearest neighbor classification. The model consists of a non-parametric, cluster-based preprocessing algorithm that constructs a two-level speed-up data structure, together with algorithms that access this structure to perform the classification. Furthermore, the paper demonstrates how the proposed model can improve performance on reduced sets built by various data reduction techniques. The proposed classification model was evaluated on eight real-life datasets and compared to known speed-up methods. The experimental results show that it is a fast and accurate classifier that, in addition, involves low preprocessing computational cost.
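The two-level idea described in the abstract (cluster centroids at the top level, class-homogeneous member clusters below) can be illustrated with a rough sketch. This is not the authors' exact algorithm: the recursive k-means splitting, the helper names, and the nearest-centroid classification step below are assumptions made for illustration only.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means; returns final centroids and point assignments."""
    rng = np.random.default_rng(seed)
    cent = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.linalg.norm(X[:, None] - cent[None], axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                cent[j] = X[labels == j].mean(axis=0)
    return cent, labels

def build_homogeneous_clusters(X, y):
    """Recursively split the training data until every cluster holds a
    single class; return the two-level structure as a list of
    (centroid, class, member_points) triples."""
    queue, clusters = [(X, y)], []
    while queue:
        Xi, yi = queue.pop()
        classes = np.unique(yi)
        if len(classes) == 1:              # homogeneous: store centroid + class
            clusters.append((Xi.mean(axis=0), classes[0], Xi))
            continue
        k = min(len(classes), len(Xi))
        _, labels = kmeans(Xi, k)
        parts = [j for j in range(k) if np.any(labels == j)]
        if len(parts) <= 1:                # k-means failed to split; split by class
            for c in classes:
                queue.append((Xi[yi == c], yi[yi == c]))
        else:
            for j in parts:
                queue.append((Xi[labels == j], yi[labels == j]))
    return clusters

def classify(clusters, x):
    """Level 1: find the nearest centroid. Because clusters are
    homogeneous, the cluster's class can be returned directly (a k-NN
    search over that cluster's members would give the same answer)."""
    cents = np.array([c for c, _, _ in clusters])
    return clusters[np.linalg.norm(cents - x, axis=1).argmin()][1]
```

Searching only the centroids (and, at most, one cluster's members) replaces the exhaustive scan of the whole training set, which is where the claimed speed-up over conventional \(k\)-NN comes from.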

Keywords

Nearest neighbors · Classification · Clustering


Copyright information

© Springer Science+Business Media Dordrecht 2013

Authors and Affiliations

  1. Department of Applied Informatics, University of Macedonia, Thessaloniki, Greece
