An Adaptive Hybrid and Cluster-Based Model for Speeding Up the k-NN Classifier
The k-Nearest Neighbors (k-NN) classifier is a well-known classification method. However, sequentially searching a large dataset for the nearest neighbors degrades its performance because of the high computational cost involved. This paper proposes a cluster-based classification model for speeding up the k-NN classifier. The model aims to reduce this cost as much as possible while keeping classification accuracy high. It consists of a simple data structure and a hybrid, adaptive algorithm that accesses this structure. Initially, a preprocessing clustering procedure builds the data structure. Then, the proposed algorithm, guided by user-defined acceptance criteria, attempts to classify an incoming item using the nearest cluster centroids. If that attempt fails, the incoming item is classified by searching for the k nearest neighbors within specific clusters. The proposed approach was tested on five real-life datasets. The results show that it can be used either to achieve high accuracy with some savings in cost or to reduce the cost to a minimum with slightly lower accuracy.
Keywords: k-NN classifier, cluster-based classification, data reduction
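The two-stage procedure outlined in the abstract can be illustrated with a minimal Python sketch. The abstract does not state which clustering procedure is used, what the acceptance criteria are, or which clusters are searched on failure, so everything concrete below is an assumption made for illustration: plain k-means with a deterministic start, an acceptance rule requiring at least `min_agree` of the `n_probe` nearest centroids to share a majority class, and a fallback k-NN search over the members of those same clusters. The function and parameter names (`build_structure`, `classify`, `n_probe`, `min_agree`) are hypothetical and do not come from the paper.

```python
from collections import Counter
from math import dist


def majority(labels):
    """Most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def build_structure(points, labels, n_clusters, n_iter=10):
    """Preprocessing: cluster the training set and keep, per cluster, its
    centroid, its members, and the majority class of its members.
    ASSUMPTION: plain k-means, n_clusters <= len(points)."""
    points = [tuple(p) for p in points]
    step = max(1, len(points) // n_clusters)
    centroids = [points[i * step] for i in range(n_clusters)]
    for _ in range(n_iter):
        groups = [[] for _ in centroids]
        for p, y in zip(points, labels):
            nearest = min(range(len(centroids)),
                          key=lambda c: dist(p, centroids[c]))
            groups[nearest].append((p, y))
        # Move each centroid to the mean of its members (keep it if empty).
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*(p for p, _ in g)))
            if g else c
            for g, c in zip(groups, centroids)
        ]
    return [
        {"centroid": c, "members": g, "label": majority([y for _, y in g])}
        for g, c in zip(groups, centroids) if g
    ]


def classify(x, clusters, k=3, n_probe=3, min_agree=3):
    """Hybrid step: try the nearest centroids first, fall back to k-NN.
    Returns (predicted label, which stage produced it)."""
    x = tuple(x)
    nearest = sorted(clusters, key=lambda c: dist(x, c["centroid"]))[:n_probe]
    label, votes = Counter(c["label"] for c in nearest).most_common(1)[0]
    if votes >= min_agree:  # ASSUMED acceptance criterion
        return label, "centroids"
    # Fallback: k-NN restricted to the members of the probed clusters.
    candidates = [m for c in nearest for m in c["members"]]
    candidates.sort(key=lambda m: dist(x, m[0]))
    return majority([y for _, y in candidates[:k]]), "k-NN"


# Toy data: two well-separated classes in the plane.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
labels = ["a"] * 4 + ["b"] * 4
clusters = build_structure(points, labels, n_clusters=2)

easy = classify((0.2, 0.3), clusters, k=3, n_probe=1, min_agree=1)
hard = classify((9, 9), clusters, k=3, n_probe=2, min_agree=2)
```

In this sketch, a loose acceptance rule (`min_agree=1`) lets the first query be answered from centroids alone, while a stricter rule (`min_agree=2` with two disagreeing centroids) forces the second query into the k-NN fallback. This mirrors the trade-off the abstract describes between lower cost and higher accuracy, although the actual criteria in the paper may differ.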