Fast k Most Similar Neighbor Classifier for Mixed Data Based on a Tree Structure and Approximating-Eliminating
The k nearest neighbor (k-NN) classifier has been extensively used as a nonparametric technique in pattern recognition. However, in applications where the training set is large, the exhaustive k-NN classifier becomes impractical, because every query must be compared against every prototype. Many fast k-NN classifiers have been developed to avoid this problem. Most of them rely on metric properties, usually the triangle inequality, to reduce the number of prototype comparisons. In the soft sciences, however, prototypes are usually described by both qualitative and quantitative features (mixed data), and the comparison function sometimes does not satisfy the triangle inequality. This work therefore introduces Tree AEMD, a fast k most similar neighbor (k-MSN) classifier that uses a tree structure and an approximating and eliminating approach for mixed data and does not rely on metric properties. The proposed classifier is compared against other fast k-NN classifiers.
Keywords: k-NN Classifier, Fast k-NN Classifiers, Mixed Data
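To make the setting in the abstract concrete, the following is a minimal sketch of the exhaustive k-MSN baseline that fast classifiers aim to speed up. It is not the Tree AEMD method. The comparison function `mixed_dissimilarity`, its per-feature normalization, its treatment of missing values (`None`) as "no evidence of difference", and the names `numeric_ranges` and `k_msn_classify` are all assumptions made for illustration. Under these assumptions the function can violate the triangle inequality, which is the situation where metric-based pruning is not valid.

```python
from collections import Counter


def mixed_dissimilarity(a, b, numeric_ranges):
    """Illustrative dissimilarity for mixed data (an assumption, not the paper's function).

    numeric_ranges maps the index of each quantitative feature to its value range.
    Features whose index is absent from numeric_ranges are treated as qualitative.
    """
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if x is None or y is None:
            continue  # missing value: contributes no difference (assumed convention)
        if i in numeric_ranges:
            total += abs(x - y) / numeric_ranges[i]  # range-normalized difference
        else:
            total += 0.0 if x == y else 1.0  # simple mismatch for qualitative features
    return total / len(a)


def k_msn_classify(query, prototypes, labels, k, numeric_ranges):
    """Exhaustive k-MSN: compare the query with every prototype, then majority vote."""
    ranked = sorted(
        zip(prototypes, labels),
        key=lambda pair: mixed_dissimilarity(query, pair[0], numeric_ranges),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: one quantitative feature (index 0) and one qualitative feature.
prototypes = [(1.70, "red"), (1.80, "red"), (1.20, "blue"), (1.25, "blue")]
labels = ["A", "A", "B", "B"]
numeric_ranges = {0: 0.6}

predicted = k_msn_classify((1.22, "blue"), prototypes, labels, 3, numeric_ranges)

# Triangle inequality check on three hypothetical objects, one with a missing value.
a, b, c = (0.0, "x"), (None, "x"), (0.6, "x")
d_ac = mixed_dissimilarity(a, c, numeric_ranges)
d_ab = mixed_dissimilarity(a, b, numeric_ranges)
d_bc = mixed_dissimilarity(b, c, numeric_ranges)
violates_triangle = d_ac > d_ab + d_bc
```

In this toy case `d_ac` exceeds `d_ab + d_bc`, so a pruning rule that relies on the triangle inequality could discard a prototype that is in fact among the most similar ones. The exhaustive search stays correct for any comparison function, but it costs one comparison per prototype for every query.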