Fast k Most Similar Neighbor Classifier for Mixed Data Based on a Tree Structure and Approximating-Eliminating

  • Selene Hernández-Rodríguez
  • J. A. Carrasco-Ochoa
  • J. Fco. Martínez-Trinidad
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5197)

Abstract

The k nearest neighbor (k-NN) classifier has been extensively used as a nonparametric technique in Pattern Recognition. However, in some applications where the training set is large, the exhaustive k-NN classifier becomes impractical. Therefore, many fast k-NN classifiers have been developed to avoid this problem. Most of these classifiers rely on metric properties, usually the triangle inequality, to reduce the number of prototype comparisons. However, in soft sciences, the prototypes are usually described by qualitative and quantitative features (mixed data), and sometimes the comparison function does not satisfy the triangle inequality. Therefore, in this work, a fast k most similar neighbor (k-MSN) classifier, which uses a Tree structure and an Approximating and Eliminating approach for Mixed Data, not based on metric properties (Tree AEMD), is introduced. The proposed classifier is compared against other fast k-NN classifiers.
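
For orientation, the exhaustive baseline that fast classifiers try to avoid can be sketched as follows. This is a minimal illustration and not the Tree AEMD method. The comparison function `mixed_dissimilarity`, the `ranges` convention (`None` for a qualitative feature, a numeric range otherwise), and the toy data are all assumptions introduced here; the abstract does not specify the comparison function used by the authors.

```python
from collections import Counter


def mixed_dissimilarity(a, b, ranges):
    """Hypothetical comparison function for mixed data (illustrative only).

    ranges[i] is None for a qualitative feature, or the numeric range of
    a quantitative feature used to normalize its absolute difference.
    """
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if r is None:
            # Qualitative feature: simple match / mismatch.
            total += 0.0 if x == y else 1.0
        else:
            # Quantitative feature: range-normalized absolute difference.
            total += abs(x - y) / r if r > 0 else 0.0
    return total / len(ranges)


def k_msn_classify(query, prototypes, labels, ranges, k=3):
    """Exhaustive k most similar neighbor vote.

    Compares the query against every prototype, which is the cost that
    fast classifiers aim to reduce.
    """
    scored = sorted(
        (mixed_dissimilarity(query, p, ranges), i)
        for i, p in enumerate(prototypes)
    )
    votes = Counter(labels[i] for _, i in scored[:k])
    return votes.most_common(1)[0][0]


# Toy training set: one qualitative and one quantitative feature.
prototypes = [("red", 1.0), ("red", 1.5), ("blue", 9.0), ("blue", 8.0)]
labels = ["A", "A", "B", "B"]
ranges = [None, 8.0]
```

Calling `k_msn_classify(("red", 2.0), prototypes, labels, ranges, k=3)` compares the query with all four prototypes. Nothing in this sketch relies on the triangle inequality, so it works with any comparison function, at the price of one comparison per prototype.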

Keywords

k-NN Classifier, Fast k-NN Classifiers, Mixed Data

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Selene Hernández-Rodríguez (1)
  • J. A. Carrasco-Ochoa (1)
  • J. Fco. Martínez-Trinidad (1)
  1. Computer Science Department, National Institute of Astrophysics, Optics and Electronics, Puebla, México
