A New Nearest Neighbor Rule for Text Categorization

  • Reynaldo Gil-García
  • Aurora Pons-Porrata
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4225)


The nearest neighbor (NN) rule is widely used in pattern recognition systems because of its simplicity and good properties. In particular, it has been successfully applied to text categorization. Many NN algorithms have been developed in recent years. They differ in how they find the nearest neighbors, how they obtain the votes of the categories, and which decision rule they use. This paper introduces a new NN classification rule based on a different definition of neighborhood. Experimental results on the standard Reuters-21578 benchmark collection show that our algorithm achieves better classification rates than the k-NN rule while decreasing classification time.
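The three design choices named in the abstract (how neighbors are found, how category votes are obtained, and which decision rule is applied) can be illustrated with a minimal sketch of the baseline k-NN rule. This is not the new rule proposed in the paper, whose neighborhood definition is not given in the abstract. The sketch assumes documents are sparse term-weight dictionaries, cosine similarity as the proximity measure, similarity-weighted voting, and a single-label decision; the names `cosine` and `knn_classify` and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_classify(doc, training, k=3):
    """Baseline k-NN rule (assumed variant: similarity-weighted, single label).

    training is a list of (vector, category) pairs.
    """
    if not training:
        raise ValueError("training set is empty")

    # Step 1: find the neighbors. Here: exhaustive search, keep the k most similar.
    scored = sorted(
        ((cosine(doc, vec), label) for vec, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    neighbors = scored[:k]

    # Step 2: obtain the votes. Here: each neighbor votes with its similarity.
    votes = defaultdict(float)
    for similarity, label in neighbors:
        votes[label] += similarity

    # Step 3: apply the decision rule. Here: the category with the largest vote.
    return max(votes, key=votes.get)


# Toy training set (made-up data, for illustration only).
training = [
    ({"wheat": 1.0, "grain": 1.0}, "grain"),
    ({"wheat": 1.0, "export": 1.0}, "grain"),
    ({"oil": 1.0, "crude": 1.0}, "crude"),
    ({"oil": 1.0, "price": 1.0}, "crude"),
]

predicted = knn_classify({"wheat": 1.0, "price": 0.5}, training, k=3)
```

Each of the three steps can be swapped independently, for example a thresholded multi-label decision in step 3, which is how NN variants of the kind the abstract describes differ from one another.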


Near Neighbor, Text Categorization, Spherical Region, Pattern Recognition System, Neighbor Rule



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Reynaldo Gil-García (1)
  • Aurora Pons-Porrata (1)
  1. Center of Pattern Recognition and Data Mining, Universidad de Oriente, Santiago de Cuba, Cuba
