A New Nearest Neighbor Rule for Text Categorization
The nearest neighbor (NN) rule is usually chosen in a large number of pattern recognition systems due to its simplicity and good properties. In particular, this rule has been successfully applied to text categorization. A vast number of NN algorithms have been developed during the last years. They differ in how they find the nearest neighbors, how they obtain the votes of categories, and which decision rule they use. A new NN classification rule which comes from the use of a different definition of neighborhood is introduced in this paper. The experimental results on Reuters-21578 standard benchmark collection show that our algorithm achieves better classification rates than the k-NN rule while decreasing classification time.
KeywordsNear Neighbor Text Categorization Spherical Region Pattern Recognition System Neighbor Rule
- 1.Sebastiani, F.: Text categorization. In: Text Mining and its Applications to Intelligence, CRM and Knowledge Management, WIT Press, Southampton (2005)Google Scholar
- 2.Duda, R., Hart, P., Stark, D.G.: Pattern Classification. Wiley-Interscience, Chichester (2000)Google Scholar
- 9.Yang, Y.: A study on thresholding strategies for text categorization. In: SIGIR 2001, 24th ACM International Conference on Research and Development in Information Retrieval, pp. 137–145. ACM Press, New York (2001)Google Scholar
- 10.Yang, Y.: Expert network: Effective and efficient learning from human decisions in text categorization and retrieval. In: SIGIR 1994, 17th ACM International Conference on Research and Development in Information Retrieval, Ireland, pp. 13–22 (1994)Google Scholar
- 13.Lewis, D.D.: An evaluation of phrasal and clustered representations on a text categorization task. In: SIGIR1992, 15th ACM International Conference on Research and Development in Information Retrieval, Denmark, pp. 37–50 (1992)Google Scholar