An kNN Model-Based Approach and Its Application in Text Categorization

Guo, Gongde; Wang, Hui; Bell, David; Bi, Yaxin; Greer, Kieran

doi:10.1007/978-3-540-24630-5_69


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2945)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics


Abstract

We investigate two well-known similarity-based learning approaches to text categorization: the k-nearest neighbor (k-NN) classifier and the Rocchio classifier. After identifying the weaknesses and strengths of each technique, we propose a new classifier, called the kNN model-based classifier, that unifies the strengths of the k-NN and Rocchio classifiers and adapts to the characteristics of text categorization problems.

A prototype text categorization system has been implemented and evaluated on two common document corpora: the 20-newsgroup collection and the ModApte version of the Reuters-21578 collection of news stories. The experimental results show that the kNN model-based approach outperforms both the k-NN and Rocchio classifiers.
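
The abstract names the two baseline classifiers without describing them. As a rough illustration only, the Python sketch below shows how each one is commonly set up over bag-of-words vectors with cosine similarity. It is not the authors' kNN model-based classifier, whose details are not given on this page. The toy documents, the labels, and every function name (`vectorize`, `knn_predict`, `rocchio_fit`, `rocchio_predict`) are assumptions made for the example, and the Rocchio part uses the simplified centroid-only variant, without the negative-example term of the full formulation.

```python
import math
from collections import Counter, defaultdict


def vectorize(text):
    """Hypothetical helper: raw term counts for a whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train, query, k=3):
    """k-NN: majority vote among the k training documents most similar to the query."""
    ranked = sorted(train, key=lambda item: cosine(item[0], query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def rocchio_fit(train):
    """Simplified Rocchio (assumption): one prototype per class, the mean of its vectors."""
    sums = defaultdict(Counter)
    counts = Counter()
    for vec, label in train:
        sums[label].update(vec)  # adds term counts into the class total
        counts[label] += 1
    return {
        label: {term: weight / counts[label] for term, weight in total.items()}
        for label, total in sums.items()
    }


def rocchio_predict(prototypes, query):
    """Assign the class whose prototype is most similar to the query."""
    return max(prototypes, key=lambda label: cosine(prototypes[label], query))


# Toy training set, invented for illustration.
docs = [
    ("stocks fall as markets slide", "finance"),
    ("bank raises interest rates", "finance"),
    ("markets rally on bank earnings", "finance"),
    ("team wins the cup final", "sport"),
    ("striker scores in cup match", "sport"),
    ("coach praises team after match", "sport"),
]
train = [(vectorize(text), label) for text, label in docs]
prototypes = rocchio_fit(train)

query = vectorize("bank shares and markets")
print(knn_predict(train, query, k=3))
print(rocchio_predict(prototypes, query))
```

The contrast the sketch brings out is that k-NN keeps every training document and compares the query against all of them at prediction time, while Rocchio compresses each class into a single prototype vector before any query arrives.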



Author information

Authors and Affiliations

School of Computing and Mathematics, University of Ulster, BT37 0QB, Newtownabbey, Northern Ireland, UK
Gongde Guo, Hui Wang & Kieran Greer
School of Computer Science, Queen’s University Belfast, Belfast, BT7 1NN, UK
David Bell & Yaxin Bi


Editor information

Editors and Affiliations

National Polytechnic Institute, Center for Computing Research, 07738, Mexico City, México
Alexander Gelbukh


Cite this paper

Guo, G., Wang, H., Bell, D., Bi, Y., Greer, K. (2004). An kNN Model-Based Approach and Its Application in Text Categorization. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2004. Lecture Notes in Computer Science, vol 2945. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24630-5_69


DOI: https://doi.org/10.1007/978-3-540-24630-5_69
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21006-1
Online ISBN: 978-3-540-24630-5
eBook Packages: Springer Book Archive
