An kNN Model-Based Approach and Its Application in Text Categorization

Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2945)

Abstract

An investigation has been conducted on two well-known similarity-based learning approaches to text categorization: the k-nearest neighbor (k-NN) classifier and the Rocchio classifier. After identifying the strengths and weaknesses of each technique, we propose a new classifier, called the kNN model-based classifier, which unifies the strengths of the k-NN and Rocchio classifiers and adapts to the characteristics of text categorization problems.
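The abstract contrasts an instance-based classifier (k-NN, which keeps every training document) with a profile-based one (Rocchio, one centroid per class) and proposes a model between the two. The Python sketch below is illustrative only and rests on stated assumptions: documents are already dense term-weight vectors (for example TF-IDF) compared by cosine similarity, the Rocchio variant is simplified to plain class centroids with no negative-example term, and `build_reps` / `model_predict` are a hypothetical greedy-covering variant written for this note. The abstract does not specify how the paper builds its model or resolves overlaps, so this is not the authors' algorithm.

```python
import math
from collections import Counter
from typing import List, NamedTuple


def cosine(u, v):
    """Cosine similarity of two dense term-weight vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = math.sqrt(sum(a * a for a in u)), math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_predict(train, query, k=3):
    """k-NN: majority vote among the k training documents most similar to the query."""
    ranked = sorted(train, key=lambda d: cosine(d[0], query), reverse=True)
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]


def rocchio_fit(train):
    """Simplified Rocchio (assumption): one centroid per class, no negative-example term."""
    sums, counts = {}, Counter()
    for vec, label in train:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] += 1
    return {lab: [x / counts[lab] for x in acc] for lab, acc in sums.items()}


def rocchio_predict(centroids, query):
    """Assign the class whose centroid is most similar to the query."""
    return max(centroids, key=lambda lab: cosine(centroids[lab], query))


class Rep(NamedTuple):
    """Hypothetical representative: a centre document plus the region it covers."""
    vec: List[float]
    label: str
    radius: float  # lowest similarity between the centre and a covered document
    count: int     # number of training documents in the region


def build_reps(train):
    """Hypothetical greedy covering, for illustration only: repeatedly pick the document
    whose same-class neighbourhood covers the most not-yet-covered documents."""
    ungrouped, reps = set(range(len(train))), []
    while ungrouped:
        best = None
        for i in sorted(ungrouped):  # sorted for deterministic tie-breaking
            vec_i, lab_i = train[i]
            others = sorted((j for j in range(len(train)) if j != i),
                            key=lambda j: cosine(vec_i, train[j][0]), reverse=True)
            covered = [i]
            for j in others:  # grow the region until the first neighbour of another class
                if train[j][1] != lab_i:
                    break
                covered.append(j)
            gain = len(ungrouped.intersection(covered))
            if best is None or gain > best[0]:
                best = (gain, i, covered)
        _, i, covered = best
        radius = cosine(train[i][0], train[covered[-1]][0])
        reps.append(Rep(train[i][0], train[i][1], radius, len(covered)))
        ungrouped -= set(covered)
    return reps


def model_predict(reps, query):
    """Hypothetical decision rule: among regions containing the query, the largest wins;
    if none contains it, use the region whose boundary is closest in similarity."""
    inside = [r for r in reps if cosine(r.vec, query) >= r.radius]
    if inside:
        return max(inside, key=lambda r: r.count).label
    return max(reps, key=lambda r: cosine(r.vec, query) - r.radius).label
```

The three classifiers differ in what they store: k-NN compares a query with every training document, Rocchio with one centroid per class, and the covering sketch with a small set of class regions whose number and size adapt to the data. On six toy documents from two classes, for instance, the sketch keeps two regions instead of six documents.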

A prototype text categorization system has been implemented and evaluated on two common document corpora: the 20 Newsgroups collection and the ModApte version of the Reuters-21578 collection of news stories. The experimental results show that the kNN model-based approach outperforms both the k-NN and Rocchio classifiers.

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Guo, G., Wang, H., Bell, D., Bi, Y., Greer, K. (2004). An kNN Model-Based Approach and Its Application in Text Categorization. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2004. Lecture Notes in Computer Science, vol 2945. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24630-5_69

  • DOI: https://doi.org/10.1007/978-3-540-24630-5_69

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-21006-1

  • Online ISBN: 978-3-540-24630-5

  • eBook Packages: Springer Book Archive