Improving kNN Based Text Classification with Well Estimated Parameters

  • Heui Seok Lim
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3316)

Abstract

This paper proposes a method that improves the performance of kNN-based text classification by using well-estimated parameters. Several variants of the kNN method, with different decision functions, k values, and feature sets, are proposed and evaluated to find adequate parameters. Our experimental results show that carefully chosen parameters significantly improve the performance of the kNN method and reduce the size of the feature set. We cautiously conclude that tuning the parameters of the kNN method is well worth the effort as a way to increase performance, compared with struggling to develop a new learning method.
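The kind of parameter search described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration, since the abstract does not specify the details: the two decision functions (plain majority vote and similarity-weighted vote), document-frequency ranking as the feature selector, cosine similarity over term counts, leave-one-out accuracy as the score, and the toy corpus `DOCS` are all hypothetical stand-ins, not the paper's actual setup.

```python
import math
from collections import Counter
from itertools import product

def vectorize(text, vocab):
    """Term-frequency vector restricted to the chosen feature set."""
    return Counter(t for t in text.lower().split() if t in vocab)

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

def top_features(texts, n):
    """Assumed feature selection: keep the n terms with the highest document frequency."""
    df = Counter(t for text in texts for t in set(text.lower().split()))
    ranked = sorted(df.items(), key=lambda item: (-item[1], item[0]))  # ties broken alphabetically
    return {t for t, _ in ranked[:n]}

def knn_predict(query, train, k, decision):
    """Classify `query` from `train`, a list of (vector, label) pairs.

    decision == "majority": each of the k neighbours casts one vote.
    decision == "weighted": each neighbour votes with its cosine similarity.
    """
    neighbours = sorted(((cosine(query, v), lab) for v, lab in train), key=lambda x: -x[0])[:k]
    scores = Counter()
    for sim, lab in neighbours:
        scores[lab] += sim if decision == "weighted" else 1.0
    # Iterate labels in sorted order so that score ties resolve deterministically.
    return max(sorted(scores), key=lambda lab: scores[lab])

def loo_accuracy(docs, k, decision, n_features):
    """Leave-one-out accuracy for one parameter setting (features chosen without labels)."""
    vocab = top_features([text for text, _ in docs], n_features)
    vecs = [(vectorize(text, vocab), lab) for text, lab in docs]
    hits = 0
    for i, (vec, lab) in enumerate(vecs):
        rest = vecs[:i] + vecs[i + 1:]
        hits += knn_predict(vec, rest, k, decision) == lab
    return hits / len(vecs)

def grid_search(docs, ks, decisions, feature_sizes):
    """Score every (k, decision, n_features) combination and return the best one."""
    results = {
        (k, d, n): loo_accuracy(docs, k, d, n)
        for k, d, n in product(ks, decisions, feature_sizes)
    }
    best = max(results, key=results.get)
    return best, results

# Hypothetical toy corpus, only to make the sketch runnable.
DOCS = [
    ("stock market shares rise", "finance"),
    ("bank raises interest rates", "finance"),
    ("market investors buy shares", "finance"),
    ("team wins football match", "sports"),
    ("player scores goal in match", "sports"),
    ("football team signs player", "sports"),
]

best, results = grid_search(DOCS, ks=(1, 3), decisions=("majority", "weighted"), feature_sizes=(5, 20))
print("best (k, decision, n_features):", best, "accuracy:", results[best])
```

Comparing the entries of `results` across `n_features` is one way to see whether a smaller feature set can match the accuracy of a larger one, which is the trade-off the abstract refers to. On a real corpus, a held-out test split would be a more reliable score than leave-one-out on the tuning data.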

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Heui Seok Lim (1)
  1. Dept. of Software, Hanshin University, Korea
