Text Classification: Combining Grouping, LSA and kNN vs Support Vector Machine

  • Naohiro Ishii
  • Takeshi Murai
  • Takahiro Yamada
  • Yongguang Bao
  • Susumu Suzuki
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4252)

Abstract

Text classification is a key technique for handling and organizing text data. Among well-known methods, the support vector machine (SVM) has been shown to give the best classification results. This paper proposes a method that groups similar words for the classification of documents. Applied to Reuters news, the grouping of words is shown to match Latent Semantic Analysis (LSA) in classification accuracy. Further, a new combined method is proposed, which consists of grouping and LSA followed by k-nearest-neighbor (kNN) classification. The combined method achieves higher classification accuracy than the conventional methods of kNN alone and of LSA followed by kNN, and almost the same accuracy as the SVM.
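The pipeline named in the abstract (grouping, then LSA, then kNN) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how similar words are grouped, so the hand-written `TERM_GROUPS` table, the toy term counts, the labels, and all function names below are assumptions made only for illustration. LSA is taken here to be a truncated SVD of the term-by-document matrix, and kNN uses cosine similarity with a majority vote.

```python
import numpy as np

# Hypothetical vocabulary and grouping table (assumed, not from the paper):
# terms:        oil, crude, barrel, wheat, grain, corn
TERM_GROUPS = [0,   0,     1,      2,     2,     3]

def group_terms(X, groups):
    """Sum the rows (terms) of a term-by-document matrix that share a group id."""
    n_terms = X.shape[0]
    G = np.zeros((max(groups) + 1, n_terms))
    G[groups, np.arange(n_terms)] = 1.0
    return G @ X

def lsa_fit(X, rank):
    """Truncated SVD of X; returns the top-`rank` left vectors and singular values."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :rank], s[:rank]

def lsa_project(X, U_k, s_k):
    """Fold documents (columns of X) into the latent space; one row per document."""
    return (X.T @ U_k) / s_k

def knn_predict(train_vecs, train_labels, query_vecs, k):
    """Majority vote among the k training documents with highest cosine similarity."""
    def normalize(A):
        norms = np.linalg.norm(A, axis=1, keepdims=True)
        return A / np.where(norms == 0, 1.0, norms)
    sims = normalize(query_vecs) @ normalize(train_vecs).T
    preds = []
    for row in sims:
        votes = {}
        for i in np.argsort(-row)[:k]:
            votes[train_labels[i]] = votes.get(train_labels[i], 0) + 1
        preds.append(max(votes, key=votes.get))
    return preds

# Toy term counts; each column is one training document.
X_train = np.array([
    [2, 0, 0, 0],   # oil
    [0, 2, 0, 0],   # crude
    [1, 1, 0, 0],   # barrel
    [0, 0, 2, 0],   # wheat
    [0, 0, 0, 2],   # grain
    [0, 0, 1, 1],   # corn
], dtype=float)
y_train = ["energy", "energy", "grain", "grain"]

# Two unseen documents: one uses "crude"/"barrel", the other "grain"/"corn".
X_query = np.array([
    [0, 0], [1, 0], [1, 0], [0, 0], [0, 1], [0, 1],
], dtype=float)

U_k, s_k = lsa_fit(group_terms(X_train, TERM_GROUPS), rank=2)
train_vecs = lsa_project(group_terms(X_train, TERM_GROUPS), U_k, s_k)
query_vecs = lsa_project(group_terms(X_query, TERM_GROUPS), U_k, s_k)
preds = knn_predict(train_vecs, y_train, query_vecs, k=3)
print(preds)
```

In this toy setup, grouping merges "oil" with "crude" and "wheat" with "grain" before the SVD, so documents that use different words from the same group end up with identical grouped vectors. The rank and `k` are arbitrary choices for the example, not values reported in the paper.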

Keywords

Support Vector Machine; Classification Accuracy; Text Classification; Latent Semantic Analysis; Vector Space Model
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Naohiro Ishii (1)
  • Takeshi Murai (1)
  • Takahiro Yamada (1)
  • Yongguang Bao (1)
  • Susumu Suzuki (1)

  1. Aichi Institute of Technology, Toyota, Japan
