Feature Selection in Text Classification Via SVM and LSI

  • Ziqiang Wang
  • Dexian Zhang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3971)

Abstract

Text classification is the problem of assigning a document to one or more predefined classes. One of the most interesting issues in text categorization is feature selection. This paper proposes a novel feature selection approach based on support vector machines (SVM) and latent semantic indexing (LSI), which identifies an LSI subspace that is suited to classification. Experimental results show that the proposed method achieves higher classification accuracy and requires less training and prediction time.
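
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of combining LSI with an SVM, not the authors' method. It assumes NumPy, a tiny hand-made term-document matrix, and a common heuristic that is an assumption here: rank LSI dimensions by the magnitude of the linear SVM weight they receive. The function names `lsi_project` and `train_linear_svm` are hypothetical.

```python
import numpy as np

def lsi_project(X, k):
    """Project documents (rows of X) onto the top-k right singular vectors."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:k].T

def train_linear_svm(Z, y, lam=0.01, lr=0.1, epochs=200):
    """Batch subgradient descent on the L2-regularised hinge loss, y in {-1, +1}."""
    w = np.zeros(Z.shape[1])
    b = 0.0
    n = len(y)
    for _ in range(epochs):
        active = y * (Z @ w + b) < 1      # documents that violate the margin
        grad_w = lam * w - Z[active].T @ y[active] / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy data (an assumption): rows are documents, columns are term counts.
# Terms 0-1 occur only in class +1, terms 2-3 only in class -1, term 4 in all.
X = np.array([
    [2, 1, 0, 0, 1],
    [1, 2, 0, 0, 1],
    [2, 2, 0, 0, 1],
    [0, 0, 2, 1, 1],
    [0, 0, 1, 2, 1],
    [0, 0, 2, 2, 1],
], dtype=float)
y = np.array([1, 1, 1, -1, -1, -1], dtype=float)

Z = lsi_project(X, k=3)                   # documents in a 3-dimensional LSI space
w, b = train_linear_svm(Z, y)
ranking = np.argsort(-np.abs(w))          # LSI dimensions, most discriminative first

selected = ranking[:1]                    # keep only the top-ranked dimension
Z_sel = Z[:, selected]
w_sel, b_sel = train_linear_svm(Z_sel, y)
accuracy = float(np.mean(np.sign(Z_sel @ w_sel + b_sel) == y))

print("ranking of LSI dimensions:", ranking)
print("training accuracy on selected subspace:", accuracy)
```

In this toy setting the leading LSI dimension mostly captures overall document length, while a later dimension separates the two classes. That illustrates why choosing LSI dimensions with a classifier can differ from simply keeping the largest singular values.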

Keywords

Support Vector Machine; Feature Selection; Information Retrieval; Text Categorization; Vector Space Model
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Ziqiang Wang (1)
  • Dexian Zhang (1)
  1. School of Information Science and Engineering, Henan University of Technology, Zheng Zhou, China
