Use of Linguistic Features in Context-Sensitive Text Classification

  • Alex K. S. Wong
  • John W. T. Lee
  • Daniel S. Yeung
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3930)


Many popular Text Classification (TC) models use the simple occurrence of words in a document as the features on which to base their classifications. By design, they commonly assume that word occurrences are statistically independent. Although this assumption does not hold in general, these TC models are robust and efficient at their task. Some recent studies have shown that context-sensitive TC approaches generally perform better. On the other hand, although complex linguistic or semantic features may intuitively seem more relevant to TC, studies of their effectiveness have produced mixed and inconclusive results. In this paper, we present our investigation into the use of some complex linguistic features with two context-sensitive TC methods. Our experimental results show the potential advantages of such an approach.
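To make the independence assumption concrete, the following is a minimal sketch, not the method evaluated in the paper: a Bernoulli naive Bayes classifier that uses only binary word-occurrence features and treats each word as independent given the class. The toy documents, the labels, and the names `TOY_DOCS`, `train`, and `classify` are all invented for illustration.

```python
import math
from collections import defaultdict

# Hypothetical toy corpus, invented purely for illustration.
TOY_DOCS = [
    ("stock market shares fall", "finance"),
    ("bank raises interest rates", "finance"),
    ("team wins football match", "sport"),
    ("river bank fishing match", "sport"),
]

def train(docs):
    """Estimate class priors and P(word occurs | class) from (text, label) pairs."""
    vocab = set()
    docs_by_class = defaultdict(list)
    for text, label in docs:
        words = set(text.lower().split())  # occurrence only, counts are ignored
        vocab |= words
        docs_by_class[label].append(words)
    model = {}
    for label, word_sets in docs_by_class.items():
        prior = len(word_sets) / len(docs)
        # Laplace smoothing keeps every probability strictly between 0 and 1.
        probs = {
            w: (sum(w in ws for ws in word_sets) + 1) / (len(word_sets) + 2)
            for w in vocab
        }
        model[label] = (prior, probs)
    return model, vocab

def classify(model, vocab, text):
    """Return the class with the highest log-posterior under independence."""
    words = set(text.lower().split())
    best_label, best_score = None, -math.inf
    for label, (prior, probs) in model.items():
        score = math.log(prior)
        # Independence assumption: each word contributes its own factor,
        # regardless of which other words appear around it.
        for w in vocab:
            score += math.log(probs[w] if w in words else 1.0 - probs[w])
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model, vocab = train(TOY_DOCS)
print(classify(model, vocab, "interest rates fall"))
print(classify(model, vocab, "football team wins"))
```

In this toy corpus the word "bank" occurs in both classes, so its occurrence alone carries no evidence either way. That is the kind of ambiguity that context-sensitive methods and sense-level features aim to resolve.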


Text Classification, Semantic Feature, Word Sense, Linguistic Feature, Word Sense Disambiguation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Alex K. S. Wong (1)
  • John W. T. Lee (1)
  • Daniel S. Yeung (1)
  1. Department of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong
