Methods for Augmenting Semantic Models with Structural Information for Text Classification

  • Jonathan M. Fishbein
  • Chris Eliasmith
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4956)

Abstract

Current representation schemes for automatic text classification treat documents as syntactically unstructured collections of words or ‘concepts’. Past attempts to encode syntactic structure have treated part-of-speech information as another word-like feature, but have been shown to be less effective than non-structural approaches. Here, we investigate three methods to augment semantic modelling with syntactic structure, which encode the structure across all features of the document vector while preserving text semantics. We present classification results for these methods versus the Bag-of-Concepts semantic modelling representation to determine which method best improves classification scores.

Keywords

Vector Space Model; Text Classification; Parts of Speech Tagging; Syntactic Structure; Semantics

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Jonathan M. Fishbein (1)
  • Chris Eliasmith (1, 2)
  1. Department of Systems Design Engineering
  2. Department of Philosophy, Centre for Theoretical Neuroscience, University of Waterloo, Waterloo, Canada
