Integrating Structure and Meaning: A New Method for Encoding Structure for Text Classification

  • Jonathan M. Fishbein
  • Chris Eliasmith
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4956)

Abstract

Current representation schemes for automatic text classification treat documents as syntactically unstructured collections of words or ‘concepts’. Past attempts to encode syntactic structure have treated part-of-speech information as another word-like feature, but have been shown to be less effective than non-structural approaches. We propose a new representation scheme using Holographic Reduced Representations (HRRs) as a technique to encode both semantic and syntactic structure. This method improves on previous attempts in the literature by encoding the structure across all features of the document vector while preserving text semantics. Our method does not increase the dimensionality of the document vectors, allowing for efficient computation and storage. We present classification results of our HRR text representations versus Bag-of-Concepts representations and show that our method of including structure improves text classification results.
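The abstract does not spell out the encoding, so the following is only a minimal sketch of the general HRR idea, assuming that each word's context vector is bound to a random vector for its part-of-speech tag by circular convolution (the standard HRR binding operation) and that the bound vectors are summed into the document vector. The names `circular_convolution`, `random_vector`, `encode_document`, the tag set, and the dimensionality of 512 are all hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np

def circular_convolution(a, b):
    """Bind two equal-length vectors by circular convolution, computed via FFT."""
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=len(a))

def random_vector(dim, rng):
    """Random vector with elements drawn from N(0, 1/dim), a common HRR choice."""
    return rng.normal(0.0, 1.0 / np.sqrt(dim), dim)

def encode_document(tagged_tokens, word_vectors, tag_vectors):
    """Sum of (tag bound to word) vectors; an assumed encoding, not the paper's."""
    dim = len(next(iter(tag_vectors.values())))
    doc = np.zeros(dim)
    for word, tag in tagged_tokens:
        # Binding spreads the tag information across every element of the
        # word vector, and the result has the same length as its inputs.
        doc += circular_convolution(tag_vectors[tag], word_vectors[word])
    return doc

rng = np.random.default_rng(0)
DIM = 512  # hypothetical dimensionality

# Stand-ins for semantic word vectors (e.g. from Random Indexing) and tag vectors.
word_vectors = {w: random_vector(DIM, rng) for w in ["dogs", "chase", "cats"]}
tag_vectors = {t: random_vector(DIM, rng) for t in ["NOUN", "VERB"]}

doc = encode_document(
    [("dogs", "NOUN"), ("chase", "VERB"), ("cats", "NOUN")],
    word_vectors,
    tag_vectors,
)
print(doc.shape)
```

Because circular convolution maps two vectors of length `DIM` to one vector of length `DIM`, the document vector stays the same size no matter how many tags are used, which is the property the abstract highlights.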

Keywords

Holographic Reduced Representations, Vector Space Model, Text Classification, Part of Speech Tagging, Random Indexing, Syntax, Semantics

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Jonathan M. Fishbein (1)
  • Chris Eliasmith (1, 2)
  1. Department of Systems Design Engineering
  2. Department of Philosophy, Centre for Theoretical Neuroscience, University of Waterloo, Waterloo, Canada
