Integrating Structure and Meaning: A New Method for Encoding Structure for Text Classification
Current representation schemes for automatic text classification treat documents as syntactically unstructured collections of words or ‘concepts’. Past attempts to encode syntactic structure have treated part-of-speech information as another word-like feature, but have been shown to be less effective than non-structural approaches. We propose a new representation scheme using Holographic Reduced Representations (HRRs) as a technique to encode both semantic and syntactic structure. This method improves on previous attempts in the literature by encoding the structure across all features of the document vector while preserving text semantics. Our method does not increase the dimensionality of the document vectors, allowing for efficient computation and storage. We present classification results of our HRR text representations versus Bag-of-Concepts representations and show that our method of including structure improves text classification results.
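To illustrate why an HRR-based encoding need not increase dimensionality, the following is a minimal sketch of the standard HRR binding operation (circular convolution) and its approximate inverse. It is not the authors' implementation. The vector size `n`, the names `word`, `tag`, and `other`, and the idea of binding a term vector to a part-of-speech vector are assumptions made only for illustration.

```python
import math
import random


def random_vector(n, rng):
    """Random HRR vector with elements drawn i.i.d. from N(0, 1/n)."""
    sigma = 1.0 / math.sqrt(n)
    return [rng.gauss(0.0, sigma) for _ in range(n)]


def bind(a, b):
    """Circular convolution: c[k] = sum_j a[j] * b[(k - j) mod n]."""
    n = len(a)
    return [sum(a[j] * b[(k - j) % n] for j in range(n)) for k in range(n)]


def involution(a):
    """Approximate inverse used for unbinding: a_inv[i] = a[-i mod n]."""
    n = len(a)
    return [a[(-i) % n] for i in range(n)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


rng = random.Random(0)
n = 256                           # assumed dimensionality, for illustration only
word = random_vector(n, rng)      # hypothetical vector standing in for a term
tag = random_vector(n, rng)       # hypothetical vector standing in for a POS tag
other = random_vector(n, rng)     # unrelated vector, used as a control

bound = bind(word, tag)                        # same length as the inputs
recovered = bind(bound, involution(tag))       # noisy reconstruction of `word`

print(len(bound) == n)
print(cosine(recovered, word), cosine(recovered, other))
```

The bound vector has the same length as its inputs, and unbinding with the tag yields a vector that is noticeably closer to the original term vector than to an unrelated one. A direct O(n²) convolution is used here for readability; an FFT-based version would be the usual choice at larger dimensionalities.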
Keywords: Holographic Reduced Representations, Vector Space Model, Text Classification, Part of Speech Tagging, Random Indexing, Syntax, Semantics