Extracting Semantic Representations from Large Text Corpora

  • Malti Patel
  • John A. Bullinaria
  • Joseph P. Levy
Conference paper
Part of the Perspectives in Neural Computing book series (PERSPECT.NEURAL)


Many connectionist language processing models have now reached a level of detail at which more realistic representations of semantics are required. In this paper we discuss the extraction of semantic representations from the word co-occurrence statistics of large text corpora and present a preliminary investigation into the validation and optimisation of such representations. We find that there is significantly more variation across the extraction procedures and evaluation criteria than is commonly assumed.


Keywords: Window Size; Target Word; Lexical Decision; Semantic Representation; Distance Ratio
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag London Limited 1998

Authors and Affiliations

  • Malti Patel (1)
  • John A. Bullinaria (2)
  • Joseph P. Levy (2)

  1. Department of Computing, Macquarie University, Sydney, Australia
  2. Department of Psychology, Birkbeck College, London, UK
