
Machine Learning, Volume 7, Issue 2–3, pp 195–225

Distributed representations, simple recurrent networks, and grammatical structure

  • Jeffrey L. Elman

Abstract

In this paper three problems for a connectionist account of language are considered:

  1. What is the nature of linguistic representations?
  2. How can complex structural relationships such as constituent structure be represented?
  3. How can the apparently open-ended nature of language be accommodated by a fixed-resource system?

Using a prediction task, a simple recurrent network (SRN) is trained on multiclausal sentences which contain multiply-embedded relative clauses. Principal component analysis of the hidden unit activation patterns reveals that the network solves the task by developing complex distributed representations which encode the relevant grammatical relations and hierarchical constituent structure. Differences between the SRN state representations and the more traditional pushdown store are discussed in the final section.
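
The architecture described above can be illustrated with a minimal sketch. It is not the author's code: it shows only the forward pass of a simple recurrent network, in which the hidden state is copied to context units and fed back at the next time step, followed by a power-iteration estimate of the first principal component of the recorded hidden states. The weights are random and untrained (the error-driven training on the prediction task is omitted), the softmax output layer is an assumption, and the six-word sentence, the class name SimpleRecurrentNetwork, and the helper first_principal_component are hypothetical names chosen for illustration.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.5):
    """Return a rows x cols matrix of small random weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Turn raw output scores into a probability distribution."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class SimpleRecurrentNetwork:
    """Forward pass only; weights are random, not trained."""

    def __init__(self, n_words, n_hidden, seed=0):
        rng = random.Random(seed)
        self.n_words = n_words
        self.w_in = make_matrix(n_hidden, n_words, rng)
        self.w_context = make_matrix(n_hidden, n_hidden, rng)
        self.w_out = make_matrix(n_words, n_hidden, rng)
        self.context = [0.0] * n_hidden

    def reset(self):
        """Clear the context units, e.g. at a sentence boundary."""
        self.context = [0.0] * len(self.context)

    def step(self, word_index):
        """Consume one word; return (hidden state, next-word distribution)."""
        one_hot = [0.0] * self.n_words
        one_hot[word_index] = 1.0
        from_input = matvec(self.w_in, one_hot)
        from_context = matvec(self.w_context, self.context)
        hidden = [1.0 / (1.0 + math.exp(-(a + b)))
                  for a, b in zip(from_input, from_context)]
        self.context = hidden[:]  # context units keep a copy of the hidden state
        return hidden, softmax(matvec(self.w_out, hidden))


def first_principal_component(vectors, iterations=200):
    """Estimate the leading principal direction by power iteration."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    centered = [[v[j] - mean[j] for j in range(d)] for v in vectors]
    direction = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        # Apply the (unnormalised) covariance matrix: X^T (X u).
        proj = [sum(c * u for c, u in zip(row, direction)) for row in centered]
        new = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in new))
        if norm == 0.0:  # all vectors identical: no variance to explain
            break
        direction = [x / norm for x in new]
    return direction


# Hypothetical toy sentence containing a relative clause.
sentence = ["boys", "who", "chase", "dogs", "see", "girls"]
vocab = sorted(set(sentence))
index = {word: i for i, word in enumerate(vocab)}

net = SimpleRecurrentNetwork(n_words=len(vocab), n_hidden=4)
hidden_states = [net.step(index[word])[0] for word in sentence]
pc1 = first_principal_component(hidden_states)
for word, state in zip(sentence, hidden_states):
    print(word, round(sum(h * p for h, p in zip(state, pc1)), 3))
```

With trained weights, plotting such projections word by word would trace a trajectory through hidden-unit state space, which is the kind of analysis the abstract refers to. With the random weights used here, the printed values carry no linguistic meaning.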

Keywords

Distributed representations, simple recurrent networks, grammatical structure

Copyright information

© Kluwer Academic Publishers 1991

Authors and Affiliations

  • Jeffrey L. Elman (1, 2)
  1. Department of Cognitive Science, University of California, San Diego
  2. Department of Linguistics, University of California, San Diego
