Abstract
In this paper three problems for a connectionist account of language are considered:
1. What is the nature of linguistic representations?
2. How can complex structural relationships such as constituent structure be represented?
3. How can the apparently open-ended nature of language be accommodated by a fixed-resource system?
Using a prediction task, a simple recurrent network (SRN) is trained on multiclausal sentences which contain multiply-embedded relative clauses. Principal component analysis of the hidden unit activation patterns reveals that the network solves the task by developing complex distributed representations which encode the relevant grammatical relations and hierarchical constituent structure. Differences between the SRN state representations and the more traditional pushdown store are discussed in the final section.
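The procedure summarized above (train an SRN to predict the next word, then apply principal component analysis to its hidden unit activations) can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the toy corpus, the layer size, the tanh hidden units, the softmax output, and the learning rate are all chosen here for brevity, and the function names `train_srn`, `hidden_states`, and `pca_project` are hypothetical.

```python
import numpy as np

# Hypothetical toy corpus with number agreement across a relative clause.
sentences = ["boy chases dog .", "boys chase dog .",
             "boy who dogs chase runs .", "boys who dog chases run ."]
words = " ".join(sentences).split()
vocab = sorted(set(words))
tokens = [vocab.index(w) for w in words]

def train_srn(seq, vocab_size, hidden_size=8, epochs=50, lr=0.1, seed=0):
    """Train a simple recurrent network on next-token prediction.

    The context layer is a copy of the previous hidden state and is
    treated as a fixed input, so the error is not propagated back
    through time (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    W_xh = rng.normal(0.0, 0.5, (hidden_size, vocab_size))   # input -> hidden
    W_hh = rng.normal(0.0, 0.5, (hidden_size, hidden_size))  # context -> hidden
    W_hy = rng.normal(0.0, 0.5, (vocab_size, hidden_size))   # hidden -> output
    for _ in range(epochs):
        context = np.zeros(hidden_size)
        for cur, nxt in zip(seq[:-1], seq[1:]):
            # A one-hot input selects a single column of W_xh.
            h = np.tanh(W_xh[:, cur] + W_hh @ context)
            logits = W_hy @ h
            p = np.exp(logits - logits.max())
            p /= p.sum()
            # Gradient of the cross-entropy loss w.r.t. the logits.
            d_logits = p.copy()
            d_logits[nxt] -= 1.0
            d_h = (W_hy.T @ d_logits) * (1.0 - h ** 2)
            W_hy -= lr * np.outer(d_logits, h)
            W_xh[:, cur] -= lr * d_h
            W_hh -= lr * np.outer(d_h, context)
            context = h  # copy hidden state into the context layer
    return W_xh, W_hh, W_hy

def hidden_states(seq, weights):
    """Run the trained network and record the hidden state after each token."""
    W_xh, W_hh, _ = weights
    context = np.zeros(W_hh.shape[0])
    states = []
    for tok in seq:
        context = np.tanh(W_xh[:, tok] + W_hh @ context)
        states.append(context)
    return np.array(states)

def pca_project(states, k=2):
    """Project the states onto their first k principal components."""
    centered = states - states.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T

weights = train_srn(tokens, len(vocab))
trajectory = pca_project(hidden_states(tokens, weights), k=2)
```

Each row of `trajectory` is the network's internal state after one word, expressed in the plane of the two leading principal components. Plotting the rows for a single sentence gives a state-space path of the kind the analysis inspects; with a corpus this small, the result should not be expected to reproduce the paper's findings.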
Cite this article
Elman, J.L. Distributed representations, simple recurrent networks, and grammatical structure. Mach Learn 7, 195–225 (1991). https://doi.org/10.1007/BF00114844
DOI: https://doi.org/10.1007/BF00114844
Keywords
- Distributed representations
- simple recurrent networks
- grammatical structure