Machine Learning, Volume 2, Issue 1, pp 9–38

Learning syntax by automata induction

  • Robert C. Berwick
  • Sam Pilato


In this paper we propose an explicit computer model for learning natural language syntax based on Angluin's (1982) efficient induction algorithms, using a complete corpus of grammatical example sentences. We use these results to show how inductive inference methods may be applied to learn substantial, coherent subparts of at least one natural language, English, that are not susceptible to the kinds of learning envisioned in linguistic theory. As two concrete case studies, we show how to learn English auxiliary verb sequences (such as could be taking, will have been taking) and the sequences of articles and adjectives that appear before noun phrases (such as the very old big deer). Both systems can be acquired in a computationally feasible amount of time using either positive examples or, in an incremental mode, implicit negative examples (examples outside a finite corpus are considered to be negative examples). As far as we know, this is the first computer procedure that learns a full-scale range of noun subclasses and noun phrase structure. The generalizations and the time required for acquisition match our knowledge of child language acquisition for these two cases. More importantly, these results show that just where linguistic theories admit to highly irregular subportions, we can apply efficient automata-theoretic learning algorithms. Since the algorithm works only for fragments of language syntax, we do not believe that it suffices for all of language acquisition. Rather, we would claim that language acquisition is nonuniform and susceptible to a variety of acquisition strategies; this algorithm may be one of these.
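The kind of induction from positive examples described above can be illustrated with a minimal sketch. It assumes the simplest member of Angluin's (1982) family, zero-reversible inference; the abstract does not say which variant or parameters the authors use. The function names `infer_zero_reversible` and `accepts` and the four-sentence sample are hypothetical and are not taken from the paper's corpus. The sketch builds a prefix-tree acceptor, merges all accepting states, and then keeps merging states until the automaton is deterministic both forwards and backwards.

```python
from itertools import count


def infer_zero_reversible(sample):
    """Infer a zero-reversible acceptor from a non-empty set of positive examples.

    Returns (start, accept, delta), where delta maps (state, word) -> state.
    """
    # Step 1: prefix-tree acceptor, one state per distinct prefix.
    ids = count()
    root = next(ids)
    trans, finals = {}, set()
    for sentence in sample:
        state = root
        for word in sentence:
            if (state, word) not in trans:
                trans[(state, word)] = next(ids)
            state = trans[(state, word)]
        finals.add(state)

    # Union-find over states; each block is one state of the final machine.
    parent = {}

    def find(s):
        while parent.get(s, s) != s:
            s = parent[s]
        return s

    def union(a, b):
        a, b = find(a), find(b)
        if a == b:
            return False
        parent[max(a, b)] = min(a, b)
        return True

    # Step 2: a zero-reversible acceptor has a single accepting state.
    finals = sorted(finals)
    for f in finals[1:]:
        union(finals[0], f)

    # Step 3: merge until no block violates (reverse) determinism.
    changed = True
    while changed:
        changed = False
        forward, backward = {}, {}
        for (src, sym), dst in trans.items():
            s, d = find(src), find(dst)
            # Determinism: a block has one successor per word.
            changed |= union(forward.setdefault((s, sym), d), d)
            # Reverse determinism: a block has one predecessor per word.
            changed |= union(backward.setdefault((find(dst), sym), s), s)

    delta = {(find(src), sym): find(dst) for (src, sym), dst in trans.items()}
    return find(root), find(finals[0]), delta


def accepts(machine, sentence):
    """Run the inferred acceptor on a sequence of words."""
    start, accept, delta = machine
    state = start
    for word in sentence:
        if (state, word) not in delta:
            return False
        state = delta[(state, word)]
    return state == accept


# Hypothetical toy sample of auxiliary verb sequences (illustration only).
sample = [
    ("could", "take"),
    ("will", "take"),
    ("could", "be", "taking"),
    ("will", "have", "been", "taking"),
]
machine = infer_zero_reversible(sample)
```

On this toy sample the merged automaton generalizes beyond its input: `accepts(machine, ("will", "be", "taking"))` holds although that sequence was never presented, while an ill-formed sequence such as `("could", "been", "taking")` is rejected.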


Formal inductive inference, language acquisition, automata theory


  1. Akmajian, A., Steele, S., & Wasow, T. (1979). The category AUX in universal grammar. Linguistic Inquiry, 10, 1–64.
  2. Angluin, D. (1977). Inductive inference of formal languages from positive data. Information and Control, 45, 117–135.
  3. Angluin, D. (1982). Inference of reversible languages. Journal of the Association for Computing Machinery, 29, 741–765.
  4. Berwick, R. (1982). Locality principles and the acquisition of syntactic knowledge. Doctoral dissertation, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA.
  5. Berwick, R. (1985). The acquisition of syntactic knowledge. Cambridge, MA: MIT Press.
  6. Brown, R. (1973). A first language. Cambridge, MA: Harvard University Press.
  7. Fu, K., & Booth, T. (1975). Grammatical inference: Introduction and survey. IEEE Transactions on Systems, Man, and Cybernetics, 5, 95–111.
  8. Gleitman, L., & Wanner, E. (1982). Language acquisition: The state of the state of the art. In E. Wanner & L. Gleitman (Eds.), Language acquisition: The state of the art. New York: Cambridge University Press.
  9. Gold, E. M. (1967). Language identification in the limit. Information and Control, 10, 447–474.
  10. Gold, E. M. (1978). Complexity of automaton identification from given data. Information and Control, 37, 302–320.
  11. Gonzalez, R. C., & Thomason, M. G. (1978). Syntactic pattern recognition. Reading, MA: Addison-Wesley.
  12. Jackendoff, R. (1977). X̄ syntax: A study in phrase structure. Cambridge, MA: MIT Press.
  13. Langley, P. (1982). Language acquisition through error recovery. Cognition and Brain Theory, 3, 211–255.
  14. Lightfoot, D. (1982). The language lottery. Cambridge, MA: MIT Press.
  15. MacWhinney, B. (1982). Basic processes in syntactic acquisition. In S. A. Kuczaj II (Ed.), Language development: Vol. 1. Syntax and semantics. Hillsdale, NJ: Lawrence Erlbaum.
  16. Mitchell, T. M. (1978). Version spaces: An approach to concept learning. Doctoral dissertation, Department of Electrical Engineering, Stanford University, Stanford, CA.
  17. Olivier, D. (1968). Stochastic grammars and language acquisition mechanisms. Doctoral dissertation, Department of Psychology and Social Relations, Harvard University, Cambridge, MA.
  18. Osherson, D., Stob, M., & Weinstein, S. (1985). Systems that learn. Cambridge, MA: MIT Press.
  19. Pinker, S. (1984). Language learnability and language development. Cambridge, MA: Harvard University Press.
  20. Wexler, K., & Culicover, P. (1982). Formal principles of language acquisition. Cambridge, MA: MIT Press.
  21. Wolff, J. G. (1978). Grammar discovery as data compression. In Proceedings of the AISB/GI Conference on Artificial Intelligence (pp. 375–379). Hamburg, West Germany.
  22. Wolff, J. G. (1982). Language acquisition, data compression, and generalization. Language and Communication, 2, 57–89.

Copyright information

© Kluwer Academic Publishers 1987

Authors and Affiliations

  • Robert C. Berwick (1)
  • Sam Pilato (2)
  1. MIT Artificial Intelligence Laboratory, Cambridge, U.S.A.
  2. Brattle Research Corporation, Cambridge, U.S.A.
