Artificial Intelligence Review

Volume 36, Issue 1, pp 1–27

A survey of grammatical inference methods for natural language learning

  • Arianna D’Ulizia
  • Fernando Ferri
  • Patrizia Grifoni


The high complexity of natural language and the large amount of human effort and time needed to produce grammars have led several researchers in Natural Language Processing to investigate ways of automating the generation and updating of grammars. Many algorithms for Context-Free Grammar inference have been developed in the literature. This paper surveys the methodologies for inferring context-free grammars from examples that researchers have developed in the last decade. After introducing some preliminary definitions and notation concerning learning and inductive inference, it describes some of the most relevant existing grammatical inference methods for Natural Language and classifies them according to the kind of presentation (text or informant) and the type of information (supervised, unsupervised, or semi-supervised). The paper also presents the state of the art in strategies for evaluating and comparing different grammar inference methods. Its goal is to give the reader an introduction to the major concepts and current approaches in Natural Language Learning research.
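The abstract distinguishes text presentation (positive examples only) from informant presentation (examples labelled as inside or outside the language). As an illustration only, and not a method taken from the survey, the following Python sketch checks whether a candidate context-free grammar is consistent with each kind of sample. It assumes the grammar is in Chomsky normal form, stored as a plain dictionary, and tested for membership with the CYK (Cocke-Younger-Kasami) recognition algorithm; the function names, the dictionary layout, and the two toy grammars are hypothetical.

```python
from itertools import product

# Hypothetical grammar encoding (an assumption for this sketch), Chomsky normal form only:
#   "binary":   {(B, C): {A, ...}}  for rules A -> B C
#   "terminal": {"a": {A, ...}}      for rules A -> 'a'
#   "start":    the start symbol


def cyk_accepts(grammar, words):
    """CYK recognition: True if the CNF grammar derives the sequence of words."""
    n = len(words)
    if n == 0:
        return False  # this encoding has no S -> epsilon rule
    # table[i][k] holds the nonterminals that derive words[i : i + k + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):
        table[i][0] = set(grammar["terminal"].get(w, ()))
    for length in range(2, n + 1):          # span length
        for i in range(n - length + 1):     # span start
            for split in range(1, length):  # length of the left part
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for b, c in product(left, right):
                    table[i][length - 1] |= grammar["binary"].get((b, c), set())
    return grammar["start"] in table[0][n - 1]


def consistent_with_text(grammar, positives):
    """Text presentation: every (positive) example must be accepted."""
    return all(cyk_accepts(grammar, s) for s in positives)


def consistent_with_informant(grammar, labelled):
    """Informant presentation: accept exactly the examples labelled True."""
    return all(cyk_accepts(grammar, s) == label for s, label in labelled)


# Toy grammar for {a^n b^n : n >= 1}: S -> A B | A X, X -> S B, A -> a, B -> b
ANBN = {
    "start": "S",
    "binary": {("A", "B"): {"S"}, ("A", "X"): {"S"}, ("S", "B"): {"X"}},
    "terminal": {"a": {"A"}, "b": {"B"}},
}
# Overgeneral toy grammar for all non-empty strings over {a, b}: S -> S S | a | b
UNIVERSAL = {
    "start": "S",
    "binary": {("S", "S"): {"S"}},
    "terminal": {"a": {"S"}, "b": {"S"}},
}

text = [list("ab"), list("aabb")]
informant = [(list("ab"), True), (list("aabb"), True), (list("abb"), False)]

print(consistent_with_text(ANBN, text))                 # True
print(consistent_with_informant(ANBN, informant))       # True
print(consistent_with_text(UNIVERSAL, text))            # True
print(consistent_with_informant(UNIVERSAL, informant))  # False
```

With this toy sample, the overgeneral grammar cannot be ruled out from the positive examples alone, whereas the single negative example in the informant sample exposes it. This is one intuitive reason why the kind of presentation matters for grammatical inference.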


Keywords: Grammatical inference, Natural language, Context-free grammar


Copyright information

© Springer Science+Business Media B.V. 2011

Authors and Affiliations

  • Arianna D’Ulizia (1)
  • Fernando Ferri (1)
  • Patrizia Grifoni (1)

  1. Consiglio Nazionale delle Ricerche – Istituto di Ricerche sulla Popolazione e le Politiche Sociali, Rome, Italy
