Machine Learning, Volume 96, Issue 1–2, pp 5–31

Distributional learning of parallel multiple context-free grammars



Natural languages require grammars beyond context-free for their description. Here we extend a family of distributional learning algorithms for context-free grammars to the class of Parallel Multiple Context-Free Grammars (pmcfgs). These grammars have two additional operations beyond the simple context-free operation of concatenation: the ability to interleave strings of symbols, and the ability to copy or duplicate strings. This allows the grammars to generate some non-semilinear languages, which are outside the class of mildly context-sensitive grammars. These grammars, if augmented with a suitable feature mechanism, are capable of representing all of the syntactic phenomena that have been claimed to exist in natural language.
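The three string operations named above can be illustrated with a minimal Python sketch. This is not the paper's formalism: the function names (`concat`, `interleave`, `copy`, `powers_of_two`) are hypothetical, and the toy rule `S -> a | copy(S)` is a standard textbook-style example chosen here only because it yields the non-semilinear language of strings of `a` whose length is a power of two.

```python
def concat(x: str, y: str) -> str:
    """Plain context-free concatenation of two strings."""
    return x + y


def interleave(xs: tuple, ys: tuple) -> str:
    """Interleave the components of two string pairs: (x1, x2), (y1, y2) -> x1 y1 x2 y2."""
    (x1, x2), (y1, y2) = xs, ys
    return x1 + y1 + x2 + y2


def copy(x: str) -> str:
    """Duplicate a string; the variable x is used twice on the right-hand side."""
    return x + x


def powers_of_two(depth: int) -> list:
    """Strings derivable from the assumed toy rule S -> a | copy(S) with at most `depth` copies."""
    s = "a"
    derived = [s]
    for _ in range(depth):
        s = copy(s)
        derived.append(s)
    return derived
```

The lengths produced by `powers_of_two` double at each step, so the set of lengths is not ultimately periodic, which is why such a language cannot be semilinear.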

We present a learning algorithm for a large subclass of these grammars that includes all regular languages but not all context-free languages. This algorithm relies on a generalisation of the notion of distribution as a function from tuples of strings to entire sentences; we define nonterminals using finite sets of these functions. Our learning algorithm uses a nonprobabilistic learning paradigm which allows for membership queries as well as positive samples; it runs in polynomial time.
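As a rough illustration of the generalised notion of distribution, the sketch below treats a context as a function from a tuple of strings to a whole sentence and checks a finite set of such functions against a membership oracle. Everything here is an assumption made for illustration: the toy copy language {ww}, the oracle `is_member`, the helper `accepted_by`, and the reading that a tuple is accepted when every function in the set maps it to a sentence of the language. It is not the learning algorithm of the paper.

```python
from typing import Callable, Iterable, Tuple

# A generalised context: maps a tuple of strings to an entire sentence.
Context = Callable[[Tuple[str, ...]], str]


def is_member(sentence: str) -> bool:
    """Hypothetical membership oracle for the toy language {ww : w is a nonempty string over a, b}."""
    if not sentence or any(ch not in "ab" for ch in sentence):
        return False
    half, remainder = divmod(len(sentence), 2)
    return remainder == 0 and sentence[:half] == sentence[half:]


def accepted_by(contexts: Iterable[Context], strings: Tuple[str, ...]) -> bool:
    """True if every context in the finite set sends `strings` to a sentence of the language."""
    return all(is_member(context(strings)) for context in contexts)


# Example contexts (illustrative only). The first two use their argument twice,
# which is the copying behaviour that plain string contexts cannot express.
def duplicate(t: Tuple[str, ...]) -> str:
    return t[0] + t[0]


def wrap_and_duplicate(t: Tuple[str, ...]) -> str:
    return "a" + t[0] + "a" + t[0]


def join_pair(t: Tuple[str, ...]) -> str:
    return t[0] + t[1]
```

For instance, `accepted_by([duplicate, wrap_and_duplicate], ("ab",))` asks the oracle about `abab` and `aabaab`, while `accepted_by([join_pair], ("ab", "ba"))` asks about `abba`.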


Keywords: Mildly context-sensitive; Grammatical inference; Semilinearity



Copyright information

© The Author(s) 2013

Authors and Affiliations

  1. Department of Philosophy, King’s College London, London, UK
  2. Graduate School of Informatics, Kyoto University, Kyoto, Japan
