Abstract
We examine the expressive power of probabilistic context-free grammars (PCFGs), with a special focus on the use of probabilities as a mechanism for reducing ambiguity by filtering out unwanted parses. Probabilities in PCFGs induce an ordering relation among the set of trees that yield a given input sentence. PCFG parsers return the trees bearing the maximum probability for a given sentence, discarding all other possible trees. This mechanism is naturally viewed as a way of defining a new class of tree languages. We formalize the tree language thus defined, study its expressive power, and show that this expressive power goes beyond context-freeness. While the increased expressive power offered by PCFGs helps to reduce ambiguity, we show that, in general, it cannot be decided whether a PCFG removes all ambiguities.
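The filtering mechanism described in the abstract can be illustrated with a minimal sketch. The toy grammar, its rule probabilities, and the function names `trees` and `most_probable_trees` are assumptions made for illustration; they are not taken from the article. The sketch enumerates every tree that yields a sentence, scores each tree by the product of its rule probabilities, and keeps only the trees of maximum probability.

```python
# Hypothetical toy PCFG in Chomsky normal form (illustrative, not from the article).
BINARY = {  # left-hand side -> list of ((left child, right child), probability)
    "S": [(("X", "C"), 0.7), (("A", "Y"), 0.3)],
    "X": [(("A", "B"), 1.0)],
    "Y": [(("B", "C"), 1.0)],
}
LEXICAL = {  # left-hand side -> {word: probability}
    "A": {"a": 1.0},
    "B": {"b": 1.0},
    "C": {"c": 1.0},
}


def trees(symbol, words):
    """Yield (tree, probability) for every derivation of `words` from `symbol`."""
    if len(words) == 1:
        p = LEXICAL.get(symbol, {}).get(words[0])
        if p is not None:
            yield (symbol, words[0]), p
        return  # in CNF, a single word can only be produced by a lexical rule
    for (left, right), p_rule in BINARY.get(symbol, []):
        # Try every way of splitting the span between the two children.
        for split in range(1, len(words)):
            for left_tree, p_left in trees(left, words[:split]):
                for right_tree, p_right in trees(right, words[split:]):
                    yield (symbol, left_tree, right_tree), p_rule * p_left * p_right


def most_probable_trees(words, start="S"):
    """Return the maximum-probability trees for `words`, discarding all others."""
    candidates = list(trees(start, tuple(words)))
    if not candidates:
        return []
    best = max(p for _, p in candidates)
    return [(t, p) for t, p in candidates if p == best]


if __name__ == "__main__":
    sentence = ["a", "b", "c"]
    for tree, prob in trees("S", tuple(sentence)):
        print("candidate:", tree, prob)
    for tree, prob in most_probable_trees(sentence):
        print("kept:", tree, prob)
```

Under this toy grammar the sentence `a b c` has two trees, and only the one built with the higher-probability `S` rule survives the filter. Exhaustive enumeration is exponential in general; a practical parser would use a dynamic-programming chart instead, and ties would be compared with a tolerance rather than exact floating-point equality.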
Infante-Lopez, G., de Rijke, M. A Note on the Expressive Power of Probabilistic Context Free Grammars. Journal of Logic, Language and Information 15, 219–231 (2006). https://doi.org/10.1007/s10849-005-9002-x