On the Unsupervised Induction of Phrase-Structure Grammars

  • C. de Marcken
Part of the Text, Speech and Language Technology book series (TLTB, volume 11)


Researchers investigating the acquisition of phrase-structure grammars from raw text have had only mixed success. In particular, unsupervised learning techniques, such as the inside-outside algorithm (Baker, 1979) for estimating the parameters of stochastic context-free grammars (SCFGs), tend to produce grammars that structure text in ways contrary to our linguistic intuitions. One effective way around this problem is to use hand-structured text like the Penn Treebank (Marcus, 1991) to constrain the learner: Pereira and Schabes (1992) demonstrate that the inside-outside algorithm can learn grammars effectively given such constraints, and currently the best-performing parsers are trained on treebanks (Black et al., 1992; Magerman, 1995).


Phrase Structure; Parse Tree; Minimum Description Length; Computational Linguistics; Prepositional Phrase
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




  1. Baker, J. K. 1979. Trainable grammars for speech recognition. In Proceedings of the 97th Meeting of the Acoustical Society of America, pp. 547–550.
  2. Black, E., Jelinek, F., Lafferty, J., Magerman, D. M., Mercer, R. and Roukos, S. 1992. Toward history-based grammars: Using richer models for probabilistic parsing. In Proceedings of the February 1992 DARPA Speech and Natural Language Workshop.
  3. Briscoe, T. and Waegner, N. 1992. Robust stochastic parsing using the inside-outside algorithm. In Proc. of the AAAI Workshop on Probabilistic-Based Natural Language Processing Techniques, pp. 39–52.
  4. Chen, S. F. 1995. Bayesian grammar induction for language modeling. In Proc. 32nd Annual Meeting of the Association for Computational Linguistics, pp. 228–235, Cambridge, Massachusetts.
  5. Cover, T. M. and Thomas, J. A. 1991. Elements of Information Theory. John Wiley & Sons, New York, NY.
  6. de Marcken, C. 1995. The unsupervised acquisition of a lexicon from continuous speech. A.I. Memo 1558, MIT Artificial Intelligence Lab., Cambridge, Massachusetts.
  7. Lari, K. and Young, S. J. 1990. The estimation of stochastic context-free grammars using the inside-outside algorithm. Computer Speech and Language, 4: 35–56.
  8. Magerman, D. M. and Marcus, M. P. 1990. Parsing a natural language using mutual information statistics. In Proc. of the American Association for Artificial Intelligence, pp. 984–989.
  9. Magerman, D. M. 1995. Statistical decision-tree models for parsing. In Proc. 32nd Annual Meeting of the Association for Computational Linguistics, pp. 276–283, Cambridge, Massachusetts.
  10. Marcus, M. P. 1991. Very large annotated database of American English. In Proceedings of the DARPA Speech and Natural Language Workshop.
  11. Olivier, D. C. 1968. Stochastic Grammars and Language Acquisition Mechanisms. Ph.D. thesis, Harvard University, Cambridge, Massachusetts.
  12. Pereira, F. and Schabes, Y. 1992. Inside-outside reestimation from partially bracketed corpora. In Proc. 29th Annual Meeting of the Association for Computational Linguistics, pp. 128–135, Berkeley, California.
  13. Sleator, D. D. K. and Temperley, D. 1991. Parsing English with a link grammar. Technical Report CMU-CS-91-196, Carnegie Mellon University, Pittsburgh, Pennsylvania.
  14. Stolcke, A. 1994. Bayesian Learning of Probabilistic Language Models. Ph.D. thesis, University of California at Berkeley, Berkeley, CA.

Copyright information

© Springer Science+Business Media Dordrecht 1999

