Inducing probabilistic grammars by Bayesian model merging

Part of the Lecture Notes in Computer Science book series (LNAI, volume 862)

Abstract

We describe a framework for inducing probabilistic grammars from corpora of positive samples. First, samples are incorporated by adding ad hoc rules to a working grammar; subsequently, elements of the model (such as states or nonterminals) are merged to achieve generalization and a more compact representation. The choice of what to merge and when to stop is governed by the Bayesian posterior probability of the grammar given the data, which formalizes a trade-off between a close fit to the data and a default preference for simpler models ("Occam's razor"). The general scheme is illustrated using three types of probabilistic grammars: hidden Markov models, class-based n-grams, and stochastic context-free grammars.
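
The incorporate-then-merge loop described in the abstract can be sketched on a toy model. The code below is a minimal illustration under stated assumptions, not the authors' implementation: every state emits exactly one symbol, only states with the same symbol may be merged, the likelihood is computed along each sample's single state path with maximum-likelihood transition estimates, and the prior is a hypothetical penalty of `penalty` per distinct transition. The names `initial_model`, `log_posterior`, `merge`, and `induce` are invented for this example. The score is the unnormalized log posterior, log P(M) + log P(X | M).

```python
import math
from collections import Counter
from itertools import combinations

START, END = "<s>", "</s>"

def initial_model(samples):
    """Incorporation step: one dedicated state per symbol occurrence."""
    paths, emit = [], {}
    for sample in samples:
        path = [START]
        for sym in sample:
            state = len(emit)          # fresh integer state id
            emit[state] = sym          # each state emits a single symbol
            path.append(state)
        path.append(END)
        paths.append(path)
    return paths, emit

def log_posterior(paths, penalty=1.0):
    """Unnormalized log posterior: log prior + log likelihood.

    Assumed prior: -penalty per distinct transition (favors small models).
    Likelihood: product of ML transition probabilities along the paths.
    """
    trans = Counter()
    for path in paths:
        trans.update(zip(path, path[1:]))
    out = Counter()
    for (src, _), n in trans.items():
        out[src] += n
    log_lik = sum(n * math.log(n / out[src]) for (src, _), n in trans.items())
    log_prior = -penalty * len(trans)
    return log_prior + log_lik

def merge(paths, keep, drop):
    """Merge state `drop` into state `keep` by relabeling the paths."""
    return [[keep if q == drop else q for q in path] for path in paths]

def induce(samples, penalty=1.0):
    """Greedy merging: apply the best merge until the posterior stops improving."""
    paths, emit = initial_model(samples)
    best = log_posterior(paths, penalty)
    while True:
        states = sorted({q for p in paths for q in p if q not in (START, END)})
        candidates = [
            (log_posterior(merge(paths, a, b), penalty), a, b)
            for a, b in combinations(states, 2)
            if emit[a] == emit[b]      # simplifying assumption of this sketch
        ]
        if not candidates:
            break
        score, a, b = max(candidates)
        if score <= best:              # stopping criterion: no improvement
            break
        paths, best = merge(paths, a, b), score
    return paths, emit, best

if __name__ == "__main__":
    paths, emit, score = induce(["ab", "abab"])
    print(paths, score)
```

Raising `penalty` strengthens the preference for simpler models and leads to more merging; setting it near zero leaves the model close to the initial, sample-specific one. A fuller treatment would sum over all paths rather than follow one, and would use a proper prior over structures and parameters.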

Editor information

Rafael C. Carrasco, José Oncina

Copyright information

© 1994 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Stolcke, A., Omohundro, S. (1994). Inducing probabilistic grammars by Bayesian model merging. In: Carrasco, R.C., Oncina, J. (eds) Grammatical Inference and Applications. ICGI 1994. Lecture Notes in Computer Science, vol 862. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-58473-0_141

  • DOI: https://doi.org/10.1007/3-540-58473-0_141

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-58473-5

  • Online ISBN: 978-3-540-48985-6

  • eBook Packages: Springer Book Archive
