Inducing probabilistic grammars by Bayesian model merging

Stolcke, Andreas; Omohundro, Stephen

doi:10.1007/3-540-58473-0_141

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 862)

Included in the following conference series:

International Colloquium on Grammatical Inference

Abstract

We describe a framework for inducing probabilistic grammars from corpora of positive samples. First, samples are incorporated by adding ad-hoc rules to a working grammar; subsequently, elements of the model (such as states or nonterminals) are merged to achieve generalization and a more compact representation. The choice of what to merge and when to stop is governed by the Bayesian posterior probability of the grammar given the data, which formalizes a trade-off between a close fit to the data and a default preference for simpler models (‘Occam's Razor’). The general scheme is illustrated using three types of probabilistic grammars: Hidden Markov models, class-based n-grams, and stochastic context-free grammars.
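The merging procedure summarized above can be illustrated for the hidden Markov model case. The sketch below is not the authors' implementation. It makes several assumptions of its own: each sample is incorporated as its own chain of states, one state per symbol; the likelihood is approximated by keeping every sample's original state path fixed (a Viterbi-style approximation), so that merging two states simply adds their counts; and the prior is a hypothetical description-length penalty of `lam` nats per nonzero parameter. The search greedily applies the merge that most increases log P(M) + log P(X|M) and stops when no merge improves it. All function and state names (`incorporate`, `merge`, `bayesian_merge`, `"I"`, `"F"`) are illustrative.

```python
import math
from collections import Counter
from itertools import combinations

INIT, FINAL = "I", "F"  # hypothetical names for the start and end states


def incorporate(samples):
    """Initial model: a fresh chain of states for every sample (ad-hoc rules)."""
    emit = {}                  # state -> Counter of emitted symbols
    trans = {INIT: Counter()}  # state -> Counter of successor states
    n = 0
    for word in samples:
        prev = INIT
        for sym in word:
            state = f"s{n}"
            n += 1
            emit[state] = Counter({sym: 1})
            trans[state] = Counter()
            trans[prev][state] += 1
            prev = state
        trans[prev][FINAL] += 1
    return emit, trans


def merge(emit, trans, a, b):
    """Return a new model in which state b is folded into state a; counts add."""
    new_emit = {s: Counter(c) for s, c in emit.items() if s != b}
    new_emit[a] += emit[b]
    new_trans = {}
    for s, succ in trans.items():
        row = new_trans.setdefault(a if s == b else s, Counter())
        for t, k in succ.items():
            row[a if t == b else t] += k  # redirect arcs that entered b
    return new_emit, new_trans


def log_likelihood(emit, trans):
    """Viterbi-style approximation: each state's counts are scored under
    that state's relative-frequency (maximum-likelihood) distributions."""
    total = 0.0
    for table in [*emit.values(), *trans.values()]:
        n = sum(table.values())
        total += sum(k * math.log(k / n) for k in table.values())
    return total


def log_prior(emit, trans, lam):
    """Hypothetical description-length prior: lam nats per nonzero parameter."""
    n_params = sum(map(len, emit.values())) + sum(map(len, trans.values()))
    return -lam * n_params


def bayesian_merge(samples, lam=1.0):
    """Greedy best-first merging; stop when no merge raises the log posterior."""
    emit, trans = incorporate(samples)

    def score(e, t):
        return log_prior(e, t, lam) + log_likelihood(e, t)

    best = score(emit, trans)
    while True:
        chosen = None
        for a, b in combinations(list(emit), 2):
            e, t = merge(emit, trans, a, b)
            s = score(e, t)
            if s > best:
                best, chosen = s, (e, t)
        if chosen is None:
            return emit, trans
        emit, trans = chosen
```

For example, `emit, trans = bayesian_merge(["ab", "ab"])` starts from four states (two chains of `a` then `b`) and ends with two states, one emitting `a` and one emitting `b`, so `len(emit)` is 2. With `lam=0.0` the same call keeps all four states, because on this data no merge strictly improves the approximate likelihood; in this sketch it is the simplicity prior that drives the merging.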

Author information

Authors and Affiliations

International Computer Science Institute, 1947 Center St., Suite 600, Berkeley, CA 94707
Andreas Stolcke & Stephen Omohundro

Editor information

Rafael C. Carrasco & José Oncina

About this paper

Cite this paper

Stolcke, A., Omohundro, S. (1994). Inducing probabilistic grammars by Bayesian model merging. In: Carrasco, R.C., Oncina, J. (eds) Grammatical Inference and Applications. ICGI 1994. Lecture Notes in Computer Science, vol 862. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-58473-0_141

Published: 04 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-58473-5
Online ISBN: 978-3-540-48985-6
eBook Packages: Springer Book Archive