Abstract
We describe a framework for inducing probabilistic grammars from corpora of positive samples. First, samples are incorporated by adding ad-hoc rules to a working grammar; subsequently, elements of the model (such as states or nonterminals) are merged to achieve generalization and a more compact representation. The choice of what to merge and when to stop is governed by the Bayesian posterior probability of the grammar given the data, which formalizes a trade-off between a close fit to the data and a default preference for simpler models ('Occam's Razor'). The general scheme is illustrated using three types of probabilistic grammars: hidden Markov models, class-based n-grams, and stochastic context-free grammars.
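The incorporate-then-merge loop summarized above can be sketched for the hidden Markov model case. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses a Viterbi-style approximation in which each sample's path counts are kept and simply summed when two states merge. The prior is a hypothetical structural penalty of `lam` nats per transition or emission, standing in for whatever prior the paper actually uses. All function names (`initial_model`, `merge`, `best_first_merge`, and so on) are illustrative.

```python
import math
from collections import Counter
from itertools import combinations

START, END = "S", "F"  # non-emitting start and end states (never merged)


def initial_model(samples):
    """Incorporate each sample as its own chain of fresh states, one per symbol."""
    trans, emit, states = Counter(), Counter(), set()
    n = 0
    for word in samples:
        prev = START
        for sym in word:
            q = f"q{n}"
            n += 1
            states.add(q)
            trans[(prev, q)] += 1
            emit[(q, sym)] += 1
            prev = q
        trans[(prev, END)] += 1
    return states, trans, emit


def log_likelihood(trans, emit):
    """Viterbi-style approximation: ML probabilities from the kept path counts."""
    out_tot, emit_tot = Counter(), Counter()
    for (src, _), c in trans.items():
        out_tot[src] += c
    for (q, _), c in emit.items():
        emit_tot[q] += c
    ll = sum(c * math.log(c / out_tot[src]) for (src, _), c in trans.items())
    ll += sum(c * math.log(c / emit_tot[q]) for (q, _), c in emit.items())
    return ll


def log_posterior(model, lam):
    _, trans, emit = model
    # Hypothetical structural prior (an assumption): each transition or emission costs lam nats.
    log_prior = -lam * (len(trans) + len(emit))
    return log_likelihood(trans, emit) + log_prior


def merge(model, q1, q2):
    """Fold state q2 into q1: sum their counts and redirect edges that touched q2."""
    states, trans, emit = model

    def rename(s):
        return q1 if s == q2 else s

    new_trans, new_emit = Counter(), Counter()
    for (src, dst), c in trans.items():
        new_trans[(rename(src), rename(dst))] += c
    for (q, sym), c in emit.items():
        new_emit[(rename(q), sym)] += c
    return states - {q2}, new_trans, new_emit


def best_first_merge(samples, lam=1.0):
    """Greedily apply the merge that most raises the posterior; stop when none does."""
    model = initial_model(samples)
    score = log_posterior(model, lam)
    while len(model[0]) > 1:
        scored = [(log_posterior(m, lam), m)
                  for m in (merge(model, a, b)
                            for a, b in combinations(sorted(model[0]), 2))]
        best_score, best = max(scored, key=lambda t: t[0])
        if best_score <= score:
            break  # no candidate merge improves the posterior
        model, score = best, best_score
    return model, score


model, score = best_first_merge(["ab", "abab", "ababab"], lam=1.0)
print(len(model[0]), "states, log posterior", round(score, 3))
```

The stopping rule here is purely greedy (stop at the first merge that fails to improve the posterior); a real search might add lookahead. A larger `lam` strengthens the preference for compact models and so tends to produce more merges.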
Copyright information
© 1994 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Stolcke, A., Omohundro, S. (1994). Inducing probabilistic grammars by Bayesian model merging. In: Carrasco, R.C., Oncina, J. (eds) Grammatical Inference and Applications. ICGI 1994. Lecture Notes in Computer Science, vol 862. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-58473-0_141
DOI: https://doi.org/10.1007/3-540-58473-0_141
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-58473-5
Online ISBN: 978-3-540-48985-6
eBook Packages: Springer Book Archive