Learning Context-Free Grammars with a Simplicity Bias
We examine the role of simplicity in directing the induction of context-free grammars from sample sentences. We present a rational reconstruction of Wolff’s SNPR — the Grids system — which incorporates a bias toward grammars that minimize description length. The algorithm alternates between merging existing nonterminal symbols and creating new symbols, using a beam search to move from complex to simpler grammars. Experiments suggest that this approach can induce accurate grammars and that it scales reasonably to more difficult domains.
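The search described above, a beam search over grammars scored by description length, can be illustrated with a minimal sketch. This is not the Grids implementation: it covers only the merge operator (omitting symbol creation), uses a crude symbol-count proxy for description length, and the grammar representation, helper names, and the toy grammar `g0` are all invented for illustration.

```python
from itertools import combinations

# A grammar is a frozenset of productions: (lhs, (rhs_symbol, ...)).

def description_length(grammar):
    # Crude proxy for description length: total symbol count
    # over all productions (one for the head plus one per body symbol).
    return sum(1 + len(rhs) for lhs, rhs in grammar)

def nonterminals(grammar):
    return {lhs for lhs, _ in grammar}

def merge(grammar, a, b):
    # Merge nonterminal b into a; identical productions collapse
    # automatically because the grammar is a set.
    sub = lambda sym: a if sym == b else sym
    return frozenset((sub(lhs), tuple(sub(s) for s in rhs))
                     for lhs, rhs in grammar)

def beam_search(grammar, beam_width=3, steps=10):
    # Move from a complex grammar toward simpler ones, keeping the
    # beam_width lowest-description-length candidates at each step.
    beam = [frozenset(grammar)]
    best = beam[0]
    for _ in range(steps):
        candidates = []
        for g in beam:
            for a, b in combinations(sorted(nonterminals(g) - {"S"}), 2):
                candidates.append(merge(g, a, b))
        if not candidates:
            break
        candidates.sort(key=description_length)
        beam = candidates[:beam_width]
        if description_length(beam[0]) >= description_length(best):
            break  # no strictly simpler grammar found; stop
        best = beam[0]
    return best

# Toy input: two nominally distinct nonterminals A and B that play
# the same role, so merging them shortens the grammar.
g0 = frozenset({
    ("S", ("A", "VP")),
    ("S", ("B", "VP")),
    ("A", ("the", "dog")),
    ("B", ("the", "cat")),
    ("VP", ("runs",)),
})
g_best = beam_search(g0)
```

On this input the search merges `A` and `B`, deduplicating the two `S` productions and reducing the symbol count, which mirrors the simplicity bias described in the abstract.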