Stochastic k-testable Tree Languages and Applications
In this paper, we describe a generalization of k-gram models to stochastic tree languages. These models are based on the k-testable class, a subclass of the languages recognizable by ascending tree automata. One advantage of this approach is that the probabilistic model can be updated incrementally. Another is that backing-off schemes can be defined. To illustrate their applicability, the models have been used to compress tree data files at a better rate than string-based methods.
Keywords: tree grammars, stochastic models, backing-off, data compression
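To make the idea of an incrementally updated tree model concrete, here is a minimal sketch for the simplest case, k = 2. It is an illustration under stated assumptions, not the paper's algorithm: the class name `TwoTestableModel`, the `(label, children)` tuple encoding of trees, and the plain relative-frequency estimate are all hypothetical choices made for this example. Each node contributes one "fork" (its label plus the labels of its children), and the probability of a tree is taken as the relative frequency of its root label times the relative frequency of each fork given its node label.

```python
from collections import Counter

# Assumed encoding: a tree is (label, [children]); a leaf has no children.


class TwoTestableModel:
    """Illustrative count-based tree model for k = 2 (hypothetical names)."""

    def __init__(self):
        self.fork_counts = Counter()   # (label, child labels) -> count
        self.label_counts = Counter()  # label -> number of nodes seen
        self.root_counts = Counter()   # root label -> number of trees
        self.num_trees = 0

    def update(self, tree):
        """Add one tree to the counts; no retraining on earlier trees."""
        self.num_trees += 1
        self.root_counts[tree[0]] += 1
        stack = [tree]
        while stack:
            label, children = stack.pop()
            fork = (label, tuple(child[0] for child in children))
            self.fork_counts[fork] += 1
            self.label_counts[label] += 1
            stack.extend(children)

    def probability(self, tree):
        """Relative-frequency estimate; 0.0 if any fork was never seen."""
        if self.num_trees == 0:
            return 0.0
        p = self.root_counts[tree[0]] / self.num_trees
        stack = [tree]
        while stack:
            label, children = stack.pop()
            fork = (label, tuple(child[0] for child in children))
            if self.fork_counts[fork] == 0:
                return 0.0
            p *= self.fork_counts[fork] / self.label_counts[label]
            stack.extend(children)
        return p


model = TwoTestableModel()
model.update(("a", [("b", []), ("b", [])]))
model.update(("a", [("b", [])]))  # incremental: only the counts change
seen = model.probability(("a", [("b", [])]))
unseen = model.probability(("a", [("c", [])]))
```

In this sketch a tree containing an unseen fork gets probability zero, which is the kind of situation a backing-off scheme is meant to handle; no such scheme is implemented here.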
- J.R. Rico-Juan, J. Calera-Rubio, and R.C. Carrasco. Stochastic k-testable tree languages and applications. http://www.dlsi.ua.es/~calera/fulltext02.ps.gz, 2002.