Abstract
In this paper, we study selection criteria for word trigger pairs in statistical language modeling. A word trigger pair is defined as a long-distance word pair. Selecting the most significant trigger pairs requires suitable criteria, which are the topic of this paper. We extend a baseline language model by a single word trigger pair and use the perplexity of the extended model as the selection criterion. This extension is applied to every possible trigger pair; the number of such pairs is the square of the vocabulary size. When a unigram language model serves as the baseline, this approach yields the mutual information criterion used in [7, 11]. The more interesting case is to apply the criterion in the context of a more powerful model, such as a bigram/trigram model with a cache. We study different variants of incorporating word trigger pairs into such a language model. This approach produces better word trigger pairs than the conventional mutual information criterion. On the Wall Street Journal corpus, the selected trigger pairs reduced the perplexity of a trigram/cache language model from 138 to 128 for a 5-million-word training set and from 92 to 87 for a 38-million-word training set.
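To make the conventional mutual information criterion concrete, the following is a minimal sketch, not the authors' implementation. It scores an ordered pair (a, b) by the mutual information between two binary events: "a occurs in the preceding window of words" and "the current word is b". The function name `trigger_pair_mi`, the `window` parameter, and the fixed-size history are assumptions made for illustration; the perplexity-based criteria for bigram/trigram baselines studied in the paper are not reproduced here.

```python
import math
from collections import Counter


def trigger_pair_mi(tokens, window=5):
    """Score ordered word pairs (a, b) by mutual information (in nats).

    Event A: word `a` occurs among the `window` words before a position.
    Event B: the word at that position is `b`.
    Only pairs that co-occur at least once are scored.
    """
    n = len(tokens)
    hist_count = Counter()        # positions whose history contains a
    word_count = Counter(tokens)  # positions whose current word is b
    joint_count = Counter()       # positions where both events hold

    for i, b in enumerate(tokens):
        # Use a set so each history word counts once per position.
        history = set(tokens[max(0, i - window):i])
        for a in history:
            hist_count[a] += 1
            joint_count[(a, b)] += 1

    scores = {}
    for (a, b), n_ab in joint_count.items():
        n_a, n_b = hist_count[a], word_count[b]
        # The four cells of the 2x2 contingency table:
        # (joint count, history marginal, current-word marginal).
        cells = [
            (n_ab, n_a, n_b),                            # A and B
            (n_a - n_ab, n_a, n - n_b),                  # A and not B
            (n_b - n_ab, n - n_a, n_b),                  # not A and B
            (n - n_a - n_b + n_ab, n - n_a, n - n_b),    # neither
        ]
        mi = 0.0
        for joint, row, col in cells:
            if joint > 0:  # empty cells contribute nothing
                mi += (joint / n) * math.log(joint * n / (row * col))
        scores[(a, b)] = mi
    return scores


if __name__ == "__main__":
    corpus = "the stock market fell and the stock price fell again".split()
    ranked = sorted(trigger_pair_mi(corpus, window=3).items(),
                    key=lambda kv: kv[1], reverse=True)
    for pair, score in ranked[:5]:
        print(pair, round(score, 4))
```

Ranking all co-occurring pairs by this score and keeping the top entries gives a candidate trigger list. Self-triggers (a equal to b) are scored like any other pair here; a model with a cache component would already capture much of that effect.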
References
[1] L.R. Bahl, F. Jelinek, R.L. Mercer and A. Nadas. “Next Word Statistical Predictor”. IBM Techn. Disclosure Bulletin, 27(7A), pp. 3941–3942, 1984.
[2] A. Berger, S. Della Pietra and V. Della Pietra. “A Maximum Entropy Approach to Natural Language Processing”. In Computational Linguistics, Vol. 22, No. 1, pp. 39–71, March 1996.
[3] A.P. Dempster, N.M. Laird and D.B. Rubin. “Maximum Likelihood from Incomplete Data via the EM Algorithm”. In Journal of the Royal Statistical Society, Vol. 39, No. 1, pp. 1–38, 1977.
[4] S. Della Pietra, V. Della Pietra, J. Gillett, J. Lafferty, H. Printz and L. Ures. “Inference and Estimation of a Long-Range Trigram Model”. In Lecture Notes in Artificial Intelligence, Grammatical Inference and Applications, ICGI-94, Alicante, Spain, Springer-Verlag, pp. 78–92, September 1994.
[5] F. Jelinek. “Self-Organized Language Modeling for Speech Recognition”. In Readings in Speech Recognition, A. Waibel and K.F. Lee (eds.), pp. 450–506, Morgan Kaufmann, 1991.
[6] S.M. Katz. “Estimation of Probabilities from Sparse Data for the Language Model Component of a Speech Recognizer”. In IEEE Trans. on Acoustics, Speech and Signal Processing, Vol. 35, pp. 400–401, March 1987.
[7] R. Lau, R. Rosenfeld and S. Roukos. “Trigger-Based Language Models: A Maximum Entropy Approach”. In Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Minneapolis, MN, pp. II 45–48, April 1993.
[8] R. Lau, R. Rosenfeld and S. Roukos. “Adaptive Language Modeling Using the Maximum Entropy Approach”. In Proceedings of the ARPA Human Language Technology Workshop, pp. 108–113, Morgan Kaufmann, March 1993.
[9] H. Ney, M. Generet and F. Wessel. “Extensions of Absolute Discounting for Language Modeling”. In Fourth European Conference on Speech Communication and Technology, pp. 1245–1248, Madrid, September 1995.
[10] D.B. Paul and J.B. Baker. “The Design for the Wall Street Journal-based CSR Corpus”. In Proceedings of the DARPA SLS Workshop, pp. 357–361, February 1992.
[11] R. Rosenfeld. “Adaptive Statistical Language Modeling: A Maximum Entropy Approach”. Ph.D. thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, CMU-CS-94-138, 1994.
Copyright information
© 1996 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tillmann, C., Ney, H. (1996). Selection criteria for word trigger pairs in language modeling. In: Miclet, L., de la Higuera, C. (eds) Grammatical Inference: Learning Syntax from Sentences. ICGI 1996. Lecture Notes in Computer Science, vol 1147. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0033345
DOI: https://doi.org/10.1007/BFb0033345
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-61778-5
Online ISBN: 978-3-540-70678-6
eBook Packages: Springer Book Archive