
Selection criteria for word trigger pairs in language modeling

  • Session: Natural Language and Pattern Recognition
  • Conference paper
In: Grammatical Inference: Learning Syntax from Sentences (ICGI 1996)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1147)


Abstract

In this paper, we study selection criteria for the use of word trigger pairs in statistical language modeling. A word trigger pair is a long-distance word pair. To select the most significant trigger pairs, we need suitable criteria, which are the topic of this paper. We extend a baseline language model by a single word trigger pair and use the perplexity of this extended model as the selection criterion. This extension is applied to all possible trigger pairs, the number of which is the square of the vocabulary size. When a unigram language model is used as the baseline, this approach yields the mutual information criterion used in [7, 11]. The more interesting case is to apply this criterion in the context of a more powerful model such as a bigram/trigram model with a cache. We study different variants of including word trigger pairs in such a language model. This approach produced better word trigger pairs than the conventional mutual information criterion. On the Wall Street Journal corpus, the selected trigger pairs reduced the perplexity of a trigram/cache language model from 138 to 128 for a 5-million-word training set and from 92 to 87 for a 38-million-word training set.
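As a concrete illustration of the baseline case, the mutual information criterion of [7, 11] scores a candidate trigger pair (a → b) by the average mutual information between the events "a occurred in the history" and "b is the current word". The following Python sketch estimates this score from raw counts; the function name `trigger_mi` and all numbers are illustrative assumptions, not taken from the paper:

```python
from math import log2

def trigger_mi(n_ab, n_a, n_b, n_total):
    """Average mutual information I(a; b) between the events
    'trigger word a occurred in the history' and 'b is the current word',
    estimated from counts over n_total history/word observations."""
    # The four cells of the 2x2 contingency table: (a,b), (a,~b), (~a,b), (~a,~b)
    cells = [
        (n_ab,                       n_a,           n_b),
        (n_a - n_ab,                 n_a,           n_total - n_b),
        (n_b - n_ab,                 n_total - n_a, n_b),
        (n_total - n_a - n_b + n_ab, n_total - n_a, n_total - n_b),
    ]
    mi = 0.0
    for n_xy, n_x, n_y in cells:
        if n_xy == 0:
            continue  # a zero cell contributes nothing to the sum
        p_xy = n_xy / n_total
        # p_xy * log2( p_xy / (p_x * p_y) ), written with counts
        mi += p_xy * log2(n_xy * n_total / (n_x * n_y))
    return mi

# Hypothetical toy counts: pair co-occurs 120 times in 10,000 observations,
# while independence would predict 400 * 500 / 10,000 = 20 co-occurrences.
score = trigger_mi(n_ab=120, n_a=400, n_b=500, n_total=10_000)
```

A pair whose joint count matches the independence prediction scores zero, so ranking candidate pairs by this score surfaces the most strongly associated trigger pairs first.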


References

  1. L.R. Bahl, F. Jelinek, R.L. Mercer and A. Nadas. “Next Word Statistical Predictor”. IBM Techn. Disclosure Bulletin, 27(7A), pp. 3941–3942, 1984.

  2. A. Berger, S. Della Pietra and V. Della Pietra. “A Maximum Entropy Approach to Natural Language Processing”. In Computational Linguistics, Vol. 22, No. 1, pp. 39–71, March 1996.

  3. A.P. Dempster, N.M. Laird and D.B. Rubin. “Maximum Likelihood from Incomplete Data via the EM Algorithm”. In Journal of the Royal Statistical Society, Vol. 39, No. 1, pp. 1–38, 1977.

  4. S. Della Pietra, V. Della Pietra, J. Gillett, J. Lafferty, H. Printz and L. Ures. “Inference and Estimation of a Long-Range Trigram Model”. In Lecture Notes in Artificial Intelligence, Grammatical Inference and Applications, ICGI-94, Alicante, Spain, Springer-Verlag, pp. 78–92, September 1994.

5. F. Jelinek. “Self-Organized Language Modeling for Speech Recognition”. In Readings in Speech Recognition, A. Waibel and K.F. Lee (eds.), pp. 450–506, Morgan Kaufmann, 1991.

  6. S.M. Katz. “Estimation of Probabilities from Sparse Data for the Language Model Component of a Speech Recognizer”. In IEEE Trans. on Acoustics, Speech and Signal Processing, Vol. 35, pp. 400–401, March 1987.

7. R. Lau, R. Rosenfeld and S. Roukos. “Trigger-Based Language Models: A Maximum Entropy Approach”. In Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Minneapolis, MN, pp. II 45–48, April 1993.

8. R. Lau, R. Rosenfeld and S. Roukos. “Adaptive Language Modeling Using the Maximum Entropy Approach”. In Proceedings of the ARPA Human Language Technology Workshop, pp. 108–113, Morgan Kaufmann, March 1993.

  9. H. Ney, M. Generet and F. Wessel. “Extensions of Absolute Discounting for Language Modeling”. In Fourth European Conference on Speech Communication and Technology, pp. 1245–1248, Madrid, September 1995.

  10. D.B. Paul and J.B. Baker. “The Design for the Wall Street Journal-based CSR Corpus”. In Proceedings of the DARPA SLS Workshop, pp. 357–361, February 1992.

  11. R. Rosenfeld. “Adaptive Statistical Language Modeling: A Maximum Entropy Approach”. Ph.D. thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, CMU-CS-94-138, 1994.

Editor information

Laurent Miclet, Colin de la Higuera

Copyright information

© 1996 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tillmann, C., Ney, H. (1996). Selection criteria for word trigger pairs in language modeling. In: Miclet, L., de la Higuera, C. (eds) Grammatical Inference: Learning Syntax from Sentences. ICGI 1996. Lecture Notes in Computer Science, vol 1147. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0033345

  • DOI: https://doi.org/10.1007/BFb0033345

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-61778-5

  • Online ISBN: 978-3-540-70678-6
