In statistical language modeling the classic model is the n-gram. This model, however, cannot capture long-term dependencies, i.e. dependencies spanning more than n tokens. An alternative is the probabilistic automaton. Unfortunately, preliminary experiments suggest that this model is not yet competitive for language modeling, partly because it attempts to model dependencies that are too long. We propose to improve this model by restricting the dependency length to a more reasonable value. Experiments show a 45% reduction in perplexity on the Wall Street Journal language modeling task.
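As background for the n-gram baseline and the perplexity metric the abstract refers to, a minimal bigram (n = 2) language model with add-alpha smoothing can be sketched as follows. This is an illustrative sketch only: the function names and the toy corpus are not from the paper, and the paper's actual contribution concerns probabilistic automata, not this baseline.

```python
import math
from collections import defaultdict

def train_bigram(corpus, alpha=1.0):
    """Train an add-alpha smoothed bigram model from tokenized sentences."""
    counts = defaultdict(lambda: defaultdict(int))
    vocab = set()
    for sent in corpus:
        tokens = ["<s>"] + sent + ["</s>"]
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1
    V = len(vocab)

    def prob(prev, cur):
        # Add-alpha smoothing: unseen bigrams still get nonzero probability.
        return (counts[prev][cur] + alpha) / (sum(counts[prev].values()) + alpha * V)

    return prob

def perplexity(prob, corpus):
    """Perplexity = exp of the average negative log-probability per token."""
    log_sum, n = 0.0, 0
    for sent in corpus:
        tokens = ["<s>"] + sent + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            log_sum += -math.log(prob(prev, cur))
            n += 1
    return math.exp(log_sum / n)

# Toy corpus, purely for illustration.
train = [["the", "cat", "sat"], ["the", "dog", "sat"]]
model = train_bigram(train)
pp = perplexity(model, train)
```

A bigram model conditions each word only on its single predecessor, which is exactly the dependency-length limitation (here n − 1 = 1) that the probabilistic-automaton approach in the paper aims to relax in a controlled way.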


Keywords: Language Modeling, Wall Street Journal, State Automaton, Position Model, Probabilistic Automaton



Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Arnaud Zdziobeck¹
  • Franck Thollard¹

  1. Laboratoire Hubert Curien, UMR CNRS 5516, Université de Lyon, Université Jean Monnet, Saint-Étienne, France
