Applications of probabilistic grammatical inference are limited by time and space constraints. In statistical language modeling, for example, the large corpora now available lead to automata with millions of states. In this article we propose a method for pruning automata (restricted to tree-based structures) that is not only efficient (sub-quadratic) but also dramatically reduces the size of the automaton with only a small impact on the underlying distribution. The method is evaluated on a language modeling task.





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Franck Thollard (1)
  • Baptiste Jeudy (1)

  1. Laboratoire Hubert Curien, UMR CNRS 5516, Université de Lyon, Université Jean-Monnet, France
