A Statistical-Estimation Method for Stochastic Finite-State Transducers Based on Entropy Measures

  • David Picó
  • Francisco Casacuberta
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1876)


The stochastic extension of formal translations constitutes a suitable framework for dealing with many problems in Syntactic Pattern Recognition. Several estimation criteria have already been proposed and developed for the parameter estimation of Regular Syntax-Directed Translation Schemata. Here, a new criterion is proposed for dealing with situations in which training data is sparse. This criterion is based on entropy measurements, is loosely inspired by the Maximum Mutual Information criterion, and takes into account the possibility of ambiguity in translations (i.e., the translation model may yield different output strings for a single input string). The goal in the stochastic framework is to find the most probable translation of a given input string. Experiments were performed on a translation task that has a high degree of ambiguity.


Machine translation, stochastic finite-state transducers, probabilistic estimation



Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • David Picó (1)
  • Francisco Casacuberta (1)
  1. Institut Tecnològic d’Informàtica, Departament de Sistemes Informàtics i Computació, Universitat Politècnica de València, Valencia, Spain
