Abstract
Formal translations constitute a suitable framework for dealing with many problems in pattern recognition and computational linguistics. Applying formal transducers to these areas requires a stochastic extension that can handle noisy, distorted patterns with high variability. In this paper, three criteria are proposed and developed for estimating the parameters of regular syntax-directed translation schemata: maximum likelihood estimation, minimum conditional entropy estimation and conditional maximum likelihood estimation. The last two criteria were proposed to deal with situations in which training data is sparse. All three criteria take into account the possibility of ambiguity in the translations, i.e., there can be different output strings for a single input string. In this case, the final goal of the stochastic framework is to find the highest-probability translation of a given input string. The criteria were tested on a translation task that has a high degree of ambiguity.
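As a rough illustration of the ambiguity described in the abstract, the sketch below uses a hypothetical table of joint probabilities P(x, y) in place of a real stochastic transducer, where that probability would be obtained by summing over derivations. The table `JOINT`, its values, and all function names are assumptions made for this example and are not taken from the paper. It selects the highest-probability translation of an input string and contrasts the joint log-likelihood of a sample with its conditional log-likelihood, which are the quantities that maximum likelihood and conditional maximum likelihood estimation are usually understood to maximize.

```python
import math
from collections import defaultdict

# Hypothetical toy joint distribution P(x, y) over (input, output) string pairs.
# The input "a b" is ambiguous: it has two possible output strings.
JOINT = {
    ("a b", "A B"): 0.4,
    ("a b", "A C"): 0.2,
    ("b a", "B A"): 0.4,
}


def input_marginal(joint):
    """Return P(x), the sum over all outputs y of P(x, y)."""
    marginal = defaultdict(float)
    for (x, _y), p in joint.items():
        marginal[x] += p
    return dict(marginal)


def best_translation(joint, x):
    """Return the output y that maximizes P(x, y), or None if x is unseen.

    For a fixed x, maximizing P(x, y) is the same as maximizing P(y | x).
    """
    candidates = {y: p for (xi, y), p in joint.items() if xi == x}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


def joint_log_likelihood(joint, sample):
    """Sum of log P(x, y) over the training pairs."""
    return sum(math.log(joint[pair]) for pair in sample)


def conditional_log_likelihood(joint, sample):
    """Sum of log P(y | x) = log(P(x, y) / P(x)) over the training pairs."""
    marginal = input_marginal(joint)
    return sum(math.log(joint[(x, y)] / marginal[x]) for x, y in sample)


if __name__ == "__main__":
    sample = [("a b", "A B"), ("b a", "B A")]
    print(best_translation(JOINT, "a b"))
    print(joint_log_likelihood(JOINT, sample))
    print(conditional_log_likelihood(JOINT, sample))
```

In this toy table the two objectives differ only through the marginal P(x): the conditional criterion ignores how likely the input string itself is and scores only the choice among its competing translations.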
Cite this article
Picó, D., Casacuberta, F. Some Statistical-Estimation Methods for Stochastic Finite-State Transducers. Machine Learning 44, 121–141 (2001). https://doi.org/10.1023/A:1010880113956