GIATI: A General Methodology for Finite-State Translation Using Alignments

  • David Picó
  • Jesús Tomás
  • Francisco Casacuberta
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3138)

Abstract

Statistical techniques for machine translation have attracted increasing interest from the natural language research community in recent years. Both statistical language modeling and statistical machine translation are now well-established disciplines with a solid basis and outstanding results. Finite-state transducers, for their part, have proven to be an efficient and flexible formalism for representing a wide range of the information that arises in natural language processing.

This paper presents a powerful general framework for combining statistical techniques with grammatical inference and finite-state transducers. The GIATI methodology proposed here provides a schema for building inference algorithms that generate finite-state transducers from parallel text corpora, using information supplied by robust statistical techniques such as n-grams and alignments. The general method is presented together with two concrete inference algorithms and experiments that show the validity of the GIATI framework for real-world translation tasks.
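The abstract does not spell out the two inference algorithms, so the following is only a minimal sketch of the general idea it describes: use word alignments to turn each sentence pair into a single string of source/target symbols, estimate an n-gram model (here a bigram) over those strings, and read the model back as a finite-state transducer. The function names, the monotone labelling rule, the unsmoothed bigram estimate, and the greedy decoder are all assumptions made for illustration, not the authors' method.

```python
from collections import defaultdict

def label_pair(source, target, alignment):
    """Turn one aligned sentence pair into a string of extended symbols.

    alignment[j] is the index of the source word aligned to target word j.
    Assumed rule: each target word is attached to its aligned source word,
    or to a later one when that is needed to preserve target word order.
    """
    attached = [[] for _ in source]
    last = 0
    for j, word in enumerate(target):
        i = max(alignment[j], last)  # never move backwards in the source
        attached[i].append(word)
        last = i
    return [(s, tuple(words)) for s, words in zip(source, attached)]

def build_bigram_transducer(corpus):
    """Estimate a bigram model over extended symbols and read it as a transducer.

    Each state is the previous extended symbol. Each transition is a tuple
    (input word, output words, next state, relative frequency). No smoothing.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for source, target, alignment in corpus:
        state = "<s>"
        for symbol in label_pair(source, target, alignment):
            counts[state][symbol] += 1
            state = symbol
    transducer = {}
    for state, outgoing in counts.items():
        total = sum(outgoing.values())
        transducer[state] = [
            (symbol[0], symbol[1], symbol, count / total)
            for symbol, count in outgoing.items()
        ]
    return transducer

def translate(transducer, source):
    """Greedy decoding (a simplification): follow the likeliest matching arc."""
    state, output = "<s>", []
    for word in source:
        candidates = [t for t in transducer.get(state, []) if t[0] == word]
        if not candidates:
            return None  # the unsmoothed model cannot parse this input
        _, words, state, _ = max(candidates, key=lambda t: t[3])
        output.extend(words)
    return output

# Hypothetical one-sentence corpus: (source words, target words, alignment).
corpus = [(["una", "habitación", "doble"], ["a", "double", "room"], [0, 2, 1])]
transducer = build_bigram_transducer(corpus)
print(translate(transducer, ["una", "habitación", "doble"]))
```

In this toy example "room" is aligned to "habitación" but is emitted with "doble", because "double" has already been attached to a later source word. A practical system would replace the greedy decoder with a search over all paths and smooth the n-gram model so that unseen inputs can still be translated.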

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • David Picó (1)
  • Jesús Tomás (1)
  • Francisco Casacuberta (1)
  1. Departament de Sistemes Informàtics i Computació, Institut Tecnològic d’Informàtica, Universitat Politècnica de València, Valencia, Spain
