Machine Translation, Volume 22, Issue 1–2, pp 29–66

Using target-language information to train part-of-speech taggers for machine translation

  • Felipe Sánchez-Martínez
  • Juan Antonio Pérez-Ortiz
  • Mikel L. Forcada
Abstract

Although corpus-based approaches to machine translation (MT) are attracting growing interest, they are not applicable to less-resourced language pairs for which no parallel corpora are available; in those cases, the rule-based approach is the only applicable solution. Most rule-based MT systems use part-of-speech (PoS) taggers to resolve PoS ambiguities in the source-language (SL) texts to be translated, and they require accurate PoS taggers to produce reliable translations in the target language (TL). The standard statistical approach to PoS ambiguity resolution (or tagging) uses hidden Markov models (HMMs) trained either in a supervised way from hand-tagged corpora, an expensive resource that is not always available, or in an unsupervised way through the Baum-Welch expectation-maximization algorithm; both methods use information only from the language being tagged. However, when tagging is an intermediate task in the translation procedure, that is, when the PoS tagger is to be embedded as a module within an MT system, information from the TL can be used in the training phase, without supervision, to increase the translation quality of the whole MT system. This paper presents a method to train HMM-based PoS taggers for use in MT; the new method uses not only information from the SL, as general-purpose methods do, but also information from the TL and from the remaining modules of the MT system in which the PoS tagger is to be embedded. We find that the translation quality of the MT system embedding a PoS tagger trained in an unsupervised manner through this new method is clearly better than that of the same MT system embedding a PoS tagger trained through the Baum-Welch algorithm, and comparable to that obtained by embedding a PoS tagger trained in a supervised way from hand-tagged corpora.
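
The abstract does not spell out the training procedure, so the following is only a minimal sketch of one way TL information could drive unsupervised training. It assumes that every possible tag sequence of an SL segment is translated, that each translation is scored by a TL language model, and that the normalized scores are used as fractional counts for the HMM transition probabilities. All names here (`tl_driven_counts`, `toy_translate`, `toy_tl_score`, the toy lexicon and scores) are hypothetical and not taken from the paper.

```python
from collections import defaultdict
from itertools import product


def tl_driven_counts(segments, translate, tl_score):
    """Accumulate fractional tag-bigram counts weighted by TL likelihood.

    segments: list of segments, each a list of (word, candidate_tags) pairs.
    translate: hypothetical stand-in for the rest of the MT system.
    tl_score: hypothetical stand-in for a TL language model.
    """
    counts = defaultdict(float)
    for segment in segments:
        words = [word for word, _ in segment]
        # Enumerate every disambiguation (tag sequence) of the segment.
        paths = list(product(*(tags for _, tags in segment)))
        scores = [tl_score(translate(words, path)) for path in paths]
        total = sum(scores)
        if total == 0:
            continue  # no TL evidence for this segment
        for path, score in zip(paths, scores):
            weight = score / total
            for prev_tag, tag in zip(path, path[1:]):
                counts[(prev_tag, tag)] += weight
    return counts


def normalize(counts):
    """Turn fractional bigram counts into transition probabilities."""
    totals = defaultdict(float)
    for (prev_tag, _), value in counts.items():
        totals[prev_tag] += value
    return {pair: value / totals[pair[0]] for pair, value in counts.items()}


# Toy data, invented for illustration only.
TOY_LEXICON = {
    ("the", "DET"): "el",
    ("book", "NOUN"): "libro",
    ("book", "VERB"): "reservar",
}
TOY_TL_SCORES = {"el libro": 0.9, "el reservar": 0.1}


def toy_translate(words, tags):
    return " ".join(TOY_LEXICON[(w, t)] for w, t in zip(words, tags))


def toy_tl_score(sentence):
    return TOY_TL_SCORES.get(sentence, 0.0)


segments = [[("the", ["DET"]), ("book", ["NOUN", "VERB"])]]
probs = normalize(tl_driven_counts(segments, toy_translate, toy_tl_score))
print(probs)
```

In this toy case the TL model prefers the translation produced by the noun reading of "book", so the estimated DET-to-NOUN transition probability ends up higher than DET-to-VERB, without any hand-tagged SL data.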

Keywords

  • Rule-based machine translation
  • Part-of-speech tagging
  • Hidden Markov models
  • Language modeling

Copyright information

© Springer Science+Business Media B.V. 2008

Authors and Affiliations

  • Felipe Sánchez-Martínez (1)
  • Juan Antonio Pérez-Ortiz (1)
  • Mikel L. Forcada (1)
  1. Dept. de Llenguatges i Sistemes Informàtics, Universitat d’Alacant, Alacant, Spain
