Exploring the Use of Target-Language Information to Train the Part-of-Speech Tagger of Machine Translation Systems

Sánchez-Martínez, Felipe; Pérez-Ortiz, Juan Antonio; Forcada, Mikel L.

doi:10.1007/978-3-540-30228-5_13


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3230)

Included in the following conference series:

International Conference on Natural Language Processing (in Spain)


Abstract

When automatically translating between related languages, one of the main sources of machine translation errors is the incorrect resolution of part-of-speech (PoS) ambiguities. Hidden Markov models (HMMs) are the standard statistical approach to resolving such ambiguities. The usual training algorithms collect statistics from source-language texts to adjust the parameters of the HMM, but if the HMM is to be embedded in a machine translation system, target-language information may also prove valuable. We study how to use a target-language model (in addition to source-language texts) to improve the tagging and translation performance of the statistical PoS tagger of an otherwise rule-based, shallow-transfer machine translation engine, although other architectures may be considered as well. The method may also be used to customize the machine translation engine to a particular target language, text type, or subject, or to statistically “retune” it after introducing new transfer rules.
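As a rough illustration of the idea in the abstract, the sketch below shows one way a target-language model could weight competing PoS disambiguations of a source segment. It is not the authors' algorithm: the lexicon, the word-for-word `TRANSLATION` table standing in for the machine translation engine, the bigram probabilities, and all function names are invented for this example. Each possible tag sequence is translated, the translation is scored by a toy target-language bigram model, and the normalized scores are used as fractional counts of tag transitions, the kind of statistic an HMM trainer could consume.

```python
from collections import defaultdict
from itertools import product
import math

# Hypothetical ambiguity classes: each source word and its possible PoS tags.
LEXICON = {
    "la": ["DET", "PRON"],
    "casa": ["NOUN", "VERB"],
}

# Hypothetical stand-in for the MT engine: one translation per (word, tag) pair.
TRANSLATION = {
    ("la", "DET"): "the",
    ("la", "PRON"): "it",
    ("casa", "NOUN"): "house",
    ("casa", "VERB"): "marries",
}

# Invented target-language bigram log-probabilities; unseen bigrams get a floor.
TL_BIGRAM_LOGPROB = {
    ("<s>", "the"): math.log(0.5),
    ("the", "house"): math.log(0.4),
    ("<s>", "it"): math.log(0.2),
    ("it", "marries"): math.log(0.05),
}
FLOOR = math.log(1e-4)


def tl_logprob(words):
    """Log-probability of a target-language word sequence under the toy bigram model."""
    total, prev = 0.0, "<s>"
    for word in words:
        total += TL_BIGRAM_LOGPROB.get((prev, word), FLOOR)
        prev = word
    return total


def path_weights(source_words):
    """Weight every tag sequence by the target-language score of its translation."""
    paths = list(product(*(LEXICON[w] for w in source_words)))
    scores = []
    for tags in paths:
        translation = [TRANSLATION[(w, t)] for w, t in zip(source_words, tags)]
        scores.append(math.exp(tl_logprob(translation)))
    norm = sum(scores)
    return {tags: score / norm for tags, score in zip(paths, scores)}


def fractional_transition_counts(source_words):
    """Accumulate tag-to-tag transition counts, each path contributing its weight."""
    counts = defaultdict(float)
    for tags, weight in path_weights(source_words).items():
        for prev_tag, next_tag in zip(tags, tags[1:]):
            counts[(prev_tag, next_tag)] += weight
    return dict(counts)


if __name__ == "__main__":
    for tags, weight in path_weights(["la", "casa"]).items():
        print(tags, round(weight, 4))
```

In this toy setting the determiner-noun reading receives most of the weight because its translation is the most likely one under the target-language model. Enumerating every path is exponential in segment length, so a practical system would need to work on short segments or prune paths.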

Work funded by the Spanish Government through grants TIC2003-08681-C02-01 and BES-2004-4711. We thank Rafael C. Carrasco for useful comments on this work. We also thank Geoffrey Sampson (University of Sussex, England) for his Simple Good-Turing implementation.



Author information

Authors and Affiliations

Departament de Llenguatges i Sistemes Informàtics, Universitat d’Alacant, E-03071, Alacant, Spain
Felipe Sánchez-Martínez, Juan Antonio Pérez-Ortiz & Mikel L. Forcada


Editor information

Editors and Affiliations

Department of Software and Computing Systems, University of Alicante, Spain
José Luis Vicedo
Natural Language Processing and Information Systems Group, Department of Software and Computing Systems, University of Alicante, Spain
Patricio Martínez-Barco
Grupo de investigación del Procesamiento del Lenguaje y Sistemas de Información, Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante, Alicante, Spain
Rafael Muñoz
Departamento de Lenguajes y Sistemas Informáticos, Carretera de San Vicente del Raspeig, Universidad de Alicante, 03690 San Vicente del Raspeig, Alicante, Spain
Maximiliano Saiz Noeda


About this paper

Cite this paper

Sánchez-Martínez, F., Pérez-Ortiz, J.A., Forcada, M.L. (2004). Exploring the Use of Target-Language Information to Train the Part-of-Speech Tagger of Machine Translation Systems. In: Vicedo, J.L., Martínez-Barco, P., Muñoz, R., Saiz Noeda, M. (eds) Advances in Natural Language Processing. EsTAL 2004. Lecture Notes in Computer Science, vol 3230. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30228-5_13


DOI: https://doi.org/10.1007/978-3-540-30228-5_13
Published: 20 October 2004
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23498-2
Online ISBN: 978-3-540-30228-5
eBook Packages: Springer Book Archive
