Abstract
When translating automatically between related languages, one of the main sources of machine translation errors is the incorrect resolution of part-of-speech (PoS) ambiguities. Hidden Markov models (HMMs) are the standard statistical approach to resolving such ambiguities. The usual training algorithms collect statistics from source-language texts to adjust the parameters of the HMM, but if the HMM is to be embedded in a machine translation system, target-language information may also prove valuable. We study how to use a target-language model, in addition to source-language texts, to improve the tagging and translation performance of the statistical PoS tagger of an otherwise rule-based, shallow-transfer machine translation engine; other architectures may be considered as well. The method may also be used to customize the machine translation engine to a particular target language, text type, or subject, or to statistically “retune” it after new transfer rules are introduced.
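As an illustration of how an HMM resolves a PoS ambiguity, the following minimal sketch runs Viterbi decoding over a toy first-order HMM. It is not the tagger or the training method studied in this paper: the tag set, the English example, and every probability below are invented for illustration, and the function name `viterbi` is hypothetical.

```python
from math import log

# Toy model: all tags and probabilities are made-up assumptions, not estimates from any corpus.
TAGS = ["DET", "NOUN", "VERB"]
START_P = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS_P = {
    "DET": {"DET": 0.05, "NOUN": 0.90, "VERB": 0.05},
    "NOUN": {"DET": 0.10, "NOUN": 0.30, "VERB": 0.60},
    "VERB": {"DET": 0.50, "NOUN": 0.40, "VERB": 0.10},
}
EMIT_P = {
    "DET": {"the": 1.0},
    "NOUN": {"can": 0.4, "fish": 0.6},
    "VERB": {"can": 0.5, "fish": 0.5},
}


def safe_log(p):
    """Logarithm that maps probability 0 to minus infinity instead of raising."""
    return log(p) if p > 0 else float("-inf")


def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most likely tag sequence for `words` under a first-order HMM."""
    # best[i][t] = (log-probability of the best path ending in tag t at word i, previous tag)
    best = [{t: (safe_log(start_p[t] * emit_p[t].get(words[0], 0.0)), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            emission = safe_log(emit_p[t].get(words[i], 0.0))
            # Pick the predecessor tag that maximizes the path score into t.
            column[t] = max(
                (best[i - 1][prev][0] + safe_log(trans_p[prev][t]) + emission, prev)
                for prev in tags
            )
        best.append(column)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]


print(viterbi(["the", "can"], TAGS, START_P, TRANS_P, EMIT_P))
```

In this toy model "can" is ambiguous between NOUN and VERB, and the transition probability out of DET is what tips the decision towards NOUN. The parameters here are fixed by hand; the paper's question is how such parameters should be estimated when the tagger sits inside a machine translation system.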
Work funded by the Spanish Government through grants TIC2003-08681-C02-01 and BES-2004-4711. We thank Rafael C. Carrasco for useful comments on this work. We also thank Geoffrey Sampson (University of Sussex, England) for his Simple Good-Turing implementation.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sánchez-Martínez, F., Pérez-Ortiz, J.A., Forcada, M.L. (2004). Exploring the Use of Target-Language Information to Train the Part-of-Speech Tagger of Machine Translation Systems. In: Vicedo, J.L., Martínez-Barco, P., Muñoz, R., Saiz Noeda, M. (eds) Advances in Natural Language Processing. EsTAL 2004. Lecture Notes in Computer Science, vol 3230. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30228-5_13
DOI: https://doi.org/10.1007/978-3-540-30228-5_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23498-2
Online ISBN: 978-3-540-30228-5
eBook Packages: Springer Book Archive