Towards the Improvement of Statistical Translation Models Using Linguistic Features

Pérez, Alicia; Torres, Inés; Casacuberta, Francisco

doi:10.1007/11816508_71

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4139)

Included in the following conference series:

International Conference on Natural Language Processing (in Finland)

Abstract

Statistical translation models can be inferred from bilingual samples whenever enough training data are available. However, bilingual corpora are usually too scarce to yield reliable statistical models, particularly when dealing with highly inflected or agglutinative languages, where many words appear only once. Such rare events distort the statistics. To cope with this problem, we have turned to morphological knowledge. Rather than working only with running words, we also take advantage of lemmas, producing the translation in two stages. In the first stage we transform the source sentence into a lemmatized target sentence, and in the second stage we convert the lemmatized target sentence into the target full forms.
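The two-stage decomposition described in the abstract can be illustrated with a deliberately simplified Python sketch. It is not the authors' system: the statistical models are replaced by hypothetical hand-written lookup tables (`STAGE1`, `STAGE2`), the Spanish-to-Basque examples are toy data, and the `lemma+TAG` target tokens are an assumption made so that the second stage knows which inflection to produce. The first part also shows why lemmas help with sparsity: in the toy sample every full form is a singleton, while no lemma is.

```python
from collections import Counter

# --- Sparsity: full forms vs. lemmas ---------------------------------------
# Hypothetical toy sample of Basque full forms with hand-written lemmas;
# a real system would obtain lemmas from a morphological analyser.
WORDS = ["etxea", "etxean", "etxera", "etxetik", "euria", "euriak"]
LEMMA_OF = {"etxea": "etxe", "etxean": "etxe", "etxera": "etxe",
            "etxetik": "etxe", "euria": "euri", "euriak": "euri"}
LEMMAS = [LEMMA_OF[w] for w in WORDS]


def singleton_rate(tokens):
    """Fraction of distinct types that occur exactly once (hapax legomena)."""
    counts = Counter(tokens)
    return sum(1 for c in counts.values() if c == 1) / len(counts)


# --- Two-stage translation ---------------------------------------------------
# Stage 1 (hypothetical table): Spanish source phrase -> lemmatized target.
# Target tokens are "lemma+TAG"; keeping a case tag is an assumption so that
# stage 2 has enough information to choose the inflected form.
STAGE1 = {
    ("mañana",): ["bihar"],
    ("la", "casa"): ["etxe+ABS"],
    ("en", "la", "casa"): ["etxe+INE"],
    ("a", "la", "casa"): ["etxe+ALA"],
    ("de", "la", "casa"): ["etxe+ABL"],
}

# Stage 2 (hypothetical table): lemmatized target token -> target full form.
STAGE2 = {"bihar": "bihar", "etxe+ABS": "etxea", "etxe+INE": "etxean",
          "etxe+ALA": "etxera", "etxe+ABL": "etxetik"}


def stage1(source, table, max_len=3):
    """Greedy longest-match segmentation of the source into known phrases."""
    tokens = source.split()
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in table:
                out.extend(table[phrase])
                i += n
                break
        else:  # no phrase matched: copy the unknown word through unchanged
            out.append(tokens[i])
            i += 1
    return out


def stage2(lemmatized, table):
    """Map each lemmatized token to its full form (unknowns pass through)."""
    return [table.get(tok, tok) for tok in lemmatized]


def translate(source):
    """Source sentence -> lemmatized target -> target full forms."""
    return " ".join(stage2(stage1(source, STAGE1), STAGE2))


print(singleton_rate(WORDS), singleton_rate(LEMMAS))  # 1.0 0.0
print(translate("mañana a la casa"))                   # bihar etxera
```

Because all inflections of `etxe` collapse into one lemma in the first stage, evidence about that word is pooled rather than split across rare full forms; the second stage then only has to choose the inflection.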

This work has been partially supported by the Industry Department of the Basque Government and by the University of the Basque Country, under grants INTEK CN02AD02 and 9/UPV 00224.310-15900/2004, respectively.

Author information

Authors and Affiliations

Departamento de Electricidad y Electrónica, Facultad de Ciencia y Tecnología, Universidad del País Vasco
Alicia Pérez & Inés Torres
Departamento de Sistemas Informáticos y Computación, Institut Tecnològic d’Informàtica, Universidad Politécnica de Valencia
Francisco Casacuberta

Editor information

Editors and Affiliations

Turku Centre for Computer Science (TUCS), Department of Information Technology, University of Turku, Joukahaisenkatu 3-5 B, FIN-20520, Turku, Finland
Tapio Salakoski
Turku Centre for Computer Science (TUCS) and Department of IT, University of Turku, Lemminkäisenkatu 14 A, 20520, Turku, Finland
Filip Ginter & Sampo Pyysalo
Department of Information Technology, University of Turku, Lemminkäisenkatu 14–18 A, FIN-20520, Turku, Finland
Tapio Pahikkala

About this paper

Cite this paper

Pérez, A., Torres, I., Casacuberta, F. (2006). Towards the Improvement of Statistical Translation Models Using Linguistic Features. In: Salakoski, T., Ginter, F., Pyysalo, S., Pahikkala, T. (eds) Advances in Natural Language Processing. FinTAL 2006. Lecture Notes in Computer Science, vol 4139. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11816508_71

DOI: https://doi.org/10.1007/11816508_71
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37334-6
Online ISBN: 978-3-540-37336-0
eBook Packages: Computer Science, Computer Science (R0)