Investigating the contribution of linguistic information to quality estimation

Felice, Mariano; Specia, Lucia

doi:10.1007/s10590-013-9137-5

Published: 30 August 2013

Volume 27, pages 193–212 (2013)
Machine Translation

Abstract

This paper describes a study on the contribution of linguistically-informed features to the task of quality estimation for machine translation at sentence level. A standard regression algorithm is used to build models using a combination of linguistic and non-linguistic features extracted from the input text and its machine translation. Experiments with three English–Spanish translation datasets show that linguistic features on their own are not able to outperform shallower features based on statistics from the input text, its translation and additional corpora. However, further analysis suggests that linguistic information can be useful to produce better results if carefully combined with other features. An in-depth analysis of the results highlights a number of issues related to the use of linguistic features.

Notes

http://www.dcs.shef.ac.uk/~lucia/resources.html.
E.g. (1) The girl beside me was smiling rather brightly. She thought it was an honor that the exchange student should be seated next to her. \(\rightarrow \) *La niña a mi lado estaba sonriente bastante bien. Ella pensó que era un honor que el intercambio de estudiantes se encuentra próximo a ella. (superfluous) (2) She is thought to have killed herself through suffocation using a plastic bag. \(\rightarrow \) *Ella se cree que han matado a ella mediante asfixia utilizando una bolsa de plástico. (confusing).
E.g. *Algunas de estas personas se convertirá en héroes. (number mismatch), *Barricadas fueron creados en la calle Cortlandt. (gender mismatch), *Buena mentirosos están cualificados en lectura. (internal NP gender and number mismatch).
These included common deictic terms compiled from various sources, such as hoy, allí, tú (Spanish) or that, now or there (English).
http://kenai.com/projects/jmyspell.
http://www.openoffice.org/.
I won’t give it away. \(\rightarrow \) *He ganado ’ t darle.
For 147 features: \(2^{147}\).
For 147 features, worst case is \(147\times (147+1)/2=10{,}878\) (roughly \(2^{13}\)).
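
The two counts in the last two notes can be checked with a short calculation. The sketch below rests on an assumption, since the notes themselves do not name the procedures: that \(2^{147}\) counts every possible subset of the 147 features (an exhaustive search), and that \(147\times (147+1)/2\) counts the candidate evaluations of a greedy sequential search that has one fewer feature to consider at each step. The function names are illustrative and are not taken from the article.

```python
import math

N_FEATURES = 147  # feature count quoted in the notes


def exhaustive_subsets(n: int) -> int:
    """Number of distinct subsets of n features, including the empty set."""
    return 2 ** n


def sequential_worst_case(n: int) -> int:
    """Worst-case evaluations for a greedy sequential search (assumed reading):
    n candidates in the first step, n - 1 in the second, ..., 1 in the last."""
    return sum(range(1, n + 1))  # closed form: n * (n + 1) // 2


exhaustive = exhaustive_subsets(N_FEATURES)
sequential = sequential_worst_case(N_FEATURES)

# Report both counts and their size as powers of two.
print(f"exhaustive: 2^{N_FEATURES}, a {len(str(exhaustive))}-digit number")
print(f"sequential: {sequential:,} (about 2^{math.log2(sequential):.1f})")
```

Under these assumptions the sequential count is 10,878, which lies between \(2^{13}\) and \(2^{14}\), so the "roughly \(2^{13}\)" in the note is a rounded-down order of magnitude.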

Acknowledgments

M. Felice gratefully acknowledges support from the European Commission, Education & Training, Erasmus Mundus: EMMC 2008-0083, Erasmus Mundus Masters in NLP & HLT programme.

Author information

Authors and Affiliations

Computer Laboratory, University of Cambridge, 15 JJ Thomson Avenue, Cambridge, CB3 0FD, UK
Mariano Felice
Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello, Sheffield, S1 4DP, UK
Lucia Specia

Corresponding author

Correspondence to Mariano Felice.

About this article

Cite this article

Felice, M., Specia, L. Investigating the contribution of linguistic information to quality estimation. Machine Translation 27, 193–212 (2013). https://doi.org/10.1007/s10590-013-9137-5

Received: 14 October 2012
Accepted: 19 April 2013
Published: 30 August 2013
Issue Date: December 2013
DOI: https://doi.org/10.1007/s10590-013-9137-5
