
Automatic quality estimation for speech translation using joint ASR and MT features

Published in: Machine Translation

Abstract

This paper addresses automatic quality estimation for spoken language translation (SLT). This relatively new task is defined and formalized as a sequence-labeling problem in which each word of the SLT hypothesis is tagged as good or bad according to a large feature set. We propose several word confidence estimators (WCE) based on our automatic evaluation of transcription (ASR) quality, translation (MT) quality, or both (combined ASR + MT). This work was made possible by a dedicated corpus we built, containing 6.7k utterances, each comprising the quintuplet: ASR output, verbatim transcript, text translation, speech translation, and post-edited translation. The conclusion of our experiments using joint ASR and MT features for WCE is that MT features remain the most influential, while ASR features can contribute useful complementary information. In addition, the last part of the paper proposes to disentangle ASR errors from MT errors, tagging each word in the SLT hypothesis as good, \(asr\_error\) or \(mt\_error\). Robust quality estimators for SLT can be used to re-score speech translation graphs or to provide feedback to the user in interactive speech translation or computer-assisted speech-to-text scenarios.
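The sequence-labeling formulation described above can be sketched as follows. This is a toy illustration only: the feature names, scores, and the averaging rule are invented stand-ins for the paper's CRF classifier and its large joint ASR + MT feature set.

```python
# Toy sketch of word confidence estimation (WCE) for SLT as sequence labeling.
# Each word of the SLT hypothesis carries joint ASR-side and MT-side features
# and receives a "good" or "bad" tag. All names and thresholds are invented.

def extract_features(word, asr_confidence, mt_alignment_score):
    """Combine ASR-side and MT-side evidence for one hypothesis word."""
    return {
        "word": word,
        "asr_conf": asr_confidence,      # e.g. a posterior from the ASR lattice
        "mt_align": mt_alignment_score,  # e.g. a lexical alignment probability
    }

def tag_word(features, threshold=0.5):
    """Naive stand-in for the trained classifier: average the two scores."""
    score = 0.5 * (features["asr_conf"] + features["mt_align"])
    return "good" if score >= threshold else "bad"

# One hypothesis: (word, asr_confidence, mt_alignment_score) per token.
hypothesis = [("the", 0.95, 0.90), ("cat", 0.40, 0.35), ("sleeps", 0.85, 0.70)]
tags = [tag_word(extract_features(w, a, m)) for w, a, m in hypothesis]
print(tags)  # ['good', 'bad', 'good']
```

In the paper itself, the tagging decision is made jointly over the whole sequence by a CRF rather than word by word as in this sketch.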


Notes

  1. \(q_{i}\) could also take more than two labels, or even continuous scores, but this paper mostly deals with error detection using a binary label set, with the exception of Sect. 7, where three labels are considered.

  2. https://github.com/besacier/WCE-SLT-LIG.

  3. cf. http://www.statmt.org/wmt17/ for the most recent such instance.

  4. https://github.com/hlt-mt/TranscRater.

  5. http://www.quest.dcs.shef.ac.uk/.

  6. https://github.com/qe-team/marmot.

  7. https://github.com/besacier/WCE-LIG.

  8. Using this kind of feature is controversial, but we observed that such features are available in general scenarios, so we decided to include them in our experiments. Contrastive results without these two features will also be given later.

  9. http://github.com/besacier/WCE-LIG.

  10. https://github.com/besacier/WCE-SLT-LIG/.

  11. These 3 alternative utterances are simply added to the corpus as 3 examples and are used independently of each other.

  12. Corresponding to optimization of the F-measure on bad labels (errors).
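The F-measure on bad labels mentioned in this note is the standard F1 computed on the error class. A minimal sketch (labels and data invented for illustration):

```python
def f_measure_bad(reference, predicted, bad="bad"):
    """F1 on the 'bad' (error) class, the quantity optimized in the note above."""
    tp = sum(1 for r, p in zip(reference, predicted) if r == bad and p == bad)
    fp = sum(1 for r, p in zip(reference, predicted) if r != bad and p == bad)
    fn = sum(1 for r, p in zip(reference, predicted) if r == bad and p != bad)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

ref = ["good", "bad", "bad", "good"]
pred = ["good", "bad", "good", "bad"]
print(f_measure_bad(ref, pred))  # 0.5
```

Optimizing on the bad class rather than overall accuracy matters because good labels dominate the corpus, so a trivial all-good tagger would otherwise score well.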

  13. In principle, three data sets would have been needed: (a) to train classifiers, (b) to apply feature selection, and (c) to evaluate WCE performance. Since we only have a dev and a tst set, we consider this procedure acceptable.

  14. http://tienhuong.weebly.com/examples-for-the-paper.html.

  15. However, we observed that the use of different label sets (Method 1, Method 2, Intersection(Method 1, Method 2)) does not have a strong influence on the results, so we omit these results here.


Author information

Corresponding author

Correspondence to Laurent Besacier.


About this article


Cite this article

Le, NT., Lecouteux, B. & Besacier, L. Automatic quality estimation for speech translation using joint ASR and MT features. Machine Translation 32, 325–351 (2018). https://doi.org/10.1007/s10590-018-9218-6
