
Automatic quality estimation for speech translation using joint ASR and MT features

Published in: Machine Translation

Abstract

This paper addresses automatic quality estimation for spoken language translation (SLT). This relatively new task is defined and formalized as a sequence-labeling problem in which each word of the SLT hypothesis is tagged as good or bad according to a large feature set. We propose several word confidence estimators (WCE) based on our automatic evaluation of transcription (ASR) quality, translation (MT) quality, or both (combined ASR + MT). This work was made possible by a dedicated corpus we built, containing 6.7k utterances, each comprising the quintuplet: ASR output, verbatim transcript, text translation, speech translation, and post-edited translation. The conclusion of our experiments using joint ASR and MT features for WCE is that MT features remain the most influential, while ASR features can contribute useful complementary information. In addition, the last part of the paper proposes to disentangle ASR errors from MT errors, tagging each word in the SLT hypothesis as good, \(asr\_error\) or \(mt\_error\). Robust quality estimators for SLT can be used to re-score speech translation graphs or to provide feedback to the user in interactive speech translation or computer-assisted speech-to-text scenarios.
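The sequence-labeling formulation described above can be sketched as follows. This is a toy illustration only: the feature names, scores, and the averaging rule are invented stand-ins for the paper's CRF classifier and its large joint ASR + MT feature set.

```python
# Toy sketch of word confidence estimation (WCE) for SLT as sequence labeling.
# Each word of the SLT hypothesis carries joint ASR-side and MT-side features
# and receives a "good" or "bad" tag. All names and thresholds are invented.

def extract_features(word, asr_confidence, mt_alignment_score):
    """Combine ASR-side and MT-side evidence for one hypothesis word."""
    return {
        "word": word,
        "asr_conf": asr_confidence,      # e.g. a posterior from the ASR lattice
        "mt_align": mt_alignment_score,  # e.g. a lexical alignment probability
    }

def tag_word(features, threshold=0.5):
    """Naive stand-in for the trained classifier: average the two scores."""
    score = 0.5 * (features["asr_conf"] + features["mt_align"])
    return "good" if score >= threshold else "bad"

# One hypothesis: (word, asr_confidence, mt_alignment_score) per token.
hypothesis = [("the", 0.95, 0.90), ("cat", 0.40, 0.35), ("sleeps", 0.85, 0.70)]
tags = [tag_word(extract_features(w, a, m)) for w, a, m in hypothesis]
print(tags)  # ['good', 'bad', 'good']
```

In the paper itself, the tagging decision is made jointly over the whole sequence by a CRF rather than word by word as in this sketch.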


Notes

  1. \(q_{i}\) could also take more than two labels, or even continuous scores, but this paper mostly deals with error detection using a binary label set, with the exception of Sect. 7, where three labels are considered.

  2. https://github.com/besacier/WCE-SLT-LIG.

  3. cf. http://www.statmt.org/wmt17/ for the most recent such instance.

  4. https://github.com/hlt-mt/TranscRater.

  5. http://www.quest.dcs.shef.ac.uk/.

  6. https://github.com/qe-team/marmot.

  7. https://github.com/besacier/WCE-LIG.

  8. Using this kind of feature is controversial, but we observed that such features are available in general scenarios, so we decided to include them in our experiments. Contrastive results without these two features will also be given later.

  9. http://github.com/besacier/WCE-LIG.

  10. https://github.com/besacier/WCE-SLT-LIG/.

  11. These 3 alternative utterances are simply added to the corpus as 3 examples and are used independently of each other.

  12. Corresponding to optimization of the F-measure on bad labels (errors).
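The F-measure on bad labels mentioned in this note is the standard F1 computed on the error class. A minimal sketch (labels and data invented for illustration):

```python
def f_measure_bad(reference, predicted, bad="bad"):
    """F1 on the 'bad' (error) class, the quantity optimized in the note above."""
    tp = sum(1 for r, p in zip(reference, predicted) if r == bad and p == bad)
    fp = sum(1 for r, p in zip(reference, predicted) if r != bad and p == bad)
    fn = sum(1 for r, p in zip(reference, predicted) if r == bad and p != bad)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

ref = ["good", "bad", "bad", "good"]
pred = ["good", "bad", "good", "bad"]
print(f_measure_bad(ref, pred))  # 0.5
```

Optimizing on the bad class rather than overall accuracy matters because good labels dominate the corpus, so a trivial all-good tagger would otherwise score well.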

  13. In principle, three data sets would have been needed: (a) to train classifiers, (b) to apply feature selection, and (c) to evaluate WCE performance. Since we only have a dev and a tst set, we consider this procedure acceptable.

  14. http://tienhuong.weebly.com/examples-for-the-paper.html.

  15. However, we observed that the use of different label sets (Method 1, Method 2, Intersection(Method 1, Method 2)) does not have a strong influence on the results, so we omit these results here.


Author information

Corresponding author

Correspondence to Laurent Besacier.


About this article


Cite this article

Le, NT., Lecouteux, B. & Besacier, L. Automatic quality estimation for speech translation using joint ASR and MT features. Machine Translation 32, 325–351 (2018). https://doi.org/10.1007/s10590-018-9218-6
