Statistical Error Correction Methods for Domain-Specific ASR Systems

Cucu, Horia; Buzo, Andi; Besacier, Laurent; Burileanu, Corneliu

doi:10.1007/978-3-642-39593-2_7

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7978)

Included in the following conference series:

International Conference on Statistical Language and Speech Processing

Abstract

Whenever an ASR company promises to deliver error-free transcripts to the end user, manual verification and correction of the raw ASR transcripts cannot be avoided. This manual post-editing process systematically produces new, correct, domain-specific data that can be used to incrementally improve the original ASR system. This paper proposes a statistical, SMT-based ASR error correction method that learns from previously corrected ASR errors in order to automatically post-process the system's future transcripts. We show that the proposed method can bring a WER improvement of more than 10% using only 2,000 user-corrected sentences.
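To make the idea of learning from past corrections concrete, here is a minimal Python sketch under strong simplifying assumptions. It is not the paper's method: an SMT-based corrector would typically treat raw ASR output as the source language and the corrected transcript as the target language, and learn phrase-level rewrites (including insertions and deletions). This sketch only aligns each (ASR output, corrected text) pair with a Levenshtein alignment, keeps one-to-one word substitutions, and reuses the same alignment to compute word error rate (WER). The function names (`align`, `wer`, `learn_substitutions`, `correct`) and the toy sentences are hypothetical.

```python
from collections import Counter, defaultdict


def align(hyp, ref):
    """Levenshtein-align two word lists (unit cost for sub/ins/del).

    Returns (pairs, distance). Each pair is (hyp_word, ref_word); None marks
    a word present on only one side (insertion or deletion error).
    """
    n, m = len(hyp), len(ref)
    # d[i][j] = edit distance between hyp[:i] and ref[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + cost,  # match or substitution
                          d[i - 1][j] + 1,         # extra hyp word (insertion error)
                          d[i][j - 1] + 1)         # missing ref word (deletion error)
    # Backtrace from the bottom-right cell to recover one optimal alignment.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (hyp[i - 1] != ref[j - 1]):
            pairs.append((hyp[i - 1], ref[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            pairs.append((hyp[i - 1], None))
            i -= 1
        else:
            pairs.append((None, ref[j - 1]))
            j -= 1
    return pairs[::-1], d[n][m]


def wer(hyps, refs):
    """Corpus WER: total word edits divided by total reference words."""
    errors = sum(align(h.split(), r.split())[1] for h, r in zip(hyps, refs))
    return errors / sum(len(r.split()) for r in refs)


def learn_substitutions(hyps, refs, min_count=1):
    """Map each ASR word to its most frequent corrected form.

    Identity pairs are counted too, so a word is only rewritten when users
    corrected it more often than they left it unchanged.
    """
    counts = defaultdict(Counter)
    for h, r in zip(hyps, refs):
        pairs, _ = align(h.split(), r.split())
        for hw, rw in pairs:
            if hw is not None and rw is not None:  # ignore insertions/deletions
                counts[hw][rw] += 1
    table = {}
    for hw, c in counts.items():
        best, n = c.most_common(1)[0]
        if best != hw and n >= min_count:
            table[hw] = best
    return table


def correct(hyp, table):
    """Apply the learned word substitutions to a new ASR transcript."""
    return " ".join(table.get(w, w) for w in hyp.split())


# Hypothetical toy data: raw ASR output and the user-corrected versions.
past_hyps = ["the patient has a hi fever", "the hi dose was given", "take two tablets daily"]
past_refs = ["the patient has a high fever", "the high dose was given", "take two tablets daily"]

table = learn_substitutions(past_hyps, past_refs)
new_hyp, new_ref = "a hi temperature", "a high temperature"
fixed = correct(new_hyp, table)
print(table)                                        # {'hi': 'high'}
print(wer([new_hyp], [new_ref]), wer([fixed], [new_ref]))  # 0.3333333333333333 0.0
```

The word-level table cannot fix errors that depend on context or span several words (for example, one spoken word recognized as two); handling those is what a phrase-based SMT model adds over this sketch.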

Author information

Authors and Affiliations

University “Politehnica” of Bucharest, Romania
Horia Cucu, Andi Buzo & Corneliu Burileanu
LIG, University Joseph Fourier, Grenoble, France
Laurent Besacier

Editor information

Editors and Affiliations

Research Group on Mathematical Linguistics, Universitat Rovira i Virgili, Avinguda Catalunya, 35, 43002, Tarragona, Spain
Adrian-Horia Dediu & Carlos Martín-Vide
Research Institute for Information and Language Processing, Research Group in Computational Linguistics, University of Wolverhampton, WV1 1SB, Wolverhampton, UK
Ruslan Mitkov
Fakultät für Informatik, Institut für Wissens- und Sprachverarbeitung, Otto-von-Guericke-Universität Magdeburg, Universitätsplatz 2, 39106, Magdeburg, Germany
Bianca Truthe

About this paper

Cite this paper

Cucu, H., Buzo, A., Besacier, L., Burileanu, C. (2013). Statistical Error Correction Methods for Domain-Specific ASR Systems. In: Dediu, AH., Martín-Vide, C., Mitkov, R., Truthe, B. (eds) Statistical Language and Speech Processing. SLSP 2013. Lecture Notes in Computer Science, vol 7978. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39593-2_7

DOI: https://doi.org/10.1007/978-3-642-39593-2_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39592-5
Online ISBN: 978-3-642-39593-2
eBook Packages: Computer Science, Computer Science (R0)