Abstract
A new possible confidence measure for automatic speech recognition is presented along with results of tests where they were applied. A classical method based on comparing the strongest hypotheses with an average of a few next hypotheses was used as a ground truth. Details of our own method based on comparison of edit distances are depicted with results of tests. It was found useful for spoken dialogue system as a module asking to repeat a phrase or declaring that it was not recognised. The method was designed for Polish language, which is morphologically rich.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Guo G, Huang C, Jiang H, Wang RH (2004) A comparative study on various confidence measures in large vocabulary speech recognition. Proceedings of international symposium on Chinese spoken language, pp 9–12
Razik J, Mella O, Fohr D, Haton J (2011) Frame-synchronous and local confidence measures for automatic speech recognition. Int J Pattern Recognit Artif Intell 25:157–182
Wessel F, Schluter R, Macherey K, Ney H (2001) Confidence measures for large vocabulary continuous speech recognition. IEEE Trans Speech Audio Proc 9(3):288–298
Molina C, Yoma N, Huenupan F, Garreton C, Wuth J (2010) Maximum entropy-based reinforcement learning using a condense measure in speech recognition for telephone speech. IEEE Trans Audio, Speech Lang Proc 18(5):1041–1052
Ziółko B, Jadczyk T, Skurzok D, Ziółko M (2012) Confidence measure by substring comparison for automatic speech recognition. ICALIP, Shanghai
Zhou L, Shi Y, Sears A (2010) Third-party error detection support mechanisms for dictation speech recognition. Interact Comput 22:375–388
Vogt R, Sridharan S, Mason M (2010) Making confident speaker verification decisions with minimal speech. IEEE Trans Audio Speech Lang Process 18(6):1182–1192
Huet S, Gravier G, Sebillot P (2010) Morpho-syntactic post-processing of n-best lists for improved French automatic speech recognition. Comput Speech Lang 24:663–684
Kim W, Hansen J (2010) Phonetic distance based condense measure. IEEE Signal Process Lett 17(2):121–124
Seigel M, Woodland P (2011) Combining information sources for confidence estimation with crf models. Proceedings of InterSpeech
Ziółko M, Gałka J, Ziółko B, Jadczyk T, Skurzok D, Mąsior M (2011) Automatic speech recognition system dedicated for Polish. Proceedings of Interspeech, Florence
Nouza J, Zdansky J, David P, Cerva P, Kolorenc J, Nejedlova D (2005) Fully automated system for Czech spoken broadcast transcription with very large (300 k+) lexicon. Proceedings of InterSpeech, pp 1681–1684
Hirsimaki T, Pylkkonen J, Kurimo M (2009) Importance of high-order n-gram models in morph-based speech recognition. IEEE Trans Audio Speech Lang Process 17(4):724–732
Levenshtein VI (1966) Binary codes capable of correcting deletions, insertions, and reversals. Soviet Phys Doklady 10:707–710
Grocholewski S (1998) First database for spoken Polish. Proceedings of international conference on language resources and evaluation, Grenada, pp 1059–1062
Acknowledgments
The project was funded by the National Science Centre allocated on the basis of a decision DEC-2011/03/D/ST6/00914.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2013 Springer Science+Business Media Dordrecht(Outside the USA)
About this paper
Cite this paper
Skurzok, D., Ziółko, B. (2013). Edit Distance Comparison Confidence Measure for Speech Recognition. In: Park, J., Ng, JY., Jeong, HY., Waluyo, B. (eds) Multimedia and Ubiquitous Engineering. Lecture Notes in Electrical Engineering, vol 240. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-6738-6_19
Download citation
DOI: https://doi.org/10.1007/978-94-007-6738-6_19
Published:
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-007-6737-9
Online ISBN: 978-94-007-6738-6
eBook Packages: EngineeringEngineering (R0)