Edit Distance Comparison Confidence Measure for Speech Recognition

  • Dawid Skurzok
  • Bartosz Ziółko
Conference paper
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 240)


A new confidence measure for automatic speech recognition is presented, along with results of tests in which it was applied. A classical method, which compares the strongest hypothesis with the average of the next few hypotheses, was used as the ground truth. Our own method, based on comparing edit distances, is described in detail together with test results. It was found useful in a spoken dialogue system as a module that asks the user to repeat a phrase or declares that the phrase was not recognised. The method was designed for Polish, a morphologically rich language.
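The abstract names two ingredients without giving formulas: a classical measure that compares the best hypothesis with the average of the next few, and a measure built on edit distances. The sketch below illustrates one plausible reading of each and is not the authors' implementation. The function names, the parameter `k`, the use of character-level Levenshtein distance, and the normalisation by string length are all assumptions made for illustration.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def score_margin(scores: Sequence[float], k: int = 3) -> float:
    """Assumed form of the classical measure: best score minus the
    mean of the next k scores (higher score = stronger hypothesis)."""
    ranked = sorted(scores, reverse=True)
    rest = ranked[1:1 + k]
    if not rest:
        return 0.0
    return ranked[0] - sum(rest) / len(rest)


def mean_edit_distance_to_best(hypotheses: Sequence[str], k: int = 3) -> float:
    """Hypothetical edit-distance measure: mean length-normalised
    distance between the top hypothesis and the next k hypotheses.
    `hypotheses` is assumed to be ordered from strongest to weakest."""
    best, rest = hypotheses[0], hypotheses[1:1 + k]
    if not rest:
        return 0.0
    distances = [levenshtein(best, h) / max(len(best), len(h), 1) for h in rest]
    return sum(distances) / len(distances)


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(score_margin([0.9, 0.5, 0.4, 0.3]))
    print(mean_edit_distance_to_best(["kot", "kod", "kos"]))
```

Both functions return a single number per utterance. How such a value would be thresholded to trigger a "please repeat" prompt is not stated in the abstract and would need to be tuned on data.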


Speech recognition decisions Polish 



The project was funded by the National Science Centre, with funds allocated on the basis of decision DEC-2011/03/D/ST6/00914.



Copyright information

© Springer Science+Business Media Dordrecht (Outside the USA) 2013

Authors and Affiliations

  1. Department of Electronics, AGH University of Science and Technology, Kraków, Poland
