
Confidence Measures in Automatic Speech Recognition Systems for Error Detection in Restricted Domains

  • Julia Olcoz
  • Alfonso Ortega
  • Antonio Miguel
  • Eduardo Lleida
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8854)

Abstract

This paper reports the performance achieved with Confidence Measures (CM) in Automatic Speech Recognition (ASR) for the transcription of weather reports from the Spanish public broadcaster (RTVE). The CM is computed in three steps: first, Acoustic-Phonetic Decoding (APD) is carried out; next, the reference and hypothesis word sequences are aligned through a phone graph; finally, for a given time interval within this decoding mesh, the maximum posterior probability of the hypothesized word is taken as the CM value. The final goal is to use the CM module as an extension of the ASR system that automatically evaluates the reliability of recognition results and discards low-confidence words at the output. These CM can serve as a tool for Unsupervised Learning Techniques and can also assist human supervision of recognition results. If accurate enough, they would increase both the usability and the robustness of speech applications.
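The last step of the CM computation and the discarding of low-confidence words can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Arc` structure, the function names, the posterior values, and the 0.5 threshold are all hypothetical, and the decoding mesh is simplified to a flat list of word-labelled arcs whose posteriors are assumed to be already computed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Arc:
    """Hypothetical mesh arc: a word label, its time span, and a posterior."""
    word: str
    start: float
    end: float
    posterior: float


def overlaps(arc: Arc, start: float, end: float) -> bool:
    """True when the arc shares a non-empty time span with [start, end)."""
    return arc.start < end and start < arc.end


def confidence(arcs: List[Arc], word: str, start: float, end: float) -> float:
    """Maximum posterior of `word` among arcs overlapping the interval.

    Returns 0.0 when the word does not appear in the interval at all
    (an assumption of this sketch, not something stated in the paper).
    """
    candidates = [
        a.posterior for a in arcs if a.word == word and overlaps(a, start, end)
    ]
    return max(candidates, default=0.0)


def filter_hypothesis(
    arcs: List[Arc],
    hypothesis: List[Tuple[str, float, float]],
    threshold: float,
) -> List[str]:
    """Keep only hypothesized words whose confidence reaches the threshold."""
    return [
        word
        for word, start, end in hypothesis
        if confidence(arcs, word, start, end) >= threshold
    ]


# Invented toy data, for illustration only.
ARCS = [
    Arc("lluvia", 0.0, 0.5, 0.9),
    Arc("lluvias", 0.0, 0.5, 0.1),
    Arc("en", 0.5, 0.7, 0.4),
    Arc("el", 0.5, 0.7, 0.6),
    Arc("norte", 0.7, 1.2, 0.8),
]
HYPOTHESIS = [("lluvia", 0.0, 0.5), ("en", 0.5, 0.7), ("norte", 0.7, 1.2)]

if __name__ == "__main__":
    print(filter_hypothesis(ARCS, HYPOTHESIS, threshold=0.5))
```

In this toy example the word "en" is discarded because its best posterior in the interval (0.4) falls below the assumed threshold. In practice the threshold would have to be tuned on held-out data.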

Keywords

Automatic Speech Recognition, Unsupervised Learning Techniques, Confidence Measures, Acoustic-Phonetic Decoding, Error Detection, Restricted Domains

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Julia Olcoz (1)
  • Alfonso Ortega (1)
  • Antonio Miguel (1)
  • Eduardo Lleida (1)

  1. ViVoLab, Aragon Institute for Engineering Research (I3A), University of Zaragoza, Spain
