Controlling the Uncertainty Area in the Real Time LVCSR Application

  • Nickolay Merkin
  • Ivan Medennikov
  • Alexei Romanenko
  • Alexander Zatvornitskiy
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8773)

Abstract

We propose an approach to improving the usability of an automatic speech recognition system in real time. We introduce the concept of an “uncertainty area” (UA): a time span within which the current recognition result may vary. By fixing the length of the UA we make it possible to start editing the recognized text without waiting for the phrase to end. We control the length of the UA by regularly pruning hypotheses using additional criteria. The approach was implemented in the software-hardware system for closed captioning of Russian live TV broadcasts.
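The abstract does not spell out the additional pruning criteria, so the following is only a minimal sketch of the general idea, under stated assumptions: hypotheses are kept as word lists with end times, the UA has a fixed length in seconds, and any hypothesis that disagrees with the current best one before the UA boundary is pruned. The names `Hypothesis`, `stable_prefix`, and `prune_to_fixed_ua` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

Word = Tuple[str, float]  # (word, end time in seconds)


@dataclass
class Hypothesis:
    words: List[Word]
    score: float  # higher is better, e.g. a log-probability


def stable_prefix(hyp: Hypothesis, boundary: float) -> List[str]:
    """Words of `hyp` that end at or before `boundary`."""
    return [w for w, end in hyp.words if end <= boundary]


def prune_to_fixed_ua(hyps: List[Hypothesis], now: float, ua_length: float):
    """Keep the uncertainty area no longer than `ua_length` seconds.

    Assumed criterion (not from the paper): everything older than
    `now - ua_length` is fixed to the best hypothesis, and hypotheses
    that differ from it in that region are dropped.
    """
    if not hyps:
        return [], []
    boundary = now - ua_length
    best = max(hyps, key=lambda h: h.score)
    committed = stable_prefix(best, boundary)
    survivors = [h for h in hyps if stable_prefix(h, boundary) == committed]
    return committed, survivors


# Toy example: three competing hypotheses at t = 3.0 s with a 1.5 s UA.
hyps = [
    Hypothesis([("good", 0.4), ("morning", 1.0), ("viewers", 2.6)], -10.0),
    Hypothesis([("good", 0.4), ("mourning", 1.0), ("viewers", 2.6)], -12.0),
    Hypothesis([("good", 0.4), ("morning", 1.0), ("fewer", 2.5)], -11.0),
]
committed, survivors = prune_to_fixed_ua(hyps, now=3.0, ua_length=1.5)
print(committed, len(survivors))
```

In this toy run, the words ending before 1.5 s are fixed and can already be edited, while the two surviving hypotheses still differ only inside the UA.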

Keywords

Automated closed captioning; ASR; respeaking technology; real-time editing of ASR results; live broadcasting

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Nickolay Merkin (1)
  • Ivan Medennikov (2, 3)
  • Alexei Romanenko (2)
  • Alexander Zatvornitskiy (1)

  1. Speech Technology Center, Saint-Petersburg, Russia
  2. ITMO University, Saint-Petersburg, Russia
  3. SPb State University, Saint-Petersburg, Russia
