Live TV subtitling through respeaking with remote cutting-edge technology

  • Aleš Pražák
  • Zdeněk Loose
  • Josef V. Psutka
  • Vlasta Radová
  • Josef Psutka


This article presents an original system for live TV subtitling using respeaking and automatic speech recognition. Unlike several commercially available live subtitling solutions, the technology presented in this article comprises a speech recognition system specifically designed for live subtitling, realizing the full potential of state-of-the-art speech technology. The enhancements implemented in our remote live subtitling system architecture are described and accompanied by real-world parameters obtained during several years of deployment at the public service broadcaster in the Czech Republic. This article also presents our four-phase respeaker training system and some new techniques covering the whole life cycle of live subtitles, such as a method for automatic live subtitle retiming and a technique for live subtitle delay elimination. This article can serve as an inspiration for how to approach live subtitling, especially in less widely spoken languages.
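The abstract mentions automatic live subtitle retiming, i.e., moving cues produced with a respeaking delay back toward the original audio once better timing information is available. As a loose illustration of that general idea only (the sketch below is hypothetical and does not reproduce the authors' method, which would rely on aligning recognized text to the audio track), one can model the simplest case of shifting every cue earlier by a measured delay:

```python
# Illustrative sketch of subtitle retiming: shift cues earlier by a
# measured respeaking delay. The Cue class, the retime() helper, and the
# fixed-delay model are assumptions made for this example only.

from dataclasses import dataclass

@dataclass
class Cue:
    start: float  # cue start time in seconds
    end: float    # cue end time in seconds
    text: str

def retime(cues, delay):
    """Shift every cue earlier by `delay` seconds, clamping at zero."""
    out = []
    for c in cues:
        start = max(0.0, c.start - delay)
        end = max(start, c.end - delay)  # keep end >= start after clamping
        out.append(Cue(start, end, c.text))
    return out

# A cue that appeared 4 s late is moved back to its original position.
cues = [Cue(5.2, 8.0, "Hello"), Cue(9.1, 12.4, "world")]
retimed = retime(cues, 4.0)
```

A production retiming method would derive per-word timestamps from forced alignment rather than assume one constant delay, since the respeaking lag varies over a programme.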


Keywords: Live subtitling · Respeaking · Automatic speech recognition



This work was supported by European structural and investment funds (ESIF) (No. CZ.02.1.01/0.0/0.0/17_048/0007267).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. University of West Bohemia, Pilsen, Czech Republic
  2. SpeechTech, s.r.o., Pilsen, Czech Republic
