Efficiency of Speech Alignment for Semi-automated Subtitling in Dutch

  • Patrick Wambacq
  • Kris Demuynck
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6836)


This paper describes the use of speech alignment to aid in the process of subtitling Dutch TV programs. The recognizer aligns the audio stream with an existing transcript, so the goal is not to transcribe but to generate the correct timing of every word. The system performs subtasks such as audio segmentation, transcript preprocessing, alignment and subtitle compression. The result is not perfect, but it is good enough to gain efficiency when a professional subtitler uses it as a starting point to refine and finalize the subtitles. In our tests, considerable average time savings of 47 to 53% are obtained, so that generating subtitles for a 1-hour program takes between 2.5 and 4 hours instead of between 4 and 7 hours. This is all the more important given the increasing pressure from user groups on governments and broadcasters to subtitle 100% of TV programs.
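To illustrate what "aligning the audio stream with an existing transcript" means, here is a minimal sketch of generic forced alignment by dynamic programming (Viterbi-style). It is not the paper's recognizer. It assumes a hypothetical matrix `scores[t][w]` giving the log-likelihood that audio frame `t` belongs to transcript word `w`; a real aligner would derive such scores from acoustic models at the phone or state level. The words must keep their transcript order and each must cover at least one frame. The output is a start and end time per word. The 10 ms frame shift (`FRAME_SHIFT_S`) and the names `align_words`, `to_seconds` and `EXAMPLE_SCORES` are illustrative assumptions.

```python
import math
from typing import List, Tuple

FRAME_SHIFT_S = 0.01  # assumed 10 ms frame shift (hypothetical)

# Hypothetical log-likelihoods: 6 frames x 3 words. Frames 0-1 favour word 0,
# frames 2-4 favour word 1, and frame 5 favours word 2.
EXAMPLE_SCORES = [
    [0.0, -5.0, -5.0],
    [0.0, -5.0, -5.0],
    [-5.0, 0.0, -5.0],
    [-5.0, 0.0, -5.0],
    [-5.0, 0.0, -5.0],
    [-5.0, -5.0, 0.0],
]


def align_words(scores: List[List[float]]) -> List[Tuple[int, int]]:
    """Return (first_frame, last_frame) per word for the best monotonic alignment."""
    num_frames, num_words = len(scores), len(scores[0])
    if num_frames < num_words:
        raise ValueError("need at least one frame per word")
    neg = -math.inf
    best = [[neg] * num_words for _ in range(num_frames)]
    # back[t][w] == 1 means word w starts at frame t; 0 means frame t continues word w.
    back = [[0] * num_words for _ in range(num_frames)]
    best[0][0] = scores[0][0]  # the first frame must belong to the first word
    for t in range(1, num_frames):
        for w in range(num_words):
            stay = best[t - 1][w]
            enter = best[t - 1][w - 1] if w > 0 else neg
            if enter > stay:
                best[t][w], back[t][w] = enter + scores[t][w], 1
            else:
                best[t][w], back[t][w] = stay + scores[t][w], 0
    # Backtrack from the last frame, which must belong to the last word.
    starts, ends = [0] * num_words, [0] * num_words
    w = num_words - 1
    ends[w] = num_frames - 1
    for t in range(num_frames - 1, 0, -1):
        if back[t][w] == 1:
            starts[w] = t
            w -= 1
            ends[w] = t - 1
    return list(zip(starts, ends))


def to_seconds(spans: List[Tuple[int, int]], frame_shift: float = FRAME_SHIFT_S):
    """Convert inclusive frame spans to (start_s, end_s) word timings."""
    return [(start * frame_shift, (end + 1) * frame_shift) for start, end in spans]


if __name__ == "__main__":
    words = ["goeden", "avond", "allemaal"]  # hypothetical transcript
    for word, (start, end) in zip(words, to_seconds(align_words(EXAMPLE_SCORES))):
        print(f"{word}: {start:.2f}-{end:.2f} s")
```

Because the transcript is fixed, the search only decides *where* each word boundary falls, never *which* word is spoken. This is why alignment is a far easier task than full transcription.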







Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Patrick Wambacq (1)
  • Kris Demuynck (1)

  1. ESAT/PSI-Speech, Katholieke Universiteit Leuven, Belgium
