Fast Algorithm for Automatic Alignment of Speech and Imperfect Text Data

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8113)


We introduce a solution to the problem of fast single-pass alignment of speech with imperfect transcripts. The proposed technique is based on constructing a special word network for segmentation. We examine how robustness and segmentation quality depend on the network tuning parameters for different types of errors and different levels of noise in the text. Experiments showed that, with properly selected parameters, the algorithm is robust to noise of any type in the transcripts. The proposed approach has been successfully applied to the task of creating movie subtitles.
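
The abstract does not describe the topology of the word network, so the following is only a minimal sketch of one common way such a network can tolerate transcript errors, not the authors' construction. It assumes a linear chain of transcript words in which every word can be skipped (transcript text that was never spoken) and every node carries a filler self-loop (speech that is missing from the transcript). The names `build_word_network`, `best_path`, `skip_penalty`, `filler_penalty`, `<eps>` and `<filler>` are hypothetical, and a sequence of already recognized words stands in for the acoustic evidence that a real decoder would score.

```python
from collections import defaultdict
from dataclasses import dataclass

EPS = "<eps>"        # hypothetical label: transcript word skipped (not spoken)
FILLER = "<filler>"  # hypothetical label: spoken word absent from the transcript


@dataclass(frozen=True)
class Arc:
    src: int
    dst: int
    label: str
    cost: float


def build_word_network(words, skip_penalty=2.0, filler_penalty=1.5):
    """Linear word chain with skip arcs and filler self-loops (an assumption)."""
    arcs = []
    for i, word in enumerate(words):
        arcs.append(Arc(i, i + 1, word, 0.0))          # word spoken as written
        arcs.append(Arc(i, i + 1, EPS, skip_penalty))  # word skipped
    for node in range(len(words) + 1):
        arcs.append(Arc(node, node, FILLER, filler_penalty))
    return arcs


def best_path(arcs, num_nodes, observed):
    """Single left-to-right dynamic-programming pass over the network.

    Returns (total_cost, labels) of the cheapest path that starts at node 0,
    ends at the last node, and consumes every observed word.
    """
    outgoing = defaultdict(list)
    for arc in arcs:
        outgoing[arc.src].append(arc)

    m = len(observed)
    inf = float("inf")
    # cost[i][j]: cheapest way to reach node i having consumed j observed words
    cost = [[inf] * (m + 1) for _ in range(num_nodes)]
    back = [[None] * (m + 1) for _ in range(num_nodes)]
    cost[0][0] = 0.0

    for i in range(num_nodes):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for arc in outgoing[i]:
                if arc.label == EPS:
                    nj = j                      # consumes no observed word
                elif j < m and arc.label in (FILLER, observed[j]):
                    nj = j + 1                  # consumes one observed word
                else:
                    continue
                new_cost = cost[i][j] + arc.cost
                if new_cost < cost[arc.dst][nj]:
                    cost[arc.dst][nj] = new_cost
                    back[arc.dst][nj] = (i, j, arc.label)

    # Trace the best path back from the final state.
    labels, i, j = [], num_nodes - 1, m
    while back[i][j] is not None:
        i, j, label = back[i][j]
        labels.append(label)
    return cost[num_nodes - 1][m], labels[::-1]


if __name__ == "__main__":
    transcript = "the cat sat on the mat".split()
    spoken = "the cat uh sat the mat".split()
    network = build_word_network(transcript)
    print(best_path(network, len(transcript) + 1, spoken))
```

In this sketch the two penalties play the role of tuning parameters: lowering `skip_penalty` makes the path more willing to drop transcript words, while lowering `filler_penalty` makes it more willing to absorb speech that the transcript does not cover.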


Keywords: speech segmentation, imperfect transcriptions, speech-text alignment, closed caption





Copyright information

© Springer International Publishing Switzerland 2013

Authors and Affiliations

  1. Speech Technology Center, Saint-Petersburg, Russia
