
Fast Algorithm for Automatic Alignment of Speech and Imperfect Text Data

  • Conference paper
Speech and Computer (SPECOM 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8113)



A solution to the problem of fast, single-pass alignment of speech with imperfect transcripts is introduced. The proposed technique is based on constructing a special word network for segmentation. We examine robustness and segmentation quality for different types of errors and different levels of noise in the text, depending on the network tuning parameters. Experiments showed that, with properly selected parameters, the algorithm is robust to all of the examined types of transcript noise. The proposed approach has been successfully applied to the task of creating movie subtitles.
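The abstract does not describe the structure of the word network, so the following is only a minimal sketch of one common way to make a segmentation network tolerant of transcript errors, not the authors' method. It assumes a linear chain of transcript words, an optional skip (epsilon) arc per word to absorb words missing from the speech, and a garbage self-loop at every node to absorb untranscribed speech. The names `build_word_network`, `align`, `skip_penalty`, `garbage_penalty` and `GARBAGE` are hypothetical; the two penalties stand in for the "network tuning" parameters, and a sequence of recognized word tokens replaces real acoustic scores so that the example stays self-contained.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical filler label: a garbage arc accepts any spoken word.
GARBAGE = "<garbage>"


@dataclass(frozen=True)
class Arc:
    src: int
    dst: int
    label: Optional[str]  # None marks an epsilon (skip) arc that consumes no speech
    cost: float


def build_word_network(words: List[str], skip_penalty: float,
                       garbage_penalty: float) -> List[Arc]:
    """Linear chain over the transcript, made tolerant of transcript errors."""
    arcs: List[Arc] = []
    for i, w in enumerate(words):
        arcs.append(Arc(i, i + 1, w, 0.0))              # word spoken as written
        arcs.append(Arc(i, i + 1, None, skip_penalty))  # word absent from speech
    for i in range(len(words) + 1):
        arcs.append(Arc(i, i, GARBAGE, garbage_penalty))  # untranscribed speech
    return arcs


def align(tokens: List[str], words: List[str],
          arcs: List[Arc]) -> Tuple[float, List[Tuple[Optional[str], Optional[str]]]]:
    """Viterbi-style min-cost path through the network that consumes every token."""
    n_nodes = len(words) + 1
    T = len(tokens)
    best = [[math.inf] * n_nodes for _ in range(T + 1)]  # best[t][v]
    back = [[None] * n_nodes for _ in range(T + 1)]      # (prev_t, prev_v, arc)
    best[0][0] = 0.0
    # Skip arcs only go forward (i -> i + 1), so relaxing them in src order suffices.
    eps = sorted((a for a in arcs if a.label is None), key=lambda a: a.src)
    emit = [a for a in arcs if a.label is not None]
    for t in range(T + 1):
        for a in eps:  # epsilon closure at time t
            c = best[t][a.src] + a.cost
            if c < best[t][a.dst]:
                best[t][a.dst], back[t][a.dst] = c, (t, a.src, a)
        if t == T:
            break
        for a in emit:  # consume token t via a matching word arc or a garbage loop
            if a.label == tokens[t] or a.label == GARBAGE:
                c = best[t][a.src] + a.cost
                if c < best[t + 1][a.dst]:
                    best[t + 1][a.dst], back[t + 1][a.dst] = c, (t, a.src, a)
    # Trace back from the final node once all tokens are consumed.
    path = []
    t, v = T, n_nodes - 1
    while back[t][v] is not None:
        pt, pv, a = back[t][v]
        path.append((tokens[pt] if pt != t else None, a.label))
        t, v = pt, pv
    path.reverse()
    return best[T][n_nodes - 1], path


words = ["the", "cat", "sat", "down"]
arcs = build_word_network(words, skip_penalty=2.0, garbage_penalty=1.0)
cost, path = align(["the", "uh", "cat", "down"], words, arcs)
print(cost)  # 3.0
print(path)  # [('the', 'the'), ('uh', '<garbage>'), ('cat', 'cat'), (None, None), ('down', 'down')]
```

In this toy setup, the spoken filler "uh" is absorbed by a garbage loop and the missing word "sat" by a skip arc. Raising `skip_penalty` makes the search less willing to drop transcript words, and raising `garbage_penalty` makes it less willing to treat speech as untranscribed; in this sketch, this trade-off is what the tuning parameters control.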




Copyright information

© 2013 Springer International Publishing Switzerland

About this paper

Cite this paper

Tomashenko, N.A., Khokhlov, Y.Y. (2013). Fast Algorithm for Automatic Alignment of Speech and Imperfect Text Data. In: Železný, M., Habernal, I., Ronzhin, A. (eds) Speech and Computer. SPECOM 2013. Lecture Notes in Computer Science, vol 8113. Springer, Cham.


  • DOI:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-01930-7

  • Online ISBN: 978-3-319-01931-4

  • eBook Packages: Computer Science, Computer Science (R0)
