Fast Algorithm for Automatic Alignment of Speech and Imperfect Text Data
Conference paper
Abstract
This paper introduces a solution to the problem of fast single-pass alignment of speech with imperfect transcripts. The proposed technique is based on constructing a special word network for segmentation. We examine how robustness and segmentation quality vary with the type of error and the level of noise in the text, and with the tuning parameters of the network. Experiments show that, with properly selected parameters, the algorithm is robust to noise of any type in the transcripts. The proposed approach has been successfully applied to the task of creating movie subtitles.
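The abstract does not describe the topology of the word network, so the following is only a minimal sketch of one common way to make an alignment network tolerant of transcript errors, not the authors' construction. It assumes a linear chain of transcript words, extended with penalized skip arcs (for words present in the text but not spoken) and filler self-loops (for speech absent from the text). The names `Arc`, `build_word_network`, `max_skip`, `skip_penalty`, and `filler_penalty` are hypothetical and chosen for illustration.

```python
from typing import List, NamedTuple

EPS = "<eps>"        # hypothetical label: arc that consumes no audio (word skipped)
FILLER = "<filler>"  # hypothetical label: arc that absorbs untranscribed speech


class Arc(NamedTuple):
    src: int
    dst: int
    label: str
    penalty: float


def build_word_network(
    words: List[str],
    max_skip: int = 2,
    skip_penalty: float = 1.0,
    filler_penalty: float = 0.5,
) -> List[Arc]:
    """Build an error-tolerant linear word network (illustrative assumption).

    State i means "the first i transcript words have been accounted for".
    """
    n = len(words)
    arcs: List[Arc] = []

    # Filler self-loop on every state: speech that the transcript omits.
    for state in range(n + 1):
        arcs.append(Arc(state, state, FILLER, filler_penalty))

    for i, word in enumerate(words):
        # Main path: the transcript word is actually spoken.
        arcs.append(Arc(i, i + 1, word, 0.0))
        # Skip arcs: k consecutive transcript words were not spoken.
        for k in range(1, max_skip + 1):
            if i + k <= n:
                arcs.append(Arc(i, i + k, EPS, skip_penalty * k))

    return arcs


if __name__ == "__main__":
    for arc in build_word_network(["fast", "speech", "alignment"]):
        print(arc)
```

In this sketch the penalties play the role of tuning parameters: low penalties let a decoder leave the transcript easily, which suits noisy text, while high penalties keep it close to the transcript, which suits clean text.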
Keywords
speech segmentation, imperfect transcriptions, speech-text alignment, closed caption
Copyright information
© Springer International Publishing Switzerland 2013