Text-to-Speech Alignment for Imperfect Transcriptions
In this paper we propose a method for text-to-speech alignment intended for imperfect text transcriptions. We designed a tool based on automatic speech recognition (ASR), complemented with a special post-processing layer that finds anchor points in the transcription and then aligns the data between these anchor points. Because the system does not depend on the commonly employed keyword spotter, it is less vulnerable to noisy recordings than some other approaches. We also present other features of the system (e.g. preserving the document structure and processing numbers) that allow it to be used in many other specific tasks. The performance is evaluated on a challenging set of recordings containing spontaneous speech with many hesitations, repetitions, etc., as well as on noisy recordings.
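The anchor-point idea can be illustrated with a minimal sketch. It is not the system described in the paper: it assumes the recognizer output is already available as (word, start, end) tuples, it uses Python's standard difflib.SequenceMatcher as a stand-in for the post-processing layer, and the names find_anchors, align and min_run are hypothetical. Runs of words shared by the recognizer output and the transcription serve as anchors; transcription words outside any anchor are left without timestamps, to be handled by a later pass.

```python
from difflib import SequenceMatcher


def find_anchors(asr_words, transcript_words, min_run=2):
    """Return (asr_index, transcript_index, length) for each run of
    at least min_run consecutive words shared by both sequences."""
    matcher = SequenceMatcher(a=asr_words, b=transcript_words, autojunk=False)
    return [
        (m.a, m.b, m.size)
        for m in matcher.get_matching_blocks()
        if m.size >= min_run
    ]


def align(asr, transcript_words, min_run=2):
    """Copy recognizer timestamps onto anchored transcription words.

    asr is a list of (word, start_seconds, end_seconds) tuples
    (an assumed input format). Words outside every anchor keep None.
    """
    asr_words = [word for word, _, _ in asr]
    times = [None] * len(transcript_words)
    for a, b, size in find_anchors(asr_words, transcript_words, min_run):
        for k in range(size):
            times[b + k] = (asr[a + k][1], asr[a + k][2])
    return list(zip(transcript_words, times))


# Illustrative data: the recognizer misheard "fox" as "socks".
asr_output = [
    ("the", 0.0, 0.2), ("quick", 0.2, 0.5), ("brown", 0.5, 0.8),
    ("socks", 0.8, 1.1), ("jumps", 1.1, 1.4), ("over", 1.4, 1.6),
]
transcription = ["the", "quick", "brown", "fox", "jumps", "over"]
aligned = align(asr_output, transcription)
```

In this toy case the words around "fox" are anchored and receive the recognizer's timestamps, while "fox" itself stays unaligned and would have to be placed between its neighbouring anchors by a second step.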
Keywords: unsupervised text-to-speech alignment, inaccurate transcription, automatic speech recognition