Text-to-Speech Alignment for Imperfect Transcriptions

  • Marek Boháč
  • Karel Blavka
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8082)

Abstract

In this paper we propose a method for text-to-speech alignment intended for imperfect text transcriptions. We designed an ASR-based (automatic speech recognition) tool complemented with a special post-processing layer that finds anchor points in the transcription and then aligns the data between these anchor points. Because the system does not depend on the commonly employed keyword spotter, it is less vulnerable to noisy recordings than some other approaches. We also present other features of the system (e.g. preserving the document structure and processing numbers) that allow it to be used in many other specific tasks. The performance is evaluated on a challenging set of recordings containing spontaneous speech with many hesitations, repetitions, etc., as well as on noisy recordings.
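The anchor-point idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the recognizer output is available as a list of (word, start time) pairs, treats runs of at least `min_run` identical consecutive words as anchors, and fills the gaps between anchors by linear interpolation. The function names, the `min_run` threshold, and the interpolation step are all illustrative assumptions.

```python
from difflib import SequenceMatcher


def find_anchors(transcript, asr_words, min_run=3):
    """Return (transcript_index, asr_index) pairs for every word that lies
    in a run of at least `min_run` consecutive matching words.
    The run-length criterion is an assumption made for this sketch."""
    matcher = SequenceMatcher(a=transcript, b=asr_words, autojunk=False)
    anchors = []
    for block in matcher.get_matching_blocks():
        if block.size >= min_run:
            for k in range(block.size):
                anchors.append((block.a + k, block.b + k))
    return anchors


def align(transcript, asr, min_run=3):
    """Assign a start time to each transcript word.

    `asr` is a list of (word, start_time) pairs from a recognizer
    (hypothetical input format). Anchored words take the recognizer's
    time; words between two anchors are linearly interpolated; words
    before the first or after the last anchor stay None."""
    asr_words = [word for word, _ in asr]
    times = [None] * len(transcript)
    for t_idx, a_idx in find_anchors(transcript, asr_words, min_run):
        times[t_idx] = asr[a_idx][1]

    anchored = [i for i, t in enumerate(times) if t is not None]
    for left, right in zip(anchored, anchored[1:]):
        span = right - left
        for i in range(left + 1, right):
            frac = (i - left) / span
            times[i] = times[left] + frac * (times[right] - times[left])
    return list(zip(transcript, times))


# Example: the recognizer misheard two words in the middle.
transcript = "the quick brown fox jumps over the lazy dog".split()
asr = list(zip("the quick brown box jumped over the lazy dog".split(),
               [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]))
result = align(transcript, asr)
```

In this example the misrecognized words fall between two anchor runs, so "fox" and "jumps" receive interpolated times rather than times taken from the wrong recognizer words.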

Keywords

unsupervised text-to-speech alignment; inaccurate transcription; automatic speech recognition

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Marek Boháč (1)
  • Karel Blavka (1)
  1. SpeechLab, Faculty of Mechatronics, Technical University of Liberec, Liberec, Czech Republic
