The transLectures-UPV Toolkit

  • M. A. del-Agua
  • A. Giménez
  • N. Serrano
  • J. Andrés-Ferrer
  • J. Civera
  • A. Sanchis
  • A. Juan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8854)

Abstract

Over the past few years, online multimedia educational repositories have increased in number and popularity. The main aim of the transLectures project is to develop cost-effective solutions for producing accurate transcriptions and translations for large video lecture repositories, such as VideoLectures.NET or the Universitat Politècnica de València’s repository, poliMedia. In this paper, we present the transLectures-UPV toolkit (TLK), which has been specifically designed to meet the requirements of the transLectures project, but can also be used as a conventional ASR toolkit. The main features of the current release include HMM training and decoding with speaker adaptation techniques (fCMLLR). TLK has been tested on the VideoLectures.NET and poliMedia repositories, yielding very competitive results. TLK has been released under the permissive open source Apache License v2.0 and can be directly downloaded from the transLectures website.

Keywords

TLK, ASR toolkit, transLectures, HMM

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • M. A. del-Agua
    • 1
  • A. Giménez
    • 1
  • N. Serrano
    • 1
  • J. Andrés-Ferrer
    • 1
  • J. Civera
    • 1
  • A. Sanchis
    • 1
  • A. Juan
    • 1
  1. MLLP, DSIC, Universitat Politècnica de València (UPV), València, Spain
