Abstract
Online multimedia repositories are growing rapidly. However, language barriers are often difficult to overcome for many of the current and potential users. In this paper we describe a TTS Spanish system and we apply it to the synthesis of transcribed and translated video lectures. A statistical parametric speech synthesis system, in which the acoustic mapping is performed with either HMM-based or DNN-based acoustic models, has been developed. To the best of our knowledge, this is the first time that a DNN-based TTS system has been implemented for the synthesis of Spanish. A comparative objective evaluation between both models has been carried out. Our results show that DNN-based systems can reconstruct speech waveforms more accurately.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Ahocoder, http://aholab.ehu.es/ahocoder
Coursera, http://www.coursera.org
HMM-Based Speech Synthesis System (HTS), http://hts.sp.nitech.ac.jp
Khan Academy, http://www.khanacademy.org
Axelrod, A., He, X., Gao, J.: Domain adaptation via pseudo in-domain data selection. In: Proc. of EMNLP, pp. 355–362 (2011)
Bottou, L.: Stochastic gradient learning in neural networks. In: Proceedings of Neuro-Nîmes 1991. EC2, Nimes, France (1991)
Dahl, G.E., Yu, D., Deng, L., Acero, A.: Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Transactions on Audio, Speech, and Language Processing 20(1), 30–42 (2012)
Erro, D., Sainz, I., Navas, E., Hernaez, I.: Harmonics plus noise model based vocoder for statistical parametric speech synthesis. IEEE Journal of Selected Topics in Signal Processing 8(2), 184–194 (2014)
Fan, Y., Qian, Y., Xie, F., Soong, F.: TTS synthesis with bidirectional LSTM based recurrent neural networks. In: Proc. of Interspeech (submitted 2014)
Hinton, G., Deng, L., Yu, D., Dahl, G.E., Mohamed, A.R., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T.N., et al.: Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine 29(6), 82–97 (2012)
Hunt, A.J., Black, A.W.: Unit selection in a concatenative speech synthesis system using a large speech database. In: Proc. of ICASSP, vol. 1, pp. 373–376 (1996)
King, S.: Measuring a decade of progress in text-to-speech. Loquens 1(1), e006 (2014)
Koehn, P.: Statistical Machine Translation. Cambridge University Press (2010)
Kominek, J., Schultz, T., Black, A.W.: Synthesizer voice quality of new languages calibrated with mean mel cepstral distortion. In: Proc. of SLTU, pp. 63–68 (2008)
Lopez, A.: Statistical machine translation. ACM Computing Surveys 40(3), 8:1–8:49 (2008)
poliMedia: The polimedia video-lecture repository (2007), http://media.upv.es
Sainz, I., Erro, D., Navas, E., Hernáez, I., Sánchez, J., Saratxaga, I.: Aholab speech synthesizer for albayzin 2012 speech synthesis evaluation. In: Proc. of IberSPEECH, pp. 645–652 (2012)
Seide, F., Li, G., Chen, X., Yu, D.: Feature engineering in context-dependent dnn for conversational speech transcription. In: Proc. of ASRU, pp. 24–29 (2011)
Shinoda, K., Watanabe, T.: MDL-based context-dependent subword modeling for speech recognition. Journal of the Acoustical Society of Japan 21(2), 79–86 (2000)
Silvestre-Cerdà, J.A., et al.: Translectures. In: Proc. of IberSPEECH, pp. 345–351 (2012)
TED Ideas worth spreading, http://www.ted.com
The transLectures-UPV Team.: The transLectures-UPV toolkit (TLK), http://translectures.eu/tlk
Toda, T., Black, A.W., Tokuda, K.: Mapping from articulatory movements to vocal tract spectrum with Gaussian mixture model for articulatory speech synthesis. In: Proc. of ISCA Speech Synthesis Workshop (2004)
Tokuda, K., Kobayashi, T., Imai, S.: Speech parameter generation from hmm using dynamic features. In: Proc. of ICASSP, vol. 1, pp. 660–663 (1995)
Tokuda, K., Masuko, T., Miyazaki, N., Kobayashi, T.: Multi-space probability distribution HMM. IEICE Transactions on Information and Systems 85(3), 455–464 (2002)
transLectures: D3.1.2: Second report on massive adaptation, http://www.translectures.eu/wp-content/uploads/2014/01/transLectures-D3.1.2-15Nov2013.pdf
Turró, C., Ferrando, M., Busquets, J., Cañero, A.: Polimedia: a system for successful video e-learning. In: Proc. of EUNIS (2009)
Videolectures.NET: Exchange ideas and share knowledge, http://www.videolectures.net
Wu, Y.J., King, S., Tokuda, K.: Cross-lingual speaker adaptation for HMM-based speech synthesis. In: Proc. of ISCSLP, pp. 1–4 (2008)
Yamagishi, J.: An introduction to HMM-based speech synthesis. Tech. rep. Centre for Speech Technology Research (2006), https://wiki.inf.ed.ac.uk/twiki/pub/CSTR/TrajectoryModelling/HTS-Introduction.pdf
Yoshimura, T., Tokuda, K., Masuko, T., Kobayashi, T., Kitamura, T.: Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis. In: Proc. of Eurospeech, pp. 2347–2350 (1999)
Zen, H., Senior, A.: Deep mixture density networks for acoustic modeling in statistical parametric speech synthesis. In: Proc. of ICASSP, pp. 3872–3876 (2014)
Zen, H., Senior, A., Schuster, M.: Statistical parametric speech synthesis using deep neural networks. In: Proc. of ICASSP, pp. 7962–7966 (2013)
Zen, H., Tokuda, K., Black, A.W.: Statistical parametric speech synthesis. Speech Communication 51(11), 1039–1064 (2009)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Piqueras, S., del-Agua, M.A., Giménez, A., Civera, J., Juan, A. (2014). Statistical Text-to-Speech Synthesis of Spanish Subtitles. In: Navarro Mesa, J.L., et al. Advances in Speech and Language Technologies for Iberian Languages. Lecture Notes in Computer Science(), vol 8854. Springer, Cham. https://doi.org/10.1007/978-3-319-13623-3_5
Download citation
DOI: https://doi.org/10.1007/978-3-319-13623-3_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13622-6
Online ISBN: 978-3-319-13623-3
eBook Packages: Computer ScienceComputer Science (R0)