Hidden Markov Models for Artificial Voice Production and Accent Modification

  • Marvin Coto-Jiménez
  • John Goddard-Close
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10022)


In this paper, we consider the problem of accent modification between Castilian Spanish and Mexican Spanish. This is an interesting application area for tasks such as the automatic dubbing of pictures and videos with different accents. We initially apply statistical parametric speech synthesis to produce two artificial voices, each with the required accent, using Hidden Markov Models (HMM). This type of speech synthesis technique is capable of learning and reproducing certain essential parameters of the voice in question. We then propose a way to adapt these parameters between the two accents. The prosodic differences in the voices are modeled and transformed directly using this adaptation method. In order to produce the voices initially, we use a speech database that was developed by professional actors from Spain and Mexico. The results obtained from subjective and objective tests are promising, and the method is essentially applicable to accent modification between other Spanish accents.
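The abstract does not give the details of the adaptation method, so the following is only a generic illustration of what transforming a prosodic parameter between two voices can look like. It is not the authors' method. It is a minimal sketch of the common mean-and-variance mapping of log-F0 (pitch), and every function name and value in it is an assumption made for the example.

```python
import math


def log_f0_stats(f0_values):
    """Mean and standard deviation of log-F0 over voiced frames (f0 > 0)."""
    logs = [math.log(f) for f in f0_values if f > 0]
    mean = sum(logs) / len(logs)
    var = sum((x - mean) ** 2 for x in logs) / len(logs)
    return mean, math.sqrt(var)


def convert_f0(f0_values, src_stats, tgt_stats):
    """Map a source F0 contour onto the target's log-F0 mean and variance.

    Unvoiced frames (f0 <= 0) are passed through as 0.0.
    Hypothetical helper for illustration; not taken from the paper.
    """
    src_mean, src_std = src_stats
    tgt_mean, tgt_std = tgt_stats
    converted = []
    for f in f0_values:
        if f <= 0:
            converted.append(0.0)
        else:
            # Normalise in the source log-F0 space, then rescale to the target.
            z = (math.log(f) - src_mean) / src_std
            converted.append(math.exp(tgt_mean + z * tgt_std))
    return converted


# Toy contours in Hz (made-up values, 0.0 marks an unvoiced frame).
source_f0 = [0.0, 100.0, 120.0, 140.0]
target_f0 = [0.0, 180.0, 200.0, 220.0]

converted_f0 = convert_f0(
    source_f0, log_f0_stats(source_f0), log_f0_stats(target_f0)
)
```

After conversion, the voiced frames of `converted_f0` have the same log-F0 mean and standard deviation as the target contour, while the shape of the source contour is preserved. An HMM-based system would typically adapt model parameters rather than individual contours, so this should be read as a simplified analogy only.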


HMM, Speech synthesis, Accents, Castilian Spanish, Mexican Spanish



This work was supported by the SEP and CONACyT under the Program SEP-CONACyT, CB-2012-01, No.182432, in Mexico, as well as the University of Costa Rica in Costa Rica. We also want to thank ELRA for supplying the original Emotional speech synthesis database.



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. University of Costa Rica, San José, Costa Rica
  2. Metropolitan Autonomous University, México, Mexico
