A Multilingual to Polyglot Speech Synthesizer for Indian Languages Using a Voice-Converted Polyglot Speech Corpus

Vijayalakshmi, P.; Ramani, B.; Jeeva, M. P. Actlin; Nagarajan, T.

doi:10.1007/s00034-017-0659-6

Published: 18 September 2017

Volume 37, pages 2142–2163, (2018)

Circuits, Systems, and Signal Processing

Abstract

A multilingual synthesizer produces speech that is intelligible to human listeners for any given monolingual or mixed-language text. The need for such a synthesizer arises in a country like India, where multiple languages coexist. In the current work, multilingual synthesizers are developed using the HMM-based speech synthesis technique. For mixed-language text, however, the synthesized speech switches speakers at language-switching points, which is quite annoying to the listener. This happens because the speech data used for training is collected from a different (native) speaker for each language. To overcome speaker switching at language-switching points, a polyglot speech synthesizer is developed using a polyglot speech corpus (all the speech data in a single speaker’s voice). The polyglot speech corpus is obtained using a cross-lingual voice conversion (CLVC) technique. In the current work, a polyglot synthesizer is developed for five languages, namely Tamil, Telugu, Hindi, Malayalam and Indian English. The regional Indian languages considered are acoustically similar to a certain extent, and hence a common phoneset and question set are used to build the synthesizer. Experiments are carried out by developing various bilingual polyglot synthesizers to choose the language (and thereby the speaker) that can serve as the target for the polyglot synthesizer. The performance of the synthesizers is evaluated subjectively, for speaker/language switching using a perceptual test and for quality using the mean opinion score. Speaker identity is evaluated objectively using a GMM-based speaker identification system. Further, the polyglot synthesizer developed using the polyglot speech corpus is compared with an adaptation-based polyglot synthesizer in terms of the quality of the synthesized speech and the amount of data required for adaptation and voice conversion. It is observed that the performance of the polyglot synthesizer developed using the polyglot speech corpus obtained from the CLVC technique is better than or nearly equal to that of the adaptation-based polyglot synthesizer.
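
The abstract mentions an objective check of speaker identity with a GMM-based speaker identification system but gives no details on this page. As a rough illustration only, and not the authors' implementation, the sketch below shows the general idea: each candidate speaker is represented by a diagonal-covariance GMM, and a test utterance is assigned to the speaker whose model gives the highest average frame log-likelihood. The model parameters, the two-dimensional features, and the names `identify_speaker`, `"target"` and `"other"` are all assumptions made for this example; a real system would train the models on acoustic features such as MFCCs.

```python
import math


def diag_gaussian_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_loglik(frame, gmm):
    """Log-likelihood of one frame under a GMM given as (weight, mean, var) tuples."""
    terms = [math.log(w) + diag_gaussian_logpdf(frame, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp trick for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def identify_speaker(frames, speaker_gmms):
    """Return the best-scoring speaker name and the per-speaker average log-likelihoods."""
    scores = {
        name: sum(gmm_loglik(f, gmm) for f in frames) / len(frames)
        for name, gmm in speaker_gmms.items()
    }
    return max(scores, key=scores.get), scores


# Toy, hand-set models (hypothetical values, not trained on any real data).
speaker_gmms = {
    "target": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "other": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [6.0, 6.0], [1.0, 1.0])],
}

# Toy feature frames standing in for a synthesized utterance.
frames = [[0.2, -0.1], [0.9, 1.1], [0.5, 0.4]]

best, scores = identify_speaker(frames, speaker_gmms)
print(best, scores)
```

In an evaluation of this kind, synthesized utterances from the polyglot synthesizer would be scored against all speaker models, and the fraction identified as the target speaker would indicate how well the target identity is preserved.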

Notes

Various synthesizers were developed by varying the amount of training data (1, 2, 3, 4, 5 and 12 h). Speech of reasonable quality is obtained even with 30 min of data. For the current work, 1 h of speech data is considered.

Author information

Authors and Affiliations

SSN College of Engineering, Old Mahabalipuram Road, Chennai, India
P. Vijayalakshmi, B. Ramani, M. P. Actlin Jeeva & T. Nagarajan

Corresponding author

Correspondence to P. Vijayalakshmi.

Additional information

The authors would like to thank the Department of Information Technology, Ministry of Communication and Technology, Government of India, for funding the project, “Development of Text-to-Speech synthesis for Indian Languages Phase II”, Ref. No. 11(7)/2011- HCC(TDIL).

About this article

Cite this article

Vijayalakshmi, P., Ramani, B., Jeeva, M.P.A. et al. A Multilingual to Polyglot Speech Synthesizer for Indian Languages Using a Voice-Converted Polyglot Speech Corpus. Circuits Syst Signal Process 37, 2142–2163 (2018). https://doi.org/10.1007/s00034-017-0659-6

Received: 30 June 2016
Revised: 31 August 2017
Accepted: 01 September 2017
Published: 18 September 2017
Issue Date: May 2018
DOI: https://doi.org/10.1007/s00034-017-0659-6
