DNN-Based Speech Synthesis for Arabic: Modelling and Evaluation

Houidhek, Amal; Colotte, Vincent; Mnasri, Zied; Jouvet, Denis

doi:10.1007/978-3-030-00810-9_2

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11171)

Included in the following conference series:

International Conference on Statistical Language and Speech Processing

Abstract

This paper investigates the use of deep neural networks (DNN) for Arabic speech synthesis. In parametric speech synthesis, whether HMM-based or DNN-based, each speech segment is described by a set of contextual features. These contextual features correspond to linguistic, phonetic and prosodic information that may affect the pronunciation of the segments. Gemination and vowel quantity (short vowel vs. long vowel) are two particular and important phenomena in the Arabic language. Hence, it is worth investigating whether those phenomena must be handled by using specific speech units, or whether their specification in the contextual features is enough. Consequently, four modelling approaches are evaluated by considering geminated consonants (respectively long vowels) either as fully-fledged phoneme units or as the same phoneme as their simple (respectively short) counterparts. Although no significant difference was observed in previous studies relying on HMM-based modelling, this paper examines these modelling variants in the framework of DNN-based speech synthesis. Listening tests are conducted to evaluate the four modelling approaches, and to assess the performance of DNN-based Arabic speech synthesis with respect to the previous HMM-based approach.
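
The four modelling approaches named in the abstract can be pictured as a 2×2 choice: geminated consonants are either separate units or merged with their simple counterparts, and likewise for long vowels. The following is only a minimal sketch of that idea, not the authors' implementation. The `Segment` class, the doubled-letter and `:` naming conventions, and the toy segment sequence are all hypothetical and chosen for illustration; the paper's actual phone set and label format are not given on this page.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Segment:
    base: str                # base phoneme symbol (hypothetical notation)
    geminated: bool = False  # True for a geminated consonant
    long: bool = False       # True for a long vowel


def unit_name(seg: Segment, split_geminates: bool, split_long_vowels: bool) -> str:
    """Name of the speech unit used to model this segment.

    When a phenomenon is "split", the segment gets its own unit name;
    otherwise it shares the unit of its simple/short counterpart.
    """
    if seg.geminated and split_geminates:
        return seg.base * 2      # e.g. "dd" (illustrative convention)
    if seg.long and split_long_vowels:
        return seg.base + ":"    # e.g. "a:" (illustrative convention)
    return seg.base


def context_features(seg: Segment) -> Dict[str, bool]:
    """Contextual features that stay available whatever the unit inventory is."""
    return {"geminated": seg.geminated, "long": seg.long}


def all_variants(segments: List[Segment]) -> Dict[Tuple[bool, bool], List[str]]:
    """Unit sequences under the four (split_geminates, split_long_vowels) settings."""
    return {
        (g, v): [unit_name(s, g, v) for s in segments]
        for g, v in product((False, True), repeat=2)
    }


# Toy sequence (not a claim about any real word): m, short a, geminated d, long a.
TOY = [
    Segment("m"),
    Segment("a"),
    Segment("d", geminated=True),
    Segment("a", long=True),
]

if __name__ == "__main__":
    for (g, v), units in all_variants(TOY).items():
        print(f"split_geminates={g}, split_long_vowels={v}: {units}")
    print([context_features(s) for s in TOY])
```

In every variant the gemination and length flags remain in the contextual features, so the variants differ only in whether the unit inventory itself also encodes the distinction.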

Acknowledgements

This research work was conducted under the PHC-Utique Program in the framework of CMCU (Comité Mixte de Coopération Universitaire) grant No. 15G1405.

Author information

Authors and Affiliations

Electrical Engineering Department, Ecole Nationale d’Ingénieurs de Tunis, University Tunis El Manar, Tunis, Tunisia
Amal Houidhek & Zied Mnasri
Université de Lorraine, CNRS, Inria, LORIA, 54000, Nancy, France
Amal Houidhek, Vincent Colotte & Denis Jouvet

Corresponding author

Correspondence to Amal Houidhek.

Editor information

Editors and Affiliations

University of Mons, Mons, Belgium
Thierry Dutoit
Rovira i Virgili University, Tarragona, Spain
Carlos Martín-Vide
University of Mons, Mons, Belgium
Gueorgui Pironkov

About this paper

Cite this paper

Houidhek, A., Colotte, V., Mnasri, Z., Jouvet, D. (2018). DNN-Based Speech Synthesis for Arabic: Modelling and Evaluation. In: Dutoit, T., Martín-Vide, C., Pironkov, G. (eds) Statistical Language and Speech Processing. SLSP 2018. Lecture Notes in Computer Science(), vol 11171. Springer, Cham. https://doi.org/10.1007/978-3-030-00810-9_2

DOI: https://doi.org/10.1007/978-3-030-00810-9_2
Published: 19 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00809-3
Online ISBN: 978-3-030-00810-9
eBook Packages: Computer Science, Computer Science (R0)