Arabic Text-to-Speech Service with Syrian Dialect

Saleh, Hadi; Mohammad, Ali; Jafar, Kamel; Solieman, Monaf; Ahmad, Bashar; Hasan, Samer

doi:10.1007/978-981-99-2969-6_10


Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 352)

Included in the following conference series:

International KES Conference on Intelligent Decision Technologies


Abstract

This research aims to develop an Arabic text-to-speech (TTS) service for the Syrian dialect, a variety of Arabic spoken in Syria and some neighboring countries, and to make it easily accessible to people with disabilities or difficulty reading Arabic, such as people with visual impairments or learning disabilities. To achieve this goal, we employ two state-of-the-art machine learning (ML) approaches, Tacotron2 and Transformers, both of which have achieved impressive results in various natural language processing tasks, including TTS. We compared the two approaches and evaluated the resulting TTS service using subjective measures. Our results show that both approaches can produce high-quality speech in the Syrian dialect, but Transformers have the advantage of being more efficient and more flexible in handling different languages and accents.



Author information

Authors and Affiliations

HSE University, Moscow, Russia
Hadi Saleh & Ali Mohammad
Russian Technological University - MIREA, Moscow, Russia
Kamel Jafar
Syrian Virtual University - SVU, Damascus, Syria
Kamel Jafar
Tartous University, Tartous, Syria
Monaf Solieman
Al-Andalus University for Medical Sciences, Tartous, Syria
Monaf Solieman
Infostrategic, Dubai, UAE
Bashar Ahmad & Samer Hasan


Corresponding author

Correspondence to Ali Mohammad.

Editor information

Editors and Affiliations

Gdynia Maritime University, Gdynia, Poland
Ireneusz Czarnowski
KES International Research, Shoreham-by-sea, UK
R.J. Howlett
KES International, Selby, UK
Lakhmi C. Jain


About this paper

Cite this paper

Saleh, H., Mohammad, A., Jafar, K., Solieman, M., Ahmad, B., Hasan, S. (2023). Arabic Text-to-Speech Service with Syrian Dialect. In: Czarnowski, I., Howlett, R., Jain, L.C. (eds) Intelligent Decision Technologies. KESIDT 2023. Smart Innovation, Systems and Technologies, vol 352. Springer, Singapore. https://doi.org/10.1007/978-981-99-2969-6_10


DOI: https://doi.org/10.1007/978-981-99-2969-6_10
Published: 30 May 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-2968-9
Online ISBN: 978-981-99-2969-6
eBook Packages: Intelligent Technologies and Robotics (R0)
