Abstract
This research develops an Arabic text-to-speech (TTS) service for the Syrian dialect, a variety of Arabic spoken in Syria and some neighboring countries. The service is intended to be easily accessible to people with disabilities or difficulty reading Arabic, such as people with visual impairments or learning disabilities. To this end, we employ two state-of-the-art machine learning (ML) approaches, Tacotron2 and Transformers, both of which have achieved impressive results in various natural language processing tasks, including TTS. We compare the two approaches and evaluate the resulting TTS service using subjective measures. Our results show that both approaches can produce high-quality speech in the Syrian dialect, but the Transformer-based approach has the advantage of being more efficient and more flexible in handling different languages and accents.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Saleh, H., Mohammad, A., Jafar, K., Solieman, M., Ahmad, B., Hasan, S. (2023). Arabic Text-to-Speech Service with Syrian Dialect. In: Czarnowski, I., Howlett, R., Jain, L.C. (eds) Intelligent Decision Technologies. KESIDT 2023. Smart Innovation, Systems and Technologies, vol 352. Springer, Singapore. https://doi.org/10.1007/978-981-99-2969-6_10
DOI: https://doi.org/10.1007/978-981-99-2969-6_10
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-2968-9
Online ISBN: 978-981-99-2969-6
eBook Packages: Intelligent Technologies and Robotics (R0)