Abstract
Text-to-speech (TTS) synthesis systems for Indian languages have been under development for a few decades, but very little work has targeted the Gujarati language specifically. Synthesized speech still does not sound like natural human speech, and naturalness is a key quality measure in speech synthesis. This paper proposes a method for improving the naturalness of synthesized Gujarati speech using fuzzy logic. The pause (silence) between words is an important feature of speech. The pause is not necessarily the same after each word in a sentence; it depends on the characteristics of the language and on other parameters of the sentence. Within the classic TTS architecture, fuzzy logic is proposed as a new approach to calculating the pause to be applied after each word. The system takes a sentence or paragraph as input, from which the derived variables word Importance, Sentence Size, and Position in Sentence are obtained. The fuzzy logic produces the pause, in seconds, to be applied after each word. The membership values of the derived variables are calculated using a straight-line formula. The developed TTS system is tested on a Gujarati-language SARS-CoV-2 (COVID-19) news dataset, built by collecting news lines from the websites of popular Gujarati news channels. The paper describes the implementation of fuzzy logic to address the problem of naturalness and to achieve a more natural-sounding effect in the generated speech.
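The pause calculation summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract names only the three input variables, the output (a pause in seconds), and the use of a straight-line formula for membership values. Everything else here is an assumption made for illustration, including the function names `line_membership` and `fuzzy_pause`, the input ranges, the four rules, the pause values attached to them, and the weighted-average (zero-order Sugeno style) defuzzification.

```python
def line_membership(x, x0, x1):
    """Straight-line membership: 0 at x0, 1 at x1, clamped to [0, 1].

    Passing x0 > x1 gives a falling line instead of a rising one.
    """
    if x0 == x1:
        return 1.0 if x >= x1 else 0.0
    t = (x - x0) / (x1 - x0)
    return max(0.0, min(1.0, t))


def fuzzy_pause(importance, sentence_size, position):
    """Return an illustrative pause (seconds) to apply after one word.

    Assumed input conventions (not taken from the paper):
      importance    -- 0.0 (unimportant word) to 1.0 (very important word)
      sentence_size -- number of words in the sentence
      position      -- 0.0 (first word) to 1.0 (last word)
    """
    # Fuzzify each input with straight-line membership functions.
    high_imp = line_membership(importance, 0.0, 1.0)
    low_imp = 1.0 - high_imp
    long_sent = line_membership(sentence_size, 5, 20)  # assumed breakpoints
    short_sent = 1.0 - long_sent
    late = line_membership(position, 0.0, 1.0)
    early = 1.0 - late

    # Hypothetical rule base: (firing strength, pause in seconds).
    # min() plays the role of fuzzy AND.
    rules = [
        (min(high_imp, late), 0.40),       # important word near the end
        (min(high_imp, early), 0.25),      # important word near the start
        (min(low_imp, long_sent), 0.15),   # minor word in a long sentence
        (min(low_imp, short_sent), 0.05),  # minor word in a short sentence
    ]

    # Defuzzify with a weighted average of the rule outputs.
    total = sum(strength for strength, _ in rules)
    if total == 0.0:
        return 0.0
    return sum(strength * pause for strength, pause in rules) / total


if __name__ == "__main__":
    # A mid-importance word halfway through a 12-word sentence.
    print(round(fuzzy_pause(0.5, 12, 0.5), 3))
```

With these assumed rules, the output always lies between the smallest and largest rule pause (0.05 s and 0.40 s), so the pause varies smoothly from word to word rather than being a fixed constant.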
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Shah, B., Sajja, S. (2023). Improving Naturalness in Speech Synthesis Using Fuzzy Logic. In: Senjyu, T., So-In, C., Joshi, A. (eds) Smart Trends in Computing and Communications. SMART 2023. Lecture Notes in Networks and Systems, vol 645. Springer, Singapore. https://doi.org/10.1007/978-981-99-0769-4_22
DOI: https://doi.org/10.1007/978-981-99-0769-4_22
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-0768-7
Online ISBN: 978-981-99-0769-4
eBook Packages: Intelligent Technologies and Robotics (R0)