Improving Naturalness in Speech Synthesis Using Fuzzy Logic

  • Conference paper
Smart Trends in Computing and Communications (SMART 2023)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 645)

Abstract

Text-to-speech (TTS) synthesis systems have been developed for Indian languages for a few decades, but very little of that work has targeted the Gujarati language specifically. Synthesized speech often does not sound like natural human speech, which makes naturalness a key quality parameter in speech synthesis. This paper proposes a method for improving the naturalness of synthesized Gujarati speech using fuzzy logic. The pause (silence) between words is an important feature of speech. The pause need not be the same after every word in a sentence; it depends on the characteristics of the language and on other parameters of the sentence. Within the classic TTS architecture, fuzzy logic is proposed as a new approach for calculating the pause to apply after each word. The system takes a sentence or paragraph as input, from which the derived variables Importance (of the word), Sentence Size, and Position in Sentence are obtained. The fuzzy logic outputs the pause, in seconds, to apply after each word. The membership values of the derived variables are calculated using the straight-line formula. The developed TTS system is tested on a SARS-CoV-2 (COVID-19) news dataset in Gujarati, built by collecting news lines from the websites of popular Gujarati-language news channels. The paper describes how fuzzy logic is implemented to address the naturalness problem and to achieve a more natural-sounding effect in the generated speech.
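As a rough illustration of the pause calculation described in the abstract, the following Python sketch maps the three derived variables to a pause in seconds. Only the three input variables, the straight-line membership idea, and the output unit come from the abstract. The function names, the membership breakpoints, the four-rule base, the pause values, and the weighted-average defuzzification are all assumptions made for this example and are not taken from the paper.

```python
def rising(x, a, b):
    """Straight-line membership: 0 at or below a, 1 at or above b, linear between."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def falling(x, a, b):
    """Complement of rising(): 1 at or below a, 0 at or above b."""
    return 1.0 - rising(x, a, b)


def pause_after_word(importance, sentence_size, position):
    """Return a pause in seconds for one word (hypothetical rule base).

    importance    -- word importance in [0, 1] (assumed scale)
    sentence_size -- number of words in the sentence
    position      -- relative position of the word in [0, 1] (assumed scale)
    """
    # Fuzzify the inputs with straight-line membership functions.
    # All breakpoints below are illustrative assumptions.
    imp_high = rising(importance, 0.2, 0.8)
    imp_low = falling(importance, 0.2, 0.8)
    long_sentence = rising(sentence_size, 5, 20)
    short_sentence = falling(sentence_size, 5, 20)
    near_end = rising(position, 0.5, 1.0)
    near_start = falling(position, 0.5, 1.0)

    # Each rule: (firing strength via min as fuzzy AND, assumed pause in seconds).
    rules = [
        (min(imp_high, long_sentence), 0.30),
        (min(imp_high, short_sentence), 0.20),
        (min(imp_low, near_start), 0.05),
        (min(imp_low, near_end), 0.15),
    ]

    # Defuzzify with a weighted average of the rule outputs.
    total = sum(strength for strength, _ in rules)
    if total == 0:
        return 0.0
    return sum(strength * pause for strength, pause in rules) / total


if __name__ == "__main__":
    # An important word at the end of a long sentence versus
    # an unimportant word at the start of a short one.
    print(pause_after_word(1.0, 20, 1.0))
    print(pause_after_word(0.0, 5, 0.0))
```

A weighted average is used here because it keeps the output between the smallest and largest pause in the rule base; a centroid-based (Mamdani-style) defuzzifier would be an equally plausible choice.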

Author information

Corresponding author

Correspondence to B. Gargi Shah.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Shah, B., Sajja, S. (2023). Improving Naturalness in Speech Synthesis Using Fuzzy Logic. In: Senjyu, T., So-In, C., Joshi, A. (eds) Smart Trends in Computing and Communications. SMART 2023. Lecture Notes in Networks and Systems, vol 645. Springer, Singapore. https://doi.org/10.1007/978-981-99-0769-4_22
