An efficient adaptive artificial neural network based text to speech synthesizer for Hindi language

Multimedia Tools and Applications

Abstract

Speech recognition is one of the major research areas in speech processing today. This paper develops a complete pipeline that takes a text file from the user as input and produces speech as output. It proposes a text-to-speech synthesizer for the Hindi language based on Mel-frequency cepstral coefficient (MFCC) features, which are extracted under the proposed production and linguistic constraints for modeling parameters such as intonation, duration, and syllable intensity. The features derived from the MFCCs include phrasing, fundamental frequency, and duration. Neural network models are explored to capture these features using MFCC. The performance of the proposed ALO-ANN is evaluated using objective measures: prediction error (η), standard deviation (σ), and linear correlation coefficient (χ). The proposed ALO-ANN models achieve higher prediction accuracy than the other models compared, such as DNN and ANN.
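
The abstract names three objective measures but does not define them. As a rough illustration only, the sketch below assumes common definitions: prediction error as the mean absolute difference between actual and predicted values, standard deviation as the spread of those absolute errors about their mean, and the linear correlation coefficient as Pearson's coefficient. The function name `objective_measures` and the sample values are hypothetical and are not taken from the paper.

```python
import math


def objective_measures(actual, predicted):
    """Return (prediction_error, std_dev, correlation) for two equal-length sequences.

    Assumed definitions (not confirmed by the paper):
      prediction_error: mean of |actual - predicted|
      std_dev: population standard deviation of the absolute errors
      correlation: Pearson linear correlation between actual and predicted
    """
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two sequences of equal length >= 2")

    n = len(actual)
    abs_errors = [abs(a - p) for a, p in zip(actual, predicted)]
    prediction_error = sum(abs_errors) / n
    std_dev = math.sqrt(sum((e - prediction_error) ** 2 for e in abs_errors) / n)

    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted)) / n
    std_a = math.sqrt(sum((a - mean_a) ** 2 for a in actual) / n)
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted) / n)
    if std_a == 0 or std_p == 0:
        raise ValueError("correlation is undefined for a constant sequence")

    return prediction_error, std_dev, cov / (std_a * std_p)


# Hypothetical syllable durations in milliseconds (illustrative values only).
actual_ms = [100, 120, 140, 160]
predicted_ms = [110, 115, 150, 155]
eta, sigma, chi = objective_measures(actual_ms, predicted_ms)
print(f"prediction error = {eta:.2f}, std dev = {sigma:.2f}, correlation = {chi:.3f}")
```

Under these assumed definitions, a lower prediction error and standard deviation together with a correlation closer to 1 would indicate a better prosody model.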

Author information

Corresponding author

Correspondence to Ruchika Kumari.

About this article

Cite this article

Kumari, R., Dev, A. & Kumar, A. An efficient adaptive artificial neural network based text to speech synthesizer for Hindi language. Multimed Tools Appl 80, 24669–24695 (2021). https://doi.org/10.1007/s11042-021-10771-w

  • DOI: https://doi.org/10.1007/s11042-021-10771-w
