An efficient model for text-to-speech synthesis in Indian languages

International Journal of Speech Technology

Abstract

Speech synthesis is the artificial production of speech; a text-to-speech (TTS) system converts natural language text into a spoken waveform. Many TTS systems are available today for different languages, yet Indian languages still lag behind in high-quality synthesized speech. Although almost all Indian languages share a common phonetic base, no generic model for all official Indian languages is available so far. Existing speech synthesis techniques have also been found less effective for the scripts of Indian languages. Considering the intelligibility of the produced speech and the growing memory requirement of concatenative speech synthesis, in this paper we propose an efficient technique for text-to-speech synthesis in Indian languages. The model uses a pronunciation-rule-based waveform concatenation approach to produce intelligible speech while minimizing the memory requirement. To show the effectiveness of the technique, the initial implementation considers the Odia (formerly Oriya), Bengali and Hindi languages. The model is compared with the existing technique, and the results of our experiments show that our technique outperforms it.
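The abstract does not spell out the pronunciation rules or the unit inventory, so the following is only a minimal sketch of the general idea of rule-based waveform concatenation, not the authors' implementation. It assumes a toy inventory of three Devanagari consonants and two vowel signs, applies the well-known inherent-vowel convention of Indic scripts (a consonant carries "a" unless a vowel sign or virama follows), and uses short sine tones as stand-ins for recorded units. The names `to_units`, `placeholder_wave`, `synthesize`, and `INVENTORY` are hypothetical.

```python
import math

SAMPLE_RATE = 8000   # Hz; arbitrary choice for this sketch
UNIT_SAMPLES = 400   # samples per placeholder unit (50 ms at 8 kHz)

# Hypothetical toy inventory: a few Devanagari consonants and vowel signs.
CONSONANTS = {"\u0915": "k", "\u092e": "m", "\u0932": "l"}  # ka, ma, la
VOWEL_SIGNS = {"\u093e": "aa", "\u093f": "i"}               # dependent vowel signs
VIRAMA = "\u094d"                                           # suppresses the inherent vowel
INHERENT_VOWEL = "a"


def to_units(word):
    """Map script characters to sound units using the inherent-vowel rule."""
    units = []
    i = 0
    while i < len(word):
        ch = word[i]
        if ch not in CONSONANTS:
            raise ValueError(f"character not in toy inventory: {ch!r}")
        units.append(CONSONANTS[ch])
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if nxt in VOWEL_SIGNS:        # consonant + explicit vowel sign
            units.append(VOWEL_SIGNS[nxt])
            i += 2
        elif nxt == VIRAMA:           # bare consonant, no vowel
            i += 2
        else:                         # consonant with inherent "a"
            units.append(INHERENT_VOWEL)
            i += 1
    return units


def placeholder_wave(unit):
    """Stand-in for a recorded unit: a sine tone whose pitch depends on the unit."""
    freq = 200 + 40 * (sum(map(ord, unit)) % 10)
    return [math.sin(2 * math.pi * freq * t / SAMPLE_RATE)
            for t in range(UNIT_SAMPLES)]


# Only basic consonant and vowel units are stored, not every possible syllable.
INVENTORY = {u: placeholder_wave(u) for u in ["k", "m", "l", "a", "aa", "i"]}


def synthesize(word, inventory=INVENTORY):
    """Concatenate the stored waveforms of the units derived from the word."""
    samples = []
    for unit in to_units(word):
        samples.extend(inventory[unit])
    return samples
```

In this sketch, storing waveforms for six basic units is enough to cover every consonant-vowel combination in the toy inventory, which illustrates why deriving units by rule can need less memory than storing one recording per syllable. A real system would also need language-specific rules (for example, vowel deletion in some positions) and smoothing at the joins, neither of which is attempted here.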

Author information

Correspondence to Soumya Priyadarsini Panda.

Panda, S.P., Nayak, A.K. An efficient model for text-to-speech synthesis in Indian languages. Int J Speech Technol 18, 305–315 (2015). https://doi.org/10.1007/s10772-015-9271-y

  • DOI: https://doi.org/10.1007/s10772-015-9271-y
