Prosody Modeling: A Review Report on Indian Language

  • Sudipta Acharya
  • Shyamal Kr. Das Mandal
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8284)

Abstract

This paper presents a detailed study of prosody parameters such as pause, duration, F0 and intensity, and of the different methods used to model them for Indian languages. Speech synthesis systems are now appearing for some of the major Indian languages; however, all of them generate only flat, monotonous speech, which makes sustained listening perceptually difficult. The prosody (intonation and rhythm) of spoken language plays an important role in the intelligibility and naturalness of synthesized speech.
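The abstract links flat, monotonous synthetic speech to missing prosody, and F0 modeling is one of the method families it covers. As a hedged illustration, here is a minimal sketch of the Fujisaki command-response model, a widely used quantitative F0 model in which ln F0(t) = ln Fb + Σ Ap·Gp(t − T0) + Σ Aa·[Ga(t − T1) − Ga(t − T2)]. It is not presented as the method this paper proposes. The class and function names, the command timings, the base frequency and the constants alpha = 3.0/s, beta = 20.0/s and gamma = 0.9 are illustrative assumptions (values of this order are common in the literature).

```python
import math
from dataclasses import dataclass


@dataclass
class PhraseCommand:
    onset: float      # T0: time of the phrase command (s)
    amplitude: float  # Ap: magnitude of the phrase command


@dataclass
class AccentCommand:
    onset: float      # T1: accent command onset (s)
    offset: float     # T2: accent command offset (s)
    amplitude: float  # Aa: amplitude of the accent command


def phrase_response(t: float, alpha: float = 3.0) -> float:
    """Phrase control mechanism: Gp(t) = alpha^2 * t * exp(-alpha * t) for t >= 0, else 0."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t: float, beta: float = 20.0, gamma: float = 0.9) -> float:
    """Accent control mechanism: Ga(t) = min(1 - (1 + beta*t) * exp(-beta*t), gamma) for t >= 0, else 0."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def fujisaki_f0(t, fb, phrases, accents, alpha=3.0, beta=20.0, gamma=0.9):
    """Return F0 in Hz at time t; components are summed in the log-F0 domain."""
    log_f0 = math.log(fb)  # base frequency Fb sets the floor of the contour
    for p in phrases:
        log_f0 += p.amplitude * phrase_response(t - p.onset, alpha)
    for a in accents:
        # Each accent command is a step from T1 to T2 smoothed by the accent filter.
        log_f0 += a.amplitude * (
            accent_response(t - a.onset, beta, gamma)
            - accent_response(t - a.offset, beta, gamma)
        )
    return math.exp(log_f0)


if __name__ == "__main__":
    # Hypothetical commands for one short utterance with a 120 Hz base frequency.
    phrases = [PhraseCommand(onset=-0.2, amplitude=0.5)]
    accents = [AccentCommand(onset=0.3, offset=0.6, amplitude=0.4)]
    for i in range(11):
        t = i * 0.1
        print(f"{t:.1f} s  {fujisaki_f0(t, 120.0, phrases, accents):6.1f} Hz")
```

With no commands the contour stays at the base frequency, which is exactly the flat output the abstract criticizes. A phrase command adds an initial rise followed by gradual declination, and each accent command adds a local rise and fall between its onset and offset.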

Keywords

Text-to-Speech Synthesis (TTS), prosody, F0 modeling, pause modeling, duration modeling

Copyright information

© Springer International Publishing Switzerland 2013

Authors and Affiliations

  • Sudipta Acharya, Indian Institute of Technology Kharagpur, India
  • Shyamal Kr. Das Mandal, Indian Institute of Technology Kharagpur, India
