IC3 2012: Contemporary Computing, pp. 118–129

Data-Driven Phrase Break Prediction for Bengali Text-to-Speech System

  • Krishnendu Ghosh
  • K. Sreenivasa Rao
Part of the Communications in Computer and Information Science book series (CCIS, volume 306)

Abstract

This paper proposes an approach for accurately predicting the locations of phrase breaks in a sentence for a Bengali text-to-speech (TTS) synthesis system. Determining the positions of phrase breaks is one of the most important tasks in generating natural and intelligible speech. To estimate the break locations, a feed-forward neural network (FFNN) based approach is proposed. To acquire prosodic phrase break knowledge, morphological information is analyzed along with widely used positional and structural features. The importance of each feature is demonstrated using a model-dependent feature selection approach. The phrase break prediction model is then implemented with the selected optimal set of morphological, positional and structural features and incorporated into a Bengali TTS system built using the Festival framework [1]. The performance of the proposed FFNN model is compared with that of the widely used Classification and Regression Tree (CART) model for the prediction of breaks and no-breaks. The FFNN model is evaluated objectively on the basis of precision, recall and their harmonic mean, the F-score. The significance of the phrase break module is further analyzed by conducting subjective listening tests.
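As a minimal sketch of the objective evaluation mentioned in the abstract, the following Python snippet computes precision, recall and F-score for the break class from per-juncture labels. The label names `B` (break) and `NB` (no-break), the function name `break_prf` and the toy label sequences are illustrative assumptions and are not taken from the paper.

```python
from typing import Sequence, Tuple


def break_prf(
    reference: Sequence[str],
    predicted: Sequence[str],
    positive: str = "B",
) -> Tuple[float, float, float]:
    """Return (precision, recall, F-score) for the `positive` label.

    `reference` holds the hand-labelled juncture tags and `predicted`
    the model output; both are hypothetical inputs for illustration.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")

    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == positive and p == positive)
    fp = sum(1 for r, p in pairs if r != positive and p == positive)
    fn = sum(1 for r, p in pairs if r == positive and p != positive)

    # Guard against empty denominators (no predicted or no true breaks).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-score is the harmonic mean of precision and recall.
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Toy example: one tag per word juncture in a sentence (made-up data).
reference = ["NB", "NB", "B", "NB", "B", "NB", "NB", "B"]
predicted = ["NB", "B", "B", "NB", "NB", "NB", "NB", "B"]

precision, recall, f_score = break_prf(reference, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} F={f_score:.3f}")
```

The same function can score the no-break class by passing `positive="NB"`, which allows breaks and no-breaks to be reported separately.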

Keywords

Phrase break prediction; morphological, positional and structural features; CART; FFNN


References

  1. Narendra, N.P., Rao, K.S., Ghosh, K., Reddy, V.R., Maity, S.: Development of Syllable-based Text to Speech Synthesis System in Bengali. International Journal of Speech Technology 14(3), 167–181 (2011)
  2. Hirschberg, J.: Pitch accent in context: Predicting intonational prominence from text. Artificial Intelligence (63) (1993)
  3. Fordyce, C.S., Ostendorf, M.: Prosody Prediction for Speech Synthesis Using Transformational Rule Based Learning. In: Proceedings of International Conference of Spoken Language Processing, pp. 682–685 (1998)
  4. Krishna, N.S., Murthy, H.A.: A New Prosodic Phrasing Model for Indian Language Telugu. In: Proceedings of Interspeech, pp. 793–796 (2004)
  5. Sun, X., Applebaum, T.H.: Intonational Phrase Break Prediction Using Decision Tree and N-Gram Model. In: Proceedings of Eurospeech (2001)
  6. Gee, J.P., Grosjean, F.: Performance structures: a psycholinguistic and linguistic appraisal. Cognitive Psychology (15), 411–458 (1983)
  7. Taylor, P., Black, A.W.: Assigning phrase breaks from part-of-speech sequences. Computer Speech and Language (12), 99–117 (1998)
  8. Silverman, K.: The Structure and Processing of Fundamental Frequency Contours. Ph.D. thesis, University of Cambridge (1987)
  9. Hirschberg, J., Prieto, P.: Training intonational phrasing rules automatically for English and Spanish Text-to-Speech. Speech Communication 18, 281–290 (1996)
  10. Breiman, L., Friedman, J., Olshen, R., Stone, C.: Classification and Regression Trees. Chapman and Hall, New York (1984)
  11. Busser, G., Daelemans, W., van den Bosch, A.: Predicting phrase breaks with memory-based learning. In: Proceedings of 4th ISCA Tutorial and Research Workshop on Speech Synthesis, Perthshire, Scotland, pp. 29–34 (2001)
  12. Yegnanarayana, B.: Artificial Neural Networks. Prentice-Hall, New Delhi (1999)
  13. Kishore, S.P., Black, A.W.: Unit size in unit selection speech synthesis. In: Proceedings of Eurospeech, pp. 1317–1320 (2003)
  14. Thomas, H.S., Rao, M.N., Ramalingam, C.: Natural Sounding TTS based on Syllable like Units. In: Proceedings of European Signal Processing Conference, Florence, Italy (2006)
  15. Roach, P.: English Phonetics and Phonology. Cambridge University Press, Cambridge (1991)
  16. Gabrilovich, E., Markovitch, S.: Feature Generation for Text Categorization using World Knowledge. In: IJCAI, pp. 1048–1053 (2005)
  17. Dash, M., Liu, H.: Feature Selection for Classification. In: Intelligent Data Analysis, vol. 1, pp. 131–156 (1997)
  18. Mladenic, D., Grobelnik, M.: Feature Selection for Classification Based on Text Hierarchy. In: Proceedings of Text and the Web, Conference on Automated Learning and Discovery (1998)
  19. Kwak, N., Choi, C.H.: Input Feature Selection for Classification Problems. IEEE Transactions on Neural Networks 13(1), 143–159 (2002)
  20. Leray, P., Gallinari, P.: Feature selection with neural networks. Pattern Recognition Letters Archive 23(11) (September 2002)
  21. Tamura, S., Tateishi, M.: Capabilities of a Four-Layered Feedforward Neural Network: Four Layers Versus Three. IEEE Transactions on Neural Networks 8, 251–255 (1997)
  22. Sontag, E.D.: Feedback stabilization using two hidden layer nets. IEEE Transactions on Neural Networks 3, 981–990 (1992)
  23. Ghosh, K., Reddy, V.R., Rao, K.S.: Phrase Break Prediction for Bengali Text to Speech Synthesis System. In: Proceedings of International Conference of Natural Language Processing, Chennai (2011)
  24. Rao, K.S., Yegnanarayana, B.: Modeling durations of syllables using neural networks. Computer Speech & Language 21(2), 282–285 (2007)
  25. Mitchell, T.M.: Machine Learning, 123 p. McGraw Hill, New York (1997)
  26. Hogg, R.V., Ledolter, J.: Engineering Statistics. Macmillan, New York (1987)
  27. Schmidt, H., Atterer, M.: New statistical methods for phrase break prediction. In: Proceedings of the 20th International Conference on Computational Linguistics (2004)
  28. Pfitzinger, H., Reichel, U.: Text-based and Signal-based Prediction of Break Indices and Pause Durations. In: Proceedings of Speech Prosody, Dresden, pp. 133–136 (2006)

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Krishnendu Ghosh (1)
  • K. Sreenivasa Rao (1)
  1. School of Information Technology, Indian Institute of Technology Kharagpur, Kharagpur, India
