IC3 2012: Contemporary Computing, pp. 118–129
Data-Driven Phrase Break Prediction for Bengali Text-to-Speech System
Abstract
This paper proposes an approach for accurately predicting the locations of phrase breaks in a sentence for a Bengali text-to-speech (TTS) synthesis system. Determining the positions of phrase breaks is one of the most important tasks in generating natural and intelligible speech. To predict the break locations, a feed-forward neural network (FFNN) based approach is used. To acquire prosodic phrase break knowledge, morphological information is analyzed along with widely used positional and structural features. The importance of each feature is demonstrated using a model-dependent feature selection approach. Finally, the phrase break prediction model is implemented with the selected optimal set of morphological, positional and structural features and incorporated into a Bengali TTS system built using the Festival framework [1]. The performance of the proposed FFNN model is compared with that of the widely used Classification and Regression Tree (CART) model for predicting breaks and no-breaks. The FFNN model is evaluated objectively on the basis of precision, recall and their harmonic mean, the F-score. The significance of the phrase break module is further analyzed through subjective listening tests.
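The objective measures named in the abstract can be illustrated with a minimal sketch. It assumes each word juncture carries a gold label and a predicted label, with "B" for break and "NB" for no-break; the function name, the label strings and the toy data are hypothetical and are not taken from the paper, whose exact evaluation protocol is not given here.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    gold: Sequence[str], pred: Sequence[str], positive: str = "B"
) -> Tuple[float, float, float]:
    """Precision, recall and F-score for one class (here: break junctures)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    # Count true positives, false positives and false negatives for `positive`.
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # The F-score is the harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1


# Toy labels (invented for illustration): one label per word juncture.
gold = ["NB", "B", "NB", "NB", "B", "NB", "B", "B"]
pred = ["NB", "B", "B", "NB", "NB", "NB", "B", "B"]

precision, recall, f1 = precision_recall_f1(gold, pred, positive="B")
print(f"precision={precision:.2f} recall={recall:.2f} F={f1:.2f}")
```

Calling the same function with positive="NB" gives the corresponding scores for the no-break class, so the two classes can be reported separately.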
Keywords
Phrase break prediction; morphological, positional and structural features; CART; FFNN
References
- 1. Narendra, N.P., Rao, K.S., Ghosh, K., Reddy, V.R., Maity, S.: Development of Syllable-based Text to Speech Synthesis System in Bengali. International Journal of Speech Technology 14(3), 167–181 (2011)
- 2. Hirschberg, J.: Pitch accent in context: Predicting intonational prominence from text. Artificial Intelligence 63 (1993)
- 3. Fordyce, C.S., Ostendorf, M.: Prosody Prediction for Speech Synthesis Using Transformational Rule Based Learning. In: Proceedings of International Conference of Spoken Language Processing, pp. 682–685 (1998)
- 4. Krishna, N.S., Murthy, H.A.: A New Prosodic Phrasing Model for Indian Language Telugu. In: Proceedings of Interspeech, pp. 793–796 (2004)
- 5. Sun, X., Applebaum, T.H.: Intonational Phrase Break Prediction Using Decision Tree and N-Gram Model. In: Proceedings of Eurospeech (2001)
- 6. Gee, J.P., Grosjean, F.: Performance structures: a psycholinguistic and linguistic appraisal. Cognitive Psychology 15, 411–458 (1983)
- 7. Taylor, P., Black, A.W.: Assigning phrase breaks from part-of-speech sequences. Computer Speech and Language 12, 99–117 (1998)
- 8. Silverman, K.: The Structure and Processing of Fundamental Frequency Contours. Ph.D. thesis, University of Cambridge (1987)
- 9. Hirschberg, J., Prieto, P.: Training intonational phrasing rules automatically for English and Spanish Text-to-Speech. Speech Communication 18, 281–290 (1996)
- 10. Breiman, L., Friedman, J., Olshen, R., Stone, C.: Classification and Regression Trees. Chapman and Hall, New York (1984)
- 11. Busser, G., Daelemans, W., van den Bosch, A.: Predicting phrase breaks with memory-based learning. In: Proceedings of 4th ISCA Tutorial and Research Workshop on Speech Synthesis, Perthshire, Scotland, pp. 29–34 (2001)
- 12. Yegnanarayana, B.: Artificial Neural Networks. Prentice-Hall, New Delhi (1999)
- 13. Kishore, S.P., Black, A.W.: Unit size in unit selection speech synthesis. In: Proceedings of Eurospeech, pp. 1317–1320 (2003)
- 14. Thomas, H.S., Rao, M.N., Ramalingam, C.: Natural Sounding TTS based on Syllable like Units. In: Proceedings of European Signal Processing Conference, Florence, Italy (2006)
- 15. Roach, P.: English Phonetics and Phonology. Cambridge University Press, Cambridge (1991)
- 16. Gabrilovich, E., Markovitch, S.: Feature Generation for Text Categorization using World Knowledge. In: IJCAI, pp. 1048–1053 (2005)
- 17. Dash, M., Liu, H.: Feature Selection for Classification. In: Intelligent Data Analysis, vol. 1, pp. 131–156 (1997)
- 18. Mladenic, D., Grobelnik, M.: Feature Selection for Classification Based on Text Hierarchy. In: Proceedings of Text and the Web, Conference on Automated Learning and Discovery (1998)
- 19. Kwak, N., Choi, C.H.: Input Feature Selection for Classification Problems. IEEE Transactions on Neural Networks 13(1), 143–159 (2002)
- 20. Leray, P., Gallinari, P.: Feature selection with neural networks. Pattern Recognition Letters Archive 23(11) (September 2002)
- 21. Tamura, S., Tateishi, M.: Capabilities of a Four-Layered Feedforward Neural Network: Four Layers Versus Three. IEEE Transactions on Neural Networks 8, 251–255 (1997)
- 22. Sontag, E.D.: Feedback stabilization using two hidden layer nets. IEEE Transactions on Neural Networks 3, 981–990 (1992)
- 23. Ghosh, K., Reddy, V.R., Rao, K.S.: Phrase Break Prediction for Bengali Text to Speech Synthesis System. In: Proceedings of International Conference of Natural Language Processing, Chennai (2011)
- 24. Rao, K.S., Yegnanarayana, B.: Modeling durations of syllables using neural networks. Computer Speech & Language 21(2), 282–285 (2007)
- 25. Mitchell, T.M.: Machine Learning, 123 p. McGraw Hill, New York (1997)
- 26. Hogg, R.V., Ledolter, J.: Engineering Statistics. Macmillan, New York (1987)
- 27. Schmidt, H., Atterer, M.: New statistical methods for phrase break prediction. In: Proceedings of the 20th International Conference on Computational Linguistics (2004)
- 28. Pfitzinger, H., Reichel, U.: Text-based and Signal-based Prediction of Break Indices and Pause Durations. In: Proceedings of Speech Prosody, Dresden, pp. 133–136 (2006)