Prosodic Cues for Automatic Phrase Boundary Detection in ASR
This article presents a cross-lingual study of Hungarian and Finnish on the segmentation of continuous speech at the word and phrase level based on prosodic features. A word-level segmenter has been developed that indicates word boundaries with acceptable accuracy for both languages. The ultimate aim is to increase the robustness of automatic speech recognizers (ASR) by detecting word and phrase boundaries, and thus to significantly reduce the search space during decoding, which is very time-consuming for agglutinative languages such as Hungarian and Finnish. Both are, however, fixed-stress languages, so stress detection allows word beginnings to be marked reliably. An algorithm based on a data-driven (HMM) approach was developed and evaluated. The best results were obtained by using the time series of fundamental frequency and energy together. Syllable length was found to be much less effective and was therefore discarded. Using suprasegmental features, word boundaries can be marked with a high correctness ratio, provided we accept that not all of them will be found. The method is easily adaptable to other fixed-stress languages; to investigate this, we adapted it to Finnish and obtained similar results.
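The trade-off described above, a high correctness ratio at the cost of missing some boundaries, can be made concrete with a small scoring sketch. The code below is not the authors' evaluation procedure. It is a minimal illustration that assumes boundaries are given as time stamps in seconds and that a predicted boundary counts as correct when it lies within a tolerance window of an unmatched reference boundary. The function name `boundary_scores`, the greedy matching, and the default tolerance of 0.1 s are all hypothetical choices.

```python
from typing import Sequence, Tuple


def boundary_scores(
    predicted: Sequence[float],
    reference: Sequence[float],
    tolerance: float = 0.1,  # assumed tolerance window in seconds
) -> Tuple[float, float]:
    """Return (correctness, detection_rate) for predicted word boundaries.

    correctness    = matched predictions / all predictions
    detection_rate = matched predictions / all reference boundaries
    Each reference boundary can be matched by at most one prediction.
    """
    ref = sorted(reference)
    used = [False] * len(ref)
    hits = 0
    for p in sorted(predicted):
        best_index = None
        best_dist = tolerance
        # Pick the closest reference boundary that is still unmatched
        # and lies within the tolerance window.
        for i, r in enumerate(ref):
            if used[i]:
                continue
            dist = abs(p - r)
            if dist <= best_dist:
                best_index, best_dist = i, dist
        if best_index is not None:
            used[best_index] = True
            hits += 1
    correctness = hits / len(predicted) if predicted else 0.0
    detection_rate = hits / len(ref) if ref else 0.0
    return correctness, detection_rate
```

For example, a segmenter that marks only two of four reference boundaries, but marks both accurately, has a correctness of 1.0 and a detection rate of 0.5, which is the kind of operating point the abstract describes.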
Keywords: Continuous Speech, Word Boundary, Automatic Speech Recognition System, Phrase Boundary, Word Unit