Prosodic word boundary detection from Bengali continuous speech

Bhowmik, Tanmay; Das Mandal, Shyamal Kumar

doi:10.1007/s10579-019-09478-0

Original Paper
Published: 13 November 2019

Volume 54, pages 747–765, (2020)

Language Resources and Evaluation

Abstract

Detecting word boundaries in continuous speech is difficult because no definite pause or silence marks the word boundary position, which makes continuous speech recognition a very challenging task. However, prosodic word boundaries, unlike written word boundaries, can be predicted from the prosodic parameters of continuous speech. This paper proposes a method for detecting such prosodic word boundaries in Bengali continuous speech. Bengali is a bound-stress language in which stress falls on the first syllable of a prosodic word. Empirical Mode Decomposition is applied to the logarithm of the fundamental frequency (F₀) contour of continuous speech to detect prosodic word boundaries. For the present work, 200 Bengali readout sentences, read by ten speakers, are analyzed. An overall prosodic boundary detection accuracy of 88.05% is achieved, with precision and recall of 90.73% and 88.31%, respectively, and an F-score of 89.5. A prosodic word dictionary comprising 5031 prosodic words has been developed by analyzing 1526 Bengali sentences with the proposed prosodic word boundary detection method.
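The reported F-score can be checked against the reported precision and recall. The sketch below assumes the F-score is the balanced F1 measure (the harmonic mean of precision and recall), which the abstract does not state explicitly. The function name `f_score` and its `beta` parameter are illustrative and not taken from the paper.

```python
def f_score(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1).

    Assumption: the abstract's "f-score" is the balanced F1 measure.
    """
    if precision <= 0 or recall <= 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Values reported in the abstract, expressed in percent.
reported_precision = 90.73
reported_recall = 88.31

f1 = f_score(reported_precision, reported_recall)
print(f"F1 = {f1:.2f}")  # prints "F1 = 89.50"
```

Under this assumption the computed value agrees with the 89.5 given in the abstract.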

Author information

Authors and Affiliations

Bennett University, Greater Noida, Uttar Pradesh, India
Tanmay Bhowmik
CET, Indian Institute of Technology Kharagpur, Kharagpur, India
Shyamal Kumar Das Mandal

Corresponding author

Correspondence to Tanmay Bhowmik.

About this article

Cite this article

Bhowmik, T., Das Mandal, S.K. Prosodic word boundary detection from Bengali continuous speech. Lang Resources & Evaluation 54, 747–765 (2020). https://doi.org/10.1007/s10579-019-09478-0

Published: 13 November 2019
Issue Date: September 2020
DOI: https://doi.org/10.1007/s10579-019-09478-0
