An Innovative Prosody Modeling Method for Chinese Speech Recognition

Peng, Gang; Wang, William S.-Y.

doi:10.1023/B:IJST.0000017013.70486.51

Published: April 2004

International Journal of Speech Technology, Volume 7, pages 129–140 (2004)

Abstract

This paper presents an innovative method for prosody modeling in Chinese speech recognition. The method first evaluates the reliability of the prosodic information; the recognition system then uses this reliability to dynamically tune the balance between the spectral scores and the prosodic scores. The basic idea is to use prosodic knowledge according to its reliability: the higher the reliability, the more the prosodic information contributes to recognition. In this way, the method incorporates more knowledge into the recognition system without introducing extra errors. Experimental results showed that the method reduced the word error rate by as much as 52.9% and 46.0% (relative) for Mandarin and Cantonese digit string recognition tasks, respectively. When tone information was incorporated into Cantonese Large Vocabulary Continuous Speech Recognition (LVCSR) via the proposed method, a 20.16% relative reduction in character error rate was obtained.
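The abstract describes the score balancing only at a high level. The Python sketch below illustrates one way a reliability index could steer the balance between spectral and prosodic scores when rescoring an N-best list of digit strings. It assumes log-domain scores, a single per-utterance reliability value already normalized to [0, 1], and a linear weight `max_weight * reliability`; the function names `fused_score` and `rescore`, the parameter `max_weight`, and the example scores are all hypothetical and are not the paper's actual formulation.

```python
from typing import List, Tuple

# Each hypothesis: (label, spectral_log_score, prosodic_log_score).
# Labels and score values used in the demo are invented for illustration.
Hypothesis = Tuple[str, float, float]


def fused_score(spectral: float, prosodic: float, reliability: float,
                max_weight: float = 1.0) -> float:
    """Combine log-domain scores, scaling the prosodic term by its reliability.

    Assumption: a linear, reliability-proportional weight. The paper's
    actual weighting function is not given in the abstract.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be in [0, 1]")
    weight = max_weight * reliability  # reliability 0 -> spectral score only
    return spectral + weight * prosodic


def rescore(hyps: List[Hypothesis], reliability: float) -> str:
    """Return the label of the best hypothesis under the fused score."""
    best = max(hyps, key=lambda h: fused_score(h[1], h[2], reliability))
    return best[0]


if __name__ == "__main__":
    nbest = [
        ("1 2 3", -100.0, -20.0),  # slightly worse spectrally, better tone fit
        ("1 2 8", -99.0, -30.0),   # best spectral score, poorer tone fit
    ]
    for r in (0.0, 0.5, 1.0):
        print(r, rescore(nbest, r))
```

With reliability 0 the prosodic term is ignored and the spectrally best string "1 2 8" wins; at 0.5 and 1.0 the tone scores tip the decision to "1 2 3". This mirrors the abstract's idea that unreliable prosody should contribute little, although how the paper actually estimates reliability is not described in this section.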

Author information

Authors and Affiliations

Language Engineering Laboratory, Department of Electronic Engineering, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong, China
Gang Peng & William S.-Y. Wang

About this article

Cite this article

Peng, G., Wang, W.S.-Y. An Innovative Prosody Modeling Method for Chinese Speech Recognition. International Journal of Speech Technology 7, 129–140 (2004). https://doi.org/10.1023/B:IJST.0000017013.70486.51
