Tone Modeling for Continuous Mandarin Speech Recognition

Cao, Yang; Zhang, Shuwu; Huang, Taiyi; Xu, Bo

doi:10.1023/B:IJST.0000017012.11970.6a

Published: April 2004

Volume 7, pages 115–128 (2004)
International Journal of Speech Technology

Yang Cao¹,
Shuwu Zhang²,
Taiyi Huang² &
…
Bo Xu²

120 Accesses
6 Citations

Abstract

Tone modeling is important for Mandarin speech recognition. In this paper, a Mixture Stochastic Polynomial Tone Model (MSPTM) is proposed for tone modeling in continuous Mandarin speech. In this model the pitch contour, the main carrier of the tone pattern, is described as a mixed stochastic trajectory. The mean trajectory is represented by a polynomial function of normalized time, while the variance is time-varying. Effective training and tone recognition algorithms were developed. In experiments, the proposed MSPTM reduced the tone recognition error rate by 40.7% relative to the traditional Hidden Markov Model (HMM) tone model. We also present a decision-tree-based approach to learning tone pattern variation in continuous speech. The phonetic and linguistic factors that may affect tone patterns were taken into consideration while constructing the tree. After the tree was established, 28 different tone patterns were obtained. We found that, in addition to the tone of the neighboring syllable, the consonant/vowel type of the syllable and the position of the syllable in the utterance also made important contributions to tone pattern variation in continuous speech. Finally, a new approach to integrating tone information into the search process at the word level is discussed. Experiments on continuous Mandarin speech recognition showed that the new tone model and the tone information integration method were effective, achieving a 16.2% relative character error rate reduction.
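
As a rough illustration of the trajectory idea described in the abstract, the following Python sketch scores a pitch contour against a polynomial mean over normalized time with a time-varying variance, and then picks the best-scoring tone. It is a sketch under stated assumptions, not the authors' MSPTM: it uses a single Gaussian trajectory per tone rather than a mixture, and the function names, tone labels, coefficients, and variance function are all hypothetical values chosen for the example.

```python
import math


def normalized_times(n):
    """Map n frame indices onto the interval [0, 1]."""
    if n == 1:
        return [0.0]
    return [i / (n - 1) for i in range(n)]


def poly_mean(coeffs, t):
    """Evaluate the mean trajectory sum_k coeffs[k] * t**k at time t."""
    return sum(c * t ** k for k, c in enumerate(coeffs))


def log_likelihood(contour, coeffs, variance_at):
    """Gaussian log-likelihood of a pitch contour under one trajectory.

    Each frame is scored independently against the polynomial mean,
    using the variance returned by variance_at(t) for that frame.
    """
    total = 0.0
    for t, f0 in zip(normalized_times(len(contour)), contour):
        var = variance_at(t)
        diff = f0 - poly_mean(coeffs, t)
        total += -0.5 * (math.log(2 * math.pi * var) + diff * diff / var)
    return total


def classify(contour, models):
    """Return the label of the model with the highest log-likelihood."""
    return max(models, key=lambda name: log_likelihood(contour, *models[name]))


def example_variance(t):
    """Hypothetical variance that grows linearly with normalized time."""
    return 0.05 + 0.05 * t


# Hypothetical toy models on a normalized pitch scale; the labels and
# coefficients are illustrative and do not come from the paper.
models = {
    "level": ([0.5], example_variance),
    "rising": ([0.0, 1.0], example_variance),
    "falling": ([1.0, -1.0], example_variance),
}

rising_label = classify([0.0, 0.25, 0.5, 0.75, 1.0], models)
falling_label = classify([1.0, 0.8, 0.5, 0.2, 0.0], models)
```

Normalizing time to [0, 1] lets contours of different durations be scored against the same polynomial, which is the role normalized time plays in the description above. Extending the sketch to a mixture would mean combining several such trajectories per tone with component weights, a detail the abstract does not specify.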

Author information

Authors and Affiliations

Nokia Research Center, China
Yang Cao
National Laboratory of Pattern Recognition, Chinese Academy of Sciences, China
Shuwu Zhang, Taiyi Huang & Bo Xu

About this article

Cite this article

Cao, Y., Zhang, S., Huang, T. et al. Tone Modeling for Continuous Mandarin Speech Recognition. International Journal of Speech Technology 7, 115–128 (2004). https://doi.org/10.1023/B:IJST.0000017012.11970.6a

Issue Date: April 2004
DOI: https://doi.org/10.1023/B:IJST.0000017012.11970.6a
