
Hierarchical Structure and Word Strength Prediction of Mandarin Prosody

International Journal of Speech Technology

Abstract

We use Stem-ML to build an automatic learning system for Mandarin prosody that allows us to make quantitative measurements of prosodic strength. Stem-ML is a phenomenological model of the muscle dynamics and planning process that controls the tension of the vocal folds. Because Stem-ML describes the interactions between nearby tones or accents, we were able to use a highly constrained model with only one accent template for each lexical tone category and a single prosodic strength per word. The model accurately reproduces the speaker's intonation, capturing 87% of the variance of the fundamental frequency (f0) of the speech. The results reveal strong alternating metrical patterns within words and suggest that the speaker uses word strength to mark a hierarchy of sentence, clause, phrase, and word boundaries.
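The abstract describes the realized f0 contour as a compromise between neighbouring tone templates. In that picture, each word's prosodic strength decides how closely its syllables follow their templates. The Python sketch below illustrates the idea under heavily simplified, hypothetical assumptions and is not the Stem-ML implementation. It makes three modelling choices of its own. Articulatory effort is approximated by squared second differences of the contour. Each syllable's template error is weighted by the square of its word's strength. The function names `realize_contour`, `build_targets`, and `variance_explained` are illustrative only. Because the objective is quadratic, the optimal contour p solves the linear system (λDᵀD + W)p = Wy. Here D is the second-difference operator, W holds the per-frame weights, and y holds the template values. The last function computes the usual fraction-of-variance-explained statistic, 1 - SSE/SST, which is the standard way a figure such as "87% of the variance" is defined; the paper's exact fitting and scoring procedure may differ.

```python
"""Toy soft-template pitch model, loosely inspired by the Stem-ML idea.

Hypothetical simplification for illustration only (not the Stem-ML code):
effort = squared second differences of the contour, and each syllable's
template error is weighted by the square of its word's strength.
"""

from typing import List, Sequence, Tuple


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        acc = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / m[r][r]
    return x


def build_targets(
    syllables: Sequence[Tuple[Sequence[float], float]]
) -> Tuple[List[float], List[float]]:
    """Concatenate per-syllable templates; weight each frame by strength**2.

    Each item is (template values, strength of the word containing the syllable).
    Strengths should be positive so the linear system stays nonsingular.
    """
    targets: List[float] = []
    weights: List[float] = []
    for template, strength in syllables:
        targets.extend(template)
        weights.extend([strength ** 2] * len(template))
    return targets, weights


def realize_contour(
    targets: Sequence[float], weights: Sequence[float], smoothness: float
) -> List[float]:
    """Minimize smoothness * sum(second differences**2) + sum(w_t * (p_t - y_t)**2).

    Setting the gradient to zero gives (smoothness * D^T D + W) p = W y.
    """
    n = len(targets)
    a = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    coeffs = (1.0, -2.0, 1.0)  # second-difference stencil p[t-1] - 2p[t] + p[t+1]
    for t in range(1, n - 1):
        idx = (t - 1, t, t + 1)
        for i, ci in zip(idx, coeffs):
            for j, cj in zip(idx, coeffs):
                a[i][j] += smoothness * ci * cj  # accumulates smoothness * D^T D
    for t in range(n):
        a[t][t] += weights[t]  # template-matching term W
        b[t] += weights[t] * targets[t]
    return solve_linear(a, b)


def variance_explained(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Fraction of the variance of `observed` captured by `predicted` (1 - SSE/SST)."""
    mean = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean) ** 2 for o in observed)
    return 1.0 - sse / sst


if __name__ == "__main__":
    level = [5.0] * 4                   # hypothetical level-tone template
    falling = [5.0, 4.0, 3.0, 2.0]      # hypothetical falling-tone template
    for strength in (0.5, 2.0):
        targets, weights = build_targets([(level, strength), (falling, strength)])
        contour = realize_contour(targets, weights, smoothness=1.0)
        # Toy stand-in: compare against the templates, since no measured f0 is available.
        print(strength, [round(p, 2) for p in contour],
              round(variance_explained(targets, contour), 3))
```

Raising a word's strength pulls its syllables toward their templates at the cost of a less smooth transition. Lowering it lets neighbouring syllables and the effort term dominate. The sketch uses dense Gaussian elimination only to stay dependency-free. The system matrix is pentadiagonal, so a banded solver would be the natural choice for utterances of realistic length.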




About this article

Cite this article

Kochanski, G., Shih, C. & Jing, H. Hierarchical Structure and Word Strength Prediction of Mandarin Prosody. International Journal of Speech Technology 6, 33–43 (2003). https://doi.org/10.1023/A:1021095805490


  • DOI: https://doi.org/10.1023/A:1021095805490
