A Linguistic Interpretation of the Atom Decomposition of Fundamental Frequency Contour for American English

Part of the Lecture Notes in Computer Science book series (LNAI, volume 9811)

Abstract

One of the most recently proposed techniques for modeling the prosody of an utterance is the decomposition of its pitch, duration and/or energy contour into physiologically motivated units called atoms, based on matching pursuit. Since this model is based on the physiology of the production of sentence intonation, it is essentially language independent. However, the intonation of an utterance in a particular language is also shaped by factors of a predominantly linguistic nature. In this research, restricted to American English with prosody annotated using standard ToBI conventions, we show that, under certain mild constraints, the positive and negative atoms identified in the pitch contour coincide very well with the high and low pitch accents and phrase accents of ToBI. By giving a linguistic interpretation of the atom decomposition model, this research enables its practical use in domains such as speech synthesis or cross-lingual prosody transfer.
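To make the idea of signed atoms concrete, the following is a minimal sketch of plain matching pursuit on a synthetic contour. It is not the authors' implementation: the gamma-shaped kernel, its parameters, the single-atom dictionary, and the names `gamma_atom` and `matching_pursuit` are all assumptions chosen for illustration. Each step picks the shift at which the atom correlates most strongly with the residual, and the sign of the resulting amplitude plays the role of atom polarity.

```python
import numpy as np

def gamma_atom(length, k=4, theta=3.0):
    """Unit-norm gamma-shaped kernel (an assumed atom shape, for illustration)."""
    t = np.arange(length, dtype=float)
    g = t ** (k - 1) * np.exp(-t / theta)
    return g / np.linalg.norm(g)

def matching_pursuit(signal, atom, n_atoms):
    """Greedy matching pursuit using shifted copies of a single atom.

    Returns a list of (position, amplitude) pairs and the final residual.
    """
    residual = np.asarray(signal, dtype=float).copy()
    picked = []
    for _ in range(n_atoms):
        # Inner product of the residual with the atom at every valid shift.
        corr = np.correlate(residual, atom, mode="valid")
        pos = int(np.argmax(np.abs(corr)))
        amp = float(corr[pos])  # the sign of amp is the atom polarity
        residual[pos:pos + len(atom)] -= amp * atom
        picked.append((pos, amp))
    return picked, residual

# Synthetic "contour": one positive and one negative atom (hypothetical data).
atom = gamma_atom(40)
contour = np.zeros(200)
contour[20:60] += 2.0 * atom
contour[120:160] -= 1.5 * atom

picked, residual = matching_pursuit(contour, atom, n_atoms=2)
for pos, amp in picked:
    print(pos, "positive" if amp > 0 else "negative", round(amp, 3))
```

On this toy input the stronger positive atom is recovered first and the negative one second. A real pitch contour would need a dictionary of atoms with several widths and some handling of unvoiced regions, both of which are left out here.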

Keywords

  • Atom decomposition
  • Pitch contour
  • ToBI

Notes

  1. Six recordings from the initial set of 910 recordings were excluded because they contained the ERROR tag where a boundary tone was expected.

Acknowledgments

The presented study was supported in part by the Ministry of Education, Science and Technological Development of the Republic of Serbia (grant TR32035), and was carried out within the SCOPES project “SP2: SCOPES Project for Speech Prosody” (No. CRSII2-147611/1), supported by the Swiss National Science Foundation. The authors are grateful to Speech Morphing, Inc. of Campbell, CA, USA, for providing the speech corpus used in the experiments.

Author information

Corresponding author

Correspondence to Milan Sečujski.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Delić, T., Gerazov, B., Popović, B., Sečujski, M. (2016). A Linguistic Interpretation of the Atom Decomposition of Fundamental Frequency Contour for American English. In: Ronzhin, A., Potapova, R., Németh, G. (eds) Speech and Computer. SPECOM 2016. Lecture Notes in Computer Science, vol 9811. Springer, Cham. https://doi.org/10.1007/978-3-319-43958-7_6

  • DOI: https://doi.org/10.1007/978-3-319-43958-7_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-43957-0

  • Online ISBN: 978-3-319-43958-7

  • eBook Packages: Computer Science, Computer Science (R0)