A Linguistic Interpretation of the Atom Decomposition of Fundamental Frequency Contour for American English

  • Tijana Delić
  • Branislav Gerazov
  • Branislav Popović
  • Milan Sečujski
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9811)


One of the most recently proposed techniques for modeling the prosody of an utterance is the decomposition of its pitch, duration and/or energy contour into physiologically motivated units called atoms, based on matching pursuit. Because this model is based on the physiology of the production of sentence intonation, it is essentially language independent. However, the intonation of an utterance in a particular language is also influenced by factors of a predominantly linguistic nature. In this research, restricted to American English with prosody annotated using standard ToBI conventions, we show that, under certain mild constraints, the positive and negative atoms identified in the pitch contour coincide very well with the high and low pitch accents and phrase accents of ToBI. By giving a linguistic interpretation of the atom decomposition model, this research enables its practical use in domains such as speech synthesis or cross-lingual prosody transfer.
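
As an illustration only, the matching-pursuit idea behind the atom decomposition can be sketched in Python. The gamma-shaped kernel, its parameters `k` and `theta`, and the function names `gamma_atom` and `matching_pursuit` are assumptions made for this sketch and are not taken from the paper; a practical system would typically search a dictionary of atoms of several widths rather than a single kernel.

```python
import numpy as np

def gamma_atom(length, k=6, theta=4.0):
    """Unit-norm gamma-shaped kernel (hypothetical shape k and scale theta, in samples)."""
    t = np.arange(length, dtype=float)
    g = t ** (k - 1) * np.exp(-t / theta)
    return g / np.linalg.norm(g)

def matching_pursuit(signal, atom, n_atoms):
    """Greedily decompose `signal` into shifted, scaled copies of one unit-norm `atom`."""
    residual = np.asarray(signal, dtype=float).copy()
    found = []
    for _ in range(n_atoms):
        # Inner product of the residual with the atom at every admissible shift.
        corr = np.correlate(residual, atom, mode="valid")
        # Pick the shift whose projection has the largest magnitude.
        shift = int(np.argmax(np.abs(corr)))
        amp = float(corr[shift])
        # Subtract the selected atom; the sign of amp marks it as positive or negative.
        residual[shift:shift + len(atom)] -= amp * atom
        found.append((shift, amp))
    return found, residual

# Toy contour: one positive and one negative atom at known positions.
atom = gamma_atom(60)
contour = np.zeros(200)
contour[20:80] += 3.0 * atom
contour[100:160] -= 2.0 * atom
atoms, residual = matching_pursuit(contour, atom, n_atoms=2)
```

In this toy setting the sign of each recovered amplitude separates positive from negative atoms, which is the distinction the abstract relates to high and low ToBI accents.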


Keywords: Atom decomposition, Pitch contour, ToBI



The presented study was supported in part by the Ministry of Education, Science and Technological Development of the Republic of Serbia (grant TR32035), and was carried out within the SCOPES project “SP2: SCOPES Project for Speech Prosody” (No. CRSII2-147611/1), supported by the Swiss National Science Foundation. The authors are grateful to the company Speech Morphing, Inc. from Campbell, CA, USA, for providing the speech corpus used in the experiments.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Tijana Delić (1)
  • Branislav Gerazov (2)
  • Branislav Popović (1)
  • Milan Sečujski (1, corresponding author)
  1. Faculty of Technical Sciences, University of Novi Sad, Novi Sad, Serbia
  2. Faculty of Electrical Engineering and Information Technologies, Ss. Cyril and Methodius University, Skopje, Macedonia
