Generation of Pauses Within the z-score Model

  • Plínio Almeida Barbosa
  • Gérard Bailly
Chapter

Abstract

We have previously proposed [BB94] a model for the generation of segmental durations that proceeds in two steps: (1) prediction of the timing of one salient acoustic event per syllable from phonotactic and syntactic information, and (2) application of a repartition model that determines the duration of each individual segment between these events. This chapter focusses on the repartition model and describes how the initial model has been enriched to account for the emergence of pauses as speech rate decreases. It also describes a perceptual evaluation of the whole model. This evaluation shows that, for the same distribution of prediction errors, precise timing of these events is perceptually more relevant than the precise prediction of each individual segmental duration that a segment-based method aims at.
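
To make the repartition step more concrete, the following is a minimal sketch of one common reading of a z-score repartition, not the chapter's actual implementation. It assumes that each segment has a mean and standard deviation of log-duration, and that a single z-score shared by all segments between two events is chosen so that the segment durations sum to the predicted interval. The function name `repartition`, the log-normal assumption, and all numeric values are hypothetical. The abstract does not detail the pause mechanism, so it is not modelled here.

```python
import math

def repartition(target_ms, means, sds, tol=1e-6):
    """Find one z shared by all segments so that their durations sum to target_ms.

    means and sds are per-segment statistics of log-duration (assumed, log ms).
    Each duration is exp(mean + z * sd); the sum grows monotonically with z
    when every sd is positive, so bisection on z is sufficient.
    """
    def total(z):
        return sum(math.exp(m + z * s) for m, s in zip(means, sds))

    lo, hi = -10.0, 10.0  # arbitrary search bounds for z (an assumption)
    if not (total(lo) <= target_ms <= total(hi)):
        raise ValueError("target duration not reachable with |z| <= 10")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if total(mid) < target_ms:
            lo = mid
        else:
            hi = mid
    z = (lo + hi) / 2
    return z, [math.exp(m + z * s) for m, s in zip(means, sds)]

# Hypothetical statistics for three segments (illustrative values only).
means = [math.log(60.0), math.log(90.0), math.log(70.0)]
sds = [0.30, 0.40, 0.25]

# Stretch the three segments to fill a 260 ms interval between two events.
z, durations = repartition(260.0, means, sds)
```

In this sketch a slower speech rate corresponds to a longer target interval and therefore a larger shared z, with segments of larger standard deviation absorbing more of the lengthening.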

Keywords

Coherence Tempo Acoustics Mandel Santen 

References

  1. [All75]
    G. Allen. Speech rhythm: its relation to performance universals and articulatory timing. J. Phonetics 3:75–86, 1975.
  2. [AS92]
    M. Abe and H. Sato. Two-stage F0 control model using syllable based F0 units. In Proceedings, IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 2, 53–56, 1992.
  3. [Aub92]
    V. Aubergé. Developing a structured lexicon for synthesis of prosody. In Talking Machines: Theories, Models and Designs, G. Bailly and C. Benoit, eds. Elsevier B.V., North-Holland, Amsterdam, 307–321, 1992.
  4. [Bai89]
    G. Bailly. Integration of rhythmic and syntactic constraints in a model of generation of French prosody. Speech Comm. 8:137–146, 1989.
  5. [BAL91]
    V. Berthier, C. Abry, and T. Lallouache. Coordination du geste et de la parole dans la production d’un instrument traditionnel. In Proceedings, Twelfth International Congress of Phonetic Sciences, vol. 4, Aix-en-Provence, France, 34–37, 1991.
  6. [BB92]
    P. Barbosa and G. Bailly. Generating segmental duration by p-centres. In Fourth Rhythm Workshop: Rhythm Perception and Production, C. Auxiette, C. Drake, and C. Gérard, eds. Ville de Bourges, Bourges, France, 163–168, 1992.
  7. [BB94]
    P. Barbosa and G. Bailly. Characterisation of rhythmic patterns for text-to-speech synthesis. Speech Comm. 15:127–137, 1994.
  8. [BBW92]
    G. Bailly, T. Barbe, and H. Wang. Automatic labelling of large prosodic databases: tools, methodology and links with a text-to-speech system. In Talking Machines: Theories, Models and Designs, G. Bailly and C. Benoit, eds. Elsevier B.V., North-Holland, Amsterdam, 323–333, 1992.
  9. [BMA89]
    G. Bailly, P. F. Marteau, and C. Abry. A new algorithm for temporal decomposition of speech, application to a numerical model of coarticulation. In Proceedings, IEEE International Conference on Acoustics, Speech and Signal Processing, 508–511, 1989.
  10. [BS87]
    K. Bartkova and C. Sorin. A model of segmental duration for speech synthesis in French. Speech Comm. 6:245–260, 1987.
  11. [Cam92]
    W. Campbell. Multi-level Timing in Speech. Ph.D. thesis, University of Sussex, Sussex, UK, 1992.
  12. [CM90]
    F. Charpentier and E. Moulines. Pitch-synchronous waveform processing techniques for text-to-speech using diphones. Speech Comm. 9(5-6):453–467, 1990.
  13. [CN88]
    A. Cutler and D. Norris. The role of strong syllables in segmentation for lexical access. J. Experimental Psychology: Human Perception and Performance 14:113–121, 1988.
  14. [Cou91]
    E. Couper-Kuhlen. A rhythm-based metric for turn-taking. In Proceedings, Twelfth International Congress of Phonetic Sciences, vol. 1, Aix-en-Provence, France, 275–278, 1991.
  15. [Dau83]
    R. M. Dauer. Stress-timing and syllable-timing re-analyzed. J. Phonetics 11:51–62, 1983.
  16. [Fan91]
    G. Fant. Units of temporal organization. Stress groups versus syllables and words. In Proceedings, Twelfth International Congress of Phonetic Sciences, vol. 1, Aix-en-Provence, France, 247–250, 1991.
  17. [Fra74]
    P. Fraisse. La psychologie du rythme. Presses Universitaires de France, Paris, 1974.
  18. [Fra80]
    P. Fraisse. Des synchronisations sensori-motrices aux rythmes. In Anticipation et Comportement, J. Requin, ed. Editions du CNRS, Paris, 233–257, 1980.
  19. [GA94]
    C. Gérard and C. Auxiette. The processing of musical prosody by musical and nonmusical children. Music Perception 9:471–503, 1992.
  20. [GG83]
    J. P. Gee and F. Grosjean. Performance structures: A psycholinguistic and linguistic appraisal. Cognitive Psychology 15:418–458, 1983.
  21. [GJM94]
    L. A. Gerken, P. W. Jusczyk, and D. R. Mandel. When prosody fails to cue syntactic structure: 9-month-olds’ sensitivity to phonological versus syntactic phrases. Cognition 20:237–265, 1994.
  22. [HM87]
    D. Hary and G. P. Moore. Synchronizing human movement with an external clock source. Biological Cybernetics 56:305–311, 1987.
  23. [Jor89]
    M. I. Jordan. Serial order: A parallel, distributed processing approach. In Advances in Connectionist Theory: Speech, J. L. Elman and D. E. Rumelhart, eds. Lawrence Erlbaum, Hillsdale, NJ, 1989.
  24. [Kla82]
    D. H. Klatt. The KLATTalk text-to-speech conversion system. In Proceedings, IEEE International Conference on Acoustics, Speech and Signal Processing, Paris, France, 1589–1592, 1982.
  25. [Koh86]
    K. J. Kohler. Invariability and variability in speech timing: from utterance to segment in German. In Invariance and Variability in Speech Processes, J. Perkell and D. H. Klatt, eds. Lawrence Erlbaum, Hillsdale, NJ, 268–298, 1986.
  26. [Kon91]
    G. Konopczynski. Acquisition de la proéminence dans le langage émergent. In Proceedings, International Congress of Phonetic Sciences, vol. 1, Aix-en-Provence, France, 333–337, 1991.
  27. [Lea74]
    W. A. Lea. Prosodic Aids to Speech Recognition: IV. A General Strategy for Prosodically-guided Speech Understanding. Univac Report PX10791, Sperry Univac, DSD, St. Paul, MN, 1974.
  28. [Leh77]
    I. Lehiste. Isochrony reconsidered. J. Phonetics 5:253–263, 1977.
  29. [Lli89]
    R. R. Llinás. The role of the intrinsic electrophysiological properties of central neurons in oscillation and resonance. In Cell to Cell Signalling: From Experiments to Theoretical Models, A. Goldbeter, ed. Academic Press, New York, 3–16, 1989.
  30. [MAB95]
    Y. Morlec, V. Aubergé, and G. Bailly. Evaluation of automatic generation of prosody with a superposition model. In Proceedings, International Congress of Phonetic Sciences, Stockholm, Sweden, 1995.
  31. [Mar75]
    S. M. Marcus. Perceptual centres. Unpublished fellowship dissertation, King’s College, Cambridge, UK, 1975.
  32. [Mar76]
    S. M. Marcus. Perceptual centres. Ph.D. thesis, Cambridge University, Cambridge, 1976.
  33. [Mar81]
    S. M. Marcus. Acoustic determinants of perceptual center (p-center) location. Perception and Psychophysics 30(3):247–256, 1981.
  34. [MG93]
    P. Monnin and F. Grosjean. Les structures de performance en français: Caractérisation et prédiction. L’Année Psychologique 93:9–30, 1993.
  35. [MMF76]
    J. Morton, S. Marcus, and C. Frankish. Perceptual centers (p-centers). Psychological Review 83(5):405–408, 1976.
  36. [Noo91]
    S. G. Nooteboom. Some observations on the temporal organisation and rhythm of speech. In Proceedings, International Congress of Phonetic Sciences, vol. 1, Aix-en-Provence, France, 228–237, 1991.
  37. [OSh81]
    D. O’Shaughnessy. A study of French vowel and consonant durations. J. Phonetics 9:385–406, 1981.
  38. [Pas92]
    V. Pasdeloup. Durée inter-syllabique dans le groupe accentuel en français. In XIXe Journées d’Etudes sur la Parole, 531–536, 1992.
  39. [PMK93]
    V. Pasdeloup, J. Morais, and R. Kolinsky. Are stress and phonemic string processed separately? Evidence from speech illusions. In Proceedings of the European Conference on Speech Communication and Technology, vol. 2, Berlin, 775–778, 1993.
  40. [Pom89]
    B. Pompino-Marschall. On the psychoacoustic nature of the p-center phenomenon. J. Phonetics 17:175–192, 1989.
  41. [Pri92]
    W. Prinz. Distal focussing in action control. In Fourth Rhythm Workshop: Rhythm Perception and Production, C. Auxiette, C. Drake, and C. Gérard, eds. Bourges, 65–71, 1992.
  42. [Sco93]
    S. Scott. Perceptual Centres in Speech: An Acoustic Analysis. Ph.D. thesis, University College, London, 1993.
  43. [SGE80]
    V. N. Sorokin, T. Gay, and W. Ewan. Some biomechanical correlates of the jaw movements. J. Acoust. Soc. Amer. 68:S32, 1980.
  44. [Sha82]
    L. H. Shaffer. Rhythm and timing in skill. Psychological Review 89:109–122, 1982.
  45. [Smi78]
    B. L. Smith. Temporal aspects of English speech production: A developmental perspective. J. Phonetics 6:37–67, 1978.
  46. [SSV92]
    A. Semjen, H. H. Schulze, and D. Vorberg. Temporal control in the coordination between repetitive tapping and periodic external stimuli. In Fourth Rhythm Workshop: Rhythm Perception and Production, C. Auxiette, C. Drake, and C. Gérard, eds. Bourges, 73–78, 1992.
  47. [Tra92]
    C. Traber. F0 generation with a database of natural F0 patterns and with a neural network. In Talking Machines: Theories, Models and Designs, G. Bailly and C. Benoit, eds. Elsevier B.V., North-Holland, Amsterdam, 287–304, 1992.
  48. [TSR90]
    M. Turvey, R. Schmidt, and L. Rosenblum. Clock and motor components in absolute coordination of rhythmic movements. Haskins Laboratories Status Report on Speech Research, New Haven, CT, 231–242, 1990.
  49. [van94]
    J. P. H. van Santen. Assignment of segmental duration in text-to-speech synthesis. Computer Speech and Language 8:95–128, 1994.
  50. [WH94]
    B. Williams and S. M. Hiller. The question of randomness in English foot timing: A control experiment. J. Phonetics 22:423–439, 1994.
  51. [Wit77]
    I. H. Witten. A flexible scheme for assigning timing and pitch to synthetic speech. Language and Speech 20:240–260, 1977.
  52. [Woo51]
    H. Woodrow. Time perception. In Handbook of Experimental Psychology, S. Stevens, ed. Wiley, New York, 1224–1236, 1951.
  53. [WSOP92]
    C. Wightman, S. Shattuck-Hufnagel, M. Ostendorf, and P. Price. Segmental durations in the vicinity of prosodic boundaries. J. Acoust. Soc. Amer. 91(3):1707–1717, 1992.

Copyright information

© Springer Science+Business Media New York 1997