Progress in Speech Synthesis, pp. 109–121
A Model of Timing for Nonsegmental Phonological Structure
Abstract
The problem of timing in speech synthesis is usually construed as a search for suitable algorithms that alter the durations of speech units under various conditions (e.g., stressed versus unstressed syllables, final versus non-final position, the nature of surrounding segments). This chapter proposes a model of phonological representation and phonetic interpretation that is based on Firthian prosodic analysis [Fir57] and instantiated in the YorkTalk speech generation system. In this model, timing is treated as part of phonetic interpretation rather than as an integral part of phonological representation. This leads us to explore the possibility that speech rhythm is the product of relationships between abstract constituents of linguistic structure, among which no single unit is distinguished as the optimal unit of timing.
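The conventional construal that the abstract sets aside can be illustrated with a minimal Python sketch of a segment-based duration rule. It is modeled loosely on the multiplicative duration rules associated with Klatt's text-to-speech work [Kla87], whose general form is DUR = (INHDUR - MINDUR) * PRCNT / 100 + MINDUR: each applicable condition scales a percentage, and only the part of a unit's inherent duration above its minimum is stretched or compressed. The names `Segment`, `rule_percents` and `klatt_duration`, the two conditions, and all numeric values are hypothetical placeholders for illustration; they are not taken from this chapter, from Klatt's actual rule set, or from YorkTalk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A speech unit with illustrative (hypothetical) durations in ms."""
    label: str
    inherent_ms: float  # INHDUR: duration in a neutral context
    minimum_ms: float   # MINDUR: floor that the rules cannot compress below


def rule_percents(stressed: bool, phrase_final: bool) -> list[float]:
    """Map contextual conditions to percentage adjustments.

    Hypothetical rules with placeholder values, for illustration only.
    """
    percents: list[float] = []
    if not stressed:
        percents.append(70.0)   # compress unstressed units
    if phrase_final:
        percents.append(140.0)  # lengthen units before a phrase boundary
    return percents


def klatt_duration(seg: Segment, percents: list[float]) -> float:
    """DUR = (INHDUR - MINDUR) * PRCNT / 100 + MINDUR.

    Applicable rule percentages are combined multiplicatively into PRCNT,
    which scales only the compressible part of the inherent duration.
    """
    prcnt = 100.0
    for p in percents:
        prcnt = prcnt * p / 100.0
    return (seg.inherent_ms - seg.minimum_ms) * prcnt / 100.0 + seg.minimum_ms


if __name__ == "__main__":
    vowel = Segment("ae", inherent_ms=230.0, minimum_ms=80.0)
    for stressed in (True, False):
        for final in (False, True):
            dur = klatt_duration(vowel, rule_percents(stressed, final))
            print(f"stressed={stressed}, final={final}: {dur:.1f} ms")
```

In this scheme a duration is a property of an individual segment, adjusted condition by condition. In the model proposed in this chapter, by contrast, timing belongs to the phonetic interpretation of the structure as a whole rather than to units of the phonological representation.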
Keywords
Phonological Representation; Speech Synthesis; Acoustical Society of America; Natural Speech; Synthetic Speech
References
- [Abe64] D. Abercrombie. Syllable quantity and enclitics in English. In Honour of Daniel Jones, D. Abercrombie, D. B. Fry, P. A. D. MacCarthy, N. C. Scott and J. L. Trim, eds. Longman Green, London, 216–222, 1964.
- [BG89] C. P. Browman and L. M. Goldstein. Towards an articulatory phonology. Phonology Yearbook 3:219–252, 1989.
- [CI91] W. N. Campbell and S. D. Isard. Segment durations in a syllable frame. Journal of Phonetics 19:37–47, 1991.
- [Car57] J. C. Carnochan. Gemination in Hausa. In Studies in Linguistic Analysis, Special Volume of the Philological Society, 2nd edition, 49–81, 1957.
- [CH68] N. Chomsky and M. Halle. The Sound Pattern of English. Harper & Row, New York, 1968.
- [CUB73] C. H. Coker, N. Umeda, and C. P. Browman. Automatic synthesis from ordinary English text. IEEE Transactions on Audio and Electroacoustics AU-21(3):293–298, 1973.
- [Col92] J. C. Coleman. The phonetic interpretation of headed phonological structures containing overlapping constituents. Phonology Yearbook 9(1):1–44, 1992.
- [Col93] J. C. Coleman. Polysyllabic words in the YorkTalk synthesis system. In Papers in Laboratory Phonology III, P. Keating, ed. Cambridge University Press, 293–324, 1993.
- [Fir57] J. R. Firth. A synopsis of linguistic theory. In Studies in Linguistic Analysis, Special Volume of the Philological Society, 2nd edition, 1–32, 1957.
- [Fow80] C. A. Fowler. Coarticulation and theories of extrinsic timing. Journal of Phonetics 8:113–133, 1980.
- [Fow81] C. A. Fowler. A relationship between coarticulation and compensatory shortening. Phonetica 38:35–50, 1981.
- [Fow83] C. A. Fowler. Converging sources of evidence for spoken and perceived rhythms of speech: Cyclic production of vowels in sequences of monosyllabic stress feet. Journal of Experimental Psychology: General 112:386–412, 1983.
- [Hen49] E. J. A. Henderson. Prosodies in Siamese. Asia Major 1:198–215, 1949.
- [Hen52] E. J. A. Henderson. The phonology of loanwords in some South-East Asian languages. Transactions of the Philological Society, 131–158, 1952.
- [Kel89] J. Kelly. Swahili phonological structure: A prosodic view. In Le Swahili et ses Limites, M. F. Rombi, ed. Editions Recherche sur les Civilisations, Paris, 25–31, 1989.
- [Kel92] J. Kelly. Systems for open syllabics in North Welsh. In Studies in Systemic Phonology, P. Tench, ed. Pinter Publishers, London and New York, 87–97, 1992.
- [Ken94] M. Kenstowicz. Phonology in Generative Grammar. Basil Blackwell, Oxford, 1994.
- [Kla] D. H. Klatt. Klattalk: The conversion of English text to speech. Unpublished manuscript, Massachusetts Institute of Technology, Cambridge, MA.
- [Kla87] D. H. Klatt. Review of text-to-speech conversion for English. Journal of the Acoustical Society of America 82(3):737–793, 1987.
- [LR73] B. Lindblom and K. Rapp. Some temporal regularities of spoken Swedish. Papers in Linguistics from the University of Stockholm 21:1–59, 1973.
- [Loc90] J. K. Local. Some rhythm, resonance and quality variations in urban Tyneside speech. In Studies in the Pronunciation of English: A Commemorative Volume in Honour of A. C. Gimson, S. Ramsaran, ed. Routledge, London, 286–292, 1990.
- [Loc92] J. K. Local. Modelling assimilation in a non-segmental rule-free phonology. In Papers in Laboratory Phonology II, G. J. Docherty and D. R. Ladd, eds. Cambridge University Press, Cambridge, 190–223, 1992.
- [LO94] J. K. Local and R. A. Ogden. Temporal exponents of word-structure in English. York Research Papers in Linguistics, YLLS/RP, 1994.
- [Man92] S. Y. Manuel, S. Shattuck-Hufnagel, M. Huffman, K. N. Stevens, R. Carlson, and S. Hunnicutt. Studies of vowel and consonant reduction. In Proceedings of ICSLP 2:943–946, 1992.
- [Ogd92] R. A. Ogden. Parametric interpretation in YorkTalk. York Papers in Linguistics 16:81–99, 1992.
- [Ogd93] R. A. Ogden. European Patent Application 93307872.7 (YorkTalk), 1993.
- [Par84] B. H. Partee. Compositionality. In Varieties of Formal Semantics, F. Landman and F. Veltman, eds. Foris, Dordrecht, 281–312, 1984.
- [Ril92] M. D. Riley. Tree-based modeling for speech synthesis. In Talking Machines: Theories, Models, and Designs, G. Bailly and C. Benoit, eds. Elsevier, North-Holland, Amsterdam, 265–273, 1992.
- [Sim92] A. Simpson. The phonologies of the English auxiliary system. In Who Climbs the Grammar Tree? R. Tracy, ed. Niemeyer, Tuebingen, 209–219, 1992.
- [Smi93] C. L. Smith. Prosodic patterns in the coordination of vowel and consonant gestures. Paper given at the Fourth Laboratory Phonology Meeting, Oxford, August 1993.
- [Spr66] R. K. Sprigg. Vowel harmony in Lhasa Tibetan: Prosodic analysis applied to interrelated vocalic features of successive syllables. Bulletin of the School of Oriental and African Studies 24:116–138, 1966.
- [van92] J. P. H. van Santen. Deriving text-to-speech durations from natural speech. In Talking Machines: Theories, Models, and Designs, G. Bailly and C. Benoit, eds. Elsevier, North-Holland, Amsterdam, 275–285, 1992.
- [van94] J. P. H. van Santen. Assignment of segmental duration in text-to-speech synthesis. Computer Speech & Language 8:95–128, 1994.
- [VCR92] J. P. H. van Santen, J. Coleman, and M. Randolph. Effects of postvocalic voicing on the time course of vowels and diphthongs. Journal of the Acoustical Society of America 92:2444, 1992.
- [Whe81] D. Wheeler. Aspects of a Categorial Theory of Phonology. Graduate Linguistics Student Association, University of Massachusetts at Amherst, 1981.
- [Wii91] K. Wiik. On a third type of speech rhythm: Foot timing. In Proceedings of the Twelfth International Congress of Phonetic Sciences, Aix-en-Provence, 3:298–301, 1991.