Abstract
This paper presents a constraint-based model for the interpretation of multilinear representations of speech utterances, which can provide fine-grained information for speech recognition applications. The model uses explicit structural constraints specifying time maps (overlap and precedence relations between features in both the phonological and the phonetic domains) in order to recognize well-formed syllable structures. In the phonological domain, these constraints together form a complete phonotactic description of the language; in the phonetic domain, they define the internal structure of phonological features based on their phonetic realisations. A constraint relaxation procedure caters for underspecified input and allows output representations to be extrapolated from the phonetic and phonological information contained in the constraints and from the rankings assigned to them. The approach thus integrates explicit phonetic knowledge into a computational linguistic model to improve robustness in speech recognition.
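The overlap and precedence relations and the ranked relaxation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Event`, `Constraint` and `interpret`, the feature labels, the timings, the integer ranks and the threshold-based acceptance rule are all assumptions made for illustration. Features are treated as time intervals, each constraint requires an overlap or precedence relation between two features, and an underspecified input is still accepted when the ranks of the satisfied constraints reach a chosen share of the total.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A feature observed over a time interval (seconds); hypothetical type."""
    feature: str
    start: float
    end: float


def overlaps(a: Event, b: Event) -> bool:
    # The two intervals share some stretch of time.
    return a.start < b.end and b.start < a.end


def precedes(a: Event, b: Event) -> bool:
    # The first interval ends no later than the second begins.
    return a.end <= b.start


RELATIONS = {"overlap": overlaps, "precedence": precedes}


@dataclass(frozen=True)
class Constraint:
    """A ranked relation between two features; ranks are illustrative values."""
    first: str
    relation: str  # "overlap" or "precedence"
    second: str
    rank: int


def satisfied(constraint: Constraint, events: list) -> bool:
    # True if some pair of events with the named features stands in the relation.
    check = RELATIONS[constraint.relation]
    firsts = [e for e in events if e.feature == constraint.first]
    seconds = [e for e in events if e.feature == constraint.second]
    return any(check(a, b) for a in firsts for b in seconds)


def interpret(constraints: list, events: list, threshold: float):
    """Assumed relaxation rule: accept when the satisfied share of rank >= threshold."""
    total = sum(c.rank for c in constraints)
    violated = [c for c in constraints if not satisfied(c, events)]
    score = (total - sum(c.rank for c in violated)) / total
    return score >= threshold, score, violated


# Hypothetical constraints for a voiceless labial stop followed by a vowel.
constraints = [
    Constraint("voiceless", "overlap", "stop", rank=3),
    Constraint("labial", "overlap", "stop", rank=1),
    Constraint("stop", "precedence", "vowel", rank=4),
]

# Underspecified input: no "labial" event was detected.
events = [
    Event("voiceless", 0.00, 0.08),
    Event("stop", 0.01, 0.07),
    Event("vowel", 0.07, 0.20),
]

accepted, score, violated = interpret(constraints, events, threshold=0.8)
print(accepted, score, [c.first for c in violated])
```

In this sketch the violated constraints name the features that the input left unspecified, so they indicate what an output representation would have to extrapolate. Raising `threshold` to 1.0 turns relaxation off and rejects the same input.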
Copyright information
© 2005 Springer
About this chapter
Cite this chapter
Carson-Berndsen, J., Walsh, M. (2005). Phonetic Time Maps. In: Barry, W.J., van Dommelen, W.A. (eds) The Integration of Phonetic Knowledge in Speech Technology. Text, Speech and Language Technology, vol 25. Springer, Dordrecht. https://doi.org/10.1007/1-4020-2637-4_4
DOI: https://doi.org/10.1007/1-4020-2637-4_4
Publisher Name: Springer, Dordrecht
Print ISBN: 978-1-4020-2635-5
Online ISBN: 978-1-4020-2637-9
eBook Packages: Humanities, Social Sciences and Law; Social Sciences (R0)