Abstract
This chapter presents three prosodic recognition models that can resolve syntactic ambiguities using acoustic features measured from the speech signal. The models are based on multivariate statistical techniques that identify a linear relationship between sets of acoustic and syntactic features. One of the models requires hand-labelled break indices for training and achieves up to 76% accuracy in resolving syntactic ambiguities on a standard corpus. The other two models can be trained without any prosodic labels; these prosodically unsupervised models achieve recognition accuracy of up to 74%. This result suggests that it may be possible to train prosodic recognition models on very large speech corpora without requiring any prosodic labels.
Research was carried out while affiliated with the Speech Technology Research Group, University of Sydney and ATR Interpreting Telecommunications Research Labs.
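The abstract does not name the multivariate techniques involved. As an illustration only, the sketch below uses canonical correlation analysis, one standard way to find a linear relationship between two sets of variables, and should not be read as the chapter's actual method. The data are synthetic, and the column meanings (acoustic measurements, syntactic link features) and the helper name `first_canonical_pair` are assumptions made for this example.

```python
import numpy as np


def first_canonical_pair(X, Y, reg=1e-8):
    """Return weights (a, b) and the first canonical correlation rho.

    X and Y are (n_samples, n_features) arrays. The projections X @ a and
    Y @ b are the most strongly correlated linear combinations of the two sets.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]

    # Sample covariance blocks; a small ridge keeps the factorisation stable.
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    # Whiten each block with its Cholesky factor: K = Lx^{-1} Sxy Ly^{-T}.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    K = np.linalg.solve(Lx, Sxy)
    K = np.linalg.solve(Ly, K.T).T

    # Singular values of K are the canonical correlations.
    U, s, Vt = np.linalg.svd(K)
    a = np.linalg.solve(Lx.T, U[:, 0])
    b = np.linalg.solve(Ly.T, Vt[0])
    return a, b, s[0]


# Synthetic stand-in data (hypothetical): one latent "boundary strength"
# drives both the acoustic and the syntactic measurements.
rng = np.random.default_rng(0)
latent = rng.normal(size=500)
acoustic = np.outer(latent, [1.0, 0.6, -0.4]) + rng.normal(scale=0.5, size=(500, 3))
syntactic = np.outer(latent, [0.8, -0.5]) + rng.normal(scale=0.5, size=(500, 2))

a, b, rho = first_canonical_pair(acoustic, syntactic)
print("first canonical correlation:", round(float(rho), 3))
```

No prosodic labels appear anywhere in this fit: only the two feature sets are needed, which is the property the unsupervised models in the abstract rely on. How the chapter turns such a relationship into a score for competing parses is not stated in the abstract and is not modelled here.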
Copyright information
© 1997 Springer-Verlag New York, Inc.
About this chapter
Cite this chapter
Hunt, A.J. (1997). Training Prosody-Syntax Recognition Models without Prosodic Labels. In: Sagisaka, Y., Campbell, N., Higuchi, N. (eds) Computing Prosody. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-2258-3_20
DOI: https://doi.org/10.1007/978-1-4612-2258-3_20
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4612-7476-6
Online ISBN: 978-1-4612-2258-3
eBook Packages: Springer Book Archive