Training Prosody-Syntax Recognition Models without Prosodic Labels

Hunt, Andrew J.

doi:10.1007/978-1-4612-2258-3_20

Abstract

This chapter presents three prosodic recognition models that can resolve syntactic ambiguities using acoustic features measured from the speech signal. The models are based on multivariate statistical techniques that identify a linear relationship between sets of acoustic and syntactic features. One of the models requires hand-labelled break indices for training and achieves up to 76% accuracy in resolving syntactic ambiguities on a standard corpus. The other two models can be trained without any prosodic labels and achieve recognition accuracy of up to 74%. This result suggests that it may be possible to train prosodic recognition models on very large speech corpora without requiring any prosodic labels.

Research was carried out while the author was affiliated with the Speech Technology Research Group, University of Sydney, and ATR Interpreting Telecommunications Research Labs.

Editor information

Editors and Affiliations

ATR Interpreting Telecommunications Research Labs, 2-2, Hikaridai, Seika-cho, Soraku-gun, 619-02, Kyoto, Japan
Yoshinori Sagisaka, Nick Campbell & Norio Higuchi

About this chapter

Cite this chapter

Hunt, A.J. (1997). Training Prosody-Syntax Recognition Models without Prosodic Labels. In: Sagisaka, Y., Campbell, N., Higuchi, N. (eds) Computing Prosody. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-2258-3_20

DOI: https://doi.org/10.1007/978-1-4612-2258-3_20
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4612-7476-6
Online ISBN: 978-1-4612-2258-3
eBook Packages: Springer Book Archive
