Training Prosody-Syntax Recognition Models without Prosodic Labels

  • Chapter
Computing Prosody

Abstract

This chapter presents three prosodic recognition models which are capable of resolving syntactic ambiguities using acoustic features measured from the speech signal. The models are based on multi-variate statistical techniques that identify a linear relationship between sets of acoustic and syntactic features. One of the models requires hand-labelled break indices for training and achieves up to 76% accuracy in resolving syntactic ambiguities on a standard corpus. The other two prosodic recognition models can be trained without any prosodic labels. These prosodically unsupervised models achieve recognition accuracy of up to 74%. This result suggests that it may be possible to train prosodic recognition models for very large speech corpora without requiring any prosodic labels.
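
The abstract does not name the specific multivariate techniques used. As one hedged illustration of finding a linear relationship between two sets of features, the sketch below applies canonical correlation analysis, a standard technique of this kind, to synthetic data. The feature matrices, their dimensions, and the function name `canonical_correlations` are assumptions made for illustration and are not taken from the chapter.

```python
import numpy as np


def canonical_correlations(X, Y, reg=1e-8):
    """Return the canonical correlations between the columns of X and Y.

    X and Y are (n_samples, n_features) arrays observed on the same samples.
    A small ridge term `reg` keeps the covariance matrices positive definite.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)

    # Sample covariance blocks.
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    # Whiten each feature set with its Cholesky factor:
    # M = Lx^{-1} Sxy Ly^{-T}, whose singular values are the correlations.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    M = np.linalg.solve(Lx, Sxy)
    M = np.linalg.solve(Ly, M.T).T

    return np.linalg.svd(M, compute_uv=False)


# Synthetic stand-ins (hypothetical, not the chapter's data): one shared
# latent variable drives part of both the "acoustic" and "syntactic" sets.
rng = np.random.default_rng(0)
n_samples = 500
latent = rng.standard_normal(n_samples)

acoustic = np.outer(latent, [1.0, 0.5, 0.0]) + 0.3 * rng.standard_normal((n_samples, 3))
syntactic = np.outer(latent, [1.0, 0.0]) + 0.3 * rng.standard_normal((n_samples, 2))

rho = canonical_correlations(acoustic, syntactic)
print("canonical correlations:", rho)
```

Because a single latent variable links the two synthetic feature sets, the first canonical correlation should be high and the second close to zero. Real acoustic and syntactic features would need their own extraction and coding steps, which this sketch does not attempt.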

Research was carried out while affiliated with the Speech Technology Research Group, University of Sydney and ATR Interpreting Telecommunications Research Labs.

Copyright information

© 1997 Springer-Verlag New York, Inc.

About this chapter

Cite this chapter

Hunt, A.J. (1997). Training Prosody-Syntax Recognition Models without Prosodic Labels. In: Sagisaka, Y., Campbell, N., Higuchi, N. (eds) Computing Prosody. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-2258-3_20

  • DOI: https://doi.org/10.1007/978-1-4612-2258-3_20

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4612-7476-6

  • Online ISBN: 978-1-4612-2258-3

  • eBook Packages: Springer Book Archive
