Using Prosody for Automatic Sentence Segmentation of Multi-party Meetings

  • Jáchym Kolář
  • Elizabeth Shriberg
  • Yang Liu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4188)


We explore the use of prosodic features beyond pauses, including duration, pitch, and energy features, for automatic sentence segmentation of ICSI meeting data. We examine two different approaches to boundary classification: score-level combination of independent language and prosodic models using HMMs, and feature-level combination of models using a boosting-based method (BoosTexter). We report classification results for reference word transcripts as well as for transcripts from a state-of-the-art automatic speech recognizer (ASR). We also compare results using the lexical model plus a pause-only prosody model, versus results using additional prosodic features. Results show that (1) information from pauses is important, including pause duration both at the boundary and at the previous and following word boundaries; (2) adding duration, pitch, and energy features yields significant improvement over pause alone; (3) the integrated boosting-based model performs better than the HMM for ASR conditions; (4) training the boosting-based model on recognized words yields further improvement.
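Result (1) can be made concrete with a minimal sketch of pause features taken at a candidate boundary and at its two neighbouring word boundaries. This is an illustration only, not the authors' feature extraction: the `Word` record, the `pause_features` function, the zero-fill at the edges of the word sequence, and the sample timings are all assumptions introduced here.

```python
from typing import List, NamedTuple, Tuple


class Word(NamedTuple):
    """Hypothetical word record with times in seconds (e.g. from a forced alignment)."""
    text: str
    start: float
    end: float


def pause_features(words: List[Word]) -> List[Tuple[float, float, float]]:
    """Return one feature triple per inter-word boundary.

    Boundary i lies between words[i] and words[i + 1]. Each triple is
    (pause at the previous boundary, pause at this boundary,
     pause at the following boundary).
    Neighbours that do not exist at the edges are filled with 0.0,
    which is an assumption of this sketch.
    """
    # Pause duration at each boundary; overlapping words are clipped to 0.
    pauses = [
        max(0.0, words[i + 1].start - words[i].end)
        for i in range(len(words) - 1)
    ]
    feats = []
    for i, pause in enumerate(pauses):
        prev_pause = pauses[i - 1] if i > 0 else 0.0
        next_pause = pauses[i + 1] if i + 1 < len(pauses) else 0.0
        feats.append((prev_pause, pause, next_pause))
    return feats


# Made-up timings, for illustration only.
words = [
    Word("okay", 0.00, 0.30),
    Word("so", 0.90, 1.00),
    Word("we", 1.00, 1.25),
    Word("start", 1.50, 2.00),
]
features = pause_features(words)
```

Each triple could then be passed, together with lexical and other prosodic features, to whichever boundary classifier is being evaluated.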


Keywords: Hidden Markov Model, Minority Class, Speech Recognition System, Word Boundary, Prosodic Feature
These keywords were added by machine and not by the authors.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jáchym Kolář (1, 2)
  • Elizabeth Shriberg (1, 3)
  • Yang Liu (1, 4)
  1. International Computer Science Institute, Berkeley, USA
  2. Department of Cybernetics, University of West Bohemia in Pilsen, Czech Republic
  3. SRI International, Menlo Park, USA
  4. University of Texas at Dallas, USA
