Multistream Recognition of Dialogue Acts in Meetings

  • Alfred Dielmann
  • Steve Renals
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4299)


We propose a joint segmentation and classification approach to dialogue act (DA) recognition in natural multi-party meetings (ICSI Meeting Corpus). Five broad DA categories are recognised automatically using a generative framework based on Dynamic Bayesian Networks. DA boundaries are estimated from prosodic features using a switching graphical model. A factored language model relates words to DA categories. The resulting system is easy to generalise and extend, offers a principled approach to the joint DA segmentation and recognition task, and achieves good recognition performance.
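The abstract does not give the model equations, so the following is only a minimal sketch of the general idea of *joint* segmentation and classification. It is not the authors' Dynamic Bayesian Network. It runs a Viterbi search over per-word DA labels in which the label may change only where a DA boundary is posited, so boundaries and categories come out of a single search. Several pieces are assumptions:

- Per-word boundary probabilities are assumed to come from some prosodic classifier.
- Per-word class scores are assumed to come from a DA-conditioned language model. This is a simple stand-in for the paper's factored language model.
- The function name `joint_decode` and all of its inputs are hypothetical.

```python
import math


def _log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def joint_decode(word_loglik, boundary_prob, trans, init):
    """Jointly segment and label a word sequence with DA classes (illustrative sketch).

    word_loglik[t][k]: log P(word_t | DA class k)   (hypothetical per-class LM scores)
    boundary_prob[t]:  P(a DA boundary follows word t) (hypothetical prosodic model output)
    trans[j][k]:       P(next DA = k | previous DA = j), applied only at boundaries
    init[k]:           P(first DA = k)
    Returns (labels, boundaries): one DA label per word, and one bool per word
    that is True when a DA ends after that word.
    """
    T, K = len(word_loglik), len(init)
    delta = [_log(init[k]) + word_loglik[0][k] for k in range(K)]
    back = []  # back[t-1][k] = (label at t-1, boundary between words t-1 and t?)
    for t in range(1, T):
        pb = boundary_prob[t - 1]
        new_delta, ptrs = [], []
        for k in range(K):
            # No boundary: the current DA continues, so the label must persist.
            best, arg = delta[k] + _log(1.0 - pb), (k, False)
            # Boundary: a new DA starts, with its label drawn from the transition table.
            for j in range(K):
                score = delta[j] + _log(pb) + _log(trans[j][k])
                if score > best:
                    best, arg = score, (j, True)
            new_delta.append(best + word_loglik[t][k])
            ptrs.append(arg)
        delta = new_delta
        back.append(ptrs)
    # Trace back the best path; the last word always closes a DA.
    k = max(range(K), key=lambda i: delta[i])
    labels, bounds = [k], [True]
    for t in range(T - 1, 0, -1):
        k, is_boundary = back[t - 1][k]
        labels.append(k)
        bounds.append(is_boundary)
    labels.reverse()
    bounds.reverse()
    return labels, bounds


if __name__ == "__main__":
    # Toy input: words 0-1 favour class 0, words 2-3 favour class 1,
    # and the (hypothetical) prosodic model strongly suggests a boundary after word 1.
    scores = [[0.0, -5.0], [0.0, -5.0], [-5.0, 0.0], [-5.0, 0.0]]
    labels, bounds = joint_decode(scores, [0.1, 0.9, 0.1, 0.1],
                                  [[0.5, 0.5], [0.5, 0.5]], [0.5, 0.5])
    print(labels, bounds)
```

On this toy input the search places a single boundary after the second word and labels the two resulting segments with classes 0 and 1. A segment boundary with no label change is still possible, because a boundary may be followed by a DA of the same class.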


Keywords (added by machine, not by the authors; experimental and subject to update): Automatic Speech Recognition, Dynamic Bayesian Network, Word Error Rate, Prosodic Feature, Conditional Probability Table





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Alfred Dielmann (1)
  • Steve Renals (1)
  1. Centre for Speech Technology Research, University of Edinburgh, Edinburgh, UK
