Stream-Based Classification and Segmentation of Speech Events in Meeting Recordings

  • Jun Ogata
  • Futoshi Asano
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4105)


In this paper, we present a stream-based method for classifying and segmenting speech events in meeting recordings. Four speech events are considered: normal speech, laughter, coughing, and pauses between talks. Hidden Markov Models (HMMs) are used to model these speech events, and the model topology is optimized using the Bayesian Information Criterion (BIC). Experimental results show that the system achieves satisfactory performance. Based on the detected speech events, the meeting recording is structured using an XML-based description language and visualized in a browser.
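To make the BIC-based topology selection concrete, here is a minimal sketch of Schwarz's criterion applied to candidate HMMs. It is not the authors' implementation: the abstract does not state which topology parameters were searched, so this sketch assumes an ergodic HMM with diagonal-covariance Gaussian-mixture emissions and searches over the number of states. The names `Candidate`, `hmm_param_count`, `bic`, and `select_topology`, the optional penalty weight `penalty`, and all numbers in the usage example are hypothetical placeholders, not results from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """A hypothetical trained HMM topology and its training log-likelihood."""
    n_states: int
    n_mix: int             # Gaussian mixture components per state
    log_likelihood: float  # total log-likelihood of the training frames


def hmm_param_count(n_states: int, n_mix: int, dim: int) -> int:
    """Free parameters of an ergodic HMM with diagonal-covariance GMM emissions.

    Assumption: full (ergodic) transition matrix; a left-to-right topology
    would have fewer transition parameters.
    """
    initial = n_states - 1                     # initial probabilities sum to one
    transitions = n_states * (n_states - 1)    # each transition row sums to one
    per_state = n_mix * 2 * dim + (n_mix - 1)  # means + variances + mixture weights
    return initial + transitions + n_states * per_state


def bic(c: Candidate, dim: int, n_frames: int, penalty: float = 1.0) -> float:
    """BIC = log L - penalty * (k / 2) * log N; penalty=1.0 is Schwarz's original form."""
    k = hmm_param_count(c.n_states, c.n_mix, dim)
    return c.log_likelihood - penalty * 0.5 * k * math.log(n_frames)


def select_topology(cands, dim: int, n_frames: int, penalty: float = 1.0) -> Candidate:
    """Pick the candidate with the highest BIC score."""
    return max(cands, key=lambda c: bic(c, dim, n_frames, penalty))


if __name__ == "__main__":
    # Illustrative numbers only: 12-dimensional features, 10,000 frames.
    candidates = [
        Candidate(n_states=3, n_mix=1, log_likelihood=-50000.0),
        Candidate(n_states=5, n_mix=1, log_likelihood=-49000.0),
        Candidate(n_states=8, n_mix=1, log_likelihood=-48900.0),
    ]
    for c in candidates:
        print(c.n_states, round(bic(c, dim=12, n_frames=10000), 1))
    print("selected:", select_topology(candidates, dim=12, n_frames=10000).n_states)
```

With these placeholder values the 8-state model fits the data best but pays a larger complexity penalty (255 free parameters versus 144), so BIC selects the 5-state model. This trade-off between likelihood and model size is what lets BIC choose a topology per speech event without a separate held-out set.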


Keywords (machine-generated, not supplied by the authors): Hidden Markov Model · Bayesian Information Criterion · Speech Recognition · Automatic Speech Recognition · Acoustic Event





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jun Ogata (1)
  • Futoshi Asano (1)
  1. National Institute of Advanced Industrial Science and Technology (AIST), Ibaraki, Japan
