Stream-Based Classification and Segmentation of Speech Events in Meeting Recordings
In this paper, we present a stream-based method for classifying and segmenting speech events in meeting recordings. Four speech events are considered: normal speech, laughter, cough, and pause between talks. Hidden Markov models (HMMs) are used to model these speech events, and the model topology is optimized using the Bayesian Information Criterion (BIC). Experimental results show that the system achieves satisfactory results. Based on the detected speech events, the meeting recording is structured using an XML-based description language and visualized in a browser.
Keywords: Hidden Markov Model, Bayesian Information Criterion, Speech Recognition, Automatic Speech Recognition, Acoustic Event
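The abstract mentions BIC-based optimization of the HMM topology without giving details. As a rough illustration only, the sketch below selects the number of HMM states by minimizing the standard criterion BIC = k ln(N) - 2 ln(L). The parameter count assumes an ergodic HMM with one diagonal-covariance Gaussian per state, which is an assumption and not necessarily the paper's configuration; the function names are hypothetical, and the training log-likelihoods are expected to come from whatever HMM toolkit is in use.

```python
import math


def hmm_param_count(num_states, feature_dim):
    """Free parameters of an ergodic HMM with one diagonal Gaussian per state.

    Assumed model form (not taken from the paper):
    initial probabilities, full transition matrix, per-state mean and variance.
    """
    initial = num_states - 1
    transitions = num_states * (num_states - 1)
    emissions = num_states * 2 * feature_dim
    return initial + transitions + emissions


def bic(log_likelihood, num_params, num_frames):
    """Standard BIC; lower values indicate a better complexity/fit trade-off."""
    return num_params * math.log(num_frames) - 2.0 * log_likelihood


def select_num_states(log_likelihoods, feature_dim, num_frames):
    """Pick the state count with the lowest BIC.

    log_likelihoods maps a candidate number of states to the training
    log-likelihood obtained for that candidate.
    """
    scores = {
        states: bic(ll, hmm_param_count(states, feature_dim), num_frames)
        for states, ll in log_likelihoods.items()
    }
    best = min(scores, key=scores.get)
    return best, scores
```

In use, one such selection would be run per event class (normal speech, laughter, cough, pause), so that each class can end up with a different number of states.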