A Speaker Diarization System with Robust Speaker Localization and Voice Activity Detection
In real-world auditory scene analysis for human-robot interaction, three types of information are essential and must be extracted from the observed data: who speaks, when, and where. We present a speaker diarization system that extracts all three. Multiple signal classification (MUSIC) is a powerful method for voice activity detection (VAD) and direction-of-arrival (DOA) estimation. We compare the VAD and DOA performance of our system with that of a method based on the MUSIC algorithm.
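The abstract does not describe the MUSIC baseline in detail. As background only, the following is a minimal textbook sketch of narrowband MUSIC for a uniform linear array. It is not the authors' system or their baseline configuration. The array geometry (8 microphones, half-wavelength spacing), the single-frequency-bin simulation, the angle grid, and the helper names `simulate` and `music_vad` are all assumptions for illustration, and the detection threshold in `music_vad` is an arbitrary, hypothetical value. Speech systems typically run MUSIC per STFT frequency bin and combine the bins; this sketch uses one narrowband bin.

```python
import numpy as np


def steering_vectors(angles_deg, num_mics, spacing=0.5):
    """Unit-norm far-field steering vectors for a uniform linear array.

    spacing is the element spacing in wavelengths (assumed 0.5 here).
    Returns shape (num_mics, len(angles_deg)).
    """
    theta = np.deg2rad(np.asarray(angles_deg, dtype=float))
    m = np.arange(num_mics)[:, None]
    phase = -2j * np.pi * spacing * m * np.sin(theta)[None, :]
    return np.exp(phase) / np.sqrt(num_mics)


def music_spectrum(snapshots, num_sources, angles_deg, spacing=0.5):
    """MUSIC pseudo-spectrum for narrowband snapshots of shape (mics, frames)."""
    num_mics, num_frames = snapshots.shape
    # Sample spatial covariance R = X X^H / N.
    cov = snapshots @ snapshots.conj().T / num_frames
    # eigh returns eigenvalues in ascending order for Hermitian matrices.
    _, eigvecs = np.linalg.eigh(cov)
    # Noise subspace: eigenvectors of the (mics - sources) smallest eigenvalues.
    noise_sub = eigvecs[:, : num_mics - num_sources]
    a = steering_vectors(angles_deg, num_mics, spacing)
    # P(theta) = 1 / ||E_n^H a(theta)||^2 peaks where a(theta) is
    # (nearly) orthogonal to the estimated noise subspace.
    proj = np.linalg.norm(noise_sub.conj().T @ a, axis=0) ** 2
    return 1.0 / proj


def estimate_doas(spectrum, angles_deg, num_sources):
    """Angles of the num_sources largest local maxima of the spectrum."""
    p = spectrum
    idx = np.arange(1, len(p) - 1)
    is_peak = (p[idx] > p[idx - 1]) & (p[idx] >= p[idx + 1])
    peaks = idx[is_peak]
    top = peaks[np.argsort(p[peaks])[::-1][:num_sources]]
    return np.sort(np.asarray(angles_deg)[top])


def music_vad(snapshots, angles_deg, threshold=10.0):
    """Hypothetical detector: declare speech when the single-source MUSIC
    spectrum peak exceeds an (assumed) threshold."""
    return bool(music_spectrum(snapshots, 1, angles_deg).max() > threshold)


def simulate(doas_deg, num_mics=8, num_frames=200, snr_db=10.0, seed=0):
    """Hypothetical test signal: uncorrelated complex Gaussian sources plus
    white noise; an empty doas_deg list gives a noise-only recording."""
    rng = np.random.default_rng(seed)
    doas = np.asarray(doas_deg, dtype=float)
    k = len(doas)
    # Rescale to unit-modulus entries so each source has unit power per mic.
    a = steering_vectors(doas, num_mics) * np.sqrt(num_mics)
    s = (rng.standard_normal((k, num_frames))
         + 1j * rng.standard_normal((k, num_frames))) / np.sqrt(2)
    noise_std = 10 ** (-snr_db / 20)
    n = noise_std * (rng.standard_normal((num_mics, num_frames))
                     + 1j * rng.standard_normal((num_mics, num_frames))) / np.sqrt(2)
    return a @ s + n


if __name__ == "__main__":
    angles = np.linspace(-90.0, 90.0, 361)  # 0.5-degree grid
    x = simulate([-30.0, 20.0])
    p = music_spectrum(x, num_sources=2, angles_deg=angles)
    print(estimate_doas(p, angles, num_sources=2))  # peaks expected near -30 and 20
    print(music_vad(simulate([20.0]), angles))  # source present
    print(music_vad(simulate([]), angles))      # noise only
```

Because the steering vectors are unit-norm, the pseudo-spectrum is always at least 1, so a fixed threshold means the same thing regardless of array size. Thresholding the per-frame spectral peak is one simple way MUSIC can act as a detector, but the value used here is not tuned for real recordings.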
Keywords: Ground Truth, Sound Source, Audio Signal, Blind Source Separation, Free Speech