Abstract
Conventional approaches to bootstrapping speaker models in unsupervised speaker indexing are very sensitive to the duration of the bootstrapping segments. When segments are too short to build reliable speaker models, as is common in telephone conversations and meetings, serious problems arise. We therefore propose a robust bootstrapping framework with two components. The first is a Multi-EigenSpace modeling technique based on Regression Classes (RC-MES), which builds speaker models from sparse data. The second is short-segment clustering, which prevents very short segments from affecting the bootstrapping. On a real discussion archive totaling 8 hours, the proposed method proves substantially more robust. It improves speaker change detection performance and also outperforms conventional bootstrapping methods, even when the average bootstrapping segment lasts less than 5 seconds.
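The abstract does not spell out how RC-MES works. The sketch below only illustrates the general eigenspace (eigenvoice-style) idea of building a model from sparse data, and it is not the authors' method. Several assumptions are involved. Each eigenspace is a mean supervector plus orthonormal basis vectors that are assumed to have been trained offline. Weights are obtained by simple orthogonal projection rather than maximum-likelihood estimation. "Multi" is read as one eigenspace per regression class, which is a guess. The names `project_to_eigenspace`, `multi_eigenspace_model`, `class_a`, and `class_b` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# An eigenspace: (mean supervector, list of orthonormal basis vectors).
Eigenspace = Tuple[List[float], List[List[float]]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_to_eigenspace(mean: Sequence[float],
                          basis: Sequence[Sequence[float]],
                          observed: Sequence[float]) -> List[float]:
    """Constrain a noisy sparse-data estimate to mean + span(basis).

    ASSUMPTION: the basis vectors are orthonormal, so the least-squares
    weights are plain dot products (no maximum-likelihood estimation).
    """
    residual = [o - m for o, m in zip(observed, mean)]
    weights = [dot(residual, e) for e in basis]  # one weight per basis vector
    adapted = list(mean)
    for w, e in zip(weights, basis):
        for i, v in enumerate(e):
            adapted[i] += w * v
    return adapted


def multi_eigenspace_model(spaces: Dict[str, Eigenspace],
                           observed: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Adapt each parameter group inside its own eigenspace.

    ASSUMPTION: one eigenspace per (hypothetical) regression class.
    """
    return {name: project_to_eigenspace(mean, basis, observed[name])
            for name, (mean, basis) in spaces.items()}


if __name__ == "__main__":
    # Hypothetical regression classes with toy 3-D and 2-D supervectors.
    spaces = {
        "class_a": ([0.0, 0.0, 0.0], [[1.0, 0.0, 0.0]]),
        "class_b": ([1.0, 1.0], [[0.0, 1.0]]),
    }
    observed = {"class_a": [2.0, 0.5, -0.3], "class_b": [1.4, 3.0]}
    print(multi_eigenspace_model(spaces, observed))
```

Running it prints `{'class_a': [2.0, 0.0, 0.0], 'class_b': [1.0, 3.0]}`. The components of the noisy estimate that fall outside each eigenspace are discarded, and only one weight per class has to be estimated. Needing so few free parameters is what lets eigenspace methods cope with very short segments.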
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
ZhongHua, F. (2007). Robust Bootstrapping of Speaker Models for Unsupervised Speaker Indexing. In: Sebe, N., Liu, Y., Zhuang, Y., Huang, T.S. (eds) Multimedia Content Analysis and Mining. MCAM 2007. Lecture Notes in Computer Science, vol 4577. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73417-8_19
DOI: https://doi.org/10.1007/978-3-540-73417-8_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73416-1
Online ISBN: 978-3-540-73417-8
eBook Packages: Computer Science, Computer Science (R0)