Audio Visual Speaker Verification Based on Hybrid Fusion of Cross Modal Features
In this paper, we propose a hybrid fusion of audio and explicit cross-modal correlation features for speaker identity verification. Experiments were performed with GMM-based speaker models using a hybrid fusion technique that combines late fusion of explicit cross-modal correlation features with implicit eigen-lip and audio MFCC features. An evaluation of system performance on gender-specific datasets from the controlled VidTIMIT database and the opportunistic UCBN database shows a significant performance improvement.
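The late (score-level) fusion step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature dimensions, mixture count, and fusion weight `w` are illustrative assumptions, and scikit-learn's `GaussianMixture` stands in for the paper's GMM speaker models.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Toy stand-ins for the two feature streams (dimensions are illustrative):
# 13-dim audio MFCC vectors and 5-dim cross-modal correlation features.
mfcc_train = rng.normal(size=(200, 13))
corr_train = rng.normal(size=(200, 5))

# One GMM speaker model per feature stream.
gmm_mfcc = GaussianMixture(n_components=8, covariance_type="diag",
                           random_state=0).fit(mfcc_train)
gmm_corr = GaussianMixture(n_components=8, covariance_type="diag",
                           random_state=0).fit(corr_train)

def late_fusion_score(mfcc_test, corr_test, w=0.7):
    """Weighted sum of per-stream mean log-likelihoods (score-level fusion).

    The weight w is a hypothetical tuning parameter, not a value from the paper.
    """
    s_audio = gmm_mfcc.score(mfcc_test)  # mean log-likelihood over frames
    s_corr = gmm_corr.score(corr_test)
    return w * s_audio + (1.0 - w) * s_corr

# Verification decision: accept the identity claim if the fused score
# exceeds a threshold tuned on development data.
claim_mfcc = rng.normal(size=(50, 13))
claim_corr = rng.normal(size=(50, 5))
fused = late_fusion_score(claim_mfcc, claim_corr)
```

In a real system the two GMMs would be trained on enrolled-speaker data and scored against a claimant's test utterance; here random data simply exercises the fusion arithmetic.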
Keywords: Audio-visual speaker identity verification; liveness checking; cross-modal correlations