Multimodal Authentication Using Asynchronous HMMs
It has often been shown that using multiple modalities to authenticate the identity of a person is more robust than using only one. Various combination techniques exist, and they are often applied at the level of the output scores of each modality system. In this paper, we present a novel HMM architecture able to model the joint probability distribution of pairs of asynchronous sequences (such as speech and video streams) describing the same event. We show how this model can be used for audio-visual person authentication. Results on the M2VTS database show that the system performs robustly under various audio noise conditions when compared to other state-of-the-art techniques.