Dynamic Bayesian Networks for Audio-Visual Speaker Recognition

  • Dongdong Li
  • Yingchun Yang
  • Zhaohui Wu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3832)


Audio-visual speaker recognition promises higher performance than any single-modal biometric system. This paper further improves the novel DBN-based approach to bimodal speaker recognition, where DBN stands for Dynamic Bayesian Network. We investigate five different topologies of a feature-level fusion framework using DBNs. We demonstrate that the performance of multimodal systems can be further improved by appropriately modeling the correlation between the speech features and the face features. Experiments conducted on a multimodal database of 54 users indicate promising results, with an absolute improvement of about 7.44% in the best case and 3.13% in the worst case compared with a single-modal speaker recognition system.
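To make the idea of feature-level fusion concrete, the following is a minimal sketch and not the method of the paper: it pairs one audio feature with one face feature per frame and scores the fused frames with a single bivariate Gaussian, used here as a simple stand-in for a DBN. The function names (`fuse`, `fit_bivariate`, `log_likelihood`) and the toy feature values are assumptions made purely for illustration. The sketch only shows why modeling the cross-modal correlation can matter: the same fused data is scored once with the audio-face covariance term and once with that term forced to zero.

```python
import math


def fuse(audio_frames, face_frames):
    """Feature-level fusion: pair the audio and face feature of each frame."""
    if len(audio_frames) != len(face_frames):
        raise ValueError("streams must be frame-aligned")
    return list(zip(audio_frames, face_frames))


def fit_bivariate(fused):
    """Maximum-likelihood means, variances and covariance of the fused frames."""
    n = len(fused)
    mean_a = sum(a for a, _ in fused) / n
    mean_v = sum(v for _, v in fused) / n
    var_a = sum((a - mean_a) ** 2 for a, _ in fused) / n
    var_v = sum((v - mean_v) ** 2 for _, v in fused) / n
    cov_av = sum((a - mean_a) * (v - mean_v) for a, v in fused) / n
    return mean_a, mean_v, var_a, var_v, cov_av


def log_likelihood(fused, params, use_correlation=True):
    """Total log-likelihood under a bivariate Gaussian.

    With use_correlation=False the audio-face covariance is set to zero,
    i.e. the two modalities are treated as independent.
    """
    mean_a, mean_v, var_a, var_v, cov_av = params
    if not use_correlation:
        cov_av = 0.0
    det = var_a * var_v - cov_av * cov_av
    total = 0.0
    for a, v in fused:
        da, dv = a - mean_a, v - mean_v
        quad = (var_v * da * da - 2.0 * cov_av * da * dv + var_a * dv * dv) / det
        total += -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad
    return total


# Hypothetical toy data: one scalar feature per frame and per modality.
audio = [1.0, 2.0, 3.0, 4.0, 5.0]
face = [1.1, 1.9, 3.2, 3.8, 5.0]

fused = fuse(audio, face)
params = fit_bivariate(fused)
ll_joint = log_likelihood(fused, params, use_correlation=True)
ll_independent = log_likelihood(fused, params, use_correlation=False)
print(ll_joint, ll_independent)
```

On strongly correlated toy data such as this, the model that keeps the covariance term fits the fused frames better than the one that treats the modalities as independent. A DBN additionally models dependencies across time, which this sketch deliberately omits.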


Face Feature, Speaker Recognition, Dynamic Bayesian Network, Speaker Identification, Speech Feature



Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Dongdong Li (1)
  • Yingchun Yang (1)
  • Zhaohui Wu (1)
  1. Department of Computer Science and Technology, Zhejiang University, Hangzhou, P.R. China
