Integrating Complementary Features with a Confidence Measure for Speaker Identification

  • Nengheng Zheng
  • P. C. Ching
  • Ning Wang
  • Tan Lee
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4274)


This paper investigates the effectiveness of integrating complementary acoustic features to improve speaker identification performance. It studies the complementary contributions of two acoustic features: the conventional vocal-tract-related feature MFCC and the recently proposed vocal-source-related feature WOCOR. An integrated system is proposed that performs score-level fusion of MFCC and WOCOR, with a confidence measure as the weighting parameter, to take full advantage of the complementarity between the two features. The confidence measure is derived from the speaker discrimination power of MFCC and of WOCOR in each individual identification trial, so that more weight is given to the feature with higher confidence in speaker discrimination. Experiments show that fusion with this confidence-based varying weight outperforms fusion with a pre-trained fixed weight in speaker identification.
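The idea of score-level fusion with a per-trial weight can be illustrated with a minimal Python sketch. The abstract does not give the confidence formula, so everything specific here is an assumption made for illustration: the function names, the definition of the discrimination ratio as the relative gap between the best and second-best speaker scores, and the mapping from the two ratios to a weight. It also assumes the two score lists are already on comparable scales (for example, normalized log-likelihoods from two sets of speaker models).

```python
def discrimination_ratio(scores):
    """Relative gap between the best and runner-up scores.

    Hypothetical definition, not taken from the paper: a larger value
    means this feature separates the top candidate more clearly.
    """
    ranked = sorted(scores, reverse=True)
    best, second = ranked[0], ranked[1]
    return (best - second) / (abs(best) + 1e-12)


def confidence_weight(dr_a, dr_b):
    """Weight in [0, 1] given to stream B (assumed mapping).

    The weight grows as stream B's discrimination ratio dominates;
    it falls back to 0.5 when neither stream discriminates at all.
    """
    total = dr_a + dr_b
    if total <= 0.0:
        return 0.5
    return dr_b / total


def fuse_and_identify(scores_a, scores_b):
    """Fuse two per-speaker score lists and return (speaker index, weight)."""
    w = confidence_weight(discrimination_ratio(scores_a),
                          discrimination_ratio(scores_b))
    fused = [(1.0 - w) * a + w * b for a, b in zip(scores_a, scores_b)]
    winner = max(range(len(fused)), key=fused.__getitem__)
    return winner, w


# Illustrative scores for three enrolled speakers (made-up numbers).
mfcc_scores = [-10.0, -10.1, -12.0]   # nearly tied between speakers 0 and 1
wocor_scores = [-8.0, -5.0, -9.0]     # clearly favours speaker 1
speaker, weight = fuse_and_identify(mfcc_scores, wocor_scores)
print(speaker, round(weight, 3))
```

In this made-up trial the first stream is nearly tied between two speakers while the second separates them clearly, so the second stream receives most of the weight. A fixed-weight system would apply the same weight to every trial regardless of how decisive each stream is.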


Keywords: Gaussian Mixture Model; Speaker Recognition; Speaker Identification; Discrimination Ratio; Complementary Feature





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Nengheng Zheng (1)
  • P. C. Ching (1)
  • Ning Wang (1)
  • Tan Lee (1)
  1. Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong
