Speaker Verification Using Complementary Information from Vocal Source and Vocal Tract

  • Nengheng Zheng
  • Ning Wang
  • Tan Lee
  • P. C. Ching
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4274)


This paper describes a speaker verification system that uses two complementary acoustic features: Mel-frequency cepstral coefficients (MFCC) and wavelet octave coefficients of residues (WOCOR). While MFCC mainly characterizes the spectral envelope, that is, the formant structure of the vocal tract system, WOCOR aims to represent the spectro-temporal characteristics of the vocal source excitation. Speaker verification experiments carried out on the ISCSLP 2006 SRE database demonstrate the complementary contributions of MFCC and WOCOR. In particular, WOCOR performs even better than MFCC in the single-channel speaker verification task. Combining MFCC and WOCOR achieves higher performance than using MFCC alone in both the single-channel and cross-channel tasks.
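The abstract does not spell out how the "residues" behind WOCOR are obtained. As a rough illustration only, the sketch below assumes they are linear prediction (LP) residuals: a frame is inverse-filtered with its own LP coefficients, which removes most of the vocal tract (spectral envelope) contribution and leaves an approximation of the source excitation. The function names, the prediction order, and the toy frame are all hypothetical choices made for this example, and the wavelet stage that would turn the residual into WOCOR is not shown.

```python
def autocorrelation(frame, max_lag):
    """Unnormalized autocorrelation r[0..max_lag] of one frame."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the LP normal equations; returns predictor coefficients a[1..order]."""
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or degenerate frame: stop early
        # Reflection coefficient for stage i
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]


def lp_residual(frame, order):
    """Inverse-filter a frame: e[n] = s[n] - sum_k a[k] * s[n - k]."""
    coeffs = levinson_durbin(autocorrelation(frame, order), order)
    residual = []
    for n, sample in enumerate(frame):
        predicted = sum(
            coeffs[k - 1] * frame[n - k]
            for k in range(1, order + 1)
            if n - k >= 0  # samples before the frame start are taken as zero
        )
        residual.append(sample - predicted)
    return coeffs, residual


# Toy frame (an assumption for illustration): a decaying exponential,
# i.e. a one-pole "vocal tract" excited by a single impulse.
frame = [0.9 ** n for n in range(200)]
coeffs, residual = lp_residual(frame, order=1)
```

In this toy case the estimated coefficient comes out very close to the pole at 0.9, and the residual is essentially the single exciting impulse. Real speech frames would need windowing and a higher prediction order, neither of which is specified in the abstract.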


Keywords (machine-generated, not supplied by the authors): Speaker Recognition; Speaker Verification; False Rejection Rate; Spectral Envelope; Testing Segment




Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Nengheng Zheng (1)
  • Ning Wang (1)
  • Tan Lee (1)
  • P. C. Ching (1)
  1. Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong
