Selective Fusion for Speaker Verification in Surveillance

  • Yosef A. Solewicz
  • Moshe Koppel
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3495)


This paper presents an improved speaker verification technique that is especially appropriate for surveillance scenarios. The main idea is a meta-learning scheme aimed at improving the fusion of low- and high-level speech information. While some existing systems fuse several classifier outputs, the proposed method uses a selective fusion scheme that takes into account the conveying channel, speaking style, and speaker stress, as estimated on the test utterance. Moreover, we show that simultaneously employing multi-resolution versions of regular classifiers boosts fusion performance. The proposed selective fusion method, aided by multi-resolution classifiers, decreases the error rate by 30% relative to ordinary fusion.
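The selective fusion idea described above can be illustrated with a minimal sketch: a meta-level rule chooses per-condition weights for combining low-level (acoustic) and high-level (e.g. idiolectal) classifier scores, based on attributes estimated from the test utterance. All names, conditions, and weight values below are hypothetical illustrations, not the paper's actual estimators or learned fusion parameters.

```python
# Hypothetical sketch of selective score fusion. A meta-level table maps
# estimated utterance conditions (channel, speaking style, stress) to
# fusion weights for the low- and high-level classifier scores.
# The conditions and weights are illustrative only.

CONDITION_WEIGHTS = {
    # (channel, style, stress) -> (weight_low_level, weight_high_level)
    ("landline", "read", "neutral"): (0.7, 0.3),
    ("cellular", "conversational", "stressed"): (0.4, 0.6),
}
DEFAULT_WEIGHTS = (0.5, 0.5)  # fall back to ordinary (uniform) fusion


def estimate_conditions(utterance):
    """Stand-in for the condition estimators; a real system would infer
    these attributes from the test utterance's acoustics."""
    return utterance.get("conditions")


def selective_fusion(low_score, high_score, utterance):
    """Fuse two classifier scores with condition-dependent weights."""
    w_low, w_high = CONDITION_WEIGHTS.get(
        estimate_conditions(utterance), DEFAULT_WEIGHTS
    )
    return w_low * low_score + w_high * high_score


# Example: a cellular, conversational, stressed utterance leans more on
# the high-level classifier, which is less sensitive to channel effects.
utt = {"conditions": ("cellular", "conversational", "stressed")}
print(selective_fusion(0.2, 0.8, utt))  # 0.4*0.2 + 0.6*0.8 = 0.56
```

In a full system the weight table would itself be learned (the paper's meta-learning step), and the fused score would be compared against a verification threshold.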




Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Yosef A. Solewicz (1, 2)
  • Moshe Koppel (1)
  1. Dept. of Computer Science, Bar-Ilan University, Ramat-Gan, Israel
  2. Division of Identification and Forensic Science, Israel National Police, Jerusalem, Israel
