Towards Structured Approaches to Arbitrary Data Selection and Performance Prediction for Speaker Recognition

  • Howard Lei
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5558)


We developed measures relating feature-vector distributions to speaker recognition (SR) performance, for use in performance prediction and, potentially, arbitrary data selection for SR. We examined mutual information, kurtosis, correlation, and measures of intra- and inter-speaker variability. We applied the measures to feature vectors of phones to determine which ones predicted SR performance well for phones used standalone and in combination. We found that mutual information had a -83.5% correlation with the Equal Error Rate (EER) of each phone. In addition, Pearson’s correlation between the feature vectors of two phones had a -48.6% correlation with the relative EER improvement of the score-level combination of those phones. When implemented in our new data-selection scheme (which does not require an SR system to be run), the measures allowed us to select data with a 2.13% overall EER improvement (on SRE08) over data selected via a brute-force approach, at a fifth of the computational cost.
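The kind of analysis summarized above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the phone labels, and all numeric values are hypothetical placeholders, and the ranking step is only one simple way a measure could drive data selection without running an SR system. The sketch computes excess kurtosis for a feature sample, the Pearson correlation between a per-phone measure and per-phone EERs, and a ranking of phones by the measure.

```python
import numpy as np


def excess_kurtosis(samples):
    """Excess kurtosis of a 1-D sample (close to 0 for Gaussian data)."""
    x = np.asarray(samples, dtype=float)
    centered = x - x.mean()
    variance = np.mean(centered ** 2)
    return float(np.mean(centered ** 4) / variance ** 2 - 3.0)


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_c = a - a.mean()
    b_c = b - b.mean()
    return float(np.sum(a_c * b_c) / np.sqrt(np.sum(a_c ** 2) * np.sum(b_c ** 2)))


def rank_units(measure_by_unit, higher_is_better=True):
    """Order units (e.g. phones) by a measure, best first."""
    return sorted(measure_by_unit, key=measure_by_unit.get, reverse=higher_is_better)


if __name__ == "__main__":
    # Hypothetical placeholder values, NOT results from the paper.
    measure = {"AA": 0.9, "IY": 0.6, "N": 0.3}   # e.g. a mutual-information score
    eer = {"AA": 0.12, "IY": 0.18, "N": 0.25}    # per-phone equal error rates
    phones = sorted(measure)
    r = pearson([measure[p] for p in phones], [eer[p] for p in phones])
    print("correlation between measure and EER:", r)
    print("phones ranked by measure:", rank_units(measure))
```

A strongly negative correlation between a measure and EER means that higher values of the measure go with lower error, which is what makes ranking by the measure a plausible stand-in for running the full system on every candidate.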


Text-dependent speaker recognition, mutual information, relevance, redundancy, data selection



Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Howard Lei, The International Computer Science Institute, Berkeley, USA
