Multimedia Tools and Applications, Volume 77, Issue 7, pp 8273–8294

Client-wise cohort set selection by combining speaker- and phoneme-specific I-vectors for speaker verification

  • Waquar Ahmad
  • Harish Karnick
  • Rajesh M. Hegde


This work explores the use of phoneme-level information in cohort selection to improve the performance of a speaker verification system. In speaker verification, a cohort is used in score normalization to improve performance. Score normalization is a technique to reduce the undesirable score variation arising from acoustically mismatched conditions, and proper selection of the cohort significantly improves speaker verification performance. In this paper, we investigate cohort selection based on a speaker model cluster under the i-vector framework, which we call the i-vector model cluster (IMC). Two approaches to cohort selection are proposed. The first approach uses speaker-specific properties and is called speaker-specific cohort selection (SSCS); here, speaker-level information is used for cohort selection. The second approach, phoneme-specific cohort selection (PSCS), improves cohort set selection by using phoneme-level information. Phoneme-level information is further employed in a late fusion approach that applies majority voting to normalized scores to improve the performance of the speaker verification system. Speaker verification experiments were conducted on the TIMIT, HINDI and YOHO databases. Compared with the standard ZT-Norm method, the proposed method improves the equal error rate (EER) by 19.01%, 14.61% and 19.4% on TIMIT, HINDI and YOHO, respectively. Reasonable improvements in performance are also obtained in terms of the minimum decision cost function (min DCF) and detection error trade-off (DET) curves.
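The cohort-based score normalization that the abstract builds on can be illustrated with a minimal sketch of ZT-norm. It assumes cosine scoring between i-vectors and chooses a speaker-adaptive cohort as the k candidate models closest to the target; that nearest-neighbour rule is a hypothetical stand-in, not the paper's IMC-based SSCS or PSCS procedure. Vectors are plain Python lists, and all function names are illustrative.

```python
import math
from statistics import mean, pstdev


def cosine(a, b):
    """Cosine similarity between two i-vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def select_cohort(target, candidates, k):
    """Hypothetical speaker-adaptive cohort: the k candidate models most similar to the target."""
    ranked = sorted(candidates, key=lambda c: cosine(target, c), reverse=True)
    return ranked[:k]


def z_norm(score, model, impostor_utts):
    """Z-norm: standardize a score with the model's statistics over impostor utterances."""
    impostor_scores = [cosine(model, u) for u in impostor_utts]
    return (score - mean(impostor_scores)) / pstdev(impostor_scores)


def zt_norm(target, test, cohort, impostor_utts):
    """ZT-norm: Z-normalize the trial score and every cohort score, then T-normalize
    the trial score with the statistics of the Z-normalized cohort scores."""
    s_z = z_norm(cosine(target, test), target, impostor_utts)
    cohort_z = [z_norm(cosine(c, test), c, impostor_utts) for c in cohort]
    return (s_z - mean(cohort_z)) / pstdev(cohort_z)
```

The cohort needs at least two members whose scores differ, and each model's impostor scores must not all be equal; otherwise `pstdev` is zero and the division fails. In the sketch, cohort selection only changes which models feed the T-norm statistics, which is the lever the proposed SSCS and PSCS methods act on.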
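The evaluation metrics named in the abstract (EER and min DCF) can be computed from lists of genuine (target) and impostor trial scores. This sketch sweeps thresholds over the observed scores, reports an approximate EER at the point where the miss and false-alarm rates are closest, and computes an unnormalized minimum DCF. The cost parameters (C_miss = 10, C_fa = 1, P_target = 0.01) are assumed values of the kind used in NIST SRE evaluations, not settings stated here.

```python
import math


def error_rates(genuine, impostor, threshold):
    """Miss and false-alarm rates when a trial is accepted iff score >= threshold."""
    p_miss = sum(s < threshold for s in genuine) / len(genuine)
    p_fa = sum(s >= threshold for s in impostor) / len(impostor)
    return p_miss, p_fa


def operating_points(genuine, impostor):
    """(p_miss, p_fa) at every observed score plus a reject-everything threshold."""
    thresholds = sorted(set(genuine) | set(impostor)) + [math.inf]
    return [error_rates(genuine, impostor, t) for t in thresholds]


def eer(genuine, impostor):
    """Approximate EER: average of the two rates where they are closest."""
    p_miss, p_fa = min(operating_points(genuine, impostor), key=lambda r: abs(r[0] - r[1]))
    return (p_miss + p_fa) / 2


def min_dcf(genuine, impostor, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Unnormalized minimum detection cost; the default costs are assumptions."""
    return min(
        c_miss * p_target * pm + c_fa * (1 - p_target) * pf
        for pm, pf in operating_points(genuine, impostor)
    )
```

For example, with genuine scores `[0.9, 0.8, 0.7, 0.4]` and impostor scores `[0.1, 0.2, 0.3, 0.6]`, both error rates equal 0.25 at the threshold 0.6, so `eer` returns 0.25.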
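The late-fusion step can be pictured as each phoneme-specific subsystem casting one vote on the identity claim. In this minimal sketch, a subsystem votes to accept when its normalized score reaches a shared threshold, and the claim is accepted only on a strict majority, so ties reject. The threshold, the tie rule and the phoneme labels are assumptions for illustration, not details given in the abstract.

```python
def majority_vote(normalized_scores, threshold=0.0):
    """Late fusion: accept iff a strict majority of subsystems score >= threshold.

    normalized_scores maps a (hypothetical) phoneme label to that phoneme-specific
    subsystem's normalized score for the same trial.
    """
    accepts = sum(score >= threshold for score in normalized_scores.values())
    return accepts > len(normalized_scores) / 2


# Two of the three phoneme subsystems accept, so the claim is accepted.
print(majority_vote({"aa": 1.2, "iy": -0.3, "s": 0.5}))  # True
```

Voting on thresholded decisions makes the fused result robust to a single subsystem with an extreme score, at the cost of discarding how far each score lies from the threshold.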


Keywords: Speaker verification · Speaker recognition · Cohort selection



Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  • Waquar Ahmad (1)
  • Harish Karnick (2)
  • Rajesh M. Hegde (3)
  1. Department of ECE, NIT Sikkim, Ravangla, India
  2. Department of Computer Science and Engineering, IIT Kanpur, Kanpur, India
  3. Department of Electrical Engineering, IIT Kanpur, Kanpur, India
