Investigating Text-Independent Speaker Verification Systems Under Varied Data Conditions

  • Rohan Kumar Das
  • S. R. Mahadeva Prasanna

Abstract

This work investigates speaker verification (SV) from the viewpoint of practical systems. Limited-data SV is preferred for regular usage because it offers user comfort and effective decision delivery. However, reducing the amount of speech data degrades SV performance, which becomes a concern for field deployment. In this work, varied data conditions for SV are explored, and sufficient training data with limited test data is presented as a preferable configuration for practical systems. Several explorations are made toward improving performance under varied data conditions. These include a vocal tract constriction feature that captures speaker-specific acoustic–phonetic information, and different attributes of voice source features that carry information alternative or complementary to that carried by conventional mel-frequency cepstral coefficient (MFCC) features. Further, kernel discriminant analysis is performed at the back end of i-vector-based speaker modeling for channel/session compensation, and it is found to work well across varied data conditions. Finally, a framework combining these explorations is proposed for better speaker characterization; it is more effective in the scenario of sufficient training data with limited test data. For sufficient training data with a 2-s test segment, the proposed framework achieves a significant improvement in performance [equal error rate (EER): 11.20%, detection cost function (DCF): 0.1990] over the baseline (EER: 22.31%, DCF: 0.4128), showing scope toward application-oriented systems.
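
To make the reported figures concrete, the following is a minimal Python sketch, not the authors' evaluation code. It assumes the common definition of EER as the operating point where the false acceptance rate equals the false rejection rate, and it approximates that point with a simple threshold sweep. The function names `equal_error_rate` and `relative_reduction` and the toy scores are hypothetical. Only the four EER/DCF values are taken from the abstract.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER (as a fraction) by sweeping every observed score as a threshold."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best = None
    for t in thresholds:
        # False rejection rate: genuine (target) trials scoring below the threshold
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance rate: impostor (non-target) trials scoring at or above it
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0)
    return best[1]


def relative_reduction(baseline, proposed):
    """Relative reduction in percent of an error metric with respect to the baseline."""
    return 100.0 * (baseline - proposed) / baseline


# Toy scores, invented for illustration only (not data from the paper)
toy_targets = [0.9, 0.8, 0.7, 0.4]
toy_nontargets = [0.6, 0.3, 0.2, 0.1]
toy_eer = equal_error_rate(toy_targets, toy_nontargets)

# Values reported in the abstract for sufficient train with a 2-s test segment
eer_reduction = relative_reduction(22.31, 11.20)
dcf_reduction = relative_reduction(0.4128, 0.1990)
print(f"toy EER: {toy_eer:.2f}")
print(f"relative EER reduction: {eer_reduction:.1f}%")
print(f"relative DCF reduction: {dcf_reduction:.1f}%")
```

Under these assumptions, the reported numbers correspond to a relative reduction of roughly half in both EER and DCF. The DCF itself is not recomputed here because the abstract does not state its cost parameters.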

Keywords

Text-independent speaker verification; Limited data; Short utterances

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Electrical and Computer Engineering, National University of Singapore, Singapore, Singapore
  2. Department of Electronics and Electrical Engineering, Indian Institute of Technology Guwahati, Guwahati, India