
Circuits, Systems, and Signal Processing, Volume 38, Issue 4, pp 1775–1792

Exploring Text-Constraint Models and Source Information for Long-Enrollment with Short-Test Speaker Verification

  • Rohan Kumar Das
  • Sarfaraz Jelil
  • S. R. Mahadeva Prasanna

Abstract

This work focuses on long-enrollment with short-test speaker verification (SV) from the perspective of application-oriented systems. The importance of a phonetic match between train and test models is explored using a text-constraint model-based framework on Part IV of the RedDots database. This database provides a text-dependent and a text-prompted enrollment condition for speaker modeling. Two different text-constraint setups are formalized to evaluate the effect of text match between train and test sessions. Further, three excitation source features (mel power difference of spectrum in subbands, residual mel frequency cepstral coefficients, and discrete cosine transform of the integrated linear prediction residual) are investigated to determine their significance for the text-constraint-based framework. Although the source features individually perform worse than the conventional mel frequency cepstral coefficient (MFCC) features, their significance is reflected in fusion because the information they carry is complementary. Additionally, the source features become important for text-constraint-based models in long-enrollment with short-test SV: fused with MFCC features, they achieve a notable improvement over the baseline framework of the text-prompted enrollment condition. This reduces the performance difference between the text-dependent and text-prompted enrollment conditions, showing the importance of text-constraint models and source information in a long-enrollment with short-test framework, which is favorable from the perspective of field-deployable systems.
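The abstract reports that source features help mainly when fused with MFCC features, but it does not state the fusion rule or the evaluation metric. The following is a minimal sketch under assumptions: per-trial scores from two systems are z-normalized and combined by a weighted sum, and the result is summarized by an approximate equal error rate (EER). The function names, the `weight` parameter, and the choice of weighted-sum fusion and EER are illustrative assumptions, not the authors' stated method.

```python
from statistics import mean, pstdev


def z_normalize(scores):
    """Scale scores to zero mean and unit variance so two systems are comparable."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def fuse_scores(mfcc_scores, source_scores, weight=0.5):
    """Weighted-sum score-level fusion (assumed rule; weight is hypothetical)."""
    if len(mfcc_scores) != len(source_scores):
        raise ValueError("score lists must cover the same trials")
    a = z_normalize(mfcc_scores)
    b = z_normalize(source_scores)
    return [weight * x + (1.0 - weight) * y for x, y in zip(a, b)]


def equal_error_rate(scores, labels):
    """Approximate EER: sweep thresholds over the observed scores and return
    the error rate where false acceptance and false rejection are closest.
    labels: 1 for a target (genuine) trial, 0 for an impostor trial."""
    targets = [s for s, l in zip(scores, labels) if l == 1]
    impostors = [s for s, l in zip(scores, labels) if l == 0]
    if not targets or not impostors:
        raise ValueError("need at least one target and one impostor trial")
    best_gap, eer = None, None
    for t in sorted(set(scores)):
        far = sum(s >= t for s in impostors) / len(impostors)
        frr = sum(s < t for s in targets) / len(targets)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

In practice the weight would be tuned on a development set rather than fixed at 0.5, and `equal_error_rate` would be applied to the MFCC-only, source-only, and fused score lists over the same trials to compare them.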

Keywords

Speaker verification, Short utterances, Text-constraint, Source features

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Electrical and Computer Engineering, National University of Singapore, Singapore, Singapore
  2. Department of Electronics and Electrical Engineering, Indian Institute of Technology Guwahati, Guwahati, India
