Robust Methods for Text-Dependent Speaker Verification

Bhukya, Ramesh K.; Prasanna, S. R. Mahadeva; Sarma, Biswajit Dev

doi:10.1007/s00034-019-01125-x


Published: 03 May 2019

Volume 38, pages 5253–5288, (2019)

Circuits, Systems, and Signal Processing

Ramesh K. Bhukya¹ (ORCID: orcid.org/0000-0002-6221-5627), S. R. Mahadeva Prasanna¹,² & Biswajit Dev Sarma³


Abstract

In this work, we explore noise-robust techniques at different stages of a Text-Dependent Speaker Verification (TDSV) system. At the signal level, noise compensation is performed with a robust end point detection technique based on speech-specific knowledge. At the feature level, compensation is performed with robust features extracted from the Hilbert Spectrum (HS) of the Intrinsic Mode Functions obtained by Modified Empirical Mode Decomposition of speech. We also explore a combined temporal and spectral speech enhancement technique, applied before end point detection, to enhance speech regions embedded in noise. All experiments are conducted on two databases, RSR2015 and the IITG database. Robust end point detection improves the performance of the TDSV system over energy-based end point detection in both clean and degraded speech conditions. Augmenting Mel-frequency cepstral coefficients with the noise-robust HS features improves performance further. Applying speech enhancement before the signal-level and feature-level compensation yields an additional improvement in the low-SNR cases. The final system, which combines the three robust methods, provides a relative improvement in equal error rate (EER) of 6 to 25% on the RSR2015 database corrupted with Babble noise of varying strength, and of around 30 to 45% on the IITG database.
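To make two terms in the abstract concrete, the following is a minimal sketch of a generic energy-based end point detector (the kind of baseline the abstract compares against) and of the relative-improvement arithmetic applied to an EER. It is not the authors' implementation: the function names, the frame length, the hop size, and the threshold ratio of 0.06 are all illustrative assumptions, and the robust, speech-specific detector described in the paper is not reproduced here.

```python
import math


def frame_energies(signal, frame_len, hop):
    """Short-time energy (sum of squared samples) of each full frame."""
    return [
        sum(x * x for x in signal[start:start + frame_len])
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


def energy_end_points(signal, frame_len=160, hop=80, ratio=0.06):
    """Return (begin, end) sample indices spanning the first to the last
    frame whose energy exceeds ratio * mean frame energy.

    Returns None if the signal is shorter than one frame or if no frame
    exceeds the threshold. All default values are assumptions chosen for
    illustration, not values taken from the paper.
    """
    energies = frame_energies(signal, frame_len, hop)
    if not energies:
        return None
    threshold = ratio * (sum(energies) / len(energies))
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return None
    begin = active[0] * hop
    end = active[-1] * hop + frame_len
    return begin, end


def relative_improvement(eer_baseline, eer_new):
    """Relative improvement in percent: 100 * (baseline - new) / baseline."""
    return 100.0 * (eer_baseline - eer_new) / eer_baseline


# Synthetic example: 0.1 s silence, 0.2 s of a 200 Hz tone, 0.1 s silence,
# sampled at 8 kHz (all values hypothetical).
tone = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(1600)]
signal = [0.0] * 800 + tone + [0.0] * 800
end_points = energy_end_points(signal)
```

A fixed energy threshold of this kind works on clean recordings but tends to fail when background noise raises the energy of non-speech frames, which is the situation a more robust end point detector is meant to handle. Under the formula above, a drop in EER from a hypothetical 10% to 7.5% corresponds to a 25% relative improvement.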



Author information

Authors and Affiliations

Electro Medical and Speech Technology Laboratory, Department of Electronics and Electrical Engineering, Indian Institute of Technology Guwahati, Guwahati, 781039, India
Ramesh K. Bhukya & S. R. Mahadeva Prasanna
Department of Electrical Engineering, Indian Institute of Technology Dharwad, Dharwad, 580011, India
S. R. Mahadeva Prasanna
Bay Area Advanced Analytics India (P) Ltd., a Kaliber.AI company, Guwahati, 781039, India
Biswajit Dev Sarma


Corresponding author

Correspondence to Ramesh K. Bhukya.



About this article

Cite this article

Bhukya, R.K., Prasanna, S.R.M. & Sarma, B.D. Robust Methods for Text-Dependent Speaker Verification. Circuits Syst Signal Process 38, 5253–5288 (2019). https://doi.org/10.1007/s00034-019-01125-x


Received: 24 September 2018
Revised: 18 April 2019
Accepted: 19 April 2019
Published: 03 May 2019
Issue Date: November 2019
DOI: https://doi.org/10.1007/s00034-019-01125-x
