
Gammatone filterbank and symbiotic combination of amplitude and phase-based spectra for robust speaker verification under noisy conditions and compression artifacts

Multimedia Tools and Applications

Abstract

The main novelty of this work is the use of a Gammatone filter-bank in place of the Mel filter-bank in the extraction pipeline of the Product Spectrum (PS). The proposed feature is dubbed Gammatone Product-Spectrum Cepstral Coefficients (GPSCC). Experiments are conducted on the TIMIT and noisy TIMIT corpora using the Gaussian Mixture Model with Universal Background Model (GMM-UBM) recognition algorithm. Performance evaluations indicate that GPSCC yields a drastic reduction in Equal Error Rate (EER) compared with other related features, and this gain is more pronounced at low signal-to-noise ratios. Our study also demonstrates the merit of the Gammatone filter-bank in improving robustness to codec-degraded speech at different bit rates. Furthermore, the proposed GPSCC feature achieves the best verification performance under aggressive compression: at 6.60 kbps, GPSCC achieves an absolute error reduction of 12% compared with Mel Frequency Cepstral Coefficients (MFCC).
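The feature pipeline summarized above (product spectrum, then a Gammatone filter-bank in place of the Mel filter-bank, then cepstral coefficients) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The product spectrum is taken as the product of the power spectrum and the group delay function, which reduces to X_R·Y_R + X_I·Y_I, where X is the DFT of the frame x[n] and Y is the DFT of n·x[n]. Everything else is an assumption of the sketch: ERB-spaced centre frequencies, the usual fourth-order gammatone magnitude approximation, 25 ms Hamming frames with a 10 ms hop, a 512-point FFT, 32 filters, 13 coefficients, and taking the absolute value of the filter-bank outputs before the logarithm. All function names are hypothetical.

```python
import numpy as np


def erb_space(f_low, f_high, n):
    """Centre frequencies equally spaced on the ERB-rate scale (Glasberg & Moore)."""
    to_erb = lambda f: 21.4 * np.log10(1.0 + 0.00437 * f)
    from_erb = lambda e: (10.0 ** (e / 21.4) - 1.0) / 0.00437
    return from_erb(np.linspace(to_erb(f_low), to_erb(f_high), n))


def gammatone_fbank(n_filters, n_fft, fs, f_low=50.0, order=4):
    """Gammatone magnitude responses sampled on the rfft bins (assumed design).

    Uses |G(f)| = (1 + ((f - fc) / b)^2)^(-order/2) with b = 1.019 * ERB(fc),
    the standard approximation near fc; the paper's filter design may differ.
    """
    fc = erb_space(f_low, fs / 2.0, n_filters)
    b = 1.019 * 24.7 * (4.37 * fc / 1000.0 + 1.0)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    ratio = (freqs[None, :] - fc[:, None]) / b[:, None]
    return (1.0 + ratio ** 2) ** (-order / 2.0)  # shape (n_filters, n_fft//2 + 1)


def product_spectrum(frame, n_fft):
    """Power spectrum times group delay: X_R * Y_R + X_I * Y_I, Y = DFT(n * x[n])."""
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, n_fft)
    Y = np.fft.rfft(n * frame, n_fft)
    return X.real * Y.real + X.imag * Y.imag


def dct_ii(x, n_out):
    """Unnormalised DCT-II of a 1-D vector, keeping the first n_out coefficients."""
    N = len(x)
    k = np.arange(n_out)[:, None]
    n = np.arange(N)[None, :]
    return np.cos(np.pi * k * (2 * n + 1) / (2 * N)) @ x


def gpscc_sketch(signal, fs, frame_len=0.025, hop=0.010, n_fft=512,
                 n_filters=32, n_ceps=13):
    """Hypothetical GPSCC-style front end: frame -> PS -> gammatone -> log -> DCT."""
    L = int(round(frame_len * fs))
    H = int(round(hop * fs))
    if len(signal) < L:
        raise ValueError("signal shorter than one frame")
    if n_fft < L:
        raise ValueError("n_fft must be at least the frame length")
    window = np.hamming(L)
    fbank = gammatone_fbank(n_filters, n_fft, fs)
    n_frames = 1 + (len(signal) - L) // H
    feats = np.empty((n_frames, n_ceps))
    for t in range(n_frames):
        frame = signal[t * H: t * H + L] * window
        energies = fbank @ product_spectrum(frame, n_fft)
        # The PS can be negative; taking its magnitude before the log is an
        # assumption of this sketch, made only to keep the log well defined.
        feats[t] = dct_ii(np.log(np.abs(energies) + 1e-10), n_ceps)
    return feats


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(16000)  # 1 s of noise at 16 kHz, TIMIT's sampling rate
    print(gpscc_sketch(x, 16000).shape)  # (98, 13)
```

Replacing `gammatone_fbank` with a Mel filter-bank matrix of the same shape would give the Mel-based product-spectrum pipeline that GPSCC modifies, which makes the two front ends easy to compare on the same frames.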

Figs. 1–8 (figures not reproduced in this preview)

Notes

  1. Code available at: http://cs.uef.fi/pages/tkinnu/VQVAD/VQVAD.zip.

Author information

Corresponding author

Correspondence to M. Bengherabi.

About this article

Cite this article

Fedila, M., Bengherabi, M. & Amrouche, A. Gammatone filterbank and symbiotic combination of amplitude and phase-based spectra for robust speaker verification under noisy conditions and compression artifacts. Multimed Tools Appl 77, 16721–16739 (2018). https://doi.org/10.1007/s11042-017-5237-1

  • DOI: https://doi.org/10.1007/s11042-017-5237-1