Acoustic feature extraction method for robust speaker identification

Li, Zuoqiang; Gao, Yong

doi:10.1007/s11042-015-2660-z

Published: 05 May 2015

Volume 75, pages 7391–7406 (2016)

Multimedia Tools and Applications

553 Accesses
15 Citations

Abstract

When the acoustic conditions of training and testing are mismatched, the performance of automatic speaker identification systems degrades significantly. This paper therefore proposes a robust feature extraction method for speaker recognition based on the gammatone filter. Instead of the traditional triangular filter banks, gammatone filter banks are used to simulate the frequency analysis performed by the cochlea of the human ear. Cube-root compression, equal-loudness weighting, and relative spectral (RASTA) filtering are incorporated into the feature extraction process. Simulation experiments are conducted with a Gaussian mixture model (GMM) recognizer. The experimental results indicate that the proposed features are more robust and represent speaker characteristics better than conventional mel-frequency cepstral coefficients (MFCC), cochlear filter cepstral coefficients (CFCC), and relative spectral perceptual linear predictive (RASTA-PLP) parameters.
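The abstract only names the processing stages, so the pure-Python sketch below chains them into a gammatone cepstral front end for illustration. It is not the authors' implementation. The stage order (log energy, RASTA filtering, back to linear, equal-loudness weighting, cube-root compression, DCT) is an assumption borrowed from the RASTA-PLP recipe of Hermansky and Morgan (1994); the ERB formulas and the 1.019 bandwidth factor are the common Glasberg-Moore/Patterson choices; and every function name, parameter and default (channel count, 25 ms windows, 10 ms hop, impulse-response length, upper band edge) is hypothetical.

```python
import cmath
import math

# Hypothetical sketch of a gammatone cepstral front end; NOT the authors' code.
# Assumed stage order (from RASTA-PLP): log -> RASTA -> exp -> equal loudness
# -> cube root -> DCT. All defaults below are illustrative assumptions.

def erb(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore, 1990)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def erb_space(low, high, n):
    """n centre frequencies spaced uniformly on the ERB-rate scale."""
    def to_rate(f):
        return 21.4 * math.log10(4.37e-3 * f + 1.0)
    def from_rate(e):
        return (10.0 ** (e / 21.4) - 1.0) / 4.37e-3
    lo, hi = to_rate(low), to_rate(high)
    return [from_rate(lo + i * (hi - lo) / (n - 1)) for i in range(n)]

def gammatone_ir(fc, fs, length, order=4, b=1.019):
    """Sampled gammatone impulse response t^(n-1) exp(-2*pi*b*ERB*t) cos(2*pi*fc*t),
    scaled to unit gain at the centre frequency fc."""
    h = [(k / fs) ** (order - 1)
         * math.exp(-2 * math.pi * b * erb(fc) * k / fs)
         * math.cos(2 * math.pi * fc * k / fs) for k in range(length)]
    gain = abs(sum(v * cmath.exp(-2j * math.pi * fc * k / fs) for k, v in enumerate(h)))
    return [v / gain for v in h]

def fir(x, h):
    """Causal FIR filtering by direct convolution; output has len(x) samples."""
    return [sum(h[j] * x[i - j] for j in range(min(len(h), i + 1)))
            for i in range(len(x))]

def rasta(traj, pole=0.98):
    """RASTA band-pass filter along time, zero initial conditions:
    H(z) = 0.1 (2 + z^-1 - z^-3 - 2 z^-4) / (1 - pole z^-1)."""
    num = [0.2, 0.1, 0.0, -0.1, -0.2]
    out, prev = [], 0.0
    for t in range(len(traj)):
        acc = sum(num[k] * traj[t - k] for k in range(len(num)) if t >= k)
        prev = acc + pole * prev
        out.append(prev)
    return out

def equal_loudness(fc):
    """PLP equal-loudness weight (Hermansky, 1990) at frequency fc in Hz."""
    w2 = (2 * math.pi * fc) ** 2
    return (w2 + 56.8e6) * w2 ** 2 / ((w2 + 6.3e6) ** 2 * (w2 + 0.38e9))

def gammatone_features(x, fs, n_chan=16, n_ceps=12, low=50.0,
                       win=0.025, hop=0.010, ir_dur=0.025):
    """Return one n_ceps-dimensional feature vector per frame of x."""
    flen, fhop = int(round(win * fs)), int(round(hop * fs))
    if len(x) < flen or n_ceps > n_chan:
        raise ValueError("signal shorter than one frame or too many cepstra")
    n_frames = 1 + (len(x) - flen) // fhop
    fcs = erb_space(low, 0.45 * fs, n_chan)  # assumed upper edge below Nyquist
    smoothed = []  # per channel: RASTA-filtered log frame energies
    for fc in fcs:
        y = fir(x, gammatone_ir(fc, fs, int(round(ir_dur * fs))))
        energy = [sum(v * v for v in y[t * fhop:t * fhop + flen]) / flen
                  for t in range(n_frames)]
        smoothed.append(rasta([math.log(e + 1e-12) for e in energy]))
    feats = []
    for t in range(n_frames):
        # back to linear energy, equal-loudness weighting, cube-root compression
        spec = [(equal_loudness(fc) * math.exp(smoothed[c][t])) ** (1.0 / 3.0)
                for c, fc in enumerate(fcs)]
        # DCT-II decorrelates the channel outputs into cepstral-like coefficients
        feats.append([sum(s * math.cos(math.pi * k * (m + 0.5) / n_chan)
                          for m, s in enumerate(spec)) for k in range(n_ceps)])
    return feats
```

For example, `gammatone_features(x, 8000, n_chan=8, n_ceps=6)` on 800 samples (0.1 s at 8 kHz) yields 8 frames of 6 coefficients each. Direct convolution keeps the sketch dependency-free but is slow; a practical front end would use FFT-based or recursive gammatone filters. In the paper's experiments, frame vectors of this kind are modelled per speaker with a GMM.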


Similar content being viewed by others

Mixture linear prediction Gammatone Cepstral features for robust speaker verification under transmission channel noise (Article, 09 March 2020)

A novel hybrid feature method based on Caelen auditory model and gammatone filterbank for robust speaker recognition under noisy environment and speech coding distortion (Article, 27 October 2022)

New Front End Based on Multitaper and Gammatone Filters for Robust Speaker Verification

References

Acero A (1993) Acoustical and environmental robustness in automatic speech recognition. Springer, vol 201
Cohen JR (1989) Application of an auditory model to speech recognition. J Acoust Soc Am 85:2623–2629
Davis S, Mermelstein P (1980) Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans Acoust Speech Signal Process 28(4):357–366
Hermansky H (1990) Perceptual linear predictive (PLP) analysis of speech. J Acoust Soc Am 87(4):1738–1752
Hermansky H, Morgan N (1994) RASTA processing of speech. IEEE Trans Speech Audio Process 2(4):578–589
Hermansky H, Morgan N, Bayya A, et al (1992) RASTA-PLP speech analysis technique. In: ICASSP-92, IEEE International Conference on Acoustics, Speech and Signal Processing, vol 1, pp 121–124
Huang X, Acero A, Hon HW (2001) Spoken language processing. Prentice Hall, Englewood Cliffs
Hunt M, Lefebvre C (1988) Speaker dependent and independent speech recognition experiments with an auditory model. In: ICASSP-88, IEEE International Conference on Acoustics, Speech and Signal Processing, pp 215–218
Johannesma PIM (1972) The pre-response stimulus ensemble of neurons in the cochlear nucleus. In: Symposium on Hearing Theory. IPO, Eindhoven, pp 58–69
Li Q, Huang Y (2011) An auditory-based feature extraction algorithm for robust speaker identification under mismatched conditions. IEEE Trans Audio Speech Lang Process 19(6):1791–1801
Lyon RF, Katsiamis AG, Drakakis EM (2010) History and future of auditory filter models. In: ISCAS, IEEE International Symposium on Circuits and Systems, pp 3809–3812
Nemer E, Goubran R, Mahmoud S (2001) Robust voice activity detection using higher-order statistics in the LPC residual domain. IEEE Trans Speech Audio Process 9(3):217–231
Panagiotakis C, Tziritas G (2005) A speech/music discriminator based on RMS and zero-crossings. IEEE Trans Multimed 7(1):155–166
Rabiner LR (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE 77(2):257–286
Reynolds DA, Rose RC (1995) Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Trans Speech Audio Process 3(1):72–83
Saeidi R, Pohjalainen J, Kinnunen T, et al (2010) Temporally weighted linear prediction features for tackling additive noise in speaker verification. IEEE Signal Process Lett 17(6):599–602
Sahidullah M, Saha G (2012) Design, analysis and experimental evaluation of block based transformation in MFCC computation for speaker recognition. Speech Commun 54(4):543–565
Seneff S (1990) A joint synchrony/mean-rate model of auditory speech processing. In: Readings in speech recognition. Morgan Kaufmann, pp 101–111
Shao Y, Jin Z, Wang DL, et al (2009) An auditory-based feature for robust speech recognition. In: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing, pp 4625–4628
Stevens SS (1957) On the psychophysical law. Psychol Rev 64(3):153
Stevens SS (1972) Perceived level of noise by Mark VII and decibels (E). J Acoust Soc Am 51(2B):575–601
Varga A, Steeneken HJM (1993) Assessment for automatic speech recognition: II. NOISEX-92: a database and an experiment to study the effect of additive noise on speech recognition systems. Speech Commun 12(3):247–251
von Békésy G (1961) Concerning the pleasures of observing and the mechanics of the inner ear. Nobel Lecture, 11 December
Zhao X, Shao Y, Wang DL (2012) CASA-based robust speaker identification. IEEE Trans Audio Speech Lang Process 20(5):1608–1616
Zheng R, Zhang S, Xu B (2004) Text-independent speaker identification using GMM-UBM and frame level likelihood normalization. In: ISCSLP, IEEE International Symposium on Chinese Spoken Language Processing, pp 289–292
Zue V, Glass J, Goodine D, et al (1990) The SUMMIT speech recognition system: phonological modelling and lexical access. In: ICASSP-90, IEEE International Conference on Acoustics, Speech and Signal Processing, vol 1, pp 49–52

Acknowledgments

The authors thank the authors of the cited references for their extensive work, their co-workers for helpful comments, and the anonymous reviewers for comments that helped improve the paper.

Author information

Authors and Affiliations

College of Electronics and Information Engineering, Sichuan University, Chengdu, Sichuan, 610065, China
Zuoqiang Li & Yong Gao

Corresponding author

Correspondence to Zuoqiang Li.

About this article

Cite this article

Li, Z., Gao, Y. Acoustic feature extraction method for robust speaker identification. Multimed Tools Appl 75, 7391–7406 (2016). https://doi.org/10.1007/s11042-015-2660-z

Received: 05 November 2014
Revised: 20 April 2015
Accepted: 24 April 2015
Published: 05 May 2015
Issue Date: June 2016
DOI: https://doi.org/10.1007/s11042-015-2660-z
