Abstract
Automatic speaker recognition (ASR) is a challenging task when the duration of the test speech is very short, i.e., a few seconds. Source features extracted from short speech utterances have been shown to be effective in such cases. This paper proposes a text-independent speaker recognition system based on the LP residual. The discrete wavelet transform (DWT) and the stationary wavelet transform (SWT) are investigated for parameterizing the LP residual. DWT works well for denoising and compression, whereas SWT reconstructs a noisy signal better at higher levels of decomposition than DWT. The SWT/DWT coefficients of the LP residual are used to implement an i-vector/PLDA based speaker recognition system. The effectiveness of the system is evaluated on the 10 s–10 s task of the NIST speaker recognition evaluation (SRE) 2010 database. To evaluate robustness in degraded environments, the speech files are mixed with white noise from the NOISEX-92 database. Speaker recognition using SWT level-3 coefficients yields an equal error rate (EER) of 40 and a decision cost function (DCF) of 0.3956 for the voiced part of the signal on the 10 s training, 10 s testing data set. The proposed method is shown to give robust speaker recognition performance in terms of DCF.
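The front end described above (LP residual extraction followed by wavelet analysis) can be sketched in plain NumPy. The snippet below is a minimal, illustrative sketch, not the authors' implementation: it computes the LP residual by the autocorrelation method (Levinson-Durbin recursion plus inverse filtering) and applies a single-level Haar DWT and SWT. The paper's level-3 SWT decomposition, pitch synchronisation, and the i-vector/PLDA backend are omitted, and all function names here are hypothetical.

```python
import numpy as np

def lp_residual(x, order=10):
    """LP residual via the autocorrelation method: Levinson-Durbin
    recursion for the predictor, then inverse filtering."""
    n = len(x)
    r = np.correlate(x, x, mode="full")[n - 1:n + order]  # r[0..order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    e = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / e                     # reflection coefficient
        rev = a[:i][::-1].copy()         # a[i-1], ..., a[0]
        a[1:i + 1] += k * rev
        e *= (1.0 - k * k)               # prediction error update
    # inverse filter: residual[t] = sum_j a[j] * x[t - j]
    return np.convolve(a, x)[:n]

def haar_dwt(x):
    """One level of the decimated Haar DWT (output length halves)."""
    x = x[:len(x) // 2 * 2]              # truncate to even length
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return approx, detail

def haar_swt(x):
    """One level of the stationary (undecimated) Haar wavelet
    transform with circular extension: output length equals input."""
    xs = np.roll(x, -1)
    approx = (x + xs) / np.sqrt(2.0)
    detail = (x - xs) / np.sqrt(2.0)
    return approx, detail

# Demo on a synthetic voiced-like signal (sinusoid plus noise).
rng = np.random.default_rng(0)
t = np.arange(512)
x = np.sin(2 * np.pi * 0.03 * t) + 0.1 * rng.standard_normal(512)
res = lp_residual(x, order=10)
ca, cd = haar_swt(res)                   # same length as the residual
a1, d1 = haar_dwt(res)                   # half the length
```

Because the Haar DWT is orthonormal it preserves signal energy, while the undecimated SWT (being redundant) doubles it; these invariants make the sketch easy to sanity-check before swapping in a higher-order wavelet or more decomposition levels.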
Acknowledgements
The authors would like to thank the Kerala State Council for Science, Technology and Environment (KSCSTE), India, for its support. They would also like to thank the Indian Institute of Technology Hyderabad (IITH) for technical support.
Cite this article
Sreehari, V.R., Mary, L. Automatic short utterance speaker recognition using stationary wavelet coefficients of pitch synchronised LP residual. Int J Speech Technol 25, 147–161 (2022). https://doi.org/10.1007/s10772-021-09895-z