Processing degraded speech for text dependent speaker verification

Khonglah, Banriskhem K.; Bhukya, Ramesh K.; Prasanna, S. R. Mahadeva

doi:10.1007/s10772-017-9451-z

Published: 24 August 2017

Volume 20, pages 839–850, (2017)

International Journal of Speech Technology

Banriskhem K. Khonglah¹, Ramesh K. Bhukya¹ (ORCID: orcid.org/0000-0002-6221-5627) & S. R. Mahadeva Prasanna¹

Abstract

This work explores speech enhancement as a way of processing degraded speech before it is passed to a text-dependent speaker verification system. The degradation may be due to noise or background speech. The speaker verification system is based on the dynamic time warping (DTW) method, which makes end point detection necessary. End points are easy to detect when the speech is clean, but degradation introduces errors in their estimation, and these errors propagate into the overall accuracy of the speaker verification system. Temporal and spectral enhancement is therefore applied to the degraded speech so that, ideally, the enhanced speech is similar in nature to clean speech. Results show that the temporal and spectral processing methods contribute to the task by suppressing the degradation, and improved accuracy is obtained for the DTW-based text-dependent speaker verification system.
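The dependence of DTW on end points can be illustrated with a minimal sketch. It is not the authors' implementation: the energy-threshold end point detector, the Euclidean local cost, the unconstrained step pattern, and the names `detect_endpoints`, `dtw_distance`, `frame_len` and `threshold_ratio` are all assumptions chosen for illustration. The abstract does not specify which end point detector, features, or DTW constraints the paper uses.

```python
import math


def detect_endpoints(signal, frame_len=4, threshold_ratio=0.1):
    """Energy-based end point detection (an assumed, simplified detector).

    Splits the signal into non-overlapping frames, computes the energy of
    each frame, and returns (start, end) sample indices spanning the first
    to the last frame whose energy exceeds threshold_ratio * peak energy.
    Returns None if no frame is active.
    """
    energies = [
        sum(x * x for x in signal[i:i + frame_len])
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]
    if not energies or max(energies) == 0:
        return None
    threshold = threshold_ratio * max(energies)
    active = [k for k, e in enumerate(energies) if e > threshold]
    if not active:
        return None
    return active[0] * frame_len, (active[-1] + 1) * frame_len


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors.

    Uses a Euclidean local cost and the basic step pattern
    (insertion, deletion, match) with no global path constraint.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the best accumulated cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

In such a setup, the test utterance would be trimmed to the detected end points, converted to feature vectors, and compared with the enrolled template using `dtw_distance`. Because the DTW path is anchored at the first and last frames, noise that shifts the detected end points changes which frames are forced to align, which is one way the errors described above can reach the verification score.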

Acknowledgements

The authors would like to thank Mr. Rajib Sharma, in the EEE department of IITG, Guwahati, for his help in making this work come to fruition.

Author information

Authors and Affiliations

Electro Medical and Speech Technology Laboratory, Department of Electronics and Communication Engineering, Indian Institute of Technology Guwahati, Guwahati, Assam, 781039, India
Banriskhem K. Khonglah, Ramesh K. Bhukya & S. R. Mahadeva Prasanna

Corresponding author

Correspondence to Ramesh K. Bhukya.

About this article

Cite this article

Khonglah, B.K., Bhukya, R.K. & Prasanna, S.R.M. Processing degraded speech for text dependent speaker verification. Int J Speech Technol 20, 839–850 (2017). https://doi.org/10.1007/s10772-017-9451-z

Received: 18 November 2016
Accepted: 07 August 2017
Published: 24 August 2017
Issue Date: December 2017
DOI: https://doi.org/10.1007/s10772-017-9451-z
