
Processing degraded speech for text dependent speaker verification

International Journal of Speech Technology

Abstract

This work explores speech enhancement as a way to process degraded speech for a text-dependent speaker verification system. The degradation may be due to noise or background speech. The text-dependent speaker verification system is based on the dynamic time warping (DTW) method, which requires end point detection. End point detection is easy when the speech is clean. However, degradation tends to introduce errors in the estimated end points, and these errors propagate into the overall accuracy of the speaker verification system. Temporal and spectral enhancement is therefore performed on the degraded speech so that, ideally, the enhanced speech resembles the clean speech. Results show that the temporal and spectral processing methods contribute to the task by eliminating the degradation, and that improved accuracy is obtained for the text-dependent speaker verification system using DTW.
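To make the dependence of DTW on end point detection concrete, the following minimal Python sketch pairs a simple energy-threshold end point detector with a textbook DTW distance. It is an illustration under stated assumptions, not the method used in the article: the function names, the non-overlapping frames, the `ratio` threshold rule, and the Euclidean local cost are all hypothetical choices made for this example.

```python
import math


def frame_energies(signal, frame_len):
    """Short-time energy of consecutive non-overlapping frames."""
    return [
        sum(x * x for x in signal[i:i + frame_len])
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def detect_endpoints(signal, frame_len=4, ratio=0.1):
    """Return (begin, end) sample indices of the active region.

    Assumption for illustration: a frame counts as speech when its energy
    exceeds ratio * (maximum frame energy). Returns None for an empty or
    all-zero signal.
    """
    energies = frame_energies(signal, frame_len)
    if not energies or max(energies) == 0:
        return None
    threshold = ratio * max(energies)
    active = [i for i, e in enumerate(energies) if e > threshold]
    return active[0] * frame_len, (active[-1] + 1) * frame_len


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors.

    Uses a Euclidean local cost and the three basic steps
    (i-1, j), (i, j-1), (i-1, j-1), with no slope or window constraint.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


# Toy signal: silence, a burst of "speech", silence.
signal = [0.0] * 8 + [1.0, -1.0, 1.0, -1.0] * 2 + [0.0] * 8
endpoints = detect_endpoints(signal)

# A template and a time-stretched copy of it, as one-dimensional features.
template = [(0.0,), (1.0,), (2.0,)]
stretched = [(0.0,), (1.0,), (1.0,), (2.0,)]
stretched_cost = dtw_distance(template, stretched)
```

On clean input a threshold rule of this kind isolates the active region, but additive noise or background speech raises the energy of non-speech frames and shifts the detected end points. The DTW then aligns the wrong segment against the reference template, which is the error propagation the abstract describes.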



Acknowledgements

The authors would like to thank Mr. Rajib Sharma of the EEE department of IITG, Guwahati, for his help in bringing this work to fruition.

Author information

Corresponding author

Correspondence to Ramesh K. Bhukya.


Cite this article

Khonglah, B.K., Bhukya, R.K. & Prasanna, S.R.M. Processing degraded speech for text dependent speaker verification. Int J Speech Technol 20, 839–850 (2017). https://doi.org/10.1007/s10772-017-9451-z


  • DOI: https://doi.org/10.1007/s10772-017-9451-z
