Abstract
This work explores speech enhancement as a front end for a text-dependent speaker verification system that must operate on degraded speech. The degradation may be due to noise or background speech. The verification system is based on dynamic time warping (DTW), which requires the end points of each utterance to be detected. End point detection is straightforward when the speech is clean, but degradation introduces errors in the estimated end points, and these errors propagate into the overall accuracy of the speaker verification system. Temporal and spectral enhancement is therefore applied to the degraded speech so that, ideally, the enhanced speech resembles clean speech. Results show that temporal and spectral processing contributes to the task by suppressing the degradation, and that it improves the accuracy of the DTW-based text-dependent speaker verification system.
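To make the dependence of DTW on end point detection concrete, here is a minimal Python sketch of a generic DTW-based verification step. It is not the authors' implementation: the energy-threshold end point detector, the `ratio` parameter, the length normalisation, and the function names `trim_endpoints`, `dtw_distance`, and `verify` are all assumptions made for illustration, and the abstract does not state which features or end point method the paper uses.

```python
import math


def trim_endpoints(frames, energies, ratio=0.1):
    """Keep the frames between the first and last frame whose energy
    exceeds ratio * max(energies). A simple energy detector, assumed here
    only for illustration; it is the step that degradation tends to break."""
    threshold = ratio * max(energies)
    active = [k for k, e in enumerate(energies) if e > threshold]
    return frames[active[0]:active[-1] + 1]


def dtw_distance(a, b):
    """Length-normalised DTW distance between two sequences of feature vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats a frame
                cost[i][j - 1],      # b advances, a repeats a frame
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m] / (n + m)


def verify(test, template, threshold):
    """Accept the claimed identity if the DTW distance is within the threshold."""
    return dtw_distance(test, template) <= threshold


# Toy one-dimensional "features": the test utterance is a time-stretched
# copy of the template, so the warping absorbs the difference in timing.
template = [(0.0,), (1.0,), (2.0,), (1.0,), (0.0,)]
stretched = [(0.0,), (0.0,), (1.0,), (2.0,), (2.0,), (1.0,), (0.0,)]
print(dtw_distance(template, stretched))
print(verify(stretched, template, threshold=0.5))
```

Because DTW aligns the first frame of the test utterance with the first frame of the template, and the last with the last, any noise frames that survive `trim_endpoints` are forced into the alignment and inflate the distance. This is one way to see why end point errors propagate into verification accuracy.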
Acknowledgements
The authors would like to thank Mr. Rajib Sharma, in the EEE department of IITG, Guwahati, for his help in making this work come to fruition.
Cite this article
Khonglah, B.K., Bhukya, R.K. & Prasanna, S.R.M. Processing degraded speech for text dependent speaker verification. Int J Speech Technol 20, 839–850 (2017). https://doi.org/10.1007/s10772-017-9451-z
DOI: https://doi.org/10.1007/s10772-017-9451-z