
End Point Detection Using Speech-Specific Knowledge for Text-Dependent Speaker Verification

Circuits, Systems, and Signal Processing

Abstract

This paper proposes a method that uses speech-specific knowledge to detect the begin and end points of speech under degraded conditions. The method is based on vowel-like region (VLR) detection and uses both excitation source and vocal tract system information. The existing method for VLR detection uses excitation source information; here, vocal tract system information in the form of the dominant resonant frequency is used to eliminate spurious VLRs in background noise. Foreground speech segmentation using excitation source and vocal tract system information is carried out to remove spurious VLRs in background speech regions. The end points are localized more accurately by using more detailed excitation source information, in terms of glottal activity, to detect sonorant consonants and missed VLRs. To include unvoiced consonants, obstruent region detection is performed at the beginning of the first VLR and at the end of the last VLR. The detected begin and end points are evaluated both by comparison with manually marked end points and through text-dependent speaker verification experiments. The proposed method performs better than some existing techniques.
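The pipeline described above starts from detected VLRs, prunes spurious ones using the dominant resonant frequency (DRF), and then widens the span to take in consonants before reading off the begin and end points. The Python sketch below illustrates only the DRF pruning, end-point selection, and evaluation steps, under stated assumptions; it is not the authors' implementation. It assumes VLRs have already been detected and are given as `(start_s, end_s, drf_hz)` tuples, takes the DRF of a frame to be the frequency of the highest peak of its linear prediction (LP) spectrum, and uses hypothetical DRF limits plus a fixed padding as a crude stand-in for obstruent region detection. Foreground speech segmentation and glottal activity detection are not modeled. All function names and numeric parameters are illustrative.

```python
import numpy as np


def lp_coefficients(frame, order):
    """Autocorrelation-method linear prediction via Levinson-Durbin.

    Returns a = [1, a1, ..., ap] for the inverse filter A(z) = 1 + sum_k a_k z^-k.
    """
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1 : n + order]  # lags 0..order
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err                          # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1 : 0 : -1]  # RHS is evaluated before assignment
        a[i] = k
        err *= 1.0 - k * k                      # updated prediction error power
    return a


def dominant_resonant_frequency(frame, fs, order=10, nfft=1024):
    """Frequency (Hz) of the highest peak of the LP magnitude spectrum.

    Assumption: DRF is taken from the LP spectrum of a Hamming-windowed frame.
    """
    windowed = frame * np.hamming(len(frame))
    a = lp_coefficients(windowed, order)
    lp_spectrum = 1.0 / np.abs(np.fft.rfft(a, nfft))  # |1 / A(e^jw)|
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    return freqs[np.argmax(lp_spectrum)]


def detect_end_points(vlrs, drf_range=(250.0, 1500.0), obstruent_pad=0.06):
    """Begin/end points (s) from VLRs given as (start_s, end_s, drf_hz) tuples.

    drf_range and obstruent_pad are hypothetical placeholder values; the fixed
    pad only stands in for real obstruent region detection.
    """
    lo, hi = drf_range
    kept = [(s, e) for s, e, drf in vlrs if lo <= drf <= hi]  # drop spurious VLRs
    if not kept:
        return None
    begin = max(0.0, min(s for s, _ in kept) - obstruent_pad)
    end = max(e for _, e in kept) + obstruent_pad
    return begin, end


def boundary_errors_ms(detected, manual):
    """Absolute begin/end errors in milliseconds against manual marks."""
    return tuple(abs(d - m) * 1000.0 for d, m in zip(detected, manual))
```

For example, with VLRs `(0.40, 0.55, 700)`, `(0.10, 0.12, 2600)` and `(0.80, 1.00, 650)`, the second entry falls outside the assumed DRF range and is discarded, giving begin and end points of about 0.34 s and 1.06 s. A fixed pad keeps the sketch short; the method described in the abstract instead detects obstruent regions from the signal.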




Author information

Correspondence to Ramesh K. Bhukya.


Cite this article

Bhukya, R.K., Sarma, B.D. & Prasanna, S.R.M. End Point Detection Using Speech-Specific Knowledge for Text-Dependent Speaker Verification. Circuits Syst Signal Process 37, 5507–5539 (2018). https://doi.org/10.1007/s00034-018-0827-3


  • DOI: https://doi.org/10.1007/s00034-018-0827-3
