Epoch Extraction from Telephonic Speech Signal using Stockwell Transform

Circuits, Systems, and Signal Processing

Abstract

Speech is produced by exciting the time-varying vocal tract with time-varying impulse-like excitations called epochs. Epoch extraction methods in the literature perform well on clean speech, but detecting epoch locations in a band-limited signal such as telephonic speech is difficult because information at low frequencies is lost. This paper proposes a Stockwell transform (S-Transform)-based method that can find epochs accurately in telephonic speech. The frequency-dependent Gaussian window and the localization capabilities of the S-Transform reduce the effect of the bandpass nature of the telephonic channel. The telephonic channel is simulated using a 300–3400 Hz bandpass filter. The proposed method is evaluated on data from five speakers, namely BDL, SLT, JMK, KED, and RAB, from the CMU ARCTIC database. The results are compared with state-of-the-art methods for both clean speech and telephonic speech. The proposed method produces results comparable to those of existing methods on clean speech and shows an improvement of 4.68% over state-of-the-art methods on telephonic speech.
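The two signal-processing ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the Butterworth design, the filter order, the zero-phase filtering, and the function names `simulate_telephone_channel` and `stockwell_transform` are assumptions made for illustration, and the abstract does not describe the epoch-picking step, so none is shown. The S-Transform follows the standard FFT-based discrete formulation, in which the Gaussian window width scales inversely with the frequency index.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def simulate_telephone_channel(x, fs, low_hz=300.0, high_hz=3400.0, order=4):
    """Band-limit x to low_hz..high_hz (Hz).

    Assumption: a zero-phase Butterworth bandpass; the abstract only states
    that a 300-3400 Hz bandpass filter is used, not its design.
    """
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)


def stockwell_transform(x):
    """Discrete S-Transform computed in the frequency domain.

    Returns a complex array of shape (N // 2 + 1, N): row n is the "voice"
    at frequency index n, columns are time samples.
    """
    x = np.asarray(x, dtype=float)
    N = x.size
    X = np.fft.fft(x)
    # Integer frequency offsets in FFT order: 0, 1, ..., -2, -1
    m = np.fft.fftfreq(N, d=1.0 / N)
    S = np.empty((N // 2 + 1, N), dtype=complex)
    S[0, :] = x.mean()  # the zero-frequency voice is the signal mean
    for n in range(1, N // 2 + 1):
        # Fourier transform of a Gaussian window whose time width is ~ 1/n
        gauss = np.exp(-2.0 * np.pi**2 * m**2 / n**2)
        # Shift the spectrum by n, weight by the Gaussian, return to time
        S[n, :] = np.fft.ifft(np.roll(X, -n) * gauss)
    return S
```

A typical use would be `S = stockwell_transform(simulate_telephone_channel(frame, fs))` on a short frame of speech. The loop costs one inverse FFT per frequency index, so long recordings are better processed frame by frame.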

Data availability

The datasets used in the current study are from the CMU ARCTIC database, available at http://www.festvox.org/dbs/index.html.

Acknowledgements

The authors would like to thank the Department of Science and Technology (DST), India, for supporting Paidi Gangamohan through the project SRG/2020/001363.

Corresponding author

Correspondence to Botsa Kishore Kumar.

About this article

Cite this article

Kumar, B.K., Gangamohan, P. & Gangashetty, S.V. Epoch Extraction from Telephonic Speech Signal using Stockwell Transform. Circuits Syst Signal Process 42, 4238–4251 (2023). https://doi.org/10.1007/s00034-023-02312-7

  • DOI: https://doi.org/10.1007/s00034-023-02312-7
