Epoch Extraction from Telephonic Speech Signal using Stockwell Transform

Kumar, Botsa Kishore; Gangamohan, Paidi; Gangashetty, Suryakanth V.

doi:10.1007/s00034-023-02312-7

Published: 26 February 2023

Volume 42, pages 4238–4251, (2023)

Circuits, Systems, and Signal Processing

Botsa Kishore Kumar ORCID: orcid.org/0000-0002-8805-3814¹,
Paidi Gangamohan² &
Suryakanth V. Gangashetty³

Abstract

Speech is produced by exciting the time-varying vocal tract with time-varying impulse-like excitations called epochs. Epoch extraction methods in the literature perform well on clean speech, but detecting epoch locations in a band-limited signal such as telephonic speech is difficult because information at low frequencies is lost. This paper proposes a Stockwell transform (S-Transform)-based method that finds epochs accurately in telephonic speech. The frequency-dependent Gaussian window and the localization capabilities of the S-Transform reduce the effect of the bandpass nature of the telephonic channel. The telephonic channel is simulated using a 300–3400 Hz bandpass filter. The proposed method is evaluated on data from five speakers, namely BDL, SLT, JMK, KED, and RAB, from the CMU Arctic database. The results are compared with state-of-the-art methods for both clean speech and telephonic speech. The proposed method produces results comparable with existing methods on clean speech and shows an improvement of 4.68% over state-of-the-art methods on telephonic speech.
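The two signal-processing ingredients named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and does not perform epoch extraction. The function names `simulate_telephone_channel` and `stockwell_transform` are hypothetical, the brick-wall FFT filter is an assumed stand-in for whatever 300–3400 Hz bandpass design the paper uses, and the transform follows the standard frequency-domain form of the discrete S-Transform, in which each frequency row is obtained by shifting the spectrum and weighting it with a Gaussian whose width grows with frequency.

```python
import numpy as np


def simulate_telephone_channel(x, fs, low_hz=300.0, high_hz=3400.0):
    """Brick-wall band-pass via the FFT (assumed stand-in for the paper's filter)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    # Zero every bin outside the assumed telephone band.
    spectrum[(freqs < low_hz) | (freqs > high_hz)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def stockwell_transform(x):
    """Discrete S-Transform; returns a complex array of shape (N//2 + 1, N)."""
    x = np.asarray(x, dtype=float)
    n_samples = len(x)
    spectrum = np.fft.fft(x)
    # Signed frequency-bin offsets m = 0, 1, ..., -2, -1 used by the Gaussian.
    m = np.fft.fftfreq(n_samples) * n_samples
    s = np.zeros((n_samples // 2 + 1, n_samples), dtype=complex)
    s[0, :] = x.mean()  # the zero-frequency row is the signal mean
    for n in range(1, n_samples // 2 + 1):
        # Frequency-domain Gaussian: wider at higher n, hence narrower in time.
        gaussian = np.exp(-2.0 * np.pi**2 * m**2 / n**2)
        # np.roll(spectrum, -n)[m] equals spectrum[m + n] (circularly).
        s[n, :] = np.fft.ifft(np.roll(spectrum, -n) * gaussian)
    return s
```

Row `n` of the result corresponds to the frequency `n * fs / N`, so rows between 300 Hz and 3400 Hz are the ones that survive the simulated channel. The loop costs one inverse FFT per frequency row, which is adequate for short frames but slow for whole utterances.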

Data availability

The datasets used in the current study are from the CMU Arctic database, available at http://www.festvox.org/dbs/index.html.

Acknowledgements

The authors would like to thank the Department of Science and Technology (DST), India, for supporting Paidi Gangamohan through the project SRG/2020/001363.

Author information

Authors and Affiliations

Speech Processing Laboratory, International Institute of Information Technology, Hyderabad, India
Botsa Kishore Kumar
Signal Processing and Machine Learning Laboratory, Koneru Lakshmaiah Education Foundation, Hyderabad Campus, Hyderabad, India
Paidi Gangamohan
Koneru Lakshmaiah Education Foundation, Vaddeswaram, India
Suryakanth V. Gangashetty

Corresponding author

Correspondence to Botsa Kishore Kumar.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Kumar, B.K., Gangamohan, P. & Gangashetty, S.V. Epoch Extraction from Telephonic Speech Signal using Stockwell Transform. Circuits Syst Signal Process 42, 4238–4251 (2023). https://doi.org/10.1007/s00034-023-02312-7

Received: 19 October 2021
Revised: 31 January 2023
Accepted: 01 February 2023
Published: 26 February 2023
Issue Date: July 2023
DOI: https://doi.org/10.1007/s00034-023-02312-7
