Abstract
Speech is produced by exciting a time-varying vocal tract with time-varying, impulse-like excitations called epochs. Epoch extraction methods in the literature perform well on clean speech, but detecting epoch locations from a band-limited signal such as telephonic speech is difficult because information at low frequencies is lost. This paper proposes a Stockwell transform (S-Transform)-based method that can find epochs accurately from telephonic speech. The frequency-dependent Gaussian window and the localization capabilities of the S-Transform reduce the effect of the bandpass nature of the telephonic channel. The telephonic channel is simulated using a 300–3400 Hz bandpass filter. The proposed method is evaluated on data from five speakers, namely BDL, SLT, JMK, KED, and RAB, from the CMU Arctic database. The results are compared with state-of-the-art methods for both clean speech and telephonic speech. The proposed method produces results comparable to existing methods on clean speech and shows an improvement of 4.68% over state-of-the-art methods on telephonic speech.
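The two building blocks named in the abstract, a 300–3400 Hz band-limited channel and an S-Transform with a frequency-dependent Gaussian window, can be illustrated with a minimal NumPy sketch. It rests on assumptions that are not taken from the paper: the channel is modelled as an ideal brick-wall filter applied by FFT masking (the abstract does not state the filter design), the S-Transform uses the standard frequency-domain formulation with a Gaussian window whose width is inversely proportional to frequency, and the function names `simulate_telephone_channel` and `stockwell_transform` are hypothetical. The sketch does not implement the epoch-detection step itself.

```python
import numpy as np


def simulate_telephone_channel(x, fs, f_lo=300.0, f_hi=3400.0):
    """Band-limit x to [f_lo, f_hi] Hz with an ideal FFT mask.

    Assumption: a brick-wall filter stands in for the paper's
    (unspecified) 300-3400 Hz bandpass filter.
    """
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < f_lo) | (freqs > f_hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def stockwell_transform(x):
    """Discrete S-Transform computed in the frequency domain.

    Returns a complex array of shape (N//2 + 1, N): rows are frequency
    bins 0..N//2, columns are time samples.
    """
    x = np.asarray(x, dtype=float)
    n_samples = len(x)
    spectrum = np.fft.fft(x)
    # Integer frequency offsets 0, 1, ..., -2, -1 (wrapped, so the
    # Gaussian below is symmetric around zero offset).
    offsets = np.fft.fftfreq(n_samples, d=1.0 / n_samples)
    s = np.empty((n_samples // 2 + 1, n_samples), dtype=complex)
    s[0, :] = x.mean()  # zero-frequency row is the signal mean
    for n in range(1, n_samples // 2 + 1):
        # Fourier transform of a Gaussian window whose time-domain
        # width scales as 1/frequency (narrower at higher frequencies).
        gauss = np.exp(-2.0 * np.pi**2 * offsets**2 / n**2)
        # Shift the spectrum by n bins, weight it, return to time.
        s[n, :] = np.fft.ifft(np.roll(spectrum, -n) * gauss)
    return s


# Toy signal: a 100 Hz tone (outside the passband) plus a 1000 Hz tone.
fs = 8000
t = np.arange(800) / fs
low_tone = np.sin(2 * np.pi * 100 * t)
mid_tone = np.sin(2 * np.pi * 1000 * t)

telephonic = simulate_telephone_channel(low_tone + mid_tone, fs)
s_matrix = stockwell_transform(telephonic)

# Frequency (Hz) of the row with the largest average magnitude.
bin_hz = fs / len(t)
dominant_hz = np.argmax(np.abs(s_matrix).mean(axis=1)) * bin_hz
```

In this toy case the 100 Hz component is removed by the simulated channel, which mirrors the loss of low-frequency information that the abstract identifies as the difficulty with telephonic speech. Summing each row of the S-Transform over time recovers the corresponding Fourier coefficient, which is a convenient sanity check on an implementation.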
Data availability
The datasets used in the current study are from the CMU Arctic database, available at http://www.festvox.org/dbs/index.html.
Acknowledgements
The authors would like to thank the Department of Science and Technology (DST), India, for supporting Paidi Gangamohan through the project SRG/2020/001363.
Cite this article
Kumar, B.K., Gangamohan, P. & Gangashetty, S.V. Epoch Extraction from Telephonic Speech Signal using Stockwell Transform. Circuits Syst Signal Process 42, 4238–4251 (2023). https://doi.org/10.1007/s00034-023-02312-7
DOI: https://doi.org/10.1007/s00034-023-02312-7