Epoch Extraction Using Zero Band Filtering from Speech Signal

Deepak, K. T.; Prasanna, S. R. M.

doi:10.1007/s00034-014-9957-4

Published: 25 December 2014

Volume 34, pages 2309–2333 (2015)

Circuits, Systems, and Signal Processing

Abstract

The zero frequency filter (ZFF) is a marginally stable infinite impulse response (IIR) resonator at 0 Hz that is used to extract epoch locations reliably from speech signals. However, the output of such an ideal resonator is a polynomially increasing or decreasing function of time. This trend is removed from the filtered output by subtracting the local average over 1–2 pitch periods, which yields the zero frequency filtered signal. As an alternative, this paper proposes a bounded-input bounded-output (BIBO) stable realization of ZFF for epoch extraction, whose output does not increase or decrease with time. The advantages of such a stable filter are that its output is bounded, it avoids the numerical precision problems that arise with lengthy speech files, and it does not need the trend-removal step, which requires an initial pitch estimate. The proposed approach is evaluated on the CMU-Arctic database under clean and degraded conditions. The method is further validated on singing voice and emotional speech to demonstrate its robustness to varying pitch. The proposed method is found to be robust over a wide range of parameter choices.
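
The conventional ZFF pipeline summarised in the abstract can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' code: it follows the usual ZFF formulation (difference the signal, pass it through two cascaded 0 Hz resonators, subtract a local mean over roughly 1–2 pitch periods, and take negative-to-positive zero crossings as epochs). The function names, the synthetic impulse-train input, the 121-sample window (about 1.5 pitch periods) and the three trend-removal passes are illustrative assumptions. The last part contrasts the ideal resonator (poles on the unit circle) with a variant whose poles sit at a hypothetical radius r = 0.99; it only demonstrates the bounded-output property the abstract refers to and is not the zero band filter proposed in the paper, whose design the abstract does not specify.

```python
import numpy as np


def zero_freq_resonator(x, r=1.0):
    """Cascade of two second-order resonators, each with a double pole at z = r.

    Each stage computes y[n] = 2*r*y[n-1] - r**2*y[n-2] + x[n].
    r = 1.0 gives the ideal (marginally stable) 0 Hz resonator used by ZFF;
    r < 1.0 is a hypothetical choice that moves the poles inside the unit
    circle, so the cascade becomes BIBO stable.
    """
    x = np.asarray(x, dtype=float)
    for _ in range(2):
        out = np.zeros_like(x)
        y1 = y2 = 0.0  # previous two outputs of the current stage
        for n in range(len(x)):
            out[n] = 2.0 * r * y1 - r * r * y2 + x[n]
            y2, y1 = y1, out[n]
        x = out
    return x


def remove_trend(y, win, passes=3):
    """Subtract a centred moving average of `win` samples (about 1-2 pitch periods).

    Repeating the subtraction is an assumption here; away from the signal
    edges it removes the polynomial trend left by the ideal resonators.
    """
    kernel = np.ones(win) / win
    for _ in range(passes):
        y = y - np.convolve(y, kernel, mode="same")
    return y


def zff_epochs(s, win):
    """Conventional ZFF: difference, resonate at 0 Hz, remove trend, find crossings."""
    d = np.diff(s, prepend=0.0)  # remove any DC offset / low-frequency bias
    z = remove_trend(zero_freq_resonator(d, r=1.0), win)
    # Epochs are taken at negative-to-positive zero crossings of z.
    epochs = np.where((z[:-1] < 0) & (z[1:] >= 0))[0] + 1
    return epochs, z


# Synthetic voiced-like excitation: unit impulses every P samples (100 Hz at 8 kHz).
P = 80
s = np.zeros(1600)
s[40::P] = 1.0
epochs, z = zff_epochs(s, win=121)  # window of roughly 1.5 pitch periods
inner = epochs[(epochs > 400) & (epochs < 1200)]  # skip edge-affected samples
print(np.diff(inner))  # spacing between detected epochs, close to P

# Boundedness: ideal (r = 1) vs. hypothetical stable (r = 0.99) resonators.
long_s = np.zeros(16000)
long_s[40::P] = 1.0
d_long = np.diff(long_s, prepend=0.0)
ideal = zero_freq_resonator(d_long, r=1.0)    # grows roughly like n**3 / (6 * P)
stable = zero_freq_resonator(d_long, r=0.99)  # settles to a bounded periodic output
print(np.abs(ideal).max(), np.abs(stable).max())
```

On this periodic test input the detected epochs in the interior are one pitch period apart, while the ideal resonator output reaches several billion within 16,000 samples and the r = 0.99 variant stays bounded. Note that the trend removal needs a window length, and hence a rough pitch estimate; this is the dependency the stable realization is said to avoid.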

Acknowledgments

This work is part of the ongoing project on the development of Prosodically guided phonetic Engine for Assamese language funded by the Technology Development for Indian Languages (TDIL) Programme initiated by the Department of Electronics & Information Technology (DeitY), Ministry of Communication & Information Technology (MC&IT), Govt. of India under the consortium mode headed by IIIT Hyderabad.

Author information

Authors and Affiliations

Indian Institute of Technology Guwahati, Guwahati, India
K. T. Deepak & S. R. M. Prasanna

Corresponding author

Correspondence to S. R. M. Prasanna.

About this article

Cite this article

Deepak, K.T., Prasanna, S.R.M. Epoch Extraction Using Zero Band Filtering from Speech Signal. Circuits Syst Signal Process 34, 2309–2333 (2015). https://doi.org/10.1007/s00034-014-9957-4

Received: 20 February 2014
Revised: 08 December 2014
Accepted: 09 December 2014
Published: 25 December 2014
Issue Date: July 2015
DOI: https://doi.org/10.1007/s00034-014-9957-4
