Abstract
The zero frequency filter (ZFF) is a marginally stable infinite impulse response filter that resonates at 0 Hz and is used to extract epoch locations reliably from speech signals. However, the output of such an ideal resonator is an exponentially increasing or decreasing function of time. This trend is removed from the filtered output by subtracting the local average computed over 1–2 pitch periods, which yields the zero frequency filtered signal. As an alternative, this paper proposes a bounded-input bounded-output (BIBO) stable realization of the ZFF for epoch extraction, whose output is not an increasing or decreasing function of time. The advantages of such a stable filter are that its output is bounded, that it avoids precision-related problems for lengthy speech files, and that it does not require the trend removal procedure, which needs an initial pitch estimate. The proposed approach is evaluated on the CMU-Arctic database under clean and degraded conditions. The method is also validated on singing voice and emotional speech to demonstrate its robustness to varying pitch. The proposed method is found to be robust over a wide range of parameter choices.
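To make the conventional scheme summarized in the abstract concrete, the following is a minimal Python sketch of a 0 Hz resonator followed by local-mean trend removal. It illustrates only the baseline ZFF, not the stable realization proposed in the paper, whose details are not given here. Several choices are assumptions made for illustration: the input is differenced first, two resonators are cascaded, the averaging window spans about 1.5 pitch periods, trend removal is applied twice, and negative-to-positive zero crossings are taken as candidate epochs. The function names and the toy impulse-train input are hypothetical.

```python
def resonator_0hz(x):
    """Ideal 0 Hz resonator: y[n] = 2*y[n-1] - y[n-2] + x[n] (double pole at z = 1)."""
    y, y1, y2 = [], 0.0, 0.0
    for v in x:
        y0 = 2.0 * y1 - y2 + v
        y.append(y0)
        y1, y2 = y0, y1
    return y


def remove_trend(y, half_win):
    """Subtract the local mean over 2*half_win + 1 samples (window clipped at the edges)."""
    out = []
    for i in range(len(y)):
        lo, hi = max(0, i - half_win), min(len(y), i + half_win + 1)
        out.append(y[i] - sum(y[lo:hi]) / (hi - lo))
    return out


def zero_frequency_filter(s, pitch_period, passes=2):
    # Assumption: difference the input to suppress any DC offset.
    x = [s[0]] + [s[n] - s[n - 1] for n in range(1, len(s))]
    # Cascade of two ideal resonators; this output grows without bound.
    raw = resonator_0hz(resonator_0hz(x))
    # Assumption: averaging window of about 1.5 pitch periods, applied `passes` times.
    half_win = int(0.75 * pitch_period)
    out = raw
    for _ in range(passes):
        out = remove_trend(out, half_win)
    return raw, out, half_win


# Toy input: an impulse train with a period of 80 samples
# (for example, a 100 Hz pitch at 8 kHz sampling).
period = 80
s = [1.0 if n % period == 0 else 0.0 for n in range(1600)]
raw, zff, half_win = zero_frequency_filter(s, period)

# Skip samples whose averaging windows were clipped at the signal edges.
margin = 2 * half_win
interior = range(margin + 1, len(s) - margin)
# Candidate epochs: negative-to-positive zero crossings of the filtered signal.
epochs = [n for n in interior if zff[n - 1] < 0.0 <= zff[n]]
print(raw[-1], max(abs(zff[n]) for n in interior), epochs[:3])
```

On this toy input the raw resonator output keeps growing with the signal length, while the trend-removed signal stays small and crosses zero about once per pitch period. The sketch also shows why the conventional scheme needs a pitch estimate: the window length `half_win` is derived from `pitch_period`.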
Acknowledgments
This work is part of the ongoing project on the development of Prosodically guided phonetic Engine for Assamese language funded by the Technology Development for Indian Languages (TDIL) Programme initiated by the Department of Electronics & Information Technology (DeitY), Ministry of Communication & Information Technology (MC&IT), Govt. of India under the consortium mode headed by IIIT Hyderabad.
Cite this article
Deepak, K.T., Prasanna, S.R.M. Epoch Extraction Using Zero Band Filtering from Speech Signal. Circuits Syst Signal Process 34, 2309–2333 (2015). https://doi.org/10.1007/s00034-014-9957-4
DOI: https://doi.org/10.1007/s00034-014-9957-4