
Epoch Extraction Using Zero Band Filtering from Speech Signal

Circuits, Systems, and Signal Processing

Abstract

The zero frequency filter (ZFF) is a marginally stable infinite impulse response (IIR) resonator at 0 Hz that is used to reliably extract epoch locations from speech signals. However, the output of such an ideal resonator drifts without bound over time. This trend is removed from the filtered output by subtracting the local mean, computed over one to two pitch periods, to obtain the zero frequency filtered signal. As an alternative, this paper proposes a bounded-input bounded-output (BIBO) stable realization of the ZFF for epoch extraction, whose output does not increase or decrease with time. Such a stable filter has two advantages: its output is bounded, so long speech files cause no numerical precision problems, and it does not require the trend removal step, which needs an initial pitch estimate. The proposed approach is evaluated on the CMU-Arctic database under clean and degraded conditions. The method is also validated on singing voice and emotional speech to demonstrate its robustness to varying pitch. The proposed method is found to be robust over a wide range of parameter choices.
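The conventional ZFF pipeline that the abstract contrasts against can be sketched in pure Python as follows. This is a minimal sketch, assuming the formulation of Murthy and Yegnanarayana (2008): difference the signal, pass it through two cascaded 0 Hz resonators, subtract a local mean over roughly one pitch period, and take negative-to-positive zero crossings as epochs. The function names, the `half_window` parameter, and the choice of two trend-removal passes are illustrative assumptions. The `r < 1` option is a hypothetical way to show what BIBO stability buys; it is not the authors' zero band filter, whose design the abstract does not give.

```python
# Minimal sketch of conventional zero frequency filtering (ZFF) for epochs.
# Names and the half_window / passes parameters are illustrative assumptions.


def resonator_cascade(x, r=1.0, stages=2):
    """Cascade `stages` second-order resonators, each with a double pole at z = r.

    r = 1.0 is the marginally stable 0 Hz resonator used by ZFF:
        y[n] = x[n] + 2*y[n-1] - y[n-2]
    r < 1.0 is a hypothetical illustration only: it moves the poles inside
    the unit circle, so the cascade becomes BIBO stable.
    """
    y = [float(v) for v in x]
    for _ in range(stages):
        out, y1, y2 = [], 0.0, 0.0  # y1 = y[n-1], y2 = y[n-2], zero initial state
        for v in y:
            cur = v + 2.0 * r * y1 - r * r * y2
            out.append(cur)
            y1, y2 = cur, y1
        y = out
    return y


def remove_trend(y, half_window):
    """Subtract the local mean over 2*half_window + 1 samples (window truncated at the edges)."""
    n_samples = len(y)
    out = []
    for n in range(n_samples):
        lo = max(0, n - half_window)
        hi = min(n_samples, n + half_window + 1)
        out.append(y[n] - sum(y[lo:hi]) / (hi - lo))
    return out


def zff_epochs(s, half_window, passes=2):
    """Return indices n where the trend-removed ZFF output goes from negative to non-negative."""
    if not s:
        return []
    # Difference the input to remove any DC offset: x[n] = s[n] - s[n-1], with s[-1] = 0.
    x = [float(s[0])] + [float(s[n] - s[n - 1]) for n in range(1, len(s))]
    # Two cascaded 0 Hz resonators: the output carries a polynomial trend.
    y = resonator_cascade(x, r=1.0, stages=2)
    # One pass of local-mean subtraction leaves a linear drift, so repeat it.
    for _ in range(passes):
        y = remove_trend(y, half_window)
    return [n for n in range(1, len(y)) if y[n - 1] < 0.0 <= y[n]]


if __name__ == "__main__":
    step = [1.0] * 2000
    # r = 1.0: the step response grows without bound (like n**4 / 24).
    print(max(abs(v) for v in resonator_cascade(step, r=1.0)))
    # r = 0.95: bounded by 1 / (1 - r)**4 for any input with |x[n]| <= 1.
    print(max(abs(v) for v in resonator_cascade(step, r=0.95)))
```

For example, on a synthetic train of negative unit impulses every 81 samples with `half_window=40` (a window of exactly one period), the interior crossings should land within a sample of each impulse and be spaced 81 samples apart. Polarity matters: with positive impulses, the negative-to-positive crossings move to roughly half a period away. Near the signal edges the truncated windows make the output unreliable, and the unbounded growth shown by the `r = 1.0` step demo is the kind of behavior the abstract links to precision problems on long files.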

Acknowledgments

This work is part of the ongoing project on the development of a prosodically guided phonetic engine for the Assamese language, funded by the Technology Development for Indian Languages (TDIL) Programme initiated by the Department of Electronics & Information Technology (DeitY), Ministry of Communication & Information Technology (MC&IT), Govt. of India, and carried out in consortium mode led by IIIT Hyderabad.

Author information

Corresponding author

Correspondence to S. R. M. Prasanna.

About this article

Cite this article

Deepak, K.T., Prasanna, S.R.M. Epoch Extraction Using Zero Band Filtering from Speech Signal. Circuits Syst Signal Process 34, 2309–2333 (2015). https://doi.org/10.1007/s00034-014-9957-4

  • DOI: https://doi.org/10.1007/s00034-014-9957-4
