Abstract
The zero frequency resonator (ZFR) was proposed earlier for the extraction of glottal closure instants (GCIs) (Murty and Yegnanarayana 2008). The output of the ZFR is an exponentially growing/decaying signal, and the trend of this signal must be removed to obtain the resolution required for detecting the relevant information. With a trend-removal window of typically 1–2 pitch periods, the trend-removed signal mainly exhibits information related to GCIs. This work proposes two methods for the detection of glottal opening instants (GOIs) using the ZFR. In the first method, the window size for trend removal is reduced (say, to 0.33 \(\times\) the pitch period), and the possibility of hypothesizing GOIs is demonstrated. In the second method, the window size remains in the range of 1–2 pitch periods, but the input to the ZFR is modified to remove the GCI information. The proposed methods are evaluated using the CMU-Arctic database and compared with existing methods for GOI detection. The GOI detection performance is comparable to that obtained for GCIs and to that of existing methods.
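The baseline ZFR pipeline summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`remove_trend`, `zfr_signal`, `positive_zero_crossings`), the parameters `win_factor` and `passes`, and the edge-padding choice are all hypothetical, and the pitch period is assumed to be known in advance. The resonator is realized as four cumulative sums (four poles at z = 1), following the usual description of the method in Murty and Yegnanarayana (2008), where epochs are read off the negative-to-positive zero crossings of the trend-removed signal.

```python
import numpy as np

def remove_trend(y, half_win):
    """Subtract the local mean over a (2*half_win + 1)-sample window."""
    win = 2 * half_win + 1
    kernel = np.ones(win) / win
    # Edge padding (an assumption) keeps the output the same length as y.
    padded = np.pad(y, half_win, mode="edge")
    local_mean = np.convolve(padded, kernel, mode="valid")
    return y - local_mean

def zfr_signal(speech, fs, pitch_period_s, win_factor=1.5, passes=3):
    """Return the trend-removed ZFR output (hypothetical helper).

    win_factor is the trend-removal window length in pitch periods:
    1-2 for the GCI setting; the abstract's first GOI method uses a
    smaller value such as 0.33.
    """
    speech = np.asarray(speech, dtype=float)
    # Difference the signal to remove any DC offset.
    x = np.diff(speech, prepend=speech[0])
    # Cascade of two zero frequency resonators = four poles at z = 1,
    # implemented here as four running sums.
    y = x
    for _ in range(4):
        y = np.cumsum(y)
    # Remove the growing trend by repeated local-mean subtraction.
    half_win = max(1, int(round(win_factor * pitch_period_s * fs / 2)))
    for _ in range(passes):
        y = remove_trend(y, half_win)
    return y

def positive_zero_crossings(z):
    """Indices where z goes from negative to non-negative."""
    return np.flatnonzero((z[:-1] < 0) & (z[1:] >= 0)) + 1

if __name__ == "__main__":
    fs = 8000
    # Synthetic excitation: one impulse every 10 ms (100 Hz pitch).
    s = np.zeros(fs)
    s[40::80] = 1.0
    z = zfr_signal(s, fs, pitch_period_s=0.01)
    print(positive_zero_crossings(z)[:10])
```

Calling `zfr_signal` with `win_factor=0.33` only reproduces the shorter window of the first proposed method; how GOIs are then hypothesized from that signal, and how the ZFR input is modified in the second method, are not specified in the abstract and are not attempted here.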
References
Alku, P. (1992). Glottal wave analysis with pitch synchronous iterative adaptive inverse filtering. Speech Communication, 11(2–3), 109–118 (Eurospeech ’91). http://www.sciencedirect.com/science/article/pii/016763939290005R.
Ananthapadmanabha, T. V., & Yegnanarayana, B. (1975). Epoch extraction of voiced speech. IEEE Transactions on Acoustics, Speech and Signal Processing, 23(6), 562–570.
Atal, B. S., & Hanauer, S. L. (1971). Speech analysis and synthesis by linear prediction of the speech wave. The Journal of the Acoustical Society of America, 50, 637–655.
Bouzid, A., & Ellouze, N. (2004). Glottal opening instant detection from speech signal. In Proceedings Eusipco.
Brookes, M. (2013). http://www.ee.ic.ac.uk/hp/staff/dmb/voicemail/voicebox.html.
Childers, D. G., & Krishnamurthy, A. K. (1985). A critical review of electroglottography. CRC Critical Reviews in Biomedical Engineering, 12, 131–161.
Childers, D. G., & Lee, C. K. (1991). Vocal quality factors: Analysis, synthesis, and perception. The Journal of the Acoustical Society of America, 90, 2394–2410.
Cohen, L. (1995). Time-frequency analysis: Theory and applications. Englewood Cliffs: Prentice-Hall.
Deepak, K. T., Ramesh, K., Adiga, N., & Prasanna, S. R. M. (2015). Speech and EGG polarity detection using Hilbert envelope. Submitted to TENCON, pp. 1–5.
Drugman, T. (2013). http://tcts.fpms.ac.be/~drugman/.
Drugman, T., & Dutoit, T. (2009). Glottal closure and opening instant detection from speech signals. In Proceedings Interspeech.
Ellouze, N., & Bouzid, A. (2007). Open quotient measurements based on multiscale product of speech signal wavelet transform. Research Letters in Signal Processing, 7, 1–5.
Govind, D., Prasanna, S. R. M., & Pati, D. (2011). Epoch extraction in high pass filtered speech using Hilbert envelope. In Proceedings Interspeech.
Henrich, N., Doval, B., & d’Alessandro, C. (1999). Glottal open quotient estimation using linear prediction. In Proceedings of the workshop on models and analysis of vocal emissions for biomedical applications.
Kominek, J., & Black, A. W. (2004). The CMU ARCTIC speech databases. In Proceedings of the 5th ISCA speech synthesis workshop (pp. 223–224). http://festvox.org/cmu_arctic/index.html.
Murty, K. S. R., & Yegnanarayana, B. (2008). Epoch extraction from speech signals. IEEE Transactions on Audio, Speech and Language Processing, 16(8), 1602–1614.
Narendra, N., & Rao, K. S. (2015). Robust voicing detection and F0 estimation for HMM-based speech synthesis. Circuits, Systems, and Signal Processing, 34, 1–23.
Naylor, P. A., Kounoudes, A., Gudnason, J., & Brookes, M. (2007). Estimation of glottal closure instants in voiced speech using the DYPSA algorithm. IEEE Transactions on Audio, Speech and Language Processing, 15(1), 34–43.
Prasanna, S. R. M., Govind, D., Rao, K. S., & Yegnanarayana, B. (2010). Fast prosody modification using instants of significant excitation. In Proceedings Speech Prosody.
Quatieri, T. F. (2004). Discrete-time speech signal processing. Delhi: Pearson Education.
Ramesh, K., Prasanna, S. R. M., & Govind, D. (2013). Detection of glottal opening instants using Hilbert envelope. In Proceedings Interspeech, Lyon.
Rao, K. S., Prasanna, S. R. M., & Yegnanarayana, B. (2007). Determination of instants of significant excitation in speech using Hilbert envelope and group delay function. IEEE Signal Processing Letters, 14, 762–765.
Smits, R., & Yegnanarayana, B. (1995a). Determination of instants of significant excitation in speech using group delay function. IEEE Transactions on Acoustics, Speech and Signal Processing, 4, 325–333.
Smits, R., & Yegnanarayana, B. (1995b). Determination of instants of significant excitation in speech using group delay function. IEEE Transactions on Speech and Audio Processing, 3(5), 325–333.
Thomas, M. R. P., Gudnason, J., & Naylor, P. A. (2012). Estimation of glottal closing and opening instants in voiced speech using the YAGA algorithm. IEEE Transactions on Audio, Speech, and Language Processing, 20(1), 82–91.
Thomas, M. R. P., & Naylor, P. A. (2009). The SIGMA algorithm: A glottal activity detector for electroglottographic signals. IEEE Transactions on Audio, Speech, and Language Processing, 17(8), 1557–1566.
Yegnanarayana, B., & Murty, K. S. R. (2009). Event-based instantaneous fundamental frequency estimation from speech signals. IEEE Transactions on Audio, Speech and Language Processing, 17(4), 614–625.
Cite this article
Ramesh, K., Prasanna, S.R.M. Glottal opening instants detection using zero frequency resonator. Int J Speech Technol 20, 127–141 (2017). https://doi.org/10.1007/s10772-016-9383-z
DOI: https://doi.org/10.1007/s10772-016-9383-z