Abstract
This work explores the effectiveness of the Intrinsic Mode Functions (IMFs) of the speech signal in estimating its Glottal Closure Instants (GCIs). The IMFs, which are the AM–FM (oscillatory) components of the speech signal, are obtained from two related nonlinear and non-stationary signal analysis techniques: Improved Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (ICEEMDAN) and Modified Empirical Mode Decomposition (MEMD). Both are advanced variants of the original Empirical Mode Decomposition (EMD). MEMD is much faster than ICEEMDAN, whereas ICEEMDAN curtails mode-mixing (a drawback of EMD) more effectively. It is observed that the partial sum of a certain subset of the IMFs yields a signal whose minima are aligned with the GCIs. Based on this observation, two methods are devised for estimating the GCIs, one from the IMFs of ICEEMDAN and one from those of MEMD, named ICEEMDAN-based GCIs Estimation (IGE) and MEMD-based GCIs Estimation (MGE), respectively. The results reveal that, compared to state-of-the-art methods, IGE and MGE provide consistent and reliable estimates of the GCIs across different scenarios: clean, noisy, and telephone channel conditions.
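The central observation, that summing a subset of the IMFs gives a signal whose minima mark the GCIs, can be illustrated with a minimal sketch. The abstract does not state which IMFs are summed or how the minima are picked, so everything specific here is an assumption: the function name `partial_sum_minima`, the `subset` indices, the `max_f0_hz` spacing constraint, and the use of `scipy.signal.find_peaks` on the negated signal. The synthetic sinusoids merely stand in for IMFs that would in practice come from ICEEMDAN or MEMD; this is not the IGE or MGE procedure itself.

```python
import numpy as np
from scipy.signal import find_peaks


def partial_sum_minima(imfs, subset, fs, max_f0_hz=500.0):
    """Sum the chosen IMFs and locate the local minima of the result.

    imfs      : array of shape (n_imfs, n_samples), one IMF per row
    subset    : indices of the IMFs to include in the partial sum (assumed)
    fs        : sampling rate in Hz
    max_f0_hz : assumed upper bound on pitch, used to space the minima
    """
    imfs = np.asarray(imfs, dtype=float)
    partial = imfs[list(subset)].sum(axis=0)
    # Minima of the partial sum are peaks of its negation.
    min_distance = max(1, int(fs / max_f0_hz))
    minima, _ = find_peaks(-partial, distance=min_distance)
    return partial, minima


# Toy stand-ins for IMFs (hypothetical, not real decomposition output).
fs = 8000
t = np.arange(800) / fs
toy_imfs = np.vstack([
    0.05 * np.sin(2 * np.pi * 3000 * t),  # fast, noise-like component
    -np.cos(2 * np.pi * 100 * t),         # component at a 100 Hz pitch
    -0.2 * np.cos(2 * np.pi * 200 * t),   # weaker component at 200 Hz
    0.5 * t,                              # slow trend
])

partial, minima = partial_sum_minima(toy_imfs, subset=[1, 2], fs=fs)
print(minima / fs)  # candidate instants in seconds, one per 10 ms cycle
```

Leaving out the fastest component and the slow trend is what makes the minima of the sum recur once per pitch period in this toy case; with real speech the choice of subset would have to follow the paper's own selection rule.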
Cite this article
Sharma, R., Prasanna, S.R.M., Rufiner, H.L. et al. Detection of the Glottal Closure Instants Using Empirical Mode Decomposition. Circuits Syst Signal Process 37, 3412–3440 (2018). https://doi.org/10.1007/s00034-017-0713-4
DOI: https://doi.org/10.1007/s00034-017-0713-4