
Detection of the Glottal Closure Instants Using Empirical Mode Decomposition

Circuits, Systems, and Signal Processing

Abstract

This work explores the effectiveness of the Intrinsic Mode Functions (IMFs) of the speech signal in estimating its Glottal Closure Instants (GCIs). The IMFs, which are the AM–FM (oscillatory) components of the speech signal, are obtained from two related techniques for analyzing nonlinear and non-stationary signals: Improved Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (ICEEMDAN) and Modified Empirical Mode Decomposition (MEMD). Both are advanced variants of the original Empirical Mode Decomposition (EMD). MEMD is much faster than ICEEMDAN, whereas ICEEMDAN curtails mode-mixing (a drawback of EMD) more effectively. It is observed that summing a certain subset of the IMFs yields a signal whose minima are aligned with the GCIs. Based on this observation, two methods are devised for estimating the GCIs, one from the IMFs of ICEEMDAN and one from those of MEMD, termed ICEEMDAN-based GCIs Estimation (IGE) and MEMD-based GCIs Estimation (MGE), respectively. The results show that, compared with state-of-the-art methods, IGE and MGE provide consistent and reliable estimates of the GCIs across clean, noisy, and telephone-channel conditions.
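The central observation, that a partial sum of IMFs has minima aligned with the GCIs, can be illustrated with a minimal sketch. It is not the IGE or MGE method: the abstract does not say which IMFs are summed or how minima are selected, so the index set `selected`, the spacing parameter `min_gap`, and the helper names `partial_sum` and `local_minima` are all hypothetical. The IMFs here are synthetic stand-ins; in practice they would come from an ICEEMDAN or MEMD implementation.

```python
import math

def partial_sum(imfs, selected):
    """Sum the chosen IMFs sample by sample (the choice of subset is an assumption)."""
    length = len(imfs[0])
    return [sum(imfs[k][n] for k in selected) for n in range(length)]

def local_minima(x, min_gap):
    """Return indices of interior local minima, keeping the deepest ones
    that are at least min_gap samples apart (hypothetical selection rule)."""
    candidates = [n for n in range(1, len(x) - 1)
                  if x[n] < x[n - 1] and x[n] <= x[n + 1]]
    kept = []
    for n in sorted(candidates, key=lambda i: x[i]):  # deepest first
        if all(abs(n - m) >= min_gap for m in kept):
            kept.append(n)
    return sorted(kept)

# Synthetic stand-ins for IMFs (not the output of a real decomposition).
fs = 8000          # sampling rate in Hz (assumed)
f0 = 100           # fundamental frequency in Hz, i.e. an 80-sample pitch period
n_samples = 400    # 50 ms of signal
imfs = [
    [0.2 * math.sin(2 * math.pi * 1500 * n / fs) for n in range(n_samples)],  # fast oscillation
    [-math.cos(2 * math.pi * f0 * n / fs) for n in range(n_samples)],         # pitch-rate component
    [-0.5 * math.cos(2 * math.pi * 2 * f0 * n / fs) for n in range(n_samples)],  # second harmonic
    [0.3 * n / n_samples for n in range(n_samples)],                          # slow trend
]

selected = [1, 2]  # hypothetical subset: drop the fastest IMF and the trend
summed = partial_sum(imfs, selected)
gci_estimates = local_minima(summed, min_gap=60)
print(gci_estimates)  # [80, 160, 240, 320]
```

In this toy signal the deep minima recur once per pitch period, and the spacing constraint discards the shallower minima that the second harmonic introduces midway between them.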



Author information

Corresponding author

Correspondence to Rajib Sharma.

About this article

Cite this article

Sharma, R., Prasanna, S.R.M., Rufiner, H.L. et al. Detection of the Glottal Closure Instants Using Empirical Mode Decomposition. Circuits Syst Signal Process 37, 3412–3440 (2018). https://doi.org/10.1007/s00034-017-0713-4

  • DOI: https://doi.org/10.1007/s00034-017-0713-4
