Detection of the Glottal Closure Instants Using Empirical Mode Decomposition

Sharma, Rajib; Prasanna, S. R. M.; Rufiner, Hugo Leonardo; Schlotthauer, Gastón

doi:10.1007/s00034-017-0713-4

Published: 17 November 2017

Volume 37, pages 3412–3440, (2018)
Circuits, Systems, and Signal Processing

Rajib Sharma (ORCID: orcid.org/0000-0002-5515-5128)¹, S. R. M. Prasanna¹, Hugo Leonardo Rufiner² & Gastón Schlotthauer³

Abstract

This work explores how effectively the Intrinsic Mode Functions (IMFs) of the speech signal can be used to estimate its Glottal Closure Instants (GCIs). The IMFs, which are the AM–FM (oscillatory) components of the speech signal, are obtained from two closely related techniques for nonlinear and non-stationary signal analysis: Improved Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (ICEEMDAN) and Modified Empirical Mode Decomposition (MEMD). Both are advanced variants of the original technique, Empirical Mode Decomposition (EMD). MEMD is much faster than ICEEMDAN, whereas ICEEMDAN curtails mode-mixing (a drawback of EMD) more effectively. It is observed that the partial summation of a certain subset of the IMFs yields a signal whose minima are aligned with the GCIs. Based on this observation, two methods are devised for estimating the GCIs from the IMFs of ICEEMDAN and MEMD, named ICEEMDAN-based GCIs Estimation (IGE) and MEMD-based GCIs Estimation (MGE), respectively. The results show that IGE and MGE provide consistent and reliable GCI estimates compared with state-of-the-art methods, across clean, noisy, and telephone-channel conditions.
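The central observation above (a partial sum of a subset of IMFs has minima that line up with the GCIs) can be illustrated with a minimal Python sketch. It is not the authors' IGE or MGE method: it assumes NumPy and SciPy, swaps in plain EMD with cubic-spline envelopes and a fixed number of sifting iterations instead of ICEEMDAN or MEMD, and the function names (`emd`, `gci_from_imf_subset`), the IMF subset, the pitch ceiling `f0_max`, and the relative threshold `rel_height` are all hypothetical choices made for illustration.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema, find_peaks


def _envelope_mean(x):
    """Mean of the upper and lower cubic-spline envelopes, or None if x has too few extrema."""
    t = np.arange(x.size)
    imax = argrelextrema(x, np.greater)[0]
    imin = argrelextrema(x, np.less)[0]
    if imax.size < 2 or imin.size < 2:
        return None
    # Crude boundary handling (assumption): pin both envelopes to the end samples.
    imax = np.concatenate(([0], imax, [x.size - 1]))
    imin = np.concatenate(([0], imin, [x.size - 1]))
    upper = CubicSpline(imax, x[imax])(t)
    lower = CubicSpline(imin, x[imin])(t)
    return 0.5 * (upper + lower)


def emd(x, max_imfs=6, n_sift=10):
    """Plain EMD with a fixed number of sifting iterations (not ICEEMDAN or MEMD)."""
    residue = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        if _envelope_mean(residue) is None:
            break  # residue has too few extrema to sift further
        h = residue.copy()
        for _ in range(n_sift):
            m = _envelope_mean(h)
            if m is None:
                break
            h = h - m  # remove the local mean, keeping the fastest oscillation
        imfs.append(h)
        residue = residue - h
    # By construction, x == sum of the IMFs + residue (up to rounding).
    return np.array(imfs).reshape(-1, residue.size), residue


def gci_from_imf_subset(imfs, subset, fs, f0_max=400.0, rel_height=0.5):
    """Sum a (hypothetical) subset of IMFs and return its prominent, pitch-spaced minima."""
    s = imfs[list(subset)].sum(axis=0)
    min_gap = max(1, int(fs / f0_max))  # shortest plausible pitch period, in samples
    neg = -s  # minima of s are peaks of -s
    gcis, _ = find_peaks(neg, distance=min_gap, height=rel_height * neg.max())
    return gcis, s


# Synthetic "voiced" signal (assumption): negative impulses every 80 samples
# (100 Hz at fs = 8 kHz) filtered by a single damped resonance.
fs, N = 8000, 800
excitation = np.zeros(N)
excitation[40::80] = -1.0
n = np.arange(80)
resonance = np.exp(-n / 15.0) * np.cos(2 * np.pi * 0.1 * n)
x = np.convolve(excitation, resonance)[:N]

imfs, residue = emd(x)
gcis, s = gci_from_imf_subset(imfs, range(min(3, len(imfs))), fs)
print(len(imfs), gcis)
```

The sketch sums the first few IMFs only because that is a convenient default for a synthetic signal; which IMFs to sum, and how to pick one minimum per glottal cycle, are exactly the design decisions that the paper's methods address, so the selection rule here should not be read as theirs.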

Author information

Authors and Affiliations

Signal Informatics Laboratory, Department of Electronics and Electrical Engineering, Indian Institute of Technology Guwahati, Guwahati, 781039, India
Rajib Sharma & S. R. M. Prasanna
Research Institute for Signals, Systems and Computational Intelligence – sinc(i), Facultad de Ingeniería y Ciencias Hídricas, Universidad Nacional del Litoral, 3000, Santa Fe, Argentina
Hugo Leonardo Rufiner
Laboratorio de Señales y Dinámicas no Lineales, CITER - CONICET, Facultad de Ingeniería, Universidad Nacional de Entre Ríos, 3101, Oro Verde, Entre Ríos, Argentina
Gastón Schlotthauer

Corresponding author

Correspondence to Rajib Sharma.

About this article

Cite this article

Sharma, R., Prasanna, S.R.M., Rufiner, H.L. et al. Detection of the Glottal Closure Instants Using Empirical Mode Decomposition. Circuits Syst Signal Process 37, 3412–3440 (2018). https://doi.org/10.1007/s00034-017-0713-4

Received: 01 August 2017
Revised: 01 November 2017
Accepted: 06 November 2017
Published: 17 November 2017
Issue Date: August 2018
DOI: https://doi.org/10.1007/s00034-017-0713-4
