Circuits, Systems, and Signal Processing, Volume 37, Issue 8, pp 3245–3274

Epoch Estimation from Emotional Speech Signals Using Variational Mode Decomposition

  • G. Jyothish Lal
  • E. A. Gopalakrishnan
  • D. Govind

Abstract

This paper presents a novel approach for estimating epochs from emotional speech signals. Epochs are the locations of significant excitation in the vocal tract during the production of voiced sounds by the vibration of the vocal folds. Estimating epoch locations is essential for deriving the instantaneous pitch contours needed for accurate emotion analysis. Many well-known epoch extraction algorithms show degraded performance on emotional speech because of the varying nature of its excitation characteristics. The proposed approach exploits variational mode decomposition (VMD), a new adaptive time series decomposition technique, for the estimation of epochs. The VMD algorithm is applied to the emotional speech signal to decompose it into several sub-signals (modes). Analysis of these modes shows that the VMD algorithm captures, through its modes, a center frequency close to the fundamental frequency defined for each glottal cycle of the emotional speech utterance. This center frequency characteristic of the corresponding mode signal helps in accurately estimating epoch locations from the emotional speech signal. The proposed method is evaluated on six different emotions taken from the German emotional speech database, which includes simultaneous electroglottographic (EGG) signals. Experimental results on clean emotive speech signals show that the proposed method provides an identification rate and accuracy comparable to those of the best performing algorithm. In addition, the proposed method provides better reliability in epoch estimation from emotive speech signals degraded by noise.
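The abstract reports an identification rate and an accuracy but does not define them. The sketch below shows one way such measures are commonly computed in the epoch extraction literature, assuming reference epochs (for example from the EGG signal) and estimated epochs are given as sample indices. The function name `epoch_metrics`, the cycle boundaries at midpoints between reference epochs, and the returned keys are illustrative assumptions, not the authors' evaluation code.

```python
import numpy as np

def epoch_metrics(ref, est, fs):
    """Score estimated epochs against reference epochs (hypothetical helper).

    ref, est: epoch locations in samples; ref needs at least two entries.
    fs: sampling rate in Hz.
    """
    ref = np.sort(np.asarray(ref, dtype=float))
    est = np.sort(np.asarray(est, dtype=float))

    # Assumption: each larynx cycle spans the interval between the midpoints
    # to the neighbouring reference epochs; the two edge cycles are extended
    # by half of the adjacent period.
    mids = (ref[:-1] + ref[1:]) / 2.0
    lo = np.concatenate(([ref[0] - (ref[1] - ref[0]) / 2.0], mids))
    hi = np.concatenate((mids, [ref[-1] + (ref[-1] - ref[-2]) / 2.0]))

    hits = misses = false_alarms = 0
    errors = []  # timing errors in seconds, for cycles with exactly one estimate
    for r, a, b in zip(ref, lo, hi):
        inside = est[(est >= a) & (est < b)]
        if inside.size == 1:
            hits += 1
            errors.append((inside[0] - r) / fs)
        elif inside.size == 0:
            misses += 1
        else:
            false_alarms += 1

    n = ref.size
    return {
        "identification_rate": 100.0 * hits / n,        # exactly one estimate
        "miss_rate": 100.0 * misses / n,                # no estimate
        "false_alarm_rate": 100.0 * false_alarms / n,   # more than one estimate
        # Accuracy taken here as the standard deviation of the timing error (ms).
        "accuracy_ms": 1000.0 * float(np.std(errors)) if errors else float("nan"),
    }
```

A call such as `epoch_metrics(egg_epochs, speech_epochs, fs=16000)` would return the three rates in percent and the accuracy in milliseconds; the variable names and the sampling rate in this call are placeholders.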

Keywords

Epoch estimation, Glottal closure instants, Excitation source, Emotional speech signal, EGG signal, Variational mode decomposition

Notes

Acknowledgements

The authors gratefully acknowledge Amrita Vishwa Vidyapeetham for supporting the first author in pursuing his Ph.D. The authors would like to thank Dr. K.P. Soman and Ms. M. Neethu (Amrita Vishwa Vidyapeetham) for lucidly explaining the concept of VMD.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • G. Jyothish Lal (1)
  • E. A. Gopalakrishnan (1)
  • D. Govind (1)
  1. Centre for Computational Engineering and Networking (CEN), Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India
