Spectro-Temporal Energy Ratio Features for Single-Corpus and Cross-Corpus Experiments in Speech Emotion Recognition

  • Research Article - Computer Engineering and Computer Science
  • Arabian Journal for Science and Engineering

Abstract

In this study, novel Spectro-Temporal Energy Ratio (STER) features, based on vowel formants and on the linearly spaced low-frequency and logarithmically spaced high-frequency regions of the human auditory system, are introduced for single-corpus and cross-corpus speech emotion recognition experiments. Because the underlying dynamics and characteristics of speech recognition and speech emotion recognition differ substantially, an emotion-recognition-specific filter bank is required. The proposed features are built on a novel filter bank strategy that constructs 7 trapezoidal filter banks. These filter banks differ from the Mel and Bark scales in both shape and frequency regions and are designed to generalize the feature space. Cross-corpus experimentation is a step forward in speech emotion recognition, but its results are often disappointing. Our goal is to create a feature set that is robust to cross-corpus variation using various feature selection algorithms. We demonstrate this by shrinking the feature space from 6984 dimensions down to 128 while improving accuracy with SVM, RBM, and sVGG (small-VGG) classifiers. Although RBMs are no longer considered fashionable, we show that they perform remarkably well when properly tuned. This paper reports a 90.65% accuracy rate using STER features on EmoDB.
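The pipeline described above reduces to three steps: band-limited energies computed through trapezoidal filters and expressed as energy ratios, feature selection to shrink the feature space, and a conventional classifier. The snippet below is a minimal illustrative sketch of that idea, not the authors' implementation (which is available in the linked repository); the band edges, frame settings, functionals, and the build_classifier helper are assumptions introduced for illustration only.

    # Minimal sketch (not the paper's implementation): trapezoidal band filters,
    # band-energy-ratio features, feature selection, and an SVM classifier.
    # Band edges, frame settings, and functionals are illustrative assumptions.
    import numpy as np
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    def trapezoidal_filter(freqs, lo, lo_flat, hi_flat, hi):
        # Trapezoid in Hz: ramp up on [lo, lo_flat], flat on [lo_flat, hi_flat],
        # ramp down on [hi_flat, hi], zero elsewhere.
        return np.interp(freqs, [lo, lo_flat, hi_flat, hi], [0.0, 1.0, 1.0, 0.0],
                         left=0.0, right=0.0)

    def ster_features(signal, sr, band_edges, n_fft=1024, hop=256):
        # Frame the signal, compute per-band spectral energies through the
        # trapezoidal filters, and express each band as a ratio of total energy.
        freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
        filters = np.stack([trapezoidal_filter(freqs, *e) for e in band_edges])
        frames = np.lib.stride_tricks.sliding_window_view(signal, n_fft)[::hop]
        spectra = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1)) ** 2
        band_energy = spectra @ filters.T                  # (n_frames, n_bands)
        ratios = band_energy / (band_energy.sum(axis=1, keepdims=True) + 1e-12)
        # Utterance-level functionals over time (mean and standard deviation).
        return np.concatenate([ratios.mean(axis=0), ratios.std(axis=0)])

    # Seven illustrative bands (Hz): low-frequency, formant-region, and
    # high-frequency trapezoids -- assumed values, not those of the paper.
    BANDS = [(0, 50, 250, 350), (250, 350, 700, 900), (700, 900, 1500, 1800),
             (1500, 1800, 2800, 3200), (2800, 3200, 4500, 5000),
             (4500, 5000, 6500, 7000), (6500, 7000, 7900, 8000)]

    def build_classifier(n_features, k=128):
        # k=128 mirrors the target dimension in the abstract; cap it so the
        # selector never asks for more features than the matrix provides.
        return make_pipeline(StandardScaler(),
                             SelectKBest(f_classif, k=min(k, n_features)),
                             SVC(kernel="rbf", C=10.0, gamma="scale"))

    # Usage: X = np.vstack([ster_features(sig, 16000, BANDS) for sig in signals])
    #        clf = build_classifier(X.shape[1]); clf.fit(X, y)

In this toy setup the feature vector is far smaller than the 6984-dimensional space reported in the paper, so the selector keeps at most the available features; with a full high-dimensional feature set, k = 128 corresponds to the target dimension quoted in the abstract, and the SVM could be swapped for the RBM or sVGG classifiers used in the study.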

Availability of Data and Materials

Data is available on the following site: https://github.com/cevparlak/speech-emotion-recognition.

Code Availability

The source code is available on the following site: https://github.com/cevparlak/speech-emotion-recognition.

Funding

This work did not receive any grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author information

Authors and Affiliations

Authors

Contributions

CP: Conceptualization, Methodology, Writing, Experiments. BD: Supervision, Reviewing, Writing, Editing, Conceptualization, Validation, and Data Preparation. YA: Supervision, Reviewing, Writing, Editing, Validation.

Corresponding author

Correspondence to Cevahir Parlak.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Human and Animal Rights

This article does not contain any studies with human participants or animals performed by any of the authors.

Consent to Participate and for Publication

We confirm that no human participants were involved in this study.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Parlak, C., Diri, B. & Altun, Y. Spectro-Temporal Energy Ratio Features for Single-Corpus and Cross-Corpus Experiments in Speech Emotion Recognition. Arab J Sci Eng 49, 3209–3223 (2024). https://doi.org/10.1007/s13369-023-07920-8
