Abstract
In this study, novel Spectro-Temporal Energy Ratio (STER) features are introduced for single-corpus and cross-corpus speech emotion recognition experiments. The features are based on vowel formants, the linearly spaced low-frequency region, and the logarithmically spaced high-frequency region of the human auditory system. Because the underlying dynamics and characteristics of speech recognition and speech emotion recognition differ substantially, an emotion-recognition-specific filter bank is required. The proposed features formulate a novel filter bank strategy that constructs 7 trapezoidal filter banks. These filter banks differ from the Mel and Bark scales in both shape and frequency regions, and they are designed to generalize the feature space. Cross-corpus experimentation is a step forward in speech emotion recognition, but its results are often disappointing. Our goal is to create a feature set that is robust against cross-corpus variations by applying various feature selection algorithms. We demonstrate this by shrinking the feature space from 6984 down to 128 dimensions while improving accuracy with SVM, RBM, and sVGG (small-VGG) classifiers. Although RBMs are no longer considered fashionable, we show that they can perform remarkably well when tuned properly. This paper reports a striking 90.65% accuracy rate using STER features on EmoDB.
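The abstract describes band-energy ratios computed through trapezoidal filter banks rather than the triangular Mel filters. As a minimal sketch of the idea, the snippet below builds a trapezoidal frequency response and computes per-band energy ratios from a power spectrum. The band edges, the flat-top widths, and the exact ratio definition are illustrative assumptions, not the authors' published parameters:

```python
import numpy as np

def trapezoidal_filter(freqs, f_lo, f_flat_lo, f_flat_hi, f_hi):
    """Trapezoidal response: linear rise from f_lo to f_flat_lo,
    unit gain on [f_flat_lo, f_flat_hi], linear fall to f_hi."""
    h = np.zeros_like(freqs, dtype=float)
    rise = (freqs >= f_lo) & (freqs < f_flat_lo)
    h[rise] = (freqs[rise] - f_lo) / (f_flat_lo - f_lo)
    h[(freqs >= f_flat_lo) & (freqs <= f_flat_hi)] = 1.0
    fall = (freqs > f_flat_hi) & (freqs <= f_hi)
    h[fall] = (f_hi - freqs[fall]) / (f_hi - f_flat_hi)
    return h

def band_energy_ratios(power_spec, freqs, bands):
    """Energy captured by each trapezoidal band, normalized by
    the total frame energy (one possible ratio definition)."""
    total = power_spec.sum() + 1e-12  # guard against silent frames
    return np.array([(power_spec * trapezoidal_filter(freqs, *b)).sum() / total
                     for b in bands])

# Hypothetical band edges (Hz): low bands linearly spaced, high bands wider.
bands = [(0, 100, 300, 400), (300, 400, 700, 900), (700, 900, 1500, 2000),
         (1500, 2000, 3000, 3500), (3000, 3500, 5000, 5500),
         (5000, 5500, 6500, 7000), (6500, 7000, 7800, 8000)]
freqs = np.linspace(0, 8000, 257)          # bins of a 512-point FFT at 16 kHz
spec = np.abs(np.random.default_rng(0).normal(size=257)) ** 2
ratios = band_energy_ratios(spec, freqs, bands)
```

Each frame then contributes 7 ratio values; stacking statistics of these over time would yield the spectro-temporal feature vectors that the feature selection stage prunes down to 128 dimensions.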
Availability of Data and Materials
Data is available on the following site: https://github.com/cevparlak/speech-emotion-recognition.
Code Availability
The source code is available on the following site: https://github.com/cevparlak/speech-emotion-recognition.
Funding
This work did not receive any grant from funding agencies in the public, commercial, or not-for-profit sectors.
Author information
Contributions
CP: Conceptualization, Methodology, Writing, Experiments; BD: Supervising, Reviewing, Writing, Editing, Conceptualization, Validation, Data Preparation; YA: Supervising, Reviewing, Writing, Editing, Validation.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Human and Animal Rights
This article does not contain any studies with human participants or animals performed by any of the authors.
Consent to Participate and for Publication
No human participants were involved in this study.
Cite this article
Parlak, C., Diri, B. & Altun, Y. Spectro-Temporal Energy Ratio Features for Single-Corpus and Cross-Corpus Experiments in Speech Emotion Recognition. Arab J Sci Eng 49, 3209–3223 (2024). https://doi.org/10.1007/s13369-023-07920-8