
Bilingual Speech Emotion Recognition Using Neural Networks: A Case Study for Turkish and English Languages

Conference paper
In: Intelligent and Fuzzy Techniques for Emerging Conditions and Digital Transformation (INFUS 2021)

Abstract

Emotion extraction and detection are complex tasks due to the nature of the data and of the subjects involved in acquiring sentiment. Speech analysis has become a critical gateway in deep learning, where acoustic features are trained to obtain more accurate descriptors that disentangle the sentiments and customs of natural language. Speech feature extraction varies with the quality of the audio recordings and with linguistic properties. Speech carries a broad spectrum of emotions that depend on the age, gender, and social background of the subjects. Speech emotion analysis is well developed for English and German through multilevel corpora, and emotion features also extend acoustic analysis to video and text. In this study, we propose a bilingual analysis of emotion extraction for the Turkish and English languages. MFCC (Mel-Frequency Cepstral Coefficients), Mel spectrogram, Linear Predictive Coding (LPC), and RASTA-PLP techniques are used to extract acoustic features. Three different datasets are analyzed using a feed-forward neural network. Emotional states such as happy, calm, sad, and angry are compared across bilingual speech recordings. Accuracy and precision both exceed 80%, and emotion classification in Turkish is concluded to be more accurate with respect to the speech features.
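The abstract describes a standard pipeline: extract acoustic features (MFCC, Mel spectrogram, LPC, RASTA-PLP) from each recording, pool them into fixed-length vectors, and classify emotions with a feed-forward network. The paper's exact architecture and feature configuration are not reproduced here, so the following is a minimal sketch, assuming librosa for MFCC/Mel-spectrogram extraction and scikit-learn's MLPClassifier as the feed-forward network; `files` and `labels` are hypothetical lists of audio paths and emotion tags, not artifacts from the paper.

```python
# Minimal sketch of the pipeline described in the abstract.
# Assumptions (not from the paper): librosa features, mean pooling over time,
# and scikit-learn's MLPClassifier as the feed-forward network.
import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score

def extract_features(path, sr=16000, n_mfcc=13, n_mels=40):
    """Mean-pooled MFCC and log-Mel-spectrogram features for one recording."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    mel = librosa.power_to_db(librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels))
    # Pool over the time axis so every clip yields a fixed-length vector.
    return np.concatenate([mfcc.mean(axis=1), mel.mean(axis=1)])

def train_emotion_classifier(files, labels):
    """files: list of audio paths; labels: emotion tags such as 'happy', 'calm', 'sad', 'angry'."""
    X = np.stack([extract_features(f) for f in files])
    X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2, stratify=labels)
    clf = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=500)
    clf.fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    print("accuracy :", accuracy_score(y_te, pred))
    print("precision:", precision_score(y_te, pred, average="macro"))
    return clf
```

Mean pooling is only one way to collapse variable-length clips into fixed-size inputs for a feed-forward network; statistics such as standard deviation or percentiles per coefficient are common alternatives.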



Acknowledgement

This work has been supported by the Scientific Research Projects Commission of Galatasaray University under grant number 19.401.005.

Author information


Correspondence to Ismail Burak Parlak.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Özsönmez, D.B., Acarman, T., Parlak, I.B. (2022). Bilingual Speech Emotion Recognition Using Neural Networks: A Case Study for Turkish and English Languages. In: Kahraman, C., Cebi, S., Cevik Onar, S., Oztaysi, B., Tolga, A.C., Sari, I.U. (eds) Intelligent and Fuzzy Techniques for Emerging Conditions and Digital Transformation. INFUS 2021. Lecture Notes in Networks and Systems, vol 308. Springer, Cham. https://doi.org/10.1007/978-3-030-85577-2_37
