Abstract
Emotion extraction and detection are complex tasks owing to the nature of the data and the subjects involved in acquiring affective signals. Speech analysis is a critical gateway for deep learning, where acoustic features are trained to yield more accurate descriptors that disentangle sentiment and convention in natural language. Speech feature extraction varies with the quality of the audio recordings and the linguistic properties of the language, and speech covers a broad spectrum of emotions shaped by the age, gender, and social background of the subjects. Speech emotion analysis has been advanced mainly for English and German through multilevel corpora, and emotion features extend acoustic analysis to video and text. In this study, we propose a multilingual analysis of emotion extraction for Turkish and English. Mel-Frequency Cepstral Coefficients (MFCC), Mel spectrograms, Linear Predictive Coding (LPC), and RASTA-PLP are used to extract acoustic features. Three different data sets are analyzed with a feed-forward neural network. Emotion states such as happy, calm, sad, and angry are compared across bilingual speech recordings. Accuracy and precision exceed 80%, and emotion classification in Turkish is found to be more accurate with respect to speech features.
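The abstract names MFCC extraction as the first step of the pipeline but the paper's own implementation is not reproduced here. As an illustration of what an MFCC front end computes, the following is a minimal NumPy-only sketch (framing, Hamming window, power spectrum, mel filterbank, log, DCT-II); all function names and parameter values are our own assumptions, not the authors' code:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def mfcc(signal, sr=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_ceps=13):
    # Slice the signal into overlapping Hamming-windowed frames
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Per-frame power spectrum
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Log mel-band energies (small epsilon avoids log(0))
    mel_energy = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # DCT-II decorrelates the bands into cepstral coefficients
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2.0 * n_filters)))
    return mel_energy @ dct.T

# Usage: MFCCs of a synthetic 1-second 440 Hz tone
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
feats = mfcc(tone, sr=sr)
print(feats.shape)  # (98, 13): 98 frames, 13 coefficients each
```

The resulting per-frame coefficient vectors are the kind of descriptors that would then be fed to a feed-forward classifier, as the study does across its three data sets.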
Acknowledgement
This work has been supported by the Scientific Research Projects Commission of Galatasaray University under grant number # 19.401.005.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Özsönmez, D.B., Acarman, T., Parlak, I.B. (2022). Bilingual Speech Emotion Recognition Using Neural Networks: A Case Study for Turkish and English Languages. In: Kahraman, C., Cebi, S., Cevik Onar, S., Oztaysi, B., Tolga, A.C., Sari, I.U. (eds) Intelligent and Fuzzy Techniques for Emerging Conditions and Digital Transformation. INFUS 2021. Lecture Notes in Networks and Systems, vol 308. Springer, Cham. https://doi.org/10.1007/978-3-030-85577-2_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85576-5
Online ISBN: 978-3-030-85577-2
eBook Packages: Intelligent Technologies and Robotics (R0)