
Bilingual Speech Emotion Recognition Using Neural Networks: A Case Study for Turkish and English Languages

Conference paper
In: Intelligent and Fuzzy Techniques for Emerging Conditions and Digital Transformation (INFUS 2021)

Abstract

Emotion extraction and detection are complex tasks due to the nature of the data and of the subjects involved in acquiring sentiment. Speech analysis has become a critical gateway in deep learning, where acoustic features are trained to obtain more accurate descriptors that disentangle the sentiments and customs of natural language. Speech feature extraction varies with the quality of the audio recordings and with linguistic properties. Speech carries a broad spectrum of emotions that depend on the age, gender, and social background of the subjects. Speech emotion analysis is well developed for English and German through multilevel corpora, and emotion features also extend acoustic analysis to video and text. In this study, we propose a bilingual analysis of emotion extraction for the Turkish and English languages. MFCC (Mel-Frequency Cepstral Coefficients), Mel spectrogram, Linear Predictive Coding (LPC), and RASTA-PLP techniques are used to extract acoustic features. Three different datasets are analyzed using a feed-forward neural network. Emotional states such as happy, calm, sad, and angry are compared across bilingual speech recordings. Accuracy and precision both exceed 80%, and emotion classification in Turkish is concluded to be more accurate with respect to the speech features.
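The abstract describes a standard pipeline: extract acoustic features (MFCC, Mel spectrogram, LPC, RASTA-PLP) from each recording, pool them into fixed-length vectors, and classify emotions with a feed-forward network. The paper's exact architecture and feature configuration are not reproduced here, so the following is a minimal sketch, assuming librosa for MFCC/Mel-spectrogram extraction and scikit-learn's MLPClassifier as the feed-forward network; `files` and `labels` are hypothetical lists of audio paths and emotion tags, not artifacts from the paper.

```python
# Minimal sketch of the pipeline described in the abstract.
# Assumptions (not from the paper): librosa features, mean pooling over time,
# and scikit-learn's MLPClassifier as the feed-forward network.
import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score

def extract_features(path, sr=16000, n_mfcc=13, n_mels=40):
    """Mean-pooled MFCC and log-Mel-spectrogram features for one recording."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    mel = librosa.power_to_db(librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels))
    # Pool over the time axis so every clip yields a fixed-length vector.
    return np.concatenate([mfcc.mean(axis=1), mel.mean(axis=1)])

def train_emotion_classifier(files, labels):
    """files: list of audio paths; labels: emotion tags such as 'happy', 'calm', 'sad', 'angry'."""
    X = np.stack([extract_features(f) for f in files])
    X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2, stratify=labels)
    clf = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=500)
    clf.fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    print("accuracy :", accuracy_score(y_te, pred))
    print("precision:", precision_score(y_te, pred, average="macro"))
    return clf
```

Mean pooling is only one way to collapse variable-length clips into fixed-size inputs for a feed-forward network; statistics such as standard deviation or percentiles per coefficient are common alternatives.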



Acknowledgement

This work has been supported by the Scientific Research Projects Commission of Galatasaray University under grant number 19.401.005.

Author information


Correspondence to Ismail Burak Parlak.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Özsönmez, D.B., Acarman, T., Parlak, I.B. (2022). Bilingual Speech Emotion Recognition Using Neural Networks: A Case Study for Turkish and English Languages. In: Kahraman, C., Cebi, S., Cevik Onar, S., Oztaysi, B., Tolga, A.C., Sari, I.U. (eds) Intelligent and Fuzzy Techniques for Emerging Conditions and Digital Transformation. INFUS 2021. Lecture Notes in Networks and Systems, vol 308. Springer, Cham. https://doi.org/10.1007/978-3-030-85577-2_37
