Comparative studies on machine learning for paralinguistic signal compression and classification

The Journal of Supercomputing

Abstract

In this paper, we study various compression and classification algorithms for three paralinguistic signal classification tasks. These tasks are difficult even for humans because the acoustic cues that distinguish such signals are subtle. Therefore, when machine learning techniques are applied to paralinguistic signals, several aspects of speech-related information, such as prosody, energy, and cepstral information, are usually considered during feature extraction. However, when the training corpus is not sufficiently large, it is extremely difficult to apply machine learning directly to such signals because of their high feature dimensionality, a problem known as the curse of dimensionality. This paper addresses this limitation by means of feature compression. First, we present experimental results obtained by applying various compression algorithms to eliminate redundancy in the signal features. We observe that, compared with the original features, the compressed features retain a comparable ability to distinguish the signals, especially when a fully connected neural network classifier is used. Second, we compute the distribution of per-emotion F1-scores for the speech emotion recognition task and show that the fully connected neural network classifier performs more stably than classical methods.
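To make the compress-then-classify pipeline concrete, the following is a minimal sketch of the approach the abstract describes, not the authors' implementation: PCA stands in for the compression algorithms compared in the paper, scikit-learn's MLPClassifier plays the role of the fully connected neural network, and the corpus size, the 6373-dimensional feature vectors (the size of the openSMILE ComParE set commonly used in paralinguistic challenges), and the four-emotion label set are all illustrative placeholders.

```python
# Hedged sketch of the compress-then-classify pipeline from the abstract.
# NOT the authors' implementation: PCA stands in for the compression step,
# an MLP for the fully connected classifier, and all data are placeholders.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)

# Placeholder corpus: 500 utterances, each represented by a 6373-dimensional
# paralinguistic feature vector, labeled with one of four emotions.
X = rng.normal(size=(500, 6373))
y = rng.integers(0, 4, size=500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Compression: project the high-dimensional features to a compact code to
# reduce redundancy and mitigate the curse of dimensionality on a small corpus.
pca = PCA(n_components=100).fit(X_tr)
Z_tr, Z_te = pca.transform(X_tr), pca.transform(X_te)

# Classification: a small fully connected network on the compressed features.
clf = MLPClassifier(hidden_layer_sizes=(256, 128), max_iter=300,
                    random_state=0).fit(Z_tr, y_tr)

# Per-emotion F1-scores (average=None returns one score per class),
# the quantity whose distribution the paper uses to compare stability.
print(f1_score(y_te, clf.predict(Z_te), average=None))
```

Reporting the F1-score per class rather than a single averaged score is what allows the stability of each classifier to be compared across individual emotions, as in the paper's second set of experiments.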



Acknowledgements

This work was supported by the Ministry of Trade, Industry and Energy (MOTIE, Korea) under the Industrial Technology Innovation Program (No. 10073144). K. Jung is with ASRI, Seoul National University, Korea.

Author information


Corresponding author

Correspondence to Kyomin Jung.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Byun, S., Yoon, S. & Jung, K. Comparative studies on machine learning for paralinguistic signal compression and classification. J Supercomput 76, 8357–8371 (2020). https://doi.org/10.1007/s11227-020-03346-3
