Abstract
In this paper, we investigate various compression and classification algorithms for three paralinguistic signal classification tasks. These tasks are difficult even for humans because the acoustic cues that distinguish such signals are subtle. Therefore, when machine learning techniques are applied to paralinguistic signals, several complementary types of speech information, such as prosody, energy, and cepstral features, are usually combined during feature extraction. However, when the training corpus is not sufficiently large, the resulting high feature dimensionality makes it extremely difficult to apply machine learning directly; this problem is known as the curse of dimensionality. This paper addresses this limitation by means of feature compression. First, we present experimental results obtained by applying various compression algorithms to eliminate redundancy in the signal features. We observe that, compared with the original features, the compressed features retain a comparable ability to distinguish the signals, especially when a fully connected neural network classifier is used. Second, we examine the distribution of per-emotion F1-scores in the speech emotion recognition task and show that the fully connected neural network classifier performs more stably than classical methods.
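The abstract does not name the specific compression algorithms; as a minimal illustrative sketch of the pipeline it describes, the example below uses a PCA-style projection (one common compression choice) on synthetic stand-in features, followed by a simple nearest-centroid classifier and a per-class F1 computation of the kind used to study per-emotion score distributions. All data, dimensions, and the classifier choice are assumptions for illustration, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for high-dimensional paralinguistic feature vectors
# (real extractors produce thousands of dimensions per utterance):
# 300 samples, 200 dims, 4 "emotion" classes.
n, d, k, n_classes = 300, 200, 16, 4
y = rng.integers(0, n_classes, size=n)
centers = rng.normal(size=(n_classes, d))
X = centers[y] + rng.normal(scale=2.0, size=(n, d))

# PCA-style compression: project onto the top-k principal directions,
# discarding redundant dimensions.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:k].T  # compressed features, shape (n, k)

# A simple nearest-centroid classifier on the compressed features
# (a stand-in for the paper's classifiers).
cent = np.stack([Z[y == c].mean(axis=0) for c in range(n_classes)])
pred = np.argmin(((Z[:, None, :] - cent[None]) ** 2).sum(-1), axis=1)

def f1_per_class(y_true, y_pred, n_classes):
    """Per-class F1 scores, whose spread across classes indicates
    how stably a classifier handles each emotion."""
    scores = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * p * r / (p + r) if p + r else 0.0)
    return np.array(scores)

print(f1_per_class(y, pred, n_classes))
```

Compressing from 200 to 16 dimensions before classification is the key step: the classifier then fits far fewer effective parameters, which is exactly the remedy for the small-corpus, high-dimension regime the abstract describes.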
Acknowledgements
This work was supported by the Ministry of Trade, Industry and Energy (MOTIE, Korea) under the Industrial Technology Innovation Program (No.10073144). K. Jung is with ASRI, Seoul National University, Korea.
Cite this article
Byun, S., Yoon, S. & Jung, K. Comparative studies on machine learning for paralinguistic signal compression and classification. J Supercomput 76, 8357–8371 (2020). https://doi.org/10.1007/s11227-020-03346-3