Abstract
Mitigating the curse of dimensionality in the acoustic features of speech signals has recently become a central objective in machine-learning-based emotion detection. Contemporary emotion prediction methods suffer from false alarms caused by the high dimensionality of the features used during the training phase of machine learning models. Most contemporary models attempt to handle the high dimensionality of the training corpus, yet they focus chiefly on fusing multiple classifiers, which barely improves decision accuracy when the training corpus is large. This manuscript contributes a novel ensemble model that uses a fusion of diversity measures to suggest optimal features. In addition, the proposed method reduces the impact of high dimensionality in feature values through a novel clustering process. The experimental study demonstrates the performance of the proposed method for emotion prediction from speech signals in comparison with contemporary machine-learning-based emotion detection models. Fourfold cross-validation on a standard data corpus was used in the performance analysis.
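The pipeline summarized above, selecting features by fusing several relevance/diversity criteria and then evaluating with fourfold cross-validation, can be illustrated with a minimal sketch. This is not the authors' exact DFODM method; the two scoring criteria, the rank-averaging fusion, and the synthetic feature matrix standing in for acoustic features are all assumptions made for illustration.

```python
# Illustrative sketch (not the paper's exact method): fuse two feature
# relevance measures by averaging their rank positions, keep the top-ranked
# features, and evaluate the reduced set with fourfold cross-validation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import f_classif, mutual_info_classif
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for an acoustic feature matrix (e.g., MFCC statistics).
X, y = make_classification(n_samples=300, n_features=40, n_informative=8,
                           random_state=0)

# Rank features under each measure (rank 0 = most relevant).
mi_rank = np.argsort(np.argsort(-mutual_info_classif(X, y, random_state=0)))
f_rank = np.argsort(np.argsort(-f_classif(X, y)[0]))

# Fuse the two measures by averaging their ranks, then keep the 10 best.
fused_rank = (mi_rank + f_rank) / 2.0
selected = np.argsort(fused_rank)[:10]

# Fourfold cross-validation on the reduced feature set.
scores = cross_val_score(RandomForestClassifier(random_state=0),
                         X[:, selected], y, cv=4)
print(round(scores.mean(), 3))
```

Rank averaging is only one of many fusion rules; the paper's diversity measures and clustering step would replace the two `sklearn` scorers used here.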
Abbreviations
- SER: Speech emotion recognition
- ML: Machine learning
- MFC: Mel-frequency cepstral coefficient
- LPCC: Linear predictive cepstral coefficient
- PLP: Perceptual linear prediction
- ZCR: Zero-crossing rate
- DNN: Deep neural network
- CWT: Continuous wavelet transforms
- EMD: Empirical mode decomposition
- MS: Modulation spectral
- HAF: Hybrid acoustic features
- DFODM: Digital feature optimization using fusion of diversity measures
- WSR: Wilcoxon signed-rank
Cite this article
Ashok Kumar, K., Iqbal, J.L.M. Handling high dimensional features by ensemble learning for emotion identification from speech signal. Int J Speech Technol 25, 837–851 (2022). https://doi.org/10.1007/s10772-021-09916-x