Audio compression with multi-algorithm fusion and its impact in speech emotion recognition

International Journal of Speech Technology

Abstract

This study examines the impact of audio compression on speech emotion recognition when multi-algorithm fusion is used in place of traditional single-algorithm approaches. We propose an automatic emotion recognition system (AERS) based on multi-algorithm fusion, intended to monitor and identify a person's psychological/emotional state. Two prominent feature sets, Mel-frequency cepstral coefficients (MFCC) and discrete wavelet transform (DWT) features, are extracted from speech samples of the Berlin emotional database and a Telugu (a south Indian language) database, and the extracted features are classified into different emotional states using support vector machine (SVM) and k-nearest neighbour (k-NN) algorithms. Two state-of-the-art codecs, MP3 and Speex, are investigated at different bit-rates to establish how well emotional content remains intelligible after compression; the MP3 configuration at a 96 kbps bit-rate is recommended, as it achieves high compression for all emotions. Fusion also outperforms the individual algorithms: fusing DWT and MFCC features yields an accuracy of 94.2%, compared with 89.1% using DWT and 91.38% using MFCC separately. The accuracy of the proposed method is further improved to 94% through a multiresolution approach that approximates frequency information along with time information.
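
The sketch below illustrates the two stages the abstract describes: a lossy compression round-trip, followed by MFCC + DWT feature-level fusion classified with SVM and k-NN. It is a minimal sketch, not the authors' implementation: the tooling (ffmpeg, librosa, PyWavelets, scikit-learn) and every parameter (13 MFCCs, a db4 wavelet at decomposition level 4, per-band summary statistics, an RBF kernel, k = 5) are assumptions, since the abstract does not specify the configuration.

```python
import os
import subprocess
import tempfile

import numpy as np
import librosa
import pywt
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score


def mp3_roundtrip(in_wav, out_wav, bitrate="96k"):
    """Encode to MP3 at the given bit-rate and decode back to WAV,
    simulating the lossy compression stage studied in the paper."""
    tmp = os.path.join(tempfile.gettempdir(), "roundtrip.mp3")
    subprocess.run(["ffmpeg", "-y", "-i", in_wav, "-b:a", bitrate, tmp], check=True)
    subprocess.run(["ffmpeg", "-y", "-i", tmp, out_wav], check=True)
    os.remove(tmp)


def mfcc_features(y, sr, n_mfcc=13):
    """Fixed-length MFCC summary: per-coefficient mean and std over time."""
    m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([m.mean(axis=1), m.std(axis=1)])


def dwt_features(y, wavelet="db4", level=4):
    """Fixed-length DWT summary: log-energy and std per sub-band,
    giving the multiresolution (time + frequency) view."""
    feats = []
    for band in pywt.wavedec(y, wavelet, level=level):
        feats.extend([np.log(np.sum(band ** 2) + 1e-12), np.std(band)])
    return np.array(feats)


def fused_features(y, sr):
    """Feature-level fusion: concatenate the MFCC and DWT views."""
    return np.concatenate([mfcc_features(y, sr), dwt_features(y)])


# Usage sketch: `clips` is a list of waveforms at a common sample rate `sr`,
# `labels` their emotion classes (e.g. from Berlin Emo-DB); both are assumed
# to be loaded elsewhere.
# X = np.vstack([fused_features(y, sr) for y in clips])
# for clf in (SVC(kernel="rbf"), KNeighborsClassifier(n_neighbors=5)):
#     print(type(clf).__name__, cross_val_score(clf, X, labels, cv=5).mean())
```

Running `fused_features` on both the original clips and their `mp3_roundtrip` outputs is one way to reproduce the kind of bit-rate comparison the abstract reports.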

Acknowledgements

We thank all the volunteers who helped us create the Telugu database. The database is presently under review with the committee for endorsement and will be made publicly available.

Author information

Corresponding author

Correspondence to V. Vijayarajan.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Reddy, A.P., Vijayarajan, V. Audio compression with multi-algorithm fusion and its impact in speech emotion recognition. Int J Speech Technol 23, 277–285 (2020). https://doi.org/10.1007/s10772-020-09689-9
