Stress Level Detection in Continuous Speech Using CNNs and a Hybrid Attention Layer

Subramani, R.; Suresh, K.; Donald, A. Cecil; Sivaselvan, K.

doi:10.1007/978-981-99-4071-4_29

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 731)

Included in the following conference series:

International Conference On Innovative Computing And Communication

Abstract

This paper targets stress detection by analyzing audio signals of human speech. Deep learning is used to model stress levels, and the Mel spectrograms of the audio signals are analyzed. A hybrid attention model is used to reach the reported result. The dataset used is DAIC-WOZ, which contains continuous speech recordings of conversations between a patient and a virtual assistant controlled by a human counselor in another room. The best result obtained was an accuracy of 78.7% on stress-level classification.
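The abstract names the Mel spectrogram as the signal representation but does not give the front-end settings. As a minimal sketch of how a log-Mel spectrogram can be computed from a waveform, the NumPy-only example below may help; the sample rate, FFT size, hop length, number of Mel bands, and all function names are illustrative assumptions, not values taken from the chapter.

```python
import numpy as np

def hz_to_mel(f):
    # Common (HTK-style) Hz-to-mel formula.
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters whose edges are equally spaced on the mel scale.
    edges_hz = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=160, n_mels=40):
    # All default parameters are assumptions chosen for illustration.
    signal = np.asarray(signal, dtype=float)
    if len(signal) < n_fft:
        # Zero-pad very short inputs so that at least one frame exists.
        signal = np.pad(signal, (0, n_fft - len(signal)))
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Index matrix that slices the signal into overlapping frames.
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (n_frames, n_fft//2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T     # (n_frames, n_mels)
    # Convert to decibels; the floor avoids log(0).
    return 10.0 * np.log10(np.maximum(mel, 1e-10)).T      # (n_mels, n_frames)

if __name__ == "__main__":
    # One second of a synthetic 440 Hz tone stands in for a speech segment.
    t = np.arange(16000) / 16000.0
    spec = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t))
    print(spec.shape)  # (40, 97)
```

The resulting array (Mel bands by time frames) is the kind of two-dimensional input a CNN can consume as an image-like feature map; how the chapter segments the recordings or normalizes the spectrograms is not stated in the abstract.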

Author information

Authors and Affiliations

Department of Mathematics, CHRIST (Deemed to be University), Bengaluru, 560029, India
R. Subramani
Department of Computer Science, CHRIST (Deemed to be University), Bengaluru, 560029, India
K. Suresh & A. Cecil Donald
Department of Mathematics, St. Thomas College of Arts and Science, Chennai, 600107, India
K. Sivaselvan

Corresponding author

Correspondence to R. Subramani.

Editor information

Editors and Affiliations

IT Department, Cairo University, Giza, Egypt
Aboul Ella Hassanien
Tijuana Institute of Technology, Tijuana, Mexico
Oscar Castillo
Department of Computer Science, Shaheed Sukhdev College of Business Studies, University of Delhi, Delhi, India
Sameer Anand
Department of Computer Science, Shaheed Sukhdev College of Business Studies, University of Delhi, New Delhi, India
Ajay Jaiswal

About this paper

Cite this paper

Subramani, R., Suresh, K., Donald, A.C., Sivaselvan, K. (2024). Stress Level Detection in Continuous Speech Using CNNs and a Hybrid Attention Layer. In: Hassanien, A.E., Castillo, O., Anand, S., Jaiswal, A. (eds) International Conference on Innovative Computing and Communications. ICICC 2023. Lecture Notes in Networks and Systems, vol 731. Springer, Singapore. https://doi.org/10.1007/978-981-99-4071-4_29

DOI: https://doi.org/10.1007/978-981-99-4071-4_29
Published: 26 October 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-4070-7
Online ISBN: 978-981-99-4071-4
eBook Packages: Engineering, Engineering (R0)
