Abstract
This paper targets stress detection by analyzing audio signals recorded from human speakers. Deep learning is used throughout to model stress levels, based on an analysis of the Mel spectrograms of the audio signals, and a hybrid attention model is used to reach the final result. The dataset used in this work is DAIC-WOZ, which contains continuous speech recordings of conversations between a patient and a virtual assistant controlled by a human counselor in another room. The best result obtained was 78.7% accuracy in classifying stress levels.
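The Mel spectrogram step mentioned in the abstract can be sketched as follows. This is a minimal NumPy-only sketch, not the authors' pipeline. It assumes a mono signal, a 512-point FFT with a Hann window, a 160-sample hop, 64 Mel bands, and the HTK Mel formula. These parameter values and the function names `mel_filterbank` and `log_mel_spectrogram` are illustrative assumptions; the paper's actual settings are not given here.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style Mel scale (assumed; other Mel formulas exist)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters evenly spaced on the Mel scale, shape (n_mels, n_fft // 2 + 1)
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=160, n_mels=64):
    # Frame the signal, apply a Hann window, take the power spectrum,
    # project it onto the Mel filters, and return log energies (n_mels, n_frames)
    signal = np.asarray(signal, dtype=float)
    if len(signal) < n_fft:
        signal = np.pad(signal, (0, n_fft - len(signal)))
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2  # (n_frames, n_bins)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T           # (n_frames, n_mels)
    return np.log(mel + 1e-10).T                                # (n_mels, n_frames)

if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 440.0 * t)  # one second of a 440 Hz test tone
    S = log_mel_spectrogram(tone, sr=sr)
    print(S.shape)  # (64, 97)
```

The resulting 2-D array (Mel bands by time frames) is the kind of image-like input a CNN can consume. In practice a library such as librosa is commonly used for this step instead of a hand-written filterbank.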
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Subramani, R., Suresh, K., Donald, A.C., Sivaselvan, K. (2024). Stress Level Detection in Continuous Speech Using CNNs and a Hybrid Attention Layer. In: Hassanien, A.E., Castillo, O., Anand, S., Jaiswal, A. (eds) International Conference on Innovative Computing and Communications. ICICC 2023. Lecture Notes in Networks and Systems, vol 731. Springer, Singapore. https://doi.org/10.1007/978-981-99-4071-4_29
DOI: https://doi.org/10.1007/978-981-99-4071-4_29
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-4070-7
Online ISBN: 978-981-99-4071-4
eBook Packages: Engineering, Engineering (R0)