Abstract
This work investigates Artificial Intelligence (AI) systems that detect respiratory insufficiency (RI) by analyzing speech audio, thus treating speech as an RI biomarker. Previous works [2, 6] collected RI data (P1) from COVID-19 patients during the first phase of the pandemic and trained modern AI models, such as CNNs and Transformers, which achieved \(96.5\%\) accuracy, showing the feasibility of RI detection via AI. Here, we collect data (P2) from RI patients with several causes besides COVID-19, aiming to extend AI-based RI detection. We also collect control data from hospital patients without RI. We show that the considered models, when trained on P1, do not generalize to P2, indicating that COVID-19 RI has features that may not be found in all RI types.
Partly supported by FAPESP grants 2020/16543-7 and 2020/06443-5, and by Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001. Carried out at the Center for Artificial Intelligence (C4AI-USP), supported by FAPESP grant 2019/07665-4 and by the IBM Corporation.
Notes
1. Initial tests attain above \(95\%\) accuracy (above 0.93 F1-score) when training and testing on P2 data in all 4 networks. So P2 is not harder; it is only different.
2. “O amor ao próximo ajuda a enfrentar essa fase com a força que a gente precisa” (“Love for one's neighbor helps us face this phase with the strength we need”).
3. The performance difference from resampling the audio is minimal.
4. Again, we use 20 epochs, a batch size of 16, and a learning rate of \(10^{-4}\), and the best models are saved.
5. ‘O’ (Other) and ‘CM’ represent controls. The other hospitals refer only to patients.
6. Other angles do not add much. Using the PANNs yields similar results.
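The first note reports both accuracy and F1-score. As a reminder of how the two differ, the following is a minimal sketch in plain Python; it is not the authors' evaluation code. The function name `accuracy_and_f1` and the toy labels are assumptions made purely for illustration, with label 1 standing for the RI class.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels.

    `positive` marks the class treated as positive (here assumed to be RI).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts for the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs) if pairs else 0.0
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Toy labels, made up for illustration (not data from the paper).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.3f}, F1={f1:.3f}")
```

Because F1 ignores true negatives, it can fall below accuracy when the positive class is the minority, which is why reporting both figures is informative.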
References
Aluísio, S.M., Camargo Neto, A.C.d, et al.: Detecting respiratory insufficiency via voice analysis: the SPIRA project. In: Practical Machine Learning for Developing Countries at ICLR 2022. Proceeding. ICLR (2022)
Casanova, E., Gris, L., et al.: Deep learning against COVID-19: respiratory insufficiency detection in Brazilian Portuguese speech. In: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pp. 625–633. ACL, August 2021
Devlin, J., Chang, M.W., et al.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
Fernandes-Svartman, F., Berti, L., et al.: Temporal prosodic cues for COVID-19 in Brazilian Portuguese speakers. In: Proceedings of Speech Prosody 2022, pp. 210–214 (2022)
Gauy, M., Finger, M.: Acoustic models for Brazilian Portuguese speech based on neural transformers (2023, submitted for publication)
Gauy, M.M., Finger, M.: Audio MFCC-gram transformers for respiratory insufficiency detection in COVID-19. In: STIL 2021, November 2021
Gauy, M.M., Finger, M.: Pretrained audio neural networks for speech emotion recognition in Portuguese. In: Automatic Speech Recognition for Spontaneous and Prepared Speech & Speech Emotion Recognition in Portuguese. CEUR-WS (2022)
Gemmeke, J.F., Ellis, D.P., et al.: Audio set: an ontology and human-labeled dataset for audio events. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 776–780. IEEE (2017)
Gong, Y., Lai, C.I., et al.: SSAST: self-supervised audio spectrogram transformer. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, pp. 10699–10709 (2022)
Khan, S., Naseer, M., et al.: Transformers in vision: a survey. ACM Comput. Surv. 54(10s) (2022)
Kong, Q., Cao, Y., et al.: PANNs: large-scale pretrained audio neural networks for audio pattern recognition. IEEE/ACM Trans. Audio Speech Lang. Process. 28, 2880–2894 (2020)
Liu, A.T., Yang, S.W., et al.: Mockingjay: unsupervised speech representation learning with deep bidirectional transformer encoders. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6419–6423. IEEE (2020)
Robotti, C., Costantini, G., et al.: Machine learning-based voice assessment for the detection of positive and recovered COVID-19 patients. J. Voice (2021)
da Silva, D.P.P., Casanova, E., et al.: Interpretability analysis of deep models for COVID-19 detection. arXiv preprint arXiv:2211.14372 (2022)
Vaswani, A., Shazeer, N., et al.: Attention is all you need. Adv. Neural. Inf. Process. Syst. 30, 5998–6008 (2017)
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Gauy, M.M. et al. (2023). Discriminant Audio Properties in Deep Learning Based Respiratory Insufficiency Detection in Brazilian Portuguese. In: Juarez, J.M., Marcos, M., Stiglic, G., Tucker, A. (eds) Artificial Intelligence in Medicine. AIME 2023. Lecture Notes in Computer Science, vol 13897. Springer, Cham. https://doi.org/10.1007/978-3-031-34344-5_32
DOI: https://doi.org/10.1007/978-3-031-34344-5_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-34343-8
Online ISBN: 978-3-031-34344-5
eBook Packages: Computer Science, Computer Science (R0)