A deep learning approaches and fastai text classification to predict 25 medical diseases from medical speech utterances, transcription and intent

Kumar, Yogesh; Koul, Apeksha; Mahajan, Seema

doi:10.1007/s00500-022-07261-y


Foundations
Published: 25 July 2022

Volume 26, pages 8253–8272 (2022)

Soft Computing


Abstract

This article examines deep learning models and the fastai text classification technique for classifying medical speech utterances, transcriptions, and intents into 25 medical problems. The experiments used a large dataset of 6661 .wav files and one .csv file containing 13 distinct fields describing the medical speech utterances. Exploratory data analysis for each illness showed the distribution of phrase lengths and the disease categories derived from patients' recorded speech. Preprocessing involved building a word cloud of the full vocabulary, with word sizes reflecting the number of speech utterances in each category; eliminating NaN values; checking for duplicates; and computing the corpus and its term index. Features were then extracted to determine the number of words in each category, the length of the phrases, and the number of words in each phrase, followed by lemmatization and tokenization. Deep learning models, namely GRU (Gated Recurrent Unit), LSTM (Long Short-Term Memory), bidirectional GRU, bidirectional LSTM, and a fastai classifier, were used to extract the disease category from the medical speech utterances and their textual phrases. The evaluation showed that fastai achieved the highest precision (96.89%), recall (95.8%), and accuracy (93.32%), and the lowest loss (0.169). The bidirectional LSTM, by comparison, achieved the highest F1 score (95.69%) in predicting the category of each medical speech utterance.
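The preprocessing and feature-extraction steps listed in the abstract can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' code: the sample rows and labels are invented, the function names `clean_rows`, `tokenize`, `build_term_index`, and `phrase_features` are hypothetical, "phrase length" is taken to mean character count, and lemmatization is left out because it needs an external lexicon (for example NLTK's `WordNetLemmatizer`).

```python
import re
from collections import defaultdict

# Hypothetical (phrase, category) rows standing in for the CSV file.
# Values are illustrative only, not taken from the paper's dataset.
RAW_ROWS = [
    ("My knee hurts when I walk", "Knee pain"),
    ("My knee hurts when I walk", "Knee pain"),   # exact duplicate
    ("I feel a sharp pain in my stomach", "Stomach ache"),
    (None, "Stomach ache"),                        # missing phrase (NaN)
    ("There is a rash on my arm", "Skin issue"),
]

TOKEN_RE = re.compile(r"[a-z']+")


def clean_rows(rows):
    """Drop rows with a missing phrase or label, then drop exact duplicates."""
    seen = set()
    cleaned = []
    for phrase, label in rows:
        if not phrase or not label:
            continue  # stands in for eliminating NaN values
        key = (phrase.strip(), label)
        if key in seen:
            continue  # duplicate check
        seen.add(key)
        cleaned.append(key)
    return cleaned


def tokenize(phrase):
    """Lowercase the phrase and split it into word tokens (no lemmatization)."""
    return TOKEN_RE.findall(phrase.lower())


def build_term_index(rows):
    """Assign each vocabulary word an integer id, starting at 1 (0 left free for padding)."""
    vocab = sorted({tok for phrase, _ in rows for tok in tokenize(phrase)})
    return {word: i + 1 for i, word in enumerate(vocab)}


def phrase_features(rows):
    """Return words per category and, for each phrase, its character length and word count."""
    words_per_category = defaultdict(int)
    per_phrase = []
    for phrase, label in rows:
        tokens = tokenize(phrase)
        words_per_category[label] += len(tokens)
        per_phrase.append({"label": label, "n_chars": len(phrase), "n_words": len(tokens)})
    return dict(words_per_category), per_phrase


if __name__ == "__main__":
    rows = clean_rows(RAW_ROWS)
    term_index = build_term_index(rows)
    counts, per_phrase = phrase_features(rows)
    print(len(rows), "phrases,", len(term_index), "vocabulary words")
    print(counts)
```

Starting the ids at 1 leaves 0 available as a padding id, a common convention when encoded phrases are batched for recurrent models such as the GRU and LSTM variants named in the abstract.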


Data availability

Not applicable.


Funding

Not applicable.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, School of Technology, Pandit Deendayal Energy University, Gandhinagar, Gujarat, India
Yogesh Kumar
Department of Computer Science and Engineering, Punjabi University, Patiala, India
Apeksha Koul
Department of Computer Engineering, Indus Institute of Technology & Engineering, Indus University, Rancharda, Shilaj, Ahmedabad, 382115, Gujarat, India
Seema Mahajan


Corresponding author

Correspondence to Yogesh Kumar.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Kumar, Y., Koul, A. & Mahajan, S. A deep learning approaches and fastai text classification to predict 25 medical diseases from medical speech utterances, transcription and intent. Soft Comput 26, 8253–8272 (2022). https://doi.org/10.1007/s00500-022-07261-y


Accepted: 17 May 2022
Published: 25 July 2022
Issue Date: September 2022
DOI: https://doi.org/10.1007/s00500-022-07261-y
