Evaluation of the accuracy of an artificial intelligence in identifying contraindications to exercise therapy - Comparison with and interrater reliability of physical therapists judgments

Annika, Griefahn; Christoff, Zalpour; Kerstin, Luedtke

doi:10.1007/s12553-024-00827-w

Original Paper
Published: 20 February 2024

Volume 14, pages 513–522 (2024)

Health and Technology

Abstract

Purpose

The study validates a rule-based system for identifying contraindications to exercise therapy in a medical context. The system's accuracy and performance are evaluated by comparing its output with physical therapists' assessments and with patients' characteristics.

Method

The dataset comprised 80 patient cases with clinical characteristics, each assessed by 20 physical therapists for contraindications to exercise therapy. Fleiss' kappa and pooled kappa were used to measure agreement among the physical therapists and between the therapists and the AI. AI performance was assessed using sensitivity, specificity, accuracy and F1 score. Clinical characteristics were compared between therapists' votes using ANOVA with a Bonferroni post-hoc test.
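As a rough illustration of the two agreement statistics named above, the following sketch computes Fleiss' kappa from a table of rating counts and a pooled kappa from per-item observed and expected agreement. The function names, the input layout and the toy numbers are assumptions made for this example; they are not taken from the study, and the authors' actual computation may differ.

```python
import math
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of
    raters who assigned case i to category j (hypothetical layout)."""
    n_cases = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every case needs the same number of ratings")
    n_categories = len(counts[0])

    # Share of all ratings that fell into each category.
    p_j = [
        sum(row[j] for row in counts) / (n_cases * n_raters)
        for j in range(n_categories)
    ]
    # Agreement among rater pairs within each case.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_i) / n_cases
    p_expected = sum(p * p for p in p_j)
    if math.isclose(p_expected, 1.0):
        return float("nan")  # kappa is undefined when only one category is used
    return (p_observed - p_expected) / (1 - p_expected)


def pooled_kappa(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Pooled kappa: average observed and expected agreement across items
    first, then form a single kappa from the two averages."""
    p_observed = sum(observed) / len(observed)
    p_expected = sum(expected) / len(expected)
    return (p_observed - p_expected) / (1 - p_expected)


# Toy data (not study data): 3 cases, 20 raters, 2 categories
# ("no contraindication", "contraindication exists").
toy_counts = [[20, 0], [0, 20], [10, 10]]
toy_kappa = fleiss_kappa(toy_counts)
toy_pooled = pooled_kappa([0.9, 0.8], [0.5, 0.5])
```

Averaging the agreement terms before forming the ratio is what distinguishes the pooled value from a simple mean of per-item kappas, which can be unstable when individual items have very skewed ratings.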

Results

The physical therapists had a mean age of 40.85 (8.23) years and a mean experience of 14.53 (8.20) years. Of the 80 patient cases, the therapists reached consensus on 64: 35 cases with no contraindication and 29 cases in which a contraindication to exercise therapy exists. In the remaining 16 cases there was no consensus between therapists. Overall, the therapists showed 87.5% agreement, with a Fleiss' kappa of κ_π = .43. The pooled kappa value between therapists and AI was κ_pooled = .63. The AI achieved perfect values (1) for sensitivity, specificity, accuracy and F1 score. Comparisons between the therapists' consensus groups revealed statistically significant differences in pain intensity, duration, timing, and quality.
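To make the reported performance values concrete, the sketch below derives sensitivity, specificity, accuracy and F1 score from a 2×2 confusion matrix. It assumes, purely for illustration, that the therapists' consensus on the 64 cases served as the reference standard and that the AI matched it in every case (29 true positives, 35 true negatives, no errors); the abstract does not spell out the confusion matrix, and the function name is hypothetical.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard binary classification metrics from confusion-matrix counts.
    'Positive' here means 'contraindication exists'."""

    def safe_div(numerator: float, denominator: float) -> float:
        # Return NaN instead of raising when a metric is undefined.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": safe_div(tp, tp + fn),            # true positive rate
        "specificity": safe_div(tn, tn + fp),            # true negative rate
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "f1": safe_div(2 * tp, 2 * tp + fp + fn),        # harmonic mean of precision and recall
    }


# Assumed counts: AI agrees with the therapists' consensus on all 64 cases.
consensus_metrics = binary_metrics(tp=29, fp=0, tn=35, fn=0)
# Every value in consensus_metrics is 1.0 under this assumption.
```

Under this reading, the 16 cases without therapist consensus do not enter the confusion matrix, since they have no agreed reference label.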

Conclusion

The study shows significant agreement between physical therapists and the AI, consistent with similar musculoskeletal studies. Various clinical characteristics highlight the importance of clinical reasoning and contraindication detection. In conclusion, advanced technologies such as decision support and expert systems could have a profound impact on clinical practice by improving accuracy, personalizing exercises and supporting telemedicine referrals, leading to more efficient care and better-informed patient decisions.

Trial registration

Registered on 30 December 2021 via OSF Registries, https://doi.org/10.17605/OSF.IO/YCNJQ.

Data availability statement

The data can be requested from the corresponding author.

Acknowledgements

The authors would like to thank, among others, Philipp Schlüter, Andres Jung, Prof. Dr. Ursula Hübner and Steffen Schulz for their help with specific statistical questions. We would also like to thank medicalmotion GmbH for providing the patient cases.

Funding

The authors declare that no funding, grants or other support was received during the preparation of this manuscript.

Author information

Authors and Affiliations

Department of Physiotherapy, Institute of Health Sciences, Universität zu Lübeck, Ratzeburger Allee 160, 23562, Lübeck, Germany
Griefahn Annika & Luedtke Kerstin
Faculty Business Management and Social Sciences, University of Applied Science Osnabrück, Albrechtstraße 30, 49076, Osnabrück, Germany
Griefahn Annika & Zalpour Christoff
Medicalmotion GmbH, Blütenstraße 15, 80799, Munich, Germany
Griefahn Annika

Authors

Griefahn Annika
Zalpour Christoff
Luedtke Kerstin

Contributions

Conceptualization: Griefahn, Annika; Luedtke, Kerstin; Zalpour, Christoff. Methodology: Griefahn, Annika; Luedtke, Kerstin. Formal analysis and investigation: Griefahn, Annika. Writing—original draft preparation: Griefahn, Annika; Luedtke, Kerstin; Zalpour, Christoff. Writing—review and editing: Griefahn, Annika; Luedtke, Kerstin; Zalpour, Christoff. Supervision: Luedtke, Kerstin; Zalpour, Christoff.

Corresponding author

Correspondence to Griefahn Annika.

Ethics declarations

Ethics approval

Approval was obtained from the ethics committee of the University of Applied Science Osnabrück (ID: HSOS/2021/1/3). The procedures used in this study adhere to the tenets of the Declaration of Helsinki.

Consent to participate

All participants provided written informed consent following a detailed explanation of the study’s purpose.

Consent for publication

Not applicable.

Competing interests

AG is an employee of medicalmotion GmbH. The remaining authors declare no competing interests.

Disclaimer

All authors have read and approved the final version of the manuscript. All authors agree that they are responsible for all aspects of the work, and they will ensure that issues related to the accuracy or integrity of any part of the work are adequately investigated and resolved.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 29 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Annika, G., Christoff, Z. & Kerstin, L. Evaluation of the accuracy of an artificial intelligence in identifying contraindications to exercise therapy - Comparison with and interrater reliability of physical therapists judgments. Health Technol. 14, 513–522 (2024). https://doi.org/10.1007/s12553-024-00827-w

Received: 14 December 2023
Accepted: 05 February 2024
Published: 20 February 2024
Issue Date: May 2024
DOI: https://doi.org/10.1007/s12553-024-00827-w
