Reporting Standards and Quality Assessment Tools in Artificial Intelligence–Centered Healthcare Research

Reference work entry in Artificial Intelligence in Medicine

Abstract

The practice of incomplete study reporting is rife within the scientific literature. It hinders the adoption of new technologies, generates considerable “research waste,” and represents a significant moral hazard. To combat this issue, there has been a shift towards the use of reporting standards and quality assessment tools, a move endorsed by major biomedical journals and other key stakeholders. These instruments serve (1) to improve the quality and completeness of study reporting and (2) to aid researchers in assessing a study’s risk of bias and applicability. They are developed through a careful, multistep evidence-generation process and are specific to individual study designs or specialties. Recently, it has been noted that many existing instruments are poorly suited to the reporting and assessment of artificial intelligence (AI)-based studies, which raise considerations that these instruments were not designed to address. As such, there has been a concerted effort to produce AI-specific extensions to preexisting instruments, such as CONSORT, SPIRIT, STARD, TRIPOD, QUADAS, and PROBAST. This chapter explains why AI-specific amendments to these instruments are required and highlights their contents and proposed scope.

Author information

Correspondence to Viknesh Sounderajah.

Copyright information

© 2022 Springer Nature Switzerland AG

About this entry

Cite this entry

Sounderajah, V. et al. (2022). Reporting Standards and Quality Assessment Tools in Artificial Intelligence–Centered Healthcare Research. In: Lidströmer, N., Ashrafian, H. (eds) Artificial Intelligence in Medicine. Springer, Cham. https://doi.org/10.1007/978-3-030-64573-1_34

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-64573-1_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-64572-4

  • Online ISBN: 978-3-030-64573-1

  • eBook Packages: Medicine, Reference Module Medicine
