A Standardised Approach for Preparing Imaging Data for Machine Learning Tasks in Radiology

Abstract

Medical imaging data is now extremely abundant, owing to over two decades of digitisation of imaging protocols and data storage formats. However, clean, well-curated data that is amenable to machine learning is relatively scarce, and AI developers are paradoxically data-starved. Imaging and clinical data are also heterogeneous, often unstructured and unlabelled, whereas current supervised and semi-supervised machine learning techniques rely on homogeneous, carefully annotated data. Imaging biobanks contain small volumes of well-curated data. Many machine learning developers hoping to train and validate computer vision algorithms, however, are focused on leveraging ‘big data’ from the front line of healthcare. The quest for sufficiently large volumes of clean data for training, validation and testing faces several hurdles: ethics and consent, security, assessment of data quality, ground-truth labelling, bias reduction, reusability and generalisability. In this chapter we propose a new medical imaging data readiness (MIDaR) scale, designed to objectively clarify data quality both for researchers seeking imaging data and for clinical providers aiming to share their data. We hope that the MIDaR scale will be used globally in collaborative academic and business conversations, so that all parties can more easily understand and quickly appraise the stages of data readiness for machine learning relevant to their AI development projects. We believe that the MIDaR scale could become essential in the design, planning and management of AI medical imaging projects, and could significantly increase their chances of success.

Acknowledgements

With thanks to Hugh Lyshkow, DesAcc Inc., for his invaluable input and insight.

Author information

Corresponding author: Hugh Harvey.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Harvey, H., Glocker, B. (2019). A Standardised Approach for Preparing Imaging Data for Machine Learning Tasks in Radiology. In: Ranschaert, E., Morozov, S., Algra, P. (eds) Artificial Intelligence in Medical Imaging. Springer, Cham. https://doi.org/10.1007/978-3-319-94878-2_6

  • DOI: https://doi.org/10.1007/978-3-319-94878-2_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-94877-5

  • Online ISBN: 978-3-319-94878-2

  • eBook Packages: Medicine, Medicine (R0)