Quality and Curation of Medical Images and Data

  • Peter M. A. van Ooijen


As more medical data is collected in digital format, the use and reuse of this data is also increasing. This introduces new challenges in the selection, de-identification, storage and handling of imaging data. When building large data collections for training and validating machine learning models, simply collecting a lot of data is not enough: the quality of the data must be sufficient for the intended application in order to obtain valid results. This chapter discusses data quality by examining the curation of medical images and related data, and the different aspects involved in that process as we move forward in the era of AI.


Keywords: Data curation · Medical imaging data · De-identification · Data quality · Data discovery · Data reuse

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. University of Groningen, University Medical Center Groningen, Groningen, The Netherlands