Research Goal-Driven Data Model and Harmonization for De-Identifying Patient Data in Radiomics

  • Original Paper
Journal of Digital Imaging

Abstract

Various efforts have been made to de-identify patients’ radiation oncology data so that it can be used to advance medical research. Although the task of de-identification needs to be defined in the context of research goals and objectives, existing systems lack the flexibility in data modeling and in normalizing attribute names needed to accomplish this. In this work, we describe a de-identification process for radiation and clinical oncology data that is guided by a data model and a schema for dynamically capturing the domain ontology and normalizing terminologies, defined in tune with the research goals in this area. The radiological images are obtained in DICOM format and consist of diagnostic, radiation therapy (RT) treatment planning, RT verification, and RT response images. During DICOM de-identification, a few crucial pieces of information about the dataset are captured. The proposed model is generic: it organizes the information model in sync with the de-identification of a patient’s clinical information. The treatment and clinical data are provided in comma-separated values (CSV) format, which follows a predefined data structure. The de-identified data is harmonized throughout the entire process. We present four case studies on four different types of cancer, namely glioblastoma multiforme, head–neck, breast, and lung, along with experimental validation on a few patients’ data in these four areas. Several aspects are taken care of during de-identification: preservation of longitudinal date changes (LDC), incremental de-identification, referential data integrity between the clinical and image data, harmonization of de-identified data, and transformation of the data to an underlying database schema.
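The properties listed in the abstract can be made concrete with a small illustration. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a keyed hash (HMAC) is used to derive a stable pseudonym and a stable per-patient date shift, and that attribute names are harmonized through a lookup table. The key, the synonym table, and every function name here are hypothetical. Because each patient's dates all move by the same offset, intervals between events are preserved (LDC). Because the mapping is deterministic, clinical rows and image headers processed in separate batches still resolve to the same pseudonym.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical site secret; in practice it would be stored outside the released data.
SECRET_KEY = b"replace-with-a-site-secret"

# Hypothetical synonym table mapping local attribute names to one harmonized name.
ATTRIBUTE_SYNONYMS = {
    "pt_id": "patient_id",
    "patientid": "patient_id",
    "dx_date": "diagnosis_date",
    "date_of_diagnosis": "diagnosis_date",
    "rt_start": "rt_start_date",
}


def _digest(patient_id: str, purpose: str) -> bytes:
    """Keyed hash of the patient ID, separated by purpose ("id" or "date")."""
    msg = f"{purpose}:{patient_id}".encode("utf-8")
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).digest()


def pseudonym(patient_id: str) -> str:
    """Stable pseudonym: the same input always yields the same output."""
    return "ANON-" + _digest(patient_id, "id").hex()[:12]


def date_offset(patient_id: str, max_days: int = 365) -> timedelta:
    """Stable per-patient shift of 1..max_days days."""
    n = int.from_bytes(_digest(patient_id, "date")[:4], "big")
    return timedelta(days=1 + n % max_days)


def normalize(name: str) -> str:
    """Lower-case, underscore, then map known synonyms to the harmonized name."""
    key = name.strip().lower().replace(" ", "_")
    return ATTRIBUTE_SYNONYMS.get(key, key)


def deidentify_record(record: dict) -> dict:
    """Harmonize attribute names, replace the ID, and shift every date value."""
    harmonized = {normalize(k): v for k, v in record.items()}
    pid = harmonized["patient_id"]
    shift = date_offset(pid)
    out = {}
    for key, value in harmonized.items():
        if key == "patient_id":
            out[key] = pseudonym(pid)
        elif isinstance(value, date):
            out[key] = value - shift  # same shift for all of this patient's dates
        else:
            out[key] = value
    return out


# Usage: a clinical row and an image header for the same (made-up) patient.
clinical = deidentify_record(
    {"Pt_ID": "MRN001", "Dx Date": date(2019, 1, 10), "RT_Start": date(2019, 2, 9)}
)
image = deidentify_record({"PatientID": "MRN001", "study_date": date(2019, 1, 15)})
print((clinical["rt_start_date"] - clinical["diagnosis_date"]).days)  # 30, as in the input
print(clinical["patient_id"] == image["patient_id"])  # True
```

A keyed hash is only one way to obtain a repeatable mapping. A stored lookup table of assigned pseudonyms and offsets would serve the same purpose and may be preferable where the mapping must be auditable.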

Acknowledgements

This project is funded under the National Digital Library of India (NDLI), sponsored by the Ministry of Human Resource Development (MHRD), Govt. of India.

Funding

This study has been funded by the Ministry of Human Resource Development, India (IIT/SRIC/CS/NDM/2018-19/096). None of the authors have conflicts of interest to declare. The CHAVI protocol is approved by the institutional review board at the Tata Medical Center Kolkata, and a consent waiver was taken for storing data from retrospective studies. The reference number is EC/GOVT/24/IRB23, dated August 31, 2018. After the inception of the biobank, patients have given written informed consent for storing their images and clinical data in the biobank prospectively.

Author information

Corresponding author

Correspondence to Surajit Kundu.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A Appendix I

The Input and Output Clinical Datasets Used for the De-Identification Process

Table 6 The clinical dataset of a patient under INTELHOPE study
Table 7 The clinical dataset of a patient under Glioblastoma multiforme study

About this article

Cite this article

Kundu, S., Chakraborty, S., Mukhopadhyay, J. et al. Research Goal-Driven Data Model and Harmonization for De-Identifying Patient Data in Radiomics. J Digit Imaging 34, 986–1004 (2021). https://doi.org/10.1007/s10278-021-00476-9

  • DOI: https://doi.org/10.1007/s10278-021-00476-9
