CIRDataset: A Large-Scale Dataset for Clinically-Interpretable Lung Nodule Radiomics and Malignancy Prediction

Part of the Lecture Notes in Computer Science book series (LNCS, volume 13435)


Spiculations/lobulations, sharp/curved spikes on the surface of lung nodules, are good predictors of lung cancer malignancy and hence are routinely assessed and reported by radiologists as part of the standardized Lung-RADS clinical scoring criteria. Given the 3D geometry of the nodule and the 2D slice-by-slice assessment by radiologists, manual spiculation/lobulation annotation is a tedious task, and thus no public datasets exist to date for probing the importance of these clinically reported features in state-of-the-art (SOTA) malignancy prediction algorithms. As part of this paper, we release a large-scale Clinically-Interpretable Radiomics Dataset, CIRDataset, containing 956 radiologist QA/QC’ed spiculation/lobulation annotations on segmented lung nodules from two public datasets, LIDC-IDRI (N = 883) and LUNGx (N = 73). We also present an end-to-end deep learning model based on a multi-class extension of Voxel2Mesh to segment nodules (while preserving spikes), classify spikes (sharp/spiculation and curved/lobulation), and perform malignancy prediction. Previous methods have performed malignancy prediction for the LIDC and LUNGx datasets, but without robust attribution to any clinically reported/actionable features (due to known hyperparameter sensitivity issues with general attribution schemes). With the release of this comprehensively annotated CIRDataset and end-to-end deep learning baseline, we hope that malignancy prediction methods can validate their explanations, benchmark against our baseline, and provide clinically actionable insights. Dataset, code, pretrained models, and docker containers are available at
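The abstract frames spiculation and lobulation labels as clinically interpretable features that a malignancy predictor could be attributed to. As a minimal sketch, assuming a hypothetical encoding in which every vertex of a segmented nodule mesh carries one integer class code (0 = plain nodule surface, 1 = spiculation, 2 = lobulation), the snippet below reduces a mesh to per-class vertex fractions. The label codes, the input format, and the `spike_summary` helper are illustrative assumptions only; they are not the CIRDataset file format or the authors' code.

```python
from collections import Counter
from typing import Dict, Sequence

# Hypothetical per-vertex class codes, assumed for illustration only;
# CIRDataset's actual annotation encoding may differ.
LABELS = {0: "nodule", 1: "spiculation", 2: "lobulation"}


def spike_summary(vertex_labels: Sequence[int]) -> Dict[str, float]:
    """Return the fraction of mesh vertices assigned to each class.

    `vertex_labels` holds one integer class code per vertex of a
    segmented nodule mesh (hypothetical input format).
    """
    if len(vertex_labels) == 0:
        raise ValueError("mesh has no vertices")
    counts = Counter(vertex_labels)
    # Reject codes outside the assumed label scheme instead of ignoring them.
    unknown = set(counts) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    n = len(vertex_labels)
    return {name: counts.get(code, 0) / n for code, name in LABELS.items()}


if __name__ == "__main__":
    # Toy mesh: 6 plain-surface, 3 spiculation, and 1 lobulation vertex.
    print(spike_summary([0] * 6 + [1] * 3 + [2]))
```

On the toy mesh this prints `{'nodule': 0.6, 'spiculation': 0.3, 'lobulation': 0.1}`. Fractions rather than raw counts keep the summary comparable across meshes with different vertex counts. The abstract does not say whether the paper's malignancy model uses summaries of this kind.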


  • Lung nodule
  • Spiculation
  • Malignancy prediction

  • DOI: 10.1007/978-3-031-16443-9_2
  • Chapter length: 10 pages




This project was supported by MSK Cancer Center Support Grant/Core Grant (P30 CA008748) and by the Sidney Kimmel Cancer Center Support Grant (P30 CA056036).

Author information


Corresponding authors

Correspondence to Wookjin Choi or Saad Nadeem.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Choi, W., Dahiya, N., Nadeem, S. (2022). CIRDataset: A Large-Scale Dataset for Clinically-Interpretable Lung Nodule Radiomics and Malignancy Prediction. In: Wang, L., Dou, Q., Fletcher, P.T., Speidel, S., Li, S. (eds) Medical Image Computing and Computer Assisted Intervention – MICCAI 2022. MICCAI 2022. Lecture Notes in Computer Science, vol 13435. Springer, Cham.

  • DOI: 10.1007/978-3-031-16443-9_2

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-16442-2

  • Online ISBN: 978-3-031-16443-9

  • eBook Packages: Computer Science; Computer Science (R0)