Learning Lung Nodule Malignancy Likelihood from Radiologist Annotations or Diagnosis Data


Lung cancer is the world’s most lethal type of cancer, and early diagnosis is crucial for successful treatment. Computer-aided diagnosis can play an important role in lung nodule detection and in establishing the nodule malignancy likelihood. This paper contributes to the design of a learning approach based on computed tomography (CT) images. Our methodology involves measuring a set of features in the nodule image region and training classifiers, such as K-nearest neighbor (KNN) or support vector machine (SVM), to compute the malignancy likelihood of lung nodules. For this purpose, we use the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI) database, because of its size and nodule variability and because it is publicly available. For training, we used both the radiologists’ labels and annotations and diagnosis data, such as biopsy, surgery and follow-up results. We obtained promising results: using an SVM with an exponential kernel combined with a correlation-based feature selection method, we achieved an area under the receiver operating characteristic curve (AUC) of 0.962 ± 0.005 for the radiologists’ data and 0.905 ± 0.04 for the diagnosis data.
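The abstract names the main building blocks of the pipeline: correlation-based feature selection (CFS), a KNN or SVM classifier producing a malignancy likelihood, and AUC as the evaluation measure. The Python sketch below (standard library only) illustrates these pieces on toy data; it is not the authors' implementation. Several choices are assumptions: the exponential kernel is taken as K(x, y) = exp(-||x - y|| / (2σ²)); CFS uses absolute Pearson correlation in Hall's merit formula with a greedy forward search; the KNN likelihood is the fraction of malignant neighbors. All function names are hypothetical.

```python
import math
from statistics import mean


def exponential_kernel(x, y, sigma=1.0):
    # Assumed form of the exponential kernel: exp(-||x - y|| / (2 * sigma^2)).
    # A callable like this could be passed to an SVM implementation that accepts custom kernels.
    return math.exp(-math.dist(x, y) / (2.0 * sigma ** 2))


def pearson(u, v):
    # Pearson correlation; returns 0.0 for a constant vector to avoid division by zero.
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return 0.0 if su == 0 or sv == 0 else cov / (su * sv)


def cfs_merit(subset, columns, labels):
    # Hall's CFS merit: k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|).
    # High when features track the label but are not redundant with each other.
    k = len(subset)
    r_cf = mean(abs(pearson(columns[f], labels)) for f in subset)
    if k == 1:
        return r_cf
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = mean(abs(pearson(columns[a], columns[b])) for a, b in pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward(columns, labels):
    # Greedy forward search: add the feature that most increases the merit; stop when none does.
    selected, best = [], 0.0
    remaining = list(range(len(columns)))
    while remaining:
        merit, f = max((cfs_merit(selected + [f], columns, labels), f) for f in remaining)
        if merit <= best:
            break
        selected.append(f)
        remaining.remove(f)
        best = merit
    return selected, best


def knn_likelihood(train_X, train_y, x, k=3):
    # Likelihood = fraction of the k nearest (Euclidean) training nodules labelled malignant (1).
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    return sum(y for _, y in nearest) / k


def auc(scores, labels):
    # Rank-based AUC: probability that a malignant case scores above a benign one (ties count 0.5).
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data only (not from LIDC-IDRI): 6 nodules, 3 features stored column-wise.
    labels = [0, 0, 0, 1, 1, 1]
    columns = [
        [1, 2, 3, 4, 5, 6],  # informative feature
        [1, 2, 4, 3, 5, 6],  # less informative and largely redundant with feature 0
        [3, 1, 2, 3, 1, 2],  # uninformative feature
    ]
    print(cfs_forward(columns, labels))  # selects only feature 0
    print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

On this toy data, CFS keeps only feature 0: adding feature 1 brings little new label information while being strongly correlated with feature 0, so the merit drops. In practice an SVM (for example, one accepting a custom kernel callable) would replace the KNN scorer, and AUC would be averaged over cross-validation folds to obtain mean ± standard deviation values like those reported above.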






This work is financed by the ERDF—European Regional Development Fund through the Operational Programme for Competitiveness and Internationalisation—COMPETE 2020 Programme, and by National Funds through the Portuguese funding agency, FCT—Fundação para a Ciência e a Tecnologia, within the project with code POCI-01-0145-FEDER-016673 and the Grant Contract SFRH/BPD/85663/2012 (J. Novo).

Author information



Corresponding author

Correspondence to Jorge Novo.

About this article

Cite this article

Gonçalves, L., Novo, J., Cunha, A. et al. Learning Lung Nodule Malignancy Likelihood from Radiologist Annotations or Diagnosis Data. J. Med. Biol. Eng. 38, 424–442 (2018). https://doi.org/10.1007/s40846-017-0317-2


  • Lung nodules
  • Computer-aided diagnosis
  • Thoracic computed tomography (CT) imaging
  • Feature selection
  • Malignancy likelihood