Cloud-Based NoSQL Open Database of Pulmonary Nodules for Computer-Aided Lung Cancer Diagnosis and Reproducible Research

Abstract

Lung cancer is the leading cause of cancer-related deaths in the world, and its main manifestation is pulmonary nodules. Detection and classification of pulmonary nodules are challenging tasks that must be performed by qualified specialists, and image interpretation errors make those tasks even harder. To aid radiologists with these tasks, it is important to integrate computer-based tools into the lesion detection, pathology diagnosis, and image interpretation processes. However, computer-aided diagnosis research suffers from a shortage of shared medical reference data for the development, testing, and evaluation of computational diagnosis methods. To mitigate this problem, this paper presents a public, nonrelational, document-oriented, cloud-based database of pulmonary nodules characterized by 3D texture attributes, identified by experienced radiologists, and classified by the same specialists according to nine subjective characteristics. Our goal in developing this database is to improve research on computer-aided lung cancer diagnosis and on pulmonary nodule detection and classification by deploying it in a cloud Database as a Service framework. Pulmonary nodule data were provided by the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI), image descriptors were obtained by volumetric texture analysis, and the database schema was developed using a document-oriented Not only Structured Query Language (NoSQL) approach. The database currently holds 379 exams, 838 nodules, and 8237 images, of which 4029 are CT scans and 4208 are manually segmented nodules, and it is hosted in a MongoDB instance on a cloud infrastructure.
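To make the document-oriented layout described in the abstract more concrete, the following is a minimal sketch of how one exam, with its nodules and images embedded, might be represented as a JSON-like document. It is an illustration under stated assumptions, not the schema used by the authors: the field names (`examId`, `noduleId`, `ratings`, `textureAttributes3D`, `images`, `kind`), the helper functions, and all values are hypothetical placeholders. The nine characteristic names follow the LIDC-IDRI annotation vocabulary. The sketch uses plain Python dictionaries rather than a live MongoDB connection so that it runs without a server.

```python
import json

# The nine subjective characteristics rated by radiologists in LIDC-IDRI.
SUBJECTIVE_CHARACTERISTICS = (
    "subtlety", "internalStructure", "calcification", "sphericity",
    "margin", "lobulation", "spiculation", "texture", "malignancy",
)


def make_nodule(nodule_id, ratings, texture_attributes, images):
    """Build a nodule sub-document (hypothetical layout, not the paper's schema)."""
    missing = sorted(set(SUBJECTIVE_CHARACTERISTICS) - set(ratings))
    if missing:
        raise ValueError(f"missing ratings: {missing}")
    return {
        "noduleId": nodule_id,
        "ratings": dict(ratings),
        "textureAttributes3D": dict(texture_attributes),
        "images": list(images),
    }


def count_images(exams):
    """Count CT and manually segmented images across a list of exam documents."""
    counts = {"ct": 0, "segmented": 0}
    for exam in exams:
        for nodule in exam["nodules"]:
            for image in nodule["images"]:
                counts[image["kind"]] += 1
    return counts


# One exam with a single embedded nodule; every value is a placeholder.
example_exam = {
    "examId": "exam-0001",
    "nodules": [
        make_nodule(
            "nodule-1",
            ratings={name: 3 for name in SUBJECTIVE_CHARACTERISTICS},
            texture_attributes={"contrast": 0.0, "energy": 0.0},
            images=[
                {"kind": "ct", "slice": 1},
                {"kind": "segmented", "slice": 1},
            ],
        )
    ],
}

# A nested document like this serializes directly to JSON, the textual
# counterpart of the BSON format that MongoDB stores.
serialized = json.dumps(example_exam)

# Consistency check on the totals reported in the abstract.
REPORTED_CT, REPORTED_SEGMENTED, REPORTED_TOTAL = 4029, 4208, 8237
totals_consistent = REPORTED_CT + REPORTED_SEGMENTED == REPORTED_TOTAL
```

Embedding nodules and images inside the exam document is one common document-oriented design choice, since it lets a single read return everything about an exam. Whether the published database embeds or references these sub-documents is not stated in the abstract.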

Notes

  1. Available at www.morpheusdata.com/ [Online; accessed on June 14, 2016]

  2. Available at www.speedtest.net [Online; accessed on June 14, 2016]

Abbreviations

3DTA: 3D texture attributes
API: Application programming interface
BSON: Binary JavaScript Object Notation
CAD: Computer-aided diagnosis
CBIR: Content-based image retrieval
CR: Computed radiography
CT: Computed tomography
DBaaS: Database as a Service
DBMS: Database management system
DICOM: Digital Imaging and Communications in Medicine
DX: Digital radiography
FDA: Food and Drug Administration
IDRI: Image Database Resource Initiative
GLCM: Gray-level co-occurrence matrix
GUI: Graphical user interface
HIPAA: Health Insurance Portability and Accountability Act
IP: Internet protocol
JSON: JavaScript Object Notation
LIDC: Lung Image Database Consortium
NCI: National Cancer Institute
NLST: National Lung Screening Trial
NoSQL: Not only Structured Query Language
NSCLC: Nonsmall cell lung cancer
PR: (DICOM) PResentation state
PT: Positron emission tomography
QIBA: Quantitative Imaging Biomarkers Alliance
RDBMS: Relational Database Management System
RIDER: Reference Image Database to Evaluate Therapy Response
SEG: (DICOM) SEGmentation
SR: (DICOM) Structured Report document
ROI: Region of interest
TCIA: The Cancer Imaging Archive
XaaS: Everything as a Service
XML: eXtensible Markup Language

Acknowledgments

We thank the Brazilian institutions Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) and Fundação de Amparo à Pesquisa do Estado de Alagoas (FAPEAL) for their financial support in the form of a master's scholarship (grant number 20130603-002-0040-0063).

Author information

Corresponding author

Correspondence to José Raniery Ferreira Junior.

About this article

Cite this article

Ferreira Junior, J.R., Oliveira, M.C. & de Azevedo-Marques, P.M. Cloud-Based NoSQL Open Database of Pulmonary Nodules for Computer-Aided Lung Cancer Diagnosis and Reproducible Research. J Digit Imaging 29, 716–729 (2016). https://doi.org/10.1007/s10278-016-9894-9

  • DOI: https://doi.org/10.1007/s10278-016-9894-9

Keywords

  • Lung cancer
  • Pulmonary nodule
  • Lung Image Database Consortium
  • Image Database Resource Initiative
  • Computer-aided diagnosis
  • Computer-aided detection
  • 3D texture analysis
  • NoSQL
  • Document-oriented nonrelational database
  • MongoDB
  • Cloud computing
  • Database as a Service
  • Reproducible research