ChestX-ray: Hospital-Scale Chest X-ray Database and Benchmarks on Weakly Supervised Classification and Localization of Common Thorax Diseases

  • Xiaosong Wang
  • Yifan Peng
  • Le Lu
  • Zhiyong Lu
  • Mohammadhadi Bagheri
  • Ronald M. Summers
Chapter
Part of the Advances in Computer Vision and Pattern Recognition book series (ACVPR)

Abstract

The chest X-ray is one of the most commonly accessible radiological examinations for screening and diagnosis of many lung diseases. A tremendous number of X-ray imaging studies, accompanied by radiological reports, are accumulated and stored in the picture archiving and communication systems (PACS) of many modern hospitals. On the other hand, it is still an open question how this type of hospital-scale knowledge database, containing invaluable but only loosely labeled imaging informatics, can be used to facilitate data-hungry deep learning paradigms in building truly large-scale, high-precision computer-aided diagnosis (CAD) systems. In this chapter, we present a chest X-ray database, namely “ChestX-ray”, which comprises 112,120 frontal-view X-ray images of 30,805 unique patients, with eight disease image labels (each image can have multiple labels) text-mined from the associated radiological reports using natural language processing. Importantly, we demonstrate that these commonly occurring thoracic diseases can be detected and even spatially located via a unified weakly supervised multi-label image classification and disease localization framework, which is validated using our proposed dataset. Although the initial quantitative results reported here are promising, deep convolutional neural network-based “reading chest X-rays” (i.e., recognizing and locating the common disease patterns when trained with only image-level labels) remains a strenuous task for fully automated high-precision CAD systems.
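
The abstract's key claim is that a network trained with only image-level labels can still indicate where a disease appears. The pure-Python sketch that follows illustrates that general idea with class activation mapping (CAM, Zhou et al. 2015): the class weights learned on globally pooled features for the image-level score are reused spatially to build a heatmap, which is then thresholded into a bounding box. It is not the chapter's exact framework. Global average pooling, the toy 4x4 feature maps, the weights, the 0.5 relative threshold and all function names are assumptions for illustration, and a real pipeline would upsample the heatmap to image resolution before drawing a box.

```python
import math
from typing import List, Optional, Tuple

Grid = List[List[float]]  # an H x W feature map or heatmap


def global_average_pool(fmap: Grid) -> float:
    """Average of one feature channel over all spatial positions (assumed pooling)."""
    return sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))


def image_level_score(features: List[Grid], weights: List[float], bias: float = 0.0) -> float:
    """Sigmoid of a linear layer over pooled channels: the only quantity
    that an image-level (present / absent) label supervises."""
    logit = bias + sum(w * global_average_pool(f) for f, w in zip(features, weights))
    return 1.0 / (1.0 + math.exp(-logit))


def class_activation_map(features: List[Grid], weights: List[float]) -> Grid:
    """Reuse the same class weights spatially: M(y, x) = sum_k w_k * f_k(y, x)."""
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per feature channel")
    h, w = len(features[0]), len(features[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for fmap, wk in zip(features, weights):
        for y in range(h):
            for x in range(w):
                cam[y][x] += wk * fmap[y][x]
    return cam


def bounding_box(cam: Grid, rel_threshold: float = 0.5) -> Optional[Tuple[int, int, int, int]]:
    """Box (x_min, y_min, x_max, y_max) enclosing every cell whose activation is
    at least rel_threshold * peak; None if the map has no positive response."""
    peak = max(max(row) for row in cam)
    if peak <= 0:
        return None
    cut = rel_threshold * peak
    cells = [(x, y) for y, row in enumerate(cam) for x, v in enumerate(row) if v >= cut]
    xs = [x for x, _ in cells]
    ys = [y for _, y in cells]
    return min(xs), min(ys), max(xs), max(ys)


if __name__ == "__main__":
    # Toy 4x4 example: channel 0 fires in the upper-left 2x2 block,
    # channel 1 is a flat background response.
    f0 = [[1.0 if (y < 2 and x < 2) else 0.0 for x in range(4)] for y in range(4)]
    f1 = [[0.2] * 4 for _ in range(4)]
    weights = [2.0, -0.5]  # hypothetical learned class weights
    print(image_level_score([f0, f1], weights))  # roughly 0.6
    print(bounding_box(class_activation_map([f0, f1], weights)))  # (0, 0, 1, 1)
```

The box marks the upper-left block where channel 0 responds, even though no spatial label was ever used. Because the box encloses every above-threshold cell, two separate hot spots would be merged into one box; separating them would need an extra connected-component step, omitted here.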

Notes

Acknowledgements

This work was supported by the Intramural Research Programs of the NIH Clinical Center and National Library of Medicine. We thank NVIDIA Corporation for the GPU donation.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Xiaosong Wang (1)
  • Yifan Peng (2)
  • Le Lu (3, 4)
  • Zhiyong Lu (2)
  • Mohammadhadi Bagheri (5)
  • Ronald M. Summers (5)

  1. Nvidia Corporation, Bethesda, USA
  2. National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, USA
  3. PAII Inc., Bethesda Research Lab, Bethesda, USA
  4. Johns Hopkins University, Baltimore, USA
  5. Imaging Biomarkers and Computer-Aided Diagnosis Laboratory, Radiology and Imaging Sciences Department, Clinical Center, National Institutes of Health, Bethesda, USA