Studying Public Medical Images from the Open Access Literature and Social Networks for Model Training and Knowledge Extraction

  • Conference paper
MultiMedia Modeling (MMM 2020)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11962)

Abstract

Medical imaging research has long struggled to obtain large image collections, both because of privacy constraints and because having physicians annotate images is expensive. As public scientific challenges and funding agencies foster data sharing, repositories are becoming available, particularly for cancer research in the US. Still, data and annotations are most often limited to narrow domains and specific tasks. The medical literature (particularly articles indexed in MedLine) has been used for research for many years because it contains a large amount of medical knowledge. Most analyses have focused on text, for example creating semi-automated systematic reviews, aggregating content on specific genes and their functions, or supporting information retrieval to access specific content. Research on images from the medical literature has been more limited, because MedLine abstracts are publicly available but include no images. With PubMed Central, all the biomedical open access literature has become accessible for analysis, with images and text in a structured format, which makes such data easier to use than data extracted from PDF files. This article reviews existing work on analyzing images from the biomedical literature and develops ideas on how such images can become useful and usable for a variety of tasks, including finding visual evidence for rare or unusual cases. These resources make it possible to train machine learning tools on more diverse data, which may improve the robustness of the resulting classifiers. Examples with histopathology data available on Twitter already show promising possibilities. This article also points to other accessible sources, for example those provided via the ImageCLEF challenges.
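To illustrate why a structured format is easier to work with than PDF, the following minimal sketch pulls figure captions and image references out of an article's XML. It assumes the article follows JATS-style markup (`fig`, `label`, `caption`, and `graphic` elements with an `xlink:href` attribute). The embedded sample document, the function name `extract_figures`, and the file references are hypothetical and exist only for illustration; this is not the pipeline used by the authors.

```python
import xml.etree.ElementTree as ET

# Namespace-qualified attribute name used by JATS <graphic> elements.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical, hand-written sample in JATS-style markup (not a real article).
SAMPLE_JATS = """<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <body>
    <fig id="F1">
      <label>Figure 1</label>
      <caption><p>H&amp;E stained section, <italic>40x</italic> magnification.</p></caption>
      <graphic xlink:href="example-g001"/>
    </fig>
    <fig id="F2">
      <label>Figure 2</label>
      <caption><p>Axial CT of the chest.</p></caption>
      <graphic xlink:href="example-g002"/>
    </fig>
  </body>
</article>"""


def extract_figures(jats_xml):
    """Return one dict per <fig> element with its id, label, caption text and image reference."""
    root = ET.fromstring(jats_xml)
    figures = []
    for fig in root.iter("fig"):
        caption = fig.find("caption")
        # Search descendants too, since the graphic may be nested inside a wrapper element.
        graphic = fig.find(".//graphic")
        # Flatten inline markup (e.g. <italic>) and collapse whitespace.
        text = " ".join("".join(caption.itertext()).split()) if caption is not None else ""
        figures.append({
            "id": fig.get("id"),
            "label": (fig.findtext("label") or "").strip(),
            "caption": text,
            "image_ref": graphic.get(XLINK_HREF) if graphic is not None else None,
        })
    return figures


if __name__ == "__main__":
    for figure in extract_figures(SAMPLE_JATS):
        print(figure["label"], "|", figure["image_ref"], "|", figure["caption"])
```

Pairing each image reference with its caption in this way yields weakly labeled image and text pairs, which is the kind of data that could then be filtered by image type before training a classifier.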

Notes

  1. https://www.cancerimagingarchive.net/

  2. https://www.genome.gov/Funded-Programs-Projects/Cancer-Genome-Atlas

  3. http://www.ncbi.nlm.nih.gov/pmc/

Acknowledgment

This work was partially funded by the EU H2020 ExaMode project (grant agreement 825292).

Author information

Corresponding author

Correspondence to Henning Müller.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Müller, H., Andrearczyk, V., del Toro, O.J., Dhrangadhariya, A., Schaer, R., Atzori, M. (2020). Studying Public Medical Images from the Open Access Literature and Social Networks for Model Training and Knowledge Extraction. In: Ro, Y., et al. MultiMedia Modeling. MMM 2020. Lecture Notes in Computer Science, vol 11962. Springer, Cham. https://doi.org/10.1007/978-3-030-37734-2_45

  • DOI: https://doi.org/10.1007/978-3-030-37734-2_45

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-37733-5

  • Online ISBN: 978-3-030-37734-2

  • eBook Packages: Computer Science (R0)
