Studying Public Medical Images from the Open Access Literature and Social Networks for Model Training and Knowledge Extraction

Müller, Henning; Andrearczyk, Vincent; del Toro, Oscar Jimenez; Dhrangadhariya, Anjani; Schaer, Roger; Atzori, Manfredo

doi:10.1007/978-3-030-37734-2_45

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11962)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

Medical imaging research has long struggled to obtain large image collections, both because of privacy constraints and because annotation by physicians is costly. With public scientific challenges and funding agencies fostering data sharing, repositories are becoming available, particularly for cancer research in the US. Still, data and annotations are most often limited to narrow domains and specific tasks. The medical literature (particularly the articles contained in MedLine) has been used for research for many years, as it contains a large amount of medical knowledge. Most analyses have focused on text, for example to create semi-automated systematic reviews, to aggregate content on specific genes and their functions, or to support information retrieval for specific content. Research on images from the medical literature has been more limited, because MedLine abstracts are publicly available but include no images. With PubMed Central, all the biomedical open access literature has become accessible for analysis, with images and text in a structured format, which makes such data easier to use than data extracted from PDF. This article reviews existing work on analyzing images from the biomedical literature and develops ideas on how such images can become useful and usable for a variety of tasks, including finding visual evidence for rare or unusual cases. These resources offer possibilities to train machine learning tools, increasing the diversity of available data and thus possibly the robustness of the classifiers. Examples with histopathology data available on Twitter already show promising possibilities. This article also points to other accessible sources, for example via the ImageCLEF challenges.

Acknowledgment

This work was partially funded by the EU H2020 ExaMode project (grant agreement 825292).

Author information

Authors and Affiliations

University of Applied Sciences Western Switzerland (HES-SO), Sierre, Switzerland
Henning Müller, Vincent Andrearczyk, Oscar Jimenez del Toro, Anjani Dhrangadhariya, Roger Schaer & Manfredo Atzori
University of Geneva, Geneva, Switzerland
Henning Müller

Corresponding author

Correspondence to Henning Müller.

Editor information

Editors and Affiliations

Korea Advanced Institute of Science and Technology, Daejeon, Korea (Republic of)
Yong Man Ro
National Chiao Tung University, Hsinchu, Taiwan
Wen-Huang Cheng
Korea Advanced Institute of Science and Technology, Daejeon, Korea (Republic of)
Junmo Kim
National Cheng Kung University, Tainan City, Taiwan
Wei-Ta Chu
Tsinghua University, Beijing, China
Peng Cui
Korea Advanced Institute of Science and Technology, Daejeon, Korea (Republic of)
Jung-Woo Choi
National Tsing Hua University, Hsinchu, Taiwan
Min-Chun Hu
Ghent University, Ghent, Belgium
Wesley De Neve

Cite this paper

Müller, H., Andrearczyk, V., del Toro, O.J., Dhrangadhariya, A., Schaer, R., Atzori, M. (2020). Studying Public Medical Images from the Open Access Literature and Social Networks for Model Training and Knowledge Extraction. In: Ro, Y., et al. MultiMedia Modeling. MMM 2020. Lecture Notes in Computer Science, vol 11962. Springer, Cham. https://doi.org/10.1007/978-3-030-37734-2_45

DOI: https://doi.org/10.1007/978-3-030-37734-2_45
Published: 24 December 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-37733-5
Online ISBN: 978-3-030-37734-2
eBook Packages: Computer Science, Computer Science (R0)