Abstract
Medical imaging research has long struggled to obtain large image collections, both because of privacy constraints and because annotation by physicians is expensive. With public scientific challenges and funding agencies fostering data sharing, repositories are becoming available, particularly for cancer research in the US. Still, data and annotations are most often limited to narrow domains and specific tasks. The medical literature (particularly articles indexed in MedLine) has been used for research for many years, as it contains a large amount of medical knowledge. Most analyses have focused on text, for example creating semi-automated systematic reviews, aggregating content on specific genes and their functions, or supporting information retrieval to access specific content. Research on images from the medical literature has been more limited, as MedLine abstracts are publicly available but include no images. With PubMed Central, all of the open access biomedical literature has become accessible for analysis, with images and text in a structured format. This makes such data easier to use than data extracted from PDF. This article reviews existing work on analyzing images from the biomedical literature and develops ideas on how such images can become useful and usable for a variety of tasks, including finding visual evidence for rare or unusual cases. These resources make it possible to train machine learning tools on more diverse data, which may in turn make the resulting classifiers more robust. Examples with histopathology data available on Twitter already show promising possibilities. This article also links to other accessible sources, for example those available via the ImageCLEF challenges.
Acknowledgment
This work was partially funded by the EU H2020 ExaMode project (grant agreement 825292).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Müller, H., Andrearczyk, V., del Toro, O.J., Dhrangadhariya, A., Schaer, R., Atzori, M. (2020). Studying Public Medical Images from the Open Access Literature and Social Networks for Model Training and Knowledge Extraction. In: Ro, Y., et al. MultiMedia Modeling. MMM 2020. Lecture Notes in Computer Science, vol 11962. Springer, Cham. https://doi.org/10.1007/978-3-030-37734-2_45
DOI: https://doi.org/10.1007/978-3-030-37734-2_45
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-37733-5
Online ISBN: 978-3-030-37734-2
eBook Packages: Computer Science, Computer Science (R0)