Cloud-Based Whole Slide Image Analysis Using MapReduce

  • Hoang Vo
  • Jun Kong
  • Dejun Teng
  • Yanhui Liang
  • Ablimit Aji
  • George Teodoro
  • Fusheng Wang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10186)


Systematic analysis of high-resolution whole slide images enables more effective diagnosis, prognosis and prediction of cancer and other important diseases. Because whole slide images are enormous in size and dimension, their analysis requires extensive computing resources that are not commonly available. Due to memory limitations, images must be divided into smaller regions for processing, which leads to inaccurate results when objects that cross region boundaries are ignored. In this paper, we propose a highly scalable and cost-effective MapReduce-based image analysis framework for whole slide image processing, and provide a cloud-based implementation. The framework adopts a grid-based overlapping partitioning scheme and parallelizes image segmentation with MapReduce. It handles boundary objects gracefully through a highly efficient matching method based on spatial indexing, thus avoiding the loss of accuracy caused by partitioning. We demonstrate that the system achieves high scalability and is cost-effective: in our experiments, analyzing one image costs less than fifteen cents on average using Amazon Elastic MapReduce.
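The abstract does not give the partitioning parameters or the exact matching criterion, so the following is only a minimal, single-process Python sketch of the general idea, not the authors' implementation. It assumes square tile cores of `tile` pixels grown on every side by a `halo` at least as large as the biggest object (which guarantees every object lies completely inside at least one tile), objects represented by bounding boxes in global slide coordinates, a stand-in in place of real segmentation, and duplicates identified by intersection-over-union against a uniform-grid spatial index (a simpler substitute for an R*-tree). All names (`overlapping_tiles`, `map_tile`, `reduce_slide`, `run`, the key `"slide-1"`) are hypothetical, and the MapReduce shuffle is simulated with a dictionary rather than Hadoop or Amazon Elastic MapReduce.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Axis-aligned bounding box in global slide coordinates: (x0, y0, x1, y1), half-open.
Box = Tuple[int, int, int, int]


def overlapping_tiles(width: int, height: int, tile: int, halo: int) -> List[Box]:
    """Regular grid of tile x tile cores, each grown by `halo` pixels per side
    (clipped to the image), so neighbouring tiles overlap by up to 2 * halo."""
    return [
        (max(0, x - halo), max(0, y - halo),
         min(width, x + tile + halo), min(height, y + tile + halo))
        for y in range(0, height, tile)
        for x in range(0, width, tile)
    ]


def contains(outer: Box, inner: Box) -> bool:
    """True if `inner` lies completely inside `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0


def grid_cells(box: Box, cell: int) -> Iterable[Tuple[int, int]]:
    """Cells of a uniform grid index touched by `box`."""
    for gy in range(box[1] // cell, (box[3] - 1) // cell + 1):
        for gx in range(box[0] // cell, (box[2] - 1) // cell + 1):
            yield gx, gy


def map_tile(tile: Box, objects: List[Box]) -> List[Tuple[str, Box]]:
    """Hypothetical mapper. Real segmentation is replaced by a stand-in that
    reports every ground-truth object lying completely inside the tile."""
    return [("slide-1", obj) for obj in objects if contains(tile, obj)]


def reduce_slide(boxes: List[Box], cell: int = 64,
                 threshold: float = 0.5) -> List[Box]:
    """Hypothetical reducer: keep one box per group of matching detections.
    The grid index restricts IoU tests to boxes that share an index cell."""
    index: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    kept: List[Box] = []
    for box in sorted(boxes):  # sorted for deterministic output
        cells = list(grid_cells(box, cell))
        candidates = {i for c in cells for i in index.get(c, [])}
        if any(iou(box, kept[i]) >= threshold for i in candidates):
            continue  # duplicate detected in an overlap strip; drop it
        for c in cells:
            index[c].append(len(kept))
        kept.append(box)
    return kept


def run(width: int, height: int, tile: int, halo: int,
        objects: List[Box]) -> Tuple[int, Dict[str, List[Box]]]:
    """Simulate map -> shuffle (group by key) -> reduce on one machine."""
    groups: Dict[str, List[Box]] = defaultdict(list)
    for t in overlapping_tiles(width, height, tile, halo):
        for key, box in map_tile(t, objects):
            groups[key].append(box)
    raw = sum(len(v) for v in groups.values())
    return raw, {k: reduce_slide(v) for k, v in groups.items()}


if __name__ == "__main__":
    objects = [(250, 100, 270, 120), (10, 10, 30, 30),
               (500, 500, 520, 520), (760, 250, 780, 270)]
    raw, merged = run(1000, 1000, 256, 32, objects)
    print(raw, len(merged["slide-1"]))  # prints "11 4"
```

In this toy run, objects lying in overlap strips are reported by two or four tiles (11 raw detections), and the reducer collapses them back to the 4 distinct objects. Keeping the halo at least as large as the largest object means no object is ever seen only as a clipped fragment, so the reducer only has to remove duplicates rather than stitch partial shapes together.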


Keywords: Whole slide images · Pathology image analysis · MapReduce · Cloud computing



This work is supported in part by NSF IIS 1350885, by NSF ACI 1350885, by Grant Number K25CA181503 from the National Institutes of Health, by Grant Number R01LM009239 from the National Library of Medicine, by Grant Number 1U24CA180924-01A1 from the National Cancer Institute, and by CNPq.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Hoang Vo (1)
  • Jun Kong (2)
  • Dejun Teng (3)
  • Yanhui Liang (4)
  • Ablimit Aji (5)
  • George Teodoro (6)
  • Fusheng Wang (4), corresponding author
  1. Department of Computer Science, Stony Brook University, Stony Brook, USA
  2. Department of Biomedical Informatics, Emory University, Atlanta, USA
  3. Department of Computer Science and Engineering, The Ohio State University, Columbus, USA
  4. Department of Biomedical Informatics, Stony Brook University, Stony Brook, USA
  5. HP Labs, Palo Alto, USA
  6. Department of Computer Science, University of Brasília, Brasília, Brazil
