MaReIA: a cloud MapReduce based high performance whole slide image analysis framework

Distributed and Parallel Databases

Abstract

Recent advances in the systematic analysis of high-resolution whole slide images have increased the efficiency of diagnosis, prognosis and prediction for cancer and other important diseases. Because whole slide images are enormous in size and dimension, their analysis requires extensive computing resources that are not commonly available. Images have to be tiled for processing because of computer memory limitations, which leads to inaccurate results because objects that cross tile boundaries are ignored. We therefore propose a generic and highly scalable cloud-based image analysis framework for whole slide images. The framework enables parallelized integration of image analysis steps, such as segmentation and aggregation of micro-structures, in a single pipeline, and generates final objects that can be managed by databases. The core concept relies on abstracting the objects in whole slide images as different classes of spatial geometries, which in turn can be handled as text-based records in MapReduce. The framework applies an overlapping partitioning scheme to images and parallelizes tiling and image segmentation on the MapReduce architecture. It further provides robust object normalization and graceful handling of boundary objects, using an efficient matching method based on spatial indexing to generate accurate results. Our benchmark experiments on Amazon EMR show that MaReIA is highly scalable, generic and extremely cost effective.
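The overlapping partitioning idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the MaReIA implementation: the function names, the `tile` and `margin` parameters, and the centroid-based ownership rule are all assumptions made for illustration. In particular, the centroid rule is a simple stand-in for deciding which tile reports an object seen in several overlapping tiles; the matching method based on spatial indexing that the abstract describes is not reproduced here.

```python
from typing import Iterator, Tuple

# (x_min, y_min, x_max, y_max) in pixels; the max edges are exclusive.
Box = Tuple[int, int, int, int]


def overlapping_tiles(
    width: int, height: int, tile: int, margin: int
) -> Iterator[Tuple[Box, Box]]:
    """Yield (core, padded) boxes covering a width x height image.

    Core boxes partition the image without overlap. Each padded box extends
    its core by `margin` pixels on every side, clipped to the image, so that
    an object crossing a core boundary is fully visible in at least one tile
    when the margin is larger than the object. (Hypothetical helper.)
    """
    if tile <= 0 or margin < 0:
        raise ValueError("tile must be positive and margin non-negative")
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            core = (x, y, min(x + tile, width), min(y + tile, height))
            padded = (
                max(core[0] - margin, 0),
                max(core[1] - margin, 0),
                min(core[2] + margin, width),
                min(core[3] + margin, height),
            )
            yield core, padded


def owns(core: Box, cx: float, cy: float) -> bool:
    """Return True if a centroid (cx, cy) lies inside this tile's core.

    Assumed de-duplication rule for illustration only: because core boxes
    do not overlap, each centroid is owned by exactly one tile.
    """
    return core[0] <= cx < core[2] and core[1] <= cy < core[3]
```

In a MapReduce setting, each padded box would be the unit of work handed to a segmentation task, and a later step would keep only one copy of each object detected in more than one padded tile.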

Notes

  1. https://tcga-data.nci.nih.gov/.

Acknowledgements

This research is supported in part by grants from the National Science Foundation (ACI 1443054 and IIS 1350885), the National Institutes of Health (K25CA181503), and CNPq.

Author information

Corresponding author

Correspondence to Hoang Vo.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Vo, H., Kong, J., Teng, D. et al. MaReIA: a cloud MapReduce based high performance whole slide image analysis framework. Distrib Parallel Databases 37, 251–272 (2019). https://doi.org/10.1007/s10619-018-7237-1

  • DOI: https://doi.org/10.1007/s10619-018-7237-1
