Cloud-Based Whole Slide Image Analysis Using MapReduce
Systematic analysis of high-resolution whole slide images enables more effective diagnosis, prognosis and prediction of cancer and other important diseases. Because whole slide images are enormous in both size and dimensions, their analysis requires extensive computing resources that are not commonly available. Memory limitations force images to be divided into smaller regions for processing, which leads to inaccurate results because objects that cross region boundaries are ignored. In this paper, we propose a highly scalable and cost-effective MapReduce-based image analysis framework for whole slide image processing, and provide a cloud-based implementation. The framework adopts a grid-based overlapping partitioning scheme and parallelizes image segmentation with MapReduce. It handles boundary objects gracefully with a highly efficient matching method based on spatial indexing, thus avoiding the loss of accuracy caused by partitioning. We show that the system achieves high scalability and is cost-effective: in our experiments, analyzing one image costs less than fifteen cents on average using Amazon Elastic MapReduce.
Keywords: Whole slide images; Pathology image analysis; MapReduce; Cloud computing
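The two ideas named in the abstract, overlapping grid partitioning and index-based matching of boundary objects, can be illustrated with a minimal Python sketch. It is not the paper's implementation. The names `Box`, `overlapping_tiles`, `iou` and `merge_duplicates`, the tile and overlap sizes, and the 0.5 overlap threshold are all assumptions made for illustration. A uniform grid hash stands in for whatever spatial index the framework actually uses, and objects are reduced to bounding boxes in global image coordinates.

```python
from collections import defaultdict
from typing import Iterable, Iterator, List, NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle in global pixel coordinates (x1, y1 exclusive)."""
    x0: int
    y0: int
    x1: int
    y1: int


def overlapping_tiles(width: int, height: int, tile: int, overlap: int) -> Iterator[Box]:
    """Yield grid tiles, each padded by `overlap` pixels and clipped to the image."""
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            yield Box(max(0, x - overlap), max(0, y - overlap),
                      min(width, x + tile + overlap), min(height, y + tile + overlap))


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    iw = min(a.x1, b.x1) - max(a.x0, b.x0)
    ih = min(a.y1, b.y1) - max(a.y0, b.y0)
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a.x1 - a.x0) * (a.y1 - a.y0)
    area_b = (b.x1 - b.x0) * (b.y1 - b.y0)
    return inter / (area_a + area_b - inter)


def merge_duplicates(boxes: Iterable[Box], cell: int = 64, threshold: float = 0.5) -> List[Box]:
    """Drop boxes that duplicate an already kept box (assumes positive-area boxes).

    A uniform grid hash (an assumed stand-in for a real spatial index) limits
    each comparison to kept boxes that share at least one grid cell.
    """
    index = defaultdict(list)  # (cell_x, cell_y) -> positions in `kept`
    kept: List[Box] = []
    for box in boxes:
        cells = [(cx, cy)
                 for cx in range(box.x0 // cell, (box.x1 - 1) // cell + 1)
                 for cy in range(box.y0 // cell, (box.y1 - 1) // cell + 1)]
        candidates = {i for c in cells for i in index.get(c, ())}
        if any(iou(box, kept[i]) >= threshold for i in candidates):
            continue  # same object already reported by a neighbouring tile
        kept.append(box)
        for c in cells:
            index[c].append(len(kept) - 1)
    return kept


if __name__ == "__main__":
    tiles = list(overlapping_tiles(100, 100, tile=50, overlap=10))
    print(len(tiles), tiles[0])
    detections = [Box(45, 45, 55, 55), Box(46, 45, 56, 55), Box(10, 10, 20, 20)]
    print(merge_duplicates(detections))
```

In a MapReduce setting, one plausible arrangement is for each padded tile to be segmented in a map task and for the duplicate merge to run in a reduce step over detections from neighbouring tiles; the abstract does not specify this split, so treat it as an assumption.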
This work is supported in part by NSF IIS 1350885, by NSF ACI 1350885, by Grant Number K25CA181503 from the National Institutes of Health, by Grant Number R01LM009239 from the National Library of Medicine, by Grant Number 1U24CA180924-01A1 from the National Cancer Institute, and by CNPq.