SparkGIS: Efficient Comparison and Evaluation of Algorithm Results in Tissue Image Analysis Studies

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9579)

Abstract

Algorithm evaluation provides a means to characterize variability across image analysis algorithms, validate algorithms by comparison of multiple results, and facilitate algorithm sensitivity studies. The sizes of images and analysis results in pathology image analysis pose significant challenges in algorithm evaluation. We present SparkGIS, a distributed, in-memory spatial data processing framework to query, retrieve, and compare large volumes of analytical image result data for algorithm evaluation. Our approach combines the in-memory distributed processing capabilities of Apache Spark and the efficient spatial query processing of Hadoop-GIS. The experimental evaluation of SparkGIS for heatmap computations used to compare nucleus segmentation results from multiple images and analysis runs shows that SparkGIS is efficient and scalable, enabling algorithm evaluation and algorithm sensitivity studies on large datasets.
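To illustrate the kind of heatmap comparison the abstract describes, here is a minimal single-machine sketch. It is not SparkGIS code. It assumes each segmentation result is given as a set of (row, col) foreground pixels, that the image is divided into square tiles, and that the Jaccard index is the similarity measure; the function name `tile_jaccard_heatmap` and the `tile_size` parameter are hypothetical.

```python
from collections import defaultdict


def tile_jaccard_heatmap(mask_a, mask_b, tile_size):
    """Compare two segmentation results tile by tile.

    mask_a, mask_b: sets of (row, col) foreground pixels (assumed format).
    Returns {(tile_row, tile_col): Jaccard index} for every tile that
    contains foreground pixels in at least one of the two results.
    """
    tiles_a = defaultdict(set)
    tiles_b = defaultdict(set)
    # Bucket each foreground pixel into the tile that contains it.
    for mask, tiles in ((mask_a, tiles_a), (mask_b, tiles_b)):
        for r, c in mask:
            tiles[(r // tile_size, c // tile_size)].add((r, c))

    heatmap = {}
    for key in tiles_a.keys() | tiles_b.keys():
        a = tiles_a.get(key, set())
        b = tiles_b.get(key, set())
        # Jaccard index: |A ∩ B| / |A ∪ B|; the union is never empty here.
        heatmap[key] = len(a & b) / len(a | b)
    return heatmap


# Two toy "analysis runs" over the same image (illustrative data only).
run_1 = {(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)}
run_2 = {(0, 1), (1, 1), (1, 2), (5, 5), (5, 6)}
heatmap = tile_jaccard_heatmap(run_1, run_2, tile_size=4)
for tile, score in sorted(heatmap.items()):
    print(tile, round(score, 2))
```

Each tile's score depends only on the pixels inside that tile, so the per-tile step is independent across tiles, which is the property that makes this kind of comparison a natural fit for distributed processing.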

Notes

  1. http://cancergenome.nih.gov

Acknowledgments

This work was funded in part by HHSN261200800001E and 1U24CA180924-01A1 from the NCI, and by 5R01LM011119-05 and 5R01LM009239-07 from the NLM.

Author information

Corresponding author

Correspondence to Furqan Baig.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Baig, F., Mehrotra, M., Vo, H., Wang, F., Saltz, J., Kurc, T. (2016). SparkGIS: Efficient Comparison and Evaluation of Algorithm Results in Tissue Image Analysis Studies. In: Wang, F., Luo, G., Weng, C., Khan, A., Mitra, P., Yu, C. (eds) Biomedical Data Management and Graph Online Querying. Big-O(Q) 2015, DMAH 2015. Lecture Notes in Computer Science, vol 9579. Springer, Cham. https://doi.org/10.1007/978-3-319-41576-5_10

  • DOI: https://doi.org/10.1007/978-3-319-41576-5_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-41575-8

  • Online ISBN: 978-3-319-41576-5

  • eBook Packages: Computer Science, Computer Science (R0)
