Galaxy + Hadoop: Toward a Collaborative and Scalable Image Processing Toolbox in Cloud

  • Shiping Chen
  • Tomasz Bednarz
  • Piotr Szul
  • Dadong Wang
  • Yulia Arzhaeva
  • Neil Burdett
  • Alex Khassapov
  • John Zic
  • Surya Nepal
  • Tim Gureyev
  • John Taylor
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8377)

Abstract

With the emergence and adoption of cloud computing, the cloud has become an effective collaboration platform for integrating software tools and delivering them as services. In this paper, we present a cloud-based image processing toolbox that integrates Galaxy, Hadoop and our proprietary image processing tools. The toolbox lets users design and execute complex image processing tasks by sharing advanced image processing tools and scalable cloud computation capacity. The paper describes the integration architecture and the technical details of the whole system. In particular, we present our investigation into using Hadoop to handle massive image processing jobs in the system. A number of real image processing examples are used to demonstrate the usefulness and scalability of this class of data-intensive applications.

Keywords

Galaxy; Hadoop; Image Processing; Workflow; Scalability; Data-Intensive Computation


Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Shiping Chen (1)
  • Tomasz Bednarz (1)
  • Piotr Szul (1)
  • Dadong Wang (1)
  • Yulia Arzhaeva (1)
  • Neil Burdett (1)
  • Alex Khassapov (1)
  • John Zic (1)
  • Surya Nepal (1)
  • Tim Gureyev (1)
  • John Taylor (1)

  1. CSIRO Computational Informatics (CCI), Epping, Australia
