Journal of Grid Computing, Volume 14, Issue 4, pp 589–601

Extending Science Gateway Frameworks to Support Big Data Applications in the Cloud

  • Shashank Gugnani
  • Carlos Blanco
  • Tamas Kiss
  • Gabor Terstyanszky
Open Access
Article

Abstract

Cloud computing offers the massive scalability and elasticity required by many scientific and commercial applications. Combining the computational and data handling capabilities of clouds with parallel processing also has the potential to tackle Big Data problems efficiently. Science gateway frameworks and workflow systems enable application developers to implement complex applications and make them available to end-users via simple graphical user interfaces. The integration of such frameworks with Big Data processing tools on the cloud opens new opportunities for application developers. This paper investigates how workflow systems and science gateways can be extended with Big Data processing capabilities. A generic approach based on infrastructure-aware workflows is suggested, and a proof of concept is implemented based on the WS-PGRADE/gUSE science gateway framework and its integration with the Hadoop parallel data processing solution, based on the MapReduce paradigm, in the cloud. The provided analysis demonstrates that the methods described to integrate Big Data processing with workflows and science gateways work well in different cloud infrastructures and application scenarios, and can be used to create massively parallel applications for scientific analysis of Big Data.

Keywords

Big data, Hadoop, MapReduce, Science gateway, WS-PGRADE, Workflow

Copyright information

© The Author(s) 2016

Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Authors and Affiliations

  • Shashank Gugnani (1)
  • Carlos Blanco (1, 2)
  • Tamas Kiss (1)
  • Gabor Terstyanszky (1)

  1. Center for Parallel Computing, University of Westminster, London, UK
  2. University of Cantabria, Cantabria, Spain
