INDIGO-DataCloud: a Platform to Facilitate Seamless Access to E-Infrastructures

Abstract

This paper describes the achievements of the H2020 project INDIGO-DataCloud. The project has provided e-infrastructures with tools, applications and cloud framework enhancements to manage the demanding requirements of scientific communities, either locally or through enhanced interfaces. The middleware developed makes it possible to federate hybrid resources and to easily write, port and run scientific applications in the cloud. In particular, we have extended existing PaaS (Platform as a Service) solutions, allowing public and private e-infrastructures, including those provided by EGI, EUDAT, and Helix Nebula, to integrate their existing services and make them available through AAI services compliant with GEANT interfederation policies, thus guaranteeing transparency and trust in the provisioning of such services. Our middleware facilitates the execution of applications using containers on Cloud and Grid based infrastructures, as well as on HPC clusters. Our developments are freely downloadable as open source components, and are already being integrated into many scientific applications.

References

  1. García, A.L., Castillo, E.F.-d., Puel, M.: Identity federation with VOMS in cloud infrastructures. In: 2013 IEEE 5th International Conference on Cloud Computing Technology and Science, pp. 42–48 (2013)

  2. Chadwick, D.W., Siu, K., Lee, C., Fouillat, Y., Germonville, D.: Adding federated identity management to OpenStack. Journal of Grid Computing 12(1), 3–27 (2014)

  3. Lee, C.A.: A design space review for general federation management using Keystone. In: Proceedings of the 2014 IEEE/ACM 7th International Conference on Utility and Cloud Computing, pp. 720–725. IEEE Computer Society (2014)

  4. Pustchi, N., Krishnan, R., Sandhu, R.: Authorization federation in IaaS multi cloud. In: Proceedings of the 3rd International Workshop on Security in Cloud Computing, pp. 63–71. ACM (2015)

  5. Lee, C.A., Desai, N., Brethorst, A.: A Keystone-Based Virtual Organization Management System. In: 2014 IEEE 6th International Conference on Cloud Computing Technology and Science (CloudCom), pp. 727–730. IEEE (2014)

  6. Castillo, E.F.-d., Scardaci, D., García, A.L.: The EGI Federated Cloud e-Infrastructure. Procedia Computer Science 68, 196–205 (2015)

  7. AARC project: AARC Blueprint Architecture, see https://aarc-project.eu/architecture. Technical report (2016)

  8. Oesterle, F., Ostermann, S., Prodan, R., Mayr, G.J.: Experiences with distributed computing for meteorological applications: grid computing and cloud computing. Geosci. Model Dev. 8(7), 2067–2078 (2015)

  9. Plasencia, I.C., Castillo, E.F.-d., Heinemeyer, S., García, A.L., Pahlen, F., Borges, G.: Phenomenology tools on cloud infrastructures using OpenStack. The European Physical Journal C 73(4), 2375 (2013)

  10. Boettiger, C.: An introduction to Docker for reproducible research. ACM SIGOPS Operating Systems Review 49(1), 71–79 (2015)

  11. Docker: http://www.docker.com (2013)

  12. Gomes, J., Campos, I., Bagnaschi, E., David, M., Alves, L., Martins, J., Pina, J., Alvaro, L.-G., Orviz, P.: Enabling rootless Linux containers in multi-user environments: the udocker tool. Computer Physics Communications. https://doi.org/10.1016/j.cpc.2018.05.021 (2018)

  13. Zhang, Z., Chuan, W., Cheung, D.W.L.: A survey on cloud interoperability taxonomies, standards, and practice. SIGMETRICS Perform. Eval. Rev. 40(4), 13–22 (2013)

  14. Lorido-Botran, T., Miguel-Alonso, J., Lozano, J.A.: A Review of Auto-scaling Techniques for Elastic Applications in Cloud Environments. Journal of Grid Computing 12(4), 559–592 (2014)

  15. Nyrén, R., Metsch, T., Edmonds, A., Papaspyrou, A.: Open Cloud Computing Interface - Core. Technical report, Open Grid Forum (2010)

  16. Metsch, T., Edmonds, A.: Open Cloud Computing Interface - Infrastructure. Technical report, Open Grid Forum (2010)

  17. Metsch, T., Edmonds, A.: Open Cloud Computing Interface - RESTful HTTP Rendering. Technical report, Open Grid Forum (2011)

  18. Lipton, P. (CA Technologies), Moser, S. (IBM), Palma, D. (Vnomic), Spatzier, T. (IBM): Topology and Orchestration Specification for Cloud Applications. Technical report, OASIS Standard (2013)

  19. Teckelmann, R., Reich, C., Sulistio, A.: Mapping of cloud standards to the taxonomy of interoperability in IaaS. In: Proceedings - 2011 3rd IEEE International Conference on Cloud Computing Technology and Science, CloudCom 2011, pp. 522–526 (2011)

  20. García, A.L., Castillo, E.F.-d., Fernández, P.O.: Standards for enabling heterogeneous IaaS cloud federations. Computer Standards & Interfaces 47, 19–23 (2016)

  21. Caballer, M., Zala, S., García, A.L., Montó, G., Fernández, P.O., Velten, M.: Orchestrating complex application architectures in heterogeneous clouds. Journal of Grid Computing 16(1), 3–18 (2018)

  22. Hardt, M., Jejkal, T., Plasencia, I.C., Castillo, E.F.-d., Jackson, A., Weiland, M., Palak, B., Plociennik, M., Nielsson, D.: Transparent Access to Scientific and Commercial Clouds from the Kepler Workflow Engine. Computing and Informatics 31(1), 119 (2012)

  23. Fakhfakh, F., Kacem, H.H., Kacem, A.H.: Workflow Scheduling in Cloud Computing: a Survey. In: IEEE 18th International Enterprise Distributed Object Computing Conference Workshops and Demonstrations (EDOCW), 2014, vol. 71, pp. 372–378. Springer, New York (2014)

  24. Stockton, D.B., Santamaria, F.: Automating NEURON simulation deployment in cloud resources. Neuroinformatics 15(1), 51–70 (2017)

  25. Plóciennik, M., Fiore, S., Donvito, G., Owsiak, M., Fargetta, M., Barbera, R., Bruno, R., Giorgio, E., Williams, D.N., Aloisio, G.: Two-level Dynamic Workflow Orchestration in the INDIGO DataCloud for Large-scale, Climate Change Data Analytics Experiments. Procedia Computer Science 80, 722–733 (2016)

  26. Moreno-Vozmediano, R., Montero, R.S., Llorente, I.M.: Multicloud deployment of computing clusters for loosely coupled MTC applications. IEEE Transactions on Parallel and Distributed Systems 22(6), 924–930 (2011)

  27. Katsaros, G., Menzel, M., Lenk, A.: Cloud Service Orchestration with TOSCA, Chef and OpenStack. In: IC2E (2014)

  28. Garcia, A.L., Zangrando, L., Sgaravatto, M., Llorens, V., Vallero, S., Zaccolo, V., Bagnasco, S., Taneja, S., Dal Pra, S., Salomoni, D., Donvito, G.: Improved Cloud resource allocation: how INDIGO-DataCloud is overcoming the current limitations in Cloud schedulers. J. Phys. Conf. Ser. 898(9), 92010 (2017)

  29. Singh, S., Chana, I.: A survey on resource scheduling in cloud computing: issues and challenges. Journal of Grid Computing, pp. 1–48 (2016)

  30. García, A.L., Castillo, E.F.-d., Fernández, P.O., Plasencia, I.C., de Lucas, J.M.: Resource provisioning in Science Clouds: Requirements and challenges. Software: Practice and Experience 48(3), 486–498 (2018)

  31. Chauhan, M.A., Babar, M.A., Benatallah, B.: Architecting cloud-enabled systems: a systematic survey of challenges and solutions. Software: Practice and Experience 47(4), 599–644 (2017)

  32. Somasundaram, T.S., Govindarajan, K.: CLOUDRB: A Framework for scheduling and managing High-Performance Computing (HPC) applications in science cloud. Futur. Gener. Comput. Syst. 34, 47–65 (2014)

  33. Sotomayor, B., Keahey, K., Foster, I.: Overhead Matters: A Model for Virtual Resource Management. In: Proceedings of the 2nd International Workshop on Virtualization Technology in Distributed Computing SE - VTDC ’06, p. 5. IEEE Computer Society, Washington (2006)

  34. Manvi, S.S., Shyam, G.K.: Resource management for Infrastructure as a Service (IaaS) in cloud computing: A survey. J. Netw. Comput. Appl. 41, 424–440 (2014)

  35. INDIGO-DataCloud consortium: Initial requirements from research communities - D2.1, see https://www.indigo-datacloud.eu/documents/initial-requirements-research-communities-d21. Technical report (2015)

  36. European Open Science Cloud: https://ec.europa.eu/research/openscience (2015)

  37. Proot: https://proot-me.github.io/ (2014)

  38. Runc: https://github.com/opencontainers/runc (2016)

  39. Fakechroot: https://github.com/dex4er/fakechroot (2015)

  40. Pérez, A., Moltó, G., Caballer, M., Calatrava, A.: Serverless computing for container-based architectures. Future Generation Computer Systems (2018)

  41. de Vries, K.J.: Global fits of supersymmetric models after LHC run 1. PhD thesis, Imperial College London (2015)

  42. OpenStack: https://www.openstack.org/ (2015)

  43. See http://argus-documentation.readthedocs.io/en/stable/argus_introduction.html (2017)

  44. See https://en.wikipedia.org/wiki/xacml (2013)

  45. See http://www.simplecloud.info (2014)

  46. OpenNebula: http://opennebula.org/ (2018)

  47. Red Hat OpenShift: http://www.opencityplatform.eu (2011)

  48. The Cloud Foundry Foundation: https://www.cloudfoundry.org/ (2015)

  49. Caballer, M., Blanquer, I., Moltó, G., de Alfonso, C.: Dynamic management of virtual infrastructures. Journal of Grid Computing 13(1), 53–70 (2015)

  50. See http://www.infoq.com/articles/scaling-docker-with-kubernetes (2014)

  51. Prisma project: http://www.ponsmartcities-prisma.it/ (2010)

  52. Open City Platform: http://www.opencityplatform.eu (2014)

  53. Onedata: https://onedata.org/ (2018)

  54. Dynafed: http://lcgdm.web.cern.ch/dynafed-dynamic-federation-project (2011)

  55. FTS3: https://svnweb.cern.ch/trac/fts3 (2011)

  56. Fernández, P.O., García, A.L., Duma, D.C., Donvito, G., David, M., Gomes, J.: A set of common software quality assurance baseline criteria for research projects, see http://hdl.handle.net/10261/160086. Technical report

  57. Hüttermann, M.: DevOps for Developers. Apress (2012)

  58. EOSC-Hub: "Integrating and managing services for the European Open Science Cloud". Funded by H2020 research and innovation programme under grant agreement No. 777536. See http://eosc-hub.eu (2018)

  59. Apache License: https://www.apache.org/licenses/LICENSE-2.0 (2004)

  60. INDIGO Package Repo: http://repo.indigo-datacloud.eu/ (2017)

  61. INDIGO DockerHub: https://hub.docker.com/u/indigodatacloud/ (2015)

  62. INDIGO GitBook: https://indigo-dc.gitbooks.io/indigo-datacloud-releases (2017)

  63. Van Zundert, G.C., Bonvin, A.M.: DisVis: quantifying and visualizing the accessible interaction space of distance restrained biomolecular complexes. Bioinformatics 31(19), 3222–3224 (2015)

  64. Van Zundert, G.C., Bonvin, A.M.: Fast and sensitive rigid-body fitting into cryo-EM density maps with PowerFit. AIMS Biophys. 2(0273), 73–87 (2015)

Acknowledgments

INDIGO-DataCloud has been funded by the European Commission H2020 research and innovation program under grant agreement RIA 653549.

Author information

Corresponding author

Correspondence to I. Campos.

Appendices

Appendix A: Contribution to Open Source Software Projects

Here follows the list of software developed in the framework of INDIGO-DataCloud that has been contributed upstream to the Open Source community.

Appendix B: Tools and Services Involved in the Software Lifecycle

Figure 14 showcases the tools and services used for the development and distribution of the INDIGO-DataCloud software:

  • Project management service using openproject.org: It provides tools such as an issue tracker, wiki, a placeholder for documents and a project management timeline.

  • Source code is publicly available, housed externally in GitHub repositories, thereby increasing its visibility and simplifying the path to exploitation beyond the project lifetime. The INDIGO-DataCloud software is released under the Apache 2.0 software license [59].

  • Continuous Integration service using Jenkins: Service to automate the building, testing and, where applicable, packaging of the software. Testing includes style compliance checks and estimation of the unit and functional test coverage of the software components.

  • Artifact repositories for RedHat and Debian packages [60] and virtual (Docker) images [61].

  • Code review service using GitHub: Source code review is an integral part of the SQA process, as it is the last step in the change verification process. This service facilitates the code review process, recording the comments and allowing the reviewer to verify the candidate change before it is merged into the production version.

  • Issue tracking using GitHub Issues: Service to track issues, new features and bugs of INDIGO-DataCloud software components.

  • Release notes, installation and configuration guides, user and development manuals are made available on GitBook [62].

  • Code metrics services using Grimoire: To collect and visualize several metrics about the software components.

  • Integration infrastructure: this infrastructure is composed of computing resources that directly support the CI service.

  • Testing infrastructure: this infrastructure aims to provide a stable environment for users where they can preview the software and services developed by INDIGO-DataCloud, prior to its public release.

  • Preview infrastructure: where the released artifacts are deployed and made available for testing and validation by the use-cases.

Fig. 14 Tools and services used to support the software lifecycle process
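
The Continuous Integration step listed above (build, style compliance, test coverage, packaging) can be pictured as an ordered sequence of checks that stops at the first failure. The following Python sketch is only an illustration under that assumption: `Stage`, `coverage_gate`, `run_pipeline` and the 0.75 threshold are hypothetical names and values introduced here. They are not part of Jenkins or of the INDIGO-DataCloud tooling.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    """One CI step (hypothetical type); `check` returns True when the step passes."""
    name: str
    check: Callable[[], bool]


def coverage_gate(covered: int, total: int, threshold: float) -> bool:
    """Pass when the fraction of covered lines reaches the threshold.

    An empty code base (total == 0) is treated as a failure (an assumption).
    """
    if total == 0:
        return False
    return covered / total >= threshold


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first failing one.

    Returns (overall result, names of the stages that were executed).
    """
    executed: List[str] = []
    for stage in stages:
        executed.append(stage.name)
        if not stage.check():
            return False, executed
    return True, executed


if __name__ == "__main__":
    # Illustrative stages; real checks would call a linter, a test runner, etc.
    stages = [
        Stage("style", lambda: True),
        Stage("unit-tests", lambda: True),
        Stage("coverage", lambda: coverage_gate(80, 100, 0.75)),
        Stage("package", lambda: True),
    ]
    ok, ran = run_pipeline(stages)
    print(ok, ran)
```

Stopping at the first failure mirrors the idea that a change is only packaged once the earlier quality checks have passed.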

Appendix C: DevOps Adoption from User Communities

The DisVis [63] and PowerFit [64] applications were integrated into the CI/CD pipeline described above. As can be seen in Fig. 15, with this pipeline in place the application developers were provided with both a means to validate the source code before merging and the creation of a new versioned Docker image, automatically available in INDIGO-DataCloud's catalogue for applications, i.e. DockerHub's indigodatacloudapps repository.

Fig. 15 DisVis development workflow using a CI/CD approach

Once the application is deployed as a Docker container, and subsequently uploaded to the indigodatacloudapps repository, it is instantiated in a new container to be validated. The application is then executed and the results are compared with a set of reference outputs. This pipeline implementation thus goes a step further by testing the application execution for the latest Docker image available in the catalogue.
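
The validation step described above, running the latest image and comparing its results with reference outputs, could be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the project's actual pipeline code: the helper names, the `/out` mount point and the byte-exact comparison are all hypothetical. Scientific outputs may well need a numerical tolerance instead of a strict digest match.

```python
import hashlib
import subprocess
from pathlib import Path
from typing import List


def build_run_command(image: str, output_dir: Path) -> List[str]:
    """Build a `docker run` command that mounts output_dir at /out.

    The /out mount point is an assumption made for this sketch.
    """
    return ["docker", "run", "--rm", "-v", f"{output_dir.resolve()}:/out", image]


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def compare_with_reference(output_dir: Path, reference_dir: Path) -> List[str]:
    """List reference files that are missing from, or differ in, output_dir."""
    mismatches: List[str] = []
    for ref in sorted(reference_dir.rglob("*")):
        if not ref.is_file():
            continue
        rel = ref.relative_to(reference_dir)
        out = output_dir / rel
        if not out.is_file() or file_digest(out) != file_digest(ref):
            mismatches.append(rel.as_posix())
    return mismatches


def validate_image(image: str, output_dir: Path, reference_dir: Path) -> bool:
    """Run the image, then check its outputs against the reference set."""
    # check=True raises CalledProcessError if the container exits with an error.
    subprocess.run(build_run_command(image, output_dir), check=True)
    return not compare_with_reference(output_dir, reference_dir)
```

A CI job would call `validate_image` with the newly published image tag and fail the build when it returns `False`. Returning the list of mismatching files from `compare_with_reference` makes it easier to report which outputs diverged.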

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Salomoni, D., Campos, I., Gaido, L. et al. INDIGO-DataCloud: a Platform to Facilitate Seamless Access to E-Infrastructures. J Grid Computing 16, 381–408 (2018). https://doi.org/10.1007/s10723-018-9453-3

Keywords

  • Cloud computing
  • Platform as a service
  • Containers
  • Software management
  • Advanced user interfaces
  • Authorization and authentication