
Development of Experimental Data Processing Workflows Based on Kubernetes Infrastructure and REANA Workflow Management System

  • Conference paper
Supercomputing (RuSCDays 2020)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1331)



Abstract

In this paper we present the design of a data processing workflow for scientific experiments that require a complicated multi-step analysis procedure. We test it on datasets from Single Particle Imaging (SPI) experiments. The workflow is based on a microservice architecture, Docker containers and the Kubernetes platform. For workflow setup and management we use the REANA software, which is compatible with the Kubernetes orchestrator and supports the standard Common Workflow Language (CWL) for describing complex computing jobs. Our approach makes it easy to construct workflows of diverse architectures for a wide range of applications. It allows heterogeneous software to be integrated in a uniform way, and workflow components to be easily modified or replaced. At the same time, it allows computations to be scaled easily in a cloud infrastructure. We show the applicability of the designed scheme and estimate the overhead of the platform middleware.
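The abstract states that REANA accepts multi-step jobs described in CWL, with each step packaged as a Docker container. As a minimal sketch of what such a description can look like, the Python below builds CWL v1.0 documents as plain dictionaries and serializes them to JSON (JSON is valid YAML, so CWL runners can read it). Everything specific here is hypothetical and not taken from the paper: the two step names `preprocess` and `classify`, the image names `example/preprocess:latest` and `example/classify:latest`, the scripts, and the output file names are placeholders, not the authors' SPI workflow.

```python
import json


def container_step(image: str, script: str, out_name: str, out_glob: str) -> dict:
    """Describe one containerized step as a CWL v1.0 CommandLineTool.

    All arguments are hypothetical placeholders for illustration.
    """
    return {
        "cwlVersion": "v1.0",
        "class": "CommandLineTool",
        # Run the step inside a Docker image (placeholder image name).
        "requirements": {"DockerRequirement": {"dockerPull": image}},
        "baseCommand": ["python", script],
        # A single input file passed as the first positional argument.
        "inputs": {"input": {"type": "File", "inputBinding": {"position": 1}}},
        # Collect the step's output file by glob pattern.
        "outputs": {out_name: {"type": "File", "outputBinding": {"glob": out_glob}}},
    }


def two_step_workflow() -> dict:
    """Chain two hypothetical steps: 'preprocess' output feeds 'classify'."""
    return {
        "cwlVersion": "v1.0",
        "class": "Workflow",
        "inputs": {"raw_data": "File"},
        "outputs": {
            # The workflow result is the output of the last step.
            "result": {"type": "File", "outputSource": "classify/labels"}
        },
        "steps": {
            "preprocess": {
                "run": "preprocess.cwl",
                "in": {"input": "raw_data"},
                "out": ["cleaned"],
            },
            "classify": {
                "run": "classify.cwl",
                # Data dependency: consume the 'cleaned' output of 'preprocess'.
                "in": {"input": "preprocess/cleaned"},
                "out": ["labels"],
            },
        },
    }


if __name__ == "__main__":
    tools = {
        "preprocess.cwl": container_step(
            "example/preprocess:latest", "preprocess.py", "cleaned", "cleaned.h5"
        ),
        "classify.cwl": container_step(
            "example/classify:latest", "classify.py", "labels", "labels.csv"
        ),
    }
    for name, doc in tools.items():
        print(f"# {name}")
        print(json.dumps(doc, indent=2))
    print("# workflow.cwl")
    print(json.dumps(two_step_workflow(), indent=2))
```

Under this sketch, replacing a component amounts to pointing a step's `run` field at a different tool description or changing its `dockerPull` image, while the data dependencies between steps stay declared in one place. This is one way to read the abstract's claim that workflow components are easy to modify or replace.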



Acknowledgements

This research was partially supported by the Helmholtz Association's Initiative and Networking Fund and the Russian Science Foundation (project No. 18-41-06001; workflow development and deployment by AT, SB, AN, VI and VV) and by RFBR grant 18-29-23020 (review of existing workflow management software by AP). The work was carried out using computing resources provided by the NRC Kurchatov Institute project "Development of modular platform for scientific data processing and mining" (Project No. 1571).

Author information

Corresponding author

Correspondence to Anton Teslyuk.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Teslyuk, A., Bobkov, S., Poyda, A., Novikov, A., Velikhov, V., Ilyin, V. (2020). Development of Experimental Data Processing Workflows Based on Kubernetes Infrastructure and REANA Workflow Management System. In: Voevodin, V., Sobolev, S. (eds) Supercomputing. RuSCDays 2020. Communications in Computer and Information Science, vol 1331. Springer, Cham. https://doi.org/10.1007/978-3-030-64616-5_48


  • DOI: https://doi.org/10.1007/978-3-030-64616-5_48


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-64615-8

  • Online ISBN: 978-3-030-64616-5

  • eBook Packages: Computer Science, Computer Science (R0)
