Abstract
In this paper we present the design of a data processing workflow for scientific experiments that require a complicated multi-step analysis procedure. We test it on datasets from Single Particle Imaging (SPI) experiments. The workflow is based on a microservice architecture, Docker containers, and the Kubernetes platform. For workflow setup and management we use the REANA software, which is compatible with the Kubernetes orchestrator and supports the standard Common Workflow Language (CWL) for describing complex computing jobs. Our approach allows easy construction of workflows of diverse architecture for a wide range of applications. It allows heterogeneous software to be integrated in a uniform way, and workflow components to be modified or replaced easily. At the same time, it allows computations to be scaled easily in a cloud infrastructure. We show the applicability of the designed scheme and estimate the overhead of the platform middleware.
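As a rough illustration of the idea of a multi-step workflow built from replaceable containerized components, the following minimal Python sketch models a workflow in a simplified, CWL-like layout and derives an execution order from the data dependencies between steps. The step names, image names, and the `execution_order` and `replace_step` helpers are hypothetical and are not taken from the paper. The per-step `image` field is a simplification; in real CWL a container is declared through a `DockerRequirement`, and REANA itself handles scheduling.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical three-step workflow in a simplified CWL-like layout: each step
# names a container image, its outputs, and the upstream outputs it consumes.
workflow = {
    "cwlVersion": "v1.0",
    "class": "Workflow",
    "inputs": {"raw_data": "File"},
    "steps": {
        "preprocess": {
            "image": "example/preprocess:latest",  # placeholder image name
            "in": {"data": "raw_data"},
            "out": ["cleaned"],
        },
        "classify": {
            "image": "example/classify:latest",  # placeholder image name
            "in": {"data": "preprocess/cleaned"},
            "out": ["selected"],
        },
        "reconstruct": {
            "image": "example/reconstruct:latest",  # placeholder image name
            "in": {"data": "classify/selected"},
            "out": ["model"],
        },
    },
}


def execution_order(wf):
    """Return step names in an order that respects data dependencies."""
    graph = {}
    for name, step in wf["steps"].items():
        # A source written as "step/output" refers to an upstream step;
        # a bare name refers to a workflow-level input.
        deps = {src.split("/")[0] for src in step["in"].values() if "/" in src}
        graph[name] = deps
    return list(TopologicalSorter(graph).static_order())


def replace_step(wf, name, image):
    """Return a copy of the workflow with one step's container image swapped."""
    new_steps = dict(wf["steps"])
    new_steps[name] = {**new_steps[name], "image": image}
    return {**wf, "steps": new_steps}


if __name__ == "__main__":
    print(execution_order(workflow))
    updated = replace_step(workflow, "classify", "example/classify:v2")
    print(updated["steps"]["classify"]["image"])
```

Because each step is wired only through named inputs and outputs, swapping the image of one step leaves the dependency graph, and therefore the execution order, unchanged.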
Acknowledgements
This research was partially supported by the Helmholtz Association's Initiative and Networking Fund and the Russian Science Foundation (project No. 18-41-06001; workflow development and deployment by AT, SB, AN, VI and VV) and by RFBR grant 18-29-23020 (review of existing workflow management software by AP). The work was carried out using computing resources provided by the NRC Kurchatov Institute project “Development of modular platform for scientific data processing and mining” (Project No. 1571).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Teslyuk, A., Bobkov, S., Poyda, A., Novikov, A., Velikhov, V., Ilyin, V. (2020). Development of Experimental Data Processing Workflows Based on Kubernetes Infrastructure and REANA Workflow Management System. In: Voevodin, V., Sobolev, S. (eds) Supercomputing. RuSCDays 2020. Communications in Computer and Information Science, vol 1331. Springer, Cham. https://doi.org/10.1007/978-3-030-64616-5_48
DOI: https://doi.org/10.1007/978-3-030-64616-5_48
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-64615-8
Online ISBN: 978-3-030-64616-5
eBook Packages: Computer Science, Computer Science (R0)