Abstract
Containerization is a technique used to encapsulate a software system and its dependencies into one isolated package, which is called a container. The goal of these containers is to deploy or replicate a software system on various platforms and environments without facing any compatibility or dependency issues. Developers can instantiate these containers from images using Docker, one of the most popular containerization platforms. Furthermore, many of these images are publicly available on DockerHub, where developers can share their images with the community, who in turn can leverage such publicly available images. However, DockerHub contains thousands of images for each software system, which makes the selection of an image a nontrivial task. In this paper, we investigate the differences among 936 DockerHub images for five software systems, with the goal of helping Docker tooling creators and DockerHub better guide users in selecting a suitable image. We observe that users tend to download the official images (images that are provided by Docker itself) when a large number of choices exist for a software system among the community images (images that are provided by community developers), which are in many cases more resource efficient (have fewer duplicate resources) and have fewer security vulnerabilities. In fact, we observe that 27% (median), 35% (median), 6% (median), and 9% (median) of the DockerHub Debian-, CentOS-, Ubuntu-, and Alpine-based images, respectively, are identical to another image across all the studied software systems. Furthermore, 26% (median), 49% (median), and 8% (median) of the Alpine-, Debian-, and Ubuntu-based community images, respectively, are more resource efficient than their respective official images across all five studied software systems. In addition, 7% (median) of the community Debian-based images have fewer security vulnerabilities than their respective official images across the four studied software systems for which an official Debian-based image exists. Unfortunately, the descriptions of 78% of the studied images do not guide users when selecting an image (the description does not exist at all or does not highlight the particularities of the image). We suggest that Docker tooling creators and DockerHub design approaches to distinguish DockerHub images and help users find the most suitable images for their needs.
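To make the "X% (median) of images are identical to another image" figures concrete, the following minimal sketch shows one way such a statistic could be computed. It is an illustration under stated assumptions, not the procedure used in the study: the abstract does not say how identity between images was determined, so each image is represented here by a hypothetical content fingerprint (for example, a hash over its layer digests), and the function names and sample data are made up.

```python
from collections import Counter
from statistics import median


def duplicate_share(fingerprints):
    """Fraction of images whose fingerprint is shared with at least one other image.

    `fingerprints` is a list with one (hypothetical) content fingerprint per image.
    """
    if not fingerprints:
        return 0.0
    counts = Counter(fingerprints)
    duplicated = sum(1 for fp in fingerprints if counts[fp] > 1)
    return duplicated / len(fingerprints)


def median_duplicate_share(images_by_system):
    """Median, across software systems, of the per-system duplicate share."""
    return median(duplicate_share(fps) for fps in images_by_system.values())


# Made-up fingerprints for three hypothetical systems (not data from the study).
example = {
    "system_a": ["f1", "f1", "f2", "f3"],  # 2 of 4 images have a twin
    "system_b": ["g1", "g2", "g3", "g4"],  # no image has a twin
    "system_c": ["h1", "h1", "h1", "h2"],  # 3 of 4 images have a twin
}

print(median_duplicate_share(example))  # 0.5
```

The median is taken over per-system shares rather than over the pooled set of images, so a single system with many duplicates does not dominate the reported figure.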
Notes
Our data is publicly available at https://github.com/SAILResearch/replication_dockerhub.
Additional information
Communicated by: Nachiappan Nagappan
About this article
Cite this article
Ibrahim, M.H., Sayagh, M. & Hassan, A.E. Too many images on DockerHub! How different are images for the same system? Empir Software Eng 25, 4250–4281 (2020). https://doi.org/10.1007/s10664-020-09873-0
DOI: https://doi.org/10.1007/s10664-020-09873-0