
Too many images on DockerHub! How different are images for the same system?

Empirical Software Engineering

Abstract

Containerization is a technique used to encapsulate a software system and its dependencies into one isolated package, which is called a container. The goal of containers is to deploy or replicate a software system on various platforms and environments without facing compatibility or dependency issues. Developers can instantiate containers from images using Docker, one of the most popular containerization platforms. Furthermore, many of these images are publicly available on DockerHub, where developers share their images with the community, which in turn can leverage these publicly available images. However, DockerHub contains thousands of images for each software system, which makes selecting an image a nontrivial task. In this paper, we investigate the differences among 936 DockerHub images for five software systems, with the goal of helping Docker tooling creators and DockerHub better guide users in selecting a suitable image. We observe that users tend to download the official images (images that are provided by Docker itself), even though a large number of alternative community images (images that are provided by community developers) exist for each software system, and these community images are in many cases more resource efficient (have fewer duplicate resources) and have fewer security vulnerabilities. In fact, we observe that 27% (median), 35% (median), 6% (median), and 9% (median) of the DockerHub Debian, CentOS, Ubuntu, and Alpine based images, respectively, are identical to another image across all the studied software systems. Furthermore, 26% (median), 49% (median), and 8% (median) of the Alpine, Debian, and Ubuntu based community images, respectively, are more resource efficient than the corresponding official images across all five studied software systems. In addition, 7% (median) of the community Debian based images have fewer security vulnerabilities than the corresponding official images across the four studied software systems for which an official Debian based image exists. Unfortunately, the descriptions of 78% of the studied images do not guide users when selecting an image (the description either does not exist at all or does not highlight the particularities of the image). We therefore suggest that Docker tooling creators and DockerHub design approaches to distinguish DockerHub images and help users find the most suitable images for their needs.
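The abstract does not spell out how "identical" images or "duplicate resources" were measured. As a rough illustration only, the sketch below assumes a simplified, hypothetical data model: an image is an ordered list of layer digests, and each layer is a mapping from file path to size in bytes. It treats two images as identical when their ordered layer digests match, and counts a file as a duplicate resource when a later layer re-adds it at the same path (only the topmost copy is visible in a running container). The function names `share_identical` and `duplicate_bytes`, and the sample data, are hypothetical; this is not the authors' method.

```python
from collections import defaultdict
from statistics import median

# Hypothetical, simplified model (not the paper's data format):
# a layer maps file path -> size in bytes.
Layer = dict[str, int]


def share_identical(images: dict[str, list[str]]) -> float:
    """Percentage of images whose ordered layer-digest list equals another image's."""
    groups: dict[tuple[str, ...], list[str]] = defaultdict(list)
    for name, digests in images.items():
        groups[tuple(digests)].append(name)
    # Every image in a group of size > 1 is identical to at least one other image.
    identical = sum(len(names) for names in groups.values() if len(names) > 1)
    return 100.0 * identical / len(images) if images else 0.0


def duplicate_bytes(layers: list[Layer]) -> int:
    """Bytes spent on earlier copies of files that a later layer re-adds at the same path."""
    seen: dict[str, int] = {}  # path -> size of the most recent copy
    wasted = 0
    for layer in layers:
        for path, size in layer.items():
            if path in seen:
                # The earlier copy is shadowed but still stored in the image.
                wasted += seen[path]
            seen[path] = size
    return wasted


# Hypothetical example: per-system share of identical images, then the median across systems.
per_system = {
    "system-a": {
        "img1": ["sha256:aa", "sha256:bb"],
        "img2": ["sha256:aa", "sha256:bb"],
        "img3": ["sha256:cc"],
    },
    "system-b": {"img4": ["sha256:dd"], "img5": ["sha256:ee"]},
}
shares = [share_identical(imgs) for imgs in per_system.values()]
print(median(shares))
print(duplicate_bytes([{"/bin/sh": 100, "/etc/a": 5}, {"/etc/a": 7}]))
```

Taking the median of per-system percentages loosely mirrors how the abstract reports its figures; a real analysis would read layer digests and file listings from image manifests and layer archives rather than from hand-written dictionaries.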




Notes

  1. Our data is publicly available at https://github.com/SAILResearch/replication_dockerhub.


Author information

Corresponding author

Correspondence to Md Hasan Ibrahim.

Additional information

Communicated by: Nachiappan Nagappan


About this article


Cite this article

Ibrahim, M.H., Sayagh, M. & Hassan, A.E. Too many images on DockerHub! How different are images for the same system?. Empir Software Eng 25, 4250–4281 (2020). https://doi.org/10.1007/s10664-020-09873-0



  • DOI: https://doi.org/10.1007/s10664-020-09873-0
