
Helping Your Docker Images to Spread Based on Explainable Models

  • Riccardo Guidotti
  • Jacopo Soldani
  • Davide Neri
  • Antonio Brogi
  • Dino Pedreschi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11053)

Abstract

Docker is on the rise in today’s enterprise IT. It allows applications to be shipped inside portable containers, which run from so-called Docker images. Docker images are distributed in public registries, which also monitor their popularity. The popularity of an image affects its actual usage, and hence the potential revenue for its developers. In this paper, we present a solution based on interpretable decision and regression trees for estimating the popularity of a given Docker image, and for understanding how to improve an image to increase its popularity. The results presented in this work can provide valuable insights to Docker developers, helping them spread their images. Code related to this paper is available at: https://github.com/di-unipi-socc/DockerImageMiner.
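To make the idea of an interpretable popularity model more concrete, the following is a minimal pure-Python sketch of a CART-style regression tree that (a) predicts a popularity score for an image and explains the prediction as the list of split conditions on its root-to-leaf path, and (b) derives a naive "how to improve" hint by listing the conditions of the highest-scoring leaf that the image does not yet satisfy. Everything here is an illustrative assumption: the feature names (`n_tags`, `days_since_update`), the target (`log_pulls`, a log-scaled pull count), the helper names, and the toy rows are hypothetical and are not the features, data, or exact method used in the paper.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Node:
    value: float                   # mean target of the training rows reaching this node
    feature: Optional[str] = None  # split feature; None marks a leaf
    threshold: float = 0.0         # rows with x[feature] <= threshold go left
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def sse(ys):
    """Sum of squared errors around the mean (the CART regression impurity)."""
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def build(rows, ys, features, depth=0, max_depth=3, min_leaf=1):
    """Greedily grow a regression tree, choosing the split that minimises SSE."""
    node = Node(value=mean(ys))
    if depth >= max_depth or len(ys) < 2 * min_leaf:
        return node
    best = None  # (cost, feature, threshold, left_indices, right_indices)
    for f in features:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold halfway between observed values
            left = [i for i, r in enumerate(rows) if r[f] <= t]
            right = [i for i, r in enumerate(rows) if r[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse([ys[i] for i in left]) + sse([ys[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, f, t, left, right)
    if best is None or best[0] >= sse(ys):
        return node  # no split reduces the error: keep this node as a leaf
    _, f, t, left, right = best
    node.feature, node.threshold = f, t
    node.left = build([rows[i] for i in left], [ys[i] for i in left],
                      features, depth + 1, max_depth, min_leaf)
    node.right = build([rows[i] for i in right], [ys[i] for i in right],
                       features, depth + 1, max_depth, min_leaf)
    return node


def explain(node, x):
    """Return the predicted score for x and the path conditions that justify it."""
    rules = []
    while node.feature is not None:
        if x[node.feature] <= node.threshold:
            rules.append(f"{node.feature} <= {node.threshold:g}")
            node = node.left
        else:
            rules.append(f"{node.feature} > {node.threshold:g}")
            node = node.right
    return node.value, rules


def leaves(node, path=()):
    """Yield (leaf_value, conditions) for every leaf of the tree."""
    if node.feature is None:
        yield node.value, list(path)
        return
    yield from leaves(node.left, path + ((node.feature, "<=", node.threshold),))
    yield from leaves(node.right, path + ((node.feature, ">", node.threshold),))


def suggest(tree, x):
    """Naive hint: conditions of the best-scoring leaf that x currently violates."""
    best_value, best_path = max(leaves(tree), key=lambda leaf: leaf[0])
    changes = [(f, op, t) for f, op, t in best_path
               if not (x[f] <= t if op == "<=" else x[f] > t)]
    return best_value, changes


# Hypothetical toy data, NOT taken from the paper.
rows = [
    {"n_tags": 1, "days_since_update": 400},
    {"n_tags": 2, "days_since_update": 300},
    {"n_tags": 8, "days_since_update": 30},
    {"n_tags": 10, "days_since_update": 10},
]
log_pulls = [2.0, 2.5, 5.0, 5.5]
tree = build(rows, log_pulls, ["n_tags", "days_since_update"], max_depth=1)
image = {"n_tags": 3, "days_since_update": 200}
print(explain(tree, image))  # (2.25, ['n_tags <= 5'])
print(suggest(tree, image))  # (5.25, [('n_tags', '>', 5.0)])
```

The depth-1 tree keeps the output easy to read: the image is predicted to fall in the less popular leaf because `n_tags <= 5`, and the hint points at the single condition separating it from the most popular leaf. With deeper trees, this "jump to the best leaf" heuristic can propose several changes at once, some of which may not be actionable, so it should be read as a starting point rather than a recommendation procedure.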

Keywords

Docker images · Popularity estimation · Explainable models

Notes

Acknowledgments

Work partially supported by the EU H2020 Program under the funding scheme “INFRAIA-1-2014-2015: Research Infrastructures”, grant agreement 654024 “SoBigData” (http://www.sobigdata.eu).

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Riccardo Guidotti (1, 2)
  • Jacopo Soldani (1)
  • Davide Neri (1)
  • Antonio Brogi (1)
  • Dino Pedreschi (1)
  1. University of Pisa, Pisa, Italy
  2. KDDLab, ISTI-CNR, Pisa, Italy