Predicting Popularity of Open Source Projects Using Recurrent Neural Networks

  • Sefa Eren Sahin
  • Kubilay Karpat
  • Ayse Tosun
Conference paper
Part of the IFIP Advances in Information and Communication Technology book series (IFIPAICT, volume 556)

Abstract

GitHub is the largest open source software development platform, hosting millions of repositories on a variety of topics. The number of stars a repository receives is often considered a measure of its popularity. In the literature, the number of stars of a repository has been associated with the number of forks, commits, and followers, documentation size, and programming language. We extend prior studies in terms of both input features and algorithm. We define six features from GitHub events corresponding to development activities, and six additional features that incorporate the influence of users (followers and contributors) on project popularity into those development activities. We propose a time-series forecast model based on Recurrent Neural Networks to predict the number of stars received over k consecutive days. We assess the performance of the proposed model with varying k (1, 7, 14, and 30 days) and with varying input features. Our analysis of the five most-starred repositories in the data visualization area shows that the error rate ranges between 19.76 and 70.57 across the projects. The best-performing models use either the development activity features only or all features together.
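
To make the forecasting setup concrete, the following is a minimal sketch of how daily feature vectors could be turned into training samples for a k-day forecast. It is an illustration only, not the authors' pipeline: the abstract does not state the input window length or the exact target definition, so the `lookback` parameter, the function name `make_windows`, and the choice of summing new stars over the next k days are all assumptions.

```python
from typing import List, Sequence, Tuple


def make_windows(
    features: Sequence[Sequence[float]],
    daily_stars: Sequence[int],
    lookback: int,
    k: int,
) -> Tuple[List[List[List[float]]], List[int]]:
    """Build (input window, k-day star total) pairs from daily series.

    features[t]    -- feature vector for day t (e.g. event counts)
    daily_stars[t] -- new stars received on day t
    lookback       -- number of past days per input window (assumed)
    k              -- forecast horizon in days (1, 7, 14 or 30 in the abstract)
    """
    if len(features) != len(daily_stars):
        raise ValueError("features and daily_stars must cover the same days")
    if lookback < 1 or k < 1:
        raise ValueError("lookback and k must be positive")

    inputs: List[List[List[float]]] = []
    targets: List[int] = []
    # Each window must leave k full days after it to form the target.
    for start in range(len(features) - lookback - k + 1):
        end = start + lookback
        inputs.append([list(row) for row in features[start:end]])
        # Assumed target: total new stars over the k days after the window.
        targets.append(sum(daily_stars[end:end + k]))
    return inputs, targets
```

Each input window has shape (lookback, number of features), which is the layout recurrent layers in common deep learning libraries expect per sample. Restricting the columns of `features` to one feature group or using all of them would correspond to the "varying input features" comparison described above.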

Keywords

Open source projects · Predicting stars · Recurrent Neural Networks

Notes

Acknowledgments

This research is supported in part by the Scientific Research Projects Division of Istanbul Technical University under project number MGA-2017-40712 and by the Scientific and Technological Research Council of Turkey under project number 5170048.

Copyright information

© IFIP International Federation for Information Processing 2019

Authors and Affiliations

  1. Faculty of Computer and Informatics Engineering, Istanbul Technical University, Istanbul, Turkey
