The Metrics to Evaluate the Health Status of OSS Projects Based on Factor Analysis

Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1042)


As open-source software (OSS) development becomes increasingly common, a growing number of businesses and developers are joining OSS projects. For project managers, developers and users, understanding the current health status of a project is essential for managing the development process, choosing which open-source projects to contribute to, and deciding whether to adopt the software packages a project produces. An efficient approach to evaluating the health status of open-source projects is therefore needed. Unfortunately, although many approaches, including metrics, have been proposed, they are designed in arbitrary ways. In this paper, a mathematical tool, factor analysis, is used to build a health evaluation model for OSS projects. As far as we know, this is the first time that factor analysis has been applied to evaluate OSS projects. The model is based on GitHub data and takes as input the basic indexes that are closely related to the health status of the projects. Factor analysis then yields six new synthetic metrics, namely community activity, project popularity, development activity, completeness, responsiveness and persistence, which can be used to calculate the overall health score of a project. To verify the effectiveness of the model, it is applied to some real projects, and the results show that the overall scores it produces reflect the health status of those projects.
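
As a rough illustration of how factor analysis can turn basic indexes into synthetic metrics and a single score, the standard factor model can be written as below. This is a generic sketch and not the paper's exact formulation: the symbols x_i (standardized basic indexes), F_j (common factors, here the six synthetic metrics), a_ij (factor loadings), epsilon_i (unique factors), lambda_j (variance explained by factor j), w_j (weights) and S (overall score) are all assumed notation, and weighting each factor by its share of explained variance is a common convention that the abstract does not confirm.

```latex
% Generic factor model: each standardized basic index x_i is a linear
% combination of m common factors plus a unique factor (assumed notation).
\begin{align}
  x_i &= \sum_{j=1}^{m} a_{ij} F_j + \varepsilon_i,
      \qquad i = 1, \dots, p, \quad m = 6 \\
% Assumed convention: the overall score weights each factor score by its
% share of the total variance explained by the retained factors.
  S   &= \sum_{j=1}^{m} w_j F_j,
      \qquad w_j = \frac{\lambda_j}{\sum_{k=1}^{m} \lambda_k}
\end{align}
```

Under this convention the weights w_j sum to 1, so S stays on the same scale as the individual factor scores.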


Keywords: Open source software project, Health status, Factor analysis



This work is supported by the National Key Research and Development Plan (No. 2018YFB1003800).



Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. Department of Computer Science and Engineering, Shanghai Jiaotong University, Shanghai, China
  2. Centre for Artificial Intelligence, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, Australia
