Can Commit Change History Reveal Potential Fault Prone Classes? A Study on GitHub Repositories

  • Chun Yong Chong
  • Sai Peck Lee
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1077)


Various studies have successfully used graph theory analysis to gain a high-level, abstract view of software systems, for example by constructing a call graph to visualize the dependencies among software components. The level of granularity and the information shown by the graph usually depend on the input, such as variables, methods, classes, packages, or a combination of multiple levels. However, very few studies have investigated how software evolution and change history can be used as a basis for modelling software-based complex networks. It is commonly understood that stable, well-designed source code receives fewer updates throughout the software development lifecycle. Badly designed code, by contrast, tends to be updated more often because of broken dependencies, high coupling, or dependencies on other classes. This paper puts forward an approach to model a commit change-based weighted complex network, built from historical software change and evolution data captured from GitHub repositories, with the aim of identifying potential fault-prone classes. Four well-established graph centrality metrics were used as proxy metrics to discover fault-prone classes. Experiments on ten open-source projects showed that, when all centrality metrics are used together, the approach yields reasonably good precision when compared against the ground truth.
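The idea of a commit change-based weighted network can be illustrated with a minimal sketch. The abstract does not specify how edges are weighted or which four centrality metrics are used, so everything below is an assumption made for illustration: classes that change in the same commit are linked, the edge weight counts how often they changed together, and weighted degree stands in for a single centrality metric. The function names and the sample commit data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_cochange_graph(commits):
    """Map each pair of classes to the number of commits that changed both.

    `commits` is an iterable of lists of class names, one list per commit.
    Assumption: co-change count is used as the edge weight.
    """
    weights = defaultdict(int)
    for changed in commits:
        # Sort so that each undirected pair has one canonical key.
        for a, b in combinations(sorted(set(changed)), 2):
            weights[(a, b)] += 1
    return dict(weights)


def weighted_degree(weights):
    """Sum the weights of all edges incident to each class."""
    degree = defaultdict(int)
    for (a, b), w in weights.items():
        degree[a] += w
        degree[b] += w
    return dict(degree)


def rank_classes(commits):
    """Rank classes by weighted degree, highest first (ties by name)."""
    degree = weighted_degree(build_cochange_graph(commits))
    return sorted(degree, key=lambda c: (-degree[c], c))


# Hypothetical commit history: each entry lists the classes one commit touched.
commits = [
    ["Parser", "Lexer"],
    ["Parser", "Lexer", "Ast"],
    ["Parser", "Ast"],
    ["Util"],
]

print(rank_classes(commits))
```

In this sketch a class that only ever changes alone, such as `Util`, never gains an edge and so does not appear in the ranking. A fuller treatment would combine several centrality metrics, as the abstract describes, rather than rely on weighted degree alone.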


Software fault identification, Software change coupling, Commit change data, Mining software repositories, Complex network



This work was carried out within the framework of the research project FP001-2016 under the Fundamental Research Grant Scheme provided by the Ministry of Higher Education, Malaysia.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. School of Information Technology, Monash University Malaysia, Bandar Sunway, Malaysia
  2. Department of Software Engineering, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia
