Studying the impact of social interactions on software quality

Bettenburg, Nicolas; Hassan, Ahmed E.

doi:10.1007/s10664-012-9205-0

Published: 28 April 2012

Empirical Software Engineering, volume 18, pages 375–431 (2013)

Abstract

Correcting software defects accounts for a significant amount of resources in a software project. To make the best use of testing efforts, researchers have studied statistical models to predict in which parts of a software system future defects are likely to occur. By studying the mathematical relations between predictor variables used in these models, researchers can form an increased understanding of the important connections between development activities and software quality. Predictor variables used in past top-performing models are largely based on source code-oriented metrics, such as lines of code or number of changes. However, source code is the end product of numerous interlaced and collaborative activities carried out by developers. Traces of such activities can be found in the various repositories used to manage development efforts. In this paper, we develop statistical models to study the impact of social interactions in a software project on software quality. These models use predictor variables based on social information mined from the issue tracking and version control repositories of two large open-source software projects. The results of our case studies demonstrate the impact of metrics from four different dimensions of social interaction on post-release defects. Our findings show that statistical models based on social information have explanatory power similar to that of traditional models. Furthermore, our results demonstrate that social information does not substitute for, but rather augments, the traditional source code-based metrics used in defect prediction models.

Notes

http://pmd.sourceforge.net/

Acknowledgements

We want to thank Audris Mockus of Avaya Labs and the anonymous reviewers of ICPC ’10 for their many helpful comments on earlier revisions of this study.

Author information

Authors and Affiliations

Software Analysis and Intelligence Lab (SAIL), Queen’s University, School of Computing, 156, Barrie Street, Kingston, ON, K7L 3N6, Canada
Nicolas Bettenburg & Ahmed E. Hassan

Corresponding author

Correspondence to Nicolas Bettenburg.

Additional information

Editors: Giulio Antoniol and Keith Gallagher

Appendices

Appendix A: Repeatability of our Work

We carried out most of our statistical analyses in R, a widely used open-source implementation of the commercial S programming language, which was originally developed by John Chambers at Bell Labs. For further information and downloads, we refer interested readers to http://cran.r-project.org/. To provide a common ground for future research in this area and to enable repeatability of our work, this Appendix presents the source code for all analyses carried out in this study. In addition to the R scripts, we provide the datasets for both case studies at the following URL: http://sailhome.cs.queensu.ca/replication/social-interactions/.

1.1 Case Study One: ECLIPSE

Appendix B: Summary of Hierarchical Models with β-coefficients

Table 17 Hierarchical analysis of logistic regression models along the four dimensions of social interaction metrics for ECLIPSE 3.0

Table 18 Hierarchical analysis of logistic regression models along the four dimensions of social interaction metrics for ECLIPSE 3.1

Table 19 Hierarchical analysis of logistic regression models along the four dimensions of social interaction metrics for ECLIPSE 3.2

Table 20 Hierarchical analysis of logistic regression models along the four dimensions of social interaction metrics for Mozilla FIREFOX 1.5

Table 21 Hierarchical analysis of logistic regression models along the four dimensions of social interaction metrics for Mozilla FIREFOX 2.0

Table 22 Hierarchical analysis of logistic regression models along the four dimensions of social interaction metrics for Mozilla FIREFOX 3.0
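
The table captions in this appendix refer to a hierarchical analysis of logistic regression models. As a rough illustration of that general technique, and not a reproduction of the authors' R scripts or data, the following Python sketch fits nested logistic regression models on synthetic data and compares how much deviance each one explains. Everything in it is an assumption made for illustration: the function names (`solve`, `fit_logistic`, `hierarchical_analysis`), the two stand-in predictors (`code`, `social`), the coefficients used to generate the data, and the choice of deviance explained as the measure of explanatory power.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_logistic(predictors, outcomes, iterations=25):
    """Fit a logistic regression by Newton-Raphson; return (coefficients, deviance)."""
    rows = [[1.0] + list(p) for p in predictors]  # prepend the intercept column
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        # A tiny diagonal term keeps the Hessian invertible (illustrative safeguard).
        hess = [[1e-8 if i == j else 0.0 for j in range(k)] for i in range(k)]
        for row, y in zip(rows, outcomes):
            eta = sum(b * x for b, x in zip(beta, row))
            mu = 1.0 / (1.0 + math.exp(-eta))
            for i in range(k):
                grad[i] += (y - mu) * row[i]
                for j in range(k):
                    hess[i][j] += mu * (1.0 - mu) * row[i] * row[j]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    log_lik = 0.0
    for row, y in zip(rows, outcomes):
        eta = sum(b * x for b, x in zip(beta, row))
        # y*eta - log(1 + exp(eta)), written in a numerically stable form
        log_lik += y * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return beta, -2.0 * log_lik


def hierarchical_analysis(seed=0, n_files=300):
    """Compare nested models on synthetic data (hypothetical metrics, not the paper's)."""
    rng = random.Random(seed)
    code, social, defective = [], [], []
    for _ in range(n_files):
        c = rng.gauss(0.0, 1.0)  # stand-in for a standardized code metric
        s = rng.gauss(0.0, 1.0)  # stand-in for a standardized social metric
        p = 1.0 / (1.0 + math.exp(-(-0.5 + 1.0 * c + 0.8 * s)))
        code.append(c)
        social.append(s)
        defective.append(1 if rng.random() < p else 0)
    _, dev_null = fit_logistic([() for _ in defective], defective)  # intercept only
    _, dev_code = fit_logistic([(c,) for c in code], defective)
    _, dev_both = fit_logistic(list(zip(code, social)), defective)
    return {
        "code_only": 1.0 - dev_code / dev_null,
        "code_plus_social": 1.0 - dev_both / dev_null,
    }


if __name__ == "__main__":
    for name, explained in hierarchical_analysis().items():
        print(f"{name}: deviance explained = {explained:.3f}")
```

Because the models are nested, adding a predictor can only keep or increase the deviance explained. The size of that increase is what a hierarchical analysis examines when asking whether a group of metrics augments an existing model.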

About this article

Cite this article

Bettenburg, N., Hassan, A.E. Studying the impact of social interactions on software quality. Empir Software Eng 18, 375–431 (2013). https://doi.org/10.1007/s10664-012-9205-0

Issue Date: April 2013
DOI: https://doi.org/10.1007/s10664-012-9205-0
