What are the effects of history length and age on mining software change impact?

Empirical Software Engineering

Abstract

The goal of Software Change Impact Analysis is to identify artifacts (typically source-code files or individual methods therein) potentially affected by a change. Recently, there has been increased interest in mining software change impact based on evolutionary coupling. A particularly promising approach uses association rule mining to uncover potentially affected artifacts from patterns in the system’s change history. Two main considerations when using this approach are the history length, the number of transactions from the change history used to identify the impact of a change, and history age, the number of transactions that have occurred since patterns were last mined from the history. Although history length and age can significantly affect the quality of mining results, few guidelines exist on how to best select appropriate values for these two parameters. In this paper, we empirically investigate the effects of history length and age on the quality of change impact analysis using mined evolutionary coupling. Specifically, we report on a series of systematic experiments using three state-of-the-art mining algorithms that involve the change histories of two large industrial systems and 17 large open source systems. In these experiments, we vary the length and age of the history used to mine software change impact, and assess how this affects precision and applicability. Results from the study are used to derive practical guidelines for choosing history length and age when applying association rule mining to conduct software change impact analysis.
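
To make the two parameters concrete, the following is a minimal sketch of co-change rule mining over a window of the change history. It is not one of the three mining algorithms evaluated in the paper; it is a simple pairwise support/confidence miner, and all names (`select_history`, `mine_rules`, `recommend`, the file names) are hypothetical. It assumes the history is a list of transactions ordered from oldest to newest, each transaction being a set of changed artifacts.

```python
from collections import Counter
from itertools import permutations


def select_history(transactions, length, age):
    """Pick the window used for mining.

    `age` is the number of most recent transactions that are skipped
    (they happened after the model was mined); `length` is the number
    of transactions kept before that point.
    """
    end = len(transactions) - age
    if end <= 0:
        return []
    start = max(0, end - length)
    return transactions[start:end]


def mine_rules(history):
    """Mine pairwise rules a -> b as {(a, b): (confidence, support)}."""
    item_count = Counter()
    pair_count = Counter()
    for tx in history:
        items = set(tx)
        item_count.update(items)
        # Every ordered pair of distinct artifacts changed together.
        pair_count.update(permutations(sorted(items), 2))
    return {
        (a, b): (n / item_count[a], n)
        for (a, b), n in pair_count.items()
    }


def recommend(rules, changed, top=5):
    """Rank artifacts likely to be impacted by the `changed` set."""
    scores = {}
    for (a, b), score in rules.items():
        if a in changed and b not in changed:
            scores[b] = max(scores.get(b, (0.0, 0)), score)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1][0], -kv[1][1], kv[0]))
    return [artifact for artifact, _ in ranked[:top]]


# Hypothetical change history, oldest transaction first.
history = [
    {"a.c", "a.h"},
    {"a.c", "a.h", "b.c"},
    {"b.c", "Makefile"},
    {"a.c", "a.h"},
    {"c.c"},
]

window = select_history(history, length=3, age=1)
rules = mine_rules(window)
impact = recommend(rules, {"a.c"})      # ['a.h', 'b.c']
no_answer = recommend(rules, {"c.c"})   # [] : c.c never occurs in the window
```

The empty result for `c.c` illustrates applicability: a window that is too short or too old may contain no evidence at all for the artifacts being changed, so no recommendation can be made.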

Notes

  1. Note that various granularity choices are possible since the algorithms are granularity agnostic; if fine-grained co-change data is available (or computable), the same algorithms will relate methods or variables just as well as more coarse-grained files. In this paper we consider a practical fine-grained level that uses method-level information where possible (i.e., for source files that can be parsed, as discussed later in the paper), and file-level information otherwise (e.g., for test plans, build files, and configuration files).

  2. For a normally distributed population of 50 000, a minimum of 657 samples is required to attain 99% confidence with a 5% confidence interval that the sampled transactions are representative of the population. Since we do not know the distribution of transactions, we correct the sample size to the number needed for a non-parametric test to have the same ability to reject the null hypothesis. This correction is done using the Asymptotic Relative Efficiency (ARE). As AREs differ for various non-parametric tests, we choose the lowest coefficient, 0.637, yielding a conservative minimum sample size of 657/0.637 = 1032 transactions. Hence, a sample size of 1100 is more than sufficient to attain 99% confidence with a 5% confidence interval that the samples are representative of the population.

  3. Observe that the MAP values in the three subtables of Table 9 were obtained from three different randomly sampled collections, which explains the variation in MAPs for ages repeated in different collections. Although these values are within the 5% confidence interval targeted by our sampling approach, it still means that cutoff values obtained from one collection cannot be used to look up corresponding ages in another collection, as also shown by the values for age2000 and age200 in Table 9.
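
The sample-size arithmetic in note 2 can be checked with a short calculation. The note does not state which formula was used; the sketch below assumes the common Cochran formula with a finite population correction, a z-score of 2.58 for 99% confidence, and a worst-case proportion of 0.5. Under those assumptions it reproduces the figures in the note. The function name `sample_size` is hypothetical.

```python
import math


def sample_size(population, z, margin, p=0.5):
    """Cochran sample size with finite population correction (assumed formula)."""
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    return n0 / (1 + (n0 - 1) / population)


# 99% confidence (z = 2.58, an assumption), 5% interval, population of 50 000.
n_parametric = math.ceil(sample_size(50_000, z=2.58, margin=0.05))

# Correct for a non-parametric test using the lowest ARE coefficient from the note.
n_conservative = math.ceil(n_parametric / 0.637)

print(n_parametric, n_conservative)  # 657 1032
```

The coefficient 0.637 is approximately 2/π, the asymptotic relative efficiency of the sign test relative to the t-test for normally distributed data, which is consistent with the note choosing the lowest ARE among non-parametric tests.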

Acknowledgments

This work is supported by the Research Council of Norway through the EvolveIT project (#221751/F20) and the Certus SFI (#203461/030). Dr. Binkley was supported by NSF grant IIA-1360707 and a J. William Fulbright award.

Author information

Corresponding author

Correspondence to Leon Moonen.

Additional information

Communicated by: Gabriele Bavota and Michaela Greiler

About this article

Cite this article

Moonen, L., Rolfsnes, T., Binkley, D. et al. What are the effects of history length and age on mining software change impact? Empir Software Eng 23, 2362–2397 (2018). https://doi.org/10.1007/s10664-017-9588-z

  • DOI: https://doi.org/10.1007/s10664-017-9588-z
