Abstract
Evolutionary coupling is a well-investigated phenomenon in software maintenance research and practice. Association rules and two related measures, support and confidence, have been used to identify evolutionary coupling among program entities. However, these measures only emphasize the co-change (i.e., changing together) frequency of entities and cannot determine whether the entities co-evolved by experiencing related changes. Consequently, the approach reports false positives and fails to detect evolutionary coupling among infrequently co-changed entities. We propose a new measure, identifier correspondence (id-correspondence), that quantifies the extent to which the changes that occurred to the co-changed entities are related, based on identifier similarity. Identifiers are the names given to different program entities such as variables, methods, classes, packages, interfaces, structures, unions, etc. We use the Dice-Sørensen coefficient to measure lexical similarity between the identifiers involved in the changed lines of the co-changed entities. Our investigation of thousands of revisions from nine subject systems covering three programming languages shows that id-correspondence can considerably improve the detection accuracy of evolutionary coupling. It outperforms the existing state-of-the-art evolutionary-coupling-based techniques with significantly higher recall and F-score in predicting future co-change candidates.
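To make the measures named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that identifiers are compared on case-insensitive character bigrams and that support is counted as the raw number of commits in which both entities changed. The abstract fixes neither choice, and it does not say how pairwise similarities are aggregated into id-correspondence, so that step is left out. The function names `bigrams`, `dice`, and `support_confidence` are hypothetical.

```python
from collections import Counter


def bigrams(identifier: str) -> Counter:
    """Multiset of adjacent character pairs (case-insensitive; an assumption)."""
    s = identifier.lower()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def dice(a: str, b: str) -> float:
    """Dice-Sørensen coefficient 2|X ∩ Y| / (|X| + |Y|) on bigram multisets."""
    x, y = bigrams(a), bigrams(b)
    total = sum(x.values()) + sum(y.values())
    if total == 0:
        # Identifiers shorter than two characters have no bigrams.
        return 0.0
    overlap = sum((x & y).values())  # multiset intersection
    return 2.0 * overlap / total


def support_confidence(commits, a, b):
    """Support and confidence of the rule a => b.

    `commits` is a list of sets of entity names changed together.
    Support is taken as the raw co-change count (an assumption);
    confidence is support divided by the number of commits changing `a`.
    """
    both = sum(1 for c in commits if a in c and b in c)
    with_a = sum(1 for c in commits if a in c)
    confidence = both / with_a if with_a else 0.0
    return both, confidence
```

For example, `dice("night", "nacht")` shares one bigram (`ht`) out of eight in total and evaluates to 0.25. Two entities that co-changed only once would have low support under `support_confidence`, which is the case the abstract says frequency-based measures miss.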
Notes
Sørensen–Dice coefficient. https://en.wikipedia.org/wiki/s%c3%b8rensen%e2%80%93dice_coefficient
Strike a match. http://www.catalysoft.com/articles/strikeamatch.html.
SourceForge. https://sourceforge.net/.
Exuberant CTAGS. https://sourceforge.net/projects/ctags/.
Implementation and database. https://drive.google.com/open?id=17biLYZu-nfzj_wiMG-PTvfiuuiXOWDXR.
Wilcoxon signed rank test online. http://www.statskingdom.com/175wilcoxon/signed/ranks.html.
Wilcoxon signed rank test. https://en.wikipedia.org/wiki/wilcoxon/signed-rank/test.
Acknowledgments
This research is supported by the Natural Sciences and Engineering Research Council of Canada (NSERC), and by two Canada First Research Excellence Fund (CFREF) grants coordinated by the Global Institute for Food Security (GIFS) and the Global Institute for Water Security (GIWS).
Additional information
Communicated by: Andrea De Lucia
Cite this article
Mondal, M., Roy, B., Roy, C.K. et al. ID-correspondence: a measure for detecting evolutionary coupling. Empir Software Eng 26, 5 (2021). https://doi.org/10.1007/s10664-020-09921-9
DOI: https://doi.org/10.1007/s10664-020-09921-9