On the adoption and effects of source code reuse on defect proneness and maintenance effort

Software reusability mechanisms, such as inheritance and delegation in Object-Oriented programming, are widely recognized as key instruments of software design that reduce the risk of source code being affected by defects, as well as the effort required to maintain and evolve it. Previous work has traditionally employed source code reuse metrics for prediction purposes, e.g., in the context of defect prediction. However, our research identifies two noticeable limitations of the current literature. First, little is known about the extent to which developers actually employ code reuse mechanisms over time. Second, it is still unclear how these mechanisms may contribute to explaining defect-proneness and maintenance effort during software evolution. We aim at bridging this knowledge gap, as an improved understanding of these aspects might provide insights into the actual support provided by these mechanisms, e.g., by suggesting whether and how to use them for prediction purposes. We propose an exploratory study, conducted on 12 Java projects (over 44,900 commits) of the Defects4J dataset, aiming at (1) assessing how developers use inheritance and delegation during software evolution; and (2) statistically analyzing the impact of inheritance and delegation on fault proneness and maintenance effort. Our results reveal various usage patterns that describe the way inheritance and delegation vary over time. In addition, we find that inheritance and delegation are statistically significant factors that influence both source code defect-proneness and maintenance effort.


I. INTRODUCTION
Software reusability is the design principle that allows developers to reuse part of the existing code to implement new features [1], [2]. This practice is widely recognized as one of the key assets of software development, as it may bring developers multiple benefits, such as a reduction of evolution time, effort, and cost, as well as of the risk of source code being affected by defects [3], [4], [5].
When it comes to Object-Oriented programming languages, many software reuse mechanisms have been provided over time. Design patterns [6], [7], third-party libraries [8], [9], and programming abstractions [10] are examples of these mechanisms. Focusing on Java, two very well-known types of programming abstractions are provided to developers: inheritance and delegation [11]. The former allows a class to take the properties and attributes of another class, establishing a hierarchical relation between them. The latter refers to a class invoking an instance of another class to carry out operations without performing any other type of action.
The importance of these mechanisms has been remarked multiple times by researchers. Already in the early 90s, Chidamber and Kemerer [12] included the Depth of Inheritance Tree (DIT), i.e., a metric that measures how deep a class sits in its inheritance hierarchy (the length of the path from the class to the root of the tree), in their Object-Oriented metrics suite. Later on, several other metrics capturing various aspects of inheritance [13], [14], [15] and delegation [16], [17], [18] were proposed, along with good and bad practices on how to use reusability mechanisms [19], [20], [21], [22]. From the empirical standpoint, a noticeable number of investigations targeted the role of inheritance and delegation in keeping source code quality under control. For instance, researchers have been studying the relation between these mechanisms and other Object-Oriented metrics [23], [24], [25], design patterns [26], [27], code complexity [28], and source code maintainability [29], [30], [31].
Despite the availability of a large body of knowledge on how inheritance and delegation mechanisms contribute to the prediction of source code attributes, most of the prediction models defined so far made a strong assumption: developers make use of reusability principles while evolving source code.
On the one hand, the extent to which these mechanisms are used in practice might have a notable impact on their contribution to prediction models. On the other hand, it is unclear how the relation between reusability and source code attributes varies over time and, therefore, whether inheritance and delegation mechanisms should still be considered for prediction purposes as the system evolves.
In this registered report, we propose the methodology we plan to use to address the limitations of current research with respect to the adoption of reusability practices and their evolutionary effects on two specific source code attributes, namely defect proneness and maintenance effort. We select these attributes as they represent two interesting use cases to assess reusability mechanisms. First, these mechanisms are indeed supposed to reduce fault proneness and maintenance effort [3], [4], [5]. Second, a number of prediction models targeted the early location of defects and the estimation of the effort required to perform evolutionary tasks [39], [45], [40].
Our study will focus on Java projects, as Java (1) offers mechanisms that encourage the use of inheritance and delegation [46], [47] and (2) is still among the most popular programming languages used in industry. To conduct our experiment, we will first mine the Defects4J dataset to extract commit-level information on the adoption of reusability mechanisms. Then, we will develop statistical models to assess the contribution of reusability mechanisms to defect proneness (as indicated by the number of defects over time) and maintenance effort (as indicated by the code churn of commits).
Our work has an exploratory connotation, as we do not start with predefined hypotheses but plan to develop a set of hypotheses after the execution of the study, based on the results achieved. All the collected data and the scripts developed in the context of our research will be made publicly available for the research community.

II. BACKGROUND AND RELATED WORK
We first describe the most widely used paradigms for reusing code in Object-Oriented programming languages: inheritance and delegation. Then, we survey the related literature targeting code reusability and its impact on source code.

A. Inheritance and Delegation
In Java there are two ways to define a hierarchical dependency between two classes:
• 'extends'. Given two classes A and B, A is defined as the super-class of B if B inherits variables or methods from A. In Java, to establish this super-class/sub-class relation, the sub-class must declare it through the keyword "extends".
• 'implements'. Given a class A and an interface B, we say that A inherits from B if A implements the interface B. In Java this mechanism is provided through the keyword "implements". In particular, when a class A inherits from an interface, it must provide a concrete implementation of the methods that the interface declares as a blueprint.
These definitions recall the concept of reusability in terms of specification inheritance, implementation inheritance, and delegation [48]. From a practical point of view, specification inheritance refers to the possibility of replacing an object A with an object B using a combination of two principles:
• Strict Inheritance. A sub-class A exposes the behavior and properties of its super-class B without making any changes [48].
• The Liskov Substitution Principle. According to Liskov and Wing [49], given two classes A and B, A is a subclass of B if it is possible to substitute an object of B with an object of A every time an object of B is expected.
Implementation inheritance occurs when a class indirectly reuses the source code of a super-class. The sub-class can wholly or partially override methods and/or properties and replace the original behavior of the super-class with its own. However, implementation inheritance by definition violates the encapsulation principle, because a sub-class could accidentally invoke methods or use some properties of the super-class in a wrong manner [48]. To avoid this, it is possible in some cases to substitute implementation inheritance with delegation. With this mechanism, a class A does not inherit anything from another class B; instead, A declares a variable of type B and invokes the methods of B directly through it.
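The three forms of reuse described above can be illustrated with a small sketch. The study targets Java, so the following is only a Python analogue: an abstract base class stands in for a Java interface, ordinary subclassing stands in for "extends", and all class names (Shape, Rectangle, Square, AreaFormatter, Report) are hypothetical and do not come from the studied projects.

```python
from abc import ABC, abstractmethod


class Shape(ABC):
    """Plays the role of a Java interface: it only specifies behavior."""

    @abstractmethod
    def area(self) -> float:
        ...


class Rectangle(Shape):
    """Specification inheritance: supplies the concrete area() that Shape requires."""

    def __init__(self, width: float, height: float) -> None:
        self.width = width
        self.height = height

    def area(self) -> float:
        return self.width * self.height


class Square(Rectangle):
    """Implementation inheritance: reuses Rectangle's code (Java 'extends')."""

    def __init__(self, side: float) -> None:
        super().__init__(side, side)


class AreaFormatter:
    """Helper class whose behavior is reused through delegation."""

    def format(self, shape: Shape) -> str:
        return f"{type(shape).__name__} area = {shape.area():.1f}"


class Report:
    """Delegation: Report inherits nothing from AreaFormatter; it holds an
    instance of it and forwards the call."""

    def __init__(self) -> None:
        self._formatter = AreaFormatter()

    def describe(self, shape: Shape) -> str:
        return self._formatter.format(shape)
```

Here a Square can be passed wherever a Shape is expected, in line with the substitution principle, while Report reuses AreaFormatter without entering any class hierarchy.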

B. Related Work
Source code reusability has been the subject of several studies in the last decades. These touched various angles of the problem, by introducing novel metrics to capture inheritance relations [12], [13], [14], [15] and delegation [16], [17], [18], defining best design practices to exploit the benefits of reusability [19], [20], or identifying a number of source code quality issues that reusability can cause, e.g., code smells [21], [22], [50]. While the scope of our work targets inheritance and delegation mechanisms, it is worth mentioning the existence of closely related research areas such as the analysis of design patterns [51], [52] and third-party libraries [53].
Reusability and code quality. As for the themes of our exploratory study, Albalooshi and Mahmood [28] conducted an empirical analysis on implementation inheritance by considering three programming languages, namely C++, Python, and Java. As a result, the authors found that the mechanisms Java offers to define inheritance tend to degrade source code quality. Goel and Bathia [54] obtained similar results by analyzing the impact of multilevel inheritance on reusability in three C++ projects. They found a negative correlation between the use of inheritance and the quality of source code in terms of maintainability. Other research efforts targeted the effect of inheritance and delegation on various aspects of source code quality. Chhikara et al. [23] conducted a case study on one small-scale software project, reporting on the correlation between inheritance metrics and other metrics belonging to the Chidamber and Kemerer suite. Chawla and Nath [24] took a closer look at how inheritance and delegation metrics may impact software coupling, concluding that these metrics can be useful to assess code quality. Similar findings were reported by Abreu et al. [25]. Additional experiments were conducted to assess the relation between reusability and design patterns [26], [27] and code complexity [28]: all these studies converged toward the relevance of inheritance and delegation. More recently, we carried out a study to investigate the evolution of inheritance and delegation and their impact on the severity of code smells [30]. The results revealed that inheritance and delegation tend to increase over time, but not in a statistically significant manner. However, increasing the adoption of these mechanisms tends to decrease the severity of code smells.
The potential benefits of reusability have led researchers to use inheritance and delegation metrics within prediction models. In this respect, most defect prediction models include reusability as a feature [32]. Perhaps more importantly, these metrics have sometimes been shown to significantly contribute to the predictions of those models: for instance, Jureczko and Madeyski [55] showed that the Depth of Inheritance Tree metric is among the best predictors of source code defectiveness. These results were later confirmed by other software maintenance and evolution studies [56], [57].
Reusability and maintenance effort. On the empirical side, Prechelt et al. [31] carried out two experiments to investigate the relation between inheritance metrics and maintenance effort estimation. Their results revealed that keeping the inheritance depth low reduces the effort developers need to maintain source code. Similarly, Daly et al. [29] showed that as the inheritance depth level increases, so does the effort of developers to maintain code.
In terms of maintenance effort estimation, researchers have been mainly looking at process-level information (e.g., team data and measurements of the development activities), attempting to provide indications in terms of direct and indirect estimations of entire projects under maintenance [58]. Besides that, researchers have also been working on effort prediction of maintenance activities, which revolves around the prediction of the effort spent in performing specific activities such as code review [59] and bug fixing [60], [61]. The contribution provided by reusability metrics to those models is, however, unclear. Recently, Nagappan et al. [40] and Liu et al. [62] proposed the use of code churn, i.e., the number of lines of code modified within commits, as an alternative metric of maintenance effort which better aligns with the actual effort spent by developers while performing evolutionary tasks.
Our work. With respect to the papers discussed above, ours has multiple differences. In the first place, most previous work analyzed reusability by relying on the computation of metrics, e.g., DIT; as further elaborated in Section IV, we plan to operationalize reusability by means of specification inheritance, implementation inheritance, and delegation, which allows us to better map the employment of reuse mechanisms over time. In the second place, we plan to conduct a fine-grained analysis where the evolution and impact of reusability will be investigated at commit level. Furthermore, we plan to address a key limitation of most previous work proposing prediction models: the contribution of code reuse to their capabilities assumes that developers actually make use of reusability mechanisms. As such, our evolutionary study will provide more detailed insights into the potential benefits brought by inheritance and delegation to state-of-the-art prediction models.

III. RESEARCH QUESTIONS AND OBJECTIVES
The goal of the study is to investigate how the use of reusability mechanisms evolves over time and to assess their impact on fault-proneness and code churn. The purpose is to understand whether those mechanisms can provide developers with an indication of source code quality variation, considering the fault-proneness of a project and the effort to fix its faults. The quality focus is on reusability in terms of implementation inheritance, specification inheritance, and delegation and their evolution within software projects. We conduct the analysis from the perspective of both practitioners and researchers. The former are interested in understanding whether reusability mechanisms can be suitable for monitoring the quality of a system. The latter are interested in gathering more evidence about inheritance and delegation mechanisms when monitoring source code quality. The context of our investigation will be publicly available Java projects. Based on the goal of our study, we formulate three main research questions.
RQ1. How does the use of source code reusability mechanisms vary during software evolution?
The first research question aims at understanding the use of source code reusability mechanisms by developers during software evolution. More specifically, the goal of RQ1 is to provide insights on the evolution of reuse mechanisms that might later be exploited to better interpret the findings of RQ2 and RQ3. In other terms, the patterns observed in the context of this research question will also be useful to understand the effects of inheritance and delegation on defect-proneness and code churn. For example, should we identify an exponential growth in the adoption of delegation, this would potentially make this mechanism more relevant for software evolution, and hence give it more influence on the amount of code churn required to apply modifications. Since we intend to analyze three mechanisms for reusability, i.e., specification inheritance, implementation inheritance, and delegation [48], that can impact software evolution differently, we will consider three sub-research questions:
RQ1.1. How does the use of implementation inheritance vary during software evolution?
RQ1.2. How does the use of specification inheritance vary during software evolution?
RQ1.3. How does the use of delegation vary during software evolution?
Once the evolution of reusability mechanisms is analyzed, we plan to investigate how the evolution might affect code quality, measured in terms of fault-proneness.

RQ2. How do source code reusability mechanisms impact fault-proneness over time?
Finally, we plan to assess the impact of reusability mechanisms on the maintenance effort required to fix faults. Among the various direct and indirect metrics available in the literature [58], we will operationalize maintenance effort through code churn, that is, the number of lines of code modified within a commit. This is an indirect metric that can proxy the actual effort spent by developers when maintaining source code [58], [63], [64].

RQ3. How do source code reusability mechanisms impact code churn?
The above research questions will be addressed by employing statistical tests and models (see details in Section IV-C). To design and report on the empirical study to be performed, we will follow the guidelines proposed by Wohlin et al. [65] and the ACM/SIGSOFT Empirical Standards. All the experimental material (e.g., datasets, scripts) will be publicly available in an online appendix.

IV. RESEARCH METHODOLOGY
Figure 1 overviews the methodology we intend to exploit to address our research questions, as detailed in the next sections.

A. Dataset
We plan to perform an empirical analysis on Java projects provided by Defects4J, which collects information on 835 bugs provided by 17 real Java projects. According to the official documentation, each bug collected into the dataset is characterized by the following properties:
1) It is fixed in a single commit, meaning that the bug resolution never refers to more than one commit;
2) It is minimized, meaning that Defects4J maintainers manually remove commits that do not provide information about the introduction of bugs or fixing activity (e.g., commits where refactoring activities are done);
3) The fixing activities modify the source code. This means that the bug introduction can be caused by several factors, e.g., wrong parameters in configuration files and problems in the production class. However, the corresponding fixing only concerns changing the source code.
There are multiple reasons leading to the selection of this dataset. First, it enables the investigation of the impact of reuse mechanisms in a noise-free environment. Indeed, we can provide more precise insights into the actual role played by inheritance and delegation, which would not be possible through larger software repository mining studies where the existence of uncontrolled conditions, e.g., tangled changes [66], may bias the conclusions. Second, despite the defects being carefully selected, they are of different types and nature, therefore representing various defects affecting real-world software systems [67]. Last but not least, Defects4J has been widely used in the literature (e.g., [68], [69]), hence representing a valuable asset that enables us to build additional knowledge on a state-of-the-art dataset; this would also be useful for other researchers interested in building on top of our work.
As mentioned in Section II, little has been done to analyze code reuse mechanisms over time and how those may contribute to explaining fault-proneness and maintenance effort during software evolution. For this reason, we intend to fill the gap by analyzing code reuse mechanisms at a fine granularity, i.e., at the level of commits. We plan to analyze over 9,000 commits. Table I reports statistics of the projects included in the Defects4J dataset. In particular, for each project the table provides (i) the number of defects, (ii) the IDs of fixed and unfixed defects, (iii) process metrics such as the number of commits, number of pull requests, and number of contributors; and (iv) its minimum and maximum LOC.

B. Data Extraction Procedure
To answer our research questions, we need to quantify reusability in terms of implementation inheritance, specification inheritance, and delegation. To this end, we plan to use a tool already validated in our previous work [30]. In particular, the tool computes those metrics following these patterns:
Specification Inheritance. Given a class A, the tool considers the specification inheritance as the arithmetical sum of the interfaces used by A. For instance, suppose that A inherits methods from two interfaces B and C, and C in turn inherits methods from another interface D. In this case, the specification inheritance for A is 3.
Implementation Inheritance. Suppose that A is a sub-class of B; the tool considers the implementation inheritance as the arithmetical sum of the methods in B called by some method in A. For example, suppose that A is a class with N methods, and B a class with just one method called bar(). To increase the implementation inheritance by one, one of the methods in A must invoke bar().
Delegation. Given a class A, the tool considers the delegation metric as the arithmetical sum of the non-primitive variables (i.e., variables different from int, double, String, and so on) or the variables that do not have a binding type provided by external libraries (e.g., Checkbox offered by the javax.swing framework). For each variable, the tool verifies whether it is only used to invoke external objects.
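The two inheritance counts can be sketched on a toy model of a code base. This is only an illustration, not the validated tool of [30]: the dictionaries describing the class structure, the function names, and the choice to count each interface and each super-class method once are assumptions made for the example. Delegation is left out because it requires type-binding information that this toy model does not carry.

```python
def specification_inheritance(cls, implements, extends_interface):
    """Count the interfaces reachable from cls, directly or transitively.

    implements: maps a class name to the interfaces it implements.
    extends_interface: maps an interface name to the interfaces it extends.
    Assumption: each interface is counted once, even if reached twice.
    """
    seen = set()
    stack = list(implements.get(cls, []))
    while stack:
        interface = stack.pop()
        if interface in seen:
            continue
        seen.add(interface)
        stack.extend(extends_interface.get(interface, []))
    return len(seen)


def implementation_inheritance(subclass_calls, superclass_methods):
    """Count the super-class methods invoked by at least one sub-class method.

    subclass_calls: maps each method of the sub-class to the names it calls.
    superclass_methods: set of method names defined in the super-class.
    Assumption: a super-class method counts once however often it is called.
    """
    called = set()
    for callees in subclass_calls.values():
        called.update(name for name in callees if name in superclass_methods)
    return len(called)


# The example from the text: A uses interfaces B and C, and C extends D.
implements = {"A": ["B", "C"]}
extends_interface = {"C": ["D"]}
spec_a = specification_inheritance("A", implements, extends_interface)

# A has two methods; only foo() invokes bar(), the single method of B.
calls_in_a = {"foo": ["bar", "helper"], "baz": ["helper"]}
impl_a = implementation_inheritance(calls_in_a, {"bar"})
```

With these inputs, spec_a is 3 and impl_a is 1, matching the two examples given in the definitions.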
To answer research questions RQ2 and RQ3, we plan to collect information on bugs and code churn. Defects4J contains information on bugs at commit level, while we will rely on PyDriller, a Python framework that can analyze Git repositories, to extract information about commits, developers, modifications, diffs, and source code (see footnote 4).
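To make the notion of code churn concrete, the sketch below counts the lines added and deleted between two versions of a file using only the Python standard library. It is an illustration under stated assumptions: the function name code_churn is hypothetical, a modified line is counted as one deletion plus one addition, and the study itself plans to obtain this information through PyDriller rather than through this code.

```python
import difflib


def code_churn(before: str, after: str) -> int:
    """Return added + deleted lines between two versions of a file."""
    old_lines = before.splitlines()
    new_lines = after.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)

    added = deleted = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            deleted += i2 - i1  # lines that disappear from the old version
        if tag in ("replace", "insert"):
            added += j2 - j1    # lines that appear in the new version
    return added + deleted


old_version = "int a = 1;\nint b = 2;\nreturn a + b;"
new_version = "int a = 1;\nint b = 3;\nreturn a + b;\n// done"
churn = code_churn(old_version, new_version)
```

In this example one line is modified and one is appended, so the churn is 3 (one deletion and two additions). The churn of a commit would then be the sum over all files it touches.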

C. Experimental Plan
For the sake of comprehensibility, we present the empirical analysis we plan to perform for each research question.
1) RQ1. Analysis of the evolution of reusability mechanisms over time.
To answer this research question, we will analyze the behavior of the reusability metrics (implementation inheritance, specification inheritance, and delegation) during the evolution of the projects. In particular, we plan to employ basic statistical analysis and visualize the results using plots.
2) RQ2. Analysis of the impact on fault-proneness of reusability mechanisms over time. Moving on to RQ2, we plan to build a statistical model to verify how reusability metrics impact the variability of bugs in the source code.
4 https://pydriller.readthedocs.io/en/latest/intro.html
3) RQ3. Analysis of the impact on maintenance effort of reusability mechanisms over time. Also for RQ3, we plan to build a statistical model to verify how reusability metrics impact the maintenance effort to fix a bug.
The statistical models will be devised as follows.
Independent Variables. According to our previous considerations, we will use as independent variables the reusability metrics, i.e., implementation inheritance, specification inheritance, and delegation.
Response Variable. The number of bugs represents our response variable. However, since our goal is to understand its variability, we plan to analyze whether the number of bugs between two consecutive commits is stable, increases, or decreases. In particular, it will be considered "stable" if we do not identify any change in the number of bugs between commit I and commit I+1. It will be considered as "increase" ("decrease") if the difference between the number of bugs at commit I+1 and the number of bugs at commit I is positive (negative).
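The three-level response described above can be derived from a sequence of per-commit bug counts as in the following sketch. The function name and the sample counts are hypothetical and serve only to illustrate the labeling rule.

```python
def label_bug_variation(bug_counts):
    """Label each pair of consecutive commits by the change in bug count.

    bug_counts[i] is the number of bugs observed at commit i. The result has
    one label per transition, i.e., len(bug_counts) - 1 entries.
    """
    labels = []
    for previous, current in zip(bug_counts, bug_counts[1:]):
        delta = current - previous  # bugs at commit I+1 minus bugs at commit I
        if delta > 0:
            labels.append("increase")
        elif delta < 0:
            labels.append("decrease")
        else:
            labels.append("stable")
    return labels


# Hypothetical bug counts over four consecutive commits.
variation = label_bug_variation([3, 3, 5, 4])
```

For the sample sequence, variation is ["stable", "increase", "decrease"].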
Control Variables. Conscious that bug variation could depend on other external factors, we will consider two sets of metrics as control variables. On the one hand, we will consider the following Chidamber and Kemerer (CK) metrics [12]: DIT (Depth of Inheritance Tree), NOC (Number Of Children), LOC (Lines of Code), LCOM (Lack of Cohesion of Methods), WMC (Weighted Methods per Class), RFC (Response for a Class), and CBO (Coupling Between Objects). On the other hand, we will also take into account code churn. It is important to note that although NOC and DIT are also metrics related to code reuse, we will include them with the intent of comparing their statistical power to that of the adoption mechanisms estimated by our reusability metrics. However, we plan to assess the presence of possible multi-collinearity when performing the statistical modeling due to the presence of those related metrics. We will rely on previous guidelines provided by the literature [70], [71].
Choosing Statistical Model. To address RQ2, we will use a Multinomial Log-Linear Model [72]. This model generalizes logistic regression to multi-class problems, so it fits our three-level response variable. In particular, as already done in our previous work [30], we plan to use R for running the analysis, using the function multinom available in the package nnet (see footnote 5), which fits the model via neural networks. Finally, as for RQ3, given the nature of the response variable, i.e., code churn, we will use a different statistical model, i.e., a Generalized Linear Model [73], using the glm function.
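To clarify how a multinomial model turns metric values into class probabilities, the sketch below evaluates one for a single commit. It is not the planned R analysis: the coefficients are made-up numbers rather than fitted values, the choice of "stable" as the reference class is an assumption, and the function name is hypothetical.

```python
import math


def multinomial_probabilities(x, coefficients, reference="stable"):
    """Class probabilities of a multinomial log-linear model.

    x: predictor values, e.g., [impl. inheritance, spec. inheritance, delegation].
    coefficients: maps each non-reference class to (intercept, weights).
    The reference class has a linear score fixed at zero.
    """
    scores = {reference: 0.0}
    for label, (intercept, weights) in coefficients.items():
        scores[label] = intercept + sum(w * v for w, v in zip(weights, x))

    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores.values())
    exponentials = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exponentials.values())
    return {label: value / total for label, value in exponentials.items()}


# Made-up coefficients, for illustration only.
toy_coefficients = {
    "increase": (0.0, [1.0, 0.0, 0.0]),
    "decrease": (0.0, [0.0, 0.0, 0.0]),
}
probabilities = multinomial_probabilities([2.0, 0.0, 0.0], toy_coefficients)
```

With these toy values, a higher implementation inheritance raises the probability of "increase" relative to "stable", which is how a positive fitted coefficient would be read.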
Additional analysis. The influence of the reuse metrics on introducing defects might not necessarily be directly measurable. For instance, previous work [74], [75], [76] reported that inheritance might negatively impact program comprehension, which, in turn, can negatively contribute to the defect-proneness of source code. In other words, the role of reuse metrics can be more subtle, representing a co-occurring phenomenon rather than being directly responsible for introducing defects. For this reason, besides interpreting the results provided by the statistical models, we will also (1) compute the number of cases in which defect-inducing commits involved the variation of inheritance and delegation metrics and (2) manually analyze those cases to better understand the way these metrics can directly impact the introduction of defects. Such an additional analysis will therefore provide more qualitative insights into how reuse mechanisms can impact defect-proneness.

D. Publication of generated data
All the material of our study, e.g., scripts, will be publicly available in an online repository (e.g., GitHub) to guarantee the replicability of our work and possible reuse for future investigations by other researchers.

V. THREATS TO VALIDITY
In this section, we discuss possible threats to validity and the strategies we will adopt to mitigate them.
Construct Validity. These threats refer to a possible mismatch between the theory and the observation. Therefore, the selection of the dataset represents a crucial point. We plan to use Defects4J, which has already been widely used by the research community in several studies (e.g., [77], [78], [79]) and which will reduce possible bias due to the presence of uncontrolled conditions, e.g., tangled changes [66], allowing us to investigate the impact of reuse mechanisms on defect-proneness and maintenance effort more precisely. A second threat to validity relates to the selection of the metric used to operationalize maintenance effort. We plan to use code churn [63]: we are aware that this metric can only proxy the actual effort spent when maintaining source code, yet this choice is required in our case because of the unavailability of precise data regarding the maintenance effort in our dataset. The tools we plan to use to extract metrics, e.g., reusability or CK metrics, represent another potential threat to validity. We will use tools already validated and used by the research community [30], [80]. Finally, as mentioned in Section IV-A, in Defects4J a single bug can be introduced by multiple factors, but its resolution will always occur within a Java file. Thus, to avoid possible threats to construct validity, we will discard commits that introduced bugs caused by issues not involving source code. This allows focusing only on defects introduced and resolved through changes to the source files.
5 https://cran.r-project.org/web/packages/nnet/nnet.pdf
Internal Validity. These threats refer to factors that can impact the study results. In our context, the threat concerns the metrics we intend to exploit to build the statistical models. Besides the independent variables of interest, we will use control variables previously shown to be significant for source code quality [81], [82], [23], [29], thus strengthening the reliability of our results.

Conclusion Validity. Threats related to this area refer to the selection and the use of the statistical tests. In particular, for addressing RQ2 we will use the Multinomial Log-Linear Model [72]. As for RQ3, we will apply the Generalized Linear Model [73]. These choices come from the nature of our response variables, i.e., multi-class and continuous, respectively. Moreover, the research community used this type of model in similar contexts [30], [83], [84].
External Validity. Threats in this category concern the generalizability of the results. Our work employs statistical analysis to seek relations between the employment of reuse mechanisms and source code maintainability, operationalized with defect-proneness and code churn metrics. The target of the work will be composed of 17 Java projects with over 9,000 commits coming from the Defects4J dataset. As such, our work is based on the analyses conducted on a sample, hence our generalization strategy can be identified within the sample-based generalization strategies proposed by Wieringa and Daneva [85]. In particular, among those strategies, "statistical learning" seems to be the most appropriate. Wieringa and Daneva [85] reported that the "descriptions of statistical sample phenomena can be used to predict similar phenomena in new samples. [...]. The goal is not to generalize to a population, but to generalize to the next few cases". This strategy is basically in line with the generalizing by similarity principle described by Ghaisas et al. [86]. When contextualizing those strategies in our case, it is likely that similar results might be obtained in projects having similar characteristics with respect to those analyzed in our work (see Table I). Therefore, we cannot claim the generalizability of our findings to projects having different properties or written in different programming languages. Replications in these contexts would still be desirable.

VI. CONCLUSION
This research will focus on understanding how inheritance and delegation mechanisms evolve over time and their impact on code quality, e.g., variability of bugs, at the commit level. In particular, we will conduct this study on over 9,000 commits provided by 17 Java projects retrieved from Defects4J. We first plan to analyze the evolution of reusability metrics at the commit level. Then we will construct two different statistical models for assessing whether reusability metrics, combined with additional factors, impact the variability of bugs and the effort to fix bugs in terms of code churn.

Fig. 1: Overview of the methodology applied to address our research questions.

TABLE I: Characteristics of the projects considered in the study. Information concerning the 'Active Bug IDs' and 'LOC' is provided as a range reporting the minimum and maximum values observed over the history of the projects.