Study Design
The goal of the study is to qualitatively investigate quick remedy commits. The purpose is to define a taxonomy of quick remedy commits that developers perform to fix issues introduced in a previous commit and/or finalize an uncompleted implementation task. The study addresses the following research question (RQ):
RQ1: What types of quick remedy commits are made by developers in Java projects?
This RQ aims at identifying the types of quick remedy commits that are performed by developers (e.g., documenting through a code comment a piece of code introduced in the previous commit). Knowing the types of quick remedy commits made by developers can guide the development of tools to automatically alert developers when code changes they are committing may require a subsequent remedy commit. In some cases this could even avoid the introduction of bugs (e.g., due to changes not propagated in all code areas where they are required).
Data Collection and Analysis
To answer RQ1 we mined the complete change history of 1,497 open source Java projects hosted on GitHub. These projects represent the context of our study and have been selected from GitHub in November 2018 using the following constraints:
- Programming language. We only considered projects written in Java since all the manual evaluators involved in the study (i.e., three of the four authors) have experience in Java and would be able to understand the reasons behind the quick remedy commits in most of the cases.
- Change history. Since we were interested in identifying a good number of quick remedy commits to manually analyze, we only selected projects having a relatively long change history, composed of at least 500 commits.
- Popularity. The number of stars (About stars (GitHub) 2021) of a repository is a proxy for its popularity on GitHub. Starring a repository allows GitHub users to express their appreciation for the project. Projects with fewer than ten stars were excluded from the dataset to avoid the inclusion of likely irrelevant/toy projects.
A total of 6,563 projects satisfied these constraints. Then, we sorted the projects in descending order based on their number of stars (i.e., the most popular on top), and we manually inspected the ranked list (starting from the top) to filter out repositories that do not represent real software systems (e.g., java-design-patterns 2021 and spring-petclinic 2021). Such a selection was done by checking the projects’ names and descriptions (no code analysis was performed). We also checked for projects with shared history (i.e., forked projects). In particular, we considered two repositories to be forks if their histories shared at least one commit with the same SHA and commit date. When we identified a set of forked projects, we only selected among them the one with the longest commit history (e.g., both FindBugs 2021 and its successor SpotBugs 2021 fall under our search criteria, but we only kept the latter). Such a process stopped once we reached 1,500 valid projects for our study.
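To make the shared-history check concrete, the following is a minimal Python sketch of the fork-detection heuristic, assuming commit metadata has already been collected per project; all names and structures are illustrative and do not represent the authors' actual tooling.

```python
from itertools import combinations

def shares_history(commits_a, commits_b):
    """Two projects share history if at least one commit appears in both
    with the same SHA and commit date."""
    return bool(set(commits_a) & set(commits_b))

def drop_forks(projects):
    """projects: dict mapping project name -> list of (sha, commit_date) tuples.
    Among projects with shared history, keep only the one with the longest history."""
    discarded = set()
    for (name_a, commits_a), (name_b, commits_b) in combinations(projects.items(), 2):
        if name_a in discarded or name_b in discarded:
            continue
        if shares_history(commits_a, commits_b):
            # Keep the project with more commits (e.g., SpotBugs over FindBugs).
            discarded.add(name_a if len(commits_a) < len(commits_b) else name_b)
    return {name: c for name, c in projects.items() if name not in discarded}
```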
During the cloning of the 1,500 GitHub repositories, we got a cloning error for three of them. Thus, we extracted the list of commits performed over the change history of the remaining 1,497 projects. Table 1 reports descriptive statistics for size, change history, and popularity of the selected projects. The complete list of considered projects is publicly available in our replication package (2021).
Table 1 Dataset statistics

To extract the history of the subject systems, we iterated through the commit history related to all branches of each project with the --- command. This allowed us to analyze all branches of a project, without intermixing their histories and avoiding unwanted effects of merge commits.
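A minimal sketch of this extraction step, assuming GitPython as the mining library (the actual command and tooling used for the study are not specified here), iterating the history of each branch separately while skipping merge commits:

```python
from git import Repo

def extract_history(repo_path):
    """Per-branch commit history, excluding merge commits (illustrative sketch)."""
    repo = Repo(repo_path)
    history = {}
    # Local branches only; remote branches could be added via repo.remotes.
    for branch in repo.branches:
        history[branch.name] = [
            (c.hexsha, c.author.email, c.authored_datetime, c.committed_datetime)
            for c in repo.iter_commits(branch.name, no_merges=True)
        ]
    return history
```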
Then, given the commit history, our goal was to identify all pairs of subsequent commits (ci,ci+ 1) in which ci+ 1 had been performed by a developer Dj as a quick remedy fix for a commit ci also authored by Dj. In other words, ci+ 1 must (i) have been authored by the same developer as ci and performed within a relatively short time interval from ci; and (ii) clearly be a “compensatory” fix for ci.
To identify the (ci,ci+ 1) pairs of interest, we adopted the following heuristic-based procedure. First, we computed the time interval between all adjacent (subsequent) commits in each system authored by the same developer. In git it is possible to retrieve the author date (i.e., the date on which the change was implemented by the author) or the committer date (i.e., the date on which the change was committed). Given the goal of our work, we considered the author date. We analyzed the distribution of these time intervals (see Fig. 2).
We considered the first quartile, exactly five minutes, as a candidate threshold to identify remedy commits: ci+ 1 commits performed as quick fixes for their predecessor commit ci. This allowed us to select pairs of commits meeting our first requirement: they were authored by the same developer and performed in rapid succession (i.e., within five minutes). This filtering left us with 1,041,397 candidate commits.
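The threshold selection can be sketched as follows: a simplified Python illustration, assuming commits are already grouped per project and sorted by author date; function and field names are hypothetical.

```python
from statistics import quantiles

def author_date_intervals(commits):
    """commits: chronologically sorted list of (author_email, author_datetime).
    Returns intervals (in minutes) between adjacent commits by the same author."""
    intervals = []
    for (prev_author, prev_date), (author, date) in zip(commits, commits[1:]):
        if author == prev_author:
            intervals.append((date - prev_date).total_seconds() / 60.0)
    return intervals

def first_quartile(intervals):
    """Candidate threshold: the first quartile of the interval distribution
    (about five minutes in the study)."""
    return quantiles(intervals, n=4)[0]

def candidate_remedy_pairs(commits, threshold_minutes):
    """Pairs (ci, ci+1) authored by the same developer within the threshold."""
    pairs = []
    for prev, curr in zip(commits, commits[1:]):
        minutes = (curr[1] - prev[1]).total_seconds() / 60.0
        if prev[0] == curr[0] and minutes <= threshold_minutes:
            pairs.append((prev, curr))
    return pairs
```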
Second, we set up a process to define lexical patterns allowing the identification of ci+ 1 commits in which the developer explicitly indicates in the commit notes that ci+ 1 is a remedy commit for changes introduced in the previous commit (ci). The first author extracted the words and 2-grams used in the commit notes of all 1,041,397 commits output by the previous filtering step. This means that, from a commit note reporting “Fixes a bug introduced in previous commit”, we would extract fixes, a, bug, etc. as the single words, and fixes a, a bug, bug introduced, etc. as 2-grams. To remove noise, stop words (e.g., articles) and all single words shorter than four characters were excluded from the set of single words (not from the 2-grams list). The remaining words and all 2-grams were then sorted by frequency in descending order, excluding the long tail of those appearing in fewer than ten commits. Indeed, even if useful to identify remedy commits, lexical patterns defined from these words/2-grams are unlikely to retrieve a substantial amount of useful commits and, thus, were excluded a priori to reduce the inspection effort. For each remaining word/2-gram, we randomly extracted ten commit notes in which it appears.
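A simplified sketch of this vocabulary extraction step (the tokenizer and stop-word list below are illustrative assumptions, not the exact setup used by the authors):

```python
import re
from collections import Counter

STOP_WORDS = {"a", "an", "the", "and", "or", "of", "in", "to", "for"}  # illustrative

def tokenize(note):
    return re.findall(r"[a-z]+", note.lower())

def vocabulary(commit_notes, min_commits=10):
    """Counts, per commit, the single words (no stop words, length >= 4) and all
    2-grams in the note; drops terms appearing in fewer than min_commits commits."""
    words, bigrams = Counter(), Counter()
    for note in commit_notes:
        toks = tokenize(note)
        words.update({t for t in toks if t not in STOP_WORDS and len(t) >= 4})
        bigrams.update(set(zip(toks, toks[1:])))  # stop words are kept in 2-grams
    prune = lambda c: Counter({k: v for k, v in c.items() if v >= min_commits})
    return prune(words).most_common(), prune(bigrams).most_common()
```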
This dataset, composed of words/2-grams and related commit notes, was manually and independently inspected by three of the authors with the goal of defining the needed lexical patterns. After an open discussion in which each author presented his list of patterns, the three evaluators agreed on the following lexical pattern to identify remedy commits:
(former or last or prev or previous) and commit
This means that commit notes including former commit, last commit, prev commit, or previous commit would be matched and considered as relevant for our study. While this heuristic is quite strict, our goal was to maximize precision at the expense of recall, considering that our study is qualitative in nature and does not target a large number of manually analyzed commits. At the end of this last filtering step, we obtained 1,577 ci+ 1 commits which (i) were authored within five minutes of the commit ci previously performed by the same author; and (ii) explicitly mention in the commit note a lexical reference to the previous commit that can be captured by the defined pattern. Given the high cost of the manual analysis process detailed in the following, we decided to focus our analysis on a randomly selected sample of 500 commits, representing a statistically significant sample with a 99% confidence level and a ±4.8% confidence interval.
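One straightforward way to encode this pattern is a case-insensitive regular expression over the commit note; the sketch below is our own illustration and not necessarily the implementation used by the authors.

```python
import re

# Matches "former/last/prev/previous" immediately followed by "commit" (plural allowed).
REMEDY_PATTERN = re.compile(r"\b(former|last|previous|prev)\s+commits?\b", re.IGNORECASE)

def mentions_previous_commit(commit_note):
    return REMEDY_PATTERN.search(commit_note) is not None

# Examples:
assert mentions_previous_commit("Fix tests broken by former commits")
assert not mentions_previous_commit("Refactor the parser")
```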
The 500 commits were randomly distributed among three authors, making sure that each commit was classified by two authors. The goal of the process was to identify the exact reason behind the changes performed in the commit. If the commit was unrelated to the previous one, the evaluator classified it as a false positive.
Otherwise, a tag explaining the reason for the change (e.g., remove debugging code from the previous commit) was assigned.
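The assignment step can be sketched as follows: a hypothetical Python illustration in which each commit is randomly assigned to one of the three possible pairs of evaluators, so that every commit receives two independent tags.

```python
import random
from itertools import combinations

def assign_commits(commit_ids, taggers=("author1", "author2", "author3"), seed=2018):
    """Randomly assign each commit to a pair of taggers (two labels per commit)."""
    random.seed(seed)
    evaluator_pairs = list(combinations(taggers, 2))  # three possible pairs
    return {commit: random.choice(evaluator_pairs) for commit in commit_ids}
```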
We did not limit our analysis to the reading of the commit message, but also analyzed the source code diff of the changes implemented in the GitHub commits, both in the ci+ 1 commit and in its predecessor (ci). The tagging process was supported by a Web application that we developed to classify the commits and to solve conflicts between the authors. The Web application is shown in Fig. 3. Each author independently tagged the commits assigned to him by defining a tag describing the reason behind the commit. Every time the authors had to tag a commit, the Web application also showed the list of tags created so far, allowing the tagger to select one of the already defined tags (visible in the bottom part of Fig. 3). Although, in principle, this is against the notion of open coding, in a context like the one encountered in this work, where the number of possible tags (i.e., causes behind the commits) is extremely high, such a choice helps in using consistent naming and does not introduce substantial bias. In cases for which there was no agreement between the two evaluators (44% of the classified commits), the commit was assigned to an additional evaluator to solve the conflict. While such a percentage may look high, it is worth considering that our task was not to assign commits to a list of predefined categories, but to define the names for such categories during the tagging process. This naturally leads to a higher number of conflicts. Also, we considered as conflicts cases in which different but “semantically equivalent” tags were used by the two evaluators (e.g., remove unnecessary code vs remove unneeded code). In such cases, the third evaluator just made sure that consistent wording was used and selected the proper tag. In a minority of cases, the two evaluators applied completely different tags and the third evaluator could choose whether to reuse one of the two labels or, instead, define a new tag by discussing and agreeing with the two original evaluators.
After having manually tagged all commits, we defined a taxonomy of quick remedy commits through an open discussion involving all the authors (see Fig. 4). We qualitatively answer our research question by discussing specific categories of commits likely related to the code changes developers often forget to implement and try to immediately remedy. For each category, we present interesting examples and discuss implications for researchers and practitioners.
Results
We addressed our research question by labeling 500 commits identified as candidate quick remedy commits (see Section 2.1). We identified 42 false positives (i.e., commits ci+ 1 that were not related to the preceding ci commit) and 458 commits actually classifiable as quick remedies. Note that not all these quick remedy commits are compensatory fixes for issues caused by omitted changes. They also include fixes for previously introduced errors (e.g., the developer realizes that her previous commit introduced a bug) as well as commits aimed at simply improving the previously committed change (e.g., improving the name of a newly introduced variable). Finally, our taxonomy also features remedy commits aimed at fixing simple mistakes made during the ci commit process itself (e.g., the developer forgot to include a modified file in commit ci and thus commits it in ci+ 1).
Overall, we identified 69 types of quick remedy commits made by developers, 20 of which are related to changes omitted in the previous commit.
Figure 4 presents the results in the form of a hierarchical taxonomy composed of six root categories: Bug Fix, Code Refactoring/Clean Up, Build Issue, Missing Code Change, Documentation, and Reverted Commit. The more specific types of quick remedy commits are represented either as intermediate nodes or leaves, and commits relevant to the fixing of issues caused by omitted changes are marked with a dedicated sign. For each category, we next describe representative examples and discuss implications for researchers and/or practitioners derived from our findings.
Bug Fix (79)
This category groups pairs of commits (ci, ci+ 1) in which the remedy commit (i.e., ci+ 1) fixes a bug introduced in ci. We identified two main subcategories: Fix Broken Test, in which ci+ 1 has been triggered by test cases failing after the change implemented in ci, and Fix Implementation Logic, in which the developer realizes that she introduced a bug in ci and quickly submits a patch.
The commits in the Fix Broken Test category target the fixing of the production code or the test code modified in ci that caused a break in the test suite. For example, in the denominator project, a developer reported in the commit message: “Fix tests broken by former commits” (Commit to denominator project on GitHub 2021).
While in the cases we analyzed the issue was spotted and fixed quickly by the developer, there might be non-trivial cases in which only a subset of the test suite is executed for regression testing (e.g., due to a limited testing budget) and a non-executed broken test is not identified by the developer.
For researchers, this is an opportunity to study test-breaking changes and to develop techniques able to alert the developer when a change she implemented might require a double check of (part of) the test suite.
For practitioners, continuous integration practices can help in timely spotting these issues in most of the cases.
The fixes to the implementation logic are mostly classic bugs introduced but quickly recognized and fixed by developers (e.g., errors in conditions, wrong literal values, null pointer exceptions, etc.). While these are not related to omitted changes, they are interesting since they represent bugs fixed by developers within five minutes (due to our selection criteria for the commits).
This indicates that these bugs, while prevalent in our taxonomy (73 instances), are likely quite simple to fix. Thus,
researchers could investigate the possibility of creating approaches able to learn from this data how to avoid and/or automatically fix these bugs. For example, recent work applied Neural Machine Translation (NMT) models to automatically fix bugs (Tufano et al. 2018). However, given the complexity of this task and the non-trivial bugs that these models have to fix, they are usually only able to automatically fix a minority of the bugs provided as input (Tufano et al. 2018). Focusing on these simpler but quite frequent bugs could represent a good application scenario for NMT-based bug fixing approaches.
Some of the fixes in the Fix Implementation Logic category are related to omitted changes (see Fig. 4). This includes the Forgot to Propagate Code Change category, in which developers do not consistently propagate a change across all relevant code components. This is typical of cases in which code clones are spread in the system and inconsistent changes are implemented in ci (Krinke 2007). An example of this can be seen in the TomP2P project. In a commit (Commit to TomP2P project on GitHub 2021b), the developers adapt a builder class to earlier changes of the original class and implement new methods. In a follow-up change (Commit to TomP2P project on GitHub 2021c), they fix a conditional statement to check the status of an object in a new branch. Then, only a few seconds later (Commit to tomp2p project on GitHub 2021a), they update a conditional check with a similar structure but in another class. For this last commit, the commit message says “belongs to previous commit”. Another example can be seen in the spacewalk project. In a commit (Commit to spacewalk project on GitHub 2021a), the developers update a SQL script by adding a query for the removal of unnecessary data. Then, in the quick subsequent commit (Commit to spacewalk project on GitHub 2021b), they propagate the same schema changes into a database upgrade file.
These examples highlight the relevance for practitioners of approaches to guide code changes (see e.g., the seminal work in the area by Zimmermann et al. (2005)) as well as the need for
the research community to continue improving these techniques and, possibly, making them easily pluggable into a continuous integration pipeline to foster developers’ adoption.
Also interesting in this category is the introduction of ambiguous references due to an incomplete move package refactoring. We found this case in the Accumulo project, where the developers migrate some classes to another package (Commit to Accumulo project on GitHub 2021) but still keep the old ones.
In a follow-up commit (Commit to accumulo project on GitHub 2021), they realize that they were, however, using the wrong references to the migrated classes.
Code clone detection techniques (Roy et al. 2009) could help in these cases by promptly pointing the developer to the presence of multiple copies of the same classes in the repository. The integration of these approaches in a just-in-time fashion could help in identifying clones introduced in the last commit, thus avoiding mistakes such as the one in the discussed commit (Commit to Accumulo project on GitHub 2021).
Code Refactoring/Clean up (39)
This category groups the pairs of commits (ci, ci+ 1) in which the remedy commit (i.e., ci+ 1) implements a refactoring/cleanup of the code changed in ci (see Fig. 4). In these commits developers are either not satisfied with the code they implemented or are trying to address warnings received from static analyzers.
Some other subcategories include the simple removal of code that was only temporarily implemented in ci (i.e., Remove Debugging Code) or that becomes unnecessary after ci’s changes (i.e., Remove Unnecessary Code). Also, code formatting issues (mainly inconsistencies in indentation and line breaks introduced with the code changes) were fixed by developers in the remedy commit (i.e., Code Formatting). Additionally, in two cases, developers changed the code implemented in ci to improve its performance. An example can be seen in the lombok project (Commit to lombok project on GitHub 2021), where a developer fine-tunes a cache clearing mechanism implemented in a previous commit by making a variable volatile and moving the invocation of the cache clearing after a conditional check.
However, the main purpose of these code refactoring/clean up tasks is to improve code understandability. Variable and method rename refactoring (i.e., renaming a variable or method to better reflect its functionality) is the most common way to make the code easier to comprehend. Also popular are code transformations aimed at replacing literal values with variables or splitting long functions through extract method refactoring. The latter fosters not only comprehensibility, but also the reusability of small code snippets.
Other interesting cases are the ones in which developers modify the previously committed code to promote consistency with the coding style of the project (see e.g., Rename Method for Consistency). For example, in the liferay-portal project, developers opened an issue to “introduce tests to document current behavior” (Liferay Portal Issue LPS-44476 2021). Interestingly, in this process they very carefully review the method names used for better readability, and in a commit (Commit to liferay-portal project on GitHub 2021) they say:
[...] where specific method names are NOT accurate, go for a generic name to force the developer to read the code to find what the method actually does.
The developers decided to rename a method accordingly. In the next commit (Commit to liferay-portal project on GitHub 2021), to remain consistent, they replace the invocation of the renamed method in another class. For this last commit, the commit message says “Match previous commit even though this method name was accurate”.
The inconsistencies fixed with simple refactorings point to the possibility for the software engineering research community to investigate techniques able to learn coding conventions used in a given system and recommend fixes for possible violations. To the best of our knowledge, the only attempt to date has been made by Allamanis et al. (2014) with their NATURALIZE tool, able to recommend meaningful identifier names and formatting guidelines. Other approaches focus only on rename refactoring suggestions (Lin et al. 2017, 2017). While these techniques cover most of the inconsistencies fixed in the Code Refactoring/Clean up category (e.g., Rename Method for Consistency, Fix Improper Exception Name), others are left uncovered (e.g., Fields Ordering), indicating potential for additional research in the area of recommending coding convention fixes.
Build Issue (68)
This category is related to commits fixing build issues introduced as a result of the ci changes. The main subcategory here is the fixing of compilation errors/warnings issued by the compiler due to the changes in ci (i.e., Fix Compilation Warning/Error). Unused import statements are the main cause of the warnings we identified (see Fig. 4), and the trigger for the remedy commits in this category. The unnecessary import statements are caused either by statements introduced in ci by the developer and then left unused, or by previously existing import statements becoming unused due to the changes implemented in ci. These warnings are usually raised by static analysis checks performed at commit time and, thus, are easy to catch for developers.
In the Syntax Error category we found many cases of broken references due to rename refactoring operations performed in ci. These rename refactorings are related to variables, methods, classes, as well as packages. An example can be seen in a commit of the tower project (Commit to tower project on GitHub 2021), which followed the renaming of multiple classes. Some other cases violated the syntax of the programming language due to introduced typos (e.g., missing statement separators).
Considering the good refactoring support provided by modern IDEs, the identification of these broken references as a consequence of refactorings was quite surprising for us.
This may indicate either that these refactorings were performed manually, leading to the introduction of broken references, or that bugs might affect refactoring engines, as already found by previous work in the literature (Daniel et al. 2007). Additional investigation focused on these specific types of errors is needed to understand the reasons behind them.
Other subcategories that also caused a build issue include the fixing of errors introduced in configuration files (i.e., Fix Error in Configuration File) or in a build script (i.e., Fix Build Issue in Build Script). For example, in some remedy commits developers fixed broken tags in configuration files or incorrect filepath references in build scripts.
Missing Code Change (165)
This category groups the pairs of commits (ci, ci+ 1) in which the remedy commit (i.e., ci+ 1) adds some missing code changes that should have been introduced with the previous commit ci. We divided these commits into two subcategories: Commit Added/Deleted Files Missed in Previous Commit and Finalizing Code Change.
The first subcategory is related to fixing a previous commit error. In this case, we are not referring to the code changes implemented in ci, but to the commit process itself. This issue is mainly caused by an incorrect selection of committed files by the developer. Also, sometimes IDE cache issues can lead to a similar situation (e.g., the IDE cached the wrong version of a committed file or lost track of some code changes during the git commit process). While this subcategory is somewhat unrelated to changes of the artifacts themselves, it still provides hints for interesting research directions.
For example, approaches to automatically identify the set of files to commit can be designed to reduce the possibility of missing files or including unrelated changes. This could go further and also recommend to the developer when to commit, so as to avoid tangled commits (Herzig and Zeller 2013) and to commit cohesive sets of code changes. To the best of our knowledge, the only step in this direction has been done by Bradley et al. (2018) with a context-aware developer assistant able to identify the files to push towards the repository when the developer asks. However, more automation can be envisioned, with approaches also able to (i) recommend when to commit (as previously said, to e.g., avoid tangled commits), and (ii) summarize the changes in a meaningful commit message (as attempted by Jiang et al. 2017).
The second subcategory (i.e., Finalizing Code Change) refers to code changes forgotten or left incomplete for other reasons in commit ci that are then finalized in ci+ 1. This includes cases in which developers add new test cases needed to test the production code introduced in the previous commit, or complete an implementation task. For example, in a commit of the openpnp project (Commit to openpnp project on GitHub 2021), the developer claimed in the commit message that three new sub-features were introduced. However, the developer forgot to actually implement one of those sub-features and added the missing implementation in the following commit. In another case, from the geoserver project (Commit to geoserver project on GitHub1 2021), the developer introduced a guard clause in commit ci to check if a processed reference is null. Meanwhile, a debugging message was also added saying that “the reference is null, reset it to default value”. However, the actual implementation for resetting this reference value was missing in commit ci, and was implemented in the remedy commit ci+ 1.
While these issues are of different natures, some of them can be spotted automatically through techniques comparing what is described in the commit message and what has been actually implemented in the change. For example, in the previously discussed example (Commit to openpnp project on GitHub 2021), a misalignment between the number of sub-features actually implemented and claimed in the commit message could be spotted and reported to the developer.
Reverted Commit (58)
This category groups remedy commits ci+ 1 in which the developers revert the code changes they committed in the previous commit ci. The reasons pushing a developer to revert previous changes through a remedy commit include: (i) introduced bugs spotted after pushing the changes in ci; (ii) unintended changes, pushed in ci by mistake; (iii) failing test cases, possibly indicating a bug worth investigating before applying ci’s changes. In all these cases, developers prefer to quickly bring the code back to its previous state to double check the implemented changes and understand the causes of the (possibly) introduced issues.
In many cases we were not able to understand the reasons behind the reverted changes by manually inspecting the subject commits. These cases are just grouped in the root category Reverted Commit. Also, we observed that sometimes the code changes were reverted back and forth within a few subsequent commits.
Our study is not the first one investigating reverted commits in software repositories. Shimagaki et al. (2016) conducted a study to gain a better understanding of why commits are reverted in large software systems. They found that 1%-5% of the commits in the systems they studied are reverted and that this number could be reduced by improving team communication and developers’ awareness. However, in some cases, commits are reverted due to external factors (e.g., requirement changes by end-users, customers, or remote teams) and, in such cases, they are difficult to avoid. Yan et al. (2019) proposed a model to automatically identify commits that will be reverted in the future. They also found that the developer who performs the change is the most important predictive feature among the three dimensions they studied (i.e., code change, developer, commit message).
Besides the recommendations to developers already provided by Shimagaki et al. (2016),
the presence of reverted commits in the history of software systems is also relevant for the mining software repositories (MSR) research community. For example, it could be debated whether studies analyzing the change-proneness of code components (i.e., how frequently code components are subject to changes in software repositories) — e.g., Bieman et al. (2003), Catolino and Ferrucci (2019), and Aniche et al. (2018) — should exclude commits that are quickly reverted or, as currently done, consider them. The same applies to works using the history of changes implemented by developers as a proxy for the developers’ experience — e.g., Rahman et al. (2017) and Tufano et al. (2017). In Section 3 we present an empirical study aimed at assessing the impact of considering (or not) reverted commits for typical MSR data collection tasks.
Documentation (49)
Our last category groups remedy commits related to software documentation. These commits impact a number of documentation artifacts that represent the main subcategories (see Fig. 4), namely: release notes, licensing statements, code comments, commit messages, and readme files.
The errors fixed in release notes, licenses, and readme files are mostly minor. For example, some commits just update the copyright year in a previously committed file. Fixes of commit messages rarely happen and mostly consist of adding a missing commit message for the code changes implemented in the previous commit.
These cases are also interesting for the MSR community. For example, approaches using pairs 〈code changes implemented in a commit cx, commit message of cx〉 to train models able to learn how to generate commit notes (Jiang et al. 2017) could be negatively biased by commit messages in a commit ci+ 1 referring to changes implemented in ci.
Other remedy commits are related to code comments. In some cases, developers documented the rationale for a code change implemented in the previous commit. This is the case of a commit (Commit to jitsi project on GitHub 2021a) performed in the jitsi project. In a commit (Commit to jitsi project on GitHub 2021b) they fix a bug due to the wrong generation of a message, in which they mistakenly set the value of a parameter to an empty string instead of the intended value.
In the next commit (Commit to jitsi project on GitHub 2021a) they add a comment to explain the otherwise non-trivial difference in the generated message.
Also interesting is the missed removal of Self Admitted Technical Debt (SATD) instances (Potdar and Shihab 2014), meaning technical debt documented by developers in the code with comments such as \(\mathtt {TODO: \dots }\), \(\mathtt {TOFIX: \dots }\), etc. We found cases in which developers paid back the technical debt instance, but forgot to remove the comment documenting the SATD. This results in a code-comment inconsistency (Wen et al. 2019) that could possibly confuse developers comprehending the associated code components. One representative example of this scenario is a commit (Commit to tinkerpop project on GitHub 2021a) performed in the tinkerpop project, where the developers “Forgot to remove todo from previous commit”, as their commit message says. Indeed, in the remedy commit they remove a single-line comment which says “todo: need a test to enforce this condition”, and in the immediately preceding commit (Commit to tinkerpop project on GitHub 2021b) they had implemented the missing test case, thus paying back the technical debt.
The cases discussed above for the Documentation category provide us with some interesting lessons learned. First, identifying code components in which specific types of comments (e.g., to document the rationale for a given implementation and/or to detail the application logic) are needed can be a promising research direction. Second, automatically classifying SATD as paid back (or not) can help in identifying obsolete and misleading comments in the code. We believe this is another interesting research direction for the software engineering community.