GitHub Actions: The Impact on the Pull Request Process

Wessel, Mairieli; Vargovich, Joseph; Gerosa, Marco A.; Treude, Christoph

doi:10.1007/s10664-023-10369-w

Open access
Published: 26 September 2023

Volume 28, article number 131, (2023)
Empirical Software Engineering

Abstract

Software projects frequently use automation tools to perform repetitive activities in the distributed software development process. Recently, GitHub introduced GitHub Actions, a feature providing automated workflows for software projects. Understanding and anticipating the effects of adopting such technology is important for planning and management. Our research investigates how projects use GitHub Actions, what the developers discuss about them, and how project activity indicators change after their adoption. Our results indicate that 1,489 out of 5,000 most popular repositories (almost 30% of our sample) adopt GitHub Actions and that developers frequently ask for help implementing them. Our findings also suggest that the adoption of GitHub Actions leads to more rejections of pull requests (PRs), more communication in accepted PRs and less communication in rejected PRs, fewer commits in accepted PRs and more commits in rejected PRs, and more time to accept a PR. We found similar results when segmenting our results by categories of GitHub Actions. We suggest practitioners consider these effects when adopting GitHub Actions on their projects.

1 Introduction

Social coding platforms, such as GitHub, have changed the collaborative nature of open-source software development by integrating mechanisms such as issue reporting and pull requests into distributed version control tools (Dabbish et al. 2012; Gousios et al. 2014). This pull-based development workflow offers new opportunities for community engagement but increases the workload for repository maintainers, who need to communicate, review code, deal with contributor license agreement issues, explain project guidelines, run tests, and merge pull requests (Gousios et al. 2016).

To reduce this intensive workload, developers often rely on automation tools to perform repetitive tasks, such as checking whether the code builds, the tests pass, and the contribution conforms to a defined style guide (Kavaler et al. 2019). GitHub projects adopt, for example, tools to support Continuous Integration and Continuous Delivery or Deployment (CI/CD) (Zhao et al. 2017; Cassee et al. 2020) and for code review (Kavaler et al. 2019; Wessel et al. 2020). In recent years, development bots have been widely adopted to automate predefined tasks around pull requests (Wessel et al. 2018). By automating part of the workflow, developers expect to increase both productivity and quality (Vasilescu et al. 2015).

To further support automation, GitHub recently introduced GitHub Actions^{Footnote 1} (the feature was made available to the public in November 2019). GitHub Actions allow the automation of tasks based on various triggers (e.g., commits, pull requests, issues, comments, etc.) and can be easily shared between repositories, automating aspects of how developers build, test, and deploy software projects.

However, little is known about the impact on the project activities when adopting GitHub Actions and the challenges imposed on the project development process. In this paper, we aim to understand how software developers use GitHub Actions to automate their workflows and how the dynamics of pull requests of GitHub projects change following the adoption of GitHub Actions.

To achieve our goal, we address the following research questions:

RQ1. How do OSS projects use GitHub Actions?

We aim to understand how commonly repositories use GitHub Actions and for what purposes. As a result of this analysis, we found that a considerable number of active repositories (1,489 out of 5,000 repositories) adopted GitHub Actions. This is a dramatic change when compared to the early adoption of GitHub Actions (Kinsman et al. 2021) (only 0.7% of the studied repositories adopted it). Actions are spread across 20 categories, including utilities, continuous integration, code quality, and deployment.

RQ2. How is the use of GitHub Actions discussed by developers?

To gain insight into how developers perceive GitHub Actions, we manually analyzed a set of discussion threads and developer conversations on Discord that mention GitHub Actions. We found distinct categories of discussions related to Actions, including help requests, the potential of using them, issues reproducing output with Actions, and plans to use GitHub Actions.

RQ3. What is the impact of GitHub Actions?

In this research question, we investigate whether project activity indicators, such as the number of pull requests, comments, commits, and time to close pull requests, change after GitHub Actions adoption. We used a Regression Discontinuity Design (RDD) (Thistlethwaite and Campbell 1960) to model the effect of Action adoption across 662 projects that had adopted GitHub Actions for at least 12 months. Our findings also suggest that the activity indicators change in opposite directions for accepted and rejected pull requests (PRs). Fewer pull requests are being accepted after adopting GitHub Actions, and these pull requests usually have more comments and fewer commits. In contrast, there are more rejected pull requests, with fewer comments and more commits.

RQ4. How does the impact of GitHub Actions differ across Action categories?

As Actions are diverse and might perform a diverse range of tasks on GitHub repositories, we also investigated whether the impact of GitHub Actions differs across Action categories. The literature recommends employing a segmented analysis to further explain the general findings from statistical models (Wessel et al. 2022). In this research question, as in RQ3, we used a Regression Discontinuity Design model to measure the impact of adoption in project indicators across the four most popular Action categories: Utilities, Continuous Integration, Code Quality, and Deployment. Results obtained in the segmented analysis were similar to the overall results (from RQ3), except for code quality Actions, which led to fewer rejected pull requests.

The main contributions of this paper are:

1. Characterization of the usage of GitHub Actions.
2. An understanding of how developers discuss GitHub Actions.
3. An understanding of how GitHub Actions’ adoption impacts project activities.

This paper extends our prior work (Kinsman et al. 2021), published at MSR 2021 (The Mining Software Repositories Conference), along two major dimensions: the data used and the analyses performed. The data used in this paper broadens our previous work in three major dimensions: time (24 vs. 12 months after GitHub Actions were introduced), the number of unique Actions (973 vs. 708), and the dataset of projects used (5,000 most popular GitHub projects vs. RepoReapers dataset). In this extension, we also added RQ4 and included new regression discontinuity design analyses split by Action categories.

2 Workflow Automation with GitHub Actions

GitHub Actions is an event-driven API the GitHub platform provides to automate development workflows. GitHub Actions can run a series of commands after a specified event has occurred. An event is a specific activity that triggers a workflow run, as shown in Fig. 1 (see the icon). For example, a workflow is triggered when a pull request is created for a repository or when a pull request is merged into the main branch. Workflows are defined in the .github/workflows/ directory and use YAML syntax, having either a .yml or .yaml file extension.

A workflow can contain one or more Actions. GitHub allows developers to build reusable components, called Actions. Developers create Docker and JavaScript Actions, and both require a metadata file to define the Action’s inputs, outputs, and entry point.

After the successful execution of a workflow, the outputs can be displayed in different ways, such as through a GitHub Action bot. Like many other bots on GitHub, this bot is implemented as a GitHub user that can submit code contributions, interact through comments, and merge or close pull requests (Wessel and Steinmacher 2020).

As an example of GitHub Actions adoption, consider the case of the project Gammapy^{Footnote 2}, an open-source Python package for gamma-ray astronomy. On the 13th of November 2019, the Gammapy community adopted a GitHub Action called First Interaction^{Footnote 3}, which is responsible for identifying and welcoming newcomers when they create their first issue or open their first pull request on a project. As shown in Fig. 2a, Gammapy created a workflow called Greeting that both new pull requests and issues might trigger, as defined by the on keyword. The output of the First Interaction Action is displayed through an issue/pull request comment posted by GitHub Action Bot when a new contributor authors a new pull request or issue. An example of this Action interaction on a GitHub issue is shown in Fig. 2b.

Development bots and workflows that rely on GitHub Actions are already used in hundreds of thousands of repositories, justifying the need for further studies on these automation mechanisms’ evolution and impact on collaborative software development practices. Recently, developers published GitHub Actions variants for many well-known bots (e.g., Coveralls, Codecov, Snyk), and these Actions are rapidly increasing in popularity (Golzadeh et al. 2020).

3 Related Work

Previous work has investigated a variety of automation tools, including development bots, continuous integration/delivery, and GitHub Actions.

3.1 Development Bots

Development bots have been proposed to automate technical and social aspects of software development activities (Lin et al. 2016), such as communication and decision-making (Storey and Zagalsky 2016). For example, on GitHub, bots are often integrated into the pull request workflow (Erlenhov et al. 2019) to perform a variety of tasks, including repairing bugs (Monperrus 2019), refactoring source code (Wyrich and Bogner 2019), recommending tools to help developers (Brown and Parnin 2019), and updating outdated dependencies (Mirhosseini and Parnin 2017). Wessel et al. (2018) identified 13 categories of development bots. van Tonder and Le Goues (2019) believe development bots are a promising addition to a developer’s toolkit as they bridge the gap between human software development and automated processes.

However, understanding the impact of development bots on human developers’ interactions is a major challenge. Storey and Zagalsky (2016) highlight that the way that development bots interact on pull requests can be disruptive and perceived as unwelcoming. Wessel et al. (2021) identified several challenges caused by bots in pull requests and theorized how human developers perceive annoying bot behaviors as noise on social coding platforms. Wessel et al. (2020, 2022) also found that adopting code review bots changes team dynamics, for example, by increasing the number of monthly merged pull requests and decreasing communication among developers.

3.2 Continuous Integration and Continuous Delivery

Continuous Integration and Continuous Delivery (CI/CD) tools aim to bridge development and operation activities by automating the building, testing, and deployment of applications (Duvall et al. 2007). These tools constantly compile incremental code changes made by developers, build software deliverables, run automated tests and verifications, and deploy applications to servers, improving software quality and productivity (Duvall et al. 2007). Vasilescu et al. (2015) show that using CI leads to more pull requests being processed, and thus more pull requests being accepted or rejected. In the context of Computer Science education, Hu and Gehringer (2019) set up a continuous integration service on GitHub to provide feedback to students about code style and functionality. Prior work has also investigated the impact of CI and code review tools on GitHub projects (Zhao et al. 2017; Kavaler et al. 2019; Cassee et al. 2020) across time. While Zhao et al. (2017) and Cassee et al. (2020) focused on the impact of the Travis CI tool’s introduction in development practices, Kavaler et al. (2019) examined the impact of linters, dependency managers, and coverage reporter tools. A survey by Chen et al. (2001) reports that of the hundreds of billions of dollars spent on developer wages, up to 25% accounts for fixing bugs. Continuous integration and other automation tools thus hold huge potential to further reduce human effort and costs by automatically fixing bugs.

3.3 GitHub Actions

GitHub Actions offer built-in support to automate parts of the software development workflows that exceed what CI/CD tools can achieve. Golzadeh et al. (2022) showed that, in 18 months of existence, GitHub Actions had become the dominant CI service, covering more than half of all repositories with a CI. Software projects are still adjusting GitHub Actions to their dynamics. Valenzuela-Toledo and Bergel (2022) found 11 reasons for changing the GitHub Actions’ workflow. Saroar and Nayebi (2023) conducted a survey to understand the motivations and best practices in using, developing, and debugging GitHub Actions. Calefato et al. (2022) identified a set of practices for using GitHub Actions in projects related to machine learning-enabled systems. In a broader view, Decan et al. (2022) found that the reuse of actions is a common practice. Researchers are also starting to provide academic tools via GitHub Actions to facilitate the integration with real projects. For example, Cordeiro et al. (2021) offer a GitHub Action for detecting flakiness in time-constrained tests. Finally, in prior work (Kinsman et al. 2021), we investigated how developers use GitHub Actions and how several activity indicators change after their adoption. We explain how this paper extends our prior work in Section 1. Chen et al. (2021) also extended our prior work, finding that 22% of popular projects adopt GitHub Actions and that adoption correlates with project popularity and number of contributors and varies per programming language. They also found that after adopting GitHub Actions, the number of commits, number of pull requests, issue latency, and pull request latency tend to decrease, while the number of issues closed tends to increase.

4 Research Design

This study aims to understand GitHub Actions usage and the effects on GitHub projects. To achieve our goal, we employed a mixed-methods approach combining a time series analysis on a sample of open-source repositories and a qualitative analysis of developers’ impressions about GitHub Actions. We present our study design, data collection, and analysis procedures in the following.

4.1 Selecting Projects

We assembled a dataset of GitHub open-source projects that adopted GitHub Actions at some point in their history. We started by selecting the 5,000 most-starred GitHub repositories. We used stars as a proxy for popularity. We then filtered this dataset to keep open-source software projects that had adopted at least one Action during their lifetime. To identify these projects, we retrieved data from the GitHub API using a Ruby toolkit called Octokit.rb.^{Footnote 4} To determine if the project used any Actions, we verified whether the repositories contained files in YAML format in the .github/workflows directory. This filtered dataset comprised 1,489 projects.
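The workflow-file check described above can be sketched as follows. This is a minimal illustration, not the authors' implementation (which used Octokit.rb against the GitHub API): it assumes the list of file paths in a repository has already been retrieved, and the function name `uses_github_actions` is hypothetical.

```python
from pathlib import PurePosixPath

# Directory and extensions for workflow files, as described in Section 2.
WORKFLOW_DIR = PurePosixPath(".github/workflows")
WORKFLOW_SUFFIXES = {".yml", ".yaml"}


def uses_github_actions(repo_paths):
    """Return True if any path is a YAML file directly inside .github/workflows.

    `repo_paths` is an iterable of repository-relative paths (an assumed input;
    how the listing is obtained is outside this sketch).
    """
    for raw in repo_paths:
        path = PurePosixPath(raw)
        if path.parent == WORKFLOW_DIR and path.suffix.lower() in WORKFLOW_SUFFIXES:
            return True
    return False
```

A repository containing `.github/workflows/ci.yaml` would be kept by this filter, while one that only has `docs/workflows/ci.yml` would not.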

4.2 Analyzing the Use of GitHub Actions

First, we collected and quantitatively analyzed the number of projects using GitHub Actions and the number of GitHub Actions per project (RQ1). We also automatically analyzed the workflow files of the studied projects, searching for the category, description, and whether GitHub verified the Action. We determined the Actions used within a workflow by extracting each ‘uses: ACTION@VERSION’ entry. For example, in ‘uses: actions/first-interaction@v1’ the First Interaction^{Footnote 5} Action was identified and extracted. In the case of multiple Actions in a single workflow, all of them were identified.
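One way to approximate this extraction step is a line-oriented regular expression over the workflow text. The sketch below is an assumption about how such extraction could work, not the authors' tooling; the pattern, the function name `extract_actions`, and the sample workflow are all illustrative.

```python
import re

# Matches lines such as "- uses: owner/repo@v1" or "uses: owner/repo@v1",
# with optional quotes around the reference.
USES_PATTERN = re.compile(
    r"^\s*(?:-\s*)?uses:\s*['\"]?([^\s'\"#]+)['\"]?", re.MULTILINE
)


def extract_actions(workflow_text):
    """Return a list of (action, version) pairs found in a workflow file.

    The version is None when the reference has no '@VERSION' part.
    """
    actions = []
    for ref in USES_PATTERN.findall(workflow_text):
        name, _, version = ref.partition("@")
        actions.append((name, version or None))
    return actions


# Hypothetical workflow used only to exercise the function.
SAMPLE_WORKFLOW = """\
name: Example
on: [pull_request, issues]
jobs:
  example:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/first-interaction@v1
      - name: Checkout
        uses: actions/checkout@v2
"""
```

Calling `extract_actions(SAMPLE_WORKFLOW)` yields both Actions in the file, which mirrors the statement that all Actions in a workflow were identified. A full YAML parser would be more robust against unusual formatting than this regular expression.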

4.3 Categorizing GitHub Actions Discussions

To answer RQ2, we manually investigated how GitHub Actions were discussed in project-specific channels, including GitHub Discussions (Hata et al. 2022) and Discord chats (Subash et al. 2022).

Filtering GitHub Discussions and Discord chats. We started by investigating the GitHub Discussions on our selected projects. Out of the 5,000 repositories in our dataset, 897 (18%) had the Discussions feature enabled at the time of data collection, and 830 (17%) contained at least one Discussion thread. These 830 repositories account for 88,443 Discussion threads (minimum: 1, median: 22, maximum: 10,129), containing 326,033 posts. To complement our analysis, we have also considered developers’ conversations on Discord, as they may use other communication channels to discuss GitHub Actions. For this analysis, we used the DISCO dataset (Subash et al. 2022). This dataset consists of one year of public conversations on Discord from four software development communities (Python, Go, Clojure, and Racket).

Aiming for high precision rather than recall, we applied a strict filter to these GitHub Discussion posts and chat excerpts and selected only those with the exact string “GitHub Action” (case insensitive). We avoided searching for strings like “.github/workflows/” and “workflow”, which tend to generate many false positives. An exploratory analysis of the DISCO dataset showed that strings like “.github/workflows/” are rarely mentioned, and “workflow” mostly appears in unrelated contexts.
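The strict filter can be expressed as a case-insensitive substring test. The snippet below is a minimal sketch with a hypothetical function name; note that the exact string “GitHub Action” also matches the plural “GitHub Actions”.

```python
def mentions_github_action(text):
    """Return True if the text contains 'GitHub Action' in any letter case."""
    return "github action" in text.casefold()


# Hypothetical posts used only for illustration.
posts = [
    "How do I cache pip in a GitHub Action?",
    "Our workflow for releases is documented in the wiki.",
    "github actions keep timing out on macOS runners",
]
selected = [p for p in posts if mentions_github_action(p)]
```

The second post is excluded even though it mentions a “workflow”, which reflects the choice of precision over recall.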

This filtering step resulted in (i) 573 posts originating from 458 threads in 148 different repositories and (ii) 40 excerpts from two distinct communities (34 and 6 excerpts from Python and Go, respectively).

Qualitative analysis. We applied qualitative coding to the 458 threads to understand how developers discuss GitHub Actions. One author developed a preliminary coding schema based on a random sample of 20 threads, which was refined through discussions with all authors. Two authors then independently coded another set of 20 threads and measured inter-rater agreement. Based on achieving an ‘almost perfect’ agreement (Cohen’s $\kappa = 0.939$ (McHugh 2012)) and resolving disagreements through discussion, the same two authors divided the remaining threads equally among them and completed the annotation of all 458 threads. We also applied qualitative coding to the 40 chat excerpts from the DISCO dataset. Two authors independently coded all chat excerpts based on the defined coding schema and measured inter-rater agreement (Cohen’s $\kappa = 1$). Section 5.2 reports the coding schema and the detailed results for both Discussion threads and Discord conversations.
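For readers unfamiliar with the agreement measure, Cohen’s $\kappa$ compares the observed agreement between two raters with the agreement expected by chance from each rater’s label frequencies. The following is a minimal sketch of that standard formula; the label values are hypothetical and do not come from the study’s data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(labels_a)
    # Proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from the product of each rater's label proportions.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes assigned by two raters to four threads.
rater_1 = ["help", "help", "plan", "plan"]
rater_2 = ["help", "help", "plan", "help"]
kappa = cohens_kappa(rater_1, rater_2)
```

In this toy example the raters agree on three of four items (0.75) while chance agreement is 0.5, giving $\kappa = 0.5$.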

4.4 Time Series Analysis

We conducted a time series analysis to answer RQ3 and RQ4. We collected longitudinal data for different outcome variables and treated the adoption of GitHub Actions by each project in our dataset as an “intervention”. This way, we could align all the time series of project-level outcome variables on the intervention date and compare their trends before and after adopting GitHub Actions. The following subsections detail the steps involved, from aggregating the project variables to running the statistical models.

4.4.1 Aggregating Project Variables

We gathered Action data within an observation period of 12 months before and 12 months after the Action adoption within each project. Similar to previous work (Zhao et al. 2017; Wessel et al. 2020; Cassee et al. 2020; Kinsman et al. 2021), we excluded 30 days around the Action adoption date to avoid the influence of the instability caused during this period. Afterward, we aggregated individual pull request data into monthly periods, considering 12 months before and after the Action introduction. Next, we checked the activity level of the candidate projects, since many projects on GitHub are inactive (Gousios et al. 2014). Our dataset comprises 662 active projects that had been using at least one GitHub Action for 12 months.

We focused on the same pull request-related variables as in previous work (Wessel et al. 2020; Kinsman et al. 2021):

Merged/non-merged pull requests: the number of monthly contributions (pull requests) that have been merged (accepted) or closed but not merged (rejected) into the project, computed over all closed pull requests in each time frame.

Comments on merged/non-merged pull requests: the median number of monthly comments computed over all merged and non-merged pull requests in each time frame.

Commits of merged/non-merged pull requests: the median of monthly commits computed over all merged and non-merged pull requests in each time frame.

Time to merge/time to close pull requests: the median of monthly pull request latency (in hours), computed as the difference between the time when the pull request was closed and the time when it was opened. The median is computed using all merged and non-merged pull requests in each time frame.
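The aggregation of the outcome variables listed above might look like the sketch below. Several details are assumptions not stated in the text: the 30-day exclusion is split into 15 days on each side of the adoption date, a “month” is a 30-day period, pull requests are assigned to periods by their close date, and the record field names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import median

EXCLUSION = timedelta(days=15)   # assumed: 15 days on each side of adoption
PERIOD = timedelta(days=30)      # assumed: one "month" is a 30-day period
MAX_PERIODS = 12


def period_index(closed_at, adoption):
    """Signed period index (-12..-1 before, 1..12 after), or None if excluded."""
    delta = closed_at - adoption
    if abs(delta) <= EXCLUSION:
        return None  # inside the unstable window around adoption
    if delta > EXCLUSION:
        idx = (delta - EXCLUSION) // PERIOD + 1
    else:
        idx = -((-delta - EXCLUSION) // PERIOD + 1)
    return idx if abs(idx) <= MAX_PERIODS else None


def aggregate(pull_requests, adoption):
    """Group closed PRs by (period, merged) and compute the outcome variables."""
    buckets = defaultdict(list)
    for pr in pull_requests:
        idx = period_index(pr["closed_at"], adoption)
        if idx is not None:
            buckets[(idx, pr["merged"])].append(pr)
    rows = {}
    for key, prs in buckets.items():
        rows[key] = {
            "count": len(prs),
            "median_comments": median(p["comments"] for p in prs),
            "median_commits": median(p["commits"] for p in prs),
            "median_hours_to_close": median(
                (p["closed_at"] - p["opened_at"]).total_seconds() / 3600
                for p in prs
            ),
        }
    return rows
```

Each key of the returned dictionary identifies one monthly observation for either merged or non-merged pull requests, which is the unit of analysis for the models in Section 4.4.2.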

Based on previous work (Cassee et al. 2020; Zhao et al. 2017; Wessel et al. 2020; Kinsman et al. 2021), we also collected six known covariates for each project:

Project name: the name of the project to which the pull request belongs. This name is used to uniquely identify the project on GitHub.

Programming language: the primary project programming language, as automatically provided by GitHub.

Time since the first pull request: in months, computed since the earliest recorded pull request in the project’s history. We use this variable to capture the project’s maturity regarding its use of pull requests.

Total number of pull request authors: we count how many contributors submitted pull requests to the project as a proxy for the community size of a project.

Total number of commits: we compute the total number of commits as a proxy for the activity level of a project.

Number of pull requests opened: the number of monthly contributions (pull requests) received in each time frame. We expect that projects with a high number of contributions also observe a high number of comments, latency, commits, and merged and non-merged contributions.

4.4.2 Statistical Approach

We modeled the effect of GitHub Action adoption over time across GitHub repositories using a Regression Discontinuity Design (RDD) (Thistlethwaite and Campbell 1960; Imbens and Lemieux 2008), following the work of Wessel et al. (2020). RDD is a technique used to model the extent of a discontinuity at the moment of intervention and long after the intervention. The technique assumes that if the intervention does not affect the outcome, there would be no discontinuity, and the outcome would be continuous over time (Cook and Campbell 1979). The statistical model behind RDD is

$$\begin{aligned} y_{i} ={}& \alpha + \beta \cdot \text{time}_{i} + \gamma \cdot \text{intervention}_{i} \\ &+ \delta \cdot \text{time\_after\_intervention}_{i} + \eta \cdot \text{controls}_{i} + \varepsilon_{i} \end{aligned}$$

where i indicates the observations for a given project.

To model the passage of time as well as the GitHub Action introduction, we rely on three variables: time, time after intervention, and intervention. The time variable is measured as months at the time j from the start to the end of our observation period for each project.

The intervention variable is a binary value used to indicate whether the time j occurs before (${\textit{intervention}}=0$) or after (${\textit{intervention}}=1$) the adoption event. The time_after_intervention variable counts the number of months at time j since the Action adoption, and the variable is set to 0 before adoption. The ${\textit{controls}}_{i}$ variables allow us to attribute changes to Action adoption rather than to confounding factors that also influence the dependent variables. For observations before the intervention, holding controls constant, the resulting regression line has a slope of $\beta $, and after the intervention $\beta +\delta $. The size of the intervention effect is measured by $\gamma $, the difference between the two regression values of $y_{i}$ at the moment of the intervention.
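To make the three time variables concrete, the sketch below builds them for a single monthly observation and evaluates the fixed part of the model without controls. It assumes months are indexed 1 to 24 across the observation period and that observations carry a signed period index (-12 to -1 before adoption, 1 to 12 after); both conventions and all names are illustrative rather than taken from the authors' scripts.

```python
def rdd_variables(period_index, periods_before=12):
    """Map a signed period index to time, intervention, time_after_intervention."""
    if period_index == 0 or abs(period_index) > periods_before:
        raise ValueError("period_index must be in -12..-1 or 1..12")
    if period_index < 0:
        # Months before adoption: -12 maps to time 1, -1 maps to time 12.
        return {
            "time": period_index + periods_before + 1,
            "intervention": 0,
            "time_after_intervention": 0,
        }
    # Months after adoption: 1 maps to time 13, 12 maps to time 24.
    return {
        "time": periods_before + period_index,
        "intervention": 1,
        "time_after_intervention": period_index,
    }


def predict(variables, alpha, beta, gamma, delta):
    """Fixed-effects part of the RDD model, with controls held constant (omitted)."""
    return (
        alpha
        + beta * variables["time"]
        + gamma * variables["intervention"]
        + delta * variables["time_after_intervention"]
    )


# Illustrative coefficients only; these are not estimates from the paper.
coef = {"alpha": 1.0, "beta": 0.5, "gamma": 2.0, "delta": 0.25}
slope_before = predict(rdd_variables(-1), **coef) - predict(rdd_variables(-2), **coef)
slope_after = predict(rdd_variables(2), **coef) - predict(rdd_variables(1), **coef)
```

With these illustrative coefficients, the month-to-month change is $\beta = 0.5$ before adoption and $\beta + \delta = 0.75$ after it, matching the slopes described in the text.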

Considering that in RQ3 we are interested in the effects of GitHub Actions on the monthly trend of the number of pull requests, number of comments, number of commits, and time to close for both merged and non-merged pull requests, we fitted eight models (4 variables $\times $ 2 cases). In RQ4, we measured the impact of adoption for the same variables across the four most popular Action categories in our filtered dataset: utilities, continuous integration, code quality, and deployment. We selected projects that have adopted one or more of the four categories. In cases where a project employs multiple Actions, the project is considered in the analysis of multiple Action categories. Therefore, we fitted thirty-two models (4 variables $\times $ 2 cases $\times $ 4 categories).

To balance false positives and false negatives, we report p-values corrected for multiple comparisons using the method of Benjamini and Hochberg (1995). We implemented the RDD models as a mixed-effects linear regression using the R package lmerTest (Kuznetsova et al. 2017). We modeled project name and programming language as random effects (Gałecki and Burzykowski 2013) to capture project-to-project and language-to-language variability (Zhao et al. 2017). We evaluate the model fit using marginal $(R^2_m)$ and conditional $(R^2_c)$ scores, as described by Nakagawa and Schielzeth (2013). The $R^2_m$ can be interpreted as the variance explained by the fixed effects alone, and $R^2_c$ as the variance explained by the fixed and random effects together.
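The Benjamini and Hochberg adjustment can be illustrated in a few lines. The study itself used R; the sketch below is a Python rendering of the standard step-up procedure with made-up p-values, intended only to show how adjusted p-values are derived.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four model coefficients.
raw = [0.01, 0.04, 0.03, 0.20]
corrected = benjamini_hochberg(raw)
```

Here the raw value 0.04 is adjusted to roughly 0.053, so a coefficient that looks significant at the 0.05 level before correction would no longer be reported as significant afterwards.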

In mixed-effects regression, the variables used to model the intervention and the other fixed effects are aggregated across all projects, resulting in coefficients useful for interpretation. The interpretation of these regression coefficients supports the discussion of the intervention and its effects, if any. Thus, we report the significant coefficients ($p < 0.05$) in the regression and their variance, obtained using ANOVA. In addition, we log transform the fixed effects and dependent variables that have high variance (Sheather 2009). We also account for multicollinearity, excluding any fixed effects for which the variance inflation factor (VIF) is higher than 5 (Sheather 2009).

5 Results

In the following, we report the results of our study per research question.

5.1 How do OSS Projects use GitHub Actions? (RQ1)

Analyzing the set of 5,000 repositories, we identified 1,489 (29.8%) open-source software projects that had adopted at least one GitHub Action at the time of our data collection. As the box plot in Fig. 3 shows, many of these repositories adopt more than one Action, with a median value of four and a maximum of 46.

In these repositories, we found 973 distinct predefined GitHub Actions. We collected data from each Action’s repository and the GitHub Marketplace^{Footnote 6} page to categorize these GitHub Actions. If published in the marketplace, an Action is classified into 1–2 categories by the publisher. Table 1 presents the categorization of GitHub Actions we found. Note that the percentages do not add up to 100, since about half of the GitHub Actions are assigned to two categories, a primary and a secondary.

Table 1 Categorization of GitHub Actions found in our sample
