Will You Come Back to Contribute? Investigating the Inactivity of OSS Core Developers in GitHub

Several Open Source Software (OSS) projects depend on the continuity of their development communities to remain sustainable. Understanding how developers become inactive or why they take breaks can help communities prevent abandonment and incentivize developers to come back. In this paper, we propose a novel method to identify developers' inactive periods by analyzing the individual rhythm of contributions to the projects. Using this method, we quantitatively analyze the inactivity of core developers in 18 OSS organizations hosted on GitHub. We also survey core developers to receive their feedback about the identified breaks and transitions. Our results show that our method was effective for identifying developers' breaks. About 94% of the surveyed core developers agreed with our state model of inactivity; 71% and 79% of them acknowledged their breaks and state transition, respectively. We also show that all core developers take breaks (at least once) and about a half of them (~45%}) have completely disengaged from a project for at least one year. We also analyzed the probability of transitions to/from inactivity and found that developers who pause their activity have a ~35-55\% chance to return to an active state; yet, if the break lasts for a year or longer, then the probability of resuming activities drops to ~21-26%, with a ~54% chance of complete disengagement. These results may support the creation of policies and mechanisms to make OSS community managers aware of breaks and potential project abandonment.


Introduction
The success of an OSS project depends on the strength and health of the group behind it (Crowston and Howison, 2006;Link and Germonprez, 2018;Coelho and Valente, 2017).Marsan et al. (2018) reported that, from the perspective of OSS developers, maintainers, and managers, the loss of contributors is a major problem in OSS communities and ecosystems, followed by poor code quality.Almost half of the OSS projects that fail do so because of a problem related to the development team (Coelho and Valente, 2017).Turnovers often disrupt the community, reduce productivity (Mockus, 2010), and degrade product quality (Schilling, 2014;Foucault et al., 2015), especially when core contributors are involved.More specifically, Avelino et al. (2019) showed that 59% of the analyzed projects did not survive after the departure of Truck Factor developers (Williams and Kessler, 2002), i.e., developers who hold unique responsibilities in the project.
Therefore, it is critical to nurture the project workforce and avoid losing core developers and their knowledge.However, little is known about either the stages of project disengagement that developers undergo, or the transitions between these stages.So far, research has focused on the developers' life cycle: how people join the projects (Von Krogh et al., 2003), including the barriers they face (Steinmacher et al., 2015;Hannebauer and Gruhn, 2017;Balali et al., 2018); how they are attracted (Yamashita et al., 2016;Santos et al., 2013;Fronchetti et al., 2019); and how they become long-term contributors or core members (Ducheneaut, 2005;Nakakoji et al., 2002;Zhou and Mockus, 2012).The limited research about understanding developers' disengagement has focused on the risks that projects incur when they lose developers (Ricca and Marchetto, 2010;Avelino et al., 2019;Ferreira et al., 2017) and factors related to the developers' abandonment, using the survival analysis technique (Lin et al., 2017).To avoid developers' disengagement, researchers analyzed the factors related to developers' engagement and retention (Schilling, 2014;Midha and Palvia, 2007;Zhou et al., 2016).However, these factors fail to consider the developers' rhythm or the disengagement stages and their transitions.
In this paper, we employ a mixed-methods approach to investigate how developers take breaks and disengage from OSS projects.We first design a state model based on interviews with OSS developers.Then, we analyze data from eighteen OSS organizations and identify core developers' breaks by analyzing their personal work rhythm and identifying their inactivity states and the transitions between states.Then, we collect the developers' feedback about the output of our method.Finally, we employ our method to characterize break frequency and length as well as state transition probabilities in the analyzed OSS organizations.
This paper makes the following contributions: (i) an empirically-defined state model that describes developers' (in)activity states and transitions in OSS projects; (ii) a method to identify the inactivity periods based on the individual rhythm of contribution, assessed by OSS developers via a survey; and (iii) an in-depth analysis at the organizational level about the frequency and extension of over 500 core developers' break periods, showing how the break periods vary and the probability of core developers' transitions to and from inactive periods.The proposed model captures disengagement, short breaks, and periods when contributors are only working on non-coding activities.Our results show that breaks are rather common, and highlight the importance of considering more than just coding to analyze inactivity/disengagement.The model and method proposed here are relevant for OSS communities; they can be used to define reference values for dashboards that alert when someone is likely about to leave.
The rest of this paper is organized as follows.In Section 2, we review related work.In Section 3, we illustrate the research framework devised to carry out the study and answer the research questions.In Section 4, we present the model developed to inform our research method, which is described in Section 5. Results are reported in Section 6 and discussed in Section 7. Finally, we conclude in Section 8.

Related Work
In this section, we present the literature related to developers' life cycle and disengagement, focusing on OSS.

Open Source Developers' Life Cycle
Many OSS projects rely on a globally distributed community of developers who join projects for diverse reasons, such as reputation, giving back to the community, and learning (Gerosa et al., 2021;David and Shapiro, 2008;Alexander Hars, 2002;Lakhani and Wolf, 2005;Krogh et al., 2012).Many studies have focused on investigating the stages and activities in the path to becoming core members or long-term contributors.For example, Nakakoji et al. (2002) proposed the Onion model, which represents the general structure of OSS roles that developers follow to become core members.Von Krogh et al. (2003) proposed a joining script, based on steps followed to become a member of a project.Ducheneaut (2005) offered an in-depth look at a successful newcomer's socialization history, identifying activities that contributed to their success.Steinmacher et al. (2014) bring a different perspective to the joining process, explaining it in two stages, namely onboarding and contributing, and describing the forces that push developers towards a project, such as motivation (Hannebauer and Gruhn, 2017;Krogh et al., 2012;Gerosa et al., 2021) and attractiveness (Yamashita et al., 2016;Santos et al., 2013), and those that hinder developers' onboarding (Steinmacher et al., 2015).
Overall, the studies that model OSS developers' life cycles address how to onboard, remain, and become long-term contributors.Even the motivationrelated literature focuses on what compels developers to join projects (Von Krogh et al., 2003;Hannebauer and Gruhn, 2017;Alexander Hars, 2002;Silva et al., 2020) and remain active (Zhou and Mockus, 2012;Barcomb, 2014;Yamashita et al., 2016).Very little is known about developers' inactive periods and disengagement.Recent work has started filling this gap by investigating the impacts and reasons behind abandonment and turnover, as presented in the following section.

Developers' Turnover and Disengagement
Previous literature reported on the negative impact of turnover on team cognition and performance (Levine and Choi, 2004;Levine et al., 2005), and its costs for the company and society (Graef and Hill, 2000;Garman et al., 2005).Software engineering-specific literature has also shown that developers' turnover harms software development projects (Bass et al., 2018;Nakatsu and Iacovou, 2009;Hall et al., 2008).Hall et al. (2008) showed that projects with high turnover are less successful.Mockus (2010) reported that people leaving the organization (departures) might increase the probability of defects in commercial projects.Some researchers also investigate the impact of developers' turnover on software quality (Mockus, 2009;Foucault et al., 2015).Regarding the reasons for leaving a software company, Dittrich et al. (1985) and Garden (1988) reported that wage and pay raise equity is a significant predictor of intentions to quit.Additionally, Lee (2002) showed that job satisfaction strongly influences turnover intentions.
In the OSS context, several researchers investigated turnover in OSS projects.For example, Lin et al. (2017) studied why some developers are more likely to continue their contributions than others.Others focused on understanding the potential issues surrounding developers leaving OSS projects, including the so-called truck factor (Ricca and Marchetto, 2010;Avelino et al., 2016;Ferreira et al., 2017;Cosentino et al., 2015), which is defined as the number of people who have to be hit by a truck (i.e., leave the project) before the project itself is at risk (Williams and Kessler, 2002).Avelino et al. (2019) found that truck factor is a real concern, which may affect project evolution.They showed that the majority of the projects do not survive when truck factor developers disengage and no other developers replace them.
Some other researchers have focused on understanding the reasons behind disengagement from OSS projects.Miller et al. (2019) found that contributors who work at night and during weekends disengage for different reasons than those who work during office hours.While the first group reported social rea-sons as their primary motivation to leave, the latter mostly cited job-related reasons.In our previous work (Iaffaldano et al., 2019), we uncovered potential reasons behind developers' inactive periods, including personal (e.g., job change, financial issues, lack of interests) and project-related (e.g., role change, governance issues) reasons.Our current study further investigates these reasons, quantifying how frequently they occur.
Finally, Decan et al. (2020) predict developers' imminent abandonment by considering their commit activity.Although we are not predicting abandonment, our results can inform similar prediction models by bringing to light the importance of the work rhythm and by characterizing multiple states and their transitions, which consider coding and non-coding activities.
In summary, existing research has focused on studying the health of OSS communities concerning developers' onboarding, retention, and turnover.While a few papers consider the reasons behind OSS developers' disengagement, turnover, and the consequences of such actions, it is still unknown how breaks comprise part of the developer life cycle.The current literature does not explore the extent of disengagement, nor how to identify breaks based on the contribution rhythm of developers.In this paper, we broaden the literature by going beyond understanding the reasons to stay or leave.We investigate developers' disengagement, modeling intermediate states in which developers pause their coding contribution or engage in non-coding activities, and then either resume coding, or extend their hiatus until eventually abandoning a project altogether.

Research Framework
The goal of this study is to investigate how core developers take breaks and disengage from OSS projects.To conduct the study, we devised a research framework divided into two phases (see Fig. 1).In Phase I (see Sect. 4), we refined the preliminary model of developers' activities and breaks from our previous work (Iaffaldano et al., 2019).Accordingly, we carried out interviews with 13 OSS developers from 8 projects.Building on their feedback and insights, we designed a revised state model.
In Phase II (see Sect. 5), we collected a larger sample of 18 projects from GitHub.Building on the revised model, we designed and implemented an algorithm that analyzes trace data mined from project repositories to establish the working rhythm of OSS developers and identify their states (activities and breaks) and transitions, at the organization level, as defined in the model.
To further understand the phenomenon of core developers taking breaks from and abandoning OSS projects, we defined the following research questions: RQ1.To what extent can our method identify developers' breaks?RQ2.How often do core developers take breaks?RQ3.How long do core developers' breaks last?RQ4.How common are the transitions between states?
To answer RQ1, we applied the Truck Factor algorithm (Avelino et al., 2016) to identify 75 core developers of the selected projects.Then, we invited those with a public email address to participate in a personalized survey where we asked them to provide feedback on the revised model and to acknowledge their own breaks and transition as identified by our algorithm.We received answers from 17 developers.The results of this step were used to further refine the model.
For the remaining research questions RQ2-4, we considered a larger sample of core developers, i.e., those who authored 80% of a project codebase, because, as discussed in Sect.5.1.1,the Truck Factor algorithm is far more restrictive.We ran our break-identification algorithm on the trace data of 538 core developers to quantitatively assess the phenomenon of inactivity and disengagement in OSS projects on a larger scale.
Next, we detail the activities carried out in the two phases of the research framework.

Phase I -Modeling Developers' Activities and Breaks
In our previous work (Iaffaldano et al., 2019), we designed a preliminary state model, which defines states for project disengagement and explains the reasons for the transitions, as can be seen in Fig. 2. We designed this model by conducting six interviews with experienced OSS developers and comparing their commit timeline on GitHub to the circadian rhythm (or sleep-wake cycle).
Fig. 2 The original model of OSS developers' states and contribution breaks (Iaffaldano et al., 2019).
The interviewed developers contributed to different projects, including Radar Parlamentar,1 Noosfero,2 Crossminer,3 GrimoireLab,4 KDE Cantor,5 and R.6 In this preliminary model, nodes represent the developers' states in a project and edges represent the transitions between these states.The model is framed within the the metaphor of the circadian rhythm: the wake stagewhere intense brain activity is registered-corresponds to the active state, in which developers actively contribute source code changes; the sleep stagewhen brain activity is low, but other life signals are still observable-corresponds to the sleeping state, when developers pause their code contributions but still give signals of their presence in the community (e.g., by commenting issues and pull requests); finally, the dead stage corresponds to the state where a contributor has completely abandoned a project (i.e., neither code contributions nor signals of non-coding activities are found).The analysis of the interviews also helped to uncover a list of reasons for transitioning from one state to another.For example, a sleeping developer may wake up and return to active by contributing code after a break due to either a personal (e.g., life event, financial, lack of interests) or project-related (e.g., role change, social, governance issues) reason.
Because this model was based on a small set of interviews, we decided to further assess its validity and refine it before using it in the second phase of this study.

Evaluation of the Preliminary Model
To evaluate our preliminary model, we presented our results to 13 OSS developers from 8 OSS projects hosted on GitHub.These developers are longterm contributors and active in these projects.They were recruited from our personal network and referrals.These developers are from a different set of projects compared to those interviewed to conceive the preliminary model.The projects they contribute to (i.e., rails/rails,7 laravel/framework,8 elixirlang/elixir,9 JabRef/jabref,10 github/linguist,11 atom/atom,12 flutter/flutter,13 and ionic-team/ionic-framework)14 are active, popular, with 5+ years of historic records, and diverse in terms of programming languages.Table 2 includes the details of these projects (which are marked with a * in the table), among others.
We tailored a structured interview (questionnaire) for each respondent.We first collected the project history of each respondent from GitHub-using the GitHub API-and instantiated the model for that data.To identify breaks, we applied a method based on the far-out values approach used for computing outliers (detailed in Sect.5.1.2),looking for breaks outside the developers' natural contribution rhythm.We also asked the respondents to give feedback about the state transitions (we showed up to 3 examples of each transition) and about the model as a whole.
After analyzing their answers, we identified three recurring issues.The first issue regarded considering the states based on a single project instead of analyzing the entire organization, as some developers appeared as sleeping, but in fact are contributing to other repositories within the same organization.The second issue was that the term sleeping was misleading, as it failed to capture that, albeit not coding, many members are still actively contributing, e.g., by managing issues or the project.The third problem was also related to naming, specifically of the dead state; some respondents pointed out that the term was too negative and 'terminal,' therefore failing to capture that some disengaged developers can actually go back.As one of the interviewees commented, a dead developer would be one who "does not want to contribute anymore to that community ever again."Instead, the developers also noticed a high number of transitions between dead and active states as well as between active and sleeping.
Based on the feedback, we defined a revised version of the model.

The Revised Model of Inactivity
Given the feedback from the structured interviews, we defined a revised model (see Fig. 3) to inform this study.Compared to the preliminary version (Fig. 2), we dropped the metaphor of the circadian rhythm; as such, we renamed the dead state as gone, and sleeping as non-coding.We explicitly added this state because contributors follow different pathways to contribute and contributors not working with code are usually 'hidden' (Trinkenreich et al., 2020).
In addition, we added a new state between non-coding and gone, named inactive.
Also, the revised model includes transition labels that indicate the event that triggers the state change; therefore, if active developers do not perform any commit in a given time interval (i.e., no commit after ∆t non−coding ), but perform other activities, they transition from active into the non-coding state.The classification of coding and non-coding activities is provided later in Sect.5.1.3,along with the description of the algorithm for labeling breaks and transitions accordingly.
OSS developers who take breaks from writing code may resume their commit activity at some point.However, if non-coding developers do not resume code contribution but rather stop contributing (e.g., no issue comments, no pull requests reviews) for a time interval (i.e., no collaboration activity after ∆t inactive ), they become inactive.Entering into such a state is also possible for active developers, should they stop both committing and contributing by other means altogether (i.e., neither commit nor collaborate via issue tracker for at least ∆t inactive ).Developers can come out of the inactive state by resuming their activities.Otherwise, if their hiatus continues for at least ∆t gone , they eventually will be considered gone (i.e., they abandoned the project).As such, the transition towards gone is only possible from inactive.Finally, developers can go back to either active (by resuming code commits) or noncoding state (by reviewing code, opening issues, commenting, etc.).

Phase II -Evaluation of the Revised Model and Characterization of the Phenomenon
In this section, we describe the method adopted for the second phase of our research, including the data collection and sampling, the definition of break periods, and the identification of transitions between states.

Data Collection and Sampling
We mined project data from eighteen projects from different organizations hosted on GitHub.We extracted their entire history of activities, including commits, issues, and pull requests.The dataset and all the scripts for data collection and analysis are available online as supplementary material. 15e used convenience sampling to select the organizations (see Table 1) and projects (see Table 2) used in the study.We started by including the same eight projects and organizations used for the preliminary model evaluation (see Sect. 4.1), namely rails/rails, laravel/framework, elixir-lang/elixir, JabRef/jabref, github/linguist, atom/atom, flutter/flutter, and ionic-team/ionicframework (which are identified with a * in Table 2).This choice was motivated by the fact that we have connections in those organizations, which we were planning to use to recruit developers for future qualitative assessments.
To select the other ten projects and their organizations, from the 'Topics' section on the GitHub website, 16 we identified the ten most trending topics and then moved on to select up to three projects per topic, excluding a priori research projects and small, one-person projects.Instead, we opted to select active, successful projects whose discontinuation would pose a threat to the sustainability of many others depending on it.We also filtered out projects within organizations with too many other projects (>350) to contain the extraction time within acceptable limits; likewise, we filtered out projects deemed too large in terms of the number of distinct contributors (>4000) or development history (>15 years).Finally, we selected the highest-rated projects (GitHub stars) that would also add variety to the sample in terms of size (using the number of contributors, pull requests, and LOC as proxy), history (age), and programming language.After completing the sampling, we ensured we selected the main project from each organization.The identification was straightforward in most of the cases, as the largest project (in terms of the number of contributors, stars, forks received, pull requests, and LOC) is either homonymous with its organization (e.g., rails/rails, atom/atom) or has a closely resembling name (e.g., elixirlang/elixir, ionic-team/ionic-framework).The verification in the case of github and facebook organizations, from which we respectively selected linguist and react, required extra care.We verified that the two projects were indeed the largest ones after excluding those that are mirrors or forks of other projects, as well as documentation-only projects (i.e., do not contain any source code).
Looking at the breakdown of the sample, we observe that the selected organizations (Table 1) host a number of repositories ranging between 9 and 349; regarding the projects (Table 2), we notice that, as intended, they are written in different programming languages (e.g., Java, Ruby, PHP, Dart) and have a history (3-15 years) long enough to observe the development rhythm of its contributors.The projects vary also in terms of size (40-3.8kcontributors, 1.5k-5.9MLOC, 1.6k-23.5kpull requests) and popularity (1.3k-131k GitHub stars).
In the rest of the paper, we focus on studying the core developers of the selected projects while analyzing their (in)activity across all projects within the entire organization.This is motivated by the observation that-albeit most of their commit activity focuses on the main project-on average nearly 36% of the total commits by the sampled core developers are contributed to the other projects from the same organization.
Table 2 A breakdown of the eighteen GitHub projects analyzed for each organization in the study (as of January 2020).Projects with a * represent those that also had developers interviewed to evaluate the preliminary model (in Phase 1).

Project
Progr

Identifying Core Developers and Their Pauses
For each project in the selected organizations, we first used the GitHub API to collect the commit history of all developers.Then, for each developer, we checked the days when they made a commit to the projects and called them 'commit days.'Accordingly, we define a 'pause' as the interval (in days) between two consecutive commit days.For each project, we collected the number of pauses taken by all the developers who contributed at least twice (i.e., we observed those who took at least one pause) and normalized it by the number of years that they have been in the project.The focus of our analysis is on core contributors because their disengagement poses a serious risk to the survival of OSS projects (Avelino et al., 2019).Therefore, we applied the Truck Factor (TF) algorithm (Avelino et al., 2016) to select the subset of the developers who are key to the project, and whose disengagement would greatly affect project survival (see Table 3).As the Truck Factor algorithm is not the only approach for identifying key people from the projects, we also used a popular heuristic based on the number of commits from the project contributors, which usually follows a heavy-tailed distribution (Coelho et al., 2018).Thus, in Table 3, we also report the core developers of a project, those who produced 80% of the total commits in a project (Commit-Based Heuristic).We noticed that the Truck Factor algorithm indeed is more selective than using the Commit-Based Heuristic and that most of the time the Truck Factor developers of a project are also included in the set of core contributors identified using the Commit-Based Heuristic.The only exceptions (i.e., nodejs/node, JabRef/jabref, and atom/atom) are caused by the fact that, unlike the Commit-Based Heuristic, the implementation of the Truck Factor approach that we used does not take into account the commits made on documentation files.Because Truck Factor is more selective, we used this approach to identify the developers of each main project to assess our model (RQ1).Thus, we collected their data and analyzed their inactivity according to the state model.We assessed the outcomes of the model by conducting a survey with the Truck Factor developers, aiming to validate our findings.
After assessing the model, we used the Commit-Based Heuristic approach to broaden the understanding of the phenomenon.With the complete set of core developers identified, we conducted a quantitative analysis based on the data collected to answer RQ2, RQ3, and RQ4.

Identifying Core Developers' Inactivity Periods
Because each developer has their own rhythm of development, not every pause in between two consecutive commits indicates that a developer has become inactive.For instance, for a developer who typically commits every other day, a week with no contribution to any project of an organization represents an actual period of inactivity; by comparison, a week without a contribution may not mean inactivity for a developer whose rhythm comprises only a few commits per month.Therefore, to identify the actual inactivity periods for each developer, we first create an individual array with all their pauses (in days) at the organization-level (i.e., for all the projects therein).Since developers' pauses vary in length, we only consider as inactivity periods those that are 'longer than usual.' To identify longer-than-usual pauses, we set a developer-specific threshold T f ov using the far out values approach for detecting outliers in a distribution (Tukey, 1977).A far out value is defined as an extreme value larger than the upper quartile plus 3 times the interquartile range, i.e., the outer fences in a box plot (see Fig. 4).In our case, let P =< p 1 , p 2 , ..., p n > be the array of all the pauses taken by a given developer, Q 1 (P ) the first quartile of the distribution, Q 3 (P ) the third quartile, and IQR = Q 3 (P ) − Q 1 (P ) the interquartile range (or H-spread ).Thus, based on the previous definition, we calculate the developer-specific threshold as T f ov = Q 3 (P ) + 3 × IQR.
The rhythm of a developer is not only unique, but also likely to change over time.Furthermore, the lifespan of participation in a project and its organization also varies from developer to developer.Therefore, it was necessary to identify a time-interval long enough to observe the natural rhythm of contributions and inactivity periods, and the 'variations' from it.As such, to account for these variations, we refined our approach based on the far out values by introducing a sliding window mechanism.
To define the size (in months) of this window, we experimented with different values, namely w = 1, 3, 4, 6, 12.For each value, we started by considering the initial window w (see step i in Fig. 5), identified all the pauses therein, and computed the far-out values threshold T f ov ; then, we identified the inactivity periods as the longer-than-usual pauses that are above T f ov within the window.After that, we moved the window w forward by 1 week (see step ii ) and repeated the shift until the last week of the contribution history (step iii ), thus eventually extracting all the inactivity periods from a developer's commit history.We selected a window size of 3 months because there are no cases of Truck Factor developers with a history of contributions smaller than Fig. 5 Inactivity periods are identified as the longer-than-usual pauses within each sliding window of size w and a forward shift of 1 week.We tested windows sizes of w = 1, 3, 4, 6, 12 months and selected 3 months as the optimal configuration. the selected window size, as is the case for w = 4, 6, 12 months (i.e., no data point is lost because of a too large window size); at the same time, this value reduces the number of pauses that are longer than the window size, as occurs in the case of w = 1 month (i.e., window size too small).
For further details, please refer to the complete algorithm available in Appendix A.

Labeling Breaks and Transitions
As noted by the participants in the preliminary model evaluation (see Sect. 4.1), developers often collaborate on multiple projects in the same organization.Therefore, broadening the activity analysis of the selected projects' core developers at the organization level was necessary for the completeness of our model.Indeed, today's OSS communities are more and more often examples of complex ecosystems of interrelated projects; for example, one developer might be non-coding or inactive in the front-end project, while they are actively coding to fix a bug in the back-end project from the same organization.
Regarding the type of activities, we label as coding activity both making a commit to a local repository and opening a pull request.For pull requests, we point out that we analyze and include in the activity timeline all the commits therein, whether the pull request was closed (as merged or not) or still open at the time of data extraction.
Other activities such as those related to code review and project management (e.g., closing a pull request, assigning an issue) are classified as noncoding; we included comments in pull request discussions, issues opened, comments in issue discussions, and other actions performed on issues (e.g., subscription, assignment, closing, labeling) that could be gathered via GitHub API.Given the state diagram presented in Fig. 3, we label as non-coding those inactivity periods when at least one non-commit action has been per-formed, whereas we call inactive breaks those periods of inactivity in which no events were extracted.
Regarding the transitions, first we mined the list of commits and other collaboration events (e.g., pull request and issue comments) from GitHub and labeled the back-to-codings (i.e., the transitions from non-coding to active), reactivations (from inactive to active or non-coding), and comebacks (from gone to active or non-coding).
To label the remaining transitions, we defined the time intervals ∆t non−coding , ∆t inactive , and ∆t gone after which a state change can be triggered if there were no commits and/or other events.Regarding the transitions to non-coding and inactive, we used ∆t non−coding = ∆t inactive = T f ov , the same developerspecific, far out value-based threshold used to select the breaks.The rationale behind this choice is to ensure that the time intervals account for the differences in the individual rhythm of contribution of each core developer.Additionally, this choice allows us to identify cases of multiple breaks and state transitions within one inactivity period.
To further clarify these cases, we provide a couple of examples.Fig. 6a portrays an inactivity period within two consecutive commits made by a developer of the atom project on Sept. 11 and Dec. 13, 2015.Various collaboration traces are left by the developer during such an interval (e.g., pull requests and issues comments), each shown in the figure as separate graphs.The threshold ∆t non−coding = ∆t inactive = T f ov is 85.75 days in this case.Because there are no sub-intervals longer than T f ov during which any activity was performed, this inactivity period represents a single break, labeled as non-coding.Instead, looking at the inactivity period depicted in Fig. 6b, we note that the developer was in the non-coding state for 32 days between Sep. 12 and Oct. 14, 2014.After that, (s)he did not leave further collaboration activity traces for longer than T f ov = 30 days, thus transitioning to inactive.As such, this inactivity period contains two breaks: a non-coding break followed by an inactive break.
Finally, regarding the extent of the hiatus ∆t gone after which an inactive developer transitions into gone, existing studies rely on different thresholds to classify abandoning developers.As there is no consensus, we experimentally tested different thresholds.We first tested thresholds of three (Constantinou and Mens, 2017) and six months (Miller et al., 2019).However, they generated too many gone breaks and comeback transitions.Then, we tested and selected ∆t gone = 12 months as the threshold, which gave results in line with our model conceptualization, while also being consistent with the speculations reported in our previous work (Iaffaldano et al., 2019) that a developer's 'death' occurs after a year of complete inactivity.Avelino et al. (2019) also found that this threshold is the least sensitive to error when computing the Truck Factor of popular GitHub projects.6 Phase II -Results

RQ1 -Evaluation of the model and identification method
To evaluate our model and identification method, we collected feedback from Truck Factor developers.Using the algorithm provided by Avelino et al. (2016), we identified 75 Truck Factor developers, distributed across the 18 projects, as shown in Table 3.Among them, 53 have been gone and/or inactive at least once.Out of these, 34 had a valid email address publicly available on their GitHub page.Similarl to the preliminary evaluation (see Sect. 4.1), we designed personalized questionnaires 17 for each developer, asking them (1) to provide feedback about the model and (2) to confirm whether they recognize the breaks and transitions we identified.In particular, we selected up to nine breaks (three instances for each of the three types of break states) among the most recent ones to facilitate their recollection; then, we explained the model, as well as the meaning of each state, and for each break we asked: (i) Were you actually in that state?, (ii) Why did you take that break?, and (iii) Why did you return to commit/collaborate?.We offered a $15 gift card to those who completed the questionnaire.
We received responses from 17 developers out of 34 invited (50% of response rate).A breakdown of the respondents is reported in Table 4.All respondents self-identified as male, with ages ranging between 21 and 50 (avg.35).We received answers from developers of several projects: rails/rails (8), laravel/framework (2), flutter/flutter (2), JabRef/jabref (2), nodejs/node (1), elixir-lang/elixir (1), and github/linguist (1).Model feedback.We asked the developers to rank their agreement with our inactivity model on a four-point Likert scale (1=Strongly disagree, 2=Somewhat disagree, 3=Somewhat agree, and 4=Strongly agree).We received 17 answers, of which only 1 strongly disagreed (6%); the remaining ranks are evenly distributed between moderately agree (8, 47%) and strongly agree (8, 47%).The mean and median values are 3.4 and 3, respectively.The strong disagreement came from a developer who mentioned that "GitHub doesn't track all the ways that I contribute to a project.For example, [...] I review design docs written in Google docs and I attend video conference meetings to mentor teammates" (D-16).This comment suggests that the name non-coding may unintentionally imply that developers are not actively participating unless they are coding.Consistently, albeit D-16 moderately agreed with model, D-13 commented that "there's varying degree of 'active.'Maybe this is more a continuum than one with discrete states", and D-10 suggested that we "should also consider reviewing and filing bugs as active, in case you're not."These observation are congruous with the comment from D-04, who observed that our model "maybe makes the 'commit' the ultimate contribution.For most projects that's the case, but [sometimes] the discussions around the new features added [...] are more important than the few line of codes that implement [them]." We also received a few comments concerning the gone state.For instance, D-05 pointed that the name is maybe too strong and constrained: "Per this definition, I am gone [...], but I'm still around, just no time to code or contribute.I occasionally do come to life, though."In line with this, D-08 noted that "you can be inactive for 12 months or longer without being gone.There's also a bit of an extra stage, which I guess I would just call 'lurker,' where you're still watching everything happening, you are just not participating in it (unless it hits some personal threshold that pulls you back in to participate)."This lurking stage matches our definition of inactive; as such, lurking may become an alternative name for inactive or be included in its the definition.Nevertheless, lurking represents a state from the perspective of the contributor, who can be viewed as inactive from the point of view of the project.Breaks and state transitions acknowledgment.We analyzed the questionnaire responses and, in particular, those cases where developers reported to disagree with the identified breaks and transitions.Table 5 reports the breakdown of the acknowledgments with 88 breaks and 62 agreements overall (71%).We observed 21 cases of disagreements (24%) of which 10 (48%) had "don't know/can't remember " entered as a comment or simply had no comment.To uncover potential limitations of our model, next we focus on discussing the remaining cases of disagreement that were reported with a motivation.
Aligned with our previous observation, the gone breaks generated disagreements due to different perspectives.Two developers disagreed that they were gone.We analyzed their explanations and concluded that they were actually gone according to our definition, but not to their own.In fact, D-05 did not want to consider himself gone despite the fact that he did not contribute for over a year because he "started several companies and work took over."Similarly, D-04 reported that he simply "did not find any bug or thing worthwhile contributing." Regarding the inactive breaks, we found 8 disagreements out of 39 cases (20.5%).In 6 cases, no comments were provided.In two cases, developers disagreed because they were studying new technologies (D-03) or doing design work (D-16) before contributing again.There are two cases we labeled as 'other' (6.1%): D-09 commented that he was sort of inactive: "I stopped being employed to [work] on <project name> full-time in 2010, but I remained pretty active as a mentor for at least one year after that."As for the agreements (29, 74.4%), the most cited reasons for inactive breaks are holidays, paid work, switching jobs, and personal matters.
Regarding non-coding breaks, we found 9 cases of disagreements out of 44 (20.5%), 5 of which present no comments to analyze.D-05 disagreed that he was in the non-coding state.Unfortunately, his comment ("was working on other things") does not provide any clarification: if he was coding for any project of the organization, we would have found traces, because we mined the entire GitHub organization commit history; otherwise, any work on other, unrelated projects would not matter here.The remaining case concerned D-15, who explained his disagreement saying that he had "already left the company" by then and, therefore, he could not possibly be contributing in any way to the project.The comment raised a red flag: having labeled it a non-coding break meant that we had found traces of collaboration activities through the GitHub API.After investigating, we found a couple of events that in which D-15 was mentioned in a PR conversion and assigned to an issue.As such, our speculation is that, unaware of his disengagement from the company, other project contributors kept mentioning him and assigning him to issues in his area of expertise.To avoid these cases, we fine-tuned our break labeling algorithm (see Appendix A) to consider only those GitHub events that imply 'active' participation of developers (e.g., writing a comment), discarding 'passive' events such being mentioned in comment, being un/assigned from/to an issue.All the results reported next in this paper have been obtained after applying this change.Finally, as for the 3 remaining cases of disagreement labeled 'other' (5.7%), D-09 commented that he "likely slowed down because the break was over the July 4 th long weekend, but was still doing some work."Yet, we could only retrieve traces of interaction on GitHub that do not include code contributions.
Table 5 also reports the breakdown of the 85 state transition acknowledgments, with 67 agreements overall (78.8%).We observed 14 cases of disagreements, of which half had "don't know/can't remember " entered as a comment or simply had no comment.Regarding back-to-coding transitions, we found 5 cases of disagreement out of 40 (12.5%),all of which with no comments.Regarding reactivations (transitions from inactive or non-coding into ac- tive), we found 8 disagreements.D-03 pointed out that what we mined "was probably a merge from an old pull request."Yet, after checking, we did find a commit from him that he probably did not remember or missed while checking on GitHub.As for the 2 cases of partial disagreement with reactivation transitions (i.e., classified as other), D-09 commented that, while he disagreed, he was not entirely sure.Again, we found commits authored by him thorough git log.Finally, with respect to the comebacks from the gone state, we found 1 case of disagreement out of 5 (20%), namely from D-05 who commented that he did not actually want to re-engage, but only "needed to fix something in the code." Coding of the reasons why.In our previous work (Iaffaldano et al., 2019), after interviewing OSS developers, we not only designed the preliminary version of the state model presented in Sect.4, but also defined a classification schema for the reasons why they took breaks.To gather a better understanding of the motivations of the developers surveyed in this study, we adapted the existing schema and developed two classifications to code, respectively, the comments to the the second question (i.e., "Why did you take that break?," see Table 6) and the third question ("Why did you return to <state>?," see Table 7) of the questionnaire.
In both coding schemas, we distinguished personal reasons (i.e., related to individual choices, needs, end events) from those concerning the community (i.e., events related to changes or interaction with others within the OSS community).We observe a consistent 80/20% split between the personal-vs.community-related categories.
Regarding the personal reasons for taking a break, they are quite balanced, with 20% of the surveyed developers mentioning professional reasons (e.g., "changed job", "been busy doing analysis work "), financial reasons (i.e., "doing paid work "), and a change of interest (e.g., "started looking for other projects to contribute to").Slightly fewer developers (14.5%) mentioned a life-related event as a cause for taking breaks (e.g., "was on vacation/holidays", "personal health-related issues"), which are more varied.With respect to the communityrelated reasons for taking a break, by far the most common is the change of role (13%), such as becoming project manager or a mentor for newcomers.
Changes in the organization of the community or personal issues with other members are not very common ( 3%).
Regarding personal reasons for returning to commit or contribute, by far the most common are professional (50%), which are cited not only by paid developers-whose employer is sponsoring the project-but also by others who need to fix something in the code of the project that was blocking their work.The other cited reasons concern personal (15.2%, e.g., "found something to contribute to") and financial interest (13%, e.g., "was doing paid work ").Again, life events such as end of vacation/holidays are the least common personal reasons (4.3%) for returning among core developers.With respect to the community-related reasons for returning to commit or contribute, the most common concern the sense of responsibility of the developer ( 13%), who feels committed to the project and dependable, and realizes they hold exclusive, vital knowledge.The other reasons are related to changes of the developer's role or in the project itself, and are considerably less common ( 2%).

RQ1 -Evaluation of the model
-Almost all of the surveyed Truck Factor developers (94%) agree with our state model of inactivity, either strongly (47%) or moderately (47%).-Respectively, 71% and 79% of the surveyed developers acknowledge their breaks and state transitions.-The names chosen for the active and non-coding states can be misleading as they may unintentionally imply that developers are not actively participating unless they are coding.-Most of the cited reasons among core developers for taking breaks as well as returning are personal ( 80%) rather than related to the interaction within the community ( 20%).-The most common personal reasons for taking breaks are professional, financial, or lack of interest.-The most common personal reason for returning to contribute are professional and the sense of responsibility toward the project.

RQ2 -Break Frequency
To answer the research question about the frequency of inactivity periods in OSS development, we quantitatively analyzed the results of our approach to identify the non-coding, inactive, and gone breaks of the core developers from the eighteen sampled projects.To broaden our analysis, in the second phase of analysis we considered the set of 538 core developers identified using the Commit-Based Heuristic approach.Table 8 provides an overview of the core developers who have been inactive at least once, considering their activity in all the projects of each organization.
We notice that 97% of the sampled core developers have been in the noncoding state at least once and 89% have been inactive.Nearly a third (33%) of the developers in our sample transitioned to the gone state at least once.On the one hand, we found projects in which no core developers have been in the gone state (i.e., aseprite, SpaceVim, and crystal-lang) or only one (i.e., ionicteam, BabylonJS, and MinecraftForge).On the other hand, in organizations like rails and github this happened for 65% and 89% of their core developers, respectively.
Finally, in Fig. 7, we report the distributions of the average number of noncoding, inactive, and gone breaks per core developer in each organization.As expected, we observe that the non-coding (mean: 11.63, median: 10, SD: 9.14) periods occur more frequently than inactive breaks (mean: 6.56, median: 5, SD: 5.84), and that gone breaks are the least common (mean: 0.27, median: 0, SD: 0.60).

RQ2 -Break frequency
-Almost all core developers take breaks, as 97% and 89% of them have been in the non-coding and inactive states at least once, respectively.-About one third of the developers analyzed (33%) have been gone at least once and disengaged completely from a project for one year or longer.-Non-coding periods occur more frequently than inactive breaks within organizations; gone breaks are by far the least common.

RQ3 -Break Length
We analyzed the duration (in days) of the breaks, except for gone breaks, since most of them (see Table 8) were still in progress at the time of this analysis.The breaks range from 367 to 2,893 days.Fig. 8 shows the boxplots of the average duration of the non-coding and the inactive breaks per core developer, grouped by organization and on a logarithmic scale.Overall, the average duration of a non-coding break ranges from 8 to 994 days (mean: 32.26, median: 22, SD: 32.03), whereas the duration of an inactive break varies in the range of 8 to 374 days (mean: 65.59, median: 41, SD: 66.43).From Fig. 8, we observe that the median duration of inactive breaks is consistently longer than that of non-coding breaks across all the organizations, with the sole exception of jquery.
To understand whether these differences are statistically significant, for each organization we filtered those developers who have been in both states, and then performed a series of paired tests for each organization.Because the distribution of break durations are not normal, we used the Wilcoxon singedrank test as a non-parametric alternative to t-test for matched pairs.The results are reported in Table 9, where we observe that most of the core developers have been in both states (non-coding and inactive) at least once.We observe a few cases of statistically significant mean difference at the 1% level (after Bonferroni-Holm correction for multiple comparisons) between non-coding and inactive break duration, namely flutter, atom, jquery, crystal-lang, and MinecraftForge.To assess the magnitude of such differences, we computed the effect size as Cliff's δ, using the thresholds provided by (Hess and Kromrey, 2004). 18We found a large effect size for atom, crystal-lang, and Minecraft-Forge.

RQ3 -Break length
-On average, inactive breaks last longer than non-coding breaks (65 vs. 32 days).-This difference is, however, statistically significant only for three organizations (atom, crystal-lang, and MinecraftForge).

RQ4 -State Transition Probabilities
After identifying all the core developers' breaks, we calculated their probabilities to transition from one state into another.Specifically, in Fig. 9, we present the transition probabilities aggregated for all the sampled organiza- 147 "negligible", |δ| < 0.330 "small", |δ| < 0.474 "medium", otherwise "large" tions, whereas in Fig. 10 we report the probabilities for each organization individually.
Active state.From Fig. 9, we can observe that the average probability to remain active is 59.57%.This suggests that more than a half of the analyzed core developers tend to follow a constant rhythm of code contributions.This is similar for each organization in our sample (probabilities in the range 50.0-62.9%,see Fig. 10).On average (see Fig. 9), core developers in the active state are more likely to transition to non-coding (28.99%) than to become inactive (11.44%).This observation also holds true when we analyzed organizations individually (Fig. 10), with the exception of BabylonJS (see Fig. 10k).
Non-coding state.On average, core developers in the non-coding state are much more likely to return to being active (67.5%,see Fig. 9) than remain in the same state (0.33%) or become inactive (32.17%).The same observation also holds true at the individual organization level (Fig. 10).
Inactive state.We found that inactive core developers who do not provide any signal of life/participation in an organization are more likely to resume contributing by writing code-i.e., return to the active state (54.89%,Fig. 9)-than by collaborating without code-i.e., return to the non-coding state (35.05%).Also, our findings suggest that inactive core developers have a lower probability (8.18%) to remain so for more than one year and enter the gone state than to remain inactive for less than 12 months (1.88%).Regard- ing the individual organizations, despite several differences observed, no clear pattern can be identified.Gone state.On average, more than a half of the developers who abandon organizations for more than one year remain in the gone state (53.89%).Still, comebacks are not rare events, with 25.55% and 20.56% going back to the active and non-coding state, respectively.However, looking at the organizations individually, a quite varied picture emerges, which does not appear to be affected by their type.First, we notice that in only 4 out of 18 organizations (i.e., aseprite, MinecrafForge, SpaceVim, and crystal-lang), no core developer has ever disengaged, i.e., transitioned into the gone state.In a way, this pattern of behavior is expected for these smaller communities in which the main project relies on 1-3 Truck Factor developers and 1-6 core developers (see Table 3), whose disengagement would jeopardize the survival of both the project and the community.As such, these developers form a solid base of project maintainers who have never taken breaks longer than a year.For the remaining organizations for which we found cases of disengagement, such as node (Fig. 10a) and linguist (Fig. 10m), we notice that nearly 60% of them remain in the gone state (between 40-100%), while the others resume contributing, i.e., go back to either active or non-coding.
To further our understanding of developers disengaging from projects, we analyzed the association between the level of code contribution to a project and the likelihood of developers transitioning into the gone state at least once.To do so, we used the set of core developers (n = 538) with their associated percentage of code contribution to the projects.Then, we computed the perproject median and assigned each developer to either of two equally-sized bins, called high-and low -level contributors.Finally, we built a logistic regression model where is_gone is the dichotomous dependent variable to be predicted (i.e., whether a developer has ever transitioned to the gone state at least once) and the only predictor is the ordinal variable contribution_level (low/high).
The results of the logistic regression are reported in Table 10.The odds ratio (OR) for the variable contribution_level = high is 0.39.The 95% confidence interval for the OR does not include 1, indicating that the result is statistically significant.Therefore, as compared to low-level contributors, high-level contributors' odds of disengaging from a project at least once (i.e., is_gone = true) decrease by 61%.

RQ4 -State transition probabilities
-Most core developers have a steady rhythm of code contributions, as the average probability of remaining in the active state is 60%.-Most core developers who pause their code contributions and transition into the non-coding and inactive states resume coding and go back to active (on average 67% and 55%, respectively).-On average, about 8% of inactive developers remain so for over a year and eventually become gone, disengaging from the organization.-On average, more than a half of gone developers never come back ( 54%), whereas the remaining half is equally distributed between those who resume coding (i.e., active, 26%) and those who resume collaborating without contributing code (i.e., non-coding, 21%).-No transition into the gone state was observed for smaller projects, counting on 1-3 Truck Factor developers and up to 6 Core developers.-As compared to Low-level contributors, for High-level contributors the odds of disengaging from a project at least once decrease by 61%.

Discussion
In the following, we discuss our results, presenting implications, insights, and future research avenues.
Rhythm-based model.Our revised model, presented in Fig. 3, is a significant extension of the original model presented in (Iaffaldano et al., 2019)inspired by the circadian rhythm, which also varies from individual to individual.The assessment of the revised model by the developers showed that 94% of them agreed with it, and that it correctly captured about 70-80% of the transitions (as per RQ1, see Sect.6.1).Notably, some developers did not remember their breaks or did not acknowledge the transitions.For many cases of transition disagreements, we found that the developers were not actually working on the project, but they were still present in some form (e.g., lurking, waiting for the next thing to do).Although the developers considered themselves active in the project, per the model definition they fit the inactivity state, since they were considered not actively contributing.We also found that even considering multiple events and activities that are public, contributors perform other kinds of activities not observable from the traces left on GitHub.This corroborates the results from Trinkenreich et al. (2020), who provide empirical evidence of the existence and importance of community-centric roles (e.g., advocate, license manager, community founder), which typically remain hidden, since these roles may not leave traces in the software repositories typically analyzed by researchers.Why are you leaving?Our investigation (RQ1) collected evidence on the reasons why core developers who do not completely disengage take breaks and then resume their activity.Albeit not exhaustive, our findings show that in both cases the reasons are mostly personal ( 80%) rather than related to the community (e.g., social interactions, technical changes, and organizational aspects).Such personal reasons concern life events (e.g., child birth, vacation) as well as professional (e.g., new job) and financial changes (e.g., stopped volunteer contribution to focus on a paid job).
These reasons for disengagement extend the current literature that focuses on motivations to join and factors that explain longer attachment (Zhou and Mockus, 2012;Silva et al., 2020;Lin et al., 2017;Constantinou and Mens, 2017;Gerosa et al., 2021).We contribute by showing that, in addition to job-related reasons (also reported also by Miller et al. (2019)), a non-negligible share of the reasons to leave are related to life-events and change of interest.These two latter categories of reasons cannot be easily controlled by project maintainers and are harder to predict based on historical data.On the other hand, we found cases of disengagement related to changes in the community organizational structure, problem with other members' ideas, and lack of available tasks.These reasons are in line with other studies, which showed that changes in the organization (Yu et al., 2012) and mismatch of ideas (Steinmacher et al., 2018) are reasons for contributors to give up.Still, we show that those core members who leave for job-related and life-event reasons generally returnwhich explains more than 50% of probability of contributors moving away from the gone state (see Fig. 9).Variations in inactive periods.Our findings serve as an input to communities that want to create awareness mechanisms to inform about developers' inactivity.As our results showed, almost all developers take some kind of break during their life cycle-89% and 97% have been respectively in the inactive and non-coding states at least once (as per RQ2, see Sect.6.2).Moreover, as investigated in RQ3, the duration of the breaks may vary widely, even across developers from a single project (see Fig. 8).We noted, for example, that for rails/rails, the period in non-coding state varies from 8 to 270 days, while for atom/atom, the variation in gone state is 428-651.This suggests that projects and organizations may find reference values based on the distribution of breaks; however, it is important to closely follow the individual rhythm and characteristics of developers, given that each developer may have a distinct contribution pace.It is not a 'goodbye,' it is a 'see you soon.'One interesting finding from RQ2 is that about one third of the core developers analyzed (33%) have been gone at least once from a project for one year or longer (see Table 8).Furthermore, as per RQ4 (see Sect. 6.4), the transition probabilities presented in Fig. 9 show that the probability of a comeback, considering all projects, is 25.6%.Moreover, the chances that someone inactive will transition to gone is on average 8.2%, which is much smaller than the chances of transitioning from inactive to active (54.9%) or non-coding (30.1%).
Overall, the number of break events observed and the odds in Figs.9-10 suggest that the core members of the analyzed projects are more likely to stay than to abandon projects, especially for smaller projects like aseprite, Space-Vim, MinecraftForge, and crystal, which count on a smaller core base (see Table 3).Consistently, the results of the logistic regression reported in Table 10 indicate that projects remain sustainable because people with high-level contribution rates have higher odds (61%) to stay active.This shows that turnover at the core-developers level is not an issue as we expected-considering the analyzed sample.Moreover, given the age of the projects analyzed (3 to 15 years, median 8), some turnover is expected with the rise of new core members.For example, as reported in Sect.6.2 (see Table 8), 38 core developers from rails were gone at the time of the data collection (41% of the rails sample), and likewise for fastlane (9 devs., 47%) and jekyll (8 devs., 47%).The remaining core developers were, in general, enough to keep the projects running.Although the renewal was not analyzed in this paper, this is something worth investigating from the perspective of project sustainability.Future work may focus on the impact of breaks and disengagement with the project performance, and in understanding how other developers take over when a core member transitions into gone.
Non-coding activities matter.Usually, the indicators of contribution (such as GitHub graphs)19 focus on contributions made to the repository, which represent mainly code contributions.However, it is known that contributors follow different pathways to contribute (Trinkenreich et al., 2020) and non-code contributions are often neglected (Ferreira et al., 2017;Decan et al., 2020;Miller et al., 2019).We found in our analysis that almost all core contributors take breaks from code contributions but take over non-coding roles (see Table 8).Separating the analysis of these two types of contributions and making non-coding one state in the model is a way of not obfuscating the importance of-and give visibility to-non-code contributions.
Adjustments to the model.Given the feedback received during the qualitative assessment, we accordingly adjusted the model.First, we had to account for the 'degrees of activity' provided by our model.Our analysis of RQ1 (see Sect. 6.1) revealed that the model was accepted by 94% of the surveyed developers.However, we noticed room for improvement based on the received feedback.One shortcoming that clearly emerged from the analysis of comments was that the activity state came with different degrees of action and that the model seemed to convey the idea that committing source code is the ultimate form of 'action.'While it is impossible to represent all the nuances of development activities in a state diagram, we accepted the feedback and refined the model so that it would not unintentionally mislead developers into thinking that non-coding means that they are not actively participating.As such, in the final version of the model, presented in Fig. 11, we renamed the active state to active coding and non-coding to active non-coding.
Regarding the identification of the transitions into the gone state, the feedback we received showed that 4 out of 5 developers did not acknowledge that they had been gone (see Sect. 6.1).As aforementioned, these developers point out that they did not want to leave; rather, they state that they did not find anything to work on (D-04) or were too busy with their paid job (D-05).We argue that the disagreement is related to different perspectives taken: project vs. personal.Our model takes the perspective of the project, mapping when someone takes an overly long break.When a developer reaches the threshold of one year without any traces, per the model definition they are considered gone.Future Work Opportunities and Implications for Researchers.We designed and evaluated our model and method to understand the different states of the contributor's life cycle and their transitions.Researchers can extend our approach to build prediction models to anticipate breaks and disengagement.An interesting future work would be to create and evaluate such prediction tools by extending the work of Decan et al. (2020) to consider the length of the breaks and non-coding activities as part of developers' contributions.
Researchers can also expand our research by including maintainers' perception of core developers taking breaks.We have so far focused on identifying and acknowledging breaks according to developers' own perception, but we do not yet know how these inactivity periods and breaks are seen by other community members.
Although we mined multiple public events from GitHub, sometimes we considered developers as not active because their activities did not leave traces on GitHub.Researchers who investigate contributors' activity in software repositories should be aware of this kind of activity.Usually, the literature focuses on coding activities, disregarding other kinds of contributions and 'hidden figures' (Trinkenreich et al., 2020).Although we only looked for the activities available on GitHub, this opens doors for further research about 'invisible collaboration,' which may not be mined from project repositories.
A larger sample of repositories can also be investigated, including some private repositories owned by companies, to gather a better picture of how disengagement varies across different projects and ecosystems, and whether developers migrate.In addition, we did not analyze whether the breaks are related to the project schedule or coordinated among team members.A potential future direction would be an in-depth exploration of these relationships.
Another way to expand our work is to look at the effects of sponsorship on developers' inactivities-to identify potential differences in frequency and length for company-supported OSS projects that routinely receive contributions from employed developers-as well as study the impact on productivity at the project level-to understand whether and how overall productivity takes a hit during core contributors' breaks.
Implications for Practitioners.Our results show that developers have different work rhythms, and that the transitions to breaks and abandonment vary from project to project.In fact, this is in line with the literature that has shown software engineering metrics to be highly context dependent (Zhang et al., 2013;Gil and Lalouche, 2016;Aniche et al., 2016).Nevertheless, in this paper we presented individual analyses of a variety of projects and organizations, which allows practitioners to observe similarities and differences, fostering the generalization by similarity (Wieringa and Daneva, 2015).Our results also indicate a range of values to which projects can compare themselves.
Inspired by our model and methods, communities can also build tools to monitor the rhythm of individual developers (e.g., building dashboards) and detect when they take longer-than-usual breaks, helping maintainers and managers to assess developers' life cycles individually.While informally discussing our results with a core developer who participated in the research, he mentioned that this kind of information would be valuable to create a "Contributor Relationship Management System" (CRM).In this case, according to him, "customers are the contributors.As an OSS project owner, one wants to keep the contributors contributing and identify high-potential ones.When I see high-potential contributors being away longer than usual, I would use the communication channel they like."Such a system could work as a heart monitor, alerting the community managers of potential cases of 'non-reported' breaks.This may help maintainers, for example, identify developers who are 'waiting for the next thing to do,' and have been inactive for longer than expected.It is important to keep these developers in the loop, because they may eventually lose track of the technical and social aspects of the project, which may lead to disengagement.This is especially important since most of the reasons to leave reported by our survey respondents are personal (see Table 6), beyond the project's reach.Limitations.Every empirical study suffers from threats to validity and limitations (Wohlin et al., 2012).First, we acknowledge that we focused on studying a sample of eighteen projects and organizations hosted on GitHub.Although GitHub hosts millions of open source projects20 , we acknowledge that the results may be different for other forges and ecosystems.Future research can expand our work to consider other environments.Moreover, because we used a convenience sampling strategy, we acknowledge the potential sampling bias.Also, despite our efforts to add variety to the sample, most of the projects analyzed are popular development frameworks and tools for developers-with the exceptions of jabref, MinecraftForge, aseprite, and SpaceVim.This imbalance is likely the side effect of having started the sampling process by selecting projects and organizations from those trending on GitHub.Accordingly, our findings may not generalize to other OSS communities, given that each project and organization has their peculiarities.Nonetheless, the projects in our sample are diverse in terms of programming languages, size, age, and popularity, which helped us provide an initial understanding of the developers' inactivity phenomenon.Presenting individual analyses of these projects and organizations allows the identification of similarities and differences, which fosters the generalization by similarity (Wieringa and Daneva, 2015).
Also related to the generalization of results, after sampling the 18 organizations, in Sect.5.1 we described how we selected their main project as the largest one in terms of the number of contributors, stars, forks received, pull requests, and LOC.While most of the sampled projects are undoubtedly representative of the organizations-since they have identical or closely resembling names (e.g., atom/atom)-this might not be the case for github/linguist and facebook/react.Both github and facebook organizations are 'umbrellas' hosting any open-source effort from the two companies and, as such, they include a varied set of loosely related projects.Hence, we acknowledge that linguist and react may not be representative of their respective organization and that this difference may impact the generalizability of our findings.
The method designed to identify breaks showed a reasonably good performance (between 70-80% of developers acknowledged their state and transitions), albeit not perfect.One possible explanation for the disagreements is that we consider only the activities tracked by GitHub, whereas other actions happen outside of the platform, like communicating over email or Gitter.While we acknowledge this as a limitation of our current method, we point out that there is a great variety of communications solutions (not to mention the other non-communication tools), which makes it impracticable to cover all of possible alternatives.In addition, these tools do not always leave freely accessible traces of activity.Projects using Slack, for example, make it particularly hard to download and analyze the communication logs.Moreover, when integrating data from multiple sources, we point out the need to disambiguate developers' multiple logins and identifications (Wiese et al., 2016).Finally, we hope that, by making all the scripts available, there can be independent replications and extensions of our work that bridge this gap by investigating the effect of external communication channels on our method.
When identifying the gone breaks, our model cannot distinguish between cases in which developers have the intention of abandoning a project from those who are just taking a break longer than a year (for whatever reason) until they will eventually come back and resume their activity.Furthermore, the survey about the reasons for taking breaks and resuming contribution (described in Sect.6.1) has revealed that reasons can also depend on the decision of the employer.It is indeed more and more common for paid developers to contribute to OSS projects; some companies even share their products under open-source licenses, while keeping their employees in charge of maintaining the projects (Pinto et al., 2018).In our current analysis, we do not distinguish between company-sponsored projects, which routinely receive contributions from employed developers, and community-based projects, which mostly grow through voluntary contributions.In fact, the distinction between the two types can be blurry, since there are often sponsored contributions to community-based projects and vice-versa.To date, there have been semiautomatic attempts at classifying the contribution between paid and non-paid developers (Pinto et al., 2018); however, there is no broadly accepted method to be employed at scale, without time-consuming manual intervention.
Another potential limitation of this work relates to the choice of the algorithms used to identify the sample of core developers under investigation.After manual validation by comparing the Truck Factor developers sample with the contributors listed in the projects' contributors pages on GitHub, 21 we observed that the top contributor was not automatically selected in the cases of jabref and ionic-framework, and we had to manually amend these errors.As such, the Truck Factor algorithm may suffer from shortcomings in identifying all the core developers, arguably due to how authorship of a file is determined.However, by also using the Commit-Based Heuristics approach, which broadens the list of core developers of each project to the authors of 80% of non doc-only commits, we implicitly address those limitations.
Lastly, the method proposed to analyze the developer's life cycle may be seen as arbitrary.To mitigate this issue, we evaluated our approach via survey twice, first when conceiving the model (Sect.4), and then as an answer to RQ1 (Sect.6.1), which resulted in the final version depicted in Fig. 11.

Conclusion
In this paper, we analyze the life cycle of OSS project developers concerning their breaks and disengagement.First, by collecting data from GitHub and conducting a survey with developers, we devised a model that describes states and transitions related to the breaks.Then, we designed and implemented an approach to identify developers' transitions to/from inactivity states, which we validated by collecting feedback from developers-94% of the TF core developers agree with our state model of inactivity and 71% and 79% of them acknowledge their breaks and state transitions, respectively.
Our results also showed that breaks are rather common, and that core members take frequent breaks, which vary in length and type.We observed that all developers took at least one break, 97% of them transitioned to noncoding and 89% to inactive.
We also analyzed the probability of the transitions to/from inactivity states and observed that core developers will likely remain in the projects.However, if they transition to gone, they are less likely to come back ( 54% of chance, on average).Our results may help communities to create mechanisms to monitor core members breaks or to prepare to recruit new members in case of disengagements.

A Breaks Algorithm
This is the pseudocode of our sliding-window algorithm for identifying breaks in the commit history of developers.
A window W is defined as a couple < start, end >.A pause p is defined as a couple < start, end > in between two consecutive activities.A break b, which by definition is a pause in a window W that is larger than the far out value threshold t, is therefore defined as a tuple < W, p, t >.To compute the far out value threshold, a window W must contain at least 4 pauses (data points); if so, W is considered valid.Finally, a threshold t in a window containing a distribution of pauses for which the IQR ≤ 1 is considered not valid.IQR, defined as the range holding 50% of the data, is a measure of variability.Small values of IQR in a window suggest that the lengths of pauses concentrate around the mean value.

Fig. 1
Fig. 1 An overview of the research framework.

Fig. 3
Fig. 3 The state diagram of the revised model.A transition label represents the event triggering the state change, whereas internal activities are performed while in state.

Fig. 4
Fig. 4 An example of box plot with far out values above the outer fence.
(a) An inactivity period containing one break labeled as non-coding.(b)An inactivity period containing a non-coding break that transitions into an inactive break.

Fig. 6
Fig. 6 Examples of how breaks are identified within inactive periods and labeled.

Fig. 7
Fig. 7 Distributions of the average number of non-coding, inactive, and gone breaks per core developer in each organization.The boxplots are sorted in descending order with respect to the duration of non-coding breaks.

Fig. 8
Fig. 8 Distribution of the median duration (in days on a logarithmic scale) of non-coding vs. inactive breaks for each developer.The boxplots are sorted in descending order with respect to the duration of non-coding breaks.

Fig. 9
Fig. 9 Aggregated transition probabilities all core developers in the sampled organizations.

Fig. 10
Fig. 10 Transition probabilities for core developers of the analyzed organizations.

Fig. 11
Fig. 11 Final version of the inactivity model with revised names for active-coding and active non-coding states.

Table 1
The eighteen GitHub organizations sampled for this study, sorted by number of repositories, and the selected projects (as of January 2020).

Table 3
Truck Factor developer s (TF) and Core developers responsible for the 80% of total commits (Core), extracted for each of the projects in the sample.

Table 4
Breakdown of survey respondents.

Table 5
Break and state transition acknowledgments from the surveyed Truck Factor developers.

Table 6
Coding schema for the reasons to take breaks.

Table 7
Coding schema for the reasons to return to commit/contribute.

Table 8
Core Developers who have been inactive per organization.

Table 9
Results of the Wilcoxon signed-rank (paired) tests to reveal mean differences between non-coding and inactive break durations for developers who have been in both states (results in bold are significant at the 1% level after p-value correction and also have a large effect size).

Table 10
Results of the logistic regression shows that for high-level contributors the odds of disengaging at least once from projects are 61% (OR=0.39)less than low-level contributors.