1 Introduction

Over the last several decades, open source software (OSS) has gradually gained acceptance and widespread use. The development mode of OSS is quite different from that of closed source software (CSS). CSS development and testing are hierarchically managed and proceed according to a pre-arranged plan and target, so the developers, testers and debuggers remain relatively stable throughout development and testing. In contrast, OSS is developed, tested and debugged dynamically by developers, users and volunteers in a networked, open environment. To improve OSS reliability, the industry generally relies on frequent releases, but this raises two problems. First, if the OSS is released too early, it will contain many faults, which seriously affects its use. Second, if the OSS is released too late, users and volunteers will grow impatient and turn to alternative OSS. Consequently, the reliability of OSS is widely questioned.

To address the problem of OSS reliability evaluation, researchers have developed several OSS reliability models. For instance, Tamura and Yamada [1] established an OSS reliability model using stochastic differential equations. Li et al. [2] observed that the fault detection rate of OSS first increases and then decreases, and established a corresponding OSS reliability model. Yang et al. [3] built delayed OSS reliability models considering the relation between fault introduction and fault detection. Focusing on OSS debugging activities, Lin and Li [4] established an OSS reliability model based on rate-based queueing theory. Huang et al. [5] proposed an OSS reliability model in which fault detection follows a bounded generalized Pareto distribution. Singh et al. [6] built an entropy-based multi-version reliability model for OSS and an optimal release strategy considering user and volunteer satisfaction levels. Wang and Mi [7] established an OSS reliability model in which the fault detection rate exhibits a decreasing trend.

Although existing OSS reliability models can effectively assess OSS reliability under certain open source conditions, the variability and complexity of OSS development, testing and debugging environments mean that they cannot fully satisfy the needs of actual OSS reliability assessment.

In the development and testing of CSS, static testing methods such as peer review, walkthrough and inspection ensure quality improvement. When faults are detected, developers (or debuggers) and testers can communicate face to face to discuss the causes of faults and how to remove them. Thus, in the CSS testing environment, faults can be well described and completely removed. In OSS testing, however, detected faults are reported to the developers by users and volunteers through the network, and the developers then organize personnel to remove them. In this open environment, users or volunteers often cannot clearly describe fault information to developers, so faults may be removed incompletely and new faults may be introduced. Moreover, in OSS fault tracking systems, a fault's state may be changed from closed back to reopened, indicating that the removed fault was not completely eliminated or that new faults were introduced. Therefore, reliability modeling of OSS requires a reasonable and effective treatment of the phenomenon of fault introduction during the OSS debugging process.

Given the complexity and nonlinearity of fault introduction during OSS debugging, the fault introduction rate may decrease over time, or first increase and then decrease, among other patterns. Assuming that fault introduction obeys a single law of change therefore does not accord with the actual behavior of fault introduction during OSS debugging. An OSS reliability model built on such an assumption cannot satisfy actual OSS reliability assessment; at best, its adaptability is very poor, and it cannot cope with the complex reliability assessment of OSS.

Additionally, because OSS differs greatly from CSS in its testing and development processes, the software evolves through the active participation of volunteers and users in different geographical locations. Moreover, each OSS release adds features, functions or components relative to the previous version. These changes make fault introduction complex: some new faults are introduced while removing faults in the current version, and others are newly generated by the added features, functions and components. When a newly generated fault is removed, a new fault may again be introduced. These two kinds of introduced faults make OSS debugging complex. Considering these characteristics, we can use the generalized inflection S-shaped (GISS) distribution to model the complex changes of introduced faults in OSS. Because the GISS distribution can simulate these changes well, the model proposed in this paper differs from the CSS reliability model in the existing literature [8], in which fault detection obeys the GISS distribution. Thus, the established model can fully fit the OSS testing and development environment, and effectively support residual fault prediction and software reliability evaluation for OSS.

In this paper, we develop an OSS reliability model in which fault introduction obeys the GISS distribution. Under this assumption, the fault introduction rate can exhibit a variety of complex nonlinear changes, for example, first increasing and then decreasing over time, or monotonically decreasing or increasing over time, as well as linear behavior, for instance, remaining constant. The resulting OSS reliability model has strong adaptability and robustness, and can accommodate the changes of fault introduction during the development, testing and debugging of various OSS. Therefore, the established model can be used to assess OSS reliability.

The contributions of this paper are as follows:

1. To the best of our knowledge, this is the first work to propose that fault introduction obeys the GISS distribution.

2. The proposed model can adapt to the complicated changes of fault introduction in actual OSS development, testing and debugging.

3. The proposed model can be used to assess the reliability of actual OSS projects.

4. We solve the problem of establishing a software reliability model with various fault introduction changes under complex open source environments.

The structure of this paper is organized as follows. Related work is introduced in Sect. 2. Section 3 presents the fault introduction rate with the GISS distribution and introduces the proposed model. Section 4 introduces the OSS fault data sets, model comparison criteria and parameter estimation methods of the proposed model, and reports the model performance comparison experiments. Sensitivity analysis of the proposed model parameters is carried out in Sect. 5. The implication of the study is discussed in Sect. 6. Section 7 discusses threats to the validity of the developed model. The last section concludes the paper.

2 Related work

Over the past few decades, the reliability of OSS has become a hot issue in the industry. "Release early, release often" [9] has become a way for developers to improve OSS reliability, but it faces two problems: (1) if the software is released too early, it will contain too many faults, users will lose interest in using it, and as a result the software cannot be widely tested and used; (2) if the software is released too late, users will adopt other open source software instead, and the released software ends up unused, untested and abandoned.

By studying the change rule of OSS fault data, researchers have proposed that reliability models of CSS can also be applied to evaluate OSS reliability. For example, Zou and Davis [10] proposed that CSS reliability models can evaluate OSS reliability, and that the CSS reliability model based on the Weibull distribution has better fitting and predictive performance than other CSS reliability models. Anbalagan and Vouk [11] proposed that traditional reliability models can evaluate OSS reliability by studying problem reports. Similarly, Rossi et al. [12] studied bug repositories and concluded that OSS reliability can be evaluated by traditional CSS reliability models. Chiu et al. [13] proposed a software reliability model that considers learning effects in fault detection. Okamura et al. [14] studied the distribution of failure time and established a software reliability model based on the normal distribution. Additionally, considering the weighted value changes of support vector regression (SVR), Utkin and Coolen [15] proposed a corresponding software reliability model. Wang and Zhang [16] used a deep learning method with an RNN encoder-decoder to predict software reliability. Ke and Huang [17] developed software reliability models based on change points of testing effort.

Because the development and testing environment of OSS is completely different from that of CSS, evaluating OSS reliability with CSS reliability models has been widely questioned. Therefore, researchers have proposed several OSS reliability models. For instance, Tamura and Yamada [18] established a logarithmic Poisson execution time reliability model for OSS and used the analytic hierarchy process to estimate the model parameters. Gyimothy et al. [19] used various object-oriented metrics to study fault prediction for OSS. Tamura and Yamada [20] used neural networks to build an OSS reliability model and gave an optimal software release method. Syed-Mohamad and McBride [21] indicated that the testing and development process of each OSS is different, and that a reliability model can be established according to the specific situation. Tamura and Yamada [22] developed an OSS reliability model using deterministic chaos theory.

Singh et al. [23] built an OSS reliability model considering different kinds of faults in the fault data sets. Singh et al. [24] proposed an OSS reliability model based on change points. Considering the dynamic changes of OSS development and testing, some researchers have used stochastic differential equations to build corresponding OSS reliability models [1, 25,26,27]. Considering the frequent release of OSS, others have proposed reliability models for multi-version or multi-release OSS [6, 28,29,30]. Tamura et al. [31] proposed a reliability model of open source cloud computing based on jump diffusion. Tamura [32] proposed a reliability model of open source cloud computing considering irregular fluctuation of the fault detection rate. Tandon et al. [33] developed entropy-based multi-release OSS reliability models. By studying change-point behavior, Kapur et al. [34] developed two-dimensional OSS reliability models. Ivanov et al. [35] used the goal question metric (GQM) method to evaluate and forecast software reliability in open source settings considering mobile operating environments. Additionally, Barack and Huang [36] used several software reliability models to evaluate and forecast OSS reliability. Wang [37] proposed an OSS reliability model taking into account fault introduction based on the Pareto distribution.

Yang et al. [38] built a change-point-based reliability framework for OSS using masked data, and used the expectation maximization algorithm to solve the likelihood function when estimating model parameter values. Considering imperfect debugging and change points, Saraf et al. [39] developed a multi-release framework for fault correction and fault detection in OSS. Sun and Li [40] proposed several software imperfect debugging models taking into account fault introduction and fault levels.

Owing to the complexity and dynamic change of OSS testing and development, no single software reliability model can be used for all OSS reliability assessments. Thus, Ullah and Morisio [41] used an optimal model selection method to determine which software reliability model should be used to estimate the reliability of a given OSS. Considering the complexity of fault detection during software development and testing, Raghuvanshi et al. [42] established a time-variant software reliability model. In addition, considering the dynamic changes of OSS testing and development processes, Saraf et al. [43] developed multi-release reliability models with imperfect debugging and change points for OSS. To evaluate OSS reliability in various complex situations, we develop an OSS model that considers fault introduction and can adapt to various open source environments.

3 Building OSS reliability model

3.1 Fault introduction with a generalized inflection S-shaped distribution

Suppose that fault introduction follows the GISS distribution; then the fault introduction rate can be expressed as follows:

$$ \omega (t) = \frac{f(t)}{{1 - F(t)}} $$
(1)
$$ F(t) = \frac{{1 - \exp \left( { - \alpha t^{d} } \right)}}{{1 + \beta \exp \left( { - \alpha t^{d} } \right)}},\quad \;\alpha > 0,\,d > 0,\,\beta \ge 0\; $$
(2)
$$ f(t) = \frac{{\text{d}}F\left( t \right)}{{\text{d}}t} = \frac{\left( {1 + \beta } \right)\alpha d\,t^{d - 1} \exp \left( { - \alpha t^{d} } \right)}{\left( {1 + \beta \exp \left( { - \alpha t^{d} } \right)} \right)^{2} } $$
(3)

where \(\omega (t)\) denotes the fault introduction rate function, F(t) denotes the GISS distribution, d represents a shape parameter, and \(\alpha\) and \(\beta\) are scale parameters.

In general, there are two main fault types in open source software: mutually independent faults and stochastically dependent faults. When independent faults are removed, the possibility of introducing new faults is relatively small. When interdependent faults are removed, new faults are more likely to be introduced, or the faults cannot be completely removed. In other words, during OSS debugging, removing simple faults is less likely to introduce new faults than removing complex faults. Thus, we can establish the following differential equations:

$$ \left\{ \begin{gathered} \omega (t) = \alpha \left[ {\gamma + (1 - \gamma )F(t)} \right] \hfill \\ \omega (t) = \frac{f(t)}{{1 - F(t)}} = \frac{{\frac{{{\text{d}}F(t)}}{{{\text{d}}t}}}}{1 - F(t)} \hfill \\ \end{gathered} \right. $$
(4)

where \(\gamma\) and \((1 - \gamma )\) denote the fractions of introduced faults of the first and second types, respectively.

Solving Eq. (4), we can derive the following distribution of fault introduction:

$$ F(t) = \frac{1 - \exp ( - \alpha t)}{{1 + \frac{1 - \gamma }{\gamma }\exp ( - \alpha t)}} = \frac{1 - \exp ( - \alpha t)}{{1 + \beta \exp ( - \alpha t)}},\;\;\;\alpha > 0,0 < \gamma \le 1,\;\beta = \frac{1 - \gamma }{\gamma } \ge 0 $$
(5)

where \(\beta\) is called an inflection factor.
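For completeness, the integration step can be sketched briefly (a reconstruction consistent with Eqs. (4) and (5), using the initial condition F(0) = 0): substituting the first equation of (4) into the second, separating variables and applying partial fractions gives

$$ \int \frac{{\text{d}}F}{\left[ {\gamma + (1 - \gamma )F} \right]\left( {1 - F} \right)} = \int \alpha \,{\text{d}}t\;\; \Rightarrow \;\;\ln \frac{\gamma + (1 - \gamma )F}{1 - F} = \alpha t + \ln \gamma , $$

and solving for F(t) yields Eq. (5).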

In the debugging process of open source software, software complexity is also an important factor affecting fault introduction. Here, software complexity covers both the complexity of the software itself and the complexity of the debugging environment. The former includes the complexity of software scale, algorithms, logic, architecture, and parallel storage and computing. The latter refers to the composition of the debugging personnel, the debugging resources available to them, and their debugging environments. Because open source software is mainly completed by developers, users and volunteers, there are no fixed debugging personnel; detected faults are randomly assigned to debuggers by the developers. The skills, resources and experience of the debuggers affect the introduction of faults when detected faults are removed. Therefore, we introduce a function \(\varphi (t)\), assumed to be nonnegative and integrable on (0, t), to express the effect of software complexity on \(\omega (t)\) during OSS debugging.

To limit the number of parameters introduced by \(\varphi (t)\), because the more parameters a function has, the more complex and difficult parameter estimation becomes, we use the simple power-law function \(\varphi (t) = dt^{d - 1}\), which provides a good tradeoff between flexibility and simplicity. Thus, we can extend Eq. (4) as follows:

$$ \left\{ \begin{gathered} \omega (t) = \alpha \left[ {\gamma + \left( {1 - \gamma } \right)F(t)} \right]\varphi (t) \hfill \\ \omega (t) = \frac{f(t)}{1 - F(t)} = \frac{\frac{{\text{d}}F(t)}{{\text{d}}t}}{1 - F(t)} \hfill \\ \varphi (t) = dt^{d - 1} \hfill \\ \end{gathered} \right. $$
(6)

Solving Eq. (6), we can derive the following distribution of fault introduction considering software complexity:

$$ F(t) = \frac{{1 - \exp \left( { - \alpha t^{d} } \right)}}{{1 + \frac{1 - \gamma }{\gamma }\exp \left( { - \alpha t^{d} } \right)}} = \frac{{1 - \exp \left( { - \alpha t^{d} } \right)}}{{1 + \beta \exp \left( { - \alpha t^{d} } \right)}},\;\quad \alpha ,d > 0,0 < \gamma \le 1,\;\quad \beta = \frac{1 - \gamma }{\gamma } \ge 0 $$
(7)

where d is a shape parameter. Although Eq. (7) has a simple structure, it is very flexible in describing fault introduction that obeys the GISS distribution.

Substituting (2) and (3) into (1), we obtain

$$ \omega (t) = \frac{\alpha dt^{d - 1} }{1 + \beta \exp ( - \alpha t^{d} )} $$
(8)

where, in Eq. (8), \(\alpha\) represents the scale parameter, d is the shape parameter, and \(\beta\) denotes the inflection factor.

In Eq. (8), as \(t \to \infty\), \(\omega (t) \to \alpha dt^{d - 1}\). Thus, when \(t \to \infty\) and d < 1, \(\omega (t) \to 0\); when \(t \to \infty\) and d = 1, \(\omega (t)\) tends to the constant \(\alpha\); and when \(t \to \infty\) and d > 1, \(\omega (t)\) tends to infinity.

Figure 1a–c shows the complex changes of the fault introduction rate \(\omega (t)\) over time as the parameters d and \(\beta\) vary. From Fig. 1a, \(\omega (t)\) tends to zero when d < 1 and \(t \to \infty\); in addition, when \(\beta = 100\), the fault introduction rate first increases and then decreases over time. In Fig. 1b, when d = 1, \(\omega (t)\) tends to a constant. In Fig. 1c, when d > 1, \(\omega (t)\) tends to infinity; moreover, when \(\beta = 100\), the fault introduction rate shows an S-shaped change over time.

Fig. 1 The changes of the fault introduction rate \(\omega (t)\) over time
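The limiting behavior above is easy to check numerically. The following minimal sketch (not from the paper; the parameter values are illustrative assumptions) evaluates Eq. (8) for the three regimes of d:

```python
import numpy as np

def omega(t, alpha, d, beta):
    """Fault introduction rate of Eq. (8): alpha*d*t^(d-1) / (1 + beta*exp(-alpha*t^d))."""
    return alpha * d * t ** (d - 1) / (1.0 + beta * np.exp(-alpha * t ** d))

t = np.linspace(0.1, 200.0, 2000)
for d in (0.5, 1.0, 1.5):            # d < 1, d = 1, d > 1
    for beta in (0.0, 100.0):        # beta = 100 yields the inflected (S-shaped) behavior
        w = omega(t, alpha=0.05, d=d, beta=beta)
        print(f"d={d}, beta={beta}: omega(200) = {w[-1]:.4f}")
```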

Fault introduction that obeys the GISS distribution can represent a variety of complex changes and can accommodate the complicated changes of actual fault introduction in the OSS debugging process. Thus, fault introduction with the GISS distribution is consistent with fault introduction in actual OSS debugging, and an OSS reliability model in which fault introduction obeys the GISS distribution can fully satisfy actual OSS reliability evaluation.

3.2 Proposed model

The assumptions of the proposed model are as follows:

1. The fault detection process obeys a non-homogeneous Poisson process (NHPP).

2. The number of faults detected in \((t,t + \Delta t)\) is proportional to the number of remaining faults in the software.

3. During OSS debugging, new faults can be introduced when detected faults are removed.

4. Fault introduction obeys the GISS distribution.

5. The number of introduced faults is related to the number of detected faults.

From Assumptions 1 and 2, the following differential equation can be derived:

$$ \frac{{{\text{d}}\mu (t)}}{{{\text{d}}t}} = \theta \left( {\psi \left( t \right) - \mu \left( t \right)} \right) $$
(9)

where \(\mu (t)\) and \(\psi (t)\) denote the mean value function and the fault content function, respectively, and \(\theta\) denotes the fault detection rate.

From Assumptions 3, 4 and 5, the following equation can be derived:

$$ \psi (t) = \omega (t)\mu (t) + \eta $$
(10)

where \(\omega (t)\) denotes the fault introduction rate function, and \(\eta\) denotes the expected number of initially detected faults.

Substituting Eqs. (8) and (10) into Eq. (9), we can derive the following equation:

$$ \mu \left( t \right) = \frac{{\eta \exp \left( { - \theta t + \theta \alpha t^{d} } \right)\left( {1 + \beta \exp \left( { - \alpha t^{d} } \right)} \right)^{\theta } \left( {\exp \left( {\theta t} \right) - 1} \right)}}{{\left( {1 + \beta } \right)^{\theta } }} $$
(11)

Equation (11) is the expression of the developed model; please see the Appendix for the detailed derivation.
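As a quick sanity check, Eq. (11) can be implemented directly. The sketch below (with illustrative parameter values, not the paper's fitted estimates) evaluates the mean value function:

```python
import numpy as np

def mu(t, alpha, beta, d, theta, eta):
    """Mean value function of the proposed model, Eq. (11)."""
    return (eta * np.exp(-theta * t + theta * alpha * t ** d)
            * (1.0 + beta * np.exp(-alpha * t ** d)) ** theta
            * (np.exp(theta * t) - 1.0)
            / (1.0 + beta) ** theta)

t = np.arange(1.0, 31.0)  # e.g., 30 observation periods
print(mu(t, alpha=0.05, beta=10.0, d=0.8, theta=0.2, eta=5.0))
```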

4 Numerical examples

4.1 Illustrations of fault data sets, comparison models and model comparison criteria

In this paper, we collected three fault data sets from three projects of the Apache OSS products; the website is https://issues.apache.org. Each OSS fault data set includes three successive fault data subsets; see Table 1 for details. Note that detected OSS faults are stored in the bug tracking system, where fault resolutions include FIXED, INVALID, WONTFIX and DUPLICATE, etc. We remove faults whose resolutions are INVALID, WONTFIX or DUPLICATE, and collect the remaining faults in our OSS fault data sets.

Table 1 OSS fault data sets
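As an illustration of the filtering step described above, a minimal sketch is given below; the file name and column names (`resolution`, `created`) are hypothetical and depend on how the issue-tracker export is formatted:

```python
import pandas as pd

# Hypothetical CSV export of the bug tracking system; column names are assumptions.
issues = pd.read_csv("apache_issues.csv", parse_dates=["created"])

# Keep faults whose resolution is not INVALID, WONTFIX or DUPLICATE (Sect. 4.1).
excluded = {"INVALID", "WONTFIX", "DUPLICATE"}
faults = issues[~issues["resolution"].isin(excluded)]

# Weekly cumulative number of detected faults, the input to the reliability models.
counts = faults.set_index("created").resample("W").size().cumsum()
print(counts)
```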

To fully verify the performance of the developed model, we compared the proposed model with five software reliability models using six model comparison criteria [44]. The six criteria are the mean square error (MSE), R², RMSE, TS, Bias and AIC. The five models are the G–O model, the Weibull distribution model, the generalized inflection S-shaped (GISS) model, the Wang model and the Li model. The G–O model, the Weibull distribution model and the GISS model are CSS reliability models; the Li model and the Wang model are OSS reliability models. Tables 2 and 3 list detailed information on the model comparison criteria and comparison models used in this paper, respectively. Note that the G–O model is used as a comparison model mainly because it is a classical CSS reliability model with a concave curve.

Table 2 Illustrations of model comparison criteria
Table 3 Illustrations of software reliability models
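For reference, a minimal sketch of such comparison criteria is given below, using definitions common in the software reliability literature; the exact formulas in Table 2 (in particular the TS variant and the AIC computation) may differ, so treat these as assumptions:

```python
import numpy as np

def criteria(y, yhat, n_params):
    """Comparison criteria; y: observed cumulative faults, yhat: model estimates."""
    n = len(y)
    resid = y - yhat
    mse = np.mean(resid ** 2)
    rmse = np.sqrt(mse)
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y - np.mean(y)) ** 2)
    bias = np.mean(yhat - y)
    ts = 100.0 * np.sqrt(np.sum(resid ** 2) / np.sum(y ** 2))  # Theil statistic (%)
    aic = n * np.log(mse) + 2 * n_params  # Gaussian least-squares surrogate for AIC
    return {"MSE": mse, "RMSE": rmse, "R2": r2, "Bias": bias, "TS": ts, "AIC": aic}
```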

4.2 A parameter estimation method of the proposed model

In general, there are two common methods for estimating model parameters: maximum likelihood estimation (MLE) and least squares estimation (LSE). There are two reasons to consider LSE for estimating the model parameters: (1) although MLE is superior to LSE for large samples, LSE is not inferior to MLE for small samples, and, due to the limitation of testing time, collected fault data sets are generally small samples; (2) a maximum of the likelihood function may not exist for some models [8]. To be able to estimate the model parameters fully for model comparison, we use both LSE and MLE in this paper.

The least squares estimation (LSE) objective can be written as follows:

$$ L = \sum\limits_{i = 1}^{\kappa } {\left( {\mu \left( {t_{i} } \right) - \mu_{i} } \right)^{2} } $$
(12)

where \(\mu (t_{i} )\), \(\mu_{i}\) and \(\kappa\) represent the model-estimated cumulative number of detected faults at time \(t_{i}\), the actual observed number of faults, and the sample size, respectively.

Taking partial derivatives of Eq. (12) with respect to each parameter and setting them to zero gives

$$ \frac{\partial \left( L \right)}{{\partial \alpha }} = \frac{\partial \left( L \right)}{{\partial d}} = \frac{\partial \left( L \right)}{{\partial \theta }} = \frac{\partial \left( L \right)}{{\partial \eta }} = \frac{\partial \left( L \right)}{{\partial \beta }} = 0 $$
(13)
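In practice, the system in Eq. (13) is solved numerically. A minimal sketch using `scipy.optimize.least_squares` (the starting values and bounds are illustrative assumptions, and `mu` is the implementation of Eq. (11) from Sect. 3.2) is:

```python
import numpy as np
from scipy.optimize import least_squares

def fit_lse(t_obs, y_obs, x0=(0.05, 10.0, 0.8, 0.2, 5.0)):
    """Minimize Eq. (12), the sum of squared residuals, over (alpha, beta, d, theta, eta)."""
    resid = lambda p: mu(t_obs, *p) - y_obs
    # alpha, d, theta, eta > 0; beta >= 0, per Eq. (2).
    bounds = ([1e-6, 0.0, 1e-6, 1e-6, 1e-6], [np.inf] * 5)
    return least_squares(resid, x0, bounds=bounds)

# Example with synthetic data (illustrative only):
t = np.arange(1.0, 31.0)
rng = np.random.default_rng(1)
y = mu(t, 0.05, 10.0, 0.8, 0.2, 5.0) + rng.normal(0.0, 2.0, t.size)
print(fit_lse(t, y).x)
```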

The likelihood function for the maximum likelihood estimation (MLE) method can be written as follows:

$$ LL = \Pr \left\{ {N\left( {t_{1} } \right) = n_{1} ,N\left( {t_{2} } \right) = n_{2} , \ldots ,N\left( {t_{k} } \right) = n_{k} } \right\} = \prod\limits_{i = 1}^{k} {\frac{{\left[ {\mu \left( {t_{i} } \right) - \mu \left( {t_{i - 1} } \right)} \right]^{{(n_{i} - n_{i - 1} )}} \exp \left[ {\mu \left( {t_{i - 1} } \right) - \mu \left( {t_{i} } \right)} \right]}}{{\left( {n_{i} - n_{i - 1} } \right)!}}} $$
(14)

where \(N(t_{i} )\) and \(n_{i}\) denote the counting process and the number of faults actually detected by time \(t_{i}\), respectively.

The model parameter values can be estimated by setting the partial derivatives of Eq. (14) to zero:

$$ \frac{\partial (LL)}{{\partial \alpha }} = \frac{\partial (LL)}{{\partial d}} = \frac{\partial (LL)}{{\partial \theta }} = \frac{\partial (LL)}{{\partial \eta }} = \frac{\partial (LL)}{{\partial \beta }} = 0 $$
(15)

By solving Eqs. (13) and (15), the estimated parameter values \((\alpha^{*} ,\beta^{*} ,\eta^{*} ,\theta^{*} ,d^{*})\) of the developed model can be obtained using LSE and MLE, respectively.
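In the same spirit, the MLE of Eq. (14) is usually obtained by numerically maximizing the log-likelihood rather than solving Eq. (15) in closed form. A minimal sketch (reusing `mu` and the synthetic `t`, `y` from the LSE example, and treating the observations as grouped NHPP counts) is:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(params, t_obs, n_obs):
    """Negative log of Eq. (14) for grouped NHPP data (constant log(dn!) terms dropped)."""
    alpha, beta, d, theta, eta = params
    m = mu(t_obs, alpha, beta, d, theta, eta)
    dm = np.diff(np.concatenate(([0.0], m)))      # mu(t_i) - mu(t_{i-1})
    dn = np.diff(np.concatenate(([0.0], n_obs)))  # n_i - n_{i-1}
    if np.any(dm <= 0.0):
        return np.inf  # invalid parameter region
    return -np.sum(dn * np.log(dm) - dm)

res = minimize(neg_log_likelihood, x0=[0.05, 10.0, 0.8, 0.2, 5.0],
               args=(t, np.round(y)), method="Nelder-Mead")
print(res.x)
```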

4.3 Model performance comparison for model parameters estimation using LSE

In terms of goodness of fit, Tables 4, 5, 6, 7, 8 and 9 show that the developed model fits best among all models. In Table 4, the MSE of the developed model is nearly 2.3 times smaller than that of the G–O model using 100% of the data for DS1-1. Table 5 shows that the RMSE of the Weibull distribution model is nearly 1.74 times larger than that of the developed model using 100% of the data for DS1-2. Table 6 indicates that the MSE of the developed model is about 1.44 times smaller than that of the Li model using 100% of the data for DS1-3. In Table 7, although the R² of the Weibull distribution model is approximately equal to that of the proposed model, its other metrics (MSE, RMSE, TS and Bias) are larger than those of the developed model using 100% of the data for DS2-1. Similarly, from Table 8, the R² of the developed model is approximately the same as that of the Weibull distribution model, but the MSE, RMSE, TS and Bias of the developed model are lower using 100% of the data for DS2-2. From Table 9, the MSE of the developed model is about 1.82 times smaller than that of the GISS model using 100% of the data for DS2-3.

Table 4 Model comparison results using AIRFLOW 1.10.1 (DS1-1)
Table 5 Model comparison results using AIRFLOW 1.10.2 (DS1-2)
Table 6 Model comparison results using AIRFLOW 1.10.3 (DS1-3)
Table 7 Model comparison results using Jena 3.6.0 (DS2-1)
Table 8 Model comparison results using Jena 3.7.0 (DS2-2)
Table 9 Model comparison results using Jena 3.8.0 (DS2-3)

On the whole, the MSE, RMSE, TS and Bias values of the developed model are lower than those of the other models using 100% of DS1 and DS2, and its R² values are larger. Thus, the established model has better fitting power than the other models, as can also be seen clearly in Fig. 2a–f.

Fig. 2 Plots of the cumulative number of detected faults over time. a, b and c show the cumulative number of detected faults using 100% of DS1-1, DS1-2 and DS1-3, respectively; d, e and f show those using 100% of DS2-1, DS2-2 and DS2-3, respectively

In terms of prediction, Tables 4, 5, 6, 7, 8 and 9 show that the predictive power of the established model is the best among all models. Table 4 shows that the MSEpredict of the developed model is nearly 6.69 times less than that of the GISS model using 90% of the data for DS1-1. From Table 5, the RMSE of the proposed model is approximately 1.29 times less than that of the Li model using 90% of the data for DS1-2. Table 6 shows that the TS of the GISS model is about 2.3 times larger than that of the developed model using 90% of the data for DS1-3. Table 7 shows that the RMSE of the G–O model is about 1.24 times as large as that of the developed model using 90% of the data for DS2-1. Table 8 shows that the MSEpredict of the Weibull distribution model is approximately 3.5 times larger than that of the developed model using 90% of the data for DS2-2. From Table 9, the TS of the developed model is about three times less than that of the GISS model using 95% of the data for DS2-3.

In Tables 4, 5, 6, 7, 8 and 9, the MSEpredict, RMSE, TS and Bias values of the developed model are less than those of the other models using 90% of DS1, DS2-1 and DS2-2, and likewise using 95% of DS2-3. Therefore, the developed model has better predictive power than the other models, as Fig. 3a–f shows clearly. Note that 95% of the fault data set DS2-3 is used mainly because random selection of the truncation point compares the power of the models more fairly than subjective selection.

Fig. 3 Plots of the cumulative number of detected faults over time. a, b and c show the cumulative number of detected faults using 90% of DS1-1, DS1-2 and DS1-3, respectively; d, e and f show those using 90% of DS2-1, 90% of DS2-2 and 95% of DS2-3, respectively

Overall, the fitting and predictive performance of the Weibull distribution model and the GISS model is better than that of the other comparison models, which also supports the finding that a CSS reliability model based on the Weibull distribution can be used to assess OSS reliability [10]. However, the OSS reliability models, such as the Wang model and the Li model, do not perform well. This highlights the complexity of OSS, especially the different forms of fault introduction in different open source development environments. Because fault introduction in the developed model can take many forms, the proposed model matches the changes of OSS fault introduction well and has better adaptive ability than the other models used in this paper. Therefore, the developed model has good adaptability and robustness, and can assist developers in assessing actual OSS reliability during development and testing.

4.4 Model performance comparison for model parameters estimation using MLE

To compare AIC values, we use the third fault data set and the MLE method to estimate the model parameter values. From Table 10, the MSE and AIC values of the developed model are less than those of the other models, and its R² values are larger than those of the other models and exceed 0.9. This shows that the fitting power of the developed model is better than that of the other models. Except for DS3-2, the MSE, R² and AIC values of the other models are very close. In DS3-2, the AIC value of the G–O model is close to that of the proposed model, and the R² value of the Weibull distribution model is close to that of the developed model.

Table 10 Model performance comparisons using MLE for 100% of data (DS3)

From Table 11, the MSE and AIC values of the proposed model are less than those of the other models. In DS3-1, the AIC value of the G–O model is close to that of the developed model. In DS3-2, the MSE value of the Li model and the AIC value of the Weibull distribution model are close to those of the developed model. In DS3-3, the MSE values of the other models are very close to one another and larger than that of the developed model.

Table 11 Model performance comparisons using MLE for 90% of data (DS3)

In summary, the above comparison shows that the fitting and prediction performance of the developed model is better than that of the other models, whose fitting and prediction power is not stable. This indicates that the developed model has good adaptability and flexibility in the complex environment of OSS testing and development, while the other models do not work well in such environments. Finally, it should be noted that when using MLE to estimate the parameters of the Wang model, no maximum of the likelihood function exists, so that model is not compared in Tables 10 and 11.

5 Sensitivity analysis

Sensitivity analysis is carried out by changing one parameter while fixing the other parameters of the proposed model. Figure 4 shows how the developed model responds to these parameter changes: the expected number of initially detected faults \(\left( \eta \right)\), the fault detection rate \(\left( \theta \right)\), the fault introduction rate parameter \(\left( \alpha \right)\) and the shape parameter (d) have a major impact.

Fig. 4 Plots of sensitivity analysis of the developed model, varying the parameters \(\left( {\eta ,\theta ,\alpha ,d,\beta } \right)\) using 100% of DS1-2
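A one-at-a-time sensitivity check of this kind can be sketched as follows (reusing `mu` from the earlier sketches; the baseline values are illustrative assumptions rather than the fitted estimates for DS1-2):

```python
import numpy as np

base = {"alpha": 0.05, "beta": 10.0, "d": 0.8, "theta": 0.2, "eta": 5.0}
t = np.arange(1.0, 31.0)

for name in base:
    for factor in (0.5, 1.5):  # perturb one parameter at a time by -50% / +50%
        p = dict(base, **{name: base[name] * factor})
        shift = np.max(np.abs(mu(t, **p) - mu(t, **base)))
        print(f"{name} x{factor}: max |change in mu(t)| = {shift:.2f}")
```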

There are mainly the following reasons:

1. The number of faults in OSS has an important influence on OSS reliability. Therefore, estimating the total number of faults in OSS is the basis of establishing an OSS reliability model.

2. The change of the fault detection rate has an important influence on the estimation of the number of detected faults in OSS.

3. Fault introduction is a problem that must be considered in OSS reliability modeling. The change of the fault introduction rate directly affects the power of the OSS reliability model.

4. Owing to the dynamics, complexity and diversity of OSS development, and to ensure the robustness and stability of the developed model, the shape parameter of the proposed model must be considered.

In addition, parameter \(\beta\), also called the inflection factor, is not an important parameter. Owing to the diversity and complexity of OSS testing and development environments, the cumulative number curve of introduced faults can show a variety of complex changes, not necessarily an S-shaped curve. This point also indicates that fault introduction changes differently in OSS and CSS.

6 Implication of the study

The implication of this study has the following points:

1. OSS is mainly completed by users and volunteers in the open source community, and their dynamic changes cause the complexity and uncertainty of OSS testing and development. In view of this complexity, we conduct corresponding research and propose an effective OSS reliability model. This study lays a foundation for OSS reliability modeling under complex environments.

2. Owing to the uncertainty and complexity of OSS testing and development, fault introduction during OSS debugging shows a variety of changes. This study examines these changes and proposes that fault introduction obeys the GISS distribution, which lays a foundation for modeling the various changes of fault introduction in the future.

3. Existing OSS reliability models rely too much on a specific OSS development and testing environment, so their applicability and adaptability are poor. The model proposed in this study fully considers the complicated changes of the OSS testing and development environment, and an OSS reliability model with the GISS distribution is established. Because fault introduction obeys the GISS distribution, the model can well simulate the various changes of actual fault introduction. Therefore, the developed model can be effectively used for reliability evaluation in actual OSS testing and development. In a sense, the modeling process of the developed model also provides methods and guidance for further research on OSS reliability.

7 Threats to validity

The threats to the validity of the developed model involve two kinds of factors: external factors and internal factors.

External factors: to evaluate the performance of an OSS reliability model, more kinds of fault data sets should ideally be selected. Nevertheless, we selected three OSS fault data sets to evaluate the power of the software reliability models as fully as possible, which meets the basic experimental requirements.

Internal factors: we used the Taylor formula to simplify the equation and give an approximate solution, and, to simplify the calculation, we assumed that the fault detection rate is constant. Owing to the complexity of OSS reliability modeling, it is beneficial to simplify the calculation appropriately; moreover, OSS reliability modeling is always a trade-off between modeling complexity and practical simplified use.

8 Conclusions

In this paper, we develop an OSS reliability model in which fault introduction obeys the GISS distribution. Under this assumption, the fault introduction rate can exhibit complex nonlinear changes, such as decreasing over time, or first increasing and then decreasing. Therefore, the developed model can accommodate a variety of different OSS reliability evaluations. To verify the adaptability and robustness of the developed model, we conducted extensive experiments using three OSS fault data sets, six model comparison criteria and five comparison models. The experimental results show that the developed model has the best fitting and prediction performance among all models. We also carried out a sensitivity analysis of the model parameters, and the results show that the expected number of initially detected faults \(\left( \eta \right)\), the fault detection rate \(\left( \theta \right)\), the fault introduction rate parameter \(\left( \alpha \right)\) and the shape parameter (d) have an important impact. The developed model can be used to assess OSS reliability and can assist developers in evaluating it.

Given the complex and dynamic changes of OSS testing, debugging and development, in future work we will consider various random changes of fault introduction and develop the corresponding OSS reliability models.