Abstract
The reliability of open source software (OSS) has recently become a prominent research topic. Because OSS development, testing and debugging environments are uncertain and complex, OSS is completed dynamically. Removing detected faults from OSS is likely to introduce new faults, and the pattern of fault introduction differs across OSS debugging environments. For example, the fault introduction rate may decrease over time, or it may first increase and then decrease. To capture these complex and dynamic changes, this paper proposes an OSS reliability model in which fault introduction follows a generalized inflection S-shaped (GISS) distribution. Experimental results indicate that the proposed model fits and predicts well. The model can adapt to the dynamic and complicated changes of fault introduction during OSS debugging. Moreover, it can accurately forecast the number of remaining faults in OSS and help developers evaluate actual OSS reliability.
Article highlights

An OSS reliability model in which fault introduction follows the GISS distribution is proposed.

The developed model can be used to assess the reliability of OSS under complicated environments.

The developed model can adapt to the complex testing and development environment of OSS.
Introduction
Over the last several decades, OSS has gradually been accepted and widely used. The development mode of OSS is quite different from that of closed source software (CSS). CSS is developed and tested under hierarchical management, according to a prearranged plan and target, so its developers, testers and debuggers are relatively stable. In contrast, OSS is developed, tested and debugged dynamically by developers, users and volunteers in an open, networked environment. To improve OSS reliability, the industry generally relies on frequent releases, but this raises two problems. First, if the OSS is released too early, the software will contain many faults, which seriously affects its use. Second, if the OSS is released too late, users and volunteers will grow impatient and turn to other OSS instead. As a result, the reliability of OSS is widely questioned.
To address the problem of OSS reliability evaluation, researchers have developed several OSS reliability models. For instance, Tamura and Yamada [1] established an OSS reliability model using a stochastic differential equation. Li et al. [2] observed that the fault detection rate of OSS first increases and then decreases, and established the corresponding OSS reliability model. Yang et al. [3] built delayed OSS reliability models considering the relation between fault introduction and fault detection. Focusing on the debugging activities of OSS, Lin and Li [4] established an OSS reliability model based on rate-based queueing theory. Huang et al. [5] proposed an OSS reliability model in which fault detection follows a bounded generalized Pareto distribution. Singh et al. [6] built an entropy-based multi-version reliability model for OSS, together with an optimal release strategy that considers user and volunteer satisfaction levels. Wang and Mi [7] established an OSS reliability model in which the fault detection rate decreases over time.
Although existing OSS reliability models can effectively assess OSS reliability under certain open source conditions, they cannot fully satisfy actual OSS reliability assessment, owing to the variability and complexity of OSS development, testing and debugging environments.
In the development and testing of CSS, static testing methods such as peer review, walk-through and inspection help ensure quality. When faults are detected, developers (or debuggers) and testers can communicate face to face and discuss the causes of the faults and how to remove them. Thus, in the CSS testing environment, faults can be well described and completely removed. In contrast, faults detected during OSS testing are reported to the developers by users and volunteers over the network, and the developers then organize personnel to remove them. In this open environment, users or volunteers often cannot describe fault information clearly to developers, so developers may remove a fault incompletely and introduce new faults. Moreover, in the fault tracking system of OSS, the state of a fault may be changed from closed to reopened, which indicates that the fault was not completely eliminated or that new faults may have been introduced. Therefore, reliability modeling of OSS requires a reasonable and effective treatment of fault introduction in the OSS debugging process.
Because fault introduction during OSS debugging is complex and nonlinear, the fault introduction rate may decrease over time, or first increase and then decrease, among other patterns. Assuming that fault introduction follows a single law of change therefore does not accord with the actual situation in OSS debugging. An OSS reliability model built on such an assumption cannot satisfy actual OSS reliability assessment. At the very least, its adaptability is poor, and it cannot handle complex OSS reliability assessment.
Additionally, OSS differs greatly from CSS in its testing and development processes: the software evolves through the active participation of volunteers and users at different geographical locations. Moreover, each release of OSS adds features, functions or components to the previous version. These changes make fault introduction in OSS complex. One source is the new faults introduced when faults in the current version are removed; the other is the newly generated faults caused by newly added features, functions and components. When a newly generated fault is removed, a new fault may also be introduced. These two kinds of introduced faults make the debugging of OSS complex. Given these characteristics, we use the GISS distribution to model the complex changes of introduced faults in OSS. The proposed model therefore differs from the CSS reliability model in the existing literature in which fault detection follows the GISS distribution [8]. As a result, the established model can adapt to the OSS testing and development environment, and can effectively support residual fault prediction and software reliability evaluation for OSS.
In this paper, we develop an OSS reliability model in which fault introduction follows a generalized inflection S-shaped (GISS) distribution. Under this assumption, the fault introduction rate can show a variety of complex nonlinear changes (for example, it may first increase and then decrease over time, or it may decrease or increase monotonically) as well as linear behavior (for instance, a constant rate). The OSS reliability model established in this way has strong adaptability and robustness, and can adapt to the fault introduction changes that arise during the development, testing and debugging of various OSS. Therefore, the established model can be used to assess OSS reliability.
The contributions of this paper are as follows,

(1)
To the best of our knowledge, this is the first work to propose that fault introduction follows the GISS distribution.

(2)
The proposed model can adapt to the complicated changes of fault introduction in the process of actual OSS development, testing and debugging.

(3)
The proposed model can be used to assess the reliability of actual OSS projects.

(4)
We address the problem of building a software reliability model that covers various fault introduction changes under complex open source environments.
The rest of this paper is organized as follows.
Related work is introduced in Sect. 2. Section 3 presents the fault introduction rate with the GISS distribution and introduces the proposed model. Section 4 introduces the OSS fault data sets, the model comparison criteria and the parameter estimation method of the proposed model, and reports the model performance comparison experiments. Section 5 gives a sensitivity analysis of the proposed model parameters. The implication of the study is discussed in Sect. 6. Section 7 discusses threats to the validity of the developed model. The last section concludes the paper.
Related work
In the past decades, the reliability of OSS has become one of the hot issues in the industry. “Release early, release often” [9] has become a common way for developers to improve OSS reliability. But it faces two problems: (1) If the software is released too early, it will contain too many faults, and users will lose interest in it. As a result, the software cannot be widely tested and used. (2) If the software is released too late, users will switch to other open source software. In that case, the released software is neither used nor tested, and is eventually abandoned.
By studying how the fault data of OSS change, researchers have proposed that CSS reliability models can also be applied to evaluate OSS reliability. For example, Zou and Davis [10] proposed that CSS reliability models can evaluate OSS reliability, and found that the CSS reliability model based on the Weibull distribution has better fitting and predictive performance than other CSS reliability models. Anbalagan and Vouk [11] studied problem reports and proposed that traditional reliability models can evaluate OSS reliability. Similarly, Rossi et al. [12] studied bug repositories and concluded that the reliability of OSS can be evaluated by traditional CSS reliability models. Chiu et al. [13] proposed a software reliability model that considers learning effects in fault detection. Okamura et al. [14] studied the distribution of failure time and established a software reliability model based on the normal distribution. Additionally, considering changes in the weights of support vector regression (SVR), Utkin and Coolen [15] proposed a corresponding software reliability model. Wang and Zhang [16] used a deep learning method with an RNN encoder-decoder to predict software reliability. Ke and Huang [17] developed software reliability models based on change points of testing effort.
Because the development and testing environment of OSS is completely different from that of CSS, evaluating OSS reliability with CSS reliability models is widely questioned. Therefore, researchers have proposed dedicated OSS reliability models. For instance, Tamura and Yamada [18] established an OSS reliability model based on the logarithmic Poisson execution time model and used the analytic hierarchy process to estimate the model parameters. Gyimothy et al. [19] used various object-oriented metrics to study fault prediction for OSS. Tamura and Yamada [20] used neural networks to build an OSS reliability model and gave an optimal software release method. Syed-Mohamad and McBride [21] indicated that the testing and development process of each OSS is different, and that a reliability model can be established according to the specific situation. Tamura and Yamada [22] developed an OSS reliability model using deterministic chaos theory.
Singh et al. [23] built an OSS reliability model that considers different kinds of faults in the fault data sets. Singh et al. [24] proposed an OSS reliability model based on a change-point. Considering the dynamic changes of OSS development and testing, some researchers have used stochastic differential equations to build OSS reliability models [1, 25, 26, 27]. Considering the frequent release of OSS, some researchers have proposed reliability models for multi-version or multi-release OSS [6, 28, 29, 30]. Tamura et al. [31] proposed a reliability model of open source cloud computing based on jump diffusion. Tamura [32] proposed a reliability model of open source cloud computing that considers the irregular fluctuation of the fault detection rate. Tandon et al. [33] developed multi-release OSS reliability models based on entropy. By studying change-point behavior, Kapur et al. [34] developed two-dimensional OSS reliability models. Ivanov et al. [35] used the goal question metric (GQM) method to evaluate and forecast the reliability of open source software in mobile operating environments. Additionally, Barack and Huang [36] used several software reliability models to evaluate and forecast the reliability of OSS. Wang [37] proposed an OSS reliability model that takes into account fault introduction based on the Pareto distribution.
Yang et al. [38] built a change-point reliability framework for OSS using masked data, and used the expectation maximization algorithm to maximize the likelihood function when estimating the model parameter values. Considering imperfect debugging and change-points, Saraf et al. [39] developed a multi-release framework for fault correction and fault detection in OSS. Sun and Li [40] proposed several imperfect debugging models that take into account fault introduction and fault levels.
Owing to the complexity and dynamic change of OSS testing and development, no single software reliability model suits all OSS reliability assessments. Thus, Ullah and Morisio [41] used an optimal model selection method to determine which software reliability model should be used to estimate the reliability of the current OSS. Considering the complexity of fault detection during software development and testing, Raghuvanshi et al. [42] established a time-variant software reliability model. In addition, considering the dynamic changes of OSS testing and development processes, Saraf et al. [43] developed multi-release reliability models with imperfect debugging and change-points for OSS. To evaluate OSS reliability in various complex situations, we develop an OSS model that considers fault introduction and can adapt to various open source environments.
Building OSS reliability model
Fault introduction with a generalized inflection S-shaped distribution
Suppose that fault introduction follows the GISS distribution. Then the fault introduction rate can be expressed as follows,
where \(\omega (t)\) represents the fault introduction rate function, F(t) denotes the GISS distribution, d represents a shape parameter, and \(\alpha\) and \(\beta\) are scale parameters.
In general, the two main fault types in open source software are mutually independent faults and stochastically dependent faults. When independent faults are removed, the possibility of introducing new faults is relatively small. When interdependent faults are removed, new faults are more likely to be introduced, or the faults may not be completely removed. In other words, in the debugging process of open source software, removing simple faults is less likely to introduce new faults than removing complex faults. Thus, we can establish the following differential equation,
where \(\gamma\) and \((1 - \gamma )\) denote the fractions of introduced faults of the first and second types, respectively.
Solving Eq. (4), we can derive the following distribution of fault introduction,
where \(\beta\) is called the inflection factor.
In the debugging process of open source software, software complexity is also an important factor affecting fault introduction. Herein, software complexity refers to the complexity of the software itself and the complexity of the software debugging environment. The former includes software scale, algorithm complexity, logic complexity, architecture complexity, parallel storage and parallel computing, etc. The latter covers the debugging personnel, the debugging resources available to them, and the environment in which they debug, etc. Because open source software is mainly completed by developers, users and volunteers, there are no fixed debugging personnel, and detected faults are randomly assigned to debuggers by developers. The skills, resources and experience of debuggers affect fault introduction when detected faults in open source software are removed. Therefore, we use a function \(\varphi (t)\), assumed to be nonnegative and integrable on (0, t), to express the influence of software complexity on the fault introduction rate \(\omega (t)\) in the debugging process of open source software.
To limit the influence of too many parameters, we use two parameters for the software complexity function \(\varphi (t)\), because the more parameters a function has, the more complex and difficult parameter estimation becomes. We use the simple power-law function \(\varphi (t) = dt^{d - 1}\), which provides a good trade-off between flexibility and simplicity. Thus, we can extend Eq. (4) as follows,
Solving Eq. (6), we can derive the following distribution of fault introduction considering software complexity,
where d is a shape parameter. Although Eq. (7) has a simple structure, it is very flexible for modeling fault introduction that follows the GISS distribution.
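The relation between the differential equation and its solution can be checked numerically. The sketch below is only an illustration under stated assumptions: it assumes that the extended differential equation has the inflection S-shaped form \(F'(t) = \alpha \varphi (t)[1 - F(t)][\gamma + (1 - \gamma )F(t)]\) with F(0) = 0, that the GISS distribution is \(F(t) = (1 - e^{-\alpha t^{d}})/(1 + \beta e^{-\alpha t^{d}})\), and that \(\beta = (1 - \gamma )/\gamma\). The function names and parameter values are hypothetical and are not taken from the paper.

```python
import math

def giss_cdf(t, alpha, beta, d):
    """Assumed closed form of the GISS distribution F(t)."""
    e = math.exp(-alpha * t ** d)
    return (1.0 - e) / (1.0 + beta * e)

def ode_rhs(t, F, alpha, gamma, d):
    """Assumed right-hand side: alpha * phi(t) * (1 - F) * (gamma + (1 - gamma) * F)."""
    phi = d * t ** (d - 1.0)  # power-law complexity function from the text
    return alpha * phi * (1.0 - F) * (gamma + (1.0 - gamma) * F)

def integrate_rk4(alpha, gamma, d, t_end, steps):
    """Classical fourth-order Runge-Kutta integration from F(0) = 0 to t_end."""
    h = t_end / steps
    t, F = 0.0, 0.0
    for _ in range(steps):
        k1 = ode_rhs(t, F, alpha, gamma, d)
        k2 = ode_rhs(t + h / 2, F + h * k1 / 2, alpha, gamma, d)
        k3 = ode_rhs(t + h / 2, F + h * k2 / 2, alpha, gamma, d)
        k4 = ode_rhs(t + h, F + h * k3, alpha, gamma, d)
        F += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return F

# Illustrative values only; d >= 1 keeps phi(t) finite at t = 0.
alpha, gamma, d = 0.05, 0.2, 1.5
beta = (1.0 - gamma) / gamma  # assumed link between gamma and the inflection factor
numeric = integrate_rk4(alpha, gamma, d, t_end=10.0, steps=20000)
closed = giss_cdf(10.0, alpha, beta, d)
print("numerical F(10):", numeric)
print("closed-form F(10):", closed)
```

If the two printed values agree to several decimal places, the assumed differential equation and the assumed closed form are consistent with each other for these parameter values.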
Substituting Eqs. (2) and (3) into Eq. (1) gives
where in Eq. (8), \(\alpha\) represents the scale parameter, d is the shape parameter, and \(\beta\) denotes the inflection factor.
In Eq. (8), when t goes to infinity, \(\omega (t)\) approaches \(\alpha dt^{d - 1}\). Thus, when \(t \to \infty\) and d < 1, \(\omega (t) \to 0\). When \(t \to \infty\) and d = 1, \(\omega (t) \to \alpha d\). When \(t \to \infty\) and d > 1, \(\omega (t)\) tends to infinity.
Fig. 1a–c shows how the fault introduction rate \(\omega (t)\) changes over time for different values of parameters d and \(\beta\). From Fig. 1a, we can see that \(\omega (t)\) tends to zero when d < 1 and \(t \to \infty\). In Fig. 1a, when \(\beta = 100\), the fault introduction rate function first increases and then decreases over time. In Fig. 1b, when d = 1, \(\omega (t)\) tends to a constant. In Fig. 1c, when d > 1, \(\omega (t)\) tends to infinity. Moreover, when \(\beta = 100\), the fault introduction rate function shows an S-shaped change over time in Fig. 1c.
Fault introduction that follows the GISS distribution can represent a variety of complex changes, and can adapt to the complicated changes of actual fault introduction in the OSS debugging process. Thus, fault introduction with the GISS distribution is consistent with fault introduction in the actual OSS debugging process, and an OSS reliability model in which fault introduction follows the GISS distribution is well suited to actual OSS reliability evaluation.
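The limits above can be traced with a short worked derivation. It assumes that the fault introduction rate is the hazard rate of F(t), that is \(\omega (t) = F'(t)/(1 - F(t))\), and that the GISS distribution has the form shown in the first line; both are assumptions chosen to be consistent with the limiting behaviour stated for Eq. (8), not a reproduction of the paper's equations.

```latex
% Assumed GISS distribution and its complement
F(t) = \frac{1 - e^{-\alpha t^{d}}}{1 + \beta e^{-\alpha t^{d}}},
\qquad
1 - F(t) = \frac{(1 + \beta)\, e^{-\alpha t^{d}}}{1 + \beta e^{-\alpha t^{d}}}

% Density obtained by differentiating F(t)
F'(t) = \frac{(1 + \beta)\, \alpha d\, t^{d-1} e^{-\alpha t^{d}}}
             {\left(1 + \beta e^{-\alpha t^{d}}\right)^{2}}

% Fault introduction rate as the ratio of the two, and its large-t limit
\omega(t) = \frac{F'(t)}{1 - F(t)}
          = \frac{\alpha d\, t^{d-1}}{1 + \beta e^{-\alpha t^{d}}}
\;\longrightarrow\; \alpha d\, t^{d-1}
\quad (t \to \infty)
```

Under these assumptions the denominator tends to 1 as t grows, so the sign of d - 1 alone decides whether the rate vanishes, levels off at \(\alpha d\), or grows without bound.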
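The shapes described for Fig. 1 can be explored with a few lines of code. The sketch below assumes that Eq. (8) has the form \(\omega (t) = \alpha dt^{d - 1}/(1 + \beta e^{-\alpha t^{d}})\), which matches the limits stated in the text; the function name, the value of \(\alpha\) and the time grid are illustrative choices, not values from the paper.

```python
import math

def fault_introduction_rate(t, alpha, beta, d):
    """Assumed form of Eq. (8): alpha*d*t^(d-1) / (1 + beta*exp(-alpha*t^d))."""
    return alpha * d * t ** (d - 1.0) / (1.0 + beta * math.exp(-alpha * t ** d))

alpha = 0.1  # illustrative scale parameter
times = [1, 10, 50, 200, 1000, 5000]

# (d, beta) pairs chosen to mirror the cases discussed in the text.
cases = [
    (0.8, 100.0),  # d < 1 with a large inflection factor: rises, then decays
    (0.8, 0.0),    # d < 1 without inflection: decays from the start
    (1.0, 100.0),  # d = 1: levels off at alpha * d
    (1.5, 100.0),  # d > 1: keeps growing
]

for d, beta in cases:
    rates = [fault_introduction_rate(t, alpha, beta, d) for t in times]
    formatted = ", ".join(f"{r:.5f}" for r in rates)
    print(f"d={d}, beta={beta}: {formatted}")
```

Printing the rates on a coarse time grid is enough to see the qualitative behaviour; a plotting library could be used instead if a figure is preferred.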
Proposed model
The assumptions of the proposed model are as follows:

(1)
The fault detection process follows a nonhomogeneous Poisson process (NHPP).

(2)
The number of faults detected in \((t,t + \Delta t)\) is proportional to the number of remaining faults in the software.

(3)
In the process of open source software debugging, new faults can be introduced when detected errors are eliminated.

(4)
Fault introduction follows the GISS distribution.

(5)
The number of introduced faults is related to the number of detected faults.
From Assumptions 1 and 2, the differential equations can be derived as follows,
where \(\mu (t)\) and \(\psi (t)\) denote the mean value function and the fault content function, respectively, and \(\theta\) denotes the fault detection rate.
From Assumptions 3, 4 and 5, the following equation can be derived,
where \(\omega (t)\) denotes the fault introduction rate function, and \(\eta\) represents the expected number of initially detected faults.
Substituting (4) and (6) into (5), we can derive the following equation,
Formula (11) is the expression of the developed model. Please see the Appendix for the detailed derivation.
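The structure of the model can be illustrated numerically without the closed-form expression. The sketch below is a hedged reading of Assumptions 1 to 5, not the paper's Formula (11): it assumes the system \(\mu '(t) = \theta [\psi (t) - \mu (t)]\) and \(\psi '(t) = \omega (t)\mu '(t)\) with \(\mu (0) = 0\) and \(\psi (0) = \eta\), and it reuses the assumed form of \(\omega (t)\). All function names and parameter values are hypothetical.

```python
import math

def omega(t, alpha, beta, d):
    """Assumed fault introduction rate (requires d >= 1 so that t = 0 is finite)."""
    return alpha * d * t ** (d - 1.0) / (1.0 + beta * math.exp(-alpha * t ** d))

def derivatives(t, mu, psi, theta, alpha, beta, d):
    """Assumed system: detection proportional to remaining faults, introduction tied to detection."""
    dmu = theta * (psi - mu)
    dpsi = omega(t, alpha, beta, d) * dmu
    return dmu, dpsi

def mean_value_curve(eta, theta, alpha, beta, d, t_end, steps):
    """Integrate (mu, psi) with classical RK4; returns a list of (t, mu, psi)."""
    h = t_end / steps
    t, mu, psi = 0.0, 0.0, eta
    curve = [(t, mu, psi)]
    args = (theta, alpha, beta, d)
    for _ in range(steps):
        k1 = derivatives(t, mu, psi, *args)
        k2 = derivatives(t + h / 2, mu + h * k1[0] / 2, psi + h * k1[1] / 2, *args)
        k3 = derivatives(t + h / 2, mu + h * k2[0] / 2, psi + h * k2[1] / 2, *args)
        k4 = derivatives(t + h, mu + h * k3[0], psi + h * k3[1], *args)
        mu += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        psi += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
        t += h
        curve.append((t, mu, psi))
    return curve

# Illustrative parameters only.
eta, theta = 100.0, 0.1
with_intro = mean_value_curve(eta, theta, alpha=0.05, beta=5.0, d=1.0, t_end=20.0, steps=2000)
no_intro = mean_value_curve(eta, theta, alpha=0.0, beta=5.0, d=1.0, t_end=20.0, steps=2000)
print("mu(20) with fault introduction:", with_intro[-1][1])
print("mu(20) without fault introduction:", no_intro[-1][1])
```

With \(\alpha = 0\) no faults are introduced and the assumed system reduces to the G–O form \(\eta (1 - e^{-\theta t})\), which gives a simple sanity check on the integration.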
Numerical examples
Illustrations of fault data sets, comparison models and model comparison criteria
In this paper, we collected three fault data sets from three Apache OSS projects, available at https://issues.apache.org. Each OSS fault data set includes three successive fault data subsets. Please see Table 1 for details of the fault data sets. Note that detected faults of OSS are stored in the bug tracking system. The resolution of a fault in the bug tracking system can be FIXED, INVALID, WONTFIX, DUPLICATE, etc. We remove faults whose resolutions are INVALID, WONTFIX and DUPLICATE, and the remaining faults make up our OSS fault data sets.
To fully verify the performance of the developed model, we compared it with five software reliability models using six model comparison criteria [44]. The six criteria are the mean square error (MSE), R^{2}, RMSE, TS, Bias and AIC. The five comparison models are the G–O model, the Weibull distribution model, the generalized inflection S-shaped (GISS) model, the Wang model and the Li model. The G–O model, the Weibull distribution model and the GISS model are CSS reliability models, while the Li model and the Wang model are reliability models for OSS. Tables 2 and 3 list the detailed information on the model comparison criteria and the comparison models used in this paper, respectively. Note that the G–O model is included as a comparison model mainly because it is a classical CSS reliability model with a concave curve.
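The filtering step described above can be sketched as follows. The record layout (`id`, `resolution`, `period`) and the sample issues are hypothetical; the actual export format of the bug tracking system is not specified in the paper, so this is only an illustration of the rule "drop INVALID, WONTFIX and DUPLICATE, keep the rest".

```python
# Resolutions excluded from the fault data sets, as stated in the text.
EXCLUDED_RESOLUTIONS = {"INVALID", "WONTFIX", "DUPLICATE"}

def keep_valid_faults(issues):
    """Return only the issues whose resolution is not in the excluded set."""
    return [i for i in issues if i["resolution"].upper() not in EXCLUDED_RESOLUTIONS]

def cumulative_fault_counts(issues, n_periods):
    """Cumulative number of faults per period (periods are numbered from 1)."""
    per_period = [0] * n_periods
    for issue in issues:
        per_period[issue["period"] - 1] += 1
    cumulative, total = [], 0
    for count in per_period:
        total += count
        cumulative.append(total)
    return cumulative

# Hypothetical sample records for illustration only.
issues = [
    {"id": "X-1", "resolution": "FIXED", "period": 1},
    {"id": "X-2", "resolution": "DUPLICATE", "period": 1},
    {"id": "X-3", "resolution": "FIXED", "period": 2},
    {"id": "X-4", "resolution": "INVALID", "period": 2},
    {"id": "X-5", "resolution": "FIXED", "period": 3},
    {"id": "X-6", "resolution": "WONTFIX", "period": 3},
    {"id": "X-7", "resolution": "FIXED", "period": 3},
]

valid = keep_valid_faults(issues)
print([i["id"] for i in valid])
print(cumulative_fault_counts(valid, n_periods=3))
```

The cumulative counts per period are the kind of series that a mean value function \(\mu (t)\) is later fitted to.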
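For orientation, the comparison criteria can be computed as in the sketch below. The formulas are commonly used textbook definitions and are an assumption here: the exact definitions in Table 2 may differ (for example, in the divisor of MSE or in how RMSE is defined), and the function names and sample numbers are hypothetical.

```python
import math

def comparison_criteria(observed, predicted):
    """Common goodness-of-fit criteria for cumulative fault counts (assumed definitions)."""
    n = len(observed)
    residuals = [p - o for o, p in zip(observed, predicted)]
    sse = sum(r * r for r in residuals)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    mse = sse / n
    return {
        "MSE": mse,                      # mean square error
        "RMSE": math.sqrt(mse),          # taken here as the square root of MSE
        "R2": 1.0 - sse / ss_tot,        # coefficient of determination
        "Bias": sum(residuals) / n,      # mean signed deviation
        "TS": 100.0 * math.sqrt(sse / sum(o * o for o in observed)),  # Theil statistic, in percent
    }

def aic(log_likelihood, n_params):
    """Akaike information criterion: -2 ln L + 2k."""
    return -2.0 * log_likelihood + 2.0 * n_params

# Illustrative numbers only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
criteria = comparison_criteria(observed, predicted)
for name, value in criteria.items():
    print(name, round(value, 4))
```

Lower MSE, RMSE, TS, Bias and AIC values and higher R^{2} values indicate a better model, which is how the tables in the following sections are read.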
A parameter estimation method of the proposed model
In general, there are two common methods for estimating model parameters: maximum likelihood estimation (MLE) and least square estimation (LSE). There are two reasons for considering LSE. (1) Although MLE is superior to LSE for large samples, LSE is not inferior to MLE for small samples, and because test time is limited, the collected fault data sets are generally small samples. (2) A maximum of the likelihood function may not exist, in which case the model parameters cannot be estimated by MLE [8]. To estimate the model parameters as fully as possible for model comparison, we use both LSE and MLE in this paper.
The least square estimation (LSE) method can be written as follows,
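where \(\mu (t_{i} )\), \(\mu_{i}\) and \(\kappa\) represent the estimated cumulative number of detected faults by time \(t_{i}\), the actual observed number of faults and the sample size, respectively.
The parameter estimates are obtained by taking the partial derivatives of Eq. (12) with respect to each model parameter and setting them to zero.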
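A minimal least squares sketch is given below. Because the closed form of the proposed model is not reproduced here, the G–O mean value function \(a(1 - e^{-bt})\) is used as a stand-in; the data are synthetic and the function names are hypothetical. For a fixed b the optimal a has a closed form, so the search reduces to one dimension, which is done with a coarse grid followed by golden-section refinement.

```python
import math

def go_mean(t, a, b):
    """G-O mean value function, used here only as a stand-in model."""
    return a * (1.0 - math.exp(-b * t))

def profile_sse(b, times, observed):
    """For fixed b, the least squares a is closed-form; return (SSE, a)."""
    g = [1.0 - math.exp(-b * t) for t in times]
    a = sum(y * gi for y, gi in zip(observed, g)) / sum(gi * gi for gi in g)
    sse = sum((a * gi - y) ** 2 for y, gi in zip(observed, g))
    return sse, a

def fit_lse(times, observed, b_low=0.001, b_high=2.0, grid=400, iters=100):
    """Minimize the sum of squared errors over b (grid search, then golden section)."""
    step = (b_high - b_low) / grid
    candidates = [b_low + i * step for i in range(grid + 1)]
    best = min(candidates, key=lambda b: profile_sse(b, times, observed)[0])
    lo, hi = max(b_low, best - step), min(b_high, best + step)
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        x1 = hi - ratio * (hi - lo)
        x2 = lo + ratio * (hi - lo)
        if profile_sse(x1, times, observed)[0] < profile_sse(x2, times, observed)[0]:
            hi = x2
        else:
            lo = x1
    b_hat = (lo + hi) / 2.0
    sse, a_hat = profile_sse(b_hat, times, observed)
    return a_hat, b_hat, sse

# Synthetic, noise-free data generated from known parameters (illustration only).
times = list(range(1, 16))
observed = [go_mean(t, 100.0, 0.3) for t in times]
a_hat, b_hat, sse = fit_lse(times, observed)
print("a =", a_hat, "b =", b_hat, "SSE =", sse)
```

For a model with more parameters, such as the one proposed here, a general-purpose nonlinear least squares routine would normally replace the one-dimensional search.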
The maximum likelihood estimation (MLE) method can be denoted as follows,
where \(N(t_{i} )\) and \(n_{i}\) denote a counting process and the number of actual detected faults, respectively.
The model parameter values can be estimated by taking partial derivatives of both sides of Eq. (14),
By solving the differential Eqs. (9,11), the estimated parameter values \((\alpha^{*}, \beta^{*}, \eta^{*}, \theta^{*}, d^{*})\) of the developed model can be obtained using LSE and MLE, respectively.
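The likelihood side can be sketched in the same spirit. The code below evaluates the standard grouped-data NHPP log-likelihood, in which the number of faults in each interval is Poisson with mean \(\mu (t_{i} ) - \mu (t_{i - 1} )\), and derives AIC from it. It is an assumption that Eq. (14) has this form; the G–O mean value function is again only a stand-in, and the data are synthetic.

```python
import math

def go_mean(t, a, b):
    """G-O mean value function, used here only as a stand-in model."""
    return a * (1.0 - math.exp(-b * t))

def nhpp_log_likelihood(times, cumulative, mean_fn):
    """Grouped-data NHPP log-likelihood with mu(0) = 0 and n_0 = 0."""
    ll, prev_mean, prev_n = 0.0, 0.0, 0.0
    for t, n in zip(times, cumulative):
        m = mean_fn(t)
        expected = m - prev_mean   # expected faults in this interval
        count = n - prev_n         # observed faults in this interval
        ll += count * math.log(expected) - expected - math.lgamma(count + 1.0)
        prev_mean, prev_n = m, n
    return ll

def aic(log_likelihood, n_params):
    """Akaike information criterion: -2 ln L + 2k."""
    return -2.0 * log_likelihood + 2.0 * n_params

# Synthetic cumulative counts generated from known parameters (illustration only).
times = list(range(1, 11))
cumulative = [go_mean(t, 100.0, 0.3) for t in times]

ll_true = nhpp_log_likelihood(times, cumulative, lambda t: go_mean(t, 100.0, 0.3))
ll_slow = nhpp_log_likelihood(times, cumulative, lambda t: go_mean(t, 100.0, 0.2))
ll_big = nhpp_log_likelihood(times, cumulative, lambda t: go_mean(t, 120.0, 0.3))
print("log-likelihood at generating parameters:", ll_true)
print("AIC with two parameters:", aic(ll_true, 2))
```

Maximizing this function over the parameters gives the MLE; when no finite maximizer exists, as reported later for one comparison model, MLE cannot be used for that model.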
Model performance comparison for model parameters estimation using LSE
In terms of goodness-of-fit, Tables 4, 5, 6, 7, 8 and 9 show that the developed model fits best among all models. In Table 4, the MSE of the G–O model is nearly 2.3 times that of the developed model using 100% of the data for DS11. Table 5 shows that the RMSE of the Weibull distribution model is nearly 1.74 times that of the developed model using 100% of the data for DS12. Table 6 indicates that the MSE of the Li model is about 1.44 times that of the developed model using 100% of the data for DS13. In Table 7, although the R^{2} of the Weibull distribution model is approximately equal to that of the proposed model, its other metrics (i.e. MSE, RMSE, TS and Bias) are larger than those of the developed model using 100% of the data for DS21. Similarly, in Table 8, the R^{2} of the developed model is approximately the same as that of the Weibull distribution model, but the MSE, RMSE, TS and Bias of the developed model are lower using 100% of the data for DS22. From Table 9, we can see that the MSE of the GISS model is about 1.82 times that of the developed model using 100% of the data for DS23.
On the whole, the MSE, RMSE, TS and Bias values of the developed model are lower than those of the other models using 100% of DS1 and DS2, respectively. Moreover, the R^{2} values of the developed model are larger than those of the other models using 100% of DS1 and DS2, respectively. Thus, the established model has better fitting power than the other models, which can also be seen clearly in Fig. 2a–f.
In terms of prediction, Tables 4, 5, 6, 7, 8 and 9 show that the predictive power of the established model is the best among all models. Table 4 shows that the MSE_{predict} of the GISS model is nearly 6.69 times that of the developed model using 90% of the data for DS11. From Table 5, we can see that the RMSE of the Li model is approximately 1.29 times that of the proposed model using 90% of the data for DS12. Table 6 shows that the TS of the GISS model is about 2.3 times that of the developed model using 90% of the data for DS13. Table 7 shows that the RMSE of the G–O model is about 1.24 times that of the developed model using 90% of the data for DS21. Table 8 shows that the MSE_{predict} of the Weibull distribution model is approximately 3.5 times that of the developed model using 90% of the data for DS22. From Table 9, we can see that the TS of the GISS model is about three times that of the developed model using 95% of the data for DS23.
In Tables 4, 5, 6, 7, 8 and 9, the MSE_{predict}, RMSE, TS and Bias values of the developed model are lower than those of the other models using 90% of DS1, DS21 and DS22, respectively. Furthermore, the MSE_{predict}, RMSE, TS and Bias values of the developed model are lower than those of the other models using 95% of DS23. Therefore, the predictive power of the developed model is better than that of the other models. Fig. 3a–f also shows clearly that the predictive power of the developed model is the best among the models used in this paper. Note that 95% of the fault data set DS23 is used mainly because a random selection of the fault data proportion compares the power of the models more fairly than a subjective selection.
Overall, the fitting and predictive performance of the Weibull distribution model and the GISS model is better than that of the other models, except for the developed model. This also verifies that the CSS reliability model based on the Weibull distribution can be used to assess the reliability of OSS [10]. However, the OSS reliability models, namely the Wang model and the Li model, do not perform well. This further illustrates the complexity of OSS, especially the different forms that fault introduction takes in different open source development environments. Because fault introduction in the developed model can take many forms, the proposed model matches the changes of OSS fault introduction well and adapts better than the other models used in this paper. Therefore, the developed model has good adaptability and robustness, and can help developers assess actual OSS reliability during development and testing.
Model performance comparison for model parameters estimation using MLE
To compare AIC values, we use the third fault data set to compare model performance, with the model parameter values estimated by MLE. From Table 10, the MSE and AIC values of the developed model are lower than those of the other models, and its R^{2} values are higher and exceed 0.9. These results show that the fitting power of the developed model is better than that of the other models. Except for DS32, the MSE, R^{2} and AIC values of the other models are very close to one another. In DS32, the AIC value of the G–O model is close to that of the proposed model, and the R^{2} value of the Weibull distribution model is close to that of the developed model.
From Table 11, we can see that the MSE and AIC values of the proposed model are lower than those of the other models. In DS31, the AIC value of the G–O model is close to that of the developed model. In DS32, the MSE value of the Li model and the AIC value of the Weibull distribution model are close to those of the developed model. In DS33, the MSE values of the other models are very close to one another and larger than that of the developed model.
In summary, the above comparison shows that the fitting and prediction performance of the developed model is better than that of the other models, whose fitting and prediction power is not stable. This shows that the developed model has good adaptability and flexibility in the complex environment of OSS testing and development, while the other models do not work well in such an environment. Finally, when MLE is used to estimate the parameters of the Wang model, no maximum likelihood solution exists, so we do not include that model in Tables 10 and 11.
Sensitivity analysis
Sensitivity analysis is carried out by changing one parameter while fixing the other parameters of the proposed model. Fig. 4 shows the effect of these parameter changes on the developed model. The expected number of initially detected faults \(\left( \eta \right)\), the fault detection rate \(\left( \theta \right)\), the fault introduction rate \(\left( \alpha \right)\) and the shape parameter (d) have a major impact on the proposed model.
The main reasons are as follows:

(1)
The number of faults in OSS has an important influence on OSS reliability. Therefore, estimating the total number of faults in OSS is the basis of establishing the OSS reliability model.

(2)
The change of fault detection rate has an important influence on the estimation of the number of detected faults in the OSS.

(3)
Fault introduction is a problem that must be considered in reliability modeling of OSS. The change of the fault introduction rate directly affects the power of the OSS reliability model.

(4)
Owing to the dynamics, complexity and diversity of OSS development, the shape parameter of the proposed model must be considered to ensure the robustness and stability of the developed model.
In addition, parameter \(\beta\), also called the inflection factor, is not an important parameter. Owing to the diversity and complexity of OSS testing and development environments, the cumulative number curve of introduced faults can show a variety of complex changes and is not necessarily S-shaped. From this point of view, the fault introduction changes of OSS and CSS are different.
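The one-at-a-time procedure used in this section can be sketched generically. The helper below perturbs each parameter by a relative amount while holding the others fixed and records the relative change in the model output. The G–O mean value function and its parameter values are stand-ins chosen for illustration; they are not the proposed model, and the results say nothing about the parameters in Fig. 4.

```python
import math

def go_mean(t, params):
    """Stand-in model: G-O mean value function with parameters a and b."""
    return params["a"] * (1.0 - math.exp(-params["b"] * t))

def one_at_a_time(model, base_params, t, rel_changes=(-0.2, -0.1, 0.1, 0.2)):
    """Relative change in model(t) when each parameter is perturbed alone."""
    baseline = model(t, base_params)
    table = {}
    for name in base_params:
        row = []
        for change in rel_changes:
            perturbed = dict(base_params)           # copy so other parameters stay fixed
            perturbed[name] = base_params[name] * (1.0 + change)
            row.append((model(t, perturbed) - baseline) / baseline)
        table[name] = row
    return table

# Illustrative parameter values only.
table = one_at_a_time(go_mean, {"a": 100.0, "b": 0.3}, t=5.0)
for name, row in table.items():
    print(name, [round(x, 4) for x in row])
```

A parameter whose row stays close to zero has little influence on the output at that time point, which is the sense in which a parameter is judged unimportant in a sensitivity analysis of this kind.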
Implication of the study
The implications of this study are as follows:

(1)
Because OSS is mainly completed by users and volunteers in the open source community, the dynamic changes of users and volunteers make OSS testing and development complex and uncertain. In view of this complexity, we conduct corresponding research and propose an effective OSS reliability model. This study lays a foundation for OSS reliability modeling under complex environments.

(2)
Owing to the uncertainty and complexity of OSS testing and development, fault introduction in the OSS debugging process shows a variety of changes. This study examines many such changes of fault introduction in the OSS testing and development process. It also proposes that fault introduction obeys the GISS distribution, which lays a foundation for modeling various changes of fault introduction in the future.

(3)
Existing OSS reliability models rely too heavily on a specific OSS development and testing environment, so their applicability and adaptability are poor. The model proposed in this study takes the complicated changes of the OSS testing and development environment into account, and an OSS reliability model with the GISS distribution is established. Because fault introduction obeys the GISS distribution, the model can simulate various changes of actual fault introduction well. Therefore, the developed model can be used effectively for reliability evaluation in actual OSS testing and development. In a sense, the modeling process of the developed model also provides methods and guidance for further research on OSS reliability.
Threats to validity
Threats to the validity of the developed model fall into two groups: external factors and internal factors.
External factors: To evaluate the performance of an OSS reliability model, more kinds of fault data sets should be selected. In this study, we selected three OSS fault data sets to evaluate the performance of the software reliability models. Three OSS fault data sets meet the basic experimental requirements.
Internal factors: We used Taylor's formula to simplify the equation and gave an approximate solution. To simplify the calculation, we also assumed that the fault detection rate is constant. Owing to the complexity of OSS reliability modeling, simplifying the calculation appropriately is beneficial. Moreover, OSS reliability modeling is a tradeoff between model complexity and practical, simplified use.
Conclusions
In this paper, we develop an OSS reliability model in which fault introduction obeys the GISS distribution. Because fault introduction obeys the GISS distribution, the fault introduction rate can follow complex nonlinear changes, such as decreasing over time, or increasing first and then decreasing. Therefore, the developed model can adapt to a variety of OSS reliability evaluation scenarios. To verify the adaptability and robustness of the developed model, we carry out experiments with three OSS fault data sets, six model comparison criteria and five comparison models. The experimental results show that the developed model has the best fitting and prediction performance among all models. We also carry out a sensitivity analysis of the model parameters, and the results show that the expected number of initially detected faults \(\left( \eta \right)\), the fault detection rate \(\left( \theta \right)\), the fault introduction rate \(\left( \alpha \right)\) and the shape parameter (d) of the developed model have an important impact. The developed model can be used to assess OSS reliability, and it can also assist developers in evaluating OSS reliability.
Because OSS testing, debugging and development change in complex and dynamic ways, in future work we will consider various random changes of fault introduction to develop the corresponding OSS reliability model.
Acknowledgements
We would like to thank the anonymous reviewers and the associate editor for their valuable comments. This work is supported by the Fundamental Research Program of Shanxi Province of China and the Natural Science Foundation of Shandong Province of China under Grant Nos. 201801D121120 and ZR2021MF067.
Funding
Funding was provided by Fundamental Research Program of Shanxi Province of China (Grant No. 201801D121120) and Natural Science Foundation of Shandong Province of China (Grant No. ZR2021MF067).
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest. The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Appendix
Substituting Eq. (17) into Eq. (16),
The general solution of differential Eq. (19) can be expressed as,
Substituting Eq. (18) into Eq. (20),
Suppose \(y(t) = \int {\theta \left( {1 - \frac{{\alpha d\,t^{d - 1} }}{{1 + \beta \exp \left( { - \alpha t^{d} } \right)}}} \right)} {\text{d}}t\).
Using Taylor’s formula to expand the following equation,
Then
where C is a constant. When \(t = 0\), \(\mu (t) = 0\).
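As a numerical sanity check on the integral \(y(t)\) defined above, the following sketch compares Simpson's-rule integration with the antiderivative obtained from the substitution \(u = \alpha t^{d}\), namely \(y(t) = \theta t - \theta \ln \left( \exp (\alpha t^{d}) + \beta \right) + C\). It reads the integrand as \(\theta \left( 1 - \alpha d\,t^{d-1} / (1 + \beta \exp(-\alpha t^{d})) \right)\) with constant \(\theta\). It is offered only as a check on this one integral, not as a reconstruction of the Taylor expansion step; the parameter values and the function names are hypothetical, and the constant is fixed here by assuming \(y(0) = 0\).

```python
import math


def integrand(t, theta, alpha, beta, d):
    """theta * (1 - alpha*d*t^(d-1) / (1 + beta*exp(-alpha*t^d)))."""
    ratio = alpha * d * t ** (d - 1) / (1.0 + beta * math.exp(-alpha * t ** d))
    return theta * (1.0 - ratio)


def y_numeric(t, theta, alpha, beta, d, n=2000):
    """Integrate the integrand over [0, t] with composite Simpson's rule (n even)."""
    h = t / n
    total = integrand(0.0, theta, alpha, beta, d) + integrand(t, theta, alpha, beta, d)
    for i in range(1, n):
        weight = 4 if i % 2 else 2
        total += weight * integrand(i * h, theta, alpha, beta, d)
    return total * h / 3.0


def y_closed(t, theta, alpha, beta, d):
    """Antiderivative from u = alpha*t^d, with the constant chosen so that y(0) = 0."""
    return theta * (t - math.log((math.exp(alpha * t ** d) + beta) / (1.0 + beta)))


# Hypothetical parameter values, chosen only for illustration (d >= 1 keeps
# the integrand finite at t = 0).
theta, alpha, beta, d = 0.1, 0.02, 2.0, 2.0
gap = abs(y_numeric(10.0, theta, alpha, beta, d) - y_closed(10.0, theta, alpha, beta, d))
print("difference between numerical and closed form:", gap)
```

If the two values agree to within the integration error, the closed form can be used in place of numerical quadrature when evaluating \(y(t)\) for these parameter ranges.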
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Wang, J., Zhang, C. Reliability model of open source software considering fault introduction with generalized inflection S-shaped distribution. SN Appl. Sci. 4, 244 (2022). https://doi.org/10.1007/s42452-022-05125-6
Keywords
 Open source software (OSS)
 Software reliability model
 Fault introduction
 Generalized inflection S-shaped (GISS) distribution