Classical and Bayesian inference for Burr type-III distribution based on progressive type-II hybrid censored data

The aim of this paper is to discuss the estimation and prediction problems for the Burr type-III distribution under progressive type-II hybrid censored data. We obtained maximum likelihood estimators (MLEs) of unknown parameters using stochastic expectation maximization (SEM) algorithms, and the asymptotic variance–covariance matrix of the MLEs under SEM framework is obtained by Fisher’s information matrix. We provide various Bayes estimators for unknown parameters using Lindley’s approximation method and importance sampling technique from square error, entropy, and linex loss functions. Finally, we analyze a real data set and generate a simulation study to compare the performance of various proposed estimators and predictors under different situations.


Introduction
In many lifetime and reliability studies, the experimenter may not always obtain complete information on failure times for all experimental units. The experimenters remove some unit from the experiment, or in some cases, it may happen that some units are unintentionally lost from the experiment. So, the censored data obtained from such experiments and censoring occur commonly. Type-I censoring and type-II censoring are the most common censoring schemes [33]. One important characteristic of these two censoring schemes is that they do not allow for units to be removed from the test at any point other than the final termination point. The mixture of type-I and type-II censoring schemes is known as the hybrid censoring scheme which was firstly introduced by Epstein [16], and it becomes quite popular in reliability and life testing experiments. A lot of work has been done on hybrid censoring and many different variations in it (see [6,11,14,21,22]). For example, Fairbanks et al. [17] introduced the type-I hybrid censoring (HCS) and considered the special case when the lifetime distribution is exponential. The main disadvantage of type-I HCS is that most of the inferential results need to be developed in this case under the condition that the number of observed failures is at least one; moreover, there may be very few failures occurring up to the pre-fixed time, which results in low efficiency of the estimator(s) of the model parameter(s). For this reason, Childs et al. [11] introduced an alternative hybrid censoring scheme that would terminate the experiment at the random time T * = max{X m∶m∶n , T} . This hybrid censoring scheme is called type-II hybrid censoring scheme (type-II HCS), and it has the advantage of guaranteeing at least m failures to be observed by the end of the experiment. If m failures occur before time T, then the experiment would continue up to time T which may end up yielding possibly more than m failures in the data. On the other hand, if the mth failure does not occur before time T, then the experiment would continue until the time when the mth failure occurs in which case we would observe exactly m failures in the data. Hybrid censoring schemes have been introduced in the context of progressive censoring as well. Kundu and Joarder [23] discussed type-II progressive hybrid censoring scheme (Type-II PHCS). Type-II PHCS overcomes the drawback of the type-I PHCS that the maximum likelihood (ML) may not always exist. In brief, progressive type-II hybrid censoring scheme can be described as follows. Consider n identical, independent units with distinct distributions which are placed in a lifetime test, each random variable X 1∶m∶n , X 2∶m∶n ,...,X m∶m∶n is identically distributed, with p.d.f f (x; ) and c.d.f F(x; ) , where denotes the vector of parameters ( , ) . The correct value of m (m < n) is a random variable. Suppose R 1 , R 2 , … , R m are fixed before the start of the experiment called progressive censoring scheme with R j > 0 and ∑ m j=1 R j + m = n is specified in the experiment. Under the type-II progressive censoring scheme, at the time of the first failure, X 1∶m∶n , R 1 of the n − 1 surviving units are randomly withdrawn from the life test, then at the time of the second failure X 2∶m∶n , R 2 of then n − R 1 − 2 surviving units are withdrawn, and so on, and finally at the time of mththe failure X m∶m∶n all R m = n − R 1 − R 2 − ⋯ − R m−1 − m surviving units are withdrawn from the life test. Since R ′ i s are pre-fixed, let us denote these failure times X 1∶m∶n ≤ X 2∶m∶n ≤ ⋯ ≤ X m∶m∶n , though their distributions depend on R ′ i s. The type-II PHCS involves the termination of the life test at time T * = max{X m∶m∶n , T} , and let D denote the number of failures that occur before time T, and d shows the observed value of D. Then, if X m∶n > T , the experiment would terminate at the mthfailure, with the withdrawal of units occurring after each failure according to the pre-fixed progressive censoring scheme (R 1 , R 2 , … , R m ) . However, if X m∶n < T , then instead of terminating the experiment by removing all remaining R m units after the mth failure, the experiment would continue to observe failures without any further withdrawals up to time T. Thus, in this case, we have R m = R m+1 = ⋯ = R D = 0,and the resulting failures times are indicated by X 1∶m∶n , X 2∶m∶n , … , X m∶m∶n , X m+1∶n , … , X d∶n . We denote the two cases as Case I and Case II, respectively: In this paper, we consider the estimation and prediction of unknown parameters of Burr type-III distribution under progressive type-II hybrid censored sample from both classical and Bayesian perspectives. Also, we provide predictive estimates and intervals for future unknown observable value based on some priori information. The p.d.f and c.d.f of a random variable X following Burr type-III distribution have the following form (1) where and are the shape parameters of the distribution. Burr [7] introduced twelve cumulative distribution functions for fitting various failure lifetime data. Among these distributions, Burr type-III distribution can accommodate different hazard lifetime data, so it has received considerable attention in the recent past. Also, it can properly approximate many well-known distributions such as Weibull, gamma, and log-normal for fitting lifetime data. This makes it worthwhile model to these distributions. Inferences for the parameters and of Burr( , ) distribution have been investigated by many researchers such as [3,4,9,26,28,31,32,37]. We have organized the rest of this paper as follows. The MLEs of unknown parameters are discussed in "Maximum likelihood estimation" section. We have proposed SEM algorithm for this purpose. The asymptotic confidence intervals are also constructed by using fisher's information matrix in "Fisher's information matrix" section. Bayes estimators are obtained with respect to loss functions in "Bayesian estimation" section. In "Data analysis" section, a real-life data set is analyzed to illustrate the proposed statistical methods, and also Monte Carlo simulations have been performed for comparison purposes in "Simulation study" section. Finally, a conclusion is given in "Conclusion" section.

Maximum likelihood estimation
In this section, we derive MLEs of the unknown parameters of the Burr type-III distribution when the lifetime data are observed under progressive hybrid type-II censoring. Suppose that x = (x (1) , x (2) , … , x (m) ) from Case I and x c = (x (1) , x (2) , … , x (m∶∶m∶n) , … , x (D∶n) ) from Case II are observed sample from Burr( , ) distribution under progressive hybrid type-II censoring scheme. Then, the likelihood function of ( , ) given the observed data x can be written as: where and (3) Case II: and f (.; , ) and F(.; , ) are as defined in Eqs. (1) and (2). Now using the associated log-likelihood function l = ln L( , | x) , ML estimates of and can be obtained on simultaneously solving the partial derivatives equations of l with respect to and . In most cases, the estimators do not admit explicit expressions and some numerical procedures such as Newton-Raphson method (NR) have to be used to determine the estimators. NR is a direct approach for obtaining the MLEs by maximizing the likelihood function and a well-known numerical algorithm for finding the root of a function or equation [39]. It involves calculation of the first and second derivatives of the observed log-likelihood with respect to the parameters. However, when employing NR

Expectation-Maximization (EM) algorithm
The EM algorithm was introduced an iterative procedure to compute the MLEs in the presence of missing data and consists of an expectation (E) step and a maximization (M) step by Dempster et al. [12]. Dealing with hybrid censored observations, the problem finding MLEs of unknown parameters with the Burr type-XII model can be viewed as an incomplete data problem [30]. Under progressive hybrid type-II censoring, X (i) shows at the time of ith failure, and R i shows the number of units which are removed from the experiment. Now suppose that denote a vector of those R i and Ŕ D number of censored units. Then, the total censored data can be viewed as Z = (Z 1 , Z 2 , … , Z m ) and Ź = (Ź 1 ,Ź 2 , … ,Ź́R D ) . Now the complete sample of n number of units can be seen as a combination of the observed data and the censored data, that is, W = (X, Z,Ź) . Consequently, the log-likelihood function of the complete data set can be written as: Now, by taking the partial derivatives with respect to and and equating them to zero, we get the following expressions method, some errors can be occur, such as time-consuming depending on the size of your system, or may fail to convergence. So, the traditional NR method or a numerical technique that can be used to solve the associated log-likelihood partial derivative equations as a closed form for the MLEs does not exist. Here we use the expectation-maximization algorithm for this purpose, see [12]. The main advantage of this algorithm is that: it is more reliable particularly dealing with censored data. Note that the Case I has been discussed by Singh et al. [15]. So, our aim in this paper is that we propose the classical and Bayesian estimation for Case II; they are given in the next sections.
The above two equations need to be solved simultaneously to obtain the ML estimates of ( , ) . EM algorithm consists of two steps. In E-step, expression of the censored observations is replaced by their respective expected values. Further, M-step involves the maximization of E-step. Now suppose that at the kth stage in E-step estimate of ( , ) is ( (k) , (k) ) ; then, after using the M-step it can be shown that the (k + 1) th stage updated estimators are of the form where (8) Notice that the iterative procedure in M-step can be terminated after the convergence is achieved, that is, when we have | (k+1) − (k) | + | (k+1) − (k) | ≤ for some given > 0 . However, one of the biggest disadvantages of EM algorithm is that it is only a local optimization procedure and can easily get stuck in a saddle point [40], in particular with high-dimensional data or increasing complexity for censored and lifetime models. A possible solution to overcome the computational inefficiencies is to invoke stochastic EM algorithm suggested by Celeux and Diebolt [8,19,29,38]. It can be seen that the above EM expressions E si ( , ), s = 1, 2, 3, 4, 5, 6 do not turn out to have closed form, and therefore, one needs to compute these expressions numerically. So, we used the stochastic expectation maximization (SEM) algorithm to obtain ML estimators.

SEM algorithm
In this section, the SEM algorithm is used in computing MLEs of unknown parameters. Diebolt and Celeux [13] proposed a stochastic version of EM algorithm that replaces the E-step by stochastic (S-step) and execute it by simulation. Thus, SEM algorithm completes the observed sample by replacing each missing information by a value randomly drawn from the distribution conditional on results from the previous step. The algorithm has been shown to be computationally less burdensome and more appropriate than the EM algorithm in a lot of problems, see [2,13,35]. Recently, Zhang et al. [40] considered SEM algorithm to obtain ML estimates for unknown parameters of various models when the data are observed under progressive type-II censoring; also they compared the results with EM algorithm. Next, we use the same idea to generate the independent R i number of samples of z ij , i = 1, 2, … , m and j = 1, 2, … , R i and Subsequently, the ML estimators of and at the (k + 1) th stage are given by Further, the above iterative procedure can be terminated after the convergence is achieved.

Fisher's information matrix
In this section, we present the observed fisher's information matrix obtained by using the missing value principle of Louis [27]. The observed fisher's information matrix can be used to construct the asymptotic confidence intervals. The idea of missing information principle is as follows: This section deals with obtaining fisher's information matrix which will be further used to compute interval estimates for the unknown parameters of the Burr type-III distribution. The asymptotic variance-covariance matrix of the MLEs of ( , ) can be obtained by inverting the observed information matrix and is given by Next, we use the SEM algorithm to compute observed information matrix. We first generate the censored observations z ij using Monte Carlo simulation from the conditional density as discussed in the previous section. Subsequently, the asymptotic variance-covariance matrix of the MLEs of ( , ) can be obtained as where l * = ln L(w; , ) given in (5). Further, the involved expressions are given by Finally, in all the three cases using the idea of large sample approximation the two-sided 100(1 − )%, 0 < < 1 , asymptotic confidence intervals for and , respectively, can be obtained as ̂± z 2 √ Var(̂) and ̂± z 2 √ Var(̂) . Here z 2 is the upper 2 th percentile of the standard normal distribution. Notice that, here ̂ and ̂ represent the ML estimates of and , obtained by NR, EM algorithm and SEM algorithm in the previous section.

Bayesian estimation
Suppose that a sample x = (x (1) , x (2) , … , x (m) ) is observed from Burr type-III distribution under progressive hybrid type-II censoring scheme R = (R 1 , R 2 , … , R m ) . We assume an independent gamma priors for the unknown parameters and having pdfs of the following form: Here a, b, c and d are the hyper-parameters and are responsible for providing information about the unknown parameters. Now, the joint posterior density of ( , ) can be written as ( , ) = ( ) ( ) . Further, the joint posterior density of ( , ) given the observed data x is obtained as where K is the normalizing constant given by In Bayesian estimation, selection of loss function plays an important role. The most commonly used loss function is square error loss, given by SL (g( ),ĝ( )) = (g( ) −ĝ( )) 2 ; here ĝ( ) represent an estimate of g( ) . Bayes estimator of g( ) under the square error loss is the posterior mean given by It is to be noticed that square error loss is a symmetric loss function and it puts an equal weight to the under estimation and overestimation. In many practical situations, underestimation may be more serious than the overestimation, and vice versa. In such cases, asymmetric loss function can be taken into account. So, we considered linex loss function which was proposed by Varian [36] and is given by where ≠ 0 is a shape parameter. Notice that > 0 suggests the case when overestimation is more serious than underestimation, and vice versa for < 0 . The Bayes estimator of g( ) under linex loss is given by We further consider entropy loss function, given by Here w > 0 suggests the case when overestimation is more serious than underestimation, and vice versa for w < 0 . The Bayes estimator of g( ) under entropy loss function is given by Notice that for w = −1 , Bayes estimator of g( ) under entropy loss function coincides with Bayes estimator of g( ) under square error loss function. Now, observe that Bayesian estimators given by (17), (18) and (19) do not admit closedform expressions. Therefore, in the next section we use the (18) approximation method of Lindley [25] and importance sampling technique.

Lindley approximation
In this section, we use the method of Lindley to obtain approximate explicit Bayes estimators of and . The Bayesian estimates involve the ratio of two integrals; we consider I(X) defined as: By applying the Lindley's method, I(x) can be approximated as: where (20) Here l(., .) denotes the log-likelihood function, ( , ) denotes the corresponding prior distribution, and ij represents the (i, j)th element of the variance covariance matrix. All the expressions in Eq. (19) are evaluated at the ML estimates. Suppose, we want to estimate under the squared error loss. Then, we take g( , ) = and subsequently observe that g = 1 , g = g = g = g = g = 0 . Consequently, the Bayes estimator of is obtained as: where ij , i, j = 1, 2 are the elements of the variance-covariance matrix of (̂,̂) as reported in "Fisher's information matrix" section. Other involved expressions (evaluated at ( , ) = (̂,̂) ) are reported in "Appendix." In a similar way, Bayes estimator of under linex loss and entropy loss is, respectively, obtained as: And with g( , ) = −w , g = −w −(w+1) , g = w(w + 1) −(w+2) , g = g = g = g = 0 . Similarly, Bayes estimator of under square error, linex and entropy loss functions can easily be obtained. Details are not presented here for the shake of brevity. Further notice that highest posterior density (HPD) interval estimates of and can not be obtained using Lindley approximation. Therefore, we next use importance sampling technique for this purpose.

Importance sampling
Importance sampling is a very useful technique to draw sample from the posterior densities. Observe that posterior density given by (15) can be written as:

and with
Finally, the Bayes estimator of Burr type-III under the linex loss function is given by: where Now consider the following steps to draw samples from the above posterior density.
Now Bayes estimator of under square error and entropy loss functions are, respectively, obtained as: Next we use the method of [10] for computing HPD intervals. Suppose that is the unknown parameter of interest, and ( | x) and Π( | x). respectively, denote its posterior density and posterior distribution functions. If α (p) denotes t h e p t h q u a n t i l e o f . t h e n w e h a v e (p) = inf{ ∶ Π( | x) ≥ p; 0 < p < 1} . It can be observed that for a given * , a simulation consistent estimator of Π( * | x) can be obtained as: To obtain a 100(1 − p)% confidence interval for , we con- the greatest integer less than or equal to u. The interval with the smallest width is treated as the HPD interval. Similarly, the HPD interval for the parameter can be constructed.

Data analysis
In this section, we analyze a real data set for illustration purpose. We consider a data set that represents the strength measured in GPA for single carbon fibers of 10 mm in gauge lengths and is given below. In literature, this data set has been discussed for Burr type-III distribution. In fact authors considered a transformation X = e Y , where Y represent the original lifetime data that fits the generalized logistic distribution, see [1,5]. Notice that the transformed variable X follows Burr type-III distribution. For comparison purpose, we also consider fitting of gamma, Weibull, and generalized exponential distribution. We use the method of negative log-likelihood criterion (NLC), Kolmogorov-Smirnov (KS) test statistics, Akaike's information criterion (AIC) and Bayesian information criterion to judge the goodness of fit as discussed in Farbod and Gasparian [18]. The results are shown in Table 1. It can be clearly verified that the Burr type-III distribution has better fit to this data set based on the minimum KS test statistics and NLC. Also, from Fig. 1 we observe that the Burr type-III has a good fit for these data. Then, we generate the progressive hybrid type-II censored sample with different schemes as shown in Table 2 when T = 65 . The values of SEM and NR estimates are also given in Table 2. Also, the Bayes estimates based on Lindley and Markov Chain Monte Carlo (MCMC) methods are given in Tables 3 and 4.

Simulation study
In this section, we conduct a Monte Carlo simulation study to compare the performance of proposed estimators. We simulate the data from Burr (0.5, 1.5) distribution under various progressive hybrid type-II censoring schemes for different combinations of (n, m). For each case, we obtain ML estimates of and using NR method which was introduced in [24] and SEM algorithm. We mention that all the values are based on 5000 Monte Carlo simulations. Further, in the tables we denote the censoring schemes like (5, 0, 0, 0) as (5, 0 * 3 ) for convenience. All the average estimates and means square error (MSE) values of and are reported in Tables 5 and 6. It is seen the table SEM estimates are better than NR estimates in the sense of having lower MSE values. The 95% asymptotic and Bayesian confidence intervals are also included in Tables 4 and 5. It can be observed that the both Bayesian and asymptotic confidence intervals generally have less lengths. Tables 6 and 7 show the Bayes estimates based on Lindley approximation method. Also, Tables 8 and 9 show the Bayes estimates based on MCMC method that discussed in Rizzo [33] and Givens and Hoeting [20]. For both methods, we used the non-informative prior distribution by setting a = b = c = d = 0 . The results show that both methods have Acknowledgements The authors would like to thank the anonymous referees for their constructive comments that led to put many details on the paper and substantially improved the presentation.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creat iveco mmons .org/licen ses/by/4.0/), which permits unrestricted use, Appendix (26)