1 Introduction

The Weibull distribution has assumed a prominent position as a statistical model for data from reliability, engineering and biological studies (McCool 2012). This model has been extensively used for describing hazard rates, an important quantity in survival analysis. For monotone hazard rates, results in the literature suggest that the Weibull law is a reasonable choice because its density can be negatively or positively skewed. However, this distribution is not a good model for phenomena with non-monotone failure rates, which arise in applications in reliability and biological studies. Thus, extended forms of the Weibull model have been sought in many applied areas. A common solution is to add parameters to a well-defined distribution, which has been recommended as a good methodology for generating more flexible new classes of distributions.

Marshall and Olkin (1997) derived an important method for adding an extra shape parameter to a given baseline model, thus defining an extended distribution. The Marshall and Olkin (“O” for short) transformation furnishes a wide range of behaviors with respect to the baseline distribution. The geometrical and inferential properties of the generated distribution depend on the value of the extra parameter, and these characteristics make the O-generated distributions more flexible. Considering the proportional odds model, Sankaran and Jayakumar (2008) presented a detailed discussion of the physical interpretation of the O family.

This family is related to the odds ratio of the baseline distribution. Let X be an O-distributed random variable describing the lifetime of each individual in a population with a vector of p covariates $\mathbf z=(z_1,\ldots,z_p)^{\top}$, where $(\cdot)^{\top}$ denotes the transposition operator. Then, the survival function of X is given by

$$
\bar F(x;\mathbf z)=\frac{k(\mathbf z)\,\bar G(x)}{1-[1-k(\mathbf z)]\,\bar G(x)}, \tag{1}
$$

where $k(\mathbf z)=\lambda_G(x)/\lambda_F(x;\mathbf z)$ is a non-negative function that does not depend on the time x, $\lambda_F(x;\mathbf z)=F(x;\mathbf z)/\bar F(x;\mathbf z)$ is the odds of X under the proportional odds model [for a discussion of such modeling, see Sankaran and Jayakumar (2008)] and $\lambda_G(x)=G(x)/\bar G(x)$ represents the odds of an arbitrary baseline distribution.

In this paper, we consider k(z)=δ. Before proceeding, however, it is worth highlighting two important properties of the O transformation (Marshall and Olkin 1997): (i) stability and (ii) geometric extreme stability. Stability means that if the method is applied twice, nothing new is obtained: the result is again an O distribution. Geometric extreme stability refers to the following stochastic behavior. Let $\{X_1,\ldots,X_N\}$ be a random sample from a population with survival function (1) at $k(\mathbf z)=\delta$, and suppose that N has a geometric distribution with probability p and is independent of the $X_i$, i=1,…,N. Then, $U=\min(X_1,\ldots,X_N)$ and $V=\max(X_1,\ldots,X_N)$ again have survival functions of the form (1). For a baseline sample (δ=1), k(z) equals p and $p^{-1}$, respectively. In other words, the O family is closed under geometric minima and maxima.
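The closure under geometric minima and maxima can be checked numerically. The Python sketch below makes two assumptions. First, N takes values in {1, 2, …} with P(N=n)=p(1−p)^(n−1). Second, the $X_i$ follow the baseline distribution (δ=1). Under these assumptions, the survival of U should match (1) with k(z)=p, and that of V should match (1) with k(z)=1/p. The function names are illustrative only.

```python
def mo_survival(gbar: float, k: float) -> float:
    """Survival function (1) with k(z) = k, written in terms of the baseline survival gbar."""
    return k * gbar / (1.0 - (1.0 - k) * gbar)


def geometric_min_survival(gbar: float, p: float, n_terms: int = 5000) -> float:
    """P(min(X_1..X_N) > x) = sum_n P(N = n) * gbar**n, N ~ Geometric(p) on {1, 2, ...} (truncated)."""
    return sum(p * (1.0 - p) ** (n - 1) * gbar ** n for n in range(1, n_terms + 1))


def geometric_max_survival(gbar: float, p: float, n_terms: int = 5000) -> float:
    """P(max(X_1..X_N) > x) = 1 - sum_n P(N = n) * (1 - gbar)**n (truncated)."""
    g = 1.0 - gbar
    return 1.0 - sum(p * (1.0 - p) ** (n - 1) * g ** n for n in range(1, n_terms + 1))
```

For example, `geometric_min_survival(0.5, 0.3)` and `mo_survival(0.5, 0.3)` both evaluate to 0.15/0.65.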

Due to these advantages, many papers have employed the O transformation. In their original work, Marshall and Olkin generalized the exponential and Weibull distributions. Subsequently, the O extension was applied to several well-known distributions: Weibull (Ghitany et al. 2005, Zhang and Xie 2007), Pareto (Ghitany 2005), gamma (Ristić et al. 2007), Lomax (Ghitany et al. 2007) and linear failure-rate (Ghitany and Kotz 2007). More recently, general results have been addressed by Barreto-Souza et al. (2013) and Cordeiro and Lemonte (2013). In this paper, we apply the O generator to the extended Weibull (EW) class of distributions to obtain a new, more flexible family for describing reliability data. The proposed family can also be applied to other fields, including business, environment, informatics and medicine, in the same way as was originally done with the Birnbaum-Saunders and other lifetime distributions.

Let $\bar G(x)=1-G(x)$ and $g(x)=dG(x)/dx$ be the survival and density functions of a continuous random variable Y with baseline cdf G(x). Then, the O extended distribution has survival function given by

$$
\bar F(x;\delta)=\frac{\delta\,\bar G(x)}{1-\bar\delta\,\bar G(x)}=\frac{\delta\,\bar G(x)}{G(x)+\delta\,\bar G(x)},\qquad x\in\mathcal{X}\subseteq\mathbb{R},\ \delta>0, \tag{2}
$$

where $\bar\delta=1-\delta$.

Clearly, δ=1 implies $\bar F(x)=\bar G(x)$. The family (2) has probability density function (pdf) given by

$$
f(x;\delta)=\frac{\delta\,g(x)}{[1-\bar\delta\,\bar G(x)]^{2}},\qquad x\in\mathcal{X}\subseteq\mathbb{R},\ \delta>0.
$$

Its hazard rate function (hrf) becomes

$$
\tau(x;\delta)=\frac{g(x)}{\bar G(x)\,[1-\bar\delta\,\bar G(x)]},\qquad x\in\mathcal{X}\subseteq\mathbb{R},\ \delta>0.
$$
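The survival function, pdf and hrf of the O transformation can be coded directly from a baseline cdf and pdf. The following Python sketch uses a standard exponential baseline purely as an illustrative assumption. The helper name `mo_functions` is hypothetical.

```python
import math
from typing import Callable, Tuple

Func = Callable[[float], float]


def mo_functions(G: Func, g: Func, delta: float) -> Tuple[Func, Func, Func]:
    """Survival (2), pdf and hrf of the Marshall-Olkin transform of a baseline cdf G with pdf g."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    dbar = 1.0 - delta  # the quantity written as \bar{\delta} in the text

    def survival(x: float) -> float:
        gbar = 1.0 - G(x)
        return delta * gbar / (1.0 - dbar * gbar)

    def pdf(x: float) -> float:
        gbar = 1.0 - G(x)
        return delta * g(x) / (1.0 - dbar * gbar) ** 2

    def hrf(x: float) -> float:
        gbar = 1.0 - G(x)
        return g(x) / (gbar * (1.0 - dbar * gbar))

    return survival, pdf, hrf


# Illustrative baseline: standard exponential, G(x) = 1 - exp(-x).
S, f, tau = mo_functions(lambda x: 1.0 - math.exp(-x), lambda x: math.exp(-x), delta=2.5)
```

Returning closures keeps the baseline generic. Any cdf/pdf pair can be passed in, and δ=1 recovers the baseline survival.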

Further, the class of extended Weibull (EW) distributions pioneered by Gurvich et al. (1997) has achieved a prominent position in lifetime models. Its cdf is given by

$$
G(x;\alpha,\xi)=1-\exp[-\alpha H(x;\xi)],\qquad x\in\mathcal{D}\subseteq\mathbb{R}^{+},\ \alpha>0, \tag{3}
$$

where H(x;ξ) is a non-negative monotonically increasing function which depends on the parameter vector ξ. The corresponding pdf is given by

$$
g(x;\alpha,\xi)=\alpha\exp[-\alpha H(x;\xi)]\,h(x;\xi), \tag{4}
$$

where h(x;ξ) is the derivative of H(x;ξ).

Different expressions for H(x;ξ) in Equation (3) define important models such as:

  (i) $H(x;\xi)=x$ gives the exponential distribution;
  (ii) $H(x;\xi)=x^{2}$ leads to the Rayleigh (Burr type-X) distribution;
  (iii) $H(x;\xi)=\log(x/k)$ leads to the Pareto distribution;
  (iv) $H(x;\xi)=\beta^{-1}[\exp(\beta x)-1]$ gives the Gompertz distribution.

In this paper, we derive a new family of distributions by compounding the O and EW classes, in order to provide a “better fit” in certain practical situations. The compounding procedure takes the EW class (3) as the baseline model in Equation (2). The resulting Marshall-Olkin extended Weibull (OEW) family of distributions contains as special cases the models listed in Table 1. That table also gives the corresponding functions H(·;·) and h(·;·) and the parameter vectors.

Table 1 Special models and the corresponding functions H(x;ξ) and h(x;ξ)

The paper unfolds as follows. Section 2 presents the cdf and pdf of the proposed distribution and some expansions for the density function. The main statistical properties of the new family are derived in Section 3. These include the moments, moment generating function (mgf), incomplete moments, quantile function (qf), random number generator, skewness and kurtosis measures, order statistics, mean deviations and average lifetime functions. In Section 4, we derive four measures of information theory: Shannon and Rényi entropies, cross entropy and Kullback-Leibler divergence. The maximum likelihood method for estimating the model parameters is adopted in Section 5. Two special models are studied in some detail in Section 6. In Section 7.1, we perform a simulation study using Monte Carlo experiments to assess the accuracy of the maximum likelihood estimators (MLEs), and in Section 7.2 we present two applications to real data. Conclusions and some future lines of research are addressed in Section 8.

2 The OEW family

The cdf of the new family of distributions is given by

$$
F(x;\delta,\alpha,\xi)=\frac{1-\exp[-\alpha H(x;\xi)]}{1-\bar\delta\exp[-\alpha H(x;\xi)]},\qquad x\in\mathcal{D}, \tag{5}
$$

where α>0 and δ>0. Using (5), we can express its survival function as

$$
\bar F(x;\delta,\alpha,\xi)=\frac{\delta\exp[-\alpha H(x;\xi)]}{1-\bar\delta\exp[-\alpha H(x;\xi)]},\qquad x\in\mathcal{D} \tag{6}
$$

and the associated hrf reduces to

$$
\tau(x;\delta,\alpha,\xi)=\frac{\alpha\,h(x;\xi)}{1-\bar\delta\exp[-\alpha H(x;\xi)]},\qquad x\in\mathcal{D}. \tag{7}
$$

The corresponding pdf is given by

$$
f(x;\delta,\alpha,\xi)=\frac{\delta\alpha\,h(x;\xi)\exp[-\alpha H(x;\xi)]}{\{1-\bar\delta\exp[-\alpha H(x;\xi)]\}^{2}}, \tag{8}
$$

where H(x;ξ) can be any of the functions listed in Table 1.
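Equations (5) to (8) translate directly into code once H and h are supplied as functions. The Python sketch below uses the Gompertz choice (iv) with illustrative parameter values, which are assumptions. It checks numerically that the density integrates to one and that the hrf equals pdf/survival. The `simpson` helper is a plain composite Simpson rule.

```python
import math


def oew(H, h, delta, alpha):
    """cdf (5), survival (6), hrf (7) and pdf (8) of OEW(delta, alpha, xi); xi is absorbed into H and h."""
    dbar = 1.0 - delta

    def cdf(x):
        e = math.exp(-alpha * H(x))
        return (1.0 - e) / (1.0 - dbar * e)

    def sf(x):
        e = math.exp(-alpha * H(x))
        return delta * e / (1.0 - dbar * e)

    def hrf(x):
        e = math.exp(-alpha * H(x))
        return alpha * h(x) / (1.0 - dbar * e)

    def pdf(x):
        e = math.exp(-alpha * H(x))
        return delta * alpha * h(x) * e / (1.0 - dbar * e) ** 2

    return cdf, sf, hrf, pdf


def simpson(func, a, b, n=4000):
    """Composite Simpson rule with an even number n of subintervals."""
    step = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * func(a + i * step)
    return total * step / 3.0


# Gompertz choice (iv) from Section 1; beta = 1.5, delta = 0.4, alpha = 2.0 are illustrative values.
beta = 1.5
cdf, sf, hrf, pdf = oew(lambda x: math.expm1(beta * x) / beta,
                        lambda x: math.exp(beta * x), delta=0.4, alpha=2.0)
```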

Hereafter, let X be a random variable having the OEW pdf (8) with parameters δ, α and ξ, say $X\sim\mathrm{OEW}(\delta,\alpha,\xi)$. Equation (8) extends several distributions that have been studied in the literature.

The O Pareto (Ghitany 2005) is obtained by taking $H(x;\xi)=\log(x/k)$ ($x\ge k$). Further, for $H(x;\xi)=x^{\gamma}$ we obtain the O Weibull (Ghitany et al. 2005, Zhang and Xie 2007). The O Lomax (Ghitany et al. 2007) and O log-logistic distributions are derived from (8) by taking $H(x;\xi)=\log(1+x^{c})$ with c=1 and $H(x;\xi)=\log(1+x^{c})$ with α=1, respectively. For $H(x;\xi)=ax+bx^{2}/2$ and α=1, Equation (8) reduces to the O linear failure rate (Ghitany and Kotz 2007). In the same way, for $H(x;\xi)=\log(1+x^{c})$, we have the O Burr XII (Jayakumar and Mathew 2008). Finally, the O Fréchet (Krishna et al. 2013) is also obtained from Equation (8) by a suitable choice of H(x;ξ). Table 1 displays some useful quantities and the corresponding parameter vectors for special distributions.

Chen and Balakrishnan (1995) proposed a general approximate goodness-of-fit test for the null hypothesis $H_0$: $X_1,\ldots,X_n$ is a random sample from $F(x;\theta)$, where the form of F is known but the p-vector $\theta=(\delta,\alpha,\xi)$ is unknown. This method is based on the Cramér-von Mises (CM) and Anderson-Darling (AD) statistics. In general, the smaller the values of these statistics, the better the fit. In this paper, this methodology is applied to provide goodness-of-fit tests for the distributions under study.

Some results in the following sections can be obtained numerically in standard software such as MAPLE (Garvan 2002), MATLAB (Sigmon and Davis 2002), MATHEMATICA (Wolfram 2003), Ox (Doornik 2007) and R (R Development Core Team 2009). Ox (for academic purposes) and R are freely available at http://www.doornik.com and http://www.r-project.org, respectively. Infinite sums can be evaluated by replacing ∞ with a large positive integer.

2.1 Expansions for the density function

For any positive real number a, and for |z|<1, we have the generalized binomial expansion

$$
(1-z)^{-a}=\sum_{k=0}^{\infty}\frac{(a)_k}{k!}\,z^{k}, \tag{9}
$$

where $(a)_k=\Gamma(a+k)/\Gamma(a)=a(a+1)\cdots(a+k-1)$ is the ascending factorial and Γ(·) is the gamma function. Applying (9) with a=2 to (8), for 0<δ<1, gives

$$
f(x;\delta,\alpha,\xi)=\sum_{j=0}^{\infty}\eta_j\,g(x;(j+1)\alpha,\xi), \tag{10}
$$

where $\eta_j=\delta\,\bar\delta^{\,j}$ and $g(x;(j+1)\alpha,\xi)$ denotes the EW density function with parameters (j+1)α and ξ. Otherwise, for δ>1, after some algebra, we can express (8) as

$$
f(x;\delta,\alpha,\xi)=\frac{g(x;\alpha,\xi)}{\delta\,\{1-(1-1/\delta)\,[1-\exp(-\alpha H(x;\xi))]\}^{2}}. \tag{11}
$$

In this case, we can verify that $|(1-1/\delta)[1-\exp(-\alpha H(x;\xi))]|<1$. Then, applying the expansion (9) to Equation (11) and expanding $[1-\exp(-\alpha H(x;\xi))]^{k}$ by the binomial theorem, we obtain

$$
f(x;\delta,\alpha,\xi)=\sum_{j=0}^{\infty}\nu_j\,g(x;(j+1)\alpha,\xi), \tag{12}
$$

where

$$
\nu_j=\nu_j(\delta)=\frac{(-1)^{j}}{\delta\,(j+1)!}\sum_{k=j}^{\infty}\frac{(k+1)!}{(k-j)!}\left(1-\frac{1}{\delta}\right)^{k}.
$$

We can verify that $\sum_{j=0}^{\infty}\eta_j=\sum_{j=0}^{\infty}\nu_j=1$. Hence, the OEW density function can be expressed as an infinite linear combination of EW densities. Equations (10) and (12) have the same form except for the coefficients, $\eta_j$ in (10) and $\nu_j$ in (12), which depend only on the generator parameter δ. For simplicity, we can write

$$
f(x;\delta,\alpha,\xi)=\sum_{j=0}^{\infty}w_j\,g(x;(j+1)\alpha,\xi), \tag{13}
$$

where

$$
w_j=\begin{cases}\eta_j, & \text{if } 0<\delta<1,\\ \nu_j, & \text{if } \delta>1,\end{cases}
$$

and $\eta_j$ and $\nu_j$ are defined below (10) and (12), respectively. Thus, some mathematical properties of (13) can be obtained directly from the corresponding EW properties. For example, the ordinary, incomplete, inverse and factorial moments and the mgf of X follow immediately from those quantities of the EW distribution.
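For 0<δ<1, the representation (10) can be checked numerically by truncating the series. The Python sketch below assumes the Weibull choice $H(x;\xi)=x^{\gamma}$, $h(x;\xi)=\gamma x^{\gamma-1}$ with illustrative parameter values. The truncation at 200 terms is an arbitrary choice; it is adequate here because the weights $\eta_j$ decay geometrically.

```python
import math


def oew_pdf(x, delta, alpha, H, h):
    """Closed-form OEW density (8)."""
    e = math.exp(-alpha * H(x))
    return delta * alpha * h(x) * e / (1.0 - (1.0 - delta) * e) ** 2


def ew_pdf(x, a, H, h):
    """EW density (4) with scale parameter a, i.e. g(x; a, xi)."""
    return a * math.exp(-a * H(x)) * h(x)


def oew_pdf_mixture(x, delta, alpha, H, h, n_terms=200):
    """Truncated mixture (10): sum_j eta_j g(x; (j+1) alpha, xi), eta_j = delta (1 - delta)^j."""
    if not 0.0 < delta < 1.0:
        raise ValueError("this sketch implements only the 0 < delta < 1 weights eta_j")
    dbar = 1.0 - delta
    return sum(delta * dbar ** j * ew_pdf(x, (j + 1) * alpha, H, h) for j in range(n_terms))


gamma = 1.7  # illustrative Weibull shape
H_weib = lambda x: x ** gamma
h_weib = lambda x: gamma * x ** (gamma - 1.0)
```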

3 General properties

3.1 Moments, generating function and incomplete moments

The nth ordinary moment of X can be obtained from (13) as

$$
E(X^{n})=\sum_{j=0}^{\infty}w_j\,E(Y_j^{\,n}),
$$

where, from now on, $Y_j\sim\mathrm{EW}((j+1)\alpha,\xi)$ denotes a random variable having the EW density function $g(y;(j+1)\alpha,\xi)$.

The mgf and the kth incomplete moment of X follow from (13) as

$$
M_X(t)=E\left(e^{tX}\right)=\sum_{j=0}^{\infty}w_j\,M_j(t)
$$

and

$$
T_k(z)=\int_{-\infty}^{z}x^{k}f(x;\delta,\alpha,\xi)\,dx=\sum_{j=0}^{\infty}w_j\,T_k^{(j)}(z), \tag{14}
$$

where $M_j(t)$ is the mgf of $Y_j$ and $T_k^{(j)}(z)=\int_{-\infty}^{z}x^{k}g(x;(j+1)\alpha,\xi)\,dx$ comes directly from the EW model.
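For the exponential member (H(x)=x, h(x)=1) with 0<δ<1, each $Y_j$ is exponential with rate (j+1)α. This gives $E(Y_j^n)=n!/[(j+1)\alpha]^n$ and $M_j(t)=(j+1)\alpha/[(j+1)\alpha-t]$ for t<α. The Python sketch below is a minimal illustration under these assumptions, with arbitrary parameter values. It compares the series with direct numerical integration.

```python
import math

DELTA, ALPHA = 0.5, 1.2   # illustrative values with 0 < delta < 1 (assumption)
N_TERMS = 400


def weight(j):
    """w_j = eta_j = delta (1 - delta)^j from (10)."""
    return DELTA * (1.0 - DELTA) ** j


def moment_series(n):
    """E(X^n) = sum_j w_j E(Y_j^n), with E(Y_j^n) = n! / rate^n for an exponential Y_j."""
    return sum(weight(j) * math.factorial(n) / ((j + 1) * ALPHA) ** n for j in range(N_TERMS))


def mgf_series(t):
    """M_X(t) = sum_j w_j M_j(t) with M_j(t) = rate / (rate - t); requires t < alpha."""
    if t >= ALPHA:
        raise ValueError("the series requires t < alpha")
    return sum(weight(j) * (j + 1) * ALPHA / ((j + 1) * ALPHA - t) for j in range(N_TERMS))


def pdf(x):
    """O-exponential density, i.e. (8) with H(x) = x and h(x) = 1."""
    e = math.exp(-ALPHA * x)
    return DELTA * ALPHA * e / (1.0 - (1.0 - DELTA) * e) ** 2


def simpson(func, a, b, n=20000):
    """Composite Simpson rule, used only as an independent check."""
    step = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * func(a + i * step)
    return total * step / 3.0
```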

3.2 Quantile function and random number generator

The qf of X follows by inverting (5) and can be expressed in terms of $H^{-1}(\cdot)$ as

$$
Q(u)=H^{-1}\left(\frac{1}{\alpha}\log\left[\frac{1-\bar\delta\,u}{1-u}\right];\xi\right). \tag{15}
$$

In Table 2, we provide the function $H^{-1}(x;\xi)$ for some special models.

Table 2 The H⁻¹(x;ξ) function

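Hence, random numbers from X can be generated by the inversion algorithm:

  (i) generate U from the uniform distribution on (0,1);
  (ii) set X=Q(U), where Q(·) is given by (15).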

The OEW distributions can be very useful in modeling lifetime data, and practitioners may be interested in fitting one of these models. We provide an R script that computes the density, distribution function, hrf and qf, generates random numbers, and performs the Anderson-Darling, Cramér-von Mises and likelihood ratio (LR) tests. This script can be obtained from the authors upon request.
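The following minimal Python sketch implements the inversion algorithm with the qf (15). It is not the authors' R script. It assumes that $H^{-1}$ is available in closed form and uses the Weibull case $H(x)=x^{\gamma}$, $H^{-1}(y)=y^{1/\gamma}$ with illustrative parameter values. All function names are hypothetical.

```python
import math
import random


def oew_cdf(x, delta, alpha, H):
    """OEW cdf (5)."""
    e = math.exp(-alpha * H(x))
    return (1.0 - e) / (1.0 - (1.0 - delta) * e)


def oew_quantile(u, delta, alpha, H_inv):
    """Quantile function (15): Q(u) = H^{-1}((1/alpha) log[(1 - (1 - delta) u) / (1 - u)])."""
    return H_inv(math.log((1.0 - (1.0 - delta) * u) / (1.0 - u)) / alpha)


def oew_sample(n, delta, alpha, H_inv, seed=None):
    """Inverse-transform generator: X = Q(U) with U ~ Uniform[0, 1)."""
    rng = random.Random(seed)
    return [oew_quantile(rng.random(), delta, alpha, H_inv) for _ in range(n)]


# Illustrative Weibull choice: H(x) = x**gamma, so H^{-1}(y) = y**(1/gamma).
gamma = 2.0
H = lambda x: x ** gamma
H_inv = lambda y: y ** (1.0 / gamma)
sample = oew_sample(20000, delta=3.0, alpha=0.5, H_inv=H_inv, seed=1)
```

Other members of the family need only a different `H_inv`, taken from Table 2 when it exists in closed form.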

3.3 Mean deviations

The mean deviations of X about the mean and the median are given by

$$
\delta_1=\int_{\mathcal D}|x-\mu|\,f(x;\delta,\alpha,\xi)\,dx\quad\text{and}\quad\delta_2=\int_{\mathcal D}|x-M|\,f(x;\delta,\alpha,\xi)\,dx,
$$

respectively, where μ=E(X) denotes the mean and M=Median(X) the median. The median follows from the nonlinear equation F(M;δ,α,ξ)=1/2 (equivalently, M=Q(1/2) from (15)). So, these quantities reduce to

$$
\delta_1=2\mu F(\mu;\delta,\alpha,\xi)-2T_1(\mu)\quad\text{and}\quad\delta_2=\mu-2T_1(M),
$$

where T1(z) is the first incomplete moment of X obtained from (14) as

$$
T_1(z)=\sum_{j=0}^{\infty}w_j\,T_1^{(j)}(z),
$$

and $T_1^{(j)}(z)=\int_{-\infty}^{z}x\,g(x;(j+1)\alpha,\xi)\,dx$ is the first incomplete moment of $Y_j$.

An important application of the first incomplete moment is related to the Bonferroni and Lorenz curves, which are useful in economics, reliability, demography, medicine and other fields. For a given probability p, they are defined by $B(p)=T_1(q)/(p\mu)$ and $L(p)=T_1(q)/\mu$, respectively, where $q=Q(p)$ is the qf of X given by (15) at u=p.
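The mean deviations and the Bonferroni and Lorenz curves all follow from $T_1$. The Python sketch below assumes the O-exponential member (H(x)=x) with 0<δ<1 and illustrative parameter values. For an exponential $Y_j$ with rate a, $T_1^{(j)}(z)=[1-e^{-az}(1+az)]/a$. A brute-force integral of |x−c|f(x) is included only as a cross-check.

```python
import math

DELTA, ALPHA = 0.6, 1.5   # illustrative O-exponential parameters, 0 < delta < 1 (assumption)
N_TERMS = 300


def eta(j):
    return DELTA * (1.0 - DELTA) ** j


def cdf(x):
    e = math.exp(-ALPHA * x)
    return (1.0 - e) / (1.0 - (1.0 - DELTA) * e)


def pdf(x):
    e = math.exp(-ALPHA * x)
    return DELTA * ALPHA * e / (1.0 - (1.0 - DELTA) * e) ** 2


def quantile(u):
    """Quantile (15) with H(x) = x."""
    return math.log((1.0 - (1.0 - DELTA) * u) / (1.0 - u)) / ALPHA


def T1(z):
    """First incomplete moment via (14), using the closed-form exponential T_1^{(j)}."""
    total = 0.0
    for j in range(N_TERMS):
        a = (j + 1) * ALPHA
        total += eta(j) * (1.0 - math.exp(-a * z) * (1.0 + a * z)) / a
    return total


mu = sum(eta(j) / ((j + 1) * ALPHA) for j in range(N_TERMS))   # E(X) = sum_j eta_j E(Y_j)
median = quantile(0.5)
delta1 = 2.0 * mu * cdf(mu) - 2.0 * T1(mu)   # mean deviation about the mean
delta2 = mu - 2.0 * T1(median)               # mean deviation about the median


def bonferroni(p):
    return T1(quantile(p)) / (p * mu)


def lorenz(p):
    return T1(quantile(p)) / mu


def simpson(func, a, b, n=20000):
    step = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * func(a + i * step)
    return total * step / 3.0


def direct_mean_deviation(c, upper=40.0):
    """Brute-force E|X - c|, splitting the integral at the kink x = c."""
    f = lambda x: abs(x - c) * pdf(x)
    return simpson(f, 0.0, c) + simpson(f, c, upper)
```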

3.4 Average lifetime and mean residual lifetime functions

The average lifetime is given by

$$
t_m=\int_0^{\infty}[1-F(x;\delta,\alpha,\xi)]\,dx=\sum_{j=0}^{\infty}w_j\int_0^{\infty}\bar G(x;(j+1)\alpha,\xi)\,dx.
$$

In fields such as actuarial sciences, survival studies and reliability theory, the mean residual lifetime has been of much interest; see, for a survey, Guess and Proschan (1988). Given that there was no failure prior to x0, the residual life is the period from time x0 until the time of failure. The mean residual lifetime is given by

$$
\begin{aligned}
m(x_0;\delta,\alpha,\xi)&=E(X-x_0\mid X\ge x_0;\delta,\alpha,\xi)=\int_{\{x:\,x>x_0\}}(x-x_0)\,\frac{f(x;\delta,\alpha,\xi)}{\Pr(X>x_0)}\,dx\\
&=\Pr(X>x_0)^{-1}\int_0^{\infty}y\,f(x_0+y;\delta,\alpha,\xi)\,dy=\bar F(x_0;\delta,\alpha,\xi)^{-1}\sum_{j=0}^{\infty}w_j\int_0^{\infty}y\,g(x_0+y;(j+1)\alpha,\xi)\,dy.
\end{aligned}
$$

The last integral can be computed from the baseline EW distribution. Further, m(x0;δ,α,ξ)→E(X) as x0→0.
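For an exponential baseline, the last integral has the closed form $\int_0^\infty y\,a\,e^{-a(x_0+y)}\,dy=e^{-ax_0}/a$ with a=(j+1)α. The Python sketch below evaluates the series form of $m(x_0)$ for the O-exponential member with 0<δ<1; the parameter values are illustrative assumptions. It then compares the result with direct Simpson integration. At $x_0=0$ the series reduces to E(X).

```python
import math

DELTA, ALPHA = 0.4, 0.8   # illustrative O-exponential parameters, 0 < delta < 1 (assumption)
N_TERMS = 400


def sf(x):
    e = math.exp(-ALPHA * x)
    return DELTA * e / (1.0 - (1.0 - DELTA) * e)


def pdf(x):
    e = math.exp(-ALPHA * x)
    return DELTA * ALPHA * e / (1.0 - (1.0 - DELTA) * e) ** 2


def mrl_series(x0):
    """m(x0) = Fbar(x0)^{-1} sum_j w_j exp(-a x0)/a, with a = (j+1) alpha and w_j = eta_j."""
    total = 0.0
    for j in range(N_TERMS):
        a = (j + 1) * ALPHA
        total += DELTA * (1.0 - DELTA) ** j * math.exp(-a * x0) / a
    return total / sf(x0)


def mrl_numeric(x0, upper=60.0, n=20000):
    """Direct Simpson evaluation of int_0^upper y f(x0 + y) dy / Fbar(x0)."""
    step = upper / n
    total = upper * pdf(x0 + upper)    # the y = 0 endpoint contributes zero
    for i in range(1, n):
        y = i * step
        total += (4.0 if i % 2 else 2.0) * y * pdf(x0 + y)
    return total * step / 3.0 / sf(x0)
```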

4 Information theory measures

The seminal idea of information theory was pioneered by Hartley (1928), who defined a logarithmic measure of information for communication. Subsequently, Shannon (1948) formalized this idea by defining the entropy and mutual information concepts. The relative entropy notion (later called divergence) was proposed by Kullback and Leibler (1951). The Kullback-Leibler measure can be understood as a comparison criterion between two distributions. In this section, we derive two classes of entropy measures and one class of divergence measures. These can be understood as new goodness-of-fit quantities, such as those discussed by Seghouane and Amari (2007). All these measures are defined for one element of the OEW family or between two of its elements.

4.1 Rényi entropy

The Rényi entropy of X with pdf (8) is given by

$$
H_R^{s}(X)=\frac{1}{1-s}\log\left[\int_{\mathcal D}f(x;\delta,\alpha,\xi)^{s}\,dx\right],
$$

where $s\in(0,1)\cup(1,\infty)$.

It is difficult to obtain $H_R^{s}(X)$ in closed form for the OEW family, so we derive an expansion for this quantity.

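By using (9), $f(x;\delta,\alpha,\xi)^{s}$ can be expanded as

$$
f(x;\delta,\alpha,\xi)^{s}=\sum_{j=0}^{\infty}w_j^{*}\exp[-(j+s)\alpha H(x;\xi)]\,h(x;\xi)^{s}, \tag{16}
$$

where (these weights differ from the $w_j$ in (13))

$$
w_j^{*}=\begin{cases}\eta_j^{*}(\alpha,\delta)=\dfrac{\alpha^{s}\delta^{s}(2s)_j\,\bar\delta^{\,j}}{j!}, & \text{for } 0<\delta<1,\\[2ex] \nu_j^{*}(\alpha,\delta)=\dfrac{\alpha^{s}\delta^{-s}}{j!}\displaystyle\sum_{k=0}^{\infty}\frac{(2s)_k\,(-k)_j}{k!}\left(1-\frac{1}{\delta}\right)^{k}, & \text{for } \delta>1.\end{cases}
$$

The proof of this expansion is given in Appendix 8.

Finally, based on Equation (16), the Rényi entropy can be expressed as

$$
H_R^{s}(X)=\frac{1}{1-s}\log\left\{\sum_{j=0}^{\infty}w_j^{*}\int_{\mathcal D}\exp[-(j+s)\alpha H(x;\xi)]\,h(x;\xi)^{s}\,dx\right\}.
$$

An advantage of this expansion is that it depends only on integrals that have closed form for some EW distributions.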
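For the O-exponential member (H(x)=x, h(x)=1) with 0<δ<1, each integral in the expansion equals 1/[(j+s)α]. The Rényi entropy is therefore a single convergent series. The following Python sketch is a minimal check under these assumptions. The parameter values are illustrative, and the series is compared against direct Simpson integration of $f^s$. The coefficients $(2s)_j/j!$ are updated recursively to avoid large factorials.

```python
import math


def renyi_series(s, delta, alpha, n_terms=600):
    """Renyi entropy of the O-exponential law from expansion (16), for 0 < delta < 1."""
    if not 0.0 < delta < 1.0:
        raise ValueError("only the 0 < delta < 1 weights are implemented here")
    dbar = 1.0 - delta
    coef = 1.0        # (2s)_j / j!, updated recursively
    total = 0.0
    for j in range(n_terms):
        if j > 0:
            coef *= (2.0 * s + j - 1.0) / j
        # w*_j times the closed-form integral of exp[-(j + s) alpha x] over (0, inf)
        total += alpha ** s * delta ** s * coef * dbar ** j / ((j + s) * alpha)
    return math.log(total) / (1.0 - s)


def renyi_numeric(s, delta, alpha, upper=80.0, n=40000):
    """Direct evaluation of log(int f(x)^s dx) / (1 - s) by Simpson's rule."""
    def f_s(x):
        e = math.exp(-alpha * x)
        return (delta * alpha * e / (1.0 - (1.0 - delta) * e) ** 2) ** s

    step = upper / n
    total = f_s(0.0) + f_s(upper)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f_s(i * step)
    return math.log(total * step / 3.0) / (1.0 - s)
```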

4.2 Shannon entropy

The Shannon entropy of X is given by

$$
H_S(X)=E_X\{-\log[f(X;\delta,\alpha,\xi)]\},
$$

where the log-likelihood function corresponding to one observation follows from (8) as

$$
\log[f(x;\delta,\alpha,\xi)]=\log(\delta\alpha)+\log[h(x;\xi)]-\alpha H(x;\xi)-2\log\{1-\bar\delta\exp[-\alpha H(x;\xi)]\}.
$$

Thus, it can be reduced to

$$
H_S(X)=-\log(\alpha\delta)+2E\{\log[1-\bar\delta\,\bar G(X;\alpha,\xi)]\}-E\{\log[h(X;\xi)]\}+\alpha E[H(X;\xi)].
$$

4.3 Cross entropy and Kullback-Leibler divergence and distance

Let X and Y be two random variables with common support $\mathbb R^{+}$ whose densities are $f_X(x;\theta_1)$ and $f_Y(y;\theta_2)$, respectively. Cover and Thomas (1991) defined the cross entropy as

$$
C_X(Y)=-E_X[\log f_Y(X;\theta_2)]=-\int_0^{\infty}f_X(z;\theta_1)\log f_Y(z;\theta_2)\,dz.
$$

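We consider $X\sim\mathrm{OEW}(\delta_x,\alpha_x,\xi_x)$ and $Y\sim\mathrm{OEW}(\delta_y,\alpha_y,\xi_y)$, and write $\bar\delta_x=1-\delta_x$ and $\bar\delta_y=1-\delta_y$. After some algebraic manipulations, we obtain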

$$
\begin{aligned}
C_X(Y)&=-\int_{\mathcal D}f_X(z;\delta_x,\alpha_x,\xi_x)\log f_Y(z;\delta_y,\alpha_y,\xi_y)\,dz\\
&=-\log(\delta_y\alpha_y)-E_X[\log h(X;\xi_y)]+\alpha_y E_X[H(X;\xi_y)]+2E_X\{\log[1-\bar\delta_y\,\bar G(X;\alpha_y,\xi_y)]\}.
\end{aligned}
\tag{17}
$$

An important measure in information theory is the Kullback-Leibler divergence given by

$$
D(X\|Y)=C_X(Y)-H_S(X)=E_X\left\{\log\left[\frac{f_X(X;\delta_x,\alpha_x,\xi_x)}{f_Y(X;\delta_y,\alpha_y,\xi_y)}\right]\right\}. \tag{18}
$$

Substituting the expression for $H_S(X)$ from Section 4.2 and (17) into Equation (18) gives

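$$
\begin{aligned}
D(X\|Y)={}&\log\left(\frac{\delta_x\alpha_x}{\delta_y\alpha_y}\right)+E_X\left\{\log\left[\frac{h(X;\xi_x)}{h(X;\xi_y)}\right]\right\}+2E_X\left\{\log\left[\frac{1-\bar\delta_y\,\bar G(X;\alpha_y,\xi_y)}{1-\bar\delta_x\,\bar G(X;\alpha_x,\xi_x)}\right]\right\}\\
&+\alpha_y E_X[H(X;\xi_y)]-\alpha_x E_X[H(X;\xi_x)].
\end{aligned}
\tag{19}
$$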

According to Cover and Thomas (1991), the Kullback-Leibler measure D(X||Y) quantifies the error incurred by assuming that the Y model is true when the data follow the X distribution. For example, this measure has been used as an essential part of test statistics. These statistics have been widely applied to synthetic aperture radar image processing, from both univariate (Nascimento et al. 2010) and polarimetric (or multivariate) (Nascimento et al. 2014) perspectives.

In order to work with measures that satisfy the non-negativity, symmetry and definiteness properties, Nascimento et al. (2010) considered the symmetrization of (19)

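$$
d_{KL}(X,Y)=\frac{1}{2}\left[D(X\|Y)+D(Y\|X)\right]=\frac{1}{2}\int_{\mathcal D}\underbrace{\left[f_X(x;\delta_x,\alpha_x,\xi_x)-f_Y(x;\delta_y,\alpha_y,\xi_y)\right]\log\left[\frac{f_X(x;\delta_x,\alpha_x,\xi_x)}{f_Y(x;\delta_y,\alpha_y,\xi_y)}\right]}_{\text{IntegrandKL}(x,y)}\,dx,
$$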

which is given by

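$$
\begin{aligned}
2\,d_{KL}(X,Y)={}&\alpha_y\{E_X[H(X;\xi_y)]-E_Y[H(Y;\xi_y)]\}+\alpha_x\{E_Y[H(Y;\xi_x)]-E_X[H(X;\xi_x)]\}\\
&+E_X\left\{\log\left[\frac{h(X;\xi_x)}{h(X;\xi_y)}\right]\right\}+E_Y\left\{\log\left[\frac{h(Y;\xi_y)}{h(Y;\xi_x)}\right]\right\}\\
&+2E_X\left\{\log\left[\frac{1-\bar\delta_y\,\bar G(X;\alpha_y,\xi_y)}{1-\bar\delta_x\,\bar G(X;\alpha_x,\xi_x)}\right]\right\}+2E_Y\left\{\log\left[\frac{1-\bar\delta_x\,\bar G(Y;\alpha_x,\xi_x)}{1-\bar\delta_y\,\bar G(Y;\alpha_y,\xi_y)}\right]\right\}.
\end{aligned}
\tag{20}
$$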

Although this measure does not satisfy the triangle inequality, it is usually called the Kullback-Leibler distance. Up to the factor 1/2, it coincides with the Jeffreys (J) divergence. The new measure can be used to answer questions such as: “how can one quantify the difference between selecting the three-parameter Phani model as the baseline distribution and selecting the four-parameter Weibull Kies distribution?”

As an illustration of (20), we consider two distinct elements of the special model obtained by taking $H(x;\beta)=\beta^{-1}[\exp(\beta x)-1]$ and $h(x;\beta)=\exp(\beta x)$ in (8). This model is presented in more detail in Section 6.2, and its parameter space is represented by the vector (δ,α,β). Suppose that we are interested in quantifying the influence of a perturbation (contamination) ε in the parameter α on the distance between two distinct elements of this parameter space, (2,1,3) and (2,1+ε,3). Figure 1(a) displays the integrand of (20) for ε=0.1, 1, 2 and 4. The corresponding distances (or areas) associated with $d_{KL}(X,Y)$ are $6.50\times10^{-3}$, $3.56\times10^{-1}$, $9.46\times10^{-1}$ and 2.25, respectively. Notably, $d_{KL}(X,Y)$ takes smaller values for closer points (or, equivalently, for closer fits), so (20) can be used as a goodness-of-fit measure. In Figures 1(b) and 1(c), we show two further effects. The first is the influence of η=α/β on $d_{KL}([\delta,\alpha,\beta],[\delta,\alpha,\beta+\varepsilon])$, for β=δ=3 and α∈{1,3,9}. The second is the influence of δ on $d_{KL}([\delta,\alpha,\beta],[\delta+\varepsilon,\alpha,\beta])$, for β=α=3 and δ∈{3,4,5}. In all cases, the contamination ε takes values in the interval (−2.9,2.9).
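The distance can be computed by direct numerical integration of the integrand in the definition of $d_{KL}$. The following Python sketch works with the OG density on the log scale, so that both densities can underflow in the far tail without producing 0/0. Parameter triples are ordered as (δ, α, β). The integration range [0, 3] is an assumption that suits the illustrative point (2,1,3), and the sketch is not meant to reproduce the figure values exactly.

```python
import math


def og_logpdf(x, delta, alpha, beta):
    """Log of the OG density of Section 6.2 (beta != 0); the log scale avoids 0/0 in the tail."""
    H = math.expm1(beta * x) / beta
    e = math.exp(-alpha * H)
    return math.log(delta * alpha) + beta * x - alpha * H - 2.0 * math.log(1.0 - (1.0 - delta) * e)


def d_kl(theta_x, theta_y, upper=3.0, n=6000):
    """Symmetrized Kullback-Leibler distance 0.5 * int (f_X - f_Y) log(f_X / f_Y) dx (Simpson's rule)."""
    def integrand(x):
        lx, ly = og_logpdf(x, *theta_x), og_logpdf(x, *theta_y)
        return (math.exp(lx) - math.exp(ly)) * (lx - ly)

    step = upper / n
    total = integrand(0.0) + integrand(upper)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * integrand(i * step)
    return 0.5 * total * step / 3.0
```

For example, `d_kl((2.0, 1.0, 3.0), (2.0, 1.1, 3.0))` gives the distance for ε=0.1 under these assumptions.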

Figure 1 (a) Behavior of the function IntegrandKL. (b) Influence of η=α/β under β=δ=3 and α∈{1,3,9}. (c) Influence of δ under (α,β)=(3,3).

5 Estimation

Here, we present a general procedure for estimating the OEW parameters from a complete sample and from multi-censored data. Additionally, we discuss how one can test the significance of the additional parameter in the proposed class. Let $x_1,\ldots,x_n$ be a sample of size n from X. The log-likelihood function for the vector of parameters θ=(δ,α,ξ) can be expressed as

$$
\ell(\theta)=n\log(\delta\alpha)+\sum_{i=1}^{n}\log h(x_i;\xi)-\alpha\sum_{i=1}^{n}H(x_i;\xi)-2\sum_{i=1}^{n}\log\{1-\bar\delta\exp[-\alpha H(x_i;\xi)]\}.
$$

From the above log-likelihood, the components of the score vector, $U(\theta)=(U_\delta,U_\alpha,U_\xi^{\top})^{\top}$, are given by

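$$
\begin{aligned}
U_\delta(\theta)&=\frac{\partial\ell(\theta)}{\partial\delta}=\frac{n}{\delta}-2\sum_{i=1}^{n}\frac{\exp[-\alpha H(x_i;\xi)]}{1-\bar\delta\exp[-\alpha H(x_i;\xi)]},\\
U_\alpha(\theta)&=\frac{\partial\ell(\theta)}{\partial\alpha}=\frac{n}{\alpha}-\sum_{i=1}^{n}H(x_i;\xi)-2\bar\delta\sum_{i=1}^{n}\frac{H(x_i;\xi)\exp[-\alpha H(x_i;\xi)]}{1-\bar\delta\exp[-\alpha H(x_i;\xi)]},\\
U_{\xi_k}(\theta)&=\frac{\partial\ell(\theta)}{\partial\xi_k}=\sum_{i=1}^{n}\frac{1}{h(x_i;\xi)}\frac{\partial h(x_i;\xi)}{\partial\xi_k}-\alpha\sum_{i=1}^{n}\frac{\partial H(x_i;\xi)}{\partial\xi_k}-2\bar\delta\alpha\sum_{i=1}^{n}\frac{\partial H(x_i;\xi)}{\partial\xi_k}\,\frac{\exp[-\alpha H(x_i;\xi)]}{1-\bar\delta\exp[-\alpha H(x_i;\xi)]}.
\end{aligned}
$$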
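A convenient way to guard against algebra slips in score functions is to compare them with finite differences of the log-likelihood. The following Python sketch does this for the OG special case, with ξ=β, $H(x;\beta)=\beta^{-1}[\exp(\beta x)-1]$ and $\log h=\beta x$, so that $(1/h)\,\partial h/\partial\beta=x$. The data and parameter values are arbitrary illustrations.

```python
import math


def og_terms(x, beta):
    """H(x; beta) = (exp(beta x) - 1)/beta and its derivative with respect to beta (beta != 0)."""
    H = math.expm1(beta * x) / beta
    dH = x * math.exp(beta * x) / beta - math.expm1(beta * x) / beta ** 2
    return H, dH


def og_loglik(theta, data):
    """Log-likelihood of Section 5 with the OG choices of H and h."""
    delta, alpha, beta = theta
    dbar = 1.0 - delta
    ll = len(data) * math.log(delta * alpha)
    for x in data:
        H, _ = og_terms(x, beta)
        ll += beta * x - alpha * H - 2.0 * math.log(1.0 - dbar * math.exp(-alpha * H))
    return ll


def og_score(theta, data):
    """Analytic score (U_delta, U_alpha, U_beta) from the expressions above."""
    delta, alpha, beta = theta
    dbar = 1.0 - delta
    n = len(data)
    u_delta, u_alpha, u_beta = n / delta, n / alpha, 0.0
    for x in data:
        H, dH = og_terms(x, beta)
        e = math.exp(-alpha * H)
        denom = 1.0 - dbar * e
        u_delta -= 2.0 * e / denom
        u_alpha -= H + 2.0 * dbar * H * e / denom
        u_beta += x - alpha * dH - 2.0 * dbar * alpha * dH * e / denom
    return u_delta, u_alpha, u_beta


def numeric_score(theta, data, step=1e-6):
    """Central finite differences of og_loglik, used only to cross-check og_score."""
    grad = []
    for k in range(3):
        up, down = list(theta), list(theta)
        up[k] += step
        down[k] -= step
        grad.append((og_loglik(up, data) - og_loglik(down, data)) / (2.0 * step))
    return tuple(grad)
```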

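Finally, the observed information matrix for the OEW family can be partitioned as

$$
J(\theta)=-\begin{pmatrix}U_{\delta\delta}&U_{\delta\alpha}&U_{\delta\xi}^{\top}\\ U_{\delta\alpha}&U_{\alpha\alpha}&U_{\alpha\xi}^{\top}\\ U_{\delta\xi}&U_{\alpha\xi}&U_{\xi\xi}\end{pmatrix},
$$

whose elements are the second derivatives of ℓ(θ). Writing, for brevity, $e_i=\exp[-\alpha H(x_i;\xi)]$, $H_i=H(x_i;\xi)$ and $h_i=h(x_i;\xi)$, they are

$$
\begin{aligned}
U_{\delta\delta}(\theta)&=-\frac{n}{\delta^{2}}+2\sum_{i=1}^{n}\frac{e_i^{2}}{(1-\bar\delta e_i)^{2}},\\
U_{\delta\alpha}(\theta)&=2\sum_{i=1}^{n}\frac{H_i\,e_i}{(1-\bar\delta e_i)^{2}},\\
U_{\delta\xi_k}(\theta)&=2\alpha\sum_{i=1}^{n}\frac{\partial H_i}{\partial\xi_k}\,\frac{e_i}{(1-\bar\delta e_i)^{2}},\\
U_{\alpha\alpha}(\theta)&=-\frac{n}{\alpha^{2}}+2\bar\delta\sum_{i=1}^{n}\frac{H_i^{2}\,e_i}{(1-\bar\delta e_i)^{2}},\\
U_{\alpha\xi_k}(\theta)&=-2\bar\delta\sum_{i=1}^{n}\frac{\partial H_i}{\partial\xi_k}\,\frac{e_i}{1-\bar\delta e_i}\left[1-\frac{\alpha H_i}{1-\bar\delta e_i}\right]-\sum_{i=1}^{n}\frac{\partial H_i}{\partial\xi_k},\\
U_{\xi_k\xi_j}(\theta)&=\sum_{i=1}^{n}\left[\frac{1}{h_i}\frac{\partial^{2}h_i}{\partial\xi_k\partial\xi_j}-\frac{1}{h_i^{2}}\frac{\partial h_i}{\partial\xi_k}\frac{\partial h_i}{\partial\xi_j}\right]-\alpha\sum_{i=1}^{n}\frac{\partial^{2}H_i}{\partial\xi_k\partial\xi_j}-2\alpha\bar\delta\sum_{i=1}^{n}\frac{e_i}{1-\bar\delta e_i}\left[\frac{\partial^{2}H_i}{\partial\xi_k\partial\xi_j}-\frac{\alpha}{1-\bar\delta e_i}\,\frac{\partial H_i}{\partial\xi_k}\frac{\partial H_i}{\partial\xi_j}\right].
\end{aligned}
$$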

When some standard regularity conditions are satisfied (Cox and Hinkley 1974), one can verify that $\sqrt n\,[(\hat\alpha,\hat\delta,\hat\xi)-(\alpha,\delta,\xi)]$ converges in distribution to the multivariate normal distribution $N_{p+2}(\mathbf 0,K([\alpha,\delta,\xi])^{-1})$. Here p denotes the dimension of ξ, and $K([\alpha,\delta,\xi])$ is the expected information matrix per observation, which satisfies $\lim_{n\to\infty}n^{-1}J_n([\alpha,\delta,\xi])=K([\alpha,\delta,\xi])$, where $J_n$ is the observed information matrix. Based on this result, one can compute confidence regions for the OEW parameters, which can be used as decision criteria in several practical situations.

To check whether δ differs significantly from one, i.e., to test $H_0:\delta=1$ against $H_1:\delta\neq1$, we use the LR statistic $LR=2[\ell(\hat\theta)-\ell(\tilde\theta)]$. Here $\hat\theta$ is the vector of unrestricted MLEs under $H_1$ and $\tilde\theta$ is the vector of restricted MLEs under $H_0$. Under the null hypothesis, the limiting distribution of LR is $\chi^2_1$. If the test statistic exceeds the upper 100(1−α)% quantile of the $\chi^2_1$ distribution, we reject the null hypothesis. In this rule, α denotes the significance level, not the model parameter.
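Because $\chi^2_1$ is the distribution of the square of a standard normal variable, its upper tail is $P(\chi^2_1>t)=\operatorname{erfc}(\sqrt{t/2})$. The LR decision therefore needs only the standard library. The following Python sketch takes the two maximized log-likelihoods as inputs, and the values in the usage line are made up. The default 10% level matches the one used in Section 7.2.1.

```python
import math


def lr_test(loglik_full, loglik_restricted, level=0.10):
    """LR test of H0: delta = 1 using the chi-square(1) limit; returns (LR, p-value, reject?)."""
    lr = 2.0 * (loglik_full - loglik_restricted)
    lr = max(lr, 0.0)   # guard against tiny negative values from imperfect optimization
    p_value = math.erfc(math.sqrt(lr / 2.0))   # P(chi2_1 > lr)
    return lr, p_value, p_value < level


# Made-up maximized log-likelihoods, for illustration only.
print(lr_test(-100.0, -102.5))
```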

Censored data occur very frequently in lifetime data analysis. Several censoring mechanisms are identified in the literature, for example, type I and type II censoring (Lawless 2003). Here, we consider the general case of multi-censored data, with $n=n_0+n_1+n_2$ subjects in three groups:

  (i) $n_0$ subjects are known to have failed at the times $x_1,\ldots,x_{n_0}$;
  (ii) $n_1$ subjects are known to have failed in the intervals $[s_{i-1},s_i]$, $i=1,\ldots,n_1$;
  (iii) $n_2$ subjects survived to a time $r_i$, $i=1,\ldots,n_2$, but were not observed any longer.

Type I and type II censoring are particular cases of multi-censoring. The log-likelihood function of θ=(δ,α,ξ) for these multi-censored data reduces to

$$
\begin{aligned}
\ell(\theta)={}&n_0\log(\delta\alpha)+\sum_{i=1}^{n_0}\log h(x_i;\xi)-\alpha\sum_{i=1}^{n_0}H(x_i;\xi)-2\sum_{i=1}^{n_0}\log\{1-\bar\delta\exp[-\alpha H(x_i;\xi)]\}\\
&+\sum_{i=1}^{n_1}\log\left\{\frac{1-\exp[-\alpha H(s_i;\xi)]}{1-\bar\delta\exp[-\alpha H(s_i;\xi)]}-\frac{1-\exp[-\alpha H(s_{i-1};\xi)]}{1-\bar\delta\exp[-\alpha H(s_{i-1};\xi)]}\right\}\\
&+n_2\log(\delta)-\alpha\sum_{i=1}^{n_2}H(r_i;\xi)-\sum_{i=1}^{n_2}\log\{1-\bar\delta\exp[-\alpha H(r_i;\xi)]\}.
\end{aligned}
\tag{21}
$$

The score functions and the observed information matrix corresponding to (21) are too complicated to be presented here.
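Although the score of (21) is lengthy, the log-likelihood itself is short to code. It can then be maximized numerically. The following Python sketch follows the three groups of terms in (21): exact failures, interval-censored pairs and right-censored times. The Weibull choice of H and h is an illustrative assumption. Each right-censored term is the log of the survival function (6), so its last logarithm has coefficient 1.

```python
import math


def oew_censored_loglik(delta, alpha, H, h, exact=(), intervals=(), right=()):
    """Multi-censored OEW log-likelihood (21).

    exact: failure times x_i; intervals: pairs (s_{i-1}, s_i); right: censoring times r_i.
    """
    dbar = 1.0 - delta

    def cdf(x):
        e = math.exp(-alpha * H(x))
        return (1.0 - e) / (1.0 - dbar * e)

    ll = 0.0
    for x in exact:                  # log of the density (8)
        e = math.exp(-alpha * H(x))
        ll += math.log(delta * alpha) + math.log(h(x)) - alpha * H(x) - 2.0 * math.log(1.0 - dbar * e)
    for lo, hi in intervals:         # log[F(s_i) - F(s_{i-1})]
        ll += math.log(cdf(hi) - cdf(lo))
    for r in right:                  # log of the survival (6)
        e = math.exp(-alpha * H(r))
        ll += math.log(delta) - alpha * H(r) - math.log(1.0 - dbar * e)
    return ll


# Illustrative Weibull choice: H(x) = x**2, h(x) = 2x.
H = lambda x: x ** 2
h = lambda x: 2.0 * x
```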

6 Two special models

In this section, we study two special OEW models, namely the Marshall-Olkin modified Weibull (OW) and Marshall-Olkin Gompertz (OG) distributions. We provide plots of the density and hazard rate functions for some parameters to illustrate the flexibility of these distributions.

6.1 The OW model

For $H(x;\lambda,\gamma)=x^{\gamma}\exp(\lambda x)$ and $h(x;\lambda,\gamma)=x^{\gamma-1}\exp(\lambda x)(\gamma+\lambda x)$, we obtain the OW distribution. Its density function is given by

$$
f(x;\alpha,\delta,\lambda,\gamma)=\frac{\delta\alpha(\gamma+\lambda x)\,x^{\gamma-1}\exp[\lambda x-\alpha x^{\gamma}\exp(\lambda x)]}{\{1-\bar\delta\exp[-\alpha x^{\gamma}\exp(\lambda x)]\}^{2}},\qquad x>0,
$$

where λ,γ≥0. If δ=1, it reduces to the modified Weibull (W) distribution (Lai et al. 2003). If, in addition, λ=0, it gives the Weibull distribution. Its cdf and hrf are given by

$$
F(x;\alpha,\delta,\lambda,\gamma)=\frac{1-\exp[-\alpha x^{\gamma}\exp(\lambda x)]}{1-\bar\delta\exp[-\alpha x^{\gamma}\exp(\lambda x)]}
$$

and

$$
\tau(x;\alpha,\delta,\lambda,\gamma)=\frac{\alpha x^{\gamma-1}\exp(\lambda x)(\gamma+\lambda x)}{1-\bar\delta\exp[-\alpha x^{\gamma}\exp(\lambda x)]},
$$

respectively. Figures 2(a), 2(b), 2(c) and 2(d) show some of the different shapes of the OW pdf. Further, Figures 3(a), 3(b), 3(c) and 3(d) display plots of the OW hrf, which can have increasing, decreasing, non-monotone and bathtub forms.

Figure 2 The OW density functions. (a) For α=0.5, λ=2.0, γ=0.5. (b) For δ=2.0, λ=2.0, γ=0.5. (c) For δ=5.0, δ=2.0, γ=0.5. (d) For α=0.5, δ=2.0, λ=2.0.

Figure 3 The OW hrfs. (a) For α=0.5, λ=2.0, γ=0.5. (b) For δ=2.0, λ=2.0, γ=0.5. (c) For δ=5.0, δ=2.0, γ=0.5. (d) For α=0.5, δ=2.0, λ=2.0.

The rth raw moment of the OW distribution comes from (13) as

$$
E(X^{r})=\sum_{j=0}^{\infty}w_j\,\mu_r(j), \tag{22}
$$

where $\mu_r(j)=\int_0^{\infty}x^{r}g(x;(j+1)\alpha,\gamma,\lambda)\,dx$ denotes the rth raw moment of the W distribution with parameters (j+1)α, γ and λ. Carrasco et al. (2008) determined an infinite representation for $\mu_r(j)$ given by

$$
\mu_r(j)=\sum_{i_1,\ldots,i_r=1}^{\infty}A_{i_1,\ldots,i_r}\,\frac{\Gamma(s_r/\gamma+1)}{[(j+1)\alpha]^{s_r/\gamma}}, \tag{23}
$$

where

$$
A_{i_1,\ldots,i_r}=a_{i_1}\cdots a_{i_r}\qquad\text{and}\qquad s_r=i_1+\cdots+i_r,
$$

and

$$
a_i=\frac{(-1)^{i+1}\,i^{\,i-2}}{(i-1)!}\left(\frac{\lambda}{\gamma}\right)^{i-1}.
$$

Hence, the OW moments can be obtained directly from (22) and (23).

Let $x_1,\ldots,x_n$ be a sample of size n from $X\sim\mathrm{OW}(\alpha,\delta,\lambda,\gamma)$. The log-likelihood function for the vector of parameters θ=(α,δ,λ,γ) can be expressed as

$$
\ell(\theta)=n\log(\delta\alpha)+\sum_{i=1}^{n}\log(\gamma+\lambda x_i)+(\gamma-1)\sum_{i=1}^{n}\log(x_i)+\lambda\sum_{i=1}^{n}x_i-\alpha\sum_{i=1}^{n}x_i^{\gamma}\exp(\lambda x_i)-2\sum_{i=1}^{n}\log\{1-\bar\delta\exp[-\alpha x_i^{\gamma}\exp(\lambda x_i)]\}.
$$

6.2 The OG model

For $H(x;\beta)=\beta^{-1}[\exp(\beta x)-1]$ and $h(x;\beta)=\exp(\beta x)$, we obtain the OG distribution. Its pdf is given by

$$
f(x;\alpha,\delta,\beta)=\frac{\delta\alpha\exp\{\beta x-(\alpha/\beta)[\exp(\beta x)-1]\}}{\big(1-\bar\delta\exp\{-(\alpha/\beta)[\exp(\beta x)-1]\}\big)^{2}},\qquad x>0,
$$

where −∞<β<∞. For δ=1, the Gompertz distribution follows as a special case. The OG model is itself a special case of the Marshall-Olkin Makeham distribution (EL-Bassiouny and Abdo 2009). The cdf and hrf of the OG distribution are given by

$$
F(x;\alpha,\delta,\beta)=\frac{1-\exp\{-(\alpha/\beta)[\exp(\beta x)-1]\}}{1-\bar\delta\exp\{-(\alpha/\beta)[\exp(\beta x)-1]\}}
$$

and

$$
\tau(x;\alpha,\delta,\beta)=\frac{\alpha\exp(\beta x)}{1-\bar\delta\exp\{-(\alpha/\beta)[\exp(\beta x)-1]\}}.
$$

Figures 4(a), 4(b) and 4(c) display some plots of the density function for selected values of α, δ and β. The hrf of the Gompertz distribution is increasing for β>0 and decreasing for β<0. Besides these two forms, Figures 5(a), 5(b) and 5(c) indicate that the OG hrf can be bathtub shaped.

Figure 4 The OG density functions. (a) For α=0.5, β=0.7. (b) For δ=2.0, β=2.0. (c) For δ=5.0, α=1.5.

Figure 5 The OG hrfs. (a) For α=25, β=2.0. (b) For δ=0.2, β=0.5. (c) For α=0.01, δ=0.5.

From Equation (15), the OG qf becomes

$$
Q(u)=\beta^{-1}\log\left\{\frac{\beta}{\alpha}\log\left[\frac{1-\bar\delta\,u}{1-u}\right]+1\right\}.
$$
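The OG qf gives an inverse-transform generator directly. The following Python sketch assumes β>0. For β<0 the cumulative hazard is bounded, so the qf above is not defined for u close to one. The sketch uses `math.log1p` and `math.expm1` for numerical stability near zero. The round-trip check F(Q(u))=u in the test is a quick sanity check of the formula.

```python
import math
import random


def og_cdf(x, delta, alpha, beta):
    """OG cdf of Section 6.2."""
    e = math.exp(-(alpha / beta) * math.expm1(beta * x))
    return (1.0 - e) / (1.0 - (1.0 - delta) * e)


def og_quantile(u, delta, alpha, beta):
    """OG qf: beta^{-1} log{(beta/alpha) log[(1 - (1 - delta) u)/(1 - u)] + 1}; assumes beta > 0."""
    y = math.log((1.0 - (1.0 - delta) * u) / (1.0 - u))
    return math.log1p(beta * y / alpha) / beta


def og_sample(n, delta, alpha, beta, seed=None):
    """Inverse-transform OG generator: X = Q(U), U ~ Uniform[0, 1)."""
    rng = random.Random(seed)
    return [og_quantile(rng.random(), delta, alpha, beta) for _ in range(n)]
```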

Let $x_1,\ldots,x_n$ be a sample of size n from the OG model. The log-likelihood function for the vector of parameters θ=(δ,α,β) can be expressed as

$$
\ell(\theta)=n\log(\delta\alpha)+\beta\sum_{i=1}^{n}x_i-\frac{\alpha}{\beta}\sum_{i=1}^{n}[\exp(\beta x_i)-1]-2\sum_{i=1}^{n}\log\big(1-\bar\delta\exp\{-\alpha[\exp(\beta x_i)-1]/\beta\}\big).
$$

7 Simulation and applications

This section is divided into two parts. First, we perform a simulation study to assess the performance of the MLEs at some points of the parameter space of one of the special models. Second, an application to real data provides evidence in favor of one distribution in the OEW class.

7.1 Simulation study

We present a simulation study based on Monte Carlo experiments to assess the performance of the MLEs described in Section 5. To that end, we work with the OG distribution. One advantage of this model is that its cdf has a tractable analytical form, which yields a simple random number generator (RNG) based on the OG qf given in Section 6.2. The OG generator is illustrated in Figure 6.

Figure 6 Illustration of the OG generator for two points in the parameter space.

The simulation study is conducted to quantify the influence of η=α/β on the estimation of the extra parameter δ. It is known that the Gompertz density has its mode at zero when η≥1, whereas for η<1 its mode is at $x=-\beta^{-1}\log(\eta)$. An initial discussion using the Kullback-Leibler distance derived in Section 4.3 indicates that increasing the contamination (or the bias of the estimates) can affect the quality of fit.

In this study, the following scenarios are taken into account. For the sample sizes n=50, 100, 150 and 200, we adopt the following cases as the true parameters:

  • Scenario η<1: (α,β)=(1,2) and δ∈{0.3,1,4};

  • Scenario η=1: (α,β)=(2,2) and δ∈{0.3,1,4};

  • Scenario η>1: (α,β)=(4,2) and δ∈{0.3,1,4}.

We use 10,000 Monte Carlo replications and, for each scenario, we compute (i) the average of the MLEs and (ii) their mean squared errors (MSEs).
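A scaled-down version of this Monte Carlo design can be sketched with the standard library alone. The following Python code is not the authors' implementation and makes several simplifying assumptions. The optimizer is a small hand-written Nelder-Mead, started at the true parameters. The optimization runs on the scale (log δ, log α, β) to keep δ and α positive. The example calls use far fewer replications than 10,000.

```python
import math
import random


def og_quantile(u, d, a, b):
    """OG qf of Section 6.2 (assumes b > 0)."""
    y = math.log((1.0 - (1.0 - d) * u) / (1.0 - u))
    return math.log1p(b * y / a) / b


def og_negloglik(theta, data):
    """Negative OG log-likelihood; theta = (log delta, log alpha, beta)."""
    try:
        d, a, b = math.exp(theta[0]), math.exp(theta[1]), theta[2]
        if abs(b) < 1e-10:
            return math.inf
        ll = len(data) * math.log(d * a)
        for x in data:
            H = math.expm1(b * x) / b
            ll += b * x - a * H - 2.0 * math.log(1.0 - (1.0 - d) * math.exp(-a * H))
        return -ll
    except (OverflowError, ValueError):
        return math.inf


def nelder_mead(f, x0, step=0.1, iters=400):
    """Minimal Nelder-Mead; the returned point is never worse than x0."""
    dim = len(x0)
    pts = [list(x0)] + [[x0[k] + (step if k == i else 0.0) for k in range(dim)] for i in range(dim)]
    vals = [f(p) for p in pts]
    for _ in range(iters):
        order = sorted(range(dim + 1), key=lambda i: vals[i])
        pts, vals = [pts[i] for i in order], [vals[i] for i in order]
        cen = [sum(p[k] for p in pts[:-1]) / dim for k in range(dim)]
        refl = [2.0 * c - w for c, w in zip(cen, pts[-1])]
        f_refl = f(refl)
        if f_refl < vals[0]:                     # try expanding further
            expd = [3.0 * c - 2.0 * w for c, w in zip(cen, pts[-1])]
            f_exp = f(expd)
            pts[-1], vals[-1] = (expd, f_exp) if f_exp < f_refl else (refl, f_refl)
        elif f_refl < vals[-2]:
            pts[-1], vals[-1] = refl, f_refl
        else:                                    # inside contraction, else shrink
            con = [0.5 * (c + w) for c, w in zip(cen, pts[-1])]
            f_con = f(con)
            if f_con < vals[-1]:
                pts[-1], vals[-1] = con, f_con
            else:
                pts = [pts[0]] + [[0.5 * (b0 + p) for b0, p in zip(pts[0], q)] for q in pts[1:]]
                vals = [vals[0]] + [f(p) for p in pts[1:]]
    best = min(range(dim + 1), key=lambda i: vals[i])
    return pts[best], vals[best]


def fit_og(data, start):
    """MLE of (delta, alpha, beta), started at `start`."""
    t0 = [math.log(start[0]), math.log(start[1]), start[2]]
    t_hat, _ = nelder_mead(lambda t: og_negloglik(t, data), t0)
    return math.exp(t_hat[0]), math.exp(t_hat[1]), t_hat[2]


def simulate(delta, alpha, beta, n, reps, seed=2024):
    """Average and MSE of the OG MLEs over `reps` Monte Carlo replications."""
    rng = random.Random(seed)
    truth = (delta, alpha, beta)
    avg, mse = [0.0] * 3, [0.0] * 3
    for _ in range(reps):
        data = [og_quantile(rng.random(), delta, alpha, beta) for _ in range(n)]
        est = fit_og(data, truth)    # starting at the truth is a simplification
        for k in range(3):
            avg[k] += est[k] / reps
            mse[k] += (est[k] - truth[k]) ** 2 / reps
    return avg, mse
```

In practice, starting values would come from a grid or moment-based guesses rather than the true parameters. A library optimizer would normally replace the hand-written Nelder-Mead.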

Table 3 gives the results of the simulation study. In general, the biases and MSEs of the MLEs decrease as the sample size increases. The following atypical cases should be highlighted: for the MLEs of α in the scenarios (α,δ,β)∈{(1,4,2),(2,1,2),(4,0.3,2),(4,1,2)} and of δ in (4,0.3,2), the biases do not decrease monotonically with the sample size, as would be expected. However, since their MSEs tend to zero, we can expect that there exists a sample size $n_0$ beyond which the biases of the MLEs decrease as the sample size increases.

Table 3 Performance of the MLEs for the OG distribution

The results provide evidence that the scenarios with η>1 make the estimation of α and β harder (the MSEs vary over wider ranges than in the cases with η<1), and that the MLEs present smaller MSEs under such conditions. Figure 7 illustrates this behavior for the cases δ∈{0.3,0.8,1,2,4} and n=200. In summary, the scenario with fewer numerical problems is (η,δ)=(2,0.1), whereas the one that requires more attention when estimating the OG parameters is (η,δ)=(0.5,4).

Figure 7 MSE curves for δ∈{0.3,1,4}, η=0.5,1,2 and n=200. (a) MSE(α̂), (b) MSE(β̂), (c) MSE(δ̂).

7.2 Applications

Here, the usefulness of the OEW distribution is illustrated by means of two real data sets.

7.2.1 Uncensored data

Here, we compare the fits of some special models of the OEW family using a real data set. The model parameters are estimated by the maximum likelihood method discussed in Section 5, using the maxLik function of the maxLik package in the R language. In this function, if the argument “method” is not specified, a suitable method is selected automatically. For this application, we use the Newton-Raphson method. The data represent the percentage of body fat determined by underwater weighing for 250 men. For more details about the data, see http://lib.stat.cmu.edu/datasets/bodyfat.

Table 4 provides some descriptive measures. They suggest an empirical distribution which is slightly asymmetric and platykurtic.

Table 4 Descriptive statistics

We compare the classical models with generalized models within the O family. The null hypothesis H0:δ=1 is tested against H1:δ≠1 using the likelihood ratio (LR) statistic. The comparisons are presented in Table 5. For the OW and OEP models, the parameter δ is not significantly different from one at the 10% significance level. Based on this result, we fit the Weibull, exponential power (EP), OG and Marshall-Olkin flexible Weibull extension (OFWE) models to the current data (see Table 1). These models are compared with two other three-parameter models, namely the modified Weibull (W) and generalized Birnbaum-Saunders (GS) (Owen 2006) distributions. The GS density is given by

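$$f(x;\phi,\eta,\kappa)=\frac{1}{\phi\sqrt{2\pi\eta}\,x^{\kappa}}\left(1-\kappa+\frac{\eta\kappa}{x}\right)\exp\left[-\frac{1}{2\phi^{2}}\,\frac{(x-\eta)^{2}}{\eta\,x^{2\kappa}}\right],\quad x>0.$$
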
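Setting δ=1 recovers the baseline model, and δ=1 lies in the interior of the parameter space, so the LR statistic w = 2(ℓ̂₁ − ℓ̂₀) for H0:δ=1 is asymptotically χ² with one degree of freedom. The sketch below turns two maximized log-likelihoods into w and its p-value using the identity P(χ²₁ > w) = P(|Z| > √w) = erfc(√(w/2)). The function name and the log-likelihood values in the example are hypothetical; they are not the values behind Table 5.

```python
import math

def lr_test_delta_one(loglik_full, loglik_null):
    """LR test of H0: delta = 1 (one restriction): statistic and chi-square(1) p-value."""
    w = 2.0 * (loglik_full - loglik_null)
    w = max(w, 0.0)                           # guard against tiny negatives from optimizer tolerance
    p_value = math.erfc(math.sqrt(w / 2.0))   # P(chi2_1 > w) = P(|Z| > sqrt(w))
    return w, p_value

if __name__ == "__main__":
    # Hypothetical maximized log-likelihoods for an O model and its baseline
    print(lr_test_delta_one(-652.1, -653.9))
```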
Table 5 Comparison of fitted models using the LR test

In Table 6, we present the MLEs (standard errors in parentheses) of the parameters of the fitted OFWE, OG, EP, Weibull, W and GS distributions, together with the goodness-of-fit statistics (p-values in parentheses). These values indicate that the null models are strongly rejected for the OFWE and OG distributions, since the associated p-values are much lower than 0.001.
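The goodness-of-fit statistics in Table 6 are not named in this paragraph. Assuming they are the Anderson-Darling (AD) and Cramér-von Mises (CM) statistics reported in Table 8, the sketch below computes their classical forms from the fitted cdf values u_(i) = F(x_(i); θ̂) of a complete (uncensored) sample. Some authors report adjusted versions whose p-values account for parameter estimation, so the exact variant used in the tables is an assumption here.

```python
import math

def cvm_ad(u):
    """Classical Cramer-von Mises W^2 and Anderson-Darling A^2.

    u: fitted cdf values F(x_i; theta_hat), each strictly between 0 and 1.
    """
    u = sorted(u)
    n = len(u)
    w2 = sum((ui - (2 * i - 1) / (2 * n)) ** 2 for i, ui in enumerate(u, start=1)) + 1.0 / (12 * n)
    a2 = -n - sum((2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
                  for i in range(1, n + 1)) / n
    return w2, a2

if __name__ == "__main__":
    # Hypothetical fitted cdf values for a sample of size 5
    print(cvm_ad([0.08, 0.31, 0.47, 0.66, 0.93]))
```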

Table 6 MLEs and goodness-of-fit statistics

Table 7 gives the values of the Akaike information criterion (AIC), Bayesian information criterion (BIC), consistent Akaike information criterion (CAIC) and Hannan-Quinn information criterion (HQIC). Since the AIC, CAIC and HQIC values are smaller for the OFWE distribution than for the other fitted models, this new distribution seems to be a very competitive model for explaining the current data.
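All four criteria penalize the maximized log-likelihood ℓ̂ by a function of the number of parameters p and the sample size n, and smaller values are preferred. The sketch below uses the standard AIC, BIC and HQIC formulas. For CAIC it assumes the form −2ℓ̂ + 2pn/(n−p−1) that is common in this literature; Bozdogan's consistent AIC, −2ℓ̂ + p(log n + 1), is a different quantity, and which one the tables use is not stated here. The inputs in the example are hypothetical.

```python
import math

def information_criteria(loglik, p, n):
    """AIC, BIC, CAIC and HQIC for a model with p parameters fitted to n observations."""
    aic = -2.0 * loglik + 2.0 * p
    bic = -2.0 * loglik + p * math.log(n)
    caic = -2.0 * loglik + 2.0 * p * n / (n - p - 1)      # assumed CAIC form (see above)
    hqic = -2.0 * loglik + 2.0 * p * math.log(math.log(n))
    return {"AIC": aic, "BIC": bic, "CAIC": caic, "HQIC": hqic}

if __name__ == "__main__":
    # Hypothetical three-parameter fit to n = 250 observations
    print(information_criteria(loglik=-100.0, p=3, n=250))
```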

Table 7 Statistics AIC, BIC, CAIC and HQIC

Figures 8(a) and 8(b) display the estimated density and survival functions of the OFWE distribution. The plots confirm the excellent fit of this distribution to the data. Figure 8(c) shows that the estimated OFWE hrf is an increasing curve.

Figure 8

Plots of the estimated (a) density, (b) survival function and (c) hrf of the OFWE model.

7.2.2 Censored data

Now, we consider a set of remission times from 137 cancer patients (Lee and Wang 2003, p. 231). Lee and Wang (2003) showed that the log-logistic model provides a good fit to these data. Ghitany et al. (2005) compared the fits of the OW and models to these data. Here, we present a more detailed study by comparing the fitted Weibull, log-logistic, EP, OW, Marshall-Olkin log-logistic (O), OEP and GS models to these data. The functions H(x;γ,c)=log(1+γx^c) and h(x;γ,c)=γcx^(c−1)/(1+γx^c) are associated with the log-logistic model.
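In the appendix, the OEW cdf is written as F(x;δ,α,ξ) = {1 − exp[−αH(x;ξ)]}/{1 − δ̄ exp[−αH(x;ξ)]} with δ̄ = 1 − δ, so the survival function is δ exp[−αH]/{1 − δ̄ exp[−αH]}. Differentiating gives the density δαh(x;ξ) exp[−αH]/{1 − δ̄ exp[−αH]}² and the hrf αh(x;ξ)/{1 − δ̄ exp[−αH]}. The sketch below evaluates these for any baseline pair (H, h), using the log-logistic functions above as the example. The function names and parameter values are hypothetical, and α is fixed at 1 in the example so that the baseline is exactly the log-logistic survival 1/(1+γx^c).

```python
import math

def oew_functions(x, delta, alpha, H, h):
    """Survival, density and hazard of the OEW model at x.

    H and h are the baseline cumulative hazard and hazard functions (callables).
    """
    u = math.exp(-alpha * H(x))          # exp[-alpha H(x; xi)]
    dbar = 1.0 - delta                   # delta-bar
    denom = 1.0 - dbar * u
    surv = delta * u / denom
    pdf = delta * alpha * h(x) * u / denom ** 2
    hazard = alpha * h(x) / denom
    return surv, pdf, hazard

def loglogistic_H(x, gamma, c):
    """Log-logistic cumulative hazard H(x; gamma, c) = log(1 + gamma x^c)."""
    return math.log1p(gamma * x ** c)

def loglogistic_h(x, gamma, c):
    """Log-logistic hazard h(x; gamma, c) = gamma c x^(c-1) / (1 + gamma x^c)."""
    return gamma * c * x ** (c - 1) / (1.0 + gamma * x ** c)

if __name__ == "__main__":
    # Hypothetical parameter values, alpha fixed at 1
    H = lambda t: loglogistic_H(t, gamma=0.5, c=2.0)
    h = lambda t: loglogistic_h(t, gamma=0.5, c=2.0)
    for t in (0.5, 1.0, 2.0):
        print(t, oew_functions(t, delta=0.4, alpha=1.0, H=H, h=h))
```

Setting delta=1.0 makes the denominator equal to one, so the functions reduce to the baseline survival, density and hazard.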

The hypothesis that the underlying distribution is Weibull (or EP) is rejected in favor of the OW (or OEP) distribution, with p-value = 0.0055 (or p-value < 0.0001). Further, testing the log-logistic distribution against the O distribution yields p-value = 1.0000. Thus, we compare the OW, OEP, log-logistic and GS models to determine which one gives the best fit to the current data.

Table 8 lists the MLEs (with standard errors in parentheses) of the parameters and the values of the Anderson-Darling (AD) and Cramér-von Mises (CM) statistics (with p-values in parentheses). The figures in this table, especially the p-values, suggest that the OW distribution fits these data better than the other three distributions.

Table 8 MLEs and goodness-of-fit statistics

Table 9 lists the values of the AIC, BIC, CAIC and HQIC statistics. These values indicate that the OW, OEP and log-logistic models are competitive. However, Figures 9(a), 9(b) and 9(c) show that the OW and OEP models provide better fits to the current data.

Table 9 Statistics AIC, BIC, CAIC and HQIC
Figure 9

Q-Q plots of the fitted (a) OW, (b) log-logistic and (c) OEP distributions, and (d) Kaplan-Meier estimate of the survival function with upper and lower 95% confidence limits, together with the estimated survival functions, for the cancer patients.

Figure 9(d) also shows that the OW and OEP distributions provide good fits to the current data. We conclude that the OW and OEP distributions are excellent alternatives for explaining this data set.
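The Kaplan-Meier curve in Figure 9(d) is the nonparametric benchmark for the fitted survival functions. The sketch below computes it for right-censored data, with pointwise 95% limits from Greenwood's variance formula on the linear scale (clipped to [0, 1]). The text does not say which confidence-interval construction was used, so the Greenwood/linear choice is an assumption (a log-log transformation is a common alternative), and the example data are hypothetical.

```python
import math

def kaplan_meier(times, events, z=1.959963984540054):
    """Kaplan-Meier estimate with Greenwood-based pointwise confidence limits.

    times: observed times; events: 1 = failure observed, 0 = right-censored.
    Returns a list of (t, S(t), lower, upper) at each distinct failure time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv, gw_sum = 1.0, 0.0
    out = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = c = 0                     # failures and censorings tied at time t
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                d += 1
            else:
                c += 1
            i += 1
        if d > 0:
            surv *= 1.0 - d / at_risk
            if at_risk > d:           # Greenwood term is undefined once S(t) drops to 0
                gw_sum += d / (at_risk * (at_risk - d))
            se = surv * math.sqrt(gw_sum)
            out.append((t, surv, max(0.0, surv - z * se), min(1.0, surv + z * se)))
        at_risk -= d + c              # censored subjects leave the risk set after t
    return out

if __name__ == "__main__":
    # Hypothetical remission times (months) and event indicators
    for row in kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1]):
        print(row)
```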

8 Conclusion

In this paper, the Marshall-Olkin extended Weibull family of distributions is proposed and some of its mathematical properties are studied. The maximum likelihood procedure is used for estimating the model parameters. Two special models in the family are described in some detail. In order to assess the performance of the maximum likelihood estimators, a simulation study is performed by means of Monte Carlo experiments. Special models of the proposed family are compared (through goodness-of-fit measures) with other well-known lifetime models by means of two real data sets. For these data, special models of the proposed family outperform classical lifetime models.

Appendix: An expansion for f(x;δ,α,ξ)F(x;δ,α,ξ)^c

Here, we obtain an expansion for the quantity f(x;δ,α,ξ)F(x;δ,α,ξ)^c. First, we consider an expansion for F(x;δ,α,ξ)^c. Based on (5), the power of the cdf can be expressed as

$$F(x;\delta,\alpha,\xi)^{c}=\underbrace{\{1-\exp[-\alpha H(x;\xi)]\}^{c}}_{A}\;\underbrace{\{1-\bar{\delta}\exp[-\alpha H(x;\xi)]\}^{-c}}_{B}.$$

Applying expansion (9), we have

$$A=\sum_{k=0}^{\infty}(-1)^{k}\binom{c}{k}\exp[-k\alpha H(x;\xi)].$$

Now, we expand the quantity B. Under the restriction δ<1 (which implies δ̄ exp[−αH(x;ξ)]<1), Equation (9) yields

$$B=\sum_{j=0}^{\infty}\frac{(c)_{j}}{j!}\,\bar{\delta}^{\,j}\exp[-j\alpha H(x;\xi)].$$

Moreover, δ=1 implies δ̄=0 and hence B=1. Finally, for δ>1 (so that δ̄<0 and 1−δ̄ exp[−αH(x;ξ)]>1), the quantity B can be rewritten as

$$B=\left[1-\left(1-\{1-\bar{\delta}\exp[-\alpha H(x;\xi)]\}^{-1}\right)\right]^{c}.$$

Using the binomial expansion, we have

$$B=\sum_{j=0}^{\infty}(-1)^{j}\binom{c}{j}\left[1-\{1-\bar{\delta}\exp[-\alpha H(x;\xi)]\}^{-1}\right]^{j}.$$

Thus,

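$$
\begin{aligned}
F(x;\delta,\alpha,\xi)^{c}={}&I(\delta<1)\sum_{j,k=0}^{\infty}(-1)^{k}\frac{(c)_{j}}{j!}\binom{c}{k}\bar{\delta}^{\,j}\exp[-(j+k)\alpha H(x;\xi)]\\
&+I(\delta=1)\sum_{k=0}^{\infty}(-1)^{k}\binom{c}{k}\exp[-k\alpha H(x;\xi)]\\
&+I(\delta>1)\sum_{j,k=0}^{\infty}(-1)^{j+k}\binom{c}{k}\binom{c}{j}\exp[-k\alpha H(x;\xi)]\left[1-\{1-\bar{\delta}\exp[-\alpha H(x;\xi)]\}^{-1}\right]^{j}.
\end{aligned}
$$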

Hence, based on Equation (13), the following expansion holds

$$
\begin{aligned}
f(x;\delta,\alpha,\xi)F(x;\delta,\alpha,\xi)^{c}={}&\sum_{v=0}^{\infty}w_{v}\,g(x;(v+1)\alpha,\xi)\,F(x;\delta,\alpha,\xi)^{c}\\
={}&I(\delta<1)\sum_{j,k,v=0}^{\infty}(-1)^{k}w_{v}\frac{(c)_{j}}{j!}\binom{c}{k}\bar{\delta}^{\,j}\exp[-(j+k)\alpha H(x;\xi)]\,g(x;(v+1)\alpha,\xi)\\
&+I(\delta=1)\sum_{k,v=0}^{\infty}(-1)^{k}w_{v}\binom{c}{k}\exp[-k\alpha H(x;\xi)]\,g(x;(v+1)\alpha,\xi)\\
&+I(\delta>1)\sum_{j,k,v=0}^{\infty}(-1)^{j+k}w_{v}\binom{c}{k}\binom{c}{j}\exp[-k\alpha H(x;\xi)]\left[1-\{1-\bar{\delta}\exp[-\alpha H(x;\xi)]\}^{-1}\right]^{j}g(x;(v+1)\alpha,\xi).
\end{aligned}
\tag{24}
$$
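A quick numerical sanity check of the three-case expansion of F(x;δ,α,ξ)^c derived in this appendix: writing u = exp[−αH(x;ξ)] for a fixed x, the truncated double series should reproduce the closed form {(1−u)/(1−δ̄u)}^c. The sketch below assumes that (c)_j denotes the ascending factorial c(c+1)⋯(c+j−1), which is the reading under which the δ<1 series equals {1−δ̄u}^(−c). The values u=0.5 and c=2.5, the truncation level and the helper names are illustrative.

```python
def gbinom(c, k):
    """Generalized binomial coefficient C(c, k) for real c and integer k >= 0."""
    out = 1.0
    for i in range(k):
        out *= (c - i) / (i + 1)
    return out

def rising_over_factorial(c, j):
    """(c)_j / j!, where (c)_j = c(c+1)...(c+j-1) is the ascending factorial."""
    out = 1.0
    for i in range(j):
        out *= (c + i) / (i + 1)
    return out

def cdf_power_closed(u, delta, c):
    """F^c with F = (1 - u) / (1 - dbar u), where u = exp[-alpha H(x; xi)]."""
    dbar = 1.0 - delta
    return ((1.0 - u) / (1.0 - dbar * u)) ** c

def cdf_power_series(u, delta, c, terms=200):
    """Truncated three-case double series for F^c."""
    dbar = 1.0 - delta
    a = [(-1) ** k * gbinom(c, k) * u ** k for k in range(terms)]        # series for A
    if delta < 1:
        b = [rising_over_factorial(c, j) * dbar ** j * u ** j for j in range(terms)]
    elif delta == 1:
        b = [1.0]                                                          # B = 1
    else:
        t = 1.0 - 1.0 / (1.0 - dbar * u)                                   # 0 < t < 1 when delta > 1
        b = [(-1) ** j * gbinom(c, j) * t ** j for j in range(terms)]
    return sum(ak * bj for bj in b for ak in a)                            # double sum over j, k

if __name__ == "__main__":
    for delta in (0.4, 1.0, 1.7):
        print(delta, cdf_power_closed(0.5, delta, 2.5), cdf_power_series(0.5, delta, 2.5))
```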