Properties and inference for a new class of XGamma distributions with an application


This paper introduces a new flexible probability distribution, called the transmuted XGamma distribution, which is derived from the XGamma distribution and reduces to it as a special case. We obtain explicit expressions for several important statistical properties of the proposed distribution, such as the hazard rate and survival functions, mean residual life, moment-generating function, moments, skewness, kurtosis, distributions of its order statistics, and Lorenz and Bonferroni curves. In addition to deriving estimators for the parameters of the distribution, we compare their estimation performances through a series of Monte Carlo simulations. Furthermore, to demonstrate the modeling ability of the proposed distribution on real-world phenomena, an illustrative example is carried out using a real lifetime data set.


Introduction
The performance of a statistical analysis of real-world phenomena depends on whether the distribution chosen as the model is appropriate for the phenomenon. Although well-known distribution families are often used to model a wide range of events, their modeling performance does not always reach the desired level. Over the last two decades, to overcome this problem, many researchers have introduced new distribution families by generalizing the available families or by using new distribution-generating methodologies. Several techniques for generating a new distribution are available in the literature. One of these is the quadratic rank transmutation map (QRTM) method proposed by Shaw and Buckley [1]. A distribution derived by this method, called a transmuted distribution, depends on a baseline distribution. In addition to possessing the features of the baseline distribution, a transmuted distribution is more flexible than its baseline because it carries one additional parameter. A number of published papers show the importance of the QRTM method and of the distributions derived from it; see [2][3][4][5][6][7]. A transmuted probability model can be easily derived by applying the following definition to an available probability model.
Definition 1 Let X be a random variable with cumulative distribution function (cdf) $G(x)$ and probability density function (pdf) $g(x)$, and let $\lambda$ be a real constant such that $|\lambda| \le 1$. With this notation, the transmuted cdf $F(x)$ corresponding to the baseline cdf $G(x)$ is given by
$$F(x) = (1+\lambda)\,G(x) - \lambda\,G(x)^2 \qquad (1)$$
and the pdf corresponding to the transmuted cdf $F(x)$ is
$$f(x) = g(x)\,\left[1+\lambda-2\lambda\,G(x)\right]. \qquad (2)$$
The parameter $\lambda$ is called the transmutation parameter, and when $\lambda = 0$, the transmuted cdf F and the baseline cdf G coincide [1].
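As a minimal illustration of Definition 1, the following sketch applies the quadratic rank transmutation map to an arbitrary baseline. The helper name `transmute` and the unit-rate exponential baseline are assumptions chosen only for this example; they do not come from the paper.

```python
import math

def transmute(G, g, lam):
    """Build the transmuted cdf F and pdf f from a baseline cdf G and pdf g."""
    if abs(lam) > 1:
        raise ValueError("the transmutation parameter must satisfy |lam| <= 1")

    def F(x):
        # Transmuted cdf: (1 + lam) G(x) - lam G(x)^2
        return (1 + lam) * G(x) - lam * G(x) ** 2

    def f(x):
        # Transmuted pdf: g(x) [1 + lam - 2 lam G(x)]
        return g(x) * (1 + lam - 2 * lam * G(x))

    return F, f

# Illustrative baseline (an assumption for this example): unit-rate exponential.
def G_exp(x):
    return 1 - math.exp(-x)

def g_exp(x):
    return math.exp(-x)

F, f = transmute(G_exp, g_exp, lam=0.5)
print(F(1.0), f(1.0))
```

Setting `lam=0` returns the baseline unchanged, while `lam=-1` gives the cdf of the maximum of two independent baseline draws, $G(x)^2$.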
Recently, the XGamma distribution was introduced by Sen et al. [8] as a probability distribution with a single shape parameter. The XGamma distribution, which has many useful statistical features, has potential use in modeling lifetime data from a wide range of scientific fields. Sen et al. [8] studied many of its properties. Despite these attractive properties, a disadvantage of the XGamma distribution is that it has only one parameter, which alone determines the behavior of the distribution. Several researchers have attempted to eliminate this disadvantage; see [9][10][11][12][13][14]. However, the XGamma distribution still needs to be improved in its ability to model a wide variety of data types, especially data with hazard rates of different forms.
The aim of this study is to derive an alternative distribution for modeling data sets with various hazard rate shapes. In this context, we introduce a new probability distribution called the transmuted XGamma (TXG) distribution by extending the XGamma family with the QRTM method. The derived distribution has a more general form than the XGamma distribution and can provide a better fit than the XGamma distribution for data sets with a wider variety of hazard rate forms.
The rest of the paper is organized as follows: In "TXG distribution" section, the TXG distribution is derived and its reliability properties are investigated. Explicit expressions for the moments and moment-based measures of the TXG distribution are given in "Moments of the TXG distribution" section. Section "Lorenz and Bonferroni curves" discusses the Lorenz and Bonferroni curves of the TXG distribution. The distributions of the order statistics of the TXG distribution are given in "Order statistics" section. In "Inference" section, the statistical inference problem for the TXG distribution is investigated by using the maximum likelihood (ML) and least-squares (LS) methodologies. In "Simulation study" section, numerical studies are performed to compare the estimation performances of the estimators derived in "Inference" section. To exemplify how the TXG distribution works in practice as a model, we analyze a real-world data set in "Data analysis" section. Finally, "Conclusion" section concludes the study.

TXG distribution
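In this section, we derive the TXG distribution by applying Definition 1 to the XGamma distribution as the baseline distribution. Before proceeding to the derivation, let us recall the pdf and cdf of the XGamma distribution.
The pdf of the XGamma distribution is given by
$$g(x;\theta) = \frac{\theta^2}{1+\theta}\left(1+\frac{\theta}{2}x^2\right)e^{-\theta x}, \quad x > 0, \qquad (3)$$
and the corresponding cdf is given by
$$G(x;\theta) = 1 - \frac{1+\theta+\theta x+\frac{\theta^2 x^2}{2}}{1+\theta}\,e^{-\theta x}, \quad x > 0, \qquad (4)$$
where $\theta > 0$ is the shape parameter of the distribution [8].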
Hence, considering Definition 1, the pdf (3) and cdf (4), the TXG distribution is described by the following definition.

Definition 2
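If a random variable X has the TXG distribution with parameters $\theta > 0$ and $-1 \le \lambda \le 1$, then its cdf is
$$F(x;\theta,\lambda) = (1+\lambda)\,G(x;\theta) - \lambda\,G(x;\theta)^2 = \left[1 - S_{XG}(x;\theta)\right]\left[1 + \lambda\,S_{XG}(x;\theta)\right] \qquad (5)$$
and the pdf of the TXG is expressed by
$$f(x;\theta,\lambda) = g(x;\theta)\left[1+\lambda-2\lambda\,G(x;\theta)\right] = \frac{\theta^2}{1+\theta}\left(1+\frac{\theta}{2}x^2\right)e^{-\theta x}\left[1-\lambda+2\lambda\,S_{XG}(x;\theta)\right], \quad x > 0, \qquad (6)$$
where $g(x;\theta)$ and $G(x;\theta)$ are the XGamma pdf (3) and cdf (4), and $S_{XG}(x;\theta) = 1 - G(x;\theta) = \frac{1+\theta+\theta x+\theta^2 x^2/2}{1+\theta}\,e^{-\theta x}$ is the survival function of the XGamma distribution. As in the XGamma distribution, the parameter $\theta > 0$ controls the shape of the distribution. In addition to the role of $\theta$ in determining the behavior of the distribution, the parameter $\lambda$ ($|\lambda| \le 1$) contributes to its flexibility. In the remainder of the paper, we use the notation $X \sim TXG(\theta, \lambda)$ to indicate a random variable X from the TXG distribution with parameters $\theta$ and $\lambda$.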
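A short numerical sketch of Definition 2 is given below. It codes the XGamma pdf and cdf as published in [8], builds the TXG cdf and pdf from Definition 1, and checks that the pdf integrates to one. The function names and the Simpson-rule helper are illustrative choices, and truncating the integral at 40 is an assumption that the tail beyond that point is negligible for the chosen parameters.

```python
import math

def xg_pdf(x, theta):
    """XGamma pdf, Eq. (3)."""
    return theta**2 / (1 + theta) * (1 + theta * x**2 / 2) * math.exp(-theta * x)

def xg_cdf(x, theta):
    """XGamma cdf, Eq. (4)."""
    poly = 1 + theta + theta * x + theta**2 * x**2 / 2
    return 1 - poly / (1 + theta) * math.exp(-theta * x)

def txg_cdf(x, theta, lam):
    """TXG cdf, Eq. (5): transmutation of the XGamma cdf."""
    G = xg_cdf(x, theta)
    return (1 + lam) * G - lam * G**2

def txg_pdf(x, theta, lam):
    """TXG pdf, Eq. (6)."""
    return xg_pdf(x, theta) * (1 + lam - 2 * lam * xg_cdf(x, theta))

def simpson(func, a, b, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3

theta, lam = 1.5, -0.75
area = simpson(lambda x: txg_pdf(x, theta, lam), 0.0, 40.0)
print(f"integral of the TXG pdf over [0, 40]: {area:.6f}")
```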
For the TXG distribution, we now derive the main constituent elements of the reliability analysis, such as survival function, hazard rate function and mean residual lifetime.
When the random variable X describes the lifetime of a unit, the survival function gives the probability $P(X > t)$ for any time $t \ge 0$. It is clear from the definition that the survival function is computed as $S(t) = 1 - F(t)$. Thus, the survival function of the TXG distribution, $S(t)$, is expressed as
$$S(t) = 1 - F(t;\theta,\lambda) = S_{XG}(t;\theta)\left[1-\lambda+\lambda\,S_{XG}(t;\theta)\right], \qquad (7)$$
where $S_{XG}(t;\theta) = 1 - G(t;\theta)$ denotes the survival function of the XGamma distribution.
As the second constituent element of reliability analysis, the general form of the hazard rate function, also known as the failure rate, is defined by
$$h(x) = \frac{f(x)}{S(x)}. \qquad (8)$$
Following the general form given by Eq. (8), the hazard rate function of the TXG distribution can be immediately expressed in closed form as
$$h(x) = \frac{g(x;\theta)\left[1-\lambda+2\lambda\,S_{XG}(x;\theta)\right]}{S_{XG}(x;\theta)\left[1-\lambda+\lambda\,S_{XG}(x;\theta)\right]},$$
where $g(x;\theta)$ is the XGamma pdf (3).
Now, we obtain the mean residual life function, m(x), of the TXG distribution. The mean residual life function of a continuous random variable is defined by
$$m(x) = E(X - x \mid X > x) = \frac{1}{S(x)}\int_x^{\infty} S(t)\,dt.$$
By considering the survival function of the TXG distribution given by (7), the mean residual life function of the TXG distribution is obtained as
$$m(x) = \frac{\int_x^{\infty} S_{XG}(t;\theta)\left[1-\lambda+\lambda\,S_{XG}(t;\theta)\right]dt}{S_{XG}(x;\theta)\left[1-\lambda+\lambda\,S_{XG}(x;\theta)\right]}.$$
Plots of the pdf and hazard rate function of the TXG distribution for several combinations of the parameter values are displayed in Fig. 1 in order to exemplify its distributional behavior. From the plots, it is clearly seen that the density of the TXG distribution is right-skewed and that its hazard rate function can take different forms, such as increasing, decreasing-increasing, and decreasing-increasing-decreasing.
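The reliability functions above can be evaluated numerically with a few lines of code. The sketch below is illustrative only: the function names are hypothetical, the mean residual life is computed with a composite Simpson rule rather than from a closed form, and the infinite upper limit is truncated at `x + width`, which assumes the tail beyond that point is negligible.

```python
import math

def xg_pdf(x, theta):
    """XGamma pdf, Eq. (3)."""
    return theta**2 / (1 + theta) * (1 + theta * x**2 / 2) * math.exp(-theta * x)

def xg_sf(x, theta):
    """XGamma survival function, 1 - G(x) with G from Eq. (4)."""
    poly = 1 + theta + theta * x + theta**2 * x**2 / 2
    return poly / (1 + theta) * math.exp(-theta * x)

def txg_sf(t, theta, lam):
    """TXG survival function: S_XG(t) [1 - lam + lam S_XG(t)]."""
    s = xg_sf(t, theta)
    return s * (1 - lam + lam * s)

def txg_hazard(t, theta, lam):
    """TXG hazard rate f(t) / S(t)."""
    s = xg_sf(t, theta)
    return xg_pdf(t, theta) * (1 - lam + 2 * lam * s) / (s * (1 - lam + lam * s))

def txg_mrl(x, theta, lam, width=60.0, n=6000):
    """Mean residual life (1 / S(x)) * integral of S over [x, x + width] (Simpson)."""
    h = width / n
    total = txg_sf(x, theta, lam) + txg_sf(x + width, theta, lam)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * txg_sf(x + k * h, theta, lam)
    return total * h / 3 / txg_sf(x, theta, lam)

# Hazard rate and mean residual life at a few time points for theta = 0.5, lam = 0.75.
for t in (0.5, 1.0, 2.0, 4.0):
    print(t, round(txg_hazard(t, 0.5, 0.75), 4), round(txg_mrl(t, 0.5, 0.75), 4))
```

For $\lambda = 0$ the value `txg_mrl(0, theta, 0)` reduces to the XGamma mean $(\theta+3)/[\theta(1+\theta)]$, which gives a quick sanity check on the integration.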

Moments of the TXG distribution
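In this section, we derive some basic statistical measures of the TXG distribution, such as the moments and the skewness and kurtosis coefficients. The moment-generating function, $M_X(t)$, following the general definition $M_X(t) = E(e^{tX})$, is obtained for $t < \theta$ as
$$M_X(t) = (1-\lambda)\,\frac{\theta^2}{1+\theta}\left[\frac{1}{\theta-t}+\frac{\theta}{(\theta-t)^3}\right] + \frac{2\lambda\,\theta^2}{(1+\theta)^2}\sum_{k=0}^{4}\frac{c_k\,k!}{(2\theta-t)^{k+1}},$$
where $c_0 = 1+\theta$, $c_1 = \theta$, $c_2 = \theta(1+2\theta)/2$, $c_3 = \theta^2/2$ and $c_4 = \theta^3/4$ are the coefficients of the polynomial $\left(1+\frac{\theta}{2}x^2\right)\left(1+\theta+\theta x+\frac{\theta^2x^2}{2}\right) = \sum_{k=0}^{4} c_k x^k$.
The rth non-central moment is obtained for the TXG distribution, using Eq. (6), as follows:
$$E(X^r) = (1-\lambda)\,\frac{r!\,\left[2\theta+(r+1)(r+2)\right]}{2\,\theta^r(1+\theta)} + \frac{2\lambda\,\theta^2}{(1+\theta)^2}\sum_{k=0}^{4}\frac{c_k\,(r+k)!}{(2\theta)^{r+k+1}}, \quad r = 1, 2, \ldots$$
See "Appendix" for the calculation of the rth non-central moment of the TXG distribution. By using the expression of the rth moment, the first and second moments of the TXG distribution are, respectively, obtained as
$$E(X) = (1-\lambda)\,\frac{\theta+3}{\theta(1+\theta)} + \lambda\,\frac{8\theta^2+28\theta+33}{16\,\theta(1+\theta)^2} \qquad (12)$$
and
$$E(X^2) = (1-\lambda)\,\frac{2(\theta+6)}{\theta^2(1+\theta)} + \lambda\,\frac{8\theta^2+44\theta+87}{16\,\theta^2(1+\theta)^2}. \qquad (13)$$
Hence, by Eqs. (12) and (13), the variance of the TXG distribution is written as
$$Var(X) = E(X^2) - \left[E(X)\right]^2. \qquad (14)$$
The skewness and kurtosis coefficients for the TXG distribution can be calculated from the expressions
$$\text{Skewness} = \frac{\mu_3' - 3\mu_1'\mu_2' + 2\mu_1'^3}{\left[Var(X)\right]^{3/2}}, \qquad \text{Kurtosis} = \frac{\mu_4' - 4\mu_1'\mu_3' + 6\mu_1'^2\mu_2' - 3\mu_1'^4}{\left[Var(X)\right]^{2}},$$
where $\mu_r' = E(X^r)$ denotes the rth non-central moment.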
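The moment-based measures can also be checked numerically straight from the pdf (6). The sketch below is illustrative and does not reproduce the Appendix calculation: it integrates $x^r f(x)$ with a composite Simpson rule, truncating the range at $80/\theta$ (an assumption that the remaining tail is negligible), and then forms the variance, skewness and kurtosis from the raw moments. All function names are hypothetical.

```python
import math

def txg_pdf(x, theta, lam):
    """TXG pdf, Eq. (6), written out from the XGamma pdf and cdf."""
    poly = 1 + theta + theta * x + theta**2 * x**2 / 2
    G = 1 - poly / (1 + theta) * math.exp(-theta * x)
    g = theta**2 / (1 + theta) * (1 + theta * x**2 / 2) * math.exp(-theta * x)
    return g * (1 + lam - 2 * lam * G)

def txg_raw_moment(r, theta, lam, n=8000):
    """E(X^r) by composite Simpson integration on [0, 80 / theta]."""
    upper = 80.0 / theta
    h = upper / n

    def integrand(x):
        return x**r * txg_pdf(x, theta, lam)

    total = integrand(0.0) + integrand(upper)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return total * h / 3

def txg_summary(theta, lam):
    """Return (mean, variance, skewness, kurtosis) from the first four raw moments."""
    m1, m2, m3, m4 = (txg_raw_moment(r, theta, lam) for r in (1, 2, 3, 4))
    var = m2 - m1**2
    skew = (m3 - 3 * m1 * m2 + 2 * m1**3) / var**1.5
    kurt = (m4 - 4 * m1 * m3 + 6 * m1**2 * m2 - 3 * m1**4) / var**2
    return m1, var, skew, kurt

for lam in (-0.75, 0.0, 0.75):
    print(lam, [round(v, 4) for v in txg_summary(1.5, lam)])
```

With $\lambda = 0$ and $\theta = 1$ the routine should return the XGamma values, namely a mean of 2 and a variance of 3.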

Lorenz and Bonferroni curves
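Following the general definition of the Lorenz index L(p), the Lorenz index of the TXG distribution is obtained as
$$L(p) = \frac{1}{\mu}\int_0^{q} x\,f(x;\theta,\lambda)\,dx,$$
where $\mu$ indicates the expectation of the TXG distribution, $f(x;\theta,\lambda)$ is the pdf (6) and $q = F^{-1}(p)$. Thus, the Lorenz curve of the TXG distribution can be drawn by plotting L(p) against p, the value of the TXG cdf at q.
Similarly, the Bonferroni index of the TXG distribution is obtained as
$$B(p) = \frac{1}{p\,\mu}\int_0^{q} x\,f(x;\theta,\lambda)\,dx = \frac{L(p)}{p}.$$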
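To make the two indices concrete, the sketch below evaluates them numerically from their general definitions, $L(p) = \mu^{-1}\int_0^q x f(x)\,dx$ with $q = F^{-1}(p)$ and $B(p) = L(p)/p$. It is an illustration under stated assumptions: the quantile is found by bisection, the integrals use a composite Simpson rule, the mean is integrated up to $80/\theta$, and all function names are hypothetical.

```python
import math

def txg_cdf(x, theta, lam):
    """TXG cdf, Eq. (5)."""
    poly = 1 + theta + theta * x + theta**2 * x**2 / 2
    G = 1 - poly / (1 + theta) * math.exp(-theta * x)
    return (1 + lam) * G - lam * G**2

def txg_pdf(x, theta, lam):
    """TXG pdf, Eq. (6)."""
    poly = 1 + theta + theta * x + theta**2 * x**2 / 2
    G = 1 - poly / (1 + theta) * math.exp(-theta * x)
    g = theta**2 / (1 + theta) * (1 + theta * x**2 / 2) * math.exp(-theta * x)
    return g * (1 + lam - 2 * lam * G)

def simpson(func, a, b, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3

def txg_quantile(p, theta, lam, tol=1e-10):
    """Solve F(q) = p for 0 < p < 1 by bisection."""
    lo, hi = 0.0, 1.0
    while txg_cdf(hi, theta, lam) < p:   # expand until the root is bracketed
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if txg_cdf(mid, theta, lam) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def lorenz(p, theta, lam):
    """Lorenz index: partial mean up to the p-quantile divided by the mean."""
    mu = simpson(lambda x: x * txg_pdf(x, theta, lam), 0.0, 80.0 / theta, n=8000)
    q = txg_quantile(p, theta, lam)
    return simpson(lambda x: x * txg_pdf(x, theta, lam), 0.0, q) / mu

def bonferroni(p, theta, lam):
    """Bonferroni index B(p) = L(p) / p."""
    return lorenz(p, theta, lam) / p

for p in (0.25, 0.5, 0.75):
    print(p, round(lorenz(p, 1.0, 0.5), 4), round(bonferroni(p, 1.0, 0.5), 4))
```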

Order statistics
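Let $X_1, X_2, \ldots, X_n$ be a random sample drawn from the TXG($\theta, \lambda$) distribution and let $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ denote its order statistics. Under these assumptions, the density of the ith order statistic $X_{(i)}$, $i = 1, 2, \ldots, n$, is given by
$$f_{(i)}(x) = \frac{n!}{(i-1)!\,(n-i)!}\,f(x)\,\left[F(x)\right]^{i-1}\left[1-F(x)\right]^{n-i}. \qquad (19)$$
By using the pdf (6) and cdf (5) in Eq. (19), the density of the ith order statistic of the TXG distribution is obtained as
$$f_{(i)}(x) = \frac{n!}{(i-1)!\,(n-i)!}\,g(x;\theta)\left[1+\lambda-2\lambda G(x;\theta)\right]\left\{G(x;\theta)\left[1+\lambda-\lambda G(x;\theta)\right]\right\}^{i-1}\left\{\left[1-G(x;\theta)\right]\left[1-\lambda G(x;\theta)\right]\right\}^{n-i},$$
where $g(x;\theta)$ and $G(x;\theta)$ are the XGamma pdf (3) and cdf (4).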
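The general order-statistic density of Eq. (19) is straightforward to evaluate once the TXG pdf and cdf are available. The sketch below is a direct transcription under that assumption, with hypothetical function names. A convenient check is the identity $\sum_{i=1}^{n} f_{(i)}(x) = n f(x)$, which holds for any continuous distribution.

```python
import math

def txg_cdf(x, theta, lam):
    """TXG cdf, Eq. (5)."""
    poly = 1 + theta + theta * x + theta**2 * x**2 / 2
    G = 1 - poly / (1 + theta) * math.exp(-theta * x)
    return (1 + lam) * G - lam * G**2

def txg_pdf(x, theta, lam):
    """TXG pdf, Eq. (6)."""
    poly = 1 + theta + theta * x + theta**2 * x**2 / 2
    G = 1 - poly / (1 + theta) * math.exp(-theta * x)
    g = theta**2 / (1 + theta) * (1 + theta * x**2 / 2) * math.exp(-theta * x)
    return g * (1 + lam - 2 * lam * G)

def txg_order_pdf(x, i, n, theta, lam):
    """Density of the i-th order statistic in a sample of size n, Eq. (19)."""
    F = txg_cdf(x, theta, lam)
    coef = math.factorial(n) / (math.factorial(i - 1) * math.factorial(n - i))
    return coef * txg_pdf(x, theta, lam) * F ** (i - 1) * (1 - F) ** (n - i)

# Densities of the sample minimum and maximum at x = 1 for n = 5.
print(txg_order_pdf(1.0, 1, 5, 1.0, 0.5), txg_order_pdf(1.0, 5, 5, 1.0, 0.5))
```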

Inference
ML estimation
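Let $X_1, X_2, \ldots, X_n$ be a random sample drawn from the TXG($\theta, \lambda$) distribution. Considering the pdf (6), the log-likelihood function of the sample is written as
$$\ell(\theta,\lambda) = 2n\ln\theta - n\ln(1+\theta) + \sum_{i=1}^{n}\ln\left(1+\frac{\theta}{2}x_i^2\right) - \theta\sum_{i=1}^{n}x_i + \sum_{i=1}^{n}\ln\left[1-\lambda+2\lambda\,S_{XG}(x_i;\theta)\right], \qquad (23)$$
where $S_{XG}(x;\theta) = 1 - G(x;\theta)$ is the survival function of the XGamma distribution. By differentiating the log-likelihood function given by Eq. (23) with respect to the parameters $\theta$ and $\lambda$ and equating the derivatives to zero, we have the following likelihood equations:
$$\frac{\partial \ell}{\partial \theta} = \frac{2n}{\theta} - \frac{n}{1+\theta} + \sum_{i=1}^{n}\frac{x_i^2}{2+\theta x_i^2} - \sum_{i=1}^{n}x_i + \sum_{i=1}^{n}\frac{2\lambda\,\partial S_{XG}(x_i;\theta)/\partial\theta}{1-\lambda+2\lambda\,S_{XG}(x_i;\theta)} = 0, \qquad (24)$$
$$\frac{\partial \ell}{\partial \lambda} = \sum_{i=1}^{n}\frac{2\,S_{XG}(x_i;\theta)-1}{1-\lambda+2\lambda\,S_{XG}(x_i;\theta)} = 0, \qquad (25)$$
with
$$\frac{\partial S_{XG}(x;\theta)}{\partial\theta} = -\frac{\theta x\,e^{-\theta x}}{(1+\theta)^2}\left[2+\theta+\frac{\theta x}{2}+\frac{\theta(1+\theta)x^2}{2}\right].$$
The ML estimators of the parameters $\theta$ and $\lambda$ are obtained as the solution of the nonlinear equation system given by Eqs. (24)-(25). This system has no analytical solution. However, the ML estimates of $\theta$ and $\lambda$, say $\hat{\theta}_{ML}$ and $\hat{\lambda}_{ML}$, respectively, can be obtained using a numerical technique such as Newton's method.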
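A minimal numerical sketch of the ML step follows. It is not the paper's implementation: instead of Newton's method it maximizes the log-likelihood by a nested golden-section search (an inner search over $\lambda \in [-1, 1]$, in which the log-likelihood is concave, and an outer search over $\log\theta$ on an assumed bracket), and it assumes the resulting profile in $\theta$ is unimodal. The synthetic sample is generated through two mixture identities, which need not be how the paper simulated its data: the XGamma law mixes an exponential($\theta$) and a gamma(3, $\theta$) distribution with weights $\theta/(1+\theta)$ and $1/(1+\theta)$ [8], and by Eq. (2) a transmuted law mixes the baseline with the minimum ($\lambda > 0$) or maximum ($\lambda < 0$) of two baseline draws, with weight $|\lambda|$. All function names are hypothetical.

```python
import math
import random

def xg_sf(x, theta):
    """XGamma survival function, 1 - G(x) with G from Eq. (4)."""
    poly = 1 + theta + theta * x + theta**2 * x**2 / 2
    return poly / (1 + theta) * math.exp(-theta * x)

def txg_loglik(theta, lam, xs):
    """Log-likelihood of TXG(theta, lam) for the sample xs, built from pdf (6)."""
    n = len(xs)
    ll = 2 * n * math.log(theta) - n * math.log(1 + theta) - theta * sum(xs)
    for x in xs:
        ll += math.log(1 + theta * x * x / 2)
        ll += math.log(1 - lam + 2 * lam * xg_sf(x, theta))
    return ll

def golden_max(func, a, b, tol=1e-6):
    """Golden-section search for the maximiser of a unimodal function on [a, b]."""
    r = (math.sqrt(5) - 1) / 2
    c, d = b - r * (b - a), a + r * (b - a)
    fc, fd = func(c), func(d)
    while b - a > tol:
        if fc < fd:                      # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = func(d)
        else:                            # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = func(c)
    return (a + b) / 2

def txg_mle(xs, theta_lo=1e-2, theta_hi=20.0):
    """Profile-likelihood estimates (theta_hat, lam_hat); the bracket is an assumption."""
    def best_lam(theta):
        return golden_max(lambda l: txg_loglik(theta, l, xs), -1.0, 1.0)

    def profile(log_theta):
        theta = math.exp(log_theta)
        return txg_loglik(theta, best_lam(theta), xs)

    theta = math.exp(golden_max(profile, math.log(theta_lo), math.log(theta_hi), tol=1e-5))
    return theta, best_lam(theta)

def sample_txg(theta, lam, rng):
    """Draw one TXG variate via the mixture identities described above."""
    def draw_xg():
        if rng.random() < theta / (1 + theta):
            return rng.expovariate(theta)
        return rng.gammavariate(3.0, 1.0 / theta)   # shape 3, scale 1 / theta
    x1, x2 = draw_xg(), draw_xg()
    if rng.random() < abs(lam):
        return min(x1, x2) if lam > 0 else max(x1, x2)
    return x1

rng = random.Random(2024)
sample = [sample_txg(1.5, 0.5, rng) for _ in range(200)]   # synthetic data
theta_hat, lam_hat = txg_mle(sample)
print(theta_hat, lam_hat)
```

A derivative-free search is used here only to keep the example short and dependency-free. A Newton-type solver applied to Eqs. (24)-(25) would typically converge faster when started near the solution.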

LS estimation
LS estimation is a more recent method than ML estimation. The method, introduced by Swain et al. [15], can be easily applied to the statistical inference problem for a specific distribution family by using the notation of [15].
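Let us assume that $X_1, \ldots, X_n$ is a random sample drawn from the TXG($\theta, \lambda$) distribution and that $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ are its order statistics. With this notation, the LS estimators of the parameters $\theta$ and $\lambda$, say $\hat{\theta}_{LS}$ and $\hat{\lambda}_{LS}$, respectively, are obtained by minimizing the nonlinear function
$$Q(\theta,\lambda) = \sum_{i=1}^{n}\left[F(x_{(i)};\theta,\lambda) - \frac{i}{n+1}\right]^2 = \sum_{i=1}^{n}\left\{\left[1-S_{XG}(x_{(i)})\right]\left[1+\lambda\,S_{XG}(x_{(i)})\right] - \frac{i}{n+1}\right\}^2, \qquad (29)$$
where $S_{XG}(x)$ indicates the survival function of the XGamma distribution. By differentiating the quadratic form $Q(\theta,\lambda)$ with respect to the parameters and equating the derivatives to zero, we obtain the following nonlinear equations:
$$\sum_{i=1}^{n}\left[F(x_{(i)};\theta,\lambda) - \frac{i}{n+1}\right]\left[1-\lambda+2\lambda\,S_{XG}(x_{(i)})\right]\frac{\partial S_{XG}(x_{(i)})}{\partial\theta} = 0, \qquad (30)$$
$$\sum_{i=1}^{n}\left[F(x_{(i)};\theta,\lambda) - \frac{i}{n+1}\right]S_{XG}(x_{(i)})\left[1-S_{XG}(x_{(i)})\right] = 0, \qquad (31)$$
here $\partial S_{XG}(x)/\partial\theta$ denotes the partial derivative of the XGamma survival function with respect to $\theta$. Therefore, the LS estimates $\hat{\theta}_{LS}$ and $\hat{\lambda}_{LS}$ are obtained from the solution of Eqs. (30) and (31).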
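The LS criterion can be minimized with a simple numerical scheme, sketched below under stated assumptions. Because the TXG cdf equals $G + \lambda G(1-G)$, the objective is a convex quadratic in $\lambda$ for fixed $\theta$, so the best $\lambda$ is available in closed form and only needs clipping to $[-1, 1]$. The remaining one-dimensional search over $\log\theta$ uses a golden-section search on an assumed bracket and assumes the profile is unimodal. The data values are made-up illustrative numbers, not the Kevlar 49/epoxy observations, and the function names are hypothetical.

```python
import math

def xg_cdf(x, theta):
    """XGamma cdf, Eq. (4)."""
    poly = 1 + theta + theta * x + theta**2 * x**2 / 2
    return 1 - poly / (1 + theta) * math.exp(-theta * x)

def ls_objective(theta, lam, xs_sorted):
    """Sum of squared differences between the TXG cdf and the plotting positions i/(n+1)."""
    n = len(xs_sorted)
    q = 0.0
    for i, x in enumerate(xs_sorted, start=1):
        G = xg_cdf(x, theta)
        F = G + lam * G * (1 - G)        # equals (1 + lam) G - lam G^2
        q += (F - i / (n + 1)) ** 2
    return q

def best_lam(theta, xs_sorted):
    """Minimiser of the objective over lam in [-1, 1] for fixed theta (clipped LS slope)."""
    n = len(xs_sorted)
    num = den = 0.0
    for i, x in enumerate(xs_sorted, start=1):
        G = xg_cdf(x, theta)
        w = G * (1 - G)
        num += (i / (n + 1) - G) * w
        den += w * w
    lam = num / den if den > 0 else 0.0
    return max(-1.0, min(1.0, lam))

def golden_min(func, a, b, tol=1e-6):
    """Golden-section search for the minimiser of a unimodal function on [a, b]."""
    r = (math.sqrt(5) - 1) / 2
    c, d = b - r * (b - a), a + r * (b - a)
    fc, fd = func(c), func(d)
    while b - a > tol:
        if fc > fd:                      # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = func(d)
        else:                            # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = func(c)
    return (a + b) / 2

def txg_ls(xs, theta_lo=1e-2, theta_hi=20.0):
    """LS estimates (theta_hat, lam_hat); the search bracket is an assumption."""
    xs_sorted = sorted(xs)

    def profile(log_theta):
        theta = math.exp(log_theta)
        return ls_objective(theta, best_lam(theta, xs_sorted), xs_sorted)

    theta = math.exp(golden_min(profile, math.log(theta_lo), math.log(theta_hi)))
    return theta, best_lam(theta, xs_sorted)

# Made-up illustrative values (not the Kevlar 49/epoxy data).
xs = [0.21, 0.47, 0.58, 0.83, 1.02, 1.25, 1.51, 1.96, 2.40, 2.88, 3.55, 4.70]
theta_hat, lam_hat = txg_ls(xs)
print(theta_hat, lam_hat)
```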

Simulation study
In this section, a Monte Carlo simulation study is carried out to show and compare the estimation performances of the ML and LS estimators obtained in "Inference" section. In the simulation study, the parameter $\theta$ is set to 0.5 and 1.5, and the transmutation parameter $\lambda$ is set to −0.75, −0.25, 0.25 and 0.75. For both estimators, the mean estimates, biases and mean square errors (MSE) are calculated for the sample sizes n = 30, 50, 100. The simulation results, obtained from 1000 repetitions, are shown in Table 1. From the results given in Table 1, it is clearly seen that the bias and MSE values of both estimators decrease as the sample size increases. This is an expected conclusion for the ML estimators, since ML estimators are asymptotically unbiased and consistent. In addition, the results in Table 1 show that the ML estimators outperform the LS estimators, with smaller bias and MSE values. Therefore, we can say that the estimation performance of the ML estimators is better than that of the LS estimators.
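The structure of such a simulation can be sketched as follows. This is a scaled-down illustration rather than a reproduction of Table 1: it uses 200 repetitions instead of 1000, evaluates only an LS-type estimator (closed-form $\lambda$ for each $\theta$ plus a golden-section search over $\log\theta$ on an assumed bracket), and draws TXG samples through mixture identities, which need not match the paper's sampling scheme. The identities are that the XGamma law mixes an exponential($\theta$) and a gamma(3, $\theta$) distribution with weights $\theta/(1+\theta)$ and $1/(1+\theta)$ [8], and that a transmuted law mixes the baseline with the minimum ($\lambda > 0$) or maximum ($\lambda < 0$) of two baseline draws, with weight $|\lambda|$. All function names are hypothetical.

```python
import math
import random

def xg_cdf(x, theta):
    """XGamma cdf, Eq. (4)."""
    poly = 1 + theta + theta * x + theta**2 * x**2 / 2
    return 1 - poly / (1 + theta) * math.exp(-theta * x)

def sample_txg(theta, lam, rng):
    """Draw one TXG variate via the mixture identities described above."""
    def draw_xg():
        if rng.random() < theta / (1 + theta):
            return rng.expovariate(theta)
        return rng.gammavariate(3.0, 1.0 / theta)   # shape 3, scale 1 / theta
    x1, x2 = draw_xg(), draw_xg()
    if rng.random() < abs(lam):
        return min(x1, x2) if lam > 0 else max(x1, x2)
    return x1

def ls_fit(xs, lo=math.log(1e-2), hi=math.log(20.0), tol=1e-6):
    """LS estimates: closed-form lam for each theta, golden-section search over log(theta)."""
    x = sorted(xs)
    n = len(x)
    p = [i / (n + 1) for i in range(1, n + 1)]

    def lam_and_q(theta):
        G = [xg_cdf(v, theta) for v in x]
        w = [g * (1 - g) for g in G]
        den = sum(wi * wi for wi in w)
        lam = sum((pi - gi) * wi for pi, gi, wi in zip(p, G, w)) / den if den > 0 else 0.0
        lam = max(-1.0, min(1.0, lam))
        q = sum((gi + lam * wi - pi) ** 2 for pi, gi, wi in zip(p, G, w))
        return lam, q

    r = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - r * (b - a), a + r * (b - a)
    fc, fd = lam_and_q(math.exp(c))[1], lam_and_q(math.exp(d))[1]
    while b - a > tol:
        if fc > fd:                      # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = lam_and_q(math.exp(d))[1]
        else:                            # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = lam_and_q(math.exp(c))[1]
    theta = math.exp((a + b) / 2)
    return theta, lam_and_q(theta)[0]

def monte_carlo(theta, lam, n, reps, seed=0):
    """Mean estimate, bias and MSE of the LS estimates over `reps` synthetic samples."""
    rng = random.Random(seed)
    est = [ls_fit([sample_txg(theta, lam, rng) for _ in range(n)]) for _ in range(reps)]
    out = {}
    for name, truth, idx in (("theta", theta, 0), ("lam", lam, 1)):
        vals = [e[idx] for e in est]
        mean = sum(vals) / reps
        mse = sum((v - truth) ** 2 for v in vals) / reps
        out[name] = {"mean": mean, "bias": mean - truth, "mse": mse}
    return out

for n in (30, 50, 100):
    print(n, monte_carlo(0.5, 0.75, n, reps=200))
```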

Data analysis
In this section, a real data set, the Kevlar 49/epoxy data set, is modeled to show the modeling capability of the TXG distribution and to compare it with the XGamma distribution as a sub-model. The Kevlar 49/epoxy data set [16] includes 101 observations, which represent the failure times of Kevlar 49/epoxy strands that were subjected to constant sustained pressure at the 90% stress level until all had failed. The data can be found in [16].
The total time on test (TTT) plot, see [17], is a commonly used tool for identifying the shape of the hazard rate function underlying a data set and, hence, a suitable distribution. The TTT plot of the Kevlar 49/epoxy data set is shown in Fig. 3. It is clear from Fig. 3 that the data set has a decreasing-increasing-decreasing hazard rate function and that the TXG distribution is a suitable model for these data.
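For readers who wish to draw a TTT plot themselves, the scaled TTT transform can be computed as sketched below. The formula used is the standard empirical one, $T(r/n) = \left[\sum_{i=1}^{r} x_{(i)} + (n-r)\,x_{(r)}\right] / \sum_{i=1}^{n} x_{(i)}$. The function name and the data values are illustrative assumptions, and the values are not the Kevlar 49/epoxy observations.

```python
def scaled_ttt(xs):
    """Return the points (r/n, T(r/n)) of the scaled total-time-on-test transform."""
    x = sorted(xs)
    n = len(x)
    total = sum(x)
    points = []
    running = 0.0
    for r, xr in enumerate(x, start=1):
        running += xr                                  # sum of the r smallest observations
        points.append((r / n, (running + (n - r) * xr) / total))
    return points

# Made-up illustrative failure times.
data = [0.3, 0.9, 1.4, 2.2, 2.5, 3.1, 4.8, 7.0]
for u, t in scaled_ttt(data):
    print(round(u, 3), round(t, 3))
```

Plotting the second coordinate against the first gives the TTT curve. A curve lying above the diagonal and concave is usually read as an increasing hazard rate, a convex curve below the diagonal as a decreasing one, and changes of curvature as non-monotone shapes.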
For the Kevlar 49/epoxy data set, the ML estimates of the model parameters, the Kolmogorov-Smirnov (K-S) statistics and the corresponding p values, the negative log-likelihood (Neg.L) values, and the Akaike information criterion (AIC) and consistent Akaike information criterion (cAIC) values are tabulated in Table 2. According to the K-S statistics provided in Table 2, both models are appropriate for modeling the Kevlar 49/epoxy data. However, the TXG distribution is a more appropriate model than the XGamma distribution, with smaller Neg.L, AIC and cAIC values. Thus, it can be concluded that the TXG model derived in this paper improves on the modeling performance of the baseline XGamma distribution. For the TXG distribution fitted to the Kevlar 49/epoxy data set, Fig. 4 shows the Q-Q plot of the data, the fitted cdf together with the empirical cumulative distribution function (ECDF) of the data, and the fitted pdf together with the histogram of the data. As can be seen from the Q-Q plot in Fig. 4a, the data points fall approximately on the straight line, and the fitted cdf in Fig. 4b closely follows the empirical cdf. Furthermore, the fitted pdf in Fig. 4c closely follows the shape of the data histogram. To test whether the TXG distribution is a more suitable model than the XGamma distribution for this data set, a statistic for discriminating between the TXG and XGamma distributions could also be developed by following the methods used in [18,19]. Figure 5 shows the hazard rate functions fitted with the TXG and XGamma distributions. Although both distributions are appropriate models for the Kevlar 49/epoxy data set according to the model selection criteria, the hazard rate function of the preferred model should have a decreasing-increasing-decreasing form, based on the TTT plot given in Fig. 3. As can be seen from Fig. 5, the fitted hazard rate function of the TXG distribution is consistent with the TTT plot of the data, whereas that of the XGamma distribution is not. Therefore, it is concluded that, of the two models, the TXG distribution is the preferable one for the Kevlar 49/epoxy data set.
Fig. 4 Q-Q plots, empirical and fitted cdfs and fitted pdfs for the Kevlar 49/epoxy data
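The model selection quantities used above are simple to compute once a model has been fitted. The sketch below is illustrative: it takes AIC as $2\,\text{Neg.L} + 2k$ and assumes that cAIC denotes Bozdogan's consistent AIC, $2\,\text{Neg.L} + k(\ln n + 1)$, which the paper does not spell out. The K-S statistic is the usual maximum distance between the empirical and fitted cdfs. The function names are hypothetical and no values from Table 2 are reproduced.

```python
import math

def ks_statistic(xs, cdf):
    """Kolmogorov-Smirnov distance between the empirical cdf of xs and a fitted cdf."""
    x = sorted(xs)
    n = len(x)
    d = 0.0
    for i, xi in enumerate(x, start=1):
        Fi = cdf(xi)
        d = max(d, i / n - Fi, Fi - (i - 1) / n)
    return d

def aic(neg_loglik, k):
    """Akaike information criterion for a model with k parameters."""
    return 2 * neg_loglik + 2 * k

def caic(neg_loglik, k, n):
    """Consistent AIC (assumed definition): 2 Neg.L + k (ln n + 1)."""
    return 2 * neg_loglik + k * (math.log(n) + 1)

# Toy check of the K-S statistic against the uniform cdf on [0, 1].
print(ks_statistic([0.25, 0.5, 0.75], lambda u: u))

# Extra cAIC penalty paid by a two-parameter model over a one-parameter model when n = 101.
print(caic(0.0, 2, 101) - caic(0.0, 1, 101))
```

Under these definitions, with n = 101 observations the TXG model is preferred by cAIC only if it lowers Neg.L by more than $(\ln 101 + 1)/2$ relative to the XGamma model, and by AIC only if it lowers Neg.L by more than 1.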

Conclusion
In this paper, we have introduced a new two-parameter probability distribution, called the TXG distribution, by extending the XGamma distribution with the QRTM method. Important statistical properties of the TXG distribution, such as the hazard rate and survival functions, moments, moment-generating function, skewness, kurtosis, distributions of its order statistics, and Lorenz and Bonferroni curves, have been obtained in closed form. This feature facilitates the application of the distribution by practitioners from different fields of science. In addition to obtaining the ML and LS estimators of the unknown model parameters $\theta$ and $\lambda$, we have compared their estimation performances in a numerical simulation study for small, moderate and large sample sizes. From the results of the numerical study, we conclude that the performances of both estimators are satisfactory and that their bias and MSE values decrease as the sample size increases, which suggests that both estimators are asymptotically unbiased and consistent. Figure 1 illustrates the pdf and hazard rate behaviors of the distribution and shows that the TXG distribution is a more flexible model than the XGamma distribution in terms of its ability to model various data types. This conclusion is supported by the application to the Kevlar 49/epoxy data set. According to the model selection criteria K-S, AIC, cAIC, and Neg.L, the TXG distribution provides a better fit to the data than the XGamma distribution. In addition, while the fitted hazard rate function of the XGamma distribution is not consistent with the TTT plot of the data, the fitted hazard rate function of the TXG distribution closely matches the behavior of the TTT plot. Therefore, it can be concluded that the TXG distribution is capable of modeling more data types than the baseline XGamma distribution.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.