Abstract
The objective of this investigation is to provide a framework for constructing a threefold mixture model and its shifted version using Weibull, lognormal, and gamma distributions. The proposed models are examined by establishing their statistical and reliability indices. Parameter estimation based on maximum likelihood estimation (MLE) and the expectation–maximization (EM) algorithm is proposed. The usefulness of the shifted mixture model is demonstrated by fitting it to an actual data set. Goodness-of-fit tests are used to compare the mixture models on real-life data. Based on statistical testing, it is established that, for a small data set, the shifted mixture model provides the best fit in comparison with the other single and mixed mixture distributions.
Introduction
In recent years, a number of research articles in the reliability literature have investigated new mixture models. Mixed distributions allow appropriate modeling of various complex systems. Mixture distributions can be used to predict statistical and reliability indices, particularly when the hazard rate function has a critical shape. Many researchers have studied two- and threefold mixture models for analyzing reliability indices using real data sets. Jiang and Murthy [1] characterized the twofold Weibull mixture model in terms of the probability density function (PDF) and cumulative distribution function (CDF), plotting the curves by fitting the model to a real data set. Sarhan [2] used survival functions to study the performance of the mixture model and provided results for the reliability indices of the system. Ateya [3] investigated a finite mixture of generalized exponential distributions and estimated the parameter values using the EM algorithm based on maximum likelihood estimation (MLE) for a general form of right-censored failure data. Atienza et al. [4] explored the combination of lognormal distributions to examine the congealment performance of the concerned mixture model. Nedjar and Zeghdoudi [5] discussed a mixture model to study the properties and simulation of the gamma Lindley distribution. Nie et al. [6] considered unbiased estimation of some reliability indices based on a mixture of two exponential distributions with known mixing weights. Kumar et al. [7] used a mixed mixture distribution built from Weibull, lognormal, and gamma distributions to estimate reliability indices using real data sets. Kumar and Jain [8] presented a survival analysis of the COVID-19 vaccine using a mixture distribution.
Maximum likelihood estimation is widely used for parameter estimation because of its desirable statistical properties. However, it is not easy to find a closed-form solution with the MLE method, in particular when the number of variables is large and some data are missing. The expectation–maximization (EM) algorithm can be used for parameter estimation in such cases. Martin and Tatjana [9] considered mixed models for validating left-truncated real-time data using a combination of gamma, lognormal, and Weibull distributions. Kumar et al. [10] proposed a mixture model to investigate reliability indices using a combination of homogeneous and non-homogeneous continuous distributions. For parameter estimation, they implemented the MLE and EM methods.
Mixture distributions and their shifted versions are more flexible than single distributions and can be fitted well in a wide variety of applications, including survival analysis. The proposed shifted model provides an adequate fit of the survival function, with numerous real-life applications in biology and medicine, engineering and technology, genetics and health care, geology and agriculture, business and marketing, economics and social sciences, etc. In some real-time systems, failure may not occur at the start; in such cases, a shifted distribution can be used to represent the survival function of the real-time observations. The objectives of developing the mixed mixture model in this article are twofold. First, we introduce the shifted mixed distribution (SMWLG) using Weibull, gamma, and lognormal distributions. Second, we estimate the parameters of the proposed models by applying the EM method to the mixed distributions. Parameter estimation by the EM method can also be used for models in which some data are missing. The eighteen parameters of the shifted mixed distribution (SMWLG) are estimated in order to study the reliability indices. To demonstrate the usefulness of the underlying shifted mixture model, we use a real data set and examine whether the model is suitable and effective for data modeling and survival analysis.
Preliminaries on Distributions
Here, we describe specific two-parameter Weibull, lognormal, and gamma distributions for developing the mixture models.
Weibull Distribution (WD)
The probability density function (PDF) of WD with shape parameter ‘α’ and scale parameter ‘β’ is
The mean time to failure (MTTF) of WD is
Reliability of WD is given by
Hazard function of WD is
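As an illustration, the Weibull quantities listed above can be sketched in Python under the standard two-parameter forms, which are assumed here rather than taken from the text: \(f(t) = (\alpha/\beta)(t/\beta)^{\alpha-1} e^{-(t/\beta)^{\alpha}}\), \(R(t) = e^{-(t/\beta)^{\alpha}}\), \(h(t) = f(t)/R(t)\), and MTTF \(= \beta\,\Gamma(1 + 1/\alpha)\). The function names and the parameter values in the example are hypothetical.

```python
import math


def weibull_pdf(t, alpha, beta):
    """Density of the two-parameter Weibull (shape alpha, scale beta)."""
    if t <= 0:
        return 0.0
    z = t / beta
    return (alpha / beta) * z ** (alpha - 1.0) * math.exp(-(z ** alpha))


def weibull_reliability(t, alpha, beta):
    """Survival probability R(t) = P(T > t)."""
    if t <= 0:
        return 1.0
    return math.exp(-((t / beta) ** alpha))


def weibull_hazard(t, alpha, beta):
    """Hazard rate h(t) = f(t) / R(t); the exponential factors cancel."""
    if t <= 0:
        return 0.0
    return (alpha / beta) * (t / beta) ** (alpha - 1.0)


def weibull_mttf(alpha, beta):
    """Mean time to failure, beta * Gamma(1 + 1/alpha)."""
    return beta * math.gamma(1.0 + 1.0 / alpha)


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    alpha, beta = 1.5, 100.0
    for t in (10.0, 50.0, 200.0):
        print(t, weibull_pdf(t, alpha, beta),
              weibull_reliability(t, alpha, beta),
              weibull_hazard(t, alpha, beta))
    print("MTTF:", weibull_mttf(alpha, beta))
```

With shape parameter 1 the sketch reduces to the exponential distribution, which gives a quick sanity check: the hazard is then constant and equal to \(1/\beta\).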
Lognormal Distribution (LD)
The PDF and MTTF of two-parameter lognormal distribution are given by
and
Reliability function of LD is
where \(\varphi\) is the CDF of the normal distribution with mean \(\mu\) and standard deviation \(\sigma\).
Hazard function of LD is
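The lognormal quantities can be sketched in the same way. The forms below are the standard two-parameter ones and are an assumption, not a transcription of the article's equations: \(f(t) = \frac{1}{t\sigma\sqrt{2\pi}} \exp\left(-\frac{(\ln t - \mu)^{2}}{2\sigma^{2}}\right)\), MTTF \(= e^{\mu + \sigma^{2}/2}\), \(R(t) = 1 - \Phi\left((\ln t - \mu)/\sigma\right)\) with \(\Phi\) the standard normal CDF, and \(h(t) = f(t)/R(t)\). Function names are hypothetical.

```python
import math


def _standardize(t, mu, sigma):
    """Standardized log-time z = (ln t - mu) / sigma."""
    return (math.log(t) - mu) / sigma


def lognormal_pdf(t, mu, sigma):
    """Density of the two-parameter lognormal distribution."""
    if t <= 0:
        return 0.0
    z = _standardize(t, mu, sigma)
    return math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))


def lognormal_reliability(t, mu, sigma):
    """R(t) = 1 - Phi(z), written with erfc for accuracy in the upper tail."""
    if t <= 0:
        return 1.0
    return 0.5 * math.erfc(_standardize(t, mu, sigma) / math.sqrt(2.0))


def lognormal_hazard(t, mu, sigma):
    """Hazard rate h(t) = f(t) / R(t)."""
    if t <= 0:
        return 0.0
    return lognormal_pdf(t, mu, sigma) / lognormal_reliability(t, mu, sigma)


def lognormal_mttf(mu, sigma):
    """Mean time to failure, exp(mu + sigma^2 / 2)."""
    return math.exp(mu + 0.5 * sigma ** 2)


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    mu, sigma = 4.0, 1.0
    for t in (10.0, 50.0, 200.0):
        print(t, lognormal_pdf(t, mu, sigma),
              lognormal_reliability(t, mu, sigma),
              lognormal_hazard(t, mu, sigma))
    print("MTTF:", lognormal_mttf(mu, sigma))
```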
Gamma Distribution (GD)
The PDF and MTTF of the two-parameter gamma distribution with shape parameter ‘a’ and scale parameter ‘b’ are defined as follows:
and
Reliability function of GD is
Hazard function of GD is
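A corresponding sketch for the gamma distribution follows, again under assumed standard forms: \(f(t) = \frac{t^{a-1} e^{-t/b}}{b^{a}\,\Gamma(a)}\), MTTF \(= ab\), \(R(t) = 1 - P(a, t/b)\), where \(P\) is the regularized lower incomplete gamma function, and \(h(t) = f(t)/R(t)\). Since the Python standard library has no incomplete gamma function, the hypothetical helper `regularized_lower_gamma` evaluates it with a power series, which is adequate for moderate values of \(t/b\).

```python
import math


def regularized_lower_gamma(a, x, tol=1e-15, max_terms=100000):
    """P(a, x) = x^a e^{-x} / Gamma(a) * sum_n x^n / (a (a+1) ... (a+n))."""
    if x <= 0:
        return 0.0
    term = 1.0 / a
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < tol * total:
            break
    value = total * math.exp(a * math.log(x) - x - math.lgamma(a))
    return min(1.0, max(0.0, value))


def gamma_pdf(t, a, b):
    """Density of the two-parameter gamma (shape a, scale b), via logs."""
    if t <= 0:
        return 0.0
    return math.exp((a - 1.0) * math.log(t) - t / b
                    - a * math.log(b) - math.lgamma(a))


def gamma_reliability(t, a, b):
    """R(t) = 1 - P(a, t / b)."""
    if t <= 0:
        return 1.0
    return 1.0 - regularized_lower_gamma(a, t / b)


def gamma_hazard(t, a, b):
    """Hazard rate h(t) = f(t) / R(t)."""
    if t <= 0:
        return 0.0
    return gamma_pdf(t, a, b) / gamma_reliability(t, a, b)


def gamma_mttf(a, b):
    """Mean time to failure, a * b."""
    return a * b


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    a, b = 2.0, 40.0
    for t in (10.0, 50.0, 200.0):
        print(t, gamma_pdf(t, a, b), gamma_reliability(t, a, b),
              gamma_hazard(t, a, b))
    print("MTTF:", gamma_mttf(a, b))
```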
Mixture of Weibull, Lognormal, and Gamma Distributions
Now, we consider the three-component mixture of Weibull, lognormal, and gamma distributions and obtain the survival function and reliability indices.
Let \(X_{1}, X_{2},\) and \(X_{3}\) be random variables, and let \(f_{i}(t)\) be the PDF of \(X_{i}\), \(i = 1, 2, 3\). Then the PDF of the threefold mixture model is given by (Jayant and Sudha [11]) \(f(t) = \sum_{i=1}^{3} p_{i} f_{i}(t),\) where the \(p_{i}\) are the mixing weights and \(\sum_{i=1}^{3} p_{i} = 1.\)
To develop the threefold mixture of Weibull, lognormal, and gamma distributions (MWLG), we use the shape and scale parameters \((\alpha_{1}, \beta_{1})\) and \((a_{1}, b_{1})\) for the Weibull and gamma distributions, respectively. Let \(\mu\) and \(\sigma\) be the mean and standard deviation of the normal distribution. Thus, the PDF of MWLG is
The MTTF for MWLG model is given by
The hazard rate of MWLG model is
The cumulative hazard rate function of MWLG model is
Reversed hazard rate function is
Mills ratio of MWLG model is
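The indices listed for the MWLG model can be sketched as follows. The sketch assumes the usual definitions, which are not reproduced in the text above: \(f(t) = \sum p_{i} f_{i}(t)\), \(R(t) = \sum p_{i} R_{i}(t)\), hazard \(h = f/R\), cumulative hazard \(H = -\ln R\), reversed hazard \(f/(1 - R)\), Mills ratio \(R/f\), and MTTF \(= \sum p_{i}\,\text{MTTF}_{i}\). Component formulas are the standard two-parameter ones, and all function names are hypothetical.

```python
import math


def weibull(t, alpha, beta):
    """Return (pdf, reliability) of the Weibull at t > 0."""
    z = (t / beta) ** alpha
    return (alpha / t) * z * math.exp(-z), math.exp(-z)


def lognormal(t, mu, sigma):
    """Return (pdf, reliability) of the lognormal at t > 0."""
    z = (math.log(t) - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))
    return pdf, 0.5 * math.erfc(z / math.sqrt(2.0))


def gamma_dist(t, a, b, tol=1e-15):
    """Return (pdf, reliability) of the gamma at t > 0 (series for the CDF;
    adequate for moderate t / b)."""
    x = t / b
    log_kernel = a * math.log(x) - x - math.lgamma(a)  # log(x^a e^-x / Gamma(a))
    pdf = math.exp(log_kernel) / t
    term = total = 1.0 / a
    n = 1
    while term > tol * total:
        term *= x / (a + n)
        total += term
        n += 1
    cdf = min(1.0, total * math.exp(log_kernel))
    return pdf, 1.0 - cdf


def mwlg_indices(t, p, wb, lg, gm):
    """Reliability indices of the threefold mixture at time t > 0.

    p = (p1, p2, p3) mixing weights, wb = (alpha1, beta1),
    lg = (mu, sigma), gm = (a1, b1).
    """
    parts = [weibull(t, *wb), lognormal(t, *lg), gamma_dist(t, *gm)]
    f = sum(w * pdf for w, (pdf, _) in zip(p, parts))
    r = sum(w * rel for w, (_, rel) in zip(p, parts))
    return {
        "pdf": f,
        "reliability": r,
        "hazard": f / r,
        "cumulative_hazard": -math.log(r),
        "reversed_hazard": f / (1.0 - r),
        "mills_ratio": r / f,
    }


def mwlg_mttf(p, wb, lg, gm):
    """MTTF of the mixture as the weighted sum of component means."""
    means = (wb[1] * math.gamma(1.0 + 1.0 / wb[0]),
             math.exp(lg[0] + 0.5 * lg[1] ** 2),
             gm[0] * gm[1])
    return sum(w * m for w, m in zip(p, means))


if __name__ == "__main__":
    # Hypothetical weights and parameters, chosen only for illustration.
    print(mwlg_indices(50.0, (0.5, 0.3, 0.2), (1.5, 100.0), (4.0, 1.0), (2.0, 40.0)))
    print(mwlg_mttf((0.5, 0.3, 0.2), (1.5, 100.0), (4.0, 1.0), (2.0, 40.0)))
```

Note that the mixture hazard is the ratio of the mixed density to the mixed reliability, not the weighted sum of the component hazards.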
Shifted Mixture Model
Let \(Y_{1}, Y_{2},\) and \(Y_{3}\) be the random variables representing the shifted Weibull, lognormal, and gamma distributions with location parameters \(\tau_{1}, \tau_{2},\) and \(\tau_{3},\) respectively. The shifted mixture of Weibull, lognormal, and gamma (SMWLG) distributions is formed by using \(g_{1}(t), g_{2}(t),\) and \(g_{3}(t)\) as the PDFs of the Weibull, lognormal, and gamma distributions; the respective shifted PDFs are denoted by \(g_{4}(t), g_{5}(t),\) and \(g_{6}(t)\). The shape and scale parameters of the Weibull (gamma) distribution are denoted by \(\alpha_{2}\) \((a_{2})\) and \(\beta_{2}\) \((b_{2})\), respectively. Furthermore, for SMWLG, we denote the mean and standard deviation of the normal distribution by \(\mu_{2}\) and \(\sigma_{2}\), respectively. Thus,
Here the \(q_{i}\) are the mixing weights and \(\sum_{i=1}^{6} q_{i} = 1.\)
Now, we get
MTTF for SMWLG is
The reliability function is
The hazard function is obtained using
The cumulative hazard rate of SMWLG model is
Reversed hazard rate function is
The Mills ratio of SMWLG model is
Remarks
When \(\tau_{1} = \tau_{2} = \tau_{3} = 0\), the shifted mixture model (SMWLG) reduces to the mixture model (MWLG) composed of Weibull, lognormal, and gamma distributions.
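This reduction can be checked numerically with a minimal sketch. It assumes the six-component form \(f(t) = \sum_{i=1}^{6} q_{i} g_{i}(t)\), in which each shifted density is the base density evaluated at \(t - \tau\) and is zero for \(t \le \tau\). The component formulas are the standard two-parameter ones; the function names, the argument layout, and the use of subscript 3 for the shifted-component parameters are assumptions made for illustration.

```python
import math


def weibull_pdf(t, alpha, beta):
    if t <= 0:
        return 0.0
    z = (t / beta) ** alpha
    return (alpha / t) * z * math.exp(-z)


def lognormal_pdf(t, mu, sigma):
    if t <= 0:
        return 0.0
    z = (math.log(t) - mu) / sigma
    return math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))


def gamma_pdf(t, a, b):
    if t <= 0:
        return 0.0
    return math.exp((a - 1.0) * math.log(t) - t / b
                    - a * math.log(b) - math.lgamma(a))


def smwlg_pdf(t, q, base_params, shifted_params, taus):
    """Density of the assumed six-component shifted mixture.

    q              : (q1, ..., q6) mixing weights summing to 1
    base_params    : ((alpha2, beta2), (mu2, sigma2), (a2, b2)) for g1, g2, g3
    shifted_params : ((alpha3, beta3), (mu3, sigma3), (a3, b3)) for g4, g5, g6
    taus           : (tau1, tau2, tau3) location parameters
    """
    pdfs = (weibull_pdf, lognormal_pdf, gamma_pdf)
    total = 0.0
    for i, pdf in enumerate(pdfs):
        total += q[i] * pdf(t, *base_params[i])                    # g1, g2, g3
        total += q[i + 3] * pdf(t - taus[i], *shifted_params[i])   # g4, g5, g6
    return total


if __name__ == "__main__":
    # Hypothetical weights and parameters, chosen only for illustration.
    params = ((1.5, 100.0), (4.0, 1.0), (2.0, 40.0))
    q = (0.2, 0.1, 0.1, 0.3, 0.2, 0.1)
    shifted = smwlg_pdf(50.0, q, params, params, (10.0, 5.0, 20.0))
    unshifted = smwlg_pdf(50.0, q, params, params, (0.0, 0.0, 0.0))
    print(shifted, unshifted)
```

When the shifted components share the parameters of the unshifted ones and all \(\tau_{i} = 0\), the six-component density collapses to a threefold mixture with weights \(q_{i} + q_{i+3}\).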
Parameter Estimation
For parameter estimation, we use the EM method (cf. Zhang and Barnes [12]) based on a random sample of size m. The eighteen parameters \(\boldsymbol{\theta}\) of SMWLG given in Eq. (21) are estimated as follows:
The likelihood function corresponding to Eq. (21) is given by
The MLE equation yields
where \(\boldsymbol{\theta}_{0} = (q_{1}, q_{2}, q_{3}, q_{4}, q_{5}, q_{6})\), \(\boldsymbol{\theta}_{1} = (\alpha_{2}, \beta_{2})\), \(\boldsymbol{\theta}_{2} = (\mu_{2}, \sigma_{2})\), \(\boldsymbol{\theta}_{3} = (a_{2}, b_{2})\), \(\boldsymbol{\theta}_{4} = (\alpha_{3}, \beta_{3})\), \(\boldsymbol{\theta}_{5} = (\mu_{3}, \sigma_{3})\), and \(\boldsymbol{\theta}_{6} = (a_{3}, b_{3})\).
Let \(t_{j}\), \(j = 1, 2, \ldots, m\), denote the m observed (incomplete) data values, and let \(y_{1}, y_{2}, \ldots, y_{6}\) be the missing data, where \(y_{rj} = y_{r}(t_{j}) = 1\) if the jth data value belongs to the \(r\)th component and 0 otherwise, for \(r = 1, 2, \ldots, 6\) and \(j = 1, 2, \ldots, m\). We now apply EM to the shifted mixture model to find the missing values \(y_{r}\).
Here, \(\boldsymbol{y}_{j} = (y_{1j}, y_{2j}, y_{3j}, y_{4j}, y_{5j}, y_{6j})\) is evaluated using the conditional expectation \(E(y_{rj} \mid t_{j})\). Now,
The conditional expectations \(E(y_{rj} \mid t_{j})\) are evaluated in the E-step and used in the M-step to estimate the parameters \(\boldsymbol{\theta}\). For the SMWLG model, the log-likelihood function \(\log L(\boldsymbol{\theta})\) is defined following Zhang and Gui [13].
Partially differentiating Eq. (31) with respect to all parameters, we get
where \(\delta_{2} = \frac{\sum_{j=1}^{m} \tilde{y}_{5j}\, t_{j}}{\sum_{j=1}^{m} \tilde{y}_{5j}}\) and \(\delta_{3} = \frac{\sum_{j=1}^{m} \tilde{y}_{6j} \left( t_{j} - \tau_{3} \right)}{\sum_{j=1}^{m} \tilde{y}_{6j}}\). Here the prime (\(\prime\)) denotes the derivative with respect to the parameters \(a_{2}\) and \(a_{3}\) in Eqs. (43) and (44), respectively.
To implement the EM algorithm for the proposed mixed model SMWLG, we start with an initial value of \(\boldsymbol{\theta}\) for the \(\log L(\boldsymbol{\theta})\) used in (31). Here, \(\log L(\boldsymbol{\theta})\) is a linear function of the unobserved data y for the concerned problem. We then calculate the parameter values using the E-step and M-step of the EM method. The M-step consists of substituting the values \(\tilde{y}_{rj}\) in Eqs. (32)–(44); the parameter values are then re-evaluated through the E-step and M-step, and the iteration continues until convergence is achieved. The parameter estimates and testing indices obtained using EM are given in Table 1. Goodness of fit can be assessed using statistical methods. We compute Akaike’s information criterion (AIC), the consistent Akaike’s information criterion (CAIC), and the Bayesian information criterion (BIC) to find the best-fitting model for the survival data. The least value in the CAIC column of Table 1 is displayed in bold letters.
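The E-step/M-step cycle can be sketched in a deliberately reduced form. The sketch below is not the full procedure of Eqs. (32)–(44): it updates only the mixing weights and keeps the component densities fixed, using the standard mixture results \(\tilde{y}_{rj} = q_{r} g_{r}(t_{j}) / \sum_{s} q_{s} g_{s}(t_{j})\) for the E-step and \(q_{r} = \frac{1}{m}\sum_{j} \tilde{y}_{rj}\) for the weight update. The function names, the choice of component densities, and their parameter values are hypothetical; the failure times are those used in the numerical section of this article.

```python
import math


def weibull_pdf(t, alpha, beta):
    """Standard two-parameter Weibull density (assumed form)."""
    if t <= 0:
        return 0.0
    z = (t / beta) ** alpha
    return (alpha / t) * z * math.exp(-z)


def e_step(times, weights, densities):
    """Responsibilities y[j][r] = q_r g_r(t_j) / sum_s q_s g_s(t_j)."""
    resp = []
    for t in times:
        joint = [q * g(t) for q, g in zip(weights, densities)]
        total = sum(joint)
        resp.append([v / total for v in joint])
    return resp


def m_step_weights(resp):
    """Closed-form weight update: average responsibility per component."""
    m = len(resp)
    return [sum(row[r] for row in resp) / m for r in range(len(resp[0]))]


def log_likelihood(times, weights, densities):
    return sum(math.log(sum(q * g(t) for q, g in zip(weights, densities)))
               for t in times)


def em_weights(times, weights, densities, tol=1e-8, max_iter=500):
    """Iterate E-step and weight-only M-step until the log-likelihood settles."""
    ll = log_likelihood(times, weights, densities)
    for _ in range(max_iter):
        weights = m_step_weights(e_step(times, weights, densities))
        new_ll = log_likelihood(times, weights, densities)
        converged = abs(new_ll - ll) < tol
        ll = new_ll
        if converged:
            break
    return weights, ll


if __name__ == "__main__":
    times = [3, 9, 20, 25, 41, 50, 69, 91, 128, 151, 182, 227]
    # Hypothetical fixed components: two unshifted Weibulls and one shifted by tau = 2.
    densities = [
        lambda t: weibull_pdf(t, 1.0, 60.0),
        lambda t: weibull_pdf(t, 2.0, 150.0),
        lambda t: weibull_pdf(t - 2.0, 1.5, 100.0),
    ]
    print(em_weights(times, [1 / 3, 1 / 3, 1 / 3], densities))
```

At least one component should have positive density at every observation; otherwise the normalizing sum in the E-step can be zero. A full implementation would also update the component parameters in the M-step, which for the Weibull and gamma components generally requires numerical root finding.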
In the present study, the AIC, BIC, and CAIC criteria are written in the form \(-2\log L + UV\), where L is the likelihood function, V is the number of parameters, and n is the sample size. U equals 2 for AIC and \(\log(n)\) for BIC. For small n, the corrected version of AIC is \(\text{CAIC} = \text{AIC} + 2V(V + 1)/(n - V - 1)\). For choosing the best-fitting mixed distribution among the competing mixture models, the CAIC test is more appropriate when the sample size is small, and the best fit has the minimum CAIC value [14].
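The three criteria as written above translate directly into code. The following minimal sketch uses hypothetical function names and a made-up log-likelihood value in the example; it is not tied to the values reported in Table 1.

```python
import math


def aic(log_l, v):
    """AIC = -2 log L + 2 V."""
    return -2.0 * log_l + 2.0 * v


def bic(log_l, v, n):
    """BIC = -2 log L + V log(n)."""
    return -2.0 * log_l + v * math.log(n)


def caic(log_l, v, n):
    """CAIC = AIC + 2 V (V + 1) / (n - V - 1), as written in the text."""
    denom = n - v - 1
    if denom == 0:
        raise ValueError("correction term undefined when n = V + 1")
    return aic(log_l, v) + 2.0 * v * (v + 1) / denom


if __name__ == "__main__":
    # Made-up log-likelihood for a 3-parameter model fitted to 12 observations.
    log_l, v, n = -50.0, 3, 12
    print(aic(log_l, v), bic(log_l, v, n), caic(log_l, v, n))
```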
Numerical Simulation
To validate the proposed mixture models, we consider a real data set and perform statistical tests for goodness of fit. We consider the times \(t_{i}\) of the ith failure (i = 1, 2, 3, …, 12) of a system up to the 12th failure, as given by Rigdon and Basu [15]: \(t_{i}\) = 3, 9, 20, 25, 41, 50, 69, 91, 128, 151, 182, 227.
Based on the combination of different component distributions, we consider the SMWLG for estimating the parameters on the real data set. The values of the goodness-of-fit indicators AIC, CAIC, and BIC are computed for all the models. The corresponding parameter estimates and statistical indices for the data set are presented in Table 1. For a small sample, the lowest value of CAIC indicates that the respective mixture model can be chosen as the best model in comparison with the other mixture models. For the given data, we notice that the mixture model SMWLG performs better than the mixture model MWLG in terms of goodness of fit, because the CAIC indicator is lowest for this model.
For the different models, the trends of the PDF, CDF, R(t), and h(t) are shown in Fig. 1a–d. Figures 2a and b display the reversed hazard rate function and the Mills ratio (MR), respectively.
Concluding Remarks
This article has presented a comparative study of the threefold mixture model and the corresponding shifted mixture model, built from a combination of Weibull, lognormal, and gamma distributions. To validate the proposed model on real-time data, we have performed parameter estimation using the MLE and EM methods. The goodness-of-fit tests applied to the threefold mixture model and the respective threefold shifted model facilitate the choice of the best mixed model. The data analysis shows that the shifted mixture distribution, in comparison with classical mixed statistical distributions, can provide a meaningful fit to the survival function in lifetime data analysis. The results show that the survival mixture model is flexible, retains the features of pure classical survival models, and offers a better option for modeling heterogeneous survival data.
Based on the numerical experiment, we conclude that the shifted mixture model is the best one in terms of CAIC value. The investigation can be further extended to three-parameter distributions or to models using memory-based properties. The usefulness of the underlying shifted mixture model, which involves location parameters, lies in the fact that it outperforms simple single and mixture models for survival analysis. Three-parameter finite mixture distributions (FMDs) can be developed in the future owing to their rich flexibility, although parameter estimation may become tedious for them.
References
[1] Jiang R, Murthy DNP (1995) Modeling failure data by mixture of two Weibull distributions: a graphical approach. IEEE Trans Reliab 44(5):477–487
[2] Sarhan AM (2005) Reliability equivalence factors of a parallel system. Reliab Eng Syst Saf 87:405–411
[3] Ateya SF (2014) Maximum likelihood estimation under a finite mixture of generalized exponential distributions based on censored data. Stat Pap 55:311–325
[4] Atienza N, Garcia J, Munoz-Pichardo J, Villa R (2014) Applications of mixture distributions in modulation of length of hospital stay. Stat Med 27(9):1403–1420
[5] Nedjar S, Zeghdoudi H (2016) On gamma Lindley distribution: properties and simulations. J Comput Appl Math 298:167–174
[6] Nie K, Sinha KB, Hedayat AS (2017) Unbiased estimation of reliability function from a mixture of two exponential distributions based on a single observation. Stat Probab Lett 127:7–13
[7] Kumar S, Jain M, Gangopadhay A (2021) A mixture model of the mixed distributions to analyze some reliability indices and survival data. In: Proceedings - 26th international conference on reliability and quality in design (RQD), international society of science and applied technologies (ISSAT), Florida USA, pp 245–250
[8] Kumar S, Jain M (2022) A mixture of hybrid type distributions for the survival analysis of COVID-19 vaccine. Glob J Model Intell Comput 2(1):11–18
[9] Martin B, Tatjana M (2019) On modeling left-truncated loss data using mixture of distributions. Math Econ 85:35–46
[10] Kumar S, Jain M, Gangopadhay A (2022) Reliability indices using mixture distribution homogeneous and non-homogeneous continuous distribution. J Life Cycle Reliab Saf Eng 11:302–312
[11] Jayant VD, Sudha GP (2005) Life time data statistical models and methods. Ser Qual Reliab Eng Stat 11:71–77
[12] Zhang X, Barnes S (2019) Lognormal based mixture models for robust fitting of hospital length of stay distributions. Oper Res Health Care 22:100–184
[13] Zhang Z, Gui W (2019) Statistical inference of reliability of generalized Rayleigh distribution under progressively type. J Comput Appl Math 361:295–312
[14] Nohut S (2020) Three-parameter (3P) Weibull distribution for characterization of strength of ceramics showing R-Curve behavior. Ceram Int 47:2270–2279
[15] Rigdon SE, Basu AP (2000) Statistical methods for the reliability of repairable systems. Wiley, Hoboken
Ethics declarations
Conflict of interests
The authors declare that they have no conflict of interest.
Cite this article
Kumar, S., Jain, M. Shifted Mixture Model Using Weibull, Lognormal, and Gamma Distributions. Natl. Acad. Sci. Lett. 46, 539–545 (2023). https://doi.org/10.1007/s40009-023-01287-y