A new generalization of generalized half-normal distribution: properties and regression models
Abstract
In this paper, a new extension of the generalized half-normal distribution is introduced and studied. We assess the performance of the maximum likelihood estimators of the parameters of the new distribution via a simulation study. The flexibility of the new model is illustrated by means of four real data sets. A new log-location regression model based on the new distribution is also introduced and studied. It is shown that the new log-location regression model can be useful in the analysis of survival data and provides more realistic fits than other competing regression models.
Keywords
Regression, Residuals, Simulation
Abbreviations
ALs: Average lengths
BGHN: Beta generalized half-normal
BGHNG: Beta generalized half-normal geometric
CPs: Coverage probabilities
GHN: Generalized half-normal
HN: Half-normal
KwGHN: Kumaraswamy generalized half-normal
LZBOLLGHN: Log-Zografos-Balakrishnan odd log-logistic generalized half-normal
MLEs: Maximum likelihood estimates
MSEs: Mean squared errors
OLLGHN: Odd log-logistic generalized half-normal
ZBOLLG: Zografos-Balakrishnan odd log-logistic-G
ZBOLLGHN: Zografos-Balakrishnan odd log-logistic generalized half-normal
AMS 2010 Subject Classification
60E05 62J05
Introduction
The generalized half-normal (GHN) distribution has been widely studied in recent years, and various authors have developed generalizations of it. Following an idea due to Eugene et al. (2002), Pescim et al. (2010b) introduced the beta generalized half-normal (BGHN) distribution with applications to myelogenous leukemia data. Cordeiro et al. (2012) defined the Kumaraswamy generalized half-normal (KwGHN) distribution for censored data. More recently, Cordeiro et al. (2013) studied some of the mathematical properties of the BGHN distribution proposed by Pescim et al. (2010b). Pescim et al. (2013) proposed the log-linear regression model based on the BGHN distribution, while Ramires et al. (2013) defined the beta generalized half-normal geometric (BGHNG) distribution in order to achieve wider diversity among the density and failure rate functions.
Table 1 Submodels of the ZBOLLGHN distribution
Distribution  β  α  λ  θ  Author 

GammaGHN  β  1  λ  θ  New 
GammaHN  β  1  1  θ  New 
OLLGHN  1  α  λ  θ  
OLLHN  1  α  1  θ  
GHN  1  1  λ  θ  
HN  1  1  1  θ 
where π_{w+1}(x;λ,θ)=(w+1)g(x;λ,θ)[G(x;λ,θ)]^{w} denotes the pdf of the expGHN distribution with the power parameter w+1. For the definitions of p_{j,k} and a_{w}(β,α,i,k), see Cordeiro et al. (2016a). Equation (7) reveals that the density function of X is a linear combination of expGHN densities. Thus, some of the structural properties of the ZBOLLGHN distribution, such as the ordinary and incomplete moments and the generating function, can be obtained from well-established properties of the expGHN distribution.
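The expGHN building block above can be made concrete with a short sketch. It assumes the GHN parameterization of Cooray and Ananda (2008), with cdf G(x;λ,θ)=2Φ[(x/θ)^{λ}]−1 and pdf g(x;λ,θ)=√(2/π)(λ/x)(x/θ)^{λ}exp[−(x/θ)^{2λ}/2]. The function names and the numerical check are illustrative only and are not taken from the paper.

```python
import math

def ghn_cdf(x, lam, theta):
    """GHN cdf: 2*Phi((x/theta)**lam) - 1, written with erf."""
    return math.erf((x / theta) ** lam / math.sqrt(2.0))

def ghn_pdf(x, lam, theta):
    """GHN pdf under the assumed Cooray-Ananda parameterization."""
    u = (x / theta) ** lam
    return math.sqrt(2.0 / math.pi) * (lam / x) * u * math.exp(-0.5 * u * u)

def exp_ghn_pdf(x, w, lam, theta):
    """expGHN pdf with power parameter w + 1: (w+1) * g(x) * G(x)**w."""
    return (w + 1) * ghn_pdf(x, lam, theta) * ghn_cdf(x, lam, theta) ** w

def midpoint_integral(f, a, b, steps=20000):
    """Plain midpoint rule, used only as a sanity check."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

# Hypothetical parameter values: the density should integrate to about 1.
total = midpoint_integral(lambda x: exp_ghn_pdf(x, 3, 2.0, 1.5), 0.0, 6.0)
print(round(total, 4))
```

Setting w=0 recovers the GHN density, and setting λ=1 in the GHN density gives the half-normal (HN) row of the submodel table.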
We are motivated to introduce the ZBOLLGHN distribution for several reasons. It contains a number of the aforementioned known lifetime models as special cases, as illustrated in Table 1. It exhibits increasing, decreasing, upside-down bathtub as well as bathtub-shaped hazard rates, as illustrated in Fig. 2. It is shown that the new distribution can be viewed as a mixture of the two-parameter GHN model. It can also be viewed as a suitable model for fitting left-skewed, right-skewed, symmetric and bimodal data. The ZBOLLGHN distribution outperforms several well-known lifetime distributions in four real data applications, as illustrated in the “Applications” section. The new log-location regression model based on the ZBOLLGHN distribution provides better fits than the log BGHN, log GHN and log-Weibull models for the voltage data set. Based on the residual analysis (martingale and modified deviance residuals) for the new log-location regression model (log ZBOLLGHN), none of the observed values appears to be an outlier, which suggests that the fitted model is appropriate for the voltage data set.
The rest of the paper is organized as follows. In the “Estimation” section, the maximum likelihood method is used to estimate the model parameters. The finite-sample performance of the maximum likelihood estimators of the model parameters is investigated by means of a Monte Carlo simulation study. A new log-location regression model and the corresponding residual analysis are presented in the “A new log-location regression model” section. Four applications to real data sets illustrate empirically the importance of the new model in the “Applications” section. Finally, a summary is provided in the “Summary” section.
Estimation

(i) The optim function of the R software is used to minimize the minus log-likelihood function of the GHN model by means of the Nelder-Mead (NM) optimization method. The NM method does not require derivatives of the objective function.

(ii) The estimated parameters of the GHN distribution are used as initial values for λ and θ in the ZBOLLGHN model. The initial values of the additional parameters α and β are set to 1, since the ZBOLLGHN model reduces to the GHN model when α=β=1. The parameter estimates of the ZBOLLGHN model are then obtained with the optim function as in step (i).

(iii) The inverse of the estimated Hessian matrix is used to obtain the corresponding standard errors.
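The first two steps above can be sketched in Python. This is a sketch under several loudly stated assumptions, not the authors' implementation. The ZBOLLGHN log-density is written from the assumed ZBOLL-G construction f = α g (G Ḡ)^{α−1} t^{β−1} / [Γ(β) D²], with D = G^{α}+Ḡ^{α} and t = −log(Ḡ^{α}/D), and should be checked against Cordeiro et al. (2016a). The hand-written simplified Nelder-Mead routine stands in for R's optim. Optimizing over log-parameters to keep them positive is a choice made here, not one stated in the source. The data are simulated.

```python
import math
import random

def ghn_logpdf(x, lam, theta):
    u = (x / theta) ** lam
    return 0.5 * math.log(2.0 / math.pi) + math.log(lam / x) + lam * math.log(x / theta) - 0.5 * u * u

def zbollghn_logpdf(x, alpha, beta, lam, theta):
    # ASSUMED density form (see lead-in); reduces to the GHN density at alpha = beta = 1.
    s = (x / theta) ** lam / math.sqrt(2.0)
    G, Gbar = math.erf(s), math.erfc(s)
    t = math.log1p((G / Gbar) ** alpha)       # t = log(D / Gbar**alpha)
    log_d = alpha * math.log(Gbar) + t        # log D
    return (math.log(alpha) + ghn_logpdf(x, lam, theta)
            + (alpha - 1.0) * (math.log(G) + math.log(Gbar))
            - math.lgamma(beta) - 2.0 * log_d + (beta - 1.0) * math.log(t))

def make_nll(logpdf, data):
    """Minus log-likelihood as a function of log-parameters."""
    def nll(p):
        try:
            val = -sum(logpdf(x, *[math.exp(v) for v in p]) for x in data)
        except (ValueError, OverflowError, ZeroDivisionError):
            return math.inf                   # treat numerical failures as very bad points
        return val if val == val else math.inf
    return nll

def nelder_mead(f, x0, step=0.5, max_iter=1000, tol=1e-10):
    """Simplified Nelder-Mead: reflection, expansion, inside contraction, shrink."""
    n = len(x0)
    simplex = [list(x0)] + [[x0[j] + (step if j == i else 0.0) for j in range(n)] for i in range(n)]
    values = [f(p) for p in simplex]
    for _ in range(max_iter):
        order = sorted(range(n + 1), key=lambda i: values[i])
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        if abs(values[-1] - values[0]) < tol:
            break
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]
        def along(c):                         # centroid + c * (centroid - worst)
            return [centroid[j] + c * (centroid[j] - simplex[-1][j]) for j in range(n)]
        xr = along(1.0)
        fr = f(xr)
        if fr < values[0]:                    # try to expand
            xe = along(2.0)
            fe = f(xe)
            simplex[-1], values[-1] = (xe, fe) if fe < fr else (xr, fr)
        elif fr < values[-2]:                 # accept reflection
            simplex[-1], values[-1] = xr, fr
        else:
            xc = along(-0.5)                  # inside contraction
            fc = f(xc)
            if fc < values[-1]:
                simplex[-1], values[-1] = xc, fc
            else:                             # shrink towards the best vertex
                for i in range(1, n + 1):
                    simplex[i] = [simplex[0][j] + 0.5 * (simplex[i][j] - simplex[0][j]) for j in range(n)]
                    values[i] = f(simplex[i])
    best = min(range(n + 1), key=lambda i: values[i])
    return simplex[best], values[best]

# Simulated GHN(lam=2, theta=3) sample via X = theta * |Z|**(1/lam); values are hypothetical.
rng = random.Random(1)
data = [3.0 * abs(rng.gauss(0.0, 1.0)) ** 0.5 for _ in range(100)]

# Step (i): fit the GHN model.
nll_ghn = make_nll(ghn_logpdf, data)
p_ghn, val_ghn = nelder_mead(nll_ghn, [0.0, math.log(sum(data) / len(data))])
lam_hat, theta_hat = math.exp(p_ghn[0]), math.exp(p_ghn[1])

# Step (ii): start the ZBOLLGHN fit at alpha = beta = 1 and the GHN estimates.
nll_full = make_nll(zbollghn_logpdf, data)
start_full = [0.0, 0.0, p_ghn[0], p_ghn[1]]
p_full, val_full = nelder_mead(nll_full, start_full)
```

Because the start vector of step (ii) reproduces the GHN fit, the final minus log-likelihood of the larger model cannot exceed that of the GHN fit. Step (iii) would additionally need a numerical Hessian, which this sketch omits.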
Simulation study
 ✓ The estimated biases decrease when the sample size n increases,

✓ The estimated MSEs decay toward zero as n increases,

✓ The CPs are near 0.95 and approach the nominal value when the sample size n increases,

✓ The ALs decrease for all parameters when the sample size n increases.
These results are in agreement with the consistency of the MLEs.
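The four quantities reported above (bias, MSE, CP and AL) can be illustrated with a small Monte Carlo sketch. To stay self-contained it uses the half-normal submodel (α=β=λ=1), for which the MLE of θ has the closed form θ̂=(Σx_i²/n)^{1/2} and the observed information gives the standard error θ̂/√(2n). This is a simplification chosen here, not the paper's simulation design. The sample sizes, the true θ, the number of replications and the 1.96 normal quantile for nominal 95% intervals are all assumptions.

```python
import math
import random

def simulate(n, theta=2.0, reps=500, seed=42):
    """Monte Carlo bias, MSE, coverage probability and average CI length for the HN MLE."""
    rng = random.Random(seed)
    estimates, covered, lengths = [], 0, []
    for _ in range(reps):
        x = [theta * abs(rng.gauss(0.0, 1.0)) for _ in range(n)]   # HN(theta) sample
        theta_hat = math.sqrt(sum(v * v for v in x) / n)           # closed-form MLE
        se = theta_hat / math.sqrt(2.0 * n)                        # from observed information
        lo, hi = theta_hat - 1.96 * se, theta_hat + 1.96 * se
        estimates.append(theta_hat)
        covered += lo <= theta <= hi
        lengths.append(hi - lo)
    return {
        "bias": sum(estimates) / reps - theta,
        "mse": sum((e - theta) ** 2 for e in estimates) / reps,
        "cp": covered / reps,
        "al": sum(lengths) / reps,
    }

small = simulate(20)
large = simulate(200)
print(small)
print(large)
```

With these settings the MSE and AL for n=200 should be clearly smaller than for n=20, and the CP should sit near the nominal 0.95.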
A new loglocation regression model
The log-location regression model can be written as \(y_{i}=\mathbf {v}_{i}^{\intercal }\boldsymbol {\beta }+\sigma z_{i}\), i=1,…,n, (11)
where the random error z_{i} has density function (10), \(\boldsymbol {\beta }=(\beta _{1},\ldots,\beta _{p})^{\intercal }\), and σ>0, α>0 and β>0 are unknown parameters. The parameter \(\mu _{i}=\mathbf {v}_{i}^{\intercal } \boldsymbol {\beta }\) is the location of y_{i}. The location parameter vector \(\boldsymbol {\mu }=(\mu _{1},\ldots,\mu _{n})^{\intercal }\) is represented by a linear model μ=Vβ, where \(\mathbf {V}=(\mathbf {v}_{1},\ldots,\mathbf {v}_{n})^{\intercal }\) is a known model matrix.
The LZBOLLGHN model (11) provides new opportunities for modeling several types of data sets. This model contains two important regression models as submodels: (i) for β=1, the LZBOLLGHN model reduces to the log-OLLGHN regression model introduced by Pescim et al. (2017); (ii) for α=β=1, the LZBOLLGHN model reduces to the log-GHN regression model.
where \({u_{i}}=2\Phi [\exp (z_{i}\sqrt {2}/2)]\), z_{i}=(y_{i}−μ_{i})/σ, and r is the number of uncensored observations (failures). The MLE \(\widehat {\Theta }\) of the vector of unknown parameters can be obtained by maximizing the log-likelihood (12). The R software is used to estimate the unknown parameters of the LZBOLLGHN regression model.
The likelihood ratio (LR) statistic can be used for comparing submodels of the LZBOLLGHN regression model. For example, the LR statistic can be used to discriminate between the LZBOLLGHN and LGHN regression models since they are nested, or equivalently to test H_{0}:α=β=1. The LR statistic reduces to \(w=2\left [\ell (\hat {\alpha },\hat {\beta },\hat {\sigma },\boldsymbol {\hat {\beta }})-\ell (1,1,\tilde {\sigma },\boldsymbol {\tilde {\beta }})\right ]\), where \(\left (\hat {\alpha },\hat {\beta },\hat {\sigma },\boldsymbol {\hat {\beta }}\right)\) are the unrestricted MLEs and \((1,1,\tilde {\sigma },\boldsymbol {\tilde {\beta }})\) are the restricted estimates under H_{0}. The statistic w is asymptotically (as n→∞) distributed as \(\chi _{k}^{2}\), where k is the difference between the numbers of parameters of the two nested models. For example, k=2 for the above hypothesis test.
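A minimal sketch of the LR computation follows. It uses the closed-form chi-square tail probabilities for k=1 and k=2, which avoids any external library. The function name is illustrative. The −ℓ values are those reported for the first data set in the "Applications" section, where the same kind of test is applied to the distribution fits.

```python
import math

def lr_test(neg_loglik_restricted, neg_loglik_full, k):
    """Return (w, p-value) for nested models, with w = 2 * (l_full - l_restricted)."""
    w = max(0.0, 2.0 * (neg_loglik_restricted - neg_loglik_full))
    if k == 1:
        p_value = math.erfc(math.sqrt(w / 2.0))   # chi-square(1) upper tail
    elif k == 2:
        p_value = math.exp(-w / 2.0)              # chi-square(2) upper tail
    else:
        raise ValueError("this sketch only covers k = 1 or k = 2")
    return w, p_value

# ZBOLLGHN (-l = 73.053) against OLLGHN (-l = 75.522), H0: beta = 1, k = 1.
w1, p1 = lr_test(75.522, 73.053, k=1)
# ZBOLLGHN against GHN (-l = 87.927), H0: alpha = beta = 1, k = 2.
w2, p2 = lr_test(87.927, 73.053, k=2)
print(round(w1, 3), round(p1, 4))
print(round(w2, 3), p2 < 1e-4)
```

Up to rounding of the reported −ℓ values, these agree with the LR statistics and p-values listed for the first data set.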
Residual analysis
Residual analysis plays a critical role in checking the adequacy of the fitted model. In order to detect departures from the error assumption, two types of residuals are considered: martingale and modified deviance residuals.
Martingale residual
where \(u_{i}=2\Phi \left [\exp \left (z_{i}\sqrt {2}/2\right)\right ]\) and z_{i}=(y_{i}−μ_{i})/σ.
Modified deviance residual
where \(\hat r_{M_{i}}\) is the martingale residual.
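The two residuals can be sketched generically. The sketch assumes the standard forms used for censored data (Therneau et al. 1990): the martingale residual r_M = δ + log Ŝ(y), where δ is the censoring indicator and Ŝ(y) is the fitted survival probability, and the modified deviance residual r_D = sign(r_M){−2[r_M + δ log(δ − r_M)]}^{1/2}. The survival probabilities below are hypothetical inputs. In the LZBOLLGHN model they would come from the fitted survival function, which is not reproduced here.

```python
import math

def martingale_residual(delta, surv):
    """r_M = delta + log S(y); delta is 1 for an observed failure, 0 for censoring."""
    return delta + math.log(surv)

def modified_deviance_residual(delta, surv):
    """r_D = sign(r_M) * sqrt(-2 * [r_M + delta * log(delta - r_M)]), for 0 < surv < 1."""
    r_m = martingale_residual(delta, surv)
    inner = r_m + (math.log(delta - r_m) if delta == 1 else 0.0)
    sign = (r_m > 0) - (r_m < 0)
    return sign * math.sqrt(max(0.0, -2.0 * inner))

# Hypothetical (delta, fitted survival probability) pairs.
cases = [(1, 0.9), (1, math.exp(-1.0)), (1, 0.05), (0, 0.5)]
for delta, surv in cases:
    print(delta, round(martingale_residual(delta, surv), 4),
          round(modified_deviance_residual(delta, surv), 4))
```

Martingale residuals are bounded above by 1 and are skewed. The deviance transformation makes them more symmetric around zero, which is why it is the more convenient of the two for spotting outliers.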
Applications
Table 2 MLEs and their SEs (second line) of the fitted models and goodness-of-fit statistics for the first data set. A hyphen (-) marks a parameter that does not appear in the corresponding model.
Models  α  β  λ  θ  −ℓ  AIC  A*  W*  KS  p-value

ZBOLLGHN  0.143  1.360  4.049  8.243  73.053  154.107  0.620  0.098  0.160  0.383
(SE)  0.022  0.177  0.003  0.003
BGHN  0.233  1.327  4.876  13.924  81.868  171.737  2.157  0.348  0.315  0.004
(SE)  0.047  0.439  0.004  0.004
GammaGHN  -  0.238  4.945  13.941  81.851  169.702  2.193  0.354  0.305  0.005
(SE)  -  0.042  0.003  0.003
OLLGHN  0.165  -  4.016  8.488  75.522  157.043  0.750  0.117  0.294  0.008
(SE)  0.032  -  0.141  0.133
GHN  -  -  1.491  10.226  87.927  179.853  3.074  0.521  0.278  0.014
(SE)  -  -  0.255  0.903
Lifetime of device data
The first data set, given by Sylwia (2007), concerns the lifetime of a certain device. Table 2 shows the estimated parameters and their standard errors, together with −ℓ, AIC, A*, W*, KS and the corresponding p-value. The ZBOLLGHN model has the smallest −ℓ, AIC, A*, W* and KS values and the largest KS p-value in Table 2, so it provides the best fit for this data set among the fitted models. Figure 5a displays the estimated pdfs of the fitted models. Figure 5b displays the P-P plot of the ZBOLLGHN distribution and its fitted hrf. Figure 5 shows that the ZBOLLGHN distribution provides a superior fit to this left-skewed data set.
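The AIC column for the first data set can be reproduced from the −ℓ column with the usual definition AIC = 2k + 2(−ℓ), where k is the number of estimated parameters. The sketch below is illustrative only. The −ℓ values are the reported ones, and the parameter counts are read off from the number of estimates listed for each model.

```python
def aic(neg_loglik, n_params):
    """Akaike information criterion from the minus log-likelihood."""
    return 2.0 * n_params + 2.0 * neg_loglik

# (minus log-likelihood, number of estimated parameters) for the first data set.
fits = {
    "ZBOLLGHN": (73.053, 4),
    "BGHN": (81.868, 4),
    "GammaGHN": (81.851, 3),
    "OLLGHN": (75.522, 3),
    "GHN": (87.927, 2),
}
aics = {name: aic(nll, k) for name, (nll, k) in fits.items()}
best = min(aics, key=aics.get)
print(best, {name: round(value, 3) for name, value in aics.items()})
```

The computed values match the reported AIC column up to rounding in the third decimal, and the smallest AIC belongs to the ZBOLLGHN model.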
The LR test results for the first data set
Models  Hypotheses  LR  p-value

ZBOLLGHN versus OLLGHN  H_{0}:β=1  4.936  0.0262
ZBOLLGHN versus GammaGHN  H_{0}:α=1  17.595  <0.0001
ZBOLLGHN versus GHN  H_{0}:α=β=1  29.746  <0.0001
Failure times of windshield data
MLEs and their SEs (second line) of the fitted models and goodness-of-fit statistics for the second data set. A hyphen (-) marks a parameter that does not appear in the corresponding model.
Models  α  β  λ  θ  −ℓ  AIC  A*  W*  KS  p-value

ZBOLLGHN  0.531  0.387  7.807  4.079  126.286  260.573  0.566  0.069  0.066  0.847
(SE)  0.065  0.059  0.003  0.003
BGHN  0.240  1.704  6.370  4.527  128.668  265.336  1.005  0.164  0.091  0.476
(SE)  0.029  0.375  0.057  0.056
GammaGHN  -  0.356  4.438  4.153  128.774  263.548  0.895  0.141  0.085  0.569
(SE)  -  0.472  5.055  0.833
OLLGHN  0.868  -  2.145  3.080  129.197  264.394  0.656  0.089  0.072  0.758
(SE)  0.231  -  0.487  0.137
GHN  -  -  1.917  3.107  129.328  262.656  0.600  0.078  0.067  0.834
(SE)  -  -  0.175  0.135
The LR test results for the second data set
Models  Hypotheses  LR  p-value

ZBOLLGHN versus OLLGHN  H_{0}:β=1  5.822  0.016
ZBOLLGHN versus GammaGHN  H_{0}:α=1  4.976  0.026
ZBOLLGHN versus GHN  H_{0}:α=β=1  6.084  0.047
The profile log-likelihood functions of the ZBOLLGHN distribution were plotted but are not included here. These plots indicate that the likelihood equations of the ZBOLLGHN distribution have solutions that are maximizers.
Strengths of glass fibres data
MLEs and their SEs (second line) of the fitted models and goodness-of-fit statistics for the third data set. A hyphen (-) marks a parameter that does not appear in the corresponding model.
Models  α  β  λ  θ  −ℓ  AIC  A*  W*  KS  p-value

ZBOLLGHN  5.820  0.340  1.723  2.240  11.627  31.254  0.529  0.094  0.115  0.373
(SE)  6.976  0.134  1.902  0.611
BGHN  1.131  0.298  3.592  1.324  14.113  36.227  0.973  0.174  0.137  0.186
(SE)  0.445  0.362  0.529  0.256
GammaGHN  -  1.316  3.670  1.579  14.513  35.026  1.084  0.195  0.144  0.144
(SE)  -  0.545  1.096  0.172
OLLGHN  1.290  -  3.761  1.709  14.163  34.328  1.065  0.192  0.136  0.188
(SE)  0.328  -  0.775  0.048
GHN  -  -  4.414  1.682  14.740  33.481  1.052  0.187  0.145  0.141
(SE)  -  -  0.429  0.036
The LR test results for the third data set
Models  Hypotheses  LR  p-value

ZBOLLGHN versus OLLGHN  H_{0}:β=1  5.0716  0.0243
ZBOLLGHN versus GammaGHN  H_{0}:α=1  5.7716  0.0162
ZBOLLGHN versus GHN  H_{0}:α=β=1  6.2264  0.0444
The profile log-likelihood functions of the ZBOLLGHN distribution are plotted but not included here. These plots indicate that the likelihood equations of the ZBOLLGHN distribution have solutions that are maximizers (Fig. 8).
Voltage data
Lawless (2003) reported an experiment in which specimens of solid epoxy electrical insulation were studied in an accelerated voltage life test. The sample size is n=60, the percentage of censored observations is 10%, and there are three levels of voltage: 52.5, 55.0 and 57.5 kV. The variables involved in the study are: x_{i}, failure times for epoxy insulation specimens (in min); c_{i}, censoring indicator (0 = censored, 1 = lifetime observed); v_{i1}, voltage (kV).
MLEs of the parameters of the LZBOLLGHN, LBGHN, LGHN and log-Weibull regression models fitted to the voltage data, with the corresponding SEs in the second line and p-values in the third line, and the AIC and BIC statistics. A hyphen (-) marks an entry that is not part of the model or not reported.
Model  α  β  σ  β_{0}  β_{1}  AIC  BIC

LZBOLLGHN  41.488  10.857  16.021  21.865  0.177  166.264  176.735
(SE)  49.967  11.306  18.828  11.003  0.063
(p-value)  -  -  -  0.047  0.005
LBGHN  102.140  1.564  5.306  10.632  0.201  167.100  177.500
(SE)  3.989  0.672  0.666  3.304  0.056
(p-value)  -  -  -  0.002  0.001
LGHN  -  -  0.778  23.637  0.301  178.800  185.100
(SE)  -  -  0.089  2.928  0.053
(p-value)  -  -  -  <0.001  <0.001
Log-Weibull  -  -  0.845  22.032  0.275  173.400  179.700
(SE)  -  -  0.090  3.046  0.055
(p-value)  -  -  -  <0.001  <0.001
Residual Analysis of LZBOLLGHN model
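The p-values and the BIC reported for the voltage data can be related to the other entries of the regression table with a short sketch. It assumes that the p-values come from two-sided Wald tests based on the normal approximation, which the text does not state explicitly, and that BIC = AIC + k(log n − 2) with n=60 observations and k estimated parameters. The function names are illustrative.

```python
import math

def wald_p_value(estimate, se):
    """Two-sided normal p-value for H0: parameter = 0, i.e. 2 * (1 - Phi(|estimate / se|))."""
    z = abs(estimate / se)
    return math.erfc(z / math.sqrt(2.0))

def bic_from_aic(aic, n_params, n_obs):
    """BIC = k*log(n) - 2*loglik, rewritten in terms of AIC = 2*k - 2*loglik."""
    return aic + n_params * (math.log(n_obs) - 2.0)

# LZBOLLGHN row: estimates and SEs of the regression coefficients beta_0 and beta_1.
p_beta0 = wald_p_value(21.865, 11.003)
p_beta1 = wald_p_value(0.177, 0.063)
bic = bic_from_aic(166.264, n_params=5, n_obs=60)
print(round(p_beta0, 3), round(p_beta1, 3), round(bic, 3))
```

Under these assumptions the LZBOLLGHN p-values and BIC are recovered to the reported precision. The same calculation gives slightly smaller p-values than those printed for the LBGHN row, so the exact test used there may differ.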
Summary
A new model called the Zografos-Balakrishnan odd log-logistic generalized half-normal distribution is introduced and studied. We assess the performance of the maximum likelihood estimators of the parameters of the new distribution with respect to the sample size n. The assessment is based on a graphical simulation study. The flexibility of the new model is illustrated by means of three real data sets. In these applications the new model fits better than the beta generalized half-normal, gamma generalized half-normal, odd log-logistic generalized half-normal and generalized half-normal models. Additionally, a new log-location regression model based on the new distribution is introduced and studied. The martingale and modified deviance residuals are defined to detect outliers and evaluate the model assumptions. We demonstrate that the new log-location regression model can be very useful in the analysis of real data and provides more realistic fits than other regression models such as the log-beta generalized half-normal, the log-generalized half-normal and the log-Weibull regression models. The potential of the new regression model is illustrated by means of a real data set.
Notes
Acknowledgments
Not applicable.
Funding
GGH (co-author of the manuscript) is an Associate Editor of JSDA and receives a 100% discount on the Article Processing Charge (APC) for accepted articles.
Availability of data and material
The used data sets are given in the manuscript.
Authors’ contributions
EA, HMY and GGH have contributed jointly to all of the sections of the paper. All authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
References
 Aarts, R.M.: Lauricella functions (2000). www.mathworld.com/LauricellaFunctions.html. From MathWorld - A Wolfram Web Resource, created by Eric W. Weisstein.
 Barreto-Souza, W., Santos, A.H., Cordeiro, G.M.: The beta generalized exponential distribution. J. Stat. Comput. Simul. 80, 159–172 (2010).
 Cooray, K., Ananda, M.M.A.: A generalization of the half-normal distribution with applications to lifetime data. Commun. Stat. Theory Methods. 37, 1323–1337 (2008).
 Cordeiro, G.M., Alizadeh, M., Ortega, E.M., Serrano, L.H.V.: The Zografos-Balakrishnan odd log-logistic family of distributions: properties and applications. Hacettepe Res. J. Math. Stat. 45, 1781–1803 (2016a).
 Cordeiro, G.M., Alizadeh, M., Pescim, R.R., Ortega, E.M.M.: The odd log-logistic generalized half-normal lifetime distribution: properties and applications. Commun. Stat. Theory Methods. 46, 4195–4214 (2016b).
 Cordeiro, G.M., Pescim, R.R., Ortega, E.M.M.: The Kumaraswamy generalized half-normal distribution for skewed positive data. J. Data Sci. 10, 195–224 (2012).
 Cordeiro, G.M., Pescim, R.R., Ortega, E.M.M., Demétrio, C.G.B.: The beta generalized half-normal distribution: new properties. J. Probab. Stat. 2013, 1–18 (2013).
 Eugene, N., Lee, C., Famoye, F.: Beta-normal distribution and its applications. Commun. Stat. Theory Methods. 31, 497–512 (2002).
 Exton, H.: Handbook of hypergeometric integrals: theory, applications, tables, computer programs. Halsted Press, New York (1978).
 Fleming, T.R., Harrington, D.P.: Counting process and survival analysis. John Wiley, New York (1994).
 Hamedani, G.G.: On certain generalized gamma convolution distributions II. Technical Report No. 484. Marquette University, MSCS (2013).
 Lawless, J.F.: Statistical models and methods for lifetime data, 2nd edition. Wiley Series in Probability and Statistics. Wiley, Hoboken, NJ, USA (2003).
 Murthy, D.P., Xie, M., Jiang, R.: Weibull models (Vol. 505). Wiley (2004).
 Pescim, R.R., Ortega, E.M., Cordeiro, G.M., Alizadeh, M.: A new log-location regression model: estimation, influence diagnostics and residual analysis. J. Appl. Stat. 44, 233–252 (2017).
 Pescim, R.R., Demetrio, C.G.B., Cordeiro, G.M., Ortega, E.M.M., Urbano, M.R.: The beta generalized half-normal distribution. Comput. Stat. Data Anal. 54, 945–957 (2010b).
 Pescim, R.R., Ortega, E.M.M., Cordeiro, G.M., Demetrio, C.G.B., Hamedani, G.G.: The log-beta generalized half-normal regression model. J. Stat. Theory Appl. 12, 330–347 (2013).
 Ramires, T.G., Ortega, E.M.M., Cordeiro, G.M., Hamedani, G.G.: The beta generalized half-normal geometric distribution. Stud. Sci. Math. Hung. 50, 523–554 (2013).
 Smith, R.L., Naylor, J.C.: A comparison of maximum likelihood and Bayesian estimators for the three-parameter Weibull distribution. Appl. Stat. 36, 358–369 (1987).
 Sylwia, K.B.: Makeham’s generalised distribution. Comput. Methods Sci. Tech. 13, 113–120 (2007).
 Therneau, T.M., Grambsch, P.M., Fleming, T.R.: Martingale-based residuals for survival models. Biometrika. 77, 147–160 (1990).
 Trott, M.: The Mathematica guidebook for symbolics. Springer, New York (2006).
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.