Improved shrinkage estimators in the beta regression model with application in econometric and educational data

Beta regression is a useful tool for modeling a continuous bounded outcome variable with explanatory variables. In the presence of multicollinearity, however, the maximum likelihood estimates of the parameters perform poorly. In this paper, we propose improved shrinkage estimators based on the Liu estimator to obtain more efficient estimates. We define linear shrinkage, pretest, shrinkage pretest, Stein and positive part Stein estimators for the parameters of the beta regression model when some of them have no significant effect on the outcome variable, so that a sub-model may be sufficient. We derive the asymptotic distributional biases and variances, and conduct an extensive Monte Carlo simulation study to assess the performance of the proposed estimation strategy. Our results show a great benefit of the new methodologies for practitioners, specifically in the applied sciences. We conclude the paper with two real data analyses from economics and education.

Introduction
Ferrari and Cribari-Neto (2004) introduced beta regression (BR) to model outcome variables bounded in an interval (a, b) as a function of explanatory variables. The main assumption of the BR model is that the dependent variable follows a beta distribution. Several applications of the BR model have been studied in the literature. Examples include modeling the proportion of income spent on food, the poverty rate, the proportion of crude oil converted to gasoline, and the proportion of surface covered by vegetation (Qasim et al. 2021). The BR model has also been applied to bounded time series data in the analysis of Canada Google Flu Trends (Guolo and Varin 2014). Recently, the BR model has gained attention in machine learning; for example, Espinheira et al. (2019) proposed criteria for variable selection in beta regression models.
Usually, the maximum likelihood estimator (MLE) is used to estimate the unknown regression coefficients in the BR model (Ferrari and Cribari-Neto 2004). However, a multicollinearity problem may arise in the BR model when there are near-linear dependencies between predictor variables. As a remedy, Karlsson et al. (2020) studied a Liu estimator (Liu 1993) approach in the BR model to overcome the multicollinearity problem.
In regression models, when there is prior information about the parameter vector $\beta$ in the form of a linear restriction $H\beta = h$, shrinkage strategies, namely the linear shrinkage (Thompson 1968), pretest (Bancroft 1944), shrinkage pretest (Ahmed 1992), Stein (Stein 1956), and positive Stein (Kibria and Saleh 2004) estimators, are applied to the estimation of the parameters. In this setting, the parameter vector is partitioned as $\beta = (\beta_1^\top, \beta_2^\top)^\top$, where $\beta_1$ is of order $p_1 \times 1$ and contains the active (significant) parameters, and $\beta_2$ is of order $p_2 \times 1$ and contains the inactive parameters that do not contribute significantly to predicting the dependent variable. Note that the number of regression parameters is $p = p_1 + p_2$. There are therefore two models: a full (unrestricted) model including all parameters, estimated by maximum likelihood, and a sub-model (restricted model) containing only the significant parameters. For more details on the methodology of shrinkage estimation, see Ahmed (2014) and Kibria and Saleh (2004).
The main purpose of this paper is to develop effective methods for the BR model in the presence of highly correlated variables, some of which may not have a significant effect on the model, specifically in econometric and education data. Therefore, we propose improved shrinkage estimation strategies that make use of the Liu estimator in the BR model to estimate the parameters in the presence of multicollinearity. We derive the theoretical properties of the proposed estimators and conduct a Monte Carlo simulation experiment to evaluate their performance relative to the usual unrestricted Liu estimator. We observe that the proposed estimators, specifically the Liu Stein estimator, uniformly outperform the usual estimators in both the simulation studies and the real data applications.
The paper is organized as follows. In Sect. 2, we introduce the BR model and the unrestricted Liu estimator, propose a restricted Liu estimator, and then derive the shrinkage Liu estimators. In Sect. 3, the asymptotic distributional bias and variance of the proposed estimators are presented. Asymptotic evaluations of the variance of the proposed estimators are given in Sect. 4. We provide the details of the Monte Carlo simulation experiment comparing the performance of the proposed estimators in Sect. 5. We apply the proposed estimation methods to two real data sets in Sect. 6. Finally, concluding remarks are presented in Sect. 7.

Theory and method
In this section, we briefly introduce the BR model. Then, the unrestricted and restricted Liu estimators and the shrinkage Liu estimators are defined.

Beta regression model
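Let $y = (y_1, y_2, \ldots, y_n)^\top$ be the vector of observations of the response variable, where the $y_i$ independently follow a beta distribution with shape parameters $a$ and $b$, whose probability density function (pdf) is
$$f(y_i; a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\, y_i^{a-1}(1-y_i)^{b-1}, \quad 0 < y_i < 1, \tag{1}$$
where $a > 0$, $b > 0$ and $\Gamma(\cdot)$ is the gamma function; this is denoted by $y_i \sim \mathrm{Beta}(a, b)$. The mean and variance of each $y_i$ are, respectively,
$$E(y_i) = \frac{a}{a+b}, \qquad Var(y_i) = \frac{ab}{(a+b)^2(a+b+1)}.$$
Following Ferrari and Cribari-Neto (2004), we use a different parametrization in order to derive the BR model. Let $\mu = a/(a+b)$ and $\phi = a + b$, where $\phi$ is called the precision parameter. The pdf of $y_i$ can then be written as
$$f(y_i; \mu, \phi) = \frac{\Gamma(\phi)}{\Gamma(\mu\phi)\,\Gamma((1-\mu)\phi)}\, y_i^{\mu\phi-1}(1-y_i)^{(1-\mu)\phi-1}, \quad 0 < y_i < 1, \tag{2}$$
where $0 < \mu < 1$ and $\phi > 0$, so that $y_i \sim \mathrm{Beta}(\mu\phi, (1-\mu)\phi)$. The mean and variance of each observation become, respectively, $E(y_i) = \mu$ and $Var(y_i) = \mu(1-\mu)/(1+\phi)$. The beta regression model is obtained by assuming that the mean $\mu_i$ of $y_i$ satisfies
$$g(\mu_i) = x_i^\top \beta = \eta_i, \tag{3}$$
where $x_i$ is the $i$th observation vector, $X = (x_1, x_2, \ldots, x_n)^\top$ is the design matrix of order $n \times p$ $(n > p)$, and $\beta = (\beta_1, \beta_2, \ldots, \beta_p)^\top$ is the vector of regression parameters. In Equation (3), we assume that the link function $g(\cdot)$ is strictly monotone and twice differentiable from the interval $(0, 1)$ to $\mathbb{R}$. Although alternative link functions are available for the BR model (Ferrari and Cribari-Neto 2004), we use the logit link function $g(\mu) = \log(\mu/(1-\mu))$, so that
$$\mu_i = \frac{\exp(x_i^\top \beta)}{1 + \exp(x_i^\top \beta)} \tag{4}$$
for $i = 1, 2, \ldots, n$. Thus, the log-likelihood function of the BR model given in (3) can be written as
$$\ell(\beta, \phi) = \sum_{i=1}^{n} \Big\{ \log\Gamma(\phi) - \log\Gamma(\mu_i\phi) - \log\Gamma((1-\mu_i)\phi) + (\mu_i\phi - 1)\log y_i + ((1-\mu_i)\phi - 1)\log(1-y_i) \Big\}. \tag{5}$$
Because the log-likelihood is nonlinear in the parameters, an iterative algorithm is needed to obtain the parameter estimates. The score functions are obtained by differentiating the log-likelihood function with respect to $\beta$ and $\phi$, respectively, as
$$U_\beta(\beta, \phi) = \phi\, X^\top T (y^* - \mu^*) \tag{6}$$
and
$$U_\phi(\beta, \phi) = \sum_{i=1}^{n} \Big\{ \mu_i (y_i^* - \mu_i^*) + \log(1-y_i) - \psi((1-\mu_i)\phi) + \psi(\phi) \Big\}, \tag{7}$$
where $y_i^* = \log(y_i/(1-y_i))$, $\mu_i^* = \psi(\mu_i\phi) - \psi((1-\mu_i)\phi)$, $T = \mathrm{diag}(1/g'(\mu_1), \ldots, 1/g'(\mu_n))$ and $\psi(\cdot)$ is the digamma function, and the Fisher information matrix as
$$I = I(\beta, \phi) = \begin{pmatrix} I_{\beta\beta} & I_{\beta\phi} \\ I_{\phi\beta} & I_{\phi\phi} \end{pmatrix}, \tag{8}$$
where the blocks are the negative expected second derivatives of $\ell(\beta, \phi)$ with respect to $\beta$ and $\phi$. See Ferrari and Cribari-Neto (2004) for detailed derivations of the score functions and the Fisher information matrix.
It is known that, under the usual regularity conditions, the maximum likelihood estimators $\hat\beta$ and $\hat\phi$ of $\beta$ and $\phi$ are, as $n \to \infty$, approximately distributed as
$$\begin{pmatrix} \hat\beta \\ \hat\phi \end{pmatrix} \sim N_{p+1}\left( \begin{pmatrix} \beta \\ \phi \end{pmatrix},\; I^{-1} \right),$$
where $I^{-1}$ is the inverse of the Fisher information matrix.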
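As a rough illustration of the iterative maximum likelihood fit described in this sub-section, the following sketch maximizes the logit-link beta log-likelihood numerically. It is not the authors' implementation: the function names, the toy data, the log-parametrization of the precision parameter and the use of a general-purpose BFGS optimizer are all assumptions made for the example.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, gammaln


def beta_negloglik(theta, X, y):
    """Negative log-likelihood of the logit-link beta regression model."""
    beta, log_phi = theta[:-1], theta[-1]
    phi = np.exp(log_phi)              # optimise log(phi) so that phi stays positive
    mu = expit(X @ beta)               # inverse of the logit link
    a, b = mu * phi, (1.0 - mu) * phi  # shape parameters of Beta(mu*phi, (1-mu)*phi)
    loglik = (gammaln(phi) - gammaln(a) - gammaln(b)
              + (a - 1.0) * np.log(y) + (b - 1.0) * np.log1p(-y))
    return -np.sum(loglik)


def fit_beta_regression(X, y):
    """Return (beta_hat, phi_hat) from a quasi-Newton search (assumed optimizer)."""
    start = np.zeros(X.shape[1] + 1)
    res = minimize(beta_negloglik, start, args=(X, y), method="BFGS")
    return res.x[:-1], float(np.exp(res.x[-1]))


# Hypothetical toy data, for illustration only
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([0.5, -1.0, 0.8])
phi_true = 20.0
mu = expit(X @ beta_true)
y = rng.beta(mu * phi_true, (1.0 - mu) * phi_true)
y = np.clip(y, 1e-10, 1.0 - 1e-10)     # guard against values numerically equal to 0 or 1

beta_hat, phi_hat = fit_beta_regression(X, y)
```

With well-conditioned toy data such as these, the estimates should land close to the generating values; under strong collinearity the same fit becomes unstable, which is the situation the Liu-type estimators are meant to address.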

Unrestricted Liu estimator in beta regression
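One estimation method for handling the multicollinearity problem in the BR model is the Liu estimator, introduced by Karlsson et al. (2020), which has the form
$$\hat\beta^{UR} = (X^\top W X + I_p)^{-1}(X^\top W X + d\, I_p)\,\hat\beta, \tag{9}$$
where $0 < d < 1$ is the Liu biasing parameter, $I_p$ is the identity matrix of order $p$, $\hat\beta$ is the maximum likelihood estimator, and $W$ is a diagonal matrix whose $i$th diagonal element is equal to $\mu_i = \exp(x_i^\top \beta)$.
In this study, following Karlsson et al. (2020), we estimate the Liu parameter $d$ by a truncated estimator of the form $\hat d = \max(0, \cdot)$, whose argument is computed from $\lambda_{\max}$, the maximum eigenvalue of the matrix $X^\top W X$, and $\hat\beta_{\max}$, the maximum element of the maximum likelihood estimator.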
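The following sketch shows how a Liu-type estimate could be computed from a maximum likelihood estimate. It assumes the form $(X^\top W X + I_p)^{-1}(X^\top W X + d I_p)\hat\beta$ commonly used for Liu estimators in generalized linear models; the function name and the argument `w` (the diagonal of $W$) are hypothetical, and the biasing parameter `d` is taken as given rather than estimated.

```python
import numpy as np


def liu_estimator(X, w, beta_mle, d):
    """Liu-type shrinkage of beta_mle, assuming the form
    (X'WX + I)^{-1} (X'WX + d I) beta_mle with W = diag(w)."""
    p = X.shape[1]
    XtWX = X.T @ (w[:, None] * X)          # X'WX without forming the n x n matrix W
    rhs = (XtWX + d * np.eye(p)) @ beta_mle
    return np.linalg.solve(XtWX + np.eye(p), rhs)
```

Setting `d = 1` returns the maximum likelihood estimate unchanged, while values of `d` in (0, 1) pull the estimate towards zero, most strongly along directions with small eigenvalues of $X^\top W X$.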

Restricted Liu estimator
When prior information on the parameters is available in the form of linear restrictions, for example that some parameters are not significant and can be eliminated from the model, estimation efficiency can be improved. To this end, the following general hypothesis on $\beta$ is defined:
$$H_0: H\beta = h, \tag{11}$$
where $H$ is a $p_2 \times p$ matrix, $p_2$ is the number of non-significant parameters, and $h$ is a known $p_2 \times 1$ vector. Then, based on Kibria and Saleh (2012), the restricted estimator of $\beta$, denoted by $\hat\beta^{RMLE}$, can be written as
$$\hat\beta^{RMLE} = \hat\beta - I^{-1} H^\top (H I^{-1} H^\top)^{-1} (H\hat\beta - h), \tag{12}$$
where $I^{-1}$ is the inverse of the Fisher information matrix given in the previous sub-section. In the presence of multicollinearity, following Kibria and Saleh (2012), a restricted Liu estimator in the BR model, denoted by $\hat\beta^{RL}$, is defined as
$$\hat\beta^{RL} = (X^\top W X + I_p)^{-1}(X^\top W X + d\, I_p)\,\hat\beta^{RMLE}. \tag{13}$$
The test statistic for testing the null hypothesis given in (11) is defined as
$$T_n = (H\hat\beta - h)^\top (H I^{-1} H^\top)^{-1} (H\hat\beta - h). \tag{14}$$
As $n \to \infty$, this test statistic has an asymptotic chi-square distribution with $p_2$ degrees of freedom under the null hypothesis.
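A minimal sketch of the restricted estimator and the associated Wald-type statistic is given below. It assumes the standard forms $\hat\beta - C H^\top (H C H^\top)^{-1}(H\hat\beta - h)$ and $(H\hat\beta - h)^\top (H C H^\top)^{-1}(H\hat\beta - h)$, where `cov` stands for the $p \times p$ inverse-information matrix $C$ of the regression coefficients; the function and argument names are hypothetical.

```python
import numpy as np


def restricted_estimator(beta_hat, cov, H, h):
    """Project beta_hat onto {beta : H beta = h} in the metric induced by cov."""
    resid = H @ beta_hat - h
    middle = H @ cov @ H.T                       # p2 x p2 matrix H C H'
    return beta_hat - cov @ H.T @ np.linalg.solve(middle, resid)


def wald_statistic(beta_hat, cov, H, h):
    """Quadratic form measuring how far beta_hat is from satisfying H beta = h."""
    resid = H @ beta_hat - h
    return float(resid @ np.linalg.solve(H @ cov @ H.T, resid))
```

By construction the restricted estimate satisfies the restriction exactly, so the statistic evaluated at the restricted estimate is zero up to rounding error.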

Liu linear shrinkage estimator
We denote the Liu linear shrinkage estimator of $\beta$ by $\hat\beta^{LLS}$, defined as
$$\hat\beta^{LLS} = \delta\,\hat\beta^{RL} + (1-\delta)\,\hat\beta^{UR}, \tag{15}$$
where $0 \le \delta \le 1$ is the confidence level in the prior information and can be specified by the researcher. If there is no prior information on $\delta$, one can estimate its optimal value by minimizing the mean squared error of $\hat\beta^{LLS}$ with respect to $\delta$ (see Online Appendix 0); the resulting expression, Eq. (16), is written in terms of variance-covariance matrices. Clearly, the optimal $\delta$ depends on the unknown value of $\beta$; in practice, we recommend plugging in the estimated values of $\beta$.

Liu pretest estimator
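The Liu pretest estimator of $\beta$, denoted by $\hat\beta^{LPT}$, has the form
$$\hat\beta^{LPT} = \hat\beta^{UR} - (\hat\beta^{UR} - \hat\beta^{RL})\, I(T_n \le T_{n,\alpha}), \tag{17}$$
where $I(\cdot)$ is an indicator function and $T_{n,\alpha}$ is the upper $\alpha$-level critical value of the distribution of the test statistic $T_n$. The Liu pretest estimator therefore chooses between two estimators: if $H_0: H\beta = h$ is rejected, then $\hat\beta^{LPT} = \hat\beta^{UR}$; otherwise, $\hat\beta^{LPT} = \hat\beta^{RL}$.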

Liu shrinkage pretest estimator
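The Liu shrinkage pretest estimator of $\beta$, denoted by $\hat\beta^{LSPE}$, is defined as
$$\hat\beta^{LSPE} = \hat\beta^{UR} - \delta\,(\hat\beta^{UR} - \hat\beta^{RL})\, I(T_n \le T_{n,\alpha}). \tag{18}$$
Note that $\hat\beta^{LSPE}$ is more efficient than $\hat\beta^{LPT}$ in many parts of the parameter space.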

Liu Stein estimator
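We denote the Liu Stein estimator of $\beta$ by $\hat\beta^{LS}$. It combines the Liu unrestricted and Liu restricted estimators in an optimal way, dominates the Liu unrestricted estimator, and is defined as
$$\hat\beta^{LS} = \hat\beta^{UR} - (p_2 - 2)\, T_n^{-1}\, (\hat\beta^{UR} - \hat\beta^{RL}), \quad p_2 \ge 3. \tag{19}$$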

Liu positive Stein estimator
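The Liu positive Stein estimator of $\beta$, denoted by $\hat\beta^{LPS}$, is defined as
$$\hat\beta^{LPS} = \hat\beta^{RL} + \left(1 - (p_2 - 2)\, T_n^{-1}\right)^{+} (\hat\beta^{UR} - \hat\beta^{RL}), \tag{20}$$
where $z^{+} = \max(0, z)$. The estimator $\hat\beta^{LPS}$ controls the over-shrinking problem of the Liu Stein estimator. For more on Liu estimators of Stein type, we refer to Kibria (2012), among others.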
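The five shrinkage rules of this section can be sketched together, since each is a simple combination of the unrestricted and restricted Liu estimates and the test statistic. The sketch below assumes the standard forms of these estimators from the shrinkage literature; the function name, the dictionary keys and the argument `crit` (the critical value $T_{n,\alpha}$, which could for instance be obtained from `scipy.stats.chi2.ppf(1 - alpha, p2)`) are hypothetical.

```python
import numpy as np


def liu_shrinkage_estimators(b_ur, b_rl, T_n, p2, crit, delta=0.5):
    """Combine unrestricted (b_ur) and restricted (b_rl) Liu estimates.

    Assumes the textbook forms of the linear shrinkage, pretest, shrinkage
    pretest, Stein and positive-part Stein rules; requires T_n > 0 and p2 >= 3.
    """
    diff = b_ur - b_rl
    accept = 1.0 if T_n <= crit else 0.0   # 1 when H0 is not rejected
    stein_weight = 1.0 - (p2 - 2) / T_n    # may become negative for small T_n
    return {
        "LLS": delta * b_rl + (1.0 - delta) * b_ur,
        "LPT": b_ur - diff * accept,
        "LSPE": b_ur - delta * diff * accept,
        "LS": b_rl + stein_weight * diff,
        "LPS": b_rl + max(0.0, stein_weight) * diff,
    }
```

When `T_n` is large the pretest rule returns the unrestricted estimate and the two Stein rules coincide; when `T_n` falls below `p2 - 2` the plain Stein weight changes sign, which is the over-shrinking that the positive-part rule truncates.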

Asymptotic properties
In this section, we provide the asymptotic properties of the Liu shrinkage estimators introduced in Sect. 2. To explore the properties when the subspace information $H\beta = h$ is wrong, we consider the sequence of local alternatives
$$K_{(n)}: H\beta = h + \frac{\vartheta}{\sqrt{n}}, \tag{21}$$
where $\vartheta = (\vartheta_1, \vartheta_2, \ldots, \vartheta_{p_2})^\top \in \mathbb{R}^{p_2}$ is a $p_2 \times 1$ vector of fixed values. In order to compare the estimators, we compute the asymptotic distributional bias (B) and the asymptotic distributional variance (V) of the proposed estimators. Suppose $\hat\beta^{*}$ is any of the proposed estimators of $\beta$. The asymptotic distributional bias of $\hat\beta^{*}$ is defined as
$$B(\hat\beta^{*}) = \lim_{n \to \infty} E\left[\sqrt{n}\,(\hat\beta^{*} - \beta)\right].$$
The asymptotic distributional variance of $\hat\beta^{*}$ is denoted by $V(\hat\beta^{*})$. We present the following lemma, which is useful for computing the asymptotic results for the proposed estimators.
Lemma 3.1 Under the sequence of local alternatives $\{K_{(n)}\}$ given in (21) and the usual regularity conditions of the MLE, as $n \to \infty$, where $I_p$ is an identity matrix of order $p$ and $I^{-1}$ is the inverse of the Fisher information matrix given right after Eq. (8).
Proof See Online Appendix 1.
Using Lemma 3.1, we present the asymptotic properties of the Liu shrinkage estimators in the following theorems.
Theorem 3.2 Under the sequence of local alternatives given in (21) and the usual regularity conditions, the asymptotic distributional biases of the proposed estimators are as follows .
Proof See Online Appendix 2.
Theorem 3.3 Under the local alternatives given in (21) and the usual regularity conditions, the asymptotic distributional variances of the estimators are as follows
Proof See Online Appendix 3.

Some asymptotic evaluations of the variance of the proposed estimators
In this section, we compare the asymptotic distributional variances of the seven estimators discussed in Sect. 3. The following definition is very helpful for comparison purposes.
4. $\hat\beta^{LPS}$ is superior to $\hat\beta^{LS}$ if: The right-hand sides of the equations above are just real numbers. Since the expectation of a positive random variable is positive, then by the definition of an indicator function, Since $P(\chi^2_{p_2+2}(\Delta^{*}) > 0) = 1$. Thus, for all $\Delta^{*} \in (0, +\infty)$, $V(\hat\beta^{LPS}) \le V(\hat\beta^{LS})$.

Monte Carlo simulation
In this section, we provide the details of an extensive Monte Carlo simulation study comparing the performance of the listed estimators in terms of relative efficiency, which is defined in Eq. (24), where $\hat\beta^{*}$ corresponds to any of the methods listed in the paper. Since one of our main aims is to investigate the performance of the estimators under a multicollinear design, we generate the design matrix $X$ from the multivariate normal distribution with zero mean vector $0$ and variance-covariance matrix $\Sigma$, such that $X \sim N(0, \Sigma) \in \mathbb{R}^{n \times p}$, where $\Sigma_{ij} = \rho^{|i-j|}$, $i, j = 1, 2, \ldots, p$, $n$ is the sample size and $p$ is the number of predictor variables. In this setting, $\rho$ controls the degree of correlation between the predictors, and it is taken as 0.6 and 0.9.
We consider the candidate sub-model given in (11). The hypothesis $H_0: H\beta = h$ is tested against $H_1: H\beta \neq h$, where $H = [0_{p_2 \times p_1}, I_{p_2}] \in \mathbb{R}^{p_2 \times p}$ is a matrix of rank $p_2$ and $I_{p_2} \in \mathbb{R}^{p_2 \times p_2}$ is an identity matrix of order $p_2$, such that $p = p_1 + p_2$. The sample size is chosen to be $n = 50, 100, 200$. The true regression parameters are taken to be $\beta = (\beta_1^\top, \beta_2^\top)^\top$, where $\beta_1 \in \mathbb{R}^{p_1}$ and $\beta_2 \in \mathbb{R}^{p_2}$ are the active and inactive parameter vectors, respectively.
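The design and the restriction matrix described above can be generated along the following lines. This is a sketch under the stated setup only; the function names are hypothetical, and taking $h = 0_{p_2}$ reflects the sub-model in which the inactive coefficients are set to zero.

```python
import numpy as np


def ar1_design(n, p, rho, rng):
    """Draw an n x p design with rows N(0, Sigma), Sigma_ij = rho^|i-j|."""
    idx = np.arange(p)
    Sigma = rho ** np.abs(idx[:, None] - idx[None, :])
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    return X, Sigma


def submodel_restriction(p1, p2):
    """Return H = [0, I_{p2}] and h = 0, so that H beta = h means beta_2 = 0."""
    H = np.hstack([np.zeros((p2, p1)), np.eye(p2)])
    h = np.zeros(p2)
    return H, h
```

For example, `ar1_design(50, 15, 0.9, np.random.default_rng(0))` gives one highly collinear design of the smallest sample size considered here.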
The response variable is generated from the beta distribution such that $y_i \sim \mathrm{Beta}(\mu_i\phi, (1-\mu_i)\phi)$, where
$$\mu_i = \frac{\exp(x_i^\top \beta)}{1 + \exp(x_i^\top \beta)},$$
which corresponds to the logit link function, and the dispersion parameter is fixed at 5. The confidence level in the prior information is taken as 0.5, which means that equal weights are put on the unrestricted and restricted Liu estimates. However, one can use the estimated value of $\delta$ from (16). In designing this Monte Carlo experiment, we also aim to understand the effects of departures from the true parameter vector. Thus, we focus on two different cases: $\beta_2 = 0_{p_2}$, meaning that the null hypothesis $H_0$ holds, and $\beta_2 \neq 0_{p_2}$, meaning that the alternative hypothesis $H_1$ holds. In order to measure the effect of the departure from the null hypothesis, we define another parameter that represents the distance between the simulated model and the candidate sub-model. It is defined as $\Delta = \lVert \beta - \beta^{(0)} \rVert$, where $\lVert \cdot \rVert$ is the usual Euclidean norm and $\beta^{(0)} = (\beta_1^\top, 0_{p_2}^\top)^\top$ is the parameter vector under $H_0$. Thus, $\Delta = 0$ corresponds to $\beta = \beta^{(0)}$, that is, to the first scenario in which $H_0$ holds. In the second scenario, $\beta_1$ is the same while $\beta_2 \neq 0_{p_2}$, so that $\Delta > 0$. Moreover, $p_2$ is chosen to be 10, 15, 20.
The number of repetitions in the simulation is 1000. The simulated mean squared error of an estimator $\hat\beta^{*}$ is computed as
$$MSE(\hat\beta^{*}) = \frac{1}{1000} \sum_{r=1}^{1000} (\hat\beta^{*}_{(r)} - \beta)^\top (\hat\beta^{*}_{(r)} - \beta),$$
where $\hat\beta^{*}_{(r)}$ is the estimate obtained in the $r$th repetition. Note that we use the relative efficiency, that is, the relative mean squared error (RMSE) of $\hat\beta^{*}$ with respect to $\hat\beta^{UR}$, $RMSE(\hat\beta^{*}) = MSE(\hat\beta^{UR}) / MSE(\hat\beta^{*})$, so that a value of RMSE larger than one shows that the estimator $\hat\beta^{*}$ is superior to $\hat\beta^{UR}$.
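The data-generating step and the distance parameter can be sketched as follows. The function names are hypothetical, and passing the fixed value 5 as the argument `phi` of $\mathrm{Beta}(\mu_i\phi, (1-\mu_i)\phi)$ is an assumption about how the text's fixed parameter enters the model.

```python
import numpy as np


def simulate_response(X, beta, phi, rng):
    """Draw y_i ~ Beta(mu_i * phi, (1 - mu_i) * phi) with a logit link for mu_i."""
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return rng.beta(mu * phi, (1.0 - mu) * phi)


def distance_from_submodel(beta, p1):
    """Euclidean distance between beta and beta^(0) = (beta_1, 0)."""
    beta0 = beta.copy()
    beta0[p1:] = 0.0                 # zero out the inactive block
    return float(np.linalg.norm(beta - beta0))
```

Because only the inactive block differs between the two vectors, the distance equals the Euclidean norm of $\beta_2$, so it is zero exactly when the sub-model is correct.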
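The performance measure can be computed from the stored replications as in the sketch below, which assumes that the simulated mean squared error is the average squared Euclidean error over repetitions. The function names and the layout of `estimates` (one row per repetition) are hypothetical.

```python
import numpy as np


def simulated_mse(estimates, beta_true):
    """Average squared Euclidean error over repetitions (rows of `estimates`)."""
    err = estimates - beta_true
    return float(np.mean(np.sum(err ** 2, axis=1)))


def relative_efficiency(est_ur, est_star, beta_true):
    """MSE of the unrestricted Liu estimates divided by MSE of a competitor."""
    return simulated_mse(est_ur, beta_true) / simulated_mse(est_star, beta_true)
```

A ratio above one indicates that the competitor has the smaller simulated mean squared error.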
The results of the simulation are summarized in Figs. 1 and 2, which show the RMSE performance of the methods with respect to $\Delta$. The following conclusions can be drawn from the figures:
• At $\Delta = 0$, the performance of the restricted Liu estimator is the best in all situations. When the null hypothesis is violated, the RMSE of this estimator decreases sharply.
• At $\Delta = 0$, the Liu positive Stein estimator is better than the Liu Stein estimator. However, as $\Delta$ moves away from zero, the performance of these two estimators becomes the same.
• The RMSE of all estimators increases as the correlation between predictor variables increases.
• The RMSE of all estimators generally decreases as the sample size n increases.
• The most important result is that, for high correlation (ρ = 0.9), all estimators have relative efficiencies higher than that of the unrestricted estimator.
• For high correlation (ρ = 0.9), the restricted Liu shrinkage estimators generally have higher relative efficiencies over a wide range of $\Delta$.

Government spending in Dutch cities data (2005)
The aim of these data is to explain the proportion of Dutch city budgets spent on administration and government using 10 covariates. The data are contained in the fmlogit package in R as a data frame with 429 observations and 12 variables. The dependent variable is governing, and the remaining variables are the explanatory variables given in Table 1. Since some observations have missing values, we exclude them and perform a complete-case analysis. We fit a beta regression model and identify the significant variables. We summarize the unrestricted and restricted models in Table 2. Fig. 3 shows that some covariates are highly correlated. We also compute the condition number (CN) of the cross-product matrix $X^\top W X$, defined as the square root of the ratio of its maximum eigenvalue to its minimum eigenvalue, as 809.097. Both the correlation plot and the CN indicate a severe collinearity problem, so the usual beta regression may not be appropriate for these data, as it may yield unreliable estimates. Therefore, we apply the estimators proposed in this paper. The corresponding values in Table 2 indicate that five of the variables, houseval, education, recreation, social and urbanplanning, are effective and the rest of the variables are ineffective (see also the $R^2$ values). Therefore, we use this restriction in the analysis and compute the proposed Liu shrinkage estimators (Table 3). We obtain the optimal value of δ as 0.48 in the Liu linear shrinkage estimator. Further, we set the significance level to 0.05 in the preliminary test estimator. To evaluate the performance of the proposed estimators, we apply the bootstrap technique with n = 200 and 2000 bootstrap replications.
We then compute the mean squares as the square of the estimated bias plus the square of the standard deviation for each estimator.
The results show that the bootstrap root mean squares given in Table 3 (and the standard errors in Online Appendix 4) of the proposed estimators, specifically the Stein-type and positive Stein-type shrinkage estimators, are generally lower than those of the unrestricted maximum likelihood estimator of the beta regression model. The relative efficiencies also show that the restricted estimator attains the highest value, 2.235, and is therefore preferable to the other estimators.
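The two diagnostics used in this application, the condition number and the bootstrap mean square, can be sketched as follows. The function names are hypothetical, and two details are assumptions rather than statements of the authors' procedure: the bootstrap bias is measured against a supplied reference estimate, and the squared bias and variance are summed over coefficients before taking the square root.

```python
import numpy as np


def condition_number(X, w):
    """Square root of the ratio of the largest to the smallest eigenvalue of X'WX."""
    XtWX = X.T @ (w[:, None] * X)
    eig = np.linalg.eigvalsh(XtWX)          # eigenvalues in ascending order
    return float(np.sqrt(eig[-1] / eig[0]))


def bootstrap_root_mean_square(boot_estimates, reference):
    """Root of (squared bias + variance), summed over coefficients.

    boot_estimates: array of shape (B, p), one row per bootstrap replication.
    reference: length-p vector against which the bias is measured (assumed).
    """
    bias = boot_estimates.mean(axis=0) - reference
    sd = boot_estimates.std(axis=0, ddof=1)
    return float(np.sqrt(np.sum(bias ** 2 + sd ** 2)))
```

Large condition numbers signal near-linear dependence among the weighted predictors, which is the situation in which the Liu-type estimators are expected to help.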

Student performance data set
These data concern student achievement in secondary education at two Portuguese schools. The data attributes include student grades and demographic, social and school-related features. In Cortez and Silva (2008), the data set was modeled under binary/five-level classification and regression tasks. However, we use beta regression without categorizing the outcome variable, since categorization may lose useful information (Altman and Royston 2006). Note that the outcome variable is the final grade (G3), which is strongly correlated with the second period grade (G2) and the first period grade (G1). As G3 (the final grade in mathematics) is bounded in the interval [0, 20], we converted it to the interval (0, 1) and fitted a beta regression. Based on the AIC and $R^2$ measures, we form the null model given in Table 4. We then apply the estimators developed in this paper. We set the significance level (α) equal to 0.05. Since we do not have any prior information about the parameters, we use the estimated value of δ (0.354) in the linear shrinkage estimator. The results show that our proposed estimators outperform the unrestricted estimator in most cases in terms of the bootstrap root mean squares (Table 5) and standard errors (Online Appendix 4). Further, among the proposed estimators, the restricted estimator is by far the best in terms of relative efficiency (2.579). The positive Stein and Stein-type estimators are the next most favorable.
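The text does not state how grades on [0, 20] were mapped into the open interval (0, 1). One common choice, shown here purely as an assumption, is to rescale to [0, 1] and then apply the compression $(y(n-1) + 0.5)/n$ of Smithson and Verkuilen (2006), which moves exact zeros and ones strictly inside the interval. The function name and the example grades are hypothetical.

```python
import numpy as np


def rescale_to_unit_interval(y, lower, upper):
    """Map values on [lower, upper] into the open interval (0, 1).

    Assumed transformation: min-max scaling followed by the
    (y * (n - 1) + 0.5) / n compression; not taken from the paper.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    scaled = (y - lower) / (upper - lower)
    return (scaled * (n - 1) + 0.5) / n


# Hypothetical grades, for illustration only
g3 = np.array([0.0, 10.0, 20.0, 15.0])
g3_unit = rescale_to_unit_interval(g3, 0.0, 20.0)
```

The transformation is monotone, so the ordering of the grades is preserved while the boundary values become admissible for the beta likelihood.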

Conclusion
In this paper, we considered different types of improved shrinkage estimators based on the Liu estimator, namely, restricted, preliminary test, Stein-type, positive Stein-type, and linear shrinkage estimators for the beta regression model. We obtained the analytical biases and variances of the proposed estimators under the local alternative hypothesis. Further, we conducted an extensive simulation study to examine the performance of the proposed estimators in finite samples. Our results showed that Stein-type estimators uniformly outperform the usual maximum likelihood estimators. Other shrinkage-type estimators also had higher relative efficiencies than the maximum likelihood estimator over a wide range of the parameter space. We concluded the paper by applying the proposed methodology to two known real data sets from economics and education.

Definition 4.1
Let B be the parameter space of $\beta$. If two estimators $\hat\beta^{*}$ and $\hat\beta^{**}$ are such that $V(\hat\beta^{*}) \le V(\hat\beta^{**})$ for all values of $\beta \in B$, with strict inequality for at least one $\beta$, we say that $\hat\beta^{*}$ dominates $\hat\beta^{**}$.
1. $\hat\beta^{RL}$ is superior to $\hat\beta^{UR}$ if:

Fig. 3
Bivariate correlation plot of the explanatory variables in real data

Table 3
Coefficients and bootstrapped root mean square errors of the proposed estimators

Table 1
Variable descriptions in the dataset

Table 2
Variables included in the competing models

Table 4
Variables included in the competing models

Table 5
Coefficients and bootstrapped root mean square errors of the proposed estimators for education data