1 Introduction

Ferrari and Cribari-Neto (2004) first introduced beta regression (BR) to model outcome variables bounded in the interval (0, 1) as a function of explanatory variables. The main assumption of the BR model is that the dependent variable follows a beta distribution. Several applications of the BR model have been studied in the literature, for instance, modelling the proportion of income spent on food, the poverty rate, the proportion of crude oil converted to gasoline, and the proportion of surface covered by vegetation (Qasim et al. 2021). The BR model has also been applied to bounded time series data in an analysis of Canadian Google® Flu Trends (Guolo and Varin 2014). Recently, the BR model has gained attention in the machine learning area; for example, Espinheira et al. (2019) proposed criteria for variable selection in beta regression models.

Usually, the maximum likelihood estimator (MLE) is used to estimate the unknown regression coefficients in the BR model (Ferrari and Cribari-Neto 2004). However, a multicollinearity problem may arise in the BR model when there are near linear dependencies between the predictor variables. As a remedy, Karlsson et al. (2020) studied a Liu estimator (Liu 1993) approach in the BR model to overcome the multicollinearity problem.

In regression models, when there is prior information about the parameter vector \(\varvec{\beta } \) in the form of a linear restriction \( {{\textbf {H}}} \varvec{\beta } = {{\textbf {h}}} \), shrinkage strategies, namely the linear shrinkage (Thompson 1968), pretest (Bancroft 1944), shrinkage pretest (Ahmed 1992), Stein (Stein 1956), and positive Stein (Kibria and Saleh 2004) estimators, are applied to the estimation of the parameters. In this setting, the parameter vector \( \varvec{\beta } \) is partitioned into two parts as \(\varvec{\beta } = (\varvec{\beta }_1^\prime , \varvec{\beta }_2^\prime )^\prime \), where \( \varvec{\beta }_1 \) is of order \(p_1 \times 1\) and contains the active (significant) parameters, and \( \varvec{\beta }_2 \) is of order \(p_2 \times 1\) and contains the inactive parameters that do not contribute significantly to predicting the dependent variable. Note that the number of regression parameters is \(p=p_1 + p_2\). Therefore, there are two models: a full (unrestricted) model including all parameters, estimated with the maximum likelihood method, and a sub-model (restricted model) containing only the significant parameters. For more details about the methodology of shrinkage estimation, see Ahmed (2014) and Kibria and Saleh (2004).

The main purpose of this paper is to develop effective estimation methods for the BR model in the presence of highly correlated explanatory variables, some of which may have no significant effect on the model, a situation commonly encountered in econometric and education data. Therefore, we propose improved shrinkage estimation strategies based on the Liu estimator in the BR model to estimate the parameters in the presence of multicollinearity. We derive the theoretical properties of the proposed estimators and conduct a Monte Carlo simulation experiment to evaluate their relative performance with respect to the usual unrestricted Liu estimator. We observe that the proposed estimators, specifically the Liu Stein estimator, uniformly outperform the usual estimators in both the simulation studies and the real data applications.

The organization of the paper is as follows: We introduce the BR model and the unrestricted Liu estimator, propose a restricted Liu estimator, and then derive the shrinkage Liu estimators in Sect. 2. In Sect. 3, the asymptotic distributional bias and variance of the proposed estimators are presented. Asymptotic evaluations of the variance of the proposed estimators are given in Sect. 4. We provide the details of the Monte Carlo simulation experiment to compare the performance of the proposed estimators in Sect. 5. We apply the proposed estimation methods to two real data sets in Sect. 6. Finally, concluding remarks are presented in Sect. 7.

2 Theory and method

In this section, we briefly introduce the BR model. Then, the unrestricted and restricted Liu estimators and shrinkage Liu estimators are defined.

2.1 Beta regression model

Let \({{\mathbf {y}}}=\left[ y_1, y_2, \ldots , y_n\right] ^\prime \) be the vector of observations of the response variable, assumed to follow independent beta distributions with shape parameters a and b, whose probability density function (pdf) is given as

$$\begin{aligned} f( y_{i}; a , b)=\frac{\Gamma (a+b)}{\Gamma (a)\Gamma (b)}y_i^{a-1} (1-y_i)^{b-1}, \quad y_i \in (0,1), \end{aligned}$$
(1)

where \(a>0\), \(b>0\), and \(\Gamma (.)\) is the gamma function; we write \(y_i \sim Beta(a,b)\). The mean and variance of each \(y_i\) are, respectively,

$$\begin{aligned} E(y_i) = a/(a+b) \end{aligned}$$

and

$$\begin{aligned} Var(y_i)=\frac{ab}{(a+b)^2(a+b+1)}. \end{aligned}$$

Following Ferrari and Cribari-Neto (2004), we use a re-parametrization in order to derive the BR model. Let \(\mu = a/(a+b)\) and \(\phi = a+b\), where \(\phi \) is called the precision parameter. The pdf of \(y_i\) can then be written as

$$\begin{aligned} f( y_{i}; \mu , \phi )=\frac{\Gamma (\phi )}{\Gamma (\mu \phi )\Gamma ((1-\mu )\phi )}y_i^{\mu \phi -1} (1-y_i)^{(1-\mu )\phi -1}, \quad y_i \in (0,1), \end{aligned}$$
(2)

where \(0<\mu <1\) and \(\phi >0\) such that \(y_i \sim Beta\left( \mu \phi , (1-\mu )\phi \right) \). Therefore, the mean and variance of each observation become, respectively, \(E(y_i)=\mu \) and \(Var(y_i)=V(\mu )/(1+\phi )\), where \(V(\mu )=\mu (1-\mu )\).
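As a quick numerical check of this re-parametrization, the following minimal sketch (with hypothetical values of \(\mu \) and \(\phi \)) verifies that scipy's beta distribution with shape parameters \(a=\mu \phi \) and \(b=(1-\mu )\phi \) reproduces the stated mean and variance:

```python
import numpy as np
from scipy import stats

# Hypothetical values chosen only for illustration.
mu, phi = 0.3, 10.0
a, b = mu * phi, (1.0 - mu) * phi   # a = mu*phi, b = (1 - mu)*phi

dist = stats.beta(a, b)
V = mu * (1.0 - mu)                 # V(mu) = mu(1 - mu)

# E(y) = mu and Var(y) = V(mu)/(1 + phi), as stated above.
assert np.isclose(dist.mean(), mu)
assert np.isclose(dist.var(), V / (1.0 + phi))
```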

The beta regression model is then obtained by assuming that the mean of \(y_i\) satisfies

$$\begin{aligned} g(\mu _i)=\sum _{j=1}^p x_{ij}\beta _j={{\mathbf {x}}}_i^\prime {\varvec{\beta }}=\eta _i \end{aligned}$$
(3)

where \({{\mathbf {x}}}_i^\prime \) is the ith row of the design matrix \({{\mathbf {X}}}=\left[ {{\mathbf {x}}}_1,{{\mathbf {x}}}_2,\ldots ,{{\mathbf {x}}}_n\right] ^\prime \) of order \(n \times p, (n>p)\), and \({\varvec{\beta }}=\left[ \beta _1, \beta _2, \ldots , \beta _p \right] ^\prime \) is the vector of regression parameters. In Equation (3), we assume that the link function g(.) is strictly monotone and twice differentiable, mapping the interval (0, 1) onto \({\mathbb {R}}\).

Although alternative link functions are available for the BR model (Ferrari and Cribari-Neto 2004), we use the logit link function given as \(g(\mu )=\log (\mu /(1-\mu ))\) such that

$$\begin{aligned} \mu _i=\frac{\exp ({{\mathbf {x}}}_i^\prime {\varvec{\beta }})}{1+\exp ({{\mathbf {x}}}_i^\prime {\varvec{\beta }})} \end{aligned}$$
(4)

for \(i=1,2, \ldots , n\). Thus, the corresponding log-likelihood function of the BR model given in (3) can be written as

$$\begin{aligned} l({\varvec{\beta }})= & {} \sum _{i=1}^n \log (\Gamma (\phi ))-\log (\Gamma (\mu _i\phi ))-\log (\Gamma ((1-\mu _i)\phi )) +(\phi \mu _i-1)\log (y_i)\nonumber \\&+((1-\mu _i)\phi -1)\log (1-y_i). \end{aligned}$$
(5)
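The log-likelihood (5) can be evaluated directly. The minimal sketch below (hypothetical simulated data, logit link as in (4)) checks the expression against the sum of scipy's beta log-densities at the same parameter values:

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln

def beta_loglik(beta, phi, X, y):
    """Log-likelihood (5) of the BR model under the logit link (4)."""
    eta = X @ beta
    mu = 1.0 / (1.0 + np.exp(-eta))            # inverse logit, Eq. (4)
    return np.sum(
        gammaln(phi)
        - gammaln(mu * phi)
        - gammaln((1.0 - mu) * phi)
        + (mu * phi - 1.0) * np.log(y)
        + ((1.0 - mu) * phi - 1.0) * np.log(1.0 - y)
    )

# Hypothetical simulated data.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
beta, phi = np.array([0.2, -0.5]), 8.0
mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
y = rng.beta(mu * phi, (1.0 - mu) * phi)

# Agrees with summing scipy's beta log-density with a = mu*phi, b = (1-mu)*phi.
ref = stats.beta(mu * phi, (1.0 - mu) * phi).logpdf(y).sum()
assert np.isclose(beta_loglik(beta, phi, X, y), ref)
```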

One should use an iterative algorithm to obtain the parameter estimates because the log-likelihood function is nonlinear in the parameters. The score functions are obtained by differentiating the log-likelihood function with respect to the parameters \({\varvec{\beta }}\) and \(\phi \), respectively, as

$$\begin{aligned} U_{{\varvec{\beta }}}({\varvec{\beta }}, \phi ) = \phi {{\mathbf {X}}}^\prime {{\mathbf {T}}}({{\mathbf {y}}}^* - {\varvec{\mu }}^*) \end{aligned}$$
(6)

and

$$\begin{aligned} U_{\phi }({\varvec{\beta }}, \phi ) = \sum _{i=1}^n \{ \mu _i(y_i^*-\mu _i^*)+\log (1-y_i)-\psi ((1-\mu _i)\phi ) +\psi (\phi )\} \end{aligned}$$
(7)

where \(y_i^*=\log (y_i/(1-y_i))\), \(\mu _i^*=\psi (\mu _i \phi )-\psi ((1-\mu _i)\phi )\), \({{\mathbf {T}}}=diag\{1/g'(\mu _1),\ldots , 1/g'(\mu _n) \}\), \({{\mathbf {y}}}^*=(y_1^*,\ldots , y_n^*)^\prime \) and \({\varvec{\mu }}^*=(\mu _1^*,\ldots , \mu _n^*)^\prime \). The Fisher information matrix is given by

$$\begin{aligned} {{\mathbf {K}}}={{\mathbf {K}}}({\varvec{\beta }}, \phi )= \begin{pmatrix} {{\mathbf {K}}}_{{\varvec{\beta }}{\varvec{\beta }}} &{} {{\mathbf {K}}}_{{\varvec{\beta }}\phi }\\ {{\mathbf {K}}}_{\phi {\varvec{\beta }}} &{} {{\mathbf {K}}}_{\phi \phi } \end{pmatrix} \end{aligned}$$
(8)

where \({{\mathbf {K}}}_{{\varvec{\beta }}{\varvec{\beta }}}=\phi {{\mathbf {X}}}^\prime {{\mathbf {W}}}{{\mathbf {X}}}= \varvec{\mathcal {I}}\) with \({{\mathbf {W}}}=diag\{w_1,\ldots ,w_n\}\), \(w_i=\phi \{\psi ^\prime (\mu _i\phi )+\psi ^\prime ((1-\mu _i)\phi )\}/[g^\prime (\mu _i)]^2\), \({{\mathbf {K}}}_{{\varvec{\beta }}\phi }={{\mathbf {K}}}_{\phi {\varvec{\beta }}}^\prime ={{\mathbf {X}}}^\prime {{\mathbf {T}}}{{\mathbf {c}}}\) and \({{\mathbf {K}}}_{\phi \phi }=trace({{\mathbf {D}}})\), where \({{\mathbf {D}}}=diag\{d_1, \ldots , d_n \}\) with \(d_i=\psi ^\prime (\mu _i\phi )\mu _i^2+\psi ^\prime ((1-\mu _i)\phi )(1-\mu _i)^2-\psi ^\prime (\phi )\), \(\psi ^\prime (.)\) is the trigamma function, and \({{\mathbf {c}}}=(c_1, \ldots , c_n)^\prime \) with \(c_i=\phi \{\psi ^\prime (\mu _i\phi )\mu _i-\psi ^\prime ((1-\mu _i)\phi )(1-\mu _i) \}\). See the appendix of Ferrari and Cribari-Neto (2004) for detailed derivations of the score functions and the Fisher information matrix.

It is known that, under the usual regularity conditions, the joint asymptotic distribution of the maximum likelihood estimators \(\widehat{\varvec{\beta }}\) and \(\widehat{\phi }\) of \({\varvec{\beta }}\) and \(\phi \), as \(n \rightarrow \infty \), is approximately given by

$$\begin{aligned} \begin{pmatrix} \widehat{\varvec{\beta }}\\ \widehat{\phi } \end{pmatrix} \sim N_{p+1} \left[ \begin{pmatrix} {\varvec{\beta }}\\ \phi \end{pmatrix}, {{\mathbf {K}}}^{-1} \right] . \end{aligned}$$
(9)

2.2 Unrestricted Liu estimator in beta regression

One estimation method for handling the multicollinearity problem in the BR model is the Liu estimator, introduced by Karlsson et al. (2020), having the following form

$$\begin{aligned} \widehat{\varvec{\beta }}^{UR} = ({{\textbf {X}}}^\prime \widehat{{{\mathbf {W}}}} {{\textbf {X}}}+{{\textbf {I}}}_p )^{-1}( {{\textbf {X}}}^\prime \widehat{{{\mathbf {W}}}} {{\mathbf {X}}}+d{{\textbf {I}}}_p) \widehat{\varvec{\beta }}, \end{aligned}$$
(10)

where \(0<d<1\) is the Liu biasing parameter and \(\widehat{{{\mathbf {W}}}}\) is the matrix \({{\mathbf {W}}}\) evaluated at \( \widehat{\mu }_i = \exp ( {{\mathbf {x}}}^\prime _i \widehat{\varvec{\beta }})/\left( 1+\exp ( {{\mathbf {x}}}^\prime _i \widehat{\varvec{\beta }})\right) \), consistent with the logit link in (4).

In this study, following Karlsson et al. (2020), we estimate the Liu parameter d by \({\widehat{d}} =\max \left( 0, \frac{ \widehat{\beta }_{max}^2-1 }{\lambda _{max}+\widehat{\beta }_{max}^2 } \right) \), where \(\lambda _{max}\) is the maximum eigenvalue of the matrix \( {{\mathbf {X}}}' \widehat{{{\mathbf {W}}}}{{\mathbf {X}}}\) and \(\widehat{\beta }_{max}\) is the maximum element of the maximum likelihood estimate \(\widehat{\varvec{\beta }}\).
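A minimal sketch of the unrestricted Liu estimator (10) together with the plug-in \({\widehat{d}}\) above. Here `beta_mle` and `W_hat` are assumed to come from a maximum likelihood fit of the BR model; they are stood in by hypothetical values for illustration:

```python
import numpy as np

def liu_estimator(X, W_hat, beta_mle):
    """Unrestricted Liu estimator (10) with the plug-in d of Karlsson et al."""
    XtWX = X.T @ W_hat @ X
    lam_max = np.linalg.eigvalsh(XtWX).max()       # largest eigenvalue of X'WX
    b_max = beta_mle.max()                         # largest ML coefficient
    d = max(0.0, (b_max**2 - 1.0) / (lam_max + b_max**2))
    p = X.shape[1]
    # (X'WX + I_p)^{-1} (X'WX + d I_p) beta_mle
    A = np.linalg.solve(XtWX + np.eye(p), XtWX + d * np.eye(p))
    return A @ beta_mle, d

# Hypothetical stand-ins for an ML fit.
X = np.column_stack([np.ones(6), np.linspace(-1.0, 1.0, 6)])
W_hat = np.diag(np.full(6, 0.2))
beta_mle = np.array([0.5, 1.5])
beta_liu, d = liu_estimator(X, W_hat, beta_mle)
```

By construction \(0 \le {\widehat{d}} < 1\), since the numerator is truncated at zero and \(\lambda _{max}>0\).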

2.3 Restricted Liu estimator

When prior information about the parameters is available in the form of linear restrictions, some parameters are not significant and should be eliminated from the model to improve estimation efficiency. Therefore, the following general hypothesis on \( \varvec{\beta } \) is defined

$$\begin{aligned} {{\textbf {H}}}_0 : {{\textbf {H}}} \varvec{\beta } = {{\textbf {h}}} \,\, vs \,\, {{\textbf {H}}}_1 : {{\textbf {H}}} \varvec{\beta } \ne {{\textbf {h}}} \end{aligned}$$
(11)

where \( {{\textbf {H}}} \) is a known \( p_2 \times p \) matrix of full row rank, \( p_2 \) is the number of non-significant parameters, and \( {{\textbf {h}}} \) is a known \( p_2 \times 1 \) vector. Then, based on Kibria and Saleh (2012), the restricted estimator of \({\varvec{\beta }}\), denoted by \( \widehat{\varvec{\beta }}_{\text {RMLE}}\), can be written as

$$\begin{aligned} \widehat{\varvec{\beta }}_{\text {RMLE}}= \widehat{\varvec{\beta }}- \varvec{\mathcal {I}}^{-1} {{\mathbf {H}}}^\prime \Big ( {{\mathbf {H}}}\, \varvec{\mathcal {I}} ^{-1} {{\textbf {H}}}^\prime \Big )^{-1} \Big ( {{\textbf {H}}} \, \widehat{\varvec{\beta }}- {{\textbf {h}}} \Big ) \end{aligned}$$
(12)

where \(\varvec{\mathcal {I}}^{-1}\) is the inverse of the Fisher’s information matrix given in the previous sub-section. In the presence of multicollinearity, following Kibria and Saleh (2012), a restricted Liu estimator in the BR model denoted by \(\widehat{\varvec{\beta }}^{RL}\) is defined as

$$\begin{aligned} \widehat{\varvec{\beta }}^{RL}= ( {{\mathbf {X}}}^\prime \widehat{{{\mathbf {W}}}}{{\mathbf {X}}}+ {{\mathbf {I}}}_p )^{-1} ({{\mathbf {X}}}^\prime \widehat{{{\mathbf {W}}}}{{\mathbf {X}}}+d{{\mathbf {I}}}_p)\widehat{\varvec{\beta }}_{\text {RMLE}}. \end{aligned}$$
(13)
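The restricted estimator (12) is a linear correction of \(\widehat{\varvec{\beta }}\) toward the subspace \({{\textbf {H}}}\varvec{\beta }={{\textbf {h}}}\). The minimal sketch below (with a hypothetical inverse information matrix and a restriction setting the last two of four coefficients to zero) illustrates that the result satisfies the restriction exactly:

```python
import numpy as np

def restricted_mle(beta_hat, I_inv, H, h):
    """Restricted estimator (12): correct beta_hat toward H beta = h."""
    Z = I_inv @ H.T @ np.linalg.inv(H @ I_inv @ H.T)   # I^{-1} H'(H I^{-1} H')^{-1}
    return beta_hat - Z @ (H @ beta_hat - h)

# Hypothetical stand-ins: diagonal inverse Fisher information, zero restriction.
I_inv = np.diag([1.0, 2.0, 0.5, 1.5])
H = np.array([[0., 0., 1., 0.],
              [0., 0., 0., 1.]])
h = np.zeros(2)
beta_hat = np.array([0.8, -0.3, 0.2, -0.1])
beta_rmle = restricted_mle(beta_hat, I_inv, H, h)      # satisfies H @ beta_rmle = h
```

The restricted Liu estimator (13) is then obtained by pre-multiplying `beta_rmle` by the Liu matrix \(( {{\mathbf {X}}}^\prime \widehat{{{\mathbf {W}}}}{{\mathbf {X}}}+ {{\mathbf {I}}}_p )^{-1} ({{\mathbf {X}}}^\prime \widehat{{{\mathbf {W}}}}{{\mathbf {X}}}+d{{\mathbf {I}}}_p)\).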

The test statistic for testing the null hypothesis given in (11) is defined as

$$\begin{aligned} T_n = n \, ({{\textbf {H}}} \widehat{\varvec{\beta }}^{UR}- {{\textbf {h}}})^\prime \Big ( {{\textbf {H}}} \ ( \frac{1}{n} \varvec{\mathcal {I}} )^{-1} {{\textbf {H}}}^\prime \Big )^{-1} ({{\textbf {H}}} \widehat{\varvec{\beta }}^{UR}- {{\textbf {h}}}). \end{aligned}$$
(14)

As \( n \rightarrow \infty \), the above test statistic has an asymptotic chi-square distribution with \( p_2 \) degrees of freedom under the null hypothesis.
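The test statistic (14) is a Wald-type quadratic form; a minimal sketch with hypothetical inputs, compared against the \(\chi ^2_{p_2}\) critical value:

```python
import numpy as np
from scipy import stats

def wald_statistic(beta_ur, I_mat, H, h, n):
    """Test statistic (14) for H0: H beta = h."""
    r = H @ beta_ur - h
    middle = H @ np.linalg.inv(I_mat / n) @ H.T     # H (I/n)^{-1} H'
    return n * r @ np.linalg.solve(middle, r)

# Hypothetical example: with I = n * I_p and an orthonormal row of H,
# the statistic reduces to n * ||H beta_ur - h||^2.
n, p = 100, 3
I_mat = n * np.eye(p)
H = np.array([[0., 0., 1.]])
h = np.zeros(1)
beta_ur = np.array([0.4, -0.2, 0.05])
Tn = wald_statistic(beta_ur, I_mat, H, h, n)        # here n * 0.05**2 = 0.25
reject = Tn > stats.chi2.ppf(0.95, df=H.shape[0])   # chi^2_{p2} critical value
```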

2.4 Improved estimators

Now, by applying the shrinkage strategies developed in, for example, Asar and Korkmaz (2022), Lisawadi et al. (2020), Hossain et al. (2018), and Hossain and Ahmed (2012), we propose improved shrinkage estimators based on the unrestricted and restricted Liu estimators.

2.5 Liu linear shrinkage estimator

We denote the Liu linear shrinkage estimator of \( \varvec{\beta }\) by \(\widehat{\varvec{\beta }}^{LLS}\) and define it as follows:

$$\begin{aligned} \widehat{\varvec{\beta }}^{LLS}= \delta \, \widehat{\varvec{\beta }}^{RL}+ (1 - \delta ) \, \widehat{\varvec{\beta }}^{UR}, \end{aligned}$$
(15)

where \( 0 \le \delta \le 1 \) reflects the level of confidence in the prior information and can be specified by the researcher. If no such prior confidence is available, one can estimate the optimal value of \(\delta \) by minimizing the mean squared error of \(\widehat{\varvec{\beta }}^{LLS}\) with respect to \(\delta \) (see Online Appendix 0), as follows:

$$\begin{aligned} \delta _{optimal} = \dfrac{ {{\textbf {B}}} \, E \Big [ \widehat{\varvec{\beta }}\, \widehat{\varvec{\beta }}^\prime \Big ] \, {{\textbf {H}}}^\prime \, \varvec{\mathcal {Z}}^\prime \, {{\textbf {B}}}^\prime {{\textbf {B}}} \, E \Big [ \widehat{\varvec{\beta }}\Big ] \, {{\textbf {h}}}^\prime \, \varvec{\mathcal {Z}}^\prime \, {{\textbf {B}}}^\prime - \varvec{\beta } \, E \Big [ \widehat{\varvec{\beta }}^\prime \Big ] \, {{\textbf {H}}}^\prime \, \varvec{\mathcal {Z}}^\prime \, {{\textbf {A}}}^\prime + \varvec{\beta } \, {{\textbf {h}}}^\prime \, \varvec{\mathcal {Z}}^\prime \, {{\textbf {B}}}^\prime }{ {{\textbf {B}}}\, \varvec{ \Sigma } \big ( \widehat{\varvec{\beta }}\big )\, {{\textbf {B}}}^\prime \Big [ {{\textbf {I}}}_p - \big ( {{\textbf {I}}}_p - \varvec{\mathcal {Z}} \,{{\textbf {H}}} \, {{\textbf {H}}}^\prime \, \varvec{\mathcal {Z}}^\prime \big ) \, \big ( {{\textbf {B}}} \,{{\textbf {B}}}^\prime - 2 \, {{\textbf {I}}}_p \big ) \Big ] } ,\nonumber \\ \end{aligned}$$
(16)

in which \( {{\textbf {B}}} = ( {{\textbf {X}}}^\prime \widehat{{{\mathbf {W}}}}{{\textbf {X}}} + \, {{\textbf {I}}}_p )^{-1} ( {{\textbf {X}}}^\prime \widehat{{{\mathbf {W}}}}{{\textbf {X}}} + d \, {{\textbf {I}}}_p ) \), \( \varvec{\mathcal {Z}} = \varvec{\mathcal {I}}^{-1} {{\textbf {H}}}^\prime \Big ( {{\textbf {H}}}\, \varvec{\mathcal {I}} ^{-1} {{\textbf {H}}}^\prime \Big )^{-1} \) and \(\varvec{\Sigma }(.) \) is the variance–covariance matrix. Clearly, \(\delta _{optimal}\) depends on the unknown value of \(\varvec{\beta }\); in practice, we recommend replacing \(\varvec{\beta }\) with its estimate.

2.6 Liu pretest estimator

The Liu pretest estimator of \( \varvec{\beta } \) denoted by \( \widehat{\varvec{\beta }}^{LPT}\) has the following form

$$\begin{aligned} \widehat{\varvec{\beta }}^{LPT}= \widehat{\varvec{\beta }}^{UR}- (\widehat{\varvec{\beta }}^{UR}- \widehat{\varvec{\beta }}^{RL}) \, I( T_n \le T_{n,\alpha }) , \end{aligned}$$
(17)

where I(.) is an indicator function and \( T_{n,\alpha } \) is the upper \( \alpha \)-level critical value of the distribution of the test statistic \( T_n \). The Liu pretest estimator selects between two estimators: if \( {{\textbf {H}}}_0 : {{\textbf {H}}} \varvec{\beta } = {{\textbf {h}}} \) is not rejected, then \( \widehat{\varvec{\beta }}^{LPT}= \widehat{\varvec{\beta }}^{RL}\); otherwise, \( \widehat{\varvec{\beta }}^{LPT}= \widehat{\varvec{\beta }}^{UR}\).

2.7 Liu shrinkage pretest estimator

The Liu shrinkage pretest estimator of \( \varvec{\beta } \), denoted by \( \widehat{\varvec{\beta }}^{LSPE}\), is defined as

$$\begin{aligned} \widehat{\varvec{\beta }}^{LSPE}= \widehat{\varvec{\beta }}^{UR}- \delta \,(\widehat{\varvec{\beta }}^{UR}- \widehat{\varvec{\beta }}^{RL}) \, I( T_n \le T_{n,\alpha }), \end{aligned}$$
(18)

Note that \( \widehat{\varvec{\beta }}^{LSPE}\) is more efficient than \( \widehat{\varvec{\beta }}^{LPT}\) in many parts of the parameter space.

2.8 Liu Stein estimator

The Liu Stein estimator of \( \varvec{\beta } \), denoted by \( \widehat{\varvec{\beta }}^{LS}\), combines the Liu unrestricted and Liu restricted estimators in an optimal way so as to dominate the Liu unrestricted estimator. It is defined as

$$\begin{aligned} \widehat{\varvec{\beta }}^{LS}= \widehat{\varvec{\beta }}^{RL}+ [ 1 - (p_2 - 2)\, T^{-1}_n ] \Big (\widehat{\varvec{\beta }}^{UR}- \widehat{\varvec{\beta }}^{RL}\Big ) ,\qquad p_2\ge 3. \end{aligned}$$
(19)

2.9 Liu positive Stein estimator

The Liu positive Stein estimator of \( \varvec{\beta } \) denoted by \( \widehat{\varvec{\beta }}^{LPS}\) is defined as

$$\begin{aligned} \widehat{\varvec{\beta }}^{LPS}=\widehat{\varvec{\beta }}^{RL}+ [ 1 - (p_2 - 2)\, T^{-1}_n ]^{+} \Big (\widehat{\varvec{\beta }}^{UR}- \widehat{\varvec{\beta }}^{RL}\Big ) ,\qquad p_2\ge 3 , \end{aligned}$$
(20)

where \( z^{+} = \max (0 , z) \). The \( \widehat{\varvec{\beta }}^{LPS}\) controls for the over-shrinking problem in the Liu Stein estimator. For more on Liu-type Stein estimators, we refer to Kibria (2012), among others.
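The Stein rule (19) and its positive part (20) differ only in truncating the shrinkage factor at zero. A minimal sketch with hypothetical estimates illustrates the over-shrinking that the positive-part rule removes:

```python
import numpy as np

def liu_stein(beta_ur, beta_rl, Tn, p2):
    """Liu Stein (19) and Liu positive Stein (20) estimators; requires p2 >= 3."""
    assert p2 >= 3
    shrink = 1.0 - (p2 - 2) / Tn
    diff = beta_ur - beta_rl
    beta_ls = beta_rl + shrink * diff
    beta_lps = beta_rl + max(0.0, shrink) * diff   # positive-part correction
    return beta_ls, beta_lps

# Hypothetical unrestricted and restricted Liu estimates.
beta_ur = np.array([0.8, -0.3, 0.2, -0.1])
beta_rl = np.array([0.8, -0.3, 0.0, 0.0])
# Small Tn (< p2 - 2): the Stein factor is negative, so LS shrinks past the
# restricted estimator, while LPS truncates the factor and returns RL.
ls, lps = liu_stein(beta_ur, beta_rl, Tn=0.5, p2=3)
```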

3 Asymptotic properties

In this section, we provide the asymptotic properties of the Liu shrinkage estimators introduced in Sect. 2. To explore these properties when the subspace information \({{\textbf {H}}}\varvec{\beta } = {{\textbf {h}}} \) does not hold exactly, we consider the sequence of local alternatives

$$\begin{aligned} \mathcal {K}_{(n)} \, : \, {{\textbf {H}}}\varvec{\beta } = {{\textbf {h}}} + \frac{\varvec{\vartheta }}{\sqrt{n}} , \end{aligned}$$
(21)

where \( \varvec{\vartheta } = (\vartheta _1, \vartheta _2, \ldots , \vartheta _{p_2})^\prime \in {\mathbb {R}}^{p_2} \) is a \( p_2 \times 1 \) vector of fixed values. To compare the estimators, we compute the asymptotic distributional bias \( ( \mathcal {B}) \) and the asymptotic distributional variance \( (\mathcal {V}) \) of the proposed estimators.

Suppose \( \widehat{{\varvec{\beta }}} \) is any of the proposed estimators of \( {\varvec{\beta }}\). The asymptotic distributional bias of \( \widehat{{\varvec{\beta }}} \) is defined as

$$\begin{aligned} \mathcal {B} ( \widehat{{\varvec{\beta }}}) = \lim _{n \rightarrow \infty } E \Big ( \sqrt{n} ( \widehat{{\varvec{\beta }}} - \varvec{\beta } ) \Big ). \end{aligned}$$
(22)

Also, the asymptotic distributional variance of \( \widehat{{\varvec{\beta }}} \) is defined as

$$\begin{aligned} \mathcal {V} ( \widehat{{\varvec{\beta }}}) = \lim _{n \rightarrow \infty } E \Big ( \sqrt{n} ( \widehat{{\varvec{\beta }}} - \varvec{\beta } ) \, \sqrt{n} ( \widehat{{\varvec{\beta }}} - \varvec{\beta } ) ^\prime \Big ) \end{aligned}$$
(23)

We present the following lemma, which is useful for computing the asymptotic results of the proposed estimators.

Lemma 3.1

Under the sequence of local alternatives \( \lbrace \mathcal {K}_{(n)} \rbrace \) given in (21) and the usual regularity conditions of MLE, as \( n \rightarrow \infty \)

$$\begin{aligned} \varvec{\kappa }^n_1&= \sqrt{n} ( \widehat{\varvec{\beta }}^{UR}- \varvec{\beta } ) \xrightarrow { D } \varvec{\kappa }_1 \sim \mathcal {N}_{p} \Big ( ( {{\textbf {B}}} - {\mathbf {I}}_p )\, \varvec{\beta } \, ,\, {{\textbf {B}}} \varvec{\mathcal {I}}^{-1} {{\textbf {B}}}^\prime \Big ) \, , \, \\ \varvec{\kappa }^n_2&= \sqrt{n} ( {{\mathbf {H}}}\widehat{\varvec{\beta }}^{UR}- {\mathbf {h}} ) \xrightarrow { D } \varvec{\kappa }_2 \sim \mathcal {N}_{p_2} \Big ( {\mathbf {H}} ({{\textbf {B}}} - {\mathbf {I}}_p) \varvec{\beta } + \varvec{\vartheta } \, ,\, {\mathbf {H}} \, {{\textbf {B}}} \, \varvec{\mathcal {I}}^{-1} {{\textbf {B}}}^\prime {\mathbf {H}}^\prime \Big ) , \\ \varvec{\kappa }^n_3&= \sqrt{n} ( \widehat{\varvec{\beta }}^{RL}- \varvec{\beta } ) \xrightarrow { D } \varvec{\kappa }_3 \sim \mathcal {N}_{p} \Big ( ({\mathbf {I}}_p - \varvec{\mathcal {Z}} {{\textbf {H}}}) ({{\textbf {B}}} - {\mathbf {I}}_p) \varvec{\beta } - \varvec{\mathcal {Z}} \varvec{\vartheta } \, , \, {{\textbf {B}}}\, \varvec{\mathcal {I}}^{-1}{{\textbf {B}}}^\prime - \varvec{\mathcal {Z}} {\mathbf {H}} \, {{\textbf {B}}}\, \varvec{\mathcal {I}}^{-1}{{\textbf {B}}}^\prime \Big ) , \\ \varvec{\kappa }^n_4&= \sqrt{n} ( \widehat{\varvec{\beta }}^{UR}- \widehat{\varvec{\beta }}^{RL}) \xrightarrow { D } \varvec{\kappa }_4 \sim \mathcal {N}_{p} \Big ( \varvec{\mathcal {Z}} [ {\mathbf {H}} \, ({{\textbf {B}}} - {\mathbf {I}}_p) \varvec{\beta } + \varvec{\vartheta } ] \, , \, \varvec{\mathcal {Z}} {\mathbf {H}} {{\textbf {B}}}\, \varvec{\mathcal {I}}^{-1}{{\textbf {B}}}^\prime \Big ) , \\ \begin{pmatrix} \varvec{\kappa }^n_1 \\ \varvec{\kappa }^n_4 \end{pmatrix}&\xrightarrow { D } \begin{pmatrix} \varvec{\kappa }_1 \\ \varvec{\kappa }_4 \end{pmatrix} \sim \mathcal {N}_{2p} \Bigg [ \begin{pmatrix} ({{\textbf {B}}} - {\mathbf {I}}_p)\, \varvec{\beta } \\ \varvec{\mathcal {Z}} {{\textbf {H}}} ({{\textbf {B}}} - {\mathbf {I}}_p)\, \varvec{\beta } + \varvec{\mathcal {Z}} \varvec{\vartheta } \end{pmatrix} , \begin{pmatrix} {{\textbf {B}}}\, \varvec{\mathcal {I}}^{-1}{{\textbf {B}}}^\prime &{} \varvec{\mathcal {Z}} {\mathbf {H}} {{\textbf {B}}}\, \varvec{\mathcal {I}}^{-1}{{\textbf {B}}}^\prime \\ \varvec{\mathcal {Z}} {\mathbf {H}} {{\textbf {B}}}\, \varvec{\mathcal {I}}^{-1}{{\textbf {B}}}^\prime &{} \varvec{\mathcal {Z}} {\mathbf {H}} \, {{\textbf {B}}}\, \varvec{\mathcal {I}}^{-1}{{\textbf {B}}}^\prime \end{pmatrix} \Bigg ], \\ \begin{pmatrix} \varvec{\kappa }^n_3 \\ \varvec{\kappa }^n_4 \end{pmatrix}&\xrightarrow { D } \begin{pmatrix} \varvec{\kappa }_3 \\ \varvec{\kappa }_4 \end{pmatrix} \sim \mathcal {N}_{2p} \Bigg [ \begin{pmatrix} ( {{\textbf {I}}}_p - \varvec{\mathcal {Z}} {\mathbf {H}} ) \,( {{\textbf {B}}} - {\mathbf {I}}_p )\,\varvec{\beta } - \varvec{\mathcal {Z}} \varvec{\vartheta } \\ \varvec{\mathcal {Z}} {\mathbf {H}}\,( {{\textbf {B}}} - {\mathbf {I}}_p ) \varvec{\beta } + \varvec{\mathcal {Z}} \varvec{\vartheta } \end{pmatrix} , \begin{pmatrix} {{\textbf {B}}}\, \varvec{\mathcal {I}}^{-1}{{\textbf {B}}}^\prime - \varvec{\mathcal {Z}} {\mathbf {H}} \, {{\textbf {B}}}\, \varvec{\mathcal {I}}^{-1}{{\textbf {B}}}^\prime &{} {{\textbf {0}}}\\ {{\textbf {0}}} &{} \varvec{\mathcal {Z}} {\mathbf {H}} \, {{\textbf {B}}}\, \varvec{\mathcal {I}}^{-1}{{\textbf {B}}}^\prime \end{pmatrix} \Bigg ], \end{aligned}$$

where \( \varvec{\mathcal {Z}} = \varvec{\mathcal {I}}^{-1} {{\textbf {H}}}^\prime ( {{\textbf {H}}}\, \varvec{\mathcal {I}}^{-1} {{\textbf {H}}}^\prime )^{-1} \), \( {{\textbf {B}}} = ( {{\mathbf {X}}}^\prime \widehat{{{\mathbf {W}}}}{{\mathbf {X}}}+ {{\mathbf {I}}}_p )^{-1} ({{\mathbf {X}}}^\prime \widehat{{{\mathbf {W}}}}{{\mathbf {X}}}+d\,{{\mathbf {I}}}_p) \) as in (10), \( {{\textbf {I}}}_p \) is the identity matrix of order p, and \( \varvec{\mathcal {I}}^{-1} \) is the inverse of the Fisher information matrix given after Eq. (8).

Proof

See Online Appendix 1.

Using Lemma 3.1, we present the asymptotic properties of the Liu shrinkage estimators in the following theorems.

Theorem 3.2

Under the sequence of local alternatives given in (21) and the usual regularity conditions, the asymptotic distributional biases of the proposed estimators are as follows

$$\begin{aligned} \mathcal {B} ( \widehat{\varvec{\beta }}^{UR})&= \, ( {{\textbf {B}}} - {\mathbf {I}}_p )\,\varvec{\beta } , \\ \mathcal {B} ( \widehat{\varvec{\beta }}^{RL})&= ( {{\textbf {I}}}_p - \varvec{\mathcal {Z}} \, {\mathbf {H}} ) ( {{\textbf {B}}} - {{\textbf {I}}}_p ) \,\varvec{\beta } - \varvec{\mathcal {Z}} \, \varvec{\vartheta } , \\ \mathcal {B} ( \widehat{\varvec{\beta }}^{LLS})&= \mathcal {B} ( \widehat{\varvec{\beta }}^{UR}) - \delta \,\varvec{\mathcal {Z}}\, [ {{\textbf {H}}} ( {{\textbf {B}}} - {{\textbf {I}}}_p ) \,\varvec{\beta } + \varvec{\vartheta } ] , \\ \mathcal {B} ( \widehat{\varvec{\beta }}^{LPT})&= \mathcal {B} ( \widehat{\varvec{\beta }}^{UR}) - \varvec{\mathcal {Z}}\, [ {{\textbf {H}}} ( {{\textbf {B}}} - {{\textbf {I}}}_p ) \,\varvec{\beta } + \varvec{\vartheta } ]\,\varvec{ \Psi }_{p_2 + 2} ( \chi ^2_{p_2, \alpha } ; \Delta ^{*}) , \\ \ \ \ \mathcal {B} ( \widehat{\varvec{\beta }}^{LSPE})&= \mathcal {B} ( \widehat{\varvec{\beta }}^{UR}) - \delta \,\varvec{\mathcal {Z}}\, [ {{\textbf {H}}} ( {{\textbf {B}}} - {{\textbf {I}}}_p ) \,\varvec{\beta } + \varvec{\vartheta } ] \,\varvec{ \Psi }_{p_2 + 2} ( \chi ^2_{p_2, \alpha } ; \Delta ^{*}) , \\ \mathcal {B} ( \widehat{\varvec{\beta }}^{LS})&= \mathcal {B} ( \widehat{\varvec{\beta }}^{UR}) - ( p_2 - 2 ) \, \varvec{\mathcal {Z}} \ [ {{\textbf {H}}} \,( {{\textbf {B}}} - {{\textbf {I}}}_p )\, \varvec{\beta } + \varvec{\vartheta } ] \, E \Big [ \frac{1}{\chi ^2_{ p_2 + 2 } ( \Delta ^{*} )} \Big ] , \\ \ \ \mathcal {B} ( \widehat{\varvec{\beta }}^{LPS})&= \mathcal {B} ( \widehat{\varvec{\beta }}^{LS}) - \, \varvec{\mathcal {Z}}\, [ {{\textbf {H}}} ( {{\textbf {B}}} - {{\textbf {I}}}_p ) \,\varvec{\beta } + \varvec{\vartheta } ] \bigg \lbrace \varvec{\Psi }_{p_2 + 2}(\chi ^2_{p_2 ,\alpha } ; \Delta ^{*} ) + \\&\qquad \qquad \qquad \qquad ( p_2 - 2 ) \, E \biggl [ \dfrac{ I ( \chi ^2_{p_2 + 2} (\Delta ^{*}) < p_2 - 2 ) }{ \chi ^2_{p_2 + 2} ( \Delta ^{*} ) } \biggl ] \bigg \rbrace . 
\end{aligned}$$

where \( \varvec{\Psi }_{v}(. ; \Delta ^{*} ) \) is the cumulative distribution function of the \( \chi ^2_{v}( \Delta ^{*} ) \) distribution and \( \Delta ^{*} = \varvec{\vartheta }^\prime ( {{\textbf {H}}} \, \varvec{\mathcal {I}}^{-1} {{\textbf {H}}}^\prime )^{-1} \varvec{\vartheta } \) is the non-centrality parameter.
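The quantities \( \varvec{\Psi }_{p_2+2}(\chi ^2_{p_2,\alpha }; \Delta ^{*}) \) and \( E[1/\chi ^2_{p_2+2}(\Delta ^{*})] \) appearing in these expressions can be computed with the non-central chi-square distribution. A minimal sketch with hypothetical values of \(p_2\), \(\alpha \) and \(\Delta ^{*}\):

```python
import numpy as np
from scipy import stats

# Hypothetical values chosen only for illustration.
p2, alpha, Delta = 4, 0.05, 2.5
crit = stats.chi2.ppf(1.0 - alpha, df=p2)      # chi^2_{p2, alpha}

# Psi_{p2+2}(chi^2_{p2,alpha}; Delta*): non-central chi-square CDF.
Psi = stats.ncx2.cdf(crit, df=p2 + 2, nc=Delta)

# E[1 / chi^2_{p2+2}(Delta*)] approximated by Monte Carlo
# (finite since p2 + 2 > 2).
rng = np.random.default_rng(1)
draws = stats.ncx2.rvs(df=p2 + 2, nc=Delta, size=200_000, random_state=rng)
E_inv = np.mean(1.0 / draws)
```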

Proof

See Online Appendix 2.

Theorem 3.3

Under the local alternatives given in (21) and the usual regularity conditions, the asymptotic distributional variances of the estimators are as follows

$$\begin{aligned} \mathcal {V} ( \widehat{\varvec{\beta }}^{UR}) =&\, {{\textbf {B}}} \varvec{\mathcal {I}}^{-1} {{\textbf {B}}}^\prime + \bigg [ ( {{\textbf {B}}} - {\mathbf {I}}_p ) \,\varvec{\beta } \bigg ] \, \bigg [ ( {{\textbf {B}}} - {\mathbf {I}}_p ) \, \varvec{\beta } \bigg ]^\prime , \\ \mathcal {V} ( \widehat{\varvec{\beta }}^{RL}) =&\, {{\textbf {B}}} \, \varvec{\mathcal {I}}^{-1} \, {{\textbf {B}}}^\prime - \varvec{\mathcal {Z}} \, {\mathbf {H}} \, {{\textbf {B}}} \, \varvec{\mathcal {I}}^{-1} \, {{\textbf {B}}}^\prime \\&\,+ \bigg [ ( {\mathbf {I}}_p - \varvec{\mathcal {Z}} \, {\mathbf {H}} ) \, ( {{\textbf {B}}} - {{\textbf {I}}}_p ) \, \varvec{\beta } - \varvec{\mathcal {Z}} \,\varvec{\vartheta } \bigg ] \, \bigg [ ( {\mathbf {I}}_p - \varvec{\mathcal {Z}} \, {\mathbf {H}} ) \, ( {{\textbf {B}}} - {{\textbf {I}}}_p ) \, \varvec{\beta } - \varvec{\mathcal {Z}} \,\varvec{\vartheta } \bigg ]^\prime , \end{aligned}$$
$$\begin{aligned} \mathcal {V} ( \widehat{\varvec{\beta }}^{LLS}) =&\,\mathcal {V} ( \widehat{\varvec{\beta }}^{UR}) \\&- 2 \delta \, \big \lbrace ( {{\textbf {I}}}_p - \varvec{\mathcal {Z}}\, {{\textbf {H}}} ) \, ( {{\textbf {B}}} - {{\textbf {I}}}_p ) \, \varvec{\beta } ) - \varvec{\mathcal {Z}} \, \varvec{\vartheta } \big \rbrace \big \lbrace \varvec{\mathcal {Z}}\, {{\textbf {H}}}\,( {{\textbf {B}}} - {{\textbf {I}}}_p ) \, \varvec{\beta } + \varvec{\mathcal {Z}} \, \varvec{\vartheta } \big \rbrace ^\prime \\&- \delta \, ( 2 - \delta ) \, \big \lbrace \varvec{\mathcal {Z}}\, {{\textbf {H}}}\, {{\textbf {B}}} \, \varvec{\mathcal {I}}^{-1} \, {{\textbf {B}}}^\prime + [ \varvec{\mathcal {Z}}\, {{\textbf {H}}}\,( {{\textbf {B}}} - {{\textbf {I}}}_p ) \, \varvec{\beta } + \varvec{\mathcal {Z}} \, \varvec{\vartheta } ] \\&\times [ \varvec{\mathcal {Z}}\, {{\textbf {H}}}\,( {{\textbf {B}}} - {{\textbf {I}}}_p ) \, \varvec{\beta } + \varvec{\mathcal {Z}} \, \varvec{\vartheta } ]^\prime \big \rbrace , \end{aligned}$$
$$\begin{aligned} \mathcal {V} ( \widehat{\varvec{\beta }}^{LPT}) =&\mathcal {V} ( \widehat{\varvec{\beta }}^{UR}) - 2 \, \bigg ( \bigg \lbrace ( {{\textbf {I}}}_p - \varvec{\mathcal {Z}} \, {{\textbf {H}}} )\, ( {{\textbf {B}}} - {{\textbf {I}}}_p ) \, \varvec{\beta } - \varvec{\mathcal {Z}} \, \varvec{\vartheta } \bigg \rbrace \\&\times [ \varvec{\mathcal {Z}} \, {{\textbf {H}}} \, ( {{\textbf {B}}} - {{\textbf {I}}}_p ) \, \varvec{\beta } + \varvec{\mathcal {Z}} \, \varvec{\vartheta } ]^\prime \, \Psi _{p_2 + 2} ( \chi ^2_{p_2, \alpha } ; \Delta ^{*} ) \bigg ) \\&- \bigg ( \varvec{\mathcal {Z}} \, {{\textbf {H}}} \, {{\textbf {B}}} \, \varvec{\mathcal {I}}^{-1} \, {{\textbf {B}}}^\prime \, \Psi _{p_2 + 2} ( \chi ^2_{p_2, \alpha } ; \Delta ^{*} ) + [ \varvec{\mathcal {Z}} \, {{\textbf {H}}} \, ( {{\textbf {B}}} - {{\textbf {I}}}_p ) \, \varvec{\beta } + \varvec{\mathcal {Z}} \, \varvec{\vartheta } ] \\&\times [ \varvec{\mathcal {Z}} \, {{\textbf {H}}} \, ( {{\textbf {B}}} - {{\textbf {I}}}_p ) \, \varvec{\beta } + \varvec{\mathcal {Z}} \, \varvec{\vartheta } ]^\prime \Psi _{p_2 + 4} ( \chi ^2_{p_2, \alpha } ; \Delta ^{*} ) \bigg ) , \end{aligned}$$
$$\begin{aligned} \mathcal {V} ( \widehat{\varvec{\beta }}^{LSPE}) =&\mathcal {V} ( \widehat{\varvec{\beta }}^{UR}) - 2 \delta \, \bigg ( \lbrace ( {{\textbf {I}}}_p - \varvec{\mathcal {Z}} \, {{\textbf {H}}} )\, ( {{\textbf {B}}} - {{\textbf {I}}}_p ) \, \varvec{\beta } - \varvec{\mathcal {Z}} \, \varvec{\vartheta } \rbrace \\&\times [ \varvec{\mathcal {Z}} \, {{\textbf {H}}} \, ( {{\textbf {B}}} - {{\textbf {I}}}_p ) \, \varvec{\beta } + \varvec{\mathcal {Z}} \, \varvec{\vartheta } ]^\prime \, \Psi _{p_2 + 2} ( \chi ^2_{p_2, \alpha } ; \Delta ^{*} ) \bigg ) \\&- \delta \, ( 2 - \delta ) \, \bigg ( \varvec{\mathcal {Z}} \, {{\textbf {H}}} \, {{\textbf {B}}} \, \varvec{\mathcal {I}}^{-1} \, {{\textbf {B}}}^\prime \, \Psi _{p_2 + 2} ( \chi ^2_{p_2, \alpha } ; \Delta ^{*} ) + [ \varvec{\mathcal {Z}} \, {{\textbf {H}}} \, ( {{\textbf {B}}} - {{\textbf {I}}}_p ) \, \varvec{\beta } + \varvec{\mathcal {Z}} \, \varvec{\vartheta } ] \\&\times [ \varvec{\mathcal {Z}} \, {{\textbf {H}}} \, ( {{\textbf {B}}} - {{\textbf {I}}}_p ) \, \varvec{\beta } + \varvec{\mathcal {Z}} \, \varvec{\vartheta } ]^\prime \Psi _{p_2 + 4} ( \chi ^2_{p_2, \alpha } ; \Delta ^{*} ) \bigg ) , \end{aligned}$$
$$\begin{aligned} \mathcal {V} ( \widehat{\varvec{\beta }}^{LS}) =&\mathcal {V} ( \widehat{\varvec{\beta }}^{UR}) \\&- 2 \, ( p_2 - 2 ) \, \bigg ( \big [ ( {{\textbf {I}}}_p - \varvec{\mathcal {Z}} \, {{\textbf {H}}} ) \, ( {{\textbf {B}}} - {{\textbf {I}}}_p ) \,\varvec{\beta } - \varvec{\mathcal {Z}} \, \varvec{\vartheta } \big ]\, \\&\times \big [ \varvec{\mathcal {Z}} \, {{\textbf {H}}} \, ( {{\textbf {B}}} - {{\textbf {I}}}_p ) \, \varvec{\beta } + \varvec{\mathcal {Z}} \, \varvec{\vartheta } \big ]^\prime \, E \Big [ \frac{1}{\chi ^2_{p_2 + 2}(\Delta ^{*})} \Big ] \bigg ) \\&+ ( p_2 - 2 ) \, ( p_2 - 4 )\, \varvec{\mathcal {Z}} \, {{\textbf {H}}} \, {{\textbf {B}}} \, \varvec{\mathcal {I}}^{-1} \, {{\textbf {B}}}^\prime \, \bigg ( E \Big [ \frac{1}{(\chi ^2_{p_2 + 2}(\Delta ^{*}))^2} \Big ] - E \Big [ \frac{1}{\chi ^2_{p_2 + 2}(\Delta ^{*})} \Big ] \bigg ) \\&+ ( p_2 - 2 ) \, ( p_2 - 4 )\, [ \varvec{\mathcal {Z}} \, {{\textbf {H}}} \, ( {{\textbf {B}}} - {{\textbf {I}}}_p ) \,\varvec{\beta } + \varvec{\mathcal {Z}} \, \varvec{\vartheta } ] \, [ \varvec{\mathcal {Z}} \, {{\textbf {H}}} \, ( {{\textbf {B}}} - {{\textbf {I}}}_p ) \,\varvec{\beta } + \varvec{\mathcal {Z}} \, \varvec{\vartheta } ]^\prime \\&\times \bigg ( E \Big [ \frac{1}{(\chi ^2_{p_2 + 4}(\Delta ^{*}))^2} \Big ] - E \Big [ \frac{1}{\chi ^2_{p_2 + 4}(\Delta ^{*})} \Big ] \bigg ) , \end{aligned}$$
$$\begin{aligned} \mathcal {V} ( \widehat{\varvec{\beta }}^{LPS}) =&\mathcal {V} ( \widehat{\varvec{\beta }}^{LS}) \\&- 2\, \bigg ( \lbrace ( {{\textbf {I}}}_p - \varvec{\mathcal {Z}} \, {{\textbf {H}}} ) \, ( {{\textbf {B}}} - {{\textbf {I}}}_p ) \,\varvec{\beta } - \varvec{\mathcal {Z}} \, \varvec{\vartheta } \rbrace \, [ \varvec{\mathcal {Z}} \, {{\textbf {H}}} \, ( {{\textbf {B}}} - {{\textbf {I}}}_p ) \,\varvec{\beta } + \varvec{\mathcal {Z}} \, \varvec{\vartheta } ]^\prime \, \\&\times E \Big [ \Big (1 - \frac{p_2 - 2}{\chi ^2_{p_2 + 2}(\Delta ^{*})} \Big ) \, I ( \chi ^2_{p_2 + 2}(\Delta ^{*})< p_2 - 2 )\Big ] \bigg ) \\&- \bigg ( \varvec{\mathcal {Z}} \, {{\textbf {H}}} \, {{\textbf {B}}} \, \varvec{\mathcal {I}}^{-1}\, {{\textbf {B}}}^\prime \, E\Big [ \Big ( 1 - \frac{p_2 -2}{\chi ^2_{p_2 + 2}(\Delta ^{*})} \Big )^2 \, I ( \chi ^2_{p_2 + 2}(\Delta ^{*})< p_2 - 2 ) \Big ] \\&+ [ \varvec{\mathcal {Z}} \, {{\textbf {H}}} \, ( {{\textbf {B}}} - {{\textbf {I}}}_p ) \,\varvec{\beta } + \varvec{\mathcal {Z}} \, \varvec{\vartheta } ] \, [ \varvec{\mathcal {Z}} \, {{\textbf {H}}} \, ( {{\textbf {B}}} - {{\textbf {I}}}_p ) \,\varvec{\beta } + \varvec{\mathcal {Z}} \, \varvec{\vartheta } ]^\prime \\&\times E\Big [ \Big ( 1 - \frac{p_2 -2}{\chi ^2_{p_2 + 4}(\Delta ^{*})} \Big )^2 \, I ( \chi ^2_{p_2 + 4}(\Delta ^{*}) < p_2 - 2 ) \Big ] \bigg ) . \end{aligned}$$

Proof

See Online Appendix 3.

4 Some asymptotic evaluations of the variance of the proposed estimators

In this section, we compare the asymptotic distributional variances of the seven estimators discussed in Sect. 3. The following definition is very helpful for comparison purposes.

Definition 4.1

Let \(\mathcal {B}\) be the parameter space of \(\varvec{\beta }\). If two estimators \(\hat{\varvec{\beta }}^{*}\) and \(\hat{\varvec{\beta }}^{**}\) are such that \(\mathcal {V} (\hat{\varvec{\beta }}^{*})\le \mathcal {V} (\hat{\varvec{\beta }}^{**})\) for all values of \(\varvec{\beta }\in \mathcal {B}\), with strict inequality for at least one \(\varvec{\beta }\), we say that \(\hat{\varvec{\beta }}^{*}\) dominates \(\hat{\varvec{\beta }}^{**}\).

1. \(\varvec{\beta }^{RL}\) is superior to \(\varvec{\beta }^{UR}\) if:

$$\begin{aligned} \mathcal {V} (\varvec{\beta }^{RL}) -\mathcal {V} (\varvec{\beta }^{UR})=&-\varvec{\mathcal {Z}H}\varvec{B}\varvec{\mathcal {I}}^{-1}\varvec{B}^{\prime }-\Big [\{(\varvec{B}-\varvec{I}_p)\varvec{\beta }\}\big (\varvec{\mathcal {Z}H}(\varvec{B}-\varvec{I}_p)\varvec{\beta }+\varvec{\mathcal {Z}\vartheta }\big )^{\prime }\Big ]\\ {}&\times \Big [\big (\varvec{\mathcal {Z}H}(\varvec{B}-\varvec{I}_p)\varvec{\beta }+\varvec{\mathcal {Z}\vartheta }\big )\{(\varvec{B}-\varvec{I}_p)\varvec{\beta }\}^{\prime }\Big ]\Big (\varvec{\mathcal {Z}H}(\varvec{B}-\varvec{I}_p)\varvec{\beta }+\varvec{\mathcal {Z}\vartheta }\Big )\\&\times \Big (\varvec{\mathcal {Z}H}(\varvec{B}-\varvec{I}_p)\varvec{\beta }+\varvec{\mathcal {Z}\vartheta }\Big )^{\prime }\le 0,\quad \forall \Delta ^{*}\in (0,+\infty ) \end{aligned}$$

2. \(\varvec{\beta }^{LPT}\) is superior to \(\varvec{\beta }^{UR}\) if:

$$\begin{aligned} \mathcal {V} (\varvec{\beta }^{LPT}) -\mathcal {V} (\varvec{\beta }^{UR})=&-2\Bigg (\{(\varvec{I}_p-\varvec{\mathcal {Z}H})(\varvec{B}-\varvec{I}_p)\varvec{\beta }-\varvec{\mathcal {Z}\vartheta }\}\\&\times \{\varvec{\mathcal {Z}H}(\varvec{B}-\varvec{I}_p)\varvec{\beta }+\varvec{\mathcal {Z}\vartheta }\}^{\prime }\Psi _{p_2+2}(\chi _{p_2,\alpha }^{2};\varvec{\Delta ^{*}})\Bigg )\\&-\Bigg (\varvec{\mathcal {Z}H}\varvec{B}\varvec{\mathcal {I}}^{-1}\varvec{B}^{\prime }\Psi _{p_2+2}(\chi _{p_2,\alpha }^{2};\varvec{\Delta ^{*}})+[\varvec{\mathcal {Z}H}(\varvec{B}-\varvec{I}_p)\varvec{\beta }+\varvec{\mathcal {Z}\vartheta }]\\&\times [\varvec{\mathcal {Z}H}(\varvec{B}-\varvec{I}_p)\varvec{\beta }+\varvec{\mathcal {Z}\vartheta }]^{\prime }\Psi _{p_2+4}(\chi _{p_2,\alpha }^{2};\varvec{\Delta ^{*}})\Bigg )\le 0,\quad \forall \Delta ^{*}\in {\mathbb {R}}^{+}. \end{aligned}$$

3. \(\varvec{\beta }^{LLS}\) is superior to \(\varvec{\beta }^{UR}\) if:

$$\begin{aligned} \mathcal {V} (\varvec{\beta }^{LLS}) -\mathcal {V} (\varvec{\beta }^{UR})=&-\delta \Bigg [2\Big ((\varvec{I}_{p}-\varvec{\mathcal {Z}H})(\varvec{B}-\varvec{I}_p)\varvec{\beta }-\varvec{\mathcal {Z}\vartheta }\Big )\{\varvec{\mathcal {Z}H}(\varvec{B}-\varvec{I}_{p})\varvec{\beta }+\varvec{\mathcal {Z}\vartheta }\}^{\prime }\\&+(2-\delta )\Big (\varvec{\mathcal {Z}HB\mathcal {I}^{-1}B^{\prime }}+[\varvec{\mathcal {Z}H}(\varvec{B}-\varvec{I}_p)\varvec{\beta }+\varvec{\mathcal {Z}\vartheta }]\\ {}&\times [\varvec{\mathcal {Z}H}(\varvec{B}-\varvec{I}_p)\varvec{\beta }+\varvec{\mathcal {Z}\vartheta }]^{\prime }\Big )\Bigg ]\le 0,\quad \forall \Delta ^{*}\in {\mathbb {R}}^{+} \end{aligned}$$

4. \(\varvec{\beta }^{LPS}\) is superior to \(\varvec{\beta }^{LS}\) if:

$$\begin{aligned} \mathcal {V} (\varvec{\beta }^{LPS}) -\mathcal {V} (\varvec{\beta }^{LS})=&\,- 2\Bigg (\{(\varvec{I}_p-\varvec{\mathcal {Z}H})(\varvec{B}-\varvec{I}_p)\varvec{\beta }-\varvec{\mathcal {Z}}\varvec{\vartheta }\}\{\varvec{\mathcal {Z}H}(\varvec{B}-\varvec{I}_p)\varvec{\beta }+\varvec{\mathcal {Z}}\varvec{\vartheta }\}^{\prime }\\&\times E\Bigg [\Big (1-\frac{p_2-2}{\chi ^{2}_{p_2+2}(\varvec{\Delta ^{*}})}\Big )I(\chi ^{2}_{p_2+2}(\varvec{\Delta ^{*}})<p_2-2)\Bigg ]\Bigg )\\&-\Bigg (\varvec{\mathcal {Z}HB\mathcal {I}^{-1}}\varvec{B}^{\prime } E\Bigg [\Big (1-\frac{p_2-2}{\chi ^{2}_{p_2+2}(\varvec{\Delta ^{*}})}\Big )^{2}I(\chi ^{2}_{p_2+2}(\varvec{\Delta ^{*}})<p_2-2)\Bigg ]\\&+\Bigg [\varvec{\mathcal {Z}H}(\varvec{B}-\varvec{I}_p)\varvec{\beta }+\varvec{\mathcal {Z}}\varvec{\vartheta }\Bigg ]\Bigg [\varvec{\mathcal {Z}H}(\varvec{B}-\varvec{I}_p)\varvec{\beta }+\varvec{\mathcal {Z}}\varvec{\vartheta }\Bigg ]^{\prime }\\&\times E\Bigg [\Big (1-\frac{p_2-2}{\chi ^{2}_{p_2+4}(\varvec{\Delta ^{*}})}\Big )^{2}I(\chi ^{2}_{p_2+4}(\varvec{\Delta ^{*}})<p_2-2)\Bigg ]\Bigg )\le 0,\quad \forall \Delta ^{*}\in {\mathbb {R}}^{+} \end{aligned}$$

The right-hand sides of the expressions above are real-valued quantities whose signs determine dominance. Since the expectation of a nonnegative random variable is nonnegative, by the definition of the indicator function,

$$\begin{aligned} (1-\frac{p_2-2}{\chi ^{2}_{p_2+2}(\varvec{\Delta ^{*}})})I(\chi ^{2}_{p_2+2}(\varvec{\Delta ^{*}})<p_2-2)\ge 0. \end{aligned}$$

Since \(P(\chi ^{2}_{p_2+2}(\varvec{\Delta ^{*}})>0)=1\), it follows that, for all \(\Delta ^{*}\in (0,+\infty )\),

$$\begin{aligned} \mathcal {V} (\varvec{\beta }^{LPS})\le \mathcal {V} (\varvec{\beta }^{LS}). \end{aligned}$$

5 Monte Carlo simulation

In this section, we provide the details of an extensive Monte Carlo simulation study designed to compare the performance of the listed estimators in terms of relative efficiency, defined in Eq. (24), where \(\widehat{\varvec{\beta }}^{*}\) denotes any of the estimators considered in the paper:

$$\begin{aligned} RMSE\left( \widehat{\varvec{\beta }}^{*}\right) =\frac{MSE\left( \widehat{\varvec{\beta }}^{UR}\right) }{MSE(\widehat{\varvec{\beta }}^{*})}. \end{aligned}$$
(24)

Since one of our main aims is to investigate the performance of the estimators under a multicollinear design, we generate the rows of the design matrix \({{\mathbf {X}}} \in {\mathbb {R}}^{n \times p}\) independently from a multivariate normal distribution with zero mean vector \(\mathbf{0}\) and variance–covariance matrix \(\varvec{\Sigma }\), where \(\Sigma _{ij}=\rho ^{|i-j|}\), \(i,j=1,2, \ldots , p\), n is the sample size and p is the number of predictor variables. In this setting, \(\rho \) controls the degree of correlation between the predictors and is taken as 0.6 and 0.9.
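The design-generation step above can be sketched as follows; the function name and random seed are illustrative, not part of the paper:

```python
import numpy as np

def make_design(n, p, rho, seed=0):
    """Generate an n x p design matrix whose rows are multivariate
    normal with covariance Sigma_ij = rho**|i - j|, the AR(1)-type
    multicollinear design described in the text."""
    rng = np.random.default_rng(seed)
    idx = np.arange(p)
    # Toeplitz correlation matrix controlling predictor correlation.
    Sigma = rho ** np.abs(idx[:, None] - idx[None, :])
    return rng.multivariate_normal(mean=np.zeros(p), cov=Sigma, size=n)

X = make_design(n=100, p=13, rho=0.9)
```

Larger values of \(\rho\) make adjacent columns of \({\mathbf{X}}\) nearly collinear, which is precisely the regime the Liu-type estimators target.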

We consider the candidate sub-model given in (11). The hypothesis \({{\textbf {H}}}_0 : {{\textbf {H}}} \varvec{\beta } = {{\textbf {h}}} \) is tested against \({{\textbf {H}}}_1 : {{\textbf {H}}} \varvec{\beta } \ne {{\textbf {h}}}\) where \({{\textbf {H}}}=\left( {{\mathbf {0}}}_{p_2 \times p_1}, {{\mathbf {I}}}_{p_2} \right) \in {\mathbb {R}}^{p_2 \times p}\) is a matrix of rank \(p_2\), \({{\mathbf {I}}}_{p_2}\in {\mathbb {R}}^{p_2 \times p_2}\) is an identity matrix of order \(p_2\) such that \(p=p_1+p_2\). The sample size is chosen to be \(n=50, 100, 200\). The true regression parameters are taken to be \({\varvec{\beta }}=\left( {\varvec{\beta }}_1^\prime , {\varvec{\beta }}_2^\prime \right) ^\prime \) where \({\varvec{\beta }}_1 \in {\mathbb {R}}^{p_1}\) and \({\varvec{\beta }}_2 \in {\mathbb {R}}^{p_2}\) are active and inactive parameter vectors respectively.
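The restriction matrix of the tested hypothesis can be constructed directly. A minimal sketch, with illustrative values of \(p_1\) and \(p_2\):

```python
import numpy as np

p1, p2 = 3, 10        # illustrative active/inactive counts
p = p1 + p2
# H = (0_{p2 x p1}, I_{p2}) selects the presumed-inactive coefficients,
# so H beta = h with h = 0 encodes the candidate sub-model.
H = np.hstack([np.zeros((p2, p1)), np.eye(p2)])
assert np.linalg.matrix_rank(H) == p2   # H has full row rank p2
```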

The response variable is generated using the beta distribution such that \(y_i \sim Beta\left( \mu _i \phi , (1-\mu _i)\phi \right) \) where

$$\begin{aligned} \mu _i=\frac{\exp ({{\mathbf {x}}}_i^\prime {\varvec{\beta }})}{1+\exp ({{\mathbf {x}}}_i^\prime {\varvec{\beta }})}. \end{aligned}$$

which corresponds to the logit link function, and the dispersion parameter is fixed at \(\phi = 5\). The weight reflecting confidence in the prior information is taken as 0.5, which puts equal weight on the unrestricted and restricted Liu estimates; alternatively, one can use the estimated value of \(\delta \) from (16). In designing this Monte Carlo experiment, we also aim to understand the effect of departures from the true parameter vector. Thus, we focus on two cases: \({\varvec{\beta }}_2={{\mathbf {0}}}_{p_2}\), under which the null hypothesis \({{\textbf {H}}}_0\) holds, and \({\varvec{\beta }}_2 \ne {{\mathbf {0}}}_{p_2}\), under which the alternative hypothesis \({{\textbf {H}}}_1\) holds. To measure the effect of the departure from the null hypothesis, we define the parameter \(\Delta =\Vert {\varvec{\beta }}-{\varvec{\beta }}^{(0)} \Vert \), the distance between the simulated model and the candidate sub-model, where \(\Vert \cdot \Vert \) is the usual Euclidean norm and \({\varvec{\beta }}^{(0)}=\left( {\varvec{\beta }}_1^\prime , {{\mathbf {0}}}_{p_2}^\prime \right) ^\prime \) is the parameter vector under \({{\textbf {H}}}_0\). In the first scenario (\(\Delta =0\)), we set \({\varvec{\beta }}_1^\prime =(2.75, -1.75, 1.45)\) and \({\varvec{\beta }}_2={{\mathbf {0}}}_{p_2}\). In the second scenario, \({\varvec{\beta }}_1\) is the same while \({\varvec{\beta }}_2=\left( \sqrt{\Delta }, {{\mathbf {0}}}_{p_2-1}^\prime \right) ^\prime \) with \(\Delta \in [0,2]\). Moreover, \(p_2\) is chosen to be 10, 15, and 20.
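The response-generation step and the two departure scenarios can be sketched as follows; the function name, seed, and the particular values of \(p_2\) and \(\Delta\) are illustrative:

```python
import numpy as np

def simulate_response(X, beta, phi=5.0, seed=0):
    """Draw y_i ~ Beta(mu_i * phi, (1 - mu_i) * phi) with
    mu_i = exp(x_i' beta) / (1 + exp(x_i' beta)), i.e. the inverse
    logit link, as in the simulation design above."""
    rng = np.random.default_rng(seed)
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return rng.beta(mu * phi, (1.0 - mu) * phi)

# Scenario 1 (H0 holds): beta_2 = 0.  Scenario 2: the first inactive
# coefficient is sqrt(Delta), measuring departure from H0.
p2, Delta = 10, 1.0
beta1 = np.array([2.75, -1.75, 1.45])
beta2 = np.r_[np.sqrt(Delta), np.zeros(p2 - 1)]
beta = np.r_[beta1, beta2]
```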

Fig. 1

RMSEs of the estimators versus DELTA when \(\rho =0.6\)

The number of repetitions in the simulation is 1000. The simulated mean squared error of an estimator \(\widehat{\varvec{\beta }}^{*}\) is computed as follows

$$\begin{aligned} \mathrm{MSE}\left( \widehat{\varvec{\beta }}^{*} \right)= & {} \frac{1}{1000}\sum _{r=1}^{1000}\left( \widehat{\varvec{\beta }}^{*} -{\varvec{\beta }}\right) _r^{\prime }\left( \widehat{\varvec{\beta }}^{*} -{\varvec{\beta }}\right) _r \end{aligned}$$

Note that we report the relative efficiency, i.e., the ratio of mean squared errors (RMSE), so that an RMSE value larger than one indicates that the estimator \(\widehat{\varvec{\beta }}^*\) is superior to \(\widehat{\varvec{\beta }}^{UR}\).
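The simulated MSE and the relative efficiency of Eq. (24) can be sketched as follows, assuming each estimator's Monte Carlo draws are stored row-wise; the function names are illustrative:

```python
import numpy as np

def simulated_mse(estimates, beta_true):
    """Average squared Euclidean error over Monte Carlo repetitions:
    MSE = (1/R) * sum_r ||beta_hat_r - beta||^2."""
    diffs = np.asarray(estimates) - beta_true   # shape (R, p)
    return np.mean(np.sum(diffs ** 2, axis=1))

def rmse(est_star, est_ur, beta_true):
    """Relative efficiency of Eq. (24): MSE(UR) / MSE(candidate).
    Values above one favour the candidate estimator."""
    return simulated_mse(est_ur, beta_true) / simulated_mse(est_star, beta_true)
```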

The results of the simulation are summarized in Figs. 1 and 2, showing the RMSE performance of the methods with respect to \(\Delta \). The following conclusions can be drawn from the figures:

  • At \( \Delta = 0 \), the restricted Liu estimator performs best in all situations. When the null hypothesis is violated, its RMSE decreases sharply.

  • At \( \Delta = 0 \), the Liu positive Stein estimator is better than the Liu Stein estimator. However, as \( \Delta \) moves away from zero, the performance of these two estimators becomes the same.

  • The RMSE of all estimators increases as the correlation between predictor variables increases.

  • The RMSE of all estimators generally decreases as the sample size n increases.

  • The most important result is that, for high correlation (\(\rho =0.9\)), the relative efficiencies of all estimators exceed one; that is, they all outperform the unrestricted estimator.

  • For high correlation (\(\rho =0.9\)), the restricted Liu shrinkage estimators generally have higher relative efficiencies over a wide range of \(\Delta \).

Fig. 2

RMSEs of the estimators versus DELTA when \(\rho =0.9\)

6 Real data application

In this section, we apply the proposed estimators to two real data sets, as described in the following subsections.

6.1 Government spending in Dutch cities data (2005)

The aim of this analysis is to explain the proportion of Dutch city budgets spent on administration and government using 10 covariates. The data are available in the fmlogit package in R as a data frame with 429 observations and 12 variables. The dependent variable is governing, and the remaining variables are the explanatory variables given in Table 1.

Table 1 Variable descriptions in the dataset
Table 2 Variables included in the competing models
Fig. 3

Bivariate correlation plot of the explanatory variables in real data

Since there are missing values in some observations, we exclude them and perform a complete-case analysis. We fit a beta regression model and identify the significant variables, summarizing the unrestricted and restricted models in Table 2. From Fig. 3, it is seen that there is high correlation between some covariates. We also compute the condition number (CN) of the cross-product matrix \({{\textbf {X}}}^\prime \widehat{{{\mathbf {W}}}} {{\textbf {X}}}\), defined as the square root of the ratio of its maximum eigenvalue to its minimum eigenvalue, as 809.097. Both the correlation plot and the CN indicate a severe collinearity problem, so using the usual beta regression for this data may not be appropriate, as the analysis may produce unreliable estimates. Therefore, we apply the proposed estimators given in this paper. The AIC values given in Table 2 indicate that five of the variables, houseval, education, recreation, social and urbanplanning, are effective and the remaining variables are ineffective (see also the \(R^2\) values). We therefore use this restriction in the analysis and compute the proposed Liu shrinkage estimators (Table 3). We obtain the optimal value of \(\delta \) as 0.48 in the Liu shrinkage estimator, and set the significance level to 0.05 in the preliminary test estimator. To evaluate the performance of the proposed estimators, we apply the bootstrap technique with resamples of size \(n=200\) and 2000 bootstrap replications. We then compute the mean squared error of each estimator as the squared estimated bias plus the squared standard deviation.
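The condition number used above can be computed as follows; the working-weight vector `w` is assumed to be available from the fitted beta regression, and the function name is illustrative:

```python
import numpy as np

def condition_number(X, w):
    """Square root of the ratio of the largest to the smallest
    eigenvalue of X' W X, where W = diag(w) holds the estimated
    working weights (assumed given by the fitted model)."""
    M = X.T @ (w[:, None] * X)        # X' W X without forming diag(W)
    eig = np.linalg.eigvalsh(M)       # ascending order for symmetric M
    return np.sqrt(eig[-1] / eig[0])
```

A CN far above 30 is a conventional warning sign of severe collinearity, consistent with the value 809.097 reported for this data set.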

The results show that the bootstrap root mean squared errors given in Table 3 (and the standard errors in Online Appendix 4) of the proposed estimators, specifically the Stein-type and positive Stein-type shrinkage estimators, are generally lower than those based on the unrestricted maximum likelihood estimator of the beta regression model. The relative efficiencies also show that the restricted estimator attains the highest value, 2.235, making it preferable to the other estimators.

Table 3 Coefficients and bootstrapped root mean square errors of the proposed estimators

6.2 Student performance data set

This data set concerns student achievement in secondary education at two Portuguese schools. The attributes include student grades and demographic, social, and school-related features, collected using school reports and questionnaires from 395 students. The outcome variable is the students' performance in Mathematics, graded on the interval [0, 20]. In Cortez and Silva (2008), the data set was modeled under binary/five-level classification and regression tasks. However, we use beta regression without categorizing the outcome variable, since categorization may discard useful information (Altman and Royston 2006). Note that the outcome variable, final grade (G3), is strongly correlated with the attributes second-period grade (G2) and first-period grade (G1). Since G3 is bounded in the interval [0, 20], we converted it to the interval \(\left( 0,1\right) \) and fitted the beta regression model. Based on the AIC and \(R^2\) measures, we form the null model given in Table 4. We then apply the estimators developed in this paper, setting the significance level \(\alpha \) to 0.05. Since we do not have any prior information about the parameters, we use the estimated value of \(\delta \) (0.354) in the linear shrinkage estimator. The results show that our proposed estimators outperform the unrestricted estimator in most cases in terms of the bootstrap root mean squared errors (Table 5) and standard errors (Online Appendix 4). Further, among the proposed estimators, the restricted estimator is by far the best in terms of relative efficiency (2.579). The positive Stein-type and Stein-type estimators are the other favorable estimators.
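The paper states only that G3 was converted from [0, 20] to (0, 1). One common choice, used here purely as an assumption, is linear scaling followed by the Smithson–Verkuilen compression, which keeps boundary grades strictly inside the unit interval:

```python
import numpy as np

def to_unit_interval(g, lo=0.0, hi=20.0):
    """Map grades from [lo, hi] to the open interval (0, 1).
    After linear scaling, apply (y * (n - 1) + 0.5) / n so that
    boundary grades do not map to exactly 0 or 1.  This particular
    compression is an assumption; the paper only states that the
    grades were converted to (0, 1)."""
    g = np.asarray(g, dtype=float)
    y = (g - lo) / (hi - lo)
    n = g.size
    return (y * (n - 1) + 0.5) / n
```

Without such a compression, students with grade 0 or 20 would fall on the boundary, where the beta density is not defined.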

Table 4 Variables included in the competing models
Table 5 Coefficients and bootstrapped root mean square errors of the proposed estimators for education data

7 Conclusion

In this paper, we considered different types of improved shrinkage estimators based on the Liu estimator, namely, restricted, preliminary test, Stein-type, positive Stein-type, and linear shrinkage estimators for the beta regression model. We obtained the analytical biases and variances of the proposed estimators under the local alternative hypothesis. Further, we conducted an extensive simulation study to examine the finite-sample performance of the proposed estimators. Our results showed that the Stein-type estimators uniformly outperform the usual maximum likelihood estimator, and the other shrinkage-type estimators also attain higher relative efficiencies than maximum likelihood over a wide range of the parameter space. We concluded the paper by applying the proposed methodology to two well-known real data sets from econometrics and education. The superiority of the proposed estimators was evident in both examples in terms of high overall relative efficiencies and lower bootstrap standard errors and mean squared errors.