1 Introduction

Small area estimation (SAE) involves estimating parameters for small sub-populations in which direct design-based estimators are unreliable due to the small number of observations in such domains. Works that comprehensively cover the theory of SAE include Rao and Molina [18], Ghosh and Rao [11], Datta [6] and Morales et al. [15]. Several methods have been proposed in the literature to estimate the model parameters, including the Method of Moments (MoM), Minimum Norm Quadratic Unbiased Estimation, and Maximum Likelihood (ML) or Restricted Maximum Likelihood (REML), the latter two requiring the normality assumption. For more details, we refer the reader to [7, 18] and [15].

Depending on the aggregation level of the response variable, small-area models are divided into unit-level models and area-level models. Among the latter, the Fay-Herriot (FH) model [10] is widely used in SAE, since it allows survey data to be combined with data from other sources. It can be seen as a linear mixed model with random intercepts (see [18] and [15]), for which the theory distinguishes two cases: balanced and unbalanced designs. The peculiarities of the FH model are that the sampling variances are assumed known and that each area contributes a single observation, which would make it a balanced model. However, since the area sampling variances differ from each other, the heteroscedasticity of the residual error variances effectively makes it an unbalanced model. For this reason, a point of interest for this model is the possible bias of the REML estimates [5, 7, 22]. It is well known [7] that the bias of the ML and REML estimators is a point of attention for both theoretical and applied studies. These methods, together with the MoM, are the most commonly used to estimate the variance component of the FH model: the first two require the normality assumption, while the last is a distribution-free estimation method. Some studies, in particular [7], showed that the REML and moment estimators are the most efficient, with quite similar behavior as the sample size and, in particular, the dimension of the random-effects vector increase; both approach the Cramér-Rao lower bound. ML and REML estimators are translation-invariant even functions of the data [18]. In particular, the REML estimation method maximizes the joint probability function of a residual vector of independent contrasts, obtained through a matrix of independent column vectors that transforms the data linearly. The transformed data are no longer a function of the model covariates, and are consequently independent of the best linear unbiased estimates of the model fixed effects [13]. Best Linear Unbiased Predictors based on the ML and REML estimators, as well as on the MoM estimator, are model-unbiased when the random components of the FH model are symmetrically distributed around zero [18]. While for the balanced random-intercepts model the four estimators cited above coincide and are unbiased, the situation is different when the random-intercepts model is unbalanced. In particular, the most widely used method, REML, is biased [7] under an unbalanced design, so it is essential to discuss how to measure this bias. The bias of the REML estimate of the variance component of the random area effects affects both the fixed and the random effects of the FH model, and hence the small area predictions, because this variance component appears in both the regression-synthetic and the direct-estimator parts of the linear predictor of the FH model. This is crucial for SAE applications, since biased variance parameter estimates may produce biased small area predictors. We carefully examine the circumstance under which, in both the balanced random-intercepts model [7] and the FH model with equal sampling variances [18], the GLS fixed-effects estimator reduces to the OLS estimator. In particular, we analyze the impact of unequal sampling variances in the FH model on the bias of the linear predictor when the REML estimation method is employed.
We connect the bias of the REML estimates to the set of unequal sampling variances in the FH model, which we treat as equivalent to unequal sample sizes among subjects (clusters) in the standard linear mixed model with random intercepts. As we demonstrate empirically via simulation, the variability of the sampling variances, or of the subject (cluster) sizes, is the source of the bias itself. Two measures of this bias are introduced, both starting from the departure of the GLS estimator from the OLS estimator. The first is the variance of the GLS estimator with respect to the subject (cluster) sizes, or the sampling variances, taken relative to their maximum attainable variance. The second is based on the restricted log-likelihood displacement in the direction of the vector of the actual subject (cluster) sizes, or of the sampling variances in a Fay-Herriot model, starting from the log-likelihood obtained under the OLS estimator.

Another topic of study for the FH model is the assumption of normality for the response variable and the random effects, since this assumption often does not hold, due to asymmetry in the data distribution [2]. To overcome the problem of asymmetry in the data, some authors have proposed modifying the distribution of the random effect with a power exponential distribution [9], or using another model for small area estimation [2, 8] and [19]. To the best of our knowledge, no works deal with the asymmetry of the shape of the empirical distribution of the sampling variances.

The aim of the present work is to understand how the heterogeneity of the design variances influences the linear predictor of the FH model, in terms of the possible bias of the REML estimate of the variance component and, given biased estimates, of the resulting mean squared error. The heterogeneity of the sampling variances is evaluated through their empirical distribution. It comes into play through its impact on the fixed-effects estimates, which in turn enter the random-area estimates together with the variance components of the model. Thus, when the variance component estimates are biased, both the fixed and the random parts of the linear predictor are biased. Given the level of variability of the sampling variances, another aspect is assessed in the present work via simulation: we consider different realistic cases in which the shape of the empirical distribution of the sampling variances may influence the level of the bias of the linear predictor.

Our research question is, therefore, how the heterogeneity of the sampling variances impacts the predictor. In particular, we investigate how some key measures related to the FH model change when the empirical distribution of the sampling variances passes from positive asymmetry to symmetry. Moreover, we take into account the relation between the small area design estimates and their sampling variances. Graphical inspection generally shows that the relative standard error is a decreasing function of the magnitude of the direct estimates; from another point of view, in some applications the area-level estimates are an increasing function of the sampling variances. This suggests investigating the relation between these two quantities. In the present paper we focus on the two main types of relationship, linear and non-linear, and in particular on how a given relation may cause relevant changes, after model fitting, in the following measures: the bias of the predictor, its mean squared prediction error, and a measure of the predictor's efficiency with respect to its MSE, introduced within the paper. The rest of the work is organized as follows. Section 2 gives some background on the Fay-Herriot model; Sect. 3 studies some measures of the bias of the restricted maximum likelihood estimates, both in the frame of linear mixed models and in the Fay-Herriot model, and introduces a measure of the efficiency of the linear predictor; Sect. 4 presents simulation experiments on artificial data to investigate the performance of model fitting in the presence of biased REML estimates of the variance component of the FH model, together with an assessment of the mean squared error; Sect. 5 gives some conclusions. The paper contains three appendices: Appendix A gives developments for the Proof of Proposition 2, Appendix B for the Proof of Proposition 3, and Appendix C for the Proof of Proposition 4.

2 Background

2.1 The Fay-Herriot model

The Fay-Herriot model [10] is the basic area-level model, widely used in small-area estimation [18]. It consists of two components:

  1.

    A sampling model

    $$\begin{aligned} y_i = \theta _i + e_i, \quad i = 1, \dots , m \end{aligned}$$

    where \(e_i \sim N(0, \psi _i)\) is the sampling error associated with the direct estimator \(y_i\), and m is the number of small areas. The linking model below assumes that the small area parameter \(\theta _i\) is related to auxiliary variables through a linear regression model:

  2.

    A linking model

    $$\begin{aligned} \theta _i = \textbf{x}_i^{\prime }{\varvec{\beta }} + u_i, \quad i = 1, \dots , m \end{aligned}$$

where \({\varvec{\beta }}\) is a \(p \times 1\) vector of regression coefficients and \(u_i\) are the area specific random effects, \(u_i \sim N(0, \sigma ^2_u)\). The sampling model and the linking model lead to a linear mixed model given as

$$\begin{aligned} y_i = \textbf{x}_i^{\prime }{\varvec{\beta }} + u_i + e_i , \quad i = 1, \dots , m \end{aligned}$$

Random effects \(u_i\) and sampling errors \(e_i\) are mutually independent. The model variance \(\sigma ^2_u\) is unknown and needs to be estimated, unlike the sampling variances \(\psi _i\), which in the Fay-Herriot model are usually assumed to be known. The sampling variances differ considerably among areas, and this, together with the fact that there is one unit per subject in the mixed-effects framework, constitutes the most distinctive feature of the model. As in a general linear mixed model, we can compute an estimate of the variance component \(\widehat{\sigma }^2_u\) and of the fixed effects \(\widehat{{\varvec{\beta }}}\). The so-called Empirical Best Linear Unbiased Predictor (EBLUP) of the parameter under the FH model is given by

$$\begin{aligned} \widetilde{y}_i = \widehat{\gamma }_i y_i + (1- \widehat{\gamma }_i) \textbf{x}_i^{\prime } \widehat{{\varvec{\beta }}} \end{aligned}$$

with \(\widehat{\gamma }_i\) the shrinkage factor \(\widehat{\gamma }_i = \frac{\widehat{\sigma }_u^2}{\widehat{\sigma }_u^2 + \psi _i}\). Furthermore, given the variance component \(\sigma _u^2\), the Mean Squared Error (MSE) of the BLUP is:

$$\begin{aligned} \text {MSE}(\widetilde{y}^{blup}_i)= & {} g_{1i}(\sigma ^2_u) + g_{2i}(\sigma ^2_u), \\ g_{1i}(\sigma _u^2)= & {} \frac{\sigma _u^2 \psi _i}{\sigma _u^2 + \psi _i} = \gamma _i \psi _i; \quad g_{2i}(\sigma _u^2) = (1- \gamma _i)^2 \textbf{x}^{\prime }_i \biggl ( \sum _{j = 1}^{m} (\sigma _u^2 + \psi _j)^{-1} \textbf{x}_j \textbf{x}_j^{\prime } \biggr )^{-1} \textbf{x}_i \end{aligned}$$

whereas the MSE of the Empirical BLUP, with \(\widehat{\sigma }_u^2\) the estimator of \(\sigma _u^2\), includes a third component:

$$\begin{aligned} \text {MSE}(\widetilde{y}^{eblup}_i)= & {} g_{1i}(\widehat{\sigma }_u^2) + g_{2i}(\widehat{\sigma }_u^2) + g_{3i}(\widehat{\sigma }_u^2), \\ g_{3i}(\widehat{\sigma }_u^2)= & {} (1 - \gamma _i)^2 \gamma _i \widehat{\sigma }_u^{-2} \overline{V}(\widehat{\sigma }_u^2) \end{aligned}$$

where \(\overline{V}(\widehat{\sigma }_u^2)\) is the asymptotic variance of \(\widehat{\sigma }_u^2\). Prasad and Rao [17, 18] derived the analytic estimator of \(\text {MSE}(\widetilde{y}^{eblup}_i)\) as

$$\begin{aligned} mse(\widetilde{y}^{eblup}_i)= & {} g_{1i}(\widehat{\sigma }_u^2) + g_{2i}(\widehat{\sigma }_u^2) + 2g_{3i}(\widehat{\sigma }_u^2), \end{aligned}$$

where \(\widehat{\sigma }_u^2\) is the REML estimator.
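
As a quick numerical illustration of the formulas above, the EBLUP and the Prasad-Rao mse estimator can be computed in a few lines. The following is a minimal Python/numpy sketch, assuming a REML estimate \(\widehat{\sigma }_u^2\) is supplied and using the standard asymptotic variance \(\overline{V}(\widehat{\sigma }_u^2)=2/\sum _{i}(\widehat{\sigma }_u^2+\psi _i)^{-2}\) of the REML estimator; all function and variable names are ours.

```python
import numpy as np

def eblup_fh(y, X, psi, sigma2_u):
    """EBLUP and Prasad-Rao mse for the FH model, given a REML estimate sigma2_u."""
    V = sigma2_u + psi                         # area-level variances sigma_u^2 + psi_i
    gamma = sigma2_u / V                       # shrinkage factors
    W = X / V[:, None]                         # V^{-1} X
    beta = np.linalg.solve(X.T @ W, W.T @ y)   # GLS estimate of beta
    eblup = gamma * y + (1 - gamma) * (X @ beta)
    g1 = gamma * psi
    Ainv = np.linalg.inv(X.T @ W)              # (sum_j x_j x_j' / V_j)^{-1}
    g2 = (1 - gamma)**2 * np.einsum('ij,jk,ik->i', X, Ainv, X)
    Vbar = 2.0 / np.sum(V**-2)                 # asymptotic variance of the REML sigma2_u
    g3 = (1 - gamma)**2 * gamma / sigma2_u * Vbar
    mse = g1 + g2 + 2 * g3                     # Prasad-Rao mse estimator
    return eblup, mse

# usage: eblup, mse = eblup_fh(y, X, psi, sigma2_u_hat)
```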

3 Some theoretical considerations

3.1 On the bias of the REML estimates

In this section, we consider the following special case of the general Linear Mixed Effects model (LME), the random-intercepts model (see [7] and [12] for comprehensive coverage):

$$\begin{aligned} \textbf{y} =\textbf{X}\beta +\textbf{Zu}+\textbf{e},\ \ \ \ \textbf{u} \sim N(\textbf{0},\sigma _{u}^{2}\textbf{I}_{m}),\ \ \ \ \textbf{e}\sim N( \textbf{0},\sigma ^{2}\textbf{I}_{n}), \end{aligned}$$
(1)

with

$$\begin{aligned} \textbf{y}_{i}= & {} \textbf{X}_{i}{\beta }+\textbf{Z}_{i}u_{i}+ \textbf{e}_{i},\ \ \ \textbf{Z}_{i}=\textbf{1}_{n_{i}},\ \ \ u_{i}\sim N(0,\sigma _{u}^{2}),\ \ \textbf{e}_{i}\sim N(\textbf{0},\sigma ^{2}\textbf{I}_{n_{i}}),\ \ \\ y_{ij}= & {} \textbf{x}_{ij}^{\prime }{\beta +}u_{i}+e_{ij},\ \ \ \ i=1,\ldots ,m,\ \ \ j=1,\ldots ,n_{i},\ \ \ \ \Sigma _{i=1}^{m}n_{i}=n, \\ var(\textbf{Z}_{i}u_{i})= & {} \textbf{Z}_{i}\textbf{G}^{*}\textbf{Z} _{i}^{\prime }=\sigma _{u}^{2}\textbf{1}_{n_{i}}\textbf{1}_{n_{i}}^{\prime },\ \ \ \ \textbf{G}^{*}=var(u_{i}), \end{aligned}$$

where \({\varvec{\beta }}\) is a \(p\times 1\) vector of fixed effects, \( \textbf{u}\) an \(m\times 1\) vector of random effects, \(\textbf{X}\) and \( \textbf{Z}\) are the design matrices of fixed and random effects, respectively. The Best Linear Unbiased Predictor is:

$$\begin{aligned} \widetilde{y} = {\varvec{X}}\widehat{{\varvec{\beta }}} + \widetilde{{\varvec{u}}} = {\varvec{X}}\widehat{{\varvec{\beta }}} + {\varvec{G}}{\varvec{Z}}^{\prime }{\varvec{V}}^{-1}({\varvec{y}} - {\varvec{X}}\widehat{{\varvec{\beta }}}) \end{aligned}$$
(2)

The model in (1) is the unbalanced LME model with random intercepts. Under normality of the random effects and model errors, both Maximum Likelihood (ML) and REML estimators are commonly employed in applications; REML estimators reduce the bias of the variance component estimates in LME models. Maximization algorithms, e.g., Newton-Raphson or Fisher scoring, are used to find estimates of the variance parameters \({\varvec{\theta }}\) of the model (1). The components of the vector \({\varvec{\theta }}\) are the variance parameters that enter the Generalized Least Squares (GLS) estimates of the fixed effects of the LME model (see for example [13]). Under normality of the random effects and residual errors, moment estimators can approximate the performance of the REML estimators, the latter being considered the gold standard for parameter estimation in the LME model. In general, under normality these methods produce estimates that are very close to one another, and close to the Cramér-Rao bound when m is large [7].

With the focus on the REML estimation method, the restricted log-likelihood function of the block-diagonal version of the LME model is:

$$\begin{aligned} l_{R}({\theta })= & {} -\frac{1}{2}(n-p)\log \sigma ^{2}-\frac{1}{2}\log \left| \Sigma _{i=1}^{m}\textbf{X}_{i}^{\prime }\textbf{V}_{i}^{-1} \textbf{X}_{i}\right| -\frac{1}{2}\Sigma _{i=1}^{m}\log \left| \textbf{V}_{i}\right| \nonumber \\{} & {} -\frac{1}{2\sigma ^{2}}\Sigma _{i=1}^{m}(\textbf{y}_{i}- \textbf{X}_{i}{\beta })^{\prime }\textbf{V}_{i}^{-1}(\textbf{y}_{i}- \textbf{X}_{i}{\beta }), \end{aligned}$$
(3)

where \(\textbf{V}_{i}=\textbf{Z}_{i}\textbf{GZ}_{i}^{\prime }+ \textbf{I}_{n_{i}}=\sigma ^{-2}var(u_{i})\textbf{1}_{n_{i}}\textbf{1} _{n_{i}}^{\prime }+\textbf{I}_{n_{i}}=g\textbf{1}_{n_{i}}\textbf{1} _{n_{i}}^{\prime }+\textbf{I}_{n_{i}}\), and \(\textbf{G}=\sigma ^{-2}\textbf{G}^{*}=g\) is the scaled variance of the random effects.

An alternative version of (3) is the profile likelihood, which follows from noting that the variance \(\sigma ^{2}\) can be profiled out through its maximizer \(\widehat{\sigma }^{2}=(n-p)^{-1}\Sigma _{i=1}^{m}( \textbf{y}_{i}-\textbf{X}_{i}{\varvec{\beta }})^{\prime }\textbf{V}_{i}^{-1}( \textbf{y}_{i}-\textbf{X}_{i}{\varvec{\beta }})\) in the restricted log-likelihood. This procedure reduces the number of parameters to be estimated. Thus the profile restricted log-likelihood function \( l_{R}^{*}\), with \({\varvec{\theta }}=\sigma _{u}^{2}\) and \(\sigma ^{2}\) eliminated, is defined as follows:

$$\begin{aligned} l_{R}^{*}({\beta },\sigma _{u}^{2})= & {} -\frac{1}{2}(n-p)\log \left[ \Sigma _{i=1}^{m}(\textbf{y}_{i}-\textbf{X}_{i}{\beta } )^{\prime }\textbf{V}_{i}^{-1}(\textbf{y}_{i}-\textbf{X}_{i}{\beta }) \right] \nonumber \\{} & {} -\frac{1}{2}\log \left| \Sigma _{i=1}^{m}\textbf{X}_{i}^{\prime } \textbf{V}_{i}^{-1}\textbf{X}_{i}\right| -\frac{1}{2}\Sigma _{i=1}^{m}\log \left| \textbf{V}_{i}\right| . \end{aligned}$$
(4)

An interesting feature of the REML estimation method, when applied to the model in (1), is that when the data are balanced (i.e., there is the same number of units per subject, or cluster of observations, \(n_{i}=n_{i^{\prime }},\ i,i^{\prime }=1,\ldots ,m\)), the estimates coincide with the moment estimates and with other quadratic unbiased estimates. Conversely, under an unbalanced design both the ML and the REML estimators lead to biased variance parameter estimates, particularly in small samples.

In many applications with unbalanced data it is then important to know the extent of the departure of these estimates from unbiasedness. In this case there is no closed-form solution for the ML or REML estimates of the components of \({\varvec{\theta }}\), while for a balanced design and a model as in (1) these estimates admit an explicit solution.

Substituting into the profile likelihood (4) \(\left| \textbf{V} _{i}\right| =\left| g\textbf{1}_{n_{i}}\textbf{1}_{n_{i}}^{\prime }+ \textbf{I}_{n_{i}}\right| = 1+n_{i}g\) and \(\textbf{V}_{i}^{-1}=(g\textbf{1} _{n_{i}}\textbf{1}_{n_{i}}^{\prime }+\textbf{I}_{n_{i}})^{-1}=\textbf{I} _{n_{i}}-\frac{g}{1+n_{i}g}\textbf{1}_{n_{i}}\textbf{1}_{n_{i}}^{\prime }\), the log-likelihood (4) for the unbalanced model becomes (see again [7]):

$$\begin{aligned} l_{R}^{*}({\beta },g)= & {} -\frac{1}{2}(n-p)\log \left[ \Sigma _{i=1}^{m}(\textbf{y}_{i}-\textbf{X}_{i}{\beta })^{\prime }\textbf{V} _{i}^{-1}(\textbf{y}_{i}-\textbf{X}_{i}{\beta })\right] \nonumber \\{} & {} -\frac{1}{2}\log \left| \Sigma _{i=1}^{m}\textbf{X}_{i}^{\prime } \textbf{V}_{i}^{-1}\textbf{X}_{i}\right| -\frac{1}{2}\Sigma _{i=1}^{m}\log \left| \textbf{V}_{i}\right| \nonumber \\= & {} -\frac{1}{2}(n-p)\log \left[ \Sigma _{i=1}^{m}(\textbf{y}_{i}-\textbf{X} _{i}{\beta })^{\prime }(\textbf{y}_{i}-\textbf{X}_{i}{\beta } )-g\Sigma _{i=1}^{m}\frac{n_{i}^{2}(\overline{y}_{i}-{\beta }^{\prime }\overline{\textbf{x}}_{i})^{2}}{1+n_{i}g}\right] \nonumber \\{} & {} -\frac{1}{2}\log \left| \Sigma _{i=1}^{m}(\textbf{X}_{i}^{\prime } \textbf{X}_{i}-\frac{n_{i}^{2}g}{1+n_{i}g}\overline{\textbf{x}}_{i}\overline{ \textbf{x}}_{i}^{\prime })\right| -\frac{1}{2}\Sigma _{i=1}^{m}\log (1+n_{i}g) \nonumber \\= & {} -\frac{1}{2}(n-p)\log T(n_{i})-\frac{1}{2}\log \left| \Sigma _{i=1}^{m}(\textbf{X}_{i}^{\prime }\textbf{X}_{i}-\frac{n_{i}^{2}g}{1+n_{i}g} \overline{\textbf{x}}_{i}\overline{\textbf{x}}_{i}^{\prime })\right| \nonumber \\{} & {} -\frac{1}{2}\Sigma _{i=1}^{m}\log (1+n_{i}g) \nonumber \\ T(n_{i})= & {} \Sigma _{i=1}^{m}(\textbf{y}_{i}-\textbf{X}_{i}{\beta } )^{\prime }(\textbf{y}_{i}-\textbf{X}_{i}{\beta })-g\Sigma _{i=1}^{m} \frac{n_{i}^{2}(\overline{y}_{i}-{\beta }^{\prime }\overline{\textbf{x }}_{i})^{2}}{1+n_{i}g} \nonumber \\ \overline{y}_{i}= & {} \frac{1}{n_{i}}\textbf{y}_{i}^{\prime }\textbf{1}_{n_{i}}, \quad \overline{\textbf{x}}_{i}=\frac{1}{n_{i}}\textbf{X}_{i}^{\prime }\textbf{1}_{n_{i}}. \end{aligned}$$
(5)

Then, we get the GLS estimate of \({\beta }\) as:

$$\begin{aligned} \widehat{{\beta }}= & {} (\Sigma _{i=1}^{m}\textbf{X}_{i}^{\prime } \textbf{V}_{i}^{-1}\textbf{X}_{i})^{-1}\Sigma _{i=1}^{m}\textbf{X} _{i}^{\prime }\textbf{V}_{i}^{-1}\textbf{y}_{i} \nonumber \\= & {} \left[ \Sigma _{i=1}^{m}(\textbf{X}_{i}^{\prime }\textbf{X}_{i}-\frac{ n_{i}^{2}g}{1+n_{i}g}\overline{\textbf{x}}_{i}\overline{\textbf{x}} _{i}^{\prime })\right] ^{-1}\left[ \Sigma _{i=1}^{m}(\textbf{X}_{i}^{\prime } \textbf{y}_{i}-\frac{n_{i}^{2}g}{1+n_{i}g}\overline{\textbf{x}}_{i}\overline{ y}_{i})\right] . \end{aligned}$$
(6)
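
For concreteness, the closed-form GLS estimator (6) can be transcribed directly, looping over the m subjects (clusters). A minimal numpy sketch, with the block-list representation of the data being our own convention:

```python
import numpy as np

def gls_beta(X_blocks, y_blocks, g):
    """GLS estimate (6) for the unbalanced random-intercepts model,
    given the scaled variance ratio g = var(u_i)/sigma^2."""
    p = X_blocks[0].shape[1]
    A = np.zeros((p, p))
    b = np.zeros(p)
    for Xi, yi in zip(X_blocks, y_blocks):
        ni = len(yi)
        xbar = Xi.mean(axis=0)           # cluster mean of the covariates
        ybar = yi.mean()                 # cluster mean of the response
        w = ni**2 * g / (1.0 + ni * g)   # weight n_i^2 g / (1 + n_i g)
        A += Xi.T @ Xi - w * np.outer(xbar, xbar)
        b += Xi.T @ yi - w * xbar * ybar
    return np.linalg.solve(A, b)
```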

Proposition 1

Given the model (1), a sufficient condition for the unbiasedness of the REML covariance parameter estimates is that \(n_{i}=n_{i^{\prime }}=\overline{n}, \ \forall i, i^{\prime }=1,\dots ,m\), i.e., a balanced LME model. The parameter estimates \(\widehat{g}_{R}\) and \(\widehat{\sigma }_{R}^{2}\) are:

$$\begin{aligned} \widehat{g}_{R}= & {} (m-1)^{-1}\times \left[ \frac{1}{m(\overline{n}-1)}\Sigma _{i=1}^{m}\left\{ (\textbf{y}_{i}-\textbf{X}_{i}\widehat{{\beta }} _{OLS})^{\prime }(\textbf{y}_{i}-\textbf{X}_{i}\widehat{{\beta }} _{OLS})-\overline{n}(\overline{y}_{i}-\overline{\overline{y}})^{2}\right\} \right] ^{-1} \nonumber \\{} & {} \times \Sigma _{i=1}^{m}(\overline{y}_{i}-\overline{\overline{y}})^{2}- \frac{1}{\overline{n}} \nonumber \\ \widehat{\sigma }_{R}^{2}= & {} [m(\overline{n}-1)-p+1]^{-1}\Sigma _{i=1}^{m} \left[ (\textbf{y}_{i}-\textbf{X}_{i}\widehat{{\beta }} _{OLS})^{\prime }(\textbf{y}_{i}-\textbf{X}_{i}\widehat{{\beta }} _{OLS})-\overline{n}(\overline{y}_{i}-\overline{\overline{y}})^{2}\right] , \end{aligned}$$
(7)

where \(\overline{\overline{y}}=n^{-1}\Sigma _{i=1}^{m}\textbf{y}_{i}^{\prime }\textbf{1}_{\overline{n}}\) is the grand mean of the observations. Further, we have that \(\widehat{{\beta }}\equiv \widehat{{\beta }}_{OLS}\).

The proofs of the equality \(\widehat{{\beta }}\equiv \widehat{{\beta }}_{OLS}\), and of the closed-form solutions for the REML estimates \(\widehat{g}_{R}\) and \(\widehat{\sigma }_{R}^{2}\) can be found in [7], Section 2.4.1.
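
A literal transcription of the closed-form estimates (7) may read as follows. It is a sketch under the balanced-design assumption \(n_i=\overline{n}\), taking the grand mean as the overall average of the observations; it is not meant as a reference implementation.

```python
import numpy as np

def reml_balanced(X_blocks, y_blocks):
    """Closed-form REML estimates (7) for the balanced random-intercepts model."""
    m = len(y_blocks)
    nbar = len(y_blocks[0])                       # common cluster size
    X = np.vstack(X_blocks)
    y = np.concatenate(y_blocks)
    p = X.shape[1]
    beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    ybar_i = np.array([yi.mean() for yi in y_blocks])
    grand = y.mean()                              # grand mean of the observations
    rss_i = np.array([np.sum((yi - Xi @ beta_ols)**2)
                      for Xi, yi in zip(X_blocks, y_blocks)])
    within = np.sum(rss_i - nbar * (ybar_i - grand)**2)
    ssb = np.sum((ybar_i - grand)**2)
    sigma2 = within / (m * (nbar - 1) - p + 1)    # sigma^2_R in (7)
    g = ssb * m * (nbar - 1) / ((m - 1) * within) - 1.0 / nbar   # g_R in (7)
    return g, sigma2, beta_ols
```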

Proposition 1 states that when the LME model is balanced, and consequently the fixed-effects parameter vector is estimated by OLS, the REML estimator of the variance components is unbiased. Thus, a rough measure of the departure from unbiasedness can be given in terms of the difference \(\left| \widehat{{\beta }}-\widehat{{\beta }}_{OLS}\right| \). In particular, the subject (cluster) sizes can be very different. Thus, in the next Proposition 2, we introduce an index based on \(var(\widehat{{ \beta }}-\widehat{{\beta }}_{OLS})\) when the subject (cluster) sizes vary. In the proposed index this variance is multiplied by a coefficient, to be found via simulation, which accounts for the non-linearity of the REML maximization algorithms. Clearly, when \(var(\widehat{{\beta }}- \widehat{{\beta }}_{OLS})=0\), the REML estimates are unbiased.

Proposition 2

Consider the LME model with the unbalanced design (1), the GLS estimator of the fixed effects (6) with given \(g=\sigma ^{-2}var(u_{i})\), and the subject (cluster) sample sizes as a set of m parameters \(\eta _{i}\in U\), \(U\subset \mathbb {R}:\) \(0<\eta _{i}\le m\overline{\eta }\), with \(\overline{\eta }=\mu (\eta )\) and \(S^{2}(\eta )\) the variance of the \(\eta _{i}\)'s. Then (6) takes the form:

$$\begin{aligned} \widehat{{\beta }}(\eta )=\left[ \Sigma _{i=1}^{m}\left( \textbf{X} _{i}^{\prime }\textbf{X}_{i}-\frac{\eta _{i}^{2}g}{1+\eta _{i}g}\overline{ \textbf{x}}_{i}\overline{\textbf{x}}_{i}^{\prime }\right) \right] ^{-1}\left[ \Sigma _{i=1}^{m}\left( \textbf{X}_{i}^{\prime }\textbf{y}_{i}-\frac{\eta _{i}^{2}g }{1+\eta _{i}g}\overline{\textbf{x}}_{i}\overline{y}_{i}\right) \right] \end{aligned}$$
(8)

where \(\widehat{{\beta }}(\eta )\) is a continuous and differentiable function of \(\eta \), with values among the m subjects (clusters). Thus, with \(S^{2}_{\max }(\eta )=\overline{\eta }^{2}(m-1)\) the maximum attainable variance of the \(\eta _i\)'s:

$$\begin{aligned} S^{2}(\widehat{{\beta }}(\eta )-\widehat{{\beta }} _{OLS})=\left. \left( \frac{\partial \widehat{{\beta }}(\eta )}{ \partial \eta }\frac{\partial \widehat{{\beta }}^{\prime }(\eta )}{ \partial \eta }\right) \right| _{\eta =\overline{\eta }}\times S^{2}(\eta ) \end{aligned}$$

with:

$$\begin{aligned} \frac{\partial (\widehat{{\beta }}(\eta )-\widehat{{\beta }} _{OLS})}{\partial \eta }= & {} 2\left[ \Sigma _{i=1}^{m}\textbf{X}_{i}^{\prime } \textbf{X}_{i}-\frac{\eta _{i}^{2}g}{1+\eta _{i}g}\Sigma _{i=1}^{m}\overline{ \textbf{x}}_{i}\overline{\textbf{x}}_{i}^{\prime }\right] ^{-1} \\{} & {} \left[ (\Sigma _{i=1}^{m}\eta \overline{\textbf{x}}_{i}\overline{\textbf{x}} _{i}^{\prime })\times \widehat{{\beta }}_{OLS}-\Sigma _{i=1}^{m}\eta \overline{\textbf{x}}_{i}\overline{y}_{i}\right] . \end{aligned}$$

An index, say \(\delta _{1}\), of bias of the REML estimates for \(\widehat{g}_{R}\) and \(\widehat{\sigma }_{R}^{2}\) is then:

$$\begin{aligned} \delta _{1}= & {} \frac{S^{2}(\eta )(\widehat{{\beta }}_{OLS}^{\prime } \textbf{1}_{p})^{-1}}{\overline{\eta }^{2}(m-1)}tr\left[ \lambda (p)\times S^{2}(\widehat{{\beta }}(\eta )-\widehat{{\beta }}_{OLS}) \right] \nonumber \\= & {} \frac{S^{2}(\eta )(\widehat{{\beta }}_{OLS}^{\prime }\textbf{1} _{p})^{-1}}{\overline{\eta }^{2}(m-1)}tr\left[ \lambda (p)\times \left. \left( \frac{\partial \widehat{{\beta }}(\eta )}{\partial \eta }\frac{ \partial \widehat{{\beta }}^{\prime }(\eta )}{\partial \eta }\right) \right| _{\eta =\overline{\eta }}\right] . \end{aligned}$$
(9)

The index \(\delta _{1}\) involves an unknown coefficient \(\lambda \), which depends on the dimension of the vector \({\varvec{\beta }}\).

The Proof is reported in Appendix A.
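
The index \(\delta _1\) in (9) can be approximated numerically by replacing the analytic derivative of \(\widehat{{\beta }}(\eta )\) with a central finite difference at \(\overline{\eta }\). Below is a sketch under the assumptions that g is known and that the coefficient \(\lambda (p)\) is supplied by the user (the paper leaves it to be calibrated via simulation):

```python
import numpy as np

def delta1(X_blocks, y_blocks, n_sizes, g, lam=1.0, h=1e-4):
    """Numerical version of the bias index delta_1 in (9); lam stands for the
    unknown coefficient lambda(p), to be calibrated by simulation."""
    m = len(n_sizes)
    eta_bar = np.mean(n_sizes)
    s2_eta = np.var(n_sizes, ddof=1)           # S^2(eta)
    X = np.vstack(X_blocks)
    y = np.concatenate(y_blocks)
    beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

    def beta_of_eta(eta):
        # GLS estimate (8) with a common cluster size eta in the weights
        p = X_blocks[0].shape[1]
        A = np.zeros((p, p)); b = np.zeros(p)
        for Xi, yi in zip(X_blocks, y_blocks):
            xbar = Xi.mean(axis=0); ybar = yi.mean()
            w = eta**2 * g / (1.0 + eta * g)
            A += Xi.T @ Xi - w * np.outer(xbar, xbar)
            b += Xi.T @ yi - w * xbar * ybar
        return np.linalg.solve(A, b)

    # central finite difference of beta_hat(eta) at eta = eta_bar
    dbeta = (beta_of_eta(eta_bar + h) - beta_of_eta(eta_bar - h)) / (2 * h)
    trace_term = np.trace(lam * np.outer(dbeta, dbeta))
    return s2_eta / (eta_bar**2 * (m - 1)) / np.sum(beta_ols) * trace_term
```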

The departure from unbiasedness of the REML estimates of the variance components g and \(\sigma ^{2}\) can be measured by the displacement of the profile log-likelihood \(l_{R}^{*}\) when we introduce a perturbation vector \({\eta }=(\eta _{1},\ldots ,\eta _{m})^{\prime }\) into the likelihood that yields the OLS estimator \(\widehat{{\beta }}_{OLS}\). Recalling that when \(\eta \) is constant among subjects (clusters), say \(\overline{\eta }\), the model (1) gives \(\widehat{{\varvec{\beta }}}(\eta )\equiv \widehat{{\varvec{\beta }}}_{OLS}\), it suffices to take the likelihood displacement [18] and [1] of the profile log-likelihood \(l_{R}^{*}\) perturbed by the \(m\times 1\) vector \({\varvec{\eta }}\) to obtain a measure of the departure from unbiasedness. The influence of \({\varvec{\eta }}\) is then given by the “distance” of \(\widehat{{\varvec{\beta }}}(\eta )\) from \( \widehat{{\varvec{\beta }}}_{OLS}\). Formula (9) includes a coefficient of proportionality \(\lambda \), to be estimated, which depends on the non-linearity of the optimization procedure.

Proposition 3

Under the unbalanced LME model (1), and given the REML profile log-likelihood \(l_{R}^{*}\), the perturbation \(m\times 1\) vector \({\eta }=(\eta _{1},\ldots ,\eta _{m})^{\prime } \), with \(\overline{{\eta }}=(\overline{\eta },\ldots ,\overline{ \eta })^{\prime }\), and the likelihood displacement \(LD({\varvec{\eta }} )\):

$$\begin{aligned} LD({\varvec{\eta }})= & {} 2[l_{R}^{*}(\widehat{{\varvec{\beta }}}_{OLS}( \overline{{\varvec{\eta }}}),\widehat{g}_{R})-l_{R}^{*}(\widehat{{\varvec{\beta }}}({\varvec{\eta }}),\widehat{g}_{R})], \nonumber \\ l_{R}^{*}(\widehat{{\beta }}_{OLS}(\overline{{\eta }}), \widehat{g}_{R})= & {} \left. l_{R}^{*}({\beta }(\overline{{\eta }}))\right| _{{\beta =}\widehat{{\beta }}_{OLS},g= \widehat{g}_{R}}, \nonumber \\ l_{R}^{*}(\widehat{{\beta }}({\eta }),\widehat{g}_{R})= & {} \left. l_{R}^{*}({\beta }({\eta }))\right| _{{\beta =}\widehat{{\beta }},g=\widehat{g}_{R}} \end{aligned}$$
(10)

the sensitivity of the \(\widehat{{\varvec{\beta }}}_{OLS}\) estimate in the “direction” of \(\widehat{{\varvec{\beta }}}\) is given by the curvature \(\delta _{2}\) of the surface \(({\varvec{\omega }}^{\prime }, LD({\varvec{\omega }}))\) [1]:

$$\begin{aligned} \delta _{2}=\left. \frac{\partial ^{2}LD({\varvec{\omega }})}{\partial b^{2}}= \frac{\partial ^{2}LD(\overline{{\eta }}+b{\eta })}{\partial b^{2}}\right| _{b=0}, \end{aligned}$$
(11)

with \(LD({\omega })=\left. LD({\eta })\right| _{ {\eta =\omega }}\), \({\omega }=\ \overline{{\eta } }+b{\eta }=(\overline{\eta }+b\eta _{1},\ldots ,\overline{\eta }+b\eta _{m})^{\prime }\), and b a scalar. Furthermore, given the \(p\times m\) matrix:

$$\begin{aligned} \textbf{C}=\left\{ C_{ij}\right\} =\left\{ \frac{\partial ^{2}l_{R}^{*}( \widehat{{\beta }}({\eta }),\widehat{g}_{R})}{\partial \beta _{i}\partial \eta _{j}}\right\} , \end{aligned}$$

together with the negative information matrix of the model, of dimension \(p\times p\):

$$\begin{aligned} \textbf{I}(\beta )=\left\{ \frac{\partial ^{2}l_{R}^{*}(\widehat{{ \beta }}_{OLS}(\overline{{\eta }}),\widehat{g}_{R})}{\partial \beta _{i}\partial \beta _{j}}\right\} , \end{aligned}$$

and the \(m\times m\) matrix \(\textbf{D}=\textbf{C}^{\prime } \textbf{I}^{-1}(\beta )\textbf{C}\), the maximum curvature over unit vectors \({\eta }^{*}\), \(\left\| {\eta }^{*}\right\| =1\), is the value of \(\delta _{2}\), say \(\delta _{2}^{*}\), given by:

$$\begin{aligned} \delta _{2}^{*}=\max \left| {\eta }^{*\prime }\textbf{D}\eta ^{*}\right| . \end{aligned}$$

Then \(\delta _{2}^{*}\) is specified by the eigenvector \({\varvec{\eta }}^{*}\) corresponding to the largest eigenvalue \(\delta _{2}^{*}\) of \(\textbf{D}\). The set of \(n_{i}\in \mathbb {N} \) corresponding to the maximum normal curvature \(\delta _{2}^{*}\), \(\delta _{2}^{*}(\textbf{n})\), is then given by the vector \(\textbf{n}^{*}=(n_{1}^{*},\ldots ,n_{m}^{*})\) that attains \(\min \left\| m{\eta }^{*}-\textbf{n} \right\| \).

The Proof is provided in Appendix B.
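
When the profile restricted log-likelihood is available as a function of \(({\beta },{\eta })\), the maximum curvature \(\delta _2^{*}\) can be approximated numerically. The sketch below builds \(\textbf{C}\) and \(\textbf{I}(\beta )\) by central finite differences and extracts the leading eigenpair of \(\textbf{D}\); the callable lR and all names are our own assumptions:

```python
import numpy as np

def delta2_star(lR, beta_hat, eta, h=1e-4):
    """Maximum normal curvature delta_2^*: leading eigenpair of D = C' I^{-1} C,
    with C and I(beta) from finite differences of a user-supplied lR(beta, eta)."""
    p, m = len(beta_hat), len(eta)
    E_b, E_e = np.eye(p), np.eye(m)

    def mixed(i_vec, j_vec):
        # mixed second derivative of lR in the directions (i_vec, j_vec)
        return (lR(beta_hat + h * i_vec, eta + h * j_vec)
                - lR(beta_hat + h * i_vec, eta - h * j_vec)
                - lR(beta_hat - h * i_vec, eta + h * j_vec)
                + lR(beta_hat - h * i_vec, eta - h * j_vec)) / (4 * h * h)

    C = np.array([[mixed(E_b[i], E_e[j]) for j in range(m)] for i in range(p)])
    # Hessian of lR with respect to beta at fixed eta
    I = np.array([[(lR(beta_hat + h * (E_b[i] + E_b[j]), eta)
                    - lR(beta_hat + h * (E_b[i] - E_b[j]), eta)
                    - lR(beta_hat - h * (E_b[i] - E_b[j]), eta)
                    + lR(beta_hat - h * (E_b[i] + E_b[j]), eta)) / (4 * h * h)
                   for j in range(p)] for i in range(p)])
    D = C.T @ np.linalg.inv(I) @ C
    w, V = np.linalg.eigh((D + D.T) / 2)        # symmetrize before eigendecomposition
    k = np.argmax(np.abs(w))
    return w[k], V[:, k]                        # delta_2^* and the direction eta^*
```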

The Fay-Herriot model described in Sect. 2 can be viewed as a particular case of the block-diagonal LME model in (1) with one observation per subject (cluster), i.e., \( n_{i}=n_{i^{\prime }}=1,\ \Sigma _{i=1}^{m}n_{i}=m\), a balanced design, and:

$$\begin{aligned} y_{i}= & {} \textbf{x}_{i}^{\prime }{\beta +}u_{i}+e_{i},\ \ \ \ i=1,\ldots ,m,\ \ \ \ \ u_{i}\sim N(0,\sigma _{u}^{2}),\ \ \ e_{i}\overset{ind}{\sim }N(0,\psi _{i}) \nonumber \\ \textbf{y}= & {} \textbf{X}\beta +\mathbf {Zu+e}, \textbf{y}=\underset{ 1\le i\le m}{\hbox {col}}(y_{i}),\ \ \ \textbf{X}=\underset{1\le i\le m}{ \hbox {col}}(\textbf{x}_{i}^{\prime }),\ \ \ \ \mathbf {Z=I}_{m}, \nonumber \\ \textbf{u}= & {} \underset{1\le i\le m}{\hbox {col}}(u_{i}),\ \ \ \ \ \textbf{e }=\underset{1\le i\le m}{\hbox {col}}(e_{i}),\ \ \ \ V_{i}=(\sigma _{u}^{2}+\psi _{i}),\ \ \ \ \textbf{V}=\underset{1\le i\le m}{diag}(V_{1},\ldots ,V_{m}) \end{aligned}$$
(12)

The restricted version of the log-likelihood of the FH model and the GLS fixed-effects estimator are then [18]:

$$\begin{aligned} l_{R}(\sigma _{u}^{2})= & {} const.-\frac{1}{2}\Sigma _{i=1}^{m}\log \left| V_{i}\right| -\frac{1}{2}\log \left| \Sigma _{i=1}^{m}\textbf{x}_{i} \textbf{x}_{i}^{\prime }V_{i}^{-1}\right| -\frac{1}{2}\Sigma _{i=1}^{m}(y_{i}-\textbf{x}_{i}^{\prime }{\beta })^{\prime }V_{i}^{-1}(y_{i}-\textbf{x}_{i}^{\prime }{\beta })\nonumber \\ \widehat{{\beta }}= & {} \left[ \Sigma _{i=1}^{m}\textbf{x}_{i}\textbf{x} _{i}^{\prime }V_{i}^{-1}\right] ^{-1}\left[ \Sigma _{i=1}^{m}\textbf{x} _{i}y_{i}V_{i}^{-1}\right] \nonumber \\= & {} \left[ \Sigma _{i=1}^{m}\textbf{x}_{i}(\sigma _{u}^{2}+\psi _{i})^{-1}\textbf{x}_{i}^{\prime }\right] ^{-1}\left[ \Sigma _{i=1}^{m}\textbf{x}_{i}(\sigma _{u}^{2}+\psi _{i})^{-1}y_{i}\right] . \end{aligned}$$
(13)

The estimator in (13) can be viewed as a special case of the estimator in (6), with:

$$\begin{aligned} \textbf{X}_{i}= & {} \textbf{x}_{i}^{\prime },\ \ \ \ \overline{\textbf{x}}_{i}= \textbf{x}_{i},\ \ \ \ \overline{y}_{i}=y_{i},\ \ \ \ g=\sigma _{u}^{2}, \\ n_{i}= & {} n_{i^{\prime }}=1,\ \ \ \Sigma _{i=1}^{m}n_{i}=m, \end{aligned}$$

and setting \(d_{i}=1-(\sigma _{u}^{2}+\psi _{i})^{-1}\), the estimator \( \widehat{{\beta }}\) for the FH model becomes:

$$\begin{aligned} \widehat{{\beta }}=\left[ \Sigma _{i=1}^{m}(\textbf{x}_{i}\textbf{x} _{i}^{\prime }-d_{i}\textbf{x}_{i}\textbf{x}_{i}^{\prime })\right] ^{-1} \left[ \Sigma _{i=1}^{m}(\textbf{x}_{i}y_{i}-d_{i}\textbf{x}_{i}y_{i})\right] . \end{aligned}$$

Now, if in the estimator (8) we have \(\eta _{i}=\alpha _{i},\ \ g=\sigma _{u}^{2}\), we get:

$$\begin{aligned} \frac{\eta _{i}^{2}g}{1+\eta _{i}g}\equiv d_{i}=\frac{\alpha _{i}^{2}\sigma _{u}^{2}}{1+\alpha _{i}\sigma _{u}^{2}}. \end{aligned}$$

Suppose we have \({\alpha }^{*}=(\alpha _{1}^{*},\ldots ,\alpha _{m}^{*})^{\prime }\), as a positive solution of \(\alpha _{i}^{2}\sigma _{u}^{2}-d_{i}\alpha _{i}\sigma _{u}^{2}-d_{i}=0\), then:

$$\begin{aligned} d_{i}^{*}= & {} \frac{(\alpha _{i}^{*})^{2}\sigma _{u}^{2}}{1+\alpha _{i}^{*}\sigma _{u}^{2}}=1-(\sigma _{u}^{2}+\psi _{i})^{-1} \\\longrightarrow & {} \psi _{i}=\frac{1+\alpha _{i}^{*}\sigma _{u}^{2}}{ 1+\alpha _{i}^{*}\sigma _{u}^{2}-(\alpha _{i}^{*})^{2}\sigma _{u}^{2} }-\sigma _{u}^{2} \end{aligned}$$
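
In practice the positive root \(\alpha _i^{*}\) follows directly from the quadratic above. A small numpy sketch, assuming \(d_i>0\) (i.e., \(\sigma _u^2+\psi _i>1\)) so that a positive root exists:

```python
import numpy as np

def alpha_from_psi(psi, sigma2_u):
    """Positive root alpha_i* of alpha^2*s2 - d_i*alpha*s2 - d_i = 0, with
    d_i = 1 - 1/(sigma_u^2 + psi_i): maps FH sampling variances to the
    equivalent 'cluster sizes' entering (8). Assumes sigma_u^2 + psi_i > 1."""
    d = 1.0 - 1.0 / (sigma2_u + psi)
    disc = (d * sigma2_u)**2 + 4.0 * sigma2_u * d   # discriminant of the quadratic
    return (d * sigma2_u + np.sqrt(disc)) / (2.0 * sigma2_u)
```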

Proposition 4

Under the Fay-Herriot model in (12), and \( d_{i}=1-(\sigma _{u}^{2}+\psi _{i})^{-1}=\frac{\alpha _{i}^{2}\sigma _{u}^{2} }{1+\alpha _{i}\sigma _{u}^{2}}\), \(\alpha _{i}\in U\), \(U\subset \mathbb {R}:\) \(0<\alpha _{i}\le m\overline{\alpha }\) with \(\overline{\alpha } =\mu (\alpha )\) and \(S^{2}(\alpha )\) as the variance of the \( \alpha _{i}\)’s, we have that:

$$\begin{aligned} \widehat{{\beta }}({\alpha })= & {} \left[ \Sigma _{i=1}^{m}( \textbf{x}_{i}\textbf{x}_{i}^{\prime }-d_{i}\textbf{x}_{i}\textbf{x} _{i}^{\prime })\right] ^{-1}\left[ \Sigma _{i=1}^{m}(\textbf{x} _{i}y_{i}-d_{i}\textbf{x}_{i}y_{i})\right] , \nonumber \\ \delta _{1}= & {} \frac{S^{2}(\alpha )(\widehat{{\beta }}_{OLS}^{\prime } \textbf{1}_{p})^{-1}}{\overline{\alpha }^{2}(m-1)}tr\left[ \lambda (p)\times var(\widehat{{\beta }}({\alpha })-\widehat{{\beta }} _{OLS})\right] . \end{aligned}$$
(14)

When \(\alpha ^{*}=(\alpha _{1}^{*},\ldots ,\alpha _{m}^{*})^{\prime }=(\overline{\alpha },\ldots ,\overline{\alpha })^{\prime }\), \( d_{i}=d\), \(\widehat{{\beta }}(\alpha )\equiv \widehat{{ \beta }}_{OLS}\), and constant \(\psi =(\psi _{1},\ldots ,\psi _{m})=(\psi ,\ldots ,\psi )\), under normality we get:

$$\begin{aligned} \Sigma _{i=1}^{m}(1-d)(y_{i}-\textbf{x}_{i}^{\prime }\widehat{{\beta }})^{2}\sim \chi _{m-p}^{2}. \end{aligned}$$
(15)

The Proof is given in Appendix C.

3.2 A measure of the predictor efficiency

Since in applied research and simulation experiments the appraisal of model fitting is essential, we introduce here a normalized index as a measure of efficiency. This index takes the form of an average correlation coefficient, with values ranging between 0 and 1, where 1 denotes maximum efficiency and 0 total inefficiency.

Since in area-level models the residual error variance does not have to be estimated, some interesting consequences arise for the measurement of the overall efficiency of the predictor.

In the frame of area-level SAE models, we adapt the general linear mixed model \({\varvec{y}}= {\varvec{X}}{\varvec{\beta }} + {\varvec{Z}}{\varvec{u}} + {\varvec{e}}\), with \(var({\varvec{y}})={\varvec{V}}={\varvec{Z}}{\varvec{V}}_{u} {\varvec{Z}}^{\prime } + {\varvec{V}}_{e}= {\varvec{V}}_{u} + {\varvec{V}}_{e}\), where \({\varvec{Z}}={\varvec{I}}_m\), \({\varvec{V}}_{u}=var({\varvec{u}})=\sigma ^2_u{\varvec{I}}_m\), \({\varvec{V}}_{e}=var({\varvec{e}})=\underset{1\le i \le m}{diag}(\psi _{1}, \dots , \psi _{m})\) is the matrix of the sampling variances, \({\varvec{P}}={\varvec{V}}^{-1}({\varvec{I}}-{\varvec{P}}_{{\varvec{X}}})\) is the projection matrix onto the residual complement subspace of \({{\varvec{y}}}\), and \({\varvec{P}}_{{\varvec{X}}} = {\varvec{X}} ({\varvec{X}}^{\prime }{\varvec{V}}^{-1}{\varvec{X}})^{-1}{\varvec{X}}^{\prime }{\varvec{V}}^{-1}\) is the projection matrix onto the column space of \({\varvec{X}}\).

Considering the following relations

$$\begin{aligned} \widehat{{{\varvec{y}}}}^{blup}= & {} {{\varvec{y}}}-{{\varvec{V}}}_{e}{{\varvec{P}}}{{\varvec{y}}} \\ MSE(\widehat{{{\varvec{y}}}}^{blup})= & {} {{\varvec{V}}}_{e}-{{\varvec{V}}}_{e}{{\varvec{P}}}{{\varvec{V}}}_{e}, \end{aligned}$$

the conditional residuals \({{\varvec{r}}}^{c} = {{\varvec{y}}} - \widehat{{{\varvec{y}}}}^{blup}\) are given by:

$$\begin{aligned} {{\varvec{r}}}^{c}= & {} {{\varvec{y}}}-\widehat{{{\varvec{y}}}}^{blup}={{\varvec{V}}}_{e}{{\varvec{P}}}{{\varvec{y}}}, ~ ~ \text {and} \\ var({{\varvec{r}}}^{c})= & {} var({{\varvec{y}}}-\widehat{{{\varvec{y}}}}^{blup})= {{\varvec{V}}}_{e}{{\varvec{P}}}{{\varvec{V}}}{{\varvec{P}}}{{\varvec{V}}}_{e}={{\varvec{V}}}_{e}{{\varvec{P}}}{{\varvec{V}}}_{e}. \end{aligned}$$

The covariance between \({{\varvec{r}}}^{c}\) and \({{\varvec{e}}}\) is given by:

$$\begin{aligned} cov({{\varvec{r}}}^{c},{{\varvec{e}}})= & {} cov({{\varvec{V}}}_{e}{{\varvec{P}}}{{\varvec{y}}},{{\varvec{e}}})={{\varvec{V}}}_{e}{{\varvec{P}}}cov({{\varvec{y}}},{{\varvec{e}}}) \\= & {} {{\varvec{V}}}_{e}{{\varvec{P}}}{{\varvec{V}}}_{e}={{\varvec{V}}}_{e}-MSE(\widehat{{{\varvec{y}}}}^{blup})=var({{\varvec{r}}}^{c}). \end{aligned}$$

Denoting by \(c_{ij}\), \(i,j=1,\dots ,m\), \(i \ne j\), the generic off-diagonal element of \(cov({{\varvec{r}}}^{c},{\varvec{e}})\), where the indices indicate the small areas, and by \(c_{ii}\) the diagonal one, we have:

$$\begin{aligned}{}[cov({{\varvec{r}}}^{c},{{\varvec{e}}})]_{ii}= & {} c_{ii}=\psi _{i}-g_{1,i}(\widehat{ {\theta } })-g_{2,i}(\widehat{{\theta }})\nonumber \\ corr({r}_{i}^{c},{e}_{j})= & {} \frac{cov({r}_{i}^{c},{e}_{j})}{\sqrt{ var({r}_{i}^{c})var({e}_{j})}}=\frac{c_{ij}}{\sqrt{c_{ii}\psi _{j}}}, \end{aligned}$$
(16)

where in case of \(i=j\) it holds \(corr({r}_{i}^{c},{e}_{i})=\sqrt{\frac{c_{ii}}{\psi _{i}}}\).

Following [14], a measure of the predictor efficiency, based on \(tr[cov({\varvec{r}}^{c},{\varvec{e}})]\), is then introduced as:

$$\begin{aligned} {\varepsilon } =\frac{1}{m}\sum _{i}corr(r_i^{c},e_i)=\frac{1}{m}\sum _{i} \sqrt{\frac{c_{ii}}{\psi _{i}}},\quad 0\le {\varepsilon } \le 1. \end{aligned}$$

This measure is closely linked to the capability of the model to fit the data and the sampling variances. In fact, when this relationship is linear, the MSE can “replicate” the behavior of the sampling variances, translating into higher efficiency as the value of the sampling variances increases.
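
The quantities of this subsection translate directly into code. Below is a minimal numpy sketch for a given \(\widehat{\sigma }_u^2\), with a clipping guard of ours against small negative diagonal values caused by rounding:

```python
import numpy as np

def efficiency_index(X, psi, sigma2_u):
    """Efficiency index eps = (1/m) sum_i sqrt(c_ii/psi_i), with c_ii the
    diagonal of cov(r^c, e) = V_e P V_e."""
    m = len(psi)
    V = np.diag(sigma2_u + psi)                            # V = V_u + V_e
    Ve = np.diag(psi)                                      # sampling variances
    Vinv = np.linalg.inv(V)
    PX = X @ np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv)   # projection onto col(X)
    P = Vinv @ (np.eye(m) - PX)
    c_ii = np.diag(Ve @ P @ Ve)                            # diagonal of cov(r^c, e)
    return np.mean(np.sqrt(np.clip(c_ii, 0.0, None) / psi))

# For the no-covariate ANOVA case of Sect. 4, take X = np.ones((m, 1)).
```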

4 Simulations

In this section, three simulations are carried out to evaluate the impact of the distribution of the sampling variances on the small area predictor, in terms of increases or decreases in the performance measures. In most applications, when a direct estimator is available, the area variances of the direct estimator depend on the sample size given by the design, together with the variance of the characteristic of interest in that area. The aim of the simulation experiments is to study the behavior of the linear predictor of the FH model when focusing on the shape of the empirical distribution of a set of sampling variances. The cases investigated represent, in our opinion, realistic ways in which the heterogeneity of the sampling variances may arise.

The behavior of the sampling variances is assessed as we pass from a positively asymmetric distribution to a quasi-symmetric one. All the simulations are based on artificial data, but they differ in the relation linking the sampling variances and the parameter of interest.

A fundamental point in simulations with artificial data is to preserve the proportionality that exists between the value of the small area parameter and its variability. To do this, for each domain of interest \(i=1,\dots ,m\), we first generate the sampling standard errors \(\sqrt{\psi _i}\) from a Beta distribution, and then obtain the value of the parameter (hereafter denoted \(\bar{y}_i\)): in the first simulation the two are linked by a relationship that makes the latter follow a Beta distribution of the second kind, while in the second simulation they are linked by a linear relation. In the third simulation, instead, no relationship is assumed between the parameter of interest and the sampling standard errors.

Furthermore, the use of the Beta distribution allows control of the data-generating process, with regard to setting the range [a, b], and also makes it possible to analyze the relationship between the variance and the asymmetry, which in a Beta distribution are inversely proportional.

In each simulation, three scenarios are set: the first represents the case of strong positive asymmetry; the second, although it remains a case of positive asymmetry, has the same variability as the third, which is set to be the quasi-symmetric scenario.

We treat the case of the FH model without auxiliary variables, that is, the random-effects ANOVA model \(y_i = u_i + e_i\), with \( u_i \sim N(0, \sigma ^2_{u})\) and \(e_i \sim N(0, \psi _i)\).

4.1 Simulation 1

From the expression \(\sqrt{\psi _{i}}= \frac{\bar{y}_{i} - \delta _1}{(\delta _2 - \delta _1)} \left( 1-\sqrt{\psi _{i}}\right) \), with fixed \(\delta _2\) and \(\delta _1\), the mean of the population parameter is obtained as \(\bar{y}_{i}=\delta _1 + (\delta _2 - \delta _1) \frac{\sqrt{\psi _{i}}}{\left[ 1 - \sqrt{\psi _{i}}\right] }\).

Assuming the sampling standard errors follow a Beta distribution with parameters \(\alpha \) and \(\beta \), \(\sqrt{\psi _{i}}\sim Beta(\alpha , \beta )\), it follows that \(\frac{\sqrt{\psi _{i}}}{\left[ 1 - \sqrt{\psi _{i}}\right] }=\frac{Beta(\alpha , \beta )}{[1 - Beta(\alpha , \beta )]}\) follows a Beta distribution of the second kind. The relation between the sampling standard errors and the population mean parameter is therefore non-linear.

The population mean \(\bar{y}_{i}\) then follows an exact Beta distribution of the second kind, shifted by \(\delta _1\) and scaled by \((\delta _2 - \delta _1)\), with mean \(E(\bar{y}_i) = \delta _1 + (\delta _2 - \delta _1)\frac{\alpha }{(\beta - 1)}\) for \(\beta > 1\), and variance \(V\left( \bar{y}_{i}\right) = (\delta _2 - \delta _1)^2 \frac{\alpha (\alpha + \beta - 1)}{(\beta - 2)(\beta - 1)^2}\) for \(\beta > 2\).

Simulation 1 has the following steps:

  1.

    For \(k=1,2,3\) set the following combinations of parameters \(\alpha _k=(1,2,3)\), \(\beta _k=(7,2.69,3)\) to obtain different scenarios of skewness (a condensed code sketch follows this list).

  2.

    Denote the domains of interest as \(i=1,\dots ,m\) with \(m=200\), and define the sampling standard errors as \(\sqrt{\psi _i^{(k)}} \sim a+(b-a)Beta(\alpha _k, \beta _k)\) in the range \([a=0, b=1]\). Generate random samples of the sampling variances \(\psi _{i}^{(k)}\) for each skewness scenario k.

  3.

    Calculate the population parameter \(\bar{y}_{i}^{(k)}=\delta _1 + (\delta _2 - \delta _1) \frac{\sqrt{\psi _{i}^{(k)}}}{\left[ 1 - \sqrt{\psi _{i}^{(k)}}\right] }\), with \(\sqrt{\psi _{i}^{(k)}}\) generated in the previous step and with fixed \(\delta _1=0\) and \(\delta _2=3\).

  4.

    Repeat \(L=10^3\) times (\(l=1,\ldots ,L\)):

    4.1

      Generate \(\psi ^{*(k,l)}_i \sim N(\psi _i^{(k)}, 0.0001)\).

    4.2

      Calculate the parameter in the l-th sample replicate as \(\bar{y}_{i}^{*(k,l)}=\delta _1 + (\delta _2 - \delta _1) \frac{\sqrt{\psi _{i}^{*(k,l)}}}{\left[ 1 - \sqrt{\psi _{i}^{*(k,l)}}\right] }\).

    4.3

      Fit the ANOVA model to the simulated data and obtain the predictor \(\tilde{y}_i^{(k,l)}\).

    4.4

      Calculate the \(mse\left( \tilde{y}_i^{(k,l)} \right) \).

  5.

    Calculate the following performance measures, BIAS and Root MSE (RMSE):

    $$\begin{aligned} BIAS_{i}^{(k)}=\frac{1}{L}\sum _{l=1}^{L}(\tilde{y}_{i}^{(k,l)}-\bar{y}_i^{(k)}),\quad RMSE_{i}^{(k)}=\bigg (\frac{1}{L}\sum _{l=1}^{L}(\tilde{y}_{i}^{(k,l)}-\bar{y}_{i}^{(k)})^2\bigg )^{1/2}, \end{aligned}$$
  6.

    Calculate the corresponding relative performance measures in %, i.e., the Average Absolute Relative BIAS (AARBIAS) and the Average Absolute Relative Root MSE (AARRMSE):

    $$\begin{aligned} RBIAS_{i}^{(k)}= & {} 100 \frac{BIAS_{i}^{(k)}}{\bar{\bar{y}}_i^{(k)}}, \ RRMSE_{i}^{(k)}=100 \frac{RMSE_{i}^{(k)}}{\bar{\bar{y}}_i^{(k)}}, \ \text {with} \ \bar{\bar{y}}_i^{(k)}=\frac{1}{L}\sum _{l=1}^{L}\bar{y}_i^{(k)}\\ AARBIAS^{(k)}= & {} \frac{1}{m}\sum _{i=1}^{m}|RBIAS_i^{(k)}|, \ AARRMSE^{(k)}=\frac{1}{m}\sum _{i=1}^{m}|RRMSE_i^{(k)} |. \end{aligned}$$
  7.

    Calculate the average of the estimated MSE of the predictor, \(mse(\tilde{y}_{i}^{(k,l)})\), denoted as Average Root MSE (ARMSE):

    $$\begin{aligned} ARMSE^{(k)} = \frac{1}{m} \sum _{i=1}^{m} \sqrt{ \frac{1}{L} \sum _{l=1}^{L} mse\left( \tilde{y}_i^{(k,l)} \right) }. \end{aligned}$$
  8.

    Calculate the efficiency measure introduced in Sect. 3.2:

    $$\begin{aligned} {\varepsilon ^{(k)}} =\frac{1}{m}\sum _{i=1}^{m}corr(r_i^{c},e_i)^{(k)}=\frac{1}{mL}\sum _{i=1}^{m}\sum _{l=1}^{L}\sqrt{\frac{c_{ii}^{(k,l)}}{\psi ^{*(k,l)}_i}}, \quad 0\le {\varepsilon ^{(k)}} \le 1. \end{aligned}$$
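
The generation scheme of steps 1-4 can be condensed as follows for scenario \(k=1\). This is a self-contained sketch, not the exact code used in the paper: a moment-type variance estimate and an unweighted mean stand in for the REML/ANOVA fit, the normal perturbation of step 4.1 is interpreted as having variance 0.0001, and the clipping guards are ours.

```python
import numpy as np

rng = np.random.default_rng(1)
m, L = 200, 1000
a_k, b_k = 1.0, 7.0                  # Beta parameters for scenario k=1
d1, d2 = 0.0, 3.0                    # delta_1, delta_2

sqrt_psi = rng.beta(a_k, b_k, size=m)                 # step 2: sampling standard errors
ybar = d1 + (d2 - d1) * sqrt_psi / (1 - sqrt_psi)     # step 3: population parameters

bias = np.zeros(m)
mse_mc = np.zeros(m)
for _ in range(L):
    # step 4.1: perturbed variances (sd 0.01 = sqrt(0.0001)); clipped to (0, 1)
    # as a guard of ours so that the square roots below stay valid
    psi_l = np.clip(rng.normal(sqrt_psi**2, 0.01), 1e-8, 0.99)
    sp = np.sqrt(psi_l)
    ybar_l = d1 + (d2 - d1) * sp / (1 - sp)           # step 4.2: replicate parameter
    # step 4.3: ANOVA-type FH fit without covariates; a moment-type variance
    # estimate and an unweighted mean stand in for the paper's REML fit
    s2u = max(np.var(ybar_l, ddof=1) - psi_l.mean(), 0.0)
    gamma = s2u / (s2u + psi_l)
    pred = gamma * ybar_l + (1 - gamma) * ybar_l.mean()
    bias += (pred - ybar) / L
    mse_mc += (pred - ybar)**2 / L

aarbias = 100 * np.mean(np.abs(bias / ybar))          # steps 5-6
aarrmse = 100 * np.mean(np.abs(np.sqrt(mse_mc) / ybar))
print(aarbias, aarrmse)
```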

Table 1 presents the results of Simulation 1 for the performance measures of the predictor in three skewness scenarios for the distribution of the sampling variances. For the considered scenarios k, the corresponding variances of the Beta distribution of \(\sqrt{\psi _i^{(k)}}\) are respectively 0.01, 0.04 and 0.04, while the corresponding variances of the Beta distribution of the second kind are respectively 0.04, 3.75 and 3.75. The results show that the relative BIAS and the relative MSE, as well as the efficiency measure \(\varepsilon \), tend to decrease as the distribution of the sampling variances moves from a skewed towards a quasi-symmetric shape. For the estimated mse of the predictor the behaviour is the opposite.

Table 2 shows the results of Simulation 1 when the sampling variances are smoothed by a Generalized Variance Function (GVF) approach [4]. Given the i-th area parameter estimate \(\bar{y}_{i}\), the GVF gives the smoothed variances \(\psi _{i}^{GVF}=\widehat{\alpha }_{0}\bar{y}_{i}^{2}+\widehat{\alpha }_{1}\bar{y}_{i}\) through the ordinary least squares regression \(cv^{2}(\bar{y}_{i})=\frac{\psi _{i}}{\bar{y}_{i}^{2}}=\alpha _{0}+\frac{\alpha _{1}}{\bar{y}_{i}}\).
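
A sketch of the GVF smoothing just described, assuming the direct estimates and their variances are available as numpy arrays:

```python
import numpy as np

def gvf_smooth(ybar, psi):
    """GVF smoothing: OLS fit of cv^2 = psi/ybar^2 = a0 + a1/ybar, then
    psi_GVF = a0*ybar^2 + a1*ybar (back-transformed fitted values)."""
    cv2 = psi / ybar**2
    Z = np.column_stack([np.ones_like(ybar), 1.0 / ybar])
    a0, a1 = np.linalg.lstsq(Z, cv2, rcond=None)[0]
    return a0 * ybar**2 + a1 * ybar
```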

Fig. 1 reports the histograms of the distributions of \(\sqrt{\psi _i^{(k)}}\) in the three scenarios, from asymmetry to quasi-symmetry.

Table 1 Performance measures of \(\tilde{y}^{*(k)}\) in each scenario of skewness \(k=1,2,3\) with parameters \(\alpha _k=(1,2,3)\) and \(\beta _k=(7,2.69,3)\), when a non-linear relation between \(\bar{y}_i\) and \(\psi _i\) is assumed
Table 2 Performance measures of \(\tilde{y}^{*(k)}\) in each scenario of skewness \(k=1,2,3\) with parameters \(\alpha _k=(1,2,3)\) and \(\beta _k=(7,2.69,3)\) when a non-linear relation between \(\bar{y}_i\) and \(\psi _i\) is assumed and smoothed sampling variances \(\psi _{i}^{GVF}\) are applied
Fig. 1

Distributions of the \(\sqrt{\psi _i^{(k)}} \sim Beta(\alpha _k, \beta _k)\) with \(\alpha _k=(1,2,3)\) and \(\beta _k=(7,2.69,3)\) in different scenarios of skewness: \(k=1\) (left) extreme asymmetry, \(k=2\) (center) asymmetry, \(k=3\) (right) quasi-symmetry

4.2 Simulation 2

Denote by \(\sqrt{\psi _i} \sim Beta(\alpha , \beta )\) the sampling standard errors, distributed as a Beta with parameters \(\alpha \) and \(\beta \). In order to simulate a realistic situation in which there is a proportional relation between the parameter of interest and the sampling variances, the population mean parameter is here set in a linear relation with the sampling standard errors: \(\bar{y}_i=\delta _1+\delta _2\sqrt{\psi _i}\), with known coefficients \(\delta _1\) and \(\delta _2\).

Simulation 2 has the following steps:

  1.

    For \(k=1,2,3\) set the following combinations of parameters \(\alpha _k=(1,1,3)\), \(\beta _k=(7,3.1,3)\) to obtain different scenarios of skewness.

  2.

    Denote the domains of interest as \(i=1,\dots ,m\) with \(m=200\). Define the sampling standard errors as \(\sqrt{\psi _i^{(k)}} \sim a+(b-a)Beta(\alpha _k, \beta _k)\) in the range \([a=0, b=1]\) and generate random samples of the sampling variances \(\psi _i^{(k)}\) for each skewness scenario k.

  3.

    Calculate the population parameter \(\bar{y}_i^{(k)}=\delta _1+\delta _2\sqrt{\psi _i^{(k)}}\) with \(\delta _1=1\), \(\delta _2=3\) (a code sketch follows this list).

  4.

    Repeat \(L=10^3\) times (\(l=1,\ldots ,L\)):

    4.1

      Generate \(\psi ^{*(k,l)}_i \sim N(\psi _i^{(k)}, 0.0001)\).

    4.2

      Calculate the sampling mean parameter as \(\bar{y}_i^{(k,l)}=\delta _1+\delta _2\sqrt{\psi _i^{*(k,l)}}\).

    4.3

      Fit the ANOVA model to the data and obtain the empirical predictor \(\tilde{y}_i^{(k,l)}\).

    4.4

      Calculate the \(mse\left( \tilde{y}_i^{(k,l)} \right) \).

  5.

    Calculate the performance measures BIAS and RMSE, the corresponding relative performance measures in % (AARBIAS and AARRMSE), together with the Average Root MSE of the predictor and the efficiency measure \(\varepsilon \), following the expressions in Simulation 1 (steps 5 to 8).
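
Compared with Simulation 1, only the generation of the population parameter changes; a minimal sketch of steps 2-3 for scenario \(k=1\):

```python
import numpy as np

rng = np.random.default_rng(2)
m = 200
d1, d2 = 1.0, 3.0                        # delta_1, delta_2
a_k, b_k = 1.0, 7.0                      # Beta parameters for scenario k=1
sqrt_psi = rng.beta(a_k, b_k, size=m)    # step 2: sampling standard errors
ybar = d1 + d2 * sqrt_psi                # step 3: linear relation of Simulation 2
```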

Table 3 presents the results of Simulation 2 for the different scenarios on the distribution of the sampling variances. For the considered skewness scenarios k, the corresponding variances of the Beta distribution of \(\sqrt{\psi _i^{(k)}}\) are respectively 0.01, 0.04 and 0.04; note that between scenarios \(k=2\) and \(k=3\) there is no difference in the variability of the distribution of \(\sqrt{\psi _i}\). As the variance of the distribution of the sampling variances increases, passing from scenario \(k=1\) to scenario \(k=2\) or \(k=3\), the relative measures of MSE and BIAS, as well as the estimated mse of the predictor, tend to increase, and the same holds for the efficiency measure \(\varepsilon \).

Table 3 Performance measures of \(\tilde{y}^{(k)}\) in each scenario of skewness \(k=1,2,3\) with parameters \(\alpha _k=(1,1,3)\) and \(\beta _k=(7,3.1,3)\), when a linear relation between \(\bar{y}_i\) and \(\psi _i\) is assumed

4.3 Simulation 3

In this simulation no relation is assumed between the population mean parameter and the sampling standard errors, as often happens in practice. Denote by \(\sqrt{\psi _i} \sim Beta(\alpha , \beta )\) the sampling standard errors, distributed as a Beta with parameters \(\alpha \) and \(\beta \). With known parameters \(\mu _{\bar{y}}=3\) and \(\sigma ^2_{\bar{y}}=0.01\), the population mean is set as \(\bar{y}_i \sim N (\mu _{\bar{y}}, \sigma ^2_{\bar{y}})\).

Simulation 3 has the following steps:

  1.

    For \(k=1,2,3\) set the following combinations of parameters \(\alpha _k=(1,1,3)\), \(\beta _k=(7,3.1,3)\) to obtain different scenarios of skewness.

  2.

    Denote the domains of interest as \(i=1,\dots ,m\) with \(m=200\). Define the sampling standard errors as \(\sqrt{\psi _i^{(k)}} \sim a+(b-a)Beta(\alpha _k, \beta _k)\) in the range \([a=0, b=1]\) and generate random samples of the sampling variances \(\psi _i^{(k)}\) for each skewness scenario k.

  3.

    Generate the population mean \(\bar{y}^{(k)}_i \sim N (\mu _{\bar{y}}, \sigma ^2_{\bar{y}})\).

  4.

    Repeat \(L=10^3\) times (\(l=1,\ldots ,L\)):

    4.1

      Generate \(\psi ^{*(k,l)}_i \sim N(\psi _i^{(k)}, 0.0001)\).

    4.2

      Generate the sampling replicates of the mean as \(\bar{y}_i^{(k,l)} \sim N (\mu _{\bar{y}}, 0.5)\).

    4.3

      Fit the ANOVA model to the sampling data and obtain the empirical predictor \(\tilde{y}_i^{(k,l)}\).

    4.4

      Calculate the \(mse\left( \tilde{y}_i^{(k,l)} \right) \).

  5.

    Calculate the performance measures BIAS and RMSE, the corresponding relative performance measures in % (AARBIAS and AARRMSE), together with the Average Root estimated MSE of the predictor and the efficiency measure \(\varepsilon \), following the expressions in Simulation 1 (steps 5 to 8).

Table 4 presents the results of Simulation 3 for the different scenarios on the distribution of the sampling variances, when no relation with the parameter of interest is assumed. When we pass from scenario \(k=1\) to scenario \(k=2\) or \(k=3\), the relative measures of BIAS and MSE, as well as the estimated mse of the predictor, tend to decrease. For the efficiency measure \(\varepsilon \) the behaviour is the opposite.

Table 4 Performance measures of \(\tilde{y}^{(k)}\) in each scenario of skewness \(k=1,2,3\) with parameters \(\alpha _k=(1,1,3)\) and \(\beta _k=(7,3.1,3)\), when no relation between \(\bar{y}_i\) and \(\psi _i\) is assumed

4.4 Simulation results

The simulation experiments highlight different behaviors of the bias, the mean squared error, and the efficiency of the linear predictor when the shape of the empirical distribution of the sampling variances ranges from asymmetric to symmetric. At the same time, they show what happens under different kinds of relations between the area parameters and the corresponding sampling variances. Three such relations are studied: non-linear, linear, and no relation, the last meaning that the sampling variances vary randomly given the area parameters. Table 1 shows that, in the case of a non-linear relationship between the sampling variances and the area parameter estimates (Simulation 4.1), both the AARBIAS and the AARRMSE decrease, while the prediction ARMSE increases. In this first simulation experiment, because the non-linear relationship between the sampling variances and the area parameters is based on a Beta distribution of the second kind, i.e. \(\frac{\sqrt{\psi _{i}}}{1-\sqrt{\psi _{i}}}\sim Beta^{*}(\alpha ,\beta )\), with \(\sqrt{\psi _{i}}\sim Beta(\alpha ,\beta )\) in the range (0, 1), in some cases we may observe \(\sqrt{\psi _{i}}\longrightarrow 1\), causing the corresponding generated area parameters to increase considerably. In particular, the cases in which \(\sqrt{\psi _{i}}\) approaches 1 are those in which the sampling variances are more heterogeneous and more sparse (i.e., passing from scenario 1 to scenario 3). In these cases, the variance component estimates of the FH model also increase, leading to a numerically relevant value of the leading \(g_1\) component of the prediction mean squared error. In fact, when \(\widehat{\sigma }_{u}^{2}\) increases, the value of \(\widehat{\gamma }_{i}=\frac{\widehat{\sigma }_{u}^{2}}{\widehat{\sigma }_{u}^{2}+\psi _{i}}\) for the i-th area approaches 1; since \(g_{1}(i)=\widehat{\gamma } _{i}\psi _{i}=\frac{\widehat{\sigma }_{u}^{2}\psi _{i}}{\widehat{\sigma } _{u}^{2}+\psi _{i}}\), the mean squared error tends to \(\psi _{i}\). The efficiency index \(\varepsilon \) in Simulation 4.1 decreases when going from an asymmetric to a symmetric shape (Fig. 1): the prediction mean squared error increases, so that \(c_{ii}=cov(r_{i}^{c},e_{i})\) decreases, together with the overall value of the index \(\varepsilon \). Table 2 shows the same experiment as Simulation 4.1 when the sampling variances are smoothed by the GVF approach. Even with GVF smoothing, the behavior of the indexes under investigation remains approximately the same; the only difference is the magnitude of the numerical outcomes. All the indexes are larger, for every level of asymmetry considered, suggesting that this type of GVF smoothing may worsen the estimation of the linear predictor.

Simulation 4.2 considers the same asymmetry scenarios described in Simulation 4.1, but the relation between the sampling variances and the area-level parameter estimates is now linear, with the sampling variances increasing with the values of the area parameter estimates. Table 3 summarizes the findings of the experiment. All the performance measures considered increase when going from asymmetry (scenario \(k=1\)) to symmetry (scenario \(k=3\)) of the shape of the empirical distribution of the randomly generated sampling variances, under the linear relationship described. While AARBIAS, AARRMSE, and ARMSE increase when moving toward symmetry, the efficiency index \(\varepsilon \) increases as well, highlighting the effect of very heterogeneous sampling variances (proceeding towards the symmetric shape, as in Fig. 1). Although the numerator of \(corr(r_{i}^{c},e_{i})\), i.e., the value of \(\sqrt{c_{ii}}\), decreases, following the ARMSE when passing from asymmetry to symmetry, the areas for which this happens report relatively small sampling variances (the denominator of the index \(\varepsilon \)). In other words, the linearity between the sampling variances and the area parameter estimates makes \(cov(r_{i}^{c},e_{i})\) grow relative to the corresponding \(\psi _{i}\), which explains the numerical increase of the efficiency index.

Simulation 4.3 reports the outcomes of the experiment in which the sampling variances are generated without any relation to the small area parameter estimates, which follow a normal distribution. The results are similar to those of Simulation 4.1, because the sampling variances turn out to be somewhat concentrated, owing to the shape of the normal distribution. Moving towards the symmetric shape of the empirical distribution of the sampling variances, AARBIAS and AARRMSE decrease, while the ARMSE increases, following the trend observed in Simulation 4.1, in accordance with the relatively concentrated values of the sampling variances around their median. This implies that, according to the theoretical considerations above, when the sampling variances are very similar, so that the generalized least squares estimator converges to the ordinary least squares estimator, both the biases and the average root mean squared errors improve. The ARMSE index increases, due to a similar behavior of the leading part of the prediction mean squared error, i.e., \(g_{1}\). The behavior of the efficiency index \(\varepsilon \) differs from that observed in Simulation 4.1. A tentative explanation, in our opinion, is that when the sampling variances take random values, independent of the area parameters, the covariance between the model conditional residuals and the model errors (16) (i.e., the sampling errors in the FH model) is in general very high, because the fitting procedure assigns the sampling variances randomly, and independently of the parameters, to observations with heterogeneous mean squared errors.

5 Discussion

The Fay-Herriot model is widely used in SAE methodology. The main research issue investigated in the present paper is the assessment of the impact of the distribution of the sampling variances on the Best Linear Unbiased Predictor of the model. Together with the mean squared prediction error, an efficiency measure of the predictor is also introduced.

The examination of the relation between the direct estimates at the area level and their sampling variances leads to some considerations. The first is that, as studied via simulations, non-linear and linear relationships lead to two different situations. Under a linear relation between the area direct estimates and their sampling variances, the average absolute relative bias of the linear predictor increases when the variability of the sampling variances increases; the contrary occurs when the relationship is non-linear. In our opinion, this means that under linearity the departure of the GLS estimator from the OLS estimator follows the rule that large heteroscedasticity of the sampling variances can lead to imprecise estimates of the predictor. Under non-linearity, the precision of the linear predictor is connected mainly with the asymmetry of the empirical distribution of the sampling variances, as well as with their concentration, because the rule based on the joint GLS-OLS evaluation (see Proposition 2) is no longer suitable and no linear model fits the data. The present paper proposes two different measures of the bias of the REML estimates, both for the LME model with random intercepts and unbalanced data and for the Fay-Herriot model; both are treated in terms of the departure from the OLS fixed-effects estimator. The first is based on the evaluation of the variance of the GLS estimator, in terms of the variability of the sampling variances among the small areas. The second employs the likelihood displacement, measured starting from the vector of equal sampling variances, which leads to unbiased OLS estimates, and evaluated in the direction given by the actual set of sampling variances. The efficiency index introduced in the present work takes into account the capability of the linear predictor to reduce the mean squared error of the small area estimates, given the area sampling variances. The index may have, in our opinion, a general application. Although the literature offers several extensions of area-level small area estimation models, such as spatial, data-transformed, and robust models (see [3, 16, 20, 21]), a measure of the area-by-area difference between the sampling variance and the mean squared error of the actual linear predictor is always available. It is therefore always possible to form the index and, consequently, to evaluate the correlations that constitute both the analytical and the aggregated measures of the performance of the linear predictor within the small areas, in terms of its mean squared error.

The Fay-Herriot model is regarded as the basic area-level model, whose peculiarity is to consider the small area sampling variances as known. However, due to small sample sizes, the sampling variances can vary considerably. The stabilization of the small area variances is a deeply felt issue for statisticians working in the field of sample surveys, and a standard method for smoothing the sampling variances is the Generalized Variance Function approach. This method may offer one way to mitigate the problems exposed in the present work concerning the bias and the mean squared error of the linear predictor of the Fay-Herriot model. As shown by the empirical simulation experiments, the more concentrated and similar in value the sampling variances are, the less biased is the linear predictor (see Simulations 4.1 and 4.2). With less unstable sampling variances, the Generalized Variance Function approach predicts variances by smoothing, forcing the original sampling variances to be more similar. This may guarantee that in certain cases, i.e., when there is a linear relation between the sampling variances and the area parameters, the smoothing of the sampling variances reduces biases and mean squared errors, provided the set of predicted variances turns out to be less heterogeneous after the Generalized Variance Function smoothing. Simulation 4.1 (Table 2) shows, on the contrary, that when the relation between the sampling variances and the area parameters is non-linear, the smoothing may not achieve the desired result, and may even be harmful when seeking to reduce the bias and the mean squared error of the linear predictor.