1 Introduction

In survey analysis, the estimation of characteristics of interest for subpopulations (also called domains or small areas) for which sample sizes are small is challenging. We adopt an approach where the survey estimates are improved via covariate information. Producing reliable estimates for small areas in surveys by utilizing covariates is known as the small-area estimation (SAE) problem (Pfeffermann 2002). Rao (2003) has given a comprehensive overview of the theory and methods of model-based SAE. Most surveys are conducted continuously in time based on cross-sectional repeated measures data. There is also some work on time series and longitudinal surveys in small-area estimation; see, for example, Consortium (2004), Ferrante and Pacei (2004), Nissinen (2009), Singh and Sisodia (2011) and Ngaruye et al. (2017). In Ngaruye et al. (2017), the authors proposed a multivariate linear model for repeated measures data in a SAE context. The model is a combination of the classical growth curve model (Potthoff and Roy 1964) and a random effects model. This model accounts for longitudinal surveys, i.e., units are sampled once and then followed in time, grouped (blocked) response units and time-correlated random effects. It is common to obtain incomplete repeated measures data in longitudinal surveys. In this article, we extend the above-mentioned model and let it include a monotone missing observation structure. In particular, drop-outs from the survey can be handled, i.e., when it is planned to follow units in time but some units disappear before the end-point.

Missing data may be due to a number of limitations, such as unexpected budget constraints, but it may also happen that, for various reasons, units whose measurements were expected to be sampled over time disappear from the survey. The statistical analysis of data with missing values emerged in the early 1970s with the advancement of modern computer-based technology (Little and Rubin 1987). Since then, several methods for the analysis of missing data have been developed according to the missing-data mechanism, i.e., whether it is ignorable for inference (which covers data missing at random and missing completely at random) or nonignorable. Many authors have dealt with the problem of missing data; we refer to Little and Rubin (1987), Carriere (1999), Srivastava (2002), Kim and Timm (2006) and Longford (2006), for example. In particular, incomplete data in the classical growth curve model and in the random effects growth curve model have been considered, for example, by Kleinbaum (1973), Woolson and Leeper (1980), Srivastava (1985), Liski (1985), Liski and Nummi (1990), and Nummi (1997).

In Sect. 2, we present the formulation of a multivariate linear model for repeated measures data. Thereafter this model is extended to handle missing data. A “canonical” form of the model is considered in Sect. 3. In Sect. 4, the estimation of parameters and prediction of random effects and small-area means are derived. Finally, in Sect. 5, we give a small simulation study to show the performance of the proposed estimation procedure.

2 Multivariate linear model for repeated measures data

We will in this section consider the multivariate linear regression model for repeated measurements with covariates at p timepoints, suitable for discussing the SAE problem, which was defined by Ngaruye et al. (2017) for complete data. It is supposed that the target population of size N, with characteristic of interest y, is divided into m subpopulations called small areas of sizes \(N_d\), \(d=1,\ldots ,m\), and that the units in all small areas are grouped into k different categories. Furthermore, we assume the mean growth of each unit in area d for each of the k groups to be, for example, a polynomial in time of degree \(q-1\), and we also suppose that we have covariate variables related to the characteristic of interest whose values are available for all units in the population. Out of the whole population N and the small areas \(N_d\), n and \(n_d\) units are sampled according to some sampling scheme which, however, is of no technical interest in the present work. The model at small-area level for the sampled units is written:

$$\begin{aligned} \varvec{ Y}_d&=\varvec{ A}\varvec{ B}\varvec{ C}_{d}+\varvec{1}_p\varvec{\gamma }'\varvec{ X}_{d}+\varvec{ u}_d\varvec{z}_d'+\varvec{ E}_d,\nonumber \\&\varvec{u}_d\sim \mathcal {N}_{p}(\varvec{0},\varvec{\Sigma }^u), \qquad \varvec{ E}_d\sim \mathcal {N}_{p,n_d}(\varvec{0},\varvec{\Sigma }_e, \varvec{ I}_{n_d}), \end{aligned}$$
(1)

and combining all m disjoint small areas and all n sampled units, divided into k non-overlapping groups, yields

$$\begin{aligned} \varvec{Y}&=\varvec{A}\varvec{B}\varvec{H}\varvec{C}+\varvec{1}_p\varvec{\gamma }'\varvec{X}+\varvec{U}\varvec{Z}+\varvec{E},\nonumber \\&\varvec{U}\sim \mathcal {N}_{p,m}(\varvec{0},\varvec{\Sigma }^u,\varvec{I}_m),~~p\le m,\qquad \varvec{E}\sim \mathcal {N}_{p,n}(\varvec{0},\varvec{\Sigma }_e,\varvec{I}_n), \end{aligned}$$
(2)

where \(\varvec{\Sigma }^u\) is an unknown arbitrary positive definite matrix and, without loss of generality, \(\varvec{\Sigma }_e=\sigma _e^2\varvec{I}_p\), which is assumed to be known. In practice, \(\sigma _e^2\) is estimated from the survey and only depends on how many units are sampled from the total population N. In model (2), \(\varvec{Y}{:}p\times n\) is the data matrix, \(\varvec{A}{:}p\times q,~q\le p\), is the within-individuals design matrix indicating the time dependency within individuals, \(\varvec{B}{:}q\times k\) is an unknown parameter matrix, \(\varvec{C}{:}mk\times n\) with \(\mathrm{rank}(\varvec{C})+p\le n \quad \text {and} \quad p\le m\) is the between-individuals design matrix accounting for group effects, \(\varvec{\gamma }\) is an r-vector of fixed regression coefficients representing the effects of auxiliary variables, \(\varvec{X}{:}r\times n\) is a known matrix holding the values of the covariates, the matrix \(\varvec{U}{:}p\times m \) is a matrix of random effects whose columns are assumed to be independently distributed as a multivariate normal distribution with mean zero and positive definite dispersion matrix \(\varvec{\Sigma }^u\), i.e., \(\varvec{U}\sim \mathcal {N}_{p,m}(\varvec{0},\varvec{\Sigma }^u,\varvec{I}_m)\), \(\varvec{Z}:m\times n \) is a design matrix for the random effects, and the columns of the error matrix \(\varvec{E}\) are assumed to be independently distributed as a p-variate normal distribution with mean zero and known covariance matrix \(\varvec{\Sigma }_e\), i.e., \(\varvec{E}\sim \mathcal {N}_{p,n}(\varvec{0},\varvec{\Sigma }_e,\varvec{I}_n)\). The matrix \(\varvec{H}=(\varvec{I}_k:\varvec{I}_k\dots \varvec{I}_k):k\times mk\) captures all k group units by stacking the m data matrices of model (1) together as column blocks in a new matrix and is included in the model for technical reasons of estimation. More details about the model formulation and the estimation of model parameters can be found in Ngaruye et al. (2017).
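
To make the structure of model (2) concrete, the following minimal numpy sketch builds the design matrices and simulates one draw of \(\varvec{Y}\). All dimensions and the balanced design are assumptions of the illustration, not the paper's survey setting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumed for this sketch only):
p, q, k, m, r = 4, 2, 2, 3, 2       # timepoints, polynomial terms, groups, areas, covariates
n_dg = 5                            # sampled units per area and group (balanced case)
n_d, n = k * n_dg, m * k * n_dg     # units per area and total sample size

A = np.vander(np.arange(1, p + 1), q, increasing=True)    # p x q, linear growth in time
H = np.hstack([np.eye(k)] * m)                            # k x mk
C_d = np.kron(np.eye(k), np.ones((1, n_dg)))              # k x n_d group indicators
C = np.kron(np.eye(m), C_d)                               # mk x n between-individuals design
Z = np.kron(np.eye(m), np.ones((1, n_d)) / np.sqrt(n_d))  # m x n, rows z_d' = n_d^{-1/2} 1'

B = rng.standard_normal((q, k))                           # unknown parameter matrix
gamma = rng.standard_normal(r)                            # fixed regression coefficients
X = rng.standard_normal((r, n))                           # covariate values
Sigma_u = np.eye(p)                                       # dispersion of the columns of U
U = rng.multivariate_normal(np.zeros(p), Sigma_u, size=m).T   # p x m random effects
E = 0.1 * rng.standard_normal((p, n))                     # Sigma_e = sigma_e^2 I_p

Y = A @ B @ H @ C + np.outer(np.ones(p), gamma @ X) + U @ Z + E   # model (2)
assert np.allclose(Z @ Z.T, np.eye(m))                    # Z Z' = I_m, used later in (5)
```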

3 Incomplete data

Consider model (2) and suppose that there are missing values such that the measurements taken at time t (for \(t=1,\ldots ,p\)) on each unit are not all complete and the numbers of observations at the p timepoints are \(n_{1},\ldots ,n_{p}\), with \(n_{1}\ge n_{2}\ge \cdots \ge n_{p}>p\). Such a pattern of missing observations is called a monotone sample.

Let the sample observations be composed of h mutually disjoint sets according to the monotone pattern of missing data, where the ith set (\(i=1,\ldots ,h\)) is the sample data matrix \(\varvec{Y}_{i}:p_i\times n_{i}\) containing the measurements at the ith block of \(p_i\) timepoints on the \(n_i\) units still present there, with \(p_i\le p\) and \(\sum _{i=1}^hp_i=p\); the units are complete for the first set (\(i=1\)), whereas they fail to complete the later periods \(i=2,\ldots ,h\). Such a data set is said to follow an h-step monotone missing data pattern. For technical simplicity, in this paper we only study a three-step monotone missing structure, with complete sample data for a given number of timepoints and incomplete sample data for the other timepoints. This means that we have a complete data set of \(n_1\) observations of dimension \(p_1\), and incomplete data sets of \(n_2\) and \(n_3\) observations of dimensions \(p_2\) and \(p_3\), where \(p_1+p_2+p_3=p\).
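
As an illustration, a three-step monotone pattern can be represented by three matrices cut out of a hypothetical complete data matrix; the sizes below are assumptions made for the sketch only.

```python
import numpy as np

rng = np.random.default_rng(1)
p1, p2, p3 = 3, 2, 1               # p = p1 + p2 + p3 timepoints in three blocks
n1, n2, n3 = 12, 10, 8             # monotone sample sizes: n1 >= n2 >= n3

Y_complete = rng.standard_normal((p1 + p2 + p3, n1))  # hypothetical complete data
# All n1 units are observed at the first p1 timepoints, only the first n2 units
# at the next p2 timepoints, and only the first n3 units at the last p3 timepoints.
Y1 = Y_complete[:p1, :n1]          # p1 x n1, complete block
Y2 = Y_complete[p1:p1 + p2, :n2]   # p2 x n2, first drop-outs removed
Y3 = Y_complete[p1 + p2:, :n3]     # p3 x n3, remaining units only
```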

3.1 The model which handles missing data

In this article, we will only present details for a three-step monotone pattern. We assume that the model defined in (2) holds together with a monotone missing structure. This extended model can be presented through three equations:

$$\begin{aligned} \varvec{Y}_i = \varvec{A}_{i}\varvec{B}\varvec{H}\varvec{C}_i+\varvec{1}_{p_i}\varvec{\gamma }'\varvec{X}_i+\varvec{U}_i\varvec{Z}_i+\varvec{E}_i,\qquad i=1,2,3, \end{aligned}$$
(3)

where \(\varvec{A}'=(\varvec{A}_1':\varvec{A}_2':\varvec{A}_3')\), \(\varvec{A}_i\): \(p_i\times q\), \(q<p\), \(\sum _{i=1}^3p_i=p\), \(\varvec{H}=(\varvec{I}_k:\varvec{I}_k\dots \varvec{I}_k)\): \(k\times mk\),

$$\begin{aligned} \varvec{C}_{i}= \begin{pmatrix} \varvec{C}_{i1} & & \varvec{0}\\ & \ddots & \\ \varvec{0} & & \varvec{C}_{im} \end{pmatrix},\qquad \varvec{C}_{id}= \begin{pmatrix} \varvec{1}_{n_{id1}}' & & \varvec{0}\\ & \ddots & \\ \varvec{0} & & \varvec{1}_{n_{idk}}' \end{pmatrix}, \end{aligned}$$

\(n_{idg}\) equals the number of observations for the response \(\varvec{Y}_i\), dth small area and gth group, \(\varvec{X}_{i}\) represents all covariates for the \(\varvec{Y}_{i}\) response,

$$\begin{aligned} \varvec{Z}_{i}= \begin{pmatrix} \varvec{z}_{i1}' & & \varvec{0}\\ & \ddots & \\ \varvec{0} & & \varvec{z}_{im}' \end{pmatrix},\qquad \varvec{z}_{id}=\frac{1}{\sqrt{n_{id}}}\varvec{1}_{n_{id}},\qquad i=1,2,3,\quad d=1,2,\dots ,m, \end{aligned}$$
(4)

\(\varvec{U}_1=(\varvec{I}_{p_1}:\varvec{0}:\varvec{0})\varvec{U}\), \(\varvec{U}_2=(\varvec{0}:\varvec{I}_{p_2}:\varvec{0})\varvec{U}\), \(\varvec{U}_3=(\varvec{0}:\varvec{0}:\varvec{I}_{p_3})\varvec{U}\), \(\varvec{U}\sim \mathcal {N}_{p,m}\Big (\varvec{0}, \varvec{\Sigma }^u, \varvec{I}_m\Big )\), \(\varvec{E}_i\sim \mathcal {N}_{p_i,n_i}\Big (\varvec{0}, \varvec{I}_{p_i}, \sigma ^2_i\varvec{I}_{n_i}\Big )\), \(\{\varvec{E}_i\}\) are mutually independent and \(\varvec{E}_i\) is independent of \(\varvec{U}_i\). In particular, the construction of \(\varvec{Z}_{i}\) helps to derive a number of mathematical results, including

$$\begin{aligned} \mathcal {C}(\varvec{Z}_i')\subseteq \mathcal {C}(\varvec{C}_i'), \qquad \varvec{Z}_i\varvec{Z}_i'=\varvec{I}_m, \end{aligned}$$
(5)

where \(\mathcal {C}(\varvec{Q})\) stands for the column vector space generated by the columns of the matrix \(\varvec{Q}\).
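
The two relations in (5) are easy to verify numerically; the sketch below (with small assumed dimensions and a balanced design) checks that the rows of \(\varvec{Z}_i\) lie in the row space of \(\varvec{C}_i\) and that \(\varvec{Z}_i\varvec{Z}_i'=\varvec{I}_m\).

```python
import numpy as np

m, k, n_dg = 3, 2, 4                       # illustrative sizes (assumed)
n_d = k * n_dg
C = np.kron(np.eye(m), np.kron(np.eye(k), np.ones((1, n_dg))))   # mk x n
Z = np.kron(np.eye(m), np.ones((1, n_d)) / np.sqrt(n_d))         # m x n, as in (4)

P = C.T @ np.linalg.inv(C @ C.T) @ C       # orthogonal projector onto C(C')
assert np.allclose(Z @ P, Z)               # hence C(Z') is contained in C(C')
assert np.allclose(Z @ Z.T, np.eye(m))     # second relation in (5)
```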

3.2 A canonical version of the model

The model defined through (3) will now be transformed into a simpler model which will be utilized when estimating the unknown parameters. A few definitions have to be introduced, but first it is noted that, because \(\mathcal {C}(\varvec{Z}_i')\subseteq \mathcal {C}(\varvec{C}_i')\), the matrices

$$\begin{aligned} (\varvec{C}_i\varvec{C}_i')^{-1/2}\varvec{C}_i\varvec{Z}_i'\varvec{Z}_i\varvec{C}_i'(\varvec{C}_i\varvec{C}_i')^{-1/2},\quad i=1,2,3, \end{aligned}$$

are idempotent. It is supposed that we have so many observations that the inverses exist. Therefore, there exists an orthogonal matrix \(\varvec{\Gamma }_i=(\varvec{\Gamma }_{i1}:\varvec{\Gamma }_{i2})\), \(km\times m,\,km\times (k-1)m\), such that

$$\begin{aligned} (\varvec{C}_i\varvec{C}_i')^{-1/2}\varvec{C}_i\varvec{Z}_i'\varvec{Z}_i\varvec{C}_i'(\varvec{C}_i\varvec{C}_i')^{-1/2}= \varvec{\Gamma }_i \begin{pmatrix} \varvec{I}_{m} & \varvec{0}\\ \varvec{0} & \varvec{0} \end{pmatrix} \varvec{\Gamma }_i'=\varvec{\Gamma }_{i1}\varvec{\Gamma }_{i1}',\quad i=1,2,3. \end{aligned}$$

Moreover \(\varvec{\Gamma }_{i1}'\varvec{\Gamma }_{i1}=\varvec{I}_m\). Put

$$\begin{aligned} \varvec{K}_{ij}&=\varvec{H}(\varvec{C}_i\varvec{C}_i')^{1/2}\varvec{\Gamma }_{ij},\qquad i=1,2,3,\quad j=1,2,\nonumber \\ \varvec{R}_{ij}&=\varvec{C}_i'(\varvec{C}_i\varvec{C}_i')^{-1/2}\varvec{\Gamma }_{ij},\qquad i=1,2,3,\quad j=1,2, \end{aligned}$$
(6)

and let \(\varvec{Q}^{o}\) be any matrix of full rank spanning \(\mathcal {C}(\varvec{Q})^{\perp }\), the orthogonal complement to \(\mathcal {C}(\varvec{Q})\). The following transformations of \(\varvec{Y}_{i}\), \(i=1,2,3\), are made

$$\begin{aligned} \varvec{V}_{i0}&=\varvec{Y}_{i}(\varvec{C}_{i}')^o=\varvec{1}_{p_i}\varvec{\gamma }'\varvec{X}_i(\varvec{C}_i')^o+ \varvec{E}_i(\varvec{C}_i')^o,\quad i=1,2,3, \end{aligned}$$
(7)
$$\begin{aligned} \varvec{V}_{i1}&=\varvec{Y}_{i}\varvec{R}_{i1}=\varvec{A}_i\varvec{B}\varvec{K}_{i1}+\varvec{1}_{p_i}\varvec{\gamma }' \varvec{X}_i\varvec{R}_{i1}+ (\varvec{U}_i\varvec{Z}_i+\varvec{E}_i)\varvec{R}_{i1}, \quad i=1,2,3,\end{aligned}$$
(8)
$$\begin{aligned} \varvec{V}_{i2}&=\varvec{Y}_{i}\varvec{R}_{i2}=\varvec{A}_i\varvec{B}\varvec{K}_{i2}+\varvec{1}_{p_i}\varvec{\gamma }' \varvec{X}_i\varvec{R}_{i2}+ \varvec{E}_i\varvec{R}_{i2}, \quad i=1,2,3. \end{aligned}$$
(9)

Before we analyze the transformations above, we need a few technical relations concerning \(\varvec{Z}_{i}\), \(i=1,2,3\). To some extent, the next lemma is our main contribution, because without it, the mathematics would become very difficult to carry out. Note that the result depends on the definition of \(\varvec{Z}_{i}\), \(i=1,2,3\) given in (4).

Lemma 3.1

Let \(\varvec{Z}_{i}\), \(i=1,2,3\), be as in (4), and let \(\varvec{R}_{ij}\), \(i=1,2,3\), \(j=1,2\), be defined in (6). Then

  1. (i)

    \(\varvec{Z}_{i}\varvec{R}_{i1}\varvec{R}_{i1}'\varvec{Z}_{i}'=\varvec{I}_{m}\) ;

  2. (ii)

    \(\varvec{R}_{i1}'\varvec{Z}_{i}'\varvec{Z}_{i}\varvec{R}_{i1}=\varvec{I}_{m}\) ;

  3. (iii)

    \(\varvec{R}_{ij}'\varvec{R}_{ij}=\varvec{I}_m\) .

Proof

Using (6), (5) and the definition of \(\varvec{\Gamma }_{i1}\) it follows that

$$\begin{aligned} \varvec{Z}_{i}\varvec{R}_{i1}\varvec{R}_{i1}'\varvec{Z}_{i}'&=\varvec{Z}_{i}\varvec{C}_{i}'(\varvec{C}_{i}\varvec{C}_{i}')^{-1/2}\varvec{\Gamma }_{i1}\varvec{\Gamma }_{i1}' (\varvec{C}_{i}\varvec{C}_{i}')^{-1/2}\varvec{C}_{i}\varvec{Z}_{i}'\\&=\varvec{Z}_{i}\varvec{P}_{C_i}\varvec{Z}_{i}'\varvec{Z}_{i}\varvec{P}_{C_i}\varvec{Z}_{i}'= \varvec{Z}_{i}\varvec{Z}_{i}'\varvec{Z}_{i}\varvec{Z}_{i}'=\varvec{I}_{m}, \end{aligned}$$

where \(\varvec{P}_{C_i}=\varvec{C}_i'(\varvec{C}_i\varvec{C}_i')^{-1}\varvec{C}_i\) is the unique orthogonal projection on \(\mathcal {C}(\varvec{C}_i')\), and thus statement (i) is established. Moreover, once again using (6) and the definition of \(\varvec{\Gamma }_{i1}\)

$$\begin{aligned} \varvec{R}_{i1}'\varvec{Z}_{i}'\varvec{Z}_{i}\varvec{R}_{i1}&=\varvec{\Gamma }_{i1}'(\varvec{C}_{i}\varvec{C}_{i}')^{-1/2}\varvec{C}_{i}\varvec{Z}_{i}' \varvec{Z}_{i}\varvec{C}_{i}'(\varvec{C}_{i}\varvec{C}_{i}')^{-1/2}\varvec{\Gamma }_{i1}\\&=\varvec{\Gamma }_{i1}'\varvec{\Gamma }_{i1}\varvec{\Gamma }_{i1}'\varvec{\Gamma }_{i1}=\varvec{I}_{m}, \end{aligned}$$

and statement (ii) is verified. Statement (iii) can be shown in a similar way. \(\square \)
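
Lemma 3.1, together with the construction in (6), can also be checked numerically. In the sketch below (balanced design, assumed small dimensions, index i suppressed), \(\varvec{\Gamma }_{1}\) is obtained from an eigendecomposition of the idempotent matrix in Sect. 3.2.

```python
import numpy as np

m, k, n_dg = 3, 2, 4
n_d = k * n_dg
C = np.kron(np.eye(m), np.kron(np.eye(k), np.ones((1, n_dg))))   # mk x n
Z = np.kron(np.eye(m), np.ones((1, n_d)) / np.sqrt(n_d))         # m x n
H = np.hstack([np.eye(k)] * m)                                   # k x mk

# Here C C' = n_dg * I_mk, so its square root is diagonal and easy to invert.
CC_sqrt = np.sqrt(C @ C.T)
CC_inv_sqrt = np.linalg.inv(CC_sqrt)
M = CC_inv_sqrt @ C @ Z.T @ Z @ C.T @ CC_inv_sqrt    # idempotent, rank m
eigval, eigvec = np.linalg.eigh(M)
Gamma1 = eigvec[:, eigval > 0.5]                     # eigenvectors with eigenvalue 1

K1 = H @ CC_sqrt @ Gamma1                            # K_{i1} in (6), for completeness
R1 = C.T @ CC_inv_sqrt @ Gamma1                      # R_{i1} in (6)
assert np.allclose(Z @ R1 @ R1.T @ Z.T, np.eye(m))   # Lemma 3.1 (i)
assert np.allclose(R1.T @ Z.T @ Z @ R1, np.eye(m))   # Lemma 3.1 (ii)
assert np.allclose(R1.T @ R1, np.eye(m))             # Lemma 3.1 (iii), j = 1
```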

3.3 The likelihood

We start by defining the covariance between two random matrices and the dispersion matrix. Let \(\varvec{X}\) and \(\varvec{Y}\) be two random matrices. The covariance between \(\varvec{X}\) and \(\varvec{Y}\) is defined by

$$\begin{aligned} C[\varvec{X},\varvec{Y}]=E[\mathrm{vec}\varvec{X}\mathrm{vec'}\varvec{Y}]-E[\mathrm{vec}\varvec{X}]E[\mathrm{vec'}\varvec{Y}], \end{aligned}$$

and the dispersion matrix \(D[\varvec{X}]\) is defined by \(D[\varvec{X}]=C[\varvec{X},\varvec{X}]\), where \(\mathrm{vec}\) is the usual columnwise vectorization operator and \(\mathrm{vec'}\) is its transpose.
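
For instance, for the matrix normal random effects \(\varvec{U}\sim \mathcal {N}_{p,m}(\varvec{0},\varvec{\Sigma }^u,\varvec{I}_m)\) used throughout, these definitions give \(D[\varvec{U}]=\varvec{I}_m\otimes \varvec{\Sigma }^u\); the small Monte Carlo sketch below makes this concrete (the sample size and the particular \(\varvec{\Sigma }^u\) are assumptions of the illustration).

```python
import numpy as np

def vec(M):
    """Columnwise vectorization of a matrix."""
    return M.reshape(-1, order="F")

rng = np.random.default_rng(2)
p, m, n_rep = 3, 2, 20000
Sigma_u = np.array([[2.0, 0.5, 0.0],
                    [0.5, 1.0, 0.3],
                    [0.0, 0.3, 1.5]])
# Columns of U: p x m are i.i.d. N_p(0, Sigma_u), so D[U] = I_m (x) Sigma_u.
draws = np.stack([vec(rng.multivariate_normal(np.zeros(p), Sigma_u, size=m).T)
                  for _ in range(n_rep)])
D_hat = np.cov(draws.T)                                     # empirical dispersion of vec U
print(np.abs(D_hat - np.kron(np.eye(m), Sigma_u)).max())    # close to 0
```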

The transformation carried out in the previous section is one-to-one. Based on \(\{\varvec{V}_{ij}\}\), \(i=1,2,3\), \(j=0,1,2\), we will set up the likelihood for all observations. First, however, we present the marginal densities (likelihood functions) for \(\{\varvec{V}_{ij}\}\), which of course are normally distributed. Thus, to determine the distributions, it is enough to present the means and dispersion matrices:

$$\begin{aligned} E[\varvec{V}_{i0}]&=\varvec{1}_{p_i}\varvec{\gamma }'\varvec{X}_i(\varvec{C}_i')^o,\qquad D[\varvec{V}_{i0}]=\sigma _i^2(\varvec{C}_i')^{o'}(\varvec{C}_i')^o\otimes \varvec{I}_{p_i},\end{aligned}$$
(10)
$$\begin{aligned} E[\varvec{V}_{i1}]&=\varvec{A}_i\varvec{B}\varvec{K}_{i1}+\varvec{1}_{p_i}\varvec{\gamma }'\varvec{X}_i\varvec{R}_{i1},\qquad D[\varvec{V}_{i1}]=\varvec{I}_{m}\otimes (\varvec{\Sigma }_{ii}^u + \sigma _i^2\varvec{I}_{p_i}),\end{aligned}$$
(11)
$$\begin{aligned} E[\varvec{V}_{i2}]&=\varvec{A}_i\varvec{B}\varvec{K}_{i2}+\varvec{1}_{p_i}\varvec{\gamma }' \varvec{X}_i\varvec{R}_{i2},\qquad D[\varvec{V}_{i2}]=\sigma _i^2\varvec{I}_{mp_i}, \end{aligned}$$
(12)

for \(i=1,2,3\). The matrix \(\varvec{\Sigma }_{ii}^u\) stands for the covariance matrix between the rows of the ith data matrix \(\varvec{Y}_i\), \(i=1,2,3\). Concerning the simultaneous distribution of \(\{\varvec{V}_{ij}\}\), \(i=1,2,3\), \(j=0,1,2\): the variables \(\varvec{V}_{i0}\) and \(\varvec{V}_{i2}\), \(i=1,2,3\), are independently distributed, and they are also independent of \(\{\varvec{V}_{i1}\}\). However, the elements in \(\{\varvec{V}_{i1}\}\) are not independently distributed. We therefore have to pay attention to the likelihood of these variables, and \(\{\mathrm{vec}\varvec{V}_{i1}\}\), \(i=1,2,3\), will be considered.

Let \(L(\varvec{V};\varvec{\Theta })\) denote the likelihood function for the random variable \(\varvec{V}\) with parameter \(\varvec{\Theta }\). We are going to discuss

$$\begin{aligned}&L(\mathrm{vec}\varvec{V}_{31},\mathrm{vec}\varvec{V}_{21},\mathrm{vec}\varvec{V}_{11};\bullet )\nonumber \\&\qquad \qquad = L(\mathrm{vec}\varvec{V}_{31}|\mathrm{vec}\varvec{V}_{21},\mathrm{vec}\varvec{V}_{11};\bullet ) L(\mathrm{vec}\varvec{V}_{21}|\mathrm{vec}\varvec{V}_{11};\bullet )L(\mathrm{vec}\varvec{V}_{11};\bullet ), \end{aligned}$$
(13)

where in (13), \(\bullet \) indicates that no parameters have been specified.

The next result, which is obtained by straightforward calculations, will be used in the forthcoming presentation:

$$\begin{aligned} D\left[ \begin{array}{c} \mathrm{vec}\varvec{V}_{11}\\ \mathrm{vec}\varvec{V}_{21}\\ \mathrm{vec}\varvec{V}_{31} \end{array} \right] =\Big (\varvec{R}_{i1}'\varvec{Z}_{i}'\varvec{Z}_{j}\varvec{R}_{j1}\otimes \varvec{\Sigma }_{ij}^u\Big )_{i=1,2,3;j=1,2,3}+ \begin{pmatrix} \sigma _1^2\varvec{I}_{mp_1}&\varvec{0}&\varvec{0}\\ \varvec{0}&\sigma _2^2\varvec{I}_{mp_2}&\varvec{0}\\ \varvec{0}&\varvec{0}&\sigma _3^2\varvec{I}_{mp_3} \end{pmatrix}, \end{aligned}$$
(14)

where \(\big (\bullet \big )_{i=1,2,3;j=1,2,3}\) denotes a block partitioned matrix.

From the factorization of the likelihood in (13), it follows that we have to investigate

$$\begin{aligned} L(\mathrm{vec}\varvec{V}_{31}|\mathrm{vec}\varvec{V}_{21},\mathrm{vec}\varvec{V}_{11};\bullet ). \end{aligned}$$

Thus, we are interested in the conditional expectation and the conditional dispersion. The conditional mean equals

$$\begin{aligned}&E[\mathrm{vec}\varvec{V}_{31}|\mathrm{vec}\varvec{V}_{11},\mathrm{vec}\varvec{V}_{21}]\\&\quad =E[\mathrm{vec}\varvec{V}_{31}]+ (C[\varvec{V}_{31},\varvec{V}_{11}],C[\varvec{V}_{31},\varvec{V}_{21}])D[(\mathrm{vec}'\varvec{V}_{11},\mathrm{vec}'\varvec{V}_{21})']^{-1}\\&\qquad \times ((\mathrm{vec}'\varvec{V}_{11},\mathrm{vec}'\varvec{V}_{21})'- (E[\mathrm{vec}'\varvec{V}_{11}],E[\mathrm{vec}'\varvec{V}_{21}])'), \end{aligned}$$

where the expectations for \(\mathrm{vec}\varvec{V}_{i1}\), \(i=1,2,3\) can be obtained from (11). Moreover, the conditional dispersion is given by

$$\begin{aligned}&D[\mathrm{vec}\varvec{V}_{31}|\mathrm{vec}\varvec{V}_{11},\mathrm{vec}\varvec{V}_{21}]=D[\varvec{V}_{31}]\\&\qquad - (C[\varvec{V}_{31},\varvec{V}_{11}],C[\varvec{V}_{31},\varvec{V}_{21}])D[(\mathrm{vec}'\varvec{V}_{11},\mathrm{vec}'\varvec{V}_{21})']^{-1} (C[\varvec{V}_{31},\varvec{V}_{11}],C[\varvec{V}_{31},\varvec{V}_{21}])'. \end{aligned}$$

The next lemma fills in the details of this relation and the conditional mean, and indeed shows that relatively complicated expressions can be dramatically simplified using Lemma 3.1.

Lemma 3.2

Let \(\varvec{V}_{i1}\), \(i=1,2,3\), be defined in (8). Then

  1. (i)

    \(D[\varvec{V}_{31}]=\varvec{I}_m\otimes (\varvec{\Sigma }_{33}^u + \sigma _3^2\varvec{I}_{p_3})\) ;

  2. (ii)

    \(C[\varvec{V}_{31},\varvec{V}_{11}]=\varvec{R}_{31}'\varvec{Z}_{3}'\varvec{Z}_{1}\varvec{R}_{11}\otimes \varvec{\Sigma }_{31}^u\) ;

  3. (iii)

    \(C[\varvec{V}_{31},\varvec{V}_{21}]=\varvec{R}_{31}'\varvec{Z}_{3}' \varvec{Z}_{2}\varvec{R}_{21}\otimes \varvec{\Sigma }_{32}^u\) ;

  4. (iv)
    $$\begin{aligned} D\left[ \begin{array}{c}\mathrm{vec}\varvec{V}_{11}\\ \mathrm{vec}\varvec{V}_{21} \end{array} \right] =\left( \begin{array}{cc} \varvec{I}_m\otimes (\varvec{\Sigma }_{11}^u + \sigma _1^2\varvec{I}_{p_1})& \varvec{R}_{11}'\varvec{Z}_{1}'\varvec{Z}_{2}\varvec{R}_{21}\otimes \varvec{\Sigma }_{12}^u\\ \varvec{R}_{21}'\varvec{Z}_{2}'\varvec{Z}_{1}\varvec{R}_{11}\otimes \varvec{\Sigma }_{21}^u& \varvec{I}_m\otimes (\varvec{\Sigma }_{22}^u + \sigma _2^2\varvec{I}_{p_2}) \end{array}\right) ; \end{aligned}$$
  5. (v)
    $$\begin{aligned}&D\left[ \begin{array}{c} \mathrm{vec}\varvec{V}_{11}\\ \mathrm{vec}\varvec{V}_{21} \end{array} \right] ^{-1}\\&=\left( \begin{array}{cc} \varvec{Q}_{11}^{-1}&\varvec{0}\\ \varvec{0}&\varvec{0} \end{array}\right) + \left( \begin{array}{c} -\varvec{Q}_{11}^{-1}\varvec{Q}_{12}\\ \varvec{I}_m \end{array} \right) (\varvec{Q}_{22}-\varvec{Q}_{21} \varvec{Q}_{11}^{-1}\varvec{Q}_{12})^{-1}\big (-\varvec{Q}_{21}\varvec{Q}_{11}^{-1}\quad \varvec{I}_m\big ), \end{aligned}$$

    where

    $$\begin{aligned}&\varvec{Q}_{11}^{-1}=\varvec{I}_m\otimes (\varvec{\Sigma }_{11}^u + \sigma _1^2\varvec{I}_{p_1})^{-1},\\&\varvec{Q}_{11}^{-1}\varvec{Q}_{12}=\varvec{R}_{11}'\varvec{Z}_{1}'\varvec{Z}_{2}\varvec{R}_{21}\otimes (\varvec{\Sigma }_{11}^u + \sigma _1^2\varvec{I}_{p_1})^{-1}\varvec{\Sigma }_{12}^u,\\&\varvec{Q}_{22}-\varvec{Q}_{21}\varvec{Q}_{11}^{-1}\varvec{Q}_{12}= \varvec{I}_m\otimes (\varvec{\Sigma }_{22}^u + \sigma _2^2\varvec{I}_{p_2}-\varvec{\Sigma }_{21}^u (\varvec{\Sigma }_{11}^u + \sigma _1^2\varvec{I}_{p_1})^{-1}\varvec{\Sigma }_{12}^u); \end{aligned}$$
  6. (vi)
    $$\begin{aligned}&(C[\varvec{V}_{31},\varvec{V}_{11}],C[\varvec{V}_{31},\varvec{V}_{21}])D[(\mathrm{vec}'\varvec{V}_{11},\mathrm{vec}'\varvec{V}_{21})']^{-1} (C[\varvec{V}_{31},\varvec{V}_{11}],C[\varvec{V}_{31},\varvec{V}_{21}])'\\&=\varvec{I}_m\otimes (\varvec{\Sigma }_{31}^u(\varvec{\Sigma }_{11}^u + \sigma _1^2\varvec{I}_{p_1})^{-1}\varvec{\Sigma }_{13}^u +\varvec{\Psi }_{32}\varvec{\Psi }_{22}^{-1}\varvec{\Psi }_{23}), \end{aligned}$$

    where

    $$\begin{aligned} \varvec{\Psi }_{32}&= \varvec{\Psi }_{23}'=\varvec{\Sigma }_{32}^u-\varvec{\Sigma }_{31}^u(\varvec{\Sigma }_{11}^u + \sigma _1^2\varvec{I}_{p_1})^{-1}\varvec{\Sigma }_{12}^u,\end{aligned}$$
    (15)
    $$\begin{aligned} \varvec{\Psi }_{22}&= \varvec{\Sigma }_{22}^u + \sigma _2^2\varvec{I}_{p_2}-\varvec{\Sigma }_{21}^u (\varvec{\Sigma }_{11}^u + \sigma _1^2\varvec{I}_{p_1})^{-1}\varvec{\Sigma }_{12}^u. \end{aligned}$$
    (16)

Proof

Statements (i), (ii), (iii), and (iv) follow directly from (14). In (v), the inverse of a partitioned matrix is utilized and (vi) is obtained by straightforward matrix manipulations and application of Lemma 3.1. \(\square \)

Put

$$\begin{aligned} \varvec{B}_{1} &= \varvec{\Sigma }_{31}^u(\varvec{\Sigma }_{11}^u + \sigma _1^2\varvec{I}_{p_1})^{-1},\end{aligned}$$
(17)
$$\begin{aligned} \varvec{B}_{2} &= \varvec{\Sigma }_{32}^u\varvec{\Psi }_{22}^{-1}, \end{aligned}$$
(18)
$$\begin{aligned} \varvec{\Psi }_{33}&= \varvec{\Sigma }_{33}^u-\varvec{\Sigma }_{31}^u(\varvec{\Sigma }_{11}^u + \sigma _1^2\varvec{I}_{p_1})^{-1}\varvec{\Sigma }_{13}^u, \end{aligned}$$
(19)

where \(\varvec{\Psi }_{22}\) is given in (16); the next theorem is then directly established using Lemma 3.2.

Theorem 3.1

Let \(\varvec{V}_{i1}\), \(i=1,2,3\), be defined in (8) and \(\varvec{\Psi }_{ij}\), \(i,j=2,3\), be defined in Lemma 3.2 and (19). Moreover, let \(\varvec{B}_{1}\) and \(\varvec{B}_{2}\) be given by (17) and (18), respectively. Then \(\mathrm{vec}\varvec{V}_{31}|\mathrm{vec}\varvec{V}_{11},\mathrm{vec}\varvec{V}_{21}\sim N_{p_3m} (\varvec{M}_{31},\varvec{D}_{31})\), where

$$\begin{aligned} \varvec{M}_{31}&=E[\mathrm{vec}\varvec{V}_{31}|\mathrm{vec}\varvec{V}_{11},\mathrm{vec}\varvec{V}_{21}]=E[\mathrm{vec}\varvec{V}_{31}]\\&\quad + (\varvec{R}_{31}'\varvec{Z}_{3}'\varvec{Z}_{1}\varvec{R}_{11}\otimes \varvec{B}_{1}(\varvec{I}_{p_1}+ \varvec{\Sigma }_{12}^u\varvec{\Psi }_{22}^{-1}\varvec{\Sigma }_{21}^u(\varvec{\Sigma }_{11}^u + \sigma _1^2\varvec{I}_{p_1})^{-1})) \mathrm{vec}(\varvec{V}_{11}-E[\varvec{V}_{11}])\\&\quad - (\varvec{R}_{31}'\varvec{Z}_{3}'\varvec{Z}_{2}\varvec{R}_{21}\otimes \varvec{B}_{1}\varvec{\Sigma }_{12}^u) \mathrm{vec}(\varvec{V}_{21}-E[\varvec{V}_{21}])\\&\quad + (\varvec{R}_{31}'\varvec{Z}_{3}'\varvec{Z}_{2}\varvec{R}_{21}\otimes \varvec{B}_{2}) \mathrm{vec}(\varvec{V}_{21}-E[\varvec{V}_{21}])\\&\quad - (\varvec{R}_{31}'\varvec{Z}_{3}'\varvec{Z}_{1}\varvec{R}_{11}\otimes \varvec{B}_{2} \varvec{\Sigma }_{21}^u(\varvec{\Sigma }_{11}^u + \sigma _1^2\varvec{I}_{p_1})^{-1}) \mathrm{vec}(\varvec{V}_{11}-E[\varvec{V}_{11}]), \end{aligned}$$

and

$$\begin{aligned} \varvec{D}_{31}=D[\mathrm{vec}\varvec{V}_{31}|\mathrm{vec}\varvec{V}_{11},\mathrm{vec}\varvec{V}_{21}]=\varvec{I}_m\otimes \varvec{\Psi }_{3\cdot 2}, \end{aligned}$$

where

$$\begin{aligned} \varvec{\Psi }_{3\cdot 2}=\varvec{\Psi }_{33}-\varvec{\Psi }_{32}\varvec{\Psi }_{22}^{-1}\varvec{\Psi }_{23}. \end{aligned}$$
(20)

The result of the theorem shows that, conditionally on \(\mathrm{vec}\varvec{V}_{11}\) and \(\mathrm{vec}\varvec{V}_{21}\), and provided that \(E[\mathrm{vec}\varvec{V}_{11}]\), \(E[\mathrm{vec}\varvec{V}_{21}]\), \(\varvec{\Sigma }_{21}^u\), \(\varvec{\Sigma }_{11}^u\) and \(\varvec{\Psi }_{22}\) do not depend on unknown parameters, \(\mathrm{vec}\varvec{V}_{31}\) follows a model with unknown mean parameters \(\varvec{B}_{1}\) and \(\varvec{B}_{2}\) and unknown dispersion \(\varvec{\Psi }_{3\cdot 2}\), which is the same as a vectorized MANOVA model (e.g., see Srivastava 2002 for information about MANOVA).
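
The conditional mean and dispersion in Theorem 3.1 are instances of the standard normal conditioning formulas; a generic sketch of that step (with hypothetical names, not the paper's notation) reads as follows.

```python
import numpy as np

def condition_mvn(mu, Sigma, idx_obs, x_obs):
    """Conditional mean and dispersion of the remaining block of a
    multivariate normal vector, given the observed block idx_obs."""
    idx_mis = np.setdiff1d(np.arange(len(mu)), idx_obs)
    S_oo = Sigma[np.ix_(idx_obs, idx_obs)]
    S_mo = Sigma[np.ix_(idx_mis, idx_obs)]
    S_mm = Sigma[np.ix_(idx_mis, idx_mis)]
    W = S_mo @ np.linalg.inv(S_oo)            # regression of missing on observed
    m_cond = mu[idx_mis] + W @ (x_obs - mu[idx_obs])
    S_cond = S_mm - W @ S_mo.T                # Schur complement, cf. Psi_{3.2} in (20)
    return m_cond, S_cond

# Example: condition the last coordinate on the first two.
mu = np.zeros(3)
Sigma = np.array([[1.0, 0.4, 0.2], [0.4, 1.0, 0.3], [0.2, 0.3, 1.0]])
m_c, S_c = condition_mvn(mu, Sigma, np.array([0, 1]), np.array([0.5, -1.0]))
```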

Moreover, it follows from (13) that

$$\begin{aligned} L(\mathrm{vec}\varvec{V}_{21}|\mathrm{vec}\varvec{V}_{11};\bullet ) \end{aligned}$$

is needed. However, the calculations are the same as above and we only present the final result.

Theorem 3.2

Let \(\varvec{V}_{i1}\), \(i=1,2\), be defined in (8) and \(\varvec{\Psi }_{22}\) in Lemma 3.2. Put

$$\begin{aligned} \varvec{B}_{0}=\varvec{\Sigma }_{21}^u(\varvec{\Sigma }_{11}^u + \sigma _1^2\varvec{I}_{p_1})^{-1}. \end{aligned}$$
(21)

Then \(\mathrm{vec}\varvec{V}_{21}|\mathrm{vec}\varvec{V}_{11}\sim N_{p_2m} (\varvec{M}_{21},\varvec{I}_m\otimes \varvec{\Psi }_{22})\), where

$$\begin{aligned} \varvec{M}_{21}&=E[\mathrm{vec}\varvec{V}_{21}|\mathrm{vec}\varvec{V}_{11}]=E[\mathrm{vec}\varvec{V}_{21}]\\&\quad + (\varvec{R}_{21}'\varvec{Z}_{2}'\varvec{Z}_{1}\varvec{R}_{11}\otimes \varvec{B}_{0}) \mathrm{vec}(\varvec{V}_{11}-E[\varvec{V}_{11}]). \end{aligned}$$

Hence, it has been established that \(\mathrm{vec}\varvec{V}_{21}|\mathrm{vec}\varvec{V}_{11}\) is a vectorized MANOVA model.

Theorem 3.3

The likelihood for \(\{\varvec{V}_{ij}\}\), \(i=1,2,3\), \(j=0,1,2\), given in (7), (8) and (9) equals

$$\begin{aligned}&L(\{\varvec{V}_{ij}\},\, i=1,2,3,\, j=0,1,2;\varvec{\gamma },\varvec{B},\varvec{\Sigma }^u)= \prod _{i=1}^3L(\varvec{V}_{i0};\varvec{\gamma })\\&\qquad \times \prod _{i=1}^3L(\varvec{V}_{i2};\varvec{\gamma },\varvec{B}) \\&\qquad \times L(\varvec{V}_{11};\varvec{\gamma },\varvec{B},\varvec{\Sigma }_{11}^u)\times L(\varvec{V}_{21}|\varvec{V}_{11};\varvec{\gamma },\varvec{B},\varvec{B}_0,\varvec{\Psi }_{22})\\&\qquad \times L(\mathrm{vec}\varvec{V}_{31}|\mathrm{vec}\varvec{V}_{11},\mathrm{vec}\varvec{V}_{21};\varvec{\gamma },\varvec{B},\varvec{B}_{0},\varvec{B}_{1},\varvec{B}_{2},\varvec{\Psi }_{22},\varvec{\Psi }_{3\cdot 2}), \end{aligned}$$

where all parameters mentioned in the likelihoods have been defined earlier in Sect. 3.

4 Estimation of parameters and prediction of small-area means

For the monotone missing value problem treated in the previous sections, it was shown that it is possible to present a model which is easy to utilize. The remaining part of the article presents a relatively straightforward approach for predicting the small-area means of interest.

4.1 Estimation

To estimate the parameters, a restricted likelihood approach is proposed. For the likelihood given in Theorem 3.3, we first estimate \(\varvec{B}\) and \(\varvec{\gamma }\) by maximizing

$$\begin{aligned} \prod _{i=1}^3L\left( \varvec{V}_{i0};\varvec{\gamma }\right) \prod _{i=1}^3L\left( \varvec{V}_{i2};\varvec{\gamma },\varvec{B}\right) . \end{aligned}$$

From this part of the likelihood we cannot, in general, estimate \(\varvec{B}\) and \(\varvec{\gamma }\); only specific linear combinations are estimable. However, \(\varvec{B}\) and \(\varvec{\gamma }\) can be expressed as linear functions of new unknown parameters, say \(\varvec{\Theta }\), which can be estimated together with \(\varvec{\Sigma }_{11}^u\) from \(L(\varvec{V}_{11};\widehat{\varvec{\gamma }}(\varvec{\Theta }),\widehat{\varvec{B}}(\varvec{\Theta }),\varvec{\Sigma }_{11}^u)\), which is the likelihood of a MANOVA model. Furthermore, inserting these estimators in

$$\begin{aligned} L\left( \varvec{V}_{21}|\varvec{V}_{11};\widehat{\varvec{\gamma }}(\widehat{\varvec{\Theta }}),\widehat{\varvec{B}}(\widehat{\varvec{\Theta }}),\widehat{\varvec{\Sigma }}_{11}^u,\varvec{B}_{0},\varvec{\Psi }_{22}\right) \end{aligned}$$

and thereafter maximizing the likelihood with respect to the remaining unknown parameters produces estimators for \(\varvec{\Sigma }_{12}^u\) (using \(\varvec{B}_{0}\) in Eq. (21)) and \(\varvec{\Sigma }_{22}^u\) (using \(\varvec{\Psi }_{22}\) in (16)). Inserting all the obtained estimators in

$$\begin{aligned} L\left( \mathrm{vec}\varvec{V}_{31}|\mathrm{vec}\varvec{V}_{11},\mathrm{vec}\varvec{V}_{21};\widehat{\varvec{\gamma }}(\widehat{\varvec{\Theta }}),\widehat{\varvec{B}}(\widehat{\varvec{\Theta }}),\widehat{\varvec{\Sigma }}_{11}^u,\widehat{\varvec{B}}_{0},\varvec{B}_{1},\varvec{B}_{2},\widehat{\varvec{\Psi }}_{22},\varvec{\Psi }_{3\cdot 2}\right) , \end{aligned}$$

and then maximizing the likelihood with respect to \(\varvec{B}_{1}\), \(\varvec{B}_{2}\) and \(\varvec{\Psi }_{3\cdot 2}\) yields estimators for \(\varvec{\Sigma }_{31}^u\), \(\varvec{\Sigma }_{32}^u\) and \(\varvec{\Sigma }_{33}^u\) (using (17), (18), (19) with (15), (16) and (20)).
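
The stepwise structure of this procedure mirrors the classical regression (factorization) approach to monotone normal samples. The sketch below illustrates that chain for a mean-zero three-block sample only: all sizes and the true dispersion are assumptions of the illustration, and the paper's mean structure \(\varvec{\Theta }\) and the known \(\sigma _i^2\) are omitted, so this is an analogy rather than the paper's estimators.

```python
import numpy as np

rng = np.random.default_rng(3)
p1, p2, p3 = 3, 2, 1
n1, n2, n3 = 200, 150, 100                    # monotone sample sizes
Sigma = np.eye(p1 + p2 + p3) + 0.3            # true dispersion (illustrative)
Yf = rng.multivariate_normal(np.zeros(p1 + p2 + p3), Sigma, size=n1).T
Y1, Y2, Y3 = Yf[:p1, :n1], Yf[p1:p1 + p2, :n2], Yf[p1 + p2:, :n3]

# Step 1: the complete first block gives Sigma_11 directly.
S11 = Y1 @ Y1.T / n1
# Step 2: regress block 2 on block 1 (units observed in both), giving the
# analogues of B_0 in (21) and Psi_22 in (16).
X1 = Y1[:, :n2]
B0 = (Y2 @ X1.T) @ np.linalg.inv(X1 @ X1.T)
Psi22 = (Y2 - B0 @ X1) @ (Y2 - B0 @ X1).T / n2
# Step 3: regress block 3 on blocks 1 and 2, giving the analogues of
# (B_1, B_2) in (17)-(18) and Psi_{3.2} in (20).
X12 = np.vstack([Y1[:, :n3], Y2[:, :n3]])
B12 = (Y3 @ X12.T) @ np.linalg.inv(X12 @ X12.T)
Psi3_2 = (Y3 - B12 @ X12) @ (Y3 - B12 @ X12).T / n3
```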

4.2 Prediction

To perform predictions of small-area means we first have to predict \(\varvec{U}_1\), \(\varvec{U}_2\) and \(\varvec{U}_3\) in the model given by (3). Put

$$\begin{aligned} \varvec{y}= \begin{pmatrix} \mathrm{vec}\varvec{Y}_1\\ \mathrm{vec}\varvec{Y}_2\\ \mathrm{vec}\varvec{Y}_3 \end{pmatrix}\quad \text {and} \quad \varvec{v}= \begin{pmatrix} \mathrm{vec}\varvec{U}_1\\ \mathrm{vec}\varvec{U}_2\\ \mathrm{vec}\varvec{U}_3 \end{pmatrix}. \end{aligned}$$

Following Henderson’s prediction approach to the linear mixed model (Henderson 1975), the prediction of \(\varvec{v}\) can be derived in two stages, where in the first stage \(\varvec{\Sigma }^u\) is supposed to be known. Thus, the idea is to maximize the joint density

$$\begin{aligned} f(\varvec{y},\varvec{v})&=f(\varvec{y}\mid \varvec{v})f(\varvec{v})\nonumber \\&=c~\mathrm{exp}\left\{ -\frac{1}{2}\mathrm{tr}\Big \{\big (\varvec{y}-\varvec{\mu }\big )'\varvec{\Sigma }^{-1}\big (\varvec{y}-\varvec{\mu }\big )+\varvec{v}' \varvec{\Omega }^{-1}\varvec{v}\Big \}\right\} , \end{aligned}$$
(22)

with respect to \(\mathrm{vec}\varvec{B}\) and \(\varvec{\gamma }\), which are included in \(\varvec{\mu }\), and \(\varvec{v}\), which is also included in \(\varvec{\mu }\) but additionally appears in the term \(\varvec{v}' \varvec{\Omega }^{-1}\varvec{v}\). Moreover, in (22), c is a known constant and \(\varvec{\Omega }\) is given by

$$\begin{aligned} \varvec{\Omega }=\left( \begin{array}{ccc} \varvec{I}\otimes \varvec{\Sigma }_{11}^u&\varvec{I}\otimes \varvec{\Sigma }_{12}^u& \varvec{I}\otimes \varvec{\Sigma }_{13}^u\\ \varvec{I}\otimes \varvec{\Sigma }_{21}^u&\varvec{I}\otimes \varvec{\Sigma }_{22}^u &\varvec{I}\otimes \varvec{\Sigma }_{23}^u\\ \varvec{I}\otimes \varvec{\Sigma }_{31}^u&\varvec{I}\otimes \varvec{\Sigma }_{32}^u &\varvec{I}\otimes \varvec{\Sigma }_{33}^u \end{array}\right) . \end{aligned}$$

The vector \(\varvec{\mu }\) and the matrix \(\varvec{\Sigma }\) are the expectation and dispersion of \(\varvec{y}\mid \varvec{v}\) and are respectively given by

$$\begin{aligned} E[\varvec{y}\mid \varvec{v}]=\varvec{\mu }=\varvec{H}_1\mathrm{vec}\varvec{B}+\varvec{H}_2\varvec{\gamma }+\varvec{H}_3\varvec{v}, \end{aligned}$$

where

$$\begin{aligned} \varvec{H}_1=\left( \begin{array}{c} \varvec{C}_1'\varvec{H}'\otimes \varvec{A}_1\\ \varvec{C}_2'\varvec{H}'\otimes \varvec{A}_2\\ \varvec{C}_3'\varvec{H}'\otimes \varvec{A}_3 \end{array}\right) , \quad \varvec{H}_2=\left( \begin{array}{c}\varvec{X}_1'\otimes \varvec{1}_{p_1}\\ \varvec{X}_2'\otimes \varvec{1}_{p_2}\\ \varvec{X}_3'\otimes \varvec{1}_{p_3} \end{array}\right) , \quad \varvec{H}_3=\left( \begin{array}{c} \varvec{Z}_1'\otimes \varvec{I}_m \\ \varvec{Z}_2'\otimes \varvec{I}_m \\ \varvec{Z}_3'\otimes \varvec{I}_m \end{array}\right) , \end{aligned}$$

and

$$\begin{aligned} D[\varvec{y}\mid \varvec{v}]=\varvec{\Sigma }= \left( \begin{array}{ccc} \sigma _{1}^2\varvec{I}_{p_1n_1} &\varvec{0}&\varvec{0}\\ \varvec{0}&\sigma _{2}^2\varvec{I}_{p_2n_2}&\varvec{0}\\ \varvec{0}&\varvec{0}&\sigma _{3}^2\varvec{I}_{p_3n_3} \end{array}\right) . \end{aligned}$$

Supposing that \(\varvec{\Sigma }^u\) is known and using (22) together with standard results from linear models theory, we find estimators of the unknown parameters and of \(\varvec{v}\) as functions of \(\varvec{\Sigma }^u\); thereafter, replacing \(\varvec{\Sigma }^u\) by its estimator, obtained as described in Sect. 4.1, yields an estimator \(\widehat{\varvec{v}}\).
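
A generic sketch of this first stage is Henderson's mixed model equations, shown below for a model \(\varvec{y}=\varvec{W}\varvec{\beta }+\varvec{G}\varvec{v}+\varvec{e}\) with known dispersion matrices; \(\varvec{W}\) and \(\varvec{G}\) are placeholders standing in for \((\varvec{H}_1:\varvec{H}_2)\) and \(\varvec{H}_3\), and all sizes are assumptions of the illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
n_obs, n_fix, n_ran = 50, 3, 4
W = rng.standard_normal((n_obs, n_fix))       # fixed-effects design
G = rng.standard_normal((n_obs, n_ran))       # random-effects design
Omega = np.eye(n_ran)                         # D[v], known in the first stage
Sigma_inv = np.eye(n_obs)                     # inverse of D[y | v]
y = rng.standard_normal(n_obs)

# Henderson's mixed model equations: joint maximization of f(y, v) in (beta, v).
lhs = np.block([[W.T @ Sigma_inv @ W, W.T @ Sigma_inv @ G],
                [G.T @ Sigma_inv @ W, G.T @ Sigma_inv @ G + np.linalg.inv(Omega)]])
rhs = np.concatenate([W.T @ Sigma_inv @ y, G.T @ Sigma_inv @ y])
sol = np.linalg.solve(lhs, rhs)
beta_hat, v_hat = sol[:n_fix], sol[n_fix:]    # estimator of beta and predictor of v
```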

The prediction of small-area means is performed in the sense that estimating the small-area means is equivalent to predicting the small-area means of the non-sampled values, given the sample data and auxiliary data. To this end, for the dth area and the gth group of units, we consider the means of the sample observations of the data matrices \(\varvec{Y}_1, \varvec{Y}_2\) and \(\varvec{Y}_3\) and predict the means of the non-sampled values. We use the superscripts s and r to indicate the corresponding partitions into observed sample data and non-observed data in the target population, \(\varvec{Y}_{id}^{(s)}\) and \(\varvec{Y}_{id}^{(r)}\), respectively. Similarly, we denote by \(\varvec{X}_{id}^{(r)}:r\times (N_d-n_{id})\), \(\varvec{C}_{id}^{(r)}:mk\times (N_d-n_{id})\) and \(\varvec{z}_{id}^{(r)}:(N_d-n_{id})\times 1\) the corresponding matrix of covariates, design matrix and design vector for the non-sampled units in the population, respectively. Then, the prediction of small-area means at each timepoint and for the different groups of units is presented in the next proposition.

Proposition 4.1

Consider repeated measures data with missing values on the variable of interest for three-step monotone sample data described by the models in (3).

  1. (i)

    The target small-area means at each timepoint are elements of the vectors

    $$\begin{aligned} \widehat{\varvec{\mu }}_{d}=\frac{1}{N_d}\Big (\widehat{\varvec{\mu }}_{d}^{(s)}+\widehat{\varvec{\mu }}_{d}^{(r)}\Big ),\quad d=1,\dots ,m, \end{aligned}$$

    where

    $$\begin{aligned} \widehat{\varvec{\mu }}_{d}^{(s)}= \begin{pmatrix} \varvec{Y}_{1d}^{(s)}\varvec{1}_{n_{1d}}\\ \varvec{Y}_{2d}^{(s)}\varvec{1}_{n_{2d}}\\ \varvec{Y}_{3d}^{(s)}\varvec{1}_{n_{3d}}\\ \end{pmatrix}, \end{aligned}$$

    and

    $$\begin{aligned} \widehat{\varvec{\mu }}_{d}^{(r)}= \begin{pmatrix} \Big (\varvec{A}_1\widehat{\varvec{B}}\varvec{C}_{1d}^{(r)}+\varvec{1}_{p_1}\widehat{\varvec{\gamma }}'\varvec{X}_{1d}^{(r)}+\widehat{\varvec{u}}_{1d}\varvec{z}_{1d}^{(r)'}\Big )\varvec{1}_{N_d-n_{1d}}\\ \Big (\varvec{A}_2\widehat{\varvec{B}}\varvec{C}_{2d}^{(r)}+\varvec{1}_{p_2}\widehat{\varvec{\gamma }}'\varvec{X}_{2d}^{(r)}+\widehat{\varvec{u}}_{2d}\varvec{z}_{2d}^{(r)'}\Big )\varvec{1}_{N_d-n_{2d}}\\ \Big (\varvec{A}_3\widehat{\varvec{B}}\varvec{C}_{3d}^{(r)}+\varvec{1}_{p_3}\widehat{\varvec{\gamma }}'\varvec{X}_{3d}^{(r)}+\widehat{\varvec{u}}_{3d}\varvec{z}_{3d}^{(r)'}\Big )\varvec{1}_{N_d-n_{3d}}\\ \end{pmatrix},\quad d=1,\ldots ,m. \end{aligned}$$
  2. (ii)

The small-area means at each timepoint for each group of units, for complete and incomplete data sets, are given by

    $$\begin{aligned} \widehat{\varvec{\mu }}_{dg}&=\frac{1}{N_{dg}}\Big (\widehat{\varvec{\mu }}_{dg}^{(s)}+\widehat{\varvec{\mu }}_{dg}^{(r)}\Big ),\quad d=1,\ldots ,m,~g=1,\dots ,k, \end{aligned}$$

    where

    $$\begin{aligned} \widehat{\varvec{\mu }}_{dg}^{(s)}= \begin{pmatrix} \varvec{Y}_{1dg}^{(s)}\varvec{1}_{n_{1dg}}\\ \varvec{Y}_{2dg}^{(s)}\varvec{1}_{n_{2dg}}\\ \varvec{Y}_{3dg}^{(s)}\varvec{1}_{n_{3dg}}\\ \end{pmatrix}, \end{aligned}$$

    and

    $$\begin{aligned} \widehat{\varvec{\mu }}_{dg}^{(r)}&= \begin{pmatrix} \Big (\varvec{A}_1\widehat{\varvec{B}}\varvec{C}_{1dg}^{(r)}+\varvec{1}_{p_1}\widehat{\varvec{\gamma }}'\varvec{X}_{1dg}^{(r)}+\widehat{\varvec{u}}_{1d}\varvec{z}_{1dg}^{(r)'}\Big )\varvec{1}_{N_{dg}-n_{1dg}}\\ \Big (\varvec{A}_2\widehat{\varvec{B}}\varvec{C}_{2dg}^{(r)}+\varvec{1}_{p_2}\widehat{\varvec{\gamma }}'\varvec{X}_{2dg}^{(r)}+\widehat{\varvec{u}}_{2d}\varvec{z}_{2dg}^{(r)'}\Big )\varvec{1}_{N_{dg}-n_{2dg}}\\ \Big (\varvec{A}_3\widehat{\varvec{B}}\varvec{C}_{3dg}^{(r)}+\varvec{1}_{p_3}\widehat{\varvec{\gamma }}'\varvec{X}_{3dg}^{(r)}+\widehat{\varvec{u}}_{3d}\varvec{z}_{3dg}^{(r)'}\Big )\varvec{1}_{N_{dg}-n_{3dg}}\\ \end{pmatrix},\\&d=1,\ldots ,m,~g=1,\ldots ,k. \end{aligned}$$

Note that the predicted vector \(\widehat{\varvec{u}}_{id}\) is the dth column of the predicted matrix \(\widehat{\varvec{U}}_i,~i=1,2,3\), and that \(\widehat{\varvec{\beta }}_g\) is the column of the estimated parameter matrix \(\widehat{\varvec{B}}\) corresponding to group g.

A direct application of Proposition 4.1 is to find the target small-area means for each group across all timepoints, obtained as a linear combination of \(\widehat{\varvec{\mu }}_{dg}\) depending on the type of the characteristic of interest.
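
To illustrate Proposition 4.1(i) for one area d at the first block of timepoints, the following sketch combines observed totals with model predictions for the non-sampled units. All numerical values, and the form \(\varvec{z}_{1d}^{(r)}=n_{1d}^{-1/2}\varvec{1}\), are assumptions of the illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
p1, q, k, r = 3, 2, 2, 2
N_d, n_1d = 40, 10                                # area size and sampled units
A1 = np.vander(np.arange(1, p1 + 1), q, increasing=True)
B_hat = rng.standard_normal((q, k))               # plug-in estimates (placeholders)
gamma_hat = rng.standard_normal(r)
u1d_hat = rng.standard_normal(p1)                 # predicted random effect, area d

Y1d_s = rng.standard_normal((p1, n_1d))           # observed sample in area d
C1d_r = np.kron(np.eye(k), np.ones((1, (N_d - n_1d) // k)))   # groups, non-sampled
X1d_r = rng.standard_normal((r, N_d - n_1d))      # covariates, non-sampled units
z1d_r = np.ones(N_d - n_1d) / np.sqrt(n_1d)       # assumed design vector z_{1d}^{(r)}

pred = (A1 @ B_hat @ C1d_r + np.outer(np.ones(p1), gamma_hat @ X1d_r)
        + np.outer(u1d_hat, z1d_r))               # predicted non-sampled values
mu_d_hat = (Y1d_s.sum(axis=1) + pred.sum(axis=1)) / N_d   # first block of Prop. 4.1(i)
```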

5 Simulation study

In this section we give a small simulation study to show the performance of the estimation of the covariance matrix \(\varvec{\Sigma }^u\). Assume we have \(m=10, 25, 50, 100, 200\) small areas and \(k=2\) groups. Furthermore, let \(p=6\) with \(p_1=3\), \(p_2=2\), \(p_3=1\) timepoints and \(q=2\) with

$$\begin{aligned} \varvec{A}_1 = \begin{pmatrix} 1 & 1\\ 1 & 2\\ 1 & 3 \end{pmatrix}, \quad \varvec{A}_2 = \begin{pmatrix} 1 & 4\\ 1 & 5 \end{pmatrix} \quad \text {and}\quad \varvec{A}_3 = \begin{pmatrix} 1&6 \end{pmatrix}. \end{aligned}$$

For simplicity, assume we have an equal number of observations \(n_{idg}\), \(i=1,2,3\), for each small area \(d=1,\ldots ,m\) and group \(g=1,2\), with \(n_{1dg}> n_{2dg} > n_{3dg}\), i.e., we have equal numbers of drop-outs in each small area and each group. For example, if \(n_{1dg}=5\), \(n_{2dg}=4\) and \(n_{3dg}=3\), we have one drop-out in each small area and each group for every time period \(i=1,2,3\); see Fig. 1 for the incomplete monotone missing data pattern. In addition, in the simulations, let \(\sigma _i^2 = 0.01\), \(i=1,2,3\),

$$\begin{aligned} \varvec{B} = \begin{pmatrix} 1 & 2\\ 3 & 4 \end{pmatrix} ,\quad \varvec{\gamma } = \begin{pmatrix} 1\\ 2\\ 3 \end{pmatrix}, \quad \text {and}\quad \varvec{\Sigma }^u = \varvec{I}_p. \end{aligned}$$
Fig. 1 Incomplete monotone missing data with \(p_1=3, p_2=2, p_3=1\) and \(n_i = n_{idg}mk\) for \(i=1,2,3\), where \(m=10, 25, 50, 100, 200\) small areas and \(k=2\) groups

To evaluate the results, we compare the Frobenius norm of the difference between the estimated covariance matrix \(\widehat{\varvec{\Sigma }}^u\) and the true value \(\varvec{\Sigma }^u = \varvec{I}_p\), that is,

$$\begin{aligned} ||\widehat{\varvec{\Sigma }}^u - \varvec{\Sigma }^u||_F = \left( \mathrm{vec}'\left( \widehat{\varvec{\Sigma }}^u - \varvec{\Sigma }^u\right) \mathrm{vec}\left( \widehat{\varvec{\Sigma }}^u - \varvec{\Sigma }^u\right) \right) ^{1/2}, \end{aligned}$$

for different numbers of small areas m and different sample sizes \(n_{idg}\). In Table 1 we can see that, in general, we obtain a better estimate of the covariance matrix \(\varvec{\Sigma }^u\) for a larger number of small areas and larger sample sizes.
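
For completeness, the criterion can be computed in one line; the estimate below is a stand-in, not output from the study.

```python
import numpy as np

p = 6
Sigma_hat = np.eye(p) + 0.05 * np.ones((p, p))    # hypothetical estimate of Sigma^u
diff = Sigma_hat - np.eye(p)
fro = np.linalg.norm(diff, ord="fro")             # equals (vec'(diff) vec(diff))^{1/2}
assert np.isclose(fro, np.sqrt(diff.ravel() @ diff.ravel()))
```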

Table 1 Frobenius norm of the difference between the estimated covariance matrix \(\widehat{\varvec{\Sigma }}^u\) and the true value \(\varvec{\Sigma }^u = \varvec{I}_p\), for different numbers of small areas \(m=10, 25, 50, 100, 200\) and sample sizes \(n_{idg}\), \(i=1,2,3\)