1 Introduction

Domains are usually treated as fixed and mutually disjoint subsets of the population. We consider the case when a population element belongs to a domain with some probability, so the size of each domain is random. Our problem is the estimation of domain means and the population mean of a variable under study y on the basis of a random sample selected from the whole population. In the population, all values of an auxiliary variable x are known, while values of y are observed only in the sample. Domains are identifiable in the sample, but not outside of it. Under the outlined assumptions, let us consider the following motivating example. Consider the population of firms that have taken out bank loans for investments. The values of the granted loans are observations of the variable y, while the observations of the variable x are the values of the companies’ capital. Any company may default on its loan with probability \(p_h\), \(h=1, \ldots ,H\), where the index h identifies the domain consisting of companies classified at approximately the same credit risk. The joint distribution of the variables x and y in the population is treated as a mixture of the distributions of these variables in the domains, weighted by the probabilities \(p_h\). The empirical Sect. 3.2 of the article presents other examples.

Ża̧dło [14] assumed that population elements randomly belong to domains and presented several examples. For instance, he considered the estimation of the income of enterprises that randomly belong to different investment intervals. Elections are another example: here a domain consists of people who vote for a specific party, and the choice of a particular party is often random because many voters are not committed to any particular party. The model for generating accounting errors (see [13]) can also exemplify random domains. An observed value on an accounting document is treated as the outcome of a random variable whose distribution is a mixture of two distribution functions. One of these distribution functions generates the true accounting amount, and the second generates an accounting amount contaminated with an error. Documents without errors belong to the first domain and documents contaminated with accounting errors belong to the second domain. Hence, documents randomly belong to the domains. This idea, which is based on distribution mixtures, is developed in this paper.

Many auxiliary variables are usually observed during national censuses. Moreover, variables under study (observed during a census) can be used as auxiliary variables in survey sampling on a subsequent occasion. Therefore, we can expect these variables to be highly correlated. Let us note that apart from the above examples, there are many populations where all values of the auxiliary variable are observed. These can be found in economic, demographic, agricultural and other official registers.

In this paper, the model and model-randomization approaches are taken into account (see, e.g., [8] or [9]). Estimation of domain means is usually more accurate when it is supported by data on auxiliary variables observed outside the sample, but usually under the assumption that their distribution among domains is known (see several monographs on small area estimation and, e.g., [10]). The models formulated in this paper are close to those considered by Chambers and Skinner [2]. Estimators of domain means are derived by means of the maximum pseudo-likelihood method. More precisely, a variant of the likelihood method of estimation based on incomplete data on the variable under study is adopted to estimate the distribution mixture parameters. Our analysis is mainly supported by the monographs [4, 6, 7].

The most important results of the paper are as follows:

  • The pseudo-likelihood function is formulated for the estimation of the mixture distribution parameters in the case when data are observed in a sample selected according to various inclusion probabilities (Sect. 2.2).

  • On the basis of this function, regression- and ratio-type estimators of domain means are derived in the case of bivariate normal components of the distribution mixture (Sect. 3.1 and Appendix).

  • These results are generalized to the case of a multidimensional auxiliary variable (Sect. 3.1).

  • A linear combination of the regression (ratio) estimators is used to estimate the population mean (Sect. 3.1).

  • Examples of simulation analysis of the estimation accuracy are presented (Sect. 3.2).

2 General Results

2.1 Model-Design Approach

Let us denote by U a population of size N partitioned into H mutually disjoint domains denoted by \(U_h\), \(h=1, \ldots ,H\), \(1<H<N\). Let \([y_k,\textbf{x}_k,\textbf{z}_{k*}]\) be the k-th observation of the variable under study, an auxiliary variable vector, and a vector identifying domains, where \(\textbf{x}_k=[x_{k,1} \ldots x_{k,m}]\), \(1\le m<N\), and \(\textbf{z}_{k*}=[z_{k,1} \ldots z_{k,h} \ldots z_{k,H}]\), \(k=1, \ldots ,N\). Let \(\textbf{z}^{(h)}\) be a row vector in which all H elements are equal to zero except the h-th element, which is equal to one and identifies the h-th domain. When \(\textbf{z}_{k*}=\textbf{z}^{(h)}\), the k-th population element is in the h-th domain.

Let us assume that \([y_k \textbf{x}_k \textbf{z}_{k*}]\) is an observation of a random vector \([Y_k \textbf{X}_k \textbf{Z}_{k*}]\) attached to the k-th population element, where \(\textbf{X}_k=[X_{k,1} \ldots X_{k,m}]\) and \(\textbf{Z}_{k*}=[Z_{k,1} \ldots Z_{k,H}]\). The random vectors \([Y_k \textbf{X}_k \textbf{Z}_{k*}]\), \(k\in U\), are independent and identically distributed. Let \(P(\textbf{Z}_{k*}=\textbf{z}^{(h)})=p_h\), \(h=1, \ldots ,H\), \(\sum _{h=1}^Hp_h=1\). The random vector \(\textbf{Z}_{k*}\) has the multinomial distribution with parameters \((1,p_1, \ldots ,p_H)\) (see, e.g., [7]). The event \(\{Y_k<y_k,\textbf{X}_k<\textbf{x}_k\}\) restricted to the h-th domain, written as \(\{Y_k<y_k,\textbf{X}_k<\textbf{x}_k,\textbf{Z}_{k*}=\textbf{z}^{(h)}\}\), satisfies

$$\begin{aligned} \{Y_k<y_k,\textbf{X}_k<\textbf{x}_k\}=\bigcup _{h=1}^H\{Y_k<y_k, \textbf{X}_k<\textbf{x}_k,\textbf{Z}_{k*}=\textbf{z}^{(h)}\}. \end{aligned}$$

The events \(\{Y_k<y_k,\textbf{X}_k<\textbf{x}_k,\textbf{Z}_{k*}=\textbf{z}^{(h)}\}\) and \(\{Y_k<y_k,\textbf{X}_k<\textbf{x}_k,\textbf{Z}_{k*}=\textbf{z}^{(t)}\}\) are mutually exclusive for all \(h\ne t\), \(h=1, \ldots ,H\), \(t=1, \ldots ,H\). This and the law of total probability let us write the following:

$$\begin{aligned} F(y_k,\textbf{x}_k)=P(Y_k<y_k,\textbf{X}_k<\textbf{x}_k)=\sum _{h=1}^HF(y_k, \textbf{x}_k|\textbf{Z}_{k*}=\textbf{z}^{(h)})p_h \end{aligned}$$

where \(F(y_k,\textbf{x}_k|\textbf{Z}_{k*}=\textbf{z}^{(h)})\) is the conditional distribution function. In the case where variables \([Y_k\textbf{X}_k]\) are continuous, we have:

$$\begin{aligned} f(y_k,\textbf{x}_k)=\sum _{h=1}^Hf(y_k,\textbf{x}_k|\textbf{Z}_{k*}=\textbf{z}^{(h)})p_h \end{aligned}$$

where \(f(y_k,\textbf{x}_k|\textbf{Z}_{k*}=\textbf{z}^{(h)})\), \(h=1, \ldots ,H\), \(k\in U\), are density functions. This leads to the conclusion that our model defines the following distribution function: \(F(y,\textbf{x})=\prod _{k\in U}F(y_k,\textbf{x}_k)\) or density function \(f(y,\textbf{x})=\prod _{k\in U}f(y_k,\textbf{x}_k)\).

According to the assumptions of this model, the random vector \(\sum _{k=1}^N\textbf{Z}_{k*}\) has the multinomial probability distribution with parameters \([N,p_1, \ldots ,p_H]\). Moreover, the column vector \(\textbf{Z}_{*h}=[Z_{1,h} \ldots Z_{N,h}]^T\) identifies the h-th domain of size \(N_h=\sum _{k\in U}Z_{k,h}\), where \(0\le N_h\le N\); the expected domain sizes are \(E(N_h)=Np_h\), \(h=1, \ldots ,H\), because \(N_h\) has the binomial distribution with parameters \((N, p_h)\). The introduced definitions lead to the conclusion that both the sizes and the compositions of the domains are random. Hence, the multinomial probability model leads to partitions of the population into disjoint subsets called domains, and each outcome of partitioning the population into domains can be different.
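As an illustration, the following minimal sketch (with assumed values of N and \(p_h\); not taken from the paper) simulates the multinomial membership model and shows that the realized domain sizes \(N_h\) fluctuate around their expectations \(E(N_h)=Np_h\):

```python
# A minimal sketch of the multinomial domain-membership model.
# N and p are assumed values chosen for illustration only.
import numpy as np

rng = np.random.default_rng(0)
N, p = 1500, np.array([0.2, 0.5, 0.3])   # assumed population size and p_h
Z = rng.multinomial(1, p, size=N)        # rows are the indicator vectors z_{k*}
N_h = Z.sum(axis=0)                      # realized (random) domain sizes N_h
print(N_h, N * p)                        # N_h fluctuates around E(N_h) = N p_h
```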

Our main aim is to estimate the expected (domain mean) value \(\mu _h=E(Y_k|\textbf{Z}_{k*}=\textbf{z}^{(h)})\) and the probabilities \(p_h\), \(h=1, \ldots ,H\). Additionally, estimators of the expected value (population mean) \(\mu =\sum _{h=1}^Hp_h\mu _h\) are proposed.

In order to do this, a sample s of size \(n\le N\) is selected from population U according to a sampling design \(P(s)\ge 0\), \(s\in {{\mathscr {S}}}\), where \({{\mathscr {S}}}\) is the sampling space and \(\sum _{s\in {{\mathscr {S}}}}P(s)=1\). The inclusion probabilities of the sampling design are defined by \(\pi _k=\sum _{\{s: k\in s,s\in {{\mathscr {S}}}\}}P(s)\), \(k=1, \ldots ,N\). Let \(\underline{s}=U-s\) be the complement of s in U. Moreover, let \(s=\bigcup _{h=1}^Hs_h\), where \(s_h\subseteq U_h\), \(n_h\) is the size of \(s_h\), and \(n=\sum _{h=1}^Hn_h\) is the size of s. We assume that \(1<n_h\le N_h\) for \(h=1, \ldots ,H\). If \(s=U\), then \(\underline{s}\) is the empty set.

2.2 Maximum Likelihood Estimation

Identifying a domain is possible after observation of the variable \(\textbf{Z}_{k*}\) in sample s. The density function of the conditional distribution of \([Y_k \textbf{X}_k \textbf{Z}_{k*}]\) given \(\textbf{Z}_{k*}=\textbf{z}^{(h)}\) will be denoted by \(f_h(y_k,\textbf{x}_k,\theta _h)\), \(h=1, \ldots ,H\), where \(\theta _h=[\theta _{h,1} \ldots \theta _{h,m}]\), \(\theta _h\subseteq R^m\), \(\theta =[\theta _1 \cdots \theta _h \cdots \theta _H]\). Therefore, the observed values of the variables in the whole population are described by the following distribution mixture:

$$\begin{aligned} f(y_k,\textbf{x}_k,\Theta )=\sum _{h=1}^Hp_hf_h(y_k,\textbf{x}_k,\theta _h),\quad k\in U \end{aligned}$$
(1)

where \(\Theta =\{\textbf{p}\cup \theta \}\), \(\textbf{p}=[p_1 \ldots p_H]\). We assume that only values \(\textbf{x}_1, \ldots , \textbf{x}_k, \ldots ,\textbf{x}_N\) are observed in the whole population before selecting a sample. The marginal distribution of \(\textbf{X}_k\) is as follows:

$$\begin{aligned} g(\textbf{x}_k,\Theta _x)=\int _{R}f(y_k,\textbf{x}_k,\Theta )\textrm{d}y_k=\sum _{h=1}^Hp_hg_h(\textbf{x}_k,\theta _{x,h}),\quad k\in U \end{aligned}$$

where \(g_h(\textbf{x}_k,\theta _{x,h})=\int _{R}f_h(y_k,\textbf{x}_k,\theta _{h})\textrm{d}y_k\), \(\theta _{x,h}\subseteq \theta _{h}\) and \(\Theta _x\subseteq \Theta .\) Moreover, let: \(\theta _x=[\theta _{x,1} \cdots \theta _{x,H}]\), \(\Theta _x=\{\theta _x,\textbf{p}\}\).

The sample contains the following data: values \([y_k\; \textbf{x}_k\; \textbf{z}_{k*}]\) of the random variables \([Y_k\; \textbf{X}_k\; \textbf{Z}_{k*}]\), \(k\in s\). Let \({\textbf {d}}_s=\{[y_k\; {\textbf {x}}_{k}\; {\textbf {z}}_{k*}],k\in s\}\) and \({\textbf {x}}_{\underline{s}}=\{{\textbf {x}}_{k},k\in \underline{s}\}\). Hence, the sample contains complete data on the distribution mixture, while outside of the sample the data are incomplete.

When the sample is selected according to preassigned inclusion probabilities, the pseudo-likelihood approach (see [3, 8, 12]) leads to the following function:

$$\begin{aligned} l({\textbf {d}}_s,{\textbf {x}}_{\underline{s}})=l_1({\textbf {d}}_s)+l_2({\textbf {x}}_{\underline{s}}) \end{aligned}$$
(2)

where the complete and incomplete log-likelihood functions are as follows, respectively:

$$\begin{aligned} {\left\{ \begin{array}{ll} l_1({\textbf {d}}_s)=\sum _{h=1}^H\textrm{ln}(p_h)\sum _{k\in s_h}\frac{1}{\pi _k}+\sum _{h=1}^H\sum _{k\in s_h}\frac{\textrm{ln}(f_h(y_k,\textbf{x}_k,\theta _h))}{\pi _k},\\ l_2({\textbf {x}}_{\underline{s}})=\sum _{k\in \underline{s}}\frac{\textrm{ln}(g(\textbf{x}_k,\Theta _x))}{1-\pi _k}. \end{array}\right. } \end{aligned}$$
(3)

where \(n_h\) is the size of \(s_h\subseteq U_h\), the sub-sample of \(s=\bigcup _{h=1}^Hs_h\), \(N_h\ge n_h>1\), \(n=\sum _{h=1}^Hn_h\). We can easily show that \(E_P(l_1({\textbf {d}}_s))=l_1({\textbf {d}}_U)\) and \(E_P(l_2({\textbf {x}}_{\underline{s}}))=l_2({\textbf {x}}_{U})\), where

$$\begin{aligned} l_1({\textbf {d}}_U)=\sum _{h=1}^HN_h\textrm{ln}(p_h)+\sum _{h=1}^H\sum _{k\in U_h}\textrm{ln}(f_h(y_k,\textbf{x}_k,\theta _h)), \quad l_2({\textbf {x}}_U)=\sum _{k\in U}\textrm{ln}(g(\textbf{x}_k,\Theta _x)). \end{aligned}$$

This means that the sample log-likelihood functions \(l_1({\textbf {d}}_s)\) and \(l_2({\textbf {x}}_{\underline{s}})\) are design-unbiased estimators of the population log-likelihood functions \(l_1({\textbf {d}}_U)\) and \(l_2({\textbf {x}}_{U})\), respectively.
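For concreteness, the following Python sketch evaluates the pseudo-log-likelihood (2)–(3) for bivariate normal components, factorizing \(f_h(y_k,\textbf{x}_k,\theta_h)=g_h(x_k)f_h(y_k|x_k)\). It is only an illustration under assumed data structures; the function and variable names are not from the paper.

```python
# A sketch of the pseudo-log-likelihood (2)-(3) for bivariate normal
# components; theta is an assumed list of (mu_y, mu_x, s_y, s_x, rho) tuples.
import numpy as np
from scipy.stats import norm

def pseudo_loglik(y_s, x_s, dom_s, pi_s, x_out, pi_out, p, theta):
    # l1: complete-data part over the sample, weighted by 1/pi_k
    l1 = 0.0
    for h, (mu_y, mu_x, s_y, s_x, rho) in enumerate(theta):
        m = dom_s == h                       # members of sub-sample s_h
        w = 1.0 / pi_s[m]
        lg_x = norm.logpdf(x_s[m], mu_x, s_x)              # log g_h(x)
        lf_yx = norm.logpdf(y_s[m],                        # log f_h(y | x)
                            mu_y + rho * s_y / s_x * (x_s[m] - mu_x),
                            s_y * np.sqrt(1.0 - rho ** 2))
        l1 += np.log(p[h]) * w.sum() + (w * (lg_x + lf_yx)).sum()
    # l2: incomplete part over the complement, weighted by 1/(1 - pi_k);
    # only the marginal mixture g(x) = sum_h p_h g_h(x) enters
    g = sum(p[h] * norm.pdf(x_out, th[1], th[3]) for h, th in enumerate(theta))
    l2 = (np.log(g) / (1.0 - pi_out)).sum()
    return l1 + l2
```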

Usually, direct maximization of the log-likelihood function \(l({\textbf {d}}_s,{\textbf {x}}_{\underline{s}})\) is complex and has no closed-form solution, so an approximation method has to be applied. Therefore, we use the simpler iterative method known as the EM algorithm (see [4, 6, 7]). According to this method, the function \( l({\textbf {d}}_s,{\textbf {x}}_{\underline{s}})\) is replaced with the following:

$$\begin{aligned} l^{(t)}({\textbf {d}}_s,{\textbf {x}}_{\underline{s}})=l_1({\textbf {d}}_s)+l_2^{(t)}({\textbf {x}}_{\underline{s}}) \end{aligned}$$
(4)

where

$$\begin{aligned}{} & {} l_2^{(t)}({\textbf {x}}_{\underline{s}})=\sum _{h=1}^H\hat{\tau }_h^{(t)}\textrm{ln}(p_h)+\sum _{h=1}^H\sum _{k\in \underline{s}}\frac{\tau _{h,k}^{(t)}\textrm{ln}(g_h(\textbf{x}_k,\theta _{x,h}))}{1-\pi _k}, \end{aligned}$$
(5)
$$\begin{aligned}{} & {} {\left\{ \begin{array}{ll} \hat{\tau }_h^{(t)}=\hat{\tau }_h(\hat{\Theta }_x^{(t)})=\sum _{k\in \underline{s}}\frac{\tau _{h,k}^{(t)}}{1-\pi _k},\\ \tau _{h,k}^{(t)}=\tau _h(\textbf{x}_k,\hat{\Theta }_x^{(t)})= \frac{\hat{p}_h^{(t)}g_h(\textbf{x}_k,\hat{\theta }_{x,h}^{(t)})}{g(\textbf{x}_k,\hat{\Theta }_x^{(t)})}, \end{array}\right. } \end{aligned}$$
(6)

\(\sum _{h=1}^H\tau _{h,k}^{(t)}=1\), and \(\tau ^{(t)}_{h,k}\) is the posterior probability that the k-th element (\(k\in \underline{s}\)) belongs to the h-th domain. Moreover, \(\hat{\tau }_h^{(t)}\) is the estimator of the expected size of the h-th domain in the set \(\underline{s}\). The Appendix provides an outline of how to obtain the optimal values of the parameters \(\hat{\Theta }_x^{(t+1)}\) and the following estimators of the probabilities \(p_h\):

$$\begin{aligned} \hat{p}^{(t+1)}_h=\frac{\hat{N}_h+\hat{\tau }_h^{(t)}}{\hat{N}+\hat{\tau }^{(t)}}, \quad h=1, \ldots ,H. \end{aligned}$$
(7)

where

$$\begin{aligned} \hat{N}_h=\sum _{k\in s_h}\frac{1}{\pi _k},\quad \hat{N}=\sum _{h=1}^H\hat{N}_h=\sum _{k\in s}\frac{1}{\pi _k},\quad \hat{\tau }^{(t)}=\sum _{h=1}^H\hat{\tau }_h^{(t)}. \end{aligned}$$

The statistics \(\hat{N}\) and \(\hat{\tau }^{(t)}\) are both estimators of N. In general, the estimators \(\hat{\Theta }^{(t+1)}\) can be obtained as roots of the first subsystem of the equation system (21). Moreover, \(\tilde{N}^{(t)}_h=N\hat{p}_h^{(t)}\) is the estimator of the expected domain size \(Np_h\). The initial values of \(\hat{\Theta }^{(t)}\) and \(\hat{p}^{(t)}_h\) are the roots of the system \(\frac{\partial l^{(t)}({\textbf {d}}_s,{\textbf {x}}_{\underline{s}})}{\partial p_h}={\textbf {0}}\), which gives \(\hat{p}^{(0)}_h=\frac{\hat{N}_h}{\hat{N}}\), \(h=1, \ldots ,H\).
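A minimal sketch of one pass of the updates (6)–(7), for univariate normal marginals \(g_h\), might look as follows (all names are illustrative):

```python
# One E-step (6) and proportion update (7); mu_x, s_x hold the current
# marginal parameters per domain, N_hat_h the Horvitz-Thompson domain sizes.
import numpy as np
from scipy.stats import norm

def update_p(x_out, pi_out, p, mu_x, s_x, N_hat_h):
    # E-step: tau_{h,k} = p_h g_h(x_k) / g(x_k) for non-sampled k, eq. (6)
    dens = np.stack([p[h] * norm.pdf(x_out, mu_x[h], s_x[h])
                     for h in range(len(p))])      # H x (N - n)
    tau_hk = dens / dens.sum(axis=0)               # columns sum to one
    tau_h = (tau_hk / (1.0 - pi_out)).sum(axis=1)  # hat tau_h
    # update of the mixing proportions, eq. (7)
    return (N_hat_h + tau_h) / (N_hat_h.sum() + tau_h.sum())
```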

When \(\pi _k\), \(k\in U\), depend on variables from \({\textbf {X}}\), the likelihood function conditional on \({\textbf {X}}={\textbf {x}}\) needs to be considered. Several aspects of this problem were discussed by Pfeffermann [8] on the basis of an extensive literature. Therefore, in order to simplify our considerations, we assume that the inclusion probabilities \(\pi _k\), \(k\in U\), as well as \(p_h\), \(h=1, \ldots ,H\), may depend only on non-random auxiliary variables different from the observations of the variables from \({\textbf {X}}\).

A simple random sample drawn without replacement does not depend on the auxiliary variables. In this case \(\pi _k=\frac{n}{N}\) for \(k\in U\), and the estimator (7) simplifies to the following form:

$$\begin{aligned} \hat{p}^{(t+1)}_h=\frac{1}{2}(\bar{p}_h+\bar{\tau }_h^{(t)}),\quad \bar{p}_h=\frac{n_h}{n}, \quad \bar{\tau }_h^{(t)}=\frac{1}{N-n}\sum _{k\in \underline{s}}\tau _{h,k}^{(t)},\quad h=1, \ldots ,H. \end{aligned}$$
(8)
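The following quick numeric check (with artificial posterior probabilities) illustrates that under \(\pi_k=n/N\) the general estimator (7) indeed coincides with the simplified form (8):

```python
# Numeric check (illustrative data) that (7) reduces to (8) when pi_k = n/N.
import numpy as np

N, n, n_h = 1000, 100, np.array([30, 45, 25])
rng = np.random.default_rng(1)
tau_hk = rng.dirichlet(np.ones(3), N - n).T    # H x (N - n), columns sum to 1
pi = n / N
N_hat_h = n_h / pi                             # hat N_h = N n_h / n
tau_hat_h = tau_hk.sum(axis=1) / (1 - pi)      # hat tau_h from (6)
general = (N_hat_h + tau_hat_h) / (N_hat_h.sum() + tau_hat_h.sum())  # (7)
simple = 0.5 * (n_h / n + tau_hk.sum(axis=1) / (N - n))              # (8)
assert np.allclose(general, simple)
```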

3 Estimation for a Bivariate Normal Model

3.1 Estimators

We assume that the components of the distribution mixture are two-dimensional normal distributions with parameters \(N(\mu _{y,h},\mu _{x,h},\sigma ^2_{y,h},\sigma ^2_{x,h},\rho _h)\), \(h=1, \ldots ,H\).

In the Appendix, we derive estimators of the domain means \(\mu _{y,h}\) and of the fractions \(p_h\), \(h=1, \ldots ,H\), of population elements in the domains, according to the EM algorithm and expressions (4)–(8). From expressions (6) and (7), let us write for \(t=0,1,2, \ldots \) the following:

$$\begin{aligned} {\left\{ \begin{array}{ll} \hat{\tau }_h^{(t)}=\sum _{k\in \underline{s}}\frac{\tau _{h,k}^{(t)}}{1-\pi _k},\\ \tau _{h,k}^{(t)}=\frac{\hat{p}_h^{(t)}g_h(x_k,\hat{x}_h^{(t)},\sigma ^{2(t)}_{x,\underline{s},h})}{\sum _{i=1}^H\hat{p}_i^{(t)}g_i(x_k,\hat{x}_i^{(t)},\sigma ^{2(t)}_{x,\underline{s},i})},\quad \tau _{h,k}^{(0)}=\frac{\bar{p}_hg_h(x_k,\bar{x}_{s_h},\sigma ^2_{x,s_h})}{\sum _{i=1}^H\bar{p}_ig_i(x_k,\bar{x}_{s_i},\sigma ^2_{x,s_i})}. \end{array}\right. } \end{aligned}$$
(9)

where \(\hat{p}_h^{(t)}\) and \(\hat{p}_h^{(0)}=\bar{p}_h\) are given by expressions (7) and (8). Moreover,

$$\begin{aligned}{} & {} \hat{x}_h^{(t+1)}=w_h^{(t)}\bar{x}_{s_h}+(1-w_h^{(t)})\bar{x}_{\underline{s},h}^{(t)}, \end{aligned}$$
(10)
$$\begin{aligned}{} & {} \bar{x}_{\underline{s},h}^{(t)}=\frac{1}{\hat{\tau }_h^{(t)}}\sum _{k\in \underline{s}}x_k\frac{\tau _{h,k}^{(t)}}{1-\pi _k},\quad w_h^{(t)}=\frac{\hat{N}_h}{\hat{N}_h+\hat{\tau }_h^{(t)}},\qquad \bar{x}_h^{(0)}=\bar{x}_{s_h},\nonumber \\{} & {} \sigma ^2_{x,s_h}=\frac{1}{\hat{N}_h}\sum _{k\in s_h}\frac{(x_k-\bar{x}_{s_h})^2}{\pi _k},\quad \sigma ^2_{y,s_h}=\frac{1}{\hat{N}_h}\sum _{k\in s_h}\frac{(y_k-\bar{y}_{s_h})^2}{\pi _k},\nonumber \\{} & {} \sigma _{xy,s_h}=\frac{1}{\hat{N}_h}\sum _{k\in s_h}\frac{(x_k-\bar{x}_{s_h})(y_k-\bar{y}_{s_h})}{\pi _k},\;\; \bar{x}_{s_h}=\frac{1}{\hat{N}_h}\sum _{k\in s_h}\frac{x_k}{\pi _k},\;\;\bar{y}_{s_h}=\frac{1}{\hat{N}_h}\sum _{k\in s_h}\frac{y_k}{\pi _k}.\nonumber \\ \end{aligned}$$
(11)

The following regression-type estimators of \(\mu _{y,h}\) are derived in the Appendix:

$$\begin{aligned}{} & {} \hat{y}_h^{(t+1)}=\bar{y}_{s_h}-\frac{\sigma _{xy,s_h}}{\hat{\sigma }^{2(t+1)}_{x,h}}(\bar{x}_{s_h}-\hat{x}_h^{(t+1)})\quad \text {or}\nonumber \\{} & {} \quad \hat{y}_h^{(t+1)}=\bar{y}_{s_h}-(1-w_h^{(t)})\frac{\sigma _{xy,s_h}}{\hat{\sigma }^{2(t+1)}_{x,h}}\left( \bar{x}_{s_h}-\bar{x}_{\underline{s},h}^{(t)}\right) , \end{aligned}$$
(12)
$$\begin{aligned}{} & {} \tilde{y}_h^{(t+1)}=\bar{y}_{s_h}-\frac{\sigma _{xy,s_h}}{\sigma ^2_{x,s_h}}(\bar{x}_{s_h}-\hat{x}_h^{(t+1)})\quad \text {or}\nonumber \\{} & {} \quad \tilde{y}_h^{(t+1)}=\bar{y}_{s_h}-(1-w_h^{(t)})\frac{\sigma _{xy,s_h}}{\sigma ^2_{x,s_h}}\left( \bar{x}_{s_h}-\bar{x}_{\underline{s},h}^{(t)}\right) , \end{aligned}$$
(13)

where \(t=0,1,2, \ldots \),

$$\begin{aligned}{} & {} \hat{\sigma }_{x,h}^{2(t+1)}=w^{(t)}_h\sigma _{x,s_h}^2+(1-w_h^{(t)})\sigma ^{2(t)}_{x,\underline{s},h},\qquad \hat{\sigma }_{x,h}^{2(0)}=\sigma _{x,s_h}^2, \end{aligned}$$
(14)
$$\begin{aligned}{} & {} \sigma ^{2(t)}_{x,\underline{s},h}=\frac{1}{\hat{\tau }_h^{(t)}}\sum _{k\in \underline{s}}\frac{(x_k-\bar{x}_{\underline{s},h}^{(t)})^2}{1-\pi _k}\tau _{h,k}^{(t)}. \end{aligned}$$
(15)

When the intercept of the linear regression of y on x is approximately equal to zero, we can use the following ratio-type estimator:

$$\begin{aligned} \check{y}_h^{(t+1)}=\bar{y}_{s_h}\frac{\hat{x}_h^{(t+1)}}{\bar{x}_{s_h}}=w_h^{(t)}\bar{y}_{s_h}+(1-w_h^{(t)})\bar{y}_{s_h}\frac{\bar{x}_{\underline{s},h}^{(t)}}{\bar{x}_{s_h}}. \end{aligned}$$
(16)

In particular, in the case of a simple random sample drawn without replacement, when \(\pi _k=\frac{n}{N}\) for all \(k\in U\), we have:

$$\begin{aligned}{} & {} \hat{N}_h=N\bar{p}_h=N\frac{n_h}{n},\quad \hat{\tau }_h^{(t)}=N\bar{\tau }_h^{(t)},\quad \bar{\tau }_h^{(t)}=\frac{1}{N-n}\sum _{k\in \underline{s}}\tau _{h,k}^{(t)},\quad w_h^{(t)}=\frac{\bar{p}_h}{\bar{p}_h+\bar{\tau }_h^{(t)}},\nonumber \\{} & {} \bar{x}_{s_h}=\frac{1}{n_h}\sum _{k\in s_h}x_k,\quad \bar{y}_{s_h}=\frac{1}{n_h}\sum _{k\in s_h}y_k,\quad \sigma ^2_{x,s_h}=\frac{1}{n_h}\sum _{k\in s_h}(x_k-\bar{x}_{s_h})^2,\nonumber \\{} & {} \sigma _{xy,s_h}=\frac{1}{n_h}\sum _{k\in s_h}(x_k-\bar{x}_{s_h})(y_k-\bar{y}_{s_h}),\nonumber \\{} & {} \bar{x}_{\underline{s},h}^{(t)}=\frac{1}{(N-n)\bar{\tau }_h^{(t)}}\sum _{k\in \underline{s}}x_k\tau _{h,k}^{(t)},\quad \sigma ^{2(t)}_{x,\underline{s},h}=\frac{1}{(N-n)\bar{\tau }_h^{(t)}}\sum _{k\in \underline{s}}(x_k-\bar{x}_{\underline{s},h}^{(t)})^2\tau _{h,k}^{(t)}. \end{aligned}$$
(17)
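Putting the pieces together, the sketch below implements the EM iteration (9)–(17) with the regression-type estimator (12) and the population-mean combination (20) for simple random sampling without replacement. It is a simplified illustration, not the paper's code: the component density \(g_h\) is evaluated at the current combined moments \(\hat{x}_h^{(t)}\) and \(\hat{\sigma }_{x,h}^{2(t)}\), and all function and variable names are assumptions.

```python
# EM sketch for SRSWOR: y_s, x_s, dom_s over the sample, x_out over the
# complement; dom_s holds domain labels 0, ..., H-1.
import numpy as np
from scipy.stats import norm

def em_domain_means(y_s, x_s, dom_s, x_out, T=20):
    H = dom_s.max() + 1
    p_bar = np.array([(dom_s == h).mean() for h in range(H)])   # n_h / n
    xb = np.array([x_s[dom_s == h].mean() for h in range(H)])   # bar x_{s_h}
    yb = np.array([y_s[dom_s == h].mean() for h in range(H)])   # bar y_{s_h}
    vx = np.array([x_s[dom_s == h].var() for h in range(H)])    # sigma^2_{x,s_h}
    cxy = np.array([((x_s[dom_s == h] - xb[h]) *
                     (y_s[dom_s == h] - yb[h])).mean() for h in range(H)])
    mu, var, p = xb.copy(), vx.copy(), p_bar.copy()
    for _ in range(T):
        # E-step (9): posterior domain probabilities of non-sampled units
        dens = np.stack([p[h] * norm.pdf(x_out, mu[h], np.sqrt(var[h]))
                         for h in range(H)])
        tau = dens / dens.sum(axis=0)
        tau_bar = tau.mean(axis=1)                              # bar tau_h
        w = p_bar / (p_bar + tau_bar)                           # weight, (17)
        xb_out = (tau * x_out).sum(axis=1) / tau.sum(axis=1)
        vx_out = ((tau * (x_out - xb_out[:, None]) ** 2).sum(axis=1)
                  / tau.sum(axis=1))
        # M-step: combined mean (10), variance (14) and proportions (8)
        mu = w * xb + (1.0 - w) * xb_out
        var = w * vx + (1.0 - w) * vx_out
        p = 0.5 * (p_bar + tau_bar)
    y_h = yb - cxy / var * (xb - mu)          # regression estimator (12)
    return y_h, p, p @ y_h                    # domain means, p_h, and (20)
```

Applied to samples drawn from the artificial population of Example 1 below, the function returns the domain-mean estimates \(\hat{y}_h\), the estimated proportions \(\hat{p}_h\), and the combined population-mean estimate.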

The proposed regression-type estimators can be generalized to the case of a multi-dimensional auxiliary variable as follows. Let

$$\begin{aligned}{} & {} \hat{\textbf{x}}_h^{(t+1)}=w_h^{(t)}\bar{\textbf{x}}_{s_h}+(1-w_h^{(t)})\bar{\textbf{x}}_{\underline{s},h}^{(t)},\\{} & {} \quad \bar{\textbf{x}}_{s_h}=[\bar{x}_{1,s_h} \ldots \bar{x}_{i,s_h} \ldots \bar{x}_{m,s_h}],\quad \bar{\textbf{x}}_{\underline{s},h}^{(t)}=[\bar{x}_{1,\underline{s},h}^{(t)} \ldots \bar{x}_{i,\underline{s},h}^{(t)} \ldots \bar{x}_{m,\underline{s},h}^{(t)}],\\{} & {} \bar{x}_{i,s_h}=\frac{1}{\hat{N}_h}\sum _{k\in s_h}\frac{x_{k,i}}{\pi _k},\quad \bar{x}_{i,\underline{s},h}^{(t)}=\frac{1}{\hat{\tau }_h^{(t)}}\sum _{k\in \underline{s}}\frac{x_{k,i}\tau _{h,k}^{(t)}}{1-\pi _k},\\{} & {} \tau _{h,k}^{(t)}=\frac{\hat{p}^{(t)}_hg_h\left( \textbf{x}_k,\hat{\textbf{x}}_h^{(t)},\hat{\Sigma }_{xx,h}^{(t)}\right) }{\sum _{e=1}^{H}\hat{p}^{(t)}_eg_e\left( \textbf{x}_k,\hat{\textbf{x}}_e^{(t)},\hat{\Sigma }_{xx,e}^{(t)}\right) }. \end{aligned}$$

Let \(\mathbf {J_a}\) be the a-element column vector whose elements are all equal to one. The rows of the matrix \(\textbf{X}\) can be arranged in such a way that

$$\begin{aligned}{} & {} \textbf{X}=\left[ \begin{array}{l} \textbf{X}_s\\ \textbf{X}_{\underline{s}}\\ \end{array}\right] ,\quad \varvec{\pi }=\left[ \begin{array}{l} \varvec{\pi }_s\\ \varvec{\pi }_{\underline{s}}\\ \end{array}\right] ,\quad \underline{\varvec{\pi }}=\textbf{J}_N-\varvec{\pi },\quad D(\varvec{\pi })=\textrm{diag}(\varvec{\pi }),\\{} & {} \textbf{U}_{\underline{s},h}^{(t)}=\textbf{X}_{\underline{s}}-\textbf{J}_{N-n}\bar{\textbf{x}}_{\underline{s},h}^{(t)},\quad \varvec{\tau }_h^{(t)}=[\tau _{h,k}^{(t)}]_{k\in \underline{s}},\\{} & {} \hat{\Sigma }_{xx,h}^{(t+1)}=w_h^{(t)}\Sigma _{xx,s_h}+(1-w_h^{(t)})\Sigma _{xx,\underline{s},h}^{(t)},\\{} & {} \Sigma _{xx,s_h}=\frac{1}{\hat{N}_h}\sum _{k\in s_h}\frac{(\textbf{x}_k-\bar{\textbf{x}}_{s_h})^T(\textbf{x}_k-\bar{\textbf{x}}_{s_h})}{\pi _k},\quad \Sigma _{xy,s_h}=\frac{1}{\hat{N}_h}\sum _{k\in s_h}\frac{(\textbf{x}_k-\bar{\textbf{x}}_{s_h})^T(y_k-\bar{y}_{s_h})}{\pi _k},\\{} & {} \Sigma _{xx,\underline{s},h}^{(t)}=\frac{1}{\hat{\tau }^{(t)}_h}\sum _{k\in \underline{s}}\frac{(\textbf{x}_k-\bar{\textbf{x}}_{\underline{s},h}^{(t)})^T(\textbf{x}_k-\bar{\textbf{x}}_{\underline{s},h}^{(t)})}{1-\pi _k}\tau ^{(t)}_{h,k} =\frac{1}{\hat{\tau }_h^{(t)}}\left( \textbf{U}_{\underline{s},h}^{(t)}\right) ^TD(\varvec{\tau }_h^{(t)})D^{-1}(\underline{\varvec{\pi }}_{\underline{s}})\textbf{U}_{\underline{s},h}^{(t)}. \end{aligned}$$

These let us generalize the estimators defined by expressions (12) and (13) as follows:

$$\begin{aligned}{} & {} \hat{y}_h^{(t+1)}=\bar{y}_{s_h}-(\bar{\textbf{x}}_{s_h}-\hat{\textbf{x}}_h^{(t+1)})\left( \hat{\Sigma }_{xx,h}^{(t+1)}\right) ^{-1}\Sigma _{xy,s_h}\;\text {or}\nonumber \\{} & {} \hat{y}_h^{(t+1)}=\bar{y}_{s_h}-(1-w_h^{(t)})(\bar{\textbf{x}}_{s_h}-\bar{\textbf{x}}_{\underline{s},h}^{(t)})\left( \hat{\Sigma }_{xx,h}^{(t+1)}\right) ^{-1}\Sigma _{xy,s_h},\end{aligned}$$
(18)
$$\begin{aligned}{} & {} \tilde{y}_h^{(t+1)}=\bar{y}_{s_h}-(\bar{\textbf{x}}_{s_h}-\hat{\textbf{x}}_h^{(t+1)})\Sigma _{xx,s_h}^{-1}\Sigma _{xy,s_h}\;\text {or}\nonumber \\{} & {} \tilde{y}_h^{(t+1)}=\bar{y}_{s_h}-(1-w_h^{(t)})(\bar{\textbf{x}}_{s_h}-\bar{\textbf{x}}_{\underline{s},h}^{(t)})\Sigma _{xx,s_h}^{-1}\Sigma _{xy,s_h}, \end{aligned}$$
(19)

where \(t=0,1,2, \ldots \) and \(\bar{\textbf{x}}_{\underline{s},h}^{(0)}=\bar{\textbf{x}}_{s_h}\), \(\hat{\Sigma }_{xx,h}^{(0)}=\Sigma _{xx,s_h}\).
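Treating \(\bar{\textbf{x}}_{s_h}\) as a row vector, the multivariate correction in (18) is a scalar quadratic form. The following sketch illustrates this (the names are illustrative; `Sigma_xx_h` stands for the combined matrix \(\hat{\Sigma }_{xx,h}^{(t+1)}\)):

```python
# Multivariate regression-type update, cf. (18); all arguments are assumed
# to be numpy arrays of compatible shapes (x vectors of length m).
import numpy as np

def reg_estimator_mv(y_bar_sh, x_bar_sh, x_hat_h, Sigma_xx_h, Sigma_xy_sh):
    # hat y_h = bar y_{s_h} - (bar x_{s_h} - hat x_h) Sigma_xx^{-1} Sigma_xy
    corr = (x_bar_sh - x_hat_h) @ np.linalg.solve(Sigma_xx_h, Sigma_xy_sh)
    return y_bar_sh - corr
```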

Usually, the estimation process is stopped when the number of iterations t reaches a preassigned level T. Some other stopping rules are discussed, e.g., in [6, 7]. These works also consider several procedures, such as bootstrap methods, for assessing the accuracy of the estimators.

Finally, the estimators given by expressions (12)–(14), (16), (18), (19), (7) and (8) let us construct the following estimators of the population mean:

$$\begin{aligned}{} & {} \hat{y}^{(t+1)}=\sum _{h=1}^H\hat{p}_h^{(t+1)}\hat{y}_h^{(t+1)},\qquad \tilde{y}^{(t+1)}=\sum _{h=1}^H\hat{p}_h^{(t+1)}\tilde{y}_h^{(t+1)},\nonumber \\{} & {} \qquad \check{y}^{(t+1)}=\sum _{h=1}^H\hat{p}_h^{(t+1)}\check{y}_h^{(t+1)}. \end{aligned}$$
(20)

where \(t=0,1,2, \ldots \).

3.2 Simulation Study

Let simple random samples \(\{s_j,j=1, \ldots ,M\}\) be drawn independently without replacement from the whole population of size N. We assume that each of them is partitioned among the H domains in such a way that \(s_j=s_{1,j}\cup \ldots \cup s_{h,j}\cup \ldots \cup s_{H,j}\) and \(2\le n_h\le n-2(H-1)\), \(h=1, \ldots ,H\). The relative efficiency coefficient of an estimator \(t_{s_h}\) of the mean in the h-th domain, \(h=1, \ldots ,H\), is defined as the following ratio:

$$\begin{aligned} e(t_{s_h})=\frac{mse(t_{s_h})}{v(\bar{y}_{s_h})}100\%, \quad h=1, \ldots ,H \end{aligned}$$

where \(mse(t_{s_h})=\frac{1}{M}\sum _{j=1}^{M}(t_{s_{h,j}}-\bar{y}_h)^2\), \(v(\bar{y}_{s_h})=\frac{1}{M}\sum _{j=1}^{M}(\bar{y}_{s_{h,j}}-\bar{y}_h)^2\), \(\bar{y}_h=\frac{1}{N_h}\sum _{k\in U_h}y_{k}\), \(h=1, \ldots ,H\). The relative bias of an estimator is defined as follows:

$$\begin{aligned} b(t_{s_h})= & {} \frac{|\bar{t}_{s_h}-\bar{y}_h|}{\sqrt{mse(t_{s_h})}}100\%,\qquad \\ \bar{t}_{s_h}= & {} \frac{1}{M}\sum _{j=1}^{M}t_{s_{h,j}},\quad h=1, \ldots ,H. \end{aligned}$$

We assume that \(M=10\,000\).
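In the simulations below, the two accuracy measures are computed from the Monte Carlo replications as in the following sketch (names are illustrative; `estimates` collects the M realizations of a given statistic and `naive_means` the M realizations of the plain sample mean):

```python
# Accuracy measures e(t) and b(t) from M Monte Carlo replications.
import numpy as np

def accuracy(estimates, naive_means, true_mean):
    mse = np.mean((estimates - true_mean) ** 2)
    v = np.mean((naive_means - true_mean) ** 2)
    e = 100.0 * mse / v                                           # e(t), in %
    b = 100.0 * abs(estimates.mean() - true_mean) / np.sqrt(mse)  # b(t), in %
    return e, b
```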

Example 1

Let us consider the following simple set of data on a two-dimensional random variable generated according to two-dimensional normal distribution. The set consists of three domains of the same size equal to 500. Hence, a population of size 1500 is divided into three domains. The data in the h-domain are generated according to normal distribution \(N(\mu _{x,h},\mu _{y,h},v_{x,h},v_{y,h},\rho _h)\). We will consider the following population partitioned into domains. The domain parameters of the population are: N(8, 4, 1, 1, 0.5), N(14, 11.2, 1, 1, 0.8) and N(20, 19, 1, 1, 0.95). The spread of artificially generated data is shown in Fig. 1.
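A sketch that reproduces such an artificial population under the stated parameters (unit variances, the given means and correlations) is shown below; the seed and names are arbitrary.

```python
# Generate the Example 1 population: three domains of 500 bivariate normal
# observations each, with unit variances and the stated means/correlations.
import numpy as np

rng = np.random.default_rng(42)
params = [((8, 4), 0.5), ((14, 11.2), 0.8), ((20, 19), 0.95)]  # ((mu_x, mu_y), rho)
pop = []
for (mx, my), rho in params:
    cov = [[1.0, rho], [rho, 1.0]]          # unit variances, correlation rho
    pop.append(rng.multivariate_normal([mx, my], cov, size=500))
pop = np.vstack(pop)                        # 1500 x 2 array of (x, y) pairs
```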

Fig. 1 Spread of the generated data

Simple random samples (\(s=s_1\cup s_2\cup s_3\)) are drawn without replacement from the whole population of size N. We assume that the size of each \(s_{h,j}\) satisfies \(2\le n_h\le n-2(H-1)\), \(h=1,2,3\), \(j=1, \ldots ,M\). In the second column of Table 1, the domains are identified by the integers 1, 2 and 3. Columns 3–6 give the relative efficiency coefficient values e(.) for the domains.

Estimators \(\bar{y}_{s_h}\), \(\tilde{y}_h^{(t)}\) and \(\check{y}_h^{(t)}\) are less accurate than \(\hat{y}^{(t)}_h\). In the second domain, estimators \(\tilde{y}_h^{(t)}\) and \(\check{y}_h^{(t)}\) are more accurate than \(\bar{y}_{s_h}\) and comparable in accuracy with \(\hat{y}_h^{(t)}\) for \(n=75, 150\); the same holds in the third domain for \(n=150\). All considered estimators are practically unbiased because their relative biases (evaluated as the ratio of the absolute bias to the square root of the mean square error of the estimator) are not larger than \(0.1\%\); therefore, the biases are not shown in Table 1. The accuracy of the estimators increases with the sample size. When the correlation coefficient between the auxiliary variable and the variable under study in a particular domain increases, the accuracy of the estimator also increases. The regression estimator \(\hat{y}^{(t)}_h\) is significantly more accurate than the ordinary sub-sample mean \(\bar{y}_{s_h}\). The statistic \(\hat{y}^{(t)}_h\) seems to be the most universal of the considered estimators and should therefore be preferred.

Table 1 Relative efficiency coefficients

The relative biases of \(\hat{p}_h^{(t)}\), \(h=1, \ldots ,H\), are not larger than \(0.5\%\). Their accuracy also increases with the sample size, and they are better than the ordinary sample frequencies \(\bar{p}_h\) for \(n\ge 45\). Hence, the considered procedure can also be used to estimate the probabilities \(p_h\), \(h=1, \ldots ,H\), of the distribution mixture. The last several rows of Table 1 show that all three estimators of the population mean are significantly better than the simple sample mean; moreover, the ratio-type estimator is the most accurate.

Example 2

The second population consists of data published in [11] about Swedish municipalities. We consider data on three variables: REV84 (real estate values in 1984), RMT85 (revenues from municipal taxation in 1985) and ME84 (municipal employees in 1984). We use these data without the largest outliers, so the size of the considered population is 281. The population was partitioned into three domains according to the 30% and 70% quantiles of the variable REV84, which gives the following domain sizes: \(N_1=86\), \(N_2=109\) and \(N_3=86\). Real estate valuation depends on market fluctuations, so the same property may be in the first domain today and in a different domain tomorrow. Therefore, belonging to a domain can be treated as random.

Fig. 2 Spread of logRMT85 and logME84

The distributions of the variables RMT85 and ME84 are strongly right-skewed and differ significantly from the normal distribution. Therefore, we considered their logarithmic transformations, whose spread is shown in Fig. 2. The domain mean values of logRMT85 were \(\mu _1=6.704\), \(\mu _2=7.520\) and \(\mu _3=8.528\). The simulation of the estimation accuracy was based on simple random samples drawn without replacement, with sample sizes 8 (2.85% of the population size), 14 (4.98%) and 28 (9.96%). Table 2 shows only the accuracy of the estimation of the population mean because the estimators of the domain means were less accurate than the simple random sample mean. Analysis of Table 2 lets us say that all of the considered estimators of the population mean are more accurate than the simple random sample mean. The accuracy of the second regression estimator is the best among the considered ones, and its relative bias is also the smallest.

Table 2 Estimation accuracy of population mean

Example 3

Let us consider the data on the current and starting salaries of employees that are available as an example dataset in the SPSS statistical package. The set consists of 474 observations partitioned into two domains. The first domain of 390 observations is the set of clerks, and the second consists of 84 managers. In general, an employee randomly belongs to one of these domains, because one day he could be a manager and the next day a clerk, and vice versa. The mean starting and current salaries in the first domain (clerks) are $14,164 and $28,054, respectively; in the second domain (managers) they are $28,091 and $63,978, respectively. The spread of the data partitioned into domains is shown in Fig. 3. The following sample sizes were taken into account: 15 (3.2%), 24 (5.1%) and 48 (10.1%). The results of the simulation are shown in Table 3. As in Example 2, this table shows only the accuracy of the estimation of the population mean because the estimators of the domain means were less accurate than the simple random sample mean.

Fig. 3 Spread of the data on starting and current salaries

Analysis of Table 3 leads to the conclusion that all the estimators are more accurate than the simple sample mean for sample sizes \(n>24\); the ratio estimator is an exception in that it is more accurate for all considered sample sizes, since its relative efficiency coefficients are less than 100% throughout. The accuracy of the ratio estimator decreases as the sample size increases. The relative biases of the estimators are quite large.

Table 3 Estimation accuracy of the population mean

Analysis of all the tables and figures lets us say that estimation of domain means is possible only when the data observed in the domains are well separated. A more optimistic conclusion is that the proposed estimators of the population mean are more accurate than the simple random sample mean in all considered cases where the sample size is at least 5% of the population size. Their biases are also acceptable. Of these estimators of the population mean, the second regression estimator and the ratio estimator are the best and could be used in practical research.

The presented simulation analyses will be continued in a wider scope in a subsequent article. In particular, mixtures of at least three-dimensional probability distributions will be considered. In addition, various modifications of the estimators used herein will be proposed, leading to more accurate estimation of the domain means.

4 Conclusions

The three estimators of domain means use additional auxiliary data in order to improve estimation accuracy. The properties of the maximum likelihood method let us derive new estimators of domain and population means. The simulation analysis shows which estimator is best when the domains are sufficiently well separated. This separation need not be very pronounced when estimating the population mean: in this case, all of the estimators of the population mean were better than the simple random sample mean. The considered estimation method also lets us estimate the probabilities of the distribution mixture. A generalization of the regression estimators to a multidimensional auxiliary variable was also shown.

Other generalizations or modifications of the estimation procedure are possible. Auxiliary variables observed in censuses or in official registers can be used to improve the efficiency of estimating means. We can also consider distributions other than normal as components of the mixture; for instance, expenditures or incomes in domains could be modeled by means of asymmetric distributions such as the lognormal or gamma distributions.