Robust model-based sampling designs


Abstract

We investigate methods for the design of sample surveys, and address the traditional resistance of survey samplers to the use of model-based methods by incorporating model robustness at the design stage. The designs are intended to be sufficiently flexible and robust that the resulting estimates, based on the designer’s best guess at an appropriate model, remain reasonably accurate in a neighbourhood of this central model. Thus, consider a finite population of N units in which a survey variable Y is related to a q-dimensional auxiliary variable x. We assume that the values of x are known for all N population units, and that we will select a sample of \(n<N\) population units and then observe the n corresponding values of Y. The objective is to predict the population total \(T=\sum_{i=1}^{N}Y_{i}\). The design problem we consider is to specify a selection rule, using only the values of the auxiliary variable, to select the n units for the sample so that the predictor has optimal robustness properties. We suppose that T will be predicted by methods based on a linear relationship between Y (possibly transformed) and given functions of x. We maximise the mean squared error of the prediction of T over realistic neighbourhoods of the fitted linear relationship, and of the assumed variance and correlation structures. This maximised mean squared error is then minimised over the class of possible samples, yielding an optimally robust (‘minimax’) design. To carry out the minimisation step we introduce a genetic algorithm and discuss its tuning for maximal efficiency.
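To illustrate the minimisation step, the following is a minimal Python sketch of a genetic algorithm searching over size-n samples. The objective `criterion` stands in for the maximised mean squared error, and the operators and tuning constants (`pop_size`, `mutate_prob`, and so on) are illustrative assumptions rather than the tuned settings discussed in the paper.

```python
import numpy as np

def genetic_sample_design(criterion, N, n, pop_size=40, generations=200,
                          elite=2, mutate_prob=0.1, seed=None):
    """Minimise `criterion` over size-n subsets of {0, ..., N-1}.

    `criterion` maps a sorted integer array of n distinct unit labels to a
    float; here it is a stand-in for the maximised mean squared error.
    This is an illustrative sketch, not the authors' implementation.
    """
    rng = np.random.default_rng(seed)
    units = np.arange(N)

    def random_sample():
        return np.sort(rng.choice(units, size=n, replace=False))

    def crossover(a, b):
        # Keep the units common to both parents; fill up to size n from
        # the symmetric difference, so the child is again a valid sample.
        common = np.intersect1d(a, b)
        if common.size == n:                  # identical parents
            return common.copy()
        pool = np.setdiff1d(np.union1d(a, b), common)
        extra = rng.choice(pool, size=n - common.size, replace=False)
        return np.sort(np.concatenate([common, extra]))

    def mutate(s):
        # With small probability, swap one sampled unit for an unsampled one,
        # preserving the sample size n.
        if rng.random() < mutate_prob:
            s = s.copy()
            s[rng.integers(n)] = rng.choice(np.setdiff1d(units, s))
            s = np.sort(s)
        return s

    population = [random_sample() for _ in range(pop_size)]
    rank_weights = 1.0 / np.arange(1, pop_size + 1)   # rank-based selection
    rank_weights /= rank_weights.sum()

    for _ in range(generations):
        losses = np.array([criterion(s) for s in population])
        order = np.argsort(losses)            # ascending: best designs first
        population = [population[i] for i in order]
        next_gen = population[:elite]         # elitism: keep the best as-is
        while len(next_gen) < pop_size:
            i, j = rng.choice(pop_size, size=2, replace=False, p=rank_weights)
            next_gen.append(mutate(crossover(population[i], population[j])))
        population = next_gen

    losses = np.array([criterion(s) for s in population])
    return population[int(np.argmin(losses))]
```

Rank-based parent selection with elitism keeps the search stable, and the swap mutation preserves the sample size n, which is the defining constraint of the design space; any real-valued function of the sample indices can be supplied as `criterion`.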



Acknowledgements

The research of A. Welsh is supported by an Australian Research Council Discovery Project grant. That of D. Wiens is supported by the Natural Sciences and Engineering Research Council of Canada, and was largely carried out while enjoying the hospitality of the Centre for Mathematics and its Applications at Australian National University. The work has benefited from the incisive comments of two anonymous reviewers.

Author information

Correspondence to Douglas P. Wiens.

Appendix: Derivations

Proof of Lemma 1

If \(\boldsymbol{H}\in\mathcal{H}^{\prime}\), then \(\mathcal{L}(\boldsymbol{H})\leq\mathcal{L}(\tau^{2}\mathbf{I}_{N})\) and (13) holds for \(\mathcal{H}^{\prime}\). Since \(\tau^{2}\mathbf{I}_{N}\in\mathcal{H}\), the proof will be complete if we can establish that \(\mathcal{H}\subset\mathcal{H}^{\prime}\). To show this, let \(\boldsymbol{H}\in\mathcal{H}\). Recall that the spectral radius \(\rho(\mathbf{M})=\{\operatorname{ch}_{\max}(\mathbf{M}^{T}\mathbf{M})\}^{1/2}\) is bounded by any induced matrix norm, so that

$$\operatorname{ch}_{\max}( \boldsymbol{H}) =\rho( \boldsymbol{H}) \leq\Vert\boldsymbol{H}\Vert\leq\tau^{2}. $$

Thus, for any non-null vector t,

$$\mathbf{t}^{T}\boldsymbol{H}\mathbf{t}\leq\tau^{2}\mathbf{t}^{T}\mathbf{t} $$

or, equivalently,

$$\mathbf{t}^{T}\bigl( \boldsymbol{H}-\tau^{2}\mathbf{I}_{N}\bigr) \mathbf{t}\leq0, $$

so that \(\boldsymbol{H}\in\mathcal{H}^{\prime}\). □
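The matrix-norm step used above is the standard eigenvalue bound for an induced norm: if \(\boldsymbol{H}\mathbf{t}=\lambda\mathbf{t}\) for some \(\mathbf{t}\neq\mathbf{0}\), then

$$\vert\lambda\vert\,\Vert\mathbf{t}\Vert=\Vert\boldsymbol{H}\mathbf{t}\Vert\leq\Vert\boldsymbol{H}\Vert\,\Vert\mathbf{t}\Vert, $$

so that every characteristic root of \(\boldsymbol{H}\) is at most \(\Vert\boldsymbol{H}\Vert\) in modulus, that is, \(\rho(\boldsymbol{H})\leq\Vert\boldsymbol{H}\Vert\).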

Proof of Lemma 2

Using (5), (10) and (11), we can write

$$ n^{1/2}(\boldsymbol{\hat{\theta}}-\boldsymbol{\theta})=n^{1/2} \mathbf{B}_{n}^{-1} ( \mathbf{b}_{f}+ \mathbf{b}_{\varepsilon}+\mathbf{b}_{\eta } ) , $$
(19)

where

It follows from (19) that

By (C2) and the remark following the assumptions, it now suffices to show that each of the terms in the numerator is bounded. From (C2),

$$n\Vert\mathbf{b}_{f}\Vert^{2}\leq\Vert\mathbf{f}_{n}\Vert ^{2}\operatorname{tr} (\mathbf{C}_{n})\leq\tau_{f}^{2}\operatorname{tr}(\mathbf{C}_{n}). $$
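This first bound is a Cauchy–Schwarz step. As a sketch, assuming for illustration the forms \(\mathbf{b}_{f}=n^{-1}\sum_{i=1}^{n}\mathbf{z}(\mathbf{x}_{i})f(\mathbf{x}_{i})\) and \(\mathbf{C}_{n}=n^{-1}\sum_{i=1}^{n}\mathbf{z}(\mathbf{x}_{i})\mathbf{z}^{T}(\mathbf{x}_{i})\) (assumed forms, for illustration only),

$$n\Vert\mathbf{b}_{f}\Vert^{2}=\frac{1}{n}\biggl\Vert\sum_{i=1}^{n}\mathbf{z}(\mathbf{x}_{i})f(\mathbf{x}_{i})\biggr\Vert^{2}\leq\Biggl\{\frac{1}{n}\sum_{i=1}^{n}\bigl\Vert\mathbf{z}(\mathbf{x}_{i})\bigr\Vert^{2}\Biggr\}\sum_{i=1}^{n}f^{2}(\mathbf{x}_{i})=\operatorname{tr}(\mathbf{C}_{n})\Vert\mathbf{f}_{n}\Vert^{2}. $$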

For the second term, using (C2) and (12b) we obtain

Similarly, for the third term,

$$nE\bigl(\Vert\mathbf{b}_{\eta}\Vert^{2}\bigr)\leq \tau_{h}^{2}\operatorname{tr} (\mathbf{C}_{n}). $$

 □

Proof of Theorem 1

We require the definition

$$Q_{ij}=g_{0}^{-1/2}(\mathbf{x}_{i})r_{i}-g_{0}^{-1/2}(\mathbf {x}_{j})r_{j}$$

of the difference between pairs of normalised residuals. Using this definition, the numerator of (16) is

$$ \mathcal{L}_{N}^{\ast}(f,g,h)= ( nN )^{-1}\sum _{i=n+1}^{N}g_{0}( \mathbf{x}_{i})\sum_{j=1}^{n}E \bigl( Q_{ij}^{2} \bigr) . $$
(20)

Write \(\hat{Y}_{i}\) in (7) as

$$\hat{Y}_{i}=\frac{1}{n}\sum_{j=1}^{n} \gamma\bigl\{\gamma^{-1}(Y_{i})-g_{0}^{1/2}( \mathbf{x}_{i})Q_{ij}\bigr\}. $$

Expanding around \(g_{0}^{1/2}(\mathbf{x}_{i})Q_{ij}\) and substituting into (8) gives

where \(\delta_{i,j}\) lies between \(\gamma^{-1}(Y_{i})\) and \(\gamma^{-1}(Y_{i})-g_{0}^{1/2}(\mathbf{x}_{i})Q_{ij}\), that is, \(|\delta_{i,j}-\gamma^{-1}(Y_{i})|\leq|g_{0}^{1/2}(\mathbf{x}_{i})Q_{ij}|\). (The main difficulty with using this expression directly is that \(\gamma^{\prime}\) is a complicated function of f, g and h.) We apply the Cauchy–Schwarz inequality to obtain

and then bound the term involving γ′ in order to obtain the loss function \(\mathcal{L}_{N}(f,g,h)\).
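For a single summand, the expansion referred to above takes the mean value form

$$\gamma\bigl\{\gamma^{-1}(Y_{i})-g_{0}^{1/2}(\mathbf{x}_{i})Q_{ij}\bigr\}=Y_{i}-\gamma^{\prime}(\delta_{i,j})\,g_{0}^{1/2}(\mathbf{x}_{i})Q_{ij}, $$

a sketch consistent with the location of \(\delta_{i,j}\) between \(\gamma^{-1}(Y_{i})\) and \(\gamma^{-1}(Y_{i})-g_{0}^{1/2}(\mathbf{x}_{i})Q_{ij}\).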

To develop a bound for

$$(nN)^{-1}\sum_{i=n+1}^{N}\sum_{j=1}^{n}E \bigl\{\gamma ^{\prime}(\delta_{i,j})^{2}\bigr\}, $$

note that \(Q_{ij}\) is linear in the residuals, hence in the elements \(\{\gamma^{-1}(Y_{i})\}\) of \(\boldsymbol{\delta}\). Thus we can write

$$g_{0}^{1/2}(\mathbf{x}_{i})Q_{ij}=\mathbf{a}_{i,j}^{T}\boldsymbol{\delta}, $$

where the elements of \(\mathbf{a}_{i,j}\) are bounded functions of \(\{\mathbf{x}_{i}\}\). Then we can write

$$\delta_{i,j}=\gamma^{-1}(Y_{i})-t_{i,j}\mathbf{a}_{i,j}^{T}\boldsymbol {\delta },\quad 0\leq t_{i,j}\leq1, $$

and hence, assuming without loss of generality that \(\gamma^{\prime}\) is nondecreasing, we have for a finite, positive constant K that

$$\bigl|\gamma^{\prime}(\delta_{i,j})\bigr|\leq \bigl|\gamma^{\prime} \bigl(\gamma^{-1}(Y_{i})+K\Vert\boldsymbol{\delta}\Vert\bigr)\bigr|. $$

It follows from (C4) that

is bounded.

Finally, we show that the loss \(\mathcal{L}_{N}(f,g,h)\) is O(1). By (C3) we have that

Using (C1), the claim will follow if we can find constants \(\{ K_{i,N}^{2}\} _{i=1}^{N}\) satisfying \(\operatorname{E}\{r_{i}^{2} /g_{0}(\mathbf{x}_{i})\}\leq K_{i,N}^{2}\) and \(N^{-1}\sum _{i=1}^{N}K_{i,N} ^{2}=O(1)\). From (6), we have an upper bound if we choose

and from (C3) and (12a)–(12c),

which is O(1) by (C2) and Lemma 2. □

Proof of Theorem 2

We first represent \(\mathcal{L}_{N}^{\ast}\) at (20) as

$$\mathcal{L}_{N}^{\ast}=E\bigl\{\mathbf{r}^{T} \operatorname{diag}\bigl(\mathbf{G}_{0,n}^{-1/2}, \mathbf{I}_{N-n}\bigr)\mathbf{U}\operatorname{diag}\bigl( \mathbf{G}_{0,n}^{-1/2},\mathbf{I}_{N-n}\bigr)\mathbf{r} \bigr\}, $$

the expected value of a quadratic form in the residual vector \(\mathbf{r}_{N}=(r_{1},\ldots,r_{N})^{T}\). Combining (19) and (6), we have

where \(\mathbf{T}\) is the \(N\times N\) matrix

with \(\mathbf{K}\) and \(\mathbf{P}\) defined in (17) and (18) respectively. Thus, with \(\mathbf{V}=\operatorname{diag}(\mathbf{G}_{0,n}^{-1/2},\mathbf{I}_{N-n})\mathbf{T}\) and \(\mathbf{M}=\mathbf{V}^{T}\mathbf{U}\mathbf{V}\),

where \(\mathcal{L}_{f,N}=\mathbf{f}_{N}^{T}\mathbf{Mf}_{N}\), \(\mathcal {L}_{g,N}=\sigma_{\varepsilon}^{2}\operatorname{tr}(\mathbf{G}_{N}\mathbf{M})\) and \(\mathcal{L}_{h,N}=\operatorname{tr}(\mathbf{H}_{N}\mathbf{M})\).
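This decomposition is an instance of the standard identity for the expected value of a quadratic form: for a random vector \(\mathbf{w}\) with mean \(\boldsymbol{\mu}\) and covariance matrix \(\boldsymbol{\Sigma}\),

$$E\bigl(\mathbf{w}^{T}\mathbf{M}\mathbf{w}\bigr)=\boldsymbol{\mu}^{T}\mathbf{M}\boldsymbol{\mu}+\operatorname{tr}(\mathbf{M}\boldsymbol{\Sigma}), $$

applied here, as the forms of the three terms suggest, with the mean contribution giving \(\mathcal{L}_{f,N}\) and the covariance contributions \(\sigma_{\varepsilon}^{2}\mathbf{G}_{N}\) and \(\mathbf{H}_{N}\) giving \(\mathcal{L}_{g,N}\) and \(\mathcal{L}_{h,N}\).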

To obtain the maximum of \(\mathcal{L}_{f,N}\), note that \(\mathcal{L}_{f,N}\leq\tau_{f}^{2}\operatorname{ch}_{\max}(\mathbf{M})\), with equality if and only if \(\mathbf{f}_{N}\) is a characteristic vector of \(\mathbf{M}\) of norm \(\tau_{f}\). Any such vector is in the column space of \(\mathbf{M}\), hence (since \(\mathbf{Z}_{N}^{T}\mathbf{M}=\mathbf{0}_{p\times N}\)) satisfies (11) and so is also the maximiser under this constraint. Finally, \(\max_{\mathcal{G}}\mathcal{L}_{g,N}=\sigma_{\varepsilon}^{2}(1+\tau_{g}^{2})\operatorname{tr}(\mathbf{G}_{0,N}\mathbf{M})\) follows from (12b), and \(\max_{\mathcal{H}}\mathcal{L}_{h,N}=\tau_{h}^{2}\operatorname{tr}(\mathbf{M})\) follows from Lemma 1. Combining these results, we obtain

and the result follows on dividing both sides by the normalising value \(\tau_{f}^{2}+\sigma_{\varepsilon}^{2}( 1+\tau_{g}^{2}) +\tau _{h}^{2}\). □
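The bound on \(\mathcal{L}_{f,N}\) used in the proof is the Rayleigh quotient inequality: for symmetric \(\mathbf{M}\) and any vector \(\mathbf{f}\) with \(\Vert\mathbf{f}\Vert=\tau_{f}\),

$$\mathbf{f}^{T}\mathbf{M}\mathbf{f}=\tau_{f}^{2}\,\frac{\mathbf{f}^{T}\mathbf{M}\mathbf{f}}{\mathbf{f}^{T}\mathbf{f}}\leq\tau_{f}^{2}\operatorname{ch}_{\max}(\mathbf{M}), $$

with equality exactly at characteristic vectors of \(\mathbf{M}\) corresponding to \(\operatorname{ch}_{\max}(\mathbf{M})\).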

Proof of Lemma 3

The characteristic equation for \(\tilde{\mathbf{M}}=N\mathbf{M}\) is

where

$$k(\lambda)=\bigl\lvert ( 1-\lambda ) ( \mathbf{M}_{1} - \lambda\mathbf{I}_{n} ) -\mathbf{M}_{2}\mathbf{M}_{2}^{T} \bigr\rvert . $$

Thus \(\operatorname{ch}_{\max}(\mathbf{M})=\max(1,\lambda_{\max})/N\), where \(\lambda_{\max}\) is the largest zero of \(k(\lambda)\). The factorization

of \(\mathbf{M}_{2}\) into a product of an \(n\times(p+1)\) matrix with a \((p+1)\times(N-n)\) matrix shows that the rank of \(\mathbf{M}_{2}\) cannot exceed \(p+1\). Thus, since \(p+1<n\), we have that \(k(1)=(-1)^{n}|\mathbf{M}_{2}\mathbf{M}_{2}^{T}|=0\), and hence \(\lambda_{\max}\geq1\) and

$$\operatorname{ch}_{\max}(\mathbf{M})=N^{-1}\lambda_{\max}. $$

Now note that the characteristic polynomial of \(\hat{\mathbf{M}}=N\mathbf{M}^{\ast}\) is

hence \(\operatorname{ch}_{\max}(\hat{\mathbf{M}})=\lambda_{\max}\) and the result follows. □
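The determinant manipulation behind \(k(\lambda)\) is a Schur complement identity. As a sketch, assume the partitioned form (an assumed structure, consistent with the expression for \(k(\lambda)\) above)

$$\tilde{\mathbf{M}}=\begin{pmatrix}\mathbf{M}_{1}&\mathbf{M}_{2}\\ \mathbf{M}_{2}^{T}&\mathbf{I}_{N-n}\end{pmatrix};\quad\text{then, for }\lambda\neq1,\quad\lvert\tilde{\mathbf{M}}-\lambda\mathbf{I}_{N}\rvert=(1-\lambda)^{N-n}\bigl\lvert\mathbf{M}_{1}-\lambda\mathbf{I}_{n}-(1-\lambda)^{-1}\mathbf{M}_{2}\mathbf{M}_{2}^{T}\bigr\rvert=(1-\lambda)^{N-2n}k(\lambda). $$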

Cite this article

Welsh, A.H., Wiens, D.P. Robust model-based sampling designs. Stat Comput 23, 689–701 (2013). https://doi.org/10.1007/s11222-012-9339-3