Kernel-based methods for combining information of several frame surveys

Sánchez-Borrego, I.; Arcos, A.; Rueda, M.

doi:10.1007/s00184-018-0686-8

Published: 03 October 2018

Metrika, volume 82, pages 71–86 (2019)

I. Sánchez-Borrego¹,
A. Arcos¹ &
M. Rueda¹

Abstract

A sample selected from a single sampling frame may not adequately represent the entire population. Multiple frame surveys are increasingly used by statistical agencies and private organizations, in particular in situations where several sampling frames may provide better coverage or can reduce sampling costs for estimating population quantities of interest. Auxiliary information available at the population level is often categorical in nature, so that incorporating both categorical and continuous information can improve the efficiency of the method of estimation. Nonparametric regression methods represent a widely used and flexible estimation approach in the survey context. We propose a kernel regression estimator for dual frame surveys that can handle both continuous and categorical data. This methodology is extended to multiple frame surveys. We derive theoretical properties of the proposed methods, and numerical experiments indicate that the proposed estimators perform well in practical settings under different scenarios.

References

Aitchison J, Aitken CGG (1976) Multivariate binary discrimination by the kernel method. Biometrika 63:413–420
Arcos A, Molina D, Rueda M, Ranalli M (2015) Frames2: a package for estimation in dual frame surveys. R J 7(1):52–72
Bankier M (1986) Estimators based on several stratified samples with applications to multiple frame surveys. J Am Stat Assoc 81:1074–1079
Breidt FJ, Opsomer J (2000) Local polynomial regression estimators in survey sampling. Ann Stat 28:1026–1053
Breidt FJ, Opsomer J (2009) Nonparametric and semiparametric estimation in complex surveys. In: Rao C, Pfeffermann D (eds) Handbook of statistics, vol 29. Part B: sample surveys: theory, methods and inference. North Holland, Amsterdam, pp 103–119
Brick JM, Dipko S, Presser S, Tucker C, Yuan Y (2006) Nonresponse bias in a dual frame survey of cell and landline numbers. Public Opin Q 70:780–793
Epanechnikov VA (1969) Non-parametric estimation of a multivariate probability density. Theor Probab Appl 14(1):153–158
Fan J, Gijbels I (1996) Local polynomial modelling and its applications. Chapman & Hall, London
Fuller W, Burmeister L (1972) Estimators for samples selected from two overlapping frames. In: Proceedings of social science section of The American Statistical Association, pp 101–102
Hartley HO (1962) Multiple frame surveys. In: Proceedings of the social statistics section, American Statistical Association, pp 203–206
Hartley HO (1974) Multiple frame methodology and selected applications. Sankhya 36:99–118
Horvitz DG, Thompson DJ (1952) A generalization of sampling without replacement from a finite universe. J Am Stat Assoc 47:663–685
Iachan R, Dennis ML (1993) A multiple frame approach to sampling the homeless and transient population. J Off Stat 9:747–764
Kalton G, Anderson D (1986) Sampling rare populations. J R Stat Soc Ser A (General) 149(1):65–82
Kuo L (1988) Classical and prediction approaches to estimating distribution functions from survey data. In: ASA Proceedings of the section on survey research methods, American Statistical Association, vol 420, pp 280–285
Lohr SL (2011) Alternative survey sample designs: sampling with multiple overlapping frames. Surv Methodol 37(2):197–213
Lohr SL (2009) Multiple frame surveys. In: Pfeffermann D, Rao C (eds) Handbook of statistics, vol 29. Part A: sample surveys: design, methods and applications. North Holland, Amsterdam, pp 71–78
Lohr S, Rao J (2006) Estimation in multiple-frame surveys. J Am Stat Assoc 101(475):1019–1030
Mecatti F, Singh AC (2014) Estimation in multiple frame surveys: a simplified and unified review using the multiplicity approach. J-SFdS 155(5):51–69
Mecatti F (2007) A single frame multiplicity estimator for multiple frame surveys. Surv Methodol 33(2):151–157
Metcalf P, Scott A (2009) Using multiple frames in health surveys. Stat Med 28:1512–1523
Montanari GE, Ranalli MG (2005) Nonparametric model calibration estimation in survey sampling. J Am Stat Assoc 100(472):1429–1442
Nadaraya EA (1964) On estimating regression. Theor Probab Appl 9(1):141–142
Ranalli MG, Arcos A, Rueda M, Teodoro A (2016) Calibration estimation in dual-frame surveys. Stat Method Appl 25(3):321–349
Rao J, Wu C (2010) Pseudoempirical likelihood inference for multiple frame surveys. J Am Stat Assoc 105(492):1494–1503
Rueda M, Sánchez-Borrego I (2009) A predictive estimator of finite population mean using nonparametric regression. Comput Stat 24:1–14
Rueda M, Arcos A, Molina D, Ranalli M (2017) Estimation techniques for ordinal data in multiple frame surveys with complex sampling designs. Int Stat Rev 86:51–67. https://doi.org/10.1111/insr.12218 (in press)
Sánchez-Borrego I, Opsomer J, Rueda M, Arcos A (2014) Nonparametric estimation with mixed data types in survey sampling. Rev Mat Complut 27(2):685–700
Singh AC, Mecatti F (2011) Generalized multiplicity-adjusted Horvitz–Thompson estimation as a unified approach to multiple frame surveys. J Off Stat 27(4):633–650
Singh AC, Mecatti F (2014) Estimation in multiple frame surveys: a simplified and unified review using the multiplicity approach. SFdS 155:28–50
Skinner C (1991) On the efficiency of raking ratio estimation for multiple frame surveys. J Am Stat Assoc 86(415):779–784
Skinner C, Rao J (1996) Estimation in dual frame surveys with complex designs. J Am Stat Assoc 91(433):349–356
Watson GS (1964) Smooth regression analysis. Sankhya Ser A 26(4):359–372

Acknowledgements

This research was supported by Ministerio de Economía y Competitividad (Grant MTM2015-63609-R) and by Consejería de Economía, Innovación, Ciencia y Empleo (Grant SEJ2954, Junta de Andalucía, Spain).

Author information

Authors and Affiliations

Department of Statistics and Operational Research, Faculty of Science, University of Granada, Campus de Fuentenueva, s/n, 18071, Granada, Spain
I. Sánchez-Borrego, A. Arcos & M. Rueda

Corresponding author

Correspondence to I. Sánchez-Borrego.

Appendix

1.1 Proof of Theorem 1.

We write

$$\begin{aligned} {\hat{y}}_{\mathrm{np}}^{\mathrm{d}}= & {} \sum _{j \in a}{\widehat{m}}_{j}^A + \sum _{j \in s_a} \left( y_j - {\widehat{m}}_{j}^A \right) w_{A_j} + \sum _{j \in b}{\widehat{m}}_{j}^B \nonumber \\&\quad + \sum _{j \in s_b} \left( y_j - {\widehat{m}}_{j}^B \right) w_{B_j} + \eta \left( \sum _{j \in ab}{\widehat{m}}_{j}^A+ \sum _{j \in s_{ab}} \left( y_j - {\widehat{m}}_{j}^A \right) w_{A_j}\right) \nonumber \\&\quad + (1-\eta )\left( \sum _{j \in ba}{\widehat{m}}_{j}^B + \sum _{j \in s_{ba}} \left( y_j - {\widehat{m}}_{j}^B\right) w_{B_j}\right) . \end{aligned}$$

(20)

Let ${\hat{y}}_{\mathrm{np}}^{\mathrm{a}}$ denote the terms $\sum _{j \in a}{\widehat{m}}_{j}^A + \sum _{j \in s_a} (y_j - {\widehat{m}}_{j}^A ) w_{A_j}$. Similarly, ${\hat{y}}_{\mathrm{np}}^{\mathrm{b}}$, ${\hat{y}}_{\mathrm{np}}^{\mathrm{ab}}$ and ${\hat{y}}_{\mathrm{np}}^{\mathrm{ba}}$ denote the corresponding terms in expansion (20), without the factors $\eta $ and $1-\eta $.

We write

$$\begin{aligned} \left( {\hat{y}}_{\mathrm{np}}^{\mathrm{d}}-Y\right)= & {} \left( {\hat{y}}_{\mathrm{np}}^{\mathrm{a}}-Y_{a}\right) +\left( {\hat{y}}_{\mathrm{np}}^{\mathrm{b}}-Y_{b}\right) \nonumber \\&+ \eta \left( {\hat{y}}_{\mathrm{np}}^{\mathrm{ab}}-Y_{ab}\right) +(1-\eta ) \left( {\hat{y}}_{\mathrm{np}}^{\mathrm{ba}}-Y_{ba}\right) . \end{aligned}$$

(21)

Under (A1), (A2) and (A4)–(A7), with $\lambda $ satisfying $0 \le \lambda \le 1$, Theorem 1 in Breidt and Opsomer (2000) applies, after taking expectations, to $\left( {\hat{y}}_{\mathrm{np}}^{\mathrm{a}} - Y_{a}\right) $. The same holds for the terms $\left( {\hat{y}}_{\mathrm{np}}^{\mathrm{b}} - Y_{b}\right) $, $\left( {\hat{y}}_{\mathrm{np}}^{\mathrm{ab}} - Y_{ab}\right) $ and $\left( {\hat{y}}_{\mathrm{np}}^{\mathrm{ba}} - Y_{ba}\right) $. The result then follows from (21).
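The structure of expansion (20) can be illustrated numerically. The following is only a minimal sketch, not the authors' implementation: it assumes the fitted values ${\widehat{m}}_{j}^A$ and ${\widehat{m}}_{j}^B$ have already been computed by some kernel smoother, and the function names and toy numbers are hypothetical.

```python
import numpy as np

def domain_term(m_hat_pop, y_s, m_hat_s, w_s):
    """One domain term of (20): the sum of fitted values over the whole
    domain plus the design-weighted sum of sample residuals."""
    m_hat_pop, y_s = np.asarray(m_hat_pop, float), np.asarray(y_s, float)
    m_hat_s, w_s = np.asarray(m_hat_s, float), np.asarray(w_s, float)
    return m_hat_pop.sum() + ((y_s - m_hat_s) * w_s).sum()

def dual_frame_estimate(terms, eta):
    """Combine the four domain terms as in (20); eta weights the overlap
    term from frame A and 1 - eta the overlap term from frame B."""
    return (terms["a"] + terms["b"]
            + eta * terms["ab"] + (1.0 - eta) * terms["ba"])

# Toy inputs (hypothetical): fitted values over each domain, then the
# sampled y values, their fitted values and their design weights.
terms = {
    "a": domain_term([1, 2, 3], [2.5], [2.0], [3.0]),
    "b": domain_term([4, 4], [5.0], [4.0], [2.0]),
    "ab": domain_term([2, 2, 2, 2], [1.0, 3.0], [2.0, 2.0], [2.0, 2.0]),
    "ba": domain_term([3, 3, 3, 3], [2.0], [3.0], [2.0]),
}
estimate = dual_frame_estimate(terms, eta=0.5)
print(estimate)
```

Setting `eta=1` uses only the frame A sample in the overlap domain, and `eta=0` only the frame B sample.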

1.2 Proof of Theorem 2.

We write

$$\begin{aligned} \left( {\hat{y}}_{\mathrm{np}}^{\mathrm{d}}-Y \right)= & {} \sum _{j \in a} \big (y_j - {\widehat{m}}_{j}^A \big )( w_{A_j}I_{js_A} -1) + \sum _{j \in ab} \big (y_j - {\widehat{m}}_{j}^A \big )(w_{A_j}I_{js_A} -1)\eta \nonumber \\&\quad + \sum _{j \in b} \big (y_j - {\widehat{m}}_{j}^B \big )( w_{B_j}I_{js_B} -1) + \sum _{j \in ba} \big (y_j - {\widehat{m}}_{j}^B \big )(w_{B_j}I_{js_B} -1)(1-\eta ), \end{aligned}$$

where $I_{js_A}=1$ if $j \in s_A$ and $I_{js_A}=0 $ otherwise, and $I_{js_B}=1$ if $j \in s_B$ and $I_{js_B}=0 $ otherwise.

The design variance is given by

$$\begin{aligned} V_d({\hat{y}}_{\mathrm{np}}^{\mathrm{d}})= & {} E_{dA} \left( \sum _{j \in a} \big (y_j - {\widehat{m}}_{j}^A \big )( w_{A_j}I_{js_A} -1) + \sum _{j \in ab} \big (y_j - {\widehat{m}}_{j}^A \big )(w_{A_j}I_{js_A} -1)\eta \right) ^2 \nonumber \\&\quad + E_{dB} \left( \sum _{j \in b} \big (y_j - {\widehat{m}}_{j}^B \big )( w_{B_j}I_{js_B} -1) + \sum _{j \in ba} \big (y_j - {\widehat{m}}_{j}^B \big )(w_{B_j}I_{js_B} -1)(1-\eta ) \right) ^2 \nonumber \\= & {} E_{dA} \left( \sum _{j \in A} \big (y_j - {\widehat{m}}_{j}^A \big )( w_{A_j}I_{js_A} -1)\eta _{A_j}\right) ^2 \nonumber \\&\quad +E_{dB} \left( \sum _{j \in B} \big (y_j - {\widehat{m}}_{j}^B \big )( w_{B_j}I_{js_B} -1)\eta _{B_j}\right) ^2 , \end{aligned}$$

where the first equality holds because the sampling designs $d_A$ and $d_B$ are independent, and the second collects the sums over each frame, with $\eta _{A_j}$ equal to 1 for $j \in a$ and to $\eta $ for $j \in ab$, and $\eta _{B_j}$ equal to 1 for $j \in b$ and to $1-\eta $ for $j \in ba$. Let

$$\begin{aligned}&c_A= \sum _{j \in A} (y_j-m_j)( w_{A_j}I_{js_A}-1)\eta _{A_j},&c_B= \sum _{j \in B} (y_j-m_j)( w_{B_j}I_{js_B}-1)\eta _{B_j}, \nonumber \\&t_A= \sum _{j \in A} \big (m_j- {\widehat{m}}_{j}^A\big )( w_{A_j}I_{js_A}-1)\eta _{A_j}&\text {and } t_B= \sum _{j \in B} \big (m_j- {\widehat{m}}_{j}^B\big )( w_{B_j}I_{js_B}-1)\eta _{B_j}. \end{aligned}$$

Then

$$\begin{aligned}&E_{dA} \left( \sum _{j \in A} \big (y_j - {\widehat{m}}_{j}^A \big )( w_{A_j}I_{js_A} -1)\eta _{A_j}\right) ^2 = E_{dA} (c_A +t_A)^2 \nonumber \\&\quad = E_{dA}(c_A^2) +E_{dA}(t_A^2)+ 2E_{dA}(t_A c_A) = E_{dA}(c_A^2) +o(1), \end{aligned}$$

(22)

because, by Lemma 5 in Breidt and Opsomer (2000), $E_{dA}(t_A^2)=o(1)$, so that, by the Cauchy–Schwarz inequality, $|E_{dA}(t_A c_A)| \le (E_{dA}(t_A^2) E_{dA}(c_A^2))^{1/2} =o(1)$.

Similarly, $E_{dB}(c_B + t_B)^2= E_{dB}(c_B^2) +o(1)$.

Thus, the asymptotic variance of the estimator is given by

$$\begin{aligned} AV_d({\hat{y}}_{\mathrm{np}}^{\mathrm{d}})= E_{dA}(c_A^2)+ E_{dB}(c_B^2). \end{aligned}$$

By using the properties of the Horvitz–Thompson estimator (Horvitz and Thompson 1952) we can deduce

$$\begin{aligned} E_{dA}(c_A^2)= & {} \sum _{k,j \in A} (y_k-m_k)(y_j-m_j) \left( w_{A_k}w_{A_j} \sum _{s_A \ni k,j} p_d(s_A) -1 \right) \eta _{A_k}\eta _{A_j} \nonumber \\= & {} \sum _{k,j \in A} (y_k-m_k)(y_j-m_j) (w_{A_k}w_{A_j} \pi _{A_{kj}} -1) \eta _{A_k}\eta _{A_j}. \end{aligned}$$

(23)
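The coefficient $w_{A_k}w_{A_j} \pi _{A_{kj}} -1$ in (23) can be spelled out as follows. This worked step assumes, which the appendix does not state explicitly, that the weights are the Horvitz–Thompson weights $w_{A_k}=1/\pi _{A_k}$, with $E_{dA}(I_{ks_A})=\pi _{A_k}$, $E_{dA}(I_{ks_A}I_{js_A})=\pi _{A_{kj}}$ and $\pi _{A_{kk}}=\pi _{A_k}$:

$$\begin{aligned} E_{dA}\big [(w_{A_k}I_{ks_A}-1)(w_{A_j}I_{js_A}-1)\big ]&= w_{A_k}w_{A_j}E_{dA}(I_{ks_A}I_{js_A}) - w_{A_k}E_{dA}(I_{ks_A}) - w_{A_j}E_{dA}(I_{js_A}) + 1 \\&= w_{A_k}w_{A_j}\pi _{A_{kj}} - w_{A_k}\pi _{A_k} - w_{A_j}\pi _{A_j} + 1 \\&= w_{A_k}w_{A_j}\pi _{A_{kj}} - 1. \end{aligned}$$

Multiplying by $(y_k-m_k)(y_j-m_j)\eta _{A_k}\eta _{A_j}$ and summing over $k,j \in A$ gives the second line of (23).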

Using Theorem 3 of Breidt and Opsomer (2000) for the sampling design $d_A$, we obtain that an unbiased estimator of this variance is given by

$$\begin{aligned} \displaystyle \sum _{k,j \in s_A} \frac{\big (y_k - {\widehat{m}}_{k}^A \big )\big (y_j - {\widehat{m}}_{j}^A \big ) \left( w_{A_k}w_{A_j} \pi _{A_{kj}} -1\right) \eta _{A_k}\eta _{A_j} }{\pi _{A_{kj}}}. \end{aligned}$$

A similar expression can be derived for $E_{dB}(c_B^2)$ and then, the result follows.
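As an illustration of the frame A variance estimator above, the double sum can be evaluated directly once residuals and inclusion probabilities are available. This is a sketch under stated assumptions rather than the authors' code: it assumes weights $w_{A_k}=1/\pi _{A_k}$ and simple random sampling without replacement (so that $\pi _{A_k}=n/N$ and $\pi _{A_{kj}}=n(n-1)/(N(N-1))$ for $k \ne j$), and all names and numbers are hypothetical.

```python
import numpy as np

def variance_estimate_frame(resid, eta, pi, pi_joint):
    """Sum over sampled pairs (k, j) of
    r_k r_j (w_k w_j pi_kj - 1) eta_k eta_j / pi_kj, with w_k = 1 / pi_k
    (assumed Horvitz-Thompson weights) and r_k the residual y_k - m_hat_k."""
    w = 1.0 / pi
    z = resid * eta
    coef = (np.outer(w, w) * pi_joint - 1.0) / pi_joint
    return float(z @ coef @ z)

def srswor_probabilities(n, N):
    """First- and second-order inclusion probabilities for simple random
    sampling without replacement; the diagonal of the joint matrix holds
    the first-order probabilities."""
    pi = np.full(n, n / N)
    pi_joint = np.full((n, n), n * (n - 1) / (N * (N - 1)))
    np.fill_diagonal(pi_joint, n / N)
    return pi, pi_joint

# Toy sample of n = 4 units from a frame of N = 10 (hypothetical values).
resid = np.array([1.0, 2.0, 3.0, 12.0])   # y_k - m_hat_k for sampled units
eta = np.array([1.0, 1.0, 1.0, 0.5])      # 1 in domain a, eta in domain ab
pi, pi_joint = srswor_probabilities(n=4, N=10)
v_hat = variance_estimate_frame(resid, eta, pi, pi_joint)

# Under this design the double sum reduces to the textbook expression
# N^2 (1 - n/N) s^2 / n, with s^2 the sample variance of resid * eta.
classic = 10**2 * (1 - 4 / 10) * np.var(resid * eta, ddof=1) / 4
print(v_hat, classic)
```

The agreement with the textbook expression is only a sanity check for this particular design; other designs require their own joint inclusion probabilities.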

1.3 Proof of Theorems 3 and 4.

The proofs of Theorems 3 and 4 are similar to those of Theorems 1 and 2; the role of $\eta _{A_j}$ and $\eta _{B_j}$ in frames A and B is now played by the factors $\frac{1}{\mu _j}$ for each frame.

About this article

Cite this article

Sánchez-Borrego, I., Arcos, A. & Rueda, M. Kernel-based methods for combining information of several frame surveys. Metrika 82, 71–86 (2019). https://doi.org/10.1007/s00184-018-0686-8

Received: 14 February 2018
Published: 03 October 2018
Issue Date: 15 January 2019
DOI: https://doi.org/10.1007/s00184-018-0686-8
