Kernel-based methods for combining information of several frame surveys

Metrika

Abstract

A sample selected from a single sampling frame may not adequately represent the entire population. Multiple frame surveys are increasingly popular among statistical agencies and private organizations, in particular where several sampling frames provide better coverage or reduce sampling costs when estimating population quantities of interest. Auxiliary information available at the population level is often categorical in nature, so incorporating both categorical and continuous information can improve the efficiency of the estimation method. Nonparametric regression methods are a widely used and flexible estimation approach in the survey context. We propose a kernel regression estimator for dual frame surveys that can handle both continuous and categorical data. This methodology is extended to multiple frame surveys. We derive theoretical properties of the proposed methods, and numerical experiments indicate that the proposed estimators perform well in practical settings under different scenarios.


References

  • Aitchison J, Aitken CGG (1976) Multivariate binary discrimination by the kernel method. Biometrika 63:413–420

  • Arcos A, Molina D, Rueda M, Ranalli M (2015) Frames2: a package for estimation in dual frame surveys. R J 7(1):52–72

  • Bankier M (1986) Estimators based on several stratified samples with applications to multiple frame surveys. J Am Stat Assoc 81:1074–1079

  • Breidt FJ, Opsomer J (2000) Local polynomial regression estimators in survey sampling. Ann Stat 28:1026–1053

  • Breidt FJ, Opsomer J (2009) Nonparametric and semiparametric estimation in complex surveys. In: Rao C, Pfeffermann D (eds) Handbook of statistics, vol 29. Part B: sample surveys: theory, methods and inference. North Holland, Amsterdam, pp 103–119

  • Brick JM, Dipko S, Presser S, Tucker C, Yuan Y (2006) Nonresponse bias in a dual frame survey of cell and landline numbers. Public Opin Q 70:780–793

  • Epanechnikov VA (1969) Non-parametric estimation of a multivariate probability density. Theor Probab Appl 14(1):153–158

  • Fan J, Gijbels I (1996) Local polynomial modelling and its applications. Chapman & Hall, London

  • Fuller W, Burmeister L (1972) Estimators for samples selected from two overlapping frames. In: Proceedings of social science section of The American Statistical Association, pp 101–102

  • Hartley HO (1962) Multiple frame surveys. In: Proceedings of the social statistics section, American Statistical Association, pp 203–206

  • Hartley HO (1974) Multiple frame methodology and selected applications. Sankhya 36:99–118

  • Horvitz DG, Thompson DJ (1952) A generalization of sampling without replacement from a finite universe. J Am Stat Assoc 47:663–685

  • Iachan R, Dennis ML (1993) A multiple frame approach to sampling the homeless and transient population. J Off Stat 9:747–764

  • Kalton G, Anderson D (1986) Sampling rare populations. J R Stat Soc Ser A (General) 149(1):65–82

  • Kuo L (1988) Classical and prediction approaches to estimating distribution functions from survey data. In: ASA Proceedings of the section on survey research methods, American Statistical Association, vol 420, pp 280–285

  • Lohr SL (2011) Alternative survey sample designs: sampling with multiple overlapping frames. Surv Methodol 37(2):197–213

  • Lohr SL (2009) Multiple frame surveys. In: Pfeffermann D, Rao C (eds) Handbook of statistics, vol 29. Part A: sample surveys: design, methods and applications. North Holland, Amsterdam, pp 71–78

  • Lohr S, Rao J (2006) Estimation in multiple-frame surveys. J Am Stat Assoc 101(475):1019–1030

  • Mecatti F, Singh AC (2014) Estimation in multiple frame surveys: a simplified and unified review using the multiplicity approach. J-SFdS 155(5):51–69

  • Mecatti F (2007) A single frame multiplicity estimator for multiple frame surveys. Surv Methodol 33(2):151–157

  • Metcalf P, Scott A (2009) Using multiple frames in health surveys. Stat Med 28:1512–1523

  • Montanari GE, Ranalli MG (2005) Nonparametric model calibration estimation in survey sampling. J Am Stat Assoc 100(472):1429–1442

  • Nadaraya EA (1964) On estimating regression. Theor Probab Appl 9(1):141–142

  • Ranalli MG, Arcos A, Rueda M, Teodoro A (2016) Calibration estimation in dual-frame surveys. Stat Method Appl 25(3):321–349

  • Rao J, Wu C (2010) Pseudoempirical likelihood inference for multiple frame surveys. J Am Stat Assoc 105(492):1494–1503

  • Rueda M, Sánchez-Borrego I (2009) A predictive estimator of finite population mean using nonparametric regression. Comput Stat 24:1–14

  • Rueda M, Arcos A, Molina D, Ranalli M (2017) Estimation techniques for ordinal data in multiple frame surveys with complex sampling designs. Int Stat Rev 86:51–67. https://doi.org/10.1111/insr.12218 (in press)

  • Sánchez-Borrego I, Opsomer J, Rueda M, Arcos A (2014) Nonparametric estimation with mixed data types in survey sampling. Rev Mat Complut 27(2):685–700

  • Singh AC, Mecatti F (2011) Generalized multiplicity-adjusted Horvitz–Thompson estimation as a unified approach to multiple frame surveys. J Off Stat 27(4):633–650

  • Singh AC, Mecatti F (2014) Estimation in multiple frame surveys: a simplified and unified review using the multiplicity approach. SFdS 155:28–50

  • Skinner C (1991) On the efficiency of raking ratio estimation for multiple frame surveys. J Am Stat Assoc 86(415):779–784

  • Skinner C, Rao J (1996) Estimation in dual frame surveys with complex designs. J Am Stat Assoc 91(433):349–356

  • Watson GS (1964) Smooth regression analysis. Sankhya Ser A 26(4):359–372


Acknowledgements

This research was supported by Ministerio de Economía y Competitividad (Grant MTM2015-63609-R) and by Consejería de Economía, Innovación, Ciencia y Empleo (Grant SEJ2954, Junta de Andalucía, Spain).

Author information

Corresponding author

Correspondence to I. Sánchez-Borrego.

Appendix


1.1 Proof of Theorem 1.

We write

$$\begin{aligned} {\hat{y}}_{\mathrm{np}}^{\mathrm{d}}= & {} \sum _{j \in a}{\widehat{m}}_{j}^A + \sum _{j \in s_a} \left( y_j - {\widehat{m}}_{j}^A \right) w_{A_j} + \sum _{j \in b}{\widehat{m}}_{j}^B \nonumber \\&\quad + \sum _{j \in s_b} \left( y_j - {\widehat{m}}_{j}^B \right) w_{B_j} + \eta \left( \sum _{j \in ab}{\widehat{m}}_{j}^A+ \sum _{j \in s_{ab}} \left( y_j - {\widehat{m}}_{j}^A \right) w_{A_j}\right) \nonumber \\&\quad + (1-\eta )\left( \sum _{j \in ba}{\widehat{m}}_{j}^B + \sum _{j \in s_{ba}} \left( y_j - {\widehat{m}}_{j}^B\right) w_{B_j}\right) . \end{aligned}$$
(20)

Let \({\hat{y}}_{\mathrm{np}}^{\mathrm{a}}\) denote the sum \(\sum _{j \in a}{\widehat{m}}_{j}^A + \sum _{j \in s_a} (y_j - {\widehat{m}}_{j}^A ) w_{A_j}\). Similarly, let \({\hat{y}}_{\mathrm{np}}^{\mathrm{b}}\), \({\hat{y}}_{\mathrm{np}}^{\mathrm{ab}}\) and \({\hat{y}}_{\mathrm{np}}^{\mathrm{ba}}\) denote the corresponding terms in expansion (20).

We write

$$\begin{aligned} \left( {\hat{y}}_{\mathrm{np}}^{\mathrm{d}}-Y\right)= & {} \left( {\hat{y}}_\mathrm{np}^{\mathrm{a}}-Y_{a}\right) +\left( {\hat{y}}_{\mathrm{np}}^\mathrm{b}-Y_{b}\right) \nonumber \\&+ \eta \left( {\hat{y}}_{\mathrm{np}}^\mathrm{ab}-Y_{ab}\right) +(1-\eta ) \left( {\hat{y}}_{\mathrm{np}}^\mathrm{ba}-Y_{ba}\right) . \end{aligned}$$
(21)

Under assumptions (A1), (A2) and (A4)–(A7), with \(\lambda \) satisfying \(0 \le \lambda \le 1\), and taking expectations, Theorem 1 in Breidt and Opsomer (2000) holds for \(\left( {\hat{y}}_{\mathrm{np}}^{\mathrm{a}} - Y_{a}\right) \), and the same applies to the terms \(\left( {\hat{y}}_{\mathrm{np}}^{\mathrm{b}} - Y_{b}\right) \), \(\left( {\hat{y}}_{\mathrm{np}}^{\mathrm{ab}} - Y_{ab}\right) \) and \(\left( {\hat{y}}_{\mathrm{np}}^{\mathrm{ba}} - Y_{ba}\right) \). The result then follows from (21).
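As an illustration, expansion (20) can be evaluated numerically once fitted values are available. The following is a minimal sketch only: the helper names `domain_term` and `dual_frame_estimate` and the toy numbers are hypothetical and do not come from the paper, and the fitted values \({\widehat{m}}_{j}\) are taken as given rather than computed by kernel regression.

```python
def domain_term(m_hat_pop, y_s, m_hat_s, w_s):
    """One domain's contribution in (20): the sum of fitted values over the
    whole domain plus the weighted sum of sample residuals in that domain."""
    prediction = sum(m_hat_pop)
    correction = sum(w * (y - m) for y, m, w in zip(y_s, m_hat_s, w_s))
    return prediction + correction


def dual_frame_estimate(term_a, term_b, term_ab, term_ba, eta):
    """Combine the four domain terms as in (20); eta weights the overlap
    domain as seen from frame A, and 1 - eta as seen from frame B."""
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    return term_a + term_b + eta * term_ab + (1.0 - eta) * term_ba


# Toy data (hypothetical): fitted values for every unit in each domain,
# then observed y, fitted values and weights for the sampled units only.
term_a = domain_term([10, 12, 11, 13], [11, 14], [10, 13], [2, 2])
term_b = domain_term([20, 22], [21], [20], [2])
term_ab = domain_term([30, 31, 29], [32], [30], [3])  # frame A fit and sample
term_ba = domain_term([30, 30, 30], [29], [30], [3])  # frame B fit and sample

estimate = dual_frame_estimate(term_a, term_b, term_ab, term_ba, eta=0.5)
print(estimate)
```

Setting all fitted values to zero in this sketch leaves only the weighted sample sums, combined across the overlap with weights \(\eta \) and \(1-\eta \).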

1.2 Proof of Theorem 2.

We write

$$\begin{aligned} \left( {\hat{y}}_{\mathrm{np}}^{\mathrm{d}}-Y \right)= & {} \sum _{j \in a} \big (y_j - {\widehat{m}}_{j}^A \big )( w_{A_j}I_{js_A} -1) + \sum _{j \in ab} \big (y_j - {\widehat{m}}_{j}^A \big )(w_{A_j}I_{js_A} -1)\eta \nonumber \\&\quad + \sum _{j \in b} \big (y_j - {\widehat{m}}_{j}^B \big )( w_{B_j}I_{js_B} -1) + \sum _{j \in ba} \big (y_j - {\widehat{m}}_{j}^B \big )(w_{B_j}I_{js_B} -1)(1-\eta ), \end{aligned}$$

where \(I_{js_A}=1\) if \(j \in s_A\) and \(I_{js_A}=0 \) otherwise, and \(I_{js_B}=1\) if \(j \in s_B\) and \(I_{js_B}=0 \) otherwise.
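The identity above can be checked domain by domain. As a sketch for domain \(a\), assuming \(Y_a=\sum _{j \in a} y_j\) and \(s_a = s_A \cap a\) (notation inferred from (20) and (21), not restated in this appendix):

$$\begin{aligned} {\hat{y}}_{\mathrm{np}}^{\mathrm{a}}-Y_{a}= & {} \sum _{j \in a}{\widehat{m}}_{j}^A + \sum _{j \in a} \big (y_j - {\widehat{m}}_{j}^A \big ) w_{A_j}I_{js_A} - \sum _{j \in a} y_j \\= & {} \sum _{j \in a} \big (y_j - {\widehat{m}}_{j}^A \big )( w_{A_j}I_{js_A} -1). \end{aligned}$$

The other three domains follow in the same way, and substituting into (21) gives the four sums.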

The design variance is given by

$$\begin{aligned} V_d({\hat{y}}_{\mathrm{np}}^{\mathrm{d}})= & {} E_{dA} \left( \sum _{j \in a} \big (y_j - {\widehat{m}}_{j}^A \big )( w_{A_j}I_{js_A} -1) + \sum _{j \in ab} \big (y_j - {\widehat{m}}_{j}^A \big )(w_{A_j}I_{js_A} -1)\eta \right) ^2 \nonumber \\&\quad + E_{dB} \left( \sum _{j \in b} \big (y_j - {\widehat{m}}_{j}^B \big )( w_{B_j}I_{js_B} -1) + \sum _{j \in ba} \big (y_j - {\widehat{m}}_{j}^B \big )(w_{B_j}I_{js_B} -1)(1-\eta ) \right) ^2 \nonumber \\= & {} E_{dA} \left( \sum _{j \in A} \big (y_j - {\widehat{m}}_{j}^A \big )( w_{A_j}I_{js_A} -1)\eta _{A_j}\right) ^2 \nonumber \\&\quad +E_{dB} \left( \sum _{j \in B} \big (y_j - {\widehat{m}}_{j}^B \big )( w_{B_j}I_{js_B} -1)\eta _{B_j}\right) ^2 , \end{aligned}$$

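because the sampling designs \(d_A\) and \(d_B\) are independent. Here \(\eta _{A_j}=1\) if \(j \in a\) and \(\eta _{A_j}=\eta \) if \(j \in ab\); likewise \(\eta _{B_j}=1\) if \(j \in b\) and \(\eta _{B_j}=1-\eta \) if \(j \in ba\). Let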

$$\begin{aligned}&c_A= \sum _{j \in A} (y_j-m_j)( w_{A_j}I_{js_A}-1)\eta _{A_j},&c_B= \sum _{j \in B} (y_j-m_j)( w_{B_j}I_{js_B}-1)\eta _{B_j}, \nonumber \\&t_A= \sum _{j \in A} \big (m_j- {\widehat{m}}_{j}^A\big )( w_{A_j}I_{js_A}-1)\eta _{A_j}&\text {and } t_B= \sum _{j \in B} \big (m_j- {\widehat{m}}_{j}^B\big )( w_{B_j}I_{js_B}-1)\eta _{B_j}. \end{aligned}$$

Then

$$\begin{aligned}&E_{dA} \left( \sum _{j \in A} \big (y_j - {\widehat{m}}_{j}^A \big )( w_{A_j}I_{js_A} -1)\eta _{A_j}\right) ^2 = E_{dA} (c_A +t_A)^2 \nonumber \\&\quad = E_{dA}(c_A^2) +E_{dA}(t_A^2)+ 2E_{dA}(t_A c_A) = E_{dA}(c_A^2) +o(1), \end{aligned}$$
(22)

because, by Lemma 5 in Breidt and Opsomer (2000), \(E_{dA}(t_A^2)=o(1)\), so that, by the Cauchy–Schwarz inequality, \(|E_{dA}(t_A c_A)| \le (E_{dA}(t_A^2) E_{dA}(c_A^2))^{1/2} =o(1)\).

Similarly, \(E_{dB}(c_B+t_B)^2= E_{dB}(c_B^2) +o(1)\).

Thus the asymptotic variance of the estimator is given by

$$\begin{aligned} AV_d({\hat{y}}_{\mathrm{np}}^{\mathrm{d}})= E_{dA}(c_A^2)+ E_{dB}(c_B^2). \end{aligned}$$

By using the properties of the Horvitz–Thompson estimator (Horvitz and Thompson 1952) we can deduce

$$\begin{aligned} E_{dA}(c_A^2)= & {} \sum _{k,j \in A} (y_k-m_k)(y_j-m_j) \left( w_{A_k}w_{A_j} \sum _{s_A \ni k,j} p_d(s_A) -1 \right) \eta _{A_k}\eta _{A_j} \nonumber \\= & {} \sum _{k,j \in A} (y_k-m_k)(y_j-m_j) (w_{A_k}w_{A_j} \pi _{A_{kj}} -1) \eta _{A_k}\eta _{A_j}. \end{aligned}$$
(23)
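The step to the second line of (23) rests on a second-moment identity for the sample indicators. A sketch, assuming the weights are the inverse first-order inclusion probabilities, \(w_{A_j}=1/\pi _{A_j}\) (an assumption implied by, but not stated in, this appendix), and writing \(\pi _{A_{kj}}=E_{dA}(I_{ks_A}I_{js_A})\):

$$\begin{aligned} E_{dA}\big [( w_{A_k}I_{ks_A} -1)( w_{A_j}I_{js_A} -1)\big ]= & {} w_{A_k}w_{A_j}\pi _{A_{kj}} - w_{A_k}\pi _{A_k} - w_{A_j}\pi _{A_j} + 1 \\= & {} w_{A_k}w_{A_j}\pi _{A_{kj}} - 1. \end{aligned}$$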

Using Theorem 3 of Breidt and Opsomer (2000) for the sampling design \(d_A\), we obtain that an unbiased estimator of this variance is given by

$$\begin{aligned} \displaystyle \sum _{k,j \in s_A} \frac{\big (y_k - {\widehat{m}}_{k}^A \big )\big (y_j - {\widehat{m}}_{j}^A \big ) \left( w_{A_k}w_{A_j} \pi _{A_{kj}} -1\right) \eta _{A_k}\eta _{A_j} }{\pi _{A_{kj}}}. \end{aligned}$$

A similar expression can be derived for \(E_{dB}(c_B^2)\), and the result follows.
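The frame A variance estimator displayed above can be sketched in code. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, and simple random sampling without replacement is assumed purely so that the joint inclusion probabilities \(\pi _{A_{kj}}\) have a closed form.

```python
def srswor_joint_prob(n, N):
    """Return a function giving pi_kj under simple random sampling without
    replacement (an illustrative design, not one prescribed by the paper)."""
    if not 2 <= n <= N:
        raise ValueError("need 2 <= n <= N so that pi_kj > 0 for k != j")
    pi_single = n / N                      # pi_kk = pi_k
    pi_pair = n * (n - 1) / (N * (N - 1))  # pi_kj for k != j
    return lambda k, j: pi_single if k == j else pi_pair


def frame_variance_estimate(resid, weights, eta, joint_prob):
    """Double sum over sampled pairs (k, j) of
    resid_k * resid_j * (w_k * w_j * pi_kj - 1) * eta_k * eta_j / pi_kj."""
    total = 0.0
    for k in range(len(resid)):
        for j in range(len(resid)):
            p = joint_prob(k, j)
            total += (resid[k] * resid[j]
                      * (weights[k] * weights[j] * p - 1.0)
                      * eta[k] * eta[j] / p)
    return total


# Toy check (hypothetical numbers): N = 4, n = 2, residuals y - m_hat of 1 and 3,
# both sampled units outside the overlap domain, so eta_k = 1.
N, n = 4, 2
resid = [1.0, 3.0]
weights = [N / n] * n
eta = [1.0, 1.0]
v_A = frame_variance_estimate(resid, weights, eta, srswor_joint_prob(n, N))
print(v_A)  # equals N^2 (1 - n/N) s^2 / n = 8 up to floating-point rounding
```

With all \(\eta _{A_k}=1\), the double sum reduces to the familiar Horvitz–Thompson variance estimator for a total, which is why the toy value matches \(N^2(1-n/N)s^2/n\).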

1.3 Proof of Theorems 3 and 4.

The proofs of Theorems 3 and 4 are similar to those of Theorems 1 and 2; the role of \(\eta _{A_j}\) and \(\eta _{B_j}\) in frames A and B is now played by the factors \(\frac{1}{mu_j}\) for each frame.


Cite this article

Sánchez-Borrego, I., Arcos, A. & Rueda, M. Kernel-based methods for combining information of several frame surveys. Metrika 82, 71–86 (2019). https://doi.org/10.1007/s00184-018-0686-8
