Abstract
A sample selected from a single sampling frame may not adequately represent the entire population. Multiple frame surveys are increasingly used by statistical agencies and private organizations, particularly where several sampling frames provide better coverage or reduce sampling costs for estimating population quantities of interest. Auxiliary information available at the population level is often categorical in nature, so incorporating both categorical and continuous information can improve the efficiency of estimation. Nonparametric regression methods are a widely used and flexible estimation approach in the survey context. We propose a kernel regression estimator for dual frame surveys that can handle both continuous and categorical data. This methodology is extended to multiple frame surveys. We derive theoretical properties of the proposed methods, and numerical experiments indicate that the proposed estimators perform well in practical settings under different scenarios.
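As an illustration of the kind of smoother the abstract refers to, the following is a minimal sketch of a Nadaraya–Watson estimator with one continuous and one unordered categorical covariate. It assumes a product of the Epanechnikov (1969) kernel and the Aitchison–Aitken (1976) kernel, both of which appear in the reference list; the exact kernel, bandwidth choice and weighting used by the authors are not shown in this excerpt. The function names `epanechnikov`, `aitchison_aitken` and `nw_mixed`, and the optional `weights` argument (for example, design weights), are hypothetical.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)


def aitchison_aitken(z, z0, lam, n_levels):
    """Aitchison-Aitken kernel for an unordered categorical covariate.

    Gives weight 1 - lam to observations in the same category as z0 and
    lam / (n_levels - 1) to the others, for 0 <= lam <= (n_levels - 1) / n_levels.
    """
    z = np.asarray(z)
    return np.where(z == z0, 1.0 - lam, lam / (n_levels - 1))


def nw_mixed(x, z, y, x0, z0, h, lam, n_levels, weights=None):
    """Weighted Nadaraya-Watson estimate of E[y | x = x0, z = z0].

    `weights` is an assumed hook for unit-level weights; if omitted,
    all observations are weighted equally.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y) if weights is None else np.asarray(weights, dtype=float)
    # Product kernel: continuous part times categorical part.
    k = epanechnikov((x - x0) / h) * aitchison_aitken(z, z0, lam, n_levels)
    denom = np.sum(w * k)
    if denom == 0.0:
        raise ValueError("no observation receives positive kernel weight")
    return float(np.sum(w * k * y) / denom)
```

With `lam = 0` the categorical kernel reduces to an indicator, so the smoother only uses observations in the same category as `z0`; larger values of `lam` borrow strength across categories.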
References
Aitchison J, Aitken CGG (1976) Multivariate binary discrimination by the kernel method. Biometrika 63:413–420
Arcos A, Molina D, Rueda M, Ranalli M (2015) Frames2: a package for estimation in dual frame surveys. R J 7(1):52–72
Bankier M (1986) Estimators based on several stratified samples with applications to multiple frame surveys. J Am Stat Assoc 81:1074–1079
Breidt FJ, Opsomer J (2000) Local polynomial regression estimators in survey sampling. Ann Stat 28:1026–1053
Breidt FJ, Opsomer J (2009) Nonparametric and semiparametric estimation in complex surveys. In: Rao C, Pfeffermann D (eds) Handbook of statistics, vol 29. Part B: sample surveys: theory, methods and inference. North Holland, Amsterdam, pp 103–119
Brick JM, Dipko S, Presser S, Tucker C, Yuan Y (2006) Nonresponse bias in a dual frame survey of cell and landline numbers. Public Opin Q 70:780–793
Epanechnikov VA (1969) Non-parametric estimation of a multivariate probability density. Theor Probab Appl 14(1):153–158
Fan J, Gijbels I (1996) Local polynomial modelling and its applications. Chapman & Hall, London
Fuller W, Burmeister L (1972) Estimators for samples selected from two overlapping frames. In: Proceedings of social science section of The American Statistical Association, pp 101–102
Hartley HO (1962) Multiple frame surveys. In: Proceedings of the social statistics section, American Statistical Association, pp 203–206
Hartley HO (1974) Multiple frame methodology and selected applications. Sankhya 36:99–118
Horvitz DG, Thompson DJ (1952) A generalization of sampling without replacement from a finite universe. J Am Stat Assoc 47:663–685
Iachan R, Dennis ML (1993) A multiple frame approach to sampling the homeless and transient population. J Off Stat 9:747–764
Kalton G, Anderson D (1986) Sampling rare populations. J R Stat Soc Ser A (General) 149(1):65–82
Kuo L (1988) Classical and prediction approaches to estimating distribution functions from survey data. In: ASA Proceedings of the section on survey research methods, American Statistical Association, vol 420, pp 280–285
Lohr SL (2011) Alternative survey sample designs: sampling with multiple overlapping frames. Surv Methodol 37(2):197–213
Lohr SL (2009) Multiple frame surveys. In: Pfeffermann D, Rao C (eds) Handbook of statistics, vol 29. Part A: sample surveys: design, methods and applications. North Holland, Amsterdam, pp 71–78
Lohr S, Rao J (2006) Estimation in multiple-frame surveys. J Am Stat Assoc 101(475):1019–1030
Mecatti F, Singh AC (2014) Estimation in multiple frame surveys: a simplified and unified review using the multiplicity approach. J-SFdS 155(5):51–69
Mecatti F (2007) A single frame multiplicity estimator for multiple frame surveys. Surv Methodol 33(2):151–157
Metcalf P, Scott A (2009) Using multiple frames in health surveys. Stat Med 28:1512–1523
Montanari GE, Ranalli MG (2005) Nonparametric model calibration estimation in survey sampling. J Am Stat Assoc 100(472):1429–1442
Nadaraya EA (1964) On estimating regression. Theor Probab Appl 9(1):141–142
Ranalli MG, Arcos A, Rueda M, Teodoro A (2016) Calibration estimation in dual-frame surveys. Stat Method Appl 25(3):321–349
Rao J, Wu C (2010) Pseudoempirical likelihood inference for multiple frame surveys. J Am Stat Assoc 105(492):1494–1503
Rueda M, Sánchez-Borrego I (2009) A predictive estimator of finite population mean using nonparametric regression. Comput Stat 24:1–14
Rueda M, Arcos A, Molina D, Ranalli M (2017) Estimation techniques for ordinal data in multiple frame surveys with complex sampling designs. Int Stat Rev 86:51–67. https://doi.org/10.1111/insr.12218 (in press)
Sánchez-Borrego I, Opsomer J, Rueda M, Arcos A (2014) Nonparametric estimation with mixed data types in survey sampling. Rev Mat Complut 27(2):685–700
Singh AC, Mecatti F (2011) Generalized multiplicity-adjusted Horvitz–Thompson estimation as a unified approach to multiple frame surveys. J Off Stat 27(4):633–650
Singh AC, Mecatti F (2014) Estimation in multiple frame surveys: a simplified and unified review using the multiplicity approach. SFdS 155:28–50
Skinner C (1991) On the efficiency of raking ratio estimation for multiple frame surveys. J Am Stat Assoc 86(415):779–784
Skinner C, Rao J (1996) Estimation in dual frame surveys with complex designs. J Am Stat Assoc 91(433):349–356
Watson GS (1964) Smooth regression analysis. Sankhya Ser A 26(4):359–372
Acknowledgements
This research was supported by Ministerio de Economía y Competitividad (Grant MTM2015-63609-R) and by Consejería de Economía, Innovación, Ciencia y Empleo (Grant SEJ2954, Junta de Andalucía, Spain).
Appendix
1.1 Proof of Theorem 1.
We write
Let \({\hat{y}}_{\mathrm{np}}^{a}\) denote the terms \(\sum _{j \in a}{\widehat{m}}_{j}^A + \sum _{j \in s_a} (y_j - {\widehat{m}}_{j}^A ) w_{A_j}\). Similarly, \({\hat{y}}_{\mathrm{np}}^{b}\), \({\hat{y}}_{\mathrm{np}}^{ab}\) and \({\hat{y}}_{\mathrm{np}}^{ba}\) denote the corresponding terms in expansion (20).
We write
Under (A1), (A2) and (A4)–(A7), for \(\lambda \) satisfying \(0 \le \lambda \le 1\), and taking expectations, Theorem 1 in Breidt and Opsomer (2000) holds for \(\left( {\hat{y}}_{\mathrm{np}}^{a} - Y_{a}\right) \), and the same applies to the terms \(\left( {\hat{y}}_{\mathrm{np}}^{b} - Y_{b}\right) \), \(\left( {\hat{y}}_{\mathrm{np}}^{ab} - Y_{ab}\right) \) and \(\left( {\hat{y}}_{\mathrm{np}}^{ba} - Y_{ba}\right) \). The result then follows.
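The proof above treats the four domain terms (a, b, ab and ba) separately and combines the two overlap terms through \(\lambda \). A minimal sketch of that combination is given below for the plain design-weighted case, that is, a Hartley-type dual frame estimator built from Horvitz–Thompson-type domain totals (Hartley 1962; Horvitz and Thompson 1952). It is not the nonparametric estimator of the paper: the model-based terms \({\widehat{m}}_{j}\) are left out, and the function names `ht_total` and `hartley_dual_frame` and their arguments are hypothetical.

```python
import numpy as np


def ht_total(y, w):
    """Design-weighted total: sum of y_j * w_j, with w_j = 1 / pi_j."""
    return float(np.sum(np.asarray(y, dtype=float) * np.asarray(w, dtype=float)))


def hartley_dual_frame(y_A, w_A, in_overlap_A, y_B, w_B, in_overlap_B, lam):
    """Hartley-type estimate Y_a + lam * Y_ab + (1 - lam) * Y_ba + Y_b.

    y_A, w_A      : values and design weights for the sample from frame A
    in_overlap_A  : boolean flags, True if the frame-A unit also belongs to frame B
    y_B, w_B, in_overlap_B : the same quantities for the sample from frame B
    lam           : weight in [0, 1] given to the frame-A estimate of the overlap
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    y_A, w_A = np.asarray(y_A, dtype=float), np.asarray(w_A, dtype=float)
    y_B, w_B = np.asarray(y_B, dtype=float), np.asarray(w_B, dtype=float)
    ov_A = np.asarray(in_overlap_A, dtype=bool)
    ov_B = np.asarray(in_overlap_B, dtype=bool)

    Y_a = ht_total(y_A[~ov_A], w_A[~ov_A])   # units only in frame A
    Y_ab = ht_total(y_A[ov_A], w_A[ov_A])    # overlap, estimated from sample A
    Y_b = ht_total(y_B[~ov_B], w_B[~ov_B])   # units only in frame B
    Y_ba = ht_total(y_B[ov_B], w_B[ov_B])    # overlap, estimated from sample B
    return Y_a + lam * Y_ab + (1.0 - lam) * Y_ba + Y_b
```

Setting `lam = 1` uses only sample A for the overlap domain and `lam = 0` uses only sample B; intermediate values average the two overlap estimates.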
1.2 Proof of Theorem 2.
We write
where \(I_{js_A}=1\) if \(j \in s_A\) and \(I_{js_A}=0 \) otherwise, and \(I_{js_B}=1\) if \(j \in s_B\) and \(I_{js_B}=0 \) otherwise.
The design variance is given by
because the sampling designs \(d_A\) and \(d_B\) are independent. Let
Then
because of Lemma 5 in Breidt and Opsomer (2000), \(E_{dA}(t_A^2)=o(1)\), so that \(E_{dA}(t_A c_A) \le (E_{dA}(t_A^2) E_{dA}(c_A^2))^{1/2} =o(1)\).
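The bound in the last step is the Cauchy–Schwarz inequality. As a sketch, and under the added assumption (not stated explicitly in this excerpt) that \(E_{dA}(c_A^2)=O(1)\), it can be spelled out as

\[
\left| E_{dA}(t_A c_A) \right| \le \left( E_{dA}(t_A^2)\, E_{dA}(c_A^2) \right)^{1/2} = \left( o(1) \cdot O(1) \right)^{1/2} = o(1).
\]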
Similarly \(E_{dB}(t_B c_B)= E_{dB}(c_B^2) +o(1)\).
Thus, the asymptotic variance of the estimator is given by
By using the properties of the Horvitz–Thompson estimator (Horvitz and Thompson 1952) we can deduce
Using Theorem 3 of Breidt and Opsomer (2000) for the sampling design \(d_A\), we obtain that an unbiased estimator of this variance is given by
A similar expression can be derived for \(E_{dB}(c_B^2)\), and the result follows.
1.3 Proof of Theorems 3 and 4.
The proofs of Theorems 3 and 4 are similar to those of Theorems 1 and 2; the role of \(\eta _{A_j}\) and \(\eta _{B_j}\) in frames A and B is now played by the factors \(\frac{1}{mu_j}\) for each frame.
Cite this article
Sánchez-Borrego, I., Arcos, A. & Rueda, M. Kernel-based methods for combining information of several frame surveys. Metrika 82, 71–86 (2019). https://doi.org/10.1007/s00184-018-0686-8
DOI: https://doi.org/10.1007/s00184-018-0686-8