Abstract
We consider the problem of finite population mean estimation with mixed data types. A model-assisted estimator based on nonparametric regression is proposed, which can handle discrete and continuous data and incorporates the sampling design in a natural manner. The proposed method shares the design-based properties of the kernel-based model-assisted estimator in the presence of continuous covariates and performs well under different scenarios in simulation experiments.
References
Aitchison, J., Aitken, C.G.G.: Multivariate binary discrimination by the kernel method. Biometrika 63, 413–420 (1976)
Breidt, F.J., Claeskens, G., Opsomer, J.D.: Model-assisted estimation for complex surveys using penalised splines. Biometrika 92(4), 831–846 (2005)
Breidt, F.J., Opsomer, J.D.: Local polynomial regression estimators in survey sampling. Ann. Stat. 28, 1026–1053 (2000)
Cassel, C.M., Särndal, C.E., Wretman, J.H.: Foundations of Inference in Survey Sampling. Wiley, New York (1977)
Chen, J., Qin, J.: Empirical likelihood estimation for finite populations and the effective usage of auxiliary information. Biometrika 80, 107–116 (1993)
Deville, J.C., Särndal, C.E.: Calibration estimators in survey sampling. J. Am. Stat. Assoc. 87, 376–382 (1992)
Fan, J.: Local linear regression smoothers and their minimax efficiencies. Ann. Stat. 21, 196–216 (1993)
Fuller, W.A.: Regression estimation for survey samples. Surv. Methodol. 28, 5–23 (2002)
Hall, P.: On nonparametric multivariate binary discrimination. Biometrika 68, 287–294 (1981)
Hall, P., Wand, M.P.: On nonparametric discrimination using density differences. Biometrika 75, 541–547 (1988)
Hedayat, A., Sinha, B.: Design and Inference in Finite Population Sampling. Wiley, New York (1991)
Horvitz, D.G., Thompson, D.J.: A generalization of sampling without replacement from a finite universe. J. Am. Stat. Assoc. 47, 663–685 (1952)
Kuo, L.: Classical and prediction approaches to estimating distribution functions from survey data. In: ASA Proceedings of the Section on Survey Research Methods, pp. 280–285. American Statistical Association, Alexandria, VA (1988)
Li, Q., Ouyang, D., Racine, J.S.: Nonparametric regression with weakly dependent data: the discrete and continuous regressor case. J. Nonparametr. Stat. 21(6), 697–711 (2009)
Li, Q., Racine, J.S.: Nonparametric Econometrics: Theory and Practice. Princeton University Press, New Jersey (2007)
Li, Q., Racine, J.S.: Nonparametric estimation of conditional cdf and quantile functions with mixed categorical and continuous data. J. Bus. Econ. Stat. 26(4), 423–434 (2008)
Li, Q., Racine, J.S.: Smooth varying-coefficient estimation and inference for qualitative and quantitative data. Econ. Theory 26, 1607–1637 (2010)
Little, R.J.A.: To model or not to model? Competing modes of inference for finite population sampling. J. Am. Stat. Assoc. 99(466), 546–556 (2004)
Nadaraya, E.A.: On estimating regression. Theory Probab. Appl. 9, 141–142 (1964)
Opsomer, J.D., Miller, C.P.: Selecting the amount of smoothing in nonparametric regression estimation for complex surveys. J. Nonparametr. Stat. 17, 593–611 (2005)
Ouyang, D., Li, Q., Racine, J.S.: Nonparametric estimation of regression functions with discrete regressors. Econ. Theory 25(1), 1–42 (2009)
Racine, J., Li, Q.: Nonparametric estimation of regression functions with both categorical and continuous data. J. Econ. 119, 99–130 (2004)
Robinson, P.M., Särndal, C.E.: Asymptotic properties of the generalized regression estimator in probability sampling. Sankhyā Ser. B 45, 240–248 (1983)
Royall, R.M.: The prediction approach to sampling theory. In: Krishnaiah, P.R., Rao, C.R. (eds.) Handbook of Statistics, vol. 6, pp. 399–413. Elsevier, Amsterdam (1988)
Rueda, M., Sánchez-Borrego, I.: A predictive estimator of finite population mean using nonparametric regression. Comput. Stat. 24, 1–14 (2009)
Särndal, C.E., Swensson, B., Wretman, J.: The weighted residual technique for estimating the variance of the general regression estimator of the finite population total. Biometrika 76, 527–537 (1989)
Simonoff, J.S.: Smoothing Methods in Statistics. Springer, New York (1996)
Titterington, D.M.: A comparative study of kernel-based density estimates for categorical data. Technometrics 22, 259–268 (1980)
Valliant, R., Dorfman, A.H.: Finite Population Sampling and Inference: A Prediction Approach. Wiley, New York (1999)
Acknowledgments
This work was partially supported by the Ministerio de Educación y Ciencia (contracts No. MTM2009-10055 and MTM2012-35650).
Appendix
Proof of Theorem 1
We write
Then
Under assumptions 1–6, there exists \(n_{0}\), such that \(n \ge n_{0}\) implies lemma 2(ii) in [3],
which holds by using the fact that \(\lambda \le l_{\lambda }(x^{d}_{i}-x^{d}_{j})\le 1\).
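For intuition on this bound, one discrete kernel that satisfies it is the form used for unordered categorical covariates by Racine and Li (2004). Whether it matches the exact definition of \(l_{\lambda }\) in the main text is an assumption made here for illustration only:

\[
l_{\lambda }(x^{d}_{i}-x^{d}_{j}) =
\begin{cases}
1, & x^{d}_{i} = x^{d}_{j},\\
\lambda , & x^{d}_{i} \ne x^{d}_{j},
\end{cases}
\qquad 0 \le \lambda \le 1 .
\]

Under this form, \(\lambda \le l_{\lambda }(x^{d}_{i}-x^{d}_{j})\le 1\) holds by construction. Each nonnegative continuous-kernel weight is then multiplied by a factor in \([\lambda , 1]\), so any sum of such weights lies between \(\lambda \) times and one times the corresponding sum computed from the continuous covariates alone.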
We write
with \(p_{1}=0,1\) and using \(\lambda \le l_{\lambda }(x^{d}_{i}-x^{d}_{j})\le 1\). As \(M < 1\), the same bound is valid for \(t_{jg}/N\).
Therefore the \(N^{-1}t_{jg}\) are uniformly bounded in \(j\) and the \(N^{-1}\widehat{t}_{jg}\) are uniformly bounded in \(j\) and \(s\). The \(m_{j}\) are continuous functions of the \(t_{jg}\) with denominators uniformly bounded away from 0 by (31). Analogously, the \(\widehat{m}_{j}\) are continuous functions of the uniformly bounded \(\widehat{t}_{jg}\), well defined by the \(\delta /N^{2}\) adjustment in (12). Hence, the \(m_{j}\) are uniformly bounded in \(j\) and the \(\widehat{m}_{j}\) are uniformly bounded in \(j\) and \(s\).
Under assumptions 1–6, we also have
and the first term in (30) converges to 0 as \(N \rightarrow \infty \), following the argument of Theorem 1 in [23]. Under assumption 5,
Using \(\lambda \le l_{\lambda }(x^{d}_{i}-x^{d}_{j})\le 1\), lemmas 3, 4 and 5 in [3] continue to hold. Then, using lemma 4,
and by (34) the second term in (30) converges to 0 as \(N \rightarrow \infty \). By the Markov inequality the theorem follows. \(\square \)
Proof of Theorem 2
Let
so that
It is immediate that \(E_{d}(b_{N}^{2})=o(1)\) by lemma 5 in [3], which continues to hold because \(\lambda \le l_{\lambda }(x^{d}_{i}-x^{d}_{j})\le 1\). Moreover, \(E_{d}(a_{N}^{2})\) is bounded because \(Var_{d}(a_{N})=O(1)\), as \(Var_{d}(n^{-1/2}a_{N})=O(1/n)\) for the Horvitz–Thompson mean estimator. Then,
Hence,
and the theorem follows. \(\square \)
Proof of Theorems 3 and 4
Under conditions 1–6, and using again that \(\lambda \le l_{\lambda }(x^{d}_{i}-x^{d}_{j})\le 1\), the asymptotic normality of the generalized difference estimator follows, and a consistent estimator of the asymptotic mean squared error can be derived, analogously to Theorems 3 and 4 in [3]. \(\square \)
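To make the objects in these proofs concrete, the following is a minimal numerical sketch of a model-assisted mean estimator in generalized difference form with one continuous and one discrete covariate. It is not the authors' implementation. Several choices are assumptions made for illustration: a local constant (Nadaraya–Watson) fit rather than a local polynomial one, a Gaussian kernel for the continuous covariate, a discrete kernel equal to 1 for matching categories and \(\lambda \) otherwise, and the hypothetical names `mixed_kernel_weights`, `nw_fit` and `model_assisted_mean`.

```python
import numpy as np


def mixed_kernel_weights(xc, xd, xc0, xd0, h, lam):
    """Product kernel weights at the point (xc0, xd0).

    Assumed forms: Gaussian kernel in the continuous covariate, and a
    discrete kernel equal to 1 on a category match and lam otherwise,
    so the discrete factor always lies in [lam, 1].
    """
    cont = np.exp(-0.5 * ((xc - xc0) / h) ** 2)
    disc = np.where(xd == xd0, 1.0, lam)
    return cont * disc


def nw_fit(xc_s, xd_s, y_s, pi_s, xc_eval, xd_eval, h, lam):
    """Design-weighted local constant fit at each evaluation point.

    Sample units are weighted by 1 / pi_s (inverse inclusion probability).
    With lam > 0 and moderate h the denominator is strictly positive.
    """
    fitted = np.empty(len(xc_eval))
    for j in range(len(xc_eval)):
        w = mixed_kernel_weights(xc_s, xd_s, xc_eval[j], xd_eval[j], h, lam) / pi_s
        fitted[j] = np.sum(w * y_s) / np.sum(w)
    return fitted


def model_assisted_mean(xc_U, xd_U, sample_idx, y_s, pi_s, h, lam):
    """Mean of the fitted values over the population plus the
    Horvitz-Thompson mean of the sample residuals."""
    N = len(xc_U)
    m_hat = nw_fit(xc_U[sample_idx], xd_U[sample_idx], y_s, pi_s,
                   xc_U, xd_U, h, lam)
    residual_term = np.sum((y_s - m_hat[sample_idx]) / pi_s)
    return (np.sum(m_hat) + residual_term) / N


# Illustrative simulated population (all values here are made up).
rng = np.random.default_rng(0)
N, n = 500, 100
xc_U = rng.uniform(0.0, 1.0, N)          # continuous covariate
xd_U = rng.integers(0, 2, N)             # binary covariate
y_U = 1.0 + 2.0 * xc_U + xd_U + rng.normal(0.0, 0.1, N)

# Simple random sampling without replacement: pi_i = n / N for every unit.
sample_idx = rng.choice(N, size=n, replace=False)
pi_s = np.full(n, n / N)

estimate = model_assisted_mean(xc_U, xd_U, sample_idx, y_U[sample_idx],
                               pi_s, h=0.1, lam=0.2)
print(estimate, y_U.mean())
```

The covariates must be known for every population unit, while the response is used only on the sample. Setting `lam=1.0` ignores the discrete covariate entirely, and values close to 0 fit each category almost separately, which is the smoothing trade-off that the bound on \(l_{\lambda }\) controls in the proofs.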
Cite this article
Sánchez-Borrego, I., Opsomer, J.D., Rueda, M. et al. Nonparametric estimation with mixed data types in survey sampling. Rev Mat Complut 27, 685–700 (2014). https://doi.org/10.1007/s13163-013-0142-2
DOI: https://doi.org/10.1007/s13163-013-0142-2