Nonparametric estimation with mixed data types in survey sampling

Revista Matemática Complutense

Abstract

We consider the problem of estimating a finite population mean with mixed data types. We propose a model-assisted estimator based on nonparametric regression that can handle both discrete and continuous data and incorporates the sampling design in a natural manner. The proposed method shares the design-based properties of the kernel-based model-assisted estimator in the presence of continuous covariates, and it performs well under a range of scenarios in simulation experiments.
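The following is a minimal Python sketch of a model-assisted estimator of this kind. It is not the authors' implementation. It assumes a Gaussian continuous kernel \(K\), the Aitchison–Aitken discrete kernel \(l_{\lambda }\) (equal to 1 when categories match and \(\lambda \) otherwise, consistent with the bound \(\lambda \le l_{\lambda }\le 1\) used in the Appendix), a design-weighted local constant fit for \(\widehat{m}_{j}\), and the generalized difference form \(\overline{y}_\mathrm{np}=N^{-1}\{\sum _{U}\widehat{m}_{j}+\sum _{s}(y_{j}-\widehat{m}_{j})/\pi _{j}\}\). The \(\delta /N^{2}\) denominator adjustment of the paper is omitted because a Gaussian kernel never yields a zero denominator. All function names are hypothetical.

```python
import math

def gaussian_kernel(u):
    """Continuous kernel K (standard normal density). It is never zero,
    so this sketch needs no denominator adjustment."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def aitchison_aitken(d_i, d_j, lam):
    """Discrete kernel l_lambda: 1 if the categories match, lam otherwise."""
    return 1.0 if d_i == d_j else lam

def fitted_value(xc, xd, sample, h, lam):
    """Design-weighted local constant fit at the point (xc, xd).

    sample: list of (xc_i, xd_i, y_i, pi_i) tuples for the units in s.
    """
    num = den = 0.0
    for xc_i, xd_i, y_i, pi_i in sample:
        # Product of continuous and discrete kernels, weighted by 1/pi_i.
        w = gaussian_kernel((xc_i - xc) / h) * aitchison_aitken(xd_i, xd, lam) / pi_i
        num += w * y_i
        den += w
    return num / den

def model_assisted_mean(population_x, sample, h, lam):
    """Generalized difference estimator of the finite population mean.

    population_x: list of (xc_j, xd_j) for every unit j in U (auxiliaries known).
    sample: list of (xc_i, xd_i, y_i, pi_i) for the sampled units.
    """
    N = len(population_x)
    # Sum of fitted values over the whole population.
    fit_total = sum(fitted_value(xc, xd, sample, h, lam) for xc, xd in population_x)
    # Horvitz-Thompson-weighted sum of sample residuals.
    residual_total = sum((y - fitted_value(xc, xd, sample, h, lam)) / pi
                         for xc, xd, y, pi in sample)
    return (fit_total + residual_total) / N
```

As a sanity check, when the "sample" is the whole population with every \(\pi _{j}=1\), the fitted values cancel and the function returns the population mean of \(y\).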

References

  1. Aitchison, J., Aitken, C.G.G.: Multivariate binary discrimination by the kernel method. Biometrika 63, 413–420 (1976)

  2. Breidt, F.J., Claeskens, G., Opsomer, J.D.: Model-assisted estimation for complex surveys using penalised splines. Biometrika 92(4), 831–846 (2005)

  3. Breidt, F.J., Opsomer, J.D.: Local polynomial regression estimators in survey sampling. Ann. Stat. 28, 1026–1053 (2000)

  4. Cassel, C.M., Särndal, C.E., Wretman, J.H.: Foundations of Inference in Survey Sampling. Wiley, New York (1977)

  5. Chen, J., Qin, J.: Empirical likelihood estimation for finite populations and the effective usage of auxiliary information. Biometrika 80, 107–116 (1993)

  6. Deville, J.C., Särndal, C.E.: Calibration estimators in survey sampling. J. Am. Stat. Assoc. 87, 376–382 (1992)

  7. Fan, J.: Local linear regression smoothers and their minimax efficiencies. Ann. Stat. 21, 196–216 (1993)

  8. Fuller, W.A.: Regression estimation for survey samples. Surv. Methodol. 28, 5–23 (2002)

  9. Hall, P.: On nonparametric multivariate binary discrimination. Biometrika 68, 287–294 (1981)

  10. Hall, P., Wand, M.P.: On nonparametric discrimination using density differences. Biometrika 75, 541–547 (1988)

  11. Hedayat, A., Sinha, B.: Design and Inference in Finite Population Sampling. Wiley, New York (1991)

  12. Horvitz, D.G., Thompson, D.J.: A generalization of sampling without replacement from a finite universe. J. Am. Stat. Assoc. 47, 663–685 (1952)

  13. Kuo, L.: Classical and prediction approaches to estimating distribution functions from survey data. In: ASA Proceedings of the Section on Survey Research Methods, pp. 280–285. American Statistical Association, Alexandria, VA (1988)

  14. Li, Q., Ouyang, D., Racine, J.S.: Nonparametric regression with weakly dependent data: the discrete and continuous regressor case. J. Nonparametr. Stat. 21(6), 697–711 (2009)

  15. Li, Q., Racine, J.S.: Nonparametric Econometrics: Theory and Practice. Princeton University Press, New Jersey (2007)

  16. Li, Q., Racine, J.S.: Nonparametric estimation of conditional cdf and quantile functions with mixed categorical and continuous data. J. Bus. Econ. Stat. 26(4), 423–434 (2008)

  17. Li, Q., Racine, J.S.: Smooth varying-coefficient estimation and inference for qualitative and quantitative data. Econ. Theory 26, 1607–1637 (2010)

  18. Little, R.J.A.: To model or not to model? Competing modes of inference for finite population sampling. J. Am. Stat. Assoc. 99(466), 546–556 (2004)

  19. Nadaraya, E.A.: On estimating regression. Theory Probab. Appl. 9, 141–142 (1964)

  20. Opsomer, J.D., Miller, C.P.: Selecting the amount of smoothing in nonparametric regression estimation for complex surveys. J. Nonparametr. Stat. 17, 593–611 (2005)

  21. Ouyang, D., Li, Q., Racine, J.S.: Nonparametric estimation of regression functions with discrete regressors. Econ. Theory 25(1), 1–42 (2009)

  22. Racine, J., Li, Q.: Nonparametric estimation of regression functions with both categorical and continuous data. J. Econ. 119, 99–130 (2004)

  23. Robinson, P.M., Särndal, C.E.: Asymptotic properties of the generalized regression estimator in probability sampling. Sankhyā Ser. B 45, 240–248 (1983)

  24. Royall, R.M.: The prediction approach to sampling theory. In: Krishnaiah, P.R., Rao, C.R. (eds.) Handbook of Statistics, vol. 6, pp. 399–413. Elsevier, Amsterdam (1988)

  25. Rueda, M., Sánchez-Borrego, I.: A predictive estimator of finite population mean using nonparametric regression. Comput. Stat. 24, 1–14 (2009)

  26. Särndal, C.E., Swensson, B., Wretman, J.: The weighted residual technique for estimating the variance of the general regression estimator of the finite population total. Biometrika 76, 527–537 (1989)

  27. Simonoff, J.S.: Smoothing Methods in Statistics. Springer, New York (1996)

  28. Titterington, D.M.: A comparative study of kernel-based density estimates for categorical data. Technometrics 22, 259–268 (1980)

  29. Valliant, R., Dorfman, A.H.: Finite Population Sampling and Inference: A Prediction Approach. Wiley, New York (1999)

Acknowledgments

This work is partially supported by the Ministerio de Educación y Ciencia (contracts MTM2009-10055 and MTM2012-35650).

Author information

Correspondence to M. Rueda.

Appendix

Proof of Theorem 1

We write

$$\begin{aligned} \overline{y}_\mathrm{np}-\overline{Y}= \frac{1}{N} \sum _{j \in U} (y_{j}-m_{j}) \left( \frac{I_{j}}{\pi _{j}}-1\right) + \frac{1}{N}\sum _{j \in U} (\widehat{m}_{j}-m_{j}) \left( 1-\frac{I_{j}}{\pi _{j}}\right) . \end{aligned}$$
(29)
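Identity (29) is purely algebraic: it holds for any values of \(y_{j}\), \(m_{j}\), \(\widehat{m}_{j}\) and any sample, and it is equivalent to writing the estimator as \(\overline{y}_\mathrm{np}=N^{-1}\{\sum _{U}\widehat{m}_{j}+\sum _{U}I_{j}(y_{j}-\widehat{m}_{j})/\pi _{j}\}\). The Python sketch below checks it numerically. Poisson sampling and the random stand-ins for \(m_{j}\) and \(\widehat{m}_{j}\) are assumptions made only for this check.

```python
import random

def decomposition_sides(N=200, seed=1):
    """Return (left side, right side) of identity (29) for one Poisson sample."""
    rng = random.Random(seed)
    pi = [rng.uniform(0.1, 0.9) for _ in range(N)]   # inclusion probabilities
    y = [rng.gauss(0.0, 1.0) for _ in range(N)]      # study variable
    m = [rng.gauss(0.0, 1.0) for _ in range(N)]      # stand-in for m_j
    m_hat = [mj + rng.gauss(0.0, 0.3) for mj in m]   # stand-in for hat m_j
    I = [1 if rng.random() < p else 0 for p in pi]   # sample membership I_j

    # Generalized difference estimator: fitted values over U plus HT-weighted residuals.
    y_np = (sum(m_hat) + sum(Ij * (yj - mh) / pj
                             for Ij, yj, mh, pj in zip(I, y, m_hat, pi))) / N
    lhs = y_np - sum(y) / N
    term1 = sum((yj - mj) * (Ij / pj - 1) for yj, mj, Ij, pj in zip(y, m, I, pi)) / N
    term2 = sum((mh - mj) * (1 - Ij / pj) for mh, mj, Ij, pj in zip(m_hat, m, I, pi)) / N
    return lhs, term1 + term2

lhs, rhs = decomposition_sides()
print(lhs, rhs)  # equal up to floating-point rounding
```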

Then

$$\begin{aligned}&E_{d}\left| \overline{y}_\mathrm{np}-\overline{Y} \right| \le E_{d}\left| \sum _{j \in U}\frac{y_{j}-m_{j}}{N} \left( \frac{I_{j}}{\pi _{j}}-1\right) \right| \nonumber \\&+ \left\{ E_{d}\left[ \sum _{j \in U} \frac{(\widehat{m}_{j}-m_{j})^{2}}{N}\right] E_{d}\left[ \sum _{j \in U} \frac{(1-\pi _{j}^{-1}I_{j})^{2}}{N}\right] \right\} ^{1/2}. \end{aligned}$$
(30)
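Bound (30) applies the Cauchy–Schwarz inequality twice: first to the sum over \(U\) for a fixed sample, then to the design expectation. The Python sketch below illustrates the second term under Poisson sampling with a hypothetical, sample-dependent \(\widehat{m}_{j}\) (random noise around \(m_{j}\)). Because Cauchy–Schwarz also holds for averages over simulated samples, the Monte Carlo left side cannot exceed the Monte Carlo right side.

```python
import math
import random

def mean(values):
    return sum(values) / len(values)

def bound_30_check(reps=2000, N=100, seed=7):
    """Monte Carlo check of the second-term bound in (30) under Poisson sampling."""
    rng = random.Random(seed)
    pi = [rng.uniform(0.2, 0.8) for _ in range(N)]
    m = [rng.gauss(0.0, 1.0) for _ in range(N)]
    abs_term, sq_err, sq_wt = [], [], []
    for _ in range(reps):
        I = [1 if rng.random() < p else 0 for p in pi]
        # Hypothetical smoother error, redrawn for every simulated sample.
        m_hat = [mj + rng.gauss(0.0, 0.2) for mj in m]
        abs_term.append(abs(sum((mh - mj) * (1 - Ij / pj)
                                for mh, mj, Ij, pj in zip(m_hat, m, I, pi)) / N))
        sq_err.append(sum((mh - mj) ** 2 for mh, mj in zip(m_hat, m)) / N)
        sq_wt.append(sum((1 - Ij / pj) ** 2 for Ij, pj in zip(I, pi)) / N)
    # Left side of the bound, and the square root of the product of expectations.
    return mean(abs_term), math.sqrt(mean(sq_err) * mean(sq_wt))

lhs, rhs = bound_30_check()
print(lhs, rhs)  # lhs <= rhs
```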

Under assumptions 1–6, the conclusion of lemma 2(ii) in [3] still holds: there exists \(n_{0}\) such that \(n \ge n_{0}\) implies

$$\begin{aligned} \sum _{i \in U} I_{\{|x^{c}-x^{c}_{i}| \le h\}} \ge 1, \end{aligned}$$
(31)

which holds because the discrete kernel satisfies \(\lambda \le l_{\lambda }(x^{d}_{i}-x^{d}_{j})\le 1\).
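The role of \(\lambda \le l_{\lambda }\le 1\) can be illustrated numerically: multiplying each continuous kernel weight by the discrete kernel keeps a kernel sum between \(\lambda \) times and 1 times its continuous-only value, so lower bounds on continuous-kernel denominators carry over up to the factor \(\lambda \). The Python sketch assumes an Epanechnikov kernel and the Aitchison–Aitken form of \(l_{\lambda }\); both are illustrative choices, and the function names are hypothetical.

```python
import random

def epanechnikov(u):
    """Continuous kernel K with support [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def aitchison_aitken(d_i, d_j, lam):
    """Discrete kernel: 1 on a category match, lam otherwise."""
    return 1.0 if d_i == d_j else lam

def kernel_sums(xc, xd, xs_c, xs_d, h, lam):
    """Continuous-only and mixed (product) kernel sums at the point (xc, xd)."""
    cont = sum(epanechnikov((xi - xc) / h) for xi in xs_c)
    mixed = sum(epanechnikov((xi - xc) / h) * aitchison_aitken(di, xd, lam)
                for xi, di in zip(xs_c, xs_d))
    return cont, mixed

rng = random.Random(0)
xs_c = [rng.uniform(0.0, 1.0) for _ in range(500)]
xs_d = [rng.choice("abc") for _ in range(500)]
cont, mixed = kernel_sums(0.5, "a", xs_c, xs_d, h=0.05, lam=0.2)
print(cont, mixed)  # 0.2 * cont <= mixed <= cont
```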

We write

$$\begin{aligned} \lim _{N \rightarrow \infty } \sup \left| \widehat{t}_{jg}/N \right|&= \lim _{N \rightarrow \infty } \sup \left| \sum _{i \in U} \frac{1}{Nh}K\left( \frac{x_{i}^{c}-x_{j}^{c}}{h}\right) l_{\lambda }(x_{i}^{d}-x_{j}^{d})y_{i}^{p_{1}} \frac{I_{i}}{\pi _{i}}\right| \nonumber \\&\le \lim _{N \rightarrow \infty } \sup \sum _{i \in U} \frac{C}{NhM} I_{\{x^{c}_{j}-h \le x^{c}_{i} \le x^{c}_{j}+h\}}, \end{aligned}$$
(32)

where \(p_{1}=0,1\), \(C\) is a constant, and we have used \(\lambda \le l_{\lambda }(x^{d}_{i}-x^{d}_{j})\le 1\). Since \(M < 1\), the same bound also holds for \(t_{jg}/N\).

Therefore, the \(N^{-1}t_{jg}\) are uniformly bounded in \(j\), and the \(N^{-1}\widehat{t}_{jg}\) are uniformly bounded in \(j\) and \(s\). The \(m_{j}\) are continuous functions of the \(t_{jg}\), with denominators uniformly bounded away from 0 by (31). Analogously, the \(\widehat{m}_{j}\) are continuous functions of the uniformly bounded \(\widehat{t}_{jg}\) and are well defined thanks to the \(\delta /N^{2}\) adjustment in (12). Hence, the \(m_{j}\) are uniformly bounded in \(j\), and the \(\widehat{m}_{j}\) are uniformly bounded in \(j\) and \(s\).

Under assumptions 1–6, we also have

$$\begin{aligned} \lim _{N \rightarrow \infty } \sup \frac{1}{N} \sum _{j \in U} (y_{j}-m_{j})^{2} < \infty , \end{aligned}$$
(33)

and, following the argument of Theorem 1 in [23], the first term on the right-hand side of (30) converges to 0 as \(N \rightarrow \infty \). Under assumption 5,

$$\begin{aligned} E_{d}\left( \sum _{j \in U} \frac{(1-\pi _{j}^{-1}I_{j})^{2}}{N}\right) =\sum _{j \in U} \frac{\pi _{j}(1-\pi _{j})}{N\pi _{j}^{2}}\le \frac{1}{M}. \end{aligned}$$
(34)
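The equality in (34) uses only \(E_{d}(I_{j})=\pi _{j}\) and \(I_{j}^{2}=I_{j}\), which give \(E_{d}(1-I_{j}/\pi _{j})^{2}=(1-\pi _{j})/\pi _{j}\); the bound \(1/M\) then follows if the inclusion probabilities are bounded below by \(M\), which is how we read assumption 5. The Python sketch compares the exact value with a Monte Carlo estimate; Poisson sampling is an assumption used only for the simulation.

```python
import random

def exact_lhs_34(pi):
    """E_d[(1/N) sum_j (1 - I_j/pi_j)^2] = (1/N) sum_j (1 - pi_j)/pi_j.

    Only E_d(I_j) = pi_j and I_j^2 = I_j are used, so any design with these
    first-order inclusion probabilities gives the same value.
    """
    return sum((1.0 - p) / p for p in pi) / len(pi)

def simulated_lhs_34(pi, reps, rng):
    """Monte Carlo estimate of the same expectation under Poisson sampling."""
    total = 0.0
    for _ in range(reps):
        I = [1 if rng.random() < p else 0 for p in pi]
        total += sum((1.0 - Ij / p) ** 2 for Ij, p in zip(I, pi)) / len(pi)
    return total / reps

rng = random.Random(42)
pi = [rng.uniform(0.2, 0.8) for _ in range(50)]
M = min(pi)  # smallest inclusion probability
print(exact_lhs_34(pi), simulated_lhs_34(pi, 10000, rng), 1.0 / M)
```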

Using \(\lambda \le l_{\lambda }(x^{d}_{i}-x^{d}_{j})\le 1\), lemmas 3, 4 and 5 in [3] continue to hold. Then, using lemma 4,

$$\begin{aligned} \lim _{N \rightarrow \infty } \frac{1}{N} \sum _{j \in U} E_{d}(\widehat{m}_{j}- m_{j})^{2}=0, \end{aligned}$$
(35)

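Combining (34) and (35), the second term on the right-hand side of (30) also converges to 0 as \(N \rightarrow \infty \). Hence \(E_{d}|\overline{y}_\mathrm{np}-\overline{Y}| \rightarrow 0\), and the theorem follows by the Markov inequality. \(\square \)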

Proof of Theorem 2

Let

$$\begin{aligned} a_{N}=n^{1/2}\sum _{j \in U}\frac{y_{j}-m_{j}}{N}\left( \frac{I_{j}}{\pi _{j}}-1\right) , \quad b_{N}=n^{1/2}\sum _{j \in U}\frac{m_{j}-\widehat{m}_{j}}{N}\left( \frac{I_{j}}{\pi _{j}}-1\right) ,\qquad \end{aligned}$$
(36)

so that

$$\begin{aligned} nE_{d}(\overline{y}_\mathrm{np}-\overline{Y})^2=E_{d}(a_{N}^{2})+E_{d}(b_{N}^{2})+2E_{d}(a_{N}b_{N}). \end{aligned}$$
(37)

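By lemma 5 in [3], which continues to hold because \(\lambda \le l_{\lambda }(x^{d}_{i}-x^{d}_{j})\le 1\), we have \(E_{d}(b_{N}^{2})=o(1)\). Moreover, \(E_{d}(a_{N})=0\), so \(E_{d}(a_{N}^{2})=Var_{d}(a_{N})=n\,Var_{d}(n^{-1/2}a_{N})=O(1)\), because \(n^{-1/2}a_{N}\) is the error of the Horvitz–Thompson estimator of the population mean of the residuals \(y_{j}-m_{j}\), whose design variance is \(O(1/n)\). Then, by the Cauchy–Schwarz inequality,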

$$\begin{aligned} \left| E_{d}(a_{N}b_{N})\right| \le \left\{ E_{d}(a_{N}^{2})E_{d}(b_{N}^{2})\right\} ^{1/2}=o(1). \end{aligned}$$
(38)

Hence,

$$\begin{aligned} nE_{d}(\overline{y}_\mathrm{np}-\overline{Y})^2=E_{d}(a_{N}^{2})+o(1), \end{aligned}$$
(39)

and the theorem follows. \(\square \)
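The claim \(Var_{d}(a_{N})=O(1)\) used in the proof of Theorem 2 can be made concrete under an assumed design. For Poisson sampling, \(Var_{d}(n^{-1/2}a_{N})=N^{-2}\sum _{U}(y_{j}-m_{j})^{2}(1-\pi _{j})/\pi _{j}\); with equal probabilities \(\pi _{j}=n/N\) this gives \(Var_{d}(a_{N})=(1-n/N)N^{-1}\sum _{U}(y_{j}-m_{j})^{2}\), which stays bounded as \(n\) grows. The Python sketch below evaluates this formula; the residuals are simulated stand-ins and the function names are hypothetical.

```python
import random

def poisson_ht_mean_variance(e, pi):
    """Var_d of (1/N) sum_j e_j (I_j/pi_j - 1) under Poisson sampling.

    Independent I_j give Var_d(I_j/pi_j) = (1 - pi_j)/pi_j, so there are no cross terms.
    """
    N = len(e)
    return sum(ej * ej * (1.0 - pj) / pj for ej, pj in zip(e, pi)) / N ** 2

def scaled_variances(N=5000, sizes=(50, 200, 1000), seed=3):
    """Return (n, Var_d(a_N), mean of e_j^2) for equal pi_j = n/N."""
    rng = random.Random(seed)
    e = [rng.gauss(0.0, 1.0) for _ in range(N)]  # hypothetical residuals y_j - m_j
    mean_sq = sum(ej * ej for ej in e) / N
    rows = []
    for n in sizes:
        pi = [n / N] * N
        # Var_d(a_N) = n * Var_d(n^{-1/2} a_N)
        rows.append((n, n * poisson_ht_mean_variance(e, pi), mean_sq))
    return rows

for n, var_aN, mean_sq in scaled_variances():
    print(n, round(var_aN, 4), round(mean_sq, 4))  # var_aN = mean_sq * (1 - n/N)
```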

Proof of Theorems 3 and 4

Under assumptions 1–6, and again using \(\lambda \le l_{\lambda }(x^{d}_{i}-x^{d}_{j})\le 1\), the asymptotic normality of the generalized difference estimator follows, and a consistent estimator of its asymptotic mean squared error can be derived, analogously to Theorems 3 and 4 in [3]. \(\square \)
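The proof above states Theorems 3 and 4 only by analogy with [3]. As an illustration, the Python sketch below assumes a residual-based variance estimator of the form \(\widehat{V}=N^{-2}\sum _{i,j\in s}\{(\pi _{ij}-\pi _{i}\pi _{j})/\pi _{ij}\}\{(y_{i}-\widehat{m}_{i})/\pi _{i}\}\{(y_{j}-\widehat{m}_{j})/\pi _{j}\}\), specialised to Poisson sampling (\(\pi _{ij}=\pi _{i}\pi _{j}\) for \(i\ne j\)), and combines it with asymptotic normality to form a Wald interval. The exact estimator in the paper may differ, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def residual_variance_poisson(residuals, pi, N):
    """Residual-based variance estimate for the mean under Poisson sampling.

    With pi_ij = pi_i * pi_j for i != j, the double sum over the sample keeps
    only its diagonal terms (1 - pi_i) * r_i^2 / pi_i^2.
    """
    return sum((1.0 - p) * r * r / (p * p) for r, p in zip(residuals, pi)) / N ** 2

def wald_interval(estimate, variance, level=0.95):
    """Normal-approximation interval: estimate +/- z * sqrt(variance)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * math.sqrt(variance)
    return estimate - half_width, estimate + half_width

v = residual_variance_poisson([1.0, -1.0], [0.5, 0.5], N=10)
print(v)                      # 0.04
print(wald_interval(1.0, v))  # about (0.608, 1.392)
```

Units sampled with certainty (\(\pi _{i}=1\)) contribute nothing to \(\widehat{V}\) in this sketch, as expected for Poisson sampling.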

About this article

Cite this article

Sánchez-Borrego, I., Opsomer, J.D., Rueda, M. et al. Nonparametric estimation with mixed data types in survey sampling. Rev Mat Complut 27, 685–700 (2014). https://doi.org/10.1007/s13163-013-0142-2

  • DOI: https://doi.org/10.1007/s13163-013-0142-2
