Nonparametric estimation with mixed data types in survey sampling

Sánchez-Borrego, I.; Opsomer, J. D.; Rueda, M.; Arcos, A.

doi:10.1007/s13163-013-0142-2

Published: 05 December 2013

Volume 27, pages 685–700, (2014)

Revista Matemática Complutense

Abstract

We consider the problem of finite population mean estimation with mixed data types. A model-assisted estimator based on nonparametric regression is proposed, which can handle discrete and continuous data and incorporates the sampling design in a natural manner. The proposed method shares the design-based properties of the kernel-based model-assisted estimator in the presence of continuous covariates and performs well under different scenarios in simulation experiments.
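To make the idea concrete, here is a minimal sketch of a model-assisted estimator of this kind, written under several assumptions that are not confirmed by the text above: a local-constant (ratio) fit, an Epanechnikov kernel for the continuous covariate, a discrete kernel equal to 1 on a category match and $\lambda$ otherwise, and a generalized difference form for the final estimator. The function names `mixed_kernel` and `model_assisted_mean` and the default `delta` are hypothetical, and the paper's own estimator may differ in its details.

```python
def mixed_kernel(xc_i, xd_i, xc_j, xd_j, h, lam):
    """Product kernel K((xc_i - xc_j) / h) / h * l_lam(xd_i - xd_j).

    Assumed forms: Epanechnikov K, and a discrete kernel equal to 1 when
    the categories match and lam (0 < lam <= 1) otherwise, so that
    lam <= l_lam <= 1.
    """
    u = (xc_i - xc_j) / h
    k_cont = 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0
    l_disc = 1.0 if xd_i == xd_j else lam
    return k_cont * l_disc / h


def model_assisted_mean(y, xc, xd, pi, sample, h, lam, delta=1e-8):
    """Generalized difference estimator of the finite population mean.

    y, xc, xd and pi are lists indexed over the whole population U
    (only y[i] for i in `sample` is used); `sample` lists sampled indices.
    """
    N = len(xc)
    m_hat = []
    for j in range(N):
        # Design-weighted kernel weights of the sampled units around unit j.
        w = [mixed_kernel(xc[i], xd[i], xc[j], xd[j], h, lam) / pi[i]
             for i in sample]
        t0 = sum(w) / N
        t1 = sum(wi * y[i] for wi, i in zip(w, sample)) / N
        # delta / N**2 keeps the ratio defined when no sampled unit is
        # near j (a hypothetical stand-in for the paper's adjustment).
        m_hat.append(t1 / (t0 + delta / N**2))
    # Fitted values over U plus design-weighted sample residuals.
    residual_term = sum((y[i] - m_hat[i]) / pi[i] for i in sample)
    return (sum(m_hat) + residual_term) / N
```

A quick sanity check on this sketch: with a census (every unit sampled, all inclusion probabilities equal to 1) the residual term cancels the fitted values and the function returns the population mean of `y`, whatever bandwidth `h` and `lam` are chosen.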

References

1. Aitchison, J., Aitken, C.G.G.: Multivariate binary discrimination by the kernel method. Biometrika 63, 413–420 (1976)
2. Breidt, F.J., Claeskens, G., Opsomer, J.D.: Model-assisted estimation for complex surveys using penalised splines. Biometrika 92(4), 831–846 (2005)
3. Breidt, F.J., Opsomer, J.D.: Local polynomial regression estimators in survey sampling. Ann. Stat. 28, 1026–1053 (2000)
4. Cassel, C.M., Särndal, C.E., Wretman, J.H.: Foundations of Inference in Survey Sampling. Wiley, New York (1977)
5. Chen, J., Qin, J.: Empirical likelihood estimation for finite populations and the effective usage of auxiliary information. Biometrika 80, 107–116 (1993)
6. Deville, J.C., Särndal, C.E.: Calibration estimators in survey sampling. J. Am. Stat. Assoc. 87, 376–382 (1992)
7. Fan, J.: Local linear regression smoothers and their minimax efficiencies. Ann. Stat. 21, 196–216 (1993)
8. Fuller, W.A.: Regression estimation for survey samples. Surv. Methodol. 28, 5–23 (2002)
9. Hall, P.: On nonparametric multivariate binary discrimination. Biometrika 68, 287–294 (1981)
10. Hall, P., Wand, M.P.: On nonparametric discrimination using density differences. Biometrika 75, 541–547 (1988)
11. Hedayat, A., Sinha, B.: Design and Inference in Finite Population Sampling. Wiley, New York (1991)
12. Horvitz, D.G., Thompson, D.J.: A generalization of sampling without replacement from a finite universe. J. Am. Stat. Assoc. 47, 663–685 (1952)
13. Kuo, L.: Classical and prediction approaches to estimating distribution functions from survey data. In: ASA Proceedings of the Section on Survey Research Methods, pp. 280–285. American Statistical Association, Alexandria, VA (1988)
14. Li, Q., Ouyang, D., Racine, J.S.: Nonparametric regression with weakly dependent data: the discrete and continuous regressor case. J. Nonparametr. Stat. 21(6), 697–711 (2009)
15. Li, Q., Racine, J.S.: Nonparametric Econometrics: Theory and Practice. Princeton University Press, New Jersey (2007)
16. Li, Q., Racine, J.S.: Nonparametric estimation of conditional cdf and quantile functions with mixed categorical and continuous data. J. Bus. Econ. Stat. 26(4), 423–434 (2008)
17. Li, Q., Racine, J.S.: Smooth varying-coefficient estimation and inference for qualitative and quantitative data. Econ. Theory 26, 1607–1637 (2010)
18. Little, R.J.A.: To model or not to model? Competing modes of inference for finite population sampling. J. Am. Stat. Assoc. 99(466), 546–556 (2004)
19. Nadaraya, E.A.: On estimating regression. Theory Probab. Appl. 9, 141–142 (1964)
20. Opsomer, J.D., Miller, C.P.: Selecting the amount of smoothing in nonparametric regression estimation for complex surveys. J. Nonparametr. Stat. 17, 593–611 (2005)
21. Ouyang, D., Li, Q., Racine, J.S.: Nonparametric estimation of regression functions with discrete regressors. Econ. Theory 25(1), 1–42 (2009)
22. Racine, J., Li, Q.: Nonparametric estimation of regression functions with both categorical and continuous data. J. Econ. 119, 99–130 (2004)
23. Robinson, P.M., Särndal, C.E.: Asymptotic properties of the generalized regression estimator in probability sampling. Sankhya Ser. B 45, 240–248 (1983)
24. Royall, R.M.: The prediction approach to sampling theory. In: Krishnaiah, P.R., Rao, C.R. (eds.) Handbook of Statistics, vol. 6, pp. 399–413. Elsevier, Amsterdam (1988)
25. Rueda, M., Sánchez-Borrego, I.: A predictive estimator of finite population mean using nonparametric regression. Comput. Stat. 24, 1–14 (2009)
26. Särndal, C.E., Swensson, B., Wretman, J.: The weighted residual technique for estimating the variance of the general regression estimator of the finite population total. Biometrika 76, 527–537 (1989)
27. Simonoff, J.S.: Smoothing Methods in Statistics. Springer, New York (1996)
28. Titterington, D.M.: A comparative study of kernel-based density estimates for categorical data. Technometrics 22, 259–268 (1980)
29. Valliant, R., Dorfman, A.H.: Finite Population Sampling and Inference: A Prediction Approach. Wiley, New York (1999)

Acknowledgments

This work is partially supported by Ministerio de Educación y Ciencia (contract No. MTM2009-10055 and MTM2012-35650).

Author information

Authors and Affiliations

Department of Statistics and Operational Research, University of Granada, Granada, Spain
I. Sánchez-Borrego, M. Rueda & A. Arcos
Department of Statistics, Colorado State University, Fort Collins, USA
J. D. Opsomer

Corresponding author

Correspondence to M. Rueda.

Appendix

Proof of Theorem 1

We write

$$\begin{aligned} \overline{y}_\mathrm{np}-\overline{Y}= \frac{1}{N} \sum _{j \in U} (y_{j}-m_{j}) \left( \frac{I_{j}}{\pi _{j}}-1\right) + \frac{1}{N}\sum _{j \in U} (\widehat{m}_{j}-m_{j}) \left( 1-\frac{I_{j}}{\pi _{j}}\right) . \end{aligned}$$

(29)
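As a hedged aside, (29) can be checked directly if one assumes that $\overline{y}_\mathrm{np}$ has the usual generalized difference form. Its definition is not reproduced in this appendix, so the first line below is an assumption rather than a quotation:

$$\begin{aligned} \overline{y}_\mathrm{np}&= \frac{1}{N}\sum _{j \in U} \widehat{m}_{j} + \frac{1}{N}\sum _{j \in U} (y_{j}-\widehat{m}_{j})\frac{I_{j}}{\pi _{j}}, \\ \overline{y}_\mathrm{np}-\overline{Y}&= \frac{1}{N}\sum _{j \in U} (y_{j}-\widehat{m}_{j})\left( \frac{I_{j}}{\pi _{j}}-1\right) \\&= \frac{1}{N}\sum _{j \in U} (y_{j}-m_{j})\left( \frac{I_{j}}{\pi _{j}}-1\right) + \frac{1}{N}\sum _{j \in U} (m_{j}-\widehat{m}_{j})\left( \frac{I_{j}}{\pi _{j}}-1\right) . \end{aligned}$$

The last sum equals the second term of (29) after changing the sign of both factors. The second line uses $\overline{Y}=N^{-1}\sum _{j \in U} y_{j}$, and the third adds and subtracts $m_{j}$ inside the first factor.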

Then

$$\begin{aligned}&E_{d}\left| \overline{y}_\mathrm{np}-\overline{Y} \right| \le E_{d}\left| \sum _{j \in U}\frac{y_{j}-m_{j}}{N} \left( \frac{I_{j}}{\pi _{j}}-1\right) \right| \nonumber \\&+ \left\{ E_{d}\left[ \sum _{j \in U} \frac{(\widehat{m}_{j}-m_{j})^{2}}{N}\right] E_{d}\left[ \sum _{j \in U} \frac{(1-\pi _{j}^{-1}I_{j})^{2}}{N}\right] \right\} ^{1/2}. \end{aligned}$$

(30)

Under assumptions 1–6, there exists $n_{0}$ such that, for all $n \ge n_{0}$, the conclusion of Lemma 2(ii) in [3] holds,

$$\begin{aligned} \sum _{i \in U} I_{\{|x^{c}-x^{c}_{i}| \le h\}} \ge 1, \end{aligned}$$

(31)

which follows from the bound $\lambda \le l_{\lambda }(x^{d}_{i}-x^{d}_{j})\le 1$.

We write

$$\begin{aligned} \lim _{N \rightarrow \infty } \sup \left| \widehat{t}_{jg}/N \right|&= \lim _{N \rightarrow \infty } \sup \left| \sum _{i \in U} \frac{1}{Nh}K\left( \frac{x_{i}^{c}-x_{j}^{c}}{h}\right) l_{\lambda }(x_{i}^{d}-x_{j}^{d})y_{i}^{p_{1}} \frac{I_{i}}{\pi _{i}}\right| \nonumber \\&\le \lim _{N \rightarrow \infty } \sup \sum _{i \in U} \frac{C}{NhM} I_{\{x^{c}_{j}-h \le x^{c}_{i} \le x^{c}_{j}+h\}}, \end{aligned}$$

(32)

with $p_{1}=0,1$ and using $\lambda \le l_{\lambda }(x^{d}_{i}-x^{d}_{j})\le 1$. As $M < 1$, the same bound is valid for $t_{jg}/N$.

Therefore the $N^{-1}t_{jg}$ are uniformly bounded in $j$ and the $N^{-1}\widehat{t}_{jg}$ are uniformly bounded in $j$ and $s$. The $m_{j}$ are continuous functions of the $t_{jg}$ with denominators uniformly bounded away from 0 by (31). Analogously, the $\widehat{m}_{j}$ are continuous functions of the uniformly bounded $\widehat{t}_{jg}$, well defined by the $\delta /N^{2}$ adjustment in (12). Hence, the $m_{j}$ are uniformly bounded in $j$ and the $\widehat{m}_{j}$ are uniformly bounded in $j$ and $s$.

Under assumptions 1–6, we also have

$$\begin{aligned} \lim _{N \rightarrow \infty } \sup \frac{1}{N} \sum _{j \in U} (y_{j}-m_{j})^{2} < \infty , \end{aligned}$$

(33)

and the first term in (30) converges to 0 as $N \rightarrow \infty $, following the argument of Theorem 1 in [23]. Under assumption 5,

$$\begin{aligned} E_{d}\left( \sum _{j \in U} \frac{(1-\pi _{j}^{-1}I_{j})^{2}}{N}\right) =\sum _{j \in U} \frac{\pi _{j}(1-\pi _{j})}{N\pi _{j}^{2}}\le \frac{1}{M}. \end{aligned}$$

(34)
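The equality in (34) rests on a one-unit calculation: for an inclusion indicator $I_{j}$ that equals 1 with probability $\pi _{j}$, the expectation of $(1-I_{j}/\pi _{j})^{2}$ is $(1-\pi _{j})/\pi _{j}$. The short sketch below checks this in exact rational arithmetic and then checks the bound by $1/M$. It assumes that assumption 5 supplies a lower bound $\pi _{j} \ge M > 0$ (the assumption itself is not reproduced here), and the function names and the example probabilities are hypothetical.

```python
from fractions import Fraction


def second_moment(pi):
    """E_d[(1 - I / pi)^2] for an indicator I that is 1 with probability pi."""
    # I = 1 contributes (1 - 1/pi)^2 with weight pi;
    # I = 0 contributes 1 with weight 1 - pi.
    return pi * (1 - 1 / pi) ** 2 + (1 - pi) * 1


def averaged_moment(pis):
    """Middle expression of (34): the average of pi_j (1 - pi_j) / pi_j^2."""
    return sum(second_moment(p) for p in pis) / len(pis)


# Illustrative inclusion probabilities (made up for this check).
pis = [Fraction(1, 10), Fraction(1, 4), Fraction(1, 2), Fraction(9, 10)]
M = min(pis)  # assumed lower bound on the inclusion probabilities

for p in pis:
    # Matches the summand pi (1 - pi) / pi^2 appearing in (34).
    assert second_moment(p) == p * (1 - p) / p**2

# Each summand is at most 1 / pi_j <= 1 / M, hence so is the average.
assert averaged_moment(pis) <= 1 / M
```

Using `Fraction` keeps the comparison exact, so the assertions test the algebraic identity itself and not a floating-point approximation of it.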

Using $\lambda \le l_{\lambda }(x^{d}_{i}-x^{d}_{j})\le 1$, lemmas 3, 4 and 5 in [3] continue to hold. Then, using lemma 4,

$$\begin{aligned} \lim _{N \rightarrow \infty } \frac{1}{N} \sum _{j \in U} E_{d}(\widehat{m}_{j}- m_{j})^{2}=0, \end{aligned}$$

(35)

and by (34) the second term in (30) converges to 0 as $N \rightarrow \infty $. By the Markov inequality the theorem follows. $\square $

Proof of Theorem 2

Let

$$\begin{aligned} a_{N}=n^{1/2}\sum _{j \in U}\frac{y_{j}-m_{j}}{N}\left( \frac{I_{j}}{\pi _{j}}-1\right) , \quad b_{N}=n^{1/2}\sum _{j \in U}\frac{m_{j}-\widehat{m}_{j}}{N}\left( \frac{I_{j}}{\pi _{j}}-1\right) ,\qquad \end{aligned}$$

(36)

so that

$$\begin{aligned} nE_{d}(\overline{y}_\mathrm{np}-\overline{Y})^2=E_{d}(a_{N}^{2})+E_{d}(b_{N}^{2})+2E_{d}(a_{N}b_{N}). \end{aligned}$$

(37)

It is immediate that $E_{d}(b_{N}^{2})=o(1)$ by Lemma 5 in [3], which continues to hold because $\lambda \le l_{\lambda }(x^{d}_{i}-x^{d}_{j})\le 1$. Since the Horvitz–Thompson estimator is design unbiased, $E_{d}(a_{N})=0$ and hence $E_{d}(a_{N}^{2})=\mathrm{Var}_{d}(a_{N})$; this quantity is $O(1)$ because $\mathrm{Var}_{d}(n^{-1/2}a_{N})=O(1/n)$ for the Horvitz–Thompson mean estimator. Then, by the Cauchy–Schwarz inequality,

$$\begin{aligned} \left| E_{d}(a_{N}b_{N}) \right| \le (E_{d}(a_{N}^{2})E_{d}(b_{N}^{2}))^{1/2}=o(1). \end{aligned}$$

(38)

Hence,

$$\begin{aligned} nE_{d}(\overline{y}_\mathrm{np}-\overline{Y})^2=E_{d}(a_{N}^{2})+o(1), \end{aligned}$$

(39)

and the theorem follows. $\square $
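For orientation, the leading term in (39) can be written out explicitly. The display below is a sketch under two assumptions: the $m_{j}$ are fixed population-level quantities with respect to the design, and $\pi _{ij}$ denotes the joint inclusion probability of units $i$ and $j$, with $\pi _{jj}=\pi _{j}$ (this notation does not appear in the appendix). Since $E_{d}(I_{j})=\pi _{j}$ and $\mathrm{Cov}_{d}(I_{i},I_{j})=\pi _{ij}-\pi _{i}\pi _{j}$,

$$\begin{aligned} E_{d}(a_{N}^{2})=\frac{n}{N^{2}}\sum _{i \in U}\sum _{j \in U} (\pi _{ij}-\pi _{i}\pi _{j})\, \frac{y_{i}-m_{i}}{\pi _{i}}\, \frac{y_{j}-m_{j}}{\pi _{j}}, \end{aligned}$$

which is $n$ times the Horvitz–Thompson variance of the mean of the residuals $y_{j}-m_{j}$.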

Proof of Theorems 3 and 4

Under assumptions 1–6, and using again that $\lambda \le l_{\lambda }(x^{d}_{i}-x^{d}_{j})\le 1$, the asymptotic normality of the generalized difference estimator follows, and a consistent estimator of the asymptotic mean squared error can be derived, analogously to Theorems 3 and 4 in [3]. $\square $

About this article

Cite this article

Sánchez-Borrego, I., Opsomer, J.D., Rueda, M. et al. Nonparametric estimation with mixed data types in survey sampling. Rev Mat Complut 27, 685–700 (2014). https://doi.org/10.1007/s13163-013-0142-2

Received: 25 February 2013
Accepted: 21 November 2013
Published: 05 December 2013
Issue Date: July 2014
DOI: https://doi.org/10.1007/s13163-013-0142-2
