Abstract
Non-probability samples have become increasingly popular in survey sampling because of their lower costs, shorter durations and higher efficiency. In the high-dimensional superpopulation modeling approach for non-probability samples, a model for the analysis variable is fitted from a non-probability sample and is used to project the sample to the full population. In practice, there are situations in which the covariates in the modeling process are not directly observed but are contaminated by a multiplicative factor that is determined by the value of an unknown function of an observable confounder. In this paper, we propose to calibrate the covariates by nonparametrically regressing the observable contaminated covariates on the confounder. We employ the SCAD-penalized least squares method to investigate the variable selection and inference problems for non-probability samples based on the calibrated covariates. A SCAD-penalized estimator of the parameter and an estimator of the population mean are obtained. Under some mild assumptions, we establish the “oracle property” of the proposed SCAD-penalized estimator and the consistency of the proposed population mean estimator. Simulation studies are conducted to assess the finite-sample performance of the proposed method. An application to a Boston housing price study demonstrates the utility of the proposed method in practice.
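The calibration step described above can be illustrated with a minimal sketch. It assumes the usual covariate-adjusted setup in which the observed covariate is \(\tilde{X}=\psi (U)X\) with \(\text {E}[\psi (U)]=1\), \(X\) independent of \(U\) and \(\text {E}[X]\ne 0\). These identifiability conditions, the Nadaraya-Watson smoother, the bandwidth, the simulated data and all function names are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def nw_smooth(u, y, h):
    """Nadaraya-Watson estimate of E[y | U = u_i] at every sample point (Gaussian kernel)."""
    diff = (u[:, None] - u[None, :]) / h
    weights = np.exp(-0.5 * diff**2)
    return weights @ y / weights.sum(axis=1)

def calibrate(x_obs, u, h):
    """Divide the contaminated covariate by an estimate of the distortion psi(U).

    Assumption: x_obs = psi(U) * X with E[psi(U)] = 1 and X independent of U,
    so E[x_obs | U] = psi(U) * E[X] and E[x_obs] = E[X].
    """
    g_hat = nw_smooth(u, x_obs, h)      # estimates psi(U) * E[X]
    psi_hat = g_hat / x_obs.mean()      # estimates psi(U)
    return x_obs / psi_hat

# Simulated example (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
n = 1000
u = rng.uniform(0.0, 1.0, n)            # observable confounder
x = rng.normal(2.0, 0.5, n)             # unobserved true covariate
psi = 1.0 + 0.5 * (u - 0.5)             # distortion with mean one under U ~ Uniform(0, 1)
x_obs = psi * x                         # what is actually observed
x_hat = calibrate(x_obs, u, h=0.1)

err_raw = np.mean(np.abs(x_obs - x))    # error if the contamination is ignored
err_cal = np.mean(np.abs(x_hat - x))    # error after calibration
print(f"mean abs error: raw = {err_raw:.3f}, calibrated = {err_cal:.3f}")
```

In this toy setting the calibrated covariate is much closer to the unobserved one than the raw contaminated covariate; the bandwidth would normally be chosen by a data-driven rule rather than fixed by hand.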
References
Baker R, Brick JM, Bates NA, Battaglia M, Couper MP, Dever JA, Tourangeau R (2013) Summary report of the AAPOR task force on non-probability sampling. J Surv Stat Methodol 1(2):90–143
Bethlehem J (2016) Solving the nonresponse problem with sample matching? Soc Sci Comput Rev 34(1):59–77
Chen JKT, Valliant RL, Elliott MR (2019) Calibrating non-probability surveys to estimated control totals using LASSO, with an application to political polling. J R Stat Soc Ser C (Appl Stat) 68(3):657–681
Cooper D, Greenaway M (2015) Non-probability survey sampling in official statistics. Retrieved from Office for National Statistics website: https://www.google.com/url
Craven P, Wahba G (1978) Smoothing noisy data with spline functions. Numer Math 31(4):377–403
Cui X, Guo W, Lin L, Zhu L (2009) Covariate-adjusted nonlinear regression. Ann Stat 37(4):1839–1870
Cui X (2008) Statistical analysis of two types of complex data and its associated model. Ph.D. Thesis, Shandong University, Jinan
Delaigle A, Hall P, Zhou WX (2016) Nonparametric covariate-adjusted regression. Ann Stat 44(5):2190–2220
Elliott MR, Valliant R (2017) Inference for non-probability samples. Stat Sci 32(2):249–264
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96(456):1348–1360
Harrison D, Rubinfeld DL (1978) Hedonic housing prices and the demand for clean air. J Environ Econ Manag 5(1):81–102
Keiding N, Louis TA (2016) Perils and potentials of self-selected entry to epidemiological studies and surveys. J R Stat Soc Ser A (Stat Soc) 179(2):319–376
Kim JK, Park S, Chen Y, Wu C (2018) Combining non-probability and probability survey samples through mass imputation. arXiv preprint arXiv:1812.10694
Li F, Lin L, Cui X (2010) Covariate-adjusted partially linear regression models. Commun Stat Theory Methods 39(6):1054–1074
Li X, Du J, Li G, Fan M (2014) Variable selection for covariate adjusted regression model. J Syst Sci Complexity 27(6):1227–1246
Meijer RJ, Goeman JJ (2013) Efficient approximate k-fold and leave-one-out cross-validation for ridge regression. Biometrical J 55(2):141–155
Nguyen DV, Şentürk D (2008) Multicovariate-adjusted regression models. J Stat Comput Simul 78(9):813–827
Schreuder HT, Gregoire TG, Weyer JP (2001) For what applications can probability and non-probability sampling be used? Environ Monit Assess 66(3):281–291
Şentürk D, Müller HG (2005) Covariate adjusted correlation analysis via varying coefficient models. Scand J Stat 32(3):365–383
Şentürk D, Müller HG (2005) Covariate-adjusted regression. Biometrika 92(1):75–89
Şentürk D, Müller HG (2009) Covariate-adjusted generalized linear models. Biometrika 96(2):357–370
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B (Methodol) 58(1):267–288
Yang S, Kim JK, Song R (2020) Doubly robust inference when combining probability and non-probability samples with high dimensional data. J R Stat Soc Ser B (Stat Methodol) 82(2):445–465
Żądło T (2009) On MSE of EBLUP. Stat Papers 50(1):101–118
Zhang J, Zhu LP, Zhu LX (2012) On a dimension reduction regression with covariate adjustment. J Multivariate Anal 104(1):39–55
Zhang J, Yu Y, Zhu L, Liang H (2013) Partial linear single index models with distortion measurement errors. Ann Inst Stat Math 65(2):237–267
Zhang L (2019) On valid descriptive inference from non-probability sample. Stat Theory Related Fields 3(2):103–113
Zhu LX, Fang KT (1996) Asymptotics for kernel estimate of sliced inverse regression. Ann Stat 24(3):1053–1068
Zou H (2008) A note on path-based variable selection in the penalized proportional hazards model. Biometrika 95(1):241–247
Acknowledgements
This work is supported by the National Natural Science Foundation of China (NSFC) (No. 11901175) and Fundamental Research Funds for Hubei Key Laboratory of Applied Mathematics, Hubei University (No. HBAM 201907).
Appendix
The detailed proofs of the three theorems are given below. Under assumptions (B1)-(B4) and (C1)-(C6), similarly to Lemma 2.1 in Cui (2008), we obtain
Proof of Theorem 3.1: Let \(w_{n}=n^{-1/2}+a_{n}\) and \(\Vert u\Vert =C\), where C is a sufficiently large constant. It is sufficient to show that, for any given \(\varepsilon \), there exists a large enough constant C such that
which implies that there exists a local minimizer in the ball \(\{\beta _{0}+w_{n}u: \Vert u\Vert \le C\}\) with probability at least \(1-\varepsilon \). Hence, there exists a local minimizer such that \(\Vert \widehat{\beta }-\beta _{0}\Vert =O_{P}(w_{n})\).
From (2.8), using \(p_{\lambda }\left( 0 \right) =0\), we obtain
where k is the dimension of \(\beta _{I0}\), and \(\beta _{0j}\) is the jth element of \(\beta _{0}\). Then, we analyze the above difference in two steps.
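For reference, the SCAD penalty of Fan and Li (2001) that enters this difference can be sketched as follows. The code evaluates the penalty and its first two derivatives, and shows that the largest first and second derivatives over the nonzero coefficients (the quantities called \(a_{n}\) and \(b_{n}\) in Fan and Li 2001) are exactly zero once every nonzero coefficient exceeds \(a\lambda \). The constant a = 3.7 is the value suggested by Fan and Li; the coefficient vector and \(\lambda \) are made-up illustration values, and the function names are hypothetical.

```python
import numpy as np

def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty p_lambda(|theta|): linear, then quadratic, then constant."""
    t = np.abs(np.asarray(theta, dtype=float))
    linear = lam * t
    quadratic = -(t**2 - 2 * a * lam * t + lam**2) / (2 * (a - 1))
    constant = (a + 1) * lam**2 / 2
    return np.where(t <= lam, linear, np.where(t <= a * lam, quadratic, constant))

def scad_derivative(theta, lam, a=3.7):
    """First derivative p'_lambda(|theta|)."""
    t = np.abs(np.asarray(theta, dtype=float))
    return np.where(t <= lam, lam, np.maximum(a * lam - t, 0.0) / (a - 1))

def scad_second_derivative(theta, lam, a=3.7):
    """Second derivative p''_lambda(|theta|) away from the two knots."""
    t = np.abs(np.asarray(theta, dtype=float))
    return np.where((t > lam) & (t < a * lam), -1.0 / (a - 1), 0.0)

# Illustrative true coefficient vector with two zero entries (an assumption).
beta0 = np.array([3.0, 1.5, 0.0, 0.0, 2.0])
lam = 0.2
nonzero = beta0[beta0 != 0]

a_n = float(np.max(scad_derivative(nonzero, lam)))
b_n = float(np.max(np.abs(scad_second_derivative(nonzero, lam))))
print("p_lambda(0) =", float(scad_penalty(0.0, lam)))
print("a_n =", a_n, " b_n =", b_n)
```

Because the penalty is flat beyond \(a\lambda \), large coefficients are not shrunk, which is why the rate \(w_{n}=n^{-1/2}+a_{n}\) reduces to \(n^{-1/2}\) when \(\lambda \) is small relative to the nonzero coefficients.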
Step 1: Show that
For \(L_{I}\), we decompose it into two parts by performing a simple calculation, and obtain
where \(L_{1}=w_{n}^{2}u^{\text {T}}X^{\text {T}}Xu/2+w_{n}^{2}u^{\text {T}}\left( \widehat{X}^{\text {T}}\widehat{X}-X^{\text {T}}X \right) u/2\). By (A1), \(w_{n}^{2}u^{\text {T}} \left( \widehat{X}^{\text {T}}\widehat{X}-X^{\text {T}}X \right) u/2=O_{P}\left( w_{n}^{2}n^{1/2}/2 \right) \Vert u\Vert ^{2}\), so we obtain
By performing a simple calculation, we obtain
Notice that
which implies that
From the above argument, we can rewrite equation (A.2) as
Next, we focus on \(L_{2}\). As \(\varepsilon _{i}=Y_{i}-X_{i}^{\text {T}}\beta _{0}\), we perform a simple calculation on \(L_{2}\) and obtain
Since \(\left| L_{21}\right| \le w_{n}\Vert \varepsilon ^{\text {T}}X\Vert \Vert u\Vert =w_{n}\Vert \sum _{i=1}^{n}\varepsilon _{i}X_{i}\Vert \Vert u\Vert \), assumption (C1) implies that
then we obtain \(\left| L_{21}\right| =O_{P}\left( w_{n}n^{1/2} \right) \Vert u\Vert =O_{P}\left( nw_{n}^{2} \right) \Vert u\Vert \), because \(n^{-1/2}\le w_{n}\). Similarly to the proof of Lemma 2.1 in Cui (2008), we have \(\left| L_{22}\right| =w_{n}\left| \varepsilon ^{\text {T}}\left( \widehat{X}-X \right) u\right| =o_{P}\left( nw_{n}^{2} \right) \Vert u\Vert \) and \(L_{23}=L_{24}=o_{P}\left( nw_{n}^{2} \right) \Vert u\Vert \). Combining these bounds yields (A.1).
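The \(n^{1/2}\) growth of \(\Vert \sum _{i=1}^{n}\varepsilon _{i}X_{i}\Vert \) that drives the bound on \(L_{21}\) can be checked numerically. The sketch below assumes i.i.d. standard normal covariates and errors with mean zero and unit variance that are independent of the covariates. These distributions, the dimension q = 3 and the sample sizes are illustrative assumptions rather than the paper's conditions.

```python
import numpy as np

def scaled_norm(n, q, reps, rng):
    """Monte Carlo average of ||sum_i eps_i X_i|| / sqrt(n)."""
    x = rng.standard_normal((reps, n, q))      # covariates, one design per replication
    eps = rng.standard_normal((reps, n))       # errors independent of the covariates
    sums = np.einsum("rn,rnq->rq", eps, x)     # sum_i eps_i X_i for each replication
    return float(np.mean(np.linalg.norm(sums, axis=1)) / np.sqrt(n))

rng = np.random.default_rng(1)
ratios = {n: scaled_norm(n, q=3, reps=100, rng=rng) for n in (100, 1000, 4000)}
for n, r in ratios.items():
    print(f"n = {n:5d}   mean ||sum eps_i X_i|| / sqrt(n) = {r:.3f}")
```

The scaled norm stays at roughly the same level as n grows by a factor of forty, which is the behavior the bound \(\left| L_{21}\right| =O_{P}\left( w_{n}n^{1/2} \right) \Vert u\Vert \) relies on.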
Step 2: Show that
We perform a second-order Taylor expansion of \(p_{\lambda }\left( \left| \beta _{0j}+w_{n}u_{j}\right| \right) \) around \(\left| \beta _{0j}\right| \) and obtain
Using the definitions \(a_{n}=\max \left\{ p^{'}_{\lambda }\left( \left| \beta _{0j}\right| \right) :\beta _{0j}\ne 0\right\} \) and \(b_{n}=\max \left\{ \left| p^{''}_{\lambda }\left( \left| \beta _{0j}\right| \right) \right| :\beta _{0j}\ne 0\right\} \), together with the inequality \(\left| sgn\left( \beta _{0j} \right) \right| \le 1\), we have
and
Since \(w_{n}=n^{-1/2}+a_{n}\) implies \(a_{n}\le w_{n}\), we obtain
From the above results, \(L_{I}\) dominates all of the other terms uniformly in \(\Vert u\Vert =C\) when a sufficiently large C is chosen. As \(L_{I}\) is positive, this completes the proof of Theorem 3.1.
Proof of Theorem 3.2: To improve readability, we divide the proof into two steps. By Theorem 3.1, there exists a root-n-consistent local minimizer \(\widehat{\beta }\). In Step 1, we show the sparsity of this estimator. In Step 2, we prove the asymptotic normality of the penalized least squares estimator.
Step 1: It is sufficient to show that with probability tending to 1 as \(n\rightarrow \infty \), for any \(\beta \) satisfying \(\beta _{I}-\beta _{I0}=O_{P}(n^{-1/2})\) and \(j=k+1, \ldots , q\), we have
To show (A.3), we consider the partial derivative of \(L_{n}(\beta )\) at any differentiable point \(\beta =\left( \beta _{1}, \ldots , \beta _{q} \right) \) and obtain
where \(\beta _{0}\) is the true value of \(\beta \), \(j=k+1, \ldots , q\).
We first consider \(P_{1}\). By Theorem 3.1, it is easy to show that, for any \(\beta \) satisfying \(\beta _{I}-\beta _{I0}=O_{P}\left( n^{-1/2} \right) \) and \(\left| \beta _{II}-\beta _{II0}\right| \le \varepsilon _{n}=Cn^{-1/2}\) with any positive constant C,
and
From the above argument, it follows that \(P_{1}=O_{P}\left( n^{-1/2} \right) \). By (A2), (A4) and a similar argument, we get \(P_{i}=o_{P}(1), i=2, 3, 4, 5\).
Using the above arguments, \(\liminf \limits _{n\rightarrow \infty }\liminf \limits _{\nu \rightarrow 0^{+}}p_{\lambda }^{'}\left( \nu \right) /\lambda >0\) and \(n^{-1/2}/\lambda \rightarrow 0\), we obtain
so the sign of the derivative is completely determined by that of \(\beta _{j}\). Hence, (A.3) follows. This completes the proof of Step 1.
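The sparsity mechanism in Step 1 can be seen in closed form in a simpler setting. For a single coefficient under an orthonormal design, Fan and Li (2001) give an explicit SCAD thresholding rule. The sketch below implements that rule (not the estimator of this paper, whose design is general and uses calibrated covariates) and shows that an input of order \(n^{-1/2}\) is set exactly to zero when \(n^{-1/2}/\lambda \) is small, while a large input is left unshrunk. The function name and all numeric values are illustrative assumptions.

```python
import numpy as np

def scad_threshold(z, lam, a=3.7):
    """SCAD thresholding rule of Fan and Li (2001) for an orthonormal design.

    Minimizes 0.5 * (z - theta)**2 + p_lambda(|theta|) over theta, elementwise.
    """
    z = np.asarray(z, dtype=float)
    absz = np.abs(z)
    soft = np.sign(z) * np.maximum(absz - lam, 0.0)            # |z| <= 2 * lam
    middle = ((a - 1) * z - np.sign(z) * a * lam) / (a - 2)     # 2 * lam < |z| <= a * lam
    return np.where(absz <= 2 * lam, soft, np.where(absz <= a * lam, middle, z))

n = 400
lam = 0.5
# First entry mimics a root-n-sized estimate of a truly zero coefficient.
z = np.array([n ** -0.5, -0.3, 1.5, 5.0])
est = scad_threshold(z, lam)
print(est)
```

Inputs below \(\lambda \) in absolute value are mapped to zero, inputs above \(a\lambda \) are returned unchanged, and the middle range interpolates between the two regimes.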
Step 2:
Applying Taylor’s theorem to \(\nabla L_{n}\left( \widehat{\beta _{I}} \right) \) at \(\beta _{I0}\), and noting that \(\widehat{\beta _{I}}\) must satisfy the penalized least squares equation \(\nabla L_{n}\left( \widehat{\beta _{I}} \right) =0\), we have
where \(\beta _{I0}^{*}\) lies between \(\widehat{\beta _{I}}\) and \(\beta _{I0}\). Using the definitions of \(L_{n}\left( \cdot \right) \) and \(\varepsilon =Y-X\beta _{0}\), we have
and
where \(\nabla p_{\lambda }\left( |\beta _{I0}| \right) =\left( p^{'}_{\lambda }\left( |\beta _{01}| \right) sgn\left( \beta _{01} \right) , \ldots , p^{'}_{\lambda }\left( |\beta _{0k}| \right) sgn\left( \beta _{0k} \right) \right) _{k\times 1}^{\text {T}}\), and \(\nabla ^{2}p_{\lambda }\left( |\beta _{I0}^{*}| \right) \) is the diagonal matrix whose diagonal elements are \(p^{''}_{\lambda }\left( \beta _{0j}^{*} \right) , j=1, 2, \ldots , k\).
For the first term of (A.4), since \(n^{-1/2}\left( \widehat{X}_{I}^{\text {T}}\widehat{X}-X^{\text {T}}X \right) =O_{P}(n^{-1/2})\), we have
and, after a simple calculation, we obtain
where \(K_{1}=n^{-1/2}\left( \widehat{X}_{I}-{X}_{I} \right) ^{\text {T}}\varepsilon \) and \(K_{2}=n^{-1/2}\left( \widehat{X}_{I}-{X}_{I} \right) ^{\text {T}}\left( \widehat{X}_{I}-{X}_{I} \right) \beta _{I0}\). By (A3) and \(\widehat{X}-X=o_{P}(1)\), it follows that \(K_{1}=K_{2}=o_{P}(1)\). Then, multiplying (A.5) by \(\sqrt{n}A_{n}M_{I}^{-1}\), we obtain
Next, we prove that \(W_{1}\) and \(W_{2}\) satisfy the assumptions of the Lindeberg-Feller central limit theorem. For \(W_{1}\), let \(T_{ni}=n^{-1/2}A_{n}M_{I}^{-1}X_{Ii}\varepsilon _{i}\). For any \(\delta >0\), we have
Applying the argument of Craven and Wahba (1978) and assumption (C1), we have \(P\left( \Vert T_{ni}\Vert >\delta \right) \le \text {E}{\Vert T_{ni}\Vert }^{2}/n\delta ^{2}\le \sigma ^{2}\lambda _{\max }\left( A_{n}A_{n}^{\text {T}} \right) /n\delta ^{2}\lambda _{\min }^{2}\left( M_{I} \right) =O_{P}\left( n^{-1} \right) \), and
then, we obtain
which implies that \(W_{1}\) satisfies the conditions of the Lindeberg-Feller central limit theorem. For \(W_{2}\), under (A2), we reach the same conclusion, and the two terms are uncorrelated. Thus, \(\text {Var}\left( W_{1}-W_{2} \right) =A_{n}\left( \sigma ^{2}M_{I}^{-1}+R \right) A_{n}^{\text {T}}\rightarrow H\), where H is a \(p\times p\) nonnegative definite symmetric matrix. The above two steps complete the proof of Theorem 3.2.
Proof of Theorem 3.3: By Theorem 3.1, we have \(\widehat{\beta }-\beta _{0}=O_{P}\left( n^{-1/2}+a_{n} \right) \) and \(\widehat{X}-X=o_{P}(1)\). Then
Cite this article
Pan, Y., Cai, W. & Liu, Z. Inference for non-probability samples under high-dimensional covariate-adjusted superpopulation model. Stat Methods Appl 31, 955–979 (2022). https://doi.org/10.1007/s10260-021-00619-w
DOI: https://doi.org/10.1007/s10260-021-00619-w