Inference for non-probability samples under high-dimensional covariate-adjusted superpopulation model

Pan, Yingli; Cai, Wen; Liu, Zhan

doi:10.1007/s10260-021-00619-w

Original Paper
Published: 20 January 2022

Volume 31, pages 955–979, (2022)
Statistical Methods & Applications

Abstract

Non-probability samples have become increasingly popular in survey sampling because of their lower costs, shorter time durations and higher efficiency. In the high-dimensional superpopulation modeling approach for non-probability samples, a model for the analysis variable is fitted to a non-probability sample and is then used to project the sample to the full population. In practice, there are situations in which the covariates in the modeling process are not directly observed but are contaminated by a multiplicative factor determined by the value of an unknown function of an observable confounder. In this paper, we propose to calibrate the covariates by nonparametrically regressing the observable contaminated covariates on the confounder. We employ the SCAD-penalized least squares method to investigate the variable selection and inference problems for non-probability samples based on the calibrated covariates. A SCAD-penalized estimator of the parameter and an estimator of the population mean are obtained. Under some mild assumptions, we establish the “oracle property” of the proposed SCAD-penalized estimator and the consistency of the proposed population mean estimator. Simulation studies are conducted to assess the finite-sample performance of the proposed method. An application to a Boston housing price study demonstrates the utility of the proposed method in practice.
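The calibration idea in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a single confounder U, a distortion function with mean one that is independent of the latent covariate, and a Nadaraya-Watson smoother with a Gaussian kernel and a hand-picked bandwidth. The names `nw_regress` and `calibrate` are hypothetical.

```python
import numpy as np

def nw_regress(u_train, y_train, u_eval, bandwidth):
    """Nadaraya-Watson estimate of E[y | U = u] with a Gaussian kernel."""
    diffs = (u_eval[:, None] - u_train[None, :]) / bandwidth
    weights = np.exp(-0.5 * diffs**2)
    return (weights @ y_train) / weights.sum(axis=1)

def calibrate(x_tilde, u, bandwidth):
    """Calibrated covariate: x_tilde * mean(x_tilde) / E_hat[x_tilde | U]."""
    g_hat = nw_regress(u, x_tilde, u, bandwidth)
    return x_tilde * x_tilde.mean() / g_hat

# Simulated data (assumption): the observed covariate is psi(U) * X with E[psi(U)] = 1.
rng = np.random.default_rng(0)
n = 2000
u = rng.uniform(0.0, 1.0, n)
x = rng.normal(2.0, 0.5, n)        # latent covariate, independent of u
psi = 1.0 + 0.5 * (u - 0.5)        # distortion function with mean one
x_tilde = psi * x                  # contaminated covariate that is actually observed

x_hat = calibrate(x_tilde, u, bandwidth=0.1)
err_raw = np.mean(np.abs(x_tilde - x))   # error when the distortion is ignored
err_cal = np.mean(np.abs(x_hat - x))     # error after calibration
print(err_raw, err_cal)
```

In this toy setting the calibrated covariate should track the latent one more closely than the raw contaminated covariate; the bandwidth would normally be chosen by a data-driven rule rather than fixed by hand.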

References

Baker R, Brick JM, Bates NA, Battaglia M, Couper MP, Dever JA, Tourangeau R (2013) Summary report of the AAPOR task force on non-probability sampling. J Surv Stat Methodol 1(2):90–143
Bethlehem J (2016) Solving the nonresponse problem with sample matching? Soc Sci Comput Rev 34(1):59–77
Chen JKT, Valliant RL, Elliott MR (2019) Calibrating non-probability surveys to estimated control totals using LASSO, with an application to political polling. J R Stat Soc Ser C (Appl Stat) 68(3):657–681
Cooper D, Greenaway M (2015) Non-probability survey sampling in official statistics. Retrieved from Office for National Statistics website: https://www.google.com/url
Craven P, Wahba G (1978) Smoothing noisy data with spline functions. Numer Math 31(4):377–403
Cui X, Guo W, Lin L, Zhu L (2009) Covariate-adjusted nonlinear regression. Ann Stat 37(4):1839–1870
Cui X (2008) Statistical analysis of two types of complex data and its associated model. Ph.D. Thesis, Shandong University, Jinan
Delaigle A, Hall P, Zhou WX (2016) Nonparametric covariate-adjusted regression. Ann Stat 44(5):2190–2220
Elliott MR, Valliant R (2017) Inference for non-probability samples. Stat Sci 32(2):249–264
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96(456):1348–1360
Harrison D, Rubinfeld DL (1978) Hedonic housing prices and the demand for clean air. J Environ Econ Manag 5(1):81–102
Keiding N, Louis TA (2016) Perils and potentials of self-selected entry to epidemiological studies and surveys. J R Stat Soc Ser A (Stat Soc) 179(2):319–376
Kim JK, Park S, Chen Y, Wu C (2018) Combining non-probability and probability survey samples through mass imputation. arXiv preprint arXiv:1812.10694
Li F, Lin L, Cui X (2010) Covariate-adjusted partially linear regression models. Commun Stat Theory Methods 39(6):1054–1074
Li X, Du J, Li G, Fan M (2014) Variable selection for covariate adjusted regression model. J Syst Sci Complexity 27(6):1227–1246
Meijer RJ, Goeman JJ (2013) Efficient approximate k-fold and leave-one-out cross-validation for ridge regression. Biometrical J 55(2):141–155
Nguyen DV, Şentürk D (2008) Multicovariate-adjusted regression models. J Stat Comput Simul 78(9):813–827
Schreuder HT, Gregoire TG, Weyer JP (2001) For what applications can probability and non-probability sampling be used? Environ Monit Assess 66(3):281–291
Şentürk D, Müller HG (2005) Covariate adjusted correlation analysis via varying coefficient models. Scand J Stat 32(3):365–383
Şentürk D, Müller HG (2005) Covariate-adjusted regression. Biometrika 92(1):75–89
Şentürk D, Müller HG (2009) Covariate-adjusted generalized linear models. Biometrika 96(2):357–370
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B (Methodol) 58(1):267–288
Yang S, Kim JK, Song R (2020) Doubly robust inference when combining probability and non-probability samples with high dimensional data. J R Stat Soc Ser B (Stat Methodol) 82(2):445–465
Żądło T (2009) On MSE of EBLUP. Stat Papers 50(1):101–118
Zhang J, Zhu LP, Zhu LX (2012) On a dimension reduction regression with covariate adjustment. J Multivariate Anal 104(1):39–55
Zhang J, Yu Y, Zhu L, Liang H (2013) Partial linear single index models with distortion measurement errors. Ann Inst Stat Math 65(2):237–267
Zhang L (2019) On valid descriptive inference from non-probability sample. Stat Theory Related Fields 3(2):103–113
Zhu LX, Fang KT (1996) Asymptotics for kernel estimate of sliced inverse regression. Ann Stat 24(3):1053–1068
Zou H (2008) A note on path-based variable selection in the penalized proportional hazards model. Biometrika 95(1):241–247

Acknowledgements

This work is supported by the National Natural Science Foundation of China (NSFC) (No. 11901175) and Fundamental Research Funds for Hubei Key Laboratory of Applied Mathematics, Hubei University (No. HBAM 201907).

Author information

Authors and Affiliations

Hubei Key Laboratory of Applied Mathematics, Faculty of Mathematics and Statistics, Hubei University, Wuhan, 430062, China
Yingli Pan, Wen Cai & Zhan Liu

Corresponding author

Correspondence to Zhan Liu.

Appendix

The detailed proofs of the three theorems are given below. Under assumptions (B1)-(B4) and (C1)-(C6), similarly to Lemma 2.1 in Cui (2008), we obtain

$$\begin{aligned}&(A1)\quad \frac{1}{\sqrt{n}}\left( \widehat{X}-X \right) ^{\text {T}}Y=o_{P}\left( 1 \right) \\&(A2)\quad \frac{1}{n}X^{\text {T}}\left( \widehat{X}-X \right) =M\text {diag}\left\{ \frac{1}{n}\sum _{i=1}^{n}\limits \frac{\left( \widetilde{X}_{i1}-X_{i1} \right) }{\text {E}{X_{1}}}, \ldots , \frac{1}{n}\sum _{i=1}^{n}\limits \frac{\left( \widetilde{X}_{iq}-X_{iq} \right) }{\text {E}{X_{q}}}\right\} +o_{P}\left( n^{-1/2} \right) \\&(A3)\quad \frac{1}{n}\left( \widehat{X}-X \right) ^{\text {T}}\left( \widehat{X}-X \right) =o_{P}\left( n^{-1/2} \right) \\&(A4)\quad \frac{1}{n}\widehat{X}^{\text {T}}\widehat{X}-\frac{1}{n}X^{\text {T}}X=O_{P}\left( n^{-1/2} \right) \end{aligned}$$

Proof of Theorem 3.1: Let $w_{n}=n^{-1/2}+a_{n}$ and $\Vert u\Vert =C$, where C is a sufficiently large constant. It is sufficient to show that, for any given $\varepsilon >0$, there exists a large enough constant C such that

$$\begin{aligned} P\{\inf _{\Vert u\Vert =C}\limits L_{n}(\beta _{0}+w_{n}u)-L_{n}(\beta _{0})> 0\}\ge 1-\varepsilon , \end{aligned}$$

which implies that, with probability at least $1-\varepsilon $, there exists a local minimizer in the ball $\{\beta _{0}+w_{n}u:\Vert u\Vert \le C\}$. Hence, there exists a local minimizer $\widehat{\beta }$ such that $\Vert \widehat{\beta }-\beta _{0}\Vert =O_{P}(w_{n})$.

From (2.8), using $p_{\lambda }\left( 0 \right) =0$, we obtain

$$\begin{aligned}&L_{n}(\beta _{0}+w_{n}u)-L_{n}(\beta _{0})\\&\quad \ge \frac{1}{2}\sum _{i=1}^{n}\limits \left( Y_{i}-\widehat{X}_{i}^{\text {T}}\left( \beta _{0}+w_{n}u \right) \right) ^{2}-\frac{1}{2}\sum _{i=1}^{n}\limits \left( Y_{i}-\widehat{X}_{i}^{\text {T}}\beta _{0} \right) ^{2}\\&\quad \quad +n\sum _{j=1}^{k}\limits \left\{ p_{\lambda }\left( \left| \beta _{0j}+w_{n}u_{j}\right| \right) -p_{\lambda }\left( \left| \beta _{0j}\right| \right) \right\} \\ \overset{\wedge }{=}&L_{I}+L_{II}, \end{aligned}$$

where k is the dimension of $\beta _{I0}$, and $\beta _{0j}$ is the jth element of $\beta _{0}$. Then, we analyze the above difference in two steps.
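For readers who want to evaluate the objective numerically, the following is a minimal sketch of a penalized least squares criterion of the form implied by the decomposition above: half the residual sum of squares on the calibrated covariates plus n times the summed penalty. Equation (2.8) itself is not reproduced in this appendix, so the exact form is an assumption, as is the use of the SCAD penalty of Fan and Li (2001) with the conventional default a = 3.7. The function names are hypothetical.

```python
import numpy as np

def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty p_lambda(|theta|) of Fan and Li (2001), applied elementwise."""
    t = np.abs(np.asarray(theta, dtype=float))
    linear = lam * t                                             # region |theta| <= lam
    quadratic = -(t**2 - 2.0 * a * lam * t + lam**2) / (2.0 * (a - 1.0))
    constant = (a + 1.0) * lam**2 / 2.0                          # region |theta| > a * lam
    return np.where(t <= lam, linear, np.where(t <= a * lam, quadratic, constant))

def penalized_objective(beta, x_hat, y, lam, a=3.7):
    """0.5 * sum_i (y_i - x_hat_i' beta)^2 + n * sum_j p_lambda(|beta_j|)."""
    resid = y - x_hat @ beta
    n = y.shape[0]
    return 0.5 * resid @ resid + n * scad_penalty(beta, lam, a).sum()

# Tiny illustrative inputs (made up for the example).
x_hat = np.eye(2)
y = np.array([1.0, 2.0])
beta = np.array([1.0, 0.0])
value = penalized_objective(beta, x_hat, y, lam=0.5)
print(value)
```

Because the penalty vanishes at zero, the penalty difference in the display above only involves the k nonzero components of the true parameter.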

Step 1: Show that

$$\begin{aligned} L_{I}=L_{11}+L_{21}+L_{22}+L_{23}+L_{24}+o_{P}\left( w_{n}^{2}n/2 \right) \Vert u\Vert ^{2}. \end{aligned}$$

(A.1)

For $L_{I}$, we decompose it into two parts by performing a simple calculation, and obtain

$$\begin{aligned} L_{I}=&\frac{w_{n}^{2}}{2}\sum _{i=1}^{n}\limits \left( \widehat{X}_{i}^{\text {T}}u \right) ^{2}-w_{n}\sum _{i=1}^{n}\limits \left( Y_{i}-\widehat{X}_{i}^{\text {T}}\beta _{0} \right) \widehat{X}_{i}^{\text {T}}u\\ \overset{\wedge }{=}&L_{1}+L_{2}, \end{aligned}$$

where $L_{1}=w_{n}^{2}u^{\text {T}}X^{\text {T}}Xu/2+w_{n}^{2}u^{\text {T}}\left( \widehat{X}^{\text {T}}\widehat{X}-X^{\text {T}}X \right) u/2$. By (A4), $w_{n}^{2}u^{\text {T}} \left( \widehat{X}^{\text {T}}\widehat{X}-X^{\text {T}}X \right) u/2=O_{P}\left( w_{n}^{2}n^{1/2}/2 \right) \Vert u\Vert ^{2}$, so we obtain

$$\begin{aligned} L_{1}&=w_{n}^{2}u^{\text {T}}X^{\text {T}}Xu/2+O_{P}\left( w_{n}^{2}n^{1/2}/2 \right) \Vert u\Vert ^{2} \nonumber \\&\overset{\wedge }{=}L_{11}+o_{P}\left( w_{n}^{2}n/2 \right) \Vert u\Vert ^{2}. \end{aligned}$$

(A.2)

By performing a simple calculation, we obtain

$$\begin{aligned} L_{11}=\frac{w_{n}^{2}}{2}nu^{\text {T}}\left( \frac{1}{n}X^{\text {T}}X-\text {E}{\left( XX^{\text {T}} \right) } \right) u+\frac{w_{n}^{2}}{2}nu^{\text {T}}\text {E}{\left( XX^{\text {T}} \right) }u. \end{aligned}$$

Notice that

$$\begin{aligned} P\left( \Vert \frac{1}{n}X^{\text {T}}X-\text {E}{\left( XX^{\text {T}} \right) }\Vert \ge \varepsilon \right) \le \frac{n}{n^{2}\varepsilon ^{2}}\text {E}{\sum _{i,j=1}^{q}\left\{ X_{i}X_{j}-\text {E}{\left( X_{i}X_{j} \right) }\right\} ^{2}}=O\left( \frac{1}{n} \right) , \end{aligned}$$

which implies that

$$\begin{aligned} \Vert \frac{1}{n}X^{\text {T}}X-\text {E}{\left( XX^{\text {T}} \right) }\Vert =O_{P}\left( n^{-1/2} \right) =o_{P}(1). \end{aligned}$$
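The root-n rate in the last display can be checked empirically. The sketch below is only an illustration under assumptions that are not part of the paper: the covariates are taken to be i.i.d. standard normal, so that the second-moment matrix is the identity, and the Frobenius norm is used. The helper `gram_error` is a hypothetical name.

```python
import numpy as np

def gram_error(n, q, rng):
    """Frobenius norm of (1/n) X'X - I_q for X with i.i.d. N(0, 1) entries."""
    x = rng.standard_normal((n, q))
    return np.linalg.norm(x.T @ x / n - np.eye(q))

rng = np.random.default_rng(1)
q = 3
sizes = [200, 2000, 20000]

# Average the error over 50 replications for each sample size.
mean_err = {n: np.mean([gram_error(n, q, rng) for _ in range(50)]) for n in sizes}

# If the error is O_P(n^{-1/2}), multiplying by sqrt(n) should give a stable value.
scaled = {n: np.sqrt(n) * err for n, err in mean_err.items()}
print(mean_err)
print(scaled)
```

The averaged error should shrink as n grows while the rescaled values stay of the same order, which is the behaviour the Chebyshev bound predicts.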

From the above argument, we can rewrite equation (A.2) as

$$\begin{aligned} L_{1}=\frac{w_{n}^{2}}{2}nu^{\text {T}}\text {E}{\left( XX^{\text {T}} \right) }u+o_{P}\left( nw_{n}^{2}/2 \right) \Vert u\Vert ^{2}. \end{aligned}$$

Next, we focus on $L_{2}$. As $\varepsilon _{i}=Y_{i}-X_{i}^{\text {T}}\beta _{0}$, we perform a simple calculation on $L_{2}$ and obtain

$$\begin{aligned} L_{2}=&-w_{n}\sum _{i=1}^{n}\limits \left\{ \left( \varepsilon _{i}-\left( \widehat{X}_{i}-X_{i} \right) ^{\text {T}}\beta _{0} \right) \left( X_{i}^{\text {T}}u+\left( \widehat{X}_{i}-X_{i} \right) ^{\text {T}}u \right) \right\} \\ =&-w_{n}\varepsilon ^{\text {T}}Xu-w_{n}\varepsilon ^{\text {T}}\left( \widehat{X}-X \right) u+w_{n}u^{\text {T}}X^{\text {T}}\left( \widehat{X}-X \right) \beta _{0}\\&+w_{n}u^{\text {T}}\left( \widehat{X}-X \right) ^{\text {T}}\left( \widehat{X}-X \right) \beta _{0}\\&\overset{\wedge }{=}L_{21}+L_{22}+L_{23}+L_{24}. \end{aligned}$$

Due to $\left| L_{21}\right| \le w_{n}\Vert \varepsilon ^{\text {T}}X\Vert \Vert u\Vert =w_{n}\Vert \sum _{i=1}^{n}\varepsilon _{i}X_{i}\Vert \Vert u\Vert $ and assumption (C1), it implies that

$$\begin{aligned} \text {E}{\Vert \sum _{i=1}^{n}\limits \varepsilon _{i}X_{i}\Vert ^{2}}=\text {E}{\Vert \sum _{l=1}^{q}\limits \sum _{i=1}^{n}\limits \sum _{j=1}^{n}\limits \varepsilon _{i}\varepsilon _{j}X_{il}X_{jl}\Vert }\le \sigma ^{2}n, \end{aligned}$$

Then, since $n^{-1/2}\le w_{n}$, we obtain $\left| L_{21}\right| \le O_{P}\left( w_{n}n^{1/2}\Vert u\Vert \right) =O_{P}\left( nw_{n}^{2} \right) \Vert u\Vert $. Similarly to the proof of Lemma 2.1 in Cui (2008), we have $\left| L_{22}\right| =w_{n}\left| \varepsilon ^{\text {T}}\left( \widehat{X}-X \right) u\right| =o_{P}\left( nw_{n}^{2} \right) \Vert u\Vert $ and $L_{23}=L_{24}=o_{P}\left( nw_{n}^{2} \right) \Vert u\Vert $. Based on the above arguments, (A.1) is obtained.

Step 2: Show that

$$\begin{aligned} L_{II}\le nw_{n}^{2}\Vert u\Vert +nb_{n}w_{n}^{2}\Vert u\Vert ^{2}. \end{aligned}$$

We perform a second-order Taylor expansion of $p_{\lambda }\left( \left| \beta _{0j}+w_{n}u_{j}\right| \right) $ around $\left| \beta _{0j}\right| $ and obtain

$$\begin{aligned} L_{II}&=n\sum _{j=1}^{k}\limits \left\{ p_{\lambda }\left( \left| \beta _{0j}+w_{n}u_{j}\right| \right) -p_{\lambda }\left( \left| \beta _{0j}\right| \right) \right\} \\&=n\sum _{j=1}^{k}\limits \left\{ p_{\lambda }^{'}\left( \left| \beta _{0j}\right| \right) sgn\left( \beta _{0j} \right) w_{n}u_{j}+\frac{1}{2}p_{\lambda }^{''}\left( \left| \beta _{0j}\right| \right) w_{n}^{2}u_{j}^{2}\left( 1+o(1) \right) \right\} . \end{aligned}$$

Using the definitions $a_{n}=\max \left\{ p_{\lambda }^{'}\left( \left| \beta _{0j}\right| \right) :\beta _{0j}\ne 0\right\} $ and $b_{n}=\max \left\{ \left| p^{''}_{\lambda }\left( \left| \beta _{0j}\right| \right) \right| :\beta _{0j}\ne 0\right\} $, together with the inequality $\left| sgn\left( \beta _{0j} \right) \right| \le 1$, we have

$$\begin{aligned} n\sum _{j=1}^{k}\limits \left\{ p_{\lambda }^{'}\left( \left| \beta _{0j}\right| \right) sgn\left( \beta _{0j} \right) w_{n}u_{j}\right\} \le&n\sum _{j=1}^{k}\limits \left\{ \left| p_{\lambda }^{'}\left( \left| \beta _{0j}\right| \right) sgn\left( \beta _{0j} \right) w_{n}u_{j}\right| \right\} \\ \le&n\sum _{j=1}^{k}\limits \left\{ |a_{n}|w_{n}|u_{j}|\right\} =n|a_{n}|w_{n}\Vert u\Vert , \end{aligned}$$

and

$$\begin{aligned} n\sum _{j=1}^{k}\limits \left\{ p_{\lambda }^{''}\left( |\beta _{0j}| \right) w_{n}^{2}u_{j}^{2}\left( 1+o(1) \right) \right\} /2\le nb_{n}w_{n}^{2}\Vert u\Vert ^{2}. \end{aligned}$$

Due to $w_{n}=n^{-1/2}+a_{n}$, which implies $a_{n}\le w_{n}$, we obtain

$$\begin{aligned} L_{II}\le a_{n}nw_{n}\Vert u\Vert +nb_{n}w_{n}^{2}\Vert u\Vert ^{2}\le w_{n}^{2}n\Vert u\Vert +nb_{n}w_{n}^{2}\Vert u\Vert ^{2}. \end{aligned}$$
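To see how the quantities $a_{n}$ and $b_{n}$ behave, the sketch below evaluates them for the SCAD penalty, with $a_{n}$ read as the largest first derivative of the penalty over the nonzero true coefficients, which is the role it plays in the bound above (following Fan and Li 2001). The coefficient vector, the tuning values and the function names are illustrative assumptions, not values from the paper.

```python
import numpy as np

def scad_d1(theta, lam, a=3.7):
    """First derivative p'_lambda(theta) of the SCAD penalty for theta >= 0."""
    t = np.asarray(theta, dtype=float)
    return np.where(t <= lam, lam, np.maximum(a * lam - t, 0.0) / (a - 1.0))

def scad_d2(theta, lam, a=3.7):
    """Second derivative p''_lambda(theta); nonzero only on (lam, a * lam)."""
    t = np.asarray(theta, dtype=float)
    return np.where((t > lam) & (t < a * lam), -1.0 / (a - 1.0), 0.0)

def penalty_rates(beta0, lam, a=3.7):
    """Return (a_n, b_n) computed over the nonzero entries of beta0."""
    nonzero = np.abs(beta0[beta0 != 0])
    a_n = scad_d1(nonzero, lam, a).max()
    b_n = np.abs(scad_d2(nonzero, lam, a)).max()
    return a_n, b_n

beta0 = np.array([3.0, 1.5, 0.0, 0.0, 2.0])   # hypothetical true coefficients
small = penalty_rates(beta0, lam=0.2)          # every nonzero |beta_0j| exceeds a * lam
large = penalty_rates(beta0, lam=1.0)          # some |beta_0j| fall inside (lam, a * lam)
print(small, large)
```

When the tuning parameter is small relative to the nonzero coefficients, both quantities vanish for SCAD, so $w_{n}$ reduces to $n^{-1/2}$ and the penalty terms in $L_{II}$ drop out of the comparison.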

From the above results, the leading term of $L_{I}$, namely $\frac{w_{n}^{2}}{2}nu^{\text {T}}\text {E}{\left( XX^{\text {T}} \right) }u$, dominates all of the other terms uniformly in $\Vert u\Vert =C$ when a sufficiently large C is chosen. As this term is positive, this completes the proof of Theorem 3.1.

Proof of Theorem 3.2: To improve readability, we divide the proof of Theorem 3.2 into two steps. In Step 1, we show weak consistency; by Theorem 3.1, there exists a root-$n$ consistent local minimizer $\widehat{\beta }$. In Step 2, we prove the asymptotic normality of the penalized least squares estimator.

Step 1: It is sufficient to show that with probability tending to 1 as $n\rightarrow \infty $, for any $\beta $ satisfying $\beta _{I}-\beta _{I0}=O_{P}(n^{-1/2})$ and $j=k+1, \ldots , q$, we have

$$\begin{aligned} \frac{\partial L_{n}\left( \beta \right) }{\partial \beta _{j}}= \left\{ \begin{array}{l} > 0,\quad \text {for}\quad 0<\beta _{j}<\varepsilon _{n},\\< 0,\quad \text {for}\quad -\varepsilon _{n}<\beta _{j}<0. \end{array} \right. \end{aligned}$$

(A.3)

To show (A.3), considering the partial derivative of $L_{n}(\beta )$ at any differentiable point $\beta =\left( \beta _{1}, \ldots , \beta _{q} \right) $, we obtain

$$\begin{aligned} \frac{\partial L_{n}\left( \beta \right) }{\partial \beta _{j}}=&-\sum _{i=1}^{n}\limits \left( Y_{i}-\widehat{X}_{i}^{\text {T}}\beta \right) \widehat{X}_{ij}+np_{\lambda }^{'}\left( \left| \beta _{j}\right| \right) sgn\left( \beta _{j} \right) \\ =&-\sum _{i=1}^{n}\limits \left( \varepsilon _{i}-X_{i}^{\text {T}}\left( \beta -\beta _{0} \right) \right) X_{ij}-\sum _{i=1}^{n}\limits \left( \varepsilon _{i}-X_{i}^{\text {T}}\left( \beta -\beta _{0} \right) \right) \left( \widehat{X}_{ij}-X_{ij} \right) \\&+\sum _{i=1}^{n}\limits \left( \widehat{X}_{i}-X_{i} \right) ^{\text {T}}\beta X_{ij}+\sum _{i=1}^{n}\limits \left( \widehat{X}_{i}-X_{i} \right) ^{\text {T}}\beta \left( \widehat{X}_{ij}-X_{ij} \right) +np_{\lambda }^{'}\left( \left| \beta _{j}\right| \right) sgn\left( \beta _{j} \right) \\ \overset{\wedge }{=}&P_{1}+P_{2}+P_{3}+P_{4}+P_{5}, \end{aligned}$$

where $\beta _{0}$ is the true value of $\beta $, $j=k+1, \ldots , q$.

We first consider $P_{1}$. By Theorem 3.1, for any $\beta $ satisfying $\beta _{I}-\beta _{I0}=O_{P}\left( n^{-1/2} \right) $ and $\left| \beta _{II}-\beta _{II0}\right| \le \varepsilon _{n}=Cn^{-1/2}$ with an arbitrary positive constant C, it is easy to show that

$$\begin{aligned} \text {E}\left( \max _{k+1\le j \le q} \left| \sum _{i=1}^{n}\limits \varepsilon _{i}X_{ij}\right| \right) \le \text {E}^{1/2}\left( \sum _{j=k+1}^{q}\limits \left| \sum _{i=1}^{n}\limits \varepsilon _{i}X_{ij}\right| ^{2} \right) \le \sigma n^{1/2} \end{aligned}$$

and

$$\begin{aligned} \max _{k+1\le j \le q}\left| \sum _{i=1}^{n}\limits X_{i}^{\text {T}}\left( \beta -\beta _{0} \right) X_{ij}\right|&\le \Vert \beta -\beta _{0}\Vert \max _{k+1\le j \le q}\sqrt{X_{\cdot j}^{\text {T}}XX^{\text {T}}X_{\cdot j}}\\&\le Cn^{-1/2}\max _{k+1\le j \le q} \Vert X_{\cdot j}\Vert \lambda _{\max }^{1/2}\left( L_{n} \right) \\&=O_{P}\left( n^{1/2} \right) . \end{aligned}$$

From the above argument, $P_{1}=O_{P}\left( n^{1/2} \right) $. By (A2), (A4) and a similar argument, we get $P_{i}=o_{P}(1), i=2, 3, 4$.

Using the above arguments, $\liminf \limits _{n\rightarrow \infty }\liminf \limits _{\nu \rightarrow 0^{+}}p_{\lambda }^{'}\left( \nu \right) /\lambda >0$ and $n^{-1/2}/\lambda \rightarrow 0$, we obtain

$$\begin{aligned} \frac{\partial L_{n}\left( \beta \right) }{\partial \beta _{j}}&=-O_{P}\left( n^{1/2} \right) +o_{P}\left( 1 \right) +np^{'}_{\lambda }\left( |\beta _{j}| \right) sgn\left( \beta _{j} \right) \\&=n\lambda \left\{ \lambda ^{-1}p^{'}_{\lambda }\left( |\beta _{j}| \right) sgn\left( \beta _{j} \right) -O_{P}\left( n^{-1/2}/\lambda \right) \right\} , \end{aligned}$$

so the sign of the derivative is completely determined by that of $\beta _{j}$. Hence, (A.3) follows, which completes Step 1.
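The practical meaning of the sign condition (A.3) is that small coefficients are pushed exactly to zero. A simple way to see this is the closed-form SCAD solution for an orthonormal design given by Fan and Li (2001). This special case is an assumption made purely for illustration and is not the setting of the paper; `scad_threshold` is a hypothetical name and the input values are made up.

```python
import numpy as np

def scad_threshold(z, lam, a=3.7):
    """SCAD thresholding rule for an orthonormal design (Fan and Li 2001)."""
    z = np.asarray(z, dtype=float)
    absz = np.abs(z)
    soft = np.sign(z) * np.maximum(absz - lam, 0.0)              # |z| <= 2 * lam
    middle = ((a - 1.0) * z - np.sign(z) * a * lam) / (a - 2.0)  # 2 * lam < |z| <= a * lam
    return np.where(absz <= 2.0 * lam, soft, np.where(absz <= a * lam, middle, z))

z = np.array([0.3, -0.4, 1.0, 1.5, 5.0])   # hypothetical least squares coefficients
beta_scad = scad_threshold(z, lam=0.5)
print(beta_scad)
```

Coefficients whose least squares estimates fall below the tuning parameter in absolute value are set to zero, while large coefficients are left unshrunk, which is the behaviour behind the oracle property.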

Step 2:

Applying Taylor’s theorem to $\nabla L_{n}\left( \widehat{\beta _{I}} \right) $ at $\beta _{I0}$, and noting that $\widehat{\beta _{I}}$ must satisfy the penalized least squares equation $\nabla L_{n}\left( \widehat{\beta _{I}} \right) =0$, we have

$$\begin{aligned} 0=\nabla L_{n}\left( \widehat{\beta _{I}} \right) =\nabla L_{n}\left( \beta _{I0} \right) +\nabla ^{2} L_{n}\left( \beta _{I0}^{*} \right) \left( \widehat{\beta _{I}}-\beta _{I0} \right) , \end{aligned}$$

where $\beta _{I0}^{*}$ lies between $\widehat{\beta _{I}}$ and $\beta _{I0}$. Using the definitions of $L_{n}\left( \cdot \right) $ and $\varepsilon =Y-X\beta _{0}$, we have

$$\begin{aligned} \nabla L_{n}\left( \beta _{I0} \right) =&-\widehat{X}^{\text {T}}_{I}\left( Y-\widehat{X}_{I}\beta _{I0} \right) +n\nabla p_{\lambda }\left( |\beta _{I0}| \right) \nonumber \\ =&-X_{I}^{\text {T}}\varepsilon +X_{I}^{\text {T}}\left( \widehat{X}_{I}-X_{I} \right) \beta _{I0}-\left( \widehat{X}_{I}-X_{I} \right) ^{\text {T}}\varepsilon \nonumber \\&+\left( \widehat{X}_{I}-X_{I} \right) ^{\text {T}}\left( \widehat{X}_{I}-X_{I} \right) \beta _{I0}+n\nabla p_{\lambda }\left( |\beta _{I0}| \right) \end{aligned}$$

(A.4)

and

$$\begin{aligned} \nabla ^{2} L_{n}\left( \beta _{I0}^{*} \right) =&\widehat{X}_{I}^{\text {T}}\widehat{X}_{I}+n\nabla ^{2}p_{\lambda }\left( |\beta _{I0}^{*}| \right) \\ =&X^{\text {T}}_{I}X_{I}+\left( \widehat{X}^{\text {T}}_{I}\widehat{X}_{I}-X^{\text {T}}_{I}X_{I} \right) +n\nabla ^{2}p_{\lambda }\left( |\beta _{I0}^{*}| \right) , \end{aligned}$$

where $\nabla p_{\lambda }\left( |\beta _{I0}| \right) =\left( p^{'}_{\lambda }\left( |\beta _{01}| \right) sgn\left( \beta _{01} \right) , \ldots , p^{'}_{\lambda }\left( |\beta _{0k}| \right) sgn\left( \beta _{0k} \right) \right) _{k\times 1}^{\text {T}}$, and $\nabla ^{2}p_{\lambda }\left( |\beta _{I0}^{*}| \right) $ is the diagonal matrix whose diagonal elements are $p^{''}_{\lambda }\left( \beta _{0j}^{*} \right) , j=1, 2, \ldots , k$.

For the first term of (A.4), since $n^{-1}\left( \widehat{X}_{I}^{\text {T}}\widehat{X}_{I}-X_{I}^{\text {T}}X_{I} \right) =O_{P}(n^{-1/2})$ by (A4), we have

$$\begin{aligned}&\frac{1}{n}\widehat{X}_{I}^{\text {T}}\left( Y-\widehat{X}_{I}\beta _{I0} \right) \nonumber \\ \rightarrow&\left( \frac{1}{n}X^{\text {T}}_{I}X_{I}+\nabla ^{2}p_{\lambda }\left( |\beta _{I0}^{*}| \right) \right) \left( \widehat{\beta _{I}}-\beta _{I0} \right) +\nabla p_{\lambda }\left( |\beta _{I0}| \right) \nonumber \\ \overset{\wedge }{=}&\left( M_{I}+\Sigma _{\lambda }\left( \beta _{I0} \right) \right) \left( \widehat{\beta _{I}}-\beta _{I0} \right) +P_{n}, \end{aligned}$$

(A.5)

and, by a simple calculation, we obtain

$$\begin{aligned} \frac{1}{\sqrt{n}}\widehat{X}^{\text {T}}_{I}\left( Y-\widehat{X}_{I}\beta _{I0} \right) \overset{\wedge }{=}\frac{1}{\sqrt{n}}X_{I}^{\text {T}}\varepsilon -\frac{1}{\sqrt{n}}X_{I}^{\text {T}}\left( \widehat{X}_{I}-{X}_{I} \right) \beta _{I0}+K_{1}-K_{2}, \end{aligned}$$

where $K_{1}=n^{-1/2}\left( \widehat{X}_{I}-{X}_{I} \right) ^{\text {T}}\varepsilon $ and $K_{2}=n^{-1/2}\left( \widehat{X}_{I}-{X}_{I} \right) ^{\text {T}}\left( \widehat{X}_{I}-{X}_{I} \right) \beta _{I0}$. By (A3) and $\widehat{X}-X=o_{P}(1)$, it follows that $K_{1}=K_{2}=o_{P}(1)$. Then, multiplying (A.5) by $\sqrt{n}A_{n}M_{I}^{-1}$, we obtain

$$\begin{aligned}&\sqrt{n}A_{n}M_{I}^{-1}\left( M_{I}+\Sigma _{\lambda }\left( \beta _{I0} \right) \right) \left\{ \left( \widehat{\beta _{I}}-\beta _{I0} \right) +\left( M_{I}+\Sigma _{\lambda }\left( \beta _{I0} \right) \right) ^{-1}P_{n}\right\} \\ =&\frac{1}{\sqrt{n}}A_{n}M_{I}^{-1}X_{I}^{\text {T}}\varepsilon -\frac{1}{\sqrt{n}}A_{n}M_{I}^{-1}X_{I}^{\text {T}}\left( \widehat{X}_{I}-X_{I} \right) \beta _{I0}+o_{P}(1)\\ \overset{\wedge }{=}&W_{1}-W_{2}+o_{P}(1). \end{aligned}$$

Next, we prove that $W_{1}$ and $W_{2}$ satisfy the assumptions of Lindeberg-Feller central limit theorem. For $W_{1}$, denote $T_{ni}=n^{-1/2}A_{n}M_{I}^{-1}X_{Ii}\varepsilon _{i}$, for any $\delta >0$, we have

$$\begin{aligned} \sum _{i=1}^{n}\limits \text {E}{\left\{ \Vert T_{ni}\Vert ^{2}\mathbbm {1}\left( \Vert T_{ni}\Vert>\delta \right) \right\} }=n\text {E}{\left\{ \Vert T_{ni}\Vert ^{2}\mathbbm {1}\left( \Vert T_{ni}\Vert>\delta \right) \right\} }\le n \left\{ \text {E}{\Vert T_{ni}\Vert ^{4}}\right\} ^{1/2}\left\{ P\left( \Vert T_{ni}\Vert >\delta \right) \right\} ^{1/2}. \end{aligned}$$

Applying the argument of Craven and Wahba (1978) and assumption (C1), we have $P\left( \Vert T_{ni}\Vert >\delta \right) \le \text {E}{\Vert T_{ni}\Vert }^{2}/n\delta ^{2}\le \sigma ^{2}\lambda _{\max }\left( A_{n}A_{n}^{\text {T}} \right) /n\delta ^{2}\lambda _{\min }^{2}\left( M_{I} \right) =O\left( n^{-1} \right) $, and

$$\begin{aligned} \text {E}{\left( \Vert T_{ni}\Vert ^{4} \right) }=\text {E}{\left( T_{ni}^{\text {T}}T_{ni} \right) }^{2}=O\left( n^{-2} \right) , \end{aligned}$$

then, we obtain

$$\begin{aligned} \sum _{i=1}^{n}\limits \text {E}{\left\{ \Vert T_{ni}\Vert ^{2}\mathbbm {1}\left( \Vert T_{ni}\Vert >\delta \right) \right\} }=O\left( n\frac{1}{n}\sqrt{\frac{1}{n}} \right) =o(1), \end{aligned}$$

which implies that $W_{1}$ satisfies the conditions of the Lindeberg-Feller central limit theorem. For $W_{2}$, under (A2), we reach the same conclusion, and $W_{1}$ and $W_{2}$ are uncorrelated. Thus, $\text {Var}\left( W_{1}-W_{2} \right) =A_{n}\left( \sigma ^{2}M_{I}^{-1}+R \right) A_{n}^{\text {T}}\rightarrow H$, where H is a $p\times p$ nonnegative definite symmetric matrix. The above two steps complete the proof of Theorem 3.2.

Proof of Theorem 3.3: By Theorem 3.1, we have $\widehat{\beta }-\beta _{0}=O_{P}\left( n^{-1/2}+a_{n} \right) $ and $\widehat{X}-X=o_{P}(1)$; then

$$\begin{aligned} \widehat{\overline{Y}}=&N^{-1}\sum _{i\in M}Y_{i}+N^{-1}\sum _{i\in \overline{M}}\widehat{X}_{i}^{\text {T}}\widehat{\beta } \\ =&N^{-1}\sum _{i\in M}Y_{i}+N^{-1}\sum _{i\in \overline{M}}\left( X_{i}+(\widehat{X}_{i}-X_{i}) \right) ^{\text {T}}\left( \beta _{0}+(\widehat{\beta }-\beta _{0}) \right) \\ =&N^{-1}\left( \sum _{i\in M}Y_{i}+\sum _{i\in \overline{M}}X_{i}^{\text {T}}\beta _{0} \right) +N^{-1}\sum _{i\in \overline{M}}X_{i}^{\text {T}} (\widehat{\beta }-\beta _{0}) \\&+N^{-1}\sum _{i\in \overline{M}}(\widehat{X}_{i}-X_{i})^{\text {T}}\beta _{0}+N^{-1}\sum _{i\in \overline{M}}(\widehat{X}_{i}-X_{i})^{\text {T}}(\widehat{\beta }-\beta _{0}) \\ =&\overline{Y}+N^{-1}O_{P}\left( n^{-1/2}+a_{n} \right) +N^{-1}o_{P}(1)+N^{-1}o_{P}\left( n^{-1/2}+a_{n} \right) \\ =&\overline{Y}+N^{-1}O_{P}\left( n^{-1/2}+a_{n} \right) . \end{aligned}$$
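The first line of the display above defines the population mean estimator: observed responses are summed over the sampled units in M, model predictions are summed over the remaining units, and the total is divided by the population size N. A minimal sketch follows; the function name and the toy inputs are hypothetical, and the calibrated covariates and the estimated coefficient vector are assumed to be available already.

```python
import numpy as np

def population_mean_estimate(y_sample, x_hat_rest, beta_hat):
    """(sum of observed y over M + sum of predictions over the rest) / N."""
    n_total = y_sample.shape[0] + x_hat_rest.shape[0]
    predictions = x_hat_rest @ beta_hat      # predicted responses for non-sampled units
    return (y_sample.sum() + predictions.sum()) / n_total

y_sample = np.array([1.0, 2.0, 3.0])                 # responses observed in the sample
x_hat_rest = np.array([[1.0, 0.0], [0.0, 1.0]])      # calibrated covariates outside the sample
beta_hat = np.array([2.0, 4.0])                      # estimated coefficients
y_bar_hat = population_mean_estimate(y_sample, x_hat_rest, beta_hat)
print(y_bar_hat)
```

Here the observed responses sum to 6 and the two predictions sum to 6, so with N = 5 the estimate is 2.4.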

About this article

Cite this article

Pan, Y., Cai, W. & Liu, Z. Inference for non-probability samples under high-dimensional covariate-adjusted superpopulation model. Stat Methods Appl 31, 955–979 (2022). https://doi.org/10.1007/s10260-021-00619-w

Accepted: 23 December 2021
Published: 20 January 2022
Issue Date: October 2022
DOI: https://doi.org/10.1007/s10260-021-00619-w
