
Inference for non-probability samples under high-dimensional covariate-adjusted superpopulation model

  • Original Paper
  • Statistical Methods & Applications

Abstract

Non-probability samples have become increasingly popular in survey sampling because of their lower costs, shorter time requirements and higher efficiency. In the high-dimensional superpopulation modeling approach for non-probability samples, a model for the analysis variable is fitted from a non-probability sample and used to project the sample to the full population. In practice, the covariates entering the model may not be directly observed but instead contaminated by a multiplicative factor determined by the value of an unknown function of an observable confounder. In this paper, we propose calibrating the covariates by nonparametrically regressing the observed contaminated covariates on the confounder. Based on the calibrated covariates, we employ the SCAD-penalized least squares method to study variable selection and inference for non-probability samples, obtaining a SCAD-penalized estimator of the parameter and an estimator of the population mean. Under some mild assumptions, we establish the “oracle property” of the proposed SCAD-penalized estimator and the consistency of the proposed population mean estimator. Simulation studies are conducted to assess the finite-sample performance of the proposed method, and an application to a Boston housing price study demonstrates its utility in practice.
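To make the pipeline concrete, the following is a minimal Python sketch of the two estimation stages, assuming a multiplicative distortion model \(\widetilde{X}=\psi (U)X\) with \(\text {E}\psi (U)=1\) and nonzero covariate means. The Nadaraya–Watson calibration, the local quadratic approximation for the SCAD fit, and all function names and tuning constants below are our own illustrative choices, not the authors' implementation.

```python
import numpy as np

def calibrate(X_tilde, U, h=0.1):
    """Calibrate multiplicatively distorted covariates X_tilde = psi(U) * X.

    Each column is regressed on the confounder U with a Nadaraya-Watson
    smoother; since E[psi(U)] = 1 implies E[X_tilde_j | U] = psi_j(U) E[X_j],
    the distortion is estimated by cond_mean / column mean and divided out.
    """
    K = np.exp(-0.5 * ((U[:, None] - U[None, :]) / h) ** 2)  # Gaussian kernel
    W = K / K.sum(axis=1, keepdims=True)                     # NW weights
    cond_mean = W @ X_tilde                                  # E[X_tilde_j | U_i]
    return X_tilde * X_tilde.mean(axis=0) / cond_mean        # calibrated X_hat

def scad_deriv(theta, lam, a=3.7):
    """Derivative p'_lambda of the SCAD penalty (Fan and Li 2001)."""
    t = np.abs(theta)
    return lam * ((t <= lam) +
                  np.maximum(a * lam - t, 0) / ((a - 1) * lam) * (t > lam))

def scad_pls(X_hat, y, lam, n_iter=50, eps=1e-8):
    """SCAD-penalized least squares via local quadratic approximation:
    iterate beta <- (X'X + n * diag(p'(|b_j|)/|b_j|))^{-1} X'y."""
    n = X_hat.shape[0]
    beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]          # OLS start
    for _ in range(n_iter):
        d = scad_deriv(beta, lam) / np.maximum(np.abs(beta), eps)
        beta = np.linalg.solve(X_hat.T @ X_hat + n * np.diag(d), X_hat.T @ y)
    beta[np.abs(beta) < 1e-6] = 0.0                          # enforce sparsity
    return beta
```

The population mean is then estimated by combining the observed \(Y_{i}\) on the sample with the projections \(\widehat{X}_{i}^{\text {T}}\widehat{\beta }\) on the out-of-sample units; see the sketch after the proof of Theorem 3.3 below.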


References

  • Baker R, Brick JM, Bates NA, Battaglia M, Couper MP, Dever JA, Tourangeau R (2013) Summary report of the AAPOR task force on non-probability sampling. J Surv Stat Methodol 1(2):90–143

  • Bethlehem J (2016) Solving the nonresponse problem with sample matching? Soc Sci Comput Rev 34(1):59–77

  • Chen JKT, Valliant RL, Elliott MR (2019) Calibrating non-probability surveys to estimated control totals using LASSO, with an application to political polling. J R Stat Soc Ser C (Appl Stat) 68(3):657–681

  • Cooper D, Greenaway M (2015) Non-probability survey sampling in official statistics. Retrieved from Office for National Statistics website: https://www.google.com/url

  • Craven P, Wahba G (1978) Smoothing noisy data with spline functions. Numer Math 31(4):377–403

  • Cui X, Guo W, Lin L, Zhu L (2009) Covariate-adjusted nonlinear regression. Ann Stat 37(4):1839–1870

  • Cui X (2008) Statistical analysis of two types of complex data and its associated model. Ph.D. Thesis, Shandong University, Jinan

  • Delaigle A, Hall P, Zhou WX (2016) Nonparametric covariate-adjusted regression. Ann Stat 44(5):2190–2220

  • Elliott MR, Valliant R (2017) Inference for non-probability samples. Stat Sci 32(2):249–264

  • Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96(456):1348–1360

  • Harrison D, Rubinfeld DL (1978) Hedonic housing prices and the demand for clean air. J Environ Econ Manag 5(1):81–102

  • Keiding N, Louis TA (2016) Perils and potentials of self-selected entry to epidemiological studies and surveys. J R Stat Soc Ser A (Stat Soc) 179(2):319–376

  • Kim JK, Park S, Chen Y, Wu C (2018) Combining non-probability and probability survey samples through mass imputation. arXiv preprint arXiv:1812.10694

  • Li F, Lin L, Cui X (2010) Covariate-adjusted partially linear regression models. Commun Stat Theory Methods 39(6):1054–1074

  • Li X, Du J, Li G, Fan M (2014) Variable selection for covariate adjusted regression model. J Syst Sci Complexity 27(6):1227–1246

  • Meijer RJ, Goeman JJ (2013) Efficient approximate k-fold and leave-one-out cross-validation for ridge regression. Biometrical J 55(2):141–155

  • Nguyen DV, Şentürk D (2008) Multicovariate-adjusted regression models. J Stat Comput Simul 78(9):813–827

  • Schreuder HT, Gregoire TG, Weyer JP (2001) For what applications can probability and non-probability sampling be used? Environ Monit Assess 66(3):281–291

  • Şentürk D, Müller HG (2005) Covariate adjusted correlation analysis via varying coefficient models. Scand J Stat 32(3):365–383

  • Şentürk D, Müller HG (2005) Covariate-adjusted regression. Biometrika 92(1):75–89

  • Şentürk D, Müller HG (2009) Covariate-adjusted generalized linear models. Biometrika 96(2):357–370

  • Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B (Methodol) 58(1):267–288

  • Yang S, Kim JK, Song R (2020) Doubly robust inference when combining probability and non-probability samples with high dimensional data. J R Stat Soc Ser B (Stat Methodol) 82(2):445–465

  • Ża̧dło T (2009) On MSE of EBLUP. Stat Papers 50(1):101–118

  • Zhang J, Zhu LP, Zhu LX (2012) On a dimension reduction regression with covariate adjustment. J Multivariate Anal 104(1):39–55

  • Zhang J, Yu Y, Zhu L, Liang H (2013) Partial linear single index models with distortion measurement errors. Ann Inst Stat Math 65(2):237–267

  • Zhang L (2019) On valid descriptive inference from non-probability sample. Stat Theory Related Fields 3(2):103–113

  • Zhu LX, Fang KT (1996) Asymptotics for kernel estimate of sliced inverse regression. Ann Stat 24(3):1053–1068

  • Zou H (2008) A note on path-based variable selection in the penalized proportional hazards model. Biometrika 95(1):241–247


Acknowledgements

This work was supported by the National Natural Science Foundation of China (NSFC) (No. 11901175) and the Fundamental Research Funds for Hubei Key Laboratory of Applied Mathematics, Hubei University (No. HBAM 201907).

Author information

Correspondence to Zhan Liu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

The detailed proofs of the three theorems are given as follows. Under assumptions (B1)–(B4) and (C1)–(C6), and similarly to Lemma 2.1 in Cui (2008), we obtain

$$\begin{aligned}&(A1)\quad \frac{1}{\sqrt{n}}\left( \widehat{X}-X \right) ^{\text {T}}Y=o_{P}\left( 1 \right) \\&(A2)\quad \frac{1}{n}X^{\text {T}}\left( \widehat{X}-X \right) =M\text {diag}\left\{ \frac{1}{n}\sum _{i=1}^{n}\limits \frac{\left( \widetilde{X}_{i1}-X_{i1} \right) }{\text {E}{X_{1}}}, \ldots , \frac{1}{n}\sum _{i=1}^{n}\limits \frac{\left( \widetilde{X}_{iq}-X_{iq} \right) }{\text {E}{X_{q}}}\right\} +o_{P}\left( n^{-1/2} \right) \\&(A3)\quad \frac{1}{n}\left( \widehat{X}-X \right) ^{\text {T}}\left( \widehat{X}-X \right) =o_{P}\left( n^{-1/2} \right) \\&(A4)\quad \frac{1}{n}\widehat{X}^{\text {T}}\widehat{X}-\frac{1}{n}X^{\text {T}}X=O_{P}\left( n^{-1/2} \right) \end{aligned}$$
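These rates can be probed numerically. Below is a small Monte Carlo sketch checking that the left-hand side of (A4) shrinks as n grows; the distortion \(\psi (u)=1+0.3(u-0.5)\) (so \(\text {E}\psi (U)=1\) for \(U\sim U(0,1)\)), the bandwidth order and all constants are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
for n in (200, 800, 3200):
    errs = []
    for _ in range(50):
        U = rng.uniform(size=n)
        X = 2.0 + rng.normal(size=(n, 3))     # covariates with nonzero means
        psi = 1.0 + 0.3 * (U - 0.5)           # distortion with E[psi(U)] = 1
        X_tilde = psi[:, None] * X            # observed contaminated covariates
        h = n ** (-1 / 5)                     # usual nonparametric bandwidth order
        K = np.exp(-0.5 * ((U[:, None] - U[None, :]) / h) ** 2)
        cond_mean = (K / K.sum(axis=1, keepdims=True)) @ X_tilde
        X_hat = X_tilde * X_tilde.mean(axis=0) / cond_mean   # calibrated covariates
        errs.append(np.linalg.norm(X_hat.T @ X_hat / n - X.T @ X / n))
    print(n, np.mean(errs))                   # error should shrink with n, cf. (A4)
```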

Proof of Theorem 3.1: Let \(w_{n}=n^{-1/2}+a_{n}\). It suffices to show that, for any given \(\varepsilon >0\), there exists a sufficiently large constant C such that

$$\begin{aligned} P\{\inf _{\Vert u\Vert =C}\limits L_{n}(\beta _{0}+w_{n}u)-L_{n}(\beta _{0})> 0\}\ge 1-\varepsilon , \end{aligned}$$

which implies that, with probability at least \(1-\varepsilon \), there exists a local minimizer in the ball \(\{\beta _{0}+w_{n}u:\Vert u\Vert \le C\}\). Hence, there exists a local minimizer \(\widehat{\beta }\) such that \(\Vert \widehat{\beta }-\beta _{0}\Vert =O_{P}(w_{n})\).

From (2.8), using \(p_{\lambda }\left( 0 \right) =0\), we obtain

$$\begin{aligned}&L_{n}(\beta _{0}+w_{n}u)-L_{n}(\beta _{0})\\&\quad \ge \frac{1}{2}\sum _{i=1}^{n}\limits \left( Y_{i}-\widehat{X}_{i}^{\text {T}}\left( \beta _{0}+w_{n}u \right) \right) ^{2}-\frac{1}{2}\sum _{i=1}^{n}\limits \left( Y_{i}-\widehat{X}_{i}^{\text {T}}\beta _{0} \right) ^{2}\\&\quad \quad +n\sum _{j=1}^{k}\limits \left\{ p_{\lambda }\left( \left| \beta _{0j}+w_{n}u_{j}\right| \right) -p_{\lambda }\left( \left| \beta _{0j}\right| \right) \right\} \\ \overset{\wedge }{=}&L_{I}+L_{II}, \end{aligned}$$

where k is the dimension of \(\beta _{I0}\), and \(\beta _{0j}\) is the jth element of \(\beta _{0}\). Then, we analyze the above difference in two steps.

Step 1: Show that

$$\begin{aligned} L_{I}=L_{11}+L_{21}+L_{22}+L_{23}+L_{24}+o_{P}\left( w_{n}^{2}n/2 \right) \Vert u\Vert ^{2}. \end{aligned}$$
(A.1)

We decompose \(L_{I}\) into two parts by a simple calculation and obtain

$$\begin{aligned} L_{I}=&\frac{w_{n}^{2}}{2}\sum _{i=1}^{n}\limits \left( \widehat{X}_{i}^{\text {T}}u \right) ^{2}-w_{n}\sum _{i=1}^{n}\limits \left( Y_{i}-\widehat{X}_{i}^{\text {T}}\beta _{0} \right) \widehat{X}_{i}^{\text {T}}u\\ \overset{\wedge }{=}&L_{1}+L_{2}, \end{aligned}$$

where \(L_{1}=w_{n}^{2}u^{\text {T}}X^{\text {T}}Xu/2+w_{n}^{2}u^{\text {T}}\left( \widehat{X}^{\text {T}}\widehat{X}-X^{\text {T}}X \right) u/2\). By (A4), \(w_{n}^{2}u^{\text {T}} \left( \widehat{X}^{\text {T}}\widehat{X}-X^{\text {T}}X \right) u/2=O_{P}\left( w_{n}^{2}n^{1/2}/2 \right) \Vert u\Vert ^{2}\), so we obtain

$$\begin{aligned} L_{1}&=w_{n}^{2}u^{\text {T}}X^{\text {T}}Xu/2+O_{P}\left( w_{n}^{2}n^{1/2}/2 \right) \Vert u\Vert ^{2} \nonumber \\&\overset{\wedge }{=}L_{11}+o_{P}\left( w_{n}^{2}n/2 \right) \Vert u\Vert ^{2}. \end{aligned}$$
(A.2)

By performing a simple calculation, we obtain

$$\begin{aligned} L_{11}=\frac{w_{n}^{2}}{2}nu^{\text {T}}\left( \frac{1}{n}X^{\text {T}}X-\text {E}{\left( XX^{\text {T}} \right) } \right) u+\frac{w_{n}^{2}}{2}nu^{\text {T}}\text {E}{\left( XX^{\text {T}} \right) }u. \end{aligned}$$

Notice that, by Chebyshev's inequality,

$$\begin{aligned} P\left( \Vert \frac{1}{n}X^{\text {T}}X-\text {E}{\left( XX^{\text {T}} \right) }\Vert \ge \varepsilon \right) \le \frac{1}{n\varepsilon ^{2}}\text {E}{\sum _{i,j}^{q}\left\{ X_{i}X_{j}-\text {E}{\left( X_{i}X_{j} \right) }\right\} ^{2}}=O\left( n^{-1} \right) , \end{aligned}$$

which implies that

$$\begin{aligned} \Vert \frac{1}{n}X^{\text {T}}X-\text {E}{\left( XX^{\text {T}} \right) }\Vert =O_{P}\left( n^{-1/2} \right) =o_{P}(1). \end{aligned}$$

From the above argument, we can rewrite equation (A.2) as

$$\begin{aligned} L_{1}=\frac{w_{n}^{2}}{2}nu^{\text {T}}\text {E}{\left( XX^{\text {T}} \right) }u+o_{P}\left( nw_{n}^{2}/2 \right) \Vert u\Vert ^{2}. \end{aligned}$$

Next, we focus on \(L_{2}\). Since \(\varepsilon _{i}=Y_{i}-X_{i}^{\text {T}}\beta _{0}\), a simple calculation on \(L_{2}\) gives

$$\begin{aligned} L_{2}=&-w_{n}\sum _{i=1}^{n}\limits \left\{ \left( \varepsilon _{i}-\left( \widehat{X}_{i}-X_{i} \right) ^{\text {T}}\beta _{0} \right) \left( X_{i}^{\text {T}}u+\left( \widehat{X}_{i}-X_{i} \right) ^{\text {T}}u \right) \right\} \\ =&-w_{n}\varepsilon ^{\text {T}}Xu-w_{n}\varepsilon ^{\text {T}}\left( \widehat{X}-X \right) u+w_{n}u^{\text {T}}X^{\text {T}}\left( \widehat{X}-X \right) \beta _{0}\\&+w_{n}u^{\text {T}}\left( \widehat{X}-X \right) ^{\text {T}}\left( \widehat{X}-X \right) \beta _{0}\\&\overset{\wedge }{=}L_{21}+L_{22}+L_{23}+L_{24}. \end{aligned}$$

Since \(\left| L_{21}\right| \le w_{n}\Vert \varepsilon ^{\text {T}}X\Vert \Vert u\Vert =w_{n}\Vert \sum _{i=1}^{n}\varepsilon _{i}X_{i}\Vert \Vert u\Vert \), assumption (C1) implies that

$$\begin{aligned} \text {E}{\Vert \sum _{i=1}^{n}\limits \varepsilon _{i}X_{i}\Vert ^{2}}=\text {E}{\sum _{l=1}^{q}\limits \sum _{i=1}^{n}\limits \sum _{j=1}^{n}\limits \varepsilon _{i}\varepsilon _{j}X_{il}X_{jl}}\le \sigma ^{2}n, \end{aligned}$$

and hence \(\left| L_{21}\right| =O_{P}\left( w_{n}n^{1/2} \right) \Vert u\Vert =O_{P}\left( nw_{n}^{2} \right) \Vert u\Vert \), since \(n^{1/2}\le nw_{n}\). Similarly to the proof of Lemma 2.1 in Cui (2008), we have \(\left| L_{22}\right| =w_{n}\left| \varepsilon ^{\text {T}}\left( \widehat{X}-X \right) u\right| =o_{P}\left( nw_{n}^{2} \right) \Vert u\Vert \) and \(L_{23}=L_{24}=o_{P}\left( nw_{n}^{2} \right) \Vert u\Vert \). Combining the above bounds yields (A.1).

Step 2: Show that

$$\begin{aligned} L_{II}\le \sqrt{k}\,nw_{n}^{2}\Vert u\Vert +nb_{n}w_{n}^{2}\Vert u\Vert ^{2}. \end{aligned}$$

We perform a second-order Taylor expansion of \(p_{\lambda }\left( \left| \beta _{0j}+w_{n}u_{j}\right| \right) \) around \(\left| \beta _{0j}\right| \) and obtain

$$\begin{aligned} L_{II}&=n\sum _{j=1}^{k}\limits \left\{ p_{\lambda }\left( \left| \beta _{0j}+w_{n}u_{j}\right| \right) -p_{\lambda }\left( \left| \beta _{0j}\right| \right) \right\} \\&=n\sum _{j=1}^{k}\limits \left\{ p_{\lambda }^{'}\left( \left| \beta _{0j}\right| \right) sgn\left( \beta _{0j} \right) w_{n}u_{j}+\frac{1}{2}p_{\lambda }^{''}\left( \left| \beta _{0j}\right| \right) w_{n}^{2}u_{j}^{2}\left( 1+o(1) \right) \right\} . \end{aligned}$$

Using the definitions \(a_{n}=\max \left\{ p^{'}_{\lambda }\left( \left| \beta _{0j}\right| \right) :\beta _{0j}\ne 0\right\} \) and \(b_{n}=\max \left\{ \left| p^{''}_{\lambda }\left( \left| \beta _{0j}\right| \right) \right| :\beta _{0j}\ne 0\right\} \), the inequality \(\left| sgn\left( \beta _{0j} \right) \right| \le 1\) and the Cauchy–Schwarz bound \(\sum _{j=1}^{k}|u_{j}|\le \sqrt{k}\Vert u\Vert \), we have

$$\begin{aligned} n\sum _{j=1}^{k}\limits \left\{ p_{\lambda }^{'}\left( \left| \beta _{0j}\right| \right) sgn\left( \beta _{0j} \right) w_{n}u_{j}\right\} \le n\sum _{j=1}^{k}\limits a_{n}w_{n}|u_{j}|\le \sqrt{k}\,na_{n}w_{n}\Vert u\Vert , \end{aligned}$$

and

$$\begin{aligned} n\sum _{j=1}^{k}\limits \left\{ p_{\lambda }^{''}\left( |\beta _{0j}| \right) w_{n}^{2}u_{j}^{2}\left( 1+o(1) \right) \right\} /2\le nb_{n}w_{n}^{2}\Vert u\Vert ^{2}. \end{aligned}$$

Since \(w_{n}=n^{-1/2}+a_{n}\) implies \(a_{n}\le w_{n}\), we obtain

$$\begin{aligned} L_{II}\le \sqrt{k}\,na_{n}w_{n}\Vert u\Vert +nb_{n}w_{n}^{2}\Vert u\Vert ^{2}\le \sqrt{k}\,nw_{n}^{2}\Vert u\Vert +nb_{n}w_{n}^{2}\Vert u\Vert ^{2}. \end{aligned}$$
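For concreteness, recall the SCAD penalty of Fan and Li (2001), whose derivative for \(\theta >0\) is

$$\begin{aligned} p^{'}_{\lambda }\left( \theta \right) =\lambda \left\{ I\left( \theta \le \lambda \right) +\frac{\left( a\lambda -\theta \right) _{+}}{\left( a-1 \right) \lambda }I\left( \theta >\lambda \right) \right\} ,\quad a>2, \end{aligned}$$

so that \(p^{'}_{\lambda }\left( \theta \right) =0\) whenever \(\theta >a\lambda \). Consequently, if \(\lambda \rightarrow 0\) and the nonzero \(\left| \beta _{0j}\right| \) are bounded away from zero, then \(a_{n}=0\) and \(b_{n}=0\) for all sufficiently large n, so \(w_{n}=n^{-1/2}\) and the above bound on \(L_{II}\) is of strictly smaller order than the quadratic term in \(L_{I}\).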

From the above results, when a sufficiently large C is chosen, \(L_{I}\) dominates the remaining terms uniformly in \(\Vert u\Vert =C\). As \(L_{I}\) is positive, this completes the proof of Theorem 3.1.

Proof of Theorem 3.2: To improve readability, we divide the proof of Theorem 3.2 into two steps. In Step 1 we show the sparsity part: Theorem 3.1 provides a root-n-consistent local minimizer \(\widehat{\beta }\), and we prove that its components indexed by \(j=k+1, \ldots , q\) vanish with probability tending to 1. In Step 2 we prove the asymptotic normality of the penalized least squares estimator.

Step 1: It is sufficient to show that with probability tending to 1 as \(n\rightarrow \infty \), for any \(\beta \) satisfying \(\beta _{I}-\beta _{I0}=O_{P}(n^{-1/2})\) and \(j=k+1, \ldots , q\), we have

$$\begin{aligned} \frac{\partial L_{n}\left( \beta \right) }{\partial \beta _{j}}= \left\{ \begin{array}{l} > 0,\quad \text {for}\quad 0<\beta _{j}<\varepsilon _{n},\\< 0,\quad \text {for}\quad -\varepsilon _{n}<\beta _{j}<0. \end{array} \right. \end{aligned}$$
(A.3)

To show (A.3), consider the partial derivative of \(L_{n}(\beta )\) at any differentiable point \(\beta =\left( \beta _{1}, \ldots , \beta _{q} \right) ^{\text {T}}\); we obtain

$$\begin{aligned} \frac{\partial L_{n}\left( \beta \right) }{\partial \beta _{j}}=&-\sum _{i=1}^{n}\limits \left( Y_{i}-\widehat{X}_{i}^{\text {T}}\beta \right) \widehat{X}_{ij}+np_{\lambda }^{'}\left( \left| \beta _{j}\right| \right) sgn\left( \beta _{j} \right) \\ =&-\sum _{i=1}^{n}\limits \left( \varepsilon _{i}-X_{i}^{\text {T}}\left( \beta -\beta _{0} \right) \right) X_{ij}-\sum _{i=1}^{n}\limits \left( \varepsilon _{i}-X_{i}^{\text {T}}\left( \beta -\beta _{0} \right) \right) \left( \widehat{X}_{ij}-X_{ij} \right) \\&+\sum _{i=1}^{n}\limits \left( \widehat{X}_{i}-X_{i} \right) ^{\text {T}}\beta X_{ij}+\sum _{i=1}^{n}\limits \left( \widehat{X}_{i}-X_{i} \right) ^{\text {T}}\beta \left( \widehat{X}_{ij}-X_{ij} \right) +np_{\lambda }^{'}\left( \left| \beta _{j}\right| \right) sgn\left( \beta _{j} \right) \\ \overset{\wedge }{=}&P_{1}+P_{2}+P_{3}+P_{4}+P_{5}, \end{aligned}$$

where \(\beta _{0}\) is the true value of \(\beta \), \(j=k+1, \ldots , q\).

We first consider \(P_{1}\). By Theorem 3.1, it is easy to show that, for any \(\beta \) satisfying \(\beta _{I}-\beta _{I0}=O_{P}\left( n^{-1/2} \right) \) and \(\left| \beta _{II}-\beta _{II0}\right| \le \varepsilon _{n}=Cn^{-1/2}\) with any positive constant C,

$$\begin{aligned} \text {E}\left( \max _{k+1\le j \le q} \left| \sum _{i=1}^{n}\limits \varepsilon _{i}X_{ij}\right| \right) \le \text {E}^{1/2}\left( \sum _{j=k+1}^{q}\limits \left| \sum _{i=1}^{n}\limits \varepsilon _{i}X_{ij}\right| ^{2} \right) \le \sigma n^{1/2} \end{aligned}$$

and

$$\begin{aligned} \max _{k+1\le j \le q}\left| \sum _{i=1}^{n}\limits X_{i}^{\text {T}}\left( \beta -\beta _{0} \right) X_{ij}\right|&\le \Vert \beta -\beta _{0}\Vert \max _{k+1\le j \le q}\sqrt{X_{\cdot j}^{\text {T}}XX^{\text {T}}X_{\cdot j}}\\&\le Cn^{-1/2}\max _{k+1\le j \le q} \Vert X_{\cdot j}\Vert \lambda _{\max }^{1/2}\left( XX^{\text {T}} \right) \\&=O_{P}\left( n^{1/2} \right) . \end{aligned}$$

From the above argument, \(P_{1}=O_{P}\left( n^{1/2} \right) \). By (A2), (A4) and similar arguments, we get \(P_{i}=o_{P}\left( n^{1/2} \right) , i=2, 3, 4\).

Using the above arguments, \(\liminf \limits _{n\rightarrow \infty }\liminf \limits _{\nu \rightarrow 0^{+}}p_{\lambda }^{'}\left( \nu \right) /\lambda >0\) and \(n^{-1/2}/\lambda \rightarrow 0\), we obtain

$$\begin{aligned} \frac{\partial L_{n}\left( \beta \right) }{\partial \beta _{j}}&=O_{P}\left( n^{1/2} \right) +np^{'}_{\lambda }\left( |\beta _{j}| \right) sgn\left( \beta _{j} \right) \\&=n\lambda \left\{ \lambda ^{-1}p^{'}_{\lambda }\left( |\beta _{j}| \right) sgn\left( \beta _{j} \right) +O_{P}\left( n^{-1/2}/\lambda \right) \right\} , \end{aligned}$$

so, for n large, the sign of the derivative is completely determined by that of \(\beta _{j}\). Hence, (A.3) follows, which completes Step 1.

Step 2:

Applying Taylor's theorem to \(\nabla L_{n}\left( \widehat{\beta _{I}} \right) \) at \(\beta _{I0}\), and noting that \(\widehat{\beta _{I}}\) satisfies the penalized least squares equation \(\nabla L_{n}\left( \widehat{\beta _{I}} \right) =0\), we have

$$\begin{aligned} 0=\nabla L_{n}\left( \widehat{\beta _{I}} \right) =\nabla L_{n}\left( \beta _{I0} \right) +\nabla ^{2} L_{n}\left( \beta _{I0}^{*} \right) \left( \widehat{\beta _{I}}-\beta _{I0} \right) , \end{aligned}$$

where \(\beta _{I0}^{*}\) lies between \(\widehat{\beta _{I}}\) and \(\beta _{I0}\). Using the definitions of \(L_{n}\left( \cdot \right) \) and \(\varepsilon =Y-X\beta _{0}\), we have

$$\begin{aligned} \nabla L_{n}\left( \beta _{I0} \right) =&-\widehat{X}^{\text {T}}_{I}\left( Y-\widehat{X}_{I}\beta _{I0} \right) +n\nabla p_{\lambda }\left( |\beta _{I0}| \right) \nonumber \\ =&-X_{I}^{\text {T}}\varepsilon +X_{I}^{\text {T}}\left( \widehat{X}_{I}-X_{I} \right) \beta _{I0}-\left( \widehat{X}_{I}-X_{I} \right) ^{\text {T}}\varepsilon \nonumber \\&+\left( \widehat{X}_{I}-X_{I} \right) ^{\text {T}}\left( \widehat{X}_{I}-X_{I} \right) \beta _{I0}+n\nabla p_{\lambda }\left( |\beta _{I0}| \right) \end{aligned}$$
(A.4)

and

$$\begin{aligned} \nabla ^{2} L_{n}\left( \beta _{I0}^{*} \right) =&\widehat{X}_{I}^{\text {T}}\widehat{X}_{I}+n\nabla ^{2}p_{\lambda }\left( |\beta _{I0}^{*}| \right) \\ =&X^{\text {T}}_{I}X_{I}+\left( \widehat{X}^{\text {T}}_{I}\widehat{X}_{I}-X^{\text {T}}_{I}X_{I} \right) +n\nabla ^{2}p_{\lambda }\left( |\beta _{I0}^{*}| \right) , \end{aligned}$$

where \(\nabla p_{\lambda }\left( |\beta _{I0}| \right) =\left( p^{'}_{\lambda }\left( |\beta _{01}| \right) sgn\left( \beta _{01} \right) , \ldots , p^{'}_{\lambda }\left( |\beta _{0k}| \right) sgn\left( \beta _{0k} \right) \right) _{k\times 1}^{\text {T}}\) and \(\nabla ^{2}p_{\lambda }\left( |\beta _{I0}^{*}| \right) \) is the diagonal matrix whose diagonal elements are \(p^{''}_{\lambda }\left( |\beta _{0j}^{*}| \right) , j=1, 2, \ldots , k\).

For the first term of (A.4), since \(\frac{1}{n}\left( \widehat{X}_{I}^{\text {T}}\widehat{X}_{I}-X_{I}^{\text {T}}X_{I} \right) =O_{P}(n^{-1/2})\) by (A4), we have

$$\begin{aligned}&\frac{1}{n}\widehat{X}_{I}^{\text {T}}\left( Y-\widehat{X}_{I}\beta _{I0} \right) \nonumber \\ \rightarrow&\left( \frac{1}{n}X^{\text {T}}_{I}X_{I}+\nabla ^{2}p_{\lambda }\left( |\beta _{I0}^{*}| \right) \right) \left( \widehat{\beta _{I}}-\beta _{I0} \right) +\nabla p_{\lambda }\left( |\beta _{I0}| \right) \nonumber \\ \overset{\wedge }{=}&\left( M_{I}+\Sigma _{\lambda }\left( \beta _{I0} \right) \right) \left( \widehat{\beta _{I}}-\beta _{I0} \right) +P_{n}, \end{aligned}$$
(A.5)

Performing a simple calculation, we obtain

$$\begin{aligned} \frac{1}{\sqrt{n}}\widehat{X}^{\text {T}}_{I}\left( Y-\widehat{X}_{I}\beta _{I0} \right) \overset{\wedge }{=}\frac{1}{\sqrt{n}}X_{I}^{\text {T}}\varepsilon -\frac{1}{\sqrt{n}}X_{I}^{\text {T}}\left( \widehat{X}_{I}-{X}_{I} \right) \beta _{I0}+K_{1}-K_{2}, \end{aligned}$$

where \(K_{1}=n^{-1/2}\left( \widehat{X}_{I}-{X}_{I} \right) ^{\text {T}}\varepsilon \) and \(K_{2}=n^{-1/2}\left( \widehat{X}_{I}-{X}_{I} \right) ^{\text {T}}\left( \widehat{X}_{I}-{X}_{I} \right) \beta _{I0}\). By (A3) and \(\widehat{X}-X=o_{P}(1)\), it is obvious that \(K_{1}=K_{2}=o_{P}(1)\). Then, multiplying (A.5) by \(\sqrt{n}A_{n}M_{I}^{-1}\),

$$\begin{aligned}&\sqrt{n}A_{n}M_{I}^{-1}\left( M_{I}+\Sigma _{\lambda }\left( \beta _{I0} \right) \right) \left\{ \left( \widehat{\beta _{I}}-\beta _{I0} \right) +\left( M_{I}+\Sigma _{\lambda }\left( \beta _{I0} \right) \right) ^{-1}P_{n}\right\} \\ =&\frac{1}{\sqrt{n}}A_{n}M_{I}^{-1}X_{I}^{\text {T}}\varepsilon -\frac{1}{\sqrt{n}}A_{n}M_{I}^{-1}X_{I}^{\text {T}}\left( \widehat{X}_{I}-X_{I} \right) \beta _{I0}+o_{P}(1)\\ \overset{\wedge }{=}&W_{1}-W_{2}+o_{P}(1). \end{aligned}$$
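For reference, the form of the Lindeberg–Feller central limit theorem for triangular arrays invoked below can be stated as follows: if the row-wise independent random vectors \(T_{n1}, \ldots , T_{nn}\) satisfy \(\text {E}T_{ni}=0\), \(\sum _{i=1}^{n}\text {Cov}\left( T_{ni} \right) \rightarrow \Sigma \) and, for every \(\delta >0\),

$$\begin{aligned} \sum _{i=1}^{n}\limits \text {E}\left\{ \Vert T_{ni}\Vert ^{2}\mathbbm {1}\left( \Vert T_{ni}\Vert>\delta \right) \right\} \rightarrow 0, \end{aligned}$$

then \(\sum _{i=1}^{n}T_{ni}\rightarrow N\left( 0, \Sigma \right) \) in distribution.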

Next, we prove that \(W_{1}\) and \(W_{2}\) satisfy the assumptions of the Lindeberg-Feller central limit theorem. For \(W_{1}\), denote \(T_{ni}=n^{-1/2}A_{n}M_{I}^{-1}X_{Ii}\varepsilon _{i}\); for any \(\delta >0\), we have

$$\begin{aligned} \sum _{i=1}^{n}\limits \text {E}{\left\{ \Vert T_{ni}\Vert ^{2}\mathbbm {1}\left( \Vert T_{ni}\Vert>\delta \right) \right\} }=n\text {E}{\left\{ \Vert T_{ni}\Vert ^{2}\mathbbm {1}\left( \Vert T_{ni}\Vert>\delta \right) \right\} }\le n \left\{ \text {E}{\Vert T_{ni}\Vert ^{4}}\right\} ^{1/2}\left\{ P\left( \Vert T_{ni}\Vert >\delta \right) \right\} ^{1/2}. \end{aligned}$$

Applying the argument of Craven and Wahba (1978) and assumption (C1), we have \(P\left( \Vert T_{ni}\Vert >\delta \right) \le \text {E}{\Vert T_{ni}\Vert ^{2}}/\delta ^{2}\le \sigma ^{2}\lambda _{\max }\left( A_{n}A_{n}^{\text {T}} \right) /n\delta ^{2}\lambda _{\min }^{2}\left( M_{I} \right) =O\left( n^{-1} \right) \), and

$$\begin{aligned} \text {E}{\left( \Vert T_{ni}\Vert ^{4} \right) }=\text {E}{\left( T_{ni}^{\text {T}}T_{ni} \right) }^{2}=O\left( n^{-2} \right) , \end{aligned}$$

then, we obtain

$$\begin{aligned} \sum _{i=1}^{n}\limits \text {E}{\left\{ \Vert T_{ni}\Vert ^{2}\mathbbm {1}\left( \Vert T_{ni}\Vert >\delta \right) \right\} }=O\left( n\frac{1}{n}\sqrt{\frac{1}{n}} \right) =o(1), \end{aligned}$$

which implies that \(W_{1}\) satisfies the conditions of the Lindeberg-Feller central limit theorem. For \(W_{2}\), under (A2), we obtain the same conclusion, and the two terms are uncorrelated. Thus, \(\text {Var}\left( W_{1}-W_{2} \right) =A_{n}\left( \sigma ^{2}M_{I}^{-1}+R \right) A_{n}^{\text {T}}\rightarrow H\), where H is a \(p\times p\) nonnegative definite symmetric matrix. The above two steps complete the proof of Theorem 3.2.

Proof of Theorem 3.3: By Theorem 3.1, we have \(\widehat{\beta }-\beta _{0}=O_{P}\left( n^{-1/2}+a_{n} \right) \) and \(\widehat{X}-X=o_{P}(1)\); then

$$\begin{aligned} \widehat{\overline{Y}}=&N^{-1}\sum _{i\in M}Y_{i}+N^{-1}\sum _{i\in \overline{M}}\widehat{X}_{i}^{\text {T}}\widehat{\beta } \\ =&N^{-1}\sum _{i\in M}Y_{i}+N^{-1}\sum _{i\in \overline{M}}\left( X_{i}+(\widehat{X}_{i}-X_{i}) \right) ^{\text {T}}\left( \beta _{0}+(\widehat{\beta }-\beta _{0}) \right) \\ =&N^{-1}\left( \sum _{i\in M}Y_{i}+\sum _{i\in \overline{M}}X_{i}^{\text {T}}\beta _{0} \right) +N^{-1}\sum _{i\in \overline{M}}X_{i}^{\text {T}} (\widehat{\beta }-\beta _{0}) \\&+N^{-1}\sum _{i\in \overline{M}}(\widehat{X}_{i}-X_{i})^{\text {T}}\beta _{0}+N^{-1}\sum _{i\in \overline{M}}(\widehat{X}_{i}-X_{i})^{\text {T}}(\widehat{\beta }-\beta _{0}) \\ =&\overline{Y}+N^{-1}O_{P}\left( n^{-1/2}+a_{n} \right) +N^{-1}o_{P}(1)+N^{-1}o_{P}\left( n^{-1/2}+a_{n} \right) \\ =&\overline{Y}+N^{-1}O_{P}\left( n^{-1/2}+a_{n} \right) . \end{aligned}$$
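The estimator \(\widehat{\overline{Y}}\) analyzed above is straightforward to compute once \(\widehat{\beta }\) and the calibrated covariates are available. A minimal sketch (the array names are our own illustrative choices):

```python
import numpy as np

def population_mean(y_sample, X_hat_rest, beta_hat):
    """Y_bar_hat = N^{-1} ( sum_{i in M} Y_i + sum_{i not in M} X_hat_i' beta_hat ),
    combining observed responses on the non-probability sample M with model
    projections on the remaining population units."""
    N = y_sample.size + X_hat_rest.shape[0]    # full population size
    return (y_sample.sum() + (X_hat_rest @ beta_hat).sum()) / N
```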

About this article

Cite this article

Pan, Y., Cai, W. & Liu, Z. Inference for non-probability samples under high-dimensional covariate-adjusted superpopulation model. Stat Methods Appl 31, 955–979 (2022). https://doi.org/10.1007/s10260-021-00619-w
