
Model-assisted calibration with SCAD to estimated control for non-probability samples

Original Paper · Statistical Methods & Applications

Abstract

Non-probability samples have been used in various fields in recent years, but they typically yield biased estimates. Calibration to estimated control totals has been proposed to reduce the bias of non-probability samples. Models relating the study variable to the covariates can improve the efficiency of calibration, and selecting the important covariates is a key issue in building such models. In this paper, model-assisted calibration to estimated control using the smoothly clipped absolute deviation (SCAD) penalty is proposed to make inferences from non-probability samples. Instead of the traditional chi-square distance, the modified forward Kullback–Leibler distance is used in the proposed method, and the corresponding asymptotic properties are derived. Moreover, SCAD carries out variable selection and parameter estimation simultaneously when establishing the relationship models for calibration. The performance of the proposed method is investigated through simulation studies and an application to a non-probability sample from the 2017 National Health Interview Survey.


References

  • Baker R, Brick JM, Bates NA, Battaglia M, Couper MP, Dever JA, Gile KJ, Tourangeau R (2013) Summary report of the AAPOR task force on nonprobability sampling. J Surv Stat Methodol 1(2):90–143

  • Breidt FJ, Opsomer JD (2017) Model-assisted survey estimation with modern prediction techniques. Stat Sci 32(2):190–205

  • Chen JKT, Valliant R, Elliott MR (2018) Model-assisted calibration of non-probability sample survey data using adaptive LASSO. Surv Methodol 44(1):117–144

  • Chen JKT, Valliant R, Elliott MR (2018) Calibrating non-probability surveys to estimated control totals using LASSO, with an application to political polling. J R Stat Soc Ser C 68(3):657–681

  • Chen Y, Li P, Wu C (2018) Doubly robust inference with non-probability survey samples. arXiv preprint arXiv:1805.06432

  • Dever JA (2008) Sampling weight calibration with estimated control totals. PhD dissertation, Joint Program in Survey Methodology, University of Maryland

  • Dever JA, Valliant R (2010) A comparison of variance estimators for poststratification to estimated control totals. Surv Methodol 36(1):45–56

  • Dever JA, Valliant R (2016) A general regression estimation adjusted for undercoverage and estimated control totals. J Surv Stat Methodol 4:289–318

  • Deville JC, Särndal CE (1992) Calibration estimators in survey sampling. J Am Stat Assoc 87(418):376–382

  • DiSogra C, Cobb C, Chan E, Dennis JM (2011) Calibrating non-probability internet samples with probability samples using early adopter characteristics. In: Section on survey research methods, joint statistical meetings

  • Elliott MR, Valliant R (2017) Inference for nonprobability samples. Stat Sci 32(2):249–264

  • Fan JQ, Li RZ (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96(456):1348–1360

  • Haziza D, Lesage E (2016) A discussion of weighting procedures for unit nonresponse. J Off Stat 32(1):129–145

  • Kott PS, Liao D (2017) Calibration weighting for nonresponse that is not missing at random: allowing more calibration than response-model variables. J Surv Stat Methodol 5(2):159–174

  • Lee S, Valliant R (2009) Estimation for volunteer panel web surveys using propensity score adjustment and calibration adjustment. Sociol Methods Res 37(3):319–343

  • Lesage E, Haziza D, D’Haultfœuille X (2019) A cautionary tale on instrumental calibration for the treatment of nonignorable unit nonresponse in surveys. J Am Stat Assoc 114(526):906–915

  • Mercer AW, Kreuter F, Keeter S, Stuart EA (2017) Theory and practice in nonprobability surveys: parallels between causal inference and survey inference. Public Opin Q 81:250–279

  • Montanari GE, Ranalli MG (2012) Calibration inspired by semiparametric regression as a treatment for nonresponse. J Off Stat 28(2):239–277

  • McConville KS, Breidt FJ, Lee MCT, Moisen GG (2017) Model-assisted survey regression estimation with the lasso. J Surv Stat Methodol 5(2):131–158

  • Montanari GE, Ranalli MG (2005) Nonparametric model calibration estimation in survey sampling. J Am Stat Assoc 100(472):1429–1442

  • Robbins MW, Dastidar BG, Ramchand R (2019) Blending of probability and non-probability samples: applications to a survey of military caregivers. arXiv preprint arXiv:1908.04217

  • Rueda M, Borrego IS, Arcos A, Martínez S (2010) Model-calibration estimation of the distribution function using nonparametric regression. Metrika 71:33–44

  • Särndal CE (1980) On \(\pi\)-inverse weighting versus best linear unbiased weighting in probability sampling. Biometrika 67(3):639–650

  • Tan Z, Wu C (2015) Generalized pseudo empirical likelihood inferences for complex surveys. Can J Stat 43(1):1–17

  • Tan Z (2013) Simple design-efficient calibration estimators for rejective and high-entropy sampling. Biometrika 100(2):399–415

  • Tibshirani RJ (1996) Regression shrinkage and selection via the LASSO. J R Stat Soc Ser B 58(1):267–288

  • Valliant R, Dever JA (2011) Estimating propensity adjustments for volunteer web surveys. Sociol Methods Res 40(1):105–137

  • Wu C (2003) Optimal calibration estimators in survey sampling. Biometrika 90(4):937–951

  • Wu C, Sitter R (2001) A model-calibration approach to using complete auxiliary information from survey data. J Am Stat Assoc 96(453):185–193

  • Yang S, Kim JK, Song R (2019) Doubly robust inference when combining probability and non-probability samples with high-dimensional data. arXiv preprint arXiv:1903.05212


Acknowledgements

This research is supported in part by the National Social Science Foundation of China (18BTJ022 to Z. L.).

Author information

Correspondence to Yingli Pan.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary material 1 (rar 672 KB)

Appendix: proofs of theorems

Proof of Lemma 1

The estimated control calibrated weights can be expressed as \(w_{i}=d_{i}^{A}g_{i}\), where \(g_{i}=1/\left( 1-q_{i}^{A}x_{i}^{\text {T}}\lambda \right)\) by the standard Lagrange multiplier method. The Lagrange multiplier \(\lambda =\left( \lambda _{1},\lambda _{2},\ldots ,\lambda _{p} \right) ^{\text {T}}\) can be obtained by solving the following equation

$$\begin{aligned} \sum \limits _{i\in {s_{A}}}\frac{d_{i}^{A}x_{i}^{\text {T}}}{1-q_{i}^{A}x_{i}^{\text {T}}\lambda }-\sum \limits _{i\in {s_{B}}}d_{i}^{B}x_{i}^{\text {T}}=0. \end{aligned}$$
(A1)

Writing \(d_{i}^{A}x_{i}^{\text {T}}=d_{i}^{A}x_{i}^{\text {T}}\left( 1-q_{i}^{A}x_{i}^{\text {T}}\lambda +q_{i}^{A}x_{i}^{\text {T}}\lambda \right)\) and substituting this into (A1), we obtain

$$\begin{aligned}&\sum \limits _{i\in {s_{A}}}\frac{d_{i}^{A}x_{i}^{\text {T}}\left( 1-q_{i}^{A}x_{i}^{\text {T}}\lambda +q_{i}^{A}x_{i}^{\text {T}}\lambda \right) }{1-q_{i}^{A}x_{i}^{\text {T}}\lambda }-\sum \limits _{i\in {s_{B}}}d_{i}^{B}x_{i}^{\text {T}}=0\nonumber ,\\&\sum \limits _{i\in {s_{A}}}d_{i}^{A}x_{i}^{\text {T}}+\sum \limits _{i\in {s_{A}}}\frac{\lambda ^{\text {T}}d_{i}^{A}q_{i}^{A}x_{i}x_{i}^{\text {T}}}{1-q_{i}^{A}x_{i}^{\text {T}}\lambda }-\sum \limits _{i\in {s_{B}}}d_{i}^{B}x_{i}^{\text {T}}=0\nonumber ,\\&\lambda ^{\text {T}}\sum \limits _{i\in {s_{A}}}\frac{d_{i}^{A}q_{i}^{A}x_{i}x_{i}^{\text {T}}}{1-\lambda ^{\text {T}}q_{i}^{A}x_{i}}=\sum \limits _{i\in {s_{B}}}d_{i}^{B}x_{i}^{\text {T}}-\sum \limits _{i\in {s_{A}}}d_{i}^{A}x_{i}^{\text {T}}. \end{aligned}$$
(A2)

It follows from (A2) that

$$\begin{aligned} \frac{\lambda ^{*\text {T}}}{1+\lambda ^{*\text {T}}u^{*}}\sum \limits _{i\in {s_{A}}}d_{i}^{A}q_{i}^{A}x_{i}x_{i}^{\text {T}}\le {{\mathbf {R}}}, \end{aligned}$$
(A3)

where \(\lambda ^{*}=(\left| \lambda _{1}\right| , \cdots , \left| \lambda _{p}\right| )^{\text {T}}\), \(u^{*}=\left( \max \limits _{i\in {s_{A}}}{|q_{i}^{A}x_{i1}|},\max \limits _{i\in {s_{A}}}{|q_{i}^{A}x_{i2}|},\ldots ,\max \limits _{i\in {s_{A}}}{|q_{i}^{A}x_{ip}|} \right) ^{\text {T}}=o_{p}\left( n_{A}^{1/2} \right)\) by the condition (A1) and \({\mathbf {R}}=\left( \left| \sum \limits _{i\in {s_{B}}}d_{i}^{B}x_{i1}-\sum \limits _{i\in {s_{A}}}d_{i}^{A}x_{i1}\right| ,\ldots ,\left| \sum \limits _{i\in {s_{B}}}d_{i}^{B}x_{ip}-\sum \limits _{i\in {s_{A}}}d_{i}^{A}x_{ip}\right| \right)\). The condition (A2) implies \(\mathbf {{\widehat{T}}}^{X}_{A}={\mathbf {T}}^{X}+O_{p}\left( Nn_{A}^{-1/2} \right)\), where \(\widehat{{\mathbf {T}}}^{X}_{A}=\left( \widehat{{\mathbf {T}}}^{X}_{A1},\widehat{{\mathbf {T}}}^{X}_{A2},\ldots ,\widehat{{\mathbf {T}}}^{X}_{Ap} \right) =\left( \sum \limits _{i\in {s_{A}}}d_{i}^{A}x_{i1},\sum \limits _{i\in {s_{A}}}d_{i}^{A}x_{i2},\ldots ,\sum \limits _{i\in {s_{A}}}d_{i}^{A}x_{ip} \right)\), and the condition (A3) implies \(\mathbf {{\widehat{T}}}^{X}_{B}={\mathbf {T}}^{X}+O_{p}\left( Nn_{B}^{-1/2} \right)\). Furthermore, we can obtain \(\widehat{{\mathbf {T}}}^{X}_{B}-\widehat{{\mathbf {T}}}^{X}_{A}=O_{p}\left( Nn_{*}^{-1/2} \right)\), where \(n_{*}=\min \left( n_{A},n_{B} \right)\). Under conditions (A1)-(A4), we have \(\sum \limits _{i\in {s_{A}}}d_{i}^{A}q_{i}^{A}x_{i}x_{i}^{\text {T}}=O\left( N \right)\), \(\lambda ^{\text {T}}=O_{p}\left( n_{*}^{-1/2} \right)\) and \(\max \limits _{i\in {s_{A}}}|\lambda ^{\text {T}}q_{i}^{A}x_{i}|=o_{p}(1)\). Furthermore, we can obtain

$$\begin{aligned}&\left( \lambda ^{ECGPEL} \right) ^{\text {T}}\nonumber \\&\quad =\left( \sum \limits _{i\in {s_{B}}}d_{i}^{B}x_{i}^{\text {T}}-\sum \limits _{i\in {s_{A}}}d_{i}^{A}x_{i}^{\text {T}} \right) \left( \sum \limits _{i\in {s_{A}}}d_{i}^{A}q_{i}^{A}x_{i}x_{i}^{\text {T}} \right) ^{-1}\left( 1+o_{p}\left( 1 \right) \right) \nonumber \\&\quad =\left( \sum \limits _{i\in {s_{B}}}d_{i}^{B}x_{i}^{\text {T}}-\sum \limits _{i\in {s_{A}}}d_{i}^{A}x_{i}^{\text {T}} \right) \left( \sum \limits _{i\in {s_{A}}}d_{i}^{A}q_{i}^{A}x_{i}x_{i}^{\text {T}} \right) ^{-1} \nonumber \\&\qquad +\left( \sum \limits _{i\in {s_{B}}}d_{i}^{B}x_{i}^{\text {T}}-\sum \limits _{i\in {s_{A}}}d_{i}^{A}x_{i}^{\text {T}} \right) \left( \sum \limits _{i\in {s_{A}}}d_{i}^{A}q_{i}^{A}x_{i}x_{i}^{\text {T}} \right) ^{-1}o_{p}\left( 1 \right) \nonumber \\&\quad =\left( \sum \limits _{i\in {s_{B}}}d_{i}^{B}x_{i}^{\text {T}}-\sum \limits _{i\in {s_{A}}}d_{i}^{A}x_{i}^{\text {T}} \right) \left( \sum \limits _{i\in {s_{A}}}d_{i}^{A}q_{i}^{A}x_{i}x_{i}^{\text {T}} \right) ^{-1}\nonumber \\&\qquad +O_{p}\left( Nn_{*}^{-1/2} \right) O_{p}\left( 1/N \right) o_{p}\left( 1 \right) \nonumber \\&\quad =\left( \sum \limits _{i\in {s_{B}}}d_{i}^{B}x_{i}^{\text {T}}-\sum \limits _{i\in {s_{A}}}d_{i}^{A}x_{i}^{\text {T}} \right) \left( \sum \limits _{i\in {s_{A}}}d_{i}^{A}q_{i}^{A}x_{i}x_{i}^{\text {T}} \right) ^{-1}+o_{p}\left( n_{*}^{-1/2} \right) . \nonumber \end{aligned}$$

Then, we have

$$\begin{aligned} \lambda ^{ECGPEL}&=\left( \sum \limits _{i\in {s_{A}}}d_{i}^{A}q_{i}^{A}x_{i}x_{i}^{\text {T}} \right) ^{-1}\left( \sum \limits _{i\in {s_{B}}}d_{i}^{B}x_{i}^{\text {T}}-\sum \limits _{i\in {s_{A}}}d_{i}^{A}x_{i}^{\text {T}} \right) ^{\text {T}}+o_{p}\left( n_{*}^{-1/2} \right) \nonumber \\&=\left( {\mathbf {X}}_{A}^{\text {T}}{\mathbf {D}}^{A}{\mathbf {Q}}^{A}{\mathbf {X}}_{A} \right) ^{-1}\left[ \left( {\mathbf {d}}^{B} \right) ^{\text {T}}{\mathbf {X}}_{B}-\left( {\mathbf {d}}^{A} \right) ^{\text {T}}{\mathbf {X}}_{A} \right] ^{\text {T}}+o_{p}\left( n_{*}^{-1/2} \right) , \end{aligned}$$
(A4)

where \({\mathbf {Q}}^{A}\) is an \(n_{A}\times {n_{A}}\) diagonal matrix whose diagonal elements are the \(q_{i}^{A}\) of the non-probability sample, \({\mathbf {d}}^{B}=\left( d_{1}^{B},d_{2}^{B},\ldots ,d_{n_{B}}^{B} \right) ^{\text {T}}\), \({\mathbf {X}}_{B}=\left( x_{1},x_{2},\ldots ,x_{n_{B}} \right) ^{\text {T}}\), \(x^{\text {T}}_{i}=(x_{i1}, x_{i2}, \cdots , x_{ip})\), \(i=1, 2, \cdots , n_{B}\). By Taylor expansion, \(g_{i}=1/\left( 1-q_{i}^{A}x_{i}^{\text {T}}\lambda \right) =1+q_{i}^{A}x_{i}^{\text {T}}\lambda +o_{p}\left( n_{A}^{-1/2} \right) \doteq 1+q_{i}^{A}x_{i}^{\text {T}}\lambda\). Then, the ECGPEL estimator of the estimated control calibrated weights is given by

$$\begin{aligned}&{\mathbf {W}}^{ECGPEL}\nonumber \\&\quad = {\mathbf {D}}^{A}\left[ {\mathbf {E}}+{\mathbf {Q}}^{A}{\mathbf {X}}_{A}\lambda ^{ECGPEL}+o_{p}\left( n_{A}^{-1/2} \right) \right] \nonumber \\&\quad ={\mathbf {d}}^{A}+{\mathbf {D}}^{A}{\mathbf {Q}}^{A}{\mathbf {X}}_{A}\left( {\mathbf {X}}_{A}^{\text {T}}{\mathbf {D}}^{A}{\mathbf {Q}}^{A}{\mathbf {X}}_{A} \right) ^{-1}\left[ \left( {\mathbf {d}}^{B} \right) ^{\text {T}}{\mathbf {X}}_{B}-\left( {\mathbf {d}}^{A} \right) ^{\text {T}}{\mathbf {X}}_{A} \right] ^{\text {T}}+o_{p}\left( n_{*}^{-1/2} \right) \nonumber \\&\qquad +o_{p}\left( n_{A}^{-1/2} \right) \nonumber \\&\quad ={\mathbf {d}}^{A}+{\mathbf {D}}^{A}{\mathbf {Q}}^{A}{\mathbf {X}}_{A}\left( {\mathbf {X}}_{A}^{\text {T}}{\mathbf {D}}^{A}{\mathbf {Q}}^{A}{\mathbf {X}}_{A} \right) ^{-1}\left[ \left( {\mathbf {d}}^{B} \right) ^{\text {T}}{\mathbf {X}}_{B}-\left( {\mathbf {d}}^{A} \right) ^{\text {T}}{\mathbf {X}}_{A} \right] ^{\text {T}}+o_{p}\left( n_{*}^{-1/2} \right) , \end{aligned}$$
(A5)

where \({\mathbf {E}}={\mathbf {1}}_{(n_{A} \times 1)}\). Therefore, the ECGPEL estimator of the population mean is

$$\begin{aligned}&\widehat{\overline{{\mathbf {Y}}}}^{ECGPEL}\nonumber \\&\quad ={\widehat{N}}^{-1}\left( {\mathbf {W}}^{ECGPEL} \right) ^{\text {T}}{\mathbf {y}}\nonumber \\&\quad ={\widehat{N}}^{-1}\left( {\mathbf {d}}^{A} \right) ^{\text {T}}{\mathbf {y}}+{\widehat{N}}^{-1}\left[ \left( {\mathbf {d}}^{B} \right) ^{\text {T}}{\mathbf {X}}_{B}-\left( {\mathbf {d}}^{A} \right) ^{\text {T}}{\mathbf {X}}_{A} \right] \left( {\mathbf {X}}_{A}^{\text {T}}{\mathbf {Q}}^{A}{\mathbf {D}}^{A}{\mathbf {X}}_{A} \right) ^{-1}{\mathbf {X}}_{A}^{\text {T}}{\mathbf {Q}}^{A}{\mathbf {D}}^{A}{\mathbf {y}}\nonumber \\&\qquad +o_{p}\left( n_{*}^{-1/2} \right) \nonumber \\&\quad =\widehat{\overline{{\mathbf {Y}}}}^{PHT}+{\widehat{N}}^{-1}\left( \sum \limits _{i\in {s_{B}}}d_{i}^{B}x_{i}^{\text {T}}-\sum \limits _{i\in {s_{A}}}d_{i}^{A}x_{i}^{\text {T}} \right) \left( \sum \limits _{i\in {s_{A}}}q_{i}^{A}d_{i}^{A}x_{i}x_{i}^{\text {T}} \right) ^{-1}\sum \limits _{i\in {s_{A}}}q_{i}^{A}d_{i}^{A}x_{i}y_{i}\nonumber \\&\qquad +o_{p}\left( n_{*}^{-1/2} \right) \nonumber \\&\quad =\widehat{\overline{{\mathbf {Y}}}}^{ECGREG}+o_{p}\left( n_{*}^{-1/2} \right) , \end{aligned}$$
(A6)

where \(\widehat{\overline{{\mathbf {Y}}}}^{ECGREG}=\widehat{\overline{{\mathbf {Y}}}}^{PHT}+{\widehat{N}}^{-1}\left( \sum \limits _{i\in {s_{B}}}d_{i}^{B}x_{i}^{\text {T}}-\sum \limits _{i\in {s_{A}}}d_{i}^{A}x_{i}^{\text {T}} \right) \left( \sum \limits _{i\in {s_{A}}}q_{i}^{A}d_{i}^{A}x_{i}x_{i}^{\text {T}} \right) ^{-1}\sum \limits _{i\in {s_{A}}}q_{i}^{A}d_{i}^{A}x_{i}y_{i}\) is the population mean estimator based on the estimated control generalized regression model, and \(\widehat{\overline{{\mathbf {Y}}}}^{PHT}={\widehat{N}}^{-1}\left( {\mathbf {d}}^{A} \right) ^{\text {T}}{\mathbf {y}}\). \(\square\)
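To make the linearized forms (A4)–(A6) concrete, the following minimal Python sketch computes the estimated control calibrated weights and the resulting mean estimate on simulated data. The inputs (design weights, covariates, the \(q_{i}^{A}\)) and the use of the sum of the calibrated weights as \({\widehat{N}}\) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def ecgpel_weights(dA, XA, qA, dB, XB):
    """Linearized estimated control calibrated weights w_i = d_i^A (1 + q_i^A x_i^T lambda),
    with lambda taken from the closed form in (A4); a sketch, not the exact solution of (A1)."""
    T_B = dB @ XB                                   # estimated control totals from s_B
    T_A = dA @ XA                                   # Horvitz-Thompson-type totals from s_A
    M = XA.T @ (dA[:, None] * qA[:, None] * XA)     # sum_i d_i^A q_i^A x_i x_i^T
    lam = np.linalg.solve(M, T_B - T_A)
    return dA * (1.0 + qA * (XA @ lam))

# Toy illustration with simulated samples (all inputs hypothetical)
rng = np.random.default_rng(0)
nA, nB, p = 300, 500, 4
XA, XB = rng.normal(size=(nA, p)), rng.normal(size=(nB, p))
dA, dB = rng.uniform(5.0, 15.0, nA), rng.uniform(5.0, 15.0, nB)
qA = np.ones(nA)
y = XA @ np.array([1.0, -0.5, 0.0, 2.0]) + rng.normal(size=nA)
w = ecgpel_weights(dA, XA, qA, dB, XB)
print((w @ y) / w.sum())    # ECGPEL-type estimate of the population mean
```

By construction of the linearized \(\lambda\), the weighted covariate totals \(\sum _{i\in {s_{A}}}w_{i}x_{i}\) from this sketch reproduce the estimated controls \(\sum _{i\in {s_{B}}}d_{i}^{B}x_{i}\) exactly.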

Proof of Lemma 2

The estimated control model-assisted calibrated weights can be expressed as \(w_{i}=d_{i}^{A}g_{i}\), where \(g_{i}=1/\left( 1-q_{i}^{A}\nu _{i}\lambda \right)\) by the standard Lagrange multiplier method. The Lagrange multiplier \(\lambda =\left( \lambda _{1},\lambda _{2} \right) ^{\text {T}}\) can be obtained by solving the following equation

$$\begin{aligned} \sum \limits _{i\in {s_{A}}}\frac{d_{i}^{A}\nu _{i}}{1-q_{i}^{A}\nu _{i}\lambda }-\sum \limits _{i\in {s_{B}}}d_{i}^{B}\nu _{i}=0. \end{aligned}$$
(A7)

Writing \(d_{i}^{A}\nu _{i}=d_{i}^{A}\nu _{i}\left( 1-q_{i}^{A}\nu _{i}\lambda +q_{i}^{A}\nu _{i}\lambda \right)\) and substituting this into (A7), we obtain

$$\begin{aligned}&\sum \limits _{i\in {s_{A}}}\frac{d_{i}^{A}\nu _{i}\left( 1-q_{i}^{A}\nu _{i}\lambda +q_{i}^{A}\nu _{i}\lambda \right) }{1-q_{i}^{A}\nu _{i}\lambda }-\sum \limits _{i\in {s_{B}}}d_{i}^{B}\nu _{i}=0,\nonumber \\&\sum \limits _{i\in {s_{A}}}d_{i}^{A}\nu _{i}+\sum \limits _{i\in {s_{A}}}\frac{\lambda ^{\text {T}}d_{i}^{A}q_{i}^{A}\nu _{i}^{\text {T}}\nu _{i}}{1-q_{i}^{A}\nu _{i}\lambda }-\sum \limits _{i\in {s_{B}}}d_{i}^{B}\nu _{i}=0,\nonumber \\&\lambda ^{\text {T}}\sum \limits _{i\in {s_{A}}}\frac{d_{i}^{A}q_{i}^{A}\nu _{i}^{\text {T}}\nu _{i}}{1-\lambda ^{\text {T}}q_{i}^{A}\nu _{i}^{\text {T}}}=\sum \limits _{i\in {s_{B}}}d_{i}^{B}\nu _{i}-\sum \limits _{i\in {s_{A}}}d_{i}^{A}\nu _{i}. \end{aligned}$$
(A8)

It follows from (A8) that

$$\begin{aligned} \frac{\lambda ^{*\text {T}}}{1+\lambda ^{*\text {T}}u^{*}}\sum \limits _{i\in {s_{A}}}d_{i}^{A}q_{i}^{A}\nu _{i}^{\text {T}}\nu _{i}\le {{\mathbf {K}}}, \end{aligned}$$
(A9)

where \(\lambda ^{*}=(\left| \lambda _{1}\right| , \left| \lambda _{2}\right| )^{\text {T}}\), \(u^{*}=\left( \max \limits _{i\in {s_{A}}}\left| q_{i}^{A}\right| ,\max \limits _{i\in {s_{A}}}\left| q_{i}^{A}{\hat{\mu }}_{i}\right| \right) ^{\text {T}}\) and its order is \(o_{p}\left( n_{A}^{1/2} \right)\) by the condition (B1), and \({\mathbf {K}}=\left( \left| \sum \nolimits _{i\in {s_{B}}}d_{i}^{B}-\sum \nolimits _{i\in {s_{A}}}d_{i}^{A}\right| ,\left| \sum \nolimits _{i\in {s_{B}}}d_{i}^{B}{\hat{\mu }}_{i}-\sum \nolimits _{i\in {s_{A}}}d_{i}^{A}{\hat{\mu }}_{i}\right| \right)\). The condition (B2) implies \(\widehat{{\mathbf {T}}}^{M}_{A}={\mathbf {T}}^{M}+O_{p}\left( Nn_{A}^{-1/2} \right)\), where \(\widehat{{\mathbf {T}}}^{M}_{A}=\left( \widehat{{\mathbf {T}}}^{M}_{A1},\widehat{{\mathbf {T}}}^{M}_{A2} \right) =\left( \sum \limits _{i\in {s_{A}}}d_{i}^{A},\sum \limits _{i\in {s_{A}}}d_{i}^{A}{\hat{\mu }}_{i} \right)\), and the condition (B3) implies \(\widehat{{\mathbf {T}}}^{M}_{B}={\mathbf {T}}^{M}+O_{p}\left( Nn_{B}^{-1/2} \right)\). Furthermore, we can obtain \(\widehat{{\mathbf {T}}}^{{\mathbf {M}}}_{B}-\widehat{{\mathbf {T}}}^{{\mathbf {M}}}_{A}=O_{p}\left( Nn_{*}^{-1/2} \right)\), where \(n_{*}=\min \left( n_{A},n_{B} \right)\). Under conditions (B1)–(B4), we have \(\sum \limits _{i\in {s_{A}}}d_{i}^{A}q_{i}^{A}\nu _{i}^{\text {T}}\nu _{i}=O\left( N \right)\), \(\lambda ^{\text {T}}=O_{p}\left( n_{*}^{-1/2} \right)\) and \(\max \limits _{i\in {s_{A}}}|\lambda ^{\text {T}}q_{i}^{A}\nu _{i}|=o_{p}(1)\). Furthermore, we can obtain

$$\begin{aligned}&\left( \lambda ^{ECMCGPEL} \right) ^{\text {T}}\nonumber \\&\quad =\left( \sum \limits _{i\in {s_{B}}}d_{i}^{B}\nu _{i}-\sum \limits _{i\in {s_{A}}}d_{i}^{A}\nu _{i} \right) \left( \sum \limits _{i\in {s_{A}}}d_{i}^{A}q_{i}^{A}\nu _{i}^{\text {T}}\nu _{i} \right) ^{-1}\left( 1+o_{p}\left( 1 \right) \right) \nonumber \\&\quad =\left( \sum \limits _{i\in {s_{B}}}d_{i}^{B}\nu _{i}-\sum \limits _{i\in {s_{A}}}d_{i}^{A}\nu _{i} \right) \left( \sum \limits _{i\in {s_{A}}}d_{i}^{A}q_{i}^{A}\nu _{i}^{\text {T}}\nu _{i} \right) ^{-1} \nonumber \\&\qquad +\left( \sum \limits _{i\in {s_{B}}}d_{i}^{B}\nu _{i}-\sum \limits _{i\in {s_{A}}}d_{i}^{A}\nu _{i} \right) \left( \sum \limits _{i\in {s_{A}}}d_{i}^{A}q_{i}^{A}\nu _{i}^{\text {T}}\nu _{i} \right) ^{-1}o_{p}\left( 1 \right) \nonumber \\&\quad =\left( \sum \limits _{i\in {s_{B}}}d_{i}^{B}\nu _{i}-\sum \limits _{i\in {s_{A}}}d_{i}^{A}\nu _{i} \right) \left( \sum \limits _{i\in {s_{A}}}d_{i}^{A}q_{i}^{A}\nu _{i}^{\text {T}}\nu _{i} \right) ^{-1}\nonumber \\&\qquad +O_{p}\left( Nn_{*}^{-1/2} \right) O_{p}\left( 1/N \right) o_{p}\left( 1 \right) \nonumber \\&\quad =\left( \sum \limits _{i\in {s_{B}}}d_{i}^{B}\nu _{i}-\sum \limits _{i\in {s_{A}}}d_{i}^{A}\nu _{i} \right) \left( \sum \limits _{i\in {s_{A}}}d_{i}^{A}q_{i}^{A}\nu _{i}^{\text {T}}\nu _{i} \right) ^{-1}+o_{p}\left( n_{*}^{-1/2} \right) . \nonumber \end{aligned}$$

Then, we can obtain

$$\begin{aligned} \lambda ^{ECMCGPEL}&=\left( \sum \limits _{i\in {s_{A}}}d_{i}^{A}q_{i}^{A}\nu _{i}^{\text {T}}\nu _{i} \right) ^{-1}\left[ \left( {\mathbf {d}}^{B} \right) ^{\text {T}}{\mathbf {M}}_{B}-\left( {\mathbf {d}}^{A} \right) ^{\text {T}}{\mathbf {M}}_{A} \right] ^{\text {T}}+o_{p}\left( n_{*}^{-1/2} \right) \nonumber \\&=\left( {\mathbf {M}}_{A}^{\text {T}}{\mathbf {D}}^{A}{\mathbf {Q}}^{A}{\mathbf {M}}_{A} \right) ^{-1}\left[ \left( {\mathbf {d}}^{B} \right) ^{\text {T}}{\mathbf {M}}_{B}-\left( {\mathbf {d}}^{A} \right) ^{\text {T}}{\mathbf {M}}_{A} \right] ^{\text {T}}+o_{p}\left( n_{*}^{-1/2} \right) , \end{aligned}$$
(A10)

where \({\mathbf {d}}^{B}=\left( d_{1}^{B},d_{2}^{B},\ldots ,d_{n_{B}}^{B} \right) ^{\text {T}}\), \({\mathbf {M}}_{B}=({\mathbf {M}}_{B1}, {\mathbf {M}}_{B2})=\left( {\mathbf {1}}_{(n_{B} \times 1)},\left( {\hat{\mu }}_{i} \right) _{i\in {s_{B}}} \right)\). By Taylor expansion, \(g_{i}=1/\left( 1-q_{i}^{A}\nu _{i}\lambda \right) =1+q_{i}^{A}\nu _{i}\lambda +o_{p}\left( n_{A}^{-1/2} \right) \doteq 1+q_{i}^{A}\nu _{i}\lambda\). Then, the estimator of the estimated control model-assisted calibrated weights is given by

$$\begin{aligned}&{\mathbf {W}}^{ECMCGPEL}\nonumber \\&\quad ={\mathbf {D}}^{A}\left[ {\mathbf {E}}+{\mathbf {Q}}^{A}{\mathbf {M}}_{A}\lambda ^{ECMCGPEL}+o_{p}\left( n_{A}^{-1/2} \right) \right] \nonumber \\&\quad ={\mathbf {d}}^{A}+{\mathbf {D}}^{A}{\mathbf {Q}}^{A}{\mathbf {M}}_{A}\left( {\mathbf {M}}_{A}^{\text {T}}{\mathbf {D}}^{A}{\mathbf {Q}}^{A}{\mathbf {M}}_{A} \right) ^{-1}\left[ \left( {\mathbf {d}}^{B} \right) ^{\text {T}}{\mathbf {M}}_{B}-\left( {\mathbf {d}}^{A} \right) ^{\text {T}}{\mathbf {M}}_{A} \right] ^{\text {T}}+o_{p}\left( n_{*}^{-1/2} \right) \nonumber \\&\qquad +o_{p}\left( n_{A}^{-1/2} \right) \nonumber \\&\quad ={\mathbf {d}}^{A}+{\mathbf {D}}^{A}{\mathbf {Q}}^{A}{\mathbf {M}}_{A}\left( {\mathbf {M}}_{A}^{\text {T}}{\mathbf {D}}^{A}{\mathbf {Q}}^{A}{\mathbf {M}}_{A} \right) ^{-1}\left[ \left( {\mathbf {d}}^{B} \right) ^{\text {T}}{\mathbf {M}}_{B}-\left( {\mathbf {d}}^{A} \right) ^{\text {T}}{\mathbf {M}}_{A} \right] ^{\text {T}}+o_{p}\left( n_{*}^{-1/2} \right) . \end{aligned}$$

Thus, the ECMCGPEL estimator of the population mean is

$$\begin{aligned}&\widehat{\overline{{\mathbf {Y}}}}^{ECMCGPEL}\nonumber \\&\quad ={\widehat{N}}^{-1}\left( {\mathbf {W}}^{ECMCGPEL} \right) ^{\text {T}}{\mathbf {y}}\nonumber \\&\quad ={\widehat{N}}^{-1}\left( {\mathbf {d}}^{A} \right) ^{\text {T}}{\mathbf {y}}+{\widehat{N}}^{-1}\left[ \left( {\mathbf {d}}^{B} \right) ^{\text {T}}{\mathbf {M}}_{B}-\left( {\mathbf {d}}^{A} \right) ^{\text {T}}{\mathbf {M}}_{A} \right] \left( {\mathbf {M}}_{A}^{\text {T}}{\mathbf {Q}}^{A}{\mathbf {D}}^{A}{\mathbf {M}}_{A} \right) ^{-1}\nonumber \\&\qquad \cdot {\mathbf {M}}_{A}^{\text {T}}{\mathbf {Q}}^{A}{\mathbf {D}}^{A}{\mathbf {y}}+o_{p}\left( n_{*}^{-1/2} \right) \nonumber \\&\quad ={\widehat{N}}^{-1}\sum \limits _{i\in {s_{A}}}d_{i}^{A}y_{i}+{\widehat{N}}^{-1}\left( \sum _{i\in {s_{B}}}d_{i}^{B}{\hat{\mu }}_{i}-\sum _{i\in {s_{A}}}d_{i}^{A}{\hat{\mu }}_{i} \right) \widehat{{\mathbf {H}}}^{MC}+o_{p}\left( n_{*}^{-1/2} \right) , \end{aligned}$$
(A11)

where

$$\begin{aligned} \widehat{{\mathbf {H}}}^{MC}= & {} \frac{\sum _{i\in {s_{A}}}q_{i}^{A}d_{i}^{A}({\hat{\mu }}_{i}-\hat{{\bar{\mu }}})\left( y_{i}-{\bar{y}} \right) }{\sum _{i\in {s_{A}}}q_{i}^{A}d_{i}^{A}({\hat{\mu }}_{i}-\hat{{\bar{\mu }}})^{2}},\\ \hat{{\bar{\mu }}}= & {} \frac{\sum _{i\in {s_{A}}}q_{i}^{A}d_{i}^{A}{\hat{\mu }}_{i}}{\sum _{i\in {s_{A}}}q_{i}^{A}d_{i}^{A}},\\ {\bar{y}}= & {} \frac{\sum _{i\in {s_{A}}}q_{i}^{A}d_{i}^{A}y_{i}}{\sum _{i\in {s_{A}}}q_{i}^{A}d_{i}^{A}}. \end{aligned}$$

\(\square\)
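As a numerical companion to (A11), the sketch below computes \(\hat{{\bar{\mu }}}\), \({\bar{y}}\), \(\widehat{{\mathbf {H}}}^{MC}\) and the resulting ECMCGPEL-type mean estimate. The fitted values \({\hat{\mu }}_{i}\) come from an ordinary least-squares working model as a stand-in for the paper's penalized fit, and \({\widehat{N}}\) is taken as \(\sum _{i\in {s_{B}}}d_{i}^{B}\); both are assumptions made only for illustration.

```python
import numpy as np

def ecmcgpel_mean(dA, qA, y, muA_hat, dB, muB_hat, N_hat):
    """Model-assisted mean estimator following (A11): pseudo Horvitz-Thompson term
    plus a calibration correction scaled by H-hat^{MC}; a sketch only."""
    mu_bar = np.sum(qA * dA * muA_hat) / np.sum(qA * dA)
    y_bar = np.sum(qA * dA * y) / np.sum(qA * dA)
    H = np.sum(qA * dA * (muA_hat - mu_bar) * (y - y_bar)) \
        / np.sum(qA * dA * (muA_hat - mu_bar) ** 2)
    pht = np.sum(dA * y) / N_hat
    return pht + (np.sum(dB * muB_hat) - np.sum(dA * muA_hat)) * H / N_hat

# Hypothetical illustration: fit a working model on s_A, predict on both samples
rng = np.random.default_rng(1)
nA, nB, p = 300, 500, 3
XA, XB = rng.normal(size=(nA, p)), rng.normal(size=(nB, p))
dA, dB = rng.uniform(5.0, 15.0, nA), rng.uniform(5.0, 15.0, nB)
qA = np.ones(nA)
y = XA @ np.array([1.0, 0.0, -2.0]) + rng.normal(size=nA)
B_hat, *_ = np.linalg.lstsq(XA, y, rcond=None)   # stand-in for the SCAD estimator
print(ecmcgpel_mean(dA, qA, y, XA @ B_hat, dB, XB @ B_hat, N_hat=dB.sum()))
```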

Proof of Lemma 3

Taking the variance of the expression in (3.6), we have

$$\begin{aligned}&V\Big (\widehat{\overline{{\mathbf {Y}}}}^{ECMCGPEL}\Big )\nonumber \\&\quad \doteq V_{{\mathcal {A}}}\Big ({\widehat{N}}^{-1}\sum \limits _{i\in {s_{A}}}d_{i}^{A}y_{i}+{\widehat{N}}^{-1}\Big (\sum \limits _{i\in {s_{B}}}d_{i}^{B}{\hat{\mu }}_{i}-\sum \limits _{i\in {s_{A}}}d_{i}^{A}{\hat{\mu }}_{i}\Big )\widehat{{\mathbf {H}}}^{MC}\Big ) \nonumber \\&\quad = {\widehat{N}}^{-2}V_{{\mathcal {A}}}\Big [\sum \limits _{i\in {s_{A}}}d_{i}^{A}(y_{i} - {\hat{\mu }}_{i}\widehat{{\mathbf {H}}}^{MC}) + \sum \limits _{i\in {s_{B}}}d_{i}^{B}{\hat{\mu }}_{i}\widehat{{\mathbf {H}}}^{MC}\Big ] \nonumber \\&\quad ={\widehat{N}}^{-2}\bigg \{V_{{\mathcal {A}}}\Big [E_{{\mathcal {B}}}\Big (\sum \limits _{i\in {s_{A}}}d_{i}^{A}(y_{i} - {\hat{\mu }}_{i}\widehat{{\mathbf {H}}}^{MC}) + \sum \limits _{i\in {s_{B}}}d_{i}^{B}{\hat{\mu }}_{i}\widehat{{\mathbf {H}}}^{MC}\Big )\Big ] \nonumber \\&\qquad + E_{{\mathcal {A}}}\Big [V_{{\mathcal {B}}}\Big (\sum \limits _{i\in {s_{A}}}d_{i}^{A}(y_{i} - {\hat{\mu }}_{i}\widehat{{\mathbf {H}}}^{MC}) + \sum \limits _{i\in {s_{B}}}d_{i}^{B}{\hat{\mu }}_{i}\widehat{{\mathbf {H}}}^{MC}\Big )\Big ]\bigg \}\nonumber \\&\quad = {\widehat{N}}^{-2}\bigg \{V_{{\mathcal {A}}}\Big [\sum \limits _{i\in {s_{A}}}d_{i}^{A}(y_{i} - {\hat{\mu }}_{i}\widehat{{\mathbf {H}}}^{MC}) + \sum \limits _{i\in {U}}{\hat{\mu }}_{i}\widehat{{\mathbf {H}}}^{MC}\Big ] + V_{{\mathcal {B}}}\Big (\sum \limits _{i\in {s_{B}}}d_{i}^{B}{\hat{\mu }}_{i}\widehat{{\mathbf {H}}}^{MC}\Big )\bigg \}\nonumber \\&\quad = {\widehat{N}}^{-2} \bigg \{\sum _{i\in {U}}\bigg (\frac{y_{i}-{\hat{\mu }}_{i}\widehat{{\mathbf {H}}}^{MC}}{\pi _{i}^{A}}\bigg )^{2}\pi _{i}^{A}(1-\pi _{i}^{A}) + \sum _{i\in {U}}\bigg (\frac{{\hat{\mu }}_{i}\widehat{{\mathbf {H}}}^{MC}}{\pi _{i}^{B}}\bigg )^{2}\pi _{i}^{B}(1-\pi _{i}^{B})\nonumber \\&\qquad + \sum _{i\in {U}}\sum _{j \ne i} (\pi _{ij}^{A}-\pi _{i}^{A}\pi _{j}^{A}) \frac{y_{i}-{\hat{\mu }}_{i}\widehat{{\mathbf {H}}}^{MC}}{\pi _{i}^{A}} \frac{y_{j}-{\hat{\mu }}_{j}\widehat{{\mathbf {H}}}^{MC}}{\pi _{j}^{A}}\nonumber \\&\qquad + \sum _{i\in {U}}\sum _{j \ne i} (\pi _{ij}^{B}-\pi _{i}^{B}\pi _{j}^{B}) \frac{{\hat{\mu }}_{i}\widehat{{\mathbf {H}}}^{MC}}{\pi _{i}^{B}} \frac{{\hat{\mu }}_{j}\widehat{{\mathbf {H}}}^{MC}}{\pi _{j}^{B}}\bigg \}\,. \end{aligned}$$
(A12)

Thus, Lemma 3 holds. \(\square\)
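The variance expression (A12) simplifies considerably under Poisson sampling, where \(\pi _{ij}^{A}=\pi _{i}^{A}\pi _{j}^{A}\) and \(\pi _{ij}^{B}=\pi _{i}^{B}\pi _{j}^{B}\) for \(i\ne j\), so the two double-sum terms vanish. The sketch below evaluates that special case from population-level vectors; the inputs and the Poisson-sampling assumption are illustrative and not part of the lemma.

```python
import numpy as np

def var_ecmcgpel_poisson(piA, piB, y, mu_hat, H, N_hat):
    """Variance (A12) specialized to Poisson sampling in both samples:
    only the two single-sum terms remain; a sketch taking population-level vectors."""
    e = y - mu_hat * H                                            # calibration residuals
    term_A = np.sum((e / piA) ** 2 * piA * (1.0 - piA))           # contribution from s_A
    term_B = np.sum((mu_hat * H / piB) ** 2 * piB * (1.0 - piB))  # contribution from s_B
    return (term_A + term_B) / N_hat ** 2
```

For general designs, the two double sums in (A12) must be added back, which requires the joint inclusion probabilities \(\pi _{ij}^{A}\) and \(\pi _{ij}^{B}\).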

Proof of Theorem 1

Under the condition (C6), the SCAD regression satisfies the oracle property according to Theorems 1 and 2 in Fan and Li (2001), that is

$$\begin{aligned}&\text {Pr}\left( \widehat{{\mathbf {B}}}^{\left( 2 \right) }=0 \right) \rightarrow 1,\nonumber \\ \sqrt{n_{A}}\left\{ I_{1}\left( {\mathbf {B}}^{\left( 1 \right) } \right) +\Sigma \right\}&\left\{ \widehat{{\mathbf {B}}}^{\left( 1 \right) }-{\mathbf {B}}^{\left( 1 \right) }+\left( I_{1}\left( {\mathbf {B}}^{\left( 1 \right) } \right) +\Sigma \right) ^{-1}{\mathbf {b}}\right\} \rightarrow {N}\left\{ 0,I_{1}\left( {\mathbf {B}}^{\left( 1 \right) } \right) \right\} , \end{aligned}$$

where \(I_{1}\left( {\mathbf {B}}^{\left( 1 \right) } \right) =I_{1}\left( {\mathbf {B}}^{\left( 1 \right) },0 \right)\) is the Fisher information of \({\mathbf {B}}^{\left( 1 \right) }\) when \({\mathbf {B}}^{\left( 2 \right) }=0\) is known, \(\Sigma =\text {diag}\left\{ p''_{\lambda _{n_{A}}}\left( \left| {\mathbf {B}}_{1}\right| \right) ,\ldots ,p''_{\lambda _{n_{A}}}\left( \left| {\mathbf {B}}_{q}\right| \right) \right\}\), \({\mathbf {b}}=\left( p'_{\lambda _{n_{A}}}\left( \left| {\mathbf {B}}_{1}\right| \right) \text {sgn}\left( {\mathbf {B}}_{1} \right) ,\ldots ,p'_{\lambda _{n_{A}}}\left( \left| {\mathbf {B}}_{q}\right| \right) \text {sgn}\left( {\mathbf {B}}_{q} \right) \right)\), and \({\mathbf {B}}^{\left( 1 \right) }=\left( {\mathbf {B}}_{1},\ldots ,{\mathbf {B}}_{q} \right) ^{\text {T}}\). If \(\max \left\{ \left| p''_{\lambda _{n_{A}}}\left( \left| {\mathbf {B}}_{j}\right| \right) \right| :{\mathbf {B}}_{j}\ne 0,j=1,2,\ldots ,q\right\} \rightarrow 0\), we have \(\widehat{{\mathbf {B}}}^{\left( 1 \right) }={\mathbf {B}}^{\left( 1 \right) }+O_{p}\left( n_{A}^{-1/2} \right)\).
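For reference, the following sketch implements the SCAD penalty and its first derivative as defined in Fan and Li (2001), with the commonly used value \(a=3.7\); it only illustrates the penalty entering \(p'_{\lambda _{n_{A}}}\) and \(p''_{\lambda _{n_{A}}}\) above, not the authors' fitting algorithm.

```python
import numpy as np

def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty of Fan and Li (2001) evaluated at |theta|; a = 3.7 is their suggested default."""
    t = np.abs(theta)
    p1 = lam * t                                                 # linear part, t <= lam
    p2 = (2 * a * lam * t - t ** 2 - lam ** 2) / (2 * (a - 1))   # quadratic part, lam < t <= a*lam
    p3 = lam ** 2 * (a + 1) / 2                                  # constant part, t > a*lam
    return np.where(t <= lam, p1, np.where(t <= a * lam, p2, p3))

def scad_derivative(theta, lam, a=3.7):
    """p'_lam(t) = lam * { I(t <= lam) + (a*lam - t)_+ / ((a - 1)*lam) * I(t > lam) } for t = |theta|."""
    t = np.abs(theta)
    return lam * ((t <= lam) + np.maximum(a * lam - t, 0.0) / ((a - 1) * lam) * (t > lam))
```

Because the derivative vanishes for \(|\theta |>a\lambda\), large coefficients are left essentially unpenalized, which is the source of the oracle behaviour quoted above.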

We use SCAD to estimate the model parameter \(\beta\) from the non-probability sample \(s_{A}\), obtaining the SCAD coefficient estimator \(\widehat{{\mathbf {B}}}\) of \(\beta\). Similar to the proof of Lemma 2, the ECMC-SCAD-GPEL estimator of the population mean is given by

$$\begin{aligned}&\widehat{\overline{{\mathbf {Y}}}}^{ECMCSCADGPEL}\nonumber \\&\quad ={\widehat{N}}^{-1}\left( {\mathbf {W}}^{ECMCSCADGPEL} \right) ^{\text {T}}{\mathbf {y}}\nonumber \\&\quad ={\widehat{N}}^{-1}\left( {\mathbf {d}}^{A} \right) ^{\text {T}}{\mathbf {y}}+{\widehat{N}}^{-1}\left[ \left( {\mathbf {d}}^{B} \right) ^{\text {T}}{\mathbf {M}}_{B}-\left( {\mathbf {d}}^{A} \right) ^{\text {T}}{\mathbf {M}}_{A} \right] \left( {\mathbf {M}}_{A}^{\text {T}}{\mathbf {Q}}^{A}{\mathbf {D}}^{A}{\mathbf {M}}_{A} \right) ^{-1}\nonumber \\&\qquad \cdot {\mathbf {M}}_{A}^{\text {T}}{\mathbf {Q}}^{A}{\mathbf {D}}^{A}{\mathbf {y}}+o_{p}\left( n_{*}^{-1/2} \right) \nonumber \\&\quad ={\widehat{N}}^{-1}\sum \limits _{i\in {s_{A}}}d_{i}^{A}y_{i}+{\widehat{N}}^{-1}\left( \sum _{i\in {s_{B}}}d_{i}^{B}{\hat{\mu }}_{i}-\sum _{i\in {s_{A}}}d_{i}^{A}{\hat{\mu }}_{i} \right) \widehat{{\mathbf {H}}}^{MC}+o_{p}\left( n_{*}^{-1/2} \right) , \end{aligned}$$
(A13)

where

$$\begin{aligned} \widehat{{\mathbf {H}}}^{MC}= & {} \frac{\sum _{i\in {s_{A}}}q_{i}^{A}d_{i}^{A}({\hat{\mu }}_{i}-\hat{{\bar{\mu }}})\left( y_{i}-{\bar{y}} \right) }{\sum _{i\in {s_{A}}}q_{i}^{A}d_{i}^{A}({\hat{\mu }}_{i}-\hat{{\bar{\mu }}})^{2}},\\ \hat{{\bar{\mu }}}= & {} \frac{\sum _{i\in {s_{A}}}q_{i}^{A}d_{i}^{A}{\hat{\mu }}_{i}}{\sum _{i\in {s_{A}}}q_{i}^{A}d_{i}^{A}},\\ {\bar{y}}= & {} \frac{\sum _{i\in {s_{A}}}q_{i}^{A}d_{i}^{A}y_{i}}{\sum _{i\in {s_{A}}}q_{i}^{A}d_{i}^{A}}. \end{aligned}$$

We now derive the consistency of the population mean estimator \(\widehat{\overline{{\mathbf {Y}}}}^{ECMCSCADGPEL}\). Under conditions (C1)–(C3), the second-order Taylor series expansion of \(\mu (x_{i},\widehat{{\mathbf {B}}})\) around \({\mathbf {B}}\) is:

$$\begin{aligned}&\mu (x_{i},\widehat{{\mathbf {B}}})=\mu (x_{i},{\mathbf {B}})+\left\{ \frac{\partial \mu \left( x_{i},s \right) }{\partial {s}}|_{s={\mathbf {B}}}\right\} ^{\text {T}}\left( \widehat{{\mathbf {B}}}-{\mathbf {B}} \right) \nonumber \\&\quad + \left( \widehat{{\mathbf {B}}}-{\mathbf {B}} \right) ^{\text {T}}\left\{ \frac{\partial ^{2}{\mu \left( x_{i},s \right) }}{\partial {s}\partial {s}^{\text {T}}}|_{s=\mathbf {B^{*}}}\right\} \left( \widehat{{\mathbf {B}}}-{\mathbf {B}} \right) , \end{aligned}$$
(A14)

for \({\mathbf {B}}^{*}\in (\widehat{{\mathbf {B}}},{\mathbf {B}})\) or \(({\mathbf {B}},\widehat{{\mathbf {B}}})\). Let

$$\begin{aligned} {\mathbf {f}}\left( x_{i},{\mathbf {B}} \right)&=\frac{\partial \mu \left( x_{i},s \right) }{\partial {s}}\mid _{s={\mathbf {B}}},\nonumber \\ {\mathbf {g}}\left( x_{i},{\mathbf {B}}^{*} \right)&=\frac{\partial ^{2}{\mu \left( x_{i},s \right) }}{\partial {s}\partial {s}^{\text {T}}}|_{s=\mathbf {B^{*}}}. \end{aligned}$$

Note that \({\mathbf {f}}\) is a vector of length p and \({\mathbf {g}}\) is a \(p\times {p}\) matrix, where p is the dimension of the parameter \(\beta\). Under conditions (C2) and (C3),

$$\begin{aligned} \max _{j}\left| {\mathbf {f}}_{j}\left( x_{i},{\mathbf {B}} \right) \right|&\le {f\left( x_{i},{\mathbf {B}} \right) },\nonumber \\ \max _{k,j}\left| {\mathbf {g}}_{kj}\left( x_{i},{\mathbf {B}}^{*} \right) \right|&\le {g\left( x_{i},{\mathbf {B}}^{*} \right) }. \end{aligned}$$
(A15)

By condition (C1), using the probability sample \(s_{B}\) and the second-order Taylor series expansion of \(\mu (x_{i},\widehat{{\mathbf {B}}})\) around \({\mathbf {B}}\), we have

$$\begin{aligned}&N^{-1}\sum \limits _{i\in {s_{B}}}d_{i}^{B}\mu (x_{i},\widehat{{\mathbf {B}}})\nonumber \\&\quad =N^{-1}\sum \limits _{i\in {s_{B}}}d_{i}^{B}\mu (x_{i},{\mathbf {B}})+N^{-1}\sum \limits _{i\in {s_{B}}}d_{i}^{B}{\mathbf {f}}^{\text {T}}(x_{i},{\mathbf {B}})\left( \widehat{{\mathbf {B}}}-{\mathbf {B}} \right) \nonumber \\&\qquad +N^{-1}\sum \limits _{i\in {s_{B}}}d_{i}^{B}\left( \widehat{{\mathbf {B}}}-{\mathbf {B}} \right) ^{\text {T}}{\mathbf {g}}(x_{i},\mathbf {B^{*}})\left( \widehat{{\mathbf {B}}}-{\mathbf {B}} \right) \nonumber \\&\quad =N^{-1}\sum \limits _{i\in {s_{B}}}d_{i}^{B}\mu (x_{i},{\mathbf {B}})+O_{p}\left( n_{B}^{-1/2} \right) +O_{p}\left( n_{B}^{-1/2} \right) O_{p}\left( n_{B}^{-1/2} \right) \nonumber \\&\quad =N^{-1}\sum \limits _{i\in {s_{B}}}d_{i}^{B}\mu (x_{i},{\mathbf {B}})+O_{p}\left( n_{B}^{-1/2} \right) . \end{aligned}$$
(A16)

By Eq. (A16), we obtain \(\sum \limits _{i\in {s_{B}}}d_{i}^{B}\mu (x_{i},\widehat{{\mathbf {B}}})=\sum \limits _{i\in {s_{B}}}d_{i}^{B}\mu (x_{i},{\mathbf {B}})+O_{p}\left( Nn_{B}^{-1/2} \right)\). Combining this result with condition (C5), we have

$$\begin{aligned} \frac{\sum _{i\in {s_{B}}}d_{i}^{B}\mu (x_{i},\widehat{{\mathbf {B}}})}{{\widehat{N}}}&=\frac{\sum _{i\in {s_{B}}}d_{i}^{B}\mu (x_{i},{\mathbf {B}})+O_{p}\left( Nn_{B}^{-1/2} \right) }{{\widehat{N}}}\nonumber \\&=\frac{\sum _{i\in {s_{B}}}d_{i}^{B}\mu (x_{i},{\mathbf {B}})+O_{p}\left( Nn_{B}^{-1/2} \right) }{N+O_{p}\left( Nn_{B}^{-1/2} \right) }\nonumber \\&=N^{-1}\sum _{i\in {s_{B}}}d_{i}^{B}\mu (x_{i},{\mathbf {B}})+o_{p}\left( 1 \right) . \end{aligned}$$
(A17)

Similarly, using the non-probability sample \(s_{A}\) and the second-order Taylor series expansion of \(\mu (x_{i},\widehat{{\mathbf {B}}})\) around \({\mathbf {B}}\), we have

$$\begin{aligned}&N^{-1}\sum \limits _{i\in {s_{A}}}d_{i}^{A}\mu (x_{i},\widehat{{\mathbf {B}}})\nonumber \\&\quad =N^{-1}\sum \limits _{i\in {s_{A}}}d_{i}^{A}\mu (x_{i},{\mathbf {B}})+N^{-1}\sum \limits _{i\in {s_{A}}}d_{i}^{A}{\mathbf {f}}^{\text {T}}(x_{i},{\mathbf {B}})\left( \widehat{{\mathbf {B}}}-{\mathbf {B}} \right) \nonumber \\&\qquad +N^{-1}\sum \limits _{i\in {s_{A}}}d_{i}^{A}\left( \widehat{{\mathbf {B}}}-{\mathbf {B}} \right) ^{\text {T}}{\mathbf {g}}(x_{i},\mathbf {B^{*}})\left( \widehat{{\mathbf {B}}}-{\mathbf {B}} \right) \nonumber \\&\quad =N^{-1}\sum \limits _{i\in {s_{A}}}d_{i}^{A}\mu (x_{i},{\mathbf {B}})+O_{p}\left( n_{A}^{-1/2} \right) +O_{p}\left( n_{A}^{-1/2} \right) O_{p}\left( n_{A}^{-1/2} \right) \nonumber \\&\quad =N^{-1}\sum \limits _{i\in {s_{A}}}d_{i}^{A}\mu (x_{i},{\mathbf {B}})+O_{p}\left( n_{A}^{-1/2} \right) . \end{aligned}$$
(A18)

It follows that \(\sum \limits _{i\in {s_{A}}}d_{i}^{A}\mu (x_{i},\widehat{{\mathbf {B}}})=\sum \limits _{i\in {s_{A}}}d_{i}^{A}\mu (x_{i},{\mathbf {B}})+O_{p}\left( Nn_{A}^{-1/2} \right)\). By condition (C5), we can further get

$$\begin{aligned} \frac{\sum _{i\in {s_{A}}}d_{i}^{A}\mu (x_{i},\widehat{{\mathbf {B}}})}{{\widehat{N}}}&=\frac{\sum _{i\in {s_{A}}}d_{i}^{A}\mu (x_{i},{\mathbf {B}})+O_{p}\left( Nn_{A}^{-1/2} \right) }{{\widehat{N}}}\nonumber \\&=\frac{\sum _{i\in {s_{A}}}d_{i}^{A}\mu (x_{i},{\mathbf {B}})+O_{p}\left( Nn_{A}^{-1/2} \right) }{N+O_{p}\left( Nn_{A}^{-1/2} \right) }\nonumber \\&=N^{-1}\sum _{i\in {s_{A}}}d_{i}^{A}\mu (x_{i},{\mathbf {B}})+o_{p}\left( 1 \right) . \end{aligned}$$
(A19)

By condition (C6) and Eqs. (A17) and (A19), we can obtain

$$\begin{aligned}&{\widehat{N}}^{-1}\sum _{i\in {s_{B}}}d_{i}^{B}\mu (x_{i},\widehat{{\mathbf {B}}})-{\widehat{N}}^{-1}\sum _{i\in {s_{A}}}d_{i}^{A}\mu (x_{i},\widehat{{\mathbf {B}}})\nonumber \\&\quad =N^{-1}\sum _{i\in {s_{B}}}d_{i}^{B}\mu (x_{i},{\mathbf {B}})-N^{-1}\sum _{i\in {s_{A}}}d_{i}^{A}\mu (x_{i},{\mathbf {B}}) +o_{p}\left( 1 \right) -o_{p}\left( 1 \right) \nonumber \\&\quad =N^{-1}\sum _{i\in {s_{B}}}d_{i}^{B}\mu (x_{i},{\mathbf {B}})-N^{-1}\sum _{i\in {s_{A}}}d_{i}^{A}\mu (x_{i},{\mathbf {B}}) +o_{p}\left( 1 \right) . \end{aligned}$$
(A20)

In addition, we have

$$\begin{aligned} \hat{{\bar{\mu }}}=&\sum \limits _{i\in {s_{A}}}q_{i}^{A}d_{i}^{A}\mu (x_{i},\widehat{{\mathbf {B}}})/\sum \limits _{i\in {s_{A}}}q_{i}^{A}d_{i}^{A}\nonumber \\ =&\left( \sum \limits _{i\in {s_{A}}}q_{i}^{A}d_{i}^{A} \right) ^{-1}\bigg (\sum \limits _{i\in {s_{A}}}q_{i}^{A}d_{i}^{A}\Big [\mu (x_{i},{\mathbf {B}}) +{\mathbf {f}}^{\text {T}}(x_{i},{\mathbf {B}})(\widehat{{\mathbf {B}}}-{\mathbf {B}})\nonumber \\&+(\widehat{{\mathbf {B}}}-{\mathbf {B}})^{\text {T}}{\mathbf {g}}(x_{i},\mathbf {B^{*}})(\widehat{{\mathbf {B}}}-{\mathbf {B}})\Big ]\bigg )\nonumber \\ =&\left( \sum \limits _{i\in {s_{A}}}q_{i}^{A}d_{i}^{A} \right) ^{-1}\nonumber \\&\left( \sum \limits _{i\in {s_{A}}}q_{i}^{A}d_{i}^{A}\left[ \mu (x_{i},{\mathbf {B}})+O_{p}\left( n_{A}^{-1/2} \right) +O_{p}\left( n_{A}^{-1/2} \right) O_{p}\left( n_{A}^{-1/2} \right) \right] \right) \nonumber \\ =&\sum \limits _{i\in {s_{A}}}q_{i}^{A}d_{i}^{A}\mu (x_{i},{\mathbf {B}})/\sum \limits _{i\in {s_{A}}}q_{i}^{A}d_{i}^{A}+O_{p}\left( n_{A}^{-1/2} \right) \nonumber \\ =&{\bar{\mu }}+O_{p}\left( n_{A}^{-1/2} \right) . \end{aligned}$$
(A21)

Here \({\bar{\mu }}=\sum \limits _{i\in {s_{A}}}q_{i}^{A}d_{i}^{A}\mu (x_{i},{\mathbf {B}})/\sum \limits _{i\in {s_{A}}}q_{i}^{A}d_{i}^{A}=\sum \limits _{i\in {U}}q_{i}\mu (x_{i},{\mathbf {B}})/\sum \limits _{i\in {U}}q_{i}\). By Eqs. (A14) and (A21), we have

$$\begin{aligned}&N^{-1}\sum _{i\in {s_{A}}}q_{i}^{A}d_{i}^{A}({\hat{\mu }}_{i}-\hat{{\bar{\mu }}})\nonumber \\&\quad =N^{-1}\sum _{i\in {s_{A}}}q_{i}^{A}d_{i}^{A}\left[ \mu (x_{i},{\mathbf {B}})+{\mathbf {f}}^{\text {T}}(x_{i},{\mathbf {B}})(\widehat{{\mathbf {B}}}-{\mathbf {B}}) +(\widehat{{\mathbf {B}}}-{\mathbf {B}})^{\text {T}}{\mathbf {g}}(x_{i},\mathbf {B^{*}})(\widehat{{\mathbf {B}}}-{\mathbf {B}})-\hat{{\bar{\mu }}} \right] \nonumber \\&\quad =N^{-1}\sum _{i\in {s_{A}}}q_{i}^{A}d_{i}^{A}\bigg [\mu (x_{i},{\mathbf {B}})+{\mathbf {f}}^{\text {T}}(x_{i},{\mathbf {B}})(\widehat{{\mathbf {B}}}-{\mathbf {B}}) +(\widehat{{\mathbf {B}}}-{\mathbf {B}})^{\text {T}}{\mathbf {g}}(x_{i},\mathbf {B^{*}})(\widehat{{\mathbf {B}}}-{\mathbf {B}})\nonumber \\&\qquad -{\bar{\mu }}-O_{p}\left( n_{A}^{-1/2} \right) \bigg ]\nonumber \\&\quad =N^{-1}\sum _{i\in {s_{A}}}q_{i}^{A}d_{i}^{A}(\mu _{i}-{\bar{\mu }})+O_{p}\left( n_{A}^{-1/2} \right) +O_{p}\left( n_{A}^{-1} \right) -O_{p}\left( n_{A}^{-1/2} \right) \nonumber \\&\quad =N^{-1}\sum _{i\in {s_{A}}}q_{i}^{A}d_{i}^{A}(\mu _{i}-{\bar{\mu }})+O_{p}\left( n_{A}^{-1/2} \right) . \end{aligned}$$
(A22)

According to (A22), we have

$$\begin{aligned} N^{-1}\sum _{i\in {s_{A}}}q_{i}^{A}d_{i}^{A}({\hat{\mu }}_{i}-\hat{{\bar{\mu }}})^{2} = N^{-1}\sum _{i\in {s_{A}}}q_{i}^{A}d_{i}^{A}(\mu _{i}-{\bar{\mu }})^{2}+O_{p}\left( n_{A}^{-1} \right) \,. \end{aligned}$$
(A23)

By Eqs. (A22) and (A23), we have

$$\begin{aligned} \widehat{{\mathbf {H}}}^{MC}&=\frac{\sum _{i\in {s_{A}}}q_{i}^{A}d_{i}^{A}({\hat{\mu }}_{i}-\hat{{\bar{\mu }}})\left( y_{i}-{\bar{y}} \right) }{\sum _{i\in {s_{A}}}q_{i}^{A}d_{i}^{A}({\hat{\mu }}_{i}-\hat{{\bar{\mu }}})^{2}} =\frac{N^{-1}\sum _{i\in {s_{A}}}q_{i}^{A}d_{i}^{A}({\hat{\mu }}_{i}-\hat{{\bar{\mu }}})\left( y_{i}-{\bar{y}} \right) }{N^{-1}\sum _{i\in {s_{A}}}q_{i}^{A}d_{i}^{A}({\hat{\mu }}_{i}-\hat{{\bar{\mu }}})^{2}}\nonumber \\&=\frac{N^{-1}\sum _{i\in {s_{A}}}q_{i}^{A}d_{i}^{A}(\mu _{i}-{\bar{\mu }})\left( y_{i}-{\bar{y}} \right) +O_{p}\left( n_{A}^{-1/2} \right) }{N^{-1}\sum _{i\in {s_{A}}}q_{i}^{A}d_{i}^{A}(\mu _{i}-{\bar{\mu }})^{2}+O_{p}\left( n_{A}^{-1} \right) }\nonumber \\&\rightarrow {{\mathbf {H}}}^{MC} \quad \text {as} \quad n_{A}\rightarrow \infty . \end{aligned}$$
(A24)

Therefore, \(\widehat{{\mathbf {H}}}^{MC}={\mathbf {H}}^{MC}+o_{p}\left( 1 \right)\). We also have

$$\begin{aligned} \frac{\sum _{i\in {s_{A}}}d_{i}^{A}y_{i}}{{\widehat{N}}}&=\frac{\sum _{i\in {s_{A}}}d_{i}^{A}y_{i}}{N+O_{p}\left( Nn_{A}^{-1/2} \right) }\nonumber \\&=\frac{\sum _{i\in {s_{A}}}d_{i}^{A}y_{i}}{N}+O_{p}\left( n_{A}^{-1/2} \right) . \end{aligned}$$
(A25)

Furthermore, by (A13), (A20), (A24) and (A25), we can obtain

$$\begin{aligned}&\widehat{\overline{{\mathbf {Y}}}}^{ECMCSCADGPEL}\nonumber \\&\quad = {\widehat{N}}^{-1}\sum \limits _{i\in {s_{A}}}d_{i}^{A}y_{i} +\left[ N^{-1}\sum _{i\in {s_{B}}}d_{i}^{B}\mu \left( x_{i},{\mathbf {B}} \right) -N^{-1}\sum _{i\in {s_{A}}}d_{i}^{A}\mu \left( x_{i},{\mathbf {B}} \right) +o_{p}\left( 1 \right) \right] \nonumber \\&\qquad \cdot \left( {\mathbf {H}}^{MC}+o_{p}\left( 1 \right) \right) +o_{p}\left( n_{*}^{-1/2} \right) \nonumber \\&\quad = N^{-1}\sum \limits _{i\in {s_{A}}}d_{i}^{A}y_{i}+O_{p}\left( n_{A}^{-1/2} \right) +N^{-1}\left[ \sum _{i\in {s_{B}}}d_{i}^{B}\mu \left( x_{i},{\mathbf {B}} \right) -\sum _{i\in {s_{A}}}d_{i}^{A}\mu \left( x_{i},{\mathbf {B}} \right) \right] {\mathbf {H}}^{MC}\nonumber \\&\qquad +o_{p}\left( n_{*}^{-1/2} \right) \nonumber \\&\quad = N^{-1}\sum \limits _{i\in {s_{A}}}d_{i}^{A}y_{i} +N^{-1}\left[ \sum _{i\in {s_{B}}}d_{i}^{B}\mu \left( x_{i},{\mathbf {B}} \right) -\sum _{i\in {s_{A}}}d_{i}^{A}\mu \left( x_{i},{\mathbf {B}} \right) \right] {\mathbf {H}}^{MC}+O_{p}\left( n_{A}^{-1/2} \right) , \end{aligned}$$
(A26)

where

$$\begin{aligned} {\mathbf {H}}^{MC}=\frac{\sum _{i\in {U}}q_{i}(\mu _{i}-{\bar{\mu }})\left( y_{i}-{\bar{y}} \right) }{\sum _{i\in {U}}q_{i}(\mu _{i}-{\bar{\mu }})^{2}}, \quad {\bar{\mu }}=\frac{\sum _{i\in {U}}q_{i}{\mu }_{i}}{\sum _{i\in {U}}q_{i}}, \quad {\bar{y}}=\frac{\sum _{i\in {U}}q_{i}{y}_{i}}{\sum _{i\in {U}}q_{i}}. \end{aligned}$$

Therefore, Theorem 1 holds. \(\square\)

Proof of Theorem 2

$$\begin{aligned} \text {E}_{\xi }\left[ {\mathbf {H}}^{MC} \right]&=\text {E}_{\xi }\left[ \frac{\sum _{i\in {U}}q_{i}\left( \mu _{i}-{\bar{\mu }} \right) \left( y_{i}-{\bar{y}} \right) }{\sum _{i\in {U}}q_{i}\left( \mu _{i}-{\bar{\mu }} \right) ^{2}} \right] =\frac{\sum _{i\in {U}}q_{i}\left( \mu _{i}-{\bar{\mu }} \right) \left( \mu _{i}-{\bar{\mu }} \right) }{\sum _{i\in {U}}q_{i}\left( \mu _{i}-{\bar{\mu }} \right) ^{2}}=1\,. \end{aligned}$$

Combining \(\text {E}_{\xi }\left[ y_{i} \right] =\mu _{i}\) and \(\text {E}_{\xi }\left[ {\mathbf {H}}^{MC} \right] =1\), we can get

$$\begin{aligned}&\text {E}_{{\mathcal {B}}}\left[ \text {E}_{\xi }\bigg (\widehat{\overline{{\mathbf {Y}}}}^{ECMCSCADGPEL}-\overline{{\mathbf {Y}}}\bigg ) \right] \nonumber \\&\quad \dot{=} \text {E}_{{\mathcal {B}}}\left[ \text {E}_{\xi }\bigg (N^{-1}\sum _{i\in {s_{A}}}d_{i}^{A} (y_{i} - \mu _{i}{\mathbf {H}}^{MC}) +N^{-1}\sum _{i\in {s_{B}}}d_{i}^{B}\mu _{i}-N^{-1}\sum _{i\in {U}}y_{i}\bigg ) \right] \nonumber \\&\quad = N^{-1}\text {E}_{{\mathcal {B}}}\bigg \{\text {E}_{\xi }\Big [\sum _{i\in {s_{A}}}d_{i}^{A} (y_{i} - \mu _{i}{\mathbf {H}}^{MC}) +\sum _{i\in {s_{B}}}d_{i}^{B}\mu _{i}-\sum _{i\in {U}}y_{i}\Big ]\bigg \}\nonumber \\&\quad = N^{-1} \text {E}_{{\mathcal {B}}}\left[ \sum _{i\in {s_{A}}}d_{i}^{A}(\mu _{i}-\mu _{i}) + \sum _{i\in {s_{B}}}d_{i}^{B}\mu _{i}-\sum _{i\in {U}}\mu _{i} \right] \nonumber \\&\quad = N^{-1}\sum _{i\in {U}}\mu _{i}-N^{-1}\sum _{i\in {U}}\mu _{i}\nonumber \\&\quad =0\,. \end{aligned}$$
(A27)

Thus, Theorem 2 holds. \(\square\)

Proof of Theorem 3

Taking the variance of the expression in (4.5), we have

$$\begin{aligned}&V\Big (\widehat{\overline{{\mathbf {Y}}}}^{ECMCSCADGPEL}\Big )\nonumber \\&\quad \doteq V_{{\mathcal {A}}}\Big (N^{-1}\sum \limits _{i\in {s_{A}}}d_{i}^{A}y_{i}+N^{-1}\Big (\sum \limits _{i\in {s_{B}}}d_{i}^{B}\mu _{i}-\sum \limits _{i\in {s_{A}}}d_{i}^{A}\mu _{i}\Big ){\mathbf {H}}^{MC}\Big ) \nonumber \\&\quad = N^{-2}V_{{\mathcal {A}}}\Big [\sum \limits _{i\in {s_{A}}}d_{i}^{A}(y_{i} - \mu _{i}{\mathbf {H}}^{MC}) + \sum \limits _{i\in {s_{B}}}d_{i}^{B}\mu _{i}{\mathbf {H}}^{MC}\Big ] \nonumber \\&\quad = N^{-2}\bigg \{V_{{\mathcal {A}}}\Big [E_{{\mathcal {B}}}\Big (\sum \limits _{i\in {s_{A}}}d_{i}^{A}(y_{i} - \mu _{i}{\mathbf {H}}^{MC}) + \sum \limits _{i\in {s_{B}}}d_{i}^{B}\mu _{i}{\mathbf {H}}^{MC}\Big )\Big ] \nonumber \\&\qquad + E_{{\mathcal {A}}}\Big [V_{{\mathcal {B}}}\Big (\sum \limits _{i\in {s_{A}}}d_{i}^{A}(y_{i} - \mu _{i}{\mathbf {H}}^{MC}) + \sum \limits _{i\in {s_{B}}}d_{i}^{B}\mu _{i}{\mathbf {H}}^{MC}\Big )\Big ]\bigg \}\nonumber \\&\quad = N^{-2}\bigg \{V_{{\mathcal {A}}}\Big [\sum \limits _{i\in {s_{A}}}d_{i}^{A}(y_{i} - \mu _{i}{\mathbf {H}}^{MC}) + \sum \limits _{i\in {U}}\mu _{i}{\mathbf {H}}^{MC}\Big ] + V_{{\mathcal {B}}}\Big (\sum \limits _{i\in {s_{B}}}d_{i}^{B}\mu _{i}{\mathbf {H}}^{MC}\Big )\bigg \}\nonumber \\&\quad = N^{-2} \bigg \{\sum _{i\in {U}}\bigg (\frac{y_{i}-\mu _{i}{\mathbf {H}}^{MC}}{\pi _{i}^{A}}\bigg )^{2}\pi _{i}^{A}(1-\pi _{i}^{A}) + \sum _{i\in {U}}\bigg (\frac{\mu _{i}{\mathbf {H}}^{MC}}{\pi _{i}^{B}}\bigg )^{2}\pi _{i}^{B}(1-\pi _{i}^{B})\nonumber \\&\qquad + \sum _{i\in {U}}\sum _{j \ne i} (\pi _{ij}^{A}-\pi _{i}^{A}\pi _{j}^{A}) \frac{y_{i}-\mu _{i}{\mathbf {H}}^{MC}}{\pi _{i}^{A}} \frac{y_{j}-\mu _{j}{\mathbf {H}}^{MC}}{\pi _{j}^{A}}\nonumber \\&\qquad + \sum _{i\in {U}}\sum _{j \ne i} (\pi _{ij}^{B}-\pi _{i}^{B}\pi _{j}^{B}) \frac{\mu _{i}{\mathbf {H}}^{MC}}{\pi _{i}^{B}} \frac{\mu _{j}{\mathbf {H}}^{MC}}{\pi _{j}^{B}}\bigg \}\,. \end{aligned}$$
(A28)

Thus, Theorem 3 holds. \(\square\)

About this article

Cite this article

Liu, Z., Tu, C. & Pan, Y. Model-assisted calibration with SCAD to estimated control for non-probability samples. Stat Methods Appl 31, 849–879 (2022). https://doi.org/10.1007/s10260-021-00615-0

