
Generalized regression estimators with concave penalties and a comparison to lasso type estimators


Abstract

The generalized regression (GREG) estimator is commonly used in survey sampling to incorporate auxiliary information. When multiple covariates are available, generally not all of them contribute significantly to the estimation process. We propose two new GREG estimators based on concave penalties: one built from the smoothly clipped absolute deviation (SCAD) penalty and the other from the minimax concave penalty (MCP). Their performance is compared with that of lasso-type estimators through a simulation study under a simple random sample (SRS) design and a probability proportional to size (PPS) design. The proposed estimators are shown to produce improved estimates of the population total compared with those of the traditional GREG estimator and the lasso-based estimators. Asymptotic properties of the proposed estimators are also derived, and bootstrap methods are explored to improve coverage probability when the sample size is small.


Data Availability

The data set is available from the UC Irvine Machine Learning Repository: https://archive.ics.uci.edu/dataset/211/communities+and+crime+unnormalized.

References

1. Cassel, C.M., Särndal, C.E., Wretman, J.H.: Some results on generalized difference estimation and generalized regression estimation for finite populations. Biometrika 63(3), 615–620 (1976)

2. Särndal, C.-E., Swensson, B., Wretman, J.: Model Assisted Survey Sampling. Springer, New York (2003)

3. Fuller, W.A.: Sampling Statistics. John Wiley & Sons, Hoboken, New Jersey (2011)

4. Horvitz, D.G., Thompson, D.J.: A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association 47(260), 663–685 (1952)

5. Tibshirani, R.: Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological) 58(1), 267–288 (1996)

6. Knight, K., Fu, W.: Asymptotics for lasso-type estimators. Annals of Statistics, 1356–1378 (2000)

7. Zou, H.: The adaptive lasso and its oracle properties. Journal of the American Statistical Association 101(476), 1418–1429 (2006)

8. Fan, J.: Comments on "Wavelets in statistics: a review" by A. Antoniadis. Journal of the Italian Statistical Society 6(2), 131–138 (1997)

9. Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96(456), 1348–1360 (2001)

10. Zhang, C.-H.: Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics 38(2), 894–942 (2010)

11. Zhang, T.: Analysis of multi-stage convex relaxation for sparse regularization. Journal of Machine Learning Research 11(3) (2010)

12. Fan, J., Lv, J.: A selective overview of variable selection in high dimensional feature space. Statistica Sinica, 101–148 (2010)

13. Huang, J., Breheny, P., Ma, S.: A selective review of group selection in high-dimensional models. Statistical Science 27(4), 481–499 (2012)

14. Wang, H., Li, G., Jiang, G.: Robust regression shrinkage and consistent variable selection through the LAD-lasso. Journal of Business & Economic Statistics 25(3), 347–355 (2007)

15. Wang, M., Song, L., Tian, G.-L.: SCAD-penalized least absolute deviation regression in high-dimensional models. Communications in Statistics - Theory and Methods 44(12), 2452–2472 (2015)

16. Jiang, H., Zheng, W., Dong, Y.: Sparse and robust estimation with ridge minimax concave penalty. Information Sciences 571, 154–174 (2021)

17. Staerk, C., Kateri, M., Ntzoufras, I.: High-dimensional variable selection via low-dimensional adaptive learning. Electronic Journal of Statistics 15(1), 830–879 (2021)

18. McConville, K.S., Breidt, F.J., Lee, T.C., Moisen, G.G.: Model-assisted survey regression estimation with the lasso. Journal of Survey Statistics and Methodology 5(2), 131–158 (2017)

19. Ta, T., Shao, J., Li, Q., Wang, L.: Generalized regression estimators with high-dimensional covariates. Statistica Sinica 30(3), 1135–1154 (2020)

20. Dagdoug, M., Goga, C., Haziza, D.: Model-assisted estimation through random forests in finite population sampling. Journal of the American Statistical Association, 1–18 (2021)

21. Dagdoug, M., Goga, C., Haziza, D.: Model-assisted estimation in high-dimensional settings for survey data. Journal of Applied Statistics, 1–25 (2022)

22. Chauvet, G., Goga, C.: Asymptotic efficiency of the calibration estimator in a high-dimensional data setting. Journal of Statistical Planning and Inference 217, 177–187 (2022)

23. Wei, F., Zhu, H.: Group coordinate descent algorithms for nonconvex penalized regression. Computational Statistics & Data Analysis 56(2), 316–326 (2012)

24. Fan, Y., Li, R.: Variable selection in linear mixed effects models. Annals of Statistics 40(4), 2043 (2012)

25. Breheny, P., Huang, J.: Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors. Statistics and Computing 25(2), 173–187 (2015)

26. Ma, S., Huang, J.: A concave pairwise fusion approach to subgroup analysis. Journal of the American Statistical Association 112(517), 410–423 (2017)

27. Wang, X., Zhu, Z., Zhang, H.H.: Spatial heterogeneity automatic detection and estimation (2020)

28. Friedman, J., Hastie, T., Tibshirani, R.: Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software 33(1), 1 (2010)

29. McConville, K.: Improved estimation for complex surveys using modern regression techniques. PhD thesis, Colorado State University (2011)

30. Kim, Y., Choi, H., Oh, H.-S.: Smoothly clipped absolute deviation on high dimensions. Journal of the American Statistical Association 103(484), 1665–1673 (2008)

31. Xie, H., Huang, J.: SCAD-penalized regression in high-dimensional partially linear models. The Annals of Statistics 37(2), 673–696 (2009). https://doi.org/10.1214/07-AOS580

32. Wang, L., Li, H., Huang, J.Z.: Variable selection in nonparametric varying-coefficient models for analysis of repeated measurements. Journal of the American Statistical Association 103(484), 1556–1569 (2008)

33. Breheny, P., Huang, J.: Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Annals of Applied Statistics 5(1), 232–253 (2011)

34. Wang, X., Zhu, Z., Zhang, H.H.: Spatial heterogeneity automatic detection and estimation. Computational Statistics & Data Analysis 180, 107667 (2023)

35. Breidt, F.J., Opsomer, J.D.: Local polynomial regression estimators in survey sampling. Annals of Statistics, 1026–1053 (2000)

36. Hájek, J.: Limiting distributions in simple random sampling from a finite population. Publications of the Mathematical Institute of the Hungarian Academy of Sciences 5, 361–374 (1960)

37. Krewski, D., Rao, J.N.K.: Inference from stratified samples: properties of the linearization, jackknife and balanced repeated replication methods. The Annals of Statistics, 1010–1019 (1981)

38. Bickel, P.J., Freedman, D.A.: Asymptotic normality and the bootstrap in stratified sampling. The Annals of Statistics, 470–482 (1984)

39. Hájek, J.: Asymptotic theory of rejective sampling with varying probabilities from a finite population. The Annals of Mathematical Statistics 35(4), 1491–1523 (1964)

40. Chen, J., Rao, J.: Asymptotic normality under two-phase sampling designs. Statistica Sinica, 1047–1064 (2007)

41. Tillé, Y.: An elimination procedure for unequal probability sampling without replacement. Biometrika 83(1), 238–241 (1996)

42. Mashreghi, Z., Haziza, D., Léger, C.: A survey of bootstrap methods in finite population sampling. Statistics Surveys 10, 1–52 (2016)

43. Booth, J.G., Butler, R.W., Hall, P.: Bootstrap methods for finite populations. Journal of the American Statistical Association 89(428), 1282–1289 (1994)

44. Barbiero, A., Mecatti, F.: Bootstrap algorithms for variance estimation in \(\pi \)ps sampling. In: Complex Data Modeling and Computationally Intensive Statistical Methods, pp. 57–69. Springer, Italy (2010)

45. Dua, D., Graff, C.: UCI Machine Learning Repository (2017). http://archive.ics.uci.edu/ml

46. Avella-Medina, M., Ronchetti, E.: Robust and consistent variable selection in high-dimensional generalized linear models. Biometrika 105(1), 31–44 (2018)

47. Wang, L., Zhou, J., Qu, A.: Penalized generalized estimating equations for high-dimensional longitudinal data analysis. Biometrics 68(2), 353–360 (2012)

48. Tsung, C., Kuang, J., Valliant, R.L., Elliott, M.R.: Model-assisted calibration of non-probability sample survey data using adaptive lasso. Survey Methodology 44(1), 117–145 (2018)

49. Lehmann, E.L.: Elements of Large-Sample Theory. Springer, New York (1999)


Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author information


Corresponding author

Correspondence to Xin Wang.

Ethics declarations

Conflict of interest

The authors have no financial or non-financial interests that are directly or indirectly related to the submitted work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Proof of Lemma 1

Here, we follow the procedure used in [29] to prove the property of the oracle estimator.

Recall that the oracle estimator is defined as

$$\begin{aligned}\hat{\varvec{\beta }}_{1}^{or}=\left( \sum _{i=1}^{n}\frac{1}{\pi _{i}}\varvec{x}_{i1}\varvec{x}_{i1}^{T}\right) ^{-1}\sum _{i=1}^{n}\frac{1}{\pi _{i}}\varvec{x}_{i1}y_{i}. \end{aligned}$$

The dimension of \(\varvec{x}_{i1}\) is \(s+1\). Let \(\varvec{\beta }_{1N}=\left( \sum _{i=1}^{N}\varvec{x}_{i1}\varvec{x}_{i1}^{T}\right) ^{-1}\sum _{i=1}^{N}\varvec{x}_{i1}y_{i}\), and based on the definition, we know that \(\sum _{i=1}^{N}\varvec{x}_{i1}\left( y_{i}-\varvec{x}_{i1}^{T}\varvec{\beta }_{1N}\right) =0\).

Thus, we have

$$\begin{aligned} \hat{\varvec{\beta }}_{1}^{or}-\varvec{\beta }_{1N}&=\left( \sum _{i=1}^{n}\frac{1}{\pi _{i}}\varvec{x}_{i1}\varvec{x}_{i1}^{T}\right) ^{-1}\sum _{i=1}^{n}\frac{1}{\pi _{i}}\varvec{x}_{i1}y_{i}-\varvec{\beta }_{1N}\\&=\left( \sum _{i=1}^{n}\frac{1}{\pi _{i}}\varvec{x}_{i1}\varvec{x}_{i1}^{T}\right) ^{-1}\sum _{i=1}^{n}\frac{1}{\pi _{i}}\varvec{x}_{i1}\left( y_{i}-\varvec{x}_{i1}^{T}\varvec{\beta }_{1N}\right) \\&={ \left( \sum _{i=1}^{n}\frac{1}{\pi _{i}}\varvec{x}_{i1}\varvec{x}_{i1}^{T}\right) ^{-1}}\sum _{i=1}^{N}\left( \frac{I_{i}}{\pi _{i}}-1\right) \varvec{x}_{i1}\left( y_{i}-\varvec{x}_{i1}^{T}\varvec{\beta }_{1N}\right) . \end{aligned}$$

Based on assumptions D3, D4 and D5, we have

$$\begin{aligned}\frac{\sqrt{n}}{N}\sum _{i\in U}\left( \frac{I_{i}}{\pi _{i}}-1\right) \varvec{x}_{i1}\left( y_{i}-\varvec{x}_{i1}^{T}\varvec{\beta }_{1N}\right) \overset{d}{\rightarrow }N\left( 0,\varvec{\Sigma }_{1}\right) \end{aligned}$$

where \(\varvec{\Sigma }_{1}\) is a positive definite matrix. In addition,

$$\begin{aligned}\frac{1}{N}\sum _{i=1}^{n}\frac{1}{\pi _{i}}\varvec{x}_{i1}\varvec{x}_{i1}^{T}-C_{1}=o_{p}\left( 1\right) ,\end{aligned}$$

where \(C_{1}\) is the \((s+1)\times (s+1)\) submatrix of C.

Thus we have,

$$\begin{aligned} \sqrt{N}\left( \hat{\varvec{\beta }}_{1}^{or}-\varvec{\beta }_{1N}\right) \overset{d}{\rightarrow }N\left( \varvec{0},\pi ^{-1}C_{1}^{-1}\varvec{\Sigma }_{1}C_{1}^{-1}\right) . \end{aligned}$$
(1)

Under the superpopulation model, \(\varvec{\beta }_{1N}\) is an OLS estimator of \(\varvec{\beta }_{1}\). By arguments similar to those on page 80 of [29] (using Theorem 2.7.4 in [49] and Slutsky’s theorem), we have

$$\begin{aligned} \sqrt{N}\left( \varvec{\beta }_{1N}-\varvec{\beta }_{1}\right) \overset{d}{\rightarrow }N\left( \varvec{0},\sigma ^{2}C_{1}^{-1}\right) . \end{aligned}$$
(2)

By combining results (1) and (2), we have

$$\begin{aligned} \sqrt{N}\left( \hat{\varvec{\beta }}_{1}^{or}-\varvec{\beta }_{1}\right) \overset{d}{\rightarrow }N\left( \varvec{0},\varvec{V}_{1}\right) \end{aligned}$$

where \(\varvec{V}_{1}=\pi ^{-1}C_{1}^{-1}\varvec{\Sigma }_{1}C_{1}^{-1}+\sigma ^{2}C_{1}^{-1}\). This completes the proof.
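For concreteness, the following is a minimal sketch of how the oracle estimator defined at the start of this proof could be computed, assuming numpy arrays `X1` (the \(n\times (s+1)\) design matrix on the true support, including the intercept column), `y`, and the first-order inclusion probabilities `pi`; the function and variable names are illustrative only.

```python
import numpy as np

def oracle_estimator(X1, y, pi):
    """Survey-weighted least squares on the true support:
    solves (sum_i x_i1 x_i1^T / pi_i) beta = sum_i x_i1 y_i / pi_i."""
    w = 1.0 / pi                       # Horvitz-Thompson design weights
    A = X1.T @ (w[:, None] * X1)       # sum_i (1/pi_i) x_i1 x_i1^T
    b = X1.T @ (w * y)                 # sum_i (1/pi_i) x_i1 y_i
    return np.linalg.solve(A, b)
```

Under SRS the inclusion probabilities are constant at \(n/N\), so the weights cancel and the oracle estimator reduces to ordinary least squares on the true support.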

Proof of Theorem 1

Define the following two objective functions,

$$\begin{aligned} Q\left( \varvec{\beta }\right) =\frac{1}{2}\sum _{i=1}^{n}\frac{1}{\pi _{i}}\left( y_{i}-\varvec{x}_{i}^{T}\varvec{\beta }\right) ^{2}+n\sum _{j=1}^{p}p_{\gamma }(\beta _{j};\lambda )=L(\varvec{\beta })+P(\varvec{\beta }), \end{aligned}$$
$$\begin{aligned} Q_{1}\left( \varvec{\beta }_{1}\right) =\frac{1}{2}\sum _{i=1}^{n}\frac{1}{\pi _{i}}\left( y_{i}-\varvec{x}_{i1}^{T}\varvec{\beta }_{1}\right) ^{2}+n\sum _{j=1}^{s}p_{\gamma }(\beta _{j};\lambda )=L_{1}(\varvec{\beta }_{1})+P_{1}(\varvec{\beta }_{1}), \end{aligned}$$

where

$$\begin{aligned} L\left( \varvec{\beta }\right) =\frac{1}{2}\sum _{i=1}^{n}\frac{1}{\pi _{i}}\left( y_{i}-\varvec{x}_{i}^{T}\varvec{\beta }\right) ^{2};\quad P\left( \varvec{\beta }\right) =n\lambda \sum _{j=1}^{p}\rho _{\gamma }(\beta _{j};\lambda ), \end{aligned}$$

and

$$\begin{aligned} L_{1}\left( \varvec{\beta }_{1}\right) =\frac{1}{2}\sum _{i=1}^{n}\frac{1}{\pi _{i}}\left( y_{i}-\varvec{x}_{i1}^{T}\varvec{\beta }_{1}\right) ^{2};\quad P_{1}\left( \varvec{\beta }_{1}\right) =n\lambda \sum _{j=1}^{s}\rho _{\gamma }(\beta _{j};\lambda ). \end{aligned}$$

It is known that \(Q(\varvec{\beta })\) is the objective function in (11) used to find the SCAD- or MCP-based survey-weighted regression estimator. Define a function \(\varvec{\beta }^{*}=T(\varvec{\beta })\) such that the first \(1+s\) components are the same as those of \(\varvec{\beta }\), while the last \(p-s\) components are 0.
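The objective functions above are built from a concave penalty \(p_{\gamma }(\cdot ;\lambda )\). As a reference point, the following sketch gives the standard SCAD penalty of [9] (with the conventional tuning constant \(a=3.7\)) and the MCP of [10]; the function names and default constants are illustrative and not taken from the paper.

```python
import numpy as np

def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty of Fan and Li (2001), applied elementwise."""
    t = np.abs(beta)
    inner = lam * t
    middle = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))
    outer = lam**2 * (a + 1) / 2
    return np.where(t <= lam, inner, np.where(t <= a * lam, middle, outer))

def mcp_penalty(beta, lam, gamma=3.0):
    """Minimax concave penalty (MCP) of Zhang (2010), applied elementwise."""
    t = np.abs(beta)
    return np.where(t <= gamma * lam, lam * t - t**2 / (2 * gamma),
                    0.5 * gamma * lam**2)
```

Both penalties flatten out for large coefficients (\(\vert \beta \vert > a\lambda \) for SCAD, \(\vert \beta \vert > \gamma \lambda \) for MCP), which is exactly the property used below to argue that \(P\left( T(\varvec{\beta })\right) \) is constant on \(\Theta \).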

Consider \(\Theta =\left\{ \varvec{\beta }:\Vert \varvec{\beta }-\varvec{\beta }_{0}\Vert \le \phi _{N}\right\} \). By Lemma 1, there exists an event \(E_{1}=\left\{ \Vert \hat{\varvec{\beta }}^{or}-\varvec{\beta }_{0}\Vert \le \phi _{N}\right\} \) such that \(P(E_{1}^{c})\le \epsilon _1\), where \(\phi _N = O(N^{-1/2+\delta })\) and \(0< \delta <1/2\). Thus, on \(E_1\), \(\hat{\varvec{\beta }}^{or} \in \Theta \).

The proof will be completed in two steps.

  1. In event \(E_{1}\), show that \(Q\left( \varvec{\beta }^{*}\right) >Q\left( \hat{\varvec{\beta }}^{or}\right) \) for any \(\varvec{\beta }\in \Theta \) with \(\varvec{\beta }^{*}\ne \hat{\varvec{\beta }}^{or}\).

  2. Show that there is an event \(E_2\) such that \(P(E_2^c) \le \epsilon _2\). In \(E_1\cap E_2\), there is a neighborhood of \(\hat{\varvec{\beta }}^{or}\), denoted by \(\Theta _n\), such that \(Q\left( \varvec{\beta }\right) \ge Q\left( \varvec{\beta }^{*}\right) \) for any \(\varvec{\beta }\in \Theta \cap \Theta _{n}\) for sufficiently large n and N.

By combining the results of the two steps, we have \(Q(\varvec{\beta }) > Q(\hat{\varvec{\beta }}^{or})\) for any \(\varvec{\beta } \in \Theta _n \cap \Theta \) with \(\varvec{\beta } \ne \hat{\varvec{\beta }}^{or}\) on \(E_1\cap E_2\). Thus \(\hat{\varvec{\beta }}^{or}\) is a strict local minimizer of \(Q(\varvec{\beta })\) on \(E_1\cap E_2\) with \(P(E_1\cap E_2) \ge 1 - \epsilon _1 -\epsilon _2\).

We first show that \(P\left( T(\varvec{\beta })\right) \) equals a constant \(C_{N}\) for all \(\varvec{\beta }\in \Theta \). For \(j=s+1,\dots , p\), \(\rho _{\gamma }(\beta _{j};\lambda )=0\) since \(\beta _{j}=0\), and for \(j=1,\dots ,s\),

$$\begin{aligned} \vert \beta _{j}\vert =\vert \beta _{j}-\beta _{j0}+\beta _{j0}\vert \ge \vert \beta _{j0}\vert -\sup _{0\le j \le s}\vert \beta _{j}-\beta _{j0}\vert \ge b-\phi _{N}>a\lambda , \end{aligned}$$

where the last inequality follows from the assumption that \(b>a\lambda \gg \phi _N\). Since \(\rho _{\gamma }(\beta _{j};\lambda )\) is constant for \(\vert \beta _{j}\vert \ge a\lambda \), it follows that \(P(\varvec{\beta }^*) = P\left( T(\varvec{\beta })\right) \) is a constant and \(P(\varvec{\beta }^*) = P_1(\varvec{\beta }_1)\).

Since \(L\left( \varvec{\beta }^{*}\right) =L_{1}\left( \varvec{\beta }_{1}\right) \) and \(L\left( \hat{\varvec{\beta }}^{or}\right) =L_{1}\left( \hat{\varvec{\beta }}_{1}^{or}\right) \), we have \(Q\left( \varvec{\beta }^{*}\right) =Q_{1}\left( \varvec{\beta }_{1}\right) \) and \(Q\left( \hat{\varvec{\beta }}^{or}\right) =Q_{1}\left( \hat{\varvec{\beta }}_{1}^{or}\right) \). Moreover, \(\hat{\varvec{\beta }}_{1}^{or}\) is the unique global minimizer of \(L_{1}\left( \varvec{\beta }_{1}\right) \), so \(L_{1}\left( \varvec{\beta }_{1}\right) >L_{1}\left( \hat{\varvec{\beta }}_{1}^{or}\right) \) and hence \(Q_{1}\left( \varvec{\beta }_{1}\right) >Q_{1}\left( \hat{\varvec{\beta }}_{1}^{or}\right) \) for \(\varvec{\beta }_{1}\ne \hat{\varvec{\beta }}_{1}^{or}\). Thus \(Q\left( \varvec{\beta }^{*}\right) >Q\left( \hat{\varvec{\beta }}^{or}\right) \) for any \(\varvec{\beta }\in \Theta \) with \(\varvec{\beta }^{*}\ne \hat{\varvec{\beta }}^{or}\).

The next step is to show that \(Q\left( \varvec{\beta }\right) \ge Q\left( \varvec{\beta }^{*}\right) \) for any \( \varvec{\beta }\in \Theta \cap \Theta _{n}\), where \(\Theta _{n}=\left\{ \varvec{\beta }:\sup _{j}\vert \beta _{j}-{\hat{\beta }}_{j}^{or}\vert \le t_{n}\right\} \). For \(\varvec{\beta }\in \Theta _{n}\cap \Theta \), a Taylor expansion gives

$$\begin{aligned} Q\left( \varvec{\beta }\right) -Q\left( \varvec{\beta }^{*}\right) =\Gamma _{1}+\Gamma _{2}, \end{aligned}$$

where

$$\begin{aligned} \Gamma _{1}&=-\left( \varvec{y}-\varvec{X}\varvec{\beta }^{m}\right) ^{T}\varvec{\Omega }\varvec{X}\left( \varvec{\beta }-\varvec{\beta }^{*}\right) ,\\ \Gamma _{2}&=n\lambda \sum _{j=1}^{p}\rho _{\gamma }^{\prime }\left( \beta _{j}^{m};\lambda \right) \frac{\beta _{j}^{m}}{\vert \beta _{j}^{m}\vert }\left( \beta _{j}-\beta _{j}^{*}\right) , \end{aligned}$$

\(\varvec{\beta }^{m}=\alpha \varvec{\beta }+\left( 1-\alpha \right) \varvec{\beta }^{*}\) for some constant \(\alpha \in \left( 0,1\right) \), and \(\varvec{\Omega } = \mathrm{diag}(1/\pi _1,\dots , 1/\pi _n)\).

For \(j=0,1,\dots ,s\), \(\beta _{j}^{m}=\beta _{j}=\beta _{j}^{*}\), and for \(j=s+1,\dots ,p\), \(\beta _{j}^{*}=0\) and \(\beta _{j}^{m}=\alpha \beta _{j}\); thus

$$\begin{aligned} \Gamma _{2}=n\lambda \sum _{j=s+1}^{p}\rho _{\gamma }^{\prime }\left( \alpha \beta _{j};\lambda \right) \frac{\alpha \beta _{j}}{\vert \alpha \beta _{j}\vert }\beta _{j}=n\lambda \sum _{j=s+1}^{p}\rho _{\gamma }^{\prime }\left( \alpha \beta _{j};\lambda \right) \vert \beta _{j}\vert . \end{aligned}$$

Also, for \(j=s+1,\dots ,p\), \({\hat{\beta }}_{j}^{or}=0\), so \(\vert \beta _{j}\vert \le t_{n}\) and \(\rho _{\gamma }^{\prime }\left( \alpha \beta _{j};\lambda \right) \ge \rho _{\gamma }^{\prime }\left( \alpha t_{n};\lambda \right) \) by the concavity of \(\rho _{\gamma }(\cdot ;\lambda )\). Thus,

$$\begin{aligned} \Gamma _{2}\ge n\lambda \sum _{j=s+1}^{p}\rho _{\gamma }^{\prime }\left( \alpha t_{n};\lambda \right) \vert \beta _{j}\vert . \end{aligned}$$
(3)

Now consider \(\Gamma _{1}\), which can be written as

$$\begin{aligned} \Gamma _{1} = -\sum _{j=s+1}^{p}\sum _{i=1}^{n}\frac{1}{\pi _{i}}\left( y_{i}-\varvec{x}_{i}^{T}\varvec{\beta }^{m}\right) x_{ij}\beta _{j}. \end{aligned}$$
(4)

For \(j=s+1,\dots ,p\), we have the following,

$$\begin{aligned} \sum _{i=1}^{n}\frac{1}{\pi _{i}}\left( y_{i}-\varvec{x}_{i}^{T}\varvec{\beta }^{m}\right) x_{ij}&=\sum _{i=1}^{n}\frac{1}{\pi _{i}}\left( y_{i}-\varvec{x}_{i}^{T}\varvec{\beta }^{m}+\varvec{x}_{i}^{T}\varvec{\beta }_{N}-\varvec{x}_{i}^{T}\varvec{\beta }_{N}\right) x_{ij} \nonumber \\&=\sum _{i=1}^{n}\frac{1}{\pi _{i}}\left( y_{i}-\varvec{x}_{i}^{T}\varvec{\beta }_{N}\right) x_{ij}+\sum _{i=1}^{n}\frac{1}{\pi _{i}}\varvec{x}_{i}^{T}\left( \varvec{\beta }_{N}-\varvec{\beta }^{m}\right) x_{ij} \nonumber \\&=n\left[ \frac{1}{n}\sum _{i=1}^{n}\frac{1}{\pi _{i}}\left( y_{i}-\varvec{x}_{i}^{T}\varvec{\beta }_{N}\right) x_{ij}+\frac{1}{n}\sum _{i=1}^{n}\frac{1}{\pi _{i}}\varvec{x}_{i}^{T}\left( \varvec{\beta }_{N}-\varvec{\beta }^{m}\right) x_{ij}\right] , \end{aligned}$$
(5)

where \(\varvec{\beta }_N \) is the finite population coefficient vector, which can be viewed as a traditional OLS estimator of \(\varvec{\beta }_0\), so that \(\varvec{\beta }_N - \varvec{\beta }_0 = O_p(N^{-1/2})\). Then there exists an event \(E_{21} = \{ \vert \varvec{\beta }_N - \varvec{\beta }_0\vert \le \phi _N\}\) such that \(P(E_{21}^c) \le \epsilon _{21}\). Over event \(E_{21}\), we have

$$\begin{aligned} \vert \beta _{Nj}-\beta _{j}^{m}\vert ={\left\{ \begin{array}{ll} \vert \beta _{Nj}-\beta _{j}\vert \le \vert \beta _{Nj}-\beta _{0j}\vert +\vert \beta _{j}-\beta _{0j}\vert \le 2\phi _{N}, &{} \text {for }j=0,\dots ,s,\\ \vert \beta _{Nj}-\alpha \beta _{j}\vert \le \vert \beta _{Nj}-\beta _{0j}\vert +\alpha \vert \beta _{j}-\beta _{0j}\vert \le 2\phi _{N}, &{} \text {for }j=s+1,\dots p. \end{array}\right. } \end{aligned}$$

Also, from assumption C3 in [29], it is known that \(\frac{1}{n}\sum _{i=1}^{n}\frac{1}{\pi _{i}}\varvec{x}_{i}^{T}x_{ij}=O_{p}\left( 1\right) \). Thus, there exists an event \(E_{22}\) such that \(P(E_{22}^c) \le \epsilon _{22}\), and over event \(E_{22}\), \(\vert \frac{1}{n}\sum _{i=1}^{n}\frac{1}{\pi _{i}}x_{il}x_{ij} \vert \le M_1\). The second part of (5) then becomes

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^{n}\frac{1}{\pi _{i}}\varvec{x}_{i}^{T}\left( \varvec{\beta }_{N}-\varvec{\beta }^{m}\right) x_{ij}&=\sum _{l=0}^{p}\frac{1}{n}\sum _{i=1}^{n}\frac{1}{\pi _{i}}x_{il}x_{ij}\left( \beta _{Nl}-\beta _{l}^{m}\right) \le \sum _{l=0}^{p}\vert \frac{1}{n}\sum _{i=1}^{n}\frac{1}{\pi _{i}}x_{il}x_{ij}\left( \beta _{Nl}-\beta _{l}^{m}\right) \vert \nonumber \\&\le \sum _{l=0}^{p}\vert \frac{1}{n}\sum _{i=1}^{n}\frac{1}{\pi _{i}}x_{il}x_{ij}\vert \vert \beta _{Nl}-\beta _{l}^{m}\vert \le 2(p+1)M_{1}\phi _{N}. \end{aligned}$$
(6)

From page 73 of [29],

$$\begin{aligned} \frac{\sqrt{N}}{n}\sum _{i\in U}\left( \frac{I_{i}}{\pi _{i}}-1\right) \left( y_{i}-\varvec{x}_{i}^{T}\varvec{\beta }_{N}\right) \varvec{x}_{i}\overset{d}{\rightarrow }N\left( 0,\pi ^{-3}\varvec{V}\right) . \end{aligned}$$

Thus \(\frac{1}{n}\sum _{i=1}^{n}\frac{1}{\pi _{i}}\left( y_{i}-\varvec{x}_{i}^{T}\varvec{\beta }_{N}\right) x_{ij}=O_{p}\left( N^{-1/2}\right) \), and hence \(\frac{1}{n}\sum _{i=1}^{n}\frac{1}{\pi _{i}}\left( y_{i}-\varvec{x}_{i}^{T}\varvec{\beta }_{N}\right) x_{ij}=o_{p}\left( N^{-1/2+\delta }\right) = o_p(\phi _N)\). This means that there exists an event \(E_{23}\) such that \(P(E_{23}^c)\le \epsilon _{23}\), and over event \(E_{23}\) the first part of (5) can be bounded by

$$\begin{aligned} \vert \frac{1}{n}\sum _{i=1}^{n}\frac{1}{\pi _{i}}\left( y_{i}-\varvec{x}_{i}^{T}\varvec{\beta }_{N}\right) x_{ij}\vert \le M_{2}\phi _N, \end{aligned}$$
(7)

where \(M_2 >0\).

From (6) and (7), \(\Gamma _1\) in (4) can be bounded by

$$\begin{aligned} \Gamma _{1}&\ge -\sum _{j=s+1}^{p}\left| \sum _{i=1}^{n}\frac{1}{\pi _{i}}\left( y_{i}-\varvec{x}_{i}^{T}\varvec{\beta }^{m}\right) x_{ij}\right| \vert \beta _{j}\vert \nonumber \\&\ge -n\left( M_{2}\phi _N+2(p+1)M_{1}\phi _{N}\right) \sum _{j=s+1}^{p}\vert \beta _{j}\vert . \end{aligned}$$
(8)

According to (3) and (8), we have

$$\begin{aligned} \Gamma _{1}+\Gamma _{2}\ge n\left\{ \sum _{j=s+1}^{p}\left[ \lambda \rho _{\gamma }^{\prime }\left( \alpha t_{n};\lambda \right) -M_{2}\phi _N-2(p+1)M_{1}\phi _{N}\right] \vert \beta _{j}\vert \right\} . \end{aligned}$$

Since \(\rho _{\gamma }^{\prime }\left( \alpha t_{n};\lambda \right) \rightarrow 1\) and \(\lambda \gg \phi _{N}\), it follows that \(Q\left( \varvec{\beta }\right) \ge Q\left( \varvec{\beta }^{*}\right) \) for sufficiently large n and N. Here the event \(E_2 = E_{21}\cap E_{22} \cap E_{23}\).

If we let \(t_n = o(N^{-1/2})\), then \(\sqrt{N}\left( \hat{\varvec{\beta }}_{CC}-\hat{\varvec{\beta }}^{or}\right) \overset{p}{\rightarrow }0\).

Proof of Theorem 2

The proof of Theorem 2 is similar to that in [29] (pages 75–78).

From assumption D5 in [29] and [18], stated above,

$$\begin{aligned} \left( \begin{array}{c} \frac{\sqrt{n}}{N}\sum _{i\in U}y_{i}\left( \frac{I_{i}}{\pi _{i}}-1\right) \\ \frac{\sqrt{n}}{N}\sum _{i\in U}\varvec{x}_{i}\left( \frac{I_{i}}{\pi _{i}}-1\right) \end{array}\right) \overset{d}{\rightarrow }N\left( \varvec{0},\left[ \begin{array}{cc} \Sigma ^{yy} &{} \varvec{\Sigma }^{yx}\\ \varvec{\Sigma }^{xy} &{} \varvec{\Sigma }^{xx} \end{array}\right] \right) . \end{aligned}$$

Since \(\hat{\varvec{\beta }}_{CC}\) converges in probability to \(\varvec{\beta }_{0}\), applying Slutsky’s theorem gives

$$\begin{aligned} \frac{\sqrt{n}}{N}\left( {\hat{Y}}_{\text {diff}}-{\hat{Y}}_{CC}\right) =\frac{\sqrt{n}}{N}\sum _{i\in U}\varvec{x}_{i}^{T}\left( \varvec{\beta }_{0}-\hat{\varvec{\beta }}_{CC}\right) \left( 1-\frac{I_{i}}{\pi _{i}}\right) = o_p(1). \end{aligned}$$

Furthermore, define a function \(g\left( \cdot ,\cdot \right) \) such that \(g\left( \varvec{a},\varvec{b}\right) =\left( a_{1},\varvec{a}_{2}^{T}\varvec{b}\right) \); applying Slutsky’s theorem again gives

$$\begin{aligned} \left( \begin{array}{c} \frac{\sqrt{n}}{N}\sum _{i\in U}y_{i}\left( \frac{I_{i}}{\pi _{i}}-1\right) \\ \frac{\sqrt{n}}{N}\sum _{i\in U}\varvec{x}_{i}^{T}\hat{\varvec{\beta }}_{CC}\left( \frac{I_{i}}{\pi _{i}}-1\right) \end{array}\right) \overset{d}{\rightarrow }N\left( \varvec{0},\left[ \begin{array}{cc} \Sigma ^{yy} &{} \varvec{\Sigma }^{yx}\varvec{\beta }_{0}\\ \varvec{\beta }_{0}^{T}\varvec{\Sigma }^{xy} &{} \varvec{\beta }_{0}^{T}\varvec{\Sigma }^{xx}\varvec{\beta }_{0} \end{array}\right] \right) . \end{aligned}$$

Consider \(h\left( a_{1},a_{2}\right) =a_{1}-a_{2}\). By the Delta method, we have

$$\begin{aligned} \frac{\sqrt{n}}{N}\left[ \Sigma ^{yy}-\varvec{\Sigma }^{yx}\varvec{\beta }_{0}-\varvec{\beta }_{0}^{T}\varvec{\Sigma }^{xy}+\varvec{\beta }_{0}^{T}\varvec{\Sigma }^{xx}\varvec{\beta }_{0}\right] ^{-1/2}\left( {\hat{Y}}_{CC}-Y_t\right) \overset{d}{\rightarrow }N\left( 0,1\right) . \end{aligned}$$

Now consider

$$\begin{aligned} V\left( {\hat{Y}}_{\text {diff}}\right)&=\sum _{i\in U}\sum _{j\in U}\frac{\Delta _{ij}}{\pi _{i}\pi _{j}}\left( y_{i}-\varvec{x}_{i}^{T}\varvec{\beta }_{0}\right) \left( y_{j}-\varvec{x}_{j}^{T}\varvec{\beta }_{0}\right) \\&=\frac{N^{2}}{n}\left( \Sigma _{N}^{yy}-\varvec{\Sigma }_{N}^{yx}\varvec{\beta }_{0}-\varvec{\beta }_{0}^{T}\varvec{\Sigma }_{N}^{xy}+\varvec{\beta }_{0}^{T}\varvec{\Sigma }_{N}^{xx}\varvec{\beta }_{0}\right) , \end{aligned}$$

and

$$\begin{aligned} {\hat{V}}\left( {\hat{Y}}_{CC}\right) =\sum \sum _{i,j\in S}\frac{\Delta _{ij}}{\pi _{ij}}\frac{\left( y_{i}-\varvec{x}_{i}^{T}\hat{\varvec{\beta }}_{CC}\right) }{\pi _{i}}\frac{\left( y_{j}-\varvec{x}_{j}^{T}\hat{\varvec{\beta }}_{CC}\right) }{\pi _{j}}. \end{aligned}$$

We have,

$$\begin{aligned} \frac{n}{N^{2}}{\hat{V}}\left( {\hat{Y}}_{CC}\right)&=\frac{n}{N^{2}}\sum \sum _{i,j\in U}\frac{\Delta _{ij}}{\pi _{i}\pi _{j}}\frac{I_{i}I_{j}}{\pi _{ij}}y_{i}y_{j}-\frac{n}{N^{2}}\sum \sum _{i,j\in U}\frac{\Delta _{ij}}{\pi _{i}\pi _{j}}\frac{I_{i}I_{j}}{\pi _{ij}}y_{i}\varvec{x}_{j}^{T}\hat{\varvec{\beta }}_{CC}\\&\quad -\hat{\varvec{\beta }}_{CC}^{T}\frac{n}{N^{2}}\sum \sum _{i,j\in U}\frac{\Delta _{ij}}{\pi _{i}\pi _{j}}\frac{I_{i}I_{j}}{\pi _{ij}}\varvec{x}_{i}y_{j}+\hat{\varvec{\beta }}_{CC}^T\frac{n}{N^{2}}\sum \sum _{i,j\in U}\frac{\Delta _{ij}}{\pi _{i}\pi _{j}}\frac{I_{i}I_{j}}{\pi _{ij}}\varvec{x}_{i}\varvec{x}_{j}^{T}\hat{\varvec{\beta }}_{CC}\\&= {\hat{\Sigma }}_{N}^{yy}-\varvec{{\hat{\Sigma }}}_{N}^{yx}\hat{\varvec{\beta }}_{CC}-\hat{\varvec{\beta }}_{CC}^{T}\varvec{{\hat{\Sigma }}}_{N}^{xy}+\hat{\varvec{\beta }}^{T}_{CC}\varvec{{\hat{\Sigma }}}_{N}^{xx}\hat{\varvec{\beta }}_{CC}. \end{aligned}$$

By Theorem 1 and the assumption D6 in [29], we have that

$$\begin{aligned} \frac{n}{N^{2}}{\hat{V}}\left( {\hat{Y}}_{CC}\right)&=\Sigma _{N}^{yy}-\varvec{\Sigma }_{N}^{yx}\varvec{\beta }_{0}-\varvec{\beta }_{0}^{T}\varvec{\Sigma }_{N}^{xy}+\varvec{\beta }_{0}^{T}\varvec{\Sigma }_{N}^{xx}\varvec{\beta }_{0}+o_{p}(1)\\&=\frac{n}{N^{2}}V\left( {\hat{Y}}_{\text {diff}}\right) +o_{p}\left( 1\right) . \end{aligned}$$

Thus

$$\begin{aligned} \left[ {\hat{V}}\left( {\hat{Y}}_{CC}\right) \right] ^{-1/2}\left( {\hat{Y}}_{CC}-Y_t\right)&=\left[ V\left( {\hat{Y}}_{\text {diff}}\right) \right] ^{-1/2}\left( {\hat{Y}}_{CC}-Y_t\right) +o_{p}\left( \frac{\sqrt{n}}{N}\right) O_{p}\left( \frac{N}{\sqrt{n}}\right) \\&=\left[ V\left( {\hat{Y}}_{\text {diff}}\right) \right] ^{-1/2}\left( {\hat{Y}}_{CC}-Y_t\right) +o_{p}\left( 1\right) \\&\overset{d}{\rightarrow }N\left( 0,1\right) . \end{aligned}$$
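To make \({\hat{V}}({\hat{Y}}_{CC})\) concrete, the sketch below computes the double-sum variance estimator displayed above and the resulting normal-approximation interval. It assumes the sample residuals, the first-order inclusion probabilities `pi`, and the matrix of joint inclusion probabilities `pi_ij` (with \(\pi _{ii}=\pi _{i}\)) are available as numpy arrays; all names are illustrative.

```python
import numpy as np

def greg_variance(residuals, pi, pi_ij):
    """Horvitz-Thompson-type variance estimator
    V_hat = sum_{i,j in S} (Delta_ij / pi_ij) (e_i / pi_i) (e_j / pi_j),
    where e_i = y_i - x_i^T beta_hat and Delta_ij = pi_ij - pi_i * pi_j."""
    e = residuals / pi                    # e_i / pi_i
    delta = pi_ij - np.outer(pi, pi)      # Delta_ij; diagonal gives pi_i (1 - pi_i)
    return e @ (delta / pi_ij) @ e

# a 95% normal-approximation interval for the total would then be, e.g.,
# Y_hat_CC +/- 1.96 * np.sqrt(greg_variance(y - X @ beta_cc, pi, pi_ij))
```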

Bootstrap algorithms

1.1 SRS

The algorithm described here is from [42]; a minimal code sketch follows the steps.

  1. For each unit in the original sample, repeat it \(k = \lfloor N/n \rfloor \) times to create the fixed part of the pseudo-population \(U^f\).

  2. Take a simple random sample \(U^{c}\) of size \(N - nk\) without replacement from the original sample. The completed pseudo-population is \(U^* = U^f \cup U^{c}\).

  3. Take a simple random sample of size n without replacement from \(U^*\).

  4. Compute the bootstrap statistic, \({\hat{Y}}^*\), based on the bootstrap sample.

  5. Repeat steps 2 to 4 B times to obtain the bootstrap estimates.
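Below is a minimal sketch of these steps, assuming the bootstrap statistic of interest is the population total and, for brevity, using the simple expansion estimator in place of the study's \({\hat{Y}}^*\); all names are illustrative.

```python
import numpy as np

def srs_pseudo_population_bootstrap(y, N, B=1000, seed=0):
    """Pseudo-population bootstrap for SRS, following the steps above."""
    rng = np.random.default_rng(seed)
    n = len(y)
    k = N // n                                   # step 1: replication factor
    fixed = np.repeat(y, k)                      # fixed part U^f of the pseudo-population
    totals = np.empty(B)
    for b in range(B):
        extra = rng.choice(y, size=N - n * k, replace=False)   # step 2: U^c
        pseudo = np.concatenate([fixed, extra])                # step 2: U* = U^f and U^c
        boot = rng.choice(pseudo, size=n, replace=False)       # step 3: SRSWOR of size n from U*
        totals[b] = N * boot.mean()              # step 4: expansion estimate of the total
    return totals                                # step 5: B bootstrap estimates
```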

1.2 PPS

Here, the 0.5\(\pi \)ps-bootstrap algorithm of [44] is described.

  1. For each unit in the original sample, \((y_i, \varvec{x}_i, M_i,\pi _i)\), repeat it \(d_i\) times to create the bootstrap pseudo-population \(U^*\). The bootstrap population size is \(N^* = \sum _{i \in S}d_i\) and \(M^* = \sum _{i \in S}d_iM_i\).

  2. Take a bootstrap sample of size n with inclusion probabilities \(n \frac{M_i^*}{M^*}\).

  3. Repeat steps 1 to 2 B times to obtain the bootstrap estimates.

The replication count \(d_i\) is determined in the following way: let \(\pi _i^{-1} = c_i + r_i\), where \(c_i = \lfloor \pi _i^{-1}\rfloor \); then \(d_i = c_i\) if \(r_i <0.5\) and \(d_i = c_i + 1\) if \(r_i \ge 0.5\), as sketched below.
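A short sketch of this replication rule, assuming `pi` holds the first-order inclusion probabilities and `M` the size measures \(M_i\); names are illustrative.

```python
import numpy as np

def replication_counts(pi):
    """0.5-rule replication counts d_i: write 1/pi_i = c_i + r_i with
    c_i = floor(1/pi_i); d_i = c_i if r_i < 0.5, and c_i + 1 otherwise."""
    inv = 1.0 / np.asarray(pi, dtype=float)
    c = np.floor(inv)
    r = inv - c
    return (c + (r >= 0.5)).astype(int)

# the bootstrap pseudo-population then has
# N_star = replication_counts(pi).sum() units, and
# M_star = (replication_counts(pi) * M).sum()
```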

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

McDonald, E., Wang, X. Generalized regression estimators with concave penalties and a comparison to lasso type estimators. METRON (2023). https://doi.org/10.1007/s40300-023-00253-4
