Abstract
The generalized regression (GREG) estimator is commonly used in survey sampling to incorporate auxiliary information. When multiple covariates are available, generally not all of them contribute significantly to the estimation. We propose two new GREG estimators based on concave penalties: one built from the smoothly clipped absolute deviation (SCAD) penalty and the other from the minimax concave penalty (MCP). The performance of these estimators is compared to that of lasso-type estimators through a simulation study under a simple random sampling (SRS) design and a probability proportional to size (PPS) design. The proposed estimators are shown to produce improved estimates of the population total relative to the traditional GREG estimator and the lasso-based estimators. Asymptotic properties are derived for the proposed estimators, and bootstrap methods are explored to improve coverage probability when the sample size is small.
Data Availability
The data set is available from UC Irvine Machine Learning Repository. https://archive.ics.uci.edu/dataset/211/communities+and+crime+unnormalized.
References
Cassel, C.M., Särndal, C.E., Wretman, J.H.: Some results on generalized difference estimation and generalized regression estimation for finite populations. Biometrika 63(3), 615–620 (1976)
Särndal, C.-E., Swensson, B., Wretman, J.: Model Assisted Survey Sampling. Springer, New York (2003)
Fuller, W.A.: Sampling Statistics. John Wiley & Sons, Hoboken, New Jersey (2011)
Horvitz, D.G., Thompson, D.J.: A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association 47(260), 663–685 (1952)
Tibshirani, R.: Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological) 58(1), 267–288 (1996)
Knight, K., Fu, W.: Asymptotics for lasso-type estimators. Annals of Statistics, 1356–1378 (2000)
Zou, H.: The adaptive lasso and its oracle properties. Journal of the American Statistical Association 101(476), 1418–1429 (2006)
Fan, J.: Comments on “Wavelets in statistics: a review” by A. Antoniadis. Journal of the Italian Statistical Society 6(2), 131–138 (1997)
Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96(456), 1348–1360 (2001)
Zhang, C.-H.: Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics 38(2), 894–942 (2010)
Zhang, T.: Analysis of multi-stage convex relaxation for sparse regularization. Journal of Machine Learning Research 11(3) (2010)
Fan, J., Lv, J.: A selective overview of variable selection in high dimensional feature space. Statistica Sinica, 101–148 (2010)
Huang, J., Breheny, P., Ma, S.: A selective review of group selection in high-dimensional models. Statistical Science 27(4), 481–499 (2012)
Wang, H., Li, G., Jiang, G.: Robust regression shrinkage and consistent variable selection through the lad-lasso. Journal of Business & Economic Statistics 25(3), 347–355 (2007)
Wang, M., Song, L., Tian, G.-L.: SCAD-penalized least absolute deviation regression in high-dimensional models. Communications in Statistics-Theory and Methods 44(12), 2452–2472 (2015)
Jiang, H., Zheng, W., Dong, Y.: Sparse and robust estimation with ridge minimax concave penalty. Information Sciences 571, 154–174 (2021)
Staerk, C., Kateri, M., Ntzoufras, I.: High-dimensional variable selection via low-dimensional adaptive learning. Electronic Journal of Statistics 15(1), 830–879 (2021)
McConville, K.S., Breidt, F.J., Lee, T.C., Moisen, G.G.: Model-assisted survey regression estimation with the lasso. Journal of Survey Statistics and Methodology 5(2), 131–158 (2017)
Ta, T., Shao, J., Li, Q., Wang, L.: Generalized regression estimators with high-dimensional covariates. Statistica Sinica 30(3), 1135–1154 (2020)
Dagdoug, M., Goga, C., Haziza, D.: Model-assisted estimation through random forests in finite population sampling. Journal of the American Statistical Association, 1–18 (2021)
Dagdoug, M., Goga, C., Haziza, D.: Model-assisted estimation in high-dimensional settings for survey data. Journal of Applied Statistics, 1–25 (2022)
Chauvet, G., Goga, C.: Asymptotic efficiency of the calibration estimator in a high-dimensional data setting. Journal of Statistical Planning and Inference 217, 177–187 (2022)
Wei, F., Zhu, H.: Group coordinate descent algorithms for nonconvex penalized regression. Computational Statistics & Data Analysis 56(2), 316–326 (2012)
Fan, Y., Li, R.: Variable selection in linear mixed effects models. Annals of Statistics 40(4), 2043 (2012)
Breheny, P., Huang, J.: Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors. Statistics and Computing 25(2), 173–187 (2015)
Ma, S., Huang, J.: A concave pairwise fusion approach to subgroup analysis. Journal of the American Statistical Association 112(517), 410–423 (2017)
Wang, X., Zhu, Z., Zhang, H.H.: Spatial Heterogeneity Automatic Detection and Estimation (2020)
Friedman, J., Hastie, T., Tibshirani, R.: Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software 33(1), 1 (2010)
McConville, K.: Improved estimation for complex surveys using modern regression techniques. PhD thesis, Colorado State University (2011)
Kim, Y., Choi, H., Oh, H.-S.: Smoothly clipped absolute deviation on high dimensions. Journal of the American Statistical Association 103(484), 1665–1673 (2008)
Xie, H., Huang, J.: SCAD-penalized regression in high-dimensional partially linear models. The Annals of Statistics 37(2), 673–696 (2009). https://doi.org/10.1214/07-AOS580
Wang, L., Li, H., Huang, J.Z.: Variable selection in nonparametric varying-coefficient models for analysis of repeated measurements. Journal of the American Statistical Association 103(484), 1556–1569 (2008)
Breheny, P., Huang, J.: Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Annals of Applied Statistics 5(1), 232–253 (2011)
Wang, X., Zhu, Z., Zhang, H.H.: Spatial heterogeneity automatic detection and estimation. Computational Statistics & Data Analysis 180, 107667 (2023)
Breidt, F.J., Opsomer, J.D.: Local polynomial regression estimators in survey sampling. Annals of Statistics, 1026–1053 (2000)
Hájek, J.: Limiting distributions in simple random sampling from a finite population. Publications of the Mathematical Institute of the Hungarian Academy of Sciences 5, 361–374 (1960)
Krewski, D., Rao, J.N.K.: Inference from stratified samples: properties of the linearization, jackknife and balanced repeated replication methods. The Annals of Statistics, 1010–1019 (1981)
Bickel, P.J., Freedman, D.A.: Asymptotic normality and the bootstrap in stratified sampling. The Annals of Statistics, 470–482 (1984)
Hájek, J.: Asymptotic theory of rejective sampling with varying probabilities from a finite population. The Annals of Mathematical Statistics 35(4), 1491–1523 (1964)
Chen, J., Rao, J.: Asymptotic normality under two-phase sampling designs. Statistica Sinica, 1047–1064 (2007)
Tillé, Y.: An elimination procedure for unequal probability sampling without replacement. Biometrika 83(1), 238–241 (1996)
Mashreghi, Z., Haziza, D., Léger, C.: A survey of bootstrap methods in finite population sampling. Statistics Surveys 10, 1–52 (2016)
Booth, J.G., Butler, R.W., Hall, P.: Bootstrap methods for finite populations. Journal of the American Statistical Association 89(428), 1282–1289 (1994)
Barbiero, A., Mecatti, F.: Bootstrap algorithms for variance estimation in \(\pi \)ps sampling. In: Complex Data Modeling and Computationally Intensive Statistical Methods, pp. 57–69. Springer, Italy (2010)
Dua, D., Graff, C.: UCI Machine Learning Repository (2017). http://archive.ics.uci.edu/ml
Avella-Medina, M., Ronchetti, E.: Robust and consistent variable selection in high-dimensional generalized linear models. Biometrika 105(1), 31–44 (2018)
Wang, L., Zhou, J., Qu, A.: Penalized generalized estimating equations for high-dimensional longitudinal data analysis. Biometrics 68(2), 353–360 (2012)
Tsung, C., Kuang, J., Valliant, R.L., Elliott, M.R.: Model-assisted calibration of non-probability sample survey data using adaptive lasso. Survey Methodology 44(1), 117–145 (2018)
Lehmann, E.L.: Elements of Large-sample Theory. Springer, New York (1999)
Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Ethics declarations
Conflict of interest
No financial or non-financial interests are directly or indirectly related to the submitted work.
Appendices
Proof of Lemma 1
Here, we follow the procedure used in [29] to prove the property of the oracle estimator.
Recall that the oracle estimator is defined as
The dimension of \(\varvec{x}_{i1}\) is \(s+1\). Let \(\varvec{\beta }_{1N}=\left( \sum _{i=1}^{N}\varvec{x}_{i1}\varvec{x}_{i1}^{T}\right) ^{-1}\sum _{i=1}^{N}\varvec{x}_{i1}y_{i}\), and based on the definition, we know that \(\sum _{i=1}^{N}\varvec{x}_{i1}\left( y_{i}-\varvec{x}_{i1}^{T}\varvec{\beta }_{1N}\right) =0\).
Thus, we have
Based on assumptions D3, D4 and D5, we have
where \(\varvec{\Sigma }_{1}\) is a positive definite matrix. Moreover,
where \(C_{1}\) is the \((s+1)\times (s+1)\) submatrix of C.
Thus we have,
Under the superpopulation model, \(\varvec{\beta }_{1N}\) is an OLS estimator of \(\varvec{\beta }_{1}\). By arguments similar to those of [29] on page 80 (Theorem 2.7.4 in [49] and Slutsky’s theorem), we have that
By combining results (1) and (2), we have
where \(\varvec{V}_{1}=\pi ^{-1}C_{1}^{-1}\varvec{\Sigma }_{1}C_{1}^{-1}+\sigma ^{2}C_{1}\). This completes the proof.
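As a quick numerical illustration of the normal-equation identity \(\sum _{i=1}^{N}\varvec{x}_{i1}\left( y_{i}-\varvec{x}_{i1}^{T}\varvec{\beta }_{1N}\right) =0\) used above, the following sketch computes \(\varvec{\beta }_{1N}\) on simulated data (the population size, coefficients, and variable names are arbitrary choices for illustration, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
N, s = 500, 3
# design vectors x_{i1} with an intercept column, so dimension is s + 1
X1 = np.column_stack([np.ones(N), rng.normal(size=(N, s))])
y = X1 @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(size=N)

# beta_1N = (sum_i x_i1 x_i1^T)^{-1} sum_i x_i1 y_i
beta_1N = np.linalg.solve(X1.T @ X1, X1.T @ y)

# normal equations: sum_i x_i1 (y_i - x_i1^T beta_1N) = 0 up to rounding error
score = X1.T @ (y - X1 @ beta_1N)
```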
Proof of Theorem 1
Define the following two objective functions,
where
and
It is known that \(Q(\varvec{\beta })\) is the objective function in (11) used to find the SCAD- or MCP-based survey-weighted regression estimator. Define a map \(\varvec{\beta }^{*}=T(\varvec{\beta })\) such that the first \(s+1\) components are the same as those of \(\varvec{\beta }\) and the last \(p-s\) components are 0.
Consider \(\Theta =\left\{ \varvec{\beta }:\Vert \varvec{\beta }-\varvec{\beta }_{0}\Vert \le \phi _{N}\right\} \). By Lemma 1, there exists an event \(E_{1}=\left\{ \Vert \hat{\varvec{\beta }}^{or}-\varvec{\beta }_{0}\Vert \le \phi _{N}\right\} \) such that \(P(E_{1}^{c})\le \epsilon _1\), where \(\phi _N = O(N^{-1/2+\delta })\) and \(0< \delta <1/2\). Thus, on \(E_1\), \(\hat{\varvec{\beta }}^{or} \in \Theta \).
The proof will be completed in two steps.
1. In event \(E_{1}\), show that \(Q\left( \varvec{\beta }^{*}\right) >Q\left( \hat{\varvec{\beta }}^{or}\right) \) for any \(\varvec{\beta }\in \Theta \) and \(\varvec{\beta }^{*}\ne \hat{\varvec{\beta }}^{or}\).
2. There is an event \(E_2\) such that \(P(E_2^c) \le \epsilon _2\). In \(E_1\cap E_2\), there is a neighborhood of \(\hat{\varvec{\beta }}^{or}\), denoted by \(\Theta _n\), such that \(Q\left( \varvec{\beta }\right) \ge Q\left( \varvec{\beta }^{*}\right) \) for any \(\varvec{\beta }\in \Theta \cap \Theta _{n}\) for sufficiently large n and N.
By combining the results of the two steps, we have that \(Q(\varvec{\beta }) > Q(\hat{\varvec{\beta }}^{or})\) for any \(\varvec{\beta } \in \Theta _n \cap \Theta \) with \(\varvec{\beta } \ne \hat{\varvec{\beta }}^{or}\) in \(E_1\cap E_2\). Thus \(\hat{\varvec{\beta }}^{or}\) is a strict local minimizer of \(Q(\varvec{\beta })\) over \(E_1\cap E_2\), with \(P(E_1\cap E_2) \ge 1 - \epsilon _1 -\epsilon _2\).
We first show that \(P\left( T(\varvec{\beta })\right) =C_{N}\), a constant, for \(\varvec{\beta }\in \Theta \). For \(j=s+1,\dots , p\), \(\rho _{\gamma }(\beta _{j};\lambda )=0\) since \(\beta _{j}=0\). For \(j=1,\dots ,s\),
where the last inequality follows from the assumption that \(b>a\lambda \gg \phi _N\). Hence \(\rho _{\gamma }(\beta _{j};\lambda )\) is constant for \(\vert \beta _{j}\vert \ge a\lambda \), so \(P(\varvec{\beta }^*) = P\left( T(\varvec{\beta })\right) \) is a constant and \(P(\varvec{\beta }^*) = P_1(\varvec{\beta }_1)\).
Since \(L\left( \varvec{\beta }^{*}\right) =L_{1}\left( \varvec{\beta }_{1}\right) \) and \(L\left( \hat{\varvec{\beta }}^{or}\right) =L_{1}\left( \hat{\varvec{\beta }}_{1}^{or}\right) \), we have \(Q\left( \varvec{\beta }^{*}\right) =Q_{1}\left( \varvec{\beta }_{1}\right) \) and \(Q\left( \hat{\varvec{\beta }}^{or}\right) =Q_{1}\left( \hat{\varvec{\beta }}_{1}^{or}\right) \). Moreover, \(\hat{\varvec{\beta }}_{1}^{or}\) is the unique global minimizer of \(L_{1}\left( \varvec{\beta }_{1}\right) \), so \(L_{1}\left( \varvec{\beta }_{1}\right) >L_{1}\left( \hat{\varvec{\beta }}_{1}^{or}\right) \) and hence \(Q_{1}\left( \varvec{\beta }_{1}\right) >Q_{1}\left( \hat{\varvec{\beta }}_{1}^{or}\right) \) for \(\varvec{\beta }_{1}\ne \hat{\varvec{\beta }}_{1}^{or}\). Thus \(Q\left( \varvec{\beta }^{*}\right) >Q\left( \hat{\varvec{\beta }}^{or}\right) \) for any \(\varvec{\beta }\in \Theta \) with \(\varvec{\beta }^{*}\ne \hat{\varvec{\beta }}^{or}\).
The next step is to show that \(Q\left( \varvec{\beta }\right) \ge Q\left( \varvec{\beta }^{*}\right) \) for any \( \varvec{\beta }\in \Theta \cap \Theta _{n}\), where \(\Theta _{n}=\left\{ \varvec{\beta }:\sup _{j}\vert \beta _{j}-{\hat{\beta }}_{j}^{or}\vert \le t_{n}\right\} \). For \(\varvec{\beta }\in \Theta _{n}\cap \Theta \), by a Taylor expansion, we have
where
\(\varvec{\beta }^{m}=\alpha \varvec{\beta }+\left( 1-\alpha \right) \varvec{\beta }^{*}\) for some constant \(\alpha \in \left( 0,1\right) \) and \(\varvec{\Omega } = diag(1/\pi _1,\dots , 1/\pi _n)\).
For \(j=0,1,\dots ,s\), \(\beta _{j}^{m}=\beta _{j}=\beta _{j}^{*}\), and for \(j=s+1,\dots ,p\), \(\beta _{j}^{*}=0\) and \(\beta _{j}^{m}=\alpha \beta _{j}\); thus
Also, for \(j=s+1,\dots ,p\), \({\hat{\beta }}_{j}^{or}=0\), so \(\vert \beta _{j}\vert \le t_{n}\) and \(\rho _{\gamma }^{\prime }\left( \alpha \beta _{j};\lambda \right) \ge \rho _{\gamma }^{\prime }\left( \alpha t_{n};\lambda \right) \) by the concavity of \(\rho _{\gamma }(\cdot ;\lambda )\). Thus,
Now consider \(\Gamma _{1}\), which can be written as
For \(j=s+1,\dots ,p\), we have the following,
where \(\varvec{\beta }_N \) is the finite-population coefficient vector, which can be viewed as a traditional OLS estimator of \(\varvec{\beta }_0\); thus \(\varvec{\beta }_N - \varvec{\beta }_0 = O_p(N^{-1/2})\). Then there exists an event \(E_{21} = \{ \Vert \varvec{\beta }_N - \varvec{\beta }_0\Vert \le \phi _N\}\) such that \(P(E_{21}^c) \le \epsilon _{21}\). Thus, over the event \(E_{21}\), we have
Also, from assumption C3 in [29], it is known that \(\frac{1}{n}\sum _{i=1}^{n}\frac{1}{\pi _{i}}\varvec{x}_{i}^{T}x_{ij}=O_{p}\left( 1\right) \). Hence there exists an event \(E_{22}\) such that \(P(E_{22}^c) \le \epsilon _{22}\) and, over the event \(E_{22}\), \(\vert \frac{1}{n}\sum _{i=1}^{n}\frac{1}{\pi _{i}}x_{il}x_{ij} \vert \le M_1\). Thus the second part of (5) becomes,
From page 73 of [29],
Thus \(\frac{1}{n}\sum _{i=1}^{n}\frac{1}{\pi _{i}}\left( y_{i}-\varvec{x}_{i}^{T}\varvec{\beta }_{N}\right) x_{ij}=O_{p}\left( N^{-1/2}\right) \), and hence \(\frac{1}{n}\sum _{i=1}^{n}\frac{1}{\pi _{i}}\left( y_{i}-\varvec{x}_{i}^{T}\varvec{\beta }_{N}\right) x_{ij}=o_{p}\left( N^{-1/2+\delta }\right) = o_p(\phi _N)\). This means that there exists an event \(E_{23}\) such that \(P(E_{23}^c)\le \epsilon _{23}\), and over the event \(E_{23}\), the first part of (5) can be bounded by
where \(M_2 >0\).
From (6) and (7), \(\Gamma _1\) in (4) can be bounded by
According to (3) and (8), we have
Since \(\rho ^{\prime }\left( \alpha t_{n}\right) \rightarrow 1\) and \(\lambda \gg \phi _{N}\), it follows that \(Q\left( \varvec{\beta }\right) \ge Q\left( \varvec{\beta }^{*}\right) \) for sufficiently large n and N. Here the event \(E_2 = E_{21}\cap E_{22} \cap E_{23}\).
If we let \(t_n = o(N^{-1/2})\), then \(\sqrt{N}\left( \hat{\varvec{\beta }}_{CC}-\hat{\varvec{\beta }}^{or}\right) \overset{p}{\rightarrow }0\).
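The argument above relies on the penalty \(\rho _{\gamma }\) being flat beyond the concavity threshold. This property can be checked numerically with a minimal sketch of the SCAD penalty [15] and the MCP [16], applied elementwise; the function names and the default tuning constants (\(a=3.7\) for SCAD, \(\gamma =3\) for MCP) follow common convention and are not taken from this paper:

```python
import numpy as np

def scad(beta, lam, a=3.7):
    """SCAD penalty of Fan & Li (2001): linear near 0, quadratic taper,
    then constant at lam**2 * (a + 1) / 2 for |beta| >= a * lam."""
    b = np.abs(beta)
    return np.where(
        b <= lam,
        lam * b,
        np.where(
            b <= a * lam,
            (2 * a * lam * b - b**2 - lam**2) / (2 * (a - 1)),
            lam**2 * (a + 1) / 2,
        ),
    )

def mcp(beta, lam, gamma=3.0):
    """Minimax concave penalty of Zhang (2010): constant at
    gamma * lam**2 / 2 for |beta| >= gamma * lam."""
    b = np.abs(beta)
    return np.where(b <= gamma * lam,
                    lam * b - b**2 / (2 * gamma),
                    gamma * lam**2 / 2)
```

Both penalties behave like the lasso near zero but stop growing past their thresholds, which is exactly why \(\rho _{\gamma }(\beta _{j};\lambda )\) contributes only a constant for the large signal coefficients in the proof.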
Proof of Theorem 2
The procedure for proving Theorem 2 is similar to the proof in [29] (pages 75–78).
From the assumption D5 in [29] and [18] provided above,
Since \(\hat{\varvec{\beta }}_{CC}\) converges in probability to \(\varvec{\beta }_{0}\), we apply Slutsky’s theorem and obtain
Furthermore, define a function \(g\left( \cdot ,\cdot \right) \) such that \(g\left( \varvec{a},\varvec{b}\right) =\left( a_{1},\varvec{a}_{2}^{T}\varvec{b}\right) \); applying Slutsky’s theorem again, we have
Consider \(h\left( a_{1},a_{2}\right) =a_{1}-a_{2}\). By the delta method, we have that
Now consider
and
We have,
By Theorem 1 and the assumption D6 in [29], we have that
Thus
Bootstrap algorithms
1.1 SRS
The algorithm described here is from [42].
1. For each unit in the original sample, replicate it \(k = \lfloor N/n \rfloor \) times to create the fixed part of the pseudo-population, \(U^f\).
2. Take a simple random sample \(U^{c}\) of size \(N - nk\) without replacement from the original sample. The completed pseudo-population is \(U^* = U^f \cup U^{c}\).
3. Take a simple random sample of size n without replacement from \(U^*\).
4. Compute the bootstrap statistic, \({\hat{Y}}^*\), based on the bootstrap sample.
5. Repeat steps 2 to 4 B times to obtain B bootstrap estimates.
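The steps above can be sketched as follows. This is a minimal illustration assuming the simple expansion estimator \({\hat{Y}}^* = N {\bar{y}}^*\) as the bootstrap statistic (the paper uses the penalized GREG estimators instead); the function name and interface are ours:

```python
import numpy as np

def srs_pseudo_population_bootstrap(y, N, B=1000, seed=None):
    """Pseudo-population bootstrap for an SRS of size n from a population
    of size N; returns B bootstrap estimates of the population total."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = len(y)
    k = N // n
    # step 1: fixed part of the pseudo-population, each unit repeated k times
    U_fixed = np.repeat(y, k)
    totals = np.empty(B)
    for b in range(B):
        # step 2: complete the pseudo-population with an SRSWOR of size N - n*k
        U_c = rng.choice(y, size=N - n * k, replace=False)
        U_star = np.concatenate([U_fixed, U_c])
        # step 3: draw an SRSWOR of size n from the pseudo-population
        s_star = rng.choice(U_star, size=n, replace=False)
        # step 4: bootstrap estimate of the total (expansion estimator here)
        totals[b] = N * s_star.mean()
    return totals  # step 5: B bootstrap estimates
```

The empirical quantiles or standard deviation of the returned estimates can then be used for variance estimation or confidence intervals.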
1.2 PPS
Here, the 0.5\(\pi \)ps-bootstrap algorithm of [44] is described.
1. For each unit in the original sample, \((y_i, \varvec{x}_i, M_i,\pi _i)\), replicate it \(d_i\) times to create the bootstrap pseudo-population \(U^*\). The bootstrap population size is \(N^* = \sum _{i \in S}d_i\) and \(M^* = \sum _{i \in S}d_iM_i\).
2. Take a bootstrap sample of size n with inclusion probabilities \(n \frac{M_i^*}{M^*}\).
3. Repeat steps 1 to 2 B times to obtain B bootstrap estimates.
Here \(d_i\) is determined as follows. Let \(\pi _i^{-1} = c_i + r_i\), where \(c_i = \lfloor \pi _i^{-1}\rfloor \). Then \(d_i = c_i\) if \(r_i <0.5\), and \(d_i = c_i + 1\) if \(r_i \ge 0.5\).
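The rounding rule for the replication counts \(d_i\) can be written as a short helper (the function name is ours):

```python
import math

def replication_counts(pi):
    """Replication counts d_i for the 0.5 pi-ps bootstrap:
    d_i = floor(1/pi_i), rounded up when the fractional part is >= 0.5."""
    d = []
    for p in pi:
        inv = 1.0 / p          # pi_i^{-1} = c_i + r_i
        c = math.floor(inv)    # c_i = floor(pi_i^{-1})
        r = inv - c            # fractional part r_i
        d.append(c + 1 if r >= 0.5 else c)
    return d
```

For example, inclusion probabilities of 0.5, 0.3, and 0.22 give replication counts of 2, 3, and 5, respectively.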
About this article
Cite this article
McDonald, E., Wang, X. Generalized regression estimators with concave penalties and a comparison to lasso type estimators. METRON (2023). https://doi.org/10.1007/s40300-023-00253-4