Abstract
The generalized regression (GREG) estimator is commonly used in survey sampling to incorporate auxiliary information. When multiple covariates are available, generally not all of them contribute significantly to the estimation. We propose two new GREG estimators based on concave penalties: one built from the smoothly clipped absolute deviation (SCAD) penalty and the other from the minimax concave penalty (MCP). The performance of these estimators is compared to that of lasso-type estimators through a simulation study under a simple random sampling (SRS) design and a probability proportional to size (PPS) design. The proposed estimators are shown to produce improved estimates of the population total relative to the traditional GREG estimator and the lasso-based estimators. Asymptotic properties are derived for the proposed estimators, and bootstrap methods are explored to improve coverage probability when the sample size is small.
Data Availability
The data set is available from UC Irvine Machine Learning Repository. https://archive.ics.uci.edu/dataset/211/communities+and+crime+unnormalized.
References
Cassel, C.M., Särndal, C.E., Wretman, J.H.: Some results on generalized difference estimation and generalized regression estimation for finite populations. Biometrika 63(3), 615–620 (1976)
Särndal, C.-E., Swensson, B., Wretman, J.: Model Assisted Survey Sampling. Springer, New York (2003)
Fuller, W.A.: Sampling Statistics. John Wiley & Sons, Hoboken, New Jersey (2011)
Horvitz, D.G., Thompson, D.J.: A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association 47(260), 663–685 (1952)
Tibshirani, R.: Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological) 58(1), 267–288 (1996)
Knight, K., Fu, W.: Asymptotics for lasso-type estimators. Annals of Statistics, 1356–1378 (2000)
Zou, H.: The adaptive lasso and its oracle properties. Journal of the American Statistical Association 101(476), 1418–1429 (2006)
Fan, J.: Comments on “Wavelets in statistics: a review” by A. Antoniadis. Journal of the Italian Statistical Society 6(2), 131–138 (1997)
Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96(456), 1348–1360 (2001)
Zhang, C.-H.: Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics 38(2), 894–942 (2010)
Zhang, T.: Analysis of multi-stage convex relaxation for sparse regularization. Journal of Machine Learning Research 11(3) (2010)
Fan, J., Lv, J.: A selective overview of variable selection in high dimensional feature space. Statistica Sinica, 101–148 (2010)
Huang, J., Breheny, P., Ma, S.: A selective review of group selection in high-dimensional models. Statistical Science 27(4), 481–499 (2012)
Wang, H., Li, G., Jiang, G.: Robust regression shrinkage and consistent variable selection through the lad-lasso. Journal of Business & Economic Statistics 25(3), 347–355 (2007)
Wang, M., Song, L., Tian, G.-L.: SCAD-penalized least absolute deviation regression in high-dimensional models. Communications in Statistics-Theory and Methods 44(12), 2452–2472 (2015)
Jiang, H., Zheng, W., Dong, Y.: Sparse and robust estimation with ridge minimax concave penalty. Information Sciences 571, 154–174 (2021)
Staerk, C., Kateri, M., Ntzoufras, I.: High-dimensional variable selection via low-dimensional adaptive learning. Electronic Journal of Statistics 15(1), 830–879 (2021)
McConville, K.S., Breidt, F.J., Lee, T.C., Moisen, G.G.: Model-assisted survey regression estimation with the lasso. Journal of Survey Statistics and Methodology 5(2), 131–158 (2017)
Ta, T., Shao, J., Li, Q., Wang, L.: Generalized regression estimators with high-dimensional covariates. Statistica Sinica 30(3), 1135–1154 (2020)
Dagdoug, M., Goga, C., Haziza, D.: Model-assisted estimation through random forests in finite population sampling. Journal of the American Statistical Association, 1–18 (2021)
Dagdoug, M., Goga, C., Haziza, D.: Model-assisted estimation in high-dimensional settings for survey data. Journal of Applied Statistics, 1–25 (2022)
Chauvet, G., Goga, C.: Asymptotic efficiency of the calibration estimator in a high-dimensional data setting. Journal of Statistical Planning and Inference 217, 177–187 (2022)
Wei, F., Zhu, H.: Group coordinate descent algorithms for nonconvex penalized regression. Computational Statistics & Data Analysis 56(2), 316–326 (2012)
Fan, Y., Li, R.: Variable selection in linear mixed effects models. Annals of Statistics 40(4), 2043 (2012)
Breheny, P., Huang, J.: Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors. Statistics and Computing 25(2), 173–187 (2015)
Ma, S., Huang, J.: A concave pairwise fusion approach to subgroup analysis. Journal of the American Statistical Association 112(517), 410–423 (2017)
Wang, X., Zhu, Z., Zhang, H.H.: Spatial Heterogeneity Automatic Detection and Estimation (2020)
Friedman, J., Hastie, T., Tibshirani, R.: Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software 33(1), 1 (2010)
McConville, K.: Improved estimation for complex surveys using modern regression techniques. PhD thesis, Colorado State University (2011)
Kim, Y., Choi, H., Oh, H.-S.: Smoothly clipped absolute deviation on high dimensions. Journal of the American Statistical Association 103(484), 1665–1673 (2008)
Xie, H., Huang, J.: SCAD-penalized regression in high-dimensional partially linear models. The Annals of Statistics 37(2), 673–696 (2009). https://doi.org/10.1214/07-AOS580
Wang, L., Li, H., Huang, J.Z.: Variable selection in nonparametric varying-coefficient models for analysis of repeated measurements. Journal of the American Statistical Association 103(484), 1556–1569 (2008)
Breheny, P., Huang, J.: Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Annals of Applied Statistics 5(1), 232–253 (2011)
Wang, X., Zhu, Z., Zhang, H.H.: Spatial heterogeneity automatic detection and estimation. Computational Statistics & Data Analysis 180, 107667 (2023)
Breidt, F.J., Opsomer, J.D.: Local polynomial regression estimators in survey sampling. Annals of Statistics, 1026–1053 (2000)
Hájek, J.: Limiting distributions in simple random sampling from a finite population. Publications of the Mathematical Institute of the Hungarian Academy of Sciences 5, 361–374 (1960)
Krewski, D., Rao, J.N.K.: Inference from stratified samples: properties of the linearization, jackknife and balanced repeated replication methods. The Annals of Statistics, 1010–1019 (1981)
Bickel, P.J., Freedman, D.A.: Asymptotic normality and the bootstrap in stratified sampling. The Annals of Statistics, 470–482 (1984)
Hájek, J.: Asymptotic theory of rejective sampling with varying probabilities from a finite population. The Annals of Mathematical Statistics 35(4), 1491–1523 (1964)
Chen, J., Rao, J.: Asymptotic normality under two-phase sampling designs. Statistica Sinica, 1047–1064 (2007)
Tillé, Y.: An elimination procedure for unequal probability sampling without replacement. Biometrika 83(1), 238–241 (1996)
Mashreghi, Z., Haziza, D., Léger, C.: A survey of bootstrap methods in finite population sampling. Statistics Surveys 10, 1–52 (2016)
Booth, J.G., Butler, R.W., Hall, P.: Bootstrap methods for finite populations. Journal of the American Statistical Association 89(428), 1282–1289 (1994)
Barbiero, A., Mecatti, F.: Bootstrap algorithms for variance estimation in \(\pi \)ps sampling. In: Complex Data Modeling and Computationally Intensive Statistical Methods, pp. 57–69. Springer, Italy (2010)
Dua, D., Graff, C.: UCI Machine Learning Repository (2017). http://archive.ics.uci.edu/ml
Avella-Medina, M., Ronchetti, E.: Robust and consistent variable selection in high-dimensional generalized linear models. Biometrika 105(1), 31–44 (2018)
Wang, L., Zhou, J., Qu, A.: Penalized generalized estimating equations for high-dimensional longitudinal data analysis. Biometrics 68(2), 353–360 (2012)
Tsung, C., Kuang, J., Valliant, R.L., Elliott, M.R.: Model-assisted calibration of non-probability sample survey data using adaptive lasso. Survey Methodology 44(1), 117–145 (2018)
Lehmann, E.L.: Elements of Large-sample Theory. Springer, New York (1999)
Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Ethics declarations
Conflict of interest
No financial or non-financial interests are directly or indirectly related to the submitted work.
Appendices
Proof of Lemma 1
Here, we follow the procedure used in [29] to prove the property of the oracle estimator.
Recall that the oracle estimator is defined as
The dimension of \(\varvec{x}_{i1}\) is \(s+1\). Let \(\varvec{\beta }_{1N}=\left( \sum _{i=1}^{N}\varvec{x}_{i1}\varvec{x}_{i1}^{T}\right) ^{-1}\sum _{i=1}^{N}\varvec{x}_{i1}y_{i}\), and based on the definition, we know that \(\sum _{i=1}^{N}\varvec{x}_{i1}\left( y_{i}-\varvec{x}_{i1}^{T}\varvec{\beta }_{1N}\right) =0\).
Thus, we have
Based on assumptions D3, D4 and D5, we have
where \(\varvec{\Sigma }_{1}\) is a positive definite matrix. Moreover,
where \(C_{1}\) is the \((s+1)\times (s+1)\) submatrix of C.
Thus we have,
Under the superpopulation model, \(\varvec{\beta }_{1N}\) is an OLS estimator of \(\varvec{\beta }_{1}\). By arguments similar to those of [29] on page 80 (Theorem 2.7.4 in [49] and Slutsky’s theorem), we have that
By combining results (1) and (2), we have
where \(\varvec{V}_{1}=\pi ^{-1}C_{1}^{-1}\varvec{\Sigma }_{1}C_{1}^{-1}+\sigma ^{2}C_{1}\). This completes the proof.
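As a quick numerical illustration of the normal-equation identity \(\sum _{i=1}^{N}\varvec{x}_{i1}\left( y_{i}-\varvec{x}_{i1}^{T}\varvec{\beta }_{1N}\right) =0\) used above, the following sketch computes \(\varvec{\beta }_{1N}\) on simulated data (the population size, coefficients, and variable names are arbitrary choices for illustration, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
N, s = 500, 3
# design vectors x_{i1} with an intercept column, so dimension is s + 1
X1 = np.column_stack([np.ones(N), rng.normal(size=(N, s))])
y = X1 @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(size=N)

# beta_1N = (sum_i x_i1 x_i1^T)^{-1} sum_i x_i1 y_i
beta_1N = np.linalg.solve(X1.T @ X1, X1.T @ y)

# normal equations: sum_i x_i1 (y_i - x_i1^T beta_1N) = 0 up to rounding error
score = X1.T @ (y - X1 @ beta_1N)
```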
Proof of Theorem 1
Define the following two objective functions,
where
and
It is known that \(Q(\varvec{\beta })\) is the objective function in (11) used to find the SCAD- or MCP-based survey-weighted regression estimator. Define a map \(\varvec{\beta }^{*}=T(\varvec{\beta })\) such that the first \(s+1\) components are the same as those of \(\varvec{\beta }\) and the last \(p-s\) components are 0.
Consider \(\Theta =\left\{ \varvec{\beta }:\Vert \varvec{\beta }-\varvec{\beta }_{0}\Vert \le \phi _{N}\right\} \). By Lemma 1, there exists an event \(E_{1}=\left\{ \Vert \hat{\varvec{\beta }}^{or}-\varvec{\beta }_{0}\Vert \le \phi _{N}\right\} \) such that \(P(E_{1}^{c})\le \epsilon _1\), where \(\phi _N = O(N^{-1/2+\delta })\) and \(0< \delta <1/2\). Thus, on \(E_1\), \(\hat{\varvec{\beta }}^{or} \in \Theta \).
The proof will be completed in two steps.
1. In event \(E_{1}\), show that \(Q\left( \varvec{\beta }^{*}\right) >Q\left( \hat{\varvec{\beta }}^{or}\right) \) for any \(\varvec{\beta }\in \Theta \) and \(\varvec{\beta }^{*}\ne \hat{\varvec{\beta }}^{or}\).
2. There is an event \(E_2\) such that \(P(E_2^c) \le \epsilon _2\). In \(E_1\cap E_2\), there is a neighborhood of \(\hat{\varvec{\beta }}^{or}\), denoted by \(\Theta _n\), such that \(Q\left( \varvec{\beta }\right) \ge Q\left( \varvec{\beta }^{*}\right) \) for any \(\varvec{\beta }\in \Theta \cap \Theta _{n}\) for sufficiently large n and N.
By combining the results of the two steps, we have that \(Q(\varvec{\beta }) > Q(\hat{\varvec{\beta }}^{or})\) for any \(\varvec{\beta } \in \Theta _n \cap \Theta \) with \(\varvec{\beta } \ne \hat{\varvec{\beta }}^{or}\) in \(E_1\cap E_2\). Thus \(\hat{\varvec{\beta }}^{or}\) is a strict local minimizer of \(Q(\varvec{\beta })\) over \(E_1\cap E_2\), with \(P(E_1\cap E_2) \ge 1 - \epsilon _1 -\epsilon _2\).
We first show that \(P\left( T(\varvec{\beta })\right) =C_{N}\), a constant, for \(\varvec{\beta }\in \Theta \). For \(j=s+1,\dots , p\), \(\rho _{\gamma }(\beta _{j};\lambda )=0\) since \(\beta _{j}=0\). For \(j=1,\dots ,s\),
where the last inequality follows from the assumption that \(b>a\lambda \gg \phi _N\). Hence \(\rho _{\gamma }(\beta _{j};\lambda )\) is constant for \(\vert \beta _{j}\vert \ge a\lambda \), so \(P(\varvec{\beta }^*) = P\left( T(\varvec{\beta })\right) \) is a constant and \(P(\varvec{\beta }^*) = P_1(\varvec{\beta }_1)\).
Since \(L\left( \varvec{\beta }^{*}\right) =L_{1}\left( \varvec{\beta }_{1}\right) \) and \(L\left( \hat{\varvec{\beta }}^{or}\right) =L_{1}\left( \hat{\varvec{\beta }}_{1}^{or}\right) \), we have \(Q\left( \varvec{\beta }^{*}\right) =Q_{1}\left( \varvec{\beta }_{1}\right) \) and \(Q\left( \hat{\varvec{\beta }}^{or}\right) =Q_{1}\left( \hat{\varvec{\beta }}_{1}^{or}\right) \). Moreover, \(\hat{\varvec{\beta }}_{1}^{or}\) is the unique global minimizer of \(L_{1}\left( \varvec{\beta }_{1}\right) \), so \(L_{1}\left( \varvec{\beta }_{1}\right) >L_{1}\left( \hat{\varvec{\beta }}_{1}^{or}\right) \) and hence \(Q_{1}\left( \varvec{\beta }_{1}\right) >Q_{1}\left( \hat{\varvec{\beta }}_{1}^{or}\right) \) for \(\varvec{\beta }_{1}\ne \hat{\varvec{\beta }}_{1}^{or}\). Thus \(Q\left( \varvec{\beta }^{*}\right) >Q\left( \hat{\varvec{\beta }}^{or}\right) \) for any \(\varvec{\beta }\in \Theta \) with \(\varvec{\beta }^{*}\ne \hat{\varvec{\beta }}^{or}\).
The next step is to show that \(Q\left( \varvec{\beta }\right) \ge Q\left( \varvec{\beta }^{*}\right) \) for any \( \varvec{\beta }\in \Theta \cap \Theta _{n}\), where \(\Theta _{n}=\left\{ \varvec{\beta }:\sup _{j}\vert \beta _{j}-{\hat{\beta }}_{j}^{or}\vert \le t_{n}\right\} \). For \(\varvec{\beta }\in \Theta _{n}\cap \Theta \), by a Taylor expansion, we have
where
\(\varvec{\beta }^{m}=\alpha \varvec{\beta }+\left( 1-\alpha \right) \varvec{\beta }^{*}\) for some constant \(\alpha \in \left( 0,1\right) \) and \(\varvec{\Omega } = diag(1/\pi _1,\dots , 1/\pi _n)\).
For \(j=0,1,\dots ,s\), \(\beta _{j}^{m}=\beta _{j}=\beta _{j}^{*}\), and for \(j=s+1,\dots ,p\), \(\beta _{j}^{*}=0\) and \(\beta _{j}^{m}=\alpha \beta _{j}\); thus
Also, for \(j=s+1,\dots ,p\), \({\hat{\beta }}_{j}^{or}=0\), so \(\vert \beta _{j}\vert \le t_{n}\) and \(\rho _{\gamma }^{\prime }\left( \alpha \beta _{j};\lambda \right) \ge \rho _{\gamma }^{\prime }\left( \alpha t_{n};\lambda \right) \) by the concavity of \(\rho _{\gamma }(\cdot ;\lambda )\). Thus,
Now consider \(\Gamma _{1}\), which can be written as
For \(j=s+1,\dots ,p\), we have the following,
where \(\varvec{\beta }_N \) is the finite-population coefficient vector, which can be viewed as a traditional OLS estimator of \(\varvec{\beta }_0\); thus \(\varvec{\beta }_N - \varvec{\beta }_0 = O_p(N^{-1/2})\). Then there exists an event \(E_{21} = \{ \Vert \varvec{\beta }_N - \varvec{\beta }_0\Vert \le \phi _N\}\) such that \(P(E_{21}^c) \le \epsilon _{21}\). Thus, over the event \(E_{21}\), we have
Also, from assumption C3 in [29], it is known that \(\frac{1}{n}\sum _{i=1}^{n}\frac{1}{\pi _{i}}\varvec{x}_{i}^{T}x_{ij}=O_{p}\left( 1\right) \). Hence there exists an event \(E_{22}\) such that \(P(E_{22}^c) \le \epsilon _{22}\) and, over the event \(E_{22}\), \(\vert \frac{1}{n}\sum _{i=1}^{n}\frac{1}{\pi _{i}}x_{il}x_{ij} \vert \le M_1\). Thus the second part of (5) becomes,
From page 73 of [29],
Thus \(\frac{1}{n}\sum _{i=1}^{n}\frac{1}{\pi _{i}}\left( y_{i}-\varvec{x}_{i}^{T}\varvec{\beta }_{N}\right) x_{ij}=O_{p}\left( N^{-1/2}\right) \), and hence \(\frac{1}{n}\sum _{i=1}^{n}\frac{1}{\pi _{i}}\left( y_{i}-\varvec{x}_{i}^{T}\varvec{\beta }_{N}\right) x_{ij}=o_{p}\left( N^{-1/2+\delta }\right) = o_p(\phi _N)\). This means that there exists an event \(E_{23}\) such that \(P(E_{23}^c)\le \epsilon _{23}\), and over the event \(E_{23}\), the first part of (5) can be bounded by
where \(M_2 >0\).
From (6) and (7), \(\Gamma _1\) in (4) can be bounded by
According to (3) and (8), we have
Since \(\rho ^{\prime }\left( \alpha t_{n}\right) \rightarrow 1\) and \(\lambda \gg \phi _{N}\), it follows that \(Q\left( \varvec{\beta }\right) \ge Q\left( \varvec{\beta }^{*}\right) \) for sufficiently large n and N. Here the event \(E_2 = E_{21}\cap E_{22} \cap E_{23}\).
If we let \(t_n = o(N^{-1/2})\), then \(\sqrt{N}\left( \hat{\varvec{\beta }}_{CC}-\hat{\varvec{\beta }}^{or}\right) \overset{p}{\rightarrow }0\).
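The argument above relies on the penalty \(\rho _{\gamma }\) being flat beyond the concavity threshold. This property can be checked numerically with a minimal sketch of the SCAD penalty [15] and the MCP [16], applied elementwise; the function names and the default tuning constants (\(a=3.7\) for SCAD, \(\gamma =3\) for MCP) follow common convention and are not taken from this paper:

```python
import numpy as np

def scad(beta, lam, a=3.7):
    """SCAD penalty of Fan & Li (2001): linear near 0, quadratic taper,
    then constant at lam**2 * (a + 1) / 2 for |beta| >= a * lam."""
    b = np.abs(beta)
    return np.where(
        b <= lam,
        lam * b,
        np.where(
            b <= a * lam,
            (2 * a * lam * b - b**2 - lam**2) / (2 * (a - 1)),
            lam**2 * (a + 1) / 2,
        ),
    )

def mcp(beta, lam, gamma=3.0):
    """Minimax concave penalty of Zhang (2010): constant at
    gamma * lam**2 / 2 for |beta| >= gamma * lam."""
    b = np.abs(beta)
    return np.where(b <= gamma * lam,
                    lam * b - b**2 / (2 * gamma),
                    gamma * lam**2 / 2)
```

Both penalties behave like the lasso near zero but stop growing past their thresholds, which is exactly why \(\rho _{\gamma }(\beta _{j};\lambda )\) contributes only a constant for the large signal coefficients in the proof.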
Proof of Theorem 2
The procedure for proving Theorem 2 is similar to the proof in [29] (pages 75–78).
From the assumption D5 in [29] and [18] provided above,
Since \(\hat{\varvec{\beta }}_{CC}\) converges in probability to \(\varvec{\beta }_{0}\), we apply Slutsky’s theorem and obtain
Furthermore, define a function \(g\left( \cdot ,\cdot \right) \) such that \(g\left( \varvec{a},\varvec{b}\right) =\left( a_{1},\varvec{a}_{2}^{T}\varvec{b}\right) \); applying Slutsky’s theorem again, we have
Consider \(h\left( a_{1},a_{2}\right) =a_{1}-a_{2}\). By the delta method, we have that
Now consider
and
We have,
By Theorem 1 and the assumption D6 in [29], we have that
Thus
Bootstrap algorithms
1.1 SRS
The algorithm described here is from [42].
1. For each unit in the original sample, replicate it \(k = \lfloor N/n \rfloor \) times to create the fixed part of the pseudo-population, \(U^f\).
2. Take a simple random sample \(U^{c}\) of size \(N - nk\) without replacement from the original sample. The completed pseudo-population is \(U^* = U^f \cup U^{c}\).
3. Take a simple random sample of size n without replacement from \(U^*\).
4. Compute the bootstrap statistic, \({\hat{Y}}^*\), based on the bootstrap sample.
5. Repeat steps 2 to 4 B times to obtain B bootstrap estimates.
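The steps above can be sketched as follows. This is a minimal illustration assuming the simple expansion estimator \({\hat{Y}}^* = N {\bar{y}}^*\) as the bootstrap statistic (the paper uses the penalized GREG estimators instead); the function name and interface are ours:

```python
import numpy as np

def srs_pseudo_population_bootstrap(y, N, B=1000, seed=None):
    """Pseudo-population bootstrap for an SRS of size n from a population
    of size N; returns B bootstrap estimates of the population total."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = len(y)
    k = N // n
    # step 1: fixed part of the pseudo-population, each unit repeated k times
    U_fixed = np.repeat(y, k)
    totals = np.empty(B)
    for b in range(B):
        # step 2: complete the pseudo-population with an SRSWOR of size N - n*k
        U_c = rng.choice(y, size=N - n * k, replace=False)
        U_star = np.concatenate([U_fixed, U_c])
        # step 3: draw an SRSWOR of size n from the pseudo-population
        s_star = rng.choice(U_star, size=n, replace=False)
        # step 4: bootstrap estimate of the total (expansion estimator here)
        totals[b] = N * s_star.mean()
    return totals  # step 5: B bootstrap estimates
```

The empirical quantiles or standard deviation of the returned estimates can then be used for variance estimation or confidence intervals.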
1.2 PPS
Here, the 0.5\(\pi \)ps-bootstrap algorithm of [44] is described.
1. For each unit in the original sample, \((y_i, \varvec{x}_i, M_i,\pi _i)\), replicate it \(d_i\) times to create the bootstrap pseudo-population \(U^*\). The bootstrap population size is \(N^* = \sum _{i \in S}d_i\) and \(M^* = \sum _{i \in S}d_iM_i\).
2. Take a bootstrap sample of size n with inclusion probabilities \(n \frac{M_i^*}{M^*}\).
3. Repeat steps 1 to 2 B times to obtain B bootstrap estimates.
Here \(d_i\) is determined as follows. Let \(\pi _i^{-1} = c_i + r_i\), where \(c_i = \lfloor \pi _i^{-1}\rfloor \). Then \(d_i = c_i\) if \(r_i <0.5\), and \(d_i = c_i + 1\) if \(r_i \ge 0.5\).
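The rounding rule for the replication counts \(d_i\) can be written as a short helper (the function name is ours):

```python
import math

def replication_counts(pi):
    """Replication counts d_i for the 0.5 pi-ps bootstrap:
    d_i = floor(1/pi_i), rounded up when the fractional part is >= 0.5."""
    d = []
    for p in pi:
        inv = 1.0 / p          # pi_i^{-1} = c_i + r_i
        c = math.floor(inv)    # c_i = floor(pi_i^{-1})
        r = inv - c            # fractional part r_i
        d.append(c + 1 if r >= 0.5 else c)
    return d
```

For example, inclusion probabilities of 0.5, 0.3, and 0.22 give replication counts of 2, 3, and 5, respectively.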
About this article
Cite this article
McDonald, E., Wang, X. Generalized regression estimators with concave penalties and a comparison to lasso type estimators. METRON (2023). https://doi.org/10.1007/s40300-023-00253-4