Abstract
The assumption of equal variances is not always appropriate, and different approaches for modelling variance heterogeneity have been widely studied in the literature. One of these approaches is the joint location and scale model, built on the idea that both the location and the scale depend on explanatory variables through parametric linear models. Because the joint location and scale model comprises two submodels, it does not cope well with a large number of irrelevant variables. Therefore, determining the variables that are important for the location and the scale is as important as estimating the parameters of these models. From this point of view, a combined robust estimation and variable selection method is proposed to simultaneously estimate the parameters and select the important variables. This is done using the least favorable distribution and the least absolute shrinkage and selection operator (LASSO). Under appropriate conditions, we study the consistency, asymptotic distribution and sparsity property of the proposed robust estimator. Simulation studies and a real data example are provided to demonstrate the advantages of the proposed method over existing methods in the literature.
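To make the modelling idea concrete, here is a minimal numerical sketch of a penalized joint location and scale fit. It is not the paper's algorithm: the Huber \(\rho\) stands in for the \(\rho\) derived from the least favorable distribution, the helper names (`fit_joint`, `soft`) are invented for illustration, and the optimizer is a plain proximal gradient (ISTA) loop over the criterion \(\frac{1}{2n}\sum_i z_i^T\gamma + \frac{1}{n}\sum_i \rho\!\left((y_i - x_i^T\beta)e^{-z_i^T\gamma/2}\right) + \lambda(\|\beta\|_1 + \|\gamma\|_1)\).

```python
import numpy as np

def huber_rho(r, c=1.345):
    """Huber loss (a stand-in for the paper's least-favorable-distribution rho)."""
    a = np.abs(r)
    return np.where(a <= c, 0.5 * r ** 2, c * a - 0.5 * c ** 2)

def huber_psi(r, c=1.345):
    """Derivative of the Huber loss."""
    return np.clip(r, -c, c)

def objective(beta, gamma, X, Z, y, lam, c=1.345):
    s = np.exp(Z @ gamma / 2.0)              # per-observation scale e^{z_i^T gamma / 2}
    r = (y - X @ beta) / s                   # standardized residuals
    return (0.5 * np.mean(Z @ gamma) + np.mean(huber_rho(r, c))
            + lam * (np.abs(beta).sum() + np.abs(gamma).sum()))

def soft(v, t):
    """Soft-thresholding: proximal operator of the L1 penalty."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fit_joint(X, Z, y, lam=0.02, step=0.05, iters=3000, c=1.345):
    n, p = X.shape
    beta, gamma = np.zeros(p), np.zeros(Z.shape[1])
    for _ in range(iters):
        s = np.exp(Z @ gamma / 2.0)
        r = (y - X @ beta) / s
        # gradient of the smooth part w.r.t. beta: -(1/n) X^T (psi(r)/s)
        g_beta = -(X.T @ (huber_psi(r, c) / s)) / n
        # gradient w.r.t. gamma: (1/n) Z^T (1/2 - psi(r) * r / 2)
        g_gamma = (Z.T @ (0.5 - 0.5 * huber_psi(r, c) * r)) / n
        beta = soft(beta - step * g_beta, step * lam)    # proximal gradient step
        gamma = soft(gamma - step * g_gamma, step * lam)
    return beta, gamma

# Simulated heteroscedastic data: two active location coefficients, one scale slope.
rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 4))
Z = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([2.0, 0.0, 1.0, 0.0])
gamma_true = np.array([0.2, 0.5])
y = X @ beta_true + np.exp(Z @ gamma_true / 2.0) * rng.normal(size=n)
beta_hat, gamma_hat = fit_joint(X, Z, y)
```

The soft-thresholding step is what produces exact zeros in \(\widehat{\beta}\) and \(\widehat{\gamma}\), so variable selection in both submodels happens during estimation rather than afterwards.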
References
Aitkin M (1987) Modelling variance heterogeneity in normal regression using GLIM. J R Stat Soc Ser C (Appl Stat) 36(3):332–339
Antoniadis A, Gijbels I, Lambert-Lacroix S, Poggi JM (2016) Joint estimation and variable selection for mean and dispersion in proper dispersion models. Electron J Stat 10:1630–1676
Arslan O (2012) Weighted LAD-LASSO method for robust parameter estimation and variable selection in regression. Comput Stat Data Anal 56:1952–1965
Arslan O (2016) Penalized MM regression estimation with Lγ penalty: a robust version of bridge regression. Statistics 50(6):1236–1260
Breusch TS, Pagan AR (1979) A simple test for heteroskedasticity and random coefficient variation. Econometrica 47(5):1287–1294
Caner M (2009) LASSO-type GMM estimator. Econom Theory 25:270–290
Cox DR, Hinkley DV (1974) Theoretical statistics, vol 1. Chapman and Hall, London
Efron B, Hastie T, Johnstone I, Tibshirani R (2004) Least angle regression (with discussion). Ann Stat 32:407–499
Fan JQ, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360
Frank IE, Friedman JH (1993) A statistical view of some chemometrics regression tools. Technometrics 35:109–135
Hampel FR, Ronchetti EM, Rousseeuw PJ, Stahel WA (1986) Robust statistics: the approach based on influence functions. Wiley, New York
Harvey AC (1976) Estimating regression models with multiplicative heteroscedasticity. Econometrica 44:460–465
Huber PJ (1964) Robust estimation of a location parameter. Ann Math Stat 35(1):73–101
Huber PJ, Ronchetti EM (2009) Robust statistics, vol 2. Wiley, New York
Knight K, Fu W (2000) Asymptotics for Lasso-type estimators. Ann Stat 28(5):1356–1378
Li G, Peng H, Zhu L (2011) Nonconcave penalized M-estimation with a diverging number of parameters. Stat Sin 21:391–419
Li HQ, Wu LC, Yi JY (2016) A skew-normal mixture of joint location, scale and skewness models. Appl Math J Chin Univ 31(3):283–295
Li H, Wu L, Ma T (2017) Variable selection in joint location, scale and skewness models of the skew-normal distribution. J Syst Sci Compl 30:694–709
Newey WK, McFadden D (1994) Large sample estimation and hypothesis testing. In: Engle RF, McFadden DL (eds) Handbook of econometrics, vol 4. Elsevier, Amsterdam, pp 2111–2245
Owen AB (2007) A robust hybrid of lasso and ridge regression. Contemp Math 443(7):59–72
Park RE (1966) Estimation with heteroscedastic error terms. Econometrica 34(4):888
Rosset S, Zhu J (2004) Discussion of “least angle regression”, by B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Ann Stat 32:469–475
R Core Team (2018) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.
Taylor JT, Verbyla AP (2004) Joint modelling of location and scale parameters of the t distribution. Stat Model 4:91–112
Tibshirani R (1996) Regression shrinkage and selection via the LASSO. J R Stat Soc B 58:267–288
Verbyla AP (1993) Variance heterogeneity: residual maximum likelihood and diagnostics. J R Stat Soc B 52:493–508
Wang L, Li R (2009) Weighted Wilcoxon-type smoothly clipped absolute deviation method. Biometrics 65(2):564–571
Wang H, Li G, Jiang G (2006) Robust regression shrinkage and consistent variable selection via the LAD-LASSO. J Bus Econ Stat 11:1–6
Wang X, Jiang Y, Huang M, Zhang H (2013) Robust variable selection with exponential squared loss. J Am Stat Assoc 108(502):632–643
Wu LC (2014) Variable selection in joint location and scale models of the skew-t-normal distribution. Commun Stat Simul Comput 43(3):615–630
Wu LC, Li HQ (2012) Variable selection for joint mean and dispersion models of the inverse Gaussian distribution. Metrika 75:795–808
Wu LC, Zhang ZZ, Xu DK (2012) Variable selection in joint mean and variance models of Box Cox transformation. J Appl Stat 39(12):2543–2555
Wu LC, Zhang ZZ, Xu DK (2013) Variable selection in joint location and scale models of the skew-normal distribution. J Stat Comput Simul 83:1266–1278
Wu LC, Tian GL, Zhang YQ, Ma T (2017) Variable selection in joint location, scale and skewness models with a skew-t-normal distribution. Stat Interface 10(2):217–227
Zheng Q, Gallagher C, Kulasekera KB (2013) Adaptive penalized quantile regression for high dimensional data. J Stat Plan Inference 143(6):1029–1038
Zheng Q, Peng L, He X (2015) Globally adaptive quantile regression with ultra-high dimensional data. Ann Stat 43(5):2225–2258
Zheng Q, Gallagher C, Kulasekera KB (2017) Robust adaptive Lasso for variable selection. Commun Stat Theory Methods 46(9):4642–4659
Acknowledgements
The authors thank the anonymous referees, the editor and the associate editor for their careful reading of this paper and their helpful suggestions.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
Proof of Theorem 1
First, we show that \({Z}_{n}\left({\varvec{\theta}}\right)\) converges uniformly in probability to \(Z\left({\varvec{\theta}}\right)\) given in (26) and that \({\widehat{{\varvec{\theta}}}}_{n}\) is uniformly bounded in probability.
To this end, we first show that
in probability. Since the third term in \({Z}_{n}\left({\varvec{\theta}}\right)\) is not stochastic and the parameter space is compact (A4), it suffices to show that \(\frac{1}{2n}\sum_{i=1}^{n}{{\varvec{z}}}_{{\varvec{i}}}^{T}{\varvec{\gamma}}+\frac{1}{n}\sum_{i=1}^{n}\rho \left(\frac{{y}_{i}-{{\varvec{x}}}_{{\varvec{i}}}^{T}{\varvec{\upbeta}}}{{e}^{{{\varvec{z}}}_{{\varvec{i}}}^{T}{\varvec{\gamma}}/2}}\right)\) converges uniformly in probability to \({l}\left({\varvec{\theta}}\right)\); the uniform convergence in probability of \({Z}_{n}\left({\varvec{\theta}}\right)\) to \(Z\left({\varvec{\theta}}\right)\) then follows (Arslan 2016).
The function \(\rho \) given in (9) is continuous (A4). Furthermore, \(\underset{{\varvec{\theta}}\in\Theta }{\mathrm{sup}}\rho \left(r;{\varvec{\theta}}\right)<\infty \), \(\rho \left(r;{\varvec{\theta}}\right)\le \underset{{\varvec{\theta}}\in\Theta }{\mathrm{sup}}\rho \left(r;{\varvec{\theta}}\right)\) and \(E\left[\underset{{\varvec{\theta}}\in\Theta }{\mathrm{sup}}\rho \left(r;{\varvec{\theta}}\right)\right]<\infty \), where \(r=\frac{y-{{\varvec{x}}}^{T}{\varvec{\upbeta}}}{{e}^{{{\varvec{z}}}^{{\varvec{T}}}{\varvec{\gamma}}/2}}\). Thus, \(E\left[\rho \left(r;{\varvec{\theta}}\right)\right]\) is continuous and
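These are precisely the conditions of the uniform law of large numbers (cf. Newey and McFadden 1994, Lemma 2.4): for i.i.d. data, a compact parameter space \(\Theta\), continuity of \(\rho\left(r;{\varvec{\theta}}\right)\) in \({\varvec{\theta}}\), and \(E\left[\sup_{{\varvec{\theta}}\in\Theta}\rho\left(r;{\varvec{\theta}}\right)\right]<\infty\), one has

```latex
\sup_{\boldsymbol{\theta}\in\Theta}\left|\frac{1}{n}\sum_{i=1}^{n}\rho\left(r_{i};\boldsymbol{\theta}\right)-E\left[\rho\left(r;\boldsymbol{\theta}\right)\right]\right|\xrightarrow{\;p\;}0 .
```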
in probability (Newey and McFadden 1994). This result combined with \({\lambda }_{n}/n\to 0\) implies that
in probability. Further, since
we have
Then it follows that
Combining these results (the uniform convergence in probability of \({Z}_{n}\) and the uniform boundedness in probability of \({\widehat{{\varvec{\theta}}}}_{n}\)), we obtain
Moreover, when \({\lambda }_{n}/n\to 0\) as \(n\to \infty \), \({Z}_{n}\left({\varvec{\theta}}\right)\) converges uniformly in probability to \({l}\left({\varvec{\theta}}\right)\) and, since \({l}\left({\varvec{\theta}}\right)\) has a unique minimum at \({{\varvec{\theta}}}_{0}\) (A2), we get \({\widehat{{\varvec{\theta}}}}_{n}\to {{\varvec{\theta}}}_{0}\) in probability, which establishes the consistency of \({\widehat{{\varvec{\theta}}}}_{n}.\)
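The final step is the standard consistency theorem for extremum estimators (cf. Newey and McFadden 1994, Theorem 2.1): with \(\Theta\) compact and \({l}\left({\varvec{\theta}}\right)\) continuous and uniquely minimized at \({{\varvec{\theta}}}_{0}\),

```latex
\sup_{\boldsymbol{\theta}\in\Theta}\left|Z_{n}\left(\boldsymbol{\theta}\right)-l\left(\boldsymbol{\theta}\right)\right|\xrightarrow{\;p\;}0
\quad\Longrightarrow\quad
\widehat{\boldsymbol{\theta}}_{n}\xrightarrow{\;p\;}\boldsymbol{\theta}_{0} .
```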
Proof of Theorem 2
Let us define \({Q}_{n}\left({\varvec{u}}\right)={\mathcal{L}}_{n}\left({{\varvec{\theta}}}_{0}+{n}^{-\frac{1}{2}}{\varvec{u}}\right)-{\mathcal{L}}_{n}\left({{\varvec{\theta}}}_{0}\right)\) with \({\varvec{u}}\in {\mathbb{R}}^{s}\). Obviously, \({Q}_{n}\left({\varvec{u}}\right)\) is minimized at \({\widehat{{\varvec{u}}}}_{n}=\sqrt{n}\left({\widehat{{\varvec{\theta}}}}_{n}-{{\varvec{\theta}}}_{0}\right)\) because \({\widehat{{\varvec{\theta}}}}_{n}\) minimizes \({\mathcal{L}}_{n}\left({\varvec{\theta}}\right)\). First, we need to show that
\({Q}_{n}\left({\varvec{u}}\right)\) can be rewritten as
For the first part of the above equation, using a Taylor series expansion around \({\varvec{u}}=0\), we get
Since \(\frac{1}{\sqrt{n}}{h}^{^{\prime}}\left({{\varvec{\theta}}}_{0}\right)\stackrel{D}{\to }{\varvec{W}}\) with \({\varvec{W}}\sim {N}_{s}\left(0,A\left({h}^{^{\prime}}\left({{\varvec{\theta}}}_{0}\right)\right)\right)\) and \(\frac{1}{n}{h}^{{^{\prime}}{^{\prime}}}\left({{\varvec{\theta}}}_{0}\right)\to B\left({h}^{^{\prime}}\left({{\varvec{\theta}}}_{0}\right)\right)\) where \(A\left({h}^{^{\prime}}\left({{\varvec{\theta}}}_{0}\right)\right)=E\left[{\left({h}^{^{\prime}}\left({{\varvec{\theta}}}_{0}\right)\right)}^{2}\right]\) and \(B\left({h}^{^{\prime}}\left({{\varvec{\theta}}}_{0}\right)\right)=E\left[{h}^{{^{\prime}}{^{\prime}}}\left({{\varvec{\theta}}}_{0}\right)\right]\), we obtain
(Arslan 2016).
Similar to Knight and Fu (2000) and Arslan (2016), we have
as \(n\to \infty \). Then, we obtain
as \(n\to \infty \). Since \(Q\left({\varvec{u}}\right)\) has a unique minimum and \({Q}_{n}\left({\varvec{u}}\right)\) can be approximated by a convex function, we finally have
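For reference, the penalty limit invoked from Knight and Fu (2000) has the following familiar form (a sketch in the usual notation; the paper's own display may differ in constants and sign conventions). If \(\lambda_n/\sqrt{n}\to\lambda_0\ge 0\), then

```latex
\lambda_{n}\sum_{j=1}^{s}\left(\left|\theta_{0j}+\frac{u_{j}}{\sqrt{n}}\right|-\left|\theta_{0j}\right|\right)
\;\longrightarrow\;
\lambda_{0}\sum_{j=1}^{s}\left[u_{j}\,\operatorname{sgn}\left(\theta_{0j}\right)I\!\left(\theta_{0j}\neq 0\right)+\left|u_{j}\right|I\!\left(\theta_{0j}=0\right)\right],
```

so that \(Q_{n}\left({\varvec{u}}\right)\) converges in distribution to a convex limit of the form \(Q\left({\varvec{u}}\right)={\varvec{W}}^{T}{\varvec{u}}+\frac{1}{2}{\varvec{u}}^{T}B\,{\varvec{u}}+\lambda_{0}\sum_{j}\left[\,\cdot\,\right]\), and convexity carries the convergence of \(Q_n\) over to the convergence of its minimizer \({\widehat{{\varvec{u}}}}_{n}\).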
Proof of Theorem 3
First, we prove that for any given \({{\varvec{\theta}}}^{({s}_{1})}\) satisfying \({{\varvec{\theta}}}^{({s}_{1})}-{{\varvec{\theta}}}_{0}^{({s}_{1})}=O\left({n}^{-1/2}\right)\) and any constant \(c > 0\), we have
Simple calculations lead to the following expression of the derivative of \(Q\left({\varvec{\theta}}\right)\).
Then, applying Taylor's expansion, for any \({\theta }_{j}\) \(\left(j={s}_{1}+1,{s}_{1}+2,\dots ,s\right)\) we obtain
where \({{\varvec{\theta}}}^{*}\) is between \({\varvec{\theta}}\) and \({{\varvec{\theta}}}_{0}\). On the other hand, we know that (Fan and Li 2001)
According to Theorem 1, it is clear that \(\Vert {{\widehat{{\varvec{\theta}}}}_{n}-{\varvec{\theta}}}_{0}\Vert ={O}_{p}\left({n}^{-1/2}\right).\) Then, we obtain
Since \({\lambda }_{n}^{-1}{n}^{1/2}\to 0\) as \(n\to \infty ,\) the sign of the derivative is completely determined by the sign of \({\theta }_{j}.\)
Namely, we can ensure that
Hence, with probability tending to 1, \({\mathcal{L}}_{n}\left({\varvec{\theta}}\right)\) achieves its minimum at \({\varvec{\theta}}={\left({\left({{\varvec{\theta}}}^{({s}_{1})}\right)}^{T},{0}^{T}\right)}^{T}.\) This completes the proof of Theorem 3.
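For orientation, the sign argument above has the familiar Fan and Li (2001) shape; schematically (a sketch, not the paper's exact display), for \(j={s}_{1}+1,\dots,s\),

```latex
\frac{\partial \mathcal{L}_{n}\left(\boldsymbol{\theta}\right)}{\partial \theta_{j}}
=\lambda_{n}\left\{\operatorname{sgn}\left(\theta_{j}\right)+O_{p}\!\left(\lambda_{n}^{-1}\sqrt{n}\right)\right\},
```

so that when \(\lambda_{n}^{-1}\sqrt{n}\to 0\) the penalty term dominates, the derivative takes the sign of \(\theta_{j}\), and minimization forces the corresponding estimate to zero.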
Cite this article
Güney, Y., Tuaç, Y., Özdemir, Ş. et al. Robust estimation and variable selection in heteroscedastic regression model using least favorable distribution. Comput Stat 36, 805–827 (2021). https://doi.org/10.1007/s00180-020-01036-5