Abstract
Multivariate data are collected in many fields, such as chemometrics, econometrics, financial engineering and genetics, and they frequently exhibit heteroscedasticity and collinearity. Selecting the relevant predictors is also a key issue when analyzing multivariate data. To accomplish these tasks, a multivariate linear regression model is often constructed. In this paper, we therefore propose a row-sparse elastic-net regularized multivariate Huber regression model. For this new model, we prove its grouping effect property and its robustness to sample outliers. Based on the KKT conditions, an accelerated proximal subgradient algorithm is designed to solve the proposed model, and its convergence is also established. To demonstrate accuracy and efficiency, simulation and real-data experiments are carried out. The numerical results show that the new model handles heteroscedasticity and collinearity well.
References
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
Breiman, L., Friedman, J.H.: Predicting multivariate responses in multiple linear regression. J. R. Stat. Soc. Ser. B Stat. Methodol. 59, 3–54 (1997)
Chen, B.Z., Kong, L.C.: High-dimensional least square matrix regression via elastic net penalty. Pac. J. Optim. 13(2), 185–196 (2017)
Chen, B.Z., Zhai, W.J., Huang, Z.Y.: Low-rank elastic-net regularized multivariate Huber regression model. Appl. Math. Model. 87, 571–583 (2020)
Das, J., Gayvert, K., Bunea, F., Wegkamp, M., Yu, H.: Encapp: elastic-net-based prognosis prediction and biomarker discovery for human cancers. BMC Genom. 16, 1–13 (2015)
Hastie, T., Tibshirani, R., et al.: ‘Gene shaving’ as a method for identifying distinct sets of genes with similar expression patterns. Genome Biol. 1, 1–21 (2000)
Huber, P.: Robust estimation of a location parameter. Ann. Math. Stat. 35, 73–101 (1964)
Huber, P.: Robust Statistics. Wiley, New York (1981)
Koenker, R., Bassett, G.: Regression quantiles. Econometrica 46, 33–50 (1978)
Mukherjee, A., Zhu, J.: Reduced rank ridge regression and its kernel extensions. Stat. Anal. Data Min. ASA Data Sci J. 4, 612–622 (2011)
Negahban, S., Wainwright, M.: Simultaneous support recovery in high dimensions: benefits and perils of block \(l_1/l_{\infty }\)-regularization. IEEE Trans. Inform. Theory 57, 3841–3863 (2011)
Obozinski, G., Wainwright, M., Jordan, M.: Support union recovery in high-dimensional multivariate regression. Ann. Stat. 39(1), 1–47 (2011)
Rodolà, E., Torsello, A., Harada, T., Kuniyoshi, Y., Cremers, D.: Elastic net constraints for shape matching. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1169–1176 (2013)
Similä, T., Tikka, J.: Input selection and shrinkage in multiresponse linear regression. Comput. Stat. Data Anal. 52, 406–422 (2007)
Skagerberg, S., MacGregor, J.F., Kiparissides, C.: Multivariate data analysis applied to low-density polyethylene reactors. Chemom. Intell. Lab. Syst. 14, 341–356 (1992)
Stransky, N.: The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483(7391), 603–607 (2012)
Toh, K., Yun, S.: An accelerated proximal gradient algorithm for nuclear norm regularized least squares problems. Pac. J. Optim. 6, 615–640 (2010)
Tropp, J.A.: Algorithms for simultaneous sparse approximation. Part II: Convex relaxation. Signal Process. 86, 589–602 (2006)
Turlach, B., Venables, W., Wright, S.: Simultaneous variable selection. Technometrics 47, 350–363 (2005)
Xin, X., Hu, J., Liu, L.: On the oracle property of a generalized adaptive elastic-net for multivariate linear regression with a diverging number of parameters. J. Multivar. Anal. 162, 16–31 (2017)
Yi, C., Huang, J.: Semismooth Newton coordinate descent algorithm for elastic-net penalized Huber loss regression and quantile regression. J. Comput. Graph. Stat. 26, 547–557 (2017)
Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B Stat. Methodol. 67, 301–320 (2005)
Zou, H., Zhang, H.: On the adaptive elastic-net with a diverging number of parameters. Ann. Statist. 37, 1733–1751 (2009)
Acknowledgements
The authors are very grateful to two anonymous reviewers and associate editor for their insightful remarks and comments which considerably improved the presentation of our paper.
This work was supported by the Key Program of Cangzhou Jiaotong College (HB202001002) and the National Natural Science Foundation of China (12071022).
Appendix
1.1 Proof of Theorem 2
As shown in Chen et al. (2020), the derivative of \(H_\alpha ^n(B)\) is
where \(\varPsi (B)=\left( \psi ^\mathrm{T}\left( {{\varvec{y}}}_1-B^\mathrm{T}{{\varvec{x}}}_1\right) , \cdots , \psi ^\mathrm{T}\left( {{\varvec{y}}}_n-B^\mathrm{T}{{\varvec{x}}}_n\right) \right) ^\mathrm{T}\),
and \(\varPsi (B)\) has the following upper bound
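Since each component of the Huber score \(\psi\) is bounded by \(\alpha\) in absolute value, a bound of the following form holds (assuming \(q\) response variables; this display is a sketch reconstructed from that boundedness, not copied from the original):
\[
\Vert \varPsi (B)\Vert _\mathrm{F}=\Big (\sum _{i=1}^{n}\big \Vert \psi \big ({{\varvec{y}}}_i-B^\mathrm{T}{{\varvec{x}}}_i\big )\big \Vert _2^2\Big )^{1/2}\le \alpha \sqrt{nq}.
\]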
Let \(L(\lambda _1,\lambda _2, B) = \tfrac{1}{n}\sum \nolimits _{i=1}^{n} h_\alpha \left( {{\varvec{y}}}_i-B^\mathrm{T}{{\varvec{x}}}_i\right) +\lambda _1\sum \nolimits _{k=1}^{m}\Vert {{\varvec{b}}}_k\Vert _2+\tfrac{\lambda _2}{2}\sum \nolimits _{k=1}^{m}\Vert {{\varvec{b}}}_k\Vert _2^2\). Then, the Karush–Kuhn–Tucker condition of optimization problem (2) is
where \(S=\left( {{\varvec{s}}}_1,\cdots ,{{\varvec{s}}}_m\right) ^\mathrm{T}\) and \({{\varvec{s}}}_k\) satisfies
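Here \({{\varvec{s}}}_k\) belongs to the subdifferential of the \(\ell _2\)-norm at \(\hat{{{\varvec{b}}}}_k\); in the standard form,
\[
{{\varvec{s}}}_k=\frac{\hat{{{\varvec{b}}}}_k}{\Vert \hat{{{\varvec{b}}}}_k\Vert _2}\quad \text{if } \hat{{{\varvec{b}}}}_k\ne 0,\qquad \Vert {{\varvec{s}}}_k\Vert _2\le 1\quad \text{if } \hat{{{\varvec{b}}}}_k=0.
\]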
Let \({{\varvec{e}}}^{(k)}= (0,\cdots , 0,1, 0,\cdots , 0)^\mathrm{T}\in \mathbb {R}^m\), where “1” is the kth component of \({{\varvec{e}}}^{(k)}\). Multiplying both sides of equation (11) by \({{\varvec{e}}}^{(k)}\), it follows that
i.e.,
To derive the upper bound for \(\Vert \hat{{{\varvec{b}}}}_i-\hat{{{\varvec{b}}}}_j\Vert _2\), we consider the case \(\hat{{{\varvec{b}}}}_i\ne 0\) and \(\hat{{{\varvec{b}}}}_j\ne 0\). By letting \(k=i\) and \(k=j\) in (12), we obtain
Then, we have
It follows that
On the one hand,
On the other hand,
where \(\theta\) is the angle between \(\hat{{{\varvec{b}}}}_i\) and \(\hat{{{\varvec{b}}}}_j\). Thus,
Combining (13), (14) and (15), it is easy to obtain the desired result. \(\square\)
1.2 Proof of Corollary 1
If \(\dot{{{\varvec{x}}}}_{i}=\dot{{{\varvec{x}}}}_{j}\), then the sample correlation coefficient \(\rho =\tfrac{1}{n}\dot{{{\varvec{x}}}}^\mathrm{T}_{i} \dot{{{\varvec{x}}}}_{j}=1\). Considering the upper bound (5), we have \(\Vert \hat{{{\varvec{b}}}}_i -\hat{{{\varvec{b}}}}_j\Vert _2\le 0\). It follows that \(\hat{{{\varvec{b}}}}_i=\hat{{{\varvec{b}}}}_j\). \(\square\)
1.3 Proof of Theorem 3
For the optimization problem (6), the Karush–Kuhn–Tucker conditions are
where \({{\varvec{g}}}_k^{\text {T}}\) is the kth row of G and
If \({{\varvec{b}}}_k=0\), equality (16) becomes
It follows that
Considering the second inequality in (17), we use
to determine \({{\varvec{b}}}_k=0\).
If \({{\varvec{b}}}_k\ne 0\), (16) is in the following form
It is equivalent to
Taking the \(\ell _2\)-norm on both sides, we obtain that
Inserting this expression into (19), we obtain
Combining (18) and (20), the desired result (7) can be obtained.\({\square }\)
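The row-wise update above admits a closed-form proximal map. The following is an illustrative sketch, not the paper's code: the function name and the closed form \({{\varvec{b}}}_k=\max \{1-t\lambda _1/\Vert {{\varvec{g}}}_k\Vert _2,\,0\}\,{{\varvec{g}}}_k/(1+t\lambda _2)\) are our assumptions for the standard row elastic-net penalty \(\lambda _1\Vert {{\varvec{b}}}_k\Vert _2+\tfrac{\lambda _2}{2}\Vert {{\varvec{b}}}_k\Vert _2^2\) with step size \(t\).

```python
import numpy as np

def row_enet_prox(G, t, lam1, lam2):
    """Row-wise proximal map of t*(lam1*sum_k ||b_k||_2 + (lam2/2)*sum_k ||b_k||_2^2).

    Rows g_k with ||g_k||_2 <= t*lam1 are set exactly to zero (row sparsity);
    surviving rows are shrunk by the factor (1 - t*lam1/||g_k||_2) / (1 + t*lam2).
    """
    B = np.zeros_like(G, dtype=float)
    norms = np.linalg.norm(G, axis=1)
    keep = norms > t * lam1
    scale = (1.0 - t * lam1 / norms[keep]) / (1.0 + t * lam2)
    B[keep] = scale[:, None] * G[keep]
    return B

# A row with small norm is zeroed; a large row is shrunk toward zero.
G = np.array([[3.0, 4.0],    # ||row||_2 = 5    -> kept, scaled by (1 - 1/5)/2 = 0.4
              [0.1, 0.1]])   # ||row||_2 ~ 0.14 -> zeroed
B = row_enet_prox(G, t=1.0, lam1=1.0, lam2=1.0)
# B = [[1.2, 1.6], [0.0, 0.0]]
```

The hard zero in surviving/discarded rows is exactly the screening rule used when \({{\varvec{b}}}_k=0\): the penalty's subgradient can absorb any row of the smooth gradient whose norm is below the threshold.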
1.4 Proof of Theorem 4
Following the procedure in Beck and Teboulle (2009) or Toh and Yun (2010), inequality (8) can be easily obtained.
Considering the triangle inequality \(\Vert \hat{B}-B^0\Vert _\mathrm{F}\le \Vert \hat{B}\Vert _\mathrm{F}+\Vert B^0\Vert _\mathrm{F}\) and (8), we have
Note that \(\hat{B}\) is the solution to (2). It follows that
It is easy to obtain
Combining (22) and (23), we can obtain the following upper bound of \(\Vert \hat{B}\Vert _\mathrm{F}\)
Inserting this inequality to (21), it follows that
where \(C=\min \left\{ \Vert Y\Vert _\mathrm{F}^{2}/(2n\lambda _1),~\Vert Y\Vert _\mathrm{F}\sqrt{1/(n\lambda _2)}\right\}\). In order for \(B^k\) to be an \(\epsilon\)-optimal solution, i.e., \(F(B^k)- F(\hat{B})\le \epsilon\), we only need to terminate the algorithm when
It follows that
\({\square }\)
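Concretely, the FISTA-type rate \(F(B^k)-F(\hat{B})\le 2L\Vert B^0-\hat{B}\Vert _\mathrm{F}^2/(k+1)^2\) combined with \(\Vert B^0-\hat{B}\Vert _\mathrm{F}\le \Vert B^0\Vert _\mathrm{F}+C\) suggests a stopping rule of the form (a sketch consistent with the bounds above, with \(L\) the Lipschitz constant of the smooth part):
\[
\frac{2L\big (\Vert B^0\Vert _\mathrm{F}+C\big )^2}{(k+1)^2}\le \epsilon
\quad \Longrightarrow \quad
k\ge \sqrt{\frac{2L}{\epsilon }}\big (\Vert B^0\Vert _\mathrm{F}+C\big )-1.
\]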
Chen, B., Zhai, W. & Kong, L. Variable selection and collinearity processing for multivariate data via row-elastic-net regularization. AStA Adv Stat Anal 106, 79–96 (2022). https://doi.org/10.1007/s10182-021-00403-x