Abstract
This paper explores a novel high-dimensional sparse multiplicative model for data with positive responses, which arise frequently in economic and biomedical research. The proposed regularized method is built on the least product relative error (LPRE) criterion and accommodates various penalties, including the adaptive Lasso, SCAD, and MCP. An adjusted ADMM algorithm is adopted to compute the estimators based on the LPRE loss. Additionally, we prove the consistency of the estimator and derive its convergence rate. To validate the effectiveness of the proposed method, we conduct extensive numerical studies and real data analyses of two well-known datasets, the Boston housing data and gold price data, yielding valuable insights and practical applications.
References
Bickel PJ, Ritov Y, Tsybakov AB (2009) Simultaneous analysis of lasso and Dantzig selector. Ann Stat 37:1705–1732
Boyd S, Parikh N, Chu E, Peleato B, Eckstein J et al (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Found Trends Mach Learn 3:1–122
Chen J, Chen Z (2008) Extended Bayesian information criteria for model selection with large model spaces. Biometrika 95(3):759–771
Chen Y, Liu H (2023) A new relative error estimation for partially linear multiplicative model. Commun Stat-Simul Comput 52(10):4962–4980
Chen K, Guo S, Lin Y, Ying Z (2010) Least absolute relative error estimation. J Am Stat Assoc 105:1104–1112
Chen K, Lin Y, Wang Z, Ying Z (2016) Least product relative error estimation. J Multivar Anal 144:91–98
Chen Y, Liu H, Ma J (2022) Local least product relative error estimation for single-index varying-coefficient multiplicative model with positive responses. J Comput Appl Math 415:114478
Ding H, Wang Z, Wu Y (2018) A relative error-based estimation with an increasing number of parameters. Commun Stat-Theory Methods 47(1):196–209
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360
Fan J, Ma Y, Dai W (2014) Nonparametric independence screening in sparse ultra-high-dimensional varying coefficient models. J Am Stat Assoc 109(507):1270–1284
Fan R, Zhang S, Wu Y (2023) Nonconcave penalized M-estimation for the least absolute relative errors model. Commun Stat-Theory Methods 52:1118–1135
Hao M, Lin Y, Zhao X (2016) A relative error-based approach for variable selection. Comput Stat Data Anal 103:250–262
Harrison D Jr, Rubinfeld DL (1978) Hedonic housing prices and the demand for clean air. J Environ Econ Manag 5(1):81–102
Li Z, Liu Y, Liu Z (2017) Empirical likelihood and general relative error criterion with divergent dimension. Statistics 51:1006–1022
Liu H, Xia X (2018) Estimation and empirical likelihood for single-index multiplicative models. J Stat Plan Inference 193:70–88
Liu X, Lin Y, Wang Z (2016) Group variable selection for relative error regression. J Stat Plan Inference 175:40–50
Liu H, Zhang X, Hu H, Ma J (2023) Analysis of the positive response data with the varying coefficient partially nonlinear multiplicative model. Stat Pap 1–30
Ming H, Liu H, Yang H (2022) Least product relative error estimation for identification in multiplicative additive models. J Comput Appl Math 404:113886
Negahban SN, Ravikumar P, Wainwright MJ, Yu B (2012) A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. Stat Sci 27:538–557
Raskutti G, Wainwright MJ, Yu B (2011) Minimax rates of estimation for high-dimensional linear regression over \(\ell _q \)-balls. IEEE Trans Inf Theory 57(10):6976–6994
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B 58:267–288
Wang L, Kim Y, Li R (2013) Calibrating non-convex penalized regression in ultra-high dimension. Ann Stat 41(5):2505
Wang Z, Liu W, Lin Y (2015) A change-point problem in relative error-based regression. TEST 24(4):835–856
Wang Z, Chen Z, Wu Y (2017) A relative error estimation approach for multiplicative single index model. J Syst Sci Complex 30:1160–1172
Xia X, Liu Z, Yang H (2016) Regularized estimation for the least absolute relative error models with a diverging number of covariates. Comput Stat Data Anal 96:104–119
Yang Y, Ye F (2013) General relative error criterion and M-estimation. Front Math China 8:695–715
Zhang C (2010) Nearly unbiased variable selection under minimax concave penalty. Ann Stat 38:894–942
Zhang Q, Wang Q (2013) Local least absolute relative error estimating approach for partially linear multiplicative model. Stat Sin 23:1091–1116
Zhang J, Zhu J, Feng Z (2019) Estimation and hypothesis test for single-index multiplicative models. TEST 28:242–268
Zhang J, Zhu J, Zhou Y, Cui X, Lu T (2020) Multiplicative regression models with distortion measurement errors. Stat Pap 61:2031–2057
Zou H (2006) The adaptive lasso and its oracle properties. J Am Stat Assoc 101:1418–1429
Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc Ser B 67:301–320
Zou H, Li R (2008) One-step sparse estimates in nonconcave penalized likelihood models. Ann Stat 36(4):1509
Acknowledgements
This work is supported by the National Natural Science Foundation of China (Grant No. 12371281).
Appendix A
Our goal is to present techniques for deriving bounds on the difference between the optimal solution \(\hat{\varvec{\beta }}\) to the convex program (4) and the unknown vector \(\varvec{\beta }\). Before delving into the proof, we present some preliminary material, the primary results of which are derived from Negahban et al. (2012). Our work is based on imposing sparsity constraints on the model space and then studying the behavior of estimators. The purpose of the model subspace \({\mathcal {M}}\) is to capture the constraints specified by the model. The orthogonal complement of the space \({\mathcal {M}}\), namely, the set
$$ {\mathcal {M}}^{\bot }:=\{\varvec{v}\in {\mathbb {R}}^p:\langle \varvec{u},\varvec{v}\rangle =0\ \text {for all}\ \varvec{u}\in {\mathcal {M}}\}, $$
is referred to as the perturbation subspace, representing deviations away from the model subspace \({\mathcal {M}}\).
Definition 1
(Decomposability of \({\mathcal {R}}\)) Given a subspace \({\mathcal {M}}\), a norm-based regularizer \({\mathcal {R}}\) is decomposable with respect to \(({\mathcal {M}},{\mathcal {M}}^{\bot })\) if
$$ {\mathcal {R}}(\varvec{\beta }+\varvec{\gamma })={\mathcal {R}}(\varvec{\beta })+{\mathcal {R}}(\varvec{\gamma }) $$
for all \(\varvec{\beta }\in {\mathcal {M}}\) and \(\varvec{\gamma }\in {\mathcal {M}}^{\bot }\). Define the projection operator
$$ \Pi _{{\mathcal {M}}}(\varvec{u}):=\mathop {\textrm{argmin}}\limits _{\varvec{v}\in {\mathcal {M}}}\parallel \varvec{u}-\varvec{v}\parallel _2, $$
with the projection \(\Pi _{{\mathcal {M}}^{\bot }}\) defined in an analogous manner.
Lemma 1
Suppose that \({\mathcal {L}}\) is a convex and differentiable function, and consider any optimal solution \(\hat{\varvec{\beta }}\) to the optimization problem (5) with a strictly positive regularization parameter satisfying
$$ \lambda \ge 2{\mathcal {R}}^*(\nabla {\mathcal {L}}(\varvec{\beta })). $$
Then for any pair \(({\mathcal {M}},{\mathcal {M}}^{\bot })\) over which \({\mathcal {R}}\) is decomposable, the error \(\hat{\Delta }=\hat{\varvec{\beta }}-\varvec{\beta }\) belongs to the set
$$ {\mathbb {C}}({\mathcal {M}},{\mathcal {M}}^{\bot };\varvec{\beta }):=\left\{ \Delta \in {\mathbb {R}}^p:{\mathcal {R}}(\Delta _{{\mathcal {M}}^{\bot }})\le 3{\mathcal {R}}(\Delta _{{\mathcal {M}}})+4{\mathcal {R}}(\varvec{\beta }_{{\mathcal {M}}^{\bot }})\right\} , $$
where \({\mathcal {R}}^*(\cdot )\) is the dual norm of \({\mathcal {R}}(\cdot )\).
Definition 2
(Restricted strong convexity) The loss function satisfies a restricted strong convexity (RSC) condition with curvature \({\mathcal {K}}_{{\mathcal {L}}}>0\) and tolerance function \(\tau _{{\mathcal {L}}}\) if
$$ \delta {\mathcal {L}}(\Delta ,\varvec{\beta })\ge {\mathcal {K}}_{{\mathcal {L}}}\parallel \Delta \parallel _2^2-\tau _{{\mathcal {L}}}(\varvec{\beta }) $$
for all \(\Delta \in {\mathbb {C}}({\mathcal {M}},{\mathcal {M}}^{\bot };\varvec{\beta })\), where \(\delta {\mathcal {L}}(\Delta ,\varvec{\beta }):= {\mathcal {L}}(\varvec{\beta }+\Delta )-{\mathcal {L}}(\varvec{\beta })-\langle \nabla {\mathcal {L}}(\varvec{\beta }),\Delta \rangle \).
In particular, when \(\varvec{\beta }\in {\mathcal {M}}\), there are many statistical models for which the RSC condition holds with tolerance \(\tau _{{\mathcal {L}}}(\varvec{\beta })=0\). When the loss function is twice differentiable, strong convexity amounts to a lower bound on the eigenvalues of the Hessian \(\nabla ^2{\mathcal {L}}(\varvec{\beta })\), holding uniformly in a neighborhood of \(\varvec{\beta }\).
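The convexity ingredient behind the RSC condition can be checked numerically in a toy scalar case. The following sketch uses simulated data from a one-covariate multiplicative model (the data, sample size, and noise level are hypothetical, chosen only for illustration) and verifies that the remainder \(\delta {\mathcal {L}}(\Delta ,\varvec{\beta })\) of the LPRE loss is nonnegative:

```python
# Toy numerical check (hypothetical simulated data) that the scalar LPRE loss
#   L(b) = (1/n) * sum_i ( y_i*exp(-x_i*b) + exp(x_i*b)/y_i - 2 )
# is convex: the remainder dL(D, b) = L(b+D) - L(b) - L'(b)*D is >= 0,
# the raw ingredient behind restricted strong convexity.
import math, random

random.seed(0)
b_true = 0.5
n = 200
x = [random.uniform(-1, 1) for _ in range(n)]
# multiplicative model: y_i = exp(x_i * b_true) * eps_i with positive error eps_i
y = [math.exp(xi * b_true) * math.exp(random.gauss(0, 0.1)) for xi in x]

def loss(b):
    return sum(yi * math.exp(-xi * b) + math.exp(xi * b) / yi - 2
               for xi, yi in zip(x, y)) / n

def grad(b):
    return sum((math.exp(xi * b) / yi - yi * math.exp(-xi * b)) * xi
               for xi, yi in zip(x, y)) / n

def remainder(delta, b):
    """delta L(D, b) = L(b + D) - L(b) - L'(b) * D."""
    return loss(b + delta) - loss(b) - grad(b) * delta

for delta in (-0.5, -0.1, 0.1, 0.5):
    assert remainder(delta, b_true) >= 0.0   # convexity of the LPRE loss
```

Each summand \(y_ie^{-x_ib}+y_i^{-1}e^{x_ib}\) is convex in \(b\), so the remainder is nonnegative for every step \(\Delta \), not only on a restricted cone.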
Definition 3
(Subspace compatibility constant) For any subspace \({\mathcal {M}}\) of \({\mathbb {R}}^p\), the subspace compatibility constant with respect to the pair \(({\mathcal {R}},\parallel \cdot \parallel _2)\) is given by
$$ \Psi ({\mathcal {M}}):=\sup _{u\in {\mathcal {M}}\backslash \{0\}}\frac{{\mathcal {R}}(u)}{\parallel u\parallel _2}. $$
This measure indicates the degree of compatibility between the regularizer and the error norm over the subspace \({\mathcal {M}}\). In this paper, when the regularizer is \({\mathcal {R}}(u)=\parallel u\parallel _{w,1}\), we can calculate the subspace compatibility constant \(\Psi ({\mathcal {M}})=\sup _{u\in {\mathcal {M}}\backslash \{0\}}\frac{\parallel u\parallel _{w,1}}{\parallel u\parallel _2}=\parallel w \parallel _2\).
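This calculation is an instance of the Cauchy-Schwarz inequality, with equality when \(\mid u_j\mid \propto w_j\). A minimal numerical sketch (the weight vector below is hypothetical) confirms both the bound and its attainment:

```python
# Numeric check of the subspace compatibility constant for the weighted L1
# regularizer: sup ||u||_{w,1} / ||u||_2 = ||w||_2, attained at u = w.
# The weight vector w is a hypothetical example.
import math, random

random.seed(1)
w = [0.5, 1.0, 2.0, 1.5]   # positive weights

def ratio(u):
    num = sum(wj * abs(uj) for wj, uj in zip(w, u))
    den = math.sqrt(sum(uj * uj for uj in u))
    return num / den

w_norm = math.sqrt(sum(wj * wj for wj in w))

# Random directions never exceed ||w||_2 (Cauchy-Schwarz) ...
for _ in range(1000):
    u = [random.gauss(0, 1) for _ in w]
    assert ratio(u) <= w_norm + 1e-12

# ... and u = w attains the supremum exactly.
assert abs(ratio(w) - w_norm) < 1e-12
```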
Let us begin by analyzing the decomposability of the penalty function. Suppose our model class of interest consists of s-sparse parameter vectors \(\varvec{\beta }\in {\mathbb {R}}^p\), where \(\beta _i\ne 0\) if and only if \(i\in S\), and S is an s-sized subset of \( \{1,2,\ldots ,p\}\). For any given subset S and its complement \(S^c\), we define the model subspace
$$ {\mathcal {M}}(S):=\{\varvec{\beta }\in {\mathbb {R}}^p:\beta _j=0\ \text {for all}\ j\in S^c\}. $$
Here our notation indicates explicitly that \({\mathcal {M}}\) depends on the chosen subset S. By construction, we have \(\Pi _{{\mathcal {M}}(S)}(\varvec{\beta })=\varvec{\beta }\) for any vector \(\varvec{\beta }\) that is supported on S; in particular, the subspace \({\mathcal {M}}(S)\) contains \(\varvec{\beta }\).
We claim that for a strictly positive weight vector w, the weighted \(L_1\)-norm \(\parallel \varvec{\beta }\parallel _{w,1}:=\sum _{j=1}^{p}w_j\mid \beta _j\mid \) is decomposable with respect to the pair \(({\mathcal {M}}(S),{\mathcal {M}}^{\bot }(S))\). Indeed, by construction of the subspace, any \(\varvec{\beta }\in {\mathcal {M}}(S)\) can be written in the partitioned form \(\varvec{\beta }=(\varvec{\beta }_S,\varvec{0}_{S^c})\), where \(\varvec{\beta }_S\in {\mathbb {R}}^s\) and \(\varvec{0}_{S^c}\in {\mathbb {R}}^{p-s}\) is a vector of zeros. Similarly, any vector \(\varvec{\gamma }\in {\mathcal {M}}^{\bot }(S)\) has the partitioned representation \((\varvec{0}_S,\varvec{\gamma }_{S^c})\). Putting together the pieces, we obtain
$$ \parallel \varvec{\beta }+\varvec{\gamma }\parallel _{w,1}=\sum _{j\in S}w_j\mid \beta _j\mid +\sum _{j\in S^c}w_j\mid \gamma _j\mid =\parallel \varvec{\beta }\parallel _{w,1}+\parallel \varvec{\gamma }\parallel _{w,1}. $$
Having established the decomposability of the weighted \(L_1\)-norm, we still need to verify two further conditions. In this section, we address these conditions and their implications for extending the error bounds of Negahban et al. (2012).
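Decomposability itself is easy to sanity-check numerically: for vectors supported on disjoint index sets, the weighted \(L_1\)-norm of the sum splits exactly. The dimensions, support, and weights below are hypothetical illustrations:

```python
# Minimal sketch (hypothetical vectors) of decomposability of the weighted
# L1 norm: for beta supported on S and gamma supported on S^c,
#   ||beta + gamma||_{w,1} = ||beta||_{w,1} + ||gamma||_{w,1}.
p = 6
S = {0, 2}                                  # model support
w = [1.0, 0.8, 1.2, 0.5, 2.0, 1.1]          # positive weights

def wl1(v):
    return sum(wj * abs(vj) for wj, vj in zip(w, v))

beta  = [3.0 if j in S else 0.0 for j in range(p)]    # beta in M(S)
gamma = [0.0 if j in S else -1.5 for j in range(p)]   # gamma in M^perp(S)
combo = [b + g for b, g in zip(beta, gamma)]

assert abs(wl1(combo) - (wl1(beta) + wl1(gamma))) < 1e-12
```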
First, to extend the error bounds, we must confirm that the regularization parameter satisfies \(\lambda \ge 2{\mathcal {R}}^*(\nabla {\mathcal {L}}(\varvec{\beta }))\). Simultaneously, we need to ensure that the LPRE-based loss function exhibits a form of restricted strong convexity. An important consequence of Lemma 1 is that, for any decomposable regularizer and a suitable choice \(\lambda \ge 2{\mathcal {R}}^*(\nabla {\mathcal {L}}(\varvec{\beta }))\) of the regularization parameter, the error vector \(\hat{\Delta }=\hat{\varvec{\beta }}-\varvec{\beta }\) resides within a highly specific set, contingent on the unknown vector \(\varvec{\beta }\). Our task is therefore to select an appropriate \(\lambda >0\) that satisfies \(\lambda \ge 2{\mathcal {R}}^*(\nabla {\mathcal {L}}(\varvec{\beta }))\) with high probability. The least product relative error loss function is given by
$$ {\mathcal {L}}(\varvec{\beta })=\frac{1}{n}\sum _{i=1}^{n}\left( Y_ie^{-X_i^T\varvec{\beta }}+Y_i^{-1}e^{X_i^T\varvec{\beta }}-2\right) , $$
with the gradient
$$ \nabla {\mathcal {L}}(\varvec{\beta })=\frac{1}{n}\sum _{i=1}^{n}\left( Y_i^{-1}e^{X_i^T\varvec{\beta }}-Y_ie^{-X_i^T\varvec{\beta }}\right) X_i. $$
With simple algebraic operations, we get \(\nabla {\mathcal {L}}(\varvec{\beta })=\frac{1}{n}\sum _{i=1}^{n}(\varepsilon _i^{-1}-\varepsilon _i)X_i=\frac{1}{n}X^T(\varepsilon ^{-1}-\varepsilon )=\frac{1}{n}X^T\widetilde{\varepsilon }\), whereas the dual norm of the \(L_1\)-norm is the \(L_{\infty }\)-norm. Consequently, it is easy to obtain
$$ {\mathcal {R}}^*(\nabla {\mathcal {L}}(\varvec{\beta }))=\parallel \tfrac{1}{n}X^T\widetilde{\varepsilon }\parallel _{\infty }=\max _{1\le j\le p}\mid \tfrac{1}{n}\sum _{i=1}^{n}\widetilde{\varepsilon }_iX_{ij}\mid . $$
Using the column normalization (C1) and sub-Gaussian (C2) conditions, for each \(j=1,\ldots ,p\), we have the tail bound
$$ P\left( \mid \tfrac{1}{n}\sum _{i=1}^{n}\widetilde{\varepsilon }_iX_{ij}\mid \ge t\right) \le 2\exp \left( -\frac{nt^2}{2\sigma ^2}\right) . $$
Consequently, by the union bound, we conclude that
$$ P\left( \parallel \tfrac{1}{n}X^T\widetilde{\varepsilon }\parallel _{\infty }\ge t\right) \le 2\exp \left( -\frac{nt^2}{2\sigma ^2}+\log p\right) . $$
Setting \(t^2=\frac{4\sigma ^2\log p}{n}\), we see that the choice of \(\lambda \) given in the statement is valid with probability at least \(1-c_1\exp (-c_2n\lambda ^2)\). By Lemma 1, we can choose \(\lambda =4\sigma \sqrt{\frac{\log p}{n}}\); then
$$ {\mathcal {R}}(\hat{\Delta }_{{\mathcal {M}}^{\bot }})\le 3{\mathcal {R}}(\hat{\Delta }_{{\mathcal {M}}})+4{\mathcal {R}}(\varvec{\beta }_{{\mathcal {M}}^{\bot }})=3{\mathcal {R}}(\hat{\Delta }_{{\mathcal {M}}}), $$
where \(\varvec{\beta }\in {\mathcal {M}}(S)\) so that \({\mathcal {R}}(\varvec{\beta }_{{\mathcal {M}}^{\bot }})=\parallel \varvec{\beta }_{{\mathcal {M}}^{\bot }}\parallel _{w,1}=0\).
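The high-probability dominance of this choice of \(\lambda \) can be illustrated by Monte Carlo. In the sketch below the noise \(\widetilde{\varepsilon }\) is simulated as \(N(0,\sigma ^2)\) as a sub-Gaussian stand-in (an assumption; the paper's \(\widetilde{\varepsilon }_i=\varepsilon _i^{-1}-\varepsilon _i\) is only assumed sub-Gaussian), and the design is explicitly column-normalized as in (C1):

```python
# Monte Carlo sketch of the tuning-parameter choice: with column-normalized
# X and sub-Gaussian noise (simulated here as N(0, sigma^2), an assumption),
# lambda = 4*sigma*sqrt(log(p)/n) dominates 2*||(1/n) X^T eps||_inf with
# high probability. Sizes n, p are hypothetical.
import math, random

random.seed(2)
n, p, sigma = 400, 50, 1.0
lam = 4 * sigma * math.sqrt(math.log(p) / n)

# column-normalized design: each column satisfies ||X_j||_2^2 / n = 1
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
for j in range(p):
    scale = math.sqrt(sum(X[i][j] ** 2 for i in range(n)) / n)
    for i in range(n):
        X[i][j] /= scale

hits, reps = 0, 200
for _ in range(reps):
    eps = [random.gauss(0, sigma) for _ in range(n)]
    score = max(abs(sum(X[i][j] * eps[i] for i in range(n)) / n)
                for j in range(p))
    if lam >= 2 * score:
        hits += 1
print(hits / reps)   # empirical coverage of the event {lambda >= 2 R*(grad L)}
```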
We now turn to an important requirement on the loss function. The \(p\times p\) Hessian matrix
$$ H_n:=\nabla ^2{\mathcal {L}}(\varvec{\beta })=\frac{1}{n}\sum _{i=1}^{n}\left( \varepsilon _i+\varepsilon _i^{-1}\right) X_iX_i^T $$
has rank no greater than n, indicating that the LPRE loss lacks strong convexity when \(p>n\). Similar to Bickel et al. (2009), consider the case where the elements of the Hessian matrix \(H_n\) are close to those of a positive definite \(p\times p\) matrix H. Denote by
$$ \eta _n:=\max _{1\le j,k\le p}\mid (H_n)_{jk}-H_{jk}\mid $$
the maximal difference between the elements of the two matrices.
Then, for any \(\Delta \) satisfying (A.8), we get
$$ \Delta ^TH_n\Delta \ge \Delta ^TH\Delta -\eta _n\parallel \Delta \parallel _1^2\ge \Delta ^TH\Delta -16\eta _n\parallel \Delta _S\parallel _1^2\ge \Delta ^TH\Delta -16\eta _ns\parallel \Delta _S\parallel _2^2, $$
where s represents the number of non-zero components of \(\varvec{\beta }\). The first inequality follows from the definition of \(\eta _n\); taking the weights in (A.8) as \(w_j=1\) for all j yields the second inequality; and the Cauchy-Schwarz inequality gives \(\parallel \Delta _S\parallel _{1} \le \sqrt{s} \parallel \Delta _S\parallel _{2}\), confirming the last inequality. Thus, for \(\eta _n s\) small enough, the ratio \({\Delta ^TH_n\Delta }/{\parallel \Delta \parallel _2^2}\) is bounded away from 0. This means that we have a kind of “restricted” positive definiteness, valid only for vectors satisfying (A.8). This type of lower bound implies that the LPRE-based loss satisfies a restricted strong convexity (RSC) condition with curvature \({\mathcal {K}}_{{\mathcal {L}}}>0\):
$$ \delta {\mathcal {L}}(\Delta ,\varvec{\beta })\ge {\mathcal {K}}_{{\mathcal {L}}}\parallel \Delta \parallel _2^2 $$
for all \(\Delta \in {\mathbb {C}}({\mathcal {M}},{\mathcal {M}}^{\bot };\varvec{\beta })\).
In summary, we have confirmed the decomposability of the regularizer and the presence of restricted strong convexity in the LPRE-based loss. As a result, Theorem 1 follows from the bounds established in Corollary 1 of Negahban et al. (2012).
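For concreteness, the shape of the resulting bound can be sketched: Corollary 1 of Negahban et al. (2012) gives \(\parallel \hat{\Delta }\parallel _2\le 3\lambda \Psi ({\mathcal {M}})/{\mathcal {K}}_{{\mathcal {L}}}\), and specializing to the choices above with unit weights, so that \(\Psi ({\mathcal {M}}(S))=\sqrt{s}\), yields (a sketch; the exact constants depend on the statement of Corollary 1):

```latex
\parallel \hat{\varvec{\beta }}-\varvec{\beta }\parallel _2
  \le \frac{3\lambda }{{\mathcal {K}}_{{\mathcal {L}}}}\,\Psi ({\mathcal {M}}(S))
  = \frac{3}{{\mathcal {K}}_{{\mathcal {L}}}}\cdot 4\sigma \sqrt{\frac{\log p}{n}}\cdot \sqrt{s}
  = \frac{12\sigma }{{\mathcal {K}}_{{\mathcal {L}}}}\sqrt{\frac{s\log p}{n}},
```

which is the familiar \(\sqrt{s\log p/n}\) rate for s-sparse high-dimensional estimation.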
Cite this article
Chen, Y., Ming, H. & Yang, H. Efficient variable selection for high-dimensional multiplicative models: a novel LPRE-based approach. Stat Papers (2024). https://doi.org/10.1007/s00362-024-01545-1