
Efficient variable selection for high-dimensional multiplicative models: a novel LPRE-based approach

  • Regular Article
  • Published:
Statistical Papers

Abstract

This paper studies a novel high-dimensional sparse multiplicative model, which deals with data having positive responses, as arise frequently in economic and biomedical research. The proposed regularized method is built on the least product relative error (LPRE) criterion and accommodates various penalties, including the adaptive Lasso, SCAD, and MCP. An adjusted ADMM algorithm is adopted to compute the estimators based on the LPRE loss. Additionally, we prove the consistency of the estimator and derive its convergence rate. To validate the effectiveness of the proposed method, we conduct extensive numerical studies and real data analyses of the well-known Boston housing data and gold price data, yielding valuable insights and practical applications.


References

  • Bickel PJ, Ritov Y, Tsybakov AB (2009) Simultaneous analysis of Lasso and Dantzig selector. Ann Stat 37:1705–1732

  • Boyd S, Parikh N, Chu E, Peleato B, Eckstein J (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Found Trends Mach Learn 3:1–122

  • Chen J, Chen Z (2008) Extended Bayesian information criteria for model selection with large model spaces. Biometrika 95(3):759–771

  • Chen Y, Liu H (2023) A new relative error estimation for partially linear multiplicative model. Commun Stat-Simul Comput 52(10):4962–4980

  • Chen K, Guo S, Lin Y, Ying Z (2010) Least absolute relative error estimation. J Am Stat Assoc 105:1104–1112

  • Chen K, Lin Y, Wang Z, Ying Z (2016) Least product relative error estimation. J Multivar Anal 144:91–98

  • Chen Y, Liu H, Ma J (2022) Local least product relative error estimation for single-index varying-coefficient multiplicative model with positive responses. J Comput Appl Math 415:114478

  • Ding H, Wang Z, Wu Y (2018) A relative error-based estimation with an increasing number of parameters. Commun Stat-Theory Methods 47(1):196–209

  • Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360

  • Fan J, Ma Y, Dai W (2014) Nonparametric independence screening in sparse ultra-high-dimensional varying coefficient models. J Am Stat Assoc 109(507):1270–1284

  • Fan R, Zhang S, Wu Y (2023) Nonconcave penalized M-estimation for the least absolute relative errors model. Commun Stat-Theory Methods 52:1118–1135

  • Hao M, Lin Y, Zhao X (2016) A relative error-based approach for variable selection. Comput Stat Data Anal 103:250–262

  • Harrison D Jr, Rubinfeld DL (1978) Hedonic housing prices and the demand for clean air. J Environ Econ Manag 5(1):81–102

  • Li Z, Liu Y, Liu Z (2017) Empirical likelihood and general relative error criterion with divergent dimension. Statistics 51:1006–1022

  • Liu H, Xia X (2018) Estimation and empirical likelihood for single-index multiplicative models. J Stat Plan Inference 193:70–88

  • Liu X, Lin Y, Wang Z (2016) Group variable selection for relative error regression. J Stat Plan Inference 175:40–50

  • Liu H, Zhang X, Hu H, Ma J (2023) Analysis of the positive response data with the varying coefficient partially nonlinear multiplicative model. Stat Pap 1–30

  • Ming H, Liu H, Yang H (2022) Least product relative error estimation for identification in multiplicative additive models. J Comput Appl Math 404:113886

  • Negahban SN, Ravikumar P, Wainwright MJ, Yu B (2012) A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. Stat Sci 27:538–557

  • Raskutti G, Wainwright MJ, Yu B (2011) Minimax rates of estimation for high-dimensional linear regression over \(\ell _q \)-balls. IEEE Trans Inf Theory 57(10):6976–6994

  • Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B 58:267–288

  • Wang L, Kim Y, Li R (2013) Calibrating non-convex penalized regression in ultra-high dimension. Ann Stat 41(5):2505

  • Wang Z, Liu W, Lin Y (2015) A change-point problem in relative error-based regression. TEST 24(4):835–856

  • Wang Z, Chen Z, Wu Y (2017) A relative error estimation approach for multiplicative single index model. J Syst Sci Complex 30:1160–1172

  • Xia X, Liu Z, Yang H (2016) Regularized estimation for the least absolute relative error models with a diverging number of covariates. Comput Stat Data Anal 96:104–119

  • Yang Y, Ye F (2013) General relative error criterion and M-estimation. Front Math China 8:695–715

  • Zhang C (2010) Nearly unbiased variable selection under minimax concave penalty. Ann Stat 38:894–942

  • Zhang Q, Wang Q (2013) Local least absolute relative error estimating approach for partially linear multiplicative model. Stat Sin 23:1091–1116

  • Zhang J, Zhu J, Feng Z (2019) Estimation and hypothesis test for single-index multiplicative models. TEST 28:242–268

  • Zhang J, Zhu J, Zhou Y, Cui X, Lu T (2020) Multiplicative regression models with distortion measurement errors. Stat Pap 61:2031–2057

  • Zou H (2006) The adaptive lasso and its oracle properties. J Am Stat Assoc 101:1418–1429

  • Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc Ser B 67:301–320

  • Zou H, Li R (2008) One-step sparse estimates in nonconcave penalized likelihood models. Ann Stat 36(4):1509


Acknowledgements

This work is supported by the National Natural Science Foundation of China (Grant No. 12371281).

Author information


Corresponding author

Correspondence to Hu Yang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A

Our goal is to present techniques for deriving bounds on the difference between the optimal solution \(\hat{\varvec{\beta }}\) to the convex program (4) and the unknown vector \(\varvec{\beta }\). Before delving into the proof, we present some preliminaries, the primary results of which are derived from Negahban et al. (2012). Our approach imposes sparsity constraints on the model space and then studies the behavior of the resulting estimators. The purpose of the model subspace \({\mathcal {M}}\) is to capture the constraints specified by the model. The orthogonal complement of the space \({\mathcal {M}}\), namely, the set

$$\begin{aligned} {\mathcal {M}}^{\bot }:= \left\{ v\in {\mathbb {R}}^p\mid \langle u,v\rangle =0 \text {~for all~} u\in {\mathcal {M}} \right\} , \end{aligned}$$

is referred to as the perturbation subspace, representing deviations away from the model subspace \({\mathcal {M}}\).

Definition 1

(Decomposability of \({\mathcal {R}}\)) Given a subspace \({\mathcal {M}}\), a norm-based regularizer \({\mathcal {R}}\) is decomposable with respect to \(({\mathcal {M}},{\mathcal {M}}^{\bot })\) if

$$\begin{aligned} {\mathcal {R}}(\varvec{\beta }+\varvec{\gamma })={\mathcal {R}}(\varvec{\beta })+{\mathcal {R}}(\varvec{\gamma }) \end{aligned}$$
(A.1)

for all \(\varvec{\beta }\in {\mathcal {M}}\) and \(\varvec{\gamma }\in {\mathcal {M}}^{\bot }\). Define the projection operator

$$\begin{aligned} \Pi _{{\mathcal {M}}}(u):= \mathop {\arg \min }\limits _{v\in {\mathcal {M}}}\parallel u-v\parallel _2 \end{aligned}$$
(A.2)

with the projection \(\Pi _{{\mathcal {M}}^{\bot }}\) defined in an analogous manner.

Lemma 1

Suppose that \({\mathcal {L}}\) is a convex and differentiable function, and consider any optimal solution \(\hat{\varvec{\beta }}\) to the optimization problem (5) with a strictly positive regularization parameter satisfying

$$\begin{aligned} \lambda \ge 2{\mathcal {R}}^*(\nabla {\mathcal {L}}(\varvec{\beta })). \end{aligned}$$
(A.3)

Then for any pair \(({\mathcal {M}},{\mathcal {M}}^{\bot })\) over which \({\mathcal {R}}\) is decomposable, the error \(\hat{\Delta }=\hat{\varvec{\beta }}-\varvec{\beta }\) belongs to the set

$$\begin{aligned} {\mathbb {C}}({\mathcal {M}},{\mathcal {M}}^{\bot };\varvec{\beta }):=\{\Delta \in {\mathbb {R}}^p\mid {\mathcal {R}}(\Delta _{{\mathcal {M}}^{\bot }})\le 3{\mathcal {R}}(\Delta _{{\mathcal {M}}})+4{\mathcal {R}}(\varvec{\beta }_{{\mathcal {M}}^{\bot }})\}, \end{aligned}$$
(A.4)

where \({\mathcal {R}}^*(\cdot )\) is the dual norm of \({\mathcal {R}}(\cdot )\).
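For later use in verifying condition (A.3), it is convenient to record the dual norm of the weighted \(L_1\)-norm; this is a standard fact, stated here for completeness rather than taken from the paper. For a strictly positive weight vector w,

$$\begin{aligned} {\mathcal {R}}^*(v)=\sup \limits _{\parallel u\parallel _{w,1}\le 1}\langle u,v\rangle =\max \limits _{1\le j\le p}\frac{\vert v_j\vert }{w_j}, \end{aligned}$$

which reduces to the usual \(L_{\infty }\)-norm when \(w_j\equiv 1\); this unweighted case is the one used in the calculation following (A.7).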

Definition 2

(Restricted strong convexity) The loss function satisfies a restricted strong convexity (RSC) condition with curvature \({\mathcal {K}}_{{\mathcal {L}}}>0\) and tolerance function \(\tau _{{\mathcal {L}}}\) if

$$\begin{aligned} \delta {\mathcal {L}}(\Delta ,\varvec{\beta })\ge {\mathcal {K}}_{{\mathcal {L}}}\parallel \Delta \parallel ^2_2- \tau ^2_{{\mathcal {L}}}(\varvec{\beta }) \end{aligned}$$
(A.5)

for all \(\Delta \in {\mathbb {C}}({\mathcal {M}},{\mathcal {M}}^{\bot };\varvec{\beta })\), where \(\delta {\mathcal {L}}(\Delta ,\varvec{\beta }):= {\mathcal {L}}(\varvec{\beta }+\Delta )-{\mathcal {L}}(\varvec{\beta })-\langle \nabla {\mathcal {L}}(\varvec{\beta }),\Delta \rangle \).

In particular, when \(\varvec{\beta }\in {\mathcal {M}}\), there are many statistical models for which the RSC condition holds with tolerance \(\tau _{{\mathcal {L}}}(\varvec{\beta })=0\). When the loss function is twice differentiable, strong convexity amounts to a lower bound on the eigenvalues of the Hessian \(\nabla ^2{\mathcal {L}}(\varvec{\beta })\), holding uniformly over a neighborhood of \(\varvec{\beta }\).

Definition 3

(Subspace compatibility constant) For any subspace \({\mathcal {M}}\) of \({\mathbb {R}}^p\), the subspace compatibility constant with respect to the pair \(({\mathcal {R}},\parallel \cdot \parallel _2)\) is given by

$$\begin{aligned} \Psi ({\mathcal {M}}):=\sup \limits _{u\in {\mathcal {M}}\backslash \{0\}}\frac{{\mathcal {R}}(u)}{\parallel u\parallel _2}. \end{aligned}$$
(A.6)

This measure indicates the degree of compatibility between the regularizer and the error norm over the subspace \({\mathcal {M}}\). In this paper, when the regularizer \({\mathcal {R}}(u)=\parallel u\parallel _{w,1}\), we can calculate the subspace compatibility constant \(\Psi ({\mathcal {M}})=\sup _{u\in {\mathcal {M}}\backslash \{0\}}\frac{\parallel u\parallel _{w,1}}{\parallel u\parallel _2}=\parallel w\parallel _2\).
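As a concrete special case (not spelled out explicitly in the text), take the unweighted norm \(w_j\equiv 1\) and the sparse subspace \({\mathcal {M}}(S)\) introduced below; the Cauchy-Schwarz inequality then gives

$$\begin{aligned} \Psi ({\mathcal {M}}(S))=\sup \limits _{u\in {\mathcal {M}}(S)\backslash \{0\}}\frac{\parallel u\parallel _1}{\parallel u\parallel _2}=\sqrt{s}, \end{aligned}$$

so the compatibility constant grows only like the square root of the sparsity level \(s=\vert S\vert \).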

Let us begin by analyzing the decomposability of the penalty function. Suppose our model class of interest consists of s-sparse parameter vectors \(\varvec{\beta }\in {\mathbb {R}}^p\), where \(\beta _i\ne 0\) if and only if \(i\in S\), and S is an s-sized subset of \( \{1,2,\ldots ,p\}\). For any given subset S and its complement \(S^c\), we define the model subspace

$$\begin{aligned} {\mathcal {M}}(S):=\{\varvec{\beta }\in {\mathbb {R}}^p\mid \beta _j=0 \text {~for all~} j\notin S\}. \end{aligned}$$

Here the notation indicates explicitly that \({\mathcal {M}}\) depends on the chosen subset S. By construction, we have \(\Pi _{{\mathcal {M}}(S)}(\varvec{\beta })=\varvec{\beta }\) for any vector \(\varvec{\beta }\) supported on S; in particular, the subspace \({\mathcal {M}}(S)\) contains \(\varvec{\beta }\).

We claim that for a strictly positive weight vector w, the weighted \(L_1\)-norm \(\parallel \varvec{\beta }\parallel _{w,1}:=\sum _{j=1}^{p}w_j\mid \beta _j\mid \) is decomposable with respect to the pair \(({\mathcal {M}}(S),{\mathcal {M}}^{\bot }(S))\). Indeed, by construction of the subspace, any \(\varvec{\beta }\in {\mathcal {M}}(S)\) can be written in the partitioned form \(\varvec{\beta }=(\varvec{\beta }_S,\varvec{0}_{S^c})\), where \(\varvec{\beta }_S\in {\mathbb {R}}^s\) and \(\varvec{0}_{S^c}\in {\mathbb {R}}^{p-s}\) is a vector of zeros. Similarly, any vector \(\varvec{\gamma }\in {\mathcal {M}}^{\bot }(S)\) has the partitioned representation \((\varvec{0}_S,\varvec{\gamma }_{S^c})\). Putting together the pieces, we obtain

$$\begin{aligned} \parallel \varvec{\beta }+\varvec{\gamma }\parallel _{w,1}=\parallel (\varvec{\beta }_S,\varvec{0}_{S^c})+(\varvec{0}_S,\varvec{\gamma }_{S^c}) \parallel _{w,1}=\parallel \varvec{\beta }\parallel _{w,1}+\parallel \varvec{\gamma }\parallel _{w,1}. \end{aligned}$$

Having established the decomposability of the weighted \(L_1\)-norm, we still need to verify additional conditions. In what follows, we address these conditions and their implications for extending the error bounds of Negahban et al. (2012).

First, to extend the error bounds, we must confirm that the regularization parameter satisfies \(\lambda \ge 2{\mathcal {R}}^*(\nabla {\mathcal {L}}(\varvec{\beta }))\). Simultaneously, we need to ensure that the LPRE-based loss function exhibits a form of restricted strong convexity. An important consequence of Lemma 1 is that, for any decomposable regularizer and a suitable choice \(\lambda \ge 2{\mathcal {R}}^*(\nabla {\mathcal {L}}(\varvec{\beta }))\) of the regularization parameter, the error vector \(\hat{\Delta }=\hat{\varvec{\beta }}-\varvec{\beta }\) resides within a highly specific set, contingent on the unknown vector \(\varvec{\beta }\). Our task is therefore to select an appropriate \(\lambda >0\) that satisfies \(\lambda \ge 2{\mathcal {R}}^*(\nabla {\mathcal {L}}(\varvec{\beta }))\) with high probability. The least product relative error loss function is given by

$$\begin{aligned} {\mathcal {L}}(\varvec{\beta })=\frac{1}{n}\sum _{i=1}^{n}\{Y_i\exp (-X_i^T{\varvec{\beta }})+Y_i^{-1}\exp (X_i^T{\varvec{\beta }})\}, \end{aligned}$$

with the gradient

$$\begin{aligned} \nabla {\mathcal {L}}(\varvec{\beta })=\frac{1}{n}\sum _{i=1}^{n}\{-Y_i\exp (-X_i^T{\varvec{\beta }})+Y_i^{-1}\exp (X_i^T{\varvec{\beta }})\}X_i. \end{aligned}$$
(A.7)
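To make the loss and the gradient (A.7) concrete, the following minimal NumPy sketch evaluates both for given data; the function names lpre_loss and lpre_grad, and the simulated data used in the check, are our own illustrative choices rather than anything prescribed by the paper.

```python
import numpy as np

def lpre_loss(beta, X, Y):
    """LPRE loss: (1/n) * sum_i { Y_i*exp(-X_i'beta) + Y_i^{-1}*exp(X_i'beta) }."""
    eta = X @ beta                                   # linear predictors X_i^T beta
    return np.mean(Y * np.exp(-eta) + np.exp(eta) / Y)

def lpre_grad(beta, X, Y):
    """Gradient (A.7): (1/n) * sum_i { -Y_i*exp(-X_i'beta) + Y_i^{-1}*exp(X_i'beta) } X_i."""
    eta = X @ beta
    r = -Y * np.exp(-eta) + np.exp(eta) / Y
    return X.T @ r / len(Y)

# Quick check of the identity grad = (1/n) X^T (eps^{-1} - eps) at the true beta,
# for the multiplicative model Y_i = exp(X_i'beta) * eps_i with positive errors.
rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -0.5, 0.0, 0.0, 2.0])
eps = np.exp(rng.normal(scale=0.3, size=n))
Y = np.exp(X @ beta_true) * eps
assert np.allclose(lpre_grad(beta_true, X, Y), X.T @ (1.0 / eps - eps) / n)
```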

With simple algebraic operations, we get \(\nabla {\mathcal {L}}(\varvec{\beta })=\frac{1}{n}\sum _{i=1}^{n}(\varepsilon _i^{-1}-\varepsilon _i)X_i=\frac{1}{n}X^T(\varepsilon ^{-1}-\varepsilon )=\frac{1}{n}X^T\widetilde{\varepsilon }\), and the dual norm of the \(L_1\)-norm is the \(L_{\infty }\)-norm. Consequently, condition (A.3) becomes

$$\begin{aligned} \lambda \ge 2{\mathcal {R}}^*(\nabla {\mathcal {L}}(\varvec{\beta }))=2\parallel \frac{1}{n}X^T\widetilde{\varepsilon }\parallel _{\infty }. \end{aligned}$$

Using the column normalization (C1) and sub-Gaussian (C2) conditions, for each \(j=1,\ldots ,p\), we have the tail bound

$$\begin{aligned} P\{\vert \langle X_j, \widetilde{\varepsilon }\rangle /n\vert \ge t\}\le 2\exp (-\frac{nt^2}{2\sigma ^2}). \end{aligned}$$

Consequently, by the union bound, we conclude that

$$\begin{aligned} P\{ \parallel X^T\widetilde{\varepsilon }/n\parallel _{\infty } \ge t\}\le 2\exp (-\frac{nt^2}{2\sigma ^2}+\log p). \end{aligned}$$

Setting \(t^2=\frac{4\sigma ^2\log p}{n}\), we see that the choice of \(\lambda \) given in the statement is valid with probability at least \(1-c_1\exp (-c_2n\lambda ^2)\). By Lemma 1, choosing \(\lambda =4\sigma \sqrt{\frac{\log p}{n}}\) then yields

$$\begin{aligned} \parallel \Delta _{S^c}\parallel _{w,1}\le 3\parallel \Delta _{S}\parallel _{w,1}, \end{aligned}$$
(A.8)

where \(\varvec{\beta }\in {\mathcal {M}}(S)\) so that \({\mathcal {R}}(\varvec{\beta }_{{\mathcal {M}}^{\bot }})=\parallel \varvec{\beta }_{{\mathcal {M}}^{\bot }}\parallel _{w,1}=0\).
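To spell out the probability calculation above (the paper leaves the constants implicit; the particular values \(c_1=2\) and \(c_2=1/(16\sigma ^2)\) below are one admissible choice), plug \(t=\lambda /2=2\sigma \sqrt{\log p/n}\) into the union bound to obtain

$$\begin{aligned} P\left\{ \parallel X^T\widetilde{\varepsilon }/n\parallel _{\infty }\ge 2\sigma \sqrt{\frac{\log p}{n}}\right\} \le 2\exp \left( -2\log p+\log p\right) =\frac{2}{p}, \end{aligned}$$

so the event \(\lambda \ge 2\parallel X^T\widetilde{\varepsilon }/n\parallel _{\infty }\) holds with probability at least \(1-2/p=1-c_1\exp (-c_2n\lambda ^2)\) for this choice of constants.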

We now turn to an important requirement of the loss function. The \(p\times p\) Hessian matrix

$$\begin{aligned} H_n&:=\nabla ^2{\mathcal {L}}(\varvec{\beta })\\&=\frac{1}{n}\sum _{i=1}^{n}\{Y_i\exp (-X_i^T{\varvec{\beta }})+Y_i^{-1}\exp (X_i^T{\varvec{\beta }})\}X_iX_i^T\\&=\frac{1}{n}\sum _{i=1}^{n}(\varepsilon _i+\varepsilon _i^{-1})X_iX_i^T \end{aligned}$$

has rank at most n, indicating that the LPRE loss lacks strong convexity when \(p>n\). Similar to Bickel et al. (2009), consider the case where the elements of the Hessian matrix \(H_n\) are close to those of a positive definite \(p\times p\) matrix H. Denote

$$\begin{aligned} \eta _n\triangleq \max \limits _{i,j}\mid (H_n-H)_{i,j}\mid , \end{aligned}$$

the maximal difference between the elements of the two matrices. Then, for any \(\Delta \) satisfying (A.8), we get

$$\begin{aligned} \frac{\Delta ^TH_n\Delta }{\parallel \Delta \parallel _2^2}&= \frac{\Delta ^TH\Delta +\Delta ^T(H_n-H)\Delta }{\parallel \Delta \parallel _2^2}\\&\ge \frac{\Delta ^TH\Delta }{\parallel \Delta \parallel _2^2}-\frac{\eta _n\parallel \Delta \parallel _{1}^2}{\parallel \Delta \parallel _2^2}\\&\ge \frac{\Delta ^TH\Delta }{\parallel \Delta \parallel _2^2}-\eta _n\left( \frac{4\parallel \Delta _S\parallel _{1}}{\parallel \Delta _S\parallel _{2}}\right) ^2\\&\ge \frac{\Delta ^TH\Delta }{\parallel \Delta \parallel _2^2}-16\eta _n s, \end{aligned}$$

where s represents the number of non-zero components of \(\varvec{\beta }\). The first inequality follows from the definition of \(\eta _n\), which gives \(\vert \Delta ^T(H_n-H)\Delta \vert \le \eta _n\parallel \Delta \parallel _1^2\). Taking the weights in (A.8) to be \(w_j=1\) for all j gives \(\parallel \Delta \parallel _1\le 4\parallel \Delta _S\parallel _1\), which, combined with \(\parallel \Delta \parallel _2\ge \parallel \Delta _S\parallel _2\), yields the second inequality. Lastly, the Cauchy-Schwarz inequality gives \(\parallel \Delta _S\parallel _{1} \le \sqrt{s} \parallel \Delta _S\parallel _{2}\), confirming the last inequality. Thus, for \(\eta _n s\) small enough, the ratio \({\Delta ^TH_n\Delta }/{\parallel \Delta \parallel _2^2}\) is bounded away from 0. This means that we have a kind of “restricted” positive definiteness, valid only for vectors satisfying (A.8). This type of lower bound implies that the LPRE-based loss satisfies a restricted strong convexity (RSC) condition with curvature \({\mathcal {K}}_{{\mathcal {L}}}>0\), in the sense that

$$\begin{aligned} \Delta ^T\nabla ^2{\mathcal {L}}(\varvec{\beta })\Delta \ge {\mathcal {K}}_{{\mathcal {L}}}\parallel \Delta \parallel ^2_2 \end{aligned}$$

for all \(\Delta \in {\mathbb {C}}({\mathcal {M}},{\mathcal {M}}^{\bot };\varvec{\beta })\).
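For completeness, the chain of norm inequalities used in the lower bound above can be written in one line (with \(w_j\equiv 1\)):

$$\begin{aligned} \parallel \Delta \parallel _1=\parallel \Delta _S\parallel _1+\parallel \Delta _{S^c}\parallel _1\le 4\parallel \Delta _S\parallel _1\le 4\sqrt{s}\parallel \Delta _S\parallel _2\le 4\sqrt{s}\parallel \Delta \parallel _2, \end{aligned}$$

so that \(\eta _n\parallel \Delta \parallel _1^2/\parallel \Delta \parallel _2^2\le 16\eta _ns\), which is exactly the correction term appearing in the displayed bound for \(\Delta ^TH_n\Delta /\parallel \Delta \parallel _2^2\).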

In summary, we have confirmed the decomposability of the regularizer and the presence of restricted strong convexity in the LPRE-based loss. As a result, Theorem 1 follows from the bounds established in Corollary 1 of Negahban et al. (2012).
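As a computational illustration of the penalized program analyzed above, the sketch below minimizes the LPRE loss plus a weighted \(L_1\) penalty by proximal gradient descent with soft-thresholding. This is only a simple stand-in for the adjusted ADMM algorithm used in the paper; the function names, the fixed step size, and the iteration count are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding: the proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def penalized_lpre(X, Y, lam, w=None, step=0.01, n_iter=5000):
    """Proximal-gradient sketch for  min_beta  LPRE(beta) + lam * sum_j w_j |beta_j|."""
    n, p = X.shape
    w = np.ones(p) if w is None else w
    beta = np.zeros(p)
    for _ in range(n_iter):
        eta = X @ beta
        grad = X.T @ (-Y * np.exp(-eta) + np.exp(eta) / Y) / n    # gradient (A.7)
        beta = soft_threshold(beta - step * grad, step * lam * w)  # proximal step
    return beta

# Example call with a regularization level of the order sqrt(log p / n),
# as suggested by the theory (sigma taken to be 1 here for illustration):
# beta_hat = penalized_lpre(X, Y, lam=4 * np.sqrt(np.log(X.shape[1]) / X.shape[0]))
```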

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Chen, Y., Ming, H. & Yang, H. Efficient variable selection for high-dimensional multiplicative models: a novel LPRE-based approach. Stat Papers (2024). https://doi.org/10.1007/s00362-024-01545-1

