Abstract
The Lasso approach is widely adopted for screening and estimating active effects in sparse linear models with quantitative factors. Many design schemes based on different criteria have been proposed to make the Lasso estimator more accurate. This article applies \(\varPhi _l\)-optimality to the asymptotic covariance matrix of the Lasso estimator, achieving smaller mean squared error and higher power in tests of significance. A provably convergent algorithm is given for searching for \(\varPhi _l\)-optimal designs, and it is modified by intermittent diffusion to avoid local solutions. Simulations are given to support the theoretical results.
Acknowledgements
Yimin Huang and Xiangshun Kong contributed equally as first authors, and Mingyao Ai is the corresponding author. The authors sincerely thank the editor, the associate editor, and two referees for their valuable comments and insightful suggestions, which led to further improvement of this article. The work is supported by NSFC Grants 11671019 and 11801033, LMEQF, and the Beijing Institute of Technology Research Fund Program for Young Scholars.
Appendices
Appendix A: Proofs of Theorems
We first introduce Lemma 1, which lays the theoretical foundation for calculating \(d(\cdot ,\xi )\) in Theorem 1.
Lemma 1
Let \(\varvec{\theta }(\varepsilon )\) be the solution of
where \(\varvec{\theta }\) is a p-dimensional row vector, j is a given integer in \(\{1,\ldots ,p\}\), A is a \(p\times p\) symmetric positive definite matrix, B is a \(p\times p\) symmetric matrix, \(\varvec{e}_j\) is the p-dimensional vector with 1 in the jth entry and 0 elsewhere, and \(\mu >0\) is a constant. Then it follows that \(\varvec{\theta }(\varepsilon )\) is continuous and differentiable at \(\varepsilon =0\).
Proof
(a) First, we prove the continuity of \(\varvec{\theta }(\varepsilon )\) at \(\varepsilon =0\). Note that the objective function in (6) is strictly convex near \(\varepsilon =0\), so the solution of the quadratic optimization (6) is unique. Moreover, the feasible region is a convex polytope.
Geometrically, place the feasible region below the graph of the objective function in a higher-dimensional space and raise it until the two sets intersect for the first time; the point of first contact is the solution.
Since both the feasible region and the graph of the objective function are continuous with respect to \(\varepsilon \), their first point of contact, \(\varvec{\theta }(\varepsilon )\), is also continuous in \(\varepsilon \).
(b) Second, we prove the differentiability of \(\varvec{\theta }(\varepsilon )\) at \(\varepsilon =0\). Let \(c=\min _{\varvec{\theta }}\varvec{\theta } (A+\varepsilon B)\varvec{\theta }^T\). Note that \(\varvec{\theta }(A+\varepsilon B)\varvec{\theta }^T=c\) is an ellipsoid, and the feasible region is a convex polyhedron \(\mathcal {P}\). Hence \(\varvec{\theta }(\varepsilon )\) is the unique intersection of the ellipsoid with one hyperplane, namely the i-th face of the convex polyhedron \(\mathcal {P}\). To be precise, \(\varvec{\theta }(\varepsilon )\) is the unique solution of
where \((A+\varepsilon B)_i\) is the i-th row of the matrix \(A+\varepsilon B\), j is the integer in (6) and \(\mathbf{1 }_{\{i=j\}}\) is the indicator function.
The value of c changes when \(\varepsilon \) changes slightly, but the index i does not: by part (a), both the convex polyhedron and \(\varvec{\theta }(\varepsilon )\) are continuous with respect to \(\varepsilon \), so the solution remains on the i-th face. If \(\varvec{\theta }(\varepsilon )\) lies on both the i-th and the \(i'\)-th faces of the polyhedron, the solution remains on one of them after \(\varepsilon \) changes. In other words, there exists an i such that Eq. (7) holds for any sufficiently small \(\varepsilon \). The same conclusion holds when \(\varvec{\theta }(\varepsilon )\) lies on more faces (the intersection of more than two hyperplanes).
Since the solution is unique, there exists a set of pairs \((A_k, c_k)\), \(k=1,\ldots ,p\), such that (7) is equivalent to \(\sum _{k=1}^p(A_k\varvec{\theta }^T+c_k)^2=0\). Note that the solution \(\varvec{\theta }(\varepsilon )\) does not depend on the parameter c in (7), since c only serves to match the constant terms \(c_k\). Thus the solution depends only on \(\varepsilon \). The differentiability of \(\varvec{\theta }(\varepsilon )\) follows because the solution of this system of linear equations is an algebraic expression in \(\varepsilon \). \(\square \)
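Lemma 1 can be illustrated numerically. Since display (6) is not reproduced in this excerpt, the sketch below assumes a CLIME-style feasible region \(\Vert (A+\varepsilon B)\varvec{\theta }^T-\varvec{e}_j\Vert _\infty \le \mu \); the instance \((A, B, \mu , j)\) and the solver settings are illustrative only, not the paper's.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical instance of the optimization in Lemma 1.  The exact feasible
# region (display (6)) is not reproduced here; we ASSUME a CLIME-style set
# ||(A + eps*B) theta^T - e_j||_inf <= mu for illustration.
rng = np.random.default_rng(0)
p, j, mu = 4, 0, 0.1
X = rng.standard_normal((20, p))
A = X.T @ X / 20 + 0.5 * np.eye(p)                   # symmetric positive definite
B = rng.standard_normal((p, p)); B = (B + B.T) / 2   # symmetric perturbation

e_j = np.zeros(p); e_j[j] = 1.0

def theta(eps):
    """Solve min_theta theta (A + eps B) theta^T over the assumed feasible region."""
    M = A + eps * B
    # |M theta - e_j| <= mu componentwise, written as 2p smooth linear inequalities
    cons = {"type": "ineq",
            "fun": lambda t: np.concatenate([mu - (M @ t - e_j), mu + (M @ t - e_j)])}
    res = minimize(lambda t: t @ M @ t, x0=np.linalg.solve(M, e_j),
                   constraints=[cons], method="SLSQP",
                   options={"ftol": 1e-12, "maxiter": 200})
    return res.x

t0 = theta(0.0)
for eps in (1e-2, 1e-3):
    print(eps, np.linalg.norm(theta(eps) - t0))      # gaps are small (continuity)
```

The gap \(\Vert \varvec{\theta }(\varepsilon )-\varvec{\theta }(0)\Vert \) stays small as \(\varepsilon \rightarrow 0\), consistent with the continuity established in part (a).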
Let \({\dot{\varTheta }}=\lim _{\varepsilon \rightarrow 0+}\varepsilon ^{-1}(\varTheta (\varepsilon )-\varTheta (0)),\) in which \(\varTheta (\varepsilon )\) is the solution of
The existence of \({\dot{\varTheta }}\) can be verified by substituting \(A={\hat{\varSigma }}_\xi \) and \(B=\varvec{x}\varvec{x}^T-{\hat{\varSigma }}_\xi \) into Lemma 1.
Proof of Theorem 1
The approximate design \(\xi \) is a minimizer or a stationary point in (3) if and only if the Fréchet derivative
is non-negative for any \(\varvec{x}\in {\mathcal {X}}\), where \(\delta _{\varvec{x}}\) denotes a one-point design on \(\varvec{x}\). To get the Fréchet derivative, we can calculate the Gâteaux derivative of \(\log (\varPhi _l(M(\cdot )))\) at \(\xi \) in the direction \(\delta _{\varvec{x}}\), i.e., \(G(\xi ,\delta _{\varvec{x}})=\lim _{\varepsilon \rightarrow 0+}\varepsilon ^{-1}[\log \varPhi _l(M(\xi +\varepsilon \delta _{\varvec{x}}))-\log \varPhi _l(M(\xi ))].\) Note that \(\log \varPhi _l(M(\cdot ))=\log [\mathrm{tr}(M(\cdot )^l)]^{1/l}=l^{-1}\log \mathrm{tr}(M(\cdot )^l).\) We have
where \(M_1=M(\xi +\varepsilon \delta _{\varvec{x}})\) and \(M_2=M(\xi )\).
The definition of \(M_1\) gives that \(M_1=\varTheta _1({\hat{\varSigma }}_\xi +\varepsilon \varvec{x}\varvec{x}^T)\varTheta _1^T,\) where \(\varTheta _1\) is the solution of
Similarly, we have \(M_2=\varTheta _2{\hat{\varSigma }}_\xi \varTheta _2^T,\) where \(\varTheta _2\) is the solution of
From Lemma 1, the relation between \(\varTheta _1\) and \(\varTheta _2\) can be represented as
Denote \(\dot{\varTheta }_1(0)\) by \(\dot{\varTheta }_1\) for convenience. Then, it follows that
Substituting the above decomposition of \(\mathrm{tr}(M_1^l)\) into (8), we have
Therefore, the Gâteaux derivative can be calculated by
The Fréchet derivative of \(\log (\varPhi _l(M(\cdot )))\) at \(\xi \) in the direction of \(\delta _{\varvec{x}}\) is obtained by \(d(\varvec{x},\xi )=G(\xi ,\delta _{\varvec{x}}-\xi ).\) We need only replace the matrix \(\varvec{x}\varvec{x}^T\) of the design \(\delta _{\varvec{x}}\) in the Gâteaux derivative of \(\log (\varPhi _l(M(\cdot )))\) with \(\varvec{x}\varvec{x}^T-{\hat{\varSigma }}_\xi \). A simple calculation shows that
\(\square \)
To implement the proposed algorithm, an approximation of \({\dot{\varTheta }}\) must first be given. When \(\mu =0\), which can happen if \(n\ge p\), we have \(\varTheta (\varepsilon )[{\hat{\varSigma }}_\xi +\varepsilon (\varvec{x}\varvec{x}^T-{\hat{\varSigma }}_\xi )]= I.\) Taking the derivative of both sides at \(\varepsilon =0\) yields
Right multiplying both sides of the above equation by \(\varTheta ^T\) gives
Substituting the above equation into d, the approximate form of d is given by
In the other cases, the inverse of \({\hat{\varSigma }}\) does not exist, but the same form is still used, with \(\varTheta \) restricted as in (3). Analogously, we use it to approximate the asymptotic covariance matrix. Therefore, in the simulations, we still use this approximation of d even in the case \(n<p\).
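As a sanity check on this approximation, consider the \(\mu =0\) case, where \(\varTheta ={\hat{\varSigma }}_\xi ^{-1}\) and hence \(M(\xi )={\hat{\varSigma }}_\xi ^{-1}\). A short calculation from the relation above (sketched here, not the paper's displayed formula) gives \(d(\varvec{x},\xi )=1-\varvec{x}^T{\hat{\varSigma }}_\xi ^{-(l+1)}\varvec{x}/\mathrm{tr}({\hat{\varSigma }}_\xi ^{-l})\), which the snippet below verifies against a central finite difference of \(\log \varPhi _l\); the design and candidate point are made up for illustration.

```python
import numpy as np

# Illustrative check of the mu = 0 approximation: Theta = Sigma^{-1}, so
# M(xi) = Sigma^{-1}, and the directional derivative of log Phi_l(M(.)) at
# xi toward delta_x works out to
#   d(x, xi) = 1 - x^T Sigma^{-(l+1)} x / tr(Sigma^{-l}).
# The design and candidate point below are a made-up instance.
rng = np.random.default_rng(2)
p, n, l = 4, 30, 2
D = rng.uniform(-1, 1, (n, p))               # an arbitrary n-point design
Sigma = D.T @ D / n                          # hat-Sigma_xi (invertible here)
x = rng.uniform(-1, 1, p)                    # candidate point in X

Si = np.linalg.inv(Sigma)
tr_l = np.trace(np.linalg.matrix_power(Si, l))
d_closed = 1.0 - x @ np.linalg.matrix_power(Si, l + 1) @ x / tr_l

def log_phi(eps):
    # log Phi_l(M(xi + eps(delta_x - xi))) with M the inverse of the
    # perturbed covariance matrix (mu = 0 case)
    M = np.linalg.inv(Sigma + eps * (np.outer(x, x) - Sigma))
    return np.log(np.trace(np.linalg.matrix_power(M, l))) / l

h = 1e-6
d_numeric = (log_phi(h) - log_phi(-h)) / (2 * h)   # central finite difference
print(d_closed, d_numeric)                         # the two values agree closely
```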
Proof of Theorem 2
Note that in Algorithm 1, the function \(\varPhi _l(M(\xi _t))>0\) decreases with respect to t. Hence there exists a non-negative real number \(\varPhi _l^*\) such that \(\lim _{t\rightarrow \infty }\varPhi _l(M(\xi _t))=\varPhi _l^*.\)
Without loss of generality, we assume that there exists a design \(\xi ^{**}\) such that \(\varPhi _l(M(\xi ^{**}))=\varPhi _l^*.\) We argue by contradiction. If \(\xi ^{**}\) is not the optimal design, i.e., \(\varPhi _l(M(\xi ^*))<\varPhi _l^*\), then by Theorem 1 there exists an \(\varvec{x}^*\) such that \(d(\varvec{x}^*,\xi ^{**})<0\), say \(\inf _{\varvec{x}\in {\mathcal {X}}}d(\varvec{x},\xi ^{**})=-2\gamma <0\). Hence, for any sufficiently large t, we have \(\inf _{\varvec{x}\in {\mathcal {X}}}d(\varvec{x},\xi _t)\le -\gamma .\) Let \({\widetilde{\xi }}_{t+1}(\alpha )=(1-\alpha )\xi _t+\alpha \delta _{\varvec{x}_t},\) where \(\varvec{x}_t=\arg \min _{\varvec{x}\in {\mathcal {X}}}d(\varvec{x},\xi _t)\), and let \(\alpha _t=\arg \min _{\alpha \in [0,1]}\varPhi _l(M({\widetilde{\xi }}_{t+1}(\alpha ))).\) A Taylor expansion of \(\varPhi _l[M({\widetilde{\xi }}_{t+1}(\alpha ))]\) at \(\alpha =0\) gives
where U is the upper bound of the second order derivative of \(\varPhi _l[M({\widetilde{\xi }}_{t+1}(\alpha ))]\) over [0, 1].
Note that in the proof of Theorem 1, we have
By substituting (11) into (10), it follows that
Therefore, for \(t>N\), we have that
As \(t\rightarrow \infty \), the left-hand side tends to a finite number while the right-hand side tends to \(-\infty \), a contradiction. \(\square \)
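To make the convergence argument concrete, here is a minimal sketch of a vertex-direction scheme of the kind analyzed above, specialized to the \(\mu =0\) case where \(d(\varvec{x},\xi )=1-\varvec{x}^T{\hat{\varSigma }}_\xi ^{-(l+1)}\varvec{x}/\mathrm{tr}({\hat{\varSigma }}_\xi ^{-l})\). The finite candidate set, the grid search for \(\alpha _t\), and the stopping tolerance are illustrative choices, not the paper's Algorithm 1.

```python
import numpy as np

# A minimal vertex-direction sketch in the spirit of the iteration above,
# specialized to mu = 0 so that M(xi) = Sigma_xi^{-1}.  All numerical
# choices (candidate set, alpha grid, tolerance) are illustrative.
rng = np.random.default_rng(3)
p, l = 3, 2
cands = rng.uniform(-1, 1, (200, p))             # finite candidate set for X

def sigma(w):
    return (cands * w[:, None]).T @ cands         # Sigma_xi = sum_i w_i x_i x_i^T

def phi_l(w):
    M = np.linalg.inv(sigma(w))                   # M(xi) in the mu = 0 case
    return np.trace(np.linalg.matrix_power(M, l)) ** (1.0 / l)

def d_all(w):
    # d(x, xi) = 1 - x^T Sigma^{-(l+1)} x / tr(Sigma^{-l}) for every candidate x
    Si = np.linalg.inv(sigma(w))
    Sl1 = np.linalg.matrix_power(Si, l + 1)
    tr_l = np.trace(np.linalg.matrix_power(Si, l))
    return 1.0 - np.einsum("ij,jk,ik->i", cands, Sl1, cands) / tr_l

w = np.full(len(cands), 1.0 / len(cands))         # xi_0: uniform design
vals = [phi_l(w)]
for t in range(100):
    d = d_all(w)
    i = int(np.argmin(d))                         # x_t = argmin_x d(x, xi_t)
    if d[i] > -1e-8:                              # equivalence-theorem stop rule
        break
    def mixed(a):                                 # (1 - alpha) xi_t + alpha delta_x
        w2 = (1 - a) * w
        w2[i] += a
        return w2
    # alpha_t by grid search; cap at 0.9 so Sigma stays nonsingular
    a_t = min(np.linspace(0.0, 0.9, 91), key=lambda a: phi_l(mixed(a)))
    w = mixed(a_t)
    vals.append(phi_l(w))
print(vals[0], vals[-1])                          # Phi_l(M(xi_t)) is nonincreasing
```

Because the grid for \(\alpha _t\) includes \(\alpha =0\), the sequence \(\varPhi _l(M(\xi _t))\) is nonincreasing by construction, mirroring the monotonicity used at the start of this proof.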
Proof of Theorem 3
Analogous to the proof of Theorem 2, here we need only to prove that there exists a real number \(\gamma \ge 0\) such that
It should be noted that if the left-hand side were greater than or equal to zero, \(\alpha =0\) would be a minimizer, which contradicts the assumption that \(\alpha =0\) is not a minimizer. \(\square \)
Appendix B: The four designs in Example 2
Huang, Y., Kong, X. & Ai, M. Optimal designs in sparse linear models. Metrika 83, 255–273 (2020). https://doi.org/10.1007/s00184-019-00722-9