
Optimal designs in sparse linear models


Abstract

The Lasso approach is widely adopted for screening and estimating active effects in sparse linear models with quantitative factors. Many design schemes have been proposed under different criteria to make the Lasso estimator more accurate. This article applies \(\varPhi _l\)-optimality to the asymptotic covariance matrix of the Lasso estimator, so that a smaller mean squared error and higher power in tests of significance can be achieved. A theoretically convergent algorithm is given for searching for \(\varPhi _l\)-optimal designs, and it is modified by intermittent diffusion to avoid local solutions. Some simulations are presented to support the theoretical results.



Acknowledgements

Yimin Huang and Xiangshun Kong contributed equally as first authors, and Mingyao Ai is the corresponding author. The authors sincerely thank the editor, associate editor, and two referees for their valuable comments and insightful suggestions, which led to further improvement of this article. This work is supported by NSFC Grants 11671019 and 11801033, LMEQF, and the Beijing Institute of Technology Research Fund Program for Young Scholars.



Appendices

Appendix A: Proofs of Theorems

Here we first introduce the following Lemma 1, which establishes the theoretical foundation for calculating \(d(\cdot ,\xi )\) in Theorem 1.

Lemma 1

Let \(\varvec{\theta }(\varepsilon )\) be the solution of

$$\begin{aligned} \begin{aligned} \min _{\varvec{\theta }}\ \varvec{\theta } (A+\varepsilon B)\varvec{\theta }^T,\ \ \mathrm{subject~to}\ \Vert (A+\varepsilon B)\varvec{\theta }^T-\varvec{e}_j\Vert _\infty \le \mu , \end{aligned} \end{aligned}$$
(6)

where \(\varvec{\theta }\) is a p-dimensional row vector, j is a given integer in \(\{1,\ldots ,p\}\), A is a \(p\times p\) symmetric positive definite matrix, B is a \(p\times p\) symmetric matrix, \(\varvec{e}_j\) is the p-dimensional vector with 1 in the jth entry and 0 elsewhere, and \(\mu >0\) is a constant. Then \(\varvec{\theta }(\varepsilon )\) is continuous and differentiable at \(\varepsilon =0\).

Proof

(a) First, we prove the continuity of \(\varvec{\theta }(\varepsilon )\) at \(\varepsilon =0\). Note that the objective function in (6) is strictly convex near \(\varepsilon =0\), so the solution of the quadratic optimization (6) is unique. Moreover, the feasible region is a convex polyhedron.

Geometrically, view the feasible region as lying in a hyperplane placed below the graph of the objective function, and raise this hyperplane until it first meets the graph; the point of first contact is the solution.

Since both the feasible region and the graph of the objective function are continuous with respect to \(\varepsilon \), their intersection is also continuous, which gives the continuity of \(\varvec{\theta }(\varepsilon )\).

(b) Second, we prove the differentiability of \(\varvec{\theta }(\varepsilon )\) at \(\varepsilon =0\). Let \(\min _{\varvec{\theta }}\varvec{\theta } (A+\varepsilon B)\varvec{\theta }^T=c\). Note that \(\varvec{\theta }(A+\varepsilon B)\varvec{\theta }^T=c\) is an ellipsoid, and the feasible region is a convex polyhedron \(\mathcal {P}\). Hence, \(\varvec{\theta }(\varepsilon )\) is the unique intersection of the ellipsoid and one hyperplane, namely the i-th boundary of the convex polyhedron \(\mathcal {P}\). To be precise, \(\varvec{\theta }(\varepsilon )\) is the unique solution of

$$\begin{aligned} \left\{ \begin{array}{l} \varvec{\theta }(A+\varepsilon B)\varvec{\theta }^T=c\\ (A+\varepsilon B)_i\varvec{\theta }^T-\mathbf{1 }_{\{i=j\}}=\mu \quad (or~ -\mu ),\\ \end{array} \right. \end{aligned}$$
(7)

where \((A+\varepsilon B)_i\) is the i-th row of the matrix \(A+\varepsilon B\), j is the integer in (6) and \(\mathbf{1 }_{\{i=j\}}\) is the indicator function.

Note that the value of c changes when \(\varepsilon \) changes slightly, but the index i does not, because both the convex polyhedron and \(\varvec{\theta }(\varepsilon )\) are continuous with respect to \(\varepsilon \) by part (a); that is, the solution remains on the i-th boundary. If \(\varvec{\theta }(\varepsilon )\) lies on both the i-th and the \(i'\)-th boundaries of the polyhedron, the solution will lie on one of them after \(\varepsilon \) changes. In other words, there exists an i such that equation (7) holds for any sufficiently small \(\varepsilon \). The same conclusion holds when \(\varvec{\theta }(\varepsilon )\) lies on more boundaries (the intersection of more than two hyperplanes).

Since the solution is unique, there exists a set of vectors \((A_k, c_k)\), \(k=1,\ldots ,p\), such that (7) is equivalent to \(\sum _{k=1}^p(A_k\varvec{\theta }^T+c_k)^2=0\). Note that the solution \(\varvec{\theta }(\varepsilon )\) does not depend on the parameter c in (7), since c is only used to match the constant terms \(c_k\). Thus, the solution depends only on \(\varepsilon \). The differentiability of \(\varvec{\theta }(\varepsilon )\) follows because the solution of the linear equations is an algebraic expression in \(\varepsilon \). \(\square \)

Let \({\dot{\varTheta }}=\lim _{\varepsilon \rightarrow 0+}\varepsilon ^{-1}(\varTheta (\varepsilon )-\varTheta (0)),\) in which \(\varTheta (\varepsilon )\) is the solution of

$$\begin{aligned} \begin{aligned}&\min _{\varTheta }\quad \mathrm{tr}[\varTheta ({\hat{\varSigma }}_\xi +\varepsilon (\varvec{x}\varvec{x}^T-{\hat{\varSigma }}_\xi ))\varTheta ^T],\\&\mathrm{subject~to}\quad \Vert ({\hat{\varSigma }}_\xi +\varepsilon (\varvec{x}\varvec{x}^T-{\hat{\varSigma }}_\xi ))\varvec{\theta }_{j}^T-\varvec{e}_j\Vert _\infty \le \mu ~~\mathrm{for}~~j=1,\ldots ,p. \end{aligned} \end{aligned}$$

The existence of \({\dot{\varTheta }}\) can be verified by applying Lemma 1 to each row of \(\varTheta \) with \(A={\hat{\varSigma }}_\xi \) and \(B=\varvec{x}\varvec{x}^T-{\hat{\varSigma }}_\xi \).
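
To make Lemma 1 and the definition of \({\dot{\varTheta }}\) concrete, here is a minimal numerical sketch. It solves the row problem (6) for each j with a generic constrained optimizer, stacks the rows into \(\varTheta (\varepsilon )\), and approximates \({\dot{\varTheta }}\) by a forward difference, whose existence Lemma 1 guarantees. The synthetic choices of A, B, \(\mu \), and the use of SciPy's SLSQP solver are assumptions for illustration only, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

# Sketch: solve the row problem (6) for each j, assemble Theta(eps), and
# approximate dot(Theta) by a forward difference (Lemma 1 guarantees the limit).
# A plays the role of Sigma_hat_xi and B = x x^T - Sigma_hat_xi.
rng = np.random.default_rng(0)
p, mu = 4, 0.05
A = np.eye(p) + 0.3 * np.ones((p, p))        # symmetric positive definite
x = rng.normal(size=p)
B = np.outer(x, x) - A                       # perturbation direction

def solve_row(M, j):
    """theta minimizing theta M theta^T subject to ||M theta^T - e_j||_inf <= mu."""
    e = np.zeros(p); e[j] = 1.0
    cons = {"type": "ineq", "fun": lambda th: mu - np.abs(M @ th - e)}
    res = minimize(lambda th: th @ M @ th, x0=np.linalg.solve(M, e),
                   jac=lambda th: 2.0 * M @ th, constraints=[cons], method="SLSQP")
    return res.x

def Theta(eps):
    M = A + eps * B
    return np.vstack([solve_row(M, j) for j in range(p)])

eps = 1e-4
Theta0, Theta_eps = Theta(0.0), Theta(eps)
Theta_dot = (Theta_eps - Theta0) / eps       # finite-difference estimate of dot(Theta)
print("max |Theta(eps) - Theta(0)| :", np.abs(Theta_eps - Theta0).max())
print("first row of the estimated dot(Theta):", np.round(Theta_dot[0], 3))
```

Any quadratic-programming solver could replace SLSQP here; the finite difference is used only to visualise the derivative whose existence the lemma establishes.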

Proof of Theorem 1

The approximate design \(\xi \) is a minimizer or a stationary point in (3) if and only if the Fréchet derivative

$$\begin{aligned} d(\varvec{x},\xi )=\lim _{\varepsilon \rightarrow 0+} \varepsilon ^{-1}\{\log \varPhi _l[M((1-\varepsilon )\xi +\varepsilon \delta _{\varvec{x}})]-\log \varPhi _l(M(\xi ))\} \end{aligned}$$

is non-negative for any \(\varvec{x}\in {\mathcal {X}}\), where \(\delta _{\varvec{x}}\) denotes a one-point design on \(\varvec{x}\). To get the Fréchet derivative, we can calculate the Gâteaux derivative of \(\log (\varPhi _l(M(\cdot )))\) at \(\xi \) in the direction \(\delta _{\varvec{x}}\), i.e., \(G(\xi ,\delta _{\varvec{x}})=\lim _{\varepsilon \rightarrow 0+}\varepsilon ^{-1}[\log \varPhi _l(M(\xi +\varepsilon \delta _{\varvec{x}}))-\log \varPhi _l(M(\xi ))].\) Note that \(\log \varPhi _l(M(\cdot ))=\log [\mathrm{tr}(M(\cdot )^l)]^{1/l}=l^{-1}\log \mathrm{tr}(M(\cdot )^l).\) We have

$$\begin{aligned} \log \varPhi _l(M(\xi +\varepsilon \delta _{\varvec{x}}))-\log \varPhi _l(M(\xi ))=\frac{1}{l}\log \frac{\mathrm{tr}(M_1^l)}{\mathrm{tr}(M_2^l)}, \end{aligned}$$
(8)

where \(M_1=M(\xi +\varepsilon \delta _{\varvec{x}})\) and \(M_2=M(\xi )\).

The definition of \(M_1\) gives that \(M_1=\varTheta _1({\hat{\varSigma }}_\xi +\varepsilon \varvec{x}\varvec{x}^T)\varTheta _1^T,\) where \(\varTheta _1\) is the solution of

$$\begin{aligned} \begin{aligned} \min _{\varTheta } \mathrm{tr}(\varTheta ({\hat{\varSigma }}_\xi +\varepsilon \varvec{x}\varvec{x}^T)\varTheta ^T),\ \ \mathrm{s.t.} \Vert ({\hat{\varSigma }}_\xi +\varepsilon \varvec{x}\varvec{x}^T)\varvec{\theta }_{j}^T-\varvec{e}_j\Vert _\infty \le \mu \quad \mathrm{~for~} j=1,\ldots ,p. \end{aligned} \end{aligned}$$

Similarly, we have \(M_2=\varTheta _2{\hat{\varSigma }}_\xi \varTheta _2^T,\) where \(\varTheta _2\) is the solution of

$$\begin{aligned} \begin{aligned} \min _{\varTheta }\ \mathrm{tr}(\varTheta {\hat{\varSigma }}_\xi \varTheta ^T),\ \ \mathrm{s.t.}\ \Vert {\hat{\varSigma }}_\xi \varvec{\theta }_{j}^T-\varvec{e}_j\Vert _\infty \le \mu \quad \mathrm{~for~} j=1,\ldots ,p. \end{aligned} \end{aligned}$$

From Lemma 1, the relation between \(\varTheta _1\) and \(\varTheta _2\) can be represented as

$$\begin{aligned} \begin{aligned} \varTheta _1(\varepsilon )&=\varTheta _1(0)+\left. \frac{d\varTheta _1(\varepsilon )}{d\varepsilon }\right| _{\varepsilon =0}\cdot \varepsilon +O(\varepsilon ^2)\\&=\varTheta _2 + {\dot{\varTheta }}_1(0) \cdot \varepsilon + o(\varepsilon ). \end{aligned} \end{aligned}$$

Denote \(\dot{\varTheta }_1(0)\) by \(\dot{\varTheta }_1\) for convenience. Then, it follows that

$$\begin{aligned} \begin{aligned} \mathrm{tr}(M_1^l)&=\mathrm{tr}[\varTheta _1({\hat{\varSigma }}_\xi +\varepsilon \varvec{x}\varvec{x}^T)\varTheta _1^T]^l\\&=\mathrm{tr}[(\varTheta _1{\hat{\varSigma }}_\xi \varTheta _1^T)^l+l\varepsilon (\varTheta _1{\hat{\varSigma }}_\xi \varTheta _1^T)^{l-1}(\varTheta _1\varvec{x}\varvec{x}^T\varTheta _1^T)]+o(\varepsilon )\\&=\mathrm{tr}[(\varTheta _2{\hat{\varSigma }}_\xi \varTheta _2^T)^l+l\varepsilon (\varTheta _2{\hat{\varSigma }}_\xi \varTheta _2^T)^{l-1}({\dot{\varTheta }}_1{\hat{\varSigma }}_\xi \varTheta _2^T+\varTheta _2{\hat{\varSigma }}_\xi {\dot{\varTheta }}_1^T) \\&\quad +l\varepsilon (\varTheta _2{\hat{\varSigma }}_\xi \varTheta _2^T)^{l-1}(\varTheta _2\varvec{x}\varvec{x}^T\varTheta _2^T)]+o(\varepsilon )\\&=\mathrm{tr}(M_2^l)+l\varepsilon \cdot \mathrm{tr}[(\varTheta _2{\hat{\varSigma }}_\xi \varTheta _2^T)^{l-1}(\varTheta _2\varvec{x}\varvec{x}^T\varTheta _2^T+{\dot{\varTheta }}_1{\hat{\varSigma }}_\xi \varTheta _2^T\\&\quad +\varTheta _2{\hat{\varSigma }}_\xi {\dot{\varTheta }}_1^T)]+o(\varepsilon ). \end{aligned} \end{aligned}$$

Substituting the above decomposition of \(\mathrm{tr}(M_1^l)\) into (8), we have

$$\begin{aligned} \frac{1}{l}\log \frac{\mathrm{tr}(M_1^l)}{\mathrm{tr}(M_2^l)}= & {} l^{-1}\log \{1+l\varepsilon \mathrm{tr}[(\varTheta _2{\hat{\varSigma }}_\xi \varTheta _2^T)^{l-1}(\varTheta _2\varvec{x}\varvec{x}^T\varTheta _2^T+{\dot{\varTheta }}_1{\hat{\varSigma }}_\xi \varTheta _2^T\\&+\varTheta _2{\hat{\varSigma }}_\xi {\dot{\varTheta }}_1^T)]\mathrm{tr}(M_2^l)^{-1}\}+o(\varepsilon )\\= & {} \varepsilon \mathrm{tr}[(\varTheta _2{\hat{\varSigma }}_\xi \varTheta _2^T)^{l-1}(\varTheta _2\varvec{x}\varvec{x}^T\varTheta _2^T+{\dot{\varTheta }}_1{\hat{\varSigma }}_\xi \varTheta _2^T\\&+\varTheta _2{\hat{\varSigma }}_\xi {\dot{\varTheta }}_1^T)]\mathrm{tr}(M_2^l)^{-1}+o(\varepsilon ). \end{aligned}$$

Therefore, the Gâteaux derivative can be calculated by

$$\begin{aligned}&\lim _{\varepsilon \rightarrow 0+} \frac{\log \varPhi _l(M(\xi +\varepsilon \delta _{\varvec{x}}))-\log \varPhi _l(M(\xi ))}{\varepsilon }\\&\qquad =\frac{\mathrm{tr}[(\varTheta _2{\hat{\varSigma }}_\xi \varTheta _2^T)^{l-1}(\varTheta _2\varvec{x}\varvec{x}^T\varTheta _2^T+{\dot{\varTheta }}_1{\hat{\varSigma }}_\xi \varTheta _2^T+\varTheta _2{\hat{\varSigma }}_\xi {\dot{\varTheta }}_1^T)]}{\mathrm{tr}[(\varTheta _2{\hat{\varSigma }}_\xi \varTheta _2^T)^l]}. \end{aligned}$$

The Fréchet derivative of \(\log (\varPhi _l(M(\cdot )))\) at \(\xi \) in the direction of \(\delta _{\varvec{x}}\) is obtained by \(d(\varvec{x},\xi )=G(\xi ,\delta _{\varvec{x}}-\xi ).\) We need only replace the design matrix \(\varvec{x}\varvec{x}^T\) of the design \(\delta _{\varvec{x}}\) in the Gâteaux derivative of \(\log (\varPhi _l(M(\cdot )))\) with \(\varvec{x}\varvec{x}^T-{\hat{\varSigma }}_\xi \). A simple calculation shows that

$$\begin{aligned} d(\varvec{x},\xi )=\frac{\mathrm{tr}[(\varTheta _2{\hat{\varSigma }}_\xi \varTheta _2^T)^{l-1}(\varTheta _2\varvec{x}\varvec{x}^T\varTheta _2^T+{\dot{\varTheta }}_1{\hat{\varSigma }}_\xi \varTheta _2^T+\varTheta _2{\hat{\varSigma }}_\xi {\dot{\varTheta }}_1^T)]}{\mathrm{tr}[(\varTheta _2{\hat{\varSigma }}_\xi \varTheta _2^T)^l]}-1. \end{aligned}$$

\(\square \)

To implement the proposed algorithm, an approximation of \({\dot{\varTheta }}\) must be given first. When \(\mu =0\), which can happen if \(n\ge p\), we have \(\varTheta (\varepsilon )[{\hat{\varSigma }}_\xi +\varepsilon (\varvec{x}\varvec{x}^T-{\hat{\varSigma }}_\xi )]= I.\) Taking the derivative of both sides at \(\varepsilon =0\) gives

$$\begin{aligned} \varTheta (\varvec{x}\varvec{x}^T-{\hat{\varSigma }}_\xi )+{\dot{\varTheta }}{\hat{\varSigma }}_\xi =O. \end{aligned}$$

Right multiplying both sides of the above equation by \(\varTheta ^T\) gives

$$\begin{aligned} {\dot{\varTheta }}{\hat{\varSigma }}_\xi \varTheta ^T=\varTheta {\hat{\varSigma }}_\xi \varTheta ^T-\varTheta (\varvec{x}\varvec{x}^T)\varTheta ^T. \end{aligned}$$

Substituting the above equation into the expression for \(d(\varvec{x},\xi )\) in Theorem 1, the approximate form of d is given by

$$\begin{aligned} d(\varvec{x},\xi )=1-\frac{\mathrm{tr}[(\varTheta {\hat{\varSigma }}_\xi \varTheta ^T)^{l-1}(\varTheta \varvec{x}\varvec{x}^T\varTheta ^T)]}{\mathrm{tr}(\varTheta {\hat{\varSigma }}_\xi \varTheta ^T)^l}. \end{aligned}$$
(9)

In the remaining cases, the inverse of \({\hat{\varSigma }}_\xi \) does not exist, but the same form is still used with \(\varTheta \) restricted as in (3), analogous to the way it is used to approximate the asymptotic covariance matrix. Therefore, in the simulations we retain this approximation of d even when \(n<p\).
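
As a quick numerical check of (9), the sketch below builds a synthetic \({\hat{\varSigma }}_\xi \), sets \(\varTheta ={\hat{\varSigma }}_\xi ^{-1}\) (the \(\mu =0\) case, so that \(M(\xi )={\hat{\varSigma }}_\xi ^{-1}\)), evaluates (9), and compares the value with a finite-difference version of the Fréchet derivative that defines \(d(\varvec{x},\xi )\). The dimension, the choice \(l=2\) and the step size are illustrative assumptions.

```python
import numpy as np

# Sketch: evaluate the approximation (9) with mu = 0, where Theta = Sigma_hat^{-1}
# and M(xi) = Sigma_hat_xi^{-1}, and compare it with a finite-difference version of
# the Frechet derivative d(x, xi) along the path (1 - eps) * xi + eps * delta_x.
rng = np.random.default_rng(1)
p, l = 4, 2
F = rng.normal(size=(20, p))
Sigma = F.T @ F / 20                  # synthetic Sigma_hat_xi (positive definite)
x = rng.normal(size=p)
Theta = np.linalg.inv(Sigma)          # mu = 0 case

def log_phi_l(M):
    """log Phi_l(M) = (1/l) * log tr(M^l)."""
    return np.log(np.trace(np.linalg.matrix_power(M, l))) / l

# Formula (9)
MSig = Theta @ Sigma @ Theta.T
num = np.trace(np.linalg.matrix_power(MSig, l - 1) @ Theta @ np.outer(x, x) @ Theta.T)
d_formula = 1.0 - num / np.trace(np.linalg.matrix_power(MSig, l))

# Finite-difference Frechet derivative of log Phi_l(M(.)) in the direction delta_x - xi
eps = 1e-6
Sigma_eps = (1 - eps) * Sigma + eps * np.outer(x, x)
d_numeric = (log_phi_l(np.linalg.inv(Sigma_eps)) - log_phi_l(Theta)) / eps

print(d_formula, d_numeric)           # the two values should agree up to O(eps)
```

With \(\varTheta ={\hat{\varSigma }}_\xi ^{-1}\), (9) also simplifies to \(1-\varvec{x}^T{\hat{\varSigma }}_\xi ^{-(l+1)}\varvec{x}/\mathrm{tr}({\hat{\varSigma }}_\xi ^{-l})\), which is convenient for repeated evaluation.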

Proof of Theorem 2

Note that in Algorithm 1 the sequence \(\varPhi _l(M(\xi _t))>0\) is decreasing in t, so there must exist a non-negative real number \(\varPhi _l^*\) such that \(\lim _{t\rightarrow \infty }\varPhi _l(M(\xi _t))=\varPhi _l^*.\)

Without loss of generality, we assume that there exists a design \(\xi ^{**}\) such that \(\varPhi _l(M(\xi ^{**}))=\varPhi _l^*.\) We now argue by contradiction. If \(\xi ^{**}\) is not the optimal design, i.e., \(\varPhi _l(M(\xi ^*))<\varPhi _l^*\), then by Theorem 1 there exists an \(\varvec{x}^*\) such that \(d(\varvec{x}^*,\xi ^{**})<0\), that is, \(\inf _{\varvec{x}\in {\mathcal {X}}}d(\varvec{x},\xi ^{**})=-2\gamma <0\). Hence, there exists an N such that \(\inf _{\varvec{x}\in {\mathcal {X}}}d(\varvec{x},\xi _t)\le -\gamma \) for all \(t\ge N\). Let \({\widetilde{\xi }}_{t+1}(\alpha )=(1-\alpha )\xi _t+\alpha \delta _{\varvec{x}_t},\) where \(\varvec{x}_t=\arg \min _{\varvec{x}\in {\mathcal {X}}}d(\varvec{x},\xi _t)\), and let \(\alpha _{t+1}=\arg \min _{\alpha \in [0,1]}\varPhi _l[M({\widetilde{\xi }}_{t+1}(\alpha ))].\) A Taylor expansion of \(\varPhi _l[M({\widetilde{\xi }}_{t+1}(\alpha ))]\) at \(\alpha =0\) gives that

$$\begin{aligned}&\varPhi _l(M(\xi _{t+1}))-\varPhi _l(M(\xi _t))\nonumber \\&\quad \le \varPhi _l\{M[{\widetilde{\xi }}_{t+1}(\alpha _{t+1})]\}-\varPhi _l(M(\xi _t))\nonumber \\&\quad =\min _{\alpha \in [0,1]}\{\varPhi _l[M({\widetilde{\xi }}_{t+1}(\alpha ))]-\varPhi _l(M(\xi _t))\}\nonumber \\&\quad \le \min _{\alpha \in [0,1]}\left( \alpha \left. \frac{d\varPhi _l[M({\widetilde{\xi }}_{t+1}(\alpha ))]}{d\alpha }\right| _{\alpha =0}+{1\over 2}\alpha ^2U\right) , \end{aligned}$$
(10)

where U is an upper bound of the second-order derivative of \(\varPhi _l[M({\widetilde{\xi }}_{t+1}(\alpha ))]\) over [0, 1].

Note that in the proof of Theorem 1, we have

$$\begin{aligned} \left. \frac{d\varPhi _l[M({\widetilde{\xi }}_{t+1}(\alpha ))]}{d\alpha }\right| _{\alpha =0}=d(\varvec{x}_{t+1},\xi _{t+1}). \end{aligned}$$
(11)

By substituting (11) into (10), it follows that

$$\begin{aligned} \begin{aligned} \varPhi _l(M(\xi _{t+1}))-\varPhi _l(M(\xi _t))&\le \min _{\alpha \in [0,1]}(-\alpha \gamma +\frac{1}{2}\alpha ^2U)=-\frac{\gamma ^2}{2U}. \end{aligned} \end{aligned}$$
(12)

Therefore, for \(t>N\), we have that

$$\begin{aligned} \begin{aligned} \varPhi _l(M(\xi _t))-\varPhi _l(M(\xi _N))&=\sum _{k=N}^{t-1}[\varPhi _l(M(\xi _{k+1}))-\varPhi _l(M(\xi _k))]\le -(t-N)\frac{\gamma ^2}{2U}. \end{aligned} \end{aligned}$$
(13)

The left hand side tends to a finite number while the right hand side tends to \(-\infty \). This leads to a contradiction as \(t\rightarrow \infty \). \(\square \)
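
To make the iteration analysed in Theorem 2 concrete, the following sketch runs a vertex-direction update of this type on a small finite candidate set with \(\mu =0\), so that \(M(\xi )={\hat{\varSigma }}_\xi ^{-1}\) and \(d(\varvec{x},\xi )\) reduces to (9). It is only a schematic of the idea behind Algorithm 1: the candidate set (the full factorial \(\{-1,1\}^p\)), the value \(l=2\), the random starting design and the grid line search are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Sketch of the vertex-direction update from the proof of Theorem 2 with mu = 0,
# run on a finite candidate set.  At each step: find x_t minimising d(x, xi_t),
# then set xi_{t+1} = (1 - a) xi_t + a delta_{x_t}, with a chosen by line search.
rng = np.random.default_rng(2)
p, l = 3, 2
cand = np.array(np.meshgrid(*[[-1.0, 1.0]] * p)).reshape(p, -1).T   # 2^p candidate points

def sigma_hat(w):                  # Sigma_hat_xi for a weight vector w on cand
    return (cand * w[:, None]).T @ cand

def phi_l(w):                      # Phi_l(M(xi)) = [tr(Sigma_hat_xi^{-l})]^{1/l} when mu = 0
    S_inv = np.linalg.inv(sigma_hat(w))
    return np.trace(np.linalg.matrix_power(S_inv, l)) ** (1.0 / l)

def d_all(w):                      # (9) with Theta = Sigma_hat^{-1}: 1 - x'S^{-(l+1)}x / tr(S^{-l})
    S_inv = np.linalg.inv(sigma_hat(w))
    quad = np.einsum("ij,jk,ik->i", cand, np.linalg.matrix_power(S_inv, l + 1), cand)
    return 1.0 - quad / np.trace(np.linalg.matrix_power(S_inv, l))

w = rng.dirichlet(np.ones(len(cand)))              # random starting design xi_0
for t in range(2000):
    d = d_all(w)
    i = int(np.argmin(d))                          # x_t = argmin_x d(x, xi_t)
    if d[i] > -1e-6:                               # equivalence condition of Theorem 1
        break
    grid = np.linspace(0.0, 0.5, 51)               # alpha kept below 1 so Sigma_hat stays nonsingular
    vals = [phi_l((1.0 - a) * w + a * np.eye(len(cand))[i]) for a in grid]
    a = grid[int(np.argmin(vals))]
    w = (1.0 - a) * w + a * np.eye(len(cand))[i]   # xi_{t+1}; Phi_l(M(xi_t)) is non-increasing

print("iterations:", t, " Phi_l:", round(phi_l(w), 6), " min_x d(x, xi):", round(d_all(w).min(), 6))
```

The printed sequence of \(\varPhi _l\) values decreases monotonically, and the minimum of \(d(\varvec{x},\xi _t)\) over the candidate set approaches zero, which is exactly the behaviour used in the proof above.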

Proof of Theorem 3

Analogous to the proof of Theorem 2, here we need only prove that there exists a real number \(\gamma >0\) such that

$$\begin{aligned} \left. \frac{dg(\widetilde{\varvec{\omega }}^{(t+1)}(\alpha ))}{d\alpha }\right| _{\alpha =0}=-2\gamma <0. \end{aligned}$$

Note that if the left-hand side were greater than or equal to zero, \(\alpha =0\) would be a minimizer, which contradicts the original hypothesis that \(\alpha =0\) is not a minimizer. \(\square \)

Appendix B: The four designs in Example 2

See Tables 4, 5, 6 and 7.

Table 4 The seven-point \(\varPhi _l\)-optimal design (in transpose) in Example 2
Table 5 The seven-point D-optimal supersaturated design (in transpose) in Example 2
Table 6 The seven-point nearly orthogonal LHD (in transpose) in Example 2
Table 7 The seven-point spectral experimental design (in transpose) in Example 2


Cite this article

Huang, Y., Kong, X. & Ai, M. Optimal designs in sparse linear models. Metrika 83, 255–273 (2020). https://doi.org/10.1007/s00184-019-00722-9

