
Lasso Kriging for efficiently selecting a global trend model

Research Paper · Structural and Multidisciplinary Optimization

Abstract

Kriging has become increasingly popular as a method for constructing surrogate models in a wide variety of engineering applications. Universal Kriging is less appealing than ordinary Kriging when an informed decision can hardly be made about which variables to include for capturing the global trends in the responses. Penalized Blind Kriging (PBK) carries out model selection systematically by penalizing the likelihood function, which improves the predictive performance of a universal Kriging model. However, PBK requires an iterative algorithm that repeatedly solves a possibly time-consuming optimization problem for the varying optimal correlation coefficient vector. In this paper, Lasso Kriging (LK) is proposed to improve predictive performance while avoiding the iterative computation. LK selects the important variables by solving a Lasso problem with the LARS algorithm combined with cross-validation (CV). The one-standard-error rule is employed to compensate for penalizing the regression coefficients less heavily than PBK does. Given the selected important variables, the unknown Kriging parameters are estimated in the same manner as in universal Kriging. A linear and a nonlinear mathematical problem and seven highly nonlinear benchmark problems are used to demonstrate the effectiveness of LK with respect to model selection, predictive performance, and computational efficiency. LK proves to be an effective approach that improves predictive accuracy as much as PBK does while requiring only slightly more computation than universal Kriging.



References

  • Barba LA, Forsyth GF (2018) CFD Python: the 12 steps to Navier-Stokes equations. J Open Source Educ 1(9):1–3

  • Burnham KP, Anderson DR (2002) Model selection and multimodel inference: a practical information-theoretic approach, 2nd edn. Springer

  • Coelho RF, Lebon J, Bouillard P (2011) Hierarchical stochastic metamodels based on moving least squares and polynomial chaos expansion. Struct Multidiscip Optim 43(5):707–729

  • Constantine PG (2015) Active subspaces: emerging ideas for dimension reduction in parameter studies. Society for Industrial and Applied Mathematics, Philadelphia

  • Dwight R, Han ZH (2009) Efficient uncertainty quantification using gradient-enhanced kriging. 50th AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference, pp 1–23

  • Echard B, Gayton N, Lemaire M (2011) AK-MCS: an active learning reliability method combining Kriging and Monte Carlo simulation. Struct Saf 33(2):145–154

  • Efron B, Hastie T, Johnstone I, Tibshirani R (2004) Least angle regression. Ann Stat 32(2):407–499

  • Efron B, Hastie T (2016) Computer age statistical inference: algorithms, evidence, and data science. Cambridge University Press

  • Forsberg J, Nilsson L (2005) On polynomial response surfaces and kriging for use in structural optimization of crashworthiness. Struct Multidiscip Optim 29(3):232–243

  • Golub GH, Van Loan CF (1983) Matrix computations. Johns Hopkins University Press

  • Harrell FE (2001) Regression modeling strategies: with applications to linear models, logistic regression, and survival analysis. Springer

  • Hastie T, Tibshirani R, Friedman J (2008) The elements of statistical learning: data mining, inference, and prediction, 2nd edn. Springer

  • Hesterberg T, Choi NH, Meier L, Fraley C (2008) Least angle and l1 penalized regression: a review. Stat Surv 2:61–93

  • Hoerl AE, Kennard RW (1970) Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12(1):55–67

  • Huang D, Allen TT, Notz WI, Zheng N (2006) Global optimization of stochastic black-box systems via sequential kriging meta-models. J Glob Optim 34(3):441–466

  • Hung Y (2011) Penalized blind Kriging in computer experiments. Stat Sin 21(3):1171–1190

  • Joseph VR, Hung Y, Sudjianto A (2008) Blind Kriging: a new method for developing metamodels. ASME J Mech Design 130(3):1–8

  • Kumano T, Jeong S, Obayashi S, Ito Y, Hatanaka K, Morin H (2006) Multidisciplinary design optimization of wing shape for a small jet aircraft using kriging model. 44th AIAA Aerospace Sciences Meeting and Exhibit, pp 1–13

  • Liang H, Zhu M (2013) Comment on “Metamodeling method using dynamic Kriging for design optimization”. AIAA J 51(12):2988–2989

  • Liang H, Zhu M, Wu Z (2014) Using cross-validation to design trend function in Kriging surrogate modeling. AIAA J 52(10):2313–2327

  • Martin JD, Simpson TW (2005) On the use of Kriging models to approximate deterministic computer models. AIAA J 43(4):853–863

  • Regis RG, Shoemaker CA (2013) Combining radial basis function surrogates and dynamic coordinate search in high-dimensional expensive black-box optimization. Eng Optim 45(5):529–555

  • Sacks J, Welch WJ, Mitchell TJ, Wynn HP (1989) Design and analysis of computer experiments. Stat Sci 4(4):409–435

  • Schobi R, Sudret B, Wiart J (2015) Polynomial-chaos-based Kriging. Int J Uncertain Quantif 5(2):171–193

  • SciPy (2018) Scientific tools for Python. https://www.scipy.org, Release 1.2.0

  • Simpson TW, Mauery TM, Korte JJ, Mistree F (2001) Kriging models for global approximation in simulation-based multidisciplinary design optimization. AIAA J 39(12):2233–2241

  • Song H, Choi KK, Lamb D (2013a) A study on improving the accuracy of kriging models by using correlation model/mean structure selection and penalized log-likelihood function. 10th World Congress on Structural and Multidisciplinary Optimization, pp 1–10

  • Song H, Choi KK, Lee I, Zhao L, Lamb D (2013b) Adaptive virtual support vector machine for reliability analysis of high-dimensional problems. Struct Multidiscip Optim 47(4):479–491

  • Storn R, Price K (1997) Differential evolution: a simple and efficient heuristic for global optimization over continuous spaces. J Glob Optim 11(4):341–359

  • Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B 58(1):267–288

  • Willmes L, Baeck T, Jin Y, Sendhoff B (2003) Comparing neural networks and kriging for fitness approximation in evolutionary optimization. IEEE Congress on Evolutionary Computation, pp 663–670

  • Zhang Y, Park C, Kim NH, Haftka RT (2017) Function prediction at one inaccessible point using converging lines. J Mech Des 139(5):051402

  • Zhang Y, Yao W, Ye S, Chen X (2019) A regularization method for constructing trend function in Kriging model. Struct Multidiscip Optim 59(4):1221–1239

  • Zhang Y, Yao W, Chen X, Ye S (2020) A penalized blind likelihood Kriging method for surrogate modeling. Struct Multidiscip Optim 61(2):457–474

  • Zhao L, Choi K, Lee I (2011) Metamodeling method using dynamic Kriging for design optimization. AIAA J 49(9):2034–2046

  • Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc Ser B 67(2):301–320

  • Zou H (2006) The adaptive lasso and its oracle properties. J Am Stat Assoc 101:1418–1429


Author information


Corresponding author

Correspondence to Inseok Park.

Ethics declarations

Conflict of interest

The author declares that he has no conflict of interest.

Replication of results

The author wishes to withhold the Python source code used to obtain the results in Section 6 for commercialization purposes. However, the algorithms needed to replicate the results are presented in Sections 3.4 and 5 and in the Appendix, and the SciPy implementation of differential evolution is available online.
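As a rough illustration of the last point, SciPy's differential evolution can be used to maximize a Kriging likelihood over the correlation parameters. The sketch below is only a hypothetical example under simplifying assumptions: the function neg_log_likelihood, the Gaussian correlation model, the nugget, the bounds, and the random data are this sketch's own choices, not the withheld code of Section 6.

```python
# Hypothetical sketch: maximizing a Kriging likelihood with SciPy's differential
# evolution. neg_log_likelihood is a placeholder for the negative concentrated
# log-likelihood of the correlation parameters theta (ordinary Kriging form).
import numpy as np
from scipy.optimize import differential_evolution

def neg_log_likelihood(theta, X, y):
    # Gaussian correlation matrix with a small nugget for numerical stability.
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2 * theta).sum(axis=-1)
    R = np.exp(-d2) + 1e-10 * np.eye(n)
    L = np.linalg.cholesky(R)
    ones = np.ones((n, 1))
    Ri_y = np.linalg.solve(R, y)
    Ri_1 = np.linalg.solve(R, ones)
    beta = float(ones.T @ Ri_y) / float(ones.T @ Ri_1)   # generalized LS mean
    resid = y - beta
    sigma2 = float(resid.T @ np.linalg.solve(R, resid)) / n
    log_det_R = 2.0 * np.log(np.diag(L)).sum()
    return 0.5 * (n * np.log(sigma2) + log_det_R)

# Example call with random training data (illustrative only).
rng = np.random.default_rng(0)
X = rng.random((20, 3))
y = np.sin(X).sum(axis=1, keepdims=True)
bounds = [(1e-3, 1e2)] * X.shape[1]          # assumed search range for each theta_j
result = differential_evolution(lambda th: neg_log_likelihood(th, X, y),
                                bounds, seed=0, tol=1e-6)
print(result.x, result.fun)
```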

Additional information

Responsible Editor: Palaniappan Ramu

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Fundamental theory and algorithmic description of the LARS


Suppose we have n observed responses denoted by the response vector y = (y1, y2, …, yn)T and m input variables (regressors). Let x1, x2, …, xm be m column vectors of size n composing the n-by-m design matrix Xn×m = (x1, x2, …, xm), let the m regression coefficients be denoted by the coefficient vector β = (β1, β2, …, βm)T, and let the prediction vector be defined as \( \hat{\boldsymbol{\mu}}=\boldsymbol{X}\hat{\boldsymbol{\beta}} \) where \( \hat{\boldsymbol{\beta}} \) is the LARS coefficient vector fit to the observed data. The LARS procedure begins by setting all the coefficients to 0, \( {\hat{\boldsymbol{\beta}}}_0=\mathbf{0} \), and standardizing both X and y using (9). Then it identifies the variable most correlated with the current residual vector \( \mathbf{r}=\mathbf{y}-\hat{\boldsymbol{\mu}} \); at the initial step, \( {\hat{\mathbf{r}}}_0=\mathbf{y} \) since \( {\hat{\boldsymbol{\mu}}}_0=\boldsymbol{X}{\hat{\boldsymbol{\beta}}}_0=\mathbf{0} \). The correlations of the m variables with the current residual can be quantified by computing the correlation vector defined as

$$ \mathbf{c}\left(\hat{\boldsymbol{\mu}}\right)={\boldsymbol{X}}^{\mathrm{T}}\left(\mathbf{y}-\hat{\boldsymbol{\mu}}\right) $$
(30)

If there are only two variables (m = 2), as illustrated in Fig. 8, the current correlations of x1 and x2 with r are \( {c}_1\left({\hat{\boldsymbol{\mu}}}_0\right)={\mathbf{x}}_1^{\mathrm{T}}\mathbf{y} \) and \( {c}_2\left({\hat{\boldsymbol{\mu}}}_0\right)={\mathbf{x}}_2^{\mathrm{T}}\mathbf{y} \), respectively. Suppose that x1 is more correlated with the current residual \( {\hat{\mathbf{r}}}_0=\mathbf{y} \) than x2: \( \left|{c}_1\left({\hat{\boldsymbol{\mu}}}_0\right)\right|>\left|{c}_2\left({\hat{\boldsymbol{\mu}}}_0\right)\right| \). The geometrical meaning of this inequality is that the residual vector \( {\hat{\mathbf{r}}}_0 \) makes a smaller angle with x1 than with x2; in other words, x1 is closer to \( {\hat{\mathbf{r}}}_0 \) than x2 is. Next, since \( {c}_1\left({\hat{\boldsymbol{\mu}}}_0\right)>0 \) in the case shown in Fig. 8, the prediction vector \( {\hat{\boldsymbol{\mu}}}_0 \) is updated to \( {\hat{\boldsymbol{\mu}}}_1 \) by moving \( {\hat{\boldsymbol{\mu}}}_0 \) in the direction of x1, expressed as \( {\boldsymbol{\mu}}_1={\hat{\boldsymbol{\mu}}}_0+{\gamma}_1{\mathbf{x}}_1 \), until |c1(μ1)| = |c2(μ1)|: \( {\hat{\boldsymbol{\mu}}}_1={\hat{\boldsymbol{\mu}}}_0+{\hat{\gamma}}_1{\mathbf{x}}_1 \) where \( {\hat{\gamma}}_1 \) is the quantity that makes the residual vector \( {\hat{\mathbf{r}}}_1=\mathbf{y}-{\hat{\boldsymbol{\mu}}}_1 \) bisect the angle between x1 and x2. Next, the prediction vector moves along \( {\hat{\mathbf{r}}}_1 \), which is expressed by \( {\boldsymbol{\mu}}_2={\hat{\boldsymbol{\mu}}}_1+{\gamma}_2{\mathbf{u}}_2 \) where u2 is the unit vector pointing in the equiangular (least angle) direction lying along \( {\hat{\mathbf{r}}}_1 \). In the case of m = 2, \( {\hat{\gamma}}_2 \) is the value making \( {\hat{\boldsymbol{\mu}}}_2={\hat{\boldsymbol{\mu}}}_1+{\hat{\gamma}}_2{\mathbf{u}}_2={\overline{\mathbf{y}}}_2 \) where \( {\overline{\mathbf{y}}}_2 \) is the least squares fit of x1 and x2 to y. If m > 2, the equiangular vector u and the step size \( \hat{\gamma} \) are computed as follows:

Given the prediction vector \( {\hat{\boldsymbol{\mu}}}_{\mathcal{A}} \) where \( \mathcal{A} \) denotes the current active set, the correlations of m variables with the current residual \( \mathbf{y}-{\hat{\boldsymbol{\mu}}}_{\mathcal{A}} \) are computed using (30), which can be restated for each variable xj, j = 1, 2, …, m as

$$ {\hat{c}}_j={\mathbf{x}}_j^{\mathrm{T}}\left(\mathbf{y}-{\hat{\boldsymbol{\mu}}}_{\mathcal{A}}\right) $$
(31)

Each member in \( \mathcal{A} \) is an index for each variable having the largest absolute correlation,

$$ \hat{C}={\max}_j\left\{\left|{\hat{c}}_j\right|\right\}\ \mathrm{and}\ \mathcal{A}=\left\{j:\left|{\hat{c}}_j\right|=\hat{C}\right\} $$
(32)
Fig. 8 The LARS algorithm illustrated in the case of two input variables x1 and x2 (m = 2). The response is evaluated at two input points (n = 2): y = (y1, y2)T. Since m = n, \( \mathbf{y}={\overline{\mathbf{y}}}_2 \) where \( {\overline{\mathbf{y}}}_2 \) is the least squares fit of x1 and x2 to y

Also, \( \left|{\hat{c}}_j\right|<\hat{C} \) for \( j\in {\mathcal{A}}^{\complement } \) where \( {\mathcal{A}}^{\complement } \) is the complement of \( \mathcal{A} \). For the active set \( \mathcal{A} \), define matrix \( {\boldsymbol{X}}_{\mathcal{A}} \) as

$$ {\boldsymbol{X}}_{\mathcal{A}}={\left(\cdots {s}_j{\mathbf{x}}_j\cdots \right)}_{j\in \mathcal{A}} $$
(33)

where \( {s}_j=\operatorname{sign}\left({\hat{c}}_j\right)\in \left\{-1,1\right\} \) for each \( j\in \mathcal{A} \). Let

$$ {A}_{\mathcal{A}}={\left({\mathbf{1}}_{\mathcal{A}}^{\mathrm{T}}{\boldsymbol{g}}_{\mathcal{A}}^{-1}{\mathbf{1}}_{\mathcal{A}}\right)}^{-1/2}\ \mathrm{and}\ {\boldsymbol{g}}_{\mathcal{A}}={\boldsymbol{X}}_{\mathcal{A}}^{\mathrm{T}}{\boldsymbol{X}}_{\mathcal{A}} $$
(34)

where \( {\mathbf{1}}_{\mathcal{A}} \) is a vector of ones of which the size is \( \left|\mathcal{A}\right| \), which is the number of the elements in \( \mathcal{A} \). The equiangular vector \( {\mathbf{u}}_{\mathcal{A}} \) now can be computed using

$$ {\mathbf{u}}_{\mathcal{A}}={\boldsymbol{X}}_{\mathcal{A}}{\boldsymbol{w}}_{\mathcal{A}}\ \mathrm{and}\ {\boldsymbol{w}}_{\mathcal{A}}={A}_{\mathcal{A}}{\boldsymbol{g}}_{\mathcal{A}}^{-1}{\mathbf{1}}_{\mathcal{A}} $$
(35)

\( {\mathbf{u}}_{\mathcal{A}} \) is the unit vector having the equal inner product with each column of \( {\boldsymbol{X}}_{\mathcal{A}} \) as follows:

$$ {\boldsymbol{X}}_{\mathcal{A}}^{\mathrm{T}}{\mathbf{u}}_{\mathcal{A}}={A}_{\mathcal{A}}{\mathbf{1}}_{\mathcal{A}}\ \mathrm{and}\ {\left\Vert {\mathbf{u}}_{\mathcal{A}}\right\Vert}^2=1 $$
(36)

Equation (36) implies that the angle between each column vector sjxj of \( {\boldsymbol{X}}_{\mathcal{A}} \) and \( {\mathbf{u}}_{\mathcal{A}} \) is the same for every \( j\in \mathcal{A} \) and is less than 90° since \( {A}_{\mathcal{A}}>0 \). Next, the prediction vector \( {\hat{\boldsymbol{\mu}}}_{\mathcal{A}} \) is moved along \( {\mathbf{u}}_{\mathcal{A}} \) by the amount \( \hat{\gamma} \),

$$ {\hat{\boldsymbol{\mu}}}_{{\mathcal{A}}_{+}}={\hat{\boldsymbol{\mu}}}_{\mathcal{A}}+\hat{\gamma}{\mathbf{u}}_{\mathcal{A}} $$
(37)

\( \hat{\gamma} \) can be computed using

$$ \hat{\gamma}={\min^{+}}_{j\in {\mathcal{A}}^{\complement }}\left\{\frac{\hat{C}-{\hat{c}}_j}{A_{\mathcal{A}}-{a}_j},\frac{\hat{C}+{\hat{c}}_j}{A_{\mathcal{A}}+{a}_j}\right\} $$
(38)

where aj is a component of the inner product vector a for each \( j\in {\mathcal{A}}^{\complement } \). a is defined as

$$ \boldsymbol{a}={\boldsymbol{X}}^{\mathrm{T}}{\mathbf{u}}_{\mathcal{A}} $$
(39)

The “+” symbol in (38) signifies that only positive values are considered for each \( j\in {\mathcal{A}}^{\complement } \); \( \hat{\gamma} \) is the minimum of those positive values. The active set \( \mathcal{A} \) is then updated to \( {\mathcal{A}}_{+}=\mathcal{A}\cup \left\{\hat{j}\right\} \), where \( \hat{j} \) is the index attaining \( \hat{\gamma} \). Equation (38) implies that when γ reaches \( \hat{\gamma} \) with \( j=\hat{j} \), the correlation of \( {\mathbf{x}}_{\hat{j}} \) with the evolving residual \( {\mathbf{r}}_{\mathcal{A}}=\mathbf{y}-\left({\hat{\boldsymbol{\mu}}}_{\mathcal{A}}+\gamma {\mathbf{u}}_{\mathcal{A}}\right) \) first becomes equal to the equally declining absolute correlations of the variables \( {\mathbf{x}}_j \), \( j\in \mathcal{A} \), with \( {\mathbf{r}}_{\mathcal{A}} \). At the last step, (38) cannot be used because \( {\mathcal{A}}^{\complement }=\varnothing \); instead, with \( \mathcal{A}=\left\{1,2,\dots, m\right\} \), the LARS sets \( \hat{\gamma}=\hat{C}/{A}_{\mathcal{A}} \), which makes \( {\hat{\boldsymbol{\mu}}}_m=\boldsymbol{X}{\hat{\boldsymbol{\beta}}}_m={\overline{\mathbf{y}}}_m \), where \( {\hat{\boldsymbol{\mu}}}_m \) is the prediction vector at the last step and \( {\overline{\mathbf{y}}}_m \) is the OLS regression fit using all m variables. The LARS estimate \( {\hat{\boldsymbol{\beta}}}_m \) at the last step therefore equals the OLS coefficient estimate for the full set of m variables: \( {\hat{\boldsymbol{\beta}}}_m={\hat{\boldsymbol{\beta}}}_{OLS} \).
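To make one LARS step concrete, the quantities in (31)–(39) can be computed directly with NumPy. The sketch below is only an illustration: the function name and the assumption that X and y are already standardized are this sketch's own, not part of the paper.

```python
import numpy as np

def lars_step_quantities(X, y, mu_hat, active):
    """Compute the quantities in (31)-(39) for one LARS step.

    X      : (n, m) standardized design matrix
    y      : (n,)   standardized response
    mu_hat : (n,)   current prediction vector
    active : list of indices in the active set A
    """
    c_hat = X.T @ (y - mu_hat)                      # correlations, eq. (31)
    C_hat = np.max(np.abs(c_hat))                   # eq. (32)
    s = np.sign(c_hat[active])                      # signs s_j
    X_A = X[:, active] * s                          # eq. (33)
    g_A = X_A.T @ X_A                               # Gram matrix, eq. (34)
    g_inv_ones = np.linalg.solve(g_A, np.ones(len(active)))
    A_A = 1.0 / np.sqrt(np.sum(g_inv_ones))         # eq. (34)
    w_A = A_A * g_inv_ones                          # eq. (35)
    u_A = X_A @ w_A                                 # equiangular vector, eq. (35)
    a = X.T @ u_A                                   # inner products, eq. (39)

    inactive = [j for j in range(X.shape[1]) if j not in active]
    candidates = []
    for j in inactive:                              # eq. (38), positive terms only
        for val in ((C_hat - c_hat[j]) / (A_A - a[j]),
                    (C_hat + c_hat[j]) / (A_A + a[j])):
            if val > 0:
                candidates.append((val, j))
    # At the last step (A complement empty) the step size is C_hat / A_A.
    gamma_hat, j_hat = min(candidates) if candidates else (C_hat / A_A, None)
    return gamma_hat, j_hat, u_A
```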

The LARS can exactly fit the Lasso coefficient profiles only if a certain restriction is met. If this condition is violated, the original LARS should be modified to provide the exact Lasso solutions. The restriction that the Lasso puts on the LARS is that the sign of any nonzero coefficient estimate \( {\hat{\beta}}_j \) must agree with the sign of the corresponding correlation \( {\hat{c}}_j \) at any step,

$$ \operatorname{sign}\left({\hat{\beta}}_j\right)=\operatorname{sign}\left({\hat{c}}_j\right)={s}_j $$
(40)

This restriction must be checked throughout the procedure because the LARS does not impose it by itself. Let the vector \( \hat{\boldsymbol{d}} \) of size m be defined as

$$ {\hat{d}}_j=\left\{\begin{array}{cc}{s}_j{w}_{{\mathcal{A}}_j}& \mathrm{for}\ j\in \mathcal{A}\\ {}0& \mathrm{for}\ j\in {\mathcal{A}}^{\complement}\end{array}\begin{array}{c}\kern0.5em \\ {}\ \end{array}\right. $$
(41)

Define γj for each \( j\in \mathcal{A} \) as

$$ {\gamma}_j=-\frac{{\hat{\beta}}_j}{{\hat{d}}_j} $$
(42)

where \( {\hat{\beta}}_j \) is a component of the current LARS coefficient vector \( \hat{\boldsymbol{\beta}} \) for each \( j\in \mathcal{A} \), which makes \( {\hat{\boldsymbol{\mu}}}_{\mathcal{A}}=\boldsymbol{X}\hat{\boldsymbol{\beta}} \). Compute \( \overset{\sim }{\gamma } \) using

$$ \overset{\sim }{\gamma }=\underset{\gamma_j>0}{\min}\left\{{\gamma}_j\right\} $$
(43)

If there is no γj > 0, \( \overset{\sim }{\gamma } \) is set to infinity. Equation (43) implies that \( {\beta}_{\overset{\sim }{j}} \) is the first coefficient to change sign, at \( \gamma =\overset{\sim }{\gamma } \), where \( \overset{\sim }{j} \) is the index attaining the minimum. If this sign change occurs before γ reaches \( \hat{\gamma} \), then the increase of γ should be stopped at \( \overset{\sim }{\gamma } \) and \( \overset{\sim }{j} \) should be removed from the active set \( \mathcal{A} \). Algebraically, if \( \overset{\sim }{\gamma }<\hat{\gamma} \),

$$ {\hat{\boldsymbol{\mu}}}_{{\mathcal{A}}_{+}}={\hat{\boldsymbol{\mu}}}_{\mathcal{A}}+\overset{\sim }{\gamma }{\mathbf{u}}_{\mathcal{A}}\ \mathrm{and}\ {\mathcal{A}}_{+}=\mathcal{A}-\left\{\overset{\sim }{j}\right\} $$
(44)
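Under the same illustrative naming as the previous sketch, the sign-restriction check of (41)–(44) reduces to a few lines; again this is a hedged sketch rather than the paper's code.

```python
import numpy as np

def lasso_modification(beta_hat, w_A, s, active, gamma_hat):
    """Check the Lasso sign restriction, eqs. (41)-(44).

    beta_hat  : (m,) current LARS coefficient vector
    w_A, s    : weights and signs for the active set (same order as `active`)
    active    : list of active indices
    gamma_hat : LARS step size from eq. (38)
    Returns the step size to use and the index to drop (or None).
    """
    d_hat = np.zeros_like(beta_hat, dtype=float)
    d_hat[active] = s * w_A                                   # eq. (41)
    gammas = [(-beta_hat[j] / d_hat[j], j)                    # eq. (42)
              for j in active
              if d_hat[j] != 0 and -beta_hat[j] / d_hat[j] > 0]
    gamma_tilde, j_tilde = min(gammas) if gammas else (np.inf, None)  # eq. (43)
    if gamma_tilde < gamma_hat:          # eq. (44): stop early and drop j_tilde
        return gamma_tilde, j_tilde
    return gamma_hat, None
```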

Each step of the LARS with the Lasso modification is briefly described below; the notation is slightly modified to index each iteration.

  1. The procedure begins with standardizing X and y using (9). Set \( {\hat{\boldsymbol{\beta}}}_0=\mathbf{0} \), which leads to \( {\hat{\boldsymbol{\mu}}}_0=\mathbf{0} \).

  2. Compute \( {\hat{\mathbf{c}}}_0={\boldsymbol{X}}^{\mathrm{T}}\mathbf{y} \) and find \( {\hat{j}}_0 \) giving \( {\hat{C}}_0={\max}_j\left|{\hat{c}}_{0_j}\right| \) for j ∈ {1, 2, …, m}.

  3. Set k = 1. Define \( {\mathcal{A}}_1=\left\{{\hat{j}}_0\right\} \) and \( {\boldsymbol{X}}_{{\mathcal{A}}_1}={s}_{{\hat{j}}_0}{\mathbf{x}}_{{\hat{j}}_0} \) where \( {s}_{{\hat{j}}_0}=\operatorname{sign}\left({\hat{c}}_{0_{\hat{j}}}\right) \).

  4. Compute \( {\boldsymbol{g}}_{{\mathcal{A}}_k}={\boldsymbol{X}}_{{\mathcal{A}}_k}^{\mathrm{T}}{\boldsymbol{X}}_{{\mathcal{A}}_k} \), \( {A}_{{\mathcal{A}}_k}={\left({\mathbf{1}}_{{\mathcal{A}}_k}^{\mathrm{T}}{\boldsymbol{g}}_{{\mathcal{A}}_k}^{-1}{\mathbf{1}}_{{\mathcal{A}}_k}\right)}^{-1/2} \), and \( {\boldsymbol{w}}_{{\mathcal{A}}_k}={A}_{{\mathcal{A}}_k}{\boldsymbol{g}}_{{\mathcal{A}}_k}^{-1}{\mathbf{1}}_{{\mathcal{A}}_k} \). Define the equiangular vector \( {\mathbf{u}}_k={\boldsymbol{X}}_{{\mathcal{A}}_k}{\boldsymbol{w}}_{{\mathcal{A}}_k} \). If k = 1, \( {\mathbf{u}}_k={s}_{{\hat{j}}_0}{\mathbf{x}}_{{\hat{j}}_0}/\left\Vert {\mathbf{x}}_{{\hat{j}}_0}\right\Vert \).

  5. Given the inner product vector ak = XTuk, compute the step size \( {\hat{\gamma}}_k=\underset{j\in {\mathcal{A}}_k^{\complement }}{\min^{+}}\left\{\frac{{\hat{C}}_{k-1}-{\hat{c}}_{k-{1}_j}}{A_{{\mathcal{A}}_k}-{a}_{k_j}},\frac{{\hat{C}}_{k-1}+{\hat{c}}_{k-{1}_j}}{A_{{\mathcal{A}}_k}+{a}_{k_j}}\right\} \). Determine the index \( {\hat{j}}_k \) corresponding to \( {\hat{\gamma}}_k \). If \( {\mathcal{A}}_k^{\complement }=\varnothing \), \( {\hat{\gamma}}_k={\hat{C}}_{k-1}/{A}_{{\mathcal{A}}_k} \).

  6. Given the vector \( {\hat{\boldsymbol{d}}}_k \) equaling \( {s}_j{w}_{{\mathcal{A}}_{k_j}} \) for \( j\in {\mathcal{A}}_k \), where \( {s}_j=\operatorname{sign}\left({\hat{c}}_{k-{1}_j}\right) \), and 0 elsewhere, compute the Lasso modification step size \( {\overset{\sim }{\gamma}}_k=\underset{\gamma_{k_j}>0}{\min}\left\{{\gamma}_{k_j}\right\} \) where \( {\gamma}_{k_j}=-\frac{{\hat{\beta}}_{k-{1}_j}}{{\hat{d}}_{k_j}} \) for \( j\in {\mathcal{A}}_k \). Determine the index \( \tilde{j}_{k} \) corresponding to \( {\overset{\sim }{\gamma}}_k \).

  7. (a) If \( {\hat{\gamma}}_k\le {\overset{\sim }{\gamma}}_k \), compute the prediction vector \( {\hat{\boldsymbol{\mu}}}_k={\hat{\boldsymbol{\mu}}}_{k-1}+{\hat{\gamma}}_k{\mathbf{u}}_k \) and the LARS coefficient vector \( {\hat{\boldsymbol{\beta}}}_k={\hat{\boldsymbol{\beta}}}_{k-1}+{\hat{\gamma}}_k{\hat{\boldsymbol{d}}}_k \). If \( {\mathcal{A}}_k^{\complement }=\varnothing \), return \( \left\{{\hat{\boldsymbol{\beta}}}_0,{\hat{\boldsymbol{\beta}}}_1,\dots, {\hat{\boldsymbol{\beta}}}_{k_{max}}\right\} \) and stop; otherwise, update the active set, \( {\mathcal{A}}_{k+1}={\mathcal{A}}_k\cup \left\{{\hat{j}}_k\right\} \).
     (b) If \( {\hat{\gamma}}_k>{\overset{\sim }{\gamma}}_k \), compute \( {\hat{\boldsymbol{\mu}}}_k={\hat{\boldsymbol{\mu}}}_{k-1}+{\overset{\sim }{\gamma}}_k{\mathbf{u}}_k \) and \( {\hat{\boldsymbol{\beta}}}_k={\hat{\boldsymbol{\beta}}}_{k-1}+{\overset{\sim }{\gamma}}_k{\hat{\boldsymbol{d}}}_k \). Update the active set, \( {\mathcal{A}}_{k+1}={\mathcal{A}}_k-\left\{\tilde{j}_{k}\right\} \).

  8. Compute the correlation vector \( {\hat{\mathbf{c}}}_k={\boldsymbol{X}}^{\mathrm{T}}\left(\mathbf{y}-{\hat{\boldsymbol{\mu}}}_k\right) \) and \( {\hat{C}}_k=\underset{j}{\max}\left\{\left|{\hat{c}}_{k_j}\right|\right\} \) for j ∈ {1, 2, …, m}. If \( {\hat{C}}_k=0 \), return \( \left\{{\hat{\boldsymbol{\beta}}}_0,{\hat{\boldsymbol{\beta}}}_1,\dots, {\hat{\boldsymbol{\beta}}}_{k_{max}}\right\} \) and stop; otherwise, update \( {\boldsymbol{X}}_{{\mathcal{A}}_{k+1}}={\left(\cdots {s}_j{\mathbf{x}}_j\cdots \right)}_{j\in {\mathcal{A}}_{k+1}} \) where \( {s}_j=\operatorname{sign}\left({\hat{c}}_{k_j}\right) \), set k = k + 1, and go to Step 4.
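For concreteness, the eight steps can be assembled into a short, illustrative Python implementation. This is only a minimal sketch under simplifying assumptions (it re-solves the Gram system at every iteration instead of updating a Cholesky factorization, and the function and variable names are chosen here for illustration); it is not the withheld code of Section 6.

```python
import numpy as np

def lars_lasso_path(X, y, max_iter=500):
    """Minimal LARS with the Lasso modification (Steps 1-8 above).

    X : (n, m) design matrix, y : (n,) response.
    Returns the list of coefficient vectors beta_0, ..., beta_kmax
    on the standardized scale.
    """
    n, m = X.shape
    # Step 1: standardize X (zero mean, unit-norm columns) and center y.
    Xc = X - X.mean(axis=0)
    X = Xc / np.linalg.norm(Xc, axis=0)
    y = y - y.mean()
    beta, mu = np.zeros(m), np.zeros(n)
    betas = [beta.copy()]

    # Steps 2-3: first active variable.
    c = X.T @ y
    active = [int(np.argmax(np.abs(c)))]

    for _ in range(max_iter):
        # Step 4: equiangular direction for the current active set.
        s = np.sign(c[active])
        X_A = X[:, active] * s
        g_inv_ones = np.linalg.solve(X_A.T @ X_A, np.ones(len(active)))
        A_A = 1.0 / np.sqrt(g_inv_ones.sum())
        w_A = A_A * g_inv_ones
        u = X_A @ w_A

        # Step 5: LARS step size, eq. (38) (minimum positive candidate).
        C_hat = np.max(np.abs(c))
        a = X.T @ u
        inactive = [j for j in range(m) if j not in active]
        cand = [(v, j) for j in inactive
                for v in ((C_hat - c[j]) / (A_A - a[j]),
                          (C_hat + c[j]) / (A_A + a[j])) if v > 1e-12]
        gamma_hat, j_hat = min(cand) if cand else (C_hat / A_A, None)

        # Step 6: Lasso sign-change step size, eqs. (41)-(43).
        d = np.zeros(m)
        d[active] = s * w_A
        sign_cand = [(-beta[j] / d[j], j) for j in active
                     if d[j] != 0 and -beta[j] / d[j] > 1e-12]
        gamma_tilde, j_tilde = min(sign_cand) if sign_cand else (np.inf, None)

        # Step 7: move the prediction and coefficients, update the active set.
        if gamma_hat <= gamma_tilde:
            mu += gamma_hat * u
            beta += gamma_hat * d
            if j_hat is None:            # A complement empty: full OLS fit reached
                betas.append(beta.copy())
                break
            active.append(j_hat)
        else:                            # eq. (44): drop the sign-changing index
            mu += gamma_tilde * u
            beta += gamma_tilde * d
            active.remove(j_tilde)
        betas.append(beta.copy())

        # Step 8: recompute correlations; stop once the residual is uncorrelated.
        c = X.T @ (y - mu)
        if np.max(np.abs(c)) < 1e-12:
            break
    return betas
```

An optimized implementation of essentially the same procedure is available as lars_path(X, y, method='lasso') in sklearn.linear_model, which can serve as a cross-check for the sketch above.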

The sequence of computed LARS coefficient vectors \( {\left\{{\hat{\boldsymbol{\beta}}}_k\right\}}_0^{k_{max}} \) provides the Lasso solutions of the coefficient vector β at \( {\hat{t}}_k={\left\Vert {\hat{\boldsymbol{\beta}}}_k\right\Vert}_1={\sum}_{j=1}^m\left|{\hat{\beta}}_{k_j}\right| \), k = 0, 1, …, kmax. The entire coefficient profiles are then obtained simply by connecting the consecutive points \( {\hat{\beta}}_{k_j} \) and \( {\hat{\beta}}_{{\left(k+1\right)}_j} \), k = 0, 1, …, kmax − 1, for each j along the axis of \( t={\sum}_{j=1}^m\left|{\beta}_j\right| \).

The computational complexity of the LARS is dominated by inverting \( {\boldsymbol{g}}_{{\mathcal{A}}_k}={\boldsymbol{X}}_{{\mathcal{A}}_k}^{\mathrm{T}}{\boldsymbol{X}}_{{\mathcal{A}}_k} \) in Step 4 each time the active set \( {\mathcal{A}}_k \) is updated. Since \( {\boldsymbol{X}}_{{\mathcal{A}}_k} \) is formed by adding the column \( {s}_{{\hat{j}}_{k-1}}{\mathbf{x}}_{{\hat{j}}_{k-1}} \) to \( {\boldsymbol{X}}_{{\mathcal{A}}_{k-1}} \) or deleting \( {s}_{{\tilde{j}}_{k-1}}{\mathbf{x}}_{{\tilde{j}}_{k-1}} \) from it, \( {\boldsymbol{g}}_{{\mathcal{A}}_k}^{-1} \) can be obtained efficiently by updating the Cholesky factorization of \( {\boldsymbol{g}}_{{\mathcal{A}}_{k-1}} \) computed at the previous iteration (Golub and Van Loan 1983). With this Cholesky updating, fitting the entire Lasso coefficient profiles requires computation of the same order as computing the OLS coefficient estimates with all the variables, unless the Lasso restriction is violated repeatedly, which considerably increases the number of iterations.
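For completeness, appending a signed column to \( {\boldsymbol{X}}_{\mathcal{A}} \) enlarges \( {\boldsymbol{g}}_{\mathcal{A}} \) by one row and column, and the corresponding Cholesky factor can be extended with a single triangular solve instead of refactoring from scratch. The sketch below is a generic illustration of that standard update (not code from the paper); it assumes a non-empty current factor L.

```python
import numpy as np
from scipy.linalg import solve_triangular

def chol_append_column(L, X_A, x_new):
    """Extend the lower Cholesky factor L of g_A = X_A^T X_A when the signed
    column x_new is appended to X_A, avoiding a full refactorization.
    """
    b = X_A.T @ x_new                       # new off-diagonal block of g_A
    c = x_new @ x_new                       # new diagonal entry of g_A
    w = solve_triangular(L, b, lower=True)  # one triangular solve: L w = b
    d = np.sqrt(c - w @ w)                  # new diagonal entry of the factor
    k = L.shape[0]
    L_new = np.zeros((k + 1, k + 1))
    L_new[:k, :k] = L
    L_new[k, :k] = w
    L_new[k, k] = d
    return L_new
```

Deleting a column similarly reduces to removing one row and column of the factor and re-triangularizing the trailing block, as described by Golub and Van Loan (1983).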


About this article


Cite this article

Park, I. Lasso Kriging for efficiently selecting a global trend model. Struct Multidisc Optim 64, 1527–1543 (2021). https://doi.org/10.1007/s00158-021-02939-7

