
Lasso Kriging for efficiently selecting a global trend model

Research Paper · Structural and Multidisciplinary Optimization

Abstract

Kriging has become increasingly popular as a method for constructing surrogate models in a wide variety of engineering applications. Universal Kriging is less appealing than ordinary Kriging when an informed decision can hardly be made about which variables to include for capturing the global trends in the responses. Penalized Blind Kriging (PBK) carries out model selection systematically by penalizing the likelihood function, which improves the predictive performance of a universal Kriging model. However, PBK requires an iterative algorithm that repeatedly solves a possibly time-consuming optimization problem for the varying optimal correlation coefficient vector. In this paper, Lasso Kriging (LK) is proposed to improve predictive performance while avoiding the iterative computation. LK selects the important variables by solving a Lasso problem with the LARS algorithm combined with cross-validation (CV). The one-standard-error rule is employed to compensate for penalizing the regression coefficients less heavily than PBK does. Given the selected important variables, the unknown Kriging parameters are estimated in the same manner as in universal Kriging. A linear and a nonlinear mathematical problem and seven highly nonlinear benchmark problems are used to demonstrate the effectiveness of LK with respect to model selection, predictive performance, and computational efficiency. LK proves to be an effective approach that improves predictive accuracy as much as PBK does while requiring only slightly more computation than universal Kriging.



References

  • Barba LA, Forsyth GF (2018) CFD Python: the 12 steps to Navier-Stokes equations. J Open Source Educ 1(9):1–3

  • Burnham KP, Anderson DR (2002) Model selection and multimodel inference: a practical information-theoretic approach, 2nd edn. Springer

  • Coelho RF, Lebon J, Bouillard P (2011) Hierarchical stochastic metamodels based on moving least squares and polynomial chaos expansion. Struct Multidiscip Optim 43(5):707–729

  • Constantine PG (2015) Active subspaces: emerging ideas for dimension reduction in parameter studies. Society for Industrial and Applied Mathematics, Philadelphia

  • Dwight R, Han ZH (2009) Efficient uncertainty quantification using gradient-enhanced kriging. 50th AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference, pp 1–23

  • Echard B, Gayton N, Lemaire M (2011) AK-MCS: an active learning reliability method combining Kriging and Monte Carlo simulation. Struct Saf 33(2):145–154

  • Efron B, Hastie T, Johnstone I, Tibshirani R (2004) Least angle regression. Ann Stat 32(2):407–499

  • Efron B, Hastie T (2016) Computer age statistical inference: algorithms, evidence, and data science. Cambridge University Press

  • Forsberg J, Nilsson L (2005) On polynomial response surfaces and kriging for use in structural optimization of crashworthiness. Struct Multidiscip Optim 29(3):232–243

  • Golub GH, Van Loan CF (1983) Matrix computations. Johns Hopkins University Press

  • Harrell FE (2001) Regression modeling strategies: with applications to linear models, logistic regression, and survival analysis. Springer

  • Hastie T, Tibshirani R, Friedman J (2008) The elements of statistical learning: data mining, inference, and prediction, 2nd edn. Springer

  • Hesterberg T, Choi NH, Meier L, Fraley C (2008) Least angle and l1 penalized regression: a review. Stat Surv 2:61–93

  • Hoerl AE, Kennard RW (1970) Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12(1):55–67

  • Huang D, Allen TT, Notz WI, Zheng N (2006) Global optimization of stochastic black-box systems via sequential kriging meta-models. J Glob Optim 34(3):441–466

  • Hung Y (2011) Penalized blind Kriging in computer experiments. Stat Sin 21(3):1171–1190

  • Joseph VR, Hung Y, Sudjianto A (2008) Blind Kriging: a new method for developing metamodels. ASME J Mech Design 130(3):1–8

  • Kumano T, Jeong S, Obayashi S, Ito Y, Hatanaka K, Morin H (2006) Multidisciplinary design optimization of wing shape for a small jet aircraft using kriging model. 44th AIAA Aerospace Sciences Meeting and Exhibit, pp 1–13

  • Liang H, Zhu M (2013) Comment on “Metamodeling method using dynamic Kriging for design optimization”. AIAA J 51(12):2988–2989

  • Liang H, Zhu M, Wu Z (2014) Using cross-validation to design trend function in Kriging surrogate modeling. AIAA J 52(10):2313–2327

  • Martin JD, Simpson TW (2005) On the use of Kriging models to approximate deterministic computer models. AIAA J 43(4):853–863

  • Regis RG, Shoemaker CA (2013) Combining radial basis function surrogates and dynamic coordinate search in high-dimensional expensive black-box optimization. Eng Optim 45(5):529–555

  • Sacks J, Welch WJ, Mitchell TJ, Wynn HP (1989) Design and analysis of computer experiments. Stat Sci 4(4):409–435

  • Schobi R, Sudret B, Wiart J (2015) Polynomial-chaos-based Kriging. Int J Uncertain Quantif 5(2):171–193

  • SciPy (2018) Scientific tools for Python. https://www.scipy.org, Release 1.2.0

  • Simpson TW, Mauery TM, Korte JJ, Mistree F (2001) Kriging models for global approximation in simulation-based multidisciplinary design optimization. AIAA J 39(12):2233–2241

  • Song H, Choi KK, Lamb D (2013a) A study on improving the accuracy of kriging models by using correlation model/mean structure selection and penalized log-likelihood function. 10th World Congress on Structural and Multidisciplinary Optimization, pp 1–10

  • Song H, Choi KK, Lee I, Zhao L, Lamb D (2013b) Adaptive virtual support vector machine for reliability analysis of high-dimensional problems. Struct Multidiscip Optim 47(4):479–491

  • Storn R, Price K (1997) Differential evolution: a simple and efficient heuristic for global optimization over continuous spaces. J Glob Optim 11(4):341–359

  • Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B 58(1):267–288

  • Willmes L, Baeck T, Jin Y, Sendhoff B (2003) Comparing neural networks and kriging for fitness approximation in evolutionary optimization. IEEE Congress on Evolutionary Computation, pp 663–670

  • Zhang Y, Park C, Kim NH, Haftka RT (2017) Function prediction at one inaccessible point using converging lines. J Mech Des 139(5):051402

  • Zhang Y, Yao W, Ye S, Chen X (2019) A regularization method for constructing trend function in Kriging model. Struct Multidiscip Optim 59(4):1221–1239

  • Zhang Y, Yao W, Chen X, Ye S (2020) A penalized blind likelihood Kriging method for surrogate modeling. Struct Multidiscip Optim 61(2):457–474

  • Zhao L, Choi K, Lee I (2011) Metamodeling method using dynamic Kriging for design optimization. AIAA J 49(9):2034–2046

  • Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc Ser B 67(2):301–320

  • Zou H (2006) The adaptive lasso and its oracle properties. J Am Stat Assoc 101:1418–1429


Author information


Corresponding author

Correspondence to Inseok Park.

Ethics declarations

Conflict of interest

The author declares that he has no conflict of interest.

Replication of results

The author wishes to withhold the Python source code used to obtain the results in Section 6 for commercialization purposes. However, the algorithms needed to replicate the results are presented in Sections 3.4 and 5 and in the Appendix, and the SciPy implementation of differential evolution is available online.
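As a rough illustration of the last point, SciPy's differential evolution can be used to maximize a Kriging likelihood over the correlation parameters. The sketch below is only a hypothetical example under simplifying assumptions: the function neg_log_likelihood, the Gaussian correlation model, the nugget, the bounds, and the random data are this sketch's own choices, not the withheld code of Section 6.

```python
# Hypothetical sketch: maximizing a Kriging likelihood with SciPy's differential
# evolution. neg_log_likelihood is a placeholder for the negative concentrated
# log-likelihood of the correlation parameters theta (ordinary Kriging form).
import numpy as np
from scipy.optimize import differential_evolution

def neg_log_likelihood(theta, X, y):
    # Gaussian correlation matrix with a small nugget for numerical stability.
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2 * theta).sum(axis=-1)
    R = np.exp(-d2) + 1e-10 * np.eye(n)
    L = np.linalg.cholesky(R)
    ones = np.ones((n, 1))
    Ri_y = np.linalg.solve(R, y)
    Ri_1 = np.linalg.solve(R, ones)
    beta = float(ones.T @ Ri_y) / float(ones.T @ Ri_1)   # generalized LS mean
    resid = y - beta
    sigma2 = float(resid.T @ np.linalg.solve(R, resid)) / n
    log_det_R = 2.0 * np.log(np.diag(L)).sum()
    return 0.5 * (n * np.log(sigma2) + log_det_R)

# Example call with random training data (illustrative only).
rng = np.random.default_rng(0)
X = rng.random((20, 3))
y = np.sin(X).sum(axis=1, keepdims=True)
bounds = [(1e-3, 1e2)] * X.shape[1]          # assumed search range for each theta_j
result = differential_evolution(lambda th: neg_log_likelihood(th, X, y),
                                bounds, seed=0, tol=1e-6)
print(result.x, result.fun)
```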

Additional information

Responsible Editor: Palaniappan Ramu

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Fundamental theory and algorithmic description of the LARS


Suppose we have n observed responses denoted by the response vector y = (y1, y2, …, yn)T and m input variables (regressors). Let x1, x2, …, xm be m column vectors of size n composing the n-by-m design matrix Xn×m = (x1, x2, …, xm), let the m regression coefficients be denoted by the coefficient vector β = (β1, β2, …, βm)T, and let the prediction vector be defined as \( \hat{\boldsymbol{\mu}}=\boldsymbol{X}\hat{\boldsymbol{\beta}} \) where \( \hat{\boldsymbol{\beta}} \) is the LARS coefficient vector fit to the observed data. The LARS procedure begins by setting all the coefficients to 0, \( {\hat{\boldsymbol{\beta}}}_0=\mathbf{0} \), and standardizing both X and y using (9). Then it identifies the variable most correlated with the current residual vector \( \mathbf{r}=\mathbf{y}-\hat{\boldsymbol{\mu}} \); at the initial step, \( {\hat{\mathbf{r}}}_0=\mathbf{y} \) since \( {\hat{\boldsymbol{\mu}}}_0=\boldsymbol{X}{\hat{\boldsymbol{\beta}}}_0=\mathbf{0} \). The correlations of the m variables with the current residual can be quantified by computing the correlation vector defined as

$$ \mathbf{c}\left(\hat{\boldsymbol{\mu}}\right)={\boldsymbol{X}}^{\mathrm{T}}\left(\mathbf{y}-\hat{\boldsymbol{\mu}}\right) $$
(30)

If there are only two variables (m = 2), as illustrated in Fig. 8, the current correlations of x1 and x2 with r are \( {c}_1\left({\hat{\boldsymbol{\mu}}}_0\right)={\mathbf{x}}_1^{\mathrm{T}}\mathbf{y} \) and \( {c}_2\left({\hat{\boldsymbol{\mu}}}_0\right)={\mathbf{x}}_2^{\mathrm{T}}\mathbf{y} \), respectively. Suppose that x1 is more correlated with the current residual \( {\hat{\mathbf{r}}}_0=\mathbf{y} \) than x2: \( \left|{c}_1\left({\hat{\boldsymbol{\mu}}}_0\right)\right|>\left|{c}_2\left({\hat{\boldsymbol{\mu}}}_0\right)\right| \). The geometrical meaning of this inequality is that the residual vector \( {\hat{\mathbf{r}}}_0 \) makes a smaller angle with x1 than with x2; in other words, x1 is closer to \( {\hat{\mathbf{r}}}_0 \) than x2 is. Next, since \( {c}_1\left({\hat{\boldsymbol{\mu}}}_0\right)>0 \) in the case shown in Fig. 8, the prediction vector \( {\hat{\boldsymbol{\mu}}}_0 \) is updated to \( {\hat{\boldsymbol{\mu}}}_1 \) by moving \( {\hat{\boldsymbol{\mu}}}_0 \) in the direction of x1, expressed as \( {\boldsymbol{\mu}}_1={\hat{\boldsymbol{\mu}}}_0+{\gamma}_1{\mathbf{x}}_1 \), until |c1(μ1)| = |c2(μ1)|: \( {\hat{\boldsymbol{\mu}}}_1={\hat{\boldsymbol{\mu}}}_0+{\hat{\gamma}}_1{\mathbf{x}}_1 \) where \( {\hat{\gamma}}_1 \) is the quantity that makes the residual vector \( {\hat{\mathbf{r}}}_1=\mathbf{y}-{\hat{\boldsymbol{\mu}}}_1 \) bisect the angle between x1 and x2. Next, the prediction vector moves along \( {\hat{\mathbf{r}}}_1 \), which is expressed by \( {\boldsymbol{\mu}}_2={\hat{\boldsymbol{\mu}}}_1+{\gamma}_2{\mathbf{u}}_2 \) where u2 is the unit vector pointing in the equiangular (least angle) direction lying along \( {\hat{\mathbf{r}}}_1 \). In the case of m = 2, \( {\hat{\gamma}}_2 \) is the value making \( {\hat{\boldsymbol{\mu}}}_2={\hat{\boldsymbol{\mu}}}_1+{\hat{\gamma}}_2{\mathbf{u}}_2={\overline{\mathbf{y}}}_2 \) where \( {\overline{\mathbf{y}}}_2 \) is the least squares fit of x1 and x2 to y. If m > 2, the equiangular vector u and the step size \( \hat{\gamma} \) are computed as follows:

Given the prediction vector \( {\hat{\boldsymbol{\mu}}}_{\mathcal{A}} \) where \( \mathcal{A} \) denotes the current active set, the correlations of m variables with the current residual \( \mathbf{y}-{\hat{\boldsymbol{\mu}}}_{\mathcal{A}} \) are computed using (30), which can be restated for each variable xj, j = 1, 2, …, m as

$$ {\hat{c}}_j={\mathbf{x}}_j^{\mathrm{T}}\left(\mathbf{y}-{\hat{\boldsymbol{\mu}}}_{\mathcal{A}}\right) $$
(31)

Each member in \( \mathcal{A} \) is an index for each variable having the largest absolute correlation,

$$ \hat{C}={\max}_j\left\{\left|{\hat{c}}_j\right|\right\}\ \mathrm{and}\ \mathcal{A}=\left\{j:\left|{\hat{c}}_j\right|=\hat{C}\right\} $$
(32)
Fig. 8 The LARS algorithm illustrated in the case of two input variables x1 and x2 (m = 2). The response is evaluated at two input points (n = 2): y = (y1, y2)T. Since m = n, \( \mathbf{y}={\overline{\mathbf{y}}}_2 \) where \( {\overline{\mathbf{y}}}_2 \) is the least squares fit of x1 and x2 to y

Also, \( \left|{\hat{c}}_j\right|<\hat{C} \) for \( j\in {\mathcal{A}}^{\complement } \) where \( {\mathcal{A}}^{\complement } \) is the complement of \( \mathcal{A} \). For the active set \( \mathcal{A} \), define matrix \( {\boldsymbol{X}}_{\mathcal{A}} \) as

$$ {\boldsymbol{X}}_{\mathcal{A}}={\left(\cdots {s}_j{\mathbf{x}}_j\cdots \right)}_{j\in \mathcal{A}} $$
(33)

where \( {s}_j=\operatorname{sign}\left({\hat{c}}_j\right)\in \left\{-1,1\right\} \) for each \( j\in \mathcal{A} \). Let

$$ {A}_{\mathcal{A}}={\left({\mathbf{1}}_{\mathcal{A}}^{\mathrm{T}}{\boldsymbol{g}}_{\mathcal{A}}^{-1}{\mathbf{1}}_{\mathcal{A}}\right)}^{-1/2}\ \mathrm{and}\ {\boldsymbol{g}}_{\mathcal{A}}={\boldsymbol{X}}_{\mathcal{A}}^{\mathrm{T}}{\boldsymbol{X}}_{\mathcal{A}} $$
(34)

where \( {\mathbf{1}}_{\mathcal{A}} \) is a vector of ones of which the size is \( \left|\mathcal{A}\right| \), which is the number of the elements in \( \mathcal{A} \). The equiangular vector \( {\mathbf{u}}_{\mathcal{A}} \) now can be computed using

$$ {\mathbf{u}}_{\mathcal{A}}={\boldsymbol{X}}_{\mathcal{A}}{\boldsymbol{w}}_{\mathcal{A}}\ \mathrm{and}\ {\boldsymbol{w}}_{\mathcal{A}}={A}_{\mathcal{A}}{\boldsymbol{g}}_{\mathcal{A}}^{-1}{\mathbf{1}}_{\mathcal{A}} $$
(35)

\( {\mathbf{u}}_{\mathcal{A}} \) is the unit vector having the equal inner product with each column of \( {\boldsymbol{X}}_{\mathcal{A}} \) as follows:

$$ {\boldsymbol{X}}_{\mathcal{A}}^{\mathrm{T}}{\mathbf{u}}_{\mathcal{A}}={A}_{\mathcal{A}}{\mathbf{1}}_{\mathcal{A}}\ \mathrm{and}\ {\left\Vert {\mathbf{u}}_{\mathcal{A}}\right\Vert}^2=1 $$
(36)

Equation (36) implies that the angle between each column vector sjxj of \( {\boldsymbol{X}}_{\mathcal{A}} \) and \( {\mathbf{u}}_{\mathcal{A}} \) is the same for every \( j\in \mathcal{A} \) and is less than 90° since \( {A}_{\mathcal{A}}>0 \). Next, the prediction vector \( {\hat{\boldsymbol{\mu}}}_{\mathcal{A}} \) is moved along \( {\mathbf{u}}_{\mathcal{A}} \) by the amount \( \hat{\gamma} \),

$$ {\hat{\boldsymbol{\mu}}}_{{\mathcal{A}}_{+}}={\hat{\boldsymbol{\mu}}}_{\mathcal{A}}+\hat{\gamma}{\mathbf{u}}_{\mathcal{A}} $$
(37)

\( \hat{\gamma} \) can be computed using

$$ \hat{\gamma}={\min^{+}}_{j\in {\mathcal{A}}^{\complement }}\left\{\frac{\hat{C}-{\hat{c}}_j}{A_{\mathcal{A}}-{a}_j},\frac{\hat{C}+{\hat{c}}_j}{A_{\mathcal{A}}+{a}_j}\right\} $$
(38)

where aj is a component of the inner product vector a for each \( j\in {\mathcal{A}}^{\complement } \). a is defined as

$$ \boldsymbol{a}={\boldsymbol{X}}^{\mathrm{T}}{\mathbf{u}}_{\mathcal{A}} $$
(39)

The “+” symbol in (38) signifies that only positive values are considered for each \( j\in {\mathcal{A}}^{\complement } \); \( \hat{\gamma} \) is the minimum of those positive values. The active set \( \mathcal{A} \) is then updated to \( {\mathcal{A}}_{+}=\mathcal{A}\cup \left\{\hat{j}\right\} \), where \( \hat{j} \) is the index attaining \( \hat{\gamma} \). Equation (38) implies that when γ reaches \( \hat{\gamma} \) with \( j=\hat{j} \), the correlation of \( {\mathbf{x}}_{\hat{j}} \) with the evolving residual \( {\mathbf{r}}_{\mathcal{A}}=\mathbf{y}-\left({\hat{\boldsymbol{\mu}}}_{\mathcal{A}}+\gamma {\mathbf{u}}_{\mathcal{A}}\right) \) first becomes equal to the equally declining absolute correlations of the variables \( {\mathbf{x}}_j \), \( j\in \mathcal{A} \), with \( {\mathbf{r}}_{\mathcal{A}} \). At the last step, (38) cannot be used because \( {\mathcal{A}}^{\complement }=\varnothing \); instead, with \( \mathcal{A}=\left\{1,2,\dots, m\right\} \), the LARS sets \( \hat{\gamma}=\hat{C}/{A}_{\mathcal{A}} \), which makes \( {\hat{\boldsymbol{\mu}}}_m=\boldsymbol{X}{\hat{\boldsymbol{\beta}}}_m={\overline{\mathbf{y}}}_m \), where \( {\hat{\boldsymbol{\mu}}}_m \) is the prediction vector at the last step and \( {\overline{\mathbf{y}}}_m \) is the OLS regression fit using all m variables. The LARS estimate \( {\hat{\boldsymbol{\beta}}}_m \) at the last step therefore equals the OLS coefficient estimate for the full set of m variables: \( {\hat{\boldsymbol{\beta}}}_m={\hat{\boldsymbol{\beta}}}_{OLS} \).
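To make one LARS step concrete, the quantities in (31)–(39) can be computed directly with NumPy. The sketch below is only an illustration: the function name and the assumption that X and y are already standardized are this sketch's own, not part of the paper.

```python
import numpy as np

def lars_step_quantities(X, y, mu_hat, active):
    """Compute the quantities in (31)-(39) for one LARS step.

    X      : (n, m) standardized design matrix
    y      : (n,)   standardized response
    mu_hat : (n,)   current prediction vector
    active : list of indices in the active set A
    """
    c_hat = X.T @ (y - mu_hat)                      # correlations, eq. (31)
    C_hat = np.max(np.abs(c_hat))                   # eq. (32)
    s = np.sign(c_hat[active])                      # signs s_j
    X_A = X[:, active] * s                          # eq. (33)
    g_A = X_A.T @ X_A                               # Gram matrix, eq. (34)
    g_inv_ones = np.linalg.solve(g_A, np.ones(len(active)))
    A_A = 1.0 / np.sqrt(np.sum(g_inv_ones))         # eq. (34)
    w_A = A_A * g_inv_ones                          # eq. (35)
    u_A = X_A @ w_A                                 # equiangular vector, eq. (35)
    a = X.T @ u_A                                   # inner products, eq. (39)

    inactive = [j for j in range(X.shape[1]) if j not in active]
    candidates = []
    for j in inactive:                              # eq. (38), positive terms only
        for val in ((C_hat - c_hat[j]) / (A_A - a[j]),
                    (C_hat + c_hat[j]) / (A_A + a[j])):
            if val > 0:
                candidates.append((val, j))
    # At the last step (A complement empty) the step size is C_hat / A_A.
    gamma_hat, j_hat = min(candidates) if candidates else (C_hat / A_A, None)
    return gamma_hat, j_hat, u_A
```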

The LARS can exactly fit the Lasso coefficient profiles only if a certain restriction is met. If this condition is violated, the original LARS should be modified to provide the exact Lasso solutions. The restriction that the Lasso puts on the LARS is that the sign of any nonzero coefficient estimate \( {\hat{\beta}}_j \) must agree with the sign of the corresponding correlation \( {\hat{c}}_j \) at any step,

$$ \operatorname{sign}\left({\hat{\beta}}_j\right)=\operatorname{sign}\left({\hat{c}}_j\right)={s}_j $$
(40)

This restriction must be checked throughout the procedure because the LARS does not impose it by itself. Let the vector \( \hat{\boldsymbol{d}} \) of size m be defined as

$$ {\hat{d}}_j=\left\{\begin{array}{cc}{s}_j{w}_{{\mathcal{A}}_j}& \mathrm{for}\ j\in \mathcal{A}\\ {}0& \mathrm{for}\ j\in {\mathcal{A}}^{\complement}\end{array}\begin{array}{c}\kern0.5em \\ {}\ \end{array}\right. $$
(41)

Define γj for each \( j\in \mathcal{A} \) as

$$ {\gamma}_j=-\frac{{\hat{\beta}}_j}{{\hat{d}}_j} $$
(42)

where \( {\hat{\beta}}_j \) is a component of the current LARS coefficient vector \( \hat{\boldsymbol{\beta}} \) for each \( j\in \mathcal{A} \), which makes \( {\hat{\boldsymbol{\mu}}}_{\mathcal{A}}=\boldsymbol{X}\hat{\boldsymbol{\beta}} \). Compute \( \overset{\sim }{\gamma } \) using

$$ \overset{\sim }{\gamma }=\underset{\gamma_j>0}{\min}\left\{{\gamma}_j\right\} $$
(43)

If there is no γj > 0, \( \overset{\sim }{\gamma } \) is set to infinity. Equation (43) implies that \( {\beta}_{\overset{\sim }{j}} \) is the first coefficient to change sign, at \( \gamma =\overset{\sim }{\gamma } \), where \( \overset{\sim }{j} \) is the index attaining the minimum. If this sign change occurs before γ reaches \( \hat{\gamma} \), then the increase of γ should be stopped at \( \overset{\sim }{\gamma } \) and \( \overset{\sim }{j} \) should be removed from the active set \( \mathcal{A} \). Algebraically, if \( \overset{\sim }{\gamma }<\hat{\gamma} \),

$$ {\hat{\boldsymbol{\mu}}}_{{\mathcal{A}}_{+}}={\hat{\boldsymbol{\mu}}}_{\mathcal{A}}+\overset{\sim }{\gamma }{\mathbf{u}}_{\mathcal{A}}\ \mathrm{and}\ {\mathcal{A}}_{+}=\mathcal{A}-\left\{\overset{\sim }{j}\right\} $$
(44)
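Under the same illustrative naming as the previous sketch, the sign-restriction check of (41)–(44) reduces to a few lines; again this is a hedged sketch rather than the paper's code.

```python
import numpy as np

def lasso_modification(beta_hat, w_A, s, active, gamma_hat):
    """Check the Lasso sign restriction, eqs. (41)-(44).

    beta_hat  : (m,) current LARS coefficient vector
    w_A, s    : weights and signs for the active set (same order as `active`)
    active    : list of active indices
    gamma_hat : LARS step size from eq. (38)
    Returns the step size to use and the index to drop (or None).
    """
    d_hat = np.zeros_like(beta_hat, dtype=float)
    d_hat[active] = s * w_A                                   # eq. (41)
    gammas = [(-beta_hat[j] / d_hat[j], j)                    # eq. (42)
              for j in active
              if d_hat[j] != 0 and -beta_hat[j] / d_hat[j] > 0]
    gamma_tilde, j_tilde = min(gammas) if gammas else (np.inf, None)  # eq. (43)
    if gamma_tilde < gamma_hat:          # eq. (44): stop early and drop j_tilde
        return gamma_tilde, j_tilde
    return gamma_hat, None
```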

Each step of the LARS with the Lasso modification is briefly described below; the notation is slightly modified to index each iteration.

  1. The procedure begins with standardizing X and y using (9). Set \( {\hat{\boldsymbol{\beta}}}_0=\mathbf{0} \), which leads to \( {\hat{\boldsymbol{\mu}}}_0=\mathbf{0} \).

  2. Compute \( {\hat{\mathbf{c}}}_0={\boldsymbol{X}}^{\mathrm{T}}\mathbf{y} \) and find \( {\hat{j}}_0 \) giving \( {\hat{C}}_0={\max}_j\left|{\hat{c}}_{0_j}\right| \) for j ∈ {1, 2, …, m}.

  3. Set k = 1. Define \( {\mathcal{A}}_1=\left\{{\hat{j}}_0\right\} \) and \( {\boldsymbol{X}}_{{\mathcal{A}}_1}={s}_{{\hat{j}}_0}{\mathbf{x}}_{{\hat{j}}_0} \) where \( {s}_{{\hat{j}}_0}=\operatorname{sign}\left({\hat{c}}_{0_{\hat{j}}}\right) \).

  4. Compute \( {\boldsymbol{g}}_{{\mathcal{A}}_k}={\boldsymbol{X}}_{{\mathcal{A}}_k}^{\mathrm{T}}{\boldsymbol{X}}_{{\mathcal{A}}_k} \), \( {A}_{{\mathcal{A}}_k}={\left({\mathbf{1}}_{{\mathcal{A}}_k}^{\mathrm{T}}{\boldsymbol{g}}_{{\mathcal{A}}_k}^{-1}{\mathbf{1}}_{{\mathcal{A}}_k}\right)}^{-1/2} \), and \( {\boldsymbol{w}}_{{\mathcal{A}}_k}={A}_{{\mathcal{A}}_k}{\boldsymbol{g}}_{{\mathcal{A}}_k}^{-1}{\mathbf{1}}_{{\mathcal{A}}_k} \). Define the equiangular vector \( {\mathbf{u}}_k={\boldsymbol{X}}_{{\mathcal{A}}_k}{\boldsymbol{w}}_{{\mathcal{A}}_k} \). If k = 1, \( {\mathbf{u}}_k={s}_{{\hat{j}}_0}{\mathbf{x}}_{{\hat{j}}_0}/\left\Vert {\mathbf{x}}_{{\hat{j}}_0}\right\Vert \).

  5. Given the inner product vector ak = XTuk, compute the step size \( {\hat{\gamma}}_k=\underset{j\in {\mathcal{A}}_k^{\complement }}{\min^{+}}\left\{\frac{{\hat{C}}_{k-1}-{\hat{c}}_{k-{1}_j}}{A_{{\mathcal{A}}_k}-{a}_{k_j}},\frac{{\hat{C}}_{k-1}+{\hat{c}}_{k-{1}_j}}{A_{{\mathcal{A}}_k}+{a}_{k_j}}\right\} \). Determine the index \( {\hat{j}}_k \) corresponding to \( {\hat{\gamma}}_k \). If \( {\mathcal{A}}_k^{\complement }=\varnothing \), \( {\hat{\gamma}}_k={\hat{C}}_{k-1}/{A}_{{\mathcal{A}}_k} \).

  6. Given the vector \( {\hat{\boldsymbol{d}}}_k \) equaling \( {s}_j{w}_{{\mathcal{A}}_{k_j}} \) for \( j\in {\mathcal{A}}_k \), where \( {s}_j=\operatorname{sign}\left({\hat{c}}_{k-{1}_j}\right) \), and 0 elsewhere, compute the Lasso modification step size \( {\overset{\sim }{\gamma}}_k=\underset{\gamma_{k_j}>0}{\min}\left\{{\gamma}_{k_j}\right\} \) where \( {\gamma}_{k_j}=-\frac{{\hat{\beta}}_{k-{1}_j}}{{\hat{d}}_{k_j}} \) for \( j\in {\mathcal{A}}_k \). Determine the index \( \tilde{j}_{k} \) corresponding to \( {\overset{\sim }{\gamma}}_k \).

  7. (a) If \( {\hat{\gamma}}_k\le {\overset{\sim }{\gamma}}_k \), compute the prediction vector \( {\hat{\boldsymbol{\mu}}}_k={\hat{\boldsymbol{\mu}}}_{k-1}+{\hat{\gamma}}_k{\mathbf{u}}_k \) and the LARS coefficient vector \( {\hat{\boldsymbol{\beta}}}_k={\hat{\boldsymbol{\beta}}}_{k-1}+{\hat{\gamma}}_k{\hat{\boldsymbol{d}}}_k \). If \( {\mathcal{A}}_k^{\complement }=\varnothing \), return \( \left\{{\hat{\boldsymbol{\beta}}}_0,{\hat{\boldsymbol{\beta}}}_1,\dots, {\hat{\boldsymbol{\beta}}}_{k_{max}}\right\} \) and stop; otherwise, update the active set, \( {\mathcal{A}}_{k+1}={\mathcal{A}}_k\cup \left\{{\hat{j}}_k\right\} \).
     (b) If \( {\hat{\gamma}}_k>{\overset{\sim }{\gamma}}_k \), compute \( {\hat{\boldsymbol{\mu}}}_k={\hat{\boldsymbol{\mu}}}_{k-1}+{\overset{\sim }{\gamma}}_k{\mathbf{u}}_k \) and \( {\hat{\boldsymbol{\beta}}}_k={\hat{\boldsymbol{\beta}}}_{k-1}+{\overset{\sim }{\gamma}}_k{\hat{\boldsymbol{d}}}_k \). Update the active set, \( {\mathcal{A}}_{k+1}={\mathcal{A}}_k-\left\{\tilde{j}_{k}\right\} \).

  8. Compute the correlation vector \( {\hat{\mathbf{c}}}_k={\boldsymbol{X}}^{\mathrm{T}}\left(\mathbf{y}-{\hat{\boldsymbol{\mu}}}_k\right) \) and \( {\hat{C}}_k=\underset{j}{\max}\left\{\left|{\hat{c}}_{k_j}\right|\right\} \) for j ∈ {1, 2, …, m}. If \( {\hat{C}}_k=0 \), return \( \left\{{\hat{\boldsymbol{\beta}}}_0,{\hat{\boldsymbol{\beta}}}_1,\dots, {\hat{\boldsymbol{\beta}}}_{k_{max}}\right\} \) and stop; otherwise, update \( {\boldsymbol{X}}_{{\mathcal{A}}_{k+1}}={\left(\cdots {s}_j{\mathbf{x}}_j\cdots \right)}_{j\in {\mathcal{A}}_{k+1}} \) where \( {s}_j=\operatorname{sign}\left({\hat{c}}_{k_j}\right) \), set k = k + 1, and go to Step 4.
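For concreteness, the eight steps can be assembled into a short, illustrative Python implementation. This is only a minimal sketch under simplifying assumptions (it re-solves the Gram system at every iteration instead of updating a Cholesky factorization, and the function and variable names are chosen here for illustration); it is not the withheld code of Section 6.

```python
import numpy as np

def lars_lasso_path(X, y, max_iter=500):
    """Minimal LARS with the Lasso modification (Steps 1-8 above).

    X : (n, m) design matrix, y : (n,) response.
    Returns the list of coefficient vectors beta_0, ..., beta_kmax
    on the standardized scale.
    """
    n, m = X.shape
    # Step 1: standardize X (zero mean, unit-norm columns) and center y.
    Xc = X - X.mean(axis=0)
    X = Xc / np.linalg.norm(Xc, axis=0)
    y = y - y.mean()
    beta, mu = np.zeros(m), np.zeros(n)
    betas = [beta.copy()]

    # Steps 2-3: first active variable.
    c = X.T @ y
    active = [int(np.argmax(np.abs(c)))]

    for _ in range(max_iter):
        # Step 4: equiangular direction for the current active set.
        s = np.sign(c[active])
        X_A = X[:, active] * s
        g_inv_ones = np.linalg.solve(X_A.T @ X_A, np.ones(len(active)))
        A_A = 1.0 / np.sqrt(g_inv_ones.sum())
        w_A = A_A * g_inv_ones
        u = X_A @ w_A

        # Step 5: LARS step size, eq. (38) (minimum positive candidate).
        C_hat = np.max(np.abs(c))
        a = X.T @ u
        inactive = [j for j in range(m) if j not in active]
        cand = [(v, j) for j in inactive
                for v in ((C_hat - c[j]) / (A_A - a[j]),
                          (C_hat + c[j]) / (A_A + a[j])) if v > 1e-12]
        gamma_hat, j_hat = min(cand) if cand else (C_hat / A_A, None)

        # Step 6: Lasso sign-change step size, eqs. (41)-(43).
        d = np.zeros(m)
        d[active] = s * w_A
        sign_cand = [(-beta[j] / d[j], j) for j in active
                     if d[j] != 0 and -beta[j] / d[j] > 1e-12]
        gamma_tilde, j_tilde = min(sign_cand) if sign_cand else (np.inf, None)

        # Step 7: move the prediction and coefficients, update the active set.
        if gamma_hat <= gamma_tilde:
            mu += gamma_hat * u
            beta += gamma_hat * d
            if j_hat is None:            # A complement empty: full OLS fit reached
                betas.append(beta.copy())
                break
            active.append(j_hat)
        else:                            # eq. (44): drop the sign-changing index
            mu += gamma_tilde * u
            beta += gamma_tilde * d
            active.remove(j_tilde)
        betas.append(beta.copy())

        # Step 8: recompute correlations; stop once the residual is uncorrelated.
        c = X.T @ (y - mu)
        if np.max(np.abs(c)) < 1e-12:
            break
    return betas
```

An optimized implementation of essentially the same procedure is available as lars_path(X, y, method='lasso') in sklearn.linear_model, which can serve as a cross-check for the sketch above.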

The sequence of computed LARS coefficient vectors \( {\left\{{\hat{\boldsymbol{\beta}}}_k\right\}}_0^{k_{max}} \) provides the Lasso solutions of the coefficient vector β at \( {\hat{t}}_k={\left\Vert {\hat{\boldsymbol{\beta}}}_k\right\Vert}_1={\sum}_{j=1}^m\left|{\hat{\beta}}_{k_j}\right| \), k = 0, 1, …, kmax. The entire coefficient profiles are then obtained simply by connecting the consecutive points \( {\hat{\beta}}_{k_j} \) and \( {\hat{\beta}}_{{\left(k+1\right)}_j} \), k = 0, 1, …, kmax − 1, for each j along the axis of \( t={\sum}_{j=1}^m\left|{\beta}_j\right| \).

The computational complexity of the LARS is dominated by inverting \( {\boldsymbol{g}}_{{\mathcal{A}}_k}={\boldsymbol{X}}_{{\mathcal{A}}_k}^{\mathrm{T}}{\boldsymbol{X}}_{{\mathcal{A}}_k} \) in Step 4 each time the active set \( {\mathcal{A}}_k \) is updated. Since \( {\boldsymbol{X}}_{{\mathcal{A}}_k} \) is formed by adding the column \( {s}_{{\hat{j}}_{k-1}}{\mathbf{x}}_{{\hat{j}}_{k-1}} \) to \( {\boldsymbol{X}}_{{\mathcal{A}}_{k-1}} \) or deleting \( {s}_{{\tilde{j}}_{k-1}}{\mathbf{x}}_{{\tilde{j}}_{k-1}} \) from it, \( {\boldsymbol{g}}_{{\mathcal{A}}_k}^{-1} \) can be obtained efficiently by updating the Cholesky factorization of \( {\boldsymbol{g}}_{{\mathcal{A}}_{k-1}} \) computed at the previous iteration (Golub and Van Loan 1983). With this Cholesky updating, fitting the entire Lasso coefficient profiles requires computation of the same order as computing the OLS coefficient estimates with all the variables, unless the Lasso restriction is violated repeatedly, which considerably increases the number of iterations.
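For completeness, appending a signed column to \( {\boldsymbol{X}}_{\mathcal{A}} \) enlarges \( {\boldsymbol{g}}_{\mathcal{A}} \) by one row and column, and the corresponding Cholesky factor can be extended with a single triangular solve instead of refactoring from scratch. The sketch below is a generic illustration of that standard update (not code from the paper); it assumes a non-empty current factor L.

```python
import numpy as np
from scipy.linalg import solve_triangular

def chol_append_column(L, X_A, x_new):
    """Extend the lower Cholesky factor L of g_A = X_A^T X_A when the signed
    column x_new is appended to X_A, avoiding a full refactorization.
    """
    b = X_A.T @ x_new                       # new off-diagonal block of g_A
    c = x_new @ x_new                       # new diagonal entry of g_A
    w = solve_triangular(L, b, lower=True)  # one triangular solve: L w = b
    d = np.sqrt(c - w @ w)                  # new diagonal entry of the factor
    k = L.shape[0]
    L_new = np.zeros((k + 1, k + 1))
    L_new[:k, :k] = L
    L_new[k, :k] = w
    L_new[k, k] = d
    return L_new
```

Deleting a column similarly reduces to removing one row and column of the factor and re-triangularizing the trailing block, as described by Golub and Van Loan (1983).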


About this article


Cite this article

Park, I. Lasso Kriging for efficiently selecting a global trend model. Struct Multidisc Optim 64, 1527–1543 (2021). https://doi.org/10.1007/s00158-021-02939-7

