Exact penalty method for knot selection of B-spline regression

Abstract

This paper presents a new approach that selects the knots of a B-spline regression model simultaneously with estimating the model itself. Such simultaneous selection of knots and model is not trivial, but our strategy makes it possible by applying a nonconvex regularization to the usual least squares formulation. More specifically, motivated by the constraint that directly designates (an upper bound on) the number of knots to be used, we present an (unconstrained) regularized least squares reformulation, which is later shown to be equivalent to the motivating cardinality-constrained formulation. The obtained formulation is further modified so that we can employ a proximal gradient-type algorithm, known as GIST, developed for a class of nonconvex nonsmooth optimization problems. Under a mild technical assumption, the algorithm is shown to converge to a local minimum of the problem. Since any local minimum of the problem is shown to satisfy the cardinality constraint, the proposed algorithm can be used to obtain a spline regression model that depends on at most a designated number of knots. Numerical experiments demonstrate how our approach performs on synthetic and real data sets.


Data Availability

The data sets generated in this paper and the MATLAB code are available from the corresponding author on reasonable request.

References

  1. De Boor, C.: A Practical Guide to Splines. Springer, New York (1978)

  2. O’Sullivan, F.: A statistical perspective on ill-posed inverse problems. Stat. Sci. 502–518 (1986)

  3. Eilers, P.H., Marx, B.D.: Flexible smoothing with B-splines and penalties. Stat. Sci. 11(2), 89–121 (1996)

  4. Goepp, V., Bouaziz, O., Nuel, G.: Spline regression with automatic knot selection. arXiv preprint arXiv:1808.01770 (2018)

  5. Yagishita, S., Gotoh, J.: Exact penalization at d-stationary points of cardinality- or rank-constrained problem. arXiv preprint arXiv:2209.02315 (2022)

  6. Wright, S.J., Nowak, R.D., Figueiredo, M.A.: Sparse reconstruction by separable approximation. IEEE Trans. Signal Process. 57(7), 2479–2493 (2009)

  7. Gong, P., Zhang, C., Lu, Z., Huang, J., Ye, J.: A general iterative shrinkage and thresholding algorithm for non-convex regularized optimization problems. In: International Conference on Machine Learning, pp. 37–45. PMLR (2013)

  8. Gotoh, J., Takeda, A., Tono, K.: DC formulations and algorithms for sparse optimization problems. Math. Program. 169(1), 141–176 (2018)

  9. Lu, Z., Li, X.: Sparse recovery via partial regularization: models, theory, and algorithms. Math. Oper. Res. 43(4), 1290–1316 (2018)

  10. Bertsimas, D., Copenhaver, M.S., Mazumder, R.: The trimmed lasso: sparsity and robustness. arXiv preprint arXiv:1708.04527 (2017)

  11. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)

  12. Yagishita, S., Gotoh, J.: Pursuit of the cluster structure of network lasso: recovery condition and non-convex extension. arXiv preprint arXiv:2012.07491 (2020)

  13. Kim, S.-J., Koh, K., Boyd, S., Gorinevsky, D.: \(\ell _1\) trend filtering. SIAM Rev. 51(2), 339–360 (2009)

  14. Tibshirani, R.J., Taylor, J.: The solution path of the generalized lasso. Ann. Stat. 39(3), 1335–1371 (2011)

  15. Tibshirani, R.J.: Adaptive piecewise polynomial estimation via trend filtering. Ann. Stat. 42(1), 285–323 (2014)

  16. Amir, T., Basri, R., Nadler, B.: The trimmed lasso: sparse recovery guarantees and practical optimization by the generalized soft-min penalty. SIAM J. Math. Data Sci. 3(3), 900–929 (2021)

  17. Barzilai, J., Borwein, J.M.: Two-point step size gradient methods. IMA J. Numer. Anal. 8(1), 141–148 (1988)

  18. Grippo, L., Lampariello, F., Lucidi, S.: A nonmonotone line search technique for Newton’s method. SIAM J. Numer. Anal. 23(4), 707–716 (1986)

  19. Grippo, L., Sciandrone, M.: Nonmonotone globalization techniques for the Barzilai–Borwein gradient method. Comput. Optim. Appl. 23, 143–169 (2002)

  20. Nakayama, S., Gotoh, J.: On the superiority of PGMs to PDCAs in nonsmooth nonconvex sparse regression. Optim. Lett. 15(8), 2831–2860 (2021)

  21. Rippe, R.C., Meulman, J.J., Eilers, P.H.: Visualization of genomic changes by segmented smoothing using an L0 penalty. PLoS One 7(6), e38230 (2012)

  22. Frommlet, F., Nuel, G.: An adaptive ridge procedure for L0 regularization. PLoS One 11(2), e0148620 (2016)

  23. Hastie, T., Tibshirani, R.: Generalized additive models. Stat. Sci. 1(3), 297–310 (1986)

  24. Alizadeh, F., Eckstein, J., Noyan, N., Rudolf, G.: Arrival rate approximation by nonnegative cubic splines. Oper. Res. 56(1), 140–156 (2008)

Acknowledgements

The authors would like to thank the anonymous reviewer for many insightful suggestions. J. Gotoh is supported in part by JSPS KAKENHI Grant 20H00285.

Author information

Corresponding author

Correspondence to Shotaro Yagishita.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Proofs of propositions

The proofs omitted in the main body of the paper are given in this appendix.

A.1 Proof of Theorem 2.1

To prove Theorem 2.1 we start with the following lemma.

Lemma A.1

For \(p\ge 0\), let \(s=\sum _{j=-p}^{l-1}\alpha _{j+p+1}B^{(p)}_j,~ \varvec{\alpha }=(\alpha _1,\dots ,\alpha _{p+l})^\top \) and \(i\in \{1,\ldots ,l-1\}\). The coefficient of the pth order term of s over the interval \((t_i,t_{i+1})\) is given by the ith element of the vector \(\varvec{D}^{(p+1)}_+\varvec{\alpha }\), and that over the interval \((t_{i-1},t_{i})\) is given by the ith element of the vector \(\varvec{D}^{(p+1)}_-\varvec{\alpha }\).

Proof

We prove the statement by induction on \(p\). For \(p=0\), we see that

$$\begin{aligned} s&=\sum _{j=0}^{l-1}\alpha _{j+1}\varvec{1}_{[t_j,t_{j+1})}\\&=\sum _{j=1}^{l}\alpha _{j}\varvec{1}_{[t_{j-1},t_j)}, \end{aligned}$$

\((\varvec{D}^{(1)}_+\varvec{\alpha })_i=\alpha _{i+1}\), and \((\varvec{D}^{(1)}_-\varvec{\alpha })_i=\alpha _i\). Thus the statement holds for \(p=0\).

Next, we assume that the statement holds for \(p-1\). We obtain from the definition of \(B^{(p)}_j\) that

$$\begin{aligned} s(x)&=\sum _{j=-p}^{l-1}\alpha _{j+p+1}B^{(p)}_j(x)\\&=\sum _{j=-p}^{l-1}\alpha _{j+p+1}\left\{ \frac{x-t_{j}}{t_{j+p}-t_{j}}B^{(p-1)}_j(x)+\frac{t_{j+p+1}-x}{t_{j+p+1}-t_{j+1}}B^{(p-1)}_{j+1}(x)\right\} \\&=\sum _{j=-(p-1)}^{l-1}\left\{ \alpha _{j+p+1}\frac{x-t_{j}}{t_{j+p}-t_{j}}+\alpha _{j+p}\frac{t_{j+p}-x}{t_{j+p}-t_j}\right\} B^{(p-1)}_j(x)\\&=\sum _{j=-(p-1)}^{l-1}\left\{ \frac{\alpha _{j+p+1}-\alpha _{j+p}}{t_{j+p}-t_j}x-\frac{\alpha _{j+p+1}t_j-\alpha _{j+p}t_{j+p}}{t_{j+p}-t_j}\right\} B^{(p-1)}_j(x)\\&=\sum _{j=-(p-1)}^{l-1}\left\{ (\varvec{\Delta }^{(p+1)}\varvec{\alpha })_{j+p}x-\frac{\alpha _{j+p+1}t_j-\alpha _{j+p}t_{j+p}}{t_{j+p}-t_j}\right\} B^{(p-1)}_j(x)\\&=x\sum _{j=-(p-1)}^{l-1}(\varvec{\Delta }^{(p+1)}\varvec{\alpha })_{j+p}B^{(p-1)}_j(x)\\&\quad -\sum _{j=-(p-1)}^{l-1}\left\{ \frac{\alpha _{j+p+1}t_j-\alpha _{j+p}t_{j+p}}{t_{j+p}-t_j}\right\} B^{(p-1)}_j(x) \end{aligned}$$

for all \(x\in [t_0,t_l)\), where the third equality follows from \(B^{(p-1)}_{-p}=B^{(p-1)}_l=0\) on \([t_0,t_l)\). Since only the first sum contributes to the pth order term of s, the coefficient of the pth order term of s on the interval \((t_i,t_{i+1})\) equals the coefficient of the \((p-1)\)th order term of \(\sum _{j=-(p-1)}^{l-1}(\varvec{\Delta }^{(p+1)}\varvec{\alpha })_{j+p}B^{(p-1)}_j\) on that interval, which, by the induction hypothesis, is \((\varvec{D}^{(p)}_+\varvec{\Delta }^{(p+1)}\varvec{\alpha })_i=(\varvec{D}^{(p+1)}_+\varvec{\alpha })_i\); likewise, that on the interval \((t_{i-1},t_{i})\) is \((\varvec{D}^{(p)}_-\varvec{\Delta }^{(p+1)}\varvec{\alpha })_i=(\varvec{D}^{(p+1)}_-\varvec{\alpha })_i\). This completes the proof. \(\square \)
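For readers who prefer to see the recursion in code, the following is a minimal Python sketch (ours, not part of the paper) of the basis functions \(B^{(p)}_j\) and the spline s used in Lemma A.1, assuming a strictly increasing knot sequence \(t_{-p}<\dots <t_{l+p}\) stored in a dictionary keyed by the integer knot index; all names are illustrative.

```python
def B(j, p, x, t):
    """B^{(p)}_j(x) via the recursion used in the proof of Lemma A.1.
    `t` maps integer knot indices (possibly negative) to knot locations."""
    if p == 0:
        return 1.0 if t[j] <= x < t[j + 1] else 0.0  # indicator of [t_j, t_{j+1})
    return ((x - t[j]) / (t[j + p] - t[j]) * B(j, p - 1, x, t)
            + (t[j + p + 1] - x) / (t[j + p + 1] - t[j + 1]) * B(j + 1, p - 1, x, t))

def spline(x, alpha, p, l, t):
    """s(x) = sum_{j=-p}^{l-1} alpha_{j+p+1} B^{(p)}_j(x); the paper's alpha is 1-indexed,
    so the Python entry alpha[j + p] corresponds to alpha_{j+p+1}."""
    return sum(alpha[j + p] * B(j, p, x, t) for j in range(-p, l))
```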

Proof of Theorem 2.1

Note that \(B^{(p)}_{-p},\ldots ,B^{(p)}_{l-1}\) restricted to \([t_0,t_l)\) form a basis of the linear space consisting of piecewise polynomials of order p on \([t_0,t_l)\) with breakpoints \(t_1,\ldots ,t_{l-1}\) whose derivatives coincide up to order \(p-1\) at all the breakpoints [1, pp. 97–98]. Thus the function s does not use the ith knot if and only if the coefficient of the pth order term of s over the interval \((t_i,t_{i+1})\) coincides with that over the interval \((t_{i-1},t_{i})\). Since \(\varvec{D}^{(p+1)}=\varvec{D}^{(p+1)}_+-\varvec{D}^{(p+1)}_-\), the equality \((\varvec{D}^{(p+1)}\varvec{\alpha })_{i}=0\) is equivalent to \((\varvec{D}^{(p+1)}_+\varvec{\alpha })_i=(\varvec{D}^{(p+1)}_-\varvec{\alpha })_i\). Therefore, we have the desired result from Lemma A.1. \(\square \)
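As a small illustration of Theorem 2.1 (ours, not from the paper): once the difference matrix \(\varvec{D}^{(p+1)}\) of Appendix B is available, the interior knots on which a fitted spline actually depends can be read off from \(\varvec{D}^{(p+1)}\varvec{\alpha }\); the tolerance below is an assumption added for floating-point arithmetic.

```python
import numpy as np

def used_knots(D, alpha, tol=1e-10):
    """Indices i in {1, ..., l-1} with (D^{(p+1)} alpha)_i != 0, i.e. the interior knots
    t_i that the spline s uses (Theorem 2.1). `D` is the (l-1) x (l+p) matrix D^{(p+1)}."""
    r = np.asarray(D) @ np.asarray(alpha)
    return [i + 1 for i, ri in enumerate(r) if abs(ri) > tol]  # 1-based knot indices
```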

A.2 Proof of Proposition 3.1

Proof

Let \(\varvec{\beta }^*\) be a local minimum of (10), that is, there exists a neighborhood \({\mathcal {N}}\) of \(\varvec{\beta }^*\) such that \(F(\varvec{\beta }^*)\le F(\varvec{\beta })\) holds for any \(\varvec{\beta }\in {\mathcal {N}}\). Noting that \(F(\varvec{\beta })=G(\varvec{\beta },g(\varvec{\beta }))\) holds for any \(\varvec{\beta }\in {\mathbb {R}}^{l-1}\), we have

$$\begin{aligned} G(\varvec{\beta }^*,g(\varvec{\beta }^*))=F(\varvec{\beta }^*)\le F(\varvec{\beta })\le G(\varvec{\beta },\varvec{\beta }') \end{aligned}$$

for any \(\varvec{\beta }\in {\mathcal {N}}\) and \(\varvec{\beta }'\in {\mathbb {R}}^{p+1}\), which implies that \((\varvec{\beta }^*,g(\varvec{\beta }^*))\) is locally optimal to (9). It is clear that the local optimality of \((\varvec{\beta }^*,g(\varvec{\beta }^*))\) for (9) and that of \(\varvec{\Sigma }^{(p+1)}(\varvec{\beta }^{*\top },g(\varvec{\beta }^*)^\top )^\top \) for (8) are equivalent. This completes the proof of the former claim. The latter claim can be proved in the same way. \(\square \)

A.3 Proof of Theorem 3.1

Proof

Let \(h(\varvec{\beta }):=\frac{1}{2}\big \Vert \varvec{z}_1-\varvec{L}_1\varvec{\beta }\big \Vert _2^2+\frac{c}{2}\big \Vert \varvec{z}_2-\varvec{L}_2\varvec{\beta }\big \Vert _2^2\). We see from the d-stationarity of \(\varvec{\beta }^*\) that

$$\begin{aligned} \begin{aligned}&\nabla h(\varvec{\beta }^*)^\top (0-\varvec{\beta }^*)+\gamma T_K'(\varvec{\beta }^*;-\varvec{\beta }^*)\\&\quad =h'(\varvec{\beta }^*;-\varvec{\beta }^*)+\gamma T_K'(\varvec{\beta }^*;-\varvec{\beta }^*)\\&\quad =F'(\varvec{\beta }^*;-\varvec{\beta }^*)\\&\quad \ge 0. \end{aligned} \end{aligned}$$
(A1)

Since it is easy to see that \(T_K'(\varvec{\beta }^*;-\varvec{\beta }^*)=-T_K(\varvec{\beta }^*)\), we have

$$\begin{aligned} \Vert \varvec{z}_1\Vert _2^2+c\Vert \varvec{z}_2\Vert _2^2\ge \big \Vert \varvec{z}_1-\varvec{L}_1\varvec{\beta }^*\big \Vert _2^2+c\big \Vert \varvec{z}_2-\varvec{L}_2\varvec{\beta }^*\big \Vert _2^2 \end{aligned}$$

from (A1), the convexity of h, and the nonnegativity of \(T_K\). This leads to

$$\begin{aligned} \big \Vert \varvec{z}_1-\varvec{L}_1\varvec{\beta }^*\big \Vert _2\le \sqrt{\Vert \varvec{z}_1\Vert _2^2+c\Vert \varvec{z}_2\Vert _2^2}, \quad \sqrt{c}\big \Vert \varvec{z}_2-\varvec{L}_2\varvec{\beta }^*\big \Vert _2\le \sqrt{\Vert \varvec{z}_1\Vert _2^2+c\Vert \varvec{z}_2\Vert _2^2} \end{aligned}$$

and hence we obtain

$$\begin{aligned} \nabla h(\varvec{\beta }^*)^\top \varvec{d}&\le \Vert \nabla h(\varvec{\beta }^*)\Vert _\infty \\&=\max _{j=1,\ldots ,l-1}\left| {\varvec{l}^{(1)}_j}^\top (\varvec{z}_1-\varvec{L}_1\varvec{\beta }^*)+c{\varvec{l}^{(2)}_j}^\top (\varvec{z}_2-\varvec{L}_2\varvec{\beta }^*)\right| \\&\le \max _{j=1,\ldots ,l-1}\left\{ \Vert \varvec{l}^{(1)}_j\Vert _2\Vert \varvec{z}_1-\varvec{L}_1\varvec{\beta }^*\Vert _2+c\Vert \varvec{l}^{(2)}_j\Vert _2\Vert \varvec{z}_2-\varvec{L}_2\varvec{\beta }^*\Vert _2\right\} \\&\le \max _{j=1,\ldots ,l-1}\left\{ \Vert \varvec{l}^{(1)}_j\Vert _2\sqrt{\Vert \varvec{z}_1\Vert _2^2+c\Vert \varvec{z}_2\Vert _2^2}+\sqrt{c}\Vert \varvec{l}^{(2)}_j\Vert _2\sqrt{\Vert \varvec{z}_1\Vert _2^2+c\Vert \varvec{z}_2\Vert _2^2}\right\} \\&=\max _{j=1,\ldots ,l-1}\left\{ \Vert \varvec{l}^{(1)}_j\Vert _2+\sqrt{c}\Vert \varvec{l}^{(2)}_j\Vert _2\right\} \sqrt{\Vert \varvec{z}_1\Vert _2^2+c\Vert \varvec{z}_2\Vert _2^2}. \end{aligned}$$

for all \(\varvec{d}\) such that \(\Vert \varvec{d}\Vert _1=1\) and \(\varvec{d}\in \{-1,0,1\}^{l-1}\). From Lemma 4 of Yagishita and Gotoh [5], we obtain \(\Vert \varvec{\beta }^*\Vert _0\le K\). Noting that

$$\begin{aligned} \begin{pmatrix} \varvec{D}^{(p+1)}\varvec{\Sigma }^{(p+1)}(\varvec{\beta }^{*\top },g(\varvec{\beta }^*)^\top )^\top \\ \varvec{A}\varvec{\Sigma }^{(p+1)}(\varvec{\beta }^{*\top },g(\varvec{\beta }^*)^\top )^\top \end{pmatrix} = \hat{\varvec{D}}^{(p+1)}\varvec{\Sigma }^{(p+1)} \begin{pmatrix} \varvec{\beta }^*\\ g(\varvec{\beta }^*) \end{pmatrix} = \begin{pmatrix} \varvec{\beta }^*\\ g(\varvec{\beta }^*) \end{pmatrix} \end{aligned}$$

and that \(\varvec{\Sigma }^{(p+1)}(\varvec{\beta }^{*\top },g(\varvec{\beta }^*)^\top )^\top \) is a local minimum of (8) according to Proposition 3.1, we have the desired result. \(\square \)

A.4 Proof of Theorem 3.2

Proof

Let \(\Omega :=\{\varvec{\beta }\mid F(\varvec{\beta })\le F(\varvec{\beta }_0)\}\). Note that \(\{\varvec{\beta }_t\}\subset \Omega \). From the nonnegativity of \(T_K\), it holds that \(h(\varvec{\beta })\le F(\varvec{\beta }_0)\) for any \(\varvec{\beta }\in \Omega \), which leads to

$$\begin{aligned} \big \Vert \varvec{z}_1-\varvec{L}_1\varvec{\beta }\big \Vert _2\le \sqrt{2F(\varvec{\beta }_0)}, \quad \big \Vert \varvec{z}_2-\varvec{L}_2\varvec{\beta }\big \Vert _2\le \sqrt{\frac{2F(\varvec{\beta }_0)}{c}}. \end{aligned}$$

Thus, we have

$$\begin{aligned} \Vert \nabla h(\varvec{\beta })\Vert _2&=\left\| \varvec{L}_1^\top (\varvec{z}_1-\varvec{L}_1\varvec{\beta })+c\varvec{L}_2^\top (\varvec{z}_2-\varvec{L}_2\varvec{\beta })\right\| _2\\&\le \Vert \varvec{L}_1\Vert \left\| \varvec{z}_1-\varvec{L}_1\varvec{\beta }\right\| _2+c\Vert \varvec{L}_2\Vert \left\| \varvec{z}_2-\varvec{L}_2\varvec{\beta }\right\| _2\\&\le \Vert \varvec{L}_1\Vert \sqrt{2F(\varvec{\beta }_0)}+\Vert \varvec{L}_2\Vert \sqrt{2cF(\varvec{\beta }_0)} \end{aligned}$$

for any \(\varvec{\beta }\in \Omega \), where \(\Vert \cdot \Vert \) is the operator norm; hence h is Lipschitz continuous on \(\Omega \). Furthermore, \(T_K\) is also Lipschitz continuous because it is expressed as the difference between the \(\ell _1\) norm and the largest-K norm. As a result, F is Lipschitz continuous on \(\Omega \) and, in particular, uniformly continuous on \(\Omega \). Combining the uniform continuity and nonnegativity of F with the Lipschitz continuity of \(\nabla h\) yields \(\Vert \varvec{\beta }_{t+1}-\varvec{\beta }_t\Vert _2\rightarrow 0\), similarly to the proof of Lemma 4 of Wright et al. [6]. Let \(\varvec{\beta }^*\) be an accumulation point of \(\{\varvec{\beta }_t\}\) and \(\{\varvec{\beta }_{t_i}\}\) be a subsequence that converges to \(\varvec{\beta }^*\). Since it is easy to see that \(\{\eta _t\}\) is bounded (see, for example, Lu and Li [9, Theorem 5.1]), for any \(\varvec{d}\in {\mathbb {R}}^{l-1}\), it follows from the optimality of \(\varvec{\beta }_{t_i+1}\) that

$$\begin{aligned}&\nabla h(\varvec{\beta }_{t_i})^\top \varvec{\beta }_{t_i+1}+\frac{\eta _{t_i}}{2}\Vert \varvec{\beta }_{t_i+1}-\varvec{\beta }_{t_i}\Vert _2^2+\gamma T_K(\varvec{\beta }_{t_i+1})\\&\quad \le \nabla h(\varvec{\beta }_{t_i})^\top (\varvec{\beta }^*+\xi \varvec{d})+\frac{\eta _{t_i}}{2}\Vert \varvec{\beta }^*+\xi \varvec{d}-\varvec{\beta }_{t_i}\Vert _2^2+\gamma T_K(\varvec{\beta }^*+\xi \varvec{d})\\&\quad \le \nabla h(\varvec{\beta }_{t_i})^\top (\varvec{\beta }^*+\xi \varvec{d})+\frac{\eta _{\max }}{2}\Vert \varvec{\beta }^*+\xi \varvec{d}-\varvec{\beta }_{t_i}\Vert _2^2+\gamma T_K(\varvec{\beta }^*+\xi \varvec{d}) \end{aligned}$$

for \(\xi >0\), where \(\eta _{\max }:=\sup _t\eta _t\). We obtain from the continuity of \(\nabla h\) and \(T_K\) that

$$\begin{aligned} \xi \nabla h(\varvec{\beta }^*)^\top \varvec{d}+\frac{\eta _{\max }\xi ^2}{2}\Vert \varvec{d}\Vert _2^2+\gamma T_K(\varvec{\beta }^*+\xi \varvec{d})-\gamma T_K(\varvec{\beta }^*)\ge 0. \end{aligned}$$

Dividing both sides by \(\xi \) and letting \(\xi \rightarrow 0\) gives

$$\begin{aligned} F'(\varvec{\beta }^*;\varvec{d})=\nabla h(\varvec{\beta }^*)^\top \varvec{d}+\gamma T_K'(\varvec{\beta }^*;\varvec{d})\ge 0, \end{aligned}$$

which implies that \(\varvec{\beta }^*\) is a d-stationary point of (10), that is, a local minimum of (10).

Next, let \(\varvec{\alpha }^*\) be an accumulation point of \(\big \{\varvec{\Sigma }^{(p+1)}(\varvec{\beta }_t^\top ,g(\varvec{\beta }_t)^\top )^\top \big \}\) and \(\big \{\varvec{\Sigma }^{(p+1)}(\varvec{\beta }_{t_i}^\top ,g(\varvec{\beta }_{t_i})^\top )^\top \big \}\) be a subsequence that converges to \(\varvec{\alpha }^*\). We see that

$$\begin{aligned} \begin{aligned} \begin{pmatrix} \varvec{\beta }_{t_i}\\ g(\varvec{\beta }_{t_i}) \end{pmatrix} =\hat{\varvec{D}}^{(p+1)}\varvec{\Sigma }^{(p+1)} \begin{pmatrix} \varvec{\beta }_{t_i}\\ g(\varvec{\beta }_{t_i}) \end{pmatrix} \rightarrow \hat{\varvec{D}}^{(p+1)}\varvec{\alpha }^*= \begin{pmatrix} \varvec{D}^{(p+1)}\varvec{\alpha }^*\\ \varvec{A}\varvec{\alpha }^* \end{pmatrix}, \end{aligned} \end{aligned}$$
(A2)

which implies that \(\varvec{D}^{(p+1)}\varvec{\alpha }^*\) is a local minimum of (10) because it is an accumulation point of \(\{\varvec{\beta }_t\}\). It follows from (A2) and the continuity of g that \(g(\varvec{D}^{(p+1)}\varvec{\alpha }^*)=\varvec{A}\varvec{\alpha }^*\) and hence we obtain from Proposition 3.1 that

$$\begin{aligned} \varvec{\alpha }^*=\varvec{\Sigma }^{(p+1)}\hat{\varvec{D}}^{(p+1)}\varvec{\alpha }^*=\varvec{\Sigma }^{(p+1)} \begin{pmatrix} \varvec{D}^{(p+1)}\varvec{\alpha }^*\\ g(\varvec{D}^{(p+1)}\varvec{\alpha }^*) \end{pmatrix} \end{aligned}$$

is locally optimal to (8). \(\square \)
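To make the objects appearing in the two proofs above concrete, here is a short Python sketch (ours, not the authors' MATLAB code) of the smooth part h, the trimmed penalty \(T_K\) written as the \(\ell _1\) norm minus the largest-K norm, and the penalized objective \(F=h+\gamma T_K\); the argument names mirror the notation of the proofs and are otherwise assumptions.

```python
import numpy as np

def T_K(beta, K):
    """Trimmed l1 penalty: ||beta||_1 minus the largest-K norm, i.e. the sum of all
    |beta_i| except the K largest; it vanishes iff ||beta||_0 <= K."""
    a = np.sort(np.abs(beta))                 # ascending order
    return float(a[:max(a.size - K, 0)].sum())

def h(beta, z1, L1, z2, L2, c):
    """Smooth part: 0.5*||z1 - L1 beta||_2^2 + (c/2)*||z2 - L2 beta||_2^2."""
    return 0.5 * np.sum((z1 - L1 @ beta) ** 2) + 0.5 * c * np.sum((z2 - L2 @ beta) ** 2)

def grad_h(beta, z1, L1, z2, L2, c):
    """Gradient of h, the quantity bounded on the level set in the proof of Theorem 3.2."""
    return -L1.T @ (z1 - L1 @ beta) - c * L2.T @ (z2 - L2 @ beta)

def F(beta, z1, L1, z2, L2, c, gamma, K):
    """Penalized objective F = h + gamma * T_K."""
    return h(beta, z1, L1, z2, L2, c) + gamma * T_K(beta, K)
```

The property that \(T_K(\varvec{\beta })=0\) if and only if \(\Vert \varvec{\beta }\Vert _0\le K\) is what ties the penalty to the cardinality constraint on the number of knots.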

Appendix B: Construction of expanded difference matrix and its inverse

In this section, concrete constructions of \(\hat{\varvec{D}}^{(p+1)}\) and \(\varvec{\Sigma }^{(p+1)}\) are shown. Let us define

$$\begin{aligned}&\hat{\varvec{D}}^{(1)}:=\begin{pmatrix} -1 &{}{} 1 &{}{} &{}{} &{}{} &{}{} &{}{} &{}{} \\ {} &{}{} \ddots &{}{} \ddots &{}{} &{}{} &{}{} &{}{} &{}{} \\ {} &{}{} &{}{} -1 &{}{} 1 &{}{} &{}{} &{}{} \\ {} &{}{} &{}{} &{}{} s_1 &{}{} &{}{} &{}{} \\ {} &{}{} &{}{} &{}{} &{}{} &{}{} \ddots &{}{} \\ {} &{}{} &{}{} &{}{} &{}{} &{}{} &{}{} s_1 \\ \end{pmatrix}\in {\mathbb {R}}^{(l+p)\times (l+p)}\end{aligned}$$

by expanding \(\varvec{D}^{(1)}\) with \(s_1\ne 0\) and

$$\begin{aligned} \hat{\varvec{\Delta }}^{(q+1)}:=\begin{pmatrix} \frac{-1}{t_1-t_{-q+1}} &{} \frac{1}{t_1-t_{-q+1}} &{} &{} &{} \\ &{} \frac{-1}{t_2-t_{-q+2}} &{} \frac{1}{t_2-t_{-q+2}} &{} &{} \\ &{} &{} \ddots &{} \ddots &{} \\ &{} &{} &{} \frac{-1}{t_{l-1+q}-t_{l-1}} &{} \frac{1}{t_{l-1+q}-t_{l-1}} \\ &{} &{} &{} &{} s_{q+1} &{} &{} \\ &{} &{} &{} &{} &{} \ddots &{} \\ &{} &{} &{} &{} &{} &{} s_{q+1} \\ \end{pmatrix} \in {\mathbb {R}}^{(l+p)\times (l+p)} \end{aligned}$$

by expanding \(\varvec{\Delta }^{(q+1)}\) with \(s_{q+1}\ne 0\) for \(1\le q\le p\). Defining

$$\begin{aligned} \hat{\varvec{D}}^{(q+1)}:=\hat{\varvec{D}}^{(q)}\hat{\varvec{\Delta }}^{(q+1)} \end{aligned}$$

recursively, we see that there exists a \((p+1)\times (l+p)\) matrix \(\varvec{A}\) such that

$$\begin{aligned} \hat{\varvec{D}}^{(p+1)}= \begin{pmatrix} \varvec{D}^{(p+1)}\\ \varvec{A} \end{pmatrix}. \end{aligned}$$

By construction, \(\hat{\varvec{D}}^{(1)}\) and \(\hat{\varvec{\Delta }}^{(q+1)}\) are non-singular, and hence \(\hat{\varvec{D}}^{(p+1)}\) is also non-singular. It is not hard to see that

$$\begin{aligned} \big (\hat{\varvec{D}}^{(1)}\big )^{-1}= \begin{pmatrix} -\varvec{U} &{} \varvec{S} \\ &{} s_{1}^{-1}\varvec{I}_{p+1} \end{pmatrix} \end{aligned}$$
(B3)

and

$$\begin{aligned} \big (\hat{\varvec{\Delta }}^{(q+1)}\big )^{-1}= \begin{pmatrix} -(t_1-t_{-q+1}) &{} -(t_2-t_{-q+2}) &{} \cdots &{} -(t_{l-1+q}-t_{l-1}) &{} s_{q+1}^{-1} \\ &{} -(t_2-t_{-q+2}) &{} \cdots &{} -(t_{l-1+q}-t_{l-1}) &{} s_{q+1}^{-1} \\ &{} &{} \ddots &{} \vdots &{} \vdots \\ &{} &{} &{} -(t_{l-1+q}-t_{l-1}) &{} \vdots \\ &{} &{} &{} &{} s_{q+1}^{-1} &{} &{} \\ &{} &{} &{} &{} &{} \ddots &{} \\ &{} &{} &{} &{} &{} &{} s_{q+1}^{-1} \\ \end{pmatrix} \end{aligned}$$
(B4)

for \(1\le q\le p\), where \(\varvec{U}\) is the \((l-1)\times (l-1)\) upper triangular matrix whose non-zero elements all equal 1, and

$$\begin{aligned} \varvec{S}= \begin{pmatrix} s_{1}^{-1} &{}{} 0 &{}{} \ldots &{}{} 0 \\ \vdots &{}{} \vdots &{}{} &{}{} \vdots \\ s_{1}^{-1} &{}{} 0 &{}{} \ldots &{}{} 0 \\ \end{pmatrix} \in {\mathbb {R}}^{(l-1)\times (p+1)}. \end{aligned}$$

As a result, we can compute

$$\begin{aligned} \varvec{\Sigma }^{(p+1)}=\big (\hat{\varvec{D}}^{(p+1)}\big )^{-1}=\big (\hat{\varvec{\Delta }}^{(p+1)}\big )^{-1}\cdots \big (\hat{\varvec{\Delta }}^{(2)}\big )^{-1}\big (\hat{\varvec{D}}^{(1)}\big )^{-1} \end{aligned}$$

by using (B3) and (B4).
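The construction above translates directly into code. The following Python sketch (ours) builds \(\hat{\varvec{D}}^{(1)}\), \(\hat{\varvec{\Delta }}^{(q+1)}\), their product \(\hat{\varvec{D}}^{(p+1)}\), and \(\varvec{\Sigma }^{(p+1)}\) as its inverse, assuming a strictly increasing extended knot sequence \(t_{-p},\ldots ,t_{l+p}\) and the expansion constants set to 1; in practice the inverse can be assembled from the closed forms (B3) and (B4) instead of the generic solver used here.

```python
import numpy as np

def D1_hat(l, p, s1=1.0):
    """Expanded first-order difference matrix D^(1)-hat of size (l+p) x (l+p)."""
    n = l + p
    M = np.zeros((n, n))
    for i in range(l - 1):                 # bidiagonal block: rows 1, ..., l-1
        M[i, i], M[i, i + 1] = -1.0, 1.0
    for i in range(l - 1, n):              # trailing diagonal block with s_1
        M[i, i] = s1
    return M

def Delta_hat(q, l, p, knots, sq=1.0):
    """Expanded weighted difference matrix Delta^(q+1)-hat for 1 <= q <= p.
    `knots[j + p]` stores the knot t_j for j = -p, ..., l+p."""
    n = l + p
    M = np.zeros((n, n))
    for i in range(1, l + q):              # rows 1, ..., l-1+q
        a = 1.0 / (knots[i + p] - knots[i - q + p])   # 1 / (t_i - t_{i-q})
        M[i - 1, i - 1], M[i - 1, i] = -a, a
    for i in range(l - 1 + q, n):          # trailing diagonal block with s_{q+1}
        M[i, i] = sq
    return M

def sigma_matrix(l, p, knots):
    """D^(p+1)-hat = D^(1)-hat Delta^(2)-hat ... Delta^(p+1)-hat and its inverse Sigma^(p+1)."""
    D = D1_hat(l, p)
    for q in range(1, p + 1):
        D = D @ Delta_hat(q, l, p, knots)
    return np.linalg.inv(D)
```

For cubic splines (\(p=3\)), `knots` has length \(l+2p+1\); the resulting \(\varvec{\Sigma }^{(p+1)}\) maps the stacked vector \((\varvec{\beta }^\top ,g(\varvec{\beta })^\top )^\top \) back to the B-spline coefficient vector \(\varvec{\alpha }\), as used in the proofs of Appendix A.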

About this article

Cite this article

Yagishita, S., Gotoh, Jy. Exact penalty method for knot selection of B-spline regression. Japan J. Indust. Appl. Math. 41, 1033–1059 (2024). https://doi.org/10.1007/s13160-023-00631-5
