Exact penalty method for knot selection of B-spline regression

Yagishita, Shotaro; Gotoh, Jun-ya

doi:10.1007/s13160-023-00631-5

Original Paper
Published: 03 December 2023

Volume 41, pages 1033–1059, (2024)
Japan Journal of Industrial and Applied Mathematics

Abstract

This paper presents a new approach to selecting knots while simultaneously estimating the B-spline regression model. Such simultaneous selection of knots and model is not trivial, but our strategy makes it possible by adding a nonconvex regularization to the usual least squares method. More specifically, motivated by the constraint that directly designates (an upper bound on) the number of knots to be used, we present an (unconstrained) regularized least squares reformulation, which is later shown to be equivalent to the motivating cardinality-constrained formulation. The obtained formulation is further modified so that we can employ a proximal gradient-type algorithm, known as GIST, for a class of nonconvex nonsmooth optimization problems. We show that, under a mild technical assumption, the algorithm reaches a local minimum of the problem. Since any local minimum of the problem is shown to satisfy the cardinality constraint, the proposed algorithm can be used to obtain a spline regression model that depends on at most a designated number of knots. Numerical experiments demonstrate how our approach performs on synthetic and real data sets.

Data Availability

Data sets generated in this paper and MATLAB codes are available from the corresponding author on reasonable request.

References

1. De Boor, C.: A Practical Guide to Splines. Springer, New York (1978)
2. O’Sullivan, F.: A statistical perspective on ill-posed inverse problems. Stat. Sci. 502–518 (1986)
3. Eilers, P.H., Marx, B.D.: Flexible smoothing with B-splines and penalties. Stat. Sci. 11(2), 89–121 (1996)
4. Goepp, V., Bouaziz, O., Nuel, G.: Spline regression with automatic knot selection. arXiv preprint arXiv:1808.01770 (2018)
5. Yagishita, S., Gotoh, J.: Exact penalization at d-stationary points of cardinality-or rank-constrained problem. arXiv preprint arXiv:2209.02315 (2022)
6. Wright, S.J., Nowak, R.D., Figueiredo, M.A.: Sparse reconstruction by separable approximation. IEEE Trans. Signal Process. 57(7), 2479–2493 (2009)
7. Gong, P., Zhang, C., Lu, Z., Huang, J., Ye, J.: A general iterative shrinkage and thresholding algorithm for non-convex regularized optimization problems. In: International Conference on Machine Learning, pp. 37–45. PMLR (2013)
8. Gotoh, J., Takeda, A., Tono, K.: DC formulations and algorithms for sparse optimization problems. Math. Program. 169(1), 141–176 (2018)
9. Lu, Z., Li, X.: Sparse recovery via partial regularization: models, theory, and algorithms. Math. Oper. Res. 43(4), 1290–1316 (2018)
10. Bertsimas, D., Copenhaver, M.S., Mazumder, R.: The trimmed lasso: sparsity and robustness. arXiv preprint arXiv:1708.04527 (2017)
11. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)
12. Yagishita, S., Gotoh, J.: Pursuit of the cluster structure of network lasso: recovery condition and non-convex extension. arXiv preprint arXiv:2012.07491 (2020)
13. Kim, S.-J., Koh, K., Boyd, S., Gorinevsky, D.: $\ell _1$ trend filtering. SIAM Rev. 51(2), 339–360 (2009)
14. Tibshirani, R.J., Taylor, J.: The solution path of the generalized lasso. Ann. Stat. 39(3), 1335–1371 (2011)
15. Tibshirani, R.J.: Adaptive piecewise polynomial estimation via trend filtering. Ann. Stat. 42(1), 285–323 (2014)
16. Amir, T., Basri, R., Nadler, B.: The trimmed lasso: sparse recovery guarantees and practical optimization by the generalized soft-min penalty. SIAM J. Math. Data Sci. 3(3), 900–929 (2021)
17. Barzilai, J., Borwein, J.M.: Two-point step size gradient methods. IMA J. Numer. Anal. 8(1), 141–148 (1988)
18. Grippo, L., Lampariello, F., Lucidi, S.: A nonmonotone line search technique for Newton’s method. SIAM J. Numer. Anal. 23(4), 707–716 (1986)
19. Grippo, L., Sciandrone, M.: Nonmonotone globalization techniques for the Barzilai–Borwein gradient method. Comput. Optim. Appl. 23, 143–169 (2002)
20. Nakayama, S., Gotoh, J.: On the superiority of PGMs to PDCAs in nonsmooth nonconvex sparse regression. Optim. Lett. 15(8), 2831–2860 (2021)
21. Rippe, R.C., Meulman, J.J., Eilers, P.H.: Visualization of genomic changes by segmented smoothing using an L0 penalty. PLoS One 7(6), 38230 (2012)
22. Frommlet, F., Nuel, G.: An adaptive ridge procedure for L0 regularization. PLoS One 11(2), 0148620 (2016)
23. Hastie, T., Tibshirani, R.: Generalized additive models. Stat. Sci. 1(3), 297–310 (1986)
24. Alizadeh, F., Eckstein, J., Noyan, N., Rudolf, G.: Arrival rate approximation by nonnegative cubic splines. Oper. Res. 56(1), 140–156 (2008)

Acknowledgements

The authors would like to thank the anonymous reviewer for many insightful suggestions. J. Gotoh is supported in part by JSPS KAKENHI Grant 20H00285.

Author information

Authors and Affiliations

Department of Industrial and Systems Engineering, Chuo University, 1-13-27 Kasuga, Bunkyo-ku, Tokyo, 112-8551, Japan
Shotaro Yagishita
Department of Data Science for Business Innovation, Chuo University, 1-13-27 Kasuga, Bunkyo-ku, Tokyo, 112-8551, Japan
Jun-ya Gotoh

Corresponding author

Correspondence to Shotaro Yagishita.

Appendices

Appendix A: Proofs of propositions

The proofs omitted from the main body of the paper are given below.

1.1 Proof of Theorem 2.1

To prove Theorem 2.1 we start with the following lemma.

Lemma A.1

For $p\ge 0$, let $s=\sum _{j=-p}^{l-1}\alpha _{j+p+1}B^{(p)}_j,~ \varvec{\alpha }=(\alpha _1,\dots ,\alpha _{p+l})^\top $ and $i\in \{1,\ldots ,l-1\}$. The coefficient of the pth order term of s over the interval $(t_i,t_{i+1})$ is given by the ith element of the vector $\varvec{D}^{(p+1)}_+\varvec{\alpha }$, and that over the interval $(t_{i-1},t_{i})$ is given by the ith element of the vector $\varvec{D}^{(p+1)}_-\varvec{\alpha }$.

Proof

We proceed by induction on p. For $p=0$, we see that

$$\begin{aligned} s&=\sum _{j=0}^{l-1}\alpha _{j+1}\varvec{1}_{[t_j,t_{j+1})}\\&=\sum _{j=1}^{l}\alpha _{j}\varvec{1}_{[t_{j-1},t_j)}, \end{aligned}$$

$(\varvec{D}^{(1)}_+\varvec{\alpha })_i=\alpha _{i+1}$, and $(\varvec{D}^{(1)}_-\varvec{\alpha })_i=\alpha _i$. Thus the statement holds for $p=0$.

Next, we assume that the statement holds for $p-1$. We obtain from the definition of $B^{(p)}_j$ that

$$\begin{aligned} s(x)&=\sum _{j=-p}^{l-1}\alpha _{j+p+1}B^{(p)}_j(x)\\&=\sum _{j=-p}^{l-1}\alpha _{j+p+1}\left\{ \frac{x-t_{j}}{t_{j+p}-t_{j}}B^{(p-1)}_j(x)+\frac{t_{j+p+1}-x}{t_{j+p+1}-t_{j+1}}B^{(p-1)}_{j+1}(x)\right\} \\&=\sum _{j=-(p-1)}^{l-1}\left\{ \alpha _{j+p+1}\frac{x-t_{j}}{t_{j+p}-t_{j}}+\alpha _{j+p}\frac{t_{j+p}-x}{t_{j+p}-t_j}\right\} B^{(p-1)}_j(x)\\&=\sum _{j=-(p-1)}^{l-1}\left\{ \frac{\alpha _{j+p+1}-\alpha _{j+p}}{t_{j+p}-t_j}x-\frac{\alpha _{j+p+1}t_j-\alpha _{j+p}t_{j+p}}{t_{j+p}-t_j}\right\} B^{(p-1)}_j(x)\\&=\sum _{j=-(p-1)}^{l-1}\left\{ (\varvec{\Delta }^{(p+1)}\varvec{\alpha })_{j+p}x-\frac{\alpha _{j+p+1}t_j-\alpha _{j+p}t_{j+p}}{t_{j+p}-t_j}\right\} B^{(p-1)}_j(x)\\&=x\sum _{j=-(p-1)}^{l-1}(\varvec{\Delta }^{(p+1)}\varvec{\alpha })_{j+p}B^{(p-1)}_j(x)\\&\quad -\sum _{j=-(p-1)}^{l-1}\left\{ \frac{\alpha _{j+p+1}t_j-\alpha _{j+p}t_{j+p}}{t_{j+p}-t_j}\right\} B^{(p-1)}_j(x) \end{aligned}$$

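for all $x\in [t_0,t_l)$, where the third equality follows from $B^{(p-1)}_{-p}=B^{(p-1)}_l=0$ on $[t_0,t_l)$. The second sum is a piecewise polynomial of order at most $p-1$, so the pth order term of s comes only from the first sum, and its coefficient equals the coefficient of the $(p-1)$th order term of $\sum _{j=-(p-1)}^{l-1}(\varvec{\Delta }^{(p+1)}\varvec{\alpha })_{j+p}B^{(p-1)}_j$. Using the induction hypothesis, we see that the coefficient of the pth order term of s on the interval $(t_i,t_{i+1})$ is $(\varvec{D}^{(p)}_+\varvec{\Delta }^{(p+1)}\varvec{\alpha })_i=(\varvec{D}^{(p+1)}_+\varvec{\alpha })_i$, and that on the interval $(t_{i-1},t_{i})$ is $(\varvec{D}^{(p)}_-\varvec{\Delta }^{(p+1)}\varvec{\alpha })_i=(\varvec{D}^{(p+1)}_-\varvec{\alpha })_i$. This completes the proof. $\square $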
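The statement of Lemma A.1 can be checked numerically. The following is a minimal sketch, not the authors' code (their MATLAB implementation is available only on request): it evaluates s by the Cox-de Boor recursion used in the proof, applies the scaled differences $\varvec{\Delta }^{(p+1)},\ldots ,\varvec{\Delta }^{(2)}$ with rows $(v_{k+1}-v_k)/(t_k-t_{k-q})$, and compares the result with the coefficient of $x^p$ recovered from a pth forward difference. The function names, the knot values, and the coefficients `alpha` are hypothetical choices made for illustration, and the knots are assumed to be distinct.

```python
from math import comb, factorial

def bspline(j, q, x, knot):
    """Evaluate B^{(q)}_j(x) by the Cox-de Boor recursion; knot(k) returns t_k."""
    if q == 0:
        return 1.0 if knot(j) <= x < knot(j + 1) else 0.0
    left = (x - knot(j)) / (knot(j + q) - knot(j))
    right = (knot(j + q + 1) - x) / (knot(j + q + 1) - knot(j + 1))
    return left * bspline(j, q - 1, x, knot) + right * bspline(j + 1, q - 1, x, knot)

def spline(alpha, p, x, knot):
    """s(x) = sum_{j=-p}^{l-1} alpha_{j+p+1} B^{(p)}_j(x); alpha is stored 0-indexed."""
    l = len(alpha) - p
    return sum(alpha[j + p] * bspline(j, p, x, knot) for j in range(-p, l))

def scaled_differences(alpha, p, knot):
    """Apply Delta^{(p+1)}, ..., Delta^{(2)} in turn; the result has length l."""
    v = list(alpha)
    for q in range(p, 0, -1):
        # Row k (1-indexed) of Delta^{(q+1)}: (v_{k+1} - v_k) / (t_k - t_{k-q}).
        v = [(v[k] - v[k - 1]) / (knot(k) - knot(k - q)) for k in range(1, len(v))]
    return v

def leading_coefficient(f, a, b, p):
    """Coefficient of x^p of a degree-p polynomial f on (a, b), via a p-th forward difference."""
    h = (b - a) / (p + 2)
    diff = sum((-1) ** (p - m) * comb(p, m) * f(a + (m + 1) * h) for m in range(p + 1))
    return diff / (factorial(p) * h ** p)

# Hypothetical example data: p = 2, l = 4, distinct knots t_{-2}, ..., t_6.
p, l = 2, 4
ts = [-1.0, -0.4, 0.0, 0.7, 1.1, 2.0, 2.6, 3.1, 4.0]

def knot(k):
    return ts[k + p]

alpha = [0.3, -1.2, 2.0, 0.5, -0.7, 1.4]

def s(x):
    return spline(alpha, p, x, knot)

v = scaled_differences(alpha, p, knot)
max_gap = 0.0
for i in range(1, l):
    plus = leading_coefficient(s, knot(i), knot(i + 1), p)   # on (t_i, t_{i+1})
    minus = leading_coefficient(s, knot(i - 1), knot(i), p)  # on (t_{i-1}, t_i)
    # (D_+ alpha)_i is entry i+1 and (D_- alpha)_i is entry i of v (1-indexed).
    max_gap = max(max_gap, abs(plus - v[i]), abs(minus - v[i - 1]))
print(max_gap)  # expected to be at the level of rounding error
```

Here `v[i] - v[i - 1]` plays the role of $(\varvec{D}^{(p+1)}\varvec{\alpha })_i$, so the same vector `v` can be used to see which knots a given coefficient vector actually uses.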

Proof of Theorem 2.1

Note that $B^{(p)}_{-p},\ldots ,B^{(p)}_{l-1}$ restricted to $[t_0,t_l)$ form a basis of the linear space consisting of piecewise polynomials of order p on $[t_0,t_l)$ with breakpoints $t_1,\ldots ,t_{l-1}$ whose derivatives coincide up to order $p-1$ at all the breakpoints [1, pp. 97–98]. Thus the function s does not use the ith knot if and only if the coefficient of the pth order term of s over the interval $(t_i,t_{i+1})$ coincides with that over the interval $(t_{i-1},t_{i})$. Since it holds that $\varvec{D}^{(p+1)}=\varvec{D}^{(p+1)}_+-\varvec{D}^{(p+1)}_-$, $(\varvec{D}^{(p+1)}\varvec{\alpha })_{i}=0$ is equivalent to $(\varvec{D}^{(p+1)}_+\varvec{\alpha })_i=(\varvec{D}^{(p+1)}_-\varvec{\alpha })_i$. Therefore, we have the desired result from Lemma A.1. $\square $
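As a worked illustration of Lemma A.1 and Theorem 2.1 (an added example, not part of the original proofs), take $p=1$. Using the entries $(\varvec{\Delta }^{(2)}\varvec{\alpha })_{k}=(\alpha _{k+1}-\alpha _{k})/(t_{k}-t_{k-1})$ that appear in the proof of Lemma A.1, together with $(\varvec{D}^{(1)}_+\varvec{v})_i=v_{i+1}$ and $(\varvec{D}^{(1)}_-\varvec{v})_i=v_i$, the slopes of the piecewise linear function s on either side of $t_i$ and their difference are

$$\begin{aligned} (\varvec{D}^{(2)}_+\varvec{\alpha })_i&=\frac{\alpha _{i+2}-\alpha _{i+1}}{t_{i+1}-t_i},\\ (\varvec{D}^{(2)}_-\varvec{\alpha })_i&=\frac{\alpha _{i+1}-\alpha _{i}}{t_{i}-t_{i-1}},\\ (\varvec{D}^{(2)}\varvec{\alpha })_i&=\frac{\alpha _{i+2}-\alpha _{i+1}}{t_{i+1}-t_i}-\frac{\alpha _{i+1}-\alpha _{i}}{t_{i}-t_{i-1}}. \end{aligned}$$

Hence $(\varvec{D}^{(2)}\varvec{\alpha })_i=0$ exactly when the two slopes agree, that is, when s has no kink at $t_i$ and the ith knot is not used.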

1.2 Proof of Proposition 3.1

Proof

Let $\varvec{\beta }^*$ be a local minimum of (10), that is, there exists a neighborhood ${\mathcal {N}}$ of $\varvec{\beta }^*$ such that $F(\varvec{\beta }^*)\le F(\varvec{\beta })$ holds for any $\varvec{\beta }\in {\mathcal {N}}$. Noting that $F(\varvec{\beta })=G(\varvec{\beta },g(\varvec{\beta }))$ holds for any $\varvec{\beta }\in {\mathbb {R}}^{l-1}$, we have

$$\begin{aligned} G(\varvec{\beta }^*,g(\varvec{\beta }^*))=F(\varvec{\beta }^*)\le F(\varvec{\beta })\le G(\varvec{\beta },\varvec{\beta }') \end{aligned}$$

for any $\varvec{\beta }\in {\mathcal {N}}$ and $\varvec{\beta }'\in {\mathbb {R}}^{p+1}$, which implies that $(\varvec{\beta }^*,g(\varvec{\beta }^*))$ is locally optimal for (9). It is clear that the local optimality of $(\varvec{\beta }^*,g(\varvec{\beta }^*))$ for (9) and that of $\varvec{\Sigma }^{(p+1)}(\varvec{\beta }^{*\top },g(\varvec{\beta }^*)^\top )^\top $ for (8) are equivalent. This completes the proof of the former claim. The latter claim can be proved similarly. $\square $

1.3 Proof of Theorem 3.1

Proof

Let $h(\varvec{\beta }):=\frac{1}{2}\big \Vert \varvec{z}_1-\varvec{L}_1\varvec{\beta }\big \Vert _2^2+\frac{c}{2}\big \Vert \varvec{z}_2-\varvec{L}_2\varvec{\beta }\big \Vert _2^2$. We see from the d-stationarity of $\varvec{\beta }^*$ that

$$\begin{aligned} \begin{aligned}&\nabla h(\varvec{\beta }^*)^\top (0-\varvec{\beta }^*)+\gamma T_K'(\varvec{\beta }^*;-\varvec{\beta }^*)\\&\quad =h'(\varvec{\beta }^*;-\varvec{\beta }^*)+\gamma T_K'(\varvec{\beta }^*;-\varvec{\beta }^*)\\&\quad =F'(\varvec{\beta }^*;-\varvec{\beta }^*)\\&\quad \ge 0. \end{aligned} \end{aligned}$$

(A1)

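Since $T_K$ is positively homogeneous (it is the difference between the $\ell _1$ norm and the largest-K norm), we have $T_K((1-\xi )\varvec{\beta }^*)=(1-\xi )T_K(\varvec{\beta }^*)$ for $\xi \in (0,1)$, and hence $T_K'(\varvec{\beta }^*;-\varvec{\beta }^*)=-T_K(\varvec{\beta }^*)$. Therefore, we have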

$$\begin{aligned} \Vert \varvec{z}_1\Vert _2^2+c\Vert \varvec{z}_2\Vert _2^2\ge \big \Vert \varvec{z}_1-\varvec{L}_1\varvec{\beta }^*\big \Vert _2^2+c\big \Vert \varvec{z}_2-\varvec{L}_2\varvec{\beta }^*\big \Vert _2^2 \end{aligned}$$

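from (A1), the convexity of h, and the nonnegativity of $T_K$. Indeed, (A1) gives $\nabla h(\varvec{\beta }^*)^\top (0-\varvec{\beta }^*)\ge \gamma T_K(\varvec{\beta }^*)\ge 0$, and the convexity of h gives $h(0)\ge h(\varvec{\beta }^*)+\nabla h(\varvec{\beta }^*)^\top (0-\varvec{\beta }^*)\ge h(\varvec{\beta }^*)$. This leads to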

$$\begin{aligned} \big \Vert \varvec{z}_1-\varvec{L}_1\varvec{\beta }^*\big \Vert _2\le \sqrt{\Vert \varvec{z}_1\Vert _2^2+c\Vert \varvec{z}_2\Vert _2^2}, \quad \sqrt{c}\big \Vert \varvec{z}_2-\varvec{L}_2\varvec{\beta }^*\big \Vert _2\le \sqrt{\Vert \varvec{z}_1\Vert _2^2+c\Vert \varvec{z}_2\Vert _2^2} \end{aligned}$$

and hence we can evaluate as

$$\begin{aligned} \nabla h(\varvec{\beta }^*)^\top \varvec{d}&\le \Vert \nabla h(\varvec{\beta }^*)\Vert _\infty \\&=\max _{j=1,\ldots ,l-1}\left| {\varvec{l}^{(1)}_j}^\top (\varvec{z}_1-\varvec{L}_1\varvec{\beta }^*)+c{\varvec{l}^{(2)}_j}^\top (\varvec{z}_2-\varvec{L}_2\varvec{\beta }^*)\right| \\&\le \max _{j=1,\ldots ,l-1}\left\{ \Vert \varvec{l}^{(1)}_j\Vert _2\Vert \varvec{z}_1-\varvec{L}_1\varvec{\beta }^*\Vert _2+c\Vert \varvec{l}^{(2)}_j\Vert _2\Vert \varvec{z}_2-\varvec{L}_2\varvec{\beta }^*\Vert _2\right\} \\&\le \max _{j=1,\ldots ,l-1}\left\{ \Vert \varvec{l}^{(1)}_j\Vert _2\sqrt{\Vert \varvec{z}_1\Vert _2^2+c\Vert \varvec{z}_2\Vert _2^2}+\sqrt{c}\Vert \varvec{l}^{(2)}_j\Vert _2\sqrt{\Vert \varvec{z}_1\Vert _2^2+c\Vert \varvec{z}_2\Vert _2^2}\right\} \\&=\max _{j=1,\ldots ,l-1}\left\{ \Vert \varvec{l}^{(1)}_j\Vert _2+\sqrt{c}\Vert \varvec{l}^{(2)}_j\Vert _2\right\} \sqrt{\Vert \varvec{z}_1\Vert _2^2+c\Vert \varvec{z}_2\Vert _2^2} \end{aligned}$$

for all $\varvec{d}$ such that $\Vert \varvec{d}\Vert _1=1$ and $\varvec{d}\in \{-1,0,1\}^{l-1}$. From Lemma 4 of Yagishita and Gotoh [5], we obtain $\Vert \varvec{\beta }^*\Vert _0\le K$. Noting that

$$\begin{aligned} \begin{pmatrix} \varvec{D}^{(p+1)}\varvec{\Sigma }^{(p+1)}(\varvec{\beta }^{*\top },g(\varvec{\beta }^*)^\top )^\top \\ \varvec{A}\varvec{\Sigma }^{(p+1)}(\varvec{\beta }^{*\top },g(\varvec{\beta }^*)^\top )^\top \end{pmatrix} = \hat{\varvec{D}}^{(p+1)}\varvec{\Sigma }^{(p+1)} \begin{pmatrix} \varvec{\beta }^*\\ g(\varvec{\beta }^*) \end{pmatrix} = \begin{pmatrix} \varvec{\beta }^*\\ g(\varvec{\beta }^*) \end{pmatrix} \end{aligned}$$

and that $\varvec{\Sigma }^{(p+1)}(\varvec{\beta }^{*\top },g(\varvec{\beta }^*)^\top )^\top $ is a local minimum of (8) according to Proposition 3.1, we have the desired result. $\square $

1.4 Proof of Theorem 3.2

Proof

Let $\Omega :=\{\varvec{\beta }\mid F(\varvec{\beta })\le F(\varvec{\beta }_0)\}$. Note that $\{\varvec{\beta }_t\}\subset \Omega $. From the nonnegativity of $T_K$, it holds that $h(\varvec{\beta })\le F(\varvec{\beta }_0)$ for any $\varvec{\beta }\in \Omega $, which leads to

$$\begin{aligned} \big \Vert \varvec{z}_1-\varvec{L}_1\varvec{\beta }\big \Vert _2\le \sqrt{2F(\varvec{\beta }_0)}, \quad \big \Vert \varvec{z}_2-\varvec{L}_2\varvec{\beta }\big \Vert _2\le \sqrt{\frac{2F(\varvec{\beta }_0)}{c}}. \end{aligned}$$

Thus, we have

$$\begin{aligned} \Vert \nabla h(\varvec{\beta })\Vert _2&=\left\| \varvec{L}_1^\top (\varvec{z}_1-\varvec{L}_1\varvec{\beta })+c\varvec{L}_2^\top (\varvec{z}_2-\varvec{L}_2\varvec{\beta })\right\| _2\\&\le \Vert \varvec{L}_1\Vert \left\| \varvec{z}_1-\varvec{L}_1\varvec{\beta }\right\| _2+c\Vert \varvec{L}_2\Vert \left\| \varvec{z}_2-\varvec{L}_2\varvec{\beta }\right\| _2\\&\le \Vert \varvec{L}_1\Vert \sqrt{2F(\varvec{\beta }_0)}+\Vert \varvec{L}_2\Vert \sqrt{2cF(\varvec{\beta }_0)} \end{aligned}$$

for any $\varvec{\beta }\in \Omega $, where $\Vert \cdot \Vert $ is the operator norm, which implies that h is Lipschitz continuous on $\Omega $. Furthermore, $T_K$ is also Lipschitz continuous because it is expressed as the difference between the $\ell _1$ norm and the largest-K norm. As a result, F is Lipschitz continuous, and in particular uniformly continuous, on $\Omega $. Combining the uniform continuity and nonnegativity of F with the Lipschitz continuity of $\nabla h$ yields $\Vert \varvec{\beta }_{t+1}-\varvec{\beta }_t\Vert _2\rightarrow 0$, similarly to the proof of Lemma 4 of Wright et al. [6]. Let $\varvec{\beta }^*$ be an accumulation point of $\{\varvec{\beta }_t\}$ and $\{\varvec{\beta }_{t_i}\}$ be a subsequence that converges to $\varvec{\beta }^*$. Since $\{\eta _t\}$ is bounded (see, for example, Lu and Li [9, Theorem 5.1]), for any $\varvec{d}\in {\mathbb {R}}^{l-1}$, it follows from the optimality of $\varvec{\beta }_{t_i+1}$ that

$$\begin{aligned}&\nabla h(\varvec{\beta }_{t_i})^\top \varvec{\beta }_{t_i+1}+\frac{\eta _{t_i}}{2}\Vert \varvec{\beta }_{t_i+1}-\varvec{\beta }_{t_i}\Vert _2^2+\gamma T_K(\varvec{\beta }_{t_i+1})\\&\quad \le \nabla h(\varvec{\beta }_{t_i})^\top (\varvec{\beta }^*+\xi \varvec{d})+\frac{\eta _{t_i}}{2}\Vert \varvec{\beta }^*+\xi \varvec{d}-\varvec{\beta }_{t_i}\Vert _2^2+\gamma T_K(\varvec{\beta }^*+\xi \varvec{d})\\&\quad \le \nabla h(\varvec{\beta }_{t_i})^\top (\varvec{\beta }^*+\xi \varvec{d})+\frac{\eta _{\max }}{2}\Vert \varvec{\beta }^*+\xi \varvec{d}-\varvec{\beta }_{t_i}\Vert _2^2+\gamma T_K(\varvec{\beta }^*+\xi \varvec{d}) \end{aligned}$$

for $\xi >0$, where $\eta _{\max }:=\sup _t\eta _t$. Letting $i\rightarrow \infty $ and using $\Vert \varvec{\beta }_{t_i+1}-\varvec{\beta }_{t_i}\Vert _2\rightarrow 0$ together with the continuity of $\nabla h$ and $T_K$, we obtain

$$\begin{aligned} \xi \nabla h(\varvec{\beta }^*)^\top \varvec{d}+\frac{\eta _{\max }\xi ^2}{2}\Vert \varvec{d}\Vert _2^2+\gamma T_K(\varvec{\beta }^*+\xi \varvec{d})-\gamma T_K(\varvec{\beta }^*)\ge 0. \end{aligned}$$

Dividing both sides by $\xi $ and taking the limit $\xi \rightarrow 0$ gives

$$\begin{aligned} F'(\varvec{\beta }^*;\varvec{d})=\nabla h(\varvec{\beta }^*)^\top \varvec{d}+\gamma T_K'(\varvec{\beta }^*;\varvec{d})\ge 0, \end{aligned}$$

which implies that $\varvec{\beta }^*$ is a d-stationary point of (10), that is, a local minimum of (10).

Next, let $\varvec{\alpha }^*$ be an accumulation point of $\big \{\varvec{\Sigma }^{(p+1)}(\varvec{\beta }_t^\top ,g(\varvec{\beta }_t)^\top )^\top \big \}$ and $\big \{\varvec{\Sigma }^{(p+1)}(\varvec{\beta }_{t_i}^\top ,g(\varvec{\beta }_{t_i})^\top )^\top \big \}$ be a subsequence that converges to $\varvec{\alpha }^*$. We see that

$$\begin{aligned} \begin{aligned} \begin{pmatrix} \varvec{\beta }_{t_i}\\ g(\varvec{\beta }_{t_i}) \end{pmatrix} =\hat{\varvec{D}}^{(p+1)}\varvec{\Sigma }^{(p+1)} \begin{pmatrix} \varvec{\beta }_{t_i}\\ g(\varvec{\beta }_{t_i}) \end{pmatrix} \rightarrow \hat{\varvec{D}}^{(p+1)}\varvec{\alpha }^*= \begin{pmatrix} \varvec{D}^{(p+1)}\varvec{\alpha }^*\\ \varvec{A}\varvec{\alpha }^* \end{pmatrix}, \end{aligned} \end{aligned}$$

(A2)

which implies that $\varvec{D}^{(p+1)}\varvec{\alpha }^*$ is a local minimum of (10) because it is an accumulation point of $\{\varvec{\beta }_t\}$. It follows from (A2) and the continuity of g that $g(\varvec{D}^{(p+1)}\varvec{\alpha }^*)=\varvec{A}\varvec{\alpha }^*$ and hence we obtain from Proposition 3.1 that

$$\begin{aligned} \varvec{\alpha }^*=\varvec{\Sigma }^{(p+1)}\hat{\varvec{D}}^{(p+1)}\varvec{\alpha }^*=\varvec{\Sigma }^{(p+1)} \begin{pmatrix} \varvec{D}^{(p+1)}\varvec{\alpha }^*\\ g(\varvec{D}^{(p+1)}\varvec{\alpha }^*) \end{pmatrix} \end{aligned}$$

is locally optimal to (8). $\square $
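The update whose optimality is used in the proof above can be sketched in code. This is a hedged illustration rather than the authors' implementation: it takes $T_K$ to be the $\ell _1$ norm minus the largest-K norm, as stated in the proof, and with penalty weight $\gamma $ as in $F=h+\gamma T_K$ it solves the subproblem by completing the square, which turns it into a proximal step at $\varvec{\beta }_t-\nabla h(\varvec{\beta }_t)/\eta _t$ with weight $\gamma /\eta _t$. The closed form used for that proximal step (keep the K largest-magnitude entries, soft-threshold the rest) is not stated in this appendix and is supplied here as an assumption; all function names are hypothetical.

```python
def trimmed_l1(beta, K):
    """T_K(beta): the l1 norm minus the sum of the K largest absolute values."""
    mags = sorted((abs(b) for b in beta), reverse=True)
    return sum(mags[K:])

def soft_threshold(v, lam):
    """Minimizer of 0.5 * (b - v)**2 + lam * |b| over a scalar b."""
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0

def prox_trimmed_l1(v, K, lam):
    """One minimizer of 0.5 * ||b - v||^2 + lam * T_K(b).

    Assumed closed form: the K largest-magnitude entries of v are kept
    unchanged and every other entry is soft-thresholded by lam.
    """
    order = sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)
    keep = set(order[:K])
    return [v[i] if i in keep else soft_threshold(v[i], lam) for i in range(len(v))]

def proximal_gradient_step(beta, grad, eta, gamma, K):
    """Minimize grad^T b + (eta / 2) * ||b - beta||^2 + gamma * T_K(b) over b."""
    v = [b - g / eta for b, g in zip(beta, grad)]
    return prox_trimmed_l1(v, K, gamma / eta)

# Hypothetical usage: one step from beta with a made-up gradient vector.
beta = [0.3, -2.0, 1.1, -0.2]
grad = [0.0, 0.0, 0.0, 0.0]
new_beta = proximal_gradient_step(beta, grad, eta=1.0, gamma=0.5, K=2)
print(new_beta, trimmed_l1(new_beta, 2))
```

When the returned point has at most K non-zero entries, `trimmed_l1` evaluates to zero, which is the situation described by the cardinality bound $\Vert \varvec{\beta }^*\Vert _0\le K$ in the proof of Theorem 3.1.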

Appendix B: Construction of expanded difference matrix and its inverse

In this section, concrete constructions of $\hat{\varvec{D}}^{(p+1)}$ and $\varvec{\Sigma }^{(p+1)}$ are shown. Let us define

$$\begin{aligned} \hat{\varvec{D}}^{(1)}:=\begin{pmatrix} -1 &{} 1 &{} &{} &{} &{} \\ &{} \ddots &{} \ddots &{} &{} &{} \\ &{} &{} -1 &{} 1 &{} &{} \\ &{} &{} &{} s_1 &{} &{} \\ &{} &{} &{} &{} \ddots &{} \\ &{} &{} &{} &{} &{} s_1 \\ \end{pmatrix}\in {\mathbb {R}}^{(l+p)\times (l+p)} \end{aligned}$$

by expanding $\varvec{D}^{(1)}$ with $s_1\ne 0$ and

$$\begin{aligned} \hat{\varvec{\Delta }}^{(q+1)}:=\begin{pmatrix} \frac{-1}{t_1-t_{-q+1}} &{} \frac{1}{t_1-t_{-q+1}} &{} &{} &{} \\ &{} \frac{-1}{t_2-t_{-q+2}} &{} \frac{1}{t_2-t_{-q+2}} &{} &{} \\ &{} &{} \ddots &{} \ddots &{} \\ &{} &{} &{} \frac{-1}{t_{l-1+q}-t_{l-1}} &{} \frac{1}{t_{l-1+q}-t_{l-1}} \\ &{} &{} &{} &{} s_{q+1} &{} &{} \\ &{} &{} &{} &{} &{} \ddots &{} \\ &{} &{} &{} &{} &{} &{} s_{q+1} \\ \end{pmatrix} \in {\mathbb {R}}^{(l+p)\times (l+p)} \end{aligned}$$

by expanding $\varvec{\Delta }^{(q+1)}$ with $s_{q+1}\ne 0$ for $1\le q\le p$. Let

$$\begin{aligned} \hat{\varvec{D}}^{(q+1)}:=\hat{\varvec{D}}^{(q)}\hat{\varvec{\Delta }}^{(q+1)} \end{aligned}$$

recursively for $1\le q\le p$. Then there exists a $(p+1)\times (l+p)$ matrix $\varvec{A}$ such that

$$\begin{aligned} \hat{\varvec{D}}^{(p+1)}= \begin{pmatrix} \varvec{D}^{(p+1)}\\ \varvec{A} \end{pmatrix}. \end{aligned}$$

By construction, $\hat{\varvec{D}}^{(1)}$ and $\hat{\varvec{\Delta }}^{(q+1)}$ are upper triangular with non-zero diagonal entries and hence non-singular, so $\hat{\varvec{D}}^{(p+1)}$ is also non-singular. It is not hard to see that

$$\begin{aligned} \big (\hat{\varvec{D}}^{(1)}\big )^{-1}= \begin{pmatrix} -\varvec{U} &{} \varvec{S} \\ &{} s_{1}^{-1}\varvec{I}_{p+1} \end{pmatrix} \end{aligned}$$

(B3)

and

$$\begin{aligned} \big (\hat{\varvec{\Delta }}^{(q+1)}\big )^{-1}= \begin{pmatrix} -(t_1-t_{-q+1}) &{} -(t_2-t_{-q+2}) &{} \cdots &{} -(t_{l-1+q}-t_{l-1}) &{} s_{q+1}^{-1} \\ &{} -(t_2-t_{-q+2}) &{} \cdots &{} -(t_{l-1+q}-t_{l-1}) &{} s_{q+1}^{-1} \\ &{} &{} \ddots &{} \vdots &{} \vdots \\ &{} &{} &{} -(t_{l-1+q}-t_{l-1}) &{} \vdots \\ &{} &{} &{} &{} s_{q+1}^{-1} &{} &{} \\ &{} &{} &{} &{} &{} \ddots &{} \\ &{} &{} &{} &{} &{} &{} s_{q+1}^{-1} \\ \end{pmatrix} \end{aligned}$$

(B4)

for $1\le q\le p$, where $\varvec{U}$ is the upper triangular matrix of size $(l-1)\times (l-1)$ whose entries on and above the diagonal all equal 1, $\varvec{I}_{p+1}$ is the identity matrix of size $p+1$, and

$$\begin{aligned} \varvec{S}= \begin{pmatrix} s_{1}^{-1} &{}{} 0 &{}{} \ldots &{}{} 0 \\ \vdots &{}{} \vdots &{}{} &{}{} \vdots \\ s_{1}^{-1} &{}{} 0 &{}{} \ldots &{}{} 0 \\ \end{pmatrix} \in {\mathbb {R}}^{(l-1)\times (p+1)}. \end{aligned}$$

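As a result, $\varvec{\Sigma }^{(p+1)}$, the inverse of $\hat{\varvec{D}}^{(p+1)}=\hat{\varvec{D}}^{(1)}\hat{\varvec{\Delta }}^{(2)}\cdots \hat{\varvec{\Delta }}^{(p+1)}$, can be computed as

$$\begin{aligned} \varvec{\Sigma }^{(p+1)}=\big (\hat{\varvec{D}}^{(p+1)}\big )^{-1}=\big (\hat{\varvec{\Delta }}^{(p+1)}\big )^{-1}\cdots \big (\hat{\varvec{\Delta }}^{(2)}\big )^{-1}\big (\hat{\varvec{D}}^{(1)}\big )^{-1} \end{aligned}$$

by using (B3) and (B4).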
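The construction in this appendix can be reproduced numerically. The sketch below is an illustration under stated assumptions, not the authors' code: it builds $\hat{\varvec{D}}^{(1)}$ and $\hat{\varvec{\Delta }}^{(q+1)}$ entry by entry, builds their inverses from the patterns in (B3) and (B4), and checks that the product of the inverses taken in reverse order inverts $\hat{\varvec{D}}^{(p+1)}$. It assumes that $\varvec{\Sigma }^{(p+1)}$ denotes the inverse of $\hat{\varvec{D}}^{(p+1)}$, as the identities in Appendix A use it. The helper names, the knot values, and the choices of $s_1,\ldots ,s_{p+1}$ are hypothetical.

```python
def zeros(n):
    return [[0.0] * n for _ in range(n)]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def weight(q, r, knot):
    """Knot spacing t_r - t_{r-q} used in row r; D-hat^{(1)} (q = 0) uses weight 1."""
    return 1.0 if q == 0 else knot(r) - knot(r - q)

def expanded_factor(q, l, p, knot, s):
    """D-hat^{(1)} for q = 0 and Delta-hat^{(q+1)} for 1 <= q <= p; s is s_{q+1}."""
    n, m = l + p, l - 1 + q
    M = zeros(n)
    for r in range(1, n + 1):
        if r <= m:
            # Difference row: -1/w on the diagonal, 1/w on the superdiagonal.
            M[r - 1][r - 1] = -1.0 / weight(q, r, knot)
            M[r - 1][r] = 1.0 / weight(q, r, knot)
        else:
            # Expansion row: the non-zero constant s on the diagonal.
            M[r - 1][r - 1] = s
    return M

def expanded_factor_inverse(q, l, p, knot, s):
    """Closed-form inverse following the patterns of (B3) and (B4)."""
    n, m = l + p, l - 1 + q
    M = zeros(n)
    for i in range(1, m + 1):
        for k in range(i, m + 1):
            M[i - 1][k - 1] = -weight(q, k, knot)
        M[i - 1][m] = 1.0 / s  # column m + 1 carries 1/s in the first m rows
    for r in range(m + 1, n + 1):
        M[r - 1][r - 1] = 1.0 / s
    return M

# Hypothetical example data: p = 2, l = 4, distinct knots t_{-2}, ..., t_6.
p, l = 2, 4
ts = [-1.0, -0.4, 0.0, 0.7, 1.1, 2.0, 2.6, 3.1, 4.0]

def knot(k):
    return ts[k + p]

s_values = [1.0, 2.0, 0.5]  # s_1, s_2, s_3, all non-zero

n = l + p
D_hat = expanded_factor(0, l, p, knot, s_values[0])
Sigma = expanded_factor_inverse(0, l, p, knot, s_values[0])
for q in range(1, p + 1):
    D_hat = matmul(D_hat, expanded_factor(q, l, p, knot, s_values[q]))
    Sigma = matmul(expanded_factor_inverse(q, l, p, knot, s_values[q]), Sigma)

product = matmul(D_hat, Sigma)
max_err = max(abs(product[i][j] - (1.0 if i == j else 0.0)) for i in range(n) for j in range(n))
print(max_err)  # expected to be at the level of rounding error
```

Each factor is upper triangular, so in a practical implementation the inverse could also be applied by back substitution instead of forming the dense matrices; the dense form is kept here only to mirror (B3) and (B4).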

About this article

Cite this article

Yagishita, S., Gotoh, Jy. Exact penalty method for knot selection of B-spline regression. Japan J. Indust. Appl. Math. 41, 1033–1059 (2024). https://doi.org/10.1007/s13160-023-00631-5

Received: 11 April 2023
Revised: 09 October 2023
Accepted: 26 October 2023
Published: 03 December 2023
Issue Date: May 2024
DOI: https://doi.org/10.1007/s13160-023-00631-5
