The term ‘spline’ refers to a craftsman’s tool: a thin, flexible strip of wood or metal used to draft smooth curves. Weights were applied at various positions so that the strip would bend according to their number and placement, while being forced to pass through a set of fixed points: metal pins, the ribs of a boat, etc. On a flat surface these were often weights with an attached hook, and thus easy to manipulate. The bent material would naturally take the shape of a spline curve. Similarly, splines are used in statistics to reproduce flexible shapes mathematically. Knots are placed at several positions within the data range to identify the points where adjacent functional pieces join each other. Instead of strips of metal or wood, smooth functional pieces (usually low-order polynomials) are chosen to fit the data between two consecutive knots. The type of polynomial and the number and placement of knots are what then define the type of spline.
Motivating example
With the introduction of generalized additive models (GAMs) [15] in 1986, the use of spline modelling has become an established tool in statistical regression analysis. To illustrate this, consider data on a set of 892 females under 50 years of age collected in three villages in West Africa (data available in the Additional file 1: Appendix). We would like to explore the relationship between age (in years) and a crude measure of body fat, triceps skinfold thickness. Figure 1 shows the relationship between age and triceps skinfold thickness measured on a logarithmic scale. For more information about the data see [3, 23].
A simple regression model of the form yi=β0+β1xi+εi, i=1,...,n, would hardly give an adequate approximation of the observed pattern, since it is obvious that the relationship is not linear. The model can be extended to accommodate non-linear effects using polynomials. For instance, a non-linear effect could be modelled by a polynomial of degree 3:
$$ y_{i}=\alpha_{0}+\alpha_{1} u_{i}+\alpha_{2} u_{i}^{2}+\alpha_{3} u_{i}^{3}+\epsilon_{i} $$
(1)
where u is a function of x called the basis function. Evaluating the basis functions at the n observations yields the basis matrix:
$$U=\left[ \begin{array}{cccc} 1 & x_{1} & x_{1}^{2} & x_{1}^{3}\\ \vdots & \vdots & \vdots & \vdots \\ 1 & x_{n} & x_{n}^{2} & x_{n}^{3} \end{array}\right] $$
The regression model described in Eq. 1 is still a linear model, despite the fact that it provides a non-linear function of the predictor variable: the model remains linear in the coefficients and can be fitted using ordinary least squares methods. The basis can be created in R using the function poly(x,3), whose inputs are x (the variable) and the degree of the polynomial (here 3). This leads to a simple univariate smooth model of the form yi=f(xi)+εi, where f() is some function/transformation of the predictor. Such a model can be easily fitted in R using lm(y ~ poly(x,3)). Despite its simplicity, polynomial regression has several drawbacks, the most important being non-locality: the fitted function at a given value x0 depends on data values far from that point. It is easy to see this in action by fitting a polynomial to a set of data and moving one of the data points near the right edge up or down; the fitted function will typically change far from that x coordinate.
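To make the "linear in the coefficients" point concrete, here is a minimal Python sketch (the paper's own code is in R): it builds the basis matrix U from Eq. 1 and fits the coefficients by ordinary least squares via the normal equations. The data, function names and solver are illustrative, not part of the original article.

```python
# Sketch: cubic polynomial regression as a linear model in the basis U.
# Pure Python; data are simulated, not the skinfold data from the paper.

def poly_basis(xs, degree=3):
    """Rows [1, x, x^2, ..., x^degree] -- the basis matrix U."""
    return [[x ** p for p in range(degree + 1)] for x in xs]

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small linear system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        beta[r] = (M[r][n] - sum(M[r][c] * beta[c]
                                 for c in range(r + 1, n))) / M[r][r]
    return beta

def ols(xs, ys, degree=3):
    """Ordinary least squares: solve (U'U) beta = U'y."""
    U = poly_basis(xs, degree)
    k = degree + 1
    UtU = [[sum(U[i][a] * U[i][b] for i in range(len(xs)))
            for b in range(k)] for a in range(k)]
    Uty = [sum(U[i][a] * ys[i] for i in range(len(xs))) for a in range(k)]
    return solve(UtU, Uty)

xs = [i / 10 for i in range(11)]
ys = [x ** 3 - x for x in xs]   # data generated from a cubic
beta = ols(xs, ys)              # recovers [0, -1, 0, 1] up to rounding
```

Because the data are generated exactly from a cubic, the least squares fit recovers the generating coefficients; with noisy data the same machinery returns the best-fitting cubic instead.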
Consider, instead of fitting a global polynomial, partitioning the range of x into smaller intervals using an arbitrary number of points τ, also called knots. A simple piecewise continuous model can be fitted by defining the functions f1(x)=1, f2(x)=x, f3(x)=(x−τ1)+, f4(x)=(x−τ2)+, ..., where the subscript “+” denotes the positive-part function:
$$u_{+}=\left\{ \begin{array}{cc} u, & \text{if}\, u>0\\ 0, & \text{if}\, u\leq 0 \end{array}\right. $$
Taken together, these functions define a composite function f(x) that is piecewise linear and continuous at the knots.
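A short Python sketch of this composite function (knot positions and coefficients are illustrative choices, not values from the data):

```python
# Sketch: piecewise linear spline via the truncated basis f1..f4 above.

def pos(u):
    """The positive-part function u_+."""
    return u if u > 0 else 0.0

def linear_spline_basis(x, knots):
    """[1, x, (x - tau_1)_+, (x - tau_2)_+, ...]"""
    return [1.0, x] + [pos(x - t) for t in knots]

def f(x, beta, knots):
    """Composite function f(x) = sum_k beta_k f_k(x)."""
    return sum(b * v for b, v in zip(beta, linear_spline_basis(x, knots)))

# With beta = [0, 1, -2, 2] and knots at 0.3 and 0.6, the slope changes
# from 1 to -1 at the first knot and back to 1 at the second, while f
# itself remains continuous at both knots.
beta, knots = [0.0, 1.0, -2.0, 2.0], [0.3, 0.6]
```

Each coefficient on a truncated term is a *change of slope* at its knot, which is what makes this basis so easy to interpret.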
Definition of splines
The draftsman’s metal spline can assume arbitrary shapes, for instance the cross-section of an airplane wing or the spiral of a centrifugal pump. For statistical applications we will assume curves of the form f(X), i.e., a single y value for each x. The predictor x can consist of a single variable or multiple variables; our discussion will focus almost entirely on a univariate function with \(X\in \mathbb {R}\). Define a set of knots τ1<...<τK in the range of X. A spline f(X) will be a smooth function, satisfying the differentiability properties mentioned below, such that f(X) is a polynomial of degree d on each interval between consecutive knots. Wooden or metal splines have continuous derivatives of all orders since they are physical objects. This is not true for statistical splines; rather, we impose the smoothness criterion that all derivatives of order less than d are continuous. A physical spline is linear beyond the last knot, and we may impose the further constraint that derivatives of order 2 or greater are zero at the leftmost and rightmost knots; splines with this additional constraint are known as “restricted” or “natural” splines. To obtain more flexible curves, the number of knots or the degree of the polynomial can be increased. There is, however, a trade-off: increasing the number of knots may overfit the data and increase the variance, whilst decreasing the number of knots may result in a rigid and restrictive function with more bias.
Representation by basis functions
Assume that the unknown function f is represented by a spline function with fixed knot sequence and fixed degree d. Because such spline functions form a vector space V, it is possible to write f as
$$ f(X)=\sum\limits_{k=1}^{K+d+1}\beta_{k} B_{k} (X) \,, $$
(2)
where the Bk are a set of basis functions defining V and the βk are the associated spline coefficients. With K knots there are K+1 polynomials of degree d along with d·K constraints, leading to (d+1)(K+1)−d·K=d+K+1 free parameters [9, 41]; for a natural spline there are K free parameters. Since βB=(βA)(A−1B)=γB∗ for any nonsingular matrix A, there is an infinite number of possible basis sets for the spline fit.
The representation in (2) has the advantage that the estimation of f reduces to the estimation of the coefficients βk. More specifically, the expression in (2) is linear in the coefficient vector β=(β1,...,βK+d+1). Therefore the estimation of f can be viewed as an optimization problem that is linear in the transformed variables B1(X),...,BK+d+1(X), allowing the use of well-established estimation techniques for splines in a broad range of (generalized) multivariable regression models. Importantly, spline modelling reduces the estimation of the function f() to the estimation of a small set of real-valued coefficients.
As pointed out by various authors (e.g., [9, 12, 41]), the high flexibility of spline modelling comes at the price of a number of tuning parameters. Two of these, the choice of the basis functions B and the degree d of the underlying polynomials, turn out to have little impact: spline fits are remarkably robust to the degree d. Cubic polynomials (d=3) are the usual standard, as they result in curves that appear perfectly smooth to the human eye. If derivatives of the fitted curves are of interest, a higher order is sometimes appropriate, but in general fits for d>3 are effectively indistinguishable. Fits with d=1 or d=2 have nearly identical statistical properties but will appear more jagged. The choice between two basis sets B and B∗ will by definition not change the predictions from a fit, and so comes down to convenience.
The two key choices are the number and spacing of the knots and the use (or not) of a penalty function, e.g., the integrated second derivative of the spline. When there is no penalty, the creation of the transformed variables can be done separately and the new variables are simply included in a standard model fit; no modification of the underlying regression procedure is required. This approach is often referred to as regression splines; the flexibility of the resulting non-linear function depends entirely on the number of knots. The inclusion of a smoothing penalty, on the other hand, requires a modification of the fitting routine to accommodate it, which has to be implemented in each regression function separately. The resulting smoothing splines have several desirable properties, but the added complexity can be a reason why they are not used more often in applied settings.
Although considerable research has been conducted to explore the mathematical properties of the various spline approaches (see [4, 11, 13, 37, 41]), applied statisticians and data analysts hardly seem to be aware of these results when using spline modelling in practical applications. In fact, many of the articles identified by our web search contained no justification of the rationale for the choice of spline method used.
Popular spline bases
There are numerous options for the definition of the basis functions Bk, where the various spline bases differ with respect to their numerical properties [4, 41]. In this section, we will introduce some of the most popular spline bases, namely the truncated power series basis, the B-spline basis and the cardinal spline basis.
Truncated power series and cubic splines
The truncated power series basis is defined by the basis functions
$$B_{1}(x) = 1, B_{2}(x) = x,..., B_{d+1}(x) = x^{d}, $$
$$B_{d+2}(x) = (x- \tau_{1})_{+}^{d},..., B_{K+d+1}(x) = (x -\tau_{K})_{+}^{d} $$
An advantage of the basis functions above is their easy interpretation: starting with a “basic” polynomial of degree d defined on [a,b] (first line of the equation), deviations from the basic polynomial are successively added to the spline function to the right of each of the K knots (second line). A truncated power basis spline is d−1 times differentiable at the knots and has d+K degrees of freedom. It is relatively easy for the user to create a truncated power series basis in R. Let x represent some observations in [0,1]; then a truncated power basis of degree d=3 with 5 knots equally spaced along the range of x can be created using Code 1 in the Additional file 1: Appendix (Fig. 2).
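The appendix’s Code 1 is written in R; the same construction can be sketched in a few lines of Python for readers following along in another language (knot placement and the evaluation point are illustrative):

```python
# Sketch: cubic truncated power basis with 5 equally spaced interior knots
# on [0, 1], mirroring the construction described in the text.

def truncated_power_basis(x, knots, degree=3):
    """B_1..B_{d+1}: global polynomial terms; the rest: truncated powers."""
    poly = [x ** p for p in range(degree + 1)]
    trunc = [max(x - t, 0.0) ** degree for t in knots]
    return poly + trunc

K = 5
knots = [(k + 1) / (K + 1) for k in range(K)]   # 5 knots in (0, 1)
row = truncated_power_basis(0.5, knots)
# the basis has K + d + 1 = 9 functions; truncated terms for knots at or
# beyond x = 0.5 evaluate to zero
```

Each observation contributes one such row, and stacking the rows over all observations gives the design matrix passed to the regression routine.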
A feature of the truncated power series is that the supports of the basis functions are not local: some of the Bk are defined over the whole range of data [a,b]. This might lead to high correlations between some basis splines, implying numerical instabilities in the spline estimation. For the truncated power series basis, an example is given in [9], Chapter 5.
Cubic splines are created by using a cubic polynomial in each interval between two successive knots. The spline has four parameters in each of the K+1 regions minus three constraints at each knot, resulting in K+4 degrees of freedom.
A cubic spline function with three knots (τ1,τ2,τ3) will have 7 degrees of freedom. Using the representation given in Eq. 2, the function can be written as:
$$ f(X)= \beta_{0} + \beta_{1} X + \beta_{2} X^{2} + \beta_{3} X^{3} + \beta_{4} (X-\tau_{1})_{+}^{3} + \beta_{5} (X-\tau_{2})_{+}^{3} + \beta_{6} (X-\tau_{3})_{+}^{3} $$
B-splines
The B-spline basis is a commonly used spline basis, based on a special parametrisation of a polynomial spline (most commonly cubic). The B-spline basis [4] is based on the knot sequence
$$\begin{aligned} \xi_{1} \le \ldots &\le \xi_{d} \le \xi_{d+1} < \xi_{d+2} < \ldots < \xi_{d + K + 1} \\ &< \xi_{d + K + 2} \le \xi_{d + K + 3} \le \ldots \le \xi_{2d + K + 2} \,, \end{aligned} $$
where the sets ξd+2 := τ1,…,ξd+K+1:=τK and ξd+1:=a,ξd+K+2:=b are referred to as “inner knots” and “boundary knots”, respectively. The choice of the additional knots ξ1,…,ξd and ξd+K+3,…,ξ2d+K+2 is essentially arbitrary. A common strategy is to set them equal to the boundary knots. Alternatively, if the inner knots and the boundary knots ξd+1<…<ξd+K+2 are chosen to be equidistant, i.e., ξk+1−ξk=δ ∀k∈{d+1,…,d+K+1}, the boundary knots may be placed at ξd+1−δ,…,ξd+1−d·δ and ξd+K+2+δ,…,ξd+K+2+d·δ.
For d>0, B-spline basis functions of degree d (denoted by \(B_{k}^{d}(x)\)) are defined by the recursive formula
$$ \begin{aligned} B_{k}^{d}(x)&=\frac{x-\xi_{k}}{\xi_{k+d}-\xi_{k}}B_{k}^{d-1}(x)+\frac{\xi_{k+d+1}-x}{\xi_{k+d+1}-\xi_{k+1}}B_{k+1}^{d-1}(x),\\k &= 1,...,K+d+1, \end{aligned} $$
where
$$B_{k}^{0}(x)=\left\{ \begin{array}{cc} 1, & \xi_{k} \leq x < \xi_{k+1}\\ 0, & \text{else} \end{array} \right. $$
and \(B_{k}^{0}(x) \equiv 0\) if ξk=ξk+1. B-splines have the advantage that the basis functions have local support: they are larger than zero on intervals spanned by d+2 knots and zero elsewhere. This property results in high numerical stability, and also in an efficient algorithm for the construction of the basis functions; see [4] for details.
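The recursion can be implemented directly. The sketch below uses 0-based indexing for k (the formula above is 1-based) and an illustrative equidistant knot sequence; it also checks the partition-of-unity property that B-splines satisfy on interior intervals:

```python
# Sketch: Cox-de Boor recursion for B-spline basis functions.

def bspline(k, d, x, xi):
    """B_k^d(x) for knot sequence xi (k is 0-based here)."""
    if d == 0:
        return 1.0 if xi[k] <= x < xi[k + 1] else 0.0
    left = 0.0
    if xi[k + d] != xi[k]:
        left = (x - xi[k]) / (xi[k + d] - xi[k]) * bspline(k, d - 1, x, xi)
    right = 0.0
    if xi[k + d + 1] != xi[k + 1]:
        right = ((xi[k + d + 1] - x) / (xi[k + d + 1] - xi[k + 1])
                 * bspline(k + 1, d - 1, x, xi))
    return left + right

# Equidistant knots on [0, 1]; at an interior point the non-zero cubic
# basis functions sum to one (partition of unity), and a basis function
# whose support [xi_k, xi_{k+d+1}] excludes x evaluates to zero.
xi = [i / 10 for i in range(11)]
d = 3
total = sum(bspline(k, d, 0.45, xi) for k in range(len(xi) - d - 1))
```

The guards against zero denominators handle repeated knots, which occur at the boundaries under the common strategy of setting the additional knots equal to the boundary knots.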
Natural cubic and cardinal splines
A polynomial spline, such as a cubic spline or a B-spline, can be erratic at the boundaries of the data. To address this issue, natural splines are cubic splines with the additional constraint that they are linear beyond the boundary knots, i.e. on (−∞,a] and [b,+∞). This is achieved by requiring that the spline function f satisfies f″=f‴=0 at the boundary knots, which adds four constraints, so that a natural spline basis on K knots has K+1 degrees of freedom.
Another basis for natural cubic splines is the cardinal spline basis. The K basis functions of cardinal splines (each of degree d=3) are defined by their values at the knots τ1,...,τK. More specifically, they are defined such that the k-th basis function satisfies Bk(τk)=1 and Bk(τj)=0 for all τj≠τk. As a consequence, the coefficients βk have an easy interpretation: each coefficient equals the value of the spline function f at the knot τk. For an efficient construction of the cardinal spline basis we refer to [41], Chapter 4.
In addition to the truncated power series, natural spline, B-spline and cardinal spline bases, various other, less popular, bases exist. For an overview, we refer to the books [11, 13, 41].
Penalized splines
The splines presented so far are often referred to as regression splines. In addition to the choice of the spline basis (B-spline, truncated power series, etc.), the number of knots and the knot positions have to be chosen. Obviously, these tuning parameters may have an important impact on the estimated shape of a spline function: a large number of knots implies high flexibility, but may also result in overfitting the data at hand. Conversely, a small number of knots may result in an “oversmooth” estimate that is prone to under-fit bias (see [9, 41]).
A popular approach to facilitate the choice of the knot positions in spline modelling is the use of penalized splines. Given an i.i.d. sample of data (x1,y1),…,(xn,yn), a penalized spline is the solution to the problem
$$\hat{\beta} = \text{argmax}_{\beta} \left[ l_{\beta} (x_{1},y_{1}, \ldots, x_{n},y_{n}) - \lambda \cdot J_{\beta} \right] \,, $$
where lβ denotes the log-likelihood (or, in the case of Cox regression, the partial log-likelihood) and Jβ is a roughness penalty that becomes small if the spline function is “smooth”. Generally, penalized splines are based on the idea that the unknown function f is modelled by a spline with a large number of knots, allowing for a high degree of flexibility. On the other hand, a rough spline estimate that has a high value of lβ and is close to the data results in a large value of Jβ. Maximizing this criterion therefore implies a trade-off between smoothness and model fit that is controlled by the tuning parameter λ≥0.
A special case is the penalized least squares problem
$$ \hat{\beta} = \text{argmin}_{\beta} \left[ \sum\limits_{i=1}^{n} \left(f_{\beta} (x_{i}) - y_{i}\right)^{2} + \lambda \cdot {\int\nolimits}_{a}^{b} \left(\partial^{2} f / \partial x^{2}\right)^{2} \,dx \right] $$
(3)
in Gaussian regression. The penalty \(J_{\beta } \,=\, \int _{a}^{b} \left (\partial ^{2} f / \partial x^{2}\right)^{2} dx\) expresses the “smoothness” of a spline function in terms of the second derivative of f. For given λ, it can be shown that the solution is a natural cubic spline with knot sequence x(1)<…<x(n), i.e., the knot positions do not have to be chosen but are ‘naturally’ given by the ordered unique data values of X. In the literature, this type of spline is referred to as a smoothing spline [11]. Of note, a smoothing spline interpolates the data if λ=0, while λ→∞ implies a linear function. Note that smoothing splines are a special case of the more general class of thin-plate splines [40], which allow for an extension of the criterion in Eq. (3) to higher-dimensional xi (see [41], Section 4.15, and [11] for details).
A convenient property of smoothing splines is that the penalty Jβ can be written as β⊤Ωβ with a suitably defined penalty matrix Ω. Therefore the solution to (3) is given by the penalized least squares estimate
$$ \hat{\beta} = \left(B^{\top} B + \lambda \Omega\right)^{-1} B^{\top} y $$
(4)
where B is a matrix of dimension n×n containing the natural spline basis functions evaluated at the data values, and the vector y contains the response values y1,…,yn. In practice, very efficient algorithms exist to compute \(\hat {\beta }\) in (4) [11]. Instead of specifying a natural spline basis for f, it is further possible to work with an unconstrained B-spline basis, as the penalty in (3) automatically imposes the linearity constraints at the knots x(1) and x(n) (see [9], Chapter 5, and [13], Chapter 2). Regarding the B-spline basis, estimation results will not depend on the choice of the boundary knots: it is either possible to use x(1) and x(n) as boundary knots or to include x(1) and x(n) in the set of inner knots.
If n is large and the interval [a,b] is covered densely by the observed data, it is usually not necessary to place a knot at every xi,i=1,…,n. Instead, the smoothing spline may be approximated by a penalized regression spline that uses a reduced set of knots. A very popular class of penalized regression splines are P-splines [8], which are based on the cubic B-spline basis and on a ‘large’ set of equidistant knots (usually, 10–40). Instead of evaluating the integral in (3), P-splines are based on a second-order difference penalty defined by
$$J^{*}_{\beta} = \sum\limits_{k=3}^{K+4} \left(\Delta^{2} \beta_{k} \right)^{2} \,, $$
which, in the case of evenly spaced knots, can be shown to be an approximation to Jβ. The second-order difference operator Δ2 is defined by Δ2βk:=(βk−βk−1)−(βk−1−βk−2). The penalty can therefore be expressed as β⊤Pβ, where P is defined by D⊤D, with D a matrix of differences. It is easily derived that the resulting estimator of β has the same structure as Eq. (4), with Ω replaced by P.
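The difference penalty is easy to construct explicitly. A hedged Python sketch (dimensions and coefficient values are illustrative) builds the second-order difference matrix D, forms P=D⊤D implicitly, and checks that β⊤Pβ equals the sum of squared second differences J∗β:

```python
# Sketch: second-order difference penalty for P-splines.

def diff_matrix(m, order=2):
    """(m - order) x m matrix of order-th differences of a length-m vector."""
    D = [[0.0] * m for _ in range(m - 1)]
    for i in range(m - 1):
        D[i][i], D[i][i + 1] = -1.0, 1.0
    if order == 1:
        return D
    # higher orders by composing first-difference matrices
    D2 = diff_matrix(m - 1, order - 1)
    return [[sum(D2[i][k] * D[k][j] for k in range(m - 1)) for j in range(m)]
            for i in range(len(D2))]

def penalty(beta):
    """J*_beta = sum of squared second differences = beta' (D'D) beta."""
    m = len(beta)
    D = diff_matrix(m, 2)
    Dbeta = [sum(D[i][j] * beta[j] for j in range(m)) for i in range(len(D))]
    return sum(v * v for v in Dbeta)

beta = [1.0, 2.0, 4.0, 7.0, 11.0]   # second differences are all 1
# penalty(beta) == 3.0  (three squared second differences, each equal to 1)
# a linear coefficient sequence incurs zero penalty, as expected
```

The rows of D carry the familiar (1, −2, 1) pattern, and a linear sequence of coefficients is not penalized at all, which is exactly why λ→∞ drives the fit towards a linear function.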
A convenient property of P-splines is that they are numerically stable and very easy to define and implement. In particular, it is much easier to set up the difference matrix D than the matrix Ω. It is also straightforward to extend the penalty J∗β (and hence the matrix D) to higher-order differences Δq with q>2, and possible to use a knot sequence that is not evenly spaced; in the latter case, weights need to be introduced. Because P-splines with unevenly spaced knots are seldom used in practice, we do not consider them here and refer to [8] instead.
Smoothing splines and P-splines overcome the problem of knot selection to some degree. Their philosophy is to use a large number of knots and then let λ control the amount of smoothness. This introduces one extra tuning parameter, with no general consensus on how to tune it. Popular ways to determine the “optimal” value of λ include generalized cross-validation (GCV), AIC, and a mixed-model representation [24].
Splines in R
The base installation of R contains a set of functions that can fit simple polynomial splines and smoothing splines. Further functions are included in the library splines, written by DM Bates and WN Venables. The package has been the workhorse of spline fitting for many years and is now part of the base distribution of R; more than 100 other packages depend on it. The package contains several widely used functions to create spline bases, such as bs for B-splines and ns for natural splines, but also some more specialized functions for creating basis functions (such as periodicSpline, which creates a periodic interpolation spline) and useful utilities such as predict.bSpline, which evaluates a spline at new values of X.
By default, bs creates a cubic B-spline basis with two boundary knots and no interior knots; an interior knot at the median of the observed data values is added when the user requests an extra degree of freedom. More flexibility can be achieved by increasing the number of knots and/or changing their locations. Figure 3 (Code 2 in the Additional file 1: Appendix) shows B-splines created with different options. The upper part presents linear splines, i.e. first-order polynomials (degree one) connected at equidistant knots; the lower part presents cubic polynomials (degree 3).
It should be noted that B-splines created in R with bs() are automatically bounded by the range of the data, and that the additional knots (ξ1,...,ξd and ξd+K+3,...,ξ2d+K+2) are set equal to the boundary knots, giving multiple knots at both ends of the domain. This approach is useful in univariate cases and has some computationally attractive features. However, if one works on a two-dimensional smoothing problem, using tensor products of B-splines, or with P-splines, this basis is unsuitable and may lead to spurious results.
Natural splines can be created within the splines package, using the command ns. By default, unless the user specifies either the degrees of freedom or the knots, the function returns a straight line within the boundary knots. Figure 4 (Code 3 in the Additional file 1: Appendix) shows natural splines created with different options.
To illustrate how these functions can be used in practice, consider again the data from the motivating example. Figure 5 (created by Code 4 in the Additional file 1: Appendix) shows the fits obtained using the following commands: poly() for simple orthogonal polynomial splines, smooth.spline() for smoothing splines, and bs() and ns() from library splines for B-splines and natural splines, respectively. The upper left graph shows a simple linear fit to the data (dashed line) and a third-degree polynomial fit that is able to capture the more complex relationship between the variables. The graph in the upper right corner is particularly interesting, since it presents the fits using the default values of the spline functions. The green line comes from functions poly() and ns(), which by default both define a straight line. At the other extreme, the blue line is a fit from function smooth.spline(), which, if no degrees of freedom are specified, tends to undersmooth the data, i.e. produce a very flexible, wiggly fit based (here) on 45 degrees of freedom. A visually reasonable fit to the data is achieved when four degrees of freedom are specified (lower left graph). There are some differences depending on the chosen basis. The polynomial basis (black line) is a little more flexible than the rest, especially at higher ages. On the other hand, a smoothing spline restricted to just four degrees of freedom is more rigid than the other approaches, but probably oversmooths the data at small ages, between years 0 and 10. In between the two extremes, B-splines and natural splines provide very similar fits that capture the effect of small ages and tend to be less influenced by extreme cases at the end of the age spectrum. Finally, the lower right graph shows how much more flexible the fits become with additional degrees of freedom, and suggests potential over-fit bias due to the use of excessive degrees of freedom.
A note on degrees of freedom
In practice, it is often useful to define a spline by its degrees of freedom. This approach is particularly useful when working with B-splines and natural splines: a B-spline basis has d+K degrees of freedom, while a natural cubic spline basis with K knots has K+1. By default, the function bs in R creates B-splines of degree 3 with no interior knots and boundary knots defined at the range of the X variable; as such, it creates three basis functions. Now consider the following case: when a user defines a B-spline with an interior knot at the median of X (bs(x,knots=median(x))), the software will create four functions (d=3 plus K=1 interior knot: four degrees of freedom). If, however, the user specifies the boundary knots within the knots argument (bs(x,knots=c(min(x),median(x),max(x)))), the function will have six degrees of freedom (d=3 plus K=3). Similar caution should be taken with function ns.
When working with smoothing splines, it is not easy to specify the degrees of freedom, since they vary depending on the size of the penalty. In practice, however, penalized splines can also be restricted to a maximum, or desired, number of degrees of freedom.
Other spline packages
Broadly speaking, the extended list of spline packages contains either approaches quite similar to those presented here or very specialized cases targeting specific applications. Table 1 presents some of these packages along with their numbers of downloads. The numbers refer to how many times a package has been downloaded, not to unique users. It is beyond the scope of this work to describe all of these approaches in detail.
Table 1 R packages used for the creation of splines