P-splines with derivative based penalties and tensor product smoothing of unevenly distributed data

The P-splines of Eilers and Marx (Stat Sci 11:89–121, 1996) combine a B-spline basis with a discrete quadratic penalty on the basis coefficients, to produce a reduced rank spline like smoother. P-splines have three properties that make them very popular as reduced rank smoothers: (i) the basis and the penalty are sparse, enabling efficient computation, especially for Bayesian stochastic simulation; (ii) it is possible to flexibly ‘mix-and-match’ the order of B-spline basis and penalty, rather than the order of penalty controlling the order of the basis as in spline smoothing; (iii) it is very easy to set up the B-spline basis functions and penalties. The discrete penalties are somewhat less interpretable in terms of function shape than the traditional derivative based spline penalties, but tend towards penalties proportional to traditional spline penalties in the limit of large basis size. However part of the point of P-splines is not to use a large basis size. In addition the spline basis functions arise from solving functional optimization problems involving derivative based penalties, so moving to discrete penalties for smoothing may not always be desirable. The purpose of this note is to point out that the three properties of basis-penalty sparsity, mix-and-match penalization and ease of setup are readily obtainable with B-splines subject to derivative based penalization. The penalty setup typically requires a few lines of code, rather than the two lines typically required for P-splines, but this one off disadvantage seems to be the only one associated with using derivative based penalties. As an example application, it is shown how basis-penalty sparsity enables efficient computation with tensor product smoothers of scattered data.


Computing arbitrary derivative penalties for B-splines
The main purpose of this note is to show that reduced rank spline smoothers with derivative based penalties can be set up almost as easily as the P-splines of Eilers and Marx (1996), while retaining sparsity of the basis and penalty and the ability to mix-and-match the orders of spline basis functions and penalties.
The key idea is that we want to represent a smooth function f (x) using a rank k spline basis expansion f (x) = k j=1 β j B m 1 ,j (x), where B m 1 ,j (x) is an order m 1 B-spline basis function, and β j is a coefficient to be estimated.In this paper order m 1 = 3 will denote a cubic spline.Associated with the spline will be a derivative based penalty where f [m 2 ] (x) denotes the m th 2 derivative of f with respect to x, and [a, b] is the interval over which the spline is to be evaluated.It is assumed that m 2 ≤ m 1 , otherwise the penalty is formulated in terms of a derivative that is not properly defined for the basis functions, which makes no sense.It is possible to write J = β T Sβ where S is a band diagonal matrix of known coefficients.Computation of S is the only part of setting up the smoother that presents any difficulty, since standard routines for evaluating B-splines basis functions (and their derivatives) are readily and widely available, and in any case the recursion for basis function evaluation is straightforward.
The algorithm for finding S in general is as follows.p = m 1 − m 2 denotes the order of piecewise polynomial defining the m th 2 derivative of the spline.Let x 1 , x 2 . . .x k−m+1 be the (ordered) 'interior knots' defining the B-spline basis, that is the knots within whose range the spline and its penalty are to be evaluated (so a = x 1 and b = x k−m+1 ).Let the inter-knot distances be h 1.For each interval [x j , x j+1 ], generate p + 1 evenly spaced points within the interval.For p = 0 the point should be at the interval centre, otherwise the points always include the end points x j and x j+1 .Let x ′ contain the unique x values so generated, in ascending order.
5. The diagonally banded penalty coefficient matrix is S = G T WG.
6. Optionally, compute the diagonally banded Cholesky decomposition R T R = W, and form diagonally banded matrix D = RG, such that S = D T D.
Step 2 can be accomplished by standard routines for generating B-spline bases and their derivatives of arbitrary order: in R for example, the function splines:splineDesign for normal B-splines or mgcv:cSplineDes for cyclic B-splines.Alternatively see the appendix.
Step 4 requires no more than a single rank p + 1 matrix inversion of P. P is somewhat ill conditioned for p ≥ 20, with breakdown for p > 30.However it is difficult to imagine any sane application for which p would even be as high as 10, and for p ≤ 10, P's condition number is < 2 × 10 4 .Of course W is formed without explicitly forming the W q matrices.Step 6 can be accomplished by a banded Cholesky decomposition such as dpbtrf from LAPACK (accessible via routine mgcv:bandchol in R, for example).Alternatively see the appendix.However for applications with k less than 1000 or so, a dense Cholesky decomposition might be deemed efficient enough.Note that step 6 is preferable to construction of D by decomposition of S, since W is positive definite by construction, while, for m 2 > 0, S is only positive semi-definite.As in the case of a discrete P-spline penalty the leading order computational cost of evaluating S is O(bk) where b is the number of bands in S (the O(p 3 ) cost of W usually being negligible in comparison), and is a trivial relative to model fitting.
The derivation of the algorithm is quite straightforward.Given the basis expansion we have that m 1 ,j (x)dx.

However by construction B
[m 2 ] m 1 ,i (x) is made up of order p = m 1 − m 2 polynomial segments.So we are really interested in integrals of the form for some polynomial coefficients a i and d j .The polynomial coefficients are the solution obtained by evaluating m 1 ,i (x) at p + 1 points spaced evenly from x l to x l+1 , to obtain a vector of evaluated derivatives, g a , and then solving Pa = g a (d is obtained from g d similarly).Then S ij = l S ijl .
Given that 1 −1 x q dx = (1 + (−1) q )/(q + 1) it is clear that S ijl = h l a T Hd/2 where H ij = (1 + (−1) i+j−2 )/(i + j − 1) (i and j start at 1).In terms of the evaluated gradient vectors, The G matrix simply maps β to the concatenated (and duplicate deleted) gradient vectors for all intervals, while W is just the overlapping-block diagonal matrix with blocks given by h l P −T HP −1 /2, hence S ij = G T i WG j , where G i is the i th row of G.The simplicity of the algorithm rests on the ease with which G and W can be computed.Note that the construction is more general than that of Wand and Ormerod (2008), in allowing m 1 and m 2 to be chosen freely (rather than m 1 determining m 2 ), and treating even m 1 as well as odd.

Tensor product smoothing of unevenly distributed data
An example where a compactly supported basis and sparse penalty is computationally helpful is in tensor product smoothing of unevenly distributed data.A three dimensional example suffices to illustrate how tensor product smooths are constructed from one dimensional bases.Suppose we want to smooth with respect to z 1 , z 2 and z 3 .Firstly B-spline bases are constructed for smooth functions of each covariate separately.Suppressing subscripts for order, let B j1 (z j ), B j2 (z j ), . . .denote the basis for the smooth function of z j , and let D j denote the corresponding 'square root' penalty matrix.The smooth function of all three variables is then represented as where β ijl are the coefficients.Notice that the tensor product basis functions, B 1i (z 1 )B 2j (z 2 )B 3l (z 3 ), inherit compact support form the marginal basis functions.Now write the coefficients in 'column major' order in one vector β T = (β 111 , β 112 , . . ., β 11k 1 , β 121 , β 122 , . . .β k 1 k 2 k 3 ), where k j is the dimension of the j th basis.The tensor product smoother then has three associated penalties, β T S j β (each with its own smoothing parameter), where S j = DT j Dj , This construction generalizes to other numbers of dimensions in the obvious way (see e.g.Wood, 2006).By construction the domain of the tensor product smooth is a rectangle, cuboid or hypercuboid, but it is often the case that the covariates to be smoothed over occupy only part of this domain.In this case it is possible for some basis functions to evaluate to zero at every covariate observation, and there is often little point in retaining these basis functions and their associated coefficients.Let ι denote the index of a coefficient to be dropped from β (along with its corresponding basis function).The näive approach of dropping row and column ι of each S j is equivalent to setting β ι to zero when evaluating β T S j β, which is not usually desirable.Rather than setting β ι = 0 in the penalty, we would like to omit those components of the penalty dependent on β ι .This is easily achieved by dropping every row κ from Dj for which Dj,κι = 0. Notice (i) that without D being diagonally banded this would be a rather drastic reduction of the penalty, and (ii) this construction applies equally well to P-splines.
As an illustration data were generated from the model at the x, z locations shown as black dots in figure 1.The figure shows the reconstruction of the test function using a tensor product smoother, based on cubic spline marginals with second derivative penalties.The left figure is for the full smoother, which had 625 coefficients, while the right figure is for the reduced version which had 358 coefficients.Including REML smoothing parameter selection the reduced rank fit took around 1/8 of the computation time of the full rank fit.The correlation between the fitted values for the two fits is 0.999.In the example the reduced rank fit has marginally smaller mean square reconstruction error than the full rank version, a feature that seems to be robust under repeated replication of the experiment.

Conclusions
Given that the theoretical justification for using spline bases for smoothing is that they arise as the solutions to variational problems with derivative based penalties (see e.g.Wahba, 1990;Duchon, 1977), it is sometimes appealing to be able to use derivative based penalties for reduced rank smoothing also.However if a sparse smoothing basis and penalty were required alongside the ability to mix-and-match penalty order and basis order, then the apparent complexity of obtaining the penalty matrix for derivative based penalties has hitherto presented an obstacle to their use.This note removes this obstacle, allowing the statistician an essentially free choice whether to use derivative based penalties or discrete penalties.
The splines described here are available in R package mgcv from version 1.8-12.They could be referred to as 'D-splines', but a new name is probably un-necessary.This work was supported by EPSRC grant EP/K005251/1.

A Standard recursions
B-spline bases, their derivatives and banded Cholesky decompositions are readily available in standard software libraries and packages such as R and Matlab.However, for completeness the required recursions are included here.
To define a k dimensional B-spline basis of order m we need to define k + m + 1 knots x 1 < x 2 < . . .< x k+m+1 .The interval over which the spline is to be evaluated is [x m+1 , x k+1 ] so the locations of knots outside this interval are rather unimportant.The B-spline basis functions are defined recursively as This can be applied recursively to obtain higher order derivatives.For more on both of these recursions see and p.89 and p.116 of de Boor (2001) (or de Boor, 1978).Now consider the banded Cholesky decomposition of a symmetric positive definite matrix A with 2p − 1 non-zero diagonals (clustered around the leading diagonal).We have ki , and R ij = all other elements of Cholesky factor R being 0. The expressions are used one row at a time, starting from row 1, and working across the columns from left to right.See any matrix algebra book for Cholesky decomposition (e.g.Golub and van Loan, 1996).

Figure 1 :
Figure1: Left: conventional tensor product smooth reconstruction of the example function given in the text, based on noisy samples at the x, z locations shown as black dots.Right: as left, but using the reduced basis described in section 2.
x), i = 1, . . ., k, m > 0 whereB 0,i (x) = 1 x i ≤ x < x i+1 0 otherwise.It turns out that the derivative with respect to x of a B-spline of order m can be expressed in terms of a B-spline basis of order m − 1 as followsj β j B ′ m,j (x) = (m − 1) j β j − β j−1 x j+m − x j B m−1,j (x).