# Shape constrained additive models


## Abstract

A framework is presented for generalized additive modelling under shape constraints on the component functions of the linear predictor of the GAM. We represent shape constrained model components by mildly non-linear extensions of P-splines. Models can contain multiple shape constrained and unconstrained terms as well as shape constrained multi-dimensional smooths. The constraints considered are on the sign of the first and/or second derivatives of the smooth terms. A key advantage of the approach is that it facilitates efficient estimation of smoothing parameters as an integral part of model estimation, via GCV or AIC, and numerically robust algorithms for this are presented. We also derive simulation free approximate Bayesian confidence intervals for the smooth components, which are shown to achieve close to nominal coverage probabilities. Applications are presented using real data examples including the risk of disease in relation to proximity to municipal incinerators and the association between air pollution and health.

## Keywords

Monotonic smoothing · Convex smoothing · Generalized additive model · P-splines

## 1 Introduction

The models considered are of the form $$\begin{aligned} g\{\mathrm{E}(Y_i)\} = \mathbf{A}_i{\varvec{\theta }} + \sum _j f_j(z_{ji}) + \sum _k m_k(x_{ki}), \end{aligned}$$ (1) where \(g\) is a known link function, \(\mathbf{A}_i\) is the \(i\)th row of the model matrix for any strictly parametric terms with coefficients \(\varvec{\theta }\), \(f_j\) is an unconstrained smooth function of predictor variable \(z_j\), and \(m_k\) is a *shape constrained* smooth function of predictor variable \(x_k\). The predictors \(z_j\) and \(x_k\) may be vector valued.

It is the shape constraints on the \(m_k\) that differentiate this model from a standard generalized additive model (GAM). In many studies it is natural to assume that the relationship between a response variable and one or more predictors obeys certain shape restrictions. For example, the growth of children over time and dose-response curves in medicine are known to be monotonic. The relationships between daily mortality and air pollution concentration, and between body mass index and incidence of heart disease, are other examples requiring shape restrictions. Unconstrained models may be too flexible, giving implausible or uninterpretable results.

Here we develop a general framework for shape constrained generalized additive models (SCAM), covering estimation, smoothness selection, interval estimation and also allowing for model comparison. The aim is to make SCAMs as routine to use as conventional unconstrained GAMs. To do this we build on the established framework for generalized additive modelling covered, for example, in Wood (2006a). Model smooth terms are represented using spline type penalized basis function expansions; given smoothing parameter values, model coefficients are estimated by maximum penalized likelihood, achieved by an inner iteratively reweighted least squares type algorithm; smoothing parameters are estimated by the outer optimization of a GCV or AIC criterion. Interval estimation is achieved by taking a Bayesian view of the smoothing process, and model comparison can be achieved using AIC, for example. The specific contributions of this paper are threefold.

1. We propose shape constrained P-splines (SCOP-splines), based on a novel mildly non-linear extension of the P-splines of Eilers and Marx (1996), with novel discrete penalties. These allow a variety of shape constraints for one- and multi-dimensional smooths. From a computational viewpoint, they ensure that the penalized likelihood and the GCV/AIC scores are smooth with respect to the model coefficients and smoothing parameters, allowing the development of efficient and stable model estimation methods.

2. We develop stable computational schemes for estimating the model coefficients and smoothing parameters, able to deal with the ill-conditioning that can affect even unconstrained GAM fits (Wood 2004, 2008), while retaining computational efficiency. The extra non-linearity induced by the use of SCOP-splines does not allow the unconstrained GAM methods to be re-used or simply modified. Substantially new algorithms are required instead.

3. We provide simulation free approximate Bayesian confidence intervals for the SCOP-spline model components in this setting.

To understand the motivation for our approach, note that it is not difficult to construct shape constrained spline-like smoothers by subjecting the spline coefficients to linear inequality constraints (Ramsay 1988; Wood 1994; Zhang 2004; Kelly and Rice 1990; Meyer 2012). However, this approach leads to methodological problems in estimating the smoothing parameters of the spline: linear inequality constraints make it difficult to optimize standard smoothness selection criteria, such as AIC and GCV, with respect to multiple smoothing parameters. The difficulty arises because the derivatives of these criteria change discontinuously as constraints enter or leave the active set, which defeats the derivative based optimization schemes that are essential for efficient computation when there are many smoothing parameters to optimize. SCOP-splines circumvent this problem.

Other procedures based on B-splines were proposed by He and Shi (1998), Bollaerts et al. (2006), Rousson (2008), and Wang and Meyer (2011). Meyer (2012) presented a cone projection method for estimating penalized B-splines with monotonicity or convexity constraints and proposed a GCV based test for checking the shape constrained assumptions. Monotonic regression within the Bayesian framework has been considered by Lang and Brezger (2004), Holmes and Heard (2003), Dunson and Neelon (2003), and Dunson (2005). In spite of their diversity, these existing approaches also lack the ability to compute the smoothing parameters efficiently in a multiple-smooth context. In addition, to our knowledge, except for the bivariate constrained P-spline of Bollaerts et al. (2006), multi-dimensional smooths under shape constraints on either all or a selection of the covariates have not yet been presented in the literature.

The remainder of the paper is structured as follows. The next section introduces SCOP-splines. Section 3.1 shows how SCAMs can be represented for estimation. A penalized likelihood maximization method for SCAM coefficient estimation is discussed in Sect. 3.2. Section 3.3 investigates the selection of multiple smoothing parameters. Interval estimation of the component smooth functions of the model is considered in Sect. 3.4. A simulation study is presented in Sect. 4 while Sect. 5 demonstrates applications of SCAM to two epidemiological examples.

## 2 SCOP-splines

### 2.1 B-spline background

In the smoothing literature B-splines are a common choice of basis functions because of their smooth interpolation properties, flexibility, and local support. B-spline properties are thoroughly discussed in De Boor (1978). Eilers and Marx (1996) combined B-spline basis functions with discrete penalties on the basis coefficients to produce the popular ‘P-spline’ smoothers. Li and Ruppert (2008) established the corresponding asymptotic theory: the rate of convergence of the penalized spline to a smooth function depends on the order of the difference penalty, but not on the degree of the B-spline basis or the number of knots, provided that the number of knots grows with the number of data and that the function is twice continuously differentiable. Ruppert (2002) and Li and Ruppert (2008) showed that the choice of the basis dimension is not critical, but it should be above some minimal level which depends on the spline degree. Asymptotic properties of P-splines were also studied by Kauermann et al. (2009) and Claeskens et al. (2009). Here we build on the P-spline idea to produce SCOP-splines.

### 2.2 One-dimensional case

#### 2.2.1 Smoothing

#### 2.2.2 Identifiability, basis dimension

As with any penalized regression spline approach, the choice of the basis dimension, \(q\), is not crucial, but it should be generous enough to avoid oversmoothing/underfitting (Ruppert 2002; Li and Ruppert 2008). Ruppert (2002) suggested algorithms for basis dimension selection by minimizing GCV over a set of specified values of \(q\), while Kauermann and Opsomer (2011) proposed an equivalent likelihood based scheme.

**Table 1** Univariate shape constrained smooths

| Shape constraints | \(\varvec{\varSigma }\) | \(\mathbf{D}\) |
| --- | --- | --- |
| Monotone increasing | \(\varSigma _{ij}=\left\{ \begin{array}{ll} 0, & \text{if}\ i<j \\ 1, & \text{if}\ i\ge j \end{array}\right.\) | \(D_{i,i+1}=-D_{i,i+2}=1,\ i=1,\ldots ,q-2;\ D_{ij}=0\ \text{otherwise}\) |
| Monotone decreasing | \(\varSigma _{ij}=\left\{ \begin{array}{ll} 0, & \text{if}\ i<j \\ 1, & \text{if}\ j=1,\ i\ge 1 \\ -1, & \text{if}\ j\ge 2,\ i\ge j \end{array}\right.\) | \(D_{i,i+1}=-D_{i,i+2}=1,\ i=1,\ldots ,q-2;\ D_{ij}=0\ \text{otherwise}\) |
| Convex | \(\varSigma _{ij}=\left\{ \begin{array}{ll} 0, & \text{if}\ i<j \\ 1, & \text{if}\ j=1,\ i\ge 1 \\ -(i-1), & \text{if}\ j=2,\ i\ge j \\ i-j+1, & \text{if}\ j\ge 3,\ i\ge j \end{array}\right.\) | \(D_{i,i+2}=-D_{i,i+3}=1,\ i=1,\ldots ,q-3;\ D_{ij}=0\ \text{otherwise}\) |
| Concave | \(\varSigma _{ij}=\left\{ \begin{array}{ll} 0, & \text{if}\ i<j \\ 1, & \text{if}\ j=1,\ i\ge 1 \\ i-1, & \text{if}\ j=2,\ i\ge j \\ -(i-j+1), & \text{if}\ j\ge 3,\ i\ge j \end{array}\right.\) | As above |
| Increasing and convex | \(\varSigma _{ij}=\left\{ \begin{array}{ll} 0, & \text{if}\ i<j \\ 1, & \text{if}\ j=1,\ i\ge 1 \\ i-j+1, & \text{if}\ j\ge 2,\ i\ge j \end{array}\right.\) | As above |
| Increasing and concave | \(\varSigma _{ij}=\left\{ \begin{array}{ll} 0, & \text{if}\ i=1,\ j\ge 2 \\ 1, & \text{if}\ j=1,\ i\ge 1 \\ i-1, & \text{if}\ i\ge 2,\ j=2,\ldots ,q-i+2 \\ q-j+1, & \text{if}\ i\ge 2,\ j=q-i+3,\ldots ,q \end{array}\right.\) | As above |
| Decreasing and convex | \(\varSigma _{ij}=\left\{ \begin{array}{ll} 0, & \text{if}\ i=1,\ j\ge 2 \\ 1, & \text{if}\ j=1,\ i\ge 1 \\ -(i-1), & \text{if}\ i\ge 2,\ j=2,\ldots ,q-i+2 \\ -(q-j+1), & \text{if}\ i\ge 2,\ j=q-i+3,\ldots ,q \end{array}\right.\) | As above |
| Decreasing and concave | \(\varSigma _{ij}=\left\{ \begin{array}{ll} 0, & \text{if}\ i<j \\ 1, & \text{if}\ j=1,\ i\ge 1 \\ -(i-j+1), & \text{if}\ j\ge 2,\ i\ge j \end{array}\right.\) | As above |
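To make the construction concrete: for the monotone increasing row of Table 1, \(\varvec{\gamma } = \varvec{\varSigma }\tilde{\varvec{\beta }}\) turns the exponentiated working coefficients into cumulative sums, so the spline coefficients are non-decreasing whatever \(\varvec{\beta }\) is. A minimal sketch (in Python/NumPy rather than the paper's R, with arbitrary test values):

```python
import numpy as np

q = 10  # basis dimension

# Sigma for the monotone increasing constraint (Table 1):
# Sigma_ij = 1 if i >= j, 0 otherwise, i.e. a lower-triangular matrix of ones
Sigma = np.tril(np.ones((q, q)))

# Arbitrary unconstrained working coefficients beta
rng = np.random.default_rng(0)
beta = rng.normal(size=q)

# beta_tilde: first coefficient free, the rest exponentiated (hence positive)
beta_tilde = np.concatenate(([beta[0]], np.exp(beta[1:])))

# gamma = Sigma @ beta_tilde accumulates positive increments,
# so it is non-decreasing by construction
gamma = Sigma @ beta_tilde
print(np.all(np.diff(gamma) >= 0))  # True for any beta
```

Since a B-spline curve with non-decreasing coefficients is itself non-decreasing, this sufficient condition delivers the monotone increasing constraint without any inequality constraints on \(\varvec{\beta }\).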

### 2.3 Multi-dimensional SCOP-splines

Using the concept of tensor product spline bases it is possible to build smooths of multiple covariates under monotonicity constraints, where monotonicity may be imposed on either all or a selection of the covariates. In this section the construction of a multivariable smooth, \(m(x_1,x_2,\ldots ,x_p),\) with monotonically increasing constraints along all covariates is considered first, followed by a discussion of monotonicity along a single direction.

#### 2.3.1 Tensor product basis

#### 2.3.2 Constraints

1. For the single monotonically increasing constraint along the \(x_{j}\) direction: let \(\varvec{\varSigma }_{j}\) be defined as previously, while \(\mathbf{I}_{s}\) is an identity matrix of size \(q_{s},\,s\ne j\); then $$\begin{aligned} \varvec{\varSigma }= \mathbf{I}_{1}\otimes \cdots \otimes \varvec{\varSigma }_{j} \otimes \cdots \otimes \mathbf{I}_{p}, \end{aligned}$$ and \({\varvec{\gamma }} = {\varvec{\varSigma }} \tilde{\varvec{\beta }},\) where \(\tilde{\varvec{\beta }}\) is a vector containing a mixture of un-exponentiated and exponentiated coefficients, with \(\tilde{\beta }_{k_1\ldots k_j\ldots k_p}=\exp (\beta _{k_1\ldots k_j\ldots k_p})\) when \(k_j\ne 1.\)
2. For the single monotonically decreasing constraint along the \(x_{j}\) direction: the re-parametrization is the same as above, except that the matrix \(\varvec{\varSigma }_{j}\) is as for a univariate smooth with monotonically decreasing constraint (see Table 1).
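The Kronecker construction in item 1 is easy to check numerically. The sketch below (Python/NumPy; the dimensions and coefficients are arbitrary illustrations) builds \(\varvec{\varSigma } = \mathbf{I}_{1}\otimes \varvec{\varSigma }_{2}\) for \(p=2\) with monotonicity along the second covariate only:

```python
import numpy as np

q1, q2 = 4, 5  # marginal basis dimensions (illustrative values)

# Sigma_2 for a monotone increasing constraint along x_2 (Table 1 form)
Sigma2 = np.tril(np.ones((q2, q2)))

# Identity for the unconstrained direction; Sigma = I_1 (x) Sigma_2
Sigma = np.kron(np.eye(q1), Sigma2)

# beta_tilde: within each block of q2 coefficients (one block per basis
# function of x_1) every entry except the first is exponentiated (k_2 != 1)
rng = np.random.default_rng(1)
beta = rng.normal(size=q1 * q2)
beta_tilde = beta.copy()
for k in range(q1):
    beta_tilde[k*q2 + 1:(k+1)*q2] = np.exp(beta[k*q2 + 1:(k+1)*q2])

gamma = Sigma @ beta_tilde

# Within each x_1 block the coefficients are cumulative sums of positive
# increments, so they increase along the x_2 index
G = gamma.reshape(q1, q2)
print(np.all(np.diff(G, axis=1) >= 0))  # True
```

Since \(\mathbf{I}_{1}\otimes \varvec{\varSigma }_{2}\) is block diagonal, monotonicity is imposed along \(x_2\) separately for each marginal basis function of \(x_1\), leaving the \(x_1\) direction unconstrained.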

#### 2.3.3 Penalties

## 3 SCAM

### 3.1 SCAM representation

To represent (1) for computation we now choose basis expansions, penalties and identifiability constraints for all the unconstrained \(f_j\), as described in detail in Wood (2006a), for example. This allows \(\sum _j f_j(z_{ji})\) to be replaced by \(\mathbf{F}_i {\varvec{\gamma }}\), where \(\mathbf F\) is a model matrix determined by the basis functions and the constraints, and \(\varvec{\gamma }\) is a vector of coefficients to be estimated. The penalties on the \(f_j\) are quadratic in \(\varvec{\gamma }\).

### 3.2 SCAM coefficient estimation

1. To obtain an initial estimate of \({\varvec{\beta }}\), minimize \(\Vert g(\mathbf{y}) - \mathbf{X}\tilde{\varvec{\beta }}\Vert ^2 + \tilde{\varvec{\beta }}^{\mathrm{T}}\mathbf{S}_\lambda \tilde{\varvec{\beta }}\) w.r.t. \(\tilde{\varvec{\beta }}\), *subject to linear inequality constraints* ensuring that \(\tilde{\beta }_j >0 \) whenever \(\tilde{\beta }_j = \exp (\beta _j)\). This is a standard quadratic programming (QP) problem. (If necessary \(\mathbf y\) is adjusted slightly to avoid infinite \(g(\mathbf{y})\).)
2. Set \(k = 0\) and repeat steps 3–11 to convergence.

3. Evaluate \(z_i = (y_i - \mu _i) g^\prime (\mu _i)/\alpha (\mu _i)\) and \(w_i = \omega _i \alpha (\mu _i)/ \{V(\mu _i)g^{\prime 2}(\mu _i)\},\) using the current estimate of \(\mu _i\).
4. Evaluate vectors \(\tilde{\mathbf{w}} = |\mathbf{w}|\) and \(\tilde{\mathbf{z}}\), where \(\tilde{z}_i = \mathrm{sign}(w_i)z_i\).
5. Evaluate the diagonal matrix \(\mathbf C\) such that \(C_{jj} = 1\) if \(\tilde{\beta }_j = \beta _j\), and \(C_{jj} = \exp (\beta _j)\) otherwise.
6. Evaluate the diagonal matrix \(\mathbf E\) such that \(E_{jj} = 0\) if \(\tilde{\beta }_j = \beta _j\), and \(E_{jj} =\sum _{i=1}^n w_ig^\prime (\mu _i) [\mathbf{XC}]_{ij}(y_i-\mu _i)/\alpha (\mu _i)\) otherwise.
7. Let \(\mathbf{I}^-\) be the diagonal matrix such that \(I^{-}_{ii} = 1\) if \(w_i<0\) and \(I^{-}_{ii}=0 \) otherwise.
8. Letting \(\tilde{\mathbf{W}}\) denote \(\mathrm{diag}(\tilde{\mathbf{w}})\), form the QR decomposition $$\begin{aligned} \left[ \begin{array}{c} \sqrt{\tilde{\mathbf{W}}} \mathbf{X}\mathbf{C} \\ \mathbf{B} \end{array}\right] = \mathbf{QR}, \end{aligned}$$ where \(\mathbf{B}\) is any matrix square root such that \(\mathbf{B}^{\mathrm{T}}\mathbf{B} = \mathbf{S}_\lambda \).
9. Letting \(\mathbf{Q}_1\) denote the first \(n\) rows of \(\mathbf Q\), form the symmetric eigen-decomposition $$\begin{aligned} \mathbf{Q}_1 ^{\mathrm{T}}\mathbf{I}^- \mathbf{Q}_1 + \mathbf{R}^{-\mathrm T}\mathbf{E} \mathbf{R}^{-1} = \mathbf{U }{\varvec{\varLambda }} \mathbf{U}^{\mathrm{T}}. \end{aligned}$$
10. Hence define \(\mathbf{P} = \mathbf{R}^{-1} \mathbf{U}(\mathbf{I} - {\varvec{\varLambda }})^{-1/2}\) and \(\mathbf{K} = \mathbf{Q}_1 \mathbf{U} (\mathbf{I} - {\varvec{\varLambda }})^{-1/2}\).
11. Update the estimate of \({\varvec{\beta }}\) as \({\varvec{\beta }}^{[k+1]} = {\varvec{\beta }}^{[k]} + \mathbf{P K}^{\mathrm{T}}\sqrt{\tilde{\mathbf{W}}} \tilde{\mathbf{z}} - \mathbf{PP}^{\mathrm{T}}\mathbf{S}_\lambda {\varvec{\beta }}^{[k]}\) and increment \(k\).
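The linear algebra of steps 8–11 can be sketched as follows. This is a simplified illustration only: it takes the benign case of all-positive working weights and \(\mathbf{E}=0\) (so \(\mathbf{I}^-\) and the eigen-correction vanish and \(\varvec{\varLambda }=0\)), uses arbitrary stand-in data, and checks that the update agrees with the equivalent direct penalized least squares solve:

```python
import numpy as np

rng = np.random.default_rng(2)
n, q = 30, 6
X = rng.normal(size=(n, q))                      # model matrix stand-in
C = np.diag(np.exp(rng.normal(size=q) * 0.1))    # step 5 matrix (positive here)
w = rng.uniform(0.5, 1.5, size=n)                # positive weights => I^- = 0
z = rng.normal(size=n)                           # working pseudodata
beta = rng.normal(size=q) * 0.1

# Penalty S_lambda = lambda * D'D, with B a matrix square root of it
D = np.diff(np.eye(q), axis=0)
lam = 0.7
S = lam * D.T @ D
B = np.sqrt(lam) * D

# Step 8: QR of the augmented matrix [sqrt(W) X C; B]
A = np.vstack([np.sqrt(w)[:, None] * (X @ C), B])
Q, R = np.linalg.qr(A)
Q1 = Q[:n]

# Step 9: with I^- = 0 and E = 0 the matrix to decompose is zero,
# so Lambda = 0 and U can be taken as the identity
U, Lam = np.eye(q), np.zeros(q)

# Step 10: P = R^{-1} U (I - Lambda)^{-1/2}, K = Q1 U (I - Lambda)^{-1/2}
P = np.linalg.solve(R, U) / np.sqrt(1.0 - Lam)
K = Q1 @ U / np.sqrt(1.0 - Lam)

# Step 11: the coefficient update
beta_new = beta + P @ K.T @ (np.sqrt(w) * z) - P @ P.T @ S @ beta

# Same update written as a single solve: (H + S)^{-1} (H beta + (XC)' W z)
XC = X @ C
H = XC.T @ (w[:, None] * XC)
direct = np.linalg.solve(H + S, H @ beta + XC.T @ (w * z))
print(np.allclose(beta_new, direct))  # True
```

The point of the QR/eigen route is that it delivers the same update while remaining numerically stable when weights are negative or the penalized Hessian is ill-conditioned, situations the direct solve handles badly.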

Two practical points about this iteration should be noted.

1. If the Hessian of the log likelihood is indefinite then step 10 will fail, because some \(\varLambda _{ii}\) will exceed 1. In this case a Fisher update step must be substituted, by setting \(\alpha (\mu _i) = 1\).

2. There is considerable scope for identifiability issues to hamper computation. In common with unconstrained GAMs, flexible SCAMs with highly correlated covariates can display co-linearity problems between model coefficients, which require careful numerical handling to ensure stability of the estimation algorithms. An additional issue is that the non-linear constraints mean that parameters can be poorly identified on flat sections of a fitted curve, where \(\beta \) is simply ‘very negative’, but the data contain no information on how negative. So steps must be taken to deal with unidentifiable parameters. One approach is to work directly with the QR decomposition to calculate which coefficients are unidentifiable at each iteration and to drop these, but a simpler strategy substitutes a singular value decomposition for the \(\mathbf{R}\) factor at step 8 if it is rank deficient, so that $$\begin{aligned} \mathbf{R}=\varvec{{\fancyscript{U}}} \mathbf{DV}^{\mathrm{T}}. \end{aligned}$$ Then we set \(\varvec{{\fancyscript{Q}}} = \mathbf{Q}\varvec{{\fancyscript{U}}},\) \(\varvec{{\fancyscript{R}}}=\mathbf{DV}^{\mathrm{T}},\) and \(\mathbf{Q}_1\) is the first \(n\) rows of \(\varvec{{\fancyscript{Q}}},\) and everything proceeds as before, except for the inversion of \(\mathbf R\). We now substitute the pseudoinverse \(\mathbf{R}^- = \mathbf{VD}^-\), where the diagonal matrix \(\mathbf{D}^-\) is such that \(D_{jj}^- = 0 \) if the singular value \(D_{jj}\) is ‘too small’, but otherwise \( D^-_{jj} = 1/D_{jj}\). ‘Too small’ is judged relative to the largest singular value \(D_{11}\) multiplied by some power (in the range 0.5 to 1) of the machine precision. If all parameters are numerically identifiable then the pseudoinverse is just the inverse.
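The truncation rule can be illustrated in isolation. The sketch below is a stand-alone version using the full Moore–Penrose form \(\mathbf{V}\mathbf{D}^-\mathbf{U}^{\mathrm{T}}\) and 0.7 as an illustrative choice of the machine-precision power; it zeroes singular values judged ‘too small’ relative to \(D_{11}\):

```python
import numpy as np

def trunc_pinv(R, power=0.7):
    """Pseudoinverse of R with singular values below
    D_11 * eps**power treated as zero (unidentifiable directions)."""
    U, d, Vt = np.linalg.svd(R)
    tol = d[0] * np.finfo(float).eps ** power
    # 1/d for retained singular values, 0 for truncated ones
    d_inv = np.where(d > tol, 1.0 / np.maximum(d, tol), 0.0)
    return Vt.T @ np.diag(d_inv) @ U.T

# A rank-deficient 4x4 example: the last column duplicates the first
M = np.array([[1., 2., 0., 1.],
              [0., 1., 1., 0.],
              [2., 0., 1., 2.],
              [1., 1., 3., 1.]])
P = trunc_pinv(M)
print(np.allclose(M @ P @ M, M))  # True: pseudoinverse property holds
```

For a full-rank, well-conditioned matrix no singular value falls below the tolerance, so `trunc_pinv` reduces to the ordinary inverse, matching the statement in the text.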

### 3.3 SCAM smoothing parameter estimation

Optimization of the \({\fancyscript{V}}_*\) w.r.t. \({\varvec{\rho }} = \log ({\varvec{\lambda }})\) can be achieved by a quasi-Newton method. Each trial \(\varvec{\rho }\) vector requires a Sect. 3.2 iteration to find the corresponding \(\hat{\varvec{\beta }}\) so that the criterion can be evaluated. In addition the first derivative vector of \({\fancyscript{V}}_*\) w.r.t. \(\varvec{\rho }\) is required, which in turn requires \(\partial \hat{\varvec{\beta }}/\partial {\varvec{\rho }}\) and \(\partial \tau /\partial {\varvec{\rho }}\).
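The nesting of an inner fit inside an outer quasi-Newton loop can be sketched on a toy unconstrained smoother. This stand-in (Python with SciPy; the Gaussian-bump basis, penalty and data are arbitrary assumptions, and the criterion is plain GCV \(= n\Vert \mathbf{y}-\hat{\varvec{\mu }}\Vert ^2/(n-\tau )^2\) rather than the \({\fancyscript{V}}_*\) of the paper) shows the pattern of optimizing on the \(\rho = \log \lambda \) scale:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n, q = 100, 12
x = np.sort(rng.uniform(size=n))
centers = np.linspace(0, 1, q)
X = np.exp(-0.5 * ((x[:, None] - centers) / 0.15) ** 2)  # bump basis stand-in
y = np.sin(3 * x) + rng.normal(scale=0.1, size=n)
D = np.diff(np.eye(q), n=2, axis=0)          # second-order difference penalty
StS = D.T @ D

def gcv(rho):
    """Inner penalized LS fit for lambda = exp(rho), then
    GCV = n * ||y - A y||^2 / (n - tau)^2 with tau = tr(A)."""
    lam = np.exp(rho[0])
    A = X @ np.linalg.solve(X.T @ X + lam * StS, X.T)   # influence matrix
    tau = np.trace(A)                                    # effective df
    return n * np.sum((y - A @ y) ** 2) / (n - tau) ** 2

# Quasi-Newton (BFGS) outer loop; the paper instead supplies analytic
# derivatives of the criterion w.r.t. rho for each inner fit
res = minimize(gcv, x0=np.array([0.0]), method="BFGS")
print("selected lambda:", np.exp(res.x[0]))
```

Working on the log scale keeps \(\lambda \) positive without constraints and makes the criterion surface better behaved for quasi-Newton steps.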

### 3.4 Interval estimation

Having obtained estimates \(\hat{\varvec{\beta }}\) and \(\hat{\varvec{\lambda }}\), we have point estimates for the component smooth functions of the model, but it is usually desirable to obtain interval estimates for these functions as well. To facilitate the computation of such intervals we seek distributional results for the \(\tilde{\varvec{\beta }}\), i.e. for the coefficients on which the estimated functions depend linearly.

Taking the Bayesian view of the smoothing process then yields an *approximate* large sample normal result for the posterior distribution of \(\tilde{\varvec{\beta }}\), from which interval estimates for the component smooth functions follow directly.

## 4 Simulated examples

### 4.1 Simulations: comparison with alternative methods

In this section a performance comparison with unconstrained GAM and the QP approach to shape preserving smoothing (Wood 1994) is illustrated on a simulated example of an additive model with a mixture of monotone and unconstrained smooth terms. All simulation studies and data applications were performed using the R packages "scam", which implements the proposed SCAM approach, and "mgcv" for the GAM and QP implementations. A more extensive simulation study is given in Supplementary material, S.6. In particular, the first subsection of S.6 reports a comparative study with the constrained P-spline regression of Bollaerts et al. (2006), the monotone piecewise quadratic splines of Meyer (2008), and the shape-restricted penalized B-splines of Meyer (2012) on simulated examples of univariate single smooth term models. Since there was no mean square error advantage of these approaches over SCAM for the univariate model, since a direct grid search for multiple optimal smoothing parameters is computationally expensive, and since, to the authors’ knowledge, R routines implementing these methods are not freely available, the comparisons for multivariate and additive examples were performed only with the unconstrained GAM and QP approaches.

The covariate values, \(x_{1i}\) and \(x_{2i},\) were simulated from the uniform distribution on \([-1,3]\) and \([-3,3]\) respectively. For the Gaussian data the values of \(\sigma \) were 0.05, 0.1, 0.2, which gave signal-to-noise ratios of about 0.97, 0.88, and 0.65. For the Poisson model the noise level was controlled by multiplying \(g(\mu _i)\) by \(d\), taking values 0.5, 0.7, 1.2, which resulted in signal-to-noise ratios of about 0.58, 0.84, and 0.99. For the SCAM implementation a cubic SCOP-spline of dimension 30 was used to represent the first, monotonic, smooth term and a cubic P-spline with \(q=15\) for the second, unconstrained, term. For the unconstrained GAM, P-splines with the same basis dimensions were used for both model components. The models were fitted by penalized likelihood maximization with the smoothing parameters selected using \({\fancyscript{V}}_g\) in the Gaussian case and \({\fancyscript{V}}_u\) in the Poisson case.

For implementing the QP approach to monotonicity preserving constraints, we approximated the necessary and sufficient condition \(f'(x)\ge 0\) via the standard technique (Villalobos and Wahba 1987) of using a fine grid of linear constraints \(f'(x_i^*)\ge 0, i=1,\ldots , n,\) where the \(x_i^*\) are spread evenly through the range of \(x\) (strictly, such constraints are necessary, but only sufficient as \(n \rightarrow \infty \); in practice we observed no violations of monotonicity). Cubic regression spline bases were used here, together with the integrated squared second derivative of the smooth as the penalty. The model fit is obtained by solving the QP problem within a penalized IRLS loop, given \(\varvec{\lambda }\) chosen via GCV/UBRE from the unconstrained model fit. Cubic regression splines tend to have slightly better MSE performance than P-splines (Wood 2006a) and, moreover, the conditions built on finite differences are not only sufficient but also necessary for monotonicity. So this is a challenging test for SCAM. Three hundred replicates were produced for the Gaussian and Poisson distributions at each of three levels of noise and for two sample sizes, 100 and 200, for the three alternative approaches.
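The grid-of-constraints device can be sketched as follows, using SciPy's SLSQP as a generic QP-capable solver, a cubic B-spline basis with a discrete second-order penalty standing in for the cubic regression spline and integrated-derivative penalty of the text, and differences of the fitted curve on a fine grid in place of exact derivative constraints (all illustrative assumptions):

```python
import numpy as np
from scipy.interpolate import BSpline
from scipy.optimize import minimize

rng = np.random.default_rng(4)
n, k = 80, 3
x = np.sort(rng.uniform(size=n))
y = np.tanh(4 * (x - 0.5)) + rng.normal(scale=0.15, size=n)  # monotone truth

# Cubic B-spline basis on [0, 1]
t = np.r_[np.zeros(k), np.linspace(0, 1, 9), np.ones(k)]     # knot vector
q = len(t) - k - 1
Xb = BSpline.design_matrix(x, t, k).toarray()

# Fine grid of constraints: fitted values must be non-decreasing on x*
xg = np.linspace(0, 1, 50)
Xg = BSpline.design_matrix(xg, t, k).toarray()

lam = 1e-2
D = np.diff(np.eye(q), n=2, axis=0)       # discrete second-order penalty

obj = lambda b: np.sum((y - Xb @ b) ** 2) + lam * np.sum((D @ b) ** 2)
cons = {"type": "ineq", "fun": lambda b: np.diff(Xg @ b)}

res = minimize(obj, np.zeros(q), method="SLSQP", constraints=[cons])
print(bool(np.all(np.diff(Xg @ res.x) >= -1e-6)))  # monotone on the grid
```

As in the text, the fitted curve is only guaranteed monotone at the grid points; a denser grid tightens the approximation, at the cost of more constraints in the QP.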

The simulation studies show that SCAM may have practical advantages over the alternative methods considered. It is computationally slower than the GAM and QP approaches; however, GAM obviously cannot impose monotonicity, and the selection of the smoothing parameter for SCAM is well founded, in contrast to the ad hoc method used with QP of choosing \(\lambda \) from an unconstrained fit and then refitting subject to constraint. Finally, the practical MSE performance of SCAM seems to be better than that of the alternatives considered here.

### 4.2 Coverage probabilities

The proposed Bayesian approach to confidence interval construction makes a number of key assumptions: (i) it uses a linear approximation for the exponentiated parameters and, in the case of non-Gaussian models, adopts large sample inference; (ii) the smoothing parameters are treated as fixed. The simulation example of the previous subsection is used to examine how these restrictions affect the performance of the confidence intervals, with realized coverage probabilities taken as the measure of performance. Supplementary material, S.7, presents two further examples for a more thorough assessment of confidence interval performance.

## 5 Examples

This section presents the application of SCAM to two different data sets. The purpose of the first application is to investigate whether proximity to municipal incinerators in Great Britain is associated with increased risk of stomach cancer (Elliott et al. 1996; Shaddick et al. 2007). It is hypothesized that the risk of cancer is a decreasing function of distance from an incinerator. The second application uses data from the National Morbidity, Mortality, and Air Pollution Study (Peng and Welty 2004). The relationship between daily counts of mortality and short-term changes in air pollution concentrations is investigated. It is assumed that increases in concentrations of ozone, sulphur dioxide, and particulate matter will be associated with adverse health effects.

**Incinerator data:** Elliott et al. (1996) presented a large-scale study to investigate whether proximity to incinerators is associated with an increased risk of cancer. They analyzed data from 72 municipal solid waste incinerators in Great Britain and investigated the possibility of a decline in risk with distance from sources of pollution for a number of cancers. There was significant evidence for such a decline for stomach cancer, among several others. Data from a single incinerator from those 72 sources, located in the northeast of England, are analyzed using the SCAM approach in this section. This incinerator had a significant result indicating a monotone decreasing risk with distance (Elliott et al. 1996).

The data are from 44 enumeration districts (census-defined administrative areas), EDs, whose geographical centroids lay within 7.5 km of the incinerator. The response variable, \(Y_{i},\) is the observed number of cases of stomach cancer in each enumeration district. Associated estimates of the expected number of cases, \(E_{i},\) used for risk determination via \(\mathtt{risk}_{i}=Y_{i}/E_{i}\), were calculated for each ED using national rates for the whole of Great Britain, standardized for age and sex. The two covariates are the distance (km) from the incinerator, \(\mathtt{dist}_{i},\) and a deprivation score, the Carstairs score, \(\mathtt{cs}_{i}\).

Four models were considered. Model 1: \(\log \left\{ \mathrm E (Y_{i})\right\} =\log (E_{i})+m_{1}(\mathtt{dist}_{i}),\) with \(m'_1(\mathtt{dist}_{i})<0.\) Model 2 is the same as model 1 but with \(m_{2}(\mathtt{cs}_{i})\) as its smooth term instead, with \(m'_{2}(\mathtt{cs}_{i})>0.\) Model 3 combines both smooths, while model 4 uses a bivariate function \(m_{3}(-\mathtt{dist}_{i},\mathtt{cs}_{i})\) subject to a double monotone increasing constraint. The univariate smooth terms were represented by third order SCOP-splines with \(q=15,\) while \(q_1=q_2=6\) were used for the bivariate SCOP-spline.

In model 2 the number of cases of stomach cancer is represented by a smooth function of the deprivation score. This function is assumed to be monotonically increasing, since it was shown (Elliott et al. 1996) that, in general, people living closer to incinerators tend to be less affluent (low Carstairs score). The AIC value for this model was 155.59, whereas the unconstrained version gave AIC = 156.4, both of which were higher than for model 1. The other three measures of model performance, \({\fancyscript{V}}_u,\) the adjusted \(r^{2},\) and the deviance explained, also gave slightly worse results than those seen in model 1.

Model 3 incorporates both covariates, dist and cs, assuming an additive effect on the log scale. The estimated edf of \(m_{2}(\mathtt{cs})\) was about zero: this smooth term was insignificant in this model, with all its coefficients near zero. This can be explained by the high correlation between the two covariates. Considering a linear effect of the Carstairs score in place of the smooth function \(m_{2}\), as proposed in Shaddick et al. (2007), \(\log \left\{ \mathrm E (Y_{i})\right\} = \log (E_{i})+m_{1}(\mathtt{dist}_{i})+\beta \mathtt{cs}_{i},\) also resulted in an insignificant value for \(\beta .\)

**Air pollution data:** The second application investigates the relationship between non-accidental daily mortality and air pollution. The data were from the National Morbidity, Mortality, and Air Pollution Study (Peng and Welty 2004) which contains 5,114 daily measurements on different variables for 108 cities within the United States. As an example a single city (Chicago) study was examined in Wood (2006a). The response variable was the daily number of deaths in Chicago (death) for the years 1987–1994. Four explanatory variables were considered: average daily temperature (tempd), levels of ozone (o3median), levels of particulate matter (pm10median), and time. Since it might be expected that increased mortality will be associated with increased concentrations of air pollution, modelling with SCAM may prove useful.

- Model 1: \( \log \left\{ \mathrm E (\mathtt{death}_{i})\right\} = f_{1}(\mathtt{time}_{i})+m_{2}(\mathtt{pm10}_{i})+ m_{3}(\mathtt{o3}_{i})+f_{4}(\mathtt{tmp}_{i}), \)
- Model 2: \(\log \left\{ \mathrm E (\mathtt{death}_{i})\right\} = f_{1}(\mathtt{time}_{i})+m_{2}(\mathtt{pm10}_{i}) +m_{3}(\mathtt{o3}_{i},\mathtt{tmp}_{i}), \)

The current approach has been applied to the Chicago air pollution data for demonstration purposes only. It would be of interest to apply the same model to other cities, to see whether the relationship between non-accidental mortality and air pollution can be described by the proposed SCAM in other locations.

## 6 Discussion

In this paper a framework for generalized additive modelling with a mixture of unconstrained and shape restricted smooth terms, SCAM, has been presented and evaluated on a range of simulated and real data sets. The motivation is to develop general methods for estimating SCAMs analogous to those for a standard unconstrained GAM. SCAMs allow the inclusion of multiple unconstrained and shape constrained smooths of both univariate and multi-dimensional type, represented by the proposed SCOP-splines. It should be mentioned that the shape constraints are assured by a condition that is sufficient but not necessary for cubic and higher order splines. However, for cubic splines this condition is equivalent to that of Fritsch and Carlson (1980), who showed that the sufficient parameter space constitutes a substantial part of the necessary parameter space (see their Fig. 2, p. 242). The sensitivity analysis of Brezger and Steiner (2008) in an empirical application also supports the point that the sufficient condition is not highly restrictive.

Since a major challenge for any flexible regression method is implementation in a computationally efficient and stable manner, numerically robust algorithms for model estimation have been presented. The main benefit of the procedure is that smoothing parameter selection is incorporated into the SCAM parameter estimation scheme, which also produces interval estimates at no additional cost. The approach has the \(O(nq^2) \) computational cost of standard penalized regression spline based GAM estimation, but typically involves 2–4 times as many \(O(nq^2)\) steps, because of the additional non-linearities required for the monotonic terms and the need to use quasi-Newton in place of full Newton optimization. However, in contrast to the ad hoc methods of choosing the smoothing parameter used in other approaches, smoothing parameter selection for SCAMs is well founded. It should also be mentioned that, although the simulation free intervals proposed in this paper show good coverage probabilities, it would be of interest to see whether Bayesian confidence intervals derived from a posterior distribution simulated via MCMC would give better results.

## Notes

### Acknowledgments

The incinerator data were provided by the Small Area Health Statistics Unit, a unit jointly funded by the UK Department of Health, the Department of the Environment, Food and Rural Affairs, Environment Agency, Health and Safety Executive, Scottish Executive, National Assembly for Wales, and Northern Ireland Assembly. The authors are grateful to Jianxin Pan and Gavin Shaddick for useful discussions on several aspects of the work. The authors are also grateful for the valuable comments and suggestions of two referees and an associate editor. NP was funded by EPSRC/NERC grant EP/1000917/1.

## Supplementary material

## References

- Akaike, H.: Information theory and an extension of the maximum likelihood principle. In: Petrov, B.N., Csaki, B.F. (eds.) Second International Symposium on Information Theory. Academiai Kiado, Budapest (1973)
- Bollaerts, K., Eilers, P., van Mechelen, I.: Simple and multiple P-splines regression with shape constraints. Br. J. Math. Stat. Psychol. **59**, 451–469 (2006)
- Brezger, A., Steiner, W.: Monotonic regression based on Bayesian P-splines: an application to estimating price response functions from store-level scanner data. J. Bus. Econ. Stat. **26**(1), 90–104 (2008)
- Claeskens, G., Krivobokova, T., Opsomer, J.: Asymptotic properties of penalized spline estimators. Biometrika **96**(3), 529–544 (2009)
- Craven, P., Wahba, G.: Smoothing noisy data with spline functions. Numer. Math. **31**, 377–403 (1979)
- De Boor, C.: A Practical Guide to Splines. Cambridge University Press, Cambridge (1978)
- Dunson, D.: Bayesian semiparametric isotonic regression for count data. J. Am. Stat. Assoc. **100**(470), 618–627 (2005)
- Dunson, D., Neelon, B.: Bayesian inference on order-constrained parameters in generalized linear models. Biometrics **59**, 286–295 (2003)
- Eilers, P., Marx, B.: Flexible smoothing with B-splines and penalties. Stat. Sci. **11**, 89–121 (1996)
- Elliott, P., Shaddick, G., Kleinschmidt, I., Jolley, D., Walls, P., Beresford, J., Grundy, C.: Cancer incidence near municipal solid waste incinerators in Great Britain. Br. J. Cancer **73**, 702–710 (1996)
- Fritsch, F., Carlson, R.: Monotone piecewise cubic interpolation. SIAM J. Numer. Anal. **17**(2), 238–246 (1980)
- Golub, G., van Loan, C.: Matrix Computations, 3rd edn. Johns Hopkins University Press, Baltimore (1996)
- Hastie, T., Tibshirani, R.: Generalized Additive Models. Chapman & Hall, New York (1990)
- He, X., Shi, P.: Monotone B-spline smoothing. J. Am. Stat. Assoc. **93**(442), 643–650 (1998)
- Holmes, C., Heard, N.: Generalized monotonic regression using random change points. Stat. Med. **22**, 623–638 (2003)
- Kauermann, G., Krivobokova, T., Fahrmeir, L.: Some asymptotic results on generalized penalized spline smoothing. J. R. Stat. Soc. B **71**(2), 487–503 (2009)
- Kauermann, G., Opsomer, J.: Data-driven selection of the spline dimension in penalized spline regression. Biometrika **98**(1), 225–230 (2011)
- Kelly, C., Rice, J.: Monotone smoothing with application to dose-response curves and the assessment of synergism. Biometrics **46**, 1071–1085 (1990)
- Kim, Y.-J., Gu, C.: Smoothing spline Gaussian regression: more scalable computation via efficient approximation. J. R. Stat. Soc.: Ser. B **66**(2), 337–356 (2004)
- Lang, S., Brezger, A.: Bayesian P-splines. J. Comput. Graph. Stat. **13**(1), 183–212 (2004)
- Li, Y., Ruppert, D.: On the asymptotics of penalized splines. Biometrika **95**(2), 415–436 (2008)
- Marra, G., Wood, S.N.: Coverage properties of confidence intervals for generalized additive model components. Scand. J. Stat. **39**(1), 53–74 (2012)
- Meyer, M.: Inference using shape-restricted regression splines. Ann. Appl. Stat. **2**(3), 1013–1033 (2008)
- Meyer, M.: Constrained penalized splines. Can. J. Stat. **40**(1), 190–206 (2012)
- Meyer, M., Woodroofe, M.: On the degrees of freedom in shape-restricted regression. Ann. Stat. **28**(4), 1083–1104 (2000)
- Nychka, D.: Bayesian confidence intervals for smoothing splines. J. Am. Stat. Assoc. **83**, 1134–1143 (1988)
- Peng, R., Welty, L.: The NMMAPSdata package. R News **4**(2), 10–14 (2004)
- Ramsay, J.: Monotone regression splines in action (with discussion). Stat. Sci. **3**(4), 425–461 (1988)
- Rousson, V.: Monotone fitting for developmental variables. J. Appl. Stat. **35**(6), 659–670 (2008)
- Ruppert, D.: Selecting the number of knots for penalized splines. J. Comput. Graph. Stat. **11**(4), 735–757 (2002)
- Shaddick, G., Choo, L., Walker, S.: Modelling correlated count data with covariates. J. Stat. Comput. Simul. **77**(11), 945–954 (2007)
- Silverman, B.: Some aspects of the spline smoothing approach to nonparametric regression curve fitting. J. R. Stat. Soc.: Ser. B **47**, 1–52 (1985)
- Villalobos, M., Wahba, G.: Inequality-constrained multivariate smoothing splines with application to the estimation of posterior probabilities. J. Am. Stat. Assoc. **82**(397), 239–248 (1987)
- Wahba, G.: Bayesian confidence intervals for the cross validated smoothing spline. J. R. Stat. Soc.: Ser. B **45**, 133–150 (1983)
- Wang, J., Meyer, M.: Testing the monotonicity or convexity of a function using regression splines. Can. J. Stat. **39**(1), 89–107 (2011)
- Wood, S.: Monotonic smoothing splines fitted by cross validation. SIAM J. Sci. Comput. **15**(5), 1126–1133 (1994)
- Wood, S.: Partially specified ecological models. Ecol. Monogr. **71**(1), 1–25 (2001)
- Wood, S.: Stable and efficient multiple smoothing parameter estimation for generalized additive models. J. Am. Stat. Assoc. **99**, 673–686 (2004)
- Wood, S.: Generalized Additive Models: An Introduction with R. Chapman & Hall, Boca Raton (2006a)
- Wood, S.: On confidence intervals for generalized additive models based on penalized regression splines. Aust. N. Z. J. Stat. **48**(4), 445–464 (2006b)
- Wood, S.: Fast stable direct fitting and smoothness selection for generalized additive models. J. R. Stat. Soc. B **70**(3), 495–518 (2008)
- Wood, S.: Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. J. R. Stat. Soc. B **73**(1), 1–34 (2011)
- Zhang, J.: A simple and efficient monotone smoother using smoothing splines. J. Nonparametr. Stat. **16**(5), 779–796 (2004)

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution License, which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.