Monotone Smoothing Splines with Bounds

The problem of monotone smoothing splines with bounds is formulated as a constrained minimization problem of the calculus of variations. Existence and uniqueness of solutions of this problem is proved, as well as the equivalence of it to a finite dimensional but nonlinear optimization problem. A new algorithm for computing the solution which is a spline curve, using a branch and bound technique, is presented. The method is applied to examples in neuroscience and for fitting cumulative distribution functions from data.


Introduction
Splines and smoothing splines have a long history of application in many fields. The basic history is outlined in Egerstedt and Martin, [3] and in Wahba, [19]. See also [20] for an introduction to the use of smoothing splines in statistics. In this paper we return to the problem of monotone smoothing splines, which was previously studied in [3,4,8,9]. A classical application is to determine the average growth curve of a population of juveniles. Suppose we have a population of perhaps 30 children whose heights are measured every 6 months from age 2 to 20. It is easy to fit a smoothing spline to the data set but there is no guarantee of monotonicity. Since people just do not get shorter and then regrow, the smoothing spline is not appropriate for this and other similar situations. Instead, the appropriate tool is the monotone smoothing spline. Charles, Sun and Martin [2] used monotone smoothing splines in calculating distribution functions. There a problem can arise in that the spline may become greater than 1 at some point and by monotonicity it can never decrease, and so this violates the condition that a cumulative distribution only takes values between 0 and 1, a problem that was not addressed in [2]. In this paper, we solve this problem by imposing a condition that the spline is bounded above by some number x max (which would be 1 in the case of cumulative distribution functions). We also present an application of monotone B S. Maad Sasane 1 Lund University, Lund, Sweden smoothing splines in mathematical biology, where sigmoidally shaped function commonly occur.
The main difficulty of the problem is the monotonicity constraintẋ ≥ 0, which has to hold at every point t in the interval under consideration. This infinite dimensional constraint is handled by a vector space version of the Karush-Kuhn-Tucker theorem, and using this, it is possible to reduce the original infinite dimensional problem to a finite dimensional problem which can be solved numerically.
The paper is organized as follows: In Sect. 2, we formulate the curve fitting problem as a constrained Calculus of Variations problem, and in Sect. 3, we show existence and uniqueness of the minimizer of this minimization problem, and hence existence and uniqueness of the monotone smoothing spline function. In Sect. 4, we formulate the Karush-Kuhn-Tucker conditions for this problem, and use these to prove a key lemma, saying that the second derivative ofẍ is essentially piecewise linear. In Sect. 5, we use the key lemma from Sect. 4 to reformulate the infinite dimensional linear problem of Sect. 2 into a finite dimensional but nonlinear problem. This was essentially done previously in [3], except that we provide more details in the proof. We also introduce a new branch and bound type algorithm for computing the optimal curve. Sections 6 and 7 contain applications and examples which show how the method can be used. In Sect. 6, we reconstruct sigmoidal shaped curves arising in an intracellular signalling model, while in Sect. 7, we apply the method on reconstructing cumulative distribution functions using data, and in particular for a cumulative distribution function arising in the cell cycle, and the distribution that we reconstruct gives the time certain cells remain in a particular phase in the cell cycle.
Let x max > 0, and consider the following optimization problem for functions defined on an interval [0, T ]: where H 2 ((0, T )) is the Sobolev space of twice weakly differentiable functions on (0, T ).
The condition x(0) = 0 is included because in many situations in applications, it is for modelling purposes clear that the curve must satisfy this condition. This happens for example in the application of the intracellular signalling model that we discuss in Sect. 6. The condition x(T ) ≤ x max arises from 0 ≤ x(t) ≤ x max . Since the curve is monotonically increasing, it suffices to impose the condition at the endpoint t = T .
As (0, T ) is a bounded interval, we can use x := ( T 0ẍ (t) 2 dt) 1/2 as a norm on H 2 ((0, T )). The rest of the paper is devoted to solving this problem, and to applications of the developed method.

Existence and Uniqueness of a Minimizer
There exists a unique x * ∈ X which solves the minimization problem We will use the direct method in the calculus of variations, which says that if a functional is coercive on H 2 ((0, T )) and weakly lower semicontinuous on a weakly closed set, then a minimizer exists (see e.g. [17], p. 4). It is easy to see that the set X is convex and closed in H 2 ((0, T )) and hence it is weakly closed by Mazur's lemma (see e.g. Theorem 3.13 of [12]).
We will first check that the first term of f is weakly lower semicontinuous and coercive on L 2 ([0, T ]). Indeed, it is weakly lower semicontinuous since and so if x j x (i.e. x j converges weakly to x in H 2 ((0, T ))), then which shows that x → T 0ẍ (t) 2 dt is weakly lower semicontinuous on H 2 ((0, T )). Coerciveness of the first term on H 2 ((0, T )) is obvious since it is the square of the norm on H 2 ((0, T )).
Weak lower semicontinuity of the second term of (1) follows since H 2 ((0, T )) is com- , then x j is bounded in H 2 ([0, T ]), and since the embedding of H 2 ((0, T )) into C([0, T ]) is compact, x j has a subsequence x j l , which converges (to x by uniqueness of a weak limit) in C([0, T ])). Finally, note that for each j = 1, . . . , m, x in H 2 ((0, T )), and so the sum in (1) is weakly continuous. As f is a sum of two weakly lower semicontinuous functions, it is clear that f is weakly lower semicontinuous on X.
Next, we prove that f is coercive on X.
To show uniqueness, we will show that f is strictly convex. For this, we will use that u → T 0 u(t) 2 dt and x → (x − α 1 ) 2 are strictly convex on L 2 ([0, T ]) and on R, respectively.
It is clear that f 1 , f 2 and f 3 are convex (but not strictly convex) functions on X. To show that f is strictly convex, we need to prove that if for some λ ∈ (0, 1) and x 1 , To prove this, we assume that (4) holds. Since f i are convex for i = 1, . . . , 3 and f = f 1 + f 2 + f 3 , Eq. (4) holds also when f is replaced by 2 dt is strictly convex, the equality (4) for f 1 implies thatẍ 1 =ẍ 2 . Then, by integration and using that x 1 (0) = x 2 (0) = 0, it follows that there exists a real constant A ∈ R such that On the other hand, since x → (x − α 1 ) 2 is strictly convex and w 1 > 0, equality (4) for f 2 implies that x 1 (t 1 ) = x 2 (t 1 ). Combining this with x 1 = x 2 + At , we obtain A = 0 (since t 1 > 0 and x 1 (0) = x 2 (0)). We have proved that x 1 (t) = x 2 (t) for all t ∈ [0, T ], and hence x 1 = x 2 in X. This concludes the proof that f is strictly convex on X, and from this it also follows that the minimizer is unique.

The Karush-Kuhn-Tucker Conditions
The constrained optimization problem will be solved with a vector space version of the KKT method, cf. Theorem 1 p. 249 of [6].
It is straight-forward to check that G is Fréchet differentiable, and its derivative is By the Riesz representation theorem (see e.g. pp. 113-115 of [6], and pp. 146-150 of [18] We denote the dual space of Z by Z * . By the above result, Z * is identified with NBV([0, T ]) × R, and the norm of an element (ν, μ) ∈ Z * is given by The positive cone in Z is It is clear that P has a nonempty interior. The positive cone P * in X * is We will derive the KKT conditions for the optimization problem of Eq. (1). In order to do this, we first show that all points satisfying the inequality G(x) ≤ 0 (i.e. G(x) ∈ −P ) are regular points for this inequality (cf. [6], p. 248).
Proof Let x ∈ X be such that G(x) ∈ −P . We need to show that there exists an h ∈ X such that There are clearly many choices for h, for example With this choice, we have h ∈ X and The functional f : X → R defined by (3) is Fréchet differentiable, and its derivative is given by Let x * ∈ X be the minimizer of f subject to G(x) ∈ −P . By the KKT Theorem (see [6], p. 249), there exists a z * ∈ Z * , z * ≥ 0 (i.e. z * ∈ P * ) such that the Lagrangian An explicit statement of the KKT conditions implies the following result, which will be used in the next section for constructing a numerical algorithm for the solution: Lemma 2 Let x * ∈ X be the minimizer of the minimization problem (1)- (2), and let u * := x * . Then the following holds: (3) Ifẋ * = 0 on an interval, then u * = 0 on this interval (and hence it is an affine function also there).
Remark 1 We cannot conclude directly from Lemma 2 that u * is piecewise linear, since we cannot yet rule out that there is an increasing sequence of points s j → s 0 such thatẋ * (s j ) = 0 while x * (t) > 0 for t ∈ (s j −1 , s j ) (and u * is affine on each of the intervals (s j −1 , s j )). We will see in Lemma 3, that this does not happen for the optimal curve, and u * is in fact piecewise linear.
Proof By the KKT conditions [6], p. 249, there exists a ν * ∈ NBV([0, T ]) and a μ * ∈ R such that where ν * is nondecreasing and μ * ≥ 0. Equation (6) is the complementary slackness condition, and together with the constraint G( The Riemann-Stieltjes integral in (5) may be integrated by parts, and doing so and noting that dḣ(t) =ḧ dt (sinceḣ ∈ H 1 ((0, T )) and hence absolutely continuous), we obtain after collecting the two integral terms for all h ∈ X. Choosing h(t) = 0 on all except one of the subintervals (t i−1 , t i ), i = 1, . . . , m, we conclude that for each i = 1, . . . , m + 1, . We may assume (by choosing a representative for the function u * ∈ L 2 ((0, T ))), that u Hence u * is a right continuous function of bounded variation. Now with a general h ∈ X, the integral term of (7) can be rewritten using integration by parts as By choosing h appropriately (i.e. exactly one of h(t i ) andḣ(t i ) not equal to zero), we conclude that where we have also used that h(0) = 0. In particular, since ν * (0) = 0 and β 1 = 0, it follows that u * (0) = 0. Hence u * ∈ NBV([0, T ]). Note that the first equation of (8) implies that u * + ν * is continuous at the spline knots t i , i = 1, . . . , m, and the second equation implies that the derivative of u * + ν * has jumps of size −w i (x * (t i ) − α i ) at t i , i = 1, . . . , m − 1.
Recall that ν * is (locally) constant for t such thatẋ * (t) > 0. Let us examine what happens for a point s 0 such thatẋ * (s 0 ) = 0. If s 0 is an isolated zero ofẋ * , then ν * may have a jump discontinuity at s 0 (where it is right continuous). Ifẋ * = 0 on an interval around s 0 , then clearly also u * =ẍ * = 0 on this interval. In particular u * is piecewise linear in this interval.

An Algorithm for Computing the Optimal Solution
Using Lemma 2, we will now reformulate the optimization problem as a finite dimensional problem which we can solve numerically. This is essentially the approach of Sect. 7.3 of [3], and their method has been adapted to the extra constraint x(T ) ≤ x max . Instead of using dynamical programming as in [3], we give an outline of a branch and bound algorithm for finding an optimal solution to the problem. The variables of this new problem are x 1 , . . . , x m , v 1 , . . . , v m , which are the (unknown) values of the function x(t) and its derivativeẋ(t) at the spline knots.
Assuming initially that the values x 1 , . . . , x n and v 1 , . . . , v m are known, we will use the method of [3] to determine a curve with minimal cost under the constraint that it passes through the points (t 1 , x 1 ), . . . , (t m , x m ) with derivatives at these points equal to v 1 , . . . , v m , respectively. This will give us a new cost function depending on the variables x 1 , . . . , x m and v 1 , . . . , v m . Minimizing this function is equivalent to the original infinite dimensional problem.
Now we focus our attention on one interval [t i , t i+1 ], and rename it [t 0 , t F ]. Without loss of generality, we assume that t 0 = 0. The corresponding values of x at the endpoints 0 and t T are denoted by x 0 and x F , respectively. We assume without loss of generality that x 0 = 0. The values of the derivatives at the endpoints are denoted byẋ 0 andẋ F . The following lemma is essentially given in [3], but here we provide the full proof with more details, taking care of excluding the pathological case in the remark after Lemma 2. Note also that a typo in formula (7.27) of [3] has been corrected (ẋ 2 i instead ofẋ i ).

Lemma 3 [3]
Suppose thatẋ 0 ,ẋ F ≥ 0, x F ≥ 0 and t F > 0. Then the optimal control u which minimizes The contribution to the cost in the two cases is The corresponding spline function x is given by , and Proof The optimal control u which minimizes t F 0 u 2 dt subject to the constraintsẍ = u, x(0) = 0, x(t F ) = x F ,ẋ(0) =ẋ 0 ,ẋ(t F ) =ẋ F (that is all the constraints except the monotonicity constraintẋ ≥ 0), is an affine function u(t) = Ct + D where C and D are chosen so that the constraints are satisfied. This gives the expression for all choices of x F ,ẋ 0 ,ẋ F and t F . By integration and using the remaining constraints, the corresponding curve x(t) on the interval (0, t F ) is given by Clearly, in the cases when the monotonicity constraintẋ ≥ 0 is satisfied, this curve is optimal also for the original problem. We claim that theẋ ≥ 0 on (0, t F ) if and only if To prove this, note that the quadratic functionẋ is given bẏ We note thatẋ(0) =ẋ 0 ≥ 0 andẋ(t F ) =ẋ F ≥ 0 by the assumptions, and soẋ(t) ≥ 0 for all t ∈ (0, t F ) if and only if the value ofẋ at an interior minimizer is nonnegative. We study the cases C ≥ 0 and C < 0 separately. If C < 0, thenẋ doesn't have a minimizer, and so the monotonicity condition will always be satisfied. We note that in this case, is always satisfied. If C ≥ 0, i e if x F t F ≤ 1 2 (ẋ 0 +ẋ F ), then we need a necessary and sufficient condition for when the minimum value ofẋ is nonnegative and the minimizer belongs to the interval (0, t F ). The minimizer belongs to the interval if and only which holds if and only if The minimum value B − D 2 2C is nonnegative if and only if BC , and this inequality holds if and only if We note that the right inequality is always satisfied if the minimizer belongs to the interval (0, t F ), by (10) and since min(ẋ 0 ,ẋ F ) ≤ √ẋ 0ẋF . To summarize, we see that irrespective of the sign of C, a necessary and sufficient condition for the monotonicity of x on (0, t F ) is that as required. Let In view of Lemma 2 and by considering functions which on each subinterval is of the form of Lemma 3, we have proved the following: (11) subject to the constraints −ẋ i ≤ 0 for i = 1, . . . , m,

Theorem 2 The infinite dimensional optimization problem of Sect. 2 is equivalent to the finite dimensional problem
The nonlinear optimization problem of Theorem 2 can be solved numerically, for example with fmincon in matlab. Unfortunately, this method does not seem to give stable results when there are more than 10 subintervals, and it is hard to analyse due to the piecewise defined objective function.
To come around this problem, we suggest using a branch-and-bound approach which is outlined below. We emphasise that the algorithm is guaranteed to terminate, since there are finitely many (at most 2 m+1 , but probably much less in practice) subproblems to solve, each of which are convex and can be solved within a fixed time, e.g. with Newton's method. We suggest that a breadth first search is used when going through the branches of the tree, and it is likely that for most problems it will not be needed to search through so many levels of the tree. Further investigations will be needed to find out how efficient the algorithm is and the limit of the size of the problem that can be solved in practice. These questions will be addressed in a future project. In the current paper (Sects. 6 and 7) we give some examples where the method has been implemented for up to 30 data points with good results.
1. Start by fitting an ordinary cubic smoothing spline using the data points and with the constraints that x(0) = 0 and x(T ) ≤ x max . If this curve satisfiesẋ(t) ≥ 0 for every t ∈ (0, T ), or, equivalently . . , m − 1, then this must be the optimal curve, and we can stop. Otherwise, the value of the optimal function for this step gives a lower bound for the optimal solution. 2. If the curve in step 1 was not optimal, we branch the problem into m − 1 subproblems, where each subproblem corresponds to an interval for which the spline is given by a piecewise defined curve as in (9), whereas the spline should be an ordinary cubic spline on the other subintervals. A minimization problem is solved using (11), except that the first line in the definition is taken for the subintervals where the curve should be an ordinary spline, and the second line for the subinterval where the spline should be piecewise defined. The optimal value for each of these subproblems give lower bounds for the optimum of that branch. If the curve is monotone, then we have an optimum for the current branch and don't need to branch any further. Otherwise, that node have to be branched into further subproblems, each with one more subinterval where we use a piecewise defined spline curve. 3. We continue branching and bounding. Branches whose lower bound is smaller than an optimum in another side branch can be cut off, and don't need to be examined further.
In the end, we compare the branch with the smallest optimum, which will give the minimum of the full problem.
In Sects. 6 and 7, we show how this method can be used to construct graphs of increasing curves relevant in applications from computational biology and for finding cumulative distribution functions.

Applications to an Intracellular Signaling Model
As a first application of the algorithm developed in Sect. 5, we suggest curve fitting using monotone smoothing splines as an alternative to the parametric models that are commonly used in modelling of pathways of the cell. This is expected to be particularly useful in cases where the underlying chemistry is not completely understood, but when certain monotonicity trends in the data can be observed. The ODE models or stochastic models that are commonly used can be very complicated, see e.g. [10] where the modelling process of this type of models is described. In Sect. 3 of this reference, a relatively simple modelling example of this type, occurring in neuroscience is given, which we describe briefly here, to give the reader context.
Calmodulin is an abbreviation for calcium-modulated protein, which is an intermediate calcium-binding messenger protein present in all eukaryotic cells. Once bound to a calcium ion, calmodulin acts as part of a calcium signal pathway by modifying its interactions with various target proteins such as kinases or phosphatases. Its importance in neuroscience stems from its crucial involvement in synaptic plasticity.
We consider five data sets with experimental data taken from [1,11,15,16], corresponding to models describing this particular pathway. See [5,10] for a model using a system of ordinary differential equations stemming from steady state equations for these reactions, and where the same data sets are used. This particular model consists of the elementary species  [10] with monotone smoothing splines, using the branch and bound algorithm outlined in Sect. 5. The data for the respective subfigures are taken (in order) from [1,11,15,16] calcium (Ca), calmodulin (CaM), protein phosphatase 2B (PP2B), and Ca/CaM-dependent protein kinase II (CaMKII) and protein phosphatase 1 (PP1).
Up to four Ca ions are bound by calmodulin, and the first data set that we consider describes how many (moles of) ions of calcium are bound to CaM per (mole of) CaM, and this quantity is plotted versus the Ca concentration. When the Ca concentration increases, more Calcium ions are bound to the proteins, and it is therefore natural to assume that the curve corresponds to the graph of an increasing function. Each calmodulin molecule can bind at most four Ca ions, and hence the range of the function is naturally included in [0, 4]. When there is no Ca present in the system, there cannot be any bounds, and so we require that the curve passes through the origin. The data set is taken from [16], and the data together with the fitted monotone spline is shown in the first subfigure of Fig. 1.
The binding of Ca ions by Calmodulin is a cooperative process. Ca-bound CaM activates PP2B, another protein implicated in molecular processes related to learning which also plays a role in striatal signaling. Dataset 2, which is taken from [11], describes the number of moles of apo calmodulin (apoCaM, i e calmodulin without calcium bound to it) bound to each mole PP2B versus the concentration of apoCaM. Since one PP2B molecule can bind at most one calmodulin molecule, the range of the function is naturally between 0 and 1. As there has to be apoCaM in the system for this type of binding to occur, we require that the (0, 0) is on the curve. The more apoCaM there is in the system, the more likely it is for such a binding to occur, and it is hence natural to assume that the fitted function is increasing.
In the third subfigure, percentage activation of PP2B is plotted versus Ca concentration at two different concentrations of CaM (30 nM for the left curve and 300 nM for the right curve). The data for this subfigure is taken from [16]. Naturally, the range of the function is contained in [0, 100], and the data suggests that the binding is more likely to occur for higher concentrations of Ca, and hence it is natural to fit the data with a curve which is the graph of an increasing function. Again, Ca is needed for the activation to occur, and for this reason we require that the curve starts at the origin.
The third protein CaMKII, is a kinase, which is activated by the binding of Ca-CaM. In the fourth subfigure, we consider data from [15], representing the number of moles of Ca that is bound to CaM per mole of CaM in the presence of the enzyme CaMKII. As in subfigure 1, the curve is expected to be the graph of a function which is increasing, whose range is contained in [0, 4], and which is originating from the origin.
CaMKII molecules exist as dodecamers, consisting of two hexamer rings. A CaMKII unit that has bound CaM can autophosphorylate when sitting beside an active neighboring unit in the same hexamer ring. The phosphorylated unit can remain active even in the absence of Ca-CaM. In subfigure 5, the data comes from [1], and it describes the percentage phosphorylated CaMKII (autonomy in CaMKII activity) versus calcium concentration.
The method of the current paper is applied to data sets in order to fit curves which come close to the data points. The weights w i were set to 1 for all examples and the parameter λ was chosen to be 100 for the first two datasets and 1000 for the three last.
The same range was used for the variables as in Fig. 12.3 of [10]. Instead of imposing a new type of condition corresponding to the limit as the independent variable tends to infinity, an additional data point was introduced, which forces the curve to come close to the maximum bound at the right endpoint of the interval. For example, for dataset 1, we demand that the curve comes close to the point (6,4). After doing this, the method of Sect. 5 could be used directly. The plots of Fig. 1 were obtained, and these can be compared to the first five plots in Fig. 12

Applications for Cumulative Distribution Functions and an Example from Cell Cycle Models
A second application to the technique of this paper, is for reconstructing an unknown distribution function given some data points. Suppose that it is known that a sample comes from a distribution with an absolutely continuous distribution function, but that the exact form of the distribution is unknown. Then we could reconstruct the cumulative distribution function by using monotone splines. We first test the method from a sample of data points coming from a normal distribution.
Using x max = 1, 1000 random points were generated following the normal distribution with expectation value 0 and standard deviation 1. Then a histogram with 20 bins was created using Matlab's function histcounts. The vector α with data values was created by using Matlab's function cumsum. With λ = 50, the method of this paper was used with the minor modification that u(0) = 0 is replaced by u(t 0 ) ≥ 0 to create an approximation of the cumulative distribution function. By differentiating the obtained spline function an estimate for the density function could also be obtained. The results can be seen in Fig. 2.
The method is more useful when the data does not come from a standard distribution, and the following is an example of such a situation arising in cell biology. The cell cycle consists of four distinct phases: G1, S, G2 and M. For many types of cells, the time a cell spends in the G1 phase is highly variable, and it is of interest to find the distribution for the time a cell spends in the G1 phase. See for example [7], where such a distribution is used in an age structured cell cycle model. FUCCI is a fluorescence technology that can be used for tracking the time an individual cell spends in the G1 phase [13,14]. Using the movie S1 of the supplementary material of [13], the histogram data for the time that each cell in that movie stays in the G1 phase was obtained. Using λ = 3, the cumulative distribution function could be estimated and is shown in Fig. 3