Monotone Smoothing Splines with Bounds

The problem of monotone smoothing splines with bounds is formulated as a constrained minimization problem of the calculus of variations. Existence and uniqueness of solutions of this problem is proved, as well as the equivalence of it to a finite dimensional but nonlinear optimization problem. A new algorithm for computing the solution which is a spline curve, using a branch and bound technique, is presented. The method is applied to examples in neuroscience and for fitting cumulative distribution functions from data.


Introduction
Splines and smoothing splines have a long history of application in many fields. The basic history is outlined in Egerstedt and Martin, [3] and in Wahba, [17]. In this paper we return to the problem of monotone smoothing splines, which was previously studied in [3,4,7]. A classical application is to determine the average growth curve of a population of juveniles. Suppose we have a population of perhaps 30 children whose heights are measured every 6 months from age 2 to 20. It is easy to fit a smoothing spline to the data set but there is no guarantee of monotonicity. Since people just do not get shorter and then regrow, the smoothing spline is not appropriate for this and other similar situations. Instead, the appropriate tool is the monotone smoothing spline. Charles, Sun and Martin [2] used monotone smoothing splines in calculating distribution functions. There a problem can arise in that the spline may become greater than 1 at some point and by monotonicity it can never decrease, and so this violates the condition that a cumulative distribution only takes values between 0 and 1, a problem that was not addressed in [2]. In this paper, we solve this problem by imposing a condition that the spline is bounded above by some number x max (which would be 1 in the case of cumulative distribution functions).
The main difficulty of the problem is the monotonicity constraintẋ ≥ 0, which has to hold at every point t in the interval under consideration. This infinite dimensional constraint is handled by a vector space version of the Karush-Kuhn-Tucker theorem, and using this, it is possible to reduce the original infinite dimensional problem to a finite dimensional problem which can be solved using a computer.
The paper is organized as follows: In Section 2, we formulate the curve fitting problem as a constrained Calculus of Variations problem, and in Section 3, we show existence and uniqueness of the minimizer of this minimization problem, and hence existence and uniqueness of the monotone smoothing spline function. In Section 4, we formulate the Karush-Kuhn-Tucker conditions for this problem, and use these to prove a key lemma, saying that the second derivative ofẍ is essentially piecewise linear. In Section 5, we use the key lemma from Section 4 to reformulate the infinite dimensional linear problem of Section 2 into a finite dimensional but nonlinear problem. This was essentially done previously in [3], except that we provide more details in the proof. We also develop a new branch and bound type algorithm for computing the optimal curve. Sections 6 and 7 contain applications and examples which show how the method can be used. In Section 6, we reconstruct sigmoidal shaped curves arising in an intracellular signalling model, while in Section 7, we apply the method on reconstructing cumulative distribution functions using data, and in particular for a cumulative distribution function arising in the cell cycle, and the distribution that we reconstruct gives the time certain cells remain in a particular phase in the cell cycle.

The problem
Let T > 0, and let 0 = t 0 < t 1 < · · · < t m = T . Consider a data set Let x max > 0, and consider the following optimization problem for functions defined on an interval [0, T ]: where H 2 ((0, T )) is the Sobolev space of twice weakly differentiable functions on (0, T ). The condition x(0) = 0 is included because in many situations in applications, it is for modelling purposes clear that the curve must satisfy this condition. This happens for example in the application of the intracellular signalling model that we discuss in Section 6. The condition x(T ) ≤ x max arises from 0 ≤ x(t) ≤ x max . Since the curve is monotonically increasing, it suffices to impose the condition at the endpoint t = T .
As (0, T ) is a bounded interval, we can use x := T 0ẍ (t) 2 dt 1/2 as a norm on H 2 ((0, T )). The rest of the paper is devoted to solving this problem, and to applications of the developed method.

Existence and uniqueness of a minimizer
and assume that m ≥ 1. There exists a unique x * ∈ X which solves the minimization problem Proof. Let f : We will use the direct method in the calculus of variations, which says that if a functional is coercive on H 2 ((0, T )) and weakly lower semicontinuous on a weakly closed set, then a minimizer exists (see e.g. [15], p. 4). It is easy to see that the set X is convex and closed in H 2 ((0, T )) and hence it is weakly closed by Mazur's lemma (see e.g. Theorem 3.13 of [10]).
We will first check that the first term of f is weakly lower semicontinuous and coercive on L 2 ([0, T ]). Indeed, it is weakly lower semicontinuous since and so if x j ⇀ x (i.e. x j converges weakly to x in H 2 ((0, T ))), then which shows that x → T 0ẍ (t) 2 dt is weakly lower semicontinuous on H 2 ((0, T )). Coerciveness of the first term on H 2 ((0, T )) is obvious since it is the square of the norm on H 2 ((0, T )).
Weak lower semicontinuity of the second term of (1) follows since H 2 ((0, T )) is compactly , and since the embedding of H 2 ((0, T )) into C([0, T ]) is compact, x j has a subsequence x j l , which converges (to x by uniqueness of a weak limit) in C([0, T ])). Finally, note that for each j = 1, . . . , m, , and so the sum in (1) is weakly continuous.
As f is a sum of two weakly lower semicontinuous functions, it is clear that f is weakly lower semicontinuous on X.
Next, we prove that f is coercive on X. As x → T 0ẍ (t) 2 dt is coercive on L 2 ([0, T ]), we see that To show uniqueness, we will show that f is strictly convex. For this, we will use that u → T 0 u(t) 2 dt and x → (x − α 1 ) 2 are strictly convex on L 2 ([0, T ]) and on R, respectively.
It is clear that f 1 , f 2 and f 3 are convex (but not strictly convex) functions on X. To show that f is strictly convex, we need to prove that if for some λ ∈ (0, 1) and x 1 , x 2 ∈ X, then x 1 = x 2 .
To prove this, we assume that (4) holds. Since f i are convex for i = 1, . . . , 3 and f = f 1 +f 2 +f 3 , equation (4) holds also when f is replaced by f i , i = 1, . . . , 3. Since u → T 0 u(t) 2 dt is strictly convex, the equality (4) for f 1 implies thatẍ 1 =ẍ 2 . Then, by integration and using that x 1 (0) = x 2 (0) = 0, it follows that there exists a real constant A ∈ R such that On the other hand, since x → (x − α 1 ) 2 is strictly convex and w 1 > 0, equality (4) for f 2 implies that x 1 (t 1 ) = x 2 (t 1 ). Combining this with x 1 = x 2 + At, we obtain A = 0 (since t 1 > 0 and x 1 (0) = x 2 (0)). We have proved that x 1 (t) = x 2 (t) for all t ∈ [0, T ], and hence x 1 = x 2 in X. This concludes the proof that f is strictly convex on X, and from this it also follows that the minimizer is unique.

The Karush-Kuhn-Tucker conditions
The constrained optimization problem will be solved with a vector space version of the KKT method, cf. Theorem 1 p. 249 of [5].
It is straight-forward to check that G is Fréchet differentiable, and its derivative is By the Riesz representation theorem (see e.g. pp. 113-115 of [5], and pp. 146-150 of [16]) We denote the dual space of Z by Z * . By the above result, Z * is identified with N BV ([0, T ])× R, and the norm of an element (ν, µ) ∈ Z * is given by The positive cone in Z is It is clear that P has a nonempty interior. The positive cone P * in X * is We will derive the KKT conditions for the optimization problem of equation (1). In order to do this, we first show that all points satisfying the inequality G(x) ≤ 0 (i.e. G(x) ∈ −P ) are regular points for this inequality (cf. [5], p. 248).
There are clearly many choices for h, for example With this choice, we have h ∈ X and The functional f : X → R defined by (3) is Fréchet differentiable, and its derivative is given by Let x * ∈ X be the minimizer of f subject to G(x) ∈ −P . By the KKT Theorem (see [5], p. 249), there exists a z * ∈ Z * , z * ≥ 0 (i.e. z * ∈ P * ) such that the Lagrangian An explicit statement of the KKT conditions implies the following result, which will be used in the next section for constructing a numerical algorithm for the solution: Lemma 2. Let x * ∈ X be the minimizer of the minimization problem (1) -(2), and let u * :=ẍ * . Then the following holds: (3) Ifẋ * = 0 on an interval, then u * = 0 on this interval (and hence it is an affine function also there).
Remark 1. We cannot conclude directly from Lemma 2 that u * is piecewise linear, since we cannot yet rule out that there is an increasing sequence of points s j → s 0 such thatẋ * (s j ) = 0 while x * (t) > 0 for t ∈ (s j−1 , s j ) (and u * is affine on each of the intervals (s j−1 , s j )). We will see in Lemma 3, that this does not happen for the optimal curve, and u * is in fact piecewise linear.
Proof. By the KKT conditions [5], p. 249, there exists a ν * ∈ N BV ([0, T ]) and a µ * ∈ R such that for all h ∈ X. Furthermore, where ν * is nondecreasing and µ * ≥ 0. Equation (6) is the complementary slackness condition, and together with the constraint G( The Riemann-Stieltjes integral in (5) may be integrated by parts, and doing so and noting that dḣ(t) =ḧdt (sinceḣ ∈ H 1 ((0, T )) and hence absolutely continuous), we obtain after collecting the two integral terms for all h ∈ X. Choosing h(t) = 0 on all except one of the subintervals (t i−1 , t i ), i = 1, . . . , m, we conclude that for each i = 1, . . . , m + 1, . We may assume (by choosing a representative for the function u * ∈ L 2 ((0, T ))), that u * (t) + ν * (t) = β i + γ i t on the half open interval [t i−1 , t i ) for i = 1, . . . , m. Hence u * is a right continuous function of bounded variation. Now with a general h ∈ X, the integral term of (7) can be rewritten using integration by parts as By choosing h appropriately (i.e. exactly one of h(t i ) andḣ(t i ) not equal to zero), we conclude that (8) where we have also used that h(0) = 0. In particular, since ν * (0) = 0 and β 1 = 0, it follows that u * (0) = 0. Hence u * ∈ N BV ([0, T ]). Note that the first equation of (8) implies that u * + ν * is continuous at the spline knots t i , i = 1, . . . , m, and the second equation implies that the derivative of u * + ν * has jumps of size −w Recall that ν * is (locally) constant for t such thatẋ * (t) > 0. Let us examine what happens for a point s 0 such thatẋ * (s 0 ) = 0. If s 0 is an isolated zero ofẋ * , then ν * may have a jump discontinuity at s 0 (where it is right continuous). Ifẋ * = 0 on an interval around s 0 , then clearly also u * =ẍ * = 0 on this interval. In particular u * is piecewise linear in this interval.

An algorithm for computing the optimal solution
Using Lemma 2, we will now reformulate the optimization problem as a finite dimensional problem which we can solve numerically. This is essentially the approach of Section 7.3 of [3], and their method has been adapted to the extra constraint x(t) ≤ x max . The dynamical programming approach which is outlined in [3], will not be used here however, because enough details of the algorithm is not given to implement it, and so it cannot be verified. Moreover, the code the authors used for monotone splines in [3] seems to be lost, making it impossible to compare the results of the two approaches. Instead of using dynamical programming, we develop here a branch and bound algorithm for finding an optimal solution to the problem. The variables of this new problem are x 1 , . . . , x m , v 1 , . . . , v m , which are the (unknown) values of the function x(t) and its derivativeẋ(t) at the spline knots.
Assuming initially that the values x 1 , . . . , x n and v 1 , . . . , v m are known, we will use the method of [3] to determine a curve with minimal cost under the constraint that it passes through the points (t 1 , x 1 ), . . . , (t m , x m ) with derivatives at these points equal to v 1 , . . . , v m , respectively. This will give us a new cost function depending on the variables x 1 , . . . , x m and v 1 , . . . , v m . Minimizing this function is equivalent to the original infinite dimensional problem. Now we focus our attention on one interval [t i , t i+1 ], and rename it [t 0 , t F ]. Without loss of generality, we assume that t 0 = 0. The corresponding values of x(t) at the endpoints 0 and t T are denoted by x 0 and x F , respectively. We assume without loss of generality that x 0 = 0. The values of the derivatives at the endpoints are denoted byẋ 0 andẋ F . The following lemma is essentially given in [3], but here we provide the full proof with more details, taking care of excluding the pathological case in the remark after Lemma 2. Note also that a typo in formula (7.27) of [3] has been corrected (ẋ 2 i instead ofẋ i ). Lemma 3.
[3] Suppose thatẋ 0 ,ẋ F ≥ 0, x F ≥ 0 and t F > 0. Then the optimal control u which minimizes t F 0 u 2 dt subject to the constraintsẍ(t) = u(t) for t ∈ (0, t F ), x(0) = 0, x(t F ) = x F ,ẋ(0) =ẋ 0 ,ẋ(t F ) =ẋ F andẋ(t) ≥ 0 for t ∈ (0, t F ) is given by . The contribution to the cost in the two cases is The corresponding spline function x is given by , and (9) Proof. The optimal control u which minimizes that is all the constraints except the monotonicity constraintẋ ≥ 0), is an affine function u(t) = Ct + D where C and D are chosen so that the constraints are satisfied. This gives the expression for all choices of x F ,ẋ 0 ,ẋ F and t F . By integration and using the remaining constraints, the corresponding curve x(t) on the interval (0, t F ) is given by Clearly, in the cases when the monotonicity constraintẋ ≥ 0 is satisfied, this curve is optimal also for the original problem. We claim that theẋ ≥ 0 on (0, t F ) if and only if x F t F ≥ 1 3 ẋ 0 +ẋ 0 − √ẋ 0ẋF . To prove this, note that the quadratic functionẋ is given bẏ We note thatẋ(0) =ẋ 0 ≥ 0 andẋ(t F ) =ẋ F ≥ 0 by the assumptions, and soẋ(t) ≥ 0 for all t ∈ (0, t F ) if and only if the value ofẋ at an interior minimizer is nonnegative. We study the cases C ≥ 0 and C < 0 separately. If C < 0, thenẋ doesn't have a minimizer, and so the monotonicity condition will always be satisfied. We note that in this case, , then we need a necessary and sufficient condition for when the minimum value ofẋ is nonnegative and the minimizer belongs to the interval (0, t F ).

The minimizer belongs to the interval if and only if
which holds if and only if (10) , and this inequality holds if and only if 1 We note that the right inequality is always satisfied if the minimizer belongs to the interval (0, t F ), by (10) and since min(ẋ 0 ,ẋ F ) ≤ √ẋ 0ẋF . To summarize, we see that irrespective of the sign of C, a necessary and sufficient condition for the monotonicity of x on (0, t F ) is that In view of Lemma 2 and by considering functions which on each subinterval is of the form of Lemma 3, we have proved the following: Theorem 2. The infinite dimensional optimization problem of Section 2 is equivalent to the finite dimensional problem subject to the constraints −ẋ i ≤ 0 for i = 1, . . . , m, The nonlinear optimization problem of Theorem 2 can be solved numerically, for example with fmincon in matlab. Unfortunately, this method does not seem to give stable results when there are more than 10 subintervals, and it is hard to analyse due to the piecewise defined objective function.
To come around this problem, we suggest using a branch-and-bound approach which is outlined below. We emphasise that the algorithm is guaranteed to terminate, since there are finitely many (at most 2 m+1 , but probably much less in practice) subproblems to solve, each of which are convex and can be solved within a fixed time, e.g. with Newton's method. We suggest that a breadth first search is used when going through the branches of the tree, and it is likely that for most problems it will not be needed to search through so many levels of the tree. Further investigations will be needed to find out how efficient the algorithm is and how large problems can be solved in practice. In the current paper (Sections 6 and 7 we give some examples where the method has been implemented for up to 30 data points with good results. 1. Start by fitting an ordinary cubic smoothing spline using the data points and with the constraints that x(0) = 0 and x(T ) ≤ x max . If this curve satisfiesẋ(t) ≥ 0 for every t ∈ (0, T ), or, equivalently for every i = 0, . . . , m − 1, then this must be the optimal curve, and we can stop. Otherwise, the value of the optimal function for this step gives a lower bound for the optimal solution. 2. If the curve in step 1 was not optimal, we branch the problem into m − 1 subproblems, where each subproblem corresponds to an interval for which the spline is given by a piecewise defined curve as in (9), whereas the spline should be an ordinary cubic spline on the other subintervals. A minimization problem is solved using (11), except that the first line in the definition is taken for the subintervals where the curve should be an ordinary spline, and the second line for the subinterval where the spline should be piecewise defined. The optimal value for each of these subproblems give lower bounds for the optimum of that branch. If the curve is monotone, then we have an optimum for the current branch and don't need to branch any further. Otherwise, that node have to be branched into further subproblems, each with one more subinterval where we use a piecewise defined spline curve. 3. We continue branching and bounding. Branches whose lower bound is smaller than an optimum in another side branch can be cut off, and don't need to be examined further. In the end, we compare the branch with the smallest optimum, which will give the minimum of the full problem.
In Sections 6 and 7, I have used this method to construct increasing curves relevant for some applications to curve fitting and for finding cumulative distribution functions.

Applications to an intracellular signaling model
As an application of the model, we consider five data sets corresponding to models occuring in neuroscience. For these types of models, the output functions typically have a sigmoidal shape, and so they are monotonically increasing, and are also subject to some bounds, which are naturally satisfied by these functions. In particular for biochemical reasons, for dataset 1 it is necessary that the function ranges between 0 and 4. The output function of dataset 2 naturally ranges between 0 and 1, and the output functions used together with datasets 3 and 5 are both bounded between 0 and 100. All fitted curves should also pass through the origin, and tend to the respective upper bound as the independent variable tends to infinity.
The method of the current paper is applied to data sets in order to fit curves which come close to the data points. The weights w i were set to 1 for all examples and the parameter λ was chosen to be 100 for the first two datasets and 1000 for the three last.
The same range was used for the variables as in Figure 12.3 of [8]. Instead of imposing a new type of condition corresponding to the limit as the independent variable tends to infinity, an additional data point was introduced, which forces the curve to come close to the maximum bound at the right endpoint of the interval. For example, for dataset 1, we demand that the curve comes close to the point (6,4). After doing this, the method could be used directly. The with monotone smoothing splines, using the branch and bound algorithm outlined in Section 5. The data for the respective subfigures are taken (in order) from [14,9,14,13,1].
plots of Figure 1 were obtained, and these can be compared to the first five plots in Figure  12.3 of [8].

Applications for cumulative distribution functions and an example from cell cycle models
A second application to the technique of this paper, is for reconstructing an unknown distribution function given some data points.
Suppose that it is known that a sample comes from a distribution with an absolutely continuous distribution function, but that the exact form of the distribution is unknown. Then we could reconstruct the cumulative distribution function by using monotone splines. We first test the method from a sample of data points coming from a normal distribution.
Using x max = 1, I generated 1000 random points following the normal distribution with expectation value 0 and standard deviation 1. Then a histogram with 20 bins was created using Matlab's function histcounts. The vector α with data values was created by using Matlab's function cumsum. With λ = 50, the method of this paper was used with the minor modification that u(0) = 0 is replaced by u(t 0 ) ≥ 0 to create an approximation of the cumulative distribution function. By differentiating the obtained spline function an estimate for the density function could also be obtained. The results can be seen in Figure 2.
The method is more useful when the data does not come from a standard distribution, and the following is an example of such a situation arising in cell biology. The cell cycle consists of four distinct phases: G1, S, G2 and M . For many types of cells, the time a cell spends in the G1 phase is highly variable, and it is of interest to find the distribution for the time a cell spends in the G1 phase. See for example [6], where such a distribution is used in an  Figure 2. Estimate of the cumulative distribution function using monotone splines (left), and comparison of the derived density function with the exact density function of the normal distribution and the histogram (right).
age structured cell cycle model. FUCCI is a fluorescence technology that can be used for tracking the time an individual cell spends in the G1 phase [11,12]. Using the movie S1 of the supplementary material of [11], the histogram data for the time that each cell in that movie stays in the G1 phase was obtained. Using λ = 3, the cumulative distribution function could be estimated and is shown in Figure 3 Figure 3. Estimate of the cumulative distribution function using monotone splines (left), and comparison of the derived density function with the histogram (right).

Acknowledgements
The author would like to thank Clyde Martin for reading and commenting on an earlier version of the manuscript and Olivia Eriksson for a useful discussion about the data sets occurring in Figure 1.