Skip to main content

Binary optimal control by trust-region steepest descent


We present a trust-region steepest descent method for dynamic optimal control problems with binary-valued integrable control functions. Our method interprets the control function as an indicator function of a measurable set and makes set-valued adjustments derived from the sublevel sets of a topological gradient function. By combining this type of update with a trust-region framework, we are able to show by theoretical argument that our method achieves asymptotic stationarity despite possible discretization errors and truncation errors during step determination. To demonstrate the practical applicability of our method, we solve two optimal control problems constrained by ordinary and partial differential equations, respectively, and one topological optimization problem.


We consider optimization problems of the form

$$\begin{aligned} \min _U {\mathcal {J}}(U) \quad \text {s.t.} \quad U \in \varSigma , \end{aligned}$$

where the variable U is a measurable set selected from a finite atomless measure space \((\varOmega , \varSigma , \mu )\) and \({\mathcal {J}} :\varSigma \rightarrow {\mathbb {R}}\) is differentiable with respect to its set-valued argument U. Specifically, we focus on cases where there exist a Banach space Y, a continuously Fréchet differentiable map \(J :Y \rightarrow {\mathbb {R}}\), and a vector measure \(\nu :\varSigma \rightarrow Y\) of bounded variation, such that

$$\begin{aligned} {\mathcal {J}} = J \circ \nu . \end{aligned}$$

Such optimization problems occur implicitly in the context of binary optimal control whenever Lebesgue measurable control functions are considered. For instance, the Lotka–Volterra fishing problem, a test problem from ODE-constrained optimal control that we address in Sect. 4.2, takes the form

$$\begin{aligned} \begin{aligned}&\min _{y, w}\, \int _0^{t_f} \Vert y(t) - (1, 1)^T \Vert ^2 \,\mathrm {d}t \\&\,\,\text {s.t.}\, {\dot{y}}_1(t) = y_1(t) - y_1(t) y_2(t) - c_1 y_1(t) w(t) \qquad \text {for a.a.}\ t \in [0, t_f], \\&\,\quad \,\,\,\, {\dot{y}}_2(t) = -y_2(t) + y_1(t) y_2(t) - c_2 y_2(t) w(t) \quad \text {for a.a.}\ t \in [0, t_f], \\&\,\,\quad \,\,\,\,y(0) = y_0, \\&\,\,\,\quad \,\,\, w(t) \in \lbrace 0, 1 \rbrace \qquad \qquad \qquad \,\,\,\qquad \qquad \qquad \qquad \qquad \text {for a.a.}\ t \in [0, t_f], \\&\,\,\,\,\,\quad \quad \,\,\, w \in L^1([0, t_f]).\\ \end{aligned} \end{aligned}$$

Here, \(\nu \) maps a Lebesgue-measurable control set \(U \subseteq [0, t_f]\) to its characteristic function \(\chi _U\) which serves as w. Each w corresponds to an ODE solution \(y_w\). J(w) is then given by

$$\begin{aligned} J(w) = \int _0^{t_f} \Vert y_w(t) - (1, 1)^T \Vert ^2 \,\mathrm {d}t \end{aligned}$$

and the final objective function \({\mathcal {J}}\) of Problem (1) is \({\mathcal {J}}(U) :=J(\chi _U)\). We develop our theory for the abstract problem (1). However, this more concrete example may aid the reader’s understanding.

Binary optimal control problems are solved both locally and globally by using various approaches, including indirect methods based on the global maximum principle [14, 15], dynamic programming [8, 18], moment relaxations [33], combinatorial integral approximation decompositions [16, 34], or direct first-discretize-then-optimize methods that result in mixed-integer nonlinear programs [3]. We refer to [33, 37] for broader surveys.

Like decomposition methods, we use the fact that, under mild assumptions, control sets form a continuum and can therefore be improved incrementally. To achieve improvement, we use local linearizations based on a gradient concept similar to the topological gradient function as defined in [28].

In topology optimization, the topological gradient is often dependent on perturbation shape. This is the case, for instance, if new boundary conditions are added by perturbations (see, e.g., [28, 36]) and can limit the ways in which perturbations can be made. There are solution heuristics (see, e.g., [12]) that are applicable in these situations. However, more rigorous approaches based on, for instance, fixed-point iterations [9, 27], gradient descent on level set functions [2], and gradient descent on binary-valued indicator functions [10] do also exist. For a broader overview over shape and topology optimization, we refer to [11, 13].

The method presented here is not designed to solve topology optimization problems. It differs from [2] in the sense that it does not operate on level set functions. Instead, it operates directly on sets, like in [9, 27]. It differs from these fixed-point iteration schemes in that it works cumulatively by deriving steps that are “added” to the current control set U rather than completely replacing it.

It most closely resembles the method of [10]. The main difference here is that it is stated as a generic trust-region method within a metric space. Much of its theoretical framework mirrors that of existing trust-region methods. It therefore offers great potential for future extensions into constrained optimization and optimization with higher-order derivatives, both of which are difficult with the specific framework given in [10].

Finally, we note that there are optimal control methods that use measure-valued controls, e.g., [22]. We note that our method does not use measure-valued controls, but rather set-valued controls. While hybrid measure and set optimization methods may be an interesting avenue of research for the future, we will not discuss measure-valued optimization here.

Contribution We derive the topological gradient as a derivative with respect to changes in the control set U. We provide a framework for transforming optimal control problems into this form and deriving topological gradients. We demonstrate this for both ODE- and PDE-constrained problems.

Our main theoretical contribution lies in the use of the metric structure of measure spaces. We show that asymptotic stationarity can be guaranteed using the trust-region framework which automatically discovers appropriate step sizes by solving a succession of subproblems. We describe a solution method for these subproblems in Procedure 1. We show that our steps are of adequate quality despite discretization and truncation errors in Lemma 9 and Theorem 3. We follow the reasoning of other trust-region convergence proofs to show that, given a small amount of groundwork, many of the steps are transferrable from conventional nonlinear optimization.

We present our work with the caveat that we do not prove ultimate convergence of the control sequence and that we do not even guarantee that it is a Cauchy sequence. We do, however, guarantee an actual improvement in objective and approximate stationarity in the limit. Under certain assumptions, the latter can be used to derive bounds on the optimality gap.

Outline In Sect. 2 we introduce basic notation, definitions, and prior results. In Sect. 3, we state our trust-region algorithm and show its asymptotic behavior. In Sect. 4 we apply the algorithm to three test problems as a proof of concept. In Sect. 5 we discuss our results and address some possible criticisms of the methodology. We provide some concluding remarks and speculate on avenues of future research.

Preliminaries and notation

We denote the set of natural numbers with zero by \({\mathbb {N}}_0\) and the set of non-negative real numbers by \({\mathbb {R}}_+\). We define the shorthands

$$\begin{aligned} \begin{array}{ll} \,\,\,{[}i] :=\lbrace j \in {\mathbb {N}}\mid j \le i \rbrace &{}\qquad \forall i \in {\mathbb {N}}, \\ {[}i]_0 :=\lbrace j \in {\mathbb {N}}_0 \mid j \le i \rbrace &{}\qquad \forall i \in {\mathbb {N}}_0 \end{array} \end{aligned}$$

for index sets.

For basic definitions and results of measure theory, we refer to [7]. By \(2^\varOmega \), we denote the power set of the universal set \(\varOmega \). In addition to positive and signed measures, we use vector measures, which are \(\sigma \)-additive maps \(\nu :\varSigma \rightarrow Y\), where \(\varSigma \) is a \(\sigma \)-algebra, Y is a Banach space, and \(\nu (\emptyset ) = \mathbf {0}\). Every vector measure \(\nu :\varSigma \rightarrow Y\) has an associated measure \(|\nu | :\varSigma \rightarrow {\mathbb {R}}_+ \cup \lbrace \infty \rbrace \) given by

$$\begin{aligned} |\nu |(A) :=\sup \Bigl \lbrace \sum _{i = 1}^n \bigl \Vert \nu (A_i) \bigr \Vert _Y \Bigm \vert n \in {\mathbb {N}},\ (A_i)_{i \in [n]} \subseteq \varSigma \ \text {partition of}\ A \Bigr \rbrace \quad \forall A \in \varSigma . \end{aligned}$$

This measure is referred to as the total variation of \(\nu \). If \(\Vert \nu \Vert :=|\nu |(\varOmega ) < \infty \), then \(\nu \) is said to be of bounded variation.

The concept of atomlessness is particularly important to our argument. A measure space \((\varOmega , \varSigma , \mu )\) is atomless if it contains no atoms, that is, no sets \(A \in \varSigma \) with \(\mu (A) > 0\) such that for every measurable subset \(B \subseteq A\), we have either \(\mu (B) = \mu (A)\) or \(\mu (B) = 0\). Given an atomless measure space, a measurable set \(A \in \varSigma \), and any number \(\theta \in [0, \mu (A)]\), there exists a measurable set \(B \subseteq A\) with \(\mu (B) = \theta \).

If \(\mu \) is a measure and \(\nu \) is a signed or vector measure over \((\varOmega , \varSigma )\) with \(|\nu |(A) = 0\) for all \(A \in \varSigma \) with \(|\mu |(A) = 0\), then \(\nu \) is called absolutely continuous with respect to \(\mu \) and is written as \(\nu \ll \mu \). For finite signed measures, absolute continuity implies the existence of a density function.

Lemma 1

(Radon–Nikodym) Let \(\varphi \) be a finite signed measure over a finite measure space \((\varOmega , \varSigma , \mu )\) such that \(\varphi \ll \mu \). Then there exists a \(\mu \)-integrable function \(f :\varOmega \rightarrow {\mathbb {R}}\) such that

$$\begin{aligned} \varphi (A) = \int _A f \,\mathrm {d}\mu \qquad \forall A \in \varSigma . \end{aligned}$$


See [7, Thm. 3.2.2]. \(\square \)

The function f in Lemma 1 can be seen as the density function of \(\varphi \) with respect to \(\mu \). We will subsequently refer to it as such. The average of a density function over a given \(\mu \)-measurable set D is given by \(\frac{1}{\mu (D)} \int _D f(x) \,\mathrm {d}\mu = \frac{\varphi (D)}{\mu (D)}\). If \(\mu \) is the Lebesgue measure in \({\mathbb {R}}^n\), then Lebesgue’s differentiation theorem shows that f(x) can be calculated almost everywhere by taking the limit for infinitesimally small balls around x. We refer to [7, Thm. 5.6.2] for proof.

The symmetric difference between two sets \(A,\ B\) is given by

$$\begin{aligned} A \mathbin {\triangle }B :=(A {\setminus } B) \cup (B {\setminus } A) = A {\setminus } (B \cap A) \cup (B {\setminus } A). \end{aligned}$$

Given a finite measure space \((\varOmega , \varSigma , \mu )\), the map \(d :\varSigma \times \varSigma \rightarrow {\mathbb {R}}_+\) with

$$\begin{aligned} d(A, B) :=\mu (A \mathbin {\triangle }B) \end{aligned}$$

is a pseudometric, meaning that it is symmetric and subadditive. By considering the quotient space with respect to the equivalence relation of being equal up to a \(\mu \)-nullset, d can be made into a metric. We note that \(U \mathbin {\triangle }(U \mathbin {\triangle }D) = D\) and therefore \(d(U, U \mathbin {\triangle }D) = \mu (D)\).

Given a measure space \((\varOmega , \varSigma , \mu )\) and a \(\mu \)-measurable function \(g :\varOmega \rightarrow {\mathbb {R}}\), we denote the various types of level sets of g by

$$\begin{aligned} {\mathcal {L}}_{g \sim \eta } :=\bigl \lbrace x \in \varOmega \bigm \vert g(x) \sim \eta \bigr \rbrace \in \varSigma , \end{aligned}$$

where \({{``}\sim }\)\(\in \lbrace \, {{``}<}\)”, \({{``}\le }\)”, \({{``}=}\)”, \({{``}\ge }\)”, \({{``}>}\)” \(\rbrace \) and \(\eta \in {\mathbb {R}}\). These level sets are \(\mu \)-measurable which implies that their symmetric difference \(U \mathbin {\triangle }{\mathcal {L}}_{g \sim \eta }\) with a \(\mu \)-measurable control set \(U \in \varSigma \) is \(\mu \)-measurable. The same is true for unions, intersections, and differences with other \(\mu \)-measurable sets.


In this section, we state the algorithm and prove its correctness. We split this discussion into four subsections. Section 3.1 defines the topological gradient as we use it and shows how to derive it from Fréchet derivatives. Section 3.2 states a gradient-based necessary optimality criterion. Section 3.3 derives the gradient density function for two types of optimal control problems. Section 3.4 states the algorithm and its subroutines and proves that the algorithm achieves stationarity in the limit. We formulate the algorithm in a form that allows for inexactness in some steps to ensure that the procedure remains practically implementable.

Throughout this section, we use the following assumptions.

Assumption 1

Let \(\varOmega \), Y, \(\varSigma \subseteq 2^\varOmega \), \(\mu :\varSigma \rightarrow {\mathbb {R}}_+ \cup \lbrace \infty \rbrace \), \(\nu :\varSigma \rightarrow Y\), \(J :Y \rightarrow {\mathbb {R}}\), and \({\mathcal {J}} :\varSigma \rightarrow {\mathbb {R}}\) satisfy the following assumptions:

  1. 1.

    \((\varOmega , \varSigma )\) is a measurable space, and Y is a Banach space;

  2. 2.

    \(\nu \) is a vector measure of bounded variation;

  3. 3.

    J is Fréchet differentiable;

  4. 4.

    \({\mathcal {J}} = J \circ \nu \);

  5. 5.

    \(\mu \) is a finite measure;

  6. 6.

    there exists \(C > 0\) such that \(\Vert \nu (D) \Vert _Y \le C \cdot \mu (D)\) for all \(D \in \varSigma \); and

  7. 7.

    \((\varOmega , \varSigma , \mu )\) is atomless.

Taylor expansion

In traditional nonlinear optimization, we make use of the fact that sufficiently smooth functions are locally approximated by truncations of their Taylor series. We transfer this property from the Fréchet differentiable objective J to \({\mathcal {J}}\) using Assumption 1.4. This requires the chain rule.

Lemma 2

Let \(\varOmega \), \(\varSigma \), Y, \(\mu \), and \(\nu \) satisfy Assumptions 1.1, 1.2, 1.5 and 1.6; let \(U \in \varSigma \); and let \(T_U :Y \rightarrow {\mathbb {R}}\) be a bounded linear form. Then \(\varphi _U :\varSigma \rightarrow {\mathbb {R}}\) with

$$\begin{aligned} \varphi _U(D) :=T_U \bigl ( \nu (D {\setminus } U) - \nu (D \cap U) \bigr ) \quad \forall D \in \varSigma \end{aligned}$$

is a finite signed measure that is absolutely continuous; i.e., \(\varphi _U \ll \mu \).


Because \(T_U\) is bounded, there exists \(M > 0\) such that \(|T_U y| \le M \Vert y\Vert _Y\) for all \(y \in Y\). Along with Assumption 1.6, this implies that for all \(D \in \varSigma \),

$$\begin{aligned} |\varphi _U(D)|&\le M \cdot \bigl \Vert \nu (D {\setminus } U) - \nu (D \cap U) \bigr \Vert _Y \\&\le M \cdot \bigl ( \Vert \nu (D {\setminus } U) \Vert _Y + \Vert \nu (D \cap U) \Vert _Y \bigr ) \\&\le M C \cdot \mu \bigl ( (D {\setminus } U) \cup (D \cap U) \bigr ), \end{aligned}$$

which shows that \(\varphi _U(D) < \infty \) and \(\varphi _U \ll \mu \), assuming \(\varphi _U\) can be shown to be a signed measure.

Because \(T_U\) is linear and \(\nu \) is a vector measure, \(\varphi _U\) is finitely additive, and we have \(\varphi _U(\emptyset ) = 0\). To show \(\sigma \)-additivity, let \((D_i)_{i \in {\mathbb {N}}} \subseteq \varSigma \) consist of pairwise disjoint measurable sets. For \(N \in {\mathbb {N}}\), the finite additivity of \(\varphi \) implies

$$\begin{aligned} \Bigl | \sum _{i = 1}^\infty \varphi _U(D_i) - \varphi _U\Bigl ( \bigcup _{i = 1}^\infty D_i \Bigr ) \Bigr |&= \Bigl |\mathop {\sum _{i = N + 1}^\infty } \varphi _U(D_i) - \varphi _U\Bigl ( \mathop {\bigcup _{i = N + 1}^\infty } D_i \Bigr ) \Bigr | \\&\le \Bigl | \mathop {\sum _{i = N + 1}^\infty } \varphi _U(D_i) \Bigr | + \Bigl | \varphi _U\Bigl ( \mathop {\bigcup _{i = N + 1}^\infty } D_i \Bigr ) \Bigr |. \end{aligned}$$

For every \(i \in {\mathbb {N}}\), we have \(|\varphi _U(D_i)| \le M C \cdot \mu (D_i)\). This implies

$$\begin{aligned} \Bigl | \mathop {\sum _{i = N + 1}^{\infty }} \varphi _U(D_i) \Bigr |\le & {} \mathop {\sum _{i = N + 1}^\infty } \bigl | \varphi _U(D_i) \bigr | \le M C \cdot \mathop {\sum _{i = N + 1}^\infty } \mu (D_i) \\= & {} M C \cdot \mu \Bigl ( \mathop {\bigcup _{i = N + 1}^\infty } D_i \Bigr ) \xrightarrow {N \rightarrow \infty } 0. \end{aligned}$$

Similarly, we have

$$\begin{aligned} \Bigl | \varphi _U \Bigl ( \mathop {\bigcup _{i = N + 1}^\infty } D_i \Bigr ) \Bigr | \le M C \cdot \mu \Bigl ( \mathop {\bigcup _{i = N + 1}^\infty } D_i \Bigr ) \xrightarrow {N \rightarrow \infty } 0. \end{aligned}$$

In both cases, convergence to zero is guaranteed by the fact that \((D_i)_{i \in {\mathbb {N}}}\) is a sequence of pairwise disjoint subsets of \(\varOmega \), which has finite measure. In total, this means that

$$\begin{aligned} \Bigl | \sum _{i = 1}^\infty \varphi _U(D_i) - \varphi _U\Bigl ( \bigcup _{i = 1}^\infty D_i \Bigr ) \Bigr | = 0, \end{aligned}$$

which proves that \(\varphi \) is \(\sigma \)-additive. Absolute convergence is proven by the fact that the limit is identical for all rearrangements of the sequence. \(\square \)

By using the Fréchet derivative \(J'\bigl (\nu (U)\bigr )\) as \(T_U\) in Lemma 2, we can prove the existence of a finite signed measure \({\mathcal {J}}'(U)\) that acts as a first-order derivative of \({\mathcal {J}}\). With this measure, we can formulate a local first-order Taylor expansion of \({\mathcal {J}}\) around U.

Theorem 1

(Local First-Order Taylor Expansion) Let \(\varOmega \), \(\varSigma \), Y, J, \(\mu \), \(\nu \), and \({\mathcal {J}}\) satisfy Assumptions 1.1 to 1.6. For every \(U \in \varSigma \), let \({\mathcal {J}}'(U) :\varSigma \rightarrow {\mathbb {R}}\) be given by

$$\begin{aligned} {\mathcal {J}}'(U)(D) :=J'(\nu (U)) \bigl ( \nu (D {\setminus } U) - \nu (D \cap U) \bigr ) \quad \forall D \in \varSigma . \end{aligned}$$

Then \({\mathcal {J}}'(U)\) is a finite signed measure with \({{\mathcal {J}}'}(U) \ll \mu \), and

$$\begin{aligned} {\mathcal {J}}(U \mathbin {\triangle }D) = {\mathcal {J}}(U) + {\mathcal {J}}'(U)(D) + o\bigl ( \mu (D) \bigr ) \quad \forall D \in \varSigma . \end{aligned}$$


Let \(U \in \varSigma \). Because J is Fréchet differentiable in \(\nu (U)\), \(J'(\nu (U))\) is a bounded linear operator. Lemma 2 shows that \({\mathcal {J}}'(U)\) as defined in (2) is a finite signed measure and that \({\mathcal {J}}'(U) \ll \mu \). Let \({\mathcal {R}}_U :\varSigma \rightarrow {\mathbb {R}}\) be given by

$$\begin{aligned} {\mathcal {R}}_{U}(D) :={\left\{ \begin{array}{ll} \frac{1}{\mu (D)} \bigl ( {\mathcal {J}}(U \mathbin {\triangle }D) - {\mathcal {J}}(U) - {\mathcal {J}}'(U)(D) \bigr ), &{}\quad \mu (D) \ne 0 \\ 0, &{}\quad \mu (D) = 0 \end{array}\right. } \quad \forall D \in \varSigma . \end{aligned}$$

To establish (3), we need to show that \({\mathcal {R}}_U(D) \rightarrow 0\) for \(\mu (D) \rightarrow 0\). Let \((D_i)_{i \in {\mathbb {N}}}\) be a sequence in \(\varSigma \) such that \(\mu (D_i) \rightarrow 0\), and let

$$\begin{aligned} d_i :=\nu (U \mathbin {\triangle }D_i) - \nu (U) = \nu (D_i {\setminus } U) - \nu (D_i \cap U) \qquad \forall i \in {\mathbb {N}}. \end{aligned}$$

Then we have \(\Vert d_i\Vert _Y \le C \cdot \mu (D_i)\) for all \(i \in {\mathbb {N}}\) with \(C > 0\) from Assumption 1.6. Therefore, \(d_i \rightarrow 0\) for \(i \rightarrow \infty \). If \(d_i = 0\), then \({\mathcal {J}}(U \mathbin {\triangle }D_i) = {\mathcal {J}}(U)\) and \({\mathcal {J}}'(U)(D_i) = 0\), which implies \({\mathcal {R}}_U(D_i) = 0\). Without loss of generality, we consider only sequences where \(\mu (D_i) > 0\) and \(\Vert d_i\Vert _Y > 0\) for all \(i \in {\mathbb {N}}\). By combining (2), (4), and the Fréchet differentiability of J, we can conclude that

$$\begin{aligned} {\mathcal {R}}_U(D_i)&{\mathop {=}\limits ^{(2)}} \frac{1}{\mu (D_i)} \Bigl ( J\bigl (\nu (U \mathbin {\triangle }D_i)\bigr ) - J\bigl (\nu (U)\bigr ) - J'\bigl (\nu (U)\bigr ) d_i \Bigr ) \\&{\mathop {=}\limits ^{(4)}} \underbrace{\frac{\Vert d_i\Vert _Y}{\mu (D_i)}}_{\in (0, C]} \frac{1}{\Vert d_i\Vert _Y} \Bigl ( \underbrace{J\bigl (\nu (U) + d_i\bigr ) - J\bigl (\nu (U)\bigr ) - J'\bigl (\nu (U)\bigr ) d_i}_{= o({\Vert d_i \Vert }_Y)} \Bigr ) \xrightarrow {i \rightarrow \infty } 0, \end{aligned}$$

which proves (3). \(\square \)

Theorem 1 can be extended to include higher-order derivatives. This requires a technical measure extension argument and does not contribute to the method under discussion.

Because \(\mu \) and \({\mathcal {J}}'(U)\) are finite and \({\mathcal {J}}'(U) \ll \mu \), \({\mathcal {J}}'(U)\) has a \(\mu \)-integrable density function that we use to construct steepest descent steps.

Corollary 1

Let \(\varOmega \), \(\varSigma \), \(\mu \), Y, \({\mathcal {J}}\), J, and \(\nu \) satisfy Assumptions 1.1 to 1.6. For every \(U \in \varSigma \), there exists \(g_U \in L^1(\varOmega , \mu )\) such that

$$\begin{aligned} {\mathcal {J}}'(U)(D) = \int _D g_U \,\mathrm {d}\mu \quad \forall D \in \varSigma . \end{aligned}$$


Let \(U \in \varSigma \) be given. According to Theorem 1, \({\mathcal {J}}'(U)\) is a finite signed measure with \({\mathcal {J}}'(U) \ll \mu \). Because \(\mu \) is also finite, Lemma 1 proves the existence of a \(g_U \in L^1(\varOmega )\) with the stated properties. \(\square \)

Fig. 1
figure 1

Illustration of support points that would allow piecewise first-order approximation

There are different ways to calculate \(g_U\). Section 3.3 derives \(g_U\) for two exemplary ODE- and PDE-constrained problems. The approximation (3) is only accurate in a small neighborhood of U. To obtain estimates for the error accumulated over larger distances, we need to cover a connecting line with such neighborhoods and use a compactness argument to extract a finite subcover. We can then choose a finite number of support points such that first-order Taylor approximations are accurate when traveling from one support point to the next, as illustrated in Fig. 1.

Sets do not have an exact equivalent of convex combinations or “connecting lines”. We could construct a similar argument by using geodesics. In the interest of brevity, however, we assume instead that the derivative of \(J :Y \rightarrow {\mathbb {R}}\) satisfies a Lipschitz condition. We can then make the argument in the underlying vector space Y.

Lemma 3

Let \(\varOmega \), \(\varSigma \), Y, J, \(\mu \), \(\nu \), and \({\mathcal {J}}\) satisfy Assumptions 1.1 to 1.6. Furthermore, let \(C > 0\) be as specified in Assumption 1.6, and let the Fréchet derivative \(J'\) of J be such that there exists a constant \(L > 0\) with

$$\begin{aligned} {\bigl \Vert J'(x) - J'(y) \bigr \Vert }_{Y^*} \le L \cdot \Vert x - y \Vert _Y \quad \forall x, y \in {\text {conv}}\bigl (\nu (\varSigma )\bigr ) \subseteq Y. \end{aligned}$$

Then we have

$$\begin{aligned} \bigl | {\mathcal {J}}(U \mathbin {\triangle }D) - {\mathcal {J}}(U) - {\mathcal {J}}'(U)(D) \bigr | \le \frac{L C^2}{2} {\mu (D)}^2 \qquad \forall U, D \in \varSigma . \end{aligned}$$


Let \(U, D \in \varSigma \) be given and let \(x :=\nu (U)\) and \(d :=\nu (D {\setminus } U) - \nu (D \cap U)\). We have

$$\begin{aligned} x + d = \nu (U) + \nu (D {\setminus } U) - \nu (D \cap U) = \nu \bigl ( (U \cup D) {\setminus } (U \cap D) \bigr ) = \nu ( U \mathbin {\triangle }D). \end{aligned}$$

According to (2) in Theorem 1, \({\mathcal {J}}'(U)(D)\) is given by

$$\begin{aligned} {\mathcal {J}}'(U)(D) = J'(x)\bigl ( \nu (D {\setminus } U) - \nu (D \cap U) \bigr ) = J'(x) d. \end{aligned}$$

We further find that \({\Vert d \Vert }_Y \le {\bigl \Vert \nu (D {\setminus } U) \bigr \Vert }_Y + {\bigl \Vert \nu (D \cap U) \bigr \Vert }_Y \le C \cdot \mu (D)\) according to Assumption 1.6. Therefore, we have

$$\begin{aligned} \bigl | {\mathcal {J}}(U \mathbin {\triangle }D) - {\mathcal {J}}(U) - {\mathcal {J}}'(U)(D) \bigr |&= \Bigl | \int _0^1 \bigl ( J'(x + \lambda d) - J'(x) \bigr ) d \,\mathrm {d}\lambda \Bigr | \\&\le \int _0^1 \bigl \Vert J'(x + \lambda d) - J'(x) \bigr \Vert _{Y^*} \Vert d\Vert _Y \,\mathrm {d}\lambda \\&\le \Vert d \Vert _Y \cdot \int _0^1 L \cdot {\Vert x + \lambda d - x \Vert }_Y \,\mathrm {d}\lambda \\&= \frac{L}{2} \underbrace{{\Vert d \Vert }_Y^2}_{\le C^2 {\mu (D)}^2}, \end{aligned}$$

which proves (6). \(\square \)

We note that a geodesic-based argument would not require a Lipschitz condition over \({\text {conv}}\bigl ( \nu (\varSigma ) \bigr )\), but only over \(\nu (\varSigma )\), because support points can be chosen exclusively from \(\nu (\varSigma )\).

Optimality criterion

Our method works towards achieving a necessary optimality criterion based on stationarity. We have shown in Corollary 1 that the derivative measure has an integrable density function \(g_U\). The integral of \(g_U\) over a step set D approximates the change in objective for small steps. Therefore, if

$$\begin{aligned} g_U(x) \ge 0 \qquad \text {a.e. in}\ \varOmega , \end{aligned}$$

then all sufficiently small steps are predicted to either maintain or increase the objective. The following elementary lemma shows that negative density values on non-nullsets always translate into descent steps.

Lemma 4

Let \((\varOmega , \varSigma , \mu )\) be a finite atomless measure space, \(g \in L^1(\varOmega , \mu )\), and \(\lambda \in [0,1]\). Then there exists \(D \in \varSigma \) such that \(\mu (D) = \lambda \cdot \mu (\varOmega )\) and

$$\begin{aligned} \int _D g \,\mathrm {d}\mu \le \lambda \cdot \int _\varOmega g \,\mathrm {d}\mu . \end{aligned}$$


For \(\lambda = 0\), we can choose \(D :=\emptyset \). Similarly, for \(\lambda = 1\), we can choose \(D :=\varOmega \). Therefore, we consider only \(\lambda \in (0,1)\) here. Because g is \(\mu \)-integrable, we have \(\mu ({\mathcal {L}}_{g \le \eta }) \xrightarrow {\eta \rightarrow -\infty } 0\) and \(\mu ({\mathcal {L}}_{g \le \eta }) \xrightarrow {\eta \rightarrow \infty } \mu (\varOmega )\). Therefore,

$$\begin{aligned} \eta ^*:=\inf \bigl \lbrace \eta \in {\mathbb {R}}\bigm \vert \mu ({\mathcal {L}}_{g \le \eta }) > \lambda \mu (\varOmega ) \bigr \rbrace \end{aligned}$$

is a finite real number.

Let \(({\check{\eta }}_i)_{i \in {\mathbb {N}}} \subset {\mathbb {R}}\) be an ascending sequence with \({\check{\eta }}_i < \eta ^*\ \forall i \in {\mathbb {N}}\) and \({\check{\eta }}_i \rightarrow \eta ^*\) for \(i \rightarrow \infty \). The corresponding sequence of sublevel sets \({\mathcal {L}}_{g \le {\check{\eta }}_i}\) is increasing with

$$\begin{aligned} D_1 :={\mathcal {L}}_{g < \eta ^*} = \bigcup _{i = 1}^\infty {\mathcal {L}}_{g \le {\check{\eta }}_i}. \end{aligned}$$

Since \(\mu ({\mathcal {L}}_{g \le {\check{\eta }}_i}) \le \lambda \cdot \mu (\varOmega )\) for all \(i \in {\mathbb {N}}\), we have \(\mu (D_1) \le \lambda \cdot \mu (\varOmega )\).

Conversely, let \(({\hat{\eta }}_i)_{i \in {\mathbb {N}}} \subset {\mathbb {R}}\) be a descending sequence with \({\hat{\eta }}_i > \eta ^*\ \forall i \in {\mathbb {N}}\) and \({\hat{\eta }}_i \rightarrow \eta ^*\) for \(i \rightarrow \infty \). The corresponding sequence of sublevel sets \({\mathcal {L}}_{g \le {\hat{\eta }}_i}\) is decreasing with

$$\begin{aligned} {\bar{D}}_2 :={\mathcal {L}}_{g \le \eta ^*} = \bigcap _{i = 1}^\infty {\mathcal {L}}_{g \le {\hat{\eta }}_i}. \end{aligned}$$

Here, we find that \(\mu ({\bar{D}}_2) \ge \lambda \cdot \mu (\varOmega )\).

Because \((\varOmega , \varSigma , \mu )\) is atomless, we can choose a measurable set \(D_2 \subset {\bar{D}}_2 {\setminus }D_1\) such that \(\mu (D_2) = \lambda \cdot \mu (\varOmega ) - \mu (D_1)\). We define \(D :=D_1 \cup D_2\) and obtain \(\mu (D) = \lambda \cdot \mu (\varOmega )\) and \(D \subseteq {\mathcal {L}}_{g \le \eta ^*}\). We therefore have

$$\begin{aligned} \int _D g \,\mathrm {d}\mu \le \eta ^*\cdot \lambda \cdot \mu (\varOmega ). \end{aligned}$$

If we were to assume that \(\int _D g \,\mathrm {d}\mu > \lambda \cdot \int _\varOmega g \,\mathrm {d}\mu \), it would imply that

$$\begin{aligned} \eta ^*> \frac{1}{\mu (\varOmega )} \int _\varOmega g \,\mathrm {d}\mu . \end{aligned}$$

We could then conclude that

$$\begin{aligned} \int _\varOmega g \,\mathrm {d}\mu&= \int _D g \,\mathrm {d}\mu + \int _{\varOmega {\setminus } D} \underbrace{g}_{{\ge \eta ^*}} \,\mathrm {d}\mu \\&> \underbrace{\lambda }_{= \frac{\mu (D)}{\mu (\varOmega )}} \cdot \int _\varOmega g \,\mathrm {d}\mu + \eta ^*\cdot \bigl ( \mu (\varOmega ) - \mu (D) \bigr ) \\&> \frac{\mu (D)}{\mu (\varOmega )} \cdot \int _\varOmega g \,\mathrm {d}\mu + \frac{\mu (\varOmega ) - \mu (D)}{\mu (\varOmega )} \cdot \int _\varOmega g \,\mathrm {d}\mu \\&= \int _\varOmega g \,\mathrm {d}\mu , \end{aligned}$$

which proves by contradiction that

$$\begin{aligned} \int _D g \,\mathrm {d}\mu \le \lambda \cdot \int _\varOmega g \,\mathrm {d}\mu \end{aligned}$$

and therefore that g realizes or exceeds its overall average value on D. \(\square \)

It is useful to think of \(\lambda = \frac{\mu (D)}{\mu (\varOmega )}\) and \(g = g_U\) for some \(U \in \varSigma \). We then find a step \(D \in \varSigma \) such that

$$\begin{aligned} \frac{1}{\mu (D)} \int _D g_U \,\mathrm {d}\mu \le \frac{1}{\mu (\varOmega )} \int _\varOmega g_U \,\mathrm {d}\mu . \end{aligned}$$

For any measure less than or equal to \(\mu (\varOmega )\), we can therefore choose a \(\mu \)-measurable subset \(D \subseteq \varOmega \) with that exact measure such that the predicted decrease in objective for the step D is no worse than the average over all of \(\varOmega \). This does not imply that the predicted change is negative. We can ensure a decrease by applying Lemma 4 to the finite atomless measure space \(\bigl (D_0,\, \varSigma \cap D_0,\, \mu \vert _{\varSigma \cap D_0}\bigr )\) for \(D_0 :={\mathcal {L}}_{g_U < 0} \in \varSigma \). There then exists a step D of given size that captures at least a corresponding fraction of the maximal achievable predicted decrease. We can use this to prove that (7) is a necessary criterion for local optimality.

Lemma 5

Let \(\varOmega \), \(\varSigma \), Y, J, \(\mu \), \(\nu \), and \({\mathcal {J}}\) satisfy Assumptions 1.1 to 1.7. Let \(U \in \varSigma \) be a locally optimal solution such that there exists \(R > 0\) with

$$\begin{aligned} {\mathcal {J}}(U \mathbin {\triangle }D) \ge {\mathcal {J}}(U) \quad \forall D \in \varSigma :\mu (D) \le R, \end{aligned}$$

and let \(g_U \in L^1(\varOmega , \mu )\) denote the \(\mu \)-integrable density function of \({\mathcal {J}}'(U)\). Then

$$\begin{aligned} g_U(x) \ge 0 \quad \mu \text {-a.e. in }\varOmega . \end{aligned}$$


We assume the contrary, i.e., that there exists \(D_0 \in \varSigma \) with \(\mu (D_0) > 0\) and \(g_U(x) < 0\) for all \(x \in D_0\). We have

$$\begin{aligned} \delta :=\frac{1}{\mu (D_0)} \int _{D_0} g_U \,\mathrm {d}\mu < 0. \end{aligned}$$

From Theorem 1, there exists \({\bar{R}} > 0\) such that

$$\begin{aligned} \bigl | {\mathcal {J}}(U \mathbin {\triangle }D) - {\mathcal {J}}(D) - {\mathcal {J}}'(U)(D) \bigr | \le -\frac{\delta }{2} \mu (D) \quad \forall D \in \varSigma :\mu (D) \le {\bar{R}}. \end{aligned}$$

Let \(R' :=\min \lbrace R,\, {\bar{R}},\, \mu (D_0) \rbrace > 0\). This implies that \(\lambda :=\frac{R'}{\mu (D_0)} \in (0, 1]\). According to Lemma 4, there exists \(D \in \varSigma \) with \(D \subseteq D_0\), \(\mu (D) = R' \le R\), and

$$\begin{aligned} \int _D g_U \,\mathrm {d}\mu \le \frac{\mu (D)}{\mu (D_0)} \cdot \int _{D_0} g_U \,\mathrm {d}\mu = \delta \cdot \mu (D), \end{aligned}$$

which implies that

$$\begin{aligned} {\mathcal {J}}(U \mathbin {\triangle }D) \le {\mathcal {J}}(U) + \int _{D} g_U \,\mathrm {d}\mu - \frac{\delta }{2} \mu (D) \le {\mathcal {J}}(U) + \underbrace{\frac{\delta }{2} \mu (D)}_{< 0}. \end{aligned}$$

This would contradict \({\mathcal {J}}(U \mathbin {\triangle }D) \ge {\mathcal {J}}(U)\) for all \(D \in \varSigma \) with \(\mu (D) \le R\). Therefore, no such \(D_0\) can exist. \(\square \)

Because \(g_U\) is the density function of the gradient measure, (7) is essentially a stationarity condition. We subsequently refer to points that satisfy (7) as stationary points. Similarly, we refer to \(U \in \varSigma \) as \(\varepsilon \)-stationary for given \(\varepsilon > 0\) if and only if

$$\begin{aligned} \int _\varOmega \Bigl | \min \bigl \lbrace 0, g_U(x) \bigr \rbrace \Bigr | \,\mathrm {d}\mu \le \varepsilon . \end{aligned}$$

We refer to the integral on the left hand side as the instationarity of U. One cannot always find solutions that satisfy the necessary optimality criterion (7). For instance, if the vector measure \(\nu \) maps a set \(U \in \varSigma \) to its characteristic function \(\nu (U) = \chi _U \in L^1(\varOmega )\), then the image \(\nu (\varSigma )\) is not closed, and accumulation points of sequences may not themselves be characteristic functions of measurable sets.

A weak guarantee can be given for the degree of suboptimality of an \(\varepsilon \)-stationary point. If we make an assumption of limited curvature in the sense of Lemma 3, then the following estimate holds for every \(D \in \varSigma \).

$$\begin{aligned} {\mathcal {J}}(U \mathbin {\triangle }D)&\ge {\mathcal {J}}(U) + {\mathcal {J}}'(U)(D) - \frac{L C^2}{2} {\mu (D)}^2 \\&= {\mathcal {J}}(U) + \int _D g_U \mathrm {d}\mu - \frac{L C^2}{2} {\mu (D)}^2 \\&\ge {\mathcal {J}}(U) - \varepsilon - \frac{L C^2}{2} {\mu (D)}^2 \end{aligned}$$

Here, \(L > 0\) is the Lipschitz constant associated with changes in the Fréchet derivative of J, and \(C > 0\) is the constant from Assumption 1.6. This implies that within a given radius of \(R > 0\), U is suboptimal by at most \(\varepsilon + \frac{L C^2}{2} R^2\).

It is further possible to infer that the sequence of control functions \(\nu (U_i)\) forms a Cauchy sequence, if \({\mathcal {J}}(U_i)\) approach the infimum of J over \({\text {conv}}(\nu (\varSigma ))\) and the underlying functional J is strictly convex.

Approximating gradient densities

In previous sections, we have shown that, under Assumption 1, there exists a gradient density function \(g_U :\varOmega \rightarrow {\mathbb {R}}\) for all control sets \(U \in \varSigma \) such that

$$\begin{aligned} {\mathcal {J}}(U \mathbin {\triangle }D) - {\mathcal {J}}(U) \approx \int _{D} g_U \,\mathrm {d}\mu . \end{aligned}$$

It may not be immediately clear how we would calculate a useful representation of \(g_U\). We will now briefly discuss two cases in which \(g_U\) can be determined relatively easily.

ODE-constrained optimal control

We first consider a case where Problem (1) is derived from an ODE-constrained optimal control problem of the form

$$\begin{aligned} \begin{aligned}&\min _{w, y}\, \int _0^{t_f} l(y(t), w(t)) \,\mathrm {d}t \\&\text {s.t.}\, {\dot{y}}(t) = f(y(t), w(t)) \quad \text {for a.a.}\ t \in (0, t_f) \\&\,\,\quad w(t) \in \lbrace 0, 1 \rbrace \qquad \qquad \,\,\quad \text {for a.a.}\ t \in (0, t_f) \\&\,\,\,\quad y(0) = y_0 \\&\quad \quad \quad w \in L^1([0, t_f])\\ \end{aligned} \end{aligned}$$

with constant \(t_f > 0\) and \(y_0 \in {\mathbb {R}}^n\). This includes the Lotka–Volterra problem from Sect. 1. We only discuss autonomous ODEs here, which serves primarily to unclutter notation. To guarantee that unique solutions and derivatives exist, we make some assumptions on l and f.

Assumption 2

Let \(t_f > 0\), \(D \subseteq {\mathbb {R}}^n\), \(l :D \times {\mathbb {R}}\rightarrow {\mathbb {R}}\), and \(f :D \times {\mathbb {R}}\rightarrow {\mathbb {R}}^n\) satisfy the following assumptions:

  1. 1.

    D is a convex open set with \(y_0 \in D\),

  2. 2.

    f and l are twice continuously differentiable w.r.t. (yw) on \(D \times {\mathbb {R}}\),

  3. 3.

    f and l are affine linear in w, i.e., \(f(y, w) = f(y, 0) + w \cdot \left( f(y, 1) - f(y, 0)\right) \) and \(l(y, w) = l(y, 0) + w \cdot \left( l(y, 1) - l(y, 0)\right) \),

  4. 4.

    there exists a constant L such that \(\Vert f(x, v) - f(y, w) \Vert \le L \cdot \Vert x - y \Vert + L \cdot |v - w|\) for all \(x, y \in D\) and \(v, w \in {\mathbb {R}}\),

  5. 5.

    there exists \(\varepsilon > 0\) such that for all \(\alpha \in L^1([0, t_f])\) with \(\alpha (t) \in [0, 1]\) almost everywhere and every \(\tau \in (0, t_f]\), every absolutely continuous function \(y :[0, \tau ] \rightarrow D\) with \(y(0) = y_0\) and \({\dot{y}}(t) = f(y(t), \alpha (t))\) almost everywhere satisfies \({\text {dist}}(y(t), \partial D) \ge \varepsilon \) for all \(t \in [0, \tau ]\).

It may be possible to further relax these assumptions. However, this formulation is sufficiently general to apply to the Lotka–Volterra test problem which we discuss in greater depth in Sect. 4.2. While demanding affine linearity in w may seem restrictive, it is always achievable for binary-valued controls by partial outer convexification [30, 31].

We begin with a stability result that allows us to extend the subsequent existence result to control functions whose values lie outside of [0, 1].

Lemma 6

Let \(t_f > 0\), D, f satisfy Assumption 2, let \(L > 0\) denote the constant from Assumption 2.4, and let \(\tau \in (0, t_f]\). Let \(v, w \in L^1([0, t_f])\), and let \(y_v :[0, \tau ) \rightarrow D\) and \(y_w :[0, \tau ) \rightarrow D\) be absolutely continuous such that

$$\begin{aligned} \begin{array}{ll} \,y_v(0) = y_0,&{} \\ \,\,{\dot{y}}_v(t) = f(y_v(t), v(t)) &{}\qquad \text {for a.a.}\ t \in [0, \tau ), \\ y_w(0) = y_0, &{}\\ \,{\dot{y}}_w(t) = f(y_w(t), w(t)) &{}\qquad \text {for a.a.}\ t \in [0, \tau ). \end{array} \end{aligned}$$

Then we have

$$\begin{aligned} \Vert y_v(t) - y_w(t) \Vert \le L \cdot e^{L t} \cdot {\Vert v - w\Vert }_{L^1} \qquad \forall t \in [0, \tau ). \end{aligned}$$


By using the Lipschitz condition in Assumption 2.4, we find that

$$\begin{aligned} \Vert y_v(t) - y_w(t) \Vert&\le \int _0^t \Vert f(y_v, v) - f(y_w, w) \Vert \,\mathrm {d}s \\&\le L \cdot \int _0^t | v - w | \,\mathrm {d}s + \int _0^t L \cdot \Vert y_v - y_w \Vert \,\mathrm {d}s. \end{aligned}$$

for all \(t \in [0, \tau )\). According to Gronwall’s inequality, it follows that

$$\begin{aligned} \Vert y_v(t) - y_w(t) \Vert&\le L \cdot \int _0^t |v - w| \,\mathrm {d}s + \int _0^t L^2 e^{L (t - s)} \cdot \int _0^s |v - w| \,\mathrm {d}\tau \,\mathrm {d}s \\&\le L \cdot \Vert v - w \Vert _{L^1} + \int _0^t L^2 e^{L (t - s)} \cdot \Vert v - w \Vert _{L^1} \,\mathrm {d}s \\&= L \cdot \Bigl ( 1 + \underbrace{\int _0^t L e^{L (t - s)} \,\mathrm {d}s}_{= e^{L t} - 1} \Bigr ) \cdot \Vert v - w\Vert _{L^1} \\&= L \cdot e^{L t} \cdot \Vert v - w \Vert _{L^1} \end{aligned}$$

for all \(t \in [0, \tau )\). \(\square \)

Lemma 6 shows that the solution to the initial value problem exists not only for control functions with values in [0, 1], but also in a small \(L^1\) environment around them.

Lemma 7

Let \(t_f > 0\), \(D \subseteq {\mathbb {R}}^n\), and f satisfy Assumption 2, and let \(\varepsilon > 0\) denote the constant from Assumption 2.5. Then there exists \(\delta > 0\) such that for every \(w \in L^1([0, t_f])\) with \(w(t) \in [0, 1]\) and every \(v \in B_{\delta }(w)\), there exists a unique absolutely continuous function \(y_v :[0, t_f] \rightarrow D\) such that \(y_v(0) = y_0\), \({\text {dist}}(y_v(t), \partial D) \ge \frac{\varepsilon }{2}\) for all \(t \in [0, t_f]\), and

$$\begin{aligned} {\dot{y}}_v(t) = f\left( y_v(t), v(t)\right) \qquad \text {for a.a.}\ t \in [0, t_f]. \end{aligned}$$


We use the generalized existence theory based on the Carathéodory condition as described in [17, Sec. I.5]. Assumptions 2.1 to 2.4 imply the Carathéodory condition for all \(v \in L^1([0, t_f])\). This implies the existence of unique absolutely continuous local solutions \(y_v :[0, \tau ) \rightarrow D\) with \(y_v(0) = y_0\). We can extend these local solutions to their maximal existence interval.

We first consider the case \(v = w\), where \(w(t) \in [0, 1]\) almost everywhere is guaranteed. If the maximal existence interval were to end at \(\tau \le t_f\), then the continuous extension of \(y_w\) would satisfy \(y_w(t) \rightarrow \partial D\) for \(t \rightarrow \tau \), which would contradict Assumption 2.5. We can therefore extend \(y_w\) to \([0, t_f]\) and guarantee that \(y_w(t) \in D\) for all \(t \in [0, t_f]\).

Let \(\varepsilon > 0\) denote the constant given in Assumption 2.5. We can then use Lemma 6 to extend this argument to all \(v \in L^1([0, t_f])\) with

$$\begin{aligned} {\Vert v - w\Vert }_{L^1} \le \underbrace{\frac{\varepsilon }{2 L \cdot e^{L t_f}}}_{=:\delta }. \end{aligned}$$

as this yields \({\text {dist}}(y_v, \partial D) \ge \frac{\varepsilon }{2}\). \(\square \)

Lemma 7 ensures that there exists a small neighborhood around each admissible control function in which the ODE solution is well-defined. This is important for the definition of derivatives, which are defined using changes over infinitesimally small neighborhoods.

As our measure space \((\varOmega , \varSigma )\), we choose \([0, t_f]\) with the Lebesgue \(\sigma \)-algebra. The Banach space Y is the space \(L^1(\varOmega )\) and \(\nu \) maps \(U \mapsto \chi _U\) where \(\chi _U\) denotes the characteristic function of U. The objective \(J :Y \rightarrow {\mathbb {R}}\) is given by

$$\begin{aligned} J(w) :=\int _0^{t_f} l(y_w(t), w(t)) \,\mathrm {d}t. \end{aligned}$$

The function \({\mathcal {J}}\) is then given by

$$\begin{aligned} {\mathcal {J}}(U) = J(\nu (U)) = \int _0^{t_f} l(y_{\chi _U}(t), \chi _U(t)) \,\mathrm {d}t \qquad \forall U \in \varSigma . \end{aligned}$$

For the measure \(\mu \), we can choose either the Lebesgue measure or an equivalent measure. In either case, there is a density function \(m \in L^1([0, t_f]) \cap L^\infty ([0, t_f])\) with \(m > 0\) almost everywhere such that

$$\begin{aligned} \mu (A) = \int _A m(t) \,\mathrm {d}t \qquad \forall A \in \varSigma . \end{aligned}$$

To show that these choices satisfy Assumption 1, we have to prove that \(J :L^1([0, t_f]) \rightarrow {\mathbb {R}}\) is Fréchet differentiable around every \(w \in L^1([0, t_f])\) with \(w(t) \in [0, 1]\) almost everywhere. We first show that the ODE solutions corresponding to control functions in the \(\delta \)-environment established in Lemma 7 are uniformly bounded. This allows us to consider f only on a compact convex subset \(U \subseteq {\mathbb {R}}^n\) where the norms of the second derivatives of f and l can be bounded. The conjunction of Lemma 6 with the norm bounds on the second derivatives and Assumption 2.3 allows us to control residual terms that appear when deriving the Fréchet derivative of J from the Lagrangian function of the original problem.

Lemma 8

Let \(t_f > 0\), D, f satisfy Assumption 2, and let \(L > 0\) denote the constant from Assumption 2.4. Let \(w \in L^1([0, t_f])\) be such that there exists a unique absolutely continuous function \(y_w :[0, t_f] \rightarrow D\) with \(y_w(0) = y_0\) and \({\dot{y}}_w(t) = f(y_w(t), w(t))\) almost everywhere. Then we have

$$\begin{aligned} \Vert y_w(t) - y_0 \Vert \le \bigl ( \Vert f(y_0, 0)\Vert \cdot t_f + L \cdot \Vert w\Vert _{L^1} \bigr ) \cdot e^{L t_f} \qquad \forall t \in [0, t_f]. \end{aligned}$$

Let \(\delta > 0\) be the constant derived in Lemma 7. There exists a convex compact set \(U \subset D\) such that \(y_v(t) \in U\) almost everywhere for all \(v \in L^1([0, t_f])\) such that there exists \(w \in L^1([0, t_f])\) with \(w(t) \in [0, 1]\) for almost all \(t \in [0, t_f]\) and \(\Vert v - w\Vert _{L_1} \le \delta \).


Assumption 2.4 guarantees for all \(t \in [0, t_f]\) that

$$\begin{aligned} \Vert y_w(t) - y_0\Vert \le \Bigl \Vert \int _0^t f(y_w, w) \,\mathrm {d}s \Bigr \Vert \le \int _0^t \underbrace{\Vert f(y_0, w)\Vert }_{{\le \Vert f(y_0, 0)\Vert + L \cdot |w|}} \,\mathrm {d}s + \int _0^t L \Vert y_w - y_0\Vert \,\mathrm {d}s. \end{aligned}$$


$$\begin{aligned} \alpha (t) :=t \cdot \Vert f(y_0, 0) \Vert + L \cdot \int _0^t |w(s)| \,\mathrm {d}s. \end{aligned}$$

We note that \(\alpha \) is increasing in t. By applying Gronwall’s inequality, we obtain the estimate

$$\begin{aligned} \Vert y_w(t) - y_0\Vert&\le \alpha (t) + \int _0^t \alpha (s) \cdot L \cdot e^{L \cdot (t - s)} \,\mathrm {d}s \\&\le \alpha (t) \cdot \Bigl ( 1 + \int _0^t L \cdot e^{L \cdot (t - s)} \,\mathrm {d}s \Bigr ) \\&= \alpha (t) \cdot e^{L t} \\&\le \alpha (t_f) \cdot e^{L t_f} \\&= \bigl ( \Vert f(y_0, 0)\Vert \cdot t_f + L \cdot \Vert w \Vert _{L^1} \bigr ) \cdot e^{L t_f} \end{aligned}$$

for all \(t \in [0, t_f]\). Let \(\varepsilon > 0\) denote the constant from Assumption 2.5. For all \(w \in L^1([0, t_f])\) with \(w(t) \in [0, 1]\) almost everywhere, we have \(\Vert w\Vert _{L^1} \le t_f\). For all \(v \in B_{\delta }(w)\), we have \(\Vert v\Vert _{L^1} \le t_f + \delta \). For such v, we have a unique ODE solution \(y_v\) that satisfies

$$\begin{aligned} \Vert y_v(t) - y_0\Vert \le \bigl ( \Vert f(y_0, 0)\Vert \cdot t_f + L \cdot (t_f + \delta ) \bigr ) \cdot e^{L t_f} =:R \qquad \forall t \in [0, t_f]. \end{aligned}$$

We then have

$$\begin{aligned} y_v(t) \in U :=\left\{ y \in D \,\big |\, {\text {dist}}(y, \partial D) \ge \frac{\varepsilon }{2} \right\} \cap \overline{B_R(y_0)} \qquad \forall t \in [0, t_f]. \end{aligned}$$

We note that U is the intersection of two convex closed sets and is therefore convex and closed. Since U is also bounded, it is compact. \(\square \)

Since f and l are twice continuously differentiable with respect to (yw), this implies that their first and second derivatives are also bounded on U.

We briefly note that the uniform bound of all first and second derivatives of f and l also implies that J satisfies the curvature condition required by Lemma 3. This implies the suboptimality bound described at the end of Sect. 3.2.

Let \(C :=L \cdot (1 + e^{L t_f})\). Let \(\lambda _v :[0, t_f] \rightarrow {\mathbb {R}}^n\) be the unique, absolutely continuous solution of the costate equations

$$\begin{aligned} {\dot{\lambda }}_v= & {} -l_y(y_v, v) - \lambda _v^T f_y(y_v, v) \qquad \text {for a.a.}\ t \in [0, t_f], \\ \lambda _v(t_f)= & {} 0. \end{aligned}$$

With the costate function \(\lambda \), we can define the Lagrangian function

$$\begin{aligned} \varLambda (y, w, \lambda )&= \int _0^{t_f} l(y, w) + \lambda ^T (f(y, w) - {\dot{y}}) \,\mathrm {d}t \\&= \lambda (0)^T y(0) - \lambda (t_f)^T y(t_f) + \int _0^{t_f} l(y, w) + \lambda ^T f(y, w) + {\dot{\lambda }}^T y \,\mathrm {d}t. \end{aligned}$$

We use integration by parts for the second reformulation. We note that we have \(\varLambda (y_w, w, \lambda ) = J(w)\) for all \(\lambda \) and all w. Therefore, the change in objective can be written as \(J(w) - J(v) = \varLambda (y_w, w, \lambda _w) - \varLambda (y_v, v, \lambda _v)\).

Theorem 2

Let \(t_f > 0\), \(D \in {\mathbb {R}}^n\), l, f satisfy Assumption 2, and let \(L > 0\) denote the constant in Assumption 2.4. Let \(v \in L^1([0, t_f])\) with \(v(t) \in [0, 1]\) almost everywhere. Then the objective function \(J :L^1([0, t_f]) \rightarrow {\mathbb {R}}\) with

$$\begin{aligned} J(w) :=\int _0^{t_f} l(y_w(t), w(t)) \,\mathrm {d}t \end{aligned}$$

is Fréchet differentiable in v and its derivative is given by

$$\begin{aligned} DJ(v) d :=\int _0^{t_f} \left( l_w(y_v, v) + \lambda _v^T f_w(y_v, v) \right) \cdot d \,\mathrm {d}t \qquad \forall d \in L^1([0, t_f]) \end{aligned}$$

where \(\lambda _v\) denotes the costate function associated with the control function v and its initial value problem solution \(y_v\).


We make use of the derivative expression of the costate equation. We then perform a truncated Taylor expansion with integral residual expressions to the objective, which yields

$$\begin{aligned}&J(w) - J(v) \\&\quad = \varLambda (y_w, w, \lambda _v) - \varLambda (y_v, v, \lambda _v) \\&\quad = \int _0^{t_f} l(y_w, w) - l(y_v, v) + \lambda _v^T (f(y_w, w) - f(y_v, v))\\&\qquad + {\dot{\lambda }}_v^T (y_w - y_v) \,\mathrm {d}t \\&\quad = \int _0^{t_f} \left( l_w(y_v, v) + \lambda _v^T f_w(y_v, v) \right) (w - v) \\&\qquad + \underbrace{\left( l_y(y_v, v) + \lambda _v^T f_y(y_v, v) + {\dot{\lambda }}_v^T \right) }_{= 0} (y_w - y_v) \,\mathrm {d}t \\&\qquad + \int _0^{t_f} \int _0^1 (1 - s) \varDelta y^T \nabla ^2_{xx} l(y_v + s \varDelta y, v + s \varDelta w) \varDelta y \,\mathrm {d}s \,\mathrm {d}t \\&\qquad + 2 \int _0^{t_f} \int _0^1 (1 - s) \varDelta y^T \nabla ^2_{xw} l(y_v + s \varDelta y, v + s \varDelta w) \varDelta w \,\mathrm {d}s \,\mathrm {d}t \\&\qquad + \int _0^{t_f} \int _0^1 (1 - s) \varDelta w^T \underbrace{\nabla ^2_{ww} l(y_v + s \varDelta y, v + s \varDelta w)}_{= 0} \varDelta w \,\mathrm {d}s \,\mathrm {d}t \\&\qquad + \sum _{i = 1}^n \int _0^{t_f} \int _0^1 (1 - s) \varDelta y^T \nabla ^2_{xx} f_i(y_v + s \varDelta y, v + s \varDelta w) \varDelta y \,\mathrm {d}s \,\mathrm {d}t \\&\qquad + 2 \sum _{i = 1}^n \int _0^{t_f} \int _0^1 (1 - s) \lambda _i \varDelta y^T \nabla ^2_{xw} f_i(y_v + s \varDelta y, v + s \varDelta w) \varDelta w \,\mathrm {d}s \,\mathrm {d}t \\&\qquad + \sum _{i = 1}^n \int _0^{t_f} \int _0^1 (1 - s) \lambda _i \varDelta w^T \underbrace{\nabla ^2_{ww} f_i(y_v + s \varDelta y, v + s \varDelta w)}_{= 0} \varDelta w \,\mathrm {d}s \,\mathrm {d}t. \end{aligned}$$

For the sake of brevity, we write \(\varDelta w :=(w - v)\) and \(\varDelta y :=(y_w - y_v)\). We note that the first integral in the last step is equal to \(DJ(v) \varDelta w\). According to Lemma 8, there exists a compact convex set \(U \subseteq D\) such that \(y_w(t) \in U\) and \(y_v(t) \in U\) for all vw that are either [0, 1]-valued or in a \(\delta \)-neighborhood of such a control function. Given that all convex combinations between \(y_v\) and \(y_w\) lie in the compact set U, there exists a constant \(L' > 0\) such that

$$\begin{aligned} \Vert \nabla ^2_{xx} l \Vert&\le L', \\ \Vert \nabla ^2_{xw} l \Vert&\le L', \\ \Vert \nabla ^2_{xx} f \Vert&\le L', \\ \Vert \nabla ^2_{xw} f \Vert&\le L'. \end{aligned}$$

Because \(\Vert \varDelta y \Vert \le C \cdot \Vert \varDelta w \Vert _{L^1}\) for all \(t \in [0, t_f]\), we have

$$\begin{aligned}&\left| J(w) - J(v) - DJ(v)(w - v) \right| \qquad \\&\quad \le \Bigl | \int _0^{t_f} \int _0^1 (1 - s) \varDelta y^T \nabla ^2_{xx} l(y_v + s \varDelta y, v + s \varDelta w) \varDelta y \,\mathrm {d}s \,\mathrm {d}t \Bigr | \\&\qquad + 2 \Bigl | \int _0^{t_f} \int _0^1 (1 - s) \varDelta y^T \nabla ^2_{xw} l(y_v + s \varDelta y, v + s \varDelta w) \varDelta w \,\mathrm {d}s \,\mathrm {d}t \Bigr | \\&\qquad + \sum _{i = 1}^n \Bigl | \int _0^{t_f} \int _0^1 (1 - s) \varDelta y^T \nabla ^2_{xx} f_i(y_v + s \varDelta y, v + s \varDelta w) \varDelta y \,\mathrm {d}s \,\mathrm {d}t \Bigr | \\&\qquad + 2 \sum _{i = 1}^n \Bigl | \int _0^{t_f} \int _0^1 (1 - s) \lambda _i \varDelta y^T \nabla ^2_{xw} f_i(y_v + s \varDelta y, v + s \varDelta w) \varDelta w \,\mathrm {d}s \,\mathrm {d}t \Bigr | \\&\quad \le (n + 1) L' C^2 t_f \cdot \Vert w - v \Vert _{L^1}^2 + 2 (n + 1) L' C \cdot \Vert w - v \Vert _{L^1} \cdot \int _0^{t_f} \frac{1}{2} \varDelta w \,\mathrm {d}t \\&\quad \le (n + 1) L' t_f \cdot (C^2 + C) \cdot \Vert w - v \Vert _{L^1}^2. \end{aligned}$$

From this, it follows that

$$\begin{aligned} \frac{1}{\Vert w - v\Vert _{L^1}} \left| J(w) - J(v) - DJ(v)(w - v) \right|&\le L' t_f (n + 1) (C^2 + C) \cdot \Vert w - v \Vert _{L^1} \\&\xrightarrow {\Vert w - v\Vert _{L^1} \rightarrow 0} 0. \end{aligned}$$

This shows that J is Fréchet differentiable with respect to \(L^1\) changes in the control function. \(\square \)

Let \(U \in \varSigma \) denote the current control set. The gradient measure \(\varphi _U\) in U can be derived from the bounded linear operator \(DJ(\nu (U)) = DJ(\chi _U)\) using Lemma 2. For a given step \(D \in \varSigma \), we have

$$\begin{aligned} \varphi _U(D)&= DJ(\chi _U) \left( \chi _{D {\setminus } U} - \chi _{D \cap U} \right) \\&= \int _0^{t_f} \left( l_w(y_{\chi _U}, \chi _U) + \lambda _v^T f_w(y_{\chi _U}, \chi _U) \right) \cdot \left( \chi _{D {\setminus } U} - \chi _{D \cap U} \right) \,\mathrm {d}t \\&= \int _D \left( l_w(y_{\chi _U}, \chi _U) + \lambda _v^T f_w(y_{\chi _U}, \chi _U) \right) \cdot \left( 1 - 2 \chi _U \right) \,\mathrm {d}t \\&= \int _D \frac{1 - 2 \chi _U}{m} \cdot \left( l_w(y_{\chi _U}, \chi _U) + \lambda _v^T f_w(y_{\chi _U}, \chi _U) \right) \,\mathrm {d}\mu \end{aligned}$$

where m is the density function of the measure \(\mu \) with respect to the Lebesgue measure. The density function of \(\varphi _U\) is then given by

$$\begin{aligned} g_U(t) :=\frac{1 - 2 \chi _U(t)}{m(t)} \cdot \left( l_w(y_{\chi _U}(t), \chi _U(t)) + \lambda _v^T f_w(y_{\chi _U}(t), \chi _U(t)) \right) . \end{aligned}$$

We note the close connection of this expression to the Hamiltonian

$$\begin{aligned} {\mathcal {H}}(y, w, \lambda ) = l(y, w) + \lambda ^T f(y, w) \end{aligned}$$

and to the maximum principle for hybrid systems, e.g., [35]. The basic idea of the Competing Hamiltonian approach to mixed-integer optimal control, which was first described in [5, 6], can therefore be seen as a special case of our approach.

Since the process by which this density function is derived is very close to the adjoint differentiation scheme commonly used to extract derivatives from numerical integrators, it is possible to extract the density function from off-the-shelf integrators. We use this approach in Sect. 4.2 to extract the density function from the CVODES solver of the SUNDIALS suite.

PDE-constrained optimization

Next, we consider the case where Problem (1) is derived from a PDE constrained optimization problem of the form

$$\begin{aligned} \min _{w, y}\, j(y, w)&\\ \text {s.t.}\, f(y, w)= & {} 0_Z \\ w(x)\in & {} \lbrace 0, 1 \rbrace \quad \text {for a.a.}\ x \in \varOmega \\ w\in & {} L^1(\varOmega ) \\ y\in & {} X\\ \end{aligned}$$

where \(\varOmega \subseteq {\mathbb {R}}^n\) denotes a bounded domain, X and Z are suitably chosen Banach spaces, \(j :X \times L^1(\varOmega ) \rightarrow {\mathbb {R}}\), and \(f :X \times L^1(\varOmega ) \rightarrow Z\). We make additional assumptions to ensure the existence and uniqueness of solutions.

Assumption 3

Let \(\lambda :\varSigma \rightarrow {\mathbb {R}}\) denote the Lebesgue measure. We assume that

  1. 1.

    \(j :X \times L^1(\varOmega ) \rightarrow {\mathbb {R}}\) is continuously Fréchet differentiable,

  2. 2.

    \(f :X \times L^1(\varOmega ) \rightarrow Z\) is continuously Fréchet differentiable,

  3. 3.

    \(f_y(y, w)\) has a bounded inverse for all \((y, w) \in X \times L^1(\varOmega )\),

  4. 4.

    for every \(w \in L^1(\varOmega )\), there exists \(y_w \in X\) such that \(f(y_w, w) = 0\).

We further assume that there is a sequence of partitions \(( \lbrace T^{(i)}_1, \dots , T^{(i)}_{N_i} \rbrace )_{i \in {\mathbb {N}}}\) such that

  1. 5.

    for \(i > 1, j \in [N_i]\), there exists \(j' \in [N_{i - 1}]\) with \(T^{(i)}_j \subseteq T^{{(i - 1)}}_{j'}\),

  2. 6.

    \(\lambda (T^{(i)}_j) > 0\) for all \(i \in {\mathbb {N}}, j \in [N_i]\),

  3. 7.

    \(\max \lbrace \lambda (T^{(i)}_j) \mid j \in [N_i] \rbrace \xrightarrow {i \rightarrow \infty } 0\),

  4. 8.

    for a.a. \(x \in \varOmega \), there exist \(j_i(x) \in [N_i]\) for all \(i \in {\mathbb {N}}\) such that \(x \in T^{(i)}_{j_i(x)}\),

  5. 9.

    there exists \(C > 0\) such that for every \(i \in {\mathbb {N}}, j \in [N_i]\), there is a ball \(B^{(i)}_j\) with \(T^{(i)}_j \subseteq B^{(i)}_j\) and \(\lambda (T^{(i)}_j) \ge C \lambda (B^{(i)}_j)\).

While Assumptions 3.5 to 3.9 are somewhat similar to the assumptions on “order-conserving domain dissections” stated in [26], we note that they differ in that they do not require the partition sequence to be order-conserving. Furthermore, [26] has no counterpart to Assumption 3.6, which we require because we divide by the measure of the cells. Therefore, these sets of assumptions should not be confused.

Under Assumptions 3.1 to 3.4, the implicit function theorem [20, Thm. 1.41] shows that the mapping \(w \mapsto y_w\) is continuously Fréchet differentiable with

$$\begin{aligned} D_w y_w = -f_y^{-1}(y_w, w) f_w(y_w, w). \end{aligned}$$

Our objective functional in this case is the reduced objective

$$\begin{aligned} J(w) :=j(y_w, w) \qquad \forall w \in L^1(\varOmega ) \end{aligned}$$

which is continuously Fréchet differentiable according to the chain rule and satisfies

$$\begin{aligned} DJ(w) = -j_y(y_w, w) f_y^{-1}(y_w, w) f_w(y_w, w) + j_w(y_w, w). \end{aligned}$$

We note that DJ(w) is a bounded linear form in \(L^1(\varOmega )\). Since \(L^\infty (\varOmega )\) is the dual space of \(L^1(\varOmega )\), there exists a function \(g_w \in L^\infty (\varOmega )\) such that

$$\begin{aligned} DJ(w) d = \int _\varOmega g_w d \,\mathrm {d}x. \end{aligned}$$

The vector measure \(\nu \) is once more given by \(\nu (U) :=\chi _U\) and the underlying measure space is the Lebesgue \(\sigma \)-algebra on \(\varOmega \) with the Lebesgue measure or an equivalent measure with weight function \(m \in L^\infty (\varOmega )\). As was the case for the ODE-constrained case in Sect. 3.3.1, the gradient density measure is then given by

$$\begin{aligned} \varphi _U(D)&= DJ(\chi _U) ( \chi _{D {\setminus } U} - \chi _{D \cap U} ) \\&= \int _\varOmega g_w ( \chi _{D {\setminus } U} - \chi _{D \cap U} ) \,\mathrm {d}x \\&= \int _D g_w ( 1 - 2 \chi _U ) \,\mathrm {d}x \\&= \int _D \frac{1 - 2 \chi _U}{m} g_w \,\mathrm {d}\mu \end{aligned}$$

Accordingly, the gradient density function \(g_U\) for a given control set \(U \in \varSigma \) is given by

$$\begin{aligned} g_U(x) = \frac{1 - 2 \chi _U(x)}{m(x)} g_w(x) \qquad \forall x \in \varOmega . \end{aligned}$$

To approximate the value of \(g_w\), we use the fact that \(\varOmega \) is bounded and therefore \(L^\infty (\varOmega ) \subset L^1(\varOmega )\). We consider the sequence of mesh partitions described in Assumption 3. The family of all mesh cells \(T^{(i)}_j\) contracts to nullsets according to Assumption 3.7, can be used to approximate almost all points in \(\varOmega \) according to Assumption 3.8, and is of bounded eccentricity according to Assumption 3.9. We can therefore apply the Lebesgue differentiation theorem to obtain

$$\begin{aligned} g_w(x) = \lim _{i \rightarrow \infty } \frac{1}{\lambda (T^{(i)}_{j_i(x)})} \int _{T^{(i)}_{j_i(x)}} g_w(x) \,\mathrm {d}x \end{aligned}$$

almost everywhere. If we assume that for a given mesh index i, control functions on the i-th mesh are given by

$$\begin{aligned} w(x) = \sum _{j = 1}^{N_i} w^{(i)}_j \chi _{T^{(i)}_j}(x), \end{aligned}$$

then the derivative of the objective function with respect to the degree of freedom \(w^{(i)}_j\) is equal to

$$\begin{aligned} \frac{\mathrm {d}}{\mathrm {d}w^{(i)}_j} J(w) = \int _{T^{(i)}_j} g_w(x) \,\mathrm {d}x. \end{aligned}$$

Therefore, if we start with a sufficiently fine mesh to express the desired control function w and maintain the same w for all higher mesh indices, which we can do because of Assumption 3.5, then we find that

$$\begin{aligned} g_w(x) = \lim _{i \rightarrow \infty } \frac{\frac{\mathrm {d}}{\mathrm {d}w^{(i)}_{j_i(x)}} J(w)}{\lambda (T^{(i)}_{j_i(x)})} \end{aligned}$$

which implies that

$$\begin{aligned} g_U(x) = \frac{1 - 2 \chi _U(x)}{m(x)} \cdot \lim _{i \rightarrow \infty } \frac{\frac{\mathrm {d}}{\mathrm {d}w^{(i)}_{j_i(x)}} J(w)}{\lambda (T^{(i)}_{j_i(x)})} \qquad \text {for a.a.}\ x \in \varOmega . \end{aligned}$$

Thus, the density function \(g_U\) can be approximated using the gradients of the objective function J with respect to the degrees of freedom (DOFs) of a piecewise constant control function on successively refined meshes. We note that this does not take into account discretization errors in function evaluation. Controlling such errors usually requires considerations much more specific to the discretization or problem in question.

We also note that the application of the bounded inverse \(f_y^{-1}(y_w, w)\) usually requires the solution of a PDE that involves the adjoint of a linearization of the original differential operator. This method of deriving the gradient is therefore sometimes known as the adjoint method.

We note that the curvature condition of Lemma 3 can be translated into a Lipschitz condition on the derivatives of f and j in this setting.


In trust-region terminology, we determine our step using the “affine linear” model function

$$\begin{aligned} \phi _U :D \mapsto {\mathcal {J}}(U) + {\mathcal {J}}'(U)(D). \end{aligned}$$

Accordingly, the trust-region subproblem in a given point \(U \in \varSigma \) is

$$\begin{aligned} \min _{D \in \varSigma } \phi _U(D)\ \text {s.t.}\ \mu (D) \le \varDelta , \end{aligned}$$

where \(\varDelta > 0\) is the trust-region radius. Given that \(U \in \varSigma \) is fixed for each instance of the trust-region subproblem, the term \({\mathcal {J}}(U)\) can be omitted and we can rewrite the subproblem as

$$\begin{aligned} \min _{D \in \varSigma } \int _D g_U \,\mathrm {d}\mu \ \text {s.t.}\ \mu (D) \le \varDelta . \end{aligned}$$

The proof of Lemma 4 suggests a method by which the subproblem can be solved. Because \(g_U\) is a \(\mu \)-measurable function, its sublevel and level sets are measurable, and we can select \(D \in \varSigma \) to encompass exactly those \(x \in \varOmega \) where \(g_U(x)\) assumes its smallest values. In the proof, we used the reference level

$$\begin{aligned} \eta ^*:=\inf \bigl \lbrace \eta \in {\mathbb {R}}\bigm \vert \mu ({\mathcal {L}}_{g_U \le \eta }) > \varDelta \bigr \rbrace . \end{aligned}$$

In practice, we need to approximate \(\eta ^*\) and \(g_U\). We state the solution procedure for (8) in a way that allows the use of an approximation of \(g_U\). On discrete meshes, it may also not be possible to choose D with the exact desired measure. Therefore, we allow for some deviation. The resulting algorithm is Procedure 1.

figure a

Line 15 in Procedure 1 selects a subset \(D_2 \subseteq {\mathcal {L}}_{g \le \eta _2} {\setminus } {\mathcal {L}}_{g \le \eta _1}\) with a given size range. The existence of \(D_2\) is guaranteed due to the atomlessness of the underlying measure space. The set \(D_2\) is used to ensure that the resulting step is close enough to the trust-region radius \(\varDelta \) to guarantee sufficient descent.

The way in which \(D_2\) is chosen is arbitrary and can be designed in a way that is suitable and convenient for the given problem implementation. Methods can range from selecting mesh cells in a pre-defined or random order to approximating knapsack solutions to achieve as large a step as possible. Given that the gradient function value is guaranteed to be between \(\eta _1\) and \(\eta _2\) at almost all points in \(D_2\), the approximate optimality of the step is guaranteed for all selections of \(D_2\).

A range of sizes is given to accommodate problem implementations where the boundaries of the step D need to run along mesh boundaries that cannot be moved arbitrarily. In such cases, mesh refinement may become necessary if it is impossible to find a set \(D_2\) of suitable size at the current mesh resolution. If the gradient density function is sufficiently well-approximated and is assumed to remain stable with respect to local mesh refinement, then the main goal of mesh refinement is to achieve sufficient mesh granularity in the candidate set \({\mathcal {L}}_{g \le \eta _2} {\setminus } {\mathcal {L}}_{g \le \eta _1}\) from which \(D_2\) is chosen. Therefore, it is sufficient to refine all cells in this set until sufficient granularity is achieved to select \(D_2\) within the given measure margins.

If the gradient function does change noticeably due to the refinement, then the candidate set may change on subsequent iterations. This is highly undesirable. However, the refinement loop will necessarily terminate as soon as every cell in the candidate set has a measure of less than \(\frac{\delta \varDelta }{-2 \eta _2}\) where \(-\eta _2\) is bounded from above due to the fact that

$$\begin{aligned} \eta _2 \cdot \varDelta \ge \int _\varOmega \min \lbrace 0, g \rbrace \mathrm {d}\mu . \end{aligned}$$

Therefore, this simplistic mesh refinement scheme will, in the worst case, terminate as soon as the entire mesh is refined to sufficient granularity.

The resulting step cannot be assumed to be optimal. As stated in Procedure 1, however, one can automatically determine the required levels of accuracy in order to obtain a solution that has arbitrarily small optimality gap. This claim is proven in Lemma 9.

Lemma 9

Let \((\varOmega , \varSigma , \mu )\) be an atomless measure space, let \(g \in L^1(\varSigma , \mu )\), let \(0 < \varDelta \le \mu (\varOmega )\), let \(\delta > 0\), and let D be the set returned by Procedure 1. Then D is \(\mu \)-measurable, satisfies \(\mu (D) \le \varDelta \), and is nearly a solution of Eq. (8) in the sense that

$$\begin{aligned} \int _D g \,\mathrm {d}\mu \le \int _{D'} g \,\mathrm {d}\mu + \delta \varDelta \quad \forall D' \in \varSigma :\mu (D') \le \varDelta . \end{aligned}$$


First, we consider the case that \(\mu \bigl ({\mathcal {L}}_{g < 0}\bigr ) \le \varDelta \). In this case, Procedure 1 returns in Line 2 with \(D = {\mathcal {L}}_{g < 0}\). Certainly, \(D \in \varSigma \) and \(\mu (D) \le \varDelta \). For every \(D' \in \varSigma \), we have

$$\begin{aligned} \int _{D'} g \,\mathrm {d}\mu = \int _D g \,\mathrm {d}\mu - \int _{D {\setminus } D'} \underbrace{g}_{< 0} \,\mathrm {d}\mu + \int _{D' {\setminus } D} \underbrace{g}_{\ge 0} \,\mathrm {d}\mu \ge \int _D g \,\mathrm {d}\mu . \end{aligned}$$

Next, we consider the case \(\mu \bigl ({\mathcal {L}}_{g < 0}\bigr ) > \varDelta \). In Lines 5 and 6, we find bounds \(\eta _1 < \eta _2 \le 0\) such that \(\mu \bigl ({\mathcal {L}}_{g \le \eta _1}\bigr ) \le \varDelta \) and \(\mu \bigl ({\mathcal {L}}_{g \le \eta _2}\bigr ) > \varDelta \). Such \(\eta _1, \eta _2\) exist because \(\mu \bigl ({\mathcal {L}}_{g < 0}\bigr ) > \varDelta \) and \(\mu \bigl ({\mathcal {L}}_{g \le \eta }\bigr ) \rightarrow 0\) for \(\eta \rightarrow -\infty \), which follows from the \(\mu \)-integrability of g. The bisection loop in Lines 7 to 12 maintains these properties while ensuring that \(\eta _2 - \eta _1 < \frac{\delta }{2}\).

We know that \(\mu (D_1) = \mu ({\mathcal {L}}_{g \le \eta _1}) \le \varDelta \). For \(\eta _2 = 0\), we have \(\mu (D_2) = 0\). Otherwise, the existence of a measurable subset \(D_2 \subseteq {\mathcal {L}}_{g \le \eta _2} {\setminus } D_1\) with

$$\begin{aligned} \mu (D_2) \in \left[ \varDelta - \mu (D_1) + \frac{\delta \varDelta }{2 \eta _2}, \varDelta - \mu (D_1)\right] \end{aligned}$$

is guaranteed by the atomlessness of \((\varOmega , \varSigma , \mu )\). In either case, \(D_2\) is guaranteed to be disjoint from \(D_1\). Let \(D = D_1 \cup D_2 \in \varSigma \) for the selected set \(D_2\). We have

$$\begin{aligned} \mu (D) = \mu (D_1) + \mu (D_2) {\left\{ \begin{array}{ll} = \mu (D_1) &{}\quad \text {if}\ \eta _2 = 0 \\ \in \left[ \varDelta \cdot \left( 1 + \frac{\delta }{2 \eta _2} \right) , \varDelta \right] &{}\quad \text {if}\ \eta _2 < 0, \end{array}\right. } \end{aligned}$$

and therefore \(\mu (D) \le \varDelta \). For every \(D' \in \varSigma \) with \(\mu (D') \le \varDelta \), we have

$$\begin{aligned} \int _D g \,\mathrm {d}\mu - \int _{D'} g \,\mathrm {d}\mu&= \int _{D {\setminus } D'} \underbrace{g}_{\le \eta _2} \,\mathrm {d}\mu - \int _{D' {\setminus } D} \underbrace{g}_{> \eta _1} \,\mathrm {d}\mu \\&\le \eta _2 \cdot \mu (D {\setminus } D') - \eta _1 \cdot \mu (D' {\setminus }D) \\&= \eta _2 \cdot \bigl ( \mu (D) - \mu (D \cap D') \bigr ) - \eta _1 \cdot \bigl ( \mu (D') - \mu (D \cap D') \bigr ) \\&= \underbrace{\eta _2 \cdot \mu (D)}_{{\le \eta _2 \cdot \varDelta + \frac{\delta }{2} \cdot \varDelta }} - \underbrace{\eta _1 \cdot \mu (D')}_{\ge \eta _1 \cdot \varDelta } + \underbrace{(\eta _1 - \eta _2) \cdot \mu (D \cap D')}_{\le 0} \\&\le \underbrace{(\eta _2 - \eta _1)}_{\le \delta / 2} \cdot \varDelta + \frac{\delta }{2} \cdot \varDelta \\&\le \delta \varDelta , \end{aligned}$$

which proves (9). \(\square \)

Using Procedure 1, we can state the main trust region loop in Algorithm 2. We allow for the use of an approximation \({\tilde{g}} \in L^1(\varOmega , \mu )\) of the gradient density \(g_U \in L^1(\varOmega , \mu )\), assuming that it can be made arbitrarily accurate according to the \(L^1\) norm.

figure b

Theorem 3

Let \(\varOmega \), \(\varSigma \), Y, J, \(\mu \), \(\nu \), \({\mathcal {J}}\), and \(C > 0\) satisfy Assumptions 1.1 to 1.7. Furthermore let \(U_0 \in \varSigma \), \(\varDelta _{\max } \in (0, \mu (\varOmega ))\), \(\varDelta _0 \in (0, \varDelta _{\max })\), \(\varepsilon > 0\), \(0< \sigma _1 < \sigma _2 \le 1\), and \(\omega \in (0, 1)\) with \(\omega < \frac{3 - 3 \sigma _1}{3 - 2 \sigma _1}\). Let \({\mathcal {J}}(\varSigma )\) be bounded from below, and let \(L > 0\) be a constant such that

$$\begin{aligned} {\bigl \Vert J'(x) + J'(y) \bigr \Vert }_{Y^*} \le L \cdot {\Vert x - y\Vert }_Y \qquad \forall x, y \in {\text {conv}}\bigl ( \nu (\varSigma ) \bigr ). \end{aligned}$$

Then Algorithm 2 terminates in finite time and yields \(i \in {\mathbb {N}}_0\) and \(U_i \in \varSigma \) such that \({\mathcal {J}}(U_i) \le {\mathcal {J}}(U_0)\) and

$$\begin{aligned} \int _\varOmega \min \lbrace 0, g_{U_i} \rbrace \,\mathrm {d}\mu > -\varepsilon . \end{aligned}$$


For given \(i \in {\mathbb {N}}_0\), let \(S_i :={\mathcal {L}}_{g_{U_i} < 0} \subseteq \varOmega \) and \({\tilde{S}}_i :={\mathcal {L}}_{{\tilde{g}}_i < 0} \subseteq \varOmega \). If the stationarity test in Line 4 succeeds, then

$$\begin{aligned}&\int _\varOmega \min \lbrace g_{U_i}, 0 \rbrace \,\mathrm {d}\mu \\&\quad = \int _{\varOmega } \min \lbrace {\tilde{g}}_i, 0 \rbrace \,\mathrm {d}\mu + \int _{{\tilde{S}}_i \cup S_i} \underbrace{\bigl ( \min \lbrace g_{U_i}, 0 \rbrace - \min \lbrace {\tilde{g}}_i, 0 \rbrace \bigr )}_{\ge -|g_{U_i} - {\tilde{g}}_i|} \,\mathrm {d}\mu \\&\quad > -\left( 1 - \frac{\omega }{3} \right) \varepsilon - \frac{\omega }{3} \frac{\varDelta _i}{\mu (\varOmega )} \varepsilon \\&\quad \ge -\varepsilon + \frac{\omega }{3} \varepsilon - \frac{\omega }{3} \varepsilon . \end{aligned}$$

We note that the initial choice of \(\varDelta _0 \le \varDelta _{\max } \le \mu (\varOmega )\) and the fact that \(\varDelta _i\) is never increased above \(\varDelta _{\max }\) guarantee that \(\frac{\varDelta _i}{\mu (\varOmega )} \le 1\) for all i. If the test fails, then

$$\begin{aligned} \int _\varOmega \min \lbrace g_{U_i}, 0 \rbrace \,\mathrm {d}\mu&\le -\left( 1 - \frac{\omega }{3} \right) \varepsilon + \frac{\omega }{3} \frac{\varDelta _i}{\mu (\varOmega )} \varepsilon \\&\le -\varepsilon + \frac{2 \omega }{3} \varepsilon . \end{aligned}$$

Therefore, the stationarity test will succeed at some point as the solution becomes stationary, but it will succeed only for solutions that satisfy the tolerance \(\varepsilon \). According to Lemma 9, \(D_i\) is such that for every \(D' \in \varSigma \) with \(\mu (D') \le \varDelta _i\),

$$\begin{aligned} \int _{D_i} {\tilde{g}}_i \,\mathrm {d}\mu \le \int _{D'} {\tilde{g}}_i \,\mathrm {d}\mu + \frac{\omega \varepsilon }{3} \frac{\varDelta _i}{\mu (\varOmega )}. \end{aligned}$$

According to Lemma 4, there exists a \(D^*\in \varSigma \) with \(\mu (D^*) \le \varDelta _i\) and

$$\begin{aligned} \int _{D^*} \min \lbrace {\tilde{g}}_i, 0 \rbrace \,\mathrm {d}\mu \le \frac{\varDelta _i}{\mu (\varOmega )} \int _\varOmega \min \lbrace {\tilde{g}}_i, 0 \rbrace \,\mathrm {d}\mu . \end{aligned}$$

With \(D' :=D^*\cap {\tilde{S}}_i\), we have

$$\begin{aligned} \int _{D_i} {\tilde{g}}_i \,\mathrm {d}\mu&\le \int _{D'} {\tilde{g}}_i \,\mathrm {d}\mu + \frac{\omega \varepsilon }{3} \frac{\varDelta _i}{\mu (\varOmega )} \nonumber \\&\le \frac{\varDelta _i}{\mu (\varOmega )} \int _\varOmega \min \lbrace {\tilde{g}}_i, 0 \rbrace \,\mathrm {d}\mu + \frac{\omega \varepsilon }{3} \frac{\varDelta _i}{\mu (\varOmega )} \nonumber \\&\le -\frac{\varDelta _i}{\mu (\varOmega )} \frac{3 - 2\omega }{3} \varepsilon < 0. \end{aligned}$$

For every accepted step, we therefore have

$$\begin{aligned} {\mathcal {J}}(U_{i + 1}) - {\mathcal {J}}(U_i) \le \sigma _1 \int _{D_i} {\tilde{g}}_i \,\mathrm {d}\mu \le -\frac{\varDelta _i}{\mu (\varOmega )} \frac{3 - 2 \omega }{3} \varepsilon \sigma _1 < 0. \end{aligned}$$

Next, we prove that there exists \({\bar{\varDelta }} > 0\) such that step \(D_i\) is accepted for all \(i \in {\mathbb {N}}\) with \(\varDelta _i \le {\bar{\varDelta }}\). From (10), we can invoke Lemma 3 to show that

$$\begin{aligned} \Bigl | {\mathcal {J}}(U_{i + 1}) - {\mathcal {J}}(U_i) - \int _{D_i} g_{U_i} \,\mathrm {d}\mu \Bigl | \le \frac{L C^2}{2} \mu (D_i)^2 \le \frac{L C^2 \varDelta _i^2}{2} \end{aligned}$$

for all i in which the stationarity test does not detect stationarity. For these iterations, we can then conclude that

$$\begin{aligned} \rho _i = \frac{{\mathcal {J}}(U_{i + 1}) - {\mathcal {J}}(U_i)}{\smash [b]{\underbrace{\int _{D_i} {\tilde{g}}_i \,\mathrm {d}\mu }_{< 0}}} \quad&{\mathop {\ge }\limits ^{(12)}}\quad \frac{\int _{D_i} {\tilde{g}}_i \,\mathrm {d}\mu + \frac{\omega \varepsilon \varDelta _i}{3 \mu (\varOmega )} + \frac{L C^2}{2} \cdot \varDelta _i^2}{\int _{D_i} {\tilde{g}}_i \,\mathrm {d}\mu } \\&{\mathop {\ge }\limits ^{(11)}}\quad 1 - \frac{\frac{\omega \epsilon \varDelta _i}{3 \mu (\varOmega )} + \frac{L C^2}{2} \cdot \varDelta _i^2}{\frac{\varDelta _i (3 - 2 \omega ) \varepsilon }{3 \mu (\varOmega )}} \\&\ge \quad 1 - \frac{\omega }{3 - 2 \omega } - \frac{3 L C^2 \mu (\varOmega )}{2 \varepsilon (3 - 2 \omega )} \cdot \varDelta _i. \end{aligned}$$

Note that \(1 - \frac{\omega }{3 - 2 \omega } > \sigma _1\) if and only if \(\omega < \frac{3 - 3 \sigma _1}{3 - 2 \sigma _1}\). Thus, we find that \(\rho _i \ge \sigma _1\) for all i with

$$\begin{aligned} \varDelta _i \le {\bar{\varDelta }} :=\frac{1 - \frac{\omega }{3 - 2 \omega } - \sigma _1}{\frac{3 L C^2 \mu (\varOmega )}{2 \varepsilon (3 - 2 \omega )}} = \frac{2 \varepsilon \cdot \bigl ( 3 - 3 \omega - (3 - 2 \omega ) \sigma _1 \bigr )}{3 L C^2 \mu (\varOmega )} > 0. \end{aligned}$$

This implies that there is never an endless loop of rejected steps. In addition, because \(\varDelta _i\) is only halved upon rejection, it follows that \(\varDelta _i \ge \frac{\min \lbrace {\bar{\varDelta }}, \varDelta _0 \rbrace }{2}\) for all i. When substituted into our prior estimate of the decrease per accepted step, this yields

$$\begin{aligned} {\mathcal {J}}(U_{i + 1}) - {\mathcal {J}}(U_i) \le -\frac{\min \lbrace {\bar{\varDelta }}, \varDelta _0 \rbrace \cdot (3 - 2 \omega ) \varepsilon \sigma _1}{6 \omega \mu (\varOmega )} < 0. \end{aligned}$$

The fact that \({\mathcal {J}}\) is bounded from below therefore implies that the number of accepted steps is finite. Because there is not an endless loop of rejected steps, the algorithm must terminate. The only manner in which it can do so is if the stationarity test succeeds after a finite number of steps. \(\square \)


In this section, we present three test problems. The first two problems are optimal control problems and are intended to show the viability of our method for optimal control. The first problem, presented in Sect. 4.1, is a mesh-dependent PDE-constrained source inversion problem for the Poisson equations in two dimensions. In contrast, the second test problem, presented in Sect. 4.2, is constrained by the Lotka–Volterra ODE system.

The third test problem, presented in Sect. 4.3, is a topology optimization problem based on the linearized elasticity equations and is inspired by [4].

Source inversion for the Poisson equation

Our first test problem is a source inversion problem using a weak form of the Poisson equation with Dirichlet boundary conditions. It has the form

$$\begin{aligned}&\min _{y, w}\, {\Vert y - {\bar{y}} \Vert }_{L^2(\varOmega )}^2 + \alpha \cdot {\Vert w \Vert }_{L^1(\varOmega )} \nonumber \\&\text {s.t.}\, a(y, v) = L(w *k, v) \qquad \forall v \in H^1_0(\varOmega ) \nonumber \\&\quad \,\,\,\quad \quad \quad y \in H^1_0(\varOmega ) \nonumber \\&\quad \,\,\quad \quad \quad w \in L^1\bigl (\varOmega , \lbrace 0, 1 \rbrace \bigr ) \end{aligned}$$

where \(\varOmega :=[0, 1]^2\), \(H^1_0(\varOmega )\) denotes the Banach space of functions in \(L^2(\varOmega )\) that have one weak derivative in \(L^2(\varOmega )\) and whose trace disappears, \({\bar{y}} \in H^1_0(\varOmega )\) denotes a constant reference function,

$$\begin{aligned} a(y, v)&:=\int _\varOmega \left\langle \nabla y, \nabla v \right\rangle \,\mathrm {d}x, \\ L(f, v)&:=\int _\varOmega f v \,\mathrm {d}x, \end{aligned}$$

and \(w *k\) denotes the convolution

$$\begin{aligned} (w *k) (x) :=\int _\varOmega w(y) k(x - y) \,\mathrm {d}y \qquad \forall x \in \varOmega \end{aligned}$$

of w with a fixed smoothing kernel \(k \in L^2({\mathbb {R}}^2) \cap L^\infty ({\mathbb {R}}^2)\) given by

$$\begin{aligned} k(x) :={\left\{ \begin{array}{ll} \frac{1}{\pi \sigma ^2 (1 - \tau )} \cdot \exp \left( -\frac{\Vert x\Vert ^2}{\sigma ^2} \right) &{}\quad \text {if}\ \exp \left( -\frac{\Vert x\Vert ^2}{\sigma ^2} \right) \ge \tau , \\ 0 &{}\quad \text {otherwise.} \end{array}\right. } \end{aligned}$$

where \(\sigma \) controls the severity of the blurring effect, \(\tau > 0\) is a threshold value. We note that \(\int _{{\mathbb {R}}^2} k(x) \,\mathrm {d}x = 1\). For this test, we choose

$$\begin{aligned} \sigma :=0.1, \quad \tau :=0.01, \quad \alpha :=10^{-5} \end{aligned}$$

as problem parameters. The cutoff threshold \(\tau \) ensures a degree of sparsity in the mollification operator \(w \mapsto w *k\).

In order to make the problem accessible to our method, we follow the approach laid out in Sect. 3.3.2 with \(X = H^1_0(\varOmega )\), \(Z = H^1_0(\varOmega )^*\) and

$$\begin{aligned} j(y, w)&:=\Vert y - {\bar{y}} \Vert _{L^2}^2 \,\mathrm {d}x + \alpha \cdot \Vert \underbrace{w}_{\ge 0} \Vert _{L^1} = \Vert y - {\bar{y}} \Vert _{L^2}^2 + \alpha \cdot \int _\varOmega w \,\mathrm {d}x, \\ f(y, w)&:=\left( v \mapsto a(y, v) - L(w *k, v) \right) . \end{aligned}$$

It is easy to verify that j is continuously Fréchet differentiable in y and w. Furthermore, it is well-known that the bilinear form a satisfies the conditions of the Lax-Milgram lemma and is strongly coercitive. Therefore, f is continuously Fréchet differentiable in y and the derivative \(f_y\) has a bounded inverse. Therefore, we only have to show that the linear operator \(w \mapsto L(w *k, v) = \left\langle w *k, v\right\rangle _{L^2}\) is bounded. This is true according to Young’s convolution inequality which states that

$$\begin{aligned} \Vert w *k \Vert _{L^2} \le \Vert w\Vert _{L^1} \Vert k\Vert _{L^2}. \end{aligned}$$

In conjunction with the Cauchy–Schwartz inequality, this shows the boundedness of \(w \mapsto L(w *k, v)\). We note that the Lax-Milgram lemma also states that there always exists a solution of the weak equation, meaning that the problem itself satisfies Assumption 3. As stated in Sect. 3.3.2, we then choose the Lebesgue measure as \(\mu \), \(\nu (U) :=\chi _U\), \(Y = L^1(\varOmega )\), and \(J(w) :=j(y_w, w)\).

We discretize the problem using a finite element method on a triangle mesh. To generate the initial mesh we subdivide the domain into 32 equally large slices along both axes, which yields 1024 equally sized squares. Each square is then subdivided into four triangles along both of its diagonals, yielding a triangle mesh with 4096 cells, each of which has an area of \(\frac{1}{4096}\) and is contained within a ball of radius \(\frac{1}{64}\), centered on the middle of its longest side. If we denote each cell of this initial mesh by \(T^{(1)}_j\) and the ball by \(B^{(1)}_j\), then we have

$$\begin{aligned} \frac{1}{\pi } \mu (B^{(1)}_j) = \frac{\pi }{\pi \cdot 64^2} = \frac{1}{4096} = \mu (T^{(1)}_j). \end{aligned}$$

For local mesh refinement, we use the two-dimensional skeleton-based refinement algorithm described by Plaza and Carey in [29]. Because our initial triangles are isosceles with a height that is exactly half of the length of its base, the triangles resulting from this refinement will always have the same eccentricity bound, meaning that this form of refinement satisfies Assumption 3.9 irrespective of the order in which triangles are refined.

To numerically solve the PDE in (13), we use a finite element method with continuous first-order Lagrange elements. The cellwise averages of the gradient density function are determined from the gradient of the objective with respect to the cell values of a piecewise constant function as described in Sect. 3.3.2.

To determine the set \(D_2\) in Procedure 1, we approximately solve a subset sum problem using a standard fully polynomial approximation scheme using dynamic programming that is described, e.g., in [21]. If we cannot reach a solution within the size margins, we refine all triangles that are contained within the candidate set \({\mathcal {L}}_{{\tilde{g}}_i \le \eta _2} {\setminus } {\mathcal {L}}_{{\tilde{g}}_i \le \eta _1}\) and resolve the PDE on the refined mesh. We had briefly addressed the validity of this approach in Sect. 3.

To determine the reference state \({\bar{y}} \in H^1_0(\varOmega )\), we approximately solve the problem

$$\begin{aligned} a(y, v) = L(f, v) \qquad \forall v \in H^1_0(\varOmega ) \end{aligned}$$


$$\begin{aligned} f(x) :=\frac{2}{\pi } \cdot \arctan \left( \sum _{i = 1}^N \beta _i \cdot \exp \left( \frac{{\Vert x - {\bar{x}}_i \Vert }^2}{c^2} \right) \right) \end{aligned}$$

where \(N = 12\) and

$$\begin{aligned} \begin{array}{llll} \,\,\,\,\beta _1 :=\frac{1}{3},&{}\qquad {\bar{x}}_1 :=\left( \frac{1}{8},\ \frac{1}{8} \right) , &{}\qquad \beta _2 :=\frac{1}{3},&{}\qquad {\bar{x}}_2 :=\left( \frac{1}{8},\ \frac{7}{8} \right) , \\ \,\,\,\,\beta _3 :=\frac{2}{3},&{}\qquad {\bar{x}}_3 :=\left( \frac{2}{8},\ \frac{4}{8} \right) , &{}\qquad \beta _4 :=\frac{2}{3},&{}\qquad {\bar{x}}_4 :=\left( \frac{3}{8},\ \frac{3}{8} \right) , \\ \,\,\,\,\beta _5 :=\frac{3}{4},&{}\qquad {\bar{x}}_5 :=\left( \frac{3}{8},\ \frac{5}{8} \right) , &{}\qquad \beta _6 :=\frac{3}{4},&{}\qquad {\bar{x}}_6 :=\left( \frac{4}{8},\ \frac{2}{8} \right) , \\ \,\,\,\,\beta _7 :=\frac{3}{4},&{}\qquad {\bar{x}}_7 :=\left( \frac{4}{8},\ \frac{6}{8} \right) , &{}\qquad \beta _8 :=\frac{3}{4},&{}\qquad {\bar{x}}_8 :=\left( \frac{5}{8},\ \frac{3}{8} \right) , \\ \,\,\,\,\beta _9 :=\frac{5}{8},&{}\qquad {\bar{x}}_9 :=\left( \frac{5}{8},\ \frac{5}{8} \right) , &{}\qquad \beta _{10} :=\frac{7}{8},&{}\qquad {\bar{x}}_{10} :=\left( \frac{6}{8},\ \frac{4}{8} \right) , \\ \,\,\beta _{11} :=\frac{5}{8},&{}\qquad {\bar{x}}_{11} :=\left( \frac{7}{8},\ \frac{1}{8} \right) , &{}\qquad \beta _{12} :=\frac{7}{8},&{}\qquad {\bar{x}}_{12} :=\left( \frac{7}{8},\ \frac{7}{8} \right) . \end{array} \end{aligned}$$

This reference problem is resolved every time the mesh is refined. We use the arctangent to ensure that the pointwise value of the resulting right-hand side function remains within the interval [0, 1] and is therefore similar in magnitude to the convolutions used in the inversion problem.

For the remaining algorithmic parameters of Algorithm 2, we choose

$$\begin{aligned} \sigma _1:= & {} 0.3, \quad \sigma _2 :=0.7, \quad \omega :=0.01, \quad \varepsilon :=10^{-8}, \\ U_0:= & {} \emptyset , \quad \varDelta _0 :=\mu (\varOmega ), \quad \varDelta _{\max } :=\mu (\varOmega ). \end{aligned}$$

The finite element discretization of the PDE is performed by using FEniCS 2019.1.0Footnote 1 [1, 23,24,25]. Local refinement is performed by using FEniCS’s built-in refine function, which uses the method described in [29].

Although the algorithm is technically parallelizable if the convolution operator is sufficiently sparse, we execute it in a single thread and record the CPU times needed for five components of the solver: initial setup (init), PDE solves (solve), step determination (step), mesh refinement (refine), and state recording (record).

We note that the single-threading implementation of the trust region loop does not preclude the possibility of multithreading being used by libraries such as FEniCS. The overall trust-region loop terminates after 187 iterations with an objective function value of \(1.309 \cdot 10^{-6}\) after having reached the optimality threshold. On a test machine with an Intel i5-10210U Quad-Core CPU, this takes 167.79 CPU seconds.

Table 1 CPU times for different solver components
Fig. 2
figure 2

Objective function value \({\mathcal {J}}(U_i)\) and wall time per iteration in seconds as well as semi-logarithmic plots of step length \(\mu (D_i)\) and instationarity \(\int _\varOmega \bigl | \min \lbrace 0, \mathbf {g}_{U_i} \rbrace \bigr | \,\mathrm {d}\mu \) for the source inversion problem

The precise breakdown of the CPU times used by components of the solver is displayed in Table 1. Due to measurement and rounding errors, these number do not always add up to the correct totals. Figure 2 shows the progression of the objective function value, runtime per iteration, instationarity, and step size over the iterations of the trust-region loop. A selection of iterates is shown in Fig. 3.

Fig. 3
figure 3

Plots of \(U_i\) and \(|\min \lbrace 0, {\tilde{g}}_i \rbrace |\) for \(i \in \lbrace 0, 1, 3, 187 \rbrace \) for the source inversion problem

As we expect with a first-order method, we see a clear pattern of diminishing returns in the tail end of the iteration. Step lengths and instationarity level out after a certain point. Meanwhile, the time per iteration increases over time, especially during those iterations requiring mesh refinement. Mesh refinement, which includes recalculation of the reference solution and reassembly of the convolution operator, accounts for \(87\%\) of the total runtime of the algorithm. By contrast, the step-finding procedure accounts for approximately \(2.8\%\) of the total runtime.

While mesh refinement is a necessary component of our method, we can interpret this as an indication that our method would perform even better on a problem where mesh refinement can be implemented more efficiently.

Lotka–Volterra fishing problem

The Lotka–Volterra fishing problem is a test problem from ODE-constrained binary optimal control. It is described in [30, 32] and is based on the classical Lotka–Volterra predator–prey system with one predator and one prey species. The goal is to minimize the deviation of predator and prey population from an equilibrium state according to the \(L^2\) norm. The optimization problem has the form

$$\begin{aligned}&\min _{y, w}\quad \int _0^{t_f} l\bigl ( y(t) \bigr ) \,\mathrm {d}t \end{aligned}$$
$$\begin{aligned}&\text {s.t.}\quad {\dot{y}}(t) = f\bigl (y(t), w(t)\bigr ) \qquad \forall t \in (0, t_f), \end{aligned}$$
$$\begin{aligned}&\quad \quad \,\, y(0) = {( y_{0,1}, y_{0,2} )}^T, \end{aligned}$$
$$\begin{aligned}&\quad \quad \quad \quad w \in L^1\bigl ([0, t_f], \lbrace 0, 1 \rbrace \bigr ), \end{aligned}$$

where the right hand side of the ODE system (14b) is given by

$$\begin{aligned} f(y, w) :=\begin{pmatrix} y_1 - y_1 \cdot y_2 - c_1 \cdot w \cdot y_1 \\ -y_2 + y_1 \cdot y_2 - c_2 \cdot w \cdot y_2 \end{pmatrix} \end{aligned}$$

and the integrand of the Lagrange term is given by

$$\begin{aligned} l(y) :={\bigl \Vert y - {(1, 1)}^T \bigr \Vert }^2. \end{aligned}$$

For the purposes of this test, we use the parameter values set in [32]:

$$\begin{aligned} t_f :=12, \quad c_1 :=0.4, \quad c_2 :=0.2, \quad y_{0,1} :=0.5, \quad y_{0,2} :=0.7. \end{aligned}$$

The optimal solution of (14) with controls \(w(t) \in [0, 1]\) is known to have a singular arc whose behavior cannot be replicated exactly by binary controls. The binary solution therefore always exhibits chattering behavior.

In order to apply our method to this problem, we follow the approach laid out in Sect. 3.3.1 where we had used compatible notation. Given that f and l are polynomials, both are twice differentiable. As functions of w, they are also affine linear. In order to identify a suitable set D such that f is uniformly Lipschitz continuous with respect to y, we must first establish an a priori bound on the component values of f for [0, 1]-valued controls. We first consider that

$$\begin{aligned} f_1(y, w) = y_1 \cdot (1 - y_2 - c_1 w) \le y_1. \end{aligned}$$

Given that \(y_{0, 1} > 0\), we have \(0 < y_1(t) \le y_{0,1} \cdot e^t\). We also have

$$\begin{aligned} f_2(y, w) = y_2 \cdot (-1 + y_1 - c_2 w) \le y_2 \cdot y_1 \le y_2 \cdot y_{0, 1} \cdot e^{t_f} \end{aligned}$$

which implies \(0 < y_2(t) \le y_{0, 2} e^{t_f \cdot y_{0, 1} \cdot e^{t_f}}\). Thus, we can restrict the function f to the bounded set

$$\begin{aligned} D :=(-1, y_{0, 1} \cdot e^{t_f} + 1) \times (-1, y_{0, 2} \cdot e^{t_f \cdot y_{0, 1} \cdot e^{t_f}} + 1). \end{aligned}$$

Since the closure of D is compact, f is uniformly Lipschitz continuous in y. We note that this choice also satisfies Assumption 2.5 with \(\varepsilon = 1\).

We observe that the objective function is an integral over time. Therefore, control changes will likely have greater impact if they occur early in the domain \([0, t_f]\). This is not always the case. If the system is asymptotically stable, then controls can have the same impact regardless of when they are made. The Lotka–Volterra system, absent any control input, exhibits periodic behavior. It is therefore reasonable to assume that the impact of a a control change is proportional to the amount of time remaining to the end of the time horizon.

To counteract this effect, we make use of the weight function m that we had allowed for in Sect. 3.3.2. Bearing in mind that m may not assume the value 0, we choose

$$\begin{aligned} m(t) :=1 + (t_f - t) \qquad \forall t \in [0, t_f]. \end{aligned}$$

Accordingly, the measure \(\mu \) is given by

$$\begin{aligned} \mu (A) :=\int _A m(t) \,\mathrm {d}t \qquad \forall A \in \varSigma \end{aligned}$$

where \(\varSigma \) is still the Lebesgue \(\sigma \)-algebra on \([0, t_f]\). The vector measure \(\nu \) is once more given by \(\nu (U) :=\chi _U\), and \(Y = L^1([0, t_f])\). In these circumstances, Sect. 3.3.1 states that the gradient density function \(g_U\) in \(U \in \varSigma \) can be derived from the costate function \(\lambda _{\chi _U}\) via

$$\begin{aligned} g_U(t) = \frac{1 - 2 \chi _U(t)}{m(t)} \left( l_w(y_{\chi _U}(t), \chi _U(t)) + \lambda _{\chi _U}^T(t) f_w(y_{\chi _U}(t), \chi _U(t)) \right) . \end{aligned}$$

We use CVODES from the SUNDIALS suiteFootnote 2 [19] to solve the initial value problem. CVODES supports adjoint sensitivity analysis and allows us to record the value of \(g_U(t)\) using callbacks. This means that we are not bound by a fixed control grid and can always use the grid chosen by the integrator. We note that some care must be taken to accurately record the sign flips of \(g_U\) that occur at the boundary of U.

To determine the set \(D_2\) in Procedure 1, we select time points from the candidate set in decreasing order. Because there is no fixed time grid, the trust region radius is always matched exactly.

The algorithmic parameters chosen for this test are

$$\begin{aligned} \sigma _1:= & {} 0.2, \quad \sigma _2 :=0.7, \quad \omega :=10^{-8}, \quad \varepsilon :=5 \cdot 10^{-4}, \\ U_0:= & {} \emptyset , \quad \varDelta _0 :=3.0, \quad \varDelta _{\max } :=\mu (\varOmega ). \end{aligned}$$

CVODES is configured to use the Adams–Moulton method with relative and absolute tolerances fixed to \(10^{-10}\) for both the forward and adjoint runs. In total, our algorithm requires 90 iterations of the outer trust-region loop, which are performed in 16.72 s (wall time). The final objective function value is 1.34424. In Sect. 4.1, mesh refinement contributes significantly to the algorithm’s runtime. In this section, mesh refinement is implicitly performed by the adaptive integrator and does not require reassembly of large matrices. Therefore, we omit a detailed breakdown of the CPU times for this test.

Fig. 4
figure 4

Plots of objective function value \({\mathcal {J}}(U_i)\) and wall time per iteration, as well as semilogarithmic plots of instationarity \(\int _\varOmega \bigl | \min \lbrace 0, \mathbf {g}_{U_i} \rbrace \bigr | \,\mathrm {d}\mu \) and step size \(\mu (D_i)\) for the Lotka–Volterra problem

Fig. 5
figure 5

ODE solutions and gradient density functions for initial guess, first iterate, and final iterate of the Lotka–Volterra fishing problem; dashed lines show cutoff level for step determination

Figure 4 depicts the development of the objective function value, instationarity, step size, and wall time per iteration over the course of the trust-region iteration. Plots of the ODE solution and gradient density function for several iterations are given in Fig. 5.

Once more, we observe that the initial improvements are substantial and level out toward the end of the iteration. However, the fast overall execution time may be on par with relaxation solvers used in first-discretize-then-optimize methods and demonstrates that Algorithm 2 is useful in an ODE setting, where it can benefit from adaptive solvers. As we note in our conclusions, this is one of the advantages our algorithm has over conventional enumerative MINLP methods.

Topology optimization

In this section, we discuss a topology optimization problem inspired by the problem discussed in Chapter 1 of [4]. We preface this by emphasizing that our method is not designed to deal with many of the pitfalls of such problems and only yields good results with carefully calibrated parameters.

Fig. 6
figure 6

Illustration of the domain of the topology optimization problem. \(\varGamma _D\) and \(\varGamma _N\) denote the fixed and traction boundaries, respectively. The grey set serves as an example of the control set U

In this test, we consider the domain \(\varOmega :=[0, 1] \times [0, 0.5]\). Let

$$\begin{aligned} \varGamma _D:= & {} \lbrace 0 \rbrace \times [0, 0.5] \subseteq \partial \varOmega , \\ \varGamma _N:= & {} \lbrace 1 \rbrace \times [0.2, 0.3] \subseteq \partial \varOmega . \end{aligned}$$

The goal is to distribute an isotropic material in \(\varOmega \) in a way that minimizes compliance if the material is fixed via a Dirichlet boundary condition on \(\varGamma _D\) and carries both its own weight and an external weight attached via a Neumann boundary condition on \(\varGamma _N\). The structure of the domain is illustrated in Fig. 6.

Let \(w :\varOmega \rightarrow \lbrace 0, 1 \rbrace \) denote the control function which specifies whether material is placed at a given point, and let \(y :\varOmega \rightarrow {\mathbb {R}}^2\) denote the displacement function which assigns to each point in \(\varOmega \) its displacement in equilibrium. The equations relating w with y are not uniquely solvable if \(w = 0\) implies that absolutely no material is placed. Therefore, we select a small constant \(\epsilon > 0\) and define

$$\begin{aligned} p(w)&:=\epsilon + (1 - \epsilon ) \cdot w, \\ e(y)&:=\frac{1}{2} \left( \nabla y + \nabla y^T \right) , \\ \sigma (y)&:=\ell _1 \cdot \left( \nabla \cdot y \right) \cdot I + 2 \ell _2 \cdot e(y) \in {\mathbb {R}}^{2 \times 2}. \end{aligned}$$

The constants \(\ell _1, \ell _2\) are Lamé’s elasticity parameters and are more commonly referred to as \(\lambda \) and \(\mu \). We choose these symbols to avoid confusion. Lamé’s parameters are properties of the material. We further introduce constant parameters \(\rho > 0\) and \(c > 0\) that denote the density and weight-specific cost of the material. Let \(T > 0\) be a parameter describing the mass pulling on \(\varGamma _N\), and let \(g > 0\) define the strength of gravity. For our test, these parameters have the values

$$\begin{aligned} \begin{array}{l} \ell _1 :=1.25,\quad \ell _2 :=1.0,\quad \rho :=1.0, \quad \epsilon :=10^{-2} \\ \,\,\,c :=0.4,\quad T :=100.0 \cdot g,\quad g :=0.1. \end{array} \end{aligned}$$

The gravitational pull on the material force is given by the force per unit volume

$$\begin{aligned} f(w) = (0, -g \rho w)^T. \end{aligned}$$

The function y is then the solution of the boundary value problem

$$\begin{aligned} \begin{array}{l} \,\,\, -\nabla \cdot (p(w) \cdot \sigma (y)) = f(w) \, \quad \qquad \text {in}\ \varOmega , \\ \qquad \qquad \qquad \qquad \qquad y = 0 \qquad \,\,\qquad \quad \text {on}\ \varGamma _D, \\ \qquad (p(w) \cdot \sigma (y)) \cdot n = (0, -T)^T \quad \text {on}\ \varGamma _N, \end{array} \end{aligned}$$

where n denotes the outer unit normal of \(\varOmega \).

The weak formulation of this boundary value problem draws y from the solution space

$$\begin{aligned} V :=\lbrace y \in H^1(\varOmega ) \mid y\vert _{\varGamma _D} = 0 \rbrace \end{aligned}$$

where \(\cdot \vert _{\varGamma _D}\) denotes the trace on \(\varGamma _D\). The weak formulation of the boundary value problem then takes the form

$$\begin{aligned} a(y, v; w) = L(v, w) \qquad \forall v \in V^*\end{aligned}$$


$$\begin{aligned} a(y, v; w)&:=\int _\varOmega \left\langle p(w) \cdot \sigma (y), e(v) \right\rangle \,\mathrm {d}x, \\ L(v; w)&:=\int _\varOmega \left\langle f(w), v \right\rangle \,\mathrm {d}x + \int _{\varGamma _N} \left\langle (0, -T)^T, v \right\rangle \,\mathrm {d}s. \end{aligned}$$

Since \(p(w) \ge \epsilon > 0\) for all \(w \in [0, 1]\), the bilinear form \((y, v) \mapsto a(y, v; w)\) is both bounded and strongly elliptic for all \(w \in L^1(\varOmega ) \cap L^\infty (\varOmega )\) which guarantees that for every \(w \in L^1(\varOmega )\) with \(w(x) \in [0, 1]\) almost everywhere, there exists a unique function \(y_w \in V\) that satisfies (15).

The objective function is a weighted sum of material cost and compliance. As stated in [4], it is more common to make either cost or compliance the objective and limit the other using a constraint. However, our method cannot accommodate constraints yet. Therefore, we choose a weighted sum. The objective functional is

$$\begin{aligned} j(y, w) :=\int _\varOmega c \rho w \,\mathrm {d}x + \alpha \cdot \left( \int _\varOmega \left\langle f(w), y \right\rangle \,\mathrm {d}x + \int _{\varGamma _N} \left\langle (0, -T)^T, y_w \right\rangle \,\mathrm {d}s \right) \end{aligned}$$

where the penalty parameter \(\alpha \) is fixed to \(10^6\).

Following the approach laid out in Sect. 3.3.2, our problem has the form

$$\begin{aligned}&\min _{y, w}\, j(y, w) \\&\text {s.t.}\, f(y, w) = 0_{V^*} \\&\,\quad \quad \quad w(x) \in \lbrace 0, 1 \rbrace \qquad \text {for a.a.}\ x \in \varOmega \\&\,\,\,\,\quad \quad \qquad w \in L^1(\varOmega ) \\&\,\,\,\,\,\quad \quad \qquad y \in V. \end{aligned}$$


$$\begin{aligned} f(y, w) :=\left( v \mapsto a(y, v; w) - L(v; w) \right) . \end{aligned}$$

Once more, the Lax-Milgram lemma shows that the Fréchet derivative \(f_y(y, w)\) has a bounded inverse. Given that both \(w \mapsto (v \mapsto a(y, v; w))\) and \(w \mapsto (v \mapsto L(v; w))\) are bounded linear operators with respect to the \(L^1\) norm of w, the map \(w \mapsto y_w\) is continuously Fréchet differentiable as a map from \(L^1(\varOmega )\) to V. In conjunction with the easily verifiable continuous Fréchet differentiability of \(j :V \times L^1(\varOmega ) \rightarrow {\mathbb {R}}\), this means that the problem satisfies Assumption 3.

Once more, we choose \(J(w) :=j(y_w, w)\), \(\nu (U) :=\chi _U\), and the unweighted Lebesgue measure as \(\mu \). Our algorithmic parameters are given by

$$\begin{aligned} \sigma _1:= & {} 0.1, \quad \sigma _2 :=0.9, \quad \omega :=10^{-3}, \quad \varepsilon :=0.05, \\ U_0:= & {} \varOmega , \quad \varDelta _0 :=0.015625, \quad \varDelta _{\max } :=0.015625. \end{aligned}$$

The strict limits on \(\varDelta \) avoid large steps that disconnect chunks of material from the fixed boundary \(\varGamma _D\). Whenever this occurs, gradients escalate and require aggressive error control that is beyond the scope of this discussion.

For numerical approximation, we use the same discretization and refinement method described in Sect. 4.1. For the initial mesh generation, we subdivide the domain into 100 equally sized slices along the x axis and 50 equally sized slices along the y axis. We use FEniCS 2019.1.0 to implement the finite-element discretization and approximate the density function using the objective gradient with respect to control DOFs as described in Sect. 3.3.2.

As opposed to Sect. 4.1, we do not solve a subset sum problem to determine \(D_2\) but simply select triangles from the candidate set \({\mathcal {L}}_{{\tilde{g}}_i \le \eta _2} {\setminus } {\mathcal {L}}_{{\tilde{g}}_i \le \eta _1}\) in descending order of their area, skipping those that are too large to fit into the remaining size margin. Refinement is triggered if the result is smaller than the lower size bound for \(D_2\).

Fig. 7
figure 7

Objective function value, time per iteration, instationarity, and prediction quality \(\rho _i\) for the topology optimization problem

Fig. 8
figure 8

Plots of the control set \(U_i\) for selected iterations of the topology optimization problem. The mesh has been warped by \(100 \cdot y_{\chi _{U_i}}\) to illustrate how the design affects compliance

Figure 7 shows how objective function value, time per iteration, instationarity, and the step quality measure \(\rho \) develop over the iterations of the outer trust-region loop. Because of numerical issues, the step length can be allowed to become neither too small nor too large. We therefore tune the parameters such that the step is adjusted as little as possible. In the given test run, the step length was never adjusted. Therefore, we show the prediction quality \(\rho _i\) instead of the step length. This illustrates the significance of error control because prediction quality decays drastically when numerical errors start exceeding actual improvements in objective. The algorithm terminates due to meeting the instationarity threshold after 26 iterations and takes a total of 484.94 s (wall time) on a test machine with an Intel i5-10210U Quad-Core CPU. We depict control sets from various iterations in Fig. 8.

When examining Fig. 8 carefully, we can see that the algorithm starts to “thin out” the structure towards the end of the iteration. This is likely an artifact of the penalty approach we have chosen. Once a good balance between compliance and cost is found, the algorithm is incentivized to thin out the structure to save costs as long as the greater compliance adds less to the objective. This “thinning out” of the structure leads to escalating gradients and numerical issues and forces us to stop the iteration at a relatively high instationarity.

Another problem are checkerboard patterns. Given that our method requires high degrees of local mesh refinement, non-physical microstructures are nearly unavoidable. While such structures are intermittently observable around joints in the structure, they do not seem to appear on a large scale. This may be due to the fact that our solutions are not optimal for the given mesh, but are rather based on approximations of the density function \(g_U\) which is derived from the underlying infinite-dimensional problem.

Checkerboard patterns are, to a certain extent, a side effect of the selected discretization method and can be mitigated by careful choice of such methods. For instance, there exist approaches in the field of topology optimization, e.g., in [27], that allow level set boundaries to pass through the interior of a cell. In conjunction with the use of higher order finite elements, this can mitigate or avoid the issue of checkerboard patterns. While such methods exceed the scope of this paper, we have attempted to describe our method without making overly restrictive assumptions on numerical methodology so that it can be integrated into a variety of different problems and solution approaches.

As it stands, our method should not be seen as a competitive topology optimization method. Rather, we present these results as a proof of concept to show that future extensions of this method may also be applicable to the field of topology optimization.

Conclusions and outlook

In this paper, we present a trust-region algorithm that solves binary optimal control problems by iteratively improving on existing solutions. We exploit the fact that although the controls in these problems are binary valued, they represent points in a continuum of measurable sets. Within the metric space formed by these measurable sets, some objective functions can be shown to be differentiable in a manner that allows for relatively straightforward construction of steepest-descent steps. As a result, we are able to design an algorithm that is almost completely analogous to a conventional steepest-descent trust-region method and whose asymptotic behavior can be proven in a similar way.

We have not extensively compared the performance of our method with that of other methods. Outside of the field of topology optimization, where similar methods already exist, it is generally difficult to design fair comparisons to other methods. Many optimal control methods assume fixed or uniformly refined control meshes, which makes it difficult to find “fair” parameters for comparisons. Enumerative techniques such as branch-and-bound on a fixed control mesh, for instance, suffer from an extreme disadvantage if the control mesh is too fine, whereas our method would be expected to become more accurate with refinement.

We therefore present this work as proof of concept in the hope that, as one method among many, it may expand the scope of practically solvable optimal control problems. We have kept very closely to the theoretical framework used to validate conventional NLP methods in hopes that, in the future, it may become possible to transfer more of conventional NLP theory to this setting, which may enable constrained optimization or higher-order methods to be transfered to measure spaces. If this could be merged with continuous optimal control methods, it could then give rise to a category of fast methods for mixed-integer optimal control with both ODE and PDE constraints.


  1. Available from under GNU LGPL v3

  2. Available from under a BSD license


  1. Alnæs, M.S., Blechta, J., Hake, J., Johansson, A., Kehlet, B., Logg, A., Richardson, C., Ring, J., Rognes, M.E., Wells, G.N.: The FEniCS Project Version 1.5. Arch. Numer. Softw. 3(100), 9–23 (2015).

  2. Amstutz, S.: Connections between topological sensitivity analysis and material interpolation schemes in topology optimization. Struct. Multidiscip. Optim. 43(6), 755–765 (2011).

    MathSciNet  Article  MATH  Google Scholar 

  3. Belotti, P., Kirches, C., Leyffer, S., Linderoth, J., Luedtke, J., Mahajan, A.: Mixed-integer nonlinear optimization. In: Iserles, A. (ed.) Acta Numerica, vol. 22, pp. 1–131. Cambridge University Press, Cambridge (2013).

    Chapter  Google Scholar 

  4. Bendsøe, M.P., Sigmund, O.: Topology Optimization. Springer, Berlin (2004).

    Book  MATH  Google Scholar 

  5. Bock, H., Longman, R.: Optimal control of velocity profiles for minimization of energy consumption in the New York Subway System. In: Proceedings of the Second IFAC Workshop on Control Applications of Nonlinear Programming and Optimization, pp. 34–43. International Federation of Automatic Control (1980)

  6. Bock, H., Longman, R.: Computation of optimal controls on disjoint control sets for minimum energy subway operation. In: Proceedings of the American Astronomical Society. Symposium on Engineering Science and Mechanics (1982)

  7. Bogachev, V.: Measure Theory. Springer, Berlin (2007).

    Book  MATH  Google Scholar 

  8. Burger, M., Gerdts, M., Göttlich, S., Herty, M.: Dynamic programming approach for discrete-valued time discrete optimal control problems with dwell time constraints. In: IFIP Conference on System Modeling and Optimization, pp. 159–168. Springer, Berlin (2015)

  9. Céa, J., Garreau, S., Guillaume, P., Masmoudi, M.: The shape and topological optimization connection. Comput. Methods Appl. Mech. Eng. 188(4), 713–726 (2000).

    MathSciNet  Article  MATH  Google Scholar 

  10. Cea, J., Gioan, A., Michel, J.: Adaptation de la methode du gradient a un probleme d’identification de domaine. In: Glowinski, R., Lions, J.L. (eds.) Computing Methods in Applied Sciences and Engineering Part 2, pp. 391–402. Springer, Berlin (1974).

  11. Deaton, J.D., Grandhi, R.V.: A survey of structural and multidisciplinary continuum topology optimization: post 2000. Struct. Multidiscip. Optim. 49(1), 1–38 (2014).

    MathSciNet  Article  Google Scholar 

  12. Eschenauer, H.A., Kobelev, V.V., Schumacher, A.: Bubble method for topology and shape optimization of structures. Struct. Optim. 8(1), 42–51 (1994).

    Article  Google Scholar 

  13. Eschenauer, H.A., Olhoff, N.: Topology optimization of continuum structures: a review. Appl. Mech. Rev. 54(4), 331 (2001).

    Article  Google Scholar 

  14. Gerdts, M.: A variable time transformation method for mixed-integer optimal control problems. Optim. Control. Appl. Methods 27(3), 169–182 (2006)

    MathSciNet  Article  Google Scholar 

  15. Gerdts, M., Sager, S.: Mixed-integer DAE optimal control problems: necessary conditions and bounds. In: Biegler, L., Campbell, S., Mehrmann, V. (eds.) Control and Optimization with Differential-Algebraic Constraints, pp. 189–212. SIAM, Philadelphia (2012)

    Chapter  Google Scholar 

  16. Hahn, M., Kirches, C., Manns, P., Sager, S., Zeile, C.: Decomposition and approximation for PDE-constrained mixed-integer optimal control. In: Hintermüller, M., Herzog, R., Kanzow, C., Ulbrich, M., Ulbrich, S. (ed.) SPP1962 Special Issue. Birkhäuser, Basel (2019) (Accepted)

  17. Hale, J.K.: Ordinary Differential Equations. Dover Publications, Inc., New York (2009)

    Google Scholar 

  18. Hellström, E., Ivarsson, M., Aslund, J., Nielsen, L.: Look-ahead control for heavy trucks to minimize trip time and fuel consumption. Control Eng. Pract. 17, 245–254 (2009)

    Article  Google Scholar 

  19. Hindmarsh, A.C., Brown, P.N., Grant, K.E., Lee, S.L., Serban, R., Shumaker, D.E., Woodward, C.S.: SUNDIALS: suite of nonlinear and differential/algebraic equation solvers. ACM Trans. Math. Softw. 31(3), 363–396 (2005)

    MathSciNet  Article  Google Scholar 

  20. Hinze, M., Pinnau, R., Ulbrich, M., Ulbrich, S.: Optimization with PDE Constraints. Springer, Dordrecht (2009).

    Book  MATH  Google Scholar 

  21. Korte, B., Vygen, J.: Combinatorial Optimization, Algorithms and Combinatorics, vol. 21, 5th edn. Springer, Berlin (2012).

    Book  MATH  Google Scholar 

  22. Kunisch, K., Trautmann, P., Vexler, B.: Optimal control of the undamped linear wave equation with measure valued controls. SIAM J. Control Optim. 54(3), 1212–1244 (2016).

    MathSciNet  Article  MATH  Google Scholar 

  23. Logg, A., Mardal, K.A., Wells, G.N., et al.: Automated Solution of Differential Equations by the Finite Element Method. Springer, Berlin (2012).

    Book  MATH  Google Scholar 

  24. Logg, A., Wells, G.N.: DOLFIN: automated finite element computing. ACM Trans. Math. Softw. 37(2), 1–28 (2010).

    MathSciNet  Article  MATH  Google Scholar 

  25. Logg, A., Wells, G.N., Hake, J.: DOLFIN: a C++/Python Finite Element Library, Chapter 10. Springer, Berlin (2012)

    Google Scholar 

  26. Manns, P., Kirches, C.: Multidimensional sum-up rounding for elliptic control systems. SIAM J. Numer. Anal. 58(6), 3427–3474 (2020).

    MathSciNet  Article  MATH  Google Scholar 

  27. Norato, J.A., Bendsøe, M.P., Haber, R.B., Tortorelli, D.A.: A topological derivative method for topology optimization. Struct. Multidiscip. Optim. 33(4), 375–386 (2007).

    MathSciNet  Article  MATH  Google Scholar 

  28. Novotny, A.A., Sokołowski, J.: Topological Derivatives in Shape Optimization. Interaction of Mechanics and Mathematics. Springer, Berlin (2012).

    Book  Google Scholar 

  29. Plaza, A., Carey, G.: Local refinement of simplicial grids based on the skeleton. Appl. Numer. Math. 32(2), 195–218 (2000).

    MathSciNet  Article  MATH  Google Scholar 

  30. Sager, S.: Numerical Methods for Mixed-Integer Optimal Control Problems. Der andere Verlag, Tönning, Lübeck, Marburg (2005). ISBN 3-89959-416-9

  31. Sager, S., Bock, H., Diehl, M.: The integer approximation error in mixed-integer optimal control. Math. Program. A 133(1–2), 1–23 (2012)

    MathSciNet  Article  Google Scholar 

  32. Sager, S., Bock, H.G., Diehl, M., Reinelt, G., Schloder, J.P.: Numerical methods for optimal control with binary control functions applied to a Lotka–Volterra type fishing problem. In: Seeger, A. (ed.) Recent Advances in Optimization, pp. 269–289. Springer, Berlin (2006).

    Chapter  Google Scholar 

  33. Sager, S., Claeys, M., Messine, F.: Efficient upper and lower bounds for global mixed-integer optimal control. J. Global Optim. 61(4), 721–743 (2015).

    MathSciNet  Article  MATH  Google Scholar 

  34. Sager, S., Jung, M., Kirches, C.: Combinatorial integral approximation. Math. Methods Oper. Res. 73(3), 363–380 (2011)

    MathSciNet  Article  Google Scholar 

  35. Tauchnitz, N.: Das Pontrjaginsche Maximumprinzip für eine Klasse hybrider Steuerungsprobleme mit Zustandsbeschränkungen und seine Anwendung. Ph.D. thesis, BTU Cottbus (2010)

  36. Xavier, M., Novotny, A.A.: Topological derivative-based topology optimization of structures subject to design-dependent hydrostatic pressure loading. Struct. Multidiscip. Optim. 56(1), 47–57 (2017).

    MathSciNet  Article  Google Scholar 

  37. Zeile, C., Robuschi, N., Sager, S.: Mixed-integer optimal control under minimum dwell time constraints. Math. Program. 188, 1–42 (2020).

    MathSciNet  Article  MATH  Google Scholar 

Download references


S. Sager and M. Hahn have received funding from the European Research Council (ERC) under Grant Agreement No. 647573, from the German Research Foundation under GRK 2297 MathCoRe (Project No. 314838170) and SPP 1962, and from the German Federal Ministry of Education and Research within the program “Mathematics for Innovations” under the project “Power to Chemicals.” Part of this work was supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, under Contract DE-AC02-06CH11357, and by the Bundesministerium für Bildung und Forschung (Grant No. 05M2018).

Open Access

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


Open Access funding enabled and organized by Projekt DEAL.

Author information

Authors and Affiliations


Corresponding author

Correspondence to Mirko Hahn.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Hahn, M., Leyffer, S. & Sager, S. Binary optimal control by trust-region steepest descent. Math. Program. (2022).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI:


  • Binary optimal control
  • Topological gradient
  • Trust region methods

Mathematics Subject Classification

  • 49M05
  • 90C10
  • 90C30
  • 90C48