Compactness and convergence rates in the combinatorial integral approximation decomposition

The combinatorial integral approximation decomposition splits the optimization of a discrete-valued control into two steps: solving a continuous relaxation of the discrete control problem, and computing a discrete-valued approximation of the relaxed control. Different algorithms exist for the second step to construct piecewise constant discrete-valued approximants that are defined on given decompositions of the domain. It is known that the resulting discrete controls can be constructed such that they converge to a relaxed control in the weak$^*$ topology of $L^\infty$ if the grid constant of this decomposition is driven to zero. We exploit this insight to formulate a general approximation result for optimization problems that feature discrete and distributed optimization variables and that are governed by a compact control-to-state operator. We analyze the topology induced by the grid refinements and prove convergence rates of the control vectors for two problem classes. We use a reconstruction problem from signal processing to demonstrate both the applicability of the method outside the scope of differential equations, the predominant case in the literature, and the effectiveness of the approach.


Introduction
This article concerns the following class of optimization problems
\[ \inf_x\ j(K(x)) \quad \text{s.t.} \quad x(s) \in \{\xi_1, \ldots, \xi_M\} \subset V \ \text{ for almost all (a.a.) } s \in \Omega_T \tag{P} \]
for some bounded domain $\Omega_T \subset \mathbb{R}^d$, $d \ge 1$. It is an infinite-dimensional and nonsmooth optimization problem in which the distributed optimization variable $x$ is restricted to a finite number of realizations, often also called bangs. The control-to-state operator $K$ solves the dynamics of the underlying system that is controlled. We note that the feasible set of (P) is bounded. However, we cannot generally expect that (P) admits a minimizer because the feasible set of (P) is not closed in the weak$^*$ topology. Apart from control of ODEs and PDEs with discrete-valued control inputs, the problem class (P) also covers problems from image denoising and topology optimization. However, we note that there are also prevalent instances of these problems where the properties of the control-to-state operator that are required for our analysis do not hold, see for example the problem in [7, Sect. 4.2.2].
Clason et al. treat a similar problem class with a so-called multi-bang control regularization that generalizes an $L^1$-type penalty to promote a corresponding multi-bang solution structure (that is, the solution takes the values $\xi_1, \ldots, \xi_M$) in the series of articles [5,6,7]. Buchheim et al. [4] treat an instance of (P) in two steps: a) discretize (P) into a finite-dimensional integer program (IP), and b) solve the discretized problem with a finite-dimensional IP technique, namely a branch-and-bound algorithm for convex quadratic integer programs. Their results show that the computational demand may become excessive for fine discretizations. This is unsurprising because the discretized problem is an integer quadratic program (IQP), a problem class that is NP-hard in general.
We use the results on convexification reformulations and the combinatorial integral approximation (CIA) decomposition from [12,15,21,23,24,26,27,36]. The CIA decomposition splits the optimization into
1. deriving and solving a continuous relaxation of the problem (P), and
2. computing a discrete-valued approximation of the control of the relaxation.
The splitting allows us to take advantage of the infinite-dimensional structure of the problem and thereby to use efficient algorithms to compute approximations of (P). Obviously, the continuous relaxation cannot be solved to optimality in function space on a computer with finite precision. In [16,22], it is shown that if the minimizers of finite-dimensional approximations of the continuous relaxation approximate a minimizer of the continuous relaxation, the discrete-valued approximation of the relaxed control can be constructed to approximate the infimum of (P). Both steps of the CIA decomposition have been analyzed by means of a reformulation of the problem with binary controls that serve as activation functions of the control realizations $\xi_1, \ldots, \xi_M$.
We follow this procedure and introduce the terms binary control and relaxed control; see [22].
Definition 1.1 The term binary control denotes a measurable function $\omega : \Omega_T \to \{0, 1\}^M$ with $\sum_{i=1}^M \omega_i = 1$ almost everywhere (a.e.). A measurable function $\alpha : \Omega_T \to [0, 1]^M$ with $\sum_{i=1}^M \alpha_i = 1$ a.e. in $\Omega_T$ is called a relaxed control.
Following the literature, we call algorithms that transform continuous-valued variables into discrete-valued ones rounding algorithms. The proposed approach is advantageous because we can assume that both steps can be executed efficiently. For the second step, different algorithmic approaches exist. We name sum-up rounding (SUR) [24], next-forced rounding (NFR) [14], and the optimization-based ones presented in [2,37]. All of them take a relaxed control and construct a binary control that is piecewise constant on the cells of a given grid, the so-called rounding grid. If the grid constant, in this case the maximum volume of the grid cells, tends to zero, another quantity, the so-called integrality gap, tends to zero as well. If $\Omega_T = (t_0, t_f)$, this means that
\[ \sup_{t \in [t_0, t_f]} \Big\| \int_{t_0}^t (\alpha - \omega^\Delta) \Big\|_\infty \to 0 \]
for a relaxed control $\alpha$ and the binary control $\omega^\Delta$ that was computed on a rounding grid with grid constant $\Delta$. To avoid ambiguities, we note that the symbol $\Delta$ and the term grid constant always refer to the maximum cell volume of the cells that make up the rounding grid. If the operator $K$ exhibits sufficient compactness properties, namely if it maps weakly convergent sequences to norm convergent sequences, and the objective functional is a continuous function of the state vector, we obtain convergence of the objective functional. This gives rise to an optimality principle, which has been shown in [22] for the case of elliptic boundary value problems (BVPs).
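To make the rounding step concrete, the following is a minimal sketch of sum-up rounding on a one-dimensional grid together with an evaluation of the integrality gap. It assumes that the relaxed control is piecewise constant on the rounding grid (so the supremum over $t$ is attained at grid nodes) and that ties are broken in favor of the smallest index; the function names `sum_up_rounding` and `integrality_gap` and the sample control are hypothetical and chosen for illustration only.

```python
def sum_up_rounding(alpha, widths):
    """Sum-up rounding on a 1D grid.

    alpha[i][j]: value of the j-th relaxed control component on cell i
    widths[i]:   length of cell i
    Returns omega with entries in {0, 1} and exactly one 1 per cell.
    """
    m = len(alpha[0])
    deficit = [0.0] * m  # integral of (alpha_j - omega_j) over the cells processed so far
    omega = []
    for a_i, h in zip(alpha, widths):
        # Integral of alpha up to the end of this cell minus integral of omega before it.
        gamma = [deficit[j] + a_i[j] * h for j in range(m)]
        j_star = max(range(m), key=lambda j: gamma[j])  # first index attaining the maximum
        row = [1 if j == j_star else 0 for j in range(m)]
        omega.append(row)
        deficit = [gamma[j] - row[j] * h for j in range(m)]
    return omega


def integrality_gap(alpha, omega, widths):
    """Maximum over grid nodes t and components j of |int_0^t (alpha_j - omega_j)|."""
    m = len(alpha[0])
    acc = [0.0] * m
    gap = 0.0
    for a_i, w_i, h in zip(alpha, omega, widths):
        acc = [acc[j] + (a_i[j] - w_i[j]) * h for j in range(m)]
        gap = max(gap, max(abs(v) for v in acc))
    return gap


# Hypothetical relaxed control with M = 3 components on 8 equal cells of (0, 1).
n_cells = 8
widths = [1.0 / n_cells] * n_cells
midpoints = [(i + 0.5) / n_cells for i in range(n_cells)]
alpha = [[t, 0.5 * (1.0 - t), 0.5 * (1.0 - t)] for t in midpoints]

omega = sum_up_rounding(alpha, widths)
gap = integrality_gap(alpha, omega, widths)
print("grid constant:", max(widths), "integrality gap:", gap)
```

Repeating the computation with more cells shrinks the grid constant, and the reported gap can be compared against it.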
The presented approach is closely related to the approximation of control inputs into differential equations or inclusions with so-called chattering controls, a theory that has been investigated in the optimal control community for several decades. In particular, the Lyapunov convexity theorem [17,18] and the Filippov-Ważewski theorem [9,35] are important findings in this context. We also note Tartar's work [32] because it provides a constructive means to compute discrete-valued controls from continuous-valued ones in Theorem 3. His construction can also be used in the second step of the CIA decomposition. A similar idea is pursued by Gerdts [10,11] under the name variable time transformation in the one-dimensional, that is, the time-dependent, case. In [24,28], Sager employs this approximation approach in the context of discrete-valued optimal control of ODEs. The results are extended constructively using the aforementioned SUR algorithm in [15,26] and transferred to evolution equations with semigroup theory in [12,21]. In [22,36], the algorithmic approach is transferred to the multi-dimensional setting.

Contributions
The CIA decomposition has been developed originally for the approximation of control inputs and corresponding solutions of differential equations. We show that it is in fact always applicable to optimization problems, in which distributed optimization variables are passed into compact or completely continuous control-to-state operators and provide a signal processing example that does not involve any differential equation.
The objective corresponding to a relaxed control can be approximated arbitrarily well with discrete-valued control trajectories if the grid size of the rounding grid tends to 0. From a function space point of view, this is independent of the method that is chosen to solve the relaxed optimization problem.
We show that rounding grids induce pseudometrics. Under a regularity assumption on the refinement of the rounding grids, we prove that the induced pseudometrics form a Hausdorff topology. Moreover, this assumption implies a convergence rate for the integer approximation in the $H^{-1}$-norm. We show an improved convergence rate for the state vector approximation for a class of one-dimensional signal filtering approximation problems under a differentiability assumption on the convolution kernel.
We demonstrate computationally that our methodology yields high-precision approximations of the infimum of (P) without the need to solve a potentially NP-hard discretized problem. This leads to an efficient algorithmic framework and permits finer discretizations than the approach presented in [4].

Structure of the article
In Sect. 2, we introduce a general formulation of the model problem (P) and derive the relaxation for the first step of the CIA decomposition. In Sect. 3, we introduce rounding algorithms and an approximation property that can be satisfied by suitable algorithms in the second step of the relaxation. We show that this is sufficient to obtain the desired convergence of the objective value by employing compactness properties. In Sect. 4, we motivate and prove a convergence rate of the controls in the space $H^{-1}$. In Sect. 5, we apply the results from Sect. 3 to a model problem from signal processing and prove a convergence rate on the approximated signal under a suitable regularity assumption. Section 6 demonstrates our findings computationally for a variant of the signal processing problem presented in [4], and Sect. 7 draws a conclusion.

Notation
For a Banach space $X$, we denote its topological dual space by $X^*$. For an optimization problem (OP), we denote its feasible set by $\mathcal{F}_{(\mathrm{OP})}$. We denote the unit simplex by $S^M := \{ v \in [0,1]^M : \sum_{i=1}^M v_i = 1 \}$. We denote convergence in the weak$^*$ topology by $\rightharpoonup^*$. We denote the Borel $\sigma$-algebra by $\mathcal{B}$. We denote the Lebesgue measure on $\mathbb{R}^m$ by $\lambda_{\mathbb{R}^m}$. If $m$ is obvious, we abbreviate and simply write $\lambda$. For a set $A \subset \mathbb{R}^m$, we write $\operatorname{diam} A = \sup\{ \|x - y\| : x, y \in A \}$. For sequences $(a^{(n)})_n \subset [0, \infty)$ and $(b^{(n)})_n \subset [0, \infty)$, we abbreviate the fact that $0 \le c_1 a^{(n)} \le b^{(n)} \le c_2 a^{(n)}$ for global constants $c_1, c_2 > 0$ by the Landau notation $b^{(n)} = \Theta(a^{(n)})$. We highlight that this is a slight deviation from the canonical use of the Landau notation, where only the limiting behavior matters.

Standing assumptions and continuous relaxation
Before deriving relaxations and stating our assumptions, we define the term ultraweak-complete continuity, which is tailored to our requirements on the control-to-state operator.
Definition 2.1 Let $X$ and $Y$ be Banach spaces. We call a function $A : X^* \to Y$ ultraweak-completely continuous if for all sequences $(x^{(n)})_n \subset X^*$, we have that
\[ x^{(n)} \rightharpoonup^* x \ \text{ in } X^* \quad \Longrightarrow \quad A(x^{(n)}) \to A(x) \ \text{ in } Y. \]

Remark 2.2 An operator $A : X \to Y$ is called completely continuous if $x^{(n)} \rightharpoonup x$ in $X$ implies $A(x^{(n)}) \to A(x)$ in $Y$ for Banach spaces $X$ and $Y$. If $X$ is reflexive and $A$ is linear, this implies compactness of the operator $A$, that is, if the sequence $(x^{(n)})_n \subset X$ is bounded, the sequence $(A(x^{(n)}))_n \subset Y$ has a convergent subsequence. Furthermore, compactness of a linear operator always implies its complete continuity. We define ultraweak-complete continuity analogously to complete continuity but require weak$^*$ convergence for the domain sequence. In particular, we consider weak$^*$ convergence in $L^\infty$ in this manuscript because it is the natural topology to discuss the convergence of the discrete-valued control functions.
If the control-to-state operator $K$ is defined for functions $x$ that take values in $\operatorname{conv}\{\xi_1, \ldots, \xi_M\}$ and not only in $\{\xi_1, \ldots, \xi_M\}$, we can replace the discreteness constraint in (P) by its convex hull and obtain the relaxed problem
\[ \inf_x\ j(K(x)) \quad \text{s.t.} \quad x(s) \in \operatorname{conv}\{\xi_1, \ldots, \xi_M\} \ \text{ for a.a. } s \in \Omega_T. \tag{Q} \]
Employing the aforementioned algorithms in the second step of the CIA decomposition allows us to compute the discrete-valued approximants of the solution of (Q). However, the algorithms are defined on $S^M$-valued functions. This problem can be circumvented because elements in $\operatorname{conv}\{\xi_1, \ldots, \xi_M\}$ can be represented by convex combinations of $\{\xi_1, \ldots, \xi_M\}$ by construction. We recall that $\operatorname{conv}\{\xi_1, \ldots, \xi_M\}$ is compact because $M < \infty$.
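For scalar realizations ($V = \mathbb{R}$), one simple way to obtain such convex coefficients is to interpolate between the two neighboring realizations. The sketch below assumes at least two sorted, distinct realizations; the function name `convex_coefficients` and the sample values are hypothetical. For $M > 2$, the representation is not unique, and this is only one possible choice.

```python
def convex_coefficients(x, xi):
    """Return alpha in the unit simplex with sum(alpha[i] * xi[i]) == x.

    Assumes scalar values (V = R) and xi sorted in strictly increasing order
    with len(xi) >= 2.
    """
    if not xi[0] <= x <= xi[-1]:
        raise ValueError("x lies outside conv{xi_1, ..., xi_M}")
    alpha = [0.0] * len(xi)
    for k in range(len(xi) - 1):
        if xi[k] <= x <= xi[k + 1]:
            # Linear interpolation weight of the right neighbor xi[k + 1].
            weight_right = (x - xi[k]) / (xi[k + 1] - xi[k])
            alpha[k] = 1.0 - weight_right
            alpha[k + 1] = weight_right
            return alpha
    raise AssertionError("unreachable for sorted xi with at least two entries")


xi = [-1.0, 0.0, 2.0]          # hypothetical control realizations
alpha = convex_coefficients(0.5, xi)
reconstructed = sum(a * v for a, v in zip(alpha, xi))
print(alpha, reconstructed)
```

Applying the function pointwise (or cell-wise) to a relaxed $x$ yields an $S^M$-valued function that a rounding algorithm can process.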
In the context of differential equations, the convex coefficients are often used to relax binary activation functions of terms that occur in the right-hand side of an ODE or PDE. For example, we may consider the initial value problem (IVP)
\[ \dot{y}(s) = \sum_{i=1}^M \omega_i(s) f(y(s), \xi_i) \ \text{ for a.a. } s \in (t_0, t_f), \quad y(t_0) = y_0 \tag{2.1} \]
for binary controls $\omega$. This IVP is equivalent to
\[ \dot{y}(s) = f(y(s), x(s)) \ \text{ for a.a. } s \in (t_0, t_f), \quad y(t_0) = y_0 \]
for all feasible $x(s) \in \{\xi_1, \ldots, \xi_M\}$ a.e. by means of $x(s) = \sum_{i=1}^M \omega_i(s) \xi_i$ a.e. In this case, the control-to-state operator of the relaxation does not have to be defined for all control functions $x(s) \in \operatorname{conv}\{\xi_1, \ldots, \xi_M\}$ a.e. because it is sufficient to analyze the control-to-state operator of (2.1). This strategy is called partial outer convexification in the literature [12,15,26]. Thus, from now on we consider control-to-state operators that act on the convex coefficients. Therefore, we generalize the relaxation of (P) below. It features a different operator $K_R$:
\[ \inf_\alpha\ j(K_R(\alpha)) \quad \text{s.t.} \quad \alpha \ \text{ is a relaxed control.} \tag{R} \]
By requiring that $K_R$ satisfies the identity $K_R(\omega) = K\big(\sum_{i=1}^M \omega_i \xi_i\big)$ for all binary controls $\omega$, we obtain that (R) is a relaxation of (P). We make the following standing assumption on (P).
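The identity between the two operators on binary controls, and its failure on proper relaxed controls, can be checked numerically. The sketch below assumes an IVP of the form $\dot{y} = \sum_i \omega_i f(y, \xi_i)$, a hypothetical right-hand side $f(y, u) = -y + u^2$ that is nonlinear in the control, and an explicit Euler discretization; none of these choices are taken from the article.

```python
XI = [-1.0, 0.0, 2.0]  # hypothetical control realizations


def f(y, u):
    # Hypothetical right-hand side, nonlinear in the control value u.
    return -y + u * u


def euler(weights, y0, h, convexified):
    """Explicit Euler with one weight vector (row) per time step.

    convexified=True  integrates  y' = sum_i w_i * f(y, xi_i)
    convexified=False integrates  y' = f(y, sum_i w_i * xi_i)
    """
    y = y0
    for w in weights:
        if convexified:
            rhs = sum(w_i * f(y, xi_i) for w_i, xi_i in zip(w, XI))
        else:
            rhs = f(y, sum(w_i * xi_i for w_i, xi_i in zip(w, XI)))
        y += h * rhs
    return y


binary = [[1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 1]]   # one-hot rows
relaxed = [[0.5, 0.0, 0.5]] * 4                          # proper convex weights

y_bin_conv = euler(binary, 0.0, 0.25, convexified=True)
y_bin_orig = euler(binary, 0.0, 0.25, convexified=False)
y_rel_conv = euler(relaxed, 0.0, 0.25, convexified=True)
y_rel_orig = euler(relaxed, 0.0, 0.25, convexified=False)

print("binary :", y_bin_conv, y_bin_orig)   # the two formulations agree
print("relaxed:", y_rel_conv, y_rel_orig)   # the two formulations differ
```

The disagreement for relaxed weights is the reason why the relaxation is formulated with an operator acting on the convex coefficients rather than with $K$ itself.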

Assumption 1
for all binary controls $\omega$.
6. Let $j : Y \to \mathbb{R}$ be continuous.
7. Let the number of discrete control realizations $M \in \mathbb{N}$ be fixed.

Remark 2.3
As an alternative to the analysis we present here, one can also analyze the problem (Q) if $K$ is defined on all of $L^\infty(\Omega_T, V)$. Then, in Assumption 1 one may require that $K : L^\infty(\Omega_T, V) \to Y$ is ultraweak-completely continuous. To this end, one generally requires the identification $U^* \cong V$ for some Banach space $U$ and that the space $V$ has the Radon-Nikodym property. This allows us to deduce $(L^1(\Omega_T, U))^* \cong L^\infty(\Omega_T, V)$, see [8, Thm IV.1.1]. This assumption is fairly general; in particular, every reflexive Banach space satisfies this property.

Approximation arguments
The approximation arguments generalize work from [16,22], which analyze the case, where K R and K are control-to-state operators of the BVPs governed by the Laplace operator. In Sect. 3.1, we introduce important terms for our analysis and recall findings from previous work. Sect. 3.2 derives norm convergence and an optimality principle for the approximation based on Sect. 3.1.

Definitions and control approximation
The approximation properties in this section are stated for relaxed controls or sequences of them. One should have in mind that the aforementioned algorithms for the second step of the CIA decomposition produce binary controls, or sequences of them, which satisfy these properties. We introduce the terms of rounding grid and admissible sequence of rounding grids. [20,22]

Definition 3.2 (Order conserving domain dissection)
Remark 3.3 Definition 3.2 2 is important for the analysis in Sect. 4. Therefore, we postpone a discussion to Sect. 4 and just note here that it can be satisfied by using orderings of the grid cells that are induced by iterates of space-filling curves; see [22]. We note that Definition 3.2 3 is satisfied for isotropic refinements of quasi-uniform meshes; see [22].
We introduce a quantity, which we call integrality gap, that is known to tend to zero if the grid constant tends to zero for a sequence of rounding grids and the discrete-valued control functions are constructed with suitable rounding algorithms; see [21,22]. We will see in Lemma 4.1 that the function $d$ constitutes a pseudometric as mentioned in Sect. 1. We state the main finding on the relationship between the integrality gap, admissible sequences of rounding grids and weak convergence from [22] in the following proposition.

Proposition 3.5 ([22, Thm 4.7])
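Let an order conserving domain dissection be given with corresponding sequence of integrality gaps $(d^{(n)})_n$. Let $\alpha$ be a relaxed control and $(\omega^{(n)})_n$ be a sequence of relaxed controls such that
\[ d^{(n)}(\alpha, \omega^{(n)}) \to 0. \]
Then
\[ \omega^{(n)} \rightharpoonup^* \alpha \ \text{ in } L^\infty(\Omega_T, \mathbb{R}^M). \]
The corollary below follows immediately.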

Corollary 3.6 Let the assumptions of Proposition 3.5 hold. Let V be the topological dual space of a Banach space, and let V have the Radon-Nikodym property. Let
The literature [14,15,23,26,27] shows that the aforementioned approximation algorithms admit constants $C > 0$, which are independent of the relaxed control $\alpha$ and the sequence of rounding grids, such that they yield $d^{(n)}(\alpha, \omega^{(n)}) \le C\Delta^{(n)}$ for $\omega^{(n)}$ being produced from $\alpha$ by the algorithm on an admissible sequence of rounding grids. The bounds are usually established for the quantity $\sup_{t \in [0,T]} \big\| \int_0^t (\alpha - \omega^{(n)}) \big\|_\infty$ for the case $\Omega_T = (0, T)$ and transfer to multidimensional formulations of the algorithm with the correspondence established in [22, Sect. 4.1]. We note that the bounds on the integrality gap hold for consistent orderings of the grid cells in a) the procedure of the algorithm and b) the increasing union in the evaluation of the integrality gap. Thus, Definition 3.2 and Proposition 3.5 give $\omega^{(n)} \rightharpoonup^* \alpha$ and $x^{(n)} \rightharpoonup^* x$ for an order conserving domain dissection.
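The difference between weak$^*$ convergence and norm convergence can be seen in a small one-dimensional experiment. The sketch below assumes $\Omega_T = (0, 1)$, the constant relaxed control component $\alpha \equiv 1/2$, a binary component $\omega$ that alternates between 1 and 0 on equal cells (a hypothetical chattering control, not the output of a specific algorithm from the literature), and the test function $g(t) = t^2$. Integrals are evaluated exactly per cell.

```python
def pairing_and_l1(n_cells):
    """Return (int_0^1 (alpha - omega) * g dt, int_0^1 |alpha - omega| dt).

    alpha = 1/2, omega alternates 1, 0, 1, 0, ... on n_cells equal cells,
    g(t) = t**2; n_cells is assumed to be even.
    """
    h = 1.0 / n_cells
    pairing = 0.0
    l1_distance = 0.0
    for i in range(n_cells):
        a, b = i * h, (i + 1) * h
        omega_i = 1.0 if i % 2 == 0 else 0.0
        diff = 0.5 - omega_i
        pairing += diff * (b ** 3 - a ** 3) / 3.0   # exact integral of t**2 on [a, b]
        l1_distance += abs(diff) * h
    return pairing, l1_distance


results = {n: pairing_and_l1(n) for n in (4, 16, 64)}
for n, (pairing, l1_distance) in results.items():
    print(f"cells = {n:3d}  pairing = {pairing:.6f}  L1 distance = {l1_distance:.3f}")
```

The pairing with the test function shrinks with the grid constant, while the $L^1$ distance between $\alpha$ and $\omega$ stays fixed. This is why a compactness property of the control-to-state operator is needed to pass to norm convergence of the states.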

State approximation
The prerequisites on our setting transform the weak * into norm convergence and convergence of the objective values.
Proof The claim follows from Assumption 1 and the continuity of j.
Lemma 3.7 leverages the existence statement on approximating sequences.

Lemma 3.8 Let α ∈ F (R) . Then there exists a sequence (ω (n) ) n ⊂ F (R) of binary controls such that
Proof We construct an order conserving domain dissection. One possibility is to employ a uniform triangulation with uniform refinements, which imply that Definition 3.2 1 and 3 hold for the induced sequence of rounding grids. We perform the refinement such that each triangle is split up into several smaller triangles. Moreover, we construct the sequence of grid cells of the refined grid by replacing each triangle with the set of triangles into which it was split up. Therefore, Definition 3.2 2 holds for the resulting sequence of rounding grids. Then one may use one of the approximation algorithms like SUR to compute a sequence of binary controls (ω (n) ) n ⊂ F (R) on these rounding grids.
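Let $d^{(n)}$ denote the integrality gap and $\Delta^{(n)}$ the grid constant induced by the $n$-th rounding grid for $n \in \mathbb{N}$. By mapping grid cells to intervals that decompose the interval $[0, 1]$ (see [22, Sect. 4]), the arguments in [15,26] imply that there exists $C > 0$ such that
\[ d^{(n)}(\omega^{(n)}, \alpha) \le C \Delta^{(n)} \]
for all $n \in \mathbb{N}$. The uniform refinement gives that $d^{(n)}(\omega^{(n)}, \alpha) \to 0$ for $n \to \infty$. Thus, we apply Proposition 3.5 and obtain
\[ \omega^{(n)} \rightharpoonup^* \alpha \ \text{ in } L^\infty(\Omega_T, \mathbb{R}^M). \]
The second claim follows from Lemma 3.7.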
Starting from Lemma 3.8, we can prove the following optimality principle.
Theorem 3.9 Let Assumption 1 hold. For the optimization problems (P) and (R), it holds true that The optimization problem (R) admits a minimizer.
Proof Since (R) is a relaxation of (P), it suffices to show ≥ for the first claim. Let (α (n) ) n be a minimizing sequence for (R). We note that j(K R (α (n) )) → −∞ is possible here. For all n ∈ N and all ε > 0, we can construct a binary control α (k n ) ∈ F (R) such that j(K R (α (k n ) )) < j(K R (α (n) )) + ε by Lemma 3.8. The choice P) and the identity K R (α (k n ) ) = K (x (k n ) ) from the assumptions yield the first claim. We observe that F (R) is closed with respect to the weak * topology. To see this, we first apply [32, Theorem 3], which gives that the limit of a weakly * convergent sequence in F (R) is an a.e. [0, 1] M -valued function. The coordinate sequences sum to one a.e. because adding L ∞ (Ω T )-functions is a continuous operation with respect to the weak * topology in both arguments. Moreover, every sequence in F (R) is bounded and thus admits a weak * accumulation point. Consequently, F (R) is compact with respect to the weak * topology and the third claim follows from the extreme value theorem as the mapping j • K R is continuous from the weak * topology of L ∞ (Ω T , V ) to R.

Remark 3.10
If V is the topological dual space of a Banach space, and V has the Radon-Nikodym property, analogous arguments hold for the relationship between (Q) and (P).

Order-conserving domain dissections and convergence rate
To motivate the results in this section, we consider the first three approximants of the Hilbert curve, a surjective and continuous mapping of the unit interval to the unit square. A facsimile of the figure in [13] is displayed in Fig. 1.
By inspection of Fig. 1 and the recursive definition of the Hilbert curve iterates, we observe that the ordering of the squares along the curve is preserved from an iterate to the next. This is formulated as Definition 3.2 2 and gives rise to the name order conserving domain dissections for sequences of rounding grids that satisfy this property. Example 4.6 in [22] shows that Proposition 3.5 does not hold true if it is dropped. Moreover, we observe that since the Hilbert curve iterations induce a uniform refinement of the grid cells, Definition 3.2 1 and the regular shrinkage condition Definition 3.2 3 are satisfied as well.
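The order conservation along Hilbert curve iterates can also be checked programmatically. The sketch below uses the classical bit-manipulation scheme for mapping a curve position to a grid cell; the function names `hilbert_d2xy` and `order_is_conserved` are hypothetical. It verifies that the four cells visited at positions $4j, \ldots, 4j+3$ on the refined grid all lie in the cell visited at position $j$ on the coarser grid.

```python
def hilbert_d2xy(n, d):
    """Map position d along the Hilbert curve to cell (x, y) of an n-by-n grid.

    n must be a power of two and 0 <= d < n * n.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:  # rotate / reflect the sub-square assembled so far
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def order_is_conserved(n):
    """Check that refining an n-by-n grid to 2n-by-2n keeps the parent order."""
    for d in range(4 * n * n):
        x, y = hilbert_d2xy(2 * n, d)
        parent_cell = (x // 2, y // 2)
        if parent_cell != hilbert_d2xy(n, d // 4):
            return False
    return True


for n in (1, 2, 4, 8):
    print(f"{n}x{n} -> {2 * n}x{2 * n}: order conserved = {order_is_conserved(n)}")
```

In this ordering, a partial union of the first cells of a coarse grid is always a partial union of cells of every finer grid, which is the mechanism that Definition 3.2 2 formalizes.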
In [22], we have executed the SUR algorithm mentioned above to approximate continuous relaxations of the following elliptic boundary value problem (BVP) with Ω = (0, 1) 2 for a given control α on the ordering induced by successive Hilbert curve iterates. Again, we denote the grid constant by Δ. Figure 2 suggests that a convergence rate for the state approximation error may be obtained (in the numerics we observe O(Δ)).

Order-conserving domain dissections
Order conserving domain dissections have the advantage that we can conserve the quantity S (n) i (α − ω (n) ) in further grid iterations because the successive integration (or averaging) always happens on partitions of cells from previous iterations; see [20]. Intuitively, one can think of Definition 3.2 2 as a means to maintain spatial coherence of the error quantity in all coordinate directions during the grid refinements. We briefly consider the topology induced by order-conserving domain dissections. A preliminary variant of these results is part of the PhD thesis [19].
is a family of seminorms. The locally convex vector space of L p (Ω T )-functions equipped with the topology determined by the family of seminorms (ν (n) ) n is Hausdorff, that is it is able to separate points.
Proof The seminorm properties are induced from the pseudometric properties established in Lemma 4.1. Thus the vector space of L p (Ω T )-functions equipped with the topology induced by (ν (n) ) n is locally convex. For the Hausdorff property, we verify that if ν (n) ( f ) = 0 holds for all n ∈ N then f = 0 a.e. Let ν (n) ( f ) = 0 for all n ∈ N. Thus, for all n ∈ N and all k ∈ {1, . . . , N (n) }, we deduce Since an order conserving domain dissection is a sequence of partitions of the domain Ω T , it holds that for all x ∈ Ω T there exists indices i k for all k ∈ N. Moreover, the Definition 3.2 2 gives Combining these inclusions with Definition 3.2 1 and 3 allows us to apply the Lebesgue differentiation theorem, see [31,Corollary 1.7]. This gives the identity 0 for a.a. x ∈ Ω T , which means that f = 0 a.e., which closes the proof.
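The role of the Lebesgue differentiation theorem in this argument can be illustrated in one dimension: averages over nested, regularly shrinking cells that contain a point converge to the function value at that point. The sketch below assumes dyadic cells of $(0, 1)$ as a stand-in for an order conserving domain dissection and the hypothetical function $f(x) = x^2$, whose cell integrals are available in closed form.

```python
def cell_average(x0, level):
    """Average of f(x) = x**2 over the dyadic cell of width 2**-level containing x0."""
    h = 2.0 ** -level
    k = int(x0 / h)                 # index of the cell [k*h, (k+1)*h) that contains x0
    a, b = k * h, (k + 1) * h
    return (b ** 3 - a ** 3) / (3.0 * (b - a))


x0 = 0.3
averages = [cell_average(x0, level) for level in range(12)]
for level, avg in enumerate(averages):
    print(f"level = {level:2d}  cell average = {avg:.6f}  f(x0) = {x0 ** 2:.6f}")
```

If all cell integrals of a function vanish on every grid level, the same limit argument forces the function to vanish almost everywhere, which is the Hausdorff property claimed above.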

Control convergence rate in H −1
This subsection shows that order-conserving domain dissections allow us to prove a convergence rate for the state vector approximation y(ω (n) ) → y(α). Assume K R is the solution operator of an elliptic BVP like (4.1). Then the Lax-Milgram lemma and suitable regularity of the right hand side -that is if the f i are Lipschitz continuous, f i ∈ W 1,∞ (Ω T ) -of an elliptic BVP may yield Lipschitz estimates of the form for some L > 0. Thus we aim for an estimate on α − ω H −1 from bounds on d(ω, α) if ω is a binary control. Regarding the rounding grid one additional regularity assumption is necessary to constrain the ratio of the diameters among the grid cells per rounding grid. Before stating the estimate, we define a helper function f for d ∈ N and 0 < ε ≤ 1 as j . Let (β (n) ) n be a sequence of relaxed controls and assume that there existsĈ > 0 such that d (n) (β (n) , α) ≤ĈΔ (n) for all n ∈ N. Then, for all 0 < ε ≤ 1 there exist C(ε) > 0 such that for all grid levels n ∈ N for which there exists a grid level The constant C(ε) only depends on ε if d = 2. We have to estimate α −β (n) denotes the duality pairing of H 1 0 (Ω T ) and H −1 (Ω T ) (we could more generally also consider H 1 (Ω T ) and H 1 (Ω T ) * ). Let (φ δ ) δ ⊂ C ∞ 0 (R d , R) be a family of positive mollifiers, see Definition A.1. Then there exists C 1 > 0 such that for all 1 ≤ p ≤ ∞, we have We restrict to the case w ∈ H 1 0 (Ω T ) and note that for H 1 (Ω T ), one needs to work with an extension instead of the extension by 0. This is possible for the assumed Lipschitz domain by virtue of Stein's extension theorem, see [30, Sect. VI. §3.1 Thm 5]. Let Then, the mollification w δ of w, satisfies the estimate which is proven in Lemma A.2. We combine Young's convolution inequality with the continuous embedding for d = 2 and all q = 1 + ε with 0 < ε ≤ 1, and we obtain and depends on ε if d = 2. Using (4.3), we can abbreviate the estimates on w δ L ∞ (Ω T ) as for a.a. x ∈ Ω T . 
Now, we consider a rounding grid at level n 0 with grid constant Δ (n 0 ) and denote by H = H (n 0 ) the maximum diameter of its elements. Moreover, we choose δ > 0 such that δ s = H for some s > max{1, d/2}, which will be adjusted below. Due to the boundedness of the ratio of diameters, the rounding grid at level n 0 decomposes holds, see Lemma A.3.
We use the abbreviation I (n 0 ) := {1, . . . , N (n 0 ) }. For all n ≥ n 0 , we obtain for all 0 < ε ≤ 1. Here, the first inequality follows from the triangle inequality. The second inequality follows from the fact that w H is piecewise constant per grid cell. Because the cell averages w H do not exceed the extremal values, the estimate Because the grid sequence is an order conserving domain dissection, and in particular Definition 3.2 2 holds, we have the estimate for all ∈ I (n 0 ) and for all rounding grids n ≥ n 0 . Thus, the triangle inequality gives in iteration n for C 4 :=Ĉ from the prerequisites. We set C 5 (ε) := max In iteration n ≥ n 0 , we find r ≥ s such that the maximum grid size (diameter) is given by H n = δ r . By Definition 3.2 3 we obtain with constants C 7 > 0 and C 6 > 0 independent of n.
We combine the estimates above to obtain To obtain the second inequality, we have used (4.6) for the third term. For the first and second term, we have applied Hölder's inequality to estimate α − β (n) L 2 and α − β (n) L 1 using the estimate α − β (n) L ∞ ≤ 1. Here, α − β (n) L ∞ ≤ 1 follows straightforwardly from the fact that α and β (n) are relaxed controls. We may include λ(Ω T ) into the constants because λ(Ω T ) < ∞ follows from the fact that Ω T is a bounded domain and hence a bounded open subset of R d .
For the third inequality, we note that the first two terms in the max-operation in the definition of C(ε) follow from the combination of the estimates (4.4) and (4.5) rd , which follows from (4.7), δ > 0, and the positive exponent s − d/2 of δ. The positivity of s − d/2 follows from the restrictions on the choice of δ and s above. The third term follows from (4.6) combined with the upper estimate in (4.7), which can be applied here because the exponent of δ is negative in (4.6), which follows from d ≥ 1, s ≥ 1 and f (d, ε) ≥ 0.
Balancing the terms requires the identities This gives the estimate ). Note that to derive the estimate, we have made the choice H n = δ r . However, after the balancing identities are solved, r is set to a specific value and s and H n 0 dictate the value of δ. The argument holds true with a change in the constant C(ε) that does not depend on n 0 and n if we have H n = Θ(δ r ) instead of the definite choice H n = δ r . Combining our insights above, we deduce which implies that H n = Θ(δ r ) indeed holds true and concludes the proof.
A few remarks are in order here.

Remark 4.6
The proof presented above balances several approximations based on mollification, piecewise averaging and the bound on the integrality gap induced by rounding algorithms. An improved estimate can be obtained under the additional assumption that the grid cells of a rounding grid are ordered along the coordinate axis (time axis) in the case d = 1. In this case, one can derive an improved estimate following the lines of [12,25] that lead to their state space estimate in C([0, T ]) into which W 1, p ((0, T )) is continuously embedded. This is shown briefly in the next subsection.

Remark 4.7
We have formulated the proof of Theorem 4.5 for relaxed controls β (n) to do justice to the generality of the argument. However, one should keep in mind that all binary controls are of course relaxed controls as well. We note that for binary controls ω (n) produced by the rounding algorithm SUR, we obtain the bound d (n) (ω (n) , α) ≤ C 4 Δ (n) for some fixed C 4 > 0. In the case d = 1, this follows directly from the analysis in [15,26]. For d ≥ 2, this follows with the arguments in Section 2 of [22], in particular Proposition 2.4.

Remark 4.8
For the balancing argument to hold, we make an assumption on the grid levels, namely $\Delta^{(n)} = \Theta\big((\Delta^{(n_0)})^q\big)$ for a specific $q > 1$ depending on $d$. We show in Proposition 4.9 below that this can be satisfied under mild assumptions on the grid refinement, namely that the considered maximum grid cell volumes are monotonically decreasing and that $\Theta(\Delta^{(n)}) = \Theta(\Delta^{(n+1)})$.

Improved bound in the one-dimensional case
We consider the case that a rounding algorithm is executed on the discretization t 0 < . . and the fact that α − ω is a piecewise monotone function because ω is piecewise constant and binary-valued. The order of the grid cells (intervals) S i along the timeaxis matters in this case; see [19,26].  ((0,T ),R M )) * ≤Cd 1D (α, β) for somẽ C > 0.
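Proof For $i \in \{1, \ldots, M\}$, we restrict to the coordinate sequence and drop the subscripts $i$ of $\alpha_i$ and $\beta_i$ below. We abbreviate $W^{1,p} := W^{1,p}((0, T))$. We compute an estimate on
\[ \|\alpha - \beta\|_{(W^{1,p})^*} = \sup_{\|w\|_{W^{1,p}} \le 1} |\langle \alpha - \beta, w \rangle|, \]
where $\langle \cdot, \cdot \rangle$ denotes the duality pairing of $(W^{1,p})^*$ and $W^{1,p}$. Let $w \in W^{1,p}$ with $\|w\|_{W^{1,p}} \le 1$. Since $\alpha - \beta \in L^\infty((0, T))$, we represent the duality pairing with integration. The one-dimensional domain implies that $w$ has a continuous representative and that the continuous embedding $W^{1,p} \hookrightarrow C([0, T])$ holds, see [1, Thm 5.4]. We use integration by parts and the triangle inequality to deduce
\[ |\langle \alpha - \beta, w \rangle| = \Big| \int_0^T (\alpha - \beta)(t)\, w(t)\, \mathrm{d}t \Big| \le \Big| w(T) \int_0^T (\alpha - \beta)(t)\, \mathrm{d}t \Big| + \Big| \int_0^T w'(t) \int_0^t (\alpha - \beta)(s)\, \mathrm{d}s\, \mathrm{d}t \Big|. \]
The first term of the right-hand side is bounded by $C_1 \|w\|_{W^{1,p}}\, d_{1D}(\alpha, \beta)$, where the constant $C_1 > 0$ is due to the continuous embedding $W^{1,p} \hookrightarrow C([0, T])$. The second term is bounded by $C_2 \|w\|_{W^{1,p}}\, d_{1D}(\alpha, \beta)$ for some $C_2 > 0$ by means of Hölder's inequality, where the constant $C_2 > 0$ is due to the continuous embeddings $W^{1,p} \hookrightarrow L^p((0, T)) \hookrightarrow L^1((0, T))$.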
Combining these estimates for the coordinate sequences with the equivalence of the p -norms on R M for p ∈ [1, ∞] yields the claim.

(P")
The problem (P") constitutes a case where the dynamics of the process are not governed by a differential equation. It arises from (P) by defining j : with the choices V := R and K : for a fixed filter kernel function k ∈ L 1 ((t 0 , t f )) and a fixed tracking objective f ∈ L 2 ((t 0 , t f )). This setting is well-defined by Young's convolution inequality.
To relate this problem to the analysis of Sect. 3, we define the operator $K_R$ : Then, we obtain the following proposition, which implies that Assumption 1 holds for the considered problem. $((t_0, t_f))$, $V := \mathbb{R}$. Let $j$, $K$, and $K_R$ be defined as above. Then, Assumption 1 holds. Moreover, the operator $K$ : ) is ultraweak-completely continuous.
Proof All properties except the ultraweak-complete continuity of $K$ and $K_R$ follow immediately from the definition. The operators $K$ and $K_R$ are linear, the space $\mathbb{R}$ is reflexive, and $x_n \stackrel{*}{\rightharpoonup} x$ in $L^\infty((t_0, t_f))$ implies $x_n \rightharpoonup x$ in $L^2((t_0, t_f))$, see also Remark 2.2. Thus, it suffices to know that $K$ and $K_R$ are compact operators, which follows, for example, from [29, Thm 3.1.17].
Following Sect. 2, we obtain the relaxation (Q") with $\xi_L := \min\{\xi_1, \ldots, \xi_M\}$ and $\xi_U := \max\{\xi_1, \ldots, \xi_M\}$. To estimate grid constants for the rounding algorithm a priori, we are interested in estimates on the reconstruction error of the filtered trajectory in $L^2$, that is on $\|k * x - k * x^\Delta\|_{L^2((t_0, t_f))}$. Here, $x = x(\alpha)$ denotes a feasible point of (Q") and $x^\Delta = x(\omega) = \sum_{i=1}^M \omega_i \xi_i$ the discrete-valued input variable arising from an approximation of $x$ on a rounding grid with grid constant $\Delta$.
We follow the considerations in Sect. 4.3 and use the function $d_{1D}$ to derive an additional a priori estimate. A preliminary version of the result has been obtained as Theorem 9.12 in the PhD thesis [19].

Theorem 5.2
Let $\Omega_T = (t_0, t_f)$. Let $\alpha$ be a relaxed control and $\omega$ be a binary control.
for some $C > 0$ depending on $p$, $\|\xi\|_{\mathbb{R}^M}$, $t_0$, $t_f$, and $\|k\|_{W^{1,1}}$. For $p = \infty$, the estimate also holds in $C([t_0, t_f])$.
Proof Let $Y := W^{1,p}((t_0, t_f))$ and thus $Y^* = (W^{1,p}((t_0, t_f)))^*$. Then, where the second identity follows from the Riesz-Fréchet representation theorem and the continuous embedding $Y \hookrightarrow L^2((t_0, t_f))$. The inequality follows from the definition of the dual norm. Clearly, $\|k(t - \cdot)\|_Y \le \|k(t - \cdot)\|_{W^{1,1}(\mathbb{R})}$ holds for all $t \in (t_0, t_f)$. Moreover, and the claim follows by virtue of Theorem 4.10 and Hölder's inequality.

Computational experiments
A discretization transforms (Q) into a finite-dimensional linear least-squares problem, which can be solved with standard algorithms for convex optimization problems with box constraints. Examples are the algorithms from [3,34], which are implemented in the open-source library SciPy, see [33], which we use for the computational results in this section.
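The following is a minimal sketch of such a discretization, not the code used for the article's experiments. It assumes a midpoint-rule quadrature of the convolution, a piecewise constant control on an equidistant grid, and a hypothetical exponential kernel (`kernel`, `tau` and `build_convolution_matrix` are illustrative names and choices). It uses `scipy.optimize.lsq_linear`, SciPy's solver for linear least-squares problems with bounds, with the Trust Region Reflective method.

```python
import numpy as np
from scipy.optimize import lsq_linear


def build_convolution_matrix(kernel, t0, tf, n):
    """Return (A, mid) with (A @ x)[m] ~ (k * x)(mid[m]) for piecewise constant x."""
    delta = (tf - t0) / n
    mid = t0 + delta * (np.arange(n) + 0.5)  # midpoints of the grid cells
    # Midpoint rule: integral of k(t_m - s) x(s) ds ~ delta * sum_n k(t_m - s_n) x_n
    return delta * kernel(mid[:, None] - mid[None, :]), mid


def kernel(s, tau=0.1):
    """Hypothetical causal kernel (assumption), NOT the kernel of the article."""
    return np.where(s >= 0.0, np.exp(-np.maximum(s, 0.0) / tau) / tau, 0.0)


# Domain (t0, tf) = (-1, 1) and target f(t) = 0.2 cos(2 pi t) as in the example below.
A, mid = build_convolution_matrix(kernel, -1.0, 1.0, 64)
b = 0.2 * np.cos(2.0 * np.pi * mid)

# Box constraints xi_L <= x <= xi_U with xi_L = -1 and xi_U = 1.
res = lsq_linear(A, b, bounds=(-1.0, 1.0), method="trf")
x_relaxed = res.x  # cell values of the relaxed control
```

The returned object also exposes `res.cost`, which equals half the squared residual norm, so it can be compared across grids.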

The Sum-Up Rounding Algorithm for Control Approximation
We briefly recap the sum-up rounding (SUR) algorithm for one-dimensional problems, which is one possible approximation algorithm in the second step of the CIA decomposition. It is stated in Definition 6.1 below.
Definition 6.1 (Sum-Up-Rounding Algorithm, [24,26,28]) If the maximizing index $j$ is ambiguous, the smallest of the maximizing indices is chosen.
SUR proceeds through the time intervals indexed by $i = 0, \ldots, N-1$ and computes the approximation for the current interval. The index $j \in \{1, \ldots, M\}$ identifies a coordinate of the function $\omega$. In the first iteration, the coordinate in which $\int_{t_0}^{t_1} \alpha$ exhibits the highest value is set to one in $\omega$ on the interval $[t_0, t_1]$. All other entries of $\omega$ are set to zero in the first interval. This procedure is iterated. For the $i$-th time interval index, SUR sets $\omega$ to one in the coordinate that exhibits a maximum value of $\int_{t_0}^{t_{i+1}} \alpha - \int_{t_0}^{t_i} \omega$, and to zero in all other coordinates. As mentioned above, Proposition 3.5, Lemmas 3.7 and 3.8 and Theorem 3.9 hold for approximations constructed by means of SUR. We briefly recap that Proposition 3.5 holds, which yields the other statements.
We illustrate the behavior of SUR in Fig. 3. We have executed SUR to compute binary controls for two predetermined relaxed controls. The sigmoid function in the left column is approximated very closely by SUR because most of its values are almost binary-valued already. In the bottom image one observes that a finer discretization implies that a difference in norm has to persist when the ascent of the function is approximated more closely. In contrast to the case of the sigmoid function, the difference in norm between the constant function and its SUR approximants in the right column is constant regardless of the chosen discretization. The right column depicts the same effect as [21, Fig. 1].
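The iteration described above can be sketched as follows. This is an illustrative implementation, assuming that the relaxed control is given by its cell averages on the rounding grid; the function name `sum_up_rounding` and the array layout are assumptions, not taken from the article.

```python
import numpy as np


def sum_up_rounding(alpha, widths):
    """Sum-up rounding on a one-dimensional grid.

    alpha  : (N, M) array, cell averages of the relaxed control (rows sum to 1)
    widths : (N,) array, lengths of the N rounding intervals
    returns: (N, M) binary array omega with exactly one entry equal to 1 per row
    """
    alpha = np.asarray(alpha, dtype=float)
    widths = np.asarray(widths, dtype=float)
    n, m = alpha.shape
    omega = np.zeros((n, m))
    int_alpha = np.zeros(m)  # integral of alpha over [t_0, t_{i+1}]
    int_omega = np.zeros(m)  # integral of omega over [t_0, t_i]
    for i in range(n):
        int_alpha += widths[i] * alpha[i]
        # np.argmax returns the first maximizer, i.e. the smallest index on ties
        j = int(np.argmax(int_alpha - int_omega))
        omega[i, j] = 1.0
        int_omega[j] += widths[i]
    return omega


# Constant relaxed control (0.5, 0.5) on four intervals of length 0.25.
alpha = np.full((4, 2), 0.5)
widths = np.full(4, 0.25)
omega = sum_up_rounding(alpha, widths)
```

For the constant relaxed control in this usage example, the two coordinates are activated alternately, which mirrors the behavior of the constant function discussed for Fig. 3.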

Signal processing example
We consider an example similar to the one from [4], which stems from filtered approximation in electronics. We introduce a function We define the convolution kernel for (P) and the reformulations and relaxations thereof as follows which yields We use $t_0 = -1$ and $t_f = 1$ as domain bounds. Regarding the target function $f$, we set $f(t) := 0.2 \cos(2\pi t)$. Assuming an equidistant discretization of $(t_0, t_f)$ into $N$ intervals, i.e. $t_f - t_0 = N\Delta$, we obtain a piecewise constant function Setting $g(s) := \kappa(s)\mathbb{1}_{s \ge 0} - \kappa(s - \Delta)\mathbb{1}_{s \ge \Delta}$, we obtain an IQP similar to the one studied in [4]. We have chosen the parameter values $\omega_0 = \pi$ and $A = 0.1$, see also [4, Fig. 1].
The feasible realizations for the images of $x$ are $\xi_L = \xi_1 = -1$, $\xi_2 = 0$ and $\xi_U = \xi_3 = 1$. We discretize the resulting relaxation (Q) as described above. Then, we solve (Q) using scipy.optimize.least_squares, that is with SciPy's Trust Region implementation (with parameter method='trf', the Trust Region Reflective algorithm), see [33]. To apply SUR, we need to compute the convex coefficient functions $\alpha$ from $x$ such that $\sum_{i=1}^M \alpha_i \xi_i = x$. Because this computation is not unique, we have chosen the most intuitive one from our point of view, see also [22] for an elliptic control problem. Specifically, we compute $\alpha(t)$ such that for $t \in [t_0, t_f]$, $x(t)$ is the interpolant between its two neighboring points in $\{\xi_1, \ldots, \xi_M\}$, that is we select $i$ such that $\xi_i \le x(t) \le \xi_{i+1}$ and set
$$\alpha_i(t) := \frac{\xi_{i+1} - x(t)}{\xi_{i+1} - \xi_i}, \qquad \alpha_{i+1}(t) := \frac{x(t) - \xi_i}{\xi_{i+1} - \xi_i},$$
and $\alpha_j(t) := 0$ for $j \notin \{i, i+1\}$. Then, we apply SUR on a sequence of successively refined grids until the rounding grid coincides with the discretization grid for the solution of (Q). We note that the convergence holds if the approximation of the continuous relaxation is not fixed but refined in every iteration as well and the minimizers of the approximations of the continuous relaxation converge to a minimizer of (P); see [22].
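The choice of convex coefficients described above can be sketched as follows. This is a minimal illustration, assuming that the values $\xi_1 < \ldots < \xi_M$ are strictly increasing and that $x$ is given by its values on the grid cells; the function name `convex_coefficients` is hypothetical.

```python
import numpy as np


def convex_coefficients(x, xi):
    """Interpolate each value of x between its two neighbors in xi.

    x  : (N,) array with values in [xi[0], xi[-1]]
    xi : (M,) strictly increasing array of admissible realizations (assumption)
    returns alpha : (N, M) array with alpha >= 0, rows summing to 1, alpha @ xi == x
    """
    x = np.asarray(x, dtype=float)
    xi = np.asarray(xi, dtype=float)
    m = xi.size
    # Index i with xi[i] <= x <= xi[i + 1]; clip so that x == xi[-1] uses the last pair.
    i = np.clip(np.searchsorted(xi, x, side="right") - 1, 0, m - 2)
    lam = (x - xi[i]) / (xi[i + 1] - xi[i])  # weight of the upper neighbor
    alpha = np.zeros((x.size, m))
    rows = np.arange(x.size)
    alpha[rows, i] = 1.0 - lam
    alpha[rows, i + 1] = lam
    return alpha


xi = np.array([-1.0, 0.0, 1.0])
x = np.array([-0.5, 0.25, 1.0])
alpha = convex_coefficients(x, xi)
```

At most two entries per row are nonzero, so the resulting relaxed control is already binary wherever $x$ coincides with one of the realizations.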

Results
For 256 intervals discretizing $(t_0, t_f)$, we have visualized the results in Fig. 4. The images in the left column show $k * x(\alpha)$ in the first row and, in the remaining rows, $k * x(\omega)$ for $\omega$ computed for $N = 4$, $N = 32$ and $N = 256$ rounding intervals. The convergence of $k * x(\omega)$ to $k * x(\alpha)$ is clearly visible. The right column shows the solution of (Q"), $x(\alpha)$, in its first row and the SUR approximants $x(\omega)$ for the rounding grids consisting of $N = 4$, $N = 32$ and $N = 256$ intervals.
For 4096 intervals discretizing $(t_0, t_f)$, we have tabulated the approximation and the relative error in the objective in Table 1. The relative error, which is the relative error of a squared $L^2$-difference, approximately follows a trend proportional to $\Delta^2$. The convergence to zero can be observed clearly. We consider the execution times of the code on a laptop computer equipped with an Intel(R) Core(TM) i7-6820 CPU clocked at 2.70 GHz. The main part of the computational costs is caused by the solution of (Q"). The costs for the execution of SUR are negligible. The execution time for 4096 intervals is 9222 s. Execution times for $2^i$ intervals, $i \in \{7, \ldots, 12\}$, are tabulated in Table 2.
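A trend of the form error $\approx c\,\Delta^q$ can be checked by a least-squares fit in log-log scale. The sketch below illustrates this; the data are synthetic placeholders generated with $q = 2$, not the values of Table 1, and the function name `empirical_order` is hypothetical.

```python
import numpy as np


def empirical_order(deltas, errors):
    """Slope q of the fit log(error) ~ q * log(delta) + log(c)."""
    slope, _ = np.polyfit(np.log(deltas), np.log(errors), 1)
    return slope


# Grid constants Delta = (t_f - t_0) / N for N = 4, 8, ..., 256 on (-1, 1).
deltas = 2.0 / 2.0 ** np.arange(2, 9)
# Synthetic placeholder errors (assumption), exactly proportional to Delta^2.
errors = 0.3 * deltas ** 2
order = empirical_order(deltas, errors)
```

With measured errors in place of the placeholder data, a fitted slope close to 2 would be consistent with the trend stated above.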
Parts of the numerical results, in particular preliminary versions of Table 2 and Fig. 4, have been published in the PhD thesis [19].

Conclusion
The computational results in Sect. 6 strengthen our claim that the proposed methodology provides a computationally efficient way to compute discrete-valued distributed and a constructive way to compute a minimizing sequence to the optimum. Finally, we note a shortcoming in the presented theory. To compute solutions of the relaxed problems (Q) or (R) numerically efficiently, it is often necessary to introduce regularization since the problems are usually not strictly convex. where $r : L^p \to \mathbb{R}$ denotes the regularizer and $R := \sup_{x \in [\xi_L, \xi_M]} r(x)$. Thus, the suboptimality is controlled by the value of the coefficient $\lambda$.
Table 1 Convergence $j(K(x(\omega^\Delta))) \to j(K(x(\alpha)))$ with convolution and relaxed solution computed on finest grid
Table 2 Execution times of the solution of (Q") for $N$ intervals discretizing $(t_0, t_f)$. This is
Funding Open Access funding enabled and organized by Projekt DEAL.
A Auxiliary statements for Theorem 4.5
Proof Let $S$ be a grid cell with diameter less than or equal to $H$. Then, we have for $x \in S$ that Dividing the volume of $\Omega_T$ by this lower estimate on the volume of a single grid cell gives