On the rate of convergence of alternating minimization for non-smooth non-strongly convex optimization in Banach spaces

In this paper, the convergence of the fundamental alternating minimization is established for non-smooth non-strongly convex optimization problems in Banach spaces, and novel rates of convergence are provided. As objective function, the sum of a smooth and a block-separable, non-smooth part is considered, covering a large range of applications. For the former, three different relaxations of strong convexity are considered: (i) quasi-strong convexity; (ii) quadratic functional growth; and (iii) plain convexity. With new and improved rates benefiting from both separate steps of the scheme, linear convergence is proved for (i) and (ii), whereas sublinear convergence is shown for (iii).


Introduction
The (cyclic) block coordinate descent (BCD), in the literature also referred to as the nonlinear block Gauss-Seidel or successive subspace correction method, is a fundamental optimization algorithm [4,12]. Given a block-structured minimization problem, it consists of successive minimization with respect to the single blocks. Since numerous applications naturally inherit a block structure, the BCD and its variations have been of great interest for decades, especially whenever it is more convenient or feasible to solve the corresponding subproblems instead of the globally coupled problem. For an overview, we refer to the review paper [15].
The convergence of the BCD has been extensively studied in the literature, typically in Euclidean spaces. For instance, if already partial minimization is

Alternating minimization for two-block structured model problem
We consider the two-block structured model problem
\[ \min_{(x_1, x_2) \in B_1 \times B_2} H(x_1, x_2) := f(x_1, x_2) + g_1(x_1) + g_2(x_2), \tag{1} \]
where $B_1$, $B_2$, $f$, $g_1$, $g_2$ satisfy the following properties:
(P1) $B_i$ is a Banach space with its dual $B_i^*$, norm $\|\cdot\|_i$, and the duality pairing $\langle \cdot, \cdot \rangle_i$, $i = 1, 2$. The index will be omitted for duality pairings whenever it is clear from the context.
(P2) The function $g_i : B_i \to \mathbb{R} \cup \{\infty\}$ is proper, convex, and (Fréchet) subdifferentiable with subdifferential $\partial g_i$ on $\mathrm{dom}\, g_i$, $i = 1, 2$. Let $D := \mathrm{dom}\, g_1 \times \mathrm{dom}\, g_2$.
(P3) The function $f : B_1 \times B_2 \to \mathbb{R}$ is convex and (Fréchet) differentiable over $D$. Let $\nabla f$ denote the (Fréchet) derivative of $f$.
(P4) The optimal set of problem (1), denoted by $X \subset B_1 \times B_2$, is non-empty, and the corresponding optimal value is denoted by $H^*$.
(P5) For any $(\bar{x}_1, \bar{x}_2) \in D$, the following problems have minimizers:
\[ \min_{x_1 \in B_1} f(x_1, \bar{x}_2) + g_1(x_1) \quad \text{and} \quad \min_{x_2 \in B_2} f(\bar{x}_1, x_2) + g_2(x_2). \]
Exploiting the particular two-block structure, we consider the iterative solution of (1) via the classical alternating minimization, cf. Algorithm 1.
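We remark that the non-smooth terms in (1) in particular cover constrained problems, a standard special case: choosing $g_i = \chi_{C_i}$, the indicator function of a non-empty, closed, convex set $C_i \subset B_i$ (i.e., $\chi_{C_i}(x_i) = 0$ for $x_i \in C_i$ and $\chi_{C_i}(x_i) = \infty$ otherwise), problem (1) reduces to the block-constrained smooth problem
\[ \min_{x_1 \in C_1,\, x_2 \in C_2} f(x_1, x_2). \]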

Algorithm 1 Alternating minimization for model problem (1)
Initialization: Choose $x^0 = (x_1^0, x_2^0) \in D$ satisfying the partial optimality condition
\[ x_2^0 \in \arg\min_{x_2 \in B_2}\, f(x_1^0, x_2) + g_2(x_2). \tag{2} \]
General step: For $k = 0, 1, \ldots$, given $x^k \in D$, find $x^{k+1} = (x_1^{k+1}, x_2^{k+1}) \in B_1 \times B_2$ such that
\[ x_1^{k+1} \in \arg\min_{x_1 \in B_1}\, f(x_1, x_2^k) + g_1(x_1), \tag{3} \]
\[ x_2^{k+1} \in \arg\min_{x_2 \in B_2}\, f(x_1^{k+1}, x_2) + g_2(x_2). \tag{4} \]

As in [2], the partial optimality condition (2) on the initial guess has been chosen for the sake of simpler notation in the subsequent analysis. We will analyze the convergence behavior of Algorithm 1 under the following additional assumptions on the product structure and smoothness:
(A1) $B_1 \times B_2$ is equipped with a separate norm $\|\cdot\|$ and $\beta_1, \beta_2 \ge 0$, satisfying
\[ \beta_1 \|x_1\|_1^2 \le \|(x_1, x_2)\|^2, \tag{5} \]
\[ \beta_2 \|x_2\|_2^2 \le \|(x_1, x_2)\|^2 \tag{6} \]
for all $(x_1, x_2) \in B_1 \times B_2$. Furthermore, $B_1 \times B_2$ is equipped with a canonical duality pairing $\langle \cdot, \cdot \rangle$.
(A2) The partial (Fréchet) derivative of $f$ with respect to the $i$-th component, denoted by $\nabla_i f$, is Lipschitz continuous with constant $L_i$, $i = 1, 2$; for $i = 1$, with $\|\cdot\|_{1,*}$ denoting the dual norm,
\[ \| \nabla_1 f(x_1 + h_1, x_2) - \nabla_1 f(x_1, x_2) \|_{1,*} \le L_1 \|h_1\|_1 \]
for all $(x_1, x_2) \in D$ and $h_1 \in B_1$ such that $x_1 + h_1 \in \mathrm{dom}\, g_1$; equivalently, by a block version of the so-called descent lemma [2,4],
\[ f(x_1 + h_1, x_2) \le f(x_1, x_2) + \langle \nabla_1 f(x_1, x_2), h_1 \rangle + \frac{L_1}{2} \|h_1\|_1^2. \]
Remark 1 (Semi-normed spaces) The following analysis does not, in fact, require $\|\cdot\|$ or $\|\cdot\|_i$, $i = 1, 2$, to be positive definite. Consequently, it is sufficient to formulate (5) and (6), as well as the convexity properties (specified in each section), with respect to semi-norms. Without introducing additional notation, we subsequently also allow $\|\cdot\|$ and $\|\cdot\|_i$, $i = 1, 2$, to be merely semi-norms.
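To fix ideas, Algorithm 1 can be sketched in a few lines of Python; the oracle-style interface below is purely illustrative and not part of the scheme's specification, assuming black-box solvers for the partial minimization problems in (P5).

```python
# A minimal sketch of Algorithm 1; argmin1 and argmin2 are assumed black-box
# solvers of the partial minimization problems in (P5), returning minimizers
# of x1 -> f(x1, x2) + g1(x1) and x2 -> f(x1, x2) + g2(x2), respectively.
def alternating_minimization(argmin1, argmin2, x1, n_iter=100):
    # Complete the initial guess via the partial optimality condition (2).
    x2 = argmin2(x1)
    for _ in range(n_iter):
        x1 = argmin1(x2)  # first half-step: minimize w.r.t. the first block
        x2 = argmin2(x1)  # second half-step: minimize w.r.t. the second block
    return x1, x2
```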

Linear convergence in the quasi-strongly convex case
In this section, linear convergence is established for the alternating minimization applied to model problem (1) under additional quasi-strong convexity of $f$:
(A3a) The function $f$ is quasi-strongly convex w.r.t. $\|\cdot\|$ with constant $\sigma > 0$, i.e., for all $x \in D$ and $\bar{x} := \arg\min \{ \|x - y\| : y \in X \}$, the projection of $x$ onto $X$, it holds
\[ f(\bar{x}) \ge f(x) + \langle \nabla f(x), \bar{x} - x \rangle + \frac{\sigma}{2} \|\bar{x} - x\|^2. \]
Any strongly convex function is quasi-strongly convex. Moreover, by convexity of $g_1$ and $g_2$, $H$ inherits quasi-strong convexity from $f$ [with (A3a) stated for subdifferentiable functions].
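As a simple illustration of (A3a), anticipating the least-squares setting of Sect. 3.1: for the convex quadratic $f(x) = \frac{1}{2} \|Ax - b\|_2^2$ with a rank-deficient matrix $A$, strong convexity fails, since $f$ is constant along the kernel of $A$; yet, with $X$ the affine set of least-squares solutions and $\bar{x}$ the Euclidean projection of $x$ onto $X$, one has $x - \bar{x} \perp \ker A$ and therefore
\[ f(\bar{x}) - f(x) - \langle \nabla f(x), \bar{x} - x \rangle = \frac{1}{2} \|A(\bar{x} - x)\|_2^2 \ge \frac{\sigma}{2} \|\bar{x} - x\|_2^2, \]
with $\sigma$ the square of the smallest non-zero singular value of $A$, cf. Theorem 8 in [11].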

Numerical test for quasi-strongly convex minimization in a Euclidean space
To assess the sharpness of Theorem 1 under the use of suitable problem-dependent norms, we consider a two-block structured, unconstrained, quadratic, convex optimization problem in a Euclidean space (here $\mathbb{R}^{n+m}$, $n, m \in \mathbb{N}$):
\[ \min_{(x_1, x_2) \in \mathbb{R}^n \times \mathbb{R}^m} H(x_1, x_2) := \frac{1}{2} \| A_1 x_1 + A_2 x_2 - b \|_2^2, \tag{12} \]
with $A_1$, $A_2$, $A := [A_1\ A_2]$, and $b$ properly dimensioned. We assume that $A$ is non-zero. Then, by Theorem 8 in [11], problem (12) is quasi-strongly convex w.r.t. the Euclidean $\ell_2$ norm, with $\sigma = \sigma_{\min}(A)^2$, where $\sigma_{\min}(\cdot)$ denotes the minimal singular value. Furthermore, it satisfies the smoothness and convexity assumptions of Theorem 1 w.r.t. Euclidean norms, resulting in a first convergence rate bound (13). However, the generality of Theorem 1 also allows for utilizing problem-dependent norms, improving on the straightforward result (13). Having Remark 1 in mind, set $\|\cdot\|_i := \|\cdot\|_{A_i^\top A_i}$, $i = 1, 2$, where $\|x\|_S^2 := x^\top S x$ for any symmetric, suitably dimensioned matrix $S$. Consequently, $L_1 = L_2 = 1$. In addition, let $\eta > 0$, let $I$ be the identity matrix (in any dimension), and define a corresponding $\eta$-dependent norm on the product space. In order to determine $\sigma$ and $\beta_i$, standard linear algebra can be employed; finally, $\sigma$ and $\beta_i$ are obtained by maximizing the corresponding singular values w.r.t. $\eta$, equivalent with the limit $\eta \to 0$. Thus, Theorem 1 predicts a contraction of the error with an improved rate $\lambda_{\mathrm{opt}}$, cf. (14), for all $k \ge 0$. Using a small example, we demonstrate the sharpness of (14) as opposed to (13). For the chosen data, the two bounds in (13) and (14) are given by $\lambda \approx 0.717$ and $\lambda_{\mathrm{opt}} \approx 0.245$, respectively. In Fig. 1, the theoretical and actual performances of the alternating minimization applied to (12) are visualized for the initial guess $x_1^0 := 0$. We observe a good agreement between the practical convergence rate and the theoretical bound $\lambda_{\mathrm{opt}}$, stemming from the analysis using problem-dependent norms.
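For illustration, the experiment can be replicated schematically in a few lines of Python; the data below (random $A_1$, $A_2$, $b$) is hypothetical, since the specific matrices of the above example are not reproduced here, and the script merely reports empirical contraction factors of the energy error.

```python
# Schematic replication of the quadratic two-block experiment (12) with
# hypothetical random data; each block update is a least-squares solve.
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2
A1, A2 = rng.standard_normal((5, n)), rng.standard_normal((5, m))
b = rng.standard_normal(5)
A = np.hstack([A1, A2])

def H(x1, x2):
    return 0.5 * np.linalg.norm(A1 @ x1 + A2 @ x2 - b) ** 2

# Optimal value via a global least-squares solve.
x_opt = np.linalg.lstsq(A, b, rcond=None)[0]
H_opt = H(x_opt[:n], x_opt[n:])

# Initial guess x1 = 0, completed via the partial optimality condition (2).
x1 = np.zeros(n)
x2 = np.linalg.lstsq(A2, b - A1 @ x1, rcond=None)[0]

errors = []
for _ in range(25):
    x1 = np.linalg.lstsq(A1, b - A2 @ x2, rcond=None)[0]  # first half-step
    x2 = np.linalg.lstsq(A2, b - A1 @ x1, rcond=None)[0]  # second half-step
    errors.append(H(x1, x2) - H_opt)

# Empirical contraction factors of the energy error H^k - H*.
rates = [e2 / e1 for e1, e2 in zip(errors, errors[1:]) if e1 > 1e-14]
print("observed contraction factors:", rates[:5])
```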

Linear convergence in the quadratic functional growth case
In this section, linear convergence is established for the alternating minimization applied to model problem (1) under additional quadratic functional growth for $H$:
(A3b) The objective function $H : B_1 \times B_2 \to \mathbb{R}$ has quadratic functional growth w.r.t. $\|\cdot\|$ with constant $\sigma > 0$, i.e., for all $x \in D$ and $\bar{x}$ the projection of $x$ onto $X$, it holds
\[ H(x) - H^* \ge \frac{\sigma}{2} \|x - \bar{x}\|^2. \]
Following a similar strategy as in the proof of Theorem 1, we show q-linear convergence. We stress that, in contrast to the analysis of general feasible descent methods for problems with quadratic functional growth, cf., e.g., [11], a feasible descent property (ensured, e.g., for block coordinatewise strongly convex functions) is not explicitly required for a mere two-block structure.
Proof We consider the first half-step of the alternating minimization and show a corresponding contraction of the error. W.l.o.g. we assume that $\beta_1 > 0$. Utilizing the convexity and smoothness of $f$, we then obtain an estimate of the optimality gap in terms of $\nabla_1 f$. By (i) introducing $\gamma \in (0, 1]$, to be specified later, (ii) using the Lipschitz continuity of $\nabla_1 f$, cf. (A2), and the convexity of $f$, and (iii) the definition of $\beta_1$, cf. Eq. (5), we moreover obtain a corresponding descent estimate. On the same grounds as utilized for deriving (9) and (10), analogous estimates hold for the second half-step.

Sublinear convergence in the plain convex case
In this section, sublinear convergence is established for the alternating minimization applied to model problem (1) under plain convexity, together with a compactness assumption on the initial level set:
(A3c) The initial level set is bounded w.r.t. $\|\cdot\|$ in the sense that
\[ R := \sup \{ \|x - \bar{x}\| : x \in D,\ H(x) \le H(x^0) \} < \infty. \]
The following result predicts a two-stage behavior: first, the error decreases q-linearly until sufficiently small; after that, sublinear convergence is initiated. The shift depends on the smoothness properties of the problem.
Define
\[ \tilde{m} := \left\lceil \log_2\!\left( \frac{H^0 - H^*}{2 \min\{L_1/\beta_1,\, L_2/\beta_2\}\, R^2} \right) \right\rceil_+, \]
where $\lceil \cdot \rceil$ and $[\cdot]_+$ respectively denote the ceiling function and the restriction to the positive part. It holds for all $k \ge 0$ that the error $H^k - H^*$ decreases q-linearly with factor $\frac{1}{2}$ for at most the first $\tilde{m}$ iterations, and sublinearly, of order $O(1/(k - \tilde{m} + 1))$, afterwards. In particular, for $k \ge \tilde{m}$ at the earliest, sublinear convergence kicks in.
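For concreteness, the switching index $\tilde{m}$ can be evaluated as follows; the small Python helper below assumes $\beta_1, \beta_2 > 0$ and uses illustrative placeholder values.

```python
# Helper evaluating the switching index of Theorem 3, i.e.,
# ceil(log2((H^0 - H*) / (2 min{L1/beta1, L2/beta2} R^2))) restricted to
# its positive part; requires beta1, beta2 > 0. All inputs are placeholders.
import math

def switching_index(H0_gap, L1, beta1, L2, beta2, R):
    threshold = 2.0 * min(L1 / beta1, L2 / beta2) * R**2
    return max(0, math.ceil(math.log2(H0_gap / threshold)))

# Example: a larger initial gap H^0 - H* prolongs the q-linear phase.
print(switching_index(H0_gap=100.0, L1=1.0, beta1=0.5, L2=1.0, beta2=0.25, R=1.0))
```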
The proof utilizes two auxiliary results: general descent properties for each subiteration of the alternating minimization, and a criterion for concluding sublinear convergence. Those are summarized in the following two lemmas.

Lemma 1 Under the assumptions of Theorem 3, the descent estimates (22) and (23) hold for all $k \ge 0$, quantifying the error decrease achieved by the first and second half-step of Algorithm 1, respectively.
By definition of $R$, cf. (A3c), and the monotonicity of $\{H^k\}_{k=0,1,\ldots}$, all iterates remain within the initial level set. We distinguish two cases: If $H^k - H^* > 2 L_1 R^2 / \beta_1$, we choose $\gamma = 1$; otherwise, we choose $\gamma = \frac{\beta_1}{2 L_1 R^2} (H^k - H^*)$. This finally proves the first part of the assertion, (22). The second part, (23), follows analogously by symmetry.
Finally, we are able to prove Theorem 3.

Proof of Theorem 3
As long as $H^k - H^* > 2 \min\{L_1/\beta_1,\, L_2/\beta_2\}\, R^2$ for some $k \in \mathbb{N}_0$, by Lemma 1 and the monotonicity of $\{H^k\}_{k=0,1,\ldots}$, it holds that
\[ H^{k+1} - H^* \le \frac{1}{2} (H^k - H^*). \tag{25} \]
Thereby, there exists a minimal $m \ge 0$ such that $H^k - H^* \le 2 \min\{L_1/\beta_1,\, L_2/\beta_2\}\, R^2$ for all $k \ge m$. Assuming $m \ge 1$, Eq. (25) holds for all $k \le m - 1$, and it holds
\[ 2 \min\{L_1/\beta_1,\, L_2/\beta_2\}\, R^2 < H^{m-1} - H^* \le \left(\frac{1}{2}\right)^{m-1} (H^0 - H^*). \]
Thus, it holds that $m - 1 < \log_2\!\left( \frac{H^0 - H^*}{2 \min\{L_1/\beta_1,\, L_2/\beta_2\}\, R^2} \right)$, and consequently (including the case $m = 0$), $m \le \tilde{m}$, with $\tilde{m}$ as defined above.

Remark 2 (Exponential decay during the first iterations)
In case $\max\{L_1/\beta_1,\, L_2/\beta_2\} < \infty$ and the initial error satisfies $H^0 - H^* > 2 \max\{L_1/\beta_1,\, L_2/\beta_2\}\, R^2$, the result of Theorem 3 can in fact be improved. By an analogous line of argumentation as in the above proof, one can conclude that $H^k - H^*$ first contracts with a rate of $\frac{1}{4}$ for the first $k_1$ iterations, until $H^{k_1} - H^* \le 2 \max\{L_1/\beta_1,\, L_2/\beta_2\}\, R^2$ for some $k_1 \in \mathbb{N}_0$. Afterwards, the convergence behavior can be qualitatively predicted as in Theorem 3, with a correspondingly reduced switching index $\tilde{m}$.

Numerical example inspired by multiphysics
Sequential solution strategies are widely used in the context of multiphysics applications. Provided a multiphysics problem enjoys a minimization structure, a sequential solution is closely related (or even equivalent) to applying alternating minimization to the underlying minimization problem.
In the following, we numerically demonstrate the efficacy of alternating minimization applied to a problem inspired by poroelasticity applications, i.e., flow in deformable porous media. The model problem corresponds to an elasticity-like vectorial p-Laplace equation coupled with a Darcy-type equation for non-Newtonian fluids, with a Biot-Darcy-type coupling, see [5,10] for more details. Specifically, we consider a representative coupled minimization problem (28) posed over the domain $\Omega = (0,1) \times (0,1) \subset \mathbb{R}^2$, where $\alpha, \beta \in \mathbb{R}$, $\mu, \kappa \in \mathbb{R}_{>0}$, $f \in \mathbb{R}^2$ are model parameters, $p, q \in (1, \infty)$, and the solution spaces $U$ and $Q$ incorporate homogeneous boundary conditions for the displacement and the normal flux $q \cdot n_{\partial\Omega}$, respectively; here $L^p$ (resp. $L^q$) denotes the standard Lebesgue space and $n_{\partial\Omega}$ is the outer normal vector on the boundary $\partial\Omega$ of $\Omega$. We note that the solution spaces $U$ and $Q$ are closely related to the standard Sobolev spaces $W^{1,p}_0(\Omega)$ and $H_0(\mathrm{div}; \Omega)$, respectively. We fix $\alpha = 1$, $\beta = 10$, $\mu = 1$, $\kappa = 0.1$, $f = (1,1)$, $p = q = 1.5$. The corresponding solution is displayed in Fig. 2a.
For the numerical solution, the problem (28) is discretized using the Galerkin method and linear finite elements for $u$ and $q$ on a Cartesian grid with uniform mesh size, for three levels of refinement corresponding to $N = 4, 5, 6$, cf. below. The corresponding discrete minimization problem is then solved using Alg. 1 with initial guess $(u^0, q^0) = (0, 0)$. For the implementation, the DUNE project [13] and in particular the dune-functions module [6] have been utilized.
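Schematically, this sequential solution strategy can be mimicked in Python as follows; the sketch below is a simplified stand-in for the DUNE-based implementation, assuming a generic callable discrete energy E(u, q) (the interface is hypothetical).

```python
# Hypothetical sketch: Algorithm 1 for a discretized two-field energy E(u, q),
# with each block subproblem solved by a generic smooth optimizer; a simplified
# stand-in for the DUNE-based finite element implementation.
import numpy as np
from scipy.optimize import minimize

def alternating_energy_minimization(E, u0, q0, n_iter=50, tol=1e-10):
    u, q = np.asarray(u0, float), np.asarray(q0, float)
    energies = [E(u, q)]
    for _ in range(n_iter):
        u = minimize(lambda v: E(v, q), u, method="L-BFGS-B").x  # u-block
        q = minimize(lambda w: E(u, w), q, method="L-BFGS-B").x  # q-block
        energies.append(E(u, q))
        if energies[-2] - energies[-1] < tol:  # monotone energy decay
            break
    return u, q, energies
```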
Let $H^*$ denote the energy corresponding to the (converged discrete) solution of (28), and $H^k$ the energy of the approximation $(u^k, q^k)$ at the $k$-th step of Algorithm 1. The decay of $H^k - H^*$ is displayed in Fig. 2b for the three mesh sizes. We observe linear, essentially mesh-independent convergence. In addition, we note a decreasing trend of the energy values $H^*$ under consecutive grid refinement, as expected due to the increasingly accurate discretization. In particular, $H^* \approx -7.077 \cdot 10^{-3}$ for $N = 4$, $H^* \approx -7.137 \cdot 10^{-3}$ for $N = 5$, and $H^* \approx -7.153 \cdot 10^{-3}$ for $N = 6$.
We note that the choices for $p$ and $q$ lead to a non-quadratic problem, whose coupling, however, is governed by a quadratic, merely semi-definite contribution. Hence, the considered problem is closely related to the small algebraic problem in Sect. 3.1, and after all leads to consistent observations. The essentially mesh-independent convergence demonstrates that the convergence is most adequately described in problem-dependent norms, i.e., not standard Euclidean norms, which would in contrast suggest mesh-dependent convergence.

Discussion and concluding remarks
In this paper, we have established convergence of the alternating minimization applied to a two-block structured model problem within the class of non-smooth non-strongly convex optimization in general Banach spaces, a fairly broad setting. We have considered three relaxations of strong convexity: (i) quasi-strong convexity, (ii) quadratic functional growth, and (iii) plain convexity combined with a compact initial level set. Convergence rates have been provided, of linear type for the first two cases, and of sublinear type for the third case. To the best of the author's knowledge, all results are novel.
Our results are direct extensions of previous results in the literature [2,3,11], agreeing with or partially refining them when put in the same context, while remaining valid in more general scenarios. The key to arriving at our results has been to describe the smoothness properties (of the two single blocks) and convexity properties (of the full objective function) w.r.t. different (semi-)norms; these enter the novel rates, predicting in particular that both steps of the alternating minimization separately lead to an error decrease. For the subclass of quasi-strongly convex problems, we demonstrate the sharpness of our convergence result based on a simple numerical example. In addition, an illustrative numerical example inspired by multiphysics demonstrates the efficacy of alternating minimization for PDE-based problems. Finally, we highlight that, for the first time, quadratic functional growth is proved to be sufficient for linear convergence, without any feasible descent property as commonly required in the analysis of the general block coordinate descent [9,11].
Ultimately, it is noteworthy that the provided results allow for a systematic development and analysis of iterative block-partitioned solvers based on alternating minimization for problems in applied variational calculus, in particular two-way coupled PDEs arising from a convex minimization structure, see, e.g., [5].