Introduction

Fixed-point iterations abound in applied mathematics and engineering. This classical technique, dating back to [43, 56, 59], involves the following two steps. First, find an operator \(T:\mathcal {X}\rightarrow \mathcal {X}\), where \(\mathcal {X}\) is some space, such that if \(x^\star =T(x^\star )\), i.e., if \(x^\star \) is a fixed point, then \(x^\star \) is a solution to the problem at hand. Second, perform the fixed-point iteration \(x^{k+1}=T(x^k)\). Convergence of such iterative methods is usually proved analytically, through a series of inequalities.

In this paper, we present a geometric approach to analyzing contractive and nonexpansive fixed-point iterations with a new tool called the scaled relative graph (SRG). We can think of the SRG as a signature of an operator analogous to how eigenvalues are a signature of a matrix. The SRG provides a correspondence between algebraic operations on nonlinear operators and geometric operations on subsets of the 2D plane. Using this machinery and elementary Euclidean geometry, we can establish properties of operators (such as contractiveness) and establish the convergence of fixed-point iterations by showing that the SRG, a set in the 2D plane, resides within certain circles. These geometric arguments form rigorous proofs, not just illustrations.

One advantage of geometric proofs is that one or a few geometric diagrams concisely capture and communicate the core insight. In contrast, it is much more difficult to extract a core insight from a classical analytic proof based on inequalities. Another advantage is that tightness, loosely defined as being unable to improve a stated result without additional assumptions, is often immediate. In contrast, discerning whether it is possible to make improvements when examining a proof based on inequalities is usually more difficult; providing a matching lower bound is often the only way to establish tightness of such results.

Proving convergence with operator properties

Given \(T:\mathcal {H}\rightarrow \mathcal {H}\), where \(\mathcal {H}\) is a real Hilbert space with norm \(\Vert \cdot \Vert \), consider the fixed-point iteration given by

$$\begin{aligned} x^{k+1}=T(x^k) \end{aligned}$$

for \(k=0,1,\dots \) where \(x^0\in \mathcal {H}\) is a starting point. We say \(x^\star \) is a fixed point of T if \(x^\star =T(x^\star )\). We say \(T:\mathcal {H}\rightarrow \mathcal {H}\) is nonexpansive if

$$\begin{aligned} \Vert T(x)-T(y)\Vert \le \Vert x-y\Vert ,\qquad \forall x,y\in \mathcal {H}. \end{aligned}$$

In this case, \(\Vert x^{k}-x^\star \Vert \) is a nonincreasing sequence, but \(x^k\) need not converge. For instance, if \(T=-I\), then \(x^k\) oscillates between \(x^0\) and \(-x^0\). We say \(T:\mathcal {H}\rightarrow \mathcal {H}\) is contractive if

$$\begin{aligned} \Vert T(x)-T(y)\Vert \le L\Vert x-y\Vert ,\qquad \forall x,y\in \mathcal {H}\end{aligned}$$

for some \(L<1\). In this case, \(x^k\rightarrow x^\star \) strongly with rate \(\Vert x^k-x^\star \Vert \le L^k\Vert x^0-x^\star \Vert \). This classical argument is the Banach contraction principle [3]. We say \(T:\mathcal {H}\rightarrow \mathcal {H}\) is averaged if \(T=(1-\theta )I+\theta R\) for some nonexpansive operator R and \(\theta \in (0,1)\), where I is the identity operator. In this case, \(x^k\rightarrow x^\star \) weakly for a fixed point \(x^\star \) provided that T has a fixed point. This result is the Krasnosel’skiĭ–Mann theorem [39, 46]. The assumption of averagedness is stronger than nonexpansiveness and weaker than contractiveness, as illustrated in Fig. 1.
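The three regimes above can be illustrated with a minimal sketch in \(\mathbb {R}^2\). The operators `half`, `reflect`, and `averaged` are hypothetical examples chosen for illustration; only \(T=-I\) is taken from the text.

```python
import math

def iterate(T, x0, steps):
    """Return the iterates x^0, x^1, ..., x^steps of x^{k+1} = T(x^k)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(T(xs[-1]))
    return xs

def norm(x):
    return math.hypot(x[0], x[1])

# Contractive with L = 0.5 (hypothetical example); the unique fixed point is (0, 0).
def half(x):
    return (0.5 * x[0], 0.5 * x[1])

# Nonexpansive but not averaged: T = -I, the oscillating example from the text.
def neg_identity(x):
    return (-x[0], -x[1])

# A nonexpansive reflection R across the first axis (hypothetical example).
def reflect(x):
    return (x[0], -x[1])

# Averaged with theta = 1/2: T = (1/2) I + (1/2) R, the projection onto the first axis.
def averaged(x):
    r = reflect(x)
    return (0.5 * x[0] + 0.5 * r[0], 0.5 * x[1] + 0.5 * r[1])

x0 = (3.0, 4.0)
contractive_run = iterate(half, x0, 20)        # distances shrink like 0.5**k
oscillating_run = iterate(neg_identity, x0, 4) # alternates between x0 and -x0
averaged_run = iterate(averaged, x0, 4)        # settles at a fixed point on the first axis
```

In this sketch the averaged operator is not contractive (every point on the first axis is a fixed point), yet its iterates still converge, which is the behavior the Krasnosel’skiĭ–Mann theorem guarantees.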

We now have a general rubric for proving convergence of a fixed-point iteration:

  1. Prove the operator T is contractive or averaged.

  2. Apply the convergence argument of Banach or Krasnosel’skiĭ–Mann.

Many, although not all, fixed-point iterations are analyzed through this rubric. Step 2 is routine. This work presents a geometric approach to step 1, the more difficult step.

Fig. 1

The classes of contractive, averaged, and nonexpansive operators represented with the scaled relative graph (SRG). The notion of the SRG and the precise meaning of these figures will be defined soon in Sect. 3

Prior work and contribution

Using circles or disks centered at the origin to illustrate contractive mappings is natural and likely common. Eckstein and Bertsekas’s illustration of firm-nonexpansiveness via the disk with radius 1/2 centered at (1/2, 0) [24, 25] was, to the best of our knowledge, the first geometric illustration of notions from fixed-point theory other than nonexpansiveness and Lipschitz continuity. Since then, Giselsson and Boyd used similar illustrations in earlier versions of the paper [29] (the arXiv versions 1 through 3 have the geometric diagrams, but later versions do not) and more thoroughly in the lecture slides [27]. Banjac and Goulart also utilize similar illustrations [4].

Through personal communication, we are aware that many have privately used geometric illustrations similar to those presented in this paper to initially build intuition, although the actual mathematics and proofs were eventually presented analytically, with inequalities. To the best of our knowledge, the use of geometry for rigorous proofs of results of nonlinear operators is new.

The notion of the SRG was first defined and presented in the authors’ unpublished manuscript [31]. The work shows how transformations of the operator such as inversion, addition of identity, unitary change in coordinates, and composition map to changes in the SRG and uses these transformations to rigorously prove many standard results geometrically. It furthermore discusses the Baillon–Haddad Theorem and convergence rates for various operator methods.

Throughout this paper, we state known results as “Facts”. Our contributions are the alternative geometric proofs, the novel results stated as “Propositions” and “Theorems”, and the overall geometric approach based on the SRG.

Preliminaries

We refer readers to standard references for more information on convex analysis [9, 12, 32], nonexpansive and monotone operators [5, 63], and geometry [50, 58, 69]. Write \(\mathcal {H}\) for a real Hilbert space equipped with the inner product \(\langle \cdot ,\cdot \rangle \) and norm \(\Vert \cdot \Vert \). We use Minkowski-type set notation that generalizes operations on individual elements to sets. For example, given \(\alpha \in \mathbb {R}\) and sets \(U,V\subseteq \mathcal {H}\), write

$$\begin{aligned} \alpha U=\{\alpha u\,|\,u\in U\},\quad U+V=\{u+v\,|\,u\in U,\,v\in V\},\quad U-V=U+(-V). \end{aligned}$$

Notice that if either U or V is \(\emptyset \), then \(U+V=\emptyset \). In particular, \(U+V\) is the Minkowski sum. We use similar notation for sets of operators and complex numbers. The meanings should be clear from context, but for the sake of precision, we provide the full definitions in the appendix.
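As a small illustration of the Minkowski-type notation, the following sketch uses finite sets of real numbers (that is, \(\mathcal {H}=\mathbb {R}\)). The helper names `scale`, `minkowski_sum`, and `minkowski_diff` are hypothetical.

```python
def scale(alpha, U):
    """alpha U = {alpha u | u in U}."""
    return {alpha * u for u in U}

def minkowski_sum(U, V):
    """U + V = {u + v | u in U, v in V}; empty whenever U or V is empty."""
    return {u + v for u in U for v in V}

def minkowski_diff(U, V):
    """U - V = U + (-V)."""
    return minkowski_sum(U, scale(-1, V))

U = {0, 1}
V = {10, 20}
sum_UV = minkowski_sum(U, V)        # {10, 11, 20, 21}
diff_UV = minkowski_diff(U, V)      # {-20, -19, -10, -9}
sum_empty = minkowski_sum(U, set()) # empty set
```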

Multi-valued operators For convex analytical and operator theoretic notions, we follow standard notation [5]. In particular, we consider multi-valued operators, which map a point to a set. The graph of an operator is defined as

$$\begin{aligned} \mathrm {graph}(A)=\{(x,u)\,|\, u \in Ax\}. \end{aligned}$$

For convenience, we do not distinguish an operator from its graph, writing \((x,u)\in A\) to mean \(u\in Ax\). Define the inverse operator as

$$\begin{aligned} A^{-1}=\{(u,x)\,|\,(x,u)\in A\}, \end{aligned}$$

which always exists. Define the resolvent of A as \(J_A=(I+A)^{-1}\).
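To make the graph-based definitions concrete, here is a minimal sketch that stores a sampled graph as a list of pairs \((x,u)\). It uses \(A=\partial |\cdot |\) on \(\mathbb {R}\) as an assumed example, together with the standard fact (not taken from the text) that its resolvent is soft-thresholding with threshold 1. The helper names are hypothetical.

```python
import math

def soft_threshold(w):
    """Soft-thresholding with threshold 1 (standard fact, used here only as a reference)."""
    return math.copysign(max(abs(w) - 1.0, 0.0), w)

# Sampled graph of A = subdifferential of |x| on R: pairs (x, u) with u in Ax.
graph_A = [(x, -1.0) for x in (-2.0, -0.5)]
graph_A += [(0.0, u) for u in (-1.0, -0.5, 0.0, 0.5, 1.0)]  # A is multi-valued at x = 0
graph_A += [(x, 1.0) for x in (0.5, 2.0)]

def inverse(graph):
    """A^{-1} = {(u, x) | (x, u) in A}: swap the two coordinates."""
    return [(u, x) for (x, u) in graph]

def identity_plus(graph):
    """Graph of I + A: pairs (x, x + u)."""
    return [(x, x + u) for (x, u) in graph]

# Resolvent J_A = (I + A)^{-1} as a sampled graph of pairs (input, output).
graph_J = inverse(identity_plus(graph_A))
max_gap = max(abs(out - soft_threshold(inp)) for (inp, out) in graph_J)
```

Note that the inverse needs no invertibility assumption: in this sample \(A^{-1}\) is multi-valued at 1 and at \(-1\), while \(J_A\) turns out to be single-valued.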

We say \({\mathcal {A}}\) is a class of operators if \({\mathcal {A}}\) is a set of operators on Hilbert spaces. Note that \(A_1,A_2\in {\mathcal {A}}\) need not be defined on the same Hilbert spaces, i.e., \(A_1:\mathcal {H}_1\rightrightarrows \mathcal {H}_1\), \(A_2:\mathcal {H}_2\rightrightarrows \mathcal {H}_2\), and \(\mathcal {H}_1\ne \mathcal {H}_2\) is possible.

Given classes of operators \({\mathcal {A}}\) and \({\mathcal {B}}\), write

$$\begin{aligned} {\mathcal {A}}+{\mathcal {B}}&=\{A+B\,|\,A\in {\mathcal {A}},\,B\in {\mathcal {B}},\,A:\mathcal {H}\rightrightarrows \mathcal {H},\,B:\mathcal {H}\rightrightarrows \mathcal {H}\}. \end{aligned}$$

To clarify, these definitions require that A and B or A and I are operators on the same (but arbitrary) Hilbert space \(\mathcal {H}\), as otherwise the operations would not make sense. We define \({\mathcal {A}}{\mathcal {B}}\), \(I+\alpha {\mathcal {A}}\), and \(J_{\alpha {\mathcal {A}}}\) similarly. For \(L\in (0,\infty )\), define the class of L-Lipschitz operators as

$$\begin{aligned} {\mathcal {L}}_{L}&=\big \{A:{\mathrm {dom}(A)}\rightarrow \mathcal {H}\,|\,\Vert Ax-Ay\Vert ^2\le L^2\Vert x-y\Vert ^2,\,\forall \, x,y\in {\mathrm {dom}(A)}\subseteq {\mathcal {H}}\big \}. \end{aligned}$$

For \(\beta \in (0,\infty )\), define the class of \(\beta \)-cocoercive operators as

$$\begin{aligned} {\mathcal {C}}_{\beta }&=\big \{A:{\mathrm {dom}(A)}\rightarrow \mathcal {H}\,|\,\langle Ax-Ay,x-y\rangle \ge \beta \Vert Ax-Ay\Vert ^2,\,\forall \, x,y\in {\mathrm {dom}(A)}\subseteq {\mathcal {H}}\big \}. \end{aligned}$$

Define the class of monotone operators as

$$\begin{aligned} {\mathcal {M}}&=\big \{A:\mathcal {H}\rightrightarrows \mathcal {H}\,|\,\langle Ax-Ay,x-y\rangle \ge 0,\,\forall \, x,y\in \mathcal {H}\big \}. \end{aligned}$$

To clarify, \(\langle Ax-Ay,x-y\rangle \ge 0\) means \(\langle u-v,x-y\rangle \ge 0\) for all \((x,u),(y,v)\in A\). If \(x\notin {\mathrm {dom}(A)}\), then the inequality is vacuous. A monotone operator A is maximal if there is no other monotone operator B such that \(\mathrm {graph}(B)\) properly contains \(\mathrm {graph}(A)\). For \(\mu \in (0,\infty )\), define the class of \(\mu \)-strongly monotone operators as

$$\begin{aligned} {\mathcal {M}}_{\mu }&=\big \{A:\mathcal {H}\rightrightarrows \mathcal {H}\,|\,\langle Ax-Ay,x-y\rangle \ge \mu \Vert x-y\Vert ^2,\,\forall \, x,y\in \mathcal {H}\}. \end{aligned}$$

For \(\theta \in (0,1)\), define the class of \(\theta \)-averaged operators \({\mathcal {N}}_\theta \) as

$$\begin{aligned} {\mathcal {N}}_\theta =(1-\theta )I+\theta {\mathcal {L}}_1. \end{aligned}$$

In these definitions, we do not impose any requirements on the domain or maximality of the operators.

Following the notation of [54], respectively write \({\mathcal {F}}_{\mu ,L}\), \({\mathcal {F}}_{0,L}\), \({\mathcal {F}}_{\mu ,\infty }\), and \({\mathcal {F}}_{0,\infty }\) for the sets of lower semi-continuous proper functions on all Hilbert spaces that are respectively \(\mu \)-strongly convex and L-smooth, convex and L-smooth, \(\mu \)-strongly convex, and convex, for \(0< \mu< L < \infty \). Write

$$\begin{aligned} \partial {\mathcal {F}}_{\mu ,L} = \{\partial f \,|\, f \in {\mathcal {F}}_{\mu ,L}\}, \end{aligned}$$

where \(0 \le \mu < L \le \infty \).

Inversive geometry We use the extended complex plane \(\overline{\mathbb {C}}=\mathbb {C}\cup \{\infty \}\) to represent the 2D plane and the point at infinity. We call \(z\mapsto {\bar{z}}^{-1}\), a one-to-one map from \(\overline{\mathbb {C}}\) to \(\overline{\mathbb {C}}\), the inversion map. In polar form, it is \(re^{i\varphi }\mapsto (1/r)e^{i\varphi }\) for \(0\le r\le \infty \), i.e., inversion preserves the angle and inverts the magnitude. In complex analysis, the inversion map is known as the Möbius transformation [1, p. 366]. In classical Euclidean geometry, inversive geometry considers generally the inversion of the 2D plane about any circle [58, p. 75]. Our inversion map \(z\mapsto {\bar{z}}^{-1}\) is the inversion about the unit circle.

Generalized circles consist of (finite) circles and lines with \(\{\infty \}\), and the interpretation is that a line is a circle with infinite radius. Inversion maps generalized circles to generalized circles. Using a compass and straightedge, the inversion of a generalized circle can be constructed fully geometrically. In this paper, we use the following semi-geometric construction:

  1. Draw a line L through the origin orthogonally intersecting the generalized circle.

  2. Let \(-\infty<x< y\le \infty \) represent the signed distance of the intersecting points from the origin along this line. If the generalized circle is a line, then \(y=\infty \).

  3. Draw a generalized circle orthogonally intersecting L at (1/x) and (1/y).

  4. When inverting a region with a generalized circle as the boundary, pick a point on L within the interior of the region to determine on which side of the boundary the inverted interior lies.

Figures 2 and 3 illustrate these steps.
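The construction can be checked numerically. The sketch below takes L to be the real axis and uses two assumed examples: the circle meeting the axis at \(x=2\) and \(y=4\), and the vertical line meeting the axis at \(x=2\) (so \(y=\infty \)). The function names are hypothetical.

```python
import cmath
import math

def invert(z):
    """Inversion about the unit circle, z -> 1 / conj(z), for finite nonzero z."""
    return 1 / z.conjugate()

def circle_through(p, q):
    """Center and radius of the circle meeting the real axis orthogonally at p and q."""
    return (p + q) / 2, abs(q - p) / 2

def sample_circle(center, radius, n=360):
    return [center + radius * cmath.exp(2j * math.pi * k / n) for k in range(n)]

# Steps 1-3 for a circle: intersections x = 2, y = 4 map to 1/x and 1/y.
x, y = 2.0, 4.0
center, radius = circle_through(x, y)
new_center, new_radius = circle_through(1 / x, 1 / y)
circle_error = max(abs(abs(invert(z) - new_center) - new_radius)
                   for z in sample_circle(center, radius))

# Step 4: an interior point on L (here 3) shows which side the interior maps to.
interior_stays_inside = abs(invert(complex(3.0, 0.0)) - new_center) < new_radius

# Steps 1-3 for a line: Re z = 2 has y = infinity, and 1 / inf evaluates to 0.0.
line_center, line_radius = circle_through(1 / x, 1 / math.inf)
line_error = max(abs(abs(invert(complex(x, t)) - line_center) - line_radius)
                 for t in (-5.0, -1.0, 0.0, 1.0, 5.0))
```

In this example the origin lies outside the disk, so the interior is mapped to the interior of the new circle; a disk containing the origin would instead be mapped to the exterior.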

Fig. 2

Illustration of inverting a disk. In step 1, we choose L to be the x-axis (although any line through the origin works). In steps 2 and 3, we identify x and y, invert them to \(x^{-1}\) and \(y^{-1}\), and draw the generalized circle in the inverted plane to be the new boundary. In step 4, we determine that the interior of the disk is mapped to the exterior by noting that 1, a point invariant under the inversion map, is excluded in the original region and therefore is excluded in the inverted region

Fig. 3

The three vertical pairs illustrate inversion. In step 1, we choose L to be the x-axis. In step 4, we determine the interior by examining point 1: if 1 is included in the original region, it is included in the inverted region, and vice-versa

Scaled relative graphs

In this section, we define the notion of scaled relative graphs (SRG). Loosely speaking, the SRG maps the action of an operator to a set on the extended complex plane.

We use the extended complex plane \(\overline{\mathbb {C}}=\mathbb {C}\cup \{ \infty \}\) to represent the 2D plane and the point at infinity. Since complex numbers compactly represent rotations and scaling, this choice simplifies our notation compared to using \(\mathbb {R}^2\cup \{ \infty \}\). We avoid the operations \(\infty +\infty \), 0/0, \(\infty /\infty \), and \(0\cdot \infty \). Otherwise, we adopt the convention of \(z+\infty =\infty \), \(z/\infty =0\), \(z/0=\infty \), and \(z\cdot \infty =\infty \).

Fig. 4

SRGs of the operators: \(P_L:\mathbb {R}^2 \rightarrow \mathbb {R}^2\) is the projection onto an arbitrary line L; \(A:\mathbb {R}^2 \rightarrow \mathbb {R}^2\) is defined as \(A(u,v) = (0,u)\); \(\partial \Vert \cdot \Vert \) is the subdifferential of the Euclidean norm on \(\mathbb {R}^n\) with \(n\ge 2\); \(B:\mathbb {R}^3\rightarrow \mathbb {R}^3\) is defined as \(B(u,v,w) = (u,2v,3w)\). The shapes were obtained by plugging the operators into the definition of the SRG and performing direct calculations. See [34, 35] for follow-up work on drawing the SRG of individual operators

SRG of operators

Consider an operator \(A:\mathcal {H}\rightrightarrows \mathcal {H}\). Let \(x,y\in \mathcal {H}\) be a pair of inputs and let \(u,v\in \mathcal {H}\) be their corresponding outputs, i.e., \(u\in Ax\), and \(v\in Ay\). The goal is to understand the change in output relative to the change in input.

First, consider the case \(x\ne y\). Consider the complex conjugate pair

$$\begin{aligned} z= \frac{\Vert u-v\Vert }{\Vert x-y\Vert } \exp \left[ \pm i \angle (u-v,x-y)\right] , \end{aligned}$$

where given any \(a,b\in \mathcal {H}\)

$$\begin{aligned} \angle (a,b) = \left\{ \begin{array}{ll} \arccos ( \tfrac{\langle a,b \rangle }{\Vert a \Vert \Vert b \Vert } )&{}\text { if }a\ne 0,\,b\ne 0\\ 0&{}\text { otherwise} \end{array} \right. \end{aligned}$$

denotes the angle between them. The absolute value (magnitude) \(|z|=\tfrac{\Vert u-v\Vert }{\Vert x-y\Vert }\) represents the size of the change in outputs relative to the size of the change in inputs. The argument (angle) \(\angle (u-v,x-y)\) represents how much the change in outputs is aligned with the change in inputs. Equivalently, \({\text {Re}}z\) and \({\text {Im}}z\) respectively represent the components of \(u-v\) aligned with and perpendicular to \(x-y\), i.e.,

$$\begin{aligned} {\text {Re}}z&= \text {sgn}(\langle u-v,x-y\rangle )\frac{\Vert P_{\mathrm {span}\{x-y\}}(u-v)\Vert }{\Vert x-y\Vert } =\frac{\langle u-v,x-y\rangle }{\Vert x-y\Vert ^2}\nonumber \\ \qquad {\text {Im}}z&=\pm \frac{\Vert P_{\{x-y\}^\perp }(u-v)\Vert }{\Vert x-y\Vert } \end{aligned}$$
(1)

where \(P_{\mathrm {span}\{x-y\}}\) is the projection onto the span of \(x-y\) and \(P_{\{x-y\}^\perp }\) is the projection onto the subspace orthogonal to \(x-y\).

Define the SRG of an operator \(A:\mathcal {H}\rightrightarrows \mathcal {H}\) as

$$\begin{aligned} \mathcal {G}(A)&= \left\{ \frac{\Vert u-v\Vert }{\Vert x-y\Vert } \exp \left[ \pm i \angle (u-v,x-y)\right] \,\Big |\, u\in Ax,\,v\in Ay,\, x\ne y \right\} \\&\qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \bigg (\cup \{\infty \}\text { if { A} is multi-valued}\bigg ). \end{aligned}$$

We clarify several points: (i) \({\mathcal {G}}(A)\subseteq \overline{\mathbb {C}}\). (ii) \(\infty \in {\mathcal {G}}(A)\) if and only if there is a point \(x\in \mathcal {H}\) such that Ax is multi-valued. (In this case, there exists \((x,u),(y,v)\in A\) such that \(x=y\) and \(u\ne v\), and the idea is that \(|z|=\Vert u-v\Vert /0=\infty \), i.e., \(u-v\) is infinitely larger than \(x-y=0\).) (iii) the ± makes \({\mathcal {G}}(A)\) symmetric about the real axis. (We include the ± because \(\angle (u-v,x-y)\) always returns a nonnegative angle.) See Fig. 4 for examples.
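A minimal sketch of how a single SRG point can be computed from one pair of evaluations, using \({\text {Re}}z=\langle u-v,x-y\rangle /\Vert x-y\Vert ^2\) from (1) and \(|z|=\Vert u-v\Vert /\Vert x-y\Vert \). The function `srg_point` is a hypothetical helper that returns the representative with nonnegative imaginary part; the operators are the A and B of Fig. 4.

```python
import math

def inner(a, b):
    return sum(p * q for p, q in zip(a, b))

def sub(a, b):
    return tuple(p - q for p, q in zip(a, b))

def srg_point(x, y, u, v):
    """SRG point with Im z >= 0 for inputs x != y and outputs u in Ax, v in Ay."""
    dx, du = sub(x, y), sub(u, v)
    nx2 = inner(dx, dx)
    re = inner(du, dx) / nx2
    # |z|^2 = ||u - v||^2 / ||x - y||^2, so (Im z)^2 = |z|^2 - (Re z)^2.
    im2 = inner(du, du) / nx2 - re * re
    return complex(re, math.sqrt(max(im2, 0.0)))

def A(p):
    """A(u, v) = (0, u) from Fig. 4."""
    return (0.0, p[0])

def B(p):
    """B(u, v, w) = (u, 2v, 3w) from Fig. 4."""
    return (p[0], 2.0 * p[1], 3.0 * p[2])

origin2, origin3 = (0.0, 0.0), (0.0, 0.0, 0.0)
z_A_first = srg_point((1.0, 0.0), origin2, A((1.0, 0.0)), A(origin2))   # output orthogonal to input
z_A_second = srg_point((0.0, 1.0), origin2, A((0.0, 1.0)), A(origin2))  # output is zero
z_B = srg_point((1.0, 1.0, 0.0), origin3, B((1.0, 1.0, 0.0)), B(origin3))
```

Each returned value z stands for the conjugate pair \(\{z,{\bar{z}}\}\) in \({\mathcal {G}}(A)\). Sweeping over many input pairs traces out the full SRG of a single-valued operator.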

Fig. 5

SRG of a \(3\times 3\) matrix. Crosses denote the eigenvalues

For linear operators, the SRG generalizes eigenvalues. Given \(A\in \mathbb {R}^{n\times n}\), write \(\varLambda (A)\) for the set of eigenvalues of A.

Theorem 1

If \(A\in \mathbb {R}^{n\times n}\) and \(n = 1\) or \(n\ge 3\), then \(\varLambda (A)\subseteq {\mathcal {G}}(A)\).

The result fails for \(n=2\) because \(S^{n-1}\), the sphere in \(\mathbb {R}^n\), is not simply connected for \(n=2\); the proof constructs a loop in \(S^{n-1}\) and argues the image of the loop in the complex plane is nullhomotopic. Figure 5 illustrates an SRG of a matrix. The SRG of a matrix does not seem to be directly related to the numerical range (field of values) [33] or the pseudospectrum [67].

Proof

If \(\lambda \) is a real eigenvalue of A, then considering (1) with x as the corresponding eigenvector and \(y=0\) tells us \(\lambda \in {\mathcal {G}}(A)\).

Next consider a complex conjugate eigenvalue pair \(\lambda ,{\overline{\lambda }}\in \varLambda (A)\), where \({\text {Im}}\lambda >0\). (This case excludes \(n=1\).) A has a real Schur decomposition of the form

$$\begin{aligned} A=Q^T \underbrace{\begin{bmatrix} B_{11}&{}A_{12}&{}A_{13}&{}\cdots \\ 0&{}B_{22}&{}A_{23}&{}\cdots \\ &{}&{}\ddots \\ \end{bmatrix}}_{=B} Q,\qquad B_{11}=\begin{bmatrix} a&{}b\\ -c&{}a \end{bmatrix} \in \mathbb {R}^{2\times 2}, \end{aligned}$$

where \(b,c>0\), \(\lambda =a+i\sqrt{bc}\), and \(Q\in \mathbb {R}^{n\times n}\) is orthogonal. (To obtain this decomposition, take the construction of [52] and apply a \(\pm 45\) degree rotation to the leading \(2\times 2\) block.) Since an orthogonal change of coordinates does not change the SRG, we have \({\mathcal {G}}(A)={\mathcal {G}}(B)\). Write \(S^{n-1}\) for the sphere in \(\mathbb {R}^n\). Consider the continuous map \(z:S^{n-1}\rightarrow \mathbb {C}\) defined by \(z(x)=\Vert Bx\Vert \exp \left[ i\angle (Bx,x)\right] \). Since B is a linear operator, we have \(z(S^{n-1})={\mathcal {G}}(B)\). Consider the curve \(\gamma (t)=\cos (2\pi t)e_1+\sin (2\pi t)e_2\in S^{n-1}\) from \(t\in [0,1]\), where \(e_1\) and \(e_2\) are the first and second unit vectors in \(\mathbb {R}^n\). With simple computation, we get

$$\begin{aligned} z(\gamma (t))= a+\frac{1}{2}(b-c)\sin (4\pi t)+ i \frac{1}{2}(b+c-(b-c)\cos (4\pi t)). \end{aligned}$$

If \(b=c\), then \(z(\gamma (t))=\lambda \) and we conclude \(\lambda \in z(S^{n-1})={\mathcal {G}}(A)\).

Assume \(b\ne c\), and assume for contradiction that \(\lambda \notin z(S^{n-1})={\mathcal {G}}(A)\). The curve \(z(\gamma (t))\) strictly encloses the eigenvalue \(\lambda =a+i\sqrt{bc}\) since \(\min (b,c)\le \sqrt{bc}\le \max (b,c)\). Since \(S^{n-1}\) is simply connected for \(n\ge 3\), we can continuously contract \(\gamma (t)\) to a point in \(S^{n-1}\), and the continuous map z provides a continuous contraction of \(z(\gamma (t))\) to a point in \(z(S^{n-1})\). However, \(z( \gamma (t))\) has a nonzero winding number around \(\lambda \) and \(\lambda \notin z(S^{n-1})\). Therefore, \(z( \gamma (t))\) cannot be continuously contracted to a point in \(z(S^{n-1})\). This is a contradiction and we conclude \(\lambda \in z(S^{n-1})={\mathcal {G}}(A)\). \(\square \)
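The closed-form expression for \(z(\gamma (t))\) used in the proof can be checked numerically. The sketch below evaluates \(z\) directly from the leading block \(B_{11}\) (the remaining entries of B do not act on \(\gamma (t)\), which lies in the span of \(e_1\) and \(e_2\)) and compares it with the formula. The values \(a=1\), \(b=4\), \(c=1\) are an assumed example.

```python
import math

def z_direct(a, b, c, t):
    """z(gamma(t)) = ||B x|| exp(i angle(Bx, x)) for x = cos(2 pi t) e1 + sin(2 pi t) e2."""
    x1, x2 = math.cos(2 * math.pi * t), math.sin(2 * math.pi * t)
    y1, y2 = a * x1 + b * x2, -c * x1 + a * x2   # B_11 applied to (x1, x2)
    re = y1 * x1 + y2 * x2                        # ||x|| = 1, so Re z = <Bx, x>
    im = math.sqrt(max(y1 * y1 + y2 * y2 - re * re, 0.0))
    return complex(re, im)

def z_closed(a, b, c, t):
    """Closed form stated in the proof."""
    return complex(a + 0.5 * (b - c) * math.sin(4 * math.pi * t),
                   0.5 * (b + c - (b - c) * math.cos(4 * math.pi * t)))

a, b, c = 1.0, 4.0, 1.0
ts = [k / 200 for k in range(201)]
formula_gap = max(abs(z_direct(a, b, c, t) - z_closed(a, b, c, t)) for t in ts)

# The closed form traces a circle with this center and radius; lambda lies strictly inside.
lam = complex(a, math.sqrt(b * c))
curve_center = complex(a, 0.5 * (b + c))
curve_radius = 0.5 * abs(b - c)
lambda_enclosed = abs(lam - curve_center) < curve_radius
```

Reading the closed form as a circle of radius \(|b-c|/2\) centered at \(a+i(b+c)/2\) is an observation added here; it is consistent with the enclosure claim, since \((b+c)/2-\sqrt{bc}<|b-c|/2\) whenever \(b\ne c\).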

The SRG \({\mathcal {G}}(A)\) maps the action of the operator A to points in \(\overline{\mathbb {C}}\). In the following sections, we will need to conversely take any point in \(\overline{\mathbb {C}}\) and find an operator whose action maps to that point. Lemma 1 provides such constructions.

Lemma 1

Take any \(z=z_r+z_ii\in \mathbb {C}\). Define \(A_z:\mathbb {R}^2\rightarrow \mathbb {R}^2\) and \(A_\infty :\mathbb {R}^2\rightrightarrows \mathbb {R}^2\) as

$$\begin{aligned} A_z\begin{bmatrix} \zeta _1\\ \zeta _2 \end{bmatrix} = \begin{bmatrix} z_r\zeta _1-z_i\zeta _2\\ z_r\zeta _2+z_i\zeta _1 \end{bmatrix} \qquad A_\infty (x)=\left\{ \begin{array}{ll} \mathbb {R}^2&{}\quad \text {if }x=0\\ \emptyset &{}\quad \text {otherwise.} \end{array} \right. \end{aligned}$$

Then,

$$\begin{aligned} \mathcal {G}(A_z)=\{z,{\bar{z}}\}, \qquad \mathcal {G}(A_\infty )=\{\infty \}. \end{aligned}$$

If we write \(\cong \) to identify an element of \(\mathbb {R}^2\) with an element in \(\mathbb {C}\) in that

$$\begin{aligned} \begin{bmatrix} x\\ y \end{bmatrix} \cong x+yi, \end{aligned}$$

then we can view \(A_z\) as complex multiplication with z in the sense that

$$\begin{aligned} A_z\begin{bmatrix} \zeta _1\\ \zeta _2 \end{bmatrix} \cong z(\zeta _1+\zeta _2i). \end{aligned}$$

Proof

Again, we write \(\cong \) to identify an element of \(\mathbb {R}^2\) with an element in \(\mathbb {C}\). Write \(z=r_ze^{i\theta _{z}}\). Consider any \(x,y\in \mathbb {R}^2\) where \(x\ne y\) and define \(u=A_zx\) and \(v=A_zy\). Then we can write

$$\begin{aligned} x-y=r_w \begin{bmatrix} \cos (\theta _w)\\ \sin (\theta _w) \end{bmatrix} \end{aligned}$$

where \(r_w>0\), and

$$\begin{aligned} u-v=A_z(x-y)\cong r_{z}r_we^{i(\theta _{z}+\theta _w)}. \end{aligned}$$

This gives us

$$\begin{aligned} \frac{\Vert u-v\Vert }{\Vert x-y\Vert }= r_z,\quad \angle (u-v,x-y)=|\theta _z|, \end{aligned}$$

and

$$\begin{aligned} {\mathcal {G}}(A_z)=\left\{ r_{z}e^{i\theta _{z}},r_{z}e^{- i\theta _{z}}\right\} . \end{aligned}$$

Now consider \(A_\infty \). By definition, \(\infty \in {\mathcal {G}}(A_\infty )\). For any \(u\in A_\infty x\) and \(v\in A_\infty y\), we have \(x=y=0\), and therefore \({\mathcal {G}}(A_\infty )\) contains no finite \(z\in \mathbb {C}\). We conclude \({\mathcal {G}}(A_\infty )=\{\infty \}\). \(\square \)
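Lemma 1 can be sanity-checked by sampling. The sketch below identifies \(\mathbb {R}^2\) with \(\mathbb {C}\), so that the SRG point of a pair is the complex ratio \((u-v)/(x-y)\) with the sign of its imaginary part discarded. The helper `srg_point_2d`, the value of z, and the random sampling are assumptions made for illustration.

```python
import random

def A_z(z, zeta):
    """The operator A_z of Lemma 1, written out in real coordinates."""
    zr, zi = z.real, z.imag
    return (zr * zeta[0] - zi * zeta[1], zr * zeta[1] + zi * zeta[0])

def srg_point_2d(x, y, u, v):
    """SRG point with Im >= 0 for vectors in R^2 identified with complex numbers."""
    dx = complex(x[0] - y[0], x[1] - y[1])
    du = complex(u[0] - v[0], u[1] - v[1])
    ratio = du / dx  # modulus ||u-v|| / ||x-y||, argument = signed angle from x-y to u-v
    return complex(ratio.real, abs(ratio.imag))

z = complex(0.3, -0.4)
rng = random.Random(0)
points = []
for _ in range(1000):
    x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
    y = (rng.uniform(-1, 1), rng.uniform(-1, 1))
    if x == y:
        continue  # the SRG only uses pairs with x != y
    points.append(srg_point_2d(x, y, A_z(z, x), A_z(z, y)))

target = complex(z.real, abs(z.imag))  # representative of {z, conj(z)} with Im >= 0
max_dev = max(abs(p - target) for p in points)

# A_z agrees with complex multiplication by z.
zeta = (0.7, -1.2)
product = z * complex(zeta[0], zeta[1])
```

Every sampled pair lands on the same point, in line with \({\mathcal {G}}(A_z)=\{z,{\bar{z}}\}\).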

SRG of operator classes

Let \({\mathcal {A}}\) be a collection of operators. We define the SRG of the class \({\mathcal {A}}\) as

$$\begin{aligned} \mathcal {G}({\mathcal {A}})=\bigcup _{A\in {\mathcal {A}}}\mathcal {G}(A). \end{aligned}$$

We focus more on SRGs of operator classes, rather than individual operators, because theorems are usually stated with operator classes. For example, one might say “If A is 1/2-cocoercive, i.e., if \(A\in {\mathcal {C}}_{1/2}\), then \(I-A\) is nonexpansive.” We now characterize the SRG of the Lipschitz, averaged, monotone, strongly monotone, and cocoercive operator classes.

Proposition 1

Let \(\mu ,\beta ,L\in (0,\infty )\) and \(\theta \in (0,1)\). The SRGs of \({\mathcal {L}}_L\), \({\mathcal {N}}_\theta \), \({\mathcal {M}}\), \({\mathcal {M}}_\mu \), and \({\mathcal {C}}_\beta \) are, respectively, given by

figure a

Proof

First, characterize \({\mathcal {G}}({\mathcal {L}}_L)\). We have \({\mathcal {G}}({\mathcal {L}}_L)\subseteq \left\{ z\in \mathbb {C}\,\big |\,|z|^2\le L^2\right\} \) since

$$\begin{aligned} A\in {\mathcal {L}}_L \;\;\Rightarrow \;\; \frac{\Vert Ax-Ay\Vert }{\Vert x-y\Vert }\le L,\,\forall \,x,y\in \mathcal {H},\,x\ne y \;\;\Rightarrow \;\; {\mathcal {G}}(A)\subseteq \left\{ z\in \mathbb {C}\,\big |\,|z|^2\le L^2\right\} . \end{aligned}$$

Conversely, given any \(z\in \mathbb {C}\) such that \(|z|\le L\), the operator \(A_z\) of Lemma 1 satisfies \(\Vert A_zx-A_zy\Vert \le L\Vert x-y\Vert \) for any \(x,y\in \mathbb {R}^2\), i.e., \(A_z\in {\mathcal {L}}_L\), and \({\mathcal {G}}(A_z)=\{z,{\bar{z}}\}\). Therefore \({\mathcal {G}}({\mathcal {L}}_L)\supseteq \left\{ z\in \mathbb {C}\,\big |\,|z|^2\le L^2\right\} \).

Next, characterize \({\mathcal {G}}({\mathcal {M}})\). For any \(A\in {\mathcal {M}}\), monotonicity implies

$$\begin{aligned} \frac{\langle u-v,x-y\rangle }{\Vert x-y\Vert ^2}\ge 0, \quad \forall \,u\in Ax,\,v\in Ay,\,x\ne y. \end{aligned}$$

Considering (1), we conclude \({\mathcal {G}}(A)\backslash \{\infty \}\subseteq \{z\,|\,{\text {Re}}z\ge 0\}\). On the other hand, given any \(z\in \{z\,|\,{\text {Re}}z\ge 0\}\), the operator \(A_z\) of Lemma 1 satisfies \(\langle A_zx-A_zy,x-y\rangle \ge 0\) for any \(x,y\in \mathbb {R}^2\), i.e., \(A_z\in {\mathcal {M}}\), and \({\mathcal {G}}(A_z)=\{z,{\bar{z}}\}\). Therefore, \(z\in {\mathcal {G}}(A_z)\subset {\mathcal {G}}({\mathcal {M}})\), and we conclude \(\{z\,|\,{\text {Re}}z\ge 0\}\subseteq {\mathcal {G}}(\mathcal {M})\). Finally, note that \(\infty \in {\mathcal {G}}({\mathcal {M}})\) is equivalent to saying that there exists a multi-valued operator in \({\mathcal {M}}\). The \(A_\infty \) of Lemma 1 is one such example.

The other SRGs \({\mathcal {G}}({\mathcal {M}}_\mu )\), \({\mathcal {G}}({\mathcal {C}}_\beta )\), and \({\mathcal {G}}({\mathcal {N}}_\theta )\) can be characterized with similar direct proofs or by using operator and SRG transformations introduced later in Sect. 4. In particular: the fact \({\mathcal {M}}_\mu =\mu I+{\mathcal {M}}\), Theorem 4, and the characterization of \({\mathcal {G}}({\mathcal {M}})\) prove the characterization \({\mathcal {G}}({\mathcal {M}}_\mu )\); the fact \(({\mathcal {M}}_\mu )^{-1}={\mathcal {C}}_{\mu }\), Theorem 5, and the characterization \({\mathcal {G}}({\mathcal {M}}_\mu )\) prove the characterization \({\mathcal {G}}({\mathcal {C}}_{\mu })\); and the fact \((1-\theta )I+\theta {\mathcal {L}}_1={\mathcal {N}}_\theta \), Theorem 4, and the characterization of \({\mathcal {G}}({\mathcal {L}}_1)\) prove the characterization of \({\mathcal {G}}({\mathcal {N}}_\theta )\). Facts \({\mathcal {M}}_\mu =\mu I+{\mathcal {M}}\), \(({\mathcal {M}}_\mu )^{-1}={\mathcal {C}}_{\mu }\), and \((1-\theta )I+\theta {\mathcal {L}}_1={\mathcal {N}}_\theta \) are well known [5]. \(\square \)

Proposition 2

Let \(0<\mu<L<\infty \). Then

figure b

Proof

Since \(\partial \mathcal {F}_{0,\infty }\subset \mathcal {M}\), we have \({\mathcal {G}}(\partial \mathcal {F}_{0,\infty })\subseteq {\mathcal {G}}({\mathcal {M}})= \{z\in \mathbb {C}\,|\,{\text {Re}}z\ge 0\}\cup \{\infty \}\) by Proposition 1. We claim \(f:\mathbb {R}^2\rightarrow \mathbb {R}\) defined by \(f(x,y)=|x|\) satisfies \({\mathcal {G}}(\partial f)=\{z\in \mathbb {C}\,|\,{\text {Re}}z\ge 0\}\cup \{\infty \}\). This tells us \(\{z\in \mathbb {C}\,|\,{\text {Re}}z\ge 0\}\cup \{\infty \}\subseteq {\mathcal {G}}(\partial \mathcal {F}_{0,\infty })\).

We prove the claim with basic computation. Let \(f(x,y)=|x|\). The subgradient has the form \(\partial f(x,y)=(h(x),0)\) for h defined by:

$$\begin{aligned} h(x) = \left\{ \begin{array}{l@{\quad }l} \{-1\}&{}\quad \text {for }x<0\\ \{u\,|\,-1\le u\le 1\}&{}\quad \text {for }x=0\\ \{1\}&{}\quad \text {for }x>0. \end{array}\right. \end{aligned}$$

Since \(\partial f\) is multi-valued at (0, 0), we have \(\infty \in {\mathcal {G}}(\partial f)\). Since \(\partial f(1,0)=\partial f(2,0)\), we have \(0\in {\mathcal {G}}(\partial f)\). The input-output pairs \((0,0)\in \partial f(0,0)\) and \((h(R\cos (\theta )),0)\in \partial f(R\cos (\theta ),R\sin (\theta ))\) map to the point \(R^{-1}(| \cos (\theta ) |,\pm \sin (\theta ))\in \mathbb {C}\). Clearly the image of this map over the range \(R\in (0,\infty )\), \(\theta \in [0,2\pi )\) is the right-hand plane except the origin. Hence \({\mathcal {G}}(\partial f)=\{z\in \mathbb {C}\,|\,{\text {Re}}z\ge 0\}\cup \{\infty \}\).
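The computation in this claim can be reproduced numerically. The sketch below pairs the evaluation \((0,0)\in \partial f(0,0)\) with evaluations at \((R\cos (\theta ),R\sin (\theta ))\) and compares the resulting SRG point with \(R^{-1}(|\cos (\theta )|,|\sin (\theta )|)\). The helper names and the sampled values of R and \(\theta \) (chosen with \(\cos (\theta )\ne 0\)) are assumptions.

```python
import math

def h_single(x):
    """The single-valued branch of h, valid for x != 0."""
    return 1.0 if x > 0 else -1.0

def srg_point_2d(x, y, u, v):
    """SRG point with Im >= 0 for vectors in R^2 identified with complex numbers."""
    dx = complex(x[0] - y[0], x[1] - y[1])
    du = complex(u[0] - v[0], u[1] - v[1])
    ratio = du / dx
    return complex(ratio.real, abs(ratio.imag))

def point_from_origin(R, theta):
    x, u = (0.0, 0.0), (0.0, 0.0)                  # (0, 0) is in the subdifferential at (0, 0)
    y = (R * math.cos(theta), R * math.sin(theta))
    v = (h_single(y[0]), 0.0)                      # the subdifferential at y, where y[0] != 0
    return srg_point_2d(x, y, u, v)

deviations = []
for R in (0.5, 1.0, 3.0):
    for theta in (0.3, 1.0, 2.0, 4.0, 5.5):
        expected = complex(abs(math.cos(theta)) / R, abs(math.sin(theta)) / R)
        deviations.append(abs(point_from_origin(R, theta) - expected))
max_deviation = max(deviations)

# Two inputs with identical outputs give the point 0.
zero_point = srg_point_2d((1.0, 0.0), (2.0, 0.0), (1.0, 0.0), (1.0, 0.0))
```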

The SRGs \({\mathcal {G}}(\partial {\mathcal {F}}_{\mu ,\infty })\), \({\mathcal {G}}(\partial {\mathcal {F}}_{0,L})\), and \({\mathcal {G}}(\partial {\mathcal {F}}_{\mu ,L})\) can be characterized with similar direct proofs or by using operator and SRG transformations introduced later in Sect. 4. In particular: the fact \(\partial {\mathcal {F}}_{\mu ,\infty }=\mu I+\partial {\mathcal {F}}_{0,\infty }\), Theorem 4, and the characterization of \({\mathcal {G}}(\partial {\mathcal {F}}_{0,\infty })\) prove the characterization of \({\mathcal {G}}(\partial {\mathcal {F}}_{\mu ,\infty })\); the fact \(\partial {\mathcal {F}}_{0,L}=\left( \partial {\mathcal {F}}_{1/L,\infty }\right) ^{-1}\), Theorem 5, and the characterization of \({\mathcal {G}}(\partial {\mathcal {F}}_{1/L,\infty })\) prove the characterization of \({\mathcal {G}}(\partial {\mathcal {F}}_{0,L})\); and the fact \(\partial {\mathcal {F}}_{\mu ,L}= \mu I+\partial {\mathcal {F}}_{0,L-\mu }\), Theorem 4, and the characterization of \({\mathcal {G}}(\partial {\mathcal {F}}_{0,L-\mu })\) prove the characterization of \({\mathcal {G}}(\partial {\mathcal {F}}_{\mu ,L})\). Facts \(\partial {\mathcal {F}}_{\mu ,\infty }=\mu I+\partial {\mathcal {F}}_{0,\infty }\), \(\partial {\mathcal {F}}_{0,L}=\left( \partial {\mathcal {F}}_{1/L,\infty }\right) ^{-1}\), and \(\partial {\mathcal {F}}_{\mu ,L}= \mu I+\partial {\mathcal {F}}_{0,L-\mu }\) are well known [66]. \(\square \)

SRG-full classes

Section 3.1 discussed how given an operator we can draw its SRG. Conversely, can we examine the SRG and conclude something about the operator? To perform this type of reasoning, we need further conditions.

We say a class of operators \({\mathcal {A}}\) is SRG-full if

$$\begin{aligned} A\in {\mathcal {A}}\quad \Leftrightarrow \quad {\mathcal {G}}(A)\subseteq {\mathcal {G}}({\mathcal {A}}). \end{aligned}$$

Since the implication \(A\in {\mathcal {A}}\Rightarrow {\mathcal {G}}(A)\subseteq {\mathcal {G}}({\mathcal {A}})\) already follows from the SRG’s definition, the substance of this definition is the implication \({\mathcal {G}}(A)\subseteq {\mathcal {G}}({\mathcal {A}})\Rightarrow A\in {\mathcal {A}}\). Essentially, a class is SRG-full if it can be fully characterized by its SRG; given an SRG-full class \({\mathcal {A}}\) and an operator A, we can check membership \(A\in {\mathcal {A}}\) by verifying (through geometric arguments) the containment \({\mathcal {G}}(A)\subseteq {\mathcal {G}}({\mathcal {A}})\) in the 2D plane.

SRG-fullness assumes the desirable property \({\mathcal {G}}(A)\subseteq {\mathcal {G}}({\mathcal {A}})\Rightarrow A\in {\mathcal {A}}\). We now discuss which classes possess this property.

Theorem 2

An operator class \({\mathcal {A}}\) is SRG-full if it is defined by

$$\begin{aligned} A\in {\mathcal {A}}\quad \Leftrightarrow \quad h\left( \Vert u-v\Vert ^2,\Vert x-y\Vert ^2,\langle u-v,x-y\rangle \right) \le 0, \quad \forall u\in Ax,\,v\in Ay \end{aligned}$$

for some nonnegative homogeneous function \(h:\mathbb {R}^3\rightarrow \mathbb {R}\).

To clarify, h is nonnegative homogeneous if \(\theta h(a,b,c)= h(\theta a,\theta b,\theta c)\) for all \(\theta \ge 0\). (We do not assume h is smooth.) When a class \({\mathcal {A}}\) is defined by h as in Theorem 2, we say h represents \({\mathcal {A}}\). For example, the \(\mu \)-strongly monotone class \({\mathcal {M}}_\mu \) is represented by \(h(a,b,c)=\mu b-c\), since

$$\begin{aligned} A\in {\mathcal {M}}_{\mu } \quad \Leftrightarrow \quad \mu \Vert x-y\Vert ^2\le \langle u-v,x-y\rangle ,\quad \forall u\in Ax,\,v\in Ay. \end{aligned}$$

As another example, the firmly nonexpansive class \({\mathcal {N}}_{1/2}\) is represented by \(h(a,b,c)=a-c\), since

$$\begin{aligned} A\in {\mathcal {N}}_{1/2} \quad \Leftrightarrow \quad \Vert u-v\Vert ^2\le \langle u-v,x-y\rangle ,\quad \forall u\in Ax,\,v\in Ay. \end{aligned}$$

By Theorem 2, the classes \({\mathcal {M}}\), \({\mathcal {M}}_\mu \), \({\mathcal {C}}_\beta \), \({\mathcal {L}}_L\), and \({\mathcal {N}}_\theta \) are all SRG-full. Respectively,

  • \({\mathcal {M}}\) is represented by \(h=-c\),

  • \({\mathcal {M}}_\mu \) is represented by \(h=\mu b-c\),

  • \({\mathcal {C}}_\beta \) is represented by \(h=\beta a-c\),

  • \({\mathcal {L}}_L\) is represented by \(h= a-L^2b\),

  • \({\mathcal {N}}_{\theta }\) is represented by \(h=a+(1-2\theta )b-2(1-\theta )c\).

If h and g represent SRG-full classes \({\mathcal {A}}\) and \({\mathcal {B}}\), then \(\max \{h,g\}\) represents \({\mathcal {A}}\cap {\mathcal {B}}\) and \(\min \{h,g\}\) represents \({\mathcal {A}}\cup {\mathcal {B}}\).
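The representing functions above are easy to experiment with numerically. The following is a minimal sketch, not part of the paper's development: the helper names (`h_strongly_monotone`, `is_nonneg_homogeneous`, `in_class_on_samples`, and so on) are hypothetical, and the sampling test can only refute membership on the sampled pairs, never prove it.

```python
import random

# Representing functions h(a, b, c), where for a pair of evaluations
# a = ||u - v||^2, b = ||x - y||^2, c = <u - v, x - y>.
def h_strongly_monotone(mu):
    return lambda a, b, c: mu * b - c

def h_cocoercive(beta):
    return lambda a, b, c: beta * a - c

def h_averaged(theta):
    return lambda a, b, c: a + (1 - 2 * theta) * b - 2 * (1 - theta) * c

def h_intersection(h, g):
    # max{h, g} <= 0 holds exactly when both h <= 0 and g <= 0.
    return lambda a, b, c: max(h(a, b, c), g(a, b, c))

def is_nonneg_homogeneous(h, trials=1000, tol=1e-9, seed=0):
    """Spot-check t * h(a, b, c) == h(t*a, t*b, t*c) for random t >= 0."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b, c = (rng.uniform(-5, 5) for _ in range(3))
        t = rng.uniform(0, 10)
        if abs(t * h(a, b, c) - h(t * a, t * b, t * c)) > tol * (1 + t):
            return False
    return True

def in_class_on_samples(h, op, dim=2, trials=1000, tol=1e-9, seed=1):
    """Check h(...) <= 0 on random pairs for a single-valued operator op."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-1, 1) for _ in range(dim)]
        y = [rng.uniform(-1, 1) for _ in range(dim)]
        u, v = op(x), op(y)
        du = [ui - vi for ui, vi in zip(u, v)]
        dx = [xi - yi for xi, yi in zip(x, y)]
        a = sum(d * d for d in du)
        b = sum(d * d for d in dx)
        c = sum(p * q for p, q in zip(du, dx))
        if h(a, b, c) > tol:
            return False
    return True

def half_identity(x):
    # The operator x -> x/2, used here only as a test case.
    return [0.5 * xi for xi in x]

h_both = h_intersection(h_strongly_monotone(0.5), h_cocoercive(2.0))
print(is_nonneg_homogeneous(h_both))
print(in_class_on_samples(h_both, half_identity))
print(in_class_on_samples(h_averaged(0.5), half_identity))
```

The operator \(x\mapsto x/2\) satisfies both the 1/2-strong monotonicity and the 2-cocoercivity inequalities with equality, which is why it passes the test for the intersection.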

On the other hand, the classes \(\partial {\mathcal {F}}_{0,\infty }\), \(\partial {\mathcal {F}}_{\mu ,\infty }\), \(\partial {\mathcal {F}}_{0,L}\), and \(\partial {\mathcal {F}}_{\mu ,L}\) are not SRG-full. For example, the operator

$$\begin{aligned} A(z_1,z_2)= \begin{bmatrix} 0&{}\quad -1\\ 1&{}\quad 0 \end{bmatrix} \begin{bmatrix} z_1\\ z_2 \end{bmatrix} \end{aligned}$$

satisfies \({\mathcal {G}}(A)=\{-i,i\}\subseteq {\mathcal {G}}( \partial {\mathcal {F}}_{0,\infty })\). However, \(A\notin \partial {\mathcal {F}}_{0,\infty }\) because there is no convex function f for which \(\nabla f=A\): the Jacobian DA is the matrix above, which is not symmetric.
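The SRG of this rotation can be checked numerically on sampled pairs. The sketch below is illustrative only; `srg_point` and `rotate90` are hypothetical helper names, and the code returns the representative with nonnegative imaginary part (the SRG also contains its conjugate).

```python
import cmath
import math
import random

def srg_point(x, y, u, v):
    """SRG point with Im >= 0 for evaluations (x, u) and (y, v), assuming x != y."""
    dx = [a - b for a, b in zip(x, y)]
    du = [a - b for a, b in zip(u, v)]
    nx = math.sqrt(sum(d * d for d in dx))
    nu = math.sqrt(sum(d * d for d in du))
    if nu == 0.0:
        return 0j
    cos_angle = sum(p * q for p, q in zip(du, dx)) / (nu * nx)
    angle = math.acos(max(-1.0, min(1.0, cos_angle)))   # angle in [0, pi]
    return (nu / nx) * cmath.exp(1j * angle)

def rotate90(z):
    # The matrix [[0, -1], [1, 0]] applied to the vector (z1, z2).
    return [-z[1], z[0]]

rng = random.Random(0)
points = []
for _ in range(5):
    x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    y = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    points.append(srg_point(x, y, rotate90(x), rotate90(y)))
print(points)   # every sampled point is numerically close to i
```

Every sampled point has zero real part, so the monotonicity inequality holds with equality even though the operator is not a gradient.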

Proof

Since \(A\in {\mathcal {A}}\Rightarrow {\mathcal {G}}(A)\subseteq {\mathcal {G}}({\mathcal {A}})\) always holds, we show \({\mathcal {G}}(A)\subseteq {\mathcal {G}}({\mathcal {A}})\Rightarrow A\in {\mathcal {A}}\). Assume \({\mathcal {A}}\) is represented by h and an operator \(A:\mathcal {H}\rightrightarrows \mathcal {H}\) satisfies \({\mathcal {G}}(A)\subseteq {\mathcal {G}}({\mathcal {A}})\). Let \(u_A\in Ax_A\) and \(v_A\in Ay_A\) represent distinct evaluations, i.e., \(x_A\ne y_A\) or \(u_A\ne v_A\).

First consider the case \(x_A\ne y_A\). Then

$$\begin{aligned} z=(\Vert u_A-v_A\Vert /\Vert x_A-y_A\Vert )\exp [i\angle (u_A-v_A,x_A-y_A)] \end{aligned}$$

satisfies \(z\in {\mathcal {G}}(A)\subseteq {\mathcal {G}}({\mathcal {A}})\). Since \(z\in {\mathcal {G}}({\mathcal {A}})\), there is an operator \(B\in {\mathcal {A}}\) such that \(u_B\in Bx_B\) and \(v_B\in By_B\) with

$$\begin{aligned} \frac{\Vert u_B-v_B\Vert ^2}{\Vert x_B-y_B\Vert ^2}=|z|^2,\quad \frac{\langle u_B-v_B,x_B-y_B\rangle }{\Vert x_B-y_B\Vert ^2}={\text {Re}}z. \end{aligned}$$

Since h represents \({\mathcal {A}}\), we have

$$\begin{aligned} 0\ge h\left( \Vert u_B-v_B\Vert ^2,\Vert x_B-y_B\Vert ^2,\langle u_B-v_B,x_B-y_B\rangle \right) , \end{aligned}$$

and homogeneity gives us

$$\begin{aligned} 0&\ge h\left( \frac{\Vert u_B-v_B\Vert ^2}{\Vert x_B-y_B\Vert ^2},1,\frac{\langle u_B-v_B,x_B-y_B\rangle }{\Vert x_B-y_B\Vert ^2}\right) \\&= h\left( |z|^2,1,{\text {Re}}z\right) = h\left( \frac{\Vert u_A-v_A\Vert ^2}{\Vert x_A-y_A\Vert ^2},1,\frac{\langle u_A-v_A,x_A-y_A\rangle }{\Vert x_A-y_A\Vert ^2}\right) . \end{aligned}$$

Finally, by homogeneity we have

$$\begin{aligned} h\left( \Vert u_A-v_A\Vert ^2,\Vert x_A-y_A\Vert ^2,\langle u_A-v_A,x_A-y_A\rangle \right) \le 0. \end{aligned}$$

Now consider the case \(x_A=y_A\) and \(u_A\ne v_A\). Then A is multi-valued and \(\infty \in {\mathcal {G}}(A)\subseteq {\mathcal {G}}({\mathcal {A}})\). Since \(\infty \in {\mathcal {G}}({\mathcal {A}})\), there is a multi-valued operator \(B\in {\mathcal {A}}\) such that \(u_B\in Bx_B\) and \(v_B\in Bx_B\) with \(u_B\ne v_B\). This implies \(h(\Vert u_B-v_B\Vert ^2,0,0)\le 0\). Therefore, by homogeneity, \(h(\Vert u_A-v_A\Vert ^2,0,0)\le 0\).

In conclusion, \((x_A,u_A)\) and \((y_A,v_A)\), which represent arbitrary evaluations of A, satisfy the inequality defined by h, and we conclude \(A\in {\mathcal {A}}\). \(\square \)

Operator and SRG transformations

In this section, we show how transformations of operators map to changes in their SRGs. We then use these results and geometric arguments to analyze convergence of various fixed-point iterations. The convergence analyses are tight in the sense that they cannot be improved without additional assumptions.

SRG intersection

Theorem 3

If \({\mathcal {A}}\) and \({\mathcal {B}}\) are SRG-full classes, then \({\mathcal {A}}\cap {\mathcal {B}}\) is SRG-full, and

$$\begin{aligned} {\mathcal {G}}({\mathcal {A}}\cap \mathcal {B})={\mathcal {G}}({\mathcal {A}})\cap {\mathcal {G}}(\mathcal {B}). \end{aligned}$$

The containment \({\mathcal {G}}({\mathcal {A}}\cap \mathcal {B})\subseteq {\mathcal {G}}({\mathcal {A}})\cap {\mathcal {G}}(\mathcal {B})\) holds regardless of SRG-fullness since, by definition, \({\mathcal {G}}({\mathcal {A}}\cap {\mathcal {B}})= \{{\mathcal {G}}(A)\,|\, A\in {\mathcal {A}},\, A\in {\mathcal {B}}\}\) and \({\mathcal {G}}({\mathcal {A}}) \cap {\mathcal {G}}({\mathcal {B}})= \{{\mathcal {G}}(A) \cap {\mathcal {G}}(B)\,|\, A\in {\mathcal {A}},\, B\in {\mathcal {B}}\}\). Therefore, the substance of Theorem 3 is \({\mathcal {G}}({\mathcal {A}}\cap \mathcal {B})\supseteq {\mathcal {G}}({\mathcal {A}})\cap {\mathcal {G}}(\mathcal {B})\). This result is useful for setups with multiple assumptions on a single operator such as Facts 5, 7, 12. A similar result holds with the union.

Proof

Since \({\mathcal {A}}\) and \({\mathcal {B}}\) are SRG-full,

$$\begin{aligned} {\mathcal {G}}(C)\subseteq {\mathcal {G}}({\mathcal {A}}\cap {\mathcal {B}})\subseteq {\mathcal {G}}({\mathcal {A}})\cap {\mathcal {G}}({\mathcal {B}}) \quad&\Rightarrow \quad {\mathcal {G}}(C)\subseteq {\mathcal {G}}({\mathcal {A}})\text { and }{\mathcal {G}}(C)\subseteq {\mathcal {G}}({\mathcal {B}})\\ \quad&\Rightarrow \quad C\in {\mathcal {A}}\text { and }C\in {\mathcal {B}}\\ \quad&\Rightarrow \quad C\in {\mathcal {A}}\cap {\mathcal {B}}\end{aligned}$$

for an operator C, and we conclude \({\mathcal {A}}\cap {\mathcal {B}}\) is SRG-full.

Assume \(z\in \mathbb {C}\) satisfies \(\{z,{\bar{z}}\}\subseteq {\mathcal {G}}({\mathcal {A}})\cap {\mathcal {G}}({\mathcal {B}})\). Then \(A_z\) of Lemma 1 satisfies \({\mathcal {G}}(A_z)=\{z,{\bar{z}}\}\subseteq {\mathcal {G}}({\mathcal {A}})\cap {\mathcal {G}}({\mathcal {B}})\). Since \({\mathcal {A}}\) and \({\mathcal {B}}\) are SRG-full, \(A_z\in {\mathcal {A}}\) and \(A_z\in {\mathcal {B}}\) and \(\{z,{\bar{z}}\}={\mathcal {G}}(A_z)\subseteq {\mathcal {G}}({\mathcal {A}}\cap {\mathcal {B}})\). If \(\infty \in {\mathcal {G}}({\mathcal {A}})\cap {\mathcal {G}}({\mathcal {B}})\), then a similar argument using \(A_\infty \) of Lemma 1 proves \(\infty \in {\mathcal {G}}({\mathcal {A}}\cap {\mathcal {B}})\). Therefore \({\mathcal {G}}({\mathcal {A}})\cap {\mathcal {G}}(\mathcal {B})\subseteq {\mathcal {G}}({\mathcal {A}}\cap \mathcal {B})\). Since the other containment \({\mathcal {G}}({\mathcal {A}}\cap \mathcal {B})\subseteq {\mathcal {G}}({\mathcal {A}})\cap {\mathcal {G}}(\mathcal {B})\) holds by definition, we have the equality. \(\square \)

SRG scaling and translation

Theorem 4

Let \(\alpha \in \mathbb {R}\) and \(\alpha \ne 0\). If \({\mathcal {A}}\) is a class of operators, then

$$\begin{aligned} {\mathcal {G}}(\alpha {\mathcal {A}})={\mathcal {G}}({\mathcal {A}}\alpha )=\alpha {\mathcal {G}}({\mathcal {A}}),\qquad {\mathcal {G}}(I+ {\mathcal {A}})=1+{\mathcal {G}}({\mathcal {A}}). \end{aligned}$$

If \({\mathcal {A}}\) is furthermore SRG-full, then \(\alpha {\mathcal {A}}\), \({\mathcal {A}}\alpha \), and \(I+ {\mathcal {A}}\) are SRG-full.

Proof

\({\mathcal {G}}(\alpha A)=\alpha {\mathcal {G}}(A)\) and \({\mathcal {G}}(A\alpha )=\alpha {\mathcal {G}}(A)\) follow from the definition of the SRG, and \({\mathcal {G}}(I+ A)=1+{\mathcal {G}}(A)\) follows from (1). The scaling and translation operations are reversible: \({\mathcal {G}}((1/\alpha ){\mathcal {A}})={\mathcal {G}}({\mathcal {A}}(1/\alpha ))=(1/\alpha ){\mathcal {G}}({\mathcal {A}})\) and \({\mathcal {G}}({\mathcal {A}}-I)={\mathcal {G}}({\mathcal {A}})-1\). For any \(B:\mathcal {H}\rightrightarrows \mathcal {H}\),

$$\begin{aligned} {\mathcal {G}}(B)\subseteq {\mathcal {G}}(\alpha {\mathcal {A}}) \quad \Rightarrow \quad {\mathcal {G}}((1/\alpha )B)\subseteq {\mathcal {G}}({\mathcal {A}}) \quad \Rightarrow \quad (1/\alpha )B\in {\mathcal {A}}\quad \Rightarrow \quad B\in \alpha {\mathcal {A}}, \end{aligned}$$

and we conclude \(\alpha {\mathcal {A}}\) is SRG-full. By a similar reasoning, \({\mathcal {A}}\alpha \) and \(I+ {\mathcal {A}}\) are SRG-full. \(\square \)

Since a class of operators can consist of a single operator, if \(A:\mathcal {H}\rightrightarrows \mathcal {H}\), then

$$\begin{aligned} {\mathcal {G}}(\alpha A)={\mathcal {G}}(A\alpha )=\alpha {\mathcal {G}}(A),\qquad {\mathcal {G}}(I+ A)=1+{\mathcal {G}}(A). \end{aligned}$$

To clarify, \(\alpha {\mathcal {G}}(A)\) corresponds to scaling \({\mathcal {G}}(A)\subseteq \overline{\mathbb {C}}\) by \(|\alpha |\) and reflecting about the vertical axis (imaginary axis) if \(\alpha <0\).
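The scaling and translation rules can be spot-checked on a linear operator. This is a minimal sketch under stated assumptions: the matrix `M`, the value of `alpha`, and the helper names are hypothetical choices for illustration, and only the upper-half-plane representative of each SRG point is compared.

```python
import math
import random

def srg_point(dx, du):
    """SRG point with Im >= 0 for input difference dx (nonzero) and output difference du."""
    nx2 = sum(d * d for d in dx)
    re = sum(p * q for p, q in zip(du, dx)) / nx2        # <du, dx> / ||dx||^2
    perp = [p - re * q for p, q in zip(du, dx)]          # part of du orthogonal to dx
    im = math.sqrt(sum(d * d for d in perp) / nx2)
    return complex(re, im)

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def upper(z):
    # The SRG is symmetric about the real axis; compare representatives with Im >= 0.
    return complex(z.real, abs(z.imag))

M = [[1.0, 2.0, 0.0], [0.0, 1.0, -1.0], [3.0, 0.0, 0.5]]   # arbitrary example matrix
alpha = -1.5
rng = random.Random(0)
max_err = 0.0
for _ in range(1000):
    dx = [rng.uniform(-1, 1) for _ in range(3)]
    du = matvec(M, dx)                                   # A is linear, so Ax - Ay = M(x - y)
    z = srg_point(dx, du)
    z_scaled = srg_point(dx, [alpha * d for d in du])    # operator alpha * A
    z_shifted = srg_point(dx, [p + q for p, q in zip(dx, du)])   # operator I + A
    max_err = max(max_err,
                  abs(z_scaled - upper(alpha * z)),
                  abs(z_shifted - (1 + z)))
print(max_err)   # expected to be at the level of floating-point rounding
```

With a negative `alpha`, the real part changes sign while the upper-half representative keeps a nonnegative imaginary part, which is the reflection about the imaginary axis described above.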

Convergence analysis: gradient descent

Consider the optimization problem

$$\begin{aligned} \begin{array}{ll} \underset{x\in {\mathcal {H}}}{\text{ minimize }}&f(x), \end{array} \end{aligned}$$
(2)

where f is a differentiable function with a minimizer. Consider gradient descent [16]

$$\begin{aligned} x^{k+1}=x^k-\alpha \nabla f(x^k) \end{aligned}$$
(GD)

where \(\alpha >0\) and \(x^0\in \mathcal {H}\) is a starting point. We can use the Krasnosel’skiĭ–Mann theorem to establish convergence of (GD).

Fact 1

Assume f is convex and L-smooth with \(L>0\). For \(\alpha \in (0,2/L)\), the iterates of (GD) converge in that \(x^k\rightarrow x^\star \) weakly for some \(x^\star \) such that \(\nabla f(x^\star )=0\).

Proof

By Propositions 1 and 2 and Theorem 4, we have the geometry

figure c

Since \({\mathcal {N}}_\theta \) is SRG-full by Theorem 2, the containment of the SRG in \(\overline{\mathbb {C}}\) is equivalent to the containment of the class. Therefore \(I-\alpha \nabla f\) is averaged, and the iteration converges by the Krasnosel’skiĭ–Mann theorem. \(\square \)

With stronger assumptions, we can establish an exponential rate of convergence for (GD).

Fact 2

Assume f is \(\mu \)-strongly convex and L-smooth with \(0<\mu<L<\infty \). For \(\alpha \in (0,2/L)\), the iterates of (GD) converge exponentially to the minimizer \(x^\star \) with rate

$$\begin{aligned} \Vert x^k-x^\star \Vert \le \left( \max \{|1-\alpha \mu |,|1-\alpha L|\}\right) ^k \Vert x^0-x^\star \Vert . \end{aligned}$$

Proof

This follows from Fact 3, which we state and prove below. \(\square \)

Fact 3

Let \(0<\mu<L<\infty \) and \(\alpha \in (0,\infty )\). If \({\mathcal {A}}= \partial {\mathcal {F}}_{\mu ,L}\), then \(I-\alpha {\mathcal {A}}\subseteq {\mathcal {L}}_R\) for

$$\begin{aligned} R= \max \{|1-\alpha \mu |,|1-\alpha L|\}. \end{aligned}$$

This result is tight in the sense that \(I-\alpha {\mathcal {A}}\nsubseteq {\mathcal {L}}_R\) for any smaller value of R.

Proof

By Proposition 2 and Theorem 4, we have the geometry

figure d

The containment of \({\mathcal {G}}(I-\alpha {\mathcal {A}})\) holds for R and fails for smaller R. Since \({\mathcal {L}}_R\) is SRG-full by Theorem 2, the containment of the SRG in \(\overline{\mathbb {C}}\) is equivalent to the containment of the class. \(\square \)
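The rate of Fact 2 can be observed on a simple quadratic. The sketch below is an illustration, not the paper's experiment: the function \(f(x)=(\mu x_1^2+Lx_2^2)/2\), the parameter values, and the variable names are assumptions chosen so that the extreme curvatures \(\mu \) and L are both present.

```python
import math

mu, L = 0.5, 4.0
alpha = 0.3                       # a step size in (0, 2/L) = (0, 0.5)
R = max(abs(1 - alpha * mu), abs(1 - alpha * L))

def grad_f(x):
    # Gradient of f(x) = (mu * x[0]**2 + L * x[1]**2) / 2, minimized at the origin.
    return [mu * x[0], L * x[1]]

def norm(x):
    return math.hypot(x[0], x[1])

# Check ||x^k - x*|| <= R^k ||x^0 - x*|| along the gradient descent iterates.
x = [1.0, 1.0]
x0_norm = norm(x)
bound_holds = True
for k in range(1, 51):
    g = grad_f(x)
    x = [x[0] - alpha * g[0], x[1] - alpha * g[1]]
    bound_holds = bound_holds and norm(x) <= R**k * x0_norm * (1 + 1e-12)

# Tightness: starting along the slowest direction contracts by exactly R per step.
y = [1.0, 0.0]
g = grad_f(y)
y = [y[0] - alpha * g[0], y[1] - alpha * g[1]]
one_step_ratio = norm(y)

print(R, bound_holds, one_step_ratio)
```

For these values the slow direction is the one with curvature \(\mu \); for step sizes closer to 2/L the factor \(|1-\alpha L|\) becomes the larger of the two.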

Convergence analysis: forward step method

Consider the monotone inclusion problem

$$\begin{aligned} \text{ find } x\in {\mathcal {H}}\quad \text{ such } \text{ that } \quad 0\in Ax \end{aligned}$$

where A is a maximal monotone operator with a zero. Consider the forward step method [15]

$$\begin{aligned} x^{k+1}=x^k-\alpha A(x^k) \end{aligned}$$
(FS)

where \(\alpha >0\) and \(x^0\in \mathcal {H}\) is a starting point. The forward step method is analogous to gradient descent. Under the following two setups, (FS) converges exponentially.

Fact 4

Assume A is \(\mu \)-strongly monotone and L-Lipschitz with \(0<\mu<L<\infty \). For \(\alpha \in (0,2\mu /L^2)\), the iterates of (FS) converge exponentially to the zero \(x^\star \) with rate

$$\begin{aligned} \Vert x^k-x^\star \Vert \le \left( 1-2\alpha \mu +\alpha ^2 L^2\right) ^{k/2} \Vert x^0-x^\star \Vert . \end{aligned}$$

Proof

This follows from Fact 5, which we state and prove below. \(\square \)

Fact 5

(Proposition 26.16 [5]) Let \(0<\mu<L<\infty \) and \(\alpha \in (0,\infty )\). If \({\mathcal {A}} =\mathcal {M}_\mu \cap \mathcal {L}_L\), then \(I-\alpha {\mathcal {A}}\subseteq {\mathcal {L}}_R\) for

$$\begin{aligned} R= \sqrt{1-2\alpha \mu +\alpha ^2 L^2}. \end{aligned}$$

This result is tight in the sense that \(I-\alpha {\mathcal {A}}\nsubseteq {\mathcal {L}}_R\) for any smaller value of R.

Proof

First consider the case \(\alpha \mu >1\). By Proposition 1 and Theorem 4, we have the geometry

figure e

To clarify, O is the center of the circle with radius \({\overline{OC}}\) (lighter shade) and A is the center of the circle with radius \({\overline{AC}}={\overline{AD}}\) defining the inner region (darker shade). With 2 applications of the Pythagorean theorem, we get

$$\begin{aligned} {\overline{OC}}^2&={\overline{CB}}^2+{\overline{BO}}^2 = {\overline{AC}}^2-{\overline{BA}}^2+ {\overline{BO}}^2\\&=(\alpha L)^2-(\alpha \mu )^2+(1-\alpha \mu )^2=1-2\alpha \mu +\alpha ^2L^2. \end{aligned}$$

Since \(\overline{C'C}\) is a chord of circle O, it is within the circle. Since 2 non-identical circles intersect at no more than 2 points, and since D is within circle O, the arc \(\overset{\frown }{CDC'}\) is within circle O. Finally, the region bounded by \(\overline{C'C}\cup \overset{\frown }{CDC'}\) (darker shade) is within circle O (lighter shade).

The previous diagram illustrates the case \(\alpha \mu >1\). In the cases \(\alpha \mu =1\) and \(\alpha \mu < 1\), we have a slightly different geometry, but the same arguments and calculations hold.

figure f

The containment holds for R and fails for smaller R. Since \({\mathcal {L}}_R\) is SRG-full by Theorem 2, the containment of the SRG in \(\overline{\mathbb {C}}\) is equivalent to the containment of the class. \(\square \)
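Tightness of the radius R can be illustrated with a single linear operator on \(\mathbb {R}^2\). This is a hedged sketch: the operator \(\mu I+sJ\) with \(s=\sqrt{L^2-\mu ^2}\) (J being the 90-degree rotation) and the parameter values are assumptions chosen for illustration, not an example taken from the paper.

```python
import math

mu, L = 1.0, 2.0
alpha = 0.3                              # a step size for which R < 1
s = math.sqrt(L**2 - mu**2)

def A(x):
    # Linear operator mu*I + s*J, with J the 90-degree rotation.
    # <Ax, x> = mu*||x||^2 and ||Ax|| = L*||x||, so A is
    # mu-strongly monotone and L-Lipschitz.
    return [mu * x[0] - s * x[1], s * x[0] + mu * x[1]]

R = math.sqrt(1 - 2 * alpha * mu + alpha**2 * L**2)

# Run the forward step method and record the per-step contraction ratio.
x = [1.0, 0.0]
ratios = []
for _ in range(20):
    Ax = A(x)
    x_next = [x[0] - alpha * Ax[0], x[1] - alpha * Ax[1]]
    ratios.append(math.hypot(*x_next) / math.hypot(*x))
    x = x_next

print(R, min(ratios), max(ratios))   # all three agree up to rounding
```

Since the zero of this operator is the origin, each ratio is the contraction factor of one step of (FS), and it matches R exactly, so no smaller Lipschitz constant is possible for the class.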

Fact 6

Assume A is \(\mu \)-strongly monotone and \(\beta \)-cocoercive with \(0<\mu<1/\beta <\infty \). For \(\alpha \in (0,2\beta )\), the iterates of (FS) converge exponentially to the zero \(x^\star \) with rate

$$\begin{aligned} \Vert x^k-x^\star \Vert \le \left( 1-2\alpha \mu +\alpha ^2\mu /\beta \right) ^{k/2} \Vert x^0-x^\star \Vert . \end{aligned}$$

Proof

This follows from Fact 7 below. \(\square \)

Fact 7

Let \(0<\mu<1/\beta <\infty \) and \(\alpha \in (0,2\beta )\). If \({\mathcal {A}}=\mathcal {M}_\mu \cap \mathcal {C}_\beta \), then \(I-\alpha {\mathcal {A}}\subseteq {\mathcal {L}}_R\) for

$$\begin{aligned} R= \sqrt{1-2\alpha \mu +\alpha ^2\mu /\beta }. \end{aligned}$$

This result is tight in the sense that \(I-\alpha {\mathcal {A}}\nsubseteq {\mathcal {L}}_R\) for any smaller value of R.

Proof outline  We quickly outline the geometric insight while deferring the full proof with precise geometric arguments to Sect. C of the appendix. For the case \(\mu <1/(2\beta )\), we have the geometry

figure g

where the calculations involve the use of the Pythagorean theorem. In the cases \(\mu =1/(2\beta )\) and \(\mu >1/(2\beta )\), we have a slightly different geometry, but the same arguments and calculations hold. \(\square \)
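As a concrete check of tightness (an illustrative sketch, not part of the original argument), assume \(\mathcal {H}=\mathbb {R}^2\) and consider the linear operator \(A=\mu I+sJ\), where J is the rotation by 90 degrees and \(s=\sqrt{\mu /\beta -\mu ^2}\), which is real since \(\mu <1/\beta \). The symbols s and J are introduced here only for illustration. For any \(d=x-y\), using \(\langle Jd,d\rangle =0\) and \(\Vert Jd\Vert =\Vert d\Vert \),

$$\begin{aligned} \langle Ad,d\rangle&=\mu \Vert d\Vert ^2,\qquad \Vert Ad\Vert ^2=(\mu ^2+s^2)\Vert d\Vert ^2=\frac{\mu }{\beta }\Vert d\Vert ^2,\\ \Vert (I-\alpha A)d\Vert ^2&=\left( (1-\alpha \mu )^2+\alpha ^2s^2\right) \Vert d\Vert ^2=\left( 1-2\alpha \mu +\alpha ^2\mu /\beta \right) \Vert d\Vert ^2. \end{aligned}$$

The first line shows \(A\in \mathcal {M}_\mu \cap \mathcal {C}_\beta \), with both defining inequalities holding with equality. The second line shows that \(I-\alpha A\) has Lipschitz constant exactly R, so no smaller value works for this class.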

SRG inversion

In this subsection, we relate inversion of operators with inversion (reciprocal) of complex numbers. This operation is intimately connected to inversive geometry.

Operator inversion

Theorem 5

If \({\mathcal {A}}\) is a class of operators, then

$$\begin{aligned} {\mathcal {G}}({\mathcal {A}}^{-1})=\left( {\mathcal {G}}({\mathcal {A}})\right) ^{-1}. \end{aligned}$$

If \({\mathcal {A}}\) is furthermore SRG-full, then \({\mathcal {A}}^{-1}\) is SRG-full.

Since a class of operators can consist of a single operator, if \(A:\mathcal {H}\rightrightarrows \mathcal {H}\), then \({\mathcal {G}}(A^{-1})=({\mathcal {G}}(A))^{-1}\). To clarify, \(({\mathcal {G}}({\mathcal {A}}))^{-1}=\{z^{-1}\,|\,z\in {\mathcal {G}}({\mathcal {A}})\}\subseteq \overline{\mathbb {C}}\). Note that \(({\mathcal {G}}({\mathcal {A}}))^{-1}=(\overline{{\mathcal {G}}({\mathcal {A}})})^{-1}\), since \({\mathcal {G}}({\mathcal {A}})\) is symmetric about the real axis, so we write the simpler \(({\mathcal {G}}({\mathcal {A}}))^{-1}\) even though the inversion map we consider is \(z\mapsto {\bar{z}}^{-1}\).

Proof

The equivalence of non-zero finite points, i.e.,

$$\begin{aligned} {\mathcal {G}}(A^{-1})\backslash \{0,\infty \}=\left( {\mathcal {G}}(A) \backslash \, \{0,\infty \}\right) ^{-1}, \end{aligned}$$

follows from

$$\begin{aligned} \mathcal {G}(A)\backslash \{0,\infty \}&= \left\{ \frac{\Vert u-v\Vert }{\Vert x-y\Vert } \exp \left[ \pm i \angle (u-v,x-y)\right] \,\Big |\, (x,u),(y,v)\in A,\, x\ne y,\,u\ne v \right\} \end{aligned}$$

and

$$\begin{aligned}&\mathcal {G}(A^{-1})\backslash \{0,\infty \}\\&\qquad = \left\{ \frac{\Vert x-y\Vert }{\Vert u-v\Vert } \exp \left[ \pm i \angle (x-y,u-v)\right] \,\Big |\, (u,x),(v,y)\in A^{-1},\, x\ne y,\,u\ne v \right\} \\&\qquad = \left\{ \frac{\Vert x-y\Vert }{\Vert u-v\Vert } \exp \left[ \pm i \angle (u-v,x-y)\right] \,\Big |\, (x,u),(y,v)\in A,\, x\ne y,\,u\ne v \right\} \\&\qquad =\left( {\mathcal {G}}(A)\backslash \{0,\infty \}\right) ^{-1} \end{aligned}$$

where we use the fact that \(\angle (a,b)=\angle (b,a)\).

The equivalence of the zero and infinite points follows from

$$\begin{aligned} \infty \in {\mathcal {G}}(A)&\quad \Leftrightarrow \quad \exists \, (x,u),(x,v)\in A,\,u\ne v\\&\quad \Leftrightarrow \quad \exists \, (u,x),(v,x)\in A^{-1},\,u\ne v\\&\quad \Leftrightarrow \quad 0\in {\mathcal {G}}(A^{-1}). \end{aligned}$$

With the same argument, we have \(0\in {\mathcal {G}}(A) \Leftrightarrow \infty \in {\mathcal {G}}(A^{-1})\).

The inversion operation is reversible. For any \(B:\mathcal {H}\rightrightarrows \mathcal {H}\),

$$\begin{aligned} {\mathcal {G}}(B)\subseteq {\mathcal {G}}( {\mathcal {A}}^{-1}) \quad \Rightarrow \quad {\mathcal {G}}(B^{-1})\subseteq {\mathcal {G}}({\mathcal {A}}) \quad \Rightarrow \quad B^{-1}\in {\mathcal {A}}\quad \Rightarrow \quad B\in {\mathcal {A}}^{-1}, \end{aligned}$$

and we conclude \({\mathcal {A}}^{-1}\) is SRG-full. \(\square \)
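The inversion rule can be spot-checked numerically. The sketch below assumes an invertible linear operator given by a hypothetical matrix `M`; the graph of \(A^{-1}\) is obtained by swapping inputs and outputs, so no explicit matrix inverse is needed. Only the representative with nonnegative imaginary part is compared, for which \(z\mapsto {\bar{z}}^{-1}\) is the relevant map.

```python
import math
import random

def srg_point(dx, du):
    """SRG point with Im >= 0 for input difference dx (nonzero) and output difference du."""
    nx2 = sum(d * d for d in dx)
    re = sum(p * q for p, q in zip(du, dx)) / nx2        # <du, dx> / ||dx||^2
    perp = [p - re * q for p, q in zip(du, dx)]          # part of du orthogonal to dx
    im = math.sqrt(sum(d * d for d in perp) / nx2)
    return complex(re, im)

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

M = [[2.0, -1.0], [1.0, 3.0]]        # invertible example matrix (determinant 7)
rng = random.Random(0)
max_err = 0.0
for _ in range(1000):
    dx = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    du = matvec(M, dx)
    z = srg_point(dx, du)            # point of G(A) from (x, u), (y, v)
    w = srg_point(du, dx)            # point of G(A^{-1}) from (u, x), (v, y)
    max_err = max(max_err, abs(w - 1 / z.conjugate()))
print(max_err)   # expected to be at the level of floating-point rounding
```

The check uses exactly the symmetry \(\angle (a,b)=\angle (b,a)\) from the proof: swapping the roles of the differences keeps the angle and inverts the modulus.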

Convergence analysis: proximal point

Consider the monotone inclusion problem

$$\begin{aligned} \text{ find } x\in {\mathcal {H}}\quad \text{ such } \text{ that } \quad 0\in Ax \end{aligned}$$

where A is a maximal monotone operator with a zero. Consider the proximal point method [13, 47, 48, 62]

$$\begin{aligned} x^{k+1}=J_{\alpha A}x^k, \end{aligned}$$
(PP)

where \(\alpha >0\) and \(x^0\in \mathcal {H}\) is a starting point. Since \(J_{\alpha A}\) is 1/2-averaged, we can use the Krasnosel’skiĭ–Mann theorem to establish convergence of (PP). Under stronger assumptions, (PP) converges exponentially.

Fact 8

Assume A is \(\mu \)-strongly monotone with \(\mu >0\). For \(\alpha >0\), the iterates of (PP) converge exponentially to the zero \(x^\star \) with rate

$$\begin{aligned} \Vert x^k-x^\star \Vert \le \left( \frac{1}{1+\alpha \mu }\right) ^k \Vert x^0-x^\star \Vert . \end{aligned}$$

Proof

This follows from Fact 9, which we state and prove below. \(\square \)

Fact 9

(Proposition 23.13 [5]) Let \(\mu \in (0,\infty )\) and \(\alpha \in (0,\infty )\). If \({\mathcal {A}}=\mathcal {M}_\mu \), then \(J_{\alpha {\mathcal {A}}}\subseteq {\mathcal {L}}_R\) for

$$\begin{aligned} R= \frac{1}{1+\alpha \mu }. \end{aligned}$$

This result is tight in the sense that \(J_{\alpha {\mathcal {A}}}\nsubseteq {\mathcal {L}}_R\) for any smaller value of R.

Proof

By Proposition 1 and Theorems 4 and 5, we have the geometry

figure h

The containment holds for R and fails for smaller R. Since \({\mathcal {L}}_R\) is SRG-full by Theorem 2, the containment of the SRG in \(\overline{\mathbb {C}}\) is equivalent to the containment of the class. \(\square \)
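The chain of SRG transformations behind this geometry can be written out explicitly. This is a sketch that assumes the half-plane description \({\mathcal {G}}(\mathcal {M}_\mu )=\{z\,|\,{\text {Re}}z\ge \mu \}\cup \{\infty \}\) from Proposition 1 and uses \(J_{\alpha A}=(I+\alpha A)^{-1}\):

$$\begin{aligned} {\mathcal {G}}(I+\alpha \mathcal {M}_\mu )&=\{z\,|\,{\text {Re}}z\ge 1+\alpha \mu \}\cup \{\infty \},\\ {\mathcal {G}}(J_{\alpha \mathcal {M}_\mu })&=\left\{ w\,\Big |\,\left| w-\frac{1}{2(1+\alpha \mu )}\right| \le \frac{1}{2(1+\alpha \mu )}\right\} . \end{aligned}$$

The first line is Theorem 4 and the second is Theorem 5: for \(w={\bar{z}}^{-1}\), the condition \({\text {Re}}z={\text {Re}}w/|w|^2\ge 1+\alpha \mu \) is equivalent to \((1+\alpha \mu )|w|^2\le {\text {Re}}w\), and \(w=0\) is the image of \(\infty \). This disk passes through 0 and \(1/(1+\alpha \mu )\), so it lies in the disk of radius \(R=1/(1+\alpha \mu )\) centered at the origin and in no smaller one.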

Convergence analysis: Douglas–Rachford

Consider the monotone inclusion problem

$$\begin{aligned} \text{ find } x\in {\mathcal {H}}\quad \text{ such } \text{ that } \quad 0\in (A+B)x, \end{aligned}$$

where A and B are operators and \(A+B\) has a zero. Consider Douglas–Rachford splitting [22, 44]

$$\begin{aligned} z^{k+1}=\left( \tfrac{1}{2}I+\tfrac{1}{2}(2J_{\alpha A}-I)(2J_{\alpha B}-I)\right) z^k, \end{aligned}$$
(DR)

where \(\alpha >0\) and \(z^0\in \mathcal {H}\) is a starting point. If \(z^\star \) is a fixed point, then \(J_{\alpha B}(z^\star )\) is a zero of \(A+B\) (see tutorial [63, p. 28] or textbook [5, Proposition 26.1]). We can use the Krasnosel’skiĭ–Mann theorem to establish convergence of (DR).

Fact 10

(Theorem 1 [44]) Assume A and B are maximal monotone. For \(\alpha >0\), the iterates of (DR) converge in that \(z^k\rightarrow z^\star \) weakly for some fixed point \(z^\star \).

Proof

By Proposition 1 and Theorems 4 and 5, we have the geometry

figure i

Since \({\mathcal {L}}_1\) is SRG-full, Theorem 2 implies \((2J_{\alpha A}-I)\) is nonexpansive. By the same reasoning, \((2J_{\alpha B}-I)\) is nonexpansive, and, since the composition of nonexpansive operators is nonexpansive, \((2J_{\alpha A}-I)(2J_{\alpha B}-I)\) is nonexpansive. So (DR) is a fixed-point iteration with a 1/2-averaged operator, and the iteration converges by the Krasnosel’skiĭ–Mann theorem. \(\square \)

When we have further assumptions, we can provide a stronger rate of convergence.

Fact 11

Assume A or B is \(\mu \)-strongly monotone and \(\beta \)-cocoercive with \(0<\mu<1/\beta <\infty \). For \(\alpha >0\), the iterates of (DR) converge exponentially to the fixed point \(z^\star \) with rate

$$\begin{aligned} \Vert z^k-z^\star \Vert \le \left( \frac{1}{2}+\frac{1}{2} \sqrt{1-\frac{4\alpha \mu }{1+2\alpha \mu +\alpha ^2\mu /\beta }}\right) ^k \Vert z^0-z^\star \Vert . \end{aligned}$$

Proof

If \(S_1\) is \(R_1\)-Lipschitz continuous and \(S_2\) is \(R_2\)-Lipschitz continuous, then \(S_1S_2\) is \((R_1R_2)\)-Lipschitz continuous. If S is R-Lipschitz continuous, then \(\frac{1}{2}I+\frac{1}{2}S\) is \(\left( \frac{1}{2}+\frac{R}{2}\right) \)-Lipschitz continuous. The result follows from these observations and Fact 12, which we state and prove below. \(\square \)

Fact 12

(Theorem 7.2 [28]) Let \(0<\mu<1/\beta <\infty \) and \(\alpha \in (0,\infty )\). If \({\mathcal {A}}= {\mathcal {M}}_\mu \cap {\mathcal {C}}_\beta \), then \(2J_{\alpha {\mathcal {A}}}-I\subseteq {\mathcal {L}}_R\) for

$$\begin{aligned} R=\sqrt{1-\frac{4\alpha \mu }{1+2\alpha \mu +\alpha ^2\mu /\beta }}. \end{aligned}$$

This result is tight in the sense that \(2J_{\alpha {\mathcal {A}}}-I\nsubseteq {\mathcal {L}}_R\) for any smaller value of R.

Proof outline  We quickly outline the geometric insight while deferring the full proof with precise geometric arguments to Sect. C of the appendix. We have the geometry

figure j

The radius R is obtained with Stewart’s theorem [65]. \(\square \)
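The value of R can be compared against a concrete operator. The sketch below is an illustration under stated assumptions: it identifies \(\mathbb {R}^2\) with \(\mathbb {C}\) and takes A to be multiplication by the complex number \(a=\mu +is\) with \(s=\sqrt{\mu /\beta -\mu ^2}\), which satisfies the \(\mu \)-strong monotonicity and \(\beta \)-cocoercivity inequalities with equality. The parameter values are arbitrary.

```python
import math

mu, beta, alpha = 0.5, 1.0, 2.0
s = math.sqrt(mu / beta - mu**2)

a = complex(mu, s)                  # A acts on R^2 ~ C as multiplication by a
resolvent = 1 / (1 + alpha * a)     # J_{alpha A} = (I + alpha A)^{-1} as a multiplier
reflected = 2 * resolvent - 1       # 2 J_{alpha A} - I as a multiplier

R = math.sqrt(1 - 4 * alpha * mu / (1 + 2 * alpha * mu + alpha**2 * mu / beta))

# For a complex multiplier, the Lipschitz constant is its modulus.
print(abs(reflected), R)            # the two values agree up to rounding
```

Because a linear operator of this form attains the bound, the contraction factor R cannot be lowered without further assumptions.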

As a special case, consider the optimization problem

$$\begin{aligned} \begin{array}{ll} \underset{x\in {\mathcal {H}}}{\text{ minimize }}&f(x)+g(x), \end{array} \end{aligned}$$

where f and g are functions (not necessarily differentiable) and a minimizer exists. Then (DR) with \(A=\partial f\) and \(B=\partial g\) can be written as

$$\begin{aligned} x^{k+1/2}&=J_{\alpha \partial g}(z^k)\\ x^{k+1}&=J_{\alpha \partial f}(2x^{k+1/2}-z^k)\\ z^{k+1}&=z^k+x^{k+1}-x^{k+1/2}, \end{aligned}$$

where \(\alpha >0\) and \(z^0\in \mathcal {H}\) is a starting point. As an aside, the popular method ADMM is equivalent to this instance of Douglas–Rachford splitting [26].
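The three-line iteration above can be run directly on a small example. This is a minimal sketch, assuming the scalar problem \(f(x)=(x-a)^2/2\) and \(g(x)=\lambda |x|\); the function names `prox_f`, `prox_g`, and `douglas_rachford` and all parameter values are hypothetical choices for illustration.

```python
def prox_f(v, alpha, a):
    # Resolvent of alpha * grad f for f(x) = (x - a)**2 / 2.
    return (v + alpha * a) / (1 + alpha)

def prox_g(v, alpha, lam):
    # Resolvent of alpha * subdifferential of g(x) = lam * |x| (soft-thresholding).
    t = alpha * lam
    return max(v - t, 0.0) + min(v + t, 0.0)

def douglas_rachford(a, lam, alpha=1.0, z0=0.0, iters=200):
    z = z0
    for _ in range(iters):
        x_half = prox_g(z, alpha, lam)               # x^{k+1/2}
        x_next = prox_f(2 * x_half - z, alpha, a)    # x^{k+1}
        z = z + x_next - x_half                      # z^{k+1}
    return prox_g(z, alpha, lam)                     # recover the minimizer from z

a, lam = 3.0, 1.0
# Closed-form minimizer of (x - a)**2 / 2 + lam * |x|: soft-thresholding of a.
x_star = max(a - lam, 0.0) + min(a + lam, 0.0)
print(douglas_rachford(a, lam), x_star)
```

The solution is read off as \(J_{\alpha \partial g}(z^k)\) rather than \(z^k\) itself, consistent with the fixed-point characterization stated for (DR).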

Fact 13

Assume f is \(\mu \)-strongly convex and L-smooth with \(0<\mu<L<\infty \). Assume g is convex, lower semi-continuous, and proper. For \(\alpha >0\), the iterates of (DR) converge exponentially to the fixed point \(z^\star \) with rate

$$\begin{aligned} \Vert z^k-z^\star \Vert \le \left( \frac{1}{2}+\frac{1}{2} \max \left\{ \left| \frac{1-\alpha \mu }{1+\alpha \mu }\right| ,\left| \frac{1-\alpha L}{1+\alpha L}\right| \right\} \right) ^k \Vert z^0-z^\star \Vert \end{aligned}$$

Proof

If \(S_1\) is \(R_1\)-Lipschitz continuous and \(S_2\) is \(R_2\)-Lipschitz continuous, then \(S_1S_2\) is \((R_1R_2)\)-Lipschitz continuous. If S is R-Lipschitz continuous, then \(\frac{1}{2}I+\frac{1}{2}S\) is \(\left( \frac{1}{2}+\frac{R}{2}\right) \)-Lipschitz continuous. The result follows from these observations and Fact 14, which we state and prove below. \(\square \)

Fact 14

(Theorem 1 [29]) Let \(0<\mu<L<\infty \) and \(\alpha \in (0,\infty )\). If \({\mathcal {A}}= \partial {\mathcal {F}}_{\mu ,L}\), then \(2J_{\alpha {\mathcal {A}}}-I\subseteq {\mathcal {L}}_R\) for

$$\begin{aligned} R= \max \left\{ \left| \frac{1-\alpha \mu }{1+\alpha \mu }\right| ,\left| \frac{1-\alpha L}{1+\alpha L}\right| \right\} . \end{aligned}$$

This result is tight in the sense that \(2J_{\alpha {\mathcal {A}}}-I\nsubseteq {\mathcal {L}}_R\) for any smaller value of R.

Proof

By Proposition 2 and Theorems 4 and 5, we have the geometry

figure k

The containment holds for R and fails for smaller R. Since \({\mathcal {L}}_R\) is SRG-full by Theorem 2, the containment of the SRG in \(\overline{\mathbb {C}}\) is equivalent to the containment of the class. \(\square \)

Sum of operators

Fig. 6 The chord property

Given \(z,w\in \mathbb {C}\), define the line segment between z and w as

$$\begin{aligned}{}[z,w]=\{\theta z+(1-\theta )w\,|\,\theta \in [0,1]\}. \end{aligned}$$

We say an SRG-full class \({\mathcal {A}}\) satisfies the chord property if \(z\in {\mathcal {G}}({\mathcal {A}})\backslash \{\infty \}\) implies \([z,{\bar{z}}]\subseteq {\mathcal {G}}({\mathcal {A}})\). See Fig. 6.

Theorem 6

Let \({\mathcal {A}}\) and \({\mathcal {B}}\) be SRG-full classes such that \(\infty \notin {\mathcal {G}}({\mathcal {A}})\) and \(\infty \notin {\mathcal {G}}({\mathcal {B}})\). Then

$$\begin{aligned} {\mathcal {G}}({\mathcal {A}}+{\mathcal {B}})\supseteq {\mathcal {G}}({\mathcal {A}})+{\mathcal {G}}({\mathcal {B}}). \end{aligned}$$

If \({\mathcal {A}}\) or \({\mathcal {B}}\) furthermore satisfies the chord property, then

$$\begin{aligned} {\mathcal {G}}({\mathcal {A}}+{\mathcal {B}})={\mathcal {G}}({\mathcal {A}})+{\mathcal {G}}({\mathcal {B}}). \end{aligned}$$

Although we do not pursue this, one can generalize Theorem 6 to allow \(\infty \) by excluding the following exception: if \(\emptyset ={\mathcal {G}}({\mathcal {A}})\) and \(\infty \in {\mathcal {G}}({\mathcal {B}})\), then \(\{\infty \}={\mathcal {G}}({\mathcal {A}}+{\mathcal {B}})\).

Proof

We first show \({\mathcal {G}}({\mathcal {A}}+{\mathcal {B}})\supseteq {\mathcal {G}}({\mathcal {A}})+{\mathcal {G}}({\mathcal {B}})\). Assume \({\mathcal {G}}({\mathcal {A}})\ne \emptyset \) and \({\mathcal {G}}({\mathcal {B}})\ne \emptyset \) as otherwise there is nothing to show. Let \(z\in {\mathcal {G}}({\mathcal {A}})\) and \(w\in {\mathcal {G}}({\mathcal {B}})\) and let \(A_z\) and \(A_w\) be their corresponding operators as defined in Lemma 1. Then it is straightforward to see that \(A_z+A_w\) corresponds to complex multiplication with respect to \((z+w)\), and \(z+w\in {\mathcal {G}}(A_z+A_w)\subseteq {\mathcal {G}}({\mathcal {A}}+{\mathcal {B}})\).

Next, we show \({\mathcal {G}}({\mathcal {A}}+{\mathcal {B}})\subseteq {\mathcal {G}}({\mathcal {A}})+{\mathcal {G}}({\mathcal {B}})\). Consider the case \({\mathcal {G}}({\mathcal {A}})\ne \emptyset \) and \({\mathcal {G}}({\mathcal {B}})\ne \emptyset \). Without loss of generality, assume it is \({\mathcal {A}}\) that satisfies the chord property. Consider \(A+B\in {\mathcal {A}}+{\mathcal {B}}\) such that \(A\in {\mathcal {A}}\) and \(B\in {\mathcal {B}}\). Consider \((x,u_A+u_B),(y,v_A+v_B)\in A+B\) such that \(x\ne y\), \((x,u_A),(y,v_A)\in A\), and \((x,u_B),(y,v_B)\in B\). Define

$$\begin{aligned} z_A&=\frac{\Vert u_A-v_A\Vert }{\Vert x-y\Vert } \exp \left[ i\angle (u_A-v_A,x-y) \right] \in {\mathcal {G}}(A)\\ z_B&=\frac{\Vert u_B-v_B\Vert }{\Vert x-y\Vert } \exp \left[ i\angle (u_B-v_B,x-y) \right] \in {\mathcal {G}}(B)\\ z&=\frac{\Vert u_A+u_B-v_A-v_B\Vert }{\Vert x-y\Vert } \exp \left[ i\angle (u_A+u_B-v_A-v_B,x-y) \right] \in {\mathcal {G}}(A+B). \end{aligned}$$

(Note that \({\text {Im}}z_A,{\text {Im}}z_B,{\text {Im}}z\ge 0\).) Since

$$\begin{aligned} {\text {Re}}z_A=\frac{\langle u_A-v_A,x-y\rangle }{\Vert x-y\Vert ^2},\qquad {\text {Re}}z_B=\frac{\langle u_B-v_B,x-y\rangle }{\Vert x-y\Vert ^2},\\ {\text {Re}}z=\frac{\langle (u_A+u_B)-(v_A+v_B),x-y\rangle }{\Vert x-y\Vert ^2}, \end{aligned}$$

we have \({\text {Re}}z = {\text {Re}}z_A + {\text {Re}}z_B\). Using (1) and the triangle inequality, we have

$$\begin{aligned} {\text {Im}}z&=\frac{\Vert P_{\{x-y\}^\perp }(u_A+u_B-v_A-v_B)\Vert }{\Vert x-y\Vert }\\&\le \frac{\Vert P_{\{x-y\}^\perp }(u_A-v_A)\Vert +\Vert P_{\{x-y\}^\perp }(u_B-v_B)\Vert }{\Vert x-y\Vert }\\&={\text {Im}}z_A+{\text {Im}}z_B, \end{aligned}$$

and using the reverse triangle inequality, we have \({\text {Im}}z\ge -{\text {Im}}z_A+{\text {Im}}z_B\). Together, we conclude

$$\begin{aligned} -{\text {Im}}z_A+{\text {Im}}z_B\le {\text {Im}}z\le {\text {Im}}z_A+{\text {Im}}z_B \end{aligned}$$

and

$$\begin{aligned} z\in [z_A,\overline{z_A}]+z_B,\qquad {\overline{z}}\in [z_A,\overline{z_A}]+\overline{z_B}. \end{aligned}$$

This shows

$$\begin{aligned} {\mathcal {G}}({\mathcal {A}}+{\mathcal {B}})&\subseteq \left\{ w_A+z_B \,|\, w_A\in \left[ z_A,\overline{z_A}\right] ,\, z_A\in {\mathcal {G}}({\mathcal {A}}),\,z_B\in {\mathcal {G}}({\mathcal {B}}) \right\} \\&= \left\{ w_A+z_B \,|\, w_A\in {\mathcal {G}}({\mathcal {A}}),\,z_B\in {\mathcal {G}}({\mathcal {B}}) \right\} ={\mathcal {G}}({\mathcal {A}})+{\mathcal {G}}({\mathcal {B}}), \end{aligned}$$

where the equality follows from the chord property.

Now, consider the case \({\mathcal {G}}({\mathcal {A}})= \emptyset \) or \({\mathcal {G}}({\mathcal {B}})= \emptyset \) (or both). (We also discuss this degenerate case in Sect. A.3). Assume \({\mathcal {G}}({\mathcal {A}})= \emptyset \) without loss of generality and let \(A\in {\mathcal {A}}\) and \(B\in {\mathcal {B}}\). Then \({\mathrm {dom}(A)}\) is empty or a singleton, and if \(\{x\}={\mathrm {dom}(A)}\) then Ax is a singleton. Therefore \({\mathrm {dom}(A+B)}\subseteq {\mathrm {dom}(A)}\) is empty or a singleton, and if \(\{x\}={\mathrm {dom}(A)}\) then \((A+B)x\) is empty or a singleton since B is single-valued. Therefore, \({\mathcal {G}}(A+B)=\emptyset \) and we conclude \({\mathcal {G}}({\mathcal {A}}+{\mathcal {B}})=\emptyset \). \(\square \)
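The two inequalities at the heart of the proof, additivity of the real parts and the two-sided bound on the imaginary part, can be spot-checked on random differences in \(\mathbb {R}^3\). This is an illustrative sketch with hypothetical helper names; it samples output differences directly rather than constructing operators, and it checks the slightly stronger lower bound \(|{\text {Im}}z_A-{\text {Im}}z_B|\) given by the reverse triangle inequality.

```python
import math
import random

def srg_point(dx, du):
    """SRG point with Im >= 0 for input difference dx (nonzero) and output difference du."""
    nx2 = sum(d * d for d in dx)
    re = sum(p * q for p, q in zip(du, dx)) / nx2        # <du, dx> / ||dx||^2
    perp = [p - re * q for p, q in zip(du, dx)]          # projection onto {dx}^perp
    im = math.sqrt(sum(d * d for d in perp) / nx2)
    return complex(re, im)

rng = random.Random(0)

def rand_vec(n=3):
    return [rng.uniform(-1, 1) for _ in range(n)]

violations = 0
for _ in range(1000):
    dx = rand_vec()                  # x - y
    du_A = rand_vec()                # u_A - v_A
    du_B = rand_vec()                # u_B - v_B
    z_A = srg_point(dx, du_A)
    z_B = srg_point(dx, du_B)
    z = srg_point(dx, [p + q for p, q in zip(du_A, du_B)])
    tol = 1e-9 * (1 + abs(z_A) + abs(z_B))
    re_ok = abs(z.real - (z_A.real + z_B.real)) <= tol
    im_ok = abs(z_A.imag - z_B.imag) - tol <= z.imag <= z_A.imag + z_B.imag + tol
    violations += not (re_ok and im_ok)
print(violations)   # number of sampled pairs violating either relation
```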

Composition of operators

Given \(z\in \mathbb {C}\), define the right-hand arc between z and \({\bar{z}}\) as

$$\begin{aligned} \mathrm {Arc}^+(z,{\bar{z}}) =\left\{ re^{i(1-2\theta )\varphi }\,\Big |\, z=re^{i\varphi },\, \varphi \in (-\pi ,\pi ],\,\theta \in [0,1],\,r\ge 0 \right\} \end{aligned}$$

and the left-hand arc as

$$\begin{aligned} \mathrm {Arc}^-(z,{\bar{z}}) =-\mathrm {Arc}^+(-z,-{\bar{z}}). \end{aligned}$$

Fig. 7 Left and right-arc properties

We say an SRG-full class \({\mathcal {A}}\) respectively satisfies the left-arc property and right-arc property if \(z\in {\mathcal {G}}({\mathcal {A}})\backslash \{\infty \}\) implies \(\mathrm {Arc}^-{(z,{\bar{z}})}\subseteq {\mathcal {G}}({\mathcal {A}})\) and \(\mathrm {Arc}^+{(z,{\bar{z}})}\subseteq {\mathcal {G}}({\mathcal {A}})\), respectively. We say \({\mathcal {A}}\) satisfies an arc property if the left or right-arc property is satisfied. See Fig. 7.
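The two arcs can be sampled directly from their definitions. The following is a minimal sketch with hypothetical function names `arc_plus` and `arc_minus`; it discretizes \(\theta \in [0,1]\) on a uniform grid and uses `cmath.phase`, whose range \([-\pi ,\pi ]\) yields the same arc as the convention \(\varphi \in (-\pi ,\pi ]\) because the arc is symmetric in \(\varphi \).

```python
import cmath

def arc_plus(z, n=101):
    """Sample Arc^+(z, conj z): points r*exp(i(1-2t)phi), t in [0, 1], where z = r*exp(i*phi)."""
    r, phi = abs(z), cmath.phase(z)
    return [r * cmath.exp(1j * (1 - 2 * k / (n - 1)) * phi) for k in range(n)]

def arc_minus(z, n=101):
    """Sample Arc^-(z, conj z) = -Arc^+(-z, -conj z)."""
    return [-w for w in arc_plus(-z, n)]

z = complex(-1.0, 2.0)
right, left = arc_plus(z), arc_minus(z)

# Both arcs run from z to its conjugate along the circle of radius |z|;
# the right-hand arc stays to the right of Re z, the left-hand arc to the left.
print(right[0], right[-1])
print(min(w.real for w in right), max(w.real for w in left))
```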

Theorem 7

Let \({\mathcal {A}}\) and \({\mathcal {B}}\) be SRG-full classes such that \(\infty \notin {\mathcal {G}}({\mathcal {A}})\), \(\emptyset \ne {\mathcal {G}}({\mathcal {A}})\), \(\infty \notin {\mathcal {G}}({\mathcal {B}})\), and \(\emptyset \ne {\mathcal {G}}({\mathcal {B}})\). Then

$$\begin{aligned} {\mathcal {G}}({\mathcal {A}}{\mathcal {B}})\supseteq {\mathcal {G}}({\mathcal {A}}){\mathcal {G}}({\mathcal {B}}). \end{aligned}$$

If \({\mathcal {A}}\) or \({\mathcal {B}}\) furthermore satisfies a left or right arc property, then

$$\begin{aligned} {\mathcal {G}}({\mathcal {A}}{\mathcal {B}})={\mathcal {G}}({\mathcal {B}}{\mathcal {A}})={\mathcal {G}}({\mathcal {A}}){\mathcal {G}}({\mathcal {B}}). \end{aligned}$$

Although we do not pursue this, one can generalize Theorem 7 to allow \(\emptyset \) and \(\infty \) by excluding the following exceptions: if \(\emptyset ={\mathcal {G}}({\mathcal {A}})\) and \(\infty \in {\mathcal {G}}({\mathcal {B}})\), then \(\{\infty \}={\mathcal {G}}({\mathcal {A}}{\mathcal {B}})\); if \(0\in {\mathcal {G}}({\mathcal {A}})\) and \(\infty \in {\mathcal {G}}({\mathcal {B}})\), then \(\infty \in {\mathcal {G}}({\mathcal {A}}{\mathcal {B}})\); if \(\emptyset = {\mathcal {G}}({\mathcal {A}})\) and \(0\in {\mathcal {G}}({\mathcal {B}})\), then \(\{0\}={\mathcal {G}}({\mathcal {A}}{\mathcal {B}})\) and \(\emptyset ={\mathcal {G}}({\mathcal {B}}{\mathcal {A}})\).

Proof

We first show \({\mathcal {G}}({\mathcal {A}}{\mathcal {B}})\supseteq {\mathcal {G}}({\mathcal {A}}){\mathcal {G}}({\mathcal {B}})\). Assume \({\mathcal {G}}({\mathcal {A}})\ne \emptyset \) and \({\mathcal {G}}({\mathcal {B}})\ne \emptyset \) as otherwise there is nothing to show. Let \(z\in {\mathcal {G}}({\mathcal {A}})\) and \(w\in {\mathcal {G}}({\mathcal {B}})\) and let \(A_z\) and \(A_w\) be their corresponding operators as defined in Lemma 1. Then it is straightforward to see that \(A_zA_w\) corresponds to complex multiplication with respect to zw, and \(zw\in {\mathcal {G}}(A_zA_w)\subseteq {\mathcal {G}}({\mathcal {A}}{\mathcal {B}})\).
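The claim that \(A_zA_w\) corresponds to multiplication by zw can be checked numerically. The sketch below assumes that \(A_z\) is the \(2\times 2\) real matrix of complex multiplication by z and that an SRG point is formed from a pair of inputs as in the definition of z used later in this proof. The helper names `mult_matrix` and `srg_point` are hypothetical.

```python
import numpy as np

def mult_matrix(z):
    """2x2 real matrix of complex multiplication by z (assumed form of A_z)."""
    return np.array([[z.real, -z.imag], [z.imag, z.real]])

def srg_point(T, x, y):
    """SRG point with nonnegative imaginary part generated by the pair (x, y)."""
    d_in, d_out = x - y, T @ x - T @ y
    ratio = np.linalg.norm(d_out) / np.linalg.norm(d_in)
    cos_angle = d_out @ d_in / (np.linalg.norm(d_out) * np.linalg.norm(d_in))
    angle = np.arccos(np.clip(cos_angle, -1.0, 1.0))
    return ratio * np.exp(1j * angle)

z, w = 0.5 + 0.8j, -0.3 + 0.4j             # illustrative points
T = mult_matrix(z) @ mult_matrix(w)        # composition A_z A_w

rng = np.random.default_rng(0)
points = [srg_point(T, rng.standard_normal(2), rng.standard_normal(2))
          for _ in range(5)]

zw = z * w
expected = complex(zw.real, abs(zw.imag))  # zw up to complex conjugation
```

Every sampled pair yields the same point, namely zw up to conjugation, which is consistent with \(zw\in {\mathcal {G}}(A_zA_w)\).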

Next, we show \({\mathcal {G}}({\mathcal {A}}{\mathcal {B}})\subseteq {\mathcal {G}}({\mathcal {A}}){\mathcal {G}}({\mathcal {B}})\). Let \(A\in {\mathcal {A}}\) and \(B\in {\mathcal {B}}\). Consider \((u,s),(v,t)\in A\) and \((x,u),(y,v)\in B\), where \(x\ne y\). This implies \((x,s),(y,t)\in AB\). Define

$$\begin{aligned} z=\frac{\Vert s-t\Vert }{\Vert x-y\Vert } \exp \left[ i\angle (s-t,x-y) \right] . \end{aligned}$$

Consider the case \(u=v\). Then \(0\in {\mathcal {G}}({\mathcal {B}})\). Moreover, \(s=t\), since A is single-valued (by the assumption \(\infty \notin {\mathcal {G}}({\mathcal {A}})\)), and \(z=0\). Therefore, \(z=0\in {\mathcal {G}}({\mathcal {A}}){\mathcal {G}}({\mathcal {B}})\).

Next, consider the case \(u\ne v\). Define

$$\begin{aligned} z_A=\frac{\Vert s-t\Vert }{\Vert u-v\Vert } e^{i\varphi _A} ,\quad z_B=\frac{\Vert u-v\Vert }{\Vert x-y\Vert } e^{i\varphi _B} , \end{aligned}$$

where \(\varphi _A=\angle (s-t,u-v)\) and \(\varphi _B= \angle (u-v,x-y)\). Consider the case where \({\mathcal {A}}\) satisfies the right-arc property. Using the spherical triangle inequality (further discussed in the appendix) we see that either \(\varphi _A\ge \varphi _B\) and

$$\begin{aligned} z&\in \frac{\Vert s-t\Vert }{\Vert u-v\Vert } \frac{\Vert u-v\Vert }{\Vert x-y\Vert } \exp \left[ i[\varphi _A-\varphi _B,\varphi _A+\varphi _B]\right] \\&\subseteq \frac{\Vert s-t\Vert }{\Vert u-v\Vert } \frac{\Vert u-v\Vert }{\Vert x-y\Vert } \exp \left[ i[\varphi _B-\varphi _A,\varphi _B+\varphi _A]\right] \\&= z_B \mathrm {Arc}^+\left( z_A,\overline{z_A} \right) \end{aligned}$$

or \(\varphi _A< \varphi _B\) and

$$\begin{aligned} z&\in \frac{\Vert s-t\Vert }{\Vert u-v\Vert } \frac{\Vert u-v\Vert }{\Vert x-y\Vert } \exp \left[ i[\varphi _B-\varphi _A,\varphi _B+\varphi _A]\right] \\&=z_B \mathrm {Arc}^+\left( z_A,\overline{z_A} \right) . \end{aligned}$$

This gives us

$$\begin{aligned} z\in \underbrace{ z_B}_{\in {\mathcal {G}}({\mathcal {B}})} \underbrace{\mathrm {Arc}^+\left( z_A,\overline{z_A}\right) }_{\subseteq {\mathcal {G}}({\mathcal {A}})} \subseteq {\mathcal {G}}({\mathcal {A}}){\mathcal {G}}({\mathcal {B}}). \end{aligned}$$

That \({\bar{z}}\in {\mathcal {G}}({\mathcal {A}}){\mathcal {G}}({\mathcal {B}})\) follows from the same argument. That \(z,{\bar{z}}\in {\mathcal {G}}({\mathcal {A}}){\mathcal {G}}({\mathcal {B}})\) when instead \({\mathcal {B}}\) satisfies the right-arc property follows from the same argument.

Putting everything together, we conclude \({\mathcal {G}}({\mathcal {A}}{\mathcal {B}})= {\mathcal {G}}({\mathcal {A}}){\mathcal {G}}({\mathcal {B}})\) when \({\mathcal {A}}\) or \({\mathcal {B}}\) satisfies the right-arc property. When \({\mathcal {A}}\) satisfies the left-arc property, \(-{\mathcal {A}}\) satisfies the right-arc property. So

$$\begin{aligned} -{\mathcal {G}}({\mathcal {A}}{\mathcal {B}})= {\mathcal {G}}(-{\mathcal {A}}{\mathcal {B}})={\mathcal {G}}(-{\mathcal {A}}){\mathcal {G}}({\mathcal {B}}) = -{\mathcal {G}}({\mathcal {A}}){\mathcal {G}}({\mathcal {B}}) \end{aligned}$$

by Theorem 4, and we conclude \({\mathcal {G}}({\mathcal {A}}{\mathcal {B}})={\mathcal {G}}({\mathcal {A}}){\mathcal {G}}({\mathcal {B}})\). When \({\mathcal {B}}\) satisfies the left-arc property, \({\mathcal {B}}\circ (-I)\) satisfies the right-arc property. So

$$\begin{aligned} -{\mathcal {G}}({\mathcal {A}}{\mathcal {B}})= {\mathcal {G}}({\mathcal {A}}{\mathcal {B}}\circ (-I))={\mathcal {G}}({\mathcal {A}}){\mathcal {G}}({\mathcal {B}}\circ (-I)) = -{\mathcal {G}}({\mathcal {A}}){\mathcal {G}}({\mathcal {B}}) \end{aligned}$$

by Theorem 4, and we conclude \({\mathcal {G}}({\mathcal {A}}{\mathcal {B}})={\mathcal {G}}({\mathcal {A}}){\mathcal {G}}({\mathcal {B}})\). \(\square \)

We cannot fully drop the arc property from the second part of Theorem 7. Consider the SRG-full operator class \({\mathcal {A}}\) represented by \(h(a,b,c)=|a-b|+|c|\), which has \({\mathcal {G}}({\mathcal {A}})=\{\pm i\}\). Linear operators on \(\mathbb {R}^3\) representing 90-degree rotations are in \({\mathcal {A}}\). With this, one can show the strict containment \({\mathcal {G}}({\mathcal {A}}{\mathcal {A}})=\{z\in \mathbb {C}\,|\,|z|=1\}\supset {\mathcal {G}}({\mathcal {A}}){\mathcal {G}}({\mathcal {A}})\).

As a consequence of Theorem 7, when an arc property is satisfied, the SRGs of operator classes commute under composition even though individual operators, in general, do not commute. Several results in operator theory involving two operators exhibit previously unexplained symmetry. The Ogura–Yamada–Combettes averagedness factor [17, 57], the contraction factor of Giselsson [28], the contraction factor of Moursi and Vandenberghe [51], and the contraction factor of Ryu, Taylor, Bergeling, and Giselsson [64] are all symmetric in the assumptions of the two operators. Theorem 7 shows that this symmetry is not a coincidence.

Convergence analysis: alternating projections

Consider the convex feasibility problem

$$\begin{aligned} \text{ find } x\in {\mathcal {H}}\quad \text{ such } \text{ that } \quad x\in C\cap D \end{aligned}$$

where \(C\subseteq {\mathcal {H}}\) and \(D\subseteq {\mathcal {H}}\) are nonempty closed convex sets and \(C\cap D\ne \emptyset \). Consider the alternating projections method [55, Theorem 13.7]

$$\begin{aligned} x^{k+1}=P_CP_Dx^k, \end{aligned} \tag{AP}$$

where \(P_C\) and \(P_D\) are the projections onto C and D and \(x^0\in {\mathcal {H}}\) is a starting point.

Fact 15

The iterates of (AP) converge in that \(x^k\rightarrow x^\star \) weakly for some \(x^\star \in C\cap D\).

Proof

By [5, Proposition 4.16], \(P_C\) and \(P_D\) are 1/2-averaged. By Fact 16, which we state and prove below, \(P_CP_D\) is 2/3-averaged, and the iteration converges by the Krasnosel’skiĭ–Mann theorem. \(\square \)
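For intuition, here is a minimal Python sketch of (AP) in \(\mathbb {R}^2\). The sets are our own illustrative choice and are not taken from the paper: C is the closed unit ball and D is the line \(\{x\,|\,x_1+x_2=1\}\), which intersect in a chord.

```python
import numpy as np

def project_ball(x, radius=1.0):
    """Projection onto the closed ball of the given radius centered at 0."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else (radius / norm) * x

def project_hyperplane(x, a, b):
    """Projection onto the hyperplane {x : <a, x> = b}."""
    return x - ((a @ x - b) / (a @ a)) * a

def alternating_projections(x0, a, b, iters=200):
    """Run x^{k+1} = P_C P_D x^k with C the unit ball and D the hyperplane."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = project_ball(project_hyperplane(x, a, b))
    return x

a, b = np.array([1.0, 1.0]), 1.0
x_star = alternating_projections([3.0, -1.0], a, b)
```

In finite dimensions weak and strong convergence coincide, so the returned iterate should be numerically feasible for both sets.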

Fact 16

(Proposition 4.42 [5]) Let \({\mathcal {N}}_{1/2}\) be the class of firmly nonexpansive operators. Then

$$\begin{aligned} {\mathcal {N}}_{1/2}{\mathcal {N}}_{1/2}\subset {\mathcal {N}}_{2/3}. \end{aligned}$$

(The containment is strict.) Furthermore,

figure l

In Fact 16, the precise characterization \({\mathcal {G}}({\mathcal {N}}_{1/2}{\mathcal {N}}_{1/2})\) is new, but \({\mathcal {N}}_{1/2}{\mathcal {N}}_{1/2}\subset {\mathcal {N}}_{2/3}\) is known.

Proof outline. We quickly outline the geometric insight and defer the full proof with precise geometric arguments to Sect. C in the appendix.

Define

figure m

and

$$\begin{aligned} S=\bigcup _{0\le \varphi _1\le 2\pi }S_{\varphi _1}, \qquad S_{\varphi _1}= Q\left( \frac{1}{2}+\frac{1}{2}e^{i\varphi _1}\right) . \end{aligned}$$

In geometric terms, this construction takes a point on the circle C, draws the disk whose diameter is the line segment between this point and the origin, and takes the union of such disks. \(S={\mathcal {G}}({\mathcal {N}}_{1/2}){\mathcal {G}}({\mathcal {N}}_{1/2})\) follows from Theorem 7.

figure n

To show \(S=\left\{ re^{i\varphi }\,|\,0\le r\le \cos ^2(\varphi /2)\right\} \), we analyze S in the inverted space. Write \({\mathcal {I}}:\overline{\mathbb {C}}\rightarrow \overline{\mathbb {C}}\) for the mapping \({\mathcal {I}}(z)={\bar{z}}^{-1}\).

figure o

The union of the half-spaces \({\mathcal {I}}(S)=\bigcup _{0\le \varphi _1\le 2\pi }{\mathcal {I}}(S_{\varphi _1})\) forms a parabola.

figure p

Find the largest circle tangent to the parabola at point 1 and invert back.

figure q

The largest circle to the left of the parabola is inverted to the smallest circle (i.e., tight averagedness circle) containing the SRG. The known formula \(r(\varphi )\le \cos ^2(\varphi /2)\) describes the parabola under the inversion mapping. \(\square \)
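The outline can be checked numerically without the inversion step. The Python sketch below follows the verbal description of the construction: for each point p on the circle \(\frac{1}{2}+\frac{1}{2}e^{i\varphi _1}\), take the disk with diameter from 0 to p, and measure how far the union of these disks extends along each ray. It assumes \({\mathcal {G}}({\mathcal {N}}_{2/3})\) is the disk of radius 2/3 centered at 1/3. All variable names are ours.

```python
import numpy as np

phi = np.linspace(-np.pi, np.pi, 181)        # ray directions, 2-degree spacing
phi1 = np.linspace(0.0, 2.0 * np.pi, 3601)   # circle parameter, 0.1-degree spacing
p = 0.5 + 0.5 * np.exp(1j * phi1)            # points on the boundary of G(N_{1/2})

# The disk with diameter [0, p] is {z : |z|^2 <= Re(conj(z) p)}, so along the
# ray at angle phi it reaches radius max(0, Re(exp(-i phi) p)).
reach = np.real(np.exp(-1j * phi)[:, None] * p[None, :])
r_union = np.maximum(reach.max(axis=1), 0.0)  # radial extent of the union S

r_formula = np.cos(phi / 2.0) ** 2            # claimed extent cos^2(phi / 2)
boundary = r_formula * np.exp(1j * phi)

# Distance of the boundary of S from the center 1/3 of the averagedness disk.
dist_from_center = np.abs(boundary - 1.0 / 3.0)
```

The radial extent agrees with \(\cos ^2(\varphi /2)\) on the grid, and the boundary stays within distance 2/3 of the point 1/3, touching that circle at \(z=1\).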

Tightness and constructing lower bounds

An advantage of geometric proofs is that tightness is often immediate. In the proof of Fact 12, for example, it is clear that finding a smaller circle containing the SRG is not possible. Consequently, the rate of Fact 11 cannot be improved.

However, although tightness is proved with the geometric arguments, sometimes one may wish to construct an explicit counterexample achieving the tight rate. This can be done by picking the extreme point on the complex plane, finding a corresponding \(2\times 2\) matrix with Lemma 1, and reverse engineering the proof.

In the setup of Fact 12,

figure r

the extreme point z corresponds to the complex number

$$\begin{aligned} z= \frac{1-\alpha ^2\mu /\beta }{ 1+2\alpha \mu +\alpha ^2\mu /\beta } + \frac{ 2\alpha \sqrt{(1-\mu \beta )\mu /\beta } }{ 1+2\alpha \mu +\alpha ^2\mu /\beta } i. \end{aligned}$$

Lemma 1 provides a corresponding operator \(A_z:\mathbb {R}^2\rightarrow \mathbb {R}^2\)

$$\begin{aligned} A_z \begin{bmatrix} \zeta _1\\ \zeta _2 \end{bmatrix} = \underbrace{ \frac{1}{1+2\alpha \mu +\alpha ^2\mu /\beta } \begin{bmatrix} 1-\alpha ^2\mu /\beta &{} -2\alpha \sqrt{(1-\mu \beta )\mu /\beta }\\ 2\alpha \sqrt{(1-\mu \beta )\mu /\beta }&{} 1-\alpha ^2\mu /\beta \end{bmatrix} }_{=M} \begin{bmatrix} \zeta _1\\ \zeta _2 \end{bmatrix} \end{aligned}$$

In the proof, the depicted geometry was obtained through the transformations \(A\mapsto I+\alpha A\), \(A\mapsto A^{-1}\), and \(A\mapsto 2A-I\). We revert the transformations by applying \(A\mapsto \frac{1}{2}I+\frac{1}{2}A\), \(A\mapsto A^{-1}\), and \(A\mapsto \frac{1}{\alpha }(A-I)\) and define \(A:\mathbb {R}^2\rightarrow \mathbb {R}^2\) as

$$\begin{aligned} A \begin{bmatrix} \zeta _1\\ \zeta _2 \end{bmatrix}=\frac{1}{\alpha }\left( \left( \frac{1}{2}I+\frac{1}{2}M\right) ^{-1}-I\right) \begin{bmatrix} \zeta _1\\ \zeta _2 \end{bmatrix}. \end{aligned}$$

(We do not show the individual entries as they are very complicated.) Finally, if \(B=0\), then the fixed-point iteration

$$\begin{aligned} z^{k+1}=\left( \tfrac{1}{2}I+\tfrac{1}{2}(2J_{\alpha A}-I)(2J_{\alpha B}-I)\right) z^k \end{aligned}$$

converges at the exact rate given by Fact 11.
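The construction is easy to carry out numerically even though the entries of A are unwieldy in closed form. The Python sketch below builds M and A for illustrative parameter values of our choosing (with \(\mu \beta \le 1\) so that the square root is real) and forms the iteration matrix with \(B=0\). It does not evaluate the rate of Fact 11. It only confirms that the reverted transformations recover M and reports the resulting contraction factor.

```python
import numpy as np

alpha, mu, beta = 1.0, 0.5, 1.0    # illustrative values with mu * beta <= 1

den = 1.0 + 2.0 * alpha * mu + alpha**2 * mu / beta
re_z = (1.0 - alpha**2 * mu / beta) / den
im_z = 2.0 * alpha * np.sqrt((1.0 - mu * beta) * mu / beta) / den
z = complex(re_z, im_z)            # the extreme point

I = np.eye(2)
M = np.array([[re_z, -im_z], [im_z, re_z]])          # matrix of A_z
A = (np.linalg.inv(0.5 * I + 0.5 * M) - I) / alpha   # reverted transformations
B = np.zeros((2, 2))

def reflected_resolvent(op):
    """2 J_{alpha op} - I with J_{alpha op} = (I + alpha op)^{-1}."""
    return 2.0 * np.linalg.inv(I + alpha * op) - I

T = 0.5 * I + 0.5 * reflected_resolvent(A) @ reflected_resolvent(B)
rate = np.linalg.norm(T, 2)        # contraction factor of z^{k+1} = T z^k
```

For these values the symmetric parts of A and \(A^{-1}\) come out as \(\mu I\) and \(\beta I\). This is what one would expect if the setup of Fact 12 is \(\mu \)-strong monotonicity together with \(\beta \)-cocoercivity, which is our reading and is not stated in this section.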

If \(\mathcal {A}\) is SRG-full and \(z\in \mathcal {G}( \mathcal {A})\), then there is an operator A on \(\mathbb {R}^2\) such that \(\{z,{\overline{z}}\}=\mathcal {G}(A)\), so explicit counterexamples providing the lower bounds can be constructed in \(\mathbb {R}^2\). When an operator class is not SRG-full, counterexamples still exist, but they may not be in \(\mathbb {R}^2\).

Insufficiency of metric subregularity for linear convergence

Recently, there has been much interest in analyzing optimization methods under assumptions weaker than strong convexity or strong monotonicity. One approach is to assume metric subregularity in place of strong monotonicity and establish linear convergence.

In this section, we show that it is not always possible to replace strong monotonicity with metric subregularity. In particular, we show impossibility results proving the insufficiency of metric subregularity in establishing linear convergence for certain setups where strong monotonicity is sufficient.

Inverse Lipschitz continuity and metric subregularity

Let

$$\begin{aligned} {\mathcal {L}}^{-1}_{\gamma }&=\left\{ A^{-1}\,|\,A\in {\mathcal {L}}_{\gamma }\right\} ,\\&=\big \{A:{\mathrm {dom}(A)}\rightarrow \mathcal {H}\,|\,\gamma ^2\Vert Ax-Ay\Vert ^2\ge \Vert x-y\Vert ^2,\,\forall \, x,y\in \mathcal {H},\,{\mathrm {dom}(A)}\subseteq \mathcal {H}\big \} \end{aligned}$$

be the class of inverse Lipschitz continuous operators with parameter \(\gamma \in (0,\infty )\), which has the SRG

figure s

It is clear that inverse Lipschitz continuity is weaker than strong monotonicity in the sense that \(A\in {\mathcal {M}}_{1/\gamma }\) implies \( A\in {\mathcal {L}}^{-1}_{\gamma }\).
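For linear operators this implication follows from the Cauchy–Schwarz inequality: \(\Vert Ax\Vert \Vert x\Vert \ge \langle Ax,x\rangle \ge (1/\gamma )\Vert x\Vert ^2\). The Python sketch below illustrates it on a randomly generated matrix. The construction of A (a positive semidefinite part shifted by \(I/\gamma \) plus a skew part) is our own illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)
gamma, n = 2.0, 4

G = rng.standard_normal((n, n))
K = rng.standard_normal((n, n))
# Symmetric part is G G^T + I / gamma, so A is (1 / gamma)-strongly monotone.
A = G @ G.T + np.eye(n) / gamma + (K - K.T)

sym_min = np.linalg.eigvalsh(0.5 * (A + A.T)).min()   # strong monotonicity modulus
sigma_min = np.linalg.svd(A, compute_uv=False).min()  # smallest singular value

# Inverse Lipschitz inequality gamma * ||Ax - Ay|| >= ||x - y|| on one random pair.
x, y = rng.standard_normal(n), rng.standard_normal(n)
lhs = gamma * np.linalg.norm(A @ x - A @ y)
rhs = np.linalg.norm(x - y)
```

The smallest singular value is at least \(1/\gamma \), which is the inverse Lipschitz bound for a linear operator.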

An operator \(A:\mathcal {H}\rightrightarrows \mathcal {H}\) is \(\gamma \)-metrically subregular at \(x_0\) for \(y_0\) if \(y_0\in Ax_0\) and there exists a neighborhood V of \(x_0\) such that

$$\begin{aligned} d(x,A^{-1}y_0)\le \gamma d(y_0,A(x)),\quad \forall \, x\in V. \end{aligned}$$

Although not necessarily obvious at first sight, metric subregularity is weaker than inverse Lipschitz continuity, i.e., \(A\in {\mathcal {L}}^{-1}_{\gamma }\) implies A is metrically subregular at x for y with parameter \(\gamma \), for any \((x,y)\in A\).

Metric subregularity of A is equivalent to “calmness” of \(A^{-1}\) [20], and calmness is also known as “upper Lipschitz continuity” [61]. For subdifferential operators of convex functions, metric subregularity is equivalent to the “error bound condition” [23]. See [21] for an in-depth treatment of this subject.

Metric subregularity has been used in place of strong monotonicity to establish linear convergence for a wide range of setups. Leventhal [41, Theorem 3.1] used metric subregularity for the proximal point method; Bauschke, Noll, and Phan [6, Lemma 3.8] and Liang, Fadili, and Peyré [42, Theorem 3] for the Krasnoselskii–Mann iteration; Latafat and Patrinos [40, Theorem 3.3] for their splitting method AFBA; Ye et al. [70] for the proximal gradient method, the proximal alternating linearized minimization algorithm, and the randomized block coordinate proximal gradient method; and Yuan, Zeng, and Zhang for ADMM, DRS, and PDHG [71]. See [11, 23, 36, 53, 72] for a systematic study of this subject. Although most recent work concerns sufficiency of metric subregularity or related assumptions in establishing linear convergence, Zhang [72] studied the necessary and sufficient conditions.

Impossibility proofs

Douglas–Rachford splitting (DRS) is known to be a strict contraction under the combined assumption of Lipschitz continuity and strong monotonicity: [44, Proposition 4], [30, Theorem 4.1], [19, Table 1 under \(A=B=I\)], [18, Theorems 5–7], [28, Theorem 6.3], and [64, Theorem 4]. Is it possible to establish linear convergence with Lipschitz continuity and metric subregularity or a variation of metric subregularity? The answer is no in the sense of Corollaries 1 and 2.

Define the DRS operator with respect to operators A and B with parameters \(\alpha \) and \(\theta \) as

$$\begin{aligned} D_{\alpha ,\theta }(A,B)=(1-\theta )I+\theta (2J_{\alpha A}-I)(2J_{\alpha B}-I) \end{aligned}$$

and the class of DRS operators as

$$\begin{aligned} D_{\alpha ,\theta }({\mathcal {A}},{\mathcal {B}})=\{D_{\alpha ,\theta }(A,B)\,|\,A\in {\mathcal {A}},\,B\in {\mathcal {B}},\, A:\mathcal {H}\rightrightarrows \mathcal {H},\, B:\mathcal {H}\rightrightarrows \mathcal {H}\}. \end{aligned}$$

Define \(D_{\alpha ,\theta }(B,A)\) and \(D_{\alpha ,\theta }({\mathcal {B}},{\mathcal {A}})\) analogously.

Theorem 8

Let \(0<1/\gamma \le L<\infty \) and \(\alpha \in (0,\infty )\). Let \({\mathcal {A}}={\mathcal {M}}\cap {\mathcal {L}}_L\cap ({\mathcal {L}}_\gamma )^{-1}\) and \({\mathcal {B}}={\mathcal {M}}\). Then for any \(\theta \ne 0\)

$$\begin{aligned} D_{\alpha ,\theta }({\mathcal {A}},{\mathcal {B}})\nsubseteq {\mathcal {L}}_{1-\varepsilon } \end{aligned}$$

for any \(\varepsilon >0\). The same conclusion holds for \(D_{\alpha ,\theta }({\mathcal {B}},{\mathcal {A}})\).

Proof

We have the geometry

figure t

Note the line segment \({\overline{AB}}\) is mapped to the (minor) arc . Using Theorem 7, we have

figure u

because is on the unit circle, and since \({\mathcal {G}}(2J_{\alpha {\mathcal {B}}}-I)=\{z\,|\,|z|\le 1\}\). So we have \(1\in {\mathcal {G}}(D_{\alpha ,\theta }({\mathcal {A}},{\mathcal {B}}))\), but \(1\notin {\mathcal {G}}({\mathcal {L}}_{1-\varepsilon })\) for any \(\varepsilon >0\). Therefore, \({\mathcal {G}}(D_{\alpha ,\theta }({\mathcal {A}},{\mathcal {B}}))\nsubseteq {\mathcal {G}}({\mathcal {L}}_{1-\varepsilon })\) and, with Theorem 2, we conclude \(D_{\alpha ,\theta }({\mathcal {A}},{\mathcal {B}})\nsubseteq {\mathcal {L}}_{1-\varepsilon }\). The result for the operator \(D_{\alpha ,\theta }({\mathcal {B}},{\mathcal {A}})\) follows from similar reasoning. \(\square \)
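A concrete pair of operators attaining the point 1 can be written down in \(\mathbb {R}^2\). The Python sketch below uses a witness of our own choosing, not the configuration in the figure: A is a 90-degree rotation scaled by \(1/\gamma \), which is monotone, \((1/\gamma )\)-Lipschitz, and \(\gamma \)-inverse Lipschitz, and \(B=-A\), which is monotone. The parameter values are illustrative.

```python
import numpy as np

alpha, gamma = 0.7, 2.0                    # any L >= 1/gamma is compatible
I = np.eye(2)
K = np.array([[0.0, -1.0], [1.0, 0.0]])    # 90-degree rotation, <Kx, x> = 0

A = K / gamma       # monotone, (1/gamma)-Lipschitz, gamma-inverse Lipschitz
B = -K / gamma      # monotone

def reflected_resolvent(op):
    """2 J_{alpha op} - I with J_{alpha op} = (I + alpha op)^{-1}."""
    return 2.0 * np.linalg.inv(I + alpha * op) - I

def drs(op_a, op_b, theta):
    """DRS operator D_{alpha, theta}(op_a, op_b) as a 2x2 matrix."""
    return ((1.0 - theta) * I
            + theta * reflected_resolvent(op_a) @ reflected_resolvent(op_b))

thetas = (0.25, 0.5, 1.0)
norms = [np.linalg.norm(drs(A, B, theta), 2) for theta in thetas]
```

Here the two reflected resolvents are inverse rotations of each other, so the DRS operator is the identity for every \(\theta \) and in either order. Its Lipschitz constant is exactly 1.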

Corollary 1

Let \(0<1/\gamma \le L<\infty \) and \(\alpha \in (0,\infty )\). Let \(B\in {\mathcal {M}}\) and let \(A\in {\mathcal {M}}\cap {\mathcal {L}}_L\) satisfy a condition weaker than or equal to \(\gamma \)-inverse Lipschitz continuity, such as \(\gamma \)-metric subregularity. It is not possible to establish a strict contraction of the DRS operators \(D_{\alpha ,\theta }(A,B)\) or \(D_{\alpha ,\theta }(B,A)\) for any \(\alpha >0\) and \(\theta \ne 0\) without further assumptions.

Theorem 9

Let \(\gamma ,L,\alpha \in (0,\infty )\). Let \({\mathcal {A}}={\mathcal {M}}\cap ({\mathcal {L}}_\gamma )^{-1}\) and \({\mathcal {B}}={\mathcal {M}}\cap {\mathcal {L}}_L\). If \(1/\gamma \le L\) and \(\theta \ne 0\), then

$$\begin{aligned} D_{\alpha ,\theta }({\mathcal {A}},{\mathcal {B}})\nsubseteq {\mathcal {L}}_{1-\varepsilon } \end{aligned}$$

for any \(\varepsilon >0\). If \(1/\gamma > L\) and \(\theta \in (0,1)\), then \(D_{\alpha ,\theta }({\mathcal {A}},{\mathcal {B}})\subseteq {\mathcal {L}}_R\) for

$$\begin{aligned} R= \sqrt{1-\frac{4\theta (1-\theta )(1-\gamma L)^2}{(1+\gamma ^2/\alpha ^2)(1+\alpha ^2 L^2)}}. \end{aligned}$$

This result is tight in the sense that \(D_{\alpha ,\theta }({\mathcal {A}},{\mathcal {B}})\nsubseteq {\mathcal {L}}_R\) for any smaller value of R. The same conclusion holds for \(D_{\alpha ,\theta }({\mathcal {B}},{\mathcal {A}})\).

Proof

Consider the case \(\alpha /\gamma <1\) and \(\alpha L<1\). We have

figure v

Let and let \(\overline{S_A}\) be the region bounded by . These sets provide an inner and outer bound of \({\mathcal {G}}(2J_{\alpha {\mathcal {A}}}-I)\) in the sense that

$$\begin{aligned} \underline{S_A} \subseteq {\mathcal {G}}(2J_{\alpha {\mathcal {A}}}-I) \subseteq \overline{S_A}. \end{aligned}$$

Note that \(J_{\alpha {\mathcal {A}}}\) satisfies the left-arc property. By the law of cosines, we have

$$\begin{aligned} \cos (\varphi _A)&= \frac{1}{2\cdot {\overline{AO}}\cdot {\overline{OD}}} \left( {\overline{AO}}^2+ {\overline{OD}}^2-{\overline{AD}}^2\right) \\&= \tfrac{1}{2\cdot 1\cdot \left( \tfrac{1+\alpha ^2/\gamma ^2}{1-\alpha ^2/\gamma ^2}\right) } \left( 1^2+ \left( \tfrac{1+\alpha ^2/\gamma ^2}{1-\alpha ^2/\gamma ^2}\right) ^2 - \left( \tfrac{1+\alpha ^2/\gamma ^2}{1-\alpha ^2/\gamma ^2} - \tfrac{1-\alpha /\gamma }{1+\alpha /\gamma } \right) ^2 \right) =\tfrac{1-\alpha ^2/\gamma ^2}{1+\alpha ^2/\gamma ^2}. \end{aligned}$$

Likewise, we have

figure w

Let as the circular sector bounded by . Again, we have

$$\begin{aligned} \underline{S_B} \subseteq {\mathcal {G}}(2J_{\alpha {\mathcal {B}}}-I) \subseteq \overline{S_B}, \end{aligned}$$

and

$$\begin{aligned} \cos \varphi _B=\tfrac{1-\alpha ^2 L^2}{1+\alpha ^2 L^2}. \end{aligned}$$

Using the arccosine sum identity [2, p. 80, 4.4.33], we get

$$\begin{aligned} \cos (\varphi _A-\varphi _B) =1-\tfrac{2(\alpha L-\alpha /\gamma )^2}{(1+\alpha ^2L^2)(1+\alpha ^2/\gamma ^2)}. \end{aligned}$$
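The two trigonometric identities above can be spot-checked numerically. The Python sketch below evaluates the law of cosines with the side lengths from the display for \(\cos (\varphi _A)\) and compares \(\cos (\varphi _A-\varphi _B)\) against the closed form. The parameter values are illustrative choices satisfying \(\alpha /\gamma <1\) and \(\alpha L<1\), and the shorthand `a`, `b` is ours.

```python
import math

alpha, gamma, L = 0.6, 1.5, 0.9    # illustrative: alpha/gamma < 1, alpha*L < 1
a, b = alpha / gamma, alpha * L    # shorthand for alpha/gamma and alpha*L

# Law of cosines in triangle AOD with the side lengths used in the display.
AO = 1.0
OD = (1.0 + a**2) / (1.0 - a**2)
AD = OD - (1.0 - a) / (1.0 + a)
cos_phi_A = (AO**2 + OD**2 - AD**2) / (2.0 * AO * OD)

cos_phi_B = (1.0 - b**2) / (1.0 + b**2)
phi_A, phi_B = math.acos(cos_phi_A), math.acos(cos_phi_B)

lhs = math.cos(phi_A - phi_B)
closed_form = 1.0 - 2.0 * (b - a) ** 2 / ((1.0 + b**2) * (1.0 + a**2))
```

Both sides agree to floating-point precision, and the law-of-cosines value simplifies to \((1-\alpha ^2/\gamma ^2)/(1+\alpha ^2/\gamma ^2)\) as stated.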

When \(1/\gamma \le L\), we have \(\varphi _A\le \varphi _B\). In this case,

figure x

Therefore

$$\begin{aligned} 1\in (1-\theta )1+\theta \underline{S_A}\, \underline{S_B} \subseteq {\mathcal {G}}(D_{\alpha ,\theta }({\mathcal {A}},{\mathcal {B}})), \end{aligned}$$

but \(1\notin {\mathcal {G}}({\mathcal {L}}_{1-\varepsilon })\) for any \(\varepsilon >0\). Therefore, we conclude \({\mathcal {G}}(D_{\alpha ,\theta }({\mathcal {A}},{\mathcal {B}}))\nsubseteq {\mathcal {G}}({\mathcal {L}}_{1-\varepsilon })\).

When \(1/\gamma > L\), we have \(\varphi _A>\varphi _B\). In this case,

figure y
figure z

Using the outer bounds \(\overline{S_A}\) and \(\overline{S_B}\) we establish correctness. Using the inner bounds \(\underline{S_A}\) and \(\underline{S_B}\) we establish tightness.

figure aa

With the Pythagorean theorem, we can verify that the containment holds for R and fails for smaller R. Since \({\mathcal {L}}_R\) is SRG-full by Theorem 2, the containment of the SRG in \(\overline{\mathbb {C}}\) is equivalent to the containment of the class.

The result for the cases \(\alpha /\gamma \ge 1\) or \(\alpha L\ge 1\) and for the operator \(D_{\alpha ,\theta }({\mathcal {B}},{\mathcal {A}})\) follows from similar reasoning. \(\square \)
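The tightness of R in Theorem 9 can be illustrated with explicit operators on \(\mathbb {R}^2\). The Python sketch below uses a witness of our own choosing for the case \(1/\gamma >L\): A is a scaled 90-degree rotation with \(\Vert Ax\Vert =\Vert x\Vert /\gamma \) and B is the opposite rotation scaled by L, so both are monotone, A is \(\gamma \)-inverse Lipschitz, and B is L-Lipschitz. The parameter values are illustrative.

```python
import numpy as np

alpha, gamma, L, theta = 0.5, 1.0, 0.5, 0.5   # illustrative values, 1/gamma > L
I = np.eye(2)
K = np.array([[0.0, -1.0], [1.0, 0.0]])       # 90-degree rotation

A = -K / gamma    # monotone and gamma-inverse Lipschitz
B = L * K         # monotone and L-Lipschitz

def reflected_resolvent(op):
    """2 J_{alpha op} - I with J_{alpha op} = (I + alpha op)^{-1}."""
    return 2.0 * np.linalg.inv(I + alpha * op) - I

D = (1.0 - theta) * I + theta * reflected_resolvent(A) @ reflected_resolvent(B)
lipschitz = np.linalg.norm(D, 2)              # Lipschitz constant of this DRS operator

R = np.sqrt(1.0 - 4.0 * theta * (1.0 - theta) * (1.0 - gamma * L) ** 2
            / ((1.0 + gamma**2 / alpha**2) * (1.0 + alpha**2 * L**2)))
```

For this pair the Lipschitz constant of the DRS operator equals R, so no smaller constant can hold for the whole class.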

Corollary 2

Let \(0<1/\gamma \le L<\infty \) and \(\alpha \in (0,\infty )\). Let \(B\in {\mathcal {M}}\cap {\mathcal {L}}_L\) and let \(A\in {\mathcal {M}}\) satisfy a condition weaker than or equal to \(\gamma \)-inverse Lipschitz continuity, such as \(\gamma \)-metric subregularity. It is not possible to establish a strict contraction of the DRS operators \(D_{\alpha ,\theta }(A,B)\) or \(D_{\alpha ,\theta }(B,A)\) for any \(\alpha >0\) and \(\theta \in \mathbb {R}\) without further assumptions.

Conclusion

In this work, we presented the scaled relative graph, a tool that maps the action of an operator to the extended complex plane. This machinery enables us to analyze nonexpansive and monotone operators with geometric arguments. The geometric ideas should complement the classical analytical approaches and bring clarity.

Extending this geometric framework to more general setups and spaces is an interesting future direction. Some fixed-point iterations, such as the power iteration of non-symmetric matrices [49] or the Bellman iteration [10], are analyzed most effectively through notions other than the norm induced by the inner product (the Euclidean norm for finite-dimensional spaces). Whether it is possible to gain insight through geometric arguments in such setups would be worthwhile to investigate.