Abstract
Many iterative methods in applied mathematics can be thought of as fixed-point iterations, and such algorithms are usually analyzed analytically, with inequalities. In this paper, we present a geometric approach to analyzing contractive and nonexpansive fixed-point iterations with a new tool called the scaled relative graph (SRG). The SRG provides a correspondence between nonlinear operators and subsets of the 2D plane. Under this framework, a geometric argument in the 2D plane becomes a rigorous proof of convergence.
Introduction
Fixed-point iterations abound in applied mathematics and engineering. This classical technique, dating back to [43, 56, 59], involves the following two steps. First, find an operator \(T:\mathcal {X}\rightarrow \mathcal {X}\), where \(\mathcal {X}\) is some space, such that if \(x^\star =T(x^\star )\), i.e., if \(x^\star \) is a fixed point, then \(x^\star \) is a solution to the problem at hand. Second, perform the fixed-point iteration \(x^{k+1}=T(x^k)\). Convergence of such iterative methods is usually proved analytically, through a series of inequalities.
In this paper, we present a geometric approach to analyzing contractive and nonexpansive fixed-point iterations with a new tool called the scaled relative graph (SRG). We can think of the SRG as a signature of an operator, analogous to how eigenvalues are a signature of a matrix. The SRG provides a correspondence between algebraic operations on nonlinear operators and geometric operations on subsets of the 2D plane. Using this machinery and elementary Euclidean geometry, we can establish properties of operators (such as contractiveness) and prove the convergence of fixed-point iterations by showing that the SRG, a set in the 2D plane, resides within certain circles. These geometric arguments form rigorous proofs, not just illustrations.
One advantage of geometric proofs is that a single geometric diagram, or a few, can concisely capture and communicate the core insight. In contrast, it is much more difficult to extract a core insight from a classical analytic proof based on inequalities. Another advantage is that tightness, loosely defined as being unable to improve a stated result without additional assumptions, is often immediate. In contrast, discerning whether improvements are possible when examining a proof based on inequalities is usually more difficult; providing a matching lower bound is often the only way to establish tightness of such results.
Proving convergence with operator properties
Given \(T:\mathcal {H}\rightarrow \mathcal {H}\), where \(\mathcal {H}\) is a real Hilbert space with norm \(\Vert \cdot \Vert \), consider the fixed-point iteration given by
for \(k=0,1,\dots \) where \(x^0\in \mathcal {H}\) is a starting point. We say \(x^\star \) is a fixed point of T if \(x^\star =T(x^\star )\). We say \(T:\mathcal {H}\rightarrow \mathcal {H}\) is nonexpansive if
In this case, \(\Vert x^{k}-x^\star \Vert \) is a nonincreasing sequence, but \(x^k\) need not converge. For instance, if \(T=-I\), then \(x^k\) oscillates between \(x^0\) and \(-x^0\). We say \(T:\mathcal {H}\rightarrow \mathcal {H}\) is contractive if
for some \(L<1\). In this case, \(x^k\rightarrow x^\star \) strongly with rate \(\Vert x^k-x^\star \Vert \le L^k\Vert x^0-x^\star \Vert \). This classical argument is the Banach contraction principle [3]. We say \(T:\mathcal {H}\rightarrow \mathcal {H}\) is averaged if \(T=(1-\theta )I+\theta R\) for some nonexpansive operator R and \(\theta \in (0,1)\), where I is the identity operator. In this case, \(x^k\rightarrow x^\star \) weakly for a fixed point \(x^\star \), provided that T has a fixed point. This result is the Krasnosel’skiĭ–Mann theorem [39, 46]. The assumption of averagedness is stronger than nonexpansiveness and weaker than contractiveness, as illustrated in Fig. 1.
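The Banach contraction rate can be checked numerically. Below is a minimal sketch (the affine map and all its coefficients are our own illustrative choices, not from the paper) verifying the bound \(\Vert x^k-x^\star \Vert \le L^k\Vert x^0-x^\star \Vert \):

```python
# Minimal sketch (not from the paper): check the Banach contraction bound
# ||x^k - x*|| <= L^k ||x^0 - x*|| on an illustrative affine contraction.
L = 0.8                       # contraction factor, L < 1
T = lambda x: L * x + 0.2     # T is L-Lipschitz; the fixed point solves x = 0.8x + 0.2
x_star = 0.2 / (1 - L)        # x* = 1.0

x = 5.0                       # starting point x^0
err0 = abs(x - x_star)
for k in range(1, 50):
    x = T(x)
    assert abs(x - x_star) <= L**k * err0 + 1e-9   # the rate bound holds
```

For this affine map the bound holds with equality at every step, which also illustrates that the rate \(L^k\) cannot be improved in general.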
We now have a general rubric for proving convergence of a fixed-point iteration:

1.
Prove the operator T is contractive or averaged.

2.
Apply the convergence argument of Banach or Krasnosel’skiĭ–Mann.
Many, although not all, fixed-point iterations are analyzed through this rubric. Step 2 is routine. This work presents a geometric approach to step 1, the more difficult step.
Prior work and contribution
Using circles or disks centered at the origin to illustrate contractive mappings is natural and likely common. Eckstein and Bertsekas's illustration of firm nonexpansiveness via the disk with radius 1/2 centered at (1/2, 0) [24, 25] was, to the best of our knowledge, the first geometric illustration of notions from fixed-point theory other than nonexpansiveness and Lipschitz continuity. Since then, Giselsson and Boyd used similar illustrations in earlier versions of the paper [29] (arXiv versions 1 through 3 have the geometric diagrams, but later versions do not) and more thoroughly in the lecture slides [27]. Banjac and Goulart also utilize similar illustrations [4].
Through personal communication, we are aware that many have privately used geometric illustrations similar to those presented in this paper to initially build intuition, although the actual mathematics and proofs were eventually presented analytically, with inequalities. To the best of our knowledge, the use of geometry for rigorous proofs of results on nonlinear operators is new.
The notion of the SRG was first defined and presented in the authors' unpublished manuscript [31]. That work shows how transformations of an operator, such as inversion, addition of the identity, unitary change of coordinates, and composition, map to changes in the SRG, and it uses these transformations to give rigorous geometric proofs of many standard results. It furthermore discusses the Baillon–Haddad theorem and convergence rates for various operator methods.
Throughout this paper, we state known results as “Facts”. Our contributions are the alternative geometric proofs, the novel results stated as “Propositions” and “Theorems”, and the overall geometric approach based on the SRG.
Preliminaries
We refer readers to standard references for more information on convex analysis [9, 12, 32], nonexpansive and monotone operators [5, 63], and geometry [50, 58, 69]. Write \(\mathcal {H}\) for a real Hilbert space equipped with the inner product \(\langle \cdot ,\cdot \rangle \) and norm \(\Vert \cdot \Vert \). We use Minkowski-type set notation that generalizes operations on individual elements to sets. For example, given \(\alpha \in \mathbb {R}\) and sets \(U,V\subseteq \mathcal {H}\), write
Notice that if either U or V is \(\emptyset \), then \(U+V=\emptyset \). In particular, \(U+V\) is the Minkowski sum. We use similar notation for sets of operators and complex numbers. The meanings should be clear from context, but for the sake of precision, we provide the full definitions in the appendix.
Multivalued operators For convex analytical and operator theoretic notions, we follow standard notation [5]. In particular, we consider multivalued operators, which map a point to a set. The graph of an operator is defined as
For convenience, we do not distinguish an operator from its graph, writing \((x,u)\in A\) to mean \(u\in Ax\). Define the inverse operator as
which always exists. Define the resolvent of A as \(J_A=(I+A)^{-1}\).
We say \({\mathcal {A}}\) is a class of operators if \({\mathcal {A}}\) is a set of operators on Hilbert spaces. Note that \(A_1,A_2\in {\mathcal {A}}\) need not be defined on the same Hilbert space, i.e., \(A_1:\mathcal {H}_1\rightrightarrows \mathcal {H}_1\), \(A_2:\mathcal {H}_2\rightrightarrows \mathcal {H}_2\), and \(\mathcal {H}_1\ne \mathcal {H}_2\) is possible.
Given classes of operators \({\mathcal {A}}\) and \({\mathcal {B}}\), write
To clarify, these definitions require that A and B, or A and I, are operators on the same (but arbitrary) Hilbert space \(\mathcal {H}\), as otherwise the operations would not make sense. We define \({\mathcal {A}}{\mathcal {B}}\), \(I+\alpha {\mathcal {A}}\), and \(J_{\alpha {\mathcal {A}}}\) similarly. For \(L\in (0,\infty )\), define the class of L-Lipschitz operators as
For \(\beta \in (0,\infty )\), define the class of \(\beta \)-cocoercive operators as
Define the class of monotone operators as
To clarify, \(\langle Ax-Ay,x-y\rangle \ge 0\) means \(\langle u-v,x-y\rangle \ge 0\) for all \((x,u),(y,v)\in A\). If \(x\notin {\mathrm {dom}(A)}\), then the inequality is vacuous. A monotone operator A is maximal if there is no other monotone operator B such that \(\mathrm {graph}(B)\) properly contains \(\mathrm {graph}(A)\). For \(\mu \in (0,\infty )\), define the class of \(\mu \)-strongly monotone operators as
For \(\theta \in (0,1)\), define the class of \(\theta \)-averaged operators \({\mathcal {N}}_\theta \) as
In these definitions, we do not impose any requirements on the domain or maximality of the operators.
Following the notation of [54], respectively write \({\mathcal {F}}_{\mu ,L}\), \({\mathcal {F}}_{0,L}\), \({\mathcal {F}}_{\mu ,\infty }\), and \({\mathcal {F}}_{0,\infty }\) for the sets of lower semicontinuous proper functions on all Hilbert spaces that are respectively \(\mu \)-strongly convex and L-smooth, convex and L-smooth, \(\mu \)-strongly convex, and convex, for \(0< \mu< L < \infty \). Write
where \(0 \le \mu < L \le \infty \).
Inversive geometry We use the extended complex plane \(\overline{\mathbb {C}}=\mathbb {C}\cup \{\infty \}\) to represent the 2D plane and the point at infinity. We call \(z\mapsto {\bar{z}}^{-1}\), a one-to-one map from \(\overline{\mathbb {C}}\) to \(\overline{\mathbb {C}}\), the inversion map. In polar form, it is \(re^{i\varphi }\mapsto (1/r)e^{i\varphi }\) for \(0\le r\le \infty \), i.e., inversion preserves the angle and inverts the magnitude. In complex analysis, the inversion map is a Möbius transformation [1, p. 366]. In classical Euclidean geometry, inversive geometry considers, more generally, inversion of the 2D plane about any circle [58, p. 75]. Our inversion map \(z\mapsto {\bar{z}}^{-1}\) is the inversion about the unit circle.
Generalized circles consist of (finite) circles and lines together with the point \(\infty \); the interpretation is that a line is a circle with infinite radius. Inversion maps generalized circles to generalized circles. Using a compass and straightedge, the inversion of a generalized circle can be constructed fully geometrically. In this paper, we use the following semigeometric construction:

1.
Draw a line L through the origin orthogonally intersecting the generalized circle.

2.
Let \(-\infty<x< y\le \infty \) represent the signed distances of the intersecting points from the origin along this line. If the generalized circle is a line, then \(y=\infty \).

3.
Draw a generalized circle orthogonally intersecting L at 1/x and 1/y.

4.
When inverting a region with a generalized circle as the boundary, pick a point on L within the interior of the region to determine on which side of the boundary the inverted interior lies.
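The construction above can be checked numerically. The following sketch (the circle with center 3 and radius 1 on the real axis is our own illustrative choice) verifies that inversion about the unit circle maps a circle meeting the real axis at signed distances x and y to a circle meeting it at 1/x and 1/y:

```python
import cmath, math

# Minimal sketch (not from the paper): inversion about the unit circle,
# z -> 1/conj(z), maps a circle meeting the real axis at signed distances
# x_ and y_ to a circle meeting it at 1/x_ and 1/y_ (steps 2-3 above).
c, r = 3.0, 1.0                # illustrative circle on the real axis, away from 0
x_, y_ = c - r, c + r          # intersections with the line through the origin
cx = (1 / x_ + 1 / y_) / 2     # center of the inverted circle
rx = abs(1 / x_ - 1 / y_) / 2  # its radius: 1/x_ and 1/y_ are diameter endpoints

for k in range(36):
    z = c + r * cmath.exp(2j * math.pi * k / 36)   # point on the original circle
    w = 1 / z.conjugate()                          # apply the inversion map
    assert abs(abs(w - cx) - rx) < 1e-12           # w lies on the inverted circle
```

By symmetry about the real axis, the two real crossings of the image circle are endpoints of a diameter, which is why steps 2 and 3 suffice to pin the image down.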
Scaled relative graphs
In this section, we define the notion of the scaled relative graph (SRG). Loosely speaking, the SRG maps the action of an operator to a subset of the extended complex plane.
We use the extended complex plane \(\overline{\mathbb {C}}=\mathbb {C}\cup \{ \infty \}\) to represent the 2D plane and the point at infinity. Since complex numbers compactly represent rotations and scaling, this choice simplifies our notation compared to using \(\mathbb {R}^2\cup \{ \infty \}\). We avoid the operations \(\infty -\infty \), 0/0, \(\infty /\infty \), and \(0\cdot \infty \). Otherwise, we adopt the conventions \(z+\infty =\infty \), \(z/\infty =0\), \(z/0=\infty \), and \(z\cdot \infty =\infty \).
SRG of operators
Consider an operator \(A:\mathcal {H}\rightrightarrows \mathcal {H}\). Let \(x,y\in \mathcal {H}\) be a pair of inputs and let \(u,v\in \mathcal {H}\) be their corresponding outputs, i.e., \(u\in Ax\), and \(v\in Ay\). The goal is to understand the change in output relative to the change in input.
First, consider the case \(x\ne y\). Consider the complex conjugate pair
where given any \(a,b\in \mathcal {H}\)
denotes the angle between them. The absolute value (magnitude) \(z=\tfrac{\Vert uv\Vert }{\Vert xy\Vert }\) represents the size of the change in outputs relative to the size of the change in inputs. The argument (angle) \(\angle (uv,xy)\) represents how much the change in outputs is aligned with the change in inputs. Equivalently, \({\text {Re}}z\) and \({\text {Im}}z\) respectively represent the components of \(uv\) aligned with and perpendicular to \(xy\), i.e.,
where \(P_{\mathrm {span}\{xy\}}\) is the projection onto the span of \(xy\) and \(P_{\{xy\}^\perp }\) is the projection onto the subspace orthogonal to \(xy\).
Define the SRG of an operator \(A:\mathcal {H}\rightrightarrows \mathcal {H}\) as
We clarify several points: (i) \({\mathcal {G}}(A)\subseteq \overline{\mathbb {C}}\). (ii) \(\infty \in {\mathcal {G}}(A)\) if and only if there is a point \(x\in \mathcal {H}\) such that Ax is multivalued. (In this case, there exist \((x,u),(y,v)\in A\) such that \(x=y\) and \(u\ne v\), and the idea is that \(z=\Vert u-v\Vert /0=\infty \), i.e., \(u-v\) is infinitely larger than \(x-y=0\).) (iii) The ± makes \({\mathcal {G}}(A)\) symmetric about the real axis. (We include the ± because \(\angle (u-v,x-y)\) always returns a nonnegative angle.) See Fig. 4 for examples.
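The definition can be made concrete with a small computation. The following sketch (the operator, complex multiplication by 1 + 2i viewed as a map on \(\mathbb {R}^2\), and the sample inputs are our own illustrative choices) computes one SRG point from a pair of input-output evaluations:

```python
import math

# Minimal sketch (not from the paper): compute one SRG point (the + branch
# of the +/-) from a pair of evaluations u = Ax, v = Ay with x != y.
def srg_point(x, y, u, v):
    dx = (x[0] - y[0], x[1] - y[1])
    du = (u[0] - v[0], u[1] - v[1])
    ndx, ndu = math.hypot(*dx), math.hypot(*du)
    cosang = (du[0] * dx[0] + du[1] * dx[1]) / (ndu * ndx)
    ang = math.acos(max(-1.0, min(1.0, cosang)))   # angle in [0, pi]
    return (ndu / ndx) * complex(math.cos(ang), math.sin(ang))

# Illustrative operator: complex multiplication by 1 + 2i on R^2;
# every SRG point should then be 1 + 2i or its conjugate.
A = lambda p: (p[0] - 2 * p[1], 2 * p[0] + p[1])
x, y = (0.3, -1.2), (2.0, 0.7)
z = srg_point(x, y, A(x), A(y))
assert abs(z - complex(1.0, 2.0)) < 1e-12
```

Any other pair of distinct inputs gives the same point for this operator, which previews Lemma 1's construction.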
For linear operators, the SRG generalizes eigenvalues. Given \(A\in \mathbb {R}^{n\times n}\), write \(\varLambda (A)\) for the set of eigenvalues of A.
Theorem 1
If \(A\in \mathbb {R}^{n\times n}\) and \(n = 1\) or \(n\ge 3\), then \(\varLambda (A)\subseteq {\mathcal {G}}(A)\).
The result fails for \(n=2\) because \(S^{n-1}\), the unit sphere in \(\mathbb {R}^n\), is not simply connected for \(n=2\); the proof constructs a loop in \(S^{n-1}\) and argues that the image of the loop in the complex plane is null-homotopic. Figure 5 illustrates an SRG of a matrix. The SRG of a matrix does not seem to be directly related to the numerical range (field of values) [33] or the pseudospectrum [67].
Proof
If \(\lambda \) is a real eigenvalue of A, then considering (1) with x as the corresponding eigenvector and \(y=0\) tells us \(\lambda \in {\mathcal {G}}(A)\).
Next consider a complex conjugate eigenvalue pair \(\lambda ,{\overline{\lambda }}\in \varLambda (A)\), where \({\text {Im}}\lambda >0\). (This case excludes \(n=1\).) A has a real Schur decomposition of the form
where \(b,c>0\), \(\lambda =a+i\sqrt{bc}\), and \(Q\in \mathbb {R}^{n\times n}\) is orthogonal. (To obtain this decomposition, take the construction of [52] and apply a \(\pm 45\) degree rotation to the leading \(2\times 2\) block.) Since an orthogonal change of coordinates does not change the SRG, we have \({\mathcal {G}}(A)={\mathcal {G}}(B)\). Write \(S^{n-1}\) for the unit sphere in \(\mathbb {R}^n\). Consider the continuous map \(z:S^{n-1}\rightarrow \mathbb {C}\) defined by \(z(x)=\Vert Bx\Vert \exp \left[ i\angle (Bx,x)\right] \). Since B is a linear operator, we have \(z(S^{n-1})={\mathcal {G}}(B)\). Consider the curve \(\gamma (t)=\cos (2\pi t)e_1+\sin (2\pi t)e_2\in S^{n-1}\) for \(t\in [0,1]\), where \(e_1\) and \(e_2\) are the first and second unit vectors in \(\mathbb {R}^n\). A simple computation gives
If \(b=c\), then \(z(\gamma (t))=\lambda \) and we conclude \(\lambda \in z(S^{n1})={\mathcal {G}}(A)\).
Assume \(b\ne c\), and assume for contradiction that \(\lambda \notin z(S^{n-1})={\mathcal {G}}(A)\). The curve \(z(\gamma (t))\) strictly encloses the eigenvalue \(\lambda =a+i\sqrt{bc}\) since \(\min (b,c)< \sqrt{bc}< \max (b,c)\). Since \(S^{n-1}\) is simply connected for \(n\ge 3\), we can continuously contract \(\gamma (t)\) to a point in \(S^{n-1}\), and the continuous map z provides a continuous contraction of \(z(\gamma (t))\) to a point in \(z(S^{n-1})\). However, \(z(\gamma (t))\) has a nonzero winding number around \(\lambda \) and \(\lambda \notin z(S^{n-1})\). Therefore, \(z(\gamma (t))\) cannot be continuously contracted to a point in \(z(S^{n-1})\). This is a contradiction and we conclude \(\lambda \in z(S^{n-1})={\mathcal {G}}(A)\). \(\square \)
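The topological step of the proof can be illustrated numerically. The sketch below (the values a, b, c are our own illustrative choices) traces the curve \(z(\gamma (t))\) for the block B = [[a, b], [-c, a]] and estimates its winding number around \(\lambda =a+i\sqrt{bc}\), confirming it is nonzero when \(b\ne c\):

```python
import cmath, math

# Minimal sketch (not from the paper): trace the curve z(gamma(t)) from the
# proof for B = [[a, b], [-c, a]] (eigenvalues a +/- i*sqrt(b*c)) and
# estimate its winding number around lambda = a + i*sqrt(b*c).
a, b, c = 0.5, 4.0, 1.0          # illustrative values with b != c
lam = complex(a, math.sqrt(b * c))

def z_of(theta):
    g = (math.cos(theta), math.sin(theta))           # gamma(t), theta = 2*pi*t
    Bg = (a * g[0] + b * g[1], -c * g[0] + a * g[1])
    norm = math.hypot(*Bg)
    cosang = (Bg[0] * g[0] + Bg[1] * g[1]) / norm    # <Bg, g>/(|Bg| |g|)
    ang = math.acos(max(-1.0, min(1.0, cosang)))
    return norm * cmath.exp(1j * ang)

n = 4000
winding = 0.0
prev = z_of(0.0) - lam
for k in range(1, n + 1):
    cur = z_of(2 * math.pi * k / n) - lam
    winding += cmath.phase(cur / prev)               # signed angle increment
    prev = cur
winding = round(winding / (2 * math.pi))
assert winding != 0                                  # the curve encloses lambda
```

Because B has no real eigenvectors, the curve never touches the real axis or \(\lambda \), so the discrete phase sum is a reliable estimate of the winding number.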
The SRG \({\mathcal {G}}(A)\) maps the action of the operator A to points in \(\overline{\mathbb {C}}\). In the following sections, we will need to conversely take any point in \(\overline{\mathbb {C}}\) and find an operator whose action maps to that point. Lemma 1 provides such constructions.
Lemma 1
Take any \(z=z_r+z_ii\in \mathbb {C}\). Define \(A_z:\mathbb {R}^2\rightarrow \mathbb {R}^2\) and \(A_\infty :\mathbb {R}^2\rightrightarrows \mathbb {R}^2\) as
Then,
If we write \(\cong \) to identify an element of \(\mathbb {R}^2\) with an element in \(\mathbb {C}\) in that
then we can view \(A_z\) as complex multiplication with z in the sense that
Proof
Again, we write \(\cong \) to identify an element of \(\mathbb {R}^2\) with an element in \(\mathbb {C}\). Write \(z=r_ze^{i\theta _{z}}\). Consider any \(x,y\in \mathbb {R}^2\) where \(x\ne y\) and define \(u=A_zx\) and \(v=A_zy\). Then we can write
where \(r_w>0\), and
This gives us
and
Now consider \(A_\infty \). By definition, \(\infty \in {\mathcal {G}}(A_\infty )\). For any \(u\in A_\infty x\) and \(v\in A_\infty y\), we have \(x=y=0\), and therefore \({\mathcal {G}}(A_\infty )\) contains no finite \(z\in \mathbb {C}\). We conclude \({\mathcal {G}}(A_\infty )=\{\infty \}\). \(\square \)
SRG of operator classes
Let \({\mathcal {A}}\) be a collection of operators. We define the SRG of the class \({\mathcal {A}}\) as
We focus more on SRGs of operator classes, rather than individual operators, because theorems are usually stated with operator classes. For example, one might say “If A is 1/2-cocoercive, i.e., if \(A\in {\mathcal {C}}_{1/2}\), then \(I-A\) is nonexpansive.” We now characterize the SRG of the Lipschitz, averaged, monotone, strongly monotone, and cocoercive operator classes.
Proposition 1
Let \(\mu ,\beta ,L\in (0,\infty )\) and \(\theta \in (0,1)\). The SRGs of \({\mathcal {L}}_L\), \({\mathcal {N}}_\theta \), \({\mathcal {M}}\), \({\mathcal {M}}_\mu \), and \({\mathcal {C}}_\beta \) are, respectively, given by
Proof
First, characterize \({\mathcal {G}}({\mathcal {L}}_L)\). We have \({\mathcal {G}}({\mathcal {L}}_L)\subseteq \left\{ z\in \mathbb {C}\,\big |\,|z|^2\le L^2\right\} \) since
Conversely, given any \(z\in \mathbb {C}\) such that \(|z|\le L\), the operator \(A_z\) of Lemma 1 satisfies \(\Vert A_zx-A_zy\Vert \le L\Vert x-y\Vert \) for any \(x,y\in \mathbb {R}^2\), i.e., \(A_z\in {\mathcal {L}}_L\), and \({\mathcal {G}}(A_z)=\{z,{\bar{z}}\}\). Therefore \({\mathcal {G}}({\mathcal {L}}_L)\supseteq \left\{ z\in \mathbb {C}\,\big |\,|z|^2\le L^2\right\} \).
Next, characterize \({\mathcal {G}}({\mathcal {M}})\). For any \(A\in {\mathcal {M}}\), monotonicity implies
Considering (1), we conclude \({\mathcal {G}}(A)\backslash \{\infty \}\subseteq \{z\,|\,{\text {Re}}\,z\ge 0\}\). On the other hand, given any \(z\in \{z\,|\,{\text {Re}}\,z\ge 0\}\), the operator \(A_z\) of Lemma 1 satisfies \(\langle A_zx-A_zy,x-y\rangle \ge 0\) for any \(x,y\in \mathbb {R}^2\), i.e., \(A_z\in {\mathcal {M}}\), and \({\mathcal {G}}(A_z)=\{z,{\bar{z}}\}\). Therefore, \(z\in {\mathcal {G}}(A_z)\subset {\mathcal {G}}({\mathcal {M}})\), and we conclude \(\{z\,|\,{\text {Re}}\,z\ge 0\}\subseteq {\mathcal {G}}(\mathcal {M})\). Finally, note that \(\infty \in {\mathcal {G}}({\mathcal {M}})\) is equivalent to saying that there exists a multivalued operator in \({\mathcal {M}}\). The \(A_\infty \) of Lemma 1 is one such example.
The other SRGs \({\mathcal {G}}({\mathcal {M}}_\mu )\), \({\mathcal {G}}({\mathcal {C}}_\beta )\), and \({\mathcal {G}}({\mathcal {N}}_\theta )\) can be characterized with similar direct proofs or by using operator and SRG transformations introduced later in Sect. 4. In particular: the fact \({\mathcal {M}}_\mu =\mu I+{\mathcal {M}}\), Theorem 4, and the characterization of \({\mathcal {G}}({\mathcal {M}})\) prove the characterization of \({\mathcal {G}}({\mathcal {M}}_\mu )\); the fact \(({\mathcal {M}}_\mu )^{-1}={\mathcal {C}}_{\mu }\), Theorem 5, and the characterization of \({\mathcal {G}}({\mathcal {M}}_\mu )\) prove the characterization of \({\mathcal {G}}({\mathcal {C}}_{\mu })\); and the fact \((1-\theta )I+\theta {\mathcal {L}}_1={\mathcal {N}}_\theta \), Theorem 4, and the characterization of \({\mathcal {G}}({\mathcal {L}}_1)\) prove the characterization of \({\mathcal {G}}({\mathcal {N}}_\theta )\). The facts \({\mathcal {M}}_\mu =\mu I+{\mathcal {M}}\), \(({\mathcal {M}}_\mu )^{-1}={\mathcal {C}}_{\mu }\), and \((1-\theta )I+\theta {\mathcal {L}}_1={\mathcal {N}}_\theta \) are well known [5]. \(\square \)
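Proposition 1's characterizations can be spot-checked numerically. In the sketch below (the matrix M and the bound L = 3 are our own illustrative choices), we sample SRG points of a monotone, L-Lipschitz linear operator and confirm they lie in the right half-plane and in the disk of radius L:

```python
import math, random

# Minimal sketch (not from the paper): sample SRG points of x -> Mx, where
# the illustrative M is monotone (M + M^T is positive semidefinite) and
# L-Lipschitz (its spectral norm is below L = 3), and confirm the points
# lie in the half-plane Re z >= 0 and the disk |z| <= L of Proposition 1.
random.seed(0)
M = ((1.0, 2.0), (-2.0, 0.5))
L = 3.0
apply_M = lambda p: (M[0][0] * p[0] + M[0][1] * p[1],
                     M[1][0] * p[0] + M[1][1] * p[1])

for _ in range(200):
    x = (random.uniform(-1, 1), random.uniform(-1, 1))
    y = (random.uniform(-1, 1), random.uniform(-1, 1))
    u, v = apply_M(x), apply_M(y)
    dx = (x[0] - y[0], x[1] - y[1])
    du = (u[0] - v[0], u[1] - v[1])
    re = (du[0] * dx[0] + du[1] * dx[1]) / (dx[0]**2 + dx[1]**2)  # Re z
    mag = math.hypot(*du) / math.hypot(*dx)                       # |z|
    assert re >= -1e-12   # monotone: SRG in the right half-plane
    assert mag <= L       # L-Lipschitz: SRG in the disk of radius L
```

Note that Re z equals \(\langle u-v,x-y\rangle /\Vert x-y\Vert ^2\), so the half-plane containment is exactly the monotonicity inequality in disguise.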
Proposition 2
Let \(0<\mu<L<\infty \). Then
Proof
Since \(\partial \mathcal {F}_{0,\infty }\subset \mathcal {M}\), we have \({\mathcal {G}}(\partial \mathcal {F}_{0,\infty })\subseteq {\mathcal {G}}({\mathcal {M}})= \{z\in \mathbb {C}\,|\,{\text {Re}}\,z\ge 0\}\cup \{\infty \}\) by Proposition 1. We claim \(f:\mathbb {R}^2\rightarrow \mathbb {R}\) defined by \(f(x,y)=|x|\) satisfies \({\mathcal {G}}(\partial f)=\{z\in \mathbb {C}\,|\,{\text {Re}}\,z\ge 0\}\cup \{\infty \}\). This tells us \(\{z\in \mathbb {C}\,|\,{\text {Re}}\,z\ge 0\}\cup \{\infty \}\subseteq {\mathcal {G}}(\partial \mathcal {F}_{0,\infty })\).
We prove the claim with basic computation. Let \(f(x,y)=|x|\). The subdifferential has the form \(\partial f(x,y)=(h(x),0)\) for h defined by:
Since \(\partial f\) is multivalued at (0, 0), we have \(\infty \in {\mathcal {G}}(\partial f)\). Since \(\partial f(1,0)=\partial f(2,0)\), we have \(0\in {\mathcal {G}}(\partial f)\). The input-output pairs \((0,0)\in \partial f(0,0)\) and \((h(R\cos (\theta )),0)\in \partial f(R\cos (\theta ),R\sin (\theta ))\) map to the points \(R^{-1}(|\cos (\theta )| \pm i\sin (\theta ))\in \mathbb {C}\). Clearly the image of this map over the range \(R\in (0,\infty )\), \(\theta \in [0,2\pi )\) is the right-half plane except the origin. Hence \({\mathcal {G}}(\partial f)=\{z\in \mathbb {C}\,|\,{\text {Re}}\,z\ge 0\}\cup \{\infty \}\).
The SRGs \({\mathcal {G}}(\partial {\mathcal {F}}_{\mu ,\infty })\), \({\mathcal {G}}(\partial {\mathcal {F}}_{0,L})\), and \({\mathcal {G}}(\partial {\mathcal {F}}_{\mu ,L})\) can be characterized with similar direct proofs or by using operator and SRG transformations introduced later in Sect. 4. In particular: the fact \(\partial {\mathcal {F}}_{\mu ,\infty }=\mu I+\partial {\mathcal {F}}_{0,\infty }\), Theorem 4, and the characterization of \({\mathcal {G}}(\partial {\mathcal {F}}_{0,\infty })\) prove the characterization of \({\mathcal {G}}(\partial {\mathcal {F}}_{\mu ,\infty })\); the fact \(\partial {\mathcal {F}}_{0,L}=\left( \partial {\mathcal {F}}_{1/L,\infty }\right) ^{-1}\), Theorem 5, and the characterization of \({\mathcal {G}}(\partial {\mathcal {F}}_{1/L,\infty })\) prove the characterization of \({\mathcal {G}}(\partial {\mathcal {F}}_{0,L})\); and the fact \(\partial {\mathcal {F}}_{\mu ,L}= \mu I+\partial {\mathcal {F}}_{0,L-\mu }\), Theorem 4, and the characterization of \({\mathcal {G}}(\partial {\mathcal {F}}_{0,L-\mu })\) prove the characterization of \({\mathcal {G}}(\partial {\mathcal {F}}_{\mu ,L})\). The facts \(\partial {\mathcal {F}}_{\mu ,\infty }=\mu I+\partial {\mathcal {F}}_{0,\infty }\), \(\partial {\mathcal {F}}_{0,L}=\left( \partial {\mathcal {F}}_{1/L,\infty }\right) ^{-1}\), and \(\partial {\mathcal {F}}_{\mu ,L}= \mu I+\partial {\mathcal {F}}_{0,L-\mu }\) are well known [66]. \(\square \)
SRG-full classes
Section 3.1 discussed how, given an operator, we can draw its SRG. Conversely, can we examine the SRG and conclude something about the operator? To perform this type of reasoning, we need further conditions.
We say a class of operators \({\mathcal {A}}\) is SRG-full if
Since the implication \(A\in {\mathcal {A}}\Rightarrow {\mathcal {G}}(A)\subseteq {\mathcal {G}}({\mathcal {A}})\) already follows from the SRG’s definition, the substance of this definition is the implication \({\mathcal {G}}(A)\subseteq {\mathcal {G}}({\mathcal {A}})\Rightarrow A\in {\mathcal {A}}\). Essentially, a class is SRG-full if it can be fully characterized by its SRG; given an SRG-full class \({\mathcal {A}}\) and an operator A, we can check membership \(A\in {\mathcal {A}}\) by verifying (through geometric arguments) the containment \({\mathcal {G}}(A)\subseteq {\mathcal {G}}({\mathcal {A}})\) in the 2D plane.
SRG-fullness is precisely this desirable property \({\mathcal {G}}(A)\subseteq {\mathcal {G}}({\mathcal {A}})\Rightarrow A\in {\mathcal {A}}\). We now discuss which classes possess it.
Theorem 2
An operator class \({\mathcal {A}}\) is SRG-full if it is defined by
for some nonnegative homogeneous function \(h:\mathbb {R}^3\rightarrow \mathbb {R}\).
To clarify, h is nonnegative homogeneous if \(\theta h(a,b,c)= h(\theta a,\theta b,\theta c)\) for all \(\theta \ge 0\). (We do not assume h is smooth.) When a class \({\mathcal {A}}\) is defined by h as in Theorem 2, we say h represents \({\mathcal {A}}\). For example, the \(\mu \)-strongly monotone class \({\mathcal {M}}_\mu \) is represented by \(h(a,b,c)=\mu b-c\), since
As another example, the firmly nonexpansive class \({\mathcal {N}}_{1/2}\) is represented by \(h(a,b,c)=a-c\), since
By Theorem 2, the classes \({\mathcal {M}}\), \({\mathcal {M}}_\mu \), \({\mathcal {C}}_\beta \), \({\mathcal {L}}_L\), and \({\mathcal {N}}_\theta \) are all SRG-full. Respectively,

\({\mathcal {M}}\) is represented by \(h=-c\),

\({\mathcal {M}}_\mu \) is represented by \(h=\mu b-c\),

\({\mathcal {C}}_\beta \) is represented by \(h=\beta a-c\),

\({\mathcal {L}}_L\) is represented by \(h=a-L^2b\),

\({\mathcal {N}}_{\theta }\) is represented by \(h=a+(1-2\theta )b-2(1-\theta )c\).
If h and g represent SRG-full classes \({\mathcal {A}}\) and \({\mathcal {B}}\), then \(\max \{h,g\}\) represents \({\mathcal {A}}\cap {\mathcal {B}}\) and \(\min \{h,g\}\) represents \({\mathcal {A}}\cup {\mathcal {B}}\).
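Membership testing through a representing function h can be sketched in code. Below (the monotone matrix M = [[1, 2], [-2, 1]] and its resolvent are our own illustrative choices), we evaluate h(a, b, c) = a - c, which represents the firmly nonexpansive class \({\mathcal {N}}_{1/2}\), on pairs of evaluations of a resolvent:

```python
import random

# Minimal sketch (not from the paper): check membership in N_{1/2} by
# evaluating its representing function h(a, b, c) = a - c on
# (a, b, c) = (||u - v||^2, ||x - y||^2, <u - v, x - y>). The operator is
# the resolvent J = (I + M)^{-1} of the illustrative monotone matrix
# M = [[1, 2], [-2, 1]]; resolvents of monotone operators are firmly
# nonexpansive, so h should be nonpositive on every pair of evaluations.
h = lambda a, b, c: a - c
# I + M = [[2, 2], [-2, 2]], det = 8, so J(p) = [[2, -2], [2, 2]] p / 8
J = lambda p: ((2 * p[0] - 2 * p[1]) / 8, (2 * p[0] + 2 * p[1]) / 8)

random.seed(1)
for _ in range(100):
    x = (random.uniform(-5, 5), random.uniform(-5, 5))
    y = (random.uniform(-5, 5), random.uniform(-5, 5))
    u, v = J(x), J(y)
    a = (u[0] - v[0])**2 + (u[1] - v[1])**2
    b = (x[0] - y[0])**2 + (x[1] - y[1])**2
    c = (u[0] - v[0]) * (x[0] - y[0]) + (u[1] - v[1]) * (x[1] - y[1])
    assert h(a, b, c) <= 1e-12
```

Replacing h with the representing function of another class above would test that class's membership inequality instead.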
On the other hand, the classes \(\partial {\mathcal {F}}_{0,\infty }\), \(\partial {\mathcal {F}}_{\mu ,\infty }\), \(\partial {\mathcal {F}}_{0,L}\), and \(\partial {\mathcal {F}}_{\mu ,L}\) are not SRG-full. For example, the operator
satisfies \({\mathcal {G}}(A)=\{i,-i\}\subseteq {\mathcal {G}}( \partial {\mathcal {F}}_{0,\infty })\). However, \(A\notin \partial {\mathcal {F}}_{0,\infty }\) because there is no convex function f for which \(\nabla f=A\).
Proof
Since \(A\in {\mathcal {A}}\Rightarrow {\mathcal {G}}(A)\subseteq {\mathcal {G}}({\mathcal {A}})\) always holds, we show \({\mathcal {G}}(A)\subseteq {\mathcal {G}}({\mathcal {A}})\Rightarrow A\in {\mathcal {A}}\). Assume \({\mathcal {A}}\) is represented by h and an operator \(A:\mathcal {H}\rightrightarrows \mathcal {H}\) satisfies \({\mathcal {G}}(A)\subseteq {\mathcal {G}}({\mathcal {A}})\). Let \(u_A\in Ax_A\) and \(v_A\in Ay_A\) represent distinct evaluations, i.e., \(x_A\ne y_A\) or \(u_A\ne v_A\).
First consider the case \(x_A\ne y_A\). Then
satisfies \(z\in {\mathcal {G}}(A)\subseteq {\mathcal {G}}({\mathcal {A}})\). Since \(z\in {\mathcal {G}}({\mathcal {A}})\), there is an operator \(B\in {\mathcal {A}}\) such that \(u_B\in Bx_B\) and \(v_B\in By_B\) with
Since h represents \({\mathcal {A}}\), we have
and homogeneity gives us
Finally, by homogeneity we have
Now consider the case \(x_A=y_A\) and \(u_A\ne v_A\). Then A is multivalued and \(\infty \in {\mathcal {G}}(A)\subseteq {\mathcal {G}}({\mathcal {A}})\). Since \(\infty \in {\mathcal {G}}({\mathcal {A}})\), there is a multivalued operator \(B\in {\mathcal {A}}\) such that \(u_B\in Bx_B\) and \(v_B\in Bx_B\) with \(u_B\ne v_B\). This implies \(h(\Vert u_B-v_B\Vert ^2,0,0)\le 0\). Therefore, \(h(\Vert u_A-v_A\Vert ^2,0,0)\le 0\).
In conclusion, \((x_A,u_A)\) and \((y_A,v_A)\), which represent arbitrary evaluations of A, satisfy the inequality defined by h, and we conclude \(A\in {\mathcal {A}}\). \(\square \)
Operator and SRG transformations
In this section, we show how transformations of operators map to changes in their SRGs. We then use these results and geometric arguments to analyze convergence of various fixedpoint iterations. The convergence analyses are tight in the sense that they cannot be improved without additional assumptions.
SRG intersection
Theorem 3
If \({\mathcal {A}}\) and \({\mathcal {B}}\) are SRG-full classes, then \({\mathcal {A}}\cap {\mathcal {B}}\) is SRG-full, and
The containment \({\mathcal {G}}({\mathcal {A}}\cap \mathcal {B})\subseteq {\mathcal {G}}({\mathcal {A}})\cap {\mathcal {G}}(\mathcal {B})\) holds regardless of SRG-fullness since, by definition, \({\mathcal {G}}({\mathcal {A}}\cap {\mathcal {B}})=\bigcup \{{\mathcal {G}}(A)\,|\, A\in {\mathcal {A}},\, A\in {\mathcal {B}}\}\) and \({\mathcal {G}}({\mathcal {A}}) \cap {\mathcal {G}}({\mathcal {B}})=\bigcup \{{\mathcal {G}}(A) \cap {\mathcal {G}}(B)\,|\, A\in {\mathcal {A}},\, B\in {\mathcal {B}}\}\). Therefore, the substance of Theorem 3 is \({\mathcal {G}}({\mathcal {A}}\cap \mathcal {B})\supseteq {\mathcal {G}}({\mathcal {A}})\cap {\mathcal {G}}(\mathcal {B})\). This result is useful for setups with multiple assumptions on a single operator, such as Facts 5, 7, and 12. A similar result holds for the union.
Proof
Since \({\mathcal {A}}\) and \({\mathcal {B}}\) are SRG-full,
for an operator C, and we conclude \({\mathcal {A}}\cap {\mathcal {B}}\) is SRG-full.
Assume \(z\in \mathbb {C}\) satisfies \(\{z,{\bar{z}}\}\subseteq {\mathcal {G}}({\mathcal {A}})\cap {\mathcal {G}}({\mathcal {B}})\). Then \(A_z\) of Lemma 1 satisfies \({\mathcal {G}}(A_z)=\{z,{\bar{z}}\}\subseteq {\mathcal {G}}({\mathcal {A}})\cap {\mathcal {G}}({\mathcal {B}})\). Since \({\mathcal {A}}\) and \({\mathcal {B}}\) are SRG-full, we have \(A_z\in {\mathcal {A}}\), \(A_z\in {\mathcal {B}}\), and \(\{z,{\bar{z}}\}={\mathcal {G}}(A_z)\subseteq {\mathcal {G}}({\mathcal {A}}\cap {\mathcal {B}})\). If \(\infty \in {\mathcal {G}}({\mathcal {A}})\cap {\mathcal {G}}({\mathcal {B}})\), then a similar argument using \(A_\infty \) of Lemma 1 proves \(\infty \in {\mathcal {G}}({\mathcal {A}}\cap {\mathcal {B}})\). Therefore \({\mathcal {G}}({\mathcal {A}})\cap {\mathcal {G}}(\mathcal {B})\subseteq {\mathcal {G}}({\mathcal {A}}\cap \mathcal {B})\). Since the other containment \({\mathcal {G}}({\mathcal {A}}\cap \mathcal {B})\subseteq {\mathcal {G}}({\mathcal {A}})\cap {\mathcal {G}}(\mathcal {B})\) holds by definition, we have equality. \(\square \)
SRG scaling and translation
Theorem 4
Let \(\alpha \in \mathbb {R}\) and \(\alpha \ne 0\). If \({\mathcal {A}}\) is a class of operators, then
If \({\mathcal {A}}\) is furthermore SRG-full, then \(\alpha {\mathcal {A}}\), \({\mathcal {A}}\alpha \), and \(I+ {\mathcal {A}}\) are SRG-full.
Proof
\({\mathcal {G}}(\alpha A)=\alpha {\mathcal {G}}(A)\) and \({\mathcal {G}}(A\alpha )=\alpha {\mathcal {G}}(A)\) follow from the definition of the SRG, and \({\mathcal {G}}(I+ A)=1+{\mathcal {G}}(A)\) follows from (1). The scaling and translation operations are reversible, and \({\mathcal {G}}((1/\alpha ){\mathcal {A}})={\mathcal {G}}({\mathcal {A}}(1/\alpha ))=(1/\alpha ){\mathcal {G}}({\mathcal {A}})\) and \({\mathcal {G}}({\mathcal {A}}-I)={\mathcal {G}}({\mathcal {A}})-1\). For any \(B:\mathcal {H}\rightrightarrows \mathcal {H}\),
and we conclude \(\alpha {\mathcal {A}}\) is SRG-full. By similar reasoning, \({\mathcal {A}}\alpha \) and \(I+ {\mathcal {A}}\) are SRG-full. \(\square \)
Since a class of operators can consist of a single operator, if \(A:\mathcal {H}\rightrightarrows \mathcal {H}\), then
To clarify, \(\alpha {\mathcal {G}}(A)\) corresponds to scaling \({\mathcal {G}}(A)\subseteq \overline{\mathbb {C}}\) by \(\alpha \) and reflecting about the vertical axis (imaginary axis) if \(\alpha <0\).
Convergence analysis: gradient descent
Consider the optimization problem
where f is a differentiable function with a minimizer. Consider gradient descent [16]
where \(\alpha >0\) and \(x^0\in \mathcal {H}\) is a starting point. We can use the Krasnosel’skiĭ–Mann theorem to establish convergence of (GD).
Fact 1
Assume f is convex and L-smooth with \(L>0\). For \(\alpha \in (0,2/L)\), the iterates of (GD) converge in that \(x^k\rightarrow x^\star \) weakly for some \(x^\star \) such that \(\nabla f(x^\star )=0\).
Proof
By Propositions 1 and 2 and Theorem 4, we have the geometry
Since \({\mathcal {N}}_\theta \) is SRG-full by Theorem 2, the containment of the SRG in \(\overline{\mathbb {C}}\) is equivalent to the containment of the class. Therefore \(I-\alpha \nabla f\) is averaged, and the iteration converges by the Krasnosel’skiĭ–Mann theorem. \(\square \)
With stronger assumptions, we can establish an exponential rate of convergence for (GD).
Fact 2
Assume f is \(\mu \)-strongly convex and L-smooth with \(0<\mu<L<\infty \). For \(\alpha \in (0,2/L)\), the iterates of (GD) converge exponentially to the minimizer \(x^\star \) with rate
Proof
This follows from Fact 3, which we state and prove below. \(\square \)
Fact 3
Let \(0<\mu<L<\infty \) and \(\alpha \in (0,\infty )\). If \({\mathcal {A}}= \partial {\mathcal {F}}_{\mu ,L}\), then \(I-\alpha {\mathcal {A}}\subseteq {\mathcal {L}}_R\) for
This result is tight in the sense that \(I-\alpha {\mathcal {A}}\nsubseteq {\mathcal {L}}_R\) for any smaller value of R.
Proof
By Proposition 2 and Theorem 4, we have the geometry
The containment of \({\mathcal {G}}(I-\alpha {\mathcal {A}})\) holds for R and fails for smaller R. Since \({\mathcal {L}}_R\) is SRG-full by Theorem 2, the containment of the SRG in \(\overline{\mathbb {C}}\) is equivalent to the containment of the class. \(\square \)
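As a numerical sanity check (an illustration, not part of the proof), the contraction factor can be evaluated on a separable quadratic; here we assume the rate in Fact 3 is the classical \(R=\max \{|1-\alpha \mu |,\,|1-\alpha L|\}\), which the extremal quadratic below attains.

```python
import math

def gd_rate(mu, L, alpha):
    # Assumed tight Lipschitz constant of I - alpha*grad f over
    # mu-strongly convex, L-smooth f.
    return max(abs(1 - alpha * mu), abs(1 - alpha * L))

def gd_step(x, mu, L, alpha):
    # One gradient step on f(x) = 0.5*(mu*x1^2 + L*x2^2), so grad f = (mu*x1, L*x2).
    return (x[0] - alpha * mu * x[0], x[1] - alpha * L * x[1])

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

mu, L, alpha = 1.0, 10.0, 0.15
x, y = (3.0, -2.0), (-1.0, 4.0)
ratio = dist(gd_step(x, mu, L, alpha), gd_step(y, mu, L, alpha)) / dist(x, y)
assert ratio <= gd_rate(mu, L, alpha) + 1e-12
# Here |1 - alpha*mu| = 0.85 dominates, and the factor is attained along the x1-axis:
e, o = (1.0, 0.0), (0.0, 0.0)
assert abs(dist(gd_step(e, mu, L, alpha), gd_step(o, mu, L, alpha)) - 0.85) < 1e-12
```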
Convergence analysis: forward step method
Consider the monotone inclusion problem
where A is a maximal monotone operator with a zero. Consider the forward step method [15]
where \(\alpha >0\) and \(x^0\in \mathcal {H}\) is a starting point. The forward step method is analogous to gradient descent. Under the following two setups, (FS) converges exponentially.
Fact 4
Assume A is \(\mu \)-strongly monotone and L-Lipschitz with \(0<\mu<L<\infty \). For \(\alpha \in (0,2\mu /L^2)\), the iterates of (FS) converge exponentially to the zero \(x^\star \) with rate
Proof
This follows from Fact 5, which we state and prove below. \(\square \)
Fact 5
(Proposition 26.16 [5]) Let \(0<\mu<L<\infty \) and \(\alpha \in (0,\infty )\). If \({\mathcal {A}} =\mathcal {M}_\mu \cap \mathcal {L}_L\), then \(I-\alpha {\mathcal {A}}\subseteq {\mathcal {L}}_R\) for
This result is tight in the sense that \(I-\alpha {\mathcal {A}}\nsubseteq {\mathcal {L}}_R\) for any smaller value of R.
Proof
First consider the case \(\alpha \mu >1\). By Proposition 1 and Theorem 4, we have the geometry
To clarify, O is the center of the circle with radius \({\overline{OC}}\) (lighter shade) and A is the center of the circle with radius \({\overline{AC}}={\overline{AD}}\) defining the inner region (darker shade). With two applications of the Pythagorean theorem, we get
Since \(\overline{C'C}\) is a chord of circle O, it is within the circle. Since two non-identical circles intersect at no more than two points, and since D is within circle O, the arc is within circle O. Finally, the region bounded by it (darker shade) is within circle O (lighter shade).
The previous diagram illustrates the case \(\alpha \mu >1\). In the cases \(\alpha \mu =1\) and \(\alpha \mu < 1\), we have a slightly different geometry, but the same arguments and calculations hold.
The containment holds for R and fails for smaller R. Since \({\mathcal {L}}_R\) is SRG-full by Theorem 2, the containment of the SRG in \(\overline{\mathbb {C}}\) is equivalent to the containment of the class. \(\square \)
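The extremal operator here can be realized as a \(2\times 2\) rotation-scaling matrix, namely complex multiplication by \(z=\mu +i\sqrt{L^2-\mu ^2}\), the corner point of \({\mathcal {G}}(\mathcal {M}_\mu \cap \mathcal {L}_L)\). A short numerical sketch (assuming the rate in Fact 5 is \(R=\sqrt{1-2\alpha \mu +\alpha ^2L^2}\)):

```python
import math

def fs_rate(mu, L, alpha):
    # Assumed contraction factor of I - alpha*A for mu-strongly monotone,
    # L-Lipschitz A; it is below 1 exactly when alpha < 2*mu/L**2.
    return math.sqrt(1 - 2 * alpha * mu + alpha**2 * L**2)

def extreme_operator(mu, L):
    # Complex multiplication by z = mu + i*sqrt(L^2 - mu^2): this 2x2 matrix is
    # mu-strongly monotone and L-Lipschitz, and attains the rate above.
    b = math.sqrt(L**2 - mu**2)
    return [[mu, -b], [b, mu]]

def forward_step(M, x, alpha):
    Mx = (M[0][0]*x[0] + M[0][1]*x[1], M[1][0]*x[0] + M[1][1]*x[1])
    return (x[0] - alpha * Mx[0], x[1] - alpha * Mx[1])

mu, L, alpha = 1.0, 4.0, 0.1
M = extreme_operator(mu, L)
x = (0.6, -0.8)  # unit vector; I - alpha*M is a scaled rotation, so any unit x works
assert abs(math.hypot(*forward_step(M, x, alpha)) - fs_rate(mu, L, alpha)) < 1e-12
```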
Fact 6
Assume A is \(\mu \)-strongly monotone and \(\beta \)-cocoercive with \(0<\mu<1/\beta <\infty \). For \(\alpha \in (0,2\beta )\), the iterates of (FS) converge exponentially to the zero \(x^\star \) with rate
Proof
This follows from Fact 7 below. \(\square \)
Fact 7
Let \(0<\mu<1/\beta <\infty \) and \(\alpha \in (0,2\beta )\). If \({\mathcal {A}}=\mathcal {M}_\mu \cap \mathcal {C}_\beta \), then \(I-\alpha {\mathcal {A}}\subseteq {\mathcal {L}}_R\) for
This result is tight in the sense that \(I-\alpha {\mathcal {A}}\nsubseteq {\mathcal {L}}_R\) for any smaller value of R.
Proof outline We quickly outline the geometric insight while deferring the full proof with precise geometric arguments to Sect. C of the appendix. For the case \(\mu <1/(2\beta )\), we have the geometry
where the calculations involve the use of the Pythagorean theorem. In the cases \(\mu =1/(2\beta )\) and \(\mu >1/(2\beta )\), we have a slightly different geometry, but the same arguments and calculations hold. \(\square \)
SRG inversion
In this subsection, we relate inversion of operators with inversion (reciprocal) of complex numbers. This operation is intimately connected to inversive geometry.
Operator inversion
Theorem 5
If \({\mathcal {A}}\) is a class of operators, then
If \({\mathcal {A}}\) is furthermore SRG-full, then \({\mathcal {A}}^{-1}\) is SRG-full.
Since a class of operators can consist of a single operator, if \(A:\mathcal {H}\rightrightarrows \mathcal {H}\), then \({\mathcal {G}}(A^{-1})=({\mathcal {G}}(A))^{-1}\). To clarify, \(({\mathcal {G}}({\mathcal {A}}))^{-1}=\{z^{-1}\mid z\in {\mathcal {G}}({\mathcal {A}})\}\subseteq \overline{\mathbb {C}}\). Note that \(({\mathcal {G}}({\mathcal {A}}))^{-1}=(\overline{{\mathcal {G}}({\mathcal {A}})})^{-1}\), since \({\mathcal {G}}({\mathcal {A}})\) is symmetric about the real axis, so we write the simpler \(({\mathcal {G}}({\mathcal {A}}))^{-1}\) even though the inversion map we consider is \(z\mapsto {\bar{z}}^{-1}\).
Proof
The equivalence of nonzero finite points, i.e.,
follows from
and
where we use the fact that \(\angle (a,b)=\angle (b,a)\).
The equivalence of the zero and infinite points follows from
With the same argument, we have \(0\in {\mathcal {G}}(A) \Leftrightarrow \infty \in {\mathcal {G}}(A^{-1})\).
The inversion operation is reversible. For any \(B:\mathcal {H}\rightrightarrows \mathcal {H}\),
and we conclude \({\mathcal {A}}^{-1}\) is SRG-full. \(\square \)
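Theorem 5 can be probed numerically with the rotation-scaling matrices of Lemma 1 (our own illustration): if \(M\) is complex multiplication by \(z\), then \(M^{-1}\) is complex multiplication by \(1/z\), and the sampled SRG point of \(M^{-1}\) with nonnegative imaginary part is \({\bar{z}}^{-1}\), just as the inversion map \(z\mapsto {\bar{z}}^{-1}\) predicts.

```python
import cmath, math

def rot_scale(z):
    # 2x2 matrix representing complex multiplication by z (Lemma 1's A_z).
    return [[z.real, -z.imag], [z.imag, z.real]]

def apply(M, x):
    return (M[0][0]*x[0] + M[0][1]*x[1], M[1][0]*x[0] + M[1][1]*x[1])

def srg_point(M, x, y):
    # SRG point (with nonnegative imaginary part) generated by inputs x, y:
    # magnitude |u - v|/|x - y|, angle = angle between u - v and x - y.
    d = (x[0]-y[0], x[1]-y[1]); u_, v_ = apply(M, x), apply(M, y)
    u = (u_[0]-v_[0], u_[1]-v_[1])
    nd, nu = math.hypot(*d), math.hypot(*u)
    c = (u[0]*d[0] + u[1]*d[1]) / (nu * nd)
    return (nu / nd) * complex(c, math.sqrt(max(0.0, 1 - c*c)))

z = 2.0 * cmath.exp(1j * 0.7)
M = rot_scale(z)
Minv = rot_scale(1 / z)            # inverse operator = multiplication by 1/z
w = srg_point(M, (1.0, 0.5), (-0.3, 2.0))
winv = srg_point(Minv, (1.0, 0.5), (-0.3, 2.0))
assert abs(w - z) < 1e-12                       # G(M) = {z, conj z}
assert abs(winv - (1 / z).conjugate()) < 1e-12  # G(M^{-1}) = (G(M))^{-1}
```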
Convergence analysis: proximal point
Consider the monotone inclusion problem
where A is a maximal monotone operator with a zero. Consider the proximal point method [13, 47, 48, 62]
where \(\alpha >0\) and \(x^0\in \mathcal {H}\) is a starting point. Since \(J_{\alpha A}\) is 1/2-averaged, we can use the Krasnosel’skiĭ–Mann theorem to establish convergence of (PP). Under stronger assumptions, (PP) converges exponentially.
Fact 8
Assume A is \(\mu \)strongly monotone with \(\mu >0\). For \(\alpha >0\), the iterates of (PP) converge exponentially to the zero \(x^\star \) with rate
Proof
This follows from Fact 9, which we state and prove below. \(\square \)
Fact 9
(Proposition 23.13 [5]) Let \(\mu \in (0,\infty )\) and \(\alpha \in (0,\infty )\). If \({\mathcal {A}}=\mathcal {M}_\mu \), then \(J_{\alpha {\mathcal {A}}}\subseteq {\mathcal {L}}_R\) for
This result is tight in the sense that \(J_{\alpha {\mathcal {A}}}\nsubseteq {\mathcal {L}}_R\) for any smaller value of R.
Proof
By Proposition 1 and Theorems 4 and 5, we have the geometry
The containment holds for R and fails for smaller R. Since \({\mathcal {L}}_R\) is SRG-full by Theorem 2, the containment of the SRG in \(\overline{\mathbb {C}}\) is equivalent to the containment of the class. \(\square \)
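A quick numerical sketch (our own illustration, assuming the rate in Fact 9 is \(R=1/(1+\alpha \mu )\)): for the operator acting as complex multiplication by \(\mu +ib\), which is \(\mu \)-strongly monotone for every skew part \(b\), the resolvent is multiplication by \(1/(1+\alpha (\mu +ib))\), whose modulus is maximized at \(b=0\).

```python
import math

def pp_rate(mu, alpha):
    # Assumed contraction factor of the resolvent J_{alpha A} over
    # mu-strongly monotone A.
    return 1 / (1 + alpha * mu)

def resolvent_norm(mu, b, alpha):
    # For A = "multiplication by mu + i*b" (mu-strongly monotone for any b),
    # J_{alpha A} is multiplication by 1/(1 + alpha*(mu + i*b)).
    return 1 / math.hypot(1 + alpha * mu, alpha * b)

mu, alpha = 2.0, 0.5
# The bound holds for every skew part b and is attained at b = 0:
for b in [0.0, 1.0, 5.0]:
    assert resolvent_norm(mu, b, alpha) <= pp_rate(mu, alpha) + 1e-12
assert abs(resolvent_norm(mu, 0.0, alpha) - pp_rate(mu, alpha)) < 1e-12
```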
Convergence analysis: Douglas–Rachford
Consider the monotone inclusion problem
where A and B are operators and \(A+B\) has a zero. Consider Douglas–Rachford splitting [22, 44]
where \(\alpha >0\) and \(z^0\in \mathcal {H}\) is a starting point. If \(z^\star \) is a fixed point, then \(J_{\alpha B}(z^\star )\) is a zero of \(A+B\) (see tutorial [63, p. 28] or textbook [5, Proposition 26.1]). We can use the Krasnosel’skiĭ–Mann theorem to establish convergence of (DR).
Fact 10
(Theorem 1 [44]) Assume A and B are maximal monotone. For \(\alpha >0\), the iterates of (DR) converge in that \(z^k\rightarrow z^\star \) weakly for some fixed point \(z^\star \).
Proof
By Proposition 1 and Theorems 4 and 5, we have the geometry
Since \({\mathcal {L}}_1\) is SRG-full, Theorem 2 implies \((2J_{\alpha A}-I)\) is nonexpansive. By the same reasoning, \((2J_{\alpha B}-I)\) is nonexpansive, and, since the composition of nonexpansive operators is nonexpansive, \((2J_{\alpha A}-I)(2J_{\alpha B}-I)\) is nonexpansive. So (DR) is a fixed-point iteration with a 1/2-averaged operator, and the iteration converges by the Krasnosel’skiĭ–Mann theorem. \(\square \)
When we have further assumptions, we can provide a stronger rate of convergence.
Fact 11
Assume A or B is \(\mu \)-strongly monotone and \(\beta \)-cocoercive with \(0<\mu<1/\beta <\infty \). For \(\alpha >0\), the iterates of (DR) converge exponentially to the fixed point \(z^\star \) with rate
Proof
If \(S_1\) is \(R_1\)-Lipschitz continuous and \(S_2\) is \(R_2\)-Lipschitz continuous, then \(S_1S_2\) is \((R_1R_2)\)-Lipschitz continuous. If S is R-Lipschitz continuous, then \(\frac{1}{2}I+\frac{1}{2}S\) is \(\left( \frac{1}{2}+\frac{R}{2}\right) \)-Lipschitz continuous. The result follows from these observations and Fact 12, which we state and prove below. \(\square \)
Fact 12
(Theorem 7.2 [28]) Let \(0<\mu<1/\beta <\infty \) and \(\alpha \in (0,\infty )\). If \({\mathcal {A}}= {\mathcal {M}}_\mu \cap {\mathcal {C}}_\beta \), then \(2J_{\alpha {\mathcal {A}}}-I\subseteq {\mathcal {L}}_R\) for
This result is tight in the sense that \(2J_{\alpha {\mathcal {A}}}-I\nsubseteq {\mathcal {L}}_R\) for any smaller value of R.
Proof outline We quickly outline the geometric insight while deferring the full proof with precise geometric arguments to Sect. C of the appendix. We have the geometry
The radius R is obtained with Stewart’s theorem [65]. \(\square \)
As a special case, consider the optimization problem
where f and g are functions (not necessarily differentiable) and a minimizer exists. Then (DR) with \(A=\partial f\) and \(B=\partial g\) can be written as
where \(\alpha >0\) and \(z^0\in \mathcal {H}\) is a starting point. As an aside, the popular method ADMM is equivalent to this instance of Douglas–Rachford splitting [26].
Fact 13
Assume f is \(\mu \)-strongly convex and L-smooth with \(0<\mu<L<\infty \). Assume g is convex, lower semicontinuous, and proper. For \(\alpha >0\), the iterates of (DR) converge exponentially to the fixed point \(z^\star \) with rate
Proof
If \(S_1\) is \(R_1\)-Lipschitz continuous and \(S_2\) is \(R_2\)-Lipschitz continuous, then \(S_1S_2\) is \((R_1R_2)\)-Lipschitz continuous. If S is R-Lipschitz continuous, then \(\frac{1}{2}I+\frac{1}{2}S\) is \(\left( \frac{1}{2}+\frac{R}{2}\right) \)-Lipschitz continuous. The result follows from these observations and Fact 14, which we state and prove below. \(\square \)
Fact 14
(Theorem 1 [29]) Let \(0<\mu<L<\infty \) and \(\alpha \in (0,\infty )\). If \({\mathcal {A}}= \partial {\mathcal {F}}_{\mu ,L}\), then \(2J_{\alpha {\mathcal {A}}}-I\subseteq {\mathcal {L}}_R\) for
This result is tight in the sense that \(2J_{\alpha {\mathcal {A}}}-I\nsubseteq {\mathcal {L}}_R\) for any smaller value of R.
Proof
By Proposition 2 and Theorems 4 and 5, we have the geometry
The containment holds for R and fails for smaller R. Since \({\mathcal {L}}_R\) is SRG-full by Theorem 2, the containment of the SRG in \(\overline{\mathbb {C}}\) is equivalent to the containment of the class. \(\square \)
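A numerical sketch for scalar instances (our own illustration; we assume the rate in Fact 14 is \(R=\max \{(1-\alpha \mu )/(1+\alpha \mu ),\,(\alpha L-1)/(\alpha L+1)\}\), the form given in [29]): for the quadratic \((c/2)x^2\) with \(c\in [\mu ,L]\), the reflected resolvent multiplies by \((1-\alpha c)/(1+\alpha c)\), and the worst case sits at an endpoint.

```python
def reflected_factor(c, alpha):
    # For the scalar operator A = c (the gradient of (c/2)*x^2),
    # 2*J_{alpha A} - I multiplies by 2/(1 + alpha*c) - 1 = (1 - alpha*c)/(1 + alpha*c).
    return (1 - alpha * c) / (1 + alpha * c)

def dr_rate(mu, L, alpha):
    # Assumed tight Lipschitz constant of 2*J_{alpha A} - I over A in dF_{mu,L}.
    return max((1 - alpha * mu) / (1 + alpha * mu), (alpha * L - 1) / (alpha * L + 1))

mu, L = 1.0, 10.0
for alpha in [0.05, 1 / (mu * L) ** 0.5, 2.0]:
    R = dr_rate(mu, L, alpha)
    # Quadratics (c/2)*x^2 with c in [mu, L] are mu-strongly convex and L-smooth;
    # |(1 - t)/(1 + t)| is maximized at an endpoint of t = alpha*c.
    worst = max(abs(reflected_factor(c, alpha)) for c in [mu, 2.0, 5.0, L])
    assert worst <= R + 1e-12
    assert abs(max(abs(reflected_factor(mu, alpha)), abs(reflected_factor(L, alpha))) - R) < 1e-12
```

Note the two endpoint factors coincide at \(\alpha =1/\sqrt{\mu L}\), the usual optimal step-size choice.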
Sum of operators
Given \(z,w\in \mathbb {C}\), define the line segment between z and w as
We say an SRG-full class \({\mathcal {A}}\) satisfies the chord property if \(z\in {\mathcal {G}}({\mathcal {A}})\backslash \{\infty \}\) implies \([z,{\bar{z}}]\subseteq {\mathcal {G}}({\mathcal {A}})\). See Fig. 6.
Theorem 6
Let \({\mathcal {A}}\) and \({\mathcal {B}}\) be SRG-full classes such that \(\infty \notin {\mathcal {G}}({\mathcal {A}})\) and \(\infty \notin {\mathcal {G}}({\mathcal {B}})\). Then
If \({\mathcal {A}}\) or \({\mathcal {B}}\) furthermore satisfies the chord property, then
Although we do not pursue this, one can generalize Theorem 6 to allow \(\infty \) by excluding the following exception: if \(\emptyset ={\mathcal {G}}({\mathcal {A}})\) and \(\infty \in {\mathcal {G}}({\mathcal {B}})\), then \(\{\infty \}={\mathcal {G}}({\mathcal {A}}+{\mathcal {B}})\).
Proof
We first show \({\mathcal {G}}({\mathcal {A}}+{\mathcal {B}})\supseteq {\mathcal {G}}({\mathcal {A}})+{\mathcal {G}}({\mathcal {B}})\). Assume \({\mathcal {G}}({\mathcal {A}})\ne \emptyset \) and \({\mathcal {G}}({\mathcal {B}})\ne \emptyset \) as otherwise there is nothing to show. Let \(z\in {\mathcal {G}}({\mathcal {A}})\) and \(w\in {\mathcal {G}}({\mathcal {B}})\) and let \(A_z\) and \(A_w\) be their corresponding operators as defined in Lemma 1. Then it is straightforward to see that \(A_z+A_w\) corresponds to complex multiplication with respect to \((z+w)\), and \(z+w\in {\mathcal {G}}(A_z+A_w)\subseteq {\mathcal {G}}({\mathcal {A}}+{\mathcal {B}})\).
Next, we show \({\mathcal {G}}({\mathcal {A}}+{\mathcal {B}})\subseteq {\mathcal {G}}({\mathcal {A}})+{\mathcal {G}}({\mathcal {B}})\). Consider the case \({\mathcal {G}}({\mathcal {A}})\ne \emptyset \) and \({\mathcal {G}}({\mathcal {B}})\ne \emptyset \). Without loss of generality, assume it is \({\mathcal {A}}\) that satisfies the chord property. Consider \(A+B\in {\mathcal {A}}+{\mathcal {B}}\) such that \(A\in {\mathcal {A}}\) and \(B\in {\mathcal {B}}\). Consider \((x,u_A+u_B),(y,v_A+v_B)\in A+B\) such that \(x\ne y\), \((x,u_A),(y,v_A)\in A\), and \((x,u_B),(y,v_B)\in B\). Define
(Note that \({\text {Im}}z_A,{\text {Im}}z_B,{\text {Im}}z\ge 0\).) Since
we have \({\text {Re}}z = {\text {Re}}z_A + {\text {Re}}z_B\). Using (1) and the triangle inequality, we have
and using the reverse triangle inequality, we have \({\text {Im}}z\ge {\text {Im}}z_A+{\text {Im}}z_B\). Together, we conclude
and
This shows
where the equality follows from the chord property.
Now, consider the case \({\mathcal {G}}({\mathcal {A}})= \emptyset \) or \({\mathcal {G}}({\mathcal {B}})= \emptyset \) (or both). (We also discuss this degenerate case in Sect. A.3). Assume \({\mathcal {G}}({\mathcal {A}})= \emptyset \) without loss of generality and let \(A\in {\mathcal {A}}\) and \(B\in {\mathcal {B}}\). Then \({\mathrm {dom}(A)}\) is empty or a singleton, and if \(\{x\}={\mathrm {dom}(A)}\) then Ax is a singleton. Therefore \({\mathrm {dom}(A+B)}\subseteq {\mathrm {dom}(A)}\) is empty or a singleton, and if \(\{x\}={\mathrm {dom}(A)}\) then \((A+B)x\) is empty or a singleton since B is single-valued. Therefore, \({\mathcal {G}}(A+B)=\emptyset \) and we conclude \({\mathcal {G}}({\mathcal {A}}+{\mathcal {B}})=\emptyset \). \(\square \)
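The first containment can be visualized with Lemma 1's matrices (an illustration, not part of the proof): for rotation-scaling matrices, operator addition is exactly complex addition of the corresponding SRG points.

```python
def rot_scale(x, y):
    # 2x2 matrix acting as complex multiplication by x + i*y (Lemma 1's A_z);
    # its SRG is the conjugate pair {x + i*y, x - i*y}.
    return [[x, -y], [y, x]]

def mat_add(A, B):
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

def srg_pair(M):
    # Read off the SRG point (with nonnegative imaginary part) of a
    # rotation-scaling matrix.
    return (M[0][0], abs(M[1][0]))

z, w = (1.0, 2.0), (0.5, -0.25)
S = mat_add(rot_scale(*z), rot_scale(*w))
# A_z + A_w is again complex multiplication, now by z + w, so
# G(A_z + A_w) = {z + w, conj(z + w)}, matching G(A) + G(B) in Theorem 6.
assert srg_pair(S) == (1.5, 1.75)
```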
Composition of operators
Given \(z\in \mathbb {C}\), define the right-hand arc between z and \({\bar{z}}\) as
and the left-hand arc as
We say an SRG-full class \({\mathcal {A}}\) respectively satisfies the left-arc property and right-arc property if \(z\in {\mathcal {G}}({\mathcal {A}})\backslash \{\infty \}\) implies \(\mathrm {Arc}^-(z,{\bar{z}})\subseteq {\mathcal {G}}({\mathcal {A}})\) and \(\mathrm {Arc}^+(z,{\bar{z}})\subseteq {\mathcal {G}}({\mathcal {A}})\), respectively. We say \({\mathcal {A}}\) satisfies an arc property if the left- or right-arc property is satisfied. See Fig. 7.
Theorem 7
Let \({\mathcal {A}}\) and \({\mathcal {B}}\) be SRGfull classes such that \(\infty \notin {\mathcal {G}}({\mathcal {A}})\), \(\emptyset \ne {\mathcal {G}}({\mathcal {A}})\), \(\infty \notin {\mathcal {G}}({\mathcal {B}})\), and \(\emptyset \ne {\mathcal {G}}({\mathcal {B}})\). Then
If \({\mathcal {A}}\) or \({\mathcal {B}}\) furthermore satisfies a left or right arc property, then
Although we do not pursue this, one can generalize Theorem 7 to allow \(\emptyset \) and \(\infty \) by excluding the following exceptions: if \(\emptyset ={\mathcal {G}}({\mathcal {A}})\) and \(\infty \in {\mathcal {G}}({\mathcal {B}})\), then \(\{\infty \}={\mathcal {G}}({\mathcal {A}}{\mathcal {B}})\); if \(0\in {\mathcal {G}}({\mathcal {A}})\) and \(\infty \in {\mathcal {G}}({\mathcal {B}})\), then \(\infty \in {\mathcal {G}}({\mathcal {A}}{\mathcal {B}})\); if \(\emptyset = {\mathcal {G}}({\mathcal {A}})\) and \(0\in {\mathcal {G}}({\mathcal {B}})\), then \(\{0\}={\mathcal {G}}({\mathcal {A}}{\mathcal {B}})\) and \(\emptyset ={\mathcal {G}}({\mathcal {B}}{\mathcal {A}})\).
Proof
We first show \({\mathcal {G}}({\mathcal {A}}{\mathcal {B}})\supseteq {\mathcal {G}}({\mathcal {A}}){\mathcal {G}}({\mathcal {B}})\). Assume \({\mathcal {G}}({\mathcal {A}})\ne \emptyset \) and \({\mathcal {G}}({\mathcal {B}})\ne \emptyset \) as otherwise there is nothing to show. Let \(z\in {\mathcal {G}}({\mathcal {A}})\) and \(w\in {\mathcal {G}}({\mathcal {B}})\) and let \(A_z\) and \(A_w\) be their corresponding operators as defined in Lemma 1. Then it is straightforward to see that \(A_zA_w\) corresponds to complex multiplication with respect to zw, and \(zw\in {\mathcal {G}}(A_zA_w)\subseteq {\mathcal {G}}({\mathcal {A}}{\mathcal {B}})\).
Next, we show \({\mathcal {G}}({\mathcal {A}}{\mathcal {B}})\subseteq {\mathcal {G}}({\mathcal {A}}){\mathcal {G}}({\mathcal {B}})\). Let \(A\in {\mathcal {A}}\) and \(B\in {\mathcal {B}}\). Consider \((u,s),(v,t)\in A\) and \((x,u),(y,v)\in B\), where \(x\ne y\). This implies \((x,s),(y,t)\in AB\). Define
Consider the case \(u=v\). Then \(0\in {\mathcal {G}}({\mathcal {B}})\). Moreover, \(s=t\), since A is single-valued (by the assumption \(\infty \notin {\mathcal {G}}({\mathcal {A}})\)), and \(z=0\). Therefore, \(z=0\in {\mathcal {G}}({\mathcal {A}}){\mathcal {G}}({\mathcal {B}})\).
Next, consider the case \(u\ne v\). Define
where \(\varphi _A=\angle (s-t,u-v)\) and \(\varphi _B= \angle (u-v,x-y)\). Consider the case where \({\mathcal {A}}\) satisfies the right-arc property. Using the spherical triangle inequality (further discussed in the appendix) we see that either \(\varphi _A\ge \varphi _B\) and
or \(\varphi _A< \varphi _B\) and
This gives us
That \({\bar{z}}\in {\mathcal {G}}({\mathcal {A}}){\mathcal {G}}({\mathcal {B}})\) follows from the same argument. That \(z,{\bar{z}}\in {\mathcal {G}}({\mathcal {A}}){\mathcal {G}}({\mathcal {B}})\) when instead \({\mathcal {B}}\) satisfies the right-arc property follows from the same argument.
Putting everything together, we conclude \({\mathcal {G}}({\mathcal {A}}{\mathcal {B}})= {\mathcal {G}}({\mathcal {A}}){\mathcal {G}}({\mathcal {B}})\) when \({\mathcal {A}}\) or \({\mathcal {B}}\) satisfies the right-arc property. When \({\mathcal {A}}\) satisfies the left-arc property, \(-{\mathcal {A}}\) satisfies the right-arc property. So
by Theorem 4, and we conclude \({\mathcal {G}}({\mathcal {A}}{\mathcal {B}})={\mathcal {G}}({\mathcal {A}}){\mathcal {G}}({\mathcal {B}})\). When \({\mathcal {B}}\) satisfies the left-arc property, \({\mathcal {B}}\circ (-I)\) satisfies the right-arc property. So
by Theorem 4, and we conclude \({\mathcal {G}}({\mathcal {A}}{\mathcal {B}})={\mathcal {G}}({\mathcal {A}}){\mathcal {G}}({\mathcal {B}})\). \(\square \)
We cannot fully drop the arc property from the second part of Theorem 7. Consider the SRG-full operator class \({\mathcal {A}}\) represented by \(h(a,b,c)=|a-b|+|c|\), which has \({\mathcal {G}}({\mathcal {A}})=\{\pm i\}\). Linear operators on \(\mathbb {R}^3\) representing 90-degree rotations are in \({\mathcal {A}}\). With this, one can show the strict containment \({\mathcal {G}}({\mathcal {A}}{\mathcal {A}})=\{z\in \mathbb {C}\mid |z|=1\}\supset {\mathcal {G}}({\mathcal {A}}){\mathcal {G}}({\mathcal {A}})\).
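This phenomenon is easy to reproduce numerically (our own illustration). Below we use complex structures on \(\mathbb {R}^4\), i.e., orthogonal skew matrices with \(J^2=-I\): every SRG point of such a \(J\) is \(\pm i\), so \(J\) belongs to any SRG-full class with SRG \(\{\pm i\}\), while the composition of two of them can have SRG points on the unit circle away from \(\{\pm 1\}={\mathcal {G}}({\mathcal {A}}){\mathcal {G}}({\mathcal {A}})\).

```python
import math

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def srg_point(M, x, y):
    # SRG point (nonnegative imaginary part) of the linear map M for inputs x, y.
    d = [a - b for a, b in zip(x, y)]
    u = [a - b for a, b in zip(matvec(M, x), matvec(M, y))]
    nd = math.sqrt(sum(t * t for t in d)); nu = math.sqrt(sum(t * t for t in u))
    c = sum(a * b for a, b in zip(u, d)) / (nu * nd)
    return complex((nu / nd) * c, (nu / nd) * math.sqrt(max(0.0, 1 - c * c)))

# Two complex structures on R^4 (orthogonal, skew, J^2 = -I): every sampled SRG
# point is i (up to conjugation), so both lie in a class A with G(A) = {+i, -i}.
J1 = [[0, -1, 0, 0], [1, 0, 0, 0], [0, 0, 0, -1], [0, 0, 1, 0]]
J2 = [[0, 0, -1, 0], [0, 0, 0, -1], [1, 0, 0, 0], [0, 1, 0, 0]]
for J in (J1, J2):
    assert abs(srg_point(J, [1.0, 2.0, 0.5, -1.0], [0.0] * 4) - 1j) < 1e-12

# Their composition is orthogonal but no longer skew: it produces SRG points on
# the unit circle away from {+1, -1} = G(A)G(A), so the containment is strict.
z = srg_point(matmul(J1, J2), [1.0, 0.0, 0.0, 0.0], [0.0] * 4)
assert abs(abs(z) - 1.0) < 1e-12 and min(abs(z - 1), abs(z + 1)) > 0.5
```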
As a consequence of Theorem 7, when an arc property is satisfied, the SRGs of operator classes commute under composition even though individual operators, in general, do not commute. Several results in operator theory involving two operators exhibit previously unexplained symmetry. The Ogura–Yamada–Combettes averagedness factor [17, 57], the contraction factor of Giselsson [28], the contraction factor of Moursi and Vandenberghe [51], and the contraction factor of Ryu, Taylor, Bergeling, and Giselsson [64] are all symmetric in the assumptions on the two operators. Theorem 7 shows that this symmetry is not a coincidence.
Convergence analysis: alternating projections
Consider the convex feasibility problem
where \(C\subseteq {\mathcal {H}}\) and \(D\subseteq {\mathcal {H}}\) are nonempty closed convex sets and \(C\cap D\ne \emptyset \). Consider the alternating projections method [55, Theorem 13.7]
where \(P_C\) and \(P_D\) are projections onto C and D and \(x^0\in {\mathcal {H}}\) is a starting point.
Fact 15
The iterates of (AP) converge in that \(x^k\rightarrow x^\star \) weakly for some \(x^\star \in C\cap D\).
Proof
By [5, Proposition 4.16], \(P_C\) and \(P_D\) are 1/2-averaged. By Fact 16, which we state and prove below, \(P_CP_D\) is 2/3-averaged, and the iteration converges by the Krasnosel’skiĭ–Mann theorem. \(\square \)
Fact 16
(Proposition 4.42 [5]) Let \({\mathcal {N}}_{1/2}\) be the class of firmly nonexpansive operators. Then
(the containment is strict). Furthermore,
In Fact 16, the precise characterization of \({\mathcal {G}}({\mathcal {N}}_{1/2}{\mathcal {N}}_{1/2})\) is new, but \({\mathcal {N}}_{1/2}{\mathcal {N}}_{1/2}\subset {\mathcal {N}}_{2/3}\) is known.
Proof outline We quickly outline the geometric insight while deferring the full proof with precise geometric arguments to Sect. C of the appendix.
Define
and
In geometric terms, this construction takes a point on the circle C, draws the disk whose diameter is the line segment between this point and the origin, and takes the union of such disks. \(S={\mathcal {G}}({\mathcal {N}}_{1/2}){\mathcal {G}}({\mathcal {N}}_{1/2})\) follows from Theorem 7.
To show \(S=\left\{ re^{i\varphi }\mid 0\le r\le \cos ^2(\varphi /2)\right\} \), we analyze S in the inverted space. Write \({\mathcal {I}}:\overline{\mathbb {C}}\rightarrow \overline{\mathbb {C}}\) for the mapping \({\mathcal {I}}(z)={\bar{z}}^{-1}\).
The union of the half-spaces \({\mathcal {I}}(S)=\bigcup _{0\le \varphi _1\le 2\pi }{\mathcal {I}}(S_{\varphi _1})\) forms a region bounded by a parabola.
Find the largest circle tangent to the parabola at point 1 and invert back.
The largest circle to the left of the parabola is inverted to the smallest circle (i.e., tight averagedness circle) containing the SRG. The known formula \(r(\varphi )\le \cos ^2(\varphi /2)\) describes the parabola under the inversion mapping. \(\square \)
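The characterization can be probed numerically (an illustration on a test instance of our own choosing): projections onto two lines through the origin are firmly nonexpansive, and the sampled SRG points of their composition stay inside \(\{re^{i\varphi }\mid 0\le r\le \cos ^2(\varphi /2)\}\).

```python
import math, random

def bound(phi):
    # Fact 16: the SRG of a composition of two firmly nonexpansive operators
    # lies in { r*e^{i*phi} : 0 <= r <= cos^2(phi/2) }.
    return math.cos(phi / 2) ** 2

def srg_point(M, d):
    # SRG point of the linear map M for the difference vector d (taking y = 0);
    # returned as (r, phi) with r the magnitude and phi the angle.
    u = (M[0][0]*d[0] + M[0][1]*d[1], M[1][0]*d[0] + M[1][1]*d[1])
    nd, nu = math.hypot(*d), math.hypot(*u)
    if nu == 0.0:
        return 0.0, 0.0
    phi = math.acos(max(-1.0, min(1.0, (u[0]*d[0] + u[1]*d[1]) / (nu * nd))))
    return nu / nd, phi

random.seed(0)
for _ in range(100):
    theta = random.uniform(0.01, math.pi / 2)   # angle between the two lines
    # Projections onto the x-axis and onto the line at angle theta are firmly
    # nonexpansive; their composition P1*P2 is the linear map below.
    c, s = math.cos(theta), math.sin(theta)
    M = [[c * c, s * c], [0.0, 0.0]]
    d = (random.uniform(-1, 1), random.uniform(-1, 1))
    r, phi = srg_point(M, d)
    assert r <= bound(phi) + 1e-9
```

For two lines at 45 degrees and \(d\) orthogonal to the first line, the bound is attained with equality (\(r=1/2\) at \(\varphi =\pi /2\)).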
Tightness and constructing lower bounds
An advantage of geometric proofs is that tightness is often immediate. In the proof of Fact 12, for example, it is clear that finding a smaller circle containing the SRG is not possible. Consequently, the rate of Fact 11 cannot be improved.
However, although tightness is proved with the geometric arguments, sometimes one may wish to construct an explicit counterexample achieving the tight rate. This can be done by picking the extreme point on the complex plane, finding a corresponding \(2\times 2\) matrix with Lemma 1, and reverse engineering the proof.
In the setup of Fact 12,
the extreme point z corresponds to the complex number
Lemma 1 provides a corresponding operator \(A_z:\mathbb {R}^2\rightarrow \mathbb {R}^2\)
In the proof, the depicted geometry was obtained through the transformations \(A\mapsto I+\alpha A\), \(A\mapsto A^{-1}\), and \(A\mapsto 2A-I\). We revert the transformations by applying \(A\mapsto \frac{1}{2}I+\frac{1}{2}A\), \(A\mapsto A^{-1}\), and \(A\mapsto \frac{1}{\alpha }(A-I)\) and define \(A:\mathbb {R}^2\rightarrow \mathbb {R}^2\) as
(We do not show the individual entries as they are very complicated.) Finally, if \(B=0\), then the fixed-point iteration
converges at the exact rate given by Fact 11.
If \(\mathcal {A}\) is SRG-full and \(z\in \mathcal {G}( \mathcal {A})\), then one can construct an operator A on \(\mathbb {R}^2\) such that \(\{z,{\overline{z}}\}=\mathcal {G}(A)\), so explicit counterexamples providing the lower bounds can be constructed in \(\mathbb {R}^2\). When an operator class is not SRG-full, counterexamples still exist, but they may not be in \(\mathbb {R}^2\).
Insufficiency of metric subregularity for linear convergence
Recently, there has been much interest in analyzing optimization methods under assumptions weaker than strong convexity or strong monotonicity. One approach is to assume metric subregularity in place of strong monotonicity and establish linear convergence.
In this section, we show that it is not always possible to replace strong monotonicity with metric subregularity. In particular, we show impossibility results proving the insufficiency of metric subregularity in establishing linear convergence for certain setups where strong monotonicity is sufficient.
Inverse Lipschitz continuity and metric subregularity
Let
be the class of inverse Lipschitz continuous operators with parameter \(\gamma \in (0,\infty )\), which has the SRG
It is clear that inverse Lipschitz continuity is weaker than strong monotonicity in the sense that \(A\in {\mathcal {M}}_{1/\gamma }\) implies \(A\in {\mathcal {L}}^{-1}_{\gamma }\).
An operator \(A:\mathcal {H}\rightrightarrows \mathcal {H}\) is \(\gamma \)metrically subregular at \(x_0\) for \(y_0\) if \(y_0\in Ax_0\) and there exists a neighborhood V of \(x_0\) such that
Although not necessarily obvious at first sight, metric subregularity is weaker than inverse Lipschitz continuity, i.e., \(A\in {\mathcal {L}}^{-1}_{\gamma }\) implies A is metrically subregular at x for y with parameter \(\gamma \), for any \((x,y)\in A\).
Metric subregularity of A is equivalent to “calmness” of \(A^{-1}\) [20], and calmness is also known as “upper Lipschitz continuity” [61]. For subdifferential operators of convex functions, metric subregularity is equivalent to the “error bound condition” [23]. See [21] for an in-depth treatment of this subject.
Metric subregularity has been used in place of strong monotonicity to establish linear convergence for a wide range of setups. Leventhal [41, Theorem 3.1] used metric subregularity for the proximal point method; Bauschke, Noll, and Phan [6, Lemma 3.8] and Liang, Fadili, and Peyré [42, Theorem 3] for the Krasnosel’skiĭ–Mann iteration; Latafat and Patrinos [40, Theorem 3.3] for their splitting method AFBA; Ye et al. [70] for the proximal gradient method, the proximal alternating linearized minimization algorithm, and the randomized block coordinate proximal gradient method; and Yuan, Zeng, and Zhang [71] for ADMM, DRS, and PDHG. See [11, 23, 36, 53, 72] for a systematic study of this subject. Although most recent work concerns sufficiency of metric subregularity or related assumptions in establishing linear convergence, Zhang [72] studied the necessary and sufficient conditions.
Impossibility proofs
Douglas–Rachford splitting (DRS) is known to be a strict contraction under the combined assumption of Lipschitz continuity and strong monotonicity: [44, Proposition 4], [30, Theorem 4.1], [19, Table 1 under \(A=B=I\)], [18, Theorems 5–7], [28, Theorem 6.3], and [64, Theorem 4]. Is it possible to establish linear convergence with Lipschitz continuity and metric subregularity or a variation of metric subregularity? The answer is no in the sense of Corollaries 1 and 2.
Define the DRS operator with respect to operators A and B with parameters \(\alpha \) and \(\theta \) as
and the class of DRS operators as
Define \(T(B,A,\alpha ,\theta )\) and \(T({\mathcal {B}},{\mathcal {A}},\alpha ,\theta )\) analogously.
Theorem 8
Let \(0<1/\gamma \le L<\infty \) and \(\alpha \in (0,\infty )\). Let \({\mathcal {A}}={\mathcal {M}}\cap {\mathcal {L}}_L\cap ({\mathcal {L}}_\gamma )^{1}\) and \({\mathcal {B}}={\mathcal {M}}\). Then for any \(\theta \ne 0\)
for any \(\varepsilon >0\). The same conclusion holds for \(D_{\alpha ,\theta }({\mathcal {B}},{\mathcal {A}})\).
Proof
We have the geometry
Note the line segment \({\overline{AB}}\) is mapped to the (minor) arc . Using Theorem 7, we have
because is on the unit circle, and since \({\mathcal {G}}(2J_{\alpha {\mathcal {B}}}-I)=\{z\mid |z|\le 1\}\). So we have \(1\in {\mathcal {G}}(D_{\alpha ,\theta }({\mathcal {A}},{\mathcal {B}}))\), but \(1\notin {\mathcal {G}}({\mathcal {L}}_{1-\varepsilon })\) for any \(\varepsilon >0\). Therefore, \({\mathcal {G}}(D_{\alpha ,\theta }({\mathcal {A}},{\mathcal {B}}))\nsubseteq {\mathcal {G}}({\mathcal {L}}_{1-\varepsilon })\) and, with Theorem 2, we conclude \(D_{\alpha ,\theta }({\mathcal {A}},{\mathcal {B}})\nsubseteq {\mathcal {L}}_{1-\varepsilon }\). The result for the operator \(D_{\alpha ,\theta }({\mathcal {B}},{\mathcal {A}})\) follows from similar reasoning. \(\square \)
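A concrete instance behind this impossibility (our own choice of operators, for illustration, writing the DRS operator in the standard averaged form \(D_{\alpha ,\theta }=(1-\theta )I+\theta (2J_{\alpha A}-I)(2J_{\alpha B}-I)\)): take \(A=tJ\) and \(B=-tJ\), with \(J\) the 90-degree rotation of \(\mathbb {R}^2\) and \(1/\gamma \le t\le L\). Both are monotone, \(A\) is \(L\)-Lipschitz and \(\gamma \)-inverse Lipschitz, yet the DRS operator reduces to the identity, so no contraction factor below 1 is possible.

```python
def drs_factor(zA, zB, alpha, theta):
    # For A, B acting as complex multiplication by zA, zB, the DRS operator
    # (1-theta)*I + theta*(2J_{alpha A} - I)(2J_{alpha B} - I) is itself a
    # complex multiplication, by the number returned here.
    refl = lambda z: (1 - alpha * z) / (1 + alpha * z)  # 2*J_{alpha .} - I
    return (1 - theta) + theta * refl(zA) * refl(zB)

gamma, L = 1.0, 5.0
t = 2.0                    # any t with 1/gamma <= t <= L
zA, zB = 1j * t, -1j * t   # A = t*J: monotone, L-Lipschitz, gamma-inverse Lipschitz;
                           # B = -t*J: monotone
for alpha in (0.1, 0.3, 1.0):
    for theta in (0.5, 1.0, -0.7):
        # The two reflected resolvents are inverse rotations, so D is the identity:
        assert abs(drs_factor(zA, zB, alpha, theta) - 1.0) < 1e-12
```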
Corollary 1
Let \(0<1/\gamma \le L<\infty \) and \(\alpha \in (0,\infty )\). Let \(B\in {\mathcal {M}}\) and let \(A\in {\mathcal {M}}\cap {\mathcal {L}}_L\) satisfy a condition weaker than or equal to \(\gamma \)-inverse Lipschitz continuity, such as \(\gamma \)-metric subregularity. It is not possible to establish a strict contraction of the DRS operators \(D_{\alpha ,\theta }(A,B)\) or \(D_{\alpha ,\theta }(B,A)\) for any \(\alpha >0\) and \(\theta \ne 0\) without further assumptions.
Theorem 9
Let \(\gamma ,L,\alpha \in (0,\infty )\). Let \({\mathcal {A}}={\mathcal {M}}\cap ({\mathcal {L}}_\gamma )^{1}\) and \({\mathcal {B}}={\mathcal {M}}\cap {\mathcal {L}}_L\). If \(1/\gamma \le L\) and \(\theta \ne 0\), then
for any \(\varepsilon >0\). If \(1/\gamma > L\) and \(\theta \in (0,1)\), then \(D_{\alpha ,\theta }({\mathcal {A}},{\mathcal {B}})\subseteq {\mathcal {L}}_R\) for
This result is tight in the sense that \(D_{\alpha ,\theta }({\mathcal {A}},{\mathcal {B}})\nsubseteq {\mathcal {L}}_R\) for any smaller value of R. The same conclusion holds for \(D_{\alpha ,\theta }({\mathcal {B}},{\mathcal {A}})\).
Proof
Consider the case \(\alpha /\gamma <1\) and \(\alpha L<1\). We have
Let \(\underline{S_A}\) and \(\overline{S_A}\) be the regions bounded by the arcs depicted above. These sets provide an inner and an outer bound of \({\mathcal {G}}(2J_{\alpha {\mathcal {A}}}-I)\) in the sense that
Note that \(J_{\alpha {\mathcal {A}}}\) satisfies the left-arc property. By the law of cosines, we have
Likewise, we have
Let \(\underline{S_B}\) and \(\overline{S_B}\) be the circular sectors bounded by the arcs depicted above. Again, we have
and
Using the arccosine sum identity [2, p. 80, 4.4.33], we get
When \(1/\gamma \le L\), we have \(\varphi _A\le \varphi _B\). In this case,
Therefore
but \(1\notin {\mathcal {G}}({\mathcal {L}}_{1-\varepsilon })\) for any \(\varepsilon >0\). Therefore, we conclude \({\mathcal {G}}(D_{\alpha ,\theta }({\mathcal {A}},{\mathcal {B}}))\nsubseteq {\mathcal {G}}({\mathcal {L}}_{1-\varepsilon })\).
When \(1/\gamma > L\), we have \(\varphi _A>\varphi _B\). In this case,
Using the outer bounds \(\overline{S_A}\) and \(\overline{S_B}\) we establish correctness. Using the inner bounds \(\underline{S_A}\) and \(\underline{S_B}\) we establish tightness.
With the Pythagorean theorem, we can verify that the containment holds for R and fails for smaller R. Since \({\mathcal {L}}_R\) is SRG-full by Theorem 2, the containment of the SRG in \(\overline{\mathbb {C}}\) is equivalent to the containment of the class.
The result for the cases \(\alpha /\gamma \ge 1\) or \(\alpha L\ge 1\) and for the operator \(D_{\alpha ,\theta }({\mathcal {B}},{\mathcal {A}})\) follows from similar reasoning. \(\square \)
Corollary 2
Let \(0<1/\gamma \le L<\infty \) and \(\alpha \in (0,\infty )\). Let \(B\in {\mathcal {M}}\cap {\mathcal {L}}_L\) and let \(A\in {\mathcal {M}}\) satisfy a condition weaker than or equal to \(\gamma \)-inverse Lipschitz continuity, such as \(\gamma \)-metric subregularity. It is not possible to establish a strict contraction of the DRS operators \(T(A,B,\alpha ,\theta )\) or \(T(B,A,\alpha ,\theta )\) for any \(\alpha >0\) and \(\theta \in \mathbb {R}\) without further assumptions.
Conclusion
In this work, we presented the scaled relative graph, a tool that maps the action of an operator to the extended complex plane. This machinery enables us to analyze nonexpansive and monotone operators with geometric arguments. The geometric ideas should complement the classical analytical approaches and bring clarity.
Extending this geometric framework to more general setups and spaces is an interesting future direction. Some fixedpoint iterations, such as the power iteration of nonsymmetric matrices [49] or the Bellman iteration [10], are analyzed most effectively through notions other than the norm induced by the inner product (the Euclidean norm for finitedimensional spaces). Whether it is possible to gain insight through geometric arguments in such setups would be worthwhile to investigate.
Notes
The winding number of a closed curve in the plane around a given point is an integer representing the total number of times that curve travels counterclockwise around the point. A definition based on complex analysis and the Cauchy residue theorem can be found in Section 4.1 of [2].
References
Ablowitz, M.J., Fokas, A.S.: Complex Variables: Introduction and Applications, 2nd edn. Cambridge University Press, Cambridge (2003)
Abramowitz, M., Stegun, I.A.: Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Dover, New York (1964)
Banach, S.: Sur les opérations dans les ensembles abstraits et leur application aux équations intégrales. Fundam. Math. 3(1), 133–181 (1922)
Banjac, G., Goulart, P.J.: Tight global linear convergence rate bounds for operator splitting methods. IEEE Trans. Autom. Control 63(12), 4126–4139 (2018)
Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2nd edn. Springer, Berlin (2017)
Bauschke, H.H., Noll, D., Phan, H.M.: Linear and strong convergence of algorithms involving averaged nonexpansive operators. J. Math. Anal. Appl. 421(1), 1–20 (2015)
Bauschke, H.H., Wang, X.: Firmly nonexpansive and Kirszbraun–Valentine extensions: a constructive approach via monotone operator theory. In: Nonlinear Analysis and Optimization I: Nonlinear Analysis, pp. 55–64. American Mathematical Society (2010)
Bauschke, H.H., Wang, X., Yao, L.: General resolvents for monotone operators: characterization and extension. In: Biomedical Mathematics: Promising Directions in Imaging, Therapy Planning, and Inverse Problems, pp. 57–74. Medical Physics Publishing (2010)
Beck, A.: First-Order Methods in Optimization. Society for Industrial and Applied Mathematics, Philadelphia (2017)
Bellman, R.: On the theory of dynamic programming. Proc. Natl. Acad. Sci. 38(8), 716–719 (1952)
Bolte, J., Nguyen, T.P., Peypouquet, J., Suter, B.W.: From error bounds to the complexity of first-order descent methods for convex functions. Math. Program. 165(2), 471–507 (2017)
Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
Brezis, H., Lions, P.L.: Produits infinis de resolvantes. Israel J. Math. 29(4), 329–345 (1978)
Briceño-Arias, L.M., Davis, D.: Forward–backward-half forward algorithm for solving monotone inclusions. SIAM J. Optim. 28(4), 2839–2871 (2018)
Bruck, R.E.: On the weak convergence of an ergodic iteration for the solution of variational inequalities for monotone operators in Hilbert space. J. Math. Anal. Appl. 61(1), 159–164 (1977)
Cauchy, M.A.: Méthode générale pour la résolution des systèmes d'équations simultanées. Comptes Rendus Hebdomadaires des Séances de l'Académie des Sciences 25, 536–538 (1847)
Combettes, P.L., Yamada, I.: Compositions and convex combinations of averaged nonexpansive operators. J. Math. Anal. Appl. 425(1), 55–70 (2015)
Davis, D., Yin, W.: Faster convergence rates of relaxed Peaceman–Rachford and ADMM under regularity assumptions. Math. Oper. Res. 42(3), 783–805 (2017)
Deng, W., Yin, W.: On the global and linear convergence of the generalized alternating direction method of multipliers. J. Sci. Comput. 66(3), 889–916 (2015)
Dontchev, A.L., Rockafellar, R.T.: Regularity and conditioning of solution mappings in variational analysis. Set-Valued Anal. 12(1), 79–109 (2004)
Dontchev, A.L., Rockafellar, R.T.: Implicit Functions and Solution Mappings: A View from Variational Analysis, 2nd edn. Springer, New York (2014)
Douglas, J., Rachford, H.H.: On the numerical solution of heat conduction problems in two and three space variables. Trans. Am. Math. Soc. 82, 421–439 (1956)
Drusvyatskiy, D., Lewis, A.S.: Error bounds, quadratic growth, and linear convergence of proximal methods. Math. Oper. Res. 43(3), 919–948 (2018)
Eckstein, J.: Splitting methods for monotone operators with applications to parallel optimization. Ph.D. thesis, MIT (1989)
Eckstein, J., Bertsekas, D.P.: On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 55(1–3), 293–318 (1992)
Gabay, D.: Applications of the method of multipliers to variational inequalities. In: Fortin, M., Glowinski, R. (eds.) Augmented Lagrangian Methods: Applications to the Numerical Solution of Boundary-Value Problems. North-Holland, Amsterdam (1983)
Giselsson, P.: Lunds Universitet, lecture notes: large-scale convex optimization (2015). http://www.control.lth.se/education/doctorateprogram/largescaleconvexoptimization/. Last visited on 1 Dec 2018
Giselsson, P.: Tight global linear convergence rate bounds for Douglas–Rachford splitting. J. Fixed Point Theory Appl. 19(4), 2241–2270 (2017)
Giselsson, P., Boyd, S.: Linear convergence and metric selection for Douglas–Rachford splitting and ADMM. IEEE Trans. Autom. Control 62(2), 532–544 (2017)
Han, D., Yuan, X.: Convergence analysis of the Peaceman–Rachford splitting method for nonsmooth convex optimization. Optimization Online (2012)
Hannah, R., Yin, W.: Scaled relative graph. UCLA CAM report (2016)
HiriartUrruty, J.B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms, vol. 2. Springer, Berlin (1993)
Horn, R.A., Johnson, C.R.: Topics in Matrix Analysis. Cambridge University Press, Cambridge (1991)
Huang, X., Ryu, E.K., Yin, W.: Scaled relative graph of normal matrices. arXiv preprint arXiv:2001.02061 (2020)
Huang, X., Ryu, E.K., Yin, W.: Tight coefficients of averaged operators via scaled relative graph. J. Math. Anal. Appl. 490(1), 124211 (2020)
Karimi, H., Nutini, J., Schmidt, M.: Linear convergence of gradient and proximalgradient methods under the Polyak–Łojasiewicz condition. In: Frasconi, P., Landwehr, N., Manco, G., Vreeken, J. (eds.) Machine Learning and Knowledge Discovery in Databases (KDD), pp. 795–811. Springer, Berlin (2016)
Kline, M.: Calculus: An Intuitive and Physical Approach, 2nd edn. Wiley, Hoboken (1977)
Korpelevich, G.M.: The extragradient method for finding saddle points and other problems. Ekon. Mat. Metod. 12, 747–756 (1976)
Krasnosel’skii, M.A.: Two remarks on the method of successive approximations. Uspekhi Mat. Nauk 10(1), 123–127 (1955)
Latafat, P., Patrinos, P.: Asymmetric forward–backward–adjoint splitting for solving monotone inclusions involving three operators. Comput. Optim. Appl. 68(1), 57–93 (2017)
Leventhal, D.: Metric subregularity and the proximal point method. J. Math. Anal. Appl. 360(2), 681–688 (2009)
Liang, J., Fadili, J., Peyré, G.: Convergence rates with inexact nonexpansive operators. Math. Program. 159(1), 403–434 (2016)
Lindelöf, E.: Sur l'application de la méthode des approximations successives aux équations différentielles ordinaires du premier ordre. Comptes Rendus Hebdomadaires des Séances de l'Académie des Sciences 118, 454–456 (1894)
Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16(6), 964–979 (1979)
Malitsky, Y., Tam, M.K.: A forward-backward splitting method for monotone inclusions without cocoercivity. SIAM J. Optim. 30(2), 1451–1472 (2020)
Mann, W.R.: Mean value methods in iteration. Proc. Am. Math. Soc. 4(3), 506–510 (1953)
Martinet, B.: Régularisation d’inéquations variationnelles par approximations successives. Revue Française d’Informatique et de Recherche Opérationnelle, Série Rouge 4(3), 154–158 (1970)
Martinet, B.: Determination approchée d'un point fixe d'une application pseudo-contractante. Comptes Rendus de l'Académie des Sciences, Série A 274, 163–165 (1972)
Mises, R.V., Pollaczek-Geiringer, H.: Praktische Verfahren der Gleichungsauflösung. Zeitschrift für Angewandte Mathematik und Mechanik 9(2), 152–164 (1929)
Morley, F., Morley, F.V.: Inversive Geometry. G. Bell and Sons, London (1933)
Moursi, W.M., Vandenberghe, L.: Douglas–Rachford splitting for a Lipschitz continuous and a strongly monotone operator. J. Optim. Theory Appl. 183, 179–198 (2019)
Murnaghan, F.D., Wintner, A.: A canonical form for real matrices under orthogonal transformations. Proc. Natl. Acad. Sci. 17(7), 417–420 (1931)
Necoara, I., Nesterov, Y., Glineur, F.: Linear convergence of first order methods for non-strongly convex optimization. Math. Program. 175(1), 69–107 (2018)
Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Springer, Berlin (2013)
von Neumann, J.: Functional Operators, Volume II. The Geometry of Orthogonal Spaces. Princeton University Press, Princeton (1950)
Newton, I.: De analysi per aequationes numero terminorum infinitas. The Royal Society, London (1669)
Ogura, N., Yamada, I.: Non-strictly convex minimization over the fixed point set of an asymptotically shrinking nonexpansive mapping. Numer. Funct. Anal. Optim. 23(1–2), 113–137 (2002)
Pedoe, D.: A Course of Geometry for Colleges and Universities. Cambridge University Press, Cambridge (1970)
Picard, E.: Mémoire sur la théorie des équations aux dérivées partielles et la méthode des approximations successives. Journal de Mathématiques Pures et Appliquées 4ème Série 6, 145–210 (1890)
Riley, K.F., Hobson, M.P., Bence, S.J.: Mathematical Methods for Physics and Engineering, 3rd edn. Cambridge University Press, Cambridge (2006)
Robinson, S.M.: Some continuity properties of polyhedral multifunctions. In: König, H., Korte, B., Ritter, K. (eds.) Mathematical Programming at Oberwolfach, pp. 206–214. Springer, Berlin (1981)
Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 14(5), 877–898 (1976)
Ryu, E.K., Boyd, S.: Primer on monotone operator methods. Appl. Comput. Math. 15, 3–43 (2016)
Ryu, E.K., Taylor, A.B., Bergeling, C., Giselsson, P.: Operator splitting performance estimation: tight contraction factors and optimal parameter selection. SIAM J. Optim. 30(3), 2251–2271 (2020)
Stewart, M.: Some General Theorems of Considerable Use in the Higher Parts of Mathematics. W. Sands, A. Murray, and J. Cochran, London (1746)
Taylor, A.B., Hendrickx, J.M., Glineur, F.: Smooth strongly convex interpolation and exact worst-case performance of first-order methods. Math. Program. 161(1), 307–345 (2017)
Trefethen, L., Embree, M.: Spectra and Pseudospectra: The Behavior of Nonnormal Matrices and Operators. Princeton University Press, Princeton (2005)
Tseng, P.: A modified forward-backward splitting method for maximal monotone mappings. SIAM J. Control Optim. 38(2), 431–446 (2000)
Wentworth, G., Smith, D.E.: Plane and Solid Geometry. Ginn and Company, London (1913)
Ye, J., Yuan, X., Zeng, S., Zhang, J.: Variational analysis perspective on linear convergence of some first order methods for nonsmooth convex optimization problems. Optimization (2018) (Online Preprint)
Yuan, X., Zeng, S., Zhang, J.: Discerning the linear convergence of ADMM for structured convex optimization through the lens of variational analysis. J. Mach. Learn. Res. 21(83), 1–75 (2020)
Zhang, H.: New analysis of linear convergence of gradienttype methods via unifying error bound conditions. Math. Program. 180(1), 371–416 (2019)
Acknowledgements
We thank Pontus Giselsson for the illuminating discussion on metric subregularity. We thank Minyong Lee and Yeoil Yoon for helpful discussions on inversive geometry and its use in high school mathematics competitions. We thank Xinmeng Huang for the aid in drawing Figure 5. We thank the Erwin Schrödinger Institute and the organizers of its workshop in February 2019, which provided us with fruitful discussions and helpful feedback that materially improved this paper.
This work was partially supported by AFOSR MURI FA95501810502, NSF Grant DMS1720237, ONR Grant N000141712162, the New Faculty Startup Fund from Seoul National University, the National Research Foundation of Korea (NRF) Grant funded by the Korean Government (MSIP) [No. 2020R1F1A1A01072877], and the National Research Foundation of Korea (NRF) Grant funded by the Korean Government (MSIP) [No. 2017R1A5A1015626].
Appendices
Further discussion
The role of maximality
A fixed-point iteration \(x^{k+1}=Tx^k\) becomes undefined if its iterates ever escape the domain of T. This is why we assume the monotone operators are maximal, as maximality ensures \(\mathrm {dom}(J_{\alpha A})=\mathcal {H}\). The results of this work are otherwise entirely independent of the notion of maximality.
In Sect. 2, we define \({\mathcal {M}}\) to contain all monotone operators (maximal or not). This choice is necessary to make \({\mathcal {M}}\) SRG-full. For the other classes \({\mathcal {L}}_{L}\), \({\mathcal {C}}_{\beta }\), \({\mathcal {M}}_{\mu }\), and \({\mathcal {N}}_\theta \), we make no restriction on the domain or maximality so that they can be SRG-full.
Minkowskitype set notation
Given \(\alpha \in \mathbb {R}\) and sets \(U,V\subseteq \mathcal {H}\), write \(\alpha U=\{\alpha u\mid u\in U\}\) and \(U+V=\{u+v\mid u\in U,\,v\in V\}\).
Notice that if either U or V is \(\emptyset \), then \(U+V=\emptyset \).
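To make the notation concrete, here is a minimal numeric sketch using finite point sets in \(\mathbb {R}^2\); the function names `scale` and `msum` are illustrative, not from the paper.

```python
# A minimal numeric sketch of the Minkowski-type notation above, using finite
# point sets in R^2 represented as Python sets of tuples. All names here are
# illustrative, not from the paper.

def scale(alpha, U):
    """alpha * U = {alpha * u : u in U}."""
    return {(alpha * x, alpha * y) for (x, y) in U}

def msum(U, V):
    """U + V = {u + v : u in U, v in V}; empty if either set is empty."""
    return {(ux + vx, uy + vy) for (ux, uy) in U for (vx, vy) in V}

U = {(1.0, 0.0), (0.0, 1.0)}
V = {(1.0, 1.0)}

assert msum(U, V) == {(2.0, 1.0), (1.0, 2.0)}
assert msum(U, set()) == set()   # the sum with the empty set is empty
assert scale(2.0, U) == {(2.0, 0.0), (0.0, 2.0)}
```

Because the Minkowski sum is built from all pairs, an empty factor makes the whole sum empty, matching the remark above.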
Given \(Z,W\subseteq \mathbb {C}\), write \(Z+W=\{z+w\mid z\in Z,\,w\in W\}\) and \(ZW=\{zw\mid z\in Z,\,w\in W\}\).
Given \(\alpha \in \mathbb {C}\), \(\alpha \ne 0\), and \(Z\subseteq \overline{\mathbb {C}}\), write \(\alpha Z=\{\alpha z\mid z\in Z\}\), with the convention \(\alpha \cdot \infty =\infty \).
Given a class of operators \({\mathcal {A}}\) and \(\alpha \ne 0\), write \(\alpha {\mathcal {A}}=\{\alpha A\mid A\in {\mathcal {A}}\}\) and \({\mathcal {A}}\alpha =\{A\alpha \mid A\in {\mathcal {A}}\}\).
Given classes of operators \({\mathcal {A}}\) and \({\mathcal {B}}\) and \(\alpha >0\), write
SRGfull classes
There is one degenerate case to keep in mind for the sake of rigor. The SRG-full class of operators \({\mathcal {A}}_\text {null}\) represented by \(h(a,b,c)=a+b+c\) has \({\mathcal {G}}({\mathcal {A}}_\text {null})=\emptyset \). However, the class \({\mathcal {A}}_\text {null}\) is not itself empty; it contains the operators whose graph has zero or one pair, i.e., \(A\in {\mathcal {A}}_\text {null}\) if and only if either (a) \(\mathrm {dom}(A)=\emptyset \) or (b) \(\mathrm {dom}(A)=\{x\}\) and \(Ax=\{y\}\) for some \(x,y\in \mathcal {H}\).
Theorem 3 does not apply when the operator classes are not SRG-full. For example, although
we have the strict containment
Invariant circle number
Let \({\mathcal {A}}\) be an SRG-full class such that \({\mathcal {G}}({\mathcal {A}})\ne \emptyset \) and \({\mathcal {G}}({\mathcal {A}})\ne \overline{\mathbb {C}}\). Define the circle number of \({\mathcal {A}}\) as
which is a positive integer or \(\infty \). For example, \({\mathcal {M}}\cap {\mathcal {L}}_1\) has circle number 2 since
In this section, we show that the circle number of an operator class is invariant under certain operations. This is analogous to how the genus and the winding number are topological invariants under homeomorphisms; a standard argument in topology shows that a torus cannot be continuously deformed into a sphere because the two surfaces have different numbers of holes, an invariant. The circle number serves as an analogous invariant for operator classes.
Theorem 10
The circle number of an SRG-full operator class is invariant under nonzero pre- and post-scalar multiplication, addition by identity, and inversion.
Proof
Let T be a one-to-one mapping from operators to operators and let \(T'\) be the corresponding one-to-one mapping from \(\overline{\mathbb {C}}\) to \(\overline{\mathbb {C}}\). In particular, consider the following four cases: first, \(T(A)=\alpha A\) and \(T'(z)=\alpha z\); second, \(T(A)=A\alpha \) and \(T'(z)=\alpha z\); third, \(T(A)=I+A\) and \(T'(z)=1+z\); and fourth, \(T(A)=A^{-1}\) and \(T'(z)={\bar{z}}^{-1}\).
If
then
where \(T'(B_1),\dots ,T'(B_k)\) are each a disk or a half-space. Therefore, the circle number of \(T({\mathcal {A}})\) satisfies
Since T and \(T'\) are invertible mappings, the argument goes in the other direction as well, and we conclude that the infima are equal. \(\square \)
Corollary 3
There is no one-to-one mapping from \(\mathcal {M}\) to \(\mathcal {M}\cap \mathcal {L}_L\) constructed via pre- and post-scalar multiplication, addition with the identity operator, and operator inversion.
Such one-to-one mappings between operator classes are used to translate a nice result on a simple operator class into a result for another operator class. In [7, 8, 64], the maximal monotone extension theorem was translated into extension theorems for other operator classes. Corollary 3 shows that this approach will not work for \(\mathcal {M}\cap \mathcal {L}_L\), a class of operators considered by the extragradient method [38], forward-backward-forward splitting [68], and other related methods [14, 45]. In fact, [64] shows that a certain simple interpolation condition for \({\mathcal {M}}\) fails for \(\mathcal {M}\cap \mathcal {L}_L\).
Deferred proofs
Fact 17
(Spherical triangle inequality) Any nonzero \(a,b,c\in \mathcal {H}\) satisfy \(|\angle (a,b)-\angle (b,c)|\le \angle (a,c)\le \angle (a,b)+\angle (b,c)\).
Figure 8 illustrates the inequality. We use the spherical triangle inequality in Theorem 7 to argue that there is no need to consider a third dimension and that we can continue the analysis in 2D.
Proof of spherical triangle inequality
Although this result is known, we provide a proof for completeness. Without loss of generality, assume a, b, and c are unit vectors. Let \(\theta =\angle (a,b)\) and \(\varphi =\angle (b,c)\), and without loss of generality, assume \(\theta \ge \varphi \). Then we have \(a=(\cos \theta )b+(\sin \theta )u\) and \(c=(\cos \varphi )b+(\sin \varphi )v\), where u and v are unit vectors orthogonal to b, and \(\langle a,c\rangle =\cos \theta \cos \varphi +\sin \theta \sin \varphi \,\langle u,v\rangle \). Since \(\theta ,\varphi \in [0, \pi ]\), we have \(\sin \theta \sin \varphi \ge 0\). Since \(\Vert u\Vert =\Vert v\Vert =1\), we have \(|\langle u,v\rangle |\le 1\). Therefore \(\cos \theta \cos \varphi -\sin \theta \sin \varphi \le \langle a,c\rangle \le \cos \theta \cos \varphi +\sin \theta \sin \varphi \). Since \(\cos (\alpha \pm \beta )=\cos \alpha \cos \beta \mp \sin \alpha \sin \beta \) [2, p. 72, 4.3.17], we have \(\cos (\theta +\varphi )\le \langle a,c\rangle =\cos \angle (a,c)\le \cos (\theta -\varphi )\), and we conclude \(\theta -\varphi \le \angle (a,c)\le \theta +\varphi \) (when \(\theta +\varphi \le \pi \); otherwise the upper bound holds trivially).
\(\square \)
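As a numerical sanity check on the inequality just proved, the following sketch samples random vectors in \(\mathbb {R}^3\) and verifies both bounds; the helper `angle` is an illustrative name, not from the paper.

```python
import math
import random

# Numeric sanity check (a sketch, not from the paper) of the spherical
# triangle inequality |angle(a,b) - angle(b,c)| <= angle(a,c)
# <= angle(a,b) + angle(b,c) for random nonzero vectors in R^3.

def angle(a, b):
    """Angle in [0, pi] between nonzero vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return math.acos(max(-1.0, min(1.0, dot / (na * nb))))

random.seed(0)
for _ in range(1000):
    a = [random.gauss(0.0, 1.0) for _ in range(3)]
    b = [random.gauss(0.0, 1.0) for _ in range(3)]
    c = [random.gauss(0.0, 1.0) for _ in range(3)]
    theta, phi, psi = angle(a, b), angle(b, c), angle(a, c)
    assert psi <= theta + phi + 1e-6          # upper bound
    assert abs(theta - phi) <= psi + 1e-6     # lower bound
```

The clamping of the cosine into \([-1,1]\) guards against floating-point round-off before `acos`.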
Proof of Fact 7
First consider the case \(\mu <1/(2\beta )\). By Proposition 1 and Theorem 4, we have the geometry
To clarify, O is the center of the circle with radius \({\overline{OB}}\) (lighter shade) and C is the center of the circle with radius \({\overline{AC}}={\overline{CB}}\) defining the inner region (darker shade). With two applications of the Pythagorean theorem, we get
Since \(\overline{B'B}\) is a chord of circle O, it is within the circle. Since two non-identical circles intersect in at most two points, and since A is within circle O, the arc is within circle O. Finally, the region bounded by it (darker shade) is within circle O (lighter shade).
In the cases \(\mu =1/(2\beta )\) and \(\mu >1/(2\beta )\), we have a slightly different geometry, but the same arguments and calculations hold.
The containment holds for R and fails for smaller R. Since \({\mathcal {L}}_R\) is SRG-full by Theorem 2, containment of the SRG in \(\overline{\mathbb {C}}\) is equivalent to containment of the class. \(\square \)
We quickly state Stewart's theorem [65], which we use in the proof of Fact 12. For a triangle \(\triangle ABC\) and cevian \({\overline{CD}}\) to the side \({\overline{AB}}\),
the lengths of the line segments satisfy \(\overline{CA}^2\cdot \overline{DB}+\overline{CB}^2\cdot \overline{AD}-\overline{CD}^2\cdot \overline{AB}=\overline{AB}\cdot \overline{AD}\cdot \overline{DB}\).
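Stewart's theorem is easy to confirm numerically; the following sketch checks the identity on randomly generated triangles (variable names are illustrative, with D placed on segment AB).

```python
import math
import random

# Numeric check of Stewart's theorem on random triangles (a sketch; the
# variable names follow the usual statement: D lies on the segment AB).

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

random.seed(1)
for _ in range(100):
    A = (random.uniform(-5.0, 5.0), random.uniform(-5.0, 5.0))
    B = (random.uniform(-5.0, 5.0), random.uniform(-5.0, 5.0))
    C = (random.uniform(-5.0, 5.0), random.uniform(-5.0, 5.0))
    t = random.uniform(0.0, 1.0)
    D = (A[0] + t * (B[0] - A[0]), A[1] + t * (B[1] - A[1]))  # D on AB
    b, a, d = dist(C, A), dist(C, B), dist(C, D)              # CA, CB, CD
    m, n, ab = dist(A, D), dist(D, B), dist(A, B)             # AD, DB, AB
    # Stewart's theorem: CA^2 * DB + CB^2 * AD - CD^2 * AB = AB * AD * DB
    assert abs(b * b * n + a * a * m - d * d * ab - ab * m * n) < 1e-6
```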
Full proof of Fact 12. By Proposition 1 and Theorems 4 and 5, we have the geometry
A closer look gives us
To clarify, B is the center of the circle with radius \({\overline{BA}}\) and C is the center of the circle with radius \({\overline{CA}}\). By Stewart’s theorem [65], we have
Since two non-identical circles intersect in at most two points, and since D is within circle B, the arc is within circle O. By the same reasoning, the other arc is within circle O. Finally, the region bounded by them (darker shade) is within circle O (lighter shade).
The containment holds for R and fails for smaller R. Since \({\mathcal {L}}_R\) is SRG-full by Theorem 2, containment of the SRG in \(\overline{\mathbb {C}}\) is equivalent to containment of the class. \(\square \)
Proof of Fact 16
Let
First, we show \(Q=[0,1]C=[0,1]Q\). To clarify, [0, 1] is the set of real numbers between 0 and 1 and [0, 1]C and [0, 1]Q are Minkowski products of sets of complex numbers. Given any point \(A\in Q\backslash C\), define \(A'\) as the nonzero intersection of the line extending \({\overline{OA}}\) and circle C. Since A is on the line and inside the circle, the nonzero intersection \(A'\) exists.
Since \(A\in \overline{OA'}\subseteq [0,1]C\), we have \(Q\subseteq [0,1]C\). On the other hand, \(Q\supseteq [0,1]C\) follows from noting that given any point \(A'\) on C, the line segment \(\overline{OA'}\) is a chord of the circle C and therefore is within the disk Q. Therefore, \(Q=[0,1]C\). As a corollary, we have \(Q=[0,1]C=([0,1][0,1])C=[0,1]([0,1]C)=[0,1]Q\).
Next, define
In geometric terms, this construction takes a point on the circle C, draws the disk whose diameter is the line segment between this point and the origin, and takes the union of such disks.
The dashed circle is the unit circle. The solid circle is C. The shaded circles represent instances of \(S_{\varphi _1}\). We can characterize \({\mathcal {G}}({\mathcal {N}}_{1/2}{\mathcal {N}}_{1/2})\) by analyzing this construction since
by Proposition 1 and Theorem 7.
We now show \(S=\left\{ re^{i\varphi }\mid 0\le r\le \cos ^2(\varphi /2)\right\} \). This fact is known and can be analytically derived through the envelope theorem [60, Exercise 5.22]. We provide a geometric proof, which was inspired by [37, Exercise 4.15].
Throughout this proof, we write \({\mathcal {I}}:\overline{\mathbb {C}}\rightarrow \overline{\mathbb {C}}\) for the mapping \({\mathcal {I}}(z)={\bar{z}}^{-1}\). We map S into the inverted space, i.e., we analyze
Again, \({\mathcal {I}}(z)={\bar{z}}^{-1}\). The dashed circle, the unit circle, is mapped onto itself. Circle C, the solid circle, is mapped to \({\mathcal {I}}(C)\), the vertical line going through 1. Each shaded circle \(S_{\varphi _1}\) is mapped to a half-space \({\mathcal {I}}(S_{\varphi _1})\). Let point A be the nonzero intersection between C and the boundary of \(S_{\varphi _1}\). Then point \({\mathcal {I}}(A)\) is the non-infinite intersection between \({\mathcal {I}}(C)\) and the boundary of \({\mathcal {I}}(S_{\varphi _1})\). By construction, \({\overline{OA}}\) is the diameter of \(S_{\varphi _1}\). The (infinite) line containing O, A, and \({\mathcal {I}}(A)\) is mapped onto itself, excluding the origin. Since \({\mathcal {I}}\) is conformal, the right angle at A between the boundary of \(S_{\varphi _1}\) and the diameter \({\overline{OA}}\) is mapped to a right angle between the boundary of \({\mathcal {I}}(S_{\varphi _1})\) and \(\overline{O{\mathcal {I}}(A)}\).
Next, we show that the union of the half-spaces is described by a parabola. Define line D as the vertical line going through 2. Consider any point \(A'\) on the line \({\mathcal {I}}(C)\). Consider the line through \(A'\) perpendicular to \(\overline{OA'}\) and the half-space to the right of the line including \(\infty \). Define point E as the intersection of line D and the line extending \(\overline{OA'}\). Draw a line through point E perpendicular to line D, and define point B as the intersection of this line with the boundary of the half-space. Since E is within the half-space and the boundary of the half-space is not horizontal, this intersection exists and B is to the left of E. Since \(\overline{OA'}=\overline{A'E}\), we have \(\triangle BA'O\cong \triangle BA'E\) by the side-angle-side (SAS) congruence, and \({\overline{OB}}={\overline{BE}}\). The union of all such points corresponding to B forms a parabola with directrix D, focus (0, 0), and vertex (1, 0).
The boundary of the half-space is tangent to the parabola at B, i.e., the line intersects the parabola at no other point. To see why, consider any other point \(B'\) on the boundary of the half-space. Then \(\overline{OB'}=\overline{B'E}\) by SAS congruence. However, \(\overline{B'E}\) is not perpendicular to D, i.e., \(\overline{B'E}\) is not a horizontal line. Therefore \(\overline{OB'}\) is longer than the distance of \(B'\) to D, and therefore \(B'\) is not on the parabola. Since each half-space is tangent to the parabola at B, all points to the right of B are in \({\mathcal {I}}(S)\) and no points strictly to the left of the parabola are in \({\mathcal {I}}(S)\). Therefore \({\mathcal {I}}(S)\) is characterized by the closed region to the right of the parabola including \(\infty \).
The region exterior to the circle centered at \(-1\) with radius 2 contains the region to the right of the parabola. This is easily verified with calculus. The circle with the lighter shade corresponds to \({\mathcal {G}}({\mathcal {N}}_{2/3})\) by Proposition 1. Since \({\mathcal {N}}_{2/3}\) is SRG-full by Theorem 2, strict containment of the SRG in \(\overline{\mathbb {C}}\) implies strict containment of the class. The inverse curve of the parabola, with the focus as the center of inversion, is known as a cardioid, and it has the polar coordinate representation \(r(\varphi )=\cos ^2(\varphi /2)\). The expression in the statement is the region bounded by this curve. \(\square \)
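The final inversion step can be checked numerically; the following sketch assumes the parabola's polar equation \(r=2/(1+\cos \varphi )\) (focus at the origin, directrix x = 2, consistent with the construction above) and verifies that inverting \(r\mapsto 1/r\) yields the cardioid \(r=\cos ^2(\varphi /2)\).

```python
import math

# Numeric check (a sketch, not from the paper): the parabola with focus at the
# origin, directrix x = 2, and vertex (1, 0) has the polar equation
# r = 2 / (1 + cos(phi)). The inversion I(z) = conj(z)^(-1) fixes the angle
# phi and maps r to 1/r, so the inverted curve should satisfy
# r = (1 + cos(phi)) / 2 = cos(phi/2)^2, the cardioid in the statement.

for k in range(1, 200):
    phi = -math.pi + 2.0 * math.pi * k / 200.0  # avoid phi = +-pi (r infinite)
    r_parabola = 2.0 / (1.0 + math.cos(phi))
    r_inverted = 1.0 / r_parabola
    assert abs(r_inverted - math.cos(phi / 2.0) ** 2) < 1e-12
```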
Cite this article
Ryu, E.K., Hannah, R. & Yin, W. Scaled relative graphs: nonexpansive operators via 2D Euclidean geometry. Math. Program. 194, 569–619 (2022). https://doi.org/10.1007/s1010702101639w
Keywords
 Fixedpoint iteration
 Euclidean geometry
 Inversive geometry
 Contraction mapping
 Douglas–Rachford splitting
 Metric subregularity
 Monotone operator
Mathematics Subject Classification
 47H05
 47H09
 51M04
 90C25
 49M27