
On the asymptotic behavior of the Douglas–Rachford and proximal-point algorithms for convex optimization

Abstract

Banjac et al. (J Optim Theory Appl 183(2):490–519, 2019) recently showed that the Douglas–Rachford algorithm provides certificates of infeasibility for a class of convex optimization problems. In particular, they showed that the difference between consecutive iterates generated by the algorithm converges to certificates of primal and dual strong infeasibility. Their result was shown in a finite-dimensional Euclidean setting and for a particular structure of the constraint set. In this paper, we extend the result to real Hilbert spaces and a general nonempty closed convex set. Moreover, we show that the proximal-point algorithm applied to the set of optimality conditions of the problem generates similar infeasibility certificates.

Introduction

Due to its very good practical performance and its ability to handle nonsmooth functions, the Douglas–Rachford algorithm has attracted considerable interest for solving convex optimization problems. Provided that a problem is solvable and satisfies a certain constraint qualification, the algorithm converges to an optimal solution [1, Cor. 27.3]. If the problem is infeasible, then some of its iterates diverge [2].

Results on the asymptotic behavior of the Douglas–Rachford algorithm for infeasible problems are scarce, and most of them study specific cases such as feasibility problems involving two convex sets that do not intersect [3,4,5]. Although some recent results address a more general setting [6, 7], they impose additional assumptions on the feasibility of either the primal or the dual problem. The authors of [8] consider the problem of minimizing a convex quadratic function over a particular constraint set, and show that the iterates of the Douglas–Rachford algorithm generate an infeasibility certificate when the problem is primal and/or dual strongly infeasible. A similar analysis was applied in [9] to show that the proximal-point algorithm used for solving a convex quadratic program can also detect infeasibility.

The constraint set of the problem studied in [8] is represented in the form \(Ax\in C\), where A is a real matrix and \(C\) is the Cartesian product of a convex compact set and a translated closed convex cone. This paper extends the result of [8] to real Hilbert spaces and a general nonempty closed convex set \(C\). Moreover, we show that a similar analysis can be used to prove that the proximal-point algorithm for solving the same class of problems generates similar infeasibility certificates.

The paper is organized as follows. We introduce some definitions and notation in the remainder of Sect. 1, and the problem under consideration in Sect. 2. Section 3 presents some supporting results that are essential for generalizing the results in [8]. Finally, Sects. 4 and 5 analyze the asymptotic behavior of the Douglas–Rachford and proximal-point algorithms, respectively, and show that they provide infeasibility certificates for the considered problem.

Notation

Let \(\mathcal {H}\), \(\mathcal {H}_1\), \(\mathcal {H}_2\) be real Hilbert spaces with inner products \(\left\langle {\cdot }\mid {\cdot }\right\rangle \), induced norms \(\Vert \,\cdot \,\Vert \), and identity operators \({{\,\mathrm{Id}\,}}\). The power set of \(\mathcal {H}\) is denoted by \(2^\mathcal {H}\). Let \(\mathbb {N}\) denote the set of positive integers. For a sequence \(({s_n})_{n\in \mathbb {N}}\), we denote by \(s_n\rightarrow s\) (\(s_n\rightharpoonup s\)) that it converges strongly (weakly) to s and define \(\delta s_{n+1}:=s_{n+1}-s_n\).

Let \(D\) be a nonempty subset of \(\mathcal {H}\) with \(\overline{D}\) being its closure. Then \(T:D\rightarrow \mathcal {H}\) is nonexpansive if

$$\begin{aligned} (\forall x\in D)(\forall y\in D) \quad \Vert Tx-Ty\Vert \le \Vert x-y\Vert , \end{aligned}$$

and it is \(\alpha \)-averaged with \(\alpha \in ]0,1[\) if there exists a nonexpansive operator \(R:D\rightarrow \mathcal {H}\) such that \(T=(1-\alpha ){{\,\mathrm{Id}\,}}+ \alpha R\). We denote the range of T by \({{\,\mathrm{ran}\,}}T\). A set-valued operator \(B:\mathcal {H}\rightarrow 2^\mathcal {H}\), characterized by its graph

$$\begin{aligned} {{\,\mathrm{gra}\,}}B = \left\{ (x,u)\in \mathcal {H}\times \mathcal {H} \mid u\in Bx \right\} , \end{aligned}$$

is monotone if

$$\begin{aligned} \left( \forall (x,u)\in {{\,\mathrm{gra}\,}}B \right) \left( \forall (y,v)\in {{\,\mathrm{gra}\,}}B \right) \quad \left\langle {x-y}\mid {u-v}\right\rangle \ge 0. \end{aligned}$$

The inverse of B, denoted by \(B^{-1}\), is defined through its graph

$$\begin{aligned} {{\,\mathrm{gra}\,}}B^{-1}= \left\{ (u,x)\in \mathcal {H}\times \mathcal {H} \mid (x,u)\in {{\,\mathrm{gra}\,}}B \right\} . \end{aligned}$$

For a proper lower semicontinuous convex function \(f:\mathcal {H}\rightarrow ]-\infty ,+\infty ]\), we define its: conjugate \(f^*(u):=\sup _{x\in \mathcal {H}}\left( \left\langle {u}\mid {x}\right\rangle - f(x)\right) \), subdifferential \(\partial f(x):=\left\{ u\in \mathcal {H} \mid f(y)\ge f(x)+\left\langle {u}\mid {y-x}\right\rangle ,\ \forall y\in \mathcal {H}\right\} \), and proximal operator \({{\,\mathrm{Prox}\,}}_f(x):=\mathop {\hbox {argmin}}\limits _{u\in \mathcal {H}}\left( f(u)+\tfrac{1}{2}\Vert u-x\Vert ^2\right) \).

For a nonempty closed convex set \(C\subseteq \mathcal {H}\), we define its: indicator function \(\iota _{C}\) (zero on \(C\) and \(+\infty \) outside), support function \(\sigma _{C}(u):=\sup _{x\in C}\left\langle {u}\mid {x}\right\rangle \), projection operator \(P_{C}\), normal cone operator \(N_{C}\), recession cone \(\mathrm{rec}\,{C}:=\left\{ x\in \mathcal {H} \mid x+C\subseteq C\right\} \), and the polar cone \({(\mathrm{rec}\,{C})}^\ominus :=\left\{ u\in \mathcal {H} \mid \left\langle {u}\mid {x}\right\rangle \le 0,\ \forall x\in \mathrm{rec}\,{C}\right\} \) of its recession cone.
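These set-theoretic objects have simple closed forms for structured sets. The following Python sketch (with the hypothetical example set \(C=[-1,1]\times [0,\infty )\), a product of a compact interval and a closed convex cone; all choices here are illustrative) computes \(P_{C}\), \(\sigma _{C}\), and the projections onto \(\mathrm{rec}\,{C}\) and \({(\mathrm{rec}\,{C})}^\ominus \), and checks the Moreau decomposition used throughout the paper:

```python
import numpy as np

# Hypothetical example set C = [-1, 1] x [0, inf): a product of a compact
# interval and a closed convex cone, matching the structure discussed above.

def proj_C(v):
    """Projection P_C onto C = [-1, 1] x [0, inf)."""
    return np.array([np.clip(v[0], -1.0, 1.0), max(v[1], 0.0)])

def proj_rec_C(v):
    """Projection onto rec C = {0} x [0, inf)."""
    return np.array([0.0, max(v[1], 0.0)])

def proj_rec_C_polar(v):
    """Projection onto the polar cone (rec C)^- = R x (-inf, 0]."""
    return np.array([v[0], min(v[1], 0.0)])

def support_C(y):
    """Support function sigma_C(y); finite (= |y_1|) only when y_2 <= 0."""
    return abs(y[0]) if y[1] <= 0 else np.inf

# Moreau decomposition with respect to the cone rec C:
v = np.array([2.0, -3.0])
assert np.allclose(proj_rec_C(v) + proj_rec_C_polar(v), v)
```
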

Problem of interest

Consider the following convex optimization problem:

$$\begin{aligned} \underset{x\in \mathcal {H}_1}{\mathrm{minimize}}&\quad \tfrac{1}{2}\left\langle {Qx}\mid {x}\right\rangle + \left\langle {q}\mid {x}\right\rangle \\ \mathrm{subject\ to}&\quad Ax\in C, \end{aligned}$$
(1)

with \(Q:\mathcal {H}_1\rightarrow \mathcal {H}_1\) a monotone self-adjoint bounded linear operator, \(q\in \mathcal {H}_1\), \(A:\mathcal {H}_1\rightarrow \mathcal {H}_2\) a bounded linear operator, and \(C\) a nonempty closed convex subset of \(\mathcal {H}_2\); we assume that \({{\,\mathrm{ran}\,}}{Q}\) and \({{\,\mathrm{ran}\,}}{A}\) are closed. The objective function of the problem is convex, continuous, and Fréchet differentiable [1, Prop. 17.36].

When \(\mathcal {H}_1\) and \(\mathcal {H}_2\) are finite-dimensional Euclidean spaces, problem (1) reduces to the one considered in [8], where the Douglas–Rachford algorithm (which is equivalent to the alternating direction method of multipliers) was shown to generate certificates of primal and dual strong infeasibility. Moreover, the authors proposed termination criteria for infeasibility detection, which are easy to implement and are used in several numerical solvers; see, e.g., [10,11,12]. Their proofs rely heavily on the assumption that \(C\) can be represented as the Cartesian product of a convex compact set and a translated closed convex cone. In this paper we extend these results to the case where \(\mathcal {H}_1\) and \(\mathcal {H}_2\) are real Hilbert spaces and \(C\) is a general nonempty closed convex set.

Optimality conditions

We can rewrite problem (1) in the form

$$\begin{aligned} \underset{x\in \mathcal {H}_1}{\mathrm{minimize}} \quad \tfrac{1}{2}\left\langle {Qx}\mid {x}\right\rangle + \left\langle {q}\mid {x}\right\rangle + \iota _{C}(Ax). \end{aligned}$$

Provided that a certain constraint qualification holds, we can characterize its solution by [1, Thm. 27.2]

$$\begin{aligned} 0 \in Qx + q + A^* \partial \iota _{C}(Ax), \end{aligned}$$

and introducing a dual variable \(y\in \partial \iota _{C}(Ax)\), we can rewrite the inclusion as

$$\begin{aligned} 0 \in \begin{pmatrix} Qx + q + A^* y \\ -y + \partial \iota _{C}(Ax) \end{pmatrix}. \end{aligned}$$
(2)

Introducing an auxiliary variable \(z\in C\) and using \(\partial \iota _{C}=N_{C}\), we can write the optimality conditions for problem (1) as

$$\begin{aligned}&Ax-z=0 \end{aligned}$$
(3a)
$$\begin{aligned}&Qx + q + A^* y = 0 \end{aligned}$$
(3b)
$$\begin{aligned}&z\in C, \quad y\in N_{C} z. \end{aligned}$$
(3c)
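Conditions (3) can be checked numerically. The sketch below (finite-dimensional, with an illustrative box constraint; all data are our own choices) measures the violation of (3a)–(3c), using the equivalence between (3c) and the fixed-point identity \(z=P_{C}(z+y)\) [1, Prop. 6.47]:

```python
import numpy as np

def kkt_residuals(Q, q, A, proj_C, x, z, y):
    """Residual norms of optimality conditions (3a)-(3c).

    Condition (3c) (z in C and y in N_C z) is equivalent to z = P_C(z + y),
    so its violation is measured by ||z - P_C(z + y)||.
    """
    r_prim = np.linalg.norm(A @ x - z)              # (3a)
    r_dual = np.linalg.norm(Q @ x + q + A.T @ y)    # (3b)
    r_cone = np.linalg.norm(z - proj_C(z + y))      # (3c)
    return r_prim, r_dual, r_cone

# Tiny hypothetical instance: minimize x^2/2 + x subject to x in [0, 2] (A = Id).
Q = np.array([[1.0]]); q = np.array([1.0]); A = np.eye(1)
proj = lambda v: np.clip(v, 0.0, 2.0)
# Solution x* = 0, z* = 0, y* = -1 (y* lies in the normal cone at the lower bound).
res = kkt_residuals(Q, q, A, proj, np.zeros(1), np.zeros(1), np.array([-1.0]))
assert max(res) < 1e-12
```
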

Infeasibility certificates

The authors in [8] derived the following conditions for characterizing strong infeasibility of problem (1) and its dual:

Proposition 2.1

([8, Prop. 3.1])

  1. (i)

    If there exists a \(\bar{y}\in \mathcal {H}_2\) such that

    $$\begin{aligned} A^* \bar{y} = 0 \quad \text {and}\quad \sigma _{C}(\bar{y}) < 0, \end{aligned}$$

    then problem (1) is strongly infeasible.

  2. (ii)

    If there exists an \(\bar{x}\in \mathcal {H}_1\) such that

    $$\begin{aligned} Q\bar{x}=0, \quad A\bar{x}\in \mathrm{rec}\,{C}, \quad \text {and}\quad \left\langle {q}\mid {\bar{x}}\right\rangle < 0, \end{aligned}$$

    then the dual of problem (1) is strongly infeasible.
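The conditions of Prop. 2.1(i) can be verified on a toy instance. In the sketch below, the instance is our own construction: \(A=(1,-1)^*\) and \(C=(-\infty ,-1]\times (-\infty ,-1]\), which encodes the inconsistent system \(x\le -1\), \(-x\le -1\); the vector \(\bar{y}=(1,1)\) is then a certificate of primal strong infeasibility:

```python
import numpy as np

# Hypothetical strongly infeasible instance of (1): the constraint Ax in C
# with A = [1, -1]^T and C = (-inf,-1] x (-inf,-1] requires both x <= -1
# and x >= 1, which is inconsistent.
A = np.array([[1.0], [-1.0]])

def support_C(y):
    """sigma_C(y) for C = (-inf,-1]^2: finite iff y >= 0, with value -sum(y)."""
    return -np.sum(y) if np.all(y >= 0) else np.inf

y_bar = np.array([1.0, 1.0])
# Certificate conditions of Prop. 2.1(i):
assert np.allclose(A.T @ y_bar, 0.0)   # A* y_bar = 0
assert support_C(y_bar) < 0            # sigma_C(y_bar) = -2 < 0
```
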

Auxiliary results

Fact 3.1

Suppose that \(T:\mathcal {H}\rightarrow \mathcal {H}\) is an averaged operator and let \(s_0\in \mathcal {H}\), \(s_n=T^n s_0\), and \(\delta s :=P_{\overline{{{\,\mathrm{ran}\,}}}(T-{{\,\mathrm{Id}\,}})}(0)\). Then

  1. (i)

    \(\tfrac{1}{n} s_n \rightarrow \delta s\).

  2. (ii)

    \(\delta s_n \rightarrow \delta s\).

Proof

The first result is [13, Cor. 3] and the second is [14, Cor. 2.3]. \(\square \)

The following proposition provides essential ingredients for generalizing the results in [8, §5].

Proposition 3.2

Let \(({s_n})_{n\in \mathbb {N}}\) be a sequence in \(\mathcal {H}\) satisfying \(\tfrac{1}{n}s_n\rightarrow \delta s\). Let \(D\subseteq \mathcal {H}\) be a nonempty closed convex set and define sequences \(({p_n})_{n\in \mathbb {N}}\) and \(({r_n})_{n\in \mathbb {N}}\) by

$$\begin{aligned} p_n&:=P_{D} s_n \\ r_n&:=({{\,\mathrm{Id}\,}}- P_{D}) s_n. \end{aligned}$$

Then

  1. (i)

    \(r_n \in {(\mathrm{rec}\,{D})}^\ominus \).

  2. (ii)

    \(\tfrac{1}{n} p_n \rightarrow \delta p :=P_{\mathrm{rec}\,{D}}(\delta s)\).

  3. (iii)

    \(\tfrac{1}{n} r_n \rightarrow \delta r :=P_{{(\mathrm{rec}\,{D})}^\ominus }(\delta s)\).

  4. (iv)

    \(\lim _{n\rightarrow \infty }\tfrac{1}{n}\left\langle {p_n}\mid {r_n}\right\rangle = \sigma _{D}(\delta r)\).

Proof

(i): Follows from [15, Thm. 3.1].

(ii) and (iii): A related result was shown in [16, Lem. 6.3.13] and [17, Prop. 2.2] in a finite-dimensional setting. Using similar arguments here, together with those in [18, Lem. 4.3], we can establish only weak convergence, i.e., \(\tfrac{1}{n}p_n\rightharpoonup \delta p\). Using Moreau’s decomposition [1, Thm. 6.30], it follows that \(\tfrac{1}{n}r_n\rightharpoonup \delta r\) and \(\Vert \delta s\Vert ^2=\Vert \delta p\Vert ^2+\Vert \delta r\Vert ^2\). For an arbitrary vector \(z\in D\), [1, Thm. 3.16] yields

$$\begin{aligned} \Vert s_n-z\Vert ^2 \ge \Vert p_n-z\Vert ^2 + \Vert r_n\Vert ^2, \quad \forall n\in \mathbb {N}. \end{aligned}$$

Dividing the inequality by \(n^2\) and taking the limit superior, we get

$$\begin{aligned} \lim \, \Vert \tfrac{1}{n}s_n\Vert ^2 \ge \overline{\lim } \, (\Vert \tfrac{1}{n}p_n\Vert ^2 + \Vert \tfrac{1}{n}r_n\Vert ^2) \ge \overline{\lim } \, \Vert \tfrac{1}{n}p_n\Vert ^2 + \underline{\lim } \, \Vert \tfrac{1}{n}r_n\Vert ^2, \end{aligned}$$

and thus

$$\begin{aligned} \overline{\lim } \, \Vert \tfrac{1}{n}p_n\Vert ^2 \le \lim \, \Vert \tfrac{1}{n}s_n\Vert ^2 - \underline{\lim } \, \Vert \tfrac{1}{n}r_n\Vert ^2 \le \Vert \delta s\Vert ^2 - \Vert \delta r\Vert ^2 = \Vert \delta p\Vert ^2, \end{aligned}$$

where the second inequality follows from [1, Lem. 2.42]. The inequality above yields \(\overline{\lim } \, \Vert \tfrac{1}{n}p_n\Vert \le \Vert \delta p\Vert \), which due to [1, Lem. 2.51] implies \(\tfrac{1}{n}p_n\rightarrow \delta p\). Using Moreau’s decomposition, it follows that \(\tfrac{1}{n}r_n \rightarrow \delta r\).

(iv): Taking the limit of the inequality

$$\begin{aligned} (\forall n\in \mathbb {N})(\forall \hat{p}\in D) \quad \left\langle {\hat{p}}\mid {\tfrac{1}{n} r_n}\right\rangle \le \sup _{p\in D} \left\langle {p}\mid {\tfrac{1}{n}r_n}\right\rangle , \end{aligned}$$

we obtain

$$\begin{aligned} (\forall \hat{p}\in D) \quad \lim _{n\rightarrow \infty }\left\langle {\hat{p}}\mid {\tfrac{1}{n}r_n}\right\rangle \le \lim _{n\rightarrow \infty }\sup _{p\in D} \left\langle {p}\mid {\tfrac{1}{n}r_n}\right\rangle , \end{aligned}$$

and taking the supremum of the left-hand side over \(D\), we get

$$\begin{aligned} \sup _{p\in D}\lim _{n\rightarrow \infty }\left\langle {p}\mid {\tfrac{1}{n}r_n}\right\rangle \le \lim _{n\rightarrow \infty }\sup _{p\in D} \left\langle {p}\mid {\tfrac{1}{n}r_n}\right\rangle . \end{aligned}$$
(4)

From [1, Prop. 6.47], we have

$$\begin{aligned} r_n = s_n - p_n \in N_{D} p_n, \end{aligned}$$

which, due to [1, Thm. 16.29] and the facts that \(\iota _{D}^*=\sigma _{D}\) and \(\partial \iota _{D}=N_{D}\), is equivalent to

$$\begin{aligned} \tfrac{1}{n}\left\langle {p_n}\mid {r_n}\right\rangle = \sigma _{D}\left( \tfrac{1}{n} r_n \right) . \end{aligned}$$
(5)

Taking the limit of (5) and using (4), we obtain

$$\begin{aligned} \lim _{n\rightarrow \infty }\tfrac{1}{n}\left\langle {p_n}\mid {r_n}\right\rangle = \lim _{n\rightarrow \infty } \sup _{p\in D} \left\langle {p}\mid {\tfrac{1}{n}r_n}\right\rangle \ge \sup _{p\in D}\lim _{n\rightarrow \infty }\left\langle {p}\mid {\tfrac{1}{n}r_n}\right\rangle = \sigma _{D}(\delta r). \end{aligned}$$

Since \(p_n\in D\), we also have

$$\begin{aligned} \lim _{n\rightarrow \infty }\tfrac{1}{n}\left\langle {p_n}\mid {r_n}\right\rangle \le \sup _{p\in D} \lim _{n\rightarrow \infty }\left\langle {p}\mid {\tfrac{1}{n}r_n}\right\rangle = \sigma _{D}(\delta r). \end{aligned}$$

The result follows by combining the two inequalities above. \(\square \)

The results of Prop. 3.2 are straightforward under the additional assumption that \(D\) is compact, since then \(\mathrm{rec}\,{D}=\{0\}\) and \({(\mathrm{rec}\,{D})}^\ominus =\mathcal {H}\), and thus

$$\begin{aligned} \lim _{n\rightarrow \infty }\tfrac{1}{n} p_n&= \lim _{n\rightarrow \infty }\tfrac{1}{n}P_{D} s_n = 0 = P_{\mathrm{rec}\,{D}}(\delta s) \\ \lim _{n\rightarrow \infty }\tfrac{1}{n} r_n&= \lim _{n\rightarrow \infty }\tfrac{1}{n} (s_n-p_n) = \delta s = P_{{(\mathrm{rec}\,{D})}^\ominus }(\delta s). \end{aligned}$$

Moreover, the compactness of \(D\) implies the continuity of \(\sigma _{D}\) [1, Example 11.2], and thus taking the limit of (5) yields

$$\begin{aligned} \lim _{n\rightarrow \infty }\tfrac{1}{n}\left\langle {p_n}\mid {r_n}\right\rangle = \lim _{n\rightarrow \infty }\sigma _{D}\left( \tfrac{1}{n}r_n\right) = \sigma _{D}\left( \lim _{n\rightarrow \infty }\tfrac{1}{n}r_n\right) = \sigma _{D}(\delta r). \end{aligned}$$

When \(D\) is a (translated) closed convex cone, its recession cone is the cone itself, and the results of Prop. 3.2 can be shown using Moreau’s decomposition and some basic properties of the projection operator; see [8, Lem. A.3 and Lem. A.4] for details.
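As a numerical illustration of Prop. 3.2(ii)–(iii) in the conic case, consider \(D=[0,\infty )^2\), so that \(\mathrm{rec}\,{D}=D\) and \({(\mathrm{rec}\,{D})}^\ominus =(-\infty ,0]^2\); the direction \(\delta s\) in the sketch below is an arbitrary illustrative choice:

```python
import numpy as np

# Sanity check of Prop. 3.2(ii)-(iii) for the closed convex cone D = [0, inf)^2,
# whose recession cone is D itself and whose polar cone is (-inf, 0]^2.
# The sequence s_n = n * ds follows a fixed (hypothetical) direction ds.
ds = np.array([1.0, -2.0])

proj_D = lambda v: np.maximum(v, 0.0)          # P_D = P_{rec D}
proj_polar = lambda v: np.minimum(v, 0.0)      # P_{(rec D)^-}

n = 10**6
s_n = n * ds
p_n = proj_D(s_n)                              # p_n = P_D s_n
r_n = s_n - p_n                                # r_n = (Id - P_D) s_n

assert np.allclose(p_n / n, proj_D(ds))        # (1/n) p_n -> P_{rec D}(ds)
assert np.allclose(r_n / n, proj_polar(ds))    # (1/n) r_n -> P_{(rec D)^-}(ds)
assert np.allclose(p_n + r_n, s_n)             # Moreau decomposition
```
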

A result that motivated our generalization of these limits to an arbitrary nonempty closed convex set \(D\) is given in [18, Lem. 4.3], where Prop. 3.2(ii) is established in a finite-dimensional setting.

Douglas–Rachford algorithm

The Douglas–Rachford algorithm is an operator splitting method, which can be used to solve composite minimization problems of the form

$$\begin{aligned} \underset{w\in \mathcal {H}}{\mathrm{minimize}} \quad f(w) + g(w), \end{aligned}$$
(6)

where f and g are proper lower semicontinuous convex functions. An iteration of the algorithm in application to problem (6) can be written as

$$\begin{aligned} w_n&= {{\,\mathrm{Prox}\,}}_g s_n \\ \tilde{w}_n&= {{\,\mathrm{Prox}\,}}_f ( 2w_n - s_n ) \\ s_{n+1}&= s_n + \alpha ( \tilde{w}_n - w_n ), \end{aligned}$$

where \(\alpha \in \,] 0,2 [\) is the relaxation parameter.
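The iteration above admits a minimal Python sketch for an illustrative instance of (6) in which both proximal operators have closed forms: \(f(w)=\tfrac{1}{2}\Vert w-a\Vert ^2\) and \(g\) the indicator of a box (all data here are our own choices), so the minimizer of \(f+g\) is the projection of a onto the box:

```python
import numpy as np

# Toy instance of (6): f(w) = ||w - a||^2 / 2 and g = indicator of [-1, 1]^d.
# The minimizer of f + g is the projection of a onto the box.

def prox_f(w, a):
    # argmin_u ||u - a||^2/2 + ||u - w||^2/2 = (w + a)/2
    return 0.5 * (w + a)

def prox_g(w):
    # Prox of the box indicator = projection onto the box
    return np.clip(w, -1.0, 1.0)

def douglas_rachford(a, alpha=1.0, iters=200):
    s = np.zeros_like(a)
    for _ in range(iters):
        w = prox_g(s)                       # w_n     = Prox_g s_n
        w_tilde = prox_f(2.0 * w - s, a)    # w~_n    = Prox_f(2 w_n - s_n)
        s = s + alpha * (w_tilde - w)       # s_{n+1} = s_n + alpha (w~_n - w_n)
    return prox_g(s)

a = np.array([3.0, -0.2])
w_star = douglas_rachford(a)
assert np.allclose(w_star, np.clip(a, -1.0, 1.0))  # converges to P_box(a)
```
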

If we bring problem (1) into the form (6) with \(w=(x,z)\) and

$$\begin{aligned} f(x,z)&= \tfrac{1}{2}\left\langle {Qx}\mid {x}\right\rangle + \left\langle {q}\mid {x}\right\rangle + \iota _{Ax=z}(x,z) \\ g(x,z)&= \iota _{C}(z), \end{aligned}$$

then an iteration of the Douglas–Rachford algorithm takes the following form [8, 10]:

$$\begin{aligned} \tilde{x}_n&= \mathop {\hbox {argmin}}\limits _{x\in \mathcal {H}_1} \big ( \tfrac{1}{2}\!\left\langle {Qx}\mid {x}\right\rangle \!+\! \left\langle {q}\mid {x}\right\rangle \!+\! \tfrac{1}{2}\Vert x \!-\! x_n\Vert ^2 + \tfrac{1}{2}\Vert Ax \!-\! (2P_{C} \!-\! {{\,\mathrm{Id}\,}}) v_n\Vert ^2 \big ) \end{aligned}$$
(7a)
$$\begin{aligned} x_{n+1}&= x_n + \alpha \left( \tilde{x}_n - x_n \right) \end{aligned}$$
(7b)
$$\begin{aligned} v_{n+1}&= v_n + \alpha \left( A\tilde{x}_n - P_{C} v_n \right) \end{aligned}$$
(7c)
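The sketch below implements iteration (7) for a small hypothetical instance of (1) with \(C\) a box and \(\alpha =1\) (all data are illustrative choices); in finite dimensions, the subproblem (7a) reduces to the linear system \((Q+{{\,\mathrm{Id}\,}}+A^*A)\tilde{x}_n = -q + x_n + A^*(2P_{C}-{{\,\mathrm{Id}\,}})v_n\):

```python
import numpy as np

# Dense-algebra sketch of iteration (7) for a small hypothetical instance of (1)
# with C a box. The subproblem (7a) is solved via its optimality system
# (Q + Id + A^T A) x~ = -q + x_n + A^T (2 P_C - Id) v_n, cf. (8).

def dr_qp(Q, q, A, lo, hi, alpha=1.0, iters=500):
    n, m = Q.shape[0], A.shape[0]
    proj_C = lambda v: np.clip(v, lo, hi)
    K = Q + np.eye(n) + A.T @ A                      # fixed subproblem matrix
    x, v = np.zeros(n), np.zeros(m)
    for _ in range(iters):
        w = 2.0 * proj_C(v) - v                      # reflection (2 P_C - Id) v_n
        x_t = np.linalg.solve(K, -q + x + A.T @ w)   # (7a)
        x = x + alpha * (x_t - x)                    # (7b)
        v = v + alpha * (A @ x_t - proj_C(v))        # (7c)
    return x, proj_C(v), v - proj_C(v)               # x_n, z_n, y_n, cf. (10)

# Example: minimize x^2/2 - 2x subject to x in [-1, 1]; the solution is x = 1
# with dual variable y = 1.
Q = np.array([[1.0]]); q = np.array([-2.0]); A = np.eye(1)
x, z, y = dr_qp(Q, q, A, lo=-1.0, hi=1.0)
assert np.allclose(x, 1.0, atol=1e-6) and np.allclose(y, 1.0, atol=1e-6)
```
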

We will exploit the following well-known result to analyze the asymptotic behavior of the algorithm [19]:

Fact 4.1

Iteration (7) amounts to

$$\begin{aligned} (x_{n+1},v_{n+1}) = T_\mathrm{DR}(x_n,v_n), \end{aligned}$$

where \(T_\mathrm{DR}:(\mathcal {H}_1\times \mathcal {H}_2)\rightarrow (\mathcal {H}_1\times \mathcal {H}_2)\) is an \((\alpha /2)\)-averaged operator.

The solution to the subproblem in (7a) satisfies the optimality condition

$$\begin{aligned} Q\tilde{x}_n + q + (\tilde{x}_n - x_n) + A^* \left( A\tilde{x}_n - (2P_{C} - {{\,\mathrm{Id}\,}}) v_n \right) = 0. \end{aligned}$$
(8)

If we rearrange (7b) to isolate \(\tilde{x}_n\),

$$\begin{aligned} \tilde{x}_n = x_n + \alpha ^{-1}\delta x_{n+1}, \end{aligned}$$

and substitute it into (7c) and (8), we obtain the following relations between the iterates:

$$\begin{aligned} Ax_n - P_{C} v_n&= -\alpha ^{-1}\left( A\delta x_{n+1} - \delta v_{n+1}\right) \end{aligned}$$
(9a)
$$\begin{aligned} Qx_n + q + A^*({{\,\mathrm{Id}\,}}- P_{C}) v_n&= -\alpha ^{-1}\left( (Q+{{\,\mathrm{Id}\,}})\delta x_{n+1} + A^* \delta v_{n+1} \right) . \end{aligned}$$
(9b)

Let us define the following auxiliary iterates of iteration (7):

$$\begin{aligned} z_n&:=P_{C} v_n \end{aligned}$$
(10a)
$$\begin{aligned} y_n&:=({{\,\mathrm{Id}\,}}- P_{C}) v_n. \end{aligned}$$
(10b)

Observe that the pair \((z_n,y_n)\) satisfies optimality condition (3c) for all \(n\in \mathbb {N}\) [1, Prop. 6.47], and that the right-hand terms in (9) indicate how far the iterates \((x_n,z_n,y_n)\) are from satisfying (3a) and (3b).

The following corollary follows directly from Fact 3.1, Prop. 3.2, Fact 4.1, and Moreau’s decomposition [1, Thm. 6.30]:

Corollary 4.2

Let the sequences \(({x_n})_{n\in \mathbb {N}}\), \(({v_n})_{n\in \mathbb {N}}\), \(({z_n})_{n\in \mathbb {N}}\), and \(({y_n})_{n\in \mathbb {N}}\) be given by (7) and (10), and \((\delta x, \delta v) :=P_{\overline{{{\,\mathrm{ran}\,}}}(T_\mathrm{DR}-{{\,\mathrm{Id}\,}})}(0)\). Then

  1. (i)

    \(\tfrac{1}{n} (x_n,v_n) \rightarrow (\delta x,\delta v)\).

  2. (ii)

    \((\delta x_n, \delta v_n) \rightarrow (\delta x,\delta v)\).

  3. (iii)

    \(y_n \in {(\mathrm{rec}\,{C})}^\ominus \).

  4. (iv)

    \(\tfrac{1}{n} z_n \rightarrow \delta z :=P_{\mathrm{rec}\,{C}}(\delta v)\).

  5. (v)

    \(\tfrac{1}{n} y_n \rightarrow \delta y :=P_{{(\mathrm{rec}\,{C})}^\ominus }(\delta v)\).

  6. (vi)

    \(\lim _{n\rightarrow \infty }\tfrac{1}{n}\left\langle {z_n}\mid {y_n}\right\rangle = \sigma _{C}(\delta y)\).

  7. (vii)

    \(\delta z + \delta y = \delta v\).

  8. (viii)

    \(\left\langle {\delta z}\mid {\delta y}\right\rangle = 0\).

  9. (ix)

    \(\Vert \delta z\Vert ^2 + \Vert \delta y\Vert ^2 = \Vert \delta v\Vert ^2\).

The following two propositions generalize [8, Prop. 5.1 and Prop. 5.2], though the proofs follow very similar arguments.

Proposition 4.3

The following relations hold between \(\delta x\), \(\delta z\), and \(\delta y\), which are defined in Cor. 4.2:

  1. (i)

    \(A \delta x = \delta z\).

  2. (ii)

    \(Q\delta x = 0\).

  3. (iii)

    \(A^* \delta y = 0\).

  4. (iv)

    \(\delta z_n \rightarrow \delta z\).

  5. (v)

    \(\delta y_n \rightarrow \delta y\).

Proof

  1. (i)

    Divide (9a) by n, take the limit, and use Cor. 4.2(iv) to get

    $$\begin{aligned} A \delta x = \lim _{n\rightarrow \infty }\tfrac{1}{n}P_{C} v_n = \delta z. \end{aligned}$$
    (11)
  2. (ii)

    Divide (9b) by n, take the inner product of both sides with \(\delta x\), and pass to the limit to obtain

    $$\begin{aligned} \left\langle {Q\delta x}\mid {\delta x}\right\rangle = -\lim _{n\rightarrow \infty } \big \langle A \delta x, \tfrac{1}{n}({{\,\mathrm{Id}\,}}-P_{C}) v_n \big \rangle = -\left\langle {\delta z}\mid {\delta y}\right\rangle = 0, \end{aligned}$$

    where we used (11) and Cor. 4.2(v) in the second equality, and Cor. 4.2(viii) in the third. Due to [1, Cor. 18.18], the equality above implies

    $$\begin{aligned} Q \delta x = 0. \end{aligned}$$
    (12)
  3. (iii)

    Divide (9b) by n, take the limit, and use (12) to obtain

    $$\begin{aligned} 0 = \lim _{n\rightarrow \infty }\tfrac{1}{n} A^* ({{\,\mathrm{Id}\,}}-P_{C}) v_n = A^* \delta y, \end{aligned}$$

    where we used Cor. 4.2(v) in the second equality.

  4. (iv)

    Subtracting (9a) at iterations \(n+1\) and n, and taking the limit yield

    $$\begin{aligned} \lim _{n\rightarrow \infty }\delta z_n = A\delta x = \delta z, \end{aligned}$$

    where the second equality follows from (11).

  5. (v)

    From (10) we have

$$\begin{aligned} \lim _{n\rightarrow \infty } \delta y_n = \lim _{n\rightarrow \infty } \left( \delta v_n - \delta z_n \right) = \delta v - \delta z = \delta y, \end{aligned}$$

where the last equality follows from Cor. 4.2(vii). \(\square \)

Proposition 4.4

The following identities hold for \(\delta x\) and \(\delta y\), which are defined in Cor. 4.2:

  1. (i)

    \(\left\langle {q}\mid {\delta x}\right\rangle = -\alpha ^{-1}\Vert \delta x\Vert ^2 - \alpha ^{-1}\Vert A\delta x\Vert ^2\).

  2. (ii)

    \(\sigma _{C}(\delta y) = -\alpha ^{-1}\Vert \delta y\Vert ^2\).

Proof

Take the inner product of both sides of (9b) with \(\delta x\) and use (12) to obtain

$$\begin{aligned} \left\langle {q}\mid {\delta x}\right\rangle + \left\langle {A\delta x}\mid {y_n}\right\rangle = - \alpha ^{-1}\left\langle {\delta x}\mid {\delta x_{n+1}}\right\rangle - \alpha ^{-1}\left\langle {A\delta x}\mid {\delta v_{n+1}}\right\rangle . \end{aligned}$$

Taking the limit and using Prop. 4.3(i) and Cor. 4.2(vii) and (viii) give

$$\begin{aligned} \left\langle {q}\mid {\delta x}\right\rangle + \alpha ^{-1}\Vert \delta x\Vert ^2 + \alpha ^{-1}\Vert \delta z\Vert ^2 = -\lim _{n\rightarrow \infty } \left\langle {\delta z}\mid {y_n}\right\rangle \ge 0, \end{aligned}$$
(13)

where the inequality follows from Cor. 4.2(iii) and (iv) as the inner product of terms in \(\mathrm{rec}\,{C}\) and \({(\mathrm{rec}\,{C})}^\ominus \) is nonpositive. Now take the inner product of both sides of (9a) with \(\delta y\) to obtain

$$\begin{aligned} \left\langle {A^* \delta y}\mid {x_n + \alpha ^{-1}\delta x_{n+1}}\right\rangle - \left\langle {\delta y}\mid {P_{C} v_n}\right\rangle = \alpha ^{-1}\left\langle {\delta y}\mid {\delta v_{n+1}}\right\rangle . \end{aligned}$$

Due to Prop. 4.3(iii), the first inner product on the left-hand side is zero. Taking the limit and using Cor. 4.2(vii) and (viii), we obtain

$$\begin{aligned} -\alpha ^{-1}\Vert \delta y\Vert ^2 = \lim _{n\rightarrow \infty }\left\langle {\delta y}\mid {P_{C} v_n}\right\rangle \le \sup _{z\in C} \left\langle {\delta y}\mid {z}\right\rangle = \sigma _{C}(\delta y), \end{aligned}$$

or equivalently,

$$\begin{aligned} \sigma _{C}(\delta y) + \alpha ^{-1}\Vert \delta y\Vert ^2 \ge 0. \end{aligned}$$
(14)

Summing (13) and (14) and using Cor. 4.2(ix), we obtain

$$\begin{aligned} \left\langle {q}\mid {\delta x}\right\rangle + \sigma _{C}(\delta y) + \alpha ^{-1}\Vert \delta x\Vert ^2 + \alpha ^{-1}\Vert \delta v\Vert ^2 \ge 0. \end{aligned}$$
(15)

Now take the inner product of both sides of (9b) with \(x_n\) to obtain

$$\begin{aligned} \left\langle {Qx_n}\mid {x_n}\right\rangle + \left\langle {q}\mid {x_n}\right\rangle + \left\langle {Ax_n}\mid {y_n}\right\rangle =&-\alpha ^{-1}\left\langle {(Q+{{\,\mathrm{Id}\,}})\delta x_{n+1}}\mid {x_n}\right\rangle \\&- \alpha ^{-1}\left\langle {Ax_n}\mid {\delta v_{n+1}}\right\rangle . \end{aligned}$$

Dividing by n, taking the limit, and using Prop. 4.3(i) and (ii) and Cor. 4.2(vii) and (viii) yield

$$\begin{aligned} \lim _{n\rightarrow \infty } \tfrac{1}{n} \left\langle {Qx_n}\mid {x_n}\right\rangle + \left\langle {q}\mid {\delta x}\right\rangle + \lim _{n\rightarrow \infty } \tfrac{1}{n}\left\langle {Ax_n}\mid {y_n}\right\rangle = - \alpha ^{-1}\Vert \delta x\Vert ^2 - \alpha ^{-1}\Vert \delta z\Vert ^2. \end{aligned}$$

We can write the last term on the left-hand side as

$$\begin{aligned} \lim _{n\rightarrow \infty } \tfrac{1}{n}\left\langle {Ax_n}\mid {y_n}\right\rangle&= \lim _{n\rightarrow \infty } \tfrac{1}{n}\left\langle {z_n + \alpha ^{-1}\left( \delta v_{n+1} - A\delta x_{n+1} \right) }\mid {y_n}\right\rangle \\&= \lim _{n\rightarrow \infty } \tfrac{1}{n}\left\langle {z_n}\mid {y_n}\right\rangle + \alpha ^{-1}\Vert \delta y\Vert ^2 \\&= \sigma _{C}(\delta y) + \alpha ^{-1}\Vert \delta y\Vert ^2, \end{aligned}$$

where the first equality follows from (9a), the second from Prop. 4.3(i) and Cor. 4.2(v) and (vii), and the third from Cor. 4.2(vi). Plugging the equality above in the preceding, we obtain

$$\begin{aligned} \left\langle {q}\mid {\delta x}\right\rangle + \sigma _{C}(\delta y) + \alpha ^{-1}\Vert \delta x\Vert ^2 + \alpha ^{-1}\Vert \delta v\Vert ^2 = -\lim _{n\rightarrow \infty } \tfrac{1}{n} \left\langle {Qx_n}\mid {x_n}\right\rangle \le 0, \end{aligned}$$
(16)

where the inequality follows from the monotonicity of Q. Comparing inequalities (15) and (16), it follows that they must be satisfied with equality. Consequently, the left-hand sides of (13) and (14) must be zero. This concludes the proof. \(\square \)

Given the infeasibility conditions in Prop. 2.1, it follows from Prop. 4.3 and Prop. 4.4 that if the limit \(\delta y\) is nonzero, then problem (1) is strongly infeasible, and similarly, if \(\delta x\) is nonzero, then its dual is strongly infeasible. Thanks to the fact that \((\delta y_n, \delta x_n) \rightarrow (\delta y, \delta x)\), we can extend the termination criteria proposed in [8, §5.2] to the more general case in which \(C\) is an arbitrary nonempty closed convex set. The criteria in [8, §5.2] evaluate the conditions of Prop. 2.1 at \(\delta y_n\) and \(\delta x_n\), and have already formed the basis for stable numerical implementations [10, 11]. Our results pave the way for similar developments in the more general setting considered here.
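As a hedged sketch of these termination criteria (with all problem data our own choices), the following Python code runs iteration (7) with \(\alpha =1\) on a strongly infeasible instance and evaluates the conditions of Prop. 2.1 at the one-step differences \(\delta x_n\) and \(\delta y_n\):

```python
import numpy as np

# Hypothetical strongly infeasible instance: Q = 0, q = 0, A = [1, -1]^T,
# C = (-inf,-1] x (-inf,-1], encoding the inconsistent system x <= -1, -x <= -1.
Q = np.zeros((1, 1)); q = np.zeros(1)
A = np.array([[1.0], [-1.0]])
proj_C = lambda v: np.minimum(v, -1.0)                         # P_C
sigma_C = lambda y: -np.sum(y) if np.all(y >= 0) else np.inf   # support of C

K = Q + np.eye(1) + A.T @ A                    # subproblem matrix of (7a)
x, v = np.zeros(1), np.zeros(2)
for _ in range(100):                           # iteration (7) with alpha = 1
    x_prev, y_prev = x, v - proj_C(v)          # previous x_n and y_n, cf. (10b)
    w = 2.0 * proj_C(v) - v
    x = np.linalg.solve(K, -q + x + A.T @ w)   # (7a)-(7b)
    v = v + A @ x - proj_C(v)                  # (7c)

dx, dy = x - x_prev, (v - proj_C(v)) - y_prev  # delta x_n and delta y_n
# Primal infeasibility certificate (Prop. 2.1(i)): A* dy = 0 and sigma_C(dy) < 0.
assert np.allclose(A.T @ dy, 0.0) and sigma_C(dy) < 0.0
assert np.allclose(dx, 0.0)                    # no dual certificate here
```

For this instance the iterates settle into \(\delta y_n = (1,1)\), which satisfies \(A^*\delta y = 0\) and \(\sigma _{C}(\delta y) = -2 < 0\), flagging primal strong infeasibility.
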

Proximal-point algorithm

The proximal-point algorithm is a method for finding a vector \(w\in \mathcal {H}\) that solves the following inclusion problem:

$$\begin{aligned} 0 \in B(w), \end{aligned}$$
(17)

where \(B:\mathcal {H}\rightarrow 2^\mathcal {H}\) is a maximally monotone operator. An iteration of the algorithm in application to problem (17) can be written as

$$\begin{aligned} w_{n+1} = ({{\,\mathrm{Id}\,}}+ \gamma B)^{-1} w_n, \end{aligned}$$

where \(\gamma >0\) is the regularization parameter.
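A minimal sketch of this iteration for a toy maximally monotone operator with a closed-form resolvent (our own illustrative choice \(B(w)=w-a\), the gradient of \(\tfrac{1}{2}\Vert w-a\Vert ^2\), whose unique zero is a):

```python
import numpy as np

# Proximal-point iteration for B(w) = w - a (an illustrative choice).
# The resolvent has the closed form (Id + gamma B)^{-1} w = (w + gamma a)/(1 + gamma),
# and the iterates converge to the unique zero w = a of B.
def proximal_point(a, gamma=1.0, iters=100):
    w = np.zeros_like(a)
    for _ in range(iters):
        w = (w + gamma * a) / (1.0 + gamma)   # w_{n+1} = (Id + gamma B)^{-1} w_n
    return w

a = np.array([1.0, -2.0, 0.5])
assert np.allclose(proximal_point(a), a)       # iterates converge to the zero of B
```
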

Due to [1, Cor. 16.30], we can rewrite (2) as

$$\begin{aligned} 0 \in \mathcal {M}(x,y) :=\begin{pmatrix} Qx + q + A^* y \\ -Ax + \partial \iota _{C}^* (y) \end{pmatrix}, \end{aligned}$$

where \(\mathcal {M}:(\mathcal {H}_1\times \mathcal {H}_2)\rightarrow 2^{(\mathcal {H}_1\times \mathcal {H}_2)}\) is a maximally monotone operator [20]. An iteration of the proximal-point algorithm in application to the inclusion above is then

$$\begin{aligned} (x_{n+1},y_{n+1}) = \left( {{\,\mathrm{Id}\,}}+ \gamma \mathcal {M} \right) ^{-1}(x_n,y_n), \end{aligned}$$
(18)

which was also analyzed in [12]. We will exploit the following result [1, Prop. 23.8] to analyze the algorithm:

Fact 5.1

Operator \(T_\mathrm{PP} :=({{\,\mathrm{Id}\,}}+ \gamma \mathcal {M})^{-1}\) is the resolvent of a maximally monotone operator and is thus (1/2)-averaged.

Iteration (18) reads

$$\begin{aligned} 0&= x_{n+1} - x_n + \gamma \left( Q x_{n+1} + q + A^* y_{n+1} \right) \end{aligned}$$
(19a)
$$\begin{aligned} 0&\in y_{n+1} - y_n + \gamma \left( -A x_{n+1} + \partial \iota _{C}^* (y_{n+1}) \right) . \end{aligned}$$
(19b)

Inclusion (19b) can be written as

$$\begin{aligned} \gamma A x_{n+1} + y_n \in \left( {{\,\mathrm{Id}\,}}+ \gamma \partial \iota _{C}^* \right) y_{n+1}, \end{aligned}$$

which is equivalent to [1, Prop. 16.44]

$$\begin{aligned} y_{n+1} = {{\,\mathrm{Prox}\,}}_{\gamma \iota _{C}^*} \left( \gamma A x_{n+1} + y_n \right) = \gamma A x_{n+1} + y_n - \gamma P_{C} ( A x_{n+1} \!+\! \gamma ^{-1}y_n), \end{aligned}$$
(20)

where the second equality follows from [1, Thm. 14.3]. Let us define the following auxiliary iterates of iteration (18):

$$\begin{aligned} v_{n+1}&:=Ax_{n+1} + \gamma ^{-1}y_n \end{aligned}$$
(21a)
$$\begin{aligned} z_{n+1}&:=P_{C} v_{n+1}, \end{aligned}$$
(21b)

and observe from (20) that

$$\begin{aligned} y_{n+1} = \gamma ({{\,\mathrm{Id}\,}}-P_{C}) v_{n+1}. \end{aligned}$$

Using (19a) and (20), we now obtain the following relations between the iterates:

$$\begin{aligned} A x_{n+1} - P_{C} v_{n+1}&= \gamma ^{-1}\delta y_{n+1} \end{aligned}$$
(22a)
$$\begin{aligned} Q x_{n+1} + q + \gamma A^* ({{\,\mathrm{Id}\,}}-P_{C}) v_{n+1}&= -\gamma ^{-1}\delta x_{n+1}. \end{aligned}$$
(22b)

As for the Douglas–Rachford algorithm, the pair \((z_{n+1},y_{n+1})\) satisfies optimality condition (3c) for all \(n\in \mathbb {N}\). Observe that the optimality residuals, given by the norms of the left-hand sides of (22), can be computed simply by evaluating the norms of \(\delta y_{n+1}\) and \(\delta x_{n+1}\).

The following corollary follows directly from Fact 3.1, Prop. 3.2, and Fact 5.1:

Corollary 5.2

Let the sequences \(({x_n})_{n\in \mathbb {N}}\), \(({y_n})_{n\in \mathbb {N}}\), \(({v_n})_{n\in \mathbb {N}}\), and \(({z_n})_{n\in \mathbb {N}}\) be given by (18) and (21), and \((\delta x, \delta y) :=P_{\overline{{{\,\mathrm{ran}\,}}}(T_\mathrm{PP}-{{\,\mathrm{Id}\,}})}(0)\). Then

  1. (i)

    \(\tfrac{1}{n} (x_n,y_n,v_n) \rightarrow (\delta x,\delta y,A\delta x + \gamma ^{-1}\delta y)\).

  2. (ii)

    \((\delta x_n,\delta y_n,\delta v_n) \rightarrow (\delta x,\delta y,A\delta x + \gamma ^{-1}\delta y)\).

  3. (iii)

    \(y_{n+1}\in {(\mathrm{rec}\,{C})}^\ominus \).

  4. (iv)

    \(\tfrac{1}{n}z_n\rightarrow \delta z:=P_{\mathrm{rec}\,{C}}(\delta v)\).

  5. (v)

    \(\delta y=\gamma P_{{(\mathrm{rec}\,{C})}^\ominus }(\delta v)\).

  6. (vi)

    \(\lim _{n\rightarrow \infty }\tfrac{1}{n}\left\langle {z_n}\mid {y_n}\right\rangle = \sigma _{C}(\delta y)\).

The proofs of the following two propositions follow similar arguments as those in Sect. 4, and are thus omitted.

Proposition 5.3

The following relations hold between \(\delta x\), \(\delta z\), and \(\delta y\), which are defined in Cor. 5.2:

  1. (i)

    \(A \delta x = \delta z\).

  2. (ii)

    \(Q\delta x = 0\).

  3. (iii)

    \(A^* \delta y = 0\).

Proposition 5.4

The following identities hold for \(\delta x\) and \(\delta y\), which are defined in Cor. 5.2:

  1. (i)

    \(\left\langle {q}\mid {\delta x}\right\rangle = -\gamma ^{-1}\Vert \delta x\Vert ^2\).

  2. (ii)

    \(\sigma _{C}(\delta y) = -\gamma ^{-1}\Vert \delta y\Vert ^2\).

The authors of [12] use termination criteria similar to those given in [8, §5.2] to detect infeasibility of convex quadratic programs solved via iteration (18), though they do not prove that \(\delta y\) and \(\delta x\) are indeed infeasibility certificates whenever the problem is strongly infeasible. The identities in (22) show that, when \((\delta y,\delta x)=(0,0)\), the optimality conditions (3) are satisfied in the limit. Otherwise, Prop. 2.1, Prop. 5.3, and Prop. 5.4 imply that problem (1) and/or its dual is strongly infeasible.

Remark 5.5

Weak infeasibility of problem (1) means that the sets \({{\,\mathrm{ran}\,}}{A}\) and \(C\) do not intersect, but the distance between them is zero. In such cases, there exists no \(\bar{y}\in \mathcal {H}_2\) satisfying the conditions in Prop. 2.1, and the algorithms studied in Sects. 4 and 5 would yield \(\delta y_n \rightarrow \delta y = 0\). A similar reasoning holds for weak infeasibility of the dual problem, for which the algorithms would yield \(\delta x_n \rightarrow \delta x = 0\).

References

  1. Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2nd edn. Springer, New York (2017). https://doi.org/10.1007/978-3-319-48311-5

  2. Eckstein, J., Bertsekas, D.P.: On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 55(1), 293–318 (1992). https://doi.org/10.1007/BF01581204

  3. Bauschke, H.H., Dao, M.N., Moursi, W.M.: The Douglas–Rachford algorithm in the affine-convex case. Oper. Res. Lett. 44(3), 379–382 (2016). https://doi.org/10.1016/j.orl.2016.03.010

  4. Bauschke, H.H., Moursi, W.M.: The Douglas–Rachford algorithm for two (not necessarily intersecting) affine subspaces. SIAM J. Optim. 26(2), 968–985 (2016). https://doi.org/10.1137/15M1016989

  5. Bauschke, H.H., Moursi, W.M.: On the Douglas–Rachford algorithm. Math. Program. 164(1), 263–284 (2017). https://doi.org/10.1007/s10107-016-1086-3

  6. Ryu, E., Liu, Y., Yin, W.: Douglas-Rachford splitting and ADMM for pathological convex optimization. Comput. Optim. Appl. 74, 747–778 (2019). https://doi.org/10.1007/s10589-019-00130-9

  7. Bauschke, H.H., Moursi, W.M.: On the behavior of the Douglas–Rachford algorithm for minimizing a convex function subject to a linear constraint. SIAM J. Optim. 30(3), 2559–2576 (2020). https://doi.org/10.1137/19M1281538

  8. Banjac, G., Goulart, P., Stellato, B., Boyd, S.: Infeasibility detection in the alternating direction method of multipliers for convex optimization. J. Optim. Theory Appl. 183(2), 490–519 (2019). https://doi.org/10.1007/s10957-019-01575-y

  9. Liao-McPherson, D., Kolmanovsky, I.: FBstab: a proximally stabilized semismooth algorithm for convex quadratic programming. Automatica (2020). https://doi.org/10.1016/j.automatica.2019.108801

  10. Stellato, B., Banjac, G., Goulart, P., Bemporad, A., Boyd, S.: OSQP: an operator splitting solver for quadratic programs. Math. Program. Comput. 12(4), 637–672 (2020). https://doi.org/10.1007/s12532-020-00179-2

  11. Garstka, M., Cannon, M., Goulart, P.: COSMO: a conic operator splitting method for large convex problems. In: European Control Conference (ECC) (2019). https://doi.org/10.23919/ECC.2019.8796161

  12. Hermans, B., Themelis, A., Patrinos, P.: QPALM: a Newton-type proximal augmented Lagrangian method for quadratic programs. In: IEEE Conference on Decision and Control (CDC) (2019). https://doi.org/10.1109/CDC40024.2019.9030211

  13. Pazy, A.: Asymptotic behavior of contractions in Hilbert space. Israel J. Math. 9(2), 235–240 (1971). https://doi.org/10.1007/BF02771588

  14. Baillon, J.B., Bruck, R.E., Reich, S.: On the asymptotic behavior of nonexpansive mappings and semigroups in Banach spaces. Houston J. Math. 4(1), 1–9 (1978)

  15. Zarantonello, E.H.: Projections on convex sets in Hilbert space and spectral theory. In: Zarantonello, E.H. (ed.) Contributions to Nonlinear Functional Analysis, pp. 237–424. Academic Press, Cambridge (1971). https://doi.org/10.1016/B978-0-12-775850-3.50013-3

  16. Facchinei, F., Pang, J.S.: Finite-Dimensional Variational Inequalities and Complementarity Problems. Springer Series in Operations Research and Financial Engineering. Springer, New York (2003). https://doi.org/10.1007/b97543

  17. Gowda, M.S., Sossa, D.: Weakly homogeneous variational inequalities and solvability of nonlinear equations over cones. Math. Program. 177, 149–171 (2019). https://doi.org/10.1007/s10107-018-1263-7

  18. Shen, J., Lebair, T.M.: Shape restricted smoothing splines via constrained optimal control and nonsmooth Newton’s methods. Automatica 53, 216–224 (2015). https://doi.org/10.1016/j.automatica.2014.12.040

  19. Lions, P., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16(6), 964–979 (1979). https://doi.org/10.1137/0716071

  20. Rockafellar, R.T.: Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Math. Oper. Res. 1(2), 97–116 (1976). https://doi.org/10.1287/moor.1.2.97


Acknowledgements

This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme grant agreement OCAL, No. 787845.

Funding

Open Access funding provided by ETH Zurich.

Author information

Correspondence to Goran Banjac.

Ethics declarations

Data Availability

Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.


Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article


Cite this article

Banjac, G., Lygeros, J. On the asymptotic behavior of the Douglas–Rachford and proximal-point algorithms for convex optimization. Optim Lett 15, 2719–2732 (2021). https://doi.org/10.1007/s11590-021-01706-3


Keywords

  • Douglas–Rachford algorithm
  • Proximal-point algorithm
  • Convex optimization
  • Infeasibility detection

Mathematics Subject Classification

  • 49M27
  • 65K10
  • 90C25