A primal-dual fixed point algorithm for minimization of the sum of three convex separable functions
Abstract
Many problems arising in image processing and signal recovery with multi-regularization and constraints can be formulated as minimization of a sum of three convex separable functions. Typically, the objective function involves a smooth function with Lipschitz continuous gradient, a linear composite nonsmooth function, and a nonsmooth function. In this paper, we propose a primal-dual fixed point (PDFP) scheme to solve the above class of problems. The proposed algorithm for three-block problems is a symmetric and fully splitting scheme, only involving an explicit gradient, a linear transform, and the proximity operators which may have a closed-form solution. We study the convergence of the proposed algorithm and illustrate its efficiency through examples on fused LASSO and image restoration with non-negative constraint and sparse regularization.
Keywords
primal-dual fixed point algorithm; convex separable minimization; proximity operator; sparsity regularization
1 Introduction
As far as we know, Combettes and Pesquet first proposed a fully splitting algorithm in [4] to solve monotone operator inclusion problems, which include (1.1) as a special case. Condat [5] tackled the same problem and proposed a primal-dual splitting scheme. Extensions to multi-block composite functions are also discussed there in detail. For the special case \(B=I\) (I denotes the usual identity operator), Davis and Yin [6] proposed a three-operator splitting scheme based on monotone operators. For the case where the problem (1.1) reduces to two-block separable functions, many splitting and proximal algorithms have been proposed and studied in the literature. Among them, extensive research has been conducted on the alternating direction method of multipliers (ADMM) [7] (also known as split Bregman [3]; see for example [8] and the references therein). The primal-dual hybrid gradient method (PDHG) [9, 10, 11, 12], also known as the Chambolle-Pock algorithm [11], is another popular class of algorithms, largely adopted in imaging applications. In [13, 14, 15, 16], several completely decoupled schemes, such as the inexact Uzawa solver and the primal-dual fixed point algorithm, are proposed to avoid subproblem solving for some typical \(\ell_{1}\) minimization problems. Komodakis and Pesquet [17] recently gave an overview of primal-dual approaches for solving large-scale optimization problems of the form (1.1). A general class of multi-step fixed point proximity algorithms is proposed in [18], which covers several existing algorithms [11, 12] as special cases. During the preparation of this paper, we noticed that Li and Zhang [19] also studied the problem (1.1) and introduced quasi-Newton and overrelaxation strategies for accelerating the algorithms. Both algorithms can be viewed as generalizations of Condat’s algorithm [5]. Their theoretical analysis is based on the multi-step techniques presented in [18].
In the following, we review only the most relevant work for a concise presentation. Problem (1.2) has been studied in [20] in the context of maximum a posteriori ECT reconstruction, and a preconditioned alternating projection algorithm (PAPA) is proposed for solving the resulting regularization problem. For \(f_{3}=0\) in (1.1), we proposed the primal-dual fixed point algorithm \(\mathrm{PDFP}^{2}\mathrm{O}\) (primal-dual fixed point algorithm based on proximity operator) in [15]. Based on fixed point theory, we have shown the convergence of the scheme \(\mathrm{PDFP}^{2}\mathrm{O}\) and the convergence rate of the iteration sequence under suitable conditions.
The rest of the paper is organized as follows. In Section 2, we present some preliminaries and notations, and deduce PDFP from the first order optimality condition. In Section 3, we provide the convergence results and the linear convergence rate results for some special cases. In Section 4, we compare the form of the PDFP algorithm (1.3) with some existing algorithms. In Section 5, we show the numerical performance and the efficiency of PDFP through examples on fused LASSO and parallel magnetic resonance imaging (pMRI) reconstruction.
2 Primal-dual fixed point algorithm
2.1 Preliminaries and notations
For completeness, we list some relevant notations, definitions, assumptions, and lemmas in convex analysis. We refer the reader to [15, 22] and the references therein for more details.
Lemma 2.1
Lemma 2.2
Lemma 2.3
Let T be an operator and \(u^{*}\) be a fixed point of T. Let \(\{ u^{k+1}\}\) be the sequence generated by the fixed point iteration \(u^{k+1}=T(u^{k})\). Suppose (i) T is continuous, (ii) \(\{\|u^{k}-u^{*}\|\} \) is non-increasing, (iii) \(\lim_{k\to+\infty} \|u^{k+1}-u^{k}\|=0\). Then the sequence \(\{u^{k}\}\) is bounded and converges to a fixed point of T.
The proof of Lemma 2.3 is standard, and we refer the reader to the proof of Theorem 3.5 in [15] for more details.
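The fixed point iteration in Lemma 2.3 can be illustrated with a minimal Python sketch. The helper name `picard`, the stopping tolerance, and the toy operator \(T=\cos\) are assumptions made here for illustration only; they are not part of the paper. On \([-1,1]\) the cosine is a contraction, so the three hypotheses of the lemma hold for this toy case.

```python
import math


def picard(T, u0, tol=1e-12, max_iter=10_000):
    """Run u^{k+1} = T(u^k) until |u^{k+1} - u^k| <= tol (scalar toy version)."""
    u = u0
    for k in range(max_iter):
        u_next = T(u)
        # Hypothesis (iii) of the lemma suggests this successive-difference test.
        if abs(u_next - u) <= tol:
            return u_next, k + 1
        u = u_next
    return u, max_iter


# Toy operator (an assumption for illustration): T(u) = cos(u).
u_star, n_iter = picard(math.cos, 1.0)
```

The returned `u_star` approximately satisfies \(T(u^{*})=u^{*}\); the same loop structure applies to the operator T of the PDFP once its components are specified.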
2.2 Derivation of PDFP
Extending the ideas of the PAPA proposed in [20] and the \(\mathrm{PDFP}^{2}\mathrm{O}\) proposed in [15], we derive the primal-dual fixed point algorithm (1.3) for solving the minimization problem (1.1).
To sum up, we have the following theorem.
Theorem 2.1
It is easy to confirm that the sequence \(\{(v^{k+1}, x^{k+1})\}\) generated by the PDFP algorithm (1.3) is the Picard iteration \((v^{k+1},x^{k+1})={T}(v^{k},x^{k})\). So we will use the operator T to analyze the convergence of the PDFP in Section 3.
3 Convergence analysis
In the following, let \(\{y^{k+1}\}\) and \(\{u^{k+1}=(v^{k+1},x^{k+1})\}\) be the sequences generated by the PDFP algorithm (1.3), i.e. \(y^{k+1}={T}_{0}(v^{k},x^{k})\) and \((v^{k+1},x^{k+1})={T}(v^{k},x^{k})\). Let \({u^{*}}= ({v^{*}},{x^{*}} )\) be a fixed point of the operator T.
3.1 Convergence
Lemma 3.1
Proof
Lemma 3.2
Proof
Lemma 3.3
Let \(0<\lambda<1/{\lambda_{\mathrm{max}}(BB^{T})}\) and \(0<\gamma<2\beta\). Then the sequence \(\{\|u^{k}-u^{*}\|_{\lambda}\}\) is non-increasing and \(\lim_{k\to+\infty} \|u^{k+1}-u^{k}\|_{\lambda}=0\).
Proof
As a direct consequence of Lemma 3.3 and Lemma 2.3, we obtain the convergence of the PDFP as follows.
Theorem 3.1
Let \(0<\lambda<1/{\lambda_{\mathrm{max}}(BB^{T})}\) and \(0<\gamma<2\beta\). Then the sequence \(\{u^{k}\}\) is bounded and converges to a fixed point of T, and both \(\{x^{k}\}\) and \(\{y^{k}\}\) converge to a solution of (1.1).
Proof
By Lemma 2.2, both \(\operatorname{prox}_{{\gamma }{f_{3}}}\) and \(I-\operatorname{prox}_{\frac{\gamma}{\lambda}{f_{2}}}\) are firmly nonexpansive, thus the operator T defined by (2.9)-(2.12) is continuous. From Lemma 3.3, we know that the sequence \(\{\|u^{k}-u^{*}\| _{\lambda}\}\) is non-increasing and \(\lim_{k\to+\infty} \| u^{k+1}-u^{k}\|_{\lambda}=0\). By using Lemma 2.3, we know that the sequence \(\{ u^{k}\}\) is bounded and converges to a fixed point of T. By using Theorem 2.1 and (3.14), we can conclude that both \(\{x^{k}\}\) and \(\{y^{k}\} \) converge to a solution of (1.1). □
Remark 3.1
Remark 3.2
For the special case \(f_{1}=0\), the problem (1.1) involves only two proper lower semi-continuous convex functions. The convergence condition \(0<\gamma<2\beta\) in the PDFP becomes \(0<\gamma<+\infty\). Although γ can be an arbitrary positive number in theory, its value affects the convergence speed, and choosing the best value in practice remains a difficult problem.
3.2 Linear convergence rate for special cases
Theorem 3.2
Proof
Let \(\eta_{3}=1/\sqrt{1+\lambda\delta/\gamma}\) and \(\eta=\max\{ \eta_{2},\eta_{3}\}\). It is clear that \(0<\eta<1\). Hence, according to the notation (2.16), the estimate (3.22) can be rewritten as required. □
We note that a linear convergence rate for strongly convex \(f_{2}^{*}\) and \(f_{3}\) is obtained in [19]. The authors introduced two preconditioned operators for accelerating the algorithm, but a clear relation between the convergence rate and the preconditioned operators is still missing. Nevertheless, introducing preconditioned operators could be beneficial in practice, and we can also introduce a preconditioned operator to deal with \(\nabla f_{1}\) in our scheme. Since the analysis is rather similar to the current one, we omit it in this paper.
4 Connections to other algorithms
In this section, we present the connections of the PDFP algorithm to some algorithms proposed previously in the literature.
Combettes and Pesquet first proposed a fully split algorithm in [4] to solve monotone inclusions with mixtures of composite, Lipschitzian, and parallel-sum type monotone operators, which include (1.1) as a special case. The problem is recast as a two-block inclusion and then solved with an error-tolerant primal-dual forward-backward-forward algorithm as studied in [25]. Condat [5] tackled the same problem as given in (1.1) and proposed a primal-dual splitting scheme. For the special case with \(f_{1}=0\), Condat’s algorithm reduces to the PDHG method in [11]. By grouping the multiple blocks into two blocks, the authors in [24] extended the PDHG algorithm [12] to the minimization of the sum of multi-composite functions. The authors in [18] proposed a class of multi-step fixed point proximity algorithms, which includes several existing algorithms as special cases, for example the algorithms in [11, 12]. In [6], Davis and Yin proposed a three-operator splitting method for solving three-block monotone inclusions in a very tricky way. When solving the problem (1.1) with \(B=I\), the scheme differs from both Condat’s algorithm and the PDFP algorithm; however, it requires subproblem solving if \(B\neq I\). Li and Zhang [19] studied (1.1) based on the techniques presented in [18], obtained a scheme that includes Condat’s algorithm in [5] as a special case, and further introduced quasi-Newton and overrelaxation strategies to accelerate the algorithms.
The comparison between Condat (\(\rho_{k}=1\)) and PDFP
 | Condat (\(\rho_{k}=1\)) | PDFP |
---|---|---|
Form | \(\overline{v}^{k+1}=\operatorname{prox}_{\sigma {f_{2}^{*}}}(\sigma Bx^{k}+\overline{v}^{k})\), \(x^{k+1}=\operatorname{prox}_{{\tau}{f_{3}}}(x^{k}-\tau \nabla {f_{1}}(x^{k})-{\tau} B^{T} (2\overline{v}^{k+1}-\overline{v}^{k}))\) | \(y^{k+1}=\operatorname{prox}_{{\gamma}{f_{3}}}(x^{k}-\gamma \nabla{f_{1}}(x^{k})-{\gamma} B^{T} \overline{v}^{k})\), \(\overline{v}^{k+1}=\operatorname{prox}_{\frac{\lambda }{\gamma }{f_{2}^{*}}}(\frac{\lambda}{\gamma}By^{k+1}+\overline{v}^{k})\), \(x^{k+1}=\operatorname{prox}_{{\gamma}{f_{3}}}(x^{k}-\gamma \nabla {f_{1}}(x^{k})-{\gamma} B^{T} \overline{v}^{k+1})\) |
\(f_{1}\neq0\) | \(\sigma\tau\lambda_{\mathrm{max} }(BB^{T})+\tau/(2\beta)\leq1\) | \(0<\lambda< 1/\lambda _{\mathrm{max}}(BB^{T})\), \(0<\gamma<2\beta\) |
\(f_{1}=0\) | \(0<\sigma\tau\leq1/\lambda_{\mathrm{max} }(BB^{T})\) | \(0<\lambda< 1/\lambda_{\mathrm{max}}(BB^{T})\), \(0<\gamma<+\infty\) |
Relation | \(\sigma=\lambda/\gamma\), \(\tau=\gamma\) | |
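To make the two iterations in the table concrete, the following Python sketch applies both of them to a small one-dimensional toy problem. All concrete choices are assumptions made here for illustration and are not taken from the paper: \(f_{1}(x)=\frac{1}{2}\|x-b\|^{2}\) (so \(\beta=1\)), \(f_{2}=\mu_{1}\|\cdot\|_{1}\) with B the forward difference operator, \(f_{3}=\mu_{2}\|\cdot\|_{1}\), the data `b`, and the step sizes. With these choices the proximity operator of any positive multiple of \(f_{2}^{*}\) is a projection onto \([-\mu_{1},\mu_{1}]\), and \(\operatorname{prox}_{\gamma f_{3}}\) is soft-thresholding.

```python
def soft(w, t):
    """Componentwise soft-thresholding: prox of t * ||.||_1."""
    return [max(abs(wi) - t, 0.0) * (1.0 if wi > 0 else -1.0) for wi in w]


def clip(w, r):
    """Projection onto [-r, r]^m: prox of the conjugate of r * ||.||_1."""
    return [min(max(wi, -r), r) for wi in w]


def B(x):
    """Forward differences (Bx)_i = x_{i+1} - x_i."""
    return [x[i + 1] - x[i] for i in range(len(x) - 1)]


def Bt(v):
    """Transpose of the forward difference operator."""
    out = [0.0] * (len(v) + 1)
    for i, vi in enumerate(v):
        out[i] -= vi
        out[i + 1] += vi
    return out


def forward(x, v, b, step):
    """x - step * grad f1(x) - step * B^T v, with grad f1(x) = x - b."""
    return [xi - step * (xi - bi) - step * ti for xi, bi, ti in zip(x, b, Bt(v))]


def pdfp(b, mu1, mu2, lam, gamma, iters):
    """PDFP column of the table: predictor y, dual update v, corrector x."""
    x, v = list(b), [0.0] * (len(b) - 1)
    for _ in range(iters):
        y = soft(forward(x, v, b, gamma), gamma * mu2)
        v_new = clip([vi + (lam / gamma) * di for vi, di in zip(v, B(y))], mu1)
        x = soft(forward(x, v_new, b, gamma), gamma * mu2)
        v = v_new
    return x


def condat(b, mu1, mu2, sigma, tau, iters):
    """Condat column of the table with rho_k = 1."""
    x, v = list(b), [0.0] * (len(b) - 1)
    for _ in range(iters):
        v_new = clip([vi + sigma * di for vi, di in zip(v, B(x))], mu1)
        v_bar = [2.0 * a - c for a, c in zip(v_new, v)]
        x = soft(forward(x, v_bar, b, tau), tau * mu2)
        v = v_new
    return x


def objective(x, b, mu1, mu2):
    """f1(x) + f2(Bx) + f3(x) for the toy problem."""
    return (0.5 * sum((xi - bi) ** 2 for xi, bi in zip(x, b))
            + mu1 * sum(abs(d) for d in B(x))
            + mu2 * sum(abs(xi) for xi in x))


# Toy data and parameters (assumptions). lambda_max(BB^T) < 4 and beta = 1 here,
# so lam = 1/4, gamma = 1 satisfy the PDFP row, and sigma * tau * 4 + tau / 2 <= 1
# satisfies the Condat row.
b = [0.1, -0.05, 2.1, 1.9, 2.0, 0.05, 0.0, -0.1]
mu1, mu2 = 0.3, 0.1
x_pdfp = pdfp(b, mu1, mu2, lam=0.25, gamma=1.0, iters=5000)
x_condat = condat(b, mu1, mu2, sigma=0.125, tau=1.0, iters=5000)
```

Since this toy objective is strongly convex, both schemes should approach the same minimizer. The sketch also makes the structural difference visible: PDFP evaluates \(\operatorname{prox}_{\gamma f_{3}}\) twice per iteration, whereas Condat’s scheme evaluates it once and extrapolates the dual variable instead.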
5 Numerical experiments
In this section, we apply the PDFP algorithm to two problems: the fused LASSO penalized problem and parallel magnetic resonance imaging (pMRI) reconstruction. All the experiments are implemented in MATLAB 7.00 (R14) and conducted on a computer with an Intel(R) Core(TM) i5-4300U CPU @ 1.90 GHz.
5.1 The fused LASSO penalized problem
We compare the PDFP algorithm with Condat’s algorithm [5]. For the PDFP algorithm, the parameters λ and γ are chosen according to Theorem 3.1. In practice, we set λ to be close to \(1/\lambda_{\mathrm{max}}(BB^{T})\) and γ to be close to 2β. Here we set \(\lambda=1/4\), since the \(n-1\) eigenvalues of \(BB^{T}\) can be computed analytically as \(2-2 \cos(i\pi /n)\), \(i = 1, 2,\ldots, n-1\), and are therefore all smaller than 4, and \(\gamma=1.99/\lambda_{\mathrm{max}}(A^{T}A)\). For Condat’s algorithm, we set \(\lambda= 0.19/4\), \(\gamma=1.9/\lambda _{\mathrm{max}}(A^{T}A)\), which are chosen for relatively better numerical performance. The computation time, the attained objective function values, and the relative errors to the true solution are close for Condat’s algorithm and PDFP. From Figure 1, we see that both Condat’s algorithm and PDFP recover the positions and the values of the non-zeros quite accurately.
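The eigenvalue formula used to justify \(\lambda=1/4\) can be checked numerically. The sketch below assumes that B is the \((n-1)\times n\) forward difference matrix, so that \(BB^{T}\) is tridiagonal with 2 on the diagonal and \(-1\) on the off-diagonals, and it tests the vectors \(w^{(i)}_{j}=\sin(ij\pi/n)\) as candidate eigenvectors. The function names and the size `n = 10` are illustrative assumptions.

```python
import math


def B(x):
    """Forward differences (Bx)_i = x_{i+1} - x_i."""
    return [x[i + 1] - x[i] for i in range(len(x) - 1)]


def Bt(v):
    """Transpose of the forward difference operator."""
    out = [0.0] * (len(v) + 1)
    for i, vi in enumerate(v):
        out[i] -= vi
        out[i + 1] += vi
    return out


def eigen_residual(n, i):
    """Return (lambda_i, max |(B B^T w - lambda_i w)_j|) for the i-th sine vector."""
    lam_i = 2.0 - 2.0 * math.cos(i * math.pi / n)
    w = [math.sin(i * j * math.pi / n) for j in range(1, n)]
    bbt_w = B(Bt(w))
    return lam_i, max(abs(a - lam_i * c) for a, c in zip(bbt_w, w))


n = 10  # illustrative size
pairs = [eigen_residual(n, i) for i in range(1, n)]
lam_max = max(lam for lam, _ in pairs)
worst_residual = max(res for _, res in pairs)
```

The residuals vanish up to rounding, and `lam_max` equals \(2-2\cos((n-1)\pi/n)<4\), so \(\lambda=1/4\) lies strictly below \(1/\lambda_{\mathrm{max}}(BB^{T})\) for every n.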
5.2 Image restoration with non-negative constraint and sparse regularization
We consider pMRI reconstruction, where \(A=(A_{1}^{T},A_{2}^{T},\ldots, A_{N}^{T})^{T}\) and each \(A_{j}\) is composed of a diagonal downsampling operator D, the Fourier transform F, and a diagonal coil sensitivity mapping \(S_{j}\) for receiver j, i.e. \(A_{j}=DFS_{j}\); the mappings \(S_{j}\) are often estimated in advance. It is well known in the total variation application that \(\lambda_{\mathrm{max}}(BB^{T})=8\). The related Lipschitz constant of \(\nabla f_{1}\) can be estimated as \(\beta =1\). Therefore the two parameters in PDFP are set as \(\lambda=1/8\) and \(\gamma=2\). The same simulation setting as in [15] is used in this experiment, and we again use the artifact power (AP) and the two-region signal to noise ratio (SNR) to measure the image quality. We refer the reader to [15, 27] for more details.
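The value 8 used for \(\lambda_{\mathrm{max}}(BB^{T})\) can be illustrated with a short power iteration. The sketch assumes that B is the two-dimensional forward difference (discrete gradient) operator on an \(n\times n\) grid with zero differences at the far boundary; the paper does not spell out its discretization here, so this is only one common choice. The function returns the Rayleigh quotient \(\|Bx\|^{2}/\|x\|^{2}\), which is a lower estimate of \(\lambda_{\mathrm{max}}(B^{T}B)=\lambda_{\mathrm{max}}(BB^{T})\).

```python
import math


def grad(x):
    """Forward differences of an n x n image; zero at the far boundary."""
    n = len(x)
    gx = [[(x[i + 1][j] - x[i][j]) if i + 1 < n else 0.0 for j in range(n)]
          for i in range(n)]
    gy = [[(x[i][j + 1] - x[i][j]) if j + 1 < n else 0.0 for j in range(n)]
          for i in range(n)]
    return gx, gy


def grad_t(gx, gy):
    """Transpose of grad (the negative discrete divergence)."""
    n = len(gx)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i + 1 < n:
                out[i][j] -= gx[i][j]
                out[i + 1][j] += gx[i][j]
            if j + 1 < n:
                out[i][j] -= gy[i][j]
                out[i][j + 1] += gy[i][j]
    return out


def estimate_lambda_max(n, iters):
    """Power iteration on B^T B, started from a checkerboard image."""
    x = [[(-1.0) ** (i + j) for j in range(n)] for i in range(n)]
    for _ in range(iters):
        y = grad_t(*grad(x))
        norm = math.sqrt(sum(v * v for row in y for v in row))
        x = [[v / norm for v in row] for row in y]
    gx, gy = grad(x)
    num = sum(v * v for row in gx for v in row) + sum(v * v for row in gy for v in row)
    den = sum(v * v for row in x for v in row)
    return num / den


estimate = estimate_lambda_max(n=16, iters=200)
```

Under this discretization the estimate stays below 8 for every finite grid and approaches 8 as the grid grows, so 8 acts as a grid-independent bound on \(\lambda_{\mathrm{max}}(BB^{T})\).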
6 Conclusion
We have extended the algorithms PAPA [20] and \(\mathrm{PDFP}^{2}\mathrm{O}\) [15] to derive a primal-dual fixed point algorithm PDFP (see (1.3)) for solving the minimization problem of three-block convex separable functions (1.1). The proposed PDFP algorithm is a symmetric and fully splitting scheme, involving only an explicit gradient and linear operators without any inversion or subproblem solving, provided that the proximity operators of the nonsmooth functions can be handled easily. The scheme can easily be adapted to a variety of inverse problems involving the minimization of many terms, and it is suitable for large-scale parallel implementation. In addition, the parameter range determined by the convergence analysis is rather simple and clear, which could be useful for practical applications. Finally, as discussed in Section 5 in [5], we can also extend the current PDFP algorithm to solve composite minimization problems with more than three blocks. Preconditioning operators, as proposed in [16, 19, 24, 28], can also be introduced to accelerate the PDFP, which could be future work for some specific applications.
Notes
Acknowledgements
P Chen was partially supported by the PhD research startup foundation of Taiyuan University of Science and Technology (No. 20132024). J Huang was partially supported by NSFC (No. 11571237). X Zhang was partially supported by NSFC (Nos. 91330102 and GZ1025) and 973 program (No. 2015CB856004). We thank the reviewer for pointing out the references [4, 14, 16] and for the pertinent comments and suggestions, which greatly improved the early version of this paper.
References
- 1. Tibshirani, R, Saunders, M, Rosset, S, Zhu, J, Knight, K: Sparsity and smoothness via the fused lasso. J. R. Stat. Soc., Ser. B, Stat. Methodol. 67(1), 91-108 (2005)
- 2. Yuan, M, Lin, Y: Model selection and estimation in regression with grouped variables. J. R. Stat. Soc., Ser. B, Stat. Methodol. 68(1), 49-67 (2006)
- 3. Goldstein, T, Osher, S: The split Bregman method for L1-regularized problems. SIAM J. Imaging Sci. 2(2), 323-343 (2009)
- 4. Combettes, PL, Pesquet, J-C: Primal-dual splitting algorithm for solving inclusions with mixtures of composite, Lipschitzian, and parallel-sum type monotone operators. Set-Valued Var. Anal. 20(2), 307-330 (2012)
- 5. Condat, L: A primal-dual splitting method for convex optimization involving Lipschitzian, proximable and linear composite terms. J. Optim. Theory Appl. 158(2), 460-479 (2013)
- 6. Davis, D, Yin, W: A three-operator splitting scheme and its optimization applications (2015). arXiv:1504.01032
- 7. Fortin, M, Glowinski, R: Augmented Lagrangian Methods: Applications to the Numerical Solution of Boundary-Value Problems. North-Holland, Amsterdam (1983)
- 8. Boyd, S, Parikh, N, Chu, E, Peleato, B, Eckstein, J: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1-122 (2011)
- 9. Zhu, M, Chan, T: An efficient primal-dual hybrid gradient algorithm for total variation image restoration. CAM report 08-34, UCLA (2008)
- 10. Esser, E, Zhang, X, Chan, TF: A general framework for a class of first order primal-dual algorithms for convex optimization in imaging science. SIAM J. Imaging Sci. 3(4), 1015-1046 (2010)
- 11. Chambolle, A, Pock, T: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120-145 (2011)
- 12. Pock, T, Chambolle, A: Diagonal preconditioning for first order primal-dual algorithms in convex optimization. In: 2011 International Conference on Computer Vision (ICCV), pp. 1762-1769. IEEE Press, New York (2011)
- 13. Zhang, X, Burger, M, Bresson, X, Osher, S: Bregmanized nonlocal regularization for deconvolution and sparse reconstruction. SIAM J. Imaging Sci. 3(3), 253-276 (2010)
- 14. Loris, I, Verhoeven, C: On a generalization of the iterative soft-thresholding algorithm for the case of non-separable penalty. Inverse Probl. 27(12), 125007 (2011)
- 15. Chen, P, Huang, J, Zhang, X: A primal-dual fixed point algorithm for convex separable minimization with applications to image restoration. Inverse Probl. 29(2), 025011 (2013)
- 16. Combettes, PL, Condat, L, Pesquet, J-C, Vu, BC: A forward-backward view of some primal-dual optimization methods in image recovery. In: 2014 IEEE International Conference on Image Processing (ICIP), pp. 4141-4145. IEEE Press, New York (2014)
- 17. Komodakis, N, Pesquet, J-C: Playing with duality: an overview of recent primal-dual approaches for solving large-scale optimization problems (2014). arXiv:1406.5429
- 18. Li, Q, Shen, L, Xu, Y, Zhang, N: Multi-step fixed-point proximity algorithms for solving a class of optimization problems arising from image processing. Adv. Comput. Math. 41(2), 387-422 (2015)
- 19. Li, Q, Zhang, N: Fast proximity-gradient algorithms for structured convex optimization problems. Preprint (2015)
- 20. Krol, A, Li, S, Shen, L, Xu, Y: Preconditioned alternating projection algorithms for maximum a posteriori ECT reconstruction. Inverse Probl. 28(11), 115005 (2012)
- 21. Moreau, J-J: Fonctions convexes duales et points proximaux dans un espace Hilbertien. C. R. Acad. Sci. Paris, Sér. A Math. 255, 2897-2899 (1962)
- 22. Combettes, PL, Wajs, VR: Signal recovery by proximal forward-backward splitting. Multiscale Model. Simul. 4(4), 1168-1200 (2005)
- 23. Chen, P, Huang, J, Zhang, X: A primal-dual fixed point algorithm based on proximity operator for convex set constrained separable problem. J. Nanjing Norm. Univ. Nat. Sci. Ed. 36(3), 1-5 (2013) (in Chinese)
- 24. Tang, Y-C, Zhu, C-X, Wen, M, Peng, J-G: A splitting primal-dual proximity algorithm for solving composite optimization problems (2015). arXiv:1507.08413
- 25. Briceno-Arias, LM, Combettes, PL: A monotone+skew splitting model for composite monotone inclusions in duality. SIAM J. Control Optim. 21(4), 1230-1250 (2011)
- 26. Liu, J, Yuan, L, Ye, J: An efficient algorithm for a class of fused lasso problems. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 323-332. ACM, New York (2010)
- 27. Ji, JX, Son, JB, Rane, SD: PULSAR: a Matlab toolbox for parallel magnetic resonance imaging using array coils and multiple channel receivers. Concepts Magn. Reson., Part B Magn. Reson. Eng. 31(1), 24-36 (2007)
- 28. Chen, P: Primal-dual fixed point algorithms for convex separable minimization and their applications. PhD thesis, Shanghai Jiao Tong University (2013) (in Chinese)
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.