Abstract
We consider the augmented Lagrangian method (ALM) as a solver for the fused lasso signal approximator (FLSA) problem. The ALM is a dual method in which squares of the constraint functions are added as penalties to the Lagrangian. To apply this method to the FLSA, two types of auxiliary variables are introduced to transform the original unconstrained minimization problem into a linearly constrained one. Each update in the resulting iterative algorithm reduces to a simple one-dimensional convex program, which has a closed-form solution in many cases. While the existing literature has mostly focused on the quadratic loss function, our algorithm is easily implemented for a general convex loss. We also provide a convergence analysis of the algorithm. Finally, the method is illustrated on simulated datasets.
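In the notation of Appendix A, the problem and its linearly constrained reformulation read as follows; the quadratic-loss instance of \(F_i\) in the final comment is the standard FLSA choice, shown here only for concreteness:

```latex
% FLSA objective (general convex loss F_i)
\min_{\beta\in\mathbb{R}^n}\;
  \sum_{i=1}^n F_i(y_i,\beta_i)
  + \lambda_1 \sum_{i=1}^n |\beta_i|
  + \lambda_2 \sum_{i=2}^n |\beta_i - \beta_{i-1}|
% introducing theta_i = beta_i - beta_{i-1}, i = 2,...,n, i.e. theta = A beta,
% gives the linearly constrained form
\min_{\beta,\theta}\; U(\beta) + V(\theta)
  \quad\text{subject to}\quad \theta = A\beta ,
% with U(beta) = sum_i F_i(y_i, beta_i) + lambda_1 * ||beta||_1 and
% V(theta) = lambda_2 * sum_{i=2}^n |theta_i|;
% quadratic loss: F_i(y_i, beta_i) = (y_i - beta_i)^2 / 2
```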
References
Conte SD, De Boor C (1980) Elementary numerical analysis: an algorithmic approach, 3rd edn. McGraw-Hill, New York
Ekeland I, Turnbull T (1983) Infinite-dimensional optimization and convexity. Chicago lectures in mathematics. University of Chicago Press, Chicago
Friedman J, Hastie T, Hofling H, Tibshirani R (2007) Pathwise coordinate optimization. Ann Appl Stat 1(2):302–332
Glowinski R, Le Tallec P (1989) Augmented Lagrangian and operator-splitting methods in nonlinear mechanics. Society for Industrial and Applied Mathematics, Philadelphia
Hestenes MR (1969) Multiplier and gradient methods. J Optim Theory Appl 4:303–320
Hoefling H (2010) A path algorithm for the fused lasso signal approximator. J Comput Graph Stat 19(4):984–1006
Huang T, Wu BL, Lizardi P, Zhao HY (2005) Detection of DNA copy number alterations using penalized least squares regression. Bioinformatics 21(20):3811–3817
Powell MJD (1969) A method for nonlinear constraints in minimization problems. In: Fletcher R (ed) Optimization. Academic Press, New York, pp 283–298
Rockafellar RT (1970) Convex analysis. Princeton University Press, Princeton
Rosset S, Zhu J (2007) Piecewise linear regularized solution paths. Ann Stat 35(3):1012–1030
Tai X-C, Wu C (2009) Augmented Lagrangian method, dual methods and split Bregman iteration for ROF model. In: 2nd international conference on scale space and variational methods in computer vision, pp 502–513
Tao M, Yuan X (2011) Recovering low-rank and sparse components of matrices from incomplete and noisy observations. SIAM J Optim 21(1):57–81
Tibshirani R, Saunders M, Rosset S, Zhu J, Knight K (2005) Sparsity and smoothness via the fused lasso. J R Stat Soc Ser B 67:91–108
Tibshirani R, Wang P (2008) Spatial smoothing and hot spot detection for CGH data using the fused lasso. Biostatistics 9(1):18–29
Wen Z, Goldfarb D, Yin W (2010) Alternating direction augmented Lagrangian methods for semidefinite programming. Math Program Comput 2(3):203–230
Yang J, Yuan X (2012) Linearized augmented Lagrangian and alternating direction methods for nuclear norm minimization. Math Comput 82:301–329
Yang J, Zhang Y (2011) Alternating direction algorithms for \(l_{1}\)-problems in compressive sensing. SIAM J Sci Comput 33(1):250–278
Zou H, Li RZ (2008) One-step sparse estimates in nonconcave penalized likelihood models. Ann Stat 36(4):1509–1533
Acknowledgments
We thank the Editor, the AE and two anonymous reviewers for their insightful comments which have improved the manuscript significantly. The research of Heng Lian is supported by a Singapore MOE Tier 1 grant.
Appendices
Appendix A Proof of Theorem 1
In the proof we use matrix and vector notation. In particular, the expressions \(\theta _i-\beta _i+\beta _{i-1}, i=2,\ldots , n \) can be written as \(\theta -A\beta \) with \(A\) an \((n-1)\times n\) matrix. We also make frequent use of standard results from convex analysis, such as those in Rockafellar (1970) and Ekeland and Turnbull (1983), most notably the properties of subdifferentials of convex functions. We only show the convergence of Algorithms 1 and 2; the analysis for Algorithms 3 and 4 is very much the same but more tedious to write down, and is thus omitted.
We start with Algorithm 1, for which the augmented Lagrangian can be written as

$$\mathcal{L}(\beta ,\theta ,\nu )=U(\beta )+V(\theta )+\nu ^T(\theta -A\beta )+\frac{c}{2}\Vert \theta -A\beta \Vert ^2,$$

where \(U(\beta )=\sum _{i=1}^nF_i(y_i,\beta _i)+\lambda _1\sum _{i=1}^n|\beta _i|,\,V(\theta )=\lambda _2\sum _{i=2}^n|\theta _i|\), and \(\nu ^T\) is the transpose of the column vector \(\nu \). In the proof we only need the convexity of \(U\) and \(V\).
Using the usual notation, suppose \((\beta ^*,\theta ^*,\nu ^*)\) is a saddle point of \(\mathcal L \), satisfying
From the first equality of (5), we have \(\theta ^*=A\beta ^*\). The update for \(\nu \) in Algorithm 1 is \(\nu ^{k}=\nu ^{k-1}+c(\theta ^k-A\beta ^k)\), which implies
where we set \(\bar{\beta }^k=\beta ^k-\beta ^*, \bar{\theta }^k=\theta ^k-\theta ^*\) and \(\bar{\nu }^{k}=\nu ^{k}-\nu ^*\). From (6), we immediately get
Next we show the right hand side of the above is nonnegative.
From the second inequality of (5), we have
where \(\partial \) is the notation for the subdifferential of a convex function.
Correspondingly, based on the update of \(\beta ^k\) and \(\theta ^k\) in Algorithm 1, we have
Subtracting (7) from (9) and subtracting (8) from (10), we get
Multiplying \((\bar{\beta }^k)^T\) to (11) from the left, multiplying \((\bar{\theta }^k)^T\) to (12) from the left, and adding the two expressions gives us
where we used \(\langle \cdot ,\cdot \rangle \) to denote the inner product of two vectors in some places above, to be consistent with the usual notation in convex analysis as in Ekeland and Turnbull (1983).
From standard results in convex analysis, all elements in \(\langle \partial U(\beta ^k)-\partial U(\beta ^*),\bar{\beta }^k\rangle \) and \(\langle \partial V(\theta ^k)-\partial V(\theta ^*),\bar{\theta }^k\rangle \) are nonnegative and thus we get \(c||\bar{\theta }^k-A\bar{\beta }^k||^2+(\bar{\nu }^{k-1})^T(\bar{\theta }^k-A\bar{\beta }^k)\le 0\) which immediately implies that
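The display this implies (omitted in this version) can be reconstructed from the update \(\bar{\nu }^{k}=\bar{\nu }^{k-1}+c(\bar{\theta }^k-A\bar{\beta }^k)\):

```latex
\Vert \bar{\nu}^{k}\Vert^2
  = \Vert \bar{\nu}^{k-1}\Vert^2
    + 2c\,(\bar{\nu}^{k-1})^T(\bar{\theta}^k - A\bar{\beta}^k)
    + c^2 \Vert \bar{\theta}^k - A\bar{\beta}^k\Vert^2
  \;\le\; \Vert \bar{\nu}^{k-1}\Vert^2
    - c^2 \Vert \bar{\theta}^k - A\bar{\beta}^k\Vert^2 ,
```

where the inequality follows by multiplying the bound \(c\Vert \bar{\theta }^k-A\bar{\beta }^k\Vert ^2+(\bar{\nu }^{k-1})^T(\bar{\theta }^k-A\bar{\beta }^k)\le 0\) by \(2c\).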
Since \(||\bar{\nu }^k||^2\) is nonnegative and nonincreasing, it converges, which forces \(\bar{\theta }^k-A\bar{\beta }^k\rightarrow 0\). Using this in (13), we get
where the above expression is taken to mean that “there exists some sequence \(u_k\in \langle \partial U(\beta ^k)-\partial U(\beta ^*),\bar{\beta }^k\rangle \) with \(0\le u_k\rightarrow 0\)”, for example. Similar interpretations are used in the following.
By the definition of subdifferential, we have
resulting in
Using (14), the difference between the left-hand side and the right-hand side converges to zero, and thus we have \(U(\beta ^k)\rightarrow U(\beta ^*)\). Similarly we can show \(V(\theta ^k)\rightarrow V(\theta ^*)\). Combined with \(\bar{\theta }^k-A\bar{\beta }^k\rightarrow 0\), this proves the convergence of Algorithm 1.
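The subgradient inequalities behind this squeeze (their displays are omitted in this version) can be written out explicitly; here \(g^*\in \partial U(\beta ^*)\) and \(g^k\in \partial U(\beta ^k)\) denote the subgradients appearing in the optimality conditions:

```latex
U(\beta^k) \;\ge\; U(\beta^*) + \langle g^*, \bar{\beta}^k \rangle ,
\qquad
U(\beta^*) \;\ge\; U(\beta^k) - \langle g^k, \bar{\beta}^k \rangle ,
```

so that \(\langle g^*,\bar{\beta }^k\rangle \le U(\beta ^k)-U(\beta ^*)\le \langle g^k,\bar{\beta }^k\rangle \), an interval whose width \(\langle g^k-g^*,\bar{\beta }^k\rangle \) tends to zero by (14).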
For Algorithm 2 the proof strategy is similar, so we only point out the differences. The proof is the same as before up to Eq. (8). Because of the order in which \(\beta \) and \(\theta \) are updated in Algorithm 2, Eq. (9) is replaced by
and thus Eq. (11) becomes instead
while Eq. (12) remains the same. Then we have, in place of (13),
which then implies
So the difference from the corresponding analysis for Algorithm 1 is the extra term \(-2c^2(\bar{\beta }^k)^TA^T(\bar{\theta }^{k-1}-\bar{\theta }^k)\) on the right hand side above.
Now we analyze the term \(2c^2(\bar{\beta }^k)^TA^T(\bar{\theta }^{k}-\bar{\theta }^{k-1})\). From (12) (which is still true for Algorithm 2) and the update rule for \(\nu \) in Algorithm 2, we have
Subtracting (17) from (16) and taking into account (18), we get
Taking the inner product with \(\theta ^k-\theta ^{k-1}\) in the above equation and using the monotonicity of the subdifferential, \(\langle \partial V(\theta ^k)-\partial V(\theta ^{k-1}), \theta ^k-\theta ^{k-1}\rangle \ge 0\), we get
and we can rewrite the above expression as
Using the identity \((\bar{\theta }^k)^T(\bar{\theta }^k-\bar{\theta }^{k-1})=1/2 (||\bar{\theta }^k||^2-||\bar{\theta }^{k-1}||^2+||\bar{\theta }^k-\bar{\theta }^{k-1}||^2)\) we obtain from (15)
After rearranging, we get
and then \(||\bar{\theta }^k-A\bar{\beta }^k||\rightarrow 0\) and \(||\bar{\theta }^k-\bar{\theta }^{k-1}||\rightarrow 0\). Now the rest of the analysis follows that for Algorithm 1 with no changes.
Appendix B R code for FLSA with quadratic loss
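The original R listing is not reproduced in this version. As a substitute illustration, the following is a minimal Python sketch of an augmented-Lagrangian (ADMM) iteration for the FLSA with quadratic loss. It uses a generic two-block splitting with auxiliary variables \(z=\beta \) and \(\theta =D\beta \); the function name `flsa_admm`, the direct linear-system solve in the \(\beta \)-update, and all parameter defaults are choices made here for the sketch, not the paper's Algorithms 1–4, whose updates are coordinatewise:

```python
import numpy as np

def soft(x, t):
    # closed-form solution of min_z t*|z| + 0.5*(z - x)^2 (soft-thresholding)
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def flsa_admm(y, lam1, lam2, c=1.0, n_iter=500):
    """ADMM sketch for
        min_b 0.5*||y - b||^2 + lam1*||b||_1 + lam2*sum_i |b_i - b_{i-1}|,
    via the linear constraints z = b and theta = D b."""
    n = len(y)
    D = np.diff(np.eye(n), axis=0)            # (n-1) x n first-difference matrix
    z, theta = y.copy(), D @ y                # auxiliary variables
    u, v = np.zeros(n - 1), np.zeros(n)       # scaled dual variables
    M = (1.0 + c) * np.eye(n) + c * D.T @ D   # system matrix for the beta-update
    for _ in range(n_iter):
        beta = np.linalg.solve(M, y + c * (z + v) + c * D.T @ (theta + u))
        z = soft(beta - v, lam1 / c)          # one-dimensional lasso step
        theta = soft(D @ beta - u, lam2 / c)  # one-dimensional fusion step
        u += theta - D @ beta                 # dual updates (scaled form)
        v += z - beta
    return beta
```

The \(z\)- and \(\theta \)-updates are exactly the kind of one-dimensional soft-thresholding steps that the paper describes as having closed-form solutions; for quadratic loss the \(\beta \)-update solves a tridiagonal-plus-identity system.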
Cite this article
Wang, L., You, Y. & Lian, H. A simple and efficient algorithm for fused lasso signal approximator with convex loss function. Comput Stat 28, 1699–1714 (2013). https://doi.org/10.1007/s00180-012-0373-6