
A simple and efficient algorithm for fused lasso signal approximator with convex loss function

  • Original Paper, published in Computational Statistics

Abstract

We consider the augmented Lagrangian method (ALM) as a solver for the fused lasso signal approximator (FLSA) problem. The ALM is a dual method in which squares of the constraint functions are added as penalties to the Lagrangian. To apply this method to FLSA, two types of auxiliary variables are introduced to transform the original unconstrained minimization problem into a linearly constrained minimization problem. Each update in the resulting iterative algorithm reduces to a simple one-dimensional convex program, with a closed-form solution in many cases. While the existing literature has mostly focused on the quadratic loss function, our algorithm is easily implemented for a general convex loss. We also provide a convergence analysis of the algorithm. Finally, the method is illustrated on simulated data sets.


References

  • Conte SD, De Boor C (1980) Elementary numerical analysis: an algorithmic approach, 3rd edn. McGraw-Hill, New York

  • Ekeland I, Turnbull T (1983) Infinite-dimensional optimization and convexity. Chicago lectures in mathematics. University of Chicago Press, Chicago

  • Friedman J, Hastie T, Hofling H, Tibshirani R (2007) Pathwise coordinate optimization. Ann Appl Stat 1(2):302–332

  • Glowinski R, Le Tallec P (1989) Augmented Lagrangian and operator-splitting methods in nonlinear mechanics. Society for Industrial and Applied Mathematics, Philadelphia

  • Hestenes MR (1969) Multiplier and gradient methods. J Optim Theory Appl 4:303–320

  • Hoefling H (2010) A path algorithm for the fused lasso signal approximator. J Comput Graph Stat 19(4):984–1006

  • Huang T, Wu BL, Lizardi P, Zhao HY (2005) Detection of DNA copy number alterations using penalized least squares regression. Bioinformatics 21(20):3811–3817

  • Powell MJD (1969) A method for nonlinear constraints in minimization problems. In: Fletcher R (ed) Optimization. Academic Press, New York, pp 283–298

  • Rockafellar RT (1970) Convex analysis. Princeton University Press, Princeton

  • Rosset S, Zhu J (2007) Piecewise linear regularized solution paths. Ann Stat 35(3):1012–1030

  • Tai X-C, Wu C (2009) Augmented Lagrangian method, dual methods and split Bregman iteration for ROF model. In: 2nd international conference on scale space and variational methods in computer vision. pp 502–513

  • Tao M, Yuan X (2011) Recovering low-rank and sparse components of matrices from incomplete and noisy observations. SIAM J Optim 21(1):57–81

  • Tibshirani R, Saunders M, Rosset S, Zhu J, Knight K (2005) Sparsity and smoothness via the fused lasso. J R Stat Soc Ser B 67:91–108

  • Tibshirani R, Wang P (2008) Spatial smoothing and hot spot detection for CGH data using the fused lasso. Biostatistics 9(1):18–29

  • Wen Z, Goldfarb D, Yin W (2010) Alternating direction augmented Lagrangian methods for semidefinite programming. Math Program Comput 2(3):203–230

  • Yang J, Yuan X (2012) Linearized augmented Lagrangian and alternating direction methods for nuclear norm minimization. Math Comput 82:301–329

  • Yang J, Zhang Y (2011) Alternating direction algorithms for \(l_{1}\)-problems in compressive sensing. SIAM J Sci Comput 33(1):250–278

  • Zou H, Li RZ (2008) One-step sparse estimates in nonconcave penalized likelihood models. Ann Stat 36(4):1509–1533


Acknowledgments

We thank the Editor, the Associate Editor and two anonymous reviewers for their insightful comments, which have improved the manuscript significantly. The research of Heng Lian is supported by a Singapore MOE Tier 1 grant.

Author information

Correspondence to Heng Lian.

Appendices

Appendix A Proof of Theorem 1

In the proof we use matrix and vector notation. In particular, the expressions \(\theta _i-\beta _i+\beta _{i-1}, i=2,\ldots , n \) can be written as \(\theta -A\beta \) with \(A\) an \((n-1)\times n\) matrix. We also make frequent use of standard and classical results from convex analysis, such as those contained in Rockafellar (1970) and Ekeland and Turnbull (1983), most notably the properties of the subdifferential of a convex function. We only prove the convergence of Algorithms 1 and 2; the analysis for Algorithms 3 and 4 is essentially the same but more tedious to write down, and is thus omitted.
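For example, with \(n=4\),

$$\begin{aligned} A=\begin{pmatrix} -1 & 1 & 0 & 0\\ 0 & -1 & 1 & 0\\ 0 & 0 & -1 & 1 \end{pmatrix}, \qquad \theta -A\beta =\begin{pmatrix} \theta _2-\beta _2+\beta _1\\ \theta _3-\beta _3+\beta _2\\ \theta _4-\beta _4+\beta _3 \end{pmatrix}, \end{aligned}$$

where \(\theta =(\theta _2,\theta _3,\theta _4)^T\).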

We start with Algorithm 1, for which the augmented Lagrangian can be written as

$$\begin{aligned} \mathcal L (\beta ,\theta ,\nu )=U(\beta )+V(\theta )+\frac{c}{2} ||\theta -A\beta ||^2+\nu ^T(\theta -A\beta ), \end{aligned}$$

where \(U(\beta )=\sum _{i=1}^nF_i(y_i,\beta _i)+\lambda _1\sum _{i=1}^n|\beta _i|,\,V(\theta )=\lambda _2\sum _{i=2}^n|\theta _i|\), and \(\nu ^T\) is the transpose of the column vector \(\nu \). In the proof we only need to use the convexity of \(U\) and \(V\).
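For concreteness, the following minimal R snippet (ours, not from the paper) evaluates \(\mathcal L \) under the quadratic loss \(F_i(y_i,\beta _i)=(y_i-\beta _i)^2/2\), assumed here purely for illustration; the proof itself only uses the convexity of \(U\) and \(V\).

# Evaluate the augmented Lagrangian; theta and nu have length n - 1, and
# diff(beta) computes A beta, i.e. beta_i - beta_{i-1} for i = 2, ..., n.
aug_lagrangian <- function(beta, theta, nu, y, lambda1, lambda2, c) {
  r <- theta - diff(beta)                 # constraint residual theta - A beta
  U <- sum((y - beta)^2) / 2 + lambda1 * sum(abs(beta))
  V <- lambda2 * sum(abs(theta))
  U + V + (c / 2) * sum(r^2) + sum(nu * r)
}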

Using the usual notation, suppose \((\beta ^*,\theta ^*,\nu ^*)\) is a saddle point of \(\mathcal L \), satisfying

$$\begin{aligned} \mathcal L (\beta ^*,\theta ^*,\nu )\le \mathcal L (\beta ^*,\theta ^*,\nu ^*) \le \mathcal L (\beta ,\theta ,\nu ^*) \; \forall \beta ,\theta ,\nu \end{aligned}$$
(5)

From the first inequality of (5), \(\nu ^T(\theta ^*-A\beta ^*)\le (\nu ^*)^T(\theta ^*-A\beta ^*)\) holds for all \(\nu \), which forces \(\theta ^*=A\beta ^*\). The update for \(\nu \) in Algorithm 1 is \(\nu ^{k}=\nu ^{k-1}+c(\theta ^k-A\beta ^k)\), which implies

$$\begin{aligned} \bar{\nu }^{k}=\bar{\nu }^{k-1}+c(\bar{\theta }^k-A\bar{\beta }^k), \end{aligned}$$
(6)

where we set \(\bar{\beta }^k=\beta ^k-\beta ^*, \bar{\theta }^k=\theta ^k-\theta ^*\) and \(\bar{\nu }^{k}=\nu ^{k}-\nu ^*\). From (6), we immediately get

$$\begin{aligned} ||\bar{\nu }^{k-1}||^2-||\bar{\nu }^{k}||^2=-2c(\bar{\nu }^{k-1})^T (\bar{\theta }^k-A\bar{\beta }^k)-c^2||\bar{\theta }^k-A\bar{\beta }^k||^2. \end{aligned}$$
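Indeed, taking squared norms on both sides of (6) gives \(||\bar{\nu }^{k}||^2=||\bar{\nu }^{k-1}||^2+2c(\bar{\nu }^{k-1})^T(\bar{\theta }^k-A\bar{\beta }^k)+c^2||\bar{\theta }^k-A\bar{\beta }^k||^2\), which rearranges to the display above.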

Next we show that the right-hand side of this identity is nonnegative.

From the second inequality of (5), we have

$$\begin{aligned} 0\in \partial _\beta \mathcal L (\beta ^*,\theta ^*,\nu ^*)&\Leftrightarrow 0\in \partial U(\beta ^*)-cA^T(\theta ^*-A\beta ^*)-A^T\nu ^*,\end{aligned}$$
(7)
$$\begin{aligned} 0\in \partial _\theta \mathcal L (\beta ^*,\theta ^*,\nu ^*)&\Leftrightarrow 0\in \partial V(\theta ^*)+c(\theta ^*-A\beta ^*)+{\nu ^*}, \end{aligned}$$
(8)

where \(\partial \) denotes the subdifferential of a convex function.

Correspondingly, based on the update of \(\beta ^k\) and \(\theta ^k\) in Algorithm 1, we have

$$\begin{aligned} 0\in \partial _\beta \mathcal L (\beta ^k,\theta ^k,\nu ^{k-1})&\Leftrightarrow 0\in \partial U(\beta ^k)-cA^T(\theta ^k-A\beta ^k)-A^T\nu ^{k-1},\end{aligned}$$
(9)
$$\begin{aligned} 0\in \partial _\theta \mathcal L (\beta ^k,\theta ^k,\nu ^{k-1})&\Leftrightarrow 0\in \partial V(\theta ^k)+c(\theta ^k-A\beta ^k)+{\nu ^{k-1}}. \end{aligned}$$
(10)

Subtracting (7) from (9) and subtracting (8) from (10), we get

$$\begin{aligned} 0&\in \partial U(\beta ^k)-\partial U(\beta ^*)-cA^T(\bar{\theta }^k-A\bar{\beta }^k)-A^T{\bar{\nu }}^{k-1},\end{aligned}$$
(11)
$$\begin{aligned} 0&\in \partial V(\theta ^k)-\partial V(\theta ^*)+c(\bar{\theta }^k-A\bar{\beta }^k)+\bar{\nu }^{k-1}. \end{aligned}$$
(12)

Multiplying (11) on the left by \((\bar{\beta }^k)^T\), multiplying (12) on the left by \((\bar{\theta }^k)^T\), and adding the two expressions gives

$$\begin{aligned} 0&\in \langle \partial U(\beta ^k)-\partial U(\beta ^*),\bar{\beta }^k\rangle +\langle \partial V(\theta ^k)-\partial V(\theta ^*),\bar{\theta }^k\rangle +c||\bar{\theta }^k-A\bar{\beta }^k||^2\nonumber \\&\quad +(\bar{\nu }^{k-1})^T(\bar{\theta }^k-A\bar{\beta }^k), \end{aligned}$$
(13)

where we use \(\langle \cdot ,\cdot \rangle \) to denote the inner product of two vectors, consistent with the usual notation in convex analysis as in Ekeland and Turnbull (1983).

From standard results in convex analysis, all elements in \(\langle \partial U(\beta ^k)-\partial U(\beta ^*),\bar{\beta }^k\rangle \) and \(\langle \partial V(\theta ^k)-\partial V(\theta ^*),\bar{\theta }^k\rangle \) are nonnegative and thus we get \(c||\bar{\theta }^k-A\bar{\beta }^k||^2+(\bar{\nu }^{k-1})^T(\bar{\theta }^k-A\bar{\beta }^k)\le 0\) which immediately implies that

$$\begin{aligned} ||\bar{\nu }^{k-1}||^2-||\bar{\nu }^{k}||^2=-2c(\bar{\nu }^{k-1})^T(\bar{\theta }^k-A\bar{\beta }^k)-c^2||\bar{\theta }^k-A\bar{\beta }^k||^2\ge c^2||\bar{\theta }^k-A\bar{\beta }^k||^2. \end{aligned}$$
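(The nonnegativity invoked above is the monotonicity of the subdifferential: for any \(u^k\in \partial U(\beta ^k)\) and \(u^*\in \partial U(\beta ^*)\), adding the subgradient inequalities \(U(\beta ^*)\ge U(\beta ^k)+\langle u^k,\beta ^*-\beta ^k\rangle \) and \(U(\beta ^k)\ge U(\beta ^*)+\langle u^*,\beta ^k-\beta ^*\rangle \) gives \(\langle u^k-u^*,\bar{\beta }^k\rangle \ge 0\), and similarly for \(V\).)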

Since \(||\bar{\nu }^k||^2\) is nonnegative and nonincreasing, it converges, and hence its successive differences tend to zero; we thus obtain \(\bar{\theta }^k-A\bar{\beta }^k\rightarrow 0\). Using this in (13), we get

$$\begin{aligned} 0\le \langle \partial U(\beta ^k)-\partial U(\beta ^*),\bar{\beta }^k\rangle \rightarrow 0,\;0\le \langle \partial V(\theta ^k)-\partial V(\theta ^*),\bar{\theta }^k\rangle \rightarrow 0, \end{aligned}$$
(14)

where the above expression is taken to mean that “there exists some sequence \(u_k\in \langle \partial U(\beta ^k)-\partial U(\beta ^*),\bar{\beta }^k\rangle \) with \(0\le u_k\rightarrow 0\)”, for example. Similar interpretations are used in the following.

By the definition of the subdifferential, we have

$$\begin{aligned} U(\beta ^k)&\ge U(\beta ^*)+\langle \partial U(\beta ^*),\bar{\beta }^k\rangle ,\\ U(\beta ^*)&\ge U(\beta ^k)-\langle \partial U(\beta ^k),\bar{\beta }^k\rangle , \end{aligned}$$

resulting in

$$\begin{aligned} U(\beta ^k)-\langle \partial U(\beta ^*),\bar{\beta }^k\rangle&\ge U(\beta ^*)\ge U(\beta ^k)-\langle \partial U(\beta ^k),\bar{\beta }^k\rangle . \end{aligned}$$

Using (14), the difference between the left-hand side and the right-hand side converges to zero, and thus \(U(\beta ^k)\rightarrow U(\beta ^*)\). Similarly we can show \(V(\theta ^k)\rightarrow V(\theta ^*)\). These facts, combined with \(\bar{\theta }^k-A\bar{\beta }^k\rightarrow 0\), prove the convergence of Algorithm 1.
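In practice, this analysis suggests terminating the iteration once the constraint residual, which the proof shows tends to zero, falls below a tolerance; a minimal R sketch of ours:

# Stopping rule based on the residual theta - A beta (computed via diff)
converged <- function(beta, theta, tol = 1e-6) {
  sqrt(sum((theta - diff(beta))^2)) < tol
}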

For Algorithm 2 the proof strategy is similar, so we only point out the differences. The proof is the same as before up to Eq. (8). Because of the order in which \(\beta \) and \(\theta \) are updated in Algorithm 2, Eq. (9) is replaced by

$$\begin{aligned} 0\in \partial _\beta \mathcal L (\beta ^k,\theta ^{k-1},\nu ^{k-1})&\Leftrightarrow 0\in \partial U(\beta ^k)-cA^T(\theta ^{k-1}-A\beta ^k)-A^T\nu ^{k-1}, \end{aligned}$$

and thus Eq. (11) becomes instead

$$\begin{aligned} 0&\in \partial U(\beta ^k)-\partial U(\beta ^*)-cA^T(\bar{\theta }^{k-1}-A\bar{\beta }^k)-A^T{\bar{\nu }}^{k-1}, \end{aligned}$$

while Eq. (12) remains the same. Then we have, in place of (13),

$$\begin{aligned} 0&\in \langle \partial U(\beta ^k)-\partial U(\beta ^*),\bar{\beta }^k\rangle +\langle \partial V(\theta ^k)-\partial V(\theta ^*),\bar{\theta }^k\rangle \\&\quad +c||\bar{\theta }^k-A\bar{\beta }^k||^2+(\bar{\nu }^{k-1})^T (\bar{\theta }^k-A\bar{\beta }^k)-c(\bar{\beta }^k)^TA^T(\bar{\theta }^{k-1} -\bar{\theta }^k), \end{aligned}$$

which then implies

$$\begin{aligned} ||\bar{\nu }^{k-1}||^2-||\bar{\nu }^{k}||^2\ge c^2||\bar{\theta }^k-A\bar{\beta }^k||^2-2c^2(\bar{\beta }^k)^TA^T (\bar{\theta }^{k-1}-\bar{\theta }^k). \end{aligned}$$
(15)

So the difference from the corresponding analysis for Algorithm 1 is the extra term \(-2c^2(\bar{\beta }^k)^TA^T(\bar{\theta }^{k-1}-\bar{\theta }^k)\) on the right hand side above.

Now we analyze the term \(2c^2(\bar{\beta }^k)^TA^T(\bar{\theta }^{k}-\bar{\theta }^{k-1})\). From (12) (which is still true for Algorithm 2) and the update rule for \(\nu \) in Algorithm 2, we have

$$\begin{aligned} 0&\in \partial V(\theta ^k)-\partial V(\theta ^*)+c(\bar{\theta }^k-A\bar{\beta }^k)+\bar{\nu }^{k-1}\end{aligned}$$
(16)
$$\begin{aligned} 0&\in \partial V(\theta ^{k-1})-\partial V(\theta ^*)+c(\bar{\theta }^{k-1} -A\bar{\beta }^{k-1})+\bar{\nu }^{k-2}\end{aligned}$$
(17)
$$\begin{aligned} \bar{\nu }^{k-1}-\bar{\nu }^{k-2}&= c(\bar{\theta }^{k-1}-A\bar{\beta }^{k-1}). \end{aligned}$$
(18)

Subtracting (17) from (16) and taking into account (18), we get

$$\begin{aligned} 0\in \partial V(\theta ^k)-\partial V(\theta ^{k-1})+c(\bar{\theta }^{k}-A\bar{\beta }^{k}). \end{aligned}$$

Taking the inner product with \(\theta ^k-\theta ^{k-1}\) in the above equation and using the monotonicity of the subdifferential of a convex function, \(\langle \partial V(\theta ^k)-\partial V(\theta ^{k-1}), \theta ^k-\theta ^{k-1}\rangle \ge 0\), we get

$$\begin{aligned} (\bar{\theta }^k-A\bar{\beta }^k)^T(\theta ^k-\theta ^{k-1})\le 0, \end{aligned}$$

and, since \(\theta ^k-\theta ^{k-1}=\bar{\theta }^k-\bar{\theta }^{k-1}\), we can rewrite the above expression as

$$\begin{aligned} (\bar{\beta }^k)^TA^T(\bar{\theta }^k-\bar{\theta }^{k-1})\ge (\bar{\theta }^k)^T(\bar{\theta }^k-\bar{\theta }^{k-1}). \end{aligned}$$

Using the identity \((\bar{\theta }^k)^T(\bar{\theta }^k-\bar{\theta }^{k-1})=1/2 (||\bar{\theta }^k||^2-||\bar{\theta }^{k-1}||^2+||\bar{\theta }^k-\bar{\theta }^{k-1}||^2)\) we obtain from (15)

$$\begin{aligned} ||\bar{\nu }^{k-1}||^2-||\bar{\nu }^{k}||^2\ge c^2||\bar{\theta }^k-A\bar{\beta }^k||^2+c^2(||\bar{\theta }^k||^2-||\bar{\theta }^{k-1}||^2+||\bar{\theta }^k-\bar{\theta }^{k-1}||^2). \end{aligned}$$

After rearranging, we get

$$\begin{aligned} (||\bar{\nu }^{k-1}||^2+c^2||\bar{\theta }^{k-1}||^2)-(||\bar{\nu }^{k}||^2+c^2||\bar{\theta }^{k}||^2)\ge c^2||\bar{\theta }^k-A\bar{\beta }^k||^2+c^2||\bar{\theta }^k-\bar{\theta }^{k-1}||^2, \end{aligned}$$

so the sequence \(||\bar{\nu }^{k}||^2+c^2||\bar{\theta }^{k}||^2\) is nonnegative and nonincreasing, hence convergent; it follows that \(||\bar{\theta }^k-A\bar{\beta }^k||\rightarrow 0\) and \(||\bar{\theta }^k-\bar{\theta }^{k-1}||\rightarrow 0\). The rest of the analysis follows that for Algorithm 1 with no changes.

Appendix B R code for FLSA with quadratic loss

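In the published article the R code is reproduced only as an image, which cannot be recovered here. In its place, the following is a minimal R sketch of ours, not the authors' code: an alternating-direction (Gauss–Seidel) variant of the ALM using the two auxiliary variables \(z=\beta \) (for the lasso penalty) and \(\theta =A\beta \) (for the fusion penalty); the choice of splitting, the function name flsa_admm and all variable names are our assumptions.

soft <- function(x, t) sign(x) * pmax(abs(x) - t, 0)  # soft-thresholding operator

flsa_admm <- function(y, lambda1, lambda2, c = 1, maxit = 5000, tol = 1e-8) {
  n <- length(y)
  A <- diff(diag(n))                          # (n-1) x n first-difference matrix
  M <- (1 + c) * diag(n) + c * crossprod(A)   # tridiagonal, positive definite
  beta <- y
  z <- y                                      # auxiliary variable for z = beta
  theta <- diff(y)                            # auxiliary variable for theta = A beta
  mu <- numeric(n)                            # multiplier for z = beta
  nu <- numeric(n - 1)                        # multiplier for theta = A beta
  for (k in seq_len(maxit)) {
    # beta-update: a quadratic problem, i.e. a tridiagonal linear system
    beta <- drop(solve(M, y + c * z + mu + crossprod(A, c * theta + nu)))
    # z- and theta-updates: separable one-dimensional problems, closed form
    z <- soft(beta - mu / c, lambda1 / c)
    theta <- soft(diff(beta) - nu / c, lambda2 / c)
    # multiplier updates, as in nu^k = nu^{k-1} + c(theta^k - A beta^k)
    mu <- mu + c * (z - beta)
    nu <- nu + c * (theta - diff(beta))
    if (sqrt(sum((z - beta)^2) + sum((theta - diff(beta))^2)) < tol) break
  }
  beta
}

# Example: recover a noisy piecewise-constant signal
set.seed(1)
y <- c(rep(0, 40), rep(2, 30), rep(-1, 30)) + rnorm(100, sd = 0.3)
fit <- flsa_admm(y, lambda1 = 0.1, lambda2 = 2)

Here the \(z\)- and \(\theta \)-updates are the one-dimensional problems with closed-form (soft-thresholding) solutions mentioned in the abstract, while the \(\beta \)-update solves a symmetric tridiagonal system; the dense solve() is used for clarity, and a banded solver would reduce the per-iteration cost to \(O(n)\).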


Cite this article

Wang, L., You, Y. & Lian, H. A simple and efficient algorithm for fused lasso signal approximator with convex loss function. Comput Stat 28, 1699–1714 (2013). https://doi.org/10.1007/s00180-012-0373-6

