
A simple and efficient algorithm for fused lasso signal approximator with convex loss function

  • Original Paper, published in Computational Statistics

Abstract

We consider the augmented Lagrangian method (ALM) as a solver for the fused lasso signal approximator (FLSA) problem. The ALM is a dual method in which squares of the constraint functions are added as penalties to the Lagrangian. To apply this method to FLSA, two types of auxiliary variables are introduced to transform the original unconstrained minimization problem into a linearly constrained minimization problem. Each update in the resulting iterative algorithm reduces to a simple one-dimensional convex program, with a closed-form solution in many cases. While the existing literature has mostly focused on the quadratic loss function, our algorithm is easily implemented for a general convex loss. We also provide a convergence analysis of the algorithm. Finally, the method is illustrated on simulated data sets.


References

  • Conte SD, De Boor C (1980) Elementary numerical analysis: an algorithmic approach, 3rd edn. McGraw-Hill, New York

  • Ekeland I, Turnbull T (1983) Infinite-dimensional optimization and convexity. Chicago lectures in mathematics. University of Chicago Press, Chicago

  • Friedman J, Hastie T, Hofling H, Tibshirani R (2007) Pathwise coordinate optimization. Ann Appl Stat 1(2):302–332

  • Glowinski R, Le Tallec P (1989) Augmented Lagrangian and operator-splitting methods in nonlinear mechanics. Society for Industrial and Applied Mathematics, Philadelphia

  • Hestenes MR (1969) Multiplier and gradient methods. J Optim Theory Appl 4:303–320

  • Hoefling H (2010) A path algorithm for the fused lasso signal approximator. J Comput Graph Stat 19(4):984–1006

  • Huang T, Wu BL, Lizardi P, Zhao HY (2005) Detection of DNA copy number alterations using penalized least squares regression. Bioinformatics 21(20):3811–3817

  • Powell MJD (1969) A method for nonlinear constraints in minimization problems. In: Fletcher R (ed) Optimization. Academic Press, New York, pp 283–298

  • Rockafellar RT (1970) Convex analysis. Princeton University Press, Princeton

  • Rosset S, Zhu J (2007) Piecewise linear regularized solution paths. Ann Stat 35(3):1012–1030

  • Tai X-C, Wu C (2009) Augmented Lagrangian method, dual methods and split Bregman iteration for ROF model. In: 2nd international conference on scale space and variational methods in computer vision. pp 502–513

  • Tao M, Yuan X (2011) Recovering low-rank and sparse components of matrices from incomplete and noisy observations. SIAM J Optim 21(1):57–81

  • Tibshirani R, Saunders M, Rosset S, Zhu J, Knight K (2005) Sparsity and smoothness via the fused lasso. J R Stat Soc Ser B 67:91–108

  • Tibshirani R, Wang P (2008) Spatial smoothing and hot spot detection for CGH data using the fused lasso. Biostatistics 9(1):18–29

  • Wen Z, Goldfarb D, Yin W (2010) Alternating direction augmented Lagrangian methods for semidefinite programming. Math Program Comput 2(3):203–230

  • Yang J, Yuan X (2012) Linearized augmented Lagrangian and alternating direction methods for nuclear norm minimization. Math Comput 82:301–329

  • Yang J, Zhang Y (2011) Alternating direction algorithms for \(l_{1}\)-problems in compressive sensing. SIAM J Sci Comput 33(1):250–278

  • Zou H, Li RZ (2008) One-step sparse estimates in nonconcave penalized likelihood models. Ann Stat 36(4):1509–1533


Acknowledgments

We thank the Editor, the Associate Editor and two anonymous reviewers for their insightful comments, which have improved the manuscript significantly. The research of Heng Lian is supported by a Singapore MOE Tier 1 grant.

Author information

Correspondence to Heng Lian.

Appendices

Appendix A Proof of Theorem 1

In the proof we use matrix and vector notation. In particular, the expressions \(\theta _i-\beta _i+\beta _{i-1}, i=2,\ldots , n \) can be written as \(\theta -A\beta \) with \(A\) an \((n-1)\times n\) matrix. We also make frequent use of standard and classical results from convex analysis, such as those contained in Rockafellar (1970) and Ekeland and Turnbull (1983), most notably the properties of the subdifferential of a convex function. We only prove the convergence of Algorithms 1 and 2; the analysis for Algorithms 3 and 4 is essentially the same but more tedious to write down, and is thus omitted.
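For example, with \(n=4\),

$$\begin{aligned} A=\begin{pmatrix} -1 & 1 & 0 & 0\\ 0 & -1 & 1 & 0\\ 0 & 0 & -1 & 1 \end{pmatrix}, \qquad \theta -A\beta =\begin{pmatrix} \theta _2-\beta _2+\beta _1\\ \theta _3-\beta _3+\beta _2\\ \theta _4-\beta _4+\beta _3 \end{pmatrix}, \end{aligned}$$

where \(\theta =(\theta _2,\theta _3,\theta _4)^T\).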

We start with Algorithm 1, for which the augmented Lagrangian can be written as

$$\begin{aligned} \mathcal L (\beta ,\theta ,\nu )=U(\beta )+V(\theta )+\frac{c}{2} ||\theta -A\beta ||^2+\nu ^T(\theta -A\beta ), \end{aligned}$$

where \(U(\beta )=\sum _{i=1}^nF_i(y_i,\beta _i)+\lambda _1\sum _{i=1}^n|\beta _i|,\,V(\theta )=\lambda _2\sum _{i=2}^n|\theta _i|\), and \(\nu ^T\) is the transpose of the column vector \(\nu \). In the proof we only need to use the convexity of \(U\) and \(V\).
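For concreteness, the following minimal R snippet (ours, not from the paper) evaluates \(\mathcal L \) under the quadratic loss \(F_i(y_i,\beta _i)=(y_i-\beta _i)^2/2\), assumed here purely for illustration; the proof itself only uses the convexity of \(U\) and \(V\).

# Evaluate the augmented Lagrangian; theta and nu have length n - 1, and
# diff(beta) computes A beta, i.e. beta_i - beta_{i-1} for i = 2, ..., n.
aug_lagrangian <- function(beta, theta, nu, y, lambda1, lambda2, c) {
  r <- theta - diff(beta)                 # constraint residual theta - A beta
  U <- sum((y - beta)^2) / 2 + lambda1 * sum(abs(beta))
  V <- lambda2 * sum(abs(theta))
  U + V + (c / 2) * sum(r^2) + sum(nu * r)
}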

Using the usual notation, suppose \((\beta ^*,\theta ^*,\nu ^*)\) is a saddle point of \(\mathcal L \), satisfying

$$\begin{aligned} \mathcal L (\beta ^*,\theta ^*,\nu )\le \mathcal L (\beta ^*,\theta ^*,\nu ^*) \le \mathcal L (\beta ,\theta ,\nu ^*) \; \forall \beta ,\theta ,\nu \end{aligned}$$
(5)

From the first inequality of (5), \(\nu ^T(\theta ^*-A\beta ^*)\le (\nu ^*)^T(\theta ^*-A\beta ^*)\) holds for all \(\nu \), which forces \(\theta ^*=A\beta ^*\). The update for \(\nu \) in Algorithm 1 is \(\nu ^{k}=\nu ^{k-1}+c(\theta ^k-A\beta ^k)\), which implies

$$\begin{aligned} \bar{\nu }^{k}=\bar{\nu }^{k-1}+c(\bar{\theta }^k-A\bar{\beta }^k), \end{aligned}$$
(6)

where we set \(\bar{\beta }^k=\beta ^k-\beta ^*, \bar{\theta }^k=\theta ^k-\theta ^*\) and \(\bar{\nu }^{k}=\nu ^{k}-\nu ^*\). From (6), we immediately get

$$\begin{aligned} ||\bar{\nu }^{k-1}||^2-||\bar{\nu }^{k}||^2=-2c(\bar{\nu }^{k-1})^T (\bar{\theta }^k-A\bar{\beta }^k)-c^2||\bar{\theta }^k-A\bar{\beta }^k||^2. \end{aligned}$$
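Indeed, taking squared norms on both sides of (6) gives \(||\bar{\nu }^{k}||^2=||\bar{\nu }^{k-1}||^2+2c(\bar{\nu }^{k-1})^T(\bar{\theta }^k-A\bar{\beta }^k)+c^2||\bar{\theta }^k-A\bar{\beta }^k||^2\), which rearranges to the display above.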

Next we show that the right-hand side of this identity is nonnegative.

From the second inequality of (5), we have

$$\begin{aligned} 0\in \partial _\beta \mathcal L (\beta ^*,\theta ^*,\nu ^*)&\Leftrightarrow 0\in \partial U(\beta ^*)-cA^T(\theta ^*-A\beta ^*)-A^T\nu ^*,\end{aligned}$$
(7)
$$\begin{aligned} 0\in \partial _\theta \mathcal L (\beta ^*,\theta ^*,\nu ^*)&\Leftrightarrow 0\in \partial V(\theta ^*)+c(\theta ^*-A\beta ^*)+{\nu ^*}, \end{aligned}$$
(8)

where \(\partial \) denotes the subdifferential of a convex function.

Correspondingly, based on the update of \(\beta ^k\) and \(\theta ^k\) in Algorithm 1, we have

$$\begin{aligned} 0\in \partial _\beta \mathcal L (\beta ^k,\theta ^k,\nu ^{k-1})&\Leftrightarrow 0\in \partial U(\beta ^k)-cA^T(\theta ^k-A\beta ^k)-A^T\nu ^{k-1},\end{aligned}$$
(9)
$$\begin{aligned} 0\in \partial _\theta \mathcal L (\beta ^k,\theta ^k,\nu ^{k-1})&\Leftrightarrow 0\in \partial V(\theta ^k)+c(\theta ^k-A\beta ^k)+{\nu ^{k-1}}. \end{aligned}$$
(10)

Subtracting (7) from (9) and subtracting (8) from (10), we get

$$\begin{aligned} 0&\in \partial U(\beta ^k)-\partial U(\beta ^*)-cA^T(\bar{\theta }^k-A\bar{\beta }^k)-A^T{\bar{\nu }}^{k-1},\end{aligned}$$
(11)
$$\begin{aligned} 0&\in \partial V(\theta ^k)-\partial V(\theta ^*)+c(\bar{\theta }^k-A\bar{\beta }^k)+\bar{\nu }^{k-1}. \end{aligned}$$
(12)

Multiplying (11) on the left by \((\bar{\beta }^k)^T\), multiplying (12) on the left by \((\bar{\theta }^k)^T\), and adding the two expressions gives

$$\begin{aligned} 0&\in \langle \partial U(\beta ^k)-\partial U(\beta ^*),\bar{\beta }^k\rangle +\langle \partial V(\theta ^k)-\partial V(\theta ^*),\bar{\theta }^k\rangle +c||\bar{\theta }^k-A\bar{\beta }^k||^2\nonumber \\&\quad +(\bar{\nu }^{k-1})^T(\bar{\theta }^k-A\bar{\beta }^k), \end{aligned}$$
(13)

where we use \(\langle \cdot ,\cdot \rangle \) to denote the inner product of two vectors, consistent with the usual notation in convex analysis as in Ekeland and Turnbull (1983).

From standard results in convex analysis, all elements in \(\langle \partial U(\beta ^k)-\partial U(\beta ^*),\bar{\beta }^k\rangle \) and \(\langle \partial V(\theta ^k)-\partial V(\theta ^*),\bar{\theta }^k\rangle \) are nonnegative and thus we get \(c||\bar{\theta }^k-A\bar{\beta }^k||^2+(\bar{\nu }^{k-1})^T(\bar{\theta }^k-A\bar{\beta }^k)\le 0\) which immediately implies that

$$\begin{aligned} ||\bar{\nu }^{k-1}||^2-||\bar{\nu }^{k}||^2=-2c(\bar{\nu }^{k-1})^T(\bar{\theta }^k-A\bar{\beta }^k)-c^2||\bar{\theta }^k-A\bar{\beta }^k||^2\ge c^2||\bar{\theta }^k-A\bar{\beta }^k||^2. \end{aligned}$$
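(The nonnegativity invoked above is the monotonicity of the subdifferential: for any \(u^k\in \partial U(\beta ^k)\) and \(u^*\in \partial U(\beta ^*)\), adding the subgradient inequalities \(U(\beta ^*)\ge U(\beta ^k)+\langle u^k,\beta ^*-\beta ^k\rangle \) and \(U(\beta ^k)\ge U(\beta ^*)+\langle u^*,\beta ^k-\beta ^*\rangle \) gives \(\langle u^k-u^*,\bar{\beta }^k\rangle \ge 0\), and similarly for \(V\).)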

Since \(||\bar{\nu }^k||^2\) is nonnegative and nonincreasing, it converges, and hence its successive differences tend to zero; we thus obtain \(\bar{\theta }^k-A\bar{\beta }^k\rightarrow 0\). Using this in (13), we get

$$\begin{aligned} 0\le \langle \partial U(\beta ^k)-\partial U(\beta ^*),\bar{\beta }^k\rangle \rightarrow 0,\;0\le \langle \partial V(\theta ^k)-\partial V(\theta ^*),\bar{\theta }^k\rangle \rightarrow 0, \end{aligned}$$
(14)

where the above expression is taken to mean that “there exists some sequence \(u_k\in \langle \partial U(\beta ^k)-\partial U(\beta ^*),\bar{\beta }^k\rangle \) with \(0\le u_k\rightarrow 0\)”, for example. Similar interpretations are used in the following.

By the definition of the subdifferential, we have

$$\begin{aligned} U(\beta ^k)&\ge U(\beta ^*)+\langle \partial U(\beta ^*),\bar{\beta }^k\rangle ,\\ U(\beta ^*)&\ge U(\beta ^k)-\langle \partial U(\beta ^k),\bar{\beta }^k\rangle , \end{aligned}$$

resulting in

$$\begin{aligned} U(\beta ^k)-\langle \partial U(\beta ^*),\bar{\beta }^k\rangle&\ge U(\beta ^*)\ge U(\beta ^k)-\langle \partial U(\beta ^k),\bar{\beta }^k\rangle . \end{aligned}$$

Using (14), the difference between the left-hand side and the right-hand side converges to zero, and thus \(U(\beta ^k)\rightarrow U(\beta ^*)\). Similarly we can show \(V(\theta ^k)\rightarrow V(\theta ^*)\). These facts, combined with \(\bar{\theta }^k-A\bar{\beta }^k\rightarrow 0\), prove the convergence of Algorithm 1.
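In practice, this analysis suggests terminating the iteration once the constraint residual, which the proof shows tends to zero, falls below a tolerance; a minimal R sketch of ours:

# Stopping rule based on the residual theta - A beta (computed via diff)
converged <- function(beta, theta, tol = 1e-6) {
  sqrt(sum((theta - diff(beta))^2)) < tol
}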

For Algorithm 2 the proof strategy is similar, so we only point out the differences. The proof is the same as before up to Eq. (8). Because of the order in which \(\beta \) and \(\theta \) are updated in Algorithm 2, Eq. (9) is replaced by

$$\begin{aligned} 0\in \partial _\beta \mathcal L (\beta ^k,\theta ^{k-1},\nu ^{k-1})&\Leftrightarrow 0\in \partial U(\beta ^k)-cA^T(\theta ^{k-1}-A\beta ^k)-A^T\nu ^{k-1}, \end{aligned}$$

and thus Eq. (11) becomes instead

$$\begin{aligned} 0&\in \partial U(\beta ^k)-\partial U(\beta ^*)-cA^T(\bar{\theta }^{k-1}-A\bar{\beta }^k)-A^T{\bar{\nu }}^{k-1}, \end{aligned}$$

while Eq. (12) remains the same. Then we have, in place of (13),

$$\begin{aligned} 0&\in \langle \partial U(\beta ^k)-\partial U(\beta ^*),\bar{\beta }^k\rangle +\langle \partial V(\theta ^k)-\partial V(\theta ^*),\bar{\theta }^k\rangle \\&\quad +c||\bar{\theta }^k-A\bar{\beta }^k||^2+(\bar{\nu }^{k-1})^T (\bar{\theta }^k-A\bar{\beta }^k)-c(\bar{\beta }^k)^TA^T(\bar{\theta }^{k-1} -\bar{\theta }^k), \end{aligned}$$

which then implies

$$\begin{aligned} ||\bar{\nu }^{k-1}||^2-||\bar{\nu }^{k}||^2\ge c^2||\bar{\theta }^k-A\bar{\beta }^k||^2-2c^2(\bar{\beta }^k)^TA^T (\bar{\theta }^{k-1}-\bar{\theta }^k). \end{aligned}$$
(15)

So the difference from the corresponding analysis for Algorithm 1 is the extra term \(-2c^2(\bar{\beta }^k)^TA^T(\bar{\theta }^{k-1}-\bar{\theta }^k)\) on the right hand side above.

Now we analyze the term \(2c^2(\bar{\beta }^k)^TA^T(\bar{\theta }^{k}-\bar{\theta }^{k-1})\). From (12) (which is still true for Algorithm 2) and the update rule for \(\nu \) in Algorithm 2, we have

$$\begin{aligned} 0&\in \partial V(\theta ^k)-\partial V(\theta ^*)+c(\bar{\theta }^k-A\bar{\beta }^k)+\bar{\nu }^{k-1}\end{aligned}$$
(16)
$$\begin{aligned} 0&\in \partial V(\theta ^{k-1})-\partial V(\theta ^*)+c(\bar{\theta }^{k-1} -A\bar{\beta }^{k-1})+\bar{\nu }^{k-2}\end{aligned}$$
(17)
$$\begin{aligned} \bar{\nu }^{k-1}-\bar{\nu }^{k-2}&= c(\bar{\theta }^{k-1}-A\bar{\beta }^{k-1}). \end{aligned}$$
(18)

Subtracting (17) from (16) and taking into account (18), we get

$$\begin{aligned} 0\in \partial V(\theta ^k)-\partial V(\theta ^{k-1})+c(\bar{\theta }^{k}-A\bar{\beta }^{k}). \end{aligned}$$

Taking the inner product with \(\theta ^k-\theta ^{k-1}\) in the above equation and using the monotonicity of the subdifferential of a convex function, \(\langle \partial V(\theta ^k)-\partial V(\theta ^{k-1}), \theta ^k-\theta ^{k-1}\rangle \ge 0\), we get

$$\begin{aligned} (\bar{\theta }^k-A\bar{\beta }^k)^T(\theta ^k-\theta ^{k-1})\le 0, \end{aligned}$$

and, since \(\theta ^k-\theta ^{k-1}=\bar{\theta }^k-\bar{\theta }^{k-1}\), we can rewrite the above expression as

$$\begin{aligned} (\bar{\beta }^k)^TA^T(\bar{\theta }^k-\bar{\theta }^{k-1})\ge (\bar{\theta }^k)^T(\bar{\theta }^k-\bar{\theta }^{k-1}). \end{aligned}$$

Using the identity \((\bar{\theta }^k)^T(\bar{\theta }^k-\bar{\theta }^{k-1})=1/2 (||\bar{\theta }^k||^2-||\bar{\theta }^{k-1}||^2+||\bar{\theta }^k-\bar{\theta }^{k-1}||^2)\) we obtain from (15)

$$\begin{aligned} ||\bar{\nu }^{k-1}||^2-||\bar{\nu }^{k}||^2\ge c^2||\bar{\theta }^k-A\bar{\beta }^k||^2+c^2(||\bar{\theta }^k||^2-||\bar{\theta }^{k-1}||^2+||\bar{\theta }^k-\bar{\theta }^{k-1}||^2). \end{aligned}$$

After rearranging, we get

$$\begin{aligned} (||\bar{\nu }^{k-1}||^2+c^2||\bar{\theta }^{k-1}||^2)-(||\bar{\nu }^{k}||^2+c^2||\bar{\theta }^{k}||^2)\ge c^2||\bar{\theta }^k-A\bar{\beta }^k||^2+c^2||\bar{\theta }^k-\bar{\theta }^{k-1}||^2, \end{aligned}$$

so the sequence \(||\bar{\nu }^{k}||^2+c^2||\bar{\theta }^{k}||^2\) is nonnegative and nonincreasing, hence convergent; it follows that \(||\bar{\theta }^k-A\bar{\beta }^k||\rightarrow 0\) and \(||\bar{\theta }^k-\bar{\theta }^{k-1}||\rightarrow 0\). The rest of the analysis follows that for Algorithm 1 with no changes.

Appendix B R code for FLSA with quadratic loss

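In the published article the R code is reproduced only as an image, which cannot be recovered here. In its place, the following is a minimal R sketch of ours, not the authors' code: an alternating-direction (Gauss–Seidel) variant of the ALM using the two auxiliary variables \(z=\beta \) (for the lasso penalty) and \(\theta =A\beta \) (for the fusion penalty); the choice of splitting, the function name flsa_admm and all variable names are our assumptions.

soft <- function(x, t) sign(x) * pmax(abs(x) - t, 0)  # soft-thresholding operator

flsa_admm <- function(y, lambda1, lambda2, c = 1, maxit = 5000, tol = 1e-8) {
  n <- length(y)
  A <- diff(diag(n))                          # (n-1) x n first-difference matrix
  M <- (1 + c) * diag(n) + c * crossprod(A)   # tridiagonal, positive definite
  beta <- y
  z <- y                                      # auxiliary variable for z = beta
  theta <- diff(y)                            # auxiliary variable for theta = A beta
  mu <- numeric(n)                            # multiplier for z = beta
  nu <- numeric(n - 1)                        # multiplier for theta = A beta
  for (k in seq_len(maxit)) {
    # beta-update: a quadratic problem, i.e. a tridiagonal linear system
    beta <- drop(solve(M, y + c * z + mu + crossprod(A, c * theta + nu)))
    # z- and theta-updates: separable one-dimensional problems, closed form
    z <- soft(beta - mu / c, lambda1 / c)
    theta <- soft(diff(beta) - nu / c, lambda2 / c)
    # multiplier updates, as in nu^k = nu^{k-1} + c(theta^k - A beta^k)
    mu <- mu + c * (z - beta)
    nu <- nu + c * (theta - diff(beta))
    if (sqrt(sum((z - beta)^2) + sum((theta - diff(beta))^2)) < tol) break
  }
  beta
}

# Example: recover a noisy piecewise-constant signal
set.seed(1)
y <- c(rep(0, 40), rep(2, 30), rep(-1, 30)) + rnorm(100, sd = 0.3)
fit <- flsa_admm(y, lambda1 = 0.1, lambda2 = 2)

Here the \(z\)- and \(\theta \)-updates are the one-dimensional problems with closed-form (soft-thresholding) solutions mentioned in the abstract, while the \(\beta \)-update solves a symmetric tridiagonal system; the dense solve() is used for clarity, and a banded solver would reduce the per-iteration cost to \(O(n)\).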


Cite this article

Wang, L., You, Y. & Lian, H. A simple and efficient algorithm for fused lasso signal approximator with convex loss function. Comput Stat 28, 1699–1714 (2013). https://doi.org/10.1007/s00180-012-0373-6

