Quadratic Growth Conditions and Uniqueness of Optimal Solution to Lasso

In the previous paper Bello-Cruz et al. (J Optim Theory Appl 188:378–401, 2021), we showed that the quadratic growth condition plays a key role in obtaining Q-linear convergence of the widely used forward–backward splitting method with Beck–Teboulle's line search. In this paper, we analyze the quadratic growth condition via second-order variational analysis for various structured optimization problems that arise in machine learning and signal processing. This includes, for example, the Poisson linear inverse problem as well as ℓ1-regularized optimization problems. As a by-product of this approach, we also obtain several full characterizations of the uniqueness of the optimal solution to the Lasso problem, which complements and extends recent important results in this direction.


Introduction
This paper is a continuation of our previous work [8], in which we studied convergence properties of the forward-backward splitting method (FBS in brief, also known as the proximal gradient method). The FBS [5,7,12,14,15,28,37] is a simple and efficient method for solving an optimization problem whose objective function is the sum of two convex functions: one of which is differentiable on its domain, and the other of which is proximal-friendly (that is, its proximal mapping can be easily computed) and may be non-differentiable. It is well known that FBS is globally convergent to an optimal solution with complexity o(k⁻¹) [7,9,19,40] in general settings. Linear convergence of FBS has been studied in many papers via the Kurdyka-Łojasiewicz inequality [10,25,26,31,49] or error bound conditions [21,38,41,53] originating from [34]. Without assuming the usual condition that the gradient of the differentiable function involved is globally Lipschitz continuous, our previous paper [8] studied convergence properties and the complexity of the FBS method with Beck-Teboulle's line search. In particular, under the so-called quadratic growth condition, also known as the 2-conditioned property, which is close to the idea in [21,25,26], we showed that the sequence generated by FBS with Beck-Teboulle's line search is Q-linearly convergent. Our derived linear rates complement and sometimes improve those in [21,25,26].
One of the main aims of this paper is to analyze the quadratic growth condition for several structured optimization problems. This allows us to understand the performance of FBS methods on specific optimization problems by exploiting their particular structure. In particular, we show that the quadratic growth condition is automatically satisfied for the standard Poisson inverse regularized problems with Kullback-Leibler divergence [16,47], which do not satisfy the usual global Lipschitz continuity assumption mentioned above. Using FBS to solve Poisson inverse regularized problems was first proposed in [4] via the idea of Bregman divergence. Recently, Salzo [40] proved that the FBS method with an appropriate line search enjoys a complexity of o(k⁻¹) when applied to Poisson inverse regularized problems. In this paper, we advance this direction by showing that the convergence rate of the FBS method with Beck-Teboulle's line search is indeed Q-linear when solving Poisson inverse regularized problems.
It is worth noting that linear convergence of the sequence generated by FBS on some structured optimization problems was also studied in [12,28,32,33,41] when the nonsmooth function is partly smooth relative to a manifold, by using the idea of finite support identification. The latter notion, introduced by Lewis [29], allows Liang, Fadili, and Peyré [32,33] to cover many important problems such as the total variation semi-norm, the ℓ1-norm, the ℓ∞-norm, and the nuclear norm problems. In their paper, a second-order condition was introduced to guarantee local Q-linear convergence of the FBS sequence under the non-degeneracy assumption [29]. When considering the ℓ1-regularized problem, we are able to avoid the non-degeneracy assumption. Under the setting of [8], this allows us to improve the well-known work of Hale, Yin, and Zhang [28] in two respects: (a) We completely drop the aforementioned non-degeneracy assumption. (b) Our second-order condition is strictly weaker than the one in [28, Theorem 4.10]. The wider view is that when considering particular optimization problems listed in the spirit of [32,33], the non-degeneracy assumption may not be necessary. Furthermore, we revisit the iterative shrinkage thresholding algorithm (ISTA) [7,18], which is indeed FBS applied to the Lasso problem [42]. It is well known that the complexity of this algorithm is O(k⁻¹); however, recent works [32,41] indicate local linear convergence of ISTA. The strongest conclusion in this direction was obtained recently by Bolte, Nguyen, Peypouquet, and Suter [10,25]: the iterative sequence of ISTA is R-linearly convergent, and its corresponding cost sequence is globally Q-linearly convergent, although the rate may depend on the initial point. Inspired by these achievements, we provide two new results under the setting of [8]: (c) The iterative sequence of ISTA is indeed globally Q-linearly convergent.
(d) The iterative sequence of ISTA is eventually Q-linearly convergent to an optimal solution with a uniform rate that does not depend on the initial point.
In order to obtain the linear convergence of ISTA, several papers make the assumption that the optimal solution to Lasso is unique; see, e.g., [12,24,28,41]. Although solution uniqueness is not necessary, as discussed above, it is an important property with immediate implications for recovering sparse signals in compressed sensing; see, e.g., [13,23,24,27,36,43,44,48,50,51] and the references therein. As a direct consequence of our analysis on the 1 -regularized problem, we fully characterize solution uniqueness to Lasso. To the best of our knowledge, Fuchs [23] initialized this direction by introducing a simple sufficient condition for this property, which has been extended in other cited papers. Then, in [43], Tibshirani showed that a sufficient condition closely related to Fuchs' condition is also necessary almost everywhere. The full characterization for this property has been obtained recently in [50,51] by using results of strong duality in linear programming. This characterization, which is based on an existence of a vector satisfying a system of linear equations and inequalities, allows [50,51] to recover the aforementioned sufficient conditions and provide some situations in which these conditions turn necessary. Some related results have been developed in [27,36]. Our approach to solution uniqueness is new and different. We also derive several new full characterizations in terms of positively linear independence and Slater-type conditions, which can be easily verifiable.
The outline of our paper is as follows. Section 2 briefly presents some second-order characterizations of the quadratic growth condition in terms of the subgradient graphical derivative [39] and recalls some convergence analysis from our part I [8]. Section 3 is devoted to the study of the quadratic growth condition in some structured optimization problems, involving Poisson inverse regularized, ℓ1-regularized, and ℓ1-regularized least squares optimization problems. In Sect. 4, we obtain several new full characterizations of the uniqueness of the optimal solution to the Lasso problem. Section 5 gives some conclusions and potential future work in this direction.

Preliminary Results on Metric Subregularity of the Subdifferential and Quadratic Growth Condition
Throughout the paper, R^n is the usual Euclidean space of dimension n, where ‖·‖ and ⟨·,·⟩ denote the corresponding Euclidean norm and inner product on R^n. We use Γ₀(R^n) to denote the set of proper, lower semicontinuous, convex functions on R^n. Given h ∈ Γ₀(R^n) and x̄ with 0 ∈ ∂h(x̄), we say h satisfies the quadratic growth condition at x̄ with modulus κ > 0 if there exists ε > 0 such that

h(x) ≥ h(x̄) + (κ/2) d(x; (∂h)⁻¹(0))²  for all x ∈ B_ε(x̄).   (2)

Here, for a set S, d(x; S) denotes the distance from x to S, and B_ε(x̄) denotes the ball centered at x̄ with radius ε. Moreover, if (2) holds and (∂h)⁻¹(0) = {x̄}, then we say the strong quadratic growth condition holds for h at x̄ with modulus κ. Relationships between the quadratic growth condition and the so-called metric subregularity of the subdifferential can be found in [1-3,10,22], even for nonconvex functions. The quadratic growth condition (2) is also called the quadratic functional growth property in [38] when h is continuously differentiable over a closed convex set. In [25,26], h is said to be 2-conditioned on B_ε(x̄) if it satisfies the quadratic growth condition (2).
The following proposition, a slight improvement of [2, Corollary 3.7], provides a useful characterization of the strong quadratic growth condition via the subgradient graphical derivative [39, Chapter 13].

Proposition 2.1 (Characterization of strong quadratic growth condition) Let h ∈ Γ₀(R^n) and let x̄ be an optimal solution, i.e., 0 ∈ ∂h(x̄). The following are equivalent: (i) h satisfies the strong quadratic growth condition at x̄.
where D(∂h)(x̄|0) : R^n ⇒ R^n is the subgradient graphical derivative of ∂h at x̄ for 0, defined by

Moreover, if (ii) is satisfied then
with the convention 0/0 = ∞, and h satisfies the strong quadratic growth condition at x̄ with any modulus κ smaller than this quantity. Next, let us recall some main results from our part I [8] regarding the convergence of the forward-backward splitting method (FBS) for solving the optimization problem (5), where f, g : R^n → R ∪ {∞} are proper, lower semicontinuous, and convex functions. The standing assumptions A1 and A2 on the initial data for (5) are used throughout the paper. The forward-backward splitting method for solving (5) is described by (6), with the proximal operator prox_g : R^n → dom g and the stepsize α_k > 0 determined from Beck-Teboulle's line search: Linesearch BT (Beck-Teboulle's line search) Given σ > 0 and θ ∈ (0, 1), with α_{-1} := σ and x_0 ∈ int(dom f) ∩ dom g. In [8, Proposition 3.1 and Corollary 3.1], we show that the line search above terminates after finitely many steps, that the FBS sequence (x_k)_{k∈N} ⊂ int(dom f) ∩ dom g is well defined, and thus that f is differentiable at each x_k by assumption A2. The global convergence result [8, Theorem 3.1] is recalled here.
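To fix ideas, the iteration (6) with a backtracking line search can be sketched as follows. This is a simplified Python sketch: the function names are ours, and the sufficient-decrease test below is one common variant of a Beck-Teboulle-style backtracking; the precise Linesearch BT of [8] may differ in its constants and in how the trial step is initialized.

```python
import numpy as np

def prox_l1(v, t):
    # proximal mapping of t*||.||_1: componentwise soft-thresholding
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fbs_linesearch(f, grad_f, prox_g, x0, sigma=1.0, theta=0.5,
                   tol=1e-10, max_iter=1000):
    """Forward-backward splitting with a backtracking line search:
    the trial step alpha is shrunk by theta until the standard
    quadratic upper-bound (sufficient-decrease) test holds."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        gx = grad_f(x)
        alpha = sigma                      # trial step
        while True:
            x_new = prox_g(x - alpha * gx, alpha)
            d = x_new - x
            # test: f(x+) <= f(x) + <grad f(x), d> + ||d||^2 / (2 alpha)
            if f(x_new) <= f(x) + gx @ d + (d @ d) / (2 * alpha) + 1e-15:
                break
            alpha *= theta
        x = x_new
        if np.linalg.norm(d) <= tol:
            break
    return x
```

For instance, on a tiny instance with f(x) = ½‖x − b‖² and g = ‖·‖₁, the scheme recovers the soft-thresholded point prox_l1(b, 1).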

Theorem 2.1 (Global convergence of FBS method)
Let (x_k)_{k∈N} be the sequence generated by the FBS method. Suppose that the solution set is nonempty. Then, (x_k)_{k∈N} converges to an optimal solution. Moreover, (F(x_k))_{k∈N} converges to the optimal value.
When the cost function F satisfies the quadratic growth condition and ∇f is locally Lipschitz continuous, [8, Theorem 4.1] shows that both the iterate and cost sequences of FBS are Q-linearly convergent.

Theorem 2.2 (Q-linear convergence under quadratic growth condition) Let (x_k)_{k∈N} be the sequence generated by the FBS method. Suppose that the optimal solution set S* of problem (5) is nonempty, and let x* ∈ S* be the limit point of (x_k)_{k∈N}. Suppose further that ∇f is locally Lipschitz continuous around x* with constant L > 0. If F satisfies the quadratic growth condition at x* with modulus κ > 0, then there exists K ∈ N such that (x_k)_{k∈N} and (F(x_k))_{k∈N} converge Q-linearly to x* and F(x*) for all k > K.

Corollary 2.1 (Sharper Q-linear convergence rate under strong quadratic growth condition) Let (x_k)_{k∈N} be the sequence generated by the FBS method. Suppose that the solution set S* is nonempty, and let x* ∈ S* be the limit point of (x_k)_{k∈N} as in Theorem 2.1. Suppose further that ∇f is locally Lipschitz continuous around x* with constant L > 0. If F satisfies the strong quadratic growth condition at x* with modulus κ > 0, then there exists some K ∈ N such that the sharper linear estimates hold for any k > K. Additionally, if ∇f is globally Lipschitz continuous on int(dom f) ∩ dom g with constant L > 0, the stepsize α above can be chosen as min{σ, θ/L}.

Poisson Linear Inverse Problem
This subsection is devoted to the eventual linear convergence of FBS when solving the standard Poisson regularized problem (10) of [16,47], where A ∈ R^{m×n}_+ is an m × n matrix with nonnegative entries and nontrivial rows, and b ∈ R^m_{++} is a positive vector. This problem is usually used to recover a signal x ∈ R^n_+ from the measurement b corrupted by Poisson noise, with Ax ≈ b. Problem (10) can be written in the form (5) with f and g given by (11), where h is the Kullback-Leibler divergence defined by (12). Note from (11) and (12) that f is continuously differentiable at any point of dom f ∩ dom g. The standing assumptions A1 and A2 are satisfied for problem (10). Moreover, since the function F_1 is bounded below and coercive, the optimal solution set of problem (10) is always nonempty.
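For concreteness, the data term and its gradient can be sketched in Python, assuming the standard form of the Kullback-Leibler divergence h(y) = Σᵢ bᵢ log(bᵢ/yᵢ) + yᵢ − bᵢ used for Poisson inverse problems, so that ∇f(x) = Aᵀ(e − b/(Ax)) on the interior of the domain; the function names are ours.

```python
import numpy as np

def kl_objective(x, A, b):
    """Kullback-Leibler data term f(x) = sum_i b_i log(b_i/(Ax)_i) + (Ax)_i - b_i,
    finite only when Ax > 0 (a sketch of the data term of problem (10))."""
    y = A @ x
    if np.any(y <= 0):
        return np.inf
    return float(np.sum(b * np.log(b / y) + y - b))

def kl_gradient(x, A, b):
    # grad f(x) = A^T (1 - b / (Ax)), valid on the interior of dom f
    y = A @ x
    return A.T @ (1.0 - b / y)
```

Note that the objective is minimized (with value 0) whenever Ax = b, and its gradient is not globally Lipschitz: the term b/(Ax) blows up as Ax approaches the boundary of the positive orthant, which is exactly why the line-search framework of [8] is needed here.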
It is worth noting further that ∇f is locally Lipschitz continuous at any point of int(dom f) ∩ dom g but not globally Lipschitz continuous on int(dom f) ∩ dom g. [40, Section 4] and our [8, Theorem 3.2] show that FBS is applicable to solving (10) with global convergence rate o(1/k). In the recent work [4], a new variant of FBS was designed with applications to solving (10). However, the theory developed in [4] cannot guarantee the global convergence of the sequence (x_k)_{k∈N} generated by that algorithm when solving (10), because the closedness assumption on the domain of the auxiliary Legendre function in [4, Theorem 2] is not satisfied for (10). Our intent here is to reveal the Q-linear convergence of our method when solving (10) in the sense of Theorem 2.2. To do so, we need to verify the quadratic growth condition for F_1 at any optimal solution. Note further that the Kullback-Leibler divergence h is not strongly convex and ∇f is not globally Lipschitz continuous; hence, the standing assumptions in [21] are not satisfied, and proving the quadratic growth condition for F_1 at an optimal solution via the approach of [21] must proceed with caution.

Lemma 3.1 Let x̄ be an optimal solution to problem (10). Then, for any R > 0, the quadratic growth inequality (13) holds with some constant ν > 0.
Proof Pick any R > 0 and x ∈ B_R(x̄). We only need to prove (13) in the nontrivial case. By convexity, subgradient inequalities hold at x̄ and at x; adding these two inequalities gives (14). We claim that the optimal solution set S* to problem (10) satisfies (16). Pick another optimal solution ū ∈ S*; then ū_t := x̄ + t(ū − x̄) ∈ S* ⊂ dom f for any t ∈ [0, 1] due to the convexity of S*. By choosing t sufficiently small, we obtain (15). Since ∂g is a monotone operator, this together with (15) tells us that ⟨a_i, x̄ − ū_t⟩ = 0 for all i = 1, ..., m. Hence, Ax̄ = Aū = ȳ for any ū ∈ S*, which also implies (17). This verifies the inclusion "⊂" in (16). The opposite inclusion is trivial. Indeed, take any u satisfying Au = ȳ and −∇f(x̄) ∈ ∂g(u); similarly to (17), we obtain 0 ∈ ∇f(u) + ∂g(u), i.e., u ∈ S*. The proof of equality (16) is complete. Note from (16) that the optimal solution set S* is a polyhedron of the form (18). Thanks to Hoffman's lemma, there exists a constant γ > 0 such that (19) holds. Fix any x ∈ B_R(x̄) ∩ R^n_+; then (14) together with (19) implies (13), where the fourth inequality follows from the elementary inequality (a+b)²/2 ≤ a² + b² with a, b ≥ 0, and the last inequality is from (18).
When applying FBS to problem (10), the iteration takes the form (20), where P_{R^n_+}(·) is the projection onto R^n_+.

Corollary 3.1 (Q-linear convergence of method (20)) Let (x_k)_{k∈N} be the sequence generated by (20) with x_0 ∈ A⁻¹(R^m_{++}) ∩ R^n_+ for solving the Poisson regularized problem (10). Then, the sequences (x_k)_{k∈N} and (F_1(x_k))_{k∈N} are Q-linearly convergent to an optimal solution and to the optimal value of (10), respectively.

Proof Since both functions f and g in problem (10) satisfy our standing assumptions A1 and A2, and problem (10) always has optimal solutions, the sequence (x_k)_{k∈N} converges to an optimal solution x̄ of problem (10) by Theorem 2.1. Since ∇f is locally Lipschitz continuous around x̄, the combination of Theorem 2.2 and Lemma 3.1 tells us that (x_k)_{k∈N} is Q-linearly convergent to x̄.
Using a similar line of argument, one can show that the quadratic growth condition in Lemma 3.1 is also valid for the Poisson inverse problem with sparse regularization (21) of [4], where μ > 0 is the penalty parameter. Indeed, noting that ‖x‖₁ = ⟨e, x⟩ for x ∈ R^n_+, with e = (1, 1, ..., 1) ∈ R^n, the objective function of (21) can be written as p(x) + g(x), where p(x) := f(x) + μ⟨e, x⟩ and f, g are given as in (11). Then, the FBS method for solving (21) proceeds by replacing the function f in (11) by p. Let x̂ ∈ dom p = dom f be a minimizer of (21). Observe that the function p also satisfies an inequality similar to (14). As (14) plays the central role in the proof of Lemma 3.1, we can repeat all the steps of that proof, replacing the function f there by p and x̄ by x̂, to prove the quadratic growth condition for problem (21). Together with Corollary 3.1, this shows that FBS solves (21) linearly.
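Since g is the indicator of R^n_+ in both (10) and (21), the FBS step (20) is just a projected gradient step. A minimal Python sketch (our own names; a constant step is used instead of the line search, and iterates are assumed to stay in the interior of dom f, as the line search guarantees in the paper):

```python
import numpy as np

def projected_gradient_poisson(A, b, x0, step=0.5, mu=0.0, max_iter=500):
    """Scheme (20): x_{k+1} = P_{R^n_+}(x_k - a_k * grad f(x_k)).
    Setting mu > 0 adds the sparsity term mu*<e, x> of problem (21),
    which merely shifts the gradient by mu*e."""
    x = x0.astype(float)
    for _ in range(max_iter):
        y = A @ x
        grad = A.T @ (1.0 - b / y) + mu   # KL gradient (+ mu*e when mu > 0)
        x = np.maximum(x - step * grad, 0.0)  # projection onto R^n_+
    return x
```

On a consistent instance (b in the range of A restricted to the positive orthant), the iterates approach a point with Ax = b, in line with the Q-linear convergence asserted in Corollary 3.1.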

ℓ1-Regularized Optimization Problems
In this section, we consider the ℓ1-regularized optimization problem (22), where ‖x‖₁ denotes the ℓ1-norm of x.
In order to use Proposition 2.1 to characterize the strong quadratic growth condition for F_2, we need the following calculation of the subgradient graphical derivative of ∂(μ‖·‖₁).
Proof For any x ∈ R^n, note that (24) holds, where sgn : R → {−1, 1} is the sign function. Take any v ∈ D∂(μ‖·‖₁)(x*|s̄)(u); then there exist sequences t_k ↓ 0 and (u^k, v^k) → (u, v) such that (x*, s̄) + t_k(u^k, v^k) ∈ gph ∂(μ‖·‖₁). Let us consider three partitions of the index j described below: Partition 1.1 j ∉ I, i.e., |s̄_j| < μ. It follows from (24) that x*_j = 0. For sufficiently large k, we have |(s̄ + t_k v^k)_j| < μ and thus (x* + t_k u^k)_j = 0 by (24) again. Hence, u^k_j = 0, which implies that u_j = 0 for all j ∉ I. Partition 1.2 j ∈ J, i.e., |s̄_j| = μ and x*_j ≠ 0. When k is sufficiently large, we have (x* + t_k u^k)_j ≠ 0 and derive from (24) that (s̄ + t_k v^k)_j = μ sgn(x* + t_k u^k)_j = s̄_j, which implies that v_j = 0 for all j ∈ J. Partition 1.3 j ∈ K, i.e., |s̄_j| = μ and x*_j = 0. If there is a subsequence of (x*, s̄)_j + t_k(u^k, v^k)_j (without relabeling) such that |(s̄ + t_k v^k)_j| < μ = |s̄_j|, we have s̄_j v^k_j < 0 and (x* + t_k u^k)_j = 0 by (24). It follows that u^k_j = 0. Letting k → ∞, we obtain u_j = 0 and s̄_j v_j ≤ 0. Otherwise, we find some L > 0 such that |(s̄ + t_k v^k)_j| = μ = |s̄_j| for all k > L, which yields v^k_j = 0. Taking k → ∞ gives us v_j = 0. Furthermore, by (24) again, we have s̄_j (x* + t_k u^k)_j ≥ 0, which implies that s̄_j u_j ≥ 0 after passing to the limit k → ∞.
Combining the conclusions in the three cases above gives us that u ∈ H(x*) and also verifies the inclusion "⊂" in (23). To justify the converse inclusion "⊃", take u ∈ H(x*) and any v ∈ R^n with v_j = 0 for j ∈ J and u_j v_j = 0, s̄_j v_j ≤ 0 for j ∈ K. For any t_k ↓ 0, we prove that (x*, s̄) + t_k(u, v) ∈ gph ∂(μ‖·‖₁) and thus verify that v ∈ D∂(μ‖·‖₁)(x*|s̄)(u). For any t ∈ R, define the set-valued mapping SGN(t) := {sgn(t)} if t ≠ 0 and [−1, 1] if t = 0. Similarly to the proof of the "⊂" inclusion, we consider three partitions of j as follows: Partition 2.1 j ∉ I, i.e., |s̄_j| < μ. Since u ∈ H(x*), we have u_j = 0. Note also that x*_j = 0. Hence, we get (x* + t_k u)_j = 0 and (s̄ + t_k v)_j ∈ [−μ, μ] when k is sufficiently large, which means (s̄ + t_k v)_j ∈ μ SGN(x* + t_k u)_j. Partition 2.2 j ∈ J, i.e., |s̄_j| = μ and x*_j ≠ 0. Since v_j = 0, we have (s̄ + t_k v)_j = s̄_j = μ sgn(x*_j) and (x* + t_k u)_j ≠ 0 with the same sign as x*_j when k is large. It follows that (s̄ + t_k v)_j ∈ μ SGN(x* + t_k u)_j. Partition 2.3 j ∈ K, i.e., |s̄_j| = μ and x*_j = 0. If u_j = 0, we have (x* + t_k u)_j = 0 and |(s̄ + t_k v)_j| ≤ |s̄_j| = μ for sufficiently large k, since s̄_j v_j ≤ 0. If u_j ≠ 0, we have v_j = 0 and (s̄ + t_k v)_j = s̄_j = μ sgn(u_j) = μ sgn(x* + t_k u)_j when k is large, since u_j s̄_j ≥ 0. In both cases, we have (s̄ + t_k v)_j ∈ μ SGN(x* + t_k u)_j.
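Collecting the three partitions from both directions of the proof, the graphical derivative can be summarized compactly as follows (our restatement of formula (23), using the index sets I, J, K above):

```latex
v \in D\partial(\mu\|\cdot\|_1)(x^*\,|\,\bar s)(u)
\iff
\begin{cases}
u_j = 0, & j \notin I \ \ (|\bar s_j| < \mu),\\[2pt]
v_j = 0, & j \in J \ \ (|\bar s_j| = \mu,\ x^*_j \neq 0),\\[2pt]
u_j v_j = 0,\ \ \bar s_j u_j \ge 0,\ \ \bar s_j v_j \le 0, & j \in K \ \ (|\bar s_j| = \mu,\ x^*_j = 0).
\end{cases}
```

Coordinates outside I constrain u only, coordinates in J constrain v only, and coordinates in K carry the complementarity conditions; this is the structure exploited in Theorem 3.1 below.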
As a consequence, we establish a characterization of the strong quadratic growth condition for F_2.
Theorem 3.1 (Characterization of strong quadratic growth condition for F 2 ) Let x * be an optimal solution to problem (22). Suppose that ∇ f is differentiable at x * .
Then, the following statements are equivalent: (i) F 2 satisfies the strong quadratic growth condition at x * .
Moreover, if (25) is satisfied, then F_2 satisfies the strong quadratic growth condition with any positive modulus κ smaller than the quantity in (27), with the convention 0/0 = ∞.
Proof First let us verify the equivalence between (i) and (ii) by using Proposition 2.1. Indeed, for any v ∈ D(∂F_2)(x*|0)(u), we get from the sum rule [20, Proposition 4A.2] that v − ∇²f(x*)u ∈ D∂(μ‖·‖₁)(x*|−∇f(x*))(u). This tells us that (25) is the same as (3) when h = F_2. By Proposition 2.1, (i) and (ii) are equivalent, and F_2 satisfies the strong quadratic growth condition with any positive modulus κ smaller than the quantity in (27). Finally, the equivalence between (ii) and (iii) is trivial due to the fact that f is convex and thus H_E(x*) is positive semi-definite.

Corollary 3.2 (Linear convergence of FBS method for ℓ1-regularized problems)
Let (x_k)_{k∈N} be the sequence generated by the FBS method for problem (22). Suppose that the solution set S* is nonempty, that (x_k)_{k∈N} converges to some x* ∈ S*, and that f is C² around x*. If condition (25) holds, then (x_k)_{k∈N} and (F_2(x_k))_{k∈N} are Q-linearly convergent to x* and F_2(x*), respectively, with rates determined in Corollary 2.1, where κ is any positive number smaller than the quantity in (27).

Proof Since f is C² around x*, ∇f is locally Lipschitz continuous around x*. The result follows from Corollary 2.1 and Theorem 3.1.

Note that condition (26) is strictly weaker than the assumption used in [28] that H_E(x*) has full rank to obtain the linear convergence of FBS for (22). Indeed, let us consider the case n = 2, μ = 1, and f(x_1,

Global Q-linear Convergence of ISTA on Lasso Problem
In this section, we study the linear convergence of ISTA for the Lasso problem

min_{x∈R^n} F_3(x) := (1/2)‖Ax − b‖² + μ‖x‖₁,   (29)

where A is an m × n real matrix and b is a vector in R^m.
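Recall that ISTA is exactly the FBS iteration for (29), whose proximal step is soft-thresholding. A self-contained Python sketch with the constant step 1/λ_max(AᵀA) (our own names; this is the textbook constant-step variant, not the line-search version analyzed in the paper):

```python
import numpy as np

def ista(A, b, mu, x0=None, max_iter=2000, tol=1e-12):
    """ISTA for the Lasso problem min 0.5*||Ax-b||^2 + mu*||x||_1:
    gradient step on the least-squares term followed by soft-thresholding."""
    m, n = A.shape
    x = np.zeros(n) if x0 is None else x0.astype(float)
    L = np.linalg.eigvalsh(A.T @ A).max()   # Lipschitz constant of the gradient
    alpha = 1.0 / L
    for _ in range(max_iter):
        grad = A.T @ (A @ x - b)
        z = x - alpha * grad
        x_new = np.sign(z) * np.maximum(np.abs(z) - alpha * mu, 0.0)
        if np.linalg.norm(x_new - x) <= tol:
            return x_new
        x = x_new
    return x
```

For A = I the iteration converges in one step to the soft-thresholded data, which is a convenient sanity check.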
The following lemma taken from [10, Lemma 10] plays an important role in our proof.

Lemma 3.2 (Global error bound) Fix any R > ‖b‖²/(2μ) and suppose that x* is an optimal solution to problem (29). Then, the error bound (30) holds, where ν is the Hoffman constant defined in [10, Definition 1] and depends only on the initial data A, b, μ.
Global R-linear convergence of (x_k)_{k∈N} generated by ISTA and Q-linear convergence of (F_3(x_k))_{k∈N} for solving the Lasso problem were obtained in [25, Theorem 4.2 and Remark 4.3] and also in [26, Theorem 4.8]. Here, we add another feature: the iterative sequence (x_k)_{k∈N} is also globally Q-linearly convergent.

Theorem 3.2 (Global Q-linear convergence of ISTA)
Let (x_k)_{k∈N} be the sequence generated by ISTA for problem (29) that converges to an optimal solution x* ∈ S*. Then, (x_k)_{k∈N} and (F_3(x_k))_{k∈N} are globally Q-linearly convergent to x* and F_3(x*), respectively, in the sense that (31) and (32) hold for all k ∈ N, where R is any number bigger than ‖x_0‖ + ‖b‖²/μ and γ_R is given as in (30).

Proof Note that Lasso always has optimal solutions. With x* ∈ S*, we obtain (31) and (32) with α = min{σ, θ/λ_max(AᵀA)}, noting that λ_max(AᵀA) is the global Lipschitz constant of the gradient of (1/2)‖Ax − b‖². The proof of (31) and (32) is quite similar to that of (8) and (9).

- For the sequence (x_k)_{k∈N} generated by ISTA, we first note that the Q-linear convergence derived in Theorem 3.2 holds with the explicit rate in (31); this result is new to the literature. In [10, Theorem 25 and Remark 26], R-linear convergence of this sequence via γ_R was obtained. In the case of constant step size, by choosing σ appropriately and θ = 1, we have α_k = α = σ; see [8, Remark 4.1].
In this case, using [8, Proposition 4.1(i)] and Lemma 3.2, one can deduce an R-linear rate for (x_k)_{k∈N} that is sharper than the corresponding R-linear rate given in [10].
- For the R-linear convergence rate of (F_3(x_k))_{k∈N}: from [8, Proposition 4.1(ii)] and Lemma 3.2, one can deduce a rate sharper than the one derived in [10, Remark 26]. However, the Q-linear rate for (F_3(x_k))_{k∈N} obtained by combining [25, Theorem 4.2(iii)] and Lemma 3.2 is better than our rate given in (32); see also [8, Remark 4.1] for related comparisons. How to improve the Q-linear convergence rate for (F_3(x_k))_{k∈N} is an interesting direction for future research.
Observe further that the linear rates in Theorem 3.2 depend on the initial point x_0; see also [26, Theorem 4.8]. Next, we show that the local linear rates around optimal solutions are uniform and independent of the choice of x_0.

Corollary 3.3 (Local Q-linear convergence of ISTA with uniform rate) Let (x_k)_{k∈N} be the sequence generated by ISTA for problem (29) that converges to an optimal solution x* ∈ S*. Then, (31) and (32) are satisfied when k is sufficiently large, where α = min{σ, θ/λ_max(AᵀA)} and R is any number bigger than ‖b‖²/(2μ).
Proof Note from the proof of Theorem 3.2 that ‖x*‖ ≤ ‖b‖²/(2μ) < R. By Lemma 3.2, there exists some ε ∈ (0, R − ‖x*‖) such that the quadratic growth condition holds at x* on B_ε(x*). The corollary follows directly from the second part of Theorem 2.2.

Discussions on Nuclear Norm Regularized Least Squares Optimization Problems
Another important optimization problem, which has received a lot of attention, is the so-called nuclear norm regularized least squares optimization problem (34). Here, A : R^{p×q} → R^{m×n} is a linear operator, B ∈ R^{m×n}, and ‖X‖_* is the nuclear norm of X, defined as the sum of its singular values. Similar to the development in [8], Q-linear convergence can be derived by assuming (strong) quadratic growth conditions. On the other hand, the following example shows that, in contrast to the Lasso problem (29) studied in Sect. 3.3, the (strong) quadratic growth condition is no longer automatically true for the nuclear norm regularized least squares optimization problem (34), even when the underlying problem admits a unique solution.
Example 3.1 (Failure of quadratic growth condition for nuclear norm regularized optimization problems) Consider the optimization problem (35), which is a particular case of (34) with A(X) = (X_11 + X_22, X_12 − X_21 + X_22) for X ∈ R^{2×2} and μ = 1. For X = (a b; c d), letting σ_1 and σ_2 be the singular values of X, one can check that optimality forces b = c = d = 0 and a = 1. Thus, X̄ = (1 0; 0 0) is the unique optimal solution to problem (35). Choose X_ε with entries built from ε and ε^{1.5} for ε > 0 sufficiently small, and note that the objective gap at X_ε is of strictly smaller order than ‖X_ε − X̄‖². This tells us that X̄ does not satisfy the strong quadratic growth condition for (35). Since X̄ is the unique solution, we also see that the quadratic growth condition (2) fails at X̄.

Remark 3.3
Moreover, by setting X_0 as the identity matrix, σ = 1, and θ = 1/2, we solved problem (35) numerically by FBS (6) with Beck-Teboulle's line search and stored the quotients δ_k and η_k of consecutive errors of the iterates and the cost values, respectively. After 276 iterations, both δ_k and η_k are close to 1 with error 10⁻¹⁴. This suggests that Q-linear convergence is unlikely to occur for either sequence.

The quadratic growth condition for the nuclear norm regularized problem was studied in [53] under the nondegeneracy condition 0 ∈ ri ∂h(X̄), where ri ∂h(X̄) is the relative interior of ∂h(X̄). In general, although the nondegeneracy condition is an important property in matrix optimization, it can be restrictive for some applications. Without assuming the nondegeneracy condition for (34), the strong quadratic growth condition can be used to guarantee the linear convergence of FBS as in Corollary 2.1 and Sect. 3.2. The strong quadratic growth condition for problem (34) can be characterized via second-order analysis of the nuclear norm [17,52]. On the other hand, the corresponding characterizations are highly non-trivial and are presented in a rather complicated form that may not be easy to verify in general. Obtaining easily verifiable and computationally tractable conditions ensuring the (strong) quadratic growth condition for nuclear norm regularized optimization problems, or more generally for matrix optimization problems, deserves a separate study and is beyond the scope of the current paper.

Uniqueness of Optimal Solution to ℓ1-Regularized Least Squares Optimization Problems
As discussed in Sect. 1, the linear convergence of ISTA for Lasso was sometimes obtained by imposing the additional assumption that Lasso has a unique optimal solution x*; see, e.g., [41]. Since F_3 satisfies the quadratic growth condition at x* (Lemma 3.2), the uniqueness of x* is equivalent to the strong quadratic growth condition of F_3 at x*. This observation, together with Theorem 3.1, allows us to characterize the uniqueness of the optimal solution to Lasso in the next result. A different characterization of this property can be found in [51, Theorem 2.1]. Suppose that x* is an optimal solution, which means −Aᵀ(Ax* − b) ∈ ∂(μ‖·‖₁)(x*). In the spirit of Proposition 3.1 with s̄ := −Aᵀ(Ax* − b), we have J = {j ∈ {1, ..., n} : x*_j ≠ 0} =: supp(x*). Furthermore, given an index set I ⊂ {1, ..., n}, we denote by A_I the submatrix of A formed by its columns A_i, i ∈ I, and by x_I the subvector of x ∈ R^n formed by x_i, i ∈ I. For any x ∈ R^n, we also define sign(x) := (sign(x_1), ..., sign(x_n))ᵀ and Diag(x) as the square diagonal matrix with main diagonal entries x_1, x_2, ..., x_n. In condition (ii) of Theorem 4.1, the corresponding homogeneous system (42) is required to have the unique solution u = (u_J, u_K) = 0. Turning to Fuchs' condition [23]: the first equality (45) indeed tells us that x* is an optimal solution to the Lasso problem; inequality (46) means that E = J, i.e., K = ∅ in Theorem 4.1; and the full-rank condition (47) is also present in our characterizations. Hence, Fuchs' condition implies (iii) in Theorem 4.1 and is clearly not a necessary condition for the uniqueness of the optimal solution to the Lasso problem, since in many situations the set K is not empty.
Furthermore, in the recent work [43], Tibshirani shows that the optimal solution x* to problem (29) is unique when the matrix A_E has full column rank. This condition is sufficient for our condition (ii) in Theorem 4.1. Indeed, if (x_J, x_K) satisfies system (42) in (ii), we have A_E x_J − Q_K x_K = 0, which implies that x_J = 0 and Q_K x_K = 0 when ker A_E = {0}. Since Q_K is invertible, the latter tells us that x_J = 0 and x_K = 0, which clearly verifies (ii). Tibshirani's condition is also necessary for the uniqueness of the optimal solution to the Lasso problem for almost all b in (29), but not for all b; a concrete example can be found in [51].
In the recent works [50,51], the following useful characterization of solution uniqueness for Lasso has been established under mild assumptions: there exists y ∈ R^m satisfying Aᵀ_J y = sign(x*_J) and ‖Aᵀ_K y‖_∞ < 1, and A_J has full column rank.
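This certificate can be tested numerically. The sketch below is our own helper, not code from [50,51]: it computes the least-squares y solving Aᵀ_J y = sign(x*_J) and checks the strict inequality and the rank condition. Since only this one candidate y is examined, a negative answer is inconclusive, while a positive answer certifies the condition.

```python
import numpy as np

def check_uniqueness_certificate(A, x_star, J, K, tol=1e-9):
    """Check: exists y with A_J^T y = sign(x*_J), ||A_K^T y||_inf < 1,
    and A_J of full column rank, using the least-squares candidate y."""
    A_J, A_K = A[:, J], A[:, K]
    s = np.sign(x_star[J])
    y, *_ = np.linalg.lstsq(A_J.T, s, rcond=None)  # min-norm solution
    eq_ok = np.allclose(A_J.T @ y, s, atol=tol)
    full_rank = np.linalg.matrix_rank(A_J) == A_J.shape[1]
    strict = (A_K.T @ y).size == 0 or np.max(np.abs(A_K.T @ y)) < 1 - tol
    return bool(eq_ok and full_rank and strict)
```

For instance, for A = I, b = (3, 0.5)ᵀ, μ = 1, the solution x* = (2, 0)ᵀ has J = {1} and K = ∅, and the certificate holds.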
It remains open to connect this condition directly to those in Theorem 4.1, although they must be logically equivalent under the assumptions required in [50,51]. However, our approach via second-order variational analysis is completely different and also provides several new characterizations of the uniqueness of the optimal solution to Lasso. It is also worth mentioning that the standing assumption in [51] that A has full row rank is relaxed in our study.

Conclusion
In this paper, we analyzed quadratic growth conditions for some structured optimization problems using second-order variational analysis. This allowed us to establish the Q-linear convergence of FBS for:
- Poisson regularized optimization problems and Lasso problems, with no assumptions on the initial data;
- ℓ1-regularized optimization problems, under mild second-order conditions.
As a by-product, we also obtain full characterizations of the uniqueness of the optimal solution to the Lasso problem, which complements and extends recent important results in the literature. Our results point to several interesting research questions, particularly extending the approach of this paper to matrix optimization problems such as nuclear norm regularized optimization problems.
- Firstly, as we have seen in Example 3.1, for the nuclear norm regularized optimization problem (34), the (strong) quadratic growth condition can fail even for problems with unique solutions. Thus, there is a gap between the uniqueness of the solution and the strong quadratic growth condition for (34). How to characterize this gap for the nuclear norm regularized optimization problem or, more generally, for matrix optimization problems would be an important research topic to investigate. In particular, solution uniqueness for problem (34) has been characterized via the so-called descent cone [13]. Evaluating the descent cone of the nuclear norm would help us better understand solution uniqueness for (34) and the gap between solution uniqueness and the strong quadratic growth condition for (34).
- Secondly, what is the tightest possible complexity of FBS in solving the nuclear norm minimization problem? Certainly, the complexity is at least o(1/k), as studied in [7-9,40]. But FBS may fail to exhibit linear convergence when the quadratic growth condition fails, as discussed in Remark 3.3. Due to the algebraic structure of the nuclear norm, it is natural to conjecture that the complexity is O(1/k^β) for some β > 1. Finding the optimal β is another research direction that deserves further study.