
Minimization of Non-smooth, Non-convex Functionals by Iterative Thresholding

Journal of Optimization Theory and Applications

Abstract

Convergence analysis is carried out for a forward–backward splitting/generalized gradient projection method for a special class of non-smooth, genuinely non-convex minimization problems in infinite-dimensional Hilbert spaces. The functionals under consideration are the sum of a smooth, possibly non-convex functional and a non-smooth, necessarily non-convex functional. For separable constraints in the sequence space, we show that the generalized gradient projection method amounts to a discontinuous iterative thresholding procedure that is easy to implement. In this case we prove strong subsequential convergence and, moreover, show that the limit satisfies strengthened necessary conditions for a global minimizer, i.e., it avoids a certain set of non-global minimizers. Finally, the method is applied to problems arising in the recovery of sparse data, where strong convergence of the whole sequence is shown, and numerical tests are presented.
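As a concrete illustration of the thresholding iteration for sparse recovery, the following sketch applies the forward-backward idea to the \(\ell^0\)-penalized least-squares problem (the setting of [22]); the problem sizes, penalty weight `alpha`, and step size `tau` are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def iterative_hard_thresholding(A, b, alpha, tau, iterations=500):
    """Forward-backward sketch for min_x 0.5*||Ax - b||^2 + alpha*||x||_0.

    A gradient step on the smooth part is followed by the (discontinuous)
    hard-thresholding operator, which realizes the proximal map of
    alpha*||.||_0 when tau*||A^T A|| < 1.
    """
    x = np.zeros(A.shape[1])
    thresh = np.sqrt(2 * tau * alpha)            # jump location of the prox
    for _ in range(iterations):
        z = x - tau * A.T @ (A @ x - b)          # forward (gradient) step
        x = np.where(np.abs(z) > thresh, z, 0.0) # backward (thresholding) step
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100)
x_true[[3, 17, 42]] = [2.0, -3.0, 1.5]
b = A @ x_true
tau = 0.9 / np.linalg.norm(A, 2) ** 2            # ensures tau*||A^T A|| < 1
x = iterative_hard_thresholding(A, b, alpha=0.05, tau=tau)
```

With this step-size restriction the penalized objective is non-increasing along the iteration, which is the discrete analogue of the descent property exploited in the convergence analysis.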




References

1. Nocedal, J., Wright, S.: Numerical Optimization. Springer Series in Operations Research and Financial Engineering, 2nd edn. Springer, New York (2006)

2. Pytlak, R.: Conjugate Gradient Algorithms in Nonconvex Optimization. Nonconvex Optimization and Its Applications, vol. 89. Springer, Berlin (2009)

3. Nikolova, M.: Analysis of the recovery of edges in images and signals by minimizing nonconvex regularized least-squares. Multiscale Model. Simul. 4(3), 960–991 (2005)

4. Nikolova, M.: Markovian reconstruction using a GNC approach. IEEE Trans. Image Process. 8(9), 1204–1220 (1999)

5. Blake, A., Zisserman, A.: Visual Reconstruction. MIT Press, Cambridge (1987)

6. Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16(6), 964–979 (1979)

7. Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward–backward splitting. Multiscale Model. Simul. 4(4), 1168–1200 (2005)

8. Bredies, K., Lorenz, D.A.: Linear convergence of iterative soft-thresholding. J. Fourier Anal. Appl. 14(5–6), 813–837 (2008). doi:10.1007/s00041-008-9041-1

9. Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Math. Program. Ser. B (2011). doi:10.1007/s10107-011-0484-9

10. Ito, K., Kunisch, K.: A variational approach to sparsity optimization based on Lagrange multiplier theory. MOBIS SFB-Report 2011–014 (2011). http://math.uni-graz.at/mobis/publications/SFB-Report-2011-014.pdf

11. Chouzenoux, E., Pesquet, J.C., Repetti, A.: Variable metric forward–backward algorithm for minimizing the sum of a differentiable function and a convex function. J. Optim. Theory Appl. (2013, to appear). http://www.optimization-online.org/DB_HTML/2013/01/3749.html

12. Ochs, P., Chen, Y., Brox, T., Pock, T.: iPiano: inertial proximal algorithm for non-convex optimization (2013). http://lmb.informatik.uni-freiburg.de/Publications/2013/OB13

13. Kaplan, A., Tichatschke, R.: Proximal point methods and nonconvex optimization. J. Global Optim. 13, 389–406 (1998)

14. Lewis, A.S., Luke, D.R., Malick, J.: Local linear convergence for alternating and averaged nonconvex projections. Found. Comput. Math. 9, 485–513 (2009)

15. Rockafellar, R.T., Wets, R.J.B.: Variational Analysis. Grundlehren der Mathematischen Wissenschaften, vol. 317. Springer, Berlin (1998)

16. Penot, J.P.: Proximal mappings. J. Approx. Theory 94, 203–221 (1998)

17. Hare, W., Sagastizábal, C.: Computing proximal points of nonconvex functions. Math. Program. Ser. B 116(1–2), 221–258 (2009)

18. Demyanov, V.F., Rubinov, A.M.: Approximate Methods in Optimization Problems. Modern Analytic and Computational Methods in Science and Mathematics, vol. 32. American Elsevier, New York (1970)

19. Dunn, J.C.: Global and asymptotic convergence rate estimates for a class of projected gradient processes. SIAM J. Control Optim. 19(3), 368–400 (1981)

20. Clarke, F.H.: Optimization and Nonsmooth Analysis. CRM Université de Montréal, Montréal (1989)

21. Bredies, K., Lorenz, D.A.: Regularization with non-convex separable constraints. Inverse Problems 25(8), 085011 (2009). doi:10.1088/0266-5611/25/8/085011

22. Blumensath, T., Yaghoobi, M., Davies, M.: Iterative hard thresholding and \(l^0\) regularisation. In: IEEE International Conference on Acoustics, Speech and Signal Processing (2007)

23. Moulin, P., Liu, J.: Analysis of multiresolution image denoising schemes using generalized Gaussian and complexity priors. IEEE Trans. Inform. Theory 45(3), 909–919 (1999)

24. Lorenz, D.A.: Non-convex variational denoising of images: interpolation between hard and soft wavelet shrinkage. Curr. Dev. Theory Appl. Wavelets 1(1), 31–56 (2007)

25. Lorenz, D.A.: Convergence rates and source conditions for Tikhonov regularization with sparsity constraints. J. Inverse Ill-Posed Problems 16(5), 463–478 (2008). doi:10.1515/JIIP.2008.025

26. Grasmair, M.: Well-posedness classes for sparse regularisation. Reports of FSP S092 "Industrial Geometry". University of Vienna, Austria (2010)

27. Lai, M.J., Wang, J.: An unconstrained \(\ell _q\) minimization with \(0 < q\le 1\) for sparse solution of underdetermined linear systems. SIAM J. Optim. 21(1), 82–101 (2011)

28. Sturmfels, B.: What is a Gröbner basis? Notices Am. Math. Soc. 52(10), 1199–1200 (2005)

29. Buchberger, B.: Introduction to Gröbner bases. In: Buchberger, B., Winkler, F. (eds.) Gröbner Bases and Applications. London Mathematical Society Lecture Notes Series, vol. 251, pp. 3–31. Cambridge University Press, Cambridge (1998)

30. Kreuzer, M., Robbiano, L.: Computational Commutative Algebra, vol. 1. Springer, New York (2000)


Acknowledgments

Kristian Bredies acknowledges support by the SFB Research Center “Mathematical Optimization and Applications in Biomedical Sciences” at the University of Graz. Dirk Lorenz acknowledges support by the DFG under Grant LO 1436/2-1.

Author information

Correspondence to Dirk A. Lorenz.

Communicated by Hedy Attouch.

Appendix

8.1 Proof of Lemma 3.2

As \(\phi \) is non-decreasing, \(\phi '(y) \ge 0\) for all \(y > 0\), hence \(\lim _{y \rightarrow \infty } \phi '(y) \ge 0\) (with \(\infty \) allowed). Thus, each \(\rho _s\) is a strictly convex function with \(\rho _s(y) \rightarrow \infty \) for \(y \rightarrow 0\) and \(y \rightarrow \infty \), which implies that a unique minimizer \(y_s > 0\) exists.

Next, note that since \(\rho _s\) is strictly convex for some \(s > 0\), it is locally Lipschitz. Consequently, \(\phi '\) is locally Lipschitz and in particular absolutely continuous on compact intervals contained in \({]{0,\infty }[}\). This implies that \(\phi '\) is differentiable almost everywhere in \({]{0,\infty }[}\), with \(\phi '(y_1) - \phi '(y_0) = \int _{y_0}^{y_1} \phi ''(t) \,\mathrm {d}{t}\) for each \(y_1 > y_0 > 0\). Moreover, by assumption \(t \mapsto t\phi ''(t)\) is Lebesgue integrable on \([0,y]\) for each \(y > 0\), hence Fubini’s theorem is applicable. We compute

$$\begin{aligned} \phi (y) - y\phi '(y) = \phi (0) + \int \limits _0^y (\phi '(\tau ) - \phi '(y)) \,\mathrm {d}{\tau } = - \int \limits _0^y \int \limits _{\tau }^y \phi ''(t) \,\mathrm {d}{t} \,\mathrm {d}{\tau } = -\int \limits _0^y t\phi ''(t) \,\mathrm {d}{t}. \end{aligned}$$

Since \((\phi ')'_-\) is strictly monotonically increasing and coincides with \(\phi ''\) where the latter exists, i.e., almost everywhere in \({]{0,\infty }[}\), we get for \(y > 0\) and any \(z \ge (\phi ')'_-(y)\) that

$$\begin{aligned} - \int \limits _0^y t \phi ''(t) \,\mathrm {d}{t} = -\int \limits _0^y t (\phi ')'_-(t) \,\mathrm {d}{t} > -z \int \limits _0^y t \,\mathrm {d}{t} = -\frac{z y^2}{2}. \end{aligned}$$

This already proves \(\psi (y) > -z\). To see that \(\psi \) is strictly decreasing, consider \(y > 0\) where \(\phi ''(y)\) exists and deduce

$$\begin{aligned} \psi '(y) = \frac{2}{y} \left( \frac{2}{y^2} \int \limits _0^y t \phi ''(t) \,\mathrm {d}{t} - \phi ''(y) \right) < 0 \quad \Leftrightarrow \quad \psi (y) > -\phi ''(y), \end{aligned}$$

where the latter has already been established. Hence, \(\psi ' < 0\) almost everywhere in \({]{0,\infty }[}\), and as \(\psi \) is also locally Lipschitz in \({]{0,\infty }[}\), it is absolutely continuous in each compact interval contained in \({]{0,\infty }[}\). Consequently, one can conclude that \(\psi \) is strictly monotonically decreasing. Moreover, \(\psi (y) > -(\phi ')'_-(y)\) also implies \(\psi (y) \rightarrow \infty \) as \(y \rightarrow 0\), since \(\phi '(y)\) is bounded around \(0\) whenever \((\phi ')'_-(y)\) is bounded around \(0\). Finally, from the assumptions \(\phi (y) \rightarrow \infty \) and \(\phi '(y)/y \rightarrow 0\) as \(y \rightarrow \infty \), it follows with L’Hôpital’s rule that \(\phi (y)/y^2 \rightarrow 0\). Consequently, \(\psi (y) \rightarrow 0\) as \(y \rightarrow \infty \) and, together with the above, \(\psi : {]{0,\infty }[} \rightarrow {]{0,\infty }[}\) is onto. \(\square \)
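The qualitative behavior established above can be checked numerically for a concrete penalty. The computation in the proof identifies \(\psi (y) = 2(\phi (y) - y\phi '(y))/y^2\); the choice \(\phi (y) = \sqrt{y}\) below is an assumed example satisfying the hypotheses, for which \(\psi (y) = y^{-3/2}\).

```python
import numpy as np

# Assumed example penalty: phi(y) = sqrt(y), which is non-decreasing,
# satisfies phi(y) -> infinity and phi'(y)/y -> 0 as y -> infinity,
# and has t*phi''(t) integrable near 0.
phi = lambda y: np.sqrt(y)
dphi = lambda y: 0.5 / np.sqrt(y)

# psi(y) = 2*(phi(y) - y*phi'(y))/y^2, consistent with the proof's
# identities; for phi = sqrt this simplifies to y**(-1.5).
psi = lambda y: 2.0 * (phi(y) - y * dphi(y)) / y**2

y = np.logspace(-4, 4, 2000)
values = psi(y)
```

On this grid, `values` is strictly decreasing, blows up as \(y \rightarrow 0\), and vanishes as \(y \rightarrow \infty \), matching the conclusion that \(\psi \) maps \({]{0,\infty }[}\) onto \({]{0,\infty }[}\).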

8.2 Tools from Algebraic Geometry

This section contains some technical lemmas on algebraic equations that are needed for the proof of Proposition 5.1. We make use of the theory of Gröbner bases for solving systems of algebraic equations.

For the sake of convenience, and in order to fix a uniform notation, we present the most important notions and definitions. A brief introduction to Gröbner bases can also be found in [28].

Definition 8.1

(Some notations and definitions in polynomial rings)

  • We denote by \(\mathbf {P}\) the polynomial ring \(\mathbf {C}[X_1,\ldots ,X_n]\) in \(n\) variables with complex coefficients.

  • A subgroup \(I\) of \(\mathbf {P}\) with respect to \(+\) is called an ideal of \(\mathbf {P}\), iff for all \(f \in \mathbf {P}\) and all \(g \in I\) the product \(f g\) is also in \(I\). We write \(I \vartriangleleft \mathbf {P}\) if \(I\) is an ideal of \(\mathbf {P}\).

  • Let \(F\) be a subset of \(\mathbf {P}\). The set

    $$\begin{aligned} \langle F \rangle = \bigcap \limits _{F \subset I \vartriangleleft \mathbf {P}} I \end{aligned}$$

    is called the ideal generated by \(F\). It holds that

    $$\begin{aligned} \langle f_1,\ldots ,f_m\rangle = \left\{ \sum \limits _{i=1}^m {g_i f_i}:{g_i \in \mathbf {P}\text { for } i = 1,\ldots ,m}\right\} . \end{aligned}$$
  • For \(\alpha \in \mathbf {N}^n_0\) the power product \(X^{\alpha }\) is defined by \(X^{\alpha } := X_1^{\alpha _1} \cdots X^{\alpha _n}_n\).

  • We denote with \(\mathbf {T}\subset \mathbf {P}\) the set \(\{{ X^{\alpha }}\ : \ {\alpha \in \mathbf {N}_0^n}\}\) of power products. A power product \(X^\alpha \) is mixed if \(\alpha _i > 0\) for at least two \(i\).

  • The lexicographical order \(<_{{{\mathrm{lex}}}}\) of multi-indices \(\alpha , \beta \in \mathbf {N}^n_0\) is defined as

    $$\begin{aligned} \alpha <_{{{\mathrm{lex}}}} \beta \quad \Leftrightarrow \quad \exists m > 0: (\forall i < m: \alpha _i = \beta _i) \wedge \alpha _m < \beta _m. \end{aligned}$$
  • The total degree lexicographical order \(\prec \) on the set \(\mathbf {T}\) of power products is defined by

    $$\begin{aligned} X^{\alpha } \prec X^{\beta } \quad \Leftrightarrow \quad |{\alpha }| < |{\beta }| \ \vee \ \left( |{\alpha }| = |{\beta }| \wedge \alpha <_{{{\mathrm{lex}}}} \beta \right) , \qquad \alpha , \beta \in \mathbf {N}^n_0, \end{aligned}$$

    with \(|{\alpha }| = \sum _{j=1}^n \alpha _j\).

  • For a polynomial \(f \in \mathbf {P}\), we denote by \(C(f,t)\) the coefficient of the power product \(t\) in \(f\).

  • We define the support of the polynomial \(f\) as \({{\mathrm{supp}}}(f) := \{{ t \in \mathbf {T}}\ : \ {C(f,t) \ne 0}\}\).

  • The degree of the polynomial \(f\) is defined by \(\deg (f) := \max \ \{{ |{\alpha }|}\ : \ {X^\alpha \in {{\mathrm{supp}}}(f)}\}\), while the degree with respect to \(X_i\) is given by \(\deg _{X_i}(f) := \max \ \{{ \alpha _i}\ : \ {X^\alpha \in {{\mathrm{supp}}}(f)}\}\).

  • The leading power product of \(f\) is defined by \({{\mathrm{LPP}}}(f) := \max _{\prec }\{{t}\ : \ {t \in {{\mathrm{supp}}}(f)}\}\).

  • The trailing term of \(f\) is the smallest power product in \({{\mathrm{supp}}}(f)\) with respect to \(\prec \).

Definition 8.2

(Varieties) Let \(I \subset \mathbf {P}\) be a set of polynomials. Then the variety of \(I\) is defined as \( V(I) := \{{ c \in \mathbf {C}^n}\ : \ {f(c) = 0 \text { for all } f \in I}\}\).

Definition 8.3

Let \(F \subset \mathbf {P}\). The initial ideal with respect to the order \(\prec \) is defined by \({{\mathrm{In}}}_{\prec }(F) := \left\langle \{{{{\mathrm{LPP}}}(f)}\ : \ {f \in F}\} \right\rangle \).

Definition 8.4

(Gröbner basis) Let \(I \vartriangleleft \mathbf {P}\) be an ideal and \(\prec \) the total degree lexicographical order. A finite set \(G \subset I\) such that \({{\mathrm{In}}}_{\prec }(I) = {{\mathrm{In}}}_{\prec }(G)\) is called a Gröbner basis for \(I\) with respect to \(\prec \).

We call \(G\) a reduced Gröbner basis iff additionally

  1. (i)

    each \(g \in G\) is normed, i.e., the leading coefficient, \(C(g,{{\mathrm{LPP}}}(g))\), is \(1\),

  2. (ii)

    the set \(\{{{{\mathrm{LPP}}}(g)}\ : \ {g \in G}\}\) minimally generates \({{\mathrm{In}}}_{\prec }(I)\), i.e., for all \(g\in G\) it holds that \({{\mathrm{In}}}_{\prec }(G{\setminus }\{g\}) \ne {{\mathrm{In}}}_{\prec }(G)\), and

  3. (iii)

    no trailing term of any \(g\in G\) lies in \({{\mathrm{In}}}_{\prec }(I)\).

One can show that a reduced Gröbner basis exists for each ideal \(I\) and that it is unique. We will use the following characterization of Gröbner bases (the proof can be found in [29, Sect. 3.5] or [30, Theorem 2.4.1]).

Theorem 8.1

Let \(I \vartriangleleft \mathbf {P}\) be an ideal and let \(G \subset I\) be a finite set of normed polynomials. Then, the following statements are equivalent:

  1. (i)

    \(G\) is a Gröbner basis for \(I\) with respect to \(\prec \).

  2. (ii)

    It holds that \( \{{{{\mathrm{LPP}}}(f)}\ : \ {f \in I}\} = \{{t {{\mathrm{LPP}}}(g)}\ : \ {g \in G, \ t \in \mathbf {T}}\}\).

Theorem 8.2, also from the theory of Gröbner bases, connects the finiteness of varieties with the structure of the reduced Gröbner basis (the proof can be found in [29, Sect. 3.6] or [30, Proposition 3.7.1]).

Theorem 8.2

Let \(I \vartriangleleft \mathbf {P}\) be an ideal, and \(G\) the reduced Gröbner basis with respect to \(\prec \). Then, the following statements are equivalent:

  1. (i)

    \(V(I)\) is finite.

  2. (ii)

    There exist \(g_1,\ldots ,g_n \in G\) and \(m_1,\ldots ,m_n \in \mathbf {N}_0\) such that \({{\mathrm{LPP}}}(g_i) = X_i^{m_i}\) for \(i=1,\ldots ,n\).

  3. (iii)

    The quotient space \(\mathbf {P}/I\) is a finite-dimensional \(\mathbf {C}\)-vector space.
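The equivalence in Theorem 8.2 can be observed computationally. The following sympy sketch (the ideal and variable names are illustrative choices, not taken from the text) computes a reduced Gröbner basis with respect to a total degree order ('grlex' in sympy's naming) for an ideal with finite variety and exhibits pure-power leading terms in both variables.

```python
from sympy import symbols, groebner

x, y = symbols('x y')

# An illustrative ideal with finite variety: y^3 = -y^2 forces
# y in {0, -1}, and x^2 = -y then determines finitely many x.
G = groebner([x**2 + y, x*y + y**2], x, y, order='grlex')

# sympy returns the reduced Groebner basis; its leading power products
# x**2, x*y, y**3 include pure powers of both x and y, as Theorem 8.2
# predicts for a finite variety.
basis = list(G.exprs)
```

Conversely, for an ideal with infinite variety (e.g., a single polynomial in two variables), no such pure-power leading terms for every variable can occur.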

Finally, we have all the tools to obtain the desired results about the solutions of algebraic equations. They are formulated in the following two lemmas.

Lemma 8.1

Let \(n \in \mathbf {N}\) and let \(A \in \mathbf {C}^{n \times n}\) be a Hermitian positive definite matrix. Then the system of quadratic equations \({{\mathrm{diag}}}(X) A X = 0\) has only the trivial solution \(X = 0\).

Proof

Let \(X\) be a solution of \({{\mathrm{diag}}}(X)AX = 0\). Then, for each \(i\), we have \(X_i =0\) or \((AX)_i = 0\). Hence, we can rearrange the indices \(1,\ldots ,n\) in such a way that, for some \(0 \le k \le n\), \(X_1,\ldots ,X_k \not = 0\) and \(X_{k+1},\ldots ,X_n = 0\). In order to exchange the indices \(i\) and \(j\), we have to exchange the \(i\)-th and the \(j\)-th rows as well as the \(i\)-th and \(j\)-th columns of \(A\). Since the eigenvalues of \(A\) are invariant under these operations, the resulting matrix \(\tilde{A}\) is again Hermitian and positive definite. Since \(X_{k+1},\ldots , X_n = 0\), we can eliminate the corresponding rows and columns of the system. Denoting the resulting matrix by \(\tilde{\tilde{A}}\) and setting \(\tilde{X} = (X_1,\ldots ,X_k)\), it follows that

$$\begin{aligned} \tilde{\tilde{A}} \tilde{X} = 0. \end{aligned}$$

From the Sylvester criterion we know that \(\tilde{\tilde{A}}\) is positive definite, as it is a leading principal submatrix of \(\tilde{A}\). Therefore, \(\tilde{X} = 0\), and hence, \(X = 0\). \(\square \)
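The key step, that every principal submatrix of a Hermitian positive definite matrix is again positive definite and hence invertible, can be verified numerically; the matrix below is a randomly generated example, not one from the text.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
A = B.T @ B + np.eye(4)   # real symmetric (hence Hermitian) positive definite

# Every nonempty principal submatrix A_SS is positive definite, hence
# invertible, so A_SS x_S = 0 forces x_S = 0: the system diag(X) A X = 0
# then admits only the trivial solution, as in Lemma 8.1.
min_eigs = []
for k in range(1, 5):
    for S in itertools.combinations(range(4), k):
        sub = A[np.ix_(S, S)]
        min_eigs.append(np.linalg.eigvalsh(sub).min())
```

All 15 nonempty principal submatrices have strictly positive smallest eigenvalue, consistent with the eigenvalue interlacing argument behind the Sylvester criterion.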

Lemma 8.2

Let \(A \in \mathbf {C}^{n\times n}\) be a Hermitian positive definite matrix and let \(b,c,d \in \mathbf {C}^n\). Then, the system

$$\begin{aligned}&{{\mathrm{diag}}}(X) A X + {{\mathrm{diag}}}(X) b + {{\mathrm{diag}}}(X) {{\mathrm{diag}}}(a) c + {{\mathrm{diag}}}(a) d = 0, \\&a_i^r = X_i^{q}, \qquad i=1,\ldots ,n, \end{aligned}$$

with \(r > q\) has only finitely many solutions \((X,a) \in \mathbf {C}^n \times \mathbf {C}^n\).

Proof

Define the polynomials

$$\begin{aligned} f_i := f_i(X_1,\ldots ,X_n,a_1,\ldots ,a_n) = X_i \left( \,\sum \limits _{j=1}^n A_{ij} X_j \right) + X_i b_i + X_i a_i c_i + a_i d_i \end{aligned}$$

and

$$\begin{aligned} \tilde{f}_i = X_i \left( \sum \limits _{j=1}^n A_{ij} X_j \right) . \end{aligned}$$

Furthermore set \(\tilde{I}= \langle \tilde{f}_1,\ldots ,\tilde{f}_n \rangle \).

Let \(\tilde{G}\) be the reduced Gröbner basis for \(\tilde{I}\) with respect to the total degree lexicographical order \(\prec \). By Lemma 8.1, \(V(\tilde{I}) = \{0\}\) is finite, so Theorem 8.2 yields \(\tilde{g}_1,\ldots ,\tilde{g}_n \in \tilde{G}\) and \(m_1,\ldots ,m_n \in \mathbf {N}\) such that \({{\mathrm{LPP}}}(\tilde{g}_i) = X_i^{m_i}\) for \(i=1,\ldots ,n\). Since \(\tilde{G}\subseteq \tilde{I}\), there are \(r_{ij} \in \mathbf {P}\) such that

$$\begin{aligned} \tilde{g}_i = \sum \limits _{j=1}^n \tilde{f}_j r_{ij} \end{aligned}$$
(22)

for all \(i = 1,\ldots ,n\). Let us now fix some \(i \in \{1,\ldots ,n\}\).

Next note that \(\tilde{f}_i\) only contains quadratic power products and that \(X_i^2 \in {{\mathrm{supp}}}(\tilde{f}_i)\) since \(A_{ii} > 0\) by positive definiteness of \(A\). Therefore, \(\deg (\tilde{f}_i) = 2\). Now split each \(r_{ij}\) according to

$$\begin{aligned} s_{ij}&:= \sum \limits _{\begin{array}{c} t \in {{\mathrm{supp}}}(r_{ij}), \\ \deg (t) \le m_i-2 \end{array}} C(r_{ij},t) t&t_{ij}&:= \sum \limits _{\begin{array}{c} t \in {{\mathrm{supp}}}(r_{ij}), \\ \deg (t) > m_i-2 \end{array}} C(r_{ij},t) t. \end{aligned}$$

We claim that \(\sum _{j=1}^n t_{ij} \tilde{f}_j = 0\). Assume the opposite and observe that for each \(t \in \mathbf {T}\) with \(\deg (t) > m_i-2\) and each \(j=1,\ldots ,n\) it follows that \(C(t\tilde{f}_j, s) = 0\) for each \(s \in \mathbf {T}\) with \(\deg (s) \le m_i\). Consequently, \(C(\sum _{j=1}^n t_{ij} \tilde{f}_j, s) = 0\) for each \({s \in \mathbf {T}}\) with \(\deg (s) \le m_i\) implying that \(\deg (\sum _{j=1}^n t_{ij} \tilde{f}_j) > m_i\) since \(\sum _{j=1}^n t_{ij} \tilde{f}_j \ne 0\). But we also have that

$$\begin{aligned} m_i = \deg (\tilde{g}_i) = \deg \left( \sum _j s_{ij} \tilde{f}_j + \sum _j t_{ij} \tilde{f}_j \right) , \end{aligned}$$

and \(\deg \Bigl (\sum _j s_{ij} \tilde{f}_j \Bigr ) \le m_i\), from which the contradiction \(\deg (\sum _{j=1}^n t_{ij} \tilde{f}_j) \le m_i\) follows. Hence, \(\sum _{j=1}^n t_{ij} \tilde{f}_j = 0\). In particular, \(\tilde{g}_i = \sum _{j=1}^n s_{ij} \tilde{f}_j\), so, by possibly replacing \(r_{ij}\) with \(s_{ij}\), we can achieve in (22) that \(\deg (r_{ij}) \le m_i -2\) for each \(j=1,\ldots ,n\).

We define the polynomials \(h_i := \sum _j r_{ij} f_j\) and consider them as elements of the polynomial ring \(\tilde{\mathbf {P}} := \mathbf {C}[a_1,\ldots ,a_n,X_1,\ldots ,X_n]\) in \(2n\) variables, extending the order \(\prec \) to \(\tilde{\mathbf {P}}\) by setting \(a_1\prec \cdots \prec a_n \prec X_1\prec \cdots \prec X_n\). Additionally, we denote by \(\tilde{\mathbf {T}}\) the set of power products of \(\tilde{\mathbf {P}}\). It follows that

$$\begin{aligned} h_i = \sum \limits _j r_{ij} f_j&= \sum \limits _j r_{ij} (\tilde{f}_j + b_j X_j + c_j X_j a_j + d_j a_j) \\&= \sum \limits _j r_{ij} \tilde{f}_j + \sum \limits _j r_{ij} (b_j X_j + c_j X_j a_j + d_j a_j) \\&= \tilde{g}_i + \sum \limits _j r_{ij} (b_j X_j + c_j X_j a_j + d_j a_j). \end{aligned}$$

Since we assumed that the \(b_j\), \(c_j\), and \(d_j\) are constants, \(\deg (r_{ij}) \le m_i - 2\) implies that

$$\begin{aligned} \deg \left( \sum _j r_{ij} (b_j X_j + d_j a_j ) \right) \le m_i - 1, \qquad \deg \left( \sum _j r_{ij} c_j X_j a_j \right) \le m_i \end{aligned}$$

and since \(a_j \prec X_i\) for all \(j\), we have

$$\begin{aligned} {{\mathrm{LPP}}}(h_i) = {{\mathrm{LPP}}}(\tilde{g}_i) = X_i^{m_i}. \end{aligned}$$

Now we define the ideal \(J \vartriangleleft \tilde{\mathbf {P}}\) generated by \(f_1,\ldots ,f_n\) and the polynomials \(k_j = a_j^r - X_j^{q}\), \(j = 1,\ldots ,n\). Clearly, \(h_1,\ldots ,h_n \in J\). Also, \({{\mathrm{LPP}}}(h_i) = X_i^{m_i}\) and \({{\mathrm{LPP}}}(k_i) = a_i^r\) for \(i=1,\ldots ,n\), the latter since \(r > q\).

In order to prove that \(V(J)\) is finite, we consider the reduced Gröbner basis \(G\) for \(J\) with respect to \(\prec \). By the above, we have \(X_i^{m_i},a_i^r \in \{{{{\mathrm{LPP}}}(f)}\ : \ {f \in J}\}\), and by Theorem 8.1, \(\{{{{\mathrm{LPP}}}(f)}\ : \ {f \in J}\} = \{{t\,{{\mathrm{LPP}}}(g)}\ : \ {g \in G, \ t\in \tilde{\mathbf {T}}}\}\). Hence, for each \(i=1,\ldots ,n\), \(X_i^{m_i} = t_i {{\mathrm{LPP}}}(g_i)\) and \(a_i^r = t_{i+n} {{\mathrm{LPP}}}(g_{i+n})\) for some \(g_1,\ldots ,g_{2n}\in G\) and \(t_1,\ldots ,t_{2n} \in \tilde{\mathbf {T}}\). This is only possible if there exist \(\hat{m}_1,\ldots ,\hat{m}_{2n}\) such that \({{\mathrm{LPP}}}(g_i) = X_i^{\hat{m}_i}\) and \({{\mathrm{LPP}}}(g_{i+n}) = a_i^{\hat{m}_{i+n}}\) for \(i=1,\ldots ,n\). By Theorem 8.2, this implies that \(V(J) = V(\{{ f_1,\ldots ,f_n,k_1,\ldots ,k_n }\})\) is finite, which is what we wanted to show. \(\square \)
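For a small instance, the finiteness of \(V(J)\) can be checked directly with sympy; the choices \(n = 1\), \(A = (1)\), \(b = 1\), \(c = 2\), \(d = 3\), and \(r = 3 > q = 2\) below are illustrative assumptions.

```python
from sympy import symbols, groebner, LM

X, a = symbols('X a')

# n = 1 instance of the system in Lemma 8.2 with A = (1), b = 1, c = 2,
# d = 3, and the coupling a^3 = X^2 (so r = 3 > q = 2).
f = X**2 + X + 2 * X * a + 3 * a
k = a**3 - X**2

# Reduced Groebner basis with respect to a total degree order; by
# Theorem 8.2, finiteness of the variety is equivalent to pure-power
# leading terms X**m and a**m occurring among the basis elements.
G = groebner([f, k], X, a, order='grlex')
leading = [LM(g, X, a, order='grlex') for g in G.exprs]
```

The leading power products indeed contain a pure power of \(X\) and a pure power of \(a\), so the variety of this instance is finite, as the lemma asserts.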


Cite this article

Bredies, K., Lorenz, D.A. & Reiterer, S. Minimization of Non-smooth, Non-convex Functionals by Iterative Thresholding. J Optim Theory Appl 165, 78–112 (2015). https://doi.org/10.1007/s10957-014-0614-7
