1 Introduction

Let V be a finite-dimensional real vector space and \(V^*\) its dual space. In this paper we consider the bound-constrained convex minimization problem

$$\begin{aligned} \begin{array}{ll} \min &{} f(x)\\ \mathrm{s.t.~}&{} x \in \mathbf {x}, \end{array} \end{aligned}$$
(1)

where \(f: \mathbf {x}\rightarrow {\mathbb {R}}\) is a—smooth or nonsmooth—convex function, and \(\mathbf {x}= [\underline{x}, \overline{x}]\) is an axis-parallel box in V in which \(\underline{x}\) and \(\overline{x}\) are the vectors of lower and upper bounds on the components of x, respectively. Lower bounds are allowed to take the value \(-\infty \), and upper bounds the value \(+\infty \).

Throughout the paper, \(\langle g,x\rangle \) denotes the value of \(g \in V^*\) at \(x \in V\). A subgradient of the objective function f at x is a vector \(g(x)\in V^*\) satisfying

$$\begin{aligned} f(z) \ge f(x) + \langle g(x),z - x \rangle \end{aligned}$$

for all \(z \in V\). It is assumed that the set of optimal solutions of (1) is nonempty and that first-order information about the objective function (i.e., for any \(x\in \mathbf {x}\), the function value f(x) and some subgradient g(x) at x) is available from a first-order black-box oracle.
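For instance, for the composite objective \(f(x) = \frac{1}{2}\Vert Ax-b\Vert _2^2 + \lambda \Vert x\Vert _1\) appearing in Example 1 below, such an oracle can be realized in a few lines of MATLAB; the following sketch (with illustrative data and names of our own) returns the function value and one valid subgradient:

% First-order oracle for f(x) = 0.5*||A*x-b||_2^2 + lambda*||x||_1.
% sign(0) = 0 is an admissible choice for the subdifferential of the l_1 term.
A = randn(5, 8);  b = randn(5, 1);  lambda = 0.1;      % toy data
fval = @(x) 0.5*norm(A*x - b)^2 + lambda*norm(x, 1);   % function value f(x)
gsub = @(x) A'*(A*x - b) + lambda*sign(x);             % one subgradient g(x)
x = zeros(8, 1);
[fval(x), norm(gsub(x))]                               % one oracle call at x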

Motivation and history Bound-constrained optimization in general is an important problem appearing in many fields of science and engineering, where the parameters describing physical quantities are constrained to be in a given range. Furthermore, it plays a prominent role in the development of general constrained optimization methods since many methods reduce the solution of the general problem to the solution of a sequence of bound-constrained problems.

There are many algorithms for solving bound-constrained optimization problems; here, we mention only those related to our study. Lin and Moré (1999) and Kim et al. (2010) proposed Newton and quasi-Newton methods for bound-constrained optimization. Byrd et al. (1995) proposed a limited memory algorithm called LBFGS-B for general smooth nonlinear bound-constrained optimization. Branch et al. (1999) proposed a trust-region method to solve this problem. Neumaier and Azmi (2016) solved this problem by a limited memory algorithm. The smooth bound-constrained optimization problem was also solved by Birgin et al. (2000) using a nonmonotone spectral projected gradient method and by Hager and Zhang (2006, 2013) using an active set strategy and an affine scaling scheme, respectively. Some limited memory bundle methods for solving bound-constrained nonsmooth problems were proposed by Karmitsa and Mäkelä (2010a, b).

In recent years convex optimization has received much attention because it arises in many applications and is suitable for solving problems involving high-dimensional data. The particular case of bound-constrained convex optimization involving a smooth or nonsmooth objective function also appears in a variety of applications, of which we mention the following:

Example 1

(Bound-constrained linear inverse problems) Given \(A \in {\mathbb {R}}^{m \times n}\), \(b \in {\mathbb {R}}^m\) and \(\lambda \in {\mathbb {R}}\), for \(m \ge n\), the bound-constrained regularized least-squares problem is given by

$$\begin{aligned} \begin{array}{ll} \min &{}~~ \displaystyle f(x) := \frac{1}{2} \Vert Ax - b \Vert _2^2 + \lambda \varphi (x)\\ \mathrm{s.t.~}&{}~~ x \in \mathbf {x}, \end{array} \end{aligned}$$
(2)

and the bound-constrained regularized \(l_1\) problem is given by

$$\begin{aligned} \begin{array}{ll} \min &{}~~ \displaystyle f(x) := \Vert Ax - b \Vert _1 + \lambda \varphi (x)\\ \mathrm{s.t.~}&{}~~ x \in \mathbf {x}, \end{array} \end{aligned}$$
(3)

where \(\mathbf {x}=[\underline{x}, \overline{x}]\) is a box and \(\varphi \) is a smooth or nonsmooth regularizer, often a weighted power of a norm; see Sect. 4 for examples. The problems (2) and (3) arise commonly in the context of control and inverse problems, especially in imaging problems such as denoising, deblurring and inpainting. Morini et al. (2010) formulated the bound-constrained least-squares problem (2) as a nonlinear system of equations and proposed an iterative method based on a reduced Newton method. Recently, Zhang and Morini (2013) used alternating direction methods to solve these problems. More recently, Chan et al. (2013), Boţ et al. (2013), and Boţ and Hendrich (2013) proposed alternating direction methods, primal-dual splitting methods, and a Douglas–Rachford primal-dual method, respectively, to solve both (2) and (3) for some applications.

Content In this paper, we show that the optimal subgradient algorithm OSGA proposed by Neumaier (2016) can be used for solving bound-constrained problems of the form (1). In order to run OSGA, one needs to solve a rational auxiliary subproblem. We here investigate efficient schemes for solving this subproblem in the presence of bounds on its variables. To this end, we show that the solution of the subproblem has a one-dimensional piecewise linear representation and that it may be computed by solving a sequence of one-dimensional piecewise rational optimization problems. We also give an iterative scheme that solves the OSGA subproblem approximately by solving a one-dimensional nonlinear equation. We give numerical results demonstrating the performance of OSGA on some problems from applications. More specifically, in Sect. 2, we give a brief review of the main idea of OSGA. In Sect. 3, we investigate properties of the solution of the subproblem (9) that lead to two algorithms for solving it efficiently. In Sect. 4, we report numerical results of OSGA for a one-dimensional signal recovery and a two-dimensional image deblurring problem. Finally, Sect. 5 delivers some conclusions.

2 A review of OSGA

In this section, we briefly review the main idea of the optimal subgradient algorithm (see Algorithm 1) proposed by Neumaier (2016) for solving the convex constrained minimization problem

$$\begin{aligned} \begin{array}{ll} \min &{}~ f(x)\\ \mathrm{s.t.~}&{}~ x \in C, \end{array} \end{aligned}$$
(4)

where \(f:C\rightarrow {\mathbb {R}}\) is a proper convex function defined on a nonempty, closed, and convex subset C of a finite-dimensional vector space V; without loss of generality we take \(V={\mathbb {R}}^n\), identified with its own dual space.

OSGA is a subgradient algorithm for problem (4) that uses first-order information, i.e., function values and subgradients, to construct a sequence of iterates \(\{x_j\}\subseteq C\) whose function values \(\{f(x_j)\}\) converge to the minimum \(\widehat{f} = f(\widehat{x})\) with the optimal complexity. OSGA requires no information regarding global parameters such as Lipschitz constants of function values and gradients. It uses a so-called prox function which we take to be

$$\begin{aligned} Q(x):= Q_0 + \frac{1}{2}\Vert x-x^0\Vert _2^2 \end{aligned}$$
(5)

where \(Q_0>0\). Thus \(Q(x)\ge Q_0>0\) for all \(x \in {\mathbb {R}}^n\), and

$$\begin{aligned} Q(z)\ge Q(x)+\langle g_Q(x),z-x\rangle +\frac{1}{2}\Vert z-x\Vert _2^2, \end{aligned}$$
(6)

where \(g_Q(x)=x-x^0\) is the gradient of Q at x and \(\Vert x\Vert _2\) is the Euclidean norm. At each iteration, OSGA satisfies the bound

$$\begin{aligned} 0\le f(x_b) -\widehat{f}\le \eta Q(\widehat{x}) \end{aligned}$$
(7)

on the currently best function value \(f(x_b)\), with a monotonically decreasing error factor \(\eta \) that is guaranteed to converge to zero by an appropriate steplength selection strategy (see Procedure PUS). Note that \(\widehat{x}\) is not known a priori, so the error bound is not fully constructive; it is nevertheless sufficient to guarantee the convergence of \(f(x_b)\) to \(\widehat{f}\) with a predictable worst case complexity. To maintain (7), OSGA considers linear relaxations of f,

$$\begin{aligned} f(z)\ge \gamma +\langle h,z\rangle ~~~\hbox {for all }z\in C, \end{aligned}$$
(8)

where \(\gamma \in {\mathbb {R}}\) and \(h\in V^*\) are updated using the linear underestimators obtained from the subgradients evaluated so far (see Algorithm 1). For each such linear relaxation, OSGA solves a maximization problem of the form

$$\begin{aligned} \begin{array}{rl} E(\gamma ,h):=\max &{}~ E_{\gamma ,h}(x)\\ \mathrm{s.t.~}&{}~ x \in C, \end{array} \end{aligned}$$
(9)

where

$$\begin{aligned} E_{\gamma ,h}(x):= -\displaystyle \frac{\gamma +\langle h,x\rangle }{Q(x)}. \end{aligned}$$
(10)

Let \(\gamma _b:= \gamma -f(x_b)\), let \(u:=U(\gamma _b,h)\in C\) be the solution of (9) with \(\gamma \) replaced by \(\gamma _b\), and let \(\widehat{x}\) be an optimal solution of (4). From (8) and (10), we obtain

$$\begin{aligned} E(\gamma _b,h) \ge -\frac{\gamma -f(x_b)+\langle h,\widehat{x} \rangle }{Q(\widehat{x})} \ge \frac{f(x_b)-\widehat{f}}{Q(\widehat{x})} \ge 0. \end{aligned}$$
(11)

Setting \(\eta :=E(\gamma _b,h)\) in (11) implies that (7) is valid. If \(x_b\) is not optimal for (4), then the rightmost inequality in (11) is strict, and since \(Q(\widehat{x})\ge Q_0>0\), we conclude that the maximum \(\eta \) is positive. In the remainder of the paper, we denote by \(g_{x_b}\) and \(f_{x_b}\) a subgradient of f at \(x_b\) and the function value \(f(x_b)\), respectively.
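To make the quantities in (8)–(11) concrete, the following MATLAB sketch builds one linear relaxation from a single oracle call (the subgradient inequality gives \(h=g(x)\) and \(\gamma =f(x)-\langle g(x),x\rangle \)) and evaluates \(E_{\gamma _b,h}\) at a feasible candidate point; OSGA itself maintains \((\gamma ,h)\) by combining such relaxations as in Algorithm 1, and all data and names below are illustrative:

% One linear relaxation (8) from a single oracle call, and the value of
% E_{gamma_b,h} at a feasible point u (a lower bound on E(gamma_b,h);
% it equals eta when u solves (9)). Toy data, names are ours.
n = 5;  x0 = zeros(n,1);  Q0 = 0.5;            % prox center and Q_0 of (5)
f = @(x) norm(x - 1, 1);                       % toy convex objective on [0,1]^n
g = @(x) sign(x - 1);                          % one subgradient of f
x = 0.2*(1:n)';  fx = f(x);  gx = g(x);        % one oracle call
h = gx;  gamma = fx - gx'*x;                   % relaxation: f(z) >= gamma + <h,z>
f_xb = fx;  gamma_b = gamma - f_xb;            % best value so far is f(x)
u = ones(n,1);                                 % some feasible point in [0,1]^n
eta_lb = -(gamma_b + h'*u)/(Q0 + 0.5*norm(u - x0)^2)   % E_{gamma_b,h}(u), cf. (10)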

In each step, OSGA uses the following scheme to update the parameters \(\alpha \), h, \(\gamma \), \(\eta \), and u; see Neumaier (2016) for more details.

[Algorithm 1 (OSGA) and Procedure PUS are displayed here as figures; see Neumaier (2016).]

The original stopping criterion of OSGA is \(\eta \le \varepsilon \); however, we will use a more practical stopping criterion in Sect. 4. In Neumaier (2016), it is shown that the number of iterations to achieve an \(\varepsilon \)-optimum is of the optimal order \(\mathcal {O}\left( \varepsilon ^{-1/2} \right) \) for a smooth function f with Lipschitz continuous gradients and of the order \(\mathcal {O}\left( \varepsilon ^{-2} \right) \) for a Lipschitz continuous nonsmooth function f, cf. Nemirovsky and Yudin (1983) and Nesterov (2004, 2005). The algorithm has low memory requirements so that, if the subproblem (9) can be solved efficiently, OSGA is appropriate for solving large-scale problems. Numerical results reported by Ahookhosh (2016) for unconstrained problems, and by Ahookhosh and Neumaier (2016a, b, 2013) for simply constrained problems show the good behavior of OSGA for solving practical problems.

In this paper, for the above choice of Q(x) and an arbitrary box \(\mathbf {x}\), we solve the subproblem (9) for both medium- and large-scale problems. It follows that OSGA is applicable to bound-constrained convex problems as well. Since the underlying problem (1) is a special case of the problem considered in Neumaier (2016), the complexity of OSGA remains valid for (1), as summarized in the following theorem.

Theorem 2

Suppose that \(f-\mu Q\) is convex and \(\mu \ge 0\). Then we have

(i)

    (Nonsmooth complexity bound) If the points generated by Algorithm 1 stay in a bounded region of the interior of \(\mathbf {x}\), or if f is Lipschitz continuous on \(\mathbf {x}\), the total number of iterations needed to reach a point with \(f(x)\le f(u)+\varepsilon \) is at most \(\mathcal {O}((\varepsilon ^2+\mu \varepsilon )^{-1})\). Thus the asymptotic worst case complexity is \(\mathcal {O}(\varepsilon ^{-2})\) when \(\mu =0\) and \(\mathcal {O}(\varepsilon ^{-1})\) when \(\mu >0\).

(ii)

    (Smooth complexity bound) If f has Lipschitz continuous gradients with Lipschitz constant L, the total number of iterations needed by Algorithm 1 to reach a point with \(f(x)\le f(u)+\varepsilon \) is at most \(\mathcal {O}(\varepsilon ^{-1/2})\) if \(\mu =0\), and at most \(\displaystyle \mathcal {O}(|\log \varepsilon |\sqrt{L/\mu })\) if \(\mu >0\).

Proof

Since all assumptions of Theorems 4.1 and 4.2, Propositions 5.2 and 5.3, and Theorem 5.1 in Neumaier (2016) are satisfied, the results remain valid. \(\square \)

3 Solution of the bound-constrained subproblem (9)

Note that the function \(E_{\gamma ,h}(\cdot )\) is only quasi-concave, so (9) is not a routine concave maximization problem; solving it is the computational bottleneck of OSGA and is of both theoretical and practical interest. Therefore, in this section we investigate the solution of the bound-constrained subproblem (9) and give two iterative schemes, the first of which solves (9) exactly, whereas the second solves it approximately.

3.1 Global solution of the OSGA rational subproblem (9)

In this subsection, we describe an explicit solution of the bound-constrained subproblem (9).

Without loss of generality, we here consider \(V = {\mathbb {R}}^n\); it is not hard to adapt the results to \(V = {\mathbb {R}}^{m \times n}\) and other finite-dimensional spaces. The method is related to one used in several earlier papers. Helgason et al. (1980) characterized the solution of a singly constrained quadratic problem with bound constraints. Later, Pardalos and Kovoor (1990) developed an \(\mathcal {O}(n)\) algorithm for this problem using binary search to solve the associated Kuhn–Tucker system. This problem was also solved by Dai and Fletcher (2006) using a projected gradient method. Zhang et al. (2011) solved the linear support vector machine problem by a cutting plane method employing a similar technique.

In the papers mentioned, the key step is to show that the problem reduces to a one-dimensional piecewise linear problem. To apply this idea to the present problem, we prove that (9) is equivalent to a one-dimensional maximization problem and then develop a procedure to compute its maximizer. We write

$$\begin{aligned} u(\lambda ):=\sup \{ \underline{x},\inf \{x^0-\lambda h,\overline{x}\} \} \end{aligned}$$
(12)

for the projection of \(x^0-\lambda h\) onto the box \(\mathbf {x}\).
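In MATLAB, (12) is a componentwise clipping operation; a minimal sketch with illustrative data (all names are ours):

% u(lambda) of (12): componentwise projection of x0 - lambda*h onto [xl, xu].
u_of_lambda = @(lambda, x0, h, xl, xu) max(xl, min(x0 - lambda*h, xu));
% example on the box [-1,1]^4
x0 = zeros(4,1);  h = [1; -2; 0; 3];  xl = -ones(4,1);  xu = ones(4,1);
u  = u_of_lambda(0.5, x0, h, xl, xu)   % = [-0.5; 1; 0; -1]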

Proposition 3

For \(h \ne 0\), the maximum of the subproblem (9) is attained at \(u:=u(\lambda )\), where \(\lambda >0\) or \(\lambda = + \infty \) is the inverse of the value of the maximum.

Proof

The function \(E_{\gamma ,h}: V \rightarrow {\mathbb {R}}\) defined by (10) is continuously differentiable and \(\eta := E(\gamma ,h) > 0\). Since \(Q(x)=Q_0+\frac{1}{2} \Vert x-x^0\Vert _2^2\), we have \(g_Q(x)=x-x^0\). Differentiating both sides of the identity \(E_{\gamma ,h}(x) Q(x) = -\gamma - \langle h,x \rangle \) gives \( \frac{\partial E_{\gamma ,h}}{\partial x}(x)\, Q(x)+ E_{\gamma ,h}(x) (x - x^0)= - h\); at the maximizer u, where \(E_{\gamma ,h}(u)=\eta \), this yields

$$\begin{aligned} \frac{\partial E_{\gamma ,h}}{\partial x}(u)\, Q(u) = - \eta (u - x^0) - h. \end{aligned}$$

At the maximizer u, we have \(\eta Q(u) = -\gamma - \langle h,u \rangle \). Now the first-order optimality conditions imply that for \(i = 1, 2, \ldots , n\),

$$\begin{aligned} -\eta (u_i - x_i^0) - h_i \left\{ \begin{array}{lll} \le 0 \quad &{}~~ \mathrm {if}~ u_i = \underline{x}_i,\\ \ge 0 \quad &{}~~ \mathrm {if}~ u_i = \overline{x}_i,\\ = 0 \quad &{}~~ \mathrm {if}~ \underline{x}_i< u_i < \overline{x}_i. \end{array} \right. \end{aligned}$$
(13)

Since \(\eta > 0\), we may define \(\lambda := \eta ^{-1}\) and find that, for \(i = 1, 2, \ldots , n\),

$$\begin{aligned} u_i = \left\{ \begin{array}{lll} \underline{x}_i \quad &{}~~ \mathrm {if}~ \underline{x}_i \ge x_i^0 -\lambda h_i,\\ \overline{x}_i \quad &{}~~ \mathrm {if}~ \overline{x}_i \le x_i^0 -\lambda h_i,\\ x_i^0 -\lambda h_i \quad &{}~~ \mathrm {if}~ \underline{x}_i \le x_i^0 -\lambda h_i \le \overline{x}_i. \end{array} \right. \end{aligned}$$
(14)

This implies that \(u = u(\lambda )\). \(\square \)

Proposition 3 gives the key property of the solution of the subproblem (9): it suffices to consider points of the form (12), which depend only on the single variable \(\lambda \). In the remainder of this section, we focus on deriving the optimal value of \(\lambda \).

Example 4

Let us consider the special case where \(\mathbf {x}\) is the n-dimensional nonnegative orthant, i.e., \(\underline{x}_i= 0\) and \(\overline{x}_i = +\infty \) for \(i=1,\ldots ,n\). Nonnegativity constraints are important in many applications, see Bardsley and Vogel (2003), Elfving et al. (2012), Esser et al. (2013) and Kaufman and Neumaier (1996, 1997). For the prox function (5) with \(x^0=0\), (12) becomes

$$\begin{aligned} u(\lambda ) = \sup \{ \underline{x},\inf \{-\lambda h,\overline{x}\}\} = \lambda h_-, \end{aligned}$$

where \(z_- := \max \{0,-z\}\). By Proposition 2.2 of Neumaier (2016), we have

$$\begin{aligned} \frac{1}{\lambda } \left( \frac{1}{2} \Vert u(\lambda )\Vert _2^2 + Q_0 \right) + \gamma + \langle h,u(\lambda ) \rangle = 0, \end{aligned}$$

which, after multiplication by \(\lambda >0\), becomes

$$\begin{aligned} \left( \frac{1}{2} \Vert h_-\Vert _2^2 + \langle h,h_- \rangle \right) \lambda ^2 + \gamma \lambda + Q_0 = 0, \end{aligned}$$

leading to

$$\begin{aligned} \beta _1 \lambda ^2 + \beta _2 \lambda + \beta _3 = 0, \end{aligned}$$

where \(\beta _1 =\frac{1}{2} \Vert h_-\Vert _2^2 + \langle h,h_- \rangle = -\frac{1}{2} \Vert h_-\Vert _2^2\), \(\beta _2 = \gamma \), and \(\beta _3 = Q_0\). Since \(\lambda \) must be positive and \(\beta _1<0\) (provided \(h_-\ne 0\)) while \(\beta _3 = Q_0 > 0\), the desired solution is the unique positive root of this equation, i.e.,

$$\begin{aligned} \lambda = \frac{-\beta _2 - \sqrt{\beta _2^2-4 \beta _1 \beta _3}}{2\beta _1}. \end{aligned}$$

This shows that for the nonnegativity constraint the subproblem (9) can be solved in a closed form.
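A short MATLAB sketch of this computation with illustrative data follows; to be safe about the sign of \(\beta _1\), it selects the positive real root of the quadratic (\(\lambda \) must be positive since \(\eta =1/\lambda >0\)):

% Closed-form solution of (9) for the nonnegative orthant with x0 = 0 (Example 4).
Q0 = 0.5;  gamma = -1;
h  = [1; -2; 0.5; -0.3; 2; -1];                 % relaxation data with h_- ~= 0
hm = max(0, -h);                                % h_-
bta = [0.5*(hm'*hm) + h'*hm, gamma, Q0];        % [beta1, beta2, beta3]
r   = roots(bta);                               % roots of beta1*t^2 + beta2*t + beta3
lambda = max(r(r > 0));                         % the positive root
u   = lambda*hm;                                % maximizer u(lambda) = lambda*h_-
eta = -(gamma + h'*u)/(Q0 + 0.5*(u'*u))         % value of (10) at u; equals 1/lambda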

However, for a general bound-constrained problem, solving (9) requires a more sophisticated scheme. To derive the optimal \(\lambda > 0\) of Proposition 3, we first determine the range of \(\lambda \) permitted by the three conditions in (14), leading to the interval

$$\begin{aligned} \lambda \in [\underline{\lambda }_i,\overline{\lambda }_i], \end{aligned}$$
(15)

for each component of x. In particular, if \(h_i = 0\), since \(x^0\) is a feasible point, \(u_i = x_i^0 -\lambda h_i = x_i^0\) satisfies the third condition in (14). Thus there is no upper bound for \(\lambda \), leading to

$$\begin{aligned} \displaystyle \underline{\lambda }_i = 0, ~~\overline{\lambda }_i = + \infty ~~~ \mathrm {if}~ u_i = x_i^0,~ h_i=0. \end{aligned}$$
(16)

If \(h_i \ne 0\), we consider the three cases (i) \(\underline{x}_i \ge x_i^0 -\lambda h_i\), (ii) \(\overline{x}_i \le x_i^0 -\lambda h_i\), and (iii) \(\underline{x}_i \le x_i^0 -\lambda h_i \le \overline{x}_i\) of (14). In Case (i), if \(h_i < 0\), division by \(h_i\) implies that \(\lambda \le - (\underline{x}_i - x_i^0)/h_i \le 0\), which is not in the acceptable range for \(\lambda \). In this case, if \(h_i > 0\), then \(\lambda \ge - (\underline{x}_i - x_i^0)/h_i\) leading to

$$\begin{aligned} \displaystyle \underline{\lambda }_i = -\frac{\underline{x}_i - x_i^0}{h_i},~\overline{\lambda }_i = + \infty ~~~ \mathrm {if}~ u_i = \underline{x}_i,~h_i > 0. \end{aligned}$$
(17)

In Case (ii), if \(h_i < 0\), then \(\lambda \ge - (\overline{x}_i - x_i^0)/h_i\) implying

$$\begin{aligned} \displaystyle \underline{\lambda }_i= - \frac{\overline{x}_i - x_i^0}{h_i},~\overline{\lambda }_i= + \infty ~~~ \mathrm {if}~ u_i = \overline{x}_i,~ h_i < 0. \end{aligned}$$
(18)

In Case (ii), if \(h_i > 0\), then \(\lambda \le - (\overline{x}_i - x_i^0)/h_i \le 0\), which is not in the acceptable range of \(\lambda \). In Case (iii), if \(h_i < 0\), division by \(h_i\) implies

$$\begin{aligned} - \frac{\underline{x}_i - x_i^0}{h_i} \le \lambda \le - \frac{\overline{x}_i - x_i^0}{h_i}. \end{aligned}$$

The lower bound satisfies \(- (\underline{x}_i - x_i^0)/h_i \le 0\), so it is not acceptable, leading to

$$\begin{aligned} \displaystyle \underline{\lambda }_i= 0,~\overline{\lambda }_i =-\frac{\overline{x}_i - x_i^0}{h_i} ~~~ \mathrm {if}~ u_i = x_i^0 -\lambda h_i \in [\underline{x}_i, \overline{x}_i],~ h_i < 0. \end{aligned}$$
(19)

In Case (iii), if \(h_i > 0\), then

$$\begin{aligned} - \frac{\overline{x}_i - x_i^0}{h_i} \le \lambda \le - \frac{\underline{x}_i - x_i^0}{h_i}. \end{aligned}$$

However, the lower bound \(- (\overline{x}_i - x_i^0)/h_i \le 0\) is not acceptable, i.e.,

$$\begin{aligned} \displaystyle \underline{\lambda }_i= 0,~\overline{\lambda }_i =-\frac{\underline{x}_i - x_i^0}{h_i} ~~~ \mathrm {if}~ u_i = x_i^0 -\lambda h_i \in [\underline{x}_i, \overline{x}_i],~ h_i > 0. \end{aligned}$$
(20)

As a result, the following proposition is valid.

Proposition 5

If \(u(\lambda )\) is a solution of the problem (9), then

$$\begin{aligned} \lambda \in [\underline{\lambda }_i,\overline{\lambda }_i] ~~~ i = 1, \ldots , n, \end{aligned}$$

where \(\underline{\lambda }_i\) and \(\overline{\lambda }_i\) are computed by

$$\begin{aligned} \underline{\lambda }_i= & {} \left\{ \begin{array}{ll} \displaystyle -\frac{\underline{x}_i - x_i^0}{h_i} \quad &{}~~\quad \mathrm {if}~ u_i = \underline{x}_i,~ h_i> 0,\\ \displaystyle - \frac{\overline{x}_i - x_i^0}{h_i} \quad &{}~~\quad \mathrm {if}~ u_i = \overline{x}_i,~ h_i< 0,\\ \displaystyle 0 \quad &{}~~\quad \mathrm {if}~ \widetilde{x}_i \in [\underline{x}_i, \overline{x}_i],~ h_i< 0,\\ \displaystyle 0 \quad &{}~~\quad \mathrm {if}~ \widetilde{x}_i \in [\underline{x}_i, \overline{x}_i],~ h_i> 0,\\ \displaystyle 0 \quad &{}~~\quad \mathrm {if}~ h_i=0, \end{array} \right. ~~~~~~\nonumber \\ \overline{\lambda }_i= & {} \left\{ \begin{array}{ll} \displaystyle + \infty \quad &{}~~\quad \mathrm {if}~ u_i = \underline{x}_i,~ h_i> 0,\\ \displaystyle + \infty \quad &{}~~\quad \mathrm {if}~ u_i = \overline{x}_i,~ h_i< 0,\\ \displaystyle -\frac{\overline{x}_i - x_i^0}{h_i} \quad &{}~~\quad \mathrm {if}~ \widetilde{x}_i \in [\underline{x}_i, \overline{x}_i],~ h_i < 0,\\ \displaystyle -\frac{\underline{x}_i - x_i^0}{h_i} \quad &{}~~\quad \mathrm {if}~ \widetilde{x}_i \in [\underline{x}_i, \overline{x}_i],~ h_i > 0,\\ \displaystyle + \infty \quad &{}~~\quad \mathrm {if}~ h_i=0, \end{array} \right. \end{aligned}$$
(21)

in which \(\widetilde{x}_i = x_i^0 -\lambda h_i\) for \(i = 1, \ldots , n\).

From Proposition 5, only one of the conditions (16)–(20) is satisfied for each component of x. Thus, for each \(i = 1, \ldots , n\), we obtain a single breakpoint

$$\begin{aligned} \widetilde{\lambda }_i := \left\{ \begin{array}{ll} \displaystyle - \frac{\overline{x}_i - x_i^0}{h_i} \quad &{}~~\quad \mathrm {if}~ h_i <0,\\ \displaystyle - \frac{\underline{x}_i - x_i^0}{h_i} \quad &{}~~\quad \mathrm {if}~ h_i >0,\\ \displaystyle +\infty \quad &{}~~\quad \mathrm {if}~ h_i = 0. \end{array} \right. \end{aligned}$$
(22)

Sorting the n bounds \(\widetilde{\lambda }_i, ~ i = 1, \ldots , n\), in increasing order, augmenting the resulting list by 0 and \(+\infty \), and deleting possible duplicate points, we obtain a list of \(m + 1\) different breakpoints (\(m + 1 \le n + 2\)), denoted by

$$\begin{aligned} 0 = \lambda _1< \lambda _2< \cdots< \lambda _{m}< \lambda _{m +1} = + \infty . \end{aligned}$$
(23)

By construction, \(u(\lambda )\) is linear in each interval \([\lambda _k, \lambda _{k+1}]\), for \(k = 1, \ldots , m\). The next proposition gives an explicit representation for \(u(\lambda )\).
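The breakpoints (22)–(23) are cheap to generate; a MATLAB sketch with illustrative data (names ours):

% Breakpoints of u(lambda) for given x0 in the box and h, cf. (22)-(23).
x0 = zeros(5,1);  h = [2; -1; 0; 0.5; -3];
xl = -ones(5,1);  xu = 2*ones(5,1);
lt = inf(size(h));                                  % tilde-lambda_i of (22)
lt(h < 0) = -(xu(h < 0) - x0(h < 0))./h(h < 0);
lt(h > 0) = -(xl(h > 0) - x0(h > 0))./h(h > 0);
bp = unique([0; lt(isfinite(lt)); inf])             % 0 = bp(1) < ... < bp(end) = inf, as in (23)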

Proposition 6

The solution \(u(\lambda )\) of the auxiliary problem (9) defined by (12) has the form

$$\begin{aligned} u(\lambda ) = p^k + \lambda q^k ~~~ \mathrm {for}~ \lambda \in [\lambda _k,\lambda _{k+1}]~~~ (k = 1,2, \ldots , m), \end{aligned}$$
(24)

where

$$\begin{aligned} p_i^k = \left\{ \begin{array}{ll} x_i^0 \quad &{}~~ \mathrm {if}~ h_i = 0,\\ x_i^0 \quad &{}~~ \mathrm {if}~ \lambda _{k+1} \le \widetilde{\lambda }_i,\\ \underline{x}_i \quad &{}~~ \mathrm {if}~ \lambda _k \ge \widetilde{\lambda }_i, ~ h_i> 0,\\ \overline{x}_i \quad &{}~~ \mathrm {if}~ \lambda _k \ge \widetilde{\lambda }_i, ~ h_i< 0, \end{array} \right. ~~~~~~ q_i^k = \left\{ \begin{array}{ll} 0 \quad &{}~~ \mathrm {if}~ h_i = 0,\\ - h_i \quad &{}~~ \mathrm {if}~ \lambda _{k+1} \le \widetilde{\lambda }_i,\\ 0 \quad &{}~~ \mathrm {if}~ \lambda _k \ge \widetilde{\lambda }_i, ~ h_i > 0,\\ 0 \quad &{}~~ \mathrm {if}~ \lambda _k \ge \widetilde{\lambda }_i, ~ h_i < 0. \end{array} \right. \end{aligned}$$
(25)

Proof

Since \(\lambda > 0\), there exists \(k \in \{1, \ldots , m\}\) such that \(\lambda \in [\lambda _k,\lambda _{k+1}]\). Let \(i \in \{1, \ldots , n\}\). If \(h_i = 0\), (16) implies \(u_i = x_i^0\). If \(h_i \ne 0\), the construction of the breakpoints \(\lambda _1, \ldots , \lambda _{m+1}\) implies that \(\widetilde{\lambda }_i \not \in (\lambda _k,\lambda _{k+1})\), so two cases are distinguished: (i) \(\lambda _{k+1} \le \widetilde{\lambda }_i\); (ii) \(\lambda _k \ge \widetilde{\lambda }_i\). In Case (i), since \(\lambda \le \lambda _{k+1}\le \widetilde{\lambda }_i\), Proposition 5 implies that \(\widetilde{\lambda }_i = \overline{\lambda }_i\) (it cannot equal \(\underline{\lambda }_i\)). Therefore, either (19) or (20) holds depending on the sign of \(h_i\), implying \(x_i^0 -\lambda h_i \in [\underline{x}_i, \overline{x}_i]\), so that \(p_i^k = x_i^0\) and \(q_i^k = -h_i\). In Case (ii), since \(\lambda \ge \lambda _k\ge \widetilde{\lambda }_i\), Proposition 5 implies that \(\widetilde{\lambda }_i = \underline{\lambda }_i\) (it cannot equal \(\overline{\lambda }_i\)). Therefore, either (17) or (18) holds. If \(h_i < 0\), then (18) holds, i.e., \(p_i^k = \overline{x}_i\) and \(q_i^k = 0\). Otherwise, (17) holds, implying \(p_i^k = \underline{x}_i\) and \(q_i^k = 0\). This proves the claim. \(\square \)

Proposition 6 exhibits the solution \(u(\lambda )\) of the auxiliary problem (9) as a piecewise linear function of \(\lambda \). In the next result, we show that solving the problem (9) is equivalent to maximizing a one-dimensional piecewise rational function.

Proposition 7

The maximal value of the subproblem (9) is the maximum of the piecewise rational function \(\eta (\lambda )\) defined by

$$\begin{aligned} \eta (\lambda ) := \frac{a_k + b_k \lambda }{c_k + d_k \lambda + s_k \lambda ^2} ~~~\text{ if }~~ \lambda \in [\lambda _k,\lambda _{k+1}]~~~ (k= 1, 2, \ldots , m), \end{aligned}$$
(26)

where

$$\begin{aligned} a_k:= & {} -\gamma - \langle h, p^k \rangle ,~~~ b_k := -\langle h, q^k \rangle ,\\ c_k:= & {} Q_0 + \frac{1}{2} \Vert p^k - x^0\Vert ^2, ~~~ d_k := \langle p^k - x^0, q^k \rangle , ~~~ s_k := \frac{1}{2} \Vert q^k\Vert ^2. \end{aligned}$$

Moreover, \(c_k > 0\), \(s_k > 0\) and \(4s_k c_k > d_k^2\).

Proof

By Propositions 3 and 6, the global maximizer of (9) has the form (24). We substitute (24) into the function (10), and obtain from

$$\begin{aligned} \gamma + \langle h, x^k(\lambda ) \rangle =\gamma + \langle h, p^k + q^k \lambda \rangle =\gamma + \langle h, p^k \rangle + \langle h, q^k \rangle \lambda =-a_k - b_k \lambda \end{aligned}$$

and

$$\begin{aligned} Q_0\le & {} Q(x^k(\lambda ))\\= & {} Q(p^k + q^k \lambda )\\= & {} Q_0 + \frac{1}{2} \Vert p^k - x^0\Vert ^2 + \langle p^k - x^0, q^k \rangle \lambda + \frac{1}{2} \Vert q^k\Vert ^2 \lambda ^2 =c_k + d_k \lambda + s_k \lambda ^2 \end{aligned}$$

the formula

$$\begin{aligned} E_{\gamma ,h}(u(\lambda )) = -\frac{\gamma + \langle h, x^k(\lambda ) \rangle }{Q(x^k(\lambda ))} = \eta (\lambda ). \end{aligned}$$
(27)

Since \(Q(x^k(\lambda ))\ge Q_0 > 0\), the denominator of (26) is bounded away from zero; in particular \(c_k> 0\). It remains to verify that \(s_k >0\) for \(k = 1, 2, \ldots , m\); then, since the quadratic \(c_k + d_k \lambda + s_k \lambda ^2\) has no real root, \(4s_k c_k > d_k^2\) follows. Now the definition of \(q^k\) in (25) shows that \(q_i^k=-h_i \ne 0\) for every i with \(h_i\ne 0\) and \(\lambda _{k+1} \le \widetilde{\lambda }_i\), leading to \(q^k \ne 0\), hence \(s_k > 0\). \(\square \)

The next result gives a systematic way to maximize the one-dimensional rational function appearing in (26).

Proposition 8

Let a, b, c, d, and s be real constants with \(c > 0\), \(s > 0\), and \(4sc > d^2\). Then

$$\begin{aligned} \phi (\lambda ) := \frac{a + b \lambda }{c + d \lambda + s \lambda ^2} \end{aligned}$$
(28)

defines a function \(\phi :{\mathbb {R}}\rightarrow {\mathbb {R}}\) that has at least one stationary point. Moreover, the global maximizer of \(\phi \) is determined by the following cases:

(i)

    If \(b \ne 0\), then \(a^2 - b (ad - bc)/s > 0\) and the global maximum

    $$\begin{aligned} \phi (\widehat{\lambda }) = \frac{b}{2s \widehat{\lambda } + d} \end{aligned}$$
    (29)

    is attained at

    $$\begin{aligned} \widehat{\lambda } = \frac{-a + \sqrt{a^2 - b(ad - bc)/s}}{b}. \end{aligned}$$
    (30)
(ii)

    If \(b = 0\) and \(a > 0\), the global maximum is

    $$\begin{aligned} \phi (\widehat{\lambda }) = \frac{4as}{4cs - d^2}, \end{aligned}$$
    (31)

    attained at

    $$\begin{aligned} \widehat{\lambda } = -\frac{d}{2s}. \end{aligned}$$
    (32)
(iii)

    If \(b = 0\) and \(a \le 0\), the maximum is \(\phi (\widehat{\lambda }) = 0\), attained at \(\widehat{\lambda } = + \infty \) for \(a<0\) and at all \(\lambda \in {\mathbb {R}}\) for \(a=0\).

Proof

The denominator of (28) is positive for all \(\lambda \in {\mathbb {R}}\) under the stated conditions on the coefficients. Differentiating \(\phi \) and using the first-order optimality condition, we obtain

$$\begin{aligned} \phi '(\lambda ) = \frac{b(c + d\lambda + s\lambda ^2) - (a + b\lambda ) (d + 2s\lambda )}{(c + d \lambda + s \lambda ^2)^2} = - \frac{bs \lambda ^2 + 2as \lambda + ad -bc}{(c + d \lambda + s \lambda ^2)^2}. \end{aligned}$$

For solving \(\phi '(\lambda ) = 0\), we consider possible solutions of the quadratic equation \(bs \lambda ^2 + 2as \lambda + ad -bc = 0\). Using the assumption \(4sc > d^2\), we obtain

$$\begin{aligned} (2as)^2 - 4 bs (ad - bc)= & {} (2as)^2 - 4 abds + 4b^2cs\\= & {} (2as - bd)^2 - b^2 (d^2 -4cs) \ge 0, \end{aligned}$$

leading to

$$\begin{aligned} a^2 - \frac{b(ad-bc)}{s} \ge 0, \end{aligned}$$

implying that \(\phi '(\lambda ) = 0\) has at least one solution.

(i) If \(b \ne 0\), then

$$\begin{aligned} a^2 - \frac{b(ad-bc)}{s} = a^2 -\frac{bd}{s}a + \frac{b^2c}{s} = \left( a - \frac{bd}{2s} \right) ^2 + \frac{b^2}{4s^2}(4sc-d^2) > 0, \end{aligned}$$

implying there exist two solutions. Solving \(\phi '(\lambda ) = 0\), the stationary points of the function are found to be

$$\begin{aligned} \lambda = \frac{-a \pm \sqrt{a^2 - b(ad - bc)/s}}{b}. \end{aligned}$$
(33)

Therefore, \(a + b \lambda = \pm w\) with

$$\begin{aligned} w := \sqrt{a^2 - b(ad - bc)/s} > 0, \end{aligned}$$

and we have

$$\begin{aligned} \phi (\lambda ) = \frac{\pm w}{c + d \lambda + s \lambda ^2}. \end{aligned}$$
(34)

Since the denominator of this fraction is positive and \(w \ge 0\), the positive sign in Eq. (33) gives the maximizer, implying that (30) is satisfied. Finally, substituting this maximizer into (34) gives

$$\begin{aligned} \phi (\widehat{\lambda })= & {} \frac{w}{c + d \widehat{\lambda } + s \widehat{\lambda }^2} = \frac{b^2 w}{b^2c + bd (w - a) + s (w - a)^2}\\= & {} \frac{b^2 w}{a^2 s - b(ad - bc) + s w^2 + (bd - 2 as)w} = \frac{b^2 w}{2 s w^2 + (bd - 2 as)w}\\= & {} \frac{b^2 w}{w (2s (w - a) + bd)} = \frac{b}{2s \widehat{\lambda } + d}, \end{aligned}$$

hence (29) holds.

(ii) If \(b = 0\), we obtain

$$\begin{aligned} \phi '(\lambda ) = \frac{- a (d + 2 s \lambda )}{(c + d \lambda + s \lambda ^2)^2}. \end{aligned}$$

Hence the condition \(\phi '(\lambda ) = 0\) implies that \(a = 0\) or \(d + 2 s \lambda = 0\). The latter case implies

$$\begin{aligned} \widehat{\lambda } =- \frac{d}{2s}, ~~~ \phi (\widehat{\lambda }) = \frac{4as}{4cs - d^2}, \end{aligned}$$

whence \(\widehat{\lambda }\) is a stationary point of \(\phi \). If \(a > 0\), its maximizer is \(\widehat{\lambda } =- \frac{d}{2s}\) and (31) is satisfied.

(iii) If \(b=0\) and \(a < 0\), then

$$\begin{aligned} \lim _{\lambda \rightarrow -\infty } \phi (\lambda ) = \lim _{\lambda \rightarrow +\infty } \phi (\lambda ) = 0 \end{aligned}$$

implies \(\phi (\widehat{\lambda }) = 0\) at \(\widehat{\lambda } = \pm \infty \). In case \(a = 0\), \(\phi (\lambda ) = 0\) for all \(\lambda \in {\mathbb {R}}\). \(\square \)
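The three cases of Proposition 8 translate directly into code; the following MATLAB sketch (the function name is ours) returns the global maximizer of (28) over the real line:

% Global maximizer of phi(lambda) = (a + b*lambda)/(c + d*lambda + s*lambda^2)
% over R, following the three cases of Proposition 8.
% Assumes c > 0, s > 0 and 4*s*c > d^2. Save as, e.g., max_rational.m.
function [lam, val] = max_rational(a, b, c, d, s)
  if b ~= 0                                % case (i)
    w   = sqrt(a^2 - b*(a*d - b*c)/s);     % positive by Proposition 8(i)
    lam = (-a + w)/b;                      % (30)
    val = b/(2*s*lam + d);                 % (29)
  elseif a > 0                             % case (ii)
    lam = -d/(2*s);                        % (32)
    val = 4*a*s/(4*c*s - d^2);             % (31)
  else                                     % case (iii): supremum 0
    lam = inf;                             % approached as lambda -> infinity (a < 0)
    val = 0;                               % or attained everywhere (a = 0)
  end
end

Within BCSS, each rational piece (26) is maximized over the interval \([\lambda _k,\lambda _{k+1}]\) rather than over all of \({\mathbb {R}}\), so the maximizer above has to be compared with the values at the interval endpoints whenever it falls outside the interval.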

We summarize the results of Propositions 3–8 into the following algorithm for computing the global optimizer \(x_b\) and the optimum \(\eta _b\) of (9).

[Algorithm BCSS is displayed here as a figure; its line numbers are referred to below.]

The first loop (lines 2–4) needs \(\mathcal {O}(n)\) operations (including comparisons). Line 5 needs sorting and removing duplicates, requiring \(\mathcal {O}(n \log n)\) operations. The second loop (lines 6–15) needs \(\mathcal {O}(m^2)\) operations. Line 16 requires \(\mathcal {O}(m)\) comparisons. Therefore, the computational complexity of the algorithm BCSS is given by

$$\begin{aligned} \mathcal {N}(m,n)=\mathcal {O}(n \log n+m^2). \end{aligned}$$
(35)

The cost of BCSS is negligible for small- and medium-scale problems, where m does not get too large.
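For concreteness, the following MATLAB sketch combines Propositions 3–8 into a BCSS-type routine; it is an illustration under our own naming and organization (not the BCSS code displayed above), and it maximizes each rational piece (26) by comparing the finite interval endpoints with the stationary point of Proposition 8 whenever the latter lies inside the interval:

% Sketch of a BCSS-type solver for (9) with Q(x) = Q0 + 0.5*||x - x0||^2
% and the box [xl, xu]. Names and organization are ours. Save as bcss_sketch.m.
function [u, eta] = bcss_sketch(gamma, h, x0, xl, xu, Q0)
  n  = numel(h);
  lt = inf(n,1);                                     % breakpoints (22)
  lt(h < 0) = -(xu(h < 0) - x0(h < 0))./h(h < 0);
  lt(h > 0) = -(xl(h > 0) - x0(h > 0))./h(h > 0);
  bp = unique([0; lt(isfinite(lt)); inf]);           % (23)
  eta = -inf;  lam = 0;
  for k = 1:numel(bp)-1
    % piecewise linear representation u(t) = p + t*q on [bp(k), bp(k+1)], cf. (24)-(25)
    p = x0;  q = -h;
    low = (bp(k) >= lt) & (h > 0);  up = (bp(k) >= lt) & (h < 0);
    p(low) = xl(low);  q(low) = 0;  p(up) = xu(up);  q(up) = 0;
    % coefficients of the rational piece (26)
    a = -gamma - h'*p;             b = -h'*q;
    c = Q0 + 0.5*norm(p - x0)^2;   d = (p - x0)'*q;   s = 0.5*(q'*q);
    cand = bp(k);                                    % candidate arguments on this piece
    if isfinite(bp(k+1)), cand(end+1) = bp(k+1); end
    if s > 0                                         % stationary point of Proposition 8
      if b ~= 0
        t = (-a + sqrt(a^2 - b*(a*d - b*c)/s))/b;    % (30)
      else
        t = -d/(2*s);                                % (32)
      end
      if t > bp(k) && t < bp(k+1), cand(end+1) = t; end
    end
    for t = cand                                     % keep the best candidate
      val = (a + b*t)/(c + d*t + s*t^2);
      if val > eta, eta = val;  lam = t; end
    end
  end
  u = max(xl, min(x0 - lam*h, xu));                  % u(lambda) of (12)
end

A call such as [u, eta] = bcss_sketch(-1, randn(8,1), zeros(8,1), -ones(8,1), ones(8,1), 0.5) then returns a maximizer of (9) together with the corresponding value \(\eta \).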

3.2 Inexact solution of the OSGA rational subproblem (9)

In the BCSS algorithm, it is possible that the number m of different breakpoints is \(\mathcal {O}(n)\). If m is large, solving the subproblem (9) with BCSS is costly in a MATLAB implementation, where branching is comparatively slow. If m has the same order as n, the second term in (35) dominates and we have \(\mathcal {N}(m,n)=\mathcal {O}(n^2)\). For the application to large-scale problems we need a cheaper alternative. We therefore looked for a theoretically less satisfactory (but in practice, for large m, superior) approximate technique for solving (9). For simplicity, we consider the quadratic prox-function (5) with \(x^0=0\); the general case can easily be reduced to this one by shifting x appropriately.

In view of Proposition 3 and Theorem 3.1 in Ahookhosh and Neumaier (2017), the solution of the subproblem (9) is given by \(u(\lambda )\) defined in (12), where \(\lambda \) can be computed by solving the one-dimensional nonlinear equation

$$\begin{aligned} \varphi (\lambda ) = 0, \end{aligned}$$

in which

$$\begin{aligned} \varphi (\lambda ) := \frac{1}{\lambda } \left( \frac{1}{2}\Vert u(\lambda )\Vert _2^2 + Q_0 \right) + \gamma + \langle h, u(\lambda ) \rangle . \end{aligned}$$
(36)

The solution of the OSGA subproblem can be found by Algorithm 3 (OSS) in Ahookhosh and Neumaier (2017). There it is shown that for many convex domains the nonlinear equation (36) can be solved explicitly; for bound-constrained problems, however, it can only be solved approximately. The main advantages of the inexact approach are its simplicity and its low cost for extremely large-scale problems.

As discussed in Ahookhosh and Neumaier (2017), the one-dimensional nonlinear equation can be solved by zero-finding schemes such as the bisection method and the secant–bisection scheme described in Chapter 5 of Neumaier (2001). One can also use the MATLAB \(\mathtt {fzero}\) function, which combines bisection, inverse quadratic interpolation, and the secant method. In the next section we use this inexact solution of the OSGA rational subproblem (9) for solving large-scale imaging problems, for which it turned out to be much faster.
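On a small example, the inexact scheme amounts to a single call of a zero finder; the following MATLAB sketch (the data and the bracket are ours, chosen so that (36) has a sign change) uses fzero:

% Inexact solution of (9) on the box [0,1]^n with x0 = 0, via phi(lambda) = 0 of (36).
n  = 4;  Q0 = 0.5;  gamma = -1;  h = -ones(n,1);
xl = zeros(n,1);  xu = ones(n,1);
u_of = @(lam) max(xl, min(-lam*h, xu));                                 % u(lambda) of (12), x0 = 0
phi  = @(lam) (0.5*norm(u_of(lam))^2 + Q0)/lam + gamma + h'*u_of(lam);  % (36)
lam  = fzero(phi, [1e-8, 10]);       % phi is positive near 0 and negative at 10
u    = u_of(lam);  eta = 1/lam       % approximate solution of (9), cf. Proposition 3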

4 Numerical experiments and applications

In this section, we report numerical results for two inverse problems (one-dimensional signal recovery and two-dimensional image deblurring) to show the performance of OSGA compared with some state-of-the-art algorithms.

A software package implementing OSGA for solving unconstrained and bound-constrained convex optimization problems is publicly available at

The package is written in MATLAB, where the parameters

$$\begin{aligned} \delta = 0.9,~~ \alpha _{max} = 0.7,~~ \kappa = \kappa ' = 0.5, \end{aligned}$$

are used. We use the prox-function (5) with \(Q_0 = \frac{1}{2} \Vert x^0\Vert _2 + \epsilon \), where \(\epsilon >0\) is the machine precision. The interface to each subprogram in the package is fully documented in the corresponding file. Some examples for each class of problems are included to show how to use the package. The OSGA user’s manual (Ahookhosh 2014) describes the design of the package and how users can solve their own problems.

The algorithms considered in the comparison use the default parameter values reported in the associated literature or packages. All implementations are executed on a Dell Precision Tower 7000 Series 7810 (Dual Intel Xeon Processor E5-2620 v4 with 32 GB RAM).

4.1 One-dimensional signal recovery

In this section, we deal with the linear inverse problem

$$\begin{aligned} Ax=b, ~x\in \mathbf {x}\end{aligned}$$

that can be translated to a problem of the form (1) with the objective functions

$$\begin{aligned} \begin{array}{ll} f(x)= \frac{1}{2} \Vert Ax-b\Vert _2^2 + \frac{1}{2} \lambda \Vert x\Vert _2^2 \quad &{}~~~ (\mathrm {L22L22R}),\\ f(x)= \frac{1}{2} \Vert Ax-b\Vert _2^2 + \lambda \Vert x\Vert _1 \quad &{}~~~ (\mathrm {L22L1R}),\\ f(x)= \Vert Ax-b\Vert _1 + \frac{1}{2} \lambda \Vert x\Vert _2^2 \quad &{}~~~ (\mathrm {L1L22R}),\\ f(x)= \Vert Ax-b\Vert _1 + \lambda \Vert x\Vert _1 \quad &{}~~~ (\mathrm {L1L1R}),\\ \end{array} \end{aligned}$$
(37)

where \(\lambda \) is a regularization parameter.

We solve all of the above-mentioned problems with the dimensions \(n = 1000\) and \(m = 500\). The problem is generated by the same procedure given in the SpaRSA (Wright et al. 2009) package available at

which is

$$\begin{aligned} \begin{array}{l} \mathtt {n\_spikes = floor(spike\_rate*n);} \\ \mathtt {p = zeros(n,1);~} \mathtt {q = randperm(n);} \\ \mathtt {p(q(1:n\_spikes)) = sign(randn(n\_spikes,1));}\\ \mathtt {B = randn(m,n);~} \mathtt {B = orth(B')';} \\ \mathtt {bf = B*p;~ } \mathtt {rk = randn(m,1);}\\ \mathtt {b = bf+sigma*norm(bf)/norm(rk)*rk;} \end{array} \end{aligned}$$

with \(\mathtt {spike\_rate = 0.1}\) and the levels of noise \(\mathtt {sigma=0.4,~0.6,~0.8}\). The lower and upper bounds on the variables are generated by

$$\begin{aligned} \mathtt {\underline{x}=0.05 * ones(n,1)},~~~ \mathtt {\overline{x}=0.95 * ones(n,1)}, \end{aligned}$$

respectively. Since among the problems given in (37) only L22L22R is differentiable, we compare OSGA with nonsmooth solvers. In our experiment, we consider two versions of OSGA: OSGA-1 uses BCSS to solve the subproblem (9), while OSGA-2 uses the inexact scheme described in Sect. 3.2. They are compared with PSGA-1 (a projected subgradient algorithm with nonsummable diminishing step-size) and PSGA-2 (a projected subgradient algorithm with nonsummable diminishing steplength), cf. Boyd et al. (2003).

The results for L22L22R, L22L1R, L1L22R, and L1L1R are illustrated in Table 1 and Fig. 1. We first run OSGA-2, stop it after 100 iterations for each problem of (37), and set \(f_b\) to the best function value found. Then we stop the other algorithms once they achieve a function value less than or equal to \(f_b\), or after 2000 iterations. Figure 1 displays the relative error of function values versus iterations

$$\begin{aligned} \delta _k := \frac{f_k - \widehat{f}}{f_0 - \widehat{f}}, \end{aligned}$$
(38)

where \(\widehat{f}=f_b-0.01f_b\) denotes an approximation of the minimum and \(f_0\) denotes the function value at the initial point \(x^0\). In our experiments, PSGA-1 and PSGA-2 use the step-sizes \(\alpha := 1/\sqrt{k}\Vert g_k\Vert \) and \(\alpha := 0.1/\sqrt{k}\), respectively, in which k is the iteration counter and \(g_k\) is a subgradient of f at \(x_k\).

Table 1 Result summary for solving L22L22R, L22L1R, L1L22R, and L1L1R, where \(N_i\) and T denote the number of iterations and the running time, respectively
Fig. 1

The relative error \(\delta _k\) of function values versus iterations of PSGA-1, PSGA-2, OSGA-1, and OSGA-2 for the problems L22L22R, L22L1R, L1L22R, and L1L1R with several levels of noise and regularization parameters. a L22L22R, \(\sigma =0.4\), \(\lambda =1.3\); b L22L22R, \(\sigma =0.6\), \(\lambda =1.3\); c L22L22R, \(\sigma =0.8\), \(\lambda =1.3\); d L22L1R, \(\sigma =0.4\), \(\lambda =0.3\); e L22L1R, \(\sigma =0.6\), \(\lambda =0.3\); f L22L1R, \(\sigma =0.8\), \(\lambda =0.3\); g L1L22R, \(\sigma =0.4\), \(\lambda =3.0\); h L1L22R, \(\sigma =0.6\), \(\lambda =3.0\); i L1L22R, \(\sigma =0.8\), \(\lambda =3.0\); j L1L1R, \(\sigma =0.4\), \(\lambda =0.8\); k L1L1R, \(\sigma =0.6\), \(\lambda =0.8\); l L1L1R, \(\sigma =0.8\), \(\lambda =0.8\)

In Table 1, \(N_i\) and T denote the total number of iterations and the running time, respectively. From this table, we can see that for the problems L22L1R and L1L1R, OSGA-1 and OSGA-2 outperform PSGA-1 and PSGA-2 significantly; however, for L22L22R and L1L22R, PSGA-2 attains comparable or better results than OSGA-1. In Fig. 1, we illustrate the relative error \(\delta _k\) versus iterations for several levels of noise and regularization parameters. The considered algorithms remain well behaved as the noise level increases. Subfigures (a)–(f) and (j)–(l) show that OSGA-1 and OSGA-2 outperform PSGA-1 and PSGA-2 substantially with respect to the relative error of function values \(\delta _k\) (38); in subfigures (g)–(i), PSGA-2 attains the best results, which are, however, comparable with those of OSGA-1 and OSGA-2. These results show that OSGA-1 and OSGA-2 are suitable for sparse signal recovery with the \(\ell _1\) regularizer. It can also be seen that OSGA-1 (using BCSS) performs much better than OSGA-2 (using the inexact scheme) for this medium-scale problem.

4.2 Two-dimensional image deblurring

Image deblurring is one of the fundamental tasks in digital image processing, aiming at recovering an image from a blurred/noisy observation. The problem is typically modeled as the linear inverse problem

$$\begin{aligned} y = Ax + \omega ,~~~ x \in V, \end{aligned}$$
(39)

where V is a finite-dimensional vector space, A is a blurring linear operator, x is a clean image, y is an observation, and \(\omega \) is either Gaussian or impulsive noise.

The system (39) is usually underdetermined and ill-conditioned, and \(\omega \) is usually unknown, so it cannot be solved directly; see Neumaier (1998). Hence the solution is generally approximated by solving an optimization problem of the form

$$\begin{aligned} \begin{array}{ll} \min \limits _{x \in V} &{}~~ \displaystyle \frac{1}{2} \Vert Ax - b \Vert _2^2 + \lambda \varphi (x)\\ \end{array} \end{aligned}$$
(40)

where \(\varphi \) is a smooth or nonsmooth regularizer such as \(\varphi (x) = \frac{1}{2} \Vert x\Vert _2^2\), \(\varphi (x) = \Vert x\Vert _1\), or \(\varphi (x) = \Vert x\Vert _{ITV}\), where ITV stands for the isotropic total variation. Among the various regularizers, the total variation is particularly popular due to its strong edge-preserving feature, see, e.g., Chambolle et al. (2010). The isotropic total variation is defined for \(x \in {\mathbb {R}}^{m\times n}\) by

$$\begin{aligned} \Vert x\Vert _{ITV} = \sum _{i=1}^{m-1} \sum _{j=1}^{n-1} \sqrt{(x_{i+1,j} - x_{i,j})^2+(x_{i,j+1} - x_{i,j})^2 } + \sum _{i=1}^{m-1} |x_{i+1,n} - x_{i,n}| + \sum _{j=1}^{n-1} |x_{m,j+1} - x_{m,j}|. \end{aligned}$$
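The isotropic total variation above is straightforward to evaluate; a MATLAB sketch (the function handle name is ours):

% ||x||_ITV for an m-by-n image x, transcribing the formula above.
itv = @(x) sum(sum(sqrt(diff(x(:,1:end-1),1,1).^2 + diff(x(1:end-1,:),1,2).^2))) ...
      + sum(abs(diff(x(:,end)))) + sum(abs(diff(x(end,:))));
x = magic(8)/64;  itv(x)   % small example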

The common drawback of the unconstrained problem (40) is that it usually gives a solution outside the dynamic range of the image, which is either [0, 1] or [0, 255] for 8-bit gray-scale images. Hence one has to project the unconstrained solution onto the dynamic range of the image; however, the quality of the projected images is not always acceptable. As a result, it is worthwhile to solve a bound-constrained problem of the form (2) in place of the unconstrained problem (40), where the bounds are defined by the dynamic range of the images, see Beck and Teboulle (2009), Chan et al. (2013) and Woo and Yun (2013).

The comparison concerning the quality of the recovered image is made via the so-called peak signal-to-noise ratio (PSNR) defined by

$$\begin{aligned} \mathrm {PSNR} = 20 \log _{10} \left( \frac{\sqrt{mn}}{\Vert x - x_t\Vert _F} \right) \end{aligned}$$
(41)

and the improvement in signal-to-noise ratio (ISNR) defined by

$$\begin{aligned} \mathrm {ISNR} = 20 \log _{10} \left( \frac{\Vert y - x_t\Vert _F}{\Vert x - x_t\Vert _F} \right) , \end{aligned}$$
(42)

where \(\Vert \cdot \Vert _F\) is the Frobenius norm, \(x_t\) denotes the \(m \times n\) true image, y is the observed image, and pixel values are in [0, 1].
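Both quality measures are one-liners in MATLAB; a sketch for images with pixel values in [0, 1] (function handle names are ours; xt is the true image, y the observed image, x the restored image):

% PSNR (41) and ISNR (42); the peak value is 1 for images scaled to [0,1].
psnr_val = @(x, xt)    20*log10(sqrt(numel(xt))/norm(x - xt, 'fro'));
isnr_val = @(x, y, xt) 20*log10(norm(y - xt, 'fro')/norm(x - xt, 'fro'));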

We here consider image restoration from a blurred/noisy observation using the model (2) equipped with the isotropic total variation regularizer. We employ OSGA, MFISTA (a monotone version of FISTA proposed by Beck and Teboulle (2009)), ADMM (an alternating direction method proposed by Chan et al. (2013)), and a projected subgradient algorithm PSGA (with nonsummable diminishing step-size, see Boyd et al. (2003)). In our implementation, we use the original codes of MFISTA and ADMM provided by the authors, with minor adaptations of the stopping criterion.

We here restore the \(512 \times 512\) blurred/noisy Barbara image. Let y be a blurred/noisy version of this image generated by applying a \(9 \times 9\) uniform blur and adding Gaussian noise with zero mean and standard deviation \(\sigma =0.02,~0.04,~0.06,~0.08\). Our implementation shows that the algorithms are sensitive to the regularization parameter \(\lambda \). Hence we consider three different regularization parameters \(\lambda = 1 \times 10^{-2}\), \(\lambda = 7 \times 10^{-3}\), and \(\lambda = 4 \times 10^{-3}\). We run MFISTA for the deblurring problem, stop it after 25 iterations, and set \(f_b\) to the best function value found. Then we stop the other algorithms as soon as a function value less than or equal to \(f_b\) is achieved, or after 50 iterations. The results of our implementation are summarized in Table 2 and Figs. 2, 3, and 4.

Table 2 Result summary for the \(l_2^2\) isotropic total variation problem, where \(\mathrm {PSNR}\) and T denote the peak signal-to-noise ratio (41) and the running time, respectively
Fig. 2

The relative error \(\delta _k\) of function values versus iterations of PSGA, MFISTA, ADMM, and OSGA for deblurring the \(512 \times 512\) Barbara image with the \(9 \times 9\) uniform blur and the Gaussian noise with deviation \(\sigma =0.02,~0.04,~0.06,~0.08\). a \(\sigma =0.02\), \(\lambda = 4 \times 10^{-3}\); b \(\sigma =0.02\), \(\lambda = 7 \times 10^{-3}\); c \(\sigma =0.02\), \(\lambda = 1 \times 10^{-2}\); d \(\sigma =0.04\), \(\lambda = 4 \times 10^{-3}\); e \(\sigma =0.04\), \(\lambda = 7 \times 10^{-3}\); f \(\sigma =0.04\), \(\lambda = 1 \times 10^{-2}\); g \(\sigma =0.06\), \(\lambda = 4 \times 10^{-3}\); h \(\sigma =0.06\), \(\lambda = 7 \times 10^{-3}\); i \(\sigma =0.06\), \(\lambda = 1 \times 10^{-2}\); j \(\sigma =0.08\), \(\lambda = 4 \times 10^{-3}\); k \(\sigma =0.08\), \(\lambda = 7 \times 10^{-3}\); l \(\sigma =0.08\), \(\lambda = 1 \times 10^{-2}\)

Fig. 3

ISNR versus iterations of PSGA, MFISTA, ADMM, and OSGA for deblurring the \(512 \times 512\) Barbara image with the \(9 \times 9\) uniform blur and the Gaussian noise with deviations \(\sigma =0.02,~0.04,~0.06,~0.08\). a \(\sigma =0.02\), \(\lambda = 4 \times 10^{-3}\); b \(\sigma =0.02\), \(\lambda = 7 \times 10^{-3}\); c \(\sigma =0.02\), \(\lambda = 1 \times 10^{-2}\); d \(\sigma =0.04\), \(\lambda = 4 \times 10^{-3}\); e \(\sigma =0.04\), \(\lambda = 7 \times 10^{-3}\); f \(\sigma =0.04\), \(\lambda = 1 \times 10^{-2}\); g \(\sigma =0.06\), \(\lambda = 4 \times 10^{-3}\); h \(\sigma =0.06\), \(\lambda = 7 \times 10^{-3}\); i \(\sigma =0.06\), \(\lambda = 1 \times 10^{-2}\); j \(\sigma =0.08\), \(\lambda = 4 \times 10^{-3}\); k \(\sigma =0.08\), \(\lambda = 7 \times 10^{-3}\); l \(\sigma =0.08\), \(\lambda = 1 \times 10^{-2}\)

Fig. 4

A comparison among PSGA, MFISTA, ADMM, and OSGA for deblurring the \(512 \times 512\) Barbara image with the \(9 \times 9\) uniform blur and the Gaussian noise with the deviation 0.04 and the regularization parameter \(\lambda =4\times 10^{-3}\). a Original image. b Blurred/noisy image. c PSGA: \(\mathrm {PSNR}=23.07\) and \(T=1.39\). d MFISTA: \(\mathrm {PSNR}=23.59\) and \(T=4.73\). e ADMM: \(\mathrm {PSNR}=23.48\) and \(T=1.09\). f OSGA: \(\mathrm {PSNR}=23.64\) and \(T=1.83\)

The results in Table 2 and Figs. 2 and 3 show that the PSNR and ISNR produced by the algorithms are sensitive to the regularization parameter \(\lambda \); the function values are somewhat less sensitive. From Table 2, it can be seen that the running times of PSGA, ADMM, and OSGA are comparable and much smaller than that of MFISTA, and that OSGA attains the best PSNR. Figure 2 shows that MFISTA and then OSGA attain the best function values; however, MFISTA needs much more time. Figure 3 shows that OSGA outperforms the other methods with respect to ISNR. Figure 4 displays the original Barbara image, the blurred/noisy image, and the recovered images by PSGA, MFISTA, ADMM, and OSGA for the regularization parameter \(\lambda = 4 \times 10^{-3}\).

5 Concluding remarks

This paper discussed how to apply the optimal subgradient algorithm OSGA to bound-constrained convex optimization problems. We showed that the solution of the auxiliary OSGA subproblem needed in each iteration has a piecewise linear form in a single variable.

We gave two iterative schemes to solve this one-dimensional problem; one solves the OSGA subproblem exactly in polynomial time, the other inexactly but, for very large problems, significantly faster. The first scheme translates the subproblem into a one-dimensional piecewise rational problem, which allows the global optimizer of the subproblem to be found in \(\mathcal {O}(n^2)\) operations. The second scheme solves a one-dimensional nonlinear equation with a standard zero finder and gives only an approximate, local optimizer. The exact scheme BCSS is suitable for small- and medium-scale problems, while the inexact version can be successfully applied even to very large-scale problems.

Numerical results are reported showing the efficiency of OSGA compared with some state-of-the-art algorithms.