Abstract
In this work, we introduce a new accelerated algorithm using a linesearch technique for solving convex minimization problems in the form of a sum of two lower semicontinuous convex functions. Weak convergence of the proposed algorithm is established without assuming Lipschitz continuity of the gradient of the objective function. Moreover, the complexity of this algorithm is also analyzed. Some numerical experiments in machine learning, namely regression and classification problems, are also discussed. Furthermore, in our experiments, we evaluate the convergence behavior of this new algorithm and compare it with various algorithms from the literature. We find that our algorithm performs better than the others.
1 Introduction
In this paper, we study the convex minimization problem in the form of a sum of two convex functions. It can be expressed as follows:
\[ \min_{x \in H} \bigl\{ f(x) + g(x) \bigr\}, \tag{1} \]
where \(f,g: H \rightarrow \mathbb{R}\cup \{+\infty \}\) are proper, lower semicontinuous convex functions and H is a Hilbert space. This problem has been analyzed extensively due to its applications in major subjects such as physics, economics, engineering, statistics, and computer science. Some examples of the applications are compressed sensing, signal and image processing, medical image reconstruction, automatic control systems, and machine learning tasks in the form of data prediction and data classification. As seen in [1–7] and the references therein, these problems can be formulated as (1).
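In the case that f is differentiable, \(x^{*}\) solves (1) if and only if
\[ x^{*} = \mathrm{prox}_{\alpha g}\bigl(x^{*} - \alpha \triangledown f(x^{*})\bigr), \tag{2} \]
where \(\alpha >0\), \(\mathrm{prox}_{\alpha g} = J_{\alpha }^{\partial g} = (I + \alpha \partial g)^{-1}\), ∂g is the subdifferential of g, and I is the identity mapping. One of the most famous algorithms for solving (1) is the forward–backward algorithm [8], which is defined in the following form:
\[ x_{n+1} = \mathrm{prox}_{\alpha _{n} g}\bigl(x_{n} - \alpha _{n} \triangledown f(x_{n})\bigr), \quad n \in \mathbb{N}, \tag{3} \]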
where \(\alpha _{n}\) is a suitable step size. This method has been studied and improved by many works, see [2, 3, 9, 10] for examples. Most of these works assume that ▽f is L-Lipschitz continuous, which might be challenging to verify in general cases. So, in this work, we turn our attention to some iterative methods for which the Lipschitz continuity of ▽f is not required.
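To make the forward–backward iteration concrete, the following minimal Python sketch applies it to a one-dimensional toy problem. The test functions \(f(x) = \frac{1}{2}(x-3)^{2}\) and \(g(x) = |x|\), the constant step size 0.5, and the helper names are assumptions chosen for illustration; they do not come from the works cited above. For this choice, the proximal operator of g is scalar soft-thresholding and the minimizer is \(x = 2\).

```python
def soft_threshold(v, tau):
    """Proximal operator of tau * |.| evaluated at the scalar v."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def forward_backward(grad_f, prox_g, x, step, iterations):
    """Iterate x <- prox_{step*g}(x - step * grad_f(x)) with a constant step."""
    for _ in range(iterations):
        x = prox_g(x - step * grad_f(x), step)
    return x


# Toy problem (an assumption for illustration): f(x) = 0.5*(x - 3)^2, g(x) = |x|.
# The optimality condition 0 = x - 3 + sign(x) gives the minimizer x = 2.
grad_f = lambda x: x - 3.0
x_fb = forward_backward(grad_f, soft_threshold, x=0.0, step=0.5, iterations=60)
```

Here ▽f is 1-Lipschitz, so a constant step works; the linesearch methods discussed next are aimed at problems where no such constant is available.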
In 2016, Cruz and Nghia [11] replaced the L-Lipschitz continuity of ▽f with the following conditions.
A1. f, g are proper lower semicontinuous convex functions with \(\operatorname{dom} g \subseteq \operatorname{dom} f\),
A2. f is differentiable on an open set containing \(\operatorname{dom} g\), and ▽f is uniformly continuous on any bounded subset of \(\operatorname{dom} g\) and maps any bounded subset of \(\operatorname{dom} g\) to a bounded set of H.
Moreover, the authors introduced a linesearch technique as follows:
Linesearch 1
Input. Given \(x \in \operatorname{dom} g\), \(\sigma > 0\), \(\theta \in (0,1)\), and \(\delta > 0\).
Set \(\alpha = \sigma \).
While \(\alpha \| \triangledown f(\mathrm{prox}_{\alpha g}(x - \alpha \triangledown f(x))) - \triangledown f(x) \| > \delta \| \mathrm{prox}_{\alpha g}(x - \alpha \triangledown f(x)) - x \| \) do
Set \(\alpha = \theta \alpha \).
End While
Output. α.
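The backtracking loop of Linesearch 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of [11]: vectors are plain lists, and the test functions \(f(x) = \frac{1}{4}\sum_{i} x_{i}^{4}\) (whose gradient is not globally Lipschitz) and \(g(x) = \|x\|_{1}\), the starting point, and all helper names are hypothetical choices.

```python
import math


def norm(v):
    return math.sqrt(sum(c * c for c in v))


def diff(a, b):
    return [p - q for p, q in zip(a, b)]


def forward_backward_point(x, alpha, grad_f, prox_g):
    """Return prox_{alpha g}(x - alpha * grad_f(x))."""
    g = grad_f(x)
    return prox_g([xi - alpha * gi for xi, gi in zip(x, g)], alpha)


def linesearch_1(x, sigma, theta, delta, grad_f, prox_g):
    """Shrink alpha by the factor theta until the test of Linesearch 1 passes."""
    alpha = sigma
    while True:
        p = forward_backward_point(x, alpha, grad_f, prox_g)
        lhs = alpha * norm(diff(grad_f(p), grad_f(x)))
        rhs = delta * norm(diff(p, x))
        if lhs <= rhs:
            return alpha
        alpha *= theta


# Illustrative data (assumptions): f(x) = sum(x_i^4)/4 and g(x) = ||x||_1,
# whose proximal operator is componentwise soft-thresholding.
grad_f = lambda x: [c ** 3 for c in x]
prox_l1 = lambda v, alpha: [math.copysign(max(abs(c) - alpha, 0.0), c) for c in v]

x0 = [2.0, -1.0]
alpha = linesearch_1(x0, sigma=1.0, theta=0.5, delta=0.49,
                     grad_f=grad_f, prox_g=prox_l1)
```

The returned step has the form \(\sigma \theta ^{k}\) for the smallest k at which the stopping test holds, so a larger σ costs extra backtracking steps but never invalidates the output.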
They asserted that Linesearch 1 terminates after a finite number of iterations and introduced the following algorithm:
Algorithm 1
Given \(x_{1} \in \operatorname{dom} g\), \(\sigma > 0\), \(\theta \in (0,1)\), and \(\delta \in (0,\frac{1}{2})\). For \(n \in \mathbb{N}\),
\[ x_{n+1} = \mathrm{prox}_{\gamma _{n} g}\bigl(x_{n} - \gamma _{n} \triangledown f(x_{n})\bigr), \tag{4} \]
where \(\gamma _{n} := \textbf{Linesearch}\text{ 1} (x_{n},\sigma ,\theta , \delta )\). They proved a weak convergence theorem for (4) under assumptions A1 and A2.
Following the idea of Cruz and Nghia, very recently, Kankam et al. [4] introduced a new linesearch technique as follows.
Linesearch 2
Given \(x \in \operatorname{dom} g\), \(\sigma > 0\), \(\theta \in (0,1)\), and \(\delta > 0\). Define
Input Set \(\alpha = \sigma \).
While
Set \(\alpha = \theta \alpha \)
End While
Output α.
They showed that Linesearch 2 terminates after finitely many iterations, and then established the following two-step algorithm.
Algorithm 2
Given \(x_{1} \in \operatorname{dom} g\), \(\sigma > 0\), \(\theta \in (0,1)\), and \(\delta \in (0,\frac{1}{8})\). For \(n \in \mathbb{N}\),
where \(\gamma _{n} := \textbf{Linesearch}\text{ 2}(x_{n},\sigma ,\theta ,\delta )\). They proved that this algorithm converges weakly to a solution of (1) under assumptions A1 and A2.
Recently, many authors have employed the inertial technique in order to accelerate their algorithms. It was first introduced by Polyak [12] for solving smooth convex minimization problems. Since then, many inertial-type algorithms have been introduced and analyzed. For instance, in 2001, Alvarez and Attouch [13] introduced the idea of an inertial-proximal operator to solve the inclusion problem of a maximal monotone operator A. It was defined as follows:
\[ x_{n+1} = J_{\lambda _{n}}^{A}\bigl(x_{n} + \theta _{n}(x_{n} - x_{n-1})\bigr), \quad n \in \mathbb{N}, \]
where \(x_{0},x_{1} \in H\) are given as starting points, and \(\{\lambda _{n}\}\) and \(\{ \theta _{n} \}\) are nonnegative real sequences. In this algorithm, \(\theta _{n} (x_{n} - x_{n-1})\) is regarded as an inertial term.
In 2019, Attouch and Cabot [14] analyzed the convergence rate of an algorithm called RIPA defined by
where A is a maximal monotone operator. Under mild restrictions of control parameters, they showed that RIPA gives fast convergence rate.
Inertial-type algorithms have been proposed and studied widely by many authors, see [15–22], which showed that inertial step improves the convergence rate of algorithms.
There are several approaches to solving (1). One is to treat it as an inclusion problem, for which many authors have proposed algorithms. For instance, Moudafi [23] proposed an algorithm for solving inclusion problems in Hilbert spaces, and Cholamjiak and Shehu [24] introduced an algorithm for such problems in Banach spaces. We refer to these works for a more comprehensive discussion on inclusion problems and related problems. Under the assumption that ▽f is Lipschitz continuous, the algorithms proposed in [23, 24] can be used to solve (1).
Another approach to solving (1) is to solve a proximal split feasibility problem, which can be reduced to the convex minimization problem (1). Many authors have introduced algorithms for solving this problem. We refer to Shehu and Iyiola [25] for a more in-depth discussion on this topic.
Inspired by the works mentioned above, we aim to introduce a new two-step algorithm which combines a linesearch technique with an inertial step to improve its performance. We obtain weak convergence of the proposed algorithm to a solution of (1) without assuming ▽f to be L-Lipschitz continuous. Moreover, the complexity of this algorithm is also analyzed. Then, we apply our algorithm to solving regression and classification problems. Furthermore, we compare the performance of the proposed algorithm with other linesearch algorithms, namely Algorithms 1 and 2.
This work is organized as follows: In Sect. 2, we recall some definitions and lemmas which will be used in the main results. In Sect. 3, a new algorithm is introduced. We show that the proposed algorithm converges weakly to a solution of (1) as well as analyze its complexity. In Sect. 4, experiments on data classification and regression problems are conducted. Then, we evaluate the performance of the proposed algorithm and other algorithms using various evaluation tools. In the last section, Sect. 5, the conclusion of this research is included.
2 Preliminaries
We recall some definitions and lemmas which are crucial to the main results in this section.
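We denote by \(x_{n} \rightarrow x\) and \(x_{n} \rightharpoonup x\) the strong and weak convergence of \(\{x_{n}\}\) to x, respectively. Let \(h: H \rightarrow \mathbb{R}\cup \{+\infty \}\) be a proper lower semicontinuous convex function and \(\operatorname{dom} h = \{ x \in H : h(x) < +\infty \}\).
For any \(x \in H\), the subdifferential of h at x is defined by
\[ \partial h(x) := \{ u \in H : \langle u, y - x \rangle + h(x) \leq h(y) \text{ for all } y \in H \}. \]
The proximal operator \(\mathrm{prox}_{\alpha h} : H \rightarrow \operatorname{dom} h\) is defined by
\[ \mathrm{prox}_{\alpha h}(x) := (I + \alpha \partial h)^{-1}(x), \]
where I is the identity operator and \(\alpha > 0\). This operator is single-valued with full domain, and the following holds:
\[ \frac{x - \mathrm{prox}_{\alpha h}(x)}{\alpha } \in \partial h\bigl(\mathrm{prox}_{\alpha h}(x)\bigr) \quad \text{for all } x \in H, \alpha > 0. \]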
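For a concrete instance of the proximal operator, take \(H = \mathbb{R}\) and \(h = |\cdot |\) (an illustrative choice, not one made in the paper). Then \(\mathrm{prox}_{\alpha h}\) is soft-thresholding, and the standard inclusion \((x - \mathrm{prox}_{\alpha h}(x))/\alpha \in \partial h(\mathrm{prox}_{\alpha h}(x))\) can be checked numerically. The function names below are hypothetical.

```python
def prox_abs(x, alpha):
    """prox_{alpha h}(x) for h = |.| on the real line (soft-thresholding)."""
    if x > alpha:
        return x - alpha
    if x < -alpha:
        return x + alpha
    return 0.0


def in_subdifferential_abs(p, s, tol=1e-9):
    """Check s in the subdifferential of |.| at p: {1}, {-1}, or [-1, 1] at p = 0."""
    if p > 0:
        return abs(s - 1.0) <= tol
    if p < 0:
        return abs(s + 1.0) <= tol
    return -1.0 - tol <= s <= 1.0 + tol


checks = []
for x in [-2.0, -0.3, 0.0, 0.4, 5.0]:
    for alpha in [0.1, 0.5, 2.0]:
        p = prox_abs(x, alpha)
        checks.append(in_subdifferential_abs(p, (x - p) / alpha))
```

Every entry of `checks` should be true; the small tolerance only absorbs floating-point rounding.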
Next, we recall some crucial lemmas for this work.
Lemma 1 ([26])
Let ∂h be a subdifferential operator, then ∂h is maximal monotone. Moreover, its graph, \(\operatorname{Gph}(\partial h):= \{ (x,y) \in H \times H: y \in \partial h(x) \}\), is demiclosed. In other words, for any sequence \((x_{n},y_{n}) \subseteq \operatorname{Gph}(\partial h)\) such that \(\{x_{n}\}\) converges weakly to x and \(\{y_{n}\}\) converges strongly to y, then \((x,y) \in \operatorname{Gph}(\partial h)\).
Lemma 2 ([27])
Let \(f,g :H \rightarrow \mathbb{R}\cup \{+\infty \}\) be proper lower semicontinuous convex functions with \(\operatorname{dom} g \subseteq \operatorname{dom} f\) and \(J(x,\beta ) = \mathrm{prox}_{\beta g}(x - \beta \triangledown f(x))\). Then, for any \(x \in \operatorname{dom} g\) and \(\beta _{2} \geq \beta _{1} > 0\), we have
\[ \frac{\beta _{2}}{\beta _{1}} \| x - J(x,\beta _{1}) \| \geq \| x - J(x,\beta _{2}) \| \geq \| x - J(x,\beta _{1}) \|. \]
Lemma 3 ([28])
Let H be a real Hilbert space. Then, for all \(a,b \in H\) and \(\zeta \in [0,1]\), the following hold:
(i) \(\| a \pm b \|^{2} = \| a \|^{2} \pm 2\langle a,b \rangle + \| b \|^{2}\),
(ii) \(\| \zeta a + (1-\zeta )b \|^{2} = \zeta \| a \|^{2} + (1-\zeta ) \| b \|^{2} - \zeta (1-\zeta )\| a-b \|^{2}\),
(iii) \(\| a + b\|^{2} \leq \| a \|^{2} + 2\langle b,a+b \rangle \).
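As a quick sanity check, the three relations of Lemma 3 can be verified numerically in \(\mathbb{R}^{5}\). The random vectors, the value \(\zeta = 0.3\), and the helper names are arbitrary choices made for this sketch. Note that the slack in (iii) is exactly \(\| b \|^{2}\), which follows by expanding \(\| a+b \|^{2}\) with (i).

```python
import random


def inner(a, b):
    return sum(p * q for p, q in zip(a, b))


def sq_norm(a):
    return inner(a, a)


random.seed(0)
a = [random.uniform(-1.0, 1.0) for _ in range(5)]
b = [random.uniform(-1.0, 1.0) for _ in range(5)]
zeta = 0.3

a_plus_b = [p + q for p, q in zip(a, b)]
a_minus_b = [p - q for p, q in zip(a, b)]
combo = [zeta * p + (1.0 - zeta) * q for p, q in zip(a, b)]

# (i) with the plus sign: both sides should agree up to rounding.
gap_i = sq_norm(a_plus_b) - (sq_norm(a) + 2.0 * inner(a, b) + sq_norm(b))

# (ii): convex-combination identity.
gap_ii = sq_norm(combo) - (zeta * sq_norm(a) + (1.0 - zeta) * sq_norm(b)
                           - zeta * (1.0 - zeta) * sq_norm(a_minus_b))

# (iii): right-hand side minus left-hand side, which equals ||b||^2 >= 0.
slack_iii = sq_norm(a) + 2.0 * inner(b, a_plus_b) - sq_norm(a_plus_b)
```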
Lemma 4 ([3])
Let \(\{a_{n}\}\) and \(\{\beta _{n}\}\) be sequences of nonnegative real numbers such that
Then the following holds:
Moreover, if \(\sum_{n=1}^{+\infty }\beta _{n} < +\infty \), then \(\{a_{n}\}\) is bounded.
Lemma 5 ([28])
Let \(\{a_{n}\}\), \(\{b_{n}\}\), and \(\{\delta _{n}\}\) be sequences of nonnegative real numbers such that
\[ a_{n+1} \leq (1 + \delta _{n}) a_{n} + b_{n} \quad \text{for all } n \in \mathbb{N}. \]
If \(\sum_{n=1}^{+\infty }\delta _{n} < +\infty \) and \(\sum_{n=1}^{+\infty }b_{n} < +\infty \), then \(\lim_{n \rightarrow +\infty } a_{n}\) exists.
Lemma 6 ([29], Opial)
Let H be a Hilbert space and \(\{x_{n}\}\) be a sequence in H such that there exists a nonempty subset Ω of H satisfying the following:
(i) for any \(x^{*} \in \Omega , \lim_{n \rightarrow +\infty } \|x_{n}-x^{*} \|\) exists;
(ii) every weak-cluster point of \(\{x_{n}\}\) belongs to Ω.
Then \(\{x_{n}\}\) converges weakly to an element in Ω.
Throughout this work, we suppose that a solution of (1) exists and the set of these solutions is denoted by \(S_{*}\).
3 Main results
In this section, we propose an accelerated algorithm by employing a linesearch technique (Linesearch 1) together with the inertial technique for solving (1) and prove its weak convergence. Our algorithm is defined as follows.
Algorithm 3
Given \(x_{0}, x_{1} \in \operatorname{dom} g\), \(\sigma > 0\), \(\theta \in (0,1)\), \(\delta \in (0, \frac{1}{2})\), \(\alpha _{n} \in [0,1]\), and \(\beta _{n} \geq 0\). For \(n \in \mathbb{N}\),
where \(\gamma _{n} := \textbf{Linesearch}\text{ 1} (y_{n},\sigma ,\theta ,\delta ) \) and \(\rho _{n} := \textbf{Linesearch}\text{ 1}(z_{n},\gamma _{n}, \theta ,\delta )\), and \(P_{\operatorname{dom} g}\) is a metric projection onto domg.
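The following Python sketch illustrates one way to run an iteration of this type. The display defining Algorithm 3 is not reproduced above, so the sketch assumes, based on how the proof of Theorem 7 and the reduction to Algorithm 1 use the iterates, that one step reads \(y_{n} = P_{\operatorname{dom} g}(x_{n} + \beta _{n}(x_{n} - x_{n-1}))\), \(z_{n} = \mathrm{prox}_{\gamma _{n} g}(y_{n} - \gamma _{n} \triangledown f(y_{n}))\), \(w_{n} = \mathrm{prox}_{\rho _{n} g}(z_{n} - \rho _{n} \triangledown f(z_{n}))\), and \(x_{n+1} = (1-\alpha _{n})z_{n} + \alpha _{n} w_{n}\). This form is an inference, not a quotation. The toy problem, the choice \(\beta _{n} = 1/n^{2}\), and all function names are likewise assumptions.

```python
import math


def norm(v):
    return math.sqrt(sum(c * c for c in v))


def prox_grad_step(x, step, grad_f, prox_g):
    """Return prox_{step g}(x - step * grad_f(x))."""
    g = grad_f(x)
    return prox_g([xi - step * gi for xi, gi in zip(x, g)], step)


def linesearch_1(x, sigma, theta, delta, grad_f, prox_g):
    """Backtracking rule of Linesearch 1, started from the trial step sigma."""
    alpha = sigma
    while True:
        p = prox_grad_step(x, alpha, grad_f, prox_g)
        lhs = alpha * norm([u - v for u, v in zip(grad_f(p), grad_f(x))])
        if lhs <= delta * norm([u - v for u, v in zip(p, x)]):
            return alpha
        alpha *= theta


def algorithm_3(x0, x1, grad_f, prox_g, sigma, theta, delta, alpha, beta,
                iterations, project=lambda v: v):
    """Inertial two-step forward-backward sketch (update rule inferred, see text)."""
    x_prev, x = x0, x1
    for n in range(1, iterations + 1):
        # Inertial extrapolation followed by projection onto dom g.
        y = project([xi + beta(n) * (xi - pi) for xi, pi in zip(x, x_prev)])
        gamma = linesearch_1(y, sigma, theta, delta, grad_f, prox_g)
        z = prox_grad_step(y, gamma, grad_f, prox_g)
        # The second linesearch starts from gamma, so rho <= gamma.
        rho = linesearch_1(z, gamma, theta, delta, grad_f, prox_g)
        w = prox_grad_step(z, rho, grad_f, prox_g)
        a = alpha(n)
        x_prev, x = x, [(1.0 - a) * zi + a * wi for zi, wi in zip(z, w)]
    return x


# Toy problem (assumption): f(x) = (x1 - 1)^2 + (2*x2 - 2)^2 and
# g(x) = lam * ||x||_1 with lam = 0.1, so dom g = R^2 and the projection
# is the identity. The exact minimizer is (0.95, 0.9875).
lam = 0.1
grad_f = lambda x: [2.0 * (x[0] - 1.0), 8.0 * (x[1] - 1.0)]
prox_g = lambda v, step: [math.copysign(max(abs(c) - step * lam, 0.0), c) for c in v]

x_alg3 = algorithm_3([0.0, 0.0], [0.0, 0.0], grad_f, prox_g,
                     sigma=1.0, theta=0.5, delta=0.49,
                     alpha=lambda n: 0.9 * n / (n + 1),
                     beta=lambda n: 1.0 / n ** 2,
                     iterations=400)
```

Passing `alpha=lambda n: 0.0` and `beta=lambda n: 0.0` makes the update collapse to \(x_{n+1} = z_{n}\), which is the single forward–backward step of Algorithm 1.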
Theorem 7
Let H be a real Hilbert space, and let \(f :H \rightarrow \mathbb{R}\cup \{+\infty \}\) and \(g :H \rightarrow \mathbb{R}\cup \{+\infty \}\) be proper lower semicontinuous convex functions satisfying A1 and A2. In addition, suppose that \(\operatorname{dom} g\) is closed and the following condition is satisfied:
B1. \(\sum_{n=1}^{+\infty } \beta _{n} < +\infty \).
Then a sequence \(\{x_{n}\}\) generated by Algorithm 3 converges weakly to a point in \(S_{*}\). In other words, \(\{x_{n}\}\) converges weakly to a solution of (1).
Proof
For the sake of convenience, we denote \(w_{n} = \mathrm{prox}_{\rho _{n} g}(z_{n} - \rho _{n} \triangledown f(z_{n}))\), and let \(x^{*} \in S_{*}\). For any \(x \in \operatorname{dom} g\) and \(n \in \mathbb{N}\), we first prove the following:
In order to show (7), we obtain from (6) that
By the definitions of \(\partial g(z_{n})\), \(\triangledown f(y_{n})\), and \(\triangledown f(z_{n})\), we have
for all \(n \in \mathbb{N}\). From these inequalities and the definition of \(\gamma _{n}\), we obtain
for all \(n \in \mathbb{N}\). Consequently,
Since \(\langle y_{n} - z_{n}, z_{n} -x \rangle = \frac{1}{2}( \| y_{n} -x \|^{2} - \| y_{n} - z_{n}\|^{2} - \| z_{n} - x \|^{2})\), we have
for all \(n \in \mathbb{N}\). Hence, for any \(x \in \operatorname{dom} g\), we have
for all \(n \in \mathbb{N}\). Furthermore, since \(x^{*} \in S_{*} \subseteq \operatorname{dom} g\), we have
To prove (8), using the same arguments, we obtain the following inequalities:
for all \(n \in \mathbb{N}\). Again, using the above inequalities, we have
for all \(n \in \mathbb{N}\), which implies that
Since \(\langle z_{n} - w_{n}, w_{n} -x \rangle = \frac{1}{2}( \| z_{n} -x \|^{2} - \| z_{n} - w_{n}\|^{2} - \| w_{n} - x \|^{2})\), we get
for all \(n \in \mathbb{N}\). It follows that, for all \(x \in \operatorname{dom} g\) and \(n \in \mathbb{N}\),
So, putting \(x = x^{*}\), we obtain
Furthermore, from the definition of \(x_{n+1}\), (9) and (10), we conclude that
Now, we show that \(\lim_{n \rightarrow +\infty }\| x_{n} - x^{*} \|\) exists.
From (12) and the nonexpansiveness of \(P_{\operatorname{dom} g}\), we obtain the following:
By Lemma 4, we have \(\{x_{n}\}\) is bounded, and hence \(\sum_{n=1}^{+\infty }\beta _{n}\|x_{n} - x_{n-1}\| < + \infty \). Consequently,
From (13), we have
By Lemma 5, we get that \(\lim_{n \rightarrow +\infty }\| x_{n} - x^{*} \|\) exists. Now, from the convexity of domg and the definitions of \(z_{n-1}\) and \(x_{n}\), we conclude that \(x_{n} \in \operatorname{dom} g\). Consequently,
By (14) and (15), we have \(\lim_{n \rightarrow +\infty }\| x_{n} - y_{n} \| = 0\). Using (13) and (14), we obtain \(\lim_{n \rightarrow +\infty }\| x_{n} - x^{*} \| = \lim_{n \rightarrow +\infty }\| y_{n} - x^{*} \|\). From (11) and (12), we get \(\lim_{n \rightarrow +\infty }\| y_{n} - x^{*} \| = \lim_{n \rightarrow +\infty }\| z_{n} - x^{*} \|\), and hence (9) implies that \(\lim_{n \rightarrow +\infty }\| y_{n} - z_{n} \| = 0\). As a result, we have \(\lim_{n \rightarrow +\infty }\| x_{n} - z_{n} \| = 0\).
Next, we prove that every weak-cluster point of \(\{x_{n}\}\) belongs to \(S_{*}\). To do this, let w be a weak-cluster point of \(\{x_{n}\}\). Then there exists a subsequence \(\{x_{n_{k}}\}\) of \(\{x_{n}\}\) such that \(x_{n_{k}} \rightharpoonup w\) and hence \(z_{n_{k}} \rightharpoonup w\).
If \(\gamma _{n_{k}} \neq \sigma \) for only finitely many k, then we can suppose without loss of generality that \(\gamma _{n_{k}} = \sigma \) for all \(k \in \mathbb{N}\). The definition of \(\gamma _{n_{k}}\) implies that
Since ▽f is uniformly continuous, we get \(\lim_{k \rightarrow +\infty }\| \triangledown f(z_{n_{k}})- \triangledown f(y_{n_{k}}) \| = 0\). We know that
We conclude from the demiclosedness of \(\operatorname{Gph}(\partial (f+g))\) that \((w,0) \in \operatorname{Gph}(\partial (f+g))\). Hence, \(0 \in \partial (f+g)(w)\), which implies that \(w \in S_{*}\).
Now, suppose that there exists a subsequence \(\{z_{n_{k_{j}}}\}\) of \(\{z_{n_{k}}\}\) such that \(\gamma _{n_{k_{j}}} \leq \sigma \theta \) for all \(j \in \mathbb{N}\). In this case, we can set \(\hat{\gamma }_{n_{k_{j}}} = \frac{\gamma _{n_{k_{j}}}}{\theta }\) and \(\hat{z}_{n_{k_{j}}} = \mathrm{prox}_{\hat{\gamma }_{n_{k_{j}}}g} (y_{n_{k_{j}}} - \hat{\gamma }_{n_{k_{j}}} \triangledown f(y_{n_{k_{j}}}))\). By the definition of \(\gamma _{n_{k_{j}}}\), we obtain
Moreover, by Lemma 2, we have
Therefore, \(\| y_{n_{k_{j}}} - \hat{z}_{n_{k_{j}}}\| \rightarrow 0, \text{as } j \rightarrow +\infty \), which implies that \(\hat{z}_{n_{k_{j}}} \rightharpoonup w\). Again, using the uniform continuity of ▽f, we obtain \(\| \triangledown f(\hat{z}_{n_{k_{j}}}) - \triangledown f(y_{n_{k_{j}}}) \| \rightarrow 0\), as \(j \rightarrow +\infty \). Combining with (16), we obtain \(\frac{\| \hat{z}_{n_{k_{j}}} - y_{n_{k_{j}}}\|}{\hat{\gamma }_{n_{k_{j}}}} \rightarrow 0\), as \(j \rightarrow +\infty \). Moreover, we know that
It implies, by the demiclosedness of \(\operatorname{Gph}(\partial (f+g))\), that \(0 \in \partial (f+g)(w)\), so \(w \in S_{*}\).
By Lemma 6, we obtain that \(\{x_{n}\}\) converges weakly to an element in \(S_{*}\), and the proof is complete. □
If we set \(\beta _{n} = 0\) and \(\alpha _{n} = 0\) for all \(n \in \mathbb{N}\), then \(y_{n} = x_{n}\), and hence Algorithm 3 reduces to Algorithm 1. As a consequence of Theorem 7, we obtain the following result, which is one part of Theorem 4.2 in [11].
Corollary 8
Let H be a real Hilbert space, and let \(f,g :H \rightarrow \mathbb{R}\cup \{+\infty \}\) be proper lower semicontinuous convex functions satisfying A1 and A2. If \(S_{*} \neq \emptyset \), then a sequence \(\{x_{n}\}\) generated by Algorithm 1 converges weakly to a point in \(S_{*}\).
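In the next theorem, we analyze the complexity of Algorithm 3. We first introduce the control sequence \(\{t_{n}\}\) defined in [14] by
\[ t_{n} := 1 + \sum_{k=n}^{+\infty } \Biggl( \prod_{i=n}^{k} \beta _{i} \Biggr), \quad n \in \mathbb{N}. \]
This sequence is well defined if the following assumption holds:
\[ \sum_{k=n}^{+\infty } \Biggl( \prod_{i=n}^{k} \beta _{i} \Biggr) < +\infty \quad \text{for all } n \in \mathbb{N}. \]
It is easy to see that under the above assumption we have
\[ \beta _{n} t_{n+1} = t_{n} - 1 \quad \text{for all } n \in \mathbb{N}. \]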
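The behavior of the control sequence can be explored numerically. The sketch below assumes that \(t_{n} = 1 + \sum_{k \geq n} \prod_{i=n}^{k} \beta _{i}\), which is the form suggested by the summability condition in C1, and uses the hypothetical choice \(\beta _{n} = 2^{-n}\); neither the truncation length nor this \(\beta _{n}\) is taken from the paper. It checks the recursion \(\beta _{n} t_{n+1} = t_{n} - 1\) and the inequality \(t^{2}_{n+1} - t_{n+1} \leq t^{2}_{n}\) for the first few indices.

```python
def beta(n):
    """Hypothetical inertial parameter; summable, as condition B1 requires."""
    return 0.5 ** n


def t(n, terms=80):
    """Truncated value of t_n = 1 + sum_{k>=n} prod_{i=n}^{k} beta_i."""
    total, product = 1.0, 1.0
    for k in range(n, n + terms):
        product *= beta(k)  # running product beta_n * ... * beta_k
        total += product
    return total


# Gap in the recursion beta_n * t_{n+1} = t_n - 1 (zero up to rounding).
recursion_gaps = [abs(beta(n) * t(n + 1) - (t(n) - 1.0)) for n in range(1, 11)]

# Second part of C1: t_{n+1}^2 - t_{n+1} <= t_n^2.
c1_holds = all(t(n + 1) ** 2 - t(n + 1) <= t(n) ** 2 for n in range(1, 11))
```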
Next, we prove the following theorem.
Theorem 9
Given \(x_{0} = x_{1} \in \operatorname{dom} g\), let \(\{x_{n}\}\) be a sequence generated by Algorithm 3, and suppose that all assumptions in Theorem 7 hold. Additionally, suppose that the following assumptions hold for all \(n \in \mathbb{N}\):
C1. \(\sum_{k = n}^{+\infty }(\prod_{i=n}^{k}\beta _{i}) < + \infty \), and \(t^{2}_{n+1} - t_{n+1} \leq t^{2}_{n} \),
C2. \(\alpha _{n} \in [\frac{1}{2}, 1]\), and \(\alpha _{n} \leq \alpha _{n-1}\),
C3. \(\gamma _{n} = \textbf{Linesearch}\text{ 1}(y_{n},\rho _{n-1},\theta , \delta )\), \(\rho _{n} := \textbf{Linesearch}\text{ 1}(z_{n},\gamma _{n},\theta ,\delta )\), and \(\rho _{n} \geq \rho > 0\).
Then
for all \(n \in \mathbb{N}\), where \(\zeta _{1} = 2(\gamma _{1} + \alpha _{1} \rho _{1})\). In other words,
Proof
Let \(x^{*} \in S_{*}\). For any \(x \in \operatorname{dom} g\), we know that
for all \(n \in \mathbb{N}\). Put \(x = z_{n}\) in (20), then
thus \((f+g)(z_{n}) \geq (f+g)(w_{n})\) for all \(n \in \mathbb{N}\). Since f and g are convex, we have
From the definition of \(x_{n+1}\), we obtain
Hence,
Combining (20) and (22), we have
for all \(n \in \mathbb{N}\). We claim that
To validate our claim, we know from (21) and C2 that
Consequently,
for all \(n \in \mathbb{N}\). For simplicity, we denote \(\zeta _{n} = 2(\gamma _{n} + \alpha _{n} \rho _{n})\). We note that \(\zeta _{n} \geq \zeta _{n+1} \text{for all } n \in \mathbb{N}\) from C2 and C3. We also know that \(\| \hat{x}_{n} - x \| \geq \| y_{n} - x \|\) since \(x \in \operatorname{dom} g\).
So, from (24) and (25), we have
We know that \(x_{n},x^{*} \in \operatorname{dom} g\) and \(t_{n+1}> 1\). Thus, we conclude that \((1-\frac{1}{t_{n+1}})x_{n} + \frac{1}{t_{n+1}}x^{*} \in \operatorname{dom} g\). By putting \(x = (1-\frac{1}{t_{n+1}})x_{n} + \frac{1}{t_{n+1}}x^{*}\) in (26), we obtain
for all \(n \in \mathbb{N}\). We also have, for \(n \in \mathbb{N}\),
and
Hence, we obtain from (27), (28), and (29) that, for \(n \in \mathbb{N}\),
We know that \(\zeta _{n+1} \leq \zeta _{n}\), so after rearranging (30), we have, for \(n \in \mathbb{N}\),
Furthermore, by using (31), we can inductively show that
for all \(n \in \mathbb{N}\). Since \(\zeta _{n} = 2(\gamma _{n} + \alpha _{n}\rho _{n}) \geq 3\rho \), we obtain, for all \(n \in \mathbb{N}\), that
Since \(x^{*}\) is chosen from \(S_{*}\) arbitrarily, we have
for all \(n \in \mathbb{N}\). Hence, we obtain the desired results and the proof is complete. □
Remark 10
To justify that there exists a sequence \(\{\beta _{n}\}\) satisfying C1, we choose
Obviously, \(\beta _{n} \geq \beta _{n+1} \) for all \(n \in \mathbb{N}\). Since
we have
and hence \(t_{n+1} \leq t_{n}\). Furthermore, it is easy to see that
Therefore, \(\sum_{k = n}^{+\infty }(\prod_{i=n}^{k}\beta _{i}) < + \infty \) and \(t^{2}_{n+1} - t_{n+1} \leq t^{2}_{n} \) for all \(n \in \mathbb{N}\), so C1 is satisfied.
4 Applications to data classification and regression problems
In this section, we apply Algorithm 3 to solving regression and classification problems. Moreover, we conduct some numerical experiments for comparing the performance of Algorithm 3 with Algorithm 1 and Algorithm 2.
Machine learning is an application of artificial intelligence (AI) in which a system learns and improves automatically from experience. Among the many learning techniques, in this work we focus on the extreme learning machine (ELM) introduced by Huang et al. [30], which is defined as follows.
Let \(S := \{(x_{k},t_{k}): x_{k} \in \mathbb{R}^{n}, t_{k} \in \mathbb{R}^{m}, k = 1,2,\ldots,N\}\) be a training set of N distinct samples, where \(x_{k}\) is an input and \(t_{k}\) is a target. The output function of ELM for single-hidden-layer feedforward networks (SLFNs) with M hidden nodes and activation function G is
\[ O_{k} = \sum_{i=1}^{M} \eta _{i} G(\langle w_{i}, x_{k} \rangle + b_{i}), \quad k = 1,\ldots ,N, \]
where \(w_{i}\) is the weight vector connecting the ith hidden node and the input nodes, \(\eta _{i}\) is the weight vector connecting the ith hidden node and the output nodes, and \(b_{i}\) is the bias. The hidden layer output matrix H is defined as follows:
\[ \textbf{H} = \begin{bmatrix} G(\langle w_{1},x_{1}\rangle + b_{1}) & \cdots & G(\langle w_{M},x_{1}\rangle + b_{M}) \\ \vdots & \ddots & \vdots \\ G(\langle w_{1},x_{N}\rangle + b_{1}) & \cdots & G(\langle w_{M},x_{N}\rangle + b_{M}) \end{bmatrix}_{N \times M}. \]
Solving ELM amounts to finding \(\eta = [\eta ^{T}_{1},\ldots,\eta ^{T}_{M} ]^{T}\) such that \(\textbf{H}\eta = \textbf{T}\), where \(\textbf{T} = [t^{T}_{1},\ldots,t^{T}_{N} ]^{T}\) is the target data. We can write the solution η in the form \(\eta = \textbf{H}^{\dagger }\textbf{T}\), where \(\textbf{H}^{\dagger }\) is the Moore–Penrose generalized inverse of H. However, if \(\textbf{H}^{\dagger }\) does not exist, then η is quite difficult to find. In this case, we can employ the concept of convex minimization to find such η without relying on the existence of \(\textbf{H}^{\dagger }\).
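As an illustration of the ELM model, the following Python sketch builds a hidden layer output matrix with entries \(G(\langle w_{i}, x_{k} \rangle + b_{i})\) for a sigmoid G and evaluates the network output \(\textbf{H}\eta \). The equally spaced inputs, the uniform random weights and biases, the seed, and the function names are assumptions made for this sketch; they are not the settings used in the experiments.

```python
import math
import random


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def hidden_layer_matrix(inputs, weights, biases):
    """Return the N x M matrix with entries G(<w_i, x_k> + b_i)."""
    return [
        [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
         for w, b in zip(weights, biases)]
        for x in inputs
    ]


def elm_output(H, eta):
    """Network output H * eta for scalar targets (eta is a list of M weights)."""
    return [sum(h * e for h, e in zip(row, eta)) for row in H]


random.seed(1)
N, M = 10, 25
# Ten one-dimensional inputs in [-4, 4]; equal spacing is an assumption.
inputs = [[-4.0 + 8.0 * k / (N - 1)] for k in range(N)]
# Hidden-layer weights and biases are drawn at random and then kept fixed.
weights = [[random.uniform(-1.0, 1.0)] for _ in range(M)]
biases = [random.uniform(-1.0, 1.0) for _ in range(M)]

H = hidden_layer_matrix(inputs, weights, biases)
targets = [math.sin(x[0]) for x in inputs]
```

Only the output weights η are trained; the matrix `H` stays fixed once the hidden parameters are sampled, which is what turns training into a linear problem in η.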
To prevent overfitting, we use the following regularization, the least absolute shrinkage and selection operator (LASSO) [31]:
\[ \min_{\eta } \bigl\{ \| \textbf{H}\eta - \textbf{T} \|^{2}_{2} + \lambda \| \eta \|_{1} \bigr\}, \]
where λ is a regularization parameter. In the setting of (1), we consider \(f(x) =\| \textbf{H}x - \textbf{T} \|^{2}_{2}\) and \(g(x) = \lambda \| x \|_{1} \). Based on this model, we conduct numerical experiments on the regression of a sine function and on the classification of the Iris and heart disease datasets.
Throughout Sects. 4.1 and 4.2, we use the sigmoid function as the activation function. Moreover, we choose parameters according to the hypotheses of Theorem 7. All experiments were performed on an Intel Core i5-7500 CPU with 16 GB RAM and a GeForce GTX 1060 6 GB GPU.
4.1 Experiments for regression
We generate distinct points \(x_{1},x_{2},\ldots,x_{10}\) in an interval \([-4, 4]\), and define the training set \(S := \{\sin x_{n} : n=1,\ldots,10\}\) and a graph of a sine function on \([-4, 4]\) as the target. Moreover, we set \(M = 25\) as the number of hidden nodes, and \(\lambda = 10^{-5}\).
For the first experiment, we set \(\delta = 0.49\), \(\sigma = 0.1\), \(\theta = 0.1\), and \(\alpha _{n} = \zeta _{n} = \frac{0.9n}{n+1}\) in Algorithm 3 to evaluate the convergence behavior of Algorithm 3 with various inertial parameters \(\beta _{n}\), namely
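To evaluate the performance, we use the mean squared error (MSE) defined as follows:
\[ \mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (\bar{y}_{i} - y_{i})^{2}, \]
where \(\bar{y}_{i}\) is the predicted value and \(y_{i}\) is the target value of the ith sample.
Using \(\mathrm{MSE} \leq 1 \times 10^{-3}\) or a maximum of 1000 iterations as the stopping criteria, we obtain the results in Table 1, which show that some inertial parameters improve the performance of Algorithm 3 substantially.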
In the next experiment, we compare the performance of Algorithm 3 with Algorithm 1 and Algorithm 2. All the parameters are chosen as seen in Table 2.
Using \(\mathrm{MSE} \leq 1 \times 10^{-3}\) or a maximum of 30,000 iterations as the stopping criteria, we obtain the results shown in Table 3.
From Table 3, we see that Algorithm 3 takes only 433 iterations to reach the stopping criteria, so it outperforms both Algorithm 1 and 2.
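In the following experiment, we evaluate the performance of each algorithm at the 700th iteration with the mean absolute error (MAE) and the root mean squared error (RMSE) defined as follows:
\[ \mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} | \bar{y}_{i} - y_{i} |, \qquad \mathrm{RMSE} = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} (\bar{y}_{i} - y_{i})^{2} }, \]
where \(\bar{y}_{i}\) is the predicted value and \(y_{i}\) is the target value of the ith sample.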
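The error measures used in this section can be computed as in the short Python sketch below. It uses the standard definitions of MSE, MAE, and RMSE; the function names and the three-point example are illustrative assumptions, not data from the experiments.

```python
import math


def mse(predicted, target):
    """Mean squared error between two equally long sequences."""
    return sum((p - y) ** 2 for p, y in zip(predicted, target)) / len(target)


def mae(predicted, target):
    """Mean absolute error."""
    return sum(abs(p - y) for p, y in zip(predicted, target)) / len(target)


def rmse(predicted, target):
    """Root mean squared error, the square root of the MSE."""
    return math.sqrt(mse(predicted, target))


# Made-up values for illustration only.
predicted = [0.0, 0.5, 1.0]
target = [0.0, 1.0, 0.0]
errors = (mse(predicted, target), mae(predicted, target), rmse(predicted, target))
```

Because RMSE is the square root of MSE, a stopping rule on one of them can be translated directly into a rule on the other.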
The results can be seen in Table 4.
As seen in Table 4, Algorithm 3 achieves the lowest MAE and RMSE at the 700th iteration. In Fig. 1, we illustrate the performance of each algorithm at the 700th iteration.
4.2 Data classification
We conduct some experiments on Iris dataset [32] and heart disease dataset [33] from https://archive.ics.uci.edu. The Iris dataset contains three classes of Iris plants with 50 instances of each, and the heart disease dataset contains two classes, namely 165 patients with heart disease and 138 patients without heart disease. See Table 5 for more details of the datasets.
We set the number of hidden nodes \(M = 35\) and \(\lambda = 10^{-5}\) for both datasets. For an estimation of the optimal weight β, we use Algorithm 1, Algorithm 2, and Algorithm 3, and the output O of training and testing set are calculated by \(O = \textbf{H} \beta \).
Furthermore, the dataset is split into training and testing set, see Table 6 for details.
The accuracy is calculated by the following:
We denote acc.train and acc.test as accuracy of training and testing set, respectively. We first compare the accuracy of Algorithm 3 at 100th iteration with different inertial parameter β, namely
By setting \(\sigma = 0.49\), \(\delta = 0.1\), \(\theta = 0.1\), and \(\alpha _{n} = \frac{0.9n}{n+1}\) in Algorithm 3, the numerical experiment for data classification can be seen in Table 7.
It is observed that \(\beta ^{3}_{n}\) achieves the highest accuracy, so throughout this section we choose \(\beta ^{3}_{n}\) as inertial parameters.
The next experiment is a comparison of the performance for Algorithm 1, Algorithm 2, and Algorithm 3 at the 100th iteration. See Table 8 for the result.
Now, we employ 10-fold stratified cross validation on both Iris and heart disease datasets. We denote
where N is a number of folds, \(x_{i}\) is a number of correctly predicted samples at fold i, and \(y_{i}\) is a number of all samples at fold i.
and
Then we define
In Table 9, we show the result for classification of Iris dataset at the 100th iteration by Algorithm 1, Algorithm 2, and Algorithm 3 at each fold.
In Table 10, we show the result of heart disease dataset at the 100th iteration.
According to Tables 9 and 10, we can conclude that Algorithm 3 achieves the highest accuracy.
5 Conclusions
In this paper, we introduce and study a new algorithm for solving convex minimization problems that combines an inertial step with the linesearch technique proposed by Cruz and Nghia [11]. We prove weak convergence of the proposed algorithm to a solution of (1) without assuming ▽f to be L-Lipschitz continuous. A complexity theorem is also proved under some control conditions. We also employ our algorithm as a machine learning algorithm based on the extreme learning machine (ELM) model introduced by Huang et al. [30] for regression and classification problems. Moreover, our experiments show that the proposed algorithm converges in fewer iterations and attains higher accuracy than Algorithm 1 and Algorithm 2 on the regression and classification problems considered.
Availability of data and materials
The datasets analysed during the current study are available in https://archive.ics.uci.edu.
References
Chen, M., Zhang, H., Lin, G., Han, Q.: A new local and nonlocal total variation regularization model for image denoising. Clust. Comput. 22, 7611–7627 (2019)
Combettes, P.L., Wajs, V.: Signal recovery by proximal forward–backward splitting. Multiscale Model. Simul. 4, 1168–1200 (2005)
Hanjing, A., Suantai, S.: A fast image restoration algorithm based on a fixed point and optimization method. Mathematics 8, 378 (2020). https://doi.org/10.3390/math8030378
Kankam, K., Pholasa, N., Cholamjiak, C.: On convergence and complexity of the modified forward-backward method involving new linesearches for convex minimization. Math. Methods Appl. Sci. 42, 1352–1362 (2019)
Kowalski, M., Meynard, A., Wu, H.: Convex optimization approach to signals with fast varying instantaneous frequency. Appl. Comput. Harmon. Anal. 44, 89–122 (2018)
Shehu, Y., Iyiola, O.S., Ogbuisi, F.U.: Iterative method with inertial terms for nonexpansive mappings: applications to compressed sensing. Numer. Algorithms 83, 1321–1347 (2020)
Zhang, Y., Li, X., Zhao, G., Cavalcante, C.C.: Signal reconstruction of compressed sensing based on alternating direction method of multipliers. Circuits Syst. Signal Process. 39, 307–323 (2020)
Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16, 964–979 (1979)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)
Bussaban, L., Suantai, S., Kaewkhao, A.: A parallel inertial S-iteration forward-backward algorithm for regression and classification problems. Carpath. J. Math. 36, 21–30 (2020)
Bello Cruz, J.Y., Nghia, T.T.: On the convergence of the forward-backward splitting method with linesearches. Optim. Methods Softw. 31, 1209–1238 (2016)
Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4, 1–17 (1964)
Alvarez, F., Attouch, H.: An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping. Set-Valued Anal. 9, 3–11 (2001)
Attouch, H., Cabot, A.: Convergence rate of a relaxed inertial proximal algorithm for convex minimization. Optimization 69, 1281–1312 (2019)
Abbas, M., Iqbal, H.: Two inertial extragradient viscosity algorithms for solving variational inequality and fixed point problems. J. Nonlinear Var. Anal. 4, 377–398 (2020)
Abass, H.A., Aremu, K.O., Jolaoso, L.O., Mewomo, O.T.: An inertial forward-backward splitting method for approximating solutions of certain optimization problems. J. Nonlinear Funct. Anal. 2020, 6 (2020). https://doi.org/10.23952/jnfa.2020.6
Luo, Y.: An inertial splitting algorithm for solving inclusion problems and its applications to compressed sensing. J. Appl. Numer. Optim. 2, 279–295 (2020)
Alvarez, F.: Weak convergence of a relaxed and inertial hybrid projection-proximal point algorithm for maximal monotone operators in Hilbert space. SIAM J. Optim. 14, 773–782 (2004)
Bot, R.I., Csetnek, E.R., Laszlo, S.C.: An inertial forward-backward algorithm for the minimization of the sum of two nonconvex functions. EURO J. Comput. Optim. 4, 3–25 (2016)
Chidume, C.E., Kumam, P., Adamu, A.: A hybrid inertial algorithm for approximating solution of convex feasibility problems with applications. Fixed Point Theory Appl. 2020, 12 (2020). https://doi.org/10.1186/s13663-020-00678-w
Thong, D.V., Vinh, N.T., Cho, Y.J.: New strong convergence theorem of the inertial projection and contraction method for variational inequality problems. Numer. Algorithms 84, 285–305 (2020)
Attouch, H., Peypouquet, J., Redont, P.: A dynamical approach to an inertial forward-backward algorithm for convex minimization. SIAM J. Optim. 24, 232–256 (2014)
Moudafi, A.: On the convergence of the forward-backward algorithm for null-point problems. J. Nonlinear Var. Anal. 2, 263–268 (2018)
Cholamjiak, P., Shehu, Y.: Inertial forward-backward splitting method in Banach spaces with application to compressed sensing. Appl. Math. 64(4), 409–435 (2019)
Shehu, Y., Iyiola, O.S.: Strong convergence result for proximal split feasibility problem in Hilbert spaces. Optimization 66(12), 2275–2290 (2017)
Burachik, R.S., Iusem, A.N.: Set-Valued Mappings and Enlargements of Monotone Operators. Springer, Berlin (2008)
Huang, Y., Dong, Y.: New properties of forward-backward splitting and a practical proximal-descent algorithm. Appl. Math. Comput. 237, 60–68 (2014)
Takahashi, W.: Introduction to Nonlinear and Convex Analysis. Yokohama Publishers, Yokohama (2009)
Moudafi, A., Al-Shemas, E.: Simultaneous iterative methods for split equality problem. Trans. Math. Program. Appl. 1, 1–11 (2013)
Huang, G.B., Zhu, Q.Y., Siew, C.K.: Extreme learning machine: theory and applications. Neurocomputing 70, 489–501 (2006)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc., Ser. B, Stat. Methodol. 58, 267–288 (1996)
Fisher, R.A.: The use of multiple measurements in taxonomic problems. Ann. Eugen. 7, 179–188 (1936)
Detrano, R., Janosi, A., Steinbrunn, W., Pfisterer, M., Schmid, J.J., Sandhu, S., Guppy, K.H., Lee, S., Froelicher, V.: International application of a new probability algorithm for the diagnosis of coronary artery disease. Am. J. Cardiol. 64, 304–310 (1989). https://doi.org/10.1016/0002-9149(89)90524-9
Acknowledgements
DC was supported by Post-Doctoral Fellowship of Chiang Mai University, Thailand. This research was also supported by Chiang Mai University and Thailand Science Research and Innovation under the project IRN62W0007.
Funding
This work was funded by Chiang Mai University and Thailand Science Research and Innovation.
Author information
Contributions
Writing original draft preparation, PS; review and editing, WI; software and editing, DC; supervision, SS. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Sarnmeta, P., Inthakon, W., Chumpungam, D. et al. On convergence and complexity analysis of an accelerated forward–backward algorithm with linesearch technique for convex minimization problems and applications to data prediction and classification. J Inequal Appl 2021, 141 (2021). https://doi.org/10.1186/s13660-021-02675-y