1 Introduction

In 2005, Nesterov presented the smoothing [16] and excessive gap techniques [17] to solve nonsmooth convex minimization problems. Specifically, he solved the following minimization problem,

$$\begin{aligned} \min _{x\in X} f(x)=\widehat{f}(x)+\max _{u}\{{\langle Ax,\,u\rangle -\widehat{\phi }(u){\text {:}}\,u\in Q}\}, \end{aligned}$$
(1.1)

where X is a bounded closed convex set in the n-dimensional Euclidean space \({\mathbb{R}}^{n};\,\widehat{f}(x)\) is a convex function whose gradient is Lipschitz continuous with some constant \(M\geqslant 0\) on X; the linear operator A maps X to Q; Q is a simple bounded closed convex set in the m-dimensional Euclidean space \({\mathbb{R}}^{m};\) and \(\widehat{\phi }(u)\) is a simple continuous convex function on Q, such that the maximum in (1.1) has a closed-form solution. The iteration complexity of Nesterov’s smoothing technique is \(O(1/{\varepsilon }),\) where \(\varepsilon \) is a user-defined absolute accuracy of the approximate optimal value. The convergence rate of the excessive gap technique for this problem is \(O(1/k)\) if \(\widehat{f}(x)\) is not strongly convex, and reaches \(O(1/k^{2})\) if \(\widehat{f}(x)\) is strongly convex, where k is the iteration counter. The two methods have been widely used in applications, e.g., sparse recovery [2], resource allocation [8], risk measures for portfolio optimization [7], and multi-commodity flow and fractional packing [19].

In this paper, we aim to use the smoothing [16] and excessive gap [17] techniques to solve a convex optimization problem arising in the placement of very large-scale integrated (VLSI) circuits. Placement is one of the most important steps in VLSI computer-aided design, since chip performance depends heavily on the placement result [1]. A modern chip often contains millions of circuit cells and nets, which must be placed legally in a design region while an objective is optimized. This has to be done with high-performance optimization techniques.

A VLSI circuit can be modeled as a hypergraph \(\mathcal {N}=(V,\,E),\) where V denotes the set of circuit cells, possibly with different widths and heights, and E denotes the set of nets specifying interconnections between the circuit cells. Note that a net may contain more than two cells, i.e., a net may be a hyperedge. Given a rectangular region \([0,\,W]\times [0,\,H],\) VLSI placement seeks a placement of the circuit cells such that no two cells overlap and the total wirelength is minimized [1, 4, 14]. This problem is NP-complete [6] and is difficult to solve because of its very large scale. Nevertheless, a number of packages based on different optimization methods have been developed for the VLSI placement problem [1, 4, 14].

Given the location \((x^{(i)},\,y^{(i)})\in [0,\,W]\times [0,\,H]\) of circuit cell \(i\in V,\) the total wirelength of the circuit is defined as the Half-Perimeter Wirelength (HPWL), i.e., \(\mathrm{HPWL}_{\mathcal {N}}(x,\,y)=\mathrm{HPWL}_{\mathcal {N}}(x)+\mathrm{HPWL}_{\mathcal {N}}(y),\) where

$$\begin{aligned} \mathrm{HPWL}_{\mathcal {N}}(x)=\sum _{e\in E} \left[ \max _{i\in e} x^{(i)}-\min _{j\in e} x^{(j)}\right] =\sum _{e\in E} \max _{i, j\in e} \left\{ x^{(i)}-x^{(j)}\right\} . \end{aligned}$$
(1.2)
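For illustration, the one-dimensional HPWL can be evaluated directly from its definition. The following short Python sketch is ours and uses a placeholder netlist and placeholder coordinates; it performs exactly the summation in (1.2).

```python
import numpy as np

def hpwl_1d(nets, x):
    """One-dimensional HPWL, cf. Eq. (1.2):
    sum over nets of (max coordinate - min coordinate)."""
    return sum(x[list(e)].max() - x[list(e)].min() for e in nets)

# Illustrative 4-cell, 2-net netlist (placeholder data).
nets = [(0, 1), (0, 2, 3)]           # each net is a tuple of cell indices
x = np.array([1.0, 4.0, 2.5, 0.5])   # x-coordinates of the 4 cells
print(hpwl_1d(nets, x))              # (4.0 - 1.0) + (2.5 - 0.5) = 5.0
```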

Since the HPWL function (1.2) is continuous and convex, but not differentiable, it is usually approximated by the Bound-2-Bound (B2B) net model [21]. The B2B model is a convex quadratic function and has been widely used in VLSI placement research [1, 4, 13]. SimPL [10, 12], ComPLx [9], and Mapple [11] are among the best state-of-the-art placers in modern VLSI Computer Aided Design. They also approximate the HPWL function by the B2B net model [21], and solve the following problem as a subproblem,

$$\begin{aligned} \min _{x \in X} f(x)=\mathrm{HPWL}_{\mathcal {N}}^\mathrm{B2B}(x)+\lambda \left\| x-x_{0}^{+}\right\| _{1}, \end{aligned}$$
(1.3)

where \(X=[0,\,W]^{n},\,\mathrm{HPWL}_{\mathcal {N}}^\mathrm{B2B}(x)\) is the B2B convex quadratic approximation of the function \(\mathrm{HPWL}_{\mathcal {N}}(x)\) in Eq. (1.2), \(\lambda >0,\) and \(x_{0}^{+}\) is a feasible solution of the VLSI placement problem.

Problem (1.3) is a key subproblem in the VLSI placers SimPL, ComPLx, and Mapple, where it is solved by the conjugate gradient method. In this problem, the B2B net model is a rough approximation of the HPWL function and cannot capture the HPWL information exactly during the optimization. Moreover, the \(\Vert \cdot \Vert _{1}\) norm is not Lipschitz continuously differentiable, so applying the conjugate gradient method to problem (1.3) is not theoretically sound.

In this paper, we do not approximate the function \(\mathrm{HPWL}_{\mathcal {N}}(x)\) by the B2B net model but use the HPWL function directly. The \(l_{1}\)-norm \(||\cdot ||_{1}\) is replaced by the squared \(l_{2}\)-norm \(||\cdot ||^{2}_{2}.\) Hence, the problem we consider in this paper is

$$\begin{aligned} \min _{x\in X} f(x)=\sum _{e\in E} \max _{i, j\in e} \left\{ x^{(i)}- x^{(j)}\right\} +\lambda \left\| x-x_{0}^{+}\right\| _{2}^{2}, \end{aligned}$$
(1.4)

where \(\lambda >0.\) The function \(\mathrm{HPWL}_{\mathcal {N}}(x)\) is a non-differentiable convex function, so we will adopt Nesterov’s smoothing and excessive gap techniques [16, 17] to solve problem (1.4).
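A correspondingly small sketch, again with placeholder data (\(x_{0}^{+}\) and \(\lambda \) are chosen arbitrarily here), evaluates the objective of problem (1.4):

```python
import numpy as np

def f_14(nets, x, x0, lam):
    """Objective of problem (1.4): HPWL plus the quadratic anchoring term."""
    hpwl = sum(x[list(e)].max() - x[list(e)].min() for e in nets)
    return hpwl + lam * np.sum((x - x0) ** 2)

nets = [(0, 1), (0, 2, 3)]
x0 = np.array([1.0, 4.0, 2.5, 0.5])  # a hypothetical feasible placement x_0^+
x = np.array([1.2, 3.5, 2.0, 1.0])
print(f_14(nets, x, x0, lam=0.5))    # HPWL term 3.3 plus 0.5 * 0.79
```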

Several approaches in the literature can be used to solve the minimization problem (1.4). The LSE net model and the \(L_{P}\)-norm net model [1, 4] approximate the function \(\mathrm{HPWL}_{\mathcal {N}}(x)\) by the LSE and \(L_{P}\)-norm functions, respectively. Subgradient methods [3] can be applied to problem (1.4) directly, but it is well recognized in practice that they are usually slow and numerically sensitive to the choice of step sizes. Recently, Nesterov [18] proposed a subgradient method for huge-scale problems with sparse subgradients. This method is based on a recursive update of the results of matrix/vector products and of the values of symmetric functions. Its convergence rate is \(O(1/\sqrt{k}),\) where k is the iteration counter. Dinh et al. [5] proposed an algorithm that combines Lagrangian decomposition with excessive gap smoothing techniques to solve large-scale separable convex optimization problems, with convergence rate \(O(1/k).\) However, our problem is not separable, so we cannot use their technique directly. Schmidt et al. [20] proposed the stochastic average gradient method to minimize the sum of a finite number of smooth convex functions, also with convergence rate \(O(1/k).\)

In this paper, we propose an algorithm based on Nesterov's smoothing and excessive gap techniques [16, 17] to solve problem (1.4). Comparing problem (1.4) with problem (1.1) in Nesterov [16, 17], we see that the problem we consider is a direct generalization of problem (1.1) to a sum of a large number of maximum functions. Moreover, every maximum function in (1.4) involves only the cells of a single net, and by the characteristics of VLSI circuits the number of cells in a net is usually small. Hence, we apply Nesterov's smoothing technique to every maximum function directly. The proposed algorithm has a convergence rate of \(O(1/k^{2}),\) where k is the iteration counter. According to the complexity theory for convex optimization by Nemirovski and Yudin [15], the proposed algorithm is optimal.

The paper is organized as follows. In Sect. 2, some notations and basic results are reviewed. In Sect. 3.1, we introduce the technique of smoothing the HPWL function of every net \(e_{k}.\) The excessive gap condition (EGC) and some lemmas are given in Sect. 3.2. Section 4 introduces the algorithm, together with the theorems on its convergence rate and efficiency estimate. In Sect. 5, we give a scheme to speed up the convergence of the algorithm. Finally, preliminary computational experiments are reported in Sect. 6.

2 Notations and Basic Results

In this section, we review some notations and basic results from Nesterov's smoothing and excessive gap techniques [16, 17], which will be used throughout the paper.

Let X be a finite dimensional space. In this paper, we define X as

$$\begin{aligned} X=\left\{ x=(x^{(1)},\,x^{(2)},\cdots ,x^{(|V|)})^\mathrm{T}{\text {:}}\,0\leqslant x^{(i)}\leqslant W\right\} , \end{aligned}$$

where W is the width of the placement region in the VLSI placement problem.

The space of linear functions on X is denoted by \(X^{*}.\) For \(s\in X^{*},\,x\in X,\) the value of s at x is denoted by \(\langle s,\,x\rangle ,\) where \(\langle \cdot ,\,\cdot \rangle \) denotes the regular inner product. Denote \(s,\,z\in S,\) where S is a finite dimensional space equipped with the \(l_{p}\)-norm. The dual norm of s is defined as

$$\begin{aligned} ||s||_{p}^{*}=\max _{||z||_{p}=1}\langle s,\,z \rangle . \end{aligned}$$

Let A be a linear operator which maps X to Q, i.e., \(A{\text {:}}\,X\longrightarrow Q.\) We use \(A_{i}\) to denote the ith row of A and \(A^{j}\) to denote the jth column of A. In this paper, the space containing Q is equipped with the \(l_{1}\)-norm, and \(Q=\left\{ u{\text {:}}\,\sum \nolimits _{i=1}^{p} |u^{(i)}|\leqslant 1\right\} .\)

For \(x\in X,\,u\in Q,\,X\) is equipped with the \(l_{2}\)-norm, and the norm of A is defined as

$$\begin{aligned} ||A||_{2,1}=\max _{||x||_{2}=1} \max _{||u||_{1}=1}\langle Ax,\,u \rangle . \end{aligned}$$

Clearly,

$$\begin{aligned} ||A||_{2,1}=\left\| A^{\rm T}\right\| _{1,2}=\max _{||x||_{2}=1}||Ax||_1^{*}=\max _{||u||_{1}=1}\left\| A^{\rm T}u\right\| _2^{*}. \end{aligned}$$

It is easy to verify that

$$\begin{aligned} ||Ax||_1^{*}\leqslant ||A||_{2,1} \cdot ||x||_{2},\quad \forall x\in X, \end{aligned}$$
(2.1)

and

$$\begin{aligned} \left\| A^{\rm T}u\right\| _2^{*}\leqslant ||A||_{2,1} \cdot ||u||_1,\quad \forall u\in Q. \end{aligned}$$
(2.2)

It is easy to prove that the norm of A satisfies

$$\begin{aligned} ||A||_{2,1}=\max _{||x||_{2}=1}\max _{1\leqslant i\leqslant p}\left\{ \langle A_{i}^{\rm T},\,x \rangle \right\} =\max _{1\leqslant i\leqslant p}\left\{ \left\| A_{i}\right\| _{2}^{*}\right\} , \end{aligned}$$
(2.3)

where p is the number of rows in A.

Furthermore, we define a strongly convex function \(d_{Q}(u)\) on the convex set Q, which satisfies that

$$\begin{aligned} d_{Q}(u)\geqslant d_{Q}(u^{*})+\frac{1}{2}\sigma _{Q}\Vert u-u^{*}\Vert _{1}^{2}, \end{aligned}$$
(2.4)

and for all \(u,\,v\in Q,\)

$$\begin{aligned} d_{Q}(u)\geqslant d_{Q}(v)+\langle \nabla d_{Q}(v),\,u-v \rangle +\frac{1}{2}\sigma _{Q} ||u-v||_{1}^{2}, \end{aligned}$$
(2.5)

where \(\sigma _{Q} >0\) is the strong convexity parameter of the function \(d_{Q}(u),\) and \(u^{*}=\arg \!\min _{u\in Q}d_{Q}(u).\) Without loss of generality, we assume that \(d_{Q}(u^{*})=0.\)

3 Objective and Dual Functions

In this section, we present the smoothing technique for the VLSI placement problem in detail. Then the excessive gap condition (EGC) is given for the objective function and its dual form.

3.1 Smoothing the Objective Function

Let \({a}_{i,j}(e_{k})\) be an n-dimensional vector corresponding to net \(e_{k}\) and cells i and j, where n is the dimension of the vector x. For any \(i,\,j\in e_{k},\,i\ne j,\) the ith component of \( {a}_{i,j}(e_{k})\) is 1, the jth component of \({a}_{i,j}(e_{k})\) is \(-1,\) and the other components are zeros. Let \(A(e_{k})\) be the matrix associated with \(e_{k}\) whose rows are all possible vectors \(a_{i,j}(e_{k}).\) To make this clear, we give the following example.

Consider a hypergraph \(\mathcal {N}=(V,\,E),\) where \(V=\{1,\,2,\,3\},\) \(E=\{e_{1},\,e_{2}\},\) and the cells \(1,\,2\in e_{1},\,1,\,2,\,3\in e_{2}.\) By the notation, for the net \(e_{1},\,a_{1, 2}(e_{1})=(1,\,-1,\,0),\,a_{2, 1}(e_{1})=(-1,\,1,\,0);\) and for the net \(e_{2},\) we have \(a_{1, 2}(e_{2})=(1,\,-1,\,0),\,a_{1, 3}(e_{2})=(1,\,0,\,-1),\,a_{2, 1}(e_{2})=(-1,\,1,\,0),\,a_{2, 3}(e_{2})=(0,\,1,\,-1),\,a_{3, 1}(e_{2})=(-1,\,0,\,1),\,a_{3, 2}(e_{2})=(0,\,-1,\,1).\) Thus, the matrices \(A(e_{1})\) and \(A(e_{2})\) have the following forms:

$$\begin{aligned} A\left( e_{1}\right) = \left( \begin{array}{ccc} 1 &{} -1 &{} 0 \\ -1 &{} 1 &{} 0 \\ \end{array} \right) ,\quad A\left( e_{2}\right) =\left( \begin{array}{cccccc} 1 &{} 1 &{} -1 &{} 0 &{} -1 &{} 0\\ -1 &{} 0 &{} 1 &{} 1 &{} 0 &{} -1\\ 0 &{} -1 &{} 0 &{} -1 &{} 1 &{} 1\\ \end{array} \right)^{\rm T}. \end{aligned}$$

Obviously, the dimension of \(A(e_{k})\) is \((n_{k}^{2}-n_{k})\times n,\) where \(n_{k}\) is the number of cells on the net \(e_{k}.\) Moreover, any row of the matrix \(A(e_{k})\) has only two non-zero components, one equal to 1 and the other equal to \(-1.\) So by (2.3), we can get

$$\begin{aligned} \left\| A\left( e_{k}\right) \right\| _{2,1}=\left\| A_{i}\left( e_{k}\right) \right\| _{2}^{*}=\left\| A_{i}\left( e_{k}\right) \right\| _{2}=\sqrt{2}. \end{aligned}$$
(3.1)
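The matrix \(A(e_{k})\) is also easy to build programmatically. The following Python sketch (the helper name net_matrix is ours, and cell indices are 0-based) constructs \(A(e_{2})\) of the example above and checks (3.1) numerically.

```python
import numpy as np

def net_matrix(net, n):
    """Build A(e_k): one row a_{i,j}(e_k) for every ordered pair i != j in the net."""
    rows = []
    for i in net:
        for j in net:
            if i != j:
                a = np.zeros(n)
                a[i], a[j] = 1.0, -1.0
                rows.append(a)
    return np.vstack(rows)                      # shape (n_k^2 - n_k, n)

A_e2 = net_matrix((0, 1, 2), n=3)               # the net e_2 of the example above
print(A_e2.shape)                               # (6, 3)
print(np.linalg.norm(A_e2, axis=1).max())       # ||A(e_2)||_{2,1} = sqrt(2), cf. (3.1)
```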

Lemma 3.1

By the above notations, for any net \(e_{k}\) with \(n_{k}\) cells, we have

$$\begin{aligned}\begin{aligned} \mathrm{HPWL}_{e_{k}}(x)&=\max _{i,\,j\in e_{k}}\left\{ x^{(i)}-x^{(j)}\right\} =\max _{i,\,j\in e_{k}}\left\{ \langle a_{i,j}\left( e_{k}\right) ,\,x \rangle \right\} \\ {}&= \max _{u_{k}}\left\{ \langle A\left( e_{k}\right) x,\,u_{k} \rangle {\text {:}}\,\sum \limits _{i=1}^{n_{k}(n_{k}-1)} \left| u^{(i)}_{k}\right| \leqslant 1\right\} . \end{aligned}\end{aligned}$$

Proof

For cells \(i,\,j\in e_{k},\) it holds that \(x^{(i)}-x^{(j)}=\langle a_{i,j}(e_{k}),\,x \rangle .\) Thus,

$$\begin{aligned} \max _{i,\,j\in e_{k}}\left\{ x^{(i)}-x^{(j)}\right\} =\max _{i,\,j\in e_{k}} \langle a_{i,j}\left( e_{k}\right) ,\,x \rangle =\max _{i,\,j\in e_{k}} \left| x^{(i)}-x^{(j)}\right| =\max _{i,\,j\in e_{k}} \left| \langle a_{i,j}\left( e_{k}\right) ,\,x\rangle \right| . \end{aligned}$$
(3.2)

Furthermore, there exist \(i^{\prime },\,j^{\prime }\in e_{k}\) such that

$$\begin{aligned} \max _{i,\,j\in e_{k}} \left| \langle a_{i,j}\left( e_{k}\right) ,\,x \rangle \right| = \left| \langle a_{i^{\prime },j^{\prime }}\left( e_{k}\right) ,\,x \rangle \right| . \end{aligned}$$
(3.3)

So for \(u_{k}\) such that \(\sum \nolimits _{i=1}^{n_{k}(n_{k}-1)} |u^{(i)}_{k}|\leqslant 1,\)

$$\begin{aligned} \begin{aligned} \langle A\left( e_{k}\right) x,\,u_{k} \rangle&=\sum \limits _{i= 1}^{n_{k}(n_{k}-1)}u_{k}^{(i)} \langle A_{i}\left( e_{k}\right) ,\,x\rangle \\&\leqslant \sum \limits _{i= 1}^{n_{k}(n_{k}-1)}\left| u_{k}^{(i)} \langle A_{i}\left( e_{k}\right) ,\,x\rangle \right| \\&\leqslant \left| \langle a_{i^{\prime },j^{\prime }}\left( e_{k}\right) ,\,x \rangle \right| \sum \limits _{i= 1}^{n_{k}(n_{k}-1)} \left| u_{k}^{(i)}\right| \\&\leqslant \left| \langle a_{i^{\prime },j^{\prime }}\left( e_{k}\right) ,\,x \rangle \right| . \end{aligned} \end{aligned}$$
(3.4)

Hence, by (3.2)–(3.4),

$$\begin{aligned} \max _{u_{k}}\left\{ \langle A\left( e_{k}\right) x,\,u_{k}\rangle {\text {:}}\sum \limits _{i=1}^{n_{k}(n_{k}-1)} \left| u^{(i)}_{k}\right| \leqslant 1\right\} \leqslant \max _{i,\,j\in e_{k}} \langle a_{i,j}\left( e_{k}\right) ,\,x \rangle . \end{aligned}$$

Furthermore, suppose that in Eq. (3.3), \(a_{i^{\prime },j^{\prime }}(e_{k})\) is the pth row of \(A(e_{k}).\) Take \(u_{k}\) such that the pth component of \(u_{k}\) is \(\mathrm{sgn}(\langle a_{i^{\prime },j^{\prime }}(e_{k}),\,x \rangle ),\) and the other components of \(u_{k}\) are 0. Then

$$\begin{aligned} \langle A\left( e_{k}\right) x,\,u_{k} \rangle =\left| \langle a_{i^{\prime },j^{\prime }}\left( e_{k}\right) ,\,x\rangle \right| =\max _{i,j\in e_{k}} \langle a_{i,j}\left( e_{k}\right) ,\,x \rangle , \end{aligned}$$

which implies that

$$\begin{aligned} \max _{i,j\in e_{k}}\left\{ \langle a_{i,j}\left( e_{k}\right) ,\,x \rangle \right\} =\max _{u_{k}}\left\{ \langle A\left( e_{k}\right) x,\,u_{k} \rangle {\text {:}}\,\sum \limits _{i=1}^{n_{k}(n_{k}-1)}\left| u^{(i)}_{k}\right| \leqslant 1\right\} . \end{aligned}$$

Hence, Lemma 3.1 holds.
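Lemma 3.1 can also be checked numerically: over the \(l_{1}\) ball, \(\langle A(e_{k})x,\,u_{k}\rangle \) is maximized at a signed unit coordinate vector, so by the sign-symmetry of the rows the maximum equals the largest entry of \(A(e_{k})x.\) A small sketch (placeholder coordinates, 0-based cell indices, helper as in the previous sketch):

```python
import numpy as np

def net_matrix(net, n):
    rows = []
    for i in net:
        for j in net:
            if i != j:
                a = np.zeros(n)
                a[i], a[j] = 1.0, -1.0
                rows.append(a)
    return np.vstack(rows)

x = np.array([0.7, 0.1, 0.4])
net = (0, 1, 2)                                  # the net e_2 (0-based indices)
A = net_matrix(net, n=3)
direct = max(x[i] - x[j] for i in net for j in net)
# Over the l1 ball, <A(e_k)x, u> is maximized at a signed unit coordinate
# vector, so the maximum equals the largest entry of A(e_k)x.
print(direct, (A @ x).max())                     # both equal 0.6
```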

Motivated by Lemma 3.1, in this paper we define the set

$$\begin{aligned} Q_{k}=\left\{ u_{k}{\text {:}}\,\sum \limits _{i=1}^{{n_{k}(n_{k}-1)}}\left| u^{(i)}_{k}\right| \leqslant 1\right\} , \end{aligned}$$

where \(n_{k}\) is the number of cells on the net \(e_{k}.\) Furthermore, by Lemma 3.1, for any net \(e_{k},\) we write the function \(\mathrm{HPWL}_{e_{k}}(x)\) of \(e_{k}\) as

$$\begin{aligned} f_{e_{k}}(x)=\max _{u_{k}\in Q_{k}}\langle A\left( e_{k}\right) x,\,u_{k} \rangle . \end{aligned}$$
(3.5)

Let

$$\begin{aligned} \widehat{f}(x)=\left\| x-x_{0}^{+}\right\| _{2}^{2}. \end{aligned}$$

By Lemma 3.1, the problem in (1.4) can be written as

$$\begin{aligned} \min _{x\in X}f(x)=\lambda \widehat{f}(x)+\sum \limits _{k=1}^{m} \max _{u_{k}\in Q_{k}}\langle A\left( e_{k}\right) x,\,u_{k} \rangle , \end{aligned}$$
(3.6)

where \(\lambda > 0\) and \(m=|E|\) is the number of nets. Problem (3.6) is obviously a direct generalization of problem (1.1).

For every net \(e_{k},\) the HPWL function (1.2) is convex but not Lipschitz continuously differentiable. Hence, the problem (3.6) is not easy to solve directly. We use the smoothing technique [16] to transform the problem into a smooth one.

Recall that \(d_{Q_{k}}(u_{k})\) is a strongly convex function on \(Q_{k}\) with strong convexity parameter \(\sigma _{Q_{k}}>0.\) Let \(u_{k}^{*}\) be the minimizer of \(d_{Q_{k}}(u_{k})\) over \(Q_{k},\) i.e.,

$$\begin{aligned} u_{k}^{*}= \arg \!\min _{u_{k}\in Q_{k}}d_{Q_{k}}\left( u_{k}\right) . \end{aligned}$$
(3.7)

Let \(\mu \) be a positive smoothness parameter. Consider the following function

$$\begin{aligned} f_{e_{k},\mu }(x)=\max _{u_{k}\in Q_{k}}\left\{ \langle A\left( e_{k}\right) x,\,u_{k}\rangle -\mu d_{Q_{k}}\left( u_{k}\right) \right\} . \end{aligned}$$
(3.8)

Since \(d_{Q_{k}}(u_{k})\) is strongly convex, the maximization problem above has a unique optimal solution, which we denote by \(u^{*}_{k}(x).\) Thus, the smooth approximation of the objective function in (3.6) takes the following form,

$$\begin{aligned} f_{\mu }(x) =\lambda \widehat{f}(x)+ \sum \limits _{k=1}^{m}\max _{u_{k}\in Q_{k}}\left\{ \langle A\left( e_{k}\right) x,\,u_{k} \rangle -\mu d_{Q_{k}}\left( u_{k}\right) \right\} . \end{aligned}$$
(3.9)

By Theorem 1 of [16], \(f_{\mu }(x)\) is a continuously differentiable convex function, and its gradient

$$\begin{aligned} \nabla f_{\mu }(x)=\lambda \nabla \widehat{f}(x)+\sum \limits _{k=1}^{m} A\left( e_{k}\right) ^{\rm T} u_{k}^{*}(x) , \end{aligned}$$

is Lipschitz-continuous on X.
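To make (3.8), (3.9), and the gradient above concrete, the following sketch assumes the entropy prox-function that is chosen in Sect. 4, under which the inner maximum in (3.8) has a softmax closed form and \(f_{e_{k},\mu }\) becomes a scaled log-sum-exp; the helper names are ours.

```python
import numpy as np

def smoothed_net(A_k, x, mu):
    """Smoothed per-net HPWL (3.8) under the entropy prox-function of Sect. 4:
    value = mu * logsumexp(A(e_k)x / mu) - mu * log(n_k(n_k-1)),
    maximizer u_k^*(x) = softmax(A(e_k)x / mu)."""
    z = (A_k @ x) / mu
    zmax = z.max()
    w = np.exp(z - zmax)                       # shifted for numerical stability
    u_star = w / w.sum()
    val = mu * (zmax + np.log(w.sum()) - np.log(A_k.shape[0]))
    return val, u_star

def grad_f_mu(A_list, x, x0, lam, mu):
    """Gradient of the smoothed objective (3.9):
    2*lam*(x - x0) + sum_k A(e_k)^T u_k^*(x)."""
    g = 2.0 * lam * (x - x0)
    for A_k in A_list:
        _, u_star = smoothed_net(A_k, x, mu)
        g = g + A_k.T @ u_star
    return g
```

Note that \(f_{\mu }(x)\leqslant f(x)\leqslant f_{\mu }(x)+\mu \sum \nolimits _{k=1}^{m}D_{k}\) with \(D_{k}=\max _{u_{k}\in Q_{k}}d_{Q_{k}}(u_{k}),\) so a smaller \(\mu \) gives a tighter but less smooth approximation.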

The dual problem of (3.6) can be formulated as

$$\begin{aligned} \max _{u_{1}\in Q_{1},\cdots ,u_{m}\in Q_{m}} \Phi \left( u_{1},\cdots ,u_{m}\right) = \min _{x\in X}\lambda \widehat{f}(x)+ \sum \limits _{k=1}^{m} \langle A\left( e_{k}\right) x,\,u_{k} \rangle , \end{aligned}$$
(3.10)

where \(\lambda >0.\) Denote by \(x^{*}(u)\) the optimal solution of the minimization problem in the dual problem (3.10), where \(u=(u_{1},\,u_{2}, \cdots ,u_{m})^{\rm T}.\) Since \(\widehat{f}(x)\) is strongly convex with strong convexity parameter 2, \(x^{*}(u)\) is unique. Moreover, by Theorem 1 of [16], \(\Phi (u_{1},\cdots ,u_{m})\) is a continuously differentiable concave function.

Therefore, for any \(\widehat{x}\in X\) and \(\overline{u}_{k} \in Q_{k},\) where \(k=1,\cdots ,m\), we have

$$\begin{aligned} \Phi \left( \overline{u}_{1},\cdots ,\overline{u}_{m}\right) \leqslant \lambda \widehat{f}(\widehat{x})+ \sum \limits _{k=1}^{m} \langle A\left( e_{k}\right) \widehat{x},\,\overline{u}_{k} \rangle . \end{aligned}$$
(3.11)

Moreover, if \(\widehat{x}=x^{*}(\overline{u}),\) then the minimum in (3.10) is attained at \(\widehat{x},\) and (3.11) holds with equality, i.e.,

$$\begin{aligned} \Phi \left( \overline{u}_{1},\cdots ,\overline{u}_{m}\right) = \lambda \widehat{f}(\widehat{x})+ \sum \limits _{k=1}^{m} \langle A\left( e_{k}\right) \widehat{x},\,\overline{u}_{k} \rangle . \end{aligned}$$
(3.12)

Since \(u_{1},\cdots ,u_{m}\) are independent and they have different dimensions, the partial gradient of the function \(\Phi (u_{1},\cdots ,u_{m})\) with respect to \(u_{k}\) is denoted by

$$\begin{aligned} \nabla _{u_{k}}\Phi \left( u_{1},\cdots ,u_{m}\right) =\frac{\partial \Phi (u_{1},\cdots ,u_{m})}{\partial u_{k}} = \frac{\partial \Phi }{\partial x^{*}(u)}\frac{\partial x^{*}(u)}{\partial u_{k}}+A\left( e_{k}\right) x^{*}(u). \end{aligned}$$

Since \(x^{*}(u)\) is the unique optimal solution of (3.10), the gradient of the function \(\Phi (u_{1},\cdots ,u_{m})\) with respect to \(x^{*}(u)\) satisfies the following equality:

$$\begin{aligned} \frac{\partial \Phi (u_{1},\cdots ,u_{m})}{\partial x^{*}(u)}=0. \end{aligned}$$

Hence,

$$\begin{aligned} \nabla _{u_{k}}\Phi \left( u_{1},\cdots ,u_{m}\right) =\frac{\partial \Phi (u_{1},\cdots ,u_{m})}{\partial u_{k}}=A\left( e_{k}\right) x^{*}(u). \end{aligned}$$
(3.13)

Accordingly, the gradient of \(\Phi (u_{1},\cdots ,u_{m})\) is given by

$$\begin{aligned} \nabla \Phi \left( u_{1},\cdots ,u_{m}\right) =\left( \nabla _{u_1}\Phi \left( u_{1},\cdots ,u_{m}\right) ,\cdots , \nabla _{u_{m}}\Phi \left( u_{1},\cdots ,u_{m}\right) \right)^{\rm T}. \end{aligned}$$

By the properties of the norms, we can get the following inequality:

$$\begin{aligned} \left\| \nabla \Phi \left( u_{1},\cdots ,u_{m}\right) \right\| \leqslant \sum \limits _{k=1}^{m}\left\| \nabla _{u_{k}}\Phi \left( u_{1},\cdots ,u_{m}\right) \right\| . \end{aligned}$$
(3.14)

Lemma 3.2

The function \(\Phi (u_{1},\cdots ,u_{m})\) is Lipschitz continuously differentiable with Lipschitz constant

$$\begin{aligned} L(\Phi )=\frac{m^{2}}{\lambda }. \end{aligned}$$

Proof

Since \(\widehat{f}(x)\) is strongly convex, the minimization problem defining \(\Phi (u_{1},\cdots ,u_{m})\) has the unique solution \(x^{*}(u).\) Consider \(u_{k},\,{v}_{k}\in Q_{k},\,k=1,\cdots ,m.\) By the first-order optimality conditions, we have

$$\begin{aligned} \langle \lambda \nabla \widehat{f}(x^{*}(u))+\sum \limits _{k=1}^{m}A\left(e_{k}\right)^{\rm T}u_{k},\,x^{*}(v)-x^{*}(u) \rangle \geqslant 0,\\ \langle \lambda \nabla \widehat{f}(x^{*}(v))+\sum \limits _{k=1}^{m}A\left( e_{k}\right)^{\rm T}v_{k},\,x^{*}(u)-x^{*}(v) \rangle \geqslant 0. \end{aligned}$$

Adding the above inequalities and using the strong convexity of \(\widehat{f}(x),\) we have

$$\begin{aligned} \begin{aligned} \sum \limits _{k=1}^{m}\langle A\left( e_{k}\right) (x^{*}(v)-x^{*}(u)),\,u_{k}-v_{k} \rangle&\geqslant \langle \lambda \nabla \widehat{f}(x^{*}(u))- \lambda \nabla \widehat{f}(x^{*}(v)),\,x^{*}(u)-x^{*}(v) \rangle \\&\geqslant \lambda \widehat{\sigma }||x^{*}(u)-x^{*}(v)||_{2}^{2}. \end{aligned} \end{aligned}$$
(3.15)

By (2.1) and (3.15), we can get the following inequalities:

$$\begin{aligned} \begin{aligned}&\;\;\;\;\;\left( \sum \limits _{k=1}^{m}\left\| A\left( e_{k}\right) (x^{*}(u)-x^{*}(v))\right\| ^{*}_{1}\right) ^{2}\\(\mathrm{by}\;(2.1))&\leqslant \left( \sum \limits _{k=1}^{m}\left\| A\left( e_{k}\right) \right\| _{2,1}||(x^{*}(u)-x^{*}(v))||_{2}\right) ^{2}\\ {}&= \left( \sum \limits _{k=1}^{m}\left\| A\left( e_{k}\right) \right\| _{2,1}\right) ^{2}||x^{*}(v)-x^{*}(u)||_{2}^{2}\\(\mathrm{by}\;(3.15))&\leqslant \frac{1}{\lambda \widehat{\sigma }}\left( \sum \limits _{k=1}^{m}\left\| A\left( e_{k}\right) \right\| _{2,1}\right) ^{2}\sum \limits _{k=1}^{m} \langle A\left( e_{k}\right) (x^{*}(u)-x^{*}(v)),\,u_{k}-v_{k} \rangle \\ {}&\leqslant \frac{1}{\lambda \widehat{\sigma }}\left( \sum \limits _{k=1}^{m}\left\| A\left( e_{k}\right) \right\| _{2,1}\right) ^{2}\sum \limits _{k=1}^{m}\left\| A\left( e_{k}\right) (x^{*}(u)-x^{*}(v))\right\| ^{*}_{1}\left\| u_{k}-v_{k}\right\| _{1}\\ {}&\leqslant \frac{1}{\lambda \widehat{\sigma }}\left( \sum \limits _{k=1}^{m}\left\| A\left( e_{k}\right) \right\| _{2,1}\right) ^{2}\sum \limits _{k=1}^{m}\left\| A\left( e_{k}\right) (x^{*}(u)-x^{*}(v))\right\| ^{*}_{1}\sum \limits _{k=1}^{m}\left\| u_{k}-v_{k}\right\| _{1}. \end{aligned} \end{aligned}$$

Thus, by (3.1) and \(\widehat{\sigma }=2,\) we can get the following inequality:

$$\begin{aligned} \begin{aligned} \sum \limits _{k=1}^{m}\left\| A\left( e_{k}\right) (x^{*}(u)-x^{*}(v)) \right\| ^{*}_{1}&\leqslant \frac{1}{\lambda \widehat{\sigma }}\left( \sum \limits _{k=1}^{m}\left\| A\left( e_{k}\right) \right\| _{2,1}\right) ^{2}\sum \limits _{k=1}^{m}\left\| u_{k}-v_{k}\right\| _{1}\\ {}&=\frac{m^{2}}{\lambda }\sum \limits _{k=1}^{m}\left\| u_{k}-v_{k}\right\| _{1}. \end{aligned} \end{aligned}$$

Moreover, by (3.13), (3.14), and the above inequality, we have

$$\begin{aligned} \begin{aligned} \left\| \nabla \Phi \left( u_{1},\cdots ,u_{m}\right) -\nabla \Phi \left( v_{1},\cdots ,v_{m}\right) \right\| ^{*}_{1}&\leqslant \sum \limits _{k=1}^{m}\left\| \nabla _{u_{k}} \Phi \left( u_{1},\cdots ,u_{m}\right) -\nabla _{u_{k}} \Phi \left( v_{1},\cdots ,v_{m}\right) \right\| _{1}^{*}\\ {}&=\sum \limits _{k=1}^{m}\left\| A\left( e_{k}\right) (x^{*}(u)-x^{*}(v))\right\| ^{*}_{1}\\ {}&\leqslant \frac{m^{2}}{\lambda } \sum \limits _{k=1}^{m}\left\| u_{k}- v_{k}\right\| _{1}. \end{aligned} \end{aligned}$$

Hence, the lemma holds.

3.2 Excessive Gap Condition

Similar to [17], for some \(\overline{x}\in X\) and \(\overline{u}_{k}\in Q_{k},\) where \(k=1,\cdots ,m,\) the EGC is given as follows:

$$\begin{aligned} f_{\mu }(\overline{x})\leqslant \Phi \left( \overline{u}_{1},\cdots ,\overline{u}_{m}\right) . \end{aligned}$$
(3.16)

Lemma 3.3

Let vectors \(\overline{x}\in X\) and \(\overline{u}_{k}\in Q_{k}\) satisfy (3.16), where \(k=1,\cdots ,m.\) Then

$$\begin{aligned} 0\leqslant f(\overline{x})-\Phi \left( \overline{u}_{1},\cdots ,\overline{u}_{m}\right) \leqslant \mu \sum \limits _{k=1}^{m} D_{k}, \end{aligned}$$
(3.17)

where \(D_{k}=\max _{u_{k}\in Q_{k}}d_{Q_{k}}(u_{k}).\)

Proof

Clearly,

$$\begin{aligned} \begin{aligned}&\lambda \widehat{f}(\overline{x})+ \sum \limits _{k=1}^{m}\left\{ \langle A\left( e_{k}\right) \overline{x},\,u_{k} \rangle -\mu D_{k}\right\} \\&\leqslant \lambda \widehat{f}(\overline{x})+ \sum \limits _{k=1}^{m}\left\{ \langle A\left( e_{k}\right) \overline{x},\,u_{k} \rangle -\mu d_{Q_{k}}\left( u_{k}\right) \right\} . \end{aligned} \end{aligned}$$

So we have

$$\begin{aligned} \begin{aligned}f(\overline{x})-\mu \sum \limits _{k=1}^{m}D_{k}&=\lambda \widehat{f}(\overline{x})+\sum \limits _{k=1}^{m}\max _{u_{k}\in Q_{k}}\left\{ \langle A\left( e_{k}\right) \overline{x},\,u_{k} \rangle -\mu D_{k}\right\} \\&\leqslant \lambda \widehat{f}(\overline{x})+\sum \limits _{k=1}^{m}\max _{u_{k}\in Q_{k}}\left\{ \langle A\left( e_{k}\right) \overline{x},\,u_{k} \rangle -\mu d_{Q_{k}}\left( u_{k}\right) \right\} \\&= f_{\mu }(\overline{x}). \end{aligned} \end{aligned}$$

Hence, we can easily get

$$\begin{aligned} \begin{aligned}f(\overline{x})-\mu \sum \limits _{k=1}^{m} D_{k}\leqslant f_{\mu }(\overline{x})\leqslant \Phi \left( \overline{u}_{1},\cdots ,\overline{u}_{m}\right) , \end{aligned} \end{aligned}$$

Together with the weak duality relation \(\Phi \left( \overline{u}_{1},\cdots ,\overline{u}_{m}\right) \leqslant \min \nolimits _{x\in X}f(x)\leqslant f(\overline{x}),\) this shows that Lemma 3.3 holds.

For \(u_{k},\,v_{k}\in Q_{k},\) denote the Bregman distance between \(u_{k}\) and \(v_{k}\) by

$$\begin{aligned} \xi \left( v_{k},\,u_{k}\right) =d_{Q_{k}}\left( v_{k}\right) -d_{Q_{k}}\left( u_{k}\right) -\langle \nabla d_{Q_{k}}\left( u_{k}\right) ,\,v_{k}-u_{k} \rangle . \end{aligned}$$
(3.18)

By (2.5), we have

$$\begin{aligned} \xi \left( v_{k},\,u_{k}\right) \geqslant \frac{1}{2}\sigma _{Q_{k}}\left\| v_{k}-u_{k}\right\| _{1}^{2}. \end{aligned}$$
(3.19)

Define the Bregman projection of g on the set \(Q_{k}\) as follows:

$$\begin{aligned} V_{k}\left( u_{k},\,g\right) =\arg \!\max _{v_{k} \in Q_{k}}\left\{ \langle g,\,v_{k}-u_{k}\rangle -\mu \xi \left( v_{k},\,u_{k}\right) \right\} . \end{aligned}$$
(3.20)

Lemma 3.4

The EGC holds for

$$\begin{aligned} \begin{aligned} \mu&=\max _{1\leqslant k\leqslant m}\left\{ \frac{1}{\sigma _{Q_{k}}}L(\Phi )\right\} ,\\ \overline{x}&=x^{*}(u^{*}), \\ \overline{u}_{k}&=V_{k}\left( u^{*}_{k},\,\nabla _{u_{k}}\Phi \left( u^{*}_{1},\cdots ,u^{*}_{m}\right) \right) , \end{aligned} \end{aligned}$$

where \(u^{*}_{k}\) is the minimal solution of \(d_{Q_{k}}(u_{k})\) and \(k=1,\cdots ,m.\)

Proof

Indeed, for any \(x \in X\) and \(u_{k}\in Q_{k},\,k=1,\cdots ,m,\) by setting \(g_{k}=\nabla _{u_{k}}\Phi (u^{*}_{1},\cdots ,u^{*}_{m}),\) we can get the equation

$$\begin{aligned} \Phi \left( V_{1}\left( u^{*}_{1},\,g_{1}\right) ,\cdots ,V_{m}\left( u^{*}_{m},\,g_{m}\right) \right) =\Phi \left( \overline{u}_{1},\cdots ,\overline{u}_{m}\right) . \end{aligned}$$

By Lemma 3.2, the function \(\Phi (u_{1},\cdots ,u_{m})\) is Lipschitz continuously differentiable, so we have

$$\begin{aligned} \left\| \nabla \Phi \left( u_{1},\cdots ,u_{m}\right) -\nabla \Phi \left( v_{1},\cdots ,v_{m}\right) \right\| ^{*}_{1}\leqslant L(\Phi )\sum \limits _{k=1}^{m}\left\| u_{k}- v_{k}\right\| _{1}. \end{aligned}$$

In view of [12] (Sect. 2.1), from the fact that the function \(\Phi (u_{1},\cdots ,u_{m})\) is concave and its gradient is Lipschitz continuous, we have

$$ -\Phi \left( \overline{u}_{1},\cdots ,\overline{u}_{m}\right) \, \leqslant -\Phi \left( u^{*}_{1},\cdots ,u^{*}_{m}\right) -\sum \limits _{k=1}^{m}\langle \nabla _{u_{k}}\Phi ,\,\overline{u}_{k}-u^{*}_{k}\rangle +\frac{1}{2}L(\Phi )\sum \limits _{k=1}^{m}\left\| \overline{u}_{k}-u^{*}_{k}\right\| _{1}^{2}.$$

Thus,

$$\begin{aligned} \Phi \left( \overline{u}_{1},\cdots ,\overline{u}_{m}\right)&\geqslant \Phi \left( u_{1}^{*},\cdots ,u_{m}^{*}\right)\\ &\quad +\sum \limits _{k=1}^{m}\left\{ \langle \nabla _{u_k}\Phi ,\,\overline{u}_{k}-u^{*}_{k}\rangle -\frac{1}{2}L(\Phi )\left\| \overline{u}_{k}-u^{*}_{k}\right\| _{1}^{2}\right\} \\&= \lambda \widehat{f}(x^{*}(u^{*}))+\sum \limits _{k=1}^{m}\left\{ \langle A\left( e_{k}\right) x^{*}(u^{*}),\,u^{*}_{k} \rangle \right. \\ {}&\left. \quad +\langle A\left( e_{k}\right) x^{*}(u^{*}),\,\overline{u}_{k}-u^{*}_{k}\rangle -\frac{1}{2}L(\Phi )\left\| \overline{u}_{k}-u^{*}_{k}\right\| _{1}^{2}\right\} \\(\mathrm{by}\; (3.19))&\geqslant \lambda \widehat{f}(x^{*}(u^{*}))+\sum \limits _{k=1}^{m}\left\{ \langle A\left( e_{k}\right) x^{*}(u^{*}),\,u^{*}_{k} \rangle \right. \\ {}&\left. \quad +\langle A\left( e_{k}\right) x^{*}(u^{*}),\,\overline{u}_{k}-u^{*}_{k}\rangle -\frac{L(\Phi )}{\sigma _{Q_{k}}}\xi \left( \overline{u}_{k},\,u^{*}_{k}\right) \right\} \\ \left( \mathrm{by}\; \mu =\max _{1\leqslant k\leqslant m}\left\{ \frac{1}{\sigma _{Q_{k}}}L(\Phi )\right\} \right)&\geqslant \lambda \widehat{f}(x^{*}(u^{*}))+\sum \limits _{k=1}^{m}\left\{ \langle A\left( e_{k}\right) x^{*}(u^{*}),\,u^{*}_{k} \rangle \right. \\ {}&\left. \quad +\langle A\left( e_{k}\right) x^{*}(u^{*}),\,\overline{u}_{k}-u^{*}_{k}\rangle -\mu \xi \left( \overline{u}_{k},\,u^{*}_{k}\right) \right\} \\&=\lambda \widehat{f}(x^{*}(u^{*}))+\sum \limits _{k=1}^{m}\max _{v_{k}\in Q_{k}}\left\{ \langle A\left( e_{k}\right) x^{*}(u^{*}),\,v_{k} \rangle -\mu \xi \left( v_{k},\,u^{*}_{k}\right) \right\} \\ (\mathrm{by}\; (2.5))&\geqslant \lambda \widehat{f}(x^{*}(u^{*}))+\sum \limits _{k=1}^{m}\left\{ \max _{v_{k}\in Q_{k}}\left\{ \langle A\left( e_{k}\right) x^{*}(u^{*}),\,v_{k} \rangle -\mu d_{Q_{k}}\left( v_{k}\right) \right\} \right\} \\ {}&=f_{\mu }(x^{*}(u^{*})). \end{aligned}$$

Hence, the EGC (3.16) is satisfied.

4 Algorithm

From Lemma 3.4, we know that the EGC holds for \(\mu =\max _{1\leqslant k \leqslant m}\frac{1}{\sigma _{Q_{k}}}L(\Phi ),\,\overline{x}=x^{*}(u^{*}),\) and \(\overline{u}_{k}=V_{k}(u^{*}_{k},\,\nabla _{u_{k}}\Phi (u^{*}_{1},\cdots ,u^{*}_{m})),\,k=1,\cdots ,m.\) The following theorem develops a scheme for updating \(\mu ,\,\overline{x},\) and \(\overline{u}=(\overline{u}_{1},\cdots ,\overline{u}_{m})\) that keeps the EGC (3.16) satisfied at each iteration.

Theorem 4.1

Let vectors \(\overline{x}\in X\) and \(\overline{u}_{k}\in Q_{k},\,k=1,\cdots ,m,\) satisfy the EGC for some positive parameter \(\mu .\) Fix a parameter \(\tau \in (0,\,1)\) and choose \(\mu _{+}=(1-\tau )\mu .\) Let

$$\begin{aligned} \begin{aligned}&\widehat{u}_{k}=(1-\tau )\overline{u}_{k}+\tau u_{k}^{*}(\overline{x}),\quad k=1,\cdots ,m,\\ {}&\overline{x}_{+}=(1-\tau ) \overline{x} +\tau x^{*}(\widehat{u}),\\ {}&\widetilde{u}_{k}=V_{k}\left( u_{k}^{*}(\overline{x}),\,\frac{\tau }{1-\tau }A\left( e_{k}\right) x^{*}(\widehat{u})\right) ,\quad k=1,\cdots ,m,\\ {}&\overline{u}_{k+}=(1-\tau )\overline{u}_{k}+\tau \widetilde{u}_{k},\quad k=1,\cdots ,m. \end{aligned} \end{aligned}$$
(4.1)

Then \(\overline{x}_{+}\) and \(\overline{u}_{+}=(\overline{u}_{1+},\cdots ,\overline{u}_{m+})\) satisfy the EGC with smoothness parameter \(\mu _{+},\) provided that \(\tau \) is chosen in accordance with the following relation:

$$\begin{aligned} \frac{\tau ^{2}}{1-\tau }\leqslant \min _{1 \leqslant k\leqslant m}\frac{\mu \sigma _{Q_{k}}}{L(\Phi )}. \end{aligned}$$

Proof

Denote \(\widehat{x}=x^{*}(\widehat{u})\) and \(v_{k}=u_{k}^{*}(\overline{x}).\) By line 2 of (4.1) and the convexity of \(\widehat{f}(x),\) we have

$$\begin{aligned} \begin{aligned} f_{\mu _{+}}\left( \overline{x}_{+}\right)&=\lambda \widehat{f}\left( \overline{x}_{+}\right) + \sum \limits _{k=1}^{m}\max _{u_{k}\in Q_{k}}\left\{ \langle A\left( e_{k}\right) \overline{x}_{+},\,u_{k} \rangle -\mu _{+}d_{Q_{k}}\left( u_{k}\right) \right\} \\ {}&=\lambda \widehat{f}((1-\tau ) \overline{x} +\tau \widehat{x})\\ {}&\quad + \sum \limits _{k=1}^{m}\max _{u_{k}\in Q_{k}}\left\{ \langle A\left( e_{k}\right) [(1-\tau ) \overline{x} +\tau \widehat{x}],\,u_{k} \rangle -(1-\tau )\mu d_{Q_{k}}\left( u_{k}\right) \right\} \\&\leqslant \lambda [(1-\tau )\widehat{f}(\overline{x}) +\tau \widehat{f}(\widehat{x})] \\&\quad +\sum \limits _{k=1}^{m}\max _{u_{k}\in Q_{k}}\left\{ (1-\tau )\left[ {\langle A\left( e_{k}\right) \overline{x},\,u_{k} \rangle }-\mu d_{Q_{k}}\left( u_{k}\right) \right] +\tau \langle A\left( e_{k}\right) \widehat{x},\,u_{k} \rangle \right\} \\&= (1-\tau )(\lambda \widehat{f}(\overline{x}))+\tau \Phi \left( \widehat{u}_{1},\cdots ,\widehat{u}_{m}\right) \\&\quad + \sum \limits _{k=1}^{m}\max _{u_{k}\in Q_{k}}\left\{ (1-\tau )\left[ \langle A\left( e_{k}\right) \overline{x},\,u_{k} \rangle -\mu d_{Q_{k}}\left( u_{k}\right) \right] +\tau \langle A\left( e_{k}\right) \widehat{x},\,u_{k}-\widehat{u}_{k} \rangle \right\} . \end{aligned} \end{aligned}$$

Since \(v_{k}=u_{k}^{*}(\overline{x})\) is an optimal solution of (3.8), by the first-order condition, we have

$$\begin{aligned} \sum \limits _{k=1}^{m}\langle A\left( e_{k}\right) \overline{x}-\mu \nabla d_{Q_{k}}\left( v_{k}\right) ,\,u_{k}-v_{k} \rangle \leqslant 0. \end{aligned}$$
(4.2)

Hence,

$$\begin{aligned}&\lambda \widehat{f}(\overline{x})+\sum \limits _{k=1}^{m}\left[ {\langle A\left( e_{k}\right) \overline{x},\,u_{k} \rangle }-\mu d_{Q_{k}}\left( u_{k}\right) \right] \\(\mathrm{by}\; (3.18))=&\lambda \widehat{f}(\overline{x})+\sum \limits _{k=1}^{m}\left[ {\langle A\left( e_{k}\right) \overline{x},\,u_{k}\rangle }-\mu \left( \xi \left( u_{k},\,v_{k}\right) +d_{Q_{k}}\left( v_{k}\right) +\langle \nabla d_{Q_{k}}\left( v_{k}\right) ,\,u_{k}-v_{k} \rangle \right) \right] \\(\mathrm{by}\; (4.2))\leqslant&\lambda \widehat{f}(\overline{x})+\sum \limits _{k=1}^{m}\left[ {\langle A\left( e_{k}\right) \overline{x},\,v_{k}\rangle }-\mu \left( \xi \left( u_{k},\,v_{k}\right) +d_{Q_{k}}\left( v_{k}\right) \right) \right] \\ =&f_{\mu }(\overline{x})-\mu \sum \limits _{k=1}^{m}\xi \left( u_{k},\,v_{k}\right) \\(\mathrm{by}\;(3.16))\leqslant&\Phi \left( \overline{u}_{1},\cdots ,\overline{u}_{m}\right) -\mu \sum \limits _{k=1}^{m}\xi \left( u_{k},\,v_{k}\right) \\ (\mathrm{by}\;(3.11))\leqslant&\lambda \widehat{f}(\widehat{x})+\sum \limits _{k=1}^{m}\langle A\left( e_{k}\right) \widehat{x},\,\overline{u}_{k} \rangle -\mu \sum \limits _{k=1}^{m}\xi \left( u_{k},\,v_{k}\right) \\ =&\lambda \widehat{f}(\widehat{x})+\sum \limits _{k=1}^{m}\left\{ \langle A\left( e_{k}\right) \widehat{x},\,\overline{u}_{k}-\widehat{u}_{k}+\widehat{u}_{k}\rangle -\mu \xi \left( u_{k},\,v_{k}\right) \right\} \\ (\mathrm{by}\;(3.12))=&\Phi \left( \widehat{u}_{1},\cdots ,\widehat{u}_{m}\right) + \sum \limits _{k=1}^{m}\left\{ \langle A\left( e_{k}\right) \widehat{x},\,\overline{u}_{k}-\widehat{u}_{k}\rangle \right\} -\mu \sum \limits _{k=1}^{m}\xi \left( u_{k},\,v_{k}\right) . \end{aligned}$$

By the above inequalities, (3.19) and (4.1), we can get the following relations:

$$\begin{aligned} f_{\mu _{+}}\left( \overline{x}_{+}\right)&\leqslant (1-\tau )(\lambda \widehat{f}(\overline{x}))+\tau \Phi \left( \widehat{u}_{1},\cdots ,\widehat{u}_{m}\right) \, + \sum \limits _{k=1}^{m}\max _{u_{k}\in Q_{k}}\left\{ (1-\tau )\left[ \langle A\left( e_{k}\right) \overline{x},\,u_{k} \rangle -\mu d_{Q_{k}}\left( u_{k}\right) \right] +\tau \langle A\left( e_{k}\right) \widehat{x},\,u_{k}-\widehat{u}_{k} \rangle \right\} \\ {}&\leqslant \Phi \left( \widehat{u}_{1},\cdots ,\widehat{u}_{m}\right) \, +\sum \limits _{k=1}^{m}\max _{u_{k}\in Q_{k}}\left\{ \langle A\left( e_{k}\right) \widehat{x},\,(1-\tau )\overline{u}_{k}+\tau u_{k}-\widehat{u}_{k} \rangle -(1-\tau )\mu \xi \left( u_{k},\,v_{k}\right) \right\} \\ ({\rm by\,line\,1\,of}\;(4.1))&=\Phi \left( \widehat{u}_{1},\cdots , \widehat{u}_{m}\right) +(1-\tau )\sum \limits _{k=1}^{m}\max _{u_{k}\in Q_{k}}\left\{ \frac{\tau }{1-\tau }\langle A\left( e_{k}\right) \widehat{x},\,u_{k}-v_{k}\rangle -\mu \xi \left( u_{k},\,v_{k}\right) \right\} \\({\rm by\,line\,3\,of}\;(4.1))&=\Phi \left( \widehat{u}_{1},\cdots ,\widehat{u}_{m}\right) +(1-\tau )\sum \limits _{k=1}^{m} \left\{ \frac{\tau }{1-\tau }\langle A\left( e_{k}\right) \widehat{x},\,\widetilde{u}_{k}-v_{k}\rangle -\mu \xi \left( \widetilde{u}_{k},\,v_{k}\right) \right\} \\ ({\rm by}\;(3.19))&\leqslant \Phi \left( \widehat{u}_{1},\cdots ,\widehat{u}_{m}\right) +(1-\tau )\sum \limits _{k=1}^{m}\left\{ \frac{\tau }{1-\tau }\langle A\left( e_{k}\right) \widehat{x},\,\widetilde{u}_{k}-v_{k} \rangle -\frac{\mu \sigma _{Q_{k}}}{2}\left\| \widetilde{u}_{k}-v_{k}\right\| _{1}^{2}\right\} \\ ({\rm by\,the\,relation\,on}\;\tau )&\leqslant \Phi \left( \widehat{u}_{1},\cdots ,\widehat{u}_{m}\right) +\sum \limits _{k=1}^{m}\left\{ \tau \langle A\left( e_{k}\right) \widehat{x},\,\widetilde{u}_{k}-v_{k} \rangle -\frac{\tau ^{2}}{2}L(\Phi )\left\| \widetilde{u}_{k}-v_{k}\right\| _{1}^{2}\right\} . \end{aligned}$$

Moreover, according to lines 1 and 4 of (4.1), we have

$$\begin{aligned} \begin{aligned} f_{\mu _{+}}\left( \overline{x}_{+}\right)&\leqslant \Phi \left( \widehat{u}_{1},\cdots , \widehat{u}_{m}\right) +\sum \limits _{k=1}^{m}\left\{ \tau \langle A\left( e_{k}\right) \widehat{x},\,\widetilde{u}_{k}-v_{k} \rangle -\frac{\tau ^{2}}{2}L(\Phi )\left\| \widetilde{u}_{k}-v_{k}\right\| _{1}^{2}\right\} \\ {}&=\Phi \left( \widehat{u}_{1},\cdots , \widehat{u}_{m}\right) +\sum \limits _{k=1}^{m}\left\{ \langle A\left( e_{k}\right) \widehat{x},\,\overline{u}_{k+}-\widehat{u}_{k} \rangle -\frac{1}{2}L(\Phi )\left\| \overline{u}_{k+}-\widehat{u}_{k}\right\| _{1}^{2}\right\} \\&\leqslant \Phi \left( \overline{u}_{1+},\cdots ,\overline{u}_{m+}\right) . \end{aligned} \end{aligned}$$

The proof is completed.

Based on the above theorem and lemmas, we state the following algorithm. Denote by \(u^{i}\) the ith iterate of \(u,\) by \(x_{i}\) the ith iterate of x, and by \(u_{k}\) the kth component of u.

[The algorithm is presented as a figure in the original.]
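As a rough, self-contained illustration only, the following Python sketch assembles one possible realization of the scheme from the pieces derived above: the initialization of Lemma 3.4, the update (4.1) of Theorem 4.1, and the closed-form computations listed after Theorem 4.2 (entropy prox-function, so \(\sigma _{Q_{k}}=1\)). The netlist, \(\lambda ,\) the number of iterations, and the choice of \(\tau _{i}\) are our own placeholders and need not coincide with the authors' Matlab implementation.

```python
import numpy as np

def net_matrix(net, n):
    """A(e_k): rows a_{i,j}(e_k) for all ordered pairs i != j of the net."""
    rows = []
    for i in net:
        for j in net:
            if i != j:
                a = np.zeros(n)
                a[i], a[j] = 1.0, -1.0
                rows.append(a)
    return np.vstack(rows)

def u_star_of_x(A_k, x, mu):
    """u_k^*(x): softmax of A(e_k)x / mu (computation (1) below)."""
    z = (A_k @ x) / mu
    w = np.exp(z - z.max())
    return w / w.sum()

def x_star_of_u(A_list, u_list, x0, lam):
    """x^*(u) = x_0^+ - (1/(2*lam)) * sum_k A(e_k)^T u_k (computation (2) below;
    the box constraint x in X is not enforced by this closed form)."""
    s = sum(A_k.T @ u_k for A_k, u_k in zip(A_list, u_list))
    return x0 - s / (2.0 * lam)

def bregman_step(u_k, g, mu):
    """V_k(u_k, g): multiplicative (entropy/KL) update (computation (3) below)."""
    z = g / mu
    w = u_k * np.exp(z - z.max())
    return w / w.sum()

def egt_iteration(A_list, x_bar, u_bar, mu, tau, x0, lam):
    """One update of (4.1) in Theorem 4.1."""
    v = [u_star_of_x(A_k, x_bar, mu) for A_k in A_list]
    u_hat = [(1 - tau) * ub + tau * vk for ub, vk in zip(u_bar, v)]
    x_hat = x_star_of_u(A_list, u_hat, x0, lam)
    x_plus = (1 - tau) * x_bar + tau * x_hat
    u_tilde = [bregman_step(vk, (tau / (1 - tau)) * (A_k @ x_hat), mu)
               for A_k, vk in zip(A_list, v)]
    u_plus = [(1 - tau) * ub + tau * ut for ub, ut in zip(u_bar, u_tilde)]
    return x_plus, u_plus, (1 - tau) * mu

# ---- tiny placeholder instance --------------------------------------------
nets, n, lam = [(0, 1), (0, 2, 3)], 4, 0.5
A_list = [net_matrix(e, n) for e in nets]
x0 = np.array([1.0, 4.0, 2.5, 0.5])              # hypothetical x_0^+
L_phi = len(nets) ** 2 / lam                     # Lemma 3.2 with sigma_{Q_k} = 1
mu = L_phi                                       # initialization of Lemma 3.4
u0 = [np.full(A_k.shape[0], 1.0 / A_k.shape[0]) for A_k in A_list]  # u_k^*
x_bar = x_star_of_u(A_list, u0, x0, lam)
u_bar = [bregman_step(uk, A_k @ x_bar, mu) for uk, A_k in zip(u0, A_list)]

for i in range(100):
    c = mu / L_phi
    tau = 0.5 * (np.sqrt(c * c + 4.0 * c) - c)   # largest tau with tau^2/(1-tau) <= c
    x_bar, u_bar, mu = egt_iteration(A_list, x_bar, u_bar, mu, tau, x0, lam)

xs = x_star_of_u(A_list, u_bar, x0, lam)         # x^*(u) for the dual value
phi_val = lam * np.sum((xs - x0) ** 2) + sum((A_k @ xs) @ u_k
                                             for A_k, u_k in zip(A_list, u_bar))
f_val = sum((A_k @ x_bar).max() for A_k in A_list) + lam * np.sum((x_bar - x0) ** 2)
print(f_val - phi_val)                           # duality gap, cf. Theorem 4.2
```

In the sketch, \(\tau _{i}\) is taken as the largest value satisfying the relation of Theorem 4.1 with \(\mu =\mu _{i};\) any admissible choice preserves the EGC.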

Theorem 4.2

Let the pairs \(\overline{x}_{i}\) and \(\overline{u}^{i}=\{\overline{u}_{1}^{i},\cdots ,\overline{u}_{m}^{i}\}\) be generated by the above algorithm, and let \(\sigma =\min _{1\leqslant k\leqslant m}\sigma _{Q_{k}}.\) Then the following inequality holds:

$$\begin{aligned} f\left( \overline{x}_{i}\right) -\Phi \left( \overline{u}_{1}^{i},\cdots ,\overline{u}_{m}^{i}\right) \leqslant \sum \limits _{k=1}^{m}\left\{ \mu _{i} D_{k}\right\} =\sum \limits _{k=1}^{m}\left\{ \frac{2L(\Phi )D_{k}}{(i+1)(i+2)\sigma }\right\} . \end{aligned}$$

Proof

According to Theorem 4.1 and Lemma 3.4, the iterates \(\overline{x}_{i},\,\overline{u}^{i}\) generated by the algorithm, together with the sequences \(\{\mu _{i}\}_{i=0}^{\infty }\) and \(\{\tau _{i}\}_{i=0}^{\infty },\) satisfy the EGC (3.16) at every iteration, and

$$\begin{aligned} \mu _{i}=\frac{1}{\sigma }L(\Phi )\times \frac{1}{3}\times \frac{2}{4}\times \frac{3}{5}\times \cdots \times \frac{i}{i+2}=\frac{2L(\Phi )}{(i+1)(i+2)\sigma }. \end{aligned}$$

From Lemma 3.3, we have

$$\begin{aligned} 0\leqslant f\left( \overline{x}_{i}\right) -\Phi \left( \overline{u}_{1}^{i},\cdots , \overline{u}_{m}^{i}\right) \leqslant \sum \limits _{k=1}^{m}\left\{ \mu _{i} D_{k}\right\} =\sum \limits _{k=1}^{m}\left\{ \frac{2L(\Phi )D_{k}}{(i+1)(i+2)\sigma }\right\} . \end{aligned}$$

This completes the proof.

By Theorem 4.2, we can see that the convergence rate of our algorithm is \(O(1/i^{2}),\) where i is the iteration counter.

Next we give some implementation details of the algorithm. Choose the prox-function \(d_{Q_{k}}(u_{k})\) in the following entropy form:

$$\begin{aligned} d_{Q_{k}}\left( u_{k}\right) =\ln {n_{k}\left( n_{k}-1\right) }+\sum \limits _{i=1}^{n_{k}^{2}-n_{k}}u^{(i)}_{k}\ln {u^{(i)}_{k}}. \end{aligned}$$

It is easy to verify that

$$\begin{aligned} D_{k}=\max _{u_{k}\in Q_{k}}\left\{ d_{Q_{k}}\left( u_{k}\right) \right\} =\ln {n_{k}\left( n_{k}-1\right) }, \end{aligned}$$

and the strong convexity parameter \(\sigma _{Q_{k}}=1.\)

We also need to compute the following objects at each iteration.

(1) Computation of \(u_{k}^{*}(\overline{x}),\,k=1,\cdots ,m.\) Here \(u_{k}^{*}(\overline{x})\) is the solution of the following problem:

$$\begin{aligned} \max _{u_{k}\in Q_{k}}\left\{ \langle A\left( e_{k}\right) \overline{x},\,u_{k} \rangle -\mu d_{Q_{k}}\left( u_{k}\right) \right\} . \end{aligned}$$

The solution of the above problem is

$$\begin{aligned} u_{k}^{(i)*}(\overline{x})= \frac{\exp {(\langle A_{i}(e_{k})^{\rm T},\,\overline{x}\rangle /\mu })}{\sum \nolimits _{j=1}^{n_{k}(n_{k}-1)}\exp {(\langle A_{j}(e_{k})^{\rm T},\,\overline{x} \rangle /\mu )}} .\end{aligned}$$
(2) Computation of \(x^{*}(u).\) Here \(x^{*}(u)\) is the solution of the following problem:

$$\begin{aligned} \min _{x\in X}\lambda \widehat{f}(x)+ \sum \limits _{k=1}^{m}\left\{ \langle A\left( e_{k}\right) x,\,u_{k} \rangle \right\} . \end{aligned}$$

The solution of the above problem can be written as

$$\begin{aligned} x^{*}(u)=x^{+}_{0}-\frac{1}{2\lambda }\sum \limits _{k=1}^{m} A\left( e_{k}\right) ^{\rm T} u_{k} . \end{aligned}$$
(3) Computation of \(V_{k}(u_{k},\,g),\,k=1,\cdots ,m.\) Here \(V_{k}(u_{k},\,g)\) is the solution of the following problem:

$$\begin{aligned} \max _{v_{k} \in Q_{k}}\left\{ \langle g,\,v_{k}-u_{k} \rangle -\mu \xi \left( v_{k},\,u_{k}\right) \right\} . \end{aligned}$$

The solution of the above problem can be written as

$$\begin{aligned} V^{(i)}_{k}\left( u_{k},\,g\right) =\frac{u^{(i)}_{k}\exp {(g^{(i)}/\mu )}}{\sum \nolimits _{j=1}^{n_{k}(n_{k}-1)}u^{(j)}_{k}\exp {(g^{(j)}/\mu )}}. \end{aligned}$$

Thus, all variables of the algorithm can be computed directly.

5 Speeding Up the Convergence

From the above lemmas and theorems, the Lipschitz constant \(L(\Phi )=m^{2}/\lambda \) may be very large, since it grows quadratically with the number of nets. So we take the following strategy to reduce this constant and speed up the convergence.

For any \(x\in X,\) denote by \(x_{e_{k}}\) the n-dimensional vector such that, for any \(i\in e_{k},\) the ith component of \(x_{e_{k}}\) is \(x^{(i)},\) and the other components are 0.

Obviously, for every net \(e_{k},\,x_{e_{k}}\) can be generated from the information of \(e_{k}\) alone, and the following equalities hold:

$$\begin{aligned} \langle a_{i,j}\left( e_{k}\right) ,\,x\rangle =\langle a_{i,j}\left( e_{k}\right) ,\,x_{e_{k}}\rangle ,\nonumber \\ \langle A\left( e_{k}\right) x,\,u_{k} \rangle =\langle A\left( e_{k}\right) x_{e_{k}},\,u_{k}\rangle . \end{aligned}$$
(5.1)

Let \(\mathcal {A}\) be the matrix corresponding to a (hypothetical) net containing all the n cells. Then it is obvious that

$$\begin{aligned} \left\| \mathcal {A}x_{e_{k}}\right\| _{2}\geqslant \left\| A\left( e_{k}\right) x_{e_{k}}\right\| _{2}. \end{aligned}$$

Denoting \(\mathcal {X}^{*}(u)=(x^{*}_{e_{1}}(u),\,x^{*}_{e_{2}}(u),\cdots ,x^{*}_{e_{m}}(u)),\) where \(x^{*}_{e_{k}}(u)\) is obtained from \(x^{*}(u)\) as above, and \(g_{k}(u)=A(e_{k})x^{*}(u),\) we can get

$$\begin{aligned} \left\| \nabla \Phi \left( u_{1},\,u_{2},\cdots ,u_{m}\right) \right\| _{1}=\left\| \left( g_{1}(u),\,g_{2}(u),\cdots ,g_{m}(u)\right) ^\mathrm{T}\right\| _{1}\leqslant \left\| \mathcal {A}\mathcal {X}^{*}(u)\right\| _{1}. \end{aligned}$$
(5.2)

Denote by \(\mathrm{deg}_{i}\) the degree of the vertex i in the hypergraph, i.e., the number of nets containing cell i. The following relations hold:

$$\begin{aligned} \left\| \mathcal {X}^{*}(u)\right\| _{2}^{2}=\sum \limits _{k=1}^{m}\left\| x^{*}_{e_{k}}(u)\right\| _{2}^{2}=\sum \limits _{i=1}^{n}\mathrm{deg}_{i}\left( x^{*(i)}(u)\right) ^{2}. \end{aligned}$$
(5.3)

Note that the function \(\Phi (u_{1},\cdots ,u_{m})\) defined by (3.10) is concave and differentiable, and \(x^{*}(u)\) is the solution of the inner minimization problem. Moreover, the function \(\widehat{f}(x)\) is strongly convex with strong convexity parameter \(\widehat{\sigma }=2.\) Thus, we have the following lemma.

Lemma 5.1

The function \(\Phi (u_{1},\cdots ,u_{m})\) is Lipschitz continuously differentiable with constant

$$\begin{aligned} L(\Phi )=\frac{1}{\lambda }\max _{1\leqslant i\leqslant n}\mathrm{deg}_{i}. \end{aligned}$$

Proof

From the function \(\Phi (u_{1},\cdots ,u_{m})\) defined by (3.10) and the fact that \(\widehat{f}(x)\) is strongly convex, the minimization problem in (3.10) has the unique solution \(x^{*}(u).\) Consider \(u_{k},\,{v}_{k}\in Q_{k},\,k=1,\cdots ,m.\) We have

$$\begin{aligned} \begin{aligned}&\;\;\;\;\;\left( \left\| \nabla \Phi \left( u_{1},\,u_{2},\cdots ,u_{m}\right) -\nabla \Phi \left( v_{1},\,v_{2},\cdots ,v_{m}\right) \right\| _{1}^{*}\right) ^{2} \\ (\mathrm{by}\;(5.2))&\leqslant \left( ||\mathcal {A}\mathcal {X}^{*}(u)-\mathcal {A}\mathcal {X}^{*}(v)||_{1}^{*}\right) ^{2}\\(\mathrm{by}\;(2.1))&\leqslant ||\mathcal {A}||_{2,1}^{2}||\mathcal {X}^{*}(u)-\mathcal {X}^{*}(v)||_{2}^{2}\\(\mathrm{by}\;(3.1))&=2\sum \limits _{k=1}^{m}\left\| x_{e_{k}}^{*}(u)-x_{e_{k}}^{*}(v)\right\| _{2}^{2}\\(\mathrm{by}\;(5.3))&\leqslant 2\max _{1\leqslant i\leqslant n}\mathrm{deg}_{i}||x^{*}(u)-x^{*}(v)||_{2}^{2} \\ (\mathrm{by}\;(3.15))&\leqslant \frac{2}{\lambda \widehat{\sigma }} \max _{1\leqslant i\leqslant n}\mathrm{deg}_{i} \sum \limits _{k=1}^{m} \langle A\left( e_{k}\right) (x^{*}(v)-x^{*}(u)),\,u_{k}-v_{k} \rangle \\ {}&\leqslant \frac{1}{\lambda } \max _{1\leqslant i\leqslant n}\mathrm{deg}_{i} \left\| \nabla \Phi \left( u_{1},\,u_{2},\cdots ,u_{m}\right) -\nabla \Phi \left( v_{1},\,v_{2},\cdots ,v_{m}\right) \right\| ^{*}_{1}\sum \limits _{k=1}^{m}\left\| u_{k}-v_{k}\right\| _{1}. \end{aligned} \end{aligned}$$

Hence, we have

$$\begin{aligned} \left\| \nabla \Phi \left( u_{1},\cdots ,u_{m}\right) -\nabla \Phi \left( v_{1},\cdots ,v_{m}\right) \right\| _{1}^{*}\leqslant \frac{1}{\lambda }\max _{1\leqslant i\leqslant n}\mathrm{deg}_{i} \sum \limits _{k=1}^{m}\left\| u_{k}-v_{k}\right\| _{1}, \end{aligned}$$

and the lemma holds.
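To illustrate the improvement, consider a small hypothetical netlist: the degree-based constant of Lemma 5.1 is already much smaller than the bound \(m^{2}/\lambda \) of Lemma 3.2, and real circuits, where a cell typically belongs to far fewer nets than m, benefit even more. The data below are placeholders.

```python
from collections import Counter

# Hypothetical netlist: 6 cells, 4 nets (placeholder data).
nets = [(0, 1), (1, 2, 3), (3, 4), (4, 5, 0)]
lam, m = 0.5, len(nets)
deg = Counter(i for e in nets for i in e)     # deg_i = number of nets containing cell i
print(m ** 2 / lam)                           # Lemma 3.2 constant: 32.0
print(max(deg.values()) / lam)                # Lemma 5.1 constant: 4.0
```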

6 Experiments

In this section, we test the algorithm on nine benchmarks from the 2004 International Symposium on Physical Design (ISPD04) placement contest benchmark suites. Our implementation is written in Matlab 7.0 and runs on a personal computer with an Intel Core2 Duo CPU E7500 (2.9 GHz) and 2 GB of memory, under Windows XP. The information on the nine benchmarks is given in Table 1. In the experiments, we take different values \(\lambda =0.1,\,0.5,\) and 1, and we set \(\varepsilon =200\) for every benchmark.

Table 1 ICCAD’04 IBM benchmark

In Fig. 1, we plot the primal function value and the dual function value of the algorithm on the benchmark ibm01 with \(\lambda =1.0\) at each iteration.

Fig. 1
figure 1

Primal and dual function values of ibm01, \(\lambda =1.0\)

From Fig. 1, we can see that the objective function value decreases very fast in the first 10 iterations and then changes very little. Hence, in the experiments we terminate the algorithm when the duality gap is less than 200. The results of the algorithm on the nine benchmarks are reported in Table 2.

Table 2 Runtime and primal and dual function values

Table 2 shows the runtime and the final value of \(f(x)\) for each \(\lambda \) and benchmark. In general, the computation of \(u_{k}^{*}(x)\) takes 25 % of the total runtime, the computation of \(V_{k}(u_{k},\,g)\) takes 27 %, the computation of \(x^{*}(u)\) takes 14 %, the computation of the dual function \(\Phi (u_{1},\cdots ,u_{m})\) takes 26 %, the computation of the primal function \(f(x)\) takes 5 %, and the remaining operations take 3 %.

The runtime of our algorithm is related not only to the scale of the benchmark, but also to \(\lambda \) and to the number of cells on the nets. From Fig. 1 and Table 2, we can see that as \(\lambda \) increases, the algorithm takes less time. This is because \(L(\Phi )\) is inversely proportional to \(\lambda ,\) and the larger \(L(\Phi )\) is, the smaller the improvement of each iteration. From the experiments, we also find that the computations of \(u_{k}^{*}(x),\,V_{k}(u_{k},\,g),\) and \(\Phi (u_{1},\cdots ,u_{m})\) take most of the runtime, and these computations depend on the number of cells on the nets: the more cells a net contains, the more runtime these computations cost.

Finally, it must be remarked that the problem considered in this paper is only a subproblem in the VLSI placement packages SimPL [10, 12], ComPLx [9], and Mapple [11]. In future research we will improve the performance of the algorithm and implement it in a real VLSI placement flow.