Optimization problem (Sect. 1)
|
\(N\)
|
Dimension of the optimization variable
|
(1)
|
\(x, h\)
|
Vectors in \(\mathbf {R}^N\)
| |
\(f\)
|
Smooth convex function (\(f: \mathbf {R}^N \rightarrow \mathbf {R}\))
|
(1)
|
\(\Omega \)
|
Convex block separable function (\(\Omega : \mathbf {R}^N \rightarrow \mathbf {R}\cup \{+\infty \}\))
|
(1)
|
\(F\)
|
\(F=f+\Omega \) (loss / objective function)
|
(1)
|
\(\omega \)
|
Degree of partial separability of \(f\)
|
(2), (3)
|
Block structure (Sect. 2.1)
|
\(n\)
|
Number of blocks
| |
\([n]\)
|
\([n]=\{1,2,\ldots ,n\}\) (the set of blocks)
|
Sect. 2.1
|
\(N_i\)
|
Dimension of block \(i\) (\(N_1+\ldots +N_n = N\))
|
Sect. 2.1
|
\(U_i\)
|
An \(N \times N_i\) column submatrix of the \(N \times N\) identity matrix
|
Prop. 1
|
\(x^{(i)} \)
|
\(x^{(i)}=U_i^T x \in \mathbf {R}^{N_i}\) (block \(i\) of vector \(x\))
|
Prop. 1
|
\(\nabla _i f(x)\)
|
\(\nabla _i f(x) = U_i^T \nabla f(x)\) (block gradient of \(f\) associated with block \(i\))
|
(11)
|
\(L_i\)
|
Block Lipschitz constant of the gradient of \(f\)
|
(11)
|
\(L\)
|
\(L = (L_1,\ldots ,L_n)^T \in \mathbf {R}^n\) (vector of block Lipschitz constants)
| |
\(w\)
|
\(w = (w_1,\ldots ,w_n)^T \in \mathbf {R}^n\) (vector of positive weights)
| |
\({{\mathrm{Supp}}}(h)\)
|
\({{\mathrm{Supp}}}(h) = \{i \in [n] \;:\; h^{(i)} \ne 0\}\) (set of nonzero blocks of \(h\))
| |
\(B_i\)
|
An \(N_i\times N_i\) positive definite matrix
| |
\(\Vert \cdot \Vert _{(i)}\)
|
\(\Vert x^{(i)}\Vert _{(i)} = \langle B_i x^{(i)} , x^{(i)} \rangle ^{1/2}\) (norm associated with block \(i\))
| |
\(\Vert x\Vert _w\)
|
\(\Vert x\Vert _w=(\sum _{i=1}^n w_i \Vert x^{(i)}\Vert ^2_{(i)})^{1/2}\) (weighted norm associated with \(x\))
|
(10)
|
\(\Omega _i\)
|
\(i\)th component of \(\Omega = \Omega _1 + \ldots + \Omega _n\)
|
(13)
|
\(\mu _{\Omega }(w)\)
|
Strong convexity constant of \(\Omega \) with respect to the norm \(\Vert \cdot \Vert _w\)
|
(14)
|
\(\mu _f(w)\)
|
Strong convexity constant of \(f\) with respect to the norm \(\Vert \cdot \Vert _w\)
|
(14)
|
Block samplings (Sect. 4)
|
\(S, J\)
|
Subsets of \(\{1,2,\ldots ,n\}\)
| |
\(\hat{S}, S_k\)
|
Block samplings (random subsets of \(\{1,2,\ldots ,n\}\))
| |
\(x_{[S]}\)
|
Vector in \(\mathbf {R}^N\) formed from \(x\) by zeroing out blocks \(x^{(i)}\) for \(i \notin S\)
|
(7), (8)
|
\(\tau \)
|
Number of blocks updated in one iteration (when \(\mathbf {P}(|\hat{S}|=\tau )=1\))
| |
\({{\mathrm{\mathbf {E}}}}[|\hat{S}|]\)
|
Average number of blocks updated in one iteration (when \(\mathbf {Var}[|\hat{S}|]>0\))
| |
\(p(S)\)
|
\(p(S) = \mathbf {P}(\hat{S}=S)\)
|
(20)
|
\(p_i\)
|
\(p_i= \mathbf {P}(i \in \hat{S})\)
|
(21)
|
\(p\)
|
\(p = (p_1,\ldots ,p_n)^T \in \mathbf {R}^n\)
|
(21)
|
Algorithm (Sect. 2.2)
|
\(\beta \)
|
Stepsize parameter depending on \(f\) and \(\hat{S}\) (a central object in this paper)
| |
\(H_{\beta ,w}(x,h)\)
|
\(H_{\beta ,w}(x,h) = f(x) + \langle \nabla f(x) , h \rangle + \tfrac{\beta }{2}\Vert h\Vert _w^2 + \Omega (x+h)\)
|
(18)
|
\(h(x)\)
|
\(h(x) = \arg \min _{h \in \mathbf {R}^N} H_{\beta ,w}(x,h)\)
|
(17)
|
\(h^{(i)}(x)\)
|
\(h^{(i)}(x) = (h(x))^{(i)} = \arg \min _{t \in \mathbf {R}^{N_i}} \langle \nabla _i f(x) , t \rangle + \tfrac{\beta w_i}{2}\Vert t\Vert _{(i)}^2 + \Omega _i(x^{(i)}+t)\)
|
(17)
|
\(x_{k+1}\)
|
\(x_{k+1} = x_k + \sum _{i\in S_k} U_i h^{(i)}(x_k)\) (\(x_k\) is the \(k\)th iterate of PCDM)
| |
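The block notation above can be made concrete with a small numerical sketch. The code below is illustrative only and rests on assumptions not made in the table: \(\Omega \equiv 0\), \(B_i = I\) for every block, and 0-based block indices. Under these assumptions the block subproblem defining \(h^{(i)}(x)\) has the closed form \(h^{(i)}(x) = -\nabla _i f(x)/(\beta w_i)\). The function names (`block_slices`, `weighted_norm`, `support`, `pcdm_step`) are hypothetical, and \(\beta \) is supplied by the caller rather than derived.

```python
import numpy as np

def block_slices(block_sizes):
    """One slice per block i, so that x[slices[i]] equals x^{(i)} = U_i^T x."""
    offsets = np.concatenate(([0], np.cumsum(block_sizes)))
    return [slice(int(a), int(b)) for a, b in zip(offsets[:-1], offsets[1:])]

def weighted_norm(x, w, slices):
    """||x||_w = sqrt(sum_i w_i ||x^{(i)}||_2^2), assuming B_i = I."""
    return float(np.sqrt(sum(w_i * np.dot(x[s], x[s]) for w_i, s in zip(w, slices))))

def support(h, slices):
    """Supp(h): set of (0-based) indices of the nonzero blocks of h."""
    return {i for i, s in enumerate(slices) if np.any(h[s] != 0)}

def pcdm_step(x, grad_f, S, beta, w, slices):
    """x_{k+1} = x_k + sum_{i in S} U_i h^{(i)}(x_k), assuming Omega = 0 and B_i = I,
    in which case h^{(i)}(x) = -grad_i f(x) / (beta * w_i)."""
    g = grad_f(x)
    x_next = x.copy()
    for i in S:
        x_next[slices[i]] -= g[slices[i]] / (beta * w[i])
    return x_next

# Toy least-squares instance: f(x) = 0.5 * ||A x - b||^2 (an assumed example problem).
rng = np.random.default_rng(0)
block_sizes = [2, 3, 1]                      # N_1, N_2, N_3, so N = 6 and n = 3
slices = block_slices(block_sizes)
A = rng.standard_normal((8, 6))
b = rng.standard_normal(8)
f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)
grad_f = lambda x: A.T @ (A @ x - b)

# Block Lipschitz constants: L_i is the squared spectral norm of the columns of block i.
L = np.array([np.linalg.norm(A[:, s], 2) ** 2 for s in slices])

tau = 2                                      # blocks updated per iteration
x = np.zeros(6)
for k in range(50):
    S_k = rng.choice(len(block_sizes), size=tau, replace=False)
    x = pcdm_step(x, grad_f, S_k, beta=tau, w=L, slices=slices)
```

Here \(w = L\) and \(\beta = \tau \) are conservative choices made for this sketch; for a least-squares \(f\) they guarantee that each step does not increase \(f\), but they are not the stepsize parameter analysed in the paper.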