
Table 8 The main notation used in the paper

From: Parallel coordinate descent methods for big data optimization

Optimization problem (Sect. 1)
\(N\) Dimension of the optimization variable (1)
\(x, h\) Vectors in \(\mathbf {R}^N\)  
\(f\) Smooth convex function (\(f: \mathbf {R}^N \rightarrow \mathbf {R}\)) (1)
\(\Omega \) Convex block separable function (\(\Omega : \mathbf {R}^N \rightarrow \mathbf {R}\cup \{+\infty \}\)) (1)
\(F\) \(F=f+\Omega \) (loss / objective function) (1)
\(\omega \) Degree of partial separability of \(f\) (2,3)
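
To make the objects above concrete, the following sketch instantiates problem (1) with a least-squares loss \(f\) and an \(\ell_1\)-norm regularizer \(\Omega\); the data A, b and the weight lam are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical instance of problem (1): F(x) = f(x) + Omega(x), with
# f a smooth convex least-squares loss and Omega the (separable) l1
# norm. Since f(x) = sum_j 0.5 * (a_j^T x - b_j)^2 and each summand
# depends only on the nonzero coordinates of row a_j, the degree of
# partial separability omega from (2), (3) is the largest number of
# nonzeros in any row of A (for blocks of size 1).
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
b = rng.standard_normal(20)
lam = 0.1                         # illustrative regularization weight

def f(x):
    return 0.5 * np.linalg.norm(A @ x - b) ** 2

def Omega(x):
    return lam * np.abs(x).sum()

def F(x):
    return f(x) + Omega(x)

print(F(np.zeros(10)))            # objective at the origin: 0.5 ||b||^2
```
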
Block structure (Sect. 2.1)
\(n\) Number of blocks  
\([n]\) \([n]=\{1,2,\ldots ,n\}\) (the set of blocks) Sect. 2.1
\(N_i\) Dimension of block \(i\) (\(N_1+\ldots +N_n = N\)) Sect. 2.1
\(U_i\) An \(N \times N_i\) column submatrix of the \(N \times N\) identity matrix Prop. 1
\(x^{(i)} \) \(x^{(i)}=U_i^T x \in \mathbf {R}^{N_i}\) (block \(i\) of vector \(x\)) Prop. 1
\(\nabla _i f(x)\) \(\nabla _i f(x) = U_i^T \nabla f(x)\) (block gradient of \(f\) associated with block \(i\)) (11)
\(L_i\) Block Lipschitz constant of the gradient of \(f\) (11)
\(L\) \(L = (L_1,\ldots ,L_n)^T \in \mathbf {R}^n\) (vector of block Lipschitz constants)  
\(w\) \(w = (w_1,\ldots ,w_n)^T \in \mathbf {R}^n\) (vector of positive weights)  
\({{\mathrm{Supp}}}(h)\) \({{\mathrm{Supp}}}(h) = \{i \in [n] \;:\; h^{(i)} \ne 0\}\) (set of nonzero blocks of \(h\))  
\(B_i\) An \(N_i\times N_i\) positive definite matrix  
\(\Vert \cdot \Vert _{(i)}\) \(\Vert x^{(i)}\Vert _{(i)} = \langle B_i x^{(i)} , x^{(i)} \rangle ^{1/2}\) (norm associated with block \(i\))  
\(\Vert x\Vert _w\) \(\Vert x\Vert _w=(\sum _{i=1}^n w_i \Vert x^{(i)}\Vert ^2_{(i)})^{1/2}\) (weighted norm of \(x\)) (10)
\(\Omega _i\) \(i\)th component of \(\Omega = \Omega _1 + \ldots + \Omega _n\) (13)
\(\mu _{\Omega }(w)\) Strong convexity constant of \(\Omega \) with respect to the norm \(\Vert \cdot \Vert _w\) (14)
\(\mu _f(w)\) Strong convexity constant of \(f\) with respect to the norm \(\Vert \cdot \Vert _w\) (14)
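
The block machinery of Sect. 2.1 can be sketched in a few lines. The block sizes below are assumed for illustration; U(i), block and weighted_norm mirror the definitions of \(U_i\), \(x^{(i)}\) and \(\Vert \cdot \Vert _w\) (the code indexes blocks from 0, while the paper indexes from 1).

```python
import numpy as np

# Sketch of the block setup of Sect. 2.1 under assumed block sizes.
# U_i stacks the columns of the N x N identity belonging to block i,
# so U_i is N x N_i and x^{(i)} = U_i^T x (Prop. 1).
block_sizes = [3, 2, 5]                  # hypothetical N_1, N_2, N_3
N = sum(block_sizes)
offsets = np.concatenate(([0], np.cumsum(block_sizes)))

def U(i):
    return np.eye(N)[:, offsets[i]:offsets[i + 1]]

def block(x, i):
    """x^{(i)} = U_i^T x."""
    return U(i).T @ x

def weighted_norm(x, w, B=None):
    """||x||_w from (10): (sum_i w_i ||x^{(i)}||_{(i)}^2)^{1/2}, where
    ||t||_{(i)} = <B_i t, t>^{1/2}; B_i = I_{N_i} when B is None."""
    total = 0.0
    for i, wi in enumerate(w):
        xi = block(x, i)
        Bi = np.eye(len(xi)) if B is None else B[i]
        total += wi * float(xi @ Bi @ xi)
    return np.sqrt(total)

x = np.arange(N, dtype=float)
print(block(x, 1))                       # second block of x
print(weighted_norm(x, w=[1.0, 2.0, 0.5]))
```
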
Block samplings (Sect. 4)
   \(S, J\) Subsets of \(\{1,2,\ldots ,n\}\)  
   \(\hat{S}, S_k\) Block samplings (random subsets of \(\{1,2,\ldots ,n\}\))  
   \(x_{[S]}\) Vector in \(\mathbf {R}^N\) formed from \(x\) by zeroing out blocks \(x^{(i)}\) for \(i \notin S\) (7), (8)
   \(\tau \) # of blocks updated in 1 iteration (when \(\mathbf {P}(|\hat{S}|=\tau )=1\))  
   \({{\mathrm{\mathbf {E}}}}[|\hat{S}|]\) Average # of blocks updated in 1 iteration (when \(\mathbf {Var}[|\hat{S}|]>0\))  
   \(p(S)\) \(p(S) = \mathbf {P}(\hat{S}=S)\) (20)
   \(p_i\) \(p_i= \mathbf {P}(i \in \hat{S})\) (21)
   \(p\) \(p = (p_1,\ldots ,p_n)^T \in \mathbf {R}^n\) (21)
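
A common block sampling is the \(\tau\)-nice sampling, in which every subset of \([n]\) of size \(\tau\) is equally likely. The sketch below, with assumed sizes n and tau, also implements \(x_{[S]}\) and checks empirically that \(p_i = \tau /n\).

```python
import numpy as np

# Sketch of a tau-nice block sampling (Sect. 4): S-hat is a random
# subset of [n] with every subset of size tau equally likely, so
# P(|S-hat| = tau) = 1 and p_i = P(i in S-hat) = tau / n (21).
rng = np.random.default_rng(0)
n, tau = 5, 2                            # illustrative sizes

def tau_nice_sampling():
    return set(rng.choice(n, size=tau, replace=False).tolist())

def restrict(x, S, offsets):
    """x_[S] from (7), (8): zero out blocks x^{(i)} with i not in S;
    offsets[i] is the start index of block i inside x."""
    y = np.zeros_like(x)
    for i in S:
        y[offsets[i]:offsets[i + 1]] = x[offsets[i]:offsets[i + 1]]
    return y

# Empirical check that p_i is close to tau / n = 0.4 for every block:
hits = np.zeros(n)
for _ in range(20000):
    for i in tau_nice_sampling():
        hits[i] += 1
print(hits / 20000)                      # roughly 0.4 in each entry
```
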
Algorithm (Sect. 2.2)
   \(\beta \) Stepsize parameter depending on \(f\) and \(\hat{S}\) (a central object in this paper)  
   \(H_{\beta ,w}(x,h)\) \(H_{\beta ,w}(x,h) = f(x) + \langle \nabla f(x) , h \rangle + \tfrac{\beta }{2}\Vert h\Vert _w^2 + \Omega (x+h)\) (18)
   \(h(x)\) \(h(x) = \arg \min _{h \in \mathbf {R}^N} H_{\beta ,w}(x,h)\) (17)
   \(h^{(i)}(x)\) \(h^{(i)}(x) = (h(x))^{(i)} = \arg \min _{t \in \mathbf {R}^{N_i}} \langle \nabla _i f(x) , t \rangle + \tfrac{\beta w_i}{2}\Vert t\Vert _{(i)}^2 + \Omega _i(x^{(i)}+t)\) (17)
   \(x_{k+1}\) \(x_{k+1} = x_k + \sum _{i\in S_k} U_i h^{(i)}(x_k)\)    (\(x_k\) is the \(k\)th iterate of PCDM)
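
Putting the pieces together, here is a minimal sketch of the PCDM iteration for scalar blocks (\(N_i = 1\), \(B_i = 1\)) on an assumed least-squares/\(\ell_1\) instance of (1); with these choices the block subproblem (17) reduces to soft-thresholding, and beta = 2 is an illustrative stepsize, not a value prescribed by the paper.

```python
import numpy as np

# Minimal PCDM sketch for scalar blocks on f(x) = 0.5 ||Ax - b||^2,
# Omega(x) = lam ||x||_1; all concrete values are assumptions.
rng = np.random.default_rng(0)
m, n, tau = 30, 10, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
lam = 0.1
L = (A ** 2).sum(axis=0)          # block Lipschitz constants L_i (11)
w, beta = L, 2.0                  # weights w = L, stepsize beta

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

x = np.zeros(n)
for k in range(300):
    S = rng.choice(n, size=tau, replace=False)   # tau-nice sampling S_k
    g = A.T @ (A @ x - b)         # nabla f(x_k); only g[i], i in S, is used
    for i in S:                   # independent updates: parallelizable
        # h^{(i)}(x_k) minimizes g_i t + (beta w_i / 2) t^2 + lam |x_i + t|
        # over t, so x_{k+1}^{(i)} = x_k^{(i)} + h^{(i)}(x_k) is a
        # soft-thresholding step.
        x[i] = soft_threshold(x[i] - g[i] / (beta * w[i]),
                              lam / (beta * w[i]))

print(0.5 * np.linalg.norm(A @ x - b) ** 2 + lam * np.abs(x).sum())
```
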