Optimization problem (Sect. 1)
$$N$$  Dimension of the optimization variable  (1)
$$x, h$$  Vectors in $$\mathbf {R}^N$$
$$f$$  Smooth convex function ($$f: \mathbf {R}^N \rightarrow \mathbf {R}$$)  (1)
$$\Omega$$  Convex block separable function ($$\Omega : \mathbf {R}^N \rightarrow \mathbf {R}\cup \{+\infty \}$$)  (1)
$$F$$  $$F=f+\Omega$$ (loss / objective function)  (1)
$$\omega$$  Degree of partial separability of $$f$$  (2), (3)

Block structure (Sect. 2.1)
$$n$$  Number of blocks
$$[n]$$  $$[n]=\{1,2,\ldots ,n\}$$ (the set of blocks)  Sect. 2.1
$$N_i$$  Dimension of block $$i$$ ($$N_1+\ldots +N_n = N$$)  Sect. 2.1
$$U_i$$  An $$N \times N_i$$ column submatrix of the $$N \times N$$ identity matrix  Prop. 1
$$x^{(i)}$$  $$x^{(i)}=U_i^T x \in \mathbf {R}^{N_i}$$ (block $$i$$ of vector $$x$$)  Prop. 1
$$\nabla _i f(x)$$  $$\nabla _i f(x) = U_i^T \nabla f(x)$$ (block gradient of $$f$$ associated with block $$i$$)  (11)
$$L_i$$  Block Lipschitz constant of the gradient of $$f$$  (11)
$$L$$  $$L = (L_1,\ldots ,L_n)^T \in \mathbf {R}^n$$ (vector of block Lipschitz constants)
$$w$$  $$w = (w_1,\ldots ,w_n)^T \in \mathbf {R}^n$$ (vector of positive weights)
$$\mathrm{Supp}(h)$$  $$\mathrm{Supp}(h) = \{i \in [n] \;:\; h^{(i)} \ne 0\}$$ (set of nonzero blocks of $$h$$)
$$B_i$$  An $$N_i\times N_i$$ positive definite matrix
$$\Vert \cdot \Vert _{(i)}$$  $$\Vert x^{(i)}\Vert _{(i)} = \langle B_i x^{(i)} , x^{(i)} \rangle ^{1/2}$$ (norm associated with block $$i$$)
$$\Vert x\Vert _w$$  $$\Vert x\Vert _w=(\sum _{i=1}^n w_i \Vert x^{(i)}\Vert ^2_{(i)})^{1/2}$$ (weighted norm associated with $$x$$)  (10)
$$\Omega _i$$  $$i$$th component of $$\Omega = \Omega _1 + \ldots + \Omega _n$$  (13)
$$\mu _{\Omega }(w)$$  Strong convexity constant of $$\Omega$$ with respect to the norm $$\Vert \cdot \Vert _w$$  (14)
$$\mu _f(w)$$  Strong convexity constant of $$f$$ with respect to the norm $$\Vert \cdot \Vert _w$$  (14)

Block samplings (Sect. 4)
$$S, J$$  Subsets of $$\{1,2,\ldots ,n\}$$
$$\hat{S}, S_k$$  Block samplings (random subsets of $$\{1,2,\ldots ,n\}$$)
$$x_{[S]}$$  Vector in $$\mathbf {R}^N$$ formed from $$x$$ by zeroing out blocks $$x^{(i)}$$ for $$i \notin S$$  (7), (8)
$$\tau$$  Number of blocks updated in one iteration (when $$\mathbf {P}(|\hat{S}|=\tau )=1$$)
$$\mathbf {E}[|\hat{S}|]$$  Average number of blocks updated in one iteration (when $$\mathbf {Var}[|\hat{S}|]>0$$)
$$p(S)$$  $$p(S) = \mathbf {P}(\hat{S}=S)$$  (20)
$$p_i$$  $$p_i= \mathbf {P}(i \in \hat{S})$$  (21)
$$p$$  $$p = (p_1,\ldots ,p_n)^T \in \mathbf {R}^n$$  (21)

Algorithm (Sect. 2.2)
$$\beta$$  Stepsize parameter depending on $$f$$ and $$\hat{S}$$ (a central object in this paper)
$$H_{\beta ,w}(x,h)$$  $$H_{\beta ,w}(x,h) = f(x) + \langle \nabla f(x) , h \rangle + \tfrac{\beta }{2}\Vert h\Vert _w^2 + \Omega (x+h)$$  (18)
$$h(x)$$  $$h(x) = \arg \min _{h \in \mathbf {R}^N} H_{\beta ,w}(x,h)$$  (17)
$$h^{(i)}(x)$$  $$h^{(i)}(x) = (h(x))^{(i)} = \arg \min _{t \in \mathbf {R}^{N_i}} \langle \nabla _i f(x) , t \rangle + \tfrac{\beta w_i}{2}\Vert t\Vert _{(i)}^2 + \Omega _i(x^{(i)}+t)$$  (17)
$$x_{k+1}$$  $$x_{k+1} = x_k + \sum _{i\in S_k} U_i h^{(i)}(x_k)$$ ($$x_k$$ is the $$k$$th iterate of PCDM)
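
To make the algorithmic entries concrete, the following Python sketch evaluates $$h^{(i)}(x)$$ and the PCDM update $$x_{k+1} = x_k + \sum _{i\in S_k} U_i h^{(i)}(x_k)$$ in a deliberately simple special case. Everything specific here is an assumption made for illustration and not taken from the paper: all blocks are scalar ($$N_i = 1$$, so $$n = N$$), $$B_i = 1$$, and $$\Omega _i = \lambda |\cdot |$$, in which case the block subproblem has a closed-form soft-thresholding solution. The function names and the uniform fixed-size sampling are hypothetical, and blocks are indexed from 0 in the code whereas the notation above indexes them from 1.

```python
import random


def soft_threshold(z, t):
    """Proximal operator of t*|.| applied to the scalar z."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def block_step(x_i, grad_i, beta, w_i, lam):
    """h^{(i)}(x) for a scalar block, assuming B_i = 1 and Omega_i = lam*|.|.

    Minimizes  grad_i*t + (beta*w_i/2)*t**2 + lam*|x_i + t|  over t.
    """
    c = beta * w_i
    return soft_threshold(x_i - grad_i / c, lam / c) - x_i


def weighted_norm(x, w):
    """||x||_w for scalar blocks with B_i = 1."""
    return sum(w_i * x_i ** 2 for w_i, x_i in zip(w, x)) ** 0.5


def uniform_tau_sampling(n, tau, rng=random):
    """Hypothetical sampling: a uniformly random subset of size tau."""
    return set(rng.sample(range(n), tau))


def pcdm_iteration(x, grad, beta, w, lam, S):
    """x_{k+1} = x_k + sum_{i in S} U_i h^{(i)}(x_k) for a given block set S."""
    x_next = list(x)
    for i in S:  # every step uses the same x_k, so the loop could run in parallel
        x_next[i] += block_step(x[i], grad[i], beta, w[i], lam)
    return x_next


# Example with f(x) = 0.5*||x - b||^2, whose gradient is x - b.
x0 = [0.0, 0.0, 0.0]
b = [3.0, -0.5, 2.0]
grad0 = [xi - bi for xi, bi in zip(x0, b)]
x1 = pcdm_iteration(x0, grad0, beta=1.0, w=[1.0, 1.0, 1.0], lam=1.0, S={0, 1})
print(x1)  # [2.0, 0.0, 0.0]: block 2 is not in S and stays unchanged
```

In the example, only the blocks in $$S$$ are updated. In practice $$S$$ would be drawn anew at each iteration, for instance with `uniform_tau_sampling(len(x), tau)`, and $$\beta$$ and $$w$$ would be chosen according to the theory rather than set to 1.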