Abstract
Sparse principal component analysis (PCA) improves the interpretability of classic PCA by introducing sparsity into the dimension-reduction process. Optimization models for sparse PCA, however, are generally non-convex, non-smooth, and more difficult to solve, especially on large-scale datasets requiring distributed computation over a wide network. In this paper, we develop a distributed and centralized algorithm called DSSAL1 for sparse PCA that aims to achieve low communication overhead by adapting a newly proposed subspace-splitting strategy to accelerate convergence. Theoretically, convergence to stationary points is established for DSSAL1. Extensive numerical results show that DSSAL1 requires far fewer rounds of communication than state-of-the-art peer methods. In addition, we make the case that since the messages exchanged in DSSAL1 are well masked, the possibility of private-data leakage in DSSAL1 is much lower than in some other distributed algorithms.
Data availability
The authors declare that all data supporting the findings of this study are available within the article.
Notes
A function f(X) is called orthogonal-transformation invariant if \(f (XO) = f(X)\) for any \(X \in \mathcal {S}_{n,p}\) and \(O \in \mathcal {S}_{p, p}\).
More information at http://lsec.cc.ac.cn/chinese/lsec/LSSC-IVintroduction.pdf.
Our code is downloadable from http://lsec.cc.ac.cn/~liuxin/code.html.
Available from https://eigen.tuxfamily.org/index.php?title=Main_Page.
References
Sjostrand, K., Rostrup, E., Ryberg, C., Larsen, R., Studholme, C., Baezner, H., Ferro, J., Fazekas, F., Pantoni, L., Inzitari, D., et al.: Sparse decomposition and modeling of anatomical shape variation. IEEE Trans. Med. Imaging 26(12), 1625–1635 (2007). https://doi.org/10.1109/TMI.2007.898808
Chen, G., Sullivan, P.F., Kosorok, M.R.: Biclustering with heterogeneous variance. Proc. Natl. Acad. Sci. 110(30), 12253–12258 (2013). https://doi.org/10.1073/pnas.1304376110
Johnstone, I.M., Lu, A.Y.: On consistency and sparsity for principal components analysis in high dimensions. J. Am. Stat. Assoc. 104(486), 682–693 (2009)
Zou, H., Xue, L.: A selective overview of sparse principal component analysis. Proc. IEEE 106(8), 1311–1320 (2018). https://doi.org/10.1109/JPROC.2018.2846588
Wu, T., Hu, E., Xu, S., Chen, M., Guo, P., Dai, Z., Feng, T., Zhou, L., Tang, W., Zhan, L., et al.: ClusterProfiler 4.0: A universal enrichment tool for interpreting omics data. The Innovation 2(3), 100141 (2021). https://doi.org/10.1016/j.xinn.2021.100141
Gravuer, K., Sullivan, J.J., Williams, P.A., Duncan, R.P.: Strong human association with plant invasion success for Trifolium introductions to New Zealand. Proc. Natl. Acad. Sci. 105(17), 6344–6349 (2008). https://doi.org/10.1073/pnas.0712026105
Baden, T., Berens, P., Franke, K., Rosón, M.R., Bethge, M., Euler, T.: The functional diversity of retinal ganglion cells in the mouse. Nature 529(7586), 345–350 (2016). https://doi.org/10.1038/nature16468
Stiefel, E.: Richtungsfelder und fernparallelismus in n-dimensionalen mannigfaltigkeiten. Commentarii Mathematici Helvetici 8(1), 305–353 (1935). https://doi.org/10.3929/ethz-a-000092403
Jolliffe, I.T., Trendafilov, N.T., Uddin, M.: A modified principal component technique based on the LASSO. J. Comput. Graph. Stat. 12(3), 531–547 (2003). https://doi.org/10.1198/1061860032148
Magdon-Ismail, M.: NP-hardness and inapproximability of sparse PCA. Inf. Process. Lett. 126, 35–38 (2017). https://doi.org/10.1016/j.ipl.2017.05.008
Zou, H., Hastie, T., Tibshirani, R.: Sparse principal component analysis. J. Comput. Graph. Stat. 15(2), 265–286 (2006). https://doi.org/10.1198/106186006X113430
d’Aspremont, A., Bach, F., El Ghaoui, L.: Optimal solutions for sparse principal component analysis. J. Mach. Learn. Res. 9(42), 1269–1294 (2008)
d’Aspremont, A., El Ghaoui, L., Jordan, M.I., Lanckriet, G.R.: A direct formulation for sparse PCA using semidefinite programming. SIAM Rev. 49(3), 434–448 (2007). https://doi.org/10.1137/050645506
Shen, H., Huang, J.Z.: Sparse principal component analysis via regularized low rank matrix approximation. J. Multivar. Anal. 99(6), 1015–1034 (2008). https://doi.org/10.1016/j.jmva.2007.06.007
Witten, D.M., Tibshirani, R., Hastie, T.: A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics 10(3), 515–534 (2009). https://doi.org/10.1093/biostatistics/kxp008
Pacheco, P.S.: An Introduction to Parallel Programming. Elsevier, USA (2011). https://doi.org/10.1016/C2009-0-18471-4
McMahan, B., Moore, E., Ramage, D., Hampson, S., Arcas, B.A.y.: Communication-efficient learning of deep networks from decentralized data. In: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, vol. 54, pp. 1273–1282 (2017). PMLR. https://proceedings.mlr.press/v54/mcmahan17a.html
Lou, Y., Yu, L., Wang, S., Yi, P.: Privacy preservation in distributed subgradient optimization algorithms. IEEE Trans. Cybernetics 48(7), 2154–2165 (2017). https://doi.org/10.1109/TCYB.2017.2728644
Zhang, C., Ahmad, M., Wang, Y.: ADMM based privacy-preserving decentralized optimization. IEEE Trans. Inf. Forensics Secur. 14(3), 565–580 (2018). https://doi.org/10.1109/TIFS.2018.2855169
Manton, J.H.: Optimization algorithms exploiting unitary constraints. IEEE Trans. Signal Process. 50(3), 635–650 (2002). https://doi.org/10.1109/78.984753
Nishimori, Y., Akaho, S.: Learning algorithms utilizing quasi-geodesic flows on the Stiefel manifold. Neurocomputing 67, 106–135 (2005). https://doi.org/10.1016/j.neucom.2004.11.035
Abrudan, T.E., Eriksson, J., Koivunen, V.: Steepest descent algorithms for optimization under unitary matrix constraint. IEEE Trans. Signal Process. 56(3), 1134–1147 (2008). https://doi.org/10.1109/tsp.2007.908999
Absil, P.-A., Mahony, R., Sepulchre, R.: Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton (2008). https://doi.org/10.1515/9781400830244
Edelman, A., Arias, T.A., Smith, S.T.: The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl. 20(2), 303–353 (1998). https://doi.org/10.1137/S0895479895290954
Sato, H.: A Dai-Yuan-type Riemannian conjugate gradient method with the weak Wolfe conditions. Comput. Optim. Appl. 64(1), 101–118 (2016). https://doi.org/10.1007/s10589-015-9801-1
Zhu, X.: A Riemannian conjugate gradient method for optimization on the Stiefel manifold. Comput. Optim. Appl. 67(1), 73–110 (2017). https://doi.org/10.1007/s10589-016-9883-4
Wen, Z., Yin, W.: A feasible method for optimization with orthogonality constraints. Math. Program. 142(1–2), 397–434 (2013). https://doi.org/10.1007/s10107-012-0584-1
Jiang, B., Dai, Y.-H.: A framework of constraint preserving update schemes for optimization on Stiefel manifold. Math. Program. 153(2), 535–575 (2015). https://doi.org/10.1007/s10107-014-0816-7
Hu, J., Milzarek, A., Wen, Z., Yuan, Y.: Adaptive quadratically regularized Newton method for Riemannian optimization. SIAM J. Matrix Anal. Appl. 39(3), 1181–1207 (2018). https://doi.org/10.1137/17M1142478
Hu, J., Jiang, B., Lin, L., Wen, Z., Yuan, Y.-X.: Structured quasi-Newton methods for optimization with orthogonality constraints. SIAM J. Sci. Comput. 41(4), 2239–2269 (2019). https://doi.org/10.1137/18M121112X
Absil, P.-A., Baker, C.G., Gallivan, K.A.: Trust-region methods on Riemannian manifolds. Found. Comput. Math. 7(3), 303–330 (2006). https://doi.org/10.1007/s10208-005-0179-9
Gao, B., Liu, X., Chen, X., Yuan, Y.-X.: A new first-order algorithmic framework for optimization problems with orthogonality constraints. SIAM J. Optim. 28(1), 302–332 (2018). https://doi.org/10.1137/16M1098759
Wang, L., Gao, B., Liu, X.: Multipliers correction methods for optimization problems over the Stiefel manifold. CSIAM Trans. Appl. Math. 2(3), 508–531 (2021). https://doi.org/10.4208/csiam-am.SO-2020-0008
Gao, B., Liu, X., Yuan, Y.-X.: Parallelizable algorithms for optimization problems with orthogonality constraints. SIAM J. Sci. Comput. 41(3), 1949–1983 (2019). https://doi.org/10.1137/18m1221679
Xiao, N., Liu, X., Yuan, Y.-X.: A class of smooth exact penalty function methods for optimization problems with orthogonality constraints. Optim. Methods Softw. (2020). https://doi.org/10.1080/10556788.2020.1852236
Ferreira, O., Oliveira, P.: Subgradient algorithm on Riemannian manifolds. J. Optim. Theory Appl. 97(1), 93–104 (1998). https://doi.org/10.1023/A:1022675100677
Ferreira, O.P., Louzeiro, M.S., Prudente, L.F.: Iteration-complexity of the subgradient method on Riemannian manifolds with lower bounded curvature. Optimization 68(4), 713–729 (2019). https://doi.org/10.1080/02331934.2018.1542532
Bacák, M., Bergmann, R., Steidl, G., Weinmann, A.: A second order nonsmooth variational model for restoring manifold-valued images. SIAM J. Sci. Comput. 38(1), 567–597 (2016). https://doi.org/10.1137/15M101988X
Grohs, P., Hosseini, S.: Nonsmooth trust region algorithms for locally Lipschitz functions on Riemannian manifolds. IMA J. Numer. Anal. 36(3), 1167–1192 (2016). https://doi.org/10.1093/imanum/drv043
Hosseini, S., Uschmajew, A.: A Riemannian gradient sampling algorithm for nonsmooth optimization on manifolds. SIAM J. Optim. 27(1), 173–189 (2017). https://doi.org/10.1137/16M1069298
Chen, S., Ma, S., Man-Cho So, A., Zhang, T.: Proximal gradient method for nonsmooth optimization over the Stiefel manifold. SIAM J. Optim. 30(1), 210–239 (2020). https://doi.org/10.1137/18M122457X
Huang, W., Wei, K.: Riemannian proximal gradient methods. Math. Program. (2021). https://doi.org/10.1007/s10107-021-01632-3
Xiao, X., Li, Y., Wen, Z., Zhang, L.: A regularized semi-smooth Newton method with projection steps for composite convex programs. J. Sci. Comput. 76(1), 364–389 (2018)
Lai, R., Osher, S.: A splitting method for orthogonality constrained problems. J. Sci. Comput. 58(2), 431–449 (2014). https://doi.org/10.1007/s10915-013-9740-x
Kovnatsky, A., Glashoff, K., Bronstein, M.M.: MADMM: A generic algorithm for non-smooth optimization on manifolds. In: European Conference on Computer Vision, pp. 680–696. Springer (2016)
Chen, W., Ji, H., You, Y.: An augmented Lagrangian method for \(\ell _{1}\)-regularized optimization problems with orthogonality constraints. SIAM J. Sci. Comput. 38(4), 570–592 (2016). https://doi.org/10.1137/140988875
Hajinezhad, D., Hong, M.: Nonconvex alternating direction method of multipliers for distributed sparse principal component analysis. In: 2015 IEEE Global Conference on Signal and Information Processing, pp. 255–259 (2015). https://doi.org/10.1109/GlobalSIP.2015.7418196
Wang, L., Liu, X., Zhang, Y.: A distributed and secure algorithm for computing dominant SVD based on projection splitting. arXiv:2012.03461 (2020)
Gemp, I., McWilliams, B., Vernade, C., Graepel, T.: EigenGame: PCA as a Nash equilibrium. arXiv:2010.00554 (2020)
Gang, A., Bajwa, W.U.: A linearly convergent algorithm for distributed principal component analysis. arXiv:2101.01300 (2021)
Gang, A., Bajwa, W.U.: FAST-PCA: A fast and exact algorithm for distributed principal component analysis. arXiv:2108.12373 (2021)
Andrade, F.L., Figueiredo, M.A., Xavier, J.: Distributed Picard iteration: application to distributed EM and distributed PCA. arXiv:2106.10665 (2021)
Ye, H., Zhang, T.: DeEPCA: decentralized exact PCA with linear convergence rate. J. Mach. Learn. Res. 22(238), 1–27 (2021)
Chen, S., Garcia, A., Hong, M., Shahrampour, S.: Decentralized Riemannian gradient descent on the Stiefel manifold. In: Proceedings of the 38th International Conference on Machine Learning, vol. 139, pp. 1594–1605 (2021). https://proceedings.mlr.press/v139/chen21g.html
Wang, L., Liu, X.: Decentralized optimization over the Stiefel manifold by an approximate augmented Lagrangian function. IEEE Trans. Signal Process. 70, 3029–3041 (2022). https://doi.org/10.1109/TSP.2022.3182883
Clarke, F.H.: Optimization and Nonsmooth Analysis. SIAM, Philadelphia (1990)
Yang, W.H., Zhang, L.-H., Song, R.: Optimality conditions for the nonlinear programming problems on Riemannian manifolds. Pacific J. Optim. 10(2), 415–434 (2014)
Arrow, K.J., Hurwicz, L., Uzawa, H.: Studies in Linear and Non-linear Programming, vol. 2. Stanford University Press, Stanford (1958)
He, B., You, Y., Yuan, X.: On the convergence of primal-dual hybrid gradient algorithm. SIAM J. Imag. Sci. 7(4), 2526–2537 (2014). https://doi.org/10.1137/140963467
Xiao, N., Liu, X., Yuan, Y.-X.: A penalty-free infeasible approach for a class of nonsmooth optimization problems over the Stiefel manifold. arXiv:2103.03514 (2021)
Rutishauser, H.: Simultaneous iteration method for symmetric matrices. Numer. Math. 16(3), 205–223 (1970). https://doi.org/10.1007/BF02219773
Bento, G.C., Ferreira, O.P., Melo, J.G.: Iteration-complexity of gradient, subgradient and proximal point methods on Riemannian manifolds. J. Optim. Theory Appl. 173(2), 548–562 (2017). https://doi.org/10.1007/s10957-017-1093-4
Jiang, B., Ma, S., So, A.M.-C., Zhang, S.: Vector transport-free SVRG with general retraction for Riemannian optimization: complexity analysis and practical implementation. arXiv:1705.09059 (2017)
Funding
The work of the first author was supported by the National Key R&D Program of China (No. 2020YFA0711900, 2020YFA0711904). The work of the second author was supported in part by the National Natural Science Foundation of China (No. 12125108, 11971466, 12288201, 12021001, 11991021) and the Key Research Program of Frontier Sciences, Chinese Academy of Sciences (No. ZDBS-LY-7022). The work of the third author was supported in part by the Shenzhen Science and Technology Program (No. GXWD20201231105722002-20200901175001001).
Ethics declarations
Competing Interests
The authors have no competing interests to declare that are relevant to the content of this article.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Proof of Lemma 1
Proof of Lemma 1
According to the definition of \(\textrm{Proj}_{\mathcal {T}_{Z} \mathcal {S}_{n,p}} \left( \cdot \right) \), it follows that
where \(R(Z) \in \partial r(Z)\). The proof is completed.\(\square \)
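The display equation of this proof is not reproduced above, but the operator it concerns has a standard closed form: for \(Z \in \mathcal {S}_{n,p}\), the projection onto the tangent space is \(\textrm{Proj}_{\mathcal {T}_{Z} \mathcal {S}_{n,p}}(Y) = Y - Z\,\textrm{sym}(Z^{\top }Y)\) with \(\textrm{sym}(M) = (M + M^{\top })/2\). The following minimal numpy sketch (our own illustration; the variable names are not from the paper) verifies the defining properties of this projection:

```python
import numpy as np

def proj_tangent(Z, Y):
    """Project Y onto the tangent space of the Stiefel manifold at Z:
    Proj(Y) = Y - Z * sym(Z^T Y), where sym(M) = (M + M^T) / 2."""
    M = Z.T @ Y
    return Y - Z @ ((M + M.T) / 2)

rng = np.random.default_rng(0)
n, p = 8, 3
Z, _ = np.linalg.qr(rng.standard_normal((n, p)))  # orthonormal columns
Y = rng.standard_normal((n, p))

D = proj_tangent(Z, Y)
# D is tangent at Z: sym(Z^T D) = 0, i.e. Z^T D is skew-symmetric
assert np.allclose(Z.T @ D + D.T @ Z, np.zeros((p, p)))
# The projection is idempotent
assert np.allclose(proj_tangent(Z, D), D)
# The residual Y - D lies in the normal space {Z S : S symmetric}
S = Z.T @ (Y - D)
assert np.allclose(Y - D, Z @ S) and np.allclose(S, S.T)
```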
Appendix B: Proof of Proposition 2
Proof of Proposition 2
To begin with, we assume that \(\left( Z, \{X_i\} \right) \) is a first-order stationary point. Then there exists \(R(Z) \in \partial r(Z)\) such that
and \(Z^{\top }R(Z)\) is symmetric. Let \(\Theta = Z^{\top }R(Z) \in Z^{\top }\partial r(Z)\), \(\Gamma _i = - X_i^{\top }A_i A_i^{\top }X_i\), and
with \(i=1,\dotsc ,d\). Then the matrices \(\Theta \), \(\Gamma _i\) and \(\Lambda _i\) are symmetric and \(\textrm{rank}\left( \Lambda _i\right) \le 2p\). Moreover, we can deduce that
and
Hence, \(\left( Z, \{X_i\} \right) \) satisfies the conditions in (9) under these specific choices of \(\Theta \), \(\Gamma _i\) and \(\Lambda _i\).
Conversely, we now assume that there exist \(R(Z) \in \partial r(Z)\) and symmetric matrices \(\Theta \), \(\Gamma _i\) and \(\Lambda _i\) such that \(\left( Z, \{X_i\} \right) \) satisfies the conditions in (9). It follows from the first and second equality in (9) that
At the same time, since \(X_i X_i^{\top }= Z Z^{\top }\), we have
Combining the above two equalities with the orthogonality of Z, we arrive at
Left-multiplying both sides of the second equality in (9) by \(Z^{\top }\), we obtain that
which together with the symmetry of \(\Lambda _i\) and \(\Theta \) implies that \(Z^{\top }R(Z)\) is also symmetric. This completes the proof. \(\square \)
Appendix C: Proof of Lemma 3
Proof of Lemma 3
Since \(( Z^{(k)}, \{X_i^{(k)}\} )\) is feasible, we know \(X_i^{(k)} (X_i^{(k)})^{\top }{=} Z^{(k)} (Z^{(k)})^{\top }\) for \(i=1,\dotsc ,d\).
Thus, it can be readily verified that
which implies that
According to Theorem 4.1 in [57], the first-order optimality condition of (18) can be stated as:
Since \(D^{(k)} = 0\) is the global minimizer of (18), we have
We obtain the assertion of this lemma. \(\square \)
Appendix D: Convergence of Algorithm 2
Now we prove Theorem 4 to establish the global convergence of Algorithm 2. In addition to the notation introduced in Sect. 1, we adopt the following throughout the theoretical analysis. The notations \(\textrm{rank}\left( C\right) \) and \(\sigma _{\min } \left( C\right) \) represent the rank and the smallest singular value of \(C\), respectively. For \(X, Y \in \mathcal {S}_{n,p}\), we define \({\mathbf {D_p}}\left( X,Y\right) := XX^{\top }- YY^{\top }\) and \({\mathbf {d_p}}\left( X,Y\right) := \left\| {\mathbf {D_p}}\left( X,Y\right) \right\| _{\textrm{F}}\), which stand for the projection distance matrix and its Frobenius norm, respectively.
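As a concrete illustration of this notation (our own sketch, not part of the paper's implementation), the projection distance can be computed directly from its definition; the code below also checks the identity \({\mathbf {d_p}}^2\left( X,Y\right) = 2p - 2\Vert X^{\top }Y\Vert _{\textrm{F}}^2\), valid for any \(X, Y \in \mathcal {S}_{n,p}\), which underlies several of the bounds in this appendix:

```python
import numpy as np

def dp(X, Y):
    """Projection distance d_p(X, Y) = ||X X^T - Y Y^T||_F."""
    return np.linalg.norm(X @ X.T - Y @ Y.T, "fro")

rng = np.random.default_rng(1)
n, p = 6, 2
X, _ = np.linalg.qr(rng.standard_normal((n, p)))
Y, _ = np.linalg.qr(rng.standard_normal((n, p)))

# d_p depends only on the column spaces: right-multiplying X by an
# orthogonal O leaves X X^T unchanged, so the distance is zero.
O, _ = np.linalg.qr(rng.standard_normal((p, p)))
assert np.isclose(dp(X, X @ O), 0.0)

# Identity: d_p^2(X, Y) = 2p - 2 ||X^T Y||_F^2 for orthonormal X, Y
assert np.isclose(dp(X, Y) ** 2,
                  2 * p - 2 * np.linalg.norm(X.T @ Y, "fro") ** 2)
```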
To begin with, we provide a sketch of our proof. Suppose \(\{Z^{(k)}\}\) is the iterate sequence generated by Algorithm 2, with \(X_i^{(k)}\) and \(\Lambda _i^{(k)}\) being the local variable and multiplier of the i-th agent at the k-th iteration, respectively. The proof includes the following main steps.
1. The sequence \(\{Z^{(k)}\}\) is bounded, and the sequence \(\{ \mathcal {L}( Z^{(k)}, \{X_i^{(k)}\}, \{\Lambda _i^{(k)}\} ) \}\) is bounded from below.
2. The sequence \(\{Z^{(k)}\}\) satisfies \({\textbf{d}^2_{\textbf{p}}}\left( Z^{(k+1)},X_i^{(k)}\right) \le 2 ( 1 -\underline{\sigma }^2 )\), where \(\underline{\sigma }\) is a uniform lower bound on the smallest singular values of the matrices \((X_i^{(k)})^{\top }Z^{(k+1)}\) \((i=1,\dotsc ,d)\).
3. The sequence \(\{ \mathcal {L}( Z^{(k)}, \{X_i^{(k)}\}, \{\Lambda _i^{(k)}\} ) \}\) is monotonically non-increasing, and hence convergent.
4. The sequence \(\{Z^{(k)}\}\) has at least one accumulation point, and any accumulation point is a first-order stationary point of the sparse PCA problem (2).
Next we verify all the items in the above sketch by proving the following lemmas and corollaries.
Lemma 5
Suppose \(\{Z^{(k)}\}\) is the iterate sequence generated by Algorithm 2. Let
Then the following relationship holds for any \(k \in \mathbb {N}\),
Proof
Since \(g^{(k)}\) is strongly convex with modulus \(\dfrac{1}{\eta }\), we have
for any \(D, \hat{D} \in \mathbb {R}^{n\times p}\). In particular, if \(\hat{D}, D \in \mathcal {T}_{Z^{(k)}} \mathcal {S}_{n,p}\), it holds that
It follows from the first-order optimality condition of (18) that \(0 \in \textrm{Proj}_{\mathcal {T}_{Z^{(k)}} \mathcal {S}_{n,p}} \left( \partial g^{(k)} (D^{(k)}) \right) \). Finally, taking \(\hat{D} = 0\) and \(D = D^{(k)}\) in (30) yields the assertion of this lemma. \(\square \)
Lemma 6
Suppose \(Z \in \mathcal {S}_{n,p}\) and \(D \in \mathcal {T}_{Z} \mathcal {S}_{n,p}\). Then it holds that
and
Proof
The proof can be found in, for example, [63]. For the sake of completeness, we provide a proof here. It follows from the orthogonality of Z and the skew-symmetry of \(Z^{\top }D\) that \(Z + D\) has full column rank. This yields that \(\textrm{Proj}_{\mathcal {S}_{n,p}} (Z + D) = (Z + D)F^{-1}\), where \(F = (I_p + D^{\top }D)^{1/2}\). Since \(\textrm{Proj}_{\mathcal {S}_{n,p}} (Z + D) - Z = ( Z (I_p - F) + D ) F^{-1}\), we have
where \(\tilde{\sigma }_1 \ge \cdots \ge \tilde{\sigma }_p \ge 0\) are the singular values of D. Similarly, it follows from the relationship \(\textrm{Proj}_{\mathcal {S}_{n,p}} (Z + D) - Z - D = (Z + D) (F^{-1}- I_p)\) that
which completes the proof. \(\square \)
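Here \(\textrm{Proj}_{\mathcal {S}_{n,p}}\) is the polar projection onto the Stiefel manifold, computable from a thin SVD. The sketch below (our own illustration, assuming \(Z + D\) has full column rank as established in the proof) confirms that for a tangent direction D it coincides with \((Z + D) F^{-1}\), \(F = (I_p + D^{\top }D)^{1/2}\), and that, since Z itself is orthonormal, the projection moves \(Z + D\) by at most \(\Vert D\Vert _{\textrm{F}}\):

```python
import numpy as np

def proj_stiefel(W):
    """Polar projection onto the Stiefel manifold: the orthonormal factor
    U V^T from the thin SVD W = U S V^T, i.e. W (W^T W)^{-1/2}."""
    U, _, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ Vt

rng = np.random.default_rng(2)
n, p = 8, 3
Z, _ = np.linalg.qr(rng.standard_normal((n, p)))
Y = rng.standard_normal((n, p))
M = Z.T @ Y
D = Y - Z @ ((M + M.T) / 2)  # tangent direction: Z^T D is skew-symmetric

W = proj_stiefel(Z + D)
assert np.allclose(W.T @ W, np.eye(p))  # result is orthonormal

# Since Z^T D is skew, (Z + D)^T (Z + D) = I_p + D^T D, so the polar
# projection equals (Z + D)(I_p + D^T D)^{-1/2}.
s, Q = np.linalg.eigh(D.T @ D)
F_inv = Q @ np.diag(1.0 / np.sqrt(1.0 + s)) @ Q.T
assert np.allclose(W, (Z + D) @ F_inv)

# The projection is the nearest orthonormal matrix and Z is feasible, so
# ||Proj(Z + D) - (Z + D)||_F <= ||Z - (Z + D)||_F = ||D||_F.
assert np.linalg.norm(W - (Z + D), "fro") <= np.linalg.norm(D, "fro") + 1e-12
```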
Corollary 7
Suppose \(\{Z^{(k)}\}\) is the iterate sequence generated by Algorithm 2 with the parameters satisfying Condition 1. Then for any \(k \in \mathbb {N}\), it holds that
where \(\bar{M} > 0\) is a constant defined in Sect. 4.
Proof
Firstly, it can be readily verified that
Let \(\bar{q}^{(k)} (Z) = \textrm{tr}(Z^{\top }Q^{(k)} Z) / 2\) be the smooth part of the objective function \(q^{(k)} (Z)\) in (16). Since \(\nabla \bar{q}^{(k)}\) is Lipschitz continuous with Lipschitz constant \(\left\| Q^{(k)}\right\| _{\textrm{F}}\), we have
It follows from Lemma 6 that
and
Combining the above three inequalities, we obtain that
It follows from Lemma 5 that
which implies that
This together with the Lipschitz continuity of r(Z) yields that
Here, \(\bar{M} > 0\) is a constant defined in Sect. 4. According to Condition 1, we know that \(\bar{M} - 1/\eta \le -\bar{M}\). Hence, we finally arrive at
This completes the proof. \(\square \)
Lemma 8
Suppose \(\{Z^{(k)}\}\) is the iterate sequence generated by Algorithm 2 with the parameters satisfying Condition 1. Then for any \(k \in \mathbb {N}\), it can be verified that
where \(\rho \ge 1\) is a constant defined in Sect. 4.
Proof
The inequality (31) directly results in the following relationship.
According to the definition of \(q^{(k)}\), it follows that
By straightforward calculations, we can deduce that
and
The above three inequalities yield that
which further implies that
This completes the proof. \(\square \)
Lemma 9
Suppose \(Z^{(k+1)}\) is the \((k+1)\)-th iterate generated by Algorithm 2 and satisfies the following condition:
where \(\underline{\sigma } \in (0,1)\) is a constant defined in Condition 1. Let the algorithm parameters satisfy Conditions 1 and 2. Then for any \(i=1,\dotsc ,d\), it holds that
and
Proof
It follows from Condition 2 that \(\beta _i > c_i^{\prime } \left\| A_i\right\| _2^2\), which together with (25) yields that
Moreover, it can be checked that
Suppose \(\hat{\sigma }_1, \dotsc , \hat{\sigma }_p\) are the singular values of \((X_i^{(k)})^{\top }Z^{(k+1)}\). It is clear that \(0 \le \hat{\sigma }_j \le 1\) for each \(j = 1, \dotsc , p\), due to the orthogonality of \(X_i^{(k)}\) and \(Z^{(k+1)}\). On the one hand, we have
On the other hand, it follows from \({\textbf{d}^2_{\textbf{p}}}\left( Z^{(k+1)},X_i^{(k)}\right) \le 2 \left( 1 -\underline{\sigma }^2\right) \) that
Let \(Y_i^{(k)} = (X_i^{(k)})^{\top }Z^{(k+1)}(Z^{(k+1)})^{\top }X_i^{(k)}\). By straightforward calculations, we can derive that
Combining (35), (36) and (37), we obtain the assertion (33). Then it follows from the definition of \(h_i^{(k)}\) that
By straightforward calculations, we can obtain that
and
The above three relationships yield (34). We complete the proof. \(\square \)
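The step from a projection-distance bound to a singular-value bound used in this proof can be checked numerically: since \({\mathbf {d_p}}^2\left( Z,X\right) = 2p - 2\Vert X^{\top }Z\Vert _{\textrm{F}}^2\) and every singular value of \(X^{\top }Z\) lies in \([0, 1]\), the bound \({\mathbf {d_p}}^2\left( Z,X\right) \le 2(1 - \underline{\sigma }^2)\) forces \(\sigma _{\min }(X^{\top }Z) \ge \underline{\sigma }\). A small numpy sketch of this implication (our own illustration; `sigma_lb` stands in for \(\underline{\sigma }\)):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, sigma_lb = 10, 3, 0.9  # sigma_lb plays the role of underline-sigma

for _ in range(100):
    X, _ = np.linalg.qr(rng.standard_normal((n, p)))
    # A nearby orthonormal Z, obtained by perturbing and re-orthonormalizing
    Z, _ = np.linalg.qr(X + 0.05 * rng.standard_normal((n, p)))
    dp2 = np.linalg.norm(X @ X.T - Z @ Z.T, "fro") ** 2
    # Identity used in the proof: d_p^2(Z, X) = 2p - 2 ||X^T Z||_F^2
    assert np.isclose(dp2, 2 * p - 2 * np.linalg.norm(X.T @ Z, "fro") ** 2)
    if dp2 <= 2 * (1 - sigma_lb**2):
        # Small projection distance implies the singular-value lower bound
        smin = np.linalg.svd(X.T @ Z, compute_uv=False).min()
        assert smin >= sigma_lb - 1e-8
```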
Lemma 10
Suppose \(\{Z^{(k)}\}\) is the iterate sequence generated by Algorithm 2 initiated from \(Z^{(0)} \in \mathcal {S}_{n,p}\) with the parameters satisfying Conditions 1 and 2. Then for any \(i=1,\dotsc ,d\) and \(k \in \mathbb {N}\), it holds that
Proof
We use mathematical induction to prove this lemma. To begin with, it follows from the inequality (32) that
under the relationship \(\beta _i > 4 (\sqrt{p} \left\| A\right\| ^2_{\textrm{F}}+ \mu n p ) / ( 1 - \underline{\sigma }^2 )\) in Condition 2. Thus, the argument (38) directly holds for \(( Z^{(1)}, \{X_i^{(0)}\} )\). Now, we assume the argument holds at \(( Z^{(k+1)}, \{X_i^{(k)}\} )\), and investigate the situation at \(( Z^{(k+2)}, \{X_i^{(k+1)}\} )\).
According to Condition 2, we have \(12 \sqrt{p} \left\| A_i\right\| ^2_{\textrm{F}}/\beta _i < 2\left( 1 - \underline{\sigma }^2\right) c_i \underline{\sigma }^2\).
Since we assume that \({\textbf{d}^2_{\textbf{p}}}\left( Z^{(k+1)},X_i^{(k)}\right) \le 2 \left( 1 -\underline{\sigma }^2\right) \), it follows from the relationship (34) that
which implies that \(\sigma _{\min } \left( (X_i^{(k+1)})^{\top }Z^{(k+1)} \right) \ge \underline{\sigma }\). Similar to the proof of Lemma 9, we can obtain that
Combining the condition (26) and the equality (36), we have
On the other hand, it follows from the triangle inequality that
Combining this with the inequality (39), it can be verified that
Moreover, according to Lemma B.4 in [48], we have
Combining the above three inequalities, we further obtain that
Together with (40), this yields that
According to Conditions 1 and 2, we have \(\sqrt{2} \underline{\sigma } \beta _i - 8 \left\| A_i\right\| _2^2 > 0\) and \(\underline{\sigma } - 2\sqrt{\rho d} \delta _i > 0\). Thus, it can be verified that
where the last inequality follows from the relationship \(\beta _i > \dfrac{4 \left( 2 \sqrt{\rho d} + \sqrt{2}\right) \left\| A_i\right\| _2^2}{\underline{\sigma } - 2 \sqrt{\rho d} \delta _i}\) in Condition 2. This together with (32) and (38) yields that
since we assume that \(\beta _i > 8 ( \sqrt{p} \left\| A\right\| ^2_{\textrm{F}}+ \mu n p ) / ( 1 - \underline{\sigma }^2 )\) in Condition 2. The proof is completed. \(\square \)
Corollary 11
Suppose \(\{Z^{(k)}\}\) is the iterate sequence generated by Algorithm 2 initiated from \(Z^{(0)} \in \mathcal {S}_{n,p}\), and the problem parameters satisfy Conditions 1 and 2. Then for any \(k \in \mathbb {N}\), we can obtain that
Proof
This corollary directly follows from Lemma 9 and Lemma 10. \(\square \)
Corollary 12
Suppose \(\{Z^{(k)}\}\) is the iterate sequence generated by Algorithm 2 initiated from \(Z^{(0)} \in \mathcal {S}_{n,p}\), and the problem parameters satisfy Conditions 1 and 2. Then for any \(k \in \mathbb {N}\), it holds that
Proof
According to the Cauchy-Schwarz inequality, we can show that
where the last inequality follows from Lemma B.4 in [48] and (41). In addition, we have
which implies that
Combining this with the fact that
we complete the proof. \(\square \)
Based on these lemmas and corollaries, we can now demonstrate that the sequence \(\left\{ \mathcal {L}( Z^{(k)}, \{X_i^{(k)}\}, \{\Lambda _i^{(k)}\} ) \right\} \) is monotonically non-increasing, which results in the global convergence of our algorithm.
Proposition 13
Suppose \(\{Z^{(k)}\}\) is the iterate sequence generated by Algorithm 2 initiated from \(Z^{(0)} \in \mathcal {S}_{n,p}\), and the problem parameters satisfy Conditions 1 and 2. Then the sequence of augmented Lagrangian function values \(\{ \mathcal {L}( Z^{(k)}, \{X_i^{(k)}\}, \{\Lambda _i^{(k)}\} ) \}\) is monotonically non-increasing, and for any \(k \in \mathbb {N}\), it satisfies the following sufficient descent property:
where \(J_i = \dfrac{1}{2} \rho d \underline{\sigma }^2 c_i \beta _i - 2 (\sqrt{2 \rho d} + 1) \left\| A_i\right\| _2^2 > 0\) is a constant.
Proof
Combining Corollary 7, Corollary 11, and Corollary 12, we obtain that
Recalling the relationship \(\beta _i > 4 ( \sqrt{2 \rho d} + 1) \left\| A_i\right\| _2^2 / (\rho d \underline{\sigma }^2 c_i)\) in Condition 2, we can conclude that \(\mathcal {L}( Z^{(k)}, \{X_i^{(k)}\}, \{\Lambda _i^{(k)}\} ) \ge \mathcal {L}( Z^{(k+1)}, \{X_i^{(k+1)}\}, \{\Lambda _i^{(k+1)}\} )\). Hence, the sequence \(\{\mathcal {L}( Z^{(k)}, \{X_i^{(k)}\}, \{\Lambda _i^{(k)}\} )\}\) is monotonically non-increasing. Finally, the above relationship together with (41) yields the assertion (42). The proof is finished. \(\square \)
Based on the above properties, we are ready to prove Theorem 4, which establishes the global convergence rate of our proposed algorithm.
Proof (Proof of Theorem 4)
The whole sequence \(\{ Z^{(k)}, \{X_i^{(k)}\} \}\) is naturally bounded, since each \(X_i^{(k)}\) and \(Z^{(k)}\) has orthonormal columns. It then follows from the Bolzano-Weierstrass theorem that this sequence has an accumulation point \(\{Z^{*}, \{X_i^{*}\}\}\), where \(Z^{*} \in \mathcal {S}_{n,p}\) and \(X_i^{*} \in \mathcal {S}_{n,p}\). Moreover, the boundedness of \(\{\Lambda _i^{(k)}\}\) results from the multiplier updating formula (15). Hence, the lower boundedness of \(\{ \mathcal {L}( Z^{(k)}, \{ X_i^{(k)}\}, \{\Lambda _i^{(k)}\} ) \}\) follows from the continuity of the augmented Lagrangian function. Namely, there exists a constant \(\underline{L}\) such that \(\mathcal {L}( Z^{(k)}, \{X_i^{(k)}\}, \{\Lambda _i^{(k)}\} ) \ge \underline{L}\) for all \(k \in \mathbb {N}\).
It follows from the sufficient descent property (42) that
and
Upon taking the limit as \(K \rightarrow \infty \), we obtain that
which further implies that
respectively. Combining this with Lemma 3, we know that any accumulation point \(Z^{*}\) of the sequence \(\{Z^{(k)}\}\) is a first-order stationary point of problem (2).
Eventually, we prove the sublinear convergence rate. Indeed, it follows from the inequalities (43) and (44) that
where
is a positive constant. This completes the proof. \(\square \)
Cite this article
Wang, L., Liu, X. & Zhang, Y. A communication-efficient and privacy-aware distributed algorithm for sparse PCA. Comput Optim Appl 85, 1033–1072 (2023). https://doi.org/10.1007/s10589-023-00481-4