
A communication-efficient and privacy-aware distributed algorithm for sparse PCA


Abstract

Sparse principal component analysis (PCA) improves interpretability of the classic PCA by introducing sparsity into the dimension-reduction process. Optimization models for sparse PCA, however, are generally non-convex, non-smooth and more difficult to solve, especially on large-scale datasets requiring distributed computation over a wide network. In this paper, we develop a distributed and centralized algorithm called DSSAL1 for sparse PCA that aims to achieve low communication overheads by adapting a newly proposed subspace-splitting strategy to accelerate convergence. Theoretically, convergence to stationary points is established for DSSAL1. Extensive numerical results show that DSSAL1 requires far fewer rounds of communication than state-of-the-art peer methods. In addition, we make the case that since messages exchanged in DSSAL1 are well-masked, the possibility of private-data leakage in DSSAL1 is much lower than in some other distributed algorithms.


Data availability

The authors declare that all data supporting the findings of this study are available within the article.

Notes

  1. Suppose \(D^{(k)}\) is the solution to (4). According to Lemma 5.3 in [41], \(X^{(k)}\) is a first-order stationary point if \(D^{(k)} = 0\). Therefore, the stationarity violation is defined as \(\Vert D^{(k)} \Vert _{\textrm{F}}\).

  2. A function f(X) is called orthogonal-transformation invariant if \(f (XO) = f(X)\) for any \(X \in \mathcal {S}_{n,p}\) and \(O \in \mathcal {S}_{p, p}\).

  3. More information at http://lsec.cc.ac.cn/chinese/lsec/LSSC-IVintroduction.pdf.

  4. Our code is downloadable from http://lsec.cc.ac.cn/~liuxin/code.html.

  5. Available from https://eigen.tuxfamily.org/index.php?title=Main_Page.

References

  1. Sjostrand, K., Rostrup, E., Ryberg, C., Larsen, R., Studholme, C., Baezner, H., Ferro, J., Fazekas, F., Pantoni, L., Inzitari, D., et al.: Sparse decomposition and modeling of anatomical shape variation. IEEE Trans. Med. Imaging 26(12), 1625–1635 (2007). https://doi.org/10.1109/TMI.2007.898808

  2. Chen, G., Sullivan, P.F., Kosorok, M.R.: Biclustering with heterogeneous variance. Proc. Natl. Acad. Sci. 110(30), 12253–12258 (2013). https://doi.org/10.1073/pnas.1304376110

  3. Johnstone, I.M., Lu, A.Y.: On consistency and sparsity for principal components analysis in high dimensions. J. Am. Stat. Assoc. 104(486), 682–693 (2009)

  4. Zou, H., Xue, L.: A selective overview of sparse principal component analysis. Proc. IEEE 106(8), 1311–1320 (2018). https://doi.org/10.1109/JPROC.2018.2846588

  5. Wu, T., Hu, E., Xu, S., Chen, M., Guo, P., Dai, Z., Feng, T., Zhou, L., Tang, W., Zhan, L., et al.: ClusterProfiler 4.0: A universal enrichment tool for interpreting omics data. The Innovation 2(3), 100141 (2021). https://doi.org/10.1016/j.xinn.2021.100141

  6. Gravuer, K., Sullivan, J.J., Williams, P.A., Duncan, R.P.: Strong human association with plant invasion success for Trifolium introductions to New Zealand. Proc. Natl. Acad. Sci. 105(17), 6344–6349 (2008). https://doi.org/10.1073/pnas.0712026105

  7. Baden, T., Berens, P., Franke, K., Rosón, M.R., Bethge, M., Euler, T.: The functional diversity of retinal ganglion cells in the mouse. Nature 529(7586), 345–350 (2016). https://doi.org/10.1038/nature16468

  8. Stiefel, E.: Richtungsfelder und fernparallelismus in n-dimensionalen mannigfaltigkeiten. Commentarii Mathematici Helvetici 8(1), 305–353 (1935). https://doi.org/10.3929/ethz-a-000092403

  9. Jolliffe, I.T., Trendafilov, N.T., Uddin, M.: A modified principal component technique based on the LASSO. J. Comput. Graph. Stat. 12(3), 531–547 (2003). https://doi.org/10.1198/1061860032148

  10. Magdon-Ismail, M.: NP-hardness and inapproximability of sparse PCA. Inf. Process. Lett. 126, 35–38 (2017). https://doi.org/10.1016/j.ipl.2017.05.008

  11. Zou, H., Hastie, T., Tibshirani, R.: Sparse principal component analysis. J. Comput. Graph. Stat. 15(2), 265–286 (2006). https://doi.org/10.1198/106186006X113430

  12. d’Aspremont, A., Bach, F., El Ghaoui, L.: Optimal solutions for sparse principal component analysis. J. Mach. Learn. Res. 9(42), 1269–1294 (2008)

  13. d’Aspremont, A., El Ghaoui, L., Jordan, M.I., Lanckriet, G.R.: A direct formulation for sparse PCA using semidefinite programming. SIAM Rev. 49(3), 434–448 (2007). https://doi.org/10.1137/050645506

  14. Shen, H., Huang, J.Z.: Sparse principal component analysis via regularized low rank matrix approximation. J. Multivar. Anal. 99(6), 1015–1034 (2008). https://doi.org/10.1016/j.jmva.2007.06.007

  15. Witten, D.M., Tibshirani, R., Hastie, T.: A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics 10(3), 515–534 (2009). https://doi.org/10.1093/biostatistics/kxp008

  16. Pacheco, P.S.: An Introduction to Parallel Programming. Elsevier, USA (2011). https://doi.org/10.1016/C2009-0-18471-4

  17. McMahan, B., Moore, E., Ramage, D., Hampson, S., Arcas, B.A.y.: Communication-efficient learning of deep networks from decentralized data. In: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, vol. 54, pp. 1273–1282 (2017). PMLR. https://proceedings.mlr.press/v54/mcmahan17a.html

  18. Lou, Y., Yu, L., Wang, S., Yi, P.: Privacy preservation in distributed subgradient optimization algorithms. IEEE Trans. Cybernetics 48(7), 2154–2165 (2017). https://doi.org/10.1109/TCYB.2017.2728644

  19. Zhang, C., Ahmad, M., Wang, Y.: ADMM based privacy-preserving decentralized optimization. IEEE Trans. Inf. Forensics Secur. 14(3), 565–580 (2018). https://doi.org/10.1109/TIFS.2018.2855169

  20. Manton, J.H.: Optimization algorithms exploiting unitary constraints. IEEE Trans. Signal Process. 50(3), 635–650 (2002). https://doi.org/10.1109/78.984753

  21. Nishimori, Y., Akaho, S.: Learning algorithms utilizing quasi-geodesic flows on the Stiefel manifold. Neurocomputing 67, 106–135 (2005). https://doi.org/10.1016/j.neucom.2004.11.035

  22. Abrudan, T.E., Eriksson, J., Koivunen, V.: Steepest descent algorithms for optimization under unitary matrix constraint. IEEE Trans. Signal Process. 56(3), 1134–1147 (2008). https://doi.org/10.1109/tsp.2007.908999

  23. Absil, P.-A., Mahony, R., Sepulchre, R.: Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton (2008). https://doi.org/10.1515/9781400830244

  24. Edelman, A., Arias, T.A., Smith, S.T.: The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl. 20(2), 303–353 (1998). https://doi.org/10.1137/S0895479895290954

  25. Sato, H.: A Dai-Yuan-type Riemannian conjugate gradient method with the weak Wolfe conditions. Comput. Optim. Appl. 64(1), 101–118 (2016). https://doi.org/10.1007/s10589-015-9801-1

  26. Zhu, X.: A Riemannian conjugate gradient method for optimization on the Stiefel manifold. Comput. Optim. Appl. 67(1), 73–110 (2017). https://doi.org/10.1007/s10589-016-9883-4

  27. Wen, Z., Yin, W.: A feasible method for optimization with orthogonality constraints. Math. Program. 142(1–2), 397–434 (2013). https://doi.org/10.1007/s10107-012-0584-1

  28. Jiang, B., Dai, Y.-H.: A framework of constraint preserving update schemes for optimization on Stiefel manifold. Math. Program. 153(2), 535–575 (2015). https://doi.org/10.1007/s10107-014-0816-7

  29. Hu, J., Milzarek, A., Wen, Z., Yuan, Y.: Adaptive quadratically regularized Newton method for Riemannian optimization. SIAM J. Matrix Anal. Appl. 39(3), 1181–1207 (2018). https://doi.org/10.1137/17M1142478

  30. Hu, J., Jiang, B., Lin, L., Wen, Z., Yuan, Y.-X.: Structured quasi-Newton methods for optimization with orthogonality constraints. SIAM J. Sci. Comput. 41(4), 2239–2269 (2019). https://doi.org/10.1137/18M121112X

  31. Absil, P.-A., Baker, C.G., Gallivan, K.A.: Trust-region methods on Riemannian manifolds. Found. Comput. Math. 7(3), 303–330 (2006). https://doi.org/10.1007/s10208-005-0179-9

  32. Gao, B., Liu, X., Chen, X., Yuan, Y.-X.: A new first-order algorithmic framework for optimization problems with orthogonality constraints. SIAM J. Optim. 28(1), 302–332 (2018). https://doi.org/10.1137/16M1098759

  33. Wang, L., Gao, B., Liu, X.: Multipliers correction methods for optimization problems over the Stiefel manifold. CSIAM Trans. Appl. Math. 2(3), 508–531 (2021). https://doi.org/10.4208/csiam-am.SO-2020-0008

  34. Gao, B., Liu, X., Yuan, Y.-X.: Parallelizable algorithms for optimization problems with orthogonality constraints. SIAM J. Sci. Comput. 41(3), 1949–1983 (2019). https://doi.org/10.1137/18m1221679

  35. Xiao, N., Liu, X., Yuan, Y.-X.: A class of smooth exact penalty function methods for optimization problems with orthogonality constraints. Optim. Methods Softw. (2020). https://doi.org/10.1080/10556788.2020.1852236

  36. Ferreira, O., Oliveira, P.: Subgradient algorithm on Riemannian manifolds. J. Optim. Theory Appl. 97(1), 93–104 (1998). https://doi.org/10.1023/A:1022675100677

  37. Ferreira, O.P., Louzeiro, M.S., Prudente, L.F.: Iteration-complexity of the subgradient method on Riemannian manifolds with lower bounded curvature. Optimization 68(4), 713–729 (2019). https://doi.org/10.1080/02331934.2018.1542532

  38. Bacák, M., Bergmann, R., Steidl, G., Weinmann, A.: A second order nonsmooth variational model for restoring manifold-valued images. SIAM J. Sci. Comput. 38(1), 567–597 (2016). https://doi.org/10.1137/15M101988X

  39. Grohs, P., Hosseini, S.: Nonsmooth trust region algorithms for locally Lipschitz functions on Riemannian manifolds. IMA J. Numer. Anal. 36(3), 1167–1192 (2016). https://doi.org/10.1093/imanum/drv043

  40. Hosseini, S., Uschmajew, A.: A Riemannian gradient sampling algorithm for non smooth optimization on manifolds. SIAM J. Optim. 27(1), 173–189 (2017). https://doi.org/10.1137/16M1069298

  41. Chen, S., Ma, S., Man-Cho So, A., Zhang, T.: Proximal gradient method for non smooth optimization over the Stiefel manifold. SIAM J. Optim. 30(1), 210–239 (2020). https://doi.org/10.1137/18M122457X

  42. Huang, W., Wei, K.: Riemannian proximal gradient methods. Math. Program. (2021). https://doi.org/10.1007/s10107-021-01632-3

  43. Xiao, X., Li, Y., Wen, Z., Zhang, L.: A regularized semi-smooth Newton method with projection steps for composite convex programs. J. Sci. Comput. 76(1), 364–389 (2018)

  44. Lai, R., Osher, S.: A splitting method for orthogonality constrained problems. J. Sci. Comput. 58(2), 431–449 (2014). https://doi.org/10.1007/s10915-013-9740-x

  45. Kovnatsky, A., Glashoff, K., Bronstein, M.M.: MADMM: A generic algorithm for non-smooth optimization on manifolds. In: European Conference on Computer Vision, pp. 680–696. Springer (2016)

  46. Chen, W., Ji, H., You, Y.: An augmented lagrangian method for \(\ell _{1}\)-regularized optimization problems with orthogonality constraints. SIAM J. Sci. Comput. 38(4), 570–592 (2016). https://doi.org/10.1137/140988875

  47. Hajinezhad, D., Hong, M.: Nonconvex alternating direction method of multipliers for distributed sparse principal component analysis. In: 2015 IEEE Global Conference on Signal and Information Processing, pp. 255–259 (2015). https://doi.org/10.1109/GlobalSIP.2015.7418196

  48. Wang, L., Liu, X., Zhang, Y.: A distributed and secure algorithm for computing dominant SVD based on projection splitting. arXiv:2012.03461 (2020)

  49. Gemp, I., McWilliams, B., Vernade, C., Graepel, T.: EigenGame: PCA as a Nash equilibrium. arXiv:2010.00554 (2020)

  50. Gang, A., Bajwa, W.U.: A linearly convergent algorithm for distributed principal component analysis. arXiv:2101.01300 (2021)

  51. Gang, A., Bajwa, W.U.: FAST-PCA: A fast and exact algorithm for distributed principal component analysis. arXiv:2108.12373 (2021)

  52. Andrade, F.L., Figueiredo, M.A., Xavier, J.: Distributed Picard iteration: application to distributed EM and distributed PCA. arXiv:2106.10665 (2021)

  53. Ye, H., Zhang, T.: DeEPCA: decentralized exact PCA with linear convergence rate. J. Mach. Learn. Res. 22(238), 1–27 (2021)

  54. Chen, S., Garcia, A., Hong, M., Shahrampour, S.: Decentralized Riemannian gradient descent on the Stiefel manifold. In: Proceedings of the 38th International Conference on Machine Learning, vol. 139, pp. 1594–1605 (2021). https://proceedings.mlr.press/v139/chen21g.html

  55. Wang, L., Liu, X.: Decentralized optimization over the Stiefel manifold by an approximate augmented Lagrangian function. IEEE Trans. Signal Process. 70, 3029–3041 (2022). https://doi.org/10.1109/TSP.2022.3182883

  56. Clarke, F.H.: Optimization and Nonsmooth Analysis. SIAM, Philadelphia (1990)

  57. Yang, W.H., Zhang, L.-H., Song, R.: Optimality conditions for the nonlinear programming problems on Riemannian manifolds. Pacific J. Optim. 10(2), 415–434 (2014)

  58. Arrow, K.J., Azawa, H., Hurwicz, L., Uzawa, H.: Studies in Linear and Non-linear Programming, vol. 2. Stanford University Press (1958)

  59. He, B., You, Y., Yuan, X.: On the convergence of primal-dual hybrid gradient algorithm. SIAM J. Imag. Sci. 7(4), 2526–2537 (2014). https://doi.org/10.1137/140963467

  60. Xiao, N., Liu, X., Yuan, Y.-x.: A penalty-free infeasible approach for a class of nonsmooth optimization problems over the Stiefel manifold. arXiv:2103.03514 (2021)

  61. Rutishauser, H.: Simultaneous iteration method for symmetric matrices. Numer. Math. 16(3), 205–223 (1970). https://doi.org/10.1007/BF02219773

  62. Bento, G.C., Ferreira, O.P., Melo, J.G.: Iteration-complexity of gradient, sub gradient and proximal point methods on Riemannian manifolds. J. Optim. Theory Appl. 173(2), 548–562 (2017). https://doi.org/10.1007/s10957-017-1093-4

  63. Jiang, B., Ma, S., So, A.M.-C., Zhang, S.: Vector transport-free SVRG with general retraction for Riemannian optimization: complexity analysis and practical implementation. arXiv:1705.09059 (2017)


Funding

The work of the first author was supported by the National Key R&D Program of China (No. 2020YFA0711900, 2020YFA0711904). The work of the second author was supported in part by the National Natural Science Foundation of China (No. 12125108, 11971466, 12288201, 12021001, 11991021) and Key Research Program of Frontier Sciences, Chinese Academy of Sciences (No. ZDBS-LY-7022). The work of the third author was supported in part by the Shenzhen Science and Technology Program (No. GXWD20201231105722002-20200901175001001).

Author information

Corresponding author

Correspondence to Xin Liu.

Ethics declarations

Competing Interests

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Proof of Lemma 1

Proof of Lemma 1

According to the definition of \(\textrm{Proj}_{\mathcal {T}_{Z} \mathcal {S}_{n,p}} \left( \cdot \right) \), it follows that

$$\begin{aligned} \left\| \textrm{Proj}_{\mathcal {T}_{Z} \mathcal {S}_{n,p}} \left( -A A^{\top }Z + R(Z)\right) \right\| ^2_{\textrm{F}} ={}& \dfrac{1}{4}\left\| Z^{\top }\left( -A A^{\top }Z + R(Z)\right) - \left( -A A^{\top }Z + R(Z)\right) ^{\top }Z\right\| ^2_{\textrm{F}} \\ &+ \left\| {\textbf{P}}^{\perp }_{Z} \left( -A A^{\top }Z + R(Z)\right) \right\| ^2_{\textrm{F}}\\ ={}& \left\| {\textbf{P}}^{\perp }_{Z} \left( -A A^{\top }Z + R(Z)\right) \right\| ^2_{\textrm{F}}+ \dfrac{1}{4}\left\| Z^{\top }R(Z) - R(Z)^{\top }Z\right\| ^2_{\textrm{F}}, \end{aligned}$$

where \(R(Z) \in \partial r(Z)\), and the second equality holds because \(Z^{\top }A A^{\top }Z\) is symmetric. The proof is completed.\(\square \)
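For reference, the form of the tangent-space projection assumed in the computation above is the standard one for the Stiefel manifold; we state it here for convenience (this explicit form is our reading of the definition invoked above, not a verbatim quotation):

$$\begin{aligned} \textrm{Proj}_{\mathcal {T}_{Z} \mathcal {S}_{n,p}} \left( Y\right) = {\textbf{P}}^{\perp }_{Z} Y + \frac{1}{2} Z \left( Z^{\top }Y - Y^{\top }Z \right) , \qquad Y \in \mathbb {R}^{n \times p}. \end{aligned}$$

The two terms are orthogonal in the Frobenius inner product, so that

$$\begin{aligned} \left\| \textrm{Proj}_{\mathcal {T}_{Z} \mathcal {S}_{n,p}} \left( Y\right) \right\| ^2_{\textrm{F}} = \left\| {\textbf{P}}^{\perp }_{Z} Y \right\| ^2_{\textrm{F}} + \frac{1}{4} \left\| Z^{\top }Y - Y^{\top }Z \right\| ^2_{\textrm{F}}. \end{aligned}$$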

Appendix B: Proof of Proposition 2

Proof of Proposition 2

To begin with, we assume that \(\left( Z, \{X_i\} \right) \) is a first-order stationary point. Then there exists \(R(Z) \in \partial r(Z)\) such that

$$\begin{aligned} {\textbf{P}}^{\perp }_{Z} \left( - A A^{\top }Z + R(Z)\right) = 0, \end{aligned}$$

and \(Z^{\top }R(Z)\) is symmetric. Let \(\Theta = Z^{\top }R(Z) \in Z^{\top }\partial r(Z)\), \(\Gamma _i = - X_i^{\top }A_i A_i^{\top }X_i\), and

$$\begin{aligned} \Lambda _i = - {{\textbf {P}}}^{\perp }_{X_i} A_i A_i^{\top }X_i X_i^{\top }- X_i X_i^{\top }A_i A_i^{\top }{{\textbf {P}}}^{\perp }_{X_i}, \end{aligned}$$

with \(i=1,\dotsc ,d\). Then the matrices \(\Theta \), \(\Gamma _i\) and \(\Lambda _i\) are symmetric and \(\textrm{rank}\left( \Lambda _i\right) \le 2p\). Moreover, we can deduce that

$$\begin{aligned} A_i A_i^{\top }X_i + X_i \Gamma _i + \Lambda _i X_i =A_i A_i^{\top }X_i - X_iX_i^{\top }A_i A_i^{\top }X_i - {\textbf{P}}^{\perp }_{X_i} A_i A_i^{\top }X_i = 0, \end{aligned}$$

and

$$\begin{aligned} \begin{aligned} R(Z) + \sum \limits _{i=1}^d\Lambda _i Z - Z \Theta = {}&R(Z) - \sum \limits _{i=1}^d{\textbf{P}}^{\perp }_{Z} A_i A_i^{\top }Z - ZZ^{\top }R(Z) \\ = {}&{\textbf{P}}^{\perp }_{Z} \left( - A A^{\top }Z + R(Z)\right) = 0. \end{aligned} \end{aligned}$$

Hence, \(\left( Z, \{X_i\} \right) \) satisfies the conditions in (9) under these specific choices of \(\Theta \), \(\Gamma _i\) and \(\Lambda _i\).
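For readability, the conditions in (9), as they are used in both directions of this proof, amount to the following system (a reconstruction from the steps above, not a verbatim restatement of (9) in the main text):

$$\begin{aligned} & A_i A_i^{\top }X_i + X_i \Gamma _i + \Lambda _i X_i = 0, \quad i = 1, \dotsc , d, \\ & R(Z) + \sum \limits _{i=1}^d\Lambda _i Z - Z \Theta = 0 \quad \text {for some } R(Z) \in \partial r(Z), \\ & X_i X_i^{\top }= Z Z^{\top }, \quad Z \in \mathcal {S}_{n,p}, \ X_i \in \mathcal {S}_{n,p}, \quad i = 1, \dotsc , d, \end{aligned}$$

with symmetric multipliers \(\Theta , \Gamma _i \in \mathbb {R}^{p \times p}\) and \(\Lambda _i \in \mathbb {R}^{n \times n}\).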

Conversely, we now assume that there exist \(R(Z) \in \partial r(Z)\) and symmetric matrices \(\Theta \), \(\Gamma _i\) and \(\Lambda _i\) such that \(\left( Z, \{X_i\} \right) \) satisfies the conditions in (9). It follows from the first and second equality in (9) that

$$\begin{aligned} \begin{aligned}&\sum \limits _{i=1}^d{\textbf{P}}^{\perp }_{X_i} A_i A_i^{\top }X_i X_i^{\top }= - \sum \limits _{i=1}^d{\textbf{P}}^{\perp }_{X_i} \left( X_i\Gamma _i + \Lambda _i X_i \right) X_i^{\top }= - \sum \limits _{i=1}^d{\textbf{P}}^{\perp }_{X_i} \Lambda _i X_i X_i^{\top }\\&= - {\textbf{P}}^{\perp }_{Z} \left( \sum \limits _{i=1}^d\Lambda _i Z \right) Z^{\top }= {\textbf{P}}^{\perp }_{Z}\left( R(Z) - Z\Theta \right) Z^{\top }= {\textbf{P}}^{\perp }_{Z} R(Z) Z^{\top }. \end{aligned} \end{aligned}$$

At the same time, since \(X_i X_i^{\top }= Z Z^{\top }\), we have

$$\begin{aligned} \sum \limits _{i=1}^d{\textbf{P}}^{\perp }_{X_i} A_i A_i^{\top }X_i X_i^{\top }={}&\sum \limits _{i=1}^d{\textbf{P}}^{\perp }_{Z} A_i A_i^{\top }Z Z^{\top }= {\textbf{P}}^{\perp }_{Z} A A^{\top }Z Z^{\top }. \end{aligned}$$

Combining the above two equalities and the orthogonality of Z, we arrive at

$$\begin{aligned} {\textbf{P}}^{\perp }_{Z} \left( - A A^{\top }Z + R(Z)\right) = 0. \end{aligned}$$

Left-multiplying both sides of the second equality in (9) by \(Z^{\top }\), we obtain that

$$\begin{aligned} Z^{\top }R(Z) = \Theta - \sum \limits _{i=1}^dZ^{\top }\Lambda _i Z, \end{aligned}$$

which together with the symmetry of \(\Lambda _i\) and \(\Theta \) implies that \(Z^{\top }R(Z)\) is also symmetric. This completes the proof. \(\square \)

Appendix C: Proof of Lemma 3

Proof of Lemma 3

Since \(( Z^{(k)}, \{X_i^{(k)}\} )\) is feasible, we know \(X_i^{(k)} (X_i^{(k)})^{\top }{=} Z^{(k)} (Z^{(k)})^{\top }\) for \(i=1,\dotsc ,d\).

Thus, it can be readily verified that

$$\begin{aligned} \begin{aligned} Q^{(k)} Z^{(k)} ={}& \sum \limits _{i=1}^d\left( \Lambda _i^{(k)} - \beta _i X_i^{(k)} (X_i^{(k)})^{\top }\right) Z^{(k)} \\ ={}& \sum \limits _{i=1}^d\left( - {\textbf{P}}^{\perp }_{Z^{(k)}} A_i A_i^{\top }Z^{(k)} (Z^{(k)})^{\top }- \beta _i Z^{(k)} (Z^{(k)})^{\top }\right) Z^{(k)} \\ ={}& - {\textbf{P}}^{\perp }_{Z^{(k)}} A A^{\top }Z^{(k)} - \left( \sum \limits _{i=1}^d\beta _i \right) Z^{(k)}, \end{aligned} \end{aligned}$$

which implies that

$$\begin{aligned} \textrm{Proj}_{\mathcal {T}_{Z^{(k)}} \mathcal {S}_{n,p}} \left( Q^{(k)}Z^{(k)}\right) = \textrm{Proj}_{\mathcal {T}_{Z^{(k)}} \mathcal {S}_{n,p}} \left( - A A^{\top }Z^{(k)}\right) . \end{aligned}$$

According to Theorem 4.1 in [57], the first-order optimality condition of (18) can be stated as:

$$\begin{aligned} 0 \in \textrm{Proj}_{\mathcal {T}_{Z^{(k)}} \mathcal {S}_{n,p}} \left( Q^{(k)} Z^{(k)} + \dfrac{1}{\eta } D^{(k)} + \partial r(Z^{(k)} + D^{(k)})\right) . \end{aligned}$$

Since \(D^{(k)} = 0\) is the global minimizer of (18), we have

$$\begin{aligned} \begin{aligned} 0 \in {}&\textrm{Proj}_{\mathcal {T}_{Z^{(k)}} \mathcal {S}_{n,p}} \left( Q^{(k)} Z^{(k)} + \partial r(Z^{(k)})\right) \\ = {}&\textrm{Proj}_{\mathcal {T}_{Z^{(k)}} \mathcal {S}_{n,p}} \left( - A A^{\top }Z^{(k)} + \partial r(Z^{(k)})\right) . \end{aligned} \end{aligned}$$

We obtain the assertion of this lemma. \(\square \)

Appendix D: Convergence of Algorithm 2

Now we prove Theorem 4 to establish the global convergence of Algorithm 2. In addition to the notation introduced in Sect. 1, we adopt the following throughout the theoretical analysis. The notations \(\textrm{rank}\left( C\right) \) and \(\sigma _{\min } \left( C\right) \) represent the rank and the smallest singular value of \(C\), respectively. For \(X, Y \in \mathcal {S}_{n,p}\), we define \({\mathbf {D_p}}\left( X,Y\right) := XX^{\top }- YY^{\top }\) and \({\mathbf {d_p}}\left( X,Y\right) := \left\| {\mathbf {D_p}}\left( X,Y\right) \right\| _{\textrm{F}}\), which stand for the projection distance matrix and its magnitude, respectively.
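As a complement to this notation, the following minimal Python sketch (ours, not part of the paper or its code release) computes the squared projection distance; for matrices with orthonormal columns it uses the identity \({\textbf{d}^2_{\textbf{p}}}\left( X,Y\right) = 2p - 2\Vert X^{\top }Y\Vert _{\textrm{F}}^2\), which avoids forming the \(n \times n\) projection matrices.

```python
import numpy as np

def dp_squared(X: np.ndarray, Y: np.ndarray) -> float:
    """Squared projection distance ||X X^T - Y Y^T||_F^2 for X, Y in S_{n,p}.

    Uses ||X X^T - Y Y^T||_F^2 = 2p - 2 ||X^T Y||_F^2, valid when both
    matrices have orthonormal columns.
    """
    p = X.shape[1]
    return 2.0 * p - 2.0 * np.linalg.norm(X.T @ Y, "fro") ** 2

# Example: two random points on the Stiefel manifold S_{n,p}.
rng = np.random.default_rng(0)
n, p = 100, 4
X, _ = np.linalg.qr(rng.standard_normal((n, p)))
Y, _ = np.linalg.qr(rng.standard_normal((n, p)))

# Sanity check against the direct (n-by-n) computation.
direct = np.linalg.norm(X @ X.T - Y @ Y.T, "fro") ** 2
assert abs(dp_squared(X, Y) - direct) < 1e-10
```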

To begin with, we provide a sketch of our proof. Suppose \(\{Z^{(k)}\}\) is the iteration sequence generated by Algorithm 2, with \(X_i^{(k)}\) and \(\Lambda _i^{(k)}\) being the local variable and multiplier of the i-th agent at the k-th iteration, respectively. The proof includes the following main steps.

  1. The sequence \(\{Z^{(k)}\}\) is bounded and the sequence \(\{ \mathcal {L}( Z^{(k)}, \{X_i^{(k)}\}, \{\Lambda _i^{(k)}\} ) \}\) is bounded from below.

  2. The sequence \(\{Z^{(k)}\}\) satisfies \({\textbf{d}^2_{\textbf{p}}}\left( Z^{(k+1)},X_i^{(k)}\right) \le 2 ( 1 -\underline{\sigma }^2 )\), where \(\underline{\sigma }\) is a uniform lower bound on the smallest singular values of the matrices \((X_i^{(k)})^{\top }Z^{(k+1)}\), \(i=1,\dotsc ,d\).

  3. The sequence \(\{ \mathcal {L}( Z^{(k)}, \{X_i^{(k)}\}, \{\Lambda _i^{(k)}\} ) \}\) is monotonically non-increasing, and hence convergent.

  4. The sequence \(\{Z^{(k)}\}\) has at least one accumulation point, and any accumulation point is a first-order stationary point of the sparse PCA problem (2).

Next we verify all the items in the above sketch by proving the following lemmas and corollaries.

Lemma 5

Suppose \(\{Z^{(k)}\}\) is the iterate sequence generated by Algorithm 2. Let

$$\begin{aligned} g^{(k)} (D) = \left\langle Q^{(k)}Z^{(k)}, D\right\rangle + \dfrac{1}{2\eta } \left\| D\right\| ^2_{\textrm{F}}+ r(Z^{(k)} + D). \end{aligned}$$

Then the following relationship holds for any \(k \in \mathbb {N}\),

$$\begin{aligned} g^{(k)} (0) - g^{(k)} (D^{(k)}) \ge \dfrac{1}{2\eta } \left\| D^{(k)}\right\| ^2_{\textrm{F}}. \end{aligned}$$

Proof

Since \(g^{(k)}\) is strongly convex with modulus \(\dfrac{1}{\eta }\), we have

$$\begin{aligned} g^{(k)} (\hat{D}) \ge g^{(k)} (D) + \left\langle \partial g^{(k)} (D), \hat{D} - D\right\rangle + \dfrac{1}{2\eta } \left\| \hat{D} - D\right\| ^2_{\textrm{F}}, \end{aligned}$$
(30)

for any \(D, \hat{D} \in \mathbb {R}^{n\times p}\). In particular, if \(\hat{D}, D \in \mathcal {T}_{Z^{(k)}} \mathcal {S}_{n,p}\), it holds that

$$\begin{aligned} \left\langle \partial g^{(k)} (D), \hat{D} - D\right\rangle = \left\langle \textrm{Proj}_{\mathcal {T}_{Z^{(k)}} \mathcal {S}_{n,p}} \left( \partial g^{(k)} (D)\right) , \hat{D} - D\right\rangle . \end{aligned}$$

It follows from the first-order optimality condition of (18) that \(0 \in \textrm{Proj}_{\mathcal {T}_{Z^{(k)}} \mathcal {S}_{n,p}} \left( \partial g^{(k)} (D^{(k)})\right) \). Finally, taking \(\hat{D} = 0\) and \(D = D^{(k)}\) in (30) yields the assertion of this lemma. \(\square \)

Lemma 6

Suppose \(Z \in \mathcal {S}_{n,p}\) and \(D \in \mathcal {T}_{Z} \mathcal {S}_{n,p}\). Then it holds that

$$\begin{aligned} \left\| \textrm{Proj}_{\mathcal {S}_{n,p}}\left( Z + D\right) - Z\right\| _{\textrm{F}}\le \left\| D\right\| _{\textrm{F}}, \end{aligned}$$

and

$$\begin{aligned} \left\| \textrm{Proj}_{\mathcal {S}_{n,p}}\left( Z + D\right) - Z - D\right\| _{\textrm{F}}\le \dfrac{1}{2} \left\| D\right\| ^2_{\textrm{F}}. \end{aligned}$$

Proof

The proof can be found in, for example, [63]. For the sake of completeness, we provide a proof here. It follows from the orthogonality of Z and the skew-symmetry of \(Z^{\top }D\) that \(Z + D\) has full column rank. This yields that \(\textrm{Proj}_{\mathcal {S}_{n,p}} (Z + D) = (Z + D)F^{-1}\), where \(F = (I_p + D^{\top }D)^{1/2}\). Since \(\textrm{Proj}_{\mathcal {S}_{n,p}} (Z + D) - Z = ( Z (I_p - F) + D ) F^{-1}\), we have

$$\begin{aligned} \begin{aligned} \left\| \textrm{Proj}_{\mathcal {S}_{n,p}}\left( Z + D\right) - Z\right\| ^2_{\textrm{F}}= {}&2\textrm{tr}\left( I_p - F^{-1}\right) - 2 \textrm{tr}\left( F^{-1}Z^{\top }D\right) = 2\textrm{tr}\left( I_p - F^{-1}\right) \\ = {}&2 \sum \limits _{j=1}^p\left( 1 - \left( 1 + \tilde{\sigma }_j^2 \right) ^{-1/2} \right) \le \sum \limits _{j=1}^p\tilde{\sigma }_j^2 = \left\| D\right\| ^2_{\textrm{F}}, \end{aligned} \end{aligned}$$

where \(\tilde{\sigma }_1 \ge \cdots \ge \tilde{\sigma }_p \ge 0\) are the singular values of D. Similarly, it follows from the relationship \(\textrm{Proj}_{\mathcal {S}_{n,p}} (Z + D) - Z - D = (Z + D) (F^{-1}- I_p)\) that

$$\begin{aligned} \begin{aligned} \left\| \text {Proj}_{\mathcal {S}_{n,p}}\left( Z + D\right) - Z - D\right\| _{\text {F}}^2= {}&\text {tr}\left( \left( I_p - F\right) ^2 \right) = \sum \limits _{j=1}^p\left( 1 - \left( 1 + \tilde{\sigma }_j^2\right) ^{1/2} \right) ^2 \\ \le {}&\dfrac{1}{4} \sum \limits _{j=1}^p\tilde{\sigma }_j^4 \le \dfrac{1}{4} \left\| D\right\| _{\text {F}}^4, \end{aligned} \end{aligned}$$

which completes the proof. \(\square \)
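The two bounds above are easy to check numerically. The following minimal sketch (ours, for illustration only) draws a random point \(Z \in \mathcal {S}_{n,p}\) and a random tangent direction \(D \in \mathcal {T}_{Z} \mathcal {S}_{n,p}\), forms the projection \((Z+D)(I_p + D^{\top }D)^{-1/2}\) used in the proof, and verifies both inequalities.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 60, 5

# Random point on the Stiefel manifold.
Z, _ = np.linalg.qr(rng.standard_normal((n, p)))

# Random tangent direction at Z: D = Z*K + P_Z^perp*W with K skew-symmetric.
K = rng.standard_normal((p, p)); K = (K - K.T) / 2
W = rng.standard_normal((n, p))
D = Z @ K + (W - Z @ (Z.T @ W))

# Proj_{S_{n,p}}(Z + D) = (Z + D) F^{-1} with F = (I_p + D^T D)^{1/2};
# F^{-1} is computed via the eigendecomposition of the SPD matrix I_p + D^T D.
w, V = np.linalg.eigh(np.eye(p) + D.T @ D)
F_inv = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
Z_plus = (Z + D) @ F_inv

normD = np.linalg.norm(D, "fro")
assert np.linalg.norm(Z_plus - Z, "fro") <= normD + 1e-10               # first bound
assert np.linalg.norm(Z_plus - Z - D, "fro") <= 0.5 * normD**2 + 1e-10  # second bound
```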

Corollary 7

Suppose \(\{Z^{(k)}\}\) is the iterate sequence generated by Algorithm 2 with the parameters satisfying Condition 1. Then for any \(k \in \mathbb {N}\), it holds that

$$\begin{aligned} \mathcal {L}( Z^{(k)}, \{X_i^{(k)}\}, \{\Lambda _i^{(k)}\} ) - \mathcal {L}( Z^{(k+1)}, \{X_i^{(k)}\}, \{\Lambda _i^{(k)}\} ) \ge \bar{M} \left\| D^{(k)}\right\| ^2_{\textrm{F}}, \end{aligned}$$
(31)

where \(\bar{M} > 0\) is a constant defined in Sect. 4.

Proof

Firstly, it can be readily verified that

$$\begin{aligned} \left\| Q^{(k)}\right\| _{\textrm{F}}\le \sum \limits _{i=1}^d\left\| Q_i^{(k)}\right\| _{\textrm{F}}\le \sum \limits _{i=1}^d\left( 2 \left\| A_i\right\| ^2_{\textrm{F}}+ \sqrt{p} \beta _i \right) . \end{aligned}$$

Let \(\bar{q}^{(k)} (Z) = \textrm{tr}(Z^{\top }Q^{(k)} Z) / 2\) be the smooth part of the objective function \(q^{(k)} (Z)\) in (16). Since \(\nabla \bar{q}^{(k)}\) is Lipschitz continuous with Lipschitz constant \(\left\| Q^{(k)}\right\| _{\textrm{F}}\), we have

$$\begin{aligned} \begin{aligned} \bar{q}^{(k)} (Z^{(k+1)}) - \bar{q}^{(k)} (Z^{(k)}) \le {}&\left\langle Q^{(k)} Z^{(k)}, Z^{(k+1)} - Z^{(k)}\right\rangle \\&+ \dfrac{1}{2} \left\| Q^{(k)}\right\| _{\textrm{F}}\left\| Z^{(k+1)} - Z^{(k)}\right\| ^2_{\textrm{F}}. \end{aligned} \end{aligned}$$

It follows from Lemma 6 that

$$\begin{aligned} \begin{aligned} \left\langle Q^{(k)} Z^{(k)}, Z^{(k+1)} - Z^{(k)} - D^{(k)}\right\rangle \le {}&\left\| Q^{(k)} Z^{(k)}\right\| _{\textrm{F}}\left\| Z^{(k+1)} - Z^{(k)} - D^{(k)}\right\| _{\textrm{F}}\\ \le {}&\sum \limits _{i=1}^d\left( \left\| A_i\right\| ^2_{\textrm{F}}+ \dfrac{\sqrt{p}}{2} \beta _i \right) \left\| D^{(k)}\right\| ^2_{\textrm{F}}, \end{aligned} \end{aligned}$$

and

$$\begin{aligned} \begin{aligned} \dfrac{1}{2} \left\| Q^{(k)}\right\| _{\textrm{F}}\left\| Z^{(k+1)} - Z^{(k)}\right\| ^2_{\textrm{F}}\le \sum \limits _{i=1}^d\left( \left\| A_i\right\| ^2_{\textrm{F}}+ \dfrac{\sqrt{p}}{2} \beta _i \right) \left\| D^{(k)}\right\| ^2_{\textrm{F}}. \end{aligned} \end{aligned}$$

Combining the above three inequalities, we obtain that

$$\begin{aligned} \begin{aligned} \bar{q}^{(k)} (Z^{(k+1)}) - \bar{q}^{(k)} (Z^{(k)}) \le \left\langle Q^{(k)} Z^{(k)}, D^{(k)}\right\rangle + \sum \limits _{i=1}^d\left( 2\left\| A_i\right\| ^2_{\textrm{F}}+ \beta _i \sqrt{p} \right) \left\| D^{(k)}\right\| ^2_{\textrm{F}}. \end{aligned} \end{aligned}$$

It follows from Lemma 5 that

$$\begin{aligned} \begin{aligned}&\left\langle Q^{(k)} Z^{(k)}, D^{(k)}\right\rangle + r(Z^{(k)} + D^{(k)}) - r(Z^{(k)}) \\&= g^{(k)} (D^{(k)}) - g^{(k)} (0) - \dfrac{1}{2\eta } \left\| D^{(k)}\right\| ^2_{\textrm{F}}\le - \dfrac{1}{\eta } \left\| D^{(k)}\right\| ^2_{\textrm{F}}, \end{aligned} \end{aligned}$$

which implies that

$$\begin{aligned} \begin{aligned}&\bar{q}^{(k)} (Z^{(k+1)}) - \bar{q}^{(k)} (Z^{(k)}) + r(Z^{(k)} + D^{(k)}) - r(Z^{(k)}) \\&\le \sum \limits _{i=1}^d\left( 2\left\| A_i\right\| ^2_{\textrm{F}}+ \beta _i \sqrt{p} \right) \left\| D^{(k)}\right\| ^2_{\textrm{F}}- \dfrac{1}{\eta } \left\| D^{(k)}\right\| ^2_{\textrm{F}}. \end{aligned} \end{aligned}$$

This together with the Lipschitz continuity of r(Z) yields that

$$\begin{aligned} \begin{aligned}&q^{(k)} (Z^{(k+1)}) - q^{(k)} (Z^{(k)}) \\&= \bar{q}^{(k)} (Z^{(k+1)}) - \bar{q}^{(k)} (Z^{(k)}) + r(Z^{(k+1)}) - r(Z^{(k)}) \\&= \bar{q}^{(k)} (Z^{(k+1)}) - \bar{q}^{(k)} (Z^{(k)}) + r(Z^{(k)} + D^{(k)}) - r(Z^{(k)}) \\&\quad + r(Z^{(k+1)}) - r(Z^{(k)} + D^{(k)}) \\&\le \bar{q}^{(k)} (Z^{(k+1)}) - \bar{q}^{(k)} (Z^{(k)}) + r(Z^{(k)} + D^{(k)}) - r(Z^{(k)}) \\&\quad + \mu \sqrt{np} \left\| Z^{(k+1)} - Z^{(k)} - D^{(k)}\right\| _{\textrm{F}}\\&\le \sum \limits _{i=1}^d\left( 2\left\| A_i\right\| ^2_{\textrm{F}}+ \beta _i \sqrt{p} \right) \left\| D^{(k)}\right\| ^2_{\textrm{F}}- \dfrac{1}{\eta } \left\| D^{(k)}\right\| ^2_{\textrm{F}}+ \dfrac{\mu }{2} \sqrt{np} \left\| D^{(k)}\right\| ^2_{\textrm{F}}\\&= \left( \bar{M} - \dfrac{1}{\eta } \right) \left\| D^{(k)}\right\| ^2_{\textrm{F}}. \end{aligned} \end{aligned}$$

Here, \(\bar{M} > 0\) is a constant defined in Sect. 4. According to Condition 1, we know that \(\bar{M} - 1/\eta \le -\bar{M}\). Hence, we finally arrive at

$$\begin{aligned} \begin{aligned} \mathcal {L}( Z^{(k)}, \{X_i^{(k)}\}, \{\Lambda _i^{(k)}\}) - \mathcal {L}( Z^{(k+1)}, \{X_i^{(k)}\}, \{\Lambda _i^{(k)}\}) = {}&q^{(k)} (Z^{(k)}) - q^{(k)} (Z^{(k+1)}) \\ \ge {}&\bar{M} \left\| D^{(k)}\right\| ^2_{\textrm{F}}. \end{aligned} \end{aligned}$$

This completes the proof. \(\square \)

Lemma 8

Suppose \(\{Z^{(k)}\}\) is the iterate sequence generated by Algorithm 2 with the parameters satisfying Condition 1. Then for any \(k \in \mathbb {N}\), it can be verified that

$$\begin{aligned} {\textbf{d}^2_{\textbf{p}}}\left( Z^{(k+1)},X_i^{(k)}\right) \le \rho \sum \limits _{j=1}^d{\textbf{d}^2_{\textbf{p}}}\left( Z^{(k)},X_j^{(k)}\right) + \dfrac{8}{\beta _i} \left( \sqrt{p} \left\| A\right\| ^2_{\textrm{F}}+ \mu n p\right) , \end{aligned}$$
(32)

where \(\rho \ge 1\) is a constant defined in Sect. 4.

Proof

The inequality (31) directly results in the following relationship.

$$\begin{aligned} {q}^{(k)} (Z^{(k)}) - {q}^{(k)} (Z^{(k+1)}) \ge 0. \end{aligned}$$

According to the definition of \(q^{(k)}\), it follows that

$$\begin{aligned} \begin{aligned} 0 \le {}&\dfrac{1}{2} \textrm{tr}\left( (Z^{(k)})^{\top }Q^{(k)} Z^{(k)} \right) - \dfrac{1}{2} \textrm{tr}\left( (Z^{(k+1)})^{\top }Q^{(k)} Z^{(k+1)} \right) + r(Z^{(k)}) - r(Z^{(k+1)}) \\ \le {}&\dfrac{1}{2} \sum \limits _{j=1}^d\textrm{tr}\left( \left( \beta _j X_j^{(k)} (X_j^{(k)})^{\top }- \Lambda _j^{(k)} \right) {\mathbf {D_p}}\left( Z^{(k+1)},Z^{(k)}\right) \right) + 2 \mu n p. \end{aligned} \end{aligned}$$

By straightforward calculations, we can deduce that

$$\begin{aligned} \begin{aligned} \sum \limits _{j=1}^d\textrm{tr}\left( \Lambda _j^{(k)} {\mathbf {D_p}}\left( Z^{(k)},Z^{(k+1)}\right) \right) \le {}&\sum \limits _{j=1}^d\left\| \Lambda _j^{(k)}\right\| _{\textrm{F}}{\mathbf {d_p}}\left( Z^{(k+1)},Z^{(k)}\right) \\ \le {}&4 \sqrt{p} \sum \limits _{j=1}^d\left\| A_j\right\| ^2_{\textrm{F}}= 4\sqrt{p} \left\| A\right\| ^2_{\textrm{F}}, \end{aligned} \end{aligned}$$

and

$$\begin{aligned} \begin{aligned}&\sum \limits _{j=1}^d\beta _j \textrm{tr}\left( X_j^{(k)} (X_j^{(k)})^{\top }{\mathbf {D_p}}\left( Z^{(k+1)},Z^{(k)}\right) \right) \\&= \dfrac{1}{2} \sum \limits _{j=1}^d\beta _j {\textbf{d}^2_{\textbf{p}}}\left( Z^{(k)},X_j^{(k)}\right) - \dfrac{1}{2} \sum \limits _{j=1}^d\beta _j {\textbf{d}^2_{\textbf{p}}}\left( Z^{(k+1)},X_j^{(k)}\right) . \end{aligned} \end{aligned}$$

The above three inequalities yield that

$$\begin{aligned} \sum \limits _{j=1}^d\beta _j {\textbf{d}^2_{\textbf{p}}}\left( Z^{(k+1)},X_j^{(k)}\right) \le \sum \limits _{j=1}^d\beta _j {\textbf{d}^2_{\textbf{p}}}\left( Z^{(k)},X_j^{(k)}\right) + 8 \sqrt{p} \left\| A\right\| ^2_{\textrm{F}}+ 8 \mu n p, \end{aligned}$$

which further implies that

$$\begin{aligned} \begin{aligned} {\textbf{d}^2_{\textbf{p}}}\left( Z^{(k+1)},X_i^{(k)}\right) \le {}&\dfrac{1}{\beta _i}\sum \limits _{j=1}^d\beta _j {\textbf{d}^2_{\textbf{p}}}\left( Z^{(k+1)},X_j^{(k)}\right) \\ \le {}&\rho \sum \limits _{j=1}^d{\textbf{d}^2_{\textbf{p}}}\left( Z^{(k)},X_j^{(k)}\right) + \dfrac{8}{\beta _i}\left( \sqrt{p} \left\| A\right\| ^2_{\textrm{F}}+ \mu n p\right) . \end{aligned} \end{aligned}$$

This completes the proof. \(\square \)

Lemma 9

Suppose \(Z^{(k+1)}\) is the \((k+1)\)-th iterate generated by Algorithm 2 and satisfies the following condition:

$$\begin{aligned} {\textbf{d}^2_{\textbf{p}}}\left( Z^{(k+1)},X_i^{(k)}\right) \le 2 \left( 1 -\underline{\sigma }^2\right) , \end{aligned}$$

where \(\underline{\sigma } \in (0,1)\) is a constant defined in Condition 1. Let the algorithm parameters satisfy Conditions 1 and 2. Then for any \(i=1,\dotsc ,d\), it holds that

$$\begin{aligned} h_i^{(k)} ( X_i^{(k)} ) - h_i^{(k)} ( X_i^{(k+1)} ) \ge \frac{1}{4} \underline{\sigma }^2 c_i \beta _i {\textbf{d}^2_{\textbf{p}}}\left( Z^{(k+1)},X_i^{(k)}\right) , \end{aligned}$$
(33)

and

$$\begin{aligned} {\textbf{d}^2_{\textbf{p}}}\left( Z^{(k+1)},X_i^{(k+1)}\right) \le \left( 1 - c_i \underline{\sigma }^2\right) {\textbf{d}^2_{\textbf{p}}}\left( Z^{(k+1)},X_i^{(k)}\right) + \dfrac{12}{\beta _i} \sqrt{p} \left\| A_i\right\| ^2_{\textrm{F}}. \end{aligned}$$
(34)

Proof

It follows from Condition 2 that \(\beta _i > c_i^{\prime } \left\| A_i\right\| _2^2\), which together with (25) yields that

$$\begin{aligned} h_i^{(k)} ( X_i^{(k)} ) - h_i^{(k)} ( X_i^{(k+1)} ) \ge \dfrac{c_i}{2\beta _i} \left\| {{\textbf {P}}}^{\perp }_{X_i^{(k)}} H_i^{(k)} X_i^{(k)} \right\| ^2_{\text {F}}. \end{aligned}$$
(35)

Moreover, it can be checked that

$$\begin{aligned} \begin{aligned}&{\textbf{P}}^{\perp }_{X_i^{(k)}} H_i^{(k)} X_i^{(k)} = {\textbf{P}}^{\perp }_{X_i^{(k)}} \left( A_i A_i^{\top }X_i^{(k)} + \Lambda _i^{(k)} X_i^{(k)} + \beta _i Z^{(k+1)}(Z^{(k+1)})^{\top }X_i^{(k)} \right) \\&= {\textbf{P}}^{\perp }_{X_i^{(k)}} \left( A_i A_i^{\top }X_i^{(k)} - {\textbf{P}}^{\perp }_{X_i^{(k)}} A_i A_i^{\top }X_i^{(k)} \right) -\beta _i {\textbf{P}}^{\perp }_{X_i^{(k)}} Z^{(k+1)}(Z^{(k+1)})^{\top }X_i^{(k)} \\&= -\beta _i {\textbf{P}}^{\perp }_{X_i^{(k)}} Z^{(k+1)}(Z^{(k+1)})^{\top }X_i^{(k)}. \end{aligned} \end{aligned}$$
(36)

Suppose \(\hat{\sigma }_1, \dotsc , \hat{\sigma }_p\) are the singular values of \((X_i^{(k)})^{\top }Z^{(k+1)}\). It is clear that \(0 \le \hat{\sigma }_j \le 1\) for any \(j = 1, \dotsc , p\) due to the orthogonality of \(X_i^{(k)}\) and \(Z^{(k+1)}\). On the one hand, we have

$$\begin{aligned} {\textbf{d}^2_{\textbf{p}}}\left( Z^{(k+1)},X_i^{(k)}\right) = \left\| X_i^{(k)}(X_i^{(k)})^{\top }- Z^{(k+1)}(Z^{(k+1)})^{\top }\right\| ^2_{\textrm{F}}= 2\sum \limits _{j=1}^p \left( 1 - \hat{\sigma }_j^2 \right) . \end{aligned}$$

On the other hand, it follows from \({\textbf{d}^2_{\textbf{p}}}\left( Z^{(k+1)},X_i^{(k)}\right) \le 2 \left( 1 -\underline{\sigma }^2\right) \) that

$$\begin{aligned} \sigma _{\min }\left( (X_i^{(k)})^{\top }Z^{(k+1)} \right) \ge \underline{\sigma }. \end{aligned}$$

Let \(Y_i^{(k)} = (X_i^{(k)})^{\top }Z^{(k+1)}(Z^{(k+1)})^{\top }X_i^{(k)}\). By straightforward calculations, we can derive that

$$\begin{aligned} \begin{aligned}&\left\| {\textbf{P}}^{\perp }_{X_i^{(k)}} Z^{(k+1)}(Z^{(k+1)})^{\top }X_i^{(k)}\right\| ^2_{\textrm{F}}= \textrm{tr}\left( Y_i^{(k)}\right) - \textrm{tr}\left( (Y_i^{(k)})^2\right) = \sum \limits _{j=1}^p \hat{\sigma }_j^2 \left( 1 - \hat{\sigma }_j^2 \right) \\&\ge \sum \limits _{j=1}^p \sigma _{\min }^2\left( (X_i^{(k)})^{\top }Z^{(k+1)} \right) \left( 1 - \hat{\sigma }_j^2 \right) \ge \dfrac{1}{2} \underline{\sigma }^2 {\textbf{d}^2_{\textbf{p}}}\left( Z^{(k+1)},X_i^{(k)}\right) . \end{aligned} \end{aligned}$$
(37)

Combining (35), (36) and (37), we acquire the assertion (33). Then it follows from the definition of \(h_i^{(k)}\) that

$$\begin{aligned} \begin{aligned} c_i \underline{\sigma }^2 {\textbf{d}^2_{\textbf{p}}}\left( Z^{(k+1)},X_i^{(k)}\right) \le {}&2 \textrm{tr}\left( Z^{(k+1)}(Z^{(k+1)})^{\top }{\mathbf {D_p}}\left( X_i^{(k+1)},X_i^{(k)}\right) \right) \\&+ \dfrac{2}{\beta _i} \textrm{tr}\left( \left( A_i A_i^{\top }+ \Lambda _i^{(k)} \right) {\mathbf {D_p}}\left( X_i^{(k+1)},X_i^{(k)}\right) \right) . \end{aligned} \end{aligned}$$

By straightforward calculations, we can obtain that

$$\begin{aligned} \begin{aligned} \textrm{tr}\left( \left( A_i A_i^{\top }+ \Lambda _i^{(k)} \right) {\mathbf {D_p}}\left( X_i^{(k+1)},X_i^{(k)}\right) \right) \le {}&\left\| A_i A_i^{\top }+ \Lambda _i^{(k)} \right\| _{\textrm{F}}{\mathbf {d_p}}\left( X_i^{(k+1)},X_i^{(k)}\right) \\ \le {}&6 \sqrt{p}\left\| A_i\right\| ^2_{\textrm{F}}, \end{aligned} \end{aligned}$$

and

$$\begin{aligned} \begin{aligned}&\textrm{tr}\left( Z^{(k+1)}(Z^{(k+1)})^{\top }{\mathbf {D_p}}\left( X_i^{(k+1)},X_i^{(k)}\right) \right) \\&= \dfrac{1}{2} {\textbf{d}^2_{\textbf{p}}}\left( Z^{(k+1)},X_i^{(k)}\right) - \dfrac{1}{2} {\textbf{d}^2_{\textbf{p}}}\left( Z^{(k+1)},X_i^{(k+1)}\right) . \end{aligned} \end{aligned}$$

The above three relationships yield (34). We complete the proof. \(\square \)
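For completeness, the singular-value identity used in this proof follows from a one-line computation that relies only on the orthonormality of the columns of \(X_i^{(k)}\) and \(Z^{(k+1)}\):

$$\begin{aligned} \left\| X_i^{(k)}(X_i^{(k)})^{\top }- Z^{(k+1)}(Z^{(k+1)})^{\top }\right\| ^2_{\textrm{F}} = 2p - 2 \left\| (X_i^{(k)})^{\top }Z^{(k+1)} \right\| ^2_{\textrm{F}} = 2\sum \limits _{j=1}^p \left( 1 - \hat{\sigma }_j^2 \right) . \end{aligned}$$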

Lemma 10

Suppose \(\{Z^{(k)}\}\) is the iterate sequence generated by Algorithm 2 initiated from \(Z^{(0)} \in \mathcal {S}_{n,p}\) with the parameters satisfying Conditions 1 and 2. Then for any \(i=1,\dotsc ,d\) and \(k \in \mathbb {N}\), it holds that

$$\begin{aligned} {\textbf{d}^2_{\textbf{p}}}\left( Z^{(k+1)},X_i^{(k)}\right) \le 2 \left( 1 -\underline{\sigma }^2\right) . \end{aligned}$$
(38)

Proof

We use mathematical induction to prove this lemma. To begin with, it follows from the inequality (32) that

$$\begin{aligned} \begin{aligned} {\textbf{d}^2_{\textbf{p}}}\left( Z^{(1)},X_i^{(0)}\right) \le {}&\rho \sum \limits _{j=1}^d{\textbf{d}^2_{\textbf{p}}}\left( Z^{(0)},X_j^{(0)}\right) + \dfrac{8}{\beta _i} \left( \sqrt{p} \left\| A\right\| ^2_{\textrm{F}}+ \mu n p\right) \\ = {}&\dfrac{8}{\beta _i} \left( \sqrt{p} \left\| A\right\| ^2_{\textrm{F}}+ \mu n p\right) \le 2 \left( 1 - \underline{\sigma }^2\right) , \end{aligned} \end{aligned}$$

under the relationship \(\beta _i > 4 (\sqrt{p} \left\| A\right\| ^2_{\textrm{F}}+ \mu n p ) / ( 1 - \underline{\sigma }^2 )\) in Condition 2. Thus, the argument (38) directly holds for \(( Z^{(1)}, \{X_i^{(0)}\} )\). Now, we assume the argument holds at \(( Z^{(k+1)}, \{X_i^{(k)}\} )\), and investigate the situation at \(( Z^{(k+2)}, \{X_i^{(k+1)}\} )\).

According to Condition 2, we have \(12 \sqrt{p} \left\| A_i\right\| ^2_{\textrm{F}}/\beta _i < 2\left( 1 - \underline{\sigma }^2\right) c_i \underline{\sigma }^2\).

Since we assume that \({\textbf{d}^2_{\textbf{p}}}\left( Z^{(k+1)},X_i^{(k)}\right) \le 2 \left( 1 -\underline{\sigma }^2\right) \), it follows from the relationship (34) that

$$\begin{aligned} \begin{aligned} {\textbf{d}^2_{\textbf{p}}}\left( Z^{(k+1)},X_i^{(k+1)}\right) \le {}&\left( 1 - c_i \underline{\sigma }^2 \right) {\textbf{d}^2_{\textbf{p}}}\left( Z^{(k+1)},X_i^{(k)}\right) + \dfrac{12}{\beta _i} \sqrt{p} \left\| A_i\right\| ^2_{\textrm{F}}\\ \le {}&2\left( 1 - \underline{\sigma }^2\right) \left( 1 - c_i \underline{\sigma }^2\right) + 2\left( 1 - \underline{\sigma }^2\right) c_i\underline{\sigma }^2 = 2\left( 1 - \underline{\sigma }^2\right) , \end{aligned} \end{aligned}$$

which implies that \(\sigma _{\min } \left( (X_i^{(k+1)})^{\top }Z^{(k+1)} \right) \ge \underline{\sigma }\). Similar to the proof of Lemma 9, we can obtain that

$$\begin{aligned} \left\| {\textbf{P}}^{\perp }_{X_i^{(k+1)}} Z^{(k+1)}(Z^{(k+1)})^{\top }X_i^{(k+1)} \right\| ^2_{\textrm{F}}\ge \dfrac{1}{2} \underline{\sigma }^2 {\textbf{d}^2_{\textbf{p}}}\left( Z^{(k+1)},X_i^{(k+1)}\right) . \end{aligned}$$
(39)

Combining the condition (26) and the equality (36), we have

$$\begin{aligned} \begin{aligned}&\left\| {\textbf{P}}^{\perp }_{X_i^{(k+1)}} H_i^{(k)} X_i^{(k+1)} \right\| _{\textrm{F}}\le \delta _i \left\| {\textbf{P}}^{\perp }_{X_i^{(k)}} H_i^{(k)} X_i^{(k)} \right\| _{\textrm{F}}\\&= \delta _i \beta _i \left\| {\textbf{P}}^{\perp }_{X_i^{(k)}} Z^{(k+1)}(Z^{(k+1)})^{\top }X_i^{(k)} \right\| _{\textrm{F}}\le \delta _i \beta _i {\mathbf {d_p}}\left( Z^{(k+1)},X_i^{(k)}\right) . \end{aligned} \end{aligned}$$
(40)

On the other hand, it follows from the triangle inequality that

$$\begin{aligned} \begin{aligned}&\left\| {{\textbf {P}}}^{\perp }_{X_i^{(k+1)}} H_i^{(k)} X_i^{(k+1)} \right\| _{\text {F}}\\ \ge {}&\left\| {{\textbf {P}}}^{\perp }_{X_i^{(k+1)}} \left( A_i A_i^{\top }+ \Lambda _i^{(k+1)} + \beta _i Z^{(k+1)}(Z^{(k+1)})^{\top }\right) X_i^{(k+1)} \right\| _{\text {F}}\\ {}&- \left\| {{\textbf {P}}}^{\perp }_{X_i^{(k+1)}} \left( \Lambda _i^{(k+1)} - \Lambda _i^{(k)} \right) X_i^{(k+1)} \right\| _{\text {F}}. \end{aligned} \end{aligned}$$

Combining this with the inequality (39), it can be verified that

$$\begin{aligned} \begin{aligned}&\left\| {\textbf{P}}^{\perp }_{X_i^{(k+1)}} \left( A_i A_i^{\top }+ \Lambda _i^{(k+1)} + \beta _i Z^{(k+1)}(Z^{(k+1)})^{\top }\right) X_i^{(k+1)} \right\| _{\textrm{F}}\\&= \beta _i \left\| {\textbf{P}}^{\perp }_{X_i^{(k+1)}} Z^{(k+1)}(Z^{(k+1)})^{\top }X_i^{(k+1)} \right\| _{\textrm{F}}\ge \dfrac{\sqrt{2}}{2}\underline{\sigma } \beta _i {\mathbf {d_p}}\left( Z^{(k+1)},X_i^{(k+1)}\right) . \end{aligned} \end{aligned}$$

Moreover, according to Lemma B.4 in [48], we have

$$\begin{aligned} \begin{aligned} \left\| {\textbf{P}}^{\perp }_{X_i^{(k+1)}} \left( \Lambda _i^{(k+1)} - \Lambda _i^{(k)} \right) X_i^{(k+1)} \right\| _{\textrm{F}}\le {}&\left\| \Lambda _i^{(k+1)} - \Lambda _i^{(k)} \right\| _{\textrm{F}}\\ \le {}&4\left\| A_i\right\| _2^2 {\mathbf {d_p}}\left( X_i^{(k+1)},X_i^{(k)}\right) . \end{aligned} \end{aligned}$$

Combining the above three inequalities, we further obtain that

$$\begin{aligned} \begin{aligned} \left\| {\textbf{P}}^{\perp }_{X_i^{(k+1)}} H_i^{(k)} X_i^{(k+1)} \right\| _{\textrm{F}}\ge {}&\dfrac{\sqrt{2}}{2}\underline{\sigma } \beta _i {\mathbf {d_p}}\left( Z^{(k+1)},X_i^{(k+1)}\right) \\&- 4\left\| A_i\right\| _2^2 {\mathbf {d_p}}\left( X_i^{(k+1)},X_i^{(k)}\right) . \end{aligned} \end{aligned}$$

Together with (40), this yields that

$$\begin{aligned} \begin{aligned}&\dfrac{\sqrt{2}}{2} \underline{\sigma } \beta _i {\mathbf {d_p}}\left( Z^{(k+1)},X_i^{(k+1)}\right) \\&\le \delta _i \beta _i {\mathbf {d_p}}\left( Z^{(k+1)},X_i^{(k)}\right) + 4\left\| A_i\right\| _2^2 {\mathbf {d_p}}\left( X_i^{(k+1)},X_i^{(k)}\right) \\&\le \left( \delta _i \beta _i + 4 \left\| A_i\right\| _2^2 \right) {\mathbf {d_p}}\left( Z^{(k+1)},X_i^{(k)}\right) + 4 \left\| A_i\right\| _2^2 {\mathbf {d_p}}\left( Z^{(k+1)},X_i^{(k+1)}\right) . \end{aligned} \end{aligned}$$

According to Conditions 1 and 2, we have \(\sqrt{2} \underline{\sigma } \beta _i - 8 \left\| A_i\right\| _2^2 > 0\) and \(\underline{\sigma } - 2\sqrt{\rho d} \delta _i > 0\). Thus, it can be verified that

$$\begin{aligned} \begin{aligned} {\mathbf {d_p}}\left( Z^{(k+1)},X_i^{(k+1)}\right) \le {}&\dfrac{ 2 ( \delta _i \beta _i + 4 \left\| A_i\right\| _2^2 ) }{ \sqrt{2} \underline{\sigma } \beta _i - 8 \left\| A_i\right\| _2^2 } {\mathbf {d_p}}\left( Z^{(k+1)},X_i^{(k)}\right) \\ \le {}&\sqrt{\dfrac{1}{2 \rho d}} {\mathbf {d_p}}\left( Z^{(k+1)},X_i^{(k)}\right) , \end{aligned} \end{aligned}$$
(41)

where the last inequality follows from the relationship \(\beta _i > \dfrac{4 \left( 2 \sqrt{\rho d} + \sqrt{2}\right) \left\| A_i\right\| _2^2}{\underline{\sigma } - 2 \sqrt{\rho d} \delta _i}\) in Condition 2. This together with (32) and (38) yields that

$$\begin{aligned} \begin{aligned}&{\textbf{d}^2_{\textbf{p}}}\left( Z^{(k+2)},X_i^{(k+1)}\right) \le \rho \sum \limits _{j=1}^d{\textbf{d}^2_{\textbf{p}}}\left( Z^{(k+1)},X_j^{(k+1)}\right) + \dfrac{8}{\beta _i} \left( \sqrt{p} \left\| A\right\| ^2_{\textrm{F}}+ \mu n p\right) \\&\le \dfrac{1}{2d} \sum \limits _{j=1}^d{\textbf{d}^2_{\textbf{p}}}\left( Z^{(k+1)},X_j^{(k)}\right) + \left( 1 -\underline{\sigma }^2\right) \le \left( 1 -\underline{\sigma }^2\right) + \left( 1 -\underline{\sigma }^2\right) = 2 \left( 1 -\underline{\sigma }^2\right) , \end{aligned} \end{aligned}$$

since we assume that \(\beta _i > 8 ( \sqrt{p} \left\| A\right\| ^2_{\textrm{F}}+ \mu n p ) / ( 1 - \underline{\sigma }^2 )\) in Condition 2. The proof is completed. \(\square \)

Corollary 11

Suppose \(\{Z^{(k)}\}\) is the iterate sequence generated by Algorithm 2 initiated from \(Z^{(0)} \in \mathcal {S}_{n,p}\), and the problem parameters satisfy Conditions 1 and 2. Then for any \(k \in \mathbb {N}\), we can obtain that

$$\begin{aligned} \begin{aligned}&\mathcal {L}( Z^{(k+1)}, \{X_i^{(k)}\}, \{\Lambda _i^{(k)}\} ) - \mathcal {L}( Z^{(k+1)}, \{X_i^{(k+1)}\}, \{\Lambda _i^{(k)}\} ) \\&\ge \dfrac{1}{4} \underline{\sigma }^2 \sum \limits _{i=1}^dc_i\beta _i {\textbf{d}^2_{\textbf{p}}}\left( Z^{(k+1)},X_i^{(k)}\right) . \end{aligned} \end{aligned}$$

Proof

This corollary directly follows from Lemma 9 and Lemma 10. \(\square \)

Corollary 12

Suppose \(\{Z^{(k)}\}\) is the iterate sequence generated by Algorithm 2 initiated from \(Z^{(0)} \in \mathcal {S}_{n,p}\), and problem parameters satisfy Conditions 1 and 2. Then for any \(k \in \mathbb {N}\), we can acquire that

$$\begin{aligned} \begin{aligned}&\mathcal {L}( Z^{(k+1)}, \{X_i^{(k+1)}\}, \{\Lambda _i^{(k)}\} ) - \mathcal {L}( Z^{(k+1)}, \{X_i^{(k+1)}\}, \{\Lambda _i^{(k+1)}\} ) \\&\ge - \dfrac{\sqrt{2 \rho d} + 1}{\rho d} \sum \limits _{i=1}^d\left\| A_i\right\| _2^2 {\textbf{d}^2_{\textbf{p}}}\left( Z^{(k+1)},X_i^{(k)}\right) . \end{aligned} \end{aligned}$$

Proof

According to the Cauchy-Schwarz inequality, we can show that

$$\begin{aligned} \begin{aligned}&\left| \left\langle \Lambda _i^{(k+1)} - \Lambda _i^{(k)}, {\mathbf {D_p}}\left( X_i^{(k+1)},Z^{(k+1)}\right) \right\rangle \right| \le \left\| \Lambda _i^{(k+1)} - \Lambda _i^{(k)} \right\| _{\textrm{F}}{\mathbf {d_p}}\left( Z^{(k+1)},X_i^{(k+1)}\right) \\&\le \sqrt{\dfrac{8}{\rho d}} \left\| A_i\right\| _2^2 {\mathbf {d_p}}\left( X_i^{(k+1)},X_i^{(k)}\right) {\mathbf {d_p}}\left( Z^{(k+1)},X_i^{(k)}\right) , \end{aligned} \end{aligned}$$

where the last inequality follows from Lemma B.4 in [48] and (41). In addition, we have

$$\begin{aligned} \begin{aligned} {\mathbf {d_p}}\left( X_i^{(k+1)},X_i^{(k)}\right) \le {}&{\mathbf {d_p}}\left( Z^{(k+1)},X_i^{(k+1)}\right) + {\mathbf {d_p}}\left( Z^{(k+1)},X_i^{(k)}\right) \\ \le {}&\dfrac{\sqrt{2 \rho d} + 1}{\sqrt{2 \rho d}} {\mathbf {d_p}}\left( Z^{(k+1)},X_i^{(k)}\right) , \end{aligned} \end{aligned}$$

which implies that

$$\begin{aligned} \begin{aligned}&\left\langle \Lambda _i^{(k+1)} - \Lambda _i^{(k)}, {\mathbf {D_p}}\left( X_i^{(k+1)},Z^{(k+1)}\right) \right\rangle \\&\ge - \dfrac{2 \left( \sqrt{2 \rho d} + 1 \right) }{\rho d} \left\| A_i\right\| _2^2 {\textbf{d}^2_{\textbf{p}}}\left( Z^{(k+1)},X_i^{(k)}\right) . \end{aligned} \end{aligned}$$

Combining this with the fact that

$$\begin{aligned} \begin{aligned}&\mathcal {L}( Z^{(k+1)}, \{X_i^{(k+1)}\}, \{\Lambda _i^{(k)}\} ) - \mathcal {L}( Z^{(k+1)}, \{X_i^{(k+1)}\}, \{\Lambda _i^{(k+1)}\} ) \\&= \dfrac{1}{2} \sum \limits _{i=1}^d\left\langle \Lambda _i^{(k+1)}-\Lambda _i^{(k)}, {\mathbf {D_p}}\left( X_i^{(k+1)},Z^{(k+1)}\right) \right\rangle , \end{aligned} \end{aligned}$$

we complete the proof. \(\square \)

Now, based on these lemmas and corollaries, we can show that the sequence \(\{ \mathcal {L}( Z^{(k)}, \{X_i^{(k)}\}, \{\Lambda _i^{(k)}\} ) \}\) is monotonically non-increasing, which leads to the global convergence of our algorithm.

Proposition 13

Suppose \(\{Z^{(k)}\}\) is the iterate sequence generated by Algorithm 2 initiated from \(Z^{(0)} \in \mathcal {S}_{n,p}\), and problem parameters satisfy Conditions 1 and 2. Then the sequence of augmented Lagrangian functions \(\{ \mathcal {L}( Z^{(k)}, \{X_i^{(k)}\}, \{\Lambda _i^{(k)}\} ) \}\) is monotonically non-increasing, and for any \(k \in \mathbb {N}\), it satisfies the following sufficient descent property:

$$\begin{aligned} \begin{aligned}&\mathcal {L}( Z^{(k)}, \{X_i^{(k)}\}, \{\Lambda _i^{(k)}\} ) - \mathcal {L}( Z^{(k+1)}, \{X_i^{(k+1)}\}, \{\Lambda _i^{(k+1)}\} ) \\&\ge \sum \limits _{i=1}^dJ_i {\textbf{d}^2_{\textbf{p}}}\left( Z^{(k+1)},X_i^{(k+1)}\right) + \bar{M} \left\| D^{(k)}\right\| ^2_{\textrm{F}}, \end{aligned} \end{aligned}$$
(42)

where \(J_i = \dfrac{1}{2} \rho d \underline{\sigma }^2 c_i \beta _i - 2 (\sqrt{2 \rho d} + 1) \left\| A_i\right\| _2^2 > 0\) is a constant.

Proof

Combining Corollary 7, Corollary 11, and Corollary 12, we obtain that

$$\begin{aligned} \begin{aligned}&\mathcal {L}( Z^{(k)}, \{X_i^{(k)}\}, \{\Lambda _i^{(k)}\} ) - \mathcal {L}( Z^{(k+1)}, \{X_i^{(k+1)}\}, \{\Lambda _i^{(k+1)}\} ) \\&\ge \sum \limits _{i=1}^d\left( \dfrac{1}{4} \underline{\sigma }^2 c_i \beta _i - \dfrac{\sqrt{2 \rho d} + 1}{\rho d} \left\| A_i \right\| _2^2 \right) {\textbf{d}^2_{\textbf{p}}}\left( Z^{(k+1)},X_i^{(k)}\right) + \bar{M} \left\| D^{(k)}\right\| ^2_{\textrm{F}}. \end{aligned} \end{aligned}$$

Recalling the relationship \(\beta _i > 4 ( \sqrt{2 \rho d} + 1) \left\| A_i\right\| _2^2 / (\rho d \underline{\sigma }^2 c_i)\) in Condition 2, we can conclude that \(\mathcal {L}( Z^{(k)}, \{X_i^{(k)}\}, \{\Lambda _i^{(k)}\} ) \ge \mathcal {L}( Z^{(k+1)}, \{X_i^{(k+1)}\}, \{\Lambda _i^{(k+1)}\} )\). Hence, the sequence \(\{\mathcal {L}( Z^{(k)}, \{X_i^{(k)}\}, \{\Lambda _i^{(k)}\} )\}\) is monotonically non-increasing. Finally, the above relationship together with (41) yields the assertion (42). The proof is finished. \(\square \)

Based on the above properties, we are ready to prove Theorem 4, which establishes the global convergence rate of our proposed algorithm.

Proof of Theorem 4

The whole sequence \(\{ (Z^{(k)}, \{X_i^{(k)}\}) \}\) is bounded, since \(Z^{(k)}\) and each \(X_i^{(k)}\) have orthonormal columns. It then follows from the Bolzano-Weierstrass theorem that this sequence has an accumulation point \((Z^{*}, \{X_i^{*}\})\), where \(Z^{*} \in \mathcal {S}_{n,p}\) and \(X_i^{*} \in \mathcal {S}_{n,p}\). Moreover, the boundedness of \(\{\Lambda _i^{(k)}\}\) results from the multiplier updating formula (15). Hence, the sequence \(\{ \mathcal {L}( Z^{(k)}, \{ X_i^{(k)}\}, \{\Lambda _i^{(k)}\} ) \}\) is bounded from below, owing to the continuity of the augmented Lagrangian function and the boundedness of its arguments. Namely, there exists a constant \(\underline{L}\) such that \(\mathcal {L}( Z^{(k)}, \{X_i^{(k)}\}, \{\Lambda _i^{(k)}\} ) \ge \underline{L}\) for all \(k \in \mathbb {N}\).

It follows from the sufficient descent property (42) that

$$\begin{aligned} \sum \limits _{k=1}^{K} \left\| D^{(k)}\right\| ^2_{\textrm{F}} \le {}& \bar{M}^{-1}\sum \limits _{k=1}^{K} \left( \mathcal {L}( Z^{(k)}, \{X_i^{(k)}\}, \{\Lambda _i^{(k)}\} ) - \mathcal {L}( Z^{(k+1)}, \{X_i^{(k+1)}\}, \{\Lambda _i^{(k+1)}\} ) \right) \\ ={}& \bar{M}^{-1}\left( \mathcal {L}( Z^{(1)}, \{X_i^{(1)}\}, \{\Lambda _i^{(1)}\} ) - \mathcal {L}( Z^{(K+1)}, \{X_i^{(K+1)}\}, \{\Lambda _i^{(K+1)}\} ) \right) \\ \le {}& \bar{M}^{-1}\left( \mathcal {L}( Z^{(1)}, \{X_i^{(1)}\}, \{\Lambda _i^{(1)}\} ) - \underline{L} \right) , \end{aligned}$$
(43)

and

$$\begin{aligned} \sum \limits _{k = 1}^{K} {\textbf{d}^2_{\textbf{p}}}\left( Z^{(k)},X_i^{(k)}\right) \le {}& J_i^{-1}\sum \limits _{k=1}^{K} \left( \mathcal {L}( Z^{(k-1)}, \{X_i^{(k-1)}\}, \{\Lambda _i^{(k-1)}\} ) - \mathcal {L}( Z^{(k)}, \{X_i^{(k)}\}, \{\Lambda _i^{(k)}\} ) \right) \\ ={}& J_i^{-1}\left( \mathcal {L}( Z^{(0)}, \{X_i^{(0)}\}, \{\Lambda _i^{(0)}\} ) - \mathcal {L}( Z^{(K)}, \{X_i^{(K)}\}, \{\Lambda _i^{(K)}\} ) \right) \\ \le {}& J_i^{-1}\left( \mathcal {L}( Z^{(0)}, \{X_i^{(0)}\}, \{\Lambda _i^{(0)}\} ) - \underline{L} \right) . \end{aligned}$$
(44)

Upon taking the limit as \(K \rightarrow \infty \), we obtain that

$$\begin{aligned} \sum \limits _{k = 1}^{\infty } \left\| D^{(k)}\right\| ^2_{\textrm{F}}< \infty \text { and } \sum \limits _{k = 1}^{\infty } {\textbf{d}^2_{\textbf{p}}}\left( Z^{(k)},X_i^{(k)}\right) < \infty , \end{aligned}$$

which further implies that

$$\begin{aligned} \lim \limits _{k\rightarrow \infty } \left\| D^{(k)} \right\| _{\textrm{F}}= 0 \text { and } \lim \limits _{k\rightarrow \infty } {\mathbf {d_p}}\left( Z^{(k)},X_i^{(k)}\right) = 0, \end{aligned}$$

respectively. Combining this with Lemma 3, we conclude that any accumulation point \(Z^{*}\) of the sequence \(\{Z^{(k)}\}\) is a first-order stationary point of problem (2).

Finally, we establish the sublinear convergence rate. It follows from the inequalities (43) and (44) that

$$\begin{aligned} \min \limits _{k = 1, \dotsc , K} \left\{ \left\| D^{(k)}\right\| ^2_{\textrm{F}}+\dfrac{1}{d} \sum \limits _{i=1}^d{\textbf{d}^2_{\textbf{p}}}\left( Z^{(k)},X_i^{(k)}\right) \right\} \le \dfrac{1}{K} \sum \limits _{k = 1}^K \left\{ \left\| D^{(k)}\right\| ^2_{\textrm{F}}+\dfrac{1}{d} \sum \limits _{i=1}^d{\textbf{d}^2_{\textbf{p}}}\left( Z^{(k)},X_i^{(k)}\right) \right\} \le \dfrac{C}{K}, \end{aligned}$$

where

$$\begin{aligned} C ={}& \bar{M}^{-1}\left( \mathcal {L}( Z^{(1)}, \{X_i^{(1)}\}, \{\Lambda _i^{(1)}\} ) - \underline{L} \right) \\ &+ \left( \sum \limits _{i=1}^dJ_i^{-1}\right) d^{-1}\left( \mathcal {L}( Z^{(0)}, \{X_i^{(0)}\}, \{\Lambda _i^{(0)}\} ) - \underline{L} \right) \end{aligned}$$

is a positive constant. This completes the proof. \(\square \)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, L., Liu, X. & Zhang, Y. A communication-efficient and privacy-aware distributed algorithm for sparse PCA. Comput Optim Appl 85, 1033–1072 (2023). https://doi.org/10.1007/s10589-023-00481-4

