Abstract
Sparse principal component analysis (PCA) improves the interpretability of classic PCA by introducing sparsity into the dimension-reduction process. Optimization models for sparse PCA, however, are generally non-convex, non-smooth, and more difficult to solve, especially on large-scale datasets requiring distributed computation over a wide network. In this paper, we develop a distributed and centralized algorithm called DSSAL1 for sparse PCA that aims to achieve low communication overhead by adapting a newly proposed subspace-splitting strategy to accelerate convergence. Theoretically, convergence to stationary points is established for DSSAL1. Extensive numerical results show that DSSAL1 requires far fewer rounds of communication than state-of-the-art peer methods. In addition, we make the case that since the messages exchanged in DSSAL1 are well masked, the possibility of private-data leakage in DSSAL1 is much lower than in some other distributed algorithms.
Data availability
The authors declare that all data supporting the findings of this study are available within the article.
Notes
A function f(X) is called orthogonal-transformation invariant if \(f (XO) = f(X)\) for any \(X \in \mathcal {S}_{n,p}\) and \(O \in \mathcal {S}_{p, p}\).
More information at http://lsec.cc.ac.cn/chinese/lsec/LSSC-IVintroduction.pdf.
Our code is downloadable from http://lsec.cc.ac.cn/~liuxin/code.html.
Available from https://eigen.tuxfamily.org/index.php?title=Main_Page.
References
Sjostrand, K., Rostrup, E., Ryberg, C., Larsen, R., Studholme, C., Baezner, H., Ferro, J., Fazekas, F., Pantoni, L., Inzitari, D., et al.: Sparse decomposition and modeling of anatomical shape variation. IEEE Trans. Med. Imaging 26(12), 1625–1635 (2007). https://doi.org/10.1109/TMI.2007.898808
Chen, G., Sullivan, P.F., Kosorok, M.R.: Biclustering with heterogeneous variance. Proc. Natl. Acad. Sci. 110(30), 12253–12258 (2013). https://doi.org/10.1073/pnas.1304376110
Johnstone, I.M., Lu, A.Y.: On consistency and sparsity for principal components analysis in high dimensions. J. Am. Stat. Assoc. 104(486), 682–693 (2009)
Zou, H., Xue, L.: A selective overview of sparse principal component analysis. Proc. IEEE 106(8), 1311–1320 (2018). https://doi.org/10.1109/JPROC.2018.2846588
Wu, T., Hu, E., Xu, S., Chen, M., Guo, P., Dai, Z., Feng, T., Zhou, L., Tang, W., Zhan, L., et al.: ClusterProfiler 4.0: A universal enrichment tool for interpreting omics data. The Innovation 2(3), 100141 (2021). https://doi.org/10.1016/j.xinn.2021.100141
Gravuer, K., Sullivan, J.J., Williams, P.A., Duncan, R.P.: Strong human association with plant invasion success for Trifolium introductions to New Zealand. Proc. Natl. Acad. Sci. 105(17), 6344–6349 (2008). https://doi.org/10.1073/pnas.0712026105
Baden, T., Berens, P., Franke, K., Rosón, M.R., Bethge, M., Euler, T.: The functional diversity of retinal ganglion cells in the mouse. Nature 529(7586), 345–350 (2016). https://doi.org/10.1038/nature16468
Stiefel, E.: Richtungsfelder und fernparallelismus in n-dimensionalen mannigfaltigkeiten. Commentarii Mathematici Helvetici 8(1), 305–353 (1935). https://doi.org/10.3929/ethz-a-000092403
Jolliffe, I.T., Trendafilov, N.T., Uddin, M.: A modified principal component technique based on the LASSO. J. Comput. Graph. Stat. 12(3), 531–547 (2003). https://doi.org/10.1198/1061860032148
Magdon-Ismail, M.: NP-hardness and inapproximability of sparse PCA. Inf. Process. Lett. 126, 35–38 (2017). https://doi.org/10.1016/j.ipl.2017.05.008
Zou, H., Hastie, T., Tibshirani, R.: Sparse principal component analysis. J. Comput. Graph. Stat. 15(2), 265–286 (2006). https://doi.org/10.1198/106186006X113430
d’Aspremont, A., Bach, F., El Ghaoui, L.: Optimal solutions for sparse principal component analysis. J. Mach. Learn. Res. 9(42), 1269–1294 (2008)
d’Aspremont, A., El Ghaoui, L., Jordan, M.I., Lanckriet, G.R.: A direct formulation for sparse PCA using semidefinite programming. SIAM Rev. 49(3), 434–448 (2007). https://doi.org/10.1137/050645506
Shen, H., Huang, J.Z.: Sparse principal component analysis via regularized low rank matrix approximation. J. Multivar. Anal. 99(6), 1015–1034 (2008). https://doi.org/10.1016/j.jmva.2007.06.007
Witten, D.M., Tibshirani, R., Hastie, T.: A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics 10(3), 515–534 (2009). https://doi.org/10.1093/biostatistics/kxp008
Pacheco, P.S.: An Introduction to Parallel Programming. Elsevier, USA (2011). https://doi.org/10.1016/C2009-0-18471-4
McMahan, B., Moore, E., Ramage, D., Hampson, S., Arcas, B.A.y.: Communication-efficient learning of deep networks from decentralized data. In: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, vol. 54, pp. 1273–1282 (2017). PMLR. https://proceedings.mlr.press/v54/mcmahan17a.html
Lou, Y., Yu, L., Wang, S., Yi, P.: Privacy preservation in distributed subgradient optimization algorithms. IEEE Trans. Cybernetics 48(7), 2154–2165 (2017). https://doi.org/10.1109/TCYB.2017.2728644
Zhang, C., Ahmad, M., Wang, Y.: ADMM based privacy-preserving decentralized optimization. IEEE Trans. Inf. Forensics Secur. 14(3), 565–580 (2018). https://doi.org/10.1109/TIFS.2018.2855169
Manton, J.H.: Optimization algorithms exploiting unitary constraints. IEEE Trans. Signal Process. 50(3), 635–650 (2002). https://doi.org/10.1109/78.984753
Nishimori, Y., Akaho, S.: Learning algorithms utilizing quasi-geodesic flows on the Stiefel manifold. Neurocomputing 67, 106–135 (2005). https://doi.org/10.1016/j.neucom.2004.11.035
Abrudan, T.E., Eriksson, J., Koivunen, V.: Steepest descent algorithms for optimization under unitary matrix constraint. IEEE Trans. Signal Process. 56(3), 1134–1147 (2008). https://doi.org/10.1109/tsp.2007.908999
Absil, P.-A., Mahony, R., Sepulchre, R.: Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton (2008). https://doi.org/10.1515/9781400830244
Edelman, A., Arias, T.A., Smith, S.T.: The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl. 20(2), 303–353 (1998). https://doi.org/10.1137/S0895479895290954
Sato, H.: A Dai-Yuan-type Riemannian conjugate gradient method with the weak Wolfe conditions. Comput. Optim. Appl. 64(1), 101–118 (2016). https://doi.org/10.1007/s10589-015-9801-1
Zhu, X.: A Riemannian conjugate gradient method for optimization on the Stiefel manifold. Comput. Optim. Appl. 67(1), 73–110 (2017). https://doi.org/10.1007/s10589-016-9883-4
Wen, Z., Yin, W.: A feasible method for optimization with orthogonality constraints. Math. Program. 142(1–2), 397–434 (2013). https://doi.org/10.1007/s10107-012-0584-1
Jiang, B., Dai, Y.-H.: A framework of constraint preserving update schemes for optimization on Stiefel manifold. Math. Program. 153(2), 535–575 (2015). https://doi.org/10.1007/s10107-014-0816-7
Hu, J., Milzarek, A., Wen, Z., Yuan, Y.: Adaptive quadratically regularized Newton method for Riemannian optimization. SIAM J. Matrix Anal. Appl. 39(3), 1181–1207 (2018). https://doi.org/10.1137/17M1142478
Hu, J., Jiang, B., Lin, L., Wen, Z., Yuan, Y.-X.: Structured quasi-Newton methods for optimization with orthogonality constraints. SIAM J. Sci. Comput. 41(4), 2239–2269 (2019). https://doi.org/10.1137/18M121112X
Absil, P.-A., Baker, C.G., Gallivan, K.A.: Trust-region methods on Riemannian manifolds. Found. Comput. Math. 7(3), 303–330 (2006). https://doi.org/10.1007/s10208-005-0179-9
Gao, B., Liu, X., Chen, X., Yuan, Y.-X.: A new first-order algorithmic framework for optimization problems with orthogonality constraints. SIAM J. Optim. 28(1), 302–332 (2018). https://doi.org/10.1137/16M1098759
Wang, L., Gao, B., Liu, X.: Multipliers correction methods for optimization problems over the Stiefel manifold. CSIAM Trans. Appl. Math. 2(3), 508–531 (2021). https://doi.org/10.4208/csiam-am.SO-2020-0008
Gao, B., Liu, X., Yuan, Y.-X.: Parallelizable algorithms for optimization problems with orthogonality constraints. SIAM J. Sci. Comput. 41(3), 1949–1983 (2019). https://doi.org/10.1137/18m1221679
Xiao, N., Liu, X., Yuan, Y.-X.: A class of smooth exact penalty function methods for optimization problems with orthogonality constraints. Optim. Methods Softw. (2020). https://doi.org/10.1080/10556788.2020.1852236
Ferreira, O., Oliveira, P.: Subgradient algorithm on Riemannian manifolds. J. Optim. Theory Appl. 97(1), 93–104 (1998). https://doi.org/10.1023/A:1022675100677
Ferreira, O.P., Louzeiro, M.S., Prudente, L.F.: Iteration-complexity of the subgradient method on Riemannian manifolds with lower bounded curvature. Optimization 68(4), 713–729 (2019). https://doi.org/10.1080/02331934.2018.1542532
Bacák, M., Bergmann, R., Steidl, G., Weinmann, A.: A second order nonsmooth variational model for restoring manifold-valued images. SIAM J. Sci. Comput. 38(1), 567–597 (2016). https://doi.org/10.1137/15M101988X
Grohs, P., Hosseini, S.: Nonsmooth trust region algorithms for locally Lipschitz functions on Riemannian manifolds. IMA J. Numer. Anal. 36(3), 1167–1192 (2016). https://doi.org/10.1093/imanum/drv043
Hosseini, S., Uschmajew, A.: A Riemannian gradient sampling algorithm for nonsmooth optimization on manifolds. SIAM J. Optim. 27(1), 173–189 (2017). https://doi.org/10.1137/16M1069298
Chen, S., Ma, S., Man-Cho So, A., Zhang, T.: Proximal gradient method for nonsmooth optimization over the Stiefel manifold. SIAM J. Optim. 30(1), 210–239 (2020). https://doi.org/10.1137/18M122457X
Huang, W., Wei, K.: Riemannian proximal gradient methods. Math. Program. (2021). https://doi.org/10.1007/s10107-021-01632-3
Xiao, X., Li, Y., Wen, Z., Zhang, L.: A regularized semi-smooth Newton method with projection steps for composite convex programs. J. Sci. Comput. 76(1), 364–389 (2018)
Lai, R., Osher, S.: A splitting method for orthogonality constrained problems. J. Sci. Comput. 58(2), 431–449 (2014). https://doi.org/10.1007/s10915-013-9740-x
Kovnatsky, A., Glashoff, K., Bronstein, M.M.: MADMM: A generic algorithm for non-smooth optimization on manifolds. In: European Conference on Computer Vision, pp. 680–696. Springer (2016)
Chen, W., Ji, H., You, Y.: An augmented Lagrangian method for \(\ell _{1}\)-regularized optimization problems with orthogonality constraints. SIAM J. Sci. Comput. 38(4), 570–592 (2016). https://doi.org/10.1137/140988875
Hajinezhad, D., Hong, M.: Nonconvex alternating direction method of multipliers for distributed sparse principal component analysis. In: 2015 IEEE Global Conference on Signal and Information Processing, pp. 255–259 (2015). https://doi.org/10.1109/GlobalSIP.2015.7418196
Wang, L., Liu, X., Zhang, Y.: A distributed and secure algorithm for computing dominant SVD based on projection splitting. arXiv:2012.03461 (2020)
Gemp, I., McWilliams, B., Vernade, C., Graepel, T.: EigenGame: PCA as a Nash equilibrium. arXiv:2010.00554 (2020)
Gang, A., Bajwa, W.U.: A linearly convergent algorithm for distributed principal component analysis. arXiv:2101.01300 (2021)
Gang, A., Bajwa, W.U.: FAST-PCA: A fast and exact algorithm for distributed principal component analysis. arXiv:2108.12373 (2021)
Andrade, F.L., Figueiredo, M.A., Xavier, J.: Distributed Picard iteration: application to distributed EM and distributed PCA. arXiv:2106.10665 (2021)
Ye, H., Zhang, T.: DeEPCA: decentralized exact PCA with linear convergence rate. J. Mach. Learn. Res. 22(238), 1–27 (2021)
Chen, S., Garcia, A., Hong, M., Shahrampour, S.: Decentralized Riemannian gradient descent on the Stiefel manifold. In: Proceedings of the 38th International Conference on Machine Learning, vol. 139, pp. 1594–1605 (2021). https://proceedings.mlr.press/v139/chen21g.html
Wang, L., Liu, X.: Decentralized optimization over the Stiefel manifold by an approximate augmented Lagrangian function. IEEE Trans. Signal Process. 70, 3029–3041 (2022). https://doi.org/10.1109/TSP.2022.3182883
Clarke, F.H.: Optimization and Nonsmooth Analysis. SIAM, Philadelphia (1990)
Yang, W.H., Zhang, L.-H., Song, R.: Optimality conditions for the nonlinear programming problems on Riemannian manifolds. Pacific J. Optim. 10(2), 415–434 (2014)
Arrow, K.J., Hurwicz, L., Uzawa, H.: Studies in Linear and Non-linear Programming, vol. 2. Stanford University Press, Stanford (1958)
He, B., You, Y., Yuan, X.: On the convergence of primal-dual hybrid gradient algorithm. SIAM J. Imag. Sci. 7(4), 2526–2537 (2014). https://doi.org/10.1137/140963467
Xiao, N., Liu, X., Yuan, Y.-X.: A penalty-free infeasible approach for a class of nonsmooth optimization problems over the Stiefel manifold. arXiv:2103.03514 (2021)
Rutishauser, H.: Simultaneous iteration method for symmetric matrices. Numer. Math. 16(3), 205–223 (1970). https://doi.org/10.1007/BF02219773
Bento, G.C., Ferreira, O.P., Melo, J.G.: Iteration-complexity of gradient, subgradient and proximal point methods on Riemannian manifolds. J. Optim. Theory Appl. 173(2), 548–562 (2017). https://doi.org/10.1007/s10957-017-1093-4
Jiang, B., Ma, S., So, A.M.-C., Zhang, S.: Vector transport-free SVRG with general retraction for Riemannian optimization: complexity analysis and practical implementation. arXiv:1705.09059 (2017)
Funding
The work of the first author was supported by the National Key R&D Program of China (No. 2020YFA0711900, 2020YFA0711904). The work of the second author was supported in part by the National Natural Science Foundation of China (No. 12125108, 11971466, 12288201, 12021001, 11991021) and the Key Research Program of Frontier Sciences, Chinese Academy of Sciences (No. ZDBS-LY-7022). The work of the third author was supported in part by the Shenzhen Science and Technology Program (No. GXWD20201231105722002-20200901175001001).
Ethics declarations
Competing Interests
The authors have no competing interests to declare that are relevant to the content of this article.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Proof of Lemma 1
Proof of Lemma 1
According to the definition of \(\textrm{Proj}_{\mathcal {T}_{Z} \mathcal {S}_{n,p}} \left( \cdot \right) \), it follows that
where \(R(Z) \in \partial r(Z)\). The proof is completed.\(\square \)
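The display equation of this proof is not reproduced above, but the operator it concerns has a standard closed form: for \(Z \in \mathcal {S}_{n,p}\), the projection onto the tangent space is \(\textrm{Proj}_{\mathcal {T}_{Z} \mathcal {S}_{n,p}}(Y) = Y - Z\,\textrm{sym}(Z^{\top }Y)\) with \(\textrm{sym}(M) = (M + M^{\top })/2\). The following minimal numpy sketch (our own illustration; the variable names are not from the paper) verifies the defining properties of this projection:

```python
import numpy as np

def proj_tangent(Z, Y):
    """Project Y onto the tangent space of the Stiefel manifold at Z:
    Proj(Y) = Y - Z * sym(Z^T Y), where sym(M) = (M + M^T) / 2."""
    M = Z.T @ Y
    return Y - Z @ ((M + M.T) / 2)

rng = np.random.default_rng(0)
n, p = 8, 3
Z, _ = np.linalg.qr(rng.standard_normal((n, p)))  # orthonormal columns
Y = rng.standard_normal((n, p))

D = proj_tangent(Z, Y)
# D is tangent at Z: sym(Z^T D) = 0, i.e. Z^T D is skew-symmetric
assert np.allclose(Z.T @ D + D.T @ Z, np.zeros((p, p)))
# The projection is idempotent
assert np.allclose(proj_tangent(Z, D), D)
# The residual Y - D lies in the normal space {Z S : S symmetric}
S = Z.T @ (Y - D)
assert np.allclose(Y - D, Z @ S) and np.allclose(S, S.T)
```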
Appendix B: Proof of Proposition 2
Proof of Proposition 2
To begin with, we assume that \(\left( Z, \{X_i\} \right) \) is a first-order stationary point. Then there exists \(R(Z) \in \partial r(Z)\) such that
and \(Z^{\top }R(Z)\) is symmetric. Let \(\Theta = Z^{\top }R(Z) \in Z^{\top }\partial r(Z)\), \(\Gamma _i = - X_i^{\top }A_i A_i^{\top }X_i\), and
with \(i=1,\dotsc ,d\). Then the matrices \(\Theta \), \(\Gamma _i\) and \(\Lambda _i\) are symmetric and \(\textrm{rank}\left( \Lambda _i\right) \le 2p\). Moreover, we can deduce that
and
Hence, \(\left( Z, \{X_i\} \right) \) satisfies the conditions in (9) under these specific choices of \(\Theta \), \(\Gamma _i\) and \(\Lambda _i\).
Conversely, we now assume that there exist \(R(Z) \in \partial r(Z)\) and symmetric matrices \(\Theta \), \(\Gamma _i\) and \(\Lambda _i\) such that \(\left( Z, \{X_i\} \right) \) satisfies the conditions in (9). It follows from the first and second equality in (9) that
At the same time, since \(X_i X_i^{\top }= Z Z^{\top }\), we have
Combining the above two equalities with the orthogonality of Z, we arrive at
Left-multiplying both sides of the second equality in (9) by \(Z^{\top }\), we obtain that
which together with the symmetry of \(\Lambda _i\) and \(\Theta \) implies that \(Z^{\top }R(Z)\) is also symmetric. This completes the proof. \(\square \)
Appendix C: Proof of Lemma 3
Proof of Lemma 3
Since \(( Z^{(k)}, \{X_i^{(k)}\} )\) is feasible, we know \(X_i^{(k)} (X_i^{(k)})^{\top }{=} Z^{(k)} (Z^{(k)})^{\top }\) for \(i=1,\dotsc ,d\).
Thus, it can be readily verified that
which implies that
According to Theorem 4.1 in [57], the first-order optimality condition of (18) can be stated as:
Since \(D^{(k)} = 0\) is the global minimizer of (18), we have
We obtain the assertion of this lemma. \(\square \)
Appendix D: Convergence of Algorithm 2
Now we prove Theorem 4 to establish the global convergence of Algorithm 2. In addition to the notation introduced in Sect. 1, we adopt the following throughout the theoretical analysis. The notations \(\textrm{rank}\left( C\right) \) and \(\sigma _{\min } \left( C\right) \) represent the rank and the smallest singular value of \(C\), respectively. For \(X, Y \in \mathcal {S}_{n,p}\), we define \({\mathbf {D_p}}\left( X,Y\right) := XX^{\top }- YY^{\top }\) and \({\mathbf {d_p}}\left( X,Y\right) := \left\| {\mathbf {D_p}}\left( X,Y\right) \right\| _{\textrm{F}}\), which stand for the projection distance matrix and its Frobenius norm, respectively.
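As a concrete illustration of this notation (our own sketch, not part of the paper's implementation), the projection distance can be computed directly from its definition; the code below also checks the identity \({\mathbf {d_p}}^2\left( X,Y\right) = 2p - 2\Vert X^{\top }Y\Vert _{\textrm{F}}^2\), valid for any \(X, Y \in \mathcal {S}_{n,p}\), which underlies several of the bounds in this appendix:

```python
import numpy as np

def dp(X, Y):
    """Projection distance d_p(X, Y) = ||X X^T - Y Y^T||_F."""
    return np.linalg.norm(X @ X.T - Y @ Y.T, "fro")

rng = np.random.default_rng(1)
n, p = 6, 2
X, _ = np.linalg.qr(rng.standard_normal((n, p)))
Y, _ = np.linalg.qr(rng.standard_normal((n, p)))

# d_p depends only on the column spaces: right-multiplying X by an
# orthogonal O leaves X X^T unchanged, so the distance is zero.
O, _ = np.linalg.qr(rng.standard_normal((p, p)))
assert np.isclose(dp(X, X @ O), 0.0)

# Identity: d_p^2(X, Y) = 2p - 2 ||X^T Y||_F^2 for orthonormal X, Y
assert np.isclose(dp(X, Y) ** 2,
                  2 * p - 2 * np.linalg.norm(X.T @ Y, "fro") ** 2)
```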
To begin with, we provide a sketch of our proof. Suppose \(\{Z^{(k)}\}\) is the iterate sequence generated by Algorithm 2, with \(X_i^{(k)}\) and \(\Lambda _i^{(k)}\) being the local variable and multiplier of the i-th agent at the k-th iteration, respectively. The proof includes the following main steps.
1. The sequence \(\{Z^{(k)}\}\) is bounded, and the sequence \(\{ \mathcal {L}( Z^{(k)}, \{X_i^{(k)}\}, \{\Lambda _i^{(k)}\} ) \}\) is bounded from below.
2. The sequence \(\{Z^{(k)}\}\) satisfies \({\textbf{d}^2_{\textbf{p}}}\left( Z^{(k+1)},X_i^{(k)}\right) \le 2 ( 1 -\underline{\sigma }^2 )\), where \(\underline{\sigma }\) is a uniform lower bound on the smallest singular values of the matrices \((X_i^{(k)})^{\top }Z^{(k+1)}\) \((i=1,\dotsc ,d)\).
3. The sequence \(\{ \mathcal {L}( Z^{(k)}, \{X_i^{(k)}\}, \{\Lambda _i^{(k)}\} ) \}\) is monotonically non-increasing, and hence convergent.
4. The sequence \(\{Z^{(k)}\}\) has at least one accumulation point, and any accumulation point is a first-order stationary point of the sparse PCA problem (2).
Next we verify all the items in the above sketch by proving the following lemmas and corollaries.
Lemma 5
Suppose \(\{Z^{(k)}\}\) is the iterate sequence generated by Algorithm 2. Let
Then the following relationship holds for any \(k \in \mathbb {N}\),
Proof
Since \(g^{(k)}\) is strongly convex with modulus \(\dfrac{1}{\eta }\), we have
for any \(D, \hat{D} \in \mathbb {R}^{n\times p}\). In particular, if \(\hat{D}, D \in \mathcal {T}_{Z^{(k)}} \mathcal {S}_{n,p}\), it holds that
It follows from the first-order optimality condition of (18) that \(0 \in \textrm{Proj}_{\mathcal {T}_{Z^{(k)}} \mathcal {S}_{n,p}} \left( \partial g^{(k)} (D^{(k)}) \right) \). Finally, taking \(\hat{D} = 0\) and \(D = D^{(k)}\) in (30) yields the assertion of this lemma. \(\square \)
Lemma 6
Suppose \(Z \in \mathcal {S}_{n,p}\) and \(D \in \mathcal {T}_{Z} \mathcal {S}_{n,p}\). Then it holds that
and
Proof
The proof can be found in, for example, [63]. For the sake of completeness, we provide a proof here. It follows from the orthogonality of Z and the skew-symmetry of \(Z^{\top }D\) that \(Z + D\) has full column rank. This yields that \(\textrm{Proj}_{\mathcal {S}_{n,p}} (Z + D) = (Z + D)F^{-1}\), where \(F = (I_p + D^{\top }D)^{1/2}\). Since \(\textrm{Proj}_{\mathcal {S}_{n,p}} (Z + D) - Z = ( Z (I_p - F) + D ) F^{-1}\), we have
where \(\tilde{\sigma }_1 \ge \cdots \ge \tilde{\sigma }_p \ge 0\) are the singular values of D. Similarly, it follows from the relationship \(\textrm{Proj}_{\mathcal {S}_{n,p}} (Z + D) - Z - D = (Z + D) (F^{-1}- I_p)\) that
which completes the proof. \(\square \)
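Here \(\textrm{Proj}_{\mathcal {S}_{n,p}}\) is the polar projection onto the Stiefel manifold, computable from a thin SVD. The sketch below (our own illustration, assuming \(Z + D\) has full column rank as established in the proof) confirms that for a tangent direction D it coincides with \((Z + D) F^{-1}\), \(F = (I_p + D^{\top }D)^{1/2}\), and that, since Z itself is orthonormal, the projection moves \(Z + D\) by at most \(\Vert D\Vert _{\textrm{F}}\):

```python
import numpy as np

def proj_stiefel(W):
    """Polar projection onto the Stiefel manifold: the orthonormal factor
    U V^T from the thin SVD W = U S V^T, i.e. W (W^T W)^{-1/2}."""
    U, _, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ Vt

rng = np.random.default_rng(2)
n, p = 8, 3
Z, _ = np.linalg.qr(rng.standard_normal((n, p)))
Y = rng.standard_normal((n, p))
M = Z.T @ Y
D = Y - Z @ ((M + M.T) / 2)  # tangent direction: Z^T D is skew-symmetric

W = proj_stiefel(Z + D)
assert np.allclose(W.T @ W, np.eye(p))  # result is orthonormal

# Since Z^T D is skew, (Z + D)^T (Z + D) = I_p + D^T D, so the polar
# projection equals (Z + D)(I_p + D^T D)^{-1/2}.
s, Q = np.linalg.eigh(D.T @ D)
F_inv = Q @ np.diag(1.0 / np.sqrt(1.0 + s)) @ Q.T
assert np.allclose(W, (Z + D) @ F_inv)

# The projection is the nearest orthonormal matrix and Z is feasible, so
# ||Proj(Z + D) - (Z + D)||_F <= ||Z - (Z + D)||_F = ||D||_F.
assert np.linalg.norm(W - (Z + D), "fro") <= np.linalg.norm(D, "fro") + 1e-12
```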
Corollary 7
Suppose \(\{Z^{(k)}\}\) is the iterate sequence generated by Algorithm 2 with the parameters satisfying Condition 1. Then for any \(k \in \mathbb {N}\), it holds that
where \(\bar{M} > 0\) is a constant defined in Sect. 4.
Proof
Firstly, it can be readily verified that
Let \(\bar{q}^{(k)} (Z) = \textrm{tr}(Z^{\top }Q^{(k)} Z) / 2\) be the smooth part of the objective function \(q^{(k)} (Z)\) in (16). Since \(\nabla \bar{q}^{(k)}\) is Lipschitz continuous with Lipschitz constant \(\left\| Q^{(k)}\right\| _{\textrm{F}}\), we have
It follows from Lemma 6 that
and
Combining the above three inequalities, we obtain that
It follows from Lemma 5 that
which implies that
This together with the Lipschitz continuity of r(Z) yields that
Here, \(\bar{M} > 0\) is a constant defined in Sect. 4. According to Condition 1, we know that \(\bar{M} - 1/\eta \le -\bar{M}\). Hence, we finally arrive at
This completes the proof. \(\square \)
Lemma 8
Suppose \(\{Z^{(k)}\}\) is the iterate sequence generated by Algorithm 2 with the parameters satisfying Condition 1. Then for any \(k \in \mathbb {N}\), it can be verified that
where \(\rho \ge 1\) is a constant defined in Sect. 4.
Proof
The inequality (31) directly results in the following relationship.
According to the definition of \(q^{(k)}\), it follows that
By straightforward calculations, we can deduce that
and
The above three inequalities yield that
which further implies that
This completes the proof. \(\square \)
Lemma 9
Suppose \(Z^{(k+1)}\) is the \((k+1)\)-th iterate generated by Algorithm 2 and satisfies the following condition:
where \(\underline{\sigma } \in (0,1)\) is a constant defined in Condition 1. Let the algorithm parameters satisfy Conditions 1 and 2. Then for any \(i=1,\dotsc ,d\), it holds that
and
Proof
It follows from Condition 2 that \(\beta _i > c_i^{\prime } \left\| A_i\right\| _2^2\), which together with (25) yields that
Moreover, it can be checked that
Suppose \(\hat{\sigma }_1, \dotsc , \hat{\sigma }_p\) are the singular values of \((X_i^{(k)})^{\top }Z^{(k+1)}\). It is clear that \(0 \le \hat{\sigma }_j \le 1\) for each \(j = 1, \dotsc , p\), due to the orthogonality of \(X_i^{(k)}\) and \(Z^{(k+1)}\). On the one hand, we have
On the other hand, it follows from \({\textbf{d}^2_{\textbf{p}}}\left( Z^{(k+1)},X_i^{(k)}\right) \le 2 \left( 1 -\underline{\sigma }^2\right) \) that
Let \(Y_i^{(k)} = (X_i^{(k)})^{\top }Z^{(k+1)}(Z^{(k+1)})^{\top }X_i^{(k)}\). By straightforward calculations, we can derive that
Combining (35), (36) and (37), we obtain the assertion (33). Then it follows from the definition of \(h_i^{(k)}\) that
By straightforward calculations, we can obtain that
and
The above three relationships yield (34). We complete the proof. \(\square \)
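The step from a projection-distance bound to a singular-value bound used in this proof can be checked numerically: since \({\mathbf {d_p}}^2\left( Z,X\right) = 2p - 2\Vert X^{\top }Z\Vert _{\textrm{F}}^2\) and every singular value of \(X^{\top }Z\) lies in \([0, 1]\), the bound \({\mathbf {d_p}}^2\left( Z,X\right) \le 2(1 - \underline{\sigma }^2)\) forces \(\sigma _{\min }(X^{\top }Z) \ge \underline{\sigma }\). A small numpy sketch of this implication (our own illustration; `sigma_lb` stands in for \(\underline{\sigma }\)):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, sigma_lb = 10, 3, 0.9  # sigma_lb plays the role of underline-sigma

for _ in range(100):
    X, _ = np.linalg.qr(rng.standard_normal((n, p)))
    # A nearby orthonormal Z, obtained by perturbing and re-orthonormalizing
    Z, _ = np.linalg.qr(X + 0.05 * rng.standard_normal((n, p)))
    dp2 = np.linalg.norm(X @ X.T - Z @ Z.T, "fro") ** 2
    # Identity used in the proof: d_p^2(Z, X) = 2p - 2 ||X^T Z||_F^2
    assert np.isclose(dp2, 2 * p - 2 * np.linalg.norm(X.T @ Z, "fro") ** 2)
    if dp2 <= 2 * (1 - sigma_lb**2):
        # Small projection distance implies the singular-value lower bound
        smin = np.linalg.svd(X.T @ Z, compute_uv=False).min()
        assert smin >= sigma_lb - 1e-8
```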
Lemma 10
Suppose \(\{Z^{(k)}\}\) is the iterate sequence generated by Algorithm 2 initiated from \(Z^{(0)} \in \mathcal {S}_{n,p}\) with the parameters satisfying Conditions 1 and 2. Then for any \(i=1,\dotsc ,d\) and \(k \in \mathbb {N}\), it holds that
Proof
We use mathematical induction to prove this lemma. To begin with, it follows from the inequality (32) that
under the relationship \(\beta _i > 4 (\sqrt{p} \left\| A\right\| ^2_{\textrm{F}}+ \mu n p ) / ( 1 - \underline{\sigma }^2 )\) in Condition 2. Thus, the argument (38) directly holds for \(( Z^{(1)}, \{X_i^{(0)}\} )\). Now, we assume the argument holds at \(( Z^{(k+1)}, \{X_i^{(k)}\} )\), and investigate the situation at \(( Z^{(k+2)}, \{X_i^{(k+1)}\} )\).
According to Condition 2, we have \(12 \sqrt{p} \left\| A_i\right\| ^2_{\textrm{F}}/\beta _i < 2\left( 1 - \underline{\sigma }^2\right) c_i \underline{\sigma }^2\).
Since we assume that \({\textbf{d}^2_{\textbf{p}}}\left( Z^{(k+1)},X_i^{(k)}\right) \le 2 \left( 1 -\underline{\sigma }^2\right) \), it follows from the relationship (34) that
which implies that \(\sigma _{\min } \left( (X_i^{(k+1)})^{\top }Z^{(k+1)} \right) \ge \underline{\sigma }\). Similar to the proof of Lemma 9, we can obtain that
Combining the condition (26) and the equality (36), we have
On the other hand, it follows from the triangle inequality that
Combining this with the inequality (39), it can be verified that
Moreover, according to Lemma B.4 in [48], we have
Combining the above three inequalities, we further obtain that
Together with (40), this yields that
According to Conditions 1 and 2, we have \(\sqrt{2} \underline{\sigma } \beta _i - 8 \left\| A_i\right\| _2^2 > 0\) and \(\underline{\sigma } - 2\sqrt{\rho d} \delta _i > 0\). Thus, it can be verified that
where the last inequality follows from the relationship \(\beta _i > \dfrac{4 \left( 2 \sqrt{\rho d} + \sqrt{2}\right) \left\| A_i\right\| _2^2}{\underline{\sigma } - 2 \sqrt{\rho d} \delta _i}\) in Condition 2. This together with (32) and (38) yields that
since we assume that \(\beta _i > 8 ( \sqrt{p} \left\| A\right\| ^2_{\textrm{F}}+ \mu n p ) / ( 1 - \underline{\sigma }^2 )\) in Condition 2. The proof is completed. \(\square \)
Corollary 11
Suppose \(\{Z^{(k)}\}\) is the iterate sequence generated by Algorithm 2 initiated from \(Z^{(0)} \in \mathcal {S}_{n,p}\), and the problem parameters satisfy Conditions 1 and 2. Then for any \(k \in \mathbb {N}\), we can obtain that
Proof
This corollary directly follows from Lemma 9 and Lemma 10. \(\square \)
Corollary 12
Suppose \(\{Z^{(k)}\}\) is the iterate sequence generated by Algorithm 2 initiated from \(Z^{(0)} \in \mathcal {S}_{n,p}\), and the problem parameters satisfy Conditions 1 and 2. Then for any \(k \in \mathbb {N}\), it holds that
Proof
According to the Cauchy-Schwarz inequality, we can show that
where the last inequality follows from Lemma B.4 in [48] and (41). In addition, we have
which implies that
Combining this with the fact that
we complete the proof. \(\square \)
Based on these lemmas and corollaries, we can now demonstrate that the sequence \(\left\{ \mathcal {L}( Z^{(k)}, \{X_i^{(k)}\}, \{\Lambda _i^{(k)}\} ) \right\} \) is monotonically non-increasing, which results in the global convergence of our algorithm.
Proposition 13
Suppose \(\{Z^{(k)}\}\) is the iterate sequence generated by Algorithm 2 initiated from \(Z^{(0)} \in \mathcal {S}_{n,p}\), and the problem parameters satisfy Conditions 1 and 2. Then the sequence of augmented Lagrangian function values \(\{ \mathcal {L}( Z^{(k)}, \{X_i^{(k)}\}, \{\Lambda _i^{(k)}\} ) \}\) is monotonically non-increasing, and for any \(k \in \mathbb {N}\), it satisfies the following sufficient descent property:
where \(J_i = \dfrac{1}{2} \rho d \underline{\sigma }^2 c_i \beta _i - 2 (\sqrt{2 \rho d} + 1) \left\| A_i\right\| _2^2 > 0\) is a constant.
Proof
Combining Corollary 7, Corollary 11, and Corollary 12, we obtain that
Recalling the relationship \(\beta _i > 4 ( \sqrt{2 \rho d} + 1) \left\| A_i\right\| _2^2 / (\rho d \underline{\sigma }^2 c_i)\) in Condition 2, we can conclude that \(\mathcal {L}( Z^{(k)}, \{X_i^{(k)}\}, \{\Lambda _i^{(k)}\} ) \ge \mathcal {L}( Z^{(k+1)}, \{X_i^{(k+1)}\}, \{\Lambda _i^{(k+1)}\} )\). Hence, the sequence \(\{\mathcal {L}( Z^{(k)}, \{X_i^{(k)}\}, \{\Lambda _i^{(k)}\} )\}\) is monotonically non-increasing. Finally, the above relationship together with (41) yields the assertion (42). The proof is finished. \(\square \)
Based on the above properties, we are ready to prove Theorem 4, which establishes the global convergence rate of our proposed algorithm.
Proof (Proof of Theorem 4)
The whole sequence \(\{ Z^{(k)}, \{X_i^{(k)}\} \}\) is naturally bounded, since each \(X_i^{(k)}\) and \(Z^{(k)}\) has orthonormal columns. It then follows from the Bolzano-Weierstrass theorem that this sequence has an accumulation point \(\{Z^{*}, \{X_i^{*}\}\}\), where \(Z^{*} \in \mathcal {S}_{n,p}\) and \(X_i^{*} \in \mathcal {S}_{n,p}\). Moreover, the boundedness of \(\{\Lambda _i^{(k)}\}\) results from the multiplier updating formula (15). Hence, the lower boundedness of \(\{ \mathcal {L}( Z^{(k)}, \{ X_i^{(k)}\}, \{\Lambda _i^{(k)}\} ) \}\) follows from the continuity of the augmented Lagrangian function. Namely, there exists a constant \(\underline{L}\) such that \(\mathcal {L}( Z^{(k)}, \{X_i^{(k)}\}, \{\Lambda _i^{(k)}\} ) \ge \underline{L}\) for all \(k \in \mathbb {N}\).
It follows from the sufficient descent property (42) that
and
Upon taking the limit as \(K \rightarrow \infty \), we obtain that
which further implies that
respectively. Combining this with Lemma 3, we know that any accumulation point \(Z^{*}\) of the sequence \(\{Z^{(k)}\}\) is a first-order stationary point of problem (2).
Eventually, we prove the sublinear convergence rate. Indeed, it follows from the inequalities (43) and (44) that
where
is a positive constant. This completes the proof. \(\square \)
Cite this article
Wang, L., Liu, X. & Zhang, Y. A communication-efficient and privacy-aware distributed algorithm for sparse PCA. Comput Optim Appl 85, 1033–1072 (2023). https://doi.org/10.1007/s10589-023-00481-4