Primal-dual optimization algorithms over Riemannian manifolds: an iteration complexity analysis

  • Junyu Zhang
  • Shiqian Ma
  • Shuzhong Zhang
Full Length Paper, Series A


Abstract

In this paper we study nonconvex and nonsmooth multi-block optimization over Euclidean-embedded (smooth) Riemannian submanifolds with coupled linear constraints. Such optimization problems arise naturally in machine learning, statistical learning, compressive sensing, image processing, and tensor PCA, among others. By utilizing the embedding structure, we develop an ADMM-like primal-dual approach based on decoupled, solvable subroutines such as linearized proximal mappings, where the duality is with respect to the embedded Euclidean spaces. First, we introduce the optimality conditions for the aforementioned optimization models, from which the notion of \(\epsilon \)-stationary solutions follows. The main part of the paper shows that the proposed algorithms possess an iteration complexity of \(O(1/\epsilon ^2)\) to reach an \(\epsilon \)-stationary solution. For prohibitively large tensor or machine learning models, we present a sampling-based stochastic algorithm with the same iteration complexity bound in expectation. In case the subproblems are not analytically solvable, a feasible curvilinear line-search variant of the algorithm based on retraction operators is proposed. Finally, we show specifically how the algorithms can be implemented to solve a variety of practical problems, such as the NP-hard maximum bisection problem, \(\ell _q\)-regularized sparse tensor principal component analysis, and community detection. Our preliminary numerical results show the great potential of the proposed methods.
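The paper's algorithms and their analysis appear in the body; as a rough, hypothetical illustration of the approach the abstract describes (an ADMM-like splitting whose manifold step is a linearized proximal/gradient step followed by a retraction), the sketch below applies the idea to a toy sphere-constrained sparse problem. All function names, parameters, and the specific problem instance are the editor's assumptions, not the authors' method.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal map of t * ||.||_1 (exactly solvable subproblem).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def manifold_admm(A, lam=0.1, rho=1.0, step=0.1, iters=200, seed=0):
    """Toy ADMM-like splitting for
        min 0.5 * x'Ax + lam * ||y||_1   s.t.  x = y,  ||x||_2 = 1.
    The x-update takes a gradient (linearized proximal) step on the
    augmented Lagrangian and retracts onto the unit sphere by
    normalization; the y-update is an exact soft-thresholding prox;
    the multiplier u is updated by dual ascent in the embedding space.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    y = x.copy()
    u = np.zeros(n)
    for _ in range(iters):
        # Linearized proximal step for x, then retraction to the sphere.
        grad = A @ x + rho * (x - y + u)
        x = x - step * grad
        x /= np.linalg.norm(x)
        # Exactly solvable y-subproblem (soft-thresholding).
        y = soft_threshold(x + u, lam / rho)
        # Dual update on the scaled multiplier.
        u = u + (x - y)
    return x, y, u
```

This is only a minimal sketch under simplifying assumptions (a single manifold block, normalization as the retraction); the paper treats general multi-block problems with coupled linear constraints and proves the \(O(1/\epsilon ^2)\) complexity bound.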


Keywords

Nonconvex and nonsmooth optimization · Riemannian manifold · \(\epsilon \)-Stationary solution · ADMM · Iteration complexity

Mathematics Subject Classification

90C60 · 90C90



Acknowledgements

The authors would like to thank the associate editor and two anonymous reviewers for insightful and constructive comments that helped improve the presentation of this paper. The work of S. Ma was supported in part by a startup package in the Department of Mathematics at the University of California, Davis. The work of S. Zhang was supported in part by the National Science Foundation under Grant CMMI-1462408 and in part by the Shenzhen Fundamental Research Fund under Grant KQTD2015033114415450.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature and Mathematical Optimization Society 2019

Authors and Affiliations

  1. Department of Industrial and Systems Engineering, University of Minnesota, Minneapolis, USA
  2. Department of Mathematics, University of California, Davis, Davis, USA
  3. Institute of Data and Decision Analytics, The Chinese University of Hong Kong, Shenzhen, China
