Geometric Optimization in Machine Learning

Algorithmic Advances in Riemannian Geometry and Applications

Part of the book series: Advances in Computer Vision and Pattern Recognition (ACVPR)

Abstract

Machine learning models often rely on sparsity, low-rank, orthogonality, correlation, or graphical structure. The structure of interest in this chapter is geometric, specifically the manifold of positive definite (PD) matrices. Though these matrices recur throughout the applied sciences, our focus is on more recent developments in machine learning and optimization. In particular, we study (i) models that might be nonconvex in the Euclidean sense but are convex along the PD manifold; and (ii) models that are neither Euclidean convex nor geodesically convex but are nevertheless amenable to global optimization. We cover basic theory for (i) and (ii); subsequently, we present a scalable Riemannian limited-memory BFGS algorithm (that also applies to other manifolds). We highlight some applications from statistics and machine learning that benefit from the geometric structure studied.
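
As a small illustration of case (i), the following sketch checks numerically that \(\log\det\) is not convex in the Euclidean sense (it is concave) yet is linear, and therefore convex, along geodesics of the PD manifold. It assumes the affine-invariant Riemannian metric, under which the geodesic from \(A\) to \(B\) is \(A^{1/2}(A^{-1/2} B A^{-1/2})^{t} A^{1/2}\); the abstract does not name the metric, and the helper names `spd_power`, `geodesic`, `logdet`, and `random_spd` are hypothetical, chosen only for this example.

```python
import numpy as np


def spd_power(M, p):
    """Return M**p for a symmetric positive definite M via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * w**p) @ V.T


def geodesic(A, B, t):
    """Point at parameter t on the geodesic from A to B.

    Assumes the affine-invariant metric on PD matrices (an assumption of
    this sketch, not stated in the abstract).
    """
    A_half = spd_power(A, 0.5)
    A_inv_half = spd_power(A, -0.5)
    middle = A_inv_half @ B @ A_inv_half
    middle = (middle + middle.T) / 2  # remove round-off asymmetry
    return A_half @ spd_power(middle, t) @ A_half


def logdet(X):
    """Log-determinant of a PD matrix, computed stably."""
    _, value = np.linalg.slogdet(X)
    return value


def random_spd(n, rng):
    """Random, well-conditioned PD matrix (illustrative data only)."""
    G = rng.standard_normal((n, n))
    return G @ G.T + n * np.eye(n)


rng = np.random.default_rng(0)
A, B = random_spd(4, rng), random_spd(4, rng)
t = 0.3

# Interpolated function value between the two endpoints.
chord = (1 - t) * logdet(A) + t * logdet(B)
# Value at the geodesic point: matches the chord (geodesic linearity).
g_val = logdet(geodesic(A, B, t))
# Value at the straight-line point: lies above the chord (Euclidean concavity).
e_val = logdet((1 - t) * A + t * B)

print("chord            :", chord)
print("along geodesic   :", g_val)
print("along line segment:", e_val)
```

The geodesic value agrees with the chord up to round-off, while the straight-line value exceeds it, which is the Euclidean convexity inequality failing for this function.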

Notes

  1. This reformulation essentially uses the “natural parameters.”

  2. Actually, we solve a slightly different unconstrained problem that also reparameterizes \(\alpha_j\).

  3. Available at UCI machine learning dataset repository via https://archive.ics.uci.edu/ml/datasets.

Acknowledgments

SS acknowledges partial support from NSF grant IIS-1409802.

Corresponding author

Correspondence to Suvrit Sra.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Sra, S., Hosseini, R. (2016). Geometric Optimization in Machine Learning. In: Minh, H., Murino, V. (eds) Algorithmic Advances in Riemannian Geometry and Applications. Advances in Computer Vision and Pattern Recognition. Springer, Cham. https://doi.org/10.1007/978-3-319-45026-1_3

  • DOI: https://doi.org/10.1007/978-3-319-45026-1_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45025-4

  • Online ISBN: 978-3-319-45026-1

  • eBook Packages: Computer Science (R0)
