Geometric Optimization in Machine Learning

Sra, Suvrit; Hosseini, Reshad

doi:10.1007/978-3-319-45026-1_3

Suvrit Sra⁴ &
Reshad Hosseini⁵

Part of the book series: Advances in Computer Vision and Pattern Recognition ((ACVPR))

2439 Accesses
2 Citations

Abstract

Machine learning models often rely on sparsity, low-rank, orthogonality, correlation, or graphical structure. The structure of interest in this chapter is geometric, specifically the manifold of positive definite (PD) matrices. Though these matrices recur throughout the applied sciences, our focus is on more recent developments in machine learning and optimization. In particular, we study (i) models that might be nonconvex in the Euclidean sense but are convex along the PD manifold; and (ii) ones that are neither Euclidean nor geodesic convex but are nevertheless amenable to global optimization. We cover basic theory for (i) and (ii); subsequently, we present a scalable Riemannian limited-memory BFGS algorithm (that also applies to other manifolds). We highlight some applications from statistics and machine learning that benefit from the geometric structure studies.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 119.00; Price excludes VAT (USA)

Softcover Book: USD 159.99; Price excludes VAT (USA)

Hardcover Book: USD 159.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
This reformulation essentially uses the “natural parameters.”
2.
Actually, we solve a slightly different unconstrained problem that also reparameterizes \(\alpha _j\).
3.
Available at UCI machine learning dataset repository via https://archive.ics.uci.edu/ml/datasets.

References

P.A. Absil, R. Mahony, R. Sepulchre, Optimization Algorithms on Matrix Manifolds (Princeton University Press, Princeton, 2009)
MATH Google Scholar
M. Arnaudon, F. Barbaresco, L. Yang, Riemannian medians and means with applications to radar signal processing. IEEE J. Sel. Top. Signal Process. 7(4), 595–604 (2013)
Article Google Scholar
D. Arthur, S. Vassilvitskii, K-means++: the advantages of careful seeding, in Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA) (2007), pp. 1027–1035
Google Scholar
M. Bacák, Convex Analysis and Optimization in Hadamard Spaces, vol. 22 (Walter de Gruyter GmbH & Co KG, Berlin, 2014)
MATH Google Scholar
F. Bach, R. Jenatton, J. Mairal, G. Obozinski, Optimization with sparsity-inducing penalties. Foundations and Trends\({\textregistered }\) in Machine Learning 4(1), 1–106 (2012)
Google Scholar
R. Bhatia, Positive Definite Matrices (Princeton University Press, Princeton, 2007)
MATH Google Scholar
R. Bhatia, R.L. Karandikar, The matrix geometric mean. Technical report, isid/ms/2-11/02, Indian Statistical Institute (2011)
Google Scholar
D.A. Bini, B. Iannazzo, Computing the Karcher mean of symmetric positive definite matrices. Linear Algebra Appl. 438(4), 1700–1710 (2013)
Article MathSciNet MATH Google Scholar
D.A. Bini, B. Iannazzo, B. Jeuris, R. Vandebril, Geometric means of structured matrices. BIT Numer. Math. 54(1), 55–83 (2014)
Article MathSciNet MATH Google Scholar
C.M. Bishop, Pattern Recognition and Machine Learning (Springer, New York, 2007)
MATH Google Scholar
N. Boumal, Optimization and estimation on manifolds. Ph.D. thesis, Université catholique de Louvain (2014)
Google Scholar
N. Boumal, B. Mishra, P.A. Absil, R. Sepulchre, Manopt, a matlab toolbox for optimization on manifolds. J. Mach. Learn. Res. 15(1), 1455–1459 (2014)
MATH Google Scholar
M.R. Bridson, A. Haefliger, Metric Spaces of Non-positive Curvature, vol. 319 (Springer Science & Business Media, Berlin, 1999)
MATH Google Scholar
S. Burer, R.D. Monteiro, Y. Zhang, Solving semidefinite programs via nonlinear programming. part i: transformations and derivatives. Technical report, TR99-17, Rice University, Houston TX (1999)
Google Scholar
Z. Chebbi, M. Moahker, Means of Hermitian positive-definite matrices based on the log-determinant \(\alpha \)-divergence function. Linear Algebra Appl. 436, 1872–1889 (2012)
Article MathSciNet MATH Google Scholar
A. Cherian, S. Sra, Riemannian dictionary learning and sparse coding for positive definite matrices. IEEE Trans. Neural Netw. Learn. Syst. (2015) (Submitted)
Google Scholar
A. Cherian, S. Sra, Positive definite matrices: data representation and applications to computer vision, Riemannian Geometry in Machine Learning, Statistics, Optimization, and Computer Vision, Advances in Computer Vision and Pattern Recognition (Springer, New York, 2016) (this book)
Google Scholar
A. Cherian, S. Sra, A. Banerjee, N. Papanikolopoulos, Jensen-Bregman logdet divergence for efficient similarity computations on positive definite tensors. IEEE Trans. Pattern Anal. Mach. Intell. (2012)
Google Scholar
S. Dasgupta, Learning mixtures of Gaussians, in 40th Annual Symposium on Foundations of Computer Science (IEEE, 1999), pp. 634–644
Google Scholar
A.P. Dempster, N.M. Laird, D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B 39, 1–38 (1977)
MathSciNet MATH Google Scholar
R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, 2nd edn. (Wiley, New York, 2000)
MATH Google Scholar
R. Hosseini, M. Mash’al, Mixest: an estimation toolbox for mixture models (2015). arXiv:1507.06065
R. Hosseini, S. Sra, Matrix manifold optimization for Gaussian mixtures, in Advances in Neural Information Processing Systems (NIPS) (2015)
Google Scholar
J.B. Hough, M. Krishnapur, Y. Peres, B. Virág et al., Determinantal processes and independence. Probab. Surv. 3, 206–229 (2006)
Article MathSciNet MATH Google Scholar
W. Huang, K.A. Gallivan, P.A. Absil, A Broyden class of quasi-Newton methods for Riemannian optimization. SIAM J. Optim. 25(3), 1660–1685 (2015)
Article MathSciNet MATH Google Scholar
B. Jeuris, R. Vandebril, B. Vandereycken, A survey and comparison of contemporary algorithms for computing the matrix geometric mean. Electron. Trans. Numer. Anal. 39, 379–402 (2012)
MathSciNet MATH Google Scholar
J.T. Kent, D.E. Tyler, Redescending M-estimates of multivariate location and scatter. Ann. Stat. 19(4), 2102–2119 (1991)
Article MathSciNet MATH Google Scholar
D. Le Bihan, J.F. Mangin, C. Poupon, C.A. Clark, S. Pappata, N. Molko, H. Chabriat, Diffusion tensor imaging: concepts and applications. J. Magn. Reson. Imaging 13(4), 534–546 (2001)
Article Google Scholar
H. Lee, Y. Lim, Invariant metrics, contractions and nonlinear matrix equations. Nonlinearity 21, 857–878 (2008)
Article MathSciNet MATH Google Scholar
J.M. Lee, Introduction to Smooth Manifolds, vol. 218, GTM (Springer, New York, 2012)
Book Google Scholar
B. Lemmens, R. Nussbaum, Nonlinear Perron-Frobenius Theory (Cambridge University Press, Cambridge, 2012)
Book MATH Google Scholar
Y. Lim, M. Pálfia, Matrix power means and the Karcher mean. J. Funct. Anal. 262, 1498–1514 (2012)
Article MathSciNet MATH Google Scholar
J. Ma, L. Xu, M.I. Jordan, Asymptotic convergence rate of the EM algorithm for Gaussian mixtures. Neural Comput. 12(12), 2881–2907 (2000)
Article Google Scholar
Z. Mariet, S. Sra, Diversity networks (2015). arXiv:1511.05077
Z. Mariet, S. Sra, Fixed-point algorithms for learning determinantal point processes, in International Conference on Machine Learning (ICML) (2015)
Google Scholar
J. Masci, D. Boscaini, M.M. Bronstein, P. Vandergheynst, ShapeNet: convolutional neural networks on non-Euclidean manifolds (2015). arXiv:1501.06297
G.J. McLachlan, D. Peel, Finite Mixture Models (Wiley, New Jersey, 2000)
Book MATH Google Scholar
A. Mehrjou, R. Hosseini, B.N. Araabi, Mixture of ICAs model for natural images solved by manifold optimization method, in 7th International Conference on Information and Knowledge Technology (2015)
Google Scholar
B. Mishra, A Riemannian approach to large-scale constrained least-squares with symmetries. Ph.D. thesis, Université de Namur (2014)
Google Scholar
M. Moakher, A differential geometric approach to the geometric mean of symmetric positive-definite matrices. SIAM J. Matrix Anal. Appl. (SIMAX) 26, 735–747 (2005)
Article MathSciNet MATH Google Scholar
K.P. Murphy, Machine Learning: A Probabilistic Perspective (MIT Press, Cambridge, 2012)
MATH Google Scholar
F. Nielsen, R. Bhatia (eds.), Matrix Information Geometry (Springer, New York, 2013)
Google Scholar
E. Ollila, D. Tyler, V. Koivunen, H.V. Poor, Complex elliptically symmetric distributions: survey, new results and applications. IEEE Trans. Signal Process. 60(11), 5597–5625 (2011)
Article MathSciNet Google Scholar
R.A. Redner, H.F. Walker, Mixture densities, maximum likelihood, and the EM algorithm. Siam Rev. 26, 195–239 (1984)
Article MathSciNet MATH Google Scholar
W. Ring, B. Wirth, Optimization methods on Riemannian manifolds and their application to shape space. SIAM J. Optim. 22(2), 596–627 (2012)
Article MathSciNet MATH Google Scholar
B. Schölkopf, A.J. Smola, Learning with Kernels (MIT Press, Cambridge, 2002)
MATH Google Scholar
A. Shrivastava, P. Li, A new space for comparing graphs, in IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM) (IEEE, 2014), pp. 62–71
Google Scholar
S. Sra, On the matrix square root and geometric optimization (2015). arXiv:1507.08366
S. Sra, Positive definite matrices and the S-divergence, in Proceedings of the American Mathematical Society (2015). arXiv:1110.1773v4
Google Scholar
S. Sra, R. Hosseini, Geometric optimisation on positive definite matrices for elliptically contoured distributions, in Advances in Neural Information Processing Systems (2013), pp. 2562–2570
Google Scholar
S. Sra, R. Hosseini, Conic geometric optimisation on the manifold of positive definite matrices. SIAM J. Optim. 25(1), 713–739 (2015)
Article MathSciNet MATH Google Scholar
S. Sra, R. Hosseini, L. Theis, M. Bethge, Data modeling with the elliptical gamma distribution, in Artificial Intelligence and Statistics (AISTATS), vol. 18 (2015)
Google Scholar
A.C. Thompson, On certain contraction mappings in partially ordered vector space. Proc. AMS 14, 438–443 (1963)
MathSciNet MATH Google Scholar
R. Tibshirani, Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B (Methodol.) 58, 267–288 (1996)
MathSciNet MATH Google Scholar
C. Udrişte, Convex Functions and Optimization Methods on Riemannian Manifolds (Kluwer, Dordrecht, 1994)
Book MATH Google Scholar
R.J. Vanderbei, H.Y. Benson, On formulating semidefinite programming problems as smooth convex nonlinear optimization problems. Technical report, Princeton (2000)
Google Scholar
B. Vandereycken, Riemannian and multilevel optimization for rank-constrained matrix problems. Ph.D. thesis, Department of Computer Science, KU Leuven (2010)
Google Scholar
J.J. Verbeek, N. Vlassis, B. Kröse, Efficient greedy learning of Gaussian mixture models. Neural Comput. 15(2), 469–485 (2003)
Article MATH Google Scholar
A. Wiesel, Geodesic convexity and covariance estimation. IEEE Trans. Signal Process. 60(12), 6182–6189 (2012)
Article MathSciNet Google Scholar
A. Wiesel, Unified framework to regularized covariance estimation in scaled Gaussian models. IEEE Trans. Signal Process. 60(1), 29–38 (2012)
Article MathSciNet Google Scholar
L. Xu, M.I. Jordan, On convergence properties of the EM algorithm for Gaussian mixtures. Neural Comput. 8, 129–151 (1996)
Article Google Scholar
F. Yger, A review of kernels on covariance matrices for BCI applications, in IEEE International Workshop on Machine Learning for Signal Processing (MLSP) (IEEE, 2013), pp. 1–6
Google Scholar
J. Zhang, L. Wang, L. Zhou, W. Li, Learning discriminative Stein Kernel for SPD matrices and its applications (2014). arXiv:1407.1974
T. Zhang, Robust subspace recovery by geodesically convex optimization (2012). arXiv:1206.1386
T. Zhang, A. Wiesel, S. Greco, Multivariate generalized Gaussian distribution: convexity and graphical models. IEEE Trans. Signal Process. 60(11), 5597–5625 (2013)
MathSciNet Google Scholar
D. Zoran, Y. Weiss, Natural images, Gaussian mixtures and dead leaves, in Advances in Neural Information Processing Systems (2012), pp. 1736–1744
Google Scholar

Download references

Acknowledgments

SS acknowledges partial support from NSF grant IIS-1409802.

Author information

Authors and Affiliations

Laboratory for Information & Decision Systems (LIDS), Massachusetts Institute of Technology, Cambridge, MA, USA
Suvrit Sra
School of ECE, College of Engineering, University of Tehran, Tehran, Iran
Reshad Hosseini

Authors

Suvrit Sra
View author publications
You can also search for this author in PubMed Google Scholar
Reshad Hosseini
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Suvrit Sra .

Editor information

Editors and Affiliations

Istituto Italiano di Tecnologia, Pattern Analysis and Computer Vision, Genoa, Italy
Hà Quang Minh
Istituto Italiano di Tecnologia, Pattern Analysis and Computer Vision , Genoa, Italy
Vittorio Murino

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Sra, S., Hosseini, R. (2016). Geometric Optimization in Machine Learning. In: Minh, H., Murino, V. (eds) Algorithmic Advances in Riemannian Geometry and Applications. Advances in Computer Vision and Pattern Recognition. Springer, Cham. https://doi.org/10.1007/978-3-319-45026-1_3

Download citation

DOI: https://doi.org/10.1007/978-3-319-45026-1_3
Published: 06 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45025-4
Online ISBN: 978-3-319-45026-1
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics