Geometric Optimization in Machine Learning

Chapter
Part of the Advances in Computer Vision and Pattern Recognition book series (ACVPR)

Abstract

Machine learning models often rely on sparsity, low-rank, orthogonality, correlation, or graphical structure. The structure of interest in this chapter is geometric, specifically the manifold of positive definite (PD) matrices. Though these matrices recur throughout the applied sciences, our focus is on more recent developments in machine learning and optimization. In particular, we study (i) models that may be nonconvex in the Euclidean sense but are geodesically convex on the PD manifold; and (ii) models that are neither Euclidean nor geodesically convex yet are still amenable to global optimization. We cover basic theory for (i) and (ii); subsequently, we present a scalable Riemannian limited-memory BFGS algorithm (which also applies to other manifolds). We close by highlighting applications from statistics and machine learning that benefit from the geometric structures studied.


Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Laboratory for Information & Decision Systems (LIDS), Massachusetts Institute of Technology, Cambridge, USA
  2. School of ECE, College of Engineering, University of Tehran, Tehran, Iran
