Coupling matrix manifolds assisted optimization for optimal transport problems

Abstract

Optimal transport (OT) is a powerful tool for measuring the distance between two probability distributions. In this paper, we introduce a new manifold, the coupling matrix manifold (CMM), on which each point can be regarded as a transport plan of an OT problem. We first explore the Riemannian geometry of CMM under a metric derived from the Fisher information. This geometry provides a general framework for solving all types of OT problems by applying numerical Riemannian optimization algorithms, such as gradient descent and trust-region methods, on the CMM. The proposed approach is validated on several OT problems in comparison with recent state-of-the-art methods. For the classic OT problem and its entropy-regularized variant, our method is comparable to classic algorithms such as linear programming and the Sinkhorn algorithm. For non-entropy-regularized OT problems, our method outperforms previous approaches, which did not incorporate the geometric structure of the OT feasible set.
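For reference, the entropy-regularized baseline named in the abstract can be sketched with the standard Sinkhorn iterations. This is a minimal illustration of that classic algorithm, not the paper's Riemannian method; the function name, regularization strength, and iteration count are our own choices:

```python
import numpy as np

def sinkhorn(a, b, C, reg=0.1, n_iter=1000):
    """Entropy-regularized OT via Sinkhorn iterations.

    a, b : source and target marginals (1-D arrays summing to 1)
    C    : cost matrix of shape (len(a), len(b))
    reg  : entropic regularization strength
    Returns the coupling (transport plan) matrix.
    """
    K = np.exp(-C / reg)           # Gibbs kernel of the cost matrix
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)          # rescale so columns match b
        u = a / (K @ v)            # rescale so rows match a
    return u[:, None] * K * v[None, :]

# Toy example: two uniform marginals, cost favoring the diagonal.
a = np.array([0.5, 0.5])
b = np.array([0.5, 0.5])
C = np.array([[0.0, 1.0], [1.0, 0.0]])
P = sinkhorn(a, b, C)
# P's row sums match a and its column sums match b,
# i.e. P lies on the coupling polytope.
```

Each iteration alternately rescales rows and columns, so the limit point satisfies both marginal constraints; the Riemannian approach of the paper instead treats this feasible set itself as a manifold.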

Fig. 1, Fig. 2, Fig. 3

Notes

  1. For a matrix \({\mathbf{M}}\), \(\text {diag}({\mathbf{M}})\) denotes the vector formed by \({\mathbf{M}}\)’s diagonal elements; for a vector \({\mathbf{v}}\), \(\text {diag}({\mathbf{v}})\) denotes the diagonal matrix whose diagonal entries come from \({\mathbf{v}}\).

  2. We sincerely thank the authors of Courty et al. (2016) for providing us with the complete simulated two-moon datasets.

  3. Both datasets can be found at http://www.cad.zju.edu.cn/home/dengcai/Data/MLData.html.

  4. http://www.cs.cmu.edu/afs/cs/project/PIE/MultiPie/Multi-Pie/Home.html.
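The \(\text {diag}\) convention in note 1 matches NumPy's `np.diag`, which likewise returns a vector for matrix input and a diagonal matrix for vector input. A quick check (assuming NumPy is available):

```python
import numpy as np

M = np.array([[1, 2],
              [3, 4]])
v = np.array([5, 6])

print(np.diag(M))  # vector of M's diagonal elements: [1 4]
print(np.diag(v))  # diagonal matrix built from v: [[5 0], [0 6]]
```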

References

  1. Absil, P. A., Mahony, R., & Sepulchre, R. (2008). Optimization algorithms on matrix manifolds. Princeton: Princeton University Press.

  2. Altschuler, J., Weed, J., & Rigollet, P. (2017). Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration. In Proceedings of the 31st international conference on neural information processing systems (pp. 1961–1971), Curran Associates Inc., USA, NIPS’17.

  3. Amari, S., & Nagaoka, H. (2007). Methods of information geometry. Providence: American Mathematical Society.

  4. Amari, S., & Nagaoka, H. (2000). Methods of information geometry (chap. Chentsov’s theorem and some historical remarks, pp. 37–40). New York: Oxford University Press.

  5. Ambrogioni, L., Güçlü U, Güçlütürk, Y., Hinne, M., Maris, E., & van Gerven, M. A. J. (2018). Wasserstein variational inference. In Proceedings of the 32nd international conference on neural information processing systems (pp. 2478–2487), Curran Associates Inc., USA, NIPS’18.

  6. Arjovsky, M., Chintala, S., & Bottou, L. (2017). Wasserstein GAN. arXiv:1701.07875.

  7. Bertsekas, D. (1999). Nonlinear programming. Belmont: Athena Scientific.

  8. Bousquet, O., Gelly, S., Tolstikhin, I., Simon-Gabriel, C. J., & Schölkopf, B. (2017). From optimal transport to generative modeling: The VEGAN cookbook. Tech. rep.

  9. Brezis, H. (2018). Remarks on the Monge–Kantorovich problem in the discrete setting. Comptes Rendus Mathematique, 356(2), 207–213.

  10. Bruzzone, L., & Marconcini, M. (2010). Domain adaptation problems: A DASVM classification technique and a circular validation strategy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(5), 770–787.

  11. Courty, N., Flamary, R., Tuia, D., & Rakotomamonjy, A. (2016). Optimal transport for domain adaptation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(9), 1853–1865.

  12. Cuturi, M. (2013). Sinkhorn distances: Lightspeed computation of optimal transport. Advances in Neural Information Processing Systems, 26, 2292–2300.

  13. Cuturi, M., & Doucet, A. (2014). Fast computation of Wasserstein barycenters. In Xing, E. P., & Jebara, T. (Eds.) Proceedings of the 31st international conference on machine learning (pp. 685–693), Bejing, China, vol 32.

  14. De Loera, J. A., & Kim, E. D. (2014). Combinatorics and geometry of transportation polytopes: An update. Discrete Geometry and Algebraic Combinatorics, 625, 37–76.

  15. Dessein, A., Papadakis, N., & Rouas, J. L. (2018). Regularised optimal transport and the rot mover’s distance. Journal of Machine Learning Research, 19(15), 1–53.

  16. Douik, A., & Hassibi, B. (2019). Manifold optimization over the set of doubly stochastic matrices: A second-order geometry. IEEE Transactions on Signal Processing, 67(22), 5761–5774.

  17. Essid, M., & Solomon, J. (2018). Quadratically regularized optimal transport on graphs. SIAM Journal on Scientific Computing, 40(4), A1961–A1986.

  18. Ferradans, S., Papadakis, N., Peyre, G., & Aujol, J. F. (2014). Regularized discrete optimal transport. SIAM Journal on Imaging Sciences, 7(3), 1853–1882.

  19. Flamary, R., Cuturi, M., Courty, N., & Rakotomamonjy, A. (2018). Wasserstein discriminant analysis. Machine Learning, 107(12), 1923–1945.

  20. Frogner, C., Zhang, C., Mobahi, H., Araya-Polo, M., & Poggio, T. A. (2015). Learning with a Wasserstein loss. In Advances in neural information processing systems (NIPS), vol 28.

  21. Gabay, D. (1982). Minimizing a differentiable function over a differential manifold. Journal of Optimization Theory and Applications, 37(2), 177–219.

  22. Genevay, A., Cuturi, M., Peyré, G., & Bach, F. (2016). Stochastic optimization for large-scale optimal transport. In Lee, D.D., Sugiyama, M., Luxburg, U.V., Guyon, I., Garnett, R. (Eds.) Advances in neural information processing systems 29 (pp. 3440–3448). Curran Associates, Inc.

  23. Germain, P., Habrard, A., Laviolette, F., & Morvant, E. (2013). A PAC-Bayesian approach for domain adaptation with specialization to linear classifiers. In Proceedings of international conference on machine learning (ICML) (pp. 738–746). Atlanta, USA.

  24. Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., & Courville, A. (2017). Improved training of Wasserstein GANs. In Proceedings of the 31st international conference on neural information processing systems (pp. 5769–5779). Curran Associates Inc., Red Hook, NY, USA, NIPS’17.

  25. Haker, S., Zhu, L., Tannenbaum, A., & Angenent, S. (2004). Optimal mass transport for registration and warping. International Journal of Computer Vision, 60(3), 225–240.

  26. Hong, X., & Gao, J. (2015). Sparse density estimation on multinomial manifold combining local component analysis. In Proceedings of international joint conference on neural networks (IJCNN) (pp. 1–7). https://doi.org/10.1109/IJCNN.2015.7280301.

  27. Hong, X., & Gao, J. (2018). Estimating the square root of probability density function on Riemannian manifold. Expert Systems (in press) https://doi.org/10.1111/exsy.12266.

  28. Hong, X., Gao, J., Chen, S., & Zia, T. (2015). Sparse density estimation on the multinomial manifold. IEEE Transactions on Neural Networks and Learning Systems, 26, 2972–2977.

  29. Jacobs, M., & Léger, F. (2020). A fast approach to optimal transport: The back-and-forth method. arXiv:1905.12154.

  30. Kantorovich, L. V. (1942). On the translocation of masses. Doklady Akademii Nauk SSSR (NS), 37, 199–201.

  31. Knight, P. A. (2008). The Sinkhorn–Knopp algorithm: Convergence and applications. SIAM Journal on Matrix Analysis and Applications, 30(1), 261–275.

  32. Kolouri, S., Pope, P. E., Martin, C. E., & Rohde, G. K. (2019). Sliced Wasserstein auto-encoders. In Proceedings of international conference on learning representation (ICLR).

  33. Lee, Y. T., & Sidford, A. (2014). Path finding methods for linear programming: Solving linear programs in Õ(√rank) iterations and faster algorithms for maximum flow. In Proceedings of IEEE 55th annual symposium on foundations of computer science (pp. 424–433). https://doi.org/10.1109/FOCS.2014.52.

  34. Maman, G., Yair, O., Eytan, D., & Talmon, R. (2019). Domain adaptation using Riemannian geometry of SPD matrices. In International conference on acoustics, speech and signal processing (ICASSP) (pp. 4464–4468). Brighton, United Kingdom: IEEE.

  35. Miller, M., & Lent, J. V. (2016). Monge’s optimal transport distance with applications for nearest neighbour image classification. arXiv:1612.00181.

  36. Monge, G. (1781). Mémoire sur la théorie des déblais et des remblais. Histoire de l’Académie Royale des Sciences de Paris.

  37. Montavon, G., Müller, K. R., & Cuturi, M. (2016). Wasserstein training of restricted Boltzmann machines. Advances in Neural Information Processing Systems, 29, 3718–3726.

  38. Muzellec, B., Nock, R., Patrini, G., & Nielsen, F. (2017). Tsallis regularized optimal transport and ecological inference. In Proceedings of AAAI (pp. 2387–2393).

  39. Panaretos, V. M., & Zemel, Y. (2019). Statistical aspects of Wasserstein distances. Annual Review of Statistics and Its Application, 6, 405–431.

  40. Peyre, G., & Cuturi, M. (2019). Computational optimal transport: With applications to data science. Foundations and Trends in Machine Learning Series, Now Publishers, https://books.google.com.au/books?id=J0BiwgEACAAJ.

  41. Peyré, G., Cuturi, M., et al. (2019). Computational optimal transport. Foundations and Trends® in Machine Learning, 11(5–6), 355–607.

  42. Queyranne, M., & Spieksma, F. (2009). Multi-index transportation problems (MITP). In Encyclopedia of optimization (pp. 2413–2419).

  43. Rabin, J., & Papadakis, N. (2015). Convex color image segmentation with optimal transport distances. In International conference on scale space and variational methods in computer vision. Springer, pp. 256–269.

  44. Rubner, Y., Tomasi, C., & Guibas, L. J. (2000). The earth mover’s distance as a metric for image retrieval. International Journal of Computer Vision, 40(2), 99–121.

  45. Schmitzer, B. (2019). Stabilized sparse scaling algorithms for entropy regularized transport problems. SIAM Journal on Scientific Computing, 41(3), A1443–A1481.

  46. Solomon, J., de Goes, F., Peyré, G., Cuturi, M., Butscher, A., Nguyen, A., et al. (2015). Convolutional Wasserstein distances: Efficient optimal transportation on geometric domains. ACM Transactions on Graphics, 34(4), 66:1–66:11.

  47. Su, B., & Hua, G. (2017). Order-preserving Wasserstein distance for sequence matching. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1049–1057).

  48. Su, B., & Wu, Y. (2019). Learning distance for sequences by learning a ground metric. In Proceedings of the 36th international conference on machine learning (ICML).

  49. Sun, Y., Gao, J., Hong, X., Mishra, B., & Yin, B. (2016). Heterogeneous tensor decomposition for clustering via manifold optimization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38, 476–489.

  50. Tolstikhin, I., Bousquet, O., Gelly, S., & Schoelkopf, B. (2018). Wasserstein auto-encoders. In Proceedings of international conference on learning representation.

  51. Villani, C. (2009). Optimal transport: Old and new (chap. The Wasserstein distances, pp. 93–111). Berlin: Springer.

  52. Yair, O., Dietrich, F., Talmon, R., & Kevrekidis, I.G. (2019). Optimal transport on the manifold of SPD matrices for domain adaptation. arXiv:1906.00616.

  53. Zhang, S., Gao, Y., Jiao, Y., Liu, J., Wang, Y., & Yang, C. (2019). Wasserstein-Wasserstein auto-encoders. arXiv:1902.09323.

  54. Zhao, P., & Zhou, Z. H. (2018). Label distribution learning by optimal transport. In Proceedings of the thirty-second AAAI conference on artificial intelligence (AAAI) (pp. 4506–4513).

Acknowledgements

This project is partially supported by the University of Sydney Business School ARC Bridging grant. The authors are grateful to the anonymous reviewers for their constructive comments, which helped improve this work.

Author information

Correspondence to Junbin Gao.

Editor: Pradeep Ravikumar.

About this article

Cite this article

Shi, D., Gao, J., Hong, X. et al. Coupling matrix manifolds assisted optimization for optimal transport problems. Mach Learn 110, 533–558 (2021). https://doi.org/10.1007/s10994-020-05931-2

Keywords

  • Optimal transport
  • Doubly stochastic matrices
  • Coupling matrix manifold
  • Sinkhorn algorithm
  • Wasserstein distance
  • Entropy regularized optimal transport