Tropical Optimal Transport and Wasserstein Distances

We study the problem of optimal transport in tropical geometry and define the Wasserstein-$p$ distances in the continuous metric measure space setting of the tropical projective torus. We specify the tropical metric -- a combinatorial metric that has been used to study of the tropical geometric space of phylogenetic trees -- as the ground metric and study the cases of $p=1,2$ in detail. The case of $p=1$ gives an efficient computation of the infinitely-many geodesics on the tropical projective torus, while the case of $p=2$ gives a form for Fr\'{e}chet means and a general inner product structure. Our results also provide theoretical foundations for geometric insight a statistical framework in a tropical geometric setting. We construct explicit algorithms for the computation of the tropical Wasserstein-1 and 2 distances and prove their convergence. Our results provide the first study of the Wasserstein distances and optimal transport in tropical geometry. Several numerical examples are provided.


Introduction
In algebraic geometry, the geometry of zero sets of systems of polynomials-known as algebraic varietiesare studied using commutative algebra. Tropical geometry is a variant of this field where the polynomials are defined by the tropical algebra: the tropical sum of two elements is their maximum and the tropical product is their usual sum. Mathematical objects such as functions and curves evaluated under the tropical algebra are piecewise linear structures, and tropical varieties are polyhedral complexes. Tropical geometry is an important tool for the study of classical algebraic varieties due to many theoretical coincidences between the two settings. In addition, tropical geometry possesses the advantage of computational tractability and efficiency, as well as connections to other applied sciences. For example, it has been used in optimization theory (Richter-Gebert et al., 2005), dynamic programming in computer science (Maclagan and Sturmfels, 2015), as well as in economics and game theory (Lin and Tran, 2019). An application of tropical geometry that has gained much interest is the tropical geometric representation of the space of phylogenetic trees. In particular, there has very recently been active work in using tropical geometry as a data analytic tool for sets of phylogenetic trees (Monod et al., 2018;Yoshida et al., 2019;Page et al., 2020;Tang et al., 2020). In this paper, we study the tropical projective torus, which is the ambient space of phylogenetic trees, and build upon it to provide a set of tools for statistical, probabilistic, and geometric studies using optimal transport theory.
Optimal transport theory arises from a question posed in economics, and specifically, in the allocation of resources. It deals with optimizing transport modes when geographically displacing resources. Its mathematical formulation was established in the 18th century and has been well-studied since, resulting in strong connections and mutual implications between the domains of dynamical systems and geometry. It has also provided important results in applications and computational fields, such as computer science. An important concept arising from optimal transport is the Wasserstein distances, which are metrics on probability distributions. Intuitively, they measure the effort required to recover the probability mass of one distribution in terms of an efficient reconfiguration of the other. As such, Wasserstein distances broaden the scope of optimal transport theory to probability theory. Additionally, they have been exploited to move further beyond these realms to solve concrete problems in inferential statistics, such as in Panaretos and Zemel (2020). Establishing Wasserstein distances in tropical geometric settings thus provides a framework for a vast body of existing results in these related fields to be applicable to the important problem of statistical inference and data analysis in applied tropical geometric settings by providing a setting for the study of probability measures and distributions. Additionally, it provides an alternative mechanism to study geometric aspects of tropical objects and spaces.
Connecting algebraic theory to optimal transport theory is a new direction of research with very recent contributions involving algebraic geometry and algebraic topology. In Ç elik et al. (2019), the Wasserstein distance between a probability distribution and an algebraic variety is minimized via transportation polytopes. In topological data analysis, where algebraic topology is leveraged to reduce the dimensionality of complex data spaces and extract shape features within the data, optimal transport theory has improved computational efficiency (Lacombe et al., 2018) and also has been used to study geometric aspects of algebraic topological invariants (Divol and Lacombe, 2019). A prior transportation problem (distinct from the optimal transport setting) has been previously considered in tropical geometry by Richter-Gebert et al. (2005). Our work in this paper presents the first connection between tropical geometry and optimal transport theory. Specifically, we consider an infinite metric measure space in a continuous tropical geometric setting endowed with a combinatorial ground metric. Numerical computations of optimal transport with various ground metrics has been recently studied in the continuous setting and shown to be efficient (Benamou, Jean-David et al., 2018;Li et al., 2018). Additionally, studying the optimal transport problem provides a computational framework for the probability density space, which also encodes the geometry of sample space (Lafferty, 1988;Otto and Villani, 2000;Otto, 2001;Villani, 2008). In solving the optimal transport problem, we thus define tropical Wasserstein distances and provide algorithms for our proposed tropical Wasserstein distances. Collectively, these results offer tools for probabilistic, statistical, and geometric inference in a tropical geometric setting, which then may be translated to other applications where tropical geometry plays an important computational and interpretive role.
The remainder of this paper is organized as follows. Section 2 gives an overview of tropical geometry and the tropical projective torus as our ground space of interest. We present and review properties of the tropical metric, which endows this space with a metric structure; we also give some variational forms for the tropical metric. Section 3 overviews the problem of optimal transport and the role of the Wasserstein distances in this framework. We then define the tropical Wasserstein-p distance, with the tropical metric as the ground metric and the tropical projective torus as the ground space; we also give variational forms of the tropical Wasserstein distance. We study the specific cases of p = 1 and 2: the p = 1 case gives a method for computing all infinitely many tropical geodesics, while in the case of p = 2, the Wasserstein metric is amenable to statistical analysis by providing an inner product structure on probability measures on the tropical projective torus. Section 4 gives algorithms to explicitly compute the tropical Wasserstein-p distances, while Section 5 presents the results of several numerical experiments implementing our proposed algorithms. We close the paper with a discussion in Section 6 on future research stemming from the work presented in this paper.
2 Tropical Geometry, the Tropical Projective Torus, and the Tropical Metric In this section, we give the basics of tropical geometry that are relevant for our work. We then present the tropical projective torus as our ground space of interest, and the tropical metric as the ground metric on this space. We also give alternative versions of the metric in terms of variational forms. This is the metric with respect to which we will define the tropical optimal transport problem and the tropical Wasserstein-p distances.

Essentials of Tropical Geometry: Tropical Algebra
Tropical geometry may be seen as a subdiscipline of algebraic geometry. In the latter, the zero sets of systems of polynomial equations are studied using algebraic methods; in the former, these polynomials are defined via the tropical semiring, (R ∪ {−∞ }, , ) where addition between two elements is given by their max and multiplication is given by their sum: Notice that tropical subtraction is not defined, therefore resulting in a semiring, rather than a ring. Both operations of the semiring are commutative and associative; multiplication distributes over addition. Tropicalization refers to interpreting classical arithmetic operations with their tropical counterparts. Using these operations, lines, polynomials, and other more general mathematical constructions can be built, which will result in "skeletal" piecewise linear structures.

The Tropical Projective Torus
Tropical geometry naturally gives rise to polyhedral structures. The interplay between algebraic geometry and polyhedral geometry results in new interpretations of important concepts which form the building blocks for the study of tropical geometry. An important example is the reinterpretation of a fundamental object in computational algebraic geometrythe Gröbner basis. A Gröbner basis is a particular generating set of an ideal in a polynomial ring over a field; computing Gröbner bases is one of the main approaches in solving systems of polynomials, which is a central problem in algebraic geometry. Reinterpreting Gröbner bases using valuations (functions over fields that give a notion of its size) gives rise to Gröbner complexes. Gröbner complexes lead to universal Gröbner bases, which are analogs to tropical bases; see Maclagan and Sturmfels (2015) for full details of this construction. The Gröbner complex is thus a fundamental object in tropical geometry; it is a polyhedral complex constructed for a homogeneous ideal in the polynomial ring K[x 0 , x 1 , . . . , x n ] over a field K. The ambient space of a Gröbner complex is the tropical projective torus, denoted by R n+1 /R1. In this paper, we consider the tropical projective torus as our ground space of interest.
The tropical projective torus is the quotient space that identifies vectors differing from each other by tropical scalar multiplication (or classical addition). It is generated by the following equivalence relation ∼ on R n+1 : x ∼ y ⇔ x 1 − y 1 = x 2 − y 2 = · · · = x n+1 − y n+1 .
Mathematically, R n+1 /R1 is constructed in the same manner as the complex torus: take a lattice Λ ∈ C n+1 as a real vector space, then the complex torus is C n+1 /Λ. For x ∈ R n+1 , letx be its image in R n+1 /R1. The tropical projective torus identifies with R n by taking representatives of the equivalence classes whose last coordinate is zero: We denote an element in R n+1 by x, an element in R n+1 /R1 byx, and an element in R n by x = (x 1 − x n+1 , . . . , x n − x n+1 )-which is the image ofx in R n .
The Space of Phylogenetic Trees. One important practical example that arises in the tropical projective torus is the space of phylogenetic trees, T N (where N is the fixed number of leaves in a tree). Speyer and Sturmfels (2004) identify an equivalence between the space of all phylogenetic trees and a tropical geometric space via a homeomorphism (Maclagan and Sturmfels, 2015;Monod et al., 2018;Lin et al., 2017). The space of phylogenetic trees is contained within the tropical projective torus. In other words, the tropical projective torus is also the ambient space of phylogenetic trees. Although the space of phylogenetic trees is a proper subset of the tropical projective torus, it possesses a very complex structure that is not yet well understood. In particular, it is connected and possesses a polyhedral structure, but is not convex (Monod et al., 2018;Lin et al., 2017). Additionally, trees are defined by a specific combinatorial condition, which makes the precise characterization of the space of phylogenetic trees and establishing its boundary within the tropical projective torus difficult. The dimension of tree space also is lower than the tropical projective torus: its dimension grows linearly in the number of leaves in a tree, while for the tropical projective torus, the dimension grows quadratically. Figure 1: Diagram illustrating embedding of and relationships between Euclidean spaces and the tropical projective torus. The dashed arrow represents the isometry of the tropical metric between all three spaces.

The Tropical Metric
The tropical projective torus R n+1 /R1 becomes a metric space when endowed with a generalized Hilbert projective metric function (Cohen et al., 2004;Akian et al., 2011), which is a combinatorial metric that is tropical in nature. It has been referred to as the tropical metric in recent literature (Lin et al., 2017;Monod et al., 2018). Our work here is based on the ambient tree space given by the tropical projective torus endowed with the tropical metric.
Definition 1. For a point x ∈ R n+1 , denote its coordinates by x 1 , x 2 , . . . , x n+1 and its representation in the tropical projective torus R n+1 /R1 byx. The tropical metric on R n+1 /R1 is given by When considering the representatives of the equivalence classes as in (1), the tropical metric translates to the following between R n+1 /R1 and R n : forx,ȳ ∈ R n+1 /R1 and x, y ∈ R n , . Figure 1 illustrates the relationship where R n+1 identifies with R n+1 /R1 by the equivalence relation ∼; R n+1 /R1 then embeds into R n . The metric d tr is defined on R n+1 /R1 and has a representation in R n ; it is an isometry from R n+1 to R n+1 /1 to R n . Again, recall the notation that an element in R n+1 is denoted by x, an element in R n+1 /R1 is denoted byx, and an element in R n is denoted by x = (x 1 − x n+1 , . . . , x n − x n+1 ).

Variational Forms of the Tropical Metric
It turns out that the tropical metric may be considered in terms of unknown functions and corresponding differential equations, which provides an alternative formulation for the tropical metric in terms of a variational form. Variational forms are useful in computational studies, since numerically, it is often easier to find solutions to variational problems rather than differential equations. As we will see further on, this turns out to be an important advantage in explicit computations of the tropical Wasserstein distances and associated results.
where v, z : [0, 1] → R n and we define the tropical Lagrangian L tr (·) as the tropical norm for a ∈ R n as follows: (3) it suffices to show that the integral is always no less than any of Similarly, for any 1 ≤ i, j ≤ n, by definition of L tr , we have By (4), we get Example 5. When n = 2, The above variational form (2) of d tr (·, ·) may be further generalized as follows.
Corollary 6. Forx,ȳ ∈ R n+1 /R1, let L tr be the same as in Proposition 4. For p > 1, we have Proof. When z(t) = t · y + (1 − t) · x, v(t) is still the constant y − x and the equality still holds. In addition, by the Hölder inequality, Hence for any z : [0, 1] → R n and v(t) = dz dt , as in Definition 1.

Optimal Transport and the Tropical Wasserstein-p Distances
We now give a brief background on and a description of the problem of optimal transport; we also formally present the setting of the optimal transport problem specific to our work. The question underlying the theory of optimal transport can be posed in a very basic and intuitive manner as follows: What is the most efficient way to move a given pile of dirt from one location to another? The total volume of the dirt must remain intact, but the shape and form of the pile may change during transportation and arrive at its location in a differently shaped pile. This problem has been recast mathematically in various formulations with various assumptions. There is a vast literature of historical as well as technical aspects and perspectives on the optimal transport problem; see for example Villani (2003Villani ( , 2008; Ambrosio and Gigli (2013) for detailed discussions.

Optimal Transport and Probability
Adapting the intuitive description of the optimal transport problem above to a more mathematically formal setting, we may view the pile of dirt as a probability measure to be transported over a space-or alternatively, one probability distribution to be transformed into another-which gives us a probabilistic and statistical perspective on the problem.
A key factor in solving the optimal transport problem is the cost function, which gives the cost of moving the pile of dirt, or the transporting the probability measure. Mathematically, this is generally a function of two variables-an origin or "start" location and destination or "end" location-which maps to the positive real line to give the cost, and may take into account any number of factors. In the simplest case, however, when the cost of moving the pile of dirt from its origin to destination is nothing more than the distance between the origin and destination, the solution to the optimal transport problem yields the Wasserstein distance (for a fixed dimension). Intuitively, the Wasserstein distance gives the minimum cost of transforming one probability distribution into another. This minimum cost is simply the "amount of dirt" to be transported, multiplied by the mean distance it must be moved. In the case of probability distributions that contain a total mass of 1, the minimum cost is therefore simply the mean distance it must be moved. More precisely, the Wasserstein distance is a distance function for probability distributions defined on a given metric space, referred to as the ground space and the associated metric is referred to as the ground metric; these concepts are formalized further on in Definition 8. The Wasserstein distance is thus a useful tool for comparing distributions.  Here, the plane below depicts the tropical projective torus R n+1 /R1 is the ground space; it is equipped with the tropical metric. This space admits well-defined probability measures (Monod et al., 2018). Collecting these probability measures as a separate space yields the space of probability measures on R n+1 /R1; in this figure, it is depicted in the manifold above. This space can be equipped with a particular metric-the Wasserstein distance. The Wasserstein distance is therefore defined on the space of probability measures on R n+1 /R1; it measures distances between probability measures on the tropical projective torus. In this illustrative figure, we also show the space of phylogenetic trees with N leaves, T N , as a figurative proper non-convex subset of the tropical projective torus R n+1 /R1. The probability measures associated with this specific subset of R n+1 /R1 are depicted in the Wasserstein space of probability measures above, which is also non-convex (see Remark 11).
Specific Setting. In our work, the ground space is the tropical projective torus and the ground metric is the tropical metric. We consider the set of all probability measures on the tropical projective torus, which exist and are well-defined (Monod et al., 2018), as a space. This work defines and constructs Wasserstein distances as a metric on these probability measures associated with the tropical projective torus. Figure 2 provides a conceptual illustration of the relationship between the ground space, equipped with a ground metric, and the Wasserstein space of probability measures over the ground space, equipped with the Wasserstein distance.
Wasserstein Distances as Metrics Between Probability Distributions. Although other metrics for probability distributions exist in the literature on mathematical statistics, the Wasserstein distance possesses desirable computational and intuitive properties. To illustrate a few such properties, let us consider random variables X, Y defined on R d distributed as X ∼ P and Y ∼ Q with densities p and q, respectively. Three commonly-used measures for distances between P and Q are total variation, 1 2 |p − q|; Hellinger, When comparing one discrete versus one continuous distribution, these distances yield results that are not very informative. Let P be uniform on [0, 1], and let Q be uniform on {0, 1/n, 2/n, . . . , 1}. The total variation distance between these distributions is 1, which is the total size of each of the two sets, and the largest that any distance can be, while the Wasserstein distance is 1/n. These distances also do not take into account the underlying geometry of the space on which the distributions are defined. Consider the three densities p 1 , p 2 and p 3 shown in Figure 3. We have and similar results for the Hellinger and L 2 distances, however, intuitively, we would like to think of p 1 and p 2 being more similar and and hence closer to each other than to p 3 . The Wasserstein distance is able to make this distinction.  (2019). The total variation, Hellinger, and L 2 distances between these three densities are the same, while the Wasserstein distance between p 1 and p 2 is smaller than that between either p 1 or p 2 and p 3 .
In computing a distance between distributions, we arrive at some measure of their similarity or dissimilarity, but the total variation, Hellinger, L 2 , and other distances do not provide any information on how or why the distributions are qualitatively different. Perhaps the most helpful property of the Wasserstein distance is that, in addition to a measure of distance between the distributions, we also obtain a map that describes how P morphs into Q. This map is known as a transport plan.
In addition to the illustrative examples discussed above, there are other desirable computational and statistical properties of the Wasserstein distance, such as stability to small perturbations and a well-behaved and intuitive Wasserstein Fréchet mean. Further details and more complete discussions on statistical aspects of the Wasserstein distance can be found in Panaretos and Zemel (2019); Wasserman (2019).
Aside from statistical aspects, there also exist other analytic advantages of the Wasserstein distances, depending on the context. For instance, the Wasserstein distances' intimate connection to optimal transport problems inherently make them natural tools in these and other settings with foundations in partial differential equations.
Wasserstein Distances and Phylogenetic Trees. Wasserstein distances have been previously studied in the context of phylogenetic trees. A single tree itself may be treated as a metric space, for instance, by considering genetic distances which measure distances between pairs of sequences on a single tree; the metric here is defined within the tree itself. When considering a single tree, the context is a finite metric space. Wasserstein distances have been defined and studied in these contexts, such as in Evans and Matsen (2012), where probability distributions giving rise to individual trees are compared. Kloeckner (2015) studies geometric properties of measures on equidistant trees (i.e., rooted trees with equal branch lengths from the root to all leaves) using Wasserstein distances. For finite spaces, Sommerfeld and Munk (2018) conduct statistical inference studies for empirical Wasserstein metrics computed from datasets. Very recently, Le et al. (2019) studied the sliced formulation of optimal transport-developed to alleviate computational and statistical drawbacks of optimal transport theory-on tree metrics. Sato et al. (2020) furthermore propose an extremely fast algorithm that solves the optimal transport problem to compute Wasserstein distances on a tree with one million nodes in less than one second. The setting of these works all differ from the study of Wasserstein distances on the space of phylogenetic trees.
In the context of tree spaces, other probability-based distances between trees have also been proposed (Garba et al., 2017(Garba et al., , 2020. These are related, but are nevertheless strictly different from the notion of distances between probability measures over tree space. The contributions of these works are classical measures between probability distributions on genetic sequences that make up trees, which then induce probabilistic distances between trees, including Hellinger distances and Kullback-Leibler divergences. Kullback-Leibler divergences measure the difference in terms of information gain between models of statistical inference (Kullback and Leibler, 1951). Outside the scope of interest of this paper, other tree spaces have also been proposed that are not probability-based; an example of a combinatorial construction based on posets that turns out to be related to tree-reconstruction using Markov processes is the edge-product space (??).

Formalizing the Optimal Transport Problem and Defining the Wasserstein-
p Distances.
Monge (1781) is largely recognized to have provided the first mathematical formalization of the optimal transport problem described above, while the subsequent probabilistic reinterpretation by Kantorovich (2006) lead to a fundamental computational breakthrough that seeded the development of linear optimization. As such, the statement of the mathematical optimal transport problem is often referred to as the Monge-Kantorovich transport problem and presented in the setting of measure theory. We now give an overview of this presentation.
Definition 7. Let Ω and Ω be separable metric spaces that are Radon spaces (that is, any probability measure on each space is a Radon measure). Let c : Ω × Ω → [0, ∞] be a Borel-measurable cost function. For ρ 0 ∈ P(Ω) and ρ 1 ∈ P(Ω ) where P(·) denotes the collection of probability measures on the respective spaces, the Monge-Kantorovich transport problem is to find a probability measure π on Ω × Ω such that is achieved. Here, Π(ρ 0 , ρ 1 ) denotes the collection of all probability measures on Ω × Ω with marginal measures ρ 0 on Ω and ρ 1 on Ω .
When the cost function is lower semi-continuous, and given that Ω and Ω are Radon spaces, Π(ρ 0 , ρ 1 ) is tight, and therefore a solution to the Monge-Kantorovich transport problem always exists under these conditions (e.g., Ambrosio et al., 2008). From this formulation, the Wasserstein-p distance may be defined as follows.
Definition 8. Let (Ω, d) be a separable metric Radon space. Let p ≥ 1 and P p (Ω) be the collection of all probability measures µ on Ω such that µ has finite pth moment for some x 0 ∈ Ω; i.e., +∞. The Wasserstein-p distance between probability measures ρ 0 , ρ 1 ∈ P p (Ω) is given by where, as before, Π(ρ 0 , ρ 1 ) is the collection of all probability measures on Ω × Ω with marginal measures ρ 0 and ρ 1 on the respective copies of Ω. Equivalently, we have where E[·] denotes the expectation, and the infimum is taken over all joint distributions of random variables X and Y with respective marginals ρ 0 and ρ 1 . The metric d is referred to as the ground metric; the function π is known as the transport plan.
The transport plan π(x, y) is a function that describes a way to move the measure ρ 0 into ρ 1 , and between locations x and y; transport plans are not unique. Since the total mass moved out of a region around x must be equal to ρ 0 (x)dx and the total mass moved into a region around x must be ρ 1 (x)dx, we have the following restrictions on a transport plan: In other words, π is a joint probability distribution with marginals ρ 0 and ρ 1 . The total infinitesimal mass which moves from x to y, therefore, is π(x, y)dxdy and the cost of moving this amount of mass from x to y is c(x, y)π(x, y)dxdy. The total cost is then C = c(x, y)π(x, y)dxdy = c(x, y)dπ(x, y).
The optimal transport plan is the π which achieves the minimal value of C: c(x, y)dπ(x, y).
If the cost of a move c(x, y) is no more than the distance between the two points d(x, y), then the optimal cost value C * is identically the Wasserstein-1 distance, W 1 .
Remark 9. In the particular case where p = 1, the Wasserstein-1 distance is also referred to as the Kantorovich-Rubinstein distance, and the earth mover's distance (EMD) in the computer science literature.
Remark 10. The Wasserstein distances satisfy all conditions for a formal definition of a metric (e.g., Villani, 2008). If the condition of finite pth moment is relaxed, the Wasserstein distances may technically be infinite, and therefore not a metric in the strict sense.
Remark 11. For any p ≥ 1, if (Ω, d) is a complete and separable metric space, then so too is (P p (Ω), W p ) (e.g., Villani, 2008). Other geometric properties between the ground space and its associated Wasserstein distance also hold, including compactness, convexity, as well as non-convexity. An adaptation of the Brunn-Minkowski theorem (Brunn, 1887;Minkowski, 1896) relating volumes of compact and convex sets, as well as its generalization to non-convex sets by Lyusternik (1935), for comparative relations between ground and Wasserstein spaces also exists (Villani, 2008). The geometric implication of these results is that compact, non-convex subsets of the ground space with respect to the ground metric correspond to non-convex subsets in the Wasserstein space of probability measures (with generalized Ricci curvature bounds) over the ground space with respect to the Wasserstein distance.
In the applicative setting of our work concerning the space of phylogenetic trees as a non-convex subset of the tropical projective torus, the implication is that the corresponding space of probability measures associated with the space of phylogenetic trees is also non-convex with respect to the Wasserstein distances. (Compactness of tree space can be established by fixing an upper bound on the height of trees.) This provides a geometric compatibility between the space of phylogenetic trees equipped with the tropical metric and its associated space of probability measures equipped with Wasserstein distances. See Figure 2 for an illustrative description of this relationship.
A Time-Dependent Cost Function: Formulating a Hamiltonian. In formulating the above variational forms of the tropical metric (2) and (5), the notation with respect to t is not by coincidence and purposely alludes to a dependence upon time. Within the setting of Wasserstein distances and their relation to the optimal transport problem where the ground metric is itself the cost function, intuitively, a time-dependent ground metric corresponds to a cost function where time is a cost factor.
Considering time dependence allows for a rich and alternate formulation of the optimal transport problem, which extends to the continuous displacement of measures-precisely the setting of the tropical metric on the tropical projective torus as a continuous metric measure space. However, there are certain instances where continuous displacement problems turn out to be equivalent to steady-state, time-independent problems with an alternate formulation that favors computational efficiency: this occurs when the Lagrangian L is homogeneous of degree 1 and convex.
Lemma 12. The tropical Lagrangian L tr defined in (3) is convex on R n . More specifically, for a, b ∈ R n and 0 ≤ w ≤ 1, we have (1 − w) a tr + w b tr ≥ (1 − w)a + wb tr .
Proof. By definition, So either there exist 1 ≤ j, k ≤ n such that Note that We also have Hence Lemma 12 holds in either case.
Remark 13. Note that convexity of L tr also implies convexity of 1 p L p tr . The convexity of the tropical Lagrangian L tr then allows for the formulation of the Hamiltonian (Villani, 2008, Example 7.5) for b ∈ R n as follows: We now explicitly compute the value of the Hamiltonian (7), which will provide concise formulations with regard to the tropical Wasserstein-p distances. For convenience, and identifying R n+1 /R1 with R n , for b ∈ R n we define Proposition 15. The value of H(b) is: (i) 0 when b = 0, or ζ(b) ≤ 1 and p = 1; (ii) ∞ when b = 0 and p < 1, or ζ(b) > 1 and p = 1; Proof. (i) When b = 0, n i=1 b i a i is always zero, and L tr (a) ≥ 0, so H(b) ≤ 0. However, when a = 0, the right-hand side of (7) is zero, so H(0) = 0. When ζ(b) ≤ 1 and p = 1, let Hence H(b) ≤ 0, and equality holds when a = 0. So H(b) = 0.
(ii) Now we may assume that b = 0 and thus ζ(b) > 0. We may choose nonempty S ⊂ {1, 2, . . . , n} such that For any N > 0 and each 1 ≤ i ≤ n, we let , which is N . Since ζ(b) > 0, when p < 1, or ζ(b) > 1 and p = 1, we have (iii) We denote u, v as in (i) above. Then Let s := u − v ≥ 0. We need to find the maximum of ζ(b)s − 1 p s p when s ≥ 0. The derivative of this function of s is Hence the function is increasing when 0 ≤ s ≤ ζ(b) 1 p−1 , and it is decreasing when s ≥ ζ(b) 1 p−1 . So the maximum is attained when s = ζ(b) 1 p−1 , thus Finally, as in (ii), we may choose nonempty S ⊂ {1, 2, . . . , n} such that and the equality holds when For notational convenience, we also define η : That is, η(b) i is defined by (9).
The Tropical Wasserstein-p Distances. We consider the tropical projective torus as a probability space (Monod et al., 2018) with finite pth moment as follows: Within the optimal transport framework discussed above and as in Definition 8, the tropical Wasserstein-p distance is given as follows:W tr p : where the infimum is taken over the set of all possible joint distributions (transport plans) π with marginals ρ 0 and ρ 1 , Π(ρ 0 , ρ 1 ). Here, the distanceW tr p depends the choice of p in the linear programming formulation (10). The following alternative gives an equivalent definition of the tropical Wasserstein-p distances.
Definition 16 (Tropical Wasserstein-p distance). The tropical Wasserstein-p distance is given by such that the following dynamical constraint or continuity equations hold: Here · tr is the tropical norm, ρ 0 , ρ 1 ∈ P p (R n ), ∇, ∇· are gradient and divergence operators in R n , and the infimum is taken over all continuous density functions ρ : [0, 1] × R n → R, and Borel vector fields v : [0, 1] × R n → R n .
Here, the formulation (11) given by the pairs (11a) and (11b) is known as the Benamou-Brenier formula, given by Benamou and Brenier (2000). As discussed in Chapter 8 of Villani (2003), when c satisfies suitable conditions, the linear programming formulationW tr p is equivalent to the dynamical formulation W tr p . In this work, we focus on the dynamical formulation (11) with p = 1, 2 for their concrete implications on computations of the tropical projective torus.

The Tropical Wasserstein-1 Distance
We first study the case p = 1. In this case, it turns out that the tropical Wasserstein-1 distance W tr 1 may be recast as the following minimization problem.
Proposition 17 (Minimal Flux Formulation). By identifying R n+1 /R1 with R n as discussed in Section 2.3, the tropical Wasserstein-1 distance satisfies where the infimum is taken over all Borel flux functions m : R n → R n .
Concretely, m(x) is the flux vector field that assigns a vector to each point in the measure and determines how much of the mass (measure) should be moved, and in which direction.
The reformulation of the tropical Wasserstein-1 distance given in Proposition 17 has enormous computational benefits, compared to that given in Definition 8 . Notably, the size of the optimization variable is much smaller in solving a discrete approximation; additionally, the structure of the formulation given in Proposition 17 borrows from L 1 -type minimization problems, which are well-studied and for which there exist fast and simple algorithms (see references in Li et al. (2018)). We will reap these benefits in formulating explicit algorithms to compute the tropical Wasserstein-p distances for p = 1, 2, as discussed further on in Section 4.
Geodesics on the Tropical Projective Torus. Geodesics on the tropical projective torus are not unique (Lin et al., 2017;Monod et al., 2018). In particular, between any two given points in R n+1 /R1, there are infinitely many geodesics. The following result gives the explicit connection between geodesics on the tropical projective torus and the minimizer of the tropical Wasserstein-1 distance.
Proposition 18 (Minimizer of the Tropical Wasserstein-1 distance). The minimizer of the tropical Wasserstein-1 distance is given by the following pair: Proof. The minimizer of tropical Wasserstein-1 distance may be derived as follows. Define a Lagrange multiplier Φ : R n → R for the equality constraint of (12), and consider the saddle point problem Notice that L is convex in m and concave in Φ. Thus, the saddle point (m, Φ) satisfies δ m L(m, Φ) = 0, δ Φ L(m, Φ) = 0. This corresponds to the equation pair (13).
Remark 19. We notice that the first equation in (13) represents the tropical Eikonal equation The tropical Eikonal equation describes the movement of each particle according to the infinitely many geodesics under the tropical metric between ρ 0 to ρ 1 . This behavior will be explored and demonstrated numerically in experiments further on in Section 5.
Proposition 20. The set of all infinitely many tropical geodesics is contained in a classical convex polytope.
Soc belongs to a tropical ellipse with fociā,b. By Proposition 26 of Lin and Yoshida (2018), the set of all points on tropical geodesics is a classical convex polytope.

The Tropical Wasserstein-2 Distance
We now consider the case where p = 2. Here we refer to (9) using the notation η(b).
Proposition 21 (Minimizer of the Tropical Wasserstein-2 Distance). The minimizer of the tropical Wasserstein- where η : R n → R n is given by where S is as in (8). Also, In particular, if ρ(t, x) > 0, then Proof. The minimizer path for the tropical Wasserstein-2 distance is derived as follows. For p = 2, denote Then the variational problem (11) can be reformulated as Notice that variational problem (15) is convex in (m, µ). Again, we denote the Lagrange multiplier Φ : [0, 1]× R n → R, then we can reformulate (15) into a saddle point problem.

Algorithms: Solving the Optimal Transport Problem
In this section, we design algorithms for solving the optimal transport problems that give rise to the tropical Wasserstein-p distances and geodesics. Our approach is mainly based on the G-Prox primal-dual hybrid gradient (G-Prox PDHG) algorithm (Jacobs et al., 2019), which is a modified version of Chambolle-Pock primal-dual algorithms Pock and Chambolle, 2011). We now provide a brief overview of the algorithm; see Jacobs et al. (2019); Chambolle and Pock (2011);Pock and Chambolle (2011) for further details. The classical primal-dual hybrid gradient algorithms convert the following minimization problem min into the following saddle point problem where f and g are convex functions with respect to a variable X, f * is a convex dual function of F , and K is a continuous linear operator. For each iteration, the algorithm performs gradient descent on the primal variable X and gradient ascent on the dual variable Y as follows: where suitable norms need to be considered in the update. For the tropical Wasserstein-1 and Wasserstein-2 distances, we apply the algorithm in (16) to (12) and (15) by setting Y = Φ and specifying W tr 1 : X = m, In this paper, we use a version of the G-Prox PDHG algorithm that applies the H 1 norm in the dual variable Y update and uses the L 2 norm in the primal variable X update. This choice of norms gives us more stable and faster convergence of the algorithm than the standard PDHG algorithm .

Computing the Tropical Wasserstein-1 Distances
We consider here p = 1. We first present the spatial discretization to compute the general Wasserstein-1 distance.
Consider a uniform lattice graph G = (V, E) with spacing ∆x to discretize the spatial domain, where V is the vertex set V = {1, 2, . . . , N }, and E is the edge set. Here i = (i 1 , . . . , i d ) ∈ V represents a point in R d . Consider a discrete probability set supported on all vertices: where q i here represents a probability at node i, i.e., q i = C i ρ(x)dx, and C i is a cube centered at i with length ∆x. Thus, ρ 0 (x), ρ 1 (x) is approximated by . We use two steps to compute the Wasserstein-1 distance on P(G). We first define a flux on a lattice. Denote the flux matrix as m = (m , where e v = (0, . . . , ∆x, . . . , 0) , with ∆x at the vth column. In other words, if we denote i = (i 1 , . . . , i d ) ∈ R d and m(x) = (m 1 (x), . . . , m d (x)), then We consider a zero flux condition: if a point i + 1 2 e v is outside the domain of interest Ω, we let m i+ 1 2 ev = 0. Based on such a flux m, we define a discrete divergence operator div G (m) : We next introduce the discrete cost functional This gives rise to the following optimization problem in the tropical setting for i = 1, . . . , N ; v = 1, . . . , d.
We solve (17) by studying its saddle point structure. Denoting the Lagrange multiplier of (17) as Saddle point problems such as (18) are well studied by the first-order primal-dual hybrid gradient (PDHG) algorithm. Implementing the G-Prox PDHG algorithm gives the following iteration steps: where the quantities h, τ are two small step sizes, and These steps alternate a gradient ascent in the dual variable Φ, and a gradient descent in the primal variable m.
It turns out that iteration (19) can be solved by simple explicit formulae. Since the unknown variables m, Φ are component-wise separable in this problem, each of its components m i+ 1 2 , Φ i can be independently obtained by solving (19). First, notice that The first iteration in (19) has an explicit solution, which is: where the shrink operator is a projection operation to the unit ball with norm · tr . Its exact formulation is given further on in Proposition 23. Second, consider Thus the second iteration in (19) becomes where ∆ G = div G · ∇ G is the discrete Laplacian operator.
We are now ready to state our algorithm.

1.
while the relative error of m tr > 2. m k+1 Remark 22. The relative error at iteration k is given by | m k tr − m k−1 tr | m k−1 tr .
In the algorithm, we require the shrink operator with respect to the tropical metric, shrink tr , which is given in the following result.
Proposition 23. Let h > 0 and b 1 ≥ b 2 ≥ · · · ≥ b k ≥ 0 > b k+1 ≥ · · · ≥ b n . We denote and We let is the following unique point x ∈ R n , where Proof. Note that by definition of t 1 , t 2 , they are bounded by all of u i with i ≤ j 1 and all of v i with i ≤ j 2 , respectively. In addition, we have Now we claim that Notice that (21) implies that We also have that (22) implies that Hence, the right-hand side of (23) So our claim is proved.
Since h > 0 is a constant, we can multiply the objective function in (20) by 2h. Now, this new function is greater than or equal to The global minimum of the last quadratic polynomial is attained exactly at the point x in Proposition 23, so we have a lower bound for the new objective function, which is given when a = x. Finally, we note that the equality of (23) is attained at x, so this value is actually attained by a = x.
b k j 1 j 2 1 2 Remark 25. Proposition 23 provides an algorithm to compute the shrink. Suppose we have h > 0 and a 0 , b ∈ R n and we would like find shrink tr (a 0 + hb, h) = argmin a∈R n |a − a 0 | 2 Then we let b = b + a0 h , the optimization problem becomes the one in Proposition 23 for b and h after sorting the coordinates of b .

Computing the Tropical Wasserstein-2 Distances
We now present an algorithm to compute the tropical Wasserstein-2 distance in the tropical projective torus R 3 /R1 identified with R 2 . Consider the same uniform lattice graph on a domain Ω ⊂ R 2 as in the case for the tropical Wasserstein-1 distance. Define the following matrices where the time interval is discretized uniformly with N t points, and N x is the number of vertices from a uniform lattice graph. Here we assume Neumann boundary conditions for ρ: ∂ρ ∂n = 0 on ∂Ω, wheren is a outward normal vector. Given initial densities ρ 0 and ρ 1 , the boundary conditions for ρ at t = 0 and t = 1 are Nt . We can reformulate the minimization problem (15) into a discretization as follows: In R 2 , using (3), we can calculate the tropical norm of the flux function m by considering the six different cases based on {m i+ 1 2 ev } 2 v=1 . The tropical norm of m is given as follows: Nt n=1 here be the Lagrange multiplier which satisfies the Neumann boundary condition on the boundary of the domain. The minimization problem (24) can be reformulated as a saddle point problem.
Again, we implement G-Prox PDHG to solve the problem as follows: where h, τ are two small step sizes and From (26), each component m n i+ 1 2 , ρ n i , and Φ n i can be obtained. From the first iteration, We calculate the minimizer by differentiating the equation with respect to ρ n i . The minimizer ρ k+1 is a positive root of the following cubic polynomial: Thus, we can calculate the root by using a cubic solver.
where root + (a, b, c) is a solution for a cubic polynomial x 3 + ax 2 + bx + c = 0.
We can reformulate the second iteration as follows: Differentiating the equation with respect to m n i+ 1 2 , we obtain the following expression: Solving this expression gives an explicit solution for (m n i+ 1 2 ) k+1 : Let µ = τ /(ρ n i ) k+1 and c = (c 1 , c 2 ) be The function F (c, µ) is then given as follows: Similarly, we get an explicit formula of Φ k+1 from the third iteration. Then the relative error at iteration k is calculated as We are now ready to present our algorithm to compute the tropical Wasserstein-2 metric.

1.
while the relative error of

Convergence
Our proposed primal-dual algorithms for the tropical Wasserstein-1 and tropical Wasserstein-2 distances converge to their respective minimizers as given by Propositions 18 and 21.
Theorem 26. (i) Consider the G-Prox PDHG algorithm to compute the tropical Wasserstein-1 distance.
Proof. The proof follows that of Theorem 1 in Pock and Chambolle (2011). We justify the conditions in Pock and Chambolle (2011). In the case of (i), we write the Lagrangian L as Observe that g, f * are convex functions and K is a linear operator. Then there exists a saddle point (m * , Φ * ). Notice that the preconditioning norm for Φ is Σ := µ(−∆ G ) −1 and the preconditioning norm for m is T := τ · Id where Id is an identity operator. Thus, the algorithm converges when Σ 1 2 KT 1 2 2 2 < 1. This is our condition √ τ µ (−∆ G ) − 1 2 div G 2 < 1, which finishes the proof. A similar argument holds for (ii).

Numerical Experiments
In this section, we present the results of numerical experiments solving the tropical optimal transport problem for three different sets of initial densities using our proposed G-Prox primal-dual methods for L 1 and L 2 . In particular, we give the minimizers of L 1 and L 2 tropical optimal transport problems from each experiment. Experiment 1. We consider a two-dimensional problem on Ω = [0, 1] × [0, 1]. The initial densities ρ 0 and ρ 1 are same sizes of squares centered at ( 1 3 , 1 3 ) and ( 2 3 , 2 3 ), respectively. In this experiment, the parameters are Figure 4 shows the minimizer m(x) of the tropical Wasserstein-1 distance and Figure 5 shows the minimizer ρ(t, x) of the tropical Wasserstein-2 distance.
Experiment 2. Similar to Experiment 1, we consider a two dimensional problem on Ω = [0, 1] × [0, 1]. The initial densities ρ 0 and ρ 1 are same sizes of squares centered at ( 1 3 , 2 3 ) and ( 2 3 , 1 3 ) respectively. The same parameters are set as in Experiment 1. Together with Experiment 1, Experiment 2 shows that the minimizers of tropical optimal transport show different geodesics depending on the positions of initial densities. See Figure 6 for L 1 result and Figure 7 for L 2 result. Experiment 3. We again consider a two dimensional problem on Ω = [0, 1] × [0, 1]. The initial density ρ 0 at time 0 is a square centered at (0.5, 0.5) with width 0.2. The initial density ρ 1 at time 1 is four squares of the same size centered at (0.2, 0.2), (0.2, 0.8), (0.8, 0.2) and (0.8, 0.8) with width 0.1. The same parameters are set as in Experiment 1. See Figure 8 for the L 1 result and Figure 9 for the L 2 result; notice that the geodesics of minimizers from both results depend on the direction in which the densities travel. We see that Experiment 3 coincides with Experiments 1 and 2.
Software. Software to implement the numerical experiments presented in this paper is publicly available and located on the TropicalOT GitHub repository at https://github.com/antheamonod/TropicalOT.

Discussion
In this paper, we connected optimal transport theory-specifically, dynamic optimal transport-with tropical geometry. In particular, we explicitly formulated geodesics for the tropical Wasserstein-p distances over the tropical projective torus. The tropical projective torus is the ambient space of the polyhedral Gröbner complex of a homogeneous ideal in a polynomial ring K[x 0 , x 1 , . . . , x n ] over a field K-a foundational object in tropical geometry. It is also the ambient space of the space of phylogenetic trees.
We constructed and implemented primal-dual algorithms to compute tropical Wasserstein-1 and 2 geodesics on the tropical projective torus. These results provide a framework to identifying all infinitely-many geodesic paths between points in this space, which leads to a better understanding of paths on the ambient space containing important structures in tropical geometry theory as well as in practice and applications. In addition, the Wasserstein-2 distance possesses an important structure for statistical inference, since it provides the form for Fréchet means on the tropical projective torus, as well as a general inner product structure.
Our research lays the foundation for further connections between optimal transport and tropical geometry. Our work provides powerful tools to study important aspects such as geometry and statistics on the tropical projective torus. A current work in progress is to characterize and solve the optimal transport problem on the subset of the tropical projective torus corresponding to phylogenetic tree space with 5 leaves, T 5 . This space is made up of a union of 5!! = 15 polyhedral cones in the tropical projective torus, each with dimension 2. In this study, the main challenge involves the polyhedral structure of the tree space (as discussed in Section 2.2), and in particular, how to handle the intersections of the cones; a weaker form of the divergence and gradient operators are required to traverse the cones. The present work solves the problem within a single cone, which defines a shrink operator with already six cases, see Table 1; we also expect the characterization of the shrink operator to be combinatorially more complicated on all 15 cones of T 5 .
From the perspective of optimal transport, we observe that the combinatorial structure of the tropical metric poses several interesting challenges in optimal transport. For example, the partial differential equations derived in Section 3 are defined in a piecewise manner: in two-dimensional sample space, there are six corresponding equations characterizing geodesics in optimal transport. In the general case, there are interesting regularity issues to be further studied. The theory of optimal transport and the study of associated density manifolds provide a natural base to construct heat equations with respect to the tropical metric. This provides an important potential to defining non-uniform probability distributions on the tropical projective torus: classically, the solution to the heat equation gives rise to the Gaussian distribution, thus, a solution to the tropical heat equation is a candidate for a tropical Gaussian distribution on the tropical projective torus (Tran, 2018;El Maazouz and Tran, 2019). The dynamic setting of optimal transport with the tropical ground metric introduced in this paper also provides a foundation to studying the displacement convexity and Ricci curvature tensor on the tropical projective torus. In forthcoming work, we further study such questions by applying the relevant work of Li (2018Li ( , 2019, which also studies geometric and probabilistic questions in the context of optimal transport theory. Figure 7: Experiment 2: L 2 tropical optimal transport. The figures show the geodesics of L 2 tropical optimal transport between two initial densities from t = 0 to t = 1. The initial densities are same as in Figure 6.

Figures: Numerical Experiments
(a) ρ 0 (b) ρ 1 (c) m : Experiment 3: L 2 tropical optimal transport. The six figures show the geodesics of L 2 tropical optimal transportation from t = 0 to t = 1. The initial densities are same as in Figure 8.