Plücker coordinates of the best-fit Stiefel tropical linear space to a mixture of Gaussian distributions

In this research, we investigate tropical principal component analysis (PCA), formulated as the best-fit Stiefel tropical linear space to a given sample over the tropical projective torus, as a tool for dimensionality reduction and visualization. In particular, we characterize the best-fit Stiefel tropical linear space to a sample generated from a mixture of Gaussian distributions as the variances of the Gaussians go to zero. For a single Gaussian distribution, we show that the sum of residuals, measured in the tropical metric with the max-plus algebra, between a given sample and a fitted Stiefel tropical linear space converges to zero, and we give an upper bound for its convergence rate. For a mixture of Gaussian distributions, we show that the best-fit tropical linear space can be determined uniquely as the variances go to zero. We briefly consider the best-fit tropical polynomial as an extension for mixtures of more than two Gaussians over the tropical projective space of dimension three. We also show some geometric properties of these tropical linear spaces and polynomials.


Introduction
Principal component analysis (PCA) is one of the most popular methods for visualizing high-dimensional datasets and reducing their dimensionality using tools from linear algebra [1]. Principal components can be obtained by solving an optimization problem that finds the best-fit linear space to a given sample over a Euclidean space. The primal problem of this optimization is to minimize the sum of squared distances between each observation in the given dataset and its orthogonal projection onto the linear space, and the dual problem is to find the direction of largest variance of the given dataset. In multivariate analysis, the results rely heavily on the chosen metric, which essentially determines how similar any pair of data points is. Thus, replacing the conventional Euclidean metric with another metric can be effective for some problems, especially for datasets from non-Euclidean spaces. Tropical linear algebra has been well studied by many mathematicians (for example, [2], [3] and [4]). In particular, it is well known that convexity with the tropical metric behaves very well [5]. Therefore, in 2017, Yoshida et al. [6] applied tropical linear algebra to PCA by solving the primal problem of the optimization with the tropical metric over the tropical projective torus using the max-plus algebra.
Therein [6], two approaches to PCA using tropical geometry were developed: (i) the tropical polytope with a fixed number of vertices "closest" to the data points in the tropical projective torus or the space of phylogenetic trees with respect to the tropical metric; and (ii) the Stiefel tropical linear space of fixed dimension "closest" to the data points in the tropical projective torus with respect to the tropical metric. Here "closest" means that the tropical polytope or Stiefel tropical linear space has the smallest sum of tropical distances between each observation in the given sample and its projection onto it. The first approach (i) has been well studied and applied to phylogenomics [7], as the space of equidistant trees has a nice tropical convexity property [5]: since the space of equidistant trees with a fixed set of leaf labels is tropically convex [8] and a tropical polytope is tropically convex [2], a tropical polytope lies in the space of equidistant trees if all of its vertices are in the tree space. Meanwhile, the second approach (ii) has received little attention, even though the tropical projective space can be essential in data analyses such as the characterization of neural responses under nonstationarity [9].
The Stiefel tropical linear space, which can be characterized by a Plücker coordinate computed from a matrix, has been studied and has nice properties, such as projection and intersection ([3], [8], [10], [11]). In the second approach (ii), Yoshida et al. [6] gave an explicit formulation of the best-fit Stiefel tropical linear space to a given sample when the Stiefel tropical linear space is a tropical hyperplane and the sample size is equal to the dimension of the tropical projective space. Recently, Akian et al. developed a tropical linear regression over the tropical projective space and extended the best-fit tropical hyperplane to samples of any size ≥ 2 [12]. However, in general, their formulation does not hold for finding the best-fit Stiefel tropical linear space if we vary the dimension of the Stiefel tropical linear space.
In this paper, therefore, we consider the explicit formulation of the best-fit Stiefel tropical linear space to a sample when we vary the dimension of the space and the sample size. More specifically, we focus on fitting a Stiefel tropical linear space of any smaller dimension to a sample of size ≥ 2 generated from a mixture of Gaussian distributions. In order to uniquely specify a Stiefel tropical linear space over the tropical projective space (R ∪ {−∞})^d/R1 with 1 = (1, 1, ..., 1), we use the Plücker coordinate or the matrix A ∈ (R ∪ {−∞})^{m×d} associated with it, where m < d and m − 1 is the dimension of the Stiefel tropical linear space. Computing the Plücker coordinate of a Stiefel tropical linear space is equivalent to computing tropical determinants of minors of its associated matrix. As Xie studied the geometry of tropical determinants of 2 × 2 matrices in [13], we study the geometry of the best-fit Stiefel tropical linear space to a sample generated by a Gaussian distribution, in terms of the location of the "apex", i.e., the center of the Stiefel tropical linear space. Then we also study the geometry of tropical polynomials. Specifically, we show an algorithm to project an observation onto a tropical polynomial in terms of the tropical metric and propose an algorithm to compute the best-fit tropical polynomial to a given sample in R^3/R1.
This paper is organized as follows. In Section 2 we describe the basics of tropical arithmetic and geometry. In Section 3 we define the best-fit Stiefel tropical linear space over (R ∪ {−∞})^d/R1 for a given sample. Section 4 describes a characterization of the matrix associated with the Plücker coordinate of the best-fit Stiefel tropical linear space to a sample generated by a Gaussian distribution when we send the variances to zero. In that section we also investigate the geometry of the best-fit Stiefel tropical linear space of dimension m − 1 when m = d − 1.
Section 5 generalizes the results in Section 4 to a mixture of two Gaussian distributions. In Section 6 we show an algorithm to project an observation onto a tropical polynomial in terms of the tropical metric and investigate the best-fit tropical polynomial to a sample when the variances are very small.

Contribution
We characterize the matrix associated with the Plücker coordinate of the best-fit Stiefel tropical linear space of dimension m − 1 to a sample generated by a Gaussian distribution over the tropical projective torus R^d/R1 as we send all variances to zero. We then characterize the matrix associated with the Plücker coordinate of the best-fit Stiefel tropical linear space of dimension m − 1 to a sample generated by a mixture of l Gaussian distributions over the tropical projective torus R^d/R1 as we send all variances to zero. Finally, we investigate the best-fit tropical polynomial to a sample generated by a mixture of Gaussian distributions and propose one way to estimate the best-fit tropical polynomial equation for fitting such a set of observations.

Basics of Stiefel Tropical Linear Spaces
Recall that throughout this paper we consider the tropical projective torus R^d/R1, which is isometric to R^{d−1}. Here is a remark for experts: tropical linear spaces are subsets of the tropical projective space (R ∪ {−∞})^d/R1 rather than of the tropical projective torus R^d/R1. This relatively technical point will not be important in what follows, as the projection of a point in the tropical projective torus onto a Stiefel tropical linear space remains in the tropical projective torus. So in the basic definitions, we will use (R ∪ {−∞})^d/R1 instead of R^d/R1. For the basics of tropical geometry, see [3]. In addition, we recommend [14], which contains very nice properties of tropical linear spaces and tropical convexity with the max-plus algebra.
Definition 1 (Tropical Arithmetic Operations) Throughout this paper we perform arithmetic in the max-plus tropical semiring (R ∪ {−∞}, ⊕, ⊙). In this tropical semiring, the basic tropical arithmetic operations of addition and multiplication are defined as:
$$a \oplus b := \max\{a, b\}, \qquad a \odot b := a + b, \qquad a, b \in \mathbb{R} \cup \{-\infty\}.$$
Definition 3 (Generalized Hilbert Projective Metric) For any two vectors v, w ∈ (R ∪ {−∞})^d/R1, the tropical distance d_tr(v, w) between v and w is defined as:
$$d_{\mathrm{tr}}(v, w) := \max_{1 \le i \le d} \{v_i - w_i\} - \min_{1 \le i \le d} \{v_i - w_i\},$$
where v = (v_1, ..., v_d) and w = (w_1, ..., w_d).
Remark 1 (Lemma 5.2 in [14]) The tropical metric d tr over R d /R1 is twice the quotient norm of the maximum norm on R d .
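The max-plus operations and the tropical metric above can be illustrated with a minimal Python sketch. The function names `trop_add`, `trop_mul` and `tropical_distance` are illustrative choices, not notation from the paper, and the sketch assumes vectors with finite entries given as plain Python sequences.

```python
NEG_INF = float("-inf")  # tropical additive identity in the max-plus semiring


def trop_add(a, b):
    """Tropical addition: the maximum of the two arguments."""
    return max(a, b)


def trop_mul(a, b):
    """Tropical multiplication: the ordinary sum of the two arguments."""
    return a + b


def tropical_distance(v, w):
    """Tropical distance: max_i (v_i - w_i) - min_i (v_i - w_i).

    Assumes finite entries; the value is unchanged when the same
    constant is added to every coordinate of v or of w, so it is
    well defined on the quotient by R1.
    """
    diffs = [vi - wi for vi, wi in zip(v, w)]
    return max(diffs) - min(diffs)
```

For instance, `tropical_distance([0, 0, 0], [1, 2, 0])` and `tropical_distance([5, 5, 5], [1, 2, 0])` agree, which reflects the invariance under translation by multiples of 1.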

There is an explicit formula for the tropical distance from an observation to a tropical hyperplane (Lemma 2.1 and Corollary 2.3 in [15]). Therefore it is easier to find the best-fit tropical hyperplane over the tropical projective space. However, best-fit tropical hyperplanes alone are not enough, because a tropical hyperplane reduces the dimension of the ambient space by only one. Therefore, in this paper, we consider a lower-dimensional Stiefel tropical linear space as the subspace onto which we project data points. In what follows, let [m] denote the set of integers {1, 2, ..., m}, where m is a positive integer.
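As a hedged illustration of the explicit hyperplane distance mentioned above, the following Python sketch uses the "max minus second max" rule that the paper itself invokes later for H_0. The sign convention for a general normal vector (the hyperplane H_ω being the set where max_i(ω_i + x_i) is attained at least twice) is an assumption of this sketch, and the function name `dist_to_hyperplane` is illustrative.

```python
def dist_to_hyperplane(x, omega=None):
    """Tropical distance from x to the max-plus hyperplane H_omega.

    Assumed convention: H_omega = {x : max_i (omega_i + x_i) is attained
    at least twice}. The distance is then the gap between the largest
    and the second largest value of omega_i + x_i. With omega omitted,
    this is the distance to H_0.
    """
    if omega is None:
        omega = [0] * len(x)
    values = sorted((w + xi for w, xi in zip(omega, x)), reverse=True)
    return values[0] - values[1]
```

A point whose shifted maximum is attained twice, such as `[0, 3, 3]` for ω = 0, has distance zero under this rule.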
Definition 5 (Tropical Matrix) For any V = {v^(1), ..., v^(m)} ⊂ (R ∪ {−∞})^d/R1, we define a tropical matrix M_V of size m × d such that, for any i ∈ [m], the i-th row of M_V is v^(i) (here we assume that v^(i) is a row vector).
Definition 6 (Tropical Determinant) Let q be a positive integer. For any tropical matrix A of size q × q with entries in R ∪ {−∞}, the tropical determinant of A is defined as:
$$\mathrm{tdet}(A) := \max_{\sigma \in S_q} \left\{ A_{1,\sigma(1)} + A_{2,\sigma(2)} + \cdots + A_{q,\sigma(q)} \right\},$$
where S_q is the set of all permutations of [q] := {1, ..., q}, and A_{i,j} denotes the (i, j)-th entry of A.

Remark 2
The tropical determinant of any non-square matrix is −∞.
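The tropical determinant, together with the convention of Remark 2 for non-square matrices, can be sketched in Python as follows. This is a brute-force illustration over all permutations (factorial cost), intended only for small matrices; the name `tropical_det` is illustrative.

```python
from itertools import permutations

NEG_INF = float("-inf")


def tropical_det(A):
    """Max-plus determinant of a matrix given as a list of rows.

    Returns the maximum over permutations s of sum_i A[i][s(i)].
    Following Remark 2, a non-square matrix is assigned -infinity.
    """
    q = len(A)
    if any(len(row) != q for row in A):
        return NEG_INF
    return max(
        sum(A[i][s[i]] for i in range(q))
        for s in permutations(range(q))
    )
```

Entries equal to −∞ are handled naturally, since any permutation that uses such an entry contributes −∞ to the maximum.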
Our treatment of tropical linear spaces largely follows [10, Sections 3 and 4].
Remark 3 Let A be a tropical matrix of size (d − 1) × d. Then the Stiefel tropical linear space of A is a tropical hyperplane. Furthermore, any tropical hyperplane is a Stiefel tropical linear space [16, Remark 1.21]. For more details on the geometry of tropical linear spaces, including tropical hyperplanes, see [14, 17].
Remark 5 Although the four conditions (1), (2), (3) and (4) are imposed, the solution is neither a point nor empty. That is, the conditions are partly redundant, and it is not the case that each condition reduces the dimension by one. For example, the intersection (and the stable intersection) of only (1) and (2) is already p_A. Here the stable intersection, computable only with tropical Plücker coordinates, reduces dimensions without fail [11]. The intersection of (1) and (3) calculated by hand is a mixed-dimensional set, while the stable intersection of (1) and (3) results in a one-dimensional Stiefel tropical linear space which is different from p_A. Similarly, the stable intersection of (3) and (4) is p_A. Finally, the stable intersection of p_A and (3) is the point (0, 5 − 2c, 5, −c).
To perform a "tropical principal component analysis", we need to project a data point onto a Stiefel tropical linear space, which is realized by the Red and Blue Rules [10,Theorem 15].
If this maximum is unique, attained with index τ_i, then let γ_{τ,τ_i} be the positive difference between the second maximum and this maximum, and set v_{τ_i} = max{v_{τ_i}, γ_{τ,τ_i}}.
Then v gives the difference between u and a closest point of L_p. In particular, if w is the point in L_p returned by the Blue Rule, we have u = w + v.
Remark 7 For simplicity, when τ contains only one element, we treat it as a positive integer instead of a set.
We write π_{L(A)} for the projection function which takes a point u ∈ R^d/R1 and returns the nearest point w ∈ L(A) given by the Blue Rule. Depending on m (i.e., the number of rows of A), we may prefer to use either the Blue Rule or the Red Rule to compute π_{L(A)}(u). If m is relatively small, then we can compute π_{L(A)}(u) naively with the Blue Rule in O(d^{m+1}) operations. Conversely, if m is relatively large, then we can use the Red Rule to compute the projection in O(m · (d/m)^{m+1}) operations. In practice, we note that most of the permutations considered in the Red and Blue Rules do not seem to affect the computation. There is a faster algorithm than the Red Rule and the Blue Rule for computing a projection onto a tropical linear space, given by Theorem 2 in [18]. However, in this research we only consider the Red Rule and the Blue Rule.
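A naive implementation of the Blue Rule projection π_{L(A)}(u) can be sketched in Python as below. It follows the formula w_i = max_τ min_{j∉τ} (u_j + p(τ ∪ {i}) − p(τ ∪ {j})), with the Plücker vector computed as tropical determinants of the m × m minors of A. The function names are illustrative, coordinates are 0-indexed, and the sketch assumes that all entries of A are finite (entries equal to −∞ would require extra care with the subtraction).

```python
from itertools import combinations, permutations


def tropical_det(M):
    """Max-plus determinant of a square matrix (brute force)."""
    q = len(M)
    return max(sum(M[i][s[i]] for i in range(q))
               for s in permutations(range(q)))


def plucker_vector(A):
    """Map each m-subset of columns of A to the tropical determinant
    of the corresponding m x m minor."""
    m, d = len(A), len(A[0])
    return {
        frozenset(cols): tropical_det([[row[j] for j in cols] for row in A])
        for cols in combinations(range(d), m)
    }


def blue_rule_projection(A, u):
    """Project u onto the Stiefel tropical linear space L(A) via the
    Blue Rule. Assumes A is m x d with m < d and finite entries."""
    m, d = len(A), len(u)
    p = plucker_vector(A)
    w = []
    for i in range(d):
        others = [k for k in range(d) if k != i]
        best = float("-inf")
        # tau runs over all (m-1)-subsets of the coordinates without i
        for tau in combinations(others, m - 1):
            t = frozenset(tau)
            inner = min(u[j] + p[t | {i}] - p[t | {j}]
                        for j in range(d) if j not in t)
            best = max(best, inner)
        w.append(best)
    return w
```

With the all-zero 2 × 3 matrix, whose Plücker coordinates are all equal, L(A) is the hyperplane H_0 in R^3/R1, and projecting u = (0, 1, 3) lowers the largest coordinate to the second largest one.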

Best-fit Stiefel Tropical Linear Space
In analogy with the classical PCA, the (m−1)-th tropical PCA in [6] minimizes the sum of the tropical distances between the data points and their projections onto a best-fit Stiefel tropical linear space of dimension m − 1, defined by a tropical matrix of size m × d.
Here we recall that π_{L(A)}(x^(i)) is the projection of x^(i) onto the Stiefel tropical linear space L(A) for i = 1, ..., n.

Example 5
In the case of m = 2 and d = 3, the Stiefel tropical linear space becomes a hyperplane, as shown in Example 2. For the sample S = {x^(1), ..., x^(8)} ⊂ R^3/R1 in Fig. 2 (left), the best-fit hyperplane according to the numerical calculation in Fig. 2 (middle) has the normal vector ω = (0, 0), which is the coordinate of the apex (hinge point).
Definition 12 (Fermat-Weber Point) Suppose we have a sample S = {x^(1), ..., x^(n)} ⊂ R^d/R1. A Fermat-Weber point x* of S is defined as:
$$x^* := \arg\min_{x \in \mathbb{R}^d/\mathbb{R}\mathbf{1}} \sum_{i=1}^{n} d_{\mathrm{tr}}\big(x, x^{(i)}\big).$$
Remark 8 Under the tropical metric d_tr, a Fermat-Weber point is not necessarily unique [19].
As a simple special case of the tropical PCA, we begin with a sample S = {X_1, ..., X_d} from a single uncorrelated Gaussian, i.e., X_i ∼ N((0, ..., 0), σ I_{d×d}), where I_{d×d} is the identity matrix and σ > 0, together with its best-fit hyperplane. The first goal is to show that the best-fit hyperplane as σ → 0 is the one whose apex is located at the center of the Gaussian.
Then the mean tropical distances in R^3/R1 from (X_1, X_2, X_3) to the tropical hyperplane H_0 and to the tropical line consisting of (0, 0, z) for z ∈ R are given by

Then, due to the symmetry in integration, the mean tropical distance to H_0 is given by

There we used

and

As min_z d_tr((X_1, X_2, X_3), (0, 0, z)) = d_tr((X_1, X_2, X_3), (0, 0, )) = |X_2 − X_1|, the mean tropical distance to the line is, by the symmetry in integration, given by

Then the mean tropical distances in R^3/R1 from (X_1, X_2, X_3) to the tropical hyperplane H_0 and to the tropical hyperplane H_{(0,0,−c)} for c > 0, divided by σ, are given by $\frac{3}{2\sqrt{\pi}}$ and $\frac{2}{\sqrt{\pi}}$, respectively.

It was shown that the mean of the sum of distances between observations and their projections onto a tropical hyperplane H_ω for ω ∈ R^d/R1 takes its minimum at H_0, i.e., when the center of the Gaussian is on the apex of the hyperplane, for d = 3. We now ask whether the same holds for hyperplanes with general d. If the Gaussian center is outside of H_0, the distance remains finite (> 0) even as σ → 0. Thus, to find the best-fit hyperplane, it suffices to consider the case when the Gaussian center is on H_0 (if not exactly on the apex). Furthermore, as H_0 is defined as the set on which the maximum of x_1, x_2, ..., x_d is attained at least twice, we only need to consider the cases separately according to how many times the maximum is attained.
Theorem 5 Let X_1, X_2, ..., X_d ∼ N(0, σ^2). Then, in the limit σ → 0, the mean tropical distance divided by σ from (X_1, X_2, ..., X_d) ∈ R^d/R1 to a tropical hyperplane H_ω that passes through the origin depends only on the number k of times the maximum is attained in the defining equation at the origin. It equals the mean tropical distance to the k-dimensional H_0, i.e., the mean of d_tr((X_1, ..., X_k), H_0). Specifically, when the maximum is attained three times or twice, it is $\frac{3}{2\sqrt{\pi}}$ or $\frac{2}{\sqrt{\pi}}$, respectively.
Proof When the hyperplane H_ω passes through the origin, max{ω_1, ω_2, ..., ω_d} is attained at least twice (k times). By changing the coordinates, the condition can be written as

Note that the last equation holds only within the neighborhood of the origin, |x_i| < |max_{k<j≤d} ω_j|/2 for 1 ≤ i ≤ d, which is satisfied when σ → 0.
The numerical calculation of the distance is plotted in Figure 3, and the specific cases with k = 2, 3 coincide with Theorem 5.
Remark 10 A point on H_0 can be represented as x_1 = x_2 = ... = x_k = c for some k ≤ d. Specifically, the apex is the highest-codimension case, where all x_i's are equal (k = d). Note that d_tr((X_1, ..., X_k, X_{k+1}, ..., X_d), H_ω) in R^d/R1 is the same as d_tr((X_1, ..., X_k), H_0) in R^k/R1, because the difference between the first and the second maximum of (X_1, X_2, ..., X_k) is equal to d_tr((X_1, ..., X_k), H_0). Thus, to prove that the mean distance takes its minimum for H_0, i.e., when the center of the Gaussian is on the apex of the hyperplane for general d, it suffices to show that the mean distance to H_0 decreases with k, as suggested by Figure 3.
Conjecture 6 E[d_tr((X_1, ..., X_k), H_0)] monotonically decreases with k. Thus, the hyperplane that fits X the best as σ → 0 converges to H_0, i.e., the one whose apex is at the center of the Gaussian.
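The monotonicity in Conjecture 6 can be probed numerically with a small Monte Carlo sketch in Python. It estimates E[d_tr((X_1, ..., X_k), H_0)] for standard normal coordinates (σ = 1, which corresponds to dividing by σ), using the "max minus second max" expression for the distance to H_0 noted in Remark 10. The sample size, the seed and the function names are arbitrary choices of this sketch, and such a simulation is evidence only, not a proof.

```python
import math
import random


def dist_to_H0(x):
    """Distance to H_0: gap between the largest and second largest entry."""
    top, second = sorted(x, reverse=True)[:2]
    return top - second


def mean_distance(k, n_samples=50_000, seed=0):
    """Monte Carlo estimate of E[d_tr((X_1..X_k), H_0)] for i.i.d. N(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += dist_to_H0([rng.gauss(0.0, 1.0) for _ in range(k)])
    return total / n_samples


# Estimates for k = 2, ..., 6; closed forms for comparison at k = 2, 3.
estimates = {k: mean_distance(k) for k in range(2, 7)}
closed_form = {2: 2 / math.sqrt(math.pi), 3: 3 / (2 * math.sqrt(math.pi))}
```

The closed forms for k = 2 and k = 3 follow from E|X_1 − X_2| = 2/√π and from the expected largest of three standard normals being 3/(2√π) with expected median 0; the estimates can be compared against them and inspected for a decreasing trend in k.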

Best-fit Stiefel tropical linear spaces
Next we consider a non-hyperplane Stiefel tropical linear space as a subspace. In the hyperplane case, we considered not only the convergence of the mean tropical distance to zero but also its convergence rate. Along this line, our ultimate goal is to prove the following conjecture.
However, for a general Stiefel tropical linear space, it is hard to determine the convergence rate exactly, although we can give an upper bound for it. Therefore, we mostly focus on the convergence itself, although the minimizers whose mean distance goes to zero as σ → 0 are not unique in general. In what follows, we begin with a specific example of a (non-hyperplane) Stiefel tropical linear space for which the projection distance goes to zero as σ → 0. We end this section with a discussion of the non-uniqueness of the minimizer by showing that the mean distance goes to zero as σ → 0 whenever a Stiefel tropical linear space passes through the center of the Gaussian.
To unify the notation in the following proofs, we use throughout the indicator function
$$I_{j \le m} := \begin{cases} 1 & \text{if } j \le m, \\ 0 & \text{otherwise,} \end{cases}$$
for j ∈ N with any fixed m ∈ N, and the Kronecker delta
$$\delta_{ij} := \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{otherwise,} \end{cases}$$
for i, j ∈ N.
where µ_1, µ_2, ε_j ∈ R for j = 1, ..., d. Then the projection of X onto the Stiefel tropical linear space of the matrix (9) is

where ε* is the second smallest value in {ε_1, ..., ε_d}.
Proof By using the indicator function, we can unify the notation as

and, by Lemma 8, p({τ, i}) = µ_τ I_{τ≤2} + µ_i I_{i≤2}.
Then, the Blue Rule becomes

Suppose ε_{i_min} attains the smallest value in {ε_1, ..., ε_d}, then

be the projected point of X onto the one-dimensional Stiefel tropical linear space of the matrix (9). Then the tropical distance between X and its projected point is

where ε* is the second smallest value in {ε_1, ..., ε_d}, and its expected value satisfies

Proof Lemma 9 leads to the tropical distance. By the upper bound in [20],

Next, we consider a generalization to the correlated Gaussian. Suppose we have a sample S = {X_1, ..., X_n} where

Then by [21, p. 202], we have

where X_i = (X_{i,1}, X_{i,2}, ..., X_{i,d}) for i = 1, ..., d, and Z_i, Z_{i,1}, Z_{i,2}, ..., Z_{i,d} ∼ N(0, 1) for i = 1, ..., d.

Proof
By using X j = µ j I j≤2 + j + I i≤2 , and p({τ, i}) = µτ I τ ≤2 + µ i I i≤2 , the Blue Rule becomes Note that we essentially repeated the same arguments for j := j + I j≤2 instead of j in Lemma 9.
Then the expected value of the tropical distance between X and its projected point goes to 0 as σ → 0.
5 Mixture of two Gaussians fitted by a Stiefel tropical linear space of dimension one over R^d/R1
Here we consider the tropical PCA for a mixture of two Gaussians whose centers are located in general position.

Deterministic setting: Stiefel tropical linear space of dimension one that passes through given two points
Under the assumption of infinitesimal variances, the problem of finding the best-fit Stiefel tropical linear space for a mixture of two Gaussians reduces to the deterministic problem of finding the one-dimensional Stiefel tropical linear space that passes through the centers of both Gaussians. Here we prove that the one-dimensional Stiefel tropical linear space that passes through two given points exists and is unique.
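As a hedged numerical companion to this deterministic setting, the Python sketch below builds Plücker coordinates from the 2 × d matrix whose rows are the two given points, taking P_{ij} as the tropical determinant of the 2 × 2 minor on columns i and j, and then checks the defining condition (the maximum over each triplet is attained at least twice) for a candidate point. It illustrates existence only and does not address uniqueness; the function names are illustrative, coordinates are 0-indexed, and unlike the normalization P_12 = 0 used in the text, the coordinates are left unnormalized (subtracting a common constant does not change the space).

```python
from itertools import combinations


def plucker_of_line(mu, nu):
    """Plücker coordinates P[(i, j)], i < j, of the 2 x d matrix with rows
    mu and nu: the max-plus determinant of each 2 x 2 minor."""
    d = len(mu)
    return {
        (i, j): max(mu[i] + nu[j], mu[j] + nu[i])
        for i, j in combinations(range(d), 2)
    }


def lies_on_line(P, x, tol=1e-9):
    """Check that, for every triplet i < j < k, the maximum of
    P[j,k] + x_i, P[i,k] + x_j, P[i,j] + x_k is attained at least twice."""
    d = len(x)
    for i, j, k in combinations(range(d), 3):
        terms = sorted(
            [P[(j, k)] + x[i], P[(i, k)] + x[j], P[(i, j)] + x[k]],
            reverse=True,
        )
        if terms[0] - terms[1] > tol:
            return False
    return True
```

For example, with µ = (0, 0, 0) and ν = (1, 3, 0) both points satisfy the condition, while a point such as (0, 5, 0) does not.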

Lemma 16
The Stiefel tropical linear space with the Plücker coordinates P = (P_12, P_13, P_23) that passes through given two points µ = (µ_1, µ_2, µ_3) and ν
Proof By the definition of the Stiefel tropical linear space, the condition that µ is on P is that max{P_23 + µ_1, P_13 + µ_2, P_12 (= 0) + µ_3} is attained at least twice. Similarly, the condition that ν is on P is that max{P_23 + ν_1, P_13 + ν_2, P_12 (= 0) + ν_3} is attained at least twice. Thus P must be in the union of the following nine regions.
Proof By the definition of the Stiefel tropical linear space, the condition that µ is on P is that max{P_{τ2τ3} + µ_{τ1}, P_{τ1τ3} + µ_{τ2}, P_{τ1τ2} + µ_{τ3}} is attained at least twice for all possible triplets (τ_1, τ_2, τ_3). The condition that ν is on P is that max{P_{τ2τ3} + ν_{τ1}, P_{τ1τ3} + ν_{τ2}, P_{τ1τ2} + ν_{τ3}} is attained at least twice for all possible triplets (τ_1, τ_2, τ_3). By considering both µ and ν simultaneously for a specific τ = (i, j, k), we come back to Lemma 16, where, without loss of generality, we set

This solution is unique for any P_ik. Suppose we obtain P_ik in two different ways, through P_ik − P_{il_1} and through P_ik − P_{il_2}. Then the difference of the solutions obtained through l_1 and l_2 vanishes, P_ik (through P_{il_1}) − P_ik (through P_{il_2}) = 0. Similarly, P_ik (through P_{il_1}) − P_ik (through P_{l_2 k}) = 0. Thus, the solution does not depend on the way it is obtained. That is, the solution is consistent (not empty) and unique.
Remark 14 One can prove Theorem 17 using the fact that the tropical line segment between two points is unique if and only if these two points are in relative general position, i.e., all the inequalities in (5.9) in [17] are strict. Then we can extend the tropical line segment to its associated Stiefel tropical linear space in the way described on page 293 of [17].

Probabilistic setting: distance to best-fit space
For simplicity, suppose we have two random variables:

Proof By the Blue Rule and p_{A0}({i, j}) = 5 I_{i≤2} + 5 I_{j≤2},

Theorem 21 Suppose w is the projected point of either X_1 (or X_2) onto the Stiefel tropical linear space p_{A0} and P(∪_{i,j} {|ε_ij| ≥ 5}) ≤ δ for δ > 0, i = 1, 2 and j = 1, ..., d. Then the expected value of the tropical distance between X_1 or X_2 and w is smaller than 2σ 2 log(d − 1) with probability 1 − δ.
Remark 17 One issue here is that X_1 and X_2 are not in general position. In fact, the best-fit one-dimensional Stiefel tropical linear space for the two Gaussians described in Theorem 21 may not be unique in the limit σ → 0. However, the one above is the best in the sense that it is natural and stable (robust).
It may be more convenient to consider the general-position case, for which the solution is unique and should coincide with the deterministic one shown in the previous subsection. In this general case, the Blue Rule becomes too complicated, and we simply bound the distance with inequalities instead.
where µ_ij ∈ R are in general position (µ_i − ν_i ≠ µ_j − ν_j for 1 ≤ i < j ≤ 3) and ε_ij ∼ N(0, σ) with small σ > 0 for i = 1, 2 and j = 1, ..., d. Suppose we project X_1 (or X_2) onto the Stiefel tropical linear space that passes through µ_1 = (µ_11, µ_12, µ_13, ..., µ_1d) and µ_2 = (µ_21, µ_22, µ_23, ..., µ_2d). Then the expected value of the tropical distance between X_1 (or X_2) and its projected point goes to 0 as σ → 0.
Proof By [20],

6 Mixture of three or more Gaussians fitted by tropical polynomials over R^3/R1
To explore a possible extension of the Stiefel tropical linear space as a subspace, we consider the projection of data points onto tropical polynomials. In R^3/R1, the only nontrivial Stiefel tropical linear space is a tropical hyperplane, which is specified by a tropical linear function with a normal vector ω = (ω_x, ω_y, 0),

Similarly, we can consider an x-quadratic tropical hypersurface, which is specified by a corresponding tropical quadratic function,

We can further consider an x-cubic tropical hypersurface, which is specified by a corresponding tropical cubic function, although we do not treat cubic cases in this paper.

Deterministic setting: possible configurations of tropical curves that pass through given points
Throughout this paper, we have in mind fitting a mixture of Gaussians whose centers are located in general position. Furthermore, under the assumption of infinitesimal variances, the problem of finding the best-fit tropical curve for a mixture of Gaussians in R^3/R1 reduces to finding the curve that passes through the centers of all the Gaussians. Thus, we first summarize the possible configurations in this deterministic case. Recall that a linear tropical curve has only enough degrees of freedom to pass through two given points. Thus higher-degree polynomial curves may be suitable for fitting three or more Gaussians.

Best-fit tropical linear curves or hyperplanes
Let us briefly review the linear curve or hyperplane case, where we try to find the straight line that passes through the two given points (x_1, y_1, z_1) and (x_2, y_2, z_2) in R^3/R1. Without loss of generality, z_1 = z_2 = 0 and x_1 < x_2 are assumed, as well as y_1 ≠ y_2. Depending on the slope of the line that connects the two given points, there are three possible configurations for the two points to lie on the different half lines, as in Fig. 4.
Fig. 4 Examples of all three possible configurations for two points on a plane. Depending on the configuration pattern, the points lie on the different half lines.
Proof Direct calculations.
Remark 18 Algebraically speaking, the condition that a point (x, y) is on a hyperplane is equivalent to the condition that the normal vector of the hyperplane is on a hyperplane whose normal vector is (x, y).Thus, if two points (x 1 , y 1 ) and (x 2 , y 2 ) are on a hyperplane, the normal vector of the hyperplane is the intersection of two hyperplanes whose normal vectors are (x 1 , y 1 ) and (x 2 , y 2 ).

Best-fit tropical x-quadratic curves
Here we try to find the quadratic curve that passes through the three given points (x_i, y_i, z_i) in R^3/R1 for i = 1, 2, 3. Without loss of generality, z_1 = z_2 = z_3 = 0 and x_1 < x_2 < x_3 are assumed, as well as y_1 ≠ y_2 ≠ y_3 ≠ y_1. Depending on the slopes of the connecting line segments, there are 9 (= 3 × 3) possible configurations for the three points to lie on the different half lines or line segments, as in Fig. 5. Interestingly, x-quadratic curves cannot pass through the points in one of the nine configurations.
Fig. 5 Examples of all eight possible configurations for three points on a plane. Depending on the configuration pattern, the points lie on the different half lines or line segments. third and the following figures show, respectively, "TooHigh-TooHigh", "TooHigh-High/Low", "High-TooHigh", "High-High/Low" and "Low-High/Low" configurations.
Lemma 24 (Best-fit tropical x-quadratic curves (Fig. 5)) In the case of the "Low-TooHigh" configuration with y_1 > y_2 and y_1 + 2(x_3 − x_2) < y_3, there is no x-quadratic curve that passes through the three points in R^3/R1. In the other eight configurations, there is a unique x-quadratic curve that passes through the three points in R^3/R1, where the points lie on the different half lines or line segments depending on the configuration, as in Fig. 5.
Proof Direct calculations.

Probabilistic setting: distance to best-fit space
To perform a PCA for point clouds, we need a projection rule onto a curve.
Lemma 25 The projection rules in each delineated region of R^3/R1 to the hyperplane H_0 as well as to the x-quadratic curve whose nodes are (0, 0, 0) and (0, 1, 1) are shown in Fig. 6. In particular, the distances from (x, y, 0) to the curves are given by the red text.
Proof By the triangle inequality, we only need to consider the boundaries of each region as candidates for the projection. The rest follows by direct calculation for each region.
Fig. 6 Projection rule in R^3/R1 to the hyperplane H_0 (left) and the quadratic curve whose nodes are (0, 0, 0) and (0, 1, 1) (right). The red text represents the distance from a point (0, x, y) to the curve, with one of the geodesics shown as a red arrow. This distance function is piecewise linear on the domains delineated by the dotted gray lines and the curve itself. Although, in the quadratic case, we do not have a simple rule like "max − 2nd max" for the hyperplane, at least one of the geodesics is a vertical or horizontal line segment, demonstrating the equivalence to the L1 norm.
Similar to fitting a Stiefel tropical linear space, for fitting a tropical polynomial we also have an upper bound for the convergence rate of the mean distance between observations in a given sample and their projections as σ → 0. In practice, we do not know the Gaussian center µ in general, and we estimate µ by its point estimate $\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} X_i$.
polynomial when d = 3. In general it is not clear how to project an observation onto a given tropical polynomial in terms of the tropical metric in a way similar to the Blue Rule and the Red Rule for a Stiefel tropical linear space. Projecting a point onto a tropical polynomial over the tropical projective space is a necessary and important tool for statistical inference (supervised learning) using tropical geometry. We propose an algorithm to project a point onto a tropical polynomial for d = 3, and generalizing this algorithm to d > 3 is left for future work.

Theorem 1 (The Blue Rule) Let $p : \binom{[d]}{m} \to \mathbb{R} \cup \{-\infty\}$ be a tropical Plücker vector and L_p its associated tropical linear space. Fix u ∈ R^d/R1, and define the point w ∈ R^d/R1 whose i-th coordinate is
$$w_i = \max_{\tau} \min_{j \notin \tau} \big( u_j + p(\tau \cup \{i\}) - p(\tau \cup \{j\}) \big), \qquad i = 1, 2, \dots, d,$$
where τ runs over all (m − 1)-subsets of [d] that do not contain i. Then w ∈ L_p, and any other x ∈ L_p satisfies d_tr(u, x) ≥ d_tr(u, w). In other words, w attains the minimum distance of any point in L_p to u.
Remark 6 This closest point may not be unique, and there may be other points in L_p which have the same tropical distance from u.
Theorem 2 (The Red Rule) Let $p : \binom{[d]}{m} \to \mathbb{R} \cup \{-\infty\}$ be a tropical Plücker vector and L_p its associated tropical linear space. Fix u ∈ R^d/R1. Let v be the all-zeros vector. For every

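Definition 11 (Best-fit Stiefel Tropical Linear Space) Suppose we have a sample S = {x^(1), ..., x^(n)} ⊂ R^d/R1. Let A be a tropical matrix of size m × d with d > m, and let L(A) be the Stiefel tropical linear space of A. If L(A) minimizes
$$\sum_{i=1}^{n} d_{\mathrm{tr}}\big(x^{(i)}, \pi_{L(A)}(x^{(i)})\big)$$
over all tropical matrices A of size m × d, then L(A) is called an (m − 1)-dimensional best-fit Stiefel tropical linear space of S.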

Remark 9 A Fermat-Weber point is a 0-dimensional best-fit Stiefel tropical linear space of a sample with respect to the tropical metric over the tropical projective torus R^d/R1.
Example 6 The Fermat-Weber points for Example 5 are all the points in the green region in Figure 2 (left) according to the numerical calculation in Figure 2 (right).
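The cost function behind a Fermat-Weber point, i.e., the sum of tropical distances from a candidate point to the sample, can be sketched in Python as follows. The two-point sample used below is a hypothetical toy example (not the eight points of Example 5), chosen only to show that several candidates can attain the same cost; the function names are illustrative.

```python
def tropical_distance(v, w):
    """Tropical distance: max_i (v_i - w_i) - min_i (v_i - w_i)."""
    diffs = [vi - wi for vi, wi in zip(v, w)]
    return max(diffs) - min(diffs)


def fermat_weber_cost(x, sample):
    """Sum of tropical distances from the candidate point x to the sample."""
    return sum(tropical_distance(x, s) for s in sample)


# Hypothetical toy sample in R^3/R1 (illustration only).
toy_sample = [(0, 0, 0), (0, 2, 0)]
costs = {
    candidate: fermat_weber_cost(candidate, toy_sample)
    for candidate in [(0, 0, 0), (0, 1, 0), (0, 2, 0), (0, 3, 0)]
}
```

In this toy case the first three candidates share the same cost while the fourth is strictly worse, which is in line with the non-uniqueness of Fermat-Weber points under the tropical metric.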

Fig. 2 (left) The best-fit hyperplane (gray) and the Fermat-Weber points (green) for the eight points (top) and iris data (bottom, only Setosa and Versicolor used).(middle) The contour plots of the cost function for the tropical PCA with minimum pointed by the red cross for the eight points (top) and iris data (bottom).(right) The contour plot of the cost function for the Fermat-Weber point (green) for the eight points (top) and iris data (bottom).

4 Gaussian distribution fitted by Stiefel tropical linear spaces over R^d/R1
4.1 Best-fit tropical hyperplanes

2 √ π and 2 √π(
as σ → 0.
Proof The distance to H_0 is given in Lemma 3. We regard x_1 = (x_1 − x_3)/σ and x_2 = (x_2 − x_3)/σ as random variables, whose joint probability density function p(x_1, x_2) was shown to be the correlated Gaussian in the proof of Lemma 3, to get lim_{σ→0}
