Learning Algebraic Varieties from Samples

We seek to determine a real algebraic variety from a fixed finite subset of points. Existing methods are studied and new methods are developed. Our focus lies on aspects of topology and algebraic geometry, such as dimension and defining polynomials. All algorithms are tested on a range of datasets and made available in a Julia package.


Introduction
This paper addresses a fundamental problem at the interface of data science and algebraic geometry. Given a sample of points Ω = {u (1) , u (2) , . . . , u (m) } from an unknown variety V in R n , our task is to learn as much information about V as possible. No assumptions on the variety V , the sampling, or the distribution on V are made. There can be noise due to rounding, so the points u (i) do not necessarily lie exactly on the variety from which they have been sampled. The variety V is allowed to be singular or reducible. We also consider the case where V lives in the projective space P n−1 R . We are interested in questions such as: 1. What is the dimension of V ?  Let us consider these five questions for the dataset with m = 27 and n = 2 shown in Figure 1.
Here the answers are easy to see, but what to do if n ≥ 4 and no picture is available?
There is a considerable body of literature on such questions in statistics and computer science. The general context is known as manifold learning. One often assumes that V is smooth, i.e. a manifold, in order to apply local methods based on approximation by tangent spaces. Learning the true nature of the manifold V is not a concern for most authors. Their principal aim is dimensionality reduction, and V only serves in an auxiliary role. Manifolds act as a scaffolding to frame question 1. This makes sense when the parameters m and n are large. Nevertheless, the existing literature often draws its inspiration from figures in 3-space with many well-spaced sample points. For instance, the textbook by Lee and Verleysen [38] employs the "Swiss roll" and the "open box" for its running examples (cf. [38, §1.5]).
One notable exception is the work by Ma et al. [41]. Their Generalized Principal Component Analysis solves problems 1-4 under the assumption that V is a finite union of linear subspaces. Question 5 falls under the umbrella of topological data analysis (TDA). Foundational work by Niyogi, Smale and Weinberger [46] concerns the number m of samples needed to compute the homology groups of V , provided V is smooth and its reach is known.
The perspective of this paper is that of computational algebraic geometry. We care deeply about the unknown variety V . Our motivation is the riddle: what is V ? For instance, we may be given m = 800 samples in R 9 , drawn secretly from the group SO(3) of 3×3 rotation matrices. Our goal is to learn the true dimension, which is three, to find the 20 quadratic polynomials that vanish on V , and to conclude with the guess that V equals SO (3).
Our article is organized as follows. Section 2 presents basics of algebraic geometry from a data perspective. Building on [16], we explain some relevant concepts and offer a catalogue of varieties V frequently seen in applications. This includes our three running examples: the Trott curve, the rotation group SO (3), and varieties of low rank matrices. Section 3 addresses the problem of estimating the dimension of V from the sample Ω. We study nonlinear PCA, box counting dimension, persistent homology curve dimension, correlation dimension and the methods of Levina-Bickel [40] and Diaz-Quiroz-Velasco [22]. Each of these notions depends on a parameter between 0 and 1. This determines the scale from local to global at which we consider Ω. Our empirical dimensions are functions of . We aggregate their graphs in the dimension diagram of Ω, as seen in Figure 2.
Section 4 links algebraic geometry to topological data analysis. To learn homological information about V from Ω, one wishes to know the reach of the variety V . This algebraic number is used to assess the quality of a sample [1,46]. We propose a variant of persistent homology that incorporates information about the tangent spaces of V at points in Ω.
A key feature of our setting is the existence of polynomials that vanish on the model V , extracted from polynomials that vanish on the sample Ω. Linear polynomials are found by Principal Component Analysis (PCA). However, many relevant varieties V are defined by quadratic or cubic equations. Section 5 concerns the computation of these polynomials. Section 6 utilizes the polynomials found in Section 5. These cut out a variety V that contains V . We do not know whether V = V holds, but we would like to test this and certify it, using both numerical and symbolic algorithms. The geography of Ω inside V is studied by computing dimension, degree, irreducible decomposition, real degree, and volume. Section 7 introduces our software package LearningAlgebraicVarieties. This is writ-ten in Julia [6], and implements all algorithms described in this paper. It is available at https://github.com/PBrdng/LearningAlgebraicVarieties.git.
To compute persistent homology, we use Henselman's package Eirene [30]. For numerical algebraic geometry we use Bertini [5] and HomotopyContinuation.jl [9]. We conclude with a detailed case study for the dataset in [2, §6.3]. Here, Ω consists of 6040 points in R 24 , representing conformations of the molecule cyclo-octane C 8 H 16 , shown in Figure 10. Due to space limitations, many important aspects of learning varieties from samples are not addressed in this article. One is the issue of noise. Clearly, already the slightest noise in one of the points in Figure 1 will let no equation of the form (x − α) 2 + (y − β) 2 − γ vanish on Ω. But some will almost vanish, and these are the equations we are looking for. Based on our experiments, the methods we present for answering questions 1-5 can handle data that is approximate to some extent. However, we leave a qualitative stability analysis for future work. We also assume that there are no outliers in our data. Another aspect of learning varieties is optimization. We might be interested in minimizing a polynomial function f over the unknown variety V by only looking at the samples in Ω. This problem was studied by Cifuentes and Parrilo in [15], using the sum of squares (SOS) paradigm [8].

Varieties and Data
The mathematics of data science is concerned with finding low-dimensional needles in highdimensional haystacks. The needle is the model which harbors the actual data, whereas the haystack is some ambient space. The paradigms of models are the d-dimensional linear subspaces V of R n , where d is small and n is large. Most of the points in R n are very far from any sample Ω one might ever draw from V , even in the presence of noise and outliers.
The data scientist seeks to learn the unknown model V from the sample Ω that is available. If V is suspected to be a linear space, then she uses linear algebra. The first tool that comes to mind is Principal Component Analysis (PCA). Numerical algorithms for linear algebra are well-developed and fast. They are at the heart of scientific computing and its numerous applications. However, many models V occurring in science and engineering are not linear spaces. Attempts to replace V with a linear approximation are likely to fail. This is the point where new mathematics comes in. Many branches of mathematics can help with the needles of data science. One can think of V as a topological space, a differential manifold, a metric space, a Lie group, a hypergraph, a category, a semi-algebraic set, and lots of other things. All of these structures are useful in representing and analyzing models.
In this article we focus on the constraints that describe V inside the ambient R n (or P n−1 R ). The paradigm says that these are linear equations, revealed numerically by feeding Ω to PCA. But, if the constraints are not all linear, then we look for equations of higher degree.

Algebraic Geometry Basics
Our models V are algebraic varieties over the field R of real numbers. A variety is the set of common zeros of a system of polynomials in n variables. A priori, a variety lives in Euclidean space R n . In many applications two points are identified if they agree up to scaling. In such cases, one replaces R n with the real projective space P n−1 R , whose points are lines through the origin in R n . The resulting model V is a real projective variety, defined by homogeneous polynomials in n unknowns. In this article, we use the term variety to mean any zero set of polynomials in R n or P n−1 R . The following three varieties serve as our running examples. Example 2.1 (Trott Curve). The Trott curve is the plane curve of degree four defined by This curve is compact in R 2 and has four connected components (see Figure 3).  (3) consists of all 3×3-matrices X = (x ij ) with det(X) = 1 and X T X = Id 3 . The last constraint translates into 9 quadratic equations: These quadrics say that X is an orthogonal matrix. Adding the cubic det(X) − 1 gives 10 polynomials that define SO(3) as a variety in R 9 . Their ideal I is prime. In total, there are 20 linearly independent quadrics in I: the nine listed in (2), two from the diagonal of XX T −Id 3 , and nine that express the right-hand rule for orientation, like x 22 x 33 −x 23 x 32 −x 11 . Example 2.3 (Low Rank Matrices). Consider the set of m × n-matrices of rank ≤ r. This is the zero set of m r+1 n r+1 polynomials, namely the (r + 1)-minors. These equations are homogeneous of degree r+1. Hence this variety lives naturally in the projective space P mn−1 R . A variety V is irreducible if it is not a union of two proper subvarieties. The above varieties are irreducible. A sufficient condition for a variety to be irreducible is that it has a parametrization by rational functions. This holds in Example 2.3 where V consists of the matrices U T 1 U 2 where U 1 and U 2 have r rows. It also holds for the rotation matrices However, smooth quartic curves in P 2 R admit no such rational parametrization. The two most basic invariants of a variety V are its dimension and its degree. The former is the length d of the longest proper chain of irreducible varieties V 1 ⊂ V 2 ⊂ · · · ⊂ V d ⊂ V . A general system of d linear equations has a finite number of solutions on V . That number is well-defined if we work over C. It is the degree of V , denoted deg(V ). The Trott curve has dimension 1 and degree 4. The group SO(3) has dimension 3 and degree 8. In Example 2.3, if m = 3, n = 4 and r = 2, then the projective variety has dimension 9 and degree 6.
There are several alternative definitions of dimension and degree in algebraic geometry. For instance, they are read off from the Hilbert polynomial, which can be computed by way of Gröbner bases. We refer to Chapter 9, titled Dimension Theory, in the textbook [16].
A variety that admits a rational parametrization is called unirational. Smooth plane curves of degree ≥ 3 are not unirational. However, the varieties V that arise in applications are often unirational. The reason is that V often models a generative process. This happens in statistics, where V represents some kind of (conditional) independence structure. Examples include graphical models, hidden Markov models and phylogenetic models.
If V is a unirational variety with given rational parametrization, then it is easy to create a finite subset Ω of V . One selects parameter values at random and plugs these into the parametrization. For instance, one creates rank one matrices by simply multiplying a random column vector with a random row vector. A naive approach to sampling from the rotation group SO(3) is plugging four random real numbers a, b, c, d into the parametrization (3). Another method for sampling from SO(3) will be discussed in Section 7.
Given a dataset Ω ⊂ R n that comes from an applied context, it is reasonable to surmise that the underlying unknown variety V admits a rational parametrization. However, from the vantage point of a pure geometer, such unirational varieties are rare. To sample from a general variety V , we start from its defining equations, and we solve dim(V ) many linear equations on V . The algebraic complexity of carrying this out is measured by deg(V ). See Dufresne et al. [25] for recent work on sampling by way of numerical algebraic geometry.
Example 2.4. One might sample from the Trott curve V in Example 2.1 by intersecting it with a random line. Algebraically, one solves dim(V ) = 1 linear equation on the curve. That line intersects V in deg(V ) = 4 points. Computing the intersection points can be done numerically, but also symbolically by using Cardano's formula for the quartic. In either case, the coordinates computed by these methods may be complex numbers. Such points are simply discarded if real samples are desired. This can be a rather wasteful process.
At this point, optimization and real algebraic geometry enter the scene. Suppose that upper and lower bounds are known for the values of a linear function on V . In that case, the equations to solve have the form (x) = α, where α is chosen between these bounds.
For the Trott curve, we might know that no real points exist unless |x| ≤ 1. We choose x at random between −1 and +1, plug it into the equation (1), and then solve the resulting quartic in y. The solutions y thus obtained are likely to be real, thus giving us lots of real samples on the curve. Of course, for arbitrary real varieties, it is a hard problem to identify a priori constraints that play the role of |x| ≤ 1. However, recent advances in polynomial optimization, notably in sum-of-squares programming [8], should be quite helpful.
At this point, let us recap and focus on a concrete instance of the riddles we seek to solve. Where do these samples come from? Do the zero entries or the sign patterns offer any clue?
To reveal the answer we label the coordinates as (x 22 , x 21 , x 13 , x 12 , x 23 , x 11 ). The relations hold for all 40 data points. Hence V is the variety of 2 × 3-matrices (x ij ) of rank ≤ 1. Following Example 2.3, we view this as a projective variety in P 5 R . In that ambient projective space, the determinantal variety V is a manifold of dimension 3 and degree 3. Note that V is homeomorphic to P 1 R × P 2 R , so we can write its homology groups using the Künneth formula. In data analysis, proximity between sample points plays a crucial role. There are many ways to measure distances. In this paper we restrict ourselves to two metrics. For data in R n we use the Euclidean metric, which is induced by the standard inner product we use the Fubini-Study metric. Points u and v in P n−1 R are represented by their homogeneous coordinate vectors. The Fubini-Study distance from u to v is the angle between the lines spanned by representative vectors u and v in R n : This formula defines the unique Riemannian metric on P n−1 R that is orthogonally invariant.

A Variety of Varieties
In what follows we present some "model organisms" seen in applied algebraic geometry. Familiarity with a repertoire of interesting varieties is an essential prerequisite for those who are serious about learning algebraic structure from the datasets Ω they might encounter.
Rank Constraints. Consider m × n-matrices with linear entries having rank ≤ r. We saw the r = 1 case in Example 2.3. A rank variety is the set of all tensors of fixed size and rank that satisfy some linear constraints. The constraints often take the simple form that two entries are equal. This includes symmetric matrices, Hankel matrices, Toeplitz matrices, Sylvester matrices, etc. Many classes of structured matrices generalize naturally to tensors.
Example 2.6. Let n = s 2 and identify R n with the space of skew-symmetric s × s-matrices P = (p ij ). These satisfy P T = −P . Let V be the variety of rank 2 matrices P in P n−1 R . A parametric representation is given by p ij = a i b j − a j b i , so the p ij are the 2 × 2-minors of a 2 × s-matrix. The ideal of V is generated by the 4 × 4 pfaffians p ij p kl − p ik p jl + p il p jk . These s 4 quadrics are also known as the Plücker relations, and V is the Grassmannian of 2-dimensional linear subspaces in R s . The r-secants of V are represented by the variety of skew-symmetric matrices of rank ≤ 2r. Its equations are the (2r+2) × (2r+2) pfaffians of P . We refer to [29,Lectures 6 and 9] for an introduction to these classical varieties.
Example 2.7. The space of 3 × 3 × 3 × 3 tensors (x ijkl ) 1≤i,j,k,l≤3 has dimension 81. Suppose we sample from its subspace of symmetric tensors m = (m rst ) 0≤r≤s≤t≤3 . This has dimension n = 20. We use the convention m rst = x ijkl where r is the number of indices 1 in (i, j, k, l), s is the number of indices 2, and t is the number of indices 3. This identifies tensors m with cubic polynomials m = i+j+k≤3 m ijk x i y j z k , and hence with cubic surfaces in 3-space. Fix r ∈ {1, 2, 3} and take V to be the variety of tensors m of rank ≤ r. The equations that define the tensor rank variety V are the (r + 1) × (r + 1)-minors of the 4 × 10 Hankel matrix See Landsberg's book [37] for an introduction to the geometry of tensors and their rank. Matrices and tensors with rank constraints are ubiquitous in data science. Make sure to search for such low rank structures when facing vectorized samples, as in Example 2.5.
Hypersurfaces. The most basic varieties are defined by just one polynomial. When given a sample Ω, one might begin by asking for hypersurfaces that contain Ω and that are especially nice, simple and informative. Here are some examples of special structures worth looking for.
The 15 monomials correspond to the matchings of the complete graph with six vertices.
Example 2.10. The hyperdeterminant of format 2 × 2 × 2 is a polynomial of degree four in n = 8 unknowns, namely the entries of a 2 × 2 × 2-tensor X = (x ijk ). Its expansion equals This hypersurface is rational and it admits several nice parametrizations, useful for sampling points. For instance, up to scaling, we can take the eight principal minors of a symmetric 3 × 3-matrix, with x 000 = 1 as the 0 × 0-minor, x 100 , x 010 , x 001 for the 1 × 1-minors (i.e. diagonal entries), x 110 , x 101 , x 011 for the 2 × 2-minors, and x 111 for the 3 × 3-determinant.
Example 2.11. Let n = 10, with coordinates for R 10 given by the off-diagonal entries of a symmetric 5 × 5-matrix (x ij ). There is a unique quintic polynomial in these variables that vanishes on symmetric 5 × 5-matrices of rank ≤ 2. This polynomial, known as the pentad, plays a historical role in the statistical theory of factor analysis [24,Example 4.2.8]. It equals We can sample from the pentad using the parametrization x ij = a i b j +c i d j for 1 ≤ i < j ≤ 5.
Example 2.12. The determinant of the (p−1) × (p−1) matrix in Example 2.8 equals the squared volume of the simplex spanned by p points in R p−1 . If p = 3 then we get Heron's formula for the area of a triangle in terms of its side lengths. The hypersurface in R ( p 2 ) defined by this polynomial represents configurations of p points in R p−1 that are degenerate.
One problem with interesting hypersurfaces is that they often have a very high degree and it would be impossible to find that equation by our methods in Section 5. For instance, the Lüroth hypersurface [4] in the space of ternary quartics has degree 54, and the restricted Boltzmann machine [17] on four binary random variables has degree 110. These hypersurfaces are easy to sample from, but there is little hope to learn their equations from those samples.
Secret Linear Spaces. This refers to varieties that become linear spaces after a simple change of coordinates. Linear spaces V are easy to recognize from samples Ω using PCA.
Toric varieties become linear spaces after taking logarithms, so they can be learned by taking the coordinatewise logarithm of the sample points. Formally, a toric variety is the image of a monomial map. Equivalently, it is an irreducible variety defined by binomials. Replace each of these forty vectors by its coordinate-wise logarithm. Applying PCA to the resulting vectors, we learn that our sample comes from a 4-dimensional subspace of R 6 . This is the row space of a 4 × 6-matrix whose columns are the vertices of a regular octahedron: Our original samples came from the toric variety X A associated with this matrix. This means each sample has the form (ab, ac, ad, bc, bd, cd), where a, b, c, d are positive real numbers.
Toric varieties are important in applications. For instance, in statistics they correspond to exponential families for discrete random variables. Overlap with rank varieties arises for matrices and tensors of rank 1. Those smallest rank varieties are known in geometry as the Segre varieties (for arbitrary tensors) and the Veronese varieties (for symmetric tensors). These special varieties are toric, so they are represented by an integer matrix A as above.
Example 2.14. Let n = 6 and take Ω to be a sample of points of the form The corresponding variety V ⊂ P 5 R is a reciprocal linear space V ; see [36]. In projective geometry, such a variety arises as the image of a linear space under the classical Cremona transformation. From the sample we can learn the variety V by replacing each data point by its coordinate-wise inverse. Applying PCA to these reciprocalized data, we learn that V is a surface in P 5 R , cut out by ten cubics like 2x 3 Algebraic Statistics and Computer Vision. Model selection is a standard task in statistics. The models considered in algebraic statistics [24] are typically semi-algebraic sets, and it is customary to identify them with their Zariski closures, which are algebraic varieties. : The coordinates x ijkl represent the probabilities of observing the 16 states under this model.
Computational biology is an excellent source of statistical models with interesting geometric and combinatorial properties. These include hidden variable tree models for phylogenetics, and hidden Markov models for gene annotation and sequence alignment.
In the social sciences and economics, statistical models for permutations are widely used: Example 2.16. Let n = 6 and consider the Plackett-Luce model for rankings of three items [52]. Each item has a model parameter θ i , and we write x ijk for the probability of observing the permutation ijk. The model is the surface in P 5 R given by the parametrization The prime ideal of this model is generated by three quadrics and one cubic: When dealing with continuous distributions, we can represent certain statistical models as varieties in moment coordinates. This applies to Gaussians and their mixtures.
Example 2.17. Consider the projective variety in P 6 R given parametrically by m 0 = 1 and These are the moments of order ≤ 6 of the mixture of two Gaussian random variables on the line. Here µ and ν are the means, σ and τ are the variances, and λ is the mixture parameter. It was shown in [3, Theorem 1] that this is a hypersurface of degree 39 in P 6 . For µ = 0 we get the Gaussian moment surface which is defined by the 3 × 3-minors of the 3 × 6-matrix Example 2.18. Let n = 9 and fix the space of 3 × 3-matrices. An essential matrix is the product of a rotation matrix times a skew-symmetric matrix. In computer vision, these matrices represent the relative position of two calibrated cameras in 3-space. Their entries x ij serve as invariant coordinates for pairs of such cameras. The variety of essential matrices is defined by ten cubics. These are known as the Démazure cubics [35, Example 2.2]. The article [35] studies camera models in the presence of distortion. For example, the model described in [35,Example 2.3] concerns essential matrices plus one focal length unknown. This is the codimension two variety defined by the 3 × 3-minors of the 3 × 4-matrix Learning such models is important for image reconstruction in computer vision.

Estimating the Dimension
The first question one asks about a variety V is "What is the dimension?". In what follows, we discuss methods for estimating dim(V ) from the finite sample Ω, taken from V . We present six dimension estimates. They are motivated and justified by geometric considerations. For a manifold, dimension is defined in terms of local charts. This is consistent with the notion of dimension in algebraic geometry [16,Chapter 9]. The dimension estimates in this section are based on Ω alone. Later sections will address the computation of equations that vanish on V . These can be employed to find upper bounds on dim(V ); see (23).
In what follows, however, we do not have that information. All we are given is the input

Dimension Diagrams
There is an extensive literature (see e.g. [12,13]) on computing an intrinsic dimension of the sample Ω from a manifold V . The intrinsic dimension of Ω is a positive real number that approximates the Hausdorff dimension of V , a quantity that measures the local dimension of a space using the distances between nearby points. It is a priori not clear that the algebraic definition of dim(V ) agrees with the topological definition of Hausdorff dimension that is commonly used in manifold learning. However, this will be true under the following natural hypotheses. We assume that V is a variety in R n or P n−1 R such that the set of real points is Zariski dense in each irreducible component of V . If V is irreducible, then its singular locus Sing(V ) is a proper subvariety, so it has measure zero. The regular locus V \Sing(V ) is a real manifold. Each connected component is a real manifold of dimension d = dim(V ). The definitions of intrinsic dimension can be grouped into two categories: local methods and global methods [13,34]. Definitions involving information about sample neighborhoods fit into the local category, while those that use the whole dataset are called global.
Instead of making such a strict distinction between local and global, we introduce a parameter 0 ≤ ≤ 1. The idea behind this is that should determine the range of information that is used to compute the dimension from the local scale ( = 0) to the global scale ( = 1).
To be precise, for each of the dimension estimates, locality is determined by a notion of distance: the point sample Ω is a finite metric space. In our context we restrict extrinsic metrics to the sample. For samples Ω ⊂ R n we work with the scaled Euclidean distance For samples Ω taken in projective space P n−1 R we use the scaled Fubini-Study distance Two points u (i) and u (j) in Ω are considered -close with respect to the parameter if dist R n (u, v) ≤ or dist P n−1 R (u, v) ≤ , respectively. Given we divide the sample Ω into clusters Ω 1 , . . . , Ω l , which are defined in terms of -closeness, and apply the methods to each cluster separately, thus obtaining dimension estimates whose definition of being local depends on . In particular, for = 0 we consider each sample point individually, while for = 1 we consider the whole sample. Intermediate values of interpolate between the two.
Many of the definitions of intrinsic dimension are consistent. This means that it is possible to compute a scale from Ω for which the intrinsic dimension of each cluster converges to the dimension of V if m is sufficiently large and Ω is sampled sufficiently densely. By contrast, our paradigm is that m is fixed. For us, m does not tend to infinity. Our standing assumption is that we are given one fixed sample Ω. The goal is to compute a meaningful dimension from that fixed sample of m points. For this reason, we cannot unreservedly employ results on appropriate parameters in our methods. The sample Ω will almost never satisfy the assumptions that are needed. Our approach to circumvent this problem is to create a dimension diagram. Such diagrams are shown in Figures    The true dimension of a variety is an integer. However, we defined the dimension diagram to be the graph of a function whose range is a subset of the real numbers. The reason is that the subsequent estimates do not return integers. A noninteger dimension can be meaningful mathematically, such as in the case of a fractal curve which fills space densely enough that its dimension could be considered closer to 2 than 1. By plotting these diagrams, we hope to gain information about the true dimension d of the variety V from which Ω was sampled. R . The projective diagrams yield better estimates. The 600 data points were obtained by independently sampling pairs of 4 × 2 and 2 × 3 matrices, each with independent entries from the normal distribution, and then multiplying them.
One might be tempted to use the same dimension estimate for R n and P n−1 R , possibly via the Euclidean distance on an affine patch of P n−1 R . However, the Theorema Egregium by Gauss implies that any projection from P n−1 R to R n−1 must distort lengths. Hence, because we gave the parameter a metric meaning, we must be careful and treat real Euclidean space and real projective space separately.
Each of the curves seen in Figure 2 is a dimension diagram. We used six different methods for estimating the dimension on a fixed sample of 600 points. For the horizontal axis on the left we took the distance (6) in R 12 . For the diagram on the right we took (7) in P 11 R .

Six dimension estimates
In this section, we introduce six dimension estimates. They are adapted from the existing literature. Figures 2, 6 NPCA Dimension. The gold standard of dimension estimation is PCA. Assuming that V is a linear subspace of R n , we perform the following steps for the input Ω. First, we record the mean u := 1 m m i=1 u (i) . Let M be the m × n-matrix with rows u (i) − u. We compute σ 1 ≥ · · · ≥ σ min{m,n} , the singular values of M . The PCA dimension is the number of σ i above a certain threshold. For instance, this threshold could be the same as in the definition of the numerical rank in (21) below. Following [38, p. 30], another idea is to set the threshold as σ k , where k = argmax 1≤i≤min{m,n}−1 | log 10 (σ i+1 ) − log 10 (σ i )|. In our experiments we found that this improved the dimension estimates. In some situations it is helpful to further divide each column of M by its standard deviation. This approach is explained in [38, p. 26].
Using PCA on a local scale is known as Nonlinear Principal Component Analysis (NPCA). Here we partition the sample Ω into l clusters Ω 1 , . . . , Ω l ⊂ Ω depending on . For each Ω i we apply the usual PCA and obtain the estimate dim pca (Ω i ). The idea behind this is that the manifold V \Sing(V ) is approximately linear locally. We take the average of these local dimensions, weighted by the size of each cluster. The result is the nonlinear PCA dimension Data scientists have many clustering methods. For our study we use single linkage clustering. This works as follows. The clusters are the connected components in the graph with vertex set Ω whose edges are the pairs of points having distance at most . We do this either in Euclidean space with metric (6), or in projective space with metric (7). In the latter case, the points come from the cone over the true variety V . To make Ω less scattered, we sample a random linear function l and scale each data point u (i) such that l(u (i) ) = 1. Then we use those affine coordinates for NPCA. We chose this procedure because NPCA detects linear spaces and the proposed scaling maps projective linear spaces to affine-linear spaces.
We next introduce the notions of box counting dimension, persistent homology curve dimension and correlation dimension. All three of these belong to the class of fractal-based methods, since they rest on the idea of using the fractal dimension as a proxy for dim(V ).

Box Counting Dimension.
Here is the geometric idea in R 2 . Consider a square of side length 1 which we cover by miniature squares. We could cover it with 4 squares of side length 1 2 , or 9 squares of side length 1 3 , etc. What remains constant is the log ratio of the number of pieces over the magnification factor. For the square: log (4) log(2) = log(9) log(3) = 2. If Ω only intersects 3 out of 4 smaller squares, then we estimate the dimension to be between 1 and 2.
In R n we choose as a box the parallelopiped with lower vertex u − = min(u (1) , . . . , u (m) ) and upper vertex u + = max(u (1) , . . . , u (m) ), where "min" and "max" are coordinatewise minimum and maximum. Thus the box is {x ∈ R n : u − ≤ x ≤ u + }. For j = 1, . . . , n, the interval [u − j , u + j ] is divided into R( ) equally sized intervals, whose length depends on . A d-dimensional object is expected to capture R( ) d boxes. We determine the number ν of boxes that contain a point in Ω. Then the box counting dimension estimate is dim box (Ω, ) := log(ν) log(R( )) .
How to define the function R( )? Since the number of small boxes is very large, we cannot iterate through all boxes. It is desirable to decide from a data point u ∈ Ω in which box it lies. To this end, we set R( ) = λ + 1, where λ := max 1≤j≤n |u + j − u − j |. Then, for u ∈ Ω and k = 1, . . . , n we compute the largest q k such that q k The n numbers q 1 , . . . , q n completely determine the box that contains the sample u.
For the box counting dimension in real projective space, we represent the points in Ω on an affine patch of P n−1 R . On this patch we do the same construction as above, the only exception being that "equally sized intervals" is measured in terms of scaled Fubini-Study distance (7).
Persistent Homology Curve Dimension. The underlying idea was proposed by the Pattern Analysis Lab at Colorado State University [49]. First we partition Ω into l clusters Ω 1 , . . . , Ω l using single linkage clustering with . On each subsample Ω i we construct a minimal spanning tree. Suppose that the cluster Ω i has m i points. Let l i (j) be the length of the j-th longest edge in a minimal spanning tree for Ω i . For each Ω i we compute .
The persistent homology curve dimension estimate dim PHCurve (Ω, ) is the average of the local dimensions, weighted by the size of each cluster: In the clustering step we take the distance (6) if the variety is affine and (7) if it is projective.
Correlation Dimension. This is motivated as follows. Suppose that Ω is uniformly distributed in the unit ball. For pairs u, v ∈ Ω, we have , where 1 is the indicator function. Since we expect the empirical distribution C( ) to be approximately d , this suggests using log(C( )) log( ) as dimension estimate. In [38, §3.2.6] it is mentioned that a more practical estimate is obtained from C( ) by selecting some small h > 0 and putting In practice, we compute the dimension estimates for a finite subset of parameters 1 , . . . , k and put h = min i =j | i − j |. The ball in P n−1 R defined by the scaled Fubini-Study distance (7) is a spherical cap of radius . Its volume relative to a cap of radius 1 is with the same h as above and where C( ) is now computed using the Fubini-Study distance.
We next describe two more methods. They differ from the aforementioned in that they derive from estimating the dimension of the variety V locally at a distinguished point u ( ) . [40] introduced a maximum likelihood estimator for the dimension of an unknown variety V . Their estimate is derived for samples in Euclidean space R n . Let k be the number of samples u (j) in Ω that are within distance to u ( ) . We write T i (u ( ) ) for the distance from u ( ) to its i-th nearest neighbor in Ω. Note that

MLE Dimension. Levina and Bickel
This expression is derived from the hypothesis that k = k( ) obeys a Poisson process on the -neighborhood {u ∈ Ω : dist R n (u, u ( ) ) ≤ }, in which u is uniformly distributed. The formula (11) is obtained by solving the likelihood equations for this Poisson process.
In projective space, we model k( ) as a Poisson process on {u ∈ Ω : dist P n−1 R (u, u ( ) ) ≤ }. However, instead of assuming that u is uniformly distributed in that neighborhood, we assume that the orthogonal projection of u onto the tangent space T u ( ) P n−1 R is uniformly distributed in the associated ball of radius sin . Then, we derive the formula where T i (u ( ) ) is the distance from u ( ) to its i-th nearest neighbor in Ω measured for (7). It is not clear how to choose u ( ) from the given Ω. We chose the following method. Fix the sample neighborhood Ω i := {u ∈ Ω : dist R n (u, u (i) ) ≤ }. For each i we evaluate the formula (11) for Ω i with distinguished point u (i) . With this, the MLE dimension estimate is ANOVA Dimension. Diaz, Quiroz and Velasco [22] derived an analysis of variance estimate for the dimension of V . In their approach, the following expressions are important: The quantity β d is the variance of the random variable Θ d , defined as the angle between two uniformly chosen random points on the (d − 1)-sphere. We again fix > 0, and we relabel so that u (1) , . . . , u (k) are the points in Ω with distance at most from u ( ) . Let θ ij ∈ [0, π] denote the angle between u (i) − u ( ) and u (j) − u ( ) . Then, the sample covariance of the θ ij is The analysis in [22] shows that, for small and Ω sampled from a d-dimensional manifold, the angles θ ij are approximately Θ d -distributed. Hence, S is expected to be close to β dim V . The ANOVA dimension estimate of Ω is the index d such that β d is closest to S: As for the MLE estimate, we average (14) over all u ∈ Ω being the distinguished point.
To transfer the definition to projective space, we revisit the idea behind the ANOVA estimate. For u close to u ( ) , the secant through u and u ( ) is approximately parallel to the tangent space of V at u ( ) . Hence, the unit vector (u ( ) − u)/ u ( ) − u is close to being in the tangent space T u ( ) (V ). The sphere in T u ( ) (V ) has dimension dim V − 1 and we know the variances of the random angles Θ d . To mimic this construction in P n−1 R we use the angles between geodesics meeting at u ( ) . In our implementation, we orthogonally project Ω to the tangent space T u ( ) P n−1 R and compute (13) using coordinates on that space. We have defined all the mathematical ingredients inherent in our dimension diagrams. Figure 2 now makes sense. Our software and its applications will be discussed in Section 7.

Persistent Homology
This section connects algebraic geometry and topological data analysis. It concerns the computation and analysis of the persistent homology [14] of our sample Ω. Persistent homology of Ω contains information about the shape of the unknown variety V from which Ω originates.

Barcodes
Let us briefly review the idea. Given Ω, we associate a simplicial complex with each value of a parameter ∈ [0, 1]. Just like in the case of the dimension diagrams in the previous section, determines the scale at which we consider Ω from local ( = 0) to global ( = 1). The complex at = 0 consists of only the vertices and at = 1 it is the full simplex on Ω.
Persistent homology identifies and keeps track of the changes in the homology of those complexes as varies. The output is a barcode, i.e. a collection of intervals. Each interval in the barcode corresponds to a topological feature which appears at the value of a parameter given by the left hand endpoint of the interval and disappears at the value given by the right hand endpoint. These barcodes play the same role as a histogram does in summarizing the shape of the data, with long intervals corresponding to strong topological signals and short ones to noise. By plotting the intervals we obtain a barcode, such as the one in Figure 3.
The most straightforward way to associate a simplicial complex to Ω at is by covering Ω with open sets U ( ) = m i=1 U i ( ) and then building the associated nerve complex. This is the simplicial complex with vertex set  the nerve complex is called theČech complex at . Here dist R n and dist P n R are the distances from (6) and (7), respectively. Theorem 4.2 gives a precise statement for a sufficient condition under which theČech complex of U ( ) built on Ω yields the correct topology of V . However, in practice the hypotheses of the theorem will rarely be satisfied.
Cech complexes are computationally demanding as they require storing simplices in different dimensions. For this reason, applied topologists prefer to work with the Vietoris-Rips complex, which is the flag simplicial complex determined by the edges of theČech complex. This means that a subset σ ⊂ [m] is a face of the Vietoris-Rips complex if and only if U i ( ) U j ( ) = ∅ for all i, j ∈ σ. With the definition in (15), the balls U i ( ) and U j ( ) intersect if and only if their centers u (i) and u (j) are less than 2 apart.
Consider the sample from the Trott curve in Figure 3. Following Example 2.4, we sampled by selecting random x-coordinates between −1 and 1, and solving for y, or vice versa. The picture on the right shows the barcode. This was computed via the Vietoris-Rips complex. For dimensions 0 and 1 the six longest bars are displayed. The sixth bar in dimension 1 is so tiny that we cannot see it. In the range where lies between 0 and 0.2, we see four components. The barcode for dimension 1 identifies four persisting features for between 0.01 and 0.12. Each of these indicates an oval. Once these disappear, another loop appears. This corresponds to the fact that the four ovals are arranged to form a circle. So persistent homology picks up on both intrinsic and extrinsic topological features of the Trott curve.
The repertoire of algebraic geometry offers a fertile testing ground for practitioners of persistent homology. For many classes of algebraic varieties, both over R and C, one has a priori information about their topology. For instance, the determinantal variety in Example 2.5 is the 3-manifold P 1 R × P 2 R . Using Henselman's software Eirene for persistent homology [30], we computed barcodes for several samples Ω drawn from varieties with known topology.

Tangent Spaces and Ellipsoids
We underscore the benefits of an algebro-geometric perspective by proposing a variant of persistent homology that performed well in the examples we tested. Suppose that, in addition to knowing Ω as a finite metric space, we also have information on the tangent spaces of the unknown variety V at the points u (i) . This will be the case after we have learned some polynomial equations for V using the methods in Section 5. In such circumstances, we suggest replacing the -balls in (15) with ellipsoids that are aligned to the tangent spaces.
The motivation is that in a variety with a bottleneck, for example in the shape of a dog bone, the balls around points on the bottleneck may intersect for smaller than that which is necessary for the full cycle to appear. When V is a manifold, we design a covering of Ω that exploits the locally linear structure. Let 0 < λ < 1. We take U i ( ) to be an ellipsoid around u (i) with principal axes of length in the tangent direction of V at u (i) and principal axes of length λ in the normal direction. In this way, we allow ellipsoids to intersect with their neighbors and thus reveal the true homology of the variety before ellipsoids intersect with other ellipsoids across the medial axis. The parameter λ can be chosen by the user. We believe that λ should be proportional to the reach of V . This metric invariant is defined in the next subsection.
In practice, we perform the following procedure. Let f = (f 1 , . . . , f k ) be a vector of polynomials that vanish on V , derived from the sample Ω ⊂ R n as in Section 5. An estimator for the tangent space T u (i) V is the kernel of the Jacobian matrix of f at u (i) . In symbols, Let q i denote the quadratic form on R n that takes value 1 on T u (i) V ∩ S n−1 and value λ on the orthogonal complement of T u (i) V in the sphere S n−1 . Then, the q i specify the ellipsoids The role of the -ball enclosing the ith sample point is now played by U i ( ) := u (i) + E i . These ellipsoids determine the covering U ( ) = m i=1 U i ( ) of the given point cloud Ω. From this covering we construct the associatedČech complex or Vietoris-Rips complex.
While using ellipsoids is appealing, it has practical drawbacks. Relating the smallest for which U i ( ) and U j ( ) intersect to dist R n (u (i) , u (j) ) is not easy. For this reason we implemented the following variant of ellipsoid-driven barcodes. We use the simplicial complex on In (17) we weigh the distance between u (i) and u (j) by the arithmetic mean of the radii of the two ellipsoids E i and E j in the direction u (i) − u (j) . If all quadratic forms q i were equal to n j=1 x 2 j , then the simplicial complex of (17) equals the Vietoris-Rips complex from (15). Figure 4 compares the barcodes for the classical Vietoris-Rips complex with those obtained from ellipsoids. It seems promising to further develop variants of persistent homology that take some of the defining polynomial equations for (Ω, V ) into consideration.

Reaching the Reach
TheČech complex of a covering U = m i=1 U i has the homology of the union of balls U . But, can we give conditions on the sample Ω ⊂ V under which a covering reveals the true  Note that τ (V ) = +∞, if and only if V is an affine-linear subspace. Otherwise, the reach is a non-negative real number. In particular, there exist varieties V with τ (V ) = 0. For instance, consider the union of two lines V = {(x, y) ∈ R 2 : xy = 0}. All points in the diagonal D = {(x, y) ∈ R 2 : x = y, x = 0} have two closest points on V . Hence, D is a subset of the medial axis M V , and we conclude that 0 ≤ τ (V ) ≤ inf u∈V,w∈D u − w = 0. In general, any singular variety with an "edge" has zero reach.
To illustrate the concept of the reach, let V be a smooth curve in the plane, and draw the normal line at each point of V . The collection of these lines is the normal bundle. At a short distance from the curve, the normal bundle is a product: each point u near V has a unique closest point u * on V , and u lies on the normal line through u * . At a certain distance, however, some of the normal lines cross. If u is a crossing point of minimal distance to V , then u has no unique closest point u * on V . Instead, there are at least two points on V that are closest to u and the distance from u to each of them is the reach τ (V ). Aamari et al. [1] picture this by writing that "one can roll freely a ball of radius τ (V ) around V ".
Niyogi, Smale and Weinberger refer to τ (V ) −1 as the "condition number of V ". Bürgisser et al.
With probability ≥ 1 − δ, the homology groups of the following set coincide with those of V : A few remarks are in order. First of all, the theorem is stated using the Euclidean distance and not the scaled Euclidean distance (6). However, scaling the distance by a factor t means scaling the volume by t d , so the definition of β in the theorem is invariant under scaling. Moreover, the theorem has been rephrased in a manner that makes it easier to evaluate the right hand side of (18) in cases of interest. The assumption d ≤ 17 is not important: it ensures that the volume of the unit ball in R d can be bounded below by 1. Furthermore, in [46, Theorem 3.1], the tolerance can be any real number between 0 and τ /2, but then β depends in a complicated manner on . For simplicity, we took = τ /4. Theorem 4.2 gives the asymptotics of a sample size m that suffices to reveal all topological features of V . For concrete parameter values it is less useful, though. For example, suppose that V has dimension 4, reach τ = 1, and volume ν = 1000. If we desire a 90% guarantee that U ( ) has the same homology as V , so δ = 1/10, then m must exceed 1, 592, 570, 365. In addition to that, the theorem assumes that the sample was drawn from the uniform distribution on V . But in practice one will rarely meet data that obeys such a distribution. In fact, drawing from the uniform distribution on a curved object is a non-trivial affair [21].
In spite of its theoretical nature, the Niyogi-Smale-Weinberger formula is useful in that it highlights the importance of the reach τ (V ) for analyzing point samples. Indeed, the dominant quantity in (18) is β, and this grows to the power of d in τ (V ) −1 . It is therefore of interest to better understand τ (V ) and to develop tools for estimating it.
We found the following formula by Federer [27,Theorem 4.18] to be useful. It expresses the reach of a manifold V in terms of points and their tangent spaces: This formula relies upon knowing the tangent spaces at each point of u ∈ V . Suppose we are given the finite sample Ω from V . If some equations for V are also known, then we can use the estimator T u (i) V for the tangent space that was derived in (16). From this we get the following formula for the empirical reach of our sample: A similar approach for estimating the reach was proposed by Aamari et al.

Algebraicity of Persistent Homology
It is impossible to compute in the field of real numbers R. Numerical computations employ floating point approximations. These are actually rational numbers. Computing in algebraic geometry has traditionally been centered around exact symbolic methods. In that context, computing with algebraic numbers makes sense as well. In this subsection we argue that, in the setting of this paper, most numerical quantities in persistent homology, like the barcodes and the reach, have an algebraic nature. Here we assume that the variety V is defined over Q.
We discuss the work of Horobeţ and Weinstein in [32] which concerns metric properties of a given variety V ⊂ R n that are relevant for its true persistent homology. Here, the true persistent homology of V , at parameter value , refers to the homology of the -neighborhood of V . Intuitively, the true persistent homology of the Trott curve is the limit of barcodes as in Figure 3, where more and more points are taken, eventually filling up the entire curve.
An important player is the offset hypersurface O (V ). This is the algebraic boundary of the -neighborhood of V . More precisely, for any positive value of , the offset hypersurface is the Zariski closure of the set of all points in R n whose distance to V equals . If n = 2 and V is a plane curve, then the offset curve O (V ) is drawn by tracing circles along V .   Figure 5 we examine a conic V , shown in black. The light blue curve is its evolute. This is an astroid of degree 6. The evolute serves as the ED discriminant of V , in the context seen in [23, Figure 3]. The blue curves in Figure 5  It is shown in [32,Theorem 3.4] that the endpoints of bars in the true persistent homology of a variety V occur at numbers that are algebraic over Q. The proof relies on results in real algebraic geometry that characterize the family of fibers in a map of semialgebraic sets. The reach τ (V ) of any real variety V ⊂ R n is also an algebraic number. This follows from Federer's formula (19) which expresses τ (V ) as the optimal value of a polynomial optimization problem. In principle, the reach can be computed in exact arithmetic from the polynomials that define V . It remains an open problem how to do this effectively in practice. Eklund's recent work on bottlenecks [26] represents an important step towards a solution.
At present we do not know a good formula or a tight bound for the algebraic degrees of the barcode and the reach in terms of the invariants of the variety V . Deriving such formulas will require a further development and careful analysis of the offset discriminant that was introduced in [32]. We hope to return to this topic in the near future, as it can play a fundamental link between topology and algebraic geometry in the context of data science.

Finding Equations
Every polynomial in the ideal I V of the unknown variety V vanishes on the sample Ω. The converse is not true, but it is reasonable to surmise that it holds among polynomials of low degree. The ideal I Ω of the finite set Ω ⊂ R n can be computed using linear algebra. All our polynomials and ideals in this section lie in the ring R = R[x 1 , x 2 , . . . , x n ].

Vandermonde Matrices
Let M be a finite linearly independent subset of R. We write R M for the R-vector space with basis M and generally assume that M is ordered, so that polynomials in R M can be identified with vectors in R |M| . Two primary examples for M are the set of monomials x e = x e 1 1 x e 2 2 · · · x en n of degree d and the set of monomials of degree at most d. We use the notation R d and R ≤d for the corresponding subspaces of R. Their dimensions |M| are We write U M (Ω) for the m × |M| matrix whose i-th row consists of the evaluations of the polynomials in M at the point u (i) . Instead of U M (Ω) we write U d (Ω) when M contains all monomials of degree d and U ≤d (Ω) when M contains monomials of degree ≤ d.
For example, if n = 1, m = 3, and Ω = {u, v, w} then U ≤3 (Ω) is the Vandermonde matrix For n ≥ 2, we call U M (Ω) a multivariate Vandermonde matrix. It has the following property: The strategy for learning the variety V is as follows. We hope to learn the ideal I V by making an educated guess for the set M. The two desirable properties for M are: There is a fundamental tension between these two desiderata: if M is too small then (a) will fail, and if M is too large then (b) will fail. But, of course, suitable sets M do always exist, since the Hilbert's Basis Theorem ensures that all ideals in R are finitely generated.
The requirement (b) imposes a lower bound on the size m of the sample. Indeed, m is an upper bound on the rank of U M (Ω), since that matrix has m rows. The rank of any matrix is equal to the number of columns minus the dimension of the kernel. This implies: In practice, however, the sample Ω is given and fixed. Thus, we know m and it cannot be increased. The question is how to choose the set M. This leads to some interesting geometric combinatorics. For instance, if we believe that V is homogeneous with respect to some Z r -grading, then it makes sense to choose a set M that consists of all monomials in a given Z r -degree. Moreover, if we assume that V has a parametrization by sparse polynomials then we would use a specialized combinatorial analysis to predict a set M that works. A suitable choice of M can improve the numerical accuracy of the computations dramatically.
In addition to choosing the set of monomials M, we face another problem: how to represent I Ω ∩ R M ? Computing a basis for the kernel of U M (Ω) yields a set of generators for I Ω ∩ R M . But which basis to use and how to compute it? For instance, the right-singular vectors of U M (Ω) with singular value zero yield an orthonormal basis of I Ω ∩ R M . But in applications one often meets ideals I that have sparse generators. This holds in Section 2.
This is the first element in an orthonormal basis for I Ω ∩ R ≤2 , where Ω is a sample drawn from a certain variety V in R 9 . From such a basis, it is very hard to guess what V might be.
It turns out that V is SO(3), the group of rotations in 3-space. After renaming the nine variables, we find the 20-dimensional space of quadrics mentioned in Example 2.2. However, the quadrics seen in (2) are much nicer. They are sparse and easy to interpret.
For this reason we aim to compute sparse bases of multivariate Vandermonde matrices. There is a trade-off between obtaining sparse basis vectors and stability of the computations. We shall discuss this issue in the next subsection. See Table 1 for a brief summary.

Numerical Linear Algebra
Computing kernels of matrices of type U M (Ω) is a problem in numerical linear algebra. One scenario where the methodology has been developed and proven to work well is the Generalized Principal Component Analysis of Ma et al. [41], where V is a finite union of linear subspaces in R n . For classical Vandermonde matrices, the Bjoerck-Pereyra algorithm [7] accurately computes a LU-decomposition of the Vandermonde matrix; see [31,Section 22]. This decomposition may then be used to compute the kernel. A generalization of this for multivariate Vandermonde matrices of the form U ≤d (Ω) is given in [47,Theorem 4.4]. To date such a decomposition for U M (Ω) is missing for other subsets of monomials M. Furthermore, [47,Theorem 4.4] assumes that the multivariate Vandermonde matrix is square and invertible, but this is never the case in our situation.
In the literature on numerical algebraic geometry, it is standard to represent varieties by point samples, and there are several approaches for learning varieties, and even schemes, from such numerical data. See e.g. [18,28] and the references therein. From the perspective of commutative algebra, our interpolation problem was studied in e.g. [44,45].
We developed and implemented three methods based on classical numerical linear algebra: 1. via the R from a QR-decomposition, 2. via a singular value decomposition (SVD), or 3. via the reduced row echelon form (RREF) of U M (Ω).
The goal is to compute a (preferably sparse) basis for the kernel of U M (Ω), with N = |M|. All three methods are implemented in our software. Their descriptions are given below.
QR slightly less accurate and fast than SVD, yields some sparse basis vectors. SVD accurate, fast, but returns orthonormal and hence dense basis. RREF no accuracy guarantees, not as fast as the others, gives a sparse basis. Algorithm 1: with qr 1 Input: A multivariate Vandermonde matrix U ∈ R m×N and a tolerance value τ ≥ 0. 2 Output: A basis for the kernel of U . 3 Compute the QR-decomposition U = QR, where Q is orthogonal and R is upper triangular; N , a = (a 1 , . . . , a N ) and put a i = 1; 7 Solve R y = R i for y, where R i is the i-th column of R.;  Each of these three methods has its upsides and downsides. These are summarized in Table 1. The algorithms require a tolerance τ ≥ 0 as input. This tolerance value determines the numerical rank of the matrix. Let σ 1 ≥ · · · ≥ σ min{m,N } be the ordered singular values of the m × N matrix U . As in the beginning of Subsection 3.2, the numerical rank of U is Using the criterion in [19, §3.5.1], we can set τ = ε σ 1 max{m, N }, where is the machine precision. The rationale behind this choice is [19, Corollary 5.1], which says that the roundoff error in the σ i is bounded by E , where · is the spectral norm and U +E is the matrix whose singular values were computed. For backward stable algorithms we may use the bound E = O(ε)σ 1 . On the other hand, our experiments suggest that an appropriate value for τ is given by 1 2 (σ i + σ i+1 ), for which the jump from log 10 (σ i ) to log 10 (σ i+1 ) is significantly large. This choice is particularly useful for noisy data (as seen in Subsection 7.3). In case of noise the first definition of τ will likely fail to detect the true rank of U ≤d (Ω). The reason for this lies in the numerics of Vandermonde matrices, discussed below.
We apply all of the aforementioned to the multivariate Vandermonde matrix U M (Ω), for any finite set M in R that is linearly independent. We thus arrive at the following algorithm. Step 4. The more desirable sparse equations from (2) are found when using Algorithm 1 (with QR). In both cases the tolerance was set to be τ ≈ 4·10 −14 σ 1 , where σ 1 is the largest singular value of the Vandermonde matrix U ≤2 (Ω).
Running Algorithm 4 for a few good choices of M often leads to an initial list of nonzero polynomials that lie in I Ω and also in I V . Those polynomials can then be used to infer an upper bound on the dimension and other information about V . This is explained in Section 6. Of course, if we are lucky, we obtain a generating set for I V after a few iterations.
If m is not too large and the coordinates of the points u (i) are rational, then it can be preferable to compute the kernel of U M (Ω) symbolically. Gröbner-based interpolation methods, such as the Buchberger-Möller algorithm [44], have the flexibility to select M dynamically. With this, they directly compute the generators for the ideal I Ω , rather than the user having to worry about the matrices U ≤d (Ω) for a sequence of degrees d. In short, users should keep symbolic methods in the back of their minds when contemplating Algorithm 4.
In the remainder of this section, we discuss numerical issues associated with Algorithm 4. The key step is computing the kernel of the multivariate Vandermonde matrix U M (Ω). As illustrated in (20) for samples Ω on the line (n = 1), and M being all monomials up to a fixed degree, this matrix is a Vandermonde matrix. It is conventional wisdom that Vandermonde matrices are severely ill-conditioned [48]. Consequently, numerical linear algebra solvers are expected to perform poorly when attempting to compute the kernel of U d (Ω).
One way to circumvent this problem is to use a set of orthogonal polynomials for M. Then, for large sample sizes m, two distinct columns of U M (Ω) are approximately orthogonal, implying that U M (Ω) is well-conditioned. This is because the inner product between the columns associated to f 1 , f 2 ∈ M is approximately the integral of f 1 · f 2 over R n . However, a sparse representation in orthogonal polynomials does not yield a sparse representation in the monomial basis. Hence, to get sparse polynomials in the monomials basis from U M (Ω), we must employ other methods than the ones presented here. For instance, techniques from compressed sensing may help to compute sparse representations in the monomial basis.
We are optimistic that a numerically-reliable algorithm for computing the kernel of matrices U ≤d (Ω) exists. The Bjoerck-Pereyra algorithm [7] solves linear equations U a = b for an n × n Vandermonde matrix U . There is a theoretical guarantee that the computed solutionâ satisfies |a −â| ≤ 7n 5 + O(n 4 2 ); see [31,Corollary 22.5]. Hence,â is highly accurate -despite U being ill-conditioned. This is confirmed by the experiment mentioned in the beginning of [31,Section 22.3], where a linear system with κ(U ) ∼ 10 9 is solved with a relative error of 5 . We suspect that a Bjoerck-Pereyra-like algorithm together with a thorough structured-perturbation analysis for multivariate Vandermonde matrices would equip us with an accurate algorithm for finding equations. For the present article, we stick with the three methods above, while bearing in mind the difficulties that ill-posedness can cause.

Learning from Equations
At this point we assume that the methods in the previous two sections have been applied. This means that we have an estimate d of what the dimension of V might be, and we know a set P of polynomials that vanish on the finite sample Ω ⊂ R n . We assume that the sample size m is large enough so that the polynomials in P do in fact vanish on V . We now use P as our input. Perhaps the unknown variety V is one of the objects seen in Subsection 2.2.

Computational Algebraic Geometry
A finite set of polynomials P in Q[x 1 , . . . , x n ] is the typical input for algebraic geometry software. Traditionally, symbolic packages like Macaulay2, Singular and CoCoA were used to study P. Buchberger's Gröbner basis algorithm is the workhorse underlying this approach. More recently, numerical algebraic geometry has emerged, offering lots of promise for innovative and accurate methods in data analysis. We refer to the textbook [5], which centers around the excellent software Bertini. Next to using Bertini, we also employ the Julia package HomotopyContinuation.jl [9]. Both symbolic and numerical methods are valuable for data analysis. The questions we ask in this subsection can be answered with either.
In what follows we assume that the unknown variety V is equal to the zero set of the input polynomials P. We seek to answer the following questions over the complex numbers: 1. What is the dimension of V ? 2. What is the degree of V ?
3. Find the irreducible components of V and determine their dimensions and degrees.
Here is an example that illustrates the workflow we imagine for analyzing samples Ω. Suppose that an adversary constructs a dataset Ω of size m = 500 by the following process. He picks random integers s i and t j , computes the 4 × 4-Hankel matrix, and then deletes the antidiagonal coordinate x. For the remaining six coordinates he fixes some random ordering, such as (c, f, b, e, a, d). Using this ordering, he lists the 500 points. This is our input Ω ⊂ R 6 .
We now run Algorithm 4 for the m×210-matrix U ≤4 (Ω). The output of this computation is the following pair of quartics which vanishes on the variety V ⊂ R 6 that is described above: Not knowing the true variety, we optimistically believe that the zero set of P is equal to V . This would mean that V is a complete intersection, so it has codimension 2 and degree 16.
At this point, we may decide to compute a primary decomposition of P . We then find that there are two components of codimension 2, one of degree 3 and the other of degree 10. Since 3 + 10 = 16, we learn that P is not a radical ideal. In fact, the degree 3 component appears with multiplicity 2. Being intrigued, we now return to computing equations from Ω.
From the kernel of the m × 252-matrix U 5 (Ω), we find two new quintics in I Ω . These only reduce the degree to 3 + 10 = 13. Finally, the kernel of the m × 452-matrix U 6 (Ω) suffices. The ideal I V is generated by 2 quartics, 2 quintics and 4 sextics. The mystery variety V ⊂ R 6 has the same dimension and degree as the rank 2 Hankel variety in R 7 whose projection it is.
Our three questions boil down to solving a system P of polynomial equations. Both symbolic and numerical techniques can be used for that task. Samples Ω seen in applications are often large, are represented by floating numbers, and have errors and outliers. In those cases, we use Numerical Algebraic Geometry [5,9]. For instance, in Example 6.1 we intersect (22) with a linear space of dimension 2. This results in 16 isolated solutions. Further numerical analysis in step 3 reveals the desired irreducible component of degree 10. In the numerical approach to answering the three questions, one proceeds as follows: 1. We add s random (affine-)linear equations to P and we solve the resulting system in C n . If there are no solutions, then dim(V ) < s. If the solutions are not isolated, then dim(V ) > s. Otherwise, there are finitely many solutions, and dim(V ) = s.
2. The degree of V is the finite number of solutions found in step 1.
3. Using monodromy loops (cf. [5]), we can identify the intersection of a linear space L with any irreducible component of V C whose codimension equals dim(L).
The dimension diagrams from Section 3 can be used to guess a suitable range of values for the parameter s in step 1. However, if we have equations at hand, it is better to determine the dimension s as follows. Let P = {f 1 , . . . , f k } and u be any data point in Ω. Then, we choose the s from step 1 as the corank of the Jacobian matrix of f = (f 1 , . . . , f k ) at u; i.e, s := dim ker Jf (u).
Note that s = dim V (P) as long as u is not a singular point of V (P). In this case, s provides an upper bound for the true dimension of V . That is why it is important in step 3 to use higher-dimensional linear spaces L to detect lower-dimensional components of V (P).
Example 6.2. Take m = n = 3 in Example 2.3. Let P consist of the four 2 × 2-minors that contain the upper-left matrix entry x 11 . The ideal P has codimension 3 and degree 2. Its top-dimensional components are x 11 , x 12 , x 13 and x 11 , x 21 , x 31 . However, our true model V has codimension 4 and degree 6: it is defined by all nine 2 × 2-minors. Note that P is not radical. It also has an embedded prime of codimension 5, namely x 11 , x 12 , x 13 , x 21 , x 31 .

Real Degree and Volume
The discussion in the previous subsection was about the complex points of the variety V . The geometric quantity deg(V ) records a measurement over C. It is insensitive to the geometry of the real points of V . That perspective does not distinguish between P = {x 2 + y 2 − 1} and P = {x 2 + y 2 + 1}. That distinction is seen through the lens of real algebraic geometry.
In this subsection we study metric properties of a real projective variety V ⊂ P n R . We explain how to estimate the volume of V . Up to a constant depending on d = dim V , this volume equals the real degree deg R (V ), by which we mean the expected number of real intersection points with a linear subspace of codimension dim(V ); see Theorem 6.3 below.
To derive these quantities, we use Poincaré's kinematic formula [33,Theorem 3.8]. For this we need some notation. By [39] there is a unique orthogonally invariant measure µ on P n R up to scaling. We choose the scaling in a way compatible with the unit sphere S n : .
This makes sense because P n R is doubly covered by S n . The n-dimensional volume µ induces a d-dimensional measure of volume on P n R for any d = 1, 2, . . . , n − 1. We use that measure for d = dim(V ) to define the volume of our real projective variety as vol(V ) := µ(V ).
Let Gr(k, P n R ) denote the Grassmannian of k-dimensional linear spaces in P n R . This is a real manifold of dimension (n − k)(k + 1). Because of the Plücker embedding it is also a projective variety. We saw this for k = 1 in Example 2.6, but we will not use it here. Again by [39], there is a unique orthogonally invariant measure ν on Gr(k, P n R ) up to scaling. We choose the scaling ν(Gr(k, P n R )) = 1. This defines the uniform probability distribution on the Grassmannian. Poincaré's Formula [33,Theorem 3.8] states: Theorem 6.3 (Kinematic formula in projective space). Let V be a smooth projective variety of codimension k = n − d in P n R . Then its volume is the volume of P d R times the real degree: Note that in case of V being a linear space of dimension d, we have #(L ∩ V ) = 1 for all L ∈ Gr(n − d, P n R ). Hence, vol(V ) = vol(P d R ), which verifies the theorem in this instance. The theorem suggests an algorithm. Namely, we sample linear spaces L 1 , L 2 , . . . , L N independently and uniformly at random, and compute the number r(i) of real points in V ∩ L i for each i. This can be done symbolically (using Gröbner bases) or numerically (using homotopy continuation). We obtain the following estimator for vol(V ): We can sample uniformly from Gr(k, P n R ) by using the following lemma: Lemma 6.4. Let A be a random (k+1) × (n+1) matrix with independent standard Gaussian entries. The row span of A follows the uniform distribution on the Grassmannian Gr(k, P n R ). Proof. The distribution of the row space of A is orthogonally invariant. Since the orthogonally invariant probability measure on Gr(k, P n R ) is unique, the two distributions agree. Example 6.5. Let n = 2, k = 1, and let V be the Trott curve in P 2 R . The area of the projective plane P 2 R is half of the surface area of the unit circle: µ(P 1 R ) = 1 2 ·vol(S 1 ) = π. The real degree of V is computed with the method suggested in Lemma 6.4: deg R (V ) = 1.88364. We estimate the length of the Trott curve to be the product of these two numbers: 5.91763. Note that 5.91763 does not estimate the length of the affine curve depicted in Figure 3, but it is the length of the projective curve defined by the homogenization of the polynomial (1). Remark 6.6. Our discussion in this subsection focused on real projective varieties. For affine varieties V ⊂ R n there is a formula similar to Theorem 6.3. By [51, (14.70 where dL is the density of affine (n−d)-planes in R n from [51, Section 12.2], vol(·) is Lebesgue measure in R n and O m := vol (S m ). The problem with using this formula is that in general we do not know how to sample from the density dL given L ∩ V = ∅. The reason is that this distribution depends on vol(V )-which we were trying to compute in the first place. Suppose that the variety V is the image of a parameter space over which integration is easy. This holds for V = SO(3), by (3). For such cases, here is an alternative approach for computing the volume: pull back the volume form on V to the parameter space and integrate it there. This can be done either numerically or -if possible-symbolically. Note that this method is not only applicable to smooth varieties, but to any differentiable manifold.

Software and Experiments
In this section, we demonstrate how the methods from previous sections work in practice. The implementations are available in our Julia package LearningAlgebraicVarieties. We offer a step-by-step tutorial. To install our software, start a Julia session and type Pkg.clone("https://github.com/PBrdng/LearningAlgebraicVarieties.git") After the installation, the next command is

using LearningAlgebraicVarieties
This command loads all the functions into the current session. Our package accepts a dataset Ω as a matrix whose columns are the data points u (1) , u (2) , . . . , u (m) in R n .
To use the numerical algebraic geometry software Bertini, we must first download it from https://bertini.nd.edu/download.html. The Julia wrapper for Bertini is installed by Pkg.clone("https://github.com/PBrdng/Bertini.jl.git") The code HomotopyContinuation.jl accepts input from the polynomial algebra package MultivariatePolynomials.jl 1 . The former is described in [9] and it is installed using

Pkg.add("HomotopyContinuation")
We apply our package to three datasets. The first comes from the group SO(3), the second from the projective variety V of 2 × 3-matrices (x ij ) of rank 1, and the third from the conformation space of cyclo-octane.
In the first two cases, we draw the samples ourselves. The introduction of [21] mentions algorithms to sample from compact groups. However, for the sake of simplicity we use the following algorithm for sampling from SO(3). We use Julia's qr()-command to compute the QR-decomposition of a random real 3 × 3 matrix with independent standard Gaussian entries and take the Q of that decomposition. If the computation is such that the diagonal entries of R are all positive then, by [43,Theorem 1], the matrix Q is uniformly distributed in O(3). However, in our case, Q ∈ SO(3) and we do not know its distribution.
Our sample from the Segre variety V = P 1 R ×P 2 R in P 5 R is drawn by independently sampling two standard Gaussian matrices of format 2 × 1 and 1 × 3 and multiplying them. This procedure yields the uniform distribution on V because the Segre embedding is an isometry under the Fubini-Study metrics on P 1 R , P 2 R and P 5 R . The third sample, which is 6040 points from the conformation space of cyclo-octane, is taken from Adams et al. Now the current session should contain a variable data that is a 9 × 887 matrix. We produce the dimension diagrams by typing DimensionDiagrams(data, false, methods=[:CorrSum,:PHCurve]) In this command, data is our dataset, the Boolean value is true if we suspect our variety is projective and false otherwise, and methods is any of the dimension estimates :CorrSum, :BoxCounting :PHCurve, :NPCA, :MLE, and :ANOVA. We can leave this unspecified and type

DimensionDiagrams(data, false)
This command plots all six dimension diagrams. Both outputs are shown in Figure 6.  Three estimates are close to 3, so we correctly guess the true dimension of SO(3). In our experiments we found that NPCA and Box Counting Dimension often overestimate.
We proceed by finding polynomials that vanish on the sample. The command we use is where method is one of :with svd, :with qr, :with rref. round.(f) The precision can be specified, the default being to the nearest integer. We obtain the output Let us continue analyzing the 20 quadrics saved in the variable f. We use the following command in Bertini to determine whether our variety is reducible and compute its degree: import Bertini: bertini bertini(round.(f), TrackType = 1, bertini_path = p1) Here p1 is the path to the Bertini binary. Bertini confirms that the variety is irreducible of degree 8 and dimension 3 (cf. Figure 6). Using Eirene we construct the barcodes depicted in Figure 7. We run the following commands to plot barcodes for a random subsample of 250 points in SO (3): # sample 250 random points i = rand(1:887, 250) # compute the scaled Euclidean distances dists = ScaledEuclidean(data[:,i]) # pass distance matrix to Eirene and plot barcodes in dimensions up to 3 C = eirene(dists, maxdim = 3) barcode_plot(C, [0,1,2,3], [8,8,8,8]) The first array [0,1,2,3] of the barcode_plot() function specifies the desired dimensions. The second array [8,8,8,8] selects the 8 largest barcodes for each dimension. If the user does not pass the last array to the function, then all the barcodes are plotted. To compute barcodes arising from the complex specified in (17), we type dists = EllipsoidDistances(data[:,i], f, 1e-5) C = eirene(dists, maxdim = 3) barcode_plot(C, [0,1,2,3], [8,8,8,8]) Here, f = FindEquations(data, :with_qr, 2, false) is the vector of 20 quadrics. The third argument of EllipsoidDistances is the parameter λ from (17). It is here set to 10 −5 . . The left picture shows the standard Vietoris-Rips complex, while that on the right comes from the ellipsoid-driven complex (17). Neither reveals any structures in dimension 3, though V = SO(3) is diffeomorphic to P 3 R and has a non-vanishing H 3 (V, Z).
Our subsample of 250 points is not dense enough to reveal features except in dimension 0. Instead of randomly selecting the points in the subsample, one could also use the sequential maxmin landmark selector [2, §5.2]. Subsamples chosen this way tend to cover the dataset and to be spread apart from each other. One might also improve the result by constructing different complexes, for example, the lazy witness complexes in [2, §5]. However, this is not implemented in Eirene at present. 7.2 Dataset 2: a sample from the variety of rank one 2×3-matrices The second sample consists of 200 data points from the Segre variety P 1 R × P 2 R in P 5 R , that is Example 2.3 with m = n = 3, r = 1. We load our sample into the Julia session by typing data = datasets["2x3 rank one matrices"] We try the DimensionDiagrams command once with the Boolean value set to false (Euclidean space) and once with the value set to true (projective space). The diagrams are depicted in Figure 8. As the variety V naturally lives in P 5 R , the projective diagrams yield better estimates and hint that the dimension is either 3 or 4. The true dimension in P 5 R is 3. The next step is to find polynomials that vanish. We set homogeneous equations to true and d = 2: f = FindEquations(data, method, 2, true). All three methods, SVD, QR and RREF, correctly report the existence of three quadrics. The equations obtained with QR after rounding are as desired: x 1 x 4 − x 2 x 3 = 0, x 1 x 6 − x 2 x 5 = 0, x 3 x 6 − x 4 x 5 = 0.
Running Bertini we verify that V is an irreducible variety of dimension 3 and degree 3.
We next estimate the volume of V using the formula in Theorem 6.3. We intersect V with 500 random planes in P 5 R and count the number of real intersection points. We must initialize 500 linear functions with Gaussian entries involving the same variables as f: import MultivariatePolynomials: variables X = variables(f) Ls = [randn(3, 6) * X for i in 1:500] Now, we compute the real intersection points using HomotopyContinuation.jl.
using HomotopyContinuation r = map(Ls) do L # we multiply with a random matrix to make the system square S = solve([randn ( The command pi^2 * mean(r) reports an estimate of 19.8181 for the volume of V . The true volume of V is the length of P 1 R times the area of P 2 R , which is π · (2π) = 19.7392. Figure 9: Barcodes for 200 points on the Segre variety of 2 × 3 matrices of rank 1. The true mod 2 Betti numbers of P 1 R × P 2 R are 1, 2, 2, 1. The left picture shows the barcodes for the usual Vietoris-Rips complex computed using scaled Fubini-Study distance. The right picture is computed using the scaled Euclidean distance. Using the Fubini-Study distance yields better results.
Using Eirene, we construct the barcodes depicted in Figure 9. The barcodes constructed using Fubini-Study distance detect persistent features in dimensions 0, 1 and 2. The barcodes using Euclidean distance only have a strong topological signal in dimension 0.

Dataset 3: conformation space of cyclo-octane
Our next variety V is the conformation space of the molecule cyclo-octane C 8 H 16 . We use the same sample Ω of 6040 points that was analyzed in [2, §.6.3]. Cyclo-octane consists of eight carbon atoms arranged in a ring and each bonded to a pair of hydrogen atoms (see Figure 10). The location of the hydrogen atoms is determined by that of the carbon atoms due to energy minimization. Hence, the conformation space of cyclo-octane consists of all possible spatial arrangements, up to rotation and translation, of the ring of carbon atoms.
Each conformation is a point in R 24 = R 8·3 , which represents the coordinates of the carbon atoms {z 0 , . . . , z 7 } ⊂ R 3 . Every carbon atom z i forms an isosceles triangle with its Thus we expect to find 16 quadrics from the given data. In our sample we have c ≈ 2.21.
The conformation space is defined modulo translations and rotation; i.e., modulo the 6-dimensional group of rigid motions in R 3 . An implicit representation of this quotient space arises by substituting (24) into the Schönberg matrix of Example 2.8 with p = 8 and r = 3.
However, the given Ω lives in R 24 = R 8·3 , i.e. it uses the coordinates of the carbon atoms. Since the group has dimension 6, we expect to find 6 equations that encode a normal form. That normal form is a distinguished representative from each orbit of the group action. Brown et al. [10] and Martin et al. [42] show that the conformation space of cyclo-octane is the union of a sphere with a Klein bottle, glued together along two circles of singularities. Hence, the dimension of V is 2, and it has Betti numbers 1, 1, 2 in mod 2 coefficients. We next proceed to equations of degree 2. Our hope is to find the 16 quadrics in (24). Let us check whether this works. Figure 12 on the right shows the logarithms of the singular values of the multivariate Vandermonde matrix U ≤2 (Ω). Based on this we set τ = 10 −6 .
The command FindEquations(M, :with_svd, 2, 1e-6) reveals 21 quadrics. However, these are the pairwise products of the 6 linear equations we found earlier. An explanation for why we cannot find the 16 distance quadrics is as follows. Each of the 6 linear equations evaluated at the points in Ω gives about 10 −3 in our numerical computations. Thus their products equal about 10 −6 . The distance quadrics equal about 10 −3 . At tolerance 10 −6 , we miss them. Their values are much larger than the 10 −6 from the 21 redundant quadrics. By randomly rotating and translating each data point, we can manipulate the dataset such that FindEquations together with a tolerance value τ = 10 −1 gives the 16 desired quadrics. The fact that no linear equation vanishes on the manipulated dataset provides more evidence that 3 linear equations are determining the normal form for rotations. Figure 13: Barcodes for a subsample of 500 points from the cyclo-octane dataset. The left plot shows the barcodes for the usual Vietoris-Rips complex. The right picture shows barcodes for the ellipsoid-driven simplicial complex in (17). The right barcode correctly captures the homology of the conformation space.
The cyclo-octane dataset was used in [2, §.6.3] to demonstrate that persistent homology can efficiently recover the homology groups of the conformation space. We confirmed this result using our software. We determined the barcodes for a random subsample of 500 points. In addition to computing with Vietoris-Rips complexes, we use the 6 linear equations and the 16 distance quadrics to produce the ellipsoid-driven barcode plots. The results are displayed in Figure 13. The barcodes from the usual Vietoris-Rips complex do not capture the correct homology groups, whereas the barcodes arising from our new complex (17) do.