Distances Between Immersed Graphs: Metric Properties

Graphs in metric spaces appear in a wide range of data sets, and there is a large body of work focused on comparing, matching, or analyzing collections of graphs in different ambient spaces. In this survey, we provide an overview of a diverse collection of distance measures that can be defined on the set of finite graphs immersed (and in some cases, embedded) in a metric space. For each of the distance measures, we recall its definition and investigate which of the properties of a metric it satisfies. Furthermore, we compare the distance measures based on these properties and discuss their computational complexity.


Introduction
In this survey, we provide a methodical overview of a collection of distance measures aimed at comparing graphs immersed and embedded in an ambient metric space, such as Euclidean space. Data in this form arises in a wide range of areas, including GIS, trajectory analysis, protein alignment, plant morphology, and commodity networks such as electrical grids. Intuitively, comparing two networks might require a mapping or correspondence between the networks. For example, if one has a ground truth of a road network and a simplification or reconstruction of the same network, one may wish to measure the error of the latter. In this case, a mapping between the two networks would identify the parts of the ground truth that are successfully reconstructed/simplified and would enable one to study the local error of the reconstruction. In another example, two embedded graphs could serve as representations of a geographic network (e.g., rivers) in two different years, and mappings between them allow one to measure where and how much the networks have changed. On the other hand, many networks are not isomorphic, nor is one necessarily interested in true subgraph isomorphism.
Maike Buchin (maike.buchin@rub.de)
Given the prevalence of immersed and embedded graphs, as well as the wide range of potential domain questions, we need mathematical foundations for comparing and measuring the resemblance of such structures. Moreover, such data is often collected from a noisy and error-prone process, leading to a need for a wide variety of distances that are robust to different types of issues in the data. Such measures are useful in machine learning and statistical approaches, where an understanding of the mathematical structure of the options gives stronger guarantees and theoretical analyses.
More formally, in this paper, we consider the set G M of finite graphs immersed in a topological metric space (M, δ) and the set G M of finite graphs embedded in M. The most common setting is when we compare planar graphs immersed or embedded in Euclidean space, so that M = R 2 and δ is the Euclidean distance d 2 . We note that immersions (where edges are allowed to cross) and embeddings (where the image in the space M may not have crossings) are both of interest; for example, a road network is often considered to be embedded, but, in fact, given overpasses and bridges, in M = R 2 an immersion is the correct representation. Our core question is the following: given G 1 , G 2 ∈ G M (or ∈ G M ), how can we measure the distance between G 1 and G 2 ? Moreover, what metric properties does a given distance satisfy?
Goal. The goal of this survey is to enumerate and to understand various metric properties of G M and G M under different graph distance measures. We note that different distance measures capture different aspects of the given graphs, and we conjecture that a formal study of the mathematical properties strengthens our understanding of these tradeoffs. For example, some prioritize global similarity, while others focus on local matching; some are theoretically sound but difficult to compute. After briefly summarizing relevant definitions and background in Section 2, we discuss distances between immersed graphs in Section 3. Finally, we conclude in Section 4 by mentioning several natural open problems and possible future work in this area.

Preliminaries
We are motivated by the study of the spaces of graphs embedded in Euclidean space induced by different distances. In this work, we consider a more general setting: spaces of graphs immersed in a metric space (M, δ). (In fact, we consider spaces of pseudographs, allowing self-loops and multiple edges between pairs of vertices.) We begin by briefly defining some of the key concepts used in the remainder of this paper, but refer the reader to a topology text for full definitions [17,24].

Distances and Metrics
A number of key properties are considered desirable for comparisons between sets. Here, we focus on the properties most relevant for graph comparison; see for example [9,10] for more thorough surveys and discussion of distances in a range of more general settings. We note that there are additional studies of the geometry of graph spaces, e.g., [20].
For each distance discussed in this paper, we note whether or not the distance satisfies Finiteness, Identity, Symmetry, Separability, and Subadditivity. We say that d is a distance metric (or simply a metric) if it satisfies all five properties. However, less strict notions of dissimilarity can be defined. We say that d is a pseudo-metric if it satisfies finiteness, identity, symmetry, and subadditivity, but not necessarily separability. We say that d is a semi-metric if it satisfies finiteness, identity, symmetry, and separability, but not necessarily subadditivity. We say that d is a quasi-metric if it satisfies finiteness, identity, separability, and subadditivity, but not necessarily symmetry; see Table 1. In Table 1, a question mark indicates that a distance does not need to satisfy that property, but may. If we do not know which, if any, of these properties d satisfies, then we simply refer to d as a distance.
When we know that certain properties are not satisfied, we can add certain adjectives. We refer to distance functions that allow infinite distances as extended distances.
We say that d is asymmetric or directed if it does not satisfy symmetry (and may or may not satisfy identity, separability, and subadditivity). For example, an extended directed quasi-metric is a distance that satisfies identity, separability, and subadditivity, but does not satisfy finiteness and symmetry.
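For reference, the five properties named in this section are commonly formalized as follows. This is a sketch in standard notation, assuming a distance $d : X \times X \to \mathbb{R}_{\ge 0} \cup \{\infty\}$ and arbitrary $x, y, z \in X$; the exact wording of the definitions may differ.

```latex
\begin{align*}
\text{Finiteness:}    \quad & d(x, y) < \infty \\
\text{Identity:}      \quad & d(x, x) = 0 \\
\text{Symmetry:}      \quad & d(x, y) = d(y, x) \\
\text{Separability:}  \quad & d(x, y) = 0 \;\Longrightarrow\; x = y \\
\text{Subadditivity:} \quad & d(x, z) \le d(x, y) + d(y, z)
\end{align*}
```

Under this reading, a pseudo-metric may assign distance zero to distinct points, and a quasi-metric may have $d(x, y) \ne d(y, x)$.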

Remark 1 (Distances Between Sets) Let
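Given an asymmetric distance $d : X \times X \to \mathbb{R}_{\ge 0}$, we can symmetrize it by taking the maximum of the two directed distances, that is, by defining $D(x, y) := \max\{d(x, y),\, d(y, x)\}$. Note that if $d$ satisfies separability, then $D$ is also separable. Likewise, finiteness, identity, and subadditivity of $D$ follow from finiteness, identity, and subadditivity of $d$, respectively. Throughout this paper, this is how we symmetrize asymmetric distances, but we note that there are other options as well. As a more complicated example, if $d$ is a pseudo-metric, we can make it separable by defining equivalence classes. For each $x \in X$, we define its equivalence class, denoted $[x]$, as the set of points $[x] := \{y \in X \mid d(x, y) = 0\}$ and let $Y$ be the set of all equivalence classes. Then, we define $D : Y \times Y \to \mathbb{R}_{\ge 0}$ by $D([x], [y]) := d(x', y')$ for any representatives $x' \in [x]$ and $y' \in [y]$.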
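The two constructions in this remark (symmetrizing by the maximum, and grouping points at pseudo-distance zero into classes) can be sketched as follows. This is a minimal illustration on a finite set of points; the helper names `symmetrize` and `zero_classes` and both example distances are hypothetical and not taken from the paper.

```python
def symmetrize(d):
    """Return D(x, y) = max(d(x, y), d(y, x)) for a directed distance d."""
    def D(x, y):
        return max(d(x, y), d(y, x))
    return D


def zero_classes(points, d):
    """Group points into classes [x] = {y : d(x, y) = 0}.

    Assumes d is a pseudo-metric, so 'distance zero' is an equivalence
    relation and comparing against one representative per class suffices.
    """
    classes = []
    for p in points:
        for cls in classes:
            if d(cls[0], p) == 0:
                cls.append(p)
                break
        else:
            classes.append([p])
    return classes


# Hypothetical directed distance on the reals: only moving "up" has a cost.
def directed(x, y):
    return max(y - x, 0.0)


# Hypothetical pseudo-metric on the plane: compares first coordinates only.
def first_coord(p, q):
    return abs(p[0] - q[0])


sym = symmetrize(directed)
classes = zero_classes([(0, 0), (0, 5), (1, 2)], first_coord)
```

Here `directed` is not symmetric, while `sym` is; and `first_coord` cannot separate `(0, 0)` from `(0, 5)`, so both land in the same class.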

Correspondences and Matchings
Let (M, δ) be a metric space, and A, B ⊂ M. One way to define the distance between A and B is to "line up" the points in the two sets and to use the distance δ to measure "how well" the two sets are aligned. For this, we define correspondences and matchings.
A correspondence $\tau$ between $A$ and $B$ is simply a subset of $A \times B$. Often, it is convenient to think of $\tau$ both as a subset and as a function that can take subsets of $A$ to subsets of $B$ and vice versa. Specifically, for $A' \subseteq A$, we define $\tau(A') := \{b \in B \mid (a, b) \in \tau \text{ for some } a \in A'\}$, and symmetrically for subsets of $B$. We call $\tau$ a matching if each element of $A \sqcup B$ appears in $\tau$ at most once. We call $\tau$ a perfect matching if $\tau$ induces a bijection between the two sets, $b : A \to B$ with $b(a) = \tau(\{a\})$.
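For finite sets, these notions can be checked directly. The sketch below represents a correspondence as a Python set of pairs; the function names `image`, `is_matching`, and `is_perfect_matching` are illustrative choices, not notation from the paper.

```python
def image(tau, subset):
    """tau(A') = {b : (a, b) in tau for some a in A'}."""
    return {b for (a, b) in tau if a in subset}


def is_matching(tau):
    """True if every element of A and of B appears in at most one pair."""
    lefts = [a for a, _ in tau]
    rights = [b for _, b in tau]
    return len(set(lefts)) == len(lefts) and len(set(rights)) == len(rights)


def is_perfect_matching(tau, A, B):
    """True if tau is a matching that covers all of A and all of B."""
    covers_a = {a for a, _ in tau} == set(A)
    covers_b = {b for _, b in tau} == set(B)
    return is_matching(tau) and covers_a and covers_b


A = {"a1", "a2"}
B = {"b1", "b2"}
tau_many = {("a1", "b1"), ("a1", "b2")}   # a1 appears twice: not a matching
tau_perfect = {("a1", "b2"), ("a2", "b1")}
```

A general correspondence such as `tau_many` may pair one point with several others, which is what the later distances that "stretch" one graph over another rely on.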

Open Balls and the Metric Topology
Let $X$ be a set and $d : X \times X \to \mathbb{R}_{\ge 0}$ a distance. For each $r \ge 0$ and $x \in X$, we define the open ball of radius $r$ centered at $x$ by taking all points in $X$ whose distance to $x$ is less than $r$:
$$B_d(x, r) := \{y \in X \mid d(x, y) < r\}.$$
In the case that $d$ is a metric, this is called a metric ball; if $d$ is a pseudo-metric, it is a pseudo-metric ball, and so on. The closure of an open ball is a closed ball and can be written:
$$\overline{B}_d(x, r) := \{y \in X \mid d(x, y) \le r\}.$$
We can use the set of all open balls in $X$ under the distance $d$ in order to define a topology on $X$. This is called the open ball topology in general, and denoted $(X, d)$.
If $d$ is a metric, the resulting topological space is called a metric space. If $d$ is a pseudo-metric, then it is a pseudo-metric space, and so on. We say that $(X, d)$ is a metric measure space if there exists a Borel measure $\mu$ on $(X, d)$ such that all open balls have positive measure (i.e., for all $x \in X$ and $r > 0$, $\mu(B_d(x, r)) > 0$); see [16, Ch. 3½] for more details on metric measure spaces. At times, we may find it convenient to assume that a topological space is compact. We say that a topological space $T$ is compact if every open cover has a finite subcover. On compact sets, minimum and maximum are defined:
Lemma 1 Let $T$ be a compact topological space and let $f : T \to \mathbb{R}$ be continuous. Then $f$ attains its infimum and its supremum on $T$.
See, e.g., [27, Corollary 13.18] for a proof. In particular, the previous lemma allows us to write $\max f = \sup f$ and $\min f = \inf f$.

The Space of Paths
Let $X$ be a set and $d : X \times X \to \mathbb{R}_{\ge 0}$ be a distance function. Using the open ball topology in $X$ and the subspace topology on the unit interval $I = [0, 1] \subset \mathbb{R}$, a path in $X$ is a continuous map $\gamma : I \to X$, and we shall refer to $\gamma$ as a path from $\gamma(0)$ to $\gamma(1)$. If $X$ is a metric space, then the length of $\gamma$ can be defined, and we call $\gamma$ rectifiable if its length is finite; see [26] for the formal definition of path lengths. A path is a geodesic if it is locally shortest, that is, if no local perturbation of $\gamma$ results in a shorter path.
A reparameterization $\phi$ of the unit interval $I$ is a continuous, non-decreasing, and bijective map $\phi : I \to I$. Thus, we may also call $\phi$ an orientation-preserving homeomorphism on $I$ (here, the homeomorphism is orientation-preserving since $\phi(0) = 0$ and $\phi(1) = 1$). Two paths $\gamma_1, \gamma_2 : I \to X$ are equivalent, up to orientation-preserving homeomorphism, if there exist two reparameterizations $\phi_1$ and $\phi_2$ such that $\gamma_1 \circ \phi_1 = \gamma_2 \circ \phi_2$. (Note: we can always force one of the reparameterizations to be the identity map.) For $a, b \in X$, we use $\mathrm{Paths}_X(a, b)$ to denote the collection of all paths from $a$ to $b$ in $X$, up to reparameterization. Then, for $A, B \subseteq X$, we define
$$\mathrm{Paths}_X(A, B) := \bigcup_{a \in A,\, b \in B} \mathrm{Paths}_X(a, b).$$
Let $(M, \delta)$ be a metric space, and let $\Pi := \mathrm{Paths}_M(M, M)$. If we topologize $\Pi$ with the compact-open topology, then $\Pi$ is compact (and, by Lemma 1, we can take minima and maxima over $\Pi$).

The Hausdorff Distance (General Form)
Let $(M, \delta)$ be a metric space and let $2^M$ denote the collection of all subsets of $M$. The directed Hausdorff distance between $A, B \in 2^M$ is defined by
$$\overrightarrow{\delta}_H(A, B) := \sup_{a \in A} \inf_{b \in B} \delta(a, b), \tag{1}$$
where the supremum over an empty set is defined to be zero and the infimum over an empty set is defined to be $\infty$. See Fig. 1.
We symmetrize this distance by taking the maximum (as in Remark 2) in order to obtain the Hausdorff distance $\delta_H : 2^M \times 2^M \to \mathbb{R}_{\ge 0}$, which is defined by
$$\delta_H(A, B) := \max\{\overrightarrow{\delta}_H(A, B),\, \overrightarrow{\delta}_H(B, A)\}.$$
Let $Y \subset 2^M$ be a collection of non-empty, compact subspaces of $M$. Then, the directed Hausdorff distance when restricted to $Y$ satisfies finiteness, identity, and subadditivity, but neither symmetry nor separability (i.e., it is a directed pseudo-metric on $Y$), and the Hausdorff distance is a metric when restricted to $Y$.
The proof of this lemma follows from the metric properties of δ; see [19] for details. For a detailed study of the Hausdorff distance, see [22].
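For finite point sets, the directed and symmetric Hausdorff distances can be computed by brute force. The sketch below is an illustration only: it assumes finite subsets of Euclidean space, uses `math.dist` as a stand-in for the ground distance δ, and follows the stated conventions for empty sets. The function names are hypothetical.

```python
import math


def directed_hausdorff(A, B, delta=math.dist):
    """sup over a in A of inf over b in B of delta(a, b), for finite A, B."""
    if not A:
        return 0.0        # supremum over the empty set is zero
    if not B:
        return math.inf   # infimum over the empty set is infinity
    return max(min(delta(a, b) for b in B) for a in A)


def hausdorff(A, B, delta=math.dist):
    """Symmetrized version: the maximum of the two directed distances."""
    return max(directed_hausdorff(A, B, delta),
               directed_hausdorff(B, A, delta))


A = [(0.0, 0.0)]
B = [(0.0, 0.0), (3.0, 4.0)]
```

With these inputs, `A` is contained in `B`, so the directed distance from `A` to `B` is zero while the reverse direction is 5; this is the asymmetry (and the failure of separability) of the directed version.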

Fréchet Distance Between Paths
We define the Fréchet distance $\delta_F : \Pi \times \Pi \to \mathbb{R}_{\ge 0}$ on the space $\Pi = \mathrm{Paths}_M(M, M)$ of paths in $M$ by
$$\delta_F(\gamma_1, \gamma_2) := \inf_{\phi} \max_{t \in I} \delta(\gamma_1(t), \gamma_2(\phi(t))),$$
where $\phi$ ranges over all reparameterizations of $I$; see Fig. 2. The Fréchet distance is sometimes called the dog-walking distance, with the following intuition: if a woman is walking on one path, and her dog is walking on the other, find the shortest possible leash length that allows them to walk on their curves from start to end. In a sense, the Fréchet distance is really a continuous perfect matching between the curves, where the cost of the matching is the worst case distance between points that are paired under the correspondence.
Fig. 2 The Fréchet distance asks for the correspondence which minimizes the longest grey connection, shown here as a thicker line
Similarly, the weak Fréchet distance $\delta_{wF} : \Pi \times \Pi \to \mathbb{R}_{\ge 0}$ is defined using the same formula, but where $\phi$ is allowed to range over all continuous surjections (as opposed to bijections) such that $\phi(0) = 0$ and $\phi(1) = 1$. In other words, these maps do not need to be monotonic and are allowed to decrease.
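To make the dog-walking intuition concrete, the sketch below computes the discrete Fréchet distance between two point sequences with the standard dynamic program. Note that this is a swapped-in technique, not the continuous distance defined above: it only couples sample points, and for polygonal curves it gives an upper bound on the continuous Fréchet distance. The function name and the use of `math.dist` for δ are assumptions made for illustration.

```python
import math


def discrete_frechet(P, Q, delta=math.dist):
    """Discrete Fréchet distance between point sequences P and Q.

    ca[i][j] is the smallest leash length needed to walk the prefixes
    P[0..i] and Q[0..j] when neither walker ever steps backwards.
    """
    n, m = len(P), len(Q)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = delta(P[i], Q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                best_prev = min(ca[i - 1][j], ca[i][j - 1], ca[i - 1][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[n - 1][m - 1]


P = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
Q = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
```

For the two parallel segments `P` and `Q`, walking in lockstep keeps the leash at length 1, which is what the table returns.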
While introduced originally by Fréchet in his thesis as one of the first examples of a metric [14], this distance was first investigated from a computational perspective by Alt and Godau [4], who also established the following lemma.

Immersions and Embeddings into Metric Spaces
Let (M, δ) be a metric space and G = (V , E) be an abstract graph, which we view as a stratified one-dimensional simplicial complex. We restrict our attention to nontrivial graphs, so that each graph must contain at least one vertex. In this setting, note that when we write x ∈ G, the point x can either be a vertex in V G or a point interior to an edge in E G . We topologize G using the quotient space topology, where each (closed) edge is homeomorphic to the unit interval I = [0, 1] in R 1 , and we topologize M using the open ball topology.
An immersion of $G$ into $M$ is a continuous map $\varphi : G \to M$ such that for each point $x \in G$ and for all small enough open neighborhoods $n_x$ of $x$ in $G$, the map $\varphi$ restricted to $n_x$ (denoted $\varphi|_{n_x}$) is a homeomorphism onto its image. We call the pair $(G, \varphi)$ an immersed graph in $M$. We say that $(G, \varphi)$ is rectifiable if $V$ and $E$ are finite sets, and, for every edge $e \in E_G$, the length of $\varphi(e)$ is finite. To simplify notation, we sometimes use $G$ in place of the pair $(G, \varphi)$. Given two graphs $(G_1, \varphi_1)$ and $(G_2, \varphi_2)$ immersed in $M$, we say that they are equivalent up to homeomorphism, which we denote $(G_1, \varphi_1) \cong (G_2, \varphi_2)$, if there exists a homeomorphism $\alpha : G_1 \to G_2$ such that $\varphi_1 = \varphi_2 \circ \alpha$. In other words, for all $x \in G_1$, we have $\varphi_1(x) = \varphi_2(\alpha(x))$.
An immersion of $G$ is said to be an embedding if $G$ is homeomorphic onto the image $\varphi(G)$. In other words, immersions allow edge crossings and embeddings do not; see Fig. 3 and [12,21,23]. The set of all rectifiable embeddings of nontrivial graphs (up to homeomorphism) into M is denoted G M.

Comparisons on Immersed Graphs
Given a metric space (M, δ), we consider the collection G M of finite graphs immersed in this metric space. Each (G, φ) ∈ G M is a graph G = (V , E), where V is the vertex set and E is the edge set, together with an immersion φ : G → M. In this section, we investigate the metric properties of different distance functions on G M .
By default, we work on immersed graphs, where edges may cross each other. As every embedding is an immersion, the metric properties hold automatically when restricted to embedded graphs. However, when not all metric properties hold for immersed graphs, some additional properties may hold for embedded graphs that do not hold for the immersed graphs. We explicitly discuss this restriction and why it holds in the relevant sections.

Hausdorff Distances
In Sect. 2.4.1, we introduced the directed and undirected Hausdorff distances in the general setting, $\overrightarrow{\delta}_H$ and $\delta_H$, respectively. We now consider $\overrightarrow{\delta}_H$ restricted to $\mathcal{G}_M$. We call this restriction the directed Hausdorff distance between immersed graphs. Since each graph in $\mathcal{G}_M$ is a compact subset of $M$, by Lemma 1, we know that the Hausdorff distance between two graphs in $\mathcal{G}_M$ is realized by points in the graphs (and hence the inf and sup of Equation (1) are actually min and max).

Fréchet Distance
The Fréchet distance is a metric originally defined on paths in $\mathbb{R}^n$ (see Sect. 2.4.2), but it can also be defined for more general objects, such as oriented manifolds. Let $A, B \subseteq \mathbb{R}^d$ be two oriented manifolds and let $f : A \to M$ and $g : B \to M$ be two immersions. Then the Fréchet distance between $A$ and $B$ is given by
$$\delta_F(A, B) := \inf_{\alpha} \sup_{x \in A} \delta(f(x), g(\alpha(x))),$$
where $\alpha : A \to B$ ranges over all orientation-preserving homeomorphisms. In fact, this definition can be generalized to any two homeomorphic spaces, such as graphs. However, care must be taken to either define an appropriate notion of orientation, or define the distance without requiring the homeomorphisms to be orientation-preserving.

Fig. 4 Graphs with large Fréchet but small Hausdorff distance
Here, the Fréchet distance between immersed graphs $d_F : \mathcal{G}_M \times \mathcal{G}_M \to \mathbb{R}_{\ge 0}$ can be defined by restricting our attention to the Fréchet distance between paths, and avoiding any mention of orientation-preserving homeomorphisms. In particular, for $(G_1, \varphi_1), (G_2, \varphi_2) \in \mathcal{G}_M$, we set
$$d_F(G_1, G_2) := \min_{\alpha} \max_{e \in E_1} \delta_F(\varphi_1(e), \varphi_2(\alpha(e)))$$
if $G_1, G_2$ are homeomorphic, and $\infty$ if they are not. Here, $\alpha$ ranges over all edge mappings corresponding to the isomorphisms of $G_1$ and $G_2$, see [5], and we assume the graphs have no degree two vertices. The latter can be assumed since we consider graphs in $\mathcal{G}_M$ equivalent up to homeomorphism. Note that we are abusing notation slightly here and viewing $\varphi_G(e)$ as a parameterized curve, rather than just an immersion of an edge; since the Fréchet distance considers all reparameterizations of the curve regardless, any parameterization of the image of the edge is sufficient. Since graphs are compared using an isomorphism, the Fréchet distance can be arbitrarily larger than the Hausdorff distance, see Fig. 4.

Theorem 2 (Metric Properties of Fréchet Distance Between Immersed Graphs) The Fréchet distance d F is a metric.
Proof It is well-known that the Fréchet distance is a pseudo-metric: Identity and symmetry follow directly from the definition.
Separability is also fulfilled: consider two graphs (G 1 , φ 1 ), (G 2 , φ 2 ) with Fréchet distance 0. These graphs must be isomorphic if their distance is 0, and hence G 1 is the same as G 2 .
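For completeness, we provide a proof that subadditivity is satisfied. Consider three graphs $(G_1, \varphi_1), (G_2, \varphi_2), (G_3, \varphi_3) \in \mathcal{G}_M$. We may assume that they are pairwise homeomorphic, since otherwise at least one of $d_F(G_1, G_2)$ and $d_F(G_2, G_3)$ is infinite. Letting $\alpha$ and $\beta$ range over the edge mappings from $G_1$ to $G_2$ and from $G_2$ to $G_3$, respectively, we have
$$\begin{aligned}
d_F(G_1, G_2) + d_F(G_2, G_3)
&= \min_{\alpha} \max_{e \in E_1} \delta_F(\varphi_1(e), \varphi_2(\alpha(e))) + \min_{\beta} \max_{e' \in E_2} \delta_F(\varphi_2(e'), \varphi_3(\beta(e'))) \\
&\ge \min_{\alpha, \beta} \max_{e \in E_1} \big[ \delta_F(\varphi_1(e), \varphi_2(\alpha(e))) + \delta_F(\varphi_2(\alpha(e)), \varphi_3(\beta(\alpha(e)))) \big] \\
&\ge \min_{\alpha, \beta} \max_{e \in E_1} \delta_F(\varphi_1(e), \varphi_3(\beta(\alpha(e)))) \\
&= d_F(G_1, G_3).
\end{aligned}$$
The first inequality follows simply by combining the two terms, and the second inequality follows from the triangle inequality that is fulfilled by the Fréchet distance $\delta_F$ for curves (which in turn follows from it being fulfilled by $\delta$). The final equality holds because the compositions $\beta \circ \alpha$ are exactly the edge mappings from $G_1$ to $G_3$.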
Note that if G 1 and G 2 are simple graphs, each isomorphism has a unique edge mapping. Hence computing this distance is at least as hard as determining graph isomorphism. For trees [5], it can be computed in polynomial time and for graphs of bounded tree width [7] it is fixed-parameter tractable. For planar embedded graphs, it is desirable that isomorphisms are "orientation-preserving" in the sense that they preserve orderings of edges around each vertex. This property can be used to enumerate all such planar orientation-preserving isomorphisms, and thus compute the distance for embedded graphs in polynomial time [13].
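The min-max structure of the graph Fréchet distance can be sketched as follows, assuming that the hard parts are already done: the table `edge_cost` (pairwise Fréchet distances between immersed edges) and the list `edge_mappings` (one edge mapping per graph isomorphism) are hypothetical inputs that would have to be computed separately, and enumerating the isomorphisms is where the difficulty discussed above lies.

```python
import math


def graph_frechet(edge_cost, edge_mappings):
    """Minimum over edge mappings of the maximum edge-to-edge cost.

    edge_cost[e][f] is an assumed precomputed Fréchet distance between
    edge e of the first graph and edge f of the second graph.
    edge_mappings is a list of dicts, each sending every edge of the
    first graph to an edge of the second graph.
    """
    if not edge_mappings:
        return math.inf   # no isomorphism: the graphs are not homeomorphic
    return min(
        max(edge_cost[e][alpha[e]] for e in alpha)
        for alpha in edge_mappings
    )


# Hypothetical two-edge example with two candidate edge mappings.
edge_cost = {
    "e1": {"f1": 1.0, "f2": 4.0},
    "e2": {"f1": 3.0, "f2": 2.0},
}
edge_mappings = [
    {"e1": "f1", "e2": "f2"},
    {"e1": "f2", "e2": "f1"},
]
```

In this toy instance, the first mapping has bottleneck cost 2 and the second has cost 4, so the distance is 2.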

Path-Based Distance
The path-based distance was originally presented in [1]. This distance uses the Fréchet distance between paths in graphs (see Sect. 2.4.2) to define a distance between graphs.
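Let $(M, \delta)$ be a metric space. For each $(G, \varphi) \in \mathcal{G}_M$, let $\Pi_G$ denote the set of all paths in $G$ up to reparameterization; that is, $\Pi_G := \mathrm{Paths}_G(G, G)$. For $(G_1, \varphi_1), (G_2, \varphi_2) \in \mathcal{G}_M$ with path sets $\Pi_1$ and $\Pi_2$, the directed path-based distance is
$$\overrightarrow{d}_{path}(G_1, G_2) := \max_{P_1 \in \Pi_1} \min_{P_2 \in \Pi_2} \delta_F(\varphi_1 \circ P_1, \varphi_2 \circ P_2).$$
As noted in Remark 2, we can symmetrize this asymmetric distance in order to define the path-based distance $d_{path} : \mathcal{G}_M \times \mathcal{G}_M \to \mathbb{R}_{\ge 0}$ as the Hausdorff distance between path sets $\Pi_1$ and $\Pi_2$. Specifically,
$$d_{path}(G_1, G_2) := \max\{\overrightarrow{d}_{path}(G_1, G_2),\, \overrightarrow{d}_{path}(G_2, G_1)\}.$$
Due to compactness of $\Pi_1$ and $\Pi_2$ and by Lemma 1, we can always find paths where this distance is realized. In other words, if $\overrightarrow{d}_{path}(G_1, G_2) = \varepsilon$, then there exist paths $P_1 \in \Pi_1$ and $P_2 \in \Pi_2$ such that the Fréchet distance between $P_1$ and $P_2$ is $\varepsilon$. See Fig. 5 for an example; note that the two graphs shown have infinite Fréchet distance, as they are not homeomorphic.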

Theorem 3 (Metric Properties of Path-Based Distances)
The directed path-based distance satisfies finiteness, identity, and subadditivity, but not separability nor symmetry. The path-based distance is a metric.
Proof For each $(G, \varphi_G) \in \mathcal{G}_M$, define the set of paths $\Pi_G = \mathrm{Paths}_G(G, G)$ and the set of immersed paths $\varphi_G(\Pi_G) := \{\varphi_G \circ f \mid f \in \Pi_G\}$. Since $\varphi_G$ is an immersion and $\Pi_G$ is compact, we know that $\varphi_G(\Pi_G)$ is also compact. Hence, the theorem follows from the metric properties of the Hausdorff distance on collections of non-empty compact sets (Sect. 2.4.1), applied to these sets of immersed paths under the Fréchet distance.
Fig. 5 Two graphs, $G_1$ and $G_2$, shown separately for ease of visibility (panel (b): graph $G_2$). When they are embedded to be overlapping (so that the outer four vertices are in the same location in the plane), these graphs have small path-based distance: each edge in $G_1$ can map to two edges in $G_2$, and the path-based distance will be less than the radius of $G_2$'s inner loop
The exact complexity of the path-based distance is still an open and potentially challenging question, as the measure depends upon Fréchet mappings between curves in the graph, of which there could be exponentially many to consider. However, in the paper that introduces it [1], the authors present a polynomial time approximation algorithm which is based on the maximum Fréchet distance and demonstrate its efficacy on real-world map datasets.

Traversal Distance
The traversal distance, introduced by Alt et al. in [3], for a pair of immersed graphs $(G_1, \varphi_1), (G_2, \varphi_2) \in \mathcal{G}_M$ is defined as follows. Let $f : [0, 1] \to G_1$ be a continuous and surjective function, called a (full) traversal of $G_1$, and let $g : [0, 1] \to G_2$ be a continuous, but not necessarily surjective, function, called a partial traversal of $G_2$. We compare functions $f$ and $g$ by their $L_\infty$ norm, $\max_{t \in [0, 1]} \delta(\varphi_1(f(t)), \varphi_2(g(t)))$. Taking the infimum over all possible $f$ and $g$, we arrive at the traversal distance
$$\overrightarrow{d}_T(G_1, G_2) := \inf_{f, g} \max_{t \in [0, 1]} \delta(\varphi_1(f(t)), \varphi_2(g(t))),$$
where $f$ ranges over all full traversals of $G_1$ and $g$ ranges over all partial traversals of $G_2$. Noticing that reparametrizations of each traversal $f$ and each partial traversal $g$ are included in that infimum, we observe the following equivalence:
$$\overrightarrow{d}_T(G_1, G_2) = \inf_{f, g} \delta_F(\varphi_1 \circ f, \varphi_2 \circ g).$$
Compared to the Fréchet distance, the traversal distance can also be applied to non-homeomorphic graphs. On homeomorphic graphs, the Fréchet distance is an upper bound for the traversal distance, as the Fréchet correspondence yields a candidate traversal to consider. However, the traversal distance can be arbitrarily smaller than the Fréchet distance, see Fig. 6.
Fig. 6 Homeomorphic graphs $G_1$ and $G_2$ with small traversal but large Fréchet and path-based distances
Fig. 7 Example showing that the traversal distance violates both separability and subadditivity. Assume that the four vertices of these graphs are the same, i.e., $G_1$ and $G_3$ are subgraphs of $G_2$. Since $G_1$ is a subgraph of $G_2$, the traversal distance from $G_1$ to $G_2$ is zero. To compute $\overrightarrow{d}_T(G_1, G_3)$, consider the traversal of $G_1$ that starts at the bottom left, goes up to the next vertex, right to the third vertex, and finally down to the last vertex. The best partial traversal of $G_3$ would be one that goes up and down one of the vertical edges
We take the symmetric version of the traversal distance by maximizing the two directed distances. In other words, we define $d_T : \mathcal{G}_M \times \mathcal{G}_M \to \mathbb{R}_{\ge 0}$ by $d_T(G_1, G_2) := \max\{\overrightarrow{d}_T(G_1, G_2),\, \overrightarrow{d}_T(G_2, G_1)\}$.

Theorem 4 (Metric Properties of Traversal Distance) The directed traversal distance $\overrightarrow{d}_T$ satisfies finiteness and identity but does not satisfy symmetry, separability, or subadditivity. The symmetric traversal distance $d_T$ satisfies finiteness, identity, and symmetry, but it satisfies neither separability nor subadditivity.
Proof Since the distance between f and g is finite and since the traversal distance is taking the infimum over all possible traversals f and partial traversals g, we know that − → d T is finite. When (G 1 , φ 1 ) = (G 2 , φ 2 ), taking the same traversal yields a traversal distance of zero. Hence, the traversal distance satisfies the identity property. Since any proper subgraph has distance zero to its supergraph, but not vice versa, we also have that it is not symmetric.
To see that the traversal distance does not fulfill separability and subadditivity, consider the following graphs: $G_2$ is comprised of four vertices and four edges forming an axis-aligned rectangle in $\mathbb{R}^2$, $G_1$ is the subgraph of $G_2$ obtained by removing the bottom edge, and $G_3$ is the subgraph of $G_2$ obtained by removing the top edge. See Fig. 7. Then, $\overrightarrow{d}_T(G_1, G_2) = 0$, but $(G_1, \varphi_1)$ and $(G_2, \varphi_2)$ are not equivalent (up to homeomorphism). Thus, separability is not satisfied. In addition, we have $\overrightarrow{d}_T(G_1, G_3) = w$, whereas $\overrightarrow{d}_T(G_2, G_3) = w/2$ and $\overrightarrow{d}_T(G_1, G_2) = 0$. Since $w > w/2 + 0$, subadditivity is not satisfied.
Since $\overrightarrow{d}_T$ satisfies finiteness and identity, so does the symmetrized version $d_T$. By construction, $d_T$ is also symmetric. The two graphs of Fig. 3 have distance zero, which means that $d_T$ does not satisfy separability. Subadditivity does not hold for $d_T$ either. For this, consider again the graphs in Fig. 7. We have $d_T(G_1, G_3) = w$. If we add small outward left and right spikes (say, of length $\varepsilon$ with $\varepsilon < \frac{w}{2}$) on all graphs at height $h/2$ (calling the resulting graphs $G^*_1$, $G^*_2$, and $G^*_3$, respectively), then we have $d_T(G^*_1, G^*_3) > d_T(G^*_1, G^*_2) + d_T(G^*_2, G^*_3)$. Hence, subadditivity does not hold.
Since a (partial) traversal is also a path in the graph, the traversal distance is related to the path-based distance. Both take the Fréchet distance from a traversal/path in $(G_1, \varphi_1)$ to a closest partial traversal/path in $(G_2, \varphi_2)$. However, the traversal distance minimizes over all traversals, whereas the path-based distance maximizes over all paths. Hence, the traversal distance is a lower bound to the path-based distance, i.e., $\overrightarrow{d}_T(G_1, G_2) \le \overrightarrow{d}_{path}(G_1, G_2)$. Figure 8 gives an example where the path-based distance is strictly larger than the traversal distance.
Fig. 8 The path $p$ highlighted red in $(G_1, \varphi_1)$ has large distance to any path in $(G_2, \varphi_2)$, but the traversal distance from $(G_1, \varphi_1)$ to $(G_2, \varphi_2)$ is small

Strong and Weak Graph Distance
The strong and weak graph distances were first introduced in [2], with the goal of combining topology and geometry. They are based on the strong/normal and weak Fréchet distance between graphs. Let $(G_1, \varphi_1), (G_2, \varphi_2) \in \mathcal{G}_M$. A graph mapping $s : G_1 \to G_2$ is a continuous map. An alternative characterization of a graph mapping for planar embedded graphs is as follows: each vertex $v \in V_1$ is sent to a point $s(v) \in G_2$, and each edge $\{u, v\} \in E_1$ is sent to a path from $s(u)$ to $s(v)$ in $G_2$. Note that $s(v)$ can be a vertex or any point internal to an edge.
Fig. 9 Graph $G_2$ has strictly larger (strong or weak) graph distance than traversal distance to $G_1$. And graph $G_2$ has strictly larger strong than weak graph distance to $G_3$
Letting $s$ range over all graph mappings from $G_1$ to $G_2$, the directed strong graph distance $\overrightarrow{d}_S : \mathcal{G}_M \times \mathcal{G}_M \to \mathbb{R}_{\ge 0}$ between $G_1$ and $G_2$ is given by
$$\overrightarrow{d}_S(G_1, G_2) := \inf_{s} \max_{e \in E_1} \delta_F(\varphi_1(e), \varphi_2(s(e))),$$
and the directed weak graph distance is given by the same formula with the weak Fréchet distance $\delta_{wF}$ in place of $\delta_F$. Note that these distances are not symmetric. However, we may define their undirected versions by taking the maximum of the directed distances, i.e., $d_S(G_1, G_2) := \max\{\overrightarrow{d}_S(G_1, G_2),\, \overrightarrow{d}_S(G_2, G_1)\}$, and likewise for the weak graph distance.
In [2] it was shown that these distances are metrics for planar embedded graphs and pseudo-metrics for non-planar graphs. However, for immersed graphs, separability also holds when graph mappings are defined as continuous maps between the (abstract) graphs. Hence, the undirected strong and weak graph distances are metrics in this setting as well. These distances are also connected to the traversal distance discussed in Sect. 3.4; see also Fig. 9.
Both the strong and weak graph distances are NP-hard to decide for general graphs. However, for trees we can compute them in cubic time for the strong graph distance and quadratic time for the weak graph distance. For planar embedded graphs under a geometric assumption (which requires cycles to have a nice shape), the weak distance can be computed in quadratic time, but the strong distance remains NP-hard.
An open question is whether a "stronger/symmetric" version of the graph distance, which requires the same graph mapping for both directions, would be equivalent to a generalization of the contour tree distance; see Sect. 3.6 for a discussion of that distance.

Contour Tree Distance
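In [6], motivated by computing the Fréchet distance between two surfaces, the "contour tree distance" is defined between the contour trees of two surfaces. We naturally extend this distance to a distance on the subset of $\mathcal{G}_M$ consisting of connected (immersed) graphs; let $\mathcal{C}_M$ denote this space. The contour tree distance $d_C : \mathcal{C}_M \times \mathcal{C}_M \to \mathbb{R}_{\ge 0}$ is defined by
$$d_C(G_1, G_2) := \inf_{\tau} \sup_{(x, y) \in \tau} \delta(\varphi_1(x), \varphi_2(y)),$$
where $\tau$ ranges over the set of all correspondences between $(G_1, \varphi_1)$ and $(G_2, \varphi_2)$ such that:
1. $\tau$ is a connected subset of $G_1 \times G_2$.
2. For each $x \in G_1$, $\tau \cap (\{x\} \times G_2)$ is a non-empty, connected subset of $G_2$.
3. For each $y \in G_2$, $\tau \cap (G_1 \times \{y\})$ is a non-empty, connected subset of $G_1$.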
The connectedness of the correspondence τ requires the graphs to be connected, too. This is the reason that we are restricting our attention to C M as opposed to G M .
The contour tree distance resembles the Fréchet distance in that it establishes a correspondence between portions of the graphs. However, unlike the standard Fréchet distance, the contour tree distance allows a comparison of non-homeomorphic graphs, since it uses correspondences in $G_1 \times G_2$ rather than homeomorphisms. It allows for "stretching" a region of the graph, as any vertex in one graph can correspond to a connected subregion in the other. See Fig. 10 for an illustration.

Theorem 6 (d C is a Metric) On the space C M of connected graphs, d C is a metric.
Proof The distance fulfills identity via the trivial correspondence, and symmetry because the same correspondence works for G 1 × G 2 and G 2 × G 1 .
For separability, assume that $d_C(G_1, G_2) = 0$. Then either a correspondence $\tau$ exists such that $\delta(\varphi_1(x), \varphi_2(y)) = 0$ for all $(x, y) \in \tau$, or there is a limit of correspondences such that $\delta(\varphi_1(x), \varphi_2(y)) < \varepsilon$ for arbitrarily small $\varepsilon$ and all $(x, y) \in \tau$. In both cases, it follows that $G_1$ and $G_2$ have the same immersion.
To see that it also satisfies subadditivity, we concatenate the correspondences and use subadditivity in (M, δ).
The contour tree distance is NP-complete to compute, even for trees [6]. It seems that the contour tree distance can be considered as a symmetric version of the strong graph distance; see Sect. 3.5. Both align portions of the graphs, but where the strong graph distance uses two separate mappings between the two graphs, the contour tree distance uses symmetric correspondences.
Fig. 10 Two graphs with small contour tree distance showing corresponding parts of the graphs

Local Persistent Homology Distance
The next distance we investigate is the local persistent homology (LPH) distance, originally presented in [1] as a metric on plane graphs. To define this distance, we compare the graphs at a local level using persistent homology (see, e.g., [11]). Briefly, persistent homology is a multiscale version of the fundamental topological notion of homology, which measures the "features" in a space (i.e., connected components, holes, and higher-dimensional voids).
More formally: given a nested sequence of topological spaces (called a filtration), persistent homology tracks the appearance ("births") and disappearance ("deaths") of topological features within the filtration. The results are then encoded in a persistence diagram as pairs of (birth, death) points in the first quadrant of the plane. A standard distance between persistence diagrams is as follows: given persistence diagrams $D_1$ and $D_2$, their bottleneck distance $d_b$ is defined to be
$$d_b(D_1, D_2) := \inf_{f} \sup_{p \in D_1} \| p - f(p) \|_\infty,$$
where $f$ ranges over all bijections between $D_1$ and $D_2$.
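For very small diagrams, the bottleneck distance can be evaluated by brute force. The sketch below assumes the usual convention, not spelled out above, that every diagram also contains all diagonal points, so an off-diagonal point may be matched to the diagonal at a cost equal to its L-infinity distance to it. The function name `bottleneck_bruteforce` is hypothetical, and the factorial running time makes this suitable for illustration only.

```python
from itertools import permutations


def _linf(p, q):
    """L-infinity distance between two points of the plane."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


def _diag(p):
    """Closest point to p on the diagonal birth = death."""
    m = (p[0] + p[1]) / 2.0
    return (m, m)


def bottleneck_bruteforce(D1, D2):
    """Bottleneck distance between two small persistence diagrams."""
    # Pad each diagram with the diagonal projections of the other one,
    # so that both sides have equally many points to match.
    left = [(p, False) for p in D1] + [(_diag(q), True) for q in D2]
    right = [(q, False) for q in D2] + [(_diag(p), True) for p in D1]
    n = len(left)
    if n == 0:
        return 0.0

    def cost(a, b):
        (p, p_on_diag), (q, q_on_diag) = a, b
        if p_on_diag and q_on_diag:
            return 0.0  # matching diagonal to diagonal is free
        return _linf(p, q)

    return min(
        max(cost(left[i], right[perm[i]]) for i in range(n))
        for perm in permutations(range(n))
    )


D1 = [(0.0, 4.0)]
D2 = [(0.0, 5.0)]
```

In this example, matching the two points to each other costs 1, while sending both to the diagonal costs 2.5, so the bottleneck distance is 1.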
The local persistent homology distance compares graphs at a local level and requires the following additional preliminary definition. Let $Y \subseteq X$ be a subset of a metric space $(X, d)$. We define an $\varepsilon$-thickening of $Y$, denoted $Y^\varepsilon$, to be
$$Y^\varepsilon := \{x \in X \mid \inf_{y \in Y} d(x, y) \le \varepsilon\}.$$
When $X$ is a real vector space, $Y^\varepsilon$ is referred to as the Minkowski sum of $Y$ with a closed ball of radius $\varepsilon$ in $(X, d)$.
Fig. 11 A picture of a geometric graph embedded in $\mathbb{R}^2$, shown in the leftmost box, along with $\varepsilon$-thickenings for increasing values of $\varepsilon$. The graph contains 3 loops: two that are entirely in the red graph, as well as one "relative" loop which is formed by the leftmost partial cycle of the graph along with its boundary. The birth and death times of each are indicated below the filtration; each such pair will form a point in the persistence diagram, with longer lifetime loops appearing further from the diagonal that has slope 1
Let $(G, \varphi_G)$ be a graph immersed in $(M, \delta)$. We define the function $\delta_G : M \to \mathbb{R}$ as the distance function to the set $\varphi_G(G)$; namely, for all $x \in M$, $\delta_G(x) := \inf_{y \in G} \delta(x, \varphi_G(y))$. Equivalently, $\delta_G(x)$ is the smallest non-negative $\varepsilon$ such that $x$ is in the $\varepsilon$-thickening $G^\varepsilon$ of the image of $G$. Let $U$ be a closed subset of $M$, and let $\partial U$ denote the boundary of $U$. We consider the distance function restricted to the quotient space $U^* := U / \partial U$; that is, we define $\delta_{G,U} : U^* \to \mathbb{R}$ from the restriction of $\delta_G$ to $U$. See Fig. 11 for an example. Given graphs $(G_1, \varphi_1)$ and $(G_2, \varphi_2)$ in $\mathcal{G}_M$, let $D_1$ and $D_2$ be the persistence diagrams of $\delta_{G_1,U}$ and $\delta_{G_2,U}$, respectively. A local distance signature between $G_1$ and $G_2$ is the assignment of a distance to the neighborhood $U$; in our case, we assign the bottleneck distance between $D_1$ and $D_2$ to the set $U$, and denote it $B_U(G_1, G_2)$. The local distance signature is valuable in its own right and can be used in heatmaps to visualize the locations of large differences between the two graphs, as shown in Fig. 12.
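The distance function to a graph and the resulting thickenings are easy to evaluate in the common special case of a graph drawn in the plane with straight-line edges. The sketch below assumes exactly that setting (Euclidean distance, each edge given as a pair of endpoints); the function names are hypothetical, and curved edges or other ambient spaces would need a different point-to-edge routine.

```python
import math


def point_segment_distance(x, a, b):
    """Euclidean distance from point x to the segment from a to b."""
    ax, ay = a
    bx, by = b
    px, py = x
    dx, dy = bx - ax, by - ay
    denom = dx * dx + dy * dy
    if denom == 0:
        t = 0.0  # degenerate segment: a single point
    else:
        # Parameter of the orthogonal projection, clamped to the segment.
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / denom))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_graph(x, segments):
    """Distance from x to the image of the graph (union of its edges)."""
    return min(point_segment_distance(x, a, b) for a, b in segments)


def in_thickening(x, segments, eps):
    """True if x lies in the eps-thickening of the graph image."""
    return distance_to_graph(x, segments) <= eps


# Hypothetical L-shaped graph with two straight edges.
segments = [((0.0, 0.0), (2.0, 0.0)), ((2.0, 0.0), (2.0, 2.0))]
```

Sampling `distance_to_graph` on a grid inside a ball `U` gives a discretized version of the function whose sublevel sets form the filtration described above.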
We use the construction described above to define an overall distance between the two graphs.
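Let $\mathcal{B}$ denote the set of all metric balls in $(M, \delta)$; that is,

$$\mathcal{B} := \{ B_\delta(x, r) \mid x \in M,\ r \in \mathbb{R}_{\ge 0} \}.$$

As a set, $\mathcal{B}$ is equivalent to the product $M \times \mathbb{R}_{\ge 0}$, and so we use the product topology on $M \times \mathbb{R}_{\ge 0}$ to define a topology on $\mathcal{B}$.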
If $M$ is a metric measure space, the integral in Equation (2), $\int_{U \in \mathcal{B}} \omega(U)\, B_U(G_1, G_2)\, dU$, is well-defined. In particular, we can expand it to a double integral as follows:

$$d_{LH}(G_1, G_2) = \int_{x \in M} \int_{r \ge 0} \omega(U)\, B_U(G_1, G_2)\, dr\, dx,$$

where $U = B_\delta(x, r)$. Moreover, as we see in the next theorem, this distance is a pseudo-metric.

Nonnegativity. Since ω is a weight function, we know that ω is nonnegative. As $B_U(G_1, G_2)$ is the bottleneck distance between two persistence diagrams, we know this term is also nonnegative. Thus, $d_{LH}(G_1, G_2)$, being the integral of the product of two nonnegative functions, is nonnegative.
Finiteness. Let $X \subset M$ be the support of ω. Since ω is a weight function, we know that there exists an $R$ such that ω vanishes on all $U = B_\delta(x, r) \in \mathcal{B}$ satisfying $R \le r$, i.e., we have $\omega(U) = 0$. On the other hand, for all balls $U = B_\delta(x, r) \in \mathcal{B}$ such that $r \le R$, we know that $B_U(G_1, G_2) \le 2R$ (since thickening by $2R$ or more results in the empty topological space). Then, letting $c = \int_{U \in \mathcal{B}} \omega(U)\, dU$ and recalling that the integrand vanishes outside the support of ω, we bound the expression of Equation (2):

$$d_{LH}(G_1, G_2) = \int_{U \in \mathcal{B}} \omega(U)\, B_U(G_1, G_2)\, dU \le 2R \int_{U \in \mathcal{B}} \omega(U)\, dU = 2Rc.$$

Symmetry. The symmetry of $d_{LH}$ follows immediately from the symmetry of the bottleneck distance between persistence diagrams.
Subadditivity. Subadditivity follows, again, from the subadditivity of the bottleneck distance between persistence diagrams.
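Spelled out, and assuming that Equation (2) has the form $\int_{U \in \mathcal{B}} \omega(U)\, B_U(G_1, G_2)\, dU$ that the nonnegativity and finiteness arguments indicate, the subadditivity argument reads as follows for any third graph $G_3$. The inequality uses the triangle inequality for the bottleneck distance on each ball $U$ together with the nonnegativity of ω:

```latex
\begin{align*}
d_{LH}(G_1, G_3)
  &= \int_{U \in \mathcal{B}} \omega(U)\, B_U(G_1, G_3)\, dU \\
  &\le \int_{U \in \mathcal{B}} \omega(U)\, \bigl( B_U(G_1, G_2) + B_U(G_2, G_3) \bigr)\, dU \\
  &= d_{LH}(G_1, G_2) + d_{LH}(G_2, G_3).
\end{align*}
```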
Inseparability. Let $(G_1, \varphi_1)$ be the immersion of the graph with two vertices and one edge from Fig. 3(a) and let $G_2$ be the graph with three vertices and three edges from Fig. 3(b). Note that $G_1$ and $G_2$ are not homeomorphic, yet have the same image in $M$. Then, for all metric balls $U$ in $(M, \delta)$, we have $B_U(G_1, G_2) = 0$. Thus, $d_{LH}$ is not separable.

Remark 3 (Variants of the LPH Distance)
When $M$ is compact, $d_{LH}$ also satisfies separability if ω is nonvanishing on small enough balls. This was observed in [1] for $M$ a compact subspace of $\mathbb{R}^n$.
The definition of the LPH distance given in Equation (2) integrates the bottleneck distance over the space of all metric balls $\mathcal{B}$. We can aggregate over $\mathcal{B}$ in other ways, such as using the $L^p$-norm in place of the bottleneck distance. We can also replace the bottleneck distance $B_U$ with the Wasserstein metric or erosion distance between persistence diagrams. The same metric properties hold under these changes. In addition, there are many different descriptors that one could consider in place of the persistence diagram, such as the Euler characteristic curve of $\delta_{G,U}$ or a simpler descriptor such as the number of connected components in $G \cap U$. However, if we use a weaker invariant, we may lose additional metric properties.

Graph Edit Distances
We next consider graph edit distances, which take a completely different and more combinatorial view when comparing graphs. Introduced in the 1980s, edit distances for general graphs are a well-studied way to compare graphs [25]. Given associated weights or costs for graph operations (e.g., vertices or edges to be inserted or deleted), the edit distance is the infimum of the sum of edit costs over all sequences of edit operations needed to transform one graph into the other. Edit distances differ sharply from the other notions covered so far: they do not minimize a maximum over some correspondence, but rather sum all costs, which leads to much larger distances. We give a brief overview of two versions of geometric edit distances and their metric properties here, as studied in [8]; see [15] for a survey of the many variants and heuristics on broader classes of graphs.

Fig. 13 Two examples of edit sequences between graphs $G_1$ and $G_2$. Both the edit distance and the geometric graph distance are ≥ 2δ, while the Fréchet, Hausdorff, and path-based distances are all equal to δ on this example

A natural notion of edit distance for geometric graphs, which to the best of our knowledge was first discussed in [8], allows deleting and inserting vertices and edges for some cost related to the distance in the ambient space $M$, as well as moving vertices for a cost proportional to the distance moved in $M$ and the change in edge lengths. More formally, the cost is defined for each edit operation as follows:

1. Edge Deletion and Insertion. The cost of removing or inserting an edge is the length of the edge.
2. Vertex Deletion and Insertion. The cost of inserting or deleting an isolated vertex is 1. Inserting a vertex in the middle of an edge to create two edges is free, as is the reverse operation. Otherwise, the cost of deleting a vertex is 1, plus the cost of deleting all incident edges.
3. Vertex Moving. The cost of moving a vertex is the distance that the vertex is moved, plus the sum of the changes in edge lengths of all incident edges.
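The edit costs listed above can be sketched in a few lines. The example assumes straight-line edges in the Euclidean plane, a graph stored as a dictionary of vertex positions plus a list of vertex pairs, and that "change in edge length" means the absolute change; these are illustrative choices, and the free subdivision of an edge is not modeled.

```python
import math


def dist(p, q):
    """Euclidean distance in the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def edge_cost(pos, u, v):
    """Cost of deleting or inserting the edge {u, v}: its length."""
    return dist(pos[u], pos[v])


def incident(edges, v):
    """All edges that have v as an endpoint."""
    return [e for e in edges if v in e]


def vertex_deletion_cost(pos, edges, v):
    """Unit cost for the vertex plus the length of every incident edge."""
    return 1.0 + sum(edge_cost(pos, a, b) for a, b in incident(edges, v))


def vertex_move_cost(pos, edges, v, target):
    """Distance moved plus the absolute length change of incident edges."""
    moved = dict(pos)
    moved[v] = target
    change = sum(
        abs(edge_cost(moved, a, b) - edge_cost(pos, a, b))
        for a, b in incident(edges, v)
    )
    return dist(pos[v], target) + change
```

For a single edge of length 4 whose right endpoint is lifted by 3, the move costs 3 for the vertex plus 1 for the edge, which grows from length 4 to length 5.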
The graph edit distance $d_{edit} : \mathcal{G}_M \times \mathcal{G}_M \to \mathbb{R}_{\ge 0}$ is then defined by

$$d_{edit}(G_1, G_2) := \inf_{\tau} \sum_{i} c(\tau_i), \tag{4}$$

where τ ranges over all finite sequences of edit operations that start at $G_1$ and end at $G_2$, and $c(\tau_i)$ denotes the cost of the edit operation $\tau_i$. In Fig. 13a, we see two graphs, $G_1$ and $G_2$, which each have two vertices and a single edge. The edit sequence between them moves the right vertex first (shown as a grey double arrow), for a cost of δ to move the vertex itself plus the difference in length between the blue edge and the grey diagonal; the next edit move shifts the left vertex up, which again costs δ plus the difference in length between the diagonal and the red upper edge. This simple example already demonstrates the distinction between the edit distance and the previous distances introduced in this survey: the Fréchet, Hausdorff, and path-based distances are all equal to δ on these graphs, while the edit distance must always be ≥ 2δ, given that each vertex must move δ.
We note that variations of edit distance could be examined, particularly with regard to different costs for vertex insertion or deletion; however, we begin with this definition as it is the one proposed and studied in the literature [8]. We first explore the metric properties of graph edit distance, before discussing some of its limitations. The following properties hold for this distance:

Theorem 8 (Metric Properties of Graph Edit Distance) The graph edit distance $d_{edit}$ is a semi-metric that satisfies sub-additivity.
Proof To see that $d_{edit}$ is finite, consider the following weighted graph $\mathbb{G} = (\mathcal{G}_M, E, \omega)$, whose vertex set is the set of all immersed graphs. A pair of immersed graphs $((G_1, \varphi_1), (G_2, \varphi_2))$ corresponds to an edge in $E$ if and only if they are connected through a single edit operation. The weight of the edge, $\omega((G_1, \varphi_1), (G_2, \varphi_2))$, is the cost of that edit. Then, $d_{edit}((G_1, \varphi_1), (G_2, \varphi_2))$ is the infimum of the lengths of paths from $(G_1, \varphi_1)$ to $(G_2, \varphi_2)$ in $\mathbb{G}$. Since any graph is connected to the empty graph by a finite sequence of edit operations (remove all edges, then remove all vertices), we know that the edit distance between two graphs is finite.
Identity follows because an empty sequence of edits transforms a graph to itself. Separability follows because, given $(G_1, \varphi_1) \ne (G_2, \varphi_2) \in \mathcal{G}_M$, we know that there must be some edit operation of positive cost to convert $(G_1, \varphi_1)$ to $(G_2, \varphi_2)$.
To prove symmetry, consider a fixed finite sequence τ in Equation (4). Let $-\tau$ denote the reverse sequence. Since any finite sequence of edits can be reversed to transform one graph to another, we know that $-\tau$ is one of the sequences considered by the infimum when defining $d_{edit}(G_2, G_1)$. Moreover, since the costs of edit operations are symmetric, the cost of τ is the same as the cost of $-\tau$. Hence, we obtain:

$$d_{edit}(G_1, G_2) = \inf_{\tau} \sum_{i} c(\tau_i) = \inf_{-\tau} \sum_{i} c((-\tau)_i) = d_{edit}(G_2, G_1).$$

Sub-additivity follows from a similar argument.
However, one major limitation, especially when considering algorithms to calculate this distance, is that the edit distance may never be attained by a finite sequence of edit operations. For instance, consider again our simple example of two graphs consisting of single straight edges, shown in Fig. 13. As pointed out by [8], an optimal edit sequence would be to alternate moving the vertices by an infinitesimal amount so as to minimize the change in edge length incurred; see Fig. 13b. Taking the limit as that infinitesimal amount decreases to zero, we get that the edit distance is exactly 2δ, but that value can never be realized by any finite sequence of edits.
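The limiting behaviour can be checked numerically. The sketch below assumes that $G_1$ is a horizontal edge of a given length and that $G_2$ is the same edge translated upward by δ, which is one reading of Fig. 13. It totals the cost of lifting the two endpoints in k alternating rounds of step δ/k; the parameter names are hypothetical.

```python
import math


def alternating_cost(length, delta, k):
    """Total edit cost of lifting both endpoints by delta in k rounds.

    Each round moves the right endpoint up by delta / k and then the left
    endpoint up by delta / k. Every move is charged the distance moved
    plus the absolute change in the length of the single edge.
    """
    step = delta / k
    left_y = 0.0
    right_y = 0.0
    total = 0.0
    for _ in range(k):
        for side in ("right", "left"):
            old_len = math.hypot(length, right_y - left_y)
            if side == "right":
                right_y += step
            else:
                left_y += step
            new_len = math.hypot(length, right_y - left_y)
            total += step + abs(new_len - old_len)
    return total
```

With length 4 and δ = 3, a single round costs 8, since each of the two moves pays 3 for the vertex and 1 for the edge. Increasing k drives the total toward 2δ = 6 from above, yet every finite k stays strictly above that value.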
In order to get around this issue, [8] introduces an edit-like distance, which they call the geometric graph distance, that makes a few notable changes to the edit distance above. Rather than charging a unit for a vertex addition or deletion, vertex additions and deletions are free. Also, they introduce two fixed parameters, an edge weight and a vertex weight, which factor multiplicatively into the relevant edit operations. To avoid pathological examples of infinite edit sequences as discussed above, instead of charging costs for individual moves, they look at the total change in length at the end of all edits. They then observe that edits can be required to be done in the following order:

1. Edge Deletion Phase. The cost of removing an edge is the length of the edge times the edge weight.
2. Vertex Deletion Phase. Deleting an isolated vertex is free. Non-isolated vertices may not be deleted.
3. Vertex Moving Phase. The cost of moving a vertex is the distance that the vertex is moved times the vertex weight, plus, for all incident edges, the change in edge length times the edge weight.
4. Vertex Insertion Phase. Adding vertices is free. In contrast to the edit distance defined above, adding a vertex in the middle of an edge is not an allowable edit operation. (However, it can be attained by removing the edge in the Edge Deletion Phase, then adding a vertex in this phase, and adding the two edges in the next phase.)
5. Edge Insertion Phase. The cost of inserting an edge is the length of the edge times the edge weight.
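The phased costs above can be illustrated by pricing one phased edit sequence, described by a partial correspondence between the vertices of the two graphs. This is a sketch under stated assumptions, not the algorithm of [8]: edges are straight segments in the plane, unmatched vertices are deleted or inserted for free, an edge survives only if both endpoints are matched and its image is an edge of the target graph, and a surviving edge is charged the absolute difference between its final and initial length. All names are hypothetical.

```python
import math


def dist(p, q):
    """Euclidean distance in the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def phased_cost(pos1, edges1, pos2, edges2, match, c_v=1.0, c_e=1.0):
    """Cost of the phased edit sequence induced by a vertex matching.

    match maps some vertices of the first graph injectively to vertices
    of the second; c_v and c_e are the vertex and edge weights.
    """
    target_edges = {frozenset(e) for e in edges2}
    kept = set()
    cost = 0.0
    for u, v in edges1:
        old_len = dist(pos1[u], pos1[v])
        if u in match and v in match and frozenset((match[u], match[v])) in target_edges:
            # The edge survives: charge its net change in length.
            image = frozenset((match[u], match[v]))
            kept.add(image)
            a, b = tuple(image)
            cost += c_e * abs(dist(pos2[a], pos2[b]) - old_len)
        else:
            # Edge Deletion Phase: charge the full length.
            cost += c_e * old_len
    # Vertex Moving Phase: each matched vertex travels to its image.
    for u, w in match.items():
        cost += c_v * dist(pos1[u], pos2[w])
    # Edge Insertion Phase: target edges that are not images of kept edges.
    for e in target_edges - kept:
        a, b = tuple(e)
        cost += c_e * dist(pos2[a], pos2[b])
    return cost
```

For an edge of length 4 translated upward by δ = 3 with unit weights, matching both endpoints gives a cost of exactly 6 = 2δ, with no edge-length penalty because only the net change is charged. Deleting and reinserting the edge (an empty matching) costs 8 instead.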
Then, the geometric graph distance $d_{ggd} : \mathcal{G}_M \times \mathcal{G}_M \to \mathbb{R}_{\ge 0}$ is defined by

$$d_{ggd}(G_1, G_2) := \inf_{\kappa} c(\kappa),$$

where κ ranges over all finite sequences of edit operations that start at $G_1$ and end at $G_2$ and preserve the phases described above, and $c(\kappa)$ denotes the total cost of κ. As proven in [8], this distance is a metric:

Theorem 9 (Metric Properties of Geometric Graph Distance) The geometric graph distance $d_{ggd}$ is a metric on the set of geometric graphs without isolated vertices for positive edge and vertex weight.
However, as also shown in [8], the geometric graph distance is NP-hard to compute when considering non-planar graphs or when choosing the edge weight much larger than the vertex weight. It is unknown whether the problem remains NP-hard for planar graphs, or whether the original edit distance formulation is also NP-hard.

… such as stability under certain types of perturbations. Finally, while we have briefly mentioned the computational complexity of a few of these distances where results exist, the exact complexity or hardness of most remains open.