1 Introduction

Comparing geometric shapes is a task that arises in a wide arena of applications. The Fréchet distance and its variants have been used, to this end, to compare curves in applications such as dynamic time-warping [19], speech recognition [21], signature and handwriting recognition [22, 23], matching of time series in databases [20], as well as geographic applications, such as map-matching of vehicle tracking data [8, 24], and moving objects analysis [10, 11].

Informally, the Fréchet distance between two curves is the maximum distance a point on the first curve has to travel as this curve is being continuously deformed into the second curve; see Sect. 2.2 for the formal definition. Unlike the Hausdorff distance, which is solely based on nearest neighbor distances between points on the curves, the Fréchet distance requires continuous and order-preserving assignments of points and hence is better suited for comparing curves with respect to their intrinsic structure.

figure a

The Fréchet distance between two curves might be arbitrarily larger than their Hausdorff distance, as demonstrated by the figure below, and as this example shows, it seems to be a more natural measure of similarity between curves.

figure b

Previous Results

For two polygonal curves of total complexity n in the plane, their Fréchet distance can be computed in O(n² log n) time [3], and their Hausdorff distance can be computed in O(n log n) time [2]. It has been an open problem to find a subquadratic algorithm for computing the Fréchet distance between two curves. For the problem of deciding whether the Fréchet distance between two curves is at most a given value, a lower bound of Ω(n log n) was given in [9]. Recently, Alt [2] conjectured that the decision problem may be 3SUM-hard. The only subquadratic algorithms known are for quite restricted classes of curves, such as closed convex curves and κ-bounded curves [4]. For a curve to be κ-bounded means, roughly, that for any two points on the curve, the portion of the curve between them cannot be further away from either point than κ/2 times the distance between the two points. For closed convex curves the Fréchet distance equals the Hausdorff distance, and for κ-bounded curves the Fréchet distance is at most (1+κ) times the Hausdorff distance; hence the O(n log n) algorithm for the Hausdorff distance applies.

Aronov et al. [6] provided a near linear time (1+ε)-approximation algorithm for the discrete Fréchet distance, which only considers distances between vertices of the curves. Their algorithm works for backbone curves, which are used to model protein backbones in molecular biology. Backbone curves are required to have, roughly, unit edge length and a minimal distance between any pair of vertices. They use curve simplification to speed up their algorithm. Agarwal et al. [1] studied fast simplification that preserves the Fréchet distance.

The Input Model

We introduce a new class of curves, called c-packed curves, for which we can approximate the Fréchet distance quickly, given that the constant c is small. Intuitively, the constant c measures how “unrealistic” the input is. We compare this new input model to previous models such as fatness and low density, as well as κ-boundedness. These so-called realistic input models are commonly used for the analysis of problems where the worst case complexity is dominated by degenerate or contrived configurations which are highly unlikely to occur in practice, see [15] for an overview.

A curve π is c-packed if the total length of π inside any ball is bounded by c times the radius of the ball. A κ-bounded curve might have arbitrary length while maintaining a finite diameter, and as such may not be c-packed, see Sect. 4.3. But unlike κ-bounded curves, the Fréchet distance between two c-packed curves might be arbitrarily larger than their Hausdorff distance. Indeed, c-packed curves are considerably more general and a more natural family of curves. For example, a c-packed curve might self cross and revisit the same location several times, and the class of c-packed curves is closed under concatenation, none of which is true for κ-bounded curves. Intuitively, c-packed curves behave reasonably in any resolution.
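To make the definition concrete, the following Python sketch measures how much of a polygonal curve lies inside one given ball and reports the ratio of that length to the radius. It is only an illustration: the function names `clipped_length` and `packedness_at` are hypothetical and do not appear in the paper. A single ball only yields a lower bound on c, since the definition quantifies over all balls.

```python
import math

def clipped_length(a, b, center, r):
    """Length of the part of segment ab that lies inside the ball B(center, r)."""
    d = [bi - ai for ai, bi in zip(a, b)]        # direction of the segment
    f = [ai - ci for ai, ci in zip(a, center)]   # from ball center to a
    A = sum(x * x for x in d)
    if A == 0.0:
        return 0.0                               # degenerate segment has no length
    B = 2.0 * sum(x * y for x, y in zip(d, f))
    C = sum(x * x for x in f) - r * r
    disc = B * B - 4.0 * A * C
    if disc < 0.0:
        return 0.0                               # supporting line misses the ball
    root = math.sqrt(disc)
    t0 = max(0.0, (-B - root) / (2.0 * A))       # clip parameters to [0, 1]
    t1 = min(1.0, (-B + root) / (2.0 * A))
    return max(0.0, t1 - t0) * math.sqrt(A)

def packedness_at(curve, center, r):
    """Ratio (length of the curve inside B(center, r)) / r for one ball."""
    total = sum(clipped_length(p, q, center, r)
                for p, q in zip(curve, curve[1:]))
    return total / r
```

For a straight line through the center of the ball the ratio is 2, while a curve that folds back on itself several times inside the ball yields a proportionally larger ratio.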

See the figure below for a few examples of c-packed curves. The boundary of convex polygons, algebraic curves of bounded maximum degree, the boundary of (α,β)-covered shapes [17], and the boundary of γ-fat shapes [14] are all c-packed. Indeed, the boundaries of (α,β)-covered shapes and γ-fat shapes are assumed to be formed by a constant number of algebraic curves of bounded maximum degree. If one removes the requirement that a γ-fat curve be of bounded descriptive complexity, then also fractal curves, like Koch’s snowflake, which can have infinite length within a bounded area, can be fat [7]. Naturally, these curves cannot be c-packed. Interestingly, one can show that (α,β)-covered polygons are c-packed even if they have unbounded complexity, see Appendix A and also the result of Bose et al. [7].

figure c

It is easy to verify that c-packed curves are also low density [15], but a low-density curve might not be c-packed, for any bounded c, see Sect. 4.2. However, the class of c-packed curves is closed under simplification, see Lemma 4.3, and this is not true for low-density curves.

Our Results

We present a new algorithm for computing a (1+ε)-approximation of the Fréchet distance for polygonal curves in ℝ^d. Underlying the algorithm are several new insights. First, we use the idea of curve simplification to reduce the complexity of the free space diagram, as this simplification results in a contraction of the corresponding rows or columns in the free space diagram. We introduce the notion of relative free space complexity in Definition 3.3 to capture the complexity of the free space diagram of two curves, which are simplified to the appropriate resolution. Surprisingly, without simplification, almost any two curves from natural families of curves can have a free space diagram for the value realizing the Fréchet distance that has quadratic complexity (even in the plane). Secondly, we present an efficient construction algorithm for this reduced size free space diagram that enables us to solve the decision problem in linear time in the relative free space complexity of the curves. Thirdly, we prove that monotonicity events are sufficiently close to vertex–edge events or an approximate distance between two vertices of the curves. Therefore, the search for the Fréchet distance can be done efficiently without using parametric search or random sampling, by using approximate distance selection. Carefully combining these insights yields the new algorithm, which has running time near linear in the relative free space complexity of the input curves.

In the second part of the paper, we analyze the relative free space complexity for various families of curves. We prove that c-packed curves have linear relative free space complexity for fixed c and ε. We next prove a subquadratic bound on the relative complexity of the free space of low-density curves. This relies on a new packing lemma showing that, if the simplification of a low-density curve is long inside a relatively small area, then the original curve must contain many vertices in the vicinity of this region. We also prove that the relative free space complexity of κ-bounded curves is linear for a fixed κ, which leads to an improvement of the result by Alt et al. [4].

These bounds imply that the approximation algorithm provides fast approximation for the Fréchet distance for all these types of curves. We also show how to adapt our algorithm to handle closed curves. The new results are summarized in Table 1.

Table 1 Summary of new results for computing a (1+ε)-approximation to the Fréchet distance between two curves π and σ with n vertices in ℝ^d

Organization

In Sect. 2, we provide some background on the Fréchet distance and the notion of the free space diagram. In Sect. 3, we describe the approximation algorithm that uses simplification. To this end, we show in Sect. 3.1 that it suffices to only compute the reachable parts of the free space diagram and in Sect. 3.2 we present a fuzzy decider procedure and show how it can be used to make exact decisions during a binary search for the Fréchet distance. In Sect. 3.3, we deal with the different subroutines used in the search for the Fréchet distance and in Sect. 3.4 we give the resulting general algorithm and analyze its correctness and running time, which is near linear in the relative free space complexity. In Sect. 4, we bound the relative free space complexity of various families of curves. In particular, in Sect. 4.1, we introduce the notion of c-packed curves, and study their behavior under simplification. In Sect. 4.3, we bound the relative free space complexity of κ-bounded curves, and in Sect. 4.2 we handle low-density curves. In Sect. 5, we extend the algorithm to closed curves. We conclude with discussion in Sect. 6.

2 Preliminaries

2.1 Notations and Definitions

Let π be a curve in ℝ^d; that is, a continuous mapping from [0,1] to ℝ^d. In the following, we will identify π with its range π([0,1]) ⊆ ℝ^d if it is clear from the context. The curve π is closed if π(0)=π(1). We use ∥⋅∥ to denote the Euclidean distance as well as the length of a curve. For a polygonal curve π, let V(π) denote the set of vertices of π. For two points p and q on a curve π, let π[p,q] denote the portion of the curve between the two points.

We denote by B(p,r) the ball of radius r centered at p, and S(p,r) denotes the corresponding sphere. Given a set of numbers U⊆ℝ, an atomic interval of U is a (possibly infinite) maximal interval on the real line that does not contain any point of U in its interior. For a point set P, we will also refer to the set of all pairwise distances between points of P.

2.2 Fréchet Distance and the Free Space Diagram

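A reparameterization is a bijective and continuous function f:[0,1]→[0,1]. It is orientation-preserving if f(0)=0 and f(1)=1. Given two reparameterizations f and g for two curves π and σ, respectively, define their width as

\[ \mathrm{width}_{f,g}(\pi,\sigma) = \max_{\alpha\in[0,1]} \Vert \pi(f(\alpha)) - \sigma(g(\alpha)) \Vert . \]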

This can be interpreted as the maximum length of a leash one needs to walk a dog, where the dog walks monotonically along π according to f, while the handler walks monotonically along σ according to g. In this analogy, the Fréchet distance is the shortest possible leash admitting such a walk.

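Formally, given two curves π and σ in ℝ^d, the Fréchet distance between them is

\[ \mathsf{d}_{\mathcal{F}}(\pi,\sigma) = \inf_{f,g} \mathrm{width}_{f,g}(\pi,\sigma), \]

where f and g are orientation-preserving reparameterizations of the curves π and σ, respectively. The Fréchet distance complies with the triangle inequality; that is, for any three curves π, σ and τ we have \(\mathsf{d}_{\mathcal{F}}(\pi,\tau) \leq \mathsf{d}_{\mathcal{F}}(\pi,\sigma) + \mathsf{d}_{\mathcal{F}}(\sigma,\tau)\).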

Let π and σ be curves and δ>0 a parameter. The free space of π and σ of radius δ is defined as

\[ D_{\leq\delta}(\pi,\sigma) = \bigl\{ (s,t)\in[0,1]^2 \;\big|\; \Vert \pi(s)-\sigma(t) \Vert \leq \delta \bigr\}. \]

We are interested only in polygonal curves. In this case, the square [0,1]² can be broken into a (not necessarily uniform) grid called the free space diagram, where a vertical line corresponds to a vertex of π and a horizontal line corresponds to a vertex of σ. Every two segments of π and σ define a free space cell in this grid. In particular, let \(C_{i,j}=C_{i,j}(\pi,\sigma)\) denote the free space cell that corresponds to the ith edge of π and the jth edge of σ. The cell \(C_{i,j}\) is located in the ith column and jth row of this grid.

It is known that the free space, for a fixed δ, inside such a cell \(C_{i,j}\) (i.e., \(D_{\leq\delta}(\pi,\sigma)\cap C_{i,j}\)) is the clipping of an affine transformation of a disk to the cell [3], see the figure below; as such, it is convex and of constant complexity. Let \(I^{h}_{i,j}\) denote the horizontal free space interval at the top boundary of \(C_{i,j}\), and \(I^{v}_{i,j}\) denote the vertical free space interval at the right boundary.

figure d
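A free space interval on a cell boundary is the set of points of one edge that lie within distance δ of one vertex of the other curve, so it can be found by solving a quadratic equation. The Python sketch below illustrates this under the assumption that points are given as coordinate tuples; the name `free_interval` is hypothetical and not taken from the paper.

```python
import math

def free_interval(p, a, b, delta):
    """Return (t0, t1), the parameters t in [0, 1] for which the point
    a + t*(b - a) is within distance delta of p, or None if there are none."""
    d = [bi - ai for ai, bi in zip(a, b)]     # edge direction
    f = [ai - pi for ai, pi in zip(a, p)]     # from p to the edge start
    A = sum(x * x for x in d)
    B = 2.0 * sum(x * y for x, y in zip(d, f))
    C = sum(x * x for x in f) - delta * delta
    if A == 0.0:                              # degenerate edge: a single point
        return (0.0, 1.0) if C <= 0.0 else None
    disc = B * B - 4.0 * A * C
    if disc < 0.0:
        return None                           # the line misses the ball B(p, delta)
    root = math.sqrt(disc)
    t0 = max(0.0, (-B - root) / (2.0 * A))
    t1 = min(1.0, (-B + root) / (2.0 * A))
    return (t0, t1) if t0 <= t1 else None
```

Since the ball is convex, the result is always a single interval, which matches the constant complexity of the free space inside a cell.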

The Fréchet distance between π and σ is at most δ if and only if there is an (x,y)-monotone path in the free space diagram between (0,0) and (1,1) that is fully contained in \(D_{\leq\delta}(\pi,\sigma)\). Let the reachability intervals \(R^{h}_{i,j}\subseteq I^{h}_{i,j}\) and \(R^{v}_{i,j}\subseteq I^{v}_{i,j}\) consist of the points (x,y) on the boundary that are reachable by a monotone path from (0,0) to (x,y).

Such a path to (1,1) can be computed, if it exists, in O(n²) time by dynamic programming, where n is the total complexity of the two polygonal curves π and σ, see [3].

2.2.1 Free Space Events

To compute the Fréchet distance consider increasing δ from 0 to ∞. As δ increases, structural changes to the free space happen. We are interested in the radii (i.e., the value of δ) of these events.

Consider a segment u of π and a vertex p of σ. A vertex–edge event corresponds to the minimum value δ such that u is tangent to B(p,δ). In the free space diagram, this corresponds to the event that a free space interval that consists of only one point was just created. The line supporting this boundary edge corresponds to the vertex, and the other dimension corresponds to the edge. Naturally, the event could happen at a vertex of u.
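In other words, the radius of a vertex–edge event is the distance from the vertex to the closest point of the segment. The following Python sketch computes it; the function name `vertex_edge_event_radius` is a hypothetical label chosen for this illustration.

```python
import math

def vertex_edge_event_radius(p, a, b):
    """Smallest delta such that the ball B(p, delta) touches segment ab."""
    d = [bi - ai for ai, bi in zip(a, b)]
    dd = sum(x * x for x in d)
    if dd == 0.0:
        return math.dist(p, a)                # degenerate segment
    # Parameter of the orthogonal projection of p onto the supporting line,
    # clamped to [0, 1] so that the closest point stays on the segment.
    t = sum((pi - ai) * di for pi, ai, di in zip(p, a, d)) / dd
    t = min(1.0, max(0.0, t))
    closest = [ai + t * di for ai, di in zip(a, d)]
    return math.dist(p, closest)
```

When the projection falls outside the segment, the clamping makes the event happen at an endpoint of u, as noted above.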

figure e

The second type of event, a monotonicity event, corresponds to a value δ for which a monotone subpath inside \(D_{\leq\delta}\) becomes feasible, see Fig. 1. Geometrically, this corresponds to two vertices p and q on one curve and a directed segment u on the other curve such that: (1) u passes through the intersection S(p,δ)∩S(q,δ), and (2) u intersects B(q,δ) first and B(p,δ) second, where p comes before q in the order along the curve π.

Fig. 1
figure 1

Two curves π and σ and their free space diagram \(D_{\leq\delta}(\pi,\sigma)\), where p=π(s), q=π(s′) and r=σ(t). Here, δ is the minimal free space parameter such that a monotone path exists; i.e., in this example the Fréchet distance coincides with the radius of a monotonicity event

Other values of δ that would be relevant to our algorithm are the distances between any pair of points of V(π)∪V(σ). Technically, apart from the two single events that the endpoints of the curves are being matched to each other, these vertex–vertex events are vertex–edge events when they are relevant, but they will be handled naturally by our algorithm.

2.3 Curve Simplification

We suggest a straightforward greedy algorithm for curve simplification, which is sufficient for our purposes. We comment that Agarwal et al. [1] suggested a more aggressive (but slightly slower and more complicated) simplification algorithm that can be used instead.

Algorithm 2.1

Given a polygonal curve \(\pi = p_1 p_2 p_3 \ldots p_k\) and a parameter μ>0, consider the following simplification algorithm: First mark the initial vertex \(p_1\) and set it as the current vertex. Now scan the polygonal curve from the current vertex until it reaches the first vertex \(p_i\) that is in distance at least μ from the current vertex. Mark \(p_i\) and set it as the current vertex. Repeat this until reaching the final vertex of the curve, and also mark this final vertex. Consider the curve that connects only the marked vertices, in their order along π. We refer to the resulting curve π′=simpl(π,μ) as the μ-simplification of π. Note that this simplification can be computed in linear time.
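The greedy scan can be sketched in a few lines of Python. This is a minimal illustration that assumes the curve is given as a list of coordinate tuples; the function name `simpl` mirrors the notation above, but the code itself is not taken from the paper.

```python
import math

def simpl(curve, mu):
    """Greedy mu-simplification: keep the first vertex, then every vertex at
    distance at least mu from the most recently kept one, then the last vertex."""
    marked = [curve[0]]
    current = curve[0]
    for p in curve[1:-1]:                 # interior vertices, in order
        if math.dist(current, p) >= mu:
            marked.append(p)
            current = p
    if len(curve) > 1:
        marked.append(curve[-1])          # the final vertex is always marked
    return marked
```

Each vertex is inspected once, which is where the linear running time comes from.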

Remark 2.2

The simplified curve has the useful property that all its segments are of length at least μ, except for the last edge that might be shorter. For the sake of simplicity of exposition, we assume that the last segment in the simplified curve also has length at least μ. Our arguments can be easily modified to handle this more general case.

Lemma 2.3

For any polygonal curve π in ℝ^d and any μ≥0, it holds that \(\mathsf{d}_{\mathcal{F}}(\pi,\mathrm{simpl}(\pi,\mu)) \leq \mu\).

Proof

Consider a segment u of simpl(π,μ) and the portion \(\widehat{\pi}\) of π that corresponds to it. Clearly, all the vertices of \(\widehat{\pi}\) are contained inside a ball of radius μ centered at the first endpoint of u visited by π, except the last vertex of \(\widehat{\pi}\). As such, one can parameterize u and \(\widehat{\pi}\), such that initially the point stays on the vertex of u while visiting all vertices of \(\widehat{\pi}\) (except the last one), and then simultaneously move in sync on u and the last segment of \(\widehat{\pi}\), in such a way that the distance is always at most μ.

figure f

 □

3 The Approximation Algorithm

3.1 Computing the Reachable Free Space

For two curves π and σ, their reachable free space, denoted by \(\mathcal{R}_{\leq\delta}(\pi,\sigma)\), is the set of all the points of \(D_{\leq\delta}(\pi,\sigma)\) that are reachable from (0,0) by an (x,y)-monotone path.

The set \(\mathcal{R}_{\leq\delta}(\pi,\sigma)\) has finite descriptive complexity inside each grid cell, and we need to describe it only for the grid cells that have non-empty intersection with it. Clearly, generating only those grid cells is sufficient to decide if there is a monotone path between (0,0) and (1,1), which is equivalent to deciding if the Fréchet distance between π and σ is at most δ. In particular, to fully describe \(\mathcal{R}_{\leq\delta}(\pi,\sigma)\), we will specify the reachability intervals \(R^{h}_{i,j}\subseteq I^{h}_{i,j}\) and \(R^{v}_{i,j}\subseteq I^{v}_{i,j}\) for each cell \(C_{i,j}\), which describe the intersection of \(\mathcal{R}_{\leq\delta}(\pi,\sigma)\) with the top and right boundary of \(C_{i,j}\). These intervals contain all the needed information, since the free space inside each cell is convex.

The complexity of the reachable free space, for distance δ, denoted by \(N_{\leq\delta}(\pi,\sigma)\), is the total number of grid cells which have non-empty intersection with \(\mathcal{R}_{\leq\delta}(\pi,\sigma)\). One can compute this set of cells and extract an existing monotone path in \(O(N_{\leq\delta}(\pi,\sigma))\) time, by performing a BFS of the grid cells that visits only the reachable cells. This yields the following relatively easy result. We include the details both for the sake of completeness and because the algorithm we suggest is engagingly simple.

Lemma 3.1

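Given two polygonal curves π and σ in ℝ^d, and a parameter δ≥0, one can compute a representation of \(\mathcal{R}_{\leq\delta}(\pi,\sigma)\) in \(O(N_{\leq\delta}(\pi,\sigma))\) time. Furthermore, one can decide if \(\mathsf{d}_{\mathcal{F}}(\pi,\sigma)\leq\delta\), and if this is the case also extract reparameterizations in \(O(N_{\leq\delta}(\pi,\sigma))\) time.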

Proof

We create a directed graph G that has a node v(i,j) for every reachable free space cell \(C_{i,j}\). With each node v(i,j) we store the free space intervals \(I^{h}_{i,j}\) and \(I^{v}_{i,j}\) as well as the reachability intervals \(R^{h}_{i,j}\subseteq I^{h}_{i,j}\) and \(R^{v}_{i,j}\subseteq I^{v}_{i,j}\).

Each node v(i,j) can have an outgoing edge to its right and top neighbor; an edge between these vertices exists if and only if the corresponding reachability interval between them is nonempty. In particular, a monotone path from (0,0) to a point (x,y) ∈ \(C_{i,j}\) in \(\mathcal{R}_{\leq\delta}(\pi,\sigma)\) corresponds to a monotone path in the graph G from v(1,1) to v(i,j). Furthermore, any such monotone path has exactly k=i+j−2 edges on it.

We compute the graph G on the fly by performing a BFS on it, starting from v(1,1), and keeping the invariant that when the BFS visits a node v(i,j) it enqueues the vertices v(i,j+1) and v(i+1,j), in this order, to the BFS queue (if they are connected to v(i,j), naturally).

This implies that at any point in time, and for any k, the BFS queue contains the nodes on the kth diagonal (i.e., all nodes v(i,j) such that i+j=k−1) of the diagram sorted from left to right. However, the same node might appear twice (consecutively) in this queue.

figure g

In every iteration, the BFS dequeues the one or two copies of the same node v(i,j) and merges the two copies into one if necessary. Now, the one or two vertices (i.e., v(i−1,j) and v(i,j−1)) that have edges into v(i,j) are known, as are their reachability intervals. Therefore one can compute the reachability intervals for v(i,j) in constant time. Now, v(i,j+1) is enqueued if and only if the top side of the cell \(C_{i,j}\) is reachable by a monotone path (i.e., \(R^{h}_{i,j}\neq\emptyset\)), and v(i+1,j) is enqueued if and only if the right side of the cell \(C_{i,j}\) is reachable by a monotone path (i.e., \(R^{v}_{i,j}\neq\emptyset\)). Since the free space inside the cell is convex and of constant complexity, this can be done in constant time.

Clearly, the BFS takes time linear in the size of G and it computes the reachability information for all reachable free space cells of \(\mathcal{R}_{\leq\delta}(\pi,\sigma)\). Now, one can check if (1,1) is reachable by inspecting the reachability intervals for \(C_{\mathsf{n}_{\pi}-1 ,\mathsf{n}_{\sigma}-1}\), and checking if the top right corner of this cell is monotonically reachable from the origin, where \(\mathsf{n}_{\pi}\) (resp. \(\mathsf{n}_{\sigma}\)) is the number of vertices of the curve π (resp. σ). The monotone path realizing this can be extracted in linear time, by introducing backward edges in the graph and tracing a path back to the origin. □
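The following Python sketch illustrates the idea of visiting only reachable cells. It is a simplified stand-in rather than the procedure of the proof: it processes the grid one diagonal at a time using a dictionary instead of an explicit queue, it does not extract the reparameterizations, and it assumes both curves have at least two vertices. All function names are hypothetical. Cells are indexed from 0 here, and for each visited cell the code stores the reachable part of its left and bottom boundary.

```python
import math

def free_interval(p, a, b, delta):
    """Parameters t in [0, 1] with |p - (a + t(b - a))| <= delta, or None."""
    d = [y - x for x, y in zip(a, b)]
    f = [x - c for x, c in zip(a, p)]
    A = sum(x * x for x in d)
    B = 2.0 * sum(x * y for x, y in zip(d, f))
    C = sum(x * x for x in f) - delta * delta
    if A == 0.0:
        return (0.0, 1.0) if C <= 0.0 else None
    disc = B * B - 4.0 * A * C
    if disc < 0.0:
        return None
    r = math.sqrt(disc)
    lo, hi = max(0.0, (-B - r) / (2 * A)), min(1.0, (-B + r) / (2 * A))
    return (lo, hi) if lo <= hi else None

def clip_from(interval, start):
    """Part of `interval` at or after `start` (the monotonicity constraint)."""
    if interval is None:
        return None
    lo = max(interval[0], start)
    return (lo, interval[1]) if lo <= interval[1] else None

def frechet_at_most(P, Q, delta):
    """Return (decision, number of visited cells) for 'Frechet distance <= delta'."""
    n, m = len(P) - 1, len(Q) - 1                      # numbers of edges
    if math.dist(P[0], Q[0]) > delta or math.dist(P[-1], Q[-1]) > delta:
        return False, 0
    # frontier maps a cell to [reachable part of left side, of bottom side]
    frontier = {(0, 0): [(0.0, 0.0), (0.0, 0.0)]}
    visited = 0
    while frontier:
        nxt = {}
        for (i, j), (left, bottom) in frontier.items():
            visited += 1
            if (i, j) == (n - 1, m - 1):
                # The corner (1,1) is free and the cell's free space is convex,
                # so reaching this cell is enough.
                return True, visited
            top = free_interval(Q[j + 1], P[i], P[i + 1], delta)
            right = free_interval(P[i + 1], Q[j], Q[j + 1], delta)
            if left is None:                           # entered from below only
                top = clip_from(top, bottom[0])
            if bottom is None:                         # entered from the left only
                right = clip_from(right, left[0])
            if top is not None and j + 1 < m:
                nxt.setdefault((i, j + 1), [None, None])[1] = top
            if right is not None and i + 1 < n:
                nxt.setdefault((i + 1, j), [None, None])[0] = right
        frontier = nxt
    return False, visited
```

As a usage example, a straight segment from (0,0) to (4,0) and the curve (0,0), (3,0), (1,0), (4,0) have Hausdorff distance 0 but Fréchet distance 1, and the sketch accepts δ=1.1 while rejecting δ=0.9.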

Observation 3.2

One can compute all relevant vertex–edge events with radius ≤δ in \(O(N_{\leq\delta}(\pi,\sigma))\) time as follows. We compute the graph representation of \(\mathcal{R}_{\leq\delta}(\pi,\sigma)\) using Lemma 3.1. Next, for each reachable cell consider the vertex–edge events at its top and right boundaries and compute their event radii. Recall that a cell boundary corresponds to an edge from the one curve and a vertex from the other curve. Clearly, any cell boundary can be used by the reparameterization of width ≤δ, if and only if the corresponding event radius is at most δ.

3.2 The Approximate Decision Procedure

In the following, we are interested in the maximum complexity of the reachable free space when considering any radius δ and simplifying the curves with radius εδ. The reasons will become apparent only shortly after, in Lemma 3.5 and Lemma 3.6, where we show that the simplification radius chosen this way enables us to either (i) compute a (1+ε)-approximation of the Fréchet distance, or (ii) solve the decision problem exactly using the simplified curves (see Sect. 3.3.5).

The idea underlying this approximate decision procedure is depicted in Fig. 2. We simplify the two input curves to a resolution that is (roughly) an ε-fraction of the radius we care about (i.e., δ), and we then use the exact decision procedure on these two simplified curves. Since the Fréchet distance complies with the triangle inequality and by Lemma 2.3, we can infer the original distance from this information. In order for this approach to work, the complexity of the reachable free space for the two simplified curves has to be small. This notion of complexity is captured by the following definition.

Fig. 2
figure 2

The idea of the fuzzy decision procedure using simplification

Definition 3.3

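For two curves π and σ, let

\[ N(\varepsilon,\pi,\sigma) = \max_{\delta\geq 0} N_{\leq\delta}\bigl(\mathrm{simpl}(\pi,\varepsilon\delta),\,\mathrm{simpl}(\sigma,\varepsilon\delta)\bigr) \]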

be the maximum complexity of the reachable free space for the simplified curves. We refer to N(ε,π,σ) as the ε-relative free space complexity of π and σ. In order to give a more informative analysis, we will express the asymptotic time complexity of our algorithms not in terms of the size of the input, but instead use the size of the input and the free space complexity of the input as parameters.

We assume that for any 0<ε<1 the following properties hold for N(⋅,⋅,⋅).

  (P1) For any constant c′≥1, it holds that N(ε/c′,π,σ)=O(N(ε,π,σ)).

  (P2) N(ε,π,σ)≤N(ε/2,π,σ)/2.

The above properties will hold for all the families of curves we consider. In Sect. 4.1 we show that N(ε,π,σ) is a linear function in the number of vertices of the two curves for a fixed ε>0 if the curves are sufficiently well-behaved (see for example Lemma 4.4). Combining this analysis with the time complexity analysis of the algorithms will yield near linear upper bounds on the running times of these algorithms for the classes of curves considered.

Remark 3.4

In the following, when we state the time complexity of our algorithms, we always assume that N(ε,π,σ)=Ω(n), where n is the total number of vertices of π and σ.

Lemma 3.5

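Let π and σ be polygonal curves in ℝ^d, and let ε>0 and δ>0 be two parameters. Then, the algorithm described below outputs, in O(N(ε,π,σ)) time, one of the following:

  (A) “\(\mathsf{d}_{\mathcal{F}}(\pi,\sigma)\leq(1+\varepsilon)\delta\)”, and reparameterizations of π and σ of width ≤(1+ε)δ, and this happens if \(\mathsf{d}_{\mathcal{F}}(\pi,\sigma)\leq\delta\).

  (B) “\(\mathsf{d}_{\mathcal{F}}(\pi,\sigma)>\delta\)” if \(\mathsf{d}_{\mathcal{F}}(\pi,\sigma)>(1+\varepsilon)\delta\).

  (C) If \(\mathsf{d}_{\mathcal{F}}(\pi,\sigma)\in(\delta,(1+\varepsilon)\delta]\) then the algorithm outputs either of the above outcomes.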

In either case, the statement returned is correct.

Proof

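Set μ=(ε/4)δ. Compute in linear time the curves π′=simpl(π,μ) and σ′=simpl(σ,μ) using Algorithm 2.1. Let δ′=δ+2μ and observe that μ/δ′=ε/(4+2ε). Using Lemma 3.1 we can decide whether \(\mathsf{d}_{\mathcal{F}}(\pi',\sigma')\leq\delta'\) in

\[ O\bigl(N_{\leq\delta'}(\pi',\sigma')\bigr) = O\bigl(N(\varepsilon/(4+2\varepsilon),\pi,\sigma)\bigr) = O\bigl(N(\varepsilon,\pi,\sigma)\bigr) \]

time, by assumption (P1). If so, we output the reparameterizations as a proof that

\[ \mathsf{d}_{\mathcal{F}}(\pi,\sigma) \leq \mathsf{d}_{\mathcal{F}}(\pi,\pi') + \mathsf{d}_{\mathcal{F}}(\pi',\sigma') + \mathsf{d}_{\mathcal{F}}(\sigma',\sigma) \leq \mu + \delta' + \mu = \delta + 4\mu = (1+\varepsilon)\delta, \]

where the bounds on the first and last term follow from Lemma 2.3. On the other hand, if \(\mathsf{d}_{\mathcal{F}}(\pi',\sigma')>\delta'\), then this implies, by the triangle inequality, that

\[ \mathsf{d}_{\mathcal{F}}(\pi,\sigma) \geq \mathsf{d}_{\mathcal{F}}(\pi',\sigma') - \mathsf{d}_{\mathcal{F}}(\pi,\pi') - \mathsf{d}_{\mathcal{F}}(\sigma',\sigma) > \delta' - 2\mu = \delta. \]

Therefore, the algorithm outputs “\(\mathsf{d}_{\mathcal{F}}(\pi,\sigma)>\delta\)” in this case. □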

3.2.1 How to Use the Approximate Decider in a Binary Search

In order to use Lemma 3.5 to perform a binary search for the Fréchet distance, we can turn the “fuzzy” decision procedure into a precise one as follows.

Lemma 3.6

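Let π and σ be two polygonal curves in ℝ^d, and let 1≥ε>0 and δ>0 be two parameters. Then, there is an algorithm decider(π,σ,δ,ε) that, in O(N(ε,π,σ)) time, returns one of the following outputs: (i) a (1+ε)-approximation to \(\mathsf{d}_{\mathcal{F}}(\pi,\sigma)\), (ii) \(\mathsf{d}_{\mathcal{F}}(\pi,\sigma)\leq\delta\), or (iii) \(\mathsf{d}_{\mathcal{F}}(\pi,\sigma)>\delta\). The answer returned is correct.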

Proof

Let δ′=δ/(1+ε′), for ε′=cε and c=1/3. We run the algorithm of Lemma 3.5 with parameters δ and ε′. If the call returns “\(\mathsf{d}_{\mathcal{F}}(\pi,\sigma)>\delta\)”, then we return this result.

Otherwise, we call Lemma 3.5 with parameters δ′ and ε′. If it returns “\(\mathsf{d}_{\mathcal{F}}(\pi,\sigma)\leq(1+\varepsilon')\delta'\)”, then \(\mathsf{d}_{\mathcal{F}}(\pi,\sigma)\leq(1+\varepsilon')\delta'=\delta\), and we return this result.

The only remaining possibility is that the two calls returned “\(\mathsf{d}_{\mathcal{F}}(\pi,\sigma)\leq(1+\varepsilon')\delta\)” and “\(\mathsf{d}_{\mathcal{F}}(\pi,\sigma)>\delta'\)”. But then we have found the required approximation. Indeed, the resulting approximation factor of the reparameterizations returned by the call with δ is \(\leq\frac{(1+{\varepsilon}') \delta}{\delta'}= (1+ c{\varepsilon})^{2} < (1+{\varepsilon})\), as can be easily verified, since 0<ε≤1. □
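For convenience, the final inequality can be spelled out as follows. This short verification is an added step that uses only c=1/3 and 0<ε≤1 from the proof above:

\[ (1+c\varepsilon)^{2} = 1 + \tfrac{2}{3}\varepsilon + \tfrac{1}{9}\varepsilon^{2} \leq 1 + \tfrac{2}{3}\varepsilon + \tfrac{1}{9}\varepsilon = 1 + \tfrac{7}{9}\varepsilon < 1+\varepsilon, \]

where the middle step uses ε²≤ε for 0<ε≤1.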

3.3 Searching for the Fréchet Distance

3.3.1 Searching in a Fixed Interval

It is now straightforward to perform a binary search on an interval [α,β] to approximate the value of the Fréchet distance, if it falls inside this interval. Indeed, partition this interval into subintervals of length εα and perform a binary search to find the interval that contains the Fréchet distance. There are O(β/(εα)) intervals, and this would require O(log(β/(εα))) calls to decider. By using exponential subintervals, one can do slightly better, as testified by the following lemma.

Lemma 3.7

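Given two curves π and σ in ℝ^d, a parameter 1≥ε>0, and an interval [α,β], one can perform a binary search in [α,β] and obtain a (1+ε)-approximation to \(\mathsf{d}_{\mathcal{F}}(\pi,\sigma)\) if \(\mathsf{d}_{\mathcal{F}}(\pi,\sigma)\in[\alpha,\beta]\), or report that \(\mathsf{d}_{\mathcal{F}}(\pi,\sigma)\notin[\alpha,\beta]\). The algorithm, denoted by searchInterval(π,σ,[α,β],ε), takes \(O{ ( \log\frac{\log (\beta/\alpha)}{{\varepsilon}} )}\) calls to decider.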

Proof

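Let \(\alpha_i=\alpha(1+\varepsilon)^i\) for i=0,…,M, where \(M=\lfloor\log_{1+\varepsilon}(\beta/\alpha)\rfloor\), and let \(\alpha_{M+1}=\beta\). Perform a binary search, using decider(π,σ,δ,ε), to find the two values \(\alpha_i\) and \(\alpha_{i+1}\) such that \(\mathsf{d}_{\mathcal{F}}(\pi,\sigma)\in[\alpha_i,\alpha_{i+1}]\). Since \(\alpha_{i+1}\leq(1+\varepsilon)\alpha_i\), we conclude that we found the required approximation.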

It might be that during this procedure one of the calls to decider(π,σ,δ,ε) found the required approximation, and in this case we abort the binary search and just return this approximation.

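This process requires \(O(\log M)=O(\log\log_{1+\varepsilon}(\beta/\alpha))\) calls to decider. Observe that

\[ \log_{1+\varepsilon}(\beta/\alpha) = \frac{\ln(\beta/\alpha)}{\ln(1+\varepsilon)} \leq \frac{2\ln(\beta/\alpha)}{\varepsilon} = O\!\left(\frac{\log(\beta/\alpha)}{\varepsilon}\right). \]

Indeed, \(e^{x/2}\leq 1+x\leq e^{x}\) for x∈[0,1], and this implies that x/2≤ln(1+x)≤x, which is the inequality used above. □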
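The search over the exponential grid can be sketched as follows. This Python illustration replaces decider by a hypothetical exact oracle `decide(delta)` that returns True exactly when the unknown distance is at most δ; the real decider of Lemma 3.6 has three outcomes and may end the search early with an approximation. The name `search_interval` and the returned call counter are likewise assumptions made for the sketch.

```python
import math

def search_interval(decide, alpha, beta, eps):
    """Binary search over alpha*(1+eps)^i for the smallest grid value accepted
    by `decide`. Returns (value, number of oracle calls), or None if even
    beta is rejected. Assumes the unknown distance is at least alpha."""
    if not decide(beta):
        return None                                  # distance exceeds beta
    M = math.floor(math.log(beta / alpha) / math.log1p(eps))
    grid = [alpha * (1.0 + eps) ** i for i in range(M + 1)] + [beta]
    lo, hi = 0, len(grid) - 1                        # invariant: grid[hi] is accepted
    calls = 1
    while lo < hi:
        mid = (lo + hi) // 2
        calls += 1
        if decide(grid[mid]):
            hi = mid
        else:
            lo = mid + 1
    return grid[hi], calls
```

Consecutive grid values differ by a factor of at most 1+ε, so the returned value overestimates the distance by at most that factor, and the number of oracle calls grows only logarithmically in the grid size M+2.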

3.3.2 Searching over Events

Clearly, the procedure searchInterval(π,σ,[α,β],ε) alone does not suffice to solve our main problem, since the interval of distances we are searching over might have arbitrarily large “spread” (i.e., log(β/α) might be arbitrarily large). However, the Fréchet distance must be sufficiently close to a free space event in one of the “approximate” diagrams, i.e., a free space diagram of the two simplified curves. Thus, we can identify two kinds of critical value to search over, which are candidate values for the approximate Fréchet distance. These are the events where (i) the simplification of an input curve changes, or (ii) the reachability within the approximate free space diagram changes (i.e., a free space event; see Sect. 2.2.1).

The traditional solution to overcome this problem is to use parametric search. However, in our case, since we are only interested in approximation, we can use a simpler, “approximate”, search. It is sufficient to search over a set of values which approximate the event values by a constant factor, since we will use Lemma 3.7 to refine the resulting search interval in the main algorithm. Note, for instance, that we can easily use this lemma to turn a constant factor approximation of the Fréchet distance into a (1+ε)-approximation.

Algorithm 3.8

Let searchEvents(π,σ,Z) denote the algorithm that performs a binary search over the values of Z, to compute the atomic interval of Z that contains the Fréchet distance between π and σ. This procedure uses decider (Lemma 3.6) to perform the decisions during the search.
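A minimal Python sketch of such a search is given below. As an assumption made for illustration, decider is replaced by a hypothetical exact oracle `decide(z)` that returns True exactly when the unknown distance is at most z; the function name `search_events` is also hypothetical.

```python
import math

def search_events(decide, Z):
    """Return the endpoints (left, right) of the atomic interval of Z that
    contains the unknown distance tested by `decide`."""
    vals = sorted(set(Z))
    lo, hi = 0, len(vals)       # vals[:lo] are rejected, vals[hi:] are accepted
    while lo < hi:
        mid = (lo + hi) // 2
        if decide(vals[mid]):
            hi = mid
        else:
            lo = mid + 1
    left = vals[lo - 1] if lo > 0 else -math.inf     # atomic intervals may be infinite
    right = vals[lo] if lo < len(vals) else math.inf
    return left, right
```

For example, with Z={1,3,5,9} and an unknown distance of 4, the search returns the atomic interval with endpoints 3 and 5 after two oracle calls.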

3.3.3 Searching over Simplifications

Consider the events when the simplified curves change, see Algorithm 2.1. Consider the set of all pairwise distances between vertices of π and σ. Observe that it breaks the real line into \(\binom{n}{2} + 1\) atomic intervals, such that in each such interval the simplification does not change. Thus simpl(π,μ) (resp. simpl(σ,μ)) might result in O(n²) different curves depending on the value of μ, where n is the total number of vertices of π and σ. As a first step we would therefore like to use Algorithm 3.8 to perform a binary search over those distances to find the atomic interval that contains the required Fréchet distance. Naively, this would require us to perform distance selection. However, it is believed that exact distance selection requires Ω(n^{4/3}) time in the worst case [18]. To overcome this we will perform an approximate distance selection, as suggested by Aronov et al. [6].

Lemma 3.9

Given a set P of n points in ℝ^d, one can compute in O(n log n) time a set Z of O(n) numbers, such that for any pairwise distance y between two points of P, there exist numbers x, x′∈Z such that x≤y≤x′≤2x. Let approxDistances(P) denote this algorithm.

Proof

Compute an 8-well-separated pairs decomposition of P. Using the algorithm of Callahan and Kosaraju [12] this can be done in O(n log n) time, and results in a set of pairs of subsets \(\{(X_1,Y_1),\ldots,(X_m,Y_m)\}\), where m=O(n), such that for any two points p,q∈P there exists a pair \((X_i,Y_i)\) in the above decomposition, such that: (i) \(p\in X_i\) and \(q\in Y_i\) (or vice versa), and (ii) \(\max(\mathrm{diam}(X_{i}), \mathrm{diam}(Y_{i})) \leq \min_{p_{i} \in X_{i},q_{i} \in Y_{i}}\Vert p_{i} - q_{i} \Vert/8\).

This implies that the distances between any pair of points in \(X_i\) and \(Y_i\), respectively, are the same up to a small constant. As such, for every pair \((X_i,Y_i)\), for i=1,…,m, we pick representative points \(p_i\in X_i\) and \(q_i\in Y_i\), and set \(\ell_i=(3/4)\Vert p_i-q_i\Vert\). Let \(Z=\{\ell_1,\ldots,\ell_m,2\ell_1,\ldots,2\ell_m\}\) be the computed set of values.

Consider any pair of points p,q∈P. For the specific pair \((X_i,Y_i)\) that contains the pair of points p and q that we are interested in, we have \(\ell_i=(3/4)\Vert p_i-q_i\Vert \leq \Vert p_i-q_i\Vert-\mathrm{diam}(X_i)-\mathrm{diam}(Y_i) \leq \Vert p-q\Vert \leq \Vert p_i-q_i\Vert+\mathrm{diam}(X_i)+\mathrm{diam}(Y_i) \leq (5/4)\Vert p_i-q_i\Vert \leq 2\ell_i\), thus establishing the claim. □

3.3.4 Monotonicity Events

The following lemma testifies that the radius of a monotonicity event must be “close” to either a vertex–edge event or to the distance between two vertices. Since we will approximate the vertex–vertex distances and perform a binary search over them, this implies that we further only need to consider vertex–edge events. Furthermore, by Observation 3.2, the number of those vertex–edge events which remain in the resulting search range can be bounded by the complexity of the reachable free space.

Lemma 3.10

Let x be the radius of a monotonicity event involving vertices p,q and a segment u. Then there exists a number y such that y/2≤x≤3y, and y is either in \(\mathcal{W}\), the set of pairwise distances between the vertices in V(π)∪V(σ), or y is the radius of a vertex–edge event.

Proof

Let s be the intersection point of S(p,x)∩S(q,x) which lies on u. Let p′ (resp. q′) be the closest point on u to p (resp. q).

Clearly ∥p′−q′∥≤∥p−q∥ (since the projection onto the nearest neighbor of a convex set is a contraction), and since p′∈B(p,x) and q′∈B(q,x), the point s lies on the segment p′q′.

This implies that x=∥p−s∥≤∥p−p′∥+∥p′−s∥≤∥p−p′∥+∥p′−q′∥≤∥p−p′∥+∥p−q∥, by the triangle inequality.

figure h

A similar argument implies that

If ∥p−p′∥≥2∥p−q∥ then the above implies that x∈[1/2,3/2]∥p−p′∥. If p′ is an endpoint of u then ∥p−p′∥ is in \(\mathcal{W}\). Otherwise, ∥p−p′∥ is the radius of the vertex–edge event between p and u. In either case, this implies the claim.

If ∥p−p′∥≤2∥p−q∥ then x=∥p−s∥≤∥p−p′∥+∥p−q∥≤2∥p−q∥+∥p−q∥=3∥p−q∥, and of course \(\Vert {p- q} \Vert \in \mathcal{W}\). Now, the two balls of radius x centered at p and q, respectively, cover the segment pq, and we have ∥p−q∥/2≤x, which implies the claim. □

3.3.5 Searching with a Fixed Simplification

Assume that we have found simplifications τ and η, such that the Fréchet distance of those curves yields the desired (1+ε)-approximation. Clearly, an approximation of the Fréchet distance between τ and η suffices for our result. To this end, let searchIntervalNoSimp(π,σ,[α,β],ε) be the variant of searchInterval from Lemma 3.7 that uses Lemma 3.1 directly instead of calling decider. This version searches for the Fréchet distance in the given interval, but does not perform simplification before calling the decision procedure. It returns a (1+ε)-approximation of the Fréchet distance, given that it is contained in this interval. Note that correctness and running time of Lemma 3.7 are not affected by this modification.

Lemma 3.11

Let τ and η be two given curves in \(\mathbb{R}^{d}\), with total complexity n, and let \([h^{-},h^{+}]\) be an interval, such that (i) the Fréchet distance between τ and η lies in \([h^{-},h^{+}]\), and (ii) there is no value of \(\mathcal{W}\) in the interval \([h^{-},h^{+}]\). Then, for ε>0, one can (1+ε)-approximate the Fréchet distance between τ and η and compute the corresponding reparameterizations in O((n+N)log(N/ε)) time, where \(N =N_{\leq h^{+}}(\tau,\eta)\).

Let aprxFréchetNoSimp(τ,η,\([h^{-},h^{+}]\),ε) denote this algorithm.

Proof

For two real numbers x,y>0, we define [x/y]=max(x,y)/min(x,y).

Compute the reachable free space of τ and η for the distance \(h^{+}\), using Lemma 3.1. Next, using Observation 3.2, compute from it the set Z of all the radii of the vertex–edge events of τ and η with radius at most \(h^{+}\). Next, we sort Z, and perform a binary search over Z, using Lemma 3.1, for the atomic interval \(\mathcal{I}=[\alpha,\beta]\) of Z that contains the Fréchet distance between τ and η. Next, call searchIntervalNoSimp(τ,η,[α,4α],ε) and searchIntervalNoSimp(τ,η,[β/4,β],ε). We claim that one of these two searches performed on the respective intervals will discover two consecutive values x and (1+ε)x, such that the two corresponding calls to the algorithm of Lemma 3.7 imply that the Fréchet distance lies in [x,(1+ε)x].
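The binary search for the atomic interval can be sketched as follows. This is a minimal illustration, not the paper's implementation: `at_most` is a hypothetical monotone oracle standing in for the decision procedure of Lemma 3.1, and the function names are chosen here for illustration only.

```python
def ratio(x, y):
    """The bracket notation [x/y] = max(x, y) / min(x, y) for positive reals."""
    return max(x, y) / min(x, y)


def atomic_interval(values, at_most):
    """Return consecutive candidates (alpha, beta) bracketing the distance.

    `at_most(delta)` is an assumed monotone oracle: it returns True
    exactly when the sought distance is <= delta.
    """
    zs = sorted(values)
    if not zs or not at_most(zs[-1]):
        raise ValueError("the distance must be at most the largest candidate")
    if at_most(zs[0]):
        return 0.0, zs[0]  # the distance lies below the smallest candidate
    lo, hi = 0, len(zs) - 1  # invariant: not at_most(zs[lo]) and at_most(zs[hi])
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if at_most(zs[mid]):
            hi = mid
        else:
            lo = mid
    return zs[lo], zs[hi]
```

The search uses O(log|Z|) oracle calls, matching the count used in the running time analysis of the proof.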

Indeed, the interior of [α,β] does not contain any value in \(\mathcal{W}\) or a radius of a vertex–edge event of τ and η. Therefore, the interval [α,β] might contain only monotonicity events of τ and η. By Lemma 3.10, for a monotonicity event with radius r there exists a \(y \in Z\cup\mathcal{W}\), such that [r/y]≤3. But there is no value of \(Z\cup\mathcal{W}\) in the interior of [α,β], and therefore, for any r″∈[4α,β/4] and \(y''\in Z\cup\mathcal{W}\), we have [r″/y″]≥4.

We conclude that no monotonicity event, vertex–edge event, or value of \(\mathcal{W}\) lies in the interval [4α,β/4]. Since the Fréchet distance must be equal to one such value, it follows that it does not lie in [4α,β/4], but this implies that it lies either in [α,4α] or in [β/4,β]. In either case, the above algorithm would have found the approximate distance.

Computing and sorting the set of vertex–edge events takes O(NlogN) time by Observation 3.2. The binary search requires O(log|Z|) calls to the algorithm of Lemma 3.1. The two calls to searchIntervalNoSimp require O(log(1/ε)) calls to Lemma 3.1. Now, observe that all these calls to the algorithm of Lemma 3.1 are done with values of \(\delta\leq h^{+}\). Thus the complexity of the reachable free space is bounded (up to a constant factor) by the number of vertex–edge events of values \(\leq h^{+}\), and this number is bounded by |Z|. Therefore, a call to Lemma 3.1 takes O(|Z|) time. Thus, the overall running time is O((n+|Z|)log(|Z|/ε)), and by definition \(\lvert Z\rvert= O{ ( N_{\leq h^{+}}(\tau,\eta ) )}\). □

3.4 The Approximation Algorithm

The resulting approximation algorithm is depicted in Fig. 3. It will be used by the final approximation algorithm as a subroutine. We first analyze this basic algorithm. We will then show how to use it, in Lemma 3.15 below, to get a faster approximation algorithm. The algorithm depicted in Fig. 3 performs numerous calls to decider, with approximation parameter ε>0. If any of these calls discovers the approximate distance, then the algorithm immediately stops and returns the approximation. Therefore, at any point in the execution of the algorithm, the assumption is that all previous calls to decider returned a direction where the optimal distance must lie. In particular, a call to searchInterval \(( { \pi,\sigma,\mathcal{I}, {\varepsilon}} )\) would either find the approximate distance in the interval \(\mathcal{I}\) and return immediately, or establish that the desired value is outside this interval.

Fig. 3: The basic approximation algorithm

3.4.1 Correctness

Lemma 3.12

Given two polygonal curves π and σ, and a parameter 1>ε>0, the algorithm aprxFréchetI(π,σ,ε) computes a (1+ε)-approximation to the Fréchet distance between π and σ.

Proof

If the algorithm found the approximation before step (F), then clearly it is the desired approximation, and we are done. (In particular, this must be the case if 4α′>β′/4.)

Otherwise, because of (C), we know that . By steps (D) and (E) it must be that . Since μ=3α=(ε/10)α′≤β′/4, it follows, by the triangle inequality, that

A similar argument shows that . Hence, the algorithm of Lemma 3.11 can be applied to π′ and σ′ for the range [α′,β′], as .

Now, by Lemma 3.11, we find that the value δ resulting from step (G), is contained in the interval . By the triangle inequality we conclude that the returned Fréchet distance is

since .

Note that since it is the width of a specific reparameterization between the two curves. □

3.4.2 Running Time

Lemma 3.13

For any x,y∈(2α,β/2), we have simpl(π,x)=simpl(π,y) and simpl(σ,x)=simpl(σ,y).

Proof

Indeed, the interval (α,β) does not contain any value of Z. As such, by Lemma 3.9, (2α,β/2) does not contain any value of the pairwise distances between vertices of the vertex set of π and σ which implies that the simplification is the same for any value inside this interval. □

Lemma 3.14

Given two polygonal curves π and σ with a total of n vertices in \(\mathbb{R}^{d}\), and a parameter 1>ε>0, the running time of aprxFréchetI(π,σ,ε) is O(N(ε,π,σ)logn).

Proof

Computing Z (and sorting it) takes O(nlogn) time by Lemma 3.9. Steps (C), (D) and (E) perform O(logn+log(1/ε))=O(logn) calls to decider, by Lemma 3.7. (Here, we assume that ε=Ω(1/n). If ε<1/n then we can just use the algorithm of Alt and Godau [3] since its running time is faster than our approximation algorithm in this case.) Each call to decider takes O(N(ε,π,σ)) time, so overall this takes O(N(ε,π,σ)logn) time. Computing the simplifications in step (F) with Algorithm 2.1 takes O(n) time.

By Lemma 3.11, a call to aprxFréchetNoSimp(π′,σ′,[α′,β′],ε/4) takes T=O((n+N)log(N/ε)) time, with \(N=N_{\leq\beta'}(\pi',\sigma')\). Now, 3α and β′ are both inside the interval (2α,β/2), and as such, by Lemma 3.13, we have π′=simpl(π,3α)=simpl(π,β′) and σ′=simpl(σ,3α)=simpl(σ,β′). Therefore, we have

\(N=N_{\leq\beta'}(\mathrm{simpl}(\pi,\beta'),\mathrm{simpl}(\sigma,\beta'))\leq\mathsf{N}(1,\pi,\sigma).\)

Thus, step (G) takes T=O(N(1,π,σ)log(N(1,π,σ)n/ε))=O(N(1,π,σ)logn) time, since \(\mathsf{N}(1,\pi,\sigma)\leq n^{2}\) and ε=Ω(1/n). Observe that N(1,π,σ)≤N(ε,π,σ) for ε≤1.

Finally, in order to compute the resulting reparameterizations in step (H), we compute the reparameterizations of π and π′ (resp. σ and σ′) as described in the proof of Lemma 2.3 and chain them with the reparameterizations of the simplified curves, which we obtained from step (G). Clearly, this and computing the resulting width takes O(n) time. Note that by the assumption in Remark 3.4 the term N(ε,π,σ) dominates over O(n). □

The running time of Lemma 3.14 can be slightly improved.

Lemma 3.15

The algorithm aprxFréchetI depicted in Fig. 3 can be modified to run in time O(N(ε,π,σ)+N(1,π,σ)logn) (see Definition 3.3).

Proof

Use Lemma 3.14, with \(\varepsilon_{0}=1/2\), to get a 2-approximation ζ for the Fréchet distance between π and σ. This takes O(N(1,π,σ)logn) time. Let \(\mathcal{I}_{0} = [\zeta,2\zeta]\) be the corresponding interval that contains the distance. We could call searchInterval \(({\pi,\sigma, \mathcal{I}_{0}, {\varepsilon}} )\) and get a (1+ε)-approximation in \(O ( \mathsf{N}({{\varepsilon}, \pi, \sigma} )\log\frac{1}{{\varepsilon}} +\allowbreak \mathsf{N}({1, \pi, \sigma } )\log n )\) time.

One can do better by starting with a “large” ε and decreasing it during the binary search for the right value performed by searchInterval. This is a standard idea and it was also used by Aronov and Har-Peled [5].

Indeed, assume that in the beginning of the ith step, we know that the required Fréchet distance lies in an interval \(\mathcal{I}_{i-1} = [\alpha_{i-1},\beta_{i-1}]\) and \(\beta_{i-1} -\alpha_{i-1}= \Vert{\mathcal{I}_{0}} \Vert {\varepsilon}_{i-1}\), where \({\varepsilon}_{i-1}=1/2^{i-1}\).

Let \(\varDelta_{i-1} = \Vert{\mathcal{I}_{i-1}} \Vert = \beta_{i-1}-\alpha_{i-1}\), and let \(x_{i,j}=\alpha_{i-1}+j\varDelta_{i-1}/4\), for j=0,1,2,3,4. Call the procedure decider on three values \(x_{i,1}\), \(x_{i,2}\), and \(x_{i,3}\), with the approximation parameter being \(c_{1}{\varepsilon}_{i}\), for \(c_{1}>0\) being a sufficiently small constant. Based on the outcome of these three calls, we can determine in constant time which of the three intervals \(\mathcal{J}_{i,1} = [x_{i,0}, x_{i,2}]\), \(\mathcal{J}_{i,2} = [x_{i,1}, x_{i,3}]\), or \(\mathcal{J}_{i,3} =[x_{i,2}, x_{i,4}]\) must contain the Fréchet distance. We set this interval to be \(\mathcal{I}_{i}\).

We repeat this process for M steps, where M=⌈lg(1/ε)⌉. It is easy to verify that the final interval now provides the required approximation. The running time of this algorithm is \(O{ ( \mathsf{N}({1, \pi, \sigma} ) \log n + \sum_{i=1}^{M} \mathsf{N}({{\varepsilon}_{i}, \pi, \sigma}) )} \). Now, by assumption (P2) (see Definition 3.3), we have

\(\sum_{i=1}^{M} \mathsf{N}({{\varepsilon}_{i}, \pi, \sigma}) = O ( \mathsf{N}({{\varepsilon}, \pi, \sigma} ) ),\)

and this implies the claim. □
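The interval-halving scheme in the proof can be sketched as follows. This is an illustration under a simplifying assumption: `at_most` is a hypothetical exact oracle, whereas the proof calls the approximate procedure decider with parameter c₁εᵢ. With an exact oracle the middle probe is redundant, so the sketch only queries the first and third quarter points; the overlapping intervals are what would absorb the error of an approximate probe.

```python
def refine(alpha, beta, at_most, steps):
    """Halve the interval [alpha, beta] `steps` times using quarter-point probes.

    `at_most(x)` is an assumed exact oracle: True iff the sought value is <= x.
    The sought value is assumed to lie in [alpha, beta] initially.
    """
    for _ in range(steps):
        quarter = (beta - alpha) / 4
        x = [alpha + j * quarter for j in range(5)]
        if at_most(x[1]):
            alpha, beta = x[0], x[2]  # value in [x0, x1], keep J_1
        elif at_most(x[3]):
            alpha, beta = x[1], x[3]  # value in (x1, x3], keep J_2
        else:
            alpha, beta = x[2], x[4]  # value in (x3, x4], keep J_3
    return alpha, beta
```

After M steps the interval has width equal to the initial width divided by 2^M, which is the invariant the proof maintains.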

The Result

Putting the above together, we get the following result.

Theorem 3.16

Given two polygonal curves π and σ with a total of n vertices in \(\mathbb{R}^{d}\), and a parameter 1>ε>0, one can (1+ε)-approximate the Fréchet distance between π and σ in O(N(ε,π,σ)+N(1,π,σ)logn) time (see Definition 3.3).

Interestingly, simplification is critical for the efficiency of the above algorithm. Indeed, consider the two nicely behaved curves depicted below. The reachable portion of the free space diagram of these two curves, for the distance realizing the Fréchet distance, covers a quadratic number of cells.

figure i

The use of simplification by itself is not sufficient to guarantee that the presented algorithm is efficient. Indeed, it might not be possible to simplify the input curves at all without losing too much information. In such contrived worst case examples, the free space diagram still has quadratic complexity due to the inherent structure of the curves. See the figure below for one such example. In the next section we will analyze the relative free space complexity using realistic input models and prove the efficiency of the above algorithm, given that the input is “realistic”.

figure j

4 The Relative Free Space Complexity of Families of Curves

In this section we are going to bound the relative free space complexity for different realistic input models of curves. We will introduce the new class of c-packed curves, and we compare this new input model to the previous models of κ-boundedness and low density.

4.1 On c-Packed Curves

We introduce a new family of curves, c-packed curves, and prove that their relative free space complexity N(ε,π,σ) is linear, for any two curves π and σ in this family. This implies that Theorem 3.16 works in near linear time for c-packed curves, which is one of our main results.

4.1.1 Definition and Basic Properties

Definition 4.1

A curve π in \(\mathbb{R}^{d}\) is c-packed if for any point p in \(\mathbb{R}^{d}\) and any radius r>0, the total length of π inside the ball B(p,r) is at most cr.
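To make the definition concrete, the following sketch computes the length of a polygonal curve inside a ball and derives an empirical lower bound on c from a finite set of test balls. The function names and the choice of vertex-centered test balls are assumptions made here for illustration; a true packedness bound would have to hold for every center and radius, which this sampling does not certify.

```python
import math


def length_in_ball(curve, center, radius):
    """Total length of the polygonal curve inside the ball B(center, radius)."""
    total = 0.0
    for a, b in zip(curve, curve[1:]):
        d = [bi - ai for ai, bi in zip(a, b)]
        f = [ai - ci for ai, ci in zip(a, center)]
        A = sum(x * x for x in d)
        if A == 0.0:
            continue  # zero-length segment contributes nothing
        B = 2.0 * sum(x * y for x, y in zip(d, f))
        C = sum(x * x for x in f) - radius * radius
        disc = B * B - 4.0 * A * C
        if disc <= 0.0:
            continue  # the supporting line misses (or only touches) the ball
        root = math.sqrt(disc)
        t0 = max(0.0, (-B - root) / (2.0 * A))
        t1 = min(1.0, (-B + root) / (2.0 * A))
        if t1 > t0:
            total += (t1 - t0) * math.sqrt(A)
    return total


def packedness_lower_bound(curve, radii):
    """Largest ratio length/r over vertex-centered balls: a lower bound on c."""
    return max(length_in_ball(curve, v, r) / r for v in curve for r in radii)
```

For a single straight segment, any ball contains at most a chord of length 2r, so the segment is 2-packed; the sampled bound above can only under-estimate such a constant.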

Lemma 4.2

Let π be a curve in \(\mathbb{R}^{d}\), μ>0 be a parameter, and let π′=simpl(π,μ) be the simplified curve. Then ∥π∩B(p,r+μ)∥≥∥π′∩B(p,r)∥ for any ball B(p,r).

Proof

Let u be a segment of π′ that intersects B(p,r) and let v=u∩B(p,r) be this intersection. Let \(\pi_{\mathsf{u}}\) be the portion of π that got simplified into u. Observe that \(\pi_{\mathsf{u}}\) is a polygonal curve that lies inside a hippodrome of radius μ around u; that is, \(\pi_{\mathsf{u}}\subseteq\mathcal{H}_{\mathsf{u}}= \mathsf{u}\oplus B ( {0, \mu} )\), where ⊕ denotes the Minkowski sum of the two sets, see the figure below.

figure k

In particular, erect two hyperplanes passing through the endpoints of v that are orthogonal to v, and observe that \(\pi_{\mathsf{u}}\) must intersect both hyperplanes. Hence, we conclude that the portions of \(\pi_{\mathsf{u}}\) in the hippodrome \(\mathcal{H}_{\mathsf{v}}= \mathsf{v}\oplus B ( {0,\mu} )\) are of length at least ∥v∥. Clearly, v⊆B(p,r) implies that \(\mathcal{H}_{\mathsf{v}}\subseteq B ( {p, \mathsf{r}+ \mu} )\), which in turn implies that \(\pi_{\mathsf{u}}\cap\mathcal{H}_{\mathsf{v}}\subseteq B ( {p,\mathsf{r}+ \mu} )\) and thus \(\Vert\pi_{\mathsf{u}}\cap B ( {p,\mathsf{r}+ \mu} )\Vert\geq\Vert\mathsf{v}\Vert\).

Summing over all segments v in π′∩B(p,r) implies the claim. □

Lemma 4.3

Let π be a c-packed curve in \(\mathbb{R}^{d}\), μ>0 be a parameter, and let π′=simpl(π,μ) be the simplified curve. Then, π′ is a 6c-packed curve.

Proof

Assume, for the sake of contradiction, that ∥π′∩B(p,r)∥>6cr for some B(p,r) in \(\mathbb{R}^{d}\). If r≥μ, then set r′=2r and Lemma 4.2 implies that ∥π∩B(p,r′)∥≥∥π∩B(p,r+μ)∥≥∥π′∩B(p,r)∥>6cr=3cr′, which contradicts that π is c-packed.

If r<μ then let U denote the segments of π′ intersecting B(p,r) and let k=|U|. Observe that k>6cr/2r=3c, as any segment can contribute at most 2r to the length of π′ inside B(p,r). Therefore we have ∥π′∩B(p,2μ)∥≥∥π′∩B(p,r+μ)∥≥∥U∩B(p,r+μ)∥≥kμ, since every segment of the simplified curve π′ has a minimal length of μ. By Lemma 4.2, this implies that ∥π∩B(p,3μ)∥≥∥π′∩B(p,2μ)∥≥kμ>3cμ, which is a contradiction to the c-packedness of π. □

4.1.2 Bounding the Relative Free Space Complexity

Lemma 4.4

For any two c-packed curves π and σ in \(\mathbb{R}^{d}\), and 0<ε<1, we have N(ε,π,σ)=O(cn/ε).

Proof

Let δ≥0 be an arbitrary number, μ=εδ, π′=simpl(π,μ) and σ′=simpl(σ,μ).

We need to show that the complexity of \(D_{\delta}(\pi',\sigma')\) is O(cn/ε). A free space cell of \(D_{\delta}(\pi',\sigma')\) corresponds to two segments u∈π′ and v∈σ′. The free space in this cell is non-empty if and only if there are two points p∈u and q∈v such that ∥p−q∥≤δ. We charge this pair of points to the shorter of the two segments. We claim that a segment cannot be charged too many times.

figure l

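Indeed, consider a segment u∈π′, and consider the ball B of radius r=(3/2)∥u∥+δ centered at the midpoint of u, see the figure above. Every segment v∈σ′ that participates in a close pair as above and charges u for it, is of length at least ∥u∥, and the length of v∩B is at least ∥u∥. Since σ′ is 6c-packed, by Lemma 4.3, we see that the number of such charges is at most

\(\frac{\Vert\sigma'\cap B\Vert}{\Vert u\Vert}\leq\frac{6cr}{\Vert u\Vert}=\frac{6c((3/2)\Vert u\Vert+\delta)}{\Vert u\Vert}=9c+\frac{6c\delta}{\Vert u\Vert}\leq9c+\frac{6c\delta}{\mu}=9c+\frac{6c}{{\varepsilon}}=O(c/{\varepsilon}),\)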

since ∥u∥≥μ.

We conclude that there are O(cn/ε) free space cells that contain a point of \(D_{\delta}(\pi',\sigma')\). The complexity of the free space inside a cell is a constant, thus implying the claim. □

By plugging the above into Theorem 3.16, we get the following result.

Theorem 4.5

Given two polygonal c-packed curves π and σ with a total of n vertices in \(\mathbb{R}^{d}\), and a parameter 1>ε>0, one can (1+ε)-approximate the Fréchet distance between π and σ in O(cn/ε+cnlogn) time.

4.2 Relative Free Space Complexity of Low-Density Curves

Definition 4.6

A polygonal curve π in \(\mathbb{R}^{d}\) is ϕ-low density if any ball B(p,r) intersects at most ϕ segments of π that are longer than r.
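As a small illustration of the definition, the sketch below counts, for one given ball, the segments that are longer than the radius and intersect the ball. The helper names are hypothetical; the maximum of this count over all balls would be the density ϕ, which the sketch does not attempt to compute.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment ab."""
    d = [bi - ai for ai, bi in zip(a, b)]
    f = [pi - ai for pi, ai in zip(p, a)]
    dd = sum(x * x for x in d)
    if dd == 0.0:
        t = 0.0
    else:
        t = max(0.0, min(1.0, sum(x * y for x, y in zip(d, f)) / dd))
    closest = [ai + t * di for ai, di in zip(a, d)]
    return math.dist(p, closest)


def long_edges_hit(curve, center, radius):
    """Number of segments longer than `radius` that intersect B(center, radius)."""
    return sum(
        1
        for a, b in zip(curve, curve[1:])
        if math.dist(a, b) > radius
        and point_segment_distance(center, a, b) <= radius
    )
```

Short segments are ignored by the count no matter how many of them cross the ball, which is why a low-density curve can still be long inside a ball.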

First, observe that this input model is less restrictive than the input model which describes c-packed curves. It can be easily seen by a simple packing argument that a polygonal c-packed curve is ϕ-low density, for ϕ=2c. For any ball B=B(p,r), consider the ball with the same center that has radius r′=2r. Any edge intersecting B that is longer than r must contribute at least r to the length of the intersection of the curve with the larger ball, which is bounded by cr′. There can be at most cr′/r=2c edges of this type.

A curve that is low density, however, is not necessarily c-packed for a small value of c. Indeed, a low-density curve π might have an arbitrarily long intersection with a ball by having sufficiently small segments, see the figure below. However, in this case π must have many vertices in the areas where its length cannot be bounded, as we will show in the following section.

figure m

Claim 4.7

Let π be a ϕ-low density polygonal curve, and let C be a hypercube in \(\mathbb{R}^{d}\) with side length ℓ. Then, the number of edges of π of length ≥ℓ that intersect C is bounded by \(c_{d}\phi\), where \(c_{d}=\lceil{\sqrt{d} / 2 } \rceil^{d}\).

Proof

Partition the cube C into a D×D×⋯×D grid, for \(D = \lceil{\sqrt{d} / 2 } \rceil\). Clearly, any edge that intersects C and has length ≥ℓ must intersect one of the hypercubes in this grid. A hypercube of this grid has diameter

\(\sqrt{d}\,\ell/D\leq2\ell\)

and is included in a ball of radius ℓ. Thus, a hypercube in this grid intersects at most ϕ such long edges. We conclude that there can be at most \(\phi D^{d}\) long edges intersecting C. □

4.2.1 Low-Density Curves Can Be Long Only if They Pay for It

Lemma 4.8 below testifies that the parts of a low-density curve, where its length cannot be bounded by a constant, can be covered with hypercubes, such that each cube intersects at most a constant number of edges and at most a constant number of other cubes. We use this construction in Lemma 4.9 to relate the length of a low-density curve to the diameter of the covered area to the number of vertices. One can verify Lemma 4.8 using an easy modification of a lemma from [13]. We provide a proof, for the sake of completeness, in Appendix B.

Lemma 4.8

Let π be a ϕ-low density curve, of which n edges are intersecting a given hypercube C of \(\mathbb{R}^{d}\). The hypercube C can be covered by a set of hypercubes , such that (i) , (ii) , (iii) any point p∈C is covered by at most \(2^{d}\) hypercubes, and (iv) each hypercube of the cover intersects at most \(c_{d}\phi\) edges of π, where \(c_{d}\) is a constant that depends only on the dimension d.

Lemma 4.9

Let π be a ϕ-low density curve in \(\mathbb{R}^{d}\), and let C be a cube in \(\mathbb{R}^{d}\) with side length r. Let α=∥π∩C∥. There must be at least \(\Omega((\alpha/r)^{1+1/(d-1)})\) vertices of π contained in 3C, where 3C is the scaling of C by a factor of 3 around its center.

Proof

We will first give a lower bound on the number n of edges intersecting C (i.e., the edges that contribute to α). Then we will account for the edges that have endpoints outside 3C. So, take the n edges of π that intersect C and construct the cover of C resulting from Lemma 4.8 with respect to these edges.

Let \(C_{1},\ldots,C_{N}\) denote the cubes in this cover, where \(r_{1}\leq r_{2}\leq\ldots\leq r_{N}\) are the side lengths of the cubes used by the cover, respectively. Lemma 4.8 implies that \(N\leq2^{2d+1}n\), and, therefore, a lower bound on N would provide a lower bound on n.

So, the sum of the diameters of those N cubes bounds the length of the intersection \(\alpha\leq\sum_{i=1}^{N} c_{d}\phi\sqrt{d}r_{i}\), since every cube in this cover can intersect at most \(c_{d}\phi\) edges of π. Setting p=d and q=d/(d−1), we observe that 1/p+1/q=1/d+(d−1)/d=1, and by Hölder’s inequality (see Footnote 1), we have

\(\alpha\leq c_{d}\phi\sqrt{d}\sum_{i=1}^{N}r_{i}\cdot1\leq c_{d}\phi\sqrt{d}{ ( \sum_{i=1}^{N}r_{i}^{d} )}^{1/d}{ ( \sum_{i=1}^{N}1^{d/(d-1)} )}^{(d-1)/d}=c_{d}\phi\sqrt{d}{ ( \sum_{i=1}^{N}r_{i}^{d} )}^{1/d}N^{(d-1)/d}.\)

Lemma 4.8 also implies that the sum of the volumes of the cubes is at most \(2^{d}\mathrm{vol}(C)\), since every point in C is covered at most \(2^{d}\) times by this cover. Therefore we have \(\sum_{i=1}^{N} r_{i}^{d} = \sum_{i=1}^{N}\mathrm{vol} ( {C_{i}} ) \leq2^{d} \mathrm{vol} ({C} ) = 2^{d} r^{d}\). Hence

\(\alpha\leq c_{d}\phi\sqrt{d}{ ( 2^{d}r^{d} )}^{1/d}N^{(d-1)/d}=2c_{d}\phi\sqrt{d}\,r\,N^{(d-1)/d}.\)

This implies that \(c_{2}(\alpha/r)^{d/(d-1)}\leq N\), where \(c_{2}= { ( { 2 c_{d}\phi\sqrt{d}})}^{-d/(d-1)}\). Since \(N\leq2^{2d+1}n\), we have \(c_{3}(\alpha/r)^{d/(d-1)}\leq n\), where \(c_{3}=\frac{1}{2^{2d+1} }{ ( { 2 c_{d}\phi\sqrt{d}} )}^{-d/(d-1)}\).

Now, some of these n edges intersecting C can have both endpoints outside 3C. Such edges are longer than the side length of C and by Claim 4.7 their number is bounded by \(c_{d}\phi\).

Hence, the number of vertices of π inside 3C is at least \(n-c_{d}\phi\geq c_{3}(\alpha/r)^{d/(d-1)}-c_{d}\phi\). □

Remark 4.10

One can also prove Lemma 4.9 directly, by building a quadtree and arguing that for a low-density curve to be sufficiently long, many edges in it have to be (sufficiently) short, thus implying the same bound. However, the current proof is more intuitive and cleaner.

Observation 4.11

The bound in Lemma 4.9 is tight. For any m>0 and any d>0, consider the integer grid in ℝd with coordinates in the range 1,…,m, and compute a path that visits all these grid points using only the grid edges of unit length, which is clearly possible.

Now, the resulting curve is 2d-low density and has length \(\alpha=m^{d}-1\) and its diameter is \(r=\sqrt{d}m\). Lemma 4.9 implies that it has \(\Omega((\alpha/r)^{d/(d-1)})=\Omega(m^{d})\) vertices. Since this grid has \(m^{d}\) vertices, this is tight.
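The grid curve of this observation can be generated explicitly. The sketch below handles only the planar case d=2 (an assumption made here to keep the example short) and uses a boustrophedon ("snake") order, which is one of several possible paths over the grid edges.

```python
import math


def snake_path(m):
    """Path visiting all points of the m x m integer grid along unit grid edges."""
    path = []
    for row in range(1, m + 1):
        cols = range(1, m + 1) if row % 2 == 1 else range(m, 0, -1)
        path.extend((col, row) for col in cols)
    return path


def path_length(path):
    """Total Euclidean length of a polygonal path."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))
```

For m=4 the path has 16 vertices and length 15, in line with the count of m^d vertices and length m^d−1 for d=2.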

4.2.2 Accounting for Many Reachable Free Space Cells

If many columns of the free space diagram of the two simplified low-density curves contain a linear number of reachable cells, then the curve must be “long” in the vicinity of the edges corresponding to those columns, since the simplification ensures a minimal edge length. A similar argument holds for the rows. Therefore, using Lemma 4.9, we can charge the additional reachable cells to vertices of the original curves. This yields the following result.

Lemma 4.12

For any two low-density curves π and σ in \(\mathbb{R}^{d}\), and 0<ε<1, we have \(\mathsf{N}({{\varepsilon}, \pi, \sigma} ) = O {( \frac {n^{2(d-1)/d}}{{\varepsilon}^{2}} )}\).

Proof

Let δ≥0 be an arbitrary radius, and let π′=simpl(π,μ) and σ′=simpl(σ,μ) be their simplifications, where μ=εδ. Then, we need to prove that \(N_{\leq\delta}(\pi', \sigma') = O { ( \frac {n^{2(d-1)/d}}{{\varepsilon}^{2}} )}\).

To this end, it suffices to bound the number of vertex–edge pairs (p,u), where p is a vertex of π′, u is an edge of σ′, and the distance between p and u is at most δ (naturally, we need to apply the same argument to pairs with vertices in σ′ and edges in π′). The total number of such pairs bounds the total complexity of the free space diagram of π′ and σ′ for the distance δ.

Set \(M=O(n^{1-2/d}/{\varepsilon}^{2})\), and associate every vertex–edge pair (p,u) that appears in the free space diagram with the vertex p.

Consider the grid of side length δ. For a grid cell R, consider the vertex of π′ in R that is associated with the largest number of such vertex–edge pairs, and say it is being associated with \(\mathsf{d}_{\mathsf{R}}\) such vertex–edge pairs, and let \(v_{\mathsf{R}}\) denote this “popular” vertex of π′. The total number of vertex–edge pairs associated with vertices of π′ inside R is bounded by \(U_{\mathsf{R}}=\lvert{\pi'}\sqcap\mathsf{R}\rvert\,\mathsf{d}_{\mathsf{R}}\), where |π′⊓R| denotes the number of vertices of π′ that lie inside R.

figure n

If \(\mathsf{d}_{\mathsf{R}}\leq M\) then \(U_{\mathsf{R}}\leq\lvert{\pi'}\sqcap\mathsf{R}\rvert M\), and we charge M units to each vertex of π inside R.

If \(\mathsf{d}_{\mathsf{R}}>M\) then the length of σ′ inside C/3 is at least \(\mathsf{d}_{\mathsf{R}}\mu\), where C is a cube centered at R with side length O(δ). Indeed, all the charges \(\mathsf{d}_{\mathsf{R}}\) arise from different segments of σ′ that are in distance at most δ from \(v_{\mathsf{R}}\), and each such segment has length at least μ.

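By Lemma 4.9, we find that σ must have at least \(\Omega((\mathsf{d}_{\mathsf{R}}\mu/\delta)^{d/(d-1)})=\Omega((\mathsf{d}_{\mathsf{R}}{\varepsilon})^{d/(d-1)})\) vertices inside C. There is some constant c such that

\(\lvert{\sigma}\sqcap{C}\rvert\geq c\,(\mathsf{d}_{\mathsf{R}}{\varepsilon})^{d/(d-1)}\geq\mathsf{d}_{\mathsf{R}}^{2}/M,\)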

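by picking M to be sufficiently large. In particular, if \(\lvert{\pi'}\sqcap\mathsf{R}\rvert\leq\mathsf{d}_{\mathsf{R}}\), then \(U_{\mathsf{R}}=\lvert{\pi'} \sqcap\nobreak \mathsf{R} \rvert\, \mathsf{d}_{\mathsf{R}}\leq\mathsf{d}_{\mathsf{R}}^{2} \leq M\lvert{\sigma} \sqcap{C}\rvert\). Hence, we charge M units to each vertex of σ inside the cube C.

Otherwise, \(\lvert{\pi'}\sqcap\mathsf{R}\rvert>\mathsf{d}_{\mathsf{R}}>M\). But then, the length of π′ inside C is at least |π′⊓R|μ, and by Lemma 4.9, we see that π must have at least \(\Omega((\lvert{\pi'}\sqcap\mathsf{R}\rvert{\varepsilon})^{d/(d-1)})\) vertices inside C. Arguing as above, this implies that \(\lvert{\pi'}\sqcap\mathsf{R}\rvert^{2}\leq M\lvert{\pi}\sqcap{C}\rvert\). As such, we have \(U_{\mathsf{R}}=\lvert{\pi'}\sqcap\mathsf{R}\rvert\,\mathsf{d}_{\mathsf{R}}\leq\lvert{\pi'}\sqcap\mathsf{R}\rvert^{2}\leq M\lvert{\pi}\sqcap{C}\rvert\). Again, we charge M units to each vertex of π inside the cube C.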

Since C intersects a constant number of cells of the grid, no vertex would get charged more than a constant number of times by the above scheme. Thus, every vertex, of either curve, gets charged O(M) units overall, and the total number of vertex–edge pairs present in the free space diagram is O(nM), as claimed. □

Observation 4.13

One can extend the construction of Observation 4.11 to show that Lemma 4.12 is close to being tight. Indeed, consider the grid curve of Observation 4.11 in d−1 dimensions, for an integer m. We now lift it to d dimensions by considering the [1,m]d cube and placing two copies of the above curve on two opposite faces of the cube, denoted by f and f′. Let π 1 and π 2 denote these two copies.

Next, delete the even edges from \(\pi_{1}\) and the odd edges from \(\pi_{2}\). Connect every vertex \(v_{1}\) of \(\pi_{1}\) to its corresponding (copied) vertex \(v_{2}\) in \(\pi_{2}\) by a path made out of the m−1 unit edges along the grid line connecting the two vertices. This results in a curve π that is similar to the curve constructed in Observation 4.11, but has the advantage that when simplified for the distance μ=m it results in a curve with \(m^{d-1}\) segments of length ≥m that connects points that lie on f and on f′, respectively.

Let σ be a copy of π. For a fixed ε>0, we can add a single segment to π such that the Fréchet distance between the resulting curves is exactly δ=m/ε. Now, these two curves have \(n=2m^{d}+2\) vertices overall, and furthermore, when we simplify them for the distance μ=εδ=m, we end up with two curves such that every long edge of π′ is going to be in distance ≤δ=m/ε from a constant fraction of the edges of σ′ (this would be all the edges if \(1/{\varepsilon}> \sqrt{d}\)). Therefore the complexity of the reachable free space is \(\Omega(n_{\pi}n_{\sigma})=\Omega((m^{d-1})^{2})=\Omega(n^{2(d-1)/d})\), where \(n_{\pi}\) denotes the number of vertices of π′. The upper bound of Lemma 4.12 is (only) larger by a factor of \(O(1/{\varepsilon}^{2})\).

By plugging the above into Theorem 3.16, we get the following result.

Theorem 4.14

Given two low-density curves π and σ with a total of n vertices in \(\mathbb{R}^{d}\), and a parameter ε>0, there exists an algorithm which (1+ε)-approximates the Fréchet distance between π and σ in \(O { (\frac{n^{2(d-1)/d}}{{\varepsilon}^{2}} + n^{2(d-1)/d} \log n )}\) time.

4.3 Relative Free Space Complexity of κ-Bounded Curves

We revisit the definitions of Alt et al. [4] of κ-bounded and κ-straight curves. Note that these definitions describe an extremely restricted class of curves while c-packed curves form a fairly general and natural class of curves. However, it is not true that any κ-bounded curve is O(κ)-packed. We therefore give a separate proof to bound the relative free space complexity of κ-bounded curves in order to improve upon the result in [4].

Definition 4.15

Let κ≥1 be a given parameter. A curve π is κ-straight if for any two points p and q on the curve, we have ∥π[p,q]∥≤κ∥p−q∥.

A curve π is κ-bounded if for all p,q∈π we find that the curve π[p,q] is contained inside B(p,r)∪B(q,r), where r=(κ/2)∥p−q∥, see the figure below.

figure o
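The κ-straightness condition compares arc length to chord length, which is easy to probe numerically. The following sketch returns the largest ratio over all pairs of vertices of a polygonal curve; restricting to vertices is an assumption made here for brevity, so the result is only a lower bound on the smallest κ for which the curve is κ-straight.

```python
import math


def straightness_lower_bound(curve):
    """Largest ratio (arc length)/(chord length) over all pairs of vertices."""
    prefix = [0.0]  # prefix[i] = length of the curve up to vertex i
    for a, b in zip(curve, curve[1:]):
        prefix.append(prefix[-1] + math.dist(a, b))
    best = 1.0
    for i in range(len(curve)):
        for j in range(i + 1, len(curve)):
            arc = prefix[j] - prefix[i]
            chord = math.dist(curve[i], curve[j])
            if chord == 0.0:
                if arc > 0.0:
                    return math.inf  # the curve returns to the same point
                continue
            best = max(best, arc / chord)
    return best
```

A straight line yields 1, and a right-angle turn with two unit legs yields √2, since the arc of length 2 spans a chord of length √2.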

Lemma 4.16

A κ-straight curve is 2κ-packed.

Proof

Let π be a κ-straight curve in \(\mathbb{R}^{d}\), and consider any ball B(p,r) that intersects it. Let q and s be the first and last points, respectively, along π that are in B(p,r). Clearly, ∥q−s∥≤2r, and by the κ-straightness ∥π∩B(p,r)∥≤∥π[q,s]∥≤κ∥q−s∥≤2κr. □

Remark 4.17

It is easy to verify that a κ-straight curve is also κ-bounded. However, κ-bounded curves, counterintuitively, can have infinite length even when contained inside a finite domain. An example of this is Koch’s snowflake, which is a fractal curve depicted in Fig. 4.

Fig. 4: Koch’s snowflake is an example of a κ-bounded curve that has infinite length but a finite diameter

To see, intuitively, why Koch’s snowflake is κ-bounded, let \(\pi_{i}\) be the ith polygonal curve generated by this process. There is a natural mapping between any point of \(\pi_{i}\) and \(\pi_{i+1}\), for all i. In particular, consider two points p and q on the final curve \(\pi_{\infty}\), and consider the two sequences of points \(p_{i},q_{i}\in\pi_{i}\), where \(p_{i+1}\in\pi_{i+1}\) (resp. \(q_{i+1}\in\pi_{i+1}\)) is the natural image of \(p_{i}\) (resp. \(q_{i}\)), \(\lim_{i\to\infty}p_{i}=p\), and \(\lim_{i\to\infty}q_{i}=q\).

Now, assume that r=∥p−q∥. Observe that, for all i, the polygonal curve \(\pi_{i}\) is made out of segments that are all of the same length. In particular, consider the first index k, such that the segment length of \(\pi_{k}\) is ≤r/20. It is easy to argue that \(\Vert p_{k}-p\Vert\leq r/5\) and \(\Vert q_{k}-q\Vert\leq r/5\). In fact, one can argue that no point of \(\pi_{k}\) moves more than a distance r/5 to its final location on \(\pi_{\infty}\).

Now, a tedious argument shows that there are O(1) segments of \(\pi_{k}\) separating \(p_{k}\) from \(q_{k}\). Therefore this portion of the curve \(\pi_{k}\) is covered by a disk of radius O(r), and the corresponding portion of the final curve between p and q is also covered by a disk of radius O(r). This implies that Koch’s snowflake is κ-bounded.

A formal proof of this fact is considerably more tedious and is omitted.
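The growth of the length under a bounded diameter can be observed directly on the polygonal approximations. The sketch below generates one side of the snowflake (the Koch curve over a unit segment, an assumption made here to keep the example short); the function names are illustrative.

```python
import math


def koch_step(points):
    """Replace every segment by the four segments of the Koch construction."""
    sin60 = math.sqrt(3.0) / 2.0
    out = [points[0]]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = (x1 - x0) / 3.0, (y1 - y0) / 3.0
        a = (x0 + dx, y0 + dy)              # end of the first third
        c = (x0 + 2.0 * dx, y0 + 2.0 * dy)  # start of the last third
        # apex: the middle third rotated by +60 degrees around a
        b = (a[0] + 0.5 * dx - sin60 * dy, a[1] + sin60 * dx + 0.5 * dy)
        out.extend([a, b, c, (x1, y1)])
    return out


def polyline_length(points):
    """Total Euclidean length of a polygonal curve."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))
```

Each step multiplies the number of segments by 4 and the length by 4/3, so the length grows without bound while all vertices stay in a fixed bounded region.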

Lemma 4.18

Let π be a κ-bounded polygonal curve in \(\mathbb{R}^{d}\), and let μ≤δ be parameters. Let π′=simpl(π,μ). Then the number of segments of π′ intersecting B(s,δ) is bounded by \(O(\kappa^{d}(1+\delta/\mu)^{d})\), for any \(s\in\mathbb{R}^{d}\).

Proof

For \(\pi'=u_{1}u_{2}\ldots u_{k}\), let \(Y_{O}=\{u_{1},u_{3},\ldots\}\) and \(Y_{E}=\{u_{2},u_{4},\ldots\}\) be the sets of odd and even segments of π′, respectively.

Let \(X_{O}\subseteq Y_{O}\) be the set of odd segments of π′ intersecting B(s,δ). For all i, pick an arbitrary point \(p_{i}\) on the ith segment of \(X_{O}\) that lies inside B(s,δ). Next, pick an original point \(q_{i}\) of π in distance at most μ from \(p_{i}\), for \(i=1,\ldots,M=|X_{O}|\). Observe that for all i we have \(\Vert s-q_{i}\Vert\leq\delta+\mu\). Furthermore, between any two distinct points \(p_{i}\) and \(p_{j}\) on the simplified curve π′ there must lie an even segment of \(Y_{E}\) along the curve, and the length of this segment is at least μ (because the simplification algorithm generates segments of length at least μ). Also, the endpoints of this even segment lie on the original curve π.

We claim that no two points of \(Q=\{q_{1},\ldots,q_{M}\}\) can be too close to each other; that is, there are no two points q′,q″∈Q, such that r=∥q′−q″∥≤μ/(4κ). Indeed, assume for the sake of contradiction, that there are two such points. Then, by the above, the portion of π connecting them contains two points t′,t″ that are at least μ apart. Observe that π[t′,t″]⊆X=B(q′,(κ/2)r)∪B(q″,(κ/2)r). However, the maximum distance between two points that are included inside X is bounded by its diameter. We have

\(\Vert t'-t''\Vert\leq\mathrm{diam}(X)\leq r+2(\kappa/2)r=(1+\kappa)r\leq(1+\kappa)\mu/(4\kappa)<\mu,\)

since κ>1. A contradiction.

However, all the points of Q lie inside a ball of radius δ+μ centered at s. Now, placing a ball of radius μ′=μ/(8κ) around each point of Q, results in a set of interior disjoint balls. This implies, by a standard packing argument, that the number of points of Q is bounded by \(\mathrm{vol}(B(s,\delta+\mu))/\mathrm{vol}(B(\cdot,\mu'))=O((\delta+\mu)^{d}/(\mu/\kappa)^{d})=O((1+\delta/\mu)^{d}\kappa^{d})\).

This bounds the number of odd segments of π′ intersecting the ball B(s,δ), and a similar argument holds for the even segments intersecting this ball. □

Lemma 4.19

For any two κ-bounded polygonal curves π and σ in \(\mathbb{R}^{d}\), and 0<ε<1, we have \(\mathsf{N}({\varepsilon},\pi,\sigma)=O((\kappa/{\varepsilon})^{d}n)\).

Proof

Let δ≥0 be an arbitrary radius, and set μ=εδ. Let π′=simpl(π,μ) and σ′=simpl(σ,μ). We need to show that the complexity of the reachable free space is \(O(\kappa^{d}(1+\delta/\mu)^{d}n)=O((\kappa/{\varepsilon})^{d}n)\).

The boundary of a reachable cell in the free space diagram has a non-empty intersection with \(D_{\delta}(\pi',\sigma')\). Otherwise its interior could not be reached by a monotone path from (0,0). Therefore, using an argument similar to the proof of Lemma 4.4, Lemma 4.18 implies the desired bound. □

By plugging the above into Theorem 3.16, we get the following result.

Theorem 4.20

Given two κ-bounded polygonal curves π and σ with a total of n vertices in \(\mathbb{R}^{d}\), and a parameter 1>ε>0, there exists an algorithm which (1+ε)-approximates the Fréchet distance between π and σ in \(O((\kappa/{\varepsilon})^{d}n+\kappa^{d}n\log n)\) time.

5 Extension to Closed c-Packed Curves

The Fréchet distance for closed curves is defined as in Sect. 2 with the altered condition that the reparameterizations f and g are orientation-preserving homeomorphisms on the one-dimensional sphere. Computing the Fréchet distance for closed curves is more difficult, as the constraint that the endpoints of the curves have to be matched to each other is dropped in this case and therefore the set of reparameterizations one has to consider is larger.

Observation 5.1

The decision problem for closed curves can be reduced to the previously considered case of open curves. Suppose we are given two closed c-packed curves π and σ and a parameter δ. Pick a vertex p of the curve π, and assume that we know a point q on σ that is being matched to p by a pair of reparameterizations of π and σ of width at most δ. Clearly, if we break π open at p, and σ at q, we retrieve two open curves \(\widehat{\pi}\) and \(\widehat{\sigma}\), and we can use the previous method to decide if the Fréchet distance between \(\widehat{\pi}\) and \(\widehat{\sigma}\) is at most δ. Hence we only need to generate a suitable set of candidates for q to determine if the Fréchet distance between π and σ is at most δ within a certain approximation error.

Lemma 5.2

Let π be a closed c-packed polygonal curve in \(\mathbb{R}^d\), and let μ≤δ be parameters. Let π′=simpl(π,μ). Then the number of edges of π′ intersecting B(p,δ) is bounded by O(cδ/μ), for any \(p\in\mathbb{R}^d\).

Proof

Consider the ball B=B(p,r) of radius r=μ+δ. Any edge u of π′ that intersects B(p,δ) has to contribute at least μ to the length of the intersection with B, as the simplification guarantees that every edge of π′ is of length at least μ. Since π′ is 6c-packed, by Lemma 4.3, we have ∥B∩π′∥≤6cr, and the number of edges of π′ intersecting B(p,δ) is N≤∥B∩π′∥/μ≤6cr/μ=6c(μ+δ)/μ=O(c+cδ/μ), which implies the claim. □

Lemma 5.3

Let π and σ be two closed c-packed polygonal curves with a total of n vertices, and let δ and 1>ε>0 be parameters. Let π′=simpl(π,μ) and σ′=simpl(σ,μ) denote the curves simplified with μ=εδ, and let p be a vertex of π′. We can compute a set of candidate points on σ′ of size O(c/ε), in O(n+c/ε) time, such that if the Fréchet distance between π′ and σ′ is at most δ, then there exists a pair of reparameterizations of width at most (1+ε)δ that matches p to an element of this candidate set.

Proof

We walk along the curve σ′ starting from an arbitrary point. If the starting point is within distance δ of p, then we add it to the candidate set. As we follow the curve, we create a candidate whenever we

  (a) (re-)enter the ball B(p,δ), or

  (b) have traveled a distance εδ along σ′ since the last creation of a candidate, unless we have exited the ball B(p,δ) in the meantime.

Clearly, this takes O(n+c/ε) time.
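The walk described above can be sketched in code. The following is a minimal sketch, not the authors' implementation: it assumes the simplified closed curve σ′ is given as a list of vertex tuples with an implicit closing edge, that the walk starts at the first vertex, and that δ and ε are positive. The function name `candidate_points` and all variable names are hypothetical. Since a ball is convex, each edge meets B(p,δ) in at most one interval, which is found by solving a quadratic equation in the edge parameter.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _lerp(a: Point, b: Point, t: float) -> Point:
    return tuple((1.0 - t) * x + t * y for x, y in zip(a, b))


def candidate_points(sigma: Sequence[Point], p: Point,
                     delta: float, eps: float) -> List[Point]:
    """Walk once around the closed curve `sigma` and collect candidates for q."""
    step = eps * delta
    if step <= 0.0:
        raise ValueError("delta and eps must be positive")
    candidates: List[Point] = []
    # Starting point: it becomes a candidate if it lies in the ball B(p, delta).
    inside = math.dist(sigma[0], p) <= delta
    if inside:
        candidates.append(tuple(sigma[0]))
    since_last = 0.0  # distance walked inside the ball since the last candidate
    n = len(sigma)
    for i in range(n):
        a, b = sigma[i], sigma[(i + 1) % n]
        d = [y - x for x, y in zip(a, b)]
        f = [x - c for x, c in zip(a, p)]
        A = sum(x * x for x in d)
        if A == 0.0:
            continue  # degenerate edge of length zero
        # Solve |a + t*d - p|^2 <= delta^2 for t.
        B = 2.0 * sum(x * y for x, y in zip(d, f))
        C = sum(x * x for x in f) - delta * delta
        disc = B * B - 4.0 * A * C
        if disc < 0.0:
            inside = False  # the edge misses the ball entirely
            continue
        root = math.sqrt(disc)
        t0 = max(0.0, (-B - root) / (2.0 * A))
        t1 = min(1.0, (-B + root) / (2.0 * A))
        if t0 > t1:
            inside = False
            continue
        length = math.sqrt(A)
        if t0 > 0.0 or not inside:
            # Event (a): the walk (re-)enters the ball.
            candidates.append(_lerp(a, b, t0))
            since_last = 0.0
        pos = t0
        # Event (b): a candidate after every eps*delta walked inside the ball.
        while since_last + (t1 - pos) * length >= step:
            pos = min(t1, pos + (step - since_last) / length)
            candidates.append(_lerp(a, b, pos))
            since_last = 0.0
        since_last += (t1 - pos) * length
        inside = (t1 == 1.0)  # still inside only if the edge ends in the ball
    return candidates
```

For example, for a square with vertices (−2,−2), (2,−2), (2,2), (−2,2), the point p=(0,−2), δ=1 and ε=0.5, only the bottom edge meets the ball, and the sketch places candidates every 0.5 units along the portion of that edge inside the ball.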

The number of events of type (a) is bounded (up to a factor of 2) by the number of intersections of σ′ with the sphere S(p,δ), and by Lemma 5.2, this number is bounded by O(cδ/μ)=O(c/ε). By Lemma 4.3, the simplified curve σ′ is 6c-packed, and therefore the length of its intersection with B(p,δ) is at most 6cδ. This implies that we can have at most O(6cδ/μ)=O(c/ε) candidates that were created at events of type (b).

Consider reparameterizations of π′ and σ′ of width at most δ. Next, consider a point q∈σ′ that is matched to p∈π′ by these reparameterizations. Observe that q∈B(p,δ) and that there exists, by construction, a candidate point q′ such that ∥q−q′∥≤εδ. Let p′ be a point on π′ that is matched to q′ by the given reparameterizations.

figure p

We match the curve segment \(\widehat{\sigma}\) between q and q′ to p, and the curve segment \(\widehat{\pi}\) between p and p′ to q′, see the figure above. Clearly, this preserves the monotonicity of the matching. By the triangle inequality, any point on \(\widehat{\sigma}\) has distance at most (1+ε)δ to p. Similarly, for any point on \(\widehat{\pi}\) there is a point on \(\widehat{\sigma}\) within distance δ of it, and therefore q′ is within distance (1+ε)δ of every point on \(\widehat{\pi}\).

We conclude that the Fréchet distance between π′ and σ′ is at most (1+ε)δ when restricted to reparameterizations matching p to q′. □

One can adapt Lemma 3.5 to the closed curves case, by considering the O(cn/ε) open curves that result from breaking σ′ at any point of the candidate set. The details of the adaptation are straightforward, and we only state the result.

Lemma 5.4

Let π and σ be two closed polygonal c-packed curves with a total of n vertices in \(\mathbb{R}^d\), and let δ and 1>ε>0 be parameters. Then, there exists an algorithm which, in \(O((c/\varepsilon)^2 n)\) time, correctly outputs one of the following:

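  (A) If the Fréchet distance between π and σ is at most δ, then the algorithm outputs “≤(1+ε)δ”.

  (B) If the Fréchet distance between π and σ is larger than (1+ε)δ, then the algorithm outputs “>δ”.

  (C) If the Fréchet distance between π and σ is larger than δ and at most (1+ε)δ, then the algorithm outputs either of the above outcomes.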

Plugging Lemma 5.4 into the algorithm of Theorem 3.16, we get the following result.

Theorem 5.5

Given two closed polygonal c-packed curves π and σ with a total of n vertices in \(\mathbb{R}^d\), and a parameter 1>ε>0, one can (1+ε)-approximate the Fréchet distance between π and σ in \(O(c^2 n(\varepsilon^{-2}+\log n))\) time.

6 Conclusions

We presented a new approximation algorithm for the Fréchet distance for polygonal curves in any fixed dimension. The new algorithm is surprisingly simple and should be practical. Furthermore it works for any kind of polygonal curves. Since the algorithm simplifies the curves to the “right” resolution during the execution, we expect the algorithm to be fast in practice. The algorithm’s analysis relies on the concept of the relative free space complexity of curves, which tries to capture the complexity of the free space diagram when simplification is being used.

Next, we introduced the c-packed family of curves. While not all curves are c-packed, it seems that most real life curves are c-packed. The family of c-packed curves is closed under simplification, and the property of a curve being c-packed is independent of the ambient dimension of the space containing the curve. We expect this concept to be used to analyze other algorithms in the future.

In particular, the relative free space complexity of c-packed curves is linear. We gave bounds for the relative free space complexity for several other types of curves, from low-density curves to κ-bounded curves. Finally, we also showed that the algorithm can be modified to handle closed curves efficiently.

Lower Bound

Our solution to the decision problem “beats” the lower bound of Ω(n log n) [9] by a factor of log n (see Lemma 3.5). Since our decision procedure is approximate, this is not too surprising. However, it is enlightening to consider where the lower-bound proof breaks down in our setting. Indeed, Buchin et al. [9] generate two curves such that the Fréchet distance might be realized by one vertex of one curve matching the whole other curve. In our case, on the other hand, the input model coupled with simplification guarantees that the number of segments matching a single vertex is only a constant.