Efficient Fastest-Path Computations in Road Maps

In the age of real-time online traffic information and GPS-enabled devices, fastest-path computations between two points in a road network modeled as a directed graph, where each directed edge is weighted by a "travel time" value, are becoming a standard feature of many navigation-related applications. To support this, very efficient computation of these paths in very large road networks is critical. Fastest paths may be computed as minimal-cost paths in a weighted directed graph, but traditional minimal-cost path algorithms based on variants of the classic Dijkstra algorithm do not scale well, as in the worst case they may traverse the entire graph. A common improvement, which can dramatically reduce the number of traversed graph vertices, is the A* algorithm, which requires a good heuristic lower bound on the minimal cost. We introduce a simple, but very effective, heuristic function based on a small number of values assigned to each graph vertex. The values are based on graph separators and computed efficiently in a preprocessing stage. We present experimental results demonstrating that our heuristic provides estimates of the minimal cost which are superior to those of other heuristics. Our experiments show that when used in the A* algorithm, this heuristic can reduce the number of vertices traversed by an order of magnitude compared to other heuristics.


Introduction

The Shortest, Minimal-Cost and Fastest Path Problems
The shortest-path problem on graphs is one of the most fundamental problems in computer science, the graph being one of the most basic and common discrete structures, modeling an abundance of real-world problems involving networks. In the most basic scenario, graph vertices represent entities in a network and an edge between two vertices indicates the existence of a link between them (e.g. a communication or social network). The shortest path between two vertices s and t in the graph is then the path between s and t containing the minimal number of edges. In the case of a communication network, this could be the cheapest way to route a message to t, originating at s. In the more general case, edges are assigned weights which measure a cost associated with traversing that edge. The shortest path then becomes a minimal-cost path, where the cost of the path is the sum of the costs of its edges. In the case of a communication network, the associated cost of an edge may be its conductance.
A very important type of network is a road map, the graph vertices representing road junctions and the edges road segments between the junctions. In the simplest scenario, the graph is planar, the vertices are embedded in the plane, namely have (x, y) coordinates, and each edge is assigned a positive weight measuring its Euclidean length in the plane. The minimal-cost path between two vertices s and t is then the edge path of minimal Euclidean length between s and t, which could indicate the shortest drive (or walk) between these two points. In practice, in vehicle navigation applications, edges of the road map having the same length are not all equivalent for a driver, since the possible driving speed on the roads may vary, depending on the category of the road. Highways are usually preferred, as they allow for higher speeds, thus a faster drive. Consequently, the more relevant weight assigned to a road segment is the so-called "travel time", which is the segment length divided by the maximal speed possible on that segment. The resulting minimal-cost path in this weighted graph is sometimes called the "fastest path". The more realistic variant of this problem is when the graph edges are directed, namely the travel time along an edge may depend on the direction of the edge. In the special case of a one-way road, the edge exists in just one direction (or, equivalently, the travel time in the opposite direction is infinite).

Precomputation and Dynamic Fastest Path Problems
Traditional minimal-cost path algorithms do not scale well to very large networks. More practical algorithms rely on a (typically heavy) preprocessing of the graph, resulting in extra information to store along with the basic graph data, which is exploited in answering online (s, t)-minimal-cost queries efficiently. While effective, this approach introduces a complication. Using travel times as the costs of road network edges is useful to correctly model a real-world navigation problem, but it also imposes a dynamic character on the problem, as the maximal speed on a road segment is rarely a constant: it changes over time depending on traffic conditions, hence the travel times are dynamic. Consequently, an algorithm which relies on preprocessing of the graph data in order to speed up the online (s, t)-fastest-path queries must deal with the dynamic nature of the data by periodic repetition of the preprocessing. This rules out the use of unduly heavy preprocessing.

Objective
The computation of minimal-cost paths in dynamic weighted graphs has been the subject of intense study over the past decades, and many techniques have been proposed to solve different variants of the problem. A complete survey of the state of the art is beyond the scope of this paper and we refer the interested reader to the survey and comparison of Bast et al. [1].
Our contribution is the description of a very effective heuristic function which can be used in the well-known A* algorithm for minimal-cost path computation on weighted directed graphs. Computation of the heuristic is fast and can easily be repeated periodically to accommodate dynamic traffic conditions in road networks. A* with this heuristic can be used in conjunction with many other techniques to provide a more complete solution to the general problem.

The Dijkstra and A* Algorithms
While in practice we typically would like to solve the "point-to-point" minimal-cost path problem between a source vertex s and a target vertex t in a directed graph, it turns out that this indirectly involves computing the minimal-cost path from s to many other vertices in the graph. Let G = (V, E, c) be a directed graph with vertex set V and edge set E ⊂ V × V, such that the positive real cost of traversing the directed edge (u, v) is c(u, v). The minimal cost of a path from a given vertex s ∈ V to a given vertex t ∈ V may be obtained by computing the entire function d_s(v) from s to any other vertex v ∈ V by solving the following linear program:

$$\max\; d_s(t) \quad \text{subject to} \quad d_s(s) = 0, \qquad d_s(v) - d_s(u) \le c(u, v) \;\; \forall (u, v) \in E.$$

Thinking of d_s(v) as an "embedding" of the graph vertices on the real line, this means we would like to "stretch" s and t as far apart as possible on the line, subject to the constraint that the endpoints of any edge (u, v) are separated by a distance of at most c(u, v), the weight of the edge.
In practice, this linear program can be transformed into a dynamic programming problem, which in turn can be solved by the celebrated Dijkstra algorithm [4], which traverses the graph vertices guided by a priority queue of vertices. The procedure terminates if t is reached and a minimal-cost path is then generated by tracing the path backwards from t. If the priority queue empties before t is reached, the search fails and a minimal-cost path does not exist (e.g. if the graph is not connected). The complexity of the most efficient implementation of the Dijkstra algorithm [5] is O(m + n log n), where m is the number of graph edges and n the number of graph vertices. Unfortunately, this is prohibitive, for the number of edges m is typically much larger than the number of edges along the minimal-cost path.
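The point-to-point procedure described above can be sketched as follows. This is a minimal illustration, not the implementation used in this paper: it assumes the graph is given as a Python dictionary of adjacency lists (a hypothetical representation), and it uses a binary heap, which gives O((m + n) log n) time rather than the O(m + n log n) bound of [5].

```python
import heapq

def dijkstra(graph, s, t):
    """Point-to-point minimal-cost path.

    graph: dict mapping each vertex u to a list of (v, cost) pairs,
    one per directed edge (u, v) with positive cost (assumed layout).
    Returns (cost, path), or (inf, []) if t is unreachable from s.
    """
    dist = {s: 0.0}       # best known cost from s
    parent = {s: None}    # predecessor on the best known path
    settled = set()
    queue = [(0.0, s)]    # priority queue keyed by tentative cost
    while queue:
        d, u = heapq.heappop(queue)
        if u in settled:
            continue      # stale queue entry
        settled.add(u)
        if u == t:        # target reached: trace the path backwards
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return d, path[::-1]
        for v, cost in graph.get(u, ()):
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(queue, (nd, v))
    return float("inf"), []   # queue emptied before reaching t

example = {"a": [("b", 1.0), ("c", 4.0)], "b": [("c", 2.0)], "c": []}
```

On the toy graph `example`, the query from "a" to "c" follows a, b, c, while the reverse query fails because no directed path exists.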
The Dijkstra algorithm may be accelerated into a "guided" A* search [7] if there is additional domain knowledge in the form of a heuristic function h(v, t), one that estimates the minimal cost from v to t. If h satisfies the additional consistency (or monotonicity) condition h(u, t) − h(v, t) ≤ c(u, v) for every edge (u, v) of the graph and every vertex t, then A* can be implemented more efficiently (no node needs to be processed more than once) and A* is equivalent to running Dijkstra's algorithm with the modified (still non-negative) edge weights c′(u, v) = c(u, v) + h(v, t) − h(u, t). In practice, in addition to the OPEN priority queue, a list CLOSED is maintained. Once popped from OPEN, a vertex goes into CLOSED and is never considered again.
Note that the original Dijkstra algorithm is equivalent to A* with the trivial admissible and consistent heuristic h(v, t) ≡ 0.
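The OPEN/CLOSED mechanics can be illustrated with a short sketch. It assumes a consistent heuristic passed in as a function `h(v, t)` and the same hypothetical adjacency-list dictionary as a graph representation; the returned count of CLOSED vertices is added here only to make the effect of the heuristic visible.

```python
import heapq

def a_star(graph, s, t, h):
    """A* search, assuming h(v, t) is consistent.

    graph: dict u -> list of (v, cost) pairs (assumed layout).
    Returns (minimal cost, number of vertices moved to CLOSED).
    """
    g = {s: 0.0}                     # best known cost from s
    open_heap = [(h(s, t), s)]       # OPEN, keyed by g + h
    closed = set()                   # CLOSED
    while open_heap:
        _, u = heapq.heappop(open_heap)
        if u in closed:
            continue                 # stale entry
        closed.add(u)
        if u == t:
            return g[u], len(closed)
        for v, cost in graph.get(u, ()):
            if v in closed:
                continue             # never reconsidered when h is consistent
            ng = g[u] + cost
            if ng < g.get(v, float("inf")):
                g[v] = ng
                heapq.heappush(open_heap, (ng + h(v, t), v))
    return float("inf"), len(closed)

# Toy example: vertices 0..6 on a line, unit costs in both directions.
line = {i: [(j, 1.0) for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
zero = lambda v, t: 0.0          # trivial heuristic: plain Dijkstra
gap = lambda v, t: abs(v - t)    # exact (hence consistent) on this graph
```

On this example both heuristics return the same cost from vertex 3 to vertex 6, but the search with `gap` closes only the four vertices on the path, whereas the zero heuristic closes all seven.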
The following two theorems are useful in characterizing heuristics.

A* Heuristics
Much effort has been invested in designing good heuristics for A*. A complete account would be lengthy, and much of it is domain-dependent, so we discuss here just the most generic methods.

The Optimal Heuristic
In general, it is possible to precompute the optimal heuristic h(s, t) by solving a convex semi-definite program (SDP) for the O(n^2) values of h, forcing the conditions necessary for the heuristic to be admissible and consistent [9]. It relies on the convenient fact that it is sufficient for the heuristic to be "locally" admissible on single edges, namely h(u, v) ≤ c(u, v) for every edge (u, v), in order that it be admissible over arbitrary paths, significantly reducing the number of linear inequality conditions in the semi-definite program to O(m), thus the complexity of the entire algorithm to O(n^3). However, this complexity is still prohibitive and the method is not applicable to graphs containing more than a few thousand vertices. A number of improvements to this are possible, but the method still remains quite complicated.

The Differential Heuristic DH
A very simple, but surprisingly effective, differential heuristic was proposed by Goldberg et al. [6] (who called it ALT) and independently by Chow [2]. It requires some preprocessing of the graph G = (V, E, c). A small number (usually k ≤ 10) of "landmark" vertices (also called anchors/pivots/centers) l_1, ..., l_k are chosen from V(G). In a preprocessing step, for each vertex v ∈ V(G), the vector of minimal costs d(v) = (d(l_1, v), ..., d(l_k, v)) is computed and stored. Then, at the online computation of the minimal-cost path from s to t, the heuristic

$$h(s, t) = \max_{1 \le i \le k} \left| d(l_i, s) - d(l_i, t) \right|$$

is used. This heuristic requires O(k(m + n log n)) preprocessing time and O(kn) space to store. Given s and t, h(s, t) can be computed online in O(k) time.
It is convenient to think of d(v) as an embedding of G in R^k and h(s, t) as the embedding distance between s and t using the L∞ norm:

$$h(s, t) = \left\| d(s) - d(t) \right\|_\infty .$$

It is easy to see that h(s, t) is exact, namely h(s, t) = d(s, t), if s is on one of the minimal-cost paths between some landmark l_i and t. It is also easy to apply Theorems 1 and 2 to show that the differential heuristic is admissible and consistent.
The degrees of freedom in this heuristic are the choice of the landmark vertices. Goldberg et al. [6] show how to optimize these, concluding that a good choice is a set of landmarks which cover the graph well. In the special case of a planar (or close to planar) graph, a good choice is a set of vertices covering the boundary. In the sequel we will call this heuristic DH.
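A minimal sketch of the DH preprocessing and online query follows, for the undirected case. It assumes a connected graph stored as a dictionary of adjacency lists with every edge listed in both directions; the function names are illustrative only.

```python
import heapq

def costs_from(graph, source):
    """Single-source minimal costs (Dijkstra with a binary heap)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                      # stale entry
        for v, cost in graph[u]:
            if d + cost < dist.get(v, float("inf")):
                dist[v] = d + cost
                heapq.heappush(heap, (d + cost, v))
    return dist

def dh_embedding(graph, landmarks):
    """Preprocessing: one minimal-cost field per landmark (graph assumed connected)."""
    fields = [costs_from(graph, l) for l in landmarks]
    return {v: [f[v] for f in fields] for v in graph}

def dh(embedding, s, t):
    """Online query: L-infinity distance between the embeddings of s and t."""
    return max(abs(a - b) for a, b in zip(embedding[s], embedding[t]))

# Toy example: vertices 0..4 on a line with unit costs.
path = {i: [(j, 1.0) for j in (i - 1, i + 1) if 0 <= j <= 4] for i in range(5)}
end_landmark = dh_embedding(path, landmarks=[0])
mid_landmark = dh_embedding(path, landmarks=[2])
```

With the landmark at the end of the line the estimate for the pair (1, 4) is exact, since vertex 1 lies on the minimal-cost path from the landmark to vertex 4. With the landmark in the middle, the estimate for the pair (1, 3) collapses to zero, which illustrates why landmark placement matters.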

The FastMap Heuristic FM
Inspired by the interpretation of the differential heuristic as an embedding distance and by the FastMap algorithm used in machine learning, Cohen et al. [3] devised another embedding based on pairs of landmarks and defined the heuristic using the L1 norm distance between the embeddings.
The algorithm proceeds by finding a pair of farthest vertices (a_1, b_1), those having a large minimal-cost path between them, and computing for every vertex v:

$$f_1(v) = \tfrac{1}{2}\left( d(a_1, v) + d(a_1, b_1) - d(v, b_1) \right).$$

Defining the following function on pairs of vertices:

$$h_1(u, v) = \left| f_1(u) - f_1(v) \right|,$$

the weight c(u, v) of the graph edge (u, v) is then modified by subtracting h_1(u, v) from it and the process repeated k − 1 times on the modified graph to obtain the embedding vector f(v) = (f_1(v), ..., f_k(v)). The final heuristic is the L1 embedding distance:

$$h(s, t) = \left\| f(s) - f(t) \right\|_1 = \sum_{i=1}^{k} \left| f_i(s) - f_i(t) \right| .$$

The authors show that this heuristic is also admissible and consistent. In the sequel we will call this heuristic FM.

The Separator Heuristic SH
Since each landmark employed by the differential heuristic defines a cost field on the graph vertices, where every vertex is assigned the value of the minimal cost of a path between the vertex and the landmark, we first observe that this concept may be easily generalized. Instead of a landmark being a mere single vertex, it may be a set of vertices S ⊂ V(G), and the cost of a vertex v (relative to S) is defined as:

$$d(S, v) = \min_{u \in S} d(u, v).$$

This defines a more complicated distance field per landmark, to which the triangle inequality may be applied to obtain an analogous differential heuristic. Unfortunately, in practice this generalization does not add much power to that heuristic.
Significantly more power can be obtained if the set S is a separator of the graph, namely its removal (along with the edges incident on the removed vertices) results in V being partitioned into three sets V_1, S and V_2 = V − V_1 − S, such that there exist no edges between V_1 and V_2. This means that S separates V_1 from V_2 and the separated graph contains at least two connected components, none of them mixing V_1 and V_2. We may take advantage of the dichotomy on V induced by S by defining a signed cost field on V, positive in V_1 and negative in V_2. Denote this signed cost field by σ_S.
Fig. 1 shows the unsigned cost fields induced on a road network by a single landmark vertex or a set of 5 landmark vertices, compared to the signed cost field induced by a separator.
As with the differential heuristic, we choose k separators S_1, ..., S_k, and define the embedding σ(v) = (σ_{S_1}(v), ..., σ_{S_k}(v)). The resulting heuristic is the L∞ embedding distance:

$$h(s, t) = \left\| \sigma(s) - \sigma(t) \right\|_\infty = \max_{1 \le i \le k} \left| \sigma_{S_i}(s) - \sigma_{S_i}(t) \right| .$$

Using a signed cost boosts the values of this heuristic significantly. It remains to show that it is still admissible and consistent. Since the use of the signed cost changes the rules of the game relative to the differential heuristic, we provide next a separate proof of admissibility and consistency. In the sequel we will call this heuristic SH.

QED
Note that the separator itself may not be connected, and even if it is, it may separate the graph into more than two connected components, as in Fig. 3. This does not change any of the arguments above.

Computing the SH Heuristic
Although computing the heuristic is done in a preprocessing stage, it is still important that it be computable somewhat efficiently. In many applications (e.g. traffic-sensitive navigation) the edge weights are dynamic, namely change over time, so the heuristic must be updated periodically to reflect the new weights. Hence efficiency is important.
At first glance, it seems that computing the SH heuristic is much more complex than computing DH. DH requires a single-source minimal-cost computation over the entire graph for each of the k landmarks, costing O(k(m + n log n)) time. Using the same logic, it would seem that computing SH requires a similar computation for each vertex in the separators, whose size in a planar graph is O(√n) [8], thus costing O(k√n(m + n log n)) time, which is significantly more than the complexity of computing DH. Fortunately, a straightforward "trick" reduces this complexity back down to the same proportions as DH. For each separator S, introduce a new "virtual" vertex v_S to the graph connected to all vertices of S, and assign a zero weight to all these edges. Then computing the heuristic associated with S is easily seen to be reduced to a single-source minimal-cost path computation over the entire new graph for v_S.
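The virtual-vertex trick and the signed cost field can be sketched as below for the undirected case. The graph layout (dictionary of adjacency lists), the `VIRTUAL` sentinel and the explicit `positive_side` argument describing one side of the separator are assumptions made for illustration; the graph is assumed connected.

```python
import heapq

def signed_cost_field(graph, separator, positive_side):
    """Signed minimal cost between every vertex and the separator.

    graph: dict u -> list of (v, cost), each undirected edge listed twice.
    separator: iterable of separator vertices.
    positive_side: set of vertices on the side that gets a positive sign.
    """
    VIRTUAL = ("virtual",)            # hypothetical vertex, assumed not in graph
    augmented = dict(graph)
    augmented[VIRTUAL] = [(x, 0.0) for x in separator]  # zero-weight edges
    dist = {VIRTUAL: 0.0}
    counter = 0                       # tie-breaker so vertices are never compared
    heap = [(0.0, counter, VIRTUAL)]
    while heap:
        d, _, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                  # stale entry
        for v, cost in augmented[u]:
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                counter += 1
                heapq.heappush(heap, (nd, counter, v))
    del dist[VIRTUAL]
    return {v: (d if v in positive_side else -d) for v, d in dist.items()}

def sh(fields, s, t):
    """Online query: L-infinity distance between signed embeddings."""
    return max(abs(f[s] - f[t]) for f in fields)

# Toy example: vertices 0..4 on a line, unit costs, separator {2}.
path = {i: [(j, 1.0) for j in (i - 1, i + 1) if 0 <= j <= 4] for i in range(5)}
field = signed_cost_field(path, separator=[2], positive_side={0, 1})
```

On this toy graph the estimate for the pair (1, 3), which lies on opposite sides of the separator, is exact, whereas an unsigned cost field from vertex 2 would give zero for the same pair.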

Choosing Good Separators
The quality of the SH heuristic very much depends on the choice of separators. It seems that the most informed value of h(s, t) is obtained when s and t are separated by one of the S_i and the separator is compact, in the sense that it contains few vertices and these vertices are "close" to each other, i.e. the "cost diameter" of the separator is small. In this case the separator functions as a "bottleneck", through which the minimal-cost path between s and t must pass, and the three separator vertices mentioned in the proof of Theorem 3 (the vertex at which the minimal-cost path crosses the separator, and the vertices x_s and x_t of the separator having minimal cost to s and t, respectively) are close to each other: if x_s and x_t are connected by a path of small cost, d(x_s, x_t) is probably small.
If the separator is not very compact it is difficult to guarantee that d(x_s, x_t), the cost between the separator vertices closest to s and to t, will always be small. Indeed, it is easy to construct simple examples where d(x_s, x_t) is very large. In analogy to the Euclidean planar case, a good rule of thumb is that if the separator is more or less parallel to the bisector between s and t, d(x_s, x_t) will be small. See Fig. 2 (right). If the graph is a road network that contains highways with small travel times, it is quite effective to use these highways as separators, as they are typically also minimal-cost paths, so all the vertices of the separators are very "close" to each other. Care must be exercised to completely separate the graph along the highway, as typically there are overpasses and underpasses related to the highway, i.e. the graph may not be planar close to the highway.
When the highways do not cover the road network in a systematic manner, it is more practical to take advantage of the planar layout of the network and simply "slice up" the network by straight lines. The simplest approach is to use equally-spaced horizontal and vertical lines. Each such line defines a separator as the vertices on the set of edges intersecting the line, on one side of the line. However, based on the analogy to planar bisectors mentioned above, it is also advantageous that these lines span a variety of angles. It may also be more practical to use a piecewise-linear polyline to manually (i.e. interactively) define the separator, as this allows better adaptation to the features of the network.
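Slicing by a straight line can be sketched as follows. The representation is assumed for illustration: vertex coordinates in a dictionary, edges as vertex pairs, and the line given in implicit form a·x + b·y = c. Vertices lying exactly on the line are treated here as belonging to the non-negative side, which is a choice of this sketch.

```python
def line_separator(coords, edges, a, b, c):
    """Separator induced by the line a*x + b*y = c.

    coords: dict vertex -> (x, y). edges: iterable of (u, v) pairs.
    Returns the endpoints, on the non-negative side of the line,
    of all edges that cross the line.
    """
    def side(v):
        x, y = coords[v]
        return a * x + b * y - c

    separator = set()
    for u, v in edges:
        su, sv = side(u), side(v)
        if su >= 0 > sv:          # edge crosses, u is on the chosen side
            separator.add(u)
        elif sv >= 0 > su:        # edge crosses, v is on the chosen side
            separator.add(v)
    return separator

# Toy example: a 3 x 2 grid, vertex id = 3 * row + column.
grid_coords = {3 * j + i: (float(i), float(j)) for j in range(2) for i in range(3)}
grid_edges = [(0, 1), (1, 2), (3, 4), (4, 5), (0, 3), (1, 4), (2, 5)]
```

Every edge joining the two sides of the line has its non-negative endpoint in the returned set, so removing that set leaves no edge between the negative side and the rest of the non-negative side. For the vertical line x = 0.5 on the toy grid, the separator is the middle column.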
Another way to obtain compact separators is by using the very effective METIS [12] software package for computing compact balanced separators in graphs. See Fig. 4 for an example of a polyline separator and one generated by METIS. While we found that METIS generates very compact separators, it completely ignores the "cost diameter" of the separator, so it is not optimal for our purposes. We also found that it is difficult to control METIS and cause it to generate a variety of separators at different locations and angles.

Directed Graphs
The preceding discussion is valid for undirected graphs, in which the minimal-cost function is symmetric (thus also a metric): d(u, v) = d(v, u). In reality, road networks are directed graphs, traffic flowing with different velocities in opposite directions, with one-way roads as an extreme case of zero flow in one direction, a fact that cannot be ignored in a real-world application. Thus d(s, t), the travel time from s to t, will typically be different from d(t, s).
The DH and SH heuristics described above may be generalized to the directed case by storing two values per coordinate, representing minimal costs in opposite directions. For example, given a landmark vertex l, the directed triangle inequalities relating to d(s, t) are (see Fig. 5):

$$d(l, t) \le d(l, s) + d(s, t), \qquad d(s, l) \le d(s, t) + d(t, l).$$

Note that, as opposed to the undirected case, h(s, t) may (in rare cases) be negative, so it should be capped at zero:

$$h(s, t) = \max\{\, d(l, t) - d(l, s),\; d(s, l) - d(t, l),\; 0 \,\}.$$

As in the undirected case, using SH in the directed case requires using separators. These are defined and generated in the same way as in the undirected case, i.e. the separation property ignores the directionality of the edges. Since the cost is no longer symmetric, it is not as useful to use the concept of "cost field", rather just the concept of (undirected) connected components. Fig. 6 summarizes the computation of our SH heuristic based on a separator S in the directed case:

Preprocessing:
For each separator S, compute:
1. For every graph vertex v, d(S, v), the minimal cost from S to v.
2. For every graph vertex v, d(v, S), the minimal cost from v to S.
3. For every graph vertex v, the label of the (undirected) connected component it belongs to when all edges connecting S to other vertices in the graph (in both directions) are removed.

Online Query:
Given vertices s, t, the SH heuristic based on separator S is:

$$h_S(s, t) = \begin{cases} d(s, S) + d(S, t) & \text{if } s \text{ and } t \text{ have different component labels,} \\ \max\{\, d(S, t) - d(S, s),\; d(s, S) - d(t, S),\; 0 \,\} & \text{otherwise.} \end{cases}$$

Computing d(S, v) for all vertices v is easy, as discussed in Section 4.1, through the use of a virtual vertex connected with edges of weight zero to all vertices of S, and then performing a one-to-all minimal-cost computation from that vertex. At first glance, it would seem that directly computing the opposite d(v, S) is not that straightforward, but this may be solved by reversing the directions of all the graph edges.
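The preprocessing and query for the directed case can be sketched as follows. This is a reconstruction under stated assumptions, not a transcription of Fig. 6: the graph is a dictionary of adjacency lists in which every vertex appears as a key, the graph is strongly connected, and the query uses the sum of the two separator costs when the component labels differ and the capped directed triangle inequalities otherwise. Seeding the priority queue with all separator vertices at cost zero plays the role of the virtual vertex.

```python
import heapq

def multi_source_costs(graph, sources):
    """Minimal cost from the nearest source to every reachable vertex."""
    dist = {x: 0.0 for x in sources}
    heap = [(0.0, i, x) for i, x in enumerate(dist)]   # same effect as a virtual vertex
    counter = len(heap)
    while heap:
        d, _, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, cost in graph[u]:
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, counter, v))
                counter += 1
    return dist

def reverse(graph):
    """Graph with the direction of every edge reversed."""
    rev = {u: [] for u in graph}
    for u, out_edges in graph.items():
        for v, cost in out_edges:
            rev[v].append((u, cost))
    return rev

def component_labels(graph, separator):
    """Undirected components after detaching all separator vertices."""
    sep = set(separator)
    adj = {u: set() for u in graph}
    for u, out_edges in graph.items():
        for v, _ in out_edges:
            if u not in sep and v not in sep:
                adj[u].add(v)
                adj[v].add(u)
    labels = {}
    for start in adj:
        if start in labels:
            continue
        labels[start] = start
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in labels:
                    labels[v] = start
                    stack.append(v)
    return labels

def sh_directed(from_sep, to_sep, label, s, t):
    """Lower bound on the cost from s to t based on one separator."""
    if label[s] != label[t]:
        return to_sep[s] + from_sep[t]      # the path must cross the separator
    return max(from_sep[t] - from_sep[s], to_sep[s] - to_sep[t], 0.0)

# Toy example: line 0..4, cost 1 going right and cost 2 going left.
g = {i: [] for i in range(5)}
for i in range(4):
    g[i].append((i + 1, 1.0))
    g[i + 1].append((i, 2.0))
from_sep = multi_source_costs(g, [2])           # d(S, v)
to_sep = multi_source_costs(reverse(g), [2])    # d(v, S)
label = component_labels(g, [2])
```

On this toy graph the bound is exact in both directions across the separator (4 from vertex 0 to vertex 4, and 8 in the opposite direction), reflecting the asymmetric costs.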

Experimental Results
We have implemented the heuristics mentioned in this paper, namely the differential heuristic (DH), FastMap (FM) and our separator heuristic (SH), and compared how informed they are when approximating the travel time on a number of road networks whose edges are weighted with realistic travel times. We were not able to use the popular benchmark road networks from the 9th DIMACS Implementation Challenge - Shortest Paths dataset [10], because these are undirected graphs, so do not reflect reality. Instead, we extracted directed graphs on the equivalent areas of New York, Colorado and the Bay Area from OpenStreetMap [11]. Table 1 shows the specs of those graphs. We were surprised to discover that these were 10x more detailed than those in the DIMACS Challenge. The edges of the graphs were weighted by the minimal travel time along that edge, which was computed as the Euclidean length of the edge (as computed from the latitude and longitude information per vertex) divided by the maximal speed on that edge, as extracted from OpenStreetMap. In our experiments, we randomly chose 10,000 pairs of vertices from each map by randomly choosing two points (s, t) uniformly distributed within the bounding box of the map, and then "snapping" those two points to the closest map vertex, as long as the snap was not too far. We then compared the true fastest path time d(s, t) with the heuristic h(s, t), when varying the number of "coordinates" used in the heuristics between 4, 6 and 8. We performed this experiment for the directed graph and an undirected version of the same graph, where the weight of an edge was taken as the minimal weight of the edges in each direction. In the directed case, we compared SH only to DH, as it is unclear how to generalize FM to the directed case. The DH landmarks were spread uniformly around the boundary of the network. The FM landmark pairs were computed in the manner described by Cohen et al. [3], as pairs with distant travel times between them. The SH separators were chosen as a mix of METIS separators and polyline separators specified interactively to take advantage of bottlenecks in the networks.
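The edge weighting described above can be sketched as follows. The text does not say how the Euclidean length was derived from latitude and longitude, so the equirectangular approximation, the Earth radius constant and the units (kilometers, km/h, hours) used here are assumptions for illustration only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)

def edge_length_km(lat1, lon1, lat2, lon2):
    """Approximate edge length via an equirectangular projection (assumption)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    x = math.radians(lon2 - lon1) * math.cos((phi1 + phi2) / 2.0)
    y = phi2 - phi1
    return EARTH_RADIUS_KM * math.hypot(x, y)

def travel_time_hours(lat1, lon1, lat2, lon2, max_speed_kmh):
    """Edge weight: length divided by the maximal speed on the edge."""
    return edge_length_km(lat1, lon1, lat2, lon2) / max_speed_kmh

def undirected_weight(weight_uv, weight_vu):
    """Weight of the undirected edge: the smaller of the two directed weights."""
    return min(weight_uv, weight_vu)
```

For short road segments the equirectangular approximation differs negligibly from the great-circle distance, which is why it is a reasonable stand-in here.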
For each pair of vertices (s, t), we measure the relative quality of the heuristic:

$$\text{quality}(s, t) = \frac{h(s, t)}{d(s, t)},$$

which is a value in [0,1] reflecting how informed the heuristic is.
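As a small worked example of this measure, the sketch below computes the quality for a few (heuristic, true cost) pairs and summarizes them as a mean and standard deviation in percent. The sample values are made up, and the use of the sample (rather than population) standard deviation is an assumption.

```python
import statistics

def quality(h_value, true_cost):
    """Relative quality of an admissible heuristic, a value in [0, 1]."""
    return h_value / true_cost

def summarize(pairs):
    """Mean and sample standard deviation of the quality, in percent.

    pairs: list of (h(s, t), d(s, t)) tuples with d(s, t) > 0.
    """
    percents = [100.0 * quality(h, d) for h, d in pairs]
    return statistics.mean(percents), statistics.stdev(percents)

# Made-up sample: three queries with the same true cost.
sample = [(8.0, 10.0), (9.0, 10.0), (10.0, 10.0)]
```

For the made-up sample the qualities are 80%, 90% and 100%, giving a mean of 90% and a standard deviation of 10%.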
Tables 2 and 3 show the mean and standard deviations of the heuristic qualities for the experiments we performed, on the undirected and directed graphs, respectively. Good values should be between 80% and 100%.
The results show that in the undirected case, the SH heuristic is consistently more informed by 3% to 13% than the DH heuristic, which in turn is also 3% to 13% more informed than the FM heuristic. The results are similar in the directed case: SH is 3% to 12% more informed than DH. Although a 3% improvement in the quality of the SH heuristic over the DH heuristic would seem rather small, it can make a surprisingly big difference in the performance of the A* algorithm. The effect of a good heuristic is to reduce the number of road network vertices traversed during the search for the fastest path. Thus the efficiency of a heuristic in conjunction with A* is measured as the number of vertices on the fastest path divided by the total number of vertices traversed by A*:

$$\text{efficiency}(s, t) = \frac{\#\,\text{vertices on the fastest path from } s \text{ to } t}{\#\,\text{vertices traversed by A*}} .$$

The closer this number is to 1, the more efficient the heuristic is. The efficiency of the heuristic is the mean of this quantity over all possible pairs (s, t). The best possible efficiency on a road network is typically 40%-50%, since any variant of A* must traverse at least the fastest path vertices and also their immediate neighbors. When a heuristic is used, the efficiency can drop dramatically to the vicinity of 1%, meaning 100 vertices of the graph are explored for every one vertex along the fastest path. Tables 4 and 5 compare the efficiencies of the different heuristics using the same formats as Tables 2 and 3.
To illustrate better the efficiency of the SH heuristic compared to that of the DH heuristic, Figs. 9 and 10 show the vertices traversed by A* when searching for the fastest path using the different heuristics on the same (s, t) pair, in the undirected and directed cases. Despite the modest improvements in quality between DH and SH, the efficiency is improved by anywhere between a factor of 2.9 and a factor of 30.
It is interesting to understand better the effect of the location of the separator on the efficiency of A* using SH. Fig. 11 shows how A* traverses a road network when searching for the fastest path between two vertices using SH based on a single separator, in three different locations. For simplicity, this network is undirected and its edges are weighted by Euclidean edge lengths. The separators are all parallel to the "bisector" between the two vertices, but at different distances from the target. As long as the vertex under investigation is separated from the target vertex, the heuristic seems to be quite informed. This changes, sometimes quite dramatically, when the separator is crossed, indicating that the true power of the heuristic is in its separation property, as opposed to, e.g., the DH heuristic, which is based on no more than the very basic triangle inequality.
Like DH, SH may be used in conjunction with other types of optimizations of the A* algorithm (e.g. bi-directional search, reach-based and hierarchical methods) to independently boost its performance.

Figure 1: Cost fields on an undirected graph with edges weighted by Euclidean edge lengths: (left) Unsigned cost field induced by a single (magenta) landmark. (middle) Unsigned cost field induced by a set of 5 landmarks. (right) Signed cost field induced by a separator.

Figure 2: (left) Illustration of Case 2 of the proof of Theorem 3. The blue path is the minimal-cost path between vertices s and t, which must cross the separator S_i at some vertex x. Vertices x_s and x_t are those on the separator having minimal cost to s and t, respectively. (right) Analogous scenario for the planar Euclidean distance function. If S_i is approximately parallel to the (dotted) bisector between s and t, d(x_s, x_t) will be small and h(s, t) more informed.

Figure 3: A separator may disconnect the graph into (left) two or (right) more connected components.

Figure 4: Example of separators in the Bay Area. (left) A separator defined by a dotted pink polyline effectively cutting through bridges across the bay and mountain ridges. The separator is the set of pink vertices which are incident on the cut edges to the left of the polyline. The blue and green regions are the largest connected components in the network after the separator is removed. Black regions are the union of the other (usually very small) connected components. (right) A more compact and more balanced separator computed by METIS.

Figure 5: Computation of the DH heuristic based on the landmark l in the directed case. The costs of the minimal-cost paths from l in both directions are stored for all vertices.

Figure 6: SH heuristic h_S(s, t) for a directed weighted graph based on a separator S.

Figure 7: Comparison of heuristics using k = 4 coordinates on undirected weighted road networks. Red points are DH landmarks. Blue points joined by line segments are FM pairs. SH separators are in other colors. Top: New York (NY). Middle: Colorado (COL). Bottom: Bay Area (BAY). Left: Road network. Middle: Heuristic quality histogram. Right: Heuristic efficiency histogram.

Figure 8: Comparison of heuristics using k = 4 coordinates on directed weighted road networks. Red points are DH landmarks. Blue points joined by line segments are FM pairs. SH separators are in other colors. Top: New York (NY). Middle: Colorado (COL). Bottom: Bay Area (BAY). Left: Road network. Middle: Heuristic quality histogram. Right: Heuristic efficiency histogram.

Figure 10: Efficiency of the heuristics with k = 4 coordinates on different directed road networks. Top: NY, Middle: COL, Bottom: BAY. Colored vertices show the vertices traversed during the A* search for the fastest path from the black vertex to the magenta vertex. Green: SH, Red: DH. Magenta dotted lines indicate the SH separators. Cyan indicates the fastest path, usually taking advantage of highways.

Figure 11: The effect of the location of the separator on the efficiency of the SH heuristic in an undirected road network whose edges are weighted by Euclidean edge lengths. Green vertices are those traversed by A* using SH with a single separator, marked in magenta, when computing the fastest path from the black source to the magenta target vertex. The separator is parallel to the bisector between the two vertices, but at different distances from the target. Note the deterioration in the efficiency once the separator is crossed.
The simplest example of a heuristic function for a planar graph with edge-length weights is the planar Euclidean distance h(s, t) = ‖p(s) − p(t)‖_2, where p(v) are the 2D coordinates of the position of v in the plane. A* is guaranteed to find the shortest path if h is admissible, namely is a lower bound on the true minimal cost. It is easy to see that the Euclidean distance mentioned above has this property. Like the Dijkstra algorithm, A* maintains a priority queue of OPEN vertices. If h is not admissible, a path will still be found, but not necessarily the minimal-cost path.

Table 1: Statistics of the graphs used in our experiments, as extracted from OpenStreetMap.

Table 2: Mean and standard deviation of heuristic quality (%) as measured in our experiments, over 10,000 pairs of vertices on an undirected road network.

Table 3: Mean and standard deviation of heuristic quality (%) as measured in our experiments, over 10,000 pairs of vertices on a directed road network.
SH is more efficient than DH by a factor between 1.35 and 2.4 in the undirected case, and between 1.26 and 2.67 in the directed case. Figs. 7 and 8 give more details of the results for the simplest case of k = 4 on undirected and directed road networks. The left column of each figure illustrates the four DH landmarks in red, the four FM pairs in blue and the four SH polyline separators in four other colors. The middle column shows the histogram of the distribution of the qualities of DH, FM and SH values in red, blue and green, respectively. The right column shows the histogram of the efficiencies, color-coded in the same way.

Table 4: Mean and standard deviation of heuristic efficiency (%) for A* as measured in our experiments, over 1,000 pairs of vertices on an undirected road network.

Table 5: Mean and standard deviation of heuristic efficiency (%) for A* as measured in our experiments, over 1,000 pairs of vertices on a directed road network.