# Trip Planning Queries in Road Network Databases

**DOI:** https://doi.org/10.1007/978-3-319-17885-1_1416

## Synonyms

## Definition

Consider a database that stores a spatial road network and a set of points of interest that are located on the edges of the network and belong to one or more categories from a fixed set of categories \(\mathcal{C}\). The Trip Planning Query (TPQ) is defined as follows: the user specifies two points on the edges of the network, a starting point *S* and a destination point *E*, and a subset of categories \(\mathcal{R}\subseteq \mathcal{C}\), and the goal is to find the *best* trip (route) that starts at *S*, passes through at least one point from each category in \(\mathcal{R}\), and ends at *E*. An example of a TPQ is the following: a user plans to travel from Boston to Providence and wants to stop at a supermarket, a bank, and a post office. Given this query, a database that stores the locations of objects from the categories above (as well as other categories) should efficiently compute a feasible trip that minimizes the total traveling distance. Another possibility is to provide a trip that minimizes the total traveling time.

To formalize it further, consider the TPQ problem on *metric graphs*. Given a connected graph \(G(\mathcal{V},\mathcal{E})\) with *n* vertices \(\mathcal{V} =\{ v_{1},\ldots, v_{n}\}\) and *s* edges \(\mathcal{E} =\{ e_{1},\ldots, e_{s}\}\), the cost of traversing a path \(v_{i},\ldots, v_{j}\) is denoted by \(c(v_{i},\ldots, v_{j}) \geq 0\). *G* is a metric graph if it satisfies the following conditions: (1) \(c(v_{i},v_{j}) = 0\) iff \(v_{i} = v_{j}\); (2) \(c(v_{i},v_{j}) = c(v_{j},v_{i})\); and (3) the triangle inequality \(c(v_{i},v_{k}) + c(v_{k},v_{j}) \geq c(v_{i},v_{j})\). Given a set of *m* categories \(\mathcal{C} =\{ C_{1},\ldots, C_{m}\}\) (where \(m \leq n\)) and a mapping function \(\pi : v_{i}\longrightarrow C_{j}\) that maps each vertex \(v_{i} \in \mathcal{V}\) to a category \(C_{j} \in \mathcal{C}\), the TPQ problem can be defined as follows:

*Given a set* \(\mathcal{R}\subseteq \mathcal{C}\) (\(\mathcal{R} =\{ R_{1},R_{2},\ldots, R_{k}\}\)), *a starting vertex S and an ending vertex E, identify the vertex traversal* \(\mathcal{T} =\{ S,v_{t_{1}},\ldots, v_{t_{k}},E\}\) *(also called a trip) from S to E that visits at least one vertex from each category in* \(\mathcal{R}\) *(i.e.,* \(\cup _{i=1}^{k}\pi (v_{t_{i}}) = \mathcal{R}\)*) and has the minimum possible cost* \(c(\mathcal{T})\) *(i.e., for any other feasible trip* \(\mathcal{T}^{\prime}\) *satisfying the condition above,* \(c(\mathcal{T}) \leq c(\mathcal{T}^{\prime})\)*).*
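To make the definition concrete, the following is a minimal brute-force sketch, not a method from the cited literature. It assumes Euclidean distances as the metric cost, and the names `optimal_trip`, `points`, and `members` are hypothetical. Because each vertex maps to a single category and the cost obeys the triangle inequality, it is enough to try one vertex per category in every possible order. The search is exponential and only meant for tiny instances.

```python
from itertools import permutations, product
from math import dist


def optimal_trip(points, members, start, end):
    """Exhaustively search for the cheapest trip (tiny inputs only).

    points  : dict mapping a vertex name to its (x, y) coordinates
    members : dict mapping a category name to the vertex names in it
    Returns (cost, trip) where trip is a list of vertex names.
    """
    best_cost, best_trip = float("inf"), None
    # Pick one vertex per required category ...
    for choice in product(*members.values()):
        # ... and try every visiting order of the picked vertices.
        for order in permutations(choice):
            trip = [start, *order, end]
            cost = sum(dist(points[a], points[b]) for a, b in zip(trip, trip[1:]))
            if cost < best_cost:
                best_cost, best_trip = cost, trip
    return best_cost, best_trip


# Hypothetical instance: all points lie on the x-axis.
points = {"S": (0, 0), "E": (10, 0),
          "b1": (2, 0), "b2": (20, 0),    # category "bank"
          "m1": (5, 0), "m2": (-5, 0)}    # category "market"
members = {"bank": ["b1", "b2"], "market": ["m1", "m2"]}

cost, trip = optimal_trip(points, members, "S", "E")
print(cost, trip)
```

On this instance the cheapest trip stops at `b1` and `m1`, both of which lie on the straight segment from `S` to `E`.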

## Historical Background

TPQ was proposed in Li et al. (2005). It can be considered a generalization of the Traveling Salesman Problem (TSP) (Arora 1998), which is *NP*-hard. The reduction of TSP to TPQ is straightforward: by assuming that every point belongs to its own distinct category, any instance of TSP can be reduced to an instance of TPQ. A simple polynomial-time 2-approximation algorithm for TSP on a metric graph can be obtained using the Minimum Spanning Tree (MST) (Cormen et al. 1997). The best constant approximation ratio for metric TSP is the \(\frac{3}{2}\)-approximation that can be derived by the Christofides algorithm (Christofides 1976). Also, a polynomial-time approximation scheme (PTAS) for Euclidean TSP has been proposed by Arora (1998). For any fixed \(\varepsilon > 0\) and any *n* nodes in \(\mathbb{R}^{2}\), the randomized version of the scheme can achieve a \((1+\varepsilon)\)-approximation in \(O(n\log ^{O(\frac{1}{\varepsilon})}n)\) running time. There are many approximation algorithms for variations of TSP, e.g., TSP with neighborhoods (Dumitrescu and Mitchell 2001); nevertheless, these problems are not closely related to TPQ. A very good reference for a number of approximation algorithms on different versions of TSP is Arora (1998). Finally, there are many practical heuristics for TSP, e.g., genetic and greedy algorithms, that work well for some practical instances of the problem, but no approximation bounds are known for them. The optimal sequenced route selection problem proposed in Sharifzadeh et al. (2007) is the same problem but with the additional constraint that the user specifies the order of the visited categories. Under this constraint, the problem is no longer *NP*-hard and can be solved in polynomial time. Another similar problem appeared in Ma et al. (2006), where the start and end points of the TPQ are the same point. The algorithm presented in Ma et al. (2006) is based on R-trees and focuses on objects located in a Euclidean space.

TPQ is also closely related to the Generalized Minimum Spanning Tree (GMST) problem. The GMST is a generalized version of the MST problem where the vertices in a graph *G* belong to *m* different categories. A tree *T* is a GMST of *G* if *T* contains at least one vertex from each category and *T* has the minimum possible cost (total weight or total length). Even though the MST problem is in *P*, the GMST problem is known to be *NP*-hard. There are a few methods from the operations research and economics communities that propose heuristics for solving this problem (Myung et al. 1995) without providing a detailed analysis of the approximation bounds. The GMST problem is a special instance of an even harder problem, the Group Steiner Tree (GST) problem, for which a polylogarithmic approximation algorithm has been proposed (Garg et al. 1998). Since the GMST problem is a special instance of the GST problem, such bounds apply to GMST as well.

## Scientific Fundamentals

Because TPQ is difficult to solve optimally (it is an NP-hard problem), approximation algorithms for answering the query are examined. For ease of exposition, a solution for the case in which objects can move anywhere in space (not on a specific network) is presented first, and this solution is then extended to the road network case.

In the rest of this section, the total number of vertices is denoted by *n*, the total number of categories by *m*, and the maximum cardinality of any category by *ρ*. For simplicity, it will be assumed that \(\mathcal{R} = \mathcal{C}\), thus *k* = *m*. Generalizations for \(\mathcal{R}\subset \mathcal{C}\) are straightforward. Also, \(\mathcal{T}_{a}^{P}\) denotes an approximation trip for problem *P*, while \(\mathcal{T}_{o}^{P}\) denotes the optimal trip. When *P* is clear from context the superscript is dropped.

### Nearest Neighbor Algorithm

The most intuitive algorithm for solving TPQ is to form a trip by iteratively visiting the nearest neighbor of the last vertex added to the trip from all vertices in the categories that have not been visited yet, starting from *S*. Formally, given a partial trip \(\mathcal{T}_{k}\) with *k* < *m*, \(\mathcal{T}_{k+1}\) is obtained by inserting the vertex \(v_{t_{k+1}}\) which is the nearest neighbor of \(v_{t_{k}}\) from the set of vertices in \(\mathcal{R}\) belonging to categories that have not been covered yet. In the end, the final trip is produced by connecting \(v_{t_{m}}\) to *E*. This algorithm, called \(\mathcal{A}_{NN}\), is shown in Algorithm 1.

### Algorithm 1 \(\mathcal{A}_{NN}(G^{c},\mathcal{R},S,E)\)

1: \(v = S,\ I =\{ 1,\ldots, m\},\ \mathcal{T}_{a} =\{ S\}\)

2: **for** *k* = 1 to *m* **do**

3: \(v\) = the nearest among \(NN(v,R_{i})\) for all \(i \in I\)

4: \(\mathcal{T}_{a} \leftarrow \{ v\}\)

5: \(I \leftarrow I -\{ i\}\)

6: **end for**

7: \(\mathcal{T}_{a} \leftarrow \{ E\}\)
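The steps of \(\mathcal{A}_{NN}\) can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions rather than the authors' implementation: distances are Euclidean, each vertex belongs to exactly one category, and the names `nearest_neighbor_trip`, `points`, and `category_of` are hypothetical.

```python
from math import dist


def nearest_neighbor_trip(points, category_of, required, start, end):
    """Greedy trip: always move to the closest vertex of an uncovered category.

    points      : dict vertex name -> (x, y) coordinates
    category_of : dict vertex name -> category (the mapping pi)
    required    : iterable of categories that must be covered
    """
    trip = [start]
    remaining = set(required)          # categories not covered yet (the set I)
    current = start
    while remaining:
        candidates = [v for v, c in category_of.items() if c in remaining]
        # Nearest neighbor of the last vertex added to the trip.
        current = min(candidates, key=lambda v: dist(points[current], points[v]))
        trip.append(current)
        remaining.discard(category_of[current])
    trip.append(end)
    return trip


# Hypothetical instance in which the greedy choice causes a detour.
points = {"S": (0, 0), "E": (10, 0), "a1": (-1, 0), "a2": (5, 0), "b1": (6, 0)}
category_of = {"a1": "A", "a2": "A", "b1": "B"}

trip = nearest_neighbor_trip(points, category_of, {"A", "B"}, "S", "E")
print(trip)
```

In this instance the greedy step first moves away from `E` to reach `a1`, so the resulting trip has length 12, whereas the trip through `a2` and `b1` has length 10.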

It is possible to bound the approximation ratio of algorithm \(\mathcal{A}_{NN}\); Theorem 1 gives the bound. Note that this bound is obtained from a worst-case analysis. In practice, a much better approximation can be expected for a given problem instance.

### Theorem 1.

\(\mathcal{A}_{NN}\) *gives a* \((2^{m+1} - 1)\)*-approximation (with respect to the optimal solution). In addition, this approximation bound is tight.*

### Minimum Distance Algorithm

Algorithm \(\mathcal{A}_{NN}\) can return a trip such as \(T_{1} =\{ S \rightarrow A_{1} \rightarrow B_{1} \rightarrow C_{1} \rightarrow E\}\) instead of the optimal trip \(T_{2} =\{ S \rightarrow C_{2} \rightarrow A_{2} \rightarrow B_{2} \rightarrow E\}\). In \(\mathcal{A}_{NN}\), the search in every step greedily expands the point that is closest to the last point in the partial trip without considering the end destination, i.e., without considering the direction. A more intuitive approach is to limit the search within a vicinity area defined by *S* and *E*, such as the ellipse shown in Fig. 1b.

Based on this intuition, a greedy algorithm called \(\mathcal{A}_{MD}\) is introduced that achieves a much better approximation bound than \(\mathcal{A}_{NN}\). The algorithm chooses a set of vertices \(\{v_{1},\ldots, v_{m}\}\), one vertex per category in \(\mathcal{R}\), such that the sum of costs \(c(S,v_{i}) + c(v_{i},E)\) for each \(v_{i}\) is the minimum among all vertices belonging to the respective category \(R_{i}\) (i.e., \(v_{i}\) is the vertex from category \(R_{i}\) that yields the minimum traveling distance from *S* to *E* through it). After the set of vertices has been discovered, the algorithm creates a trip from *S* to *E* by traversing these vertices in nearest neighbor order, i.e., by visiting the nearest neighbor of the last vertex added to the trip, starting with *S*. The algorithm is shown in Algorithm 2.

### Algorithm 2 \(\mathcal{A}_{MD}(G^{c},\mathcal{R},S,E)\)

1: \(U = \emptyset\)

2: **for** *i* = 1 to *m* **do**

3: \(U \leftarrow v\) with \(\pi (v) = R_{i}\) such that \(c(S,v) + c(v,E)\) is minimized

4: **end for**

5: \(v = S\), \(\mathcal{T}_{a} \leftarrow \{ S\}\)

6: **while** \(U\neq \emptyset\) **do**

7: \(v = NN(v,U)\)

8: \(\mathcal{T}_{a} \leftarrow \{ v\}\)

9: Remove *v* from *U*

10: **end while**

11: \(\mathcal{T}_{a} \leftarrow \{ E\}\)
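The two phases of \(\mathcal{A}_{MD}\) (select one vertex per category, then order the selected vertices greedily) can be sketched as follows. This is a minimal illustration assuming Euclidean distances and one category per vertex. The function and variable names are hypothetical.

```python
from math import dist


def minimum_distance_trip(points, category_of, required, start, end):
    """Pick, per category, the vertex minimizing c(S, v) + c(v, E),
    then visit the picked vertices in nearest neighbor order from S."""
    def cost(a, b):
        return dist(points[a], points[b])

    # Phase 1: one vertex per required category (the set U).
    chosen = []
    for category in required:
        members = [v for v, c in category_of.items() if c == category]
        chosen.append(min(members, key=lambda v: cost(start, v) + cost(v, end)))

    # Phase 2: nearest neighbor ordering restricted to the chosen vertices.
    trip, current = [start], start
    while chosen:
        current = min(chosen, key=lambda v: cost(current, v))
        trip.append(current)
        chosen.remove(current)
    trip.append(end)
    return trip


# Hypothetical instance: "a1" is closest to S but lies away from E.
points = {"S": (0, 0), "E": (10, 0), "a1": (-1, 0), "a2": (5, 0), "b1": (6, 0)}
category_of = {"a1": "A", "a2": "A", "b1": "B"}

trip = minimum_distance_trip(points, category_of, ["A", "B"], "S", "E")
print(trip)
```

Here `a2` is selected for category `A` because its detour cost is 5 + 5 = 10, compared with 1 + 11 = 12 for `a1`, so the trip never leaves the segment between `S` and `E`.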

### Theorem 2.

*If m is odd then* \(\mathcal{A}_{MD}\) *gives an m-approximate solution. In addition, this approximation bound is tight. If m is even then* \(\mathcal{A}_{MD}\) *gives an (m + 1)-approximate solution. In addition, this approximation bound is tight.*

Similarly, the bound for \(\mathcal{A}_{MD}\) is based on a worst-case analysis. In practice, a much better approximation can be expected for any particular problem instance.

### TP Queries in Road Networks

The points of interest are indexed using \(B^{+}\)-trees. For that purpose, the location of any point \(p \in \mathcal{P}\) is represented as an offset from the road network node with the smallest identifier that is incident on the edge containing *p*. For example, point \(p_{4}\) is 1.1 units away from node \(n_{3}\).
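The offset representation can be illustrated with a small record type. This is only a sketch: the class `EdgePoint`, the helper `distances_to_endpoints`, the second endpoint (node 5), the category, and the edge length of 3.0 are all hypothetical values chosen for illustration. Only the offset of 1.1 from node 3 comes from the example in the text.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class EdgePoint:
    """A point of interest stored relative to the edge that contains it."""
    name: str
    category: str
    edge: Tuple[int, int]   # (u, v) node identifiers with u < v
    offset: float           # distance from node u, the smaller identifier

    def __post_init__(self):
        u, v = self.edge
        if u >= v:
            raise ValueError("edge must be given as (smaller id, larger id)")
        if self.offset < 0:
            raise ValueError("offset must be non-negative")


def distances_to_endpoints(point, edge_length):
    """Distance from the point to both end nodes of its edge."""
    if point.offset > edge_length:
        raise ValueError("offset exceeds the edge length")
    u, v = point.edge
    return {u: point.offset, v: edge_length - point.offset}


# p4 lies 1.1 units from node 3; node 5 and the length 3.0 are assumed.
p4 = EdgePoint(name="p4", category="bank", edge=(3, 5), offset=1.1)
d = distances_to_endpoints(p4, edge_length=3.0)
print(d)
```

Storing the offset from a fixed end of the edge gives every point a unique, sortable key such as `(edge, offset)`, which is the kind of key an ordered index requires.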

### Implementation of \(\mathcal{A}_{NN}\)

Nearest neighbor queries on road networks have been studied in Papadias et al. (2003), where a simple extension of the well-known Dijkstra algorithm (Cormen et al. 1997) for the single-source shortest-path problem on weighted graphs is utilized to locate the nearest point of interest to a given query point. The algorithm of Papadias et al. (2003) can be used to incrementally locate the nearest neighbor of the last stop added to the trip among the points that belong to a category that has not been visited yet. The algorithm starts from point *S*, and when at least one stop from each category has been added to the trip, the shortest path from the last discovered stop to *E* is computed.

### Implementation of \(\mathcal{A}_{MD}\)

Here, the idea is to first locate the *m* points from categories in \(\mathcal{R}\) that minimize the network distance \(c(S,p_{i},E)\) using the underlying graph \(\mathcal{N}\), and then create a trip that traverses all \(p_{i}\) in nearest neighbor order, from *S* to *E*. It is easy to show with a counterexample that simply finding a point *p* that minimizes cost \(c(S,p)\) and then traversing the shortest path from *p* to *E* does not necessarily minimize cost \(c(S,p,E)\). Thus, Dijkstra's algorithm cannot be directly applied to solve this problem. Instead, an algorithm (shown in Algorithm 3) is proposed for identifying such points of interest.

The algorithm locates a point of interest *p* with \(\pi (p) = R_{i}\) (for a given \(R_{i}\)) such that the distance \(c(S,p,E)\) is minimized. The search begins from *S* and incrementally expands all possible paths from *S* to *E* through all points *p*. Whenever such a path has been computed and all remaining partial trips have cost no smaller than the tentative best cost, the search stops. The key idea of the algorithm is to separate partial trips into two categories: one that contains only paths that have not discovered a point of interest yet, and one that contains paths that have. Paths in the first category compete to find the shortest possible route from *S* to any *p*. Paths in the second category compete to find the shortest path from their respective *p* to *E*. The overall best path is the one that minimizes the sum of both costs.

### Algorithm 3 Minimum Distance Query For Road Networks

1: **Input:** Graph \(\mathcal{N}\), Points of interest \(\mathcal{P}\), Points *S*, *E*, Category \(R_{i}\)

2: For each \(n_{i} \in \mathcal{N}\): \(n_{i}.c_{p} = n_{i}.c_{\neg p} = \infty\)

3: Priority Queue \(PQ =\{ S\}\), \(B = \infty\), \(\mathcal{T}_{B} =\emptyset\)

4: **while** *PQ* not empty **do**

5: \(\mathcal{T} = PQ.top\) (remove it from *PQ*)

6: **if** \(\mathcal{T}.c \geq B\) **then** return \(\mathcal{T}_{B}\)

7: **for** each node *n* adjacent to \(\mathcal{T}.last\) **do**

8: \(\mathcal{T}^{\prime} = \mathcal{T}\) \(\vartriangleright\) (create a copy)

9: **if** \(\mathcal{T}^{\prime}\) does not contain a *p* **then**

10: **if** \(\exists p : p \in \mathcal{P}\), \(\pi (p) = R_{i}\) on edge \((\mathcal{T}^{\prime}.last,n)\) **then**

11: \(\mathcal{T}^{\prime}.c \mathrel{+}= c(\mathcal{T}^{\prime}.last,p)\)

12: \(\mathcal{T}^{\prime}\leftarrow p,\ PQ \leftarrow \mathcal{T}^{\prime}\)

13: **else**

14: \(\mathcal{T}^{\prime}.c \mathrel{+}= c(\mathcal{T}^{\prime}.last,n)\), \(\mathcal{T}^{\prime}\leftarrow n\)

15: **if** \(n.c_{\neg p} > \mathcal{T}^{\prime}.c\) **then**

16: \(n.c_{\neg p} = \mathcal{T}^{\prime}.c\), \(PQ \leftarrow \mathcal{T}^{\prime}\)

17: **else**

18: **if** edge \((\mathcal{T}^{\prime}.last,n)\) contains *E* **then**

19: \(\mathcal{T}^{\prime}.c \mathrel{+}= c(\mathcal{T}^{\prime}.last,E)\), \(\mathcal{T}^{\prime}\leftarrow E\)

20: Update *B* and \(\mathcal{T}_{B}\) accordingly

21: **else**

22: \(\mathcal{T}^{\prime}.c \mathrel{+}= c(\mathcal{T}^{\prime}.last,n)\), \(\mathcal{T}^{\prime}\leftarrow n\)

23: **if** \(n.c_{p} > \mathcal{T}^{\prime}.c\) **then**

24: \(n.c_{p} = \mathcal{T}^{\prime}.c\), \(PQ \leftarrow \mathcal{T}^{\prime}\)

25: **end if**

26: **end for**

27: **end while**

The algorithm proceeds greedily by expanding at every step the trip with the smallest current cost. Furthermore, in order to be able to prune trips that are not promising, based on already discovered trips, the algorithm maintains two partial best costs per node \(n \in \mathcal{N}\). Cost \(n.c_{p}\) (\(n.c_{\neg p}\)) represents the partial cost of the best trip that passes through this node and that has (has not) discovered an interesting point yet. After all *k* points (one from each category \(R_{i} \in \mathcal{R}\)) have been discovered by iteratively calling this algorithm, an approximate trip for TPQ can be produced.
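The idea of keeping two best costs per node can be sketched as a Dijkstra search over pairs (node, point already found). This is a simplified variant and not a line-by-line transcription of Algorithm 3. It assumes that points of interest have been inserted into the graph as ordinary nodes by splitting their edges, and that *S* and *E* are nodes. The function name and the edge weights are hypothetical; the weights were chosen only so that the two trips discussed in the text cost 7.0 and 6.6.

```python
import heapq


def min_distance_query(graph, poi_nodes, start, end):
    """Cheapest walk from start to end that visits at least one node in poi_nodes.

    graph     : dict node -> dict neighbor -> edge weight (both directions listed)
    poi_nodes : set of nodes that are points of interest of the wanted category
    Returns (cost, path), or (inf, None) if no such walk exists.
    """
    found0 = start in poi_nodes
    best = {(start, found0): 0.0}      # plays the role of n.c_p and n.c_not_p
    queue = [(0.0, start, found0, [start])]
    while queue:
        cost, node, found, path = heapq.heappop(queue)
        if cost > best.get((node, found), float("inf")):
            continue                   # stale entry, a cheaper one was expanded
        if found and node == end:
            return cost, path          # first complete trip popped is optimal
        for nxt, weight in graph[node].items():
            state = (nxt, found or nxt in poi_nodes)
            new_cost = cost + weight
            if new_cost < best.get(state, float("inf")):
                best[state] = new_cost
                heapq.heappush(queue, (new_cost, nxt, state[1], path + [nxt]))
    return float("inf"), None


# Hypothetical weights: S-p1-n2-E costs 7.0, S-n2-p3-n2-E costs 6.6.
graph = {
    "S": {"p1": 1.0, "n2": 3.0},
    "p1": {"S": 1.0, "n2": 4.0},
    "n2": {"S": 3.0, "p1": 4.0, "p3": 0.8, "E": 2.0},
    "p3": {"n2": 0.8},
    "E": {"n2": 2.0},
}
cost, path = min_distance_query(graph, {"p1", "p3"}, "S", "E")
print(round(cost, 1), path)
```

Although `p1` is the point closest to `S`, the search returns the walk through `p3`, which reaches `n2` twice. Keeping separate costs for the two flags is what allows a node to be revisited after a point of interest has been found.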

For example, an approach that finds the point *p* that first minimizes cost \(c(S,p)\) and then traverses the shortest path from *p* to *E* will return the trip \(S \rightarrow p_{1} \rightarrow n_{2} \rightarrow E\) with a distance of 7.0. However, a better answer is \(S \rightarrow n_{2} \rightarrow p_{3} \rightarrow n_{2} \rightarrow E\), which is found by the algorithm, with a distance of 6.6. Table 1 lists, step by step, the priority queue that contains the search paths along with the updates to the node partial best costs. The pruning steps make the algorithm very efficient, since a lot of unnecessary search is pruned out during the expansions. For example, in step 3 the partial path \(Sn_{2}p_{1}\) is pruned out as \(c(Sn_{2}p_{1}n_{1}) > n_{1}.c_{p}\) and \(c(Sn_{2}p_{1}n_{2}) > n_{2}.c_{p}\).

This algorithm can also be used to answer top-*k* queries: simply maintain a priority queue for \(\mathcal{T}_{B}\) and update *B* so that it corresponds to the cost of the *k*th best complete path. For example, if *k* = 3, then in step 4 \(\mathcal{T}_{B}\) will be updated to \(Sn_{2}p_{3}n_{2}E\) (6.6), \(Sp_{1}n_{2}E\) (7), and in step 5 path \(Sp_{1}n_{1}n_{4}E\) (8) will be added and the search will stop, as by now the top partial path has cost equal to the third best complete path.

Step | Priority queue | Updates
---|---|---
1 | Sp | n
2 | Sp | n
3 | Sn | n; B = 7, \(\mathcal{T}_{B} = Sp_{1}n_{2}E\)
4 | Sp | B = 6.6, \(\mathcal{T}_{B} = Sn_{2}p_{3}n_{2}E\)
5 | Sn | B = 6.6, \(\mathcal{T}_{B} = Sn_{2}p_{3}n_{2}E\)
6 | Algorithm stops and returns \(\mathcal{T}_{B}\) | B = 6.6, \(\mathcal{T}_{B} = Sn_{2}p_{3}n_{2}E\)

## Key Applications

## Future Directions

One future direction is to extend the TPQ with additional user-specific constraints. For example, one can specify a query in which a deadline is set for each category (or for a subset of them). Visiting a location of a specific category is useful only if it is done before the specified deadline (Bansal et al. 2004). For example, someone may want to visit a restaurant between 12 noon and 12:30 PM.

From a theoretical point of view, a good direction is to find a tight lower bound for the main memory version of the problem and solutions that match the approximation ratio indicated by this lower bound. Also, it is interesting to investigate if there is a PTAS for this problem.

## Cross-References

## References

- Arora S (1998) Polynomial-time approximation schemes for Euclidean TSP and other geometric problems. J ACM 45(5):753–782
- Bansal N, Blum A, Chawla S, Meyerson A (2004) Approximation algorithms for deadline-TSP and vehicle routing with time-windows. In: Proceedings of FOCS
- Christofides N (1976) Worst-case analysis of a new heuristic for the travelling salesman problem. Technical report, Computer Science Department, CMU
- Cormen T, Leiserson C, Rivest R, Stein C (1997) Introduction to algorithms. The MIT Press, Cambridge
- Dumitrescu A, Mitchell JSB (2001) Approximation algorithms for TSP with neighborhoods in the plane. In: SODA ’01: Proceedings of the twelfth annual ACM-SIAM symposium on discrete algorithms, pp 38–46
- Garg N, Konjevod G, Ravi R (1998) A polylogarithmic approximation algorithm for the group Steiner tree problem. In: Proceedings of SODA
- Li F, Cheng D, Hadjieleftheriou M, Kollios G, Teng S-H (2005) On trip planning queries in spatial databases. In: SSTD: Proceedings of the international symposium on advances in spatial and temporal databases, pp 273–290
- Ma X, Shekhar S, Xiong H, Zhang P (2006) Exploiting a page-level upper bound for multi-type nearest neighbor queries. In: ACM GIS, pp 179–186
- Myung Y, Lee C, Tcha D (1995) On the generalized minimum spanning tree problem. Networks 26:231–241
- Papadias D, Zhang J, Mamoulis N, Tao Y (2003) Query processing in spatial network databases. In: Proceedings of VLDB
- Sharifzadeh M, Kolahdouzan M, Shahabi C (2007) The optimal sequenced route query. VLDB J 17:765–787
- Shekhar S, Liu D (1997) CCAM: a connectivity/clustered access method for networks and network computations. IEEE TKDE 9(1):102–119
- TSP Home Web Site. http://www.tsp.gatech.edu/
- Yiu M, Mamoulis N (2004) Clustering objects on a spatial network. In: Proceedings of SIGMOD