Introduction

This paper considers the problem of constructing fixed-length routes that start and end at the same location on a map. This task has several practical applications including planning a jogging route in an unfamiliar location, organising a cycling tour, or determining a walking route that allows a user to complete their daily steps target.

In the absence of any restrictions, routes of a prescribed length are easy to create. For example, we may choose to simply travel up and down the same street repeatedly until the required distance is covered. Similarly, we could also perform laps of a small locality. In this work, however, we will consider methods of producing routes that avoid repetition. That is, where possible, users should be discouraged from having to travel along a street or footpath more than once. We will call such routes “round trips”, and give a formal definition in “Problem Definition and Complexity”. Two real-world examples are depicted in Fig. 1.

Fig. 1
figure 1

Example round trips of length \(\approx 10\) km (left) and \(\approx 50\) km (right). The left figure starts and ends at the Arc de Triomphe in central Paris; the right figure starts at the western end of the Gower Peninsula, Wales. Due to the lack of alternative roads in the latter, the proposed round trip involves having to travel along some streets more than once

A common approach for producing round trips in street networks involves using online point-to-point routing services (such as Google Maps) and asking users to manually specify a series of waypoints, which are then linked by a sequence of paths. This forms the basis of the online Strava Routes service, which also allows users to specify preferences for paths involving hills, dirt tracks, and popular exercise trails. Similar schemes are also used by the popular online tools Komoot [19] and BRouter [12]. Typically, these approaches link waypoints using shortest paths, helping the solutions to “feel natural to the user, with no unnecessary detours” [1]. The shortest paths between two waypoints can also be calculated quickly on large maps, often in sub-linear time, using variants of the A* and Dijkstra’s algorithms [4].

A feature of the above approach is that it is the users’ responsibility to choose the appropriate waypoints; consequently, a trial-and-error approach may be required to achieve a round trip of the correct length. In the past five years or so, several tools have been made available that seek to automate this process. In April 2020, for example, Strava introduced a new feature for its paying subscribers that automatically produces round trips close to a desired length from a user-selected starting point [22]. Similar tools are now also available with Garmin Connect [13] and on websites such as RouteLoops [21] and PlotARoute [20]. As commercial enterprises, the algorithms used to produce these round trips are not in the public domain; instead, users send their input via a web request, with solutions then being calculated on the server side.

An open-source heuristic for producing round trips of a prescribed length is provided by the cloud-based application programming interface (API) OpenRouteService [14], using their “round_trip” feature. The basic idea in this algorithm is to generate an n-sided polygon that is roughly regular in shape and whose perimeter is approximately equal to the desired length. The initial bearing of the polygon is selected at random. Each vertex in this polygon is then repositioned over a street or at an intersection, and these are used as intermediate waypoints in the final round trip. When generating paths between these waypoints, efforts are also made to try to avoid using the same street segment more than once. Further details of this approach are given in “Initial Comparison and Setup”.

Another website that contains tools for automatically producing round trips is TrailRouter [15]. In addition to producing fixed-length round trips, this tool also seeks to maximise the amount of green space encountered along a route. The basic version of this algorithm operates similarly to OpenRouteService’s, using \(n=4\) to produce eight separate round trips with initial bearings of \(0^{\circ }, 45^{\circ }, 90^{\circ },\ldots , 315^{\circ }\). A second method is also used in which waypoints are placed on the perimeter of green spaces close to the starting point. In both cases, these waypoints are then linked using shortest paths as before. Weighting coefficients are also specified for different street segments to discourage their repetition and encourage paths that travel through green areas.

Note that all of these existing examples require significant resources to store and process the map database. OpenRouteService and TrailRouter, for example, use their servers to store the entire world map from OpenStreetMap, which contains information on nearly one billion ways [16]. The advantage of doing this is that various server-side actions can be performed that are not usually available on cloud-based APIs. OpenRouteService, for example, uses its map database in conjunction with the open-source Java-based routing engine GraphHopper to “snap” GPS coordinates to the nearest location on a street segment. GraphHopper also contains functionality for adding user-defined weights on street segments, allowing the production of paths that avoid using street segments seen in other parts of the round trip. TrailRouter similarly makes use of GraphHopper and also performs large amounts of server-side preprocessing on the map database to provide “green ratings” for each street segment. Street segments with higher green ratings are then preferred by GraphHopper’s path-finding algorithms.

Related Works

From a research perspective, work in the area of round-trip production is somewhat limited in the literature. Partially related is the 1975 paper of Johnson [24], which reports an algorithm for enumerating all cycles in a graph. In theory, this method could be employed to generate a set of cycles (round trips) on a street network, with the user then selecting a member of this set that contains the start/end point and whose length is close to the desired target length. The algorithm of Yen [35] might also be used for similar purposes by producing solutions one-by-one in increasing order of length until the target length has been reached (see [27], for example). In general, however, the number of cycles in a graph is known to grow at an exponential rate in relation to network size [10], making run times of these algorithms infeasible in most cases. As we will see in “Paper Aims and Contributions”, the exclusive use of cycles in this problem is also usually unsuitable in real-world applications involving street networks. More recently, Chalupa et al. [7] have presented integer programming formulations for the problem of finding the longest cycles in unweighted networks, though similar scaling-up issues to the previous approaches are still apparent.

A heuristic for computing jogging routes on maps was also previously proposed by Gemsa et al. [9]. In their work, street networks are represented as undirected planar embeddings and a solution is built by iteratively adding the boundary edges of different faces until the desired length is reached. A second method similar to the round_trip feature of OpenRouteService is also proposed that produces solutions between a sequence of three or four waypoints, which their heuristic calculates automatically. Solutions are also evaluated according to the amount of repetition, the use of “nice” edges (such as those occurring in green areas), and the number of sharp turns.

In more recent work [26], the authors of this current paper describe two further heuristics for finding fixed-length round trips in undirected street networks. The first operates by constructing a pair of paths between the starting point s and a suitable street intersection t. These s-t-paths are then joined to form a round trip, with the aim being to identify the best option for t. In their heuristic, the authors use the algorithms of [5] to compute pairs of short edge-distinct (or vertex-distinct) s-t-paths. Other approaches might also be used, however, such as those used for finding alternative point-to-point routes in road networks [1, 29]. The second heuristic proposed in [26] is based on local search. In general, this was found to be faster and more accurate than their first heuristic, though its solutions often contained small detours that made them somewhat haphazard in appearance.

Note that the methods in [9, 26] only operate on undirected graphs, making them unsuitable in applications involving one-way streets and unidirectional footpaths. As with the other methods considered in this section, they also depend on having access to the entire street network during execution. In the experiments of [26] this was achieved by using the Python library OSMNX [6] to download and process all street segments within k/2 metres of the given starting point (where k is the desired length of the round trip). For higher values of k, though, this was seen to lead to long download times and large datasets. Using \(k=\) 20,000 m, for example, a location like central London was seen to involve over 90,000 street segments and 70,000 intersections. Similarly, all of the reported experiments of [9] were conducted on a single, locally stored map of Karlsruhe, Germany, which comprised approximately 120,000 street segments and 100,000 intersections.

Paper Aims and Contributions

Our aim in the remainder of this paper is to develop fast and accurate heuristics that produce round trips in directed street networks. We also want these methods to produce solutions from any starting point on the Earth’s street network, but without needing to download and/or store large map databases. They should also be able to cope with a wide range of prescribed distances, from just a few metres to many kilometres, but still operate within timescales that are consistent with online point-to-point routing services (that is, typically a few seconds). To do this, our proposed methods will use very limited numbers of strategically chosen requests to existing cloud-based mapping services. The information collected in these requests will then be used in conjunction with bespoke graph-theoretic operators to strategically build a pool of candidate round trips. Outside of the requests, all computations, such as processes for improving and combining these round trips, are performed on the client.

Our proposed strategy has three main advantages. First, unlike existing methods, it avoids the need of having to host and maintain large databases of map data. Second, it also allows the possibility of using commercial mapping services such as Google Maps which, unlike OpenStreetMaps, can provide live data on traffic conditions and road closures. Finally, adjustments for different modes of travel such as walking, cycling, and driving, can be made by simply changing the parameters of the online requests. The information returned from these requests can then be used to provide the user with information on the elevation, surface type and greenery ratings of the proposed solutions, as demonstrated in “ Conclusions and Further Work”.

Our main contributions are summarised as follows:

  1. 1.

    The paper contains an examination of the computational issues surrounding the production of round trips in street networks, particularly looking at the graph-theoretical features of the problem and its underlying computational complexity. (See “Problem Definition and Complexity”.)

  2. 2.

    A new heuristic, the isochrone-polygon method, is proposed. Using just two or three API requests, this constructs a small pool of solutions that, on average, are superior to those returned by OpenRouteService’s “round_trip” feature. (See “The Isochrone-Polygon Method and “ Initial Comparison and Setup”.)

  3. 3.

    The design and testing of two contrasting methods for improving solutions returned by the isochrone-polygon method. The best of these uses multiobjective optimisation in conjunction with new graph-based neighbourhood operators to produce larger, improved pools of candidate solutions in short amounts of time. (See “Improving Solutions using Multiobjective Local Search”, “Learning Good Routes using Regression” and “Method Comparison”.)

All source code, problem instances, and results datasets used in this work can be downloaded from [28].

Problem Definition and Complexity

Let \(G=(V,A,w)\) be an arc-weighted, directed graph with vertex set V, arc set A, and nonnegative weights w(uv) for each arc \((u,v)\in A\). Street networks can be represented by such graphs using the arcs for individual street segments and the vertices for street intersections and dead ends. Here, weights correspond to travel distances (in metres) between adjacent vertices.

In mapping applications, each vertex \(v\in V\) is associated with a particular GPS coordinate, denoted by \(GPS (v)\). The above model is also usually extended so that each arc \((u,v)\in A\) is defined by a series of GPS longitude/latitude coordinates \((x_1,y_1),(x_2,y_2),\ldots ,(x_l,y_l)\), where \(GPS (u)=(x_1,y_1)\) and \(GPS (v)=(x_l,y_l)\). These coordinates define the curvature of each street segment (uv) and can also be used to calculate w(uv) using:

$$\begin{aligned} w(u,v) = \sum _{i=1}^{l-1} d(x_i,y_i,x_{i+1},y_{i+1}) \end{aligned}$$
(1)

where d(.) gives the geographical distance between two GPS coordinates (which can be reliably approximated using, for example, the haversine formula or Vincenty’s formulae). Often, mapping applications also make use of vertices to represent these intermediate coordinates, meaning that if \((u,v)\in A\) and \((v,u)\in A\) then \(w(u,v)=w(v,u)\). This also allows the vertices of a street map to be partitioned into two classes:

Definition 1

A vertex v is considered intermediate if either of the following applies:

  1. (i)

    v has exactly one incoming arc \((u_1,v)\), exactly one outgoing arc \((v,u_2)\), and \(u_1\ne u_2\); or

  2. (ii)

    v has exactly two incoming arcs, \((u_1,v)\) and \((u_2,v)\), exactly two outgoing arcs \((v,u_1)\) and \((v,u_2)\), and \(u_1\ne u_2\).

If neither of these cases applies, then v is classed as an intersection vertex.

If Case (i) of Definition 1 is satisfied, this means that v represents a coordinate occurring along a one-way street; if Case (ii) is satisfied, then v is a coordinate along a two-way street. In practice, it is possible to reduce the size of such graphs by smoothing the intermediate vertices. This is achieved as follows.

  • For Case (i), the vertex v and arcs \((u_1,v)\) and \((v,u_2)\) are removed and replaced by the single arc \((u_1,u_2)\), with \(w(u_1,u_2)\) set to \(w(u_1,v)+w(v,u_2)\).

  • For Case (ii), the vertex v and arcs \((u_1,v)\), \((u_2,v)\), \((v,u_1)\), and \((v,u_2)\) are removed and replaced by the arcs \((u_1,u_2)\) and \((u_2,u_1)\), with \(w(u_1,u_2)\) set to \(w(u_1,v)+w(v,u_2)\) and \(w(u_2,u_1)\) set to \(w(u_2,v)+w(v,u_1)\).

A demonstration of this smoothing process is shown in Fig. 2. Note that when an intermediate vertex v satisfying Case (ii) is smoothed, this will prevent round trips from being able to feature an arc (uv) followed immediately by the arc (vu). On a street network, this would correspond to a person being asked to switch direction part-way along a two-way street segment, which may not be desirable (or possible) depending on the circumstances at hand.

Fig. 2
figure 2

Demonstration of how graphs (street networks) with intermediate vertices a can be smoothed to make a new, smaller graph (b). These smoothing actions are carried out by the Python library OSMNX, for example, [6]. Note that \(v_1\) has not been smoothed here, despite satisfying Case (i) of Definition 1. This ensures that the structure of the one-way cycle is maintained

Given our arc-weighted directed graph \(G=(V,A,w)\), we now use the following terminology.

Definition 2

A walk is a series of pairwise adjacent vertices in G; a trail is a walk with no repeated arcs; and a path is a trail with no repeated arcs or vertices.

It is also useful to add a prefix u-v to these terms to denote a walk/trail/path that starts at vertex u and stops at v. In cases where \(u=v\), the following terms can also be used.

Definition 3

A u-v-walk/trail/path is considered closed whenever \(u=v\). Closed trails are also known as circuits; closed paths are known as cycles.

As mentioned earlier, our aim in this paper is to produce fixed-length round trips that avoid repetition and that start and end at the same predefined location. One possible problem definition is therefore as follows.

Definition 4

Let \(G=(V,A,w)\) be a directed arc-weighted graph, \(s\in V\) be our start and end vertex, \(k\ge 0\) be our desired length, and let w(uv) denote a nonnegative weight (length) for each arc \((u,v)\in A\). The k-length cycle problem involves identifying a solution \(S=(s=u_1,u_2,\ldots ,u_l=s)\) that is a cycle, and whose length \(L(S)=\sum _{i=1}^{l-1}w(u_i,u_{i+1})\) minimises the cost function \(f_1(S)=|k-L(S)|\).

Solutions under this problem definition naturally prohibit repetition, since cycles do not allow vertices or arcs to be used more than once. In applications with road networks, however, an insistence on using cycles is too strict because the underlying graphs are not necessarily biconnected. Examples of this can be seen in Fig. 2a. First, using \(s=v_1\) we see that only one cycle is possible due to the presence of the articulation point \(v_2\). Similarly, if \(s=v_3\), then only solution \((v_3,v_4,v_3)\) can be formed. A more flexible specification of our problem might therefore be defined as follows.

Definition 5

Given the same input as Definition 4, the k-length circuit problem involves identifying a solution \(S=(s=u_1,u_2,\ldots ,u_l=s)\) that is a circuit, and whose length \(L(S)=\sum _{i=1}^{l-1}w(u_i,u_{i+1})\) minimises the cost function \(f_1(S)=|k-L(S)|\).

Here, the use of circuits instead of cycles permits vertices to be visited more than once, allowing a greater variety of solutions to be considered. Issues remain, however. First, this definition allows “out-and-back” solutions, such as \((v_5,v_6,v_7,v_8,v_7,v_6,v_5)\) in Fig. 2a which, while never using an arc more than once, certainly involves a lot of repetition. Secondly, this definition is still stricter than those used by the existing tools noted in “Introduction”, which often allow street segments to be traversed more than once in the same direction. (In Fig. 2a, for example, this would occur when forming a round trip using the polygon defined by vertices \((v_2,v_8,v_9)\).)

Our chosen problem definition, therefore, considers solutions as closed walks that start and end at vertex s. To avoid repetition while still allowing the necessary flexibility, a second cost function \(f_2\) must now be used. To define this function, let \(S=(s=u_1,u_2,\ldots ,u_l=s)\) be a closed walk, and let E(S) be a multiset containing each occurrence of an arc in S, with its direction removed. For example, the solution \(S=(v_5,v_6,v_7,v_8,v_7,v_6,v_5)\) shown in Fig. 2a leads to the multiset \(E(S)=\{\{v_5,v_6\}:2,\{v_6,v_7\}:2,\{v_7,v_8\}:2\}\), where numbers following colons represent the number of occurrences (multiplicity) of each undirected edge in E(S). The function \(f_2\) now measures the percentage of the overall round trip that involves travelling along street segments (in either direction) that have already been visited:

$$\begin{aligned} f_2(S) = 100 \times \frac{\sum _{\forall (\{u,v\}:x) \in E(S)} (x-1)\cdot w(u,v)}{L(S)}. \end{aligned}$$
(2)

Here, x defines the multiplicity of the edge \(\{u,v\}\) in E(S). Note that measuring the percentage of overlap is preferable to using the total amount of overlap because using the latter would encourage very short round trips. Our final problem definition is therefore:

Definition 6

Given the same input as Definition 4, the k-length round trip (KRT) problem involves identifying a solution \(S=(s=u_1,u_2,\ldots ,u_l=s)\) that is a closed walk of length \(L(S)=\sum _{i=1}^{l-1}w(u_i,u_{i+1})\), and that minimises the two cost functions, \(f_1(S)=|k-L(S)|\) and \(f_2(S)\).

Since \(f_1\) and \(f_2\) measure different characteristics, Definition 6 constitutes a multiobjective problem.

The three problem variants stated in Definitions 46 are all \(\mathcal{N}\mathcal{P}\)-hard. Definition 4 generalises the \(\mathcal{N}\mathcal{P}\)-hard problem of finding the longest cycle in an arc-weighted graph.Footnote 1 In turn, this latter problem generalises the \(\mathcal{N}\mathcal{P}\)-hard problem of finding the longest cycle in an unweighted directed graph which, itself, is a generalisation of the \(\mathcal{N}\mathcal{P}\)-hard Hamiltonian cycle problem in directed graphs. Similar reasoning can be applied with Definitions 5 and 6. Definition 5, which considers circuits, is \(\mathcal{N}\mathcal{P}\)-hard since it generalises the \(\mathcal{N}\mathcal{P}\)-hard problem of identifying a maximum-weight directed Eulerian subgraph in an arc-weighted directed graph [8, 33]. Basagni et al. [3] have also shown that the problem of calculating closed walks of length k is \(\mathcal{N}\mathcal{P}\)-hard in arc-weighted graphs. This latter problem forms a special case of the KRT problem of Definition 6 in which only \(f_1\) is considered, also proving its \(\mathcal{N}\mathcal{P}\)-hardness.

Note that the existing online methods for producing round trips reviewed in “Introduction” are all heuristic-based and, while featuring polynomial-time complexities, are unlikely to produce optimal solutions for most problem instances. This compromise is nearly always necessary with \(\mathcal{N}\mathcal{P}\)-hard problems. These methods also operate under slightly different problem interpretations to ours. In particular, overlap is not measured using a cost function like \(f_2\); instead, attempts are made to avoid overlap by equally spacing out the waypoints on the generated polygon and by using additional weights on arcs to penalise their repeated use in a solution. The effects of these differences are evaluated in “Initial Comparison and Setup”.

The Isochrone-Polygon Method

In this section, we describe our initial algorithm for creating a pool of candidate solutions for the KRT problem. As stated earlier, this will operate by collecting information from a small number of calls to a suitable cloud-based mapping API. All remaining computation is then carried out on the client side.

Similarly to existing approaches, our method operates by constructing polygons on the map surface. The vertices of each polygon then act as a series of waypoints in a round trip. Here, however, adaptations are required because, unlike server-side implementations, most APIs do not provide convenient methods for quickly identifying whether arbitrary GPS coordinates are close to existing streets (and not, for example, in the middle of the sea). In addition, they do not usually provide the functionality for imposing weights on street segments to help avoid their repeated use in solutions.

To cope with these issues, our method begins by generating a (k/2)-metre isochrone I from the user-defined starting location s. Many popular mapping APIs allow the creation of such isochrones in a single request, including Bing Maps, Google Maps, Mapbox, ArcGIS, and OpenRouteService. The returned information is expressed as a list of GPS coordinates that define a polygon containing the parts of the map that are accessible from s by a path of length k/2 metres or less. Note that I can also be used to gauge the accessibility of the region surrounding s. For example, if the area of I is small compared to the area of the (k/2)-radius circle centred at s, this suggests the presence of nearby regions that, while being within k/2 metres (as the crow flies), cannot be accessed along known streets or paths within this distance. An example isochrone is shown in Fig. 3, where s is located on a curving coastline.

Fig. 3
figure 3

Illustration of the isochrone-polygon method. Using a target of \(k=10{,}000\) m, the shaded area in the left image shows a 5000 m isochrone I generated from an address on the coast of southern Wales. The arrow indicates the initial bearing, giving \(\beta =80.3^{\circ }\) and \(k'=2\times 4845 = 9690\) m. The right image shows the first ten polygons generated by Algorithm 1, using \(n=5\)

Having gained the isochrone I, the next step is to generate polygons whose vertices are within the area enclosed by I. These vertices are then used as waypoints that are linked by shortest paths to form a round trip. Our strategy here, which is carried out on the client, is to generate a series of ellipses and, from each one, form a polygon by placing n evenly-spaced vertices along the circumference. The overall process is described by Algorithm 1. As shown, this method requires an initial bearing \(\beta\) and desired perimeter \(k'\) for the generated polygons. These are determined by identifying the most distant point on the boundary of the isochrone (as illustrated in Fig. 3), which is intended to initially encourage the formation of round trips in the most accessible bearing from s. The algorithm then generates the list of polygons L by sweeping through the different bearings surrounding s, and adding polygons to L only when all of their n vertices are contained within I. After each sweep, the generated ellipses are also progressively flattened, stretched, and reduced in size. This is intended to help fit the polygons when the isochrone is irregularly shaped. It also aids the production of a diverse set of polygons. An example of this process is shown in Fig. 3 where, due to the shape of the isochrone, some of the polygons are quite elongated.

Algorithm 1
figure a

Generate polygons

Once the list L has been constructed, further API calls are required to calculate the shortest paths between the successive vertices (waypoints) in each polygon. Let \(L=(P_1,P_2,\ldots ,P_{|L|})\) be our list of polygons, and let \(P_i=(p_{i,1},p_{i,2},\ldots ,p_{i,n})\) denote the vertices of each polygon (where \(p_{i,1}=s\) for all \(i\in \{1,\ldots ,|L|\}\)). A round trip for polygon \(P_i\) is therefore constructed by linking each \((p_{i,j})\)-\((p_{i,j+1})\)-path (for all \(j\in \{1,\ldots ,n-1\})\), followed by a final \((p_{i,n})\)-\((p_{i,1})\)-path. Each of these paths is expressed as a sequence of GPS coordinates that correspond to intermediate and intersection vertices in the underlying street network \(G=(V,A,w)\), as described in “Problem Definition and Complexity”.

It is common for mapping APIs to allow several waypoints to be specified in a single routing request. The free versions of Google Maps, Bing Maps, and MapBox, for example, allow up to \(x=25\) waypoints, while OpenRouteService allows up to \(x=50\). Here, this means that up to \(\lfloor \frac{x-1}{n}\rfloor\) complete round trips can be constructed per request. In cases where waypoints do not correspond to locations on a street, these APIs also adjust the GPS coordinates to the nearest street-side location. Consequently, each polygon \(P_i=(p_{i,1},p_{i,2},\ldots ,p_{i,n})\) maps to a second polygon \(P'_i=(p'_{i,1},p'_{i,2},\ldots ,p'_{i,n})\), which specifies the actual waypoint coordinates used for the round trip. An example is shown in Fig. 4.

Fig. 4
figure 4

The left image shows a round trip generated by one of the polygons from Fig. 3. The polygon with the black outline indicates the positions of the actual waypoints used, \(P'_i=(p'_{i,1},p'_{i,2},\ldots ,p'_{i,n})\), which are slightly different to those of \(P_i\). The three out-and-back sections that occur in this round trip are indicated by the circles. In the right image, these out-and-backs have been removed and, consequently, the waypoints of \(P'_i\) have been further adjusted

The first image of Fig. 4 also indicates a tendency for this scheme to produce “out-and-back” sections in a solution. These are often undesirable because they can involve the user having to perform a U-turn at an awkward location. They also involve the repetition of streets which, according to the KRT problem definition, we would like to avoid. To counter this, an optional step is to shorten each round trip by removing all instances of out-and-backs. This can be achieved by using the arcs of a round trip to define an undirected simple graph. Degree-one vertices in these graphs (not including s) should then be removed one by one until no such vertices remain. The second image of Fig. 4 illustrates the effects of this additional process. In this particular case, the resultant round trip has no repetition.

Initial Comparison and Setup

In this section, we examine the solutions returned by OpenRouteService’s round_trip algorithm and compare their quality to those of our isochrone-polygon method. The overall behaviour of OpenRouteService’s method is described in Algorithm 2.Footnote 2 As described in “Introduction”, this is a server-side operation that starts by generating an n-sided polygon involving vertices \(p_1,\ldots ,p_n\) using a randomly selected initial bearing. A solution S is then formed by creating paths between each successive vertex in this polygon while also trying to avoid using the same street segment more than once.

Algorithm 2
figure b

OpenRouteService round_trip method

Here, our client-side algorithms were programmed in Python 3.8 and were executed on 3.5 GHtz Windows 10 machines with 8 GB RAM [28]. In our implementation of the isochrone-polygon method, we made use of the free and open-source API OpenRouteService. Consequently, each algorithm execution involved making one request to form the isochrone, and one additional request for each set of \(\lfloor \frac{50-1}{n}\rfloor\) round trips produced. For all experiments, we used a random sample of 250 real-world postal addresses from across North America and Western Europe. A single problem instance is therefore defined by an address and a target length k. The set of addresses was generated by selecting random dwellings from the USA, France, UK, Netherlands and Canada, with the number of addresses per country weighted by their relative number of households. This led to 155, 37, 33, 14, and 11 addresses from each country respectively. Addresses for the UK were generated using the tool at [18], those for the USA and Canada came from [23], and the remainder were taken from [17]. The selected travel mode in all experiments was “foot-walking”, though solutions for hikers, drivers and cyclists can be achieved by simply changing the relevant input parameters.

Results

To assess the performance of Algorithm 2, trials on our random sample were performed using values of \(k\in \{\)1000, 5000, 10,000, 25,000, 50,000\(\}\) (metres) and \(n\in \{3,4,\ldots ,8\}\). For each (kn) pair, five calls per address were then made to OpenRouteService’s round_trip feature to produce up to five different round trips. Figure 5 summarises the results of these trials. The top row shows the lengths of the returned solutions, expressed as the percentage difference to the target k; the bottom row shows the corresponding values for the percentage overlap (\(f_2\), Eq. (2)). Each box is formed from 250 values, corresponding to the mean of the five values for each address.

Fig. 5
figure 5

Accuracy of Algorithm 2 for differing values of k and n. Each point is the mean across the solutions returned from five calls to OpenRouteService’s round_trip feature

Figure 5 indicates a tendency for Algorithm 2 to return solutions that exceed the target length k. This is quite natural since the distances between successive vertices in the polygon are calculated as straight lines whereas the physical paths between these are nearly always longer. In this sense, the perimeter of the polygon can be seen as a lower bound on the length of the resultant round trip. We also see that this effect worsens for higher values of n. In these cases, the algorithm must try and produce edge-distinct paths between a greater number of waypoints, which tends to result in lengthened round trips.

The bottom row of Fig. 5 shows how the amount of overlap also tends to increase for larger values of n. This is because, with a greater number of paths being generated, the potential for overlap increases. On the other hand, larger values of k result in less overlap, because the resultant round trips span a larger geographical area. Also observe that, on rare occasions, overlap costs of 50% are returned. In these cases, all streets are visited twice, indicating that the round trip is an out-and-back or (more generally) a tree.

Figure 6 shows the corresponding results for our isochrone-polygon method (without out-and-back removal). As with Algorithm 2, there is also a tendency for solutions to be too long here, with higher values of n worsening this effect. Larger amounts of overlap are also apparent compared to Fig. 5 because, unlike Algorithm 2, this method contains no mechanism for avoiding the repetition of street segments. Overlap is also seen to worsen for smaller values of n, because the acuter angles that occur in the generated polygons tend to encourage more out-and-backs in a solution.

Fig. 6
figure 6

Accuracy of the isochrone-polygon method (without out-and-back removal) for differing values of n and k. Each point is the mean across the first five solutions produced at each address

Fig. 7
figure 7

Accuracy of the isochrone-polygon method (with out-and-back removal) for differing values of n and k. Each point is the mean across the first five solutions produced at each address

Figure 7, however, demonstrates that the removal of out-and-backs from the isochrone-polygon method’s solutions alleviates these issues. Specifically, their removal shortens round trips, helping to counter the overestimation seen in the previous results. Simultaneously, this removal also reduces the amount of overlap, helping to improve \(f_2\). To show these differences more clearly, Fig. 8 shows a direct comparison of the mean values achieved for \(f_1\) and \(f_2\) by each method. For \(f_1\), the isochrone-polygon method with out-and-back removals is clearly favourable, returning the most accurate solutions across all tested values of k and n. For \(f_2\), the isochrone-polygon method (without out-and-back removal) is by far the worst, while no significant difference was observed between the remaining methods.Footnote 3

Fig. 8
figure 8

Mean accuracy of Algorithm 2, the isochrone-polygon method (IP) and the isochrone-polygon method with out-and-back removal (IP w/ R). Note the differences in scales for cost function \(f_1\)

In addition to the greater accuracy of the isochrone-polygon method (with out-and-back removal), also observe that it operates using fewer calls to the API. As described, Algorithm 2 requires one call per solution. On the other hand, following the isochrone request, our method can produce up to \(\lfloor \frac{50-1}{n}\rfloor\) different solutions per call. Consequently, all results from the isochrone-polygon method presented here were found using just two calls per run.

Improving Solutions Using Multiobjective Local Search

In the previous section, we saw that the isochrone-polygon method (with out-and-back removal) can produce several high-quality candidate solutions to the KRT problem using only a small number of API calls. In this section we show how parts of these candidate solutions can be combined by the client to form further solutions, helping to increase algorithm accuracy.

Fig. 9
figure 9

(Left) Twelve round trips starting in central Frankfurt using \(k=5000\) m. (Right) The corresponding smoothed, arc-weighted directed graph \(G=(V,A,w)\)

To illustrate this, Fig. 9 shows twelve different round trips generated by the isochrone-polygon method. Each of these solutions is represented by a sequence of GPS coordinates; that is, intermediate and intersection vertices in the underlying street network. Consequently, we can combine these and use the smoothing actions from “Problem Definition and Complexity” to construct a corresponding arc-weighted, directed graph \(G=(V,A,w)\), as shown in the figure. This means that attempts can now be made by the client to find new high-quality solutions for the KRT problem using G. Note that G does not contain all streets and intersections within k/2 metres of the starting vertex s; instead, it only includes the streets and intersections used in the supplied set of round trips. This leads to smaller graph sizes but, potentially, less accuracy. On the other hand, doing this eliminates the need for storing and/or downloading large street network databases. Here, we make use of both smoothing actions seen in “Problem Definition and Complexity”. In particular, the application of Case (ii) is desirable as it prevents the formation of solutions that will users to perform U-turns midway along a street. This helps to prevent the formation of out-and-backs in a solution.

Here, our approach is to generate new solutions for G using local search, a general-purpose methodology that seeks to identify high-quality solutions within a space of potential solutions. Local search operates by moving among solutions within this space using a neighbourhood operator that makes small alterations to the current solution. Because the KRT problem is multiobjective, here we choose to make use of the Pareto local search framework of Paquete et al. [31] given in Algorithm 3. As with most multiobjective optimisation algorithms, this uses the concept of dominance for distinguishing between solutions. For the KRT problem, which uses the two cost functions \(f_1\) and \(f_2\), this is defined as follows.

Definition 7

Let S and \(S'\) be two candidate solutions to the KRT problem. Also, assume (as is the case here) that we are seeking to minimise the cost functions \(f_1\) and \(f_2\). Then S is said to dominate \(S'\), written \(S\preceq S'\), if and only if:

  • \(f_1(S) \le f_1(S')\) and \(f_2(S) < f_2(S')\), or

  • \(f_1(S) < f_1(S')\) and \(f_2(S) \le f_2(S')\).

If neither \(S\preceq S'\) nor \(S'\preceq S\), then solutions S and \(S'\) are said to be mutually non-dominating.

Algorithm 3
figure c

Pareto local search

As shown, Algorithm 3, starts with an archive set \({\mathcal {A}}\) of mutually non-dominating solutions. Each of the solutions in this archive is then marked as unvisited. In each iteration, an unvisited solution S is selected from the archive set, and new solutions \(S'\in N(S)\) are generated using a neighbourhood operator (see below). These new solutions are then added to \({\mathcal {A}}\) if they are not dominated by a solution already in the archive. In addition, if the new solution is seen to dominate any solutions currently in the archive, these are deleted from \({\mathcal {A}}\). This process continues until all solutions in \({\mathcal {A}}\) have been visited. The output \({\mathcal {A}}\) is a set of non-dominating solutions that are equal to, or an improvement on, the input set. Note that, under just one cost function, Algorithm 3 is equivalent to the steepest descent local search algorithm. Like steepest descent, the total number of iterations performed by the algorithm also depends on the number of improving moves that are carried out, which cannot be predicted.

Neighbourhood Operator

The solution space for the KRT is the set of all walks that start and end at vertex s. Neighbourhood operators for this problem should therefore take a member S of this space and make alterations that result in a new, different solution \(S'\). In this sense, we say that \(S'\) is a member of the set of solutions that are neighbours of S, written \(S'\in N(S)\).

To define the neighbourhood operator, let \(S=(s=u_1, u_2,u_3,\ldots ,u_l=s)\) be a candidate solution, written as a sequence of vertices in G. The arcs of S are therefore defined by the multiset \(\{(u_i, u_{i+1}): i\in \{1, 2,\ldots ,l-1\}\}\). An intuitive neighbourhood operator is to now take two vertices \(u_i,u_j \in S\) (where \(i<j\)), remove the corresponding \(u_i\)-\(u_j\)-walk in S, and then form a new (neighbouring) solution \(S'\) by adding a new, different \(u_i\)-\(u_j\)-walk. This new walk can be generated using path-finding algorithms such as breadth-first search, depth-first-search, or randomised variants of these.

Our proposed neighbourhood operator extends this idea by allowing several new solutions to be evaluated following a single application of a path-finding algorithm. The idea is to select a vertex \(u_i\in S\) and try to produce paths from this to all other vertices \(u_{j>i}\in S\) without using the arcs currently in S. It proceeds as follows. Given G, S, and a vertex \(u_i \in S\), a new graph \(G'\) is first constructed. This is simply a copy of G with all the arcs of S removed. Next, an additional dummy vertex \(u'_i\) is added to \(G'\), and all incoming arcs of \(u_i\) are redirected to \(u'_i\) (that is, all arcs of the form \((v,u_i)\) are replaced by \((v,u'_i)\)). Finally, a path-finding algorithm is used to form a directed tree T in \(G'\) that defines paths from \(u_i\) to all reachable vertices in the set \(\{u_{i+1},u_{i+2},\ldots ,u_l,u_i'\}\). Figure 10 shows example applications of this neighbourhood operator. In particular, Part (d) illustrates the effects of adding a new \(u_i\)-\(u'_i\)-path to the solution. This equates to adding a new sub-cycle at vertex \(u_i\).

Fig. 10
figure 10

Part a shows an example graph G with solution S (in blue) and a selected vertex \(u_i \in S\). Part b shows the graph \(G'\) formed with respect to G, S, and \(u_i\). Part c shows a solution \(S'\), formed by replacing the \(u_i\)-v path in S by a \(u_i\)-v in \(G'\). Part d shows the solution formed by adding to S the cycle defined by a \(u_i\)-\(u_i'\) path in \(G'\)

Note that Line 4 of Algorithm 3 involves evaluating all members of the set N(S). This involves applying our neighbourhood operator for each vertex \(u_i \in S\). A full description of how this is done is given by Algorithm 4.

Algorithm 4
figure d

Evaluate neighbourhood set

Learning Good Routes Using Regression

An alternative to using local search with the KRT problem is to employ a method that makes a series of calls to the mapping API. At each iteration, this method should then use information gained in previous calls to try to identify better waypoints for the construction of solutions. Recall from “The Isochrone-Polygon Method” that the isochrone-polygon method operates by first creating a polygon \(P=(p_1,\ldots ,p_n)\), where each vertex \(p_i\) is at a particular GPS coordinate. P is then used with the API to construct a round trip solution S. In turn, this process gives a second polygon \(P'=(p'_1,\ldots ,p'_n)\) where each \(p'_i\) corresponds to the actual waypoints used for S. At this point, we might now choose to use S and \(P'\) to inform the production of a new polygon for use with the API. Specifically, if the current solution S is seen to be too short (that is, \(L(S) < k\)), then this new polygon should be an enlarged version of \(P'\). Similarly, if \(L(S) > k\), the new polygon should be a shrunken version of \(P'\).

A description of our proposed method is given in Algorithm 5. As shown, the process starts by using an initial polygon \(P_1\) to produce a solution \(S_1\) and its actual waypoints \(P'_1\). At each iteration \(i\ge 2\), the information gained in iterations \(1,\ldots ,i-1\) is then used to build a regression model that is used to calculate the perimeter x for the new polygon \(P_i\). This polygon is then used to produce \(S_i\) and \(P'_i\). Here, our proposed method uses a simple linear (least-squares) regression model for determining the perimeter of new polygons. A demonstration of how x is calculated with this linear model is shown in Fig. 11a. Other regression models may also be used in practice, though we must bear in mind that the number of data points in these models will be rather small due to the limited runtimes being imposed.

Algorithm 5
figure e

Regression-based improvement

Fig. 11
figure 11

Part a demonstrates how the perimeter x for a polygon \(P_{i}\) (where \(i\ge 2)\) is calculated in Algorithm 5. As shown, the previously-observed solutions (along with the origin (0, 0)) are used to generate a line of best fit. This is then used with k to determine the required value x. Part b demonstrates the calculation of the normalised S-metric using an archive set of mutually non-dominating solutions. The value of the S-metric corresponds to the area of the shaded part

As noted previously, our chosen API allows the production of several solutions per call. Here, we therefore use Algorithm 1 to produce up to \(\lfloor \frac{50-1}{n}\rfloor\) initial polygons. A separate regression model is then maintained for each of these initial polygons, with \(\lfloor \frac{50-1}{n}\rfloor\) new solutions being produced at each iteration. As with our local search method, each solution produced is then considered for inclusion in the final archive set \({\mathcal {A}}\) using the dominance criteria given in Definition 7.

Method Comparison

In this section, we compare the performance of the isochrone-polygon method, multiobjective local search, and the regression-based improvement algorithm. In all trials reported here, out-and-back removal was employed, and \(n=4\) vertices were used in the generated polygons, allowing up to twelve separate round trips to be generated in each API call. The remaining experimental details are the same as “Initial Comparison and Setup”.

Two variants of the isochrone-polygon method are considered here. After determining the isochrone, the first of these, IP(12), uses just one further call to produce twelve solutions. These are then used to form the archive set \({\mathcal {A}}\) of mutually non-nominating solutions. Similarly, the second variant, IP(24), uses two calls to produce 24 solutions for the construction of \({\mathcal {A}}\). Our local search algorithm extends this approach by using these solutions to form the arc-weighted graph \(G=(V,A,w)\) as described in “Improving Solutions using Multiobjective Local Search”. This graph, together with \({\mathcal {A}}\), is then used as input for Algorithm 3. These algorithms are labelled LS(12) and LS(24) in the following tables and figures. Note that the local search method makes no further calls to the API during execution.

In our trials, all applications of local search used breadth-first search (BFS) for producing the tree T (as seen in Line 3 of Algorithm 4). This path-finding algorithm identifies paths between vertices that contain the fewest possible number of arcs. In our trials, we also experimented with a randomised version of BFS in which vertices were removed from random positions in the BFS queue (as opposed to just the head of the queue); however, no significant differences in performance were observed between these variants.Footnote 4 Hence, only results from the BFS variant are reported here. Using our arc-weighted graph \(G=(V,A,w)\), the complexity of BFS is \({\mathcal {O}}(|V|+|A|)\); consequently, each application of Algorithm 4 (and iteration of Algorithm 3) has complexity \({\mathcal {O}}(|S|(|V|+|A|))\).

Finally, for the regression-based improvement algorithm (RI), an iteration limit of \(l=10\) was used in all trials. This means that, including the production of the isochrone, eleven calls to the API are required per run. (Additional iterations of this algorithm were not seen to make further improvements to solutions in general.)

Algorithm Accuracy

To compare the quality of the solutions produced by our five algorithm variants, we use the well-known S-metric [2]. This measures the quality of an archive set \({\mathcal {A}}\) by calculating the volume of the space existing between \({\mathcal {A}}\) and a single reference point. A demonstration is provided in Fig. 11b. To be valid, the reference point must be dominated by all solutions in the archive. Here, reference points for each problem instance were calculated by examining the returned archive sets of each algorithm and then taking the maximum observed values for both cost functions. As shown in Fig. 11b, volumes were then normalised to assume a value between zero and one, with larger values reflecting a higher-quality archive set.

Fig. 12
figure 12

S-metric values for the five algorithm variants for differing values of k. Each box plot is formed from 250 data points, one per address

Figure 12 summarises the S-metric values achieved by the five algorithms for various values of k. As might be expected, for all values of k the isochrone-polygon method gives significantly better archive sets when the number of solutions it initially produces is doubled from 12 to 24. The addition of local search also makes large improvements to these results, with the 24-solution variant producing significantly better results than the twelve-solution variant for all values of k. In our trials, the 24-solution variant of local search was also seen to produce significantly better results than the regression-based improvement method for all values of k, although the latter was also seen to be a significant improvement on both isochrone-polygon variants. A summary of these S-metrics is also shown in Table 1, indicating the superiority of the archive sets returned by the 24-solution variant of the local search algorithm.Footnote 5

Table 1 S-metric values, percentage deviations between k and \(f_1\), and final archive set sizes from the five algorithm variants for differing values of k

Note that, unlike the local search algorithm, the regression-based improvement method does not optimise \(f_2\) directly during execution. It is therefore relevant to also compare algorithm accuracy according to cost function \(f_1\) only. To do this, Table 1 also gives information on the minimum observed value for \(f_1\) in the archive sets. (Specifically, for each algorithm this is expressed as a percentage deviation between the \(f_1\) value and k, averaged across the problem instances.). This reveals no significant differences between LS(24) and RI for any of the tested values of k, suggesting that, while these algorithms have similar capabilities for reducing \(f_1\), the local search method is more effective at also reducing \(f_2\).

Archive Sizes and Execution Times

In comparing these algorithms, it is also useful to look at the sizes of the archive sets produced, since larger values for \(|{\mathcal {A}}|\) will give users a wider choice of solutions. Table 1 summarises this information. As shown, the best results are again returned by the local search methods—indeed, across all values of k, both variants of local search were seen to produce significantly larger archive sets than the regression-based improvement method.

Table 2 Run times (in seconds) of the three optimisation algorithms for differing values of k

Table 2 summarises the run times of the three optimisation algorithms with and without latency (here latency refers to the time taken between sending the API request and receiving the response). As shown, in both cases the regression-based improvement method takes the longest to execute due to the larger number of requests made and its associated bookkeeping. The 24-solution variant of local search also takes slightly longer than the twelve-solution variant in both cases because (a) it makes an additional call to the API, and (b) the larger number of solutions leads to graphs \(G=(V,A,w)\) that have more vertices and arcs. That said, across our experiments with local search, 96% of runs were seen to terminate in less than six seconds, including latency.

Table 2 also demonstrates that run times lengthen when longer round trips are required. Some of these increases are due to the API, which requires more server-side computation time for calculating paths between distant waypoints. An extreme result was also witnessed for one address in our sample, located in rural Alberta, Canada. This particular case featured a very irregularly shaped isochrone whose outline tended to closely follow the small number of roads in the locality. As a result, it took Algorithm 1 over 100 s to identify suitable polygons for this case. One way of rectifying this problem would be to simply reduce the number of increments in each for-loop of the procedure.

Conclusions and Further Work

This paper has proposed several fast-acting heuristics for the \(\mathcal{N}\mathcal{P}\)-hard k-length round trip (KRT) problem. Unlike existing approaches, these algorithms involve only small amounts of data transfer and do not need to store large map databases; however, they are still able to produce accurate solutions in short amounts of time (typically a few seconds), even over large geographical areas. “Initial Comparison and Setup ” has demonstrated that our isochrone-polygon method produces solutions of greater accuracy than OpenRouteService’s round_trip algorithm. “Method Comparison” has also shown that solution quality can be further improved using a fast, bespoke local search algorithm (and to a lesser extent, a regression-based improvement method).

During our research, we explored the idea of replacing the local search procedure with a variant of Johnson’s algorithm that enumerates all cycles in the directed graph G that contain s. Given excess time, this approach is guaranteed to find the optimal (Pareto) archive set of cyclic solutions. However, it suffers from two drawbacks. First, the method only considers cycles which, as noted in “Problem Definition and Complexity”, is usually too restrictive for practical applications. Second, the run times of this approach are often excessive because the number of cycles usually rises exponentially with respect to graph size.

Fig. 13
figure 13

Screen shot of the output produced by our implementation [28]

The screenshot in Fig. 13 shows an example output from our downloadable implementation [28]. This output is written in html and viewed in a web browser. In addition to the interactive visualisations of round trips, other information is displayed in this output including elevation profiles, surface types, and greenery ratings. In future work, our multiobjective algorithm could be modified so that these features are also optimised. For example, additional cost functions might be used to help identify round trips that minimise the amount climbed or maximise the amount of paved surfaces. Other factors, such as the amount of lighting [30], the required number of street crossings [11], or the predicted traffic density, might also be considered, providing that this real-world information can be reliably determined.

Related to this, it would also be useful to further consider the aesthetics of a solution. Rossit et al. [34], for example, have noted that visual attractiveness is an important factor in vehicle routing problems, with users often preferring solutions that are judged to be compact and logical. Again, such factors could be incorporated into our optimisation methods using additional cost functions. These might involve comparing a solution’s similarity to a standard geometric shape, or measuring how easy it is for a user to memorise its directions. It would also be useful to constrain round trips so that they do not stray too far from their designated starting point. This could be used during pandemics, for example, when governments urge the public to stay within limited distances of their homes.

As noted in “Initial Comparison and Setup”, the algorithms presented here have been implemented in Python. We imagine that it be straightforward to reimplement these methods in JavaScript so that clients could download and execute these algorithms directly in a web browser. Shorter run times are also to be expected if compiled programming languages such as Java, Go, or C++ were to be used [32].