An Improvement Heuristic Based on Variable Neighborhood Search for Dynamic Orienteering Problems with Changing Node Values and Changing Budgets

We study the Dynamic Orienteering Problem (DOP) with changing node values and changing budgets. It is a complex combinatorial optimization problem with many applications, e.g., in tour planning. To solve the DOP, an improvement heuristic based on Variable Neighborhood Search, called VNS_DOP, is proposed. In addition, three methods for handling solutions that became invalid due to budget changes are presented. Heuristic VNS_DOP is experimentally compared with two improvement heuristics based on state-of-the-art algorithms for the static Orienteering Problem. In addition, the influence of the three invalid solution handling methods on the algorithms' optimization behavior is evaluated experimentally. For the experiments, benchmark instances as well as instances generated from existing road networks are used. As a quality measure for the algorithms, their performance over time is used. The results show that both types of dynamic changes, i.e., changes in the node values and changes in the budget, lead to higher volatility in the results for all compared algorithms. However, the latter type has a more negative effect on the performance.
Out of the compared algorithms, the proposed heuristic VNS_DOP obtains the best results in most cases on a variety of problem instances with dynamic node values and dynamic budgets, showing that it has a high performance over time in dynamic environments and is able to deal with different levels of dynamic changes. For DOPs with changing budgets, the invalid solution handling method that repairs solutions by fixing the violation of the budget constraint as fast as possible performs best for the considered algorithms.


Introduction
The orienteering problem (OP) is a combinatorial optimization problem where a simple cycle in a given undirected graph with edge weights and node values has to be calculated such that the total value of all nodes in the cycle is maximal while the sum of edge weights must not exceed a given threshold (the "budget"). The name of the problem comes from a sport with the same name, where players need to navigate to various checkpoints as quickly as possible by means of a compass and a map [51]. The underlying combinatorial optimization problem has applications in many areas, e.g., in the planning of city trips [52] and fuel delivery routes [14], in agriculture [32], or for planning surveillance activities in military scenarios [53]. In many of the applications where the OP occurs, the nodes correspond to locations in a city or a region and the calculated solution corresponds to a route on a road network. This paper considers a dynamic version of the Orienteering Problem (DOP) where parts of the problem change over time. For example, the nodes to be visited can change their value (measuring, e.g., their attractiveness or urgency) during the optimization process. This case is relevant for practical applications because in each of the examples mentioned above it might be possible that unexpected changes in node values occur, after which a planned route has to be revised without exceeding the time limit (or violating other problem-specific constraints). For example, customers might change their order and thus their payment value, tourist spots might change their attractiveness over the course of a day, or emergencies might occur that change the urgency with which locations have to be visited. Another case considered in this work is that the "budget", i.e., the threshold that limits the sum of edge weights, changes over time.
This can happen in the aforementioned scenarios when, for example, tourists change their preferences regarding the length of the tour or when changes in the operational strategy of a business lead to less time being available for transportation and logistics.
Due to the dynamic nature of the DOP, it is necessary to develop algorithms that are able to quickly adapt existing solutions in a dynamic environment. For this reason, we consider "improvement heuristics" that improve or adapt a given cycle (path), as opposed to heuristics that independently construct a new solution without a given initial solution ("standalone algorithms"). Improvement heuristics can be easily integrated into existing tour planning systems, allowing them to improve precalculated solutions and update them if changes occur.
In particular, an improvement heuristic is proposed that is based on Variable Neighborhood Search. This heuristic, called VNS_DOP, is experimentally compared with two other improvement heuristics that are based on state-of-the-art methods for the static Orienteering Problem. The performance of these three algorithms is compared by using benchmark instances that have also been used, e.g., in [27], as well as instances that are generated from existing road networks.
This paper builds on a preliminary conference version [31] and extends its results as follows. All experiments with dynamic node values are redone, however, on a larger scale, and an empirical analysis of the algorithms' performance regarding the influence of the dynamics in node values is conducted. Furthermore, this work additionally considers changes in the budget as a new type of dynamic. In addition, it investigates the suitability of newly proposed methods for handling solutions that became invalid due to budget changes. For this type of dynamic, a new set of experiments is performed using a new set of DOP instances to investigate the effect of budget changes on the optimization behavior and to compare the performance of the algorithms for the DOP with this type of dynamic. Additional literature on recent research about the OP, its variants and related dynamic optimization problems as well as extended explanations have also been integrated into the manuscript.
The remainder of this paper is structured as follows. A short overview of existing works on Orienteering Problems, their dynamic variants and other related problems is given in Section "Related Work". A formal description of the considered Dynamic Orienteering Problems is presented in Section "Problem Description". The proposed algorithm VNS_DOP is introduced in Section "Variable Neighborhood Search" and methods for handling solutions invalidated by dynamic changes are presented in Section "Handling Methods for Invalidated Solutions". The computational experiments and their results are described in Section "Computational Evaluation". Conclusions are given in Section "Conclusion".

Related Work
One of the earliest works dealing with the "Orienteering Problem" (OP) is [51], where the name for the problem was introduced as the problem is similar to a sport with the same name. Other names for the OP are Selective TSP [29], Maximum Collection Problem [24], or, due to its structural similarities to Vehicle Routing Problems, Vehicle Routing Problem with Profits [5]. There exist several possible formulations of the OP as a linear program (see, e.g., [22, 29]).
The OP is known to be NP-hard [14] and various heuristics for calculating approximate solutions have been developed for this problem. One of them is the Greedy Randomized Adaptive Search Procedure (GRASP) which has multiple variants, such as Memetic GRASP [33] or GRASP with path relinking [8]. The latter is outperformed by a newer GRASP heuristic that removes path segments [25] as well as by an Evolutionary Algorithm [27]. Other heuristics include an Ant Colony Optimization (ACO) for a multi-objective version of the OP [45], an ACO combined with machine learning [49], an Evolution-inspired algorithm with Hill Climbing [40] or an approximation algorithm for the case of an OP with an underlying directed graph and unit values for the nodes [37].
In the literature, there exist numerous studies on dynamic optimization problems as they are usually closer to many real world applications. For example, several swarm intelligence algorithms, such as Ant Colony Optimization (ACO) and Particle Swarm Optimization (PSO), have been developed for dynamic optimization problems. These heuristics that were originally developed for static optimization problems have been adapted to deal with the changing properties of the dynamic problem instances. Due to dynamic changes, a found solution may not be valid anymore or an optimal solution might not be optimal anymore. For ACO, several strategies have been proposed to deal with dynamic problems, such as detecting a change in the problem instance and performing a partial restart of the algorithm [16, 18], maintaining diversity within the solutions that are constructed during execution by regularly exploring the solution space [35], multiple population methods [36], or using memory schemes to utilize the already known good solutions to construct an adjusted solution for the changed problem instance [17]. More information can be found in the survey of Mavrovouniotis et al. [34].
Not only swarm intelligence algorithms, but also other metaheuristic methods such as Variable Neighborhood Search (VNS) are well suited for dynamic optimization. Sarsola et al. [44] propose a VNS to solve the Dynamic Vehicle Routing Problem (DVRP), in which a fleet of vehicles needs to supply several customers at minimal cost and new customers that need to be supplied may be added. The VNS outperforms an ACO, a Genetic Algorithm, and a Tabu Search algorithm if one considers the best result over all runs, and is outperformed by the Genetic Algorithm in most of the test instances when considering the average performance over all runs. Khoudjia et al. [26] compared a VNS with PSO algorithms and found that the VNS computes better solutions on average and outperforms the PSO on larger instances, whereas the PSO is more effective at computing new best solutions.
The OP combines aspects of the well-known Traveling Salesperson Problem (TSP) and the Knapsack Problem (KP). The TSP has been intensively investigated in the literature and various studies deal with dynamic variants of the TSP (DTSP). Dynamic changes in the TSP can be the removal or addition of vertices as well as changes in the distance between vertices or in the traversal times assigned to the edges. Schmitt, Baldo and Parpinelli [46] propose an Ant System with a short-term memory for DTSP. Chowdhury et al. [9] also modify an ACO framework with an Adaptive Large Neighbourhood Search to generate new solutions for the DTSP by destroying and repairing relatively large fractions of the current solution. Strak et al. [48] introduce a particle swarm optimization algorithm for the DTSP, which is competitive with two ACOs.
The Dynamic Knapsack Problem (DKP) deals, e.g., with a changing capacity of the knapsack. Roostapour, Neumann and Neumann [42] introduce single- and multi-objective Evolutionary Algorithms to account for the changing knapsack capacity and show that the multi-objective approach outperforms the single-objective approach. The authors strengthen their results and expand their method in [43]. Assmim et al. [6] also use single- and multi-objective Evolutionary Algorithms to deal with a knapsack with changing capacity as well as stochastic and unknown item weights. Their results state that bi-objective optimization already outperforms single-objective optimization.
As for dynamic versions of the OP, there exist several OP variants which contain probabilistic elements. These problems are known as Stochastic OPs and can be interpreted as a specific type of dynamic OP. In one variant [3] it is possible that nodes randomly become unavailable after a path has been chosen so that some of the nodes have to be skipped. In this problem, the difference between the expected total value of the path and the expected length of the path is to be maximized. To solve the problem a Mixed-Integer Linear model and a metaheuristic have been proposed. In another variant the node values are randomized with normal distributions and the probability is to be maximized that the total value of a path exceeds a given target value. For this problem, an exact algorithm as well as a Genetic Algorithm are presented in [20]. In [56] the sale of books in the context of a university campus is presented as a Stochastic OP. It is modelled as a Markov Decision Process and approximately solved using an Approximate Dynamic Programming algorithm while allowing the route to dynamically change while the path is traversed.
Campbell, Gendreau and Thomas [7] introduce a VNS algorithm for the Orienteering Problem with stochastic travel and service times that achieves results similar to a Dynamic Programming approach but is significantly faster at computing solutions. Papapanagiotu, Montemanni and Gambardella [41] also propose a VNS algorithm for the Stochastic OP and expand their approach by embedding a sampling-based objective function evaluator. Solving the Stochastic OP is also important in robotics, as stochastic travel times can be used to model uncertain terrain. The approach of Thayer [50], for example, calculates for each vertex where an irrigation robot should go next in a vineyard to account for a battery constraint.
There also exist (true) dynamic OP variants in the literature where the problem instance changes over time. One example is the OP with Time Windows (OPTW) where time intervals are given that represent business hours at which certain nodes have to be visited (see, e.g., the surveys in [13,15] or the application to the city of Tehran in [1]). Another example is studied in [12] where the weights of the edges (representing travel times along the edges) dynamically change depending on the departure time from the starting node. A dynamic OP where node values linearly decrease depending on when nodes are visited is studied in [11]. However, in these works the edge weights and node values are dependent on the time at which a given node or an edge is visited, which primarily depends on the distance previously traveled on a given path. To the best of our knowledge, there exist no previous works that consider a (time-)dynamic OP where the instance dynamically changes during the optimization process.

Problem Description
The Orienteering Problem (OP) studied in this work is specified by a tuple (G, d, B, T, v_0, s) consisting of an undirected, connected, simple graph G = (V, E) with node set V, edge set E ⊂ {{u, v} | u, v ∈ V}, a function d : E → ℝ_{≥0} that assigns to each edge e ∈ E a value d(e) > 0 that can be interpreted as the distance to traverse that edge, a positive number B ("budget"), a time interval T ("time horizon"), a node v_0 ∈ V ("depot"), and a value function s that assigns to each node v ∈ V a non-negative value s(v) ("score", "profit"). It is assumed that function d satisfies the triangle inequality so that detours in a path increase its length, as is common in many real-life applications.
In this work, a closed path P is a sequence (v 0 , v 1 , … , v n , v 0 ) of nodes of G where each node occurs at most once except the depot v 0 which occurs exactly twice forming the start and the end of the path. The length of P is defined as the sum of the distances between the nodes v i and v i+1 (i ∈ {0, 1, 2, … , n − 1}) plus the distance between the nodes v n and v 0 where the distance between two nodes u and v is defined as the length of a shortest path between u and v. The length of a path is the sum of the lengths of its edges. Since the triangle inequality holds for d it follows that if {u, v} ∈ E , the length of a shortest path between u and v equals d ({u, v}) . In the following, the length of a closed path P is denoted by Length(P) . The value of a closed path P is the sum of the values s(v i ) for all nodes v i (i ∈ {0, 1, 2, … , n}) . The value of P is denoted by Value(P) . The objective of the OP is to find a closed path P in G that maximizes Value(P) while satisfying Length(P) ≤ B.
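The two path measures defined above can be sketched in a few lines of code. The list-based path representation, the dictionary-based distance table, and all names below are illustrative assumptions for this sketch, not the authors' implementation (which is in C++).

```python
# Sketch of Length(P) and Value(P) from the problem description.
# Assumptions: a closed path is a list of node ids starting and ending at
# the depot; `dist` maps ordered node pairs to shortest-path distances;
# `score` maps each node to its value s(v).

def path_length(path, dist):
    """Length(P): sum of shortest-path distances between consecutive nodes."""
    return sum(dist[(path[i], path[i + 1])] for i in range(len(path) - 1))

def path_value(path, score):
    """Value(P): sum of node values; the depot occurs twice, count it once."""
    return sum(score[v] for v in path[:-1])
```

A solution is then feasible iff `path_length(path, dist) <= B`.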
For the dynamic version of the OP, the Dynamic Orienteering Problem (DOP), parts of the problem can change over time. In this work, we consider two types of dynamic changes that can occur. With the first type, the values of the nodes change over time, i.e., the value function s is a (time-) dynamic function s ∶ V × T → ℝ ≥0 where T is a time interval. With the second type of dynamic, the budget changes over time, meaning that B is a function B ∶ T → ℝ ≥0 . The analysis in this work focuses on dynamics for these two components as they form the main features of the OP. For the dynamic OP, Value(P, t) is the value of P at time t for t ∈ T . If the time value t is clear we omit it and just use Value(P).
Note that DOPs where only the node values change have the property that a solution P that is valid at a time t (i.e., Length(P) ≤ B(t)) cannot become invalid over time. For DOPs with changing budgets, in contrast, it is possible that a solution P is valid at time t, but invalid at a different time t′ (i.e., B(t′) < Length(P) ≤ B(t)). This necessitates the development of handling methods for invalid solutions. Such methods are introduced in Section "Handling Methods for Invalidated Solutions". It should also be noted that "time" in this work refers to the time during the run of an optimization algorithm. This is to be distinguished from the "time" at which a node or an edge is visited (which primarily depends on the distance traveled so far on a given path P, see [11, 12], for example). How the dynamic functions s and B are defined for the experiments is described in Section "Problem Instances".
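As a minimal illustration of this property (the helper below is a hypothetical sketch, not part of the paper), a path of fixed length can switch from valid to invalid purely through a change of the budget function B:

```python
# A path is valid at time t iff Length(P) <= B(t); with a time-varying
# budget the same length can be valid at one time and invalid at another.

def is_valid(length, budget_fn, t):
    """Check the budget constraint for a path of the given length at time t."""
    return length <= budget_fn(t)
```

For example, with a budget that drops from 10 to 6 at time 5, a path of length 8 is valid before the change and invalid afterwards.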
We assume that the changes in these functions are not known beforehand to an optimization algorithm.
Since the static OP is NP-hard [14], it immediately follows that the DOP is also NP-hard. Based on the formal description of the problem, it can be seen that algorithms for the OP need to solve two sub-problems: 1) A subset V′ ⊆ V with a preferably high sum of node values needs to be selected; 2) For the nodes in V′, a closed path P has to be calculated that preferably has a small length Length(P). The latter sub-problem is highly similar to the well-studied Traveling Salesperson Problem (TSP). If the solution calculated for the second sub-problem exceeds the available budget B due to Length(P) > B, the first sub-problem might need to be solved again. Due to the budget constraint, the first sub-problem has similarities to the well-known Knapsack Problem (KP). Hence, solving the OP requires one to take the interplay of these two combinatorial optimization problems into account.

Variable Neighborhood Search
The central idea of Neighborhood Search algorithms is to modify a given solution P old by selecting a new solution P new from a set (the "neighborhood" of P old ) containing solutions similar to P old while maximizing or minimizing an objective function f. Variable Neighborhood Search algorithms are based on the observation that a local maximum or local minimum within one neighborhood might not be one within a different neighborhood [19]. For this reason, Variable Neighborhood Search algorithms use different neighborhood functions, i.e., sets of solutions considered "similar to P old " over the course of the algorithm. Formally, this is written as P new = arg min P∈N k (P old ) f (P) (for the case of a minimization problem) where N k (P old ) is the neighborhood of solution P old and k is an index to denote different neighborhoods.
In this paper, two neighborhoods N_1(P) and N_2(P) are used, which are defined as follows. The set N_1(P) is the set of all paths P′ derived from P by inserting one unvisited node with a positive value into P (regardless of whether Length(P′) exceeds the budget B). The set N_2(P) is defined as the set of all paths that are obtained by removing one node (except the depot node v_0) from P. Search steps with respect to these two neighborhoods thus allow the algorithm to add nodes to and remove nodes from a path.
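The two neighborhoods can be sketched as generators over the list-based path representation used above; the function names and the representation are assumptions of this sketch, and the labels N1/N2 for the insertion and removal neighborhoods follow the reconstruction in the text.

```python
# N1(P): all paths obtained by inserting one unvisited, positive-value
# node at any interior position (the budget is deliberately ignored here).
def insertion_neighborhood(path, all_nodes, score):
    visited = set(path)
    for v in all_nodes:
        if v in visited or score[v] <= 0:
            continue
        for pos in range(1, len(path)):  # never before/after the depot ends
            yield path[:pos] + [v] + path[pos:]

# N2(P): all paths obtained by removing one non-depot node.
def removal_neighborhood(path):
    for pos in range(1, len(path) - 1):  # keep the depot at both ends
        yield path[:pos] + path[pos + 1:]
```

Since every neighbor differs from P by exactly one node, evaluating a neighbor only requires looking at the inserted or removed node and its two adjacent edges.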
Since solving DOP problems requires one to repeatedly solve sub-problems with different objectives, the heuristic proposed in this work is based on a generalization of this principle: Given a path P_old, the algorithm selects a new solution P_new from a neighborhood N_k while maximizing or minimizing an objective function f_j, which also changes during the run time of the algorithm, depending on an index j:

P_new = arg opt_{P ∈ N_k(P_old)} f_j(P, P_old), with opt ∈ {max, min}. (1)
SN Computer Science
The combination of different neighborhoods with varying objective functions - which can also be interpreted in the sense that the neighborhood structure is now determined by the pair (N_k, f_j) - allows for tuning of the algorithm's behavior. More specifically, using different objective functions f_j allows the algorithm to focus on specific aspects of the given optimization problem. For the problems in this work, the following objective functions are chosen:

f_len(P, P_old) = 1 / |Length(P) − Length(P_old)|,
f_val(P, P_old) = |Value(P) − Value(P_old)|,
f_ratio(P, P_old) = |Value(P) − Value(P_old)| / |Length(P) − Length(P_old)|,
f_rand(P, P_old) = r.

In f_rand the number r is drawn randomly with uniform probability from the interval [0, 1]. For this case it can be shown that Eq. (1) is equivalent to choosing P_new randomly from the set N_k(P_old). In the following, we define F = {f_rand, f_len, f_val, f_ratio} as the set of considered objective functions. If Length(P) = Length(P_old), we define f_len(P, P_old) = ∞ and f_ratio(P, P_old) = ∞. Depending on the objective function f_j and the neighborhood considered in Eq. (1), the objective function must be minimized or maximized. The absolute values are used to consider only positive values for the objective function f_j. For example, if one considers the function f_len with the neighborhood N_1, Length(P) > Length(P_old) holds and thus f_len needs to be maximized, since it is preferable for the new solution P to not be much longer than P_old. If one considers f_len and the neighborhood N_2, Length(P) < Length(P_old) holds and thus f_len must be minimized to ensure a large difference between the path length of the new and the old solution. In general, for f_α with α ∈ {len, val, ratio}, the following holds: the function is maximized for search steps within N_1 and minimized for search steps within N_2.

The choice of the objective functions is motivated by the structure of the OP. Objective function f_len focuses on the path length and prefers short paths. This is relevant for the aforementioned sub-problem that corresponds to the TSP. The objective function f_val primarily considers the total value of a path Value(P) without taking its length into account. This can be interpreted as solely focusing on the KP sub-problem. A third objective function f_ratio combines f_len and f_val such that the selection of new solutions takes both sub-problems into account. Function f_rand corresponds to a random selection of paths, which adds a perturbation mechanism to the algorithm and strengthens the exploration. For the neighborhoods considered in this work, two neighboring solutions differ by a single node, so that the calculation of these functions can be performed efficiently by only considering the differing nodes.
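The four selection criteria can be sketched as follows. This is a hedged reading of the descriptions above (the labels f_len, f_val, f_ratio, f_rand and the exact formulas are our reconstruction, since the subscripts were lost in extraction); the functions are written in terms of the value and length differences between the neighbor and the current path.

```python
import random

INF = float("inf")

def f_len(dlen):
    """Length criterion: reciprocal of |ΔLength|; infinite if lengths equal."""
    return INF if dlen == 0 else 1.0 / abs(dlen)

def f_val(dval):
    """Value criterion: |ΔValue|, ignoring the change in path length."""
    return abs(dval)

def f_ratio(dval, dlen):
    """Combined criterion: value gained or lost per unit of length change."""
    return INF if dlen == 0 else abs(dval) / abs(dlen)

def f_rand():
    """Random criterion: optimizing a uniform score picks a random neighbor."""
    return random.random()
```

Maximizing these functions within the insertion neighborhood, for example, prefers nodes that add much value for little extra length.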
The proposed algorithm VNS_DOP is shown in Algorithm 1. VNS_DOP starts with a short, exploration-focused initial phase that contains a randomized procedure which, for a path P and parameter p ∈ [0, 1], iterates over all unvisited nodes with a positive value and with probability p inserts them into P at random positions, regardless of whether the length of the obtained path violates the budget constraint or not. This exploration-focused procedure is used to do a quick, cursory scan of different areas of the solution space to find a promising subset of nodes, from which solutions are then improved using the Chained-Lin-Kernighan heuristic [4] (which is based on the implementation provided by the Concorde TSP Solver [10]). The computation time for the Chained-Lin-Kernighan heuristic is included in the time measurement framework that is described in Section "Measurement of Algorithm Performance". Afterwards and if necessary, nodes are removed such that f_ratio is minimized; the reason for using this objective function is that it provides a balanced view on both sub-problems without using a large amount of time. The initial phase lasts for k_init iterations, where k_init is a parameter.
Afterwards, the main iterations begin, where the algorithm loops through three steps until a termination criterion is satisfied. With solution P obtained from the previous step, it repeatedly performs search steps using neighborhood N_1 while maximizing a randomly chosen function from F until the resulting path exceeds budget B. The obtained path is now invalid. Next, the calculated path P is improved with respect to Length(P) by using the Chained-Lin-Kernighan heuristic.
In the third step, if the improved path P still exceeds the budget B, the algorithm performs search steps with N_2(P) in order to remove nodes from P. In the VNS framework, this corresponds to a change in neighborhood since a local optimum with respect to N_1(P) was reached that cannot be feasibly improved by adding nodes. The search steps minimize a randomly chosen function from F until the resulting path P does not violate the budget constraint. The random selection of functions is done to provide a degree of variation in each iteration of the algorithm, which in combination with f_rand strengthens the exploration and allows the algorithm to flexibly adapt to dynamic changes as it constantly generates and tests a variety of paths based on the current solution.
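The three-step main loop can be sketched compactly by injecting the growing, tour-improvement, and shrinking operations as functions. This is only a structural sketch under assumed names, not the authors' Algorithm 1; in particular, the tour-improvement step stands in for the Chained-Lin-Kernighan heuristic.

```python
# One main iteration of the loop described above: grow the path past the
# budget, improve the tour length, then remove nodes until it is valid again.

def vns_dop_iteration(path, budget, grow_step, improve_tour, shrink_step,
                      length):
    # Step 1: insert nodes (neighborhood N1) until the budget is exceeded.
    while length(path) <= budget:
        path = grow_step(path)
    # Step 2: improve the (now invalid) path with respect to its length,
    # e.g., with a TSP improvement heuristic such as Chained-Lin-Kernighan.
    path = improve_tour(path)
    # Step 3: remove nodes (neighborhood N2) until the path is valid again.
    while length(path) > budget:
        path = shrink_step(path)
    return path
```

With toy operations (append a node, identity improvement, drop the last node) and the number of nodes as "length", one iteration grows the path past the budget and then shrinks it back to a valid size.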

Handling Methods for Invalidated Solutions
In Dynamic OPs where the budget changes, it is possible that a valid solution P becomes invalid over time. Formally, this means that Length(P) ≤ B(t) holds at a time t ∈ T (i.e., P is valid), but Length(P) > B(t � ) for a t ′ > t . Since operational decisions for tour planning can be negatively affected when the current solution becomes invalid, it is necessary to develop handling methods H for invalid solutions that are immediately invoked when the current solution (or the best solution found so far) becomes invalid. For such methods, it is desirable that they lead to a valid solution of high quality in a short time.
To deal with changes in the optimization problem, several methods have already been proposed in the literature. One example is to memorize a set of solutions that can be called back into memory when the problem changes. This is particularly suited for algorithms that work with multiple individuals or populations, for example, the Population-based ACO algorithm [17]. However, it should be noted that depending on the volatility of the problem and its constraints, it is possible that a large number of solutions might have to be memorized and regularly updated, with a non-negligible overhead in run time and memory usage. Another example can be found in [2], where different ways of dealing with invalid solutions are investigated and compared, including repairing solutions (with and without the usage of populations), waiting (in the hope that the solution later becomes valid again) and continuing to work with the invalid solution while incorporating a penalty term.
In this work, three handling methods are used. In the following, let P be a given solution that exceeds the budget at a given time t, i.e., Length(P) > B(t).
Handling by restarting: The first method is to discard P and to revert to the initial solution P_0 if one is available and if Length(P_0) ≤ B(t) holds, otherwise to the empty solution P_∅ = (v_0) that only contains the depot node v_0 and is always valid. The algorithm is then restarted. This method does not actually incorporate the invalid solution P, but corresponds to the idea of restarting the tour planning system from its initial state and constructing a new valid solution "from scratch", which might happen in practical scenarios where no other handling methods are available. In [17], a similar idea of restarting the algorithm is also used as a reference method. In the following, this handling method is denoted as H_res.

Handling by fixing the constraint violation: The second handling method is based on the idea of creating a valid solution P′ from P "as fast as possible" by modifying the solution P such that the violation of the constraint is reduced by the largest amount possible. In the context of the DOP, this corresponds to repeatedly removing nodes from P that reduce Length(P), and thus the degree to which the length of P exceeds the budget, as much as possible. This is done until the resulting solution P′ satisfies Length(P′) ≤ B(t). In the following, this handling method is denoted as H_len.

Handling while retaining solution quality: The third handling method aims to modify the solution P in a "parsimonious" way, i.e., in a way that retains the maximum amount of solution quality in P. For the DOP, this means that nodes v from P with the lowest value s(v, t) are repeatedly selected and removed until the resulting solution P′ is valid. This handling method is denoted as H_val.
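The three handling methods can be sketched as follows, reusing the list-based path representation from the earlier sketches. The labels H_res, H_len, H_val and all function names and signatures are assumptions of this sketch (the subscripts in the original notation were lost in extraction), not the authors' API.

```python
def handle_restart(initial_path, depot, budget, length):
    """H_res: revert to the initial solution if still valid, else to (v0)."""
    if initial_path is not None and length(initial_path) <= budget:
        return list(initial_path)
    return [depot, depot]

def handle_fix_violation(path, budget, length):
    """H_len: repeatedly drop the node whose removal shortens the path most."""
    while length(path) > budget:
        candidates = [path[:i] + path[i + 1:] for i in range(1, len(path) - 1)]
        path = min(candidates, key=length)
    return path

def handle_retain_value(path, budget, score, length):
    """H_val: repeatedly drop the currently least valuable non-depot node."""
    while length(path) > budget:
        pos = min(range(1, len(path) - 1), key=lambda i: score[path[i]])
        path = path[:pos] + path[pos + 1:]
    return path
```

Both repair variants terminate because, by the triangle inequality, removing a node never increases the path length, and the depot-only path is always valid.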
The latter two methods are similar to the principle of repairing solutions from [2] and, as the notation suggests, are selected for this work since they focus on different aspects of the DOP, namely the TSP (with H_len) and the KP sub-problem (with H_val). With regard to the algorithm VNS_DOP proposed in Section "Variable Neighborhood Search", it can also be said that H_len and H_val correspond to using the functions f_len and f_val in the while-loop (lines 17-18) of Algorithm 1, respectively. Due to the general formulation of the latter two methods, it is also possible to combine them with other methods (e.g., by invoking them on multiple solutions in population-based methods) or to apply them to other optimization problems with similar dynamic constraints.
Combining handling methods with one another is possible as well, e.g., by considering a compromise between reducing the violation of the constraint and retaining solution quality. A possible implementation of this would be a ratio between these two terms, similar to f_ratio in the algorithm VNS_DOP, but other methods that consider these two conflicting aspects at the same time are also conceivable. However, investigating such methods in detail would exceed the scope of this work. Another possibility is to dynamically change the handling method used by an algorithm during runtime, depending on various factors, such as the volatility of the environment or the amount by which constraints are violated. However, this work only focuses on the three handling methods mentioned above. In summary, H_res corresponds to resetting the algorithm, which can happen in practice when no other handling is available, whereas H_len and H_val correspond to the handling of invalid solutions based on two sub-problems of the dynamic OP.

Computational Evaluation
In this section, we describe the experiments that have been done to evaluate the proposed algorithm VNS_DOP for Dynamic Orienteering Problems and their results.

Measurement of Algorithm Performance
In the dynamic OP it is possible that the quality of a given path P as well as the optimal path P * change over the course of time. For this reason, it is not sufficient to consider an algorithm's performance only at a certain point in time. Rather, the algorithm's optimization behavior over a given time interval needs to be taken into account. For this reason, a measurement framework with extensive logging functionalities has been implemented (with the source code being available at [30]). To measure how well an algorithm performs over time, this work uses progress curves as described in [54], which plot over time the quality Value(P_best) of the best solution P_best found so far. Similar to [54], the progress curves are recorded with respect to different time measures in order to evaluate different aspects of an algorithm. Due to the general formulation of progress curves, they can be easily used for other optimization problems if appropriate problem-specific time measures are chosen. In this work, the following two time measures are used:
1. Function evaluations (FE) count how often the total length Length(P) or the total value Value(P) of a path P is calculated. Evaluations with respect to this time measure give insight into how an algorithm deals with the TSP sub-problem since for a given subset of nodes V′ ⊆ V there exist multiple paths with different lengths.
2. Subsets (SS) count how many different subsets V′ ⊆ V of nodes have been used so far for the calculation of paths. This time measure focuses on how an algorithm selects suitable subsets of V and thus how it deals with the sub-problem that is similar to the KP.
Note that these time measures do not depend on the hardware that is used to run an algorithm. Since the criterion Value(P) is to be maximized for the DOP, it is preferable for the resulting progress curve to quickly reach high values, as opposed to progress curves for minimization problems such as the TSP where low values in the target criterion are desirable. This allows one to use the sum UB t = ∑ v∈V s(v, t) of all node values at a time t ∈ T as a trivial upper bound for normalizing the solution quality such that for every time point t it holds that Value(P)∕UB t ∈ [0, 1] , similar to the "optimization accuracy" measure used for dynamic optimization problems [38]. It should be noted that besides UB t , the value Value(P best ) of the best solution P best found so far can also change over time due to the dynamic changes in the problem (later described in "Problem Instances"). Example progress curves are shown in Fig. 1 left for the algorithms described in the next subsection.
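As an illustration, normalizing a progress curve by the trivial upper bound UB t can be sketched as follows. This is a minimal sketch, not the measurement framework from [30]; the data layout (one list of best values and one list of node-value dictionaries, aligned per time step) is our assumption.

```python
def normalized_progress_curve(best_values, node_values_over_time):
    """best_values[t]           : Value(P_best) at time step t
       node_values_over_time[t] : dict mapping node -> s(v, t)
    Returns the progress curve Value(P_best) / UB_t, with entries in [0, 1]."""
    curve = []
    for t, best in enumerate(best_values):
        ub = sum(node_values_over_time[t].values())  # UB_t = sum of all node values
        curve.append(best / ub if ub > 0 else 0.0)
    return curve
```

Note that UB t itself may change over time for DOPs with dynamic node values, which is why it is recomputed per time step.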

Choice of Algorithms for Comparison
Based on the literature review presented in "Related Work", we selected the Evolutionary Algorithm from [27] and the Greedy Randomized Adaptive Search Procedure with Segment Remove from [25] as reference algorithms to evaluate the proposed method VNS DOP . Both of these heuristics are fairly recent algorithms that obtained favorable results when experimentally compared to other modern heuristics for the OP in their respective studies. In the following, they are abbreviated as EA and GSR, respectively.
The authors of EA uploaded their source code to GitHub, but for the experiments in this work the algorithm has been reimplemented based on the source code and the description in [27] to fit with and utilize our measurement framework for the DOP mentioned in "Measurement of Algorithm Performance". Algorithm GSR [25] was also reimplemented as its source code is not available.
Both algorithms were slightly adapted to be used as improvement heuristics that work with a given solution P 0 . In particular, algorithm EA initializes half of its population with mutated variants of P 0 that have been obtained with its mutation operator, after which the algorithm works identically to its original formulation. As for GSR, the algorithm skips the construction phase for an initial solution if P 0 does not just contain the depot v 0 and immediately proceeds to the local search phase using P 0 (otherwise the algorithm runs in its original form). If the local search phase ends before the termination criterion is satisfied, the path obtained so far is modified using the Segment Remove operator proposed in [25] and the algorithm GSR is repeated. The source code (in C++) for the algorithms EA and GSR as well as the proposed algorithm VNS DOP is available at [30].

Problem Instances
The DOP instances used in this work are based on two sets of (static) OP instances. The first set is based on OPLib [28], a benchmark for Orienteering Problems. From its gen4 subset, which contains the most difficult instances [27,28], we chose the instances brazil58, brazil48, gr48 and gr120. These four instances were chosen because they specify the distance function d using a distance matrix, whereas most of the other instances specify distances by listing point coordinates and the Euclidean distances between them. However, for applications on road networks it is not uncommon that the travel distance between two nodes differs from their Euclidean distance, which is why we did not consider these instances.
The second set of instances (in the following referred to as city) is intended to contain properties of road networks in cities and was generated as follows: Map data from OpenStreetMap [39] was used from which we downloaded extracts for two German cities, Leipzig and Berlin (as examples for a smaller and a larger city) using the download server from [47]. We then applied a parser [23], extracted the roads and processed the resulting files to obtain the road network as a graph. On these two graphs, we randomly selected 100 nodes and assigned to them a random initial value s(v, 0) ∈ {1, 2, … , 10} excluding one random node which was set as a depot node v 0 with value 0 (default value). This was repeated three times for each city. In addition, distance matrices for these nodes were calculated so that the instance data is available in the same form as in the OPLib instances.
Regarding the budget B for the city instances, the value B = 80 000 was chosen based on the following reasoning: A survey [21] measured that the average speed in the two aforementioned cities is 11 mph ( ≈ 17.70 km/h). Since road transport drivers in the European Union are not allowed to drive for more than 4.5 h without taking a break [55], it is potentially possible to drive 49.5 miles during that time, which, after rounding, corresponds to approximately 80 000 m since the generated graphs measure distances in meters. Using the Concorde TSP Solver [10], it was verified that no city instance allows all nodes to be visited within this budget. This set contains 6 Orienteering Problem instances; an example is shown in Fig. 1 right.
Before the dynamic OP instances are described, it is necessary to introduce some notation. In order to investigate the effect of dynamic changes, we introduce two parameters L ∈ {5, 10, 25, 50} and C ∈ {5, 10, 25, 50} which describe the frequency and intensity of changes in a dynamic OP instance, respectively. For given values L, C and for a given time measure μ ∈ {FE, SS} , the time horizon T is set as the discrete time interval T = {1, 2, … , 1 000 000} (measured in units of μ) and all algorithms are set to terminate when T expires. Let t 0 = 0 denote the start time for all time measures and define T ′ = {t 0 , t 1 , t 2 , … , t L } to be a set of time points where t 1 , t 2 , … , t L are L time points evenly distributed over T with t i = ⌊i ⋅ |T|∕(L + 1)⌋ for i ∈ {1, 2, … , L} . Based on the 10 static OP instances described above and using the newly introduced notation, two sets of dynamic OP instances were generated. Table 1 gives an overview of these instances, which are described in the following.
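The evenly distributed change points t i can be sketched as follows (the function name is ours):

```python
def change_time_points(horizon_len, L):
    """Change times t_1, ..., t_L evenly distributed over T = {1, ..., horizon_len},
    following t_i = floor(i * |T| / (L + 1)), with the start time t_0 = 0 prepended."""
    return [0] + [(i * horizon_len) // (L + 1) for i in range(1, L + 1)]
```

For example, with |T| = 1 000 000 and L = 4, the changes occur every 200 000 time units.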
Instances with dynamic node values: Given a static OP instance, for each node v ∈ V let s(v, 0) be the initial value that was given beforehand for the OPLib instances or assigned in the procedure above for generating the city instances. Let V 1 = {v ∈ V | s(v, 0) > 0 ∨ v = v 0 } be the subset of nodes that are the depot or have a positive initial value (also called"points of interest" [13]). In the following, these nodes are set to be nodes for which the value function s can potentially change its value.
The time-dynamic function s is defined as follows. Using the parameter C mentioned above, let the number of nodes affected per change be c = ⌈(C∕2)% ⋅ |V 1 |⌉ . For each change time t i with i ∈ {1, 2, … , L} , two sets Ṽ + = {v + 1 , … , v + c } and Ṽ − = {v − 1 , … , v − c } are generated that each contain c distinct non-depot nodes randomly selected from V 1 such that Ṽ + ∩ Ṽ − = ∅ . Next, a variable j is used to iterate over these sets. For j = 1, 2, … , c , draw a random integer q from [1, s(v − j , 0)∕2] (capped at the current value so that no node value becomes negative) and change the values of v − j and v + j as follows: s(v − j , t i ) = s(v − j , t i−1 ) − q and s(v + j , t i ) = s(v + j , t i−1 ) + q . For all other nodes v ∈ V that are not contained in Ṽ + or Ṽ − , their value s(v, t i ) is not changed.
In other words, at each t i , C% of the nodes in V 1 (which are randomly selected) undergo an absolute change in value: values are subtracted from (C∕2)% of the nodes and added to another disjoint set containing (C∕2)% of the nodes in V 1 . This ensures that the sum over all node values ∑ v s(v, t) does not change over time. The value subtracted from a node v is bounded by 50% of its initial value s(v, 0) (while also ensuring that the new value is not negative). For all other t ∈ T ⧵ T ′ , the function s is set to be constant: s(v, t) = s(v, t i ) where t i is the largest time point in T ′ with t i ≤ t. Applying this procedure to the 10 static instances described above for each combination of values for L, C and the time measure μ leads to a set of 320 DOP instances.
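One such value change can be sketched as follows. This is a simplified reconstruction, assuming the drawn amount is additionally capped by the current value so that values stay non-negative and the total sum is preserved; rounding and tie-breaking in the original generator may differ.

```python
import random

def apply_value_change(values, poi_nodes, depot, C, rng=random):
    """One dynamic change at a time t_i: move a random amount q from each of
    c 'minus' nodes to a disjoint set of c 'plus' nodes, where
    c = round((C/2)% of |V1|). `values` maps node -> current value and is
    updated in place; the sum of all values is preserved. Requires that
    V1 contains at least 2c non-depot nodes."""
    candidates = [v for v in poi_nodes if v != depot]
    c = max(1, round(C / 200 * len(poi_nodes)))        # (C/2)% of |V1|
    chosen = rng.sample(candidates, 2 * c)
    minus, plus = chosen[:c], chosen[c:]
    for v_minus, v_plus in zip(minus, plus):
        # The paper bounds q by 50% of the *initial* value s(v, 0); for this
        # self-contained sketch we use the current value as a stand-in and
        # additionally cap q so the new value cannot become negative.
        q = rng.randint(1, max(1, values[v_minus] // 2))
        q = min(q, values[v_minus])
        values[v_minus] -= q
        values[v_plus] += q
```

Since every subtracted amount is added elsewhere, the trivial upper bound UB t stays constant for this instance type, as stated above.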
Instances with dynamic budgets: Based on a static OP instance, a dynamic OP instance with a budget function B(t) is defined as follows. Let the initial budget B(0) be the budget B of the static OP instance. For each t i with i ∈ {1, 2, … , L} , set B(t i ) = B(t i−1 ) + ⌊q ⋅ B(0)⌋ where q ≠ 0 is randomly drawn from [−C%, C%] . This means that at each t i , the budget changes by an additive term that is bounded in absolute value by C% of the initial budget B(0). For all other t ∈ T ⧵ T ′ , the budget is constant: B(t) = B(t i ) where t i is the largest time point in T ′ with t i ≤ t. In addition, the function B(t) was bounded from below and above to prevent situations where the budget is so large that all nodes can potentially be visited as well as situations with extremely small budgets where no valid solutions P with Value(P) > 0 exist. This was done by calculating the length of the optimal TSP solution using the Concorde TSP Solver [10] as well as the shortest possible cycle with positive value for each static OP instance. Similar to the dynamic OP instances with changing node values, this set contains 320 DOP instances, leading to a total of 640 dynamic OP instances, which are uploaded to [30].
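The budget generation can be sketched as follows, with the lower and upper bounds passed in as parameters (standing in for the TSP- and shortest-cycle-based bounds described above):

```python
import math
import random

def budget_schedule(B0, L, C, lower, upper, rng=random):
    """Budget values B(t_0), ..., B(t_L): at each change point,
    B(t_i) = B(t_{i-1}) + floor(q * B(0)) with q != 0 drawn uniformly from
    [-C/100, C/100]; afterwards the value is clamped to [lower, upper]."""
    budgets = [B0]
    for _ in range(L):
        q = 0.0
        while q == 0.0:                       # q must be non-zero
            q = rng.uniform(-C / 100, C / 100)
        nxt = budgets[-1] + math.floor(q * B0)
        budgets.append(max(lower, min(upper, nxt)))
    return budgets
```

Note that the additive term depends on the initial budget B(0), not the current one, so the per-change step size stays bounded by C% of B(0) throughout the run.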

Parameter Values
The proposed algorithm VNS DOP described in "Variable Neighborhood Search" contains two parameters: the number of iterations in the initial phase k init and the probability p for random insertions in the initial phase. Regarding k init , since the number of possible solutions grows rapidly with increasing number of nodes in the graph, we consider it reasonable to scale the length k init of the initial exploration phase with the size of the graph. However, if the algorithm focuses too strongly on exploring the solution space, there might not be enough time to refine the discovered solutions. We thus set k init = √|V 1 | , i.e., the square root of the number of nodes with positive value including the depot, as a compromise between these two conflicting aspects.
As for the second parameter p, it is desirable that p is close to the percentage of nodes contained in an optimal solution so that the insertion procedure and subsequent optimization of the path length lead to a solution of high quality. The authors of the EA presented in [27] dealt with a similar problem and proposed the formula p = √(B∕Length(P LK )) , which in the following is also used for the parameter p. This formula, which incorporates the budget B and the length of the path P LK obtained by applying the Chained-Lin-Kernighan heuristic on all nodes in V 1 , acts as an efficiently calculable approximation for the number of nodes contained in an optimal solution (in relation to all nodes). The parameters for the other two algorithms EA and GSR are set as described in their respective studies [25,27]. The initial solution used for the improvement heuristics in the following experiments is generated for each static OP instance using a simple greedy heuristic which, based on the path P = (v 0 ) , repeatedly appends nodes that increase its length the least until the path length exceeds B. These initial solutions are also used in the corresponding dynamic OP instances and are available at [30].
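The two parameter formulas and the greedy construction of the initial solution can be sketched as follows. The function names, the integer rounding of k init , and the decision to stop the greedy construction just before the budget would be exceeded are our assumptions, not taken from the paper's source code.

```python
from math import sqrt

def vns_parameters(n_v1, budget, lk_length):
    """k_init = sqrt(|V1|) (rounded to an integer here) and
    p = sqrt(B / Length(P_LK)), as motivated above."""
    return max(1, round(sqrt(n_v1))), sqrt(budget / lk_length)

def greedy_initial_path(depot, nodes, d, B):
    """Greedy construction: starting from P = (v0), repeatedly append the
    unvisited node whose insertion at the end increases the closed tour
    length the least, stopping before the length would exceed B.
    `d` is a distance matrix indexed as d[u][v]."""
    path, remaining = [depot], set(nodes) - {depot}
    def cycle_len(p):
        return sum(d[p[i]][p[i + 1]] for i in range(len(p) - 1)) + d[p[-1]][p[0]]
    while remaining:
        best = min(remaining, key=lambda v: cycle_len(path + [v]))
        if cycle_len(path + [best]) > B:
            break
        path.append(best)
        remaining.remove(best)
    return path
```

For instance, with |V 1 | = 100, B = 80 000 and Length(P LK ) = 320 000, this gives k init = 10 and p = 0.5.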

Experimental Setup
The three compared improvement heuristics were executed on each of the 640 DOP instances and 10 static OP instances described in "Problem Instances" with 10 repetitions over which the progress curves were averaged. The runs were executed on a cluster with 4 computers, each having eight 3.4 GHz cores (each run being executed on one core) and 32 GB RAM. For the dynamic OP instances, the compared algorithms were set to terminate when the time horizon T ends, i.e., after 1 000 000 function evaluations or 1 000 000 subsets are calculated, depending on the time measure by which changes in the DOP instance occur. Runs on the static instances were set to terminate after 1 000 000 time units have passed for both of the time measures.
Plotting the progress curves for each dynamic OP instance and time measure leads to a total of 640 diagrams (similar to Fig. 1 left) so that an individual evaluation of each diagram is not feasible. For this reason, we calculate the percentage that the area under the progress curve (area under curve, AUC) occupies relative to the area of a theoretical progress curve with constant value 1. This value is denoted as AUC rel (relative AUC) in the following and satisfies AUC rel ∈ [0, 1] , providing an aggregate quality measure for the progress curves, similar to the evaluation methodology in [54]. These values allow us to quantify the performance of an algorithm on a given instance, where a high value indicates that the algorithm quickly obtains solutions of high quality with respect to the time horizon T.
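For a piecewise-constant progress curve, AUC rel and the per-instance normalization to the best algorithm can be sketched as follows (data layout and function names are ours):

```python
def auc_rel(times, values, t_start, t_end):
    """Relative AUC of a right-continuous step curve of normalized qualities
    in [0, 1]: the area under the curve over [t_start, t_end], divided by the
    area of the constant curve 1. `times[i]` is the point where the curve
    jumps to `values[i]` (sorted, with times[0] <= t_start)."""
    area, n = 0.0, len(times)
    for i in range(n):
        seg_start = max(times[i], t_start)
        seg_end = min(times[i + 1] if i + 1 < n else t_end, t_end)
        if seg_end > seg_start:
            area += values[i] * (seg_end - seg_start)
    return area / (t_end - t_start)

def auc_norm(rel_by_algorithm):
    """Normalize each algorithm's AUC_rel on one instance by the best
    AUC_rel attained on that instance, so the best algorithm gets 1."""
    best = max(rel_by_algorithm.values())
    return {a: v / best for a, v in rel_by_algorithm.items()}
```

Restricting `t_start` to the second half of the run corresponds to the evaluation over the last 500 000 time units described below.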
To compare the AUC values over different instances (similar to [38]), we normalize them with respect to the best attained value per instance. Formally, if AUC rel I,A denotes the relative AUC for algorithm A on an instance I and A * is the algorithm that obtained the best AUC rel on instance I, the normalized value is calculated as AUC norm I,A = AUC rel I,A ∕ AUC rel I,A * . It satisfies 0 ≤ AUC norm I,A ≤ 1 , and if an algorithm A obtains the best results on an instance I, AUC norm I,A = 1 holds, so a value close to 1 indicates that algorithm A reaches a performance similar to the best performing algorithm for instance I. This type of evaluation measure can be seen as an extension of the "collective mean fitness" that is commonly used to evaluate an algorithm's performance on dynamic optimization problems [38], where instead of the best values per iteration/generation the optimization behavior over the entire run time is taken into account. An overview of the evaluation metrics used in this section is shown in Table 2.
It should be noted that in the following sections, the calculation of the AUC values is restricted to the last 500 000 time units for all progress curves. This is done to reduce effects caused by differences during the initialization of each algorithm and to specifically investigate the steady-state performance of the algorithms. Furthermore, it allows the following analyses to focus on the optimization behavior of the algorithms for the scenario where they are used long-term in a dynamic environment. However, the results of the following evaluations as well as the AUC norm I,A and AUC rel I,A values for all DOP instances and algorithms are uploaded to [30] for both 500 000 and 1 000 000 time units, allowing for further comparisons in future research.

The Effect of Dynamic Changes in Node Values on Algorithm Performance
Before the algorithms EA, GSR, and VNS DOP are compared regarding their performance, we first analyze how the dynamics of the DOP instances with changing node values influence the performance of the algorithms, as this yields insights on how the algorithms behave in a dynamic environment. In particular, the frequency L and intensity C by which the node values s(v, t) change and the effect these parameters have on the obtained AUC values are analyzed. Figure 2 shows line plots containing the medians of rescaled AUC rel values grouped by different configurations of L and C. The rescaling of AUC rel on a given dynamic OP instance is done with respect to the best AUC value for its corresponding static OP instance. This allows for approximate comparisons over different instances to investigate trends on how the AUC rel for a static OP instance changes when dynamics are added. The median was chosen since it is robust to outliers and skewed data. In addition, the 95% confidence interval for the median (calculated from the non-parametric sign test) is shown for each point.

Fig. 2: The effect of the parameters L and C for the changes in the functions s(v, t) on the attained (relative) AUC, shown as line graphs with increasing C for fixed L (left) and with increasing L for fixed C (right). The colors for the three algorithms are the same as in Fig. 1 left, i.e., EA, GSR and VNS DOP . For each dynamic OP instance I L,C (generated from a static instance I 0 ) and algorithm A, values were calculated that indicate the ratio between the AUC that A attains on I L,C and the best AUC attained on the instance I 0 . A point in a diagram shows for an algorithm A the median of these ratios calculated over all DOP instances with the same L and C. The points with L = 0 or C = 0 correspond to median values on static OP instances. For each point, the 95% confidence interval (based on the sign test) is shown.
It can be seen that the performance of all three algorithms remains fairly high for DOPs with changing node values, as they all reach rescaled AUC values above 0.9. This means that no algorithm consistently falls below 90% of the best AUC reached on the corresponding static instance. It is also visible that the confidence intervals become wider for higher values of L and C, indicating that the variability of the changes in AUC increases. However, the sequence of confidence intervals and the slope of the line graphs show a slight upward trend in the median values, especially in the bottom two plots on both sides. A possible interpretation for this is that the algorithms are able to collect nodes whose values increase and discard nodes with decreasing values, leading to progress curves with a higher AUC. Especially for VNS DOP , which consistently reaches the highest median values of the compared algorithms, values over 1 are sometimes reached for high values of C, which indicates that due to the changes in node values it finds solutions with a higher total value than in the static case.

Comparison of VNS DOP with other Metaheuristics for DOPs with Dynamic Node Values
To compare the algorithms to one another, the AUC norm I,A values described in "Experimental Setup" are used in the following, as they have the property that the best performing algorithm A * per instance I satisfies AUC norm I,A * = 1 and the other algorithms' performance is scaled percentage-wise to the performance of A * . Table 3 shows the average AUC norm I,A aggregated over DOP instances with the same parameter values L and C for both time measures. It can be seen that in both sets of evaluations VNS DOP obtains the best results for all values of L, C and for both time measures. Especially for the time measure FE, its average value is 1.000 for all L and C, which means that with respect to this time measure it obtained the best AUC norm I,A value on all instances, showing a high performance over time for different levels of dynamic changes in the node value function s. Since the AUC values consider the last 500 000 time units, this also indicates that VNS DOP is able to maintain this performance over longer time periods. For μ = FE, it can also be seen that the values for EA are higher than the AUC values for GSR, showing that EA outperforms GSR, but not VNS DOP . For the "subsets" time measure μ = SS, both EA and GSR obtain slightly higher values, which can be interpreted in the sense that these algorithms work more efficiently regarding the number of subsets V ′ ⊆ V used, especially GSR as it reaches a performance that is competitive with EA for this time measure.

Effects of Invalid Solution Handling Methods and Dynamic Changes in the Budget
Similar to "The Effect of Dynamic Changes in Node Values on Algorithm Performance", the influence of the frequency L and intensity C determining the budget function B(t) as well as the effects of the invalid solution handling methods H 1 , H 2 and H 3 presented in "Handling Methods for Invalidated Solutions" on the obtained AUC values are investigated. Figure 3 shows line plots with the medians of the AUC norm values for fixed C and increasing L, along with the 95% confidence interval for the median (calculated using the sign test). Note that in Fig. 3, AUC norm is used instead of rescaled AUC rel values as in "The Effect of Dynamic Changes in Node Values on Algorithm Performance" since the sum of node values is the same as in the static OP instances and does not change over time. Furthermore, AUC norm focuses on the performance gap between the different algorithms, allowing for easier comparisons of the methods. For example, for C = 5 the point (L = 50, median(AUC norm ) = 0.94) for VNS DOP using H 2 in the top left indicates that for 50% of the DOP instances with these values for C and L, the algorithm VNS DOP reached an AUC value that was more than 94% of the best attained AUC value, and in the remaining 50% a lower value was reached. With increasing L and C, a decrease in performance can be observed when the algorithms are restarted with H 1 , whereas with the repair-based handling methods this negative effect is reduced. More specifically, the decrease in performance is still visible for GSR, but less for EA, while for VNS DOP the median stays at a high value. This shows that even when the problem is highly dynamic, it is better to repair existing solutions than to restart the optimization algorithm.

Fig. 3: Medians of the AUC norm values, shown as line graphs with increasing L for fixed C. For each point, the 95% confidence interval (based on the sign test) is shown. The colors for the three algorithms are the same as in Fig. 1 left, i.e., EA, GSR and VNS DOP .
However, for all algorithms the width of the confidence interval increases for large values of L, which indicates that the dynamic changes also lead to more diverse AUC norm values. This especially holds for VNS DOP with the restart method H 1 , where for higher C and L its performance is hardly distinguishable from that of the EA, whereas for EA the variability in the AUC values does not increase as much with H 1 . This, and the similar shapes of the curves for the other invalid solution handling methods, indicate that the EA has a stable performance, even when restarted multiple times during a run. However, if the handling methods H 2 and H 3 are used, VNS DOP outperforms the other algorithms in most cases, as can be seen by the consistently high median values.
Regarding the differences between H 2 and H 3 , the non-parametric sign test for paired samples showed for all three algorithms that the difference in the median AUC norm values between H 2 and H 3 is highly significant (p < 0.001 for all three tests, paired over n = 320 DOP instances). The paired nature of the data also makes it possible to plot the AUC norm values for the different handling methods against one another, as shown in the scatter plots in Fig. 4 left, in order to further investigate their differences.
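The paired sign test used here can be sketched with a plain binomial computation (a standard two-sided sign test, not the authors' statistics code):

```python
from math import comb

def sign_test_p(xs, ys):
    """Two-sided paired sign test: under the null hypothesis of a zero
    median difference, the number of positive differences among the
    non-tied pairs follows Binomial(n, 1/2)."""
    diffs = [x - y for x, y in zip(xs, ys) if x != y]  # drop ties
    n = len(diffs)
    k = sum(d > 0 for d in diffs)
    tail = min(k, n - k)
    # doubled lower binomial tail, capped at 1
    p = sum(comb(n, i) for i in range(tail + 1)) * 2 / 2 ** n
    return min(1.0, p)
```

With n = 320 paired instances, even modest but consistent differences between two handling methods yield very small p-values, which is consistent with the p < 0.001 results reported above.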
For both H 1 and H 3 , it can be seen that most of the points are below the diagonal, indicating that better results are obtained with H 2 , although the gaps in performance are smaller between H 3 and H 2 for all algorithms. This is also visible when progress curves are plotted for different handling methods, one example being shown in Fig. 4 right: With H 1 , the algorithm ( VNS DOP in this example) first reverts to its initial solution and later to the empty solution P ∅ = (v 0 ) as even the initial solution exceeds the decreasing budget. The loss in solution quality is not as large with H 3 , but it can be seen that H 2 maintains most of the solution quality whenever the budget decreases. A possible interpretation for this is that more nodes are removed with H 3 than with H 2 until the solution becomes valid again, leading to a larger decrease in solution quality.
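This interpretation can be illustrated with two simplified repair sketches: one removes the node whose removal shortens the cycle the most (restoring validity as fast as possible), the other removes the node with the smallest value (minimizing the immediate quality loss). These selection criteria are our simplification for illustration, not the paper's exact repair operators.

```python
def _cycle_len(path, d):
    """Length of the closed tour through `path` (d is a distance matrix)."""
    return sum(d[path[i]][path[(i + 1) % len(path)]] for i in range(len(path)))

def repair_fast(path, d, values, B, depot):
    """Repair oriented toward the length: repeatedly remove the non-depot
    node whose removal shortens the cycle the most, until Length(P) <= B."""
    while _cycle_len(path, d) > B and len(path) > 1:
        best = max((v for v in path if v != depot),
                   key=lambda v: _cycle_len(path, d)
                   - _cycle_len([u for u in path if u != v], d))
        path = [u for u in path if u != best]
    return path

def repair_min_loss(path, d, values, B, depot):
    """Repair oriented toward the value: repeatedly remove the non-depot
    node with the smallest value, until Length(P) <= B."""
    while _cycle_len(path, d) > B and len(path) > 1:
        worst = min((v for v in path if v != depot), key=lambda v: values[v])
        path = [u for u in path if u != worst]
    return path
```

On a small example with a tight budget, the length-oriented repair may need fewer removals and thus retain more total value, matching the interpretation above.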
Regarding the time measures in Fig. 4 left, it is interesting to note that for GSR, points with low AUC values occur more often for μ = FE than for μ = SS, which, similar to the observation in the previous experiment in "Comparison of VNS DOP with other Metaheuristics for DOPs with Dynamic Node Values", indicates that its strength is to carefully select a subset V ′ ⊆ V and thoroughly test it before changing the subset. For VNS DOP , the inverse can be observed.

Comparison of VNS DOP with other Metaheuristics for DOPs with Dynamic Budgets
To compare the different algorithms, Table 4 shows the average AUC norm for the three algorithms and both time measures, split by the values of L and C. Note that for each instance the best performing invalid solution handling method was used for the calculation of the mean, indicated by the asterisk ( * ) in the names of the algorithms. For μ = FE, the algorithm VNS DOP performs best for all values of L and C, followed by EA, which outperforms GSR in all cases. VNS DOP also obtains the highest AUC norm values for μ = SS in most cases (except three with very high dynamics), indicating that it is able to consistently maintain a high performance over time under different types of dynamics. There are some cases with μ = SS where GSR outperforms EA, and it can be observed that EA and GSR generally tend to reach higher values for μ = SS than for μ = FE. Similar to the observations in "Comparison of VNS DOP with other Metaheuristics for DOPs with Dynamic Node Values", this shows that these two algorithms work efficiently when selecting subsets of nodes, but use more evaluations to construct solutions. In addition, it can be seen that the values in Table 4 are smaller than the AUC norm values in Table 3, further indicating that changes in the budget have a stronger negative effect on the performance of the algorithms.

Conclusion
This work considered a Dynamic Orienteering Problem (DOP) where parts of the problem change over time during the optimization process. In particular, two types of dynamic changes were considered: i) changes in the value function s for the nodes and ii) changes in the budget B. Since dynamic problems necessitate the development of algorithms that quickly adapt existing solutions, an improvement heuristic VNS DOP was proposed that is based on Variable Neighborhood Search. The main idea of VNS DOP is to take the two interacting sub-problems of the OP into account by combining neighborhoods with changing objective functions. This allows the algorithm to dynamically change how new solutions are selected. This concept can also be applied to other combinatorial optimization problems by choosing the functions and neighborhoods in accordance with their characteristics and structure.
Computational experiments were performed with two improvement heuristics based on existing state-of-the-art methods for the static OP and the proposed algorithm VNS DOP . For the evaluations, a large number of dynamic OP instances was used, covering various frequencies L and intensities C with which changes occur in the node values s and the budget B. The evaluation of the experiments, which considered the performance over the entire run time, investigated the influence of the dynamic changes on the algorithms. For DOPs with dynamic node values s, it could be seen that the performance of all algorithms becomes more volatile as the dynamics increase. However, there were some cases where a slight increase in performance on a dynamic OP instance was observed when compared to the performance on the corresponding static OP instance. The results show that VNS DOP is able to deal with different types of problem instances with different dynamic levels, since it outperformed the other algorithms with respect to several time measures. For DOPs with dynamic budgets B, it is possible that valid solutions become invalid over time as the budget constraint changes. For this reason, three invalid solution handling methods were presented that are called when solutions are suddenly invalidated.
The experimental results for these methods show that "repairing" the current solution by fixing the constraint violation as fast as possible ( H 2 ) leads to the best results for all three algorithms, as opposed to repairing while minimizing the loss in solution quality ( H 3 ) or restarting the algorithm ( H 1 ). For the DOP instances with changing budgets, the performance of the algorithms became more variable as the changes increased in frequency and intensity, and changes in the budget had a visible negative effect on the performance of all algorithms. However, VNS DOP still obtained the best results in most cases for the considered time measures and evaluation criteria.