1 Introduction

The travelling salesman problem (TSP) and the knapsack problem (KP) are two well-known NP-Hard combinatorial optimisation problems. In TSP (Gutin and Punnen 2006), a salesman performs a cyclic tour through a set of cities with the goal of minimising the length (and hence the travelling time) of the tour. In KP (Kellerer et al. 2004), a knapsack with a given capacity is filled, according to a collection plan, with a subset of given profitable items with the goal of maximising the total profit.

TSP and KP are classical problems. However, real-world applications such as postal or waste collection problems (Mei et al. 2014; Polyakovskiy and Neumann 2017; Hannan et al. 2020) need more complex problem models in which both TSP and KP characteristics are intrinsically and interdependently present at the same time. Problems in which KP items are scattered over TSP cities are modelled in various ways, such as the cumulative capacitated routing problem (Ngueveu et al. 2010), the orienteering problem (Vansteenwegen et al. 2011), and the selective/prize collecting travelling salesman problem (Balas 2007; Laporte and Martello 1990). In some of these models, multiple tours are involved instead of a single tour, while in others a city might not be visited if no item is collected from it.

In this paper, we study a particular problem model named the travelling thief problem (Bonyadi et al. 2013), which essentially encompasses both TSP and KP characteristics in an entangled fashion. In an example problem of this kind, a postal truck mandatorily visits each city to collect letters. Moreover, the postal truck makes profits as it optionally collects heavy parcels from the cities. However, the gradual change in the truck load as the truck picks up heavy parcels affects its travelling speed between cities, and hence affects the travelling time, fuel consumption, air pollution, and travelling cost. A solution to such a problem is a mandatory cyclic tour of cities to be visited successively, together with a sequence of optional services to be given at the cities, such that the total profit made minus the total cost incurred is maximised.

In the travelling thief problem (TTP), a thief (i) rents a knapsack having a certain capacity at a given renting cost per unit of time, (ii) performs a cyclic tour through a set of cities, and (iii) collects a subset of profitable items in the knapsack with the objective of maximising the net profit, which equals the total profit minus the total cost. The entanglement of TSP and KP in TTP comes from two factors: (i) as the thief collects more items, the knapsack gets heavier, the thief gets slower, the tour takes more time, and the knapsack rent goes up; and (ii) the order of cities in the tour affects the order in which items could be collected without exceeding the knapsack capacity. TTP is a multi-component problem since it has both TSP and KP as components. Solving such multi-component problems is more challenging because finding an optimal overall solution to a multi-component problem cannot be guaranteed by simply finding an optimal solution to each underlying component (Michalewicz 2012; Mei et al. 2016; Bonyadi et al. 2019).

TTP solving methods have made some progress over the years but further improvement is needed. Here we summarise five types of TTP methods in the context of this paper; a detailed exploration is presented later. Constructive methods (Polyakovskiy et al. 2014; Bonyadi et al. 2014; Mei et al. 2014; Faulkner et al. 2015) use the Chained Lin-Kernighan heuristic (Applegate et al. 2003) to get a cyclic tour and then use various heuristics to construct a collection plan. Fixed-tour methods (Maity and Das 2020; Polyakovskiy et al. 2014; Faulkner et al. 2015; Wu et al. 2017; Polyakovskiy and Neumann 2017) generate cyclic tours like constructive methods do and then use exact or approximate methods to find collection plans. Cooperative methods (Bonyadi et al. 2014; El Yafrani and Ahiod 2016; Wagner et al. 2018; El Yafrani and Ahiod 2018; Namazi et al. 2020; Zhang et al. 2021) iteratively alternate between a search for a cyclic tour and a search for a collection plan, keeping one of the two unchanged while searching for the other, until no further improvement is found. Full-encoding methods (Mei et al. 2014; Wagner 2016; El Yafrani and Ahiod 2016, 2017; Wuijts and Thierens 2019) deal with the entire TTP at once, using cyclic tour and collection plan changing operators within the same search framework. Hyper-heuristic methods (Ali and Mohamedkhair 2020; Mei et al. 2015; Martins et al. 2017; El Yafrani et al. 2018) generate or select low level heuristics as neighbourhood operators for cyclic tours or collection plans and use them in search.

Given the TTP literature summarised above, one aspect common to all approaches is that the search for one component's solution (cyclic tour or collection plan) takes only the other component's unchanged current solution (collection plan or cyclic tour) into account. Moreover, some approaches adopt an iterative strategy to alternate between the aforementioned neighbourhood operators for the two components. However, even these approaches might not really help find the overall search direction best for solving the entire multi-component problem. The reason is that, for the best solution of the entire problem, the solutions for the two components should simultaneously best correspond to each other. Note that the coordination issue has been partially addressed by exact evaluation of the collection plans or city selections in related problems other than TTP; such problems include the generalised travelling salesman problem (Bontoux et al. 2010), the cumulative capacitated vehicle routing problem (Ngueveu et al. 2010), and vehicle routing problems with profits (Vidal et al. 2016). However, such exact evaluation via dynamic programming or labelling algorithms is usually costly and does not scale well to large problems. Ideally, a cheaper but strong heuristic based coordination method is needed between the solving methods for the TSP and KP components of TTP, meaning any considerable change to one component's solution should take into account all possible future solutions of the other component under that changed solution.

In this paper, we first show that even a simple local search based coordination approach, let alone an exact evaluation based approach, is not effective in addressing the poor coordination issue in existing TTP methods. Then, we propose a human designed coordination heuristic that makes changes to collection plans during the exploration of cyclic tours. We also propose another human designed coordination heuristic that explicitly exploits the cyclic tours in item selection during collection plan exploration. We further propose a machine learning based coordination heuristic that captures the characteristics of the two human designed coordination heuristics. Our proposed coordination heuristics explore potentially better TTP solutions than the approaches exhibiting poor coordination. We empirically evaluate the effectiveness of our proposed approaches. On a set of benchmark problems, our proposed approaches help our coordination based TTP solver significantly outperform existing state-of-the-art TTP solvers. Our TTP solver is named Cooperation Coordination (CoCo) and is available from https://github.com/majid75/CoCo.

We note that this paper thoroughly extends our previous preliminary work (Namazi et al. 2019), which presented an early version of our human designed cyclic tour exploration coordination heuristic. In this paper, we have considerably revised that heuristic. Furthermore, we have designed two more cyclic tour exploration coordination heuristics: (i) a local search based heuristic and (ii) a machine learning based heuristic. Next, we have designed a coordination based collection plan heuristic. Moreover, we have described our proposed approaches more formally and in greater detail.

We continue the paper as follows. Section 2 covers preliminaries. Section 3 explores related work. Section 4 describes the search framework used. Section 5 describes the proposed coordination heuristics. Section 6 presents the experimental results. Section 7 presents the conclusions.

2 Preliminaries

We formally define TSP, KP, and TTP. We describe the neighbourhood operators 2OPT for TSP and BitFlip for KP. We also define the prefix minimum and suffix maximum functions to help describe TTP coordination heuristics.

2.1 Travelling salesman problem

Assume a set \(C = \{1,\ldots ,n\}\) of \(n>1\) cities. The distance between any two cities \(c\ne c'\) is \(d(c,c') = d(c',c)\). In TSP, a salesman starts travelling from city 1, visits each other city exactly once, and returns to city 1. The salesman thus completes a non-overlapping cyclic tour through all cities. For a given set C of cities, assume \(t = \langle t_0, t_1, \ldots , t_n\rangle \) is a cyclic tour with \(t_0 = t_n = 1\), and \(t_k=c\) iff \(t(c)=k\), where \(c \in C \setminus \{1\}\) is a city and \(k\in [1,n-1]\) is a position. No city in \(C \setminus \{1\}\) can be visited more than once in a cyclic tour t. So we have \(t_k \ne t_{k'}\) for any \(k \ne k'\) where \(k,k'\in [1,n-1]\). Given a cyclic tour t, the total distance travelled by the salesman is \(D(t) = \sum _{k= 0}^{k<n}d(t_{k},t_{k+1})\).

Definition 1

(TSP) Given a set C of cities, distance \(d(c,c')\) between each pair of cities \(c\ne c'\), find a cyclic tour t for a salesman such that the objective total distance D(t) is minimised. Note that the objective could be the total travelling time if the travelling speed is constant between any two cities.

Given a cyclic tour t, a tour segment \(t[b,e] = \langle t_{b}, \ldots , t_{e}\rangle \) of length \(|t[b,e]| = e-b+1\) with \(0< b< e< n\) comprises the cities in t between positions b and e, both inclusive. In TSP, a tour segment reversal operator 2OPT (Croes 1958) is often used in generating a neighbouring tour from a given tour.

Definition 2

(2OPT) Given a cyclic tour t and positions b and e such that \(0< b< e < n\), a 2OPT(t, b, e) operator reverses the tour segment t[b,e] of length \(e-b+1\). So 2OPT essentially reverses the order of cities between positions b and e to produce a new tour \(t'\). Thus, \(t'_{b+k} = t_{e-k}\) is obtained for \(0 \le k \le e - b\), taking \({\mathcal {O}}(e-b)\) time. Any other city at position \(k \not \in [b,e]\) remains at the same position, i.e. \(t'_k = t_k\).

Lemma 1

(2OPT for TSP) Given a cyclic tour t in TSP, a 2OPT(t, b, e) operator produces a new cyclic tour \(t'\) for which computing \(D(t') = D(t) + d(t_{b-1},t_{e})+ d(t_{b}, t_{e + 1}) -d(t_{b-1},t_{b}) - d(t_{e},t_{e+1})\) takes \({\mathcal {O}}(1)\) time, when D(t) is already known.
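To make Definition 2 and Lemma 1 concrete, the following minimal Python sketch represents a tour as a list with \(t_0 = t_n = 1\) and distances as a matrix; the function names (tour_distance, two_opt_delta, two_opt) are our conventions, not the paper's.

```python
def tour_distance(t, d):
    """D(t): total length of cyclic tour t, where d[a][b] is d(a, b)."""
    return sum(d[t[k]][t[k + 1]] for k in range(len(t) - 1))

def two_opt_delta(t, d, b, e):
    """Change in D(t) caused by 2OPT(t, b, e), in O(1) time (Lemma 1)."""
    return (d[t[b - 1]][t[e]] + d[t[b]][t[e + 1]]
            - d[t[b - 1]][t[b]] - d[t[e]][t[e + 1]])

def two_opt(t, b, e):
    """Apply 2OPT(t, b, e): return a copy of t with t[b..e] reversed."""
    return t[:b] + t[b:e + 1][::-1] + t[e + 1:]
```

A local search can thus accept a move whenever two_opt_delta is negative, updating D(t) in constant time instead of rescanning the whole tour.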

2.2 Knapsack problems

Assume a set \(I = \{1,\ldots ,m\}\) of \(m>0\) items. Each item \(i\) has weight \(w_i> 0\) and profit \(\pi _i > 0\). Assume \(p = \langle p_1, p_2, \ldots , p_m \rangle \equiv \{ i:p_i = 1\}\) is a collection plan with \(p_i \in \{0,1\}\) for each item i, where \(p_i = 1\) means i is a collected item and \(p_i = 0\) means i is an uncollected item. Assume the knapsack has weight capacity \(W>0\). For a given collection plan p, the total weight of the knapsack is \(W(p) = \sum ^{i=m}_{i=1}w_ip_i\), the knapsack constraint is \(K(p)\equiv W(p) \le W\), and the total profit of the collected items is \(P(p) = \sum _{i=1}^{i=m}\pi _ip_i\).

Definition 3

(KP) Given a set I of items with weight \(w_i\) and profit \(\pi _i\) for each item i and also the knapsack capacity W, find a collection plan p such that the objective total profit P(p) is maximised subject to the knapsack constraint \(K(p) \equiv W(p) \le W\).

In KP, an item selection operator BitFlip (Polyakovskiy et al. 2014) is often used in generating a neighbouring collection plan from a given one.

Definition 4

(BitFlip) Given a collection plan p and an item i, a BitFlip(p, i) operator flips \(p_i\) from 0 to 1 or vice versa to produce a new collection plan \(p'\) taking \({\mathcal {O}}(1)\) time.

Lemma 2

(BitFlip for KP) Given a collection plan p in KP, a BitFlip(p, i) operator produces a new collection plan \(p'\) with \(P(p') = P(p) + \pi _i\times (p'_i - p_i)\). Here, computation of \(P(p')\) takes \({\mathcal {O}}(1)\) time when P(p) is already known.
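As a small illustration of Lemma 2, the hedged sketch below updates \(P(p)\) and \(W(p)\) in constant time; the capacity check against W is our addition, since a search would normally reject a pick that violates \(K(p)\).

```python
def bit_flip(p, i, w, pi, total_weight, total_profit, W):
    """BitFlip(p, i): flip item i in place and return the updated
    (W(p'), P(p')) pair, or None when picking i would violate K(p')."""
    delta = 1 - 2 * p[i]                  # +1 when picking, -1 when unpicking
    new_weight = total_weight + delta * w[i]
    if new_weight > W:                    # would break the knapsack constraint
        return None
    p[i] = 1 - p[i]
    return new_weight, total_profit + delta * pi[i]
```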

For convenience of exposition and for the sake of formality, below we define pick and unpick operations for collection plans in KP.

Definition 5

(Pick and Unpick) Given a collection plan p in KP, picking an item i means applying BitFlip(p, i) when \(p_i = 0\). Similarly, unpicking an item i means applying BitFlip(p, i) when \(p_i = 1\).

2.3 Travelling thief problems

We start with all the notation and terminology used for TSP and KP in Sects. 2.1 and 2.2 respectively. However, the salesman in TSP is viewed as the thief in TTP, the items in KP are scattered over the cities in TTP, and the thief travels around to collect the items. Moreover, the travelling speed in TTP gets slower as the thief collects items and the knapsack gets heavier.

Assume, in TTP, each item i is collected from a city \(l_i\) and a city c has a set \(I(c) = \{i: l_i = c\}\) of items. However, the designated city 1, where the cyclic tour t of the thief starts and ends, arguably does not have any item, since such an item could be collected without any travelling. So, \(l_i > 1\) for any item i and \(I(1) = \{\}\). A TTP solution \(\langle t, p\rangle \) comprises a cyclic tour t and a collection plan p. An item i in a city \(l_i\) has position \(t(l_i)\) in a cyclic tour t.

Assume the thief in TTP rents a knapsack of weight capacity \(W>0\) at a renting rate of \(R>0\) per unit of time. For a given collection plan p, also assume the total weight of the items collected from city c is \(w_p(c) = \sum _{i \in I(c)}w_ip_i\). Further, assume that \(w_{t, p}(k) = \sum ^{k'=k}_{k'=0}w_p(t_{k'})\) denotes the weight of the knapsack after collecting items from cities up to position k in the tour t using a collection plan p for a TTP solution \(\langle t, p\rangle \). Assume a speed function \(s(w) = s_{\max } - \frac{w}{W} \times (s_{\max } - s_{\min })\) for the current knapsack weight \(w \le W\), where the given maximum and minimum speed limits of the thief are \(s_{\max }\) and \(s_{\min }\) respectively with \(s_{\max }\ge s_{\min }\). So for a TTP solution \(\langle t, p\rangle \), the thief travels from city \(t_k\) to \(t_{k+1}\) with the knapsack weight \(w_{t,p}(k)\) and with a travelling speed \(s_{t,p}(k) = s(w_{t,p}(k))\). Moreover, the travelling time up to the position k in the cyclic tour t is \(\tau _{t,p}(k) = \sum _{k'= 0}^{k'<k}d(t_{k'},t_{k'+1})/s_{t,p}(k')\) and the total travelling time is \(T(t,p) = \tau _{t,p}(n) = \sum _{k = 0}^{k<n}d(t_{k},t_{k+1})/s_{t,p}(k)\). Hence, the total renting cost of the knapsack is \(R(t,p) = R\times T(t,p)\), and so the net profit is \(N(t,p) = P(p) - R(t,p)\). In TTP, we have to maximise the objective \(N(t,p)\) over all possible cyclic tours t and all possible collection plans p subject to the knapsack constraint \(K(p)\equiv W(p) \le W\).
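The following self-contained Python sketch computes \(T(t,p)\) and \(N(t,p)\) directly from the definitions above; all argument names are our conventions, with items[c] assumed to list the items located at city c.

```python
def net_profit(t, p, d, items, w, pi, W, R, s_max, s_min):
    """N(t, p) = P(p) - R * T(t, p) for a TTP solution <t, p>."""
    profit = weight = time = 0.0
    for k in range(len(t) - 1):
        for i in items[t[k]]:                 # collect items at city t[k]
            if p[i]:
                weight += w[i]
                profit += pi[i]
        speed = s_max - (weight / W) * (s_max - s_min)   # s(w_{t,p}(k))
        time += d[t[k]][t[k + 1]] / speed                # leg t[k] -> t[k+1]
    return profit - R * time
```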

Definition 6

(TTP) Given a set C of cities, a set I of items, distance \(d(c,c')\) between each pair of cities \(c\ne c'\), weight \(w_i\) and profit \(\pi _i\) for each item i available in city \(l_i\), the knapsack capacity W, the knapsack renting rate R, a speed function s(w) with \(s_{\max }\) and \(s_{\min }\) as the maximum and the minimum speeds respectively, find a solution \(\langle t, p\rangle \) comprising a cyclic tour t and a collection plan p such that the objective \(N(t,p)\) is maximised subject to the knapsack constraint K(p).

Figure 1 shows a TTP example, a solution, and the objective computation.

Fig. 1

Left: a TTP instance having 5 cities (circles) and 4 items (rectangles), and a TTP solution with the travelled cyclic path (solid lines) and collected items (solid rectangles). Each city has a city index. City 1 (double circle) is the designated city to start from and end at. Each item has an item index and a tuple of weight and profit. Lines have distances as labels. Dashed lines are not in the travelled path and dotted rectangles are for items not collected. Right: other required parameters of the TTP instance along with the calculation of the net profit for the TTP solution

Although we do not claim any contribution here, we prove the relationships of TTP with TSP and KP and show that TTP is NP-Hard. For this, we show how TSP and KP can be reduced to TTP. Note that there are many ways to obtain such reductions; we just show one example for each case.

Lemma 3

(TSP Reduction) Solving a TSP is equivalent to solving a TTP when for the speed function, \(s_{\max } = s_{\min }\) and the knapsack weight capacity \(W\ge \sum _{i=1}^{i=m}w_i\), i.e. the knapsack is sufficiently large to hold all items.

Proof

With \(s_{\max } = s_{\min }\), the travelling speed becomes constant. With \(W \ge \sum _{i=1}^{i=m}w_i\), all items must be collected for the maximum profit. So the collection plan p has no impact on the cyclic tour t. \(\square \)

Lemma 4

(KP Reduction) Solving a KP is equivalent to solving a TTP when distance \(d(c,c')\) is the same for any two cities \(c \ne c'\) and, for the speed function, \(s_{\max } = s_{\min }\), resulting in a constant speed.

Proof

When the distance \(d(c,c')\) is the same for any two cities \(c \ne c'\), and the speed is always a constant during the tour, the total travelling time is always the same. So the cyclic tour t has no impact on the collection plan p. \(\square \)

As per Lemmas 3 and 4, TSP and KP are both special cases of TTP. So we have the following lemma, which is also mentioned by Mei et al. (2014).

Lemma 5

(TTP Complexity) TTP is NP-Hard since TSP and KP are NP-Hard.

We now define the TSP and KP components of a TTP as the problems obtained when the KP and the TSP components, respectively, are left unchanged.

Definition 7

(TSPC) Given a TTP and a particular collection plan p, find a cyclic tour t so that the TTP objective is maximised. This is the TSP component of TTP or in short TSPC.

Definition 8

(KPC) Given a TTP and a particular cyclic tour t, find a collection plan p so that the TTP objective is maximised. This is the KP component of TTP or in short KPC.

We also show that solving the TSP or KP component of a TTP is not equivalent to solving a standalone TSP or KP respectively.

Lemma 6

(TSPC) TSPC is NP-Hard and is not equivalent to TSP.

Proof

For the first part: TSP easily reduces to TSPC with a constant speed function having \(s_{\max } = s_{\min }\). For the second part: assume the speed depends on the knapsack weight. Although the collection plan is unchanged, reordering cities may also change the item collection order. This implies the travelling speed and the travelling time even between the same pair of cities might change. This means the collection plan could still affect exploration of cyclic tours in TSPC. \(\square \)

The lemma below is provided by Polyakovskiy and Neumann (2017).

Lemma 7

(KPC) KPC is NP-Hard and is not equivalent to KP.

The above two lemmas show that just using standalone TSP and KP solvers to solve TSPC and KPC will not work. The reason is, again, the mutual interdependence of TSPC and KPC within TTP.

We adapt 2OPT and BitFlip operators to TSPC and KPC respectively.

Lemma 8

(2OPT for TSPC) Given a TTP solution \(\langle t, p\rangle \) with \(w_{t,p}(k)\) and \(\tau _{t,p}(k)\) for all positions, a 2OPT(t, b, e) operator produces a new cyclic tour \(t'\) for which computing \(N(t',p)\) needs \({\mathcal {O}}(e - b)\) time.

Proof

Given \(w_{t,p}(k)\) and \(\tau _{t,p}(k)\) for all positions in t, for each position \(k \in [b,e+1]\) in \(t'\), we first have to compute the knapsack weight \(w_{t',p}(k)\), the travelling speed \(s_{t',p}(k-1)\), and the travelling time up to each position \(\tau _{t',p}(k)\). Then, the new total travelling time and the new objective values are computed as \(T(t',p)=T(t,p)+\tau _{t',p}(e+1)-\tau _{t,p}(e+1)\) and \(N(t',p) =P(p)-R \times T(t',p)\), respectively. \(\square \)
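Assuming the per-position caches of Lemma 8 (in our notation, w[k] for \(w_{t,p}(k)\), tau[k] for \(\tau _{t,p}(k)\), and w_city[c] for the collected item weight at city c under p), the re-evaluation can be sketched in Python as follows; the function and argument names are ours.

```python
def evaluate_after_2opt(t2, b, e, w, tau, d, w_city, P_p, T_tp, R, speed):
    """N(t', p) for the reversed tour t2 in O(e - b) time (Lemma 8)."""
    wk, tk = w[b - 1], tau[b - 1]          # unchanged up to position b-1
    for k in range(b, e + 2):              # only positions b..e+1 change
        tk += d[t2[k - 1]][t2[k]] / speed(wk)  # leg into position k
        wk += w_city[t2[k]]                    # collect at position k
    T_new = T_tp + tk - tau[e + 1]         # later legs keep their old speeds
    return P_p - R * T_new
```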

Lemma 9

(BitFlip for KPC) Given a TTP solution \(\langle t, p\rangle \) with \(w_{t,p}(k)\) and \(\tau _{t,p}(k)\) for all positions, BitFlip(p, i) produces a new collection plan \(p'\) for which computing \(N(t,p')\) needs \({\mathcal {O}}(n - t(l_i))\) time.

Proof

In addition to computing \(P(p')\) in \({\mathcal {O}}(1)\) time, for all positions \(k\in [t(l_i),n-1]\) in t, we have to compute the knapsack weight \(w_{t,p'}(k)\), the travelling speed \(s_{t,p'}(k)\), and the travelling time up to each position \(\tau _{t,p'}(k+1)\). Then, the new total travelling time and the new objective values are computed as \(T(t,p')=\tau _{t,p'}(n)\) and \(N(t,p') = P(p')-R \times T(t,p')\), respectively. \(\square \)
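A matching sketch for Lemma 9, under the same caching assumptions, where pos stands for the tour position \(t(l_i)\) of item i's city:

```python
def evaluate_after_bitflip(t, p, i, pos, w, tau, d, w_i, pi_i, P_p, R, speed):
    """N(t, p') after flipping item i at tour position pos, in O(n - pos)."""
    delta = 1 - 2 * p[i]                    # +1 pick, -1 unpick
    P_new = P_p + pi_i * delta              # Lemma 2, O(1)
    tk = tau[pos]                           # time up to position pos unchanged
    for k in range(pos, len(t) - 1):        # later legs see the weight change
        tk += d[t[k]][t[k + 1]] / speed(w[k] + delta * w_i)
    return P_new - R * tk                   # tk is now tau_{t,p'}(n)
```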

3 Related work

TTP was introduced by Bonyadi et al. (2013) and later many benchmark instances were given by Polyakovskiy et al. (2014). Depending on whether cities and items are dynamically made available for visiting or collection, TTP is of two types: dynamic and static. For dynamic TTP, we refer to a recent article by Sachdeva et al. (2020). In this paper, we mainly deal with static TTP solving: all cities must be visited and all items are available all the time. The thief decides whether particular cities are to be visited first or particular items are to be collected. Existing TTP solvers can be grouped into five main categories: (i) constructive methods, (ii) fixed-tour methods, (iii) cooperative methods, (iv) full-encoding methods, and (v) hyper-heuristic methods. We give an overview of each category below. For further details, we also refer the reader to a recent review article (Wagner et al. 2018).

3.1 Constructive methods

In constructive methods, an initial cyclic tour is generated (TSP) using the Chained Lin-Kernighan heuristic (Applegate et al. 2003). The cyclic tour is then kept unchanged while collection plans are generated (KPC) using item scores based on their weight, profit, and position in the cyclic tour. This category includes greedy approaches such as Simple Heuristic (Polyakovskiy et al. 2014), Density-Based Heuristic (Bonyadi et al. 2014), Insertion (Mei et al. 2014), and PackIterative (Faulkner et al. 2015). These approaches have been used in restart-based algorithms such as S5 (Faulkner et al. 2015) and in the initialisation phase of other methods.

3.2 Fixed-tour methods

In fixed-tour methods, after generating an initial cyclic tour (TSP) using constructive methods, an exact or an approximate method is used to find a collection plan (KPC). Exact methods (Wu et al. 2017; Polyakovskiy and Neumann 2017) using dynamic programming or mixed integer programming can find the best collection plan for every given cyclic tour. However, these methods cannot solve large instances in a reasonable time. Approximate methods (Polyakovskiy et al. 2014; Faulkner et al. 2015; Maity and Das 2020) iteratively improve the collection plan by applying the BitFlip operator to one or more items in each iteration. Approximate methods can solve large instances in a reasonable time, although they do not guarantee finding the best collection plan for a given cyclic tour.

3.3 Cooperative methods

Cooperative methods are iterative approaches based on the cooperative coevolution approach (Potter and De Jong 1994). After generating an initial TTP solution using a constructive or a fixed-tour method, the cyclic tours and the collection plans are explored by two separate search modules for TSPC and KPC. These two search modules are executed by a meta-optimiser in an interleaving fashion so that their interdependent nature is somewhat considered (Wagner et al. 2018). Some well-known cooperative methods are Cooperative Solver (CoSolver) (Bonyadi et al. 2014), CoSolver with 2OPT and Simulated Annealing (CS2SA) (El Yafrani and Ahiod 2016), CS2SA with offline instance-based parameter tuning (CS2SA*) (El Yafrani and Ahiod 2018), and CoSolver with reverse order item selection (RWS) (Zhang et al. 2021). Moreover, a surrogate assisted cooperative solver (Namazi et al. 2020) approximates the final TTP objective value for any given initial TSP tour without finding the final solution; based on the approximation, non-promising initial solutions are discarded and thus more solutions are considered within a given time budget.

3.4 Full-encoding methods

Full-encoding methods consider the problem as a whole. Well-known full-encoding methods include a Memetic Algorithm with Two-stage Local Search (MATLS) (Mei et al. 2014), a Memetic algorithm with Edge-Assembly and 2-Points crossover operators (MEA2P) (Wuijts and Thierens 2019), a swarm intelligence algorithm (Wagner 2016) based on the max-min ant system (Stützle and Hoos 2000), a memetic algorithm with 2OPT and BitFlip search (El Yafrani and Ahiod 2016), another memetic algorithm with joint 2OPT and BitFlip (El Yafrani and Ahiod 2017) in which BitFlip is applied to just one item each time a 2OPT operator is applied to a cyclic tour, and an evolutionary algorithm using typical TSP and KP operators but maintaining quality solutions over epochs (Nikfarjam et al. 2022).

Overall, full-encoding methods do not perform well beyond a few hundred cities and a few thousand items due to search space explosion.

3.5 Hyper-heuristic methods

In hyper-heuristic based methods, genetic programming (GP) is usually used to generate or select low level heuristics for cyclic tour or collection plan exploration. One GP method (Mei et al. 2015) generates two packing heuristics for collection plans. An individual in each generation is a tree with internal nodes being simple arithmetic operators and leaf nodes being numerical parameters of a given TTP. Other GP methods (Martins et al. 2017; El Yafrani et al. 2018) learn how to select a sequence of low level heuristics for TSPC or KPC. One of these methods (Martins et al. 2017) uses Bayesian networks with low level heuristics as the networks of individuals in each generation. The other (El Yafrani et al. 2018) has trees as individuals in each generation, with internal nodes as functions and low level heuristics as leaf nodes. A recent random and reward based hyper-heuristic method (Ali and Mohamedkhair 2020) uses 23 operators and 4 ways to choose among them, but evaluates the method on only 9 problem instances. Overall, hyper-heuristic methods do not perform well beyond a few hundred cities and a few thousand items since the search space becomes very large.

4 TTP search framework

As we see from the TTP literature, existing TTP methods have very little to no explicit coordination between the selection decisions made for cyclic tours and for collection plans. In this paper, we propose coordination based methods for TTP. Our proposed approaches for TSPC select moves that explore cyclic tours and collection plans in a coordinated fashion, explicitly based on their potential mutual effects. Also, our proposed approach for KPC selects marginally profitable items to explore collection plans with respect to the cyclic tour selected earlier. We embed our coordination based approaches within the 2OPT and BitFlip operators used in exploring cyclic tours and collection plans. Our proposed coordination based approaches thus improve the effectiveness of the search for TTP solutions.

Note that our proposed approaches could also be viewed as cooperative approaches since our algorithm also moves between cyclic tour exploration and collection plan exploration in an interleaving fashion. Moreover, our proposed approaches for TSPC are also like full-encoding methods since they make changes to both cyclic tours and collection plans at the same time.

[Algorithm 1]

Algorithms 1 and 2 describe the TTP search framework that we use in evaluating our proposed coordination based approaches. The search framework is similar to the cooperative coevolution approach (Potter and De Jong 1994; El Yafrani and Ahiod 2018). It has three main functions: TTPS, TSPS, and KPS. The framework allows customisation of its various parts to facilitate the development of TTP search methods with or without coordination.

Below we list the abbreviations used in the proposed search framework.

TTPS: The main TTP search function in Algorithm 1
TSPS: The TSP component search function in Algorithm 2
KPS: The KP component search function in Algorithm 2
NOCH: No coordination heuristic in Sect. 4.4
SGCH: Search guided coordination heuristic in Sect. 5.2
PGCH: Profit guided coordination heuristic in Sect. 5.4
CISH: Coordinated item selection heuristic in Sect. 5.5
IPR: Item profitability ratio defined in Sect. 5.3
LCIPR: The lowest collected item profitability ratio defined in Sect. 5.3
HUIPR: The highest uncollected item profitability ratio defined in Sect. 5.3
SBFS: Standard bit-flip search in Sect. 4.4
MBFS: Marginal bit-flip search in Sect. 5.5
NLBC: Non-linear binary classifier in Sect. 5.6
LGCH: Learning guided coordination heuristic in Sect. 5.6

4.1 Function TTPS

Function TTPS in Algorithm 1 has two loops, one inside the other. The outer loop runs for a given timeout limit. Each iteration of the outer loop restarts the search from scratch. Inside the outer loop, first an initial cyclic tour t and an initial collection plan p for t are generated. Function ChainedLinKernTour generates the initial cyclic tour using the Chained Lin-Kernighan heuristic (Applegate et al. 2003). Then, Function InitCollectionPlan generates the initial collection plan by taking the best of the solutions returned by the PackIterative (Faulkner et al. 2015) and Insertion (Mei et al. 2014) methods. Once a complete TTP solution \(\langle t, p\rangle \) is thus obtained, the inner loop refines that solution in an iterative fashion. In each iteration of the inner loop, Functions TSPS and KPS are invoked in an interleaving fashion to improve the cyclic tour and the collection plan. The inner loop terminates when the objective value does not change between two successive iterations.

[Algorithm 2]

4.2 Function TSPS

Function TSPS in Algorithm 2 is a steepest ascent hill-climbing method. Inside the main loop, from the current solution \(\langle t, p\rangle \), a new solution \(\langle t', p'\rangle \) is generated using the neighbourhood operator 2OPT and the coordination function CoordHeu for each tour segment t[b,e], where \(b \in [1,n-2]\) and \(t_e\) is in the precomputed Delaunay triangulation (Delaunay 1934) neighbourhood array DelaTriNeighb of \(t_{b}\). The best of the newly generated solutions that are better than the current solution is accepted as the current solution for the next iteration of the main loop. Note that the main loop of each invocation of the function continues as long as the improvement in the objective value is at least \(\alpha \%\) with respect to the objective value computed at the start of the loop (Dueck 1993). Here, \(\alpha \) essentially controls when to switch from the TSP component to the KP component. After initial experiments, we set \(\alpha = 0.01\).

Notice that in Function TSPS, after the call to Operator 2OPT, there is a call to the coordination function CoordHeu. Operator 2OPT makes changes only to the cyclic tour. When no change in the collection plan is sought after Operator 2OPT, Function CoordHeu is defined to simply return p as \(p'\). However, in this paper, considering coordination between the TSP and KP components, we design alternative coordination functions to be used as Function CoordHeu. We describe these alternative functions later.

4.3 Function KPS

Function KPS in Algorithm 2 starts with an initial subset \(I'\) of items selected by Function SelectItemsSubset based on a given tour segment t[b,e] in a solution \(\langle t, p\rangle \). The loop in Function KPS runs until BitFlip has been applied to all items in \(I'\) without any improvement in the objective since the latest change in the collection plan. In each iteration of the loop, one previously unchecked item i from \(I'\) is randomly selected and \(p_i\) is flipped using BitFlip(p, i). The change in \(p_i\) is accepted if it improves the objective. Note that every time a change in p is thus accepted, \(I'\) is computed again by Function SelectItemsSubset and all items in the new \(I'\) are marked unchecked. This in essence restarts the KP search within the same loop. Functions MarkAllItemsUnchecked, AllItemsChecked, RandomUncheckedItem, and MarkItemChecked are respectively for marking all items in \(I'\) unchecked, testing whether all items in \(I'\) are already checked, randomly selecting an unchecked item i from \(I'\), and marking a selected item i as checked. We do not further describe these obvious functions.

For Function SelectItemsSubset, we could typically use all items in the given tour segment t[b,e], or just some of them. Considering coordination between the TSP and KP components, later in this paper we propose strategies to select a subset of items from a tour segment.

4.4 Baseline solver version

Our baseline TTP solver has no explicit coordination between the TSP and KP components. As shown in Algorithm 3, for Function CoordHeu, we use Function NoCoordHeu that just returns p, making no change at all, and for Function SelectItemsSubset, we use Function SelectTourSegmentItems that just returns I(t[b,e]), i.e. the set of all items available in the tour segment t[b,e]. For convenience, in discussing the experimental results, we denote the approach using Function NoCoordHeu by NOCH. Note that Function TSPS with Function NoCoordHeu is almost the same as the method used for solving the TSP component by El Yafrani and Ahiod (2018). Also, note that Function KPS with Function SelectTourSegmentItems is called the standard bit-flip search (SBFS) (Polyakovskiy et al. 2014; Faulkner et al. 2015) algorithm for solving the KP component in TTP.

[Algorithm 3]

5 Proposed coordination approaches

We give a motivating example to show how coordination helps evaluate a cyclic tour better in TTP. We also characterise Operator 2OPT to find the reasons behind its poor coordination behaviour. We develop our coordination based heuristics for TTP on top of the search framework in Algorithms 1 and 2. We develop three alternative approaches to be used within Function TSPS and one alternative approach to be used within Function KPS. The three coordination approaches used to define Function CoordHeu within Function TSPS are local search based, human designed, and machine learning based. The other coordination approach, used within Function KPS, is a strategy for selecting items in Function SelectItemsSubset.

5.1 Observing coordination effect after 2OPT

In Function TTPS in Algorithm 1, Function TSPS and Function KPS are invoked in an interleaving fashion. In the baseline algorithm in Sect. 4.4, after Operator 2OPT is called in Function TSPS, Function NoCoordHeu is used as Function CoordHeu. This means no change in the collection plan is made after changing the cyclic tour. The example below shows that such an approach results in incorrect or misleading evaluations of the TTP solutions by Function TSPS.

Fig. 2

From the scenario in Fig. 1 with \(t = \langle 1,2,3,4,5,1\rangle \), \(p = \langle 0,0,1,1\rangle \), and \(N(t,p)= 4\), (Left) only 2OPT is applied on t to get \(t' = \langle 1,4,3,2,5,1\rangle \) and (Right) after 2OPT is applied, p is also changed to \(p'=\langle 1,0,0,1\rangle \)

Consider the TTP example in Fig. 1 and the solution comprising cyclic tour \(t = \langle 1, 2, 3, 4, 5, 1\rangle \) and collection plan \(p = \langle 0,0,1,1\rangle \) having the objective value \(N(t,p) = 4\). When Operator 2OPT is applied on the cyclic tour t to reverse the tour segment \(\langle 2,3,4\rangle \) to \(\langle 4,3,2\rangle \), the resultant cyclic tour is \(t' = \langle 1,4,3,2,5,1\rangle \). Figure 2 (Left) shows that if the collection plan p is not changed when t changes to \(t'\), the objective value \(N(t',p) = -1.5\) is used to evaluate \(t'\). In this case, there is no explicit coordination between the cyclic tour and the collection plan. Then, Fig. 2 (Right) shows that if the collection plan p is also changed to \(p' = \langle 1,0,0,1\rangle \) after t is changed to \(t'\), the objective value \(N(t',p') = 6\) is used to evaluate \(t'\). There is coordination here between the cyclic tour and the collection plan. This example clearly shows that the potential of a cyclic tour is better reflected when the collection plan is adjusted along with the cyclic tour, and so coordination is needed. To put this in perspective, with \(N(t',p) = -1.5\), the resultant tour \(t'\) could easily be rejected during search, while with \(N(t',p') = 6\), the same resultant tour \(t'\) could easily be accepted. With an interleaving approach of invoking Functions TSPS and KPS, existing TTP methods thus do not properly evaluate generated TTP solutions and consequently suffer from not having a proper search direction. In this paper, we argue that for better coordination between the two TTP components, the quality of each cyclic tour or collection plan should be evaluated along with the best possible corresponding collection plan or cyclic tour, and not against only the current one. Our arguments are equally applicable to both TTP components. However, in this paper, we mainly evaluate cyclic tours against the best possible collection plans. This is because the 2OPT operator used in Function TSPS can make changes to many cities in large tour segments while the BitFlip operator used in Function KPS changes only one item in the collection plan, and we put more emphasis on the large changes. For an operator making large changes in Function KPS, one could likewise evaluate collection plans against the best possible cyclic tours.

Below we define the quality of a cyclic tour and prove its time complexity.

Definition 9

(Cyclic Tour Quality) The quality \(Q(t) = \max _pN(t,p)\) of a cyclic tour t in TTP is the maximum objective value \(N(t,p)\) over all possible collection plans p.

Lemma 10

(Cyclic Tour Quality) Computing quality Q(t) for a cyclic tour t in TTP is NP-Hard.

Proof

Computing Q(t) is essentially solving KPC and so is NP-Hard as per Lemma 7. \(\square \)

From the above lemma, it is clear that invoking a complete search for collection plans for each and every tour segment reversal for a given cyclic tour is not feasible within a given timeout limit. So we need an incomplete search or a heuristic.

5.2 Local search based coordination

Since computing Q(t) for a given t is very hard, as shown above, we want to obtain an estimate of Q(t). For this, in this paper, we propose to invoke a local search based incomplete approach. We name our proposed approach the Search Guided Coordination Heuristic (SGCH) for TTP.

SGCH Implementation Algorithm 4 shows the implementation of our proposed SGCH approach on top of the search framework. For Function SelectItemsSubset in Function KPS, we define Function SelectTourSegmentItems to return I(t[b,e]), and for Function CoordHeu in Function TSPS, we define Function SearchGuidedCoordHeu to return the collection plan produced by Function KPS by exploring the items \(I(t'[b,e])\). Notice that Function KPS is called twice for SGCH with the same definition of Function SelectTourSegmentItems: once in Function TTPS for \(t[1,n-1]\), i.e. for the entire tour, and again in Function SearchGuidedCoordHeu called from Function TSPS for each reversed tour segment \(t'[b,e]\).

[Algorithm 4]

5.3 Characterising 2OPT coordination behaviour

We characterise the coordination behaviour of Operator 2OPT using item profitability ratio (IPR). Below we formally define IPR.

Definition 10

(Item Profitability Ratio) For an item i, the item profitability ratio (IPR) is \(r_i = \pi _i/w_i\). An item i is more profitable than an item \(i'\) if \(r_i > r_{i'}\), or if \(\pi _i > \pi _{i'}\) when \(r_i = r_{i'}\).

Greedy constructive KP heuristics typically collect items in non-increasing order of IPR. Constructive TTP heuristics also exhibit similar trends. To describe these trends, we need two functions. For a given sequence of numbers and a given position, one of the functions returns the smallest number from the beginning up to the given position while the other returns the largest number from the end down to the given position.

Definition 11

(Prefix Minimum) Given a sequence \(s=\langle s_1, s_2, \ldots , s_n\rangle \) of n numbers \(s_k\) with \(k\in [1,n]\), the prefix minimum function \(\Pi \) is defined by \(\Pi (s, k) = \min (\Pi (s, k-1), s_k)\) when \(1<k\le n\) and \(\Pi (s,1) = s_1\). Using the definition, we also get the prefix minimum sequence \(s'=\Pi (s) = \langle \Pi (s,1), \Pi (s,2), \ldots , \Pi (s,n) \rangle \) for a given sequence s in O(n) time. For example, the prefix minimum sequence \(s'\) is \(\Pi (s)=\langle 9, 6, 6, 4, 4, 4\rangle \) when the given sequence s is \(\langle 9, 6, 8, 4, 5, 7\rangle \).

Definition 12

(Suffix Maximum) Given a sequence \(s=\langle s_1, s_2, \ldots , s_n\rangle \) of n numbers \(s_k\) with \(k\in [1,n]\), the suffix maximum function \(\Omega \) is defined by \(\Omega (s, k) = \max (s_k,\Omega (s, k+1))\) when \(1\le k < n\) and \(\Omega (s,n) = s_n\). Using the definition, we also get the suffix maximum sequence \(s''=\Omega (s) = \langle \Omega (s,1), \Omega (s,2), \ldots , \Omega (s,n) \rangle \) for a given sequence s in O(n) time. For example, the suffix maximum sequence \(s''\) is \(\Omega (s)=\langle 9, 8, 8, 7, 7, 7\rangle \) when the given sequence s is \(\langle 9, 6, 8, 4, 5, 7\rangle \).
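Both sequences can be computed in a single linear pass each; the following Python sketch (function names ours) reproduces the running examples from Definitions 11 and 12.

```python
def prefix_min(s):
    """Pi(s): running minimum from the left, O(n) time."""
    out, cur = [], float('inf')
    for x in s:
        cur = min(cur, x)
        out.append(cur)
    return out

def suffix_max(s):
    """Omega(s): running maximum from the right, O(n) time."""
    out, cur = [], float('-inf')
    for x in reversed(s):
        cur = max(cur, x)
        out.append(cur)
    return out[::-1]

assert prefix_min([9, 6, 8, 4, 5, 7]) == [9, 6, 6, 4, 4, 4]
assert suffix_max([9, 6, 8, 4, 5, 7]) == [9, 8, 8, 7, 7, 7]
```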

5.3.1 IPR Trends in TTP

As already mentioned above, constructive KP heuristics exhibit a non-increasing trend in IPR. In TTP, items are scattered over cities and the item collection order is restricted by the city visiting order in the cyclic tour. Therefore, constructive greedy TTP methods such as PackIterative (Faulkner et al. 2015) and Insertion (Mei et al. 2014) use IPRs along with the distances of the respective cities from the end of the cyclic tour in constructing the collection plan. So a monotonic non-increasing trend in the IPRs of collected items is not expected in TTP solutions. However, given a cyclic tour, within each city we can reasonably expect items to be collected in non-increasing order of IPR, unless there is not enough space for a highly profitable but heavy item. This could be a key guideline for obtaining collection plans in TTP. Below we define the lowest collected IPR (LCIPR) and the highest uncollected IPR (HUIPR) for each city in a TTP solution.

Definition 13

(Lowest Collected IPR) Given a TTP solution \(\langle t, p\rangle \), for a city \(t_k\) at position k in the cyclic tour t, the lowest collected IPR is \(L(t,p,k) = \min _{i \in I(t_k) \wedge p_i = 1} r_i\). We then define a series of LCIPRs as \(L(t,p) = \langle L_1, L_2,\ldots ,L_{n-1}\rangle \) where \(L_k = L(t,p,k)\). Using Definition 11, we further define a prefix minimum function \(\Pi (L(t,p), k)\) and a prefix minimum sequence \(\Pi (L(t,p))\).

Definition 14

(Highest Uncollected IPR) Given a TTP solution \(\langle t, p\rangle \), for a city \(t_k\) at position k in the cyclic tour t, the highest uncollected IPR is \(H(t,p,k) = \max _{i \in I(t_k) \wedge p_i = 0} r_i\). We then define a series of HUIPRs as \(H(t,p) = \langle H_1, H_2,\ldots ,H_{n-1}\rangle \) where \(H_k = H(t,p,k)\). Using Definition 12, we further define a suffix maximum function \(\Omega (H(t,p),k)\) and a suffix maximum sequence \(\Omega (H(t,p))\).
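Building on the prefix_min and suffix_max helpers sketched earlier, the two per-city sequences can be constructed as follows; assigning \(+\infty \) (resp. \(-\infty \)) to cities with no collected (resp. no uncollected) items is our neutral-value convention, as the paper leaves such entries undefined.

```python
def lcipr_huipr(t, p, items, r):
    """L(t, p) and H(t, p) for positions 1..n-1 of tour t; r[i] is the IPR."""
    L, H = [], []
    for k in range(1, len(t) - 1):
        collected = [r[i] for i in items[t[k]] if p[i] == 1]
        uncollected = [r[i] for i in items[t[k]] if p[i] == 0]
        L.append(min(collected) if collected else float('inf'))
        H.append(max(uncollected) if uncollected else float('-inf'))
    return L, H       # feed to prefix_min(L) and suffix_max(H) above
```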

5.3.2 LCIPR and HUIPR Trends in TTP

Figure 3 (Left) shows the LCIPR sequence L(t,p) and the HUIPR sequence H(t,p) for a TTP solution returned by PackIterative (Faulkner et al. 2015) for a benchmark instance eil76_n750_uncorr_10.ttp. Clearly, L(t,p) and H(t,p) do not exhibit any monotonic trends. However, L(t,p) does exhibit an overall decreasing trend from low to high city positions and H(t,p) does exhibit an overall increasing trend from high to low city positions. Notice that the prefix minimum sequence \(\Pi (L(t,p))\) and the suffix maximum sequence \(\Omega (H(t,p))\) in effect capture these two overall trends respectively. Moreover, both of these trend lines are monotonic, although one runs in the forward direction and the other in the backward direction.

Fig. 3

City position in a tour (x-axis) vs IPR (y-axis) for (Left) a TTP solution with objective value 77544.88 as obtained by the PackIterative method for a benchmark instance eil76_n750_uncorr_10.ttp and for (Right) the solution with objective value 72151.46 as obtained after the application of operator 2OPT on the PackIterative generated solution for the same TTP instance on the cities between positions 39 and 74, keeping the collection plan unchanged meaning considering no coordination

5.3.3 Disruptions in Trends by 2OPT

On the TTP solution \(\langle t,p\rangle \) shown in Fig. 3 (Left), if we apply Operator 2OPT to reverse the cities in the tour segment between positions 39 and 74, keeping the collection plan p unchanged, we get a new solution \(\langle t',p\rangle \), which is shown in Fig. 3 (Right). Notice that the 2OPT operator affects the prefix minimum sequence \(\Pi (L(t',p))\) and the suffix maximum sequence \(\Omega (H(t',p))\) in the reversed tour segment of \(t'\) for \(39 \le k \le 74\). Further notice that the objective value 72151.46 of the resultant solution \(\langle t',p\rangle \) is smaller than the objective value 77544.88 of the solution \(\langle t,p\rangle \). This means the resultant solutions in such cases would mostly be rejected by the search algorithm. The degradation of the objective value by the 2OPT operator arises because, in the resultant tour, less profitable items are collected in the cities furthest from the end of the tour, causing more travelling time. As shown before in Fig. 3 (Left), this was not the case in the solution before the application of the 2OPT operator. So it is somewhat clear that the 2OPT operator results in deviation from the typical trends of \(\Pi (L(t,p))\) and \(\Omega (H(t,p))\) in TTP.

5.4 Human designed intuitive coordination

Although the local search based approach mentioned before is a way to obtain an estimate of the quality \(Q(t')\) of a generated cyclic tour \(t'\), invoking the local search method for each generated cyclic tour \(t'\) would be costly. So in this paper, we design a heuristic approach to obtain a modified collection plan \(p'\) that is used to estimate the quality value of the generated cyclic tour \(t'\). We name our proposed approach the Profit Guided Coordination Heuristic (PGCH) for TTP. The proposed approach tries to fix the disruptions that Operator 2OPT causes in the trends of the collection plan.

Fig. 4

City positions in a tour (x-axis) vs IPR (y-axis) when items in the solution shown in Fig. 3 (Right) are picked or unpicked. Top-Left: green shaded regions denote collected items that should be unpicked. Top-Right: green shaded collected items in the Top-Left figure are now unpicked. Bottom-Left: blue shaded regions denote uncollected items that should be picked. Bottom-Right: blue shaded uncollected items in the Bottom-Left figure are picked

5.4.1 Fixing Trends after 2OPT using PGCH

After applying 2OPT as shown in Fig. 3 (Right), we fix the deviations of trends in the resultant \(\Pi (L(t',p))\) and \(\Omega (H(t',p))\) by using \(\Pi (L(t,p))\) and \(\Omega (H(t,p))\) from Fig. 3 (Left) as references. As per PGCH, at any city position k in the reversed tour segment, a collected item i having IPR \(r_i\) below \(\Pi (L(t,p),k)\) should be unpicked. Such less profitable collected items are shown as green shaded regions in Fig. 4 (Top-Left), and the result of unpicking them, a changed collection plan \({\bar{p}}\), is shown in Fig. 4 (Top-Right). Further, as per PGCH, at any city position k in the reversed tour segment, an uncollected item i having IPR \(r_i\) above \(\Omega (H(t,p),k)\) should be picked. Such more profitable uncollected items in the changed collection plan \({\bar{p}}\) in Fig. 4 (Top-Right) are shown as blue shaded regions in Fig. 4 (Bottom-Left), and the result of picking them, a further changed collection plan \(p'\), is shown in Fig. 4 (Bottom-Right). Altogether, Fig. 4 (Top-Right) and (Bottom-Right) show that such unpicking and picking of the items in the shaded regions fixes the trends of \(\Pi (L(t',p))\) and \(\Omega (H(t',p))\) by changing the collection plan p first to \({\bar{p}}\) and then ultimately to \(p'\), thus yielding \(\Pi (L(t',p'))\) and \(\Omega (H(t',p'))\). Figure 5 (Left) and (Right) respectively show the solutions \(\langle t',p\rangle \) and \(\langle t',p'\rangle \) with respective objective values 72151.46 and 78252.18 while the objective value for \(\langle t,p\rangle \) is 77544.88. So the solution \(\langle t',p'\rangle \) could be accepted by the search while \(\langle t',p\rangle \) could be rejected.

5.4.2 Overall Comments on PGCH

While the above example shows PGCH helping improve the objective value, this is not true in general. This is because, as mentioned earlier, finding the best collection plan \(p'\) to compute the quality value \(Q(t')\) of the generated cyclic tour \(t'\) is eventually an NP-Hard problem, whereas PGCH is just an approximation heuristic. In fact, PGCH might result in a decrease in the objective value when (i) distances between cities, not just positions as used in PGCH, also affect the objective value, (ii) not all of the less profitable items in earlier positions should necessarily be unpicked, or (iii) not enough highly profitable items are available in the later positions of the reversed tour segment, or such items may not have been picked by PGCH. Regardless of underestimation or overestimation of the objective value, the above positive example simply makes the point that PGCH after 2OPT helps better evaluate the potential of a changed cyclic tour \(t'\) than 2OPT alone does.

Fig. 5

City positions in a tour (x-axis) vs IPR (y-axis) (Left) when Operator 2OPT has just been applied to t keeping p unchanged and (Right) when p is changed using PGCH after applying Operator 2OPT. The Left figure is the same as Fig. 3 (Right) and the Right figure is the same as Fig. 4 (Bottom-Right) but cities outside the reversed segment and the prefix minimum of LCIPR are also shown. The objective value for the Left solution is 72151.46 and that for the Right solution is 78252.18 while that for the solution in Fig. 3 (Left) before applying 2OPT is 77544.88

5.4.3 PGCH Implementation

Algorithm 5 shows the implementation of our PGCH approach. In the first loop, collected items that have IPR below \(\Pi (L(t,p),k)\) are unpicked and in the second loop, uncollected items that have IPR above \(\Omega (H(t,p),k)\) are picked.

[Algorithm 5]
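For illustration, the following Python fragment is a hedged sketch of the two PGCH loops over a reversed segment, not the paper's exact Algorithm 5: PiL[k] and OmH[k] stand for \(\Pi (L(t,p),k)\) and \(\Omega (H(t,p),k)\) of the solution before 2OPT, indexed by tour position, and the capacity check during picking is our assumption.

```python
def pgch(t2, b, e, p, items, r, w, W, PiL, OmH, total_weight):
    """Sketch of PGCH over the reversed segment t2[b..e] (cf. Algorithm 5)."""
    p2 = list(p)
    for k in range(b, e + 1):                        # loop 1: unpick
        for i in items[t2[k]]:
            if p2[i] == 1 and r[i] < PiL[k]:
                p2[i], total_weight = 0, total_weight - w[i]
    for k in range(b, e + 1):                        # loop 2: pick
        for i in items[t2[k]]:
            if p2[i] == 0 and r[i] > OmH[k] and total_weight + w[i] <= W:
                p2[i], total_weight = 1, total_weight + w[i]
    return p2
```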

Lemma 11

(PGCH) Given a TTP solution \(\langle t', p\rangle \) obtained after applying Operator 2OPT on solution \(\langle t,p\rangle \), Algorithm 5 computes \(p'\) and so a new solution \(\langle t',p'\rangle \). Computing \(p'\) and then \(N(t',p')\) takes \(O(n - b + |I(t'[b,e])|)\) time in total.

Proof

For each application of Algorithm 5, the two loops run \(O(|I(t'[b,e])|)\) times in total. Then, to compute \(N(t',p')\), as an approximation of \(Q(t')\), for each city from positions b to \(n-1\) in \(t'\), we need to compute the knapsack weight and the travelling speed, which needs \(O(n-b)\) time. Thus, PGCH and the computation of \(N(t',p')\) take \(O(n-b + |I(t'[b,e])|)\) time in total. \(\square \)

Note that the PGCH implementation requires computation of \(\Pi (L(t,p))\) and \(\Omega (H(t,p))\) for the current solution \(\langle t,p\rangle \) at the beginning of each iteration of the main loop in Function TSPS. Below we show the time complexity of this.

Lemma 12

(Trend Lines) Computing \(\Pi (L(t,p))\) and \(\Omega (H(t,p))\) for any solution \(\langle t,p\rangle \) requires \(O(n+m)\) time.

Proof

Based on Definitions 11 and 12, computing these sequences for any solution \(\langle t,p\rangle \) needs considering all items in O(m) time and considering all cities in O(n) time. So, the total needed time is \(O(n+m)\). \(\square \)

5.5 Coordination based item selection

In Function KPS when called from Function TTPS in Algorithm 1, Function SelectItemsSubset is by default defined by Function SelectTourSegmentItems, which returns all items in \(I(t[1,n-1])\), i.e. all items in I, in an uncoordinated fashion. As mentioned before, this is called the standard bit-flip search (SBFS). However, SBFS leads to an unguided exploration of the collection plans. In this paper, we present a targeted form of bit-flip search and name it marginal bit-flip search (MBFS). MBFS restricts the items to be explored using the cyclic tour in a coordinated fashion. We call our proposed approach for the selection of the items to be explored the Coordinated Item Selection Heuristic (CISH) and define Function SelectItemsSubset accordingly.

Before going into further details, let us define marginally collected and marginally uncollected items in a given tour segment for a given TTP solution. The marginally collected items have the lowest collected IPRs at the cities where the prefix minimum sequence changes as we move from low to high positions in the cyclic tour. Similarly, the marginally uncollected items have the highest uncollected IPRs at the cities where the suffix maximum sequence changes as we move from high to low positions.

Definition 15

(Marginally Collected Item) Given a TTP solution \(\langle t,p\rangle \) and a tour segment t[be], an item \(i \in I(t[b,e])\) is a marginally collected item if there exists \(k: i \in I(t_k)\) such that \(r_i = L(t,p,k) = \Pi (L(t,p),k)\) and there exists no \(k': b\le k'< k\) such that \(L(t,p,k') = L(t,p,k)\).

Definition 16

(Marginally Uncollected Item) Given a TTP solution \(\langle t,p\rangle \) and a tour segment t[be], an item \(i \in I(t[b,e])\) is a marginally uncollected item if there exists \(k: i \in I(t_k)\) such that \(r_i = H(t,p,k) = \Omega (H(t,p),k)\) and there exists no \(k': k < k' \le e\) such that \(H(t,p,k') = H(t,p,k)\).

Our CISH approach, in case of using MBFS in Function KPS, considers unpicking only marginally collected items and picking only marginally uncollected items. Algorithm 6 shows Function SelectMarginalItems, which implements the CISH approach. This function returns at most one arbitrarily selected marginally collected item and at most one arbitrarily selected marginally uncollected item from each city in the tour segment t[b,e], even though multiple marginally collected or marginally uncollected items could exist in a city.

[Algorithm 6]
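A literal, unoptimised Python rendering of Definitions 15 and 16 could look as follows; the argument names are ours, with L, H, PiL, and OmH indexed by tour position, and the infinity guards reflect the neutral-value convention used in our earlier sketch.

```python
def select_marginal_items(t, p, b, e, items, r, L, H, PiL, OmH):
    """Sketch of Function SelectMarginalItems for tour segment t[b..e]:
    at most one marginal item of each kind per qualifying city."""
    selected = []
    for k in range(b, e + 1):
        # marginally collected: the lowest collected IPR here meets the
        # prefix minimum and does not occur at an earlier segment position
        if (L[k] == PiL[k] and L[k] != float('inf')
                and all(L[j] != L[k] for j in range(b, k))):
            selected.append(next(i for i in items[t[k]]
                                 if p[i] == 1 and r[i] == L[k]))
        # marginally uncollected: the highest uncollected IPR here meets the
        # suffix maximum and does not recur at a later segment position
        if (H[k] == OmH[k] and H[k] != float('-inf')
                and all(H[j] != H[k] for j in range(k + 1, e + 1))):
            selected.append(next(i for i in items[t[k]]
                                 if p[i] == 0 and r[i] == H[k]))
    return selected
```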

We now analyse the time complexity of applying Operator BitFlip, in case of using MBFS in Function KPS, on a marginally collected or uncollected item and subsequently updating the prefix minimum and suffix maximum sequences.

Lemma 13

Applying the BitFlip operator on a marginally collected or uncollected item requires \(O(n+m)\) time to recompute \(\Pi (L(t,p))\) and \(\Omega (H(t,p))\) and thus update marginally collected and uncollected items.

Proof

The proof is obvious from Definitions 11, 12, 15, and 16. \(\square \)

Note that Function SelectMarginalItems could not be used when Function KPS is called from Function SearchGuidedCoordHeu as part of the SGCH implementation in Algorithm 4. The reason is that after applying Operator 2OPT, calling Function SelectMarginalItems does not find any marginally collected or uncollected items in \(I(t'[b,e])\) since the prefix minimum \(\Pi (L(t',p))\) and suffix maximum \(\Omega (H(t',p))\) sequences, as shown in Fig. 3 (Right), do not exhibit any changes within the reversed tour segment. As such, for SGCH, we could at best use Function SelectTourSegmentItems, as shown in Algorithm 4.

5.6 Machine learning based coordination

As shown in Fig. 3 (Left), the prefix minimum and suffix maximum sequences roughly demarcate collected and uncollected items, creating non-linear demarcation lines. As such, from a number of example TTP solutions generated for the same given TTP instance, a properly trained non-linear binary classifier (NLBC) could learn to classify an item as collected or uncollected at a given position of its city in a given cyclic tour, and the training could even be online and instance specific. After 2OPT in Function TSPS in Algorithm 2, we can then use the trained NLBC to decide which items in the reversed segment are to be collected and which are not. This essentially replaces the local search based or human designed intuitive coordination with machine learning based coordination. We name this proposed approach the Learning Guided Coordination Heuristic (LGCH).

Given the typical timeout limit of 10 minutes per problem instance, as is standard in the evaluation of TTP methods, it is difficult to train NLBC models online within the timeout limit and still leave time to use them during search for the remainder of that time. We nevertheless perform instance-specific online training within the timeout limit. Based on preliminary experiments, we keep the learning effort as low as practical and set the required parameter values as deemed appropriate. Below we describe the NLBC models, their training procedures, and their use during search.

5.6.1 Training and Validation Examples

For a given problem instance, we generate \(\dfrac{30}{\max _c|I(c)|}\) solutions for training and half that number of solutions for validation. The training and validation solutions are generated by using the Chained Lin-Kernighan heuristic (Applegate et al. 2003) for cyclic tours, followed by PackIterative (Faulkner et al. 2015) and Insertion (Mei et al. 2014) for the collection plans of those cyclic tours. Keeping the generated cyclic tours unchanged, the generated collection plans are further improved by running our proposed MBFS algorithm, and the improved collection plans are the ones actually used in training and validating the neural network. In this way, our learning model captures the characteristics of the initialisation and improvement of the collection plan by the MBFS algorithm. We then use the learning model to define Function CoordHeu used within Function TSPS in Algorithm 2. The actual inputs to the NLBC models are the normalised item profitability ratio \(\textsf {nipr}(i)=\dfrac{r_i}{\max _{i'}r_{i'}}\) of an item i and its normalised position \(\textsf {np}(i)=\dfrac{t(l_i)}{n}\) in the cyclic tour t of a TTP solution \(\langle t, p\rangle \). The actual output of the NLBC models is \(p_i\), denoting whether item i is collected in the collection plan p of the same TTP solution \(\langle t,p\rangle \). To be specific, the input features \(\textsf {nipr}(i)\) and \(\textsf {np}(i)\) of one item i are fed to the NLBC model at a time and \(p_i\) is predicted for that item. So the training examples comprise all items in all training solutions. However, the same pair \(\langle \textsf {nipr}(i), \textsf {np}(i)\rangle \) can appear in multiple solutions. So we take only the unique pairs from the generated training solutions and use the collection state \(p_i\) with the highest frequency over all the corresponding solutions when training the NLBC models. Conceptually, an NLBC model makes an overall prediction of whether an item should be collected when its city is at a certain position in a possible cyclic tour for the given TTP instance.
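To make the construction of the unique training pairs concrete, here is a minimal sketch under an assumed data layout: nipr[s][i], np[s][i], and label[s][i] hold the two features and the collection state \(p_i\) of item i in generated solution s. The structures and names are illustrative, not taken from the CoCo source.

```cpp
#include <map>
#include <utility>
#include <vector>

// One example per unique <nipr, np> pair, labelled with its most frequent
// collection state over all generated solutions.
struct PairStats { int collected = 0; int total = 0; };

std::vector<std::pair<std::pair<double, double>, int>>
buildExamples(const std::vector<std::vector<double>>& nipr,
              const std::vector<std::vector<double>>& np,
              const std::vector<std::vector<int>>& label) {
  std::map<std::pair<double, double>, PairStats> stats;
  for (std::size_t s = 0; s < nipr.size(); ++s)
    for (std::size_t i = 0; i < nipr[s].size(); ++i) {
      PairStats& st = stats[{nipr[s][i], np[s][i]}];  // deduplicates pairs
      st.collected += label[s][i];
      st.total += 1;
    }
  std::vector<std::pair<std::pair<double, double>, int>> examples;
  for (const auto& kv : stats)                        // majority label per pair
    examples.push_back({kv.first,
                        2 * kv.second.collected >= kv.second.total ? 1 : 0});
  return examples;
}
```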

5.6.2 Neural Networks as NLBCs

We use neural networks to represent NLBCs. With just two inputs \(\textsf {nipr}(i)\) and \(\textsf {np}(i)\) and one output \(p_i\) for any given item i, one could think of simpler statistical models. We choose neural networks because the output for a given i depends not only explicitly on \(\textsf {nipr}(i)\) and \(\textsf {np}(i)\) but also implicitly on the input features of the other items. In our view, neural networks, through their weights, are a promising means to accumulate these implicit dependencies over items and to generalise over problem instances. As for using more input features, we tried incorporating distance, but in our preliminary experiments, city positions appeared more promising than the distances of the cities from City 1 in the forward or backward direction. Having said this, we acknowledge that further experiments with various machine learning techniques and input features are needed before drawing more definitive conclusions in this regard. We emphasise that our main focus in developing LGCH is to show that a machine learning approach can effectively capture the characteristics of our human-designed coordination heuristics. Better performing machine learning approaches could no doubt be developed, but we consider that out of scope for this paper.

5.6.3 Neural Network Architecture and Training

Figure 6 shows the architecture of the neural network. It has three layers: one input layer, one hidden layer, and one output layer. The first two layers have \(\ln m\) neurons each, where m is the number of items. The last layer has a single neuron. We use the rectified linear unit (ReLU) activation function in the neurons of the first two layers and the sigmoid activation function in the neuron of the last layer. We use the feed-forward neural network implementation of the mlpack C++ library (Curtin et al. 2023) with its default optimiser. We train the same neural network architecture 10 times to get 10 separately trained models for each TTP instance. We then take the best trained model in terms of the number of correctly classified pairs \(\langle \textsf {nipr}(i), \textsf {np}(i)\rangle \) on the validation examples. Henceforth, we refer to the best trained neural network as the neural network \({\mathcal {N}}\) and write \({\mathcal {N}}(\textsf {nipr}(i),\textsf {np}(i)) = p_i\) to denote its prediction \(p_i\) for the pair \(\langle \textsf {nipr}(i),\textsf {np}(i)\rangle \).

Fig. 6: The neural network architecture used to represent NLBCs in our proposed LGCH. The architecture has three layers, with two input features \(\textsf {nipr}(i) = \dfrac{r_i}{\max _{i'}r_{i'}}\) and \(\textsf {np}(i) = \dfrac{t(l_i)}{n}\) and one output \(p_i\). The first two layers have \(\ln (m)\) neurons each, where m is the number of items
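The paper trains this architecture with mlpack's feed-forward network and its default optimiser; purely as an illustration, the following dependency-free C++ sketch shows the forward pass of the described topology. The hidden width h (roughly \(\ln m\)), the row-major weight layout, and all names here are assumptions, with the weights taken as already trained.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Forward-pass sketch of the described NLBC topology: 2 inputs, two
// ReLU layers of h neurons each, and one sigmoid output neuron.
struct TinyNLBC {
  int h;                              // neurons in each of the first two layers
  std::vector<double> W1, b1;         // W1: h x 2, b1: h
  std::vector<double> W2, b2;         // W2: h x h, b2: h
  std::vector<double> W3; double b3;  // W3: 1 x h, b3: scalar

  int predict(double nipr, double np) const {
    std::vector<double> a1(h), a2(h);
    for (int j = 0; j < h; ++j)       // layer 1: linear + ReLU
      a1[j] = std::max(0.0, W1[2 * j] * nipr + W1[2 * j + 1] * np + b1[j]);
    for (int j = 0; j < h; ++j) {     // layer 2: linear + ReLU
      double z = b2[j];
      for (int k = 0; k < h; ++k) z += W2[h * j + k] * a1[k];
      a2[j] = std::max(0.0, z);
    }
    double z = b3;                    // output layer: linear + sigmoid
    for (int k = 0; k < h; ++k) z += W3[k] * a2[k];
    return 1.0 / (1.0 + std::exp(-z)) >= 0.5 ? 1 : 0;  // threshold to p_i
  }
};
```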

5.6.4 LGCH Implementation Using Neural Network Predictions

For Function CoordHeu in Function TSPS in Algorithm 2, we define Function LearningGuidedCoordHeu to return \(p'\) where \(p'_i = {\mathcal {N}}(\textsf {nipr}(i),\textsf {np}(i))\) for \(i \in I(t'[b,e])\) and \(p'_i = p_i\) for \(i\not \in I(t'[b,e])\). Note that because of the knapsack constraint, the precise implementation, as shown in Algorithm 7, needs to unpick all items in the reversed tour segment and then pick the items predicted to be collected, as sketched below.
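A minimal sketch of this unpick-then-pick step, under illustrative assumptions: itemsAt[k] lists the items at tour position k, w and nipr hold item weights and normalised profitability ratios, predict stands for the trained classifier \({\mathcal {N}}\), W is the knapsack capacity, and load is the current knapsack weight. None of these names are from the CoCo source.

```cpp
#include <vector>

// Sketch of the idea behind Algorithm 7: unpick everything in the reversed
// segment first so the knapsack constraint holds, then re-pick the items
// the classifier predicts as collected, capacity permitting.
void coordByPrediction(const std::vector<std::vector<int>>& itemsAt,
                       const std::vector<double>& w,
                       const std::vector<double>& nipr,
                       int b, int e, int n, double W,
                       std::vector<int>& p, double& load,
                       int (*predict)(double nipr, double np)) {
  for (int k = b; k <= e; ++k)                   // unpick phase
    for (int i : itemsAt[k])
      if (p[i]) { p[i] = 0; load -= w[i]; }
  for (int k = b; k <= e; ++k)                   // pick phase
    for (int i : itemsAt[k]) {
      double pos = static_cast<double>(k) / n;   // normalised position np(i)
      if (predict(nipr[i], pos) == 1 && load + w[i] <= W) {
        p[i] = 1; load += w[i];
      }
    }
}
```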


5.6.5 Reusing Neural Network Predictions

Since for an item the neural network \({\mathcal {N}}\) only needs \(\textsf {nipr}(i)\) and \(\textsf {np}(i)\), we can make predictions for all items and all positions beforehand and store them. This saves the time otherwise required to recompute the predictions for the same items and positions over and over again. In fact, our preliminary experiments show that recomputing the predictions is costly, since each call of \({\mathcal {N}}\) is compute intensive. However, a straightforward approach that stores all predictions for all items at all positions requires O(nm) memory and, more importantly, O(nm) calls of the costly \({\mathcal {N}}\). In this paper, we propose an alternative strategy that stores only one profitability ratio per position, thus taking only O(n) memory and \(O(n\log _2 m)\) calls of \({\mathcal {N}}\). The idea is to store, for each position, the profitability ratio, called the boundary profitability ratio (BPR), that approximately demarcates the collected items from the uncollected items at that position. The idea is again based on the previously mentioned key guideline for TTP that, at a given position, more profitable items are more likely to be collected. The notions of lowest collected IPR and highest uncollected IPR are relevant in this context; however, instead of two such IPRs, we use a single BPR here. Algorithm 8 shows our implementation of computing the BPR for each position. In Function computeBPRs in Algorithm 8, we first sort the profitability ratios of all items and store only the unique values in increasing order in \({\mathcal {R}}\). Then, for each position k, we store in \({\mathcal {B}}[k]\) the value returned by a binary search over \({\mathcal {R}}\) that finds the profitability ratio below which items are not collected at position k. Next, we redefine Function LearningGuidedCoordHeu to return \(p'\) where for \(i \in I(t'[b,e])\), \(p'_i = 1\) when \(r_i \ge {\mathcal {B}}[t'(l_i)]\) and \(p'_i = 0\) when \(r_i < {\mathcal {B}}[t'(l_i)]\), and for \(i \not \in I(t'[b,e])\), \(p'_i = p_i\). Note that because of the knapsack constraint, the precise implementation, as shown in Algorithm 8, needs to unpick all items in the reversed tour segment and then pick the items predicted to be collected. Also note that after computing the BPRs in \({\mathcal {B}}\), we no longer need the neural network \({\mathcal {N}}\): the BPRs are sufficient for the purpose of our machine learning guided coordination heuristic. A sketch of the BPR computation follows.
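The following sketch captures the binary-search idea of Function computeBPRs under the assumption implicit in the BPR idea, namely that at a fixed position the classifier's prediction is monotone in the profitability ratio (low ratios uncollected, high ratios collected). Here R holds the unique profitability ratios in increasing order (assumed non-empty) and predict(r, k) stands for \({\mathcal {N}}\) applied to the normalised ratio and position; all names are illustrative.

```cpp
#include <vector>

// For each position k, binary search for the smallest ratio predicted as
// collected: O(n) memory for B and O(n log m) classifier calls in total.
std::vector<double> computeBPRs(const std::vector<double>& R, int n,
                                int (*predict)(double r, int k)) {
  std::vector<double> B(n + 1, 0.0);
  const int u = static_cast<int>(R.size());
  for (int k = 1; k <= n; ++k) {
    int lo = 0, hi = u - 1, ans = u;   // index of smallest collected ratio
    while (lo <= hi) {
      int mid = (lo + hi) / 2;
      if (predict(R[mid], k) == 1) { ans = mid; hi = mid - 1; }
      else lo = mid + 1;
    }
    // Items with ratio >= B[k] are picked at position k; if nothing is
    // predicted collected, a sentinel above every ratio blocks all picks.
    B[k] = (ans < u) ? R[ans] : R[u - 1] + 1.0;
  }
  return B;
}
```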

After computing BPRs using the trained neural network only to then replace that neural network with the computed BPRs, the question arises again whether a simpler machine learning model could be used. Considering the scope of this work, we leave the quest for a better machine learning model for the future. However, we note in particular that an explicit metric for determining the BPRs is not known to us, and we have relied on the implicit power of a neural network for this.

6 Experiments

We describe the benchmark instances that we use in our experiments. We also discuss the experiment settings and evaluation metrics. Then, we compare various versions of our proposed solver. Finally, we compare our proposed solver with existing state-of-the-art TTP solvers.

6.1 Benchmark TTP instances

TTP solvers are typically evaluated using the benchmark instances introduced by Polyakovskiy et al. (2014). Each TTP benchmark instance has been generated based on the following:

  • A symmetric TSP instance with 51 to 85,900 cities as taken from TSPLIB (Reinelt 1991). While generating the benchmark instances, the number of cities has been used in determining the total number of items.

  • A set I(c) of 1, 3, 5 or 10 items for each city c. So each TTP instance has \(m = (n-1) \times |I(c)|\) items. Note that \(\max _c|I(c)|\) is used in generating training solutions for LGCH in Sect. 5.6.

  • Weights and profits of all items are (i) bounded and strongly correlated, or (ii) uncorrelated but weights are similar for all items, or (iii) fully uncorrelated.

  • A knapsack with a weight capacity indicator ranging from 1 to 10, where a larger indicator means a larger knapsack capacity; the indicator is not the capacity itself (Polyakovskiy et al. 2014).

Note that the exact TTP instances used in our experiments are downloaded from https://cs.adelaide.edu.au/~optlog/CEC2014COMP_InstancesNew/. These instances have from 76 to 33,810 cities and from 75 to 338,090 items, with knapsack capacities from 5780 to 153,960,049 units of weight. These instances are divided into 3 categories (El Yafrani and Ahiod 2018, 2016), briefly described below.

  • CatA: The knapsack weight capacity is relatively small. There is only one item in each city. The weights and profits of the items are bounded and strongly correlated.

  • CatB: The knapsack weight capacity is moderate. There are 5 items in each city. The weights and profits of the items are uncorrelated. The weights of all items are similar.

  • CatC: The knapsack weight capacity is high. There are 10 items in each city. The weights and profits of the items are uncorrelated.

As shown below, there are 20 TTP instances in each of the above three categories. The TTP instance names in each category are based on the names of the TSP instances used in generating them. For each TSP instance, three TTP instances are generated, one per category, by changing the item distribution as described above. Notice that the number of cities appears in the name of each instance. Depending on the category, the number of items is a corresponding multiple of the number of cities.

List of the 20 TTP instance names in each category (CatA, CatB, CatC)

We will analyse the performance of the solvers on each category, but for overall analyses, we also use all 60 instances from the three categories together. In the charts, unless mentioned otherwise, performance on the instances is plotted in the category order CatA, CatB, CatC, and within each category in the order of the instances shown above. Notice that this order within each category roughly follows instance size.

6.2 Experimental settings

We run each solver version on each TTP instance 10 times, each time with a standard timeout of 10 minutes. In all experiments, a new initial cyclic tour is generated using the Chained Lin-Kernighan heuristic (Applegate et al. 2003) whenever an initial cyclic tour is needed, whether at the start of a run or at a restart within a run. We run all experiments on the high performance computing cluster Gowonda with a 2 GB memory limit and an Intel Xeon CPU X5650 running at 2.66 GHz on each machine.

To measure performance differences across solvers, we use the relative deviation index (RDI) (Kim and Kim 1996) for each solver on each TTP instance. RDI for a given solver on a given TTP instance is defined as \(\dfrac{N_\textsf {mean} - N_\textsf {min}}{N_\textsf {max} - N_\textsf {min}}\times 100\) where \(N_\textsf {max}\) and \(N_\textsf {min}\) are respectively the maximum and minimum \(N(t,p)\) over all runs of all solver versions in the respective experiment, and \(N_\textsf {mean}\) is the mean over all 10 runs of the same solver. Note that the larger the RDI value of a solver version, the better its performance. While we use RDI values to present our main results, the appendix includes \(N_\textsf {max}\), \(N_\textsf {min}\), and \(N_\textsf {mean}\) along with \(N_\textsf {stddev}\) and \(N_\textsf {median}\) for each solver on each instance, where \(N_\textsf {stddev}\) and \(N_\textsf {median}\) are the standard deviation and median of the \(N(t,p)\) values over the 10 runs of the solver on the instance.
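As a worked illustration of the metric, here is a minimal sketch computing RDI for one solver on one instance, assuming objectives holds the solver's 10 objective values \(N(t,p)\) and Nmin/Nmax are the extremes over all runs of all compared solvers on that instance; the names are illustrative.

```cpp
#include <numeric>
#include <vector>

// RDI = (mean - min) / (max - min) * 100; larger RDI is better.
double rdi(const std::vector<double>& objectives, double Nmin, double Nmax) {
  const double mean = std::accumulate(objectives.begin(), objectives.end(), 0.0)
                      / objectives.size();
  return (mean - Nmin) / (Nmax - Nmin) * 100.0;
}
```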

We use the Wilcoxon Signed Rank Test at the 95% confidence level and also 95% confidence interval plots to show the significance of differences in the performances of various solvers and versions.

We use line charts to compare instance-specific performances of various solvers. The line charts have the problem instances on the x-axis, sorted by the number of cities within each category. We have noted before that the number of items in each instance depends on the number of cities. Nevertheless, it is in general difficult to find a well-justified ordering of TTP instances by hardness, even when the numbers of cities and items increase, and as such we do not look for an obvious trend. Given that no trend among the problem instances is intended, one could think of using bar charts instead. We do not use bar charts because, with large numbers of data points, the bodies of the bars obscure the information at the bar tops and make it difficult to match bars of the same series.

6.3 Comparison of proposed solver versions

In Algorithm 2, in Function TSPS, we have four ways to define Function CoordHeu: NoCoordHeu, SearchGuidedCoordHeu, ProfitGuidedCoordHeu, and LearningGuidedCoordHeu, which are respectively denoted by NOCH, SGCH, PGCH, and LGCH. Further, in Algorithm 2, in Function TSPS, Function KPS can be run in two ways: standard bit-flip search and marginal bit-flip search, respectively denoted by SBFS and MBFS. Note that MBFS uses the coordinated item selection heuristic (CISH) to limit BitFlip operators to the marginal items only. We denote a proposed solver version by \(X+Y\), where \(X \in \{\text {NOCH}, \text {SGCH}, \text {PGCH}, \text {LGCH}\}\) and \(Y \in \{\text {SBFS}, \text {MBFS}\}\). For example, NOCH+SBFS denotes the solver version having NOCH and SBFS, and is the baseline version as described in Sect. 4.4.

6.3.1 Overall effectiveness of MBFS approach

Figure 7 shows that NOCH+MBFS outperforms NOCH+SBFS and PGCH+MBFS outperforms PGCH+SBFS. The differences are clear on the large instances in all three categories, albeit with some mixed performance between PGCH+MBFS and PGCH+SBFS on small CatB and CatC instances and on CatA instances, all of which have comparatively few items. Exploring only marginally collected and uncollected items using CISH inside the MBFS approach allows more focused exploration and more efficient utilisation of the limited time budget. On instances having fewer items, however, the restriction excessively reduces the search space and narrows the chance of finding better solutions due to lack of diversity. We compute p-values of the Wilcoxon Signed Rank Test on the RDI values of all 60 instances. The p-value for NOCH+MBFS and NOCH+SBFS is 0.00001 while that for PGCH+MBFS and PGCH+SBFS is 0.0012. So, at the 95% confidence level, we conclude that our MBFS approach statistically significantly improves performance over SBFS.

Fig. 7: RDI values obtained (y-axis) on problem instances (x-axis) by various versions of our proposed solver to show the effectiveness of MBFS over SBFS and PGCH over NOCH, and also the interaction of MBFS and PGCH

6.3.2 Overall effectiveness of PGCH approach

Figure 7 shows that overall PGCH+SBFS outperforms NOCH+SBFS and PGCH+MBFS outperforms NOCH+MBFS. The differences are clear and large on almost all instances in all three categories. We compute p-values of the Wilcoxon Signed Rank Test on the RDI values of all 60 instances. The p-value for PGCH+SBFS and NOCH+SBFS is 0.00001 and that for PGCH+MBFS and NOCH+MBFS is also 0.00001. So, at the 95% confidence level, we conclude that our PGCH approach statistically significantly improves performance over the NOCH approach.

6.3.3 Learning details of LGCH approach

Table 1 shows the learning details of the LGCH approach. The training times in the table include the time spent generating training and validation solutions, sorting and selecting unique \(\langle \textsf {nipr}(i), \textsf {np}(i) \rangle \) pairs from the generated solutions, training 10 neural networks, and finally computing the boundary profitability ratios (BPRs) to be used in Algorithm 8. Given the timeout of 10 minutes per TTP instance, notice that the maximum training time needed is about 4 minutes, observed on the largest CatA instance. Within the same category, training time increases with problem size. For the same TSP instance, the training time decreases from CatA to CatB to CatC. This is because, to keep the number of input \(\langle \textsf {nipr}(i), \textsf {np}(i) \rangle \) pairs to the neural network almost the same for all three categories, we generate more training and validation solutions in CatA than in CatB and CatC (30, 6, and 3 solutions for training and 15, 3, and 2 solutions for validation in CatA, CatB, and CatC respectively). In the table, we also show the percentage of unique \(\langle \textsf {nipr}(i), \textsf {np}(i) \rangle \) pairs with respect to the total number of pairs found in the example collection plans. As problem size increases, the percentage of unique pairs tends to increase. The average accuracy values of the neural networks on the training and validation \(\langle \textsf {nipr}(i), \textsf {np}(i) \rangle \) pairs are very high (above 95%). The mean validation accuracy over the 60 instances is very slightly better than the mean training accuracy; the p-value of the Wilcoxon Signed Rank Test on the training and validation accuracy values is 0.0455, so the difference is statistically significant at the 95% confidence level.

Table 1 Training time in seconds, % of unique pairs among all pairs \(\langle \textsf {nipr}(i), \textsf {np}(i) \rangle \) in training and validation solutions, % average training accuracy, and % average validation accuracy over the 10 neural networks trained in LGCH

6.3.4 Overall comparison of PGCH, SGCH, and LGCH

In Fig. 8, we compare RDI values obtained by PGCH, SGCH, LGCH, and NOCH on all 60 instances from the three categories. For these solver versions, we use MBFS since it has already been shown to be better than SBFS; the compared versions are thus PGCH+MBFS, SGCH+MBFS, LGCH+MBFS, and NOCH+MBFS. We see that PGCH and LGCH make huge improvements over NOCH. However, SGCH performs worse than NOCH. The reason is that running KPS for every tour segment reversal, even when KPS is restricted to the reversed segment, takes substantial time; consequently, within the 10-minute timeout, not much of the TTP search space is explored. Note that we include SGCH in this comparison mainly to show that a simple local search based coordination approach does not work well in TTP. Among the remaining heuristics, PGCH appears to perform slightly better than LGCH. With a Wilcoxon Signed Rank Test p-value of 0.00782, the difference in the performances of PGCH and LGCH is also statistically significant at the 95% confidence level. This result is interesting: the machine learning based LGCH has learnt to perform almost at the level of the human-designed intuitive heuristic PGCH.

Fig. 8: RDI values obtained (y-axis) on problem instances (x-axis) by various versions of our proposed solver to show the comparison of SGCH, PGCH, LGCH, and NOCH when MBFS is used with all of them

Figure 9 shows the numbers of restarts in Function TTPS in Algorithm 1 on each instance when SGCH, PGCH, LGCH, and NOCH are used along with MBFS. We see that SGCH performs the fewest restarts since it spends a huge amount of time running KPS for each tour segment reversal in TSPS. The low numbers of restarts also indicate low diversity in search space exploration. Notice that PGCH and LGCH perform very similar numbers of restarts on all instances. The numbers of restarts performed by NOCH are very similar to those of PGCH and LGCH on small instances in each category but are considerably larger on large instances. NOCH is arguably faster than PGCH or LGCH and so explores more of the search space by restarting more often. However, with its poor evaluation of the generated cyclic tours, NOCH ultimately does not achieve better RDI values.

Fig. 9: Numbers of restarts (y-axis) in Function TTPS in Algorithm 1 on problem instances (x-axis) by various versions of our proposed solver to show the comparison of SGCH, PGCH, LGCH, and NOCH when MBFS is used with them

6.4 Further analysis of PGCH and LGCH over NOCH

Fig. 10: Mean relative lengths (y-axis) of tour segments reversed by 2OPT and accepted by the search algorithm in Function TSPS over 10 runs when NOCH, PGCH, and LGCH are used along with MBFS on problem instances (x-axis)

Fig. 11: Mean numbers of accepted applications of 2OPT per restart in Function TTPS over 10 runs when NOCH, PGCH, and LGCH are used along with MBFS on problem instances (x-axis)

To investigate the huge performance difference of PGCH and LGCH from NOCH, we observe the reversed tour segments generated, evaluated, and accepted during search. Figure 10 shows the mean relative lengths \(\dfrac{|t[b,e]|}{n}\times 100\) of the tour segments reversed by 2OPT and accepted by the search algorithm in Function TSPS over 10 runs when NOCH, PGCH, and LGCH are used along with MBFS. Moreover, Fig. 11 shows the mean numbers of the accepted tour segments whose mean lengths are shown in Fig. 10. From these two figures, we see that the use of PGCH and LGCH has resulted in the acceptance of notably longer tour segments reversed by the 2OPT operator, and in larger numbers, than NOCH has. In the absence of a coordination heuristic, as shown in Fig. 5, the quality values of the cyclic tours produced by 2OPT are not estimated properly and thus the reversed tour segments are rejected by the search algorithm. Arguably, this effect worsens when the reversed tour segments are longer and more numerous. In contrast, when a coordination heuristic such as PGCH or LGCH is used, the quality values of the reversed tour segments are estimated more properly and, as we see from Figs. 10 and 11, longer reversed tour segments are accepted in larger numbers; thus we obtain higher objective values at the end. This explains the advantage of PGCH and LGCH over NOCH. Notice that in both Figs. 10 and 11, PGCH and LGCH are very close on most TTP instances, except in Fig. 10 on CatA instances. CatA instances have only one item in each city; mistakenly picking or unpicking the only available item in a city due to a classification error of the neural network is therefore harder to compensate for than classification errors in CatB and CatC.

Fig. 12: Mean objective gains (y-axis) \(G_\textsf {TSP}\) by TSPS per restart in TTPS over 10 runs of Algorithm 1, when NOCH, PGCH, and LGCH are used along with MBFS on problem instances (x-axis)

In Algorithm 1 in Function TTPS, we compute the objective values \(N_\textsf {BS}\) and \(N_\textsf {TSP}\) respectively before and after running TSPS. We then compute the means of the objective gains \(G_\textsf {TSP} = \dfrac{N_\textsf {TSP} - N_\textsf {BS}}{N_\textsf {BS}} \times 100\) over all iterations of the outer loop in TTPS over all 10 runs of Algorithm 1 for each instance. The mean objective gains are shown in Fig. 12. We see that, in \(G_\textsf {TSP}\), PGCH is in most cases better than LGCH, which is in turn better than NOCH. The difference of NOCH from PGCH or LGCH is huge in CatB and CatC instances. This is expected since PGCH and LGCH are targeted at improving the evaluation of the cyclic tours produced by 2OPT, and the better evaluation results in accepting longer and more numerous tour segment reversals and hence better \(G_\textsf {TSP}\) values.

6.5 Comparison with existing TTP solvers

We compare our proposed MBFS with a simulated annealing method in terms of the performance improvement in the KP component while we use PGCH with both. We then compare our PGCH+MBFS and LGCH+MBFS solvers with other existing state-of-the-art TTP methods.

6.5.1 Comparison of MBFS with simulated annealing search

We compare our hill-climbing based MBFS algorithm with a simulated annealing search (SAS) algorithm (El Yafrani and Ahiod 2018) for the KP component of TTP. The SAS algorithm defines Function KPS to be called in Function TTPS in Algorithm 1. For Function TSPS, we use PGCH in this case with both MBFS and SAS. We compute the means of the objective gains \(G_\textsf {KP} = \dfrac{N_\textsf {KP} - N_\textsf {TSP}}{N_\textsf {TSP}} \times 100\) over all iterations of the outer loop in TTPS over all 10 runs of Algorithm 1 for each instance. Figure 13 shows that MBFS is very slightly better than SAS in mean \(G_\textsf {KP}\) values. However, the difference is not statistically significant at the 95% confidence level, with a Wilcoxon Signed Rank Test p-value of 0.27572.

Fig. 13: Mean objective gains (y-axis) \(G_\textsf {KP}\) over all iterations of the outer loop of TTPS over 10 runs of Algorithm 1, when MBFS and SAS are used along with PGCH on problem instances (x-axis)

Fig. 14: RDI values obtained (y-axis) on problem instances (x-axis) when MBFS and SAS are used along with PGCH

Fig. 15: Numbers of restarts (y-axis) on problem instances (x-axis) when MBFS and SAS are used along with PGCH

Interestingly, as per Fig. 14, the RDI values obtained using MBFS are significantly higher than those obtained using SAS; the p-value of the Wilcoxon Signed Rank Test is 0.00001. To understand this apparent anomaly, in Fig. 15 we compare the numbers of restarts, i.e. the numbers of iterations of the outer loop in Function TTPS in Algorithm 1, with MBFS or SAS, along with PGCH in TSPS. We see that MBFS leads to a huge number of restarts compared to SAS. This indicates that, via more restarts, MBFS achieves greater diversity and eventually better RDI values, while SAS spends its time in the simulated annealing process and does not reach good RDI values. We further reason that, with its targeted search, MBFS converges quickly to local optima and thus resorts to restarts more often, while SAS relies solely on diminishing probabilities of accepting worse solutions to get out of local optima. Notice that the numbers of restarts decrease as problem size increases; on large problems, only a few restarts, or even none, can take place within the limited timeout of 10 minutes.

Table 2 Comparison of RDI values obtained by our proposed CoCoP and CoCoL solvers and those obtained by existing state-of-the-art TTP solvers such as MATLS, S5, and CS2SA*

6.5.2 Comparison with MATLS, S5, and CS2SA* solvers

We name our final TTP solver Cooperative Coordination (CoCo) and, based on the experimental results presented so far, we obtain two CoCo versions: PGCH+MBFS and LGCH+MBFS. For the rest of the paper, we respectively name them CoCoP and CoCoL.

We compare our CoCoP and CoCoL solvers with three existing state-of-the-art TTP solvers: MATLS (Mei et al. 2014), S5 (Faulkner et al. 2015), and CS2SA* (El Yafrani and Ahiod 2018). CS2SA* is selected because our TTP search framework in Algorithms 1 and 2 is similar to its cooperative coevolution approach. MATLS and S5 are selected due to their strong performance reported by Wagner et al. (2018). The source code for CS2SA* and MATLS has been obtained from the corresponding authors. We have reconstructed S5 ourselves; S5 does not have any parameters to be tuned.

CS2SA* and Recent Descendants

After CS2SA* (El Yafrani and Ahiod 2018), two further TTP methods (Maity and Das 2020; Zhang et al. 2021) have been reported. We note our observations about these methods below and discuss whether and how we compare our proposed methods with each of the three.

  • CS2SA* (El Yafrani and Ahiod 2018): Wuijts and Thierens (2019) report that CS2SA* and its precursors incorrectly present the objective values by taking the rounded values of the distances between cities. This differs from the definition of the TTP benchmark instances (Polyakovskiy et al. 2014) and thus makes CS2SA* incomparable with other TTP methods. Wagner et al. (2018) report more issues with the precursor of CS2SA*. Further, while investigating the source code of CS2SA*, we have observed that it uses the same stored high-quality TSP tour in each run and mainly focuses on improving the collection plan. This partially explains why its precursor (El Yafrani and Ahiod 2016) somewhat misleadingly concludes that the KP component of TTP is more critical for optimisation than the TSP component, while our effort on the TSP component shows otherwise. Using the same TSP tour in each run of a TTP method does not conform to the standard practice in the empirical evaluation of methods with stochastic decision making. For a fair comparison in this paper, when we run CS2SA* in our experiments, we compute the objective values correctly and use a different TSP tour in each run.

  • A CS2SA* descendant (Maity and Das 2020): This method follows the same incomparable empirical evaluation style as CS2SA* (El Yafrani and Ahiod 2018). It is more explicitly a fixed tour method. Moreover, its evaluation is based on only 9 benchmark instances. As such, we do not compare our proposed TTP solvers with this method.

  • Another CS2SA* descendant (Zhang et al. 2021): This method follows the same incomparable experiment setup as CS2SA* (El Yafrani and Ahiod 2018). Unfortunately, its source code is not available. Moreover, while attempting to reconstruct this method so as to run it as we do CS2SA*, we could not find the necessary details in the corresponding published article. The pseudocode is unclear and appears to have issues: (i) by definition item scores cannot be negative, yet the pseudocode has conditions on negative scores; (ii) the loop does not terminate unless the knapsack is full, but in practice the knapsack might be only partially filled; and (iii) items are sorted by their scores but are picked mainly in the order of the cities. For all these reasons, we do not compare our proposed TTP solvers with this method.

Table 2 shows the RDI values obtained by the CoCoP, CoCoL, MATLS, S5, and CS2SA* solvers. From the table, we see that CoCoP performs better than CoCoL. Both CoCoP and CoCoL outperform the other three solvers on almost all problem instances in all three categories. Moreover, S5 performs third best, but with a big gap from CoCoP and CoCoL, while CS2SA* is the worst performer. The 95% confidence interval plots of the RDI values in Fig. 16 also show the statistical significance of the performance differences. More specifically, the p-value of the Wilcoxon Signed Rank Test on the RDI values obtained by CoCoP and S5 is 0.00001, and that for CoCoL and S5 is the same, indicating very highly significant differences. The overlapping intervals of CoCoP and CoCoL show that their performance difference is not statistically significant. Tables 6, 7, and 8 in the appendix provide further details on the objective values obtained by the various solvers.

Fig. 16: 95% confidence intervals of the RDI values for our proposed CoCoP and CoCoL solvers and the existing state-of-the-art TTP solvers MATLS, S5, and CS2SA*. Overlapping confidence intervals mean the performance differences are not significant

Fig. 17: Sample changes in best objectives (y-axis) per second (x-axis) by the three best performing solvers S5, CoCoP, and CoCoL on the CatC pla33810 instance. For better visual representation, the plotted values are the maximum objective value obtained by any of the three solvers minus the objective value obtained by the respective solver at the respective timepoint, and the y-axis uses a logarithmic scale; hence lower is better in this chart, although TTP is by definition a maximisation problem

Figure 17 shows that, in sample runs of S5, CoCoP, and CoCoL on the CatC pla33810 instance, CoCoP makes good progress before reaching the flat region. CoCoL shows a better trend than S5 but is worse than CoCoP. The difference between CoCoP and CoCoL is that CoCoL's search relies on the patterns learnt from its training solutions, which are arguably not of very high quality; its predictions therefore help less once further better solutions have been found over time.

Table 3 shows the performances of the three best solvers S5, CoCoP, and CoCoL when the timeout is 1 hour instead of the standard 10 minutes. We see that the three solvers perform with the longer timeout much as they do with the shorter one, which shows the consistency of their performance over the time horizon.

Table 3 Comparison of RDI values obtained by S5, CoCoP, and CoCoL solvers when 1-hour timeout is used instead of standard 10-min timeout; all other settings remain the same

6.5.3 Comparison with a recent solver MEA2P

We compare our proposed best performing CoCoP solver with a recent TTP solver named MEA2P (Wuijts and Thierens 2019). MEA2P is a steady-state memetic algorithm with edge-assembly crossover (EAX) (Nagata 2006) and two-point crossover operators. Like a number of other solvers (Wagner 2016; Mei et al. 2015; Martins et al. 2017; El Yafrani et al. 2018), MEA2P targets small TTP instances. For its initial population, MEA2P generates 50 solutions, each with a random cyclic tour and an empty collection plan. Then, in each of its 2500 iterations, MEA2P generates a new solution by combining two randomly selected solutions using the edge-assembly crossover operator on the cyclic tours and the two-point crossover operator on the collection plans. The initial solutions and the subsequently generated combined solutions are improved using a local search method that interleaves 2OPT (Croes 1958), node insertion (Faulkner et al. 2015), bit-flip (Polyakovskiy et al. 2014; Faulkner et al. 2015), and item exchange (Mei et al. 2016) moves.

MEA2P demands heavy computation time, particularly on large problems. Therefore, for a meaningful comparison, instead of running for 10 minutes, we run both MEA2P and CoCoP with a termination criterion of 2500 restarts on each TTP instance. Also, we use only 8 small instances from each of the three categories, since on large instances MEA2P takes hours or even days. Note that this experiment setting differs from the settings of the other experiments presented in this paper.

Table 4 Comparison of average execution times and RDI values of MEA2P and CoCoP on 8 instances in each category
Table 5 Best objective values obtained by CoCo variants and other algorithms, each running for 10 min on each instance in each of the three categories

Table 4 shows the average execution times and the RDI values obtained by MEA2P and CoCoP on the 8 small instances per category. Moreover, Tables 9, 10, and 11 in the appendix provide further details on the execution times and the objective values obtained by the two solvers. From these tables, we see that MEA2P runs on the scale of hours and days while CoCoP runs on the scale of seconds and minutes; overall, MEA2P takes several times the execution time of CoCoP. In terms of RDI values, MEA2P achieves very good performance on small instances while CoCoP does so on large instances. We further investigate the reasons behind this. MEA2P is a population-based algorithm that tries to maintain diversity by starting from random solutions, keeping a number of solutions in its population, and using combination operators. So on small instances, MEA2P can afford the time to explore the search space to a large extent and obtains better objective values. CoCoP, in contrast, is a single-solution based search algorithm that depends on the Chained Lin-Kernighan (CLK) heuristic (Applegate et al. 2003) for initial cyclic tours, and on the PackIterative (Faulkner et al. 2015) and Insertion (Mei et al. 2014) methods for initial collection plans. Its diversity therefore has to come from search restarts or from the initial solution generators. In Table 4, Column 2 (titled "% Unique CLK Init Solutions"), we show the percentage of unique initial cyclic tours found by the CLK heuristic. These numbers help explain why CoCoP performs better when CLK generates large numbers of unique initial cyclic tours, which is more usual on large instances than on small ones.

Comments on a Recent Method by Nikfarjam et al. (2022)

For convenience, we use NNN to refer to the recent TTP method by Nikfarjam et al. (2022). Upon careful consideration, we do not compare our method with NNN. There is considerable overlap between NNN and MEA2P (Wuijts and Thierens 2019), and as such a comparison against NNN appears redundant. Furthermore, the results obtained by NNN do not appear to offer compelling advantages. The detailed reasons are discussed below.

  1. NNN and MEA2P are both evolutionary algorithms. Both use EAX crossover operators on tours to generate neighbour TTP solutions. The only difference between the two methods is that NNN keeps the current generation in a structured form while MEA2P uses a flat one-dimensional form.

  2. Inexplicably, NNN neither provides comparisons with the most relevant MEA2P method nor cites MEA2P, even though the two methods are ostensibly very similar. Moreover, NNN uses problem instances that are mostly different from those MEA2P uses. Looking at the common instances, MEA2P performs better on a280 instances, while NNN performs better on eil51 instances. Based on the presented results, it is unclear whether NNN is actually better than MEA2P. In this paper, we have already shown that MEA2P performs better than our proposed method on small instances, but our proposed method is better on large instances.

  3. NNN uses dynamic programming for KP, but only for tiny instances with at most 280 cities. In contrast, our benchmark instances have numbers of cities ranging from 76 to 33,810. For its larger instances (at most 4461 cities), NNN uses a bit-flip local search method instead of dynamic programming. This indicates that dynamic programming does not scale to large problem instances; indeed, the paper on NNN states this as well.

6.5.4 Best objective values obtained

Table 5 shows the best objective values obtained by the CoCo variants against those obtained by other existing solvers, each running for 10 minutes. The best objective values for the other solvers are taken from the results in Sect. 6.5.2, from the results reported by Wuijts and Thierens (2019), and from the results reported by Wagner et al. (2018) (excluding the results of the CS2SA solver (El Yafrani and Ahiod 2016) due to its faulty evaluation as reported by Wuijts and Thierens (2019)). Notice that the CoCo variants obtain new best results on most instances, particularly the large ones.

7 Conclusion

A travelling thief problem (TTP) has profitable items scattered over cities; a thief rents a knapsack and performs a cyclic tour to collect some of the items, maximising the total profit while minimising the travelling time and hence the renting cost of the knapsack. A TTP thus has two components: one is like the travelling salesman problem (TSP) and the other is like the knapsack problem (KP). TTP is computationally NP-Hard since both TSP and KP are NP-Hard. TTP is a proxy for many real-world problems such as waste collection and mail delivery.

TTP research has made significant progress lately. However, most existing TTP methods do not explicitly exploit the mutual dependency of the two components and thus lack proper coordination. In this paper, we first show that a simple local search based coordination approach does not work well in TTP. We then propose one coordination heuristic for changing collection plans during cyclic tour exploration and another for explicitly exploiting cyclic tours during collection plan exploration. We further propose a machine learning based coordination heuristic that captures the characteristics of the human-designed coordination heuristics. Our proposed coordination based approaches help our TTP solver explore better TTP solutions within a given timeout limit. Consequently, our proposed solver, named Cooperative Coordination (CoCo), significantly outperforms existing state-of-the-art TTP solvers on a set of benchmark problems. CoCo is available from https://github.com/majid75/CoCo.