1 Introduction

Hub-and-spoke operation is now the dominant mode of operation in both telecommunications and transportation. The hubs serve as consolidation, switching, and sorting centers. In transportation and logistics, hub-and-spoke structures are present in almost all modes of transport (cross-docks at the national level; major airports and container ports at the continental and global levels). Hubs are facilities at which flows of commodities arriving in smaller volumes from the spokes are consolidated, sorted, and redistributed in larger volumes, to be transported on fewer, highly utilized links toward another hub that is either the final destination or a transshipment point serving the spoke to which the final destination is allocated. The main motivation for deploying such a flow network structure is to exploit economies of scale (in terms of time and/or cost) by transporting higher volumes on fewer, significantly more efficient corridors connecting the hubs.

Given a graph G(V, A), where V is the set of nodes acting as origins and destinations and A is the set of all possible arcs, the Hub Location Problem (HLP) seeks a subset of nodes in V to serve as hub nodes. Subsequently, each spoke (non-hub) node in V is allocated to one or more hubs. In classical models, two main classes are distinguished (Alumur and Kara 2008): (1) Single Allocation, wherein each spoke node is assigned to exactly one hub node, and (2) Multiple Allocation, which offers the possibility of allocating a spoke node to more than one hub node. Moreover, one may also distinguish between capacitated and un-capacitated variants. Some authors further distinguish between the case where the cardinality of the set of selected hub nodes is defined exogenously and the case where it is an endogenous part of the problem. This topic has been one of the most active areas of research over the past two decades.

1.1 Problem description

We are given a graph G, a fixed integer p as the number of hub nodes to be installed, \(\Gamma\) as the minimum number of spoke nodes to be allocated to each hub node, C as the capacity of each feeder route, \(w_{ij}\) as the demand of flow to be transported from i to j, and \(t_{ij}\) as the cost of traveling between nodes i and j, which is assumed, in this study, to be the travel time between i and j. We also introduce \(d_i\) as the load of the transporter after leaving node i on a spoke-level route (the load at arrival, minus the total demand of i from every other node, plus the total supply of i to every other node). We seek to construct from scratch a hub-and-spoke network consisting of a complete hub-level subgraph with p hub nodes and p spoke-level directed cyclic routes, each composed of a hub node and all the spoke nodes allocated to it. The hub-level network is a complete subgraph in which every pair of hubs is connected by a bidirectional link. The objective is to minimize the total transit time, composed of transportation time plus transshipment time in the whole network, provided that the total volume of flow on a spoke-level arc does not violate the capacity of the transporter on it, given a homogeneous fleet of transporters.

This problem is a special case of the Bounded Cardinality Capacitated Hub Routing Problem (BCCHRP) proposed in Gelareh et al. (2015) in which the lower and upper bounds on the number of hub nodes coincide. We refer to this problem as Capacitated Single-Allocation p-Hub Location Routing Problem (CSApHLRP).

Figure 1 depicts a solution for an instance with \(|V|=n=19\) nodes and \(p=4\) fully interconnected hub nodes. Nodes 2, 3, 8, and 13 are hubs, while all the remaining nodes are spokes. One observes that each hub, together with the spoke nodes allocated to it, forms a directed cycle. In this solution, four cycles are defined as follows: (i) \(2\rightarrow 1\rightarrow 0\rightarrow 2\), (ii) \(3\rightarrow 4\rightarrow 5\rightarrow 6\rightarrow 7\rightarrow 3\), (iii) \(8\rightarrow 9\rightarrow 10\rightarrow 11\rightarrow 12\rightarrow 8\), and (iv) \(13\rightarrow 14\rightarrow 15\rightarrow 16\rightarrow 17\rightarrow 18\rightarrow 13\).
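To make the structure concrete, the following minimal Python sketch (all names and the demand matrix are hypothetical) encodes the solution of Fig. 1 as directed cycles, each listed hub-first, and computes the running load \(d_i\) after leaving each node, following the definition given above.

```python
import numpy as np

# Directed feeder cycles of Fig. 1; the first node of each cycle is its hub.
routes = [
    [2, 1, 0],
    [3, 4, 5, 6, 7],
    [8, 9, 10, 11, 12],
    [13, 14, 15, 16, 17, 18],
]

def loads_along_route(route, w):
    """Running load d_i after leaving each node of a feeder cycle: the load
    at arrival, minus the total demand of i (column sum of w), plus the
    total supply of i (row sum of w), per the definition above."""
    load, d = 0.0, {}
    for i in route:
        load = load - w[:, i].sum() + w[i, :].sum()
        d[i] = load
    return d

rng = np.random.default_rng(0)
w = rng.uniform(0.0, 5.0, size=(19, 19))   # hypothetical O-D demand matrix
np.fill_diagonal(w, 0.0)

C = 60.0                                   # hypothetical transporter capacity
for route in routes:
    d = loads_along_route(route, w)
    assert all(v <= C for v in d.values()), f"capacity exceeded on {route}"
```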

Fig. 1 A solution to a network with 4 hubs and 19 nodes

1.2 Literature review

In this work, we focus on a problem that falls within the category of capacitated single-allocation hub location problems (CSAHLP). In the following, we review some of the related works in the literature, with a particular emphasis on the single-allocation scheme with an exogenous number of hubs in the modeling, and on (meta)heuristic solution approaches. The references below are listed in reverse chronological order.

Rabbani and Kazemi (2015) proposed a genetic algorithm and a simulated annealing algorithm for solving an un-capacitated multiple allocation p-hub center problem.

Simulated annealing (SA) and iterated local search (ILS) algorithms were proposed by Zarandi et al. (2015) to solve the single-allocation hub median problem.

The un-capacitated p-hub location problem wherein each spoke node must be connected to exactly r hubs (r being a predefined fixed number) is studied by Peiró et al. (2014). The authors proposed a GRASP algorithm to solve the problem.

Rodríguez-Martín et al. (2014) proposed a Mixed Integer Programming (MIP) formulation and a branch-and-cut algorithm for the hub location and routing problem. In their study, the single-allocation scheme is considered, the capacity constraint is defined in terms of the number of spokes per route, and the number of hub nodes is a fixed parameter.

A Multiple Ant Colony Optimization (MACO) algorithm is proposed by Ting and Chen (2013) to solve the capacitated single-allocation hub location problem. The problem is decomposed into two subproblems: (1) the facility location problem and (2) the multiple depot vehicle routing problem. The two subproblems are treated together within MACO, where the colonies collaborate by exchanging information through pheromone updates on the chosen locations and customer assignments.

Gelareh et al. (2013) proposed an MIP formulation for the hub-and-spoke network design and fleet deployment problem of liner shipping, minimizing the weighted sum of transit times while accounting for fixed deployment costs. The proposed MIP is based on a 4-index formulation of HLPs, with the spokes allocated to a hub forming directed cycles (so-called strings). Note that, in that work, the fleet of vessels defining the capacities of the strings is to be selected from a given heterogeneous fleet.

A Variable Neighborhood Search (VNS) is proposed by Jarboui et al. (2013) for the un-capacitated single-allocation hub location problem.

Kratica et al. (2012) proposed a genetic algorithm (GA) for solving an un-capacitated multiple allocation hub location problem. Binary encoding and standard genetic operators are employed, while the run time is reduced using a caching technique.

Ilić et al. (2010) investigate the un-capacitated single-allocation p-hub median problem. They propose a VNS in which a variable neighborhood descent (VND)-based local search using three different neighborhood structures is employed.

Another simulated annealing algorithm, for a capacitated single-allocation hub location problem in which both routes and hubs are constrained by capacity, was proposed by Yu et al. (2010).

Correia et al. (2009) proposed an MIP formulation for the single-allocation hub location problem in which the hub capacity is not an exogenous value but a decision variable with discrete choices. Several variants of this problem have been studied in the literature, such as capacitated and un-capacitated single-allocation versions (Correia et al. 2009) and a capacitated multiple allocation version with flow splitting (Campbell et al. 2007).

Randall (2008) proposed an ant colony optimization meta-heuristic to solve the capacitated single-allocation hub location problem. Four variations of the meta-heuristic are developed to explore different construction modeling choices.

A GA-based heuristic is proposed by Stanimirovic (2008) to solve the un-capacitated multiple allocation p-hub median problem. This GA uses a binary representation of solutions; a mutation operator with frozen bits and a caching technique are employed.

As a relevant work, Ge et al. (2007) investigate the un-capacitated single-allocation p-hub location problem. The authors aim at locating p hub nodes in the network and allocating the non-hub nodes to the hubs under the single-allocation scheme, the main objective being to minimize the transportation cost.

Ebery et al. (2000) proposed an MIP formulation and a heuristic for the capacitated multiple allocation hub location problem. Their hybrid method combines an evolutionary algorithm (EA) with a branch-and-bound (B&B) algorithm: the EA selects the set of hubs, while the B&B handles the allocation part of the problem by assigning the non-hub nodes to the located hubs.

A two-phase tabu search algorithm with short-term memory, treating location in the first phase and routing in the second, is investigated by Tuzun and Burke (1999) to solve the capacitated hub location problem under the single-allocation scheme.

Abdinnour-Helm (1998) developed a hybrid method based on a combination of genetic algorithm and tabu search to solve an un-capacitated hub location problem with the single-allocation scheme. The genetic algorithm is used to choose the hubs, while the tabu search assigns the spokes to the located hubs.

An application of the p-median problem is addressed by Rolland et al. (1997). They proposed a tabu search algorithm employing short-term and long-term memory techniques, as well as strategic oscillation with random tabu list sizes.

Klincewicz (1992) studied the p-hub location problem and, to avoid local optima, proposed two meta-heuristics: a tabu search and a GRASP algorithm. The objective is to minimize the total cost of sending traffic over the links. In both algorithms, the local search procedure is based on a two-exchange neighborhood structure.

The work presented in this paper has some similarities with Rodríguez-Martín et al. (2014), but is distinguished from it as follows. In Rodríguez-Martín et al. (2014), the objective function minimizes only the total transportation cost, while here we also account for the transshipment time. In the former, the spoke-level routes are undirected, while we consider a directed network in which the flow on a spoke-level route circulates in only one direction. Finally, in Rodríguez-Martín et al. (2014), the capacity is defined as the maximum number of spokes along a spoke-level route (the number of physical ports on a machine, a constraint arising in telecommunications), while we define it as the volume of flow circulating on a spoke-level route, which must respect the capacity of the transporters available on it.

Our work is also distinguished from Gelareh et al. (2015) by the fact that we do not consider upper/lower bounds on the number of hubs; rather, we assume a fixed cardinality p given as an exogenous parameter.

To the best of our knowledge, the non-exact methods applied so far to variants of the hub location problem have been either meta-heuristics or classical heuristics.

Table 1 summarizes some key features in closely related contributions.

Table 1 A summary of main elements of the relevant contributions in the literature

1.3 Motivation, scope, and contribution

This work was initially inspired by an application in the distribution chain of pharmaceutical products. Consider a set of consumers served by a number of different pharmacies, whose demand a company has to deliver. Local pharmacies can store a small amount of the products that are most in demand; however, it often happens that they receive a prescription for a product that is not in stock. They then need to order it, and the delay in delivering the ordered prescription must be short (in normal situations, no more than a day). What a pharmacy needs per day is usually of a reasonably moderate amount, so a single truck is enough to serve quite a few pharmacies in a tour every morning. The aim is to design a service network meeting these criteria while optimizing time and resource consumption.

This work contributes to the state of the art as follows. In addition to presenting a mathematical model and identifying several classes of valid inequalities together with their separation routines, we propose a tailored relax-and-cut algorithm based on the Lagrangian relaxation technique. The primal/dual information (such as the Lagrangian multipliers and the reduced costs of the spoke-level variables) accumulated during the iterations of solving the Lagrangian dual is exploited to guide the heuristic algorithms of our hyper-heuristic framework. The proposed hyper-heuristic comprises a portfolio of low-level heuristics (some of which use the Lagrangian relaxation information) and a heuristic selection method that learns, in the course of the process, how to dynamically replace the selected heuristic among the existing ones, aiming at a higher likelihood of success and improvement. The latter is, in fact, a reinforcement learning method inspired by the concept of association rules in the business data-mining world. We are unaware of any similar work in the literature and, to the best of our knowledge, this is the first work tackling a variant of hub location problems using a hyper-heuristic approach (one that, in particular, exploits Lagrangian relaxation information and data-mining knowledge).

The remainder of this paper is organized as follows. In Sect. 2, we present the mathematical model, which builds on the model of Gelareh et al. (2015). Section 3 is divided into three parts: in the first, a Lagrangian relaxation algorithm and several classes of valid inequalities are proposed; in the second, the hyper-heuristic solution approach and its components exploiting the Lagrangian dual information are presented; in the third, the overall algorithm and the separation routines for the identified valid inequalities, used within a relax-and-cut algorithm, are described. Computational experiments are reported in Sect. 4. Finally, in Sect. 5, we conclude and outline possible future work.

2 Mathematical formulation of the CSApHLRP

A 2-index model (the cardinality of the set of decision variables is \(O(n^2)\)) is proposed in Gelareh et al. (2015) for the Bounded Cardinality Capacitated Hub Routing Problem (BCCHRP). Building on the ideas of that model, we now formulate the CSApHLRP. The variables and parameters required to model this problem are introduced in Tables 2 and 3.

Table 2 Model parameters
Table 3 Decision variables

(CSApHLRP)

$$\begin{aligned} \min \,&\sum _{i,j,k,l} \left( t_{kl}(s_{ijkl}+{\alpha x_{ijkl}})\right) + \sum _{i,j,k,l:(k\ne i \vee l\ne j) }(\varphi ^k + \varphi ^l)x_{ijkl} \end{aligned}$$
(1)
$$\begin{aligned} s.\,t.\,\nonumber \\&q \le \sum _k z_{kk} \le p \end{aligned}$$
(2)
$$\begin{aligned}&\sum _l z_{kl} =1 \quad \forall k \in V \end{aligned}$$
(3)
$$\begin{aligned}&z_{ik} \le z_{kk} \quad \forall i,k\in V: k\ne i \end{aligned}$$
(4)
$$\begin{aligned}&\sum _{i} z_{ik} \ge \Gamma z_{kk} \quad \forall k \in V \end{aligned}$$
(5)
$$\begin{aligned}&\sum _{j\ne i}r_{ij} = 1 \quad \forall i \in V \end{aligned}$$
(6)
$$\begin{aligned}&\sum _{j\ne i}r_{ji} = 1 \quad \forall i \in V \end{aligned}$$
(7)
$$\begin{aligned}&r_{ij}+r_{ji} \le 2 - z_{ik} - z_{jl} \quad \forall i,j,k,l\in V:j\ne i, \quad k\ne l \end{aligned}$$
(8)
$$\begin{aligned}&r_{ij}+r_{ji} \le 1 \quad \forall i,j \in V:j\ne i, \end{aligned}$$
(9)
$$\begin{aligned}&\sum _{k \ne i} (x_{ijik} + s_{ijik}) = 1, \quad \forall i,j\in V:j\ne i, \end{aligned}$$
(10)
$$\begin{aligned}&\sum _{l \ne j} (x_{ijlj} + s_{ijlj}) = 1, \quad \forall i,j\in V:j\ne i, \end{aligned}$$
(11)
$$\begin{aligned}&\sum _{l \ne i,k} (x_{ijkl} + s_{ijkl}) = \sum _{l \ne j,k} (x_{ijlk} + s_{ijlk}), \quad \forall i,j,k \in V,\, k \not \in \{i,j\}, \end{aligned}$$
(12)
$$\begin{aligned}&\sum _{l\ne k} x_{ijkl} \le z_{kk} \quad \forall i,j,k \in V:j\ne i \end{aligned}$$
(13)
$$\begin{aligned}&\sum _{l\ne k} x_{ijlk} \le z_{kk} \quad \forall i,j,k \in V:j\ne i \end{aligned}$$
(14)
$$\begin{aligned}&s_{ijkl} \le r_{kl} \quad \forall i,j,k,l \in V:l\ne k \end{aligned}$$
(15)
$$\begin{aligned}&\sum _{ijl:j\ne i }w_{ij}s_{ijkl} \le C^v, \quad \forall k \in V \end{aligned}$$
(16)
$$\begin{aligned}&r\in {\mathbb {B}}^{|V|^2}, \quad z \in {\mathbb {B}}^{|V|\times |V|}, \quad x_{ijkl}, s_{ijkl} \in \in {\mathbb {R}}^{|V|^4}_{[0,1]}. \end{aligned}$$
(17)

The objective function (1) comprises two parts: the first accounts for the total transportation time and the second for the transshipment time. The transportation part is composed of the travel times on the hub-level edges as well as on the spoke-level arcs. The travel time on the hub-level edges is discounted by \(\alpha\), because the transporters there offer faster services. The transshipment time is incurred at the hub nodes along every O-D path.

Constraints (2) restrict the number of installed hubs (in the CSApHLRP, the lower bound q coincides with p), while constraints (3) ensure that every node is assigned to one and only one hub. A self-allocation at i indicates that i is a hub (i.e., \(z_{ii}=1\)). Constraints (4) state that a spoke node can only be assigned to a hub node, and constraints (5) make sure that at least \(\Gamma\) nodes are allocated to every designated hub node. Constraints (6) and (7) guarantee that exactly one spoke-level arc leaves (arrives at) every spoke node. Constraints (8) forbid spoke-level arcs between spoke nodes allocated to different hub nodes. Constraints (9) ensure that a spoke-level arc can exist in only one direction. Constraints (10)–(12) enforce flow conservation for every O-D pair. Constraints (13) and (14) ensure that traversing an edge between two hubs is equivalent to traversing the hub-level edge connecting them. Constraints (15) guarantee that the flow on each feeder network travels in the direction of the existing arcs. Constraints (16) ensure that the flow traversing every feeder network respects the capacity of the vehicle deployed on it, which is given as a parameter.

To ensure that the above mathematical model correctly represents the real characteristics of the problem, a set of sub-tour elimination constraints is added to the model (explained in detail in Proposition 1) to guarantee that no sub-tour appears in any feasible solution.
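For concreteness, the following minimal sketch (data structures hypothetical, not the paper's implementation) checks a candidate solution against the allocation and degree constraints above; the sub-tour and capacity checks are omitted for brevity.

```python
def check_solution(z, r, p, Gamma):
    """Feasibility check of a candidate solution against constraints
    (2)-(9); a minimal sketch with hypothetical data structures. z[i] is
    the hub that node i is allocated to (z[k] == k marks k as a hub);
    r is a set of directed spoke-level arcs (i, j)."""
    V = sorted(z)
    hubs = [k for k in V if z[k] == k]
    assert len(hubs) == p                           # (2): in CSApHLRP, q = p
    assert all(z[i] in hubs for i in V)             # (3) and (4)
    for k in hubs:                                  # (5): at least Gamma allocations
        assert sum(1 for i in V if z[i] == k) >= Gamma
    out_deg = {i: 0 for i in V}
    in_deg = {i: 0 for i in V}
    for (i, j) in r:
        out_deg[i] += 1
        in_deg[j] += 1
        assert z[i] == z[j]                         # (8): arcs stay within one route
        assert (j, i) not in r                      # (9): one direction only
    assert all(out_deg[i] == 1 and in_deg[i] == 1 for i in V)   # (6) and (7)
```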

3 Solution algorithm

The proposed solution algorithm is a lower/upper bounding method. The lower bounds are computed using a Lagrangian relaxation-based relax-and-cut approach, which yields reduced costs of the spoke-level design variables as a by-product of the iterations of solving the Lagrangian dual problem. The choice of this relaxation (from among other possibilities) is based on a trade-off between the quality of the bounds and the computational efficiency of solving the Lagrangian dual problem. The primal information (an approximate primal solution) and the dual information (the Lagrangian multipliers) are used to guide the heuristic algorithms that constitute our hyper-heuristic framework.

3.1 Lagrangian relaxation

Lagrangian relaxation for solving (mixed) integer programming problems was first proposed in Geoffrion (1974) and Geoffrion and Bride (1978), and later developed in Fisher (1981, 2004). The idea behind this method is to relax complicating constraints and penalize their violation in the objective function. The relaxed problem normally has a special structure and can be solved more efficiently than the original problem (see Guignard 2003 for a comprehensive survey of the method).

Three well-known methods are commonly used in the literature for solving the Lagrangian dual problem. The oldest and most well known is the subgradient method, an iterative method for convex minimization problems originally proposed in the 1960s in the former Soviet Union. Almost at the same time, a very similar method was proposed in Held and Karp (1971) for the traveling salesman problem. Later, Lemarechal (1975) proposed the well-known bundle methods as an extension of the simple subgradient method. The volume algorithm was subsequently proposed in Barahona and Anbil (2000) as a method that simultaneously produces a primal feasible solution and a dual bound for the problem; a further analysis of the volume algorithm and its relation with the bundle method is given in Bahiense et al. (2002).
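For intuition, the sketch below shows a plain subgradient update for a generic Lagrangian dual \(\max_{u\ge 0} L(u)\); the bundle method actually used in this paper is more elaborate, so this is illustrative only, and the oracle `solve_relaxation` is a hypothetical name.

```python
def subgradient(solve_relaxation, u0, max_iter=100, step0=2.0, ub=None):
    """Plain subgradient ascent on the Lagrangian dual max_{u>=0} L(u).
    `solve_relaxation(u)` is a hypothetical oracle returning L(u) and a
    subgradient g (the violations of the relaxed constraints at the
    optimizer of the relaxed problem), both keyed like u."""
    u, best = dict(u0), float("-inf")
    for _ in range(max_iter):
        L, g = solve_relaxation(u)
        best = max(best, L)
        norm2 = sum(v * v for v in g.values())
        if norm2 == 0:                 # all relaxed constraints satisfied
            break
        # Polyak-style step length toward a target value (an upper bound)
        target = ub if ub is not None else best + 1.0
        step = step0 * (target - L) / norm2
        for k in g:
            # projection keeps multipliers of inequalities non-negative;
            # for equality constraints such as (6)-(7) it would be dropped
            u[k] = max(0.0, u.get(k, 0.0) + step * g[k])
    return best, u
```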

Several variants of subgradient-based methods have been proposed in the literature. Here, based on observations from our preliminary computational experiments, we opted for the well-known bundle method of Frangioni (1995) and the corresponding (publicly available) code.

We opted to relax constraints (6), (7), (8), (9), and (15) in the Lagrangian fashion. Let \(u^1_{i}\) and \(u^2_{i}\) for all \(i \in V\), \(u^3_{ijkl}\) and \(u^5_{ijkl}\) for all \(i, j, k, l \in V\) with \(i \ne j\) and \(k \ne l\), and \(u^4_{ij}\) for all \(i, j \in V, i \ne j\), be the Lagrangian multipliers corresponding to constraints (6), (7), (8), (15), and (9), respectively. By doing so, the dual information can be retrieved from the coefficients of the \(r_{ij}\) and \(z_{ij}\) variables. A solution to this Lagrangian relaxation, which gives a lower bound for the initial model, partially describes the network design and the route structures. The dual information can be used either to construct a partially feasible solution or, more generally, to guide some of the heuristics in our hyper-heuristic framework; this serves as a warm start for a heuristic/hyper-heuristic algorithm.

In fact, this relaxation was chosen from among four different scenarios, some of which result in a decomposable subproblem. Our extensive initial computational experiments identified it as offering the best trade-off between bound quality and computational tractability:

$$\begin{aligned} {\mathrm{(LRX-CSApHLRP)}}\nonumber \\ \min \,&\sum _{i,j,k,l} \left( t_{kl}(s_{ijkl}+{\alpha x_{ijkl}})\right) + \sum _{i,j,k,l:(k\ne i \vee l\ne j) }(\varphi ^k + \varphi ^l)x_{ijkl} \nonumber \\&+\sum _{i} u^1_{i} \left( \sum _{j\ne i}r_{ij} - 1 \right) + \sum _{i} u^2_{i} \left( \sum _{j\ne i}r_{ji} - 1 \right) \nonumber \\&+\sum _{i,j,k,l:k\ne l,j\ne i}u^3_{ijkl} \left( r_{ij}+r_{ji} - 2 + z_{ik} + z_{jl} \right) + \sum _{i,j\ne i} u^4_{ij} ( r_{ij}+r_{ji} - 1)\nonumber \\&+\sum _{i,j,k,l:l\ne k} u^5_{ijkl} (s_{ijkl} - r_{kl}) \end{aligned}$$
(18)
$$\begin{aligned} s.\,t.\,\nonumber \\&(2), (3), (4), (5)\nonumber \\&(10), (11), (12)\nonumber \\&(13), (14), (16)\nonumber \\&r\in {\mathbb {B}}^{|V|^2}, \quad z \in {\mathbb {B}}^{|V|\times |V|},\quad x_{ijkl}, s_{ijkl} \in {\mathbb {R}}^{|V|^4}_{[0,1]}. \end{aligned}$$
(19)

3.1.1 Valid inequalities for a relax-and-cut

In the proposed Lagrangian relaxation (LRX-CSApHLRP), all the constraints containing the route variables r are relaxed and added to the objective function multiplied by the Lagrangian multipliers. As a consequence, in many iterations of solving the Lagrangian dual problem, a feasible solution to the resulting network design problem tends to be infeasible (or only partially feasible) with respect to the initial model. Such infeasibilities include the situation where the spoke/feeder nodes allocated to the same hub form more than one route, or the case where a feeder/spoke node allocated to a given hub lies among the spoke/feeder nodes allocated to one or more other hubs.

To circumvent such situations, some valid inequalities are identified, LRX-CSApHLRP is solved by a branch-and-cut process, and the necessary valid inequalities are separated and added on demand. With a modern MIP solver such as CPLEX, this does not cause efficiency issues: in each iteration of the subgradient algorithm, when LRX-CSApHLRP is re-solved, only the objective function is updated, and all the separated cuts are retained in memory (no need to separate them again) whenever a warm start can be carried out (as is the case in modern MIP solvers such as CPLEX).

For all \(S\subseteq V\), let us define \(\delta ^+(S) =\{ (i,j)\mid i\in S,\ j\in V{\setminus } S\}\), \(\delta ^-(S) =\{ (i,j)\mid j\in S,\ i\in V{\setminus } S\}\), and \(\gamma (S)=\{ (i,j)\mid i\in S,\ j\in S,\ j\ne i\}\). In the following, we present some valid inequalities for the problem.

Proposition 1

A class of Generalized Sub-tour Elimination Constraints (GSECs) ensures that, in a feasible solution, the size of the cut induced by every feeder route is at least 1 in a directed graph (and 2 in an undirected graph).

Let \(S\subset V\) and \(S'=V{\setminus } S\):

$$\begin{aligned} \sum _{ k\in S, l\in S'} r_{kl} \ge \sum _{j\in S'}z_{ij}, \quad \forall i\in S. \end{aligned}$$
(20)

Proof

Let \(S \subseteq V\). For a fixed \(i \in S\), the right-hand side of the inequality equals 1 if i is allocated to a hub in \(S'=V{\setminus } S\), and 0 otherwise. If i is allocated to a hub \(j \in S'\), the feeder route containing i must cross the cut from S to \(S'\); hence, there must be at least one spoke-level arc with tail in S and head in \(V {\setminus } S\). (There might, of course, be further spoke nodes in \(V {\setminus } S\) connected to i.) Thus, these inequalities are valid for \({\mathcal {P}}(CSA\)pHLRP).

Proposition 2

The following inequalities are valid for (CSApHLRP), for every \(S\subseteq V\):

$$\begin{aligned} \sum _{i,j\in S: j\ne i} (r_{ij} + r_{ji}) \le |S|. \end{aligned}$$
(21)

Proof

To prove this proposition, let us consider two cases for a feasible solution:

  (1) All the nodes in S lie on a single feeder ring; in other words, all nodes in S are allocated to the same hub.

  (2) S contains nodes lying on different feeder rings.

In the first case, the number of arcs between the nodes of S is at most |S|, and it equals |S| exactly when S contains all the nodes allocated to the hub; this is the case in which the constraint is binding.

In the second case, if a pair \(i, j \in S\) is allocated to different hubs, there can be no direct arc connecting them. Thus, the constraint is valid.

Proposition 3

The following 2-matching constraints for the main and feeder routes are valid for the CSApHLRP polytope, \({\mathcal {P}}(CSApHLRP)\):

$$\begin{aligned}&\sum _{a\in \gamma (H)} r_a + \sum _{a\in (\delta ^+({\mathcal {T}})\cup \delta ^-({\mathcal {T}}))} r_a \le |H| + \frac{|{\mathcal {T}}|-1}{2} , \end{aligned}$$
(22)

where \(H\subset {\mathcal {V}}\) and \({\mathcal {T}}\subset (\delta ^+(H)\cup \delta ^-(H))\) satisfy the following:

  (i) \(|\{i,j\}\cap H| = 1\), \(\forall (i,j)\in {\mathcal {T}}\),

  (ii) \(\{i,j\}\cap \{k,l\} = \emptyset\), \(\forall (i,j)\ne (k,l)\in {\mathcal {T}}\), and

  (iii) \(|{\mathcal {T}}|\ge 3\) and odd.

\({\mathcal {T}}\) is called the set of teeth (with respect to the main route subgraph or the feeder route subgraph, that is, \({\mathcal {T}}^r, H^r, {\mathcal {T}}^y\) and \(H^y\)), and \(H\subset {\mathcal {V}}\) is known as a handle.

Proof

For \(a = (i,j)\), we write \(r_{ij} = r_a = r(a)\) whenever the direction of the arc is immaterial, and \(r(B) = \sum _{a\in B} r_a\) for an arc set B. In every feasible solution of CSApHLRP, we have the following:

$$\begin{aligned}&2 r(\gamma (H)) + r(\delta ^+(H))+ r(\delta ^-(H)) = \sum _{i\in H} (r(\delta ^+ (i)) + r(\delta ^- (i))), \end{aligned}$$

and from the bound constraints \(r_{ij}\le 1\), one obtains the following:

$$\begin{aligned}&\sum _{k\in H} \sum _{l\ne k}(r_{kl}+r_{lk}) \le 2 |H|. \end{aligned}$$

Hence

$$\begin{aligned} 2 |H|&\ge 2 r(\gamma (H)) + r(\delta ^+(H))+ r(\delta ^-(H)) \nonumber \\&= 2 r(\gamma (H)) + r(\delta ^+(H{\setminus } {\mathcal {T}})) + r(\delta ^+({\mathcal {T}}))+r(\delta ^-(H{\setminus } {\mathcal {T}})) + r(\delta ^-({\mathcal {T}})). \end{aligned}$$
(23)

Given that \(r(\delta ^+({\mathcal {T}})\cup \delta ^-({\mathcal {T}}))\le |{\mathcal {T}}|\), which follows from the bound constraints \(r_a\le 1\), we add these inequalities to (23) and obtain the following:

$$\begin{aligned} 2 |H| + |{\mathcal {T}}|&\ge 2 r(\gamma (H)) + r(\delta ^+(H{\setminus }{\mathcal {T}})) + r(\delta ^-(H{\setminus }{\mathcal {T}})) + 2r(\delta ^+({\mathcal {T}}))+ 2r(\delta ^-({\mathcal {T}})) \nonumber \\&\ge 2 r(\gamma (H)) +2r(\delta ^+({\mathcal {T}}))+2r(\delta ^-({\mathcal {T}})). \end{aligned}$$
(24)

Again, given that \(|{\mathcal {T}}|\) is an odd number, multiplying both sides by \(\frac{1}{2}\) and rounding down the right-hand side (the left-hand side being integral), we conclude the following:

$$\begin{aligned} r(\gamma (H)) + r(\delta ^+({\mathcal {T}})) + r(\delta ^-({\mathcal {T}})) \le&\; |H| +\frac{|{\mathcal {T}}|-1}{2} \end{aligned}$$
(25)

and the proof is complete.

Proposition 4

The following comb inequalities for both main and feeder networks are valid for\({\mathcal {P}}(CSApHLRP)\):

$$\begin{aligned}&\sum _{a\in A(H)}r(a) + \sum _{j=1}^{t} r(A(T_j)) \le |H| + \sum _{j=1}^{t} |T_j| - \frac{3t+1}{2}, \end{aligned}$$
(26)

where \(H\subset {\mathcal {V}}\) and \(T_i\subset {\mathcal {V}},~\forall i\in \{1,\dots , t\}\), satisfy the following:

  (i) \(H,T_1,T_2,\dots ,T_t \subseteq {\mathcal {V}}\),

  (ii) \(T_j {\setminus } H \ne \emptyset ,~~~\forall j\in \{1,\dots , t\}\),

  (iii) \(T_j \cap H \ne \emptyset ,~~~\forall j\in \{1,\dots , t\}\),

  (iv) \(T_i \cap T_j = \emptyset ,~~~\forall i\ne j\in \{1,\dots , t\}\), and

  (v) \(t\ge 3\) and odd.

\(T_1,T_2,\dots ,T_t\) are called teeth (with respect to the main subgraph or the feeder route subgraph, i.e., \(T_i^r, H_i^r, T_i^y\) and \(H^y,~\forall i\in \{1,\dots , t\}\)), and \(H\subset {\mathcal {V}}\) is a handle.

Proof

For \(H\subset {\mathcal {V}}\), in every feasible solution of CSApHLRP, we have the following (with a slight abuse of notation, we write \(\delta ^+(\cdot )\) for \(r(\delta ^+(\cdot ))\) in the remainder of this proof):

$$\begin{aligned}&r(\gamma (H)) = |H| - \frac{\delta ^+(H)+\delta ^-(H)}{2}, \end{aligned}$$
(27)

and for every \(T_j\subset {\mathcal {V}},~ j\in \{1,\dots ,t\}\), we have the following:

$$\begin{aligned}&r(\gamma (T_j)) = |T_j| - \frac{\delta ^+(T_j)+\delta ^-(T_j)}{2}. \end{aligned}$$
(28)

Furthermore, from (27) and (28), one obtains the following:

$$\begin{aligned} r(\gamma (H)) + \sum _j r(\gamma (T_j))&= |H| + \sum _j|T_j| \nonumber \\&\quad - \frac{1}{2} \left( (\delta ^+(H)+\delta ^-(H)) + \sum _j (\delta ^+(T_j)+\delta ^-(T_j))\right) . \end{aligned}$$
(29)

Let \(\delta _i^+(H)\cup \delta _i^-(H)\) denote the cut set associated with the arcs having one end-point in \(H\cap T_i\) and the other end-point outside H. We know that \(\delta ^+(H)+\delta ^-(H) \ge \sum _{i}(\delta _i^+(H)+\delta _i^-(H))\).

It can easily be shown that \((\delta _i^+(H)+\delta _i^-(H))+(\delta ^+(T_i)+\delta ^-(T_i))\ge 3\). We also know that \(\delta ^+(H)+\delta ^-(H)\) and \(\delta ^+(T_i)+\delta ^-(T_i)\) are even numbers; therefore, \(\delta ^+(H)+\delta ^-(H)+\sum _i(\delta ^+(T_i)+\delta ^-(T_i)) \ge 3t+1\). Substituting into (29) completes the proof.

3.2 Hyper-heuristic for CSApHLRP

Hyper-heuristics may be seen as a generalization of (meta-)heuristics and as a way to categorize a large body of heuristic and meta-heuristic literature that was previously difficult to classify. A significant part of the optimization literature is devoted to applying a given meta-heuristic (or a hybrid scheme) to a problem, and comparisons among several such methods are often presented. According to Burke et al. (2009), a hyper-heuristic is a search methodology or learning mechanism for selecting or generating heuristics to solve a specific combinatorial problem; in this work, we deal with the class of heuristic-selection hyper-heuristics (the notion of heuristics to choose heuristics in the context of combinatorial optimization). Cowling et al. (2001) focus on strategies for systematically alternating among different low-level heuristics at different steps of the search (e.g., supported by learning mechanisms). This kind of hyper-heuristic is a high-level approach that, given a problem and a number of low-level heuristics, selects and applies a suitable low-level heuristic at each iteration (Burke et al. 2003).

Using a combination of an operator (low-level heuristic) selection vector and a solution acceptance criterion, a hyper-heuristic must offer a 'good' solution (when one exists) within an unseen low-level problem domain (Marshall et al. 2015). Figure 2 depicts the general structure of a hyper-heuristic.

It must be noted that in a meta-heuristic, one conducts a search in the space of feasible solutions, while, in a hyper-heuristic, the search is carried out over the space of heuristic algorithms (Burke et al. 2003). Our goal here is to select the best series (not necessarily a sequence) of (meta-)heuristics to be applied to obtain high-quality solution(s).

Fig. 2 The general scheme of a selection-type hyper-heuristic framework

Hyper-heuristics have been successfully applied to various problems in routing (see Danach et al. 2015a, b; Monemi et al. 2015; Marshall et al. 2014; Garrido and Riff 2010; Garrido and Castro 2012), scheduling (see Ahmed et al. 2015; Zheng et al. 2015; Aron et al. 2015), and bin packing (see Sim et al. 2015; Beyaz et al. 2015; Ross et al. 2002), among others.

Özcan et al. (2012) categorize such methods as follows: (i) Purely Random selection randomly selects a low-level heuristic from among the existing heuristics (see also Özcan and Kheiri 2012); (ii) Random Descent selection randomly selects a low-level heuristic and applies it as long as the solution at hand improves (see Kendall and Mohamad 2004 for an example of this strategy); (iii) Random Permutation Descent selection randomly selects a set of low-level heuristics and applies them in a circular manner, applying the first heuristic until a worsening move is hit and then moving cyclically to the next one (see Özcan and Kheiri 2012; Cowling et al. 2001; Burke and Soubeiga 2003); (iv) Greedy selection applies all low-level heuristics separately at each iteration and chooses the one yielding the best cost (see Cowling et al. 2002, 2001; Özcan and Kheiri 2012); (v) Peckish selection follows the same strategy as greedy selection but considers only a subset of the low-level heuristics (see Cowling and Chakhlevitch 2003, 2007); and (vi) Reinforcement Learning selection is based on a reward value for each heuristic representing its historical performance, as considered by Özcan et al. (2012), Drake et al. (2015), and Aron et al. (2015).

Our approach falls within the category of reinforcement learning methods and has been inspired from the concept of association rules in the business data-mining world (Aguinis et al. 2012). By the definition of Burke et al. (2010), our approach: (1) is a heuristic-selection type (as opposed to heuristic-generation type), (2) applies an online learning, and (3) is a hybrid construction–perturbation heuristic.

The overall flow of the algorithm is as follows: first, an initial solution is constructed using a constructive heuristic; then, through a learning process, the hyper-heuristic proposes a series of perturbation, constructive, and/or improvement heuristics to improve the solution.

The first heuristic, a constructive one, optionally exploits the Lagrangian dual information to build the initial (possibly partial) solution. In the next step, this solution can be partially destroyed and reconstructed, perturbed and improved, simply improved, or treated by any other combination. The potential improvement achieved by a transition from the first constructive heuristic to another heuristic is then evaluated.

An acceptance criterion then decides on the transition from the current state (heuristic) to another one.

Here, we use a simulated annealing move-acceptance scheme, where the decision on whether a transition (move) is accepted depends on the quality of the solution obtained by applying the heuristics (the scale of improvement) and on the elapsed CPU time. Improving moves are always accepted, and the acceptance of worsening ones follows the Metropolis criterion (Kirkpatrick 1984):

$$\begin{aligned} \exp (-\Delta f /T) > R(0,1), \end{aligned}$$
(30)

where \(\Delta f\) is the difference between the post-transition and current objective function values, T is a synthetic (global) temperature, and R(0, 1) is a uniform random number between 0 and 1. When T is large, many non-improving or even deteriorating transitions (moves) can be accepted, while, as T tends to zero, \(\exp (-\Delta f /T)\) becomes close to zero too. Therefore, for sufficiently small values of T, only improving moves are accepted.
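A minimal sketch of this acceptance rule is given below; the cooling schedule is not specified in the text, so the time-driven linear decay shown here is an assumption.

```python
import math
import random

def accept(delta_f, T):
    """Metropolis acceptance of Eq. (30): improving moves (delta_f <= 0,
    minimization) are always accepted; worsening moves with probability
    exp(-delta_f / T)."""
    return delta_f <= 0 or math.exp(-delta_f / T) > random.random()

def temperature(elapsed, t_limit, T0=1.0):
    """A hypothetical cooling schedule tied to the elapsed CPU time; the
    paper couples acceptance with elapsed time but does not spell out
    the schedule, so this linear decay is an assumption."""
    return T0 * max(1e-9, 1.0 - elapsed / t_limit)
```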

The algorithm starts by solving the Lagrangian relaxation LRX-CSApHLRP using the bundle algorithm (Frangioni 1995). The output (when the stopping criteria of the bundle method are met) consists of a solution feasible with respect to the kept constraints (at least the p hubs are identified), together with a set of objective-function coefficients of the variables. These coefficients carry the dual information.

3.2.1 Low-level heuristics

A rather large set of heuristics, categorized into three classes (constructive, improvement, and perturbation), is used within our hyper-heuristic framework. In the following, we describe each heuristic in sufficient detail.

Constructive heuristics They build a solution from scratch, based on several predefined rules. In this study, two constructive algorithms are implemented: CS1 and CS2.

  1. CS1)

    Outlined in Algorithm 1, CS1 performs as follows: it selects the p nodes with the highest sum of supply and demand as hubs. It then allocates at least \(\Gamma\) of the nearest spokes to every hub (taking capacity into account, e.g., based on the volume of demand/supply arriving at/departing from the hub for the spokes on the route). Finally, it incrementally constructs each feeder route by sequencing the spokes allocated to a given hub while respecting the capacity constraint.

  2. CS2)

    CS2 is distinguished from CS1 in the way it selects the set of p hub candidates. It uses the dual information of the Lagrangian relaxation (i.e., the coefficients of \(z_{ik}\) in LRX-CSApHLRP), as well as statistics on how frequently a given node appeared as a hub during the iterations of the bundle method and in the historical memory of the hyper-heuristic process, if any. It allocates at least \(\Gamma\) spoke nodes to each hub node and, at the end, allocates the remaining spokes. To construct a spoke-level route and sequence the hub and its allocated spokes along it, we start from the hub and add arcs incrementally. Let \(s_l\) be the last node already added to the partial spoke-level route and \({\mathscr{S}}_L\) be the set of remaining spokes to put on the route. Unless \({\mathscr{S}}_L\) is a singleton, the next candidate spoke, which we call \(R\_Nearest\), to be added to the existing partial route is \(i^*={{\text{argmin}}}_{ s_c\in {\mathscr{S}}_L}\{\alpha _1t_{s_l\,s_c} + \alpha _2 {\widetilde{r}}_{s_l\,s_c} \}\), with \(0\le \alpha _1+\alpha _2 \le 1\) and \(\alpha _1,\alpha _2\ge 0\). Here, \(t_{s_l\,s_c}\) is the travel time between the last spoke \(s_l\) on the feeder cycle associated with the hub and the candidate spoke \(s_c\), and \({\widetilde{r}}_{s_l\,s_c}\) is the coefficient of the variable \(r_{s_l\,s_c}\) in LRX-CSApHLRP.

Algorithm 1 The constructive heuristic CS1
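Since Algorithm 1 is not reproduced here, the following is a minimal sketch of CS1 under stated assumptions: hub choice by largest supply plus demand, nearest-hub allocation with at least \(\Gamma\) spokes per hub, then greedy nearest-neighbor sequencing. Capacity checks and tie-breaking, which are part of the actual Algorithm 1, are omitted.

```python
import numpy as np

def cs1(w, t, p, Gamma):
    """A sketch of CS1. w is the O-D demand matrix, t the travel-time
    matrix, p the number of hubs, Gamma the minimum spokes per hub."""
    n = len(w)
    score = w.sum(axis=0) + w.sum(axis=1)        # demand + supply per node
    hubs = [int(k) for k in np.argsort(-score)[:p]]
    alloc = {k: [] for k in hubs}
    spokes = [i for i in range(n) if i not in hubs]
    for i in spokes:
        # prefer hubs still short of Gamma spokes, then the nearest hub
        k = min(hubs, key=lambda k: (len(alloc[k]) >= Gamma, t[i][k]))
        alloc[k].append(i)
    routes = []
    for k in hubs:
        route, todo = [k], set(alloc[k])
        while todo:                              # nearest-neighbor sequencing
            nxt = min(todo, key=lambda j: t[route[-1]][j])
            route.append(nxt)
            todo.remove(nxt)
        routes.append(route)
    return routes
```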

Improvement These heuristics start from a complete solution and apply moves to improve the objective value. The following improvement heuristics are proposed:

  1. IMP1) Rotate Given a route (a hub and a sequence of spokes along the route), this heuristic rotates the string (clockwise or counter-clockwise). A rotation of size k designates the kth spoke node after the current hub as the new hub and turns the current hub into a spoke (a minimal sketch follows this list). This heuristic is useful when the order of nodes on the route is essentially correct, but the choice of hub is the cause of infeasibility (with respect to capacity) or sub-optimality.

  2. IMP2) Re-order This heuristic re-sequences the nodes (hub and spokes) to find a better order along the route. While re-ordering, a sequence \(\{H,S_1,S_2,\dots ,S_m\}\) may, for example, turn into \(\{S_3,S_1,H,S_m,\dots ,S_2\}\), wherein \(S_3\) becomes the new hub and H turns into a spoke node. This heuristic takes into account the capacity of the route and the total volume of demand and supply of the nodes, such that, by some intelligent re-sequencing, either a feasible solution is obtained or an already feasible solution is improved.

  3. IMP3) Re-allocate This heuristic removes an allocated spoke from the feeder network having the maximum number of feeder nodes and inserts it into another feeder network with sufficient residual capacity. This node can be a hub or a spoke in its original route, and can likewise become a hub or a spoke in the destination route; if it becomes a hub at the destination, the former hub there turns into a spoke. Typically, such a node belongs to an irreducible subset of the nodes on the route that causes infeasibility, or is one of two nodes that are far from each other but lie on the same feeder-level arc (imposing a high transportation time).

  4. IMP4) SW1ffImp This heuristic randomly swaps two allocated spoke nodes from two different feeder networks, and accepts the swap only if the solution improves on the incumbent. It can be seen as a kind of search that diversifies and samples from different parts of the search space.

  5. IMP5) SW2ffImp This heuristic randomly swaps three spoke nodes belonging to three different feeder networks, and accepts the swap only if the solution improves on the incumbent.
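The sketch below illustrates Rotate (IMP1), under the convention used throughout these sketches that a route is a list of nodes with the hub in first position.

```python
def rotate(route, k):
    """IMP1 (Rotate): make the k-th spoke after the current hub the new
    hub, keeping the node order along the cycle. A route is a list of
    nodes with the hub in first position."""
    k = k % len(route)
    return route[k:] + route[:k]

# Rotating the route 13 -> 14 -> ... -> 18 of Fig. 1 by 2 makes 15 the hub:
# rotate([13, 14, 15, 16, 17, 18], 2) == [15, 16, 17, 18, 13, 14]
```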

Perturbation These heuristics start from a complete solution and inject some noise to diversify the search. The following heuristics have proven effective:

  1. PRT1) SW1ff This heuristic randomly swaps two spokes allocated to two different hubs (see Algorithm 2). At the end, it returns either the feasible solution with the maximum Hamming distance (with respect to the binary design variables) from the input solution, or one of the solutions among the p farthest ones.

  2. PRT2) SW2ff This heuristic randomly swaps three spokes belonging to three different routes. It follows the same principle as SW1ff, except that only a very limited number (\(n^2<|.| \ll n^3\)) of samples are examined, as a complete 3-opt is very expensive.

  3. PRT3) SW1hf This heuristic swaps a random feeder node and its corresponding hub: the hub becomes a spoke and the spoke becomes a hub.

  4. PRT4) SW2hf This is a fast local search that swaps a random feeder node and a random hub, the latter not being the hub to which the feeder node is allocated.

  5. PRT5) SWhh This is a fast local search swapping two hubs. It can take into account the supply/demand of the entering hub, which may violate the capacity (if we seek to generate feasible solutions from the perturbation).

  6. PRT6) Insert/delete This heuristic removes an allocated spoke from a route and inserts it into another one, either respecting the residual capacity available on the receiving route or purely at random.

Algorithm 2 The perturbation heuristic SW1ff
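Since Algorithm 2 is not reproduced here, the following is a minimal sketch of SW1ff: it samples random swaps of two spokes between two different routes (position 0 of each route is its hub) and returns the sampled solution farthest from the input, using a simple position-wise proxy for the Hamming distance on the design variables; the "p farthest" variant is omitted.

```python
import random

def sw1ff(routes, n_samples=50):
    """A sketch of SW1ff; assumes every route has at least one spoke."""
    best, best_dist = None, -1
    for _ in range(n_samples):
        new = [r[:] for r in routes]
        a, b = random.sample(range(len(new)), 2)
        i = random.randrange(1, len(new[a]))     # skip position 0 (the hub)
        j = random.randrange(1, len(new[b]))
        new[a][i], new[b][j] = new[b][j], new[a][i]
        # position-wise proxy for the Hamming distance on design variables
        dist = sum(x != y for r1, r2 in zip(new, routes)
                   for x, y in zip(r1, r2))
        if dist > best_dist:
            best, best_dist = new, dist
    return best
```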

It must be emphasized that the perturbation heuristics aim at introducing diversification. Therefore, we do not expect a perturbed solution to be feasible; rather, we hope that it makes a jump and helps explore unvisited regions of the solution space.

3.2.2 Learning algorithm

As mentioned earlier, several heuristic-selection methods have been introduced in the literature. In this study, we use a reinforcement learning mechanism to select the heuristics to be applied during the hyper-heuristic process.

In reinforcement learning, an agent learns through interaction with a so-called dynamic environment. Such interactions are usually realized through trial-and-error efforts. At each point in time t, the agent is in a given state \(s_t\in S\) and can choose an action \(a_t\), possibly from among several, to make a transition to another state \(s_{t+1}\in S\). In this work, the sets S and A are of finite cardinality; more precisely, only a finite number of transitions can be made.

When an action is taken, a numerical reward \(r_{t+1}\) is attributed to it. Such a reward is feedback from the environment that depends on the quality of the action taken by the agent. An agent wishing to maximize its total reward has to exploit the experience obtained so far (memory) while also exploring other parts of the environment to identify better actions (Sutton and Barto 1998). In general, there are two types of reinforcement learning:

  (a) Online learning: heuristic rewards are updated online (i.e., dynamically) throughout the search, according to the performance of each heuristic.

  (b) Off-line learning: heuristic rewards are retrieved from a database storing the historical performance of each heuristic (often on a similar class of problems).

Here, the learning mechanism, presented in Algorithm 3, is inspired by a data-mining technique called Association Rules (AR), guided by a Tabu Search (TS). Looking at the principles of AR, one can observe that AR is actually a kind of reinforcement learning that helps respect the dependency order of heuristic applications. The AR learning mechanism was introduced by Agrawal et al. (1993); it is a data-mining technique that has attracted much attention, whose purpose is to extract interesting correlations, frequent patterns, associations, or causal structures from groups of items in transaction databases or data repositories. Traditionally, AR has been widely used in Market Basket Analysis (Aguinis et al. 2012) to find how the items purchased by customers are related.

In association analysis, there are two sets of items (called itemsets): (1) the antecedent (the 'if' part) and (2) the consequent (the 'then' part). Moreover, an association rule carries two values expressing the degree of uncertainty about the rule. The first is the support of the rule, defined as the proportion of task-relevant data transactions for which the pattern is true. The second is the confidence of the rule, a measure of the certainty or trustworthiness associated with each discovered pattern.

Here, our goal is to find relationships between the different implemented heuristics, so as to identify the best series of heuristics to be applied. The numerical reward is assigned not to a single heuristic but to a series of them. The Association Rule heuristic-selection method deals with two main quantities, support and confidence, which measure the performance of a series of heuristics. If a heuristic does not have the required support at a certain time, or if it does not have enough confidence, the tabu search prevents it from being selected.

The relevant mappings are as follows:

  (a) The weight (reward) of a heuristic series \((H_m,\dots ,H_n)\) is the negative sum (for the minimization case) of the objective function values divided by the number of times the series has been applied.

  (b) The support of a heuristic \(H_s\) is the sum of the rewards of the series \((H_1,\dots ,H_s)\) over the preceding transactions divided by the sum of all the series rewards.

  (c) The confidence of a suggested \(H_s\) in a series \((H_m,\dots ,H_n)\) is the support of \(H_s\) divided by the support of the itemset \((H_m,\dots ,H_n)\).

Tabu list \(T_l\): Given our time limit, we initially set \(T_l=\emptyset\), and at each iteration, \(T_l\) is updated as follows. If the elapsed CPU time is less than a certain milestone V1 (e.g., the time limit divided by a given number), heuristics are chosen randomly, so as to establish a history of heuristic series performance. Otherwise, if the elapsed CPU time is greater than or equal to V1, \(T_l\) contains, for each series, all heuristics except those whose support value is greater than a threshold \(th_S\). Once the elapsed time exceeds the second milestone \(V2>V1\), \(T_l\) contains, for each series, all heuristics except those whose confidence value is greater than a threshold \(th_C\). Note that the tabu list is updated in accordance with the changes in the quality of each heuristic series.

Algorithm 3 The AR-guided heuristic-selection mechanism
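Since Algorithm 3 is not reproduced here, the sketch below illustrates one possible reading of the AR-guided selection: series of heuristics accumulate rewards, and the support and confidence derived from those rewards decide which heuristics the tabu list blocks in the phases delimited by the milestones V1 < V2. The thresholds and phase details are assumptions.

```python
import random
from collections import defaultdict

class ARSelector:
    """A sketch of the AR-guided heuristic selection; not the paper's
    implementation."""

    def __init__(self, heuristics, th_support, th_confidence):
        self.heuristics = heuristics          # names of low-level heuristics
        self.reward = defaultdict(float)      # reward per series (a tuple)
        self.th_s, self.th_c = th_support, th_confidence

    def record(self, series, obj_value):
        self.reward[series] += -obj_value     # minimization: lower is better

    def support(self, series):
        total = sum(self.reward.values()) or 1.0
        return self.reward[series] / total

    def select(self, series, elapsed, v1, v2):
        if elapsed < v1:                      # phase 1: random exploration
            return random.choice(self.heuristics)
        tabu = set()
        for h in self.heuristics:
            cand = series + (h,)
            if elapsed < v2:                  # phase 2: filter by support
                if self.support(cand) <= self.th_s:
                    tabu.add(h)
            else:                             # phase 3: filter by confidence
                conf = self.support(cand) / (self.support(series) or 1.0)
                if conf <= self.th_c:
                    tabu.add(h)
        allowed = [h for h in self.heuristics if h not in tabu]
        return random.choice(allowed or self.heuristics)
```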

3.3 Algorithm

Solving Lagrangian dual The corresponding Lagrangian dual problem is solved using the bundle method of Frangioni (1995).

Tree management and branching rules Our effort is put into quickly improving the lower bound; accordingly, we process the nodes following a breadth-first scheme. Moreover, our extensive computational experiments reveal that the solution method is rather robust to branching decisions, and therefore no particularly promising branching scheme was identified.

Cut management Cuts are added during the solution process to find the optimal solution of the MIP. Some cuts are added at almost all nodes of the branch-and-bound tree, while others are added only at certain nodes. Every cut is added only if it meets a violation threshold \(\epsilon\). All globally valid cuts are added as global cuts, meaning that, once added, they remain in the LP until the end of the branch-and-bound process. No cut pool is maintained.

Separation of valid inequalities (20) If a node i is allocated to a hub node j, one unit of flow should be routable from a dummy source through i to the hub to which i is allocated. If less than one unit of flow arrives at that hub, a violated cut is identified.
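The sketch below shows a max-flow separation in this spirit: using the fractional r values as arc capacities, a minimum i-to-hub cut of value less than 1 exposes a violated inequality (20), with the source side of the cut providing S. It uses networkx, and the simplification of routing directly from i (rather than a dummy source) is an assumption.

```python
import networkx as nx

def separate_gsec(V, alloc, r_vals, eps=1e-6):
    """Sketch of a max-flow separation for inequalities (20). alloc[i] is
    the hub of node i; r_vals maps arcs (k, l) to fractional values."""
    G = nx.DiGraph()
    for (k, l), val in r_vals.items():
        if val > eps:
            G.add_edge(k, l, capacity=val)
    cuts = []
    for i in V:
        h = alloc[i]
        if i == h or not (G.has_node(i) and G.has_node(h)):
            continue
        value, (S, _) = nx.minimum_cut(G, i, h)
        if value < 1 - eps:
            cuts.append((set(S), i))   # sum_{k in S, l not in S} r_kl >= z_{ih}
    return cuts
```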

Separation of valid inequalities (21) At any integer node where an inequality of form (21) is violated, every sub-path on \(|S| + 1\) nodes yields such a valid inequality violated by 1. This separation can be done in \(O(n^2)\).
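A simple enumeration in this spirit is sketched below: it walks the directed paths in the support of r and, for the node set S of every sub-path, counts the arcs with both end-points in S, reporting S whenever more than |S| such arcs exist. This is an illustrative reading, not necessarily identical with the paper's routine.

```python
def separate_prop2(r_arcs, max_len=8):
    """Sketch of an integer separation for inequalities (21)."""
    succ = {}
    for (i, j) in r_arcs:
        succ.setdefault(i, []).append(j)
    violated = []
    for start in succ:
        path, node = [start], start
        while node in succ and len(path) <= max_len:
            node = succ[node][0]              # follow one outgoing arc
            if node in path:
                break
            path.append(node)
            S = set(path)
            inside = sum(1 for (i, j) in r_arcs if i in S and j in S)
            if inside > len(S):
                violated.append(S)
    return violated
```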

Separation of valid inequalities (22) Initial computational experiments show that the exact separation of Padberg and Rao (1982) outperforms the heuristic separation of Fischetti et al. (1998).

Separation of valid inequalities (26) A separation based on block decomposition (as implemented for the TSP in Concorde) proved more efficient than that of Letchford and Lodi (2002), even though the latter runs in polynomial time.

4 Computational experiments

We generated our instances based on the well-known Australian Post (AP) data set from the OR-Library. The transshipment times are drawn randomly as real values within [2, 5] for the given time unit. The capacities of the original data (see Ernst and Krishnamoorthy 1999) are not used, because the problem structures differ and those capacities do not always yield feasible solutions. We assume that our fleet of vehicles is homogeneous and, therefore, has a single capacity size. Let \(O_i = \sum _j w_{ij}\) and \(D_i = \sum _j w_{ji}\), and let p be the number of hubs. The capacity is generated as \(C = \frac{\sum _i O_{i} + \sum _i D_{i}}{p}\).
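The capacity generation rule translates directly into code, as the short sketch below shows.

```python
import numpy as np

def generate_capacity(w, p):
    """Vehicle capacity for the AP-based instances, per the formula above:
    total supply plus total demand divided by the number of hubs."""
    O = w.sum(axis=1)       # O_i = sum_j w_ij (supply of node i)
    D = w.sum(axis=0)       # D_i = sum_j w_ji (demand of node i)
    return (O.sum() + D.sum()) / p
```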

All experiments were performed on a machine with an Intel 2.54 GHz Core i5 CPU and 4 GB of RAM, running Windows 7. All instances are named in the format \(ni\_pj\_k\), where i indicates the number of nodes, j the number of hubs, and k the economies-of-scale factor.

Our initial computational experiments suggested that valid inequalities (20) and (21) be separated only at integer nodes rather than at fractional solutions. Valid inequalities (22) and (26), however, can only be separated at fractional solutions, and are separated only at every 100th node, if fractional.

The termination criteria are chosen so as to avoid unpromising computational effort and premature convergence. A global time limit \({T_{limit}} = \max {\{6000, n^2 \times p \times 250\}}\) ms is set for the overall computation. In addition, the whole algorithm terminates once \(10\times n \times p\) non-improving iterations have been observed.

In the construction phase, \(\alpha _1\) and \(\alpha _2\) were both set to 0.5, meaning that the LR r-coefficients and the travel times are equally important in influencing the choice of the arcs to be established. V1 is set to \(T_{limit}/3\) and V2 to \(2T_{limit}/3\). \(th_S\) and \(th_C\) are set with a tolerance of 15% from the best objective function value.

Within the iterations of the bundle algorithm, the subproblem was solved using CPLEX 12.7.1; the time limit was set to 3600 s and the maximum number of iterations to 100. \(\epsilon\) was set to \(10^{-6}\).

While a fair comparison between a heuristic algorithm and an exact decomposition is rather non-trivial, Table 4 reports the results of our computational experiments with the hyper-heuristic (HH), with and without the LR results, next to the results of the Benders decomposition from Gelareh et al. (2015). There are, in total, 24 instances, ranging from 10 nodes with 3 hubs to 20 nodes with 6 hubs. The first column reports the instance name, and the second (resp. seventh) one the computational time of the Benders decomposition method (resp. HH). The best solution found by HH without LR (resp. by the Benders decomposition) is reported in the fifth (resp. third) column. The fourth column reports the status at termination of the Benders decomposition method. The gaps between HH without LR (resp. LR with HH, denoted LR-HH) and the Benders decomposition are reported in the seventh (resp. tenth) column. The last column indicates the gap between the best objective function values obtained by HH with and without the LR results.

One observes that the relative gap between the solutions reported by the Benders decomposition and those of HH never exceeds 0.08% when HH uses the LR results, and does not exceed 0.32% otherwise. In this table, negative gaps indicate HH solutions that are better than the feasible solution obtained by the Benders decomposition method of Gelareh et al. (2015). Considering the computational times elapsed before obtaining such high-quality solutions, the proposed HH framework is confirmed to be capable of obtaining high-quality solutions in very reasonable computational times, especially when using the LR results.

The gap between HH with and without the LR results reaches 2.37% on instance n15_p3_0.8, which demonstrates the added value of injecting the LR variable coefficients into the HH process. These coefficients are exploited throughout the process: in the construction phase (resp. perturbation phase), all arcs whose coefficient exceeds a fixed threshold \(\iota\) (resp. \(\nu\)) are excluded and, during the improvement phases, pairs with small coefficients are favored for connection.

Table 4 Quality comparison between the Benders decomposition and the proposed hyper-heuristic algorithms with and without LR

Table 5 reports the total number of iterations each heuristic performed on each instance. The Rotate improvement heuristic worked the most on all instances. Next come the constructive heuristics CS1 and CS2, which show more or less similar numbers of iterations on many instances. This may indicate that the information collected in the course of the Lagrangian optimization, the statistics collected during the HH process, and the demand-based hub choice of CS1 lead to similar decisions. Among the perturbation heuristics, SW2hf contributes the largest number of iterations.

Table 5 Statistics of applying the different heuristics to every instance

We also applied the proposed HH framework to a wider range of instance sizes. Table 7 reports the statistics of applying the proposed HH to a testbed composed of instances with up to 100 nodes.

Table 6 Quality comparison between the initial objective value taken by the proposed hyper-heuristic algorithms with and without LR

In Table 6, the initial objective values of the HH with and without the LR results are reported. In general, the initial objective value using the LR results is better than without them. The average relative gap between the initial and the best-found objective values for the instances of Table 6 is 21.45% without the LR results (resp. 19.36% with them), while it rises to 26.66% for the instances with more than 20 nodes (see Table 7). This is, on the one hand, due to the much larger and more spread-out solution space of the larger instances, which increases the probability of finding better solutions during the search, and may show, on the other hand, that the HH is capable of exploiting such a larger search space to further improve the initial solutions. A more pessimistic view might instead question the performance of the construction heuristics and the quality of the initial solutions they produce.

Table 7 HH results for large instances of p-hub location problem

To further assess the quality of the solutions, we retain only the allocation information resulting from LR HH and discard all spoke-route information. The remaining problem resembles a pick-up and delivery vehicle routing problem with multiple depots: each hub is considered a depot, and the objective is to construct a feeder network for each hub from its allocated spokes. We inject all the allocation information into the solver, CPLEX, which then finds the feeder networks; all constraints related to the number of hubs and the allocations are dropped. This process, which takes data produced by a heuristic approach and completes the solution using a solver, is referred to as a matheuristic. Matheuristic optimization algorithms (Bartolini and Mingozzi 2009) combine (meta-)heuristic approaches with techniques of mathematical programming.
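The sketch below illustrates the routing step of such a matheuristic under stated assumptions: the allocation is fixed, and each hub's feeder cycle is found by a small MIP with Miller-Tucker-Zemlin sub-tour elimination. PuLP with CBC is used here for self-containment, whereas the paper uses CPLEX.

```python
import itertools
import pulp

def solve_feeder_route(nodes, t):
    """Find a minimum-time directed cycle over `nodes` (hub first) for one
    hub of the matheuristic; t is the travel-time matrix."""
    n, hub = len(nodes), nodes[0]
    arcs = list(itertools.permutations(nodes, 2))
    prob = pulp.LpProblem("feeder_route", pulp.LpMinimize)
    x = pulp.LpVariable.dicts("x", arcs, cat="Binary")
    u = pulp.LpVariable.dicts("u", nodes, lowBound=0, upBound=n - 1)
    prob += pulp.lpSum(t[i][j] * x[(i, j)] for (i, j) in arcs)
    for i in nodes:                       # in/out degree 1 for every node
        prob += pulp.lpSum(x[(i, j)] for j in nodes if j != i) == 1
        prob += pulp.lpSum(x[(j, i)] for j in nodes if j != i) == 1
    for (i, j) in arcs:                   # MTZ constraints, hub excluded
        if i != hub and j != hub:
            prob += u[i] - u[j] + n * x[(i, j)] <= n - 1
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return [(i, j) for (i, j) in arcs if pulp.value(x[(i, j)]) > 0.5]
```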

The results of the matheuristic are reported in Table 8, together with the relative gap between the matheuristic solution and the LR HH objective value, the execution time, and the CPLEX status at termination. One observes that, within a time limit of 1 h, the LR HH is by far more efficient as the instance size increases.

Table 8 A comparison between the results of the matheuristic and those of LR HH

In Table 9, we report the results of the iterative method for solving the Lagrangian relaxation proposed in Sect. 3.1. The bundle method is known for its higher precision and faster convergence compared with simpler versions of the subgradient method. We initialized the Lagrangian multipliers with the corresponding dual values of the LP relaxation, wherever the problem size allowed it.

Table 9 reports the CPU time as well as the best bound obtained within the given amount of CPU time. The LR converged in reasonable time for several instances with up to 20 nodes.

Table 9 Lagrangian relaxation results

5 Conclusion and future work

A hyper-heuristic approach has been proposed for solving instances of a variant of the hub location routing problem (CSApHLRP). The problem is inspired by an application in the distribution chain of pharmaceutical products. Since exact methods have shown very limited success on this problem and even moderate-size instances remain intractable, one has to resort to non-exact (heuristic-based) methods to obtain solutions for moderate-to-large instances.

We proposed the first hyper-heuristic for this class of problems. It consists of several low-level heuristics guided by a learning method inspired by a topic from the business data-mining world known as Association Rules. The hyper-heuristic benefits from the primal/dual information of a Lagrangian relaxation of the problem (more precisely, a relax-and-cut). We also identified several classes of valid inequalities, together with the corresponding separation routines, and used them within the relax-and-cut framework.

Our numerical results show that, on the instances for which an optimal solution or a solution of known quality is available, the proposed method is very competitive, obtaining its solutions in very reasonable CPU times. It is therefore expected that, for larger instances, for which finding an optimal solution (with general-purpose solvers or even decompositions) requires an effort beyond the capacity of current hardware resources, the proposed framework, equipped with advanced learning mechanisms, will also perform very well.

In future work, we will add further real-life aspects to the model and consider solving the problem with a heterogeneous fleet of trucks/vessels with different route capacities, inter-route transshipment, and so on.