1 Introduction

The well-known travelling salesman problem (TSP) and its variants have been widely studied and possess a rich literature [1, 2]. Among its variants, the travelling salesman problem with time windows (TSPTW) looks for a tour with the minimum cost where each city is visited within an associated time frame, namely between its earliest start time and its due time. Time-constrained problems formulated in this way have many industrial applications in a variety of fields, including logistics, transportation systems, and manufacturing [3]. Furthermore, investigation of TSPTW allows a deeper understanding of other related and more complicated problems like the vehicle routing problem with time windows [4] and its variants. The TSPTW problem is NP-hard, and even finding a feasible solution is NP-complete [5].

It is not surprising that TSP and its variants provide a paradigmatic benchmark for the emerging technology of quantum computing, which opens up an alternative perspective for solving computationally hard problems. Quantum algorithms developed by Shor and Grover, with provable speedups compared to the best-known classical algorithms, are not suitable for the noisy intermediate-scale quantum (NISQ) era [6]. At the same time, there have been promising attempts to solve optimization problems, including TSP, using current quantum technology, the most prominent ones being the variational quantum eigensolver (VQE) [7], the quantum approximate optimization algorithm (QAOA) [8], and quantum annealing (QA) [9, 10]. In each case, one needs to represent the optimization problem in the form of an unconstrained binary model.

In particular, quantum annealing relies on the quantum adiabatic theorem [11]. An initial Hamiltonian is picked whose ground state is easy to prepare, and the system is evolved by applying a time-dependent Hamiltonian. The time-dependent Hamiltonian gradually brings in the problem Hamiltonian, whose ground state encodes the solution to the optimization problem of interest. The quantum adiabatic theorem guarantees that the final state of the system is close to the ground state of the problem Hamiltonian for a sufficiently long evolution [12]. Hence, finding the solution to the optimization problem can be reduced to the problem of finding the ground state of the problem Hamiltonian. Theoretically, the performance of quantum annealing depends on the minimum spectral gap encountered during the process. Whether quantum annealing provides a speedup over classical algorithms remains controversial [13, 14]. How to detect quantum speedup is a question of interest on its own [15], and various metrics have been proposed.

Quantum annealing has attracted significant attention since it is realizable in the commercially available D-Wave machines [16]. In order to solve a problem using QA, one can formulate the problem as a quadratic unconstrained binary optimization problem (QUBO) which is then easily recast into the problem Hamiltonian. The only QUBO formulation proposed in the literature for the TSPTW problem is due to Papalitsas et al. [17]. However, the formulation is flawed as it allows some infeasible solutions. Furthermore, the authors assume that the earliest start time is equal to 0 for all cities, which limits the possible applications.

In this paper, we provide three different formulations for the TSPTW problem. The first formulation extends the formulation proposed in [17] by taking into account both the earliest start times and the due times for each city. In the second formulation, we provide an alternative higher-order binary model which is more space-efficient. Finally, we present an alternative QUBO formulation based on the integer linear programming formulation given in [18]. The first two formulations allow more than one assignment of binary variables to encode the optimal route without any penalty. All proposed models can be easily modified to obtain formulations for other variants of TSP, such as the makespan problem with time windows, in which the total tour duration is minimized. To investigate the efficiency of the edge-based and ILP formulations, we provide some experimental results obtained by running small instances of the problem on the D-Wave Advantage.

This paper is organized as follows. We start by introducing the problem and the necessary concepts in Sect. 2. In Sect. 3, we present our formulations for the TSPTW problem. Section 4 contains experimental results from the D-Wave quantum annealing device. We discuss the formulations and the results in Sect. 5, and we conclude in Sect. 6.

2 Background

In this section, we provide background information concerning quantum annealing and related concepts. We also discuss recent results concerning the travelling salesman problem with time windows.

2.1 Simulated and quantum annealing

Consider a minimization problem in which the aim is to find the global minimum of a cost function defined over a discrete set, whose elements are called the solutions. Let’s model the solution space as a graph where the nodes are the solutions, and the edges are defined by a neighborhood rule. Proposed by Kirkpatrick et al. [19], simulated annealing (SA) can be regarded as a random walk on the search space, whose steps are parameterized by a temperature parameter called T. At each iteration, a random step is taken, and the step is accepted if it has a lower cost. If the new step has a higher cost, then it is accepted with a probability determined by the temperature T and the difference between the existing and the new costs. At higher temperatures, the transition between the states occurs more frequently. According to a cooling schedule, at each iteration, the parameter T is decremented, and the optimal solution is found with the help of the thermal fluctuations.
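The acceptance rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Metropolis acceptance probability \(e^{-\Delta /T}\), the geometric cooling factor, the toy cost function, and all function names are choices made here and are not taken from [19].

```python
import math
import random


def simulated_annealing(cost, neighbor, start, t_start=10.0, t_end=1e-3,
                        cooling=0.95, seed=0):
    """Minimize `cost` by a random walk with temperature-dependent acceptance."""
    rng = random.Random(seed)
    current, best = start, start
    temperature = t_start
    while temperature > t_end:
        candidate = neighbor(current, rng)
        delta = cost(candidate) - cost(current)
        # Improvements are always accepted; worse moves are accepted with
        # probability exp(-delta / T) (Metropolis rule, assumed here).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = candidate
        if cost(current) < cost(best):
            best = current
        temperature *= cooling  # geometric cooling schedule (assumed)
    return best


def flip_one_bit(bits, rng):
    """Neighborhood rule: two bit strings are neighbors if they differ in one bit."""
    i = rng.randrange(len(bits))
    return bits[:i] + (1 - bits[i],) + bits[i + 1:]


def toy_cost(bits):
    """Toy cost function: minimal when exactly two bits are set."""
    return (sum(bits) - 2) ** 2


result = simulated_annealing(toy_cost, flip_one_bit, (0, 0, 0, 0, 0))
```

At high temperature almost every move is accepted, whereas near the end of the schedule the walk behaves like a greedy descent.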

Quantum annealing (QA) is a quantum mechanical heuristic method for solving optimization problems relying on adiabatic quantum computation. Note that quantum annealing is indeed a physical process taking place in an analog quantum device, whereas simulated annealing is an analogy of a physical procedure. The idea was independently introduced by many authors, including [9, 10]. As opposed to simulated annealing, in quantum annealing, quantum fluctuations are used instead of thermal fluctuations.

In the framework of QA, the system is initialized to the ground state of the transverse field Hamiltonian \(H_D\) and a problem Hamiltonian \(H_F\) is designed so that its ground state encodes the solution. The overall Hamiltonian takes the form

$$\begin{aligned} H(t) = A(t) H_D + B(t)H_F, \end{aligned}$$

where A(t) is decreased gradually from the initial value \( A(0)=1 \) to \( A(\tau ) =0\), while B(t) is gradually increased from \( B(0)=0 \) to \( B(\tau )=1 \), \( \tau \) being the computation time, so that at the end of the annealing procedure \( H(\tau ) = H_F \). Then, by the quantum adiabatic theorem, the final state gives us a solution which is close to the optimal one. Instead of thermally climbing over energy barriers as in SA, quantum tunneling is used to escape local minima [20]. We refer readers to [14, 20] for more details on the topic.

2.2 Ising model and D-Wave

Consider n particles where each particle can be either in state \(-1\) or \(+1\), called the spin. An assignment of \(-1\)s and \(+1\)s to the spins is known as the spin configuration. The Ising model is a mathematical model for ferromagnetism used in statistical mechanics to analyze the properties of spin configurations. The interaction force or the coupling strength between the particles is denoted by \( J_{ij} \), and an external force \(h_i\) called the qubit bias is applied on each particle. The energy of a configuration is given by

$$\begin{aligned} H(s) = \sum _{i}h_i s_i + \sum _{i<j} J_{ij}s_i s_j, \end{aligned}$$

where \( s_i \in \{-1,1\} \). One should note that the problem of finding the spin configuration which minimizes H(s) is NP-hard in general.
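As a small illustration of the energy function above, the following sketch evaluates H(s) and finds a minimizing configuration by exhaustive enumeration. The biases and couplings are made-up toy values, and the brute-force search is only feasible for a handful of spins.

```python
from itertools import product


def ising_energy(spins, h, J):
    """H(s) = sum_i h_i s_i + sum_{i<j} J_ij s_i s_j, with J given as {(i, j): value}."""
    linear = sum(h[i] * s for i, s in enumerate(spins))
    quadratic = sum(value * spins[i] * spins[j] for (i, j), value in J.items())
    return linear + quadratic


def brute_force_ground_state(h, J):
    """Enumerate all 2^n spin configurations and return one with minimal energy."""
    return min(product((-1, 1), repeat=len(h)),
               key=lambda s: ising_energy(s, h, J))


# Toy instance (hypothetical values): a chain of three spins.
h = [0.5, 0.0, -0.5]
J = {(0, 1): 1.0, (1, 2): 1.0}
ground = brute_force_ground_state(h, J)
ground_energy = ising_energy(ground, h, J)
```

Both positive couplings favor anti-aligned neighbors, so the minimizing configurations alternate in sign along the chain.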

The D-Wave QPU is a collection of particles arranged on a special architecture (Chimera or Pegasus), and the Ising energy minimization problem is natively solved using quantum annealing. The initial and the problem Hamiltonians take the form

$$\begin{aligned} H_D = \sum _i \sigma _i^x, ~~~~H_F = \sum _i h_i \sigma _i^z + \sum _{i<j} J_{ij} \sigma _i^z \sigma _j^z, \end{aligned}$$

where \( \sigma _i^x \) and \( \sigma _i^z \) denote the Pauli-x and Pauli-z operators acting on the ith qubit, respectively. Hence, one needs to design the Hamiltonian \( H_F \) so that its ground state encodes the optimal solution to the minimization problem to be solved.

Note that not all couplings are available on the hardware, and therefore a process called minor embedding is needed to map the logical qubits to the physical ones. Furthermore, there are specific ranges for \( h_i\) and \( J_{ij} \), so that the coupling and the qubit bias require scaling.

2.3 Quadratic unconstrained binary optimization

One may find it more natural to express an optimization problem using binary variables instead of spin variables. Formally, the objective function of the quadratic unconstrained binary optimization problem is defined as

$$\begin{aligned} H(x) = \sum _{i\le j} x_i Q_{ij} x_j, \end{aligned}$$

where x is a binary vector and Q is a real square upper triangular matrix. The correspondence between the QUBO and the Ising model can be observed by the change of variable \( x_i = \frac{1-s_i}{2} \). Noting that \( x_i^2=x_i \) for any i since \( x_i \in \{0,1\} \), the diagonals of the matrix Q are the linear coefficients and the off-diagonals are the quadratic coefficients. In [21], a list of well-known problems and their QUBO formulations are presented.
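The change of variable can be carried out mechanically. The sketch below substitutes \( x_i = \frac{1-s_i}{2} \) into an upper triangular Q and collects the resulting biases, couplings, and constant offset; the dictionary representation and the function names are assumptions made for this illustration and do not correspond to a particular library.

```python
from itertools import product


def qubo_to_ising(Q):
    """Q: {(i, j): value} with i <= j. Returns (h, J, offset) for x_i = (1 - s_i) / 2."""
    h, J, offset = {}, {}, 0.0
    for (i, j), q in Q.items():
        if i == j:
            # q * x_i = q/2 - (q/2) * s_i
            h[i] = h.get(i, 0.0) - q / 2
            offset += q / 2
        else:
            # q * x_i * x_j = (q/4) * (1 - s_i - s_j + s_i * s_j)
            J[(i, j)] = J.get((i, j), 0.0) + q / 4
            h[i] = h.get(i, 0.0) - q / 4
            h[j] = h.get(j, 0.0) - q / 4
            offset += q / 4
    return h, J, offset


def qubo_energy(x, Q):
    return sum(q * x[i] * x[j] for (i, j), q in Q.items())


def ising_energy(s, h, J):
    linear = sum(value * s[i] for i, value in h.items())
    quadratic = sum(value * s[i] * s[j] for (i, j), value in J.items())
    return linear + quadratic


# Toy QUBO (hypothetical values): rewards setting exactly one of two bits.
Q = {(0, 0): -1.0, (1, 1): -1.0, (0, 1): 2.0}
h, J, offset = qubo_to_ising(Q)
for x in product((0, 1), repeat=2):
    s = tuple(1 - 2 * xi for xi in x)  # inverse map s_i = 1 - 2 x_i
    assert abs(qubo_energy(x, Q) - (ising_energy(s, h, J) + offset)) < 1e-12
```

The two models differ only by the constant offset, so they share the same minimizers.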

One can propose a generalization of QUBO in which the objective function is a general polynomial in bits. If the objective function involves monomials with \( k \ge 3\) variables, then it is called a high order binary optimization (HOBO) problem. Note that sometimes such a problem is also referred to as a polynomial unconstrained binary optimization (PUBO) problem [22]. It is always possible to obtain an equivalent QUBO formulation by quadratization, which is explained in more detail in the “Appendix.”

2.4 Integer linear programming

In integer linear programming (ILP), the problems are formulated through some set of linear constraints over integer variables and a linear objective function to be minimized. An ILP problem is formally defined as

$$\begin{aligned}&\text {minimize} \qquad \sum _{i} c_i y_i \\&\text {subject to} \qquad \sum _{j} a_{ij}y_j \le b_i,\,\, i=1,\dots ,m \\&\quad {} \qquad \qquad \qquad y_i \ge 0, y_i \in {\mathbb {Z}}\end{aligned}$$

where \(a_{ij}\in {\mathbb {R}}\), \(b_i\in {\mathbb {R}}\), \( c_i \in {\mathbb {R}}\). The decision version of the ILP problem is known to be NP-complete.

An integer quadratic program (IQP) is defined analogously with a quadratic objective function and a set of linear constraints. Both ILP and IQP problems can be expressed as QUBO problems, as discussed next.

2.5 Transformation into binary problems

Direct preparation of a QUBO formulation may not always be convenient. One may first define some constraints and use integer variables while formulating an optimization problem. Below, we present the procedures for transforming linear inequalities into equality constraints, mapping integer variables to binary ones, and the penalty method for removing the constraints.

Suppose we have integer variables \(y_1,\dots ,y_k\) such that \({\underline{y}}_i \le y_i \le {\overline{y}}_i\) where \({\underline{y}}_i,{\overline{y}}_i\in {\mathbb {Z}}\) are some constants bounding \( y_i \). Since in the paper we will use integer variables only, we will write \(y_i\in \{\underline{y}_i,\ldots , \overline{y}_i\}\) instead of \({\underline{y}}_i \le y_i \le {\overline{y}}_i\). Also, suppose that \(f(y_1,\dots ,y_k)\) is our objective function to be minimized.

Let us start with the penalty method for removing the constraints. Given a linear equality constraint of the form

$$\begin{aligned} \sum _{i=1}^ka_i y_i = b, \end{aligned}$$
(1)

where \(a_i,b\in {\mathbb {R}}\), the transformation procedure simply transforms the objective function f into

$$\begin{aligned} f(y_1,\dots ,y_k) + P \left( \sum _{i=1}^ka_i y_i - b\right) ^2. \end{aligned}$$
(2)

Note that the new function is equal to f if and only if variables \(y_i\) satisfy the equality. Constant \(P\in {\mathbb {R}}_{>0}\) is the penalty constant that has to be adjusted.
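The role of the penalty constant can be seen on a toy instance. The sketch below applies Eq. (2) to a made-up objective \(f(y_1,y_2)=y_1+2y_2\) with the constraint \(y_1+y_2=2\) and \(y_i \in \{0,1,2\}\); the instance and the two values of P are assumptions chosen for illustration only.

```python
from itertools import product


def penalized(f, a, b, P):
    """Return the function y -> f(y) + P * (sum_i a_i y_i - b)^2 from Eq. (2)."""
    def g(y):
        residual = sum(ai * yi for ai, yi in zip(a, y)) - b
        return f(y) + P * residual ** 2
    return g


def f(y):
    """Toy objective (hypothetical)."""
    return y[0] + 2 * y[1]


domain = list(product(range(3), repeat=2))  # y_1, y_2 in {0, 1, 2}

# With a small penalty constant the unconstrained minimizer violates y_1 + y_2 = 2.
weak = min(domain, key=penalized(f, (1, 1), 2, P=0.5))
# With a sufficiently large penalty constant the constrained optimum is recovered.
strong = min(domain, key=penalized(f, (1, 1), 2, P=3))
```

Here P = 0.5 yields the infeasible point (1, 0), while P = 3 yields (2, 0), the cheapest point satisfying the constraint, which illustrates why P has to be adjusted to the scale of f.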

Linear inequality constraints first have to be transformed into equality constraints through so-called slack variables. Suppose we have an inequality constraint of the form

$$\begin{aligned} \sum _{i=1}^ka_i y_i \le b. \end{aligned}$$
(3)

Then, by adding a slack variable \(\xi \), we obtain \(\sum _{i=1}^ka_i y_i + \xi = b\) and move it to the objective function according to the previously described procedure. Note that \(\xi \) has to be optimized by the optimization procedure as well. Taking into account both sides of the inequality, we can bound the slack variable \( \xi \) as follows:

$$\begin{aligned} 0 \le \xi \le - \left( \sum _{i=1}^k \min \{ a_{i}{\underline{y}}_i, a_{i}{\overline{y}}_i\} -b \right) . \end{aligned}$$
(4)
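A quick numerical check of the bound in Eq. (4), written as a sketch with a made-up inequality \(2y_1 - y_2 \le 3\); the function name and the instance are illustrative assumptions.

```python
from itertools import product


def slack_upper_bound(a, lower, upper, b):
    """Eq. (4): b minus the smallest value that sum_i a_i y_i can attain."""
    smallest_lhs = sum(min(ai * lo, ai * hi)
                       for ai, lo, hi in zip(a, lower, upper))
    return b - smallest_lhs


# Hypothetical constraint 2*y1 - y2 <= 3 with y1 in {0,...,2}, y2 in {0,...,4}.
a, lower, upper, b = (2, -1), (0, 0), (2, 4), 3
bound = slack_upper_bound(a, lower, upper, b)

# Every feasible point needs a slack value between 0 and the bound.
slacks = [b - (a[0] * y1 + a[1] * y2)
          for y1, y2 in product(range(0, 3), range(0, 5))
          if a[0] * y1 + a[1] * y2 <= b]
```

The largest slack is needed at the point where the left-hand side is smallest, here \(y_1=0\), \(y_2=4\).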

Now, suppose we have a function \(f(y_1,\dots ,y_k)\), where \(y_1,\dots ,y_k\) are integer variables. In order to transform f into pseudo-Boolean function, it is enough to replace each occurrence of an integer variable y with

$$\begin{aligned} E_{{\underline{y}}}^{\overline{y}}(y) = {\underline{y}}+\sum _{i=0}^{k_y-2}2^i x_{y,i}+\left( \overline{y}-{\underline{y}}-\sum _{i=0}^{k_y-2}2^i\right) x_{y,k_y-1}, \end{aligned}$$
(5)

where \(k_{y} = \lceil \log _2({\overline{y}}-{\underline{y}}+1)\rceil \), and \(x_{y,i}\) are the newly introduced binary variables to be optimized. Note that f and \(f\left( E_{{\underline{y}}_1}^{\overline{y_1}}(y_1),\dots ,E_{{\underline{y}}_k}^{\overline{y_k}}(y_k)\right) \) are polynomials of the same order. In particular, our transformation maps quadratic polynomials into quadratic pseudo-Boolean polynomials.
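The encoding can be checked on a small range. The sketch below covers only the case \({\underline{y}}=0\), which is the one used for the waiting times and slack variables in Sect. 3; the function name is an assumption made here.

```python
from itertools import product


def encoding_coefficients(upper):
    """Bit coefficients of Eq. (5) for an integer variable y in {0, ..., upper}."""
    # For upper >= 1, ceil(log2(upper + 1)) equals upper.bit_length().
    k = upper.bit_length()
    powers = [2 ** i for i in range(k - 1)]
    # The last coefficient is chosen so that the all-ones string encodes `upper`.
    return powers + [upper - sum(powers)]


coefficients = encoding_coefficients(5)
reachable = {sum(c * bit for c, bit in zip(coefficients, bits))
             for bits in product((0, 1), repeat=len(coefficients))}
```

For the range {0, ..., 5}, three bits with coefficients 1, 2, 2 suffice and no value above 5 can be encoded. Some integers, such as 2, are represented by more than one bit string.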

The procedures above make QA an alternative approach for addressing all problems that admit a formulation as an ILP or IQP.

2.6 TSP problem and its variants

Let \(G=(V,{\vec {E}}) \) be a directed graph where \( V=\{0,1,\dots ,n\} \) is the set of nodes and \({\vec {E}} \subset V\times V \) is the set of arcs.

A tour consists of a sequence of vertices and edges, where the edges connect the adjacent vertices in the sequence, and no edge is repeated. A tour that visits each node exactly once is called a Hamiltonian cycle. For every pair of nodes (u, v), one can associate the cost of travelling from node u to v, which is denoted by \( c_{uv} \). Finding a Hamiltonian cycle that minimizes the total cost of travelling between the nodes is known as the Travelling Salesman Problem (TSP).

Among the many generalizations of TSP, we will focus on Travelling Salesman Problem with Time Windows (TSPTW). Consider a vehicle that starts from the depot labeled by 0, visits each city, and returns to the depot. For each city v, there is an associated service time which is included in the cost of the arc outgoing from v and a time window \( [e_v,l_v] \) such that the city v should be visited within the time window, where \(e_v\) is the earliest start time and \(l_v\) is the due time for city v. If the vehicle arrives at city v before \( e_v \), the vehicle should wait. TSPTW aims to find a Hamiltonian cycle that minimizes the total cost and satisfies the time window constraints. Another objective in this setup would be to minimize the total completion time of the tour, known as the Makespan Problem with Time Windows (MPTW).

Both exact and heuristic algorithms have been proposed for TSPTW and its variants. Some of the first approaches include solutions to the MPTW problem. In [23], a branch and bound procedure is utilized, and a non-linear program is formulated. Langevin et al. [24] present an integer program using a commodity flow formulation for both problems. In 2012, Baldacci et al. proposed an algorithm that outperformed the existing exact solutions using dynamic programming [25]. Some other dynamic programming approaches include the works of [26, 27]. The references [28, 29] provide exact solutions to the problem using constraint programming. A more recent study combining constraint programming and reinforcement learning is presented in [30].

The first and the only attempt at solving the TSPTW problem using quantum algorithms is by Papalitsas et al. [17], which we will discuss in more detail in the following section. There has also been some ongoing research on using quantum algorithms to solve the TSP problem, and we can mention the various attempts using QAOA [31,32,33] and QA [34, 35]. Some other related work includes [36, 37], which use QA to solve the vehicle routing problem, a generalization of TSP to multiple vehicles.

3 Unconstrained binary models for TSPTW

A Hamiltonian cycle is a feasible solution for the TSPTW problem if the vehicle obeys the time window constraints of each city while visiting the cities in the order imposed by the cycle. The optimal solution is the feasible solution with the least cost. Given a cycle, let’s investigate the tour of the vehicle in more detail. The vehicle leaves the depot immediately and moves to the first city on the tour. Then, there are three possibilities: If the vehicle enters the city within its time window (if the arrival time is between the earliest start time and due time), then service is done immediately, and the vehicle moves to the next city. If the vehicle arrives before the earliest start time of the city, then it waits until the earliest start time and then moves to the next city. Finally, if the vehicle arrives later than the due time of the city, then the cycle is infeasible. After this procedure is repeated for each city on the cycle, the vehicle returns to the depot. Note that given the cycle, one can calculate the waiting times and service times using a classical procedure. Hence, we are specifically interested in the Hamiltonian cycle that is a feasible or optimal solution to the problem. We will refer to such cycles as feasible routes and optimal routes, respectively.

To describe the tour of the vehicle, we will introduce the concepts of arrival and waiting time for city v which will be denoted by \(\alpha _v\) and \(\nu _v\), respectively. We can describe the arrival times using a recurrence relation satisfying the following conditions:

  1. Initialization constraint: The arrival time \(\alpha _v\) at the first visited city v equals \(c_{0v}\).

  2. Recurrence constraint: If w and v are consecutive cities in the tour, then \(\alpha _v = \alpha _w + \nu _w+ c_{wv}\).

  3. Service constraint: \(\alpha _v + \nu _v\) (the service time for city v) is between \(e_v\) and \(l_v\).
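The three constraints translate directly into a classical procedure that, for a given visiting order, computes arrival and waiting times and decides feasibility. The following is a minimal sketch; the function name, the data layout, and the toy instance are assumptions made for illustration, and the vehicle is assumed to wait exactly until the earliest start time when it arrives early.

```python
def evaluate_route(route, cost, windows):
    """Simulate the tour 0 -> route[0] -> ... -> route[-1].

    cost[u][v] is the travel cost from u to v (service time of u included),
    windows[v] = (e_v, l_v). Returns (feasible, arrival, waiting), where
    arrival and waiting are dictionaries keyed by city.
    """
    arrival, waiting = {}, {}
    previous, departure = 0, 0  # the vehicle leaves the depot immediately
    for city in route:
        earliest, due = windows[city]
        # Initialization / recurrence: alpha_v = alpha_w + nu_w + c_wv.
        arrival[city] = departure + cost[previous][city]
        if arrival[city] > due:
            return False, arrival, waiting  # service constraint violated
        # Wait only if the vehicle is early.
        waiting[city] = max(0, earliest - arrival[city])
        departure = arrival[city] + waiting[city]
        previous = city
    return True, arrival, waiting


# Toy instance (hypothetical values) with depot 0 and three cities.
cost = [[0, 2, 4, 3],
        [2, 0, 3, 5],
        [4, 3, 0, 2],
        [3, 5, 2, 0]]
windows = {1: (0, 5), 2: (6, 10), 3: (0, 12)}

feasible, arrival, waiting = evaluate_route([1, 2, 3], cost, windows)
infeasible, _, _ = evaluate_route([3, 2, 1], cost, windows)
```

In the first order the vehicle waits one time unit at city 2, whereas in the reversed order it reaches city 1 after its due time.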

The constraints given above will form the backbone of the formulations we will present in the following subsections. The first one is the corrected and generalized version of the QUBO formulation presented in [17]. The second one is a HOBO formulation that is based on the standard QUBO formulation for the TSP problem given in [21]. The third representation is a QUBO model based on the ILP formulated in [18].

3.1 Edge-based formulation

We start with the edge-based formulation of the TSPTW inspired by [17]. A QUBO formulation for the TSPTW problem is presented in [17] with a simplifying assumption of \( e_v=0 \) for all \( v \in V \). However, there is a flaw in the given formulation, as we will discuss next.

A tour is of the form \( p_0,p_1,\dots ,p_{n+1} \), where \( p_0=p_{n+1}=0 \) so that the tour starts and ends at the depot. We will refer to \( p_i \) as the node at position i of the tour or the ith visited city. For each \( i=2, \dots , n \) and \( u,v \in V\) s.t. \(u\ne v\), let the binary variables \( x_{u,v}^i \) be defined as

$$\begin{aligned} x_{u,v}^i = {\left\{ \begin{array}{ll} 1, &{} \text {nodes} \,u, v\, \text {are at consecutive positions}\, i-1\, \text {and}\, i \,\text {in the Hamiltonian cycle,}\\ 0, &{} \text {otherwise.} \end{array}\right. }\nonumber \\ \end{aligned}$$
(6)

Furthermore, we introduce variables \(x_{0,v}^1\) (\(x_{v,0}^{n+1}\)) which equal 1 iff v is the first (last) visited city. Since each variable \(x_{u,v}^i\) indicates the occurrence of edge (u, v) in the tour, we call it an edge-based model.

3.1.1 Route checking

To check whether the given bit assignment is a Hamiltonian cycle, the authors of [17] propose the Hamiltonian

$$\begin{aligned}&\biggl ( 1 - \sum _{v=1}^n x_{0,v}^1 \biggr )^2 + \sum _{i=2}^n \biggl (1- \sum _{\begin{array}{c} u,v=1\\ u \ne v \end{array}}^n x_{u,v}^i \biggr )^2 + \biggl ( 1 - \sum _{v=1}^n x_{v,0}^{n+1} \biggr )^2 \nonumber \\&\quad + \sum _{u=1}^n \biggl (1- \biggl ( x_{u,0}^{n+1} + \sum _{i=2}^n \sum _{\begin{array}{c} v=1\\ v \ne u \end{array}}^n x_{u,v}^i \biggr ) \biggr ) ^2 + \sum _{v=1}^n \biggl (1- \biggl ( x_{0,v}^{1} + \sum _{i=2}^n \sum _{\begin{array}{c} u=1\\ u \ne v \end{array}}^n x_{u,v}^i \biggr ) \biggr ) ^2 .\nonumber \\ \end{aligned}$$
(7)

The first three terms impose that exactly one edge is traversed at each time step \( i =1,\dots ,n+1\). The fourth term ensures that the vehicle leaves each node exactly once, and similarly, the fifth term checks whether each node is entered exactly once. This approach omits the subtour elimination conditions, which are required to ensure that the solution consists of a single closed tour. Given the form of the Hamiltonian in Eq. (7), it is possible to find a solution in the form of two disjoint paths, as presented in Fig. 1.

Fig. 1 Undesired solution accepted in the formulation given in [17]
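The flaw can be reproduced numerically. The sketch below evaluates Eq. (7) for n = 3 on a valid tour and on a hand-built assignment whose edges do not chain up. The second assignment is a hypothetical example constructed here and is not necessarily the one depicted in Fig. 1; the function names and the set-of-triples representation are likewise assumptions.

```python
def eq7_energy(n, ones):
    """Evaluate Eq. (7); `ones` is the set of triples (i, u, v) with x_{u,v}^i = 1."""
    def x(i, u, v):
        return 1 if (i, u, v) in ones else 0

    cities = range(1, n + 1)
    # Exactly one edge at each time step i = 1, ..., n + 1.
    energy = (1 - sum(x(1, 0, v) for v in cities)) ** 2
    for i in range(2, n + 1):
        energy += (1 - sum(x(i, u, v)
                           for u in cities for v in cities if u != v)) ** 2
    energy += (1 - sum(x(n + 1, v, 0) for v in cities)) ** 2
    for u in cities:  # each city is left exactly once
        leaving = x(n + 1, u, 0) + sum(
            x(i, u, v) for i in range(2, n + 1) for v in cities if v != u)
        energy += (1 - leaving) ** 2
    for v in cities:  # each city is entered exactly once
        entering = x(1, 0, v) + sum(
            x(i, u, v) for i in range(2, n + 1) for u in cities if u != v)
        energy += (1 - entering) ** 2
    return energy


def edges_chain_up(n, ones):
    """True if the edge at step i starts where the edge at step i - 1 ended.

    Assumes at most one edge per time step, which holds when Eq. (7) is zero.
    """
    edge_at = {i: (u, v) for (i, u, v) in ones}
    position = 0
    for i in range(1, n + 2):
        if i not in edge_at or edge_at[i][0] != position:
            return False
        position = edge_at[i][1]
    return position == 0


valid = {(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 3, 0)}   # 0 -> 1 -> 2 -> 3 -> 0
broken = {(1, 0, 1), (2, 2, 3), (3, 3, 2), (4, 1, 0)}  # edges do not chain up
```

Both assignments give zero energy in Eq. (7), although only the first one describes a single closed tour.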

To remove disjoint tours from the set of feasible solutions, we need to include an additional term in the Hamiltonian, ensuring that the city that is left and entered at consecutive times is the same. This leads to a penalty term of the form

$$\begin{aligned}&\sum _{v=1}^n \biggl ( 1 - \sum _{w=1}^n x_{0,v}^1 x_{v,w}^{2} \biggr )^2 + \sum _{i=2}^{n-1} \sum _{v=1}^n \biggl ( 1 - \sum _{\begin{array}{c} u,w=1\\ u \ne w \end{array}}^n x_{u,v}^i x_{v,w}^{i+1} \biggr )^2 \nonumber \\&\quad + \sum _{v=1}^n \biggl ( 1 - \sum _{u=1}^n x_{u,v}^n x_{v,0}^{n+1} \biggr )^2. \end{aligned}$$
(8)

The above is not a QUBO anymore as we have terms of order 4. We claim that the squares can be removed, if exactly one city is visited at each step.

Let (i, v) be an arbitrary pair of time step i and city v. Based on the condition from the first line of Eq. (7), for the given i there exists exactly one \(u'\) such that \(x_{u',v}^i=1\). This transforms the formula inside the parentheses into

$$\begin{aligned} 1 - \sum _{\begin{array}{c} u,w=1\\ u \ne w \end{array}}^n x_{u,v}^i x_{v,w}^{i+1} = 1 - \sum _{\begin{array}{c} w=1\\ w \ne u' \end{array}}^n x_{u',v}^i x_{v,w}^{i+1} = 1 - \sum _{\begin{array}{c} w=1\\ w \ne u' \end{array}}^n x_{v,w}^{i+1}. \end{aligned}$$
(9)

For the given v, \(x_{v,w}^{i+1}\) is either 0 for all w or following the reasoning above, there is exactly one \(w'\) such that \(x_{v,w'}^{i+1}=1\), again based on Eq. (7). Hence, the expression inside the parenthesis is either equal to 0 or 1, and thus the square can be omitted. Finally, we have the following condition:

$$\begin{aligned} \sum _{v=1}^n \biggl ( 1 - \sum _{w=1}^n x_{0,v}^1 x_{v,w}^{2} \biggr ) + \sum _{i=2}^{n-1} \sum _{v=1}^n \biggl ( 1 - \sum _{\begin{array}{c} u,w=1\\ u \ne w \end{array}}^n x_{u,v}^i x_{v,w}^{i+1} \biggr ) + \sum _{v=1}^n \biggl ( 1 - \sum _{u=1}^n x_{u,v}^n x_{v,0}^{n+1} \biggr ).\nonumber \\ \end{aligned}$$
(10)
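To see how this term separates consistent from inconsistent assignments, the sketch below evaluates Eq. (10) for n = 3. The two assignments are hypothetical examples constructed here (a valid tour and a set of edges that do not chain up), and the set-of-triples representation is an assumption of this illustration. Note that in this sketch the term is not zero on a valid tour: each of the n transitions contributes n - 1, which gives a constant offset of n(n - 1).

```python
def eq10_penalty(n, ones):
    """Evaluate Eq. (10); `ones` is the set of triples (i, u, v) with x_{u,v}^i = 1."""
    def x(i, u, v):
        return 1 if (i, u, v) in ones else 0

    cities = range(1, n + 1)
    # Transition from step 1 to step 2.
    total = sum(1 - sum(x(1, 0, v) * x(2, v, w) for w in cities)
                for v in cities)
    # Transitions between steps i and i + 1 for i = 2, ..., n - 1.
    for i in range(2, n):
        for v in cities:
            total += 1 - sum(x(i, u, v) * x(i + 1, v, w)
                             for u in cities for w in cities if u != w)
    # Transition from step n to the return to the depot.
    total += sum(1 - sum(x(n, u, v) * x(n + 1, v, 0) for u in cities)
                 for v in cities)
    return total


valid = {(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 3, 0)}   # 0 -> 1 -> 2 -> 3 -> 0
broken = {(1, 0, 1), (2, 2, 3), (3, 3, 2), (4, 1, 0)}  # edges do not chain up

valid_penalty = eq10_penalty(3, valid)
broken_penalty = eq10_penalty(3, broken)
```

The assignment whose edges do not chain up receives a strictly larger value than the valid tour.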

Let us now show that the last term in Eq. (7) is no longer required. Recall that the first three terms of Eq. (7) check whether exactly one edge is traversed at each time step and the fourth term ensures that the vehicle leaves each node exactly once. Equation (10) ensures that if \(x_{v,w}^{i}=1\), then for some unique u we have \(x_{u,v}^{i-1}=1\). Together with these terms, this already enforces the condition that if the vehicle leaves node v, it must have entered v in the previous time step, which makes the fifth term in Eq. (7) unnecessary.

In summary, we can check whether the resulting tour is a Hamiltonian cycle using the following Hamiltonian:

$$\begin{aligned}&H_{{\mathcal {R}}}= \left( 1 - \sum _{v=1}^n x_{0,v}^1 \right) ^2 + \sum _{i=2}^n \left( 1- \sum _{\begin{array}{c} u,v=1\\ u \ne v \end{array}}^n x_{u,v}^i \right) ^2 + \left( 1 - \sum _{v=1}^n x_{v,0}^{n+1} \right) ^2 \nonumber \\&\quad \quad \quad + \sum _{u=1}^n \left( 1- \left( x_{u,0}^{n+1} + \sum _{i=2}^n \sum _{\begin{array}{c} v=1\\ v \ne u \end{array}}^n x_{u,v}^i \right) \right) ^2 + \sum _{v=1}^n \left( 1 - \sum _{w=1}^n x_{0,v}^1 x_{v,w}^{2} \right) \nonumber \\&\quad \quad \quad + \sum _{i=2}^{n-1} \sum _{v=1}^n \left( 1 - \sum _{\begin{array}{c} u,w=1\\ u \ne w \end{array}}^n x_{u,v}^i x_{v,w}^{i+1} \right) + \sum _{v=1}^n \left( 1 - \sum _{u=1}^n x_{u,v}^n x_{v,0}^{n+1} \right) . \end{aligned}$$
(11)

3.1.2 Time windows constraints

Recall that in the TSPTW problem, each city v has a time window \( [e_v,l_v] \) where both \( e_v \) and \( l_v \) are integers. In the state-of-the-art formulation of TSPTW [17], it is assumed that \( e_v=0 \) for all v. Next, we will improve the Hamiltonian so that arbitrary \(0\le e_v\le l_v\) are allowed. Furthermore, instead of using one-hot encoding to express slack variables, we will use the binary encoding discussed in Sect. 2.5, which exponentially reduces the number of qubits representing the slack variables.

We start by defining the recurrence relation for arrival times, ensuring all three time windows constraints mentioned at the beginning of this section. For each \(i=1,\dots , n+1\), let \(A_i\) denote the arrival time to ith visited city (\(A_{n+1}\) is the arrival time to the depot), and let \(\omega _i\) be integer variables denoting the waiting time at ith visited city. \(A_i\) can be expressed using the following recurrence relation:

$$\begin{aligned} A_1=\sum _{v=1}^n x_{0,v}^1 c_{0v}, \qquad A_{i} = A_{i-1} + \omega _{i-1} + \sum _{\begin{array}{c} u,v=1\\ u\ne v \end{array}}^n c_{uv} x_{u,v}^{i}, \end{aligned}$$
(12)

which can be expressed explicitly as

$$\begin{aligned} A_1=\sum _{v=1}^n x_{0,v}^1 c_{0v}, \qquad A_i = \sum _{t=1}^{i-1} \omega _t + \sum _{v=1}^n x_{0,v}^1 c_{0v} + \sum _{t=2}^{i} \sum _{\begin{array}{c} u,v=1\\ u\ne v \end{array}}^n c_{uv} x_{u,v}^t \end{aligned}$$
(13)

for \(i=2,\dots ,n+1\). Note that the very definition of \(A_i\) already implies initialization and recurrence constraints.

The service constraints take the form

$$\begin{aligned}&\sum _{v=1}^n x_{0,v}^1 e_{v} \le A_1 +\omega _1 \le \sum _{v=1}^n x_{0,v}^1 l_{v}, \end{aligned}$$
(14)
$$\begin{aligned}&\sum _{\begin{array}{c} u,v=1\\ u\ne v \end{array}}^n x_{u,v}^i e_v \le A_i +\omega _{i} \le \sum _{\begin{array}{c} u,v=1\\ u\ne v \end{array}}^n x_{u,v}^i l_v \end{aligned}$$
(15)

for each \(i=2,\dots ,n\). Next, we will prove that the inequalities in Eqs. (14) and (15) can be replaced by Eqs. (16) and (17):

$$\begin{aligned}&\sum _{v=1}^n x_{0,v}^1 e_v \le A_1 +\omega _{1}, \quad A_1 \le \sum _{v=1}^n x_{0,v}^1 l_v, \end{aligned}$$
(16)
$$\begin{aligned}&\sum _{\begin{array}{c} u,v=1\\ u\ne v \end{array}}^n x_{u,v}^i e_v \le A_i +\omega _{i}, \quad A_i \le \sum _{\begin{array}{c} u,v=1\\ u\ne v \end{array}}^n x_{u,v}^i l_v. \end{aligned}$$
(17)

If \( A_i \) and \( \omega _i \) satisfy the former, then they already satisfy the latter. For the converse, suppose that the latter is satisfied, yet \(A_i + \omega _i\) is greater than the corresponding due time. Then, instead of \( \omega _i \), a smaller waiting time \( \omega _i^{'} \) can be chosen so that \( A_i+\omega _i^{'} \) is equal to the corresponding due time, which is still greater than or equal to the corresponding earliest start time, and the superfluous waiting time \( \omega _i -\omega _i^{'} \) can be moved to \(\omega _{i+1}\). Note that the number of bit assignments encoding the feasible routes increases, as we allow bit assignments encoding feasible routes to exist in the search space with different assignments to \( \omega \) without penalty.

To state the final form of the Hamiltonian, the inequality constraints will be included in the objective function and the integer variables \( \omega _i \) will be transformed into binary variables according to the procedure described in Sect. 2.5.

To convert the inequalities into equalities, we will use slack variables \(\xi _{e,i},\xi _{l,i}\) for e- and l-dependent inequalities for time step i. Trivially, \(\xi _{e,i}\) and \(\xi _{l,i}\) are lower-bounded by 0 for the case when inequalities are tightly satisfied. Let’s first define a lower bound for the arrival times \( A_i \), which will be useful for the rest of the discussion:

$$\begin{aligned} \underline{A}_i = \min _{v=1,\dots ,n}c_{0v} + \sum _{t=1}^{i-1}c^{(t)}, \end{aligned}$$
(18)

where \(c^{(1)},c^{(2)},\ldots \) is a non-decreasing sequence of all costs \(c_{uv}\) for \( u\ne 0 \).

For the l-dependent inequalities, we define the upper bound \( {\bar{\xi }}_{l,i} \) for the slack variables \(\xi _{l,i}\) as

$$\begin{aligned} {\bar{\xi }}_{l,i} {:}{=}-(\underline{A}_i - \max _{v=1,\dots ,n} l_v)= \max _{v=1,\dots ,n} l_v - \min _{v=1,\dots ,n}c_{0v} - \sum _{t=1}^{i-1}c^{(t)} . \end{aligned}$$
(19)

For the e-dependent inequalities, the slack variables should compensate the inequality when the arrival time \( A_i \) is greater than the earliest start time \( e_v \) of the corresponding city. Otherwise, the vehicle has to wait until the earliest start time and \( A_i+\omega _i=e_v \) which results in an equality. Since the arrival time should be less than or equal to the due time of the corresponding city, we have the following upper bound for the slack variables independent of the time point i

$$\begin{aligned} {\bar{\xi }}_{e,i} \equiv {\bar{\xi }}_{e} {:}{=}\max _{v=1,\dots ,n}(l_v-e_v). \end{aligned}$$
(20)

Next, we replace the integer variables by binary variables. Since the upper bounds for the slack variables are already discussed, let us focus on the waiting times. For each i, \( 0 \le \omega _i \), with equality when the vehicle arrives later than the earliest start time and the service is given immediately. For the upper bound, note that if the vehicle arrives early, it is sufficient for the vehicle to wait until the earliest start time. Hence, for \( \omega _i \), we define the upper bound \({\bar{\omega }}_i \) as

$$\begin{aligned} {\bar{\omega }}_i {:}{=}\max _{v=1,\dots ,n} e_v - \underline{A}_i = \max _{v=1,\dots ,n} e_v - \min _{v=1,\dots ,n}c_{0v} - \sum _{t=1}^{i-1}c^{(t)}. \end{aligned}$$
(21)
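The bounds in Eqs. (18)-(21) depend only on the instance data and can be computed classically before building the Hamiltonian. The following is a minimal sketch on a made-up instance. It assumes that the sequence \(c^{(t)}\) ranges over the costs between distinct cities (depot excluded), which is one possible reading of the condition \(u \ne 0\); the function name and data layout are also assumptions.

```python
def window_bounds(cost, windows):
    """Compute the bounds of Eqs. (18)-(21).

    cost[u][v] is the cost matrix with depot 0, windows[v] = (e_v, l_v).
    Returns (bounds, xi_e_ub), where bounds[i] holds the step-dependent values.
    """
    n = len(cost) - 1
    cities = range(1, n + 1)
    # Assumption: c^(t) runs over the sorted costs between distinct cities.
    sorted_costs = sorted(cost[u][v] for u in cities for v in cities if u != v)
    first_leg = min(cost[0][v] for v in cities)
    max_due = max(due for _, due in windows.values())
    max_earliest = max(earliest for earliest, _ in windows.values())

    bounds = {}
    for i in range(1, n + 1):
        arrival_lb = first_leg + sum(sorted_costs[:i - 1])  # Eq. (18)
        bounds[i] = {
            "arrival_lb": arrival_lb,
            "xi_l_ub": max_due - arrival_lb,        # Eq. (19)
            "omega_ub": max_earliest - arrival_lb,  # Eq. (21)
        }
    xi_e_ub = max(due - earliest for earliest, due in windows.values())  # Eq. (20)
    return bounds, xi_e_ub


# Toy instance (hypothetical values) with depot 0 and three cities.
cost = [[0, 2, 4, 3],
        [2, 0, 3, 5],
        [4, 3, 0, 2],
        [3, 5, 2, 0]]
windows = {1: (0, 5), 2: (6, 10), 3: (0, 12)}
bounds, xi_e_ub = window_bounds(cost, windows)
```

The step-dependent bounds shrink as i grows, since the lower bound on the arrival time increases with every additional edge.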

Now, we are ready to present the final form of the penalty Hamiltonian:

$$\begin{aligned} \begin{aligned}&H_{{\mathcal {T}}{\mathcal {W}}} {:}{=}\biggl (\sum _{v=1}^n x_{0,v}^1 e_{v} - A_1 -E_0^{{\bar{\omega }}_1}(\omega _1) + E^{{\bar{\xi }}_e}_0(\xi _{e,1})\biggr )^2\\&\qquad \quad \quad \quad + \biggl (A_1 - \sum _{v=1}^n x_{0,v}^1 l_{v} + E^{{\bar{\xi }}_{l,1}}_0(\xi _{l,1})\biggr )^2\\&\qquad \quad \quad \quad + \sum _{i=2}^{n}\biggl (\sum _{\begin{array}{c} u,v=1\\ u\ne v \end{array}}^n x_{u,v}^i e_v - A_i -E_0^{{\bar{\omega }}_i}(\omega _i) + E^{{\bar{\xi }}_e}_0(\xi _{e,i}) \biggr )^2 \\&\qquad \quad \quad \quad + \sum _{i=2}^{n}\biggl (A_i-\sum _{\begin{array}{c} u,v=1\\ u\ne v \end{array}}^n x_{u,v}^i l_v + E^{{\bar{\xi }}_{l,i}}_0(\xi _{l,i})\biggr )^2 . \end{aligned} \end{aligned}$$
(22)

3.1.3 Objective Hamiltonian and representation cost

In this paper, we focus on TSPTW problem, where the objective is to minimize the total cost of the tour. Hence, the objective Hamiltonian takes the following form:

$$\begin{aligned} H_{{\mathcal {C}}}^\mathrm{TSPTW} {:}{=}\sum _{v=1}^n c_{0v} x_{0,v}^1 + \sum _{i=2}^{n} \sum _{\begin{array}{c} u,v=1\\ u\ne v \end{array}}^n c_{uv} x_{u,v}^i + \sum _{v=1}^n c_{v0} x_{v,0}^{n+1} . \end{aligned}$$
(23)

Hence, the QUBO Hamiltonian for the TSPTW problem can be expressed as

$$\begin{aligned} H_{\text {TSPTW}} {:}{=}P_1 H_{{\mathcal {R}}}+ P_2 H_{{\mathcal {T}}{\mathcal {W}}}+ P_3 H_{{\mathcal {C}}}^{\text {TSPTW}}, \end{aligned}$$
(24)

where \(H_{{\mathcal {R}}}\), \(H_{{\mathcal {T}}{\mathcal {W}}}\) and \(H_{{\mathcal {C}}}^{\text {TSPTW}}\) are defined as in Eqs. (11), (22) and (23), and \( P_1,P_2,P_3 \) are penalty constants that need to be adjusted.

Let us calculate the number of qubits required for the formulation. For each \( i=2,\dots ,n \) and each pair of distinct nodes u, v different from 0, there exists a variable \( x_{u,v}^i \), giving a total of \( n(n-1)^2 \) variables. For the cases \( i=1 \) and \( i=n+1 \), there are 2n additional variables of the form \( x_{0,v}^1 \) and \( x_{v,0}^{n+1} \). For the binary representation of the variables \(\omega _i\), \(\xi _{e,i}\) and \(\xi _{l,i}\), the number of required qubits is \(\lfloor \log _2({\bar{\omega }}_i)\rfloor +1\), \(\lfloor \log _2(\overline{\xi }_{e,i})\rfloor +1\) and \(\lfloor \log _2(\overline{\xi }_{l,i})\rfloor +1\), respectively. Let us define \(\delta {:}{=}\lfloor \log _2(\max _vl_v)\rfloor +1 \), noting that \( \max _vl_v \) is an upper bound for \( \omega _i \), \( \xi _{e,i} \), and \( \xi _{l,i} \). Hence, in total the number of required qubits is at most

$$\begin{aligned} n^3 -2n^2 +3n +3n\delta = \mathcal {O}(n^3 + n\delta ). \end{aligned}$$
(25)

In general, \( \max _vl_v \) grows at least linearly with n, since it is an upper bound for the arrival time at the nth city; consequently, \( \delta \) grows at least logarithmically with n.
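The count in Eq. (25) can be reproduced with a few lines of Python. This is only a sketch: the function names `bits_for` and `edge_based_qubits` are illustrative and do not come from the paper, and the bound \( \max _vl_v \) is used for every integer variable, exactly as in the estimate above.

```python
def bits_for(upper_bound: int) -> int:
    """Qubits needed to binary-encode an integer in [0, upper_bound].

    For upper_bound >= 1 this equals floor(log2(upper_bound)) + 1.
    """
    return max(upper_bound, 0).bit_length()


def edge_based_qubits(n: int, max_due_time: int) -> int:
    """Upper bound of Eq. (25) on the qubits used by the edge-based model."""
    delta = bits_for(max_due_time)
    # x^i_{u,v} for i = 2..n with u != v, plus x^1_{0,v} and x^{n+1}_{v,0}.
    routing = n * (n - 1) ** 2 + 2 * n
    # omega_i, xi_{e,i} and xi_{l,i} for i = 1..n, each bounded by delta bits.
    integers = 3 * n * delta
    return routing + integers
```

For instance, `edge_based_qubits(3, 40)` uses \( \delta = 6 \) and combines 18 routing variables with 54 qubits for the integer variables.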

3.2 Node-based formulation

In this section, we will present a node-based formulation for the TSPTW problem. While edge-based formulation focuses on the variables of the form \(x_{u,v}^i\), here we will define variables using the original idea presented for the TSP problem in [21]. For each \( i=1, \dots ,n \) and \(v=1,\dots ,n\), let’s define the binary variables \(x_{v}^i \) such that

$$\begin{aligned} x_{v}^i = {\left\{ \begin{array}{ll} 1, &{} \text {node { v} is visited at { i}th step in the tour}\\ 0, &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$
(26)

Note that there is no need to define variables for \(i=0\) and \(n+1\) as \( x_{v}^0= x_{v}^{n+1}=1 \). Since the binary variables now represent the node visited at each time step, the formulation is called node-based.

To check whether the tour is a Hamiltonian cycle, we can use the following Hamiltonian as defined in [21]:

$$\begin{aligned} {\tilde{H}}_{{\mathcal {R}}} {:}{=}\sum _{v=1}^n \biggl (1 - \sum _{i=1 }^{n} x_{v}^i \biggr )^2 + \sum _{i=1}^{n} \biggl (1 - \sum _{v=1}^n x_{v}^i \biggr )^2. \end{aligned}$$
(27)
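As a quick sanity check of Eq. (27), the following sketch evaluates \( {\tilde{H}}_{{\mathcal {R}}} \) for a given assignment. The layout of `x` (a list of rows, one row per time step) and the name `route_penalty` are assumptions made for illustration.

```python
def route_penalty(x):
    """Evaluate Eq. (27) for an n-by-n 0/1 matrix.

    x[i][v] = 1 means that city v+1 is visited at step i+1.
    The result is 0 exactly when x is a permutation matrix.
    """
    n = len(x)
    # Each city must be visited exactly once.
    per_city = sum((1 - sum(x[i][v] for i in range(n))) ** 2 for v in range(n))
    # Exactly one city must be visited at each step.
    per_step = sum((1 - sum(x[i][v] for v in range(n))) ** 2 for i in range(n))
    return per_city + per_step
```

A valid visiting order gives zero penalty, whereas visiting one city twice and skipping another is penalized in the first sum.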

We define the Hamiltonian for the time windows constraints \( \tilde{H}_{\mathcal {TW}} \) analogously to \( H_\mathcal {TW} \). Since \(x_{u,v}^i=1\) if and only if both \(x_u^{i-1}\) and \(x_v^{i}\) are equal to 1, it is enough to replace the variables of the form \(x_{u,v}^i \) by \( x_u^{i-1}x_v^{i} \), and we obtain the following Hamiltonian:

$$\begin{aligned} \begin{aligned}&{\tilde{H}}_{{\mathcal {T}}{\mathcal {W}}} {:}{=}\biggl (\sum _{v=1}^n x_{v}^1 e_{v} - {\tilde{A}}_1 -E_0^{{\bar{\omega }}_1}(\omega _1) + E^{{\bar{\xi }}_e}_0(\xi _{e,1})\biggr )^2 \\&\qquad + \biggl ({\tilde{A}}_1 - \sum _{v=1}^n x_{v}^1 l_{v} + E^{{\bar{\xi }}_{l,1}}_0(\xi _{l,1})\biggr )^2\\&\qquad + \sum _{i=2}^{n}\biggl (\sum _{\begin{array}{c} u,v=1\\ u\ne v \end{array}}^n x_{u}^{i-1} x_{v}^i e_v - {\tilde{A}}_i -E_0^{{\bar{\omega }}_i}(\omega _i) + E^{{\bar{\xi }}_e}_0(\xi _{e,i}) \biggr )^2 \\&\qquad + \sum _{i=2}^{n}\biggl ({\tilde{A}}_i-\sum _{\begin{array}{c} u,v=1\\ u\ne v \end{array}}^n x_{u}^{i-1} x_{v}^i l_v + E^{{\bar{\xi }}_{l,i}}_0(\xi _{l,i})\biggr )^2, \end{aligned} \end{aligned}$$
(28)

where

$$\begin{aligned} {\tilde{A}}_1=\sum _{v=1}^n x_{v}^1 c_{0v}, \qquad {\tilde{A}}_i = \sum _{t=1}^{i-1} \omega _t + \sum _{v=1}^n x_{v}^1 c_{0v} + \sum _{t=2}^{i} \sum _{\begin{array}{c} u,v=1\\ u\ne v \end{array}}^n c_{uv} x_{u}^{t-1} x_{v}^t. \end{aligned}$$
(29)

Note that bounds on the slack variables or \(\omega \) have not changed.

Finally, the cost Hamiltonian takes the form

$$\begin{aligned} {{\tilde{H}}}_{\mathcal {C}}^{\text {TSPTW}} {:}{=}\sum _{v=1}^n c_{0v} x_v^1 + \sum _{ \begin{array}{c} u,v=1 \\ u\ne v \end{array}}^n c_{uv} \sum _{i=2 }^{n} x_{u}^{i-1}x_{v}^{i} + \sum _{v=1}^n c_{v0} x_{v}^n, \end{aligned}$$
(30)

and the Hamiltonian for the TSPTW problem is expressed as

$$\begin{aligned} \tilde{H}_{\text {TSPTW}} {:}{=}P_1 \tilde{H}_\mathcal {R} + P_2 \tilde{H}_\mathcal {TW} + P_3 \tilde{H}_\mathcal {C} ^{\text {TSPTW}}. \end{aligned}$$
(31)
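The cost Hamiltonian of Eq. (30) can be evaluated directly from the node variables. The sketch below assumes a cost matrix `c` of size \( (n+1)\times (n+1) \) with the depot at index 0 and the same row-per-step layout for `x` as in Eq. (26); the name `tour_cost` is illustrative.

```python
def tour_cost(x, c):
    """Evaluate Eq. (30); x[i][v] = 1 means city v+1 is visited at step i+1."""
    n = len(x)
    # Depot to the first visited city.
    first = sum(c[0][v + 1] * x[0][v] for v in range(n))
    # Consecutive cities: the product x_u^{i-1} x_v^i selects the arc (u, v).
    middle = sum(
        c[u + 1][v + 1] * x[i - 1][u] * x[i][v]
        for i in range(1, n)
        for u in range(n)
        for v in range(n)
        if u != v
    )
    # Last visited city back to the depot.
    last = sum(c[v + 1][0] * x[n - 1][v] for v in range(n))
    return first + middle + last
```

The quadratic products in the middle term are what turn the time windows Hamiltonian of Eq. (28), which squares such sums, into a higher-order expression.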

Let us now estimate the number of qubits required. For each \( i=1,\dots ,n \) and each city \( v=1,\dots ,n \), there exists a variable of the form \( x_{v}^i \), giving a total of \( n^2 \) variables. The number of qubits required to express \(\omega _i\), \(\xi _{e,i}\) and \(\xi _{l,i}\) is the same as in the edge-based encoding. Thus, the number of required qubits is at most \(\mathcal {O}(n^2+n\delta )\).

3.3 ILP approach

Finally, we will discuss a new QUBO formulation for the TSPTW problem based on the ILP model presented in [18]. It is assumed that the costs between the cities satisfy the triangle inequality, i.e., \( c_{uv} \le c_{uw} + c_{wv} \) for all \( u,v,w = 1,\dots ,n \).

The variables \( x_{uv} \) for all \( u,v =0,1,\dots , n \) such that \(u\ne v\) are defined as

$$\begin{aligned} x_{u,v} = {\left\{ \begin{array}{ll} 1 &{} \text {edge } (u,v) \text { appears in the tour},\\ 0 &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$
(32)

Note that unlike the edge-based formulation discussed previously, the time step in which the edge is visited is not specified.

Let’s recall the notation for the arrival time and waiting time at city v. We denote the arrival time and waiting time for each \( v=1,\dots ,n+1 \) by \( \alpha _v \) and \( \nu _v \), respectively. Note that by \(\alpha _{n+1} \), we denote the arrival time to the depot at the end of the tour. Let \( \sigma _v \) denote the exact service time for city v (including the waiting time). We present the ILP formulation from [18] in its entirety.

$$\begin{aligned}&\text {minimize} \sum _{\begin{array}{c} u,v=0 \\ u \ne v \end{array}}^n c_{uv}x_{u,v} \end{aligned}$$
(33)
$$\begin{aligned}&\text {subject to} \sum _{u=0}^n x_{u,v}=1~~&v=0,1,\dots ,n \end{aligned}$$
(34)
$$\begin{aligned}&\sum _{v=0}^n x_{u,v}=1~~&u=0,1,\dots ,n \end{aligned}$$
(35)
$$\begin{aligned}&\sigma _v \ge e_v~~&v=1,\dots ,n \end{aligned}$$
(36)
$$\begin{aligned}&\sigma _v \le l_v~~&v=1,\dots ,n \end{aligned}$$
(37)
$$\begin{aligned}&\sigma _v + c_{v0}x_{v,0} \le \sum _{\begin{array}{c} u,w=0 \\ u \ne w \end{array}}^n c_{uw}x_{u,w} + \sum _{w=1}^n \nu _w~~&v=1,\dots ,n \end{aligned}$$
(38)
$$\begin{aligned}&\alpha _v - c_{0v}x_{0,v} \ge 0 ~~&v=1,\dots ,n \end{aligned}$$
(39)
$$\begin{aligned}&\alpha _v + (l_v - c_{0v})x_{0,v} \le l_v~~&v=1,\dots ,n \end{aligned}$$
(40)
$$\begin{aligned}&\sigma _v = \alpha _v + \nu _v~~&v=1,\dots ,n \end{aligned}$$
(41)
$$\begin{aligned}&\sigma _u - \alpha _v + (l_u - c_{0v} + c_{uv} ) x_{u,v} \le l_u - c_{0v} ~~&u\ne v;~ u,v=1,\dots ,n \end{aligned}$$
(42)
$$\begin{aligned}&\alpha _v - \sigma _u + (l_v - e_u- c_{uv} ) x_{u,v} \le l_v - e_u ~~&u\ne v;~ u,v=1,\dots ,n \end{aligned}$$
(43)
$$\begin{aligned}&x_{u,v}\in \{0,1\}&u\ne v;~ u,v=1,\dots ,n \end{aligned}$$
(44)
$$\begin{aligned}&\sigma _{v}\in {\mathbb {Z}}_{\ge 0}, \alpha _v \in {\mathbb {Z}}_{\ge 0},\nu _v \in {\mathbb {Z}}_{\ge 0}.&v=1,\dots ,n \end{aligned}$$
(45)

In ILP, reducing the space of feasible solutions takes precedence over reducing the number of variables, and some constraints are added specifically for that purpose. Before transforming the ILP problem into a QUBO formulation, we can remove such constraints, for example Eq. (38), as converting them to equalities would require additional slack variables. We will remove Eqs. (36) and (37) as they define the lower and upper bounds for the variables \( \sigma _v \), and those bounds will be utilized while converting the integer variables into binary. The variables \( \alpha _v \) will be replaced by \( \sigma _v - \nu _v \), and Eq. (41) will be removed. We prefer to remove \( \alpha _v \) instead of \( \sigma _v \) or \( \nu _v \) as the range for the variables \( \alpha _v \) is larger and they are more qubit consuming. We express the simplified ILP problem as follows:

$$\begin{aligned}&\text {minimize} \sum _{\begin{array}{c} u,v=0 \\ u \ne v \end{array}}^n c_{uv}x_{u,v} \end{aligned}$$
(46)
$$\begin{aligned}&\text {subject to} \sum _{u=0}^n x_{u,v}=1~~&v=0,1,\dots ,n \end{aligned}$$
(47)
$$\begin{aligned}&\sum _{v=0}^n x_{u,v}=1~~&u=0,1,\dots ,n \end{aligned}$$
(48)
$$\begin{aligned}&c_{0v}x_{0,v} \le \sigma _v - \nu _v ~~&v=1,\dots ,n \end{aligned}$$
(49)
$$\begin{aligned}&\sigma _v - \nu _v + (l_v - c_{0v})x_{0,v} \le l_v~~&v=1,\dots ,n \end{aligned}$$
(50)
$$\begin{aligned}&\sigma _u - \sigma _v + \nu _v + (l_u - c_{0v} + c_{uv} ) x_{u,v} \le l_u - c_{0v} ~~&u\ne v;~ u,v=1,\dots ,n \end{aligned}$$
(51)
$$\begin{aligned}&\sigma _v - \nu _v - \sigma _u + (l_v - e_u- c_{uv} ) x_{u,v} \le l_v - e_u ~~&u\ne v;~ u,v=1,\dots ,n \end{aligned}$$
(52)
$$\begin{aligned}&x_{u,v}\in \{0,1\}&u\ne v;~ u,v=1,\dots ,n \end{aligned}$$
(53)
$$\begin{aligned}&\sigma _{v}\in {\mathbb {Z}}_{\ge 0}, e_v \le \sigma _{v}\le l_v&v=1,\dots ,n \end{aligned}$$
(54)
$$\begin{aligned}&\nu _v \in {\mathbb {Z}}_{\ge 0}&v=1,\dots ,n \end{aligned}$$
(55)

Equations (49) and (50) initialize the arrival time at the first visited city and also ensure that \( 0 \le \alpha _v \le l_v \) for the remaining cities. Note that due to the triangle inequality, \( c_{0v} \le \alpha _v \le l_v \) also holds.
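To see how the simplified constraints act on a concrete tour, the sketch below computes \( \alpha _v \), \( \sigma _v \) and \( \nu _v \) for a given visiting order and checks Eqs. (49)–(52) one by one. It assumes that service starts as early as possible, i.e., \( \sigma _v = \max (\alpha _v, e_v) \); the function names and the dictionary-based data layout are illustrative and not part of the model in [18].

```python
def schedule(tour, c, e, l):
    """Return (alpha, sigma, nu) for the tour, or None if a due time is missed."""
    alpha, sigma, nu = {}, {}, {}
    prev, t = 0, 0
    for v in tour:
        alpha[v] = t + c[prev][v]          # arrival time at v
        sigma[v] = max(alpha[v], e[v])     # service starts as early as allowed
        if sigma[v] > l[v]:
            return None
        nu[v] = sigma[v] - alpha[v]        # waiting time, Eq. (41)
        prev, t = v, sigma[v]
    return alpha, sigma, nu


def satisfies_simplified_ilp(tour, c, e, l):
    """Check Eqs. (49)-(52) for the assignment induced by the tour."""
    sched = schedule(tour, c, e, l)
    if sched is None:
        return False
    _, sigma, nu = sched
    nodes = [0] + tour
    x = {(u, v): 0 for u in nodes for v in nodes if u != v}
    for u, v in zip([0] + tour, tour + [0]):
        x[u, v] = 1
    for v in tour:
        a_v = sigma[v] - nu[v]
        if c[0][v] * x[0, v] > a_v:                                   # Eq. (49)
            return False
        if a_v + (l[v] - c[0][v]) * x[0, v] > l[v]:                   # Eq. (50)
            return False
    for u in tour:
        for v in tour:
            if u == v:
                continue
            a_v = sigma[v] - nu[v]
            lhs51 = sigma[u] - a_v + (l[u] - c[0][v] + c[u][v]) * x[u, v]
            lhs52 = a_v - sigma[u] + (l[v] - e[u] - c[u][v]) * x[u, v]
            if lhs51 > l[u] - c[0][v] or lhs52 > l[v] - e[u]:         # Eqs. (51), (52)
                return False
    return True
```

On a small metric instance with cities placed on a line, a tour that respects all due times passes every check, while a tour that reaches a city after its due time is rejected already by `schedule`.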

Next, we will transform the inequalities into equalities using slack variables and convert those variables to binary. The upper bounds for the slack variables for Eqs. (49)–(50) can be defined as

$$\begin{aligned} \overline{\xi }_{49, v} {:}{=}l_v \text { and } \overline{\xi }_{50, v} {:}{=}l_v - c_{0v} \end{aligned}$$
(56)

noting that

$$\begin{aligned} \xi _{49, v}&\le - (\min \{c_{0v},0\}- \sigma _v + \nu _v )= \alpha _v \le l_v, \end{aligned}$$
(57)
$$\begin{aligned} \xi _{50, v}&\le -(\sigma _v -\nu _v + \min \{l_v - c_{0v},0\} -l_v) = l_v-\alpha _{v} \le l_v - c_{0v}. \end{aligned}$$
(58)

Equations (51) and (52) are included in the model to ensure that \( \sigma _u + c_{u,v} = \alpha _v \) when \( x_{u,v}=1 \). If \( e_u + c_{u,v} > l_v \) for some pair of cities (u, v), then the corresponding instances of Eqs. (51) and (52) need to be discarded from the model. Hence, we will assume that \( e_u + c_{u,v} \le l_v \) when calculating the bounds for the slack variables. In Eqs. (51) and (52), the minimum is achieved when \( x_{u,v} \) is set to 0 since \( l_u - c_{0v} + c_{uv} \ge 0 \) by the triangle inequality and \( l_v-e_u-c_{uv} \ge 0 \) by our assumption. We can bound the variables \( \xi _{51, u,v} \) and \( \xi _{52, u,v} \) as

$$\begin{aligned}&\xi _{51, u,v} \le -(\sigma _u - \sigma _v + \nu _v + \min \{l_u - c_{0v} + c_{uv} ,0\} - l_u + c_{0v}) \end{aligned}$$
(59)
$$\begin{aligned}&= -\sigma _u + \alpha _v + l_u - c_{0v} \end{aligned}$$
(60)
$$\begin{aligned}&\le -e_u + l_v + l_u - c_{0v} \end{aligned}$$
(61)
$$\begin{aligned}&\xi _{52, u,v} \le -(\sigma _v - \nu _v - \sigma _u + \min \{l_v - e_u- c_{uv},0\} - l_v + e_u) \end{aligned}$$
(62)
$$\begin{aligned}&= -\alpha _v + \sigma _u + l_v - e_u \end{aligned}$$
(63)
$$\begin{aligned}&\le -c_{0v} + l_u + l_v - e_u. \end{aligned}$$
(64)

Hence, the upper bound for the slack variables is defined as

$$\begin{aligned} \overline{\xi }_{51, u,v} {:}{=}-e_u + l_v + l_u - c_{0v} \text { and } \overline{\xi }_{52, u,v} {:}{=}-c_{0v} + l_u + l_v - e_u. \end{aligned}$$
(65)

The integer variables \( \nu _v \) are bounded as \( 0 \le \nu _v \le {\bar{\nu }}_v \), where

$$\begin{aligned} {\bar{\nu }}_v {:}{=}e_v - c_{0,v}. \end{aligned}$$
(66)

For completeness, let us write the full Hamiltonian based on the ILP above. We will apply penalty \(P_1\) for the constraints defined in Eqs. (47) and (48) and \( P_2 \) for the remaining constraints. For simplicity, we are including all the constraints even if \(e_u+c_{u,v}>l_v\). The final Hamiltonian takes the form

$$\begin{aligned} \hat{H}_{\text {TSPTW}} {:}{=}P_1 \hat{H}_\mathcal {R} + P_2 \hat{H}_\mathcal {TW} + P_3 \hat{H}_\mathcal {C} ^{\text {TSPTW}}, \end{aligned}$$
(67)

where

$$\begin{aligned}&\hat{H}_\mathcal {R} = \left( \sum _{u=0}^n \left( 1-\sum _{v=0}^n x_{u,v}\right) ^2 + \sum _{v=0}^n\left( 1-\sum _{u=0}^n x_{u,v}\right) ^2\right) , \end{aligned}$$
(68)
$$\begin{aligned}&\hat{H}_\mathcal {TW} = \left( \sum _{v=1}^n\left( E_{e_v}^{l_v}(\sigma _v) - E_0^{{\bar{\nu }}_v}(\nu _v) - c_{0v}x_{0,v} - E_0^{\overline{\xi }_{49, v}}(\xi _{49, v})\right) ^2\right. \end{aligned}$$
(69)
$$\begin{aligned}&\qquad +\sum _{v=1}^n\left( E_{e_v}^{l_v}(\sigma _v) - E_0^{{\bar{\nu }}_v}(\nu _v) + (l_v - c_{0v})x_{0,v} - l_v + E_0^{\overline{\xi }_{50, v}}(\xi _{50, v})\right) ^2 \end{aligned}$$
(70)
$$\begin{aligned}&\qquad +\sum _{\begin{array}{c} u,v=1\\ u\ne v \end{array}}^n\left( E_{e_u}^{l_u}(\sigma _u) - E_{e_v}^{l_v}(\sigma _v) + E_{0}^{{\bar{\nu }}_v}(\nu _v) + (l_u - c_{0v} + c_{uv} ) x_{u,v} - l_u + c_{0v} + E_0^{\overline{\xi }_{51, u,v}}(\xi _{51, u,v}) \right) ^2 \end{aligned}$$
(71)
$$\begin{aligned}&\qquad +\left. \sum _{\begin{array}{c} u,v=1\\ u\ne v \end{array}}^n\left( E_{e_v}^{l_v}(\sigma _v) - E_0^{{\bar{\nu }}_v}(\nu _v) - E_{e_u}^{l_u}(\sigma _u) + (l_v - e_u- c_{uv} ) x_{u,v} - l_v + e_u + E_0^{\overline{\xi }_{52, u,v}}(\xi _{52, u,v}) \right) ^2 \right) , \end{aligned}$$
(72)
$$\begin{aligned}&\hat{H}_\mathcal {C} ^{\text {TSPTW}} = \sum _{\begin{array}{c} u,v=0 \\ u \ne v \end{array}}^n c_{uv}x_{u,v}. \end{aligned}$$
(73)

Let us note that \( \hat{H}_\mathcal {R} \) does not account for the route constraints on its own as the subtour elimination constraints are included within the time windows constraints. Nevertheless, we use the notation \( \hat{H}_\mathcal {R} \) for consistency with the other formulations.

Let us calculate the number of qubits required for the given representation. \((n+1)^2\) qubits represent the variables \(x_{u,v}\). For each \(\nu _v\) and \(\sigma _v\) for \(v=1,\dots , n\), at most \(\delta \) qubits are required, a total of \(2n\delta \) qubits. For each slack variable, at most \(\mathcal {O}(\delta )\) qubits are required, and since we have \(\mathcal {O}(n^2)\) inequalities, we will need \(\mathcal {O}(n^2\delta )\) variables. Thus, in total we will need \(\mathcal {O}(n^2 + n^2 \delta )\) qubits.
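For a concrete instance, the count can be made explicit using the bounds of Eqs. (56), (65) and (66) instead of the uniform bound \( \delta \). The following sketch makes two assumptions that are not stated in the text: \( \sigma _v \) is encoded through its offset from \( e_v \), so its range is \( l_v - e_v \), and only the \( n(n+1) \) variables \( x_{u,v} \) with \( u\ne v \) are counted, which is slightly tighter than \( (n+1)^2 \). The function names are illustrative.

```python
def bits_for(upper_bound: int) -> int:
    """Qubits needed to binary-encode an integer in [0, upper_bound]."""
    return max(upper_bound, 0).bit_length()


def ilp_qubits(c, e, l):
    """Count the qubits of the ILP-based QUBO for one instance.

    c is the (n+1)x(n+1) cost matrix with the depot at index 0;
    e and l map each city 1..n to its earliest start time and due time.
    """
    n = len(c) - 1
    cities = range(1, n + 1)
    total = n * (n + 1)                        # x_{u,v} with u != v
    for v in cities:
        total += bits_for(l[v] - e[v])         # sigma_v in [e_v, l_v]
        total += bits_for(e[v] - c[0][v])      # nu_v, Eq. (66)
        total += bits_for(l[v])                # slack of Eq. (49), Eq. (56)
        total += bits_for(l[v] - c[0][v])      # slack of Eq. (50), Eq. (56)
    for u in cities:
        for v in cities:
            if u != v:
                # Eq. (65): the same bound applies to both slack variables.
                bound = l[u] + l[v] - e[u] - c[0][v]
                total += 2 * bits_for(bound)
    return total
```

The double loop over pairs of cities is the source of the \( \mathcal {O}(n^2\delta ) \) term, which dominates the count even for very small instances.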

3.4 Additional comments

All formulations work independently of whether the cost matrix is symmetric or not. It is assumed that \( e_v \ge c_{0v} \) for all \( v \in \{1,\dots ,n\} \), since the earliest start time for a city cannot be smaller than \( c_{0v} \). In the ILP formulation, the variable \( \nu _v \) representing the waiting time for city v can be removed if \( e_v = c_{0v} \).

In the edge-based and ILP formulations, we can remove some variables when moving from one city to another is forbidden. This is the case when the graph is not complete or when \( e_u + c_{uv} > l_v\) for some \( (u,v) \in \vec {E}\). We can simply set \(x_{u,v}^i=0\) or \( x_{u,v}=0 \) depending on the formulation and ignore those variables. In such a case, the number of required qubits is reduced to \(\mathcal {O}(n|\vec {E}| + n\delta )\) and \(\mathcal {O}(|\vec {E}| + |\vec {E}|\delta )\), respectively, for the edge-based and ILP formulations. Note that in the node-based formulation, we cannot remove any qubit even if traversing some arc is not possible. Furthermore, the corresponding inequalities given in Eqs. (51) and (52) may be removed in the ILP formulation.

One can introduce alternative objective functions to encode different problems. For instance, for the MPTW problem, we can use the following objective function for the edge-based formulation:

$$\begin{aligned} H_{{\mathcal {C}}}^{\text {MPTW}} {:}{=}\sum _{i=1}^{n} \omega _i + H_{{\mathcal {C}}}^\mathrm{TSPTW}. \end{aligned}$$
(74)

Similar Hamiltonians can be defined for the node-based and ILP formulations as well.

When the three models are compared, the number of required qubits is \(\mathcal {O}(n^3+n\delta )\), \(\mathcal {O}(n^2+n\delta )\) and \(\mathcal {O}(n^2+n^2\delta )\), respectively, for the edge-based, node-based and ILP formulations. When the HOBO formulation obtained through the node-based approach is quadratized by replacing each product of variables \(x_u^{t-1}x_v^{t} \) with a new variable \(x_{uv}^t \), the resulting QUBO has asymptotically the same number of variables as that of the edge-based approach. Furthermore, the two formulations are similar in nature, but the quadratized formulation involves additional constraints coming from the quadratization procedure itself. Further details on quadratization are given in the “Appendix.”

To have an overview of the number of required variables for real problem instances, we calculated the number of required variables for the instances introduced in [38] and the results are plotted in Fig. 2. Further details can be found in the “Appendix.”

Fig. 2 The number of variables required by different formulations for each instance from [38]

4 Results from the D-Wave machine

In this section, we present the results of the experiments conducted on the D-Wave Advantage QPU. We ran several experiments using instances of small sizes to demonstrate the state-of-the-art capability of D-Wave systems. We used the edge-based and ILP approaches to formulate the problems, as the HOBO formulation obtained using the node-based approach can be used only after performing quadratization, as explained in Sect. 3.4.

Provided that the penalty constants are well adjusted, the ground state encodes the optimal route with no penalties coming from the time windows constraints and has the lowest energy. Note that some bit assignments may encode an optimal (feasible) route yet violate some time windows constraints. Such assignments have larger energy than the bit assignment (if one exists) in which all time constraints are satisfied. Throughout the section, we will use the term sample encoding optimal (feasible) route for the bit assignments encoding optimal (feasible) routes obtained by quantum or simulated annealing, regardless of whether they violate time windows constraints or not. Such samples can be classified as optimal (feasible) since we are interested in the optimal (feasible) route, and the other variables can be calculated using the classical procedure mentioned in Sect. 3.

4.1 Instances

We created 10 random metric TSPTW instances for each \(n \in \{3,4,5\}\) where n is the number of cities. For the cost matrix, we picked random integers between 1 and 10, and for the earliest start times, we picked random integers between 1 and 20. For the latest start times, we picked random integers between the earliest start time of the city and 40. For the first 5 instances, the TSP and TSPTW solutions are the same, and for the remaining 5, they are different. An optimal route exists for each instance.

4.2 Embedding

Before running a problem on D-Wave QPU, the QUBO formulation is converted into Ising formulation that consists of the linear and the quadratic terms. The quadratic terms represent the coupler strength between the qubits, and one can represent the variables by nodes and the quadratic terms by the edges. As D-Wave QPUs do not admit a fully connected topology, the variables cannot be mapped directly to the physical qubits on the machine. Hence, each variable is represented by a set of qubits called the chain, and the qubits in a chain are coupled strongly enough based on a parameter called chain strength so that they end up in the same state. The existence of longer chains increases the error in the results.
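The conversion from QUBO to Ising form mentioned above follows from the substitution \( x_i = (1+s_i)/2 \) with \( s_i \in \{-1,+1\} \). The sketch below spells it out in plain Python. The dictionary layout and function names are assumptions made for illustration; in practice this step is typically handled by the vendor's software stack.

```python
def qubo_to_ising(Q):
    """Convert a QUBO {(i, j): w} into Ising fields h, couplers J and an offset.

    Diagonal keys (i, i) hold linear terms; off-diagonal keys hold quadratic terms.
    """
    h, J, offset = {}, {}, 0.0
    for (i, j), w in Q.items():
        if i == j:
            # w * x_i = w/2 + (w/2) * s_i
            h[i] = h.get(i, 0.0) + w / 2
            offset += w / 2
        else:
            # w * x_i * x_j = w/4 * (1 + s_i + s_j + s_i * s_j)
            J[i, j] = J.get((i, j), 0.0) + w / 4
            h[i] = h.get(i, 0.0) + w / 4
            h[j] = h.get(j, 0.0) + w / 4
            offset += w / 4
    return h, J, offset


def qubo_energy(Q, x):
    """Energy of a binary assignment x (indexable by variable label)."""
    return sum(w * x[i] * x[j] for (i, j), w in Q.items())


def ising_energy(h, J, offset, s):
    """Energy of a spin assignment s (indexable by variable label)."""
    linear = sum(b * s[i] for i, b in h.items())
    quadratic = sum(w * s[i] * s[j] for (i, j), w in J.items())
    return offset + linear + quadratic
```

The two energy functions agree on every assignment once spins and bits are related by \( s_i = 2x_i - 1 \), so the ground states of both forms encode the same solution.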

This process of mapping the variables to the physical qubits is known as the minor-embedding problem. We conducted our experiments on D-Wave Advantage, which consists of 5640 qubits arranged in the Pegasus graph topology [39]. For minor-embedding, we used the minorminer algorithm provided by D-Wave. For each instance, the number of logical and physical variables is shown in Fig. 3, and the range for the maximum chain lengths for instances with different numbers of cities is given in Table 1.

Fig. 3 The number of logical and physical variables is plotted for each instance

Table 1 Range for the maximum chain lengths for instances with different numbers of cities

4.3 Penalty constants

In the QUBO formulation, there exist an objective Hamiltonian and Hamiltonians that account for the constraints. Associated with each Hamiltonian, there is a penalty constant. When a constraint is violated, the corresponding Hamiltonian brings in a penalty to the energy, whose magnitude depends on the penalty constant.

In our formulations, we have three constants, \( P_1 \), \( P_2 \) and \( P_3 \), corresponding to the Hamiltonians that check whether the tour is a Hamiltonian cycle, encode the time windows constraints, and account for the cost of the tour, respectively. In the ILP formulation, subtour elimination constraints are included within the time windows constraints. In the original TSP problem, the only constants to be adjusted are \( P_1 \) and \( P_3 \), and it is enough to set \( 0< \mathcal {C}\cdot P_3 <P_1\) [21] where \( \mathcal {C} = \max _{v,w}c_{vw} \). The idea is to set \( P_1 \) large enough to ensure that the constraints are not violated in favor of optimizing the cost. In the case of TSPTW, this requires a more detailed investigation as the penalty constant for the time windows needs to be adjusted as well, and multiple factors affect the penalty constant including the earliest and latest start times. If the penalty constants are not well adjusted, a bit assignment encoding an infeasible route may have lower energy than a bit assignment encoding the optimal route.

To adjust the penalty constants, we set \( P_3=1 \) and parameterize \( P_1 \) and \( P_2 \) in terms of \( \mathcal {C} \). After the model is formulated, it is converted into Ising formulation, and both linear and quadratic terms are scaled to match the allowed range of the specific QPU. In order to determine the penalty constants, we used simulated annealing and tested a grid of penalty constants, and selected those which maximize the probability that the optimal route is observed in the sampleset for both formulations. For the simulated annealing experiments, we set the beta range (inverse temperature) as (5, 100), the number of steps to 10000, and the number of samples as 100. Let us remark that using simulated annealing with prior knowledge of the optimal route is not applicable in practice, and there is not a one-to-one correspondence between the samples obtained by SA and QA experiments. Nevertheless, as the main aim of our experiments is demonstrating the capabilities and the limits of the D-Wave machines, this approach provides foresight into the selection of the penalty constants.
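The effect of the penalty constants can be illustrated on a toy problem. The sketch below is not the procedure used in the experiments: it replaces simulated annealing by exhaustive enumeration, drops the time windows term, and uses the node-based TSP Hamiltonian built from Eqs. (27) and (30) with \( P_3 = 1 \) and \( P_1 = \mathcal {C}\cdot p_1 \). All function names and the example cost matrix are assumptions.

```python
from itertools import permutations, product


def route_penalty(x):
    """Eq. (27): zero exactly when x is a permutation matrix."""
    n = len(x)
    cities = sum((1 - sum(x[i][v] for i in range(n))) ** 2 for v in range(n))
    steps = sum((1 - sum(row)) ** 2 for row in x)
    return cities + steps


def tour_cost(x, c):
    """Eq. (30) with the depot at index 0 of the cost matrix c."""
    n = len(x)
    cost = sum(c[0][v + 1] * x[0][v] + c[v + 1][0] * x[n - 1][v] for v in range(n))
    cost += sum(
        c[u + 1][v + 1] * x[i - 1][u] * x[i][v]
        for i in range(1, n) for u in range(n) for v in range(n) if u != v
    )
    return cost


def ground_state_is_optimal(c, P1, P3=1.0):
    """Enumerate all bit assignments and test whether the minimum is an optimal tour."""
    n = len(c) - 1

    def as_matrix(bits):
        return [bits[i * n:(i + 1) * n] for i in range(n)]

    def energy(bits):
        x = as_matrix(bits)
        return P1 * route_penalty(x) + P3 * tour_cost(x, c)

    best = as_matrix(min(product((0, 1), repeat=n * n), key=energy))
    optimum = min(
        c[0][p[0]] + sum(c[a][b] for a, b in zip(p, p[1:])) + c[p[-1]][0]
        for p in permutations(range(1, n + 1))
    )
    return route_penalty(best) == 0 and tour_cost(best, c) == optimum


# Hypothetical 3-city instance; scan a small grid of values for p_1.
c = [[0, 2, 5, 9], [2, 0, 3, 7], [5, 3, 0, 4], [9, 7, 4, 0]]
C = max(max(row) for row in c)
grid = {p1: ground_state_is_optimal(c, P1=C * p1) for p1 in (0.1, 0.5, 1, 2)}
```

With a very small \( P_1 \), leaving cities unvisited is cheaper than paying for the tour, so the ground state no longer encodes a route. A sufficiently large \( P_1 \) restores the correspondence between the ground state and the optimal tour.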

In Fig. 4, we plot the results of the SA experiments for different choices of \( p_1 \) and \( p_2 \) where \( P_1 = \mathcal {C} \cdot p_1 \) and \( P_2 = \mathcal {C}/{p_2}\). In the first two plots, the probability that a sample encoding the optimal route is observed within the sampleset is plotted for the edge-based and ILP formulations, respectively, for an instance with 4 cities. For some choice of penalty values, samples encoding non-optimal routes end up with lower energies than the samples encoding optimal route, yet the optimal route is also sampled with positive probability. In the last two plots, the probability for such penalty constant pairs is set to 0.

Fig. 4 The probability that a sample encoding the optimal route is observed for a edge-based and b ILP formulation for an instance with 4 cities. Simulated annealing is used for the probability estimation. In plots c and d, the results are repeated except that we set probability to 0 if the least energy sample does not encode an optimal solution

4.4 Optimization results

For each instance, we set \( P_1 \) and \( P_2\) based on the results of the SA experiments and ran the experiments on D-Wave Advantage. For the experiments, we set the annealing time parameter to \( 50\,\mu \text {s} \) and the number of samples to 1000. We note that using shorter and longer annealing times did not provide any significant improvement. The chain strength is set between 1.4 and 2 using the following strategy: Initially, the chain strength is set to 2, and it is decreased gradually if a satisfactory outcome is not obtained, while keeping in mind that chain breaks should be avoided.

We begin with the edge-based formulation. Initially, we investigate whether the sample with the lowest energy encodes the optimal route or not for each instance. That was the case for all instances with 3 cities, whereas for instances with 4 cities the lowest energy sample did not encode the optimal route for three of the instances. In Fig. 5, the histogram of the energies is plotted for an instance with 3 cities on the left, and for an instance with 4 cities on the right. The ground state energy is calculated and indicated by the red line on both plots. Note that even though the sample with the lowest energy encodes the optimal route on the right plot, its energy is higher than the ground state energy. This indicates that the bit assignment violates some time window constraints; nevertheless, the correct route is found.

Fig. 5 Histogram of the energies obtained from D-Wave using edge-based formulation

In Fig. 6, the histogram of the energies for two instances with 4 and 5 cities is plotted, in which the samples with the lowest energies do not encode the optimal route. For the instance with 4 cities, the sample with the lowest energy encodes a feasible route, and it is observed that the distribution of the energies is shifted to the right. This is even more apparent on the right plot for the instance with 5 cities as the gap between the sample with the lowest energy and the ground state energy is larger. Indeed, this is true for all instances with 5 cities, and the existence of such a gap prevents sampling states which correspond to the optimal route.

Fig. 6 Histogram of the energies obtained from D-Wave using edge-based formulation in which the lowest energy samples do not encode an optimal route. In plot b, there are no feasible or optimal samples

For the ILP formulation, the samples with the lowest energy encode optimal routes for all instances with 3 cities. For the instances with 4 cities, this was the case for only half of the instances. In Fig. 7, the first plot is the histogram for the energies obtained for an instance with 3 cities. Samples encoding optimal and feasible routes are separated clearly, and the distribution of the energies is not similar to that of the edge-based formulation. In the second plot, the histogram of the energies is plotted for an instance with 4 cities, and even though the sample with the lowest energy encodes the optimal route, its energy is larger than the ground state energy. In the third plot, it is seen that the distribution is even more shifted to the right for an instance with 5 cities.

Fig. 7 Histogram of the energies obtained from D-Wave using ILP formulation. In plots a and b, there are no feasible samples, and in plot c, there are no optimal samples

Besides checking whether the lowest energy state encodes the optimal route, we calculated the ratio of the samples encoding feasible and optimal routes among the sampleset, which is displayed in Fig. 8. For the instances with 5 cities, no samples encoding feasible or optimal routes were observed for the edge-based formulation.
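Computing such ratios requires decoding each sample into a route first. A minimal sketch for the ILP variables \( x_{u,v} \) is given below; it assumes that each sample is a dictionary mapping pairs \( (u,v) \) to bits and that the sets of feasible and optimal routes have been computed classically beforehand. The names `decode_route` and `route_ratios` are illustrative.

```python
def decode_route(x, n):
    """Return the visiting order encoded by x, or None if x is not a single tour."""
    succ = {}
    for (u, v), bit in x.items():
        if bit:
            if u in succ:
                return None          # node u has two outgoing edges
            succ[u] = v
    nodes = set(range(n + 1))
    if set(succ) != nodes or set(succ.values()) != nodes:
        return None                  # a node lacks an incoming or outgoing edge
    route, node = [], succ[0]
    while node != 0:                 # follow successors until back at the depot
        route.append(node)
        node = succ[node]
    # A shorter walk means the selected edges split into subtours.
    return tuple(route) if len(route) == n else None


def route_ratios(samples, n, feasible_routes, optimal_routes):
    """Fractions of samples that encode a feasible route and an optimal route."""
    routes = [decode_route(x, n) for x in samples]
    feasible = sum(r in feasible_routes for r in routes)
    optimal = sum(r in optimal_routes for r in routes)
    return feasible / len(samples), optimal / len(samples)
```

Only the routing bits are inspected here, in line with the convention adopted above that a sample is classified by its route regardless of the values of the waiting time and slack variables.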

Fig. 8 Ratio of samples that encode feasible and optimal routes is given in the following plots. a Edge-based formulation for instances with 3 cities. b Edge-based formulation for instances with 4 cities. The sample with the lowest energy does not encode optimal route for instances numbered 1, 5, and 6. c ILP formulation for instances with 3 cities. d ILP formulation for instances with 4 cities. The sample with the lowest energy does not encode the optimal route for instances numbered 2, 3, 4, 5, and 7. e ILP formulation for instances with 5 cities. The lowest energy sample does not encode the optimal route for any of the instances

5 Discussion

Let us now analyze some key features of the formulations presented in this work and discuss the experimental results obtained from the D-Wave machine.

When we compare the number of qubits required by each formulation, the node-based formulation is the most advantageous when \(|\vec {E}| \gg n\). A similar observation can also be made by looking at Table 2 in the “Appendix,” which compares the number of variables required for the instances from the AFG dataset. On the other hand, one of the main challenges in mapping real-world applications to current D-Wave devices is the restriction to 2-local interactions. One often ends up with a higher-order problem, as in the case of our node-based formulation, and some quadratization is required to recast the problem into QUBO format. If one applies quadratization to the node-based model using the procedure described in [40], the resulting QUBO model has asymptotically the same number of variables as the edge-based QUBO model. More information regarding quadratization is given in the “Appendix.” However, the inclusion of additional constraints due to quadratization introduces new constants to be tuned. Adjustment of these parameters is one of the challenges in QA in general.

In [22], the authors directly simulate HOBO models using simulated annealing and simulated quantum annealing (SQA) to assess whether there is any advantage over quadratization. It turns out that using the HOBO formulation does not improve the performance of the SQA which may be due to the limited connectivity of current devices. Nevertheless, future quantum annealers might allow k-local interactions as well for \( k \ge 3 \) or better connectivity, which makes HOBO formulations still valuable. Furthermore, HOBO formulations can be natively solved using some other approaches like QAOA or VQE as suggested in [22, 32, 33, 41, 42].

An advantage of the presented edge-based and node-based formulations for TSPTW is that they allow the vehicle to wait in a city even after the earliest start time. Hence, the same route exists in the search space with different assignments to qubits representing waiting times and slack variables. Thanks to this, different bit assignments encode the optimal route without any penalty.

Recall that the main difference between the formulations is how we encode the binary variables, either using the edge or the node formulation. This dichotomy often appears when encoding problems related to graphs, and it is also mentioned in [43]. Let us mention that there are alternative encoding ideas not covered in this paper. For instance, in [33, 41], the authors used binary encoding to represent the permutations for the TSP problem. Even though such an encoding has unbounded-order Pauli terms, an optimal number of \(\sim n\log n\) qubits is achieved.

One of the biggest challenges for all formulations is the choice of penalty values. In the original TSP problem, the only constants to be adjusted are \( P_1 \) and \( P_3 \) and it is enough to set \( P_1 = 2 P_3 \max _{v,w}c_{v,w}\) [21]. In the case of TSPTW, this requires a more detailed investigation. In this study, we used simulated annealing to search for the penalty values, and the search was restricted to a predefined range. The penalty values heavily depend on the specific instance, making it hard to find a general rule which would work for all instances. Besides the penalty values, there are multiple parameters such as the annealing time or the chain strength that should be tuned while performing the experiments.

For the instances we have used, the edge-based formulation requires fewer qubits than the ILP formulation. However, for large instances, the ILP formulation might be more advantageous. Furthermore, when the experimental results for the edge-based and ILP formulations are compared, the ILP formulation appears more promising: the probability of observing a sample encoding the optimal route is higher for the instances with 4 cities, and some samples encoding feasible routes are observed for the instances with 5 cities. The difference between the performance of the two formulations might be due to the different energy landscapes they create. Another reason might be the choice of penalty constants. Overall, long chain lengths are one of the reasons behind the unsatisfactory results.

6 Conclusions and future work

In this paper, we proposed three unconstrained binary models for the general travelling salesman problem with time windows. Two of the introduced models are QUBO models, which (up to graph embedding) can be natively used for quantum annealing. The third model includes higher-order terms, which makes it more suitable for variational quantum computing or digital annealers. We analytically investigated the memory requirements of the introduced models. Finally, we investigated the performance of the edge-based model and the ILP model on the currently available quantum annealer developed by D-Wave, using randomly created TSP instances with 3, 4, and 5 cities. The ILP model performed better, as the probability of observing a sample encoding the optimal route was higher for the 4-city case. In addition, some samples encoding feasible routes were observed for 5 cities with the ILP model, while no feasible samples were obtained with the edge-based model.

A natural progression of this work is to investigate further the choice of penalty values. Instead of assigning a single penalty value to all of the time-window constraints, using more than one penalty value for different sub-constraints can help with fine-tuning. For the node-based formulation, other techniques for quadratization [44, 45] can be investigated, which may result in a model with lower resource requirements. Further studies should be carried out to examine more closely the energy distribution of the samples, and in particular the gap between the energy of the ground state and the samples obtained from D-Wave.

The formulations presented here may be extended to a wide range of problems based on TSP, such as the vehicle routing problem and its variants, laying the groundwork for future research into the field of quantum optimization. Although the experimental work is limited to small instances, it offers insights into the field, emphasizing some of the challenges faced while solving complicated optimization problems using QA. Considering the number of variables required by real instances, the study also highlights the limits of current quantum hardware for solving real-world problems, suggesting that hybrid algorithms can be a better alternative for solving large instances.

One should also note that alternative approaches for harnessing binary models have been proposed. In particular, in [46] a quantum-inspired hardware architecture for speeding up the solution of combinatorial problems was introduced. The architecture has been implemented on an FPGA, enabling over four orders of magnitude of speedup for solving TSP compared with simulated annealing. From the perspective of real-world applications, TSPTW provides a more challenging and more relevant problem to study.