Deterministic annealing with Potts neurons for multi-robot routing

A deterministic annealing (DA) method is presented for solving the multi-robot routing problem with min–max objective. This NP-hard problem belongs to the multi-robot task allocation family of problems: robots are assigned to a group of sequentially ordered tasks such that the cost of the slowest robot is minimized. The problem is first formulated in matrix form, where the optimal solution is the minimum-cost permutation matrix without any loops. The solution matrix is then found using the DA method, which is based on mean field theory applied to a Potts spin model and has been proven to yield near-optimal results for NP-hard problems. Our method is benchmarked against simulated annealing and a heuristic search method. The results show that the proposed method is promising for small- to medium-sized problems in terms of computation time and solution quality compared to the other two methods.


Introduction
Multi-robot systems (MRS) are more robust and efficient than single robots for advanced and complex tasks. They are very useful in applications like indoor warehouses [23], automated container terminals [25], search and rescue (coverage problems) [4], surveillance [10], etc. However, the success of an MRS depends on team goals like coordination, collaboration or cooperation between the robots. There are different types of MRS problems depending on these goals and objectives, such as multi-robot coordination, multi-robot task allocation and multi-robot path planning. An MRS problem that involves assigning a set of tasks (or goals or targets) to a set of robots, such that a specific objective is achieved while taking constraints into account, is called the multi-robot task allocation (MRTA) problem [11]. In this paper, we study one of the important MRTA problems, termed the multi-robot routing problem with min-max objective, where several homogeneous robots are required to visit a group of locations to accomplish certain tasks. The robots should be routed in such a way that the maximum cost incurred by any one of the robots is minimized. The robot locations and the task locations are all known and lie in the Euclidean plane, with or without obstacles. The constraints are such that it may be necessary to do some tasks in a specific order. Moreover, we specifically consider multiple mobile robots seeking to distribute spatial motion tasks among them. A practical example is the container terminal with stacked containers; it does not make sense to pick a container at the bottom of a stack while others are piled on top of it. The problem is illustrated in Fig. 1. Most of the other multi-robot routing problems found in the literature focus only on minimizing the sum of the makespans of all the robots. In our case, the given tasks should be ordered, grouped and assigned in such a way that the cost for the slowest robot is minimized.
This helps in reducing the load on a single robot and distributes the work load evenly among all robots, thus reducing the breakdown of robots due to unbalanced task distribution. Hence, this objective is more practical and useful in many real-world applications.
Given the high relevance in practical robot applications, there are surprisingly few studies of the MRR problem with min-max objective (MRR-MM) [2,3,9,21,24,30,32]. There are other works similar to this problem [5,7,22,23,27], but they differ in a number of ways, such as the triangle inequality, returning to fixed positions, heterogeneous tasks, etc. A detailed review of the problem can be found in [6]. This problem is NP-hard in the number of tasks [27], as the time to compute the solutions grows exponentially with the size of the problem. Researchers have to trade off solution quality against the computational time needed to solve such problems, and use meta-heuristic, hybrid and stochastic algorithms to obtain reasonable solutions. In this paper, we seek to find near-optimal solutions for this problem in a reasonable amount of time. We propose a novel method adapted from the field of statistical physics: the deterministic annealing (DA) method, using the concepts of mean field theory (MFT) and Potts neurons. This method has previously been applied successfully to other NP-hard problems like graph bisection [29], TSP [16], knapsack [26], air-crew scheduling [20], communication routing [12,13], etc., and has been successful in finding near-optimal deterministic solutions faster than other heuristic approaches.

(Fig. 1: Illustration of the MRR problem with min-max objective. The red dotted lines are utility costs between tasks (T) and the blue solid lines are between robots (R) and tasks. The solution is the assignment of tasks in a sequential order such that the maximum cost incurred by any one of the robots is minimized.)
In this paper, we first formulate the MRR-MM problem in the form of a matrix. The search space is the set of all permutation matrices of the problem size, and the permutation matrix with the lowest cost is the optimal solution. This is the first contribution of this paper: the MRR-MM problem is mapped onto a permutation matrix. One can use any search algorithm to find the optimal matrix with this problem formulation (for example, we also apply the simulated annealing method to this formulation to compare with our method). We use a deterministic annealing method to search for the permutation matrix by applying mean-field dynamics, which produces very good solutions. Secondly, previous works that used DA to solve other NP-hard problems made use of only a Potts vector, whereas we map the problem onto a Potts matrix. This is inspired by ideas in a technical report by Jönsson and Söderberg [17], and we develop the method fully for the MRR-MM problem. This is the second contribution of this paper. Thirdly, we study the DA method's performance with respect to scalability, optimality and computation time by comparing against two other generic methods: stochastic simulated annealing (SA) and a heuristic graph-based method (inspired by Turpin et al. [33]). The DA method shows promising results in terms of computation resources and optimality. The DA method with Potts neurons and MFT, when applied to NP-hard problems like the multi-robot routing variants, produces better solutions than the general meta-heuristic methods used in the literature. This is the third contribution of this paper.
The rest of the paper is organized as follows: the related work and the background for the DA method with Potts neurons and MFT are presented in Sect. 2. The problem formulation is detailed in Sect. 3. The DA method is elucidated in Sect. 4. Detailed experiments, performance characteristics of the method and benchmarks against the other methods are presented in Sect. 5, followed by concluding remarks in Sect. 6.

MRR-MM problem
There are very few works on the MRR-MM problem, and the constraints applied differ depending on the application. For example, Parker [28] studied it with a focus on producing fault-tolerant systems, suggesting a behavior-based architecture that enabled robots to cooperate and achieve tasks even if individual robots failed. Arkin et al. [1] worked on a similar problem, termed the "min-max k-paths cover problem," where k paths are found in a graph G = (V, E) covering the vertices V such that the length of each path is not more than four times the optimal solution. However, in their work, the k paths do not have different start locations (depots), as in our case. Lagoudakis et al. [3,21,30] assumed that the utility costs were symmetric and obeyed the triangle inequality; they developed a theoretical distributed framework using the auction algorithm that establishes performance guarantees for small- to medium-sized problems, while Bae and Park developed a heuristic based on a primal-dual technique for heterogeneous robots. Faigl et al. [9] explored a similar problem where multiple unmanned aerial vehicles (UAVs) were required to visit given targets in a short span of time. They considered UAVs with Dubins vehicle models (respecting curvature constraints) and allowed a neighborhood visit (instead of the exact target location) to adapt the method for real-world search and rescue operations. They made use of the variable neighborhood search method and extended it with self-organizing maps to learn the allocation from experience. Their problem differs from ours in two aspects: it focuses on additional constraints (curvature and neighborhoods) and on very fast computation time instead of optimality. Turpin et al. [33] worked on a problem similar to ours, with multiple starting locations for the robots and path planning between the robots.
They suggested a polynomial-time approximation algorithm with a sub-optimality bound of five times the cost obtained from the method by Arkin et al. [1]. This is the only work that is very closely related to our exact problem (but with an addition of collision avoidance) and focuses on optimality and computation time. So, a simpler form of this method will be used for comparing the results of our method in Sect. 5.
As seen from the literature, some researchers have focused on finding faster solutions rather than optimal ones, emphasizing the scalability of their approaches. Others focus on practical aspects of the problem, such as fault tolerance or dynamic planning of the task allocation, with no emphasis on solution quality or the computation time required. So, in this paper, we seek to find optimal solutions (for small problems) and near-optimal solutions (for large problems) in a reasonable amount of time for the MRR-MM problem. To achieve this, we design a deterministic annealing (DA) method with mean field theory (MFT) and Potts neurons to find the global minimum of the system. Most related works in the literature trade optimality for computation time or scalability; for a multi-robot task routing scenario, near-optimal solutions can save huge operating costs for the robots, and hence we focus on finding near-optimal solutions for this NP-hard problem using this novel approach. The DA method has produced near-optimal solutions for small- to medium-sized problems with shorter computation time than the other methods.

DA method with Potts neuron
The Hopfield neural network (HNN) [14] was first used to solve the classic traveling salesman problem, and many other optimization problems have since been solved by mapping them onto a Hopfield neural network. If the objective function of the problem can be expressed in terms of a Hopfield energy function E, then minimizing this energy function minimizes the objective function, because the constraints are "embedded" in the weights of the network. In a HNN, the general form of the energy function is

E(s) = −(1/2) Σ_{i,j} w_ij s_i s_j,

where w_ij is the connection strength between neural units s_i and s_j, and s denotes the vector of binary values s_i. The connection strengths w_ij are designed in such a way that the configuration of s with minimum energy E(s) corresponds to the optimal solution of the problem.
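As a minimal illustration (a Python sketch, not part of the paper's implementation), the Hopfield energy of a small two-unit network can be evaluated directly; the weight matrix w below is a toy example:

```python
import numpy as np

def hopfield_energy(w, s):
    """Hopfield energy E(s) = -1/2 * sum_ij w_ij * s_i * s_j."""
    return -0.5 * s @ w @ s

# Two mutually excitatory units (w_12 = w_21 = 1, no self-coupling):
w = np.array([[0.0, 1.0],
              [1.0, 0.0]])
print(hopfield_energy(w, np.array([1.0, 1.0])))  # both ON: E = -1.0
print(hopfield_energy(w, np.array([1.0, 0.0])))  # one ON:  E = 0.0
```

With these weights, the all-ON configuration has the lowest energy, so a network designed this way "prefers" both units active.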
In such an energy function, there are typically many local minima, and one should therefore search for the global minimum with methods that can escape local minima. Simulated annealing is one such method; it uses a temperature kT that controls the ability to escape local minima. As kT → 0, the system is expected to converge to the global minimum.
Hopfield and Tank [15] later introduced a continuous neural unit v_i ∈ [0, 1] in place of s_i in the HNN. This continuous unit v_i denotes the probability for the binary unit s_i to have the value 1. This approach is referred to as deterministic annealing (DA) with MFT, or mean field annealing.
Peterson and Söderberg [29] showed that using units v_ij instead of v_i gives more stable convergence, because it imposes a constraint called "1-out-of-Z": a Z-state Potts unit is a variable with Z possible states, which allows only one unit in a group of Z units to be ON (1) while the rest are OFF (0). So, when an optimization problem is expressed as an energy function with the constraints embedded in the weights of the Potts units and annealing is performed, the system converges to the global minimum as kT → 0, giving the optimal solution of the problem.
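A single Potts unit's mean-field update is a softmax of its local fields at temperature kT; this can be sketched as follows (the field values u here are arbitrary illustrative numbers). At high kT all Z states are nearly equally probable, and as kT → 0 the unit saturates to the 1-out-of-Z state with the largest field:

```python
import numpy as np

def potts_update(u, kT):
    """Mean-field value of a Z-state Potts unit:
    softmax of the local fields u at temperature kT."""
    z = np.exp((u - u.max()) / kT)  # shift by max for numerical stability
    return z / z.sum()

u = np.array([0.3, 0.1, 0.9])       # hypothetical local fields, Z = 3
for kT in (10.0, 1.0, 0.01):
    print(kT, potts_update(u, kT))
# High kT: nearly uniform (~1/3 each).
# kT -> 0: winner-take-all; the state with the largest field approaches 1.
```

The components always sum to 1, which is exactly the "1-out-of-Z" normalization of a Potts unit.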

Problem formulation
The requirements of our MRR-MM problem include:

- Every task is done.
- Every robot does at least one task.
- The robots can have different start locations (multiple depots).
- The robots are not required to return to their starting locations (depots) after task completion; they may end at any end location.
- The utility costs are asymmetric (e.g., the cost for going from task T1 to task T2 is different from the cost of going from task T2 to task T1).
So, given M robots and N tasks, where M ≤ N, and utility costs between the M robots and N tasks, the aim is to find an optimal assignment of the M robots to the set of N tasks such that the total utility cost of the slowest robot is minimized. The utility cost is divided into two parts: the transfer costs Δ_ij between any two consecutive tasks i and j, expressed in the transfer matrix Δ, and T_i, the cost for doing task i, expressed in the task vector T.
A solution square matrix S of size (2M + N) × (2M + N), also denoted the transfer assignment matrix, is defined with binary variables s_ij, where s_ij = 1 means that transfer point i is immediately followed by transfer point j, and s_ij = 0 otherwise. Special additional transfers are introduced for moving from the start position of a robot to a task, and for going from a final task to a possible end position for a robot. They are labeled S and E followed by the robot letter. The start and end points of the robots need not be related to each other; it is, e.g., possible for robot X to start at SX and end at EY. For an example with two robots (X and Y) and three tasks (a, b and c), the complete set of transfer points is {SX, SY, a, b, c, EX, EY}. The element s_a,b is read as "task a is followed by task b". Many of the transfers in the matrix S are dummy or undesired, and thus not free to take nonzero values. For instance, the element s_a,SX means "task a is followed by going to the start position SX", which is not desired, so this element should always be zero. Similarly, end-position transfers like s_EX,a imply going from the end position EX back to a task a, which should not happen, so this element should also always be zero. The element s_SX,EX and similar elements imply a transfer directly from a start position to an end position, without doing any tasks. Also, there are transfers that lead to loops. They can be first-order loops like (a, a), (b, b), or of higher order, which will be discussed later. Undesired transfers and first-order loops are avoided by fixing their corresponding matrix elements to zero, leaving free elements (marked *) only in the rows {SX, SY, a, b, c} and columns {a, b, c, EX, EY}:

        a    b    c    EX   EY
  SX    *    *    *    0    0
  SY    *    *    *    0    0
  a     0    *    *    *    *
  b     *    0    *    *    *
  c     *    *    0    *    *

Thus, even though the full transfer assignment matrix is 7 × 7, the effective matrix (the sub-matrix that changes) is a permutation matrix of size 5 × 5. This matrix has 24 valid solutions.
For example, one solution is the route pair (SX → b → EX, SY → a → c → EY), which is read as "robot X goes from its start position, does task b and then goes to end position EX", and "robot Y goes from its start position, does task a followed by task c, and then goes to end position EY". The solution cost of this matrix can be found by applying the values of the S matrix to the transfer cost matrix Δ and the task vector T, similar to the manner used in the Hungarian algorithm [18], or by using the transport cost design variable discussed in Sect. 4.2.2. The problem is now encoded in a matrix, and the search space is the set of all permutation matrices of the given size. Some of the permutation matrices will not correspond to valid solutions due to the presence of loops, i.e., a robot would eventually be routed to an already completed task. Permutation matrices with loops must be avoided. In our example above, with a 5 × 5 sub-matrix, there are 120 permutation matrices, of which 24 are valid, without any loops and with each robot doing at least one task. The optimal solution is the valid permutation matrix with the minimum cost, and one can use any search method to find this "valid and minimum-cost" permutation matrix.
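The counts above can be checked by brute force. The following Python sketch (illustrative labels, not the paper's code) enumerates all 120 permutations of the effective 5 × 5 sub-matrix for the two-robot, three-task example and keeps those whose routes cover all tasks without loops, each robot doing at least one task:

```python
from itertools import permutations

# Two robots (X, Y) and three tasks (a, b, c): the effective sub-matrix maps
# rows {SX, SY, a, b, c} to columns {a, b, c, EX, EY}; each permutation of the
# columns is one candidate solution matrix.
rows = ['SX', 'SY', 'a', 'b', 'c']
cols = ['a', 'b', 'c', 'EX', 'EY']
tasks = {'a', 'b', 'c'}

def route(succ, start):
    """Follow transfers from a start position until an end position is hit."""
    chain, cur = [], succ[start]
    while cur in tasks:
        chain.append(cur)
        cur = succ[cur]
    return chain  # tasks missing from both chains sit in loops -> invalid

valid = 0
for perm in permutations(cols):
    succ = dict(zip(rows, perm))          # candidate successor map
    rx, ry = route(succ, 'SX'), route(succ, 'SY')
    # valid: every task covered, and each robot does at least one task
    if rx and ry and set(rx) | set(ry) == tasks:
        valid += 1
print(valid)  # 24, matching the count in the text
```

Because the successor map is a bijection, a chain starting from SX or SY can never revisit a node; any tasks not reached by either chain necessarily form a loop, which is what the validity check rejects.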

Methodology
In order to apply the DA method to this problem formulation, we perform the following steps:

1. Convert the S matrix into the Potts transfer assignment matrix V.
2. Design variables to (a) detect loop matrices using the V matrix and (b) find the minimum-cost permutation matrix.
3. Derive the local field energy equation using the design variables.
4. Substitute this field energy function into the standard Potts matrix equation.
5. Tune an annealing schedule for reducing this energy function to find the global minimum.

Potts transfer assignment matrix
The V matrix consists of continuous units v i j ∈ [0, 1], and has the same size as the S matrix. The Potts unit v i j represents a thermal average value for s i j during an annealing process. Just like the S matrix, the V matrix has an effective sub-matrix that is doubly stochastic.

The propagator matrix
To detect the matrices with loops and thereby impose an extra penalty for such solutions in the search, we introduce the propagator matrix P, as suggested by Lagerholm et al. [19].
The propagator matrix element p_ij expresses the number of paths that lead from task i to task j. This can be understood from the Neumann series expansion in (5):

P = V + V² + V³ + ··· = (I − V)^{−1} − I,   (5)

where the matrix elements in V give the direct paths from one task to another, the elements in V² give the paths from one task to another via one intermediate task, the elements in V³ count the paths between two tasks via two intermediate tasks, and so on. At least one propagator matrix element will approach infinity if V moves toward a solution with loops, so the propagator matrix can be used to impose a penalty on the V matrix for loops.
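A small numerical sketch (assuming the Neumann series converges, i.e., the spectral radius of V is below 1) shows how the propagator flags loops; the matrices V_chain and V_loop below are illustrative, not taken from the paper:

```python
import numpy as np

def propagator(V):
    """P = V + V^2 + V^3 + ... = (I - V)^{-1} - I (Neumann series),
    valid when the spectral radius of V is below 1."""
    n = V.shape[0]
    return np.linalg.inv(np.eye(n) - V) - np.eye(n)

# Loop-free transfers: a -> b only.
V_chain = np.array([[0.0, 0.9],
                    [0.0, 0.0]])
# Looping transfers: a -> b and b -> a.
V_loop = np.array([[0.0, 0.9],
                   [0.9, 0.0]])

print(propagator(V_chain).round(3))  # diagonal stays 0: no path returns home
print(propagator(V_loop).round(3))   # large diagonal entries signal a loop
```

As the loop entries of V approach 1, (I − V) approaches singularity and the corresponding propagator elements blow up, which is precisely what the loop penalty exploits.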

Minimum cost
To compute the solution cost of a V matrix, two vectors L and R are introduced. The element L_i is the cost for reaching task i from the left, i.e., going from any start position to task i (including doing task i). The element R_i is the cost for reaching task i backward, i.e., doing task i and then continuing from task i to any end position. The makespan for the robots is the run-time of the slowest robot, i.e., the one with the largest cost to finish its tasks, and the transport cost C (the makespan) is defined as the average of these two:

C = (L_α + R_β)/2,   (6)

where the indices α and β correspond to the components in L and R with maximum values (only considering the last M elements in L and the first M elements in R). Thus, the solution cost of a Potts transfer assignment matrix V can be found using Eqs. (14), (15) and (6) (detailed in Appendix A).
In the following section, the deterministic annealing method is derived that makes use of these two design variables, the Potts V matrix, and an annealing schedule to find optimal solution for this problem.

The local field energy
The Potts neuron v_ij represents a transfer: doing task j after task i. For each Potts neuron v_ij, a corresponding local field U^local_ij is developed with three terms: the transport cost field U^trans_ij (to minimize the maximum of the transport costs), the loop cost field U^loop_ij (to suppress loop solutions) and the stabilization field U^stable_ij (to stabilize the system). The transport cost field is computed by taking the derivative of expression (6) with respect to the Potts unit v_ij, where α and β correspond to the indices of the components in L and R with maximum values, respectively (considering only the last M elements in L and the first M elements in R).
The loop field is the same as that introduced in Häkkinen et al. [13], which adds an extra penalty to suppress solution matrices with loops (i.e., repeated tasks). The stabilization field helps to avoid oscillations in the DA updates caused by the additional challenge of normalization; we use the same damping term as suggested by Eberhard et al. [8], which also corresponds to a soft normalization cost on the rows and columns. The complete local field for the MFT update is thus

U^local_ij = U^trans_ij + γ U^loop_ij + η U^stable_ij,   (10)

where γ and η are the positive weighting factors for the loop and stabilization terms, respectively.

Potts matrix
Now that we have the local field for the MFT update, we update the Potts matrix using the mean-field equation

v_ij = exp(U^local_ij / kT) / Σ_k exp(U^local_ik / kT).   (11)

This problem is trickier than previous Potts MFT approaches: they were able to formulate the problem as a one-dimensional Potts vector, whereas the nature of our problem gives a two-dimensional Potts matrix. Our Potts matrix should remain doubly stochastic even after the update, so an additional external normalization is required. This is done using Sinkhorn-Knopp normalization [31] a certain number of times, say w, until the V matrix becomes doubly stochastic to a good approximation. This normalization, iteratively switching between normalizing rows and columns, is guaranteed to converge (albeit sometimes very slowly) to a doubly stochastic matrix if the matrix is nonnegative.
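The alternating row/column normalization can be sketched as follows (a minimal Python illustration; the function name, tolerance and iteration cap are ours, not the paper's implementation):

```python
import numpy as np

def sinkhorn_knopp(V, tol=0.05, max_iter=1000):
    """Alternately normalize rows and columns until the matrix is
    (approximately) doubly stochastic; converges for nonnegative matrices."""
    for _ in range(max_iter):
        V = V / V.sum(axis=1, keepdims=True)   # rows sum to 1
        V = V / V.sum(axis=0, keepdims=True)   # columns sum to 1
        if np.all(np.abs(V.sum(axis=1) - 1.0) < tol):
            break
    return V

rng = np.random.default_rng(0)
V = sinkhorn_knopp(rng.random((5, 5)))
print(V.sum(axis=0).round(2), V.sum(axis=1).round(2))  # all close to 1
```

The tolerance of 0.05 mirrors the 0.95 normalization accuracy discussed in Sect. 5: tightening it improves solution quality at the price of more normalization sweeps.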

Annealing period
The DA process runs for a pre-defined annealing period, with an initial temperature kT_start, a final temperature kT_stop and a factor kT_factor for decreasing the temperature. All these parameters require proper tuning. At some point, the system reaches a critical temperature kT_c, where a phase transition takes place: the system undergoes rapid changes and converges toward a solution, as illustrated in Fig. 2. It is important to set the initial temperature kT_start > kT_c, but not by much, since that leads to unnecessarily long convergence times. The annealing factor kT_factor gives a smooth decrease in temperature for the system to eventually go on-shell. It is beneficial to use an adaptive cooling schedule where the cooling rate is decreased during the transition phase to allow better convergence to solutions. In our case, the adaptive cooling is based on the value dV_max, the largest absolute change in a single element of the V matrix in the latest update. The exponential decrease in kT is controlled by the adaptive annealing factor kT_factor, based on dV_max, such that it lies within the predefined range [kT_min_fac, kT_max_fac], as described in Algorithm 1.
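The adaptive factor used in Algorithm 1 can be sketched directly; the dV_max values fed in below are hypothetical, chosen only to show the behavior:

```python
import math

def adaptive_factor(dV_max, kT_min_fac=0.9, kT_max_fac=0.99):
    """Adaptive annealing factor: large changes in V (phase transition)
    push the factor toward kT_max_fac (slower cooling); a quiet system
    cools quickly at kT_min_fac."""
    return kT_min_fac + (kT_max_fac - kT_min_fac) * math.tanh(dV_max)

print(adaptive_factor(0.0))  # 0.9   -> fast cooling when nothing changes
print(adaptive_factor(5.0))  # ~0.99 -> slow cooling during the transition

kT = 1.0
for dV in (0.0, 0.0, 2.0, 2.0, 0.0):  # hypothetical dV_max trace
    kT *= adaptive_factor(dV)
print(round(kT, 4))
```

Because tanh saturates at 1, the factor never exceeds kT_max_fac, so the schedule always makes progress toward kT_stop.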

DA algorithm
The DA method is outlined in Algorithm 1. The MRR-MM problem objective is defined by the number of robots M and the number of tasks N, with the utility costs expressed in the transfer matrix Δ and the task vector T. Each element of the effective (M + N) × (M + N) sub-matrix of V is set to a nonzero random value 1/n + ε, where n is the sub-matrix dimension and ε is a very small random value, such that Σ_i v_ij = 1 and Σ_j v_ij = 1. After initializing the V matrix (line 2 of Algorithm 1), the design variables, the propagator and the transport costs, are calculated in line 3. These variables are then used to find the transport cost field, the stabilization field and the loop field, which in turn give the local field in line 4. The iteration begins by setting the system to the predetermined start temperature kT_start. After every iteration, the temperature is decreased by the factor kT_factor, and the Potts units are updated according to the local field, Eq. (10) (line 4), and the Potts MFT equation, Eq. (11) (line 7). The resulting V matrix (line 7) is non-normalized, so an external Sinkhorn normalization is done after each update to ensure that the V matrix is (approximately) doubly stochastic (lines 8-10). Note that it is only the effective sub-matrix in V that is updated and normalized to be doubly stochastic. The process continues, reducing the temperature by the adaptive factor kT_factor (lines 13 and 14) after every iteration. Hence, when the system goes through the phase transition period, the adaptive cooling slows down the system, leading to better convergence of solutions. A solution is considered reached if M + N values in the Potts matrix are greater than 0.9 and the remaining values are less than 0.1 (line 16).
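To give a feel for the combined dynamics, the following simplified sketch anneals a Boltzmann (softmax-like) response to a cost matrix and re-normalizes it with Sinkhorn sweeps. It is deliberately not the paper's algorithm: it omits the loop field, the min-max transport field and the adaptive schedule, and uses a hypothetical 3 × 3 linear-assignment cost matrix (cheapest on the diagonal) purely for illustration:

```python
import numpy as np

def make_doubly_stochastic(V, iters=200):
    """Plain Sinkhorn sweeps: alternate row and column normalization."""
    for _ in range(iters):
        V = V / V.sum(axis=1, keepdims=True)
        V = V / V.sum(axis=0, keepdims=True)
    return V

# Hypothetical transfer-cost matrix; lower is better, optimum = identity.
C = np.array([[1.0, 5.0, 5.0],
              [5.0, 1.0, 5.0],
              [5.0, 5.0, 1.0]])

kT = 2.0
V = np.full_like(C, 1.0 / 3.0)            # uniform, doubly stochastic start
while kT > 0.05:
    V = V * np.exp(-C / kT)               # Boltzmann response to the cost field
    V = make_doubly_stochastic(V)         # restore double stochasticity
    kT *= 0.9                             # fixed annealing factor, for brevity
print(V.round(2))  # converges to the identity permutation matrix
```

As kT drops, the doubly stochastic V sharpens toward a permutation matrix, the minimum-cost assignment, which is the qualitative behavior of the full DA algorithm on its much larger transfer assignment matrix.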

Parameter tuning
Proper values for the weighting factors η and γ, the parameters kT_start and kT_stop, and the annealing factor range [kT_min_fac, kT_max_fac] are all found by trial-and-error tuning: a parameter value is chosen based on experience and slowly adjusted while the other parameters are held constant. The kT_start and kT_stop values are chosen such that the phase transition starts and ends smoothly. The kT_min_fac and kT_max_fac values are chosen based on the desired rate of solution convergence, which in turn depends on the available computational resources. The three field functions are given equal weights, so η and γ are both set to 1; increasing their values reduces the quality of the solutions. Keeping η at 1 at all times keeps the system stable, and γ = 1 ensures that there are no loop solutions. However, for small problems, γ can be set to a value less than 1, as there are not many loop solutions.
Algorithm 1 (fragment):

8:  while Σ_i v_ij < 0.95 for some j, or Σ_j v_ij < 0.95 for some i do
9:      Sinkhorn-Knopp normalization
10: end
11: Repeat steps 3 and 4
12: Calculate dV_max
13: Set the adaptive annealing factor kT_factor = kT_min_fac + (kT_max_fac − kT_min_fac) · tanh(dV_max)
14: Update the temperature kT = kT · kT_factor
15: end
16: Print V matrix

Implementation details
This section outlines the details of the experiments, where our method is compared with two other methods: simulated annealing and a heuristic method. The utility costs Δ and T are randomly generated from a uniform distribution. The DA and SA methods require the annealing parameters kT_start, kT_stop and kT_factor. The DA algorithm also requires two additional parameters to be tuned: γ, the weighting factor for the U^loop_ij function, and η, the weighting factor for the U^stable_ij function. All three algorithms are implemented in C++ with the Eigen library and in MATLAB, running on Ubuntu 16.04 with an octa-core Intel i7 processor and 15.6 GB of memory. The DA code is based on the algorithm originally written for the communication routing problem by Häkkinen et al. [13].

Scalability comparison
An extensive study was done to compare the scalability of the three approaches. In the first set of experiments, shown in Figs. 3 and 4, the number of robots was kept constant (M = 10) and the number of tasks N was varied; in the second set, N was kept constant and M was varied. In the third set of experiments, the values of the elements in Δ were also randomly generated from a uniform distribution for different problem sizes. All elements in the task vector T were set to 1, meaning that the cost of doing every task was considered the same. The SA algorithm was run under two or three different annealing conditions: kT_factor = 0.9 (faster cooling), kT_factor = 0.99 (slower cooling) and kT_factor = 0.999 (very slow cooling). DA was run with kT_min_fac = 0.9, with adaptive cooling (kT_max_fac = 0.99) during the phase transition period. The weight factors η and γ were set to 1.

Optimality and computation time
In the first set of experiments, M was kept constant and N varied. The slow cooling rate for SA (0.999) was only tested for small problems; for bigger problems, the convergence was too slow. Figure 3 shows the performance of the three methods in terms of solution cost and computation time (in log scale) for N = 10, 20, . . . , 200. Figure 4 shows the same performance metrics for larger problem sizes, N = 10, 20, . . . , 780. For very small problem sizes (i.e., few tasks per robot), SA is a good algorithm. However, as the number of tasks per robot increases, DA clearly outperforms SA in terms of solution quality and computation time. Moreover, DA has a smaller spread of final solution costs than SA (results not shown). The second set of experiments, where N was kept constant and M varied, reconfirmed these findings (Figs. 4 and 5).
The heuristic method outperforms the other methods in terms of computation time by a wide margin. For bigger problems, i.e., when the number of tasks per robot is large, it is also better than the other methods in terms of solution cost. DA is the second choice after the heuristic method, as seen in Table 1, which shows the performance of SA at different annealing rates compared with DA and the heuristic approach. The matrices differed more between the tests in experiment three than in the other two experiments, and the results are less clear-cut. In terms of final cost, the heuristic method is better for large problems, and the DA method is often best for smaller (but not very small) problems. For very small problems, SA is the method of choice. The heuristic method is always the quickest.

Annealing rate
The effect of the annealing rate on SA and DA is illustrated in Fig. 6. The problem size is M = 10 and N = 200, and the utility costs are generated from a uniform random distribution. The heuristic method is also shown for reference. Figure 6 shows how the final solution cost varies with the computation time for annealing rates (kT_min_fac) of 0.9, 0.95, 0.99, 0.992 and 0.999. Both SA and DA yield better solutions with slower cooling rates.

The γ and η weight factors
The loop weight factor γ determines the relative importance given to the loop cost compared to the other two costs. When M = N, with the requirement that each robot does at least one task, the problem reduces to a linear assignment problem with a min-max objective, and the loop cost is irrelevant. In this case, γ is set to zero, as the search space is comparatively small and this lets the DA algorithm focus fully on the transport cost. However, when the number of tasks grows in relation to the number of robots, e.g., if M/N < 0.3, γ should be set to 1 to avoid loop solutions. With γ > 1, the final solution costs ended up too large. The normalization weight factor η was set to 1 in all experiments, which avoided oscillations; with η > 1, the algorithm converged too fast and the solutions were sub-optimal.

Normalization time
Jönsson and Söderberg [17] used Sinkhorn normalization to obtain a doubly stochastic matrix when working with the "Double Potts Approach," and we adapt the Sinkhorn normalization [34] for the same purpose. However, it is the most expensive step of the DA algorithm, contributing more than 90% of the total computation time. The normalization accuracy was set to 0.95 (instead of 1), i.e., the row and column normalizations were repeated until all row and column sums exceeded 0.95 (some would sum to more than 1). If the accuracy of the normalization step was reduced below 0.95, the computation time decreased significantly, but at the cost of solution quality. For example, with the normalization accuracy set to 0.9, the computation time decreased by 30% but the solution quality degraded by 20%.

M/N ratio
As the problem size increases, the solution space also increases and the problem generally becomes harder. However, the ratio of the number of robots to the number of tasks (M/N) also affects how hard the problem is to solve.

Distribution of solutions
To estimate how good the annealing methods were at finding low-cost solutions, the costs of all valid solution matrices V for two specific problem sets were compared with the solution sets obtained with SA and DA at different annealing rates. Figure 8 shows this for a problem with M = 2 and N = 4, with 144 valid solutions. SA was run with annealing rates 0.9, 0.92, 0.95, 0.97 and 0.999, and DA was run with annealing rate 0.9. All methods found the optimal solution with cost 67. Figure 9 shows a case with M = 5 and N = 6, with 432,000 valid solutions, of which 64,946 were used for the histogram. SA was run with annealing rates 0.9, 0.99 and 0.999, and DA was run with an annealing rate of 0.9. The lowest cost was 78.46, obtained with SA at an annealing rate of 0.999 (the lowest cost with the heuristic method was 117.9, and with DA 95.08). The examples show that the final solutions from both the SA and DA methods are among the lowest-cost solutions.

Advantages and disadvantages
It is evident from Figs. 4 and 5 that the heuristic method outperforms the other two methods by a large margin in terms of computation time.
Still, Table 1 shows that the heuristic does not always beat the DA method in terms of solution cost. SA at annealing rate 0.9 is fast but gives very poor solutions. With a slower annealing rate (such as 0.99 or 0.999), the solutions are comparatively better, but they come at a high computational cost. Problem size and the M/N ratio affect SA much more than the other methods.
A difficulty with the DA method is tuning its parameters, especially η and γ, which affect the solution quality (as also noted by Jönsson and Söderberg [17]); some experimentation is always required to tune them. Regarding the annealing period, both SA and DA produce better solutions at slower annealing rates, but DA always yields a better solution than SA at the same annealing rate. Another important factor is that the external normalization required by DA adds a considerable computational burden. The heuristic method, in turn, does not explore all possible solutions, and its solution quality depends strongly on how the discrete edges in the multidigraph are found and cut.
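The overall DA loop discussed here, a mean-field update followed by external normalization at a slowly decreasing temperature, can be sketched as follows. This is a schematic with a placeholder energy gradient and illustrative parameter values (the damping factor `eta` here is not the paper's η), not the paper's exact update equations.

```python
import numpy as np

def deterministic_annealing(E_grad, n, T0=10.0, rate=0.9, T_min=1e-3,
                            eta=0.5, sweeps=50, rng=None):
    """Schematic DA loop for an n-by-n Potts matrix V: at each temperature,
    a mean-field (Boltzmann-weight) update of V from the local fields
    -dE/dV, followed by Sinkhorn row/column normalization; then the
    temperature is lowered by the annealing rate."""
    rng = rng or np.random.default_rng(0)
    V = rng.random((n, n))
    V /= V.sum(axis=1, keepdims=True)
    T = T0
    while T > T_min:
        for _ in range(sweeps):
            fields = -E_grad(V)                      # local mean fields
            W = np.exp((fields - fields.max()) / T)  # Potts-style update
            V = (1 - eta) * V + eta * W              # damped update
            for _ in range(10):                      # external Sinkhorn step
                V /= V.sum(axis=1, keepdims=True)
                V /= V.sum(axis=0, keepdims=True)
        T *= rate                                    # anneal
    return V
```

The inner Sinkhorn loop is exactly the "considerable extra computational burden" mentioned above: it runs at every sweep of every temperature step.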

Conclusions
This paper presents a novel deterministic annealing method based on mean-field theory and the Potts model for solving the multi-robot routing problem with min–max objective. One novel feature is the generic formulation of the problem in terms of a permutation matrix; another is the use of a Potts matrix, rather than a Potts vector, for defining the problem. A dedicated mean-field equation is derived that minimizes the maximum cost over the robots (i.e., the cost of the robot that takes the longest time). By use of Sinkhorn normalization and the path propagator, the method converges to good-quality solutions. As a side note, Jönsson and Söderberg [17] tried some preprocessing techniques to improve convergence at lower temperatures, which led to shorter computation times; applying such methods would likely make the DA method faster here as well, but we opted to use only Sinkhorn normalization.
Extensive experiments compare this method to two others: simulated annealing (based on the same problem formulation) and a heuristic method. Deterministic annealing produces better results than simulated annealing for comparable convergence times when the number of tasks is large compared to the number of robots. The heuristic method outperforms both simulated and deterministic annealing in terms of computation speed. For problems with few robots, deterministic annealing sometimes produces better solutions than the heuristic method; for problems with many robots and many tasks, however, the heuristic method is a clear winner. Future work involves extending this problem formulation to linear assignment and scheduling problems, and adapting the formulation and method accordingly.
Funding Open access funding provided by Halmstad University.
Data Availability Not applicable.

Code availability
The source code for this algorithm is available on GitHub (link provided above).

Conflict of interest Not applicable.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Appendix A
The elements L_i and R_i can be computed by iterating the corresponding relations, say k times, and observing which values they converge to when all L_i and R_i are initialized to zero, i.e., L_i(0) = 0 and R_i(0) = 0 for all i. Writing the relations in matrix form and letting k → ∞ leads to closed-form expressions in terms of the propagator matrix P, where diag(X) denotes the column vector of the diagonal elements of the matrix X. For our previous 7 × 7 example with two vehicles, A and B, the left vector L and the right vector R are obtained in this way. Thus, the robots' estimated run-times are the last M elements in L (estimated from the start to the end) and the first M elements in R (estimated from the end to the start).
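The convergence argument can be illustrated with a generic linear fixed-point iteration; here A and b are illustrative stand-ins, not the paper's propagator-based relations.

```python
import numpy as np

def iterate_to_fixpoint(A, b, k=200):
    """Iterate x(k+1) = A x(k) + b from x(0) = 0.  When the spectral
    radius of A is below one, the iterates converge to the closed-form
    limit (I - A)^{-1} b, mirroring how L and R are obtained by writing
    their relations in matrix form and letting k go to infinity."""
    x = np.zeros(len(b))
    for _ in range(k):
        x = A @ x + b
    return x

# Illustrative contraction: spectral radius of A is about 0.32.
A = np.array([[0.0, 0.5], [0.2, 0.0]])
b = np.array([1.0, 1.0])
x = iterate_to_fixpoint(A, b)
closed = np.linalg.solve(np.eye(2) - A, b)  # (I - A)^{-1} b
```

The iterated value and the closed-form solution agree, which is the same mechanism that lets L and R be read off directly instead of being iterated at runtime.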