Multi-agent path planning: comparison of different behaviors in the case of collisions

In the context of Industry 4.0 and cyber-physical production systems, the role of production logistics is perceived as more and more important in order to reach the overall manufacturing targets. One central aspect in organizing the flow of material consists in task allocation and path planning for transport resources disposing of growing autonomy. There are various approaches to multi-agent path planning as well as to dealing with collisions. Collisions are possible due to traffic volume and can be treated either at the planning level or, in a short-term way, at the control level. The paper presents existing strategies for path finding before giving an overview of methods to deal with autonomous transport resources that meet in a manufacturing environment. Then, different existing behaviors and reactions in the case of collision detection are compared based on several criteria. This step allows classifying the strategies depending on the manufacturing environment and its organization.


Introduction
With the introduction of the concepts of Industry 4.0 like digitization and decentralization, various challenges arise in production environments. In addition, customer demand for an increasing number of product variants as well as shorter product life cycles results in smaller lot sizes [1,2]. Production logistics plays an important role in the manufacturing process, and companies increasingly understand its influence on the throughput time [3]. The utilization of autonomous resources is supposed to make logistics more efficient because these resources can organize themselves on a decentral level. This means that the operational level obtains decision-making autonomy within defined constraints. A challenge in this context is the balance between taking advantage of the autonomy and reaching central planning targets. One central aspect is the interaction of various resources with possibly different skill sets [4][5][6]. Mobile autonomous resources, i.e. self-directed vehicles also called automated guided vehicles (AGVs), take center stage in current research. There exist different concepts for multi-agent path finding (MAPF), each presenting advantages and disadvantages [7,8]. Considering the range of concepts for MAPF and especially for collision avoidance, an evaluation of these approaches that compares their applicability in a specific manufacturing environment is missing. An approach is needed in order to identify a suitable strategy or set of strategies for avoiding collisions in production environments.
The scope of this paper is to give an overview of existing approaches for dealing with and avoiding collisions of autonomous transport resources. In addition, these approaches are compared based on a selection of relevant criteria.

State of the art
Relevant literature concerning production logistics, autonomous resources and MAPF is summed up in this section to present applicable concepts. Before taking a deeper look at path planning and collision avoidance, a short introduction to logistics and mobile resources in general is given.

Logistics and mobile resources
As described in the introduction, AGVs are used for a variety of single- and multi-agent applications, but the main sector is logistic systems. The most common logistic scenarios are the transportation of dock containers in a terminal [9][10][11] and transportation services in warehouses as well as production lines [12]. While in the past AGVs were used only for transportation services, recent projects have also focused on expanding their skill set. Hoff and Sarker [13] distinguish six major types of AGVs: unit load, towing, pallet truck, forklift, light load and assembly line vehicles. The newest addition to these classic AGV systems are drones, which are being tested in warehouses [14]. Vehicle design and fleet composition are just one of many central research topics in AGV systems. Others involve sensors, navigation, localization, vehicle control and the communication between the agents [15].

Path planning for multiple resources
The classical MAPF problem for a set of n agents receives an input tuple (G, s, t), where G = (V, E) is an undirected graph with a set V of vertices denoting locations and a set E of edges. The function s : [1,…,n] → V maps an agent to a start vertex, and t : [1,…,n] → V maps an agent to a target vertex. Often the graph is given with its corresponding adjacency matrix, which displays the connected vertices and the weights of the edges (cf. figure 1). Various methods solve this problem under different constraints. Dijkstra's algorithm is one of the most commonly known algorithms for the shortest path search [16]. It calculates the shortest path between two vertices in an undirected grid. An evolution of Dijkstra's algorithm is the A* algorithm, which uses a heuristic to estimate the cost of a path from the start vertex to the target vertex through each candidate vertex and thereby preselects promising vertices [17]. Warren [18] proposes a modified A* algorithm that is faster than the standard A* algorithm because it further reduces the set of candidate solutions. Both Dijkstra's algorithm and the A* algorithm are inflexible and cannot react to a changing environment [19]. Therefore, Stentz [19] developed the D* algorithm (Dynamic A*) as an extension of the A* algorithm. Koenig and Likhachev [20] propose the D* Lite algorithm, which combines the heuristic approach of the A* algorithm with the incremental approach of the D* algorithm. Other extensions of the D* algorithm are the Focussed D* and Field D* algorithms. The Focussed D* algorithm focuses on the repair of the cost map, thus reducing the computational cost [21]. While all of these approaches are limited to a discrete set of possible transitions in the grid, the Field D* algorithm is interpolation-based and calculates smooth paths for the agents [22].
These algorithms were developed for single-agent path planning but can also be used for MAPF problems. The M* algorithm is designed specifically for MAPF problems and minimizes the global cumulative cost function of all agents [23]. An extension of the M* algorithm is the UM* algorithm, which considers uncertainty in the position and pose of the agent [24,25]. Nebel et al. [26] propose an M* algorithm for uncertainty in the destination of the agents' paths, while Ma et al. [27] developed a MAPF algorithm with delay probabilities.
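The single-agent search on a graph given by its adjacency matrix can be illustrated with a short sketch. It is not taken from any of the cited works: the function name `shortest_path`, the convention that a matrix entry of 0 means "no edge", and the example matrix `ADJ` are assumptions made for illustration. Without a heuristic the search behaves like Dijkstra's algorithm; with a consistent heuristic it behaves like A*.

```python
import heapq
import math


def shortest_path(adj, start, target, heuristic=None):
    """Return (cost, path) from start to target on a weighted adjacency matrix.

    adj[u][v] > 0 is the edge weight; 0 means no edge (assumed convention).
    heuristic(v) must be consistent; None gives plain Dijkstra behavior.
    """
    h = heuristic or (lambda v: 0.0)
    dist = {start: 0.0}
    parent = {start: None}
    frontier = [(h(start), start)]  # priority = cost so far + heuristic
    closed = set()
    while frontier:
        _, u = heapq.heappop(frontier)
        if u in closed:
            continue  # stale queue entry
        closed.add(u)
        if u == target:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return dist[target], path[::-1]
        for v, weight in enumerate(adj[u]):
            if weight <= 0 or v in closed:
                continue
            new_dist = dist[u] + weight
            if new_dist < dist.get(v, math.inf):
                dist[v] = new_dist
                parent[v] = u
                heapq.heappush(frontier, (new_dist + h(v), v))
    return math.inf, []  # target not reachable


# Hypothetical 4-vertex graph used only for illustration.
ADJ = [
    [0, 2, 5, 0],
    [2, 0, 1, 4],
    [5, 1, 0, 1],
    [0, 4, 1, 0],
]
cost, path = shortest_path(ADJ, 0, 3)
```

For the example matrix, the cheapest route from vertex 0 to vertex 3 runs through vertices 1 and 2. A replanning method in the spirit of D* would reuse earlier search results after an edge weight changes, which this sketch does not attempt.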

Collision detection and avoidance
One of the main topics within the MAPF problem is the detection and avoidance of collisions between the agents. There are two main approaches to collision detection: a centralized and a decentralized one. For the centralized approach, the information about the current position, path and speed of each agent must be collected in a central location [28]. Felner et al. [29] add a heuristic to the Conflict-Based Search (CBS) by aggregating collisions among the agents. While this approach uses discretized time steps and assumes that all actions have the same duration, Andreychuk et al. [30] propose a Continuous Time Conflict-Based Search (CCBS). The decentralized approach enables the agents to detect collisions by themselves and react accordingly. Wei et al. [31] present an approach where only the affected agents need to communicate with each other in case of a collision. Trodden and Richards [32] use Model Predictive Control (MPC) where only one agent replans, while all others continue with their paths. The agent that replans its path is identified via a bidding system, where the highest bidder receives the replanning token in each iteration. Desaraju and How [33] use a two-step approach where the first step is the individual component and the second step is the interaction component. While in the first step each agent minimizes the cost of its own path, the interaction component minimizes the total cost across all agents.
If a collision is detected, researchers propose a variety of strategies for the behavior of the involved resources. Liu et al. [34] plan paths for a multi-agent pickup and delivery problem where no paths cross each other for the duration of the executed task. A similar approach is used by Kiarostami et al. [35], where a Monte-Carlo Tree Search is used to find non-overlapping paths.
Sun et al. [36] developed a decoupled approach where every agent's path is planned with an A* algorithm. Every agent continuously checks the area in front of it, and if another agent is detected in a critical region, a behavior-based collision avoidance strategy is executed. The behaviors are specialized for an intralogistics problem description and selected by specified traffic rules. Another decentralized approach where possible collisions are detected by the agent itself is proposed by Gochev et al. [37]. Instead of applying rule-based collision avoidance, the agents switch into a collision avoidance mode. In this mode the agents create a roundabout along the original path, which is faster than stepping back and changing directions. Chang et al. [38] use a detection shell around each agent; braking and gyroscopic forces are calculated according to the nearest object inside this detection shell.
Chen et al. [39] present a five-layer neural network with three hidden layers that generates crossing paths and the associated velocity vectors of the agents so that no collision occurs. This approach is based on optimal reciprocal collision avoidance (ORCA), which is often used for collision avoidance between freely navigating agents [40].
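For the setting with discretized time steps, the conflict check between two planned paths can be sketched as follows. This is only an illustration and not the procedure of any cited work: the function name `find_first_conflict`, the representation of a path as a list with one vertex per time step, and the convention that an agent stays at its target after arriving are assumptions.

```python
def find_first_conflict(path_a, path_b):
    """Return the first conflict between two timed paths, or None.

    path[t] is the vertex occupied at time step t. An agent that has
    reached the end of its path is assumed to wait there.
    """
    def position(path, t):
        return path[min(t, len(path) - 1)]

    horizon = max(len(path_a), len(path_b))
    for t in range(horizon):
        a_now, b_now = position(path_a, t), position(path_b, t)
        if a_now == b_now:
            # Both agents occupy the same vertex at the same time step.
            return ("vertex", t, a_now)
        if t + 1 < horizon:
            a_next, b_next = position(path_a, t + 1), position(path_b, t + 1)
            if a_now == b_next and b_now == a_next and a_now != a_next:
                # The agents swap places along one edge between t and t + 1.
                return ("edge", t, (a_now, a_next))
    return None
```

A centralized planner could run such a check for every pair of agents, while in a decentralized setup only agents with nearby paths would need to compare them.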

Concept
Based on the approaches presented in chapter 2, criteria to be considered in manufacturing systems are identified before giving a comparison of the concepts for collision avoidance. For the use of autonomous transport resources in real manufacturing environments, this is a necessary step in order to choose appropriate strategies depending on individual production constraints.

Criteria to compare collision avoidance strategies
To evaluate the different collision avoidance strategies, it is necessary to define the environmental conditions. The criteria used to compare the collision avoidance strategies are not to be confused with the environmental settings. Since no previous work ranks or compares these influences for the different algorithms, we have identified the following environmental influences, which were the most common research topics in chapter 2.2:
• Order Duration: One of the most researched and discussed problems in the MAPF sector is the uncertainty in the order duration. It is not feasible to assume that the planned order duration equals the actual order duration, because delays can occur. Obstructions can occur because of obstacles in the path or delays in the charging and discharging processes, which lead to different execution times. The uncertainty concerns not only the duration of a delay but also whether a delay occurs during the order at all.
• Time Constraints: After the planning of each order, every task has a starting time and a deadline by which it should be finished. For an AGV, this means that it must reach its target vertex before its planned deadline. Since the delay of orders is uncertain, the planned time for the task of the AGV must include a buffer.
• Prioritization: Most companies prioritize their orders because of customer relationships, demands or order volume. For this, they use different principles like First in First out, Least Production or scalar weights to classify each order.
All of these aspects have to be considered not in an isolated test area but under real manufacturing conditions, i.e. the environmental settings mentioned above. These comprise the following parameters based on Adam [41]:
• Product information including volume and lot sizes
• Production organization including layout, workflow and in-house production depth
• Organization of planning and control processes
• Organization of logistic processes
• Presence of workers including their skills and schedules

Comparison of different collision avoidance strategies
There exist many different ways for AGVs to behave when a collision is detected. We have identified five basic strategies from the research presented in chapter 2.3 and the collision avoidance behavior presented by Sun et al. [36]. Although Sun et al. describe eight different behaviors, all of them can be reduced to two basic strategies. That is why we adopted the avoiding and waiting strategies of Sun et al. and added the variation of the start time, no crossing paths and a change of velocities for our evaluation. All strategies are described in the following. To avoid repetition, it is assumed that the paths of the agents cross at vertex n, the velocities of the agents are v and the distance from the start vertex to vertex n is d.
1. Path Crossing: Probably the oldest strategy when a collision is detected consists in finding another path that traverses no previously planned path. This approach seems suitable only if the grid is big enough to reach the target through different sets of vertices and edges.
2. Priority: The agents take over the priorities of their orders. If they collide at vertex n, the agent with the higher priority passes vertex n first while the other agent waits at the previous vertex n-1. In this scenario, it is possible that the waiting agent produces a deadlock or has to wait for a long time because of agents with higher priority. The approach is shown in figure 2.

Fig. 2. Two crossing paths where the agents have different priorities. Agent 1 passes vertex n before agent 2 because of a higher priority.
3. Start time: Another possible strategy is to vary the start time of the task to avoid a detected collision. For each agent's path the speed v and the distance d are known, so it is possible to calculate the time at which the agent reaches vertex n. With this information, the starting time of the specific order can be varied so that the agent passes vertex n before or after the other agent reaches vertex n.
4. Velocities: Based on the computed time to reach vertex n, the velocity v can be varied so that the AGV passes vertex n earlier or later to avoid a collision. This approach can be applied in two versions. The velocity of the AGV can be set to a fixed value at the start of each order, or the velocity can be changed temporarily while travelling along the path.
5. Collision avoidance mode: If one agent detects another agent or obstacle on its path, one or both agents can switch into the collision avoidance mode. In this mode, the agents pass the obstacle on their own without the need to replan. For this, all agents must be equipped with the necessary sensors to perform such a maneuver.
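The start time and velocity strategies both rest on the arrival time at vertex n, which follows from the start time, the distance d and the velocity v. The sketch below works this out under stated assumptions: the function names, the safety gap `gap` (a minimum time separation at vertex n) and the choice to always let the adjusted agent pass after the other one are hypothetical and not part of the strategies as described.

```python
def arrival_time(start_time, distance, velocity):
    """Time at which an agent reaches vertex n: start time plus d / v."""
    return start_time + distance / velocity


def delayed_start(start_time, distance, velocity, other_arrival, gap):
    """Start-time strategy: shift the start so the agent reaches vertex n
    at least `gap` after the other agent. Unchanged if already separated."""
    own_arrival = arrival_time(start_time, distance, velocity)
    if abs(own_arrival - other_arrival) >= gap:
        return start_time
    return other_arrival + gap - distance / velocity


def reduced_velocity(start_time, distance, velocity, other_arrival, gap):
    """Velocity strategy (fixed velocity for the whole order): slow down so
    the agent reaches vertex n exactly `gap` after the other agent."""
    own_arrival = arrival_time(start_time, distance, velocity)
    if abs(own_arrival - other_arrival) >= gap:
        return velocity
    return distance / (other_arrival + gap - start_time)


# Illustrative numbers only: agent 1 reaches vertex n at t = 10 s,
# agent 2 would reach it at t = 7 s, and a gap of 4 s is required.
t1 = arrival_time(0.0, 10.0, 1.0)
t2 = arrival_time(0.0, 14.0, 2.0)
new_start = delayed_start(0.0, 14.0, 2.0, t1, 4.0)
new_velocity = reduced_velocity(0.0, 14.0, 2.0, t1, 4.0)
```

With these numbers agent 2 either starts 7 s later at its original velocity of 2 m/s or starts on time at 1 m/s; in both cases it reaches vertex n at t = 14 s.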

Application and evaluation settings
The environment for the test cases is a ROS-based path planner and its simulation environment. A test scenario is set up to evaluate different criteria for the collision avoidance strategies presented in chapter 3.2. The parameters mentioned in chapter 3.1 are to be analyzed and structured regarding their influence on the choice of suitable collision avoidance strategies. They make the simulation more comparable to real-life applications but also computationally expensive. Workers are treated as obstacles that cross and obstruct the planned paths in a certain percentage of cases. The duration of the obstruction is drawn from a Gaussian (normal) distribution. In addition, we assume that the workstations have different states, so the vertices of the grid change. Vertices have a capacity of one, so only one agent can occupy each vertex at a specific time step. The edges also have a capacity of one. Despite this, multiple agents may occupy the same crossing or pass each other in a lane, because the ROS path planner builds its own grid on the prescribed map.
To evaluate and compare the different strategies, we analyze a set of key performance indicators in the simulation. Since most researchers try to minimize the sum of costs of all agents, this is one of the identified parameters. By analyzing additional parameters, the influences of the different collision avoidance strategies can be shown. Furthermore, we analyze the traffic generated by the different strategies as well as the grid capacity utilization. With these parameters, we can make a first assessment of how the applied strategies influence workers or how easily additional orders can be integrated into the plan:
• Sum of all costs: $\sum_{i=1}^{I} c_i$, where $c_i$ is the path cost of agent i and I is the number of planned agents. The cost of each path is equal to its length.
• Grid capacity utilization: Ratio of all occupied vertices and edges to all available vertices and edges in the grid. This value shows how much of the grid is occupied by agents or the agents' paths under the different strategies.
• Traffic: Ratio of the actual duration of an order to its planned duration. A higher ratio means that more interruptions occur, which equals more traffic.
The first tests in a simulation environment compare the average throughput time of four AGVs, each executing orders of two or three jobs, for the strategies priority and path crossing (cf. chapter 3.2). The results for the mean order duration show that avoiding collisions by circumnavigating already planned paths leads, as expected, to 10 to 30 % longer throughput times. These strategies evidently depend highly on the considered manufacturing layout and the existing traffic, which needs to be examined in more detail.
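The worker obstructions described above can be modeled with a few lines of code. The sketch is independent of ROS and only illustrates the sampling idea; the function name `sample_obstructions`, its parameters and the truncation of negative durations to zero are assumptions, since the text does not state how the distribution is parameterized.

```python
import random


def sample_obstructions(path, obstruction_probability, mean_duration,
                        std_duration, rng):
    """Return {vertex: duration} for the vertices of `path` blocked by a worker.

    Each vertex is obstructed independently with the given probability.
    Durations follow a normal distribution, cut off at zero (assumption).
    """
    obstructions = {}
    for vertex in path:
        if rng.random() < obstruction_probability:
            duration = max(0.0, rng.gauss(mean_duration, std_duration))
            obstructions[vertex] = duration
    return obstructions


# Hypothetical values: 20 % obstruction chance, 15 s mean, 5 s deviation.
rng = random.Random(42)  # fixed seed keeps simulation runs reproducible
blocked = sample_obstructions([0, 1, 2, 3, 4], 0.2, 15.0, 5.0, rng)
```

Using a seeded generator makes it possible to expose every collision avoidance strategy to the same sequence of obstructions, so that differences in the results stem from the strategy and not from the random draws.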
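The three indicators can be computed directly from the planned paths and the measured durations. The following sketch assumes unit edge weights, so that the cost of a path equals its number of moves, and represents each path as a list of vertices; the function names and the example values are hypothetical.

```python
def sum_of_costs(paths):
    """Sum of path costs over all agents; cost = number of moves (unit edges)."""
    return sum(len(path) - 1 for path in paths)


def grid_capacity_utilization(paths, total_vertices, total_edges):
    """Share of vertices and edges of the grid used by at least one path."""
    used_vertices = {v for path in paths for v in path}
    used_edges = {
        frozenset((u, v))
        for path in paths
        for u, v in zip(path, path[1:])
        if u != v  # waiting at a vertex does not occupy an edge
    }
    return (len(used_vertices) + len(used_edges)) / (total_vertices + total_edges)


def traffic_ratio(actual_duration, planned_duration):
    """Actual order duration divided by planned duration; above 1 means delay."""
    return actual_duration / planned_duration


# Two illustrative paths crossing at vertex 1 on a grid with
# 10 vertices and 8 edges (made-up numbers).
paths = [[0, 1, 2], [3, 1, 4]]
total_cost = sum_of_costs(paths)
utilization = grid_capacity_utilization(paths, 10, 8)
traffic = traffic_ratio(36.0, 30.0)
```

In this example the two paths use five vertices and four edges, i.e. half of the grid, and an order that takes 36 s instead of the planned 30 s yields a traffic ratio of 1.2.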

Conclusion
Current challenges linked to the concepts of Industry 4.0 particularly affect production logistics. Among other things, this includes the need for more flexible solutions for material transport in manufacturing environments. Thus, the use of autonomous resources increases, as these means of transport present advantages compared to standard logistic solutions and allow decisions to be relocated to the resource level.
This contribution presents an approach to compare different ways of dealing with collisions between AGVs based on relevant criteria. The types of collision avoidance treated are path crossing, priorities, start time, velocities and internal collision avoidance mode. Relevant criteria to evaluate these strategies are order duration, time constraints and prioritization.
The next step will be a more extensive evaluation, validation and verification of the concept presented in chapter 3 within a simulation scenario. These analyses will aim at verifying the applicability of the stated criteria and at completing them if necessary. Additionally, varying speeds of the vehicles on their way through the production environment have to be implemented to make the application more realistic.