A Strategy-Based Algorithm for Moving Targets in an Environment with Multiple Agents

Most studies in the field of search algorithms have only focused on pursuing agents, while comparatively less attention has been paid to target algorithms that employ strategies to evade multiple pursuing agents. In this study, a state-of-the-art target algorithm, TrailMax, has been enhanced and implemented for multiple agent pathfinding problems. The presented algorithm aims to maximise the capture time if possible until timeout. Empirical analysis is performed on grid-based gaming benchmarks, measuring the capture cost, the success of escape and statistically analysing the results. The new algorithm, Multiple Pursuers TrailMax, doubles the escaping time steps until capture when compared with existing target algorithms and increases the target’s escaping success by 13% and in some individual cases by 37%.


Introduction
Search algorithms have been researched extensively for many years. The study and development of such algorithms were based on the basic scenario of a single agent tasked with finding a target or goal state on a graph in minimal time. Each search algorithm has its own purpose and need. Even in a simple, static environment, a pathfinding search algorithm faces several challenges, and more arise in complex environments. Various assumptions of this single-agent, single-target scenario can be relaxed, leading to more difficult problems: there can be several pursuing agents that need to coordinate their search and be assigned a strategy before following targets; there can be multiple targets, all of which need to be caught; and targets can move on the graph over time rather than stay in a fixed position.
Many suitable algorithms have been proposed for pursuing agents in the domains of video and computer games, robotics, warehouses [1], and military and surveillance applications [2]. Some of these algorithms are for a single agent, such as MTS [3], D* Lite [4] or RTTES [5] and some are multi-agent, for example, FAR [6], WHCA* [7], CBS [8] and MAMT [9]. These algorithms aim to find the shortest path to the target location(s). While the shortest path is important, the run time is essential, too, as considered by real-time heuristic algorithms [10].
Besides a more standard pathfinding search for a single agent pursuing a single target on a static map, the case could be complicated with an increase in the number of agents or dynamic changes in the environment. For example, in the scenarios with moving targets, the target algorithms also play an essential role in developing multi-agent scenarios, but they are less studied. The goal of such algorithms is to evade capture as long as possible.
Consider a pursuit and evasion game, where players could be human or computer-controlled. Other examples are video games such as Grand Theft Auto and Need For Speed, where both sides can be controlled by algorithms, or a flight simulation application where computer-controlled targets need to be caught or shot [11]. To make the game more interesting, intriguing, and challenging, the targets need to behave intelligently. Therefore, good target algorithms are an essential factor in improving the gaming experience.
Target algorithms that exist usually have strategies such as maximising the escaping distance [12], random movements to selected, unblocked positions in order to evade from the capturer [13] or, in a state-of-the-art approach called TrailMax, maximising the survival time in the environment by considering the potential moves of pursuing agents on each time step [14].
Multi-agent pathfinding (MAPF) problems have been analysed in detail in the literature [15]. These problems are known to be NP-hard [1]. An example of such a problem in a video game is when all non-player agents need to navigate from a starting location to a goal location on a conflict-free route in a static or dynamic environment [16].
Algorithms developed for moving, in other words escaping targets, can make the empirical study of MAPF problems more meaningful, useful, and challenging. Thus, how can we improve on existing ones? We previously introduced an algorithm [17] based on TrailMax that can be used for multiple moving targets to flee from multiple agents in a dynamic environment. A good design of such an algorithm can help targets to escape more intelligently, rationally and in a human-like manner.
This study extends that work with more testing scenarios, covering more pursuer strategies, target algorithms, benchmark maps and player combinations, and improves the cost handling while the target expands pursuers' nodes. Empirical evaluations report different performance metrics, such as capture cost, success rate and computation time, together with a statistical analysis of the significance of the findings.
In the remaining parts of this paper, the following section presents the related work. "Multiple Pursuers TrailMax: Proposed Approach" describes the new approach to the problem. Empirical comparisons are described in the subsequent section, and "Discussion" and "Conclusion" sections follow up.

Related Works
This section introduces several existing target algorithms in the literature. The following is a brief description of each algorithm.

Target Algorithms
Although there is plenty of research in the literature emphasising algorithms for pursuing agents, there are few studies that are conducted on algorithms for mobile targets. The A* algorithm is a classic example that is implemented as an algorithm for many pursuing agents, as well as target algorithms [15].
TrailMax. TrailMax is an intelligent algorithm that is based on a strategy. It generates a path for a target considering the pursuing agent's possible moves, i.e., it efficiently computes possible routes by expanding its current and adjacent neighbouring nodes and agent's nodes simultaneously [14].
The aim of the TrailMax algorithm is to make the targets stay longer by maximising the capture time. The players can move on the map; thus, the target computes an action on every time step with new updated information about the players. It is for one-to-one player scenarios.
The algorithm works as follows. To compute a path, an escape route that maximises its distance away from the agent, it checks the best cost of the neighbouring states against the pursuer's costs and expands nodes accordingly. The algorithm expands nodes that are not yet expanded and not already occupied in the target closed list and not in the pursuer closed list. The node with the best cost is added to the target's closed list, which would generate the path afterwards. The first element in the path is an action for a target to take. This procedure is repeated from scratch every time step.
It is a state-of-the-art target strategy algorithm that performs the best against pursuing agents, aiming to make the targets less catchable or more difficult to be caught [12].
Minimax. When used as the target algorithm, it runs an adversarial search that alternates moves between the pursuers and the target. When the pursuing agent gets closer to the target state, then the target distances itself from the pursuing agent's state. To make the algorithm faster, Minimax is run with alpha-beta pruning search, where alpha (α) and beta (β) are constantly updated to avoid the exploration of suboptimal branches [18]. The used depth is 5, i.e., the outcomes after at most 5 moves of each party are considered.
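The alternating search described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation from [18]: it handles a single pursuer, uses a list-of-strings grid with '#' as an obstacle, scores leaves by Manhattan distance, and applies an arbitrary capture penalty of -1000. The names `moves`, `alphabeta` and `minimax_move` are hypothetical.

```python
from math import inf
from typing import Iterator, List, Tuple

Cell = Tuple[int, int]  # (row, col)


def moves(grid: List[str], cell: Cell) -> Iterator[Cell]:
    """Yield the wait action first, then the free four-connected neighbours."""
    yield cell
    r, c = cell
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] != "#":
            yield (nr, nc)


def alphabeta(grid: List[str], target: Cell, pursuer: Cell, plies: int,
              alpha: float, beta: float, target_turn: bool) -> float:
    """Depth-limited minimax value from the target's point of view."""
    if target == pursuer:
        # Assumed penalty: an earlier capture (more plies left) scores worse.
        return -1000 - plies
    if plies == 0:
        return abs(target[0] - pursuer[0]) + abs(target[1] - pursuer[1])
    if target_turn:
        value = -inf
        for nxt in moves(grid, target):
            value = max(value, alphabeta(grid, nxt, pursuer, plies - 1, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:  # the pursuer would never allow this branch
                break
        return value
    value = inf
    for nxt in moves(grid, pursuer):
        value = min(value, alphabeta(grid, target, nxt, plies - 1, alpha, beta, True))
        beta = min(beta, value)
        if alpha >= beta:  # the target already has a better option elsewhere
            break
    return value


def minimax_move(grid: List[str], target: Cell, pursuer: Cell, depth: int = 5) -> Cell:
    """Pick the target's move; depth counts moves of each party, so 2 * depth plies."""
    plies = 2 * depth
    best, best_value = target, -inf
    for nxt in moves(grid, target):
        value = alphabeta(grid, nxt, pursuer, plies - 1, -inf, inf, False)
        if value > best_value:
            best, best_value = nxt, value
    return best
```

For example, on the one-row grid `["....."]` with the target at `(0, 2)` and the pursuer at `(0, 0)`, `minimax_move(grid, (0, 2), (0, 0), depth=2)` selects `(0, 3)`, the step away from the pursuer.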
Dynamic Abstract Minimax. Dynamic Abstract Minimax (DAM) is a target algorithm that finds a relevant state on the map environment and directs the target using Minimax with alpha-beta pruning in an abstract space. There is a hierarchy of abstractions. Higher levels might not provide enough information about the map and lose important details, such as an agent close by, and fine abstract levels might be very detailed and increase the computation costs.
The search starts on the highest level of abstraction, an abstract space created from the original space. The minimax algorithm runs a search at the highest level of abstract space and continues to the next low level of abstraction. It stops at the level where the target can avoid the capture. Then, on this level of abstraction, if a path exists, an escape route is computed using the PRA* algorithm (described in next section). If the target cannot escape and there is no available move to avoid the capture on the selected abstract space, then the level of abstraction is decreased, and the whole process repeats until the target can successfully run away from being caught [18]. The used depth is 5.
Simple Flee. Another algorithm for targets is Simple Flee (SF), which can be used to escape from the pursuing agents to predefined states on the map [19]. The SF algorithm works as follows. At the beginning of the search, the target identifies some random locations on the map. When the target starts moving, it navigates to the location furthest away from the pursuers. To disorient pursuing agents that can search from the target's state, such as the incremental heuristic algorithms D* Lite [4] and MT-Adaptive A* [20], the selected location is re-evaluated every five steps; if it is still the furthest location, the target keeps moving towards it. The number of locations on the map and the number of steps before the change are the parameters of the algorithm.
Greedy. This is the standard greedy algorithm that repeatedly makes the best local optimal choices that, in hope, would lead to global solutions. This is a simple and fast approach to solving a problem that uses sub-optimal and easily computed heuristics [21].
Greedy maximises the cumulative Manhattan distance to the pursuers. It evaluates its options and moves to the state with the largest gap. Once it is at that point, it stays there until captured if no state with a larger distance is available [19].
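As a minimal sketch of this Greedy rule, the following Python function picks, among waiting and the four adjacent cells, the one with the largest summed Manhattan distance to all pursuers. The grid encoding ('#' for obstacles), the function names and the tie-breaking by iteration order are assumptions made for illustration, not details taken from [19].

```python
from typing import List, Tuple

Cell = Tuple[int, int]  # (row, col)


def manhattan(a: Cell, b: Cell) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def greedy_move(grid: List[str], target: Cell, pursuers: List[Cell]) -> Cell:
    """Return the cell that maximises the cumulative distance to the pursuers."""
    # Start from the wait action, so the target stays put at a local maximum.
    best = target
    best_score = sum(manhattan(target, p) for p in pursuers)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = target[0] + dr, target[1] + dc
        if 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] != "#":
            score = sum(manhattan((r, c), p) for p in pursuers)
            if score > best_score:  # strict: ties keep the earlier option
                best, best_score = (r, c), score
    return best
```

On an empty 3x3 grid with one pursuer at `(0, 0)`, a target at `(1, 1)` moves to `(2, 1)`, while a target already in the far corner `(2, 2)` stays where it is, which reproduces the behaviour of waiting at a local maximum until capture.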
Target algorithms without a strategy consider only a pursuing agent's location and make their way to the furthest state possible. In multi-agent scenarios, a target that escapes from one pursuer might therefore fall into the path of other pursuing agents. This causes an issue in MAPF frameworks. To avoid this limitation, the study in this paper considers all pursuers, and this new approach provides a winning strategy for the target.

Pursuing Algorithms
This study sets out to develop a new multiple target algorithm. Therefore, this part of the section briefly introduces algorithms for pursuing agents, which will be used in the experiments.
PRA*. Partial-Refinement A* (PRA*) is an algorithm that reduces the cost of search by generating a path on an abstract level of the search space. These abstracted spaces (graphs) are built from the grid map. The abstract level is selected dynamically. The A* algorithm is then used to run a search with sub-goals on the abstract graph. The abstract path creates a corridor of states in the actual search space, through which the optimal path is found. This is a widely used approach and its variations have been described with different search techniques [22].
STMTA*. In cases where more than one target exists, an effective strategy for pursuing agents helps to win the game. The Strategy Multiple Target A* (STMTA*) algorithm uses methods to intelligently assign agents to targets to create an opportunity of capturing targets faster [23]. All routes towards the targets are computed and, based on the given strategy, the optimal combination is selected. Once the strategy is assigned, the pursuing agents know the targets they follow, and all agents use the A* algorithm to move towards the targets.
The routes are the distances from the pursuer to the target. Depending on the assignment strategy, the distances between pursuer-target pairs are preferred. For the initial assignment, summation cost or mixed cost criteria are minimised [12]. The summation-cost sums all the distances (n) and mixed-cost takes the longest distance, makespan (m) but in cases of tie break, it uses the sum of distances. The mentioned approach does not focus on re-assigning the agents after their assigned targets have been captured.
Variants of this algorithm using different criteria such as twin-cost, cover-cost, and weighted-cost, were introduced and developed [24]. STMTA* uses these three criteria during the tests because the previous study measured their performance, and overall, they produced better results than the other cost criteria. Throughout the experiments, if any target is caught, the pursuing agent is reassigned to another target depending on the strategy followed.
The twin-cost criterion multiplies the sum of distances n with the makespan m, (n * m). If a tie-breaker is needed, the average of n and m is taken.
The weighted-cost criterion multiplies these values with a given percentage, totalling to 100% and adds them up. During the experiments, the ratio of 50/50 was used for the weighted-cost criterion, (n * 0.5) + (m * 0.5) . The combination with the lowest value is selected for twin-cost and weighted-cost criteria.
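The distance-based criteria can be illustrated with a short Python sketch that scores every pursuer-to-target combination and keeps the lowest. It assumes a precomputed distance matrix `dist[p][t]` and a one-to-one assignment with equally many pursuers and targets; the names `CRITERIA` and `best_assignment` are hypothetical and the sketch is not the STMTA* implementation from [23, 24].

```python
from itertools import permutations
from typing import Callable, Dict, List, Sequence, Tuple

Key = Tuple[float, ...]


def n_and_m(dist: List[List[float]], assignment: Sequence[int]) -> Tuple[float, float]:
    """Sum of distances n and makespan m when pursuer p chases target assignment[p]."""
    d = [dist[p][t] for p, t in enumerate(assignment)]
    return sum(d), max(d)


# Each criterion maps (n, m) to a sort key; later entries act as tie-breakers.
CRITERIA: Dict[str, Callable[[float, float], Key]] = {
    "summation": lambda n, m: (n,),
    "mixed": lambda n, m: (m, n),                # makespan, ties broken by the sum
    "twin": lambda n, m: (n * m, (n + m) / 2),   # product, ties broken by the average
    "weighted": lambda n, m: (0.5 * n + 0.5 * m,),  # the 50/50 ratio from the text
}


def best_assignment(dist: List[List[float]], criterion: str) -> Tuple[int, ...]:
    """Return the combination with the lowest value under the chosen criterion."""
    key = CRITERIA[criterion]
    # Assumption: square matrix, so each pursuer gets exactly one distinct target.
    return min(permutations(range(len(dist))), key=lambda a: key(*n_and_m(dist, a)))
```

With the illustrative matrix `[[1, 4], [4, 6]]`, the straight assignment has n = 7 and m = 6, and the crossed one has n = 8 and m = 4. The summation-cost therefore keeps the straight assignment, whereas mixed-cost, twin-cost (42 versus 32) and weighted-cost (6.5 versus 6) prefer the crossed one.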
The cover-cost criterion uses a different approach. Instead of using the distance cost, it computes the area each pursuer covers. Taking turns, a pursuer and a target mark each available, unoccupied state as covered, with P or T respectively. The pursuer does not need to reach the target; depending on the players' positions on the map, the areas marked by pursuers and targets meet somewhere in between. Each pursuer's cover is measured, and the combination with the most Ps is assigned to the pursuers. When a pursuer computes its P, its cover may overlap with that of other pursuers. For comparison, the summation-cost criterion adds all distances per combination and the lowest value among all combinations is selected, whereas in the cover-cost criterion the P values are summed for each combination and the highest result is preferred.

Multiple Pursuers TrailMax: Proposed Approach
In the following section, a new target algorithm is described. First, the motivation is given for the algorithm, then it follows with pseudo code, see Algorithm 1, and finalises with further improvements.
When the problem was described in the Introduction section, it was stated that a smart target algorithm is very useful to have. In the simple scenarios where a single agent pursues one target, the target would know from which agent it needs to escape, as there is only one. Some of the strategies to run away from the agent have been discussed in the previous sections. But if a situation is considered where multiple targets need to escape from the current state and move to the safest destination in the dynamic environment, how would targets know which pursuing agent they need to avoid for a successful run? For example, SF can flee from the closest pursuer but sometimes could run into other pursuers. What would be a smart move for a target while avoiding capture if there are many pursuers?
Although the TrailMax algorithm, as introduced in the previous section, is a state-of-the-art algorithm, it has been designed to work with only one agent, meaning a target does not have any strategy to escape from one pursuer and avoid another approaching pursuer at the same time.
For this specific reason, a novel target algorithm called Multiple Pursuers TrailMax (MPTM) is developed, which is able to identify multiple approaching agents and escape from all pursuers.
The MPTM algorithm uses a similar methodology to TrailMax but is enhanced for MAPF problems. There are two possible benefits that could come from extending TrailMax to MAPF problems. First, the target can identify the state location of other targets and collaborate with them. Second, it can ensure the escape not only from one pursuing agent but from all approaching pursuing agents. Here the focus is on the second issue. The algorithm is exhaustive, meaning it considers all possible moves of the agents. Therefore, it is relatively computationally intensive and provides a solution if one exists.

The Algorithm
The pseudo-code for the MPTM algorithm is depicted in Algorithm 1. First, the current locations of all players (pursuers and target) need to be initialised in line 2. The next step is to group all players according to their role and append their positions into the relevant queues, all pursuers to the pursuer_node_queue and a target to the target_node_queue. At this point, all players will have a cumulative cost of zero, lines from 3 to 5. To make it easier to follow the code, each movement cost will be equal to one, unless it is a wait action, in which case it is zero. This is with the assumption that there is no octile distance. However, the algorithm works with different speeds and distances.

Algorithm 1
     ...
  8: while target_node_queue not empty do
  9:     cₜ ← get c from target_node_queue
 10:     cₐ ← get c from pursuer_node_queue
 11:     if (cₜ ≤ cₐ) then
 12:         remove target from target_node_queue
 13:         if target not in target_closed and pursuer_closed and parent node not in pursuer_closed then
 14:             insert target into target_closed
 15:             append target neighbours onto target_node_queue
 16:     else
 17:         for each pᵢ of players do
 18:             get state sᵢ for pᵢ
 19:             if sᵢ is pursuer then
 20:                 cₐ ← get c on pursuer_node_queue
 21:                 remove pᵢ from pursuer_node_queue
 22:                 if pᵢ not already in pursuer_closed then
 23:                     insert pᵢ into pursuer_closed
 24:                     if pᵢ in target_closed then
 25:                         ...

The algorithm has four different lists. The target_node_queue and pursuer_node_queue contain expanded, visited nodes, such as the current state or neighbouring states for both target and pursuers. The target_closed and pursuer_closed lists contain states that are already visited and occupied by players.
Since this is the target algorithm, in line 7 it first checks whether the target has already been caught. It then loops while there are target nodes in the target_node_queue. As this is the first step, the queue only contains the target's current position. Then, it computes the cumulative cost c, the highest value, for the target (cₜ) and the pursuers (cₐ) at lines 9 and 10. If cₜ is lower than or equal to cₐ, then the target expands its nodes, line 11.
During the expansion of nodes for targets in lines 12-15, first, the target node is removed from the target_node_queue and placed inside target_closed if it is not already in the list and not in the pursuer_closed list. It also checks if the target's parent node is not in pursuer_closed. The target loops through its available adjacent neighbours and adds them to the target_node_queue. These steps are iterated until no state is left to expand. The nodes are expanded like in breadth-first search, first-in-first-out.
When the target's cₜ is higher than cₐ, so that the condition on line 11 fails, the pursuers take the turn and start to expand their nodes. The main part of this algorithm is the lines between 16 and 28, where each pursuer loops through its state and expands its nodes independently from other pursuers. The target needs to know the position of pursuers' states and loops through each player. If it is a pursuing agent, then this ...
Lines 29-32 generate a path. The last element in target_closed is the furthest state that the target could move to. This list is reversed to identify the route, and the first element in the list is the action that the target takes. The function repeats every time step to find the best action for the target.
This turn-based expansion goes to the point where all states on the map have been occupied either by the target or the pursuers. The target could only win if its state is not taken by any pursuers until the timeout.
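The turn-based expansion can be sketched in Python as below. This is a simplified reading of the description, not the authors' Algorithm 1: the grid encoding ('#' for obstacles), the single FIFO queue shared by all pursuers, the omission of the wait action and of the steps on lines 25-28, and the names `neighbours` and `mptm_step` are all assumptions made for illustration.

```python
from collections import deque
from typing import Deque, Dict, Iterator, List, Optional, Set, Tuple

Cell = Tuple[int, int]  # (row, col)


def neighbours(grid: List[str], cell: Cell) -> Iterator[Cell]:
    """Free four-connected neighbours, each reachable at a cost of one."""
    r, c = cell
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] != "#":
            yield (nr, nc)


def mptm_step(grid: List[str], target: Cell, pursuers: List[Cell]) -> Cell:
    """Return the target's next cell, or its own cell if no escape is found."""
    target_queue: Deque[Tuple[int, Cell, Optional[Cell]]] = deque([(0, target, None)])
    pursuer_queue: Deque[Tuple[int, Cell]] = deque((0, p) for p in pursuers)
    target_closed: Dict[Cell, Optional[Cell]] = {}  # cell -> parent, in insertion order
    pursuer_closed: Set[Cell] = set()

    while target_queue:
        c_t = target_queue[0][0]
        c_a = pursuer_queue[0][0] if pursuer_queue else float("inf")
        if c_t <= c_a:
            # Target's turn: claim the node unless a pursuer got there first.
            _, cell, parent = target_queue.popleft()
            if cell in target_closed or cell in pursuer_closed or parent in pursuer_closed:
                continue
            target_closed[cell] = parent
            for nxt in neighbours(grid, cell):
                target_queue.append((c_t + 1, nxt, cell))
        else:
            # Pursuers' turn: every pursuer's frontier is served in FIFO order.
            _, cell = pursuer_queue.popleft()
            if cell in pursuer_closed:
                continue
            pursuer_closed.add(cell)
            for nxt in neighbours(grid, cell):
                pursuer_queue.append((c_a + 1, nxt))

    # The last node the target claimed is the furthest; walk back to the first move.
    cell = next(reversed(target_closed))
    while target_closed[cell] is not None and target_closed[cell] != target:
        cell = target_closed[cell]
    return cell
```

On the one-row grid `["....."]` with the target at `(0, 2)` and a pursuer at `(0, 0)`, the sketch returns `(0, 3)`, the first step towards the far end of the corridor. A target at `(0, 0)` with a pursuer next to it at `(0, 1)` on `["..."]` has no node it can claim and stays in place. Because the search is rebuilt from scratch on every call, it would be invoked once per time step and per target.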
For multiple targets, the algorithm is run on each target, and normally, each will get a different outcome based on their location. The result will be the same if they are all in the same state. Even if the starting position is different, the targets could join their path if that is the optimal option.

Further Improvements
The strategy of TrailMax works for one-to-one agent scenarios, and to get the best cost from the list for each player is straightforward. But this is not the case for the MPTM algorithm as it considers many pursuing agents in one search. The pursuer_node_queue contains information for all pursuers and their moving directions with costs.
It has already been discussed that the initial cost is zero for all players. When line 11 is called, it will be true, and the target will take turns to expand and increase its cost by one. On the next iteration, this condition will be false, as the cost for the target is 1, and all pursuers' cost is still zero. The expansion takes place for pursuers. As there are many pursuers, line 20 will request for the first pursuer's cost from the pursuer_node_queue. Then this pursuer will expand and increase its cost to 1. There is a problem here because TrailMax requests the best cost on each iteration. It would have been fine if there was only one pursuer, but this is an issue with multiple pursuers. If the best cost was considered for multiple pursuers, then only the first pursuer would be expanded as only its cost would be incremented. This leads to the fact that only the same pursuer is requested with the best cost and all other pursuers are left without expansion with initial cost zero.
To fix the above problem, the cost requested on lines 10 and 20 is not the best cost but the cost of each pursuer, taken in order from the pursuer_node_queue. This gives greater opportunity for a target to evaluate all pursuers' moves and make decisions more accurately.
Another enhancement is that MPTM does not only consider and run away from the closest pursuing agent but takes into consideration all pursuers on the map by checking each pursuer's state on line 18.

Empirical Evaluations
In this section, the empirical results will be presented to demonstrate the efficiency of the proposed algorithm. First, the experimental setup will be described, then, performance results of the MPTM algorithm described in previous section will be reported.

Experimental Setup
For better comparability, standardised grid-based maps from the commercial game industry are used as a benchmark [25]. The environments used are eight maps from Baldur's Gate (Table 1). Within the experiments, these maps are used with a four-connected grid and impassable obstacles. Figure 1 displays sample maps used for the experiments, where black spaces are obstacles and the white space is the traversable area. The maps were chosen based on the presence of obstacles and difficulty of navigation. The movement directions are up, down, left, and right, with a cost of one each. That said, the approach should work with different moving costs as well.
The scenarios were chosen to have multiple targets, and for the experiments, initially, two and later three targets were tested. The combination of pursuers versus targets is displayed in Table 2. These scenarios help to understand the behaviour of the MPTM algorithm when targets are outnumbered.
All players are placed at randomly selected locations on each map. There were two different sets of starting positions. The first set has all pursuers in the same location and all targets in the same location, with targets positioned at the farthest distance from pursuers. The second set has all players dispersed randomly across various areas of the map. This helps to measure and analyse the performance of the algorithms.
Each configuration was run 20 times. The implementation [19], kindly provided by Alejandro Isaza, was used as a basis but extended such that multiple targets and various agent-target assignment strategies could be used. The results were obtained on a Linux machine with an Intel Core i7 2.2 GHz CPU and 16 GB of RAM.

Experimental Results
Performance analysis is conducted with respect to three key indicators: (i) the number of steps taken for each target algorithm before being caught, (ii) its success rate and (iii) computation time. The first two of the measurements are averaged considering all targets, and the time is normalised per step.
During the experiments, each test run finishes when all targets are caught or there is a timeout. If some pursuers have already caught their assigned targets, the chase continues as long as there are still uncaught targets. With PRA*, all pursuers continue with the next closest target, and it is possible that all pursuers will chase only the closest target and leave the others because of their distance. The STMTA* algorithm, in contrast, has an assignment strategy: all targets are chased, and when one target is caught, the pursuer that becomes idle is reassigned to the next uncaught target. Success for pursuers is achieved when all targets are caught, and the number of steps until the targets have been caught is recorded. Success for the targets is to avoid capture or stay on the map as long as possible.
Capture Cost. To evaluate the MPTM algorithm, a comparison with SF, Minimax and Greedy is displayed in Table 2. This measures the performance in terms of the number of steps for all targets; a higher number is better. Each value is the mean number of steps over the eight tested maps. The proposed MPTM target algorithm offers a much longer stay on the maps for all configurations. This indicates that it avoids capture and demonstrates smarter decisions.
Some maps have island-type obstacles that allow the targets to escape from pursuers more easily, see Fig. 1.
Although each map has many states to explore, as seen in Table 1, all algorithms managed to find an escape route. SF and Greedy both display similar capture time and their results are close to each other. Minimax is better than SF and Greedy but still not as good as MPTM.
The results compared in Table 2 show that for all player combinations, the MPTM algorithm managed to escape all pursuing agents two times longer than Minimax (MMX). Compared against SF or Greedy, MPTM manages to run away from the pursuers 2.3 times longer on average. The graph in Fig. 2 provides a visual comparison of the times to capture between MPTM and the other three target algorithms.
Comparing scenarios with a different pursuing agent and target numbers shows that, as expected, when the pursuer to target ratio increases, capture times tend to decrease, while when the pursuer to target ratio decreases, capture times tend to increase.
The evidence shows that the new MPTM algorithm outperforms SF, Minimax and Greedy algorithms in the number of steps in all test configurations.
While the experiments were designed to study target algorithms, it is also interesting to note that the STMTA* algorithm with its assignment strategy variations performs overall better than PRA*.
Statistical tests are also used on the capture costs to find out which of the results are significantly different. The proposed MPTM algorithm is compared against the existing SF, Greedy and MMX algorithms. Only the results of the STMTA* weighted-cost algorithm are used for the comparison, as it has shown the best overall results among the pursuer algorithms in Table 2. The capture costs are not normally distributed; therefore, the statistical results are obtained using Wilcoxon Rank Sum tests. A significance level of 0.05 is used. Table 3 displays the p values separately for all eight maps and the four player configurations used during the experiments. There were two starting positions on each map: in the first, each set of players was aggregated, and in the second, all players were dispersed. The p values are given individually for each starting position.
From this data, it can be seen that the majority of the results display statistically significant differences. In Table 3, p values below 0.05 indicate significant differences, and some results are even below the 0.01 level. Some of the remaining results are close to significance. Most of the aggregated positions show significance, in contrast to the dispersed positions.
It is possible to conclude that the results of the experiments for capture cost are significant at the 0.01 level in most of the tests. The findings should make an important contribution to the field of target search algorithms.
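For readers who want to reproduce this kind of comparison, the following Python sketch computes a two-sided Wilcoxon rank-sum test with the usual normal approximation, using only the standard library. It is a generic illustration rather than the procedure used to produce Table 3: tied values receive average ranks, no tie correction is applied to the variance, and the function name `rank_sum_test` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (z, two-sided p) for the Wilcoxon rank-sum test on samples x and y."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    # Sum the ranks of sample x, giving tied values the mean of their ranks.
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j

    # Normal approximation of the rank-sum distribution under the null hypothesis.
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (rank_sum_x - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p
```

As a usage example with made-up numbers (not experimental data), `rank_sum_test([1, 2, 3], [4, 5, 6])` yields a negative z and a p value just under 0.05, while two identical samples give z = 0 and p = 1. With 20 runs per configuration, the capture costs of two target algorithms would be passed as `x` and `y`.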
Success Rate. Success for the agents is achieved when a pursuing agent gets to the position of the target. In multitarget scenarios, success is achieved when all targets have been captured. For the target(s), success is the absence of agent success. The success rate for algorithms is shown in Table 4. The results presented in the table are for four target algorithms against four pursuing agent algorithms for all sets of configurations.
According to Table 4, the SF and MMX algorithms perform the worst: they always get caught by the pursuing agents in every tested combination. The Greedy algorithm is caught in every test against the STMTA* algorithm and its variations. It also failed against PRA* in all but one instance, where it managed to succeed because a deadlock occurred. This happened in the 5vs3 player configuration. In this particular example, when the pursuers caught one target, instead of approaching and catching the remaining targets, the pursuers kept moving one step back and forward until timeout.
On the other hand, MPTM shows better results in comparison with SF, Greedy and MMX. Although there are cases where it is eventually caught 100% of the time, MPTM manages quite well overall. The graph in Fig. 3 illustrates how MPTM performed for all test configurations on all maps.
As with capture costs, success rates also depend on the pursuer-to-target ratio. The success was proportional to the number of pursuers and targets. More pursuers for the same number of targets increased the capture rate.
The behaviour of the MPTM algorithm is better on the maps that have obstacles that could be navigated around, for example, the maps illustrated in Fig. 1. These types of maps may be suitable for adaptive target algorithms as they offer opportunities for escape but may be difficult for the pursuing agent algorithms if they do not have strategies such as the trap strategy [26]. The maps AR0311SR, AR0527SR and AR0707SR have dead-ends or blind alleys and thus make it more difficult to find an escape route, leading to lower target performance on these maps.
With some algorithms, pursuing agents sometimes fail to catch the targets, although these are outnumbered. They might catch one target but fail to catch the other, or keep following the target, or end in a deadlock until timeout. This is commonly seen in PRA* as there is no assignment strategy before starting the move, unlike STMTA*.
On average, over all maps per player configuration, the success rate of MPTM can be 13% better than that of Minimax, Greedy and SF.
Timing. This section measures the time taken for each algorithm during the same tests that measured the capture cost and the success rate. Each experiment is recorded in seconds and averaged over all tests. Table 5 provides the results for each target algorithm. SF, Greedy and MMX do not do as much computation as MPTM prior to moving, therefore their results are smaller and closer to each other in comparison to MPTM, which has greater differences.
To find the best possible action, the MPTM algorithm computes all possible moves for the target and all pursuers on the map, therefore the computation time is much higher.

Discussion
Results presented in the previous section show that the MPTM algorithm has a greater chance of escaping from multiple pursuing agents, which has been the main focus of this study. The MPTM algorithm can predict the possible future movements of pursuers and therefore MPTM can function smartly by avoiding capture and fleeing as far as possible until it runs out of all options. This could be similar to cop and robber situations, where the robber is a villain and escapes from the cops as illustrated in the simulation gaming map from Baldur's Gate in Fig. 4. The simulation displays the initial position of four cops (pursuers) and three robbers (targets) on the map.
The proposed MPTM algorithm is measured and compared against the SF, Greedy and MMX algorithms. MPTM offers better results: its targets stay on the maps much longer and can escape the pursuing agents. The capture cost is the number of steps until capture, and in some cases MPTM avoids capture 2.6, 2.9 and 2.4 times longer than SF, Greedy and MMX, respectively. Moreover, these results were statistically tested using the Wilcoxon Rank Sum test to establish the significance of the findings. Table 3 displays the p-values; at a 95% level of confidence, most of the results indicate significant differences. Another key measurement is the success rate, where MPTM exceeds expectations: its targets are caught in 91.08% of cases (lower is better), whereas SF and MMX are caught in 100% of cases and Greedy in 99.98%.
Fig. 4 The Baldur's Gate benchmark gaming map AR0311SR (Fig. 3a) with the pursuers and the targets at their initial positions
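For readers unfamiliar with the Wilcoxon Rank Sum test, the following sketch shows how such a comparison of capture costs could be computed. It is an illustration only, not the analysis pipeline used in this study: it uses the large-sample normal approximation without tie or continuity correction, the function name `rank_sum_test` is hypothetical, and the sample values are invented placeholders rather than measured results.

```python
import math

def rank_sum_test(xs, ys):
    """Two-sided Wilcoxon rank-sum test via the normal approximation
    (no tie or continuity correction). Returns (z, p_value)."""
    pooled = sorted([(v, 0) for v in xs] + [(v, 1) for v in ys])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at i and give it the average rank.
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n1, n2 = len(xs), len(ys)
    w = sum(rank for rank, (_, group) in zip(ranks, pooled) if group == 0)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return z, p_value

# Placeholder capture costs (steps until capture); NOT data from this study.
costs_a = [120, 135, 150, 160, 170, 180, 190, 200]
costs_b = [40, 55, 60, 70, 75, 80, 90, 100]
z, p_value = rank_sum_test(costs_a, costs_b)
significant = p_value < 0.05  # 95% level of confidence
```

In practice a statistics library routine, such as `scipy.stats.ranksums`, would normally be preferred over a hand-written version; the sketch only makes the ranking step explicit.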
Across different maps and player configuration settings, the proposed algorithm performs effectively. Despite MPTM's success rate and its ability to outsmart pursuers, further research is needed to improve the computation process. To avoid exhaustive and intensive computation with larger player configurations and to speed up the search, it might be beneficial to limit the branching factor or to use a window-based search.
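One possible reading of a window-based search is sketched below. The study does not define the window, so the sketch assumes it is a radius, in steps, around the target beyond which cells are not expanded; the function name `windowed_bfs` and the `window` parameter are hypothetical.

```python
from collections import deque

def windowed_bfs(grid, start, window):
    """BFS from `start` that never expands cells more than `window` steps away.
    Returns a dict mapping each visited cell to its step distance."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if dist[(r, c)] == window:
            continue  # boundary of the window: keep the cell, do not expand it
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != "#" and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist

# On an open 5x5 map, a window of 2 visits far fewer cells than the full map.
open_map = ["....."] * 5
visited_small = windowed_bfs(open_map, (2, 2), 2)
visited_full = windowed_bfs(open_map, (2, 2), 4)
```

The trade-off is that escape routes outside the window are invisible to the target, so a smaller window reduces computation at the risk of weaker escape decisions.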

Conclusion
The aim of this paper was to provide a solution for MAPF problems by developing a target algorithm that considers multiple pursuers and makes a smart escape. Numerous studies have been conducted on search algorithms, among them solutions for MAPF frameworks. Only a few studies have been carried out on target algorithms, especially in multi-target environments.
This research shows that TrailMax is a successful algorithm for controlling targets when it is developed further to deal with multiple pursuers. We have proposed amendments to the TrailMax algorithm that make it work as a strategy for multi-agent multi-target search problems in dynamic environments.
The resulting MPTM algorithm has been shown to outperform other target algorithms in the same scenario, which can make pursuit and evasion scenarios in computer games more challenging, meaningful and interesting. The results show that MPTM performs far better, at least doubling the capture cost and increasing escape success by 13% on the gaming maps used for benchmarking.
The comparatively high computational cost could be addressed in further research, for example, by using heuristics that cut off parts of the search space. Although this study focused on evasion from multiple pursuers, extending MPTM so that a target collaborates with other targets would be an interesting direction for further investigation.

Conflict of interest
The authors declare that they have no conflict of interest.

Ethical approval Not applicable.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.