Anomalously acting agents: the deployment problem

The detection of intentionally antagonistic behavior in robot swarms poses challenges beyond the identification of merely erroneous behavior. We investigate a data-driven approach to recognizing antagonistic behavior in robots executing a deployment task. The task requires a robot swarm of variable size and varying start positions to distribute optimally within an arbitrary convex surveillance area. Combining a long short-term memory neural network and a normalizing flow, our approach learns to approximate the probability of a robot action; actions with low probability density values can thus be categorized as anomalous. The applicability of the proposed approach is validated on simulated runs containing benevolent, antagonistic, and erroneous robots. Both antagonistic and erroneous robots are detected with an accuracy of more than 90 percent.


Introduction
Robots have brought sustained progress to industry, transportation, logistics, and to the service and security sectors. They have become more accurate, more versatile, and more cost-effective over the years, with much of the progress coming from improvements in the designs and algorithms of individual robots. Future progress may be achieved by letting a network of communicating robots cooperate. On the one hand, employing a network of robots may be more flexible and robust, since the network can dynamically adapt to the amount of work to be done and may be able to reorganize if individual robots break down [1]. In addition, the modular combination of robots allows for the parallel execution of subtasks by simpler robots whose individual capabilities may be limited to certain tasks [2]. On the other hand, some tasks may only be achievable when multiple robots cooperate. Examples of the latter include coverage tasks where a group of robots shall cover a large environment that cannot be covered by a single robot alone, e.g., to surveil an area with sensors to direct relief efforts in a disaster scenario, to track the shape of contaminant clouds, or to provide signal coverage for radio communication [3,4]. Sometimes, denser coverage is desirable in certain areas. Mathematically, all of these practical problems can be subsumed in the same abstract framework of a general deployment or coverage problem, which is the setting of this paper. Coverage tasks have been studied for a variety of circumstances, see, e.g., [5][6][7].
As is often the case with networked robotics, the proposed approaches for coverage tasks generally assume that all involved robots are cooperative, whereas uncooperative behavior is studied far less often. In general, potential attacks on robot swarms have been identified in [3,8], with [8] defining an attack on a robot swarm as altering the swarm's behavior with the goal of reducing the efficiency of the swarm or preventing it from accomplishing its task. Applying this definition to the coverage problem, antagonistic behavior may, for instance, consist of a robot physically moving in an altered manner to gain coverage of a particularly sensitive area, thereby preventing that area from being covered by one of the benevolent robots. This goes beyond purely erroneous behavior that is not directed at a specific antagonistic goal. Whereas methodological inspiration can be drawn from approaches dealing with faulty, or generally anomalous, behavior [9], antagonistic behavior raises additional research questions. For instance, overt antagonistic behavior may be more obstructive if left unaddressed, whereas stealthier antagonistic behavior may be harder to detect but less effective at disrupting the swarm's task. In this context, the present work aims to develop and analyze an approach that is able to identify antagonistic behavior in deployment tasks. In particular, we focus on physically antagonistic behavior that affects the swarm by influencing mechanical-dynamical quantities such as robot motion. Other antagonistic influences, like the manipulation of communication, are not the subject of this study. Indeed, identifying the presence and identity of deliberately antagonistic agents in a robot swarm has received little attention [10], and the authors are not aware of any work that studies antagonistic behavior in the deployment-problem setting. Several papers investigate a robot swarm's ability to reach consensus in a collective decision problem despite the presence of misbehaving agents [11][12][13][14]. In [15] and [16], the detection of anomalies is investigated in network traffic and in physical properties such as power consumption, speed, or residual energy. The research in [10] assumes prior knowledge of signatures of abnormal swarm behavior, which can then be checked against the observed swarm behavior. Further, by examining the effect of a randomized secret control signal on a robot's behavior, [17] proposes to verify that a robot is not controlled by an intruder mimicking normal swarm behavior. However, this approach presupposes the existence of a central control unit that is usually not available in decentralized robot swarms. Thus, beyond the setting, the novelty of this work lies in proposing a data-driven approach for detecting physically antagonistic behavior, requiring only data on normal, benevolent behavior, no central control unit, and no a priori expert knowledge.
Methodologically, this work combines a neural network architecture called a normalizing flow with a feature embedding based on a recurrent neural network. The proposed scheme is validated on data from simulated robotics experiments.
The paper is organized as follows. Section 2 describes the deployment task and the different types of anomalous robots considered in this work. Section 3 provides information regarding the optimal solution to the deployment task as well as the neural network architecture used in this work. Section 4 describes the detection approach, including additional information regarding its implementation. In Sections 5 and 6, the results are visualized and discussed, and an outlook on future work is provided.

Problem Setting
Surveillance or monitoring tasks require the robot swarm to deploy within a fixed or dynamic area. In this work, the coverage area is represented as a single convex polygon, and it is the task of the robot swarm to distribute uniformly within this area. While it generally cannot be assumed that the optimal actions of the robot agents in a swarm are known, we facilitate the examination of the proposed anomaly detection method by computing the benevolent behavior of the robot swarm deterministically and using the computed actions as a reference. The deterministic behavior of a robot swarm that cooperatively solves the deployment problem is characterized as follows. Each robot is informed about the vertices of the area's boundary and knows its current position relative to the vertices. Additionally, in order to avoid collisions and ensure proper coverage, each robot communicates its position at regular time steps t_j to the other agents in the robot swarm. The coverage is optimized iteratively using Lloyd's Algorithm [18], see Algorithm 1. At each time step t_j with j ∈ N_0, a robot leverages the position information to compute its Voronoi cell, where the Voronoi cell of the ith robot represents all points within the coverage area that are closer to this robot than to any other robot and are therefore under the surveillance of robot i.
Figure 1 visualizes the Voronoi decomposition for the respective time step as black lines. In order to improve the coverage of its cell area, the robot moves towards the area center of its Voronoi cell until the next time step. It then reports its new position. If the vertices of the coverage area are fixed, the robots will converge to their optimal positions after a finite number of time steps. In this work, we employ a swarm of omnidirectional mobile robots, as depicted in Figure 2. Each robot has four Mecanum wheels [19] arranged in a rotation-symmetric fashion. This setup allows the robots to accelerate in any direction independent of their current configuration and, neglecting inertia, to always move in straight lines towards the cell center. Since Voronoi cells are convex, the robots do not leave their cell while moving straight towards the center of their Voronoi cell.

Algorithm 1 Application of Lloyd's Algorithm [18] to the Deployment Problem
  j ← 0
  while t_j < t_end do
    V(t_j) ∈ R^{n_v×2} ← vertices of coverage area at time step t_j
    X(t_j) ∈ R^{n_r×2} ← current robot positions at time step t_j
    for i in {1, ..., n_r} do
      compute the Voronoi cell W_i(V, X) of robot i
      compute the (weighted) area center c_i of Voronoi cell W_i
      move robot i towards c_i
    end for
    t_{j+1} ← t_j + ∆t
  end while
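The iteration of Algorithm 1 can be sketched compactly in code. The following is a minimal sketch, not the paper's implementation: it approximates each Voronoi cell and its centroid by Monte Carlo samples of the coverage area instead of computing the exact decomposition, and all function names are illustrative.

```python
import numpy as np

def in_polygon(pts, verts):
    """Vectorized point-in-convex-polygon test; vertices in CCW order."""
    inside = np.ones(len(pts), dtype=bool)
    n = len(verts)
    for k in range(n):
        a, b = verts[k], verts[(k + 1) % n]
        cross = (b[0] - a[0]) * (pts[:, 1] - a[1]) \
              - (b[1] - a[1]) * (pts[:, 0] - a[0])
        inside &= cross >= 0.0
    return inside

def lloyd_step(verts, positions, rng, n_samples=20000):
    """One iteration of Lloyd's algorithm, approximating each Voronoi cell
    and its area center by Monte Carlo samples of the coverage area."""
    lo, hi = verts.min(axis=0), verts.max(axis=0)
    pts = rng.uniform(lo, hi, size=(n_samples, 2))
    pts = pts[in_polygon(pts, verts)]               # samples of the area
    d = np.linalg.norm(pts[:, None, :] - positions[None, :, :], axis=2)
    owner = d.argmin(axis=1)                        # nearest robot = its cell
    new_positions = positions.copy()
    for i in range(len(positions)):
        cell = pts[owner == i]
        if len(cell):                               # move robot to centroid
            new_positions[i] = cell.mean(axis=0)
    return new_positions
```

Since the samples assigned to robot i are exactly those closer to it than to any other robot, their mean approximates the centroid of its Voronoi cell, and iterating the step reproduces the convergence behavior described above.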
Deviating from the normal behavior, where all robots cooperate according to Lloyd's algorithm, three types of anomalous behavior are investigated in this work. Firstly, in the event of a hardware failure, a robot may become immobile but still communicate its current position. The idle robot will thus induce a sub-optimal coverage and monitoring of the area. Secondly, software or hardware errors resulting in the improper propulsion of the robot wheels or inaccurate tracking of the robot position can cause a robot to move erroneously. The effect of such actuator or sensor problems is exemplified in the second scenario, with a robot performing random actions. Such random movement will prevent the swarm from converging to its optimal positions and increase energy consumption. The effect of erroneous robots on task completion is also visualized in the second row of Figure 1. Thirdly, with the goal of keeping the robot swarm from monitoring a certain area of interest within the coverage area, a robot may actively behave antagonistically. The behavior of such an antagonistic robot is visualized in the last row of Figure 1, with the pink density representing the antagonistic robot's area of interest. By intentionally moving to its area of interest, the antagonistic robot will establish control over that area, since the robot swarm's ambition to optimize the coverage of the entire area will cause the benevolent robots to distribute within the remaining area. Whereas straight movement toward the area of interest is likely to be easier to detect, the antagonistic robot can, e.g., hide its target position by only slightly shifting its trajectory, as later described in Section 3.1. Subsequently, the theoretical foundations necessary to understand the proposed approach are presented.

Voronoi cells coverage area
Fig. 1 Development of the robot deployment for time steps t_0, t_2, and t_8, using a swarm that includes only normal robots (top), one random (yellow) and one idle (white) robot (middle), and an antagonistic (pink) robot that moves towards the pink target area (bottom).

Theoretical Foundations
The optimal solution to the deployment problem, the computation of antagonistic movements as well as the concept and architecture of a normalizing flow are elaborated in the following subsections.

Deployment Problem
The optimal coverage of a polygonal area W is found by minimizing the cost function

H(x_1, ..., x_n) = \sum_{i=1}^{n} \int_{W_i} \| q - x_i \|^2 \, \varphi(q) \, dq,    (1)

with x_i ∈ R² being the position of robot i and W_i representing the Voronoi cell that encloses x_i in the Voronoi decomposition of W generated by {x_1, ..., x_n} [20].
The weighting function φ: R² → R determines the importance of different locations within the coverage area and is set to a uniform distribution by default. This leads to the benevolent robots assigning equal importance to every point in W. In contrast, the density function φ_ant of an antagonistic robot is set to a normal distribution φ_ant = N(µ_ant, σ_ant), as depicted in the bottom row of Figure 1. Using the Delaunay triangulation [21,22] of its Voronoi cell W_ant, the antagonistic robot weights the points in W_ant according to their density value under φ_ant. The weighted center of W_ant is then selected as the target position for the current time step. The antagonistic robot's behavior can be made more aggressive or more stealthy by decreasing or increasing the covariance σ_ant, respectively.
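The effect of the covariance on the weighted target can be illustrated with a short sketch. This is not the paper's implementation: instead of integrating φ_ant over a Delaunay triangulation of the Voronoi cell, it approximates the weighted center from sample points of the cell, and the function names are illustrative.

```python
import numpy as np

def gaussian_weight(pts, mu, var):
    """Unnormalized isotropic Gaussian density N(mu, var * I) at points pts."""
    d2 = ((pts - mu) ** 2).sum(axis=1)
    return np.exp(-0.5 * d2 / var)

def weighted_center(cell_points, mu_ant, var_ant):
    """phi_ant-weighted centroid of a Voronoi cell, approximated from
    sample points of the cell."""
    w = gaussian_weight(cell_points, mu_ant, var_ant)
    if w.sum() < 1e-12:          # target negligible here: plain centroid
        return cell_points.mean(axis=0)
    return (w[:, None] * cell_points).sum(axis=0) / w.sum()
```

With a small covariance the weighted center collapses onto µ_ant (aggressive movement), while a large covariance flattens the weights and leaves the center near the plain centroid (stealthy movement), matching the aggressive-versus-stealthy dial described above.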

Normalizing Flows
Normalizing flows are a type of neural network that learns a model of an unknown, complex target probability distribution p*_x over continuous random variables [23,24]. In order to evaluate p*_x and draw samples from it, the normalizing flow learns an invertible and differentiable transformation function T = T_K ∘ ... ∘ T_1. Each of the stacked transformations T_k is computed by one layer of the neural network. The transformation function maps from a base distribution p_u(u) to p_x(x; θ), with p_x(x; θ) being the model of the target distribution approximated by a normalizing flow with parameters θ, which comprise the trainable parameters as well as the hyperparameters of the neural network. The transformation between both distributions is visualized in Figure 3. The base distribution is usually chosen as a multivariate normal or uniform distribution or any other probability density that is easy to evaluate. Using the change of variables formula, see [25], the probability density value p_x(x) of a data point x can thus be computed as

p_x(x) = p_u(T^{-1}(x)) |det J_{T^{-1}}(x)|.    (2)

Similarly, it is possible to sample from p_x by drawing a sample u_s ∼ p_u and then transforming the sample in order to get

x = T(u_s).    (3)

In order to fit the model distribution p_x to the target distribution p*_x during training, the Kullback-Leibler divergence D_KL[p*_x(x) ‖ p_x(x; θ)] [26] between both distributions is minimized. As elaborated in [25], assuming that the training data {x_m}_{m=1}^{M} is sampled from the target distribution p*_x, this is equivalent to maximizing the probability of the training data, i.e.,

θ* = arg max_θ \sum_{m=1}^{M} log p_x(x_m; θ).    (4)

Normalizing flows have been successfully used, e.g., for anomaly detection in industrial time series data [27] and for simulation-based inference [28].

Fig. 3 The normalizing flow learns an invertible and differentiable transformation T = T_K ∘ ... ∘ T_1 between a uniform base distribution p_u and the probability density function p_x. As a result, sampling actions from p_x as well as evaluating the probability density value of a given action is possible [25].
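The change-of-variables mechanics can be illustrated with a single affine layer. This is a deliberately trivial "flow" for illustration only, with a standard-normal base instead of the uniform one used later in the paper; real normalizing flows stack many learned layers.

```python
import numpy as np

# One-layer "flow": affine map T(u) = a * u + b from a standard-normal base.
# The change-of-variables formula gives
#   p_x(x) = p_u(T^{-1}(x)) * |det J_{T^{-1}}(x)| = p_u((x - b) / a) / |a|,
# which for this particular T is exactly the density of N(b, a^2).

def base_logpdf(u):
    """Log density of the standard-normal base distribution p_u."""
    return -0.5 * u ** 2 - 0.5 * np.log(2 * np.pi)

def flow_logpdf(x, a, b):
    """Evaluate log p_x(x) via the inverse transform and its log|det J|."""
    u = (x - b) / a                            # u = T^{-1}(x)
    return base_logpdf(u) - np.log(abs(a))     # + log|det J_{T^{-1}}|

def flow_sample(rng, a, b, n):
    """Sample from p_x by transforming base samples: x = T(u), u ~ p_u."""
    return a * rng.standard_normal(n) + b
```

Evaluating `flow_logpdf` against the analytic density of N(b, a²) and checking the moments of `flow_sample` confirms that density evaluation and sampling are two sides of the same learned transformation.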
This work employs a type of autoregressive flow with a spline-based transformer. Autoregressive flows compute the transformation T_k for each layer k using a strictly monotonic function τ, called the transformer function, and a so-called conditioner function c_i, such that

z'_i = τ(z_i; h_i)  with  h_i = c_i(z_{<i}),

as shown in Figure 3, see [25]. Since each c_i only depends on z_{<i} = [z_0, ..., z_{i−1}], the Jacobian matrix in Equation (2) is triangular and therefore easy to invert. The transformer function is built of quadratic splines [29] with K segments delimited by the endpoints z_{i,0}, ..., z_{i,K}. For each layer, these segments are determined by a conditioner implemented as a masked autoencoder for distribution estimation (MADE) [30].
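Why the autoregressive structure makes the Jacobian cheap can be seen in a two-dimensional toy layer. This sketch substitutes an affine transformer for the paper's quadratic splines and a hand-written conditioner for MADE, purely to expose the triangular structure; all names are illustrative.

```python
import numpy as np

# Minimal 2-D autoregressive layer with an affine transformer:
#   z'_i = tau(z_i; h_i) = exp(s_i) * z_i + t_i,  h_i = (s_i, t_i) = c_i(z_{<i}).
# Because h_i depends only on earlier coordinates, the Jacobian dz'/dz is
# triangular and log|det J| reduces to the sum of diagonal log-derivatives.

def ar_forward(z, params):
    """params = ((s1, t1), (w, b)): fixed affine parameters for z_1 and a
    toy conditioner (standing in for MADE) mapping z_1 to (s_2, t_2)."""
    (s1, t1), (w, b) = params
    z1p = np.exp(s1) * z[0] + t1          # z'_1 ignores z_2 entirely
    s2 = np.tanh(w * z[0] + b)            # conditioner uses only z_{<2} = z_1
    t2 = w * z[0]
    z2p = np.exp(s2) * z[1] + t2
    logdet = s1 + s2                      # sum of log(d tau / d z_i)
    return np.array([z1p, z2p]), logdet
```

The triangular structure can be verified numerically: perturbing z_2 leaves z'_1 unchanged, and a finite-difference Jacobian determinant matches the analytic log-determinant.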

Detection Approach
An important consideration in developing the detection approach is that anomalies, including antagonistic behavior, are characterized by a low probability of occurrence. Consequently, there is rarely enough data available to narrow down and classify different anomalous patterns in order to detect them [9]. Instead, a common approach to anomaly detection is to use data to learn a representation of normal behavior and to identify anomalies as deviations therefrom. Thus, under the assumption of unaltered normal behavior, the approach proposed in this work may in principle detect arbitrary physically anomalous behavior. While different types of neural networks have been considered, see Section 5, this work leverages the ability of a neural spline flow to model a conditional probability distribution p(a|s; θ) of a normal, i.e., benevolent, robot action. Both the action a and the state s are computed using the spatial information that is communicated between the robots, as shown in Figure 4. The robot action a at time t_j is represented as the motion vector between a robot's position at t_j and its position at t_{j+1} = t_j + ∆t. The state is composed of two parts. In order to obtain information about the closeness of other robots, the direction vector from the position of each robot to the position of each of its c nearest neighbours is computed. Since the robots must also take the boundary of the coverage area into account, the direction vector between each robot position and the positions of the coverage area's vertices is computed as well. Both the robot swarm size and the number of polygon vertices are varied between runs, which allows the approach to deal with different polygon shapes and a variable number of robots. Consequently, the number of state features is variable as well and has to be embedded into a vector of fixed size before it is passed to the normalizing flow. With recurrent neural networks being suitable for an arbitrary number of features, we select two Long Short-Term Memory (LSTM) [31] networks to embed the state information. As visualized in Figure 5, the embedding is a concatenation of the last hidden states of both LSTMs. The entire network can then be trained end-to-end.
The capability of normalizing flows to generate samples is leveraged for categorizing an action as normal or anomalous. Since a trained model maximizes the log probability of training data samples under p_x, see Equation (4), it can be assumed that normal actions drawn from p_x lie in regions with a high probability value. According to this observation, the predicted probability density value of an observed action a_obs should be significantly higher than the probability density value of the uniform base distribution,

p_x(a = a_obs | s_obs) ≫ p_u(T^{-1}(a_obs) | s_obs).    (5)

Consequently, an action is classified as anomalous if

p_x(a = a_obs | s_obs) < s · p_u(T^{-1}(a_obs) | s_obs)    (6)

with a scaling factor s.

Fig. 4 Illustration of an emblematic situation of a deployment task. The corresponding data, i.e., a robot's action as well as the direction vectors to its neighbors and the area vertices, is collected at each time step.
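The classification rule of Equation (6) amounts to a one-line check, best done in log space. The function below is a sketch with an illustrative name; `log_px` and `log_pu` stand for the log densities a trained flow would return.

```python
import numpy as np

def is_anomalous(log_px, log_pu, s=2.0):
    """Equation (6): flag an action whose model density p_x(a_obs | s_obs)
    falls below s times the base density at T^{-1}(a_obs).

    log_px, log_pu: log densities (e.g., from a trained flow); working in
    log space avoids underflow for very unlikely actions."""
    return log_px < np.log(s) + log_pu
```

With the values used later in the paper (p_u = 0.03125 and s = 2), an observed action is flagged exactly when its model density drops below 0.0625.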
An approximation of the probability density of an action a_obs at a state s_obs can be computed by drawing samples a_sample ∼ p_x(a | s_obs; θ) from the trained model. Taking missing information from neighbors or noise into account, the observed action should lie approximately uniformly within the action samples. Figure 6 visualizes the observed actions as well as action samples.
The following subsections provide detailed information regarding the robots, the collection of simulation data, and the hyperparameters and training of the neural network.
Fig. 5 Embedding of the direction vectors between a robot position (x, y) and the positions of its nearest c neighbours and all n_v vertices of the coverage area.

Robots
In simulation scenarios, the robots are modeled using a multibody model of an omnidirectional mobile robot with four Mecanum wheels, as depicted in Figure 2. The model is taken from [1, Sec. 4.3.2] and considers the robot chassis and the four wheels as individual bodies but, for simplicity, neglects the robots' wheel suspension and considers only planar motion. The wheels are assumed to roll without slipping, which is practically the most serious modeling simplification. Moreover, each wheel's contact point with the ground is assumed to have a position that is constant relative to the robot chassis, and each wheel is modeled as one monolithic body. The derivation of the robots' kinematics is rather intricate, which is due to the geometry of the Mecanum wheels and to the fact that the model is subject to two non-holonomic kinematic constraints, which, however, do not affect the macroscopic maneuverability of the robot. Due to these intricacies, and for compactness, the reader is referred to [1, Sec. 4.3.2] for full detail on the model and its derivation. This work uses the same model parameters as the reference, since these are chosen to mimic the robot from Figure 2.
The simulation model's system-theoretic state vector comprises states sufficient to describe the rotatory motion of the robot chassis and wheels. For this paper's detection method, only robot positions are relevant and, hence, made available. Similar to the hardware robot, the simulated robot takes desired translational velocities of the robot's center as inputs. From these, desired angular velocities for the wheels are calculated that kinematically fit the inputs. Local motor controllers are then tasked to govern the motors to quickly reach the desired angular wheel velocities. In simulations, as later on in hardware, these motor controllers run at a frequency of 100 Hz, whereas the desired chassis velocity is recalculated at a frequency of 5 Hz to always point in the direction of the area center of the robot's Voronoi cell. The robots reach a maximum velocity of 0.4 m/s in each direction, resulting in a maximum range of an action of d_v = ∆t · √(0.4² + 0.4²) m.

Simulation data
The coverage area is represented as a convex polygon with an area of 300 m² to 1200 m². The number of entities in a robot swarm, n_r ∈ [10, 30] ∩ N, and the number of polygon vertices, n_v ∈ [3, 7] ∩ N, are sampled uniformly. At the start of each coverage task, the robot swarm is randomly positioned within the coverage area. An example of the start configuration can be seen in the first column of Figure 1. The coverage vertices and robot positions are scaled to lie between 0 and 1 in order to normalize the range of the training data. Each robot communicates its position at regular time steps t_j, with the time span ∆t between sampling instances set to 5 seconds. Considering the possibility of communication issues due to poor signal coverage, a large coverage area, or a large number of robots in the swarm, robots only receive the position information of swarm entities that are spatially close. For validation and test data, it is assumed that for each robot the positions of its c = min(25, n_r − 1) closest neighbors are available, with n_r being the number of robots in the swarm. However, to enforce flexibility, during training the number of neighbors is sampled in the range c ∈ [min(15, n_r − 1), 25]. At each time step t_j, a robot starts at position (x, y) and, using its position (x_{t_{j+1}}, y_{t_{j+1}}) at the next time step, we observe its action (a_x, a_y) := (x_{t_{j+1}} − x, y_{t_{j+1}} − y). Additionally, state information is collected as the direction vectors from (x, y) to the robot's closest c neighbors, as well as the direction vectors to all n_v coverage area vertices. The neural network is trained on a dataset built by simulating 200 different coverage tasks of benevolent robot behavior for a duration of 10 time steps. Collecting the state and action of each robot at each time step, the training dataset contains 40880 samples. The training performance is validated on a second dataset of normal robot behavior, simulating 50 coverage tasks that result in 10220 data samples. For the test data, which contains both normal and anomalous behaviour, 100 coverage tasks are simulated, yielding 20330 samples. In each test simulation, one to three robots behave anomalously, with P(random) = P(idle) = 0.25 and P(antagonistic) = 0.5. The mean µ_ant of the antagonistic agent's area of interest is chosen randomly within the coverage area, while the covariance is restricted to σ_ant ∈ {0.5², 1², 1.5², ..., 3.5²}.
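The assembly of one (state, action) sample described above can be sketched as follows. This is an illustrative sketch, not the paper's code: the array layout and names are assumptions, and the exact feature ordering used in the paper is not specified.

```python
import numpy as np

def build_sample(robot_idx, X_t, X_next, verts, c):
    """Assemble one (state, action) sample for one robot and time step.

    state:  direction vectors from the robot to its c nearest neighbours
            and to all polygon vertices (two variable-length sequences,
            later embedded by the two LSTMs);
    action: displacement of the robot between consecutive time steps.
    """
    p = X_t[robot_idx]
    action = X_next[robot_idx] - p                 # (a_x, a_y)
    others = np.delete(X_t, robot_idx, axis=0)
    d = np.linalg.norm(others - p, axis=1)
    nearest = others[np.argsort(d)[:c]]            # c closest neighbours
    neighbour_vecs = nearest - p                   # shape (c, 2)
    vertex_vecs = verts - p                        # shape (n_v, 2)
    return (neighbour_vecs, vertex_vecs), action
```

Because the two sequences have lengths c and n_v, which vary between runs, they cannot be stacked into a fixed-size input directly, which is exactly why the LSTM embedding of Section 4 is needed.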

Model hyperparameters and training
As described in Section 3.2, we use an autoregressive normalizing flow with a spline-based transformer and add an LSTM for the embedding of the state information. Since each action is conditioned on a state, the conditioner leverages both z_{<i} and the state embedding. In accordance with the dimensionality and range d_v of the robot movement per time step, the base distribution is defined as a two-dimensional uniform distribution, described by the probability density

p_u(u) = 1 / (2 d_v)²  for u ∈ [−d_v, d_v]², and 0 otherwise.

The final model contains 19212 trainable parameters and was trained for 178 epochs, taking 30 minutes on an NVIDIA GeForce RTX 2070 graphics card. Since almost half of the actions are smaller than 0.5 m due to the robots converging, a weighted random sampler is employed to ensure that the model is trained uniformly on actions of different magnitudes. The model and training hyperparameters can be found in Table 1. They were selected using the random hyperparameter optimization provided by Weights & Biases (WandB) [32]. The code is based on the Python [33] package nflows [34] and the model is trained using PyTorch [35].
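The base density value used later in Section 5 can be checked from the robot kinematics. Reading the base distribution as uniform over the square [−d_v, d_v]² of attainable actions is an assumption here, but it is consistent with the density value 0.03125 used in the results:

```python
import numpy as np

# Maximum per-step displacement: 0.4 m/s along each axis for dt = 5 s gives
# d_v = dt * sqrt(0.4^2 + 0.4^2) m.  A uniform density over the square
# [-d_v, d_v]^2 of attainable actions then has the constant value below.
dt = 5.0
d_v = dt * np.hypot(0.4, 0.4)       # maximum action range, ~2.83 m
p_u = 1.0 / (2.0 * d_v) ** 2        # uniform density on [-d_v, d_v]^2
```

The side length 2·d_v equals 4√2, so the support has area exactly 32 and p_u = 1/32 = 0.03125, matching the value inserted into the classification rule in Section 5.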

Results
The performance of the final model is evaluated on the validation set based on multiple criteria. The log probability of the actions in the weighted training set increases from −2.96 after the first training epoch to 1.12 after 178 epochs, with both training and validation loss showing converging behavior. The unweighted final validation log probability is 0.77. Additionally, if the normalizing flow fits the target density well, transforming the training data to values in the base distribution p_u should result in noise that also resembles the base distribution [36]. As shown in Figure 7, this validation criterion is fulfilled reasonably well.
After the training and validation of the model is completed, the trained model can be used to investigate the test dataset containing both anomalous and benevolent robots. Observed actions are classified as anomalous according to Equation (6), inserting the base uniform probability density value p_u(u_v) = 0.03125 and setting the scaling factor to s = 2, such that a_obs is anomalous if p_x(a = a_obs) < 0.0625. From the samples in the validation dataset, only 4.3% of the actions fall below this threshold. Thus, the probability of a normal robot performing more than 2 anomalous steps is about 0.8%, and a robot is categorized as anomalous when taking more than 2 anomalous steps. Table 2 summarizes the model performance on the test data for the normal, erroneous, and antagonistic robots. The detection method proposed in this work yields an accuracy of correctly identified anomalous robot behavior exceeding 90% for all three anomalous robot types, i.e., idle, random, and antagonistic. While the random robot is detected in 100% of the cases, the detection of the idle robot probably benefits from converging actions having a very small magnitude and therefore being close to no movement. Most importantly, the antagonistic robot is detected in 90% of the cases. Since antagonistic robots can be coincidentally close to their antagonistic target position, it is expected that some of them will behave almost normally. The accuracy is therefore satisfactory for all robot types. Figure 8 visualizes how larger covariances of the antagonistic area of interest lead to robots moving less aggressively towards the area. For the covariance σ_ant = 0.25, most antagonistic robots move directly to the center of the area, µ_ant. With increasing covariance, the end positions of the robots are further away from the center of the target area, indicating a more cautious movement towards the area of interest. However, due to their rapidly decaying density, it can also be seen that areas of interest with a small covariance have little influence on robots with large starting distances. The lower right plot depicts, for each covariance, the percentage of antagonistic steps categorized as anomalous. There is no apparent correlation between detection accuracy and covariance.

Fig. 8 The robot incentive to move towards the antagonistic target depends on the covariance σ_ant of the target area. Plots 1 to 7 depict the Euclidean distance in m between the antagonistic target and the robot's position at time t_j on the x-axis and the position at time t_j + ∆t on the y-axis. Larger covariances lead to distant robots moving faster towards the target area. The lower right plot shows the detection accuracy per covariance.
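The "about 0.8%" figure quoted above follows from a binomial model, assuming the 4.3% per-step misclassifications of normal actions are independent and each run spans the 10 time steps used in the simulations:

```python
from math import comb

p = 0.043   # fraction of normal actions below the density threshold per step
n = 10      # time steps per simulated coverage task

# P(a normal robot shows more than 2 anomalous steps) under independence:
# one minus the binomial probability of at most 2 flagged steps.
p_flagged = 1.0 - sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(3))
```

The sum evaluates to roughly 0.008, i.e., under these assumptions fewer than 1% of benevolent robots would be falsely flagged by the more-than-two-steps rule.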
In addition, Figures 9, 10, and 11 depict the detection of antagonistic and erroneous robots in different coverage settings. The figures trace the robot movement over 10 time steps, with the black arrows being the actions and the blue arrows representing samples from the action density. A robot is marked with a square if it has exceeded the threshold of two anomalous actions and is consequently categorized as anomalous. The robot swarm in Figure 9 contains one random and two antagonistic agents. The behavior of the left antagonistic robot is set to maximum stealth by setting the covariance of its target area to σ_ant = 12.25. In contrast, the right robot moves aggressively towards its area of interest with σ_ant = 0.25. While the neural network has no information about these covariances, it correctly categorizes all robots after only a few steps. In Figure 10, one of the three antagonistic robots has a target close to its starting position and to its optimal coverage position. Consequently, the robot behavior is nearly normal and the robot is not detected. While unfortunate, this is to be expected of an anomalous robot that, by pure chance, behaves almost identically to a normal one in the same situation; it is therefore a question of observability independent of the detection method employed. Figure 11 visualizes a different kind of difficulty. From the qualitative results, it appears that large distances between vertices lead to an inadequate representation of the connecting edge. Thus, the predicted actions of robots close to long edges are skewed towards the edge of the coverage area, causing one of the benevolent robots to be categorized as anomalous. In order to mitigate this problem, further investigation is required. As an alternative, lightweight approach to learning an action probability, we investigated Gaussian process regression (GPR) [37] in a setting of five robots within a fixed coverage area. The GPR was able to fit the training data and yielded worse predictions for anomalous actions than for normal actions. However, even though the setting was simplified, the GPR was overall not able to generalize well to unknown states.

Fig. 10 In this run, one of the antagonistic robots' start positions allows it to perform near-normal movements towards its target. As a result, it is not detected.
Besides that, since the spatial dispersion of the robot entities and the vertices of the coverage area can be represented as a graph, different graph neural network methods were also investigated for the state embedding [38][39][40][41]. However, whereas various approaches learned the local structure of neighbors and area around a robot entity, the resulting state embedding was rotation invariant. This rotation invariance cannot be reconciled with the robot actions describing a movement in a certain direction within the area's coordinate system.
Since the robot actions are computed deterministically in the current setting, it might be possible to use a purely threshold-based detection approach. This means that robot entities could compute the optimal action of a neighboring robot and check whether the Euclidean distance between the optimal and the executed action exceeds a threshold. However, a clear advantage of the probabilistic approach presented in this work is that it results in a very intuitive anomaly recognition based on the probability of an action and simultaneously makes it possible to quantify the uncertainty about an action being anomalous.
Additionally, in many real-world use cases the optimal action required for a threshold-based approach is not known, or only partially known. This includes robots that have been trained with reinforcement learning [42], robot swarms that have been deployed within an area containing obstacles, such that a robot might need to dynamically adapt its behaviour to avoid collisions [43], or agents performing complex tasks that are very expensive to compute, such as road traffic [15].

Conclusion
This paper presents a method for the reliable detection of antagonistic robots in a deployment setting. The approach uses a neural network comprising an LSTM and a normalizing flow to learn the probability density of robot actions and categorizes actions with a low probability value as anomalous. The neural network was trained solely on the benevolent behavior of a robot swarm being deployed in diverse coverage areas and is consequently not limited to the detection of a specific type of anomalous behaviour. The presented detection approach was able to correctly identify more than 90 percent of the antagonistic and erroneous robots used in the testing phase. All things considered, the proposed methodology is suitable for alleviating a variety of the security challenges for robot swarms that are mentioned in [3], especially intrusion detection, ensuring swarm mobility, and adhering to energy constraints.
We plan to extend our work by validating the applicability of the proposed detection approach in settings with environmental noise, obstacles, and more advanced antagonistic agents. Leveraging the intuitiveness of our probabilistic approach, we plan to introduce a dynamic adaptation of the anomaly threshold that reacts to disturbances caused by environmental influences on multiple entities in the robot swarm. A further interesting point of research is the development of counteractions against antagonistic agents.

Fig. 2
Fig. 2 Omnidirectional mobile robot used as an agent in the robot swarm [1].

Fig. 6
Fig. 6 Observed robot actions in a simulation run and action samples from the trained model.

Fig. 7
Fig. 7 Kernel density estimation and cumulative density for both dimensions of the training data transformed to noise values.

Fig. 9
Fig. 9 Robot swarm containing 2 antagonistic robots with σant = 0.25 for the more aggressive robot and σant = 12.25 for the rather stealthy robot.

Fig. 11
Fig. 11 Large distances between vertices of a coverage area may lead to actions being shifted towards the edges. As a result, multiple normal robots are mistakenly categorized as anomalous in this run.

Table 1
Hyperparameters of the LSTM + NSF model

Table 2
Accuracy, percentage of actions categorized as anomalous, and detection accuracy based on a threshold of more than two anomalous actions, for all robot types. The results for the antagonistic robot are first presented for all σ_ant, then evaluated separately for the different σ_ant.