1 Introduction

Robots have brought sustained progress to industry, transportation, logistics, and the service and security sectors. They have become more accurate, more versatile, and more cost-effective over the years, with much of the progress coming from improvements in the designs and algorithms of individual robots. Future progress may be achieved by letting a swarm of communicating robots cooperate. On the one hand, employing a robot swarm may be more flexible and robust since it can dynamically adapt to the amount of work to be done, and it may be able to reorganize if individual robots break down [1]. In addition, the modular combination of robots allows for the parallel execution of sub-tasks by simpler robots whose individual capabilities may be limited to certain tasks [2]. On the other hand, some tasks may only be achievable when multiple robots cooperate. Examples of the latter include coverage tasks where a group of robots is to cover a large environment that cannot be covered by a single robot alone, e.g., to monitor an area with sensors to direct relief efforts in a disaster scenario, to track the shape of contaminant clouds, or to provide signal coverage for radio communication [3, 4]. Sometimes, a denser coverage is desirable in certain areas. Mathematically, all of these practical problems can be subsumed in the same abstract framework of a general deployment or coverage problem, which is the setting of this paper. Coverage tasks have been studied for a variety of circumstances, see, e.g., [5–7].

As is often the case with networked robotics, the proposed approaches for coverage tasks generally assume that all involved robots are cooperative, whereas uncooperative behavior is studied far less often. Potential attacks on robot swarms have been identified in [3, 8], with [8] defining an attack on a robot swarm as altering the swarm’s behavior with the goal of reducing the efficiency of the swarm or preventing it from accomplishing its task. Applied to the coverage problem, antagonistic behavior may, for instance, consist of a robot physically moving in an altered manner to gain coverage of a particularly sensitive area, thereby preventing that area from being covered by one of the benevolent robots. This goes beyond purely erroneous behavior that is not directed at a specific antagonistic goal. Whereas methodological inspiration can be drawn from approaches dealing with faulty, or generally anomalous, behavior [9], antagonistic behavior raises additional research questions. For instance, overt antagonistic behavior may be more obstructive if left unaddressed, whereas stealthier antagonistic behavior may be harder to detect but less effective at disrupting the swarm’s task. In this context, the present work aims to develop and analyze an approach that can identify anomalous behavior in deployment tasks. In particular, we focus on the physical manifestation of erroneous or antagonistic behavior that affects the swarm by influencing mechanical-dynamical quantities such as robot motion. Other antagonistic influences, like the manipulation of communication, are not the subject of this study.

Indeed, identifying the presence and identity of deliberately antagonistic agents in a robot swarm has received little attention [10], and the authors are not aware of any work that studies antagonistic behavior in the deployment-problem setting. Several papers investigate a robot swarm’s ability to reach consensus in a collective decision problem despite the presence of misbehaving agents [11–14]. In [15] and [16], anomaly detection is investigated for network traffic and for physical properties such as power consumption, speed, or residual energy. The research in [10] assumes prior knowledge of signatures of abnormal swarm behavior, which can then be checked against the observed swarm behavior. Further, by examining the effect of a randomized secret control signal on a robot’s behavior, [17] proposes to verify that a robot is not controlled by an intruder mimicking normal swarm behavior. However, this approach presupposes the existence of a central control unit that is usually not available in decentralized robot swarms. Thus, beyond the novel setting, this work contributes a data-driven approach for detecting antagonistic behavior that requires only data on normal, benevolent behavior, no central control unit, and no a priori expert knowledge.

Methodologically, this work combines a neural network architecture called normalizing flow with a feature embedding based on recurrent neural networks. The proposed scheme is validated on data that was generated using the multibody model of an omnidirectional mobile robot.

The paper is organized as follows: Sect. 2 describes the deployment task and the different types of anomalous robots considered in this work. Section 3 provides information regarding the optimal solution to the deployment task as well as the neural network architecture used in this work. Section 4 describes the detection approach, including additional information regarding its implementation. In Sects. 5 and 6, the results are visualized and discussed, and an outlook on future work is provided.

2 Problem setting

Surveillance or monitoring tasks require the robot swarm to deploy within a fixed or dynamic area. In this work, the coverage area is represented as a single convex polygon, and it is the task of the robot swarm to distribute uniformly within this area. While it generally cannot be assumed that the optimal actions of the robot agents in a swarm are known, we facilitate the examination of the proposed anomaly detection method by computing the benevolent behavior of the robot swarm deterministically and using the computed actions as a reference. The deterministic behavior of a robot swarm that cooperatively solves the deployment problem is characterized as follows. Each robot is informed about the vertices of the area’s boundary and knows its current position relative to the vertices. Additionally, in order to avoid collisions and ensure proper coverage, each robot communicates its position at regular time steps \(t_{j}\), \(j \in \mathbb{N}_{0}\), to the other agents in the robot swarm. The coverage is optimized iteratively using Lloyd’s Algorithm [18], see Algorithm 1. At each time step \(t_{j}\), a robot leverages the position information to compute its Voronoi cell, where the Voronoi cell of the \(i\)th robot contains all points within the coverage area that are closer to robot \(i\) than to any other robot and are therefore under the surveillance of robot \(i\). Figure 1 visualizes the Voronoi decomposition for the respective time step as black dashed lines. In order to improve the coverage of its cell, the robot moves towards the area center of its Voronoi cell until the next time step. It then reports its new position. If the vertices of the coverage area are fixed, the robots converge to their optimal positions after a finite number of time steps. In this work, we employ a swarm of omnidirectional mobile robots, as depicted in Fig. 2. Each robot has four Mecanum wheels [19] arranged in a rotation-symmetric fashion. This setup allows the robots to accelerate in any direction independently of their current configuration and, neglecting inertia, to always move in straight lines towards the cell center. Since Voronoi cells are convex, the robots do not leave their cell when moving straight towards the center of their Voronoi cell.
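To make the iteration concrete, the following minimal sketch implements one step of this procedure, approximating each Voronoi cell and its centroid by Monte Carlo sampling inside the convex polygon; the function names and the sampling-based approximation are illustrative and are not the implementation used in this work.

```python
# Minimal sketch of one Lloyd iteration with Monte Carlo centroid estimation
# inside a convex coverage polygon (illustrative, numpy only).
import numpy as np

def inside_convex_polygon(points, vertices):
    """Boolean mask of points inside a convex polygon given by CCW vertices."""
    mask = np.ones(len(points), dtype=bool)
    n = len(vertices)
    for k in range(n):
        a, b = vertices[k], vertices[(k + 1) % n]
        edge, rel = b - a, points - a
        cross = edge[0] * rel[:, 1] - edge[1] * rel[:, 0]
        mask &= cross >= 0.0
    return mask

def lloyd_step(positions, vertices, v_max=0.4, dt=5.0, n_samples=20000, rng=None):
    """Move every robot towards the centroid of its (sampled) Voronoi cell."""
    rng = np.random.default_rng() if rng is None else rng
    lo, hi = vertices.min(axis=0), vertices.max(axis=0)
    q = rng.uniform(lo, hi, size=(n_samples, 2))
    q = q[inside_convex_polygon(q, vertices)]           # uniform samples in W
    owner = np.argmin(np.linalg.norm(q[:, None] - positions[None], axis=2), axis=1)
    new_positions = positions.astype(float).copy()
    for i in range(len(positions)):
        cell = q[owner == i]                            # Monte Carlo Voronoi cell of robot i
        if len(cell) == 0:
            continue
        step = cell.mean(axis=0) - positions[i]         # towards the cell centroid
        dist = np.linalg.norm(step)
        reach = v_max * dt                              # simple speed cap; the robots' 0.4 m/s per-axis limit is ignored here
        if dist > reach:
            step *= reach / dist
        new_positions[i] += step
    return new_positions
```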

Algorithm 1

Application of Lloyd’s Algorithm [18] to the Deployment Problem

Fig. 1

Development of the robot deployment for time steps \(t_{0}\), \(t_{2}\) and \(t_{8}\) using a swarm that includes only normal robots (top), one random (yellow) and one idle (white) robot (middle) and an antagonistic (pink) robot that moves towards the pink target area (bottom). Color figure online

Fig. 2

Omnidirectional mobile robot used as an agent in the robot swarm [1]

Deviating from normal behavior, in which all robots cooperate according to Lloyd’s algorithm, three types of anomalous behavior are investigated in this work. First, in the event of a hardware failure, a robot may become immobile but may still communicate its current position. The idle robot will thus induce a sub-optimal coverage and monitoring of the area. Second, software or hardware errors resulting in the improper propulsion of the robot wheels or inaccurate tracking of the robot position can cause a robot to move erroneously. The effect of such actuator or sensor problems is exemplified in the second scenario, with a robot performing random actions. Such random motions will prevent the swarm from converging to its optimal positions and increase energy consumption. The effect of erroneous robots on task completion is also visualized in the second row of Fig. 1. Third, with the goal of keeping the robot swarm from monitoring a certain area of interest within the coverage area, a robot may actively behave antagonistically, e.g., due to being hacked. The behavior of such an antagonistic robot is visualized in the last row of Fig. 1, with the pink density representing the antagonistic robot’s area of interest. By intentionally moving to its area of interest, the antagonistic robot will establish control over that area, since the robot swarm’s ambition to optimize the coverage of the entire area will cause the benevolent robots to distribute within the remaining area. Whereas straight movement toward the area of interest is likely to be easier to detect, the antagonistic robot can, e.g., hide its target position by only slightly shifting its trajectory, as described later in Sect. 3.1. Subsequently, the theoretical foundations necessary to understand the proposed approach are presented.

3 Theoretical foundations

The optimal solution to the deployment problem, the computation of antagonistic actions, as well as the concept and architecture of a normalizing flow are described in detail in the following subsections.

3.1 Deployment problem

The optimal coverage of a polygonal area \(W\) is found by minimizing the cost function

$$ H(\boldsymbol{x}_{1}, \dots , \boldsymbol{x}_{n}, W) = \sum _{i=1}^{n} \int _{W_{i}} || \boldsymbol{q} - \boldsymbol{x}_{i}||^{2} \phi _{i}(\boldsymbol{q})\, \mathrm{d} W_{i} $$
(1)

with \(\boldsymbol{x}_{i} \in \mathbb{R}^{2}\) being the position of robot \(i\) and \(W_{i}\) representing the Voronoi cell that encloses \(\boldsymbol{x}_{i}\) in the Voronoi decomposition of \(W\) generated for \(\{\boldsymbol{x}_{1}, \dots , \boldsymbol{x}_{n}\}\) [20]. For each Voronoi cell, \(\boldsymbol{q} \in \mathbb{R}^{2}\) represents the positions that lie within \(W_{i}\). The weighting function \(\phi : \mathbb{R}^{2} \rightarrow \mathbb{R}\) determines the importance of different locations within the coverage area and is set to a uniform distribution by default. As a result, the benevolent robots assign equal importance to every point in \(W\). In contrast, the density function \(\phi _{\text{ant}}\) of an antagonistic robot is set to a normal distribution \(\phi _{\text{ant}} = \mathcal{N}(\mu _{\text{ant}}, \sigma _{\text{ant}})\), as depicted in the bottom row of Fig. 1. Using the Delaunay triangulation [21, 22] of its Voronoi cell \(W_{\text{ant}}\), the antagonistic robot weights the points in \(W_{\text{ant}}\) according to their density value under \(\phi _{\text{ant}}\). The weighted center of \(W_{\text{ant}}\) is then selected as the target position for the current time step. The antagonistic behavior can be made more aggressive or more stealthy by decreasing or increasing the covariance \(\sigma _{\text{ant}}\), respectively.
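One possible realization of this weighted-centroid computation is sketched below; the triangle-centroid quadrature, the isotropic covariance \(\sigma_{\text{ant}} \boldsymbol{I}\), and all function names are illustrative assumptions rather than the authors' implementation.

```python
# Hedged sketch: Gaussian-weighted centroid of a convex Voronoi cell under
# phi_ant = N(mu_ant, sigma_ant * I), using one-point quadrature per triangle
# of the cell's Delaunay triangulation (illustrative simplification).
import numpy as np
from scipy.spatial import Delaunay
from scipy.stats import multivariate_normal

def antagonistic_target(cell_vertices, mu_ant, sigma_ant):
    phi_ant = multivariate_normal(mean=mu_ant, cov=sigma_ant * np.eye(2))
    tri = Delaunay(cell_vertices)
    weighted_sum, total_weight = np.zeros(2), 0.0
    for simplex in tri.simplices:
        a, b, c = cell_vertices[simplex]
        area = 0.5 * abs((b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]))
        centroid = (a + b + c) / 3.0
        w = area * phi_ant.pdf(centroid)      # one quadrature point per triangle
        weighted_sum += w * centroid
        total_weight += w
    # fall back to the plain vertex mean if the density is numerically zero in the cell
    return weighted_sum / total_weight if total_weight > 0 else cell_vertices.mean(axis=0)

# e.g. a square cell pulled towards a target near its upper-right corner
cell = np.array([[0.0, 0.0], [4.0, 0.0], [4.0, 4.0], [0.0, 4.0]])
print(antagonistic_target(cell, mu_ant=np.array([3.5, 3.5]), sigma_ant=1.0**2))
```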

3.2 Normalizing flows

Normalizing flows are a type of neural network that learns a model of an unknown, potentially complicated target probability distribution \(p^{*}_{x}\) over continuous random variables [23, 24]. In order to evaluate \(p^{*}_{x}\) and draw samples from it, the normalizing flow learns an invertible and differentiable transformation function \(\boldsymbol{T} = \boldsymbol{T}_{K} \circ \cdots \circ \boldsymbol{T}_{1}\). Each of the stacked transformations \(\boldsymbol{T}_{k}\) is computed by one layer of the neural network. The transformation function maps from a base distribution \(p_{u}(\boldsymbol{u})\) to \(p_{x}(\boldsymbol{x};\boldsymbol{\theta})\), with \(p_{x}(\boldsymbol{x};\boldsymbol{\theta})\) being the model of the target distribution approximated by a normalizing flow with parameters \(\boldsymbol{\theta}\), which comprise the trainable parameters as well as the hyperparameters of the neural network. The transformation between both distributions is visualized in Fig. 3. The base distribution is usually chosen as a multivariate normal or uniform distribution or any other probability density that is easy to evaluate. Using the change of variables formula, see [25], the probability density value \(p_{x}(\boldsymbol{x})\) of a data point \(\boldsymbol{x}\) can thus be computed as

$$ p_{x}( \boldsymbol{x}) = p_{u} (\boldsymbol{T}^{-1}(\boldsymbol{x}))\, |\det (\boldsymbol{J}_{\boldsymbol{T}^{-1}}( \boldsymbol{x}))|, $$
(2)

where \(\boldsymbol{J}_{\boldsymbol{T}^{-1}}\) represents the Jacobian matrix of the inverse transformation function \(\boldsymbol{T}^{-1}\) with respect to \(\boldsymbol{x}\). Similarly, it is possible to sample from \(p_{x}\) by drawing a sample \(\boldsymbol{u}_{\text{s}} \sim p_{u}\) and then transforming the sample in order to get

$$ \boldsymbol{x}_{\text{s}} = \boldsymbol{T}(\boldsymbol{u}_{\text{s}}). $$
(3)
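A small worked example can illustrate Equations (2) and (3): for a fixed invertible linear map, the model density is the base density of the inverse-mapped point times the absolute Jacobian determinant of the inverse map. All names and the choice of map are purely illustrative.

```python
# Toy example of Eqs. (2) and (3): a fixed invertible linear map T(u) = A u
# transports a uniform base density on [0, 1]^2 (illustrative only).
import numpy as np

A = np.array([[2.0, 0.5],
              [0.0, 1.5]])           # T(u) = A u, so T^{-1}(x) = A^{-1} x
A_inv = np.linalg.inv(A)

def p_u(u):                           # uniform base density on the unit square
    return float(np.all((u >= 0.0) & (u <= 1.0)))

def p_x(x):                           # Eq. (2): p_x(x) = p_u(T^{-1}(x)) * |det J_{T^{-1}}(x)|
    return p_u(A_inv @ x) * abs(np.linalg.det(A_inv))

u_s = np.random.default_rng(0).uniform(size=2)
x_s = A @ u_s                         # Eq. (3): sampling by transforming a base sample
print(p_x(x_s))                       # 1 / |det A| = 1/3 everywhere inside T([0, 1]^2)
```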

In order to fit the model distribution \(p_{x}\) to the target distribution \(p^{*}_{x}\) during training, the Kullback-Leibler divergence \(D_{\text{KL}}[\,p^{*}_{x}(\boldsymbol{x})\,\vert \vert \,p_{x}(\boldsymbol{x}; \boldsymbol{\theta})\,]\) [26] between both distributions is minimized. As elaborated in [25], assuming that the training data is sampled from the target distribution \(p^{*}_{x}\), this is equivalent to maximizing the probability of the training data, i.e.,

$$\begin{aligned} \mathcal{L}(\boldsymbol{\theta}) &= D_{\text{KL}}[\,p^{*}_{x}(\boldsymbol{x})\,\vert \vert \,p_{x}(\boldsymbol{x}; \boldsymbol{\theta})\,] \\ &= -\mathbb{E}_{p^{*}_{x}(\boldsymbol{x})} [\, \log p_{x}(\boldsymbol{x}; \boldsymbol{\theta}) \,] + \text{const} \\ &\approx - \frac{1}{N} \sum _{n=1}^{N} \left[ \log p_{u} (\boldsymbol{T}^{-1}(\boldsymbol{x}_{n}; \boldsymbol{\theta})) + \log \vert \det (\boldsymbol{J}_{\boldsymbol{T}^{-1}}(\boldsymbol{x}_{n}; \boldsymbol{\theta}))\vert \right] + \text{const}. \end{aligned}$$
(4)

Normalizing flows have been successfully used, e.g., for anomaly detection in industrial time series data [27] and for simulation-based inference [28].

Fig. 3

The normalizing flow learns an invertible and differentiable transformation \(\boldsymbol{T} = \boldsymbol{T}_{K} \circ \cdots \circ \boldsymbol{T}_{1}\) between a uniform base distribution \(p_{u}\) and the probability density function \(p_{x}\). As a result, it is possible to sample actions from \(p_{x}\), as well as evaluate the probability density value of a given action [25]

This work employs a type of autoregressive flow with a spline-based transformer. Autoregressive flows compute the transformation \(\boldsymbol{T}_{k}\) of each layer \(k\) using a strictly monotonic function \(\tau \), called the transformer, and a so-called conditioner function \(c_{i}\), such that

$$ z'_{i} = \tau (z_{i}, c_{i}(\boldsymbol{z}_{< i})) $$
(5)

as shown in Fig. 3, see [25]. Since each \(c_{i}\) only depends on \(\boldsymbol{z}_{< i} = [z_{0}, \ldots, z_{i-1}]\), the Jacobian matrix in Equation (2) is triangular, so its determinant can be computed efficiently as the product of its diagonal entries. The transformer function is built of quadratic splines [29] with \(K\) segments and endpoints \(z_{i0}, \ldots, z_{iK}\). For each layer, these segments are determined by a conditioner implemented as a masked autoencoder for distribution estimation (MADE) [30].
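The autoregressive structure of Equation (5) and the resulting triangular Jacobian can be illustrated with the following toy sketch, which uses a simple monotonic affine map in place of the quadratic-spline transformer and a toy conditioner in place of the MADE network; every name here is illustrative.

```python
# Illustrative sketch of Eq. (5): each output dimension z'_i depends only on
# z_i and z_{<i}. With a monotonic affine transformer the Jacobian is lower
# triangular and log|det J| is the sum of the log scales.
import numpy as np

def conditioner(z_prev):
    """Toy conditioner c_i(z_{<i}) returning (log_scale, shift); stands in for MADE."""
    s = float(np.tanh(z_prev.sum())) if z_prev.size else 0.0
    return s, 0.5 * s

def autoregressive_forward(z):
    z_out = np.empty_like(z)
    log_det = 0.0
    for i in range(len(z)):
        log_scale, shift = conditioner(z[:i])
        z_out[i] = np.exp(log_scale) * z[i] + shift   # strictly monotonic in z_i
        log_det += log_scale                          # diagonal Jacobian entry
    return z_out, log_det

print(autoregressive_forward(np.array([0.3, -1.2, 0.7])))
```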

4 Detection approach

An important consideration in developing the detection approach is that anomalies, including antagonistic behavior, are characterized by a low probability of occurrence. Consequently, there is rarely enough data available to narrow down and classify different anomalous patterns in order to detect them [9]. Instead, a common approach to anomaly detection is to use data to learn a representation of normal behavior and to identify anomalies as deviations from it. Thus, under the assumption of unaltered normal behavior, the approach proposed in this work may, in principle, detect arbitrary physically anomalous behavior.

Further benefits of a data-driven approach to anomaly detection include its adaptability to different settings as long as corresponding data is available. For instance, the approach can be trained on the behavior of a swarm performing a different task or following a different algorithm for solving the coverage problem. In addition, the robustness of the model to environmental influences can be tuned by adding noise to the training data in order to increase the variance of the normal motion behavior.

While different types of neural networks have been considered, see Sect. 5, this work leverages the ability of an autoregressive flow with a spline-based transformer, also called neural spline flow (NSF), to model a conditional probability distribution \(p(\boldsymbol{a} \vert \boldsymbol{s}; \boldsymbol{ \theta })\) of a normal, i.e., benevolent, robot action. Both the action \(\boldsymbol{a}\) and the state \(\boldsymbol{s}\) are computed using the spatial information that is communicated between the robots, as shown in Fig. 4. The robot action \(\boldsymbol{a}\) at time \(t_{j}\) is represented as the motion vector between a robot’s position at \(t_{j}\) and its position at \(t_{j+1} = t_{j} + \Delta t\). The state is composed of two parts. In order to capture the closeness of other robots, the direction vector from the position of each robot to the position of each of its \(c\) nearest neighbors is computed. Since the robots must also take the boundary of the coverage area into account, the direction vector between each robot position and each vertex of the coverage area is computed as well. Both the robot swarm size and the number of polygon vertices vary between runs, which allows the approach to deal with different polygon shapes and a variable number of robots. Consequently, the number of state features is variable as well and has to be embedded in a vector of fixed size before it is passed to the normalizing flow. Since recurrent neural networks can process an arbitrary number of input features, we select the Long Short-Term Memory (LSTM) [31] network architecture to embed the state information. Due to their dissimilar effect on the robot actions, the direction vectors to the neighboring robots and the direction vectors to the coverage area’s vertices are embedded separately, using one LSTM each, so that two LSTMs are used in total. As visualized in Fig. 5, the embedding is a concatenation of the last hidden states of both LSTMs. The entire network can then be trained end-to-end.
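A minimal PyTorch sketch of such a two-LSTM embedding is given below; module names, hidden sizes, and tensor shapes are illustrative assumptions, not the configuration reported in Table 1.

```python
# Hedged sketch of the state embedding: two LSTMs process the variable-length
# sequences of neighbor and vertex direction vectors; their last hidden states
# are concatenated and serve as the context for the conditional flow.
import torch
import torch.nn as nn

class StateEmbedding(nn.Module):
    def __init__(self, hidden_size=32):
        super().__init__()
        self.neighbor_lstm = nn.LSTM(input_size=2, hidden_size=hidden_size, batch_first=True)
        self.vertex_lstm = nn.LSTM(input_size=2, hidden_size=hidden_size, batch_first=True)

    def forward(self, neighbor_dirs, vertex_dirs):
        # neighbor_dirs: (batch, c, 2), vertex_dirs: (batch, n_v, 2)
        _, (h_n, _) = self.neighbor_lstm(neighbor_dirs)
        _, (h_v, _) = self.vertex_lstm(vertex_dirs)
        return torch.cat([h_n[-1], h_v[-1]], dim=-1)   # (batch, 2 * hidden_size)

embed = StateEmbedding()
context = embed(torch.randn(4, 25, 2), torch.randn(4, 5, 2))
print(context.shape)   # torch.Size([4, 64])
```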

Fig. 4

Illustration of an exemplary situation in a deployment task. The corresponding data, i.e., a robot’s action as well as the direction vectors to its neighbors and the area vertices, is collected at each time step. Color figure online

Fig. 5

Embedding of the distance between a robot position \((x, y)\) and the positions of its nearest \(c\) neighbors and all \(n_{\text{v}}\) vertices of the coverage area

The capability of normalizing flows to generate samples is leveraged for categorizing an action as normal or anomalous. Since a trained model maximizes the log probability of training data samples under \(p_{x}\), see Equation (4), it can be assumed that normal actions lie in regions of \(p_{x}\) with a high probability density value. According to this observation, the predicted probability density value of an observed action \(\boldsymbol{a}_{\text{obs}}\) should be significantly higher than the probability density value of the uniform base distribution,

$$ p_{x}(\boldsymbol{a} = \boldsymbol{a}_{\text{obs}} | \boldsymbol{s}_{\text{obs}}) = p_{u} ( \boldsymbol{T}^{-1}(\boldsymbol{a}_{\text{obs}}) | \boldsymbol{s}_{\text{obs}})\, |\det (\boldsymbol{J}_{ \boldsymbol{T}^{-1}}(\boldsymbol{a}_{\text{obs}}))|\, \gg \,p_{u} (\boldsymbol{T}^{-1}(\boldsymbol{a}_{\text{obs}}) | \boldsymbol{s}_{\text{obs}}) . $$

Consequently, an action is classified as anomalous if

$$ p_{x}(\boldsymbol{a} = \boldsymbol{a}_{\text{obs}} | \boldsymbol{s}_{\text{obs}}) < s \, p_{u} ( \boldsymbol{T}^{-1}(\boldsymbol{a}_{\text{obs}}) | \boldsymbol{s}_{\text{obs}}) $$
(6)

with a scaling factor \(s\).
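Expressed in log densities, the decision rule of Equation (6) amounts to the following short sketch; the function name is illustrative, and it assumes that the trained flow provides \(\log p_{x}(\boldsymbol{a}_{\text{obs}} \vert \boldsymbol{s}_{\text{obs}})\) for the observed action.

```python
# Sketch of the classification rule in Eq. (6): flag an action as anomalous if
# its predicted density falls below s times the uniform base density 1/(2 d_v)^2.
import math

def is_anomalous(log_p_action, d_v, s=2.0):
    log_p_base = -2.0 * math.log(2.0 * d_v)            # uniform base density (2 d_v)^(-2)
    return log_p_action < math.log(s) + log_p_base

# With d_v from Sect. 4.1 (about 2.83 m) the base density is about 0.03125, so
# s = 2 corresponds to a density threshold of roughly 0.0625.
print(is_anomalous(log_p_action=math.log(0.05), d_v=2.83))   # 0.05 < 0.0625 -> True
```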

An approximation of the probability density of an action \(\boldsymbol{a}_{\text{obs}}\) at a state \(\boldsymbol{s}_{\text{obs}}\) can be computed by drawing samples \(\boldsymbol{a}_{\text{sample}} \sim p_{x}(\boldsymbol{a} \vert \boldsymbol{s}_{\text{obs}}, \boldsymbol{\theta})\) from the trained model. Accounting for missing neighbor information and noise, an observed normal action should lie within the region covered by the action samples. Figure 6 visualizes the observed actions as well as action samples.

Fig. 6

Observed robot actions in a simulation run and action samples from the trained model. Color figure online

The following subsections provide detailed information regarding the robots, the collection of simulation data, the hyperparameters, and the training of the neural network.

4.1 Robots

In simulation scenarios, the robots are modeled using a multibody model of an omnidirectional mobile robot with four Mecanum wheels as depicted in Fig. 2. The model is taken from [1, Sect. 4.3.2] and considers the robot chassis and the four wheels as individual bodies but, for simplicity, neglects the wheel suspensions and considers only planar motion. The wheels are assumed to roll without slipping. Moreover, each wheel’s contact point with the ground is assumed to have a constant position relative to the robot chassis, and each wheel is modeled as one monolithic body. The derivation of the robot kinematics is rather intricate due to the geometry of the Mecanum wheels and because the model is subject to two nonholonomic kinematic constraints, which, however, do not affect the macroscopic maneuverability of the robot. For compactness, the reader is referred to [1, Sect. 4.3.2] for full details on the model and its derivation. This work uses the same model parameters as the reference since these are chosen to mimic the robot from Fig. 2. The simulation model’s system-theoretic state vector comprises states sufficient to describe the rotatory motion of the robot chassis and wheels. For this paper’s detection method, only the robot positions are relevant and, hence, made available. Like the hardware robot, the simulated robot takes the desired translational velocities of the robot’s center as inputs. From these, desired angular velocities for the wheels are calculated that kinematically fit the inputs. Local motor controllers then govern the motors to quickly reach the desired angular wheel velocities. In simulation, as later in hardware, these motor controllers run at a frequency of \(100~\text{Hz}\), whereas the desired chassis velocity is recalculated at a frequency of \(5~\text{Hz}\) to always point in the direction of the area center of the robot’s Voronoi cell. The robots reach a maximum velocity of \(0.4\text{ m}/\text{s}\) in each direction, resulting in a maximum action range of \(d_{\text{v}} = \Delta t \cdot \sqrt{0.4^{2} + 0.4^{2}}\,\text{m}/\text{s} \approx 2.83\text{ m}\). The Voronoi tessellation and, in consequence, the cell centroids the robots aim for are updated every \(\Delta t = 5\) seconds.
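For orientation only, a commonly used planar inverse-kinematics relation for a symmetric four-Mecanum-wheel layout is sketched below. It is explicitly not the multibody model from [1]; wheel numbering, roller-angle sign conventions, and the geometry parameters are illustrative and differ between platforms.

```python
# Hedged sketch: textbook mapping from desired chassis velocities to desired
# wheel angular velocities for a symmetric four-Mecanum-wheel robot
# (illustrative convention and geometry, not the model parameters from [1]).
import numpy as np

def wheel_speeds(v_x, v_y, omega_z, wheel_radius=0.05, l_x=0.2, l_y=0.15):
    """Desired wheel angular velocities in the order
    front-left, front-right, rear-left, rear-right (one common convention)."""
    l = l_x + l_y
    return np.array([
        v_x - v_y - l * omega_z,
        v_x + v_y + l * omega_z,
        v_x + v_y - l * omega_z,
        v_x - v_y + l * omega_z,
    ]) / wheel_radius

# e.g. wheel_speeds(0.4, 0.4, 0.0): for diagonal motion only two wheels turn.
print(wheel_speeds(0.4, 0.4, 0.0))
```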

4.2 Simulation data

The coverage area is represented as a convex polygon with an area of \(300 \text{ m}^{2}\) to \(1200 \text{ m}^{2}\). The number of robots in a swarm, \(n_{\text{r}} \in [10, 30] \cap \mathbb{N}\), and the number of polygon vertices, \(n_{\text{v}} \in [3, 7] \cap \mathbb{N}\), are sampled uniformly. At the start of each coverage task, the robot swarm is randomly positioned within a coverage area that has been randomly generated while respecting the constraints on the area and the number of vertices. An example of the start configuration can be seen in the first column of Fig. 1. The coverage area vertices and robot positions are scaled to lie between 0 and 1 in order to normalize the range of the training data. Each robot communicates its position at regular time steps \(t_{j}\), with the time span \(\Delta t\) between sampling instances set to 5 seconds. Considering the possibility of communication issues due to poor signal coverage, a large coverage area, or a large number of robots in the swarm, robots only receive the position information of swarm entities that are spatially close. For validation and test data, it is assumed that for each robot the positions of its \(c = \text{min}(25, n_{\text{r}}-1)\) closest neighbors are available, with \(n_{\text{r}}\) being the number of robots in the swarm. However, to enforce flexibility, during training the number of neighbors is sampled from the range \(\left [\text{min}(15, n_{\text{r}}-1), 25 \right ]\). At each time step \(t_{j}\), a robot starts at position \((x, y)\) and, using the position \((x_{t_{j+1}}, y_{t_{j+1}})\) at the next time step, we observe its action \((a_{x}, a_{y}) :=(x_{t_{j+1}} - x, y_{t_{j+1}} - y)\). Additionally, state information is collected as the direction vectors from \((x, y)\) to the robot’s closest \(c\) neighbors, as well as the direction vectors to all \(n_{\text{v}}\) coverage area vertices. The neural network is trained on a dataset built by simulating 200 different coverage tasks of benevolent robot behavior for a duration of 10 time steps. Collecting the state and action of each robot at each time step, the training dataset contains 40,880 samples. The training performance is validated on a second dataset of normal robot behavior, simulating 50 coverage tasks that result in 10,220 data samples. For the test data that contains both normal and anomalous behavior, 100 coverage tasks are simulated, yielding 20,330 samples. In each of the 50 test simulations, one to three robots behave anomalously, with \(P(\text{random}) = P(\text{idle}) = 0.25\) and \(P(\text{antagonistic}) = 0.5\). The mean \(\mu _{\text{ant}}\) of the antagonistic agent’s area of interest is chosen randomly within the coverage area, while the covariance is restricted to \(\sigma _{\text{ant}} \in \{0.5^{2}, 1^{2}, 1.5^{2},\ldots, 3.5^{2}\}\).
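Assembling one such state-action sample can be summarized by the following sketch; all function and variable names are illustrative.

```python
# Sketch of one training sample at time t_j: the action is the displacement
# until t_{j+1}, the state consists of direction vectors to the c closest
# neighbors and to all polygon vertices (illustrative names).
import numpy as np

def build_sample(pos_tj, pos_tj1, neighbor_positions, vertices, c):
    action = pos_tj1 - pos_tj                                 # (a_x, a_y)
    to_neighbors = neighbor_positions - pos_tj                # (n_r - 1, 2)
    order = np.argsort(np.linalg.norm(to_neighbors, axis=1))
    neighbor_dirs = to_neighbors[order[:c]]                   # c closest neighbors
    vertex_dirs = vertices - pos_tj                           # (n_v, 2)
    return action, neighbor_dirs, vertex_dirs
```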

4.3 Model hyperparameters and training

As described in Sect. 3.2, we use an autoregressive normalizing flow with a spline-based transformer and add an LSTM for the embedding of the state information. Since each action is conditioned on a state, the conditioner leverages both \(\boldsymbol{z}_{< i}\) and the state embedding. In accordance with the dimensionality and range \(d_{\text{v}}\) of the robot motion per time step, the base distribution is defined as a two-dimensional uniform distribution, described by the probability density

$$ p_{u}(\boldsymbol{u}) = \begin{cases} \dfrac{1}{(2 d_{\text{v}})^{2}} & \text{for } -d_{\text{v}} \leq u_{1} \leq d_{\text{v}} \text{ and } -d_{\text{v}} \leq u_{2} \leq d_{\text{v}}, \\ 0 & \text{else}. \end{cases} $$
(7)

The final model contains 19,212 trainable parameters and was trained for 178 epochs, which took 30 minutes on an NVIDIA GeForce RTX 2070 graphics card. Since, owing to the robots converging, almost half of the actions have a magnitude below 0.5 m, a weighted random sampler is employed to ensure that the model is trained uniformly on actions of different magnitudes. The model and training hyperparameters can be found in Table 1. They are selected using the random hyperparameter optimization provided by Weights & Biases (WandB) [32]. The code is based on the Python [33] package nflows [34], and the model is trained using PyTorch [35].
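A minimal sketch of how a conditional autoregressive flow could be assembled with nflows is given below. It is not the authors' implementation: it uses the package's affine autoregressive transform as a readily available stand-in for the quadratic-spline transform, a standard-normal base instead of the uniform base from Equation (7), and illustrative layer counts and feature sizes.

```python
# Hedged sketch of a conditional autoregressive flow with nflows (stand-in
# components; see the lead-in above for the assumptions made here).
import torch
from nflows.flows.base import Flow
from nflows.distributions.normal import StandardNormal
from nflows.transforms.base import CompositeTransform
from nflows.transforms.autoregressive import MaskedAffineAutoregressiveTransform

context_dim = 64                      # e.g. size of the concatenated LSTM embedding
transform = CompositeTransform([
    MaskedAffineAutoregressiveTransform(
        features=2, hidden_features=32, context_features=context_dim,
    )
    for _ in range(4)
])
flow = Flow(transform, StandardNormal(shape=[2]))

actions = torch.randn(16, 2)          # dummy actions and contexts
context = torch.randn(16, context_dim)
loss = -flow.log_prob(inputs=actions, context=context).mean()   # objective of Eq. (4)
print(loss)
```

In the full model, the concatenated LSTM hidden states from the embedding sketch in Sect. 4 would serve as the context passed to the flow.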

Table 1 Hyperparameters of the LSTM + NSF model

5 Results

The performance of the final model is evaluated on the validation set based on multiple criteria. The log probability of the actions in the weighted training set increases from −2.96 after the first training epoch to 1.12 after 178 epochs, with both training and validation loss showing converging behavior. The unweighted final validation log probability is 0.77. Additionally, if the normalizing flow fits the target density well, transforming the training data to values in the base distribution \(p_{u}\) should result in noise that also resembles the base distribution [36]. As shown in Fig. 7, this validation criterion is fulfilled reasonably well.

Fig. 7

Kernel density estimation and cumulative density for both dimensions of the training data transformed to noise values

After training and validation of the model are completed, the trained model can be used to investigate the test dataset containing both anomalous and benevolent robots. Observed actions are classified as anomalous according to Equation (6), inserting the base uniform probability density value \(p_{u}(\boldsymbol{u}_{v}) = 0.03125\) and setting the scaling factor to \(s = 2\), such that \(\boldsymbol{a}_{\text{obs}}\) is anomalous if \(p_{x}(\boldsymbol{a} = \boldsymbol{a}_{\text{obs}}) < 0.0625\). From the samples in the validation dataset, only 4.3% of the actions fall below this threshold. Thus, the probability of a benevolent robot performing more than two anomalous steps within a run of ten time steps is only about 0.8%, and a robot is categorized as anomalous when it takes more than two anomalous steps. Table 2 summarizes the model performance on the test data for the normal, erroneous, and antagonistic robots. The detection method proposed in this work yielded an accuracy of correctly identifying anomalous robot behavior exceeding \(90\%\) for all three robot types, i.e., idle, random, and antagonistic. While the random robot is detected in \(100\%\) of the cases, the idle robot probably benefits from the fact that, close to convergence, normal actions have a very small magnitude and are therefore close to no motion. Most importantly, the antagonistic robot is detected in \(90\%\) of the cases. Since antagonistic robots can coincidentally be close to their antagonistic target position, it is expected that some of them will behave almost normally. The accuracy is therefore satisfactory for all robot types.
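The 0.8% figure quoted above follows directly from a binomial model of the per-step false-positive rate, as the following short check illustrates (the binomial independence assumption is a simplification).

```python
# Quick check of the per-robot threshold: with a 4.3 % chance of a benevolent
# action being flagged, the probability of more than two flagged actions in a
# 10-step run follows a binomial distribution.
from math import comb

p, n = 0.043, 10
p_more_than_two = 1.0 - sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(3))
print(f"{p_more_than_two:.3%}")   # approximately 0.8 %
```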

Table 2 Accuracy, percentage of actions categorized as anomalous, and detection accuracy based on a threshold of more than two anomalous actions for all robot types. The results for the antagonistic robot are first presented for all \(\sigma _{\text{ant}}\), then evaluated for the different \(\sigma _{\text{ant}}\)

Figure 8 visualizes how larger covariances of the antagonistic area of interest lead to robots moving less aggressively towards the area. For the covariance \(\sigma _{\text{ant}} = 0.25\), most antagonistic robots move directly to the center \(\mu _{\text{ant}}\) of the area. With increasing covariance, the end positions of the robots are further away from the center of the target area, indicating a more cautious movement towards the area of interest. However, due to the rapidly decreasing density, it can also be seen that target areas with a small covariance have little influence on robots with large starting distances. The lower right plot depicts, for each covariance, the percentage of antagonistic steps categorized as anomalous. There is no apparent correlation between detection accuracy and covariance.

Fig. 8

The robot’s incentive to move towards the antagonistic target depends on the covariance \(\sigma _{\text{ant}}\) of the target area. Plots 1 to 7 depict the Euclidean distance in m between the antagonistic target and the robot’s position at time \(t_{0}\) (\(x\)-axis) and at time \(t_{10}\) (\(y\)-axis). Larger covariances lead to robots moving faster towards the target area. The lower right plot shows the detection accuracy per covariance

In addition, Figs. 9, 10, and 11 depict the detection of antagonistic and erroneous robots in different coverage settings. The figures trace the robots’ motions over 10 time steps, with the black arrows being the observed actions and the blue-gray arrows representing samples from the action density. A robot is marked with a square if it has exceeded the threshold of two anomalous actions and is consequently categorized as anomalous. The robot swarm in Fig. 9 contains one random and two antagonistic agents. The behavior of the left antagonistic robot is set to maximum stealth by setting the covariance of its target area to \(\sigma _{\text{ant}}=12.25\). In contrast, the right robot moves aggressively towards its area of interest with \(\sigma _{\text{ant}}=0.25\). While the neural network has no information about these covariances, it correctly categorizes all robots after only a few steps. In Fig. 10, one of the three antagonistic robots has a target close to its starting position and to its optimal coverage position. Consequently, the robot’s behavior is nearly normal, and the robot is not detected. While unfortunate, this is to be expected: an anomalous robot that, by pure chance, behaves almost identically to a normal one in the same situation poses a question of observability that is independent of the detection method employed. Figure 11 visualizes a different kind of difficulty. From the qualitative results, it appears that large distances between vertices lead to an inadequate representation of the connecting edge. Thus, the predicted actions of robots close to long edges are skewed towards the edge of the coverage area, thereby causing some of the benevolent robots to be categorized as anomalous. Mitigating this problem requires further investigation.

Fig. 9

Robot swarm containing 2 antagonistic robots with \(\sigma _{\text{ant}} = 0.25\) for the more aggressive robot and \(\sigma _{\text{ant}} = 12.25\) for the rather stealthy robot. Color figure online

Fig. 10

In this run, the start position of one of the antagonistic robots allows it to perform near-normal motions towards its target. As a result, it is not detected. Color figure online

Fig. 11

Large distances between vertices of a coverage area may lead to predicted actions being shifted towards the edges. As a result, multiple normal robots are mistakenly categorized as anomalous in this run. Color figure online

As a lightweight alternative to the approach described so far, we investigated Gaussian process regression (GPR) [37] for learning an action probability in a simplified setting of five robots within a fixed coverage area. The GPR was able to fit the training data and yielded worse predictions for anomalous actions than for normal actions. However, even in this simplified setting, the GPR was overall not able to generalize well to unknown states.

Besides that, the spatial dispersion of the robot entities and the vertices of the coverage area can be represented as a graph; therefore, we also investigated different graph neural network methods for the state embedding [38–41]. However, whereas various approaches learned the local structure of neighbors and the area around a robot entity, the resulting state embedding was rotation invariant. This rotation invariance cannot be reconciled with the robot actions describing a motion in a certain direction within the area’s coordinate system.

Since the robot actions are computed deterministically in the current setting, it might be possible to use a purely threshold-based detection approach. This means that robot entities could compute the optimal action of a neighboring robot and check whether the Euclidean distance between the optimal and the executed action exceeds a threshold. However, a clear advantage of the probabilistic approach presented in this work is that it yields a very intuitive anomaly criterion based on the probability of an action and simultaneously allows the uncertainty about an action being anomalous to be quantified.

Additionally, in many real-world use cases, the optimal action used for a threshold-based approach is not known or is only partially known. This includes robots that have been trained with reinforcement learning [42], robot swarms deployed within an area that contains obstacles, such that a robot might need to dynamically adapt its behavior to avoid collisions [43], or agents performing complex tasks for which optimal actions are very expensive to compute, such as road traffic [15].

6 Conclusion

This paper presents a method for the reliable detection of antagonistic robots in a deployment setting. The approach uses a neural network comprising an LSTM and a normalizing flow to learn the probability density of robot actions and categorizes actions with a low probability value as anomalous. The neural network was trained solely on the benevolent behavior of a robot swarm deployed in diverse coverage areas and is consequently not limited to the detection of a specific type of anomalous behavior. The presented detection approach was able to correctly identify more than 90 percent of the antagonistic and erroneous robots in the testing phase. All things considered, the proposed methodology is suitable for alleviating a variety of the security challenges for robot swarms mentioned in [3], especially intrusion detection, ensuring swarm mobility, and adhering to energy constraints.

We plan to extend our work by validating the applicability of the proposed detection approach in settings with environmental noise, obstacles, and more advanced antagonistic agents. Leveraging the intuitiveness of our probabilistic approach, we plan to introduce a dynamic adaptation of the anomaly threshold that reacts to disturbances caused by environmental influences on multiple entities in the robot swarm. A further interesting direction of research is the development of countermeasures against antagonistic agents.