Introduction

Multi-agent systems are common in nature, for example bird flocks [1], fish schools [2], insect swarms [3], and mammal herds [4]. Inspired by natural multi-agent systems, which achieve complicated functions through the collaboration of simple intelligence, research on artificial multi-agent systems [5, 6] is in progress, for instance terrestrial robot swarms [7] and UAV swarms [8]. Flocking is one of the most important characteristics of a multi-agent system and comprises three aspects: decentralization, aggregation, and movement matching [9]. A multi-agent system mainly relies on a collaborative mechanism, including interactions and dynamics, to realize flocking actions [10, 11]. Interaction structures are mainly divided into three kinds, fixed topologies [12], switching topologies [13], and time-varying topologies [14], which have been studied for years together with their consensus conditions [15, 16]. Besides, biological observations indicate that all three kinds of interaction structure exist in natural flocks [17], e.g. the fixed topology that exists stably in pigeons [18] and starlings [19], and the time-varying topology of cooperating within a specific metric distance [20]. Specifically, jackdaws are found to interact through a stable topology when moving to roosts, while in collective anti-predator mobbing events they interact based on metric distance [20]. Namely, interaction rules switch under different situations. Inferring interaction structures is a crucial problem in revealing the laws of multi-agent systems [21, 22]. The characteristics of decentralization, aggregation, and movement matching make agents in a flock maintain behavioral patterns similar to those of the neighbors they interact with. Therefore, the interaction topology of a multi-agent system can be inferred from collective behaviors [23,24,25] by measuring the similarity of multivariate time series such as trajectories and directions in the flock.

Similarity measurement is the most widely used and fundamental approach to inferring interaction topology [26], especially in the fields of finance [27], climate [28], neuroscience [29], medicine [30], and trajectory data mining [31, 32]. Inferring interaction relationships from flocking time series is one such similarity measurement problem, and the data have the following three characteristics. First, the data contain information on trajectories, speeds, directions, etc., and are thus multivariate, which suffers from the curse of dimensionality. Second, the high synergy of flocking results in highly similar trajectories, which requires a high-resolution measurement. Third, the flocking process often generates a large amount of data, which requires the similarity measurement to be computationally simple yet robust to outliers and noise without additional processing. Therefore, a suitable similarity measurement should be applied to infer the interaction topology from the multivariate time series data. Linear similarity measurements include Euclidean Distance (ED), the Pearson correlation coefficient [33], Escoufier’s RV coefficient [34], and Spearman’s rank correlation [35]. Nonlinear similarity measurements include Mutual Information (MI) [36], partial correlation [37], lead-lag correlation [38], Distance Correlation (DC) [39], Dynamic Time Warping (DTW) [40], and Sparse Principal Component Analysis (SPCA) [41]. However, some of these methods, such as the Pearson correlation coefficient and partial correlation, cannot deal with multivariate time series. Among the similarity measurements for multivariate time series, ED is the most popular and has linear complexity, but it performs poorly with high-dimensional data and outliers. MI can test the independence between variables, but it may not correctly measure the strength of a relationship [42]. DTW can handle shifting and scaling of the shape and is a benchmark in the literature [43]. DC detects nonlinear dependence and has potential in deep learning [44]. But DTW, DC, and MI are all computationally expensive. The Extended Frobenius norm (Eros) [45] measures the correlation coefficient matrix but neglects the time order of the time series. Meanwhile, there are also many proposals to address the deficiencies of these classical similarities [46].

Nonetheless, improving the definition of similarity is still an important open problem. The measurements mentioned above give absolute similarities between time series, while in many multi-agent systems only relative relationships are required. To meet the requirements of multivariate calculation, high resolution, and low, stable computation cost posed by flocking data, we propose a new kind of global relative similarity named K-similarity, which can deal with both univariate and multivariate time series data. It is developed from the ONIT clustering framework [47], whose high accuracy, update ability, and fault tolerance are inherited by K-similarity. More specifically, K-similarity is computed in three steps to infer the interaction structure of a multi-agent system: (1) cut the multivariate time series into snapshots through specific criteria and cluster the data on each snapshot; (2) construct an overlay network by merging the consecutive networks generated from the clustering results on each snapshot; (3) transform the weights of the overlay network into a similarity matrix. In summary, the main contributions of this paper are as follows:

\(\bullet \) A global relative similarity named K-similarity is proposed. Its unique structure endows K-similarity with the ability to process multivariate data with high resolution, low computation cost, and stability towards outliers, including both concentrated and random outliers;

\(\bullet \) Potential properties of K-similarity are explored. Various K strategies are available for measuring K-similarity, with which multi-scale information can be detected. Besides, K-similarity has the potential to transform into a strict distance;

\(\bullet \) A Gaussian Mixture Model (GMM)-based threshold strategy is designed for the K-similarity matrix, assuming that the weight distribution is composed of existing links and random effects, which is interpretable and applicable in practice;

\(\bullet \) Extensive tests are conducted to compare K-similarity with four benchmark measurements and verify its superiority. The results show that K-similarity achieves high accuracy and stability on both simulated and real datasets.

The rest of the article is organized as follows. The next section introduces the definition of K-similarity and its K strategies. The subsequent section describes the characteristics of K-similarity and its threshold strategy, after which experiments testing K-similarity under various conditions are conducted on a simulation model of a multi-agent system and compared with four other similarity methods. Then K-similarity and the threshold strategy are applied to a real pigeon dataset. The penultimate section clarifies the origin of the ideas in this paper and discusses their risks, restrictions, and possible decisions to minimize them. The final section concludes the paper and proposes directions for future work.

Formulation of K-similarity

Fig. 1: Overview of the method

K-similarity is proposed in this paper to measure the global relationship of all agents. An overview of the method is shown in Fig. 1. To begin with, the time series of trajectories are divided into snapshots, on which K-means clustering is applied. The clustering results on each snapshot are transformed into a network of k connected branches. Then, the networks of all snapshots are merged into an overlay network, whose adjacency matrix forms the K-similarity matrix. Finally, the inferred topology is obtained by applying the GMM-based threshold strategy to the similarity matrix. In addition, situations with outliers and different choices of K strategy are discussed.

In this section, we mainly describe the definition of K-similarity and its K strategies. Outliers and the threshold method are discussed in section Features of K-similarity. K-similarity is named after K-means clustering, which is the fundamental step applied on every snapshot. The details are as follows.

Definition

Let \({\mathcal {L}}=\{L_1,L_2,\dots ,L_N\}\) be the set of time series of the N agents in the multi-agent system, where \(L_i = \{l_0^i, l_1^i, \cdots , l_{\tau }^i \}, i = 1, 2, \cdots , N\), is the time series of agent i consisting of observations at times \(t = 0, \cdots , \tau \), and \(l_t^i\) is a multidimensional observation of agent i at time t.

Following the framework in [47], the time series are first divided into snapshots \(S_p, p = 1, 2, \cdots , P\), where \(S_p\) consists of points \(l_{t_i}^i, i = 1,2,\cdots ,N\), one from each time series \(L_i\). After taking the snapshots, we cluster the nodes on each snapshot by K-means, obtaining \(k_p\) clusters on the pth snapshot, namely \(c^p=\{c_1^p,c_2^p,\dots ,c_{k_p}^p \}\), such that \(S_p=\cup _{i=1}^{k_p}c_i^p\) and \(c_i^p \cap c_j^p=\emptyset \) for \(i \ne j \). A network \(g^p\) of the pth snapshot is constructed by fully connecting the nodes within each cluster and leaving nodes in different clusters disconnected, hence forming \(k_p\) connected branches. Summing up the adjacency matrices of the networks \(g^p\) with assigned weights, we obtain an overlay network with weights

$$\begin{aligned} G_{ij}=\sum _{p=1}^P w_p g_{ij}^{p}, \end{aligned}$$
(1)

where

$$\begin{aligned} g_{ij}^p=\left\{ \begin{aligned} 1,&\quad i\in c_k^p \text { and } j\in c_k^p \text { for some } k; \\ 0,&\quad \text {otherwise}, \end{aligned} \right. \end{aligned}$$
(2)

and \(w_p\) is the weight with which the pth network contributes to the sum.

Without loss of generality, we set \(w_p=1/P\). Multiplying the adjacency matrix G by P, we obtain

$$\begin{aligned} PG = \begin{pmatrix} P &{} &{} &{} PG_{ij}\\ &{} P &{} &{}\\ &{} &{} \ddots &{} \\ PG_{ij} &{} &{} &{} P\\ \end{pmatrix}, \end{aligned}$$

where \(G_{ij} \le 1\) and \(PG_{ij} \le P\).

Then we define the global K-similarity matrix of the N nodes to be

$$\begin{aligned} S = \frac{1}{P+1} \begin{pmatrix} P+1 &{} &{} &{} PG_{ij}\\ &{} P+1 &{} &{}\\ &{} &{} \ddots &{} \\ PG_{ij} &{} &{} &{} P+1\\ \end{pmatrix} = \begin{pmatrix} 1 &{} &{} &{} \frac{P}{P+1}G_{ij}\\ &{} 1 &{} &{}\\ &{} &{} \ddots &{} \\ \frac{P}{P+1}G_{ij} &{} &{} &{} 1\\ \end{pmatrix}, \end{aligned}$$

which guarantees that the self-similarity of every node is 1 while all other entries remain less than 1.

The similarity matrix S defined above satisfies the following properties: (1) all diagonal entries are equal to 1; (2) all off-diagonal elements are real numbers in the closed interval [0, 1]; (3) the matrix is symmetric.
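For concreteness, the following sketch shows how the matrix above can be computed. It is a minimal illustration under our own assumptions, not the reference implementation: the function name k_similarity, the array shapes, the use of scikit-learn's KMeans, and treating each time step as one snapshot are all choices made for this sketch; the per-snapshot cluster counts \(k_p\) are supplied externally (strategies for choosing them follow in the next subsection).

```python
import numpy as np
from sklearn.cluster import KMeans

def k_similarity(series, ks, seed=0):
    """Compute the K-similarity matrix S (Eqs. 1-2).

    series : array of shape (P, N, d), one d-dimensional observation
             per agent on each of the P snapshots.
    ks     : length-P sequence of cluster counts k_p (any K strategy).
    """
    P, N, _ = series.shape
    G = np.zeros((N, N))
    for p in range(P):
        labels = KMeans(n_clusters=int(ks[p]), n_init=10,
                        random_state=seed + p).fit_predict(series[p])
        # g^p: nodes sharing a cluster on snapshot p are fully connected.
        g = (labels[:, None] == labels[None, :]).astype(float)
        G += g / P                     # overlay with weights w_p = 1/P (Eq. 1)
    S = (P / (P + 1)) * G              # scale off-diagonal entries
    np.fill_diagonal(S, 1.0)           # self-similarity equals 1
    return S
```

Note that the overlay network is accumulated incrementally, so the memory footprint stays at \(O(N^2)\) regardless of the number of snapshots.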

The choice of K strategy

The value of k directly determines the number of clusters on each snapshot. Nodes that are clustered into the same group are considered identical; that is, the space on each snapshot is separated into k areas, the similarity within each area is designated 1, and the similarity between areas is 0. Therefore, the value of k plays the role of a ruler that decides the scale of the similar structures we can detect: the larger k is, the smaller the structures we can find. In this sense, we give several strategies for determining k according to different application scenarios, some of which have a fixed similarity scale, while others have changing similarity scales. These K strategies will be tested in section K-random strategy.

K-fixed strategy: We appoint a fixed k for every snapshot, detecting small-scale similarity structures with a larger k and larger structures with a smaller k. Namely,

$$\begin{aligned} k_p = k_{fixed}. \end{aligned}$$
(3)

The K-fixed strategy has limitations in detecting all scales of similarity, but it provides an accurate measurement at a specific scale.

K-random strategy: The choice of k can vary across snapshots, measuring multiple levels of similarity. The simplest yet effective method is to select k randomly from a set of positive integers. For example,

$$\begin{aligned} P\{k_p=k|k=2,3,\dots ,N-1\}=\frac{1}{N-2}. \end{aligned}$$
(4)

Randomness affects the performance of the K-random strategy on short time series, but shows little effect on longer ones.

K-adaptive strategy: Let the K-means algorithm determine its own choice of k on each snapshot by the gap statistic or the Average Silhouette Width (ASW). Taking ASW as an example:

$$\begin{aligned} k_p =\underset{k}{\arg \max } \ \mathrm {ASW}(S_p, k). \end{aligned}$$
(5)

The K-adaptive strategy only reveals the most conspicuous scale of the data at a micro level, which either points the right way or deviates considerably.

K-times strategy: Different values of k detect similarity at different scales. Therefore, applying K-means clustering several times on each snapshot, each time with a different k (e.g. \(k = 2,\dots ,N-1\)), enables us to obtain multi-scale similarity information. It can be combined with both the K-fixed and K-random strategies. There are numerous potential combinations of the K-fixed and K-times strategies, making it difficult to choose broad-spectrum parameters for general applications. Therefore, in this paper the K-times strategy is used to enhance the K-random strategy in the following experiments for maximum performance, especially on short time series; the basic strategies are sketched below.
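The strategies above reduce to choosing the sequence \(k_1,\dots ,k_P\) fed into the clustering step. A hedged sketch of the first three follows; the function names and the use of scikit-learn's silhouette_score are our own assumptions, and the K-times strategy would simply repeat one of these choices several times per snapshot and average the resulting similarity matrices.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

def k_fixed(P, N, k=5):
    # Eq. (3): the same k on every snapshot.
    return [k] * P

def k_random(P, N):
    # Eq. (4): k drawn uniformly from {2, ..., N-1} for each snapshot.
    return rng.integers(2, N, size=P).tolist()

def k_adaptive(snapshot, k_max):
    # Eq. (5): the k maximising the Average Silhouette Width on one snapshot.
    best_k, best_asw = 2, -1.0
    for k in range(2, k_max + 1):
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(snapshot)
        asw = silhouette_score(snapshot, labels)
        if asw > best_asw:
            best_k, best_asw = k, asw
    return best_k
```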

This new definition of similarity enriches the toolbox of similarity measurement. First, the snapshots reduce the dimension of the time series, allowing multivariate data to be computed on one snapshot. Second, the choice of k values makes it possible to explore similarities at different scales, improving accuracy. Third, the nonlinear map from distances to a network excludes the excessive influence of outliers, which is discussed in the next section.

Features of K-similarity

In this section, we explore three features of K-similarity. Firstly, it can be transformed into a strictly defined distance, which has the potential to expand its application to various scenarios. Secondly, the nonlinear mapping on each snapshot and the unique structure of the network keep K-similarity stable against outliers. Thirdly, the weight distribution under the trivial condition is explored, which inspires the GMM-based threshold strategy.

Transformation into distance

Similarity has a close relationship with distance. The difference is that distance is a fundamental and general concept with a restricted definition, while the meanings of similarity are much richer and applicable to more scenarios. However, the K-similarity defined above can also be transformed into a strict distance, expanding its applications in data analysis with theoretical support. Different from the common transformation between similarity and distance, \(s = 1/(1+d)\), the distance between nodes i and j derived from K-similarity is given by

$$\begin{aligned} d_{ij} =1-s_{ij}, \end{aligned}$$
(6)

where \(s_{ij}\) is the corresponding element of the global similarity matrix S. The definition in Eq. (6) satisfies the principle of simplicity and meets the requirements of a distance:

(1) The identity axiom: \(d_{ij}=0\) iff \(i=j\). This holds by the definition of the matrix S, whose diagonal entries equal 1 while all off-diagonal entries are strictly less than 1.

(2) The triangle inequality: \(d_{ij}+d_{jk} \ge d_{ik}\).

Proof. The inequality clearly holds when \(i=j\), \(j=k\), or \(i=k\). For \(i \ne j \ne k\), substituting \(d_{ij} = 1-s_{ij}\) shows that \(d_{ij}+d_{jk} \ge d_{ik}\) is equivalent to \(1-s_{ij}-s_{jk} \ge -s_{ik}\), i.e., \(1+s_{ik} \ge s_{ij} +s_{jk}\). Multiplying both sides of this inequality by \(P+1\), it suffices to show \(P+1+PG_{ik} \ge PG_{ij}+PG_{jk}\), where \(PG_{ij}\) is the accumulated number of times nodes i and j are clustered into one group. Suppose nodes i, j, and k have simultaneously been clustered into one group m times; then the target inequality can be written as \((P-m)+1+(PG_{ik}-m) \ge (PG_{ij}-m)+(PG_{jk}-m)\). Because \(PG_{ij}-m\) and \(PG_{jk}-m\) count disjoint events among the remaining \(P-m\) clusterings (if i, j and j, k were grouped together on the same snapshot, all three would be in one cluster and hence counted in m), we have \(PG_{ij}-m+PG_{jk}-m\le P-m\). Since \(PG_{ik}-m \ge 0\), it follows that \(P+1+PG_{ik} \ge PG_{ij}+PG_{jk}\), and thus \(d_{ij}+d_{jk} \ge d_{ik}\) holds.

(3) The symmetry axiom: \(d_{ij}=d_{ji}\). This holds due to the symmetry of the matrix S.

Stability against outliers

The nonlinear mapping from distances on each snapshot to a network makes K-similarity good at handling outliers in the time series, as described in [47].

According to the definition, the entries of the K-similarity matrix lie between 0 and 1. We have

$$\begin{aligned} s_{min} = 0,\quad s_{max} = 1,\quad \text {and}\quad 0 \le \frac{P}{P+1}G_{ij} <1. \end{aligned}$$
(7)

A similarity of \(\frac{P}{P+1}G_{ij}\) means that nodes i and j are clustered into one group \(PG_{ij}\) times out of the total P clusterings, where P is the number of snapshots. Table 1 lists the range of similarity values under different numbers of snapshots P. The distance \(d_{ij}=1-s_{ij}\) has the same value range as the similarity.

Table 1: Similarity range under different numbers of snapshots

Suppose there is one outlier in the UAV swarm time series, i.e., one value of one node on one snapshot. This outlier then contributes at most \(\frac{1}{P+1}\) to any similarity; for example, with \(P=100\) snapshots a single outlier shifts a similarity value by no more than \(1/101 \approx 0.01\). As long as the time series is long enough, the effect of the outlier approaches 0.

Trivial weight distribution

To distinguish truly meaningful links in the K-similarity matrix, we first examine the similarity matrix derived from a random system. Assume that the time series of the multi-agent system are completely random; that is, in a period of time \(\tau \) we obtain P snapshots, and the data on each snapshot are randomly distributed. Furthermore, we assume the data on different snapshots are independent, in contrast to the Markov property obeyed by real multi-agent systems. Then any node i is connected with node j on a snapshot with probability

$$\begin{aligned} P(g_{ij}^p = 1) = kP(i \in C_k)P(j \in C_k) = \frac{1}{k}. \end{aligned}$$

Therefore,

$$\begin{aligned} P(g_{ij}^p=0)=1-\frac{1}{k} = \frac{k-1}{k}. \end{aligned}$$

Now we have P independent snapshots according to the assumption. If we adopt the K-fixed strategy, the number X of snapshots on which i and j are clustered together follows a binomial distribution with success probability 1/k, so the corresponding similarity weight obeys

$$\begin{aligned} P\left( s_{ij}=\frac{X}{P+1}\right) = \left( {\begin{array}{c}P\\ X\end{array}}\right) \left( \frac{1}{k}\right) ^X \left( \frac{k-1}{k}\right) ^{P-X}. \end{aligned}$$

If we adopt the K-random strategy, which is mainly used in the following experiments, the expected success probability on each snapshot is

$$\begin{aligned}&E\left( \frac{1}{k}\right) = \left( \frac{1}{2}+\frac{1}{3}+\cdots +\frac{1}{N-1}\right) \frac{1}{N-2} \\&\quad \approx \frac{1}{N-2}\left( \ln (N-1)+\gamma -1\right) , \end{aligned}$$

where \(\gamma \approx 0.5772\) is the Euler–Mascheroni constant. The approximation holds when N is large.

From the above derivation, we may regard a similarity matrix whose weights follow such a binomial distribution as a trivial matrix whose effect should be excluded from the K-similarity matrix. Since in K-means clustering \(k\ge 2\), we have \(1/k\le 1/2\), and the binomial distribution of random results is therefore right-skewed. Using this assumption, we can design a threshold strategy for the K-similarity matrix; the binomial claim is first checked numerically below.
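The check is a short Monte Carlo experiment. The sketch below substitutes uniformly random cluster labels for actual K-means on random data, which matches the equal-probability assumption behind \(P(g_{ij}^p=1)=1/k\); all parameter values are illustrative.

```python
import numpy as np

# Monte Carlo check of the trivial weight distribution: with fully random,
# independent snapshots and a fixed k, the co-clustering count X of a pair
# (i, j) should follow Binomial(P, 1/k). Random labels stand in for K-means
# on random data, matching the equal-probability assumption above.
rng = np.random.default_rng(1)
N, P, k, trials = 20, 60, 5, 10_000

counts = np.empty(trials, dtype=int)
for t in range(trials):
    labels = rng.integers(0, k, size=(P, N))          # one labelling per snapshot
    counts[t] = np.sum(labels[:, 0] == labels[:, 1])  # X for the pair (0, 1)

print(counts.mean() / P)  # ~ 1/k = 0.2
print(counts.var() / P)   # ~ (1/k)(1 - 1/k) = 0.16; right-skewed since 1/k <= 1/2
```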

Threshold strategy

Current methods require a user-defined parameter to control the trade-off between the natural sparsity of interaction topologies and measurement error [27, 48, 49], and this parameter has a great effect on performance. Here, we develop a dedicated threshold method for K-similarity based on the right-skewed distribution of its random results. We exclude the random effects from the similarity measurement, thereby obtaining a meaningful inference of links. The K-similarity matrix measures relative distances within a group; therefore, we consider agents with relatively high weights more likely to be connected, while those with low weights are more likely to be generated by random effects.

Fig. 2: Schematic diagram of GMM-based threshold strategy

For simplicity and generality, we apply a GMM with two Gaussian components to describe the weight distribution of the K-similarity matrix. The interpretation is that one Gaussian component captures the links that possibly exist, while the other approximates the right-skewed binomial distribution of random effects. A GMM is parametrized by the mixture component weights and the component means and variances. For a GMM with two components, the components have means \(\mu _{1}\), \(\mu _{2}\) and standard deviations \(\sigma _{1}\), \(\sigma _{2}\). The mixture weights are \(\phi _{1}\) and \(\phi _{2}\), with the constraint \(\sum _{i=1}^{2} \phi _{i}=1\), so that we have

$$\begin{aligned} \begin{aligned} p(x)&=\sum _{i=1}^{2} \phi _{i} {\mathcal {N}}\left( x \mid \mu _{i}, \sigma _{i}\right) , \\ {\mathcal {N}}\left( x \mid \mu _{i}, \sigma _{i}\right)&=\frac{1}{\sigma _{i} \sqrt{2 \pi }} \exp \left( -\frac{\left( x-\mu _{i}\right) ^{2}}{2 \sigma _{i}^{2}}\right) , \\ \sum _{i=1}^{2} \phi _{i}&=1. \end{aligned} \end{aligned}$$
(8)

The intersection of the two Gaussian components defines the threshold. First, the Gaussian mixture model in the R package Mclust [50] is applied to cluster the weights into two groups. Then the nearest pair of values across the two Gaussian distributions, one from each, is found, and their mean is used as the threshold. An illustration of the GMM-based threshold strategy is presented in Fig. 2. Matrix A refers to the original adjacency matrix of the interaction topology, with 1 representing existing links and 0 non-existing ones. The similarity histogram presents the weight distribution of the K-similarity matrix, and the threshold comes from the intersection of the two Gaussian components. Matrix S refers to the K-similarity matrix, whose shading represents the link weights. The elements framed in red are the links retained after thresholding.
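A minimal sketch of this procedure follows. The paper uses Mclust in R; here scikit-learn's GaussianMixture is substituted for illustration, and the threshold is taken as the mean of the nearest pair of weights across the two fitted components, as described above.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_threshold(S):
    """Two-component GMM threshold on the off-diagonal weights of S."""
    iu = np.triu_indices_from(S, k=1)
    w = S[iu].reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, random_state=0).fit(w)
    labels = gmm.predict(w)
    lo, hi = sorted(range(2), key=lambda c: gmm.means_[c, 0])
    # Nearest pair of weights across the two components; their mean
    # approximates the intersection of the fitted Gaussians.
    threshold = float((w[labels == lo].max() + w[labels == hi].min()) / 2)
    A = (S >= threshold).astype(int)   # inferred adjacency matrix
    np.fill_diagonal(A, 0)
    return threshold, A
```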

Experiments on simulated datasets

To demonstrate the ability of K-similarity to measure interaction relationships in multi-agent systems, we conduct experiments on simulated datasets. To begin with, we introduce a model of the multi-agent system to produce flocking data. Then different K strategies and conditions with outliers are tested in the K-similarity measurement. Afterward, we apply K-similarity to infer the interaction topology and compare it with four other similarity measurements. Finally, we test the threshold strategy designed for K-similarity on the simulated datasets.

Cucker–Smale model for multi-agent system

Various models have been proposed to describe the behaviors of multi-agent systems, including first-order models such as Boids [51] and the Vicsek model [52], and second-order differential equation models such as the Cucker–Smale (CS) model [53]. Here, we take the CS model as an example [54]. The dynamics of each agent i at time t are

$$\begin{aligned}&\frac{\mathrm{d}}{\mathrm{d}t}x_i (t) = v_i (t), \nonumber \\&\frac{\mathrm{d}}{\mathrm{d}t}v_i (t) = \alpha \sum _{j \ne i}a_{ij}(x)(v_j-v_i)-\nabla U_{x_i}(x)-\nabla V_{x_i}(x),\nonumber \\ \end{aligned}$$
(9)

where \(\alpha \) is a parameter controlling the coupling strength, \(a_{ij}\) is the impact that agent j exerts on agent i, \(\nabla U_{x_i}(x)\) is the potential field force forming the shape, and \(\nabla V_{x_i}(x)\) is the anti-collision repulsive force.

We assume that connected UAV agents tend to coordinate their speeds. Thus, the impact of the interaction relationship is mainly contained in \(a_{ij}\). To be specific,

$$\begin{aligned}&a_{ij} = C_{ij}\frac{1}{(1+\left\| x_i-x_j\right\| ^2)^2}, \nonumber \\&\nabla U_{x_i}(x) = (\left\| x_i-{\bar{x}}\right\| -R)\frac{{\bar{x}}-x_i}{\left\| {\bar{x}}-x_i\right\| }, \nonumber \\&\nabla V_{x_i}(x) = \frac{R}{2}\sin \frac{\pi }{N} \sum _{j=1}^{N}\left[ \frac{x_i-x_j}{\left\| x_i-x_j \right\| }-(x_i-x_j)\right] ,\nonumber \\ \end{aligned}$$
(10)

where \(C_{ij}\) is an element of the interaction topology C: \(C_{ij}=1\) if individuals i and j are connected, and \(C_{ij}=0\) otherwise.

The potential function above makes the agents form a circle. The system adopts a two-way chain structure as the basic interaction topology to guarantee the connectivity of all agents. Then some random links generated by an ER network are superimposed on the basic topology to form the interaction topology of the multi-agent system that we need to infer. The time delay in this paper is randomly set to 1–3 units of time, and Gaussian noise is added to the observation data. A 2-D space is adopted in the model, and the undirected interaction topology of the multi-agent system is inferred from speed and location similarity in this space. To keep the results meaningful, the trajectory we use is a segment that has not yet converged.
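For reproducibility, a literal Euler-integration sketch of Eqs. (9)–(10) is given below. The step size, \(\alpha \), and R values are illustrative rather than the paper's settings, the integrator choice is our own, and C is assumed to have a zero diagonal.

```python
import numpy as np

def cs_step(x, v, C, alpha=0.5, R=5.0, dt=0.05):
    """One Euler step of Eqs. (9)-(10); alpha, R, and dt are illustrative.

    x, v : arrays of shape (N, 2); C : 0/1 topology with zero diagonal.
    """
    N = x.shape[0]
    diff = x[:, None, :] - x[None, :, :]            # x_i - x_j
    dist2 = np.sum(diff**2, axis=-1)
    a = C / (1.0 + dist2)**2                        # a_ij of Eq. (10)
    coupling = alpha * np.sum(a[:, :, None] * (v[None, :, :] - v[:, None, :]), axis=1)

    xbar = x.mean(axis=0)                           # shape-forming potential
    to_bar = xbar - x
    norm = np.linalg.norm(to_bar, axis=1, keepdims=True) + 1e-12
    gradU = (norm - R) * to_bar / norm

    dist = np.sqrt(dist2) + np.eye(N)               # eye avoids 0/0 on the diagonal
    unit = diff / dist[:, :, None]
    gradV = (R / 2) * np.sin(np.pi / N) * np.sum(unit - diff, axis=1)

    return x + dt * v, v + dt * (coupling - gradU - gradV)
```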

In this paper, we discuss three kinds of interaction topologies to explore the applicability of K-similarity in various situations.

Fixed topology: The swarm adopts a fixed topology throughout the whole observation.

Switching topology: Positions change on a much faster time scale than the interaction structure is rearranged, so that we can assume a fixed interaction topology between switches.

Time-varying topology: The time scales of topology change and position change are comparable. Here, we assume connectivity is determined by the distance between individuals, which is also the most common way to construct a time-varying topology when modeling UAV swarms. Namely,

$$\begin{aligned} C_{ij}= \left\{ \begin{aligned} 1,&\quad \Vert x_i-x_j \Vert \le d_c;\\ 0,&\quad \text {otherwise}, \end{aligned} \right. \end{aligned}$$
(11)

where \(d_c\) is the communication range. The topology changes every time the swarm moves, as sketched below.
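A direct transcription of Eq. (11) might look as follows; recomputing C at every step yields the time-varying topology (the function name is ours).

```python
import numpy as np

def metric_topology(x, d_c):
    """Eq. (11): connect agents within communication range d_c."""
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    C = (dist <= d_c).astype(int)
    np.fill_diagonal(C, 0)   # no self-links
    return C
```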

K-random strategy

Before applying K-similarity to infer the interaction topology, we must determine in advance the K strategy to adopt in different situations. We test the performance of different K strategies on the model described in section Cucker–Smale model for multi-agent system and report the results in Fig. 3. Whether connections exist among agents is a highly imbalanced classification problem in most situations. Therefore, performance is measured by the Area Under the Precision–Recall Curve (PR AUC) and the Area Under the Receiver Operating Characteristic Curve (ROC AUC) [55], which are commonly used for class-imbalance problems. A PR curve presents the relationship between precision and recall. A ROC curve presents the trade-off between the True-Positive Rate (TPR) and the False-Positive Rate (FPR). AUC is the area under the curve, ranging from 0 to 1; the higher the curve on the y-axis, the larger the AUC value and the better the performance. PR and ROC curves and AUC values are calculated with the R package ROCR [56].
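For readers working outside R, the same two scores can be obtained with scikit-learn as sketched below; average_precision_score is used as the usual estimator of PR AUC, which may differ slightly from ROCR's trapezoidal value, and the function name is our own.

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

def evaluate(S, C_true):
    """PR AUC (as average precision) and ROC AUC of a similarity matrix
    against the true topology, scored on the upper triangle only."""
    iu = np.triu_indices_from(S, k=1)   # undirected: each pair counted once
    y_true, y_score = C_true[iu], S[iu]
    return average_precision_score(y_true, y_score), roc_auc_score(y_true, y_score)
```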

We test six kinds of K strategies with various window sizes in Fig. 3, including K-fixed=5, K-fixed=8, K-fixed=N/2, K-adaptive, K-random, and K-random*N/2 (the K-times-enhanced K-random strategy repeated \(\left[ N/2 \right] \) times). The window size win ranges from 5 to 300 (the total length of the time series is \(P=300\)), and the slide is set to \(slide=\left[ 2win/3\right] \). Results are averaged across the whole time series in one experiment, and the experiments are repeated 50 times on a chain topology compounded with an ER network with a connection probability of 0.02.

Fig. 3: Comparison of different K strategies with respect to different window sizes

Fig. 4: Outliers and the influence towards K-similarity measurement. A Locations of concentrated outliers, B locations of random outliers, C outliers test with respect to the proportion of snapshots, D outliers test with respect to the proportion of outliers on a snapshot

The results show that K-random and K-random*N/2 stably outperform the K-adaptive and K-fixed strategies, which use fewer varieties of k values, indicating that K-random and its enhanced version can detect multi-scale similarity information with k changing from 2 to \(N-1\). As marked by the red circle in Fig. 3, K-random performs worse than K-random*N/2 when the window size is small, and then gradually climbs to approach K-random*N/2, while K-random*N/2 stays at the top in most scenarios. This indicates that repetition helps integrate biased results from random measurements into complete global information. Among the six strategies, K-random is the simplest yet effective method for long time series, and K-random*N/2 is the most accurate for short time series. Therefore, we use these two strategies as appropriate in the following experiments.

It should be noticed that the K-fixed strategies with \(k=5,8,\left[ N/2\right] \) outperform the K-adaptive strategy overall, and sometimes even perform better than the K-random strategy in PR AUC or ROC AUC. Meanwhile, K-fixed strategies with different choices of k yield rather different results. Interestingly, K-fixed=N/2 does not always beat K-fixed=5 and K-fixed=8. We can infer that smaller detection scales (with larger k) are not always better, since redundant information overwhelms valuable information. The K-adaptive strategy has the poorest performance, indicating that the ASW criterion in K-means clustering cannot provide a variety of k values.

Performance with outliers

The influence of outliers is a common problem in data mining. We test K-similarity's performance over a range of outlier proportions, including the proportion of snapshots containing outliers (\(p_1\)) and the proportion of outliers on each snapshot (\(p_2\)). \(p_2\) is calculated with respect to the total number of data values on a snapshot (N nodes \(\times 2\)-dimensional position \(+\) N nodes \(\times 2\)-dimensional velocity). Furthermore, outliers are assumed to follow two types of distribution, random (Fig. 4A) and concentrated (Fig. 4B). The locations of random outliers are generated by randomly selecting a certain proportion of snapshots and a certain proportion of data on those snapshots, whereas the locations of concentrated outliers are generated by selecting consecutive snapshots and consecutive data on those snapshots, both with random starting points. Selected data are replaced by values randomly sampled from the entire time series to generate outliers. Figure 4A and B only presents the outliers on positions for ease of understanding. The experiments are repeated 50 times on a chain topology compounded with an ER network with a connection probability of 0.02.

Figure 4C and D shows the AUC trends with respect to the proportion of snapshots containing outliers (\(p_1\)) and the proportion of outliers on each snapshot (\(p_2\)), respectively. Clearly, the larger the proportion of outliers, the poorer the performance of K-similarity. As the total proportion (\(p_1 \times p_2\)) grows to 100%, ROC AUC drops to 0.5 and PR AUC drops to close to 0. Comparing Fig. 4C and D, the influences of \(p_1\) and \(p_2\) are similar when both are smaller than 0.4, but the situations diverge as they grow larger. In random cases, increasing \(p_2\) does not affect the performance of K-similarity as much as increasing \(p_1\), but the effect of \(p_2\) is greatly amplified when \(p_1=1\). That is, for random outliers, the proportion of snapshots is the more important factor until it reaches 1, after which the effect of the proportion of outliers on a snapshot starts to show.

In contrast, both \(p_1\) and \(p_2\) affect the performance of K-similarity with concentrated outliers. When \(p_2\) is fixed and small and \(p_1\) gradually increases, the performance first decreases and then increases slightly in Fig. 4C. This interesting phenomenon may be explained by mutual cancelation caused by concentrated outliers. Concentrated outliers imply that a node with a wrong velocity usually also has a wrong position. Therefore, applying K-means to such a snapshot may give this node completely wrong connections to other nodes. But when most of the snapshots have several completely wrong nodes connected to other nodes, the adverse effects may cancel out due to the averaging of links in K-similarity. In general, K-similarity is more stable to concentrated outliers than to random outliers. We compare the performance of K-similarity with the benchmark methods under random outliers in section Comparison performance.

Comparison with benchmark methods

The accuracy of K-similarity is tested on the simulated CS model by comparing PR and ROC AUC with four benchmark methods. This section introduces the four methods and then presents the performance on simulated datasets with and without outliers.

Benchmark methods

We compare K-similarity with four of the most commonly used and classical similarity measurements for the inference of multi-agent systems, as listed below.

Euclidean distance (ED): The Euclidean distance between time series X and Y of length n is defined by

$$\begin{aligned} ED(X,Y)=\sqrt{\sum _{t=1}^{n} (x_{t}-y_{t})^2}. \end{aligned}$$
(12)

Distance correlation (DC): Distance correlation, proposed by [57], is a creative way to detect dependence. For random variables (X, Y), denote the joint characteristic function of (X, Y) by \(f_{(X, Y)}\), and the marginal characteristic functions by \(f_{X}\) and \(f_{Y}\). The distance covariance between X and Y is defined as the square root of

$$\begin{aligned} d c o v^{2}(X, Y)&=\int _{R^{p+q}}\left| f_{(X, Y)}(t, s)-f_{X}(t) f_{Y}(s)\right| ^{2} \\&\quad \times w(t, s) \mathrm{d} t \mathrm{d} s \text{, } \end{aligned}$$

where p and q are the dimensions of X and Y, respectively, and w(t, s) is the weight function given by \(\left( c_{p} c_{q}|t|_{p}^{p+1} |s|_{q}^{q+1}\right) ^{-1}\) with \(c_{p}=\pi ^{(1+p) / 2} / \Gamma ((1+p) / 2)\) and \(c_{q}=\pi ^{(1+q) / 2} / \Gamma ((1+q) / 2)\). By standardizing the distance covariance, the distance correlation is defined as

$$\begin{aligned} d c o r(X, Y)=\frac{d c o v(X, Y)}{\sqrt{d c o v(X, X)} \sqrt{d c o v(Y, Y)}}. \end{aligned}$$

We adopt R package energy [58] to calculate DC.

Dynamic time warping (DTW): Given time series \(X=\{x_1,x_2,\dots ,x_n\}\) and \(Y=\{y_1,y_2,\dots ,y_m\}\), DTW is defined recursively by

$$\begin{aligned}&DTW(X,Y) = \left\{ \begin{aligned}&0,&\text {if } n = 0 \text { and } m = 0; \\&\infty ,&\text {if } n = 0 \text { or } m = 0; \\&d(H(X),H(Y)) + \min \big ( DTW(X,R(Y)),\\&\quad DTW(R(X),Y), DTW(R(X),R(Y))\big ),&\text {otherwise}, \end{aligned} \right. \end{aligned}$$
(13)

where H(X) denotes the head \(x_1\) of a series and R(X) the remainder \(\{x_2,\dots ,x_n\}\). DTW is calculated with the R package dtw [59], which implements the equivalent dynamic program sketched below.
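The recursion in Eq. (13) is exponential if evaluated naively; the standard dynamic-programming form, transcribed below under our own naming, runs in O(nm) with the Euclidean norm as the local distance d.

```python
import numpy as np

def dtw(X, Y):
    """O(nm) dynamic-programming form of Eq. (13); rows are time steps,
    and the local distance d is the Euclidean norm."""
    X = np.asarray(X, dtype=float).reshape(len(X), -1)
    Y = np.asarray(Y, dtype=float).reshape(len(Y), -1)
    n, m = len(X), len(Y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0                      # both series empty
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(X[i - 1] - Y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```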

Mutual information (MI): The MI of discrete random variables X and Y is defined as

$$\begin{aligned} I(X, Y)=\sum _{x \in X, y \in Y} p(x, y) \log \frac{p(x, y)}{p_{X}(x) p_{Y}(y)}, \end{aligned}$$

where \(p_{X}, p_{Y}\) are the marginal probability mass functions of X and Y, respectively, and p is the joint probability mass function of (XY). For continuous random variables, the MI is defined as

$$\begin{aligned} I(X, Y)=\int _{X} \int _{Y} p(x, y) \log \frac{p(x, y)}{p_{X}(x) p_{Y}(y)} \mathrm{d}x \mathrm{d}y, \end{aligned}$$

where p is the joint probability density function of (X, Y), and \(p_{X}\) and \(p_{Y}\) are the marginal probability density functions of X and Y, respectively. MI is calculated with the R package infotheo [60].
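In practice, MI between continuous series is estimated by discretizing first, as infotheo does. A minimal sketch might look as follows (equal-width binning is used here for brevity; infotheo defaults to equal-frequency binning, and the function name is ours).

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def mi(x, y, bins=10):
    """Plug-in MI estimate (in nats) after discretizing both series."""
    xd = np.digitize(x, np.histogram_bin_edges(x, bins))
    yd = np.digitize(y, np.histogram_bin_edges(y, bins))
    return mutual_info_score(xd, yd)
```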

Comparison performance

We compare the PR AUC and ROC AUC of our method with the four benchmark methods under both fixed and time-varying topologies. Cases with and without outliers are also discussed, where 10% random outliers are assumed to exist. Furthermore, online inference of switching topologies with three different fixed topologies at each stage is tested. With the above settings, we comprehensively compare the accuracy and stability of each method over 50 repeated experiments.

Fig. 5: Comparison of methods in fixed, switching, and time-varying topology. A Comparison with four benchmark methods in fixed topology, B online inference in switching topology, C comparison with four benchmark methods in time-varying topology

Figure 5A presents the results of inferring fixed topologies with different densities. As shown in Fig. 3, the K-random strategy is as good as K-random*N/2 for long time series and requires minimal computation; therefore we apply K-random here for comparison with the benchmark methods. The results show an overall downward trend as topological density increases, which can be explained by the increased synergy at higher densities resulting in indistinguishable trajectories. The ROC AUC of all methods gradually decreases to 0.5, indicating the failure of all methods beyond a certain density. However, most of the PR AUC curves begin to rise as density increases while the ROC AUC curves continue to decline. The reason is that ROC AUC is robust to an uneven distribution of positive and negative samples under random sampling, while PR AUC is not: a growing density means a growing number of positive samples, and the calculation formula then leads to an increase in the PR AUC value.

Fig. 6: Performance of GMM-based threshold strategy

Figure 5B shows the effectiveness of K-similarity on switching topologies. We apply K-random and K-random*N/2 with a window size of 5 to infer the topology online. Since the switching time points are assumed to be unknown in advance, a large window size may increase the inference error near a switching point. The results show that, in the online inference of relatively stable topologies, K-random*N/2 is more accurate and robust than K-random. The fluctuation of the velocity variance shows that when the velocity fluctuates too sharply or nearly converges, it is difficult to infer the network. In contrast, K-similarity is well suited to the regime where the swarm maintains a certain stability without full convergence, which is also the main feature of collective motion, especially for UAV swarms.

Figure 5C reveals the patterns of different methods inferring time-varying topologies with and without outliers. The K-similarity methods show consistently high accuracy under both conditions, while distance-based methods such as ED and DTW perform almost perfectly on ideal simulated data without outliers but decline considerably when outliers exist.

Comparing Fig. 5A and C, K-similarity outperforms the four benchmark methods in most conditions. Especially when 10% random outliers are added, all benchmark methods decline and fluctuate to varying degrees except DC, which remains stably low; K-similarity, in contrast, remains the top performer with only a slight decline, showing strong stability against outliers. Comparing Fig. 5B and C, K-random and K-random*N/2 perform similarly on time-varying topologies but differ on switching topologies, demonstrating that repetition does not contribute to accuracy when the topology changes quickly.

GMM-based threshold for K-similarity

The GMM-based threshold strategy models the weight distribution of the K-similarity matrix by two Gaussian distributions: the distribution with higher weights is considered to contain the potential links, while the one with lower weights is regarded as the result of random effects. The intersection point of the two distributions is designated as the threshold. We conduct experiments to illustrate the effectiveness of the threshold on the K-similarity matrix, as shown in Fig. 6. PR AUC, ROC AUC, and the Adjusted Rand Index (ARI) are applied to measure the performance of the threshold strategy. The value range of ARI is \([-1, 1]\); the larger the value, the better the performance. Comparison with other methods after thresholding is not presented, as the GMM-based threshold strategy is specially developed for the K-similarity matrix. The experiments are repeated 50 times on a chain topology compounded with an ER network with a connection probability of 0.01.

In Fig. 6, the orange line represents results from the original K-similarity matrix, and the black one represents results from the matrix after thresholding. In plot C, the orange points represent the best ARI value over all possible thresholds of a K-similarity matrix, while the black points represent the ARI value of the GMM-based threshold. The results indicate that the threshold strategy performs well with respect to PR AUC, ROC AUC, and ARI. Specifically, the matrix after thresholding still holds most of the interaction information, achieving a trade-off between sufficient information and measurement error.

Experiment on real datasets

In this section, we apply K-similarity and the GMM-based threshold strategy to two real datasets to test the efficiency of our method.

Pigeons

Interaction topologies have been shown to exist in pigeon, starling, and other bird flocks. Agents in bird flocks follow simple interaction rules such as long-range attraction, short-range repulsion, and intermediate-range alignment [22]. Therefore, we can infer their interaction relationships by measuring their behavioral similarity. To reveal the effectiveness of K-similarity for general multi-agent systems, we experiment on a real dataset of homing pigeons from [38] to infer the interaction topology among agents. Research [38] measures time delays to obtain a leadership hierarchy, which is transformed into an undirected interaction topology in this paper, as shown in plot A of Fig. 7. The GPS data adopted here are from the experiment of flock AB, site 2, release 7, with both experienced and inexperienced pigeons in the group; the flying distance is around 10.11 km. Each node is labeled with the code of the corresponding pigeon. The undirected interaction topology is regarded as the true interaction topology of the pigeons, which remains for us to infer. Plot B of Fig. 7 shows the flocking trajectories, most of which overlap with each other due to the long tracking range; the x-axis is the latitude and the y-axis is the longitude.

Fig. 7: Topologies and trajectories of the pigeons. A Interaction topology to be inferred, B trajectories of the nine pigeons, C topology inferred from K-similarity with GMM-based threshold strategy

Here, we mainly adopt position (latitude and longitude) and speed (magnitude and direction) to infer the interaction topology of the pigeon flock. First, agents with closer interactions tend to communicate more frequently and are therefore more likely to have similar speeds than other agents. Second, whether a group adopts a broadcasting or an energy-saving strategy to maintain its interactions, this can be classified as a kind of location preference; that is, agents in a group tend to link to agents that are geographically close to them. We infer the interaction topology in two steps: similarity measurement and thresholding. To begin with, K-similarity is calculated to measure the global relationship of all agents and is compared with the four benchmark similarity measurements mentioned above. Then we select the possible links according to the GMM-based threshold strategy in section Threshold strategy, thereby obtaining the inferred interaction topology. All data are z-scored to avoid dimensional effects; the whole pipeline is sketched below.
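Putting the pieces together, the pigeon experiment can be sketched end to end as follows, reusing the hypothetical k_similarity and gmm_threshold functions from the earlier sketches. The file name and the array layout are placeholders, not the actual data format.

```python
import numpy as np

# Hypothetical end-to-end run on the pigeon data. `data` is assumed to
# have shape (T, N, 4): latitude, longitude, speed magnitude, and
# direction per bird, one snapshot per GPS fix.
data = np.load("pigeons_AB_site2_release7.npy")        # placeholder path
data = (data - data.mean(axis=0)) / data.std(axis=0)   # z-score each feature

P, N = data.shape[0], data.shape[1]
ks = np.random.default_rng(0).integers(2, N, size=P)   # K-random strategy
S = k_similarity(data, ks)                             # similarity matrix
threshold, A = gmm_threshold(S)                        # inferred topology
```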

Fig. 8: Topology inference of pigeon flocking. A PR AUC of different methods, B ROC AUC of different methods, C GMM-based threshold of K-similarity matrix

Figure 8 presents the comparison between K-similarity and the four other methods in terms of PR and ROC AUC, as well as the performance of the GMM-based threshold on K-similarity. K-similarity shows an evident advantage in accuracy, outperforming the second-best method, MI, by 0.05 in PR AUC and 0.1 in ROC AUC. Unlike the performance on the simulated data in Fig. 5, MI performs better than DC and DTW here, similar to the simulated experiment with outliers. The GMM-based threshold strategy also gives a clear division of the weight distribution in plot C of Fig. 8: the solid line represents the threshold with the highest ARI, and the dotted line represents the threshold determined by the GMM-based method. The inferred topology after thresholding is shown in plot C of Fig. 7, where thick lines indicate stronger interactions and thin lines weaker ones. As we can see, most of the important links are recovered, and the incorrectly inferred links have weaker weights than the others. The experiments on real datasets reveal the generality and adaptability of K-similarity.

Temperature

To demonstrate the applicability of our method to more general cases, we apply K-similarity and the GMM-based threshold to a real dataset consisting of 5844 daily air temperature measurements in the US from 2000 to 2015 [61]. The temperature data used here are univariate; the dataset has also been analyzed in [62, 63]. K-similarity is applied to infer the relationship among 49 states from surface temperature measured in degrees Kelvin (abbreviated degK in Fig. 9). The temperature patches shown are drawn from one of the 5844 days.

Fig. 9: Topologies inferred from temperature time series

We test K-similarity with different k values. The results in Fig. 9 illustrate that different k values detect different scales of information: the larger the k, the closer and smaller-scale the relationships it detects, and the more skewed the random effect is. When \(k=2\), the binomial distribution of the random effects is close to a Gaussian distribution with mean 1/2; therefore, the random effects overwhelm the useful information. The result for \(k=5\) best matches the information we normally receive. This does not mean that the performance for \(k=10\) and \(k=20\) is poor, but rather that the detection scale of a larger k is smaller and the detected links are more closely related. Overall, the inferred topologies tend to link neighboring states, consistent with the common sense that neighboring states tend to have similar temperatures.

In this section, we tested our method on a homing pigeon dataset with multivariate time series and a temperature dataset with univariate time series. The results show that our method measures similarity accurately and identifies important connections, which may contribute to similarity measurement and topology inference in various application fields.

Discussion

In this section, we clarify the origin of the ideas in this paper and discuss their risks, restrictions, and possible decisions to minimize them. The idea of developing K-similarity to infer the topology of a multi-agent system is inspired by our previous work [47], which integrates the K-means clustering results on each snapshot into a matrix for time series clustering. The integrated matrix contains global relationships among individuals, which prompted us to explore the possibility of using it for topology inference. Therefore, we further develop the clustering method in [47] into K-similarity for the global relative similarity measurement of multi-agent systems.

There are various models for constructing multi-agent systems. We mainly concentrate on flocking, which is a fundamental behavior of multi-agent systems. Flocking behaviors are modeled by classic models including the Vicsek, Boids, and CS models. In real-world control and natural multi-agent systems, movement is controlled by acceleration; hence we use the second-order CS model, which models acceleration directly, to describe multi-agent system behaviors. However, the CS model is a set of highly abstract mathematical functions that does not consider the specific control problems of multi-agent systems, so the data generated by the CS model are ideal compared to real-world data. To better test K-similarity in less ideal situations, we apply it to the CS model with outliers and to real datasets, demonstrating its generality. In addition to flocking, multi-agent systems exhibit a variety of behavior patterns. K-similarity cannot solve the topology inference problem for all kinds of multi-agent systems; it aims at data from consensus or synchronized dynamic network systems.

K-similarity essentially evolves from ED and therefore has similar properties: it only compares data with the same timestamp, ignoring time delays and other complex factors such as positive and negative feedback. We do not compare our method with other complex topology inference techniques, since K-similarity is developed as a new kind of similarity or distance measurement. Instead, we compare K-similarity with ED to check whether our method improves performance, with DTW, which is designed for time series with time delays, with DC, which measures correlations between multivariate time series, and with MI, which is the most common tool for detecting distributional information for interaction inference. These four benchmark methods also have mature R packages for standard computation, and each sets a baseline for detecting a different type of information used to infer interactions. However, the property of detecting global relative similarity also limits K-similarity to systems with a known set of time series. The dynamic entry and exit of newly arriving time series is discussed as future work in the Conclusion, which will help K-similarity be applied to more scenarios as a general similarity or distance measure.

We use K-similarity to measure the strength of interactions between individuals, obtaining a weight matrix of interactions. Without prior knowledge such as network density, a criterion needs to be developed for selecting important edges; the general practice is to apply a threshold. It is necessary to carefully examine the characteristics of the matrix obtained by K-similarity and to develop a method for determining the threshold. We theoretically analyze the weight distribution of the K-similarity matrix for random systems and find that it is close to a right-skewed binomial distribution with success probability \(1/k \le 1/2\). We assume that truly existing interactions have large weights in the K-similarity matrix, forming a left-skewed component. This assumption is consistent with the experimental results, which are mostly bimodal. Therefore, we developed the GMM method to distinguish between random factors and real interactions. The risk is that when the value of k is relatively small, such as 2, and P is large, the binomial distribution caused by random factors is close to a Gaussian distribution with mean 0.5, which makes the weight distributions of random effects and real interactions indistinguishable. In the temperature experiment, we do observe this feature when \(k=2\). To minimize this risk, it is recommended to use a larger k (\(k\ge 4\)) when k is fixed.

Finally, we illustrate the effectiveness of our method on two real datasets. For natural flocking systems, no ground-truth topology is available as the standard answer. Therefore, we use a pigeon dataset with a known topology that was itself inferred from tracking data by a different method in [38]. The pigeon data agree well with the flocking behavior defined in this paper and thus help demonstrate the validity of K-similarity and its threshold strategy. For artificial multi-agent systems, a real topology does exist, but we have not conducted real experiments in this area, nor have we obtained such data from other research. To demonstrate the generalizability of our algorithm, we apply K-similarity to a temperature dataset to infer a topology among states by measuring the similarity of temperature time series. The results show that the inferred topologies are highly correlated with geographic locations as well as climate zones, which agrees with common sense.

Conclusion

In this paper, we propose a new method named K-similarity to measure global relative relationships in a multi-agent system. K-similarity is a networked synthesis of clustering results from each snapshot: multivariate time series are split into snapshots, on which K-means clustering is implemented with various K strategies available; the clustering results on each snapshot are then nonlinearly mapped into a network, and together they form a synthetic network. The link weights of this network contain the global relative similarity between agents, i.e. the K-similarity. The analysis establishes the following properties:

\(\bullet \) K-similarity can be converted to a strict distance, and therefore has the potential to be applied in a variety of scenarios;

\(\bullet \) It is stable to outliers including random and concentrated outliers since the mapping rules limit the influence of outliers to a normal range;

\(\bullet \) We discuss the weight distribution of a trivial case with completely random trajectories, and derive a GMM-based threshold strategy to select informative links from the K-similarity matrix;

\(\bullet \) Different K strategies can detect similarities at different scales, which provides a possible tool for building higher-order networks and multi-scale detection.

Experiments are executed on both simulated and real datasets to verify its superiority over four benchmark similarity measurements and the effectiveness of the GMM-based threshold. K-similarity provides a framework that enriches the definition of similarity in a flexible paradigm. The current method aims at systems with a constant number of time series; in the future, a dynamic entry and exit mechanism for time series is to be designed. Assuming that newly arriving time series have a negligible effect on the overall distribution, a new series could be added by joining the closest cluster on each snapshot, and removed simply by deleting its row and column in the similarity matrix. This mechanism will help K-similarity be applied to more scenarios as a general similarity or distance measure.