Introduction

Multi-agent systems are common in nature, for example bird flocks [1], fish schools [2], insect swarms [3], and mammal herds [4]. Inspired by natural multi-agent systems, which achieve complicated functions through the collaboration of simple intelligence, research on artificial multi-agent systems [5, 6] is in progress, for instance terrestrial robot swarms [7] and UAV swarms [8]. Flocking is one of the most important characteristics of a multi-agent system and comprises three aspects: decentralization, aggregation, and movement matching [9]. A multi-agent system mainly relies on a collaborative mechanism, including interactions and dynamics, to realize flocking actions [10, 11]. Interaction structures are mainly divided into three kinds, fixed topologies [12], switching topologies [13], and time-varying topologies [14], which have been studied for years together with their consensus conditions [15, 16]. Besides, biological observations indicate that all three kinds of interaction structure exist in natural flocks [17], e.g. the fixed topology that exists stably in pigeons [18] and starlings [19], and the time-varying topology of cooperating within a specific metric distance [20]. Specifically, jackdaws are found to interact through a stable topology when moving to roosts, while in collective anti-predator mobbing events they interact based on metric distance [20]. Namely, interaction rules switch under different situations. Inferring interaction structures is a crucial problem in revealing the laws of multi-agent systems [21, 22]. The characteristics of decentralization, aggregation, and movement matching make agents in a flock maintain behavioral patterns similar to those of the neighbors they interact with. Therefore, the interaction topology of a multi-agent system can be inferred from collective behaviors [23,24,25] by measuring the similarity of multivariate time series such as trajectories and directions in the flock.

Similarity measurement is the most widely used and fundamental approach to inferring interaction topology [26], especially in the fields of finance [27], climate [28], neuroscience [29], medicine [30], and trajectory data mining [31, 32]. Inferring interaction relationships from flocking time series is one such similarity measurement problem, and the data have the following three characteristics. First, the data contain information on trajectories, speeds, directions, etc., and are thus multivariate, which suffers from the curse of dimensionality. Second, the high synergy of flocking results in highly similar trajectories, which requires a high-resolution measurement. Third, the flocking process often generates a large amount of data, which requires the similarity measurement to be computationally simple yet robust to outliers and noise without additional processing. Therefore, a suitable similarity measurement should be applied to infer the interaction topology from the multivariate time series data. Linear similarity measurements include Euclidean Distance (ED), the Pearson correlation coefficient [33], Escoufier’s RV coefficient [34], and Spearman’s rank correlation [35]. Nonlinear similarity measurements include Mutual Information (MI) [36], partial correlation [37], lead-lag correlation [38], Distance Correlation (DC) [39], Dynamic Time Warping (DTW) [40], and Sparse Principal Component Analysis (SPCA) [41]. However, some of these methods, such as the Pearson correlation coefficient and partial correlation, cannot deal with multivariate time series. Among the similarity measurements for multivariate time series, ED is the most popular and has linear complexity, but it performs poorly with high-dimensional data and outliers. MI can test the independence between variables, but it may not correctly measure the strength of a relationship [42]. DTW can handle shifting and scaling of the shape and is a benchmark in the literature [43]. DC detects nonlinear dependence and has potential in deep learning [44]. But DTW, DC, and MI are all computationally expensive. The Extended Frobenius norm (Eros) [45] measures the correlation coefficient matrix but neglects the time order of the time series. Meanwhile, there are also many proposals to address the deficiencies of these classical similarities [46].

Nonetheless, improving the definition of similarity is still an important open problem. The measurements mentioned above give absolute similarities between time series, while in many multi-agent systems only relative relationships are required. To meet the requirements of multivariate calculation, high resolution, and low, stable computation cost posed by flocking data, we propose a new kind of global relative similarity named K-similarity, which can deal with both univariate and multivariate time series data. It is developed from the ONIT clustering framework [47], whose high accuracy, update ability, and fault tolerance are inherited by K-similarity. More specifically, K-similarity is computed in three steps to infer the interaction structure of a multi-agent system: (1) cut the multivariate time series into snapshots through specific criteria and cluster the data on each snapshot; (2) construct an overlay network by merging the consecutive networks generated from the clustering results on each snapshot; (3) transform the weights of the overlay network into a similarity matrix. In summary, the main contributions of this paper are as follows:

\(\bullet \) A global relative similarity named K-similarity is proposed. Its unique structure endows K-similarity with the ability to process multivariate data with high resolution, low computation cost, and stability towards outliers, including both concentrated and random outliers;

\(\bullet \) Potential properties of K-similarity are explored. Various K strategies are available for measuring K-similarity, with which multi-scale information can be detected. Besides, K-similarity has the potential to transform into a strict distance;

\(\bullet \) A Gaussian Mixture Model (GMM)-based threshold strategy is designed for the K-similarity matrix, assuming that the weight distribution is composed of existing links and random effects, which is interpretable and applicable in practice;

\(\bullet \) Extensive tests are conducted to compare K-similarity with four benchmark measurements and verify its superiority. The results show that K-similarity achieves high accuracy and stability on both simulated and real datasets.

The rest of the article is organized as follows. The next section introduces the definition of K-similarity and its K strategies. The subsequent section describes the characteristics of K-similarity and its threshold strategy, after which experiments testing K-similarity under various conditions are conducted on a simulation model of a multi-agent system and compared with four other similarity methods. Then K-similarity and the threshold strategy are applied to a real pigeon dataset. The penultimate section clarifies the origin of the ideas in this paper and discusses their risks, restrictions, and possible decisions to minimize them. The final section concludes the paper and proposes directions for future work.

Formulation of K-similarity

Fig. 1: Overview of the method

K-similarity is proposed in this paper to measure the global relationship of all agents. An overview of the method is shown in Fig. 1. To begin with, the time series of trajectories are divided into snapshots, on which K-means clustering is applied. The clustering results on each snapshot are transformed into a network of k connected branches. Then, the networks of all snapshots are merged into an overlay network, whose adjacency matrix forms the K-similarity matrix. Finally, the inferred topology is obtained by applying the GMM-based threshold strategy to the similarity matrix. In addition, situations with outliers and different choices of K strategy are discussed.

In this section, we mainly describe the definition of K-similarity and its K strategies. Outliers and the threshold method are discussed in section Features of K-similarity. K-similarity is named after K-means clustering, which is the fundamental step applied on every snapshot. The details are as follows.

Definition

Let \({\mathcal {L}}=\{L_1,L_2,\dots ,L_N\}\) be the set of time series of the N agents in the multi-agent system, where \(L_i = \{l_0^i, l_1^i, \cdots , l_{\tau }^i \}, i = 1, 2, \cdots , N\), is the time series of agent i consisting of observations at times \(t = 0, \cdots , \tau \), and \(l_t^i\) is a multidimensional observation of agent i at time t.

Following the framework in [47], the time series are first divided into snapshots \(S_p, p = 1, 2, \cdots , P\), where \(S_p\) consists of points \(l_{t_i}^i, i = 1,2,\cdots ,N\), one from each time series \(L_i\). After taking the snapshots, we cluster the nodes on each snapshot by K-means, obtaining \(k_p\) clusters on the pth snapshot, namely \(c^p=\{c_1^p,c_2^p,\dots ,c_{k_p}^p \}\), such that \(S_p=\cup _{i=1}^{k_p}c_i^p\) and \(c_i^p \cap c_j^p=\emptyset \) for \(i \ne j \). A network \(g^p\) of the pth snapshot is constructed by fully connecting the nodes within each cluster and leaving nodes in different clusters disconnected, hence forming \(k_p\) connected branches. Summing up the adjacency matrices of the networks \(g^p\) with assigned weights, we obtain an overlay network with weights

$$\begin{aligned} G_{ij}=\sum _{p=1}^P w_p g_{ij}^{p}, \end{aligned}$$
(1)

where

$$\begin{aligned} g_{ij}^p=\left\{ \begin{aligned} 1,&\quad i\in c_k^p \text { and } j\in c_k^p \text { for some } k; \\ 0,&\quad \text {otherwise}, \end{aligned} \right. \end{aligned}$$
(2)

and \(w_p\) is the weight with which the pth network contributes to the sum.

Without loss of generality, we set \(w_p=1/P\). Multiplying the adjacency matrix G by P, we obtain

$$\begin{aligned} PG = \begin{pmatrix} P &{} &{} &{} PG_{ij}\\ &{} P &{} &{}\\ &{} &{} \ddots &{} \\ PG_{ij} &{} &{} &{} P\\ \end{pmatrix}, \end{aligned}$$

where \(G_{ij} \le 1\) and \(PG_{ij} \le P\).

Then we define the global K-similarity matrix of the N nodes to be

$$\begin{aligned} S = \frac{1}{P+1} \begin{pmatrix} P+1 &{} &{} &{} PG_{ij}\\ &{} P+1 &{} &{}\\ &{} &{} \ddots &{} \\ PG_{ij} &{} &{} &{} P+1\\ \end{pmatrix} = \begin{pmatrix} 1 &{} &{} &{} \frac{P}{P+1}G_{ij}\\ &{} 1 &{} &{}\\ &{} &{} \ddots &{} \\ \frac{P}{P+1}G_{ij} &{} &{} &{} 1\\ \end{pmatrix}, \end{aligned}$$

which guarantees that the self-similarity of every node is 1 while all other entries remain less than 1.

The similarity matrix S defined above satisfies the following properties: (1) all diagonal entries are equal to 1; (2) all off-diagonal elements are real numbers in the closed interval [0, 1]; (3) the matrix is symmetric.
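For concreteness, the following sketch shows how the matrix above can be computed. It is a minimal illustration under our own assumptions, not the reference implementation: the function name k_similarity, the array shapes, the use of scikit-learn's KMeans, and treating each time step as one snapshot are all choices made for this sketch; the per-snapshot cluster counts \(k_p\) are supplied externally (strategies for choosing them follow in the next subsection).

```python
import numpy as np
from sklearn.cluster import KMeans

def k_similarity(series, ks, seed=0):
    """Compute the K-similarity matrix S (Eqs. 1-2).

    series : array of shape (P, N, d), one d-dimensional observation
             per agent on each of the P snapshots.
    ks     : length-P sequence of cluster counts k_p (any K strategy).
    """
    P, N, _ = series.shape
    G = np.zeros((N, N))
    for p in range(P):
        labels = KMeans(n_clusters=int(ks[p]), n_init=10,
                        random_state=seed + p).fit_predict(series[p])
        # g^p: nodes sharing a cluster on snapshot p are fully connected.
        g = (labels[:, None] == labels[None, :]).astype(float)
        G += g / P                     # overlay with weights w_p = 1/P (Eq. 1)
    S = (P / (P + 1)) * G              # scale off-diagonal entries
    np.fill_diagonal(S, 1.0)           # self-similarity equals 1
    return S
```

Note that the overlay network is accumulated incrementally, so the memory footprint stays at \(O(N^2)\) regardless of the number of snapshots.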

The choice of K strategy

The value of k directly determines the number of clusters on each snapshot. Nodes that are clustered into the same group are considered identical; that is, the space on each snapshot is separated into k areas, the similarity within each area is designated 1, and the similarity between areas is 0. Therefore, the value of k plays the role of a ruler that decides the scale of the similar structures we can detect: the larger k is, the smaller the structures we can find. In this sense, we give several strategies for determining k according to different application scenarios, some of which have a fixed similarity scale, while others have changing similarity scales. These K strategies will be tested in section K-random strategy.

K-fixed strategy: We appoint a fixed k for every snapshot, detecting small-scale similarity structures with a larger k and larger structures with a smaller k. Namely,

$$\begin{aligned} k_p = k_{fixed}. \end{aligned}$$
(3)

The K-fixed strategy has limitations in detecting all scales of similarity, but it provides an accurate measurement at a specific scale.

K-random strategy: The choice of k can vary across snapshots, measuring multiple levels of similarity. The simplest yet effective method is to select k randomly from a set of positive integers. For example,

$$\begin{aligned} P\{k_p=k|k=2,3,\dots ,N-1\}=\frac{1}{N-2}. \end{aligned}$$
(4)

Randomness affects the performance of the K-random strategy on short time series, but shows little effect on longer ones.

K-adaptive strategy: Let the K-means algorithm determine its own choice of k on each snapshot by the gap statistic or the Average Silhouette Width (ASW). Taking ASW as an example:

$$\begin{aligned} k_p =\underset{k}{\arg \max } \ \mathrm {ASW}(S_p, k). \end{aligned}$$
(5)

The K-adaptive strategy only reveals the most conspicuous scale of the data at a micro level, which either points the right way or deviates considerably.

K-times strategy: Different values of k detect similarity at different scales. Therefore, applying K-means clustering several times on each snapshot, each time with a different k (e.g. \(k = 2,\dots ,N-1\)), enables us to obtain multi-scale similarity information. It can be combined with both the K-fixed and K-random strategies. There are numerous potential combinations of the K-fixed and K-times strategies, making it difficult to choose broad-spectrum parameters for general applications. Therefore, in this paper the K-times strategy is used to enhance the K-random strategy in the following experiments for maximum performance, especially on short time series; the basic strategies are sketched below.
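The strategies above reduce to choosing the sequence \(k_1,\dots ,k_P\) fed into the clustering step. A hedged sketch of the first three follows; the function names and the use of scikit-learn's silhouette_score are our own assumptions, and the K-times strategy would simply repeat one of these choices several times per snapshot and average the resulting similarity matrices.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

def k_fixed(P, N, k=5):
    # Eq. (3): the same k on every snapshot.
    return [k] * P

def k_random(P, N):
    # Eq. (4): k drawn uniformly from {2, ..., N-1} for each snapshot.
    return rng.integers(2, N, size=P).tolist()

def k_adaptive(snapshot, k_max):
    # Eq. (5): the k maximising the Average Silhouette Width on one snapshot.
    best_k, best_asw = 2, -1.0
    for k in range(2, k_max + 1):
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(snapshot)
        asw = silhouette_score(snapshot, labels)
        if asw > best_asw:
            best_k, best_asw = k, asw
    return best_k
```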

This new definition of similarity enriches the toolbox of similarity measurement. First, the snapshots reduce the dimension of the time series, allowing multivariate data to be computed on one snapshot. Second, the choice of k values makes it possible to explore similarities at different scales, improving accuracy. Third, the nonlinear map from distances to a network excludes the excessive influence of outliers, which is discussed in the next section.

Features of K-similarity

In this section, we explore three features of K-similarity. Firstly, it can be transformed into a strictly defined distance, which has the potential to expand its application to various scenarios. Secondly, the nonlinear mapping on each snapshot and the unique structure of the network keep K-similarity stable against outliers. Thirdly, the weight distribution under the trivial condition is explored, which inspires the GMM-based threshold strategy.

Transformation into distance

Similarity has a close relationship with distance. The difference is that distance is a fundamental and general concept with a restricted definition, while the meanings of similarity are much richer and applicable to more scenarios. However, the K-similarity defined above can also be transformed into a strict distance, expanding its applications in data analysis with theoretical support. Different from the common transformation between similarity and distance, \(s = 1/(1+d)\), the distance between nodes i and j derived from K-similarity is given by

$$\begin{aligned} d_{ij} =1-s_{ij}, \end{aligned}$$
(6)

where \(s_{ij}\) is the corresponding element of the global similarity matrix S. The definition in Eq. (6) satisfies the principle of simplicity and meets the requirements of a distance:

(1) The identity axiom: \(d_{ij}=0\) iff \(i=j\). This holds by the definition of the matrix S, whose diagonal entries equal 1 while all off-diagonal entries are strictly less than 1.

(2) The triangle inequality: \(d_{ij}+d_{jk} \ge d_{ik}\).

Proof. The inequality clearly holds when \(i=j\), \(j=k\), or \(i=k\). For \(i \ne j \ne k\), substituting \(d_{ij} = 1-s_{ij}\) shows that \(d_{ij}+d_{jk} \ge d_{ik}\) is equivalent to \(1-s_{ij}-s_{jk} \ge -s_{ik}\), i.e., \(1+s_{ik} \ge s_{ij} +s_{jk}\). Multiplying both sides of this inequality by \(P+1\), it suffices to show \(P+1+PG_{ik} \ge PG_{ij}+PG_{jk}\), where \(PG_{ij}\) is the accumulated number of times nodes i and j are clustered into one group. Suppose nodes i, j, and k have simultaneously been clustered into one group m times; then the target inequality can be written as \((P-m)+1+(PG_{ik}-m) \ge (PG_{ij}-m)+(PG_{jk}-m)\). Because \(PG_{ij}-m\) and \(PG_{jk}-m\) count disjoint events among the remaining \(P-m\) clusterings (if i, j and j, k were grouped together on the same snapshot, all three would be in one cluster and hence counted in m), we have \(PG_{ij}-m+PG_{jk}-m\le P-m\). Since \(PG_{ik}-m \ge 0\), it follows that \(P+1+PG_{ik} \ge PG_{ij}+PG_{jk}\), and thus \(d_{ij}+d_{jk} \ge d_{ik}\) holds.

(3) The symmetry axiom: \(d_{ij}=d_{ji}\). This holds due to the symmetry of the matrix S.

Stability against outliers

The nonlinear mapping from distances on each snapshot to a network makes K-similarity good at handling outliers in the time series, as described in [47].

According to the definition, the entries of the K-similarity matrix lie between 0 and 1. We have

$$\begin{aligned} s_{min} = 0,\quad s_{max} = 1,\quad \text {and}\quad 0 \le \frac{P}{P+1}G_{ij} <1. \end{aligned}$$
(7)

A similarity of \(\frac{P}{P+1}G_{ij}\) means that nodes i and j are clustered into one group \(PG_{ij}\) times out of the total P clusterings, where P is the number of snapshots. Table 1 lists the range of similarity values under different numbers of snapshots P. The distance \(d_{ij}=1-s_{ij}\) has the same value range as the similarity.

Table 1: Similarity range under different numbers of snapshots

Suppose there is one outlier in the UAV swarm time series, i.e., one value of one node on one snapshot. This outlier then contributes at most \(\frac{1}{P+1}\) to any similarity; for example, with \(P=100\) snapshots a single outlier shifts a similarity value by no more than \(1/101 \approx 0.01\). As long as the time series is long enough, the effect of the outlier approaches 0.

Trivial weight distribution

To distinguish truly meaningful links in the K-similarity matrix, we first examine the similarity matrix derived from a random system. Assume that the time series of the multi-agent system are completely random; that is, in a period of time \(\tau \) we obtain P snapshots, and the data on each snapshot are randomly distributed. Furthermore, we assume the data on different snapshots are independent, in contrast to the Markov property obeyed by real multi-agent systems. Then any node i is connected with node j on a snapshot with probability

$$\begin{aligned} P(g_{ij}^p = 1) = kP(i \in C_k)P(j \in C_k) = \frac{1}{k}. \end{aligned}$$

Therefore,

$$\begin{aligned} P(g_{ij}^p=0)=1-\frac{1}{k} = \frac{k-1}{k}. \end{aligned}$$

Now we have P independent snapshots according to the assumption. If we adopt the K-fixed strategy, the number X of snapshots on which i and j are clustered together follows a binomial distribution with success probability 1/k, so the corresponding similarity weight obeys

$$\begin{aligned} P\left( s_{ij}=\frac{X}{P+1}\right) = \left( {\begin{array}{c}P\\ X\end{array}}\right) \left( \frac{1}{k}\right) ^X \left( \frac{k-1}{k}\right) ^{P-X}. \end{aligned}$$

If we adopt the K-random strategy, which is mainly used in the following experiments, the expected success probability on each snapshot is

$$\begin{aligned}&E\left( \frac{1}{k}\right) = \left( \frac{1}{2}+\frac{1}{3}+\cdots +\frac{1}{N-1}\right) \frac{1}{N-2} \\&\quad \approx \frac{1}{N-2}\left( \ln (N-1)+\gamma -1\right) , \end{aligned}$$

where \(\gamma \approx 0.5772\) is the Euler–Mascheroni constant. The approximation holds when N is large.

From the above derivation, we may regard a similarity matrix whose weights follow such a binomial distribution as a trivial matrix whose effect should be excluded from the K-similarity matrix. Since in K-means clustering \(k\ge 2\), we have \(1/k\le 1/2\), and the binomial distribution of random results is therefore right-skewed. Using this assumption, we can design a threshold strategy for the K-similarity matrix; the binomial claim is first checked numerically below.
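The check is a short Monte Carlo experiment. The sketch below substitutes uniformly random cluster labels for actual K-means on random data, which matches the equal-probability assumption behind \(P(g_{ij}^p=1)=1/k\); all parameter values are illustrative.

```python
import numpy as np

# Monte Carlo check of the trivial weight distribution: with fully random,
# independent snapshots and a fixed k, the co-clustering count X of a pair
# (i, j) should follow Binomial(P, 1/k). Random labels stand in for K-means
# on random data, matching the equal-probability assumption above.
rng = np.random.default_rng(1)
N, P, k, trials = 20, 60, 5, 10_000

counts = np.empty(trials, dtype=int)
for t in range(trials):
    labels = rng.integers(0, k, size=(P, N))          # one labelling per snapshot
    counts[t] = np.sum(labels[:, 0] == labels[:, 1])  # X for the pair (0, 1)

print(counts.mean() / P)  # ~ 1/k = 0.2
print(counts.var() / P)   # ~ (1/k)(1 - 1/k) = 0.16; right-skewed since 1/k <= 1/2
```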

Threshold strategy

Current methods require a user-defined parameter to control the trade-off between the natural sparsity of interaction topologies and measurement error [27, 48, 49], and this parameter has a great effect on performance. Here, we develop a dedicated threshold method for K-similarity based on the right-skewed distribution of its random results. We exclude the random effects from the similarity measurement, thereby obtaining a meaningful inference of links. The K-similarity matrix measures relative distances within a group; therefore, we consider agents with relatively high weights more likely to be connected, while those with low weights are more likely to be generated by random effects.

Fig. 2: Schematic diagram of GMM-based threshold strategy

For simplicity and generality, we apply a GMM with two Gaussian components to describe the weight distribution of the K-similarity matrix. The interpretation is that one Gaussian component captures the links that possibly exist, while the other approximates the right-skewed binomial distribution of random effects. A GMM is parametrized by the mixture component weights and the component means and variances. For a GMM with two components, the components have means \(\mu _{1}\), \(\mu _{2}\) and standard deviations \(\sigma _{1}\), \(\sigma _{2}\). The mixture weights are \(\phi _{1}\) and \(\phi _{2}\), with the constraint \(\sum _{i=1}^{2} \phi _{i}=1\), so that we have

$$\begin{aligned} \begin{aligned} p(x)&=\sum _{i=1}^{2} \phi _{i} {\mathcal {N}}\left( x \mid \mu _{i}, \sigma _{i}\right) , \\ {\mathcal {N}}\left( x \mid \mu _{i}, \sigma _{i}\right)&=\frac{1}{\sigma _{i} \sqrt{2 \pi }} \exp \left( -\frac{\left( x-\mu _{i}\right) ^{2}}{2 \sigma _{i}^{2}}\right) , \\ \sum _{i=1}^{2} \phi _{i}&=1. \end{aligned} \end{aligned}$$
(8)

The intersection of the two Gaussian components defines the threshold. First, the Gaussian mixture model in the R package Mclust [50] is applied to cluster the weights into two groups. Then the nearest pair of values across the two Gaussian distributions, one from each, is found, and their mean is used as the threshold. An illustration of the GMM-based threshold strategy is presented in Fig. 2. Matrix A refers to the original adjacency matrix of the interaction topology, with 1 representing existing links and 0 non-existing ones. The similarity histogram presents the weight distribution of the K-similarity matrix, and the threshold comes from the intersection of the two Gaussian components. Matrix S refers to the K-similarity matrix, whose shading represents the link weights. The elements framed in red are the links retained after thresholding.
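A minimal sketch of this procedure follows. The paper uses Mclust in R; here scikit-learn's GaussianMixture is substituted for illustration, and the threshold is taken as the mean of the nearest pair of weights across the two fitted components, as described above.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_threshold(S):
    """Two-component GMM threshold on the off-diagonal weights of S."""
    iu = np.triu_indices_from(S, k=1)
    w = S[iu].reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, random_state=0).fit(w)
    labels = gmm.predict(w)
    lo, hi = sorted(range(2), key=lambda c: gmm.means_[c, 0])
    # Nearest pair of weights across the two components; their mean
    # approximates the intersection of the fitted Gaussians.
    threshold = float((w[labels == lo].max() + w[labels == hi].min()) / 2)
    A = (S >= threshold).astype(int)   # inferred adjacency matrix
    np.fill_diagonal(A, 0)
    return threshold, A
```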

Experiments on simulated datasets

To demonstrate the ability of K-similarity to measure interaction relationships in multi-agent systems, we conduct experiments on simulated datasets. To begin with, we introduce a model of the multi-agent system to produce flocking data. Then different K strategies and conditions with outliers are tested in the K-similarity measurement. Afterward, we apply K-similarity to infer the interaction topology and compare it with four other similarity measurements. Finally, we test the threshold strategy designed for K-similarity on the simulated datasets.

Cucker–Smale model for multi-agent system

Various models have been proposed to describe the behaviors of multi-agent systems, including first-order models such as Boids [51] and the Vicsek model [52], and second-order differential equation models such as the Cucker–Smale (CS) model [53]. Here, we take the CS model as an example [54]. The dynamics of each agent i at time t are

$$\begin{aligned}&\frac{\mathrm{d}}{\mathrm{d}t}x_i (t) = v_i (t), \nonumber \\&\frac{\mathrm{d}}{\mathrm{d}t}v_i (t) = \alpha \sum _{j \ne i}a_{ij}(x)(v_j-v_i)-\nabla U_{x_i}(x)-\nabla V_{x_i}(x),\nonumber \\ \end{aligned}$$
(9)

where \(\alpha \) is a parameter controlling the coupling strength, \(a_{ij}\) is the impact that agent j exerts on agent i, \(\nabla U_{x_i}(x)\) is the potential field force forming the shape, and \(\nabla V_{x_i}(x)\) is the anti-collision repulsive force.

We assume that connected UAV agents tend to coordinate their speeds. Thus, the impact of the interaction relationship is mainly contained in \(a_{ij}\). To be specific,

$$\begin{aligned}&a_{ij} = C_{ij}\frac{1}{(1+\left\| x_i-x_j\right\| ^2)^2}, \nonumber \\&\nabla U_{x_i}(x) = (\left\| x_i-{\bar{x}}\right\| -R)\frac{{\bar{x}}-x_i}{\left\| {\bar{x}}-x_i\right\| }, \nonumber \\&\nabla V_{x_i}(x) = \frac{R}{2}\sin \frac{\pi }{N} \sum _{j=1}^{N}\left[ \frac{x_i-x_j}{\left\| x_i-x_j \right\| }-(x_i-x_j)\right] ,\nonumber \\ \end{aligned}$$
(10)

where \(C_{ij}\) is an element of the interaction topology C: \(C_{ij}=1\) if individuals i and j are connected, and \(C_{ij}=0\) otherwise.

The potential function above makes the agents form a circle. The system adopts a two-way chain structure as the basic interaction topology to guarantee the connectivity of all agents. Then some random links generated by an ER network are superimposed on the basic topology to form the interaction topology of the multi-agent system that we need to infer. The time delay in this paper is randomly set to 1–3 units of time, and Gaussian noise is added to the observation data. A 2-D space is adopted in the model, and the undirected interaction topology of the multi-agent system is inferred from speed and location similarity in this space. To keep the results meaningful, the trajectory we use is a segment that has not yet converged.
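For reproducibility, a literal Euler-integration sketch of Eqs. (9)–(10) is given below. The step size, \(\alpha \), and R values are illustrative rather than the paper's settings, the integrator choice is our own, and C is assumed to have a zero diagonal.

```python
import numpy as np

def cs_step(x, v, C, alpha=0.5, R=5.0, dt=0.05):
    """One Euler step of Eqs. (9)-(10); alpha, R, and dt are illustrative.

    x, v : arrays of shape (N, 2); C : 0/1 topology with zero diagonal.
    """
    N = x.shape[0]
    diff = x[:, None, :] - x[None, :, :]            # x_i - x_j
    dist2 = np.sum(diff**2, axis=-1)
    a = C / (1.0 + dist2)**2                        # a_ij of Eq. (10)
    coupling = alpha * np.sum(a[:, :, None] * (v[None, :, :] - v[:, None, :]), axis=1)

    xbar = x.mean(axis=0)                           # shape-forming potential
    to_bar = xbar - x
    norm = np.linalg.norm(to_bar, axis=1, keepdims=True) + 1e-12
    gradU = (norm - R) * to_bar / norm

    dist = np.sqrt(dist2) + np.eye(N)               # eye avoids 0/0 on the diagonal
    unit = diff / dist[:, :, None]
    gradV = (R / 2) * np.sin(np.pi / N) * np.sum(unit - diff, axis=1)

    return x + dt * v, v + dt * (coupling - gradU - gradV)
```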

In this paper, we discuss three kinds of interaction topologies to explore the applicability of K-similarity in various situations.

Fixed topology: The swarm adopts a fixed topology throughout the whole observation.

Switching topology: Positions change on a much faster time scale than the interaction structure is rearranged, so that we can assume a fixed interaction topology between switches.

Time-varying topology: The time scales of topology change and position change are comparable. Here, we assume connectivity is determined by the distance between individuals, which is also the most common way to construct a time-varying topology when modeling UAV swarms. Namely,

$$\begin{aligned} C_{ij}= \left\{ \begin{aligned} 1,&\quad \Vert x_i-x_j \Vert \le d_c;\\ 0,&\quad \text {otherwise}, \end{aligned} \right. \end{aligned}$$
(11)

where \(d_c\) is the communication range. The topology changes every time the swarm moves, as sketched below.
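A direct transcription of Eq. (11) might look as follows; recomputing C at every step yields the time-varying topology (the function name is ours).

```python
import numpy as np

def metric_topology(x, d_c):
    """Eq. (11): connect agents within communication range d_c."""
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    C = (dist <= d_c).astype(int)
    np.fill_diagonal(C, 0)   # no self-links
    return C
```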

K-random strategy

Before applying K-similarity to infer the interaction topology, we must determine in advance the K strategy to adopt in different situations. We test the performance of different K strategies on the model described in section Cucker–Smale model for multi-agent system and report the results in Fig. 3. Whether connections exist among agents is a highly imbalanced classification problem in most situations. Therefore, performance is measured by the Area Under the Precision–Recall Curve (PR AUC) and the Area Under the Receiver Operating Characteristic Curve (ROC AUC) [55], which are commonly used for class-imbalance problems. A PR curve presents the relationship between precision and recall. A ROC curve presents the trade-off between the True-Positive Rate (TPR) and the False-Positive Rate (FPR). AUC is the area under the curve, ranging from 0 to 1; the higher the curve on the y-axis, the larger the AUC value and the better the performance. PR and ROC curves and AUC values are calculated with the R package ROCR [56].
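For readers working outside R, the same two scores can be obtained with scikit-learn as sketched below; average_precision_score is used as the usual estimator of PR AUC, which may differ slightly from ROCR's trapezoidal value, and the function name is our own.

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

def evaluate(S, C_true):
    """PR AUC (as average precision) and ROC AUC of a similarity matrix
    against the true topology, scored on the upper triangle only."""
    iu = np.triu_indices_from(S, k=1)   # undirected: each pair counted once
    y_true, y_score = C_true[iu], S[iu]
    return average_precision_score(y_true, y_score), roc_auc_score(y_true, y_score)
```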

We test six kinds of K strategies with various window sizes in Fig. 3, including K-fixed=5, K-fixed=8, K-fixed=N/2, K-adaptive, K-random, and K-random*N/2 (the K-times-enhanced K-random strategy repeated \(\left[ N/2 \right] \) times). The window size win ranges from 5 to 300 (the total length of the time series is \(P=300\)), and the slide is set to \(slide=\left[ 2win/3\right] \). Results are averaged across the whole time series in one experiment, and the experiments are repeated 50 times on a chain topology compounded with an ER network with a connection probability of 0.02.

Fig. 3: Comparison of different K strategies with respect to different window sizes

Fig. 4: Outliers and the influence towards K-similarity measurement. A Locations of concentrated outliers, B locations of random outliers, C outliers test with respect to the proportion of snapshots, D outliers test with respect to the proportion of outliers on a snapshot

The results show that K-random and K-random*N/2 stably outperform the K-adaptive and K-fixed strategies, which use fewer varieties of k values, indicating that K-random and its enhanced version can detect multi-scale similarity information with k changing from 2 to \(N-1\). As marked by the red circle in Fig. 3, K-random performs worse than K-random*N/2 when the window size is small, and then gradually climbs to approach K-random*N/2, while K-random*N/2 stays at the top in most scenarios. This indicates that repetition helps integrate biased results from random measurements into complete global information. Among the six strategies, K-random is the simplest yet effective method for long time series, and K-random*N/2 is the most accurate for short time series. Therefore, we use these two strategies as appropriate in the following experiments.

It should be noticed that the K-fixed strategies with \(k=5,8,\left[ N/2\right] \) outperform the K-adaptive strategy overall, and sometimes even perform better than the K-random strategy in PR AUC or ROC AUC. Meanwhile, K-fixed strategies with different choices of k yield rather different results. Interestingly, K-fixed=N/2 does not always beat K-fixed=5 and K-fixed=8. We can infer that smaller detection scales (with larger k) are not always better, since redundant information overwhelms valuable information. The K-adaptive strategy has the poorest performance, indicating that the ASW criterion in K-means clustering cannot provide a variety of k values.

Performance with outliers

The influence of outliers is a common problem in data mining. We test K-similarity's performance over a range of outlier proportions, including the proportion of snapshots containing outliers (\(p_1\)) and the proportion of outliers on each snapshot (\(p_2\)). \(p_2\) is calculated with respect to the total number of data values on a snapshot (N nodes \(\times 2\)-dimensional position \(+\) N nodes \(\times 2\)-dimensional velocity). Furthermore, outliers are assumed to follow two types of distribution, random (Fig. 4A) and concentrated (Fig. 4B). The locations of random outliers are generated by randomly selecting a certain proportion of snapshots and a certain proportion of data on those snapshots, whereas the locations of concentrated outliers are generated by selecting consecutive snapshots and consecutive data on those snapshots, both with random starting points. Selected data are replaced by values randomly sampled from the entire time series to generate outliers. Figure 4A and B only presents the outliers on positions for ease of understanding. The experiments are repeated 50 times on a chain topology compounded with an ER network with a connection probability of 0.02.

Figure 4C and D shows the AUC trends with respect to the proportion of snapshots containing outliers (\(p_1\)) and the proportion of outliers on each snapshot (\(p_2\)), respectively. Clearly, the larger the proportion of outliers, the poorer the performance of K-similarity. As the total proportion (\(p_1 \times p_2\)) grows to 100%, ROC AUC drops to 0.5 and PR AUC drops to close to 0. Comparing Fig. 4C and D, the influences of \(p_1\) and \(p_2\) are similar when both are smaller than 0.4, but the situations diverge as they grow larger. In random cases, increasing \(p_2\) does not affect the performance of K-similarity as much as increasing \(p_1\), but the effect of \(p_2\) is greatly amplified when \(p_1=1\). That is, for random outliers, the proportion of snapshots is the more important factor until it reaches 1, after which the effect of the proportion of outliers on a snapshot starts to show.

In contrast, both \(p_1\) and \(p_2\) affect the performance of K-similarity with concentrated outliers. When \(p_2\) is fixed and small and \(p_1\) gradually increases, the performance first decreases and then increases slightly in Fig. 4C. This interesting phenomenon may be explained by mutual cancelation caused by concentrated outliers. Concentrated outliers imply that a node with a wrong velocity usually also has a wrong position. Therefore, applying K-means to such a snapshot may give this node completely wrong connections to other nodes. But when most of the snapshots have several completely wrong nodes connected to other nodes, the adverse effects may cancel out due to the averaging of links in K-similarity. In general, K-similarity is more stable to concentrated outliers than to random outliers. We compare the performance of K-similarity with the benchmark methods under random outliers in section Comparison performance.

Comparison with benchmark methods

The accuracy of K-similarity is tested on the simulated CS model by comparing PR and ROC AUC with four benchmark methods. This section introduces the four methods and then presents the performance on simulated datasets with and without outliers.

Benchmark methods

We compare K-similarity with four of the most commonly used and classical similarity measurements for the inference of multi-agent systems, as listed below.

Euclidean distance (ED): The Euclidean distance between time series X and Y of length n is defined by

$$\begin{aligned} ED(X,Y)=\sqrt{\sum _{t=1}^{n} (x_{t}-y_{t})^2}. \end{aligned}$$
(12)

Distance correlation (DC): Distance correlation, proposed by [57], is a creative way to detect dependence. For random variables (X, Y), denote the joint characteristic function of (X, Y) by \(f_{(X, Y)}\), and the marginal characteristic functions by \(f_{X}\) and \(f_{Y}\). The distance covariance between X and Y is defined as the square root of

$$\begin{aligned} d c o v^{2}(X, Y)&=\int _{R^{p+q}}\left| f_{(X, Y)}(t, s)-f_{X}(t) f_{Y}(s)\right| ^{2} \\&\quad \times w(t, s) \mathrm{d} t \mathrm{d} s \text{, } \end{aligned}$$

where p and q are the dimensions of X and Y, respectively, and w(t, s) is the weight function given by \(\left( c_{p} c_{q}|t|_{p}^{p+1} |s|_{q}^{q+1}\right) ^{-1}\) with \(c_{p}=\pi ^{(1+p) / 2} / \Gamma ((1+p) / 2)\) and \(c_{q}=\pi ^{(1+q) / 2} / \Gamma ((1+q) / 2)\). By standardizing the distance covariance, the distance correlation is defined as

$$\begin{aligned} d c o r(X, Y)=\frac{d c o v(X, Y)}{\sqrt{d c o v(X, X)} \sqrt{d c o v(Y, Y)}}. \end{aligned}$$

We adopt R package energy [58] to calculate DC.

Dynamic time warping (DTW): Given time series \(X=\{x_1,x_2,\dots ,x_n\}\) and \(Y=\{y_1,y_2,\dots ,y_m\}\), DTW is defined recursively by

$$\begin{aligned}&DTW(X,Y) = \left\{ \begin{aligned}&0,&\text {if } n = 0 \text { and } m = 0; \\&\infty ,&\text {if } n = 0 \text { or } m = 0; \\&d(H(X),H(Y)) + \min \big ( DTW(X,R(Y)),\\&\quad DTW(R(X),Y), DTW(R(X),R(Y))\big ),&\text {otherwise}, \end{aligned} \right. \end{aligned}$$
(13)

where H(X) denotes the head \(x_1\) of a series and R(X) the remainder \(\{x_2,\dots ,x_n\}\). DTW is calculated with the R package dtw [59], which implements the equivalent dynamic program sketched below.
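The recursion in Eq. (13) is exponential if evaluated naively; the standard dynamic-programming form, transcribed below under our own naming, runs in O(nm) with the Euclidean norm as the local distance d.

```python
import numpy as np

def dtw(X, Y):
    """O(nm) dynamic-programming form of Eq. (13); rows are time steps,
    and the local distance d is the Euclidean norm."""
    X = np.asarray(X, dtype=float).reshape(len(X), -1)
    Y = np.asarray(Y, dtype=float).reshape(len(Y), -1)
    n, m = len(X), len(Y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0                      # both series empty
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(X[i - 1] - Y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```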

Mutual information (MI): The MI of discrete random variables X and Y is defined as

$$\begin{aligned} I(X, Y)=\sum _{x \in X, y \in Y} p(x, y) \log \frac{p(x, y)}{p_{X}(x) p_{Y}(y)}, \end{aligned}$$

where \(p_{X}, p_{Y}\) are the marginal probability mass functions of X and Y, respectively, and p is the joint probability mass function of (XY). For continuous random variables, the MI is defined as

$$\begin{aligned} I(X, Y)=\int _{X} \int _{Y} p(x, y) \log \frac{p(x, y)}{p_{X}(x) p_{Y}(y)} \mathrm{d}x \mathrm{d}y, \end{aligned}$$

where p is the joint probability density function of (X, Y), and \(p_{X}\) and \(p_{Y}\) are the marginal probability density functions of X and Y, respectively. MI is calculated with the R package infotheo [60].
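In practice, MI between continuous series is estimated by discretizing first, as infotheo does. A minimal sketch might look as follows (equal-width binning is used here for brevity; infotheo defaults to equal-frequency binning, and the function name is ours).

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def mi(x, y, bins=10):
    """Plug-in MI estimate (in nats) after discretizing both series."""
    xd = np.digitize(x, np.histogram_bin_edges(x, bins))
    yd = np.digitize(y, np.histogram_bin_edges(y, bins))
    return mutual_info_score(xd, yd)
```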

Comparison performance

We compare the PR AUC and ROC AUC of our method with the four benchmark methods under both fixed and time-varying topologies. Cases with and without outliers are also discussed, where 10% random outliers are assumed to exist. Furthermore, online inference of switching topologies with three different fixed topologies at each stage is tested. With the above settings, we comprehensively compare the accuracy and stability of each method over 50 repeated experiments.

Fig. 5: Comparison of methods in fixed, switching, and time-varying topology. A Comparison with four benchmark methods in fixed topology, B online inference in switching topology, C comparison with four benchmark methods in time-varying topology

Figure 5A presents the results of inferring fixed topologies with different densities. As shown in Fig. 3, the K-random strategy is as good as K-random*N/2 for long time series and requires minimal computation; therefore we apply K-random here for comparison with the benchmark methods. The results show an overall downward trend as topological density increases, which can be explained by the increased synergy at higher densities resulting in indistinguishable trajectories. The ROC AUC of all methods gradually decreases to 0.5, indicating the failure of all methods beyond a certain density. However, most of the PR AUC curves begin to rise as density increases while the ROC AUC curves continue to decline. The reason is that ROC AUC is robust to an uneven distribution of positive and negative samples under random sampling, while PR AUC is not: a growing density means a growing number of positive samples, and the calculation formula then leads to an increase in the PR AUC value.

Fig. 6: Performance of GMM-based threshold strategy

Figure 5B shows the effectiveness of K-similarity on switching topologies. We apply K-random and K-random*N/2 with a window size of 5 to infer the topology online. Since the switching time points are assumed to be unknown in advance, a large window size may increase the inference error near a switching point. The results show that, in the online inference of relatively stable topologies, K-random*N/2 is more accurate and robust than K-random. The fluctuation of the velocity variance shows that when the velocity fluctuates too sharply or nearly converges, it is difficult to infer the network. In contrast, K-similarity is well suited to the regime where the swarm maintains a certain stability without full convergence, which is also the main feature of collective motion, especially for UAV swarms.

Figure 5C reveals the patterns of different methods inferring time-varying topologies with and without outliers. The K-similarity methods show consistently high accuracy under both conditions, while distance-based methods such as ED and DTW perform almost perfectly on ideal simulated data without outliers but decline considerably when outliers exist.

Comparing Fig. 5A and C, K-similarity outperforms the four benchmark methods in most conditions. Especially when 10% random outliers are added, all benchmark methods decline and fluctuate to varying degrees except DC, which remains stably low; K-similarity, in contrast, remains the top performer with only a slight decline, showing strong stability against outliers. Comparing Fig. 5B and C, K-random and K-random*N/2 perform similarly on time-varying topologies but differ on switching topologies, demonstrating that repetition does not contribute to accuracy when the topology changes quickly.

GMM-based threshold for K-similarity

The GMM-based threshold strategy models the weight distribution of the K-similarity matrix by two Gaussian distributions: the distribution with higher weights is considered to contain the potential links, while the one with lower weights is regarded as the result of random effects. The intersection point of the two distributions is designated as the threshold. We conduct experiments to illustrate the effectiveness of the threshold on the K-similarity matrix, as shown in Fig. 6. PR AUC, ROC AUC, and the Adjusted Rand Index (ARI) are applied to measure the performance of the threshold strategy. The value range of ARI is \([-1, 1]\); the larger the value, the better the performance. Comparison with other methods after thresholding is not presented, as the GMM-based threshold strategy is specially developed for the K-similarity matrix. The experiments are repeated 50 times on a chain topology compounded with an ER network with a connection probability of 0.01.

In Fig. 6, the orange line represents results from the original K-similarity matrix, and the black one represents results from the matrix after thresholding. In plot C, the orange points represent the best ARI value over all possible thresholds of a K-similarity matrix, while the black points represent the ARI value of the GMM-based threshold. The results indicate that the threshold strategy performs well with respect to PR AUC, ROC AUC, and ARI. Specifically, the matrix after thresholding still holds most of the interaction information, achieving a trade-off between sufficient information and measurement error.

Experiment on real datasets

In this section, we apply K-similarity and the GMM-based threshold strategy to two real datasets to test the efficiency of our method.

Pigeons

Interaction topologies have been shown to exist in pigeon, starling, and other bird flocks. Agents in bird flocks follow simple interaction rules such as long-range attraction, short-range repulsion, and intermediate-range alignment [22]. Therefore, we can infer their interaction relationships by measuring their behavioral similarity. To reveal the effectiveness of K-similarity for general multi-agent systems, we experiment on a real dataset of homing pigeons from [38] to infer the interaction topology among agents. Research [38] measures time delays to obtain a leadership hierarchy, which is transformed into an undirected interaction topology in this paper, as shown in plot A of Fig. 7. The GPS data adopted here are from the experiment of flock AB, site 2, release 7, with both experienced and inexperienced pigeons in the group; the flying distance is around 10.11 km. Each node is labeled with the code of the corresponding pigeon. The undirected interaction topology is regarded as the true interaction topology of the pigeons, which remains for us to infer. Plot B of Fig. 7 shows the flocking trajectories, most of which overlap with each other due to the long tracking range; the x-axis is the latitude and the y-axis is the longitude.

Fig. 7: Topologies and trajectories of the pigeons. A Interaction topology to be inferred, B trajectories of the nine pigeons, C topology inferred from K-similarity with GMM-based threshold strategy

Here, we mainly adopt position (latitude and longitude) and speed (magnitude and direction) to infer the interaction topology of the pigeon flock. First, agents with closer interactions tend to communicate more frequently and are therefore more likely to have similar speeds than other agents. Second, whether a group adopts a broadcasting or an energy-saving strategy to maintain its interactions, this can be classified as a kind of location preference; that is, agents in a group tend to link to agents that are geographically close to them. We infer the interaction topology in two steps: similarity measurement and thresholding. To begin with, K-similarity is calculated to measure the global relationship of all agents and is compared with the four benchmark similarity measurements mentioned above. Then we select the possible links according to the GMM-based threshold strategy in section Threshold strategy, thereby obtaining the inferred interaction topology. All data are z-scored to avoid dimensional effects; the whole pipeline is sketched below.
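Putting the pieces together, the pigeon experiment can be sketched end to end as follows, reusing the hypothetical k_similarity and gmm_threshold functions from the earlier sketches. The file name and the array layout are placeholders, not the actual data format.

```python
import numpy as np

# Hypothetical end-to-end run on the pigeon data. `data` is assumed to
# have shape (T, N, 4): latitude, longitude, speed magnitude, and
# direction per bird, one snapshot per GPS fix.
data = np.load("pigeons_AB_site2_release7.npy")        # placeholder path
data = (data - data.mean(axis=0)) / data.std(axis=0)   # z-score each feature

P, N = data.shape[0], data.shape[1]
ks = np.random.default_rng(0).integers(2, N, size=P)   # K-random strategy
S = k_similarity(data, ks)                             # similarity matrix
threshold, A = gmm_threshold(S)                        # inferred topology
```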

Fig. 8: Topology inference of pigeon flocking. A PR AUC of different methods, B ROC AUC of different methods, C GMM-based threshold of K-similarity matrix

Figure 8 presents the comparison between K-similarity and the four other methods in terms of PR and ROC AUC, as well as the performance of the GMM-based threshold on K-similarity. K-similarity shows an evident advantage in accuracy, outperforming the second-best method, MI, by 0.05 in PR AUC and 0.1 in ROC AUC. Unlike the performance on the simulated data in Fig. 5, MI performs better than DC and DTW here, similar to the simulated experiment with outliers. The GMM-based threshold strategy also gives a clear division of the weight distribution in plot C of Fig. 8: the solid line represents the threshold with the highest ARI, and the dotted line represents the threshold determined by the GMM-based method. The inferred topology after thresholding is shown in plot C of Fig. 7, where thick lines indicate stronger interactions and thin lines weaker ones. As we can see, most of the important links are recovered, and the incorrectly inferred links have weaker weights than the others. The experiments on real datasets reveal the generality and adaptability of K-similarity.

Temperature

To demonstrate the applicability of our method to more general cases, we apply K-similarity and the GMM-based threshold to a real dataset consisting of 5844 daily air temperature measurements in the US from 2000 to 2015 [61]. The temperature data used here are univariate; the dataset has also been analyzed in [62, 63]. K-similarity is applied to infer the relationship among 49 states from surface temperature measured in degrees Kelvin (abbreviated degK in Fig. 9). The temperature patches shown are drawn from one of the 5844 days.

Fig. 9: Topologies inferred from temperature time series

We test K-similarity with different k values. The results in Fig. 9 illustrate that different k values detect different scales of information: the larger the k, the closer and smaller-scale the relationships it detects, and the more skewed the random effect is. When \(k=2\), the binomial distribution of the random effects is close to a Gaussian distribution with mean 1/2; therefore, the random effects overwhelm the useful information. The result for \(k=5\) best matches the information we normally receive. This does not mean that the performance for \(k=10\) and \(k=20\) is poor, but rather that the detection scale of a larger k is smaller and the detected links are more closely related. Overall, the inferred topologies tend to link neighboring states, consistent with the common sense that neighboring states tend to have similar temperatures.

In this section, we tested our method on a homing pigeon dataset with multivariate time series and a temperature dataset with univariate time series. The results show that our method measures similarity accurately and identifies important connections, which may contribute to similarity measurement and topology inference in various application fields.

Discussion

In this section, we clarify the origin of the ideas in this paper and discuss their risks, restrictions, and possible decisions to minimize them. The idea of developing K-similarity to infer the topology of a multi-agent system is inspired by our previous work [47], which integrates the K-means clustering results on each snapshot into a matrix for time series clustering. The integrated matrix contains global relationships among individuals, which prompted us to explore the possibility of using it for topology inference. Therefore, we further develop the clustering method in [47] into K-similarity for the global relative similarity measurement of multi-agent systems.

There are various models for constructing multi-agent systems. We mainly concentrate on flocking, which is a fundamental behavior of multi-agent systems. Flocking behaviors are modeled by classic models including the Vicsek, Boids, and CS models. In real-world control and natural multi-agent systems, movement is controlled by acceleration; hence we use the second-order CS model, which models acceleration directly, to describe multi-agent system behaviors. However, the CS model is a set of highly abstract mathematical functions that does not consider the specific control problems of multi-agent systems, so the data generated by the CS model are ideal compared to real-world data. To better test K-similarity in less ideal situations, we apply it to the CS model with outliers and to real datasets, demonstrating its generality. In addition to flocking, multi-agent systems exhibit a variety of behavior patterns. K-similarity cannot solve the topology inference problem for all kinds of multi-agent systems; it aims at data from consensus or synchronized dynamic network systems.

K-similarity essentially evolves from ED and therefore has similar properties: it only compares data with the same timestamp, ignoring time delays and other complex factors such as positive and negative feedback. We do not compare our method with other complex topology inference techniques, since K-similarity is developed as a new kind of similarity or distance measurement. Instead, we compare K-similarity with ED to check whether our method improves performance, with DTW, which is designed for time series with time delays, with DC, which measures correlations between multivariate time series, and with MI, which is the most common tool for detecting distributional information for interaction inference. These four benchmark methods also have mature R packages for standard computation, and each sets a baseline for detecting a different type of information used to infer interactions. However, the property of detecting global relative similarity also limits K-similarity to systems with a known set of time series. The dynamic entry and exit of newly arriving time series is discussed as future work in the Conclusion, which will help K-similarity be applied to more scenarios as a general similarity or distance measure.

We use K-similarity to measure the strength of interactions between individuals, obtaining a weight matrix of interactions. Without prior knowledge such as network density, a criterion needs to be developed for selecting important edges; the general practice is to apply a threshold. It is necessary to carefully examine the characteristics of the matrix obtained by K-similarity and to develop a method for determining the threshold. We theoretically analyze the weight distribution of the K-similarity matrix for random systems and find that it is close to a right-skewed binomial distribution with success probability \(1/k \le 1/2\). We assume that truly existing interactions have large weights in the K-similarity matrix, forming a left-skewed component. This assumption is consistent with the experimental results, which are mostly bimodal. Therefore, we developed the GMM method to distinguish between random factors and real interactions. The risk is that when the value of k is relatively small, such as 2, and P is large, the binomial distribution caused by random factors is close to a Gaussian distribution with mean 0.5, which makes the weight distributions of random effects and real interactions indistinguishable. In the temperature experiment, we do observe this feature when \(k=2\). To minimize this risk, it is recommended to use a larger k (\(k\ge 4\)) when k is fixed.

Finally, we illustrate the effectiveness of our method on two real datasets. For natural flocking systems, no ground-truth topology is available as the standard answer. Therefore, we use a pigeon dataset with a known topology that was itself inferred from tracking data by a different method in [38]. The pigeon data agree well with the flocking behavior defined in this paper and thus help demonstrate the validity of K-similarity and its threshold strategy. For artificial multi-agent systems, a real topology does exist, but we have not conducted real experiments in this area, nor have we obtained such data from other research. To demonstrate the generalizability of our algorithm, we apply K-similarity to a temperature dataset to infer a topology among states by measuring the similarity of temperature time series. The results show that the inferred topologies are highly correlated with geographic locations as well as climate zones, which agrees with common sense.

Conclusion

In this paper, we propose a new method named K-similarity to measure global relative relationships in a multi-agent system. K-similarity is a networked synthesis of clustering results from each snapshot: multivariate time series are split into snapshots, on which K-means clustering is implemented with various K strategies available; the clustering results on each snapshot are then nonlinearly mapped into a network, and together they form a synthetic network. The link weights of this network contain the global relative similarity between agents, i.e. the K-similarity. The analysis establishes the following properties:

\(\bullet \) K-similarity can be converted to a strict distance, and therefore has the potential to be applied in a variety of scenarios;

\(\bullet \) It is stable to outliers including random and concentrated outliers since the mapping rules limit the influence of outliers to a normal range;

\(\bullet \) We discuss the weight distribution of a trivial case with completely random trajectories, and derive a GMM-based threshold strategy to select informative links from the K-similarity matrix;

\(\bullet \) Different K strategies can detect similarities at different scales, which provides a possible tool for building higher-order networks and multi-scale detection.

Experiments are executed on both simulated and real datasets to verify its superiority over four benchmark similarity measurements and the effectiveness of the GMM-based threshold. K-similarity provides a framework that enriches the definition of similarity in a flexible paradigm. The current method aims at systems with a constant number of time series; in the future, a dynamic entry and exit mechanism for time series is to be designed. Assuming that newly arriving time series have a negligible effect on the overall distribution, a new series could be added by joining the closest cluster on each snapshot, and removed simply by deleting its row and column in the similarity matrix. This mechanism will help K-similarity be applied to more scenarios as a general similarity or distance measure.