Introduction

Collective behaviour is one of the most enigmatic phenomena of contemporary times [1]. It is genuinely omnipresent, from the flocking of birds and the growth of bacterial colonies [2, 3] to stock market trading and the hierarchy of school classes. We intuitively feel that something should be common to all these phenomena because the observed patterns of agents’ behaviour, such as leader, follower, mediator, or crowd, seem applicable to all these systems. There is probably some general law or structure common to most living species and unrelated to the degree of evolution and the level of organisation [4, 5].

This work justifies and formalises such a general structure and empirically verifies the findings. To be able to link such diverse types of collective behaviour, it is necessary to choose a manifestation common to all of them. We selected motion. This term has a similar meaning for most groups of living species (a move from "bad" to "good", whatever this means) and is invariant to the organisation level (degree of evolution). For instance, cells move from a place without nutrients to a place with nutrients, and dogs run away from vehicles.

One of the most common approaches to evaluating motion is the analysis of agents’ relative position [6]. This method was successfully applied in the investigation of the behaviour of humans [7] and fish [8]. There are two types of this approach: physically inspired and model-free. The first one assumes a direct analogy with a physical system: agents are considered particles that are driven by potential (spring-like) forces. These potential-driven particles were the first models that could explain the behaviour of flocks and schools [4]. This set of techniques is an excellent tool for quantifying known phenomena but not for discovering new ones.

Another, data-driven, approach characterises the agents’ behaviour by specific, measurable quantities (e.g., the nearest neighbour distance, the Voronoi cell size) and then uses these high-level features to describe observed phenomena. To a certain extent, this approach works, but a more recent paper [9] showed it to be insufficient. Later, however, a data-driven method using diffusion maps was applied successfully to identify states of collective behaviour in dense fish schools [10].

Fig. 1 Aquarium. (A) The experimental setup (a fish aquarium with two side mirrors) and (B) individual objects (fish) tracking in the multiple views after image geometry restoration

The next generation of models [11] defined a network based on each agent’s position and the sensory field. This step was necessary for incorporating the behaviour itself (the individual’s reaction to the observed situation) into the model that would address the decision-making process. However, this approach is still not fully model-free. It assumes that the sensory field is tied to the agent, with the interactions being semi-local and homogeneous. Additionally, these interactions are predefined by, e.g., a similar orientation or moving in the same direction. Despite the remarkable success, extending this approach to include heterogeneous agents and non-local interactions is challenging.

To address these limitations, we propose a more model-free way of characterising a group of interacting agents. This approach does not limit interactions by type or locality. Moreover, as we will show in the results, this approach allows the agents to be heterogeneous, which is crucial to explaining the biological experiment results. Because the method relies on causality analysis, we do not enforce the type of pattern (e.g., moving in the same direction), only rotation- and scale-invariant pattern repeatability.

We selected a small aquarium fish, the Tiger Barb, as a testing subject. This species is very active and smart and forms shoals of 5 or more individuals with complex collective behaviour [12]. We employed video analysis to track each fish’s movement without any label.

Here we propose a method of characterising collective behaviour by causality networks. It is a way of converting time-resolved multi-agent dynamics to a simple mathematical graph showing who follows whom. Such a follow-up network can be calculated for a time window, outlining the system’s time evolution. This approach is generic, considering only the Euclidean coordinates of the objects. However, it requires knowledge of the trajectory of each agent. Thus, tracking and recognition of individuals are needed.

Materials and methods

Fish breeding

As a model organism, we used a Tiger Barb (Puntigrus tetrazona) in a group of 6 individuals.

Between the experiments, the fish lived in a relatively large home aquarium (60\(\times \)35\(\times \)40 \(\hbox {cm}^3\)). The aquarium was disinfected with ethanol, boiling, and sodium hydrogen carbonate and designed to simulate the natural environment with aquatic plants and stones (4–8 mm). The fish were bred at a 12/12 artificial daylight cycle and 25 \(^{\circ }\)C. The water quality was validated at weekly intervals and before each experiment. The parameters of water were temporarily adjusted according to the conditions prevailing in the aquarium: 4.9 g \(\cdot \) \(\hbox {L}^{-1}\) \(\hbox {Mg}^{2+}\), 12.0 g \(\cdot \) \(\hbox {L}^{-1}\) \(\hbox {Ca}^{2+}\), 5.5 g \(\cdot \) \(\hbox {L}^{-1}\) \(\hbox {NO}_3^{-}\), pH 6.9, hardness 2.8 \(^{\circ }\)H. The aquarium was equipped with a Sera LED 360, a Sera Air 110 air pump, a Sera Fil Bioactive 130 L separate filter (all from Heinsberg, Germany), and a water heater. The fish were fed (predominantly with Tetra Min, Tetra, Melle, Germany) once a day (at 7 a.m.), always 1 h after turning on the aquarium illumination.

Image data collection

The experiments were conducted at (11 ± 0.5) a.m. on 10 consecutive days (including the weekend). The condition of the fish was assessed before each experiment. Figure 1 represents the experimental set-up. The small (\(38.5\times 20\times 14\) \(\hbox {cm}^3\); maximal volume 8 L; 6 L used) aquarium was located inside a LED box, which provided homogeneous illumination from all angles. Water in the aquarium came from the home aquarium. Along four sides of the aquarium, mirrors were located. Two mirrors, above and below the aquarium, were sufficient to observe the aquarium from 3 sides simultaneously (one side directly and two sides in mirrors). The fish were transported to the small aquarium under full illumination 5 min before each experiment. Of course, such a setup may cause significant stress to the fish. However, the previous exposure of the fish to light significantly mitigated the behaviour alteration. In addition, this species has a keen intellect and good memory, so the experimental conditions were not new to it each day, and the stress was alleviated. Above all, the aim was not to fully eliminate the stress but to make it repeatable and systematic so that all fish perceived it equally. All experiments were performed with a single group of fish to reduce inter-group variability. It is unknown how different school structures can be (even for the same species and group size).

Fig. 2 Algorithm. (A) The extraction of objects’ coordinates and tracking algorithm. (B) The algorithm for construction of follow-up networks. This algorithm arises from the tracking algorithm A

To reduce the errors in identifying individual fish, the data were collected by a 12-bit Ximea MX124CG-SU-X2G2-FAB RGB camera (scanning frequency 25 Hz). The size of the fish in the digital image corresponded to approx. 50\(\times \)30 px (5 µ\(\hbox {m}^2\,\cdot \) \(\hbox {px}^{-1}\)).

Results

Fish tracking

During the experiment, the aquarium with fish was recorded at a frequency of 25 Hz. The images were stored uncompressed at 16-bit depth. Before further processing, we had to separate the camera images into the corresponding views. The aquarium corners were annotated manually (a total of 28 points) and then fitted with a projective transformation. As a side effect, the projective transformation compensated for the distortion of the corresponding views (Fig. 1), and the aspect ratio became linearly proportional to the actual size of the aquarium.
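The corner-based rectification can be illustrated with a minimal sketch. It is not the authors' code: it assumes exactly four annotated corners per view (the paper fits 28 points in total), solves the homography exactly rather than by least squares, and the names `fit_homography` and `apply_homography` are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_homography(src, dst):
    """3x3 projective matrix (last entry fixed to 1) mapping four
    image points `src` onto four target points `dst`."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(h, point):
    """Map one image point into the rectified view."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return u, v
```

With the target rectangle chosen in proportion to the physical size of the aquarium wall, the rectified coordinates become proportional to real distances, which is the side effect described above.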

The aquarium front view, which gives the z-coordinates of the fish, was always processed separately. The bottom and top views were almost the same, only mutually flipped (in terms of coordinates x, y), and were thus usually averaged. If the fish overlapped in the top view but not in the bottom view (or vice versa), the bottom and the top views were processed separately to give better tracks. The general method pipeline (Fig. 2A) was inspired by Perez-Escudero et al. [13] and Romero-Ferrero et al. [8] but rewritten from scratch. The original algorithm is 2D, but, in our case, three 2D views allow the reconstruction of 3D positions (with xy redundancy, since one view is from above and another from below the aquarium) by re-projecting the coordinates and calculating the overlap of coordinates between the assembled trajectories in different views. This allows us to permute the classifiers that were the first (for all views) to detect the same fish in the corresponding views. In this way, we get a consistent annotation between views.

The first step used a simple foreground detection based on median background estimation. This foreground detection is very robust and shows a high recall. The detected objects were then classified as individuals/overlaps using a small VGG-like CNN [14]. This CNN annotation was generated automatically based on the average areas and perimeters of the detected objects. The detections for each fish were assembled into contiguous sequences—tracklets—based on the overlapping bounding boxes in every two consecutive frames. In some cases, it was possible to resolve the overlaps of two fish [15] to increase the continuity of the tracklets.
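The tracklet assembly step can be sketched as follows. This is an illustrative reading of "overlapping bounding boxes in every two consecutive frames", not the published implementation: the box format `(x_min, y_min, x_max, y_max)`, the rule that a tracklet is closed whenever the link is ambiguous, and the function names are all assumptions.

```python
def boxes_overlap(a, b):
    """True when two axis-aligned boxes (x_min, y_min, x_max, y_max) intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def build_tracklets(frames):
    """frames: one list of bounding boxes per video frame.
    Returns tracklets as lists of (frame_index, box)."""
    finished, active = [], []
    for t, boxes in enumerate(frames):
        last = [tracklet[-1][1] for tracklet in active]
        # overlap[a][k]: does active tracklet a overlap detection k?
        overlap = [[boxes_overlap(lb, box) for box in boxes] for lb in last]
        extended, used = [], set()
        for a, tracklet in enumerate(active):
            hits = [k for k, hit in enumerate(overlap[a]) if hit]
            # Extend only when the link is unique in both directions.
            if len(hits) == 1 and sum(row[hits[0]] for row in overlap) == 1:
                tracklet.append((t, boxes[hits[0]]))
                extended.append(tracklet)
                used.add(hits[0])
            else:
                finished.append(tracklet)
        # Detections that extended no tracklet start new ones.
        started = [[(t, box)] for k, box in enumerate(boxes) if k not in used]
        active = extended + started
    return finished + active
```

Closing tracklets at ambiguous links keeps each tracklet attributable to a single fish, at the cost of shorter sequences; the identity classifier described in the text is then responsible for joining them.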

The image sequences, where the number of detected objects coincides with the number of fish in the experiment, were used as seeds for the next CNN-based classifier. These sequences for the different aquarium sides were matched using the projection of coordinates and overlap in the time domain [13] and used to train the classifier. Then, all objects were classified. Among them, the trustworthy sequences were selected and used for training. Four passes were typically enough to achieve a high (0.98) classification accuracy. In the same way, the whole tracklets were classified. The tracklets from different views were projected (using known proportion between coordinates) and merged, giving 3D coordinates [mm].

Fig. 3 Time windows. (A) The timelines for two objects with relevant correlation time windows of the length L. (B) The favourable (green), neutral (yellow), and unfavourable (red) combinations of correlations for a situation when object i follows object j. Symbols P, N, and F denote the past, current, and future states of the object i or j, respectively

Specific post-processing steps were applied to fill the gaps in the trajectories (where it was possible to do so unambiguously via overlap uniqueness) by a simple linear interpolation between the ends of the individual tracklets. We verified the method’s accuracy only visually. However, the method was designed similarly to that of Perez-Escudero et al. [13], and its accuracy is thus assumed to be comparable. Visual inspection of the fish trajectories has not shown any significant errors. The output of this step is a time-resolved 3D trajectory of each fish. We intentionally did not describe the tracking algorithm in depth because the processing and analysis below are not limited to these data but are suitable for any position tracking data. The complete code with comments is available in the supplementary materials.
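The gap filling amounts to linear interpolation between the last point of one tracklet and the first point of the next. A minimal sketch, assuming positions are stored as `(frame, (x, y, z))` tuples (a hypothetical layout chosen for illustration):

```python
def fill_gap(end_a, start_b):
    """Linearly interpolate positions for the frames strictly between
    the end of one tracklet and the start of the next.

    end_a, start_b: (frame_index, (x, y, z)) with end_a earlier than start_b.
    Returns a list of (frame_index, (x, y, z))."""
    t0, p0 = end_a
    t1, p1 = start_b
    filled = []
    for t in range(t0 + 1, t1):
        w = (t - t0) / (t1 - t0)  # fraction of the gap already covered
        filled.append((t, tuple(a + w * (b - a) for a, b in zip(p0, p1))))
    return filled
```

For example, a gap between frame 10 at the origin and frame 14 at (4, 8, -4) yields three interpolated frames, the middle one at (2, 4, -2).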

Determination of temporal correlations

To compare the results for different species/levels of organisation, we needed to decouple the behaviour from the motion. A classical way to do so is to use a scalar product of the direction vectors as a measure of movement correlation [16]. However, this approach covers only orientation but neither speed nor overall travelled distance. A more holistic way is to use a distance correlation, which also considers a non-trivial translation and handles complex motion much better. The idea is simple: rewrite the classical covariance as a product of signed distances. Let A and B be scalar sequences of length N, then

$$\begin{aligned} \text{ cov }(A, B)= & {} \frac{1}{2\textsf {N}^2}\sum _{\textsf {i}}^{\textsf {N}} \sum _{\textsf {j}}^{\textsf {N}} (A_\textsf {i} - A_\textsf {j})\cdot (B_\textsf {i} - B_\textsf {j}) \nonumber \\= & {} \frac{1}{2\textsf {N}^2}\sum _{\textsf {i}}^{\textsf {N}} \sum _{\textsf {j}}^{\textsf {N}} D_{\textsf {ij}}(A)\cdot D_{\textsf {ij}}(B), \end{aligned}$$
(1)

where \(D_\textsf {{ij}}(X)\) is a signed distance between elements i and j of an arbitrary sequence X. If we replace this signed distance by the Euclidean distance \(D_\textsf {ij}(\textbf{X}) = ||\textbf{X}_\textsf {i} - \textbf{X}_\textsf {j}||\), the obtained function behaves very similarly to the classical covariance; we denote it the distance covariance. Such a definition allows us to introduce a Pearson-like correlation coefficient, which we denote the distance correlation, by replacing the covariances with the distance covariances:

$$\begin{aligned} \text{ corr }(\textbf{A}, \textbf{B})= \frac{\sum _{\textsf {i}, \textsf {j}} \Vert \textbf{A}_\textsf {i}-\textbf{A}_\textsf {j}\Vert \cdot \Vert \textbf{B}_\textsf {i}-\textbf{B}_\textsf {j}\Vert }{ \sqrt{\left( \sum _{\textsf {i},\textsf {j}} \Vert \textbf{A}_\textsf {i}-\textbf{A}_\textsf {j}\Vert ^2\right) \cdot \left( \sum _{\textsf {i},\textsf {j}} \Vert \textbf{B}_\textsf {i}-\textbf{B}_\textsf {j}\Vert ^2\right) }}, \end{aligned}$$
(2)

where \(\textbf{A}, \textbf{B}\) are vector sequences of N points (indexed by i and j) in a Euclidean space of the same dimensionality. Despite its simplicity, this measure was formalised and rigorously investigated only quite recently [17]. It takes values in [0, 1] instead of \([-1, 1]\), which is a direct consequence of the unsigned distance. As shown in an even more recent paper [18], such a characteristic retains most of the properties and the common meaning of the ordinary correlation.
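A minimal sketch of the ratio in Eq. (2) follows. The normalisation constants of the distance covariances cancel in this ratio, so no prefactor is applied and the value stays in [0, 1] by the Cauchy–Schwarz inequality. The function name and the convention of returning 0 for a motionless window are assumptions of this sketch, not part of the published method.

```python
import math


def pairwise_distances(points):
    """Matrix of Euclidean distances between all points of one window."""
    return [[math.dist(p, q) for q in points] for p in points]


def distance_correlation(a, b):
    """Pearson-like ratio of Eq. (2) for two equally long point sequences."""
    if len(a) != len(b):
        raise ValueError("both windows must contain the same number of points")
    da, db = pairwise_distances(a), pairwise_distances(b)
    n = len(a)
    numerator = sum(da[i][j] * db[i][j] for i in range(n) for j in range(n))
    norm_a = sum(d * d for row in da for d in row)
    norm_b = sum(d * d for row in db for d in row)
    denominator = math.sqrt(norm_a * norm_b)
    # Assumption: a window without any displacement carries no information.
    return numerator / denominator if denominator > 0 else 0.0
```

Because only pairwise distances enter, the value is unchanged by translating, rotating, or uniformly rescaling either trajectory, which is the invariance the method relies on.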

We used the correlation in Eq. (2) as a measure of the trajectories’ similarity in the most general sense. The correlation between positions in a sliding window for two selected objects on a timeline gave the local similarity of their motion (Fig. 3A).

It is also possible to select windows mutually shifted on the time arrow. For an object, let us select three dedicated time windows centred at times \(t-L\), t, and \(t+L\), respectively, where L is the window size, and denote them as Past, Now, and Future. Thus, any pair of objects gives 9 combinations of two windows. Let us denote these combinations by two-letter codes: the first letter marks the first object’s time state, and the second letter marks the second object’s time state. For example, \({\textbf {PF}}\) means a comparison of the first object’s past position with the second object’s future position. This notation helped us to analyse the physical interpretation and causality of such shifted correlations.
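The nine window combinations can be enumerated as in the sketch below. It assumes trajectories are lists sampled at a constant frame rate, that L is given in frames, and that the similarity measure is passed in as a function `corr`; these conventions and all names are illustrative only.

```python
from itertools import product

# Window centres relative to t, in units of the window length L.
OFFSETS = {"P": -1, "N": 0, "F": 1}


def window(track, t, length, state):
    """Slice of `track` with `length` samples centred at t + offset * length."""
    centre = t + OFFSETS[state] * length
    start = centre - length // 2
    if start < 0 or start + length > len(track):
        raise IndexError("window falls outside the trajectory")
    return track[start:start + length]


def shifted_correlations(track_i, track_j, t, length, corr):
    """Dictionary mapping each two-letter code (first letter: object i,
    second letter: object j) to the similarity of the two windows."""
    return {
        si + sj: corr(window(track_i, t, length, si),
                      window(track_j, t, length, sj))
        for si, sj in product("PNF", repeat=2)
    }
```

Passing the distance correlation of Eq. (2) as `corr` gives, for every time t, the nine values that the following sections sort into supporting, neutral, and rejecting groups.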

Table 1 The Fisher test contingency table for verification of the hypothesis of measure of following up

Construction of follow-up networks

Let us define the term following in this way: if object i’s future state correlates with object j’s past state, then object i follows object j.

If we consider the hypothesis that i follows j, then the combinations of time windows can be classified into 3 groups (Fig. 3B): supporting the hypothesis \(S=\{\)NP, FP, FN\(\}\), neutral to the hypothesis \(N=\{\)PP, NN, FF\(\}\), and rejecting the hypothesis \(R=\{\)PN, PF, NF\(\}\). If one time window combination is taken from each group, it gives \(3^3\) combinations. We denote as evidence the fact that one correlation is higher than another. Such evidence is treated as significant if the corresponding correlation is also higher than the correlation from the neutral group. In the next step, we evaluated the statistical significance of the hypothesis that the tendency of object i to follow object j is higher than the tendency of object j to follow object i, and vice versa. For each pair of objects i and j, we have four possibilities for how the objects follow each other in a time window: i follows j; j follows i; i does not follow j; j does not follow i. The algorithm for constructing the follow-up networks is depicted in Fig. 2B.

The total numbers of the significant evidences during the experiment are

$$\begin{aligned} e^{(i \rightarrow j)}= & {} \sum _{s^{(i \rightarrow j)}}^{S^{(i \rightarrow j)}} \quad \sum _{n}^{N} \sum _{r^{(i \rightarrow j)}}^{R^{(i \rightarrow j)}} [s^{(i \rightarrow j)}> n]\nonumber \\{} & {} \cdot [s^{(i \rightarrow j)} > r^{(i \rightarrow j)}], \end{aligned}$$
(3)
$$\begin{aligned} e^{(j \rightarrow i)}= & {} \sum _{s^{(j \rightarrow i)}}^{S^{(j \rightarrow i)}} \quad \sum _{n}^{N} \sum _{r^{(j \rightarrow i)}}^{R^{(j \rightarrow i)}} [s^{(j \rightarrow i)}> n]\nonumber \\{} & {} \cdot [s^{(j \rightarrow i)} > r^{(j \rightarrow i)}], \end{aligned}$$
(4)
$$\begin{aligned} e^{(i \not \rightarrow j)}= & {} \sum _{r^{(i \not \rightarrow j)}}^{R^{(i \not \rightarrow j)}} \sum _{n}^{N} \sum _{s^{(i \not \rightarrow j)}}^{S^{(i \not \rightarrow j)}} [r^{(i \not \rightarrow j)}> n]\nonumber \\{} & {} \cdot [r^{(i \not \rightarrow j)} > s^{(i \not \rightarrow j)}], \end{aligned}$$
(5)
$$\begin{aligned} e^{(j \not \rightarrow i)}= & {} \sum _{r^{(j \not \rightarrow i)}}^{R^{(j \not \rightarrow i)}} \sum _{n}^{N} \sum _{s^{(j \not \rightarrow i)}}^{S^{(j \not \rightarrow i)}} [r^{(j \not \rightarrow i)}> n]\nonumber \\{} & {} \cdot [r^{(j \not \rightarrow i)} > s^{(j \not \rightarrow i)}], \end{aligned}$$
(6)

where \(e^{(i \rightarrow j)}\), \(e^{(j \rightarrow i)}\), \(e^{(i \not \rightarrow j)}\), and \(e^{(j \not \rightarrow i)}\) are sums of significant evidences when object i follows object j, object j follows object i, object i does not follow object j, and object j does not follow object i, respectively. Variables \(s^{(i \rightarrow j)}\) and \(s^{(j \rightarrow i)}\) denote correlations supporting the hypothesis that object i follows object j and vice versa, respectively. The analogous variables \(r^{(i \rightarrow j)}\) and \(r^{(j \rightarrow i)}\) denote correlations rejecting the hypothesis that object i follows object j and vice versa, respectively. Variable \(n \in N\) is a combination of time windows that is neutral to the hypothesis of following up.
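The counting in Eqs. (3) and (5) can be sketched for a single time window as follows, reading the square brackets as 0/1 indicators. The input is assumed to be a dictionary from two-letter window codes to correlations for the ordered pair (i, j); the function names are hypothetical.

```python
from itertools import product

SUPPORT = ("NP", "FP", "FN")   # group S for the hypothesis "i follows j"
NEUTRAL = ("PP", "NN", "FF")   # group N
REJECT = ("PN", "PF", "NF")    # group R


def count_evidence(corrs):
    """Contribution of one time window to e(i -> j) and e(i -/-> j).

    corrs: dict mapping each two-letter code to a correlation value."""
    e_follow = e_not_follow = 0
    for s, n, r in product(SUPPORT, NEUTRAL, REJECT):  # 3^3 = 27 triples
        cs, cn, cr = corrs[s], corrs[n], corrs[r]
        e_follow += int(cs > cn and cs > cr)       # Eq. (3)
        e_not_follow += int(cr > cn and cr > cs)   # Eq. (5)
    return e_follow, e_not_follow


def total_evidence(windows):
    """Sum the per-window counts over a sequence of windows."""
    totals = [count_evidence(c) for c in windows]
    return sum(t[0] for t in totals), sum(t[1] for t in totals)
```

Under this reading, the counts for the opposite hypothesis (Eqs. (4) and (6)) follow from the same function after swapping the two letters of every code, i.e. exchanging the roles of i and j.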

The discrete sums of the significant evidences in Equations (3)–(6) were first used as an input into the Fisher exact test [19] to calculate the precise p value. In this case, the meaning of the p value is a measure of the deviation from the hypothesis that no one follows anyone. From Table 1, the p value of the Fisher test was calculated as

$$\begin{aligned} p= & {} \frac{\left( {\begin{array}{c}e^{(i \rightarrow j)} + e^{(j \rightarrow i)}\\ e^{(i \rightarrow j)}\end{array}}\right) \left( {\begin{array}{c}e^{(i \not \rightarrow j)} + e^{(j \not \rightarrow i)}\\ e^{(i \not \rightarrow j)}\end{array}}\right) }{\left( {\begin{array}{c}e^{(i \rightarrow j)} + e^{(j \rightarrow i)}e^{(i \not \rightarrow j)} + e^{(j \not \rightarrow i)}\\ e^{(i \rightarrow j)} + e^{(i \not \rightarrow j)}\end{array}}\right) } \equiv \frac{\left( {\begin{array}{c}e^{(i \rightarrow j)} + e^{(j \rightarrow i)}\\ e^{(j \rightarrow i)}\end{array}}\right) \left( {\begin{array}{c}e^{(i \not \rightarrow j)} + e^{(j \not \rightarrow i)}\\ e^{(j \not \rightarrow i)}\end{array}}\right) }{\left( {\begin{array}{c}e^{(i \rightarrow j)} + e^{(j \rightarrow i)}e^{(i \not \rightarrow j)} + e^{(j \not \rightarrow i)}\\ e^{(j \rightarrow i)} + e^{(j \not \rightarrow i)}\end{array}}\right) } \nonumber \\\equiv & {} \frac{(e^{(i \rightarrow j)} + e^{(j \rightarrow i)})!(e^{(i \not \rightarrow j)} + e^{(j \not \rightarrow i)})!(e^{(i \rightarrow j)} + e^{(i \not \rightarrow j)})!(e^{(j \rightarrow i)} + e^{(j \not \rightarrow i)})!}{e^{(i \rightarrow j)}!e^{(j \rightarrow i)}!e^{(i \not \rightarrow j)}!e^{(j \not \rightarrow i)}!(e^{(i \rightarrow j)} + e^{(j \rightarrow i)} + e^{(i \not \rightarrow j)} + e^{(j \not \rightarrow i)})!}. \end{aligned}$$
(7)
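Eq. (7) is the hypergeometric probability of the observed 2×2 table and can be evaluated directly with binomial coefficients. The sketch below uses the shorthand a, b, c, d for the four evidence sums; these names are introduced here for illustration only.

```python
from math import comb


def table_probability(a, b, c, d):
    """Probability of the observed contingency table as written in Eq. (7).

    a = e(i -> j), b = e(j -> i), c = e(i -/-> j), d = e(j -/-> i)."""
    return comb(a + b, a) * comb(c + d, c) / comb(a + b + c + d, a + c)
```

For a = 3, b = 1, c = 1, d = 3 this gives 4 · 4 / 70 ≈ 0.229. Note that general-purpose routines such as SciPy's `fisher_exact` report a sum of such probabilities over all tables at least as extreme as the observed one, so their output generally differs from this single-table value.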

Fig. 4 Follow-up networks. (A) Two typical follow-up networks in a school of 6 fish. (B) The kernel density estimation (KDE) and the Gaussian fitting for the in-degree parameter of the follow-up network. For the out-degree parameter, the distributions are analogous to the in-degree parameter. (C) The distribution of intra-experiment in-degree Gaussian fitting component proportions. (D) The distribution of intra-experiment out-degree Gaussian fitting component proportions and mean values

The relations i-j, i.e., the time windows, whose p values lie under the criterion of level of significance \(\alpha =0.05\) (i.e., \(p\le C_{\alpha }\)) were rejected. Removing these spurious correlations [20] greatly improved the robustness of the obtained networks.

For the remaining relations i-j, the follow-up probabilities were then defined using the sums of significant evidences written in Eqs. (3)–(6), e.g.,

$$\begin{aligned} P^{(i \rightarrow j)}= & {} \frac{e^{(i \rightarrow j)}}{e^{(i \rightarrow j)} + e^{(i \not \rightarrow j)}} \nonumber \\= & {} \left[ 1+\frac{e^{(i \not \rightarrow j)}}{e^{(i \rightarrow j)}}\right] ^{-1}, \end{aligned}$$
(8)

where \(P^{(i \rightarrow j)}\) is a chance that object i follows object j. The advantages of the proposed measure of following up are its clear meaning and decoupling from direct measurements by transformation through the correlation space to the probability space. It allows us to compare and analyse very different systems because absolute correlation values do not matter, only their combinations and ratios. The obtained follow-up matrix is of the size \(n_{obj}\times n_{obj}\) in the case of \(n_{obj}\) objects and is typically sparse. It is natural to convert it to a directed graph (treating the following matrix as an adjacency matrix). The orientation of the network edges follows the comparison of the values of probabilities \(P^{(i \rightarrow j)}\) and \(P^{(j \rightarrow i)}\). Examples of such graphs (visualised in a way that a higher follow-up probability corresponds to a shorter edge) are shown in Fig. 4A. These graphs are defined at each recorded time (Video in Supplementary) and have an apparent behaviouristic meaning, showing who follows whom.
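Eq. (8) and the conversion of the follow-up matrix to a directed graph can be sketched as below. The handling of empty counts and of ties (no edge is drawn) is an assumption of this sketch, as are the function names; the text only states that the edge orientation follows the comparison of the two probabilities.

```python
def follow_probability(e_follow, e_not_follow):
    """Eq. (8); returns 0.0 when there is no evidence at all (assumption)."""
    total = e_follow + e_not_follow
    return e_follow / total if total else 0.0


def follow_up_edges(prob):
    """prob[i][j] is the probability that object i follows object j.

    Returns directed edges (follower, followed, probability), keeping for
    every pair only the direction with the higher probability."""
    n = len(prob)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if prob[i][j] > prob[j][i]:
                edges.append((i, j, prob[i][j]))
            elif prob[j][i] > prob[i][j]:
                edges.append((j, i, prob[j][i]))
    return edges
```

Storing the probability on each edge makes it possible to draw stronger relations as shorter edges, as in Fig. 4A.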

The approach described above utilises only one parameter, the time window size L. The obtained network is thus a function only of the time t and the parameter L.

Biological relevance

To verify the method, we conducted a straightforward experiment: the fish were freely swimming in an aquarium for 15 min. The experiment was repeated ten times on different days to investigate the fish school’s stability and the measurements’ repeatability. The output of the method is a directed graph defined for each time frame (Video in Supplementary). Interpretation of these graphs is straightforward: the nodes correspond to the observed objects (fish individuals in our case), and the length of the edges inversely reflects the follow-up probability (the shorter the edge, the higher the chance of following up). The arrows show who follows whom.

To show the variability of the follow-up network over time, we partitioned the graphs’ adjacency matrices into two groups by k-means. Figure 4A shows the follow-up networks dominating each group. Of course, the actual situation is more complicated, and two groups of follow-up networks are only the minimal illustrative example. The networks (Fig. 4A) are almost the same, but the directions of edges are opposite, and fish \(\#1\) and \(\#6\) swapped their positions. Fish \(\#3\), situated in the centre, has the most prominent set of features among all. Even without biological analysis, one can assess that this fish is somehow important. Fish \(\#3\) has relatively strong connectivity to all other fish and completely changes its role in the school (depicted by a reverse direction of the network edges). Fish \(\#2\) is also connected to everyone, but the connections are weak (the network edges are relatively long). Fish \(\#5\) and \(\#6\) changed their mutual distances in the graphs and did not show distinct features.
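The partitioning step can be illustrated with a plain Lloyd-type k-means on flattened adjacency matrices. The deterministic farthest-point initialisation and the function names are assumptions of this sketch; the paper does not specify the initialisation or the k-means variant used.

```python
import math


def flatten(matrix):
    """Turn an adjacency matrix into a feature vector."""
    return [x for row in matrix for x in row]


def kmeans(vectors, k=2, iterations=50):
    """Cluster feature vectors; returns (labels, centres)."""
    # Assumed initialisation: first vector, then the farthest remaining ones.
    centres = [list(vectors[0])]
    while len(centres) < k:
        far = max(vectors, key=lambda v: min(math.dist(v, c) for c in centres))
        centres.append(list(far))
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda c: math.dist(v, centres[c]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres
```

Reshaping each returned centre back into a matrix gives a representative network for its group, comparable to the two networks shown in Fig. 4A.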

The number of edges entering/emerging from a network node in a time window characterises the in/out-degree. For each node, these two centrality measures, in-degree and out-degree, quantify the intensity of the relations among the fish. After that, for all ten experiments, a probability density function for the values of the in/out-degree parameters was estimated by kernel density estimation (Fig. 4B). The obtained distribution is neither unimodal nor trivial. To switch to parametric statistics, a mixture of 4 Gaussians was fitted to this distribution (Fig. 4B). Then, both in- and out-degree parameters can be described by four sets of [component proportion, mean value, variance], each corresponding to one peak in the distribution.
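The degree computation and the density estimate can be sketched as below. Edges are assumed to point from the follower to the followed object, so a frequently followed fish has a high in-degree; the Gaussian kernel with a user-chosen bandwidth is also an assumption, and the subsequent fit of a four-component Gaussian mixture is not shown.

```python
import math


def degrees(edges, n_nodes):
    """edges: iterable of (follower, followed) pairs for one time window.
    Returns (in_degree, out_degree) lists indexed by node."""
    in_degree = [0] * n_nodes
    out_degree = [0] * n_nodes
    for follower, followed in edges:
        out_degree[follower] += 1
        in_degree[followed] += 1
    return in_degree, out_degree


def gaussian_kde(samples, bandwidth):
    """Return a function estimating the probability density of `samples`."""
    norm = len(samples) * bandwidth * math.sqrt(2.0 * math.pi)

    def density(x):
        return sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        ) / norm

    return density
```

Collecting the in-degree of one fish over all time windows and passing the values to `gaussian_kde` gives a curve of the kind shown in Fig. 4B.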

The component (peak area) proportions in the probability density functions (compared with mean values or variances) provide the most straightforward interpretation of the results. This approach is closely related to frequency analysis. A higher value of the component proportion corresponds to a longer time spent near the relevant mean value (the components are sorted in ascending order according to the mean values). Because the high-degree components occur relatively rarely, a robust measure of in/out-degree is the sum of the 3\(^{\text{ rd }}\) and the 4\(^{\text{ th }}\) component proportions, showing how frequently the fish exhibits a high in/out-degree state. The box plot of in-degree states for all six fish from 10 experiments is depicted in Fig. 4C. Fish \(\#3\) and \(\#4\) show the highest in-degree state. These individuals are frequently followed and lead the school. However, \(\#3\) has a low out-degree and rarely follows anyone, whereas \(\#4\) has a high out-degree. We interpret this as meaning that fish \(\#3\) is a leader, whereas fish \(\#4\) can be called a coordinator. The difficulty, also illustrated in Fig. 4A, is that there is no clear definition of possible roles, only a consensus that they exist [21]. The role of the coordinator changes diametrically (the coordinator can be followed or following). In general, such a network shows an oscillatory process over time and is not trivial (it is impossible to obtain one cluster from another by simply changing the direction of edges). We have not yet investigated this phenomenon in greater depth, but a similar oscillation process has been observed before [22]. The mean value, which corresponds to the out-degree Gaussian components 3 and 4 for fish \(\#1\), is significantly lower than the average (Fig. 4D). The combination with a very low in-degree component proportion may support the hypothesis that fish \(\#1\) is oppressed or, more precisely, unsocial. Fish \(\#1\) is never a leader but also does not follow the school much.

The typical results reflecting the mean “intensity” of the following-up (Fig. 4A) may not correlate with the time statistics (frequency) of what fraction of time the agent exhibits a given value of the following-up (Fig. 4C–D). Figure 4A shows that fish \(\#3\) has strong connectivity as both a follower and a leader to most individuals in the school. But according to Fig. 4C–D, fish \(\#3\) is much more often followed than following, which supports assumptions about its role.

Thus, even this most simple analysis of the obtained graphs gives an insight into the school structure, which is consistent between the experiments. Of course, the detected roles must be rigorously tested in different schools, even with other species. Nevertheless, the complexity of the observed phenomena is quantifiable. Our results agree with the trend recently published in research papers [21, 23, 24] that variability in individual behaviour matters in collective behaviour. We think that this variability is essential to explain the observed phenomena. But further studies are still needed, including a detailed comparison of the obtained results with visual observations.

Discussion and conclusions

Commercially available deep-learning-based devices [25] can track and analyse only a single animal in 2D. The number of research papers describing fish schools reaches hundreds yearly, but none of the experiments is followed by automatic network analysis. The possibility of using network theory to understand animal collectives has been addressed only theoretically, at most to the level of semi-automation [26,27,28]. To our knowledge, the method proposed in this paper is the first that combines and fully automates 3D individualised object tracking with network construction for potential practical applications.

The approach to collective movement quantification proposed here is informative and easy to interpret. It can be applied to any kind of multiple-trajectory data, irrespective of the species, level, or even type of organisation (e.g., flocks of drones or military ships). The approach has only one adjustable parameter, the time window selected according to the system type, which keeps the method model-free and gives a spectrum of networks over all windows.

Even a limited application, such as a single group of a single species, showed a fascinating result: the observed roles in the school are not demonstrated all the time but rather for a time of necessity. This fact addresses a pitfall of many psychological approaches, which assign a single personality type (temperament or MBTI model [29]) to an agent unambiguously and then predict interactions based on this assignment. However, it does not work in this way, and we can see why: even very primitive species show flexible roles dependent on a given situation. It is more suitable to speak about the distribution of such assignments (attributes, categories) or, better, conditional distributions of the roles.

The obtained results also point to pitfalls of classical—homogeneous—agent models: even in our simple experiments, the agents received a set of parameters that differed significantly from each other and could be named as personality with considerable caution. We believe individual variability is essential to the observed collective phenomena, but we should conduct additional experiments to prove this.

The main advantage of the proposed method, however, is not its generality but its applicability. The method allows converting widespread, well-known tracking data to time-dependent directed graphs (sometimes called coevolution networks [30, 31]), which are one of the hottest topics of modern data mining [32] and collective behaviour analysis [11]. Currently, the definition of an agent’s role is up to the scientist and their perspective. However, the introduction of mathematically defined, graph-based roles can greatly benefit information exchange and convergence between ethological fields and scientific schools. This is a topic for our future research, together with applying this framework to agents under different environmental conditions and at different levels of biological organisation.

The network and hierarchical data extractable by this method can serve as inputs for further modelling and analysis of complex biological systems, including using machine learning or agent-based approaches such as reinforcement learning. The collective analysis described here can be extended to predict the movement of each individual and compare this prediction with the actual behaviour. We expect this method to benefit all parties—biologists and data scientists—and encourage further collaboration and knowledge transfer up to the level of commercial and industrial applications, see [25].

Supplementary information The Matlab code and the supplementary video are available at Dryad at [33]. Upon request, we can provide the image data for further scientific analysis.

Supplementary 1—Video Time changes of the follow-up networks (i.e., fish school configurations) during the experiment. Top view (upper left), bottom view (upper right), front view (lower left) of fish school in the aquarium, and follow-up network (lower right).