1 Introduction

Sociality is an essential aspect of life in many species, such as ants, bees, monkeys, and humans. It is characterized by interactions among the individuals in a group and/or among groups, which appear in behavior such as movement. Thanks to IoT technologies, the movement of agents can now be recorded easily, which produces a huge amount of data.

Recent mobility technologies are so innovative that they have been changing even the sports scene [1]. However, tools for scientific use are not sufficiently developed, since agents in scientific experiments or wildlife observations are not equipped with sensors. In addition, the interactions among agents have not been analyzed objectively so far, owing to the lack of tools.

In short, mobility analysis requires three steps. The first step is to track and identify agents in videos. The second step is to extract interactions from the tracking data. The last step is to analyze graphs, since the interactions are represented as a graph in which an agent corresponds to a node and an interaction to an edge.
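The graph representation in the last step can be illustrated with a minimal sketch. It assumes, purely for illustration, that two agents "interact" in a frame when they are within a distance `radius` of each other, and that the edge weight counts such frames. The function name `interaction_graph`, the `tracks` layout, and the proximity criterion are hypothetical and are not taken from the tools introduced in this chapter.

```python
from collections import defaultdict
from itertools import combinations
from math import dist


def interaction_graph(tracks, radius):
    """Build a weighted graph from per-frame positions.

    tracks: {agent_id: [(x, y), ...]} with one position per frame.
    Returns {(agent_a, agent_b): number of frames within `radius`}.
    """
    edges = defaultdict(int)
    n_frames = min(len(path) for path in tracks.values())
    for t in range(n_frames):
        # Every unordered pair of agents is a candidate edge.
        for a, b in combinations(sorted(tracks), 2):
            if dist(tracks[a][t], tracks[b][t]) <= radius:
                edges[(a, b)] += 1
    return dict(edges)


# Toy tracking data for three agents over three frames (made up).
tracks = {
    "A": [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    "B": [(0.5, 0.0), (1.5, 0.0), (9.0, 0.0)],
    "C": [(9.0, 9.0), (9.0, 9.0), (9.0, 1.0)],
}
graph = interaction_graph(tracks, radius=1.0)
print(graph)
```

Nodes are the agent identifiers and each dictionary key is an edge, so the result can be passed to any graph analysis method once a suitable interaction criterion has been chosen.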

This chapter introduces some recently developed tools for these three steps. As an example of the first step, we introduce a deep learning-based multi-animal tracking system. The system, named Deep MAnTra, is applicable to studies of non-human primates [8], which are better observed within their groups because they experience stress and show less movement when they are relocated and socially isolated [7, 12]. As an example of the second step, we introduce a model of horse herding [2]. As an example of the last step, we introduce a technique for graph neural networks called MLAP (multi-level attention pooling) [5]. These tools help analyze mobility data given as videos.

2 Deep Learning-Based Multi-animal Tracking

The first step of mobility analysis is to track and identify agents in videos. In this section, we introduce a deep learning-based multi-animal tracking system called Deep MAnTra (Fig. 6.1) [8]. Multi-animal tracking in a wild environment is a considerable challenge because animals move with agility. Nevertheless, it is an active research problem in computer vision, since such methods are applicable not only to animals but also to mobility analyses more broadly.

Fig. 6.1
Three photographs. 1 highlights monkeys labeled 99.82, 99.95, 99.81, and 99.67 on the trees. 2 highlights a monkey labeled 99.42 on the tree. 3 highlights a monkey labeled 99.88 inside a netted box and a monkey labeled 99.9 above the box.

Example images of multiple monkey detection with Deep MAnTra. Each green box indicates a region containing a monkey. Our approach successfully detected the monkeys in these examples even when they were occluded

In Deep MAnTra, a monkey detector based on You Only Look Once (YOLOv4) [9] is trained with carefully designed transfer learning. SuperGlue [10] and Murty’s algorithm are then applied to the resulting box detections to re-identify individual monkeys across successive frames.
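The idea of linking detections across frames can be illustrated with a much simpler stand-in. The sketch below is not the Deep MAnTra procedure: it scores box pairs by intersection over union (IoU) instead of SuperGlue features, and it finds only the single best assignment by exhaustive search, whereas Murty’s algorithm ranks the best assignments efficiently. The function names and the box format `(x1, y1, x2, y2)` are assumptions made for this example.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def best_assignment(prev_boxes, curr_boxes):
    """Return a list m where prev_boxes[i] is matched to curr_boxes[m[i]].

    Exhaustive search over permutations: only practical for a handful of
    boxes, and it assumes both frames contain the same number of boxes.
    """
    if len(prev_boxes) != len(curr_boxes):
        raise ValueError("this sketch needs equal numbers of boxes")
    best, best_score = None, -1.0
    for perm in permutations(range(len(curr_boxes))):
        score = sum(iou(prev_boxes[i], curr_boxes[j])
                    for i, j in enumerate(perm))
        if score > best_score:
            best, best_score = perm, score
    return list(best)


# Two individuals whose detections arrive in swapped order (made-up boxes).
prev_boxes = [(0, 0, 10, 10), (20, 0, 30, 10)]
curr_boxes = [(21, 0, 31, 10), (1, 0, 11, 10)]
matching = best_assignment(prev_boxes, curr_boxes)
print(matching)
```

A real tracker must also handle individuals that appear, disappear, or are occluded, which is where appearance features and ranked assignments become useful.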

As a result, the best-performing Japanese macaque detection model used a YOLOv4 architecture with a spatial attention module and the Mish activation function, trained with a 3-stage curriculum. It achieved a mean AP50 of 96.59%, a precision of 93%, a recall of 96%, and a mean IOUAP@50 of 77.2%. Deep MAnTra also achieved 91.35% MOTA (Multiple Object Tracking Accuracy) even on a heterogeneous dataset, which suggests that it can be an effective and reliable tool for animal behavior studies.
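For readers unfamiliar with these scores, the sketch below computes precision, recall, and MOTA from raw counts using their standard definitions. The counts are made up for illustration and are unrelated to the results reported above.

```python
def precision(tp, fp):
    """Fraction of detections that are correct."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Fraction of ground-truth objects that are detected."""
    return tp / (tp + fn)


def mota(fn, fp, id_switches, num_gt):
    """Multiple Object Tracking Accuracy.

    1 minus the total number of misses, false positives, and identity
    switches, divided by the number of ground-truth objects, with all
    counts summed over every frame.
    """
    return 1.0 - (fn + fp + id_switches) / num_gt


# Hypothetical counts accumulated over a video.
tp, fp, fn, id_switches = 90, 10, 30, 5
num_gt = tp + fn  # every ground-truth object is either found or missed

print(precision(tp, fp))                   # 0.9
print(recall(tp, fn))                      # 0.75
print(mota(fn, fp, id_switches, num_gt))   # 0.625
```

Unlike precision and recall, MOTA also penalizes identity switches, so it reflects re-identification quality as well as detection quality.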

3 Horse Herding Analysis

The second step is to extract interactions from the tracking data. In this section, we introduce an example of interaction analysis: a mathematical model of herding in feral horses [2].

Fig. 6.2
A schematic diagram has a repulsion zone, an attraction zone, and a synchronization zone from the inner to the outer. Arrows from the center point to attraction to neighbor A, repulsion from horse R rep f, center of mass com, inertia H, repulsion from herder R rep h, and synchronization.

Schematic diagram of the zones and forces of interaction experienced by a mare while being herded by a stallion [2]

The purpose of this model is to study the mechanism of group aggregation by herding, which a stallion (the male leader of a group) needs in order to keep his harem of mares (female horses). Herding is also seen between a shepherd dog and sheep, where the dog moves to aggregate a group of sheep and to drive the group in a certain direction, using two crucial strategies [11]. During herding, different forces of interaction come into play, reflecting the dynamics of the sheep’s movements and the strategies of the dog. The herding of sheep by a shepherd dog has been explained with a social force model [3] comprising self-propelled particles that move at a constant speed. Mares, however, do not move at a constant speed. We therefore proposed a social force model in which the motion is a linear combination of various forces, such as repulsion from the stallion and attraction to the center of mass.

The forces employed in our social force model are listed below (Fig. 6.2).

  1. Inertia:

    A mare tends to keep moving in her previous direction of motion.

  2. Repulsion from the stallion:

    When a mare is within a certain distance from the stallion, she experiences a force of repulsion directed away from the stallion.

  3. Short-range repulsion:

    A mare experiences a force of repulsion from other mares when they are within the repulsion zone, which means that a mare tends to retain an exclusive zone around her.

  4. Medium-range attraction:

    A mare experiences a force of attraction toward other mares when they are within the attraction zone, which means that a mare tends to move toward the neighbors in that zone.

  5. Synchronization attraction:

    A mare matches her direction with that of the nearest moving mare within a set region, on the premise that a moving mare may be escaping from danger.

  6. Attraction to the center of mass:

    Mares experience a force of attraction toward the group’s center of mass, since straying from the group increases the risk of predation.

The driving force of a mare is a weighted sum of these components. We assumed that each mare has her own weights and estimated them by minimizing the squared error between the model trajectories and the measured trajectories, subject to the condition that the coefficients are non-negative. This results in a non-negative factorization problem.
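The fitting step can be sketched as a small non-negative least-squares problem. Writing the weighted sum for one mare as \(\sum_k c_k f_k\) with \(c_k \ge 0\), the code below estimates the weights by projected gradient descent. This is an illustrative stand-in rather than the solver used in [2]: the function name, the step size, the toy data, and the treatment of each force component as one scalar per sample are all assumptions.

```python
def fit_nonnegative_weights(features, targets, steps=2000, lr=0.1):
    """Minimize the mean squared error of sum_k c_k * f_k subject to c_k >= 0.

    features: list of samples, each a list of K force components.
    targets:  list of observed values, one per sample.
    """
    n, k = len(features), len(features[0])
    weights = [0.0] * k
    for _ in range(steps):
        # Residual of the current model on every sample.
        residuals = [sum(w * f for w, f in zip(weights, row)) - y
                     for row, y in zip(features, targets)]
        # Gradient of the mean squared error with respect to each weight.
        grad = [2.0 / n * sum(r * row[j] for r, row in zip(residuals, features))
                for j in range(k)]
        # Gradient step followed by projection onto the non-negative orthant.
        weights = [max(0.0, w - lr * g) for w, g in zip(weights, grad)]
    return weights


# Toy data generated from hypothetical weights (2.0, 0.0, 0.5).
features = [[1, 0, 0], [0, 1, 0], [0, 0, 1],
            [1, 1, 0], [0, 1, 1], [1, 0, 1]]
targets = [2.0, 0.0, 0.5, 2.0, 0.5, 2.5]
weights = fit_nonnegative_weights(features, targets)
print([round(w, 3) for w in weights])
```

In practice, a library routine such as `scipy.optimize.nnls` could replace the hand-written loop. Either way, a force that does not help explain the data ends up with a weight at or near zero.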

Table 6.1 Weights of the forces obtained

Fig. 6.3
A line graph of coordinate position versus coordinate position. It plots ten lines that rise and then fall.

Comparison of actual mare trajectories with the best-fit model

The estimated weights and the reproduced trajectories are shown in Table 6.1 and Fig. 6.3, respectively. The weights reveal two points. One is that the weights \(c_4\) and \(c_5\) take small values compared to the others, which means that these components are negligible in our model. Identifying such negligible components is an advantage of the non-negative optimization. The other is that the weights vary widely among the mares, except between Mares 1 and 2. This variation may reflect the relationships among individuals within the harem. In fact, Mares 1 and 2 are sisters.

In this section, we introduced a mathematical model of herding in mares as an example of interaction analysis. The social force model used therein is general, easy to use, and explainable. In addition, the weights can be interpreted as characterizing the personality of each agent and are therefore useful for further analyses.

4 Multi-level Attention Pooling for Graph Neural Networks

As an example of machine learning techniques for graphs, we introduce a deep graph neural network architecture called MLAP (multi-level attention pooling) [5].

Graph-structured data are found in many fields, since a wide variety of natural and artificial objects can be expressed as graphs: molecular structural formulas, biochemical reaction pathways, brain connection networks, social networks, abstract syntax trees of computer programs, and the interactions extracted in mobility analysis. Because of this ubiquity, machine learning methods on graphs have been actively studied. Thanks to the rich information underlying the structure, graph machine learning techniques have shown remarkable performance in various tasks.

In contrast to classical graph machine learning methods that use hand-crafted features, recent years have witnessed a surge in graph representation learning (GRL). In particular, graph neural networks (GNNs) have rapidly emerged as a new framework for GRL.

Itoh et al. proposed a method to learn graph representations for graph-level prediction tasks, such as molecular property classification, by using multiple representations with different localities [5]. To this end, they proposed a multi-level attention pooling (MLAP) architecture that introduces an attention pooling layer [6] for each message passing step to compute layer-wise graph representations. The architecture then aggregates them to compute the final graph representation. The MLAP architecture can thus focus on different nodes (or different subgraphs) in each layer, with different levels of information locality, which leads to better modeling of both local and global structural information (Fig. 6.4).

Fig. 6.4
A flow diagram of the architecture of a GNN with MLAP. The input passes through three message passing (MP) layers. Each MP layer points to a pooling layer, all pooling layers point to the aggregator, and the aggregator points to the output.

Architecture of a GNN with MLAP. There is a dedicated pooling layer for each message passing layer to compute a layer-wise graph representation. The aggregator computes the final graph representation from the layer-wise graph representations. M.P.: message passing
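The data flow in Fig. 6.4 can be sketched in plain Python so that it runs without a deep learning library. The mean-over-neighbors message passing, the fixed gate vectors (which would be learned in practice), and the sum aggregator are simplifying assumptions for illustration and are not the configuration reported in [5].

```python
import math


def message_passing(adj, feats):
    """Replace each node feature by the mean over the node and its neighbors."""
    updated = []
    for v, neighbors in enumerate(adj):
        group = [feats[v]] + [feats[u] for u in neighbors]
        updated.append([sum(col) / len(group) for col in zip(*group)])
    return updated


def attention_pool(feats, gate):
    """Softmax attention over nodes, then an attention-weighted sum of features."""
    scores = [sum(g * x for g, x in zip(gate, h)) for h in feats]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]
    dim = len(feats[0])
    pooled = [sum(a * h[d] for a, h in zip(attn, feats)) for d in range(dim)]
    return pooled, attn


def mlap(adj, feats, gates):
    """One pooling layer per message passing layer, aggregated by summation."""
    layer_reps = []
    for gate in gates:
        feats = message_passing(adj, feats)
        pooled, _ = attention_pool(feats, gate)
        layer_reps.append(pooled)
    final = [sum(col) for col in zip(*layer_reps)]
    return final, layer_reps


# A path graph 0 - 1 - 2 with two-dimensional node features (toy values).
adj = [[1], [0, 2], [1]]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
gates = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical, one gate per layer
final, layer_reps = mlap(adj, feats, gates)
print(final)
```

Because each layer has its own gate, the attention weights can emphasize different nodes at different depths, which is the property the architecture exploits.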

The MLAP architecture has been shown to improve graph classification performance compared to the baseline architectures [5]. In addition, analyses of the layer-wise graph representations suggest that aggregating information from multiple levels of locality can indeed improve the discriminability of the learned graph representations.

The MLAP architecture is applicable not only to classification tasks but also to sequence generation tasks (Graph2Seq) [4]. Further studies are expected to widen its applicability.

5 Conclusions

This chapter introduced tools for the three steps of mobility analysis. The first is Deep MAnTra, a multi-animal tracking system based on deep learning techniques [8]. The second is a social force model that extracts interactions from the tracking data and also characterizes the individuals, applied here to horse herding [2]. The last is MLAP, a graph neural network architecture for graph representation learning [5]. These techniques will help us analyze mobility data given as videos.