1 Introduction

As RoboCup progresses over the years, high-level skills become necessary to maintain a competitive level. Such skills are no longer restricted to the detection of field objects or to the self-localization of the robot players, but include team’s skills based on the tracking (position estimation) of teammates and opponents. Examples of these skills are ball passing, adversaries’ tracking and team’s formation.

When the observability of the game and the players is not an issue to address, as in the case of the Small-size league, high- levels skills based on the tracking of the robot players have been already implemented (e.g. reactive coordination [1], and the analysis and learning of the opponent’s strategies [2, 3]).

The situation in the Standard Platform League (SPL) is more complex given the restricted field of view of the robot´s camera and the low computational resources of the Nao robots; in this league the detection of other robots is not robust and the construction of a good map of obstacles/players is a difficult process. Current standard robot tracking systems maintain/update the robot estimations using Kalman filters (e.g. [4, 5]). However, in this tracking paradigm there is no clear solution of the data association problem, and several heuristics need to be used in order to eliminate and merge hypotheses.

In this context the main goal of this paper is to propose a new methodology for the robust tracking (position estimation) of multiple soccer robots using the Random Finite Sets (RFS) framework, which allows to overcome the drawbacks of current approaches. The proposed methodology is inspired in [6] where the term Probability Hypothesis Density (PHD) was introduced as the first moment of a point process. Then, the PHD filter is presented in [7] as a way to maintain hypotheses of multiple objects using sets instead of vectors to describe the object’s states.

There are several works that use this new framework in the literature, concerning all kind of problems and subjects [8]. Principally, it is used in highly complex environments to track large amounts of features, which makes it very expensive computationally. But, in the SPL problem the number of features (robots) to detect rises to 10 in the worst case, which make it computationally tractable for a Nao robot’s CPU.

The paper is organized as follows: in Sect. 2 the problem to resolve is described. Section 3 presents a brief introduction to RFS. Section 4 shows the implementation used in this work, and Sect. 5 the experimental results. Finally, conclusions are drawn in Sect. 6.

2 Problem Description: Data Association When Tracking Multiple Players in Robot Soccer

As already mentioned, knowing the position of the other robots in the field is relevant for implementing high-level soccer behaviors. In this work we will call map of obstacles to a map that a given player builds, and which includes the positions on the field of every other robot player, teammate or opponent (see Fig. 1). We will denote observations to the detections of these robots, and obstacles to the estimated position of these robots in the map.

Fig. 1.
figure 1

Example of a map of obstacles. The white rectangle represents the robot which is building the map. The black crosses represent the robots/obstacles positions and the yellow circles the corresponding covariance of each representation. The lighted zone represents the Field of View of the camera. (Color figure online)

Most of the existing methods used for estimating the map of obstacles employ a classical approach with a vector representation of the obstacles (robots), which are propagated over time using a Bayesian filter (e.g. an EKF filter). However, it has been demonstrated that the use of a vector representation of the obstacles has numerous drawbacks, mainly related to the data association between new and past observations (obstacles) [9]. Some examples of those problems are illustrated in Fig. 2: Fig. 2(a) shows a trivial case where the data association between two new observations (red crosses) and two obstacles (black crosses) is trivial. However, the data associations are not trivial in the cases illustrated in Fig. 2(b) and (c). First, Fig. 2(b) shows the case when two new measurements have a similar distance to the obstacle, in addition to be not very close to the obstacle (see the covariance of the obstacle representation). So, depending of the implemented data association strategy, this could end in one, two or three obstacles in the map. Figure 2(c) show a case where two obstacles are very close, so the new measurement could be associated with any of them, and leave the other with no update for that frame. For these cases, most of methods use heuristics to associate new measurements to the obstacles, or to create new obstacles if no association is made. But, in a highly dynamic environment as a robot soccer match, these methods may produce several bad associations or missed detections.

Fig. 2.
figure 2

The red crosses represent measurements of the sensor, black crosses represent the robots/obstacles positions and the yellow circles the corresponding covariance of each representation. (a) represents an easy case of data association, while (b) and (c) show more complex cases. (d) represents a case when an obstacle that should be detected by the robot is not sensed. (Color figure online)

Finally, Fig. 2(d) describes a situation where no measurement is obtained for an obstacle inside the Field of View (FoV). For the classical approach, this is not different from an obstacle outside the FoV, and its only consequence is that the obstacles’ covariance grows. So, depending of the speed of the covariance’s growing (which maintain the obstacles outside the FoV), the obstacles inside the FoV will be maintained the same time that the others, although they do not receive any measurements.

3 Multi-target Tracking with Random Finite Sets

The main idea of the proposed methodology is to use finite sets instead of vectors for representing both observations and obstacles, which can encapsulate positions and quantity uncertainty. As has been widely demonstrated [7, 9, 10], the first moment of RFS, known as Probability Hypothesis Density (PHD), can be used to construct a filter which propagates the PHD of the map posterior instead of the map posterior itself.

3.1 PHD Filter

The PHD function \( v \) at a point represents the density of the expected number of obstacles occurring at that point of the state space (map). Therefore, a property of the PHD is that for any given region S of the map

$$ {\mathbb{E}}\left[ {\left| {M \cap S} \right|} \right] = \mathop \smallint \limits_{\varvec{S}} v\left( m \right)dm $$
(1)

where \( M \) represents the map RFS and \( \left| \cdot \right| \) denotes the cardinality of a set. This means that, by integrating the PHD on any region S of the map, we obtain the expected number of obstacles in S [7].

The PHD filter considers the following two steps [7]:

  • Prediction:

$$ v_{k|k - 1} \left( m \right) = v_{k - 1} \left( m \right) + b_{k} \left( m \right) $$
(2)

where \( b_{k} \left( m \right) \) represents the PHD of the new obstacles in time \( k \).

  • Update:

$$ v_{k} \left( m \right) = v_{k|k - 1} \left( m \right)\left[ {1 - P_{D} \left( m \right) + \mathop \sum \limits_{{z \in Z_{k} }} \frac{{P_{D} \left( m \right)g_{k} (z|m))}}{{c_{k} \left( z \right) + \smallint P_{D} \left( \xi \right)g_{k} \left( {z |\xi } \right)v_{k|k - 1} \left( \xi \right)d\xi }}} \right] $$
(3)

where \( {\text{P}}_{\text{D}} \left( m \right) \) represents the probability of detecting an obstacle at \( m \), \( g_{k} (z|m) \) represents the likelihood that \( z \) is generated by an obstacle at \( m \) at time \( k \) (i.e. the measurement likelihood) and \( c_{k} \left( z \right) \) is the clutter intensity at time \( k \).

3.2 Considerations

To adopt this framework to the presented problem, some considerations must be done. As presented before, \( P_{D} \left( m \right) \) represents the probability of detecting an obstacle at \( m \), but it does not take into account the capability of the robot to sense at \( m \). So the real probability is represented by \( P_{D} \left( {m |X_{k} } \right) \) where \( X_{k} \) represent the position of the robot in time \( k \). The same occurs with \( g_{k} , c_{k} \) and \( b_{k} \), because they also depend of the robot position.

4 Implementation

There are many implementations of RFS in the literature, but most of them are time consuming, especially for Nao robots with limited computational capabilities [11, 12]. Hence we use the Mixture of Gaussian implementation (GM-PHD) [13], because it is very time efficient and it allows to easily get the positions of the obstacles from the PHD.

4.1 GM-PHD Implementation

The main idea is to represent any RFS as a mixture of Gaussians. Therefore, both obstacles and detections are represented by Gaussians. But, to represent the position and number uncertainty of the obstacles present in the field, it is necessary to add a weight to every Gaussian. In this way their positions represent the multitude of location of obstacles in the map while their weights represent the number of obstacles in that given region. So, a PHD map is a Gaussian Mixture of the form,

$$ v_{k - 1} \left( {m |X_{k - 1} } \right) = \mathop \sum \limits_{j = 1}^{{J_{k - 1} }} \omega_{k - 1}^{\left( j \right)} {\mathcal{N}}\left( {m;\;\mu_{k - 1}^{\left( j \right)} ,\;P_{k - 1}^{\left( j \right)} } \right) $$
(4)

which is a mixture of \( J_{k - 1} \) Gaussians, with \( \omega_{k - 1}^{\left( j \right)} , \;\mu_{k - 1}^{\left( j \right)} \) and \( P_{k - 1}^{\left( j \right)} \) being their corresponding prior weights, means and covariances, respectively. The same form is used to represent the new obstacles at time \( k \), \( b_{k} (m|Z_{k - 1} ,X_{k - 1} ) \), as

$$ b_{k} \left( {m |Z_{k - 1} ,\;X_{k - 1} } \right) = \mathop \sum \limits_{j - 1}^{{J_{b,k} }} \omega_{k|k - 1}^{\left( j \right)} {\mathcal{N}}\left( {m;\;\mu_{k|k - 1}^{\left( j \right)} ,\;P_{k|k - 1}^{\left( j \right)} } \right) $$
(5)

where \( J_{b,k} \) is the number of Gaussians in the new PHD at time \( k \), \( Z_{k - 1} \) is the vector of measurements at time \( k - 1 \) and \( \omega_{k|k - 1}^{\left( j \right)} ,\mu_{k|k - 1}^{\left( j \right)} \) and \( P_{k|k - 1}^{\left( j \right)} \) determine the shape of the PHD of new obstacles. Therefore, the predicted PHD of the map, shown in (2) is also a Gaussian mixture

$$ v_{k|k - 1} \left( {m |X_{k} } \right) = \mathop \sum \limits_{j = 1}^{{J_{k|k - 1} }} \omega_{k|k - 1}^{\left( j \right)} {\mathcal{N}}\left( {m;\;\mu_{k|k - 1}^{\left( j \right)} ,\;P_{k|k - 1}^{\left( j \right)} } \right) $$
(6)

where \( J_{k|k - 1} = J_{k - 1} + J_{b,k} \) are the number of Gaussians representing the union of the prior map PHD \( v_{k - 1} \left( {m |X_{k - 1} } \right), \) and the new obstacles PHD at time \( k \). \( \omega_{k|k - 1}^{\left( j \right)} ,\;\mu_{k|k - 1}^{\left( j \right)} \) and \( P_{k|k - 1}^{\left( j \right)} \) represents the shape and form of the Gaussians of the prior map PHD if \( j < J_{k - 1} \) and the shape and form of the Gaussians of the new observations PHD otherwise.

So, the posterior PHD shown in (3) is also a Gaussian mixture of the form

$$ v_{k} \left( {m |X_{k} } \right) = v_{k|k - 1} \left( {m |X_{k} } \right)\left[ {1 - P_{D} \left( {m |X_{k} } \right) + \mathop \sum \limits_{{z \in Z_{k} }} \mathop \sum \limits_{j = 1}^{{J_{k|k - 1} }} v_{G,k}^{\left( j \right)} \left( {z,\;m |X_{k} } \right)} \right] $$
(7)

where \( v_{G,k}^{\left( j \right)} \) corresponds, according to the general PHD Filter update equation, to

$$ v_{G,k}^{\left( j \right)} \left( {z,\;m |X_{k} } \right) = \omega_{k}^{\left( j \right)} \left( {z|X_{k} } \right){\mathcal{N}}\left( {m;\;\mu_{k|k}^{\left( j \right)} ,\;P_{k|k}^{\left( j \right)} } \right) $$
(8)
$$ \omega_{k}^{\left( j \right)} \left( {z |X_{k} } \right) = \frac{{P_{D} \left( {m |X_{k} } \right)\omega_{k|k - 1}^{\left( j \right)} q^{\left( j \right)} \left( {z|X_{k} } \right)}}{{c_{k} \left( z \right) + \mathop \sum \nolimits_{i = 1}^{{J_{k|k - 1} }} P_{D} \left( {m |X_{k} } \right)\omega_{k|k - 1}^{\left( i \right)} q^{\left( i \right)} (z|X_{k} )}} $$
(9)

where \( q^{\left( i \right)} \left( {z|X_{k} } \right) = {\mathcal{N}}\left( {z;\;H_{k} \mu_{k|k - 1}^{\left( i \right)} ,\;S_{k} } \right) \) is the measurement likelihood. The components \( \mu_{k|k}^{\left( i \right)} \) and \( P_{k|k}^{\left( i \right)} \) can be obtained from the standard EKF update equations,

$$ S_{k}^{\left( i \right)} = R_{k} + \nabla H_{k} P_{k|k}^{\left( i \right)} \nabla H_{k}^{T} $$
(10)
$$ K_{k}^{\left( i \right)} = P_{k|k}^{\left( i \right)} \nabla H_{k}^{T} \left[ {S_{k}^{\left( i \right)} } \right]^{ - 1} $$
(11)
$$ \mu_{k|k}^{\left( i \right)} = \mu_{k|k - 1}^{\left( i \right)} + K_{k}^{\left( i \right)} \left( {z - H_{k} \left( {\mu_{k|k - 1}^{\left( i \right)} } \right)} \right) $$
(12)
$$ P_{k|k}^{\left( i \right)} = \left[ {I - K_{k}^{\left( i \right)} \nabla H_{k} } \right]P_{k|k - 1}^{\left( i \right)} $$
(13)

with \( \nabla H_{k} \) being the Jacobian of the measurement equation with respect to the obstacles estimated location.

4.2 Algorithm

In order to use the presented framework, it is necessary to create Gaussians according to the measurements of the robot’s sensors. Therefore \( b_{k} (m|Z_{k - 1} ,\;X_{k - 1} ) \) is obtained from the measurements \( Z_{k - 1} \) and the previous robot position \( X_{k - 1 } \). The components of this Gaussians are determined according to

$$ \omega_{b,k}^{\left( j \right)} = 0.01, \;\;\; \mu_{b,k}^{\left( j \right)} = h^{ - 1} \left( {z_{k - 1}^{j} ,\;X_{k - 1} } \right), $$
$$ P_{b,k}^{\left( j \right)} = h'\left( {\mu^{\left( j \right)} ,\;X_{k - 1} } \right)R\left[ {h'\left( {\mu^{\left( j \right)} ,\;X_{k - 1} } \right)} \right]^{T} $$

where \( h^{ - 1} \) is the inverse measurement equation, \( R \) is the measurement noise covariance and \( h'\left( {\mu^{\left( j \right)} ,\;X_{k - 1} } \right) \) is the Jacobian of the measurement model function with respect to the Gaussian state, \( j \). Therefore, the implementation initially considers all detections at time \( k - 1 \) to be potential new features at time \( k \).

Then, as every Gaussian is combined with every measurement to generate a new Gaussian, the numbers of Gaussians grow exponentially in every frame. That is why pruning and merging operations are necessary. Gaussians which are determined sufficiently close (through a Mahalanobis distance threshold) are merged into a single Gaussian. But this does not represent an elimination of an obstacle because one Gaussian can represent more than one obstacle by its weight; these values are added when two or more Gaussians are merged.

Figure 3 shows the pseudo code of the complete algorithm.

Fig. 3.
figure 3

Pseudo code of the general algorithm that calculates the PHD that represent the map of obstacles.

With this algorithm, the PHD of the map is obtained. Then to get the position of the obstacles in the map, it is necessary to evaluate every Gaussian’s weight. If this values exceeds a given threshold, then the obstacle position is given by the Gaussian’s mean vector, and it’s added to a vector that represent the current map. Figure 4 shows this algorithm.

Fig. 4.
figure 4

Pseudo code of the algorithm that drawn obstacles according to the PHD of the map.

4.3 Application

Using the proposed methodology, it is obtained a representation of the obstacle map for the detection of soccer players (Nao robots) in the SPL league. To do this, the methodology is used as follows:

  1. i.

    State space: in order to describe the obstacles in the field, the state space is a vector \( p = \left( {x,y} \right) \) that represents positions on the field according to the center of the field as \( \left( {0,0} \right) \) of the coordinate system.

  2. ii.

    Sensor: the used sensor is the Nao camera. This implies that transformations must be done in order to describe the measurements as positions on the field, using the camera’s intrinsic and extrinsic parameters, as well as the position of the robot on the field. The detections are made with the same robot´s detector provided in the B-Human Code Release 2014 [14].

  3. iii.

    Probability of detection \( P_{D} \): Given that the sensor is the camera of the robot, the probability of detection is given by the field of view of it and the position of the obstacle relative to the robot. This implies that \( P_{D} \) must be recalculated in every frame for all the Gaussians of the map.

  4. iv.

    Moving obstacles: The movement of the obstacles is taken into account by growing the covariance of the mixture of Gaussians in every frame instead of adding a movement model into the prediction step.

5 Results

In order to evaluate the proposed methodology several experiments with real Nao robots in a real SPL field were carried out. Given that we needed to measure the accuracy of the obstacle’s map (i.e. robots map), a validation system consisting of a global vision system (camera over the field) for measuring the Ground Truth was implemented.

First, for very simple initial conditions, we carried out only one experiment in which a robot is placed in the center of the field and it observes three other static robots. The robot is moving its head all the time, and given its reduced field of view, at a given moment it is able to observe just one of the other robots and in some few cases two. The proposed GM-PHD based method is compared with a classical EKF based method. As expected, given the simplicity of the problem, both systems obtained an average error of about 20 cm in the position of the robots. Both methods run in real time, being the processing time of the GM-PHD method 0.13 ms, and the processing time of the EKF method 0.07 ms.

Secondly, the proposed GM-PHD based method and the classical EKF based method were compared in a set of experiments under a variety of more realistic and dynamic conditions, where the observer robot, i.e. the one that builds the map, moves as well as some of the observed robots. Figure 5 shows this set of experiments.

Fig. 5.
figure 5

Map building experiments under dynamic conditions. Four different situations are described in (a), (b), (c) and (e). In these diagrams the black asterisks represent the real position of the robots, obtained by the Ground Truth system; Blue crosses represent the robots’ positions calculated by a EKF tracking method; The colored ellipses represent the robot estimations of the GM-PHD based method, and the associated number represents the weight of each Gaussian. The white dashed lines represent the trajectory of moving robots. (d)/(f) shows the estimated number of robots corresponding to situation (c)/(e). (Color figure online)

For the first experiment of this set, five static robots are placed on the field, and the observer robot performs a ready positioning, i.e. the robot walks from its starting position to their legal kick-off position. The observer robot moves its head from left to right all the time, hence the other robots are not inside the FoV in every frame. As can be seen in Fig. 5(a), the differences of the GM-PHD method and the classical EKF method, in term of a multi-tracking criteria, are notorious. While the GM-PHD approach correctly describes the presence of obstacles in most of the positions of the field, the classical one shows an incorrect number of obstacles for each real one. This is because the new observations are not correctly associated with the previous ones, due to the odometry errors and the non-constant observations; then new hypothesis are drawn incorrectly by the EKF method.

In the second experiment one moving robot observes five other robots; one moving robot and four static ones. The observer robot, while moves, observes the other moving robot occasionally, because it moves its head from left to right all the time. In Fig. 5(b) it can be seen that, when using the classical EKF approach, there are two wrongly detected robots placed in the previous path of the moving robot, in addition to the same error that occurs in the last experiment when more than one obstacle in the map is describing each real one. The GM-PHD method correctly relates these observations with the same obstacle. In fact, the GM-PHD method perfectly estimates the number of robots in the field. In the case of the EKF method, bad associations can be corrected by increasing the minimal distance of merging. But this can produce another type of errors, where detections from different robots are associated to the same one.

In the third experiment we analyze a typical kidnapping situation. The observer robot is placed in the center of the field and three static robot are placed in other field positions. The observer robot is looking around when one of the static robots is removed from the field (in a real match, this is very common due to robot penalizations). As can be seen in Fig. 5(c) and (d), the GM-PHD approach deletes very quickly the hypothesis associated with the kidnapped robot, while the classical EKF method keeps the track until the covariance reaches a given threshold value. This can be fixed for the classical EKF method by calculating a different rate of covariance growing when a hypothesis that should be seen is not seen. But, this implies including another heuristic to the process, while the GM-PHD method handles this situation naturally.

Finally, in the last experiment the observer robot also realizes a ready positioning while there are some static robots placed in the field. But two of these robots are very close from each other, therefore the perception of these robots is very inaccurate. In Fig. 5(e) it can be seen that even when only one Gaussian is representing these robots, the GM-PHD method can correctly estimate the number of robots in that place (given by the weight of the Gaussian), while the classical EKF approach fails due the odometry and perception errors. In Fig. 5(f) the estimated number of robots given by each method thought the entire experiment is shown. It should be remembered that the estimated number of robots is calculated as the sum of the weight of all Gaussians by the GM-PHD method, and as the number of obstacles created by the classical method.

6 Conclusions

This paper presents a new method for building obstacle maps using a consistent mathematically approach, known as Random Finite Sets. The method is applied to the problem of estimating the position of the robots, teammates and opponents, in the SPL league. Considering the computational capabilities of Nao robots, the GM-PHD implementation is used. In this implementation, obstacles and observations are represented using Mixture of Gaussians, but instead of associating an obstacle or an observation to a given Gaussian, the weight of each Gaussians maintains an estimation of the number of robots that it represents.

The proposed tracking method was validated in several real game situations, with moving robots, and compared with a classical EKF based approach. The proposed GM-PHD method showed a much better performance, being able to deal with most of the data association problems, even being able to manage complex situations such a robot kidnapping. Moreover, the method is able to run in real-time in the Nao robots (mean processing time is 0.13 ms; worse case processing time 0.3 ms).