1 Introduction

We consider the problem of performing inference for multi-object dynamical systems under partial, corrupted and noisy observations. This class of problems, known as multi-target tracking in the engineering literature (Fortmann et al. 1980; Mahler 2003; Vo et al. 2014), arises in many applications, e.g. bio-imaging (Chenouard 2014), robotics (Mullane et al. 2011) and surveillance (Benfold and Reid 2011), which can all benefit from principled inference solutions in different ways: i) when the number of objects is too large to be treated by hand, ii) when the phenomena of interest take place on extended periods of time or, conversely, when an immediate response is needed, iii) when the data available about each object are scarce and iv) when it is difficult to tell one object from another. One of the main difficulties with the considered type of system is that the number of objects is not known a priori and might vary in time due to a birth-death process. Also, objects are observed under multiple perturbations: i) each object might or might not be detected, ii) if an object is detected then its state is only partially observed and the observation is subject to noise and iii) observations not related to any object, referred to as false alarms, are also received. The main task when inferring the number of objects in a given system as well as their respective state is to solve the data association problem, that is, to estimate whether or not observations at different time steps originate from the same object. Each of the above-mentioned perturbations incurs a significant increase in the size of the set of all possible data associations, making it highly combinatorial. Due to this combinatorial nature, the task of estimating the current state of all objects based on all previous observations, referred to as multi-object filtering, is a difficult problem. 
It has been an active research topic for several decades and continues to be challenging in spite of the ever-increasing available computational resources (Fortmann et al. 1980; Vo et al. 2014). In this article, we aim to tackle the even harder problem of multi-object smoothing, that is, our objective is to keep evaluating the likelihood of data associations at previous times in light of newly received data. This is an important problem in practice since the elicitation of objects’ trajectories and origins is fundamental for the evaluation of the objects’ identities and of the associated situational awareness. Indeed, knowing the current state of each object is not sufficient in many situations and maintaining an up-to-date estimate of their past trajectories is often crucial. For instance, in defence applications, if an object labelled as “ally” crosses paths with another object labelled as “enemy”, then being able to tell one from the other at a later time can be more critical than having an accurate estimate of their state at that time.

In the context of filtering, one of the most natural ways of improving the trajectory estimates over the last few time steps is referred to as fixed-lag smoothing, where a sliding window made of a given number of time steps is updated based on the latest observations. The advantage with fixed-lag smoothing is that the computational cost can easily be tuned by selecting an adequate lag. However, since our objective is to elicit particular events that might have taken place at arbitrary time steps, we consider instead a “batch” alternative where a user-defined time window of interest is fixed. It would be possible to extend our approach in order to make it suitable for online estimation, e.g. by running the proposed MCMC chain over a fixed lag for some or all iterations of a suitable online estimation algorithm in order to further explore the set of data associations; this extension is kept for future work. Further generalisations of our approach could consider smoothing on an infinitely growing time interval; however, this would require improving the current understanding of the statistical properties of multi-object systems (Houssineau et al. 2019b) in order to leverage the corresponding forgetting properties.

Defining a standard statistical model for representing multiple objects requires setting a number of probability distributions and parameters to characterise the different aspects of the problem, including highly uncertain phenomena such as false alarms. Such models also usually ignore the disparity between the different objects of interest in terms of behaviour and detection profile. In this article, we consider an alternative representation of uncertainty (Houssineau et al. 2019a; Houssineau 2018b), based on possibility theory (Dubois and Prade 2015), that allows for acknowledging the lack of information about the different aspects of multi-object dynamical systems with the objective of increasing the robustness to misspecification of the derived solutions. The considered representation of uncertainty has links with imprecise probabilities (Walley 1991) and Dempster–Shafer theory (Dempster 1967; Shafer 1976).

Our motivation for using possibility theory in this work stems from the provided ability to assess the consistency between a model and an observation via the marginal likelihood rather than the associated fitness of the model as with probability theory. The difference is that greater uncertainty in the model yields greater consistency but smaller fitness. This notion of consistency can be related informally to a model-based spatio-temporal distance between observations, and is key when exploring the set of all possible data associations.

The use of MCMC to solve data association problems has been previously explored in Oh et al. (2009) as well as in Vu et al. (2014), Jiang et al. (2015) and Jiang and Singh (2018). The approach considered in these articles is based on local proposals in the set of data associations, with Vu et al. (2014), Jiang et al. (2015) and Jiang and Singh (2018) additionally considering the estimation of the objects’ trajectories. The objective in this article is to show that a suitable trade-off can be found between proposing global moves and maintaining a reasonable probability of acceptance of each move. This result is achieved by leveraging the efficiency of an approximate multi-object filtering method. The use of MCMC in discrete spaces is discussed more generally in Zanella (2019). MCMC has also been used in conjunction with, or as a replacement of, sequential Monte Carlo in the context of filtering for single-object systems (Gilks and Berzuini 2001) and multi-object systems (Khan et al. 2005; Septier et al. 2009; Carmi et al. 2012; Maroulas and Stinis 2012; Bao and Maroulas 2017); however, this type of approach is less directly related to the method proposed in this article.

Overall, the contributions of this article are as follows: i) a full multi-object model is defined in the context of possibility theory, building on the components of single- and multi-object models of Ristic et al. (2020) and Houssineau (2018a); ii) a possibilistic analogue of the scalable solution to multi-object filtering of Houssineau and Clark (2018) is introduced; iii) the tools of possibility theory are used to define a suitable structure on the set of data associations; and iv) a new efficient MCMC-based solution for the multi-object smoothing problem is introduced and its performance is demonstrated.

The proposed statistical model for representing multi-object systems is described in Sect. 2. This is followed by the presentation of the proposed method for exploring the set of data associations in Sect. 3, before considering an extension of this approach in Sect. 4. The performance of the proposed method is then assessed on simulated data in Sect. 5.

2 Model

We consider a fixed number K of time steps and assume without loss of generality that time steps take integer values between 1 and K. At each time step \(k \in \{1,\dots ,K\}\), a set of observations \(Z_k\) is received, containing both object-originated observations and false alarms. Each observation in the set \(Z_k\) is an element of an observation set \(\mathsf {Z}\), which is assumed to be a subset of \(\mathbb {R}^{d_{\mathsf {Z}}}\), where \(d_{\mathsf {Z}}\) is the number of components of each observation, e.g. \(d_{\mathsf {Z}} = 3\) for a radar measuring range, azimuth and Doppler shift. In order to model that an object might not be detected, we introduce the notation \(\phi \) for the empty observation, that is, an object for which detection has failed is associated with the empty observation \(\phi \). We assume, as is standard, that an object cannot generate more than one observation at each time step. Therefore, denoting \(\bar{Z}_k = Z_k \cup \{\phi \}\) the set of observations at time k augmented with the empty observation for any \(k \in \{1,\dots ,K\}\), any sequence of observations generated by an object through the K time steps of the scenario can be seen as an element of

$$\begin{aligned} \mathcal {O}_K = \bar{Z}_1 \times \dots \times \bar{Z}_K \setminus \{\phi \}^K \end{aligned}$$

where the sequence of observations containing empty observations only is not considered. Elements of \(\mathcal {O}_K\) are also referred to as observation paths or simply as paths. Multi-object data association can then be seen as the problem of determining the probability for all the paths in a given subset of \(\mathcal {O}_K\) to be the true paths of objects in the system under consideration; i.e. we are interested in the joint detection and tracking of all objects. Another standard assumption about multi-object systems is that each observation cannot originate from more than one object; as a consequence, not all subsets of \(\mathcal {O}_K\) are considered feasible and we focus on the set \(\mathcal {A}\) of subsets of \(\mathcal {O}_K\) such that for any set \(A \in \mathcal {A}\), any two different observation paths \(o\) and \(o'\) in A must verify that either \(o_k = o'_k = \phi \) or \(o_k \ne o'_k\) for all \(k \in \{1,\dots ,K\}\), where \(o_k\) denotes the kth element of the path \(o\). Less formally, elements of \(\mathcal {A}\) only contain paths that are different where they are not both equal to the empty observation. In practice, it might happen that a single object generates multiple observations, e.g. if it spans several resolution cells, or that multiple objects generate a single observation, e.g. if they are within the same resolution cell; these generalisations significantly increase the complexity of the problem and are kept for future work. The set \(\mathcal {A}\), in spite of being a strict subset of the power set of \(\mathcal {O}_K\), has a large cardinality and evaluating the credibility of each of its elements by exhaustion can be difficult even when the number of observations at each time step is small.
Assuming, for simplicity, that the number of observations at every time step is constant and equal to m, the set \(\mathcal {O}_K\) contains \((m+1)^K - 1\) paths, so that the number of elements in the power set of \(\mathcal {O}_K\) is equal to \(2^{(m+1)^K - 1}\), which is prohibitively large even for toy problems. It is generally difficult to devise algorithms that perform inference on a large discrete space such as \(\mathcal {A}\); yet MCMC methods can help to address part of this challenge since they only require being able to evaluate the credibility of a given association \(A \in \mathcal {A}\) proposed via some user-defined transition kernel.
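The size of \(\mathcal {O}_K\) and the feasibility constraint defining \(\mathcal {A}\) can be illustrated on a toy scenario. The following is a minimal sketch only: the observation labels, the use of `None` for the empty observation \(\phi \) and the function names are assumptions made for illustration.

```python
from itertools import product

PHI = None  # stand-in for the empty observation (assumed encoding)

def observation_paths(obs_per_step):
    """All paths in O_K: one entry per time step, excluding the all-empty path."""
    augmented = [list(z) + [PHI] for z in obs_per_step]
    return [p for p in product(*augmented) if any(o is not PHI for o in p)]

def is_feasible(association):
    """True if no non-empty observation is shared by two different paths."""
    paths = list(association)
    for i, o in enumerate(paths):
        for o2 in paths[i + 1:]:
            # Infeasible as soon as o_k = o'_k at some k where both are non-empty.
            if any(a is not PHI and a == b for a, b in zip(o, o2)):
                return False
    return True

# Toy scenario: K = 3 time steps, m = 2 observations per step (hypothetical labels).
obs = [["a1", "a2"], ["b1", "b2"], ["c1", "c2"]]
paths = observation_paths(obs)
n_paths = len(paths)  # (m + 1)**K - 1 paths
```

Even in this toy case, enumerating all subsets of `paths` and filtering them with `is_feasible` would already be impractical, which is the motivation for sampling-based exploration.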

In practice, we also need to estimate the interval of existence of each object. For this purpose, we introduce a set \(\mathcal {T}\) which is similar to \(\mathcal {A}\) except that each path \(o\) will be paired with a time of appearance \(m \in \{1,\dots ,K\}\) and the last time of existence \(n \in \{m,\dots ,K\}\); the tuple \((o,m,n)\) will be referred to as a track since all the observable characteristics of the corresponding hypothetical object can be estimated using this tuple. Formally, for any set \(T \in \mathcal {T}\), any track \(t = (o,m,n)\) in T must verify \(o_k = \phi \) for any \(k \notin \{m,\dots ,n\}\) and, for any \((o',m',n')\) in T different from \((o,m,n)\), it must hold that either \(o_k = o'_k = \phi \) or \(o_k \ne o'_k\) for all \(k \in \{1,\dots ,K\}\), as for data associations. We denote by \(\kappa \) the function extracting paths from tracks, that is, \(\kappa (t) = o\) for any track t with path \(o\).

2.1 Uncertain variable and possibility function

We consider the representation of uncertainty described in Houssineau et al. (2019a) which can be used as an alternative to subjective probabilities in a statistical model. The objective of this representation of uncertainty is to model information rather than randomness and therefore to address common issues with statistical modelling for complex systems and with the use of subjective probabilities. In the context of multi-object systems, some of these issues are:

  1) the associated models are hierarchical which precludes the use of improper priors on the first level of this hierarchy; however, there is often no prior information on the location of appearing objects which means that uninformative priors are needed;

  2) as with many complex systems, there are a large number of parameters which are not necessarily known in practice and learning these parameters is both challenging computationally and potentially useless if they are likely to change drastically from one time step to the other; this is for instance the case with the probability of detection.

As will be shown in the next few sections, the proposed approach allows for addressing these issues while preserving most of the usual intuitive mechanisms in Bayesian inference.

We model a fixed but unknown quantity as a mapping \(\varvec{x}\) from a sample space \(\Omega \) to a set \(\mathsf {X}\), referred to as an uncertain variable. The difference with a random variable is that \(\Omega \) is not defined as a probability space and the value \(x^*\) of the unknown quantity is not seen as the realisation of a “true” underlying probability distribution; instead, there is a reference element in \(\Omega \), denoted \(\omega ^*\), such that \(\varvec{x}(\omega ^*) = x^*\). This modification implies that uncertainty should be modelled differently and we consider the approach where the information about the true value of \(\varvec{x}\) is represented by a non-negative function \(f_{\varvec{x}}\) on \(\mathsf {X}\) verifying \(\sup _{x \in \mathsf {X}} f_{\varvec{x}}(x) = 1\), referred to as a possibility function. In this context, instead of defining the probability of an event \(X \in A\) for some subset \(A \subseteq \mathsf {X}\) as \(\int _A p_X(x) \mathrm {d}x\) for some random variable \(X \sim p_X\), we define the credibility of the event \(\varvec{x} \in A\) as \(\bar{P}(A) = \sup _{x \in A} f_{\varvec{x}}(x)\). The notion of credibility can be illustrated by the two extreme cases: if \(\bar{P}(A) = 0\) then the event \(\varvec{x} \in A^{\mathrm {c}} = \mathsf {X} \setminus A\) happened almost surely, whereas if \(\bar{P}(A) = 1\) then there is no information against the event \(\varvec{x} \in A\). (However, it might also hold that \(\bar{P}(A^{\mathrm {c}}) = 1\) as opposed to the probabilistic case.) Possibility functions are not characterised by their corresponding uncertain variables, and instead, we say that the possibility function describes the uncertain variable. If \(\varvec{y}\) is another uncertain variable in a set \(\mathsf {Y}\) and if \(\varvec{x}\) and \(\varvec{y}\) are jointly described by the possibility function \(f_{\varvec{x},\varvec{y}}\) then \(\varvec{y}\) is described by the marginal possibility function

$$\begin{aligned} f_{\varvec{y}}(y) = \sup _{x \in \mathsf {X}} f_{\varvec{x},\varvec{y}}(x,y), \qquad y \in \mathsf {Y}. \end{aligned}$$

Assuming that \(f_{\varvec{y}}(y) > 0\) for some given \(y \in \mathsf {Y}\), the conditional possibility function describing \(\varvec{x}\) given that \(\varvec{y} = y\) is defined for any \(x \in \mathsf {X}\) as

$$\begin{aligned} f_{\varvec{x}}(x \,|\,y) = \dfrac{f_{\varvec{x},\varvec{y}}(x,y)}{f_{\varvec{y}}(y)} = \dfrac{f_{\varvec{y}}(y \,|\,x)f_{\varvec{x}}(x)}{\sup _{x' \in \mathsf {X}} f_{\varvec{y}}(y \,|\,x')f_{\varvec{x}}(x')}, \end{aligned}$$

which is the analogue of Bayes’ theorem for possibility functions (De Baets et al. 1999). In this context, we will refer to \(f_{\varvec{x}}\) and \(f_{\varvec{x}}(\cdot \,|\,y)\) as the prior and posterior possibility functions, respectively, and \(f_{\varvec{y}}(y \,|\,\cdot )\) will be called the likelihood function; similarly, \(f_{\varvec{y}}(y)\) will be referred to as the marginal likelihood. If it holds that \(f_{\varvec{x},\varvec{y}}(x,y) = f_{\varvec{x}}(x)f_{\varvec{y}}(y)\) for all \((x,y) \in \mathsf {X}\times \mathsf {Y}\), then \(\varvec{x}\) and \(\varvec{y}\) are said to be independently described. This form of independence only implies that the information about \(\varvec{x}\) is not related to the information we hold about \(\varvec{y}\).
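On finite sets, marginalisation and conditioning reduce to taking maxima and ratios. The sketch below uses an arbitrary joint possibility function on a \(3 \times 2\) grid; the numerical values are hypothetical and chosen only so that the maximum equals 1.

```python
import numpy as np

# Hypothetical joint possibility function (rows index x, columns index y).
f_xy = np.array([[1.0, 0.2],
                 [0.6, 0.9],
                 [0.1, 0.3]])

f_y = f_xy.max(axis=0)       # marginal of y: supremum over x
f_x = f_xy.max(axis=1)       # marginal of x: supremum over y

# Conditional possibility function of x given each value of y (column-wise ratio).
f_x_given_y = f_xy / f_y

# Each conditional is itself a possibility function: its supremum over x is 1.
column_sups = f_x_given_y.max(axis=0)
```

Note that the supremum replaces the integral of the probabilistic case, so no quadrature weights are involved.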

A notion of expected value can be defined for possibility functions via the corresponding law of large numbers of Houssineau et al. (2019a) as

$$\begin{aligned} \mathbb {E}^*(\varvec{x}) = \hbox {arg sup}_{x \in \mathsf {X}} f_{\varvec{x}}(x). \end{aligned}$$

Another useful notion of expected value, which is the direct analogue of the standard expected value, can be defined for any real-valued function \(\varphi \) on \(\mathsf {X}\) as

$$\begin{aligned} \bar{\mathbb {E}}(\varphi (\varvec{x})) = \sup _{x \in \mathsf {X}} \varphi (x) f_{\varvec{x}}(x). \end{aligned}$$

The scalar \(\bar{\mathbb {E}}(\varphi (\varvec{x}))\) can be interpreted as the maximum expected value of \(\varphi (\varvec{x})\). If the expected value \(\bar{\mathbb {E}}(\varvec{x})\) is interpreted as the first moment, then \(\mathbb {E}^*(\varvec{x})\) can be seen as the argument of the supremum in the zeroth moment \(\bar{\mathbb {E}}(\varvec{x}^0) = \sup _{x \in \mathsf {X}} f_{\varvec{x}}(x) = 1\).

Many concepts and results holding for probability distributions can be used for possibility functions. For instance, if \(\mathsf {Y} = \mathsf {X} = \mathbb {R}\) and if the likelihood function is a normal possibility function, i.e.

$$\begin{aligned} f_{\varvec{y}}(y \,|\,x) = \exp \Big ( -\dfrac{1}{2\sigma ^2}(y - a x)^2 \Big ) \doteq \overline{\mathrm {N}}(y; a x, \sigma ^2) \end{aligned}$$

for some \(a \in \mathbb {R}\) and some \(\sigma > 0\), then one can show that the posterior is also a normal possibility function if the prior \(f_{\varvec{x}}\) is normal. In other words, the concept of conjugate priors makes sense. This result can be extended to the multivariate case, and it has been shown in Houssineau and Bishop (2018) that the posterior expected value and variance of the Kalman filter can be recovered with possibility functions.
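As a sketch of why conjugacy holds in the scalar case, suppose that the prior is \(f_{\varvec{x}}(x) = \overline{\mathrm {N}}(x; \mu , s^2)\), where the symbols \(\mu \) and \(s^2\) are introduced here for illustration only. Completing the square in x gives

$$\begin{aligned} f_{\varvec{y}}(y \,|\,x) f_{\varvec{x}}(x) = \overline{\mathrm {N}}(x; \hat{\mu }, \hat{s}^2)\, \overline{\mathrm {N}}(y; a\mu , \sigma ^2 + a^2 s^2), \end{aligned}$$

with

$$\begin{aligned} \hat{s}^2 = \dfrac{s^2\sigma ^2}{\sigma ^2 + a^2 s^2}, \qquad \hat{\mu } = \mu + \dfrac{a s^2}{\sigma ^2 + a^2 s^2}(y - a\mu ). \end{aligned}$$

Since the supremum over x of the first factor is 1, the marginal likelihood is \(f_{\varvec{y}}(y) = \overline{\mathrm {N}}(y; a\mu , \sigma ^2 + a^2 s^2)\) and the posterior is \(f_{\varvec{x}}(x \,|\,y) = \overline{\mathrm {N}}(x; \hat{\mu }, \hat{s}^2)\), whose parameters coincide with the scalar Kalman update.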

If the objective is to find the subjective probability p(B) of some event \(\varvec{x} \in B\) for some measurable subset B of \(\mathsf {X}\), then the credibility \(\sup _{x \in B} f_{\varvec{x}}(x)\) can be seen as an upper bound for this subjective probability, and we find that

$$\begin{aligned} 1 - \sup _{x \in B^{\mathrm {c}}} f_{\varvec{x}}(x) \le p(B) \le \sup _{x \in B} f_{\varvec{x}}(x). \end{aligned}$$
(1)

Since \(\varvec{x}\) is not random, we emphasise that p(B) is a subjective probability; that is, p(B) only captures a degree of belief for the event \(\varvec{x} \in B\) rather than a probability in the strict sense (e.g. as a limiting frequency for that event). Equation (1) implies that the possibility function \(\varvec{1}\), which is equal to 1 everywhere on \(\mathsf {X}\), is the least informative; this uninformative possibility function is well defined even when \(\mathsf {X}\) is unbounded. Another consequence of (1) is that an alternative possibility function \(f'_{\varvec{x}}\) for \(\varvec{x}\) can be seen as being less informative than \(f_{\varvec{x}}\) if \(f'_{\varvec{x}}(x) \ge f_{\varvec{x}}(x)\) for any \(x \in \mathsf {X}\); indeed, the bounds in (1) will be wider with \(f'_{\varvec{x}}\). This is related to the notion of credal set that is common in possibility theory (Dubois and Prade 2015). It is also possible to interface uncertain variables and random variables in order to introduce more sophisticated representations of uncertainty involving both lack of information and randomness (Houssineau 2018b). However, we will argue that all the elements of the introduced statistical model can be seen as deterministic so that only possibility functions will be used. Our approach has connections with the treatment of single-object dynamical systems based on random set theory, see, e.g. (Mahler 2007) where generalised likelihood functions are suggested. Our contribution is the modelling of the entire multi-object system with possibility functions.
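The bounds in (1) are straightforward to evaluate once the possibility function is available on a set of support points. The following sketch assumes a regular grid over a truncated real line and a normal possibility function centred at 0; the function name `probability_bounds` is hypothetical.

```python
import numpy as np

def probability_bounds(f, in_B):
    """Lower and upper bounds (1) on the subjective probability of B on a grid."""
    upper = f[in_B].max() if in_B.any() else 0.0
    lower = 1.0 - (f[~in_B].max() if (~in_B).any() else 0.0)
    return lower, upper

x = np.linspace(-5.0, 5.0, 1001)     # support points (assumed truncation of R)
f = np.exp(-0.5 * x ** 2)            # normal possibility function centred at 0
in_B = np.abs(x) <= 2.0              # event B = [-2, 2]

lower, upper = probability_bounds(f, in_B)

# With the uninformative possibility function, the bounds are vacuous: [0, 1].
lower_flat, upper_flat = probability_bounds(np.ones_like(x), in_B)
```

The second call illustrates the remark that the possibility function equal to 1 everywhere is the least informative.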

2.2 Multi-object model

We first introduce the assumptions and notations for modelling the way objects appear, behave and disappear in Sect. 2.2.1 before moving on to the considered sensor modelling in Sect. 2.2.2. Most of the assumptions are standard in the field of multi-object estimation.

2.2.1 Object and population dynamics

We consider the case where there is no information about some or all of the components of the state of appearing objects. Typically, there might be no prior information about the position of objects, whereas assumptions can be made about the velocity components. Denoting \(m \in \{1,\dots ,K\}\) the time step at which a given object has appeared, the state at this time step is represented by an uncertain variable \(\varvec{x}_m\) in a space \(\mathsf {X} \subseteq \mathbb {R}^{d_{\mathsf {X}}}\) described by a possibility function \(f_0\). With probabilistic modelling, improper priors might be required in order to model the absence of information about appearing objects; however, the hierarchical nature of multi-object estimation implies that improper priors cannot be used without adding heuristics at the level of data association (Houssineau and Laneuville 2010; Ristic et al. 2012).

We consider that there is a non-negligible heterogeneity between the dynamics of the different objects and that the characteristics of the objects’ motion are not necessarily well known. As a consequence, and recalling that n is the last time step of existence of the object, we model the trajectory of an object as a sequence of uncertain variables \(\{\varvec{x}_k\}_{k=m+1}^n\) on \(\mathsf {X}\) such that, for any \(k \in \{m+1,\dots ,n\}\), \(\varvec{x}_k\) is described by a possibility function \(f_{\varvec{x}_k}(\cdot \,|\,\varvec{x}_m, \dots , \varvec{x}_{k-1})\) satisfying

$$\begin{aligned} f_{\varvec{x}_k}(x_k \,|\,\varvec{x}_m, \dots , \varvec{x}_{k-1}) = g_k(x_k \,|\,\varvec{x}_{k-1}), \qquad x_k \in \mathsf {X}, \end{aligned}$$

for some possibility function \(g_k(\cdot \,|\,\varvec{x}_{k-1})\) on \(\mathsf {X}\). This is an analogue of the Markov property for uncertain variables.

We take into account the fact that objects might completely disappear from the scene before the last time step, in which case we say that the object has “not survived”. This could be seen as a convenient way of dealing with objects that are no longer detectable by the sensor(s). Since the objects’ dynamics are modelled via uncertain variables, we also model object survival as deterministic. The respective credibilities for an object with state \(x \in \mathsf {X}\) to survive or not survive to the next time step are denoted \(\alpha _{\mathrm {s}}(x)\) and \(\alpha _{\mathrm {ns}}(x)\). These credibilities must verify \(\max \{\alpha _{\mathrm {s}}(x), \alpha _{\mathrm {ns}}(x)\} = 1\) for any \(x \in \mathsf {X}\). We consider the case where \(\alpha _{\mathrm {s}} = \varvec{1}\), i.e. \(\alpha _{\mathrm {s}}(x) = 1\) for any \(x \in \mathsf {X}\), since we want to model that objects are unlikely to disappear right after appearing, for which we need to set \(\alpha _{\mathrm {ns}}(x) \ll 1\) for any \(x \in \mathsf {X}\). The subjective probability of survival for an object with state \(x \in \mathsf {X}\) is therefore restricted to the interval \([1-\alpha _{\mathrm {ns}}(x), 1]\).

Given the introduced model and notations, the joint credibility of a trajectory \(x_{m:n} = (x_m,\dots ,x_n) \in \mathsf {X}^{n-m+1}\) and of the corresponding last time of existence \(n \in \{m,\dots ,K\}\) for an object that is known to appear at time step m can be characterised by the possibility function

$$\begin{aligned}&g(x_{m:n},n \,|\,m) = \\&\quad f_0(x_m) \alpha _{\mathrm {ns}}(x_n)^{\mathbb {1}(n < K)} \prod _{k = m+1}^n g_k(x_k \,|\,x_{k-1}), \end{aligned}$$

where \(\mathbb {1}(e)\) equals 1 if e is true and 0 otherwise.
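As an illustration, the expression of \(g(x_{m:n},n \,|\,m)\) can be evaluated as follows for a scalar state. This sketch assumes \(f_0 = \varvec{1}\), a constant \(\alpha _{\mathrm {ns}}\) and a normal possibility function centred at the previous state for \(g_k\); none of these choices, nor the parameter values, are prescribed by the model above.

```python
import math

def normal_possibility(x, mean, var):
    """Normal possibility function: maximum 1 at x = mean."""
    return math.exp(-0.5 * (x - mean) ** 2 / var)

def trajectory_credibility(traj, n, K, alpha_ns=0.05, var=1.0):
    """g(x_{m:n}, n | m) for a scalar state with f_0 = 1 (assumed)."""
    cred = 1.0                                        # f_0(x_m) = 1
    if n < K:
        cred *= alpha_ns                              # object not surviving past n
    for prev, cur in zip(traj, traj[1:]):
        cred *= normal_possibility(cur, prev, var)    # g_k(x_k | x_{k-1})
    return cred

# A static trajectory that survives until K is fully credible under these assumptions.
g_static = trajectory_credibility([0.0, 0.0, 0.0], n=5, K=5)
# The same trajectory ending before K is penalised by alpha_ns.
g_ended = trajectory_credibility([0.0, 0.0, 0.0], n=3, K=5)
```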

There are several possible models for the number of appearing objects per time step. The simplest is to assume that the credibility for an object to appear at time \(k \in \{1,\dots ,K\}\) is \(\alpha _{k,+}\) and that this aspect can be independently described for all objects. The credibility \(f_{k,+}(M)\) for M objects to appear at time step k is then \(f_{k,+}(M) = \alpha _{k,+}^M\). Additional information might, however, be available about appearing objects, such as a maximum number \(M_{k,+}\) at time step k, in which case the credibility for M objects to appear would be \(\mathbb {1}(M \le M_{k,+})\alpha _{k,+}^M\).

2.2.2 Observation

Most sensors acquire information about the objects of interest by measuring some signal over an array of resolution cells. This is the case for cameras, where these resolution cells are pixels, but also for most radars and sonars (Skolnik 1990). Considering for instance the case of a radar measuring range and azimuth, the internal processing of the radar image yields a set of resolution cells where the strength of the signal suggests the presence of an object in the corresponding directions and at the specified distances. In addition, objects are often extended and the signal can originate from different edges and/or surfaces depending on their (unknown) orientations. As a consequence, we model the observation process via uncertain variables and consider the following form for the likelihood function:

$$\begin{aligned} \ell _k(z \,|\,x)&= \exp \Big ( -\dfrac{1}{2}(z - h_k(x))^{\intercal }R_k^{-1}(z - h_k(x)) \Big ) \\&\doteq \overline{\mathrm {N}}(z; h_k(x), R_k), \end{aligned}$$

for any \(z \in \mathsf {Z}\), where \(R_k\) is a \(d_{\mathsf {Z}}\times d_{\mathsf {Z}}\) symmetric positive-definite matrix related to the size and shape of the resolution cells (assumed constant in \(\mathsf {Z}\)). The difference between this normal possibility function and the corresponding normal probability distribution would not matter in a standard single-object tracking scenario since normalising constants would simplify in Bayes’ theorem; however, in multi-object tracking, these constants are important since they appear in the assessment of data associations. The credibility for an object with state \(x \in \mathsf {X}\) to be detected is denoted \(\alpha _{\mathrm {d}}(x)\) and, similarly, the credibility of a detection failure is denoted \(\alpha _{\mathrm {nd}}(x)\). Since it must hold that \(\max \{\alpha _{\mathrm {d}}(x), \alpha _{\mathrm {nd}}(x)\} = 1\) for any \(x \in \mathsf {X}\), we will assume that \(\alpha _{\mathrm {d}} = \varvec{1}\) and \(\alpha _{\mathrm {nd}}(x) \ll 1\) so that it is unlikely for an object to remain undetected. Given a trajectory \(x_{m:n}\) of an object appearing at time step m and disappearing after time step n, we assume that the uncertain variables \(\{\varvec{z}_k\}_{k=m}^n\) modelling the observation in \(\mathsf {Z} \cup \{\phi \}\) are conditionally independent given \(\varvec{x}_{m:n} = x_{m:n}\). It follows that the likelihood function for a path \(o\in \mathcal {O}_K\) is

$$\begin{aligned} \ell (o\,|\,x_{m:n}, m, n) = \prod _{k = m}^n \alpha _{\mathrm {nd}}(x_k)^{\mathbb {1}(o_k = \phi )} \ell _k(o_k \,|\,x_k)^{\mathbb {1}(o_k \ne \phi )}. \end{aligned}$$

Overall, we have the analogue of a hidden Markov model (HMM) for each object where the sequence of uncertain variables \(\{\varvec{x}_k\}_{k=m}^n\) has the Markov property and where observations are conditionally independent given \(\{\varvec{x}_k\}_{k=m}^n\).
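The path likelihood \(\ell (o\,|\,x_{m:n}, m, n)\) can be sketched in the same spirit. The example below assumes scalar states observed directly, i.e. \(h_k(x) = x\), a scalar parameter `r` in place of \(R_k\), a constant \(\alpha _{\mathrm {nd}}\), and `None` as an encoding of \(\phi \); all of these are illustrative choices.

```python
import math

PHI = None  # stand-in for the empty observation (assumed encoding)

def path_likelihood(path, traj, m, alpha_nd=0.1, r=1.0):
    """l(o | x_{m:n}, m, n) for scalar states with h_k(x) = x (assumed)."""
    lik = 1.0
    for k, x_k in enumerate(traj, start=m):     # k runs over m, ..., n
        o_k = path[k - 1]                       # paths are indexed from time step 1
        if o_k is PHI:
            lik *= alpha_nd                     # detection failure at time step k
        else:
            lik *= math.exp(-0.5 * (o_k - x_k) ** 2 / r)
    return lik

# Object existing on time steps 2 to 4 of a K = 4 scenario, missed at time step 3.
lik = path_likelihood((PHI, 1.0, PHI, 3.0), traj=[1.0, 2.0, 3.0], m=2)
```

Entries of the path outside \(\{m,\dots ,n\}\) are not visited, in line with the product running from m to n only.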

The credibility for an observation \(z \in \mathsf {Z}\) at time \(k \in \{1,\dots ,K\}\) to be a false alarm is denoted \(\alpha _{k,\mathrm {fa}}(z)\), which will be assumed to be strictly less than 1; indeed, if there is no information on false alarms, then the data association with highest credibility will be the one defining all observations as false alarms. The credibility for a given finite subset Z of observations in \(\mathsf {Z}\) to be false alarms is then

$$\begin{aligned} f_{k,\mathrm {fa}}(Z) = \prod _{z \in Z} \alpha _{k,\mathrm {fa}}(z). \end{aligned}$$

As a possibility function on sets, \(f_{k,\mathrm {fa}}\) must verify that \(\sup _{Z \subseteq \mathsf {Z}} f_{k,\mathrm {fa}}(Z) = 1\).

2.3 Target possibility function

We now introduce the posterior possibility function on the set \(\mathcal {T}\) describing the unknown set of tracks, based on the model detailed in Sect. 2.2. For this purpose, we consider a track \(t = (o, m, n)\) and start by defining the credibility \(\pi (o,n \,|\,m)\) of the pair \((o,n)\) given the time of appearance \(m \in \{1,\dots ,K\}\) as

$$\begin{aligned} \pi (o,n \,|\,m) = \sup _{x_{m:n} \in \mathsf {X}^{n-m+1}} \ell (o\,|\,x_{m:n}, m, n) g(x_{m:n}, n \,|\,m). \end{aligned}$$
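For intuition, the supremum over trajectories in \(\pi (o,n \,|\,m)\) can be approximated on a grid by a max-product forward recursion, in the manner of the Viterbi algorithm. This is a sketch for a toy scalar model and is not the computation used in this article: it assumes \(f_0 = \varvec{1}\), random-walk dynamics with parameter `q`, direct observation with parameter `r` and constant \(\alpha _{\mathrm {nd}}\) and \(\alpha _{\mathrm {ns}}\).

```python
import numpy as np

PHI = None  # stand-in for the empty observation (assumed encoding)

def path_credibility(path, m, n, K, grid,
                     alpha_nd=0.1, alpha_ns=0.05, q=1.0, r=1.0):
    """Grid approximation of pi(o, n | m) by a max-product forward recursion."""
    def obs_term(o_k):
        if o_k is PHI:
            return np.full(grid.shape, alpha_nd)       # detection failure
        return np.exp(-0.5 * (o_k - grid) ** 2 / r)    # l_k(o_k | x) on the grid

    # trans[i, j] = g_k(grid[j] | grid[i]) for the assumed random-walk dynamics.
    trans = np.exp(-0.5 * (grid[None, :] - grid[:, None]) ** 2 / q)

    v = obs_term(path[m - 1])                          # f_0 = 1 times first term
    for k in range(m + 1, n + 1):
        # Maximise over the previous state, then account for the observation.
        v = (v[:, None] * trans).max(axis=0) * obs_term(path[k - 1])
    return v.max() * (alpha_ns if n < K else 1.0)

grid = np.linspace(-5.0, 5.0, 201)
pi_full = path_credibility((0.0, 0.0, 0.0), m=1, n=3, K=3, grid=grid)
pi_missed = path_credibility((0.0, PHI, 0.0), m=1, n=3, K=3, grid=grid)
```

With normal possibility functions and linear models, the supremum is available in closed form through Kalman-type recursions, so a grid of this kind would only be needed for more general models.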

Other aspects such as false alarms and initial observations must be considered jointly. We denote by \(f_{\mathrm {fa}}\) the function defined on \(\mathcal {A}\) as

$$\begin{aligned} f_{\mathrm {fa}}(A) = \prod _{k = 1}^K f_{k,\mathrm {fa}}(Z_{k,\mathrm {fa}}(A)), \qquad A \in \mathcal {A}, \end{aligned}$$

where \(Z_{k,\mathrm {fa}}(A) = \{ z \in Z_k : \forall o\in A, z \ne o_k \}\) is the set of false alarms induced by A at time step k. We also introduce \(f_+\) as the function on \(\mathcal {T}\) defined as

$$\begin{aligned} f_+(T) = \prod _{k = 1}^K f_{k,+}(M_k(T)) \end{aligned}$$

where \(M_k(T) = \sharp \{ (o,m,n) \in T : m = k \}\) is the number of objects appearing at time step \(k \in \{1,\dots ,K\}\). The functions \(f_{\mathrm {fa}}\) and \(f_+\), defined, respectively, on \(\mathcal {A}\) and \(\mathcal {T}\), are not possibility functions; instead, they are simply the joint credibility for observations that are not in a given element of \(\mathcal {A}\) to be false alarms and for tracks that are in a given element of \(\mathcal {T}\) to have appeared at the indicated time steps. The target possibility function, i.e. the posterior possibility function \(\varPi \) on \(\mathcal {T}\) describing the unknown set of tracks, is then expressed for any \(T \in \mathcal {T}\) as

$$\begin{aligned} \varPi (T) \propto f_{\mathrm {fa}}(\kappa (T)) f_+(T) \prod _{(o,m,n) \in T} \pi (o,n \,|\,m ), \end{aligned}$$
(2)

and is such that \(\max _{T \in \mathcal {T}} \varPi (T) = 1\). The marginal likelihood for the set of paths \(A = \kappa (T)\) is then defined as

$$\begin{aligned} \hat{\varPi }(A) = \max \{ \varPi (T) : T \in \mathcal {T}, \kappa (T) = A \}. \end{aligned}$$
(3)
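The right-hand side of (2) can be assembled as in the following sketch, which assumes constant credibilities \(\alpha _{k,\mathrm {fa}}\) and \(\alpha _{k,+}\), hashable observation labels, and a user-supplied function `path_cred` returning \(\pi (o,n \,|\,m)\); the function names and parameter values are hypothetical.

```python
PHI = None  # stand-in for the empty observation (assumed encoding)

def false_alarms(obs_k, association, k):
    """Z_{k,fa}(A): observations at time step k that belong to no path in A."""
    used = {o[k - 1] for o in association if o[k - 1] is not PHI}
    return [z for z in obs_k if z not in used]

def unnormalised_target(tracks, obs, path_cred, alpha_fa=0.2, alpha_plus=0.5):
    """Right-hand side of (2), up to the normalising constant."""
    association = [o for (o, m, n) in tracks]         # kappa(T)
    value = alpha_plus ** len(tracks)                 # f_+(T) with constant alpha_{k,+}
    for k, obs_k in enumerate(obs, start=1):
        value *= alpha_fa ** len(false_alarms(obs_k, association, k))  # f_{k,fa}
    for (o, m, n) in tracks:
        value *= path_cred(o, m, n)                   # pi(o, n | m)
    return value

# Toy data: two observations at time step 1 and one at time step 2.
obs = [["a1", "a2"], ["b1"]]
one_track = [(("a1", "b1"), 1, 2)]
value_one = unnormalised_target(one_track, obs, lambda o, m, n: 1.0)
value_none = unnormalised_target([], obs, lambda o, m, n: 1.0)
```

Comparing `value_one` with `value_none` shows how explaining observations by a track competes with declaring them all to be false alarms.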

3 MCMC for data association

3.1 Computational aspects of possibility theory

Approximation methods for possibility functions must be devised in order to solve the corresponding inference problems in general. Grid-based methods have the same shortcomings as in the probabilistic case since it is often difficult to anticipate where the posterior possibility function will take non-negligible values. We propose to use MCMC with a suitably designed proposal distribution in order to explore credible data associations and attempt to find the global maximum and/or regions of high credibility. In this case, there is no requirement of targeting a given probability distribution and there is no concern regarding the independence between samples. One of the consequences is that low-discrepancy sequences can be used instead of pseudorandom numbers. The generated chain, say \(\{X_n\}_{n \ge 1}\), will simply be used to approximate the expected value

$$\begin{aligned} \bar{\mathbb {E}}(\varphi (\varvec{x})) \approx \max _{n \ge 1} \varphi (X_n) f_{\varvec{x}}(X_n), \end{aligned}$$

for any real-valued function \(\varphi \) on \(\mathsf {X}\). As opposed to the standard Monte Carlo approximation, the possibility function \(f_{\varvec{x}}\) appears explicitly in the expression of \(\bar{\mathbb {E}}(\varphi (\varvec{x}))\) since the density of samples in a given area conveys no information about \(f_{\varvec{x}}\); instead, the chain \(\{X_n\}_{n \ge 1}\) simply provides support points for the approximation of \(f_{\varvec{x}}\) as a function. Although the empirical distribution of the samples generated by the MCMC chain is not relevant in this context, the use of MCMC remains beneficial since the underlying mechanisms yield an efficient exploration of the set of data associations which would otherwise be challenging. This is an example of synergy between probability theory and possibility theory where the former helps to compute quantities appearing in the latter by leveraging random exploration. Once the most likely data association \(A^*\) has been found, the credibility of any other data association \(A \in \mathcal {A}\) can be computed exactly; this would require summing over all data associations in the probabilistic case, which is not usually feasible. If only a local maximum \(A' \ne A^*\) is found, the credibility of other data associations will be over-estimated, which is conservative according to the interpretation associated with (1) and therefore acceptable in practice.
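The max-based approximation above can be illustrated with a minimal sketch. The function name `max_expectation`, the Gaussian-shaped possibility function and the regular grid standing in for the chain are all assumptions made for illustration; they are not taken from the article.

```python
import math

def max_expectation(samples, phi, f):
    """Approximate the possibilistic expectation by max_n phi(X_n) * f(X_n)."""
    return max(phi(x) * f(x) for x in samples)

# Hypothetical possibility function on the real line, with supremum 1 at x = 1
f = lambda x: math.exp(-0.5 * (x - 1.0) ** 2)
# Indicator of the event {x > 2}: its expectation is the credibility of the event
phi = lambda x: 1.0 if x > 2.0 else 0.0
# Support points; in practice these would be the states visited by the chain
samples = [0.5 * n for n in range(-10, 11)]

credibility = max_expectation(samples, phi, f)
```

Only the locations of the support points matter here, not how often each one is visited, which is why the density of the samples carries no weight in the approximation.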

If only the expected value \(\mathbb {E}^*(\varvec{x})\) of \(\varvec{x}\) is of interest, then the possibility function \(f_{\varvec{x}}^{\gamma }\) for some \(\gamma > 1\) can be used instead. The considered power can also be increased during the execution of the MCMC, leading to simulated annealing. Conversely, if one is interested in identifying the subset of \(\mathsf {X}\) containing at least \(100(1-\alpha )\%\) of the subjective probability mass defined in (1), then areas where \(f_{\varvec{x}}\) has value \(\alpha \) must also be explored, hence justifying the use of a power \(\gamma \) strictly less than 1.

When using the possibility function \(f^{\gamma }_{\varvec{x}}\) in an MCMC algorithm, it is the probability distribution on \(\mathsf {X}\) defined as the renormalised version of \(f^{\gamma }_{\varvec{x}}\) that is targeted (assuming \(f^{\gamma }_{\varvec{x}}\) is integrable). This is not, however, the only possible approach. Indeed, (1) suggests that a possibility function can be seen as inducing an upper bound for probability distributions. It follows that selecting the sampling distribution from the set of upper-bounded probability distributions is also meaningful. A particular choice that is appropriate in many settings is to follow the maximum-entropy principle (Jaynes 1957) and consider the maximum-entropy distribution that is upper bounded by \(f_{\varvec{x}}\) as in (1), as proposed in Houssineau and Ristic (2017). When \(\mathsf {X}\) is discrete, it is possible to further increase the entropy by replacing the set-wise upper bound of (1) by a point-wise upper bound of the form \(p(x) \le f_{\varvec{x}}(x)\), \(x \in \mathsf {X}\), with p a probability mass function on \(\mathsf {X}\). This approach will be particularly useful in the context of multi-object inference since it will lead to an increase of the diversity of explored data associations when compared to sampling from the distribution proportional to \(f_{\varvec{x}}\).
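For a finite set, the point-wise bounded maximum-entropy distribution can be obtained by a water-filling argument: the optimality conditions give \(p(x) = \min \{f_{\varvec{x}}(x), c\}\) with the level c chosen so that p sums to one. The sketch below is one possible way of computing it; the function name and the example values are hypothetical.

```python
def max_entropy_pmf(f, tol=1e-12):
    """Maximum-entropy p.m.f. p with p[i] <= f[i] for all i (water-filling).

    Assumes f is a possibility function on a finite set, so max(f) == 1 and
    hence sum(f) >= 1, which guarantees that a feasible p.m.f. exists.
    """
    vals = sorted(f)
    n = len(vals)
    below = 0.0  # total mass of the entries capped at their own upper bound
    for i, v in enumerate(vals):
        # candidate common level if the n - i largest entries share what is left
        c = (1.0 - below) / (n - i)
        if c <= v + tol:
            return [min(x, c) for x in f]
        below += v
    raise ValueError("sum(f) < 1: no p.m.f. satisfies the bounds")

f = [1.0, 0.5, 0.1, 0.1]           # hypothetical possibility function
p_maxent = max_entropy_pmf(f)       # flattens the largest values to a common level
p_prop = [v / sum(f) for v in f]    # renormalised version, for comparison
```

Compared with the renormalised version, the water-filled distribution spreads the mass more evenly over the most credible elements, which is the source of the additional diversity mentioned above.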

Overall, our approach provides a trade-off between the standard probabilistic modelling where all aspects of the problem must be characterised and model-free methods, see, e.g. (Sgouralis et al. 2017), which rely on minimalistic assumptions about the underlying dynamics. This trade-off can be beneficial in situations where one wants to leverage the available information without describing phenomena such as false alarms which are often challenging to characterise.

3.2 Problem formulation

The objective in the remainder of this section is to design a proposal distribution for identifying the mode of the possibility function \(\hat{\varPi }\) defined in (3) via the Metropolis–Hastings algorithm. We assume for the moment that this proposal distribution is given and express it as a Markov kernel \(\varPhi \) from \(\mathcal {A}\) to itself. A natural starting point for exploring the set \(\mathcal {A}\) is to consider the case where all observations are false alarms, that is, we start from the element \(A = \emptyset \in \mathcal {A}\). We first assume that \(\hat{\varPi }\) can be evaluated everywhere so that, given a previous sample A, a new sample \(A'\) can be obtained from the probability distribution \(\varPhi (\cdot \,|\,A)\) and accepted with probability

$$\begin{aligned} \hat{\alpha }_t(A,A') = \min \bigg (1, \dfrac{\hat{\varPi }(A')^{\rho _t}\varPhi (A \,|\,A')}{\hat{\varPi }(A)^{\rho _t}\varPhi (A' \,|\,A)} \bigg ), \end{aligned}$$
(4)

where t is the current iteration and \(\rho _t\) is the inverse temperature defined by \(\rho _0 = 1\) and \(\rho _t = \rho _{t-1} / (1 - c)\) for some constant c.
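A minimal sketch of the tempered acceptance step (4) is given below. It assumes that \(0< c < 1\), so that the inverse temperature increases with t, and that all the quantities involved are strictly positive; the function names are hypothetical.

```python
import math

def inverse_temperature(t, c):
    """rho_t defined by rho_0 = 1 and rho_t = rho_{t-1} / (1 - c)."""
    return (1.0 - c) ** (-t)

def acceptance_probability(pi_new, pi_old, q_reverse, q_forward, rho):
    """Tempered Metropolis-Hastings acceptance probability as in (4).

    pi_new, pi_old      : possibility function evaluated at A' and A (both > 0)
    q_reverse, q_forward: proposal probabilities Phi(A | A') and Phi(A' | A)
    rho                 : current inverse temperature
    """
    # Work in the log domain: for large rho, pi ** rho underflows quickly
    log_ratio = (rho * (math.log(pi_new) - math.log(pi_old))
                 + math.log(q_reverse) - math.log(q_forward))
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)
```

As \(\rho _t\) grows, moves that decrease the possibility function are accepted less and less often, so the chain gradually concentrates around a mode.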

The main difficulty with the Metropolis–Hastings algorithm in the context of interest is to design a proposal distribution \(\varPhi \) with adequate properties. In particular, there are two issues with this approach which we will aim to solve in the remainder of this section:

  i) The possibility function \(\hat{\varPi }\) on \(\mathcal {A}\) is highly multimodal in general so that moves that are local both in space and time are unlikely to yield a sufficient exploration of the space.

  ii) Implementing moves on entire paths in the set \(\mathcal {O}_K\) would be more global in nature; however, this requires the non-trivial introduction of additional structure on this set.

These two issues will be addressed in Sects. 3.3 and 3.4, respectively. Section 3.5 will then detail the construction of the proposal distribution \(\varPhi \). Extensions of the MCMC algorithm introduced for \(\hat{\varPi }\) to the possibility function \(\varPi \) on \(\mathcal {T}\) will be covered in Sect. 4.

3.3 Approximate multi-object filtering

As is usual with MCMC algorithms, the objective is to design a proposal distribution that explores the set \(\mathcal {A}\) as quickly as possible while maintaining a reasonable probability of acceptance. To this effect, we propose to use a multi-object filter to ensure that any proposed association is meaningful from the viewpoint of the model. The motivation for leveraging the capabilities of an approximate filtering algorithm to solve the corresponding smoothing problem is very similar to the one behind particle MCMC (Andrieu et al. 2010). The corresponding moves that we will construct will be global in the sense that they might affect all time steps but local in the sense that only a restricted number of paths will be (re)assigned. The considered filtering algorithm should have a low complexity in order to limit the computational cost of the overall MCMC algorithm. A possible candidate could therefore be the probability hypothesis density (PHD) filter (Mahler 2003) or its analogue in the context of possibility theory (Houssineau 2018a). However, the PHD filter does not solve the data association problem explicitly and, as a consequence, cannot be formally used to propose paths. Instead, we consider an analogue of the hypothesised filter for independent stochastic populations (Houssineau and Clark 2018), or HISP filter, which is of the same complexity as the PHD filter and which allows for distinguishing objects.

At time step \(k \in \{1,\dots ,K\}\), the HISP filter extends a set of paths \(O_{k-1} \subseteq \mathcal {O}_{k-1} = \bar{Z}_1 \times \dots \times \bar{Z}_{k-1} \setminus \{\phi \}^{k-1}\) with the new observations in \(\bar{Z}_k\) and computes the marginal probability of each association between paths in \(O_{k-1}\) and observations in \(\bar{Z}_k\). The standard version of the algorithm would consider all data associations with non-negligible marginal probability and proceed to the next time step. Instead, we consider a modified version of the algorithm where a single observation in \(\bar{Z}_{k}\) is selected at random for each path in \(O_{k-1}\); this allows for keeping constant the number of considered paths through all time steps. We also use the modelling based on possibility functions introduced in the previous sections instead of the probabilistic modelling considered in Houssineau and Clark (2018). The different steps of this modified HISP filter are given in the following sections.

The context is as follows: denoting A the current state of the MCMC chain, we aim to reassign a subset \(A_{\mathrm {r}}\) of A by using the HISP filter. This implies that the other paths in \(A \setminus A_{\mathrm {r}}\) are not to be modified. However, since a given observation can only appear in a single path, it follows that the observations contained in the paths in \(A \setminus A_{\mathrm {r}}\) cannot be used when reassigning paths in \(A_{\mathrm {r}}\). As a consequence, only a subset of all observations can be used within the HISP filter and this subset is denoted \(Z_k^-\) at time step \(k \in \{1,\dots ,K\}\). We will assume in this section that the credibility \(\alpha _{\mathrm {nd}}\) of detection failure and the credibility \(\alpha _{\mathrm {ns}}\) of non-survival are constant over the state space for the sake of simplicity; as opposed to the probabilistic case, this can be achieved in general by selecting the (constant) credibility of detection failure to be \(\sup _{x \in \mathsf {X}} \alpha _{\mathrm {nd}}(x)\) and similarly for the credibility of non-survival. This operation can be seen as a voluntary loss of information with the purpose of gaining a property of interest: we forgo specific spatial information regarding detection and survival in order to make \(\alpha _{\mathrm {nd}}\) and \(\alpha _{\mathrm {ns}}\) constant and allow a Gaussian implementation to be used.

3.3.1 Initialisation

We assume that, when reassigning the set \(A_{\mathrm {r}}\) of paths, the MCMC algorithm specifies when and at which observations the new paths should start. The corresponding set is denoted \(Z_{\mathrm {c}} = \{(z^i_{\mathrm {c}},k^i_{\mathrm {c}})\}_{i=1}^{N_{\mathrm {c}}}\) with, for any \(i \in \{1,\dots ,N_{\mathrm {c}}\}\), \(z^i_{\mathrm {c}}\) the observation where one of the new paths should start and with \(k^i_{\mathrm {c}}\) the corresponding time step. These observations might or might not be at the same time step but the pairs \((z^i_{\mathrm {c}},k^i_{\mathrm {c}})\), \(i \in \{1,\dots ,N_{\mathrm {c}}\}\), are assumed to be different from each other. A path will be initialised every time one of these observations is encountered in the sets of observations \(Z_1^-,\dots ,Z_K^-\).

3.3.2 Prediction

We denote by \(O_{k-1}\) the set of paths at time \(k-1\), that is, the subset of \(\mathcal {O}_{k-1}\) composed of paths that have been selected so far as potential sequences of object-originated observations. To each path \(o\in O_{k-1}\) corresponds a possibility function \(f_{k-1}(\cdot \,|\,o)\) on the state space \(\mathsf {X}\). Recalling that \(g_k\) is the Markov transition from \(\mathsf {X}\) to itself describing the objects’ dynamics, we obtain the predicted possibility function

$$\begin{aligned} f_{k|k-1}(x \,|\,o) = \sup _{x' \in \mathsf {X}} g_k( x \,|\,x' ) f_{k-1}(x' \,|\,o), \qquad x \in \mathsf {X}. \end{aligned}$$

Such a prediction only considers the event where the object survives to the kth time step although it is possible for objects to disappear. We postpone considerations of this aspect of the prediction to a further stage in the algorithm.
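When both the transition and the current possibility function are Gaussian possibility functions, the supremum in the prediction can be computed in closed form and yields the same mean and covariance recursion as the Kalman prediction. The sketch below assumes a linear transition \(g_k(x \,|\,x') = \overline{\mathrm {N}}(x; Fx', Q)\) and takes \(\overline{\mathrm {N}}(x; m, \varSigma ) = \exp (-\frac{1}{2}(x-m)^{\intercal }\varSigma ^{-1}(x-m))\); the function names are hypothetical.

```python
import numpy as np

def gaussian_possibility(x, mean, cov):
    """Gaussian possibility function: exp(-0.5 * squared Mahalanobis distance)."""
    d = np.asarray(x, float) - np.asarray(mean, float)
    return float(np.exp(-0.5 * d @ np.linalg.solve(cov, d)))

def predict(mean, cov, F, Q):
    """Mean and covariance of sup_{x'} N-bar(x; F x', Q) * N-bar(x'; mean, cov)."""
    return F @ mean, F @ cov @ F.T + Q

# Hypothetical constant-velocity model with state (position, velocity)
F = np.array([[1.0, 1.0], [0.0, 1.0]])
Q = 0.01 * np.eye(2)
mean_pred, cov_pred = predict(np.array([0.0, 1.0]), np.diag([0.1, 1.0]), F, Q)
```

The supremum over \(x'\) plays the role of the integral in the probabilistic prediction, and the predicted function keeps a supremum equal to 1.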

3.3.3 Update

At time step k, the set of observations \(Z_k^-\) is available to update the existing paths. For any path \(o\) in the set \(O_{k-1}\) of previously selected paths and for any new observation \(z \in Z_k^- \cup \{\phi \}\), the posterior possibility function associated with the extended path \(o:z\), with “ : ” denoting concatenation, is defined as

$$\begin{aligned} f_k(x \,|\,o:z) = \left\{ \begin{array}{l@{\quad }l} \dfrac{\ell _k(z \,|\,x) f_{k|k-1}(x \,|\,o)}{\sup _{x' \in \mathsf {X}} \ell _k(z \,|\,x') f_{k|k-1}(x' \,|\,o)} &{} \hbox {if }z \in Z^-_k \\ f_{k|k-1}(x \,|\,o) &{} \hbox {if }z = \phi . \end{array}\right. \end{aligned}$$

We can then select which observation in \(Z_k^- \cup \{\phi \}\) will be used to propagate the path based on the credibility of the corresponding association. However, before expressing the latter, we first have to introduce the prior credibility of presence, which depends on the consecutive number of time steps for which the path under consideration has not been detected. Indeed, there is some remaining ambiguity whenever the empty observation \(\phi \) is selected for a path since it is unclear in this case whether the detection has failed for the corresponding object or the object has not survived the last prediction step. We purposefully maintain this ambiguity and postpone the decision in order to better estimate which of these two events has occurred. Indeed, the credibility of non-survival is most often much lower than the credibility of detection failure, e.g. \(\alpha _{\mathrm {nd}} = 0.1\) and \(\alpha _{\mathrm {ns}} = 0.001\), so that terminating a track after a single detection failure is unlikely. Yet, if detection failures keep occurring for l time steps, then the credibility of the corresponding events, i.e. \(\alpha _{\mathrm {nd}}^l\) for the case where the object remains and \(\alpha _{\mathrm {ns}}\) for the case where the object has disappeared, will rapidly favour a disappearance as opposed to a sequence of detection failures. For any path \(o\in O_{k-1}\), we denote by \(l_{o}\) the number of consecutive time steps for which \(\phi \) has been selected, e.g. if \(o\) is of the form \((o_1,\dots ,o_{k-3},\phi ,\phi )\) with \(o_{k-3} \ne \phi \) then \(l_{o} = 2\). We then compute the credibility that the corresponding object has survived/not survived since the last detection as

$$\begin{aligned} \hat{\alpha }_{\mathrm {s}}(o) = \dfrac{\alpha _{\mathrm {nd}}^{l_{o}}}{ \alpha _{\mathrm {ns}} \vee \alpha _{\mathrm {nd}}^{l_{o}}}, \qquad \hat{\alpha }_{\mathrm {ns}}(o) = \dfrac{\alpha _{\mathrm {ns}}}{ \alpha _{\mathrm {ns}} \vee \alpha _{\mathrm {nd}}^{l_{o}}}, \end{aligned}$$

with \(a \vee b \doteq \max \{a,b\}\) for any \(a,b \in \mathbb {R}\). The binary operator \(\vee \) is assumed to have lower precedence than multiplication, i.e. \(a \vee bc = a \vee (bc)\) for any \(a,b,c \in \mathbb {R}\).
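The behaviour of \(\hat{\alpha }_{\mathrm {s}}\) and \(\hat{\alpha }_{\mathrm {ns}}\) as the number of consecutive empty observations grows can be checked with a short sketch; the function name is hypothetical and the numerical values are the ones quoted above.

```python
def survival_credibilities(alpha_nd, alpha_ns, l):
    """Return (alpha_s_hat, alpha_ns_hat) after l consecutive empty observations."""
    stay = alpha_nd ** l          # object still present but missed l times in a row
    norm = max(alpha_ns, stay)    # the binary operator "v" of the text
    return stay / norm, alpha_ns / norm

# Credibilities for the values alpha_nd = 0.1 and alpha_ns = 0.001 used in the text
table = {l: survival_credibilities(0.1, 0.001, l) for l in range(6)}
```

With these values, survival remains fully credible up to three consecutive detection failures, at which point the two events are essentially on par, and disappearance becomes the more credible explanation from the fourth failure onwards.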

We can now express the marginal credibility of association on \(Z_k^- \cup \{\phi \}\) for the path \(o\in O_{k-1}\) as

$$\begin{aligned} \gamma _k(z \,|\,o) \propto \varGamma _k\big (Z^-_k \setminus \{z\} \,|\,O_{k-1} \setminus o\big ) L_k(z \,|\,o) \end{aligned}$$

for any observation \(z \in Z^-_k \cup \{\phi \}\), with

$$\begin{aligned} L_k(z \,|\,o) = \left\{ \begin{array}{l@{\quad }l} \displaystyle \hat{\alpha }_{\mathrm {s}}(o) \sup _{x \in \mathsf {X}} \ell _k(z \,|\,x) f_{k|k-1}(x \,|\,o) &{} \hbox {if }z \in Z^-_k \\ \hat{\alpha }_{\mathrm {ns}}(o) \vee \hat{\alpha }_{\mathrm {s}}(o)\big ( \alpha _{\mathrm {ns}} \vee \alpha _{\mathrm {nd}} \big ) &{} \hbox {otherwise} \end{array}\right. \end{aligned}$$

the marginal likelihood for the observation z and with \(\varGamma _k(Z \,|\,O)\) the credibility for paths in the set \(O \subseteq O_{k-1}\) to be associated with observations in the set \(Z \subseteq Z^-_k\), which can be expressed as

$$\begin{aligned} \varGamma _k(Z \,|\,O) = \max _{\sigma : O \rightarrow Z \cup \{\phi \}} f_{\mathrm {fa}}(Z \setminus \sigma (O)) \prod _{o\in O} L_k(\sigma (o) \,|\,o), \end{aligned}$$

where the maximum is over all mappings \(\sigma \) from O to \(Z \cup \{\phi \}\) that are injective on Z and where \(\sigma (O)\) is the image of O by \(\sigma \), i.e. \(\sigma (O) = \{ \sigma (o) : o\in O \}\). Although the number of simultaneously reassigned paths will be limited in the context of interest, the number of observations in Z can be extremely large so that the computation of \(\varGamma _k(Z \,|\,O)\) can be challenging. Yet, it is possible to rewrite this term by assuming that any two paths in O are unlikely to obtain large marginal likelihoods from a single observation in Z, that is, for any \(o, o' \in O\) such that \(o\ne o'\) and any \(z \in Z\), there exists \(z' \in Z\) such that

$$\begin{aligned} L_k(z \,|\,o) L_k(z \,|\,o') \le L_k(z \,|\,o) L_k(z' \,|\,o'). \end{aligned}$$
(5)

In the probabilistic version of this assumption (Houssineau and Clark 2018), the left-hand side needs to be equal to 0, which is more constraining. It follows that \(\varGamma _k(Z \,|\,O)\) can be expressed as

$$\begin{aligned} \varGamma _k(Z \,|\,O) = f_{\mathrm {fa}}(Z) \prod _{o\in O} \bigg [ L_k(\phi \,|\,o) \vee \max _{z \in Z} \dfrac{L_k(z \,|\,o)}{\alpha _{\mathrm {fa}}(z)} \bigg ]. \end{aligned}$$

This result can be proved easily by developing the product in the approximated expression and removing the terms where a single observation is associated with several tracks. Using this expression, all the terms \(\varGamma _k(Z^-_k \setminus \{z\} \,|\,O_{k-1} \setminus o)\), for any \(z \in Z^-_k \cup \{\phi \}\) and any \(o\in O_{k-1}\), can be calculated with a computational complexity of order \(|O_{k-1}||Z^-_k|\). The approach is similar to the one detailed in Houssineau and Clark (2018) for the probabilistic case.
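The factorised expression can be compared with the exhaustive maximum on a small example. The sketch below assumes that the credibility of false alarms factorises as \(f_{\mathrm {fa}}(Z) = \prod _{z \in Z} \alpha _{\mathrm {fa}}(z)\), which is suggested by the division by \(\alpha _{\mathrm {fa}}(z)\) but is an assumption here; the function names and the numerical values are hypothetical.

```python
from itertools import product

def gamma_factorised(L, L_empty, alpha_fa):
    """Factorised form of Gamma_k(Z | O).

    L[i][j]    : marginal likelihood L_k(z_j | o_i)
    L_empty[i] : marginal likelihood L_k(phi | o_i)
    alpha_fa[j]: credibility for z_j to be a false alarm (assumed factorised f_fa)
    """
    value = 1.0
    for a in alpha_fa:
        value *= a  # f_fa(Z): every observation treated as a false alarm
    for i, row in enumerate(L):
        best = max((l / a for l, a in zip(row, alpha_fa)), default=0.0)
        value *= max(L_empty[i], best)
    return value

def gamma_exhaustive(L, L_empty, alpha_fa):
    """Brute-force maximum over assignments in which no observation is shared."""
    n_obs = len(alpha_fa)
    best = 0.0
    for sigma in product(range(-1, n_obs), repeat=len(L)):  # -1 encodes phi
        used = [j for j in sigma if j >= 0]
        if len(used) != len(set(used)):
            continue  # an observation would be assigned to two paths
        value = 1.0
        for j, a in enumerate(alpha_fa):
            if j not in used:
                value *= a  # unassigned observations are false alarms
        for i, j in enumerate(sigma):
            value *= L_empty[i] if j < 0 else L[i][j]
        best = max(best, value)
    return best

# Two well-separated paths and three observations
L = [[0.9, 0.05, 0.01], [0.02, 0.8, 0.03]]
L_empty, alpha_fa = [0.1, 0.1], [0.2, 0.2, 0.2]
approx = gamma_factorised(L, L_empty, alpha_fa)
exact = gamma_exhaustive(L, L_empty, alpha_fa)
```

When each path has a distinct best observation, as in this example, the two values coincide; when two paths compete for the same observation, the factorised form can only over-estimate the exhaustive maximum.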

We then randomly select an observation in \(Z_k^- \cup \{\phi \}\) for each of the paths in \(O_{k-1}\) by sampling from the maximum entropy distribution induced by the marginal credibility of association \(\gamma _k(\cdot \,|\,o)\). There are two ways of enforcing the modelling assumption that paths cannot contain the same observation:

  i) Use a rejection sampling strategy at the level of the HISP algorithm to ensure that only acceptable data associations are proposed and

  ii) Completely reject the proposed data association at the level of the MCMC algorithm if it contains overlapping paths.

The main drawback with the first option is that calculating the probability of proposing a given acceptable data association is combinatorial in nature and becomes a computational bottleneck when the number of observations is large. We therefore consider the second option.

Finally, we initialise a new path for any \((z^i_{\mathrm {c}}, k^i_{\mathrm {c}})\), \(i \in \{1,\dots ,N_{\mathrm {c}}\}\), such that \(k^i_{\mathrm {c}} = k\). This path is of the form \(o= (\phi ,\dots ,\phi ,z^i_{\mathrm {c}})\).

At the last time step, a decision is taken for all observation paths, even the ones ending with empty observations, and a set \(A_{\mathrm {c}}\) is defined as the set of all created paths. The conditional probability for generating the set of paths \(A_{\mathrm {c}}\) given the initial observations \(Z_{\mathrm {c}}\) and the available observations \(Z^-_1, \dots , Z^-_K\) is denoted \(P_{\mathrm {c}}(A_{\mathrm {c}} \,|\,Z_{\mathrm {c}}, Z^-_{1:K})\).

3.4 Structure on the set of paths

In order to help explore the set of data associations \(\mathcal {A}\), it is useful to equip the underlying set of paths \(\mathcal {O}_K\) with additional structure; in particular, the objective is to identify which pairs of points in the observation space \(\mathsf {Z}\) are likely to be consecutive observations of the same object. The only natural structure on \(\mathcal {O}_K\) is the one inherited from the fact that the set \(\mathsf {Z}\) is a subset of a Euclidean space. This is not, however, sufficient since simply measuring the distance between two observations \(z_k\) and \(z'_{k'}\) at two different time steps \(k \ne k'\) as \(\Vert z_k - z'_{k'} \Vert \), with \(\Vert \cdot \Vert \) the Euclidean norm, does not take into account the structure of the problem. Moreover, the notion of distance is very model dependent and what is considered as “close” or “far” would need to be adjusted for each scenario. For instance, depending on how large the speed of objects is likely to be, observations at two consecutive time steps that are 100 metres apart might be very likely or very unlikely to originate from the same object; similarly, when the credibility of detection failure increases, the distance between observations should also decrease as objects are more likely to be undetected for several time steps and therefore move further before being detected again. One partial solution would be to consider the Mahalanobis distance which can be seen as a rescaling of the Euclidean distance based on a covariance matrix; we aim to go further and use all the aspects of the objects’ dynamical and observation model as a reference and relate observations via the credibility for these observations to be generated by the same object. These observations can be seen as consistent if that credibility is close to 1 and inconsistent if it is close to 0.
In order to simplify the calculations, we assume the existence of an upper bounding function g for the Markov transition \(g_k\) such that \(g_k(x \,|\,x') \le g(x \,|\,x')\), for any \(x,x' \in \mathsf {X}\) and for any \(k \in \{1,\dots ,K\}\), with g of the form

$$\begin{aligned} g( x \,|\,x') = \overline{\mathrm {N}}(x; Fx', Q), \qquad x,x' \in \mathsf {X}, \end{aligned}$$

for some \(d_{\mathsf {X}} \times d_{\mathsf {X}}\) matrices F and Q. We also introduce a lower bound \(a_{\mathrm {nd}}\) for the credibility of non-detection, i.e. \(a_{\mathrm {nd}}\) is such that \(\alpha _{\mathrm {nd}}(x) \ge a_{\mathrm {nd}}\) for any \(x \in \mathsf {X}\).

We consider two time steps \(k,k' \in \{1,\dots ,K\}\) such that \(k < k'\) as well as two observations z and \(z'\) at time steps k and \(k'\), respectively, and introduce \(f_{k'|k}(z' \,|\,z)\) as the possibility for an object initialised from z at time step k to be next observed at time step \(k'\) at \(z'\), that is,

$$\begin{aligned} f_{k'|k}(z' \,|\,z) = a_{\mathrm {nd}}^{l-1} \sup _{x,x' \in \mathsf {X}} \ell _{k'}(z' \,|\,x') g^l(x' \,|\,x) f_k(x \,|\,z) \end{aligned}$$

where \(l = k' - k\), where \(g^l\) is the l-fold convolution of the transition g, that is, for any \(x_k, x_{k'} \in \mathsf {X}\),

$$\begin{aligned} g^l(x_{k'} \,|\,x_k) = \sup _{x_{k+1},\dots ,x_{k'-1} \in \mathsf {X}} g(x_{k'} \,|\,x_{k'-1}) \dots g(x_{k+1} \,|\,x_k) \end{aligned}$$

and where \(f_k(\cdot \,|\,z)\) is the posterior possibility function defined as

$$\begin{aligned} f_k(x \,|\,z) = \dfrac{\ell _k(z \,|\,x) f_0(x)}{\sup _{x' \in \mathsf {X}} \ell _k(z \,|\,x') f_0(x')}, \qquad x \in \mathsf {X}. \end{aligned}$$

The possibility function \(g^l(\cdot \,|\,x_k)\) is an upper bound for the convolution of the Markov transitions \(g_{k+1}, \dots , g_{k'}\). Assuming that \(f_k(\cdot \,|\,z) = \overline{\mathrm {N}}(m_z, \varSigma _0)\) and denoting by \(\varSigma _l\) the covariance matrix after l predictions, e.g. \(\varSigma _1 = F\varSigma _0F^{\intercal } + Q\), then the possibility function \(f_{k'|k}(\cdot \,|\,z)\) can be written

$$\begin{aligned}&f_{k'|k}(z' \,|\,z) =a_{\mathrm {nd}}^{l-1} \overline{\mathrm {N}}( z'; H_{k',z,l} F^l m_z, H_{k',z,l}\varSigma _l H_{k',z,l}^{\intercal } + R_{k'}) \end{aligned}$$

where \( H_{k',z,l}\) is the Jacobian of \(h_{k'}\) at the point \(F^l m_z\). The function \(f_{k'|k}\) can be easily extended to \(\mathsf {Z} \cup \{\phi \}\) by defining \(f_{k'|k}(\cdot \,|\,\phi ) = f_{k'|k}(\phi \,|\,\cdot ) = 0\).
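The closed-form expression for \(f_{k'|k}(z' \,|\,z)\) can be sketched as follows. For simplicity, the sketch assumes a linear observation function, so that the Jacobian \(H_{k',z,l}\) reduces to a constant matrix H, a constant observation covariance R, and \(\overline{\mathrm {N}}(x; m, \varSigma ) = \exp (-\frac{1}{2}(x-m)^{\intercal }\varSigma ^{-1}(x-m))\); the function name and the numerical values are hypothetical.

```python
import numpy as np

def consistency(z_next, m_z, Sigma0, lag, F, Q, H, R, a_nd):
    """f_{k'|k}(z' | z) for a linear observation model z = H x + noise.

    m_z, Sigma0: mean and covariance of f_k(. | z); lag = k' - k >= 1.
    """
    mean, cov = np.asarray(m_z, float), np.asarray(Sigma0, float)
    for _ in range(lag):                      # l successive predictions
        mean, cov = F @ mean, F @ cov @ F.T + Q
    S = H @ cov @ H.T + R                     # covariance in the observation space
    d = np.asarray(z_next, float) - H @ mean
    # l - 1 detection failures occur between the two observations
    return a_nd ** (lag - 1) * float(np.exp(-0.5 * d @ np.linalg.solve(S, d)))

# Hypothetical constant-velocity model with position-only observations
F = np.array([[1.0, 1.0], [0.0, 1.0]])
Q = 0.01 * np.eye(2)
H = np.array([[1.0, 0.0]])
R = np.array([[0.1]])
m_z, Sigma0 = [0.0, 1.0], np.diag([0.1, 1.0])
on_track = consistency([2.0], m_z, Sigma0, 2, F, Q, H, R, a_nd=0.3)
off_track = consistency([9.0], m_z, Sigma0, 2, F, Q, H, R, a_nd=0.3)
```

An observation located exactly at the predicted position two time steps later receives the credibility of the single intermediate detection failure, whereas an observation far from the prediction receives a much smaller value.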

Example 1

To illustrate the use of the notion of consistency provided by \(f_{k'|k}\), a simple scenario consisting of five objects is considered as in Fig. 1a. For each observation \(z \in Z_k\) at some time step \(k \in \{1,\dots ,K\}\), we compute a marginal credibility for z as

$$\begin{aligned} \hat{f}_k(z) = \max _{k' : k' > k} \Big ( \max _{z' \in Z_{k'}} f_{k'|k}(z' \,|\,z) \Big ). \end{aligned}$$
(6)

The scalar \(\hat{f}_k(z)\) can be interpreted as the credibility for z to be followed by another observation in \(Z_{k'}\) for some \(k' > k\). When creating a new track, we can then define the probability for selecting z as the first observation of the new track as a function of \(\hat{f}_k(z)\). A scatter plot displaying these credibilities for all observations is shown in Fig. 1b. As mentioned in the caption of Fig. 1, the superposition of observations at all times means that the spatio-temporal aspect of the data is not represented; in this sense, Fig. 1b better captures the complexity of the problem by showing how “close” observations originating from a given target are and how “far” false alarms are, which are two important factors in the assessment of the complexity of a given scenario. The probabilistic analogue of the scalar \(\hat{f}_k(z)\) would penalise detection failures more strongly since the fitness of the model would decrease quickly with the lag \(k'-k\), especially in the presence of a large dynamical noise; this means that our approach leads to a higher probability of proposing the initialisation of a new path for objects with challenging detection profiles, e.g. when there are gaps between most observations.

Fig. 1 Scenario with 5 objects as represented in (a), with the corresponding scatter plot of the probability for each observation to be selected in (b). Observations at all time steps are superimposed in these figures, which therefore do not represent the spatio-temporal information that is crucial in solving the corresponding data association problem. Although the trajectories of two objects cross, the objects do not reach the crossing point at the same time

The advantage of relating observations in this way is that it can be easily extended to paths. Indeed, we can define the consistency between an observation \(z \in Z_k\) at some given time k and a path \(o\) in \(\mathcal {O}_K\) as

$$\begin{aligned}&\hat{f}_k(z, o) = \max \Big \{ \max _{k' : k' < k} f_{k|k'}( z \,|\,o_{k'}),\, \max _{k' : k' > k} f_{k'|k}( o_{k'} \,|\,z) \Big \}, \end{aligned}$$
(7)

where the initial observation is either z or one of the observations in \(o\). Similarly, the consistency between two paths \(o\) and \(o'\) in \(\mathcal {O}_K\) is defined as

$$\begin{aligned}&\hat{f}(o, o') =\max \Big \{ \max _{k,k' : k' < k} f_{k|k'}( o_k \,|\,o'_{k'}),\, \max _{k,k' : k' > k} f_{k'|k}( o'_{k'} \,|\,o_k) \Big \}. \end{aligned}$$

We can now propose to modify a given data association by changing nearby paths, and therefore focus the computational power on moves that are likely to be accepted. Although the approach considered here is not standard, it has two main advantages: it relates observations together and applies to nonlinear cases as long as a Gaussian upper-bounding function can be found.

In practice, it might be necessary to reduce the time required for computing \(f_{k'|k}(z' \,|\,z)\) between any pair \((z,z')\) of observations, especially if the scenario runs over many time steps or if the number of observations at every time step is large. In that case, one can define a threshold \(\tau '\) such that if \(a_{\mathrm {nd}}^l < \tau '\) then any observations that are l time steps apart will be arbitrarily assigned a credibility of 0.
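The effect of the threshold can be sketched by combining it with the marginal credibility (6): only lags l for which \(a_{\mathrm {nd}}^l \ge \tau '\) need to be visited. The function names, the dictionary-based storage of the observations and the user-supplied function `pair_credibility` are assumptions made for illustration.

```python
import math

def max_lag(a_nd, tau):
    """Largest lag l such that a_nd ** l >= tau (requires 0 < a_nd < 1 and tau > 0)."""
    if not (0.0 < a_nd < 1.0 and tau > 0.0):
        raise ValueError("expected 0 < a_nd < 1 and tau > 0")
    l = 0
    while a_nd ** (l + 1) >= tau:
        l += 1
    return l

def marginal_credibility(k, z, observations, pair_credibility, a_nd, tau):
    """Marginal credibility (6) restricted to the lags that pass the threshold.

    observations    : dict mapping a time step to the list of observations at that step
    pair_credibility: function (k, z, k2, z2) -> f_{k2|k}(z2 | z), supplied by the user
    """
    best = 0.0
    for lag in range(1, max_lag(a_nd, tau) + 1):
        for z2 in observations.get(k + lag, []):
            best = max(best, pair_credibility(k, z, k + lag, z2))
    return best

# Hypothetical scalar example: objects move by about one unit per time step
a_nd = 0.5
pair = lambda k, z, k2, z2: a_nd ** (k2 - k - 1) * math.exp(-0.5 * (z2 - z - (k2 - k)) ** 2)
obs = {1: [0.0], 2: [5.0], 3: [2.0]}
loose = marginal_credibility(1, 0.0, obs, pair, a_nd, tau=0.1)
strict = marginal_credibility(1, 0.0, obs, pair, a_nd, tau=0.4)
```

With the looser threshold, the observation at time step 3 is still considered and dominates the maximum; with the stricter one, only the next time step is visited, which reduces the cost at the price of ignoring longer gaps.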

3.5 Design of the proposal distribution

When designing a proposal distribution \(\varPhi \) for our MCMC algorithm, several requirements need to be considered: it should be possible to

  i) Reassign several paths simultaneously in order to address crossings and track fragmentation,

  ii) Reassign both the initial observation of a path and subsequent observations and

  iii) Create a new path.

Requirement i) can be easily fulfilled by using the approach presented in Sect. 3.3; however, instead of simply choosing the paths at random, it is more efficient to focus on nearby paths. In order to simultaneously reassign the initial observations of a given set of paths \(A_{\mathrm {r}}\) as needed in Requirement ii), we consider the notion of consistency defined in (7). Once a new initial observation has been selected, the approach of Sect. 3.3 can be used to reassign the rest of the chosen path. Finally, the marginal consistency defined in (6) can be used for Requirement iii) in order to identify observations that are likely to be initial observations.

The general objective is to find a proposal distribution \(\varPhi \) that is as simple as possible and such that the associated MCMC kernel is irreducible and reversible. Starting from a given set of paths A of size \(s = |A|\), we suggest proceeding as follows:

  1) Sample a number \(N_{\mathrm {r}}\) of paths to reassign from a probability mass function (p.m.f.) \(p_{\mathrm {r}}(\cdot \,|\,s)\) such that \(N_{\mathrm {r}} \le s\) almost surely (a.s.), e.g. a truncated Poisson distribution. Then, sample the number \(N_{\mathrm {c}}\) of paths to be created from the p.m.f. \(p_{\mathrm {c}}(\cdot \,|\,N_{\mathrm {r}})\) on the set of non-negative integers \(\mathbb {N}\) defined as

    $$\begin{aligned} p_{\mathrm {c}}(n \,|\,N_{\mathrm {r}}) = \left\{ \begin{array}{l@{\quad }l} \delta _1(n) &{} \hbox {if }N_{\mathrm {r}} = 0 \\ \tilde{p}_{\mathrm {c}}(n - N_{\mathrm {r}}) &{} \hbox {otherwise}, \end{array}\right. \end{aligned}$$

    with \(\tilde{p}_{\mathrm {c}}\) a p.m.f. on \(\{-1,0,1\}\) to be defined. With this model, there will be one created path a.s. when none are reassigned (there is limited interest in creating several paths at once in this case) and the number of paths will be increased by one, kept constant or decreased by one in case of reassignment. Reducing the number of paths by one will address the issue of track fragmentation, keeping the number of paths constant is appropriate when considering objects with crossing trajectories, and leaving the possibility of increasing the number of paths is required to ensure reversibility. Indeed, when evaluating the probability of the reverse proposal, created paths will become reassigned paths and vice versa.

  2) If \(N_{\mathrm {r}} = 0\), then define \(A_{\mathrm {r}} = \emptyset \) and proceed to the next step; otherwise, select the set \(A_{\mathrm {r}} = \{o_i\}_{i=1}^{N_{\mathrm {r}}}\) of paths to be reassigned as follows: the first path \(o_1\) is picked uniformly at random from the set of paths A and then the \(N_{\mathrm {r}}-1\) remaining paths, if any, are selected based on their distance to \(o_1\):

    $$\begin{aligned} o_i \sim \mathcal {P}_{o_{1:i-1}}\big ( \hat{f}(o_1,\cdot ) \big ) \end{aligned}$$

    for any \(1 < i \le N_{\mathrm {r}}\), where \(\mathcal {P}_{o_{1:i-1}}(\cdot )\) is a function transforming possibility functions into probability distributions, e.g. the maximum-entropy distribution upper-bounded point-wise by \(\hat{f}(o_1,\cdot )\), which we assume to verify

    $$\begin{aligned} \sum _{o\in A} \mathcal {P}_{o_{1:i-1}}\big ( \hat{f}(o_1,\cdot )\big )(o) = 1 \end{aligned}$$

    and \(\mathcal {P}_{o_{1:i-1}}\big (\hat{f}(o_1,\cdot )\big )(o_j) = 0\) for all \(j \in \{1,\dots ,i-1\}\). Therefore, the set of paths \(A_{\mathrm {r}}\) is sampled without replacement from the set A. When evaluating the probability \(P_{\mathrm {r}}(A_{\mathrm {r}} \,|\,N_{\mathrm {r}}, A)\) for sampling the subset \(A_{\mathrm {r}}\) of A, all possible ways of obtaining such a subset must be taken into account, that is,

    $$\begin{aligned}&P_{\mathrm {r}}(A_{\mathrm {r}} \,|\,N_{\mathrm {r}}, A) = \left\{ \begin{array}{l@{\quad }l} s^{-1} \sum _{\sigma \in S_{N_{\mathrm {r}}}} \prod _{i=2}^{N_{\mathrm {r}}} \mathcal {P}_{o_{\sigma (1):\sigma (i-1)}}\big (\hat{f}(o_{\sigma (1)},\cdot )\big )(o_{\sigma (i)}) &{} \hbox {if}\,N_{\mathrm {r}} > 0 \\ \delta _{\emptyset }(A_{\mathrm {r}}) &{} \hbox {otherwise}, \end{array}\right. \end{aligned}$$

    where \(S_n\) is the set of permutations of \(\{1,\dots ,n\}\). Although the computational complexity for this term is combinatorial, \(N_{\mathrm {r}}\) is usually small so the actual computational time is limited.

  3. 3)

    If \(N_{\mathrm {c}} = 0\) then define \(Z_{\mathrm {c}} = \emptyset \) and proceed to the next step; otherwise, select the \(N_{\mathrm {c}}\) initial observations \(Z_{\mathrm {c}} = \{(z^i_{\mathrm {c}},k^i_{\mathrm {c}})\}_{i=1}^{N_{\mathrm {c}}}\) from the set \(\bigcup _{k=1}^K \{(z,k) : z \in Z^-_k\}\) of available observations, with \(Z^-_k\) defined for any \(k \in \{1,\dots ,K\}\) as

    $$\begin{aligned} Z^-_k = \big \{ z \in Z_k: \forall (o,k_-) \in A \setminus A_{\mathrm {r}}, z \ne o_k \big \}. \end{aligned}$$

    The selection of the initial observations is performed without replacement as

    $$\begin{aligned} \hat{z}^i_{\mathrm {c}} \sim \mathcal {P}_{(\hat{z}^1_{\mathrm {c}},\dots ,\hat{z}^{i-1}_{\mathrm {c}})}\big (\hat{f}^{N_{\mathrm {r}}}_{\mathrm {c}}\big ) \end{aligned}$$

    where, for any \(j \in \{1,\dots ,N_{\mathrm {c}}\}\), \(\hat{z}^j_{\mathrm {c}}\) stands for the pair \((z^j_{\mathrm {c}}, k^j_{\mathrm {c}})\), and where the possibility function \(\hat{f}^{N_{\mathrm {r}}}_{\mathrm {c}}\) is equal to the marginal consistency defined in (6) if \(N_{\mathrm {r}} = 0\) and as the consistency defined in (7) with the future observation in the paths in \(A_{\mathrm {r}}\) otherwise. Indeed, when reassigning \(N_{\mathrm {r}} > 0\) paths, it is more efficient to propose new paths in the same area rather than initialising paths in random locations, especially during the burn-in period of the MCMC when observations in different places are likely to originate from objects. The probability of proposing the subset \(Z_{\mathrm {c}}\) of observations takes a similar form as for path reassignment and can be expressed as

    $$\begin{aligned}&\tilde{P}_{\mathrm {c}}\big (Z_{\mathrm {c}} \,|\,N_{\mathrm {c}}, Z^-_{1:K}\big ) = \left\{ \begin{array}{l@{\quad }l} \sum _{\sigma \in S_{N_{\mathrm {c}}}} \prod _{i=1}^{N_{\mathrm {c}}} \mathcal {P}_{(\hat{z}^{\sigma (1)}_{\mathrm {c}},\dots ,\hat{z}^{\sigma (i-1)}_{\mathrm {c}})}\big (\hat{f}^{N_{\mathrm {r}}}_{\mathrm {c}}\big )(\hat{z}^{\sigma (i)}_{\mathrm {c}}) &{}{} \text{ if }\,N_{\mathrm {c}} > 0 \\ \delta _{\emptyset }(Z_{\mathrm {c}}) &{}{} \text{ otherwise }. \end{array}\right. \end{aligned}$$

    The comment regarding computational complexity made about \(P_{\mathrm {r}}(\cdot \,|\,N_{\mathrm {r}})\) applies equally here.

  4. 4)

    Apply the approximate multi-object filter of Sect. 3.3 to the set of initial observations \(Z_{\mathrm {c}}\), with the sets of available observations \(Z^-_1,\dots ,Z^-_K\), and denote by \(A_{\mathrm {c}}\) the generated set of paths. If \(A_{\mathrm {c}} \cap A_{\mathrm {r}} \ne \emptyset \), then we reject the proposal; otherwise, the proposed set of paths is \(A' = (A \setminus A_{\mathrm {r}}) \cup A_{\mathrm {c}}\). The reason for rejecting the proposal when \(A_{\mathrm {c}} \cap A_{\mathrm {r}} \ne \emptyset \) is to ensure that \(A_{\mathrm {c}}\) and \(A_{\mathrm {r}}\) can be recovered from A and \(A'\) as \(A_{\mathrm {c}} = A' \setminus A\) and \(A_{\mathrm {r}} = A \setminus A'\).
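The sum over permutations appearing in the selection probabilities of steps 2 and 3 can be illustrated with a minimal sketch. It assumes, purely for illustration, that the transformation \(\mathcal {P}\) sets the possibility of already-selected elements to zero and renormalises, which is a simpler stand-in for the maximum-entropy construction mentioned above. The function names and the toy possibility values are hypothetical.

```python
from itertools import combinations, permutations

def renormalised_pmf(possibility, excluded):
    """Hypothetical stand-in for P_excluded(f): drop excluded items, then normalise."""
    weights = {x: 0.0 if x in excluded else v for x, v in possibility.items()}
    total = sum(weights.values())
    if total <= 0.0:
        raise ValueError("no element with positive possibility is left")
    return {x: w / total for x, w in weights.items()}

def subset_probability(possibility, subset):
    """Probability of drawing `subset` sequentially without replacement."""
    subset = tuple(subset)
    if not subset:
        return 1.0  # delta_emptyset term: nothing has to be selected
    total = 0.0
    for order in permutations(subset):  # every order leading to the same set
        prob = 1.0
        for i, item in enumerate(order):
            prob *= renormalised_pmf(possibility, set(order[:i]))[item]
        total += prob
    return total

# Toy possibility values for three candidate observations (illustrative only).
f_c = {"z1": 1.0, "z2": 0.5, "z3": 0.25}
pair_probs = {s: subset_probability(f_c, s) for s in combinations(f_c, 2)}
total_mass = sum(pair_probs.values())
```

Since every ordered draw leads to exactly one subset, the probabilities of all subsets of a given size sum to one, which gives a simple consistency check. The cost grows factorially with the subset size, in line with the remark on computational complexity.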

If the proposal has not already been rejected during its construction, the probability \(\varPhi (A' \,|\,A)\) of going from the previous set of paths A to the new set of paths \(A'\) is computed as

$$\begin{aligned} \varPhi (A' \,|\,A)= & {} P_{\mathrm {c}}\big (A_{\mathrm {c}} \,|\,Z_{\mathrm {c}}, Z^-_{1:K}\big ) \tilde{P}_{\mathrm {c}}\big (Z_{\mathrm {c}} \,|\,N_{\mathrm {c}}, Z^-_{1:K}\big ) \\&\times P_{\mathrm {r}}(A_{\mathrm {r}} \,|\,N_{\mathrm {r}}, A) p_{\mathrm {c}}(N_{\mathrm {c}} \,|\,N_{\mathrm {r}}) p_{\mathrm {r}}(N_{\mathrm {r}} \,|\,s). \end{aligned}$$

The probability \(\hat{\alpha }(A,A')\) of accepting the proposed set of paths \(A'\) can then be computed using (4).

4 MCMC on the set of tracks

We now want to design an MCMC algorithm that targets the possibility function \(\varPi \) as introduced in (2). In this case, the Metropolis–Hastings acceptance ratio is

$$\begin{aligned} \alpha _t(T,T') = \min \bigg (1, \dfrac{\varPi (T')^{\rho _t} \varPsi (T \,|\,A)\varPhi (A \,|\,A')}{\varPi (T)^{\rho _t}\varPsi (T' \,|\,A')\varPhi (A' \,|\,A)} \bigg ), \end{aligned}$$

with A and \(A'\) the sets of paths in T and \(T'\), respectively. We therefore have to propose a time of appearance and a last time of existence for each path in A. These time steps will be sampled independently from their previous values in T.
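As a minimal sketch, the tempered acceptance ratio above can be evaluated in the log domain to avoid numerical underflow. The function and argument names are hypothetical: `log_prop_forward` stands for \(\log [\varPsi (T' \,|\,A')\varPhi (A' \,|\,A)]\) and `log_prop_reverse` for \(\log [\varPsi (T \,|\,A)\varPhi (A \,|\,A')]\).

```python
import math
import random

def log_acceptance(log_pi_new, log_pi_old, log_prop_reverse, log_prop_forward, rho):
    """Log of min(1, ratio) with the target raised to the inverse temperature rho."""
    log_ratio = rho * (log_pi_new - log_pi_old) + log_prop_reverse - log_prop_forward
    return min(0.0, log_ratio)

def accept(log_alpha, rng):
    """Accept with probability exp(log_alpha), using a uniform draw in (0, 1]."""
    u = 1.0 - rng.random()
    return math.log(u) <= log_alpha

rng = random.Random(0)
log_alpha = log_acceptance(-12.0, -10.0, -3.0, -3.0, rho=0.5)
decision = accept(log_alpha, rng)
```

With a symmetric proposal, as in this toy call, only the tempered difference of log-possibilities matters, so a smaller \(\rho _t\) makes downhill moves easier to accept.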

4.1 Proposing the interval of existence

The objective in this section is to propose a time of appearance m and a last time of existence n for a given path \(o\in \mathcal {O}_K\), using the different quantities introduced in Sect. 3.3. We consider a path \(o\in \mathcal {O}_K\) of the form \(o':\phi \). One can sample the lag \(L_-\) corresponding to the last time of existence according to the probability mass function \(p_-(\cdot \,|\,o)\) on \(\{0,\dots ,l_{o}\}\) defined as the maximum-entropy distribution bounded by \(l \mapsto \alpha _{\mathrm {ns}}^{\mathbb {1}(l < l_{o})}\alpha _{\mathrm {nd}}^l\). The last time of existence is then set to \(n = K - l_{o} + L_-\). For the time of appearance m associated with a path \(o\in \mathcal {O}_K\), we can simply sample a lag \(L_+\) from the maximum-entropy distribution \(p_+(\cdot \,|\,o)\) bounded by \(l \mapsto \alpha _{\mathrm {nd}}^l\) and set \(m = k_+(o) - L_+\) with \(k_+(o)\) the time of the first observation in \(o\). The probability distribution \(\varPsi (\cdot \,|\,A)\) is then associated with the proposal of a time of appearance and a last time of existence for each path in a given set \(A \in \mathcal {A}\), i.e.

$$\begin{aligned} \varPsi (T \,|\,A) = \delta _{A}(\kappa (T))\prod _{(o,m,n) \in T} \big [ p_+(m \,|\,o) p_-(n \,|\,o) \big ]. \end{aligned}$$

4.2 Evaluating the marginal likelihood

So far, the proposed approach does not assume a specific model for the dynamics and for the observation process. Indeed, although the likelihood \(\ell _k(\cdot \,|\,x)\) is assumed to take the form of a Gaussian possibility function, the function h relating states to observations is general. We will, however, distinguish two different cases for the evaluation of the marginal likelihood: the linear-Gaussian case in which Kalman filtering can be used and the nonlinear case where sequential Monte Carlo techniques are a natural alternative.

4.2.1 Linear-Gaussian case

If the Markov transition \(g_k\) is of the form \(g_k(\cdot \,|\,x') = \overline{\mathrm {N}}(F_k x', Q_k)\) for some \(d_{\mathsf {X}} \times d_{\mathsf {X}}\) matrices \(F_k\) and \(Q_k\) and for any \(k \in \{1,\dots ,K\}\) and if the observation function \(h_k\) is of the form \(h_k(x) = H_k x\), then the posterior distribution of the state at any time step can be computed analytically via the Kalman filter. In particular, for a given path \(o\in \mathcal {O}_{k-1}\), we denote by \(m_k^{o}\) and \(\varSigma _k^{o}\) the mean and variance of the state at time \(k \in \{1,\dots ,K\}\) given the observations in the path \(o\). The only difference with the standard Kalman filtering equations is the marginal likelihood which, due to the form of the likelihood, is expressed at time step k as

$$\begin{aligned} \hat{\ell }_k(z \,|\,o)&= \sup _{x \in \mathsf {X}} \ell _k(z \,|\,x) f_{k|k-1}(x \,|\,o) \\&= \overline{\mathrm {N}}(z; H_k m_k^{o}, H_k \varSigma _k^{o} H_k^{\intercal } + R_k) \end{aligned}$$

for any \(z \in Z_k\).
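The closed form above can be checked numerically in one dimension. The sketch below assumes that the Gaussian possibility function is the Gaussian shape without normalising constant, \(\overline{\mathrm {N}}(x; \mu , \sigma ^2) = \exp (-(x-\mu )^2/(2\sigma ^2))\), so that its supremum is one. The scalar parameters and function names are illustrative only.

```python
import math

def gaussian_possibility(x, mean, var):
    """Assumed form of the Gaussian possibility function (maximum equal to 1)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var)

def marginal_closed_form(z, h, m, sigma2, r):
    """Scalar version of N-bar(z; H m, H Sigma H^T + R)."""
    return gaussian_possibility(z, h * m, h * sigma2 * h + r)

def marginal_grid(z, h, m, sigma2, r, lo=-20.0, hi=20.0, steps=40001):
    """Brute-force supremum over x of likelihood times predicted possibility."""
    best = 0.0
    for i in range(steps):
        x = lo + (hi - lo) * i / (steps - 1)
        value = gaussian_possibility(z, h * x, r) * gaussian_possibility(x, m, sigma2)
        best = max(best, value)
    return best

closed = marginal_closed_form(z=1.5, h=2.0, m=0.3, sigma2=0.8, r=0.5)
grid = marginal_grid(z=1.5, h=2.0, m=0.3, sigma2=0.8, r=0.5)
```

The grid value approaches the closed form from below as the grid is refined, which reflects the fact that marginalisation is a supremum rather than an integral in this setting.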

4.2.2 Nonlinear case

If either the objects’ dynamics or the observation function is not linear, then there is no analytical form for the filtering distributions at different time steps in general. Sequential Monte Carlo (SMC) methods are an alternative to the Kalman filter in this case. An analogue of the bootstrap particle filter (Gordon et al. 1993) can be used as in Houssineau and Ristic (2017); see also Ristic et al. (2018) and Ristic et al. (2020). In particular, for a given path \(o\in \mathcal {O}_{k-1}\), we denote by \(\{(w^{o}_{k-1,i}, x^{o}_{k-1,i})\}_{i=1}^N\) the indexed family of weighted particles approximating the predicted possibility function \(f_{k|k-1}(\cdot \,|\,o)\), i.e.

$$\begin{aligned} \bar{\mathbb {E}}(\varphi (\varvec{x}_{k}) \,|\,o) \approx \max _{1 \le i \le N} w^{o}_{k-1,i} \varphi (x^{o}_{k-1,i}) \end{aligned}$$

for any real-valued function \(\varphi \) on \(\mathsf {X}\), with the uncertain variable \(\varvec{x}_k\) being described by \(f_{k|k-1}(\cdot \,|\,o)\). Then

$$\begin{aligned} \bar{\mathbb {E}}(\varphi (\varvec{x}_{k}) \,|\,o:z) \approx \dfrac{\max _i w^{o:z}_{k,i} \varphi (x^{o}_{k-1,i})}{\max _i w^{o:z}_{k,i}} \end{aligned}$$

for any \(z \in Z_k\), where \(w^{o:z}_{k,i} = w^{o}_{k-1,i} \ell _k(z \,|\,x^{o}_{k-1,i})\) for any \(i \in \{1,\dots ,N\}\). In this situation, the marginal likelihood at time step k can be approximated by

$$\begin{aligned} \hat{\ell }_k(z \,|\,o) \approx \max _{1\le i \le N} w^{o:z}_{k,i}. \end{aligned}$$
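The particle update above can be sketched as follows for a scalar state. The sketch assumes a Gaussian possibility likelihood without normalising constant and a hypothetical nonlinear observation function \(h(x) = x^2\); the particle values and weights are toy numbers.

```python
import math

def gaussian_possibility(x, mean, var):
    """Assumed form of the Gaussian possibility function (maximum equal to 1)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var)

def update(weights, particles, z, h, r):
    """Return the approximate marginal likelihood and the max-normalised weights."""
    new_weights = [w * gaussian_possibility(z, h(x), r)
                   for w, x in zip(weights, particles)]
    marginal = max(new_weights)  # approximation of the marginal likelihood
    if marginal <= 0.0:
        raise ValueError("all particles have zero weight")
    return marginal, [w / marginal for w in new_weights]

def expectation(weights, particles, phi):
    """Max-based analogue of a weighted average for a non-negative function phi."""
    return max(w * phi(x) for w, x in zip(weights, particles))

particles = [0.0, 1.0, 2.0]
weights = [1.0, 0.8, 0.3]  # predicted weights, largest equal to 1
marginal, posterior = update(weights, particles, z=1.0, h=lambda x: x * x, r=0.5)
indicator = expectation(posterior, particles, lambda x: 1.0 if x > 0.5 else 0.0)
```

Dividing by the largest updated weight plays the role that dividing by the sum of the weights plays in a standard particle filter.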

5 Simulations

In all the cases to be considered, \(K=50\) and \(\mathsf {X} = \mathbb {R}^4\). States at time step k are of the form \(x_k = (\mathbf {x}_k, \mathbf {y}_k, \dot{\mathbf {x}}_k, \dot{\mathbf {y}}_k)^{\intercal }\), where \(\mathbf {x}_k\) and \(\mathbf {y}_k\) are the coordinates of the position in the 2-dimensional Euclidean space and where \(\dot{\mathbf {x}}_k\) and \(\dot{\mathbf {y}}_k\) are the coordinates of the velocity. The duration of one time step is denoted \(\varDelta \) and the motion model is assumed to be of the form

$$\begin{aligned} q_k(x_k \,|\,x_{k-1}) = \mathcal {N}( x_k; F x_{k-1}, Q) \end{aligned}$$

with

$$\begin{aligned} F = \begin{bmatrix} 1 &{} 0 &{} \varDelta &{} 0 \\ 0 &{} 1 &{} 0 &{} \varDelta \\ 0 &{} 0 &{} 1 &{} 0 \\ 0 &{} 0 &{} 0 &{} 1 \end{bmatrix} \end{aligned}$$

and

$$\begin{aligned} Q = \sigma _{\mathrm {a}}^2 \begin{bmatrix} \varDelta ^4/4 &{} 0 &{} \varDelta ^3/2 &{} 0 \\ 0 &{} \varDelta ^4/4 &{} 0 &{} \varDelta ^3/2 \\ \varDelta ^3/2 &{} 0 &{} \varDelta ^2 &{} 0 \\ 0 &{} \varDelta ^3/2 &{} 0 &{} \varDelta ^2 \end{bmatrix}, \end{aligned}$$

where \(\sigma _{\mathrm {a}}\) is the standard deviation of the zero-mean random acceleration, which is considered as a noise term. This model is referred to as the nearly constant velocity model. We will consider in particular the case where \(\varDelta = 1\) and \(\sigma _{\mathrm {a}} = 0.05\).
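As a small sketch, the matrices F and Q can be built from \(\varDelta \) and \(\sigma _{\mathrm {a}}\) by noting that Q factorises as \(\sigma _{\mathrm {a}}^2 G G^{\intercal }\), where G maps a 2-dimensional acceleration to its effect on position and velocity. The matrix G and the function names are notation introduced here for illustration.

```python
def ncv_model(delta, sigma_a):
    """Transition matrix F and noise covariance Q of the nearly constant velocity model."""
    F = [[1.0, 0.0, delta, 0.0],
         [0.0, 1.0, 0.0, delta],
         [0.0, 0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0, 1.0]]
    # Effect of a constant acceleration over one time step on (x, y, vx, vy).
    G = [[delta ** 2 / 2, 0.0],
         [0.0, delta ** 2 / 2],
         [delta, 0.0],
         [0.0, delta]]
    Q = [[sigma_a ** 2 * sum(G[i][k] * G[j][k] for k in range(2))
          for j in range(4)] for i in range(4)]
    return F, Q

def predict_mean(F, x):
    """Noise-free one-step prediction F x."""
    return [sum(F[i][j] * x[j] for j in range(4)) for i in range(4)]

F, Q = ncv_model(delta=1.0, sigma_a=0.05)
x_next = predict_mean(F, [0.0, 0.0, 1.0, 2.0])
```

With \(\varDelta = 1\), an object at the origin with velocity (1, 2) is predicted at position (1, 2) with unchanged velocity.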

For the sake of simplicity, the observation model is assumed to be linear; the position \((\mathbf {x}_k, \mathbf {y}_k)^{\intercal }\) of an object is observed directly, which leads to \(h(x_k) = Hx_k\) with

$$\begin{aligned} H = \begin{bmatrix} 1 &{} 0 &{} 0 &{} 0 \\ 0 &{} 1 &{} 0 &{} 0 \end{bmatrix}. \end{aligned}$$

The variance R is of the form \(\sigma ^2 \varvec{I}_2\) with \(\sigma > 0\) and \(\varvec{I}_2\) the identity matrix of dimension 2. This model is useful when tracking directly in the coordinate system defined by a sensor, such as the image plane of a camera. This model also arises when multiple sensors provide complex observations which can be combined into a single observation before being used in a tracking algorithm, such as with GPS or with multiple-input multiple-output sensor systems (Bekkerman and Tabrikian 2006; Haimovich et al. 2007; Pailhas et al. 2016). We will consider in particular the case where \(\sigma = 0.3\) and \(\mathsf {Y} = [-60, 60] \times [-60, 60]\).

5.1 Parametrisation of the proposed algorithm

If the probability of detection is \(p_{\mathrm {d}}\) then the possibility of detection failure is set to \(\alpha _{\mathrm {nd}} = 1 - p_{\mathrm {d}}\) and the possibility of detection \(\alpha _{\mathrm {d}}\) is set to 1. The same approach is used with the probability of survival. The possibility function \(\alpha _{k,\mathrm {fa}}\) is assumed to be constant and equal to \(10^{-2}\) for all scenarios; this is in spite of the fact that the number of false alarms will vary significantly across the considered settings. The reason for this is that \(\alpha _{k,\mathrm {fa}}^n\) is seen as an upper bound for the probability of having n false alarms. A similar approach is used for appearing objects with \(f_{k,+}(n) = \alpha _+^n\) with \(\alpha _+ = 10^{-4}\). The other model parameters such as \(\sigma \) and \(\sigma _{\mathrm {a}}\) are assumed to be known.

The proposed approach is compared to the MCMC for Data Association (MCMC-DA) method introduced in Oh et al. (2009). In order to make the two methods comparable, the possibility function \(\varPi \) is used to evaluate the log-likelihood of the proposed sets of tracks. However, as opposed to the proposed approach, MCMC-DA is provided with the true parameters of the model in the design of the corresponding proposal distribution.

The main difference between MCMC-DA and our approach lies in the construction of the proposal distribution: MCMC-DA has a number of simple moves, whereas our approach focuses on one sophisticated move. The main consequence is the complexity of a single step of the corresponding MCMC algorithms: each step in MCMC-DA has a constant complexity, whereas each step in our approach has a complexity of order K; yet, this is compensated by the more efficient exploration of the set of data associations that our approach yields, as demonstrated below in a range of scenarios.

5.2 Choice of parameter

We assume that the current sample from \(\varPi \) is \(T \in \mathcal {T}\) and denote by \(A = \kappa (T)\) the corresponding set of paths. We then comment on the choice of parameters for the different steps in the proposal mechanism.

The number \(N_{\mathrm {r}}\) of tracks to reassign is chosen from a Poisson distribution with parameter \(\lambda _{\mathrm {r}} = 1\), truncated to the interval \(\{0, \dots , |A|\}\). The parameter \(\lambda _{\mathrm {r}}\) can be adjusted depending on the considered scenario: if objects are expected to be very close to each other and to frequently have crossing trajectories, then \(\lambda _{\mathrm {r}}\) could be increased to raise the average number of tracks that are reassigned at once. Large reassignments are, however, less likely to be accepted so that a trade-off between exploration and mixing must be found, as is usual with MCMC.
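A minimal sketch of the truncated Poisson draw for \(N_{\mathrm {r}}\) is given below. The function names are hypothetical, and the factor \(e^{-\lambda _{\mathrm {r}}}\) is omitted since it cancels when renormalising over \(\{0, \dots , |A|\}\).

```python
import math
import random

def truncated_poisson_pmf(lam, n_max):
    """Poisson(lam) probabilities restricted to {0, ..., n_max} and renormalised."""
    weights = [lam ** n / math.factorial(n) for n in range(n_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def sample_truncated_poisson(lam, n_max, rng):
    """Draw the number of tracks to reassign."""
    pmf = truncated_poisson_pmf(lam, n_max)
    return rng.choices(range(n_max + 1), weights=pmf, k=1)[0]

rng = random.Random(1)
pmf = truncated_poisson_pmf(1.0, 2)           # |A| = 2 paths available
n_r = sample_truncated_poisson(1.0, 5, rng)   # |A| = 5 paths available
```

For \(\lambda _{\mathrm {r}} = 1\) and \(|A| = 2\), the probabilities of reassigning 0, 1 and 2 tracks are 0.4, 0.4 and 0.2, so single-track moves and no-reassignment moves dominate while joint reassignments remain possible.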

Fig. 2 Simple scenario with performance comparison for different parameter choices

Fig. 3 Scenario with high false-alarm rate

Fig. 4 Scenario with low probability of detection

The distribution \(\tilde{p}_{\mathrm {c}}(\cdot \,|\,N_{\mathrm {r}})\) on \(\{-1, 0, 1\}\) drives the increase or decrease of the number of tracks in the proposal step. Since one of the main issues with the MCMC approach for data association is track fragmentation, i.e. the representation of a single object by a series of shorter tracks, it is generally helpful to focus on reducing the number of tracks. We therefore consider the following parametrisation:

$$\begin{aligned} \tilde{p}_{\mathrm {c}}(\delta \,|\,N_{\mathrm {r}}) = \left\{ \begin{array}{ll} \frac{1}{2} &{}{} \text{ if }\, \delta = -1 \\ \frac{1}{4} &{}{} \text{ if }\, \delta = 0 \\ \frac{1}{4} &{}{} \text{ if }\, \delta = 1. \end{array}\right. \end{aligned}$$

5.3 MCMC on the data association set

The choice of parameters and the performance of the proposed approach are assessed on different scenarios.

5.3.1 Simple scenario

We first consider a simple scenario, as shown in Fig. 2a, with ten false alarms and 0.1 appearing objects per time step on average and with a probability of detection of \(p_{\mathrm {d}} = 0.9\). The simplicity of the scenario is illustrated in Fig. 2a where it appears that most of the false alarms are far from any other observation and, conversely, object-originated observations are close to each other.

The performance of the two considered approaches is first assessed on a single run in Fig. 2c where the evolution of the log-likelihood is displayed as a function of the computational time. “HISP” refers to the proposed approach whereas “DA” refers to the MCMC-DA. The difference in behaviour between the proposed approach and MCMC-DA is due to the use of the simulated annealing in the former. Both methods provide satisfactory results in this case and the MCMC-DA’s chain mixes well. Figure 2d, which displays the performance averaged over 50 repeats, shows that setting the parameter c in the inverse temperature \(\rho _t\) to 0.001 provides the best performance throughout the duration of the runs.

5.3.2 Scenario with high false-alarm rate

We consider a first type of challenging scenario, depicted in Fig. 3a, with the following characteristics: there are 100 false alarms and 0.5 appearing objects per time step on average and the probability of detection \(p_{\mathrm {d}}\) is equal to 0.8. In this case, it is the large number of false alarms that makes the estimation difficult, since they are likely to form coherent observation sequences over 2 to 3 time steps. This aspect is illustrated in Fig. 3b where many false alarms can be seen to be near other observations. Figure 3c considers different choices for the Poisson parameter \(\lambda _{\mathrm {r}}\) with the log-likelihood being once again averaged over 50 repeats. The choice \(\lambda _{\mathrm {r}} = 1\) allows for rapidly creating tracks while proposing the simultaneous reassignment of 2 tracks often enough to prevent track fragmentation, whereas setting \(\lambda _{\mathrm {r}}\) to 0.5 or 1.5 does not perform as well. Finally, a few options are compared in Fig. 3d for the distribution \(\tilde{p}_{\mathrm {c}}\), with the log-likelihood being averaged over 50 repeats. The assessed options are

$$\begin{aligned} \tilde{p}_{\mathrm {c}}( (-1, 0,+1) \,|\,N_{\mathrm {r}}) = \left\{ \begin{array}{l@{\quad }l} (\frac{1}{3}, \frac{1}{3}, \frac{1}{3}) &{}{} \text {as ``uniform''} \\ (\frac{1}{2}, \frac{1}{4}, \frac{1}{4}) &{}{} \text {as ``focus on}\,-1\text {''}\\ (\frac{1}{4}, \frac{1}{2}, \frac{1}{4}) &{}{} \text {as ``focus on}\,0\text {''}\\ (\frac{1}{4}, \frac{1}{4}, \frac{1}{2}) &{}{}\text { as ``focus on}\, +1\text {''}, \end{array}\right. \end{aligned}$$

where \(\tilde{p}_{\mathrm {c}}( (\delta _1, \delta _2, \delta _3) \,|\,N_{\mathrm {r}}) = (p_1, p_2, p_3)\) is a shorthand notation for \(\tilde{p}_{\mathrm {c}}(\delta _i \,|\,N_{\mathrm {r}}) = p_i\) for \(i \in \{1,2,3\}\). The results in Fig. 3d show that focusing on \(\delta = -1\) yields a slightly better performance, followed by focusing on \(\delta = 0\). Once again, this can be attributed to the reduction in track fragmentation. The influence of the parameter c is considered once more in Fig. 3e where it appears that \(c = 0.0005\) gives the best long-run performance. However, \(c = 0.001\) still provides good performance throughout the run time and is considered for the other simulations. Figure 3f compares the performance of the proposed approach with MCMC-DA and shows that the latter does not mix as well as in the first scenario and fails to identify most of the tracks. The fact that the proposed approach does not reach the true log-likelihood can be attributed to local maxima in the posterior possibility function \(\varPi \) as well as to identifiability issues. The trace plots are shown for 50 repeats, and the median of these repeats is also plotted in order to better illustrate the behaviour of both approaches. In this scenario, it appears that it can be beneficial to run the simulated annealing several times in order to ensure that a good local maximum is found, as is usual with this type of algorithm.

5.3.3 Scenario with low probability of detection

To further assess the performance of the considered approach, we consider another challenging scenario, as shown in Fig. 4a, with the following characteristics: there are 25 false alarms and 0.5 appearing objects per time step on average and the probability of detection \(p_{\mathrm {d}}\) is equal to 0.5. The difficulty of this scenario is illustrated in Fig. 4b where it appears that the inter-observation distance is not sufficient to clearly identify the objects; in particular, the observations belonging to the object at the bottom right barely appear in Fig. 4b, emphasising the fact that a probability of detection of 0.5 is not sufficient to guarantee the spatio-temporal consistency between observations. Figure 4c shows that the proposed approach can capture most of the structure of the scenario, whereas the MCMC-DA did not identify the majority of tracks in the allocated time.

Fig. 5 Comparison between multi-object filtering methods with the first observation of each path being given. The measure of performance is the OSPA distance (Schuhmacher et al. 2008) with parameters \(c=25\) and \(p=2\) averaged over 1000 repeats

5.3.4 Performance of multi-object filtering

We further assess the performance of the proposed approach by comparing the proposed multi-object filtering method against existing algorithms of the same complexity. The method detailed in Sect. 3.3 is an analogue of the HISP filter (Houssineau and Clark 2018) in the context of possibility theory, and both the original and the proposed approach are compared against each other as well as against the PHD filter (Mahler 2003). For the latter, paths are extracted from the underlying Gaussian mixture as is common, see, e.g. Pace and Del Moral (2013), and data associations are sampled from a distribution based on the components’ weights in the Gaussian mixture.

Performance assessment is based on the OSPA distance (Schuhmacher et al. 2008), which provides a suitable metric on sets and is used here to evaluate the distance between the set of true object states and the corresponding estimate at each time step. For each scenario, the different methods are initialised with the first observation of a given object and run until the last time step. The results are then averaged over all objects and over 1000 repeats. Figure 5 shows that the proposed analogue of the HISP filter outperforms the original version in general, and especially when the false-alarm rate is high. The PHD filter does not yield reliable estimates in either of the considered scenarios. The performance of the proposed approach, when compared to the original HISP filter, could be the consequence of the assumption in (5) which relaxes the one in the original version.
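For reference, the OSPA distance with cut-off c and order p can be sketched as below. This is an illustrative brute-force version that enumerates all assignments, so it is only suitable for small sets; a practical implementation would rely on an optimal assignment solver such as the Hungarian algorithm. Using the Euclidean distance between positions as the base distance is an assumption of this sketch.

```python
import math
from itertools import permutations

def ospa(X, Y, c=25.0, p=2.0):
    """OSPA distance between two finite sets of points given as lists of tuples."""
    m, n = len(X), len(Y)
    if m == 0 and n == 0:
        return 0.0
    if m > n:  # make X the smaller set
        X, Y = Y, X
        m, n = n, m
    # Best assignment of the m points of X to distinct points of Y.
    best = min(
        sum(min(c, math.dist(x, Y[j])) ** p for x, j in zip(X, perm))
        for perm in permutations(range(n), m)
    )
    # Each unassigned point contributes the maximal penalty c^p.
    return ((best + c ** p * (n - m)) / n) ** (1.0 / p)

truth = [(0.0, 0.0), (10.0, 0.0)]
estimate = [(10.0, 0.0), (3.0, 4.0)]
d = ospa(truth, estimate)
```

In this toy call, one estimate matches a true position exactly and the other is at distance 5, so the distance is \(\sqrt{25/2}\); a missing or spurious point would instead contribute the cut-off value \(c = 25\).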