Uncertainty modelling and computational aspects of data association

A novel solution to the smoothing problem for multi-object dynamical systems is proposed and evaluated. The systems of interest contain an unknown and varying number of dynamical objects that are partially observed under noisy and corrupted observations. In order to account for the lack of information about the different aspects of this type of complex system, an alternative representation of uncertainty based on possibility theory is considered. It is shown how analogues of usual concepts such as Markov chains and hidden Markov models (HMMs) can be introduced in this context. In particular, the considered statistical model for multiple dynamical objects can be formulated as a hierarchical model consisting of conditionally independent HMMs. This structure is leveraged to propose an efficient method in the context of Markov chain Monte Carlo (MCMC) by relying on an approximate solution to the corresponding filtering problem, in a similar fashion to particle MCMC. This approach is shown to outperform existing algorithms in a range of scenarios.


Introduction
We consider the problem of performing inference for multi-object dynamical systems under partial, corrupted and noisy observations.This class of problems, known as multi-target tracking in the engineering literature [8,22,33], arises in many applications, e.g.bio-imaging [4], robotics [24] and surveillance [3], which can all benefit from principled inference solutions in different ways: i) when the number of objects is too large to be treated by hand, ii) when the phenomena of interest take place on extended periods of time or, conversely, when an immediate response is needed, iii) when the data available about each object is scarce and iv) when it is difficult to tell one object from another.One of the main difficulties with the considered type of system is that the number of objects is not known a priori and might vary in time due to a birth-death process.Also, objects are observed under multiple perturbations: i) each object might or might not be detected, ii) if an object is detected then its state is only partially observed and the observation is subject to noise and iii) observations not related to any object, referred to as false alarms, are also received.The main task when inferring the number of objects in a given system as well as their respective state is to solve the data association problem, that is, to estimate whether or not observations at different time steps originate from the same object.Each of the above-mentioned perturbations incurs a significant increase in the size of the set of all possible data associations, making it highly combinatorial.Due to this combinatorial nature, the task of estimating the current state of all objects based on all previous observations, referred to as multiobject filtering, is a difficult problem.It has been an active research topic for several decades and continues to be challenging in spite of the ever-increasing available computation resources [8,33].In this article, we aim to tackle the even harder problem of multi-object smoothing, that is, our objective is to keep evaluating the likelihood of data associations at previous times in light of newly received data.This is an important problem in practice since the elicitation of objects' trajectories and origins is fundamental for the evaluation of the objects' identities and of the associated situational awareness.Indeed, knowing the current state of each object is not sufficient in many situations and maintaining an up-to-date estimate of their past trajectories is often crucial.For instance, in defence applications, if an object labelled as "ally" crosses path with another object labelled as "enemy" then being able to tell one from the other at a later time can be more critical than having an accurate estimate of their state at that time.
In the context of filtering, one of the most natural ways of improving the trajectory estimates over the last few time steps is referred to as fixed-lag smoothing, where a sliding window made of a given number of time steps is updated based on the latest observations.The advantage with fixed-lag smoothing is that the computational cost can easily be tuned by selecting an adequate lag.However, since our objective is to elicit particular events that might have taken place at arbitrary time steps, we consider instead a "batch" alternative where a user-defined time-window of interest is fixed.
Defining a standard statistical model for representing multiple objects requires setting a number of probability distributions and parameters to characterise the different aspects of the problem, including highly uncertain phenomena such as false alarms.Such models also usually ignore the disparity between the different objects of interest in terms of behaviour and detection profile.In this article, we consider an alternative representation of uncertainty [14,12], based on possibility theory [7], that allows for acknowledging the lack of information about the different aspects of multi-object dynamical systems with the objective of increasing the robustness to misspecification of the derived solutions.The considered representation of uncertainty has links with imprecise probabilities [35] and Dempster-Shafer theory [6,31].
The use of MCMC to solve data association problems has been previously explored in [25] as well as in [34,20,19].The approach considered in these articles is based on local proposals in the set of data association, with [34,20,19] additionally considering the estimation of the object's trajectories.The objective in this article is to show that the set of data association can be explored effectively with global proposals without significantly affecting the probability of acceptance of each move.This result is achieved by leveraging the efficiency of an approximate multi-object filtering method.The use of MCMC in discrete spaces is discussed more generally in [36].MCMC has also been used in conjunction with, or as a replacement of, sequential Monte Carlo in the context of filtering for multi-object systems, see e.g.[21,30,23]; however this type of approach is less directly related to the method proposed in this article.
Overall, the contributions of the articles are as follows: i) a full multi-object model is defined in the context of possibility theory, building up on the components of single-and multi-object models of [29] and [11]; ii) a possibilistic analogue of a scalable solution to multi-object filtering [15] is introduced; iii) the tools of possibility theory are used to define a suitable structure on the set of data associations; iv) a new efficient MCMC-based solution for the multi-object smoothing problem is introduced and its performance is demonstrated.
We introduce a new statistical model for representing multi-object systems in Section 2. This is followed by the presentation of the proposed method for exploring the set of data association in Section 3, before considering an extension of this approach in Section 4. The performance of the proposed method is then assessed on simulated data in Section 5.

Model
We consider a fixed number K of time steps and assume without loss of generality that time steps take integer values between 1 and K.At each time step k ∈ {1, . . ., K}, a set of observations Z k is received, containing both object-originated observations and false alarms.Each observation in the set Z k is an element of an observation set Z, which is assumed to be a subset of R d Z .In order to model that an object might not be detected, we introduce the notation φ for the empty observation, that is, an object for which detection has failed is associated with the empty observation φ.We assume, as is standard, that an object cannot generate more than one observation at each time step.Therefore, denoting Zk = Z k ∪ {φ} the set of observations at time k augmented with the empty observation for any k ∈ {1, . . ., K}, any sequence of observations generated by an object through the K time steps of the scenario can be seen as an element of where the sequence of observation containing empty observations only is not considered.Elements of O K are also referred to as observation paths or simply as paths.Data association can then be seen as the problem of determining the probability for all the paths in a given subset of O K to be the true paths of objects in the system under consideration.Another standard assumption about multi-object systems is that each observation cannot originate from more than one object; as a consequence, not all subsets of O K are considered feasible and we focus on the set A of subsets of O K such that for all A ∈ A, any two different observations paths o and o in A must verify that either where o k denotes the k-th element of the sequence o.Less formally, elements of A only contain paths that are different where they are not both equal to the empty observation.The set A, in spite of being a strict subset of the power set of O K , has a large cardinality and evaluating the credibility of each of its elements by exhaustion can be difficult even when the number of observations at each time step is small.Assuming, for simplicity, that the number of observations at every time step is constant and equal to m, the number of elements in the power set of O K is equal to 2 m K − 1, which is prohibitively large even for toy problems.It is generally difficult to devise algorithms that perform inference on a large discrete space such as A, yet, MCMC methods can help to address part of this challenge since they only require being able to evaluate the credibility of a given association A ∈ A proposed via some user-defined transition kernel.
In practice, we also need to estimate the interval of existence of each object.For this purpose, we introduce a set T which is similar to A except that each path o will be paired with a time of appearance m ∈ {1, . . ., K} and the last time of existence n ∈ {m, . . ., K}. Formally, for all T ∈ T , any (o, m, n) in T must verify o k = φ for any k / ∈ {m, . . ., n} and, for any for all k ∈ {1, . . ., K}, as for data associations.We denote by κ the function extracting paths from tracks, that is κ(t) = o for any track t with path o.

2.1.
Uncertain variable and possibility function.We consider a representation of uncertainty [14] which can be used as an alternative to subjective probabilities in a statistical model.The objective of this representation of uncertainty is to model information rather than randomness and therefore to address common issues with statistical modelling for complex systems and with the use of subjective probabilities.In the context of multi-object systems, some these issues are: 1) the associated models are inherently hierarchical which precludes the use of improper priors on the first level of this hierarchy; however, there is often no prior information on the location of appearing objects which means that uninformative priors are needed; 2) as with many complex systems, there is a large number of parameters which are not necessarily known in practice and learning these parameters is both challenging computationally as well as potentially useless if they are likely to change drastically from one time step to the other; this is for instance the case with the probability of detection; As will be shown in the next few sections, the proposed approach allows for addressing these issues while preserving most of the usual intuitive mechanisms in Bayesian inference.
We model a fixed but unknown quantity as a mapping x from a sample space Ω to a set X, referred to as an uncertain variable.The difference with a random variable is that Ω is not equipped with a probability distribution and, instead, there is a reference element in Ω, denoted ω * , which correspond to the true value x * = x(ω * ) of the considered unknown quantity.The information about the true value of x is represented by a non-negative function f x on X verifying sup x∈x f x (x) = 1, referred to as a possibility function.The scalar f x (x) ∈ [0, 1] corresponds to the credibility of the event x = x for any x ∈ X and the credibility of the event x ∈ A for any A ⊆ X is given by sup x∈A f x (x).In particular, f x is not a density and the integral is replaced by a supremum, which is consistent with the fact that the event x ∈ X has credibility 1 by construction.Possibility functions are not characterised by their corresponding uncertain variables and, instead, we say that the possibility function describes the uncertain variable.If y is another uncertain variable in a set Y and if x and y are jointly described by the possibility function f x,y then y is described by the marginal possibility function and the possibility function describing x given that y = y is which is the analogue of Bayes' theorem for possibility functions [5].In this context, we will refer to f x and f x (• | y) as the prior and posterior possibility functions respectively and f y (y | •) will be called the likelihood function; similarly, f y (y) will be referred to as the marginal likelihood.If it holds that f x,y (x, y) = f x (x)f y (y) for all (x, y) ∈ X × Y then x and y are said to be independently described.This form of independence only implies that the information about x is not related to the information we hold about y.
The expected value and variance can be defined for possibility functions via the corresponding law of large numbers and central limit theorem [14] as where ∆f x is the Laplacian of f x , with the variance being infinite when E * (x) is not a singleton and undefined when f x is not twice differentiable at E * (x).The variance can be seen as being the inverse of an analogue of the Fisher information.
Another useful notion of expected value, which is the direct analogue of the standard expected value, can be defined for any real-valued function ϕ on X as Ē(ϕ(x)) = sup The scalar Ē(ϕ(x)) can be interpreted as the maximum expected value of ϕ(x).Many concepts and results holding for probability distributions can be used for possibility functions.For instance, if Y = X = R and if the likelihood function is a normal possibility function, i.e.
for some a ∈ R and some σ > 0, then one can show that the posterior is also a normal possibility function if the prior f x is normal.In other words, the concept of conjugate priors makes sense.This result can be extended to the multivariate case and it has been shown in [13] that the posterior expected value and variance of the Kalman filter can be recovered with possibility functions.If the objective is to find the (subjective) probability p(B) of some event x ∈ B for some measurable subset B of X, then the credibility sup x∈B f x (x) can be seen as an upper bound for this probability and we find that where B c = X \ B is the complement of B in X.This interpretation implies that the possibility function 1, which is equal to 1 everywhere on X, is the least informative.This uninformative possibility function is well defined even when X is unbounded.
It is also possible to interface uncertain variables and random variables in order to introduce more sophisticated representations of uncertainty involving both lack of information and randomness [12].However, we will argue that all the elements of the introduced statistical model can be seen as subjective so that only possibility functions will be used.

2.2.
Multi-object model.We first introduce the assumptions and notations for modelling the way objects appear, behave and disappear in Section 2.2.1 before moving on to the considered sensor modelling in Section 2.2.2.Most of the assumptions are standard in the field of multi-object estimation.
2.2.1.Object and population dynamics.We consider the case where there is no information about some or all of the components of the state of appearing objects.Typically, there might be no prior information about the position of objects whereas assumptions can be made about the velocity components.Denoting m ∈ {1, . . ., K} the time step at which a given object has appeared, the state at this time step is represented by an uncertain variable x m in a space X ⊆ R d X described by a possibility function f 0 .With probabilistic modelling, improper priors might be required in order to model the absence of information about appearing objects; however, the hierarchical nature of multi-object estimation implies that improper priors cannot be used without adding heuristics at the level of data association [16,27].We consider that there is a non-negligible heterogeneity between the dynamics of the different objects and that the characteristics of the objects' motion is not necessarily well known.As a consequence, we model the trajectory of an object as a sequence of uncertain variables {x k } n k=m+1 on X such that, for any k ∈ {m + 1, . . ., n}, x k is described by a possibility function for some possibility function g k (• | x k−1 ) on X.This is an analogue of the Markov property for uncertain variables.We take into account the fact that objects might completely disappear from the scene before the last time step, in which case we say that the object has "not survived".This could be seen as a convenient way of dealing with objects that are no longer detectable by the sensor(s).Object survival is not usually a random event so that we model it as an uncertain variable.The respective credibilities for an object with state x ∈ X to survive or not survive to the next time step are denoted α s (x) and α ns (x).These credibilities must verify max{α s (x), α ns (x)} = 1 for any x ∈ X.We consider the case where α s = 1 since we want to model that objects are unlikely to disappear right after appearing, for which we need to set α ns (x) 1 for any x ∈ X.The subjective probability of survival for an object with state x ∈ X is therefore restricted to the interval [1 − α ns (x), 1].
Given the introduced model and notations, the joint credibility of a trajectory x m:n ∈ X n−m+1 and of the corresponding last time of existence n ∈ {1, . . ., K} for an object that is known to appear at time step m can be characterised by the possibility function where 1(e) equals 1 if e is true and 0 otherwise.
There are several possible models for the number of appearing objects per time step.The simplest is to assume that the credibility for an object to appear at time k ∈ {1, . . ., K} is α k,+ and that this aspect can be independently described for all objects.The credibility for M objects to appear at time step k is then equal to α M k,+ .Additional information might however be available about appearing objects, such as a maximum number M k,+ at time step k, in which case we would have a The associated possibility function on the set N 0 of non-negative integers is denoted f k,+ in any case.

2.2.2.
Observation.Most sensors acquire information about the objects of interest by measuring some signal over an array of resolution cells.This is the case for cameras, where these resolution cells are pixels, but also for most radars and sonars [32].Considering for instance the case of a radar measuring range and azimuth, the internal processing of the radar image yields a set of resolution cells where the strength of the signal suggests the presence of an object in the corresponding directions and at the specified distances.In addition, objects are often extended and the signal can originate from different edges and/or surfaces depending on their (unknown) orientations.As a consequence, we model the observation process via uncertain variables and consider the following form for the likelihood function: where R k is a d Z × d Z symmetric positive-definite matrix related to the size and shape of the resolution cells (assumed constant in Z).The difference between this normal possibility function and the corresponding normal probability distribution would not matter in a standard single-object tracking scenario since normalising constants would simplify in Bayes' theorem; however, in multi-object tracking, these constants are important since they appear in the assessment of data associations.
The credibility for an object with state x ∈ X to be detected is denoted α d (x) and, similarly, the credibility of a detection failure is denoted α nd (x).Since it must hold that max{α d (x), α nd (x)} = 1 for any x ∈ X, we will assume that α d = 1 so that it is unlikely for an object to remain undetected.Given a trajectory x m:n of an object appearing at time step m and disappearing after time step n, it follows that the likelihood function for a path o The credibility for an observation z ∈ Z at time k ∈ {1, . . ., K} to be a false alarm is denoted α k,fa (z), which will be assumed to be strictly lesser than 1; otherwise, if it were possible for all observation to be false alarms then this would be the posterior expected data association in general.The credibility for a given finite subset Z of observations in Z to be false alarms is then As a possibility function on sets, f k,fa must verify that sup Z⊆Z f k,fa (Z) = 1.
Other aspects such as false alarms and initial observations must be considered jointly.We denote f fa the possibility function defined on A as is the set of false alarms induced by A at time step k.We also introduce f + as the possibility function on T defined as where M k (T ) = {(o, m, n) ∈ T : m = k} is the number of objects appearing at time step k ∈ {1, . . ., K}.The functions f fa and f + , defined respectively on A and T , are not possibility functions; instead, they are simply the joint credibility for observations that are not in a given element of A to be false alarms and for tracks that are in a given element of T to have appeared at the indicated time steps.The target possibility function, i.e. the posterior possibility function Π on T describing the unknown set of tracks, is then expressed for any T ∈ T as and is such that max T ∈T Π(T ) = 1.The marginal likelihood for the set of paths A = κ(T ) is then defined as

MCMC for data association
3.1.Computational aspects of possibility theory.Approximation methods for possibility functions must be devised in order to solve the corresponding inference problems in general.Grid-based methods have the same shortcomings as in the probabilistic case since it is often difficult to anticipate where the posterior possibility function will take non-negligible values.Although one cannot sample directly from a given possibility function f x , the latter can be used within MCMC together with a proposal (probability) distribution.In this case, there is no requirement of targeting a given probability distribution and there is no concern regarding the independence between samples.One of the consequences is that low-discrepancy sequences can be used instead of pseudorandom numbers.The generated chain, say {X n } n≥1 , will simply be used to approximate the expected value for any real-valued function ϕ on X.As opposed to the standard Monte Carlo approximation, the possibility function f x appears explicitly in the expression of Ē(ϕ(x)) since the density of samples in a given area conveys no information about f x ; instead, the chain {X n } n≥1 simply provides support points for the approximation of f x as a function.If only the expected value E * (x) of x is of interest, then the possibility function f γ x for some γ > 1 can be used instead.The considered power can also be increased during the execution of the MCMC, leading to a simulated annealing.Conversely, if one is interested in identifying the subset of X containing at least 100(1 − α)% of the subjective probability mass defined in (1), then areas where f x has value α must also be explored, hence justifying the use of a power γ strictly lesser than 1.
When using the possibility function f γ x in a MCMC algorithm, it is the probability distribution on X defined as the renormalised version of f γ x that is targeted (assuming f γ x is integrable).This is not however the only possible approach.Indeed, (1) suggests that a possibility function can be seen as inducing an upper bound for probability distributions.It follows that selecting the sampling distribution from the set of upper-bounded probability distributions is also meaningful.A particular choice that is appropriate in many settings is to follow the maximum-entropy principle [18] and consider the maximum-entropy distribution that is upper bounded by f x as in (1), as proposed in [17].When X is discrete, it is possible to further increase the entropy by replacing the set-wise upper bound of (1) by a point-wise upper bound of the form p(x) ≤ f x (x), x ∈ X, with p a probability mass function on X.This approach will be particularly useful in the context of multi-object inference since it will lead to an increase of the diversity of explored data associations when compared to sampling from the distribution proportional to f x .

Problem formulation.
The objective in the remainder of this section is to design a proposal distribution for identifying the mode of the possibility function Π defined in (3) via the Metropolis-Hastings algorithm.We assume for the moment that this proposal distribution is given and express it as a Markov kernel Φ from A to itself.A natural starting point for exploring the set A is to consider the case where all observations are false alarms, that is, we start from the element A = ∅ ∈ A. We first assume that Π can be evaluated everywhere so that, given a previous sample A, a new sample A can be obtained from the probability distribution Φ(• | A) and accepted with probability where t is the current iteration and ρ t is the inverse temperature defined by ρ 0 = 1 and ρ t = ρ t−1 /(1 − c) for some constant c.
The main difficulty with the Metropolis-Hastings algorithm in the context of interest is to design a proposal distribution Φ with adequate properties.In particular, there are two issues with this approach which we will aim to solve in the remainder of this section: i) The possibility function Π on A is highly multimodal in general so that moves that are local both in space and time are unlikely to yield a sufficient exploration of the space.
ii) Implementing moves on entire paths in the set O K would be more global in nature; however this requires the non-trivial introduction of additional structure on this set.These two issues will be addressed in Sections 3.3 and 3.4 respectively.Section 3.5 will then detail the construction of the proposal distribution Φ.Extensions of the MCMC algorithm introduced for Π to the possibility function Π on T will be covered in Section 4.

3.3.
Approximate multi-object filtering.In order to explore the different possible associations in the set T without getting stuck in local maxima and without incurring detrimental effects on the mixing of the MCMC chain, we propose to use a multi-object filter to ensure that any proposed association is meaningful from the viewpoint of the model.The motivation for leveraging the capabilities of an approximate filtering algorithm to solve the corresponding smoothing problem is very similar to the one behind particle MCMC [1].To illustrate the challenge with proposing changes in data association, we consider the case where two objects have crossing trajectories as in Figure 1a; if we only change one observation of a given path at a time, then it will take many moves to go from one high-credibility data association to another, and some of these moves will be in regions of arbitrarily small credibility.Alternatively, as is usual with MCMC algorithms, proposing bigger moves without taking into account the geometry of the target possibility function will result in an extremely low acceptance rate.This would be the case for instance if we were to reassign paths by simply proposing new observations uniformly at random.The objective is therefore to obtain paths that are consistent with the model given a restricted number of initial observations (first observation in a path).The corresponding moves that we will construct will be global in the sense that they might affect all time steps but local in sense that only a restricted number of paths will be (re)assigned.The considered filtering algorithm should have a low complexity in order to limit the computational cost of the overall MCMC algorithm.A possible candidate could therefore be the probability hypothesis density (PHD) filter [22] or its analogue in the context of possibility theory [11].However, the PHD filter does not solve the data association problem and, as q consequence, cannot be used to propose paths.Instead, we consider an analogue of the hypothesised filter for stochastic populations [15], or HISP filter, which is of the same complexity as the PHD filter and which allows for distinguishing objects.
At time step k ∈ {1, . . ., K}, the HISP filter provides the marginal probabilities for extending an existing path z 1:k−1 ∈ Z1 × • • • × Zk−1 with an additional observation z k ∈ Zk at the current time step.The standard version of the algorithm would consider all such associations (at least the ones that are not too unlikely) and proceed to the next time step; however, we consider a modified version where a feasible data association is drawn at every time step so that the number of considered paths does not increase exponentially and the computational cost is further reduced.We also use the modelling based on possibility functions introduced in the previous sections instead of the probabilistic modelling considered in [15].The different steps of this modified HISP filter are given in the following sections.
The context is as follows: since only part of the existing paths are reassigned and since observations can only be associated with one object, it follows that some of the observations are unavailable to the HISP filter; we denote by Z − k the sets of available observations at any time step k ∈ {1, . . ., K}.We will assume in this section that the credibility α nd of detection failure and the credibility α ns of nonsurvival are constant over the state space for the sake of simplicity; as opposed to the probabilistic case, this can be achieved in general by selecting the (constant) credibility of detection failure to be sup x∈X α nd (x) and similarly for the credibility   of non-survival.This operation can be seen as a voluntary loss of information with the purpose of gaining a property of interest.

3.3.1.
Initialisation.We assume that, using local moves, the MCMC algorithm provides a set Z c = {(z i c , k i c )} Nc i=1 of N c pairs with z i c the initial observation for a path and with k i c the corresponding time step, i ∈ {1, . . ., N c }.These observations might or might not be at the same time step but the pairs (z i c , k i c ), i ∈ {1, . . ., N c }, are assumed to be different from each other.A path will be initialised every time one of these observations is encountered in the provided sets of observation Z − k .

Prediction. We denote by O
composed of paths that have been selected so far as potential sequences of object-originated observations.To each path o ∈ O k−1 corresponds a possibility function Recalling that g k is the Markov transition from X to itself describing the objects' dynamics, we obtain the predicted possibility function Such a prediction only considers the event where the object survives to the k-th time step although it is possible for objects to disappear.We postpone considerations of this aspect of the prediction to a further stage in the algorithm.
3.3.3.Update.At time step k, the set of observations Z − k is available to update the existing paths.For any path o in the set O k−1 of previously selected paths and for any new observation z ∈ Z − k ∪ {φ}, the posterior possibility function associated with the extended path o : z, with ":" denoting concatenation, is defined as We can then select which observation in Z − k ∪ {φ} will be used to propagate the path based on the credibility of the corresponding association.However, before expressing the latter, we first have to introduce the prior credibility of presence, which depends on the consecutive number of time steps for which the path under consideration has not been detected.Indeed, there is some remaining ambiguity whenever the empty observation φ is selected for a path since it is unclear in this case whether the detection has failed for the corresponding object or the object has not survived the last prediction step.We purposefully maintain this ambiguity and postpone the decision in order to better estimate which of these two events occur.Indeed, the credibility of non-survival is most often much lower than the credibility of detection failure, e.g.α nd = 0.1 and α ns = 0.001, so that terminating a track after a single detection failure is unlikely.Yet, if detection failures keep occurring for l time steps, then the credibility of the corresponding events, i.e. α l nd for the case where the object remains and α ns for the case where the object has disappeared, will rapidly favour a disappearance as opposed to a sequence of detection failures.For any path o ∈ O k−1 , we denote by l o the number of consecutive time steps for which φ has been selected, e.g. if o is of the form (o 1 , . . ., o k−3 , φ, φ) with o k−3 = φ then l o = 2.We then compute the credibility that the corresponding object has survived/not survived since the last detection as = max{a, b} for any a, b ∈ R. The binary operator ∨ is assumed to have lower precedence than multiplication, so that a ∨ bc = a ∨ (bc) for any a, b, c ∈ R.
We can now express the marginal credibility of association on Although, the number of simultaneously reassigned paths will be limited in the context of interest, the number of observations in Z can be extremely large so that the computation of Γ k (Z | O) can be challenging.Yet, it is possible to rewrite this term by assuming that any two paths in O are unlikely to obtain large marginal likelihoods from a single observation in Z, that is, for any o, o ∈ O such that o = o and any z ∈ Z, there exists z ∈ Z such that In the probabilistic version of this assumption [15], the left hand side needs to be equal to 0, which is more constraining.It follows that Γ k (Z | O) can be expressed as This result can be proved easily by developing the product in the approximated expression and removing the terms where a single observation is associated with several tracks.Using this expression, all the terms Γ k The approach is similar to the one detailed in [15] for the probabilistic case.
We then select an observation in Z k ∪ {φ} at random for each of the paths in O k−1 using the marginal credibility of association γ k (• | o) from the maximum entropy approach.There are two ways of enforcing the modelling assumption that paths cannot contain the same observation: i) use a rejection sampling strategy to ensure that only acceptable data associations are proposed, and ii) completely reject the proposed data association if it contains overlapping paths.
The main drawback with the first option is that calculating the probability of proposing a given acceptable data association is combinatorial in nature and becomes a computational bottleneck when the number of observations is large.We therefore consider the second option.Finally, we initialise a new path for any This path is of the form o = (φ, . . ., φ, z i c ).At the last time step, a decision is taken for all observations paths, even the one ending with empty observations, and a set A c is defined as the set of all created paths.The conditional probability for generating the set of paths A c given the initial observations Z c and the available observations 3.4.Structure on the set of paths.In order to help exploring the set of data associations A, it is useful to equip the underlying set of paths O K with additional structure.The only natural structure on O K is the one inherited from the fact that the observations are in the set Z which is a subset of an Euclidean space.This is not however sufficient since simply measuring the distance between two observations z k and z k at two different time steps k = k as z k −z k , with • the Euclidean norm, does not take into account the structure of the problem.Moreover, the notion of distance is very model-dependent and what is considered as "close" or "far" would need to be adjusted for each scenario.Instead, we use the objects' dynamical and observation model as a reference and relate observations via the credibility for these observations to be generated by the same object.These observations can be seen as consistent if that credibility is close to 1 and inconsistent if it is close to 0. In order to simplify the calculations, we assume the existence of an upper bounding function g for the Markov transition g k such that (5) for any k ∈ {1, . . ., K}, with g of the form for some d X × d X matrices F and Q.We consider two time steps k, k ∈ {1, . . ., K} such that k < k as well as two observations z and z at time steps k and k respectively and introduce f k,k (z | z) as the possibility for an object initialised from z at time step k to be observed again at time step k at z in the absence of any other observation, that is where l = k − k, where g l is the l-th fold convolution of the transition g, that is and where f k (• | z) is the posterior possibility function defined as The possibility function g l (• | x k ) is an upper bound for the convolution of the Markov transitions g k+1 , . . ., g k .Assuming that f k (• | z) = N(m z , Σ 0 ) and denoting by Σ l the covariance matrix after l predictions, e.g.
where H k ,z,l is the Jacobian of h k at the point F l m z ; the value of d k,k (z, z ) can be easily deduced.
The main drawback of this notion of consistency is that observations tend to become more consistent as l increases since there is more uncertainty about the state of the object as time passes by.To address this potential issue, we take the credibility of detection α d (•) into account and focus on the credibility for an observation z to be the next observation of the object after z.To fit into the considered context, we introduce a lower bound a nd for the credibility of nondetection, i.e. a nd is such that α nd (x) ≥ a nd for any x ∈ X.It then follows that the possibility for z to be the next observation after z is fk , defined for any l > 0. The function fk,k can be easily extended to Z ∪ {φ} by defining fk,k ( Example 1.To illustrate the use of the notion fk,k of consistency, a simple scenario consisting of 6 objects is considered as in Figure 2a.For each observation z ∈ Z k at some time step k ∈ {1, . . ., K} we compute a marginal credibility for z as (8) fk (z) = max The scalar fk (z) can be interpreted as the credibility for z to be followed by another observation in Z k for some k > k.When creating a new track, we can then define the probability for selecting z as the first observation of the new track as a function of a fk (z).A scatter plot displaying these credibilities for all observations is shown in Figure 2b.
The advantage of relating observations in this way is that it can be easily extended to paths.Indeed, we can define the consistency between an observation z ∈ Z k at some given time k with a path o in O K as (9) fk (z, o) = max max where the initial observation is either z or one of the observations in o.Similarly, the consistency between two paths o and o in O K is defined as We can now propose to modify a given data association by changing nearby paths, and therefore focus the computational power on moves that are likely to be accepted.Although the approach considered here is not standard, it has two main advantages:  it relates observations together and applies to non-linear cases as long as a Gaussian upper-bounding function can be found.
In practice, it might be necessary to reduce the time required for computing fk ,k (z | z) between any pair (z, z ) of observations, especially if the scenario runs over many times steps or if the number of observations at every time step is large.In that case, one can define a threshold τ such that if a l nd < τ then any observations that are l time steps apart will be arbitrarily assigned a credibility of 0.
3.5.Design of the proposal distribution.When designing a proposal distribution Φ for our MCMC algorithm, several requirements need to be considered: it should be possible to i) reassign several paths simultaneously in order to address crossings as illustrated in Figure 1a and track fragmentation, ii) reassign both the initial observation of a path and the subsequent path as in Figure 1b, and iii) create a new path.
Requirement i) can be easily fulfilled by using the approach presented in Section 3.3 however, instead of simply choosing the paths at random, it is more efficient to focus on nearby paths.In order to simultaneously reassign the initial observations of a given set of paths A r as needed in Requirement ii), we consider the notion of consistency defined in (9).Once a new initial observation has been selected, the approach of Section 3.3 can be used to reassign the rest of the chosen path.Finally, the marginal consistency defined in (8) can be used for Requirement iii) in order to identify observations that are likely to be initial observations.The general objective is to find a proposal distribution Φ that is as simple as possible and such that the associated MCMC kernel is irreducible and reversible.Starting from a given set of paths A of size s = |A|, we suggest to proceed as follows: 1) Sample a number N r of paths to reassign from a probability mass function (p.m.f.) p r (• | s) such that N r ≤ s almost surely (a.s.), e.g. a truncated Poisson distribution.Then, sample the number N c of paths to be created from the p.m.f.p c (• | N r ) on the set of non-negative integers N defined as with pc a p.m.f. on {−1, 0, 1} to be defined.With this model, there will be one created path a.s.when none are reassigned (there is limited interest in creating several paths at once in this case) and the number of paths will be increased by one, kept constant or decreased by one in case of reassignment.Reducing the number of paths by one will address the issue of track fragmentation, keeping the number of paths constant is appropriate when considering objects with crossing trajectories, and leaving the possibility of increasing the number of paths is required to ensure reversibility.Indeed, when evaluating the probability of the reverse proposal, created paths will become reassigned paths and vice versa.
2) If N r = 0 then define A r = ∅ and proceed to the next step, otherwise, select the set A r = {o i } Nr i=1 of paths to be reassigned as follows: the first path o 1 is picked uniformly at random from the set of paths A then the N r − 1 remaining paths, if any, are selected based on their distance to o 1 : for any 1 < i ≤ N r , where P o1:i−1 (•) is a function transforming possibility functions into probability distributions, e.g. the maximum-entropy distribution upperbounded point-wise by f (o 1 , •), which we assume to verify o∈A Therefore, the set of paths A r is sampled without replacement from the set A. When evaluating the probability P r (A r | N r ) for sampling the subset A r of A, all possible ways of obtaining such a subset must be taken into account, that is where Sym(n) is the set of permutations of {1, . . ., n}.Although the computational complexity for this term is combinatorial, N r is usually small so the actual computational time is limited.
3) If N c = 0 then define Z c = ∅ and proceed to the next step, otherwise, select the N c initial observations k defined for any k ∈ {1, . . ., K} as where ẑj c stands for the pair (z j c , k j c ) for any j ∈ {1, . . ., N c }, and where the possibility function f Nr c is defined as the marginal consistency (8) if N r = 0 and as the consistency (9) with the future observation in the paths in A r otherwise.Indeed, when reassigning N r > 0 paths, it is more efficient to propose new paths in the same area rather than initialising paths in random locations, especially during the burn-in period of the MCMC when observations in different places are likely to originate from objects.The probability of proposing the subset Z c of observations takes a similar form as for path reassignment and can be expressed as The comment regarding computational complexity made about P r (• | N r ) applies equally here.4) Apply the approximate multi-object filter of Section 3.3 to the set of initial observations Z c and with the sets of available observations Z − 1 , . . ., Z − K and denote A c the generated set of paths.If A c ∩A r = ∅ then we reject the proposal, otherwise, the proposed set of paths is A = (A\A r )∪A c .The reason for rejecting the proposal when A c ∩ A r = ∅ is to ensure that A c and A r can be recovered from A and A as If the proposal has not been already rejected during its construction, the probability Φ(A | A) to go from the previous set of paths A to the new set of paths A is computed as The probability α(A, A ) of accepting the proposed set of paths A can then be computed using (4).

MCMC on the set of tracks
We now want to design a MCMC algorithm that targets the possibility function Π as introduced in (2).In this case, the Metropolis-Hastings acceptance ratio is (10) α t (T, T ) = min 1, with A and A the set of paths in T and T respectively.We therefore have to propose a time of appearance and a last time of existence for each path in A.
These time steps will sampled independently from their previous values in T .

4.2.
Evaluating the marginal likelihood.So far, the proposed approach does not assume a specific model for the dynamics and for the observation process.Indeed, although the likelihood k (• | x) is assumed to take the form of a Gaussian possibility function, the function h relating states to observations is general.We will however distinguish two different cases for the evaluation of the marginal likelihood: the linear-Gaussian case in which Kalman filtering can be used and the non-linear case where sequential Monte Carlo techniques are a natural alternative.
4.2.2.Non-linear case.If either the objects' dynamics or the observation function is not linear, then there is no analytical form for the filtering distributions at different time steps in general.Sequential Monte Carlo (SMC) methods are an alternative to the Kalman filter in this case.An analogue [17] of the bootstrap particle filter [9] can be used, see also [28,29].In particular, for a given path o ∈ O k−1 , we denote by {(w o k−1,i , x o k−1,i )} N i=1 the indexed family of weighted particles approximating the predicted possibility function for any real-valued function ϕ on X, with the uncertain variable x k being described by ) for any i ∈ {1, . . ., N }.In this situation, the marginal likelihood at time step k can be approximated by ˆ

Simulations
In all the cases to be considered, K = 50 and X = R 4 .States at time step k are of the form x k = (x k , y k , ẋk , ẏk ) , where x k and y k are the coordinates of the position in the 2-dimensional Euclidean space and where ẋk and ẏk are the coordinates of the velocity.The duration of one time step is denoted ∆ and the motion model is assumed to be of the form where σ a is the standard deviation of the zero-mean random acceleration, which is considered as a noise term.This model is referred to as the nearly-constant velocity model.We will consider in particular the case where ∆ = 1 and σ a = 0.05.For the sake of simplicity, the observation model is assumed to be linear; the position (x k , y k ) of an object is observed directly, which leads to h(x k ) = Hx k with H = 1 0 0 0 0 1 0 0 .
The variance R is of the form σ 2 I 2 with σ > 0 and I 2 the identity matrix of dimension 2. This model is useful when tracking directly in the coordinate systems defined by a sensor such as the image plane of a camera.Other situations where this model arises are when multiple sensors provide complex observations which can be combined into a single observation before being used in a tracking algorithm such as with GPS or with multiple-input multiple-outputs sensor systems [2,10,26].We will consider in particular the case where σ = 0.The same approach is used with the probability of survival.The possibility function α k,fa is assumed to be constant and equal to 10 −2 for all scenarios; this is in spite of the fact that the number of false alarms will vary significantly across the considered settings.The reason for this is that α n k,fa is seen as an upper bound for the probability of having n false alarms.A similar approach is used for appearing objects with f k,+ (n) = α n + with α + = 10 −4 .The other model parameters such as σ and σ a are assumed to be known.
The proposed approach is compared to the MCMC for Data Association (MCMC-DA) method introduced in [25].In order to make the two methods comparable, the possibility function Π is used to evaluate the log-likelihood of the proposed sets of tracks.However, as opposed to the proposed approach, MCMC-DA is provided with the true parameters of the model in the design of the corresponding proposal distribution.5.2.Choice of parameter.We assume that the current sample from Π is T ∈ T and denote by A = κ(T ) the corresponding set of paths.We then comment on the choice of parameters for the different steps in the proposal mechanism.
The number N r of tracks to reassign is chosen from a Poisson distribution with parameter λ r = 1, truncated to the interval {0, . . ., |A|}.The parameter λ r can be adjusted depending on the considered scenario: if objects are expected to be very close to each other and to frequently have crossing trajectories, then λ r could be increased to raise the average number of tracks that are reassigned at once.Large reassignments are however less likely to be accepted so that a trade-off between exploration and mixing must be found, as is usual with MCMC.
The distribution pc (• | N r ) on {−1, 0, 1} drives the increase or decrease of the number of tracks in the proposal step.Since one of the main issues with the MCMC approach for data association is track fragmentation, i.e. the representation of a single object by a series of shorter tracks, it is generally helpful to focus on reducing the number of tracks.We therefore consider the following parametrisation: 5.3.MCMC on the data association set.The choice of parameter as well as the performance of the proposed approach are assessed on different scenarios.We first consider a simple scenario, as shown in Figure 3a, with 10 false alarms and 0.1 appearing objects per time step on average and with a probability of detection of p d = 0.9.The simplicity of the scenario is illustrated in Figure 3b where it appears that most of the false alarms are far from any other observation and, conversely, object-originated observations are close to each other.The performance of the two considered approaches is first assessed on a single run in Figure 3c where the evolution of the log-likelihood is displayed as a function of the computational time."HISP" refers to the proposed approach whereas "DA" refers to the MCMC-DA.The difference in behaviour between the proposed approach and MCMC-DA is due to the use of the simulated annealing in the former.Both methods provide satisfactory results in this case and the MCMC-DA's chain mixes well.Figure 3d, which displays the performance averaged over 50 repeats, shows that setting the parameter c in the inverse temperature ρ t to 0.001 provides the best performance throughout the duration of the runs.5.3.2.Scenario with high false-alarm rate.We consider a first type of challenging scenario, depicted in Figure 4a, with the following challenging characteristics: there are 100 false alarms and 0.5 appearing objects per time step on average and the probability of detection p d is equal to 0.8.In this case, it is the large number of false alarms that make the estimation difficult due to the fact that they are likely to form coherent observation sequences over 2 to 3 time steps.This aspect is illustrated in Figure 4b where many false alarms can be seen to be near other observations.Figure 4c considers different choices for the Poisson parameter λ r with the log-likelihood being once again averaged over 50 repeats.The choice λ r = 1 allows for rapidly creating tracks while proposing the simultaneous reassignment of 2 tracks often enough to prevent track fragmentation, whereas setting λ r to 0.5 or 1.5 does not perform as well.Finally, a few options are compared in Figure 4d for the distribution pc , with the log-likelihood being averaged over 50 repeats.The assessed options are pc ((−1, 0, +1) (1/3, 1/3, 1/3) as "uniform" (1/2, 1/4, 1/4) as "focus on −1" (1/4, 1/2, 1/4) as "focus on 0" (1/4, 1/4, 1/2) as "focus on +1", where pc ((δ 1 , δ 2 , δ 3 ) | N r ) = (p 1 , p 2 , p 3 ) is a shorthand notation for pc (δ i | N r ) = p i for i ∈ {1, 2, 3}.The results in Figure 4d show that focusing on δ = −1 yields a slightly better performance, followed by focusing on δ = 0. Once again, this can be attributed to the reduction in track fragmentation.The influence of the parameter c is considered once more in Figure 4e where it appears that c = 0.0005 gives the best long-run performance.However, c = 0.001 still provides good performance throughout the run time and is considered for the other simulations.Figure 4f compares the performance of the propose approach with MCMC-DA and shows that the latter does not mix as well as in the first scenario and fails to identify most of the tracks.The fact that the proposed approach does not reach the true loglikelihood can be attributed to local maxima in the posterior possibility function Π as well as to identifiability issues.The trace plots are shown for 1 repeat as well as for 50 repeats in order to show that the low average performance of the MCMC-DA is not due to averaging.5.3.3.Scenario with low probability of detection.To further assess the performance of the considered approach, we consider another challenging scenario, as shown in Figure 5a, with the following characteristics: there are 25 false alarms and 0.5 appearing objects per time step on average and the probability of detection p d is equal to 0.5.The difficulty of this scenario is illustrated in Figure 5b where it appears that the inter-observation distance is not sufficient to clearly identify the objects; in particular, the observations belonging to the object at the bottom right barely appear in Figure 5b, emphasising the fact that a probability of detection of 0.5 is not sufficient to guarantee the spatio-temporal consistency between observations.Figure 5c shows that the proposed approach can capture most of the structure of the scenario whereas the MCMC-DA did not identify the majority of tracks in the allocated time.

2. 3 .
Target possibility function.We now introduce the posterior possibility function on the set T describing the unknown set of tracks, based on the model detailed in Section 2.2.For this purpose, we consider a track t = (o, m, n) and start by defining the credibility π(o, n | m) of the pair (o, n) given the time of appearance m ∈ {1, . . ., K} as Reassignment of two paths when trajectories cross.
Track fragmentation and change of observation.

Figure 1 .
Figure 1.Examples of reassignment where black dots represent observations and different line styles represent different possible data associations.Thin grey lines and red numbers show the time steps to which different observations belong.

k
αns (o) ∨ αs (o) α ns ∨ α nd otherwise the marginal likelihood for the observation z and with Γ k (Z | O) the credibility for paths in the set O ⊆ O k−1 to be associated with observations in the set Z ⊆ Z − k , which can be expressed as Γ k (Z | O) = max σ:O→Z ∪{φ} f fa (Z \ σ(O)) o∈O L k (σ(o) | o), where the maximum is over all mappings σ from O to Z ∪ {φ} that are injective on Z and where σ(O) is the image of O by σ, i.e. σ(O) = {σ(o) : o ∈ O}.
True trajectories as coloured lines, initial/final positions as circles/squares and observations as black dots.Scatter plot of the observations with a size proportional to the corresponding credibilities of selection.

Figure 2 .
Figure 2. Scenario with 6 objects as represented in (A) with the corresponding scatter plot of the probability for ach observation to be selected in (B).

1 c
of the initial observations is performed without replacement as ẑi c ∼ P (ẑ

4. 1 .
Proposing the interval of existence.The objective in this section is to propose a time of appearance m and a last time of existence n for a given path o ∈ O K , using the different quantities introduced in Section 3.3.We consider a path o ∈ O K of the form o : φ.One can sample the lag corresponding to the last time of appearance according to the probability mass function p − (• | o) on {0, . . ., l o } defined as the maximum-entropy distribution bounded by l → α 1(l<lo) ns α l nd .The last time of existence is set to n = K − l o + L − .For the time of appearance m associated with a path o ∈ O K , we can simply sample a lag L + from the maximumentropy distribution p + (• | o) bounded by l → α l nd and set n = k + (o) − L + with k + (o) the time of the first observation in o.The probability distribution Ψ(• | A) is then associated with the proposal of a time of appearance and a last time of existence for each path in a given set A ∈ A, i.e.

4. 2 . 1 .
Linear-Gaussian case.If the Markov transition g k is of the formg k (• | x ) = N(F k x , Q k ) for some d X × d X matricesF k and Q k and for any k ∈ {1, . . ., K} and if the observation function h k is of the form h k (x) = H k x then the posterior distribution of the state at any time step can be computed analytically via the Kalman filter.In particular, for a given path o ∈ O k−1 , we denote by m o k and Σ o k the mean and variance of the state at time k ∈ {1, . . ., K} given the observations in the path o.The only difference with the standard Kalman filtering equation is the marginal likelihood which, due to the form of the likelihood, is expressed at time step k as

5 . 1 .
3 and Y = [−60, 60] × [−60, 60].Parametrisation of the proposed algorithm.If the probability of detection is p d then the possibility of detection failure is set to α nd = 1 − p d and the possibility of detection α d is set to 1.
Trajectories (with circles indicating initial position) and observations (black dots).
Distance from one given observation to the nearest observation.
Comparison between different methods.
Trace plot for different values of c.