Probabilistic logic programming for hybrid relational domains
Abstract
We introduce a probabilistic language and an efficient inference algorithm based on distributional clauses for static and dynamic inference in hybrid relational domains. Static inference is based on sampling, where the samples represent (partial) worlds (with discrete and continuous variables). Furthermore, we use backward reasoning to determine which facts should be included in the partial worlds. For filtering in dynamic models we combine the static inference algorithm with particle filters and guarantee that the previous partial samples can be safely forgotten, a condition that does not hold in most logical filtering frameworks. Experiments show that the proposed framework can outperform classic sampling methods for static and dynamic inference and that it is promising for robotics and vision applications. In addition, it provides the correct results in domains in which most probabilistic programming languages fail.
Keywords
Probabilistic programming · Statistical relational learning · Discrete and continuous distributions · Particle filter · Likelihood weighting · Logic programming
1 Introduction
Robotics research has made important achievements in problems such as state estimation, planning, and learning. However, the majority of probabilistic models used, such as Bayesian networks, cannot easily represent relational information, that is, objects, properties as well as the relations that hold between them. Relational representations allow one to encode more general models, to integrate background knowledge about the world, and to convert low-level information into human-readable form. Probabilistic programming languages (De Raedt et al. 2008) and statistical relational learning techniques (SRL) (Getoor and Taskar 2007) provide such relational representations and have been successful in many application areas ranging from natural language processing to bioinformatics.
This paper extends probabilistic logic programming techniques to deal with hybrid relational domains, involving both discrete and continuous random variables in two settings. The first setting is a static one, where we contribute a new inference algorithm for Distributional Clauses (DC) (Gutmann et al. 2011), a recent extension of Sato’s distribution semantics (Sato 1995) for dealing with continuous variables. The second setting is a dynamic one, where we extend the DC framework for coping with time. For the resulting Dynamic Distributional Clauses (DDC), we develop a particle filter (DCPF) that exploits the static inference algorithm for filtering. Particle filters (Doucet et al. 2000) are widely applied in domains such as probabilistic robotics (Thrun et al. 2005), and we adapt them here for use in hybrid relational domains, in which each state of the environment is represented as an interpretation, that is, a set of ground facts that defines a possible world. The statistical relational learning literature already contains several approaches to temporal models and to particle filters; see Sect. 8 for a detailed discussion. However, few frameworks are suited for tracking or other robotics applications: they are too slow for online applications or they only support discrete domains.
The distinguishing features of the proposed framework are that: (1) it provides an inference method that extends the applicability of likelihood weighting (LW) and works with zero-probability evidence; (2) it exploits partial worlds as samples allowing for a potentially infinite state space; (3) it employs a relational representation to represent (context-specific) independence assumptions, and exploits these to speed up inference; (4) it is suited for tracking and robotics applications; and (5) it has a bounded space complexity for filtering by avoiding inference backward in time (back-instantiation).
The contributions of this paper are that (1) we propose an improved sampling algorithm for static inference; (2) we extend DC for dynamic domains (DDC); (3) we introduce an optimized particle filter for DDC; (4) we prove the theoretical correctness for DCPF and study its relation with Rao–Blackwellized particle filters; (5) we integrate online learning in the DCPF; and (6) we adopt the DCPF in some tracking scenarios. This paper is based on previous papers (Nitti et al. 2013, 2014c) (plus a position paper Nitti et al. 2014a and a summary paper Nitti et al. 2014b), but is extended with contributions (1) and (4), and additional experiments.
This paper is organized as follows: we first review DC (Sect. 2), introduce some extensions and present the static inference procedure (Sect. 3). We then introduce the Dynamic DC, the particle filters (Sect. 4), and propose the DCPF (Sect. 5). We then integrate learning in DCPF (Sect. 6), present some experiments (Sect. 7), discuss related work (Sect. 8), and conclude (Sect. 9).
2 Distributional clauses
We now introduce the distributional clauses (DC) (Gutmann et al. 2011), an extension of distribution semantics (Sato 1995), while assuming some familiarity with statistical relational learning and logic programming (De Raedt et al. 2008) (see “Logic programming” in the Appendix for a brief overview).
Formally, a distributional clause is a formula of the form \(\mathtt{{h}}\sim {\mathcal {D}} \leftarrow \mathtt{{b_1,\dots ,b_n}}\), where the \(\mathtt{{b_i}}\) are literals and \(\sim \) is a binary predicate written in infix notation.
The intended meaning of a distributional clause is that each ground instance of the clause \((\mathtt{{h}}\sim {\mathcal {D}} \leftarrow \mathtt{{b_1,\ldots ,b_n}})\theta \) defines the random variable \({\mathtt {h}}\theta \) with distribution \({\mathcal {D}}\theta \) whenever all the \(\mathtt{{b_i}} \theta \) hold, where \(\theta \) is a substitution. In other words, a distributional clause can be seen as a powerful template to define conditional probabilities: \(p(\mathtt{{h}}\theta \mid (\mathtt{{b_1,\ldots ,b_n}})\theta )={\mathcal {D}}\theta \). The term \({\mathcal {D}}\) can be non-ground, i.e., values, probabilities, or distribution parameters can be related to conditions in the body. Furthermore, given a random variable r, the term \(\simeq \!\!(r)\) constructed from the reserved functor \(\simeq \!\!/1\) represents the value of r. Abusing notation, for brevity, we shall sometimes write \(r {\sim }=v\) instead of \(\simeq \!\!(r)=v\), which is true iff the value of the random variable \(r\) unifies with v.
Example 1
DCs support continuous distributions (under reasonable conditions) and naturally cope with an unknown number of objects (Gutmann et al. 2011). In addition to distributional clauses, we shall also employ deterministic clauses as in Prolog. We shall often talk about clauses when the context is clear.
A distributional program \(\varvec{\mathbb {P}}\) is a set of distributional and/or deterministic clauses that defines a distribution p(x) over possible worlds x. The probability \(p(q)\) of a query \(q\) can be estimated using Monte Carlo methods, that is, possible worlds are sampled from p(x), and p(q) is approximated as the ratio of samples in which the query q is true.
The procedure used to generate possible worlds defines the semantics and a basic inference algorithm. A possible world is generated starting from the empty partial world \(x=\emptyset \); then for each distributional clause \(\mathtt{{h}} \sim {\mathcal {D}} \leftarrow \mathtt{{b_1, \dots , b_n}}\), whenever the body \(\{\mathtt{{b_1}}\theta , \dots , \mathtt{{b_n}}\theta \}\) is true in the set x for the substitution \(\theta \), a value v for the random variable \(\mathtt{{h}}\theta \) is sampled from the distribution \({\mathcal {D}}\theta \) and \(\mathtt{{h}}\theta =v\) is added to the new partial world \(\hat{x}\). This is also performed for deterministic clauses, adding ground atoms to \(\hat{x}\) whenever the body is true. This process is then recursively repeated until a fixpoint is reached, that is, until no more variables can be sampled and added to the world. The final world is called complete or full, while the intermediate worlds are called partial. The process is based on sampling, thus ‘world’ is often replaced with sample or particle. Notice that a possible world may contain a countably infinite number of random variables (and atoms).
Example 2
Given the DC program \(\varvec{\mathbb {P}}\) defined by (1), (2), and (3), \(\mathtt{\{n=2, pos(1)=3.1, pos(2)=4.5, left(1,2)\}}\) is a possible (complete) world. This world is sampled in the following order: \(\emptyset \rightarrow \mathtt{\{n=2\}} \rightarrow \mathtt{\{n=2, pos(1)=3.1, pos(2)=4.5\}} \rightarrow \mathtt{\{n=2, pos(1)=3.1, pos(2)=4.5, left(1,2)\}}\).
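The generative process and the resulting Monte Carlo estimate can be sketched in Python. This is only an illustration: the three clauses encoded below are hypothetical stand-ins chosen to be consistent with the world of Example 2, and the distributions as well as the names `step`, `sample_world`, and `estimate` are assumptions, not part of the DC system.

```python
import random

def step(world, atoms, rng):
    """Apply every clause once; bodies are evaluated in the OLD partial world."""
    new_world, new_atoms = dict(world), set(atoms)
    # Assumed clause: n ~ uniform([1, 2, 3]).
    if "n" not in world:
        new_world["n"] = rng.choice([1, 2, 3])
    n = world.get("n", 0)
    # Assumed clause: pos(P) ~ uniform(0, 10) <- n ~= N, between(1, N, P).
    for p in range(1, n + 1):
        if ("pos", p) not in world:
            new_world[("pos", p)] = rng.uniform(0.0, 10.0)
    # Assumed clause: left(A, B) <- pos(A) ~= XA, pos(B) ~= XB, XA < XB.
    for a in range(1, n + 1):
        for b in range(1, n + 1):
            if ("pos", a) in world and ("pos", b) in world:
                if world[("pos", a)] < world[("pos", b)]:
                    new_atoms.add(("left", a, b))
    return new_world, new_atoms

def sample_world(rng):
    """Start from the empty partial world and iterate until a fixpoint."""
    world, atoms = {}, set()
    while True:
        new_world, new_atoms = step(world, atoms, rng)
        if new_world == world and new_atoms == atoms:
            return world, atoms
        world, atoms = new_world, new_atoms

def estimate(query, num_samples, seed=0):
    """Approximate p(q) as the fraction of sampled worlds in which q holds."""
    rng = random.Random(seed)
    hits = sum(query(*sample_world(rng)) for _ in range(num_samples))
    return hits / num_samples
```

For instance, `estimate(lambda world, atoms: ("left", 1, 2) in atoms, 20000)` approximates the probability that object 1 is to the left of object 2 under these assumed clauses. The sketch samples complete worlds, which is only feasible because this toy program has finitely many random variables.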
Gutmann et al. (2011) formally describe this generative process using the \(ST_P\) operator. To define a proper probability distribution \(p(x)\), \(\varvec{\mathbb {P}}\) needs to satisfy the validity conditions described in Gutmann et al. (2011). See “Distributional clauses” in the Appendix for details on the validity conditions and the \(ST_P\) operator. Throughout the paper the partial worlds will be written as \(x^{P} \), while \(x \supseteq x^{P} \) indicates a complete world consistent with \(x^{P} \), and \(x^a=x\setminus x^{P}\) represents the remaining part of the world.
We extended DC to allow for negated literals in the body of distributional clauses. To accommodate negation, we need to consider the case that a random variable is not defined in a full world x. Any comparison involving a non-defined variable will fail; therefore, its negation will succeed. In contrast, ground atoms \(a\) are considered false in \(x\) if \(a\notin x\) (standard closed-world assumption). For further details on negation in DC see “Negation” in the Appendix.
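The treatment of undefined variables can be illustrated with a few lines of Python. The dictionary encoding of a world and the helper names `compare` and `atom_holds` are assumptions made for this sketch only.

```python
import operator

def compare(world, var, op, value):
    """A comparison on a random variable fails when the variable is undefined."""
    if var not in world:
        return False
    return op(world[var], value)

def atom_holds(atoms, atom):
    """Closed-world assumption: a ground atom is false unless it is in the world."""
    return atom in atoms

# A world with a single object: pos(2) is not defined.
world = {"n": 1, "pos(1)": 3.1}
pos2_gt_3 = compare(world, "pos(2)", operator.gt, 3.0)   # fails
pos2_le_3 = compare(world, "pos(2)", operator.le, 3.0)   # also fails
negated = not pos2_gt_3                                   # the negation succeeds
```

Note that both `pos(2) > 3` and `pos(2) <= 3` fail in this world, so the negation of either comparison succeeds.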
3 Static inference for distributional clauses
Sampling full worlds is generally inefficient or may not even terminate as possible worlds can be infinitely large. Therefore, Gutmann et al. (2011) use magic sets (Bancilhon and Ramakrishnan 1986) to generate only those facts that are relevant for answering the query. Magic sets are a well-known logic programming technique for forward reasoning. In this paper, we propose a more efficient sampling algorithm based on backward reasoning and likelihood weighting.
3.1 Importance sampling
Given a program \(\varvec{\mathbb {P}}\), the probability of the query q is estimated by applying importance sampling to partial samples \(x^{P(i)}\), with \(i=1,\dots ,N\) where N is the number of samples. In importance sampling the proposal probability \(g(x)\) used to generate samples is not necessarily the target probability p(x).
There are two reasons to sample partial worlds instead of complete ones. First, the sampling process is faster and terminates (under some conditions) even when the complete world is a countably infinite set. Second, the estimator variance is generally lower with respect to a naive Monte Carlo estimator that samples complete worlds. Indeed, sampling some variables \(x^{P(i)}\) and computing \(p(q \mid x^{P(i)})\) analytically is an instance of the conditional Monte Carlo method (Lemieux 2009) (sometimes called Rao–Blackwellization), which has better performance with respect to naive Monte Carlo.
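The variance reduction obtained with conditional Monte Carlo can be checked on a toy model. The two-variable model below (a binary variable `a` and a query variable `b` that depends on it) and all probabilities are assumptions chosen for illustration; they do not come from the paper.

```python
import random
from statistics import mean, pvariance

P_A = 0.3                                  # assumed p(a = true)
P_B_GIVEN_A = {True: 0.9, False: 0.2}      # assumed p(b = true | a)

def naive_terms(num_samples, rng):
    """Naive Monte Carlo: sample the complete world (a, b), use the indicator of b."""
    terms = []
    for _ in range(num_samples):
        a = rng.random() < P_A
        b = rng.random() < P_B_GIVEN_A[a]
        terms.append(1.0 if b else 0.0)
    return terms

def conditional_terms(num_samples, rng):
    """Conditional Monte Carlo: sample only a, compute p(b | a) analytically."""
    terms = []
    for _ in range(num_samples):
        a = rng.random() < P_A
        terms.append(P_B_GIVEN_A[a])
    return terms

rng = random.Random(1)
naive = naive_terms(20000, rng)
conditional = conditional_terms(20000, rng)
print(mean(naive), pvariance(naive))
print(mean(conditional), pvariance(conditional))
```

Both estimators target the same quantity, p(b) = 0.3 · 0.9 + 0.7 · 0.2 = 0.41, but the per-sample variance of the conditional estimator is lower because the randomness of `b` has been integrated out.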
The split of x has to guarantee that the probability \(p(q \mid x^{P(i)})\) is analytically computable. Let var(q) denote the set of all random variables in q. If \(var(q)\subseteq var( x^{P(i)})\), all the variables in q are instantiated in \(x^{P(i)}\), thus q can be evaluated deterministically, that is, \(p(q \mid x^{P(i)})\in \{0,1\}\). In some cases it is possible to compute \(p(q \mid x^{P(i)})\) without sampling variables in q.
Example 3
In general, it is impossible to sample only var(q) as the DC program does not directly define the distribution of these variables. For example, to sample \(\mathtt{{pos(1)}}\) defined by (2) we first need to sample \(\mathtt{{n}}\) defined by (1); thus we need to follow the generative sampling process (\(ST_P\) operator) until the variables of interest are sampled. Backward reasoning (or the magic set transformation) can help to focus the sampling.
3.2 Sampling partial possible worlds
We now present our approach to sampling possible worlds and computing p(q) following Equation (4) and \(p(q \mid e)\) following (5). Central is the algorithm with signature
EvalSampleQuery(q : query, \(x^{P(i)}\) : partial world) returns \((w^{(i)}_q,x_q^{P(i)})\)
such that
 1. \(x_q^{P(i)}\supseteq x^{P(i)}\), i.e., \(x_q^{P(i)}\) is an expansion of \(x^{P(i)}\) obtained using \(ST_P\),
 2. \(var(q)\subseteq var(x_q^{P(i)})\), which ensures that we can evaluate q in \(x^{P(i)}_q\) and therefore \(p(q \mid x^{P(i)}_q)\),
 3. \(w_{q}^{(i)}=p(q \mid x_q^{P(i)})\frac{p(x_{q}^{P(i)} \mid x^{P(i)})}{g(x_{q}^{P(i)} \mid x^{P(i)})}.\)
The key question is thus how to sample one such partial world \(x^{P(i)}_q\) for a generic call of EvalSampleQuery\((q, x^{P(i)})\). To realize this, we combine likelihood weighting (LW) (Fung and Chang 1989; Koller and Friedman 2009) with a variant of SLD-resolution in the EvalSampleQuery algorithm that we describe below.
SLD-resolution (Apt 1997; Lloyd 1987; Nilsson and Małuszyński 1995) is an inference procedure to prove a query q, used in logic programming, that focuses the proof on the relevant part of the program \(\varvec{\mathbb {P}}\). The basic idea is replacing an atom with its definition. Given \(q=(q_1,q_2,\dots ,q_n)\) and a rule \({head\leftarrow body} \in \varvec{\mathbb {P}}\) such that \(\theta =mgu(q_1,head)\) (i.e. \(q_1\theta =head\theta \)), the new goal becomes \(q'=(body,q_2,\dots ,q_n)\theta \), where mgu indicates the most general unifier. If the empty goal is reached then the query is proved. If it is impossible to reach the empty goal, the query is assumed false under the closed-world assumption. There may be more than one rule that satisfies the mentioned conditions, resulting in the SLD-tree. In Prolog the tree is traversed using depth-first search with backtracking.
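The goal-rewriting idea can be sketched for the propositional case, where atoms are ground and unification reduces to equality. This is a deliberate simplification (no variables, no mgu), and the function name `prove` and the toy program are hypothetical.

```python
def prove(goal, program):
    """Depth-first SLD-style proof search with backtracking (ground atoms only).

    goal    -- tuple of atoms still to be proved
    program -- list of (head, body) pairs, where body is a tuple of atoms
    """
    if not goal:
        return True                      # empty goal reached: query proved
    first, rest = goal[0], goal[1:]
    for head, body in program:           # try the clauses in program order
        if head == first:
            # Replace the selected atom by the body of the matching clause.
            if prove(body + rest, program):
                return True
            # Otherwise backtrack and try the next matching clause.
    return False                         # no proof: false under closed world

# Hypothetical toy program: wet <- rain.  wet <- sprinkler.  sprinkler.
program = [("wet", ("rain",)), ("wet", ("sprinkler",)), ("sprinkler", ())]
print(prove(("wet",), program))
print(prove(("rain",), program))
```

In this toy program the first clause for `wet` fails because `rain` cannot be proved, so the search backtracks to the second clause. Like Prolog's depth-first strategy, this sketch may not terminate on recursive programs.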
Likelihood weighting is a type of importance sampling that forces variables to be consistent with the evidence by using an adapted proposal distribution \(g\). It has been shown (Fung and Chang 1989) that LW reduces the variance of the estimator with respect to the naive Monte Carlo estimator. In this paper LW is also used to force variables to be consistent with the query. LW has connections with conditional Monte Carlo (for binary events). Indeed, computing \(p(r=v \mid x^{P(i)})\) is equivalent to imposing \(r=v\) and weighting the sample with \(w=p(r=v \mid x^{P(i)})\).
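A minimal likelihood weighting sketch on a two-variable toy model may help fix ideas. The model (`a` influences `b`, evidence `b = true`, query `a = true`), its probabilities, and the function name `lw_posterior` are assumptions for illustration only.

```python
import random

P_A = 0.3                                  # assumed p(a = true)
P_B_GIVEN_A = {True: 0.9, False: 0.2}      # assumed p(b = true | a)

def lw_posterior(num_samples, seed=0):
    """Estimate p(a = true | b = true) with likelihood weighting."""
    rng = random.Random(seed)
    weighted_hits = 0.0
    weight_sum = 0.0
    for _ in range(num_samples):
        a = rng.random() < P_A             # non-evidence variable: sampled from the prior
        w = P_B_GIVEN_A[a]                 # evidence b = true is imposed, not sampled;
                                           # the sample is weighted by its likelihood
        weight_sum += w
        if a:
            weighted_hits += w
    return weighted_hits / weight_sum

print(lw_posterior(20000))
```

Under these assumed numbers the exact posterior is 0.27 / 0.41, and no sample is rejected, in contrast to naive sampling followed by discarding the worlds that are inconsistent with the evidence.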
 1. the weight \(w^{(i)}_{q}\), initialized to 1,
 2. the initial query \(iq\), initialized to q, and
 3. the partial sample \(x^{P(i)}\).
 1. \(G'\) is the new goal obtained from \(G\) using a kind of SLD-resolution step;
 2. if a new variable \(r\) is sampled with value v,
    set \(w^{(i)}_{q}\leftarrow w^{(i)}_{q}\frac{p(r=v \mid x^{P(i)})}{g(r=v \mid x^{P(i)})}\) (based on LW) and
    \( x^{P(i)}\leftarrow x^{P(i)}\cup \{r=v\}\).
    In addition, if \(r{\sim }=Val \in iq\) and \((r{\sim }=v,iq)\Leftrightarrow iq\theta \) with \(r\) grounded and \(\theta =\{Val=v\}\) then:
    \(iq\leftarrow iq\theta \)
 3. if a new atom \(h\) is proved, set \( x^{P(i)}\leftarrow x^{P(i)}\cup \{h\}\).
The algorithm needs the original query to apply LW. Since the inference rules change the current goal to prove, we distinguish the current goal \(G\) from the original query \(iq=q\); during sampling iq can be simplified (e.g., applying substitutions) as long as \(\varvec{\mathbb {P}}\models (x^{P(i)},iq)\Leftrightarrow (x^{P(i)},q)\), i.e. iq is logically equivalent to q given \(x^{P(i)}\) and the DC program \(\varvec{\mathbb {P}}\). For those reasons \(x^{P(i)}, w^{(i)}_{q},\) and \(iq\) are global variables.
 1a. \({\text {If }} [\exists h\in x^{P(i)}:\theta =mgu(q_{1},h)]\) OR \([builtin(q_{1}),\exists \theta : q_{1}\theta ]\) then:
$$\begin{aligned} (q_1,q_2,\dots ,q_n) \vdash (q_2,\dots ,q_n)\theta \end{aligned}$$
i.e., if \(q_{1}\theta \) is true in \(x^{P(i)}\) for a substitution \(\theta \), remove \(q_{1}\) from the current goal and apply the substitution \(\theta \) to the current goal. \(q_{1}\) can also be a built-in predicate such as \(1<4\) that is trivially proved.
 1b. \({\text {If }} \exists \theta {\text { s.t. }} h\leftarrow body \in \varvec{\mathbb {P}}, \theta =mgu(q_{1},h) \text { then:}\)
$$\begin{aligned} (q_1,q_2,\dots ,q_n)\vdash (body,add(h),q_2,\dots ,q_n)\theta \end{aligned}$$
i.e., if \(q_1\) unifies with the head of a deterministic clause, then add the body of the clause and \(add(h)\) to the current goal, and apply substitution \(\theta \). The special predicate \(add(h)\) indicates that \(h\) must be added to \(x^{P(i)}\) after the body has been proven.
 2a. If \(\exists \theta \) s.t. \(h=v\in x^{P(i)},\theta =mgu(q_{1}, h{\sim }=v)\):
$$\begin{aligned} (q_1,q_2,\dots ,q_n)\vdash (q_2,\dots ,q_n)\theta \end{aligned}$$
i.e., if \(q_1\theta \) compares a sampled random variable \(h\) to a value and \(q_1\theta \) is true in \(x^{P(i)}\), then remove \(q_{1}\) from the current goal and apply substitution \(\theta \).
 2b. If \(\exists \theta \) s.t. \(h\sim {\mathcal {D}} \leftarrow body \in \varvec{\mathbb {P}},\theta =mgu(q_{1},h{\sim }=Val),h\notin var(x^{P(i)})\):
$$\begin{aligned} (q_1,q_2,\dots ,q_n) \vdash (body,sample(h, {\mathcal {D}}),q_1,q_2,\dots ,q_n)\theta \end{aligned}$$
i.e., if \(q_1\) compares a (not yet sampled) variable \(h\) that unifies with the head of a DC clause, then add the body of the clause and \(sample(h, {\mathcal {D}})\) to the current goal and apply the substitution \(\theta \); \(sample(h, {\mathcal {D}})\) is a special predicate that indicates that we need to sample h from \({\mathcal {D}}\) and add \(h=val\) to \(x^{P(i)}\) after the body has been proven.
 3a. If \((h{\sim }=v)\in iq, ground(h{\sim }=v), h\notin var(x^{P(i)}),[(h\ne v,x^{P(i)})\models \lnot iq ]\):
$$\begin{aligned} (sample(h, {\mathcal {D}}),q_2,\dots ,q_n)&\vdash \ (q_2,\dots ,q_n)\\ w^{(i)}_{q}&\leftarrow w^{(i)}_{q}\cdot likelihood_{{\mathcal {D}}}(h=v) \\ x^{P(i)}&\leftarrow x^{P(i)}\cup \{h=v\} \end{aligned}$$
i.e., if \(sample(h, {\mathcal {D}})\) is in the current goal, \(h{\sim }=v\) is ground in \(iq\), \(h\ne v\) makes iq false (always true if iq is a conjunction of literals), and \(h\) is not sampled in \(x^{P(i)}\), then add \(h=v\) to \(x^{P(i)}\), weight accordingly (LW), and remove \(sample(h, {\mathcal {D}})\) from the current goal.
 3b. If \(h\notin var(x^{P(i)}),ground(h)\), and rule 3a is not applicable:
$$\begin{aligned} (sample(h, {\mathcal {D}}),q_2,\dots ,q_n)&\vdash \ (q_2,\dots ,q_n) \\ x^{P(i)}&\leftarrow x^{P(i)}\cup \{h=v\}\\ {\text {if }} h{\sim }=Val\in iq,{((h{\sim }=v,iq)\Leftrightarrow iq\gamma )}{\text { then }} iq&\leftarrow iq\gamma \end{aligned}$$
with v sampled from \({\mathcal {D}}\), and \(\gamma =\{Val=v\}\). That is, if \(sample(h, {\mathcal {D}})\) is in the current goal, and rule 3a is not applicable, then sample h, add it to \(x^{P(i)}\), and remove \(sample(h, {\mathcal {D}})\) from the current goal. Finally, apply the substitution \(\gamma \) to \(iq\) iff \(iq\gamma \) is equivalent to \(iq\) with \(h{\sim }=v\) (always true if iq is a conjunction of literals).
 3c. If ground(h):
$$\begin{aligned} (add(h),q_2,\dots ,q_n)&\vdash (q_2,\dots ,q_n)\\ x^{P(i)}&\leftarrow x^{P(i)}\cup \{h\} \end{aligned}$$
i.e., if \(add(h)\) is in the current goal and \(h\) is ground, then add \(h\) to \(x^{P(i)}\) and remove \(add(h)\) from the current goal.
Theorem 1
For \(N \rightarrow \infty \) samples generated using EvalSampleQuery, the estimate \(\hat{p}(q)\) obtained using (4) converges with probability 1 to the correct probability p(q).
Proof
It is sufficient to prove that EvalSampleQuery satisfies the importance sampling requirement for which convergence guarantees are available (Robert and Casella 2004), that is \(\forall x: p(q \mid x)p(x)>0\Rightarrow g(x)>0 \) or equivalently: \(\forall x:g(x)=0\Rightarrow p(q \mid x)p(x)=0\). The algorithm samples random variables h using the target distribution when LW is not applied (rule 3b): \(g(h \mid x^{P(i)})=p(h \mid x^{P(i)})\). LW is applied with proposal \(g(h=val \mid x^{P(i)})=1\) for grounded equalities in the initial query \((h{\sim }=val) \in iq\) (rule 3a). Therefore, \(g(h\ne val,x^{P(i)})=0\) but also \(p(q \mid h\ne val,x^{P(i)})p(h\ne val,x^{P(i)})=0\), because the query q fails for \(h\ne val\). Indeed, \((h\ne val,x^{P(i)})\models \lnot iq\) as required in rule 3a, and it is easy to show that \(\varvec{\mathbb {P}}\models (x^{P(i)},iq)\Leftrightarrow (x^{P(i)},q)\), therefore \((\varvec{\mathbb {P}},h\ne val,x^{P(i)})\models \lnot q\). The requirement is thus satisfied (\(\forall x:g(x)=0\Rightarrow p(q \mid x)p(x)=0 \)). \(\square \)
Theorem 1 extends to conditional probabilities \(p(q \mid e)=p(q,e)/p(e)\), as long as \(p(e)>0\). The remainder of this section considers the case \(p(e)=0\); it can be safely skipped by the reader less interested in technical details.
To apply importance sampling to definition (6), it is sufficient to estimate \(P(e\in [v-dv/2,v+dv/2])\) and \(P(q,e\in [v-dv/2,v+dv/2])\) for \(dv\rightarrow 0\). Knowing that \(dr\rightarrow 0 \Rightarrow P(h\in [r-dr/2,r+dr/2])\rightarrow p(h=r)dr\), every time we apply LW to a continuous variable, \(h=r\) is intended as \(h\in [r-dr/2,r+dr/2]\) with \(dr\rightarrow 0\), thus the incremental weight in rule 3a is \(p(h=r)dr\).
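The limit used above can be checked numerically. The sketch below assumes a standard Gaussian variable purely for illustration; the helper names are hypothetical.

```python
import math

def normal_cdf(x, mean=0.0, std=1.0):
    """Cumulative distribution function of a Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def normal_pdf(x, mean=0.0, std=1.0):
    """Density of a Gaussian."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def interval_probability(r, dr):
    """P(h in [r - dr/2, r + dr/2]) for a standard Gaussian h."""
    return normal_cdf(r + dr / 2.0) - normal_cdf(r - dr / 2.0)

r = 0.7
for dr in (1e-1, 1e-2, 1e-3):
    # The ratio P(h in interval) / dr approaches the density p(h = r).
    print(dr, interval_probability(r, dr) / dr, normal_pdf(r))
```

As `dr` shrinks, the interval probability divided by `dr` approaches the density at `r`, which is why the incremental weight can be written as \(p(h=r)dr\).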
Formula (5) needs the sum of importance weights. This has to be carefully computed when there is a mix of densities and probability masses (Owen 2013, Chap. 9.8). Imagine that there is a sample weight \(w_{1}=P(x)\) obtained assigning a discrete variable to a value, and a second sample weight \(w_{2}=p(y)dy\) obtained assigning a continuous variable to a value, or more precisely to a range \([y-dy/2,y+dy/2]\), with \(dy\rightarrow 0\). The weight \(w_{1}\) trumps \(w_2\), because the latter goes to zero. Indeed, the second sample has a weight infinitely smaller than the first, and thus it is ignored in the weight sum: \(w_{1}+w_2=P(x)+p(y)dy = P(x)\) (for \(dy\rightarrow 0\)). Analogously, a weight \(w_{a}=p(x_{1},\dots ,x_n)dx_1,\dots ,dx_n\) that is the product of n (one-dimensional) densities trumps a weight \(w_{b}=p(x'_{1},\dots ,x'_n,x'_{n+1},\dots ,x'_{n+m})dx_1,\dots ,dx_{n+m}\) that is the product of n densities of the same variables and \(m>0\) other densities (i.e. \(w_a+w_b=w_a\)). If all the weights are n-dimensional densities (of the same variables), then the quantities are comparable and are trivially summed. However, if the weights refer to different variables, we need an assumption to ensure the existence of the limit, e.g., \(\forall {i}: dx_i=dx\), thus \({dx^{n+m}}/{dx^n}\rightarrow 0\), making again n-dimensional densities trump \((n+m)\)-dimensional densities. For \(m=0\) the densities are trivially summed, e.g., for \(w_1=p(a,b,c)dadbdc\) and \(w_2=p(f,g,e)dfdgde\), assuming \(dadbdc=dfdgde=dx^{3}\), we obtain \(w_1+w_2=(p(a,b,c)+p(f,g,e))dx^{3}\). Finally, the ratio of weight sums in (5) is computed assuming again \(\forall {i}: dx_i=dx\), obtaining \(P(q \mid e)\approx \lim \limits _{dx\rightarrow 0}(k_n dx^v)/(k_d dx^l)\). If \(v>l\) then \(P(q \mid e)=0\), otherwise for \(v=l\) we have \(P(q \mid e)\approx {k_n}/{k_d}\). Those distinctions are automatically performed in EvalSampleQuery.
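One way to implement this bookkeeping is to store each weight as a pair (coefficient, dimension) standing for \(k\,dx^{d}\), under the assumption \(\forall i: dx_i=dx\) stated above. The representation and the names `sum_weights` and `ratio` are assumptions of this sketch, not a description of the actual EvalSampleQuery code.

```python
def sum_weights(weights):
    """Sum weights of the form k * dx**d as dx -> 0.

    weights -- iterable of (k, d) pairs; d = 0 is a probability mass,
               d > 0 is a product of d one-dimensional densities.
    Only the terms with the lowest dimension survive the limit.
    """
    nonzero = [(k, d) for k, d in weights if k > 0]
    if not nonzero:
        return (0.0, 0)
    lowest = min(d for _, d in nonzero)
    return (sum(k for k, d in nonzero if d == lowest), lowest)

def ratio(numerator, denominator):
    """Limit of (k_n * dx**v) / (k_d * dx**l) for dx -> 0."""
    (k_n, v), (k_d, l) = numerator, denominator
    if k_d == 0:
        raise ZeroDivisionError("the evidence has no support in the samples")
    if k_n == 0 or v > l:
        return 0.0                 # the numerator vanishes faster
    if v == l:
        return k_n / k_d           # comparable quantities
    raise ValueError("numerator has lower dimension than denominator")

# A probability mass trumps a density; equal dimensions are simply added.
print(sum_weights([(0.2, 0), (1.5, 1)]))
print(sum_weights([(0.5, 3), (0.25, 3)]))
print(ratio((1.0, 1), (4.0, 1)))
```

The `ValueError` branch covers a case that should not occur when the numerator estimates \(P(q,e)\) and the denominator estimates \(P(e)\) from the same samples, since every sample counted in the numerator is also counted in the denominator.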
For zero probability evidence we do not have convergence results for every DC program, query, and evidence, nonetheless the inference algorithm produces the correct results in many domains, as shown in the next section and in the experiments.
3.3 Examples
We now illustrate EvalSampleQuery and the cases when LW can be applied with the following example.
Example 4
Let us consider the query \(\mathtt{{p(color(2){\sim }=black)}}\); the derivation is the following (omitting \(iq=q\)):
The algorithm starts by checking whether a rule is applicable to the current goal, which is initialized with the query. For example, rule 2a fails because \(\mathtt{{color(2)}}\) is not in the sample \(x^{P(i)}\). Rule 2b can be applied to clause (8), obtaining tuple 2. At this point \(\mathtt{{material(2)}}\) needs to be evaluated; it is not sampled and it unifies with the head of clause (10), thus applying rule 2b we obtain tuple 3. Now ‘\(\mathtt{{n}}\)’ is required, thus it is sampled with value, e.g., 3 (rules 2b on (7) and 3b), for which ‘\(\mathtt{{n{\sim }=N,between(1,N,2)}}\)’ succeeds for \(\mathtt{{ N=3}}\) (rules 2a and 1a). At tuple 6 the body of (10) has been proven; therefore, \(\mathtt{{material(2)}}\) is sampled with a value, e.g., \(\mathtt{{wood}}\) (rule 3b). Now in tuple 7, the formula \(\mathtt{{ material(2){\sim }=metal}}\) fails because the sampled value is \(\mathtt{{ wood}}\) (and thus the body of clause (8) fails); the algorithm backtracks to tuple 1 and applies rule 2b on clause (9), obtaining tuple 9. This time \(\mathtt{{ material(2){\sim }=wood}}\) is true and can be removed from the current goal (rule 2a). At this point \(\mathtt{{color(2)}}\) needs to be sampled (tuple 10). LW is applied because \(\mathtt{{color(2)}}\) is in the original query \(iq=q\), thus \(\mathtt{{color(2)=black}}\) is added to the sample with weight 1/2 (rule 3a). The query in this sample is true with final weight 1/2.
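The behaviour traced in this derivation can be imitated with a hand-written Python sketch. It is not the generic EvalSampleQuery: the backward reasoning is hard-coded for this single query, and the clauses it encodes are hypothetical stand-ins for clauses (7) to (10), chosen only so that black has probability 1/2 for wooden objects as in the text. All names and probabilities are assumptions.

```python
import random

# Assumed colour distributions (uniform over each list) and material prior.
COLORS = {"metal": ["grey", "blue", "black"], "wood": ["black", "brown"]}
P_WOOD = 0.3

def eval_sample_color_query(ball, target, rng):
    """Return (weight, partial_world) for the query color(ball) ~= target."""
    world = {}
    # n is required by the body of the material clause, so it is sampled first.
    world["n"] = rng.choice([1, 2, 3])
    if ball > world["n"]:
        return 0.0, world                      # between(1, N, ball) fails
    # material(ball) is needed to select the colour clause; no LW here,
    # because material(ball) does not occur in the query.
    material = "wood" if rng.random() < P_WOOD else "metal"
    world[("material", ball)] = material
    options = COLORS[material]
    if target not in options:
        return 0.0, world
    # Likelihood weighting: the colour is not sampled; the queried value is
    # imposed and the sample is weighted by its probability.
    world[("color", ball)] = target
    return 1.0 / len(options), world

def estimate(ball, target, num_samples, seed=0):
    """Average of the sample weights, an estimate of p(color(ball) ~= target)."""
    rng = random.Random(seed)
    total = sum(eval_sample_color_query(ball, target, rng)[0]
                for _ in range(num_samples))
    return total / num_samples

print(estimate(2, "black", 20000))
```

Each partial world contains only `n`, `material(2)`, and possibly `color(2)`; the variables of the other objects are never sampled because they are irrelevant to the query.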
3.3.1 Query expansion
 e1. replace each deterministic atom \( h\in proof\) with \( body\theta \) where \( head \leftarrow body \in \varvec{\mathbb {P}}\) and \( \theta =mgu(h,head)\); the process is repeated recursively for \( body\theta \);
 e2. if the truth value of \( h\in proof\) cannot be determined, it is left unchanged;
 e3. for each \( h{\sim }=value \in proof\) add \( body\theta \) where \(head\sim {\mathcal {D}} \leftarrow body \in \varvec{\mathbb {P}}\) and \( \theta =mgu(h,head)\); the process is repeated recursively for \( body\theta \).
After the iq expansion the sampling algorithm can start using the same inference rules, which also cover the case in which iq contains disjunctions. As described in rule 3a, we can apply LW setting \(h=val\) only when \((h\ne val,x^{P(i)})\models \lnot iq\). Thus, if \(iq=\mathtt{{((h{\sim }=val\ OR\ a),b)}}\), LW cannot be applied for \(\mathtt{{h{\sim }=val}}\). However, if \(\mathtt{{a}}\) becomes false, iq simplifies to \(\mathtt{{ (h{\sim }=val,b)}}\), and LW can be applied setting \(h=val\), because \(h\ne val\) makes iq false. To determine whether \((h\ne val,x^{P(i)})\models \lnot iq\) holds, it is convenient to simplify iq whenever a random variable is sampled. For example, after a random variable \(\mathtt{{h}}\) has been sampled with value \(\mathtt{{v}}\), \(\mathtt{{ h{\sim }=val}}\) is replaced with its truth value (true or false). Furthermore, \((\mathtt{{true\ OR\ a}})\) is simplified to \(\mathtt{{true}}\), \((\mathtt{{false\ OR\ a}})\) to \(\mathtt{{a}}\), and so on. The expansion of iq and the simplification guarantee that \(\varvec{\mathbb {P}}\models (x^{P(i)},iq)\Leftrightarrow (x^{P(i)},q)\) as required by Theorem 1. The iq expansion allows LW to be exploited in a broader set of cases. Indeed, it is basically a form of partial evaluation adapted for DCs, and it can exploit similar optimizations, e.g., constraint propagation, where the constraints that make the query true are propagated with the iq expansion and updated according to the sampled variables.
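The simplification of iq and the applicability test of rule 3a can be sketched as follows. Formulas are encoded as nested tuples, and the atoms a and b of the example are modelled as random variables compared with the value "t"; this encoding and the names `simplify`, `assume_false`, and `lw_applicable` are assumptions. The test is sound but incomplete: it only detects the cases in which replacing the literal by false makes iq collapse to false.

```python
def simplify(f, world):
    """Replace sampled literals by their truth value and simplify AND/OR."""
    if isinstance(f, bool):
        return f
    op = f[0]
    if op == "eq":                               # ("eq", variable, value)
        _, var, val = f
        return world[var] == val if var in world else f
    left, right = simplify(f[1], world), simplify(f[2], world)
    if op == "and":
        if left is False or right is False:
            return False
        if left is True:
            return right
        if right is True:
            return left
        return ("and", left, right)
    if op == "or":
        if left is True or right is True:
            return True
        if left is False:
            return right
        if right is False:
            return left
        return ("or", left, right)
    raise ValueError("unknown operator: %r" % (op,))

def assume_false(f, var, val):
    """Replace every occurrence of the literal var ~= val by False."""
    if isinstance(f, bool):
        return f
    if f[0] == "eq":
        return False if (f[1], f[2]) == (var, val) else f
    return (f[0], assume_false(f[1], var, val), assume_false(f[2], var, val))

def lw_applicable(iq, var, val, world):
    """True if var != val is detected to make iq false in the partial world."""
    return simplify(assume_false(iq, var, val), world) is False

# iq = ((h ~= val OR a), b)
iq = ("and", ("or", ("eq", "h", "val"), ("eq", "a", "t")), ("eq", "b", "t"))
print(lw_applicable(iq, "h", "val", {}))         # a may still make iq true
reduced = simplify(iq, {"a": "f"})               # a has been sampled as false
print(reduced, lw_applicable(reduced, "h", "val", {"a": "f"}))
```

Before `a` is sampled, LW cannot be applied to `h`; once `a` is known to be false, iq reduces to a conjunction and the literal on `h` becomes necessary for the query.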
3.3.2 Complex queries
Example 5
A more complex query is \(\mathtt{{\simeq \!\!(color(2))=\ \simeq \!\!(color(1))}}\), which is converted to \(\mathtt{{color(2){\sim }=Y, color(1){\sim }=Y}}\) so that each subgoal refers to a single random variable. In this case, \(\mathtt{{color(2)}}\) is sampled (rule 3b: LW is not used), for example to \(\mathtt{{red}}\) (after sampling \({\mathtt{{n}}}\) and \({\mathtt{{material}}}\)). Assuming \(\mathtt{{n}}\ge 2\) the first subgoal \(\mathtt{{color(2){\sim }=Y}}\) succeeds with substitution \(\gamma =\{\mathtt{{Y= red}}\}\), thus the original query becomes \(iq=\mathtt{{ (color(2){\sim }=red, color(1){\sim }=red)}}\). The remaining subgoal will be \(\mathtt{{color(1){\sim }=red}}\) for which LW is used (rule 3a). Indeed, the original query \(iq\) becomes grounded and LW can be applied.
The examples show that EvalSampleQuery exploits LW in complex queries (or evidence). This is also valid for continuous random variables for which MCMC or naive MC will return \(0\). The probability of such queries is \(0\), but \(p(e)=0\) is not a satisfactory answer to estimate \(p(q \mid e)\), and the limit (6) needs to be computed. For simple evidence or queries (e.g., \(\mathtt{{size(1){\sim }=v}}\)) classical LW is sufficient to solve the problem. For more complex evidence (e.g., \(\mathtt{{\simeq \!\!(size(1))=\ \simeq \!\!( size(2))}}\)) MCMC, naive MC or classical LW will fail to provide an answer (all samples are rejected). Many probabilistic languages cannot handle those queries (if we exclude explicit approximations such as discretization). In contrast, the proposed algorithm is able to provide a meaningful answer.
Example 6
Example 7
The last case to discuss is a query that contains random variables that are non-ground terms, e.g., \(\mathtt{{color(X){\sim }=black}}\), which is interpreted as \(\mathtt{{\exists X\ color(X){\sim }=black}}\). In this case LW is not applied because the goal is non-ground. Applying LW would produce wrong results because we would force the value \(\mathtt{{black}}\) only for the first proof (e.g., \(\mathtt{{color(1){\sim }=black}}\)), ignoring the other possible proofs (e.g., \(\mathtt{{color(2){\sim }=black}}\)), and thus violating the importance sampling requirement. In some cases, query expansion can enumerate all possible grounded proofs, making LW applicable.
LW can also be applied for the query \(\mathtt{{\simeq \!\!(material(\simeq \!\!(drawn(1))))= wood}}\) (the first drawn ball is made of wood), which is converted to \(\mathtt{{(drawn(1){\sim }=X, material(X){\sim }=wood)}}\). Once \(\mathtt{{drawn(1)}}\) is sampled to a value \(\mathtt{{v}}\) (without LW), the substitution \(\theta =\{\mathtt{{X= v}}\}\) is applied to the current goal and to iq. At this point LW can be applied to \(\mathtt{{material(v){\sim }=wood}}\) because it is grounded in iq (rule 3a). In other words, for the partial world \(x^{P(i)}=\mathtt{\{drawn(1)=v\}}\) the only value of \(\mathtt{{material(v)}}\) that makes the query true is \(\mathtt{{wood}}\) for which LW is applicable. For this query one sample is sufficient to obtain the exact result.
4 Dynamic distributional clauses
We now extend Distributional Clauses to Dynamic Distributional Clauses (DDC) for modeling dynamic domains. We then discuss how inference by means of filtering is realized in the propositional case.
4.1 Dynamic distributional clauses
 1. the prior distribution: \(\mathtt{{h_0}}\sim {\mathcal {D}} \leftarrow \mathtt{{body_0}}\),
 2. the state transition model: \(\mathtt{{h_{t+1}}}\sim {\mathcal {D}} \leftarrow \mathtt{{body_{t:t+1}}}\) (the body involves variables at time t and possibly at time \(t+1\)),
 3. the measurement model: \(\mathtt{{z_{t+1}}}\sim {\mathcal {D}} \leftarrow \mathtt{{body_{t+1}}}\), and
 4. clauses that define a random variable at time t from other variables at the same time (intra-time dependence): \(\mathtt{{h_{t}}}\sim {\mathcal {D}} \leftarrow \mathtt{{body_t}}\).
Example 8
4.2 Inference and filtering
If we consider the time step as an argument of the random variable, inference can be done as in the static case with no changes. However, the performance will degrade, while time and space complexity will grow linearly with the maximum time step considered in the query. This problem can be mitigated if we are interested in filtering, as we shall show in the next section.
 (a) Prediction step: sample a new set of samples \(x_{t+1}^{(i)}\), \(i=1, \ldots , N\), from a proposal distribution \(g(x_{t+1} | x_{t}^{(i)}, z_{t+1}, u_{t+1})\).
 (b) Weighting step: assign to each sample \(x_{t+1}^{(i)}\) the weight:
$$\begin{aligned} w_{t+1}^{(i)} =w_{t}^{(i)} \frac{p(z_{t+1} | x_{t+1}^{(i)}) p(x_{t+1}^{(i)} | x_{t}^{(i)}, u_{t+1})}{g(x_{t+1}^{(i)} | x_{t}^{(i)}, z_{t+1}, u_{t+1})} \end{aligned}$$
 (c) Resampling: if the variance of the sample weights exceeds a certain threshold, resample with replacement, from the sample set, with probability proportional to \(w_{t+1}^{(i)}\) and set the weights to 1.
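The three steps (a)-(c) can be illustrated with a minimal one-dimensional bootstrap filter, in which the proposal \(g\) equals the transition model so that the weight ratio reduces to the observation likelihood. This is only a sketch under stated assumptions: the Gaussian transition and observation models, the function names, the noise variances, and the use of the effective sample size as a proxy for the weight-variance test are all illustrative choices that do not come from the paper. After resampling, the weights are set to a common value (here \(1/N\), which is equivalent to 1 up to normalization).

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def bootstrap_step(particles, weights, z, u, rng,
                   trans_var=0.04, obs_var=0.01, ess_fraction=0.5):
    """One filtering step; the models and thresholds are hypothetical."""
    n = len(particles)
    # (a) Prediction: sample from the transition model x' = x + u + noise.
    predicted = [x + u + rng.gauss(0.0, math.sqrt(trans_var)) for x in particles]
    # (b) Weighting: with g equal to the transition model, only the
    # observation likelihood p(z | x') remains in the weight update.
    new_weights = [w * gaussian_pdf(z, x, obs_var)
                   for w, x in zip(weights, predicted)]
    total = sum(new_weights)
    if total == 0.0:
        raise ValueError("all particles have zero weight")
    new_weights = [w / total for w in new_weights]
    # (c) Resampling: triggered when the effective sample size is low,
    # used here as a stand-in for a threshold on the weight variance.
    ess = 1.0 / sum(w * w for w in new_weights)
    if ess < ess_fraction * n:
        predicted = rng.choices(predicted, weights=new_weights, k=n)
        new_weights = [1.0 / n] * n
    return predicted, new_weights
```

A typical call passes the current particles and weights together with the new observation and action, and then estimates quantities of interest as weighted averages over the returned particles.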
5 DCPF: a particle filter for dynamic distributional clauses
We now develop a particle filter for a set of dynamic distributional clauses that define the prior distribution, state transition model and observation model (cf. Sect. 4.1). Throughout this development, we only consider the bootstrap filter for simplicity, but other proposal distributions are possible.
5.1 Filtering algorithm
The basic relational particle filter applies the same steps as the classical particle filter sketched in Sect. 4.2 and employs the forward reasoning procedure for distributional clauses sketched in Sect. 2. Each sample \(x_{t}^{(i)}\) will be a complete possible world at time \(t\). Working with complete worlds is computationally expensive and may lead to bad performance. Therefore, we shall work with samples that are partial worlds as in the static case. The resulting framework, that we shall now introduce, is called the Distributional Clauses Particle Filter (DCPF).

Step (1): expand the partial sample to compute \(w_{t+1}^{(i)} = w_{t}^{(i)} p({z_{t+1}} | \hat{x}_{t+1}^{P(i)})\)

Resampling (if necessary)

Step (2): complete the prediction step (a)
Step (2) performs the prediction step for variables that have not yet been sampled because they are not directly involved in the weighting step. The algorithm queries the head of any DC clause in the state transition model (intra-time clauses excluded), and thus evaluates the body recursively. Whenever the body is true for a substitution \(\theta \), the variable-distribution pair \(r_{t+1}\theta \sim {\mathcal {D}}\theta \) is added to the sample. Avoiding sampling is beneficial for performance, as discussed in Sect. 3.1. Step (2) is necessary to make sure that the partial sample at the previous time step t can be safely forgotten, as we shall discuss in the next section. After Step (2) the set of partial samples \(\hat{x}^{P(i)}_{t+1}\) with weights \(w_{t+1}^{(i)}\) approximates the new belief \(bel(x_{t+1})\).
In the classical particle filter resampling is the last step (c). In contrast, DCPF performs resampling before completing the prediction step (i.e., before Step (2)). This is loosely connected to auxiliary particle filters that perform resampling before the prediction step (Whiteley and Johansen 2010). Anticipating resampling is beneficial because it reduces the variance of the estimation.
To answer a query \(q_{t+1}\), it suffices to call \((w_{q}^{(i)},\hat{x}_{q_{t+1}}^{P(i)})\leftarrow \) EvalSampleQuery(\( q,\hat{x}_{t+1}^{P(i)}\)) for each sample and use formula (5) where \(w_e^{(i)}\) is replaced by \(w_{t+1}^{(i)}\). After querying, the partial samples \(\hat{x}_{q_{t+1}}^{P(i)}\supseteq \hat{x}_{t+1}^{P(i)}\) are discarded, i.e., the partial samples remain \(\hat{x}_{t+1}^{P(i)}\). This improves performance, because querying does not expand \(\hat{x}_{t+1}^{P(i)}\).
Example 9
To understand the filtering algorithm let us consider Step (1) for the observation \(\mathtt{{obsPos(1)_{t+1}{\sim }=2.5}}\) (there is no observation for object 2), and action \(\mathtt{{tap(1)_{t+1}}}\). Let us assume \(x_{t}^{P(i)}=\mathtt{{\{pos(1)_t=2,pos(2)_t=5\}}}\). The sample before and after the filtering step is shown in Fig. 2. The algorithm tries to prove \(\mathtt{{ {obsPos(1)_{t+1}}{\sim }=2.5}}\). Rule 2b applies for DC clause (21) that defines \(\mathtt{{obsPos(1)_{t+1}}}\) for \(\theta =\mathtt{{\{ID=1\}}}\). The algorithm tries to prove the body and the variables in the distribution recursively, that is, \(\mathtt{{ pos(1)_{t+1}}}\). The latter is not in the sample and rule 2b applies for DC clause (17) with \(\theta =\mathtt{{\{ID=1\}}}\). The proof fails, therefore the algorithm backtracks and applies rule 2b to (18). Its body is true assuming that \(\mathtt{{on(1,3)_t}}\) succeeds (and is added to the sample). Thus, \(\mathtt{{pos(1)_{t+1}}}\) will be sampled from \(\mathtt{{gaussian(2.3,0.04)}}\) and added to the sample (rule 3b). Deterministic facts in the background knowledge, such as \(\mathtt{{ type(1,cube)}}\), are common to all samples; therefore, they are not added to the sample. At this point \(p(\mathtt{{obsPos(1)_{t+1}}}{\sim }=2.5 | \hat{x}^{P(i)}_{t+1})\) is given by (21). This is equivalent to applying rule 3a that imposes the query to be true and updates the weight.
Step (1) is complete, so let us consider Step (2). The algorithm queries all the variables in the head of a clause in the state transition model, in this case \(\mathtt{{pos(ID)_{t+1}}}\). This is necessary to propagate the belief for variables not involved in the weighting process, such as \(\mathtt{{pos(2)_{t+1}}}\). The query \(\mathtt{{ pos(ID)_{t+1}{\sim }=Val}}\) succeeds for \(\mathtt{{ID=2}}\) applying (17), and \(\mathtt{{pos(2)_{t+1}\sim gaussian(5,0.01)}}\) is added to the sample. The algorithm backtracks looking for alternative proofs of q; there are none, so the procedure ends. In the next step \(\mathtt{{ pos(2)_{t+1}}}\) is required for \(\mathtt{{pos(2)_{t+2}}}\), so \(\mathtt{{ pos(2)_{t+1}}}\) will be sampled from the distribution stored in the sample. Note that \(\mathtt{{on(A,B)_t}}\) is evaluated selectively. Any other relation or random variable that may be defined in the program remains marginalized. For example, any relation that involves object 2 is not required (e.g., \(\mathtt{{on(2,B)_t}}\)).
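The idea of storing a variable-distribution pair in the sample and sampling it only when a later step requires it can be sketched as follows. This is a hypothetical illustration, not the YAP Prolog implementation: the class name `PartialSample`, its methods, and the reading of the second argument of \(\mathtt{gaussian}\) as a variance are assumptions made for the example.

```python
import math
import random


class PartialSample:
    """One particle: sampled values plus stored, not-yet-sampled Gaussians."""

    def __init__(self):
        self.values = {}    # variable name -> sampled value
        self.pending = {}   # variable name -> (mean, variance), still marginalized

    def store_distribution(self, name, mean, variance):
        """Record a variable-distribution pair without sampling it."""
        if name not in self.values:
            self.pending[name] = (mean, variance)

    def value(self, name, rng):
        """Return the value, sampling it lazily on first use.

        Raises KeyError if the variable is neither sampled nor stored.
        """
        if name not in self.values:
            mean, variance = self.pending.pop(name)
            self.values[name] = rng.gauss(mean, math.sqrt(variance))
        return self.values[name]
```

For instance, after `store_distribution("pos(2)_t+1", 5.0, 0.01)` the position of object 2 stays marginalized; it is only sampled when `value("pos(2)_t+1", rng)` is called, and repeated calls return the same value.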
5.2 Avoiding backinstantiation
We showed that lazy instantiation is beneficial to reduce the number of variables to sample and to improve the precision of the estimation in the static case. However, to evaluate a query at time t in dynamic models, the algorithm might need to instantiate variables at previous steps, sometimes even at time 0. We call this backinstantiation. This requires one to store the entire sampled trajectory \(x^{P(i)}_{1:t}\), which may have a negative effect on performance. If we are interested in filtering, this is a waste of resources.
We shall now show that the described filtering algorithm performs lazy instantiation over time and avoids backinstantiation. We will first derive sufficient conditions for avoiding backinstantiation in DDC, and then prove that these conditions hold for the DCPF algorithm.
Rao–Blackwellized particle filters (RBPF) described in the literature adopt a fixed and manually defined split of \(x_t = x_{t}^{P} \cup x_{t}^{a}\). In contrast, our approach exploits the language and its inference algorithm to perform a dynamic split that may differ across samples, as described for the static case in Sect. 3.
Backinstantiation in the DCPF. One contribution in the DCPF is that it integrates RBPF and logic programming to avoid backinstantiation over variables \(r\in x_{1:t-1}\). For this reason we are interested in performing a filtering step determining the smallest partial samples that approximate the new belief \(bel(x_{t+1})\) and are d-separated from the past. To avoid backinstantiation we require that \(p(x^{a}_{t} | x^{P(i)}_{1:t},z_{1:t},u_{1:t} )\) is a known distribution for each sample i or at most parametrized by \( x^{P(i)}_{t}\): \(p(x^{a}_{t} | x^{P(i)}_{1:t},z_{1:t},u_{1:t} )=f^{(i)}_{t}( x^a_{t}; x^{P(i)}_{t} )\). Note that the latter equation does not make any independence assumptions: \(f^{(i)}_{t}\) is a probability distribution that incorporates the dependency of previous states and observations and can be different in each sample.
Formally, starting from a partial sample \(x^{P(i)}_{1:t}\) with weight \(w^{(i)}_{t}\) sampled from \(p(x^{P}_{1:t} | z_{1:t},u_{1:t})\), a new observation \(z_{t+1}\), a new action \(u_{t+1}\), and the distribution \(p(x^{a}_{t} | x^{P(i)}_{1:t},z_{1:t},u_{1:t} )\), we look for the smallest partial sample \(\hat{x}^{P(i)}_{1:t+1}=\{x^{P(i)}_{1:t-1},\hat{x}_{t}^{P(i)},\hat{x}_{t+1}^{P(i)}\}\) with \(x_{t}^{P(i)}\subseteq \hat{x}_{t}^{P(i)}\), such that \(\hat{x}^{P(i)}_{1:t+1}\) with weight \(w^{(i)}_{t+1}\) is distributed as \(p(\hat{x}^{P}_{1:t+1} | z_{1:t+1},u_{1:t+1})\) and \(p(x^{a}_{t+1} | \hat{x}^{P(i)}_{1:t+1},z_{1:t+1},u_{1:t+1} )\) is a probability distribution available in closed form. Even though the formulation considers the entire sequence \(x_{1:t+1}\), to estimate \(bel(x_{t+1})\) the previous samples \(\{x^{P(i)}_{1:t-1},\hat{x}_{t}^{P(i)}\}\) can be forgotten.
D-separation conditions. We will now describe sufficient conditions that guarantee d-separation and thus avoid backinstantiation; then we will show that these conditions hold for the DCPF filtering algorithm. The belief update is performed by adopting RBPF steps.
 1. the partial interpretation \(\hat{x}^{P(i)}_{t+1}\) does not depend on the marginalized variables \(x^a_{t}\): \(p(\hat{x}^{P(i)}_{t+1} | \hat{x}^{P(i)}_{t},\hat{x}^a_{t}, u_{t+1})=p(\hat{x}^{P(i)}_{t+1} | \hat{x}^{P(i)}_{t},u_{t+1})\);
 2. the weighting function does not depend on the marginalized variables: \(p(z_{t+1} | x^{(i)}_{t+1})= p(z_{t+1} | \hat{x}^{P(i)}_{t+1})\);
 3. \(p(\hat{x}^a_{t+1} | \hat{x}^{P(i)}_{1:t+1},z_{1:t+1},u_{1:t+1} )=f^{(i)}_{t+1}(\hat{x}^a_{t+1}; \hat{x}^{P(i)}_{t+1})\) is available in closed form.
Theorem 2
Under the d-separation conditions 1, 2, 3 the samples
Proof
Step (1) in the filtering algorithm (Sect. 5.1) guarantees conditions 1 and 2, while Step (2) guarantees condition 3. For a proof sketch see Theorems 5 and 6 in “Theorems” in the Appendix. Indeed, EvalSampleQuery used in Steps (1) and (2) will never need to sample variables at time \(t-1\) or before, because the belief distribution of marginalized variables \(r_t\in \hat{x}^a_t\), namely \(f^{(i)}_{t}(x^a_t;\hat{x}^{P(i)}_t )\), is available in closed form and (possibly) parameterized by \(\hat{x}^{P(i)}_t\), while \(r_{t+1}\in \hat{x}^a_{t+1}\) are sampled from the state transition model. After Step (2) any \(r_{t+1}\in \hat{x}^a_{t+1}\) is derivable from \(\hat{x}_{t+1}^{P(i)}\) together with the DDC program. These conditions avoid backinstantiation during filtering or query evaluation, thus previous partial states \(x_{0:t}^{P(i)}\) can be forgotten.
Step (2) avoids computing the integral (23). The integral is approximated with a single sample, or equivalently the partial sample is expanded until \(\hat{x}^a_{t+1}\) does not depend on marginalized \(\hat{x}^a_t\), for which \(f^{(i)}_{t+1}(\hat{x}^a_{t+1}; \hat{x}^{P(i)}_{t+1})=p(\hat{x}^a_{t+1} | \hat{x}^{P(i)}_t, \hat{x}^{P(i)}_{t+1})\) is derivable from the DDC program. In detail, for each \(r_{t+1}\theta \in \hat{x}^a_{t+1}\) Step (2) stores \(r_{t+1}\theta \sim {\mathcal {D}}\theta \), where \({\mathcal {D}}\theta =p(r_{t+1}\theta | \hat{x}^{P(i)}_t, \hat{x}^{P(i)}_{t+1})\), e.g., \(\mathtt{{pos(2)_{t+1}}}\sim \mathtt{{Gaussian(5,0.01)}}=f^{(i)}_{t+1}{} \mathtt{{( pos(2)_{t+1})}}\) as shown in Fig. 2. Such a distribution is not parametrized and it is generally different in each sample. In contrast, \(p(\mathtt{{size(A)_{t}}} | x^{P(i)}_{1:t},z_{1:t},u_{1:t} )=Gaussian([{\text {if }}{} \mathtt{{type(A,ball)}}\) then \( \mu =1 ;{\text { else }} \mu =2],0.1)=f^{(i)}_{t}(\mathtt{{size(A)_{t}}};x^{P(i)}_{t} )\) is a distribution parametrized by other variables in \(x^{P(i)}_{t}\). It is sufficient to store this parametric function once, using DDC clauses, instead of storing each distribution separately for each sample. Similarly, the DDC clause that defines \(\mathtt{{on(A,B)_t}}\) from object positions at time t is sufficient to represent all marginalized facts \(\mathtt{{on(a,b)_t}}\) not in \(x^{P(i)}_t\). In general, intra-time DDC clauses can represent distributions of marginalized variables for an unspecified number of objects. Thus, Step (2) does not need to query variables defined by intra-time clauses, as proved in Theorem 6, “Theorems” in the Appendix.
Step (2) could be improved by applying (23) whenever possible. Moreover, if there is a set of variables that have the same prior and transition model, the belief update (23) can be performed once for the whole set. Whenever a variable is required, it will be sampled. This does not require bounding the number of such sets of variables, and it can be considered as a simple form of lifted belief update. In some cases the belief \(f^{(i)}_{t}(x^a_{t}; x^{P(i)}_{t})\) can be directly specified for any time \(t\); we call this precomputed belief. Like lifted belief update, this is applicable to an unbounded set of variables, and avoids unnecessary sampling.
Theorem 3
DCPF has a space complexity per step and sample bounded by the size of the largest partial state \(x^{P(i)}_{t}\), together with \(f^{(i)}_t(x_t^a; x^{P(i)}_{t})\).
Proof
(sketch) The filtering algorithm proposed for DCPF avoids backinstantiation, therefore the space complexity is bounded by the dimension of the state space at time \(t\). A tighter bound is the size of the largest partial state. \(\square \)
5.3 Limitations
We will now describe the limitations of the proposed algorithms.
Lazy instantiation is beneficial only when there are facts or random variables that are irrelevant during inference. For example, this is true when the model includes background knowledge that is not entirely required for a query. In fully connected models or when the entire world is relevant for a query, lazy instantiation brings no benefit. Nonetheless, the proposed method generalizes LW, thus it can be beneficial even in the described worst cases.
The above issues also apply to inference in dynamic models, but the latter raises additional issues for filtering in particular. One is the curse of dimensionality that produces poor results for high-dimensional state spaces, or equivalently it requires a huge number of samples to give acceptable results. There are some solutions in the literature (e.g., factorising the state space; Ng et al. 2002). In DCPF this problem can be alleviated using a suboptimal proposal distribution (see “Proposal distribution” in the Appendix for details).
6 Online parameter learning
So far we assumed that the model used to perform state estimation is known. In practice, it may be hard to determine or to tune the parameters manually, and therefore the question arises as to whether we can learn them. We will first review online parameter learning in classical particle filters, then we will show how to adapt those methods in DCPF.
6.1 Learning in particle filters
A simple solution to perform state estimation and parameter learning with particle filters consists of adding the static parameters to the state space: \(\bar{x}_t=\{x_t,\theta \}\). The posterior distribution \(p(\bar{x}_t | z_{1:t})\) is then described as a set of samples \(\{x^{(i)}_t,\theta ^{(i)}\}\). However, this solution produces poor results due to the following degeneracy problem. As the parameters are sampled in the first step and left unchanged (since they are static variables), after a few steps the parameter samples \(\theta ^{(i)}\) will degenerate to a single value due to resampling. This value will remain unchanged regardless of incoming new evidence. Limiting or removing resampling is not a good solution, because it will produce poor state estimation results. Better strategies have been proposed and are summarized in Kantas et al. (2009). We focus on two simple techniques with limited computational cost: artificial dynamics (Higuchi 2001) and resample-move (Gilks and Berzuini 2001). Both methods introduce diversity among the samples to solve the described degeneration problem.
The first method adds artificial dynamics to the parameter \(\theta \): \(\theta _{t+1}=\theta _t + \epsilon _{t+1},\) where \(\epsilon _{t+1}\) is artificial noise with a small variance that decreases over time. This strategy has the advantage of being simple and fast; nonetheless, it modifies the original problem and requires tuning (Kantas et al. 2009). We will show that this technique is suitable for the scenarios considered in this paper (for a limited number of parameters).
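A minimal sketch of the artificial dynamics update \(\theta _{t+1}=\theta _t + \epsilon _{t+1}\) applied to the parameter samples is given below. The geometric decay schedule for the noise standard deviation, the default values, and the function name are assumptions for illustration only; the text above only requires that the variance is small and decreases over time.

```python
import random


def jitter_parameters(thetas, step, rng, initial_std=0.1, decay=0.9):
    """Perturb each parameter sample with zero-mean Gaussian noise.

    The standard deviation initial_std * decay**step is a hypothetical
    schedule chosen so that the artificial noise shrinks over time.
    """
    std = initial_std * decay ** step
    return [theta + rng.gauss(0.0, std) for theta in thetas]
```

The perturbation is applied to every particle's parameter value at each step, which restores diversity after resampling at the cost of tuning the schedule.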

propagate: \(x^{(i)}_t \sim g({x_t} | {x^{(i)}_{t-1}},\theta _{t-1}^{(i)},z_t)\),

resample samples with weights: \(w^{(i)}_t=\frac{p({z_t} | {x^{(i)}_t},\theta _{t-1}^{(i)})p({x^{(i)}_t} | {x^{(i)}_{t-1}},\theta _{t-1}^{(i)})}{g({x^{(i)}_t} | {x^{(i)}_{t-1}},\theta _{t-1}^{(i)},z_t)}\),

propagate sufficient statistics: \(s^{(i)}_{t} = S(s^{(i)}_{t-1},x^{(i)}_t,x^{(i)}_{t-1},z^{(i)}_t)\), and

sample \(\theta _{t}^{(i)} \sim p(\theta | s^{(i)}_{t})\).
6.2 Online parameter learning for DCPF
We now propose an integration of the mentioned learning methods in DCPF. The main contribution is to adapt artificial dynamics and Storvik’s filter for DCPF and allow learning of a number of parameters defined at runtime. Indeed, the relational representation allows one to define an unbounded set of parameters to learn, e.g., the size of each object \(\mathtt{{size(ID)}}\). The number of objects and thus parameters to learn is not necessarily known in advance.
For example, in the Learnsize scenario, for each object \(\mathtt{{ID}}\) we have \({\hat{\theta }_t}=\mathtt {cursize_{t}(ID)}\) and the parameter to learn is \(\theta =\mathtt {size(ID)}\). For each object \(p({\hat{\theta }_t} | \theta )\) is defined as \(\mathtt{{cursize_{t}(ID) \sim Gaussian(\simeq \!\!size(ID),}}\bar{\sigma }^2)\), where \(\bar{\sigma }^2\) is a fixed variance. The conjugate prior of \(\mathtt{{size(ID)}}\) is a Gaussian with hyperparameters \(\mathtt{{\mu _0(ID),\sigma _0^2(ID)}}\): \(\mathtt{{ size(ID) \sim Gaussian(\mu _0(ID),\sigma _0^2(ID))}}\). As explained, \(\theta =\mathtt {size(ID)}\) need not be sampled; indeed, \({\hat{\theta }_t}=\mathtt {cursize_{t}(ID)}\) is directly sampled from \(p({\hat{\theta }_t} | s_{t-1})\), i.e. \(\mathtt{{cursize_{t}(ID)} \sim Gaussian(\mu _{t-1}(ID),\sigma _{t-1}^2(ID)+\sigma ^2)}\). For each \(\mathtt{{ ID}}\) the posterior \(p(\theta | s_{t})\) is a Gaussian like the prior, and the sufficient statistics \(s_{t}=\mathtt{{\mu _{t}(ID),\sigma _{t}^2(ID)}}\) are computed using Bayesian inference. The posterior distribution of the parameters can become peaked in a few steps, again causing a degeneration problem. This issue is mitigated by reducing the influence of the evidence during the Bayesian update.
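The Gaussian case above can be made concrete with the standard conjugate update for a Gaussian mean with known observation variance. The sketch below is an illustration under assumptions: the function names are hypothetical, `noise_var` plays the role of the fixed variance, and the mitigation that reduces the influence of the evidence is not modelled because its exact form is not specified here.

```python
def predictive(mu_prev, var_prev, noise_var):
    """Mean and variance of cursize_t given the statistics at time t-1."""
    return mu_prev, var_prev + noise_var


def posterior(mu_prev, var_prev, noise_var, value):
    """Update the sufficient statistics (mean, variance) of size(ID).

    Standard Gaussian-Gaussian conjugate update after observing one
    sampled value of cursize_t with known noise variance.
    """
    precision = 1.0 / var_prev + 1.0 / noise_var
    var_new = 1.0 / precision
    mu_new = var_new * (mu_prev / var_prev + value / noise_var)
    return mu_new, var_new
```

Starting from a prior with mean 0 and variance 1, a single value of 2 observed with noise variance 1 gives a posterior with mean 1 and variance 0.5, which shows how quickly the posterior can become peaked.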
7 Experiments

(Q1) Does the EvalSampleQuery algorithm obtain the correct results?

(Q2) How do the DCPF and the classical particle filter compare?

(Q3) How do the DC and DCPF perform with respect to a representative state-of-the-art probabilistic programming language for static and dynamic domains?

(Q4) Is the DCPF suitable for real-world applications?

(Q5) How do the learning algorithms perform?
All algorithms were implemented in YAP Prolog and run on an Intel Core i7 3.3 GHz for simulations and on a laptop Core i7 for real-world experiments. To measure the error between the predicted and the exact probability for a given query, we compute the empirical standard deviation (STD). The average used to compute STD is the exact probability when available or the empirical average otherwise. We report STD \(99\%\) confidence intervals. Notice that those intervals refer to the uncertainty of the STD estimation, not to the uncertainty of the probability. If the number of samples is not sufficient to give an answer (e.g., all samples are rejected), a value is randomly chosen from 0 to 1. The results are averaged over 500 runs. In all the experiments we measure the CPU time (“user time” in the Unix “time” command). This makes for a fair comparison between DC and DCPF (not parallelized in the current implementation) and (D)BLOG, which often uses more than one CPU at a time. Time includes initialization: around 0.3 s for (D)BLOG, 0.03 s for DC and DCPF.
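The error measure described above can be sketched as follows: the deviation of the per-run estimates is computed around the exact probability when it is known, and around the empirical average otherwise. The function name and the normalization by the number of runs (rather than the number of runs minus one) are assumptions, and the confidence intervals on the STD are not reproduced here.

```python
import math


def empirical_std(estimates, exact=None):
    """Standard deviation of per-run estimates around a reference value.

    The reference is the exact probability if given, otherwise the
    empirical average of the estimates.
    """
    center = exact if exact is not None else sum(estimates) / len(estimates)
    return math.sqrt(sum((e - center) ** 2 for e in estimates) / len(estimates))
```

When the exact probability is supplied, any systematic bias of the estimator also contributes to the reported value, which is not the case when the empirical average is used.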
We first describe experiments in static domains, then in dynamic domains (synthetic and real-world scenarios).
7.1 Static domains
In the third experiment we considered continuous variables using Example 4 (Fig. 6). We queried the probability that the first drawn ball is made of wood, given that its size is 0.4. BLOG and naive MC failed to give an answer, while DC (LW) provides a probability. To compare with BLOG we had to consider an interval instead of a value ([0.39, 0.41]). Both DC and BLOG converge to 0.16, this confirms that DC works properly with continuous variables. Figure 6 shows the STD and time performance. EvalSampleQuery exploits LW also in this case, while BLOG needs evidence discretization to give an answer and does not exploit LW in this case. For this reason BLOG performs poorly. In this case query expansion does not provide an improvement. Another tested query is the probability that two drawn balls are the same, given that they have the same size. This probability is one, because the size has a continuous distribution, thus the probability of having two different balls with the same size is infinitely smaller than the probability of sampling the same ball; for this reason the balls must be the same. Again, DC provides the correct result, while BLOG does not provide an answer.
From the above experiments we make the following comparison with BLOG. Given a query, BLOG stacks variables that need to be sampled to answer the query, and uses LW to generate samples consistent with the evidence. This follows the lazy instantiation principle applied in EvalSampleQuery; nonetheless, there are the following differences. BLOG exploits LW only for simple evidence statements of the form \(\mathtt{{r=value}}\), thus it performs worse than DC with complex queries described in Sect. 3.3.2, which the experiments confirm. Furthermore, BLOG is not always able to give an answer for complex queries or evidence containing continuous variables as shown above. In contrast, DC gives meaningful answers and exploits LW in a much wider range of queries. Finally, BLOG seems to be less suited than DC for real-time inference, because it is generally slower with higher variance.
7.2 Synthetic dynamic domains
We now answer questions Q2 and Q3 comparing the classical particle filter, DCPF and DBLOG in dynamic domains. In all dynamic experiments we disabled the query expansion because it is not necessary.
In this section we used a probabilistic Wumpus world [inspired by Russell and Norvig (2009)]. This is a discrete world with a two-dimensional grid of cells, each of which can be either free, a wall, or a pit. In one of the cells the horrible wumpus lives and each cell can contain gold. Each pit produces a breeze in the neighboring cells, and the wumpus produces a stench in the neighboring cells. The agent has to estimate the hidden state consisting of the wumpus’ location, the state of each cell (free, wall or pit), as well as its own position in the maze. The agent has four stochastic ‘move’ actions: up, down, left, right, which change the position by 1 cell or lead to no change with a particular probability. Furthermore, the agent has noisy sensors to observe whether there is a breeze, a stench, or gold in the cell, and whether there are walls in the neighboring cells. We assume that the agent starts from position (0,0), therefore the cell (0,0) is free.
In the Wumpus domain we use a lifted belief update or precomputed belief. For example, if the belief at time t for each cell not in the partial sample is \(\mathtt{{maze(X,Y)_t\sim finite([0.6:}}{} \mathtt{{free,0.4:wall])}}\) and the state transition is \(\mathtt{{maze(X,Y)_{t+1} \sim {val(\simeq \!\!(maze(X,Y)_t))}}}\) (the next cell state is equal to the current cell state), the belief update can be done once for all the cells that have not been sampled yet. In this case the belief remains the same over time; thus, we can directly define the belief at time t without doing a lifted belief update. If a cell \(\mathtt{{maze(x,y)_t}}\) is required it will be sampled, and the belief update for this cell will be performed by sampling. For non-sampled cells \(\mathtt{{ maze(X,Y)_t\sim finite([0.6:free,0.4:wall])}}\) is used instead. Lifted belief update and precomputed belief do not require that the size of the grid is specified.
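The precomputed belief for maze cells can be pictured with a small sketch in which a single default distribution stands for every cell that has not been sampled, so the grid size never needs to be fixed. The class `LazyMaze` and its methods are hypothetical names introduced for this illustration; only the distribution (0.6 free, 0.4 wall) and the static transition are taken from the example above.

```python
import random

# Belief shared by all cells that are not in the partial sample.
DEFAULT_BELIEF = {"free": 0.6, "wall": 0.4}


class LazyMaze:
    """Cells of one particle; a cell is sampled only when it is required."""

    def __init__(self, rng):
        self.rng = rng
        self.sampled = {}   # (x, y) -> "free" or "wall"

    def belief(self, x, y):
        """Belief for a cell: a point mass if sampled, the default otherwise."""
        if (x, y) in self.sampled:
            return {self.sampled[(x, y)]: 1.0}
        return DEFAULT_BELIEF

    def cell(self, x, y):
        """Return the cell state, sampling it from the default belief if needed."""
        if (x, y) not in self.sampled:
            states = list(DEFAULT_BELIEF)
            probs = [DEFAULT_BELIEF[s] for s in states]
            self.sampled[(x, y)] = self.rng.choices(states, weights=probs, k=1)[0]
        return self.sampled[(x, y)]
```

Because the cell state does not change over time in this example, the default belief is valid at every time step, and memory grows only with the number of cells that were actually required.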
Classical Particle Filter (Q2). The classical particle filter samples the entire state every step with a forward reasoning procedure.
In the first experiment (Wumpus1) there are no pits and no wumpuses. The goal is to compute the joint distribution of the state of 3 cells (\(\mathtt{{maze(0,2)_t}}\), \(\mathtt{{maze(1,1)_t}}\), \(\mathtt{{maze(2,0)_t}}\)) given noisy gold and cell observations. The experiment consists of 3 steps (to keep exact inference feasible) with one or two observations per step. In the classical particle filter, we need to limit the size of the grid in advance. In contrast, the DCPF estimates the borders of the maze itself. To measure the error between the predicted and the exact posteriors we use the total variation divergence (i.e., the sum of absolute differences averaged over runs). Figure 7a, b show that both algorithms provide correct results (Q1), but our DCPF produces lower errors and is faster when compared to the classical particle filter for the same number of samples (Fig. 7b). This is because DCPF reduces the number of tracked variables and therefore reduces both the variance in the sampling process and the execution time. As expected, the maze size of the classical particle filter affects the performance. The DCPF, by contrast, does not require a fixed grid size when using a precomputed belief or lifted belief update. This makes DCPF more flexible and faster in comparison (Q2).
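The error measure used in this experiment, the sum of absolute differences between the estimated and the exact posterior averaged over runs, can be sketched as below. The function names and the representation of a distribution as a dictionary from joint states to probabilities are assumptions for the example; the sum is left unscaled, following the description in the text.

```python
def sum_abs_diff(estimated, exact):
    """Sum of absolute differences between two discrete distributions."""
    keys = set(estimated) | set(exact)
    return sum(abs(estimated.get(k, 0.0) - exact.get(k, 0.0)) for k in keys)


def mean_error(estimated_runs, exact):
    """Average the per-run error over all runs."""
    return sum(sum_abs_diff(est, exact) for est in estimated_runs) / len(estimated_runs)
```

States missing from one of the dictionaries are treated as having probability zero, which matters when a particle approximation never visits a state that has positive exact probability.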
DBLOG (Q3). Because (D)BLOG cannot fully cope with continuous evidence as DCPF does, we now focus on a discrete case for a further comparison.
7.3 Real-world dynamic domains
The experiments shown so far are generated from synthetic data. We also ran experiments with real-world data (Q4). We considered two tracking scenarios called Packaging and Learnsize; the latter is used to evaluate parameter learning. The objects have markers for easy detection with a camera (Fig. 14).
7.3.1 Packaging scenario
 1. if an object is on top of another object, it cannot fall down;
 2. if there are no objects under an object, the object will fall down until it collides with another object or the table;
 3. an object may fall inside the box only if it is on the box in the previous step, is smaller than the box and the box is open-side up;
 4. if an object is inside a box it remains in the box and its position follows that of the box as long as it is open-side up;
 5. if the box is rotated upside down the objects inside will fall down with a certain probability.
In this scenario we can consider queries such as how many objects there are in the box with the respective probability for each object, or if there is a blue cube in the red box. The second query would be the conjunction: \(\mathtt{{type(A,cube),color(A,blue),}} \mathtt{{inside_t(A,B)}}{\sim }=\mathtt{{true,type(B,box),color(B,red)}}\). The answer is the probability that the query is true. Alternatively we can list all object pairs \(\mathtt{{ (A,B)}}\) that satisfy the query with the respective probability.
7.3.2 Learnsize scenario
Learnsize scenario results

| Algorithm | Correct | Avg error (cm) | Time per sample (ms) |
| --- | --- | --- | --- |
| Artificial dynamics | 27/30 | 0.7 | 1.6 |
| Storvik’s filter variation | 23/30 | 1.3 | 2.4 |
8 Related work
8.1 Frameworks
In this section we will review related frameworks for static and then dynamic inference.
The proposed static inference is related to BLOG (Milch et al. 2005) inference and to Monte Carlo inference used in ProbLog (Kimmig et al. 2008). As discussed in Sect. 7.2, BLOG is based on LW and lazy instantiation, like EvalSampleQuery. However, BLOG exploits LW only for simple evidence statements, thus it performs worse than DC with complex queries described in Sect. 3.3.2. Furthermore, many probabilistic languages [e.g., Anglican (Wood et al. 2014), Church (Goodman et al. 2008) and BLOG] are not always able to give an answer for complex queries with evidence containing continuous variables as shown in the experiments (for BLOG). In contrast, DC gives meaningful answers in those cases, exploiting LW in a larger set of cases.
For dynamic inference, DCPF is related to probabilistic programming languages such as BLOG, Church (Goodman et al. 2008), ProbLog (Kimmig et al. 2008), and the Distributional Clauses of Gutmann et al. (2011). While these languages are expressive enough to be used for modeling dynamic relational domains, they do not explicitly support filtering (BLOG excluded), which makes inference prohibitively slow or unreliable for dynamic models. Also worth mentioning is first-order logical filtering (e.g., see Shirazi and Amir 2011), the logical deterministic counterpart of probabilistic filtering. This method can inspire further DCPF extensions; nonetheless, the absence of a probabilistic framework and of continuous distributions makes it less suitable for the range of applications considered in this paper.
There exist probabilistic programming approaches for temporal models. A variant of BLOG for filtering in dynamic domains (de Salvo Braz et al. 2008) has been proposed; it instantiates the variables needed for inference, as BLOG does. However, as discussed in Sect. 7.2, DBLOG does not currently implement Step (2) of our filtering algorithm. This requires one to manually query all the variables that d-separate from the past or at least those that might be relevant for the given query. In contrast, DCPF automatically determines which variables to sample to guarantee d-separation, exploiting context-specific independences. This avoids backinstantiation with the possibility to use lifted belief updates and precomputed beliefs. Furthermore, all the considerations about LW in complex queries and continuous evidence are valid for the dynamic case.
Logical HMMs (Kersting et al. 2006) employ logical atoms as observations and states, and hence their expressivity is more limited. The lifted relational Kalman filter (Choi et al. 2011) performs efficient lifted exact inference for continuous dynamic domains, but it assumes linear Gaussian models. The relational particle filter of Manfredotti et al. (2010) cannot handle partial samples. Finally, the approaches that are most similar to ours are those of Zettlemoyer et al. (2007) and Probabilistic Relational Action Model (PRAM) (Hajishirzi and Amir 2008). The former employs first-order formulas to represent a set of states called hypothesis; these are similar to our partial worlds in that they represent a potentially infinite number of states. The key difference is that our approach explicitly defines random variables, (in)dependence assumptions, and their conditional distributions in relationship to other random variables, which allows us to efficiently compute the distribution of a random variable that needs to be sampled and added to the sample. In PRAM the filtering problem is converted into a deterministic first-order logic problem that can be solved using progression, regression and sampling. PRAM is mainly suited for relational domains that are inherently discrete and binary. In addition, PRAM performs regression of a formula from time t to time 0, which implies performance issues as previously discussed.
Furthermore, none of the frameworks of Thon et al. (2011), Kersting et al. (2006), Zettlemoyer et al. (2007), Hajishirzi and Amir (2008), Natarajan et al. (2008) supports continuous random variables (other than through discretization), therefore these techniques cannot deal with real-world applications in robotics. Discretization is not always a good solution, and it can dramatically increase the number of states, therefore it is unclear whether these algorithms would maintain good performance in such cases. Finally, their first-order logic representation allows only discrete and fixed probabilities (for the transition and measurement model), whereas DCPF provides a flexible language to represent continuous and discrete distributions that can be parameterized by other random variables or logical variables used in the body. This allows a compact model and faster inference.
Several improvements to classical particle filtering have been proposed, such as Rao–Blackwellization (Casella and Robert 1996) and Factored Particle Filtering (Pfeffer et al. 2009). The first method has been exploited in our approach, but further improvements are possible. Factored Particle Filters cluster the state space reducing the variance and improving the accuracy. These methods are complementary to our work and could be adapted in future work.
8.2 Applications
Some state estimation applications with a relational representation have been proposed. The relational particle filter of Cattelani et al. (2012) uses relations such as ‘walking together’ in people tracking to improve prediction and the tracking process. They divide the state into two sets, object attributes and relations, and make some assumptions to speed up inference. In their approach a relation can only be true or false. In contrast, our approach does not make a real distinction between attributes and relations: each random variable has a relational representation, regardless of its distribution (binary, discrete or continuous). This allows parametrization and template definition for any kind of random variable. Furthermore, our language and inference algorithm are more general, while keeping inference relatively fast. In addition, it is not clear whether their approach can support partial states and integrate background knowledge while maintaining good performance. A relational representation has been used by Meyer-Delius et al. (2008) for situation characterization over time. However, this work is based on HMMs and uses only binary relations (true or false). Interesting works have been proposed (Beetz et al. 2012; Tenorth and Beetz 2009) for manipulation tasks exploiting a relational representation. Those works integrate relational knowledge about the world to reason about the objects and perform complex tasks. For probabilistic inference and belief update they use Markov Logic Networks (MLNs). However, MLNs are arguably less efficient for filtering because they are undirected models that might require MCMC or the computation of the partition function at each step, even though recent optimizations have been proposed (Papai et al. 2012).
9 Conclusions
We proposed a flexible representation for hybrid relational domains and provided an efficient inference algorithm for static and dynamic models. This framework exploits the relational representation and (context-specific) independence assumptions to reduce the sample size (through partial worlds) and the inference cost.
The proposed static algorithm EvalSampleQuery exploits LW in a wider range of cases than systems such as BLOG, and supports complex queries with continuous variables, for which most related frameworks fail. These features carry over to dynamic domains, where DCPF calls EvalSampleQuery during filtering. At the same time, DCPF avoids backinstantiation to bound the space complexity and to reduce the variability of the running time. This makes DCPF particularly suited for online applications.
One of the advantages of a relational framework like DCPF is the flexibility and generality of the model with respect to a particular situation. Indeed, whenever a new object appears, all its properties and relations with other objects are implicitly defined (but not necessarily computed and added to the sample). In addition, the expressivity of the language helps to bridge the gap between robotics and the high-level symbolic representation used in Artificial Intelligence.
The static algorithm EvalSampleQuery and the DCPF filtering algorithm were empirically evaluated and applied in several synthetic and real-world scenarios. The results show that EvalSampleQuery outperforms naive MC and BLOG in static domains. For dynamic domains DCPF outperforms the classical particle filter and DBLOG for a small number of samples. The averaged DCPF error is lower than the DBLOG error for the same number of samples. Nonetheless, DBLOG seems to be faster for a large number of samples, which might be due to implementation differences.
The object tracking experiments show that DCPF is promising for robotics applications. The overall performance is acceptable, but could be improved to scale well to high-dimensional states. Indeed, each sample represents the entire state; therefore, inference can be computationally intensive for a large number of objects and relations. Nonetheless, DCPF exploits the structure of the model and partial samples to speed up inference and improve performance.
Finally, both learning strategies tested in this framework perform reasonably well for a limited number of parameters. More sophisticated strategies and offline methods need to be investigated for a higher number of parameters.
Footnotes
 1.
The indicator function is 1 when the argument is true, zero otherwise.
 2.
The Monte Carlo approximation replaces a distribution with an empirical distribution given by a set of (weighted) samples. If the distribution is continuous, the empirical distribution is described as a sum of Dirac delta functions centered at the samples.
 3.
 4.
According to the following bug report https://github.com/BayesianLogic/blog/issues/330.
 5.
This is valid for 3 or more objects; with 2 objects it is not possible to learn the objects' sizes from pushes. Indeed, any pair of objects with the same sum of sizes produces the same behaviour, because the distance between the objects during contact is the same.
References
 Andrieu, C., Doucet, A., & Tadic, V. B. (2005). Online parameter estimation in general state-space models. In Proceedings of the 44th IEEE conference on decision and control, 2005 and 2005 European control conference (CDC-ECC ’05), pp. 332–337.
 Apt, K. (1997). From logic programming to Prolog. Prentice-Hall international series in computer science. Upper Saddle River: Prentice Hall.
 Bancilhon, F., & Ramakrishnan, R. (1986). An amateur’s introduction to recursive query processing strategies. SIGMOD Record, 15, 16–51.
 Beetz, M., Jain, D., Mosenlechner, L., Tenorth, M., Kunze, L., & Blodow, N., et al. (2012). Cognition-enabled autonomous robot control for the realization of home chore task intelligence. Proceedings of the IEEE, 100(8), 2454–2471.
 Carvalho, C. M., Johannes, M. S., Lopes, H. F., & Polson, N. G. (2010). Particle learning and smoothing. Statistical Science, 25(1), 88–106.
 Carvalho, C. M., Lopes, H. F., Polson, N. G., & Taddy, M. A. (2010). Particle learning for general mixtures. Bayesian Analysis, 5(4), 709–740.
 Casella, G., & Robert, C. P. (1996). Rao-Blackwellisation of sampling schemes. Biometrika, 83(1), 81–94.
 Cattelani, L., Manfredotti, C., & Messina, E. (2012). A particle filtering approach for tracking an unknown number of objects with dynamic relations. Journal of Mathematical Modelling and Algorithms in Operations Research, 13(1), 3–21.
 Choi, J., Guzman-Rivera, A., & Amir, E. (2011). Lifted relational Kalman filtering. In Proceedings of the 22nd international joint conference on artificial intelligence (IJCAI 2011), pp. 2092–2099.
 De Raedt, L., Frasconi, P., Kersting, K., & Muggleton, S. (Eds.). (2008). Probabilistic inductive logic programming, theory and applications. Lecture notes in computer science, Vol. 4911. Berlin: Springer.
 de Salvo Braz, R., Arora, N., Sudderth, E., & Russell, S. (2008). Open-universe state estimation with DBLOG. In NIPS workshop on probabilistic programming: Universal languages, systems and applications.
 Doucet, A., de Freitas, N., Murphy, K., & Russell, S. (2000). Rao-Blackwellised particle filtering for dynamic Bayesian networks. In Proceedings of the 16th conference on uncertainty in artificial intelligence (UAI ’00) (pp. 176–183). Burlington: Morgan Kaufmann.
 Doucet, A., Godsill, S., & Andrieu, C. (2000). On sequential Monte Carlo sampling methods for Bayesian filtering. Statistics and Computing, 10(3), 197–208.
 Fung, R. M., & Chang, K. (1989). Weighing and integrating evidence for stochastic simulation in Bayesian networks. In Proceedings of the 5th conference on uncertainty in artificial intelligence (UAI 1989).
 Getoor, L., & Taskar, B. (2007). An introduction to statistical relational learning. Cambridge, MA: MIT Press.
 Gilks, W. R., & Berzuini, C. (2001). Following a moving target-Monte Carlo inference for dynamic Bayesian models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(1), 127–146.
 Goodman, N., Mansinghka, V. K., Roy, D. M., Bonawitz, K., & Tenenbaum, J. B. (2008). Church: A language for generative models. In Proceedings of the 24th conference on uncertainty in artificial intelligence (UAI 2008) (pp. 220–229). Edinburgh: AUAI Press.
 Gutmann, B., Thon, I., Kimmig, A., Bruynooghe, M., & De Raedt, L. (2011). The magic of logical inference in probabilistic programming. Theory and Practice of Logic Programming, 11, 663–680.
 Hajishirzi, H., & Amir, E. (2008). Sampling first order logical particles. In Proceedings of the 24th conference on uncertainty in artificial intelligence (UAI 2008) (pp. 248–255). Edinburgh: AUAI Press.
 Higuchi, T. (2001). Self-organizing time series model. In A. Doucet, N. Freitas, & N. Gordon (Eds.), Sequential Monte Carlo methods in practice, statistics for engineering and information science (pp. 429–444). New York: Springer.
 Kadane, J. (2011). Principles of uncertainty. Abingdon: Taylor & Francis.
 Kalman, R. (1960). A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82, 35–45.
 Kantas, N., Doucet, A., Singh, S. S., & Maciejowski, J. M. (2009). An overview of sequential Monte Carlo methods for parameter estimation in general state-space models. In 15th IFAC symposium on system identification (Vol. 15, pp. 774–785).
 Kersting, K., De Raedt, L., & Raiko, T. (2006). Logical hidden Markov models. Journal of Artificial Intelligence Research, 25, 425–456.
 Kimmig, A., Santos Costa, V., Rocha, R., Demoen, B., & De Raedt, L. (2008). On the efficient execution of ProbLog programs. In Proceedings of the 24th conference on logic programming (ICLP 2008). Lecture notes in computer science (Vol. 5366, pp. 175–189). Berlin: Springer.
 Kitagawa, G. (1996). Monte Carlo filter and smoother for non-Gaussian nonlinear state space models. Journal of Computational and Graphical Statistics, 5(1), 1–25.
 Koller, D., & Friedman, N. (2009). Probabilistic graphical models: Principles and techniques. Adaptive computation and machine learning. Cambridge, MA: MIT Press.
 Lemieux, C. (2009). Monte Carlo and Quasi-Monte Carlo sampling (Vol. 20). Berlin: Springer.
 Lloyd, J. (1987). Foundations of logic programming. New York: Springer.
 Lloyd, J., & Shepherdson, J. (1991). Partial evaluation in logic programming. The Journal of Logic Programming, 11(3–4), 217–242.
 Lopes, H. F., Carvalho, C. M., Johannes, M., & Polson, N. G. (2010). Particle learning for sequential Bayesian computation. In J. M. Bernardo, M. J. Bayarri, J. O. Berger, A. P. Dawid, D. Heckerman, A. F. M. Smith, & M. West (Eds.), Bayesian statistics 9 (Vol. 9, pp. 317–360). Oxford: Oxford University Press.
 Manfredotti, C. E., Fleet, D. J., Hamilton, H. J., & Zilles, S. (2010). Relational particle filtering. NIPS Workshop on Monte Carlo methods for modern applications, December 2010.
 Meyer-Delius, D., Plagemann, C., Wichert, G., Feiten, W., Lawitzky, G., & Burgard, W. (2008). A probabilistic relational model for characterizing situations in dynamic multiagent systems. In Data analysis, machine learning and applications. Berlin: Springer.
 Milch, B., Marthi, B., Russell, S., Sontag, D., Ong, D., & Kolobov, A. (2005). BLOG: Probabilistic models with unknown objects. In Proceedings of the 19th international joint conference on artificial intelligence (IJCAI 2005), pp. 1352–1359.
 Milch, B., Marthi, B., Sontag, D., Russell, S., Ong, D. L., & Kolobov, A. (2005). Approximate inference for infinite contingent Bayesian networks. In R. G. Cowell, Z. Ghahramani (Eds.), Proceedings of the 10th international workshop on artificial intelligence and statistics (AISTATS 2005) (pp. 238–245). Society for Artificial Intelligence and Statistics.
 Milch, B. C. (2006). Probabilistic models with unknown objects. Ph.D. thesis, University of California, Berkeley.
 Murphy, K. P. (2002). Dynamic Bayesian networks: Representation, inference and learning. Ph.D. thesis, University of California, Berkeley.
 Natarajan, S., Bui, H. H., Tadepalli, P., Kersting, K., & Keen Wong, W. (2008). Logical hierarchical hidden Markov models for modeling user activities. In Proceedings of the 18th international conference in inductive logic programming (ILP 2008). Lecture notes in computer science (Vol. 5194, pp. 192–209).
 Ng, B., Peshkin, L., & Pfeffer, A. (2002). Factored particles for scalable monitoring. In Proceedings of the 18th conference on uncertainty in artificial intelligence (UAI 2002) (pp. 370–377). Burlington: Morgan Kaufmann.
 Nilsson, U., & Małuszyński, J. (1995). Logic, programming and Prolog (2nd ed.). New York: Wiley.
 Nitti, D., Chliveros, G., De Raedt, L., Pateraki, M., Hourdakis, M., & Trahanias, P. (2014a). Application of dynamic distributional clauses for multi-hypothesis initialization in model-based object tracking. In 9th International conference on computer vision theory and applications (VISAPP 2014), Vol. 2.
 Nitti, D., De Laet, T., & De Raedt, L. (2013). A particle filter for hybrid relational domains. In Proceedings of the international conference on intelligent robots and systems (IROS 2013), pp. 2764–2771.
 Nitti, D., De Laet, T., & De Raedt, L. (2014b). Distributional clauses particle filter. In Machine learning and knowledge discovery in databases, lecture notes in computer science (Vol. 8726, pp. 504–507). Berlin: Springer.
 Nitti, D., De Laet, T., & De Raedt, L. (2014c). Relational object tracking and learning. In Proceedings of the International conference on robotics and automation (ICRA 2014).
 Owen, A. B. (2013). Monte Carlo theory, methods and examples. http://statweb.stanford.edu/~owen/mc/
 Papai, T., Kautz, H., & Stefankovic, D. (2012). Slice normalized dynamic Markov logic networks. In Advances in neural information processing systems (NIPS 2012) (pp. 1907–1915).
 Perov, Y., Paige, B., & Wood, F. The Indian GPA problem. Retrieved February 23, 2016, from http://www.robots.ox.ac.uk/~fwood/anglican/examples/viewer/?worksheet=indiangpa.
 Pfeffer, A., Das, S., Lawless, D., & Ng, B. (2009). Factored reasoning for monitoring dynamic team and goal formation. Information Fusion, 10(1), 99–106.
 Pitt, M. K., & Shephard, N. (1999). Filtering via simulation: Auxiliary particle filters. Journal of the American Statistical Association, 94(446), 590–599.
 Przymusinski, T. C. (1988). Perfect model semantics. In Proceedings of the 5th international conference on logic programming and symposium (ICLP/SLP 1988), pp. 1081–1096.
 Robert, C., & Casella, G. (2004). Monte Carlo statistical methods. Springer texts in statistics. New York: Springer.
 Russell, S., & Norvig, P. (2009). Artificial intelligence: A modern approach (3rd ed.). Englewood Cliffs, NJ: Prentice Hall.
 Sato, T. (1995). A statistical learning method for logic programs with distribution semantics. In Proceedings of the 12th international conference on logic programming (ICLP 1995) (pp. 715–729). Cambridge, MA: MIT Press.
 Shirazi, A., & Amir, E. (2011). First-order logical filtering. Artificial Intelligence, 175(1), 193–219.
 Storvik, G. (2002). Particle filters for state-space models with the presence of unknown static parameters. IEEE Transactions on Signal Processing, 50(2), 281–289.
 Tenorth, M., & Beetz, M. (2009). KnowRob - Knowledge processing for autonomous personal robots. In Proceedings of the IEEE/RSJ international conference on intelligent robots and systems (IROS 2009), pp. 4261–4266.
 Thon, I., Landwehr, N., & De Raedt, L. (2011). Stochastic relational processes: Efficient inference and applications. Machine Learning, 82(2), 239–272.
 Thrun, S., Burgard, W., & Fox, D. (2005). Probabilistic robotics. Cambridge, MA: MIT Press. ISBN: 0262201623.
 Whiteley, N., & Johansen, A. M. (2010). Recent developments in auxiliary particle filtering. Barber, C., & Chiappa, (Eds.), Inference and learning in dynamic models (pp. 38, 39–47). Cambridge: Cambridge University Press.
 Wood, F., van de Meent, J. W., & Mansinghka, V. (2014). A new approach to probabilistic programming inference. In Proceedings of the 17th international conference on artificial intelligence and statistics (AISTATS 2014), pp. 1024–1032.
 Zettlemoyer, L. S., Pasula, H. M., & Kaelbling, L. P. (2007). Logical particle filtering. In Proceedings of the Dagstuhl seminar on probabilistic, logical, and relational learning.