k-Optimal: a novel approximate inference algorithm for ProbLog
DOI: 10.1007/s10994-012-5304-9
 Cite this article as:
 Renkens, J., Van den Broeck, G. & Nijssen, S. Mach Learn (2012) 89: 215. doi:10.1007/s10994-012-5304-9
Abstract
ProbLog is a probabilistic extension of Prolog. Given the complexity of exact inference under ProbLog’s semantics, approximate inference is necessary in many machine learning applications. Current approximate inference algorithms for ProbLog, however, either require dealing with large numbers of proofs or do not guarantee a low approximation error. In this paper we introduce a new approximate inference algorithm which addresses these shortcomings. Given a user-specified parameter k, this algorithm approximates the success probability of a query based on at most k proofs and ensures that the calculated probability p satisfies (1−1/e)p^{∗} ≤ p ≤ p^{∗}, where p^{∗} is the highest probability that can be calculated based on any set of k proofs. Furthermore, a useful feature of the set of calculated proofs is that it is diverse. Our experiments show the utility of the proposed algorithm.
Keywords
Inductive logic programming · Statistical relational learning · Decision theory · Approximative inference
1 Introduction
ProbLog (De Raedt et al. 2007) is a probabilistic extension of Prolog. It has been used to solve learning problems in probabilistic networks as well as other types of probabilistic data (De Raedt et al. 2010). The key feature of ProbLog is its distribution semantics. Each fact in a ProbLog program can be annotated with the probability that this fact is true in a random sample from the program. The success probability of a query is equal to the probability that the query succeeds in a sample from the program, where facts are sampled independently from each other. Each such sample is also called a possible world.
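The distribution semantics described above can be illustrated by brute force. The following is a minimal sketch, not ProbLog itself: the four-edge network, its probabilities and the function names are assumptions made for illustration. It enumerates every possible world, weights it by the product of the fact probabilities, and sums the weights of the worlds in which a path query succeeds.

```python
from itertools import product

# Hypothetical probabilistic facts: directed edges annotated with probabilities.
edges = {(1, 2): 0.6, (2, 4): 0.6, (1, 3): 0.5, (3, 4): 0.9}

def reachable(present, source, target):
    """Check whether target can be reached from source using only the present edges."""
    frontier, seen = [source], {source}
    while frontier:
        node = frontier.pop()
        if node == target:
            return True
        for (a, b) in present:
            if a == node and b not in seen:
                seen.add(b)
                frontier.append(b)
    return False

def success_probability(edges, source, target):
    """Sum the probabilities of all possible worlds in which the path query succeeds."""
    facts = list(edges)
    total = 0.0
    for choice in product([True, False], repeat=len(facts)):
        # Facts are sampled independently, so a world's weight is a product.
        weight = 1.0
        for fact, included in zip(facts, choice):
            weight *= edges[fact] if included else 1.0 - edges[fact]
        present = [fact for fact, included in zip(facts, choice) if included]
        if reachable(present, source, target):
            total += weight
    return total

print(success_probability(edges, 1, 4))
```

The enumeration is exponential in the number of facts, which is the reason the remainder of the paper works with proofs and BDDs instead.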
The main problem in calculating the success probability of a query in ProbLog is the high computational complexity of exact inference. As multiple proofs for a query can be true in a possible world, we cannot calculate the success probability of a query based on the probabilities of the independent proofs; we need to deal with a disjoint sum problem (De Raedt et al. 2007). This problem becomes worse as the number of proofs grows.
To deal with this computational issue, several approaches have been proposed in the past. De Raedt et al. (2007) proposed to use Binary Decision Diagrams (BDDs) (Bryant 1986) to deal with the disjoint sum problem. BDDs can be seen as a representation of proofs from which the required success probability can be calculated in time polynomial in the size of the BDD. Building a BDD for all proofs can however be intractable. De Raedt et al. showed that for a given desired approximation factor ϵ, an iterative deepening algorithm can be used to approximate the success probability from a subset of proofs. However, to reach reasonable approximation errors in practice, this algorithm still needs to compile large numbers of proofs into a BDD (De Raedt et al. 2007).
A commonly used alternative which does not have this disadvantage is the k-best strategy. In this case, the k most likely proofs are searched for, where k is a user-specified parameter; a BDD is constructed based on these proofs only. Whereas this strategy avoids compiling many proofs, its disadvantage is that one has few guarantees with respect to the quality of the calculated success probability: it is not clear whether any other set of k proofs would achieve a better approximation, or how far the calculated probability is from its true value.
A further disadvantage of k-best is found in parameter learning for ProbLog programs (Gutmann et al. 2008, 2011) and the use of ProbLog in solving probabilistic decision problems (Van den Broeck et al. 2010). An example of such a decision problem is targeted advertising in social networks: in order to reduce advertising cost, one wishes to identify a small subset of nodes in a social network such that the expected number of people reached is maximized. Such problems can be solved by defining an optimization problem on top of a set of BDDs. To ensure that these calculations remain tractable, it is important that the BDDs are as small as possible but still represent the most relevant proofs. As we will show, the k-best strategy selects a set of proofs that is not optimal for this purpose. Intuitively the reason is that the k-best proofs can be highly redundant with respect to each other.
In this paper we propose a new algorithm, k-optimal, for finding a set of at most k proofs. The key distinguishing feature with respect to k-best is that it ensures that the set of k proofs found is of provably good quality. In particular, if p^{∗} is the best probability that can be calculated based on k proofs, our algorithm will not calculate a probability that is worse than (1−1/e)p^{∗}. At the same time, our algorithm ensures that the resulting set of proofs is diverse, i.e., proofs are less likely to use similar facts; the resulting set of proofs carries more of the probability mass of the exact success probability. We will empirically show that using k-optimal leads to significant memory gains and can lead to reduced runtime for similar or better approximations of the true probability. Furthermore we will show that using k-optimal for inference in DTProbLog can improve the results over the use of k-best, by reducing runtime for sufficiently high k values.
The remainder of this paper is organized as follows: Sect. 2 introduces ProbLog and DTProbLog and discusses the drawbacks of k-best; Sect. 3 introduces the new algorithm; Sect. 4 proves the quality of the resulting set of proofs; Sect. 5 reports on some experiments and finally Sect. 6 concludes.
2 Background
We will now introduce ProbLog and describe how to compute the probability of a query using BDDs, both exactly and approximately with the k-best algorithm.
2.1 ProbLog
Example 1
Computing the probability of a query q is performed by finding all proofs for the query using SLD-resolution in the ProbLog program. Each proof relies on the conjunction of probabilistic facts that need to be true to prove q. From now on we will use the term ‘proof’ also for this conjunction. The disjunction of these conjunctions is a DNF formula that represents all proofs, that is, all conditions under which q is true.
Example 2
This reduces the problem of computing the probability of the query q to that of computing the probability of a DNF formula, i.e. \(\operatorname{P}(q)=\operatorname{P}(\bigvee_{p\in V} p)\), where V is the set of proofs for query q; \(\operatorname{P}(\bigvee_{p\in V}p) = \sum_{F \subseteq \mathcal{GF}} \operatorname{P}(\bigvee_{p\in V}p \mid F) \cdot \operatorname{P}(F)\). However, because the conjunctions in the different proofs are not mutually exclusive, one cannot compute the probability of the DNF formula as the sum of the probabilities of the different conjunctions as this could lead to values larger than one.
Therefore, the next step in the evaluation addresses this disjoint-sum problem by constructing a Binary Decision Diagram (BDD) (Bryant 1986) that represents the DNF formula. A BDD is an efficient graphical representation for Boolean formulas. It can be seen as a decision tree in which identical subtrees are shared by multiple parents. The probability of a DNF can be calculated from a BDD by traversing the BDD bottom-up. Each internal node represents a probabilistic fact; the probability for a node is computed by combining the probabilities of its children; for details see the paper of De Raedt et al. (2007).
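The node-by-node combination used on a BDD is a Shannon expansion: the probability of a formula is the probability of the fact times the probability of the "fact true" branch, plus the complement times the "fact false" branch. The sketch below applies that expansion directly to a DNF, with memoization standing in for the sharing of identical subtrees. It does not build a real BDD and uses no BDD library; the function name, fact names and probabilities are illustrative assumptions.

```python
from functools import lru_cache

def dnf_probability(proofs, prob):
    """Probability of a DNF (list of sets of independent facts) via Shannon expansion."""
    order = sorted({fact for proof in proofs for fact in proof})

    @lru_cache(maxsize=None)
    def expand(dnf, index):
        if not dnf:
            return 0.0  # no proof can succeed any more
        if frozenset() in dnf:
            return 1.0  # some proof has all of its facts satisfied
        fact = order[index]
        # Fact true: remove it from every proof that needs it.
        high = frozenset(proof - {fact} for proof in dnf)
        # Fact false: every proof that needs it is dropped.
        low = frozenset(proof for proof in dnf if fact not in proof)
        return (prob[fact] * expand(high, index + 1)
                + (1.0 - prob[fact]) * expand(low, index + 1))

    return expand(frozenset(frozenset(proof) for proof in proofs), 0)

# Two proofs sharing the fact "a": summing 0.45 + 0.45 would overcount.
prob = {"a": 0.5, "b": 0.9, "c": 0.9}
proofs = [{"a", "b"}, {"a", "c"}]
print(dnf_probability(proofs, prob))
```

The shared fact is counted once by the expansion, whereas adding the two proof probabilities would count the worlds in which both proofs hold twice.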
2.2 k-Best
Computing the success probability of a query as described in Sect. 2.1 can fail for two reasons. First, finding all proofs for the query can be intractable. Second, compiling the set of proofs into a BDD solves a #P-complete problem, which quickly becomes intractable when the number of proofs increases. The k-best inference algorithm solves these problems by only considering a fixed number of k proofs in the calculation of the approximate success probability. The effect of ignoring some of the proofs is that the query is evaluated to true only in a subset of the possible worlds where the query should succeed. Therefore, the k-best strategy computes a lower bound on the success probability.
Example 3
The three best proofs in the network of Example 1 are (e(1,2)∧e(2,100)), (e(1,3)∧e(3,50)∧e(50,100)) and (e(1,3)∧e(3,51)∧e(51,100)), with probabilities 0.6^{2}=0.36 for the first and 0.5⋅0.9^{2}=0.405 for each of the other two. When the search algorithm reaches node four of the network, p_{min} is 0.36, but the partial proof that leads to node four in the network only has probability 0.01. Therefore, the algorithm does not continue to search for the proof (e(1,4)∧e(4,52)∧e(52,100)).
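The pruning in this example can be sketched as a depth-first branch-and-bound over paths. This is an illustration only and not the algorithm listing of the paper: the individual edge probabilities are assumptions chosen so that the proof probabilities match the values quoted above (0.36, 0.405 and 0.01 for the partial proof to node four), and `k_best_paths` is a hypothetical name.

```python
import heapq

# Assumed edge probabilities, consistent with the proof probabilities in the example.
edges = {
    (1, 2): 0.6, (2, 100): 0.6,
    (1, 3): 0.5, (3, 50): 0.9, (50, 100): 0.9, (3, 51): 0.9, (51, 100): 0.9,
    (1, 4): 0.01, (4, 52): 0.1, (52, 100): 0.01,
}

def k_best_paths(edges, source, target, k):
    """Return up to k most probable acyclic paths as (probability, path) pairs."""
    best = []  # min-heap; once it holds k entries, best[0][0] plays the role of p_min

    def search(node, path, visited, probability):
        # Extending a partial proof can only lower its probability, so a partial
        # proof that does not beat p_min can be abandoned.
        if len(best) == k and probability <= best[0][0]:
            return
        if node == target:
            heapq.heappush(best, (probability, path))
            if len(best) > k:
                heapq.heappop(best)
            return
        for (a, b), p in edges.items():
            if a == node and b not in visited:
                search(b, path + ((a, b),), visited | {b}, probability * p)

    search(source, (), frozenset([source]), 1.0)
    return sorted(best, reverse=True)

result = k_best_paths(edges, 1, 100, 3)
for probability, path in result:
    print(round(probability, 6), path)
```

With k = 3 the branch through node four is cut as soon as its partial probability of 0.01 is compared against the current threshold.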
k-Best’s ability to compile small, approximate BDDs is especially useful in the context of DTProbLog and parameter learning for ProbLog programs (Gutmann et al. 2008, 2011). These algorithms build and store BDDs for a large number of queries. It is essential that these BDDs are small for the algorithms to be effective. Although k-best guarantees small BDDs, it does not guarantee good approximations. When selecting proofs based on their probability, it is impossible to detect redundancy between the proofs. This can result in the selection of a highly redundant set of proofs which do not approximate the probability in an optimal way.
2.3 DTProbLog
 Given:

(1) a set of utility queries \(\mathcal{U}\) where each query has a reward given by a function \(u:\mathcal{U} \rightarrow \mathbb{R}\); (2) a set of boolean decision facts \(\mathcal{D}\), for each of which a value from {0,1} has to be determined; (3) a ProbLog program \(\mathcal{PF}, \mathcal{R}\).
 Find:

a decision for each decision fact \(s:\mathcal{D} \rightarrow \{0,1\}\).
 Such that:

the score given by \(\sum_{g \in \mathcal{U}} \operatorname {P}(g)u(g)\) is maximal.
 Where:

the probabilities are calculated using the ProbLog program \(\mathcal{PF} \cup \{s(d)::d \mid d \in \mathcal{D}\},\mathcal{R}\); note that we can see binary decisions as zero/one probability assignments.
DTProbLog offers multiple algorithms for finding the solution to these problems. One algorithm, which gives the globally optimal solution, calculates the BDDs for each of the utility queries and combines these BDDs into an Algebraic Decision Diagram (ADD) (Van den Broeck et al. 2010). From this ADD, the solution can be found in polynomial time. This approach can become intractable when the BDDs for the utility queries become too large. A second algorithm uses greedy hill climbing to find an approximate solution. The calculations are sped up by calculating the BDDs for each utility query once, and reusing them for each evaluation of the score. As only the probabilities of the decision facts change, the structure of the BDDs stays the same; after building the BDDs once, the rest of the score evaluations can be done in polynomial time.
For both algorithms, it is important that the BDDs stay small enough. The construction of the ADD quickly becomes intractable when the BDDs become large and even for the approximate algorithm, the BDDs for the goals need to fit in memory. Small BDDs can be obtained using approximative inference algorithms.
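The greedy hill climbing variant can be sketched as follows. This is a simplified illustration and not DTProbLog's implementation: it uses a single start without random restarts, it re-evaluates each query by enumerating worlds instead of reusing a compiled BDD, and the advertising facts, rewards and function names are hypothetical. The proofs of each utility query are kept fixed, and only the zero/one weights of the decision facts change between score evaluations.

```python
from itertools import product

def dnf_prob(proofs, prob):
    """Probability that at least one proof (a set of independent facts) holds."""
    facts = sorted({f for p in proofs for f in p})
    total = 0.0
    for values in product([True, False], repeat=len(facts)):
        world = dict(zip(facts, values))
        if any(all(world[f] for f in p) for p in proofs):
            weight = 1.0
            for f in facts:
                weight *= prob[f] if world[f] else 1.0 - prob[f]
            total += weight
    return total

def expected_utility(decisions, prob, utilities):
    """Score: sum over utility queries of P(query) times its reward."""
    weights = dict(prob)
    # A binary decision is treated as a fact with probability 0 or 1.
    weights.update({d: float(v) for d, v in decisions.items()})
    return sum(reward * dnf_prob(proofs, weights) for proofs, reward in utilities)

def hill_climb(decision_facts, prob, utilities):
    """Flip one decision at a time and keep a flip only if the score improves."""
    decisions = {d: 0 for d in decision_facts}
    best = expected_utility(decisions, prob, utilities)
    improved = True
    while improved:
        improved = False
        for d in decision_facts:
            decisions[d] = 1 - decisions[d]
            candidate = expected_utility(decisions, prob, utilities)
            if candidate > best + 1e-12:
                best, improved = candidate, True
            else:
                decisions[d] = 1 - decisions[d]  # undo the flip
    return decisions, best

# Hypothetical instance: advertising costs 3, a purchase is worth 10.
prob = {"likes(a)": 0.5, "likes(b)": 0.2}
utilities = [
    ([frozenset({"ad(a)", "likes(a)"})], 10.0),
    ([frozenset({"ad(b)", "likes(b)"})], 10.0),
    ([frozenset({"ad(a)"})], -3.0),
    ([frozenset({"ad(b)"})], -3.0),
]
decisions, score = hill_climb(["ad(a)", "ad(b)"], prob, utilities)
print(decisions, score)
```

Because every score evaluation touches every utility query, the size of the per-query representation directly determines the cost of the search, which is why small sets of proofs matter here.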
3 k-Optimal
We will introduce k-optimal, which is the main contribution of this paper. It is an approximative algorithm for calculating the success probability, similar to k-best. First the intuition behind the algorithm is explained after which a straightforward implementation is given. Later on, optimizations for the implementation are described which speed up k-optimal significantly.
3.1 Intuition
 Given:

the collection V of all possible proofs for a goal and a maximum number k of proofs which can be used.
 Find:

a collection \(A = \operatorname{arg\,max}_{B \subseteq V, |B| \leq k} \operatorname{P}(\bigvee_{p \in B} p)\).
Example 4
We compare the results obtained by 2-best and 2-optimal on the network of Example 1. 2-Best selects proofs (e(1,3)∧e(3,50)∧e(50,100)) and (e(1,3)∧e(3,51)∧e(51,100)) because they have the highest proof probability. This results in a probability of \(\operatorname{P}(\mathtt{e}(1,3) \land \mathtt{e}(3,50) \land \mathtt{e}(50,100)\lor \mathtt{e}(1,3) \land \mathtt{e}(3,51) \land \mathtt{e}(51,100)) = 0.48195\). The first iteration of 2-optimal will select (e(1,3)∧e(3,50)∧e(50,100)) because in this iteration, the added probability is equal to the proof probability. In the second iteration however, a different proof will be selected. The added probability of (e(1,2)∧e(2,100)) is higher than the added probability of (e(1,3)∧e(3,51)∧e(51,100)) even though the proof probability is lower. Because the two selected proofs share no facts, this results in a probability of \(\operatorname{P}(\mathtt{e}(1,3) \land \mathtt{e}(3,50) \land \mathtt{e}(50,100)\lor \mathtt{e}(1,2) \land \mathtt{e}(2,100)) = 1-(1-0.405)(1-0.36) = 0.6192\), which is higher than the one calculated by 2-best.
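The comparison in this example can be reproduced with a small sketch that works on an explicit list of proofs. It is an illustration under stated assumptions: the edge probabilities are chosen to match the proof probabilities 0.36 and 0.405 given in Example 3, all proofs are assumed to be known in advance (which the actual algorithm avoids), and `dnf_prob`, `k_best` and `k_optimal` are hypothetical helper names.

```python
from itertools import product

def dnf_prob(proofs, prob):
    """Probability that at least one proof (a set of independent facts) holds."""
    facts = sorted({f for p in proofs for f in p})
    total = 0.0
    for values in product([True, False], repeat=len(facts)):
        world = dict(zip(facts, values))
        if any(all(world[f] for f in p) for p in proofs):
            weight = 1.0
            for f in facts:
                weight *= prob[f] if world[f] else 1.0 - prob[f]
            total += weight
    return total

def k_best(proofs, prob, k):
    """Select the k proofs with the highest individual proof probability."""
    return sorted(proofs, key=lambda p: dnf_prob([p], prob), reverse=True)[:k]

def k_optimal(proofs, prob, k):
    """Greedily add the proof that increases the probability of the set the most."""
    selected = []
    for _ in range(k):
        candidates = [p for p in proofs if p not in selected]
        if not candidates:
            break
        # Maximizing P(A ∪ {p}) is the same as maximizing the added probability,
        # because P(A) is constant within one iteration.
        selected.append(max(candidates, key=lambda p: dnf_prob(selected + [p], prob)))
    return selected

# Assumed edge probabilities for the three significant proofs.
prob = {"e(1,2)": 0.6, "e(2,100)": 0.6, "e(1,3)": 0.5,
        "e(3,50)": 0.9, "e(50,100)": 0.9, "e(3,51)": 0.9, "e(51,100)": 0.9}
proofs = [
    frozenset({"e(1,2)", "e(2,100)"}),
    frozenset({"e(1,3)", "e(3,50)", "e(50,100)"}),
    frozenset({"e(1,3)", "e(3,51)", "e(51,100)"}),
]
print(dnf_prob(k_best(proofs, prob, 2), prob))
print(dnf_prob(k_optimal(proofs, prob, 2), prob))
```

The two proofs chosen by `k_best` share the fact e(1,3), so the second one contributes little; the greedy selection prefers the proof with disjoint facts.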
There are several problems that complicate implementing this algorithm. These are addressed in the following section.
3.2 Implementation
Collecting all possible proofs before starting the greedy algorithm is undesirable, as collecting these proofs can be intractable. In k-best this problem is solved by a branch-and-bound search of the SLD-tree. We can use a similar solution in the k-optimal greedy algorithm. In each iteration we start a branch-and-bound search with a modified scoring function. Compared to 1-best, we now optimize \(\operatorname{P}(A\cup\{\mathit{pr}\})\) instead of \(\operatorname{P}(\mathit{pr})\). Because for any pair of (partial) proofs pr⊂pr′, \(\operatorname{P}(A\cup\{\mathit{pr}\})\geq \operatorname{P}(A\cup\{\mathit{pr}'\})\), we eliminate branches for which \(\operatorname{P}(A\cup\{\mathit{pr}\})\) is not sufficiently high. Hence, 1-best is modified by replacing the calculation of \(\operatorname{P}(\mathit{pr})\) in lines 11 to 13 of Algorithm 1 with a calculation of \(\operatorname{P}(A \cup \{\mathit{pr}\}) - \operatorname{P}(A)\).
Overall, a basic strategy is hence to compile the BDD for the set of proofs A at the beginning of each search iteration. For each visited node of the SLD-tree we calculate the added probability, \(\operatorname{P}(A\cup\{p\}) - \operatorname{P}(A)\), where p is the partial proof that is used to reach the node. Every calculation of the added probability requires the calculation of n conditional probabilities, where n is the number of facts in p.
Even though we avoid building a BDD in every node of the search tree, calculating the conditional probability remains a time-consuming operation. The next section introduces optimizations to reduce the number of conditional probabilities that need to be calculated.
3.3 Optimizations
A first optimization results in the calculation of only one conditional probability per node of the SLD-tree. We can make the following observation: when the added probability for the (partial) proof f_{1}∧⋯∧f_{n}∧f_{n+1} needs to be calculated, the added probability for f_{1}∧⋯∧f_{n} has already been calculated in the parent of the node. This means that the conditional probabilities \(\{\operatorname{P}(\mathit{dnf} \mid f_{1} \land \dots \land f_{i-1} \land\lnot f_{i}) \mid 1 \leq i \leq n\}\) have already been calculated and the only conditional probability that needs to be calculated in order to calculate the added probability is \(\operatorname{P}(\mathit{dnf} \mid f_{1} \land \dots \land f_{n} \land\lnot f_{n+1})\).
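One way to see why these particular conditional probabilities suffice is the following short derivation. It is a reconstruction under the assumptions that facts are independent, that dnf denotes the disjunction of the proofs already in A, and that pr = f_{1}∧⋯∧f_{n}; the event names E_{i} are introduced here for illustration only. The events \(E_i = f_1 \land \dots \land f_{i-1} \land \lnot f_i\) for 1 ≤ i ≤ n, together with pr itself, partition the possible worlds, and pr is false in every E_{i}. Therefore
$$\operatorname{P}(\mathit{dnf}) = \sum_{i=1}^{n} \operatorname{P}(E_i)\,\operatorname{P}(\mathit{dnf} \mid E_i) + \operatorname{P}(\mathit{pr})\,\operatorname{P}(\mathit{dnf} \mid \mathit{pr}),$$
$$\operatorname{P}(\mathit{dnf} \lor \mathit{pr}) = \sum_{i=1}^{n} \operatorname{P}(E_i)\,\operatorname{P}(\mathit{dnf} \mid E_i) + \operatorname{P}(\mathit{pr}).$$
Subtracting the first line from the second gives the added probability
$$\operatorname{P}(\mathit{dnf} \lor \mathit{pr}) - \operatorname{P}(\mathit{dnf}) = \operatorname{P}(\mathit{pr})\,\bigl(1 - \operatorname{P}(\mathit{dnf} \mid \mathit{pr})\bigr).$$
Extending pr with one more fact f_{n+1} leaves the terms for E_{1},…,E_{n} unchanged and adds exactly one new event, \(E_{n+1} = f_1 \land \dots \land f_n \land \lnot f_{n+1}\), which is consistent with the claim that a single new conditional probability is needed per node.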
This modified strategy requires more extensive bookkeeping, as we can no longer assume that we have calculated an added probability for the parent of a complete proof in the search tree. By caching the results of intermediate added probability calculations, however, we can still share the calculations between multiple complete proofs.
Furthermore we can optimize the calculation by changing the order in which facts in a proof are considered. When the fact f_{n} does not occur in the BDD, \(\operatorname{P}(\mathit{dnf} \mid f_{1} \land \dots \land f_{n}) = \operatorname{P}(\mathit{dnf} \mid f_{1} \land \dots \land f_{n-1})\). Moving these facts to the front of the proof will result in the first conditional probabilities being equal to \(\operatorname{P}(\mathit{dnf})\), which has been calculated in a previous iteration.
As in any branch-and-bound algorithm it is important to quickly find a good bound on the best added probability that can be reached. In k-optimal, this is solved in the following way. In the first iteration we reuse the strategy of k-best. In further iterations, the bound is initialized based on the proofs that have been encountered in previous iterations, as follows. In each iteration, whenever the added probability for a complete proof is calculated, the proof as well as its added probability are stored. Before starting the SLD-tree search in the next iteration, we traverse these stored proofs in decreasing order of added probability in the previous iteration. For each proof we recalculate its added probability in the new iteration (caching any intermediate results for later use during the SLD search). During this traversal we in practice quickly find increasingly better added probability thresholds, which allow us to eliminate later proofs in the list as well as proofs later in the SLD search. Here we exploit the observation that the added probability for a proof pr cannot increase in a later iteration. Although proofs are stored, SLD search is still necessary in each iteration as partial proofs which previously have been pruned can be relevant in later iterations.
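The traversal of stored proofs can be sketched as a lazy greedy selection. The sketch is a simplification: it assumes every proof is already stored, so the SLD search that the text says is still required is left out, and the names `lazy_k_optimal` and `stored` as well as the edge probabilities are illustrative assumptions. A stored added probability from an earlier iteration is used as an upper bound, so the traversal stops as soon as the next stored value cannot beat the best value recomputed so far.

```python
from itertools import product

def dnf_prob(proofs, prob):
    """Probability that at least one proof (a set of independent facts) holds."""
    facts = sorted({f for p in proofs for f in p})
    total = 0.0
    for values in product([True, False], repeat=len(facts)):
        world = dict(zip(facts, values))
        if any(all(world[f] for f in p) for p in proofs):
            weight = 1.0
            for f in facts:
                weight *= prob[f] if world[f] else 1.0 - prob[f]
            total += weight
    return total

def lazy_k_optimal(proofs, prob, k):
    """Greedy selection that reuses added probabilities from earlier iterations."""
    selected, current = [], 0.0
    # Before anything is selected, the added probability is the proof probability.
    stored = {p: dnf_prob([p], prob) for p in proofs}
    for _ in range(k):
        best, best_gain = None, 0.0
        for p in sorted(stored, key=stored.get, reverse=True):
            # The added probability cannot increase between iterations, so the
            # stored value bounds the new one and the rest of the list is skipped.
            if stored[p] <= best_gain:
                break
            gain = dnf_prob(selected + [p], prob) - current
            stored[p] = gain
            if gain > best_gain:
                best, best_gain = p, gain
        if best is None:
            break
        selected.append(best)
        current += best_gain
        del stored[best]
    return selected, current

# Assumed edge probabilities, matching the proof probabilities of Example 3.
prob = {"e(1,2)": 0.6, "e(2,100)": 0.6, "e(1,3)": 0.5,
        "e(3,50)": 0.9, "e(50,100)": 0.9, "e(3,51)": 0.9, "e(51,100)": 0.9}
proofs = [
    frozenset({"e(1,2)", "e(2,100)"}),
    frozenset({"e(1,3)", "e(3,50)", "e(50,100)"}),
    frozenset({"e(1,3)", "e(3,51)", "e(51,100)"}),
]
selected, probability = lazy_k_optimal(proofs, prob, 2)
print(selected, probability)
```

The early exit is valid only because the objective is submodular; for a general scoring function a stale value would not be a safe bound.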
3.4 k-θ-Optimal
Both k-best and k-optimal suffer from the following problem. Before the approximate probability is calculated, the user needs to specify the number of proofs that have to be used to approximate the probability. When enough proofs are present, k proofs will always be used, even when a lot of these are insignificant. However, there is no way of knowing the number of significant proofs; we would need to start k-best with a sufficiently high parameter. This is undesirable as the complexity of computing the BDD for a collection of proofs can increase exponentially with the number of proofs.
In the case of k-best, it is only possible to leave proofs out based on their individual proof probabilities. However, even proofs with a high proof probability can be insignificant when combined with other proofs. Within k-optimal, we can detect insignificant proofs by imposing a threshold θ on the added proof probability. We stop the algorithm before k proofs are selected when no more proofs with an added probability higher than θ can be found. The resulting algorithm is called k-θ-optimal and is obtained by initializing the bound with at least θ in the beginning of each iteration.
Example 5
The network of Example 1 contains 49 proofs for a path between node 1 and node 100. Of these proofs only (e(1,2)∧e(2,100)), (e(1,3)∧e(3,50)∧e(50,100)) and (e(1,3)∧e(3,51)∧e(51,100)) are significant as the other proofs have a probability equal to 0.00001. When we run 10-optimal, 10 proofs will be used to construct the BDD. This is not the case with 10-θ-optimal when θ>0.00001, because the added probability of an insignificant proof is at most its proof probability and will therefore never be high enough to be selected. As a result, the constructed BDD will be much smaller without calculating a much lower probability.
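The effect of the threshold can be sketched on an explicit list of proofs. As before, this is an illustration under assumptions: the edge probabilities are chosen to give proof probabilities of 0.36, 0.405 and 0.00001, only two of the insignificant proofs are included, all proofs are assumed to be known in advance, and `k_theta_optimal` is a hypothetical name.

```python
from itertools import product

def dnf_prob(proofs, prob):
    """Probability that at least one proof (a set of independent facts) holds."""
    facts = sorted({f for p in proofs for f in p})
    total = 0.0
    for values in product([True, False], repeat=len(facts)):
        world = dict(zip(facts, values))
        if any(all(world[f] for f in p) for p in proofs):
            weight = 1.0
            for f in facts:
                weight *= prob[f] if world[f] else 1.0 - prob[f]
            total += weight
    return total

def k_theta_optimal(proofs, prob, k, theta):
    """Greedy selection that stops once no proof adds more than theta."""
    selected, current = [], 0.0
    for _ in range(k):
        best, best_gain = None, theta  # the bound starts at theta in every iteration
        for p in proofs:
            if p in selected:
                continue
            gain = dnf_prob(selected + [p], prob) - current
            if gain > best_gain:
                best, best_gain = p, gain
        if best is None:
            break  # every remaining proof is insignificant
        selected.append(best)
        current += best_gain
    return selected, current

# Assumed edge probabilities; the last two proofs each have probability 0.00001.
prob = {"e(1,2)": 0.6, "e(2,100)": 0.6, "e(1,3)": 0.5,
        "e(3,50)": 0.9, "e(50,100)": 0.9, "e(3,51)": 0.9, "e(51,100)": 0.9,
        "e(1,4)": 0.01, "e(4,52)": 0.1, "e(52,100)": 0.01,
        "e(1,5)": 0.01, "e(5,53)": 0.1, "e(53,100)": 0.01}
proofs = [
    frozenset({"e(1,2)", "e(2,100)"}),
    frozenset({"e(1,3)", "e(3,50)", "e(50,100)"}),
    frozenset({"e(1,3)", "e(3,51)", "e(51,100)"}),
    frozenset({"e(1,4)", "e(4,52)", "e(52,100)"}),
    frozenset({"e(1,5)", "e(5,53)", "e(53,100)"}),
]
selected, probability = k_theta_optimal(proofs, prob, 10, 0.00001)
print(len(selected), probability)
```

Setting `theta` to zero recovers plain greedy selection, which keeps adding the insignificant proofs until k proofs are chosen or the list is exhausted.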
4 Analysis
First, we would like to recall that calculating the probability for an arbitrary set of proofs can be a hard problem in itself: given that they solve a #P problem, all current algorithms are exponential in k, where k is the size of the set of proofs. To gain a better understanding of the k-optimal problem itself, we hence limit ourselves in the remainder of this analysis to instances of the k-optimal problem where probabilities can be calculated in polynomial time.
 Given

the collection V of all possible proofs for a goal q in a ProbLog program, a maximum number k of proofs that can be used, and a probability threshold β.
 Decide

is there a collection B⊆V with |B|≤k and \(\operatorname{P}(\bigvee_{p \in B} p)\geq\beta\)?
Theorem 1
The β-k-optimal decision problem for β-polynomial goals is NP-complete.
Proof
First, we observe that the decision problem is in NP: by assumption, for each subset B we can verify in polynomial time whether a solution is valid. Note that without our assumption, computing the probability of a goal may be exponential in k; hence, in this case the problem is only fixed-parameter tractable.
Next, we show that the 3-set packing problem, which is known to be NP-complete (Hazan et al. 2006), can be reduced to the k-optimal decision problem for β-polynomial goals. The 3-set packing problem is defined as follows. Given is a universe of elements X, and a collection V of subsets of X, where each subset is of size 3. The problem is to decide if there exists a subset V′ of V of size k such that each pair of elements in V′ is disjoint.

For each element x in the universe X we define a probabilistic fact \(\mathtt {\alpha :: p(x)}\) with some probability α which is the same for all facts;

For each subset {x_{1},x_{2},x_{3}} in the collection we define a clause
$$\mathtt{q \mbox{ :- } p(x_1), p(x_2), p(x_3).}$$
Hence, there are multiple proofs for q, each corresponding to one subset of the collection. In the context of this theorem, we assume that all these proofs are given; note however that they can be calculated in polynomial time.

If we can find a subset of proofs of size k, all of which have disjoint sets of probabilistic facts, the probability of this set is
$$1-\bigl(1-\alpha^3\bigr)^{k},$$
since each proof has probability α^{3} and the k proofs are independent. Note that since the probability for independent proofs can be calculated in polynomial time, the goal q is β-polynomial.

If a set of proofs of size k has two proofs which share a probabilistic fact, the probability will be lower than this value.
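The reduction steps above can be sketched as a brute-force check on a tiny instance. The instance, the choice α = 0.5 and the function names are illustrative assumptions. The sketch sets the threshold to β = 1−(1−α^{3})^{k} and tests whether some set of k proofs reaches it, which by the argument above should happen exactly when k pairwise disjoint subsets exist.

```python
from itertools import combinations, product

def dnf_prob(proofs, prob):
    """Probability that at least one proof (a set of independent facts) holds."""
    facts = sorted({f for p in proofs for f in p})
    total = 0.0
    for values in product([True, False], repeat=len(facts)):
        world = dict(zip(facts, values))
        if any(all(world[f] for f in p) for p in proofs):
            weight = 1.0
            for f in facts:
                weight *= prob[f] if world[f] else 1.0 - prob[f]
            total += weight
    return total

def packing_exists(subsets, k, alpha=0.5):
    """Decide 3-set packing by asking whether k proofs can reach the threshold beta."""
    proofs = [frozenset(s) for s in subsets]       # one proof per subset
    prob = {x: alpha for s in proofs for x in s}   # one fact alpha::p(x) per element
    beta = 1.0 - (1.0 - alpha ** 3) ** k           # value reached by k disjoint proofs
    return any(dnf_prob(list(chosen), prob) >= beta - 1e-12
               for chosen in combinations(proofs, k))

# Hypothetical instance: {1,2,3} and {4,5,6} are disjoint, {3,4,5} overlaps both.
subsets = [{1, 2, 3}, {3, 4, 5}, {4, 5, 6}]
print(packing_exists(subsets, 2))
print(packing_exists(subsets, 3))
```

The exhaustive loop over all sets of k proofs is only there to make the equivalence visible on a small instance; it is the part that the NP-completeness result says cannot be avoided in general unless P = NP.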
In many cases the assumption that the probability for a set of proofs can be calculated in polynomial time is not valid; however, our theorem shows that even where this is the case, the k-optimal problem is hard to solve.
The theorem hence leads to the following more general observation.
Theorem 2
The k-optimal optimization problem is NP-hard.
Proof
This follows from the fact that we can use a k-optimal optimizer to solve NP-complete β-k-optimal decision problems. □
Nevertheless our algorithm calculates good solutions. The quality of the result of our algorithm follows from the fact that the function \(\operatorname {P}(.)\) is submodular and monotone.
Definition 1
Given a collection S and a function f:2^{ S }→ℝ. Function f is called submodular when ∀A⊆B⊆S,∀x∈S:f(A∪{x})−f(A)≥f(B∪{x})−f(B). Function f is called monotone when ∀A,B⊆S:A⊆B→f(A)≤f(B).
It can easily be seen that the \(\operatorname {P}(.)\) function is submodular and monotone: adding a proof to a larger set of proofs will not increase its impact on the overall probability; more possible worlds will already be covered, and a larger set of proofs will result in a larger probability.
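The two properties can also be checked numerically on a small set of proofs. The sketch below is an exhaustive test of Definition 1 and is not a proof; the edge probabilities are assumptions matching the proof probabilities used in the earlier examples, and `is_monotone_submodular` is a hypothetical helper.

```python
from itertools import combinations, product

def dnf_prob(proofs, prob):
    """Probability that at least one proof (a set of independent facts) holds."""
    facts = sorted({f for p in proofs for f in p})
    total = 0.0
    for values in product([True, False], repeat=len(facts)):
        world = dict(zip(facts, values))
        if any(all(world[f] for f in p) for p in proofs):
            weight = 1.0
            for f in facts:
                weight *= prob[f] if world[f] else 1.0 - prob[f]
            total += weight
    return total

def all_subsets(items):
    """Every sub-list of items, from the empty list up to items itself."""
    return [list(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def is_monotone_submodular(proofs, prob, tol=1e-12):
    """Check both inequalities of Definition 1 for every A ⊆ B ⊆ proofs and every x."""
    for B in all_subsets(proofs):
        f_B = dnf_prob(B, prob)
        for A in all_subsets(B):
            f_A = dnf_prob(A, prob)
            if f_A > f_B + tol:              # monotone: f(A) <= f(B)
                return False
            for x in proofs:
                gain_A = dnf_prob(A + [x], prob) - f_A
                gain_B = dnf_prob(B + [x], prob) - f_B
                if gain_A < gain_B - tol:    # submodular: gain shrinks on larger sets
                    return False
    return True

prob = {"e(1,2)": 0.6, "e(2,100)": 0.6, "e(1,3)": 0.5,
        "e(3,50)": 0.9, "e(50,100)": 0.9, "e(3,51)": 0.9, "e(51,100)": 0.9}
proofs = [
    frozenset({"e(1,2)", "e(2,100)"}),
    frozenset({"e(1,3)", "e(3,50)", "e(50,100)"}),
    frozenset({"e(1,3)", "e(3,51)", "e(51,100)"}),
]
print(is_monotone_submodular(proofs, prob))
```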
Submodular scoring functions allow for a simple greedy approximation algorithm, as shown by Cornuejols et al. (1977).
Theorem 3
From this theorem it follows that the probability computed by our greedy algorithm for a fixed proof set size is not worse than \(1-\frac{1}{e}\) times the probability of the optimal solution.
Clearly, this theorem does not state how far we are from the probability calculated on all proofs. Calculating an upper bound on the probability for all proofs is a significantly more challenging task. One possibility is the following. When the SLD procedure is running in the last iteration, certain incomplete proof nodes will be pruned as their (added) probability is not sufficiently promising. A useful observation is that if we were to add all complete proofs below this node in our proof set, this can never increase the probability more than the added probability for this incomplete proof (see De Raedt et al. 2007 for a similar observation). Hence, by summing up all added probabilities of pruned incomplete proofs, we obtain an upper bound on the probability for all proofs. Note, however, that this bound may very well be higher than 1; this bound is not likely to be as accurate as the lower bound that is calculated based on the submodularity property.
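For reference, the usual argument behind this factor can be sketched as follows. It is the standard derivation for greedy maximization of a monotone submodular function f with f(∅)=0 under a cardinality constraint, given here in notation introduced for illustration: A_{i} is the greedy set after i steps and A^{∗} is an optimal set of size k. Monotonicity and submodularity give
$$f(A^{*}) \leq f(A_i \cup A^{*}) \leq f(A_i) + \sum_{x \in A^{*}} \bigl(f(A_i \cup \{x\}) - f(A_i)\bigr) \leq f(A_i) + k\,\bigl(f(A_{i+1}) - f(A_i)\bigr),$$
because the greedy step picks the element with the largest gain. Rearranging shows that the gap to the optimum shrinks by a factor of at least (1−1/k) in every step, so after k steps
$$f(A^{*}) - f(A_k) \leq \Bigl(1-\frac{1}{k}\Bigr)^{k} f(A^{*}) \leq \frac{1}{e}\, f(A^{*}), \qquad\text{i.e.}\qquad f(A_k) \geq \Bigl(1-\frac{1}{e}\Bigr) f(A^{*}).$$
Here f is the probability \(\operatorname{P}(.)\) of a set of proofs, which is 0 for the empty set.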
5 Experiments
Three types of experiments have been conducted. In Sect. 5.1 we evaluate the improvements to the implementation of the k-optimal algorithm that are described in Sect. 3.3. We ask the following question: (Q1) Is scoring partial proofs based on their added probability slower than using the proof probability as an upper bound? Section 5.2 discusses the difference in individual performance between k-best, k-optimal and k-θ-optimal. The following questions are answered: (Q2) Do k-optimal and k-θ-optimal achieve better approximations than k-best? (Q3) Do k-optimal and k-θ-optimal calculate smaller BDDs? (Q4) What is the difference in runtime between the different algorithms? To conclude we test the effect of using k-optimal and k-θ-optimal incorporated in more complex inference systems. Section 5.3 investigates the following questions: (Q5) What is the effect on the utility calculated by DTProbLog? (Q6) How does this difference in calculated utility translate to runtime performance?
For all the experiments, the probabilistic network constructed by Ourfali et al. (2007) is used, which contains 15147 bidirectional edges, 5568 unidirectional edges and 5313 nodes. This biological network represents the regulatory system of a yeast cell; biologists are interested in pathways that explain the effect of one protein in this network on the expression of another. For this purpose, the connection probability for many pairs of nodes needs to be calculated. Because of the sheer size of the network, the probability cannot be calculated exactly and needs to be approximated.
5.1 Implementation of k-optimal
It was argued in Sect. 3.3 that it is more efficient to use the proof probability of partial proofs instead of the added probability to prune the search tree. The advantage of the proof probability is that computing the heuristic is much more efficient because it does not need to evaluate a BDD. The disadvantage is that less pruning of the search tree is possible, and more partial proofs need to be evaluated. Note that in both cases, the objective function is still the added probability, and both implementations are instances of k-optimal.
(Q1) Is scoring partial proofs based on their added probability slower than using the proof probability as an upper bound?
5.2 Comparing different approximations
In this section, we approximate the probability of a path between 5371 pairs of nodes using k-best, k-optimal and k-θ-optimal with k values ranging between one and twenty. More pairs are considered than in the previous section as these experiments have been run with a maximal path length equal to four edges and we are able to connect more pairs of nodes.
(Q2) Do k-optimal and k-θ-optimal achieve better approximations than k-best?
The x-value is equal to the probability achieved using k-best. The y-value is equal to the probability achieved using k-optimal. Only the pairs that have more than k proofs are shown. When this is not the case, all the proofs are selected and k-best and k-optimal achieve the same result.
k-Optimal achieves at least as good results as k-best for all queries and all k values. When k is equal to one, there is no difference between k-best and k-optimal because when selecting the first proof, the added probability of a proof is exactly the same as the probability of the proof itself. In the other cases, k-optimal performs at least as well as k-best. When k is low compared to the number of available proofs (k=6), the difference in calculated probability is the biggest because the selection problem becomes more important. When k is almost equal to the number of proofs (k=16) the remaining proofs are almost completely redundant and it does not matter which proof is left out.
(Q3) Do k-optimal and k-θ-optimal calculate smaller BDDs?
k-Optimal also achieves better results than k-θ-optimal with θ=0.01, so stopping early is not reflected in a lower BDD size as a function of the probability. However, k-θ-optimal does manage to limit the BDD size for all pairs. For high k values, the average BDD size remains relatively small when we compare it to the BDD size for k-optimal and k-best. Hence k-θ-optimal is useful when small BDDs need to be guaranteed.
(Q4) What is the difference in runtime between the different algorithms?
Figure 5b shows the average computation time and BDD construction time (both in ms) as a function of the average probability for varying k values. Each point represents the averaged results for one k value. When we are using low k values, the time that is needed to compute the BDDs is not dominant and k-best achieves better results due to lower search time. However, with higher k values, the BDD construction time grows exponentially; as k-optimal constructs smaller BDDs for similar calculated probabilities, it has an advantage for such values.
5.3 DTProbLog
The previous section showed that k-optimal can achieve better results than k-best when approximating the success probability. In Sect. 2.3 DTProbLog is introduced and it is argued that approximating algorithms need to be used when the problem is large. We will compare the use of k-best and k-optimal for approximative inference in DTProbLog.
For this purpose the problem described by Ourfali et al. (2007) is implemented in DTProbLog. In this problem a probabilistic network is given, as well as a set of cause-effect pairs; each cause-effect pair consists of two nodes of the network and a sign (+1 or −1), but is not stored as an edge in the network itself. Based on the set of cause-effect pairs, signs (+1 or −1) have to be assigned to the edges in the network. The goal is to maximize the expected number of connected cause-effect pairs, where a path is valid only when the product of the signs of its edges is equal to the sign of the cause-effect pair.
The network and cause-effect pairs are the same as those used in the previous experiments. Inference is done using k-best and k-optimal with varying k-values (1,3,…,19). For the optimization step a random restart greedy hill climbing algorithm is used with five restarts.
(Q5) What is the effect on the utility calculated by DTProbLog?
(Q6) How does this difference in calculated utility translate to runtime performance?
Figure 6b shows the runtime as a function of the k-value for both k-optimal and k-best. The dotted lines show the search time, which is the time used to collect the proofs. The full lines show the total runtime, which includes the execution of an optimization algorithm. It is clear that k-optimal has a higher search time and for low k-values this leads to a higher total runtime for k-optimal. For higher k-values the search time has a smaller effect on the total runtime and k-best needs more time.
6 Conclusions
We have introduced a new approximate inference mechanism for calculating the success probability of a query in ProbLog. This mechanism uses k proofs to approximate the exact probability of a query. As k-optimal iteratively searches for proofs that increase the probability the most, it minimizes the redundancy between the selected proofs. An efficient calculation of this probability was proposed.
We compared the results of k-optimal and k-best. These experiments show that k-optimal captures a larger part of the success probability with the same number of proofs. Also relative to the size of the BDDs, BDDs that are created using k-optimal capture a bigger part of the success probability. Because of this, the run time of k-optimal is better when the probability needs to be approximated very accurately. The BDD construction time, which is proportional to the size of the BDD, is dominant in this case.
Furthermore we compared k-best and k-optimal within the context of solving decision problems in DTProbLog. These results show that although the quality of the solutions does not differ significantly, the use of k-optimal causes lower run time when using high values of k.
To complement the experimental results, a theoretical analysis has shown that the calculated probability p satisfies (1−1/e)p^{∗} ≤ p ≤ p^{∗}, where p^{∗} is the highest probability that can be calculated based on any set of k proofs.
We presented an extension of k-optimal which avoids adding insignificant proofs. This is easily possible in k-optimal as k-optimal computes the added probability of a proof. k-θ-Optimal produces fewer proofs, with only a small loss of probability.
In general, k-optimal can be seen as a strategy for selecting conjunctions in a formula in disjunctive normal form, such that the probability of the selected conjunctions is maximized. The use of such selection strategies in other contexts, such as other probabilistic logics, is an interesting question for future work. Furthermore, challenges remain in approximating decision problems without building data structures such as BDDs for large collections of proofs.
Acknowledgements
We would like to thank Luc De Raedt for his insightful comments on this work. Joris Renkens is supported by PF-10/010 NATAR. Siegfried Nijssen and Guy Van den Broeck are supported by the Research Foundation-Flanders (FWO-Vlaanderen).