
1 Introduction

The research field of Probabilistic Logic Programming (PLP) [20] aims to reason with logic programs where some of the facts, called probabilistic facts, are considered uncertain [11]. One of the most adopted semantics for these programs, the Distribution Semantics (DS) [21], assigns a meaning to probabilistic logic programs where every world, i.e., a logic program identified by the truth values of probabilistic facts, is required to have a total well-founded model [25].

Probabilistic Answer Set Programming (PASP) [10, 19] extends Answer Set Programming (ASP) [9] and, like PLP, allows the definition of probabilistic facts. With PASP, however, every world is an answer set program and thus may have multiple answer sets. In this case, one semantics that can be adopted is the credal semantics, which assigns to a query a probability range rather than a sharp probability value, as happens with the DS. This range is defined by a lower and an upper probability.

Maximum-a-Posteriori (MAP) inference is a central topic in machine learning, where the goal is to find, given a set of evidence variables, the most probable assignment of values to a subset of the random variables (called query variables). If the set of query variables is the complement of the set of evidence variables, the problem is called Most Probable Explanation (MPE).

In this paper, we propose an algorithm to perform both cautious MAP/MPE and brave MAP/MPE inference in probabilistic answer set programs, where we consider respectively the lower and the upper probability bound induced by the query variables. We test this algorithm on two datasets with different configurations. Moreover, we also compare our algorithm with clingo’s [13] #maximize statement for the brave MPE task.

The paper is structured as follows: in Sect. 2, we discuss some related works. Section 3 introduces the main concepts of PLP and PASP. Section 4 describes our algorithm to perform brave and cautious MAP/MPE inference in PASP and in Sect. 5 we discuss some experiments to test its performance. Section 6 concludes the paper with some final remarks and possible future works.

2 Related Work

The MAP/MPE task has received relatively little attention in PLP: in [22], the authors introduced an algorithm to compute the MAP/MPE for a given LPAD [26]. The program is converted into a compact form and then the result is computed by analysing it. Similar work can be found in [8]. However, both consider programs where every world has a unique model, so they cannot deal with probabilistic ASP programs, where every world may have multiple models.

The authors of [16] propose a tool to perform inference in ASP programs following the \(\textrm{LP}^{\textrm{MLN}}\) [17] semantics. Unlike them, we adopt the credal semantics [10], which we believe to be more general and intuitive for PASP. Moreover, we consider the MAP/MPE task, not discussed in their work.

Inference in PASP has been considered in [24], where the authors introduced the PASOCS solver, but they do not explore the MAP/MPE task. Similar considerations apply to [23], where the authors discussed how to perform inference in ProbLog [11] programs under the stable model semantics, but again without considering MAP/MPE.

3 Background

We assume that the reader is familiar with the basic concepts of Logic Programming [18]. Here we consider the Answer Set Programming (ASP) syntax [9] enriched with aggregate atoms [3]. An aggregate atom is composed of two guards that can be either constants or variables, denoted with \(g_0\) and \(g_1\), two comparison arithmetic operators, \(\delta _0\) and \(\delta _1\), an aggregate function symbol \(\varphi \), and a set of expressions \(\epsilon _0, \dots , \epsilon _n\) where each \(\epsilon _i\) has the form \(t_1, \dots , t_n : F\) and each \(t_i\) is a term whose variables appear in the conjunction of literals F. Given the previous elements, the syntax of an aggregate atom is \(g_0 \delta _0 \ \# \varphi \{\epsilon _0 ; \dots ; \epsilon _n\} \ \delta _1 g_1\). An example of an aggregate atom is \(\texttt {0<= \#sum\{A : p(A)\} <= 2}\).

We denote a disjunctive rule (or simply rule) with the syntax

$$ h_1 ; \dots ; h_m \ \texttt{:-} \ b_1, \dots , b_n. $$

where each \(h_i\) is an atom and each \(b_i\) is a literal. The disjunction of atoms at the left of the neck operator (:-) is called head while the conjunction of literals at its right is called body. If the head is empty and the body is not, the rule is called a constraint and if the body is empty and the head is not, the rule is a fact. We restrict ourselves to safe rules, i.e., rules where every variable in the head also appears in a positive literal in the body. Finally, if a rule does not contain variables it is called ground. A program is a finite set of rules.

To provide the definition of answer set, we need to introduce some more concepts. If we consider an answer set program \(\mathcal {P}\), with \(B_\mathcal {P}\) we denote the set of ground atoms that can be constructed with the symbols in \(\mathcal {P}\). \(B_\mathcal {P}\) is also called Herbrand base. An interpretation I of \(\mathcal {P}\) is such that \(I \subseteq B_\mathcal {P}\). I satisfies a ground rule if at least one head atom is true in it when all the literals in the body are true in it, and it is called a model if it satisfies all the groundings of the rules of \(\mathcal {P}\). The reduct [12] of a ground program \(\mathcal {P}_g\) w.r.t. an interpretation I is obtained by removing from \(\mathcal {P}_g\) the rules where at least one literal in the body is false in I. An answer set (or stable model) for a program \(\mathcal {P}\) is then defined as an interpretation I that is a minimal (under set inclusion) model of the reduct of \(\mathcal {P}_g\) w.r.t. I. We indicate with \(AS(\mathcal {P})\) the set of all the answer sets of a program \(\mathcal {P}\). Finally, the projective solutions [14] onto a set of ground atoms B are given by the set \(AS_{B}(\mathcal {P}) = \{A \cap B \mid A \in AS(\mathcal {P})\}\).
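As a small illustration of projective solutions, the following Python sketch applies the definition of \(AS_{B}(\mathcal {P})\) to answer sets represented as sets of strings. The answer sets and atom names are hypothetical and chosen only for illustration; the function name project is likewise an assumption.

```python
def project(answer_sets, atoms):
    """Return AS_B(P): every answer set intersected with the atom set B."""
    b = frozenset(atoms)
    # A set of frozensets: answer sets that agree on B collapse into one.
    return {frozenset(a) & b for a in answer_sets}


# Hypothetical answer sets of some program (not taken from the paper).
answer_sets = [
    {"a", "b", "c"},
    {"a", "b", "d"},
    {"a", "e"},
]
projected = project(answer_sets, {"a", "b"})
print(sorted(sorted(s) for s in projected))  # [['a'], ['a', 'b']]
```

Note that the first two answer sets coincide once projected, so three answer sets yield only two projective solutions.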

Probabilistic Logic Programming [20] allows the definition of uncertain data in logic programs. For example, ProbLog [11] allows probabilistic facts. Each probabilistic fact has the form \(\varPi \,{:}{:}\, f\) where \(\varPi \in ]0,1]\) and f is an atom. According to the Distribution Semantics [21], an assignment of truth value, true (\(\top \)) or false (\(\bot \)), for every probabilistic fact \(f_i\) in the program identifies a world w whose probability P(w) can be computed as

$$\begin{aligned} P(w) = \prod _{i \mid f_i = \top } \varPi _i \cdot \prod _{i \mid f_i = \bot } (1 - \varPi _i) \end{aligned} \tag{1}$$

If we are given a query q, i.e., a conjunction of ground literals, its probability is the sum of the probability of the worlds where the query is true:

$$\begin{aligned} P(q) = \sum _{w \models q} P(w) \end{aligned} \tag{2}$$
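The two formulas above can be sketched in a few lines of Python by enumerating all the worlds. The program in the comment, the dictionary probs, and the function names are hypothetical; the query is given as a Python predicate on a world rather than being evaluated by a logic programming system.

```python
from itertools import product


def world_probability(world, probs):
    """Eq. (1): multiply p_i for true facts and (1 - p_i) for false facts."""
    p = 1.0
    for fact, value in world.items():
        p *= probs[fact] if value else 1.0 - probs[fact]
    return p


def query_probability(probs, holds):
    """Eq. (2): sum P(w) over the worlds w in which the query holds."""
    facts = list(probs)
    total = 0.0
    for values in product([True, False], repeat=len(facts)):
        world = dict(zip(facts, values))
        if holds(world):
            total += world_probability(world, probs)
    return total


# Hypothetical program: 0.4::a.  0.5::b.  q :- a.  q :- b.
probs = {"a": 0.4, "b": 0.5}
p_q = query_probability(probs, lambda w: w["a"] or w["b"])
print(round(p_q, 6))  # 0.7
```

The enumeration visits \(2^n\) worlds for n probabilistic facts, so it is only meant to make the definitions concrete.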

The Distribution Semantics assumes that all the probabilistic facts are independent and that every world is a logic program with a two-valued well-founded model [25]. However, when we consider Probabilistic Answer Set Programming, the latter condition usually does not hold. For PASP, we consider here the credal semantics (CS) [10, 19]. Under the CS, every query q is associated with a probability interval defined by a lower bound \(\underline{\textrm{P}}(q)\) and an upper bound \(\overline{\textrm{P}}(q)\). A world contributes to the upper probability if the query is present in at least one of its answer sets and contributes to the lower probability if the query is present in all its answer sets. In formulas,

$$\begin{aligned} \overline{\textrm{P}}(q) = \sum _{w_i \mid \exists m \in AS(w_i), \ m \models q} P(w_i) \end{aligned}$$
$$\begin{aligned} \underline{\textrm{P}}(q) = \sum _{w_i \mid |AS(w_i)| > 0 \ \wedge \ \forall m \in AS(w_i), \ m \models q} P(w_i) \end{aligned}$$

These formulas are valid only if every world has at least one answer set, so in this paper we consider only programs that satisfy this requirement. If every world has exactly one answer set, the CS coincides with the DS and the query has a sharp probability value. Consider the following program.

Example 1

Gold example

figure b
Table 1. Worlds for Example 1. Predicate ‘g’ stands for gold. Column ‘mq’ indicates whether there is at least one model of the world where the query valuable(1) is true and column ‘mnq’ indicates whether there is at least one model of the world where the query is false.

The first three lines introduce three probabilistic facts gold/1 indicating that the objects identified with 1, 2, and 3 could be made of gold with different probabilities. Line 4 states that an object made of gold may be valuable or not. Line 5 represents a constraint saying that 60% of the objects made of gold are valuable. This program has \(2^3 = 8\) worlds listed in Table 1. If we consider the query q valuable(1), \(\underline{\textrm{P}}(q) = 0.158\) (corresponding to \(P(w_4) + P(w_5) + P(w_6)\)) and \(\overline{\textrm{P}}(q) = 0.2\) (corresponding to \(P(w_4) + P(w_5) + P(w_6) + P(w_7)\)).
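The bounds reported for Example 1 can be cross-checked with a brute-force Python sketch. Since the program listing is not reproduced here, the sketch relies on an assumed reading of its description: the three facts have probabilities 0.2, 0.3, and 0.7, every gold object is either valuable or not, and the constraint is read as "at least 60% of the gold objects are valuable". The function names are hypothetical.

```python
from itertools import combinations, product

# Probabilities of gold(1), gold(2), gold(3).
PROBS = {1: 0.2, 2: 0.3, 3: 0.7}


def answer_sets(gold):
    """Sets of valuable objects allowed in a world (assumed reading:
    at least 60% of the gold objects must be valuable)."""
    gold = sorted(gold)
    models = []
    for k in range(len(gold) + 1):
        for valuable in combinations(gold, k):
            if 10 * len(valuable) >= 6 * len(gold):
                models.append(set(valuable))
    return models


def credal_bounds(query_object):
    """Lower and upper probability of valuable(query_object)."""
    lower = upper = 0.0
    for values in product([True, False], repeat=len(PROBS)):
        world = dict(zip(PROBS, values))
        p = 1.0
        for obj, is_gold in world.items():
            p *= PROBS[obj] if is_gold else 1.0 - PROBS[obj]
        models = answer_sets(o for o, g in world.items() if g)
        in_model = [query_object in m for m in models]
        if any(in_model):                 # true in at least one answer set
            upper += p
        if models and all(in_model):      # true in every answer set
            lower += p
    return lower, upper


lower, upper = credal_bounds(1)
print(round(lower, 3), round(upper, 3))  # 0.158 0.2
```

Under this reading, the world where all three objects are gold has four answer sets, one of which does not contain valuable(1), so it contributes only to the upper bound.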

4 MAP Inference in Probabilistic Answer Set Programming

In PLP, the MAP task [8, 22] consists in finding a possible truth value assignment to a subset of probabilistic facts such that a given evidence is satisfied and the sum of the probabilities of the possible worlds identified by the truth values’ choices is maximized. More formally, given a probabilistic logic program, a set of ground atoms e, and a set of query random variables (also called query variables) Q, the goal is to solve

$$ \mathrm {arg\,max}_q P(Q = q \mid e) $$

If all the program variables are query variables, the task is called MPE.

If we consider PASP, every world may have multiple models so the previous definition must be extended. We now introduce the cautious MAP and brave MAP tasks:

Definition 1

Cautious and brave MAP/MPE. Given a PASP program \(\mathcal {P}\), a set of ground atoms e (called evidence), and a set of query probabilistic facts Q:

  • the cautious MAP problem consists in finding a truth assignment q to query facts Q such that \(\underline{\textrm{P}}(q \mid e)\) is maximized, i.e., in solving:

    $$\begin{aligned} \underline{\textrm{MAP}}(e) = \mathrm {arg\,max}_q \underline{\textrm{P}}(Q = q \mid e) = \mathrm {arg\,max}_q \sum _{w_i \mid \forall m \in AS(w_i), m \models q \wedge m \models e} P(w_i) \end{aligned}$$
  • the brave MAP problem consists in finding a truth assignment q to query facts Q such that \(\overline{\textrm{P}}(q \mid e)\) is maximized, i.e., in solving:

    $$\begin{aligned} \overline{\textrm{MAP}}(e) = \mathrm {arg\,max}_q \overline{\textrm{P}}(Q = q \mid e) = \mathrm {arg\,max}_q \sum _{w_i \mid \exists m \in AS(w_i), m \models q \wedge m \models e} P(w_i) \end{aligned}$$

The definition of cautious and brave MPE inference for a query e, denoted with \(\underline{\textrm{MPE}}(e)\) and \(\overline{\textrm{MPE}}(e)\) respectively, is similar.

Note that this task is different from computing the conditional probability of a query given evidence. Given the previous definitions, for a query e we have that \(P(\underline{\textrm{MAP}}(e)) \le P(\overline{\textrm{MAP}}(e))\) and \(P(\underline{\textrm{MPE}}(e)) \le P(\overline{\textrm{MPE}}(e))\).

If we consider all the three probabilistic facts gold/1 of Example 1 as query variables (denoted by prepending the functor map), the cautious MPE state (all the probabilistic facts are query variables) for the query valuable(1) is given by {gold(1), not gold(2), gold(3)} with an associated probability of 0.098 (world 5 of Table 1). With not gold(2) we indicate that the probabilistic fact gold(2) should be false. The same state is also the brave MPE state. The cautious MAP/MPE and the brave MAP/MPE states do not necessarily coincide, since they maximize different bounds. For example, if we consider gold(1) and gold(3) as query variables and valuable(1) as evidence, the assignment {gold(1), not gold(3)} has both lower and upper probability 0.06 (sum of the probabilities of the worlds 4 and 6 of Table 1), while {gold(1), gold(3)} has lower probability 0.098 (world 5 only, since world 7 has an answer set where valuable(1) is false) and upper probability 0.14 (sum of the probabilities of the worlds 5 and 7 of Table 1). Here {gold(1), gold(3)} is both the cautious and the brave MAP state, although with different probabilities; the two states would differ whenever \(P(w_4) + P(w_6)\) exceeded \(P(w_5)\) while remaining below \(P(w_5) + P(w_7)\). Finally, there can be multiple cautious/brave MAP/MPE states. If we consider again Example 1 but with all the probabilities set to 0.5 and all the three probabilistic facts as query variables, there are 3 cautious MPE states for the query valuable(1), all with an associated probability of 0.125: {gold(1), gold(2), not gold(3)}, {gold(1), not gold(2), gold(3)}, and {gold(1), not gold(2), not gold(3)}.
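Definition 1 can be evaluated directly by enumerating worlds, which is useful as a reference when checking small examples by hand. The Python sketch below does so for Example 1 under an assumed reading of its description (probabilities 0.2, 0.3, and 0.7; every gold object is valuable or not; at least 60% of the gold objects must be valuable). It is a brute-force illustration, not the algorithm of this paper, and all function names are hypothetical.

```python
from itertools import combinations, product

PROBS = {1: 0.2, 2: 0.3, 3: 0.7}  # probabilities of gold(1..3)


def evidence_status(world):
    """Return (in_all, in_some) for the evidence valuable(1) in a world."""
    gold = [o for o, g in world.items() if g]
    # Assumed constraint: at least 60% of the gold objects are valuable.
    models = [set(v) for k in range(len(gold) + 1)
              for v in combinations(gold, k)
              if 10 * k >= 6 * len(gold)]
    hits = [1 in m for m in models]
    return bool(models) and all(hits), any(hits)


def map_state(query_objects, cautious):
    """Arg max over assignments to the query facts of the lower (cautious)
    or upper (brave) probability, following Definition 1."""
    scores = {}
    for values in product([True, False], repeat=len(PROBS)):
        world = dict(zip(PROBS, values))
        in_all, in_some = evidence_status(world)
        contributes = in_all if cautious else in_some
        if not contributes:
            continue
        p = 1.0
        for o, g in world.items():
            p *= PROBS[o] if g else 1.0 - PROBS[o]
        state = tuple(world[o] for o in query_objects)
        scores[state] = scores.get(state, 0.0) + p
    best = max(scores, key=scores.get)
    return dict(zip(query_objects, best)), scores[best]


print(map_state([1, 2, 3], cautious=True))   # cautious MPE
print(map_state([1, 3], cautious=False))     # brave MAP
```

Keys of the returned dictionary are object identifiers, and a value of False corresponds to not gold(i) in the notation used above.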

4.1 Algorithm

To solve the cautious/brave MAP/MPE task, we developed an algorithm, shown in Algorithm 1, that first translates the PASP program into an ASP program by rewriting probabilistic facts and query variables into ASP choice rules, and then enumerates the projected answer sets of the resulting program to identify the MAP state. It proceeds as follows. First, the function ConvertVariables converts probabilistic facts and query variables into an ASP representation: every probabilistic fact p::f and every query variable map p::f (note that f may also have arguments) is transformed into 0{f}1, and the rule not_f:- not f is added. Function ComputeMinimalSet [5] then extracts the minimal set of probabilistic facts by computing the cautious consequences (intersection of all models). The facts in this set must always be true, so we can remove the choices for them and fix their value: for every element in this set, we add a constraint imposing that it must be true (line 5). This is possible since every world is required to have at least one answer set. Now, if we want to perform brave MAP (i.e., considering the upper probability) given an evidence e, we add to the program the rule :- not e (a constraint imposing that the evidence must always be true) and project the solutions on the probabilistic facts (line 9). Otherwise, if we consider cautious MAP (lower probability), we add the rules q:- e and nq:- not e and project the solutions on the probabilistic facts and on the atoms q/0 and nq/0 (line 12). Finally, we extract every world and its contribution to the probability with the function ComputeContribution and identify the MAP state (function ComputeMAPState).
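The rewriting step described above can be sketched as plain string generation. The Python functions below are a hypothetical illustration of that step only: they emit the choice rules, the not_f rules, and the evidence rules, but they neither copy the remaining rules of the program nor compute the minimal set. The names convert_variables and add_evidence are assumptions, not the names used in the actual implementation.

```python
def convert_variables(prob_facts):
    """Rewrite each (probability, atom) pair as described in the text:
    a choice rule 0{f}1 plus a rule deriving not_f when f is not chosen."""
    rules = []
    for _prob, atom in prob_facts:
        rules.append(f"0{{{atom}}}1.")
        rules.append(f"not_{atom} :- not {atom}.")
    return rules


def add_evidence(rules, evidence, brave):
    """Brave: the evidence must hold. Cautious: record q or nq instead."""
    if brave:
        return rules + [f":- not {evidence}."]
    return rules + [f"q :- {evidence}.", f"nq :- not {evidence}."]


facts = [(0.2, "gold(1)"), (0.3, "gold(2)"), (0.7, "gold(3)")]
program = add_evidence(convert_variables(facts), "valuable(1)", brave=True)
print("\n".join(program))
```

The probabilities are not needed in the generated ASP rules; they are only used later, when the contribution of each projected answer set is computed.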

To better understand how the algorithm works, consider the program shown in Example 1 with gold(1) and gold(3) as query variables and valuable(1) as evidence. After the execution of function ConvertVariables, the probabilistic fact and the two query variables become 0{gold(2)}1, 0{gold(1)}1, and 0{gold(3)}1. The minimal set of atoms, obtained by computing the cautious consequences on the converted program with an additional rule :- not valuable(1), contains gold(1), so we add the constraint :- not gold(1) to the program. If we consider brave MAP, by adding again :- not valuable(1) to the program and projecting the solutions on the probabilistic facts (function ProjectSolutions, line 9), we get 4 answer sets:

  • AS1 = {gold(1) not_gold(2) not_gold(3)},

  • AS2 = {gold(1) not_gold(2) gold(3)},

  • AS3 = {gold(1) gold(2) not_gold(3)}, and

  • AS4 = {gold(1) gold(2) gold(3)},

where with not_gold(i) we indicate that the probabilistic fact or query variable is not selected. These four answer sets (worlds) have respectively probability \(0.2 \cdot (1 - 0.3) \cdot (1 - 0.7) = 0.042\), \(0.2 \cdot (1 - 0.3) \cdot 0.7 = 0.098\), \(0.2 \cdot 0.3 \cdot (1 - 0.7) = 0.018\), and \(0.2 \cdot 0.3 \cdot 0.7 = 0.042\), which are computed with the function ComputeContribution. Finally, if we group these answer sets by query variables (function ComputeMAPState), we get two sets representing two different MAP states: MAP1 = {AS1, AS3} (gold(1) and not_gold(3)) and MAP2 = {AS2, AS4} (gold(1) and gold(3)). MAP1 has probability \(0.042 + 0.018 = 0.06\) while MAP2 has probability \(0.098 + 0.042 = 0.14\), so MAP2 is selected as MAP state since it gives the highest upper probability for the evidence valuable(1).
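The last two steps of the brave computation can be reproduced with a short Python sketch that starts from the four projected answer sets listed above. The functions compute_contribution and compute_map_state are hypothetical stand-ins named after ComputeContribution and ComputeMAPState; the data structures are assumptions made for illustration.

```python
from collections import defaultdict

PROBS = {"gold(1)": 0.2, "gold(2)": 0.3, "gold(3)": 0.7}
QUERY = ["gold(1)", "gold(3)"]  # query variables

# The four projected answer sets; True means the fact is selected.
ANSWER_SETS = [
    {"gold(1)": True, "gold(2)": False, "gold(3)": False},  # AS1
    {"gold(1)": True, "gold(2)": False, "gold(3)": True},   # AS2
    {"gold(1)": True, "gold(2)": True, "gold(3)": False},   # AS3
    {"gold(1)": True, "gold(2)": True, "gold(3)": True},    # AS4
]


def compute_contribution(world):
    """Probability of the world identified by a projected answer set."""
    p = 1.0
    for fact, selected in world.items():
        p *= PROBS[fact] if selected else 1.0 - PROBS[fact]
    return p


def compute_map_state(worlds):
    """Group worlds by the values of the query variables and return the
    group with the highest total probability."""
    totals = defaultdict(float)
    for world in worlds:
        state = tuple((f, world[f]) for f in QUERY)
        totals[state] += compute_contribution(world)
    best = max(totals, key=totals.get)
    return dict(best), totals[best]


state, prob = compute_map_state(ANSWER_SETS)
print(state, round(prob, 3))  # {'gold(1)': True, 'gold(3)': True} 0.14
```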

If we consider instead cautious MAP, the process is analogous, but we cannot add the constraint :- not valuable(1) since we need to consider the lower probability: in this case, a world contributes to the lower probability if the evidence is true in every answer set. If we add the constraint imposing that the evidence must be true in every answer set, we cannot identify the worlds that have at least one answer set where the evidence is false (and thus do not contribute to the lower probability). We now get 5 answer sets:

  • {gold(1) gold(2) gold(3) nq},

  • {gold(1) gold(2) gold(3) q},

  • {gold(1) gold(2) not_gold(3) q},

  • {gold(1) not_gold(2) gold(3) q}, and

  • {gold(1) not_gold(2) not_gold(3) q}.

The world identified by the first two answer sets is the same (all the three variables true) but in the first there is nq and in the second q. Thus, the first answer set indicates that there is at least one answer set of this world where the evidence is false, so this world does not contribute to the lower probability (and can be discarded). For the remaining three worlds there is only one answer set each and it has q inside, so they contribute to both the lower and the upper probability. By applying, as before, functions ComputeContribution and then ComputeMAPState, the state {gold(1), not gold(3)} (third and fifth answer set) gets a lower probability of \(0.2 \cdot 0.3 \cdot (1 - 0.7) + 0.2 \cdot (1 - 0.3) \cdot (1 - 0.7) = 0.06\), while the state {gold(1), gold(3)} (fourth answer set) gets \(0.2 \cdot (1 - 0.3) \cdot 0.7 = 0.098\), so the latter is selected as the cautious MAP state.
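The filtering of worlds by the q and nq markers, followed by the grouping on the query variables, can be sketched in Python starting from the five projected answer sets listed above. The representation of an answer set as a pair (selected facts, marker) and the function names are assumptions made for illustration.

```python
from collections import defaultdict

PROBS = {"gold(1)": 0.2, "gold(2)": 0.3, "gold(3)": 0.7}
QUERY = ["gold(1)", "gold(3)"]  # query variables

# The five projected answer sets: (selected facts, q/nq marker).
ANSWER_SETS = [
    ({"gold(1)", "gold(2)", "gold(3)"}, "nq"),
    ({"gold(1)", "gold(2)", "gold(3)"}, "q"),
    ({"gold(1)", "gold(2)"}, "q"),
    ({"gold(1)", "gold(3)"}, "q"),
    ({"gold(1)"}, "q"),
]


def lower_worlds(answer_sets):
    """Keep the worlds for which only q (never nq) was derived."""
    markers = defaultdict(set)
    for selected, marker in answer_sets:
        markers[frozenset(selected)].add(marker)
    return [w for w, seen in markers.items() if seen == {"q"}]


def cautious_totals(answer_sets):
    """Lower probability of every assignment to the query variables."""
    totals = defaultdict(float)
    for world in lower_worlds(answer_sets):
        p = 1.0
        for fact, prob in PROBS.items():
            p *= prob if fact in world else 1.0 - prob
        state = tuple(fact in world for fact in QUERY)
        totals[state] += p
    return dict(totals)


totals = cautious_totals(ANSWER_SETS)
best = max(totals, key=totals.get)
for state, value in totals.items():
    print(dict(zip(QUERY, state)), round(value, 3))
print("selected:", dict(zip(QUERY, best)))
```

A world is identified by the set of selected facts, so the two answer sets that differ only in the marker are merged before the probabilities are summed.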

For both brave and cautious MAP tasks we need to generate at worst \(2^n\) answer sets, where n is the number of probabilistic facts, thus the algorithm is exponential in n. The reason is that, for every world, we need to know whether the query is true in at least one answer set for brave MAP, and whether it is true in all the answer sets for cautious MAP. The number of generated models for brave MAP is usually smaller than for cautious MAP, due to the additional constraint removing the models where the query is false. However, this additional constraint, possibly together with the constraints given by the elements in the minimal set of atoms, does not reduce the worst-case complexity of the task.

figure c

We propose another possible encoding for the brave MPE task. For each query variable map p::f, we add: a rule 0{f}1, a rule f(lp):- f and a rule not_f(nlp):- not f. lp is given by \(10^n \cdot log(\texttt{p})\) and nlp is given by \(10^n \cdot log(1 - \texttt{p})\), where log is the natural logarithm and n is an integer that sets the scale. The multiplications by \(10^n\) are needed since ASP does not handle floating-point numbers. For example, if we set n to 3, the fact 0.2::gold(1) of Example 1 is expanded into: 0{gold(1)}1, gold(1,-1609):- gold(1), and not_gold(1,-223):- not gold(1), where \(10^3 \cdot log(0.2) \approx -1609\) and \(10^3 \cdot log(0.8) \approx -223\). With this log-encoding, we can leverage the property \(log(a \cdot b) = log(a) + log(b)\) and thus use the #sum aggregate. Because of the scaling to integers, it is not straightforward to recover the original probabilities from the weights once we have the brave MPE state. However, once we get the combination of variables in this state, we can simply look up the initial probabilities in the program. Finally, since we have the (converted) probability as argument of the atoms, we can use the clingo [13] #maximize statement to find the combination of query variables resulting in the brave MPE state. If we consider again Example 1, with all the probabilistic facts converted as previously described, we can compute the brave MPE state with #maximize{ P : wp(P) } where wp/1 is defined as
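The integer weights of the log-encoding can be computed as in the following Python sketch. It assumes the natural logarithm and rounding to the nearest integer (for the values of the example, truncation gives the same result); the function names log_weights and encode are hypothetical.

```python
import math


def log_weights(p, scale=3):
    """Return (lp, nlp): 10^scale * ln(p) and 10^scale * ln(1 - p),
    rounded to the nearest integer (an assumption of this sketch)."""
    factor = 10 ** scale
    return round(factor * math.log(p)), round(factor * math.log(1.0 - p))


def encode(prob, name, arg, scale=3):
    """Emit the three rules of the log-encoding for the fact prob::name(arg)."""
    lp, nlp = log_weights(prob, scale)
    atom = f"{name}({arg})"
    return [
        f"0{{{atom}}}1.",
        f"{name}({arg},{lp}) :- {atom}.",
        f"not_{name}({arg},{nlp}) :- not {atom}.",
    ]


print(log_weights(0.2))  # (-1609, -223)
print("\n".join(encode(0.2, "gold", 1)))
```

A larger scale keeps more decimal digits of the logarithm at the price of larger integers in the ASP program.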

figure d

This is a naive encoding that requires the enumeration of all the answer sets. An alternative ASP encoding for the same example, which we call improved and which does not require the enumeration of all the answer sets, is #maximize{X,Y:gold(Y,X); X,Y:not_gold(Y,X)}. In the next section, we test our algorithm for cautious and brave MAP and MPE and compare the execution time between our brave MPE proposal and the clingo #maximize statement.
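The reason why an additive objective over the integer weights finds the brave MPE state can be checked with a small Python sketch: over the worlds of Example 1 in which gold(1) is true (the worlds admitted by the evidence valuable(1)), the world maximizing the sum of the weights is the one maximizing the product of the probabilities. The sketch assumes natural logarithms, rounding to the nearest integer, and a scale of 3; it mimics the objective in Python and does not call clingo.

```python
import math
from itertools import product

PROBS = {"gold(1)": 0.2, "gold(2)": 0.3, "gold(3)": 0.7}
SCALE = 3


def weight(p):
    """Integer log-weight of a probability (rounding is an assumption)."""
    return round(10 ** SCALE * math.log(p))


def objective(world):
    """Sum of the integer weights, i.e. the quantity being maximized."""
    return sum(weight(PROBS[f]) if sel else weight(1.0 - PROBS[f])
               for f, sel in world.items())


def probability(world):
    """Product of the original probabilities for the same world."""
    p = 1.0
    for f, sel in world.items():
        p *= PROBS[f] if sel else 1.0 - PROBS[f]
    return p


# Worlds admitted by the evidence: gold(1) must be true.
worlds = [dict(zip(PROBS, (True, g2, g3)))
          for g2, g3 in product([True, False], repeat=2)]

best_by_sum = max(worlds, key=objective)
best_by_prob = max(worlds, key=probability)
print(best_by_sum, objective(best_by_sum))
print(best_by_sum == best_by_prob)  # True
```

In general, rounding may change the selected world when two worlds have almost identical probabilities; a larger scale makes this less likely.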

5 Experiments

We implemented the algorithm in Python and we used the clingo APIs [13] to compute the answer sets. To test the performance, we ran some experiments on a computer with an Intel® Xeon® E5-2630v3 running at 2.40 GHz with 8 GB of RAM and a time limit of 8 h. Execution times are computed with the bash command time. The reported values are from the real field.

The first dataset, gold, contains a set of programs with the structure of Example 1. The size of a program is given by the number of probabilistic facts gold/1. Example 1 has size 3. For the MAP task, 50% of the gold/1 facts are considered query variables. We randomly set the probability of probabilistic facts. The query is valuable(1). Results are shown in Fig. 1a. We removed the results for size less than 19 since their execution times were negligible. The computation of the brave MAP state seems the fastest one, followed by the brave MPE state. This is due to the additional constraint inserted into the program, which removes some of the possible answer sets. Cautious MAP and cautious MPE have comparable execution times. In all the cases, for size greater than 25 we get a memory error.

The second dataset, smoke, describes a network of friends where some of them smoke. An example of program of size (number of people) 4 is:

figure e

A person X smokes if she has at least one friend Y that smokes. The constraint imposes that at least 80% of the people smoke. The goal is to compute the MAP/MPE state for the query smokes(n) where n is the number of people involved (here 4). Half of the people of the network certainly smoke. If the number of people is odd, we round the result to the next integer. As before, for the MAP experiments, 50% of the e/2 facts are query variables. The number of probabilistic facts follows a Barabási-Albert preferential attachment model generated with the networkx [15] Python package. We set the initial number of nodes of the graph (n) to the size of the instance and the number of edges that connect a new node to an existing one (m) to 2. Results are shown in Fig. 1b. As for the gold dataset, brave MAP and brave MPE seem the fastest, and their execution times are similar (the red and black curves in the plot overlap). In all cases, for size greater than 14 we get a memory error.

In a second set of experiments we verified whether and how the execution time of the algorithm varies when there is an increasing number of MAP/MPE states. To do this, we generated two versions of the gold dataset, one with random probabilities and one with all the probabilities set to 0.5. The remaining parts of the programs are equal to Example 1. Figure 2a shows the execution times of the cautious and brave MAP and MPE task on the dataset with all the probabilities set to 0.5. As before, brave MAP/MPE are the fastest. Also here, datasets with size larger than 25 cause a memory error, except for brave MPE that stops at size 23. Execution times for cautious MPE/MAP are almost identical. In Fig. 2b we compare the two versions of the datasets on the brave MPE task. Brave MAP with all probabilities set to 0.5 and brave MPE with random probabilities seem to take the same time to complete. Execution times for random probabilities are slightly smaller since there is usually only one MAP/MPE state in this case. Moreover, the MAP/MPE task where all the probabilities are equal gives a memory error starting from size 24, while, when the probabilities are all different, we get a memory error starting from size 26. A similar trend (exponential) was observed in the case of cautious MAP/MPE, but with the same differences found in Fig. 2a.

Lastly, we compared our algorithm with the #maximize statement of clingo on the brave MPE task for the gold dataset. As before, we generate a set of programs with random probabilities and a set of programs with all the probabilities set to 0.5. For a fair comparison, we set all the elements of the minimal set of atoms to be true in the program that will use the clingo statement and we add the constraint imposing that the query must be true. We ran two tests: one that outputs only one brave MPE state (even if there may be more) and one that outputs all the states, by using the flag --opt-mode=optN. We only considered the naive encoding, since the improved one is orders of magnitude faster than both the naive one and our tool. For example, with 30 probabilistic facts and the improved encoding, the result is computed in a fraction of a second. Results in Fig. 3a show that the execution time for the computation of the brave MPE state oscillates when we want only one solution and the probabilities are all equal. The computation of all the solutions when the probabilities are all set to 0.5 is the fastest one. For random probabilities, in both cases (the two curves overlap) the programs of size larger than 12 give a memory error. Figure 3b shows that clingo’s #maximize statement is slower than our algorithm but it can handle larger instances when we want to compute all the solutions of the brave MPE task when all the probabilities of the states are equal (red and yellow curves). This may be due to a better memory management of the program and a possibly better search strategy. Moreover, the computation of 1 MPE state in clingo (blue curve) stops because of the time limit, rather than the memory limit as the others.

Fig. 1. Results for cautious and brave MAP and MPE tasks for the gold and smoke datasets in terms of inference time as the program size (number of probabilistic facts) increases.

Fig. 2. Comparisons between the two gold dataset versions.

Fig. 3. Results for the brave MPE task computed with clingo’s #maximize statement using the naive encoding and comparison with our algorithm. ‘1’ means that we compute only 1 solution while ‘all’ means that we compute all the solutions.

6 Conclusions

In this paper, we proposed the concepts of cautious and brave MAP/MPE inference in probabilistic answer set programming and developed an algorithm to solve these tasks. We ran some experiments on multiple datasets and we found that, generally, cautious MAP/MPE is slower than brave MAP/MPE, due to the necessity to enumerate all the possible answer sets needed to compute the lower probability. We also proposed two alternative encodings for the brave MPE task and compared the clingo #maximize statement with our approach. The encoding that does not require the enumeration of all the answer sets is orders of magnitude faster than the other and than our tool. However, if we consider the naive encoding, when all the probabilities are set to 0.5, clingo is slower than our algorithm but it seems to be able to solve larger instances with lower memory requirements. In the future, we plan to test other ASP solvers such as WASP [1, 2], adopt approximate algorithms based on sampling [6, 7], and consider the concept of abduction [4] in PASP.