Imprecise probabilistic query answering using measures of ignorance and degree of satisfaction

In conditional probabilistic logic programming, given a query, the two most common forms of answer are either a probability interval or a precise probability obtained by applying the maximum entropy principle. The former can be noninformative (e.g., the interval [0, 1]), and the reliability of the latter is questionable when the prior knowledge is imprecise. To address this problem, in this paper we propose methods to measure quantitatively whether a probability interval or a single probability is sufficient for answering a query. We first propose an approach to measuring the ignorance of a probabilistic logic program with respect to a query. The measure of ignorance (w.r.t. a query) reflects how reliable a precise probability for the query can be; a high value of ignorance suggests that a single probability is not suitable for the query. We then propose a method to measure the probability that the exact probability of a query falls in a given interval, i.e., a second-order probability. We call it the degree of satisfaction. If the degree of satisfaction w.r.t. the query is high enough, then the given interval can be accepted as the answer to the query. We also prove that our measures satisfy many properties, and we use a case study to demonstrate their significance.


Introduction
Probabilistic knowledge is present in many real-world applications, for example medical expert systems and the modelling and analysis of engineering experiments. Probabilistic logics have been studied intensively in the literature. One important element of many formal languages for representing probabilistic knowledge is the interval restriction for conditional probabilities, also called conditional constraints [12]. Extensive work on probabilistic reasoning with propositional conditional constraints has been carried out (e.g., [6,7]).
Logic programming is a well-established knowledge representation and reasoning formalism in artificial intelligence and deductive databases. The need for representing uncertainty in the logic programming framework has been recognized in a large number of publications (e.g., [1,2,5,8,12,17,18,20]).
These probabilistic logic programs have been designed from different perspectives and have different syntactic forms and semantics. In conditional probabilistic logic programming [4,14], knowledge is represented by interval restrictions for probabilities on conditional events, of the form (ψ|φ)[l, u]. A probabilistic conditional event (ψ|φ)[l, u] is interpreted as: given φ, the probability of ψ falls in the interval [l, u], where [l, u] ⊆ [0, 1].
In Causal Probabilistic Logic Programming [1,2], a rule Pr(ψ|_c φ) = y is interpreted as "if φ happens, this fact will cause the probability of ψ to be y". In Success Probabilistic Logic Programming [8,20], a rule ψ ← φ is associated with a probability p_r, which represents the probability that this rule is true. In [3,5,17,18,22], a probabilistic rule is of the form ψ[l1, u1] ← φ[l2, u2], which means that "if the probability of φ is in the interval [l2, u2], then the probability of ψ is between l1 and u1". In this logic, we may get a set of intervals for a query, and the actual probability of the query can fall in any of them.
In the field of clinical trials, statistical data summarizing trial results provide some indication of the relationship between medical treatments and their effects. If rules are derived from these data, the probability of such a rule should not be interpreted as the probability that the condition (the treatment) causes the consequent (the effect); rather, it is the probability of the effect given that the treatment has been carried out. For example, the rule (mortality|drug_name)[l, u] cannot be interpreted as "using this drug causes death with probability in [l, u]"; rather, it says that under treatment with this drug, the probability of mortality (of the patient) is in this interval, while the cause(s) of death may be something other than this drug. Clinical trial data are usually in the form of conditional probabilities, not in the form ψ[l1, u1] ← φ[l2, u2]. In addition, the effects of a treatment do not fall into different probability intervals. Therefore, given our application background in clinical trials, we focus only on conditional probabilistic logic programming in this paper.
Conditional probabilistic logic programming is a framework to represent and reason with imprecise (conditional) probabilistic knowledge. An agent's knowledge is represented by a probabilistic logic program (PLP) which is a set of (conditional) logical formulas with probability intervals. The impreciseness of an agent's knowledge is explicitly represented by assigning a probability interval to every logical formula (representing a conditional event) indicating that the probability of a formula shall be in the given interval.
Given a PLP and a query against it, traditionally a probability interval is returned as the answer. This interval implies that the true probability of the query lies within it. However, when this interval is too wide, it provides no useful information. For instance, if a PLP contains the knowledge {(fly(X)|bird(X))[0.98, 1], (bird(X)|magpie(X))[1, 1]}, then the answer to the query Can a magpie fly? (i.e., ?(fly(t)|magpie(t))) is the trivial bound [0, 1]. One way to enhance the reasoning power of a PLP is to apply the maximum entropy principle [9]. Under this principle, a single probability distribution is selected and assumed to be the most acceptable one among all possible probability distributions. As a consequence, a precise probability is given for a query even when the agent's original knowledge is imprecise. In the above example, applying the maximum entropy principle returns 0.98 as the answer to the query. Intuitively, accepting a precise probability derived from imprecise prior knowledge can be risky. When an agent's knowledge is rich enough, a single probability could be reliable; however, when an agent's knowledge is (very) imprecise, an interval is more appropriate than a single probability.
Therefore, in probabilistic logic programming as well as in other conditional probabilistic logics, there is a question that has not been fully investigated: how useful is a probabilistic logic program (PLP) for answering a given query? This question is important for two reasons. First, it helps to analyze whether a PLP is adequate to answer a query. Second, if a PLP is sufficiently relevant to a query, should a single probability be returned, or is a probability interval more suitable? If an interval is more suitable, how can we obtain a more meaningful interval (one that is satisfactory to a certain extent) rather than a loose bound?
To answer the above questions, in this paper, we propose two concepts, the measure of ignorance and the measure of the degree of satisfaction, w.r.t. a PLP and a query. The former analyzes the impreciseness of the PLP w.r.t. a query, and the latter measures which (tighter) interval is sufficiently informative to answer a query.
The main contributions of this paper are as follows. First, we formally analyze conditional probabilistic logic programs and the maximum entropy principle. Although the assumption underlying the maximum entropy principle is intuitive and widely accepted, it introduces some new, unsupported knowledge; thus we need to know to what extent an answer given under maximum entropy is reliable. Second, we propose a general framework that formally defines the measure of ignorance and the measure of the degree of satisfaction, together with postulates for these two measures. We also provide several consequence relations based on the degree of satisfaction. Third, by using the divergence of probability distributions, we instantiate our framework and show that the measure of ignorance and the measure of the degree of satisfaction have many desirable properties and provide useful information about a PLP w.r.t. a query. Fourth, we prove that our framework is an extension of both reasoning with probabilistic logic programs and reasoning under the maximum entropy principle. Finally, we prove that these measures can be viewed as second-order probabilities. More specifically, a high level of ignorance means a high probability that the given PLP (an agent's knowledge) is close to a total absence of knowledge. The degree of satisfaction is the second-order probability that the actual probability of the conditional event in the query falls in the interval given in the query.
This paper is organized as follows. After briefly reviewing probabilistic logic programming in Section 2, we formally analyze probabilistic logic programming and the maximum entropy principle, and provide our general framework, in Section 3. The instantiation of the framework is given in Section 4. In Section 5, we use examples to illustrate that our framework can provide additional information when assessing query results. In Section 6, we present the algorithms used in the implementation of the framework and discuss our experiments with statistical data from clinical trials. After comparing with related work in Section 7, we conclude the paper in Section 8.

Syntax and semantics
We briefly review conditional probabilistic logic programming here [12,14]. We use Φ to denote a finite set of predicate symbols and constant symbols, V to denote a set of object variables, and B to denote a set of bound constants in [0, 1], which describe bounds on probabilities. We use a, b, ... to denote constants from Φ and X, Y, ... to denote object variables from V. An object term t is a constant from Φ or an object variable from V. An atom is of the form p(t1, ..., tk), where p is a predicate symbol and t1, ..., tk are object terms. We use Greek letters φ, ϕ, ψ, ... to denote events (or formulas), which are built from atoms using the logical connectives ∧, ∨, ¬ as usual. A conditional event is of the form (ψ|φ), where ψ and φ are events; φ is called the antecedent and ψ the consequent. A probabilistic formula, denoted (ψ|φ)[l, u], means that the probability of the conditional event ψ|φ is between l and u, where l and u are bound constants. A set of probabilistic formulas is called a conditional probabilistic logic program (PLP); in the rest of the paper a PLP is denoted by P.
A ground term (resp. event, conditional event, probabilistic formula, or PLP) is a term (resp. event, conditional event, probabilistic formula or PLP) that does not contain any object variables in V.
All the constants in Φ form the Herbrand universe, denoted HU_Φ, and the Herbrand base, denoted HB_Φ, is the finite nonempty set of all ground atoms constructed from the predicate symbols in Φ and the constants in HU_Φ. A subset I of HB_Φ is called a possible world, and ℐ_Φ denotes the set of all possible worlds over Φ. A function σ that maps each object variable to a constant is called an assignment; it is extended to object terms by σ(c) = c for all constant symbols c from Φ. The satisfaction of an event φ by I under σ, denoted I |=_σ φ, is defined inductively as follows: I |=_σ p(t1, ..., tn) iff p(σ(t1), ..., σ(tn)) ∈ I; I |=_σ φ1 ∧ φ2 iff I |=_σ φ1 and I |=_σ φ2; I |=_σ ¬φ iff I ⊭_σ φ. A possible world I satisfies, or is a model of, φ, denoted I |= φ, iff I |=_σ φ for all assignments σ. A possible world I satisfies, or is a model of, a set of formulas F, denoted I |= F, iff I |=_σ φ for all assignments σ and all φ ∈ F. An event φ is a logical consequence of F, denoted F |= φ, iff all models of F satisfy φ.
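The inductive definition of satisfaction can be made concrete for ground events. The following Python sketch is illustrative only: the tuple encoding of events, the reserved string used for the tautology, and the helper names are our own assumptions, and because it handles ground events only, the assignment σ plays no role.

```python
from itertools import combinations

TOP = "TOP"  # hypothetical reserved name standing for the tautology

def satisfies(world, event):
    """I |= event for a ground event; world is a frozenset of ground atoms.
    Events: an atom string, ("not", e), ("and", e1, e2, ...), ("or", e1, e2, ...)."""
    if event == TOP:
        return True
    if isinstance(event, str):           # ground atom: true iff it is in the world
        return event in world
    op, *args = event
    if op == "not":
        return not satisfies(world, args[0])
    if op == "and":
        return all(satisfies(world, a) for a in args)
    if op == "or":
        return any(satisfies(world, a) for a in args)
    raise ValueError(f"unknown connective: {op!r}")

def possible_worlds(herbrand_base):
    """Every subset of a finite Herbrand base is a possible world."""
    atoms = sorted(herbrand_base)
    return [frozenset(c) for k in range(len(atoms) + 1) for c in combinations(atoms, k)]

HB = {"bird(tweety)", "fly(tweety)", "magpie(tweety)"}
worlds = possible_worlds(HB)
magpie_not_bird = ("and", "magpie(tweety)", ("not", "bird(tweety)"))
print(len(worlds))                                         # 8
print(sum(satisfies(w, magpie_not_bird) for w in worlds))  # 2
```

With three ground atoms there are 2^3 = 8 possible worlds, two of which satisfy magpie(tweety) ∧ ¬bird(tweety).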
In this paper, we use ⊤ to represent the (ground) tautology, so that I |=_σ ⊤ for all I and all assignments σ; we use ⊥ to denote ¬⊤.
Using probabilistic logic programs, we can represent imprecise probabilistic knowledge.
Example 1 Let P be a PLP that contains only one constant, tweety, and P = {(fly(X)|bird(X))[0.98, 1], (bird(X)|magpie(X))[1, 1]}. Intuitively, we expect the probability of (fly(tweety)|magpie(tweety)) to be high. Table 1 gives all the possible probabilistic models for P, i.e., the distributions that satisfy all the constraints listed in the table. There are eight possible worlds over the three ground atoms bird(tweety), fly(tweety), and magpie(tweety), and the third column gives the probability of each of them. So, Pr(fly(tweety)|magpie(tweety)) = x6/(x6 + x3).
It is easy to see that many probability distributions satisfy all the constraints induced by the PLP. Letting x1 = x2 = x4 = x6 = 0, x3 = 0.02, and x5 = 0.98, we get a probabilistic model Pr1 for P with Pr1(fly(tweety)|magpie(tweety)) = 0. Similarly, there is another probabilistic model Pr2 for P with Pr2(fly(tweety)|magpie(tweety)) = 1. Therefore, P tells us nothing about whether a magpie can fly.
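Example 1 can be checked mechanically. Since Table 1 is not reproduced, this sketch assumes that x3 and x6 are the worlds magpie ∧ bird ∧ ¬fly and magpie ∧ bird ∧ fly and that x5 is ¬magpie ∧ bird ∧ fly; the values used for Pr2 are our own choice. Following the usual semantics, a constraint (ψ|φ)[l, u] holds vacuously when Pr(φ) = 0. Exact fractions avoid floating-point noise.

```python
from fractions import Fraction as F

# A world is a triple of truth values (magpie, bird, fly) for the constant tweety.
magpie = lambda w: w[0]
bird = lambda w: w[1]
fly = lambda w: w[2]

# P = {(fly|bird)[0.98, 1], (bird|magpie)[1, 1]} as (consequent, antecedent, l, u)
PROGRAM = [(fly, bird, F(49, 50), F(1)), (bird, magpie, F(1), F(1))]

def cond(dist, psi, phi):
    """Pr(psi | phi), or None when Pr(phi) = 0."""
    p_phi = sum(p for w, p in dist.items() if phi(w))
    if p_phi == 0:
        return None
    return sum(p for w, p in dist.items() if phi(w) and psi(w)) / p_phi

def is_model(dist):
    """Pr |= P: a distribution satisfying every conditional constraint
    (a constraint holds vacuously when its antecedent has probability 0)."""
    if sum(dist.values()) != 1 or any(p < 0 for p in dist.values()):
        return False
    for psi, phi, l, u in PROGRAM:
        value = cond(dist, psi, phi)
        if value is not None and not l <= value <= u:
            return False
    return True

# Worlds not listed have probability 0.
pr1 = {(False, True, True): F(49, 50), (True, True, False): F(1, 50)}
pr2 = {(False, True, True): F(49, 50), (True, True, True): F(1, 50)}
print(is_model(pr1), cond(pr1, fly, magpie))  # True 0
print(is_model(pr2), cond(pr2, fly, magpie))  # True 1
```

Both distributions are models of P, yet they assign the query the extreme values 0 and 1, which is why the tight bound is [0, 1].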

Maximum entropy principle
One possible method to enhance the reasoning power of probabilistic logic programs is to reason with the distribution that has maximum entropy [9,10]. The principle of maximum entropy is a well-known technique for representing probabilistic knowledge. Entropy quantifies the indeterminateness inherent in a distribution Pr by H(Pr) = −Σ_{I∈ℐ} Pr(I) log Pr(I). Given a PLP P, the maximum entropy model (or me-model) of P, denoted me[P], is the unique probabilistic interpretation Pr that is a probabilistic model of P and has the greatest entropy among all probabilistic models of P.
In Example 1, let Pr be the probability distribution with maximum entropy that satisfies the PLP P, then Pr( fly(tweety)|magpie(tweety)) = 0.98.
We say that (ψ|φ)[l, u] is a tight me-consequence of P, denoted P |=^me_tight (ψ|φ)[l, u], iff me[P](φ) > 0 and l = u = me[P](ψ|φ). Applying the principle of maximum entropy thus avoids inferring noninformative probability intervals. For instance, P |=^me_tight (fly(tweety)|magpie(tweety))[0.98, 0.98], where P is as given in Example 1.
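For Example 1 the me-model can be computed up to a single Lagrange multiplier. This sketch assumes (i) that the constraint (bird|magpie)[1, 1] simply removes the two magpie ∧ ¬bird worlds, and (ii) that the lower bound of (fly|bird)[0.98, 1] is active at the optimum, which holds here because the unconstrained maximum-entropy value of Pr(fly|bird) is 0.5. Under one linear equality constraint E[f] = 0, the maximum-entropy distribution has the Gibbs form Pr(I) ∝ exp(λ f(I)), and λ is found by bisection. It is a sketch for this example only, not a general me[P] solver.

```python
import math
from itertools import product

# Worlds are (magpie, bird, fly); (bird|magpie)[1, 1] forces zero mass on magpie ∧ ¬bird.
SUPPORT = [w for w in product([False, True], repeat=3) if not (w[0] and not w[1])]
L_BOUND = 0.98  # lower bound of (fly|bird), assumed to be active at the optimum

def feature(w):
    """E[feature] = Pr(fly ∧ bird) - 0.98 * Pr(bird); the constraint is E[feature] = 0."""
    _, is_bird, is_fly = w
    return float(is_bird) * (float(is_fly) - L_BOUND)

def gibbs(lam):
    """Maximum-entropy form Pr(I) ∝ exp(lam * feature(I)) on the support."""
    weights = [math.exp(lam * feature(w)) for w in SUPPORT]
    z = sum(weights)
    return {w: wt / z for w, wt in zip(SUPPORT, weights)}

def me_model(lo=-50.0, hi=50.0, iters=200):
    # E[feature] is nondecreasing in lam (its derivative is a variance), so bisect.
    for _ in range(iters):
        mid = (lo + hi) / 2
        dist = gibbs(mid)
        if sum(p * feature(w) for w, p in dist.items()) < 0:
            lo = mid
        else:
            hi = mid
    return gibbs((lo + hi) / 2)

def cond(dist, psi, phi):
    num = sum(p for w, p in dist.items() if psi(w) and phi(w))
    return num / sum(p for w, p in dist.items() if phi(w))

def entropy(dist):
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

me = me_model()
magpie, bird, fly = (lambda w: w[0]), (lambda w: w[1]), (lambda w: w[2])
print(round(cond(me, fly, magpie), 6))   # 0.98
print(round(cond(me, magpie, bird), 6))  # 0.5
```

The multiplier converges to λ = ln 49, so every bird world with fly gets 49 times the weight of the matching world without fly. This yields 0.98 for the query, and also the value me[P](magpie(tweety)|bird(tweety)) = 0.5, which the PLP itself does not support.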

Implementation of reasoning with PLP
A PLP can be treated as a set of linear inequality constraints LC(P, ℐ), shown in Fig. 1. A solution (y_I)_{I∈ℐ} that satisfies LC(P, ℐ) gives a probabilistic model of P, namely Pr(I) = y_I. The next theorem shows that the reasoning problems can be reduced to optimization problems subject to the linear constraints LC(P, ℐ).
Theorem 2 [9] Let P be a PLP such that P ⊭ ⊥ ← φ. Then P |=_tight (ψ|φ)[l, u]
Example 2 Consider the PLP given in Example 1. The constraints LC(P, ℐ) induced by P are the following:
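Fig. 1 is not reproduced here. This sketch assumes the usual linearization of a conditional constraint (ψ|φ)[l, u] as l·Pr(φ) ≤ Pr(ψ ∧ φ) ≤ u·Pr(φ). That form avoids dividing by Pr(φ) and holds trivially when Pr(φ) = 0. Each row r built below encodes Σ_I r[I]·y_I ≥ 0, and together with Σ_I y_I = 1 and y_I ≥ 0 the rows are our reading of LC(P, ℐ) for Example 1. Obtaining the tight bound from these constraints still requires a linear-programming solver, which the sketch omits.

```python
from fractions import Fraction as F
from itertools import product

WORLDS = list(product([False, True], repeat=3))  # (magpie, bird, fly)
magpie = lambda w: w[0]
bird = lambda w: w[1]
fly = lambda w: w[2]
PROGRAM = [(fly, bird, F(49, 50), F(1)), (bird, magpie, F(1), F(1))]

def linear_constraints(program, worlds):
    """Rows r with the meaning sum_I r[I] * y_I >= 0.
    For (psi|phi)[l, u]: Pr(psi ∧ phi) - l*Pr(phi) >= 0 and u*Pr(phi) - Pr(psi ∧ phi) >= 0."""
    rows = []
    for psi, phi, l, u in program:
        lower, upper = {}, {}
        for w in worlds:
            if phi(w):
                hit = 1 if psi(w) else 0
                lower[w] = hit - l
                upper[w] = u - hit
        rows += [lower, upper]
    return rows

def satisfies_lc(y, rows):
    """y maps worlds to y_I (missing worlds mean 0); also checks sum y = 1 and y >= 0."""
    if sum(y.values()) != 1 or any(v < 0 for v in y.values()):
        return False
    return all(sum(c * y.get(w, 0) for w, c in row.items()) >= 0 for row in rows)

rows = linear_constraints(PROGRAM, WORLDS)
print(len(rows))  # 4
```

Each probabilistic formula contributes two rows, one for its lower bound and one for its upper bound, so the two formulas of Example 1 give four rows.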

A formal analysis of PLP
In information theory, information entropy is a measure of the uncertainty associated with a random variable; it quantifies the information in a piece of data. Informally speaking, −log p(X = x_i) measures the degree of surprise when one observes that the random variable takes the value x_i. In other words, −log p(X = x_i) reflects the information one receives from the observation, and the entropy is the expected information one may receive by observing the random variable. Inspired by this, we define a knowledge entropy, which reflects how well an agent knows the truth value of ψ given φ prior to any observation. Informally, the more surprised an agent is by an observation, the more knowledge it gains from it, and thus the less prior knowledge it has about ψ given φ before observing ψ given φ.

Definition 1
Let P be a PLP, and (ψ|φ) be a conditional event. Suppose that Pr is a probabilistic model for P; then the knowledge entropy of inferring ψ from φ under Pr is defined by: It is obvious that K_Pr(ψ|φ) = K_Pr(¬ψ|φ) and K_Pr(ψ|φ) ∈ [0, 1]. Trivially, K_Pr(φ|φ) = 1 and K_Pr(¬φ|φ) = 1, since under Pr an agent knows exactly the truth values of φ and of its negation given φ.
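The formula defining K_Pr(ψ|φ) is not shown in the text. A form consistent with every property stated for it (symmetry between ψ and ¬ψ, range [0, 1], value 1 when the truth value is certain) and with the coin example later in this section (K = 0 for a fair coin) is one minus the binary entropy of p = Pr(ψ|φ), measured in bits. The sketch below implements that form; treat it as our assumption, not as the paper's definition.

```python
import math

def knowledge_entropy(p):
    """Hypothesised K_Pr(psi|phi) = 1 - H2(p), where p = Pr(psi|phi) and H2 is the
    binary entropy in bits (with 0 * log 0 = 0). An assumption, not the paper's formula."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    h = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            h -= q * math.log2(q)
    return 1.0 - h

print(knowledge_entropy(0.5))  # 0.0  (a fair coin: no knowledge)
print(knowledge_entropy(1.0))  # 1.0  (the truth value is known)
```

Base-2 logarithms make the binary entropy peak at exactly 1, so this K stays in [0, 1] as the definition requires.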
By extending the above definition, we can define a knowledge measurement for a PLP.

Definition 2
Let P be a PLP, and (ψ|φ) be a conditional event. Suppose that Pr is a probabilistic model for P and Pr(φ) > 0; then the knowledge measurement K_P(ψ|φ) is defined by: The measurement K_P(ψ|φ) characterizes the usefulness of the a priori knowledge contained in PLP P for inferring ψ when knowing or observing φ. When ψ or ¬ψ can be inferred from φ under P, P contains all the knowledge needed to infer ψ given φ, and minK_P(ψ|φ) = 1. When the knowledge in P excludes both the possibility that the probability of ψ given φ is 1 and the possibility that it is 0, i.e., both P ∪ {(ψ|φ)[1, 1]} and P ∪ {(ψ|φ)[0, 0]} are unsatisfiable, the knowledge contained in P cannot fully support either ψ or ¬ψ given φ, so maxK_P(ψ|φ) < 1. In particular, if P entails that ψ is more (or less) likely than ¬ψ given φ (i.e., the probability of ψ given φ is greater (or smaller) than that of ¬ψ given φ), then maxK_P(ψ|φ) > 0.
We can define a partial order over the set of PLPs. We say that a PLP P is more precise than a PLP P′ w.r.t. ψ|φ, denoted P ⪯^k_{ψ|φ} P′, if K_P(ψ|φ) ⊆ K_P′(ψ|φ). If minK_P(ψ|φ) ≠ maxK_P(ψ|φ), then the knowledge contained in P is not sufficient to decide the probability of ψ given φ; that is, the knowledge contained in P about inferring ψ given φ is imprecise. In order to infer the actual probability of ψ given φ under P, we need additional knowledge.
Proposition 2 Let P and P′ be two PLPs. If P |= P′, then P ⪯^k_{ψ|φ} P′ for any conditional event (ψ|φ).
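The formula for K_P(ψ|φ) is also missing from Definition 2; we take K_P(ψ|φ) to be the interval [min, max] of K_Pr(ψ|φ) over the models Pr of P, which is how it is used in the proof of Proposition 2 and in the coin example. Assuming further the hypothesised K = 1 − H2(p) and that Pr(ψ|φ) ranges over the whole tight bound [l, u], K_P can be computed from [l, u] alone. The ordering "more precise" then becomes interval containment. Function names are our own.

```python
import math

def knowledge_entropy(p):
    """Hypothesised K = 1 - H2(p) (binary entropy in bits); an assumption."""
    h = -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)
    return 1.0 - h

def knowledge_interval(l, u):
    """K_P(psi|phi) = [min K, max K] over p in the tight bound [l, u].
    K is convex and symmetric about 0.5: the max is at an endpoint, and the
    min is 0 if 0.5 lies in [l, u], else at the endpoint closer to 0.5."""
    k_l, k_u = knowledge_entropy(l), knowledge_entropy(u)
    low = 0.0 if l <= 0.5 <= u else min(k_l, k_u)
    return (low, max(k_l, k_u))

def more_precise(k_p, k_q):
    """P is more precise than Q w.r.t. psi|phi iff K_P(psi|phi) ⊆ K_Q(psi|phi)."""
    return k_q[0] <= k_p[0] and k_p[1] <= k_q[1]

k1 = knowledge_interval(0.5, 0.5)  # fair coin, P1
k2 = knowledge_interval(0.0, 1.0)  # coin of unknown bias, P2
print(k1, k2)                                      # (0.0, 0.0) (0.0, 1.0)
print(more_precise(k1, k2), more_precise(k2, k1))  # True False
```

Because a narrower tight bound can only shrink the range of K, nested bounds give nested knowledge intervals, which matches the direction of Proposition 2.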
Proof Suppose that P |=_tight (ψ|φ)[l, u] and P′ |=_tight (ψ|φ)[l′, u′]. If P |= P′, then every probabilistic model of P is a model of P′, so [l, u] ⊆ [l′, u′], and hence K_P(ψ|φ) ⊆ K_P′(ψ|φ).
This proposition suggests that the consequence relation |= considers all the statements in a PLP, while the knowledge measurement focuses only on the knowledge about ψ given φ. maxK_P(ψ|φ) reflects the best knowledge consistent with P that can help to infer the truth value of ψ given φ. Since P |= P′, P contains more knowledge than P′, and thus minK_P(ψ|φ) ≥ minK_P′(ψ|φ). It is also possible that this additional knowledge excludes some other information that would be useful for inferring the truth value of ψ given φ; thus maxK_P(ψ|φ) ≤ maxK_P′(ψ|φ).
From the viewpoint of knowledge entropy, reasoning under the maximum entropy principle implicitly introduces extra knowledge to enhance the reasoning power of a PLP.
For example, with P as in Example 1, me[P](magpie(tweety)|bird(tweety)) = 0.5, which is not supported by P. The rationale behind the maximum entropy principle is to represent the given probabilistic information as faithfully as possible by maximizing admissible indeterminateness. In this example, the maximum entropy principle introduces the assumption that magpies are indistinguishable from typical birds (an assumption that cannot be represented in a PLP), and such an assumption actually enriches the knowledge contained in P.
From the above example, we know that reasoning under maximum entropy cannot be taken for granted as reasoning based on minimal knowledge; it is actually based on some implicit knowledge. We should be aware that although this assumption seems intuitive, it may be wrong.
Let P1 = {(headUp(coin)|toss(coin))[0.5, 0.5]} and P2 = {(headUp(coin)|toss(coin))[0, 1]} be PLPs. Here, P1 says that tossing a fair coin results in head-up with probability 0.5, whereas in P2 we do not know whether the coin is fair.
In this example, the knowledge in P1 is richer than that in P2, since from P1 we know that the coin is fair. Using the maximum entropy principle, we get me[P1](headUp(coin)|toss(coin)) = me[P2](headUp(coin)|toss(coin)) = 0.5. This result suggests that the difference between P1 and P2 is lost under maximum entropy reasoning. By calculating the knowledge entropy of P1 and P2, we obtain K_P1(headUp(coin)|toss(coin)) = [0, 0] and K_P2(headUp(coin)|toss(coin)) = [0, 1]. Thus P1 is more precise than P2.
From the above examples, we know that accepting a conclusion obtained by reasoning under the maximum entropy principle may imply that we are willing to introduce extra knowledge into a given PLP, and there is no guarantee that this extra knowledge is always correct.
In the next subsection, we provide a general framework for analyzing and reasoning with imprecise PLPs. For this purpose, we provide two concepts: ignorance and degree of satisfaction. The ignorance reflects the richness of the knowledge contained in a PLP and the degree of satisfaction of a query ?(ψ|φ)[l, u] reflects the possibility that the actual probability of (ψ|φ) falls in the given bound [l, u].

General framework for measuring imprecise knowledge
Intuitively, the knowledge measurement K_P(ψ|φ) indicates, to some extent, the ignorance about the conditional event (ψ|φ) when using the knowledge contained in P. Unfortunately, such an interval cannot sufficiently reflect the ignorance about (ψ|φ). This is not surprising, since K_P(ψ|φ) is determined only by the tight probability bound of (ψ|φ); other knowledge is not taken into account.
Example 5 Let P be a PLP:
From P, we can infer that
Here, we have K_P(fly(t)|sickMagpie(t)) = K_P(fly(t)|magpie(t)). However, since the proportion of sick magpies among birds is smaller than the proportion of magpies among birds, the knowledge that birds can fly should be applied more cautiously to sick magpies than to magpies in general. In other words, the statement that more than 90% of birds can fly says more about magpies than about sick magpies. Therefore, accepting that 90% of magpies can fly is more rational than accepting that 90% of sick magpies can fly. However, this analysis cannot be obtained directly by comparing the bounds inferred from P.
We define an ignorance measurement to characterize the incompleteness of knowledge restricted to a given (conditional) event.
Definition 3 (Ignorance) Let PL be the set of all PLPs and E be the set of all conditional events. A function IG : PL × E → [0, 1] is called a measure¹ of ignorance iff for any PLP P and conditional event (ψ|φ) it satisfies the following postulates: [Irrelevance] If P and another PLP P′ have no symbols in common, i.e., Φ ∩ Φ′ = ∅, then IG(P, ψ|φ) = IG(P ∪ P′, ψ|φ).
If P = ∅, only tautologies can be inferred from P. Therefore, for any PLP P, IG_P(ψ|φ) ≤ IG_∅(ψ|φ), which means that an empty PLP has the largest ignorance value for any conditional event. When IG_P(ψ|φ) = 0, the event (ψ|φ) can be inferred precisely from P, since a single precise probability for (ψ|φ) can be obtained from P. The ignorance measurement focuses on the knowledge about (ψ|φ) contained in P, which means that irrelevant knowledge does not provide a better understanding of this conditional event.

Proposition 3 Let P be a PLP and (ψ|φ) be a conditional event. If
When querying a PLP, tight consequence reasoning gives a bound as the answer, which can be too cautious and not very informative; me-consequence, however, gives a precise probability for a query, and it is too risky to simply accept it. We argue that sometimes we do not need a precise probability that is not reliable enough; instead, we may want to know whether the probability falls in a given bound with high enough possibility. In other words, we may want to strike a balance between a less informative bound that is certainly true and an intuitively precise probability that is not reliable enough. Consider the ignorance of a conditional event: when the ignorance is 0, accepting the precise probability given by the maximum entropy principle is guaranteed to be right. But what about the situation in which the ignorance of a conditional event is very small? It suggests that the knowledge contained in the PLP is rich enough to infer an informative bound, but not rich enough to infer a precise probability. In order to extract an informative and reliable interval, we first measure the degree of satisfaction of a query (with a bound). If the degree is high enough, then regarding the query as true is reliable, since it is very likely that the actual probability falls in the given interval.
¹ In mathematical analysis, a measure m is a function m : 2^S → [0, ∞] such that m(∅) = 0 and, if E1, E2, ... is a countable sequence of pairwise disjoint subsets of S, the measure of the union of all the E_i's is equal to the sum of the measures of the E_i's.
Definition 4 (Degree of satisfaction) Let PL be the set of all PLPs and F be the set of all probabilistic formulas. A function SAT : PL × F → [0, 1] is called a measure of the degree of satisfaction iff for any PLP P and ground probabilistic formula μ = (ψ|φ)[l, u], it satisfies the following postulates:
and SAT(P, (ψ|φ)[l′, u′]) < 1.
The reflexivity property says that every consequence is totally satisfied. Rationality says that 0 is given as the degree of satisfaction of an unsatisfiable probabilistic formula. Monotonicity says that if we expect a more precise interval for a query, then the chance that the exact probability of the query is not in the interval is greater. Cautious monotonicity says that if P and P′ are equivalent except for the bound of (ψ|φ), and P′ contains more knowledge about (ψ|φ), then the degree of satisfaction of μ under P′ should be greater than that under P.
In our framework, given a PLP P, a conditional event (ψ|φ), and a probabilistic formula (ψ|φ) [l, u], the ignorance value IG P (ψ|φ) and the degree of satisfaction SAT P (μ) reveal different aspects of the impreciseness of the knowledge in P w.r.t. (ψ|φ) and μ. The former says how much this P can tell about (ψ|φ) and the latter says to what degree a user can be satisfied with the bound [l, u] attached to (ψ|φ).
The above proposition says that when the knowledge contained in P totally ignores the conditional event (ψ|φ), then all the knowledge contained in P is irrelevant to the query ?(ψ|φ)[l, l].
Definition 5 Let SAT_P(μ) be the degree of satisfaction for a PLP P and a probabilistic formula μ = (ψ|φ)[l, u]. We define two consequence relations as
Proposition 6 Let SAT_P(μ) be the degree of satisfaction for a PLP P and a probabilistic formula μ = (ψ|φ)[l, u]; then
In this proposition, we use SAT = 1 instead of SAT ≥ 1, since the degree of satisfaction cannot be greater than 1.
The above proposition says that our framework is a generalization of PLP under its original semantics as well as under the maximum entropy principle. That is, the classical consequence relations |= and |=_tight are too cautious: they are equivalent to requiring the degree of satisfaction of μ w.r.t. P to be 1, which means that the true probability of (ψ|φ) must fall in the bound [l, u]. On the other hand, reasoning under the maximum entropy principle (|=^me_tight) is credulous: it excludes all possible probability distributions except the most likely one.
Given a query ?(ψ|φ)[l, u] against a PLP P, the degree of satisfaction SAT_P(μ) gives the probability that Pr(ψ|φ) ∈ [l, u]. For a query ?(ψ|φ), the bound [l, u] returned by P |=_tight (ψ|φ)[l, u] may be noninformative, as discussed above. In our framework, we provide three possible routes to a more informative interval, where a is a threshold given by the user. First, a user may want to know the highest acceptable lower bound, so l is increased to the largest value l′ such that SAT_P((ψ|φ)[l′, u]) ≥ a holds. Second, a user may want to know the lowest acceptable upper bound, so u is decreased to the smallest value u′ such that SAT_P((ψ|φ)[l, u′]) ≥ a holds. Third, a user may want to create an interval [l′, u′] around me[P](ψ|φ), the precise probability given by the maximum entropy principle, such that SAT_P((ψ|φ)[l′, u′]) ≥ a holds. To formalize these three scenarios, we define three consequence relations, |=^{SAT≥a}_maxLow, |=^{SAT≥a}_minUp and |=^{SAT≥a}_aroundMe, respectively, as
From P, we can only infer that P |=_tight (fly(t)|magpie(t))[0, 1] and P |=^me_tight (fly(t)|magpie(t))[0.9, 0.9].
As discussed above, the bound [0, 1] is meaningless, and there is not enough knowledge to infer that exactly 90% of magpies can fly. In reality, taking [0.9, 0.9] as the answer to this query is too risky, and there is no need for a precise probability. An interval [l, u] that is more informative than [0, 1] is required. Assume that a user is satisfied when there is an 80% chance (i.e., a = 0.8) that the actual probability of the query is in [l, u]; then we can use the above three consequence relations to get the following
From the highest lower bound 0.7, a user can assume that a magpie can very likely fly. The user should not conclude that all magpies can fly either, since the lowest upper bound 0.96 is less than 1. The bound [0.7, 1] gives an estimate of the probability that a magpie can fly.
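The three consequence relations reduce to one-dimensional searches, because monotonicity makes SAT_P((ψ|φ)[l′, u]) nonincreasing in l′ and SAT_P((ψ|φ)[l, u′]) nondecreasing in u′. The sketch below uses bisection for all three. The SAT function in it is purely hypothetical: a triangular second-order density on the tight bound [0, 1] with its peak at the maximum-entropy value 0.9, chosen only because it is simple and favours values near me[P]. It is not the KL-based instantiation of this paper, so its numbers differ from the 0.7 and 0.96 reported above.

```python
def triangular_sat(mode):
    """Hypothetical SAT for a tight bound [0, 1]: the second-order probability that the
    actual probability lies in [a, b], under a triangular density peaking at mode (0 < mode < 1)."""
    def cdf(x):
        x = min(max(x, 0.0), 1.0)
        if x <= mode:
            return x * x / mode
        return 1.0 - (1.0 - x) ** 2 / (1.0 - mode)
    return lambda a, b: cdf(b) - cdf(a)

def max_low(sat, l, u, a, tol=1e-12):
    """maxLow: largest l' in [l, u] with sat(l', u) >= a (sat(., u) is nonincreasing)."""
    if sat(u, u) >= a:
        return u
    lo, hi = l, u                 # invariant: sat(lo, u) >= a > sat(hi, u)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sat(mid, u) >= a:
            lo = mid
        else:
            hi = mid
    return lo

def min_up(sat, l, u, a, tol=1e-12):
    """minUp: smallest u' in [l, u] with sat(l, u') >= a (sat(l, .) is nondecreasing)."""
    if sat(l, l) >= a:
        return l
    lo, hi = l, u                 # invariant: sat(l, lo) < a <= sat(l, hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sat(l, mid) >= a:
            hi = mid
        else:
            lo = mid
    return hi

def around_me(sat, l, u, me, a, tol=1e-12):
    """aroundMe: narrowest window [me - r, me + r] ∩ [l, u] with sat >= a."""
    lo, hi = 0.0, max(me - l, u - me)   # at r = hi the window is all of [l, u]
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sat(max(l, me - mid), min(u, me + mid)) >= a:
            hi = mid
        else:
            lo = mid
    return max(l, me - hi), min(u, me + hi)

sat = triangular_sat(0.9)          # peak placed at the maximum-entropy value 0.9
print(round(max_low(sat, 0.0, 1.0, 0.8), 4))   # 0.4243
print(round(min_up(sat, 0.0, 1.0, 0.8), 4))    # 0.8485
print(tuple(round(x, 4) for x in around_me(sat, 0.0, 1.0, 0.9, 0.8)))  # (0.4243, 1.0)
```

The bisections start from the tight bound, where reflexivity guarantees a degree of satisfaction of 1, so the invariants hold from the first step.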

Quasi-distance
How to measure the distance between probability distributions is a major topic in probability theory and information theory. One of the most common measures for comparing two probability distributions Pr and Pr′ over ℐ is the KL-divergence: KL(Pr ‖ Pr′) = Σ_{I∈ℐ} Pr(I) log (Pr(I)/Pr′(I)).
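A minimal sketch of the KL-divergence for distributions over a finite set of worlds, listed in a fixed order. The natural logarithm and the argument order are our assumptions, since the surviving text does not fix them. The second print illustrates a standard identity that links this section to maximum entropy: the divergence of Pr from the uniform distribution over n worlds equals log n − H(Pr). Minimizing the divergence from uniform is therefore the same as maximizing entropy.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i) in nats; terms with p_i = 0 contribute 0,
    and the divergence is infinite if q_i = 0 while p_i > 0."""
    if len(p) != len(q):
        raise ValueError("distributions must be over the same worlds")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

p = [0.5, 0.25, 0.25, 0.0]
uniform = [0.25] * 4
print(round(kl_divergence(p, uniform), 6))  # 0.346574  (= 0.5 * ln 2)
print(round(math.log(4) - entropy(p), 6))   # 0.346574
```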
The value δ^ub_Pr(ψ|φ) (resp. δ^lb_Pr(ψ|φ)) measures how much additional information needs to be added to the uniform distribution in order to infer the upper (resp. lower) bound of the conditional event (ψ|φ) given the subset Pr. Let σ denote the smallest collection of sets that contains all the inseparable subsets of Pr and is closed under complement and countable unions of its members. Then (Pr, σ) is a measurable space over the set Pr. Obviously, Pr ∈ σ, and for any PLP P, {Pr | Pr |= P} ∈ σ.
We extend the function ϑ_{ψ|φ} to any member of σ.
Informally, the value ϑ_{ψ|φ}(Pr) measures how widely the probability distributions in Pr spread when inferring ψ given φ. For example, when all the distributions in Pr assign the same probability to the conditional event (ψ|φ), the set Pr acts like a single distribution when inferring ψ given φ, and in this case Pr has width 0 for inferring ψ given φ.
From the definition, we know that function ϑ (ψ|φ) is a measure. Since it is a measure, we can define a probability distribution based on it, and we show that this probability distribution can be used as an instantiation of ignorance in the next subsection.
It is worth noting that since the set of all probabilistic models of a PLP is a convex set and thus is inseparable, we can use ϑ (ψ|φ) to measure the probabilistic models of a PLP. We will discuss this further in the next subsection.

Instantiation of ignorance
Definition 10 Let P be a PLP and (ψ|φ) be a conditional event. Then the KL-divergence-based ignorance, denoted IG^KL_P(ψ|φ), is defined as where Pr = {Pr | Pr |= P}.
Since ϑ_{ψ|φ} is a measure, IG^KL_P is a uniform probability distribution. Thus, IG^KL_P(ψ|φ) is the probability that a randomly selected probability distribution from the set Pr assigns ψ|φ a probability value in the interval [l, u], where P |=_tight (ψ|φ)[l, u]. If this probability is close to 1, then reasoning with P is similar to reasoning with an empty PLP; when it is close to 0, a tighter bound for (ψ|φ) can be inferred from P.
In the above definition,

Proposition 7
The measure IG^KL satisfies the properties given in Definition 3.
From the definition of IG^KL, it is easy to see that IG^KL
Suppose that a PLP P and another PLP P′ have no symbols in common, i.e., Φ ∩ Φ′ = ∅. Let Pr′ |= P ∪ P′. Since Φ ∩ Φ′ = ∅, we can construct another probability distribution Pr by marginalization: Pr(I) = Σ_{I′ ⊆ HB_Φ′} Pr′(I ∪ I′) for all I ⊆ HB_Φ. Obviously, Pr |= P. By Definitions 7 and 8, two probability distributions that satisfy P ∪ P′ are chosen to calculate δ^lb(Pr, (ψ|φ)), δ^ub(Pr, (ψ|φ)), ϑ_{ψ|φ}(Pr), and thus IG^KL_{P∪P′}(ψ|φ). From them, two probability distributions that satisfy P are constructed; they give the conditional event (ψ|φ) its lower bound and upper bound, respectively. With these two distributions, we can calculate IG^KL_P(ψ|φ) as well as IG^KL_{P∪P′}(ψ|φ).
This proposition says that the ignorance of a PLP about a conditional event is the sum of the ignorance due to lacking knowledge that supports probability distributions above and below the maximum entropy probability. The ignorance can also be calculated from the maximum entropy distribution, as below.
By calculating the KL-ignorance values for the conditional events E1, ..., E5, we obtain IG^KL_P(E1) = 0.0065, IG^KL_P(E2) = 0.0027, etc., as shown in Table 2. From the table, we can see that IG^KL_P(E1) > IG^KL_P(E2), due to the ignorance about whether the special birds, penguins, are typical birds or not (with respect to the property of having legs).
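The construction of Pr from Pr′ in the Irrelevance argument is marginalization. This small sketch uses a hypothetical joint distribution over Φ = {bird, fly} and Φ′ = {heads}, written as propositional atoms for brevity. It shows the property the argument relies on: an event over Φ has the same probability under Pr′ and under its marginal Pr, even though heads is not independent of the atoms in Φ here.

```python
from fractions import Fraction as F

def marginal(joint, keep):
    """Pr(I) = sum of Pr'(I ∪ I') over worlds I' of the other vocabulary.
    Worlds are frozensets of true atoms; `keep` is the set of atoms of P."""
    out = {}
    for world, p in joint.items():
        key = world & keep
        out[key] = out.get(key, 0) + p
    return out

def cond(dist, psi, phi):
    num = sum(p for w, p in dist.items() if psi(w) and phi(w))
    return num / sum(p for w, p in dist.items() if phi(w))

# Hypothetical joint model over {bird, fly} ∪ {heads}; heads depends on fly here.
joint = {
    frozenset({"bird", "fly", "heads"}): F(3, 5),
    frozenset({"bird", "fly"}): F(3, 10),
    frozenset({"bird"}): F(1, 20),
    frozenset({"heads"}): F(1, 20),
}
marg = marginal(joint, frozenset({"bird", "fly"}))
fly = lambda w: "fly" in w
bird = lambda w: "bird" in w
print(cond(joint, fly, bird), cond(marg, fly, bird))  # 18/19 18/19
```

Since every constraint of P mentions only atoms of Φ, the marginal inherits each constraint value from Pr′, which is why Pr |= P follows immediately.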
Comparing these ignorance values, we also see that IG^KL_P(E_3) < IG^KL_P(E_4). This looks counter-intuitive, since being red is irrelevant to being able to fly. But such irrelevance is based on our common knowledge and is not represented in P. Based on P alone, we do not actually know whether being red is irrelevant to the ability to fly.
To be more precise, if the predicate red were replaced by abnormal, then whether an abnormal bird can fly would be unclear. Considering that the difference between red and abnormal comes from our common knowledge and not from the PLP P, it is natural that the ignorance value for E_3 is smaller than that for E_4.
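The formula of Definition 10 is not reproduced in this text, so the sketch below is not an implementation of IG^KL_P itself. As a hedged illustration of its building block only, it computes the KL divergence D(Pr ‖ Q) = Σ_w Pr(w) log(Pr(w)/Q(w)) between two distributions over the same finite list of possible worlds (for instance Q = me[P]). The function name and the argument order are our assumptions, and natural logarithms are used.

```python
import math


def kl_divergence(p, q):
    """KL divergence D(p || q) = sum_w p(w) * log(p(w) / q(w)), in nats.

    p, q: probabilities indexed by the same possible worlds.
    Worlds with p(w) = 0 contribute nothing (0 * log 0 = 0); if some world
    has p(w) > 0 but q(w) = 0, the divergence is infinite.
    """
    if len(p) != len(q):
        raise ValueError("p and q must range over the same possible worlds")
    total = 0.0
    for pw, qw in zip(p, q):
        if pw == 0.0:
            continue
        if qw == 0.0:
            return math.inf
        total += pw * math.log(pw / qw)
    return total
```

For example, a distribution that puts all of its mass on one of two worlds diverges from the uniform distribution by log 2 ≈ 0.693 nats, while any distribution has divergence 0 from itself.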

Instantiation of satisfaction function
Given a PLP P, a set of probability distributions Pr = {Pr | Pr |= P} is induced, and the unique distribution me[P] in this set with maximum entropy can be determined. Any distribution in Pr may be the actual probability distribution; however, due to the lack of information, we do not know which one it is. Based on the maximum entropy principle, me[P] is assumed to be the most likely one, and me[P](ψ|φ) is assumed to be the most likely probability for the event (ψ|φ). Intuitively, a probability value closer to me[P](ψ|φ) is more likely to be the actual probability of (ψ|φ). Accordingly, an interval containing values closer to me[P](ψ|φ) is more likely to contain the actual probability of (ψ|φ). Of course, a looser interval is always more likely to contain the actual probability of (ψ|φ) than a tighter one.
Again, from the distance functions dis^pos_{P,(ψ|φ)} and dis^neg_{P,(ψ|φ)}, a probability distribution can be defined. So, by KL-divergence, the possible probabilities of a conditional event (ψ|φ) are measurable. Assuming that every probability is equally probable, the (second-order) probability that the actual (first-order) probability of (ψ|φ) falls in an interval
Concluding the above, we have that
So, in P_1, we do not know whether head-up is more probable than tail-up. However
[11], are commonly regarded as being particularly desirable for any reasonable notion of nonmonotonic entailment.
In [9], these postulates are reformulated for probabilistic reasoning:
These properties indicate that the consequence relation |=_SAT≥a is plausible for nonmonotonic reasoning.

Examples
In this section, we illustrate the usefulness of our framework with some examples.

Example 9
Let P be a PLP as given in Example 11. In our framework, we calculate the KL-ignorance and KL-satisfaction for our queries. We have IG^KL_P(fly(t)|magpie(t)) = 0.11 and IG^KL_P(fly(t)|sickMagpie(t)) = 0.0283. This indicates that P is more useful for inferring the proportion of magpies that can fly than for inferring the proportion of sick magpies that can fly. We also have SAT^KL_P((fly(t)|magpie(t))[0.8, 1]) = 0.58 and SAT^KL_P((fly(t)|sickMagpie(t))[0.8, 1]) = 0.53. Comparing these KL degrees of satisfaction, we know that magpies are more likely to fly than sick magpies.
Example 10 Let P be as given in Example 7. Considering a query ?(haveLegs(tweety)|penguin(tweety))[l, u] with different values of l and u, we can calculate its KL degree of satisfaction, as shown in Table 3. The table shows that the degree of satisfaction of the query decreases as the bound becomes tighter. It is worth noting that P |=^me_tight (haveLegs(tweety)|penguin(tweety))[0.98, 0.98]. However, the degree of satisfaction of (haveLegs(tweety)|penguin(tweety))[0.97, 0.99] is only 0.081, which indicates that the probability 0.98 is not fully acceptable. This is because we do not know whether penguins are typical birds (with respect to the property of having legs).
In contrast, we can accept (haveLegs(tweety)|penguin(tweety))[0.8, 1], which means that we can also infer that tweety has legs given that it is a penguin. Although we are not entirely sure about this, it is more reliable than believing that 97%-99% of penguins have legs (compare 0.829 with 0.081).
In this example, a user wants to know a lower bound for the probability that a penguin has legs. From the non-informative interval [0, 1], the user can only learn that the probability that a penguin has legs may be 0 (say, a penguin may be a special kind of bird), which is useless. With |=^SAT≥a_maxLow, the user can infer a non-trivial lower bound for the query. Given a = 0.8, it is inferred that more than 80% of penguins may have legs.
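Our reading of |=^SAT≥a_maxLow is that it returns the largest lower bound l such that the degree of satisfaction of (ψ|φ)[l, 1] is still at least a. Because the degree of satisfaction decreases as the interval becomes tighter (as observed for Table 3), such a bound can be located by bisection. The sketch below assumes a callable sat(l) returning SAT^KL_P((ψ|φ)[l, 1]); that callable, the name max_low and the tolerance are hypothetical, and the toy sat in the usage note is not taken from Table 3.

```python
def max_low(sat, a, tol=1e-6):
    """Largest l in [0, 1] with sat(l) >= a, found by bisection.

    Assumes sat(l), the degree of satisfaction of (psi|phi)[l, 1], is
    non-increasing in l (tightening the interval never raises it).
    Returns None if even the trivial interval [0, 1] falls below a.
    """
    if sat(0.0) < a:
        return None
    lo, hi = 0.0, 1.0
    if sat(hi) >= a:
        return hi
    # Invariant: sat(lo) >= a and sat(hi) < a.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sat(mid) >= a:
            lo = mid
        else:
            hi = mid
    return lo
```

With the toy function sat(l) = 1 − l and a = 0.8, the returned bound is (up to the tolerance) 0.2; raising a lowers the inferred bound, mirroring the trade-off between reliability and informativeness discussed above.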
Consider the inheritance problem. Intuitively, we expect a subclass to inherit its superclass's attributes. But if we permit inheritance with exceptions, then a special subclass may lack attributes that its superclass has. The more specific a subclass is, the more likely it is to lack the attributes of its superclass.
Example 12 (Route planning [9]) Assume that John wants to pick up Mary after she finishes work. To do so, he must drive from his home to her office. John has the following knowledge at hand. Given a road (ro) from R to S, the probability that he can reach (re) S from R without running into a traffic jam is greater than 0.7. Given a road in the south (so) of the town, this probability is even greater than 0.9. A friend has just called him and given him advice (ad) about some roads without any significant traffic. Clearly, if he can reach S from T and T from R, both without running into a traffic jam, then he can also reach S from R without running into a traffic jam. Furthermore, John has some concrete knowledge about the roads, the roads in the south of the town, and the roads his friend was talking about. For example, he knows that there is a road from his home (h) to the university (u), from the university to the airport (a), and from the airport to Mary's office (o). Moreover, John believes that his friend was talking about the road from the university to the airport with a probability between 0.8 and 0.9 (although he is not completely sure about it). The above and some other probabilistic knowledge is expressed by the following PLP P:
John wants to know the probability of him running into a traffic jam on the way to Mary's office; its complement, the probability of reaching o from h without a jam, is asked by the query Q_0 = ?(re(h, o)|⊤).
In [9], Q_0 can be answered by … (re(h, o)|⊤). However, the actual probability of (re(h, o)|⊤) may still differ from 0.93. John wants to know whether he can reach Mary's office from his home with a probability of running into a traffic jam smaller than 0.10. This can be expressed by the probabilistic query Q_1 = ?(re(h, o)|⊤)[0.90, 1]. John also wonders whether the probability of running into a traffic jam is smaller than 0.10 if his friend was really talking about the road from the university to the airport. This can be expressed by the probabilistic query Q_2 = ?(re(h, o)|ad(u, a))[0.90, 1].
In [9], under traditional probabilistic logic programming both Q_1 and Q_2 receive the answer "No"; under the maximum entropy principle, Q_1 receives the answer "No" and Q_2 the answer "Yes". For Q_1, John will accept the answer "No"; for Q_2, however, John may be confused and not know which answer to trust. Using our method, we calculate the degrees of satisfaction of these two queries. For Q_1, SAT^KL_P(Q_1) = 0, which means that the bound [0.9, 1] does not contain the probability given by the maximum entropy principle, and thus John has no confidence that he can reach Mary's office on time. For Q_2, SAT^KL_P(Q_2) = 0.724; this relatively high value can help John decide whether he should set off to pick up Mary.
Using our method, John gets an estimate of the probability that he can reach Mary's office from his home without running into a traffic jam. If it is a special day for him and Mary, he wants the estimate to be more accurate; otherwise, he can tolerate a less accurate one. Formally, he needs to choose the threshold a for |=^SAT≥a_maxLow. For example, for Q_2, he may set a = 0.6 for an ordinary day and a = 0.75 for an important day. Therefore, he can infer that P |= SAT≥0. If it is an ordinary day and the lowest probability is greater than 0.90, he can set off. On an important day, he will need to investigate the traffic further (to decrease the ignorance of (re(h, o)|ad(u, a))) or revise his plan, since 0.897 < 0.9.
On the other hand, we also analyze the usefulness of his friend's advice. Analyzing his friend's knowledge, we have IG^KL_P(re(h, o)|ad(u, a)) = 0.0184. This means that his friend's advice is indeed useful, since this ignorance value is significantly smaller than IG^KL_P(re(h, o)|⊤). So John should call his friend to make sure that the friend really was talking about the road from the university to the airport.
The degrees of satisfaction for various intervals are given in Table 4. From the table, we can see that the degree of satisfaction decreases as the interval becomes tighter. This means that the second order probability that the actual probability of (ψ|φ) falls in [l, u] is getting smaller.
6 Implementation of our framework and a case study

Implementation
To efficiently return a query result given a PLP, we implemented the algorithms proposed in [9,14] for reasoning with PLPs. Using these algorithms, a PLP can be translated into a linear or nonlinear optimization problem. We implemented these algorithms in Java and solved the underlying optimization problems using a Matlab component.
In addition, we implemented the calculation of ignorance and degree of satisfaction with the algorithms given below, KL-Ignorance (Algorithm 1) and KL-Satisfaction (Algorithm 3). These two algorithms rely on the algorithms provided in [9,14] as well as on Matlab to solve the optimization problems derived from a PLP.
In terms of complexity, our algorithms call Algorithm tight_0_concequence (Fig. 2) and Algorithm tight_me_concequence (Fig. 3) once or twice. It is stated in [9] that the problem solved by Algorithm tight_0_concequence (Fig. 2) is FP^NP-complete, whereas Algorithm tight_me_concequence (Fig. 3) falls outside the range of such standard complexity analysis (where the upper complexity bound is based on the existence of a polynomial-size probabilistic interpretation that involves only rational numbers), since the me-model of a probabilistic logic program P may involve irrational numbers [9]. The same difficulties arise in analyzing the complexity of our algorithms. Furthermore, like Algorithm tight_me_concequence (Fig. 3), our algorithms rely on solving nonlinear optimization problems subject to linear constraints. Therefore, the complexity of our algorithms is at the same level as that of Algorithm tight_me_concequence (Fig. 3), which is intractable [9].

A case study on breast cancer clinical trials
Clinical trials usually provide a huge amount of statistical data. From these data, we can compare the efficacy of drugs or therapies for different groups of patients. To make use of these data, we need to represent the statistical knowledge formally and to provide analysis tools that use such knowledge to answer queries about individuals (possibly together with some facts about those individuals). For this purpose, we use PLP as the formal representation language. As discussed in Section 1, PLP is chosen because of its expressive power for imprecise probabilistic knowledge and its reasoning efficiency. Also, statistical data can be regarded as probabilistic data justified by the law of large numbers, so using PLP to model statistical data drawn from trials is theoretically valid.

Observation vs. a priori facts
In PLPs, we use ground formulas to state a priori facts from statistics, i.e., something that must be true (statistically) is regarded as a fact. These facts are treated differently from observations about individuals. Observing an event (such as the result of a particular test) does not imply that the event was bound to happen. So observations cannot be represented as formulas of the form (ψ(a)|⊤)[1, 1] in a PLP; doing so would imply that we know ψ(a) to be true even before it is observed. In other words, taking ψ(a) as a probabilistic event, we cannot predict whether ψ(a) is true or false before we observe it. In our framework, all observations are stored in a separate database (named OBS) rather than in the PLP containing statistical knowledge. When (ψ|φ)[l, u] is queried on a PLP P, the observation database OBS is automatically consulted, so querying (ψ|φ)[l, u] is equivalent to querying (ψ|φ ∧ OBS)[l, u] on P.
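The OBS mechanism amounts to a query rewrite. The sketch below is a minimal illustration, assuming a hypothetical Query record whose condition φ is stored as a tuple of ground conjuncts; it conjoins the stored observations to φ and leaves the PLP untouched, rather than adding (ψ(a)|⊤)[1, 1] facts.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Query:
    """Hypothetical representation of a query ?(psi | phi)[lower, upper].

    The condition phi is a tuple of ground conjuncts (atoms as strings).
    """
    consequent: str
    antecedent: Tuple[str, ...]
    lower: float
    upper: float


def with_observations(query: Query, obs: Iterable[str]) -> Query:
    """Rewrite (psi | phi)[l, u] into (psi | phi AND OBS)[l, u].

    Observations are only conjoined to the condition, never added to the
    PLP. Conjuncts already present in phi are not repeated; order is kept.
    """
    conjuncts = list(query.antecedent)
    for atom in obs:
        if atom not in conjuncts:
            conjuncts.append(atom)
    return Query(query.consequent, tuple(conjuncts), query.lower, query.upper)
```

Keeping the rewrite outside the PLP means the same statistical program can be reused for every patient, with only the OBS store changing between queries.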

Background knowledge
From a clinical trial, only statistical data are explicitly provided. This knowledge alone is not sufficient for reasoning; some background knowledge is also necessary [9]. In order to process data in a trial, the background knowledge needed can be categorized into three groups:
- additional statistical knowledge (beyond the trial data), which is typically represented explicitly by a table, such as statistics about the death rate in a particular age group;
- meta-knowledge for a trial, such as the principle for choosing the participants, which is represented explicitly or implicitly in a trial report;
- background knowledge related to the trial, which may be omitted in a trial report and is shared by many trials, such as the age distribution, the natural death rate, an a priori estimate for a disease, etc.

Analysis of trials data for breast cancer
In this section, we model and query the meta-analysis results of early breast cancer trials. This meta-analysis of original individual trials aims to examine the effects of various treatments for early breast cancer. Here we consider the mortality of patients who have had radiotherapy after breast conserving surgery. The knowledge (or data) in Webfigure 6a can be formally represented by a PLP P below with 13 rules. A question such as "what is the mortality of a 50-year-old patient who has had radiotherapy after breast conserving surgery and whose ER value is positive?" can be formalized as Q = ?(mort(name, Y10) | bcsRT(name, Y1) ∧ hasBC(name, Y1) ∧ er(name, Y1, positive) ∧ age(name, 50s) ∧ tenYear(Y1, Y10)), where name should be replaced by the name of the individual whose 10-year mortality is of interest. In this PLP, there are four constants related to the attribute Age: yt50 and ot70, standing for younger than 50 and older than 70, respectively, and 50s and 60s, standing for age in the 50s and age in the 60s, respectively. Rules 2-5 come from the meta-knowledge of the trials, which states the methods for dividing subgroups by ER value. bcsRT(X, Y) means that patient X has RT treatment after breast conserving surgery in year Y. bcsOnly(X, Y) means that X has breast conserving surgery alone in year Y. er(X, Y, Z) states that the test result of ER status for X in year Y is Z. hasBC(X, Y) means that X has breast cancer in year Y. mort(X, Y) means that X died of breast cancer in year Y. tenYear(Y, Z) states that year Z is ten years after year Y.
Rule 1 also comes from the background knowledge; it says that a person cannot have two different ages (or age groups). The condition Y ≠ Z in Rule 1 can be instantiated by any two values Y and Z that cannot hold simultaneously. Rule 1 is in fact equivalent to a set of rules such as (age(X, yt50) ∧ age(X, 60s)|⊤)[0, 0], obtained by setting Y = yt50 and Z = 60s. The remaining rules come directly from the statistical data listed in Table 5, which is a sub-table of Webfigure 6a. More precisely, Rules 6-12 correspond to the second column in the table.
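Rule 1 can be grounded mechanically. Assuming Rule 1 has the form (age(X, Y) ∧ age(X, Z)|⊤)[0, 0] with Y ≠ Z, as its instance above suggests, the hypothetical helper below enumerates one ground formula per unordered pair of the four Age constants. Since ∧ is commutative, ordered pairs would only add duplicates, so each individual yields 4·3/2 = 6 formulas.

```python
from itertools import combinations

# The four Age constants of the PLP.
AGE_GROUPS = ["yt50", "50s", "60s", "ot70"]


def ground_rule1(person, ages=AGE_GROUPS):
    """Expand (age(X, Y) AND age(X, Z) | T)[0, 0], Y != Z, for X = person.

    One ground formula is produced per unordered pair of distinct age
    groups, because the conjunction is symmetric in Y and Z.
    """
    return [
        f"(age({person}, {y}) ∧ age({person}, {z})|⊤)[0, 0]"
        for y, z in combinations(ages, 2)
    ]
```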
Assume that we have a patient named Mary who is 50 years old and has been diagnosed with breast cancer with a positive ER test result. A doctor decides to give her radiotherapy (RT) after breast conserving surgery (BCS). From the statistical data, we know the 10-year mortality of breast cancer patients in their 50s who have had RT after BCS, and the 10-year mortality of breast cancer patients with a positive ER value. So what can we tell about Mary's 10-year mortality after BCS and RT, given that her ER value is positive? Formally, this is to answer the query Q = ?(mort(Mary, Y10) | bcsRT(Mary, Y1) ∧ hasBC(Mary, Y1) ∧ er(Mary, Y1, positive) ∧ age(Mary, 50s) ∧ tenYear(Y1, Y10)).
Let us denote the conditional event in query Q by E to simplify the notation (i.e., Q = ?E). With the given PLP, we have P |=_tight E[0, 1] and P |=^me_tight E[0.1456, 0.1456]. That is, we get a non-informative interval [0, 1] and a precise probability 0.1456 as two possible answers to this query.
Note that, from the statistics, the 10-year mortality is 15.0% in the subgroup of patients in their 50s with BCS+RT and 16.9% in the subgroup of ER-positive patients with BCS+RT. The probability of query Q given by maximum entropy (14.56%) is lower than both 15.0% and 16.9%. This value seems reasonable, since both the subgroup aged 50-59 and the ER-positive subgroup have a lower 10-year mortality rate than other subgroups, so both factors together could further reduce the mortality rate. This value is also backed up by the ignorance value of E under P, which is 0.017; this small value implies that the knowledge in P is rich enough to answer Q with a single probability. In other words, the probability 14.56% given by maximum entropy is reasonable. On the other hand, the ignorance is greater than 0, so there is still a small chance that the probability 14.56% is wrong. In this case, we want to find out the interval in which the true probability could lie and how satisfied we are with this interval. To do so, we need to measure the degree of satisfaction, which is the second-order probability that the probability of a query lies in a given interval. The second-order probabilities for Pr(Q) ∈ [l, u], for different values of l and u, are listed in Table 6. In this table, we take the maximum entropy probability p_me as the midpoint and create intervals [l, u] of various sizes, with l = p_me − δ and u = p_me + δ, where δ = kε, ε is the base step for increasing/decreasing, and k indicates how many multiples of ε are subtracted from or added to p_me. In this case, we set ε = 0.005 and create the first interval [0.1406, 0.1506], which contains p_me = 0.1456.
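As a worked check of this construction, read it as l = p_me − kε and u = p_me + kε with base step ε = 0.005 and p_me = 0.1456. Then k = 1 gives [0.1406, 0.1506], k = 2 gives [0.1356, 0.1556], and the entry [0, 0.3056] cited for Table 6 corresponds to k = 32 (kε = 0.16), with the lower end clipped at 0 because 0.1456 − 0.16 < 0. The clipping rule is our inference from that entry, and the function name is hypothetical.

```python
def symmetric_intervals(p_me, eps=0.005, ks=(1, 2, 32)):
    """Intervals [p_me - k*eps, p_me + k*eps] for each k in ks.

    Ends are clipped to [0, 1] (assumed, since probabilities cannot leave
    that range) and rounded to four decimals, as printed in Table 6.
    Returns a list of (k, lower, upper) tuples.
    """
    out = []
    for k in ks:
        delta = k * eps
        lo = max(0.0, p_me - delta)
        hi = min(1.0, p_me + delta)
        out.append((k, round(lo, 4), round(hi, 4)))
    return out
```

Calling symmetric_intervals(0.1456) reproduces the three intervals quoted in the text, [0.1406, 0.1506], [0.1356, 0.1556] and [0, 0.3056].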
If a doctor wants to follow the treatment plan BCS+RT given that Mary is in her 50s and her ER value is positive, the doctor could look up Table 6 to see how reasonable this treatment plan is. For instance, we have p(Pr(Q) ∈ [0.1356, 0.1556]) = 0.00085, which can be read as: the probability that Mary's probability of dying within 10 years after BCS+RT lies between 0.1356 and 0.1556 is 0.00085. In other words, it is very unlikely that Mary's 10-year mortality is between 13.56% and 15.56%. If the table contains an entry with a small value of u (the 10-year mortality does not exceed u) and a reasonably large degree of satisfaction, then the doctor could decide that this plan is worth following. On the other hand, if no entry in the table shows a high probability of a low 10-year mortality for Mary, then this treatment plan is questionable. In this particular case, we have an entry ([0, 0.3056], 0.42478) which

Related work
Since the early 1990s, there have been considerable research efforts on integrating logic programming with probability theory. These probabilistic logic programs have been studied from different perspectives and have different syntactic forms and semantics, including conditional probabilistic logic programming [4,12,14], causal probabilistic logic programming [1,2,22], success probabilistic logic programming [8,20], and some others [5].
In causal probabilistic logic programming [1,2], a rule Pr(ψ |_c φ) = y is interpreted as "if φ happens, this fact causes the probability of ψ to be y". A causal probability statement implicitly represents a set of conditional independence assumptions: given its cause φ, an effect ψ is probabilistically independent of all factors except the (direct or indirect) effects of φ (see [1] for details). Formally, if Pr(ψ |_c φ_1) = y_1 ∈ P and Pr(ψ |_c φ_2) = y_2 ∈ P where y_1 ≠ y_2, then no possible world satisfies both φ_1 and φ_2.
In [8,20], a real number attached to a rule represents the probability that this rule is applicable (or satisfiable). In other words, a PLP in this view represents a set of (classical) logic programs, and the probability of each member is determined by the probabilities of all the rules. For any query, the answer is then the probability of choosing, from this set, a classical logic program that can successfully infer the query. In this formalization, we can only query the probability of ψ and cannot query the probability of (ψ|φ), since (ψ|φ) is meaningless in classical logic programs.
In [3,5,17,18,22] In this paper, we have focused on the framework of conditional probabilistic logic programming for representing conditional events, because this framework is more suitable for modelling our applications, such as clinical trials information or dialog knowledge.
Because of the weak reasoning power of its basic semantics, PLP does not let subclasses inherit the properties of their superclasses. In [13,15,16], Lukasiewicz provided another method to enhance reasoning power, mainly concerning inheritance. In this setting, a logical entailment strength λ is introduced. With strength 1, subclasses completely inherit the attributes of their superclass; with strength 0, subclasses cannot inherit the attributes of their superclass; with a strength between 0 and 1, subclasses partially inherit the attributes of their superclass. The strength value appears similar to the degree of satisfaction in our framework, but the two are totally different. First, λ is not a measurement for a query but is given by a user to control the reasoning procedure; in other words, we cannot know the strength beforehand in order to infer a conclusion. Second, even if we could use the strength as a measurement, i.e., even if we could obtain the strength required to infer an expected conclusion, it would not be an instance of degree of satisfaction, because the cautious monotonicity postulate in Definition 4 is not satisfied. Given a PLP P, assume that we can infer both (ψ|φ)[l_1, u_1] with strength λ = λ_1 and (ψ|φ)[l_2, u_2] with strength λ = λ_2. Now assume that (ψ|φ)[l_1, u_1] is added to P; in order to infer (ψ|φ)[l_2, u_2], we still need the strength λ = λ_2 to be given. That is, adding information to P does not remove the need for strength λ_2 if (ψ|φ)[l_2, u_2] is to be inferred. In contrast, if (ψ|φ)[l_1, u_1] is added to the PLP, then the degree of satisfaction of (ψ|φ)[l_2, u_2] will increase. Consider Example 11: with strength λ = 0.5, we can infer (fly(t)|magpie(t))[0.8, 1] and (fly(t)|sickMagpie(t))[0.8, 1]. However, these two conclusions have different degrees of satisfaction.
In [19,21], the authors provided a second-order uncertainty to measure the reliability of accepting the precise probability obtained by the maximum entropy principle as the answer to a query in propositional probabilistic logic. The second-order uncertainty for (ψ|φ) and PLP P is defined as (− log l − log u), where P |=_tight (ψ|φ)[l, u]. Similarly, we provided an ignorance function to measure the usefulness of a PLP for answering a query. If a precise probability for a query is inferred from a PLP P, then P contains full information about the query, and therefore accepting that probability is totally reliable. More precisely, their second-order uncertainty is computed directly from the probability interval of the query inferred from P. In contrast, our ignorance is computed from the PLP itself, which provides more information than an interval. Therefore, our measure of ignorance reflects the knowledge in a PLP more accurately. Consider Example 11 again: the second-order uncertainties of (fly(t)|magpie(t)) and (fly(t)|sickMagpie(t)) are the same, whereas the degrees of satisfaction for the two queries differ.
In [3], the authors defined a higher-order probability distribution over the probabilities with which a query can be inferred from a given probabilistic logic program. Informally, the higher-order probability that a query Q is entailed by the program with a probability in [a, b] is interpreted as the ratio between the number of probabilistic models of the program that give Q a probability value within [a, b] and the total number of its probabilistic models. As stated in that paper, no assumptions are made about the dependencies or correlations between the events represented in the probabilistic logic program, including the maximum entropy principle; that is, their method is based on the assumption of ignorance. In contrast, our method tries to balance the assumption of the maximum entropy principle against the ignorance of the knowledge contained in a PLP. Therefore, our approach can be seen as a step forward towards addressing the problem of ignorance.
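For a small propositional program, the counting idea of [3] can be approximated by Monte Carlo sampling. The sketch below is our own illustration, neither the algorithm of [3] nor the KL-based degree of satisfaction of this paper. It assumes that "number of models" is read as the uniform measure on the probability simplex over possible worlds: it draws distributions from that measure, keeps those satisfying every conditional constraint (ψ|φ)[l, u] (a distribution with Pr(φ) = 0 satisfies any such constraint), and returns the share of kept distributions that give the query a probability in [a, b]. All names are hypothetical.

```python
import itertools
import random


def worlds(atoms):
    """All possible worlds over the atoms, each a dict atom -> bool."""
    return [dict(zip(atoms, bits))
            for bits in itertools.product([False, True], repeat=len(atoms))]


def cond_prob(dist, ws, psi, phi):
    """Pr(psi | phi) under dist, or None when Pr(phi) = 0."""
    p_phi = sum(p for p, w in zip(dist, ws) if phi(w))
    if p_phi == 0.0:
        return None
    p_both = sum(p for p, w in zip(dist, ws) if phi(w) and psi(w))
    return p_both / p_phi


def satisfies(dist, ws, constraint):
    """Pr |= (psi|phi)[l, u] iff Pr(phi) = 0 or l <= Pr(psi|phi) <= u."""
    psi, phi, l, u = constraint
    c = cond_prob(dist, ws, psi, phi)
    return c is None or l <= c <= u


def higher_order_prob(atoms, constraints, query, a, b, samples=20000, seed=0):
    """Share of sampled models of the constraints giving query a value in [a, b].

    query is a pair (psi, phi) of Boolean tests on worlds. Returns None if
    no sampled distribution satisfies all constraints.
    """
    rng = random.Random(seed)
    ws = worlds(atoms)
    psi, phi = query
    accepted = hits = 0
    for _ in range(samples):
        # Normalised i.i.d. Exp(1) draws are uniform on the probability simplex.
        raw = [rng.expovariate(1.0) for _ in ws]
        total = sum(raw)
        dist = [x / total for x in raw]
        if not all(satisfies(dist, ws, c) for c in constraints):
            continue  # rejection step: not a model of the program
        accepted += 1
        value = cond_prob(dist, ws, psi, phi)
        if value is not None and a <= value <= b:
            hits += 1
    return hits / accepted if accepted else None
```

For instance, with atoms bird and fly and the single constraint (fly|bird)[0.8, 1], every accepted sample gives (fly|bird) a value in [0.8, 1], so the estimate for that interval is exactly 1.0; with no constraints, the estimate for (fly|bird) ∈ [0, 0.5] is close to 0.5, since Pr(fly|bird) is then uniformly distributed. Rejection sampling becomes slow when the constraints are tight or the number of atoms grows.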

Conclusion
Being able to answer a query accurately is critical in many intelligent systems. When the underlying knowledge is uncertain, e.g., probabilistic, this problem becomes more evident. So how probable is an answer to a given query when the knowledge used is itself uncertain? One way is to attach an interval to the answer, indicating that the probability of the answer lies in the interval; another is to generate a single precise probability. The maximum entropy principle is widely used for the latter purpose.
Although the maximum entropy principle is intuitive and widely accepted in information theory, it is too risky to simply apply it to answer a query when the knowledge used is not certain. In order to tell how much we can trust a result for a query given a PLP with imprecise knowledge, we proposed a framework to measure both ignorance and the degree of satisfaction of an answer to a query under a given PLP. Using the consequence relations provided in this paper, we can get an informative and reliable interval as the answer for a query or alternatively we know how much we can trust a single probability. The proofs that our framework is an extension of both traditional conditional probabilistic logic programming and the maximum entropy principle (in terms of consequence relations) show that our framework is theoretically sound.
We demonstrated our framework with some examples from the literature and from other research projects we are involved in. The results show that providing degrees of satisfaction and ignorance are useful for making decisions, when there seem to be several choices and a system does not have other information to suggest which result to choose.

Appendix: Algorithms
In this section, we provide a brief description of the algorithms proposed in [9] which are used in our algorithms. We consider only ground PLPs here.
The idea of these algorithms is to generate equivalence classes of possible worlds, where the possible worlds in each equivalence class are indistinguishable under the knowledge contained in the PLP.
Define S_C(E) as the set of all possible worlds that satisfy C; this set can be partitioned into subsets w.r.t. R_C(E), such that S_C(E) = {S_r | r ∈ R_C(E)}, where S_r = {I ∈ I | I |= C, ∀(ψ|φ) ∈ E: I |= r(ψ|φ)}.
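For a handful of atoms, this partition can be sketched by brute-force enumeration. The sketch assumes, following our reading of [9], that each r ∈ R_C(E) maps every conditional event (ψ|φ) ∈ E to one of ψ ∧ φ, ¬ψ ∧ φ or ¬φ; C is given as a list of Boolean tests on worlds, and the function and label names are hypothetical. Enumerating all 2^n worlds is only feasible for small n and is not meant to reflect how the algorithms of [9] are implemented.

```python
import itertools
from collections import defaultdict


def partition_worlds(atoms, hard, events):
    """Group the worlds satisfying every test in `hard` (the set C) by r.

    events: list of (psi, phi) pairs, each a Boolean test on worlds.
    For each world, r records, per conditional event (psi|phi), which of
    psi AND phi, (NOT psi) AND phi, NOT phi holds. Returns r -> S_r.
    """
    blocks = defaultdict(list)
    for bits in itertools.product([False, True], repeat=len(atoms)):
        w = dict(zip(atoms, bits))
        if not all(c(w) for c in hard):
            continue  # world violates C
        r = tuple(
            "not_phi" if not phi(w)
            else ("psi_and_phi" if psi(w) else "not_psi_and_phi")
            for psi, phi in events
        )
        blocks[r].append(w)
    return dict(blocks)
```

With atoms penguin, bird and fly, the constraint penguin → bird, and E = {(fly|bird), (fly|penguin)}, six of the eight worlds satisfy C and they fall into five blocks; only the two worlds in which neither bird nor penguin holds share a block, since the value of fly does not matter to either event there.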
An important result in [9] is that, reasoning with a PLP P can be reduced to calculating a probability distribution over a set S C (E).
Let P = (C, D) be a PLP, and let α be an event. At(α) denotes the set of all atoms that occur in α. Denote by HB_{P,α} the Herbrand base defined on the set of predicates and constants that occur in P and α. The decomposition of HB_{P,α} w.r.t. P and α is a partition {H_1, . . . , H_k} of HB_{P,α} such that
- each probabilistic formula in P is defined over some H_i with i ∈ {1, . . . , k}, and α is defined over some H_i with i ∈ {1, . . . , k}, and
- k ≥ 1 is maximal.
For i ∈ {1, . . . , k}, we define D_i as the set of all probabilistic formulas from D that are defined over H_i. The relevant subset of D w.r.t. C and α, denoted by rel_{C,α}(D), is defined as the set D_i with minimal index i ∈ {1, . . . , k} such that At(α) ⊆ H_i.
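The decomposition can be computed as the connected components of atoms that co-occur in a probabilistic formula or in α: merging the atoms of each formula (and of α) with a union-find structure yields the finest partition in which every formula and α lie inside a single block, i.e., the one with maximal k. The sketch below is a minimal illustration under that reading; the function names are hypothetical, and, as in the definition above, only the probabilistic formulas of D are considered.

```python
def decompose(herbrand_base, formula_atoms, query_atoms):
    """Finest partition H_1..H_k of the Herbrand base such that the atoms of
    every probabilistic formula, and At(alpha), each lie inside one block.

    formula_atoms: list of atom sets, one per probabilistic formula in D.
    query_atoms:   At(alpha).
    """
    parent = {a: a for a in herbrand_base}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    def union_all(atoms):
        atoms = list(atoms)
        for other in atoms[1:]:
            ra, rb = find(atoms[0]), find(other)
            if ra != rb:
                parent[rb] = ra

    for atoms in formula_atoms:
        union_all(atoms)
    union_all(query_atoms)

    blocks = {}
    for a in parent:
        blocks.setdefault(find(a), set()).add(a)
    return list(blocks.values())


def relevant_subset(formula_atoms, query_atoms, blocks):
    """Indices of the formulas in D_i, where H_i is the first block with At(alpha) in it."""
    target = next(b for b in blocks if set(query_atoms) <= b)
    return [i for i, atoms in enumerate(formula_atoms) if set(atoms) <= target]
```

For example, with Herbrand base {bird, fly, penguin, red, rain}, formulas over {fly, bird}, {bird, penguin} and {rain}, and α = (fly|penguin), the blocks are {bird, fly, penguin}, {red} and {rain}, and only the first two formulas are relevant to α.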