Efficient inference and learning in a large knowledge base

Wang, William Yang; Mazaitis, Kathryn; Lao, Ni; Cohen, William W.

doi:10.1007/s10994-015-5488-x


Reasoning with extracted information using a locally groundable first-order probabilistic logic

Published: 04 April 2015

Machine Learning, volume 100, pages 101–126 (2015)



Abstract

One important challenge for probabilistic logics is reasoning with very large knowledge bases (KBs) of imperfect information, such as those produced by modern web-scale information extraction systems. One scalability problem shared by many probabilistic logics is that answering queries involves “grounding” the query—i.e., mapping it to a propositional representation—and the size of a “grounding” grows with database size. To address this bottleneck, we present a first-order probabilistic language called ProPPR in which approximate “local groundings” can be constructed in time independent of database size. Technically, ProPPR is an extension to stochastic logic programs that is biased towards short derivations; it is also closely related to an earlier relational learning algorithm called the path ranking algorithm. We show that the problem of constructing proofs for this logic is related to computation of personalized PageRank on a linearized version of the proof space, and based on this connection, we develop a provably-correct approximate grounding scheme, based on the PageRank–Nibble algorithm. Building on this, we develop a fast and easily-parallelized weight-learning algorithm for ProPPR. In our experiments, we show that learning for ProPPR is orders of magnitude faster than learning for Markov logic networks; that allowing mutual recursion (joint learning) in KB inference leads to improvements in performance; and that ProPPR can learn weights for a mutually recursive program with hundreds of clauses defining scores of interrelated predicates over a KB containing one million entities.


1 Introduction

Probabilistic logics are useful for many important tasks (Lowd and Domingos 2007; Fuhr 1995; Poon and Domingos 2007, 2008); in particular, such logics would seem to be well-suited for inference with the “noisy” facts that are extracted by automated systems from unstructured web data. While some positive results have been obtained for this problem (Cohen 2000), most probabilistic first-order logics are not efficient enough to be used for inference on the very large broad-coverage KBs that modern information extraction systems produce (Suchanek et al. 2007; Carlson et al. 2010). One key problem is that queries are typically answered by “grounding” the query (i.e., mapping it to a propositional representation and then performing propositional inference), and for many logics the size of the “grounding” can be extremely large. For instance, in probabilistic Datalog (Fuhr 1995), a query is converted to a structure called an “event expression”, which summarizes all possible proofs for the query against a database; in ProbLog (De Raedt et al. 2007) and MarkoViews (Jha and Suciu 2012), similar structures are created, encoded more compactly with binary decision diagrams (BDDs); in probabilistic similarity logic (PSL) (Brocheler et al. 2010), an intensional probabilistic program, together with a database, is converted to constraints for a convex optimization problem; and in Markov Logic Networks (MLNs) (Richardson and Domingos 2006), queries are converted to a propositional Markov network. In all of these cases, the result of this “grounding” process can be large.

As a concrete illustration of the “grounding” process, Fig. 1 shows a very simple MLN and its grounding over a universe of two web pages $a$ and $b$. Here, the grounding is query-independent. In MLNs, the result of the grounding is a Markov network which contains one node for every atom in the Herbrand base of the program; that is, the number of nodes is $O(n^k)$, where $k$ is the maximal arity of a predicate and $n$ is the number of database constants. However, even a grounding size that is only linear in the number of facts in the database, $|DB|$, would be impractically large for inference on real-world problems. Superficially, it would seem that groundings must inherently be $\Omega(|DB|)$, i.e., at least linear in the database size, for some programs: in the example, for instance, the probability of aboutSport(x) must depend to some extent on the entire hyperlink graph, if it is fully connected. However, it also seems intuitive that if we are interested in inferring information about a specific page (say, the probability of aboutSport(d1)), then the parts of the network only distantly connected to d1 are likely to have a small influence. This suggests that an approximate grounding strategy might be feasible, in which a query such as aboutSport(d1) would be grounded by constructing a small subgraph of the full network, followed by inference on this small “locally grounded” subgraph. As another example, consider learning from a set of queries $Q$ with their desired truth values. Learning might proceed by locally grounding every query goal, allowing learning to also take less than $O(|DB|)$ time.
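To make the $O(n^k)$ growth concrete, the following minimal sketch enumerates a Herbrand base for a small schema. The schema is an assumption chosen for illustration (a unary `aboutSport` and a hypothetical binary `links` predicate); it is not taken from Fig. 1.

```python
from itertools import product

def herbrand_base(predicates, constants):
    """Enumerate every ground atom over the given predicates and constants.

    predicates: dict mapping predicate name -> arity (illustrative schema).
    constants:  list of database constants.
    """
    atoms = []
    for name, arity in predicates.items():
        # One ground atom per assignment of constants to argument positions.
        for args in product(constants, repeat=arity):
            atoms.append((name, args))
    return atoms

# Hypothetical schema: one unary and one binary predicate, so k = 2.
predicates = {"aboutSport": 1, "links": 2}

for n in (2, 10, 100):
    constants = [f"page{i}" for i in range(n)]
    size = len(herbrand_base(predicates, constants))
    # For this schema the size equals n + n**2.
    print(n, size)
```

Even with only two predicates, the binary predicate makes the number of nodes grow quadratically in the number of constants, which is why a query-independent grounding quickly becomes impractical.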

In this paper, we first present a first-order probabilistic language which is well-suited to such approximate “local grounding”. We describe an extension to stochastic logic programs (SLP) (Muggleton 1996; Cussens 2001) that is biased towards short derivations, and show that this is related to personalized PageRank (PPR) (Page et al. 1998; Chakrabarti 2007) on a linearized version of the proof space. Based on the connection to PPR, we develop a provably correct approximate inference scheme, and an associated provably correct approximate grounding scheme: specifically, we show that it is possible to prove a query, or to build a graph which contains the information necessary for weight-learning, in time $O(\frac{1}{\alpha \varepsilon })$, where $\alpha $ is a reset parameter associated with the bias towards short derivations, and $\varepsilon $ is the worst-case approximation error across all intermediate stages of the proof. This means that both inference and learning can be approximated in time independent of the size of the underlying database, a surprising and important result which leads to a very scalable inference algorithm. We show that ProPPR is efficient enough for inference tasks on large, noisy KBs.
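The intuition behind a bound of the form $O(\frac{1}{\alpha \varepsilon })$ can be seen in a generic push-style approximation of personalized PageRank. The sketch below is not the paper's grounding procedure; it is a simplified, non-lazy variant in the spirit of PageRank–Nibble, and the function name, the `neighbors` interface, and the toy graph are assumptions made for illustration. Each push moves more than $\alpha \varepsilon$ units of probability mass into the estimate `p`, and the total mass is 1, so the number of pushes cannot exceed $\frac{1}{\alpha \varepsilon }$ regardless of how large the graph is.

```python
from collections import defaultdict, deque

def approx_ppr(neighbors, seed, alpha=0.2, eps=1e-4):
    """Approximate personalized PageRank from `seed` using local pushes.

    neighbors: function mapping a node to a non-empty list of out-neighbors
               (illustrative interface; the graph can be generated lazily).
    Returns (p, r): the PPR estimate and the leftover residual mass.
    """
    p = defaultdict(float)   # estimated PPR scores
    r = defaultdict(float)   # residual mass not yet pushed
    r[seed] = 1.0
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        out = neighbors(u)
        if r[u] <= eps * len(out):
            continue                      # residual too small to push
        mass = r[u]
        r[u] = 0.0
        p[u] += alpha * mass              # an alpha fraction stays at u
        share = (1.0 - alpha) * mass / len(out)
        for v in out:                     # the rest spreads along out-edges
            r[v] += share
            if r[v] > eps * len(neighbors(v)):
                queue.append(v)
    return dict(p), dict(r)

# Toy directed graph (hypothetical); every node has at least one out-edge.
graph = {"a": ["b", "c"], "b": ["a"], "c": ["a", "b"]}
p, r = approx_ppr(graph.__getitem__, "a", alpha=0.2, eps=1e-3)
print(sorted(p.items(), key=lambda kv: -kv[1]))
```

Only nodes that receive enough residual mass are ever touched, so on a large graph the procedure explores a neighborhood of the seed rather than the whole graph.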

The ability to locally ground queries has another important consequence: it is possible to decompose the problem of weight-learning into a number of moderate-size subtasks (in fact, tasks of size $O(\frac{1}{\alpha \varepsilon })$ or less) which are weakly coupled. Based on this we outline a parallelization scheme, which in our current implementation provides an order-of-magnitude speedup in learning time on a multi-processor machine.

This article extends prior work (Wang et al. 2013) in the following aspects. First, the focus of this article is on inference on a noisy KB: we describe in detail the challenges of inference on large KBs, and show how our proposed local grounding theory can be applied to improve on the state of the art in statistical relational learning. Second, we provide numerous new experiments on KB inference, including varying the size of the graph, comparing to MLNs, and varying the size of the theory. We demonstrate that the ProPPR inference algorithm can scale to handle million-entity datasets with several complex theories (non-recursive, PRA non-recursive, and PRA recursive). Third, we provide additional background on our approach, discussing in detail the connections to prior work on stochastic logic programs and path finding.

In the following sections, we first introduce the theoretical foundations and background of our formalism. We then define the semantics of ProPPR, and its core inference and learning procedures. We next focus on a large inference problem, and show how ProPPR can be used in a statistical relational learning task. We then present experimental results on inference in a large KB of facts extracted from the web (Lao et al. 2011). After this section, we describe our results on additional benchmark inference tasks. We finally discuss related work and conclude.

2 Background

In this section, we introduce the necessary background that our approach builds on: logic program inference as graph search, an approximate Personalized PageRank algorithm, and stochastic logic programs.

2.1 Logic program inference as graph search

To begin with, we show how inference in logic programs can be formulated as search over a graph. We assume some familiarity with logic programming and will use notation from Lloyd (1987). Let $LP$ be a program which contains a set of definite clauses $c_1,\ldots ,c_n$, and consider a conjunctive query $Q$ over the predicates appearing in $LP$. A traditional Prolog interpreter can be viewed as having the following actions. First, construct a “root vertex” $v_0$, which is a pair $(Q,Q)$, and add it to an otherwise-empty graph $G'_{Q,LP}$. (For brevity, we drop the subscripts of $G'$ where possible.) Then recursively add to $G'$ new vertices and edges as follows: if $u$ is a vertex of the form $(Q, (R_1,\ldots ,R_k))$, and $c$ is a clause in $LP$ of the form $R' \leftarrow S'_1,\ldots ,S'_\ell $, and $R_1$ and $R'$ have a most general unifier $\theta =mgu(R_1,R')$, then add to $G'$ a new edge $u \rightarrow v$ where $v = (Q\theta , (S'_1,\ldots ,S'_\ell ,R_2,\ldots ,R_k)\theta )$ (see Footnote 1). Let us call $Q\theta $ the transformed query and $(S'_1,\ldots ,S'_\ell ,R_2,\ldots ,R_k)\theta $ the associated subgoal list. Empty subgoal lists correspond to solutions, and if a subgoal list is empty, we will denote it by $\Box $.

The graph $G'$ is often large or infinite, so it is not constructed explicitly. Instead, Prolog performs a depth-first search on $G'$ to find the first solution vertex $v$ (i.e., a vertex with an empty subgoal list) and, if one is found, returns the transformed query from $v$ as an answer to $Q$.
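The construction of $G'$ and the depth-first search over it can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not a Prolog implementation: terms are function-free, variables are strings starting with an uppercase letter, a depth bound (`max_depth`) is added to guard against infinite branches, and the toy program at the end is hypothetical rather than the program of Table 1.

```python
import itertools

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def walk(t, theta):
    # Follow variable bindings until reaching a constant or unbound variable.
    while is_var(t) and t in theta:
        t = theta[t]
    return t

def unify(a, b):
    """Most general unifier of two function-free atoms, or None."""
    if a[0] != b[0] or len(a) != len(b):
        return None
    theta = {}
    for s, t in zip(a[1:], b[1:]):
        s, t = walk(s, theta), walk(t, theta)
        if s == t:
            continue
        if is_var(s):
            theta[s] = t
        elif is_var(t):
            theta[t] = s
        else:
            return None          # two distinct constants
    return theta

def subst(atoms, theta):
    return tuple((a[0],) + tuple(walk(t, theta) for t in a[1:]) for a in atoms)

_fresh = itertools.count()

def rename(clause):
    # Standardize apart: give the clause's variables a fresh suffix.
    n = next(_fresh)
    def r(atom):
        return (atom[0],) + tuple(f"{t}_{n}" if is_var(t) else t for t in atom[1:])
    head, body = clause
    return r(head), tuple(r(b) for b in body)

def successors(vertex, program):
    """Yield each v with an edge u -> v, where u = (Q, (R_1, ..., R_k))."""
    query, goals = vertex
    if not goals:
        return
    for clause in program:
        head, body = rename(clause)
        theta = unify(goals[0], head)
        if theta is not None:
            yield subst(query, theta), subst(body + goals[1:], theta)

def solve(query, program, max_depth=20):
    """Depth-first search from the root (Q, Q) for the first solution vertex."""
    stack = [((query, query), 0)]
    while stack:
        (q, goals), depth = stack.pop()
        if not goals:
            return q             # empty subgoal list: q is the transformed query
        if depth < max_depth:
            children = list(successors((q, goals), program))
            # Reverse so clauses are tried in program order.
            stack.extend((c, depth + 1) for c in reversed(children))
    return None

# Hypothetical toy program: each clause is (head, body); facts have empty bodies.
program = [
    (("about", "X", "Z"), (("labeled", "X", "Z"),)),
    (("about", "X", "Z"), (("links", "X", "Y"), ("about", "Y", "Z"))),
    (("links", "a", "b"), ()),
    (("labeled", "b", "fashion"), ()),
]
print(solve((("about", "a", "Z"),), program))
```

Because `successors` is a generator over a single vertex, the graph is only materialized along the branches that the search actually visits, which mirrors the point that $G'$ is never constructed explicitly.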

Table 1 and Fig. 2 show a simple Prolog program and a proof graph for it. (Ignore for now the annotation after the hash marks, and edge labels on the graphs, which will be introduced below.) For conciseness, in Fig. 2 only the subgoals $R_1,\ldots ,R_k$ are shown in each node $u=(Q,(R_1,\ldots ,R_k))$. Given the query $Q=\textit{about(a,Z)}$, Prolog’s depth-first search would return $Q=\textit{about(a,fashion)}$. Note that in this proof formulation, the nodes are conjunctions of literals, and the structure is, in general, a directed graph, rather than a tree. Also note that the proof is encoded as a graph, not a hypergraph, even if the predicates in the LP are not binary: the edges represent a step in the proof that reduces one conjunction to another, not a binary relation between entities.

Table 1 A simple program in ProPPR. See text for explanation
