Skip to main content

Soft quantification in statistical relational learning


We present a new statistical relational learning (SRL) framework that supports reasoning with soft quantifiers, such as “most” and “a few.” We define the syntax and the semantics of this language, which we call \(\hbox {PSL}^Q\), and present a most probable explanation inference algorithm for it. To the best of our knowledge, \(\hbox {PSL}^Q\) is the first SRL framework that combines soft quantifiers with first-order logic rules for modelling uncertain relational data. Our experimental results for two real-world applications, link prediction in social trust networks and user profiling in social networks, demonstrate that the use of soft quantifiers not only allows for a natural and intuitive formulation of domain knowledge, but also improves inference accuracy.


Statistical relational learning (SRL) has become a popular paradigm for knowledge representation and inference in application domains with uncertain data that is of a complex, relational nature. A variety of different SRL frameworks has been developed over the last decade, based on ideas from probabilistic graphical models, first-order logic, and programming languages (see e.g., Muggleton and De Raedt 1994; Richardson and Domingos 2006; Getoor and Taskar 2007). Many of these frameworks use logical formulas to express statistical dependencies over relational data. The number of elements in the data that satisfy a formula is called its quantification. Quantification in first-order logic, and by extension SRL, is traditionally either existential (\(\exists \)) or universal (\(\forall \)). However, there are many modeling scenarios in which softer quantifiers, such as most and a few, are more appropriate.

For example, in models for social networks it is common to include the knowledge that the behaviour, beliefs, and preferences of friends all influence each other. How this information can be incorporated depends on the expressivity of the model. In a traditional probabilistic model, a dependency might be included for each pair of friends (corresponding to a universally quantified rule), each expressing the knowledge that it is more probable that two friends share a trait in common. An often cited example in SRL contexts describing smoking behaviour among friends is:

\(\forall X \forall Y Friends(X,Y) \rightarrow (Smokes(X)\leftrightarrow Smokes(Y))\) (Richardson and Domingos 2006). This formula states that if two people are friends, then either both of them smoke or neither of them. In this case, the probability that a person smokes scales smoothly with the number of friends that smoke. However, many traits of interest might not behave this way, but instead exhibit “tipping points” in which having a trait only becomes more probable once most or some of one’s friends have that trait (e.g., smoking behaviour). Expressing this dependency requires a soft quantifier, which none of the existing SRL frameworks allow.

What sets soft quantifiers apart from universal and existential quantification is that expressions that contain them are often true to a certain degree, as opposed to either being true or false. Indeed, the degree to which a statement such as “most of Bob’s friends smoke” is true, increases with the percentage of smokers among Bob’s friends. This increase is not necessarily linear; in fact, a common approach to compute the truth degree of soft quantified expressions is to map percentages to the scale [0, 1] using non-decreasing piecewise linear functions (Zadeh 1983). Previous SRL work (e.g., Milch et al. 2008; Jain et al. 2010; Poole et al. 2012) has considered hard quantifiers with thresholds such as at least k. Soft quantifiers, on the other hand, do not impose such hard thresholds but allow smooth, gradual transitions from falsehood to truth.

Furthermore, the dependence of predicted probabilities on population size in relational models such as Markov logic networks (MLNs) and relational logistic regression is addressed in Poole et al. (2014) and Kazemi et al. (2014). Soft quantifiers not only provide the flexibility of modelling complex relations, but their semantics also do not depend on the absolute population size. Hence soft quantifiers allow us to learn a model for some population size and apply the same model to another population size without the need for changes in the model, e.g., without introducing auxiliary variables to control whether the population size grows.

Many SRL applications could benefit from the availability of soft quantifiers. Collective document classification, for instance, relies on rules such as \(\forall D \forall E \forall C (Cites(D,E) \wedge Class(D,C) \rightarrow Class(E,C))\) which expresses that if documents D and E are linked (e.g., by citation), and D belongs to class C, then E belongs to C (Bach et al. 2013). Soft quantifiers can express that a document should be assigned a class if most of its citing documents have that class, instead of one citing document. Similarly, in collaborative filtering, one can rely on the preferred products of a user to infer the behaviour of a similar user, i.e., \(\forall U_1 \forall U_2 \forall J (Likes(U_1,J) \wedge Similar(U_1,U_2) \rightarrow Likes(U_2,J))\) (Bach et al. 2013). Using soft quantifiers would allow to infer preferences of a user based on most of the behaviours of a similar user, or by comparing one user with most of the users similar to him.

In this paper we present the first SRL framework that combines soft quantifiers with first-order logic rules for modelling uncertain relational data. A brief overview of our framework is presented in Farnadi et al. (2014). We start from probabilistic soft logic (PSL) (Bach et al. 2015), an existing SRL framework that defines templates for hinge-loss Markov random fields (Bach et al. 2013), and extend it to a new framework which we call \(\hbox {PSL}^Q\). As is common in SRL frameworks, in PSL a model is defined by a set of logical rules using a finite set of atoms. However, unlike other SRL frameworks whose atoms are Boolean, atoms in PSL can take continuous values in the interval [0, 1]. Intuitively, value 0 means false and value 1 means true, while any value \(v\in [0,1]\) represents a partial degree of truth.

Our approach differs from existing research on quantifiers for logical reasoning in various ways. Studies on quantifiers in probabilistic logic settings deal with Boolean atoms (Lowd and Domingos 2007; Beltagy and Erk 2015; Van den Broeck et al. 2013), while in this paper atoms take on continuous values. The literature on fuzzy logic contains a fair amount of work on reasoning with continuous values (e.g., Prade et al. 2003; Cao et al. 2002), including the use of soft quantifiers (Bobillo and Straccia 2008), yet, to the best of our knowledge, there is no prior work on such soft quantifiers in the SRL community.

An early version of this paper appeared in Farnadi et al. (2015). In addition to the inclusion of proofs in Sect. 4, this paper extends this earlier work with more extensive evaluation, including a new application presented in Sect. 6. After recalling the preliminaries of PSL in Sect. 2, in Sect. 3 we introduce \(\hbox {PSL}^Q\), a new SRL framework that supports reasoning with soft quantifiers, such as “most” and “a few.” Because this expressivity pushes beyond the capabilities of PSL, in Sect. 4 we introduce new inference and weight learning algorithms for \(\hbox {PSL}^Q\). Finally, as a proof of concept, we present two \(\hbox {PSL}^Q\) models, one for predicting trust in social networks and another one for user profiling in social networks. We show that our \(\hbox {PSL}^Q\) model more accurately predicts trust in social networks than the current state-of-the-art approach in Sect. 5. Similarly, our \(\hbox {PSL}^Q\) model significantly outperforms its sibling PSL model in inferring age and gender in social networks in Sect. 6.

PSL: probabilistic soft logic

In this section, we review the syntax and semantics of probabilistic soft logic (PSL), a probabilistic programming language with a first-order logical syntax. PSL is a probabilistic programming language for defining hinge-loss Markov random fields (Bach et al. 2015) that has been used in various domains, including bioinformatics (Fakhraei et al. 2014; Sridhar et al. 2016), knowledge graph identification (Pujara et al. 2013), recommender systems (Kouki et al. 2015), natural language processing (Beltagy et al. 2014; Deng and Wiebe 2015; Ebrahimi et al. 2016), information extraction (Liu et al. 2016), information retrieval (Alshukaili et al. 2016), and social network analysis (Huang et al. 2013; West et al. 2014), among many others.

Multiple interpretations of PSL semantics are possible. In this paper, we use the Łukasiewicz logic (Klir and Yuan 1995) interpretation, because soft degrees of truth naturally complement soft quantifiers. We also focus on the subset of PSL syntax relevant to our approach. See Bach et al. (2015) for a full explanation of PSL syntax, semantics, and possible interpretations.

We start by defining atoms.

Definition 1

An atom is an expression of the form \(p(a_1, a_2, \ldots , a_n)\) where p is a predicate symbol, and each argument \(a_1, a_2, \ldots , a_n\) is either a constant or a variable. The finite set of all possible substitutions of a variable to a constant for a particular variable \(a_i\) is called its domain \(D_{a_i}\). If all variables in \(p(a_1, a_2, \ldots , a_n)\) are substituted by some constant from their respective domain, then we call the resulting atom a ground atom. We call \(\lnot p(a_1, a_2, \ldots , a_n)\) a negated atom which is the negation of \(p(a_1, a_2, \ldots , a_n)\).

Under the Łukasiewicz logic interpretation, PSL atoms represent soft degrees of truth.

Definition 2

An interpretation I is a mapping that associates a truth value \(I(p) \in [0,1]\) to each ground atom p.

For example, \(I(Knows(Alice, Bob)) = 0.7\) indicates that Alice knows Bob to degree 0.7. We next define programs and rules.

Definition 3

A PSL program is a collection of PSL rules. A PSL rule r is an expression of the form:

$$\begin{aligned} \lambda _{r}: T_1 \wedge T_2 \wedge \ldots \wedge T_w \rightarrow H_1 \vee H_2 \vee \ldots \vee H_l \end{aligned}$$

where \(T_1, T_2, \ldots , T_w, H_1, H_2, \ldots , H_l\) are atoms or negated atoms and \(\lambda _{r} \in \mathbb {R}^{+} \cup \infty \) is the weight of the rule r. We call \(T_1 \wedge T_2 \wedge \ldots \wedge T_w\) the body of r (\(r_{body}\)), and \(H_1 \vee H_2 \vee \ldots \vee H_l\) the head of r (\(r_{head}\)). Grounding a PSL rule r means instantiating all the variables with constants from their domains.

Rules 1–9 in Table 1 and rules 1–4 in Table 7 are examples of PSL programs. Conjunction \(\wedge \) is interpreted by the Łukasiewicz t-norm, disjunction \(\vee \) by the Łukasiewicz t-conorm, and negation \(\lnot \) by the Łukasiewicz negator.

Definition 4

The Łukasiewicz t-norm (\(\tilde{\wedge }\)) and the corresponding t-conorm (\(\tilde{\vee }\)) and negator (\(\tilde{\lnot }\)) are defined as follows. For \(m, n \in [0,1]\) we have \( m \tilde{\wedge } n = \max (0 , m+n -1) \), \( m \tilde{\vee } n = \min (m+n , 1) \) and \( \tilde{\lnot } m = 1 - m \).

The \(\, \tilde{} \,\) indicates the relaxation over Boolean values. Using Definition 4, we can extend the interpretation of atoms to more complex formulas in Łukasiewicz logic.

Definition 5

Given an interpretation I, and \(p_1\) and \(p_2\) ground atoms, we have \( I(p_1 \wedge p_2) = I(p_1)\ \tilde{\wedge }\ I(p_2) \), \( I(p_1 \vee p_2) = I(p_1)\ \tilde{\vee }\ I(p_2) \) and \( I(\lnot \ p_1) = \tilde{\lnot }\ I(p_1) \).

Remark 1

In Łukasiewicz logic, the expression \(B \tilde{\rightarrow } H\) where \( \tilde{\rightarrow }\) is implication, is logically equivalent to \( \tilde{\lnot } B \tilde{\vee } H\), thus the interpretation of a grounded PSL rule r is as follows:

$$\begin{aligned} I (r) = I(r_{body} \rightarrow r_{head}) = \tilde{\lnot } I(r_{body}) \tilde{\vee } I(r_{head}) \end{aligned}$$

Example 1

Consider the grounded PSL rule \(\textit{Knows}(Alice,Bob) \rightarrow \textit{Trusts}(Alice,Bob)\), and suppose \(I(Knows(Alice,Bob)) = 0.5\) and \(I(Trusts(Alice,Bob)) = 0.4\). Then \(I(r_{body}) = 0.5\), \(I(r_{head})=0.4\) and \(I(r) = 0.9\), i.e. rule r is satisfied to degree 0.9 under interpretation I.

The probability of truth value assignments in PSL is determined by the rules’ distance to satisfaction.

Definition 6

The distance to satisfaction \(d_{r}(I)\) of a rule r under an interpretation I is defined as:

$$\begin{aligned} d_{r}(I) = \max \{0, I(r_{body})-I(r_{head})\} \end{aligned}$$

By using Remark 1, one can show that a rule r is fully satisfied, i.e. satisfied to degree 1, when the truth value of its head is at least as high as the truth value of its body. Thus, the closer the interpretation of a grounded rule is to 1, the smaller its distance to satisfaction.

Remark 2

The distance to satisfaction of a PSL rule is equivalent to its negated interpretation: \(d_{r}(I) = \tilde{\lnot } I(r)\).

Example 2

Consider the example PSL rule in Example 1. Let’s assume that we know \(I(Knows(Alice, Bob)) = 0.7\) then to satisfy the rule, \(I(Trusts(Alice, Bob)) \ge 0.7\).

If \(\textit{Trusts(Alice,Bob)}\) is true to 0.5, then the rule is satisfied to degree \(1-0.7+0.5 = 0.8\). Consider the rule r in Example 1: \(d_{r}(I) = 0.1\), which is equal to \(\tilde{\lnot } I(r) = 1 - 0.9 = 0.1\). If \(I(\textit{Trusts(Alice,Bob)}) = 0.6\), then the rule is satisfied (to degree 1) and the distance to satisfaction is 0.

A PSL program, i.e., a set of PSL rules, induces a distribution over interpretations I. Let R be the set of all grounded rules, then the probability density function is:

$$\begin{aligned} f(I)= & {} {\frac{1}{Z}} \exp \left[ -\sum _{r\in R} \lambda _{r}(d_{r}(I))^p\right] \end{aligned}$$
$$\begin{aligned} Z= & {} \int _{I} \exp \left[ -\sum _{r\in R} \lambda _{r}(d_{r}(I))^p\right] \end{aligned}$$

where \(\lambda _{r}\) is the weight of rule r, Z is a normalization constant, and \(p \in \{1,2\}\) provides a choice of two different loss functions, \(p=1\) (i.e., linear) favors interpretations that completely satisfy one rule at the expense of higher distance from satisfaction for conflicting rules, and \(p=2\) favors interpretations that satisfy all rules to some degree (i.e, quadratic). These probabilistic models are instances of hinge-loss Markov random fields (HL-MRF). For further explanation we refer to Bach et al. (2015).

Inference in PSL is performed by finding the most probable explanation (MPE) over a set of given evidence, which is equivalent to maximizing the density function in Eq. 4. For example, in a trust propagation application, given a set of trust relations between users, the goal of MPE inference is to infer the trust degree between all users. Later in this paper we discuss the inference phase in PSL in more detail.

\(\hbox {PSL}^Q\): PSL with soft quantifiers

The statement that most of Bob’s friends are smokers is expressed in \(\hbox {PSL}^Q\) with the quantifier expression Most(TFriend(TBob), Smokes(T)). The general form of such quantifier expressions is \(Q(V,F_1[V], F_2[V])\), in which Q denotes a soft quantifier, V denotes the variable over which the quantification ranges, and \(F_1[V]\) and \(F_2[V]\) are formulas containing V. These formulas can be atoms as well as negations, conjunctions or disjunctions of formulas.

Definition 7

A quantifier expression is an expression of the form

$$\begin{aligned} Q (V,F_1[V], F_2[V]) \end{aligned}$$

where Q is a soft quantifier, and \(F_1[V]\) and \(F_2[V]\) are formulas containing a variable V. A grounded quantifier expression is obtained by instantiating all variables with constants from their domains except for V.

Example 3

Consider the two formulas Knows(XT) and Trust(XT), then \(Most(T, Knows(X,T), Trusts(X,T))\) is a quantifier expression. By substituting X with Alice, we obtain the grounded quantifier expression \(Most(T,Knows(Alice,T), {-}\) Trusts(AliceT)) which can be read as “Alice trusts most of the people she knows”.

Definition 8

A \({\mathbf{PSL}}^Q\) program is a collection of \(\hbox {PSL}^Q\) rules. A \({\mathbf{PSL}}^Q\) rule has the same form as a PSL rule defined in Eq. 1 except that \(T_1,T_2,\ldots ,T_k\) are either atoms, negated atoms or quantifier expressions. Grounding a \(\hbox {PSL}^Q\) rule means instantiating all the variables with constants from their domain except for all the variables V in quantifier expressions \(Q(V, F_1[V], F_2[V])\).

Rules 10–14 in Table 2 and rules 5–8 in Table 8 are examples of \(\mathrm{PSL}^Q\) rules with quantifier expressions.

Analogously to how the interpretation of rules in PSL relies on operations from Łukasiewicz logic (see Definition 4), the interpretation of quantifier expressions in \(\hbox {PSL}^Q\) relies on quantifier mappings.

Definition 9

A quantifier mapping \(\tilde{Q}\) is a \([0,1] \rightarrow [0,1]\) mapping. If \(\tilde{Q}\) is non-decreasing and satisfies the boundary conditions \(\tilde{Q}(0) = 0\) and \(\tilde{Q}(1) = 1\), it is called a coherent quantifier mapping (Delgado et al. 2000).

We assume that for every soft quantifier Q an appropriate quantifier mapping \(\tilde{Q}\) can be defined, i.e. a function that represents the meaning of Q.

Using two thresholds \(\alpha \in [0,1]\) and \(\beta \in [0,1]\), where \(\alpha \le \beta \), the following equation defines a parametrized family of such quantifier mappings:

$$\begin{aligned} \tilde{Q}_{[\alpha , \beta ]}(x) = \left\{ \begin{array}{ll}0 &{} \quad \text{ if } x< \alpha \\ \frac{x-\alpha }{\beta -\alpha } &{} \quad \text{ if } \alpha \le x < \beta \\ 1 &{} \quad \text{ if } x \ge \beta \end{array} \right. \end{aligned}$$

Figure 1 depicts a possible coherent quantifier mapping for the soft quantifier “a few” as \(\tilde{Q}_{Few} = \tilde{Q}_{[0.1,0.4]}\) and for the soft quantifier “most” as \(\tilde{Q}_{Most} = \tilde{Q}_{[0.25,0.75]}\). Note how \(\tilde{Q}_{Few}\) is more relaxed than \(\tilde{Q}_{Most}\). For example, using these mappings, the statement “a few friends of Bob smoke” is true to degree 1 as soon as \(40\%\) of Bob’s friends are smokers, while \(75\%\) of Bob’s friends are required to be smokers for the statement “most friends of Bob smoke” to be fully true. The evaluation section contains a detailed analysis on the effect of the choice of the thresholds \(\alpha \) and \(\beta \) on the results obtained with MPE inference.

Fig. 1
figure 1

Examples of quantifier mappings

An interesting observation is that in practice friendship is not necessarily a black-and-white matter, i.e., people can be friends to varying degrees. For instance, \(I(Frie-\) \(nd(Bob,Alice)) = 1\) and \(I(Friend(Bob,Chris)) = 0.2\) denote that under interpretation I, Alice is a very close friend of Bob, while Chris is a more distant friend. Similarly, Chris might be a heavy smoker, while Alice might be only a light smoker. All these degrees can and should be taken into account when computing the truth degree of statements such as “a few friends of Bob smoke” and “most friends of Bob smoke” (Zadeh 1983).

Remark 3

Zadeh (1983) suggested to calculate the truth value of “Q A’s are Bs”, with \(A: X \rightarrow [0,1]\) and \(B: X \rightarrow [0,1]\) fuzzy sets, as:

$$\begin{aligned} \tilde{Q}\left( \frac{|A \cap B|}{|A|}\right) \end{aligned}$$

where \(A \cap B\) is a fuzzy set defined as:

$$\begin{aligned} A \cap B: X \rightarrow [0,1]: x \mapsto A(x) \tilde{\wedge } B(x) \end{aligned}$$

Remark 4

The cardinality of a fuzzy set \(S: X \rightarrow [0,1]\) is defined as:

$$\begin{aligned} |S| = \sum _{x \in X} S(x) \end{aligned}$$

Definition 10

For a given interpretation I, the interpretation of a grounded quantifier expression \(Q(V, F_1[V], F_2[V])\) is defined as

$$\begin{aligned} I(Q(V,F_1[V], F_2[V])) = \tilde{Q} \left( \frac{\sum _{x \in D_V} I(F_1(x)) \,\tilde{\wedge }\, I(F_2(x))}{\sum _{x \in D_V}I(F_1(x))} \right) \end{aligned}$$

with \(\tilde{Q}\) a quantifier mapping modelling Q.

Example 4

Let’s consider an interpretation I in a sample trust network as shown in Fig. 2. Nodes represent users and each edge represents the trust relation between two users. Since a trust relation is asymmetric, the direction of the trust relation is shown with an arrow. The degree of the trust links are shown with a value under/above the links, e.g., \(I(Trusts(Alice, Ann)) = 0.9\).

Fig. 2
figure 2

Sample trust network between five users

To calculate I(Most(XTrusts(AliceX), Trusts(XBob))), i.e. the degree to which most trustees of Alice trust Bob under the interpretation I shown in Fig. 2, we calculate \(\sum _{x \in D}(\text {I(Trusts}(Alice,x) ) \tilde{\wedge } \text { I(Trusts}(x,Bob))) = 1.3\) and \(\sum _{x \in D} \text {I(Trusts}(Alice,x)) = 3.2\) so \(\tilde{Q} \left( \frac{1.3}{3.2} \right) \sim \tilde{Q}(0.41)\). By using the quantifier expression mapping of “most” in Fig. 1 we obtain \(\tilde{Q}_{[0.25,0.75]}(0.41) = 0.32\). Thus, I(Most(XTrusts(AliceX), Trusts(XBob))) states that “most trustees of Alice trust Bob” to degree 0.32.

Inference and weight learning in \(\hbox {PSL}^Q\)

Expressing soft quantifiers pushes beyond the capabilities of inference and weight learning methods in PSL. In this section, we introduce new methods for inference based on the most probable explanation inference method (MPE inference) and weight learning with maximum-likelihood estimation (MLE) in \(\hbox {PSL}^Q\).


The goal of MPE “most probable explanation” inference is to find the most probable truth assignments \(I_{MPE}\) of unknown ground atoms given the evidence which is defined by the interpretation I. Let X be all the evidence, i.e., X is the set of ground atoms such that \(\forall x \in X, I(x)\) is known, and let Y be the set of ground atoms such that \(\forall y \in Y, I(y)\) is unknown. Then we have

$$\begin{aligned} I_{MPE}(Y) = \textit{arg}\max _{I(Y)} P(I(Y)|I(X)) \end{aligned}$$

and by Eq. 4 it follows that the goal of optimization is to minimize the weighted sum of the distances to satisfaction of all rules.

Remark 5

Suppose we want to optimize a \(f: [0,1]^n \rightarrow [0,1]\) function consisting of applications of only piecewise linear functions, fractions of piecewise linear functions, \(\min : [0,1]^2 \rightarrow [0,1]\) and \(\max : [0,1]^2 \rightarrow [0,1]\). We can transform such an optimization problem as follows. For every expression of the form \(\min (\phi ,\psi )\), we introduce a variable \(v_{\min (\phi ,\psi )}\) and add the constraints \(0 \le \phi , \psi , v_{\min (\phi ,\psi )} \le 1\), \(v_{\min (\phi ,\psi )} \le \phi \) and \(v_{\min (\phi ,\psi )} \le \psi \). Similarly, for every expression of the form \(\max (\phi ,\psi )\), we introduce a variable \(v_{\max (\phi ,\psi )}\) and add the constraints: \(0 \le \phi , \psi \), \(v_{\max (\phi ,\psi )} \le 1, v_{\max (\phi ,\psi )} \ge \phi \), and \(v_{\max (\phi ,\psi )} \ge \psi \).

Define the function g as the original function f but all minima and maxima are replaced by their corresponding variables. Optimizing f is then equivalent to optimizing g under these constraints.

Proposition 1

MPE inference for a PSL program is equivalent to solving a linear optimization problem.


The goal of optimization in PSL is to minimize the weighted sum of the distances to satisfaction of all rules, therefore we have:

$$\begin{aligned} I_{MPE}(Y) =\textit{arg}\max _{I(Y)}{-\sum _{r\in R} \lambda _{r}(d_{r}(I(X,Y)))} \end{aligned}$$

By the particular piecewise linear form of \(d_r(I)\) (see Definition 6) and Remark 5, finding an MPE assignment is a linear optimization problem, which is solvable in polynomial time. Note that we consider the linear form which is \(p=1\) in Eq. 4 throughout this section. However, PSL supports \(p=2\) where finding an MPE assignment is a convex optimization problem, which is solvable in polynomial time.

PSL also supports aggregates, i.e., random variables with values determined by other variables. To preserve convexity, however, standard PSL only supports linear aggregates. \(\square \)

Definition 11

An aggregate is a \([0,1]^n \rightarrow [0,1]\) mapping. If it is a linear mapping, it is called a linear aggregate, otherwise it is called a non-linear aggregate.

As an example, \(f: [0,1]^n \rightarrow [0,1]: (t_1, \ldots , t_n) \mapsto \frac{t_1+t_2+\cdots +t_n}{n}\) is a linear aggregate. A \(\hbox {PSL}^Q\) program allows expressions that contain quantifier expressions. Since the interpretation of a grounded quantifier expression (See Definition 10) is based on a non-linear aggregate, finding a MPE assignment of a \(\hbox {PSL}^Q\) program with quantifier expressions is beyond the capabilities of the standard PSL MPE-solver. To deal with this, we will first categorize different types of grounded quantifier expressions, given the interpretation I denoting the evidence.

Definition 12

A grounded quantifier expression \(Q(V, F_1[V], F_2[V])\), where for every \(s \in D_V\), it holds that all ground atoms in the formulas \(F_1[s]\) and \(F_2[s]\) are in X, is called a fully observed grounded quantifier expression (FOQE).

For instance, in a social network where the age and the friends of all users are known, by grounding Most(BFriend(AB), Young(B)), we obtain FOQEs. Note that for a FOQE \(Q(V, F_1[V], F_2[V])\), we have that \(I(Q(V, F_1[V], F_2[V]))\) is a known value in [0, 1].

Proposition 2

MPE inference for a \(\hbox {PSL}^Q\) program with grounded quantifier expressions limited to type FOQE is equivalent to solving a linear optimization problem.


By replacing all the FOQEs in the program with new variables, where the value of these variables are known and are in [0, 1], there is no difference between MPE inference for a \(\hbox {PSL}^Q\) program without any quantifier expressions and for a \(\hbox {PSL}^Q\) program with grounded quantifier expressions of type FOQE. The only difference is the processing time needed to calculate the value of each quantifier expressions, which is O(|X|). \(\square \)

Definition 13

A grounded quantifier expression \(Q(V, F_1[V], F_2[V])\), where for every \(s \in D_V\), it holds that all ground atoms in the formula \(F_1[s]\) are in X and there exists \(t \in D_V\) such that at least one ground atom in the formula \(F_2[t]\) is in Y, is called a partially observed grounded quantifier expression of type one (\({\mathbf{POQE}}^{(1)}\)).

Suppose all friendship relations are known and the goal is to infer the age of all users based on the age of some, then by grounding Most(BFriend(AB), Young(B)), we obtain \(\hbox {POQE}^{(1)}\)s. By grounding the rules 5–8 in Table 8, we obtain examples of \(\hbox {POQE}^{(1)}\)s. Note that for a \(\hbox {POQE}^{(1)}\) \(Q(V, F_1[V], F_2[V])\), we have that \(I(Q(V, F_1[V], F_2[V])) = \tilde{Q}(f(Y))\) where f is a piecewise linear function in variables belonging to Y.

Definition 14

A grounded quantifier expression \(Q(V, F_1[V], F_2[V])\), for which there exists \(t \in D_V\) such that at least one ground atom in the formula \(F_1[t]\) is in Y, is called a partially observed grounded quantifier expression of type two (\({\mathbf{POQE}}^{(2)}\)).

Note that for a \(\hbox {POQE}^{(2)}\) \(Q(V, F_1[V], F_2[V])\), we have that \(I(Q(V, F_1[V], F_2[V])) = \tilde{Q}(f(Y))\) where f is a fraction of piecewise linear functions in variables belonging to Y. In link prediction applications, such as trust link prediction, we mostly deal with \(\hbox {POQE}^{(2)}\)s. By grounding the rules 10–14 in Table 2 using unknown trust relations, we obtain complex examples of \(\hbox {POQE}^{(2)}\)s.

In the following proposition we give an equivalent definition for the membership function in Eq. 7. By applying Remark 5 we will then be able to show that a \(\hbox {PSL}^{Q}\) program can be transformed to an optimization program with an objective function that is a weighted sum of linear and fractional linear functions.

Proposition 3

The membership-function defined in Eq. 7 where \(\alpha \in [0,1]\), \(\beta \in [0,1]\), and \(\alpha < \beta \) can be rewritten as:

$$\begin{aligned} \tilde{Q}_{[\alpha , \beta ]}(x) = \max \left( 0, \frac{x-\alpha }{\beta -\alpha }\right) + \min \left( \frac{x-\alpha }{\beta -\alpha }, 1\right) - \frac{x-\alpha }{\beta -\alpha } \end{aligned}$$


By checking the boundary conditions \(x<\alpha \), \(\alpha \le x <\beta \) and \(x \ge \beta \), one can show that Eq. 13 is equivalent to Eq. 7. \(\square \)

Proposition 4

MPE inference for a \(\hbox {PSL}^Q\) program with grounded quantifier expressions limited to type FOQE and \(\hbox {POQE}^{(1)}\) is equivalent to solving a linear optimization problem.


A grounded quantifier expression \(Q(V, F_1[V], F_2[V])\) of type \(\hbox {POQE}^{(1)}\) is of the form:

$$\begin{aligned} \tilde{Q} \left( \frac{\sum _{x \in D_V} \max (0, C_x + I(F_2(x)) -1)}{\sum _{x \in D_V} C_x} \right) \end{aligned}$$

where \(C_x\) is the outcome of calculating the value of the grounded formula \(F_1(x)\), i.e., \(C_x = I(F_1(x))\). Since all ground atoms in the formula \(F_1(x)\) are in X, \(C_x\) is a constant value in [0, 1]. Let’s assume \(\lambda = \frac{1}{\sum _{x \in D_V} C_x}\) and \(\gamma = \frac{1}{\beta -\alpha }\). Both \(\lambda \) and \(\gamma \) are constant values. By using Proposition 3 the \(\hbox {POQE}^{(1)}\) is converted to:

$$\begin{aligned}= & {} \max \left( 0, \gamma \times \left( \left( \lambda \times {\sum _{x \in D_V} \max (0, C_x + I(F_2(x)) -1))} \right) -\alpha \right) \right) \\&+\min \left( \gamma \times \left( \left( \lambda \times {\sum _{x \in D_V} \max (0, C_x + I(F_2(x)) -1))} \right) -\alpha \right) , 1 \right) \\&-\gamma \times \left( \left( \lambda \times {\sum _{x \in D_V} \max (0, C_x + I(F_2(x)) -1))} \right) -\alpha \right) \end{aligned}$$

By using Remark 5, we first introduce new variables for all the inner expressions of the form \(\max (0, C_x + I(F_2(x)) -1))\) and replace them by their corresponding variables. Then we introduce new variables for the two outer maxima and minima expressions and replace them with their corresponding variables. As a result, each grounded quantifier expression of type \(\hbox {POQE}^{(1)}\) is replaced with a linear expression and a set of linear constraints. MPE inference for the \(\hbox {PSL}^Q\) program limited to quantifier expressions of type FOQE and \(\hbox {POQE}^{(1)}\) is then equivalent to minimizing a linear function under a set of linear constraints coming from replacing the minima and maxima in calculating the distance to satisfaction of grounded rules. \(\square \)

MPE inference for a \(\hbox {PSL}^Q\) program with grounded quantifier expressions of type \(\hbox {POQE}^{(2)}\) is solved with a sequence of linear optimization problems. Note that this is only a worst case scenario: if the grounded PSL\({^Q}\) program has no \(\hbox {POQE}^{(2)}\)s then we obtain a linear program. In the presence of \(\hbox {POQE}^{(2)}\)s, we use a transformation similar to the approach of Isbell and Marlow (1956) to replace a linear fractional program (LFP) by a set of linear programs by establishing a convergent iterative process. The linear program at each iteration is determined by optimization of the linear program at the previous iteration. Note that our \(\hbox {PSL}^Q\) program with grounded quantifier expressions of type \(\hbox {POQE}^{(2)}\) has an objective function that is a weighted sum linear and fractional linear functions, and is subject to linear equality and inequality constraints. Transformations of a LFP to a LP such as Charnes/Cooper transformation (Charnes and Cooper 1962) are not suitable for our problem domain.

figure a

The algorithm we propose for MPE inference (Algorithm 1) starts by initializing the set of all grounded rules (i.e., \(\textit{R}\)) to an empty set and all random variables to zero (i.e., line 2). Then, an iterative process starts by grounding all rules in the \(\hbox {PSL}^Q\) program (i.e., line 3–5). For every grounded quantifier expression Q of type \(\hbox {POQE}^{(2)}\), the value of Q is initialized by calculating the value over the known values (I(X)) and the current setting of the unknown values (\(I^{(0)}(Y)\)). In the algorithm, we use the notation \(\tilde{Q}(I(X) \cup I^{i-1}(Y))\) to denote this new interpretation of Q at iteration i (i.e., line 7–9). For each rule \(r_g\) we then calculate the distance to satisfaction (i.e., line 10). Note that \(I(r_g)\) and hence also \(d_{r_g}(I)\) can be piecewise linear functions in Y, but here \(d_{r_g}(I)\) does not contain fractions of piecewise linear functions since we calculate values for the \(\hbox {POQE}^{(2)}\)s. Next, we exclude the satisfied grounded rules (i.e., we exclude rules \(r_g\) such that \(d_{r_g}(I) = 0\)) from the optimization since their values will not change the optimization task (i.e., line 11–13). For the optimization task, f(I) (Eq. 4) is calculated using the distance to satisfaction of all grounded rules (i.e., line 16). Since f(I) does not contain fractions of piecewise linear functions, it can be transformed to a linear program (i.e., line 17). Finally, the inner optimization in \(\hbox {PSL}^Q\) is solved with PSL’s scalable, parallelizable message-passing inference algorithm (Bach et al. 2013) (i.e., line 18). In each iteration, the values of the Qs get updated by the most probable assignment of random variables in the previous iteration (\(I(X) \cup I^{(i-1)}(Y)\)) (i.e., line 8). This process is iteratively repeated for a fixed number of times (i.e., k).

Note that by considering the quadratic form which is \(p=2\) in Eq. 4, MPE inference for a \(\hbox {PSL}^Q\) program with grounded quantifier expressions of type FOQE is similar to the MPE inference in a PSL program. However, MPE inference for a \(\hbox {PSL}^Q\) program with grounded quantifier expressions of either type \(\hbox {POQE}^{(1)}\) or type \(\hbox {POQE}^{(2)}\) is based on our proposed iterative MPE algorithm. The inner optimization in \(\hbox {PSL}^Q\) of the quadratic form is solved with PSL’s inference algorithm (Bach et al. 2013) with squared hinge-loss functions.

Weight learning

The goal of weight learning based on maximum likelihood estimation (MLE) is to maximize the log likelihood of the rules’ weight based on the training data in Eq. 4. Hence, the partial derivatives of log likelihood with respect to \(\lambda _i\) of rule \(r_i \in R\) are

$$\begin{aligned} -\frac{\delta \log (f(I))}{\delta \lambda _i} = E_\lambda \left[ \sum _{r\in R_gi} (d_{r}(I))^p\right] -\sum _{r\in R_gi} (d_{r}(I))^p \end{aligned}$$

with \(E_\lambda \) the expected value under the distribution defined by \(\lambda \), and \(R_gi\) is the set of grounded rules of rule \(r_i\). The optimization is based on the voted perception algorithm (Collins 2002), in which approximation is done by taking fixed-length steps in the direction of gradient and averaging the points after all steps; out of the scope steps are projected back into the feasible region. To make the approximation tractable, a MPE approximation is used that replaces the expectation in the gradient with the corresponding values in the MPE state. We use our proposed MPE approach for transforming \(\hbox {POQE}^{(1)}\)s and \(\hbox {POQE}^{(2)}\)s in our MLE algorithm. We omit the pseudocode of the MLE algorithm for a \(\hbox {PSL}^Q\) program to save space.

To investigate the effects of using soft quantifiers on real-world applications, we explore two applications. The first application is a link prediction task in which we have grounded quantifier expressions of type \(\hbox {POQE}^{(2)}\) and test our iterative MPE inference in Sect. 5. The second application is a node labeling application in which we apply our transformed MPE inference using grounded quantifier expressions of type \(\hbox {POQE}^{(1)}\) in Sect. 6.

Link prediction: social trust link prediction

Studies have shown that people tend to rely more on recommendations from people they trust than on online recommender systems which generate recommendations based on anonymous people similar to them. This observation has generated a rising interest in trust-enhanced recommendation systems (Victor et al. 2011). The recommendations generated by these systems are based on an (online) trust network, in which members of the community express whether they trust or distrust each other. In practice these networks are sparse because most people are connected to relatively few others. Trust-enhanced recommendation systems therefore rely on link prediction.

In Huang et al. (2013), trust relations between social media users are modeled and predicted using a PSL program based on the structural balance theory (Heider 1958). Structural balance theory implies the transitivity of a relation between users. Based on this theory, users are more prone to trust their neighbors in the network rather than unknown other users. Bach et al. (2013)Footnote 1 evaluated the PSL program based on the structural balance theory on data from Epinions,Footnote 2 an online consumer review site in which users can indicate whether they trust or distrust each other. Throughout this section, we will use the same sample of Epinions (Leskovec et al. 2010). The sample dataset includes 2000 users with 8675 relations, namely 7974 trust relations and only 701 distrust relations.

We perform eightfold cross-validation and to evaluate the results, we use three metrics, AUC: the area under the receiver operating characteristic curve, PR+: the area under the precision-recall curves for trust relations, and PR−: the area under the precision-recall curves for distrust relations. In each fold, we first learn the weights of the rules based on 7/8 of the trust network and then apply the learned model on the remaining 1/8 to infer the trust/distrust relations. Bach et al. used the program of Huang et al. (2013) which is composed of twenty PSL rules in order to predict the degree of trust between two individuals. Sixteen rules from these rules encode possible stable triangular structures involving the two individuals and a third one. For example, an individual is likely to trust people his or her friends trust. The program of Huang et al. (2013) is used to predict unobserved truth-values of Trusts(AB) for pairs of individuals. The results of this program are shown in the first line in Table 3.

Table 1 PSL rules for social trust link prediction

In this paper, we propose a program based on 4 transitive rules (rules 1–4 in Table 1) and one rule which models the cyclic relation between 3 users (rule 5 in Table 1). Rules 6–9 in Table 1 are complementary rules for which we refer to Huang et al. (2013) for further explanation. The atom \(Average(\{Trusts\})\) in rules 8 and 9 is a constant which refers to the global average value of observed trust scores. This atom is useful for the disconnected parts of the trust network without any known trust relation. These four rules are also used in the PSL program of Bach et al. (2013).

Table 2 \(\hbox {PSL}^Q\) rules for social trust link prediction

To investigate whether we can improve the accuracy of the predictions by introducing rules with soft quantifier expressions, we construct \(\hbox {PSL}^Q\) rules based on a triad relation over a set of users instead of a third party (rules 10–14) in Table 2. The full \(\hbox {PSL}^Q\) program then consists of all rules displayed in Tables 1 and 2 .

Fig. 3
figure 3

a PR+ , b PR−, and c AUC when changing \(\alpha \) and \(\beta \) in the quantifier mapping \(\tilde{Q}\)

Table 3 Values with a \(*\) are statistically significant with a rejection threshold of 0.05 and values in bold are statistically significant with a rejection threshold of 0.1 using a paired t test w.r.t. the PSL program (Bach et al. 2013; Huang et al. 2013)
Fig. 4
figure 4

Visualizing the weighted sum of the distance to satisfaction of grounded rules for eightfold in 50 iterations of MPE inference for social trust \(\hbox {PSL}^Q\) program

We examine what happens when changing the thresholds for the quantifier mappings \(\tilde{Q}\) (Eq. 7). We have investigated ten different quantifier mappings by changing the values of \(\alpha \) and \(\beta \) by steps of 0.25. In this way, we obtain ten different \(\hbox {PSL}^Q\) programs. For every program, we applied Algorithm 1 for all \(k \in \{1, 2, \ldots , 10\}\). Note that for \(k=1\), since our \(\hbox {PSL}^Q\) program only contains quantifier expressions of type \(\hbox {POQE}^{(2)}\), the output of the MPE inference is equivalent to the output generated by a \(\hbox {PSL}^Q\) program with only FOQEs by ignoring the unknown values. Figure 3 presents changes of the three metrics of these ten \(\hbox {PSL}^Q\) programs with different quantifier mappings. All ten \(\hbox {PSL}^Q\) programs outperform the PSL program (shown with a line) in all iterations and in all three metrics, except for the \(\hbox {PSL}^Q\) program with \(\tilde{Q}_{[0.75,1]}\) in PR− after the first two iterations. An explanation for this is the fact that people trust/distrust a third party as soon as a few/some of their trusted/distrusted friends trust/distrust that person and not most of them, i.e., more than 75%. Interestingly, by decreasing both \(\alpha \) and \(\beta \) values, results get better. The program with the best predicting scores is \(\hbox {PSL}^Q\) with \(\tilde{Q}_{[0,0.25]}\) as a quantifier mapping representing “a few” (see Table 3).

Figure 4 presents the convergence curve of the 8 \(\hbox {PSL}^Q\) programs with quantifier mapping \(\tilde{Q}_{[0,0.25]}\) based on eightfold for 50 iterations during training. At each iteration k, all the grounded rules in the \(\hbox {PSL}^Q\) program are generated and then the weighted sum of distance to satisfaction of all grounded rules is calculated. As shown in Fig. 4, MPE inference converges for all eightfold. After \(k=10\) the variation rate in the outcome of calculating the weighted sum of distance to satisfaction value is very low. Note that the range of the weighted sum of the distance to satisfaction of each fold is different from each other because the program is completely different.

Fig. 5
figure 5

Learned values of the weight of the 14 rules of the \(\hbox {PSL}^Q\) program

Figure 5 emphasizes the importance of the \(\hbox {PSL}^Q\) rules with quantifier expressions (rules 10–14) after the weight learning phase. Bars represent average and error bars represent minimum and maximum weights of the rules learned in eightfold for the \(\hbox {PSL}^Q\) program with quantifier mapping \(\tilde{Q}_{[0,0.25]}\). These results show that using soft quantifiers not only improves the accuracy of trust and distrust predictions but also that the rules containing soft quantifiers, i.e. rules 10–14, play a major part in this by dominating all other rules in terms of weight. In these experiments, we used one quantifier mapping for all the quantifiers in a \(\hbox {PSL}^Q\) program; however it is possible to use different mapping functions for each quantifier expression in a \(\hbox {PSL}^Q\) program, which is an interesting direction for future research.

Training and predicting with PSL takes little time. In our experiments, training each model takes about a few minutes, while predicting takes under a second per trust/distrust relation. For the case of the \(\hbox {PSL}^Q\) program, training takes almost k times (i.e, number of iterations) the time of training our baseline PSL program.

Node labeling: user profiling in social networks

Many applications benefit from reliable approaches of user profiling in social networks. Such applications exist in various fields, from personalized advertising to reputation management (Dijkmans et al. 2015; Ha et al. 2015). In this section, we present and evaluate a \(\hbox {PSL}^Q\) program for inferring age and gender of social network users.

We collected user profiles from Netlog, an early social network that had over 90 million users worldwide at its peak. By applying snowball sampling, and starting from one user, we crawled the profiles of 3015 users, called the core users henceforth. Out of these 3015 profiles, 765 users (25%) have private profiles (the private core users) and 2241 (75%) have public profiles (the public core users). Next we crawled the user profiles of all the friends of the public core users, resulting in 171,439 additional profiles, referred to as the background users. Note that we could not do the same for the private core users, as their friend lists are not publicly accessible. We ended up with a sample network including 174,454 users and 277,191 friendship links. For all users in the sample, we extracted their age and gender, which is publicly available for all users in Netlog. Table 4 describes detailed information regarding the sample.Footnote 3

Table 4 Details about the Netlog data sample

The gender classes are fairly imbalanced among the core users (64% male vs. 36% female) and even more so among the public core users (69% male vs. 31% female). The average age of users in the overall sample is 29 years old, and the median is 25 years old. We choose the threshold of being 25 years old to divide users into two age groups of Young and Non-young users. The resulting age groups in the sample are uniform with \(53\%\) Young users and \(47\%\) Non-young users. Interestingly, the majority of private core users (i.e., \(72\%\)) are Young users, which indicates that Young users are more concerned about their privacy.

Table 5 Description of the network properties of the Netlog sample graph

Table 5 includes a description of the social graph. Since we used snowball sampling, the graph consists of a single connected component. The average degree of the public core users is substantially higher than the average degree of users in the sample overall because for each public core user we purposely crawled and included all their friends, which we did not do for the background users.

Table 6 Analysis of users’ age and gender given their friends age and gender clearly indicate the age homophily and gender heterophily in the graph

Given an interpretation I, and \(P_1\) and \(P_2\) predicate symbols such as Female, Male, Young or Non-young, we explore the relation between the public core users age and gender and the age and gender of their friends by calculating \(N_{P_1,P_2}\) as:

$$\begin{aligned} N_{P_1,P_2} = \frac{1}{|C_{P_1}|} \frac{|\{v|P_2(v) \wedge (u,v) \in E\}|}{|\{v|(u,v) \in E\}|} \end{aligned}$$

where \(\Pi = \{ u | u \text { is a public core user}\}\), \(C_P = \{u | u \in \Pi \wedge P(u)\}\), and E denoted the friendship relations in the Netlog sample graph. The results are presented in Table 6 which gives a first insight on how gender of users are structured within the Netlog network. Users’ behaviour in Netlog is of the type gender heterophily, which indicates the tendency of users to connect with the opposite gender, deriving from the particular usage of this social network site for flirting and dating purposes. The age results in Table 6 on the other hand clearly indicate that users have a strong tendency to connect within their same age group, i.e., age homophily behaviour. These results are in line with the properties of the Netlog network in Farnadi et al. (2015), and are apparent from Fig. 6 as well.

In the gender graph (Fig. 6), blue (i.e., male) and red (i.e., female) users are mostly connected to the opposite color, while in the age graph (Fig. 6) there is a clear tendency of connecting with the same color.

Fig. 6
figure 6

Visualizing the Netlog sample graph with Gephi (Bastian et al. 2009) using OpenOrd layout. Red nodes represent female users and blue nodes represent male users in (a) and similarly, green nodes represent Young users and orange nodes represent Non-young users in (b). a Gender graph. b Age graph

Below we present PSL and \(\hbox {PSL}^Q\) programs to be used in scenarios where we only have the age and gender of some users and our aim is to infer the age and gender of the remaining users given the social relations between them. We perform tenfold cross validation by randomly dividing the public core users in 10 parts. In each fold, one part is used for testing and the other 9 parts are added to the background and private core users and used as the training set for that fold.

Table 7 PSL rules for user profiling

We built a PSL program as shown in Table 7 based on our observation that the Netlog network exhibits gender heterophily and age homophily property. For example, the first PSL rule indicates that if two users are friends and one of them is female, it is more probable that the other user is male. Since being female and being male is mutual exclusive, where \(Female(A) = \lnot Male(A)\), we only need to consider one predicate for gender. For example, \(I(Female(X))=1\) indicates that user X is Female and \(I(Female(X))=0\) indicates that user X is male. Similarly, we only consider one predicate for the age group. For example, \(I(Young(X))=1\) indicates that user X is young. Since friendship relations in Netlog are undirected and symmetric, \(I(Friend(A,B))=I(Friend(B,A))\). To include both directions in the program, we populate both friendship directions in our graph, which produces 554,374 friendship relations in our dataset.

Table 8 \(\hbox {PSL}^Q\) rules for user profiling

To examine the effects of using soft quantifiers, we built a \(\hbox {PSL}^Q\) program as shown in Table 8. The first two rules express the intuition behind gender heterophily, meaning that if most of A’s friends are female, then A is male (rule \(R\#5\)), and that if most of A’s friends are male, then A is female (rule \(R\#6\)). Along those lines, rule \(R\#7\) and \(R\#8\) express age homophily.

Table 9 Result of age and gender prediction in Netlog

We examine different quantifier mappings to find the best \(\alpha \) and \(\beta \) values. We choose a step of 0.25 to produce 10 different quantifier mappings and evaluate them based on PR−, PR+, AUC and overall accuracy (i.e., the number of correct predictions made divided by the total number of predictions made). As shown in Table 9, the best AUC scores are obtained with the \(\hbox {PSL}^Q\) program with \(Q_{[0,0.5]}\) as a quantifier mapping representing “most”. The \(\hbox {PSL}^Q\) program with \(Q_{[0,0.25]}\) as a quantifier mapping, which indicates “a few”, gets the best overall accuracy for both the age and the gender prediction task. Both \(\hbox {PSL}^Q\) programs significantly outperform their sibling PSL program and the majority baseline. The \(\hbox {PSL}^Q\) programs owe their high performance to their local averaging behaviour for each user while the behaviour of the PSL program is less local, and, for the PSL rules presented in this paper, approximates global averaging behaviour.


In this paper, we have introduced \(\hbox {PSL}^Q\), the first SRL framework that supports reasoning with soft quantifiers, such as “most” and “a few”. \(\hbox {PSL}^Q\) is a powerful and expressive language to model uncertain relational data in an intuitive way. Since this expressivity pushed beyond the capabilities of existing PSL-MPE solvers, we have introduced and implemented new inference and weight learning algorithms that can handle rules with soft quantifiers. We have shown how the higher expressivity of \(\hbox {PSL}^Q\) can lead to better results in practice by extending an existing PSL program for link prediction in social trust networks with rules that contain soft quantifiers. Similarly, we have presented results for both PSL and \(\hbox {PSL}^Q\) programs for user profiling in a social network. We have presented the effects of using different interpretations of soft quantifiers in both applications. As a next step, we want to include an automatic way of learning the best interpretation for each quantifier expression in a \(\hbox {PSL}^Q\) program. Besides trust link prediction and user profiling, many other applications could benefit from the use of soft quantifiers. Exploring the effects of using soft quantifiers in \(\hbox {PSL}^Q\) programs for various applications in different domains is therefore another promising research direction. Furthermore, in addition to the approach of Zadeh that we have used in this paper, other approaches for soft quantifiers have been proposed (Delgado et al. 2014), most notably Yager’s OWA-operators (Yager 1988); we plan to investigate them in our future work.


  1. Source code available at


  3. The Netlog dataset is available at:


  • Alshukaili, D., Fernandes, A. A. A., & Paton, N. W. (2016). Structuring linked data search results using probabilistic soft logic. In International semantic web conference (ISWC).

  • Bach, S. H., Broecheler, M., Huang, B., & Getoor, L. (2015). Hinge-loss Markov random fields and probabilistic soft logic. arXiv:1505.04406 [cs.LG].

  • Bach, S. H., Huang, B., London, B., & Getoor, L. (2013). Hinge-loss Markov random fields: Convex inference for structured prediction. In Proceedings of the Uncertainty in Artificial Intelligence (UAI).

  • Bastian, M., Sebastien, H., & Mathieu, J. (2009). Gephi: An open source software for exploring and manipulating networks. In Proceedings of the international AAAI conference on web and social media (ICWSM) (pp. 361–362).

  • Beltagy, I., & Erk, K. (2015). On the proper treatment of quantifiers in probabilistic logic semantics. In Proceedings of the 11th international conference on computational semantics (IWCS) (p. 140).

  • Beltagy, I., Erk, K., & Mooney, R. J. (2014). Probabilistic soft logic for semantic textual similarity. In Proceedings of the 52nd annual meeting of the association for computational linguistics (ACL) (pp. 1210–1219).

  • Bobillo, F., & Straccia, U. (2008). fuzzyDL: An expressive fuzzy description logic reasoner. In Proceedings of the IEEE international conference on fuzzy systems (FUZZ-IEEE) (pp. 923–930).

  • Cao, T. H., Rossiter, J. M., Martin, T. P., & Baldwin, J. F. (2002). On the implementation of Fril++ for object-oriented logic programming with uncertainty and fuzziness. In Technologies for constructing intelligent systems 2 (pp. 393–406). Physica-Verlag HD.

  • Charnes, A., & Cooper, W. W. (1962). Programming with linear fractional functionals. Naval Research Logistics Quarterly, 9, 181–186.

    MathSciNet  Article  MATH  Google Scholar 

  • Collins, M. (2002). Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In Proceedings of the international conference on empirical methods in natural language processing (ACL) (pp. 1–8).

  • Delgado, M., Ruiz, M.-D., Sánchez, D., & Vila, M.-A. (2014). Fuzzy quantification: A state of the art. Fuzzy Sets and Systems, 242, 1–30.

    MathSciNet  Article  MATH  Google Scholar 

  • Delgado, M., Sánchez, D., & Vila, M. A. (2000). Fuzzy cardinality based evaluation of quantified sentences. International Journal of Approximate Reasoning, 23, 23–66.

    MathSciNet  Article  MATH  Google Scholar 

  • Deng, L., & Wiebe, J. (2015). Joint prediction for entity/event-level sentiment analysis using probabilistic soft logic models. In Conference on empirical methods in natural language processing (EMNLP).

  • Dijkmans, C., Kerkhof, P., & Beukeboom, C. J. (2015). A stage to engage: Social media use and corporate reputation. Tourism Management, 47, 58–67.

    Article  Google Scholar 

  • Ebrahimi, J., Dou, D., & Lowd, D. (2016). Weakly supervised tweet stance classification by relational bootstrapping. In Conference on empirical methods in natural language processing (EMNLP).

  • Fakhraei, S., Huang, B., Raschid, L., & Getoor, L. (2014). Network-based drug-target interaction prediction with probabilistic soft logic. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 11, 775–787.

    Article  Google Scholar 

  • Farnadi, G., Bach, S., Blondeel, M., Moens, M.-F., Getoor, L., & De Cock, M. (2015). Statistical relational learning with soft quantifiers. In Proceedings of 25th international conference on inductive logic programming (ILP).

  • Farnadi, G., Bach, S. H., Moens, M. F., Getoor, L., & De Cock, M. (2014). Extending PSL with fuzzy quantifiers. In Proceedings of the Fourth International Workshop on Statistical Relational AI at AAAI (StarAI).

  • Farnadi, G., Mahdavifar, Z., Keller, I., Nelson, J., Teredesai, A., Moens, M.-F., & De Cock, M. (2015). scalable adaptive label propagation in Grappa. In Proceedings of IEEE international conference on big data (IEEE-BigData).

  • Getoor, L., & Taskar, B. (2007). Introduction to statistical relational learning. Cambridge: MIT press.

    MATH  Google Scholar 

  • Ha, I., Oh, K.-J., & Jo, G.-S. (2015). Personalized advertisement system using social relationship based user modeling. Multimedia Tools and Applications, 74, 8801–8819.

    Article  Google Scholar 

  • Heider, F. (1958). The psychology of interpersonal relations. New York: Wiley.

    Book  Google Scholar 

  • Huang, B., Kimmig, A., Getoor, L., & Golbeck, J. (2013). A flexible framework for probabilistic models of social trust. In Social computing, behavioral-cultural modeling and prediction (pp. 265–273).

  • Isbell, J. R., & Marlow, W. H. (1956). Attrition games. Naval Research Logistics Quarterly, 3, 71–94.

    MathSciNet  Article  MATH  Google Scholar 

  • Jain, D., Barthels, A., & Beetz, M. (2010). Adaptive Markov logic networks: Learning statistical relational models with dynamic parameters. In Proceedings of the European conference on artificial intelligence (ECAI) (pp. 937–942).

  • Kazemi, S. M., Buchman, D., Kersting, K., Natarajan, S., & Poole, D. (2014). Relational logistic regression. In Proceedings of the international conference on principles of knowledge representation and reasoning (KR).

  • Klir, G., & Yuan, B. (1995). Fuzzy sets and fuzzy logic. New Jersey: Prentice Hall.

    MATH  Google Scholar 

  • Kouki, P., Fakhraei, S., Foulds, J., Eirinaki, M., & Getoor, L. (2015). HyPER: A flexible and extensible probabilistic framework for hybrid recommender systems. In ACM conference on recommender systems (RecSys).

  • Leskovec, J., Huttenlocher, D., & Kleinberg, J. (2010). Signed Networks in Social Media. In Proceedings of the 28th ACM conference on human factors in computing systems (CHI).

  • Liu, S., Liu, K., He, S., & Zhao, J. (2016). A probabilistic soft logic based approach to exploiting latent and global information in event classification. In AAAI conference on artificial intelligence (AAAI).

  • Lowd, D., & Domingos, P. (2007). Recursive random fields. In Proceedings of the international joint conference on artificial intelligence (IJCAI) (pp. 950–955).

  • Milch, B., Zettlemoyer, L. S., Kersting, K., Haimes, M., & Kaelbling, L. P. (2008). Lifted probabilistic inference with counting formulas. In Proceedings of the international conference on artificial intelligence (AAAi) (Vol. 8, pp. 1062–1068).

  • Muggleton, S., & De Raedt, L. (1994). Inductive logic programming: Theory and methods. The Journal of Logic Programming, 19, 629–679.

    MathSciNet  Article  MATH  Google Scholar 

  • Poole, D., Buchman, D., Kazemi, S. M., Kersting, K., & Natarajan, S. (2014). Population size extrapolation in relational probabilistic modelling. In Proceedings of the international conference on scalable uncertainty management (SUM) (pp. 292–305). Springer.

  • Poole, D., Buchman, D., Natarajan, S., & Kersting, K. (2012). Aggregation and population growth: The relational logistic regression and Markov logic cases. In Proceedings of the international workshop on statistical relational AI at UAI (StarAI).

  • Prade, H., Richard, G., & Serrurier, M. (2003). Learning first order fuzzy logic rules. In Proceedings of the 10th international fuzzy systems world congress (IFSA) (pp. 702–709). Springer.

  • Pujara, J., Miao, H., Getoor, L., & Cohen, W. (2013). Knowledge graph identification. In Proceedings of the international semantic web conference (ISWC).

  • Richardson, M., & Domingos, P. (2006). Markov logic networks. Machine Learning, 62, 107–136.

    Article  Google Scholar 

  • Sridhar, D., Fakhraei, S., & Getoor, L. (2016). A probabilistic approach for collective similarity-based drug-drug interaction prediction. Bioinformatics, 32(20), 3175–3182.

    Article  Google Scholar 

  • Van den Broeck, G., Meert, W., & Darwiche, A. (2013). Skolemization for weighted first-order model counting. arXiv preprint arXiv:1312.5378.

  • Victor, P., Cornelis, C., & De Cock, M. (2011). Trust and recommendations. In Recommender systems handbook (pp. 645–675). Springer.

  • West, R., Paskov, H. S., Leskovec, J., & Potts, C. (2014). Exploiting social network structure for person-to-person sentiment analysis. Transactions of the Association for Computational Linguistics (TACL), 2, 297–310.

    Google Scholar 

  • Yager, R. R. (1988). On ordered weighted averaging aggregation operators in multicriteria decision making. In IEEE transactions on systems, man and cybernetics (IEEE SMC) (pp. 183–190).

  • Zadeh, L. A. (1983). A computational approach to fuzzy quantifiers in natural languages. Computers and Mathematics with Applications, 9, 149–184.

    MathSciNet  Article  MATH  Google Scholar 

Download references


This work was funded in part by the SBO-program of the Flemish Agency for Innovation by Science and Technology (IWT-SBO-Nr. 110067) and NSF Grant IIS1218488. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.

Author information

Authors and Affiliations


Corresponding author

Correspondence to Golnoosh Farnadi.

Additional information

Editors: Katsumi Inoue, Akihiro Yamamoto, and Hayato Ohwada.

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Farnadi, G., Bach, S.H., Moens, MF. et al. Soft quantification in statistical relational learning. Mach Learn 106, 1971–1991 (2017).

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI:


  • Soft Quantifiers
  • Statistical Relational Learning (SRL)
  • Link Prediction
  • Social Trust Network
  • First-order Logic Rules