Abstract
Motivated by an analogy with matrix factorization, we introduce the problem of factorizing relational data. In matrix factorization, one is given a matrix and has to factorize it as a product of other matrices. In relational data factorization, the task is to factorize a given relation as a conjunctive query over other relations, i.e., as a combination of natural join operations. Given a conjunctive query and the input relation, the problem is to compute the extensions of the relations used in the query. Thus, relational data factorization is a relational analog of matrix factorization; it is also a form of inverse querying as one has to compute the relations in the query from the result of the query. The result of relational data factorization is neither necessarily unique nor required to be a lossless decomposition of the original relation. Therefore, constraints can be imposed on the desired factorization and a scoring function is used to determine its quality (often similarity to the original data). Relational data factorization is thus a constraint satisfaction and optimization problem. We show how answer set programming can be used for solving relational data factorization problems.
Keywords
Answer set programming, Inductive logic programming, Pattern mining, Relational data, Factorization, Data mining, Declarative modeling
1 Introduction
The fields of data mining and machine learning have contributed numerous effective and highly optimized algorithms for analyzing data. However, this focus on efficiency and scalability has come at the cost of generality. Indeed, while the algorithms are highly effective, their application range is often very restricted, and the algorithms are typically hard to change and adapt even to small variations of the problem definition. This observation has led to an interest in declarative methods for data mining and machine learning in which the focus lies on the use of expressive models that can capture a wide range of different problem settings and that can then be solved using off-the-shelf constraint solving technology; see Guns et al. (2013a), De Raedt (2012), Arimura et al. (2012), De Raedt (2015).
Motivated by this quest for more general and generic data analysis approaches, the present paper introduces the problem of relational data factorization (ReDF). ReDF is inspired by matrix factorization, one of the most popular techniques in machine learning and data mining for which many variants have been studied, such as non-negative, singular value and Boolean matrix factorization. In matrix factorization, one is given an \(n \times m\) matrix \(\mathbf {A}\), and the problem is to rewrite it as the product of some other matrices, e.g., the product of an \(n \times k\) matrix \(\mathbf {B}\) and a \(k\times m\) matrix \(\mathbf {C}\) such that \({\mathbf {A}_{i,j} = \sum _k \mathbf {B}_{i,k} \cdot \mathbf {C}_{k,j}}\). In relational data factorization, one is given a relation (i.e., a set of tuples over the same attributes) and asked to rewrite it in terms of other relations. Consider, for instance, a relation sells(Company, Part, Project), stating that companies sell particular parts to particular projects. While it is well-known that ternary relations, in general, cannot be rewritten as the join of three binary relations (Heath 1971; Jones et al. 1996), we might be interested in an approximation of the ternary relation. That is, we might approximate sells(Company, Part, Project) by the query offers(Company,Part), needs(Project, Part), deliversto(Company, Project) (we follow logic programming notation, where the same variable name denotes a natural join). The question is then how to determine the extensions for the relations offers, needs, and deliversto. The solution found will generally be imperfect, so in ReDF we want to find the best approximation w.r.t. a scoring function and we allow the user to specify hard constraints. In the example these might specify, e.g., that only tuples in the target relation sells may be derivable from the query.
In this paper, we develop a modeling and solving approach for ReDF using answer set programming (ASP) (Brewka et al. 2011). This is realized by showing for a number of ReDF problems how they can be tackled with ASP. This leads to the identification of constraints and scoring functions, which we then abstract to an even higher-level declarative language. We show that the resulting ReDF framework is general and generic and is in line with the declarative modeling approach to machine learning and data mining as (1) it allows one to easily specify and solve a wide range of well-known data analysis problems (such as tiling, Boolean matrix factorization, discriminative pattern mining, matrix block diagonalization, etcetera), (2) it is effective for prototyping such tasks (as we show in our experiments), even though it cannot yet compete with optimized special-purpose algorithms in terms of efficiency, and (3) the constraints and optimization criteria are specified in a declarative and flexible manner. Translating problem definitions in the ReDF framework to ASP models is straightforward, and small changes in the problem definitions generally result in small changes in the model.
Relational data factorization is a form of relational learning. That is, it is a relational analog of matrix factorization and is therefore relevant to inductive logic programming (Muggleton and De Raedt 1994; De Raedt 2008) and can also be seen as a form of large-scale abduction (Denecker and Kakas 2002). Moreover, the solution techniques that we adopt are based on answer set programming, which has also been adopted in some recent works and methods on inductive logic programming (Paramonov et al. 2015; Järvisalo 2011). The implementation techniques we employ may also be used in more traditional inductive logic programming settings.
This paper is structured as follows. Section 2 introduces the formal ReDF framework. Section 3 introduces ASP. Section 4 shows how a wide range of data mining problems can be expressed as ReDF problems. Section 5 introduces some novel problems that the framework can express. Section 6 discusses the encoding of the problems into ASP, while Sect. 7 reports on the experimental evaluation. In Sect. 8 we discuss related work, and we formulate some conclusions and directions for future work in Sect. 9.
2 Relational data factorization
Before we formalize the ReDF problem and approach in its full generality, we illustrate Relational Data Factorization on the sells(Company, Part, Project) example from the Introduction.
2.1 An example
In practice, it is usually impossible to find a perfect solution (with \(\textit{error} =0\)) to relational data factorization problems, in this example because of Heath’s theorem (Heath 1971) (as discussed in the Introduction). Therefore, it is often useful to impose further restrictions on the sets to be considered. One such constraint could specify that there is no overcoverage, i.e., that all tuples in \(\textit{approx}\) must be in sells.
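To make the notion of overcoverage concrete, the following minimal Python sketch builds one candidate factorization of a toy sells relation by projecting it onto the three binary relations and joining them back. The tuples and the constant names (c1, p1, j1, ...) are invented for illustration, and taking projections is only one possible choice of factors, not the method proposed in this paper.

```python
from itertools import product

# Toy extension of sells(Company, Part, Project); the tuples are made up.
sells = {
    ("c1", "p1", "j2"),
    ("c1", "p2", "j1"),
    ("c2", "p1", "j1"),
}

# One candidate set of factors: the projections of sells.
offers = {(c, p) for c, p, j in sells}       # offers(Company, Part)
needs = {(j, p) for c, p, j in sells}        # needs(Project, Part)
deliversto = {(c, j) for c, p, j in sells}   # deliversto(Company, Project)

companies = {c for c, _, _ in sells}
parts = {p for _, p, _ in sells}
projects = {j for _, _, j in sells}

# approx(C, P, J) <- offers(C, P), needs(J, P), deliversto(C, J)
approx = {
    (c, p, j)
    for c, p, j in product(companies, parts, projects)
    if (c, p) in offers and (j, p) in needs and (c, j) in deliversto
}

overcovered = approx - sells    # derived by the query but not in the data
undercovered = sells - approx   # in the data but not derived by the query
print(sorted(overcovered))
print(sorted(undercovered))
```

Here the join of the projections derives one tuple that is not in sells, so this choice of factors violates a no-overcoverage constraint even though it misses nothing.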
2.2 Problem statement
Using a logic programming formalism, we generalize the above example into the following ReDF problem statement.

a dataset D: a set of ground facts for target predicate db;

a factorization shape Q: \(\textit{approx} ({\bar{T}}) \leftarrow q_1({\bar{T}}_1), \ldots , q_k({\bar{T}}_k)\), where the \(q_i\) are factors and the \({\bar{T}}_i\) denote tuples of variables;

a set of constraints C;

an error function measuring the difference between two predicates (i.e., between the corresponding sets of ground facts);
The factorization shape is a single nonrecursive rule defining \(\textit{approx}\), the approximation of the target predicate \(\textit{db} \), where the predicates in the body are the factors. If a variable occurs in a body atom \({\bar{T}}_i\) and not in \({\bar{T}}\) (the head), then it is called latent. The task is to find a set F of ground facts defining the factors \(q_i\). Furthermore, each such set F uniquely determines a set of facts for \(\textit{approx}\). Notice that if a predicate \(q_i\) is already known and defined, then the task simplifies.
As in matrix factorization, it is quite likely that a perfect solution, with \(\textit{error} =0\), cannot be obtained. Consider the following example: \(\textit{db} (X,Y) \leftarrow p(X), q(Y)\) and dataset \(D = \{ \textit{db} (a,c), \textit{db} (b,d)\}\). Then it is impossible to perfectly reconstruct the target D. If \(F = \{p(a), p(b), q(c), q(d)\}\), the resulting program overgeneralizes as it entails facts not in D: \(\textit{db} (a,d) \in \textit{approx} \) and \(\textit{db} (a,d) \not \in D\); if, on the other hand, there are facts in D that are not entailed in \(\textit{approx} \), one undergeneralizes (e.g., when \(F = \emptyset \)).
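The small example above can be checked exhaustively. The sketch below enumerates every candidate F for the shape with unary factors p and q and scores it; it assumes, purely for illustration, that the error is the size of the symmetric difference between the derived facts and D (the problem statement leaves the error function open).

```python
from itertools import chain, combinations, product

# Dataset D from the example, as pairs (X, Y).
D = {("a", "c"), ("b", "d")}
xs = sorted({x for x, _ in D})
ys = sorted({y for _, y in D})

def subsets(items):
    """All subsets of items, as tuples."""
    return chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )

def approx(p, q):
    """Facts derived by the rule with body p(X), q(Y)."""
    return set(product(p, q))

def error(p, q):
    # Assumed error function: number of overcovered plus undercovered facts.
    return len(approx(p, q) ^ D)

candidates = [(error(p, q), p, q) for p in subsets(xs) for q in subsets(ys)]
best = min(candidates, key=lambda c: c[0])
print(best)
```

Under this assumed error, no choice of F reaches error 0: the full factors overgeneralize by two facts, the empty factors undergeneralize by two, and the best candidates reproduce a single fact of D.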
The scoring function in relational factorization measures the error between the predicates \(\textit{approx}\) and db. Instead of minimizing error, however, in some cases it is more convenient to maximize similarity. Since these two perspectives can be trivially transformed from one to the other, we will use both without loss of generality.
2.3 Approach
We introduce ASP in more detail below, but this model is easy to understand if one is familiar with the basics of logic programming. The ASP model basically defines the necessary predicates in ASP using a set of clauses. In addition, the rule in Line 4 encodes the constraint that whenever a tuple holds for sells(Com, Pa, Proj) there should be 0 or 1 corresponding tuples for the predicate offers(Com, Pa). Furthermore, the minimize statement specifies that we are looking for a model (a set of ground facts or tuples) that minimizes the error. The encoding in Listing 3 together with a set of facts for sells can be given to an ASP solver such as clasp (Gebser et al. 2011b).
Observe that the relational data factorization approach we propose perfectly fits within the declarative modeling paradigm for machine learning and data mining (De Raedt 2012). Indeed, the next sections will show that it naturally supports a wide range of popular and well-known factorization problems. Modeling different problems corresponds to specifying different constraints, shapes and optimization functions. By doing so, one obtains a deep understanding of the relationships among the many variations of factorization, and one can easily design, prototype and experiment with new variations of factorization problems. Furthermore, the models of factorization are in principle solver-independent and do not depend on a particular ASP solver implementation.
Notice that it would also be possible to use other constraint satisfaction and optimization approaches (such as, e.g., Integer Linear Programming), but given that we work within a relational framework, ASP is a natural choice. It is also declarative and has the right expressiveness for the class of problems that we will study, many of which are NP-complete such as BMF; see Sect. 4.2.
Finally, let us mention that there are many factorization approaches in linear algebra, databases, and even logic. We provide a detailed discussion of their relationship to ReDF in Sect. 8.
3 Preliminaries: ASP essentials
We use the answer set programming (ASP) paradigm for solving relational data factorization problems. Contrary to the programming language Prolog, which is based on a proof-theoretic approach to answer queries, ASP follows a model generation approach. It has been shown to be effective for a wide range of constraint satisfaction problems (Gebser et al. 2012).
The remainder of this section introduces the essentials of ASP in a rather informal way. ASP is a rich (and technical) research area, so we do not focus on technical issues as these would complicate the presentation, but rather refer the interested reader to Gebser et al. (2012), Eiter et al. (2009), Leone et al. (2002), Lifschitz (2008) for more details. For the actual implementation, we will use the clasp system (Gebser et al. 2012; Brewka et al. 2011).
Definition 1
A disjunctive datalog program is a finite set of rules r of the form$$\begin{aligned} a_1 \vee \ldots \vee a_n \leftarrow b_1, \ldots , b_k, \textit{not } c_1, \ldots , \textit{not } c_h, \end{aligned}$$where \(a_1, \ldots , a_n, b_1, \ldots , b_k,c_1, \ldots , c_h\) are atoms of a function-free first order language L. Each atom is an expression of the form \(p(t_1,\ldots ,t_n)\), where p is a predicate name and \(t_i\) is either a constant or a variable. We refer to the head of rule r as \(H(r) = \{a_1,\ldots ,a_n\}\) and to the body as \(B(r) = B^{+}(r) \cup B^{-}(r)\), where \(B^{+}(r) = \{ b_1, \ldots , b_k \}\) is the positive part of the body and \(B^{-}(r) = \{ c_1, \ldots , c_h \}\) the negative.
If a disjunctive datalog program P has variables, then its semantics is defined as that of its grounded version, written as ground(P), in which all variables are substituted with constants from the Herbrand Universe \(H_P\) (the constants occurring in the program).
An interpretation I w.r.t. a program P is a set of ground atoms of P. Let P be a positive disjunctive datalog program (i.e., without negation), then an interpretation I is called closed under P, if for every \(r \in \textit{ground}(P)\) it holds that \(H(r) \cap I \ne \emptyset \) whenever \(B(r) \subseteq I\).
Definition 2
(Answer set of a positive program (Eiter et al. 2009)) An answer set of a positive program P is a minimal (under set inclusion) interpretation among all interpretations that are closed under P.
Definition 3

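The reduct \(P^I\) of a ground program P w.r.t. an interpretation I is the positive ground program obtained from P by:

removing all rules \(r \in P\) for which \(B^{-}(r) \cap I \ne \emptyset \);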

removing the literals “\(\textit{not }a\)” from all remaining rules.
Intuitively, the reduct of a program is a program in which all rules whose bodies contradict I are removed and, in all remaining (non-contradicting) rules, the negative literals are ignored. The interpretation I is a guess as to what is true and what is false.
Definition 4
(An answer set of a disjunctive program) An answer set of a disjunctive program P is an interpretation I such that I is an answer set of positive ground program \(\textit{ground}(P)^I\).
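The guess-and-check reading of these definitions can be sketched in a few lines of Python. The sketch is restricted to ground rules with a single head atom (no disjunction), an assumption made here to keep it short: for such programs the minimal closed interpretation of the reduct is unique and can be computed as a least fixpoint. The three-rule program is a made-up illustration.

```python
from itertools import chain, combinations

# A ground rule is (head, positive_body, negative_body).
program = [
    ("p", frozenset(), frozenset({"q"})),   # p <- not q.
    ("q", frozenset(), frozenset({"p"})),   # q <- not p.
    ("r", frozenset({"p"}), frozenset()),   # r <- p.
]

def reduct(rules, interp):
    """Drop rules whose negative body intersects interp, then drop 'not'."""
    return [(h, pos) for h, pos, neg in rules if not (neg & interp)]

def least_model(positive_rules):
    """Smallest interpretation closed under a positive, non-disjunctive program."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, pos in positive_rules:
            if pos <= model and head not in model:
                model.add(head)
                changed = True
    return model

def answer_sets(rules):
    """Try every interpretation I and keep those equal to the least model of the reduct."""
    atoms = sorted(
        {h for h, _, _ in rules}
        | set().union(*(pos | neg for _, pos, neg in rules))
    )
    guesses = chain.from_iterable(
        combinations(atoms, r) for r in range(len(atoms) + 1)
    )
    return [set(g) for g in guesses
            if least_model(reduct(rules, set(g))) == set(g)]

print(answer_sets(program))
```

Enumerating all interpretations is exponential and only meant to mirror the definitions; it says nothing about how a solver such as clasp actually searches.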
Example 1
An aggregate atom is an atom that has the following form: \(l \# \{ a_1, \ldots ,a_n \} u\) where l and u are constant numbers, each \(a_i\) is a literal. The atom is true in an answer set A iff there are from l to u literals \(a_i\) that are true in A.
Another construct is maximization (Gebser et al. 2012; Leone et al. 2002) (minimization is defined analogously) stated as \(\#maximize\{ a_1=k_1, \ldots , a_n=k_n \}\), where \(a_1, \ldots , a_n\) are classic literals and \(k_1, \ldots , k_n\) are integer constants (possibly negative). The semantics of this constraint are as follows: a model I is selected if the weighted sum of \([a_i]*k_i\) is maximal in I, where \([\cdot ]\) are Iverson brackets, i.e. [a] is equal to 1 iff a is true in I and 0 otherwise.
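The semantics of both constructs can be illustrated with a short Python sketch. It represents a literal as a pair (atom, polarity) and evaluates a cardinality aggregate and a weighted objective against a given set of true atoms; the representation and the function names are illustrative assumptions, not part of any ASP system.

```python
def iverson(literal, answer_set):
    """[a] = 1 iff the literal holds in the answer set, 0 otherwise."""
    atom, positive = literal
    return int((atom in answer_set) == positive)

def cardinality_holds(lower, literals, upper, answer_set):
    """Truth of the aggregate atom  lower #{ literals } upper."""
    count = sum(iverson(lit, answer_set) for lit in literals)
    return lower <= count <= upper

def objective(weighted_literals, answer_set):
    """Weighted sum of [a_i] * k_i used by #maximize / #minimize."""
    return sum(k * iverson(lit, answer_set) for lit, k in weighted_literals)

# Literals a, b, c and the weighted statement {a=3, c=5, not b=-2}.
lits = [("a", True), ("b", True), ("c", True)]
weights = [(("a", True), 3), (("c", True), 5), (("b", False), -2)]

models = [{"a", "b"}, {"c"}, {"a", "c"}]
best = max(models, key=lambda m: objective(weights, m))
print(best, objective(weights, best))
```

Among the three candidate models, the one with the largest weighted sum is selected, which is the behaviour the maximization statement asks for.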
4 Application to data mining problems
In this section we show that the ReDF framework generalizes a wide range of data mining tasks and provides a truly declarative modeling approach for relational data factorization. We introduce a range of constraints and optimization criteria that can be used in practice. The data mining tasks studied include tiling (Geerts et al. 2004), Boolean Matrix Factorization (BMF) (Miettinen et al. 2008), discriminative pattern mining (Knobbe and Ho 2006), and block-diagonal matrix forms (Aykanat et al. 2002).
4.1 Tiling
Data mining has contributed numerous techniques for finding patterns in (Boolean) matrices. One fundamental approach is that of tiling (Geerts et al. 2004). A tile is a rectangular area in a Boolean matrix represented by a set of rows and a set of columns such that all values on the corresponding rows and columns in the matrix are equal to 1.
Definition 5
(Maximum k-Tiling) Given a binary dataset D and a positive integer k, find a tiling \(\mathcal {T}\) consisting of at most k tiles and maximizing \(\textit{area}(\mathcal {T}, D)\).
We now formalize tiling as a relational data factorization problem and then solve it using ASP. Rather than restricting ourselves to Boolean values as in the traditional formulation, we consider the relational case. The standard way of dealing with tables in attribute-value datasets is to expand them into a sparse Boolean matrix (with one Boolean for every attribute-value pair). In contrast, our formulation employs the attribute-value format directly.
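The following Python sketch illustrates tiles and the area of a tiling on a toy Boolean dataset. It assumes that the area of a tiling is the number of distinct cells of D covered by the union of its tiles; the dataset and the helper names are made up for illustration.

```python
from itertools import product

# Boolean dataset as the set of (row, column) cells that equal 1.
D = {(1, "a"), (1, "b"), (2, "a"), (2, "b"), (2, "c"), (3, "c")}

def cells(tile):
    """All cells of the rectangle spanned by a tile (rows, columns)."""
    rows, cols = tile
    return set(product(rows, cols))

def is_tile(tile, data):
    """A tile is valid iff every cell of its rectangle is 1 in the data."""
    return cells(tile) <= data

def area(tiling, data):
    """Assumed definition: number of data cells covered by at least one tile."""
    covered = set().union(*(cells(t) for t in tiling))
    return len(covered & data)

tiling = [({1, 2}, {"a", "b"}), ({2, 3}, {"c"})]
print(all(is_tile(t, D) for t in tiling), area(tiling, D))
```

With k = 2 this tiling covers every cell of the toy dataset, so no other tiling with at most two tiles can have a larger area.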
In Fig. 2, for example, we can see the initial dataset, in which State is an attribute and Fair and Good are values for this attribute. Moreover, the blue and green areas indicate two relational tiles occurring in particular sets of transactions.
 onevalueattribute: for every attribute of a tile there is at most one value:
$$\begin{aligned} \leftarrow \textit{tile} (\textit{Indx},\textit{Val}_1, \textit{Attr}),\textit{tile} (\textit{Indx},\textit{Val}_2, \textit{Attr}),\textit{Val}_1\ne \textit{Val}_2. \end{aligned} \tag{2}$$
 notileintersection: tiles do not overlap in the same transaction:
$$\begin{aligned} \leftarrow \textit{in} (I_1,T), \textit{in} (I_2,T), \textit{tile} (I_1,V,A), \textit{tile} (I_2,V,A), I_1 \ne I_2. \end{aligned} \tag{3}$$
 noovercoverage: tiles cannot “overcover” the transaction, that is, they are only allowed to cover tuples that are in the dataset:
$$\begin{aligned} \leftarrow \textit{tile} (\textit{Indx},\textit{Value}, \textit{Attr}),\textit{in} (\textit{Indx},\textit{Transct}), \textit{not } \textit{db} (\textit{Value}, \textit{Attr}, \textit{Transct}). \end{aligned} \tag{4}$$
 numberofpatterns(K): there are at most k tiles (numbered from 1 to k):
$$\begin{aligned} \textit{Indx} = 1 \vee \textit{Indx} = 2 \vee \ldots \vee \textit{Indx} = k \leftarrow \textit{tile} (\textit{Indx},\textit{Value}, \textit{Attr}). \end{aligned}$$
 overlappingtiles(N): two tiles in one transaction can intersect on at most N attributes:
$$\begin{aligned} \leftarrow \textit{in} (I_1,T), \textit{in} (I_2,T), \textit{tile} (I_1,V,A_1), \textit{tile} (I_2,V,A_2), I_1 \ne I_2, \# \{ A_1 = A_2\} > N. \end{aligned}$$
 noisyovercoverage(N): every tile I can overcover at most N attributes in every transaction T where it occurs:
$$\begin{aligned} \leftarrow \textit{tile} (I,V,A),\textit{in} (I,T), \textit{not}~ \textit{db} (V, A, T), \# \{ A \} > \textit{N}. \end{aligned}$$
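As a rough illustration of what the first three constraints rule out, the Python sketch below checks them on explicit sets of facts. The argument order follows the constraints above (tile(Indx, Value, Attr), in(Indx, Transct), db(Value, Attr, Transct)); the relation in is called occurs because in is a Python keyword, and the toy facts are invented.

```python
def one_value_attribute(tile):
    """Constraint (2): a tile fixes at most one value per attribute."""
    return not any(
        i1 == i2 and a1 == a2 and v1 != v2
        for (i1, v1, a1) in tile
        for (i2, v2, a2) in tile
    )

def no_tile_intersection(tile, occurs):
    """Constraint (3): two tiles in one transaction share no (value, attribute)."""
    return not any(
        i1 != i2 and t1 == t2 and (i2, v, a) in tile
        for (i1, t1) in occurs
        for (i2, t2) in occurs
        for (j, v, a) in tile
        if j == i1
    )

def no_overcoverage(tile, occurs, db):
    """Constraint (4): every covered (value, attribute, transaction) is in db."""
    return all(
        (v, a, t) in db
        for (i, v, a) in tile
        for (j, t) in occurs
        if i == j
    )

db = {
    ("fair", "state", "t1"), ("low", "price", "t1"),
    ("fair", "state", "t2"), ("low", "price", "t2"),
    ("good", "state", "t3"), ("low", "price", "t3"),
}
tile = {(1, "fair", "state"), (1, "low", "price"), (2, "good", "state")}
occurs = {(1, "t1"), (1, "t2"), (2, "t3")}

print(one_value_attribute(tile),
      no_tile_intersection(tile, occurs),
      no_overcoverage(tile, occurs, db))
```

In the ASP setting these checks are integrity constraints that prune candidate answer sets; here they are plain predicates over one fixed candidate.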
4.2 The Discrete Basis Problem (DBP) and Boolean matrix factorization (BMF)

\(\texttt {overcoverage}\): \(\#\{ (T,A) : \textit{overcovered} (T,A) \}. \)
This formulation mimics The Discrete Basis Problem (Miettinen et al. 2008). That is, K plays the role of the basis size and \(\alpha \) mimics the bias towards rewarding covering and penalizing overcovering (the flags –bonuscovered and –penaltyovercovered in ASSO).
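For intuition, the Boolean matrix product underlying BMF and a score that trades coverage against overcoverage can be sketched in Python as follows. The concrete score, coverage minus \(\alpha \) times overcoverage, is an assumption made for this sketch, and the matrices are toy data.

```python
def boolean_product(B, C):
    """Boolean product: entry (i, j) is 1 iff B[i][l] = C[l][j] = 1 for some l."""
    n, k, m = len(B), len(C), len(C[0])
    return [
        [int(any(B[i][l] and C[l][j] for l in range(k))) for j in range(m)]
        for i in range(n)
    ]

def score(A, B, C, alpha):
    """Assumed score: covered ones minus alpha times overcovered zeros."""
    approx = boolean_product(B, C)
    pairs = [(a, x) for ra, rx in zip(A, approx) for a, x in zip(ra, rx)]
    covered = sum(1 for a, x in pairs if a == 1 and x == 1)
    overcovered = sum(1 for a, x in pairs if a == 0 and x == 1)
    return covered - alpha * overcovered

A = [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]
B = [[1, 0],
     [1, 1],
     [0, 1]]
C_exact = [[1, 1, 0],
           [0, 1, 1]]
C_noisy = [[1, 1, 1],
           [0, 1, 1]]

print(score(A, B, C_exact, alpha=2), score(A, B, C_noisy, alpha=2))
```

The first basis reconstructs A exactly, while the second overcovers one zero entry and is penalized accordingly; a larger \(\alpha \) makes such overcovering less attractive.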
4.3 Discriminative k-pattern set mining
4.4 Block-diagonal matrix form
Aykanat et al. (2002) introduced the problem of permuting the rows and columns of a sparse matrix into block-diagonal form, together with an algorithm for solving it. They relate this problem to other combinatorial and classical linear algebra problems. The underlying block-diagonal structure of a matrix can be used to parallelize certain matrix computations. An illustration of block-diagonalization (several variants) of the Animals dataset is depicted in Fig. 4.

itemblocking: \(\leftarrow \textit{tile} (I_1,A), \textit{tile} (I_2,A), I_1 \ne I_2. \)

transactionblocking: \(\leftarrow \textit{in} (I_1,T), \textit{in} (I_2,T), I_1 \ne I_2. \)

\(\texttt {itempenalty}\): \(\#\{ (T,A) : \textit{approx} (T,A'), ~\textit{not } \textit{covered} (T,A) \}\)

\(\texttt {transtpenalty}\): \(\#\{ (T,A) : \textit{approx} (T',A), ~\textit{not } \textit{covered} (T,A) \}\)
If we omit \(\texttt {itempenalty}\) and \(\texttt {transtpenalty}\), we obtain the standard optimization function for tiling. In the experimental section we evaluate the effect of the presence of these penalties.
5 Beyond classic problems
So far we have focused on matrix-like representations of the data, in which the dataset was represented by instances of \(\textit{db} (T,A,V)\), for a transaction T having a value V for an attribute A. This representation is independent of the number of attributes and values; it allows one to easily specify constraints over all attributes and to access the data using the predicate db only. We will now show that it is also possible to use other, purely relational representations, such as the sells example from the Introduction.
Section 2 already provided the sells example for decomposing a ternary relation into three binary ones. In the shape for the sells example in Listing 3 there is no latent variable: there are only attributes from the original dataset. Since there is no latent variable, there is no “pattern” to be found for which the optimization criterion needs to be optimized, which allowed us to use a simple error function using only one type of atom.
However, latent variables can also be useful in a purely relational setting. Let us illustrate this on an example inspired by the ArXiv community analysis example of Gopalan and Blei (2013). Assume we are given a relation publishedIn with attributes Author, University, and Venue, specifying that an author belonging to a particular university publishes in a particular venue. Furthermore, assume we want to factorize this relation into the relation \(\textit{approx}\) (A,U,V) by introducing a latent attribute Topic, denoted as T. The latent topic variable clusters authors, universities and venues together in such a way that their join results in publications.
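A factorization with a latent topic can be illustrated with a small Python sketch. It assumes a shape in which each of the three factors pairs one attribute with the latent topic, and it uses invented factor names and toy tuples throughout.

```python
# Toy target relation publishedIn(Author, University, Venue); data is made up.
published_in = {
    ("ann", "kul", "icml"),
    ("bob", "mit", "vldb"),
    ("ann", "kul", "vldb"),
}

# Hypothetical factors, each linking one attribute to a latent topic T.
author_topic = {("ann", "ml"), ("bob", "db")}
university_topic = {("kul", "ml"), ("mit", "db")}
venue_topic = {("icml", "ml"), ("vldb", "db")}

def join_on_topic(authors, universities, venues):
    """approx(A, U, V) <- author(A, T), university(U, T), venue(V, T)."""
    return {
        (a, u, v)
        for a, t1 in authors
        for u, t2 in universities
        if t1 == t2
        for v, t3 in venues
        if t3 == t1
    }

approx = join_on_topic(author_topic, university_topic, venue_topic)
print(sorted(approx))
print(sorted(published_in - approx))   # tuples the topics fail to explain
print(sorted(approx - published_in))   # tuples derived but not observed
```

Each topic value acts as one pattern: all authors, universities and venues assigned to it are joined, so a cross-topic publication such as the third tuple stays uncovered unless a factor is extended.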
6 Implementation
This section describes how ReDF models can be implemented in ASP. We do this for the basic problem of tiling, as well as for the purely relational data factorization presented before. Implementations of the other variations are included in Appendix C. Our primary implementation is written for the clasp system (Gebser et al. 2012; Brewka et al. 2011) and will be made available online upon acceptance of this manuscript.
6.1 General computation methods: greedy and sampling approaches
In all described problems, the goal is to find k patterns or tiles, where a pattern is interpreted as a set of facts corresponding to a particular value of the latent variable. We will follow an iterative approach to finding these patterns, in which the discovery of the next pattern or tile will be encoded in ASP. We will consider both a greedy and a sampling algorithm for realizing this. The sampling approach is intended for better scalability and will be evaluated in Sect. 7.1.
6.2 Data mining problems expressed in the framework
The maximum k-tiling problem can be encoded in answer set programming as indicated in Listing 10. The code implements the greedy model, i.e., Algorithm 1, for the maximum k-tiling problem with a fixed number of tiles (Geerts et al. 2004). It assumes we have already found an optimal tiling for \(n-1\) tiles, and indicates how to find the nth tile to cover the largest area. The nth tile is called \(\textit{currentI}\) in the listing. Further, we have information about the names of the attributes and the possible values for each attribute through predicates \(\textit{col} (\textit{Attr})\) and \(\textit{valid} (\textit{Attr}, \textit{Value})\). That is, \(\textit{col} (A)\) is a unary predicate that encodes possible column indices, and \(\textit{valid} (A,V)\) is a binary predicate that encodes which possible values V can occur in column A.
Let us explain the code in Listing 10. The constraint in Line 2 generates at most one value for each attribute. The constraints in Lines 4 and 6 compute the transactions where the current tile cannot occur, i.e., intersect(T) is the set of all transactions where the current tile overlaps with another tile and the current tile cannot cover these transactions. Similarly, overcovered(currentI,T) is the set of transactions that cannot be covered because there is an element in the current tile, with fixed index currentI, that is not present in transaction T. The constraint in Line 8 states that if the tile does not violate the overcovering and intersection constraints in a transaction, it occurs in the transaction. Line 10 defines the coverage and the optimization constraint in Line 11 enforces the selection of the best model.
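The greedy step described above can be mimicked by a brute-force Python sketch for Boolean data. It is not the ASP encoding: it simply enumerates attribute sets, lets the tile occur in every transaction where it neither overcovers nor intersects an earlier tile, and keeps the largest area. The toy dataset and function names are illustrative assumptions.

```python
from itertools import combinations

def next_tile(db, covered):
    """Largest tile that neither overcovers db nor overlaps already covered cells."""
    transactions = sorted({t for t, _ in db})
    attributes = sorted({a for _, a in db})
    best = (0, (), ())
    for r in range(1, len(attributes) + 1):
        for items in combinations(attributes, r):
            rows = tuple(
                t for t in transactions
                if all((t, a) in db and (t, a) not in covered for a in items)
            )
            area = len(rows) * len(items)
            if area > best[0]:
                best = (area, rows, items)
    return best

def greedy_tiling(db, k):
    """Mine up to k tiles one at a time; return the tiles and the coverage."""
    covered, tiles = set(), []
    for _ in range(k):
        area, rows, items = next_tile(db, covered)
        if area == 0:
            break
        tiles.append((rows, items))
        covered |= {(t, a) for t in rows for a in items}
    return tiles, len(covered) / len(db)

# Boolean dataset as (transaction, attribute) cells equal to 1.
db = {(1, "a"), (1, "b"), (2, "a"), (2, "b"), (2, "c"),
      (3, "c"), (4, "b"), (4, "c")}
print(greedy_tiling(db, 2))
```

Ties between tiles of equal area are broken by enumeration order here; a solver may break them differently and thus return a different tiling with the same coverage at that step.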
Theorem 1
(Correctness of the greedy ASP tiling encoding) The ASP program \(\mathcal {P}\) defined by Listing 10 computes the k-th largest tile w.r.t. the scoring function coverage (5) as extensions of the predicates \(\textit{tile} (k,\cdot ,\cdot )\) and \(\textit{in} (k,\cdot )\) in its answer set \(\mathcal {A}\), provided that the dataset is represented extensionally through the predicates db, \(\textit{valid}\), and \(\textit{col}\) and the \(k-1\) already found tiles are represented extensionally through the predicates \(\textit{tile} (I,\cdot ,\cdot )\) and \(\textit{in} (I,\cdot )\) for \(I \in [1,k-1]\).
For the proof, see Appendix B. The clasp encodings for the other models are sketched in Appendix C.
6.3 Purely relational data factorization
In Sect. 5 we presented a factorization of the publishedIn relation into three binary relations. It constitutes a proof-of-concept prototype model in ASP and could be improved by, e.g., incorporating heuristics.
Implementation differences When we generalize the factorization encoding with two relations to three relations, we observe a slight implementation difference between them. Factorization with the two-relation shapes can be naturally implemented using the core ASP generate-and-test paradigm. Once we have guessed an extension for a certain value of the latent variable, we propagate it to the second relation and test against the constraints. This strategy is often deployed in specialized algorithms (Geerts et al. 2004; Miettinen et al. 2008). For a multiple-relation shape we guess an extension of one relation, then we constrain the possible values we generate for the second value (e.g., see Line 2 in Listing 11). In general, we can search for one value of the latent variable at a time using a greedy strategy (as in tiling). Theoretically, we can simultaneously search for values of a latent variable by replacing the fixed latent parameter by a variable and searching over the latent parameter as well. The work of Guns et al. (2013b) provides evidence that this approach does not scale well, unless special propagators are introduced into the solver. This technique would allow extending the method to other shapes with more than three relations.
7 Experiments
The main goal of this section is to evaluate whether ReDF problems can be solved using a generic solver. In particular, we focus on solving the problem formulations as we specified them in ASP. We investigate whether the problems can be solved, and for a number of tasks compare the results and runtimes to those obtained by specialized algorithms. Since we here use generic problem formulations and generic solvers that have neither been designed nor optimized for the tasks under consideration, we cannot expect the approach to be as efficient as specialized algorithms. However, what is more important is that we demonstrate that all tasks formalized and prototyped using the ReDF framework can be solved using a unified approach.
Experimental setup and datasets The ASP engine we use is 64-bit clingo (clasp with the gringo grounder) version 3.0.5 with the parameter --heuristic=Vmtf (see Appendix A for details on the parameters). All experiments are executed on a 64-bit Ubuntu machine with an Intel Core i5-3570 CPU @ 3.40GHz \(\times \) 4 and 8GB of memory, except for Maximum k-tiling on the Chess and Mushroom datasets, for which an Intel Xeon CPU with 128GB of memory (all single-threaded) was used due to high memory requirements. For most experiments we use the datasets summarized in Table 1, all but one of which originate from the UCI Machine Learning repository (Bache and Lichman 2013). The Animals (with Attributes) dataset was taken from Osherson et al. (1991). For the purely relational factorization task, the data and experiment results are described separately in the corresponding subsection.
In Sect. 7.1 we show how ReDF formulations of existing data mining tasks (from Sect. 4) can be solved using the implementation presented in Sect. 6; afterwards, in Sect. 7.2, we show the results of the purely relational data factorization task. The ASP solver parameters used in the experiments and a breakdown of individual solving steps and their runtimes determined by the meta-experiment are presented in Appendix A.
Table 1 Dataset properties

Dataset            Attribute type   # Tuples   # Attributes   Avg # values per attribute
Animals            Boolean          50         85             2
Solar flare        Categorical      1389       11             3.3
Tic-tac-toe        Categorical      958        10             2.9
Nursery            Categorical      12,960     8              3.4
Voting             Categorical      435        17             3.0
Chess (Kr-vs-Kp)   Categorical      3196       36             2.1
Mushroom           Categorical      8124       22             5.6
7.1 Solving existing tasks
Maximum k-tiling in categorical data We first consider the maximum k-tiling problem from Sect. 4.1 and present timing and coverage results in Table 2 obtained on all datasets from Table 1.
In all cases the problem specification given in Listing 10 was used to greedily mine \(k=25\) tiles. Since the problem becomes more constrained as the number of tiles increases, runtime decreases for each additional tile mined. We therefore report total runtime and coverage for different values of k, i.e., for different total numbers of tiles. Only \(k=10\) tiles were mined on Chess and Mushroom due to long runtimes.
Effect of sampling As we can see from Table 2a, runtimes are quite long on datasets like Mushroom. To address this issue, we use the sampling procedure of Algorithm 2 with the following parameters: \(\alpha = 0.4\) and \(N = 20\), i.e., 40% of all attributes were selected uniformly at random for each sample and 20 samples were used. Intuitively, the larger the sample size and the more samples, the better we approximate the exact result.
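The idea of attribute sampling can be sketched in Python for the first tile of a Boolean dataset. Algorithm 2 itself is not reproduced here; the sketch assumes that each sample restricts the search to a random fraction \(\alpha \) of the attributes and that the best tile over all N samples is kept. The brute-force search, the toy data and the fixed seed are illustrative choices.

```python
import random
from itertools import combinations

def best_tile(db, attributes):
    """Exhaustive search for the largest tile using only the given attributes."""
    transactions = sorted({t for t, _ in db})
    best = (0, (), ())
    for r in range(1, len(attributes) + 1):
        for items in combinations(attributes, r):
            rows = tuple(
                t for t in transactions if all((t, a) in db for a in items)
            )
            area = len(rows) * len(items)
            if area > best[0]:
                best = (area, rows, items)
    return best

def sampled_best_tile(db, alpha, n_samples, seed=0):
    """Assumed scheme: search n_samples random attribute subsets, keep the best."""
    rng = random.Random(seed)
    attributes = sorted({a for _, a in db})
    size = max(1, round(alpha * len(attributes)))
    best = (0, (), ())
    for _ in range(n_samples):
        sample = sorted(rng.sample(attributes, size))
        candidate = best_tile(db, sample)
        if candidate[0] > best[0]:
            best = candidate
    return best

rows = {1: "abc", 2: "abcd", 3: "cde", 4: "de"}
db = {(t, a) for t, items in rows.items() for a in items}
attributes = sorted({a for _, a in db})

exact = best_tile(db, attributes)
sampled = sampled_best_tile(db, alpha=0.4, n_samples=20)
print(exact[0], sampled[0])
```

Each sample explores far fewer attribute subsets, which is where the runtime saving comes from, at the price of possibly missing tiles that span more attributes than a single sample contains.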
With the given parameters, we attain an order of magnitude improvement in runtime: instead of 19 hours with the regular algorithm, using sampling it takes only one hour to compute 10 tiles as indicated in Fig. 5a. The effect of using sampling on coverage can be seen in Fig. 5b: the first tiles that are mined have lower coverage than when sampling is not used, but after a while the difference in coverage with LTMk remains more or less constant and even slightly decreases. LTMk is the original, specialized tiling algorithm, to which we compare next.
Table 2 Maximum k-tiling, for different numbers of tiles (k)

(a) Runtime
Dataset       k=5      k=10     k=15     k=20     k=25
Animals       36s      1m4s     1m21s    1m32s    1m36s
Solar flare   6s       10s      13s      16s      18s
Tic-tac-toe   22s      31s      33s      34s      35s
Nursery       4m19s    6m32s    7m32s    7m56s    8m13s
Voting        52s      1m28s    1m42s    1m46s    1m49s
Chess         17h03m   22h31m   –        –        –
Mushroom      13h09m   19h44m   –        –        –

(b) Coverage
Dataset       k=5      k=10     k=15     k=20     k=25
Animals       0.327    0.472    0.573    0.649    0.709
Solar flare   0.416    0.565    0.655    0.721    0.751
Tic-tac-toe   0.251    0.449    0.623    0.784    0.907
Nursery       0.269    0.454    0.634    0.773    0.905
Voting        0.399    0.553    0.662    0.749    0.810
Chess         0.483    0.618    –        –        –
Mushroom      0.476    0.586    –        –        –
Without sampling, we can see that our approach gives the same results in terms of the coverage as the LTMk algorithm. This is as expected though, since both LTMk and our approach guarantee to find an optimal solution in each iteration. The slight difference between the two coverage curves in Fig. 5b is caused by the fact that multiple tiles can have the same (maximum) area, and some choice between those has to be made. Although these choices are typically made deterministically, the different implementations make decisions based on different criteria, resulting in slightly different tilings.
Overlapping tiling To evaluate the overlapping tiling task from Sect. 4.1, we apply the model in Listing 12 (ASP encoding in Appendix C) to the five smaller datasets from Table 1. We experiment with two levels of overlap, i.e., parameter N is set to either 1 or 2: tiles can intersect on at most one or two attribute(s). As the results in Table 3 show, allowing limited overlap can lead to a small increase in coverage, but runtimes also increase due to the costly aggregate operation in Line 1 of Listing 12.
Table 3 Maximum k-tiling with overlap
(a) Runtime
Dataset       N    k = 5     k = 10    k = 15    k = 20    k = 25
Animals       1    1m10s     2m28s     3m46s     4m24s     4m47s
              2    1m39s     4m10s     6m26s     7m40s     8m10s
Solar flare   1    8s        13s       17s       21s       24s
              2    8s        15s       20s       25s       29s
Tictactoe     1    24s       41s       49s       52s       53s
              2    23s       43s       51s       55s       56s
Nursery       1    5m00s     8m19s     10m10s    10m48s    11m12s
              2    5m43s     9m32s     11m9s     11m50s    12m12s
Voting        1    1m10s     2m19s     2m53s     3m8s      3m15s
              2    1m39s     3m34s     4m35s     5m9s      5m33s
(b) Coverage
Dataset       N    k = 5     k = 10    k = 15    k = 20    k = 25
Animals       1    0.327     0.475     0.583     0.663     0.722
              2    0.332     0.482     0.592     0.675     0.742
Solar flare   1    0.433     0.595     0.684     0.734     0.756
              2    0.452     0.602     0.685     0.731     0.755
Tictactoe     1    0.253     0.451     0.626     0.781     0.898
              2    0.253     0.451     0.626     0.781     0.898
Nursery       1    0.268     0.454     0.633     0.772     0.905
              2    0.268     0.454     0.633     0.772     0.905
Voting        1    0.403     0.558     0.675     0.765     0.828
              2    0.409     0.571     0.683     0.762     0.819
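The overlap restriction used in this experiment can be illustrated with a small check, written here in Python for readability. This is only a sketch under the assumption that overlap is measured as the number of attributes (columns) shared by a pair of tiles; the actual constraint is the aggregate in Listing 12, and the function names below are hypothetical.

```python
from itertools import combinations


def max_pairwise_overlap(tiles):
    """Largest number of attributes shared by any two tiles.

    Each tile is a pair (set of rows, set of columns); only columns are compared.
    """
    return max(
        (len(cols_a & cols_b)
         for (_, cols_a), (_, cols_b) in combinations(tiles, 2)),
        default=0,
    )


def satisfies_overlap(tiles, n):
    """True if every pair of tiles intersects on at most n attributes."""
    return max_pairwise_overlap(tiles) <= n


# Hypothetical tiling: the first and third tile share attributes 1 and 2.
TILES = [({0, 1}, {0, 1, 2}), ({2, 3}, {2, 3}), ({4}, {1, 2, 5})]
```

With this toy tiling, `satisfies_overlap(TILES, 1)` is false and `satisfies_overlap(TILES, 2)` is true, mirroring the two settings of N in Table 3. The check compares all pairs of tiles, which hints at why the corresponding aggregate becomes costly as k grows.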
Discriminative pattern set mining Here we demonstrate how the discriminative k-pattern mining model from Sect. 4.3 can be solved. For this we use Chess and Tictactoe from Table 1, each of which has a binary class label indicating whether a game was won or not and can therefore be naturally used for this task.
We apply the encoding from Listing 14 to both datasets, set \(\alpha = 1\) to weigh positive and negative tuples equally, and summarize the results in Fig. 7b. The results show that five patterns suffice to cover all positive examples of Tictactoe, hence mining more than five patterns would be useless. Of the 718 covered tuples, 92 are negative, i.e., \(12.8\%\), whereas \(34.7\%\) of the tuples in the complete dataset are negative. For Tictactoe, the time needed to solve this task is very limited: about half a second.
Figure 7a shows the runtime needed to iteratively find subsequent patterns in the Chess dataset. Interestingly, it seems that the problem becomes substantially easier (computationally) once the first few patterns have been found: the runtime per pattern drops heavily. This confirms that the search space shrinks when the problem becomes more constrained, i.e., the number of answer sets decreases with the addition of more constraints.
We next show the influence of the \(\alpha \) parameter, i.e., the relative weight of covering positive and negative tuples in the optimization criterion. Increasing \(\alpha \) increases the ‘penalty’ for covering a negative tuple, which forces the algorithm to select more conservative rules. We investigate the effect of this parameter by measuring and comparing precision and recall of the obtained pattern sets for \(\alpha = 1\) and \(\alpha = 5\). Figure 8 shows that precision goes to 1 when \(\alpha \) is increased, while recall decreases; the loss in recall can be compensated for by mining a larger number of patterns.^{4}
This task differs from the previous one in its optimization criterion: positive coverage penalized by negative coverage allows for fast inference and discovery of the optimal solution, which results in shorter runtimes than for tiling.
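The trade-off controlled by \(\alpha \) can be sketched as follows. The Python code below is an illustration only: it assumes that the criterion has the form "number of covered positive tuples minus \(\alpha \) times the number of covered negative tuples", that a pattern covers a tuple when it is a subset of it, and it uses hypothetical toy data and function names rather than the encoding of Listing 14.

```python
def covered(patterns, rows):
    """Rows covered by at least one pattern (a pattern covers a row if it is a subset)."""
    return [row for row in rows if any(p <= row for p in patterns)]


def score(patterns, positives, negatives, alpha=1.0):
    """Assumed criterion: covered positives minus alpha times covered negatives."""
    return len(covered(patterns, positives)) - alpha * len(covered(patterns, negatives))


def precision_recall(patterns, positives, negatives):
    """Precision and recall of a pattern set with respect to the positive class."""
    tp = len(covered(patterns, positives))
    fp = len(covered(patterns, negatives))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / len(positives) if positives else 0.0
    return precision, recall


# Hypothetical labelled tuples over items a..e.
POSITIVES = [frozenset("ab"), frozenset("ac"), frozenset("ae")]
NEGATIVES = [frozenset("ad"), frozenset("cd")]

BROAD = [frozenset("a")]      # covers 3 positives and 1 negative
SPECIFIC = [frozenset("ab")]  # covers 1 positive and no negatives
```

On this toy data the broad pattern scores 2 for \(\alpha = 1\) and \(-2\) for \(\alpha = 5\), whereas the specific pattern scores 1 in both cases. A larger \(\alpha \) therefore favours the more conservative pattern, which has precision 1 but lower recall. Applying the same precision measure to the Tictactoe numbers reported above gives \((718 - 92)/718 \approx 0.872\) for \(\alpha = 1\).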
7.2 Purely relational data factorization
In Sect. 5 we described how to model the factorization of publishedIn(Author, University, Venue) into three binary relations with a latent variable Topic. We now evaluate whether a standard ASP solver can solve this task. Unfortunately, we cannot expect a generic solver to handle enormous datasets such as the one from ArXiv as described by Gopalan and Blei (2013). Instead, we demonstrate a proof-of-concept of solving the model in Listing 16 on a moderate dataset.
We constructed a dataset for a well-known colleague from the data mining community: Bart Goethals (Antwerp University). We collected his publication list from Microsoft Academic Search^{5} and extracted for each paper the publication venue, and all coauthors together with their corresponding affiliations (i.e., the last known affiliation for each author in this list of papers). Each unique combination of venue, coauthor, and affiliation resulted in a tuple in the publishedIn relation. The complete dataset contains 57 tuples over 19 universities, 38 authors, and 15 venues.
Intuitively, if a set of authors from a set of universities publish in a set of venues, then there must be an underlying research topic that unites them. Hence, by factorizing the relation into three separate relations, we cluster each of the entity types into a (fixed) number of topics, as indicated by the value of the latent Topic variable.
The results for factorization using \(K = 12\) topics and \(\alpha =\frac{1}{2}\) are presented in Fig. 9, including coauthors (red), universities (blue), publication venues (purple), and topics (green). To determine the number of topics K, we tracked the optimization criterion while increasing K and stopped when this no longer improved.
Since the task is of an exploratory character, we can only qualitatively evaluate the results. We observe that all data mining venues are located together in the center, connected to the same topics. SEBD, an Italian database conference, stands apart, and there is also a separate block for database and computing venues DaWaK and SAC. Manual inspection of the results indicates the topics (or clusters) to be coherent and meaningful: they represent different affiliations and groups of coauthors that Bart Goethals has collaborated with. For example, topic 5 contains the SDM conference, the University of Helsinki, and three coauthors specialized in Data Mining. Hence, this topic could be described as “Data mining collaboration with the University of Helsinki”, which makes perfect sense as Bart Goethals was previously a researcher in Helsinki.
Not all authors are represented in the factorization. How much of the publishedIn dataset is covered depends on the number of topics K (which was chosen as described before). The higher the cardinality of the pattern set, the larger the total coverage. The \(\textit{covered}\) elements contribute positively to the optimization criterion, whereas the \(\textit{overcovered}\) elements contribute negatively. This implies that each pattern is chosen such that the numbers of \(\textit{covered}\) and \(\textit{overcovered}\) elements are balanced and the optimization criterion is maximized. In general, covering all authors with few patterns would lead to significant overcovering of the original dataset, while introducing too many patterns would create clusters with only one author (which is clearly undesirable, since these clusters would not be meaningful).
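The balance between covered and overcovered elements can be made concrete with a small sketch. The Python code below is not the ASP model of Listing 16: it assumes that the criterion is the number of covered tuples minus \(\alpha \) times the number of overcovered tuples, and all relation names, entity names, and data are hypothetical.

```python
def reconstruct(author_topic, uni_topic, venue_topic):
    """Join the three binary relations on the latent topic and project the topic away."""
    topics = {t for _, t in author_topic}
    result = set()
    for t in topics:
        authors = {a for a, tt in author_topic if tt == t}
        unis = {u for u, tt in uni_topic if tt == t}
        venues = {v for v, tt in venue_topic if tt == t}
        result |= {(a, u, v) for a in authors for u in unis for v in venues}
    return result


def evaluate(data, factors, alpha=0.5):
    """Return (#covered, #overcovered, assumed score) for a candidate factorization."""
    recon = reconstruct(*factors)
    covered = recon & data      # reconstructed tuples that are in the data
    overcovered = recon - data  # reconstructed tuples that are not in the data
    return len(covered), len(overcovered), len(covered) - alpha * len(overcovered)


# Hypothetical publishedIn(Author, University, Venue) tuples.
DATA = {
    ("ann", "antwerp", "kdd"),
    ("ann", "antwerp", "sdm"),
    ("bob", "helsinki", "sdm"),
    ("cat", "pisa", "sebd"),
}

# A tight single topic: one author, one university, two venues.
TIGHT = ({("ann", 1)}, {("antwerp", 1)}, {("kdd", 1), ("sdm", 1)})

# A broad single topic that tries to cover two authors at once.
BROAD = (
    {("ann", 1), ("bob", 1)},
    {("antwerp", 1), ("helsinki", 1)},
    {("kdd", 1), ("sdm", 1)},
)
```

In this toy setting the tight topic covers two tuples without overcovering (score 2), while the broad topic covers three tuples but overcovers five (score 0.5 for \(\alpha = \frac{1}{2}\)). This illustrates why covering more authors with a single pattern is not necessarily rewarded.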
Decompositions such as the one depicted in Fig. 9 could serve as a basis for new analyses. For example, we might visualize the intersection of common (latent) topics shared by two researchers. We outline possible examples in Appendix E.
Relational factorization without a latent variable In Sect. 5, we also described a factorization that does not use any latent variables (analogous to the sells example in Listing 11 from the introduction section). We evaluate this model using Listing 11 on the same dataset as used in the previous experiment, i.e., the coauthor relation publishedIn(Author, Uni, Venue) for Bart Goethals.
In general, factorizations do not perfectly match the original relation (i.e., \(\textit{error} \ne 0\)), but in this particular case the system found a lossless solution. It is easy to see that this will not always be possible though. For instance, let us assume we keep multiple affiliations per author in the dataset. For example, apart from a fact \(\textit{p(bonchi,barcelona,pakdd)}\), there may be another fact \(\textit{p(bonchi, pisa, pakdd)}\) in publishedIn. Although the same factorization would be found by the solver, the found solution would be imperfect as the latter fact is not represented in the factorized relation.
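The following sketch illustrates how such an error can arise. It is written in Python and rests on an assumption that is not spelled out here: the factorized shape stores exactly one affiliation per author, next to a binary author-venue relation. The helper names are hypothetical, and the error is simply taken to be the size of the symmetric difference between the data and the reconstructed relation.

```python
def reconstruct(affiliation, published_at):
    """Rebuild publishedIn tuples from a one-affiliation-per-author factorization.

    affiliation: dict mapping each author to a single university (assumed shape)
    published_at: set of (author, venue) pairs
    """
    return {(a, affiliation[a], v) for a, v in published_at if a in affiliation}


def error(data, affiliation, published_at):
    """Number of tuples that are missing from or spurious in the reconstruction."""
    return len(data ^ reconstruct(affiliation, published_at))


AFFILIATION = {"bonchi": "barcelona"}
PUBLISHED_AT = {("bonchi", "pakdd")}

# One affiliation per author: the factorization is lossless.
SINGLE = {("bonchi", "barcelona", "pakdd")}

# A second affiliation for the same author cannot be represented.
MULTI = SINGLE | {("bonchi", "pisa", "pakdd")}
```

Under these assumptions the same factors reconstruct `SINGLE` exactly (error 0), but leave the second fact of `MULTI` unrepresented (error 1), which corresponds to the imperfect solution described above.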
Table 4 Experimental summary for pure relational factorizations from Sect. 5
With a latent variable
#Transactions   Overall runtime   Avg runtime   #Topics      Avg atoms per topic
58              14s               1.1s          12           5.4
Without a latent variable
#Transactions   Overall runtime   Correct       #Incorrect   Avg factor size
58              0.01s             58            0            45
7.3 Runtime discussion
In this section we have seen a number of experiments that solve ReDF problems using generic solving technology, i.e., answer set programming. As we can see in Figs. 5a and 6, specialized algorithms are substantially faster than ASP. On datasets of moderate size, however, generic solvers obtain reasonable runtimes, as indicated by the results in Tables 2a and 3a and in Figs. 6, 7a, and 7b. For the purely relational data factorization task from Sect. 5 we present a summary of the experiments in Table 4. In these experiments, computation time ranged from several seconds to a few minutes.
8 Related work
Our work is related to (1) previous work on generalizing problem definitions and solutions in factorization, (2) existing forms of relational decomposition, (3) approaches in inductive and abductive logic programming, and (4) the use of declarative languages and solvers for data mining.
8.1 General models for pattern mining
Our work can be related to a number of approaches that have generalized some of the tasks addressed in Sect. 5. Lu et al. (2008) used BMF as a basis for defining several data mining tasks and modeled them using integer linear programming. While Lu et al. (2008) also used a general purpose solver, their approach is restricted to Boolean matrix products involving only two Boolean matrices. In a similar manner Li (2005) defined a General Model for Clustering Binary Data, using matrix factorization to model several well-known clustering methods. That framework supports only one possible factorization shape, uses a lower-level modeling language, and requires complete partitions as well as specialized algorithms for different problems. In our approach, the shape of the factorization is separated from the constraints and optimization criterion.
Biskup et al. (2004) and Fan et al. (2012) investigated inverse querying and the problem of solving relational equations \(e_1(D) = e_2(D)\) exactly under several assumptions; their techniques could be used to compute exact solutions to a restricted form of ReDF. However, this approach does not seem to allow for approximations and the use of loss functions.
8.2 Decomposition of databases, tensors, and real-valued matrices
ReDF is related to several forms of relational decomposition, a term that has been heavily overloaded in the literature. Hence, it is imperative that we present an overview and contrast existing paradigms to our own work. Moreover, ReDF is also related to decomposition methods for real-valued matrices.
Relational decomposition in database theory Ever since the seminal paper by Codd (1970), the decomposition of relations has been an important theme in database research (Koehler 2007; Date 2006). Key properties of this form of relational decomposition are (Elmasri and Navathe 2010): (1) a relational schema together with its constraints, e.g., functional dependencies, is assumed given; (2) decomposition is never based on the data (extension), but only on the schema (intension); (3) decomposition is always lossless, i.e., factorization is always exact for any possible extension, and never an approximation. An interesting exception is Relational Decomposition via Partial Relations (Berzal et al. 2002), where one is looking for partially satisfied dependencies in the data and then uses these partial dependencies to derive a normal form. It does take into account data, but only to mine additional schema constraints in the decomposition.
Relational decomposition in tensor calculus Kim and Candan (2011) extend classical tensor factorization, CP decomposition, to deal with datasets composed of several relations, i.e., CP is generalized to multi-relational datasets. This requires adding relational algebra operations to CP. Key differences are: the data consists of several tables, with a schema to join them at the end; the shape is always the same and a tensor is decomposed into a sum of terms having the same structure; the optimization function is fixed; no user constraints are supported.
Decomposition of real-valued matrices Let us start with SVD (Singular Value Decomposition) (Golub and Van Loan 1996), the best-known method in this area, which gives an optimal rank-k decomposition of a real-valued matrix A into a composition of three matrices \(U \varSigma V^T\), where U and V are orthogonal real-valued matrices and \(\varSigma \) is a diagonal non-negative matrix with the singular values of A. One of the key problems with SVD in the context of relational and Boolean factorization is that U and V may contain negative values, which makes interpretation in the relational setting problematic. To overcome this issue NNMF (Non-Negative Matrix Factorization) has been introduced (Paatero and Tapper 1994). Still, there are two key issues with the usage of NNMF and SVD for relational and Boolean data.
Secondly, existing real-valued matrix factorization methods support neither multiple relations in the decomposition shape nor extra constraints on the decomposition, both of which are at the core of the ReDF method. Furthermore, the constraints used in our method are hard constraints over discrete values. The latter problem has been addressed by Collective Matrix Factorization (Singh and Gordon 2008), which can handle multiple relations and optimization criteria. However, at its core the method relies on stochastic optimization over reals, which leads to the problems discussed at the beginning of the section.
Finally, as ReDF is defined over discrete values in the presence of hard constraints, all the problems described above (rank inequalities, optimization over reals, uninterpretable values, etc.) apply to the comparison of ReDF with real-valued matrix factorizations as well.
8.3 Relational learning
ReDF is also related to some well known techniques in inductive logic programming and statistical relational learning and even to abductive reasoning.
Several frameworks for abduction have been introduced over the years (Denecker and Kakas 2002; Flach and Kakas 2000). In abduction, the goal is to find a (minimal) hypothesis in the form of a set of ground facts that explains an observation. Abductive reasoning uses a rich background theory as well as integrity constraints; it also uses a set of clauses defining the predicate in the observation. The differences with ReDF are that ReDF uses a much simpler shape definition and no real background theory. On the other hand, abductive reasoning proceeds in a purely logical manner, and typically neither takes into account multiple facts in the observation nor uses complex optimization functions. There also exist similarities between ReDF and fuzzy abduction (Vojtás 1999; Miyata et al. 1995), but we differ in the core assumptions we make: all rules and constraints in our setting are deterministic, as is the evidence that needs to be derived. Also, ReDF has the shape constraint, which restricts the explanations that can be derived to those in the form of a factorization.
Meta-interpretive learning (Muggleton et al. 2015) uses templates together with a kind of abductive reasoning to find a set of rules and facts in a typical inductive logic programming setting. While it can use much richer templates and background theory, it uses neither constraints nor optimisation functions like ReDF does.
Kok and Domingos (2007) introduced a probabilistic framework based on Markov logic together with the EM principle to realize statistical predicate invention. This captures what the authors call multiple relational clustering and addresses essentially the same task as the infinite relational model of Kemp et al. (2006). Statistical predicate invention shares several ideas with our approach: it employs a kind of query or schema to denote the kind of factorization one wants and also imposes some hard constraints on the possible solutions. On the other hand, its optimization criterion is built-in and based on the maximum likelihood principle, the framework seems restricted to a kind of block modeling approach, essentially clustering the different rows and columns into different blocks, and the approach is inherently probabilistic.
8.4 Declarative data mining
The idea of using generic solvers and languages for data mining is not new and has been investigated by, for instance, Guns et al. (2011, 2013a) and Métivier et al. (2012), who used various constraint programming languages for modeling and solving itemset mining problems. The use of ASP for frequent itemset and graph mining was investigated in Järvisalo (2011) and Paramonov et al. (2015). Furthermore, the use of integer linear programming is quite popular in data mining and machine learning; e.g., Chang et al. (2008). While the choice of a particular framework for modeling and solving may lead to both different models and different performance, it should be possible to use alternative frameworks, such as constraint programming or integer programming, for modeling and solving ReDF problems.
Afrati et al. (2012) extended the typical structure of the mining problem using three-level graphs that represent a chain of relations in the multi-relational setting: authors writing papers, and papers being about certain topics. The goal is to find the subgraphs that satisfy particular constraints and optimization criteria. For example, an author is an authority if the number of topics they have written papers on is maximal. They provide various interesting discovery tasks and solve them using integer programming.
9 Conclusions
The key contribution of this paper is the introduction of the framework of relational data factorization, which was shown to be relevant for modeling, prototyping, and experimentation purposes.
On the modeling side, we have formulated several well-known data mining tasks in terms of ReDF, which allowed us to identify commonalities and differences between these data mining tasks. One advantage of the framework is that small changes in the problem definition typically lead to small changes in the model. Furthermore, ReDF allowed us to model new types of relational data mining problems.
We have not only modeled problems, but also demonstrated that these models can be easily translated into concrete executable ASP encodings. The experiments have shown the feasibility of the approach, especially for prototyping, and especially with the sampling technique. The runtimes were typically not competitive with those of the highly optimized and much more specific implementations that are typically used in data mining. Still, the encodings could be run on reasonable datasets of modest size (e.g., Mushroom and Chess have approximately \(185\,000\) and \(115\,000\) non-empty elements, respectively).
Directions for future research include investigating the use of alternative solvers (such as constraint or integer programming), the study of heuristics and local search, and the expansion of the range of tasks to which ReDF can be applied. For example, a general ReDF framework is needed to factorize evidence for probabilistic lifted inference, where the shape of the factorization crucially affects the overall performance of the algorithm (Van den Broeck and Darwiche 2013).
Acknowledgements
We would like to thank Marc Denecker, Tias Guns, Benjamin Negrevergne, Siegfried Nijssen, and Behrouz Babaki for their help and assistance, and last but not least the ICON project (FP7-ICT-2011-C) and FWO for funding this work.
References
 Afrati, F., Das, G., Gionis, A., Mannila, H., Mielikäinen, T., & Tsaparas, P. (2012). Mining chains of relations. In D. E. Holmes & L. C. Jain (Eds.), Data mining: foundations and intelligent paradigms, intelligent systems reference library (Vol. 24, pp. 217–246). Berlin, Heidelberg: Springer.
 Arimura, H., Medina, R., & Petit, J.-M. (Eds.). (2012). In: Proceedings of the IEEE ICDM Workshop on Declarative Pattern Mining.
 Aykanat, C., Pinar, A., & Catalyurek, Ü. V. (2002). Permuting sparse rectangular matrices into block-diagonal form. SIAM Journal on Scientific Computing, 25, 1860–1879.
 Bache, K., & Lichman, M. (2013). UCI machine learning repository. http://archive.ics.uci.edu/ml.
 Berzal, F., Cubero, J. C., Cuenca, F., & Medina, J. M. (2002). Relational decomposition through partial functional dependencies. Data and Knowledge Engineering, 43(2), 207–234.
 Biskup, J., Paredaens, J., Schwentick, T., & den Bussche, J. V. (2004). Solving equations in the relational algebra. SIAM Journal on Computing, 33(5), 1052–1066.
 Brewka, G., Eiter, T., & Truszczyński, M. (2011). Answer set programming at a glance. Communications of the ACM, 54(12), 92–103.
 Chang, M. W., Ratinov, L. A., Rizzolo, N., & Roth, D. (2008). Learning and inference with constraints. Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence, AAAI, 2008, 1513–1518.
 Codd, E. F. (1970). A relational model of data for large shared data banks. Communications of the ACM, 13(6), 377–387.
 Date, C. J. (2006). Date on database: Writings 2000–2006. Berkeley, CA, USA: Apress.
 De Raedt, L. (2008). Logical and relational learning. Berlin: Cognitive Technologies, Springer.
 De Raedt, L. (2012). Declarative modeling for machine learning and data mining. In: The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, pp. 2–3.
 De Raedt, L. (2015). Languages for learning and mining. In: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, January 25–30, 2015, Austin, Texas, USA (pp. 4107–4111).
 Denecker, M., & Kakas, A. (2002). Abduction in logic programming. In A. Kakas & F. Sadri (Eds.), Computational logic: Logic programming and beyond, lecture notes in computer science (Vol. 2407, pp. 402–436). Berlin, Heidelberg: Springer.
 Eiter, T., Ianni, G., & Krennwallner, T. (2009). Answer set programming: A primer. In: 5th International Reasoning Web Summer School (RW 2009), Brixen/Bressanone, Italy, August 30 – September 4, 2009, Springer, LNCS, vol 5689.
 Elmasri, R., & Navathe, S. B. (2010). Fundamentals of database systems (6th ed.). Boston, MA, USA: Addison-Wesley Longman Publishing Co. Inc.
 Fan, W., Geerts, F., & Zheng, L. (2012). View determinacy for preserving selected information in data transformations. Information Systems, 37(1), 1–12.
 Feige, U. (1996). A threshold of ln n for approximating set cover. In: Proceedings of the Twenty-eighth Annual ACM Symposium on Theory of Computing, ACM, New York, NY, USA, STOC ’96, pp. 314–318.
 Flach, P. A., & Kakas, A. C. (2000). On the relation between abduction and inductive learning. In: D. M. Gabbay & R. Kruse (Eds.), Abductive reasoning and learning. Handbook of defeasible reasoning and uncertainty management systems (Vol. 4, pp. 1–33). Springer Netherlands.
 Gebser, M., Kaminski, R., Kaufmann, B., Schaub, T., Schneider, M., & Ziller, S. (2011a). A portfolio solver for answer set programming: Preliminary report. In: Delgrande, J., Faber, WT (Eds.) Proceedings of the Eleventh International Conference on Logic Programming and Nonmonotonic Reasoning (LPNMR’11), Springer-Verlag, Lecture Notes in Artificial Intelligence, vol 6645, pp. 352–357.
 Gebser, M., Kaufmann, B., Kaminski, R., Ostrowski, M., Schaub, T., & Schneider, M. (2011b). Potassco: The Potsdam answer set solving collection. AI Communications, 24(2), 107–124.
 Gebser, M., Kaminski, R., Kaufmann, B., & Schaub, T. (2012). Answer set solving in practice. Synthesis lectures on artificial intelligence and machine learning. San Rafael: Morgan and Claypool Publishers.
 Gebser, M., Kaufmann, B., Romero, J., Otero, R., Schaub, T., & Wanko, P. (2013). Domain-specific heuristics in answer set programming. In M. desJardins & M. L. Littman (Eds.), Association for the advancement of artificial intelligence. Palo Alto: AAAI Press.
 Geerts, F., Goethals, B., & Mielikäinen, T. (2004). Tiling databases. In: E. Suzuki & S. Arikawa (Eds.), Discovery science: 7th international conference, DS 2004, Springer Berlin Heidelberg, pp. 278–289.
 Golub, G. H., & Van Loan, C. F. (1996). Matrix computations (3rd ed.). Baltimore, MD, USA: Johns Hopkins University Press.
 Gopalan, P. K., & Blei, D. M. (2013). Efficient discovery of overlapping communities in massive networks. Proceedings of the National Academy of Sciences, 110(36), 14534–14539.
 Guns, T., Nijssen, S., & De Raedt, L. (2011). Itemset mining: A constraint programming perspective. Artificial Intelligence, 175(12–13), 1951–1983.
 Guns, T., Dries, A., Tack, G., Nijssen, S., & De Raedt, L. (2013a). MiningZinc: A modeling language for constraint-based mining. In: International Joint Conference on Artificial Intelligence, Beijing, China.
 Guns, T., Nijssen, S., & De Raedt, L. (2013b). k-pattern set mining under constraints. IEEE Transactions on Knowledge and Data Engineering, 25(2), 402–418.
 Guns, T., Nijssen, S., & De Raedt, L. (2013c). k-pattern set mining under constraints. IEEE Transactions on Knowledge and Data Engineering, 25(2), 402–418.
 Heath, I.J. (1971). Unacceptable file operations in a relational data base. In: Proceedings of the 1971 ACM SIGFIDET (Now SIGMOD) Workshop on Data Description, Access and Control, ACM, New York, NY, USA, SIGFIDET ’71, pp. 19–33.
 Hochbaum, D. S., & Pathria, A. (1998). Analysis of the greedy approach in problems of maximum k-coverage. Naval Research Logistics, 45, 615–627.
 Järvisalo, M. (2011). Itemset mining as a challenge application for answer set enumeration. In: Logic Programming and Non-Monotonic Reasoning, pp. 304–310.
 Jones, T.H., Song, I.Y., & Park, E.K. (1996). Ternary relationship decomposition and higher normal form structures derived from entity relationship conceptual modeling. In: Proceedings of the 1996 ACM 24th Annual Conference on Computer Science, ACM, New York, NY, USA, CSC ’96, pp. 96–104.
 Kemp, C., Tenenbaum, J.B., Griffiths, T.L., Yamada, T., & Ueda, N. (2006). Learning systems of concepts with an infinite relational model. In: Proceedings of the 21st National Conference on Artificial Intelligence, AAAI Press, pp. 381–388.
 Kim, M., & Candan, K.S. (2011). Approximate tensor decomposition within a tensor-relational algebraic framework. In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management, ACM, New York, NY, USA, CIKM ’11, pp. 1737–1742.
 Knobbe, A.J., & Ho, E.K.Y. (2006). Pattern teams. In: Fürnkranz J, Scheffer T, Spiliopoulou M (eds) Principles and practice of knowledge discovery in databases, Springer, Lecture Notes in Computer Science, vol 4213, pp. 577–584.
 Koehler, H. (2007). Domination normal form: Decomposing relational database schemas. In: Proceedings of the Thirtieth Australasian Conference on Computer Science - Volume 62, Australian Computer Society, Inc., Darlinghurst, Australia, ACSC ’07, pp. 79–85.
 Kok, S., & Domingos, P. (2007). Statistical predicate invention. In: Proceedings of The 24th International Conference on Machine Learning, pp. 433–440.
 Leone, N., Pfeifer, G., Faber, W., Eiter, T., Gottlob, G., Perri, S., et al. (2002). The dlv system for knowledge representation and reasoning. ACM Transactions on Computational Logic, 7, 499–562.
 Li, T. (2005). A general model for clustering binary data. ACM SIGKDD (pp. 188–197). New York, NY, USA: ACM.
 Lifschitz, V. (2008). What is answer set programming? Association for the Advancement of Artificial Intelligence, 8, 1594–1597.
 Liu, B., Hsu, W., & Ma, Y. (1998). Integrating classification and association rule mining. In: ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 80–86.
 Lu, H., Vaidya, J., & Atluri, V. (2008). Optimal boolean matrix decomposition: Application to role engineering. In: IEEE 24th ICDE, pp. 297–306.
 Métivier, J.P., Boizumault, P., Crémilleux, B., Khiari, M., & Loudni, S. (2012). A constraint language for declarative pattern discovery. In: Ossowski, S., Lecca, P. (eds) Proceedings of the ACM Symposium on Applied Computing, pp. 119–125.
 Miettinen, P. (2009). Matrix decomposition methods for data mining: computational complexity and algorithms. Department of Computer Science, series of publications A, report A-2009-4, University of Helsinki 2009 (Ph.D. thesis, monograph).
 Miettinen, P. (2012). Dynamic boolean matrix factorizations. In: Zaki, M.J., Siebes, A., Yu, J.X., Goethals, B., Webb, G.I., Wu, X. (eds). Proceedings of International Conference on Data Mining, IEEE Computer Society, pp. 519–528.
 Miettinen, P., Mielikäinen, T., Gionis, A., Das, G., & Mannila, H. (2008). The discrete basis problem. IEEE Transactions on Knowledge and Data Engineering, 20(10), 1348–1362.
 Miyata, Y., Furuhashi, T., & Uchikawa, Y. (1995). A study on fuzzy abductive inference. In: Proceedings of 1995 IEEE International Conference on Fuzzy Systems, Citeseer, vol. 1, pp. 337–342.
 Muggleton, S., & De Raedt, L. (1994). Inductive logic programming: Theory and methods. The Journal of Logic Programming, 19(20), 629–679.
 Muggleton, S. H., Lin, D., & Tamaddoni-Nezhad, A. (2015). Meta-interpretive learning of higher-order dyadic datalog: Predicate invention revisited. Machine Learning, 100(1), 49–73.
 Osherson, D., Stern, J., Wilkie, O., Stob, M., & Smith, E. (1991). Default probability. Cognitive Science, 15(2), 251–269.
 Paatero, P., & Tapper, U. (1994). Positive matrix factorization: A nonnegative factor model with optimal utilization of error estimates of data values. Environmetrics, 5(2), 111–126.
 Paramonov, S., van Leeuwen, M., Denecker, M., & De Raedt, L. (2015). An exercise in declarative modeling for relational query mining. In: International Conference on Inductive Logic Programming, ILP, Kyoto, 20–22 August 2015, Springer.
 Singh, A.P., & Gordon, G.J. (2008). Relational learning via collective matrix factorization. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, pp. 650–658.
 Van den Broeck, G., & Darwiche, A. (2013). On the complexity and approximation of binary evidence in lifted inference. In: The Neural Information Processing Systems, pp. 2868–2876.
 Vojtás, P. (1999). Fuzzy logic abduction. In: Proceedings of the EUSFLAT-ESTYLF Joint Conference, Palma de Mallorca, Spain, September 22–25, 1999, pp. 319–322.