# Mapping-equivalence and oid-equivalence of single-function object-creating conjunctive queries

## Abstract

Conjunctive database queries have been extended with a mechanism for object creation to capture important applications such as data exchange, data integration, and ontology-based data access. Object creation generates new object identifiers in the result that do not belong to the set of constants in the source database. The new object identifiers can also be seen as Skolem terms. Hence, object-creating conjunctive queries can also be regarded as restricted second-order tuple-generating dependencies (SO-tgds), considered in the data exchange literature. In this paper, we focus on the class of single-function object-creating conjunctive queries, or sifo CQs for short. The single-function symbol can be used only once in the head of the query. We give a new characterization for oid-equivalence of sifo CQs that is simpler than the one given by Hull and Yoshikawa and places the problem in the complexity class NP. Our characterization is based on Cohen’s equivalence notions for conjunctive queries with multiplicities. We also solve the logical entailment problem for sifo CQs, showing that this problem also belongs to NP. Results by Pichler et al. have shown that logical equivalence for more general classes of SO-tgds is either undecidable or decidable with as yet unknown complexity upper bounds.

### Keywords

Conjunctive query Object creation Oid Equivalence Logical entailment SO-tgd Sifo CQ Nested tgd Schema mapping

## 1 Introduction

Conjunctive queries (CQs) form a natural class of database queries, which can be defined by combinations of selection, renaming, natural join, and projection. Much of the research on database query processing is focused on CQs; moreover, these queries are amenable to advanced optimizations because containment of CQs is decidable (though NP-complete). In this paper, we are interested in CQs extended with a facility for object creation.

Object creation, also called oid generation or value invention, has been repeatedly proposed and investigated as a feature of query languages. This has happened in several contexts: high expressiveness [4, 5, 11]; object orientation [3, 10, 22, 24, 29]; data integration [21]; semi-structured data and XML [1]; and data exchange [8, 16, 18]. In a logic-based approach, object creation is typically achieved through the use of Skolem functions [22, 24, 29].

Consider the following query *Q*, which uses a single function symbol *f*:

\[ Q:\ Family(c,f(x,y)) \leftarrow Mother(c,x),\ Father(c,y) \]

The query creates a new oid *f*(*x*, *y*) for every pair (*x*, *y*) of a woman *x* and a man *y* who have at least one child together; all children *c* of *x* and *y* are linked to the new oid in the result of the query (a relation called \( Family \)). As an example, if \( Mother(beth,anne) \) and \( Father(beth,adam) \) are two facts in the underlying database, then the result of the query includes the fact \( Family(beth,f(anne,adam)) \), where \( f(anne,adam) \) is the newly created oid. This oid will be shared by all the children having \( anne \) and \( adam \) as parents.

In this paper, we first revisit the problem of checking oid-equivalence of sifo CQs. Oid-equivalence has its origins in the theory of object-creating queries introduced by Abiteboul and Kanellakis [3]; it is the natural generalization of query equivalence in the presence of object creation.

As an example, consider the following query \(Q^{\prime }\) alongside the query *Q* above:

\[ Q^{\prime }:\ Family(c,g(x,y,x)) \leftarrow Mother(c,x),\ Father(c,y) \]

The query \(Q^{\prime }\) links all children *c* of the parents *x* and *y* to the oid *g*(*x*, *y*, *x*) that depends exactly on *x* and *y*. That is, two children in the result of *Q* are connected to the same oid if and only if they are connected to the same oid in \(Q^{\prime }\), although the oids will be syntactically different. Therefore, we can conclude that *Q* and \(Q^{\prime }\) are oid-equivalent, which means that their results are identical on any input up to a simple isomorphism mapping the oids in one result to those in the other.

Hull and Yoshikawa [23] studied oid-equivalence (they called it ‘obscured equivalence’) for non-recursive ILOG programs; the decidability of this problem is a long-standing open question. Nevertheless, for the case of ‘isolated oid creation,’ to which sifo CQs belong, they have given a decidable characterization.

We give a new result relating oid-equivalence to equivalence of classical conjunctive queries under ‘combined’ bag–set semantics [14], which models the evaluation of CQs when query results and relations may contain duplicates of tuples. As a corollary, we obtain that oid-equivalence for sifo CQs belongs to NP, which does not follow from the Hull–Yoshikawa test. Obviously, then, oid-equivalence for sifo CQs is NP-complete, since equivalence of classical CQs without object creation is already NP-complete.

Object creation is receiving renewed interest in the context of schema mappings [8, 18], which are formalisms describing how data structured under a source schema are to be transformed into data structured under a target schema. Hence, it is instructive to view sifo CQs as schema mappings, simply by interpreting them as implicational statements. As an example, we may view query *Q* above as an implicational statement that relates a query over relations \( Mother \) and \( Father \) in the source schema to the relation \( Family \) in the target schema.

For standard CQs without object creation, two queries are equivalent if and only if they are logically equivalent as schema mappings [17]. For sifo CQs, we show that oid-equivalence implies logical equivalence, while the converse is not true.

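Consider again the query *Q* above: It can be rewritten into the following SO-tgd:

\[ \exists f\ \forall c\, \forall x\, \forall y\ \bigl( Mother(c,x) \wedge Father(c,y) \rightarrow Family(c,f(x,y)) \bigr) \]

where the function symbol *f* is existentially quantified.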

Although logical equivalence of SO-tgds is undecidable [19], logical implication of nested dependencies has recently been shown to be decidable [26]. We give a novel and elegant characterization of logical implication for sifo CQs which is simpler than the general implication test for nested dependencies. It turns out that the problem belongs to NP. Hence, logical implication for sifo CQs has no worse complexity than containment for standard CQs without object creation.

1. We clarify the relationship between sifo CQs and other formalisms in the literature, notably the language ILOG [22], second-order tuple-generating dependencies [18], and nested tuple-generating dependencies [8].
2. We relate the problem of oid-equivalence for sifo CQs to the equivalence of classical conjunctive queries under combined bag–set semantics, which implies its NP-completeness.
3. We show that when sifo CQs are interpreted as schema mappings, oid-equivalence implies logical equivalence but not vice versa.
4. We provide a new characterization of logical implication for sifo CQs as object-creating queries, showing that this problem has the same complexity as deciding containment for classical CQs.

## 2 Applications of sifo CQs

In this section, we discuss further applications of sifo CQs, which may constitute important components of many advanced database systems, spanning from information integration, and schema mapping engines along with their benchmarks, to several semantic Web tools. We believe this shows that the results in this article on equivalence and logical implication of sifo CQs are relevant and contribute to our understanding of how solutions for these applications can be optimized.

Global-as-view (*GAV*) schema mappings [20, 27, 33] relate a query over the source schema, represented by a body *B* of a CQ, to an atomic element of the global schema, represented by a head atom *H* of a CQ. More precisely, a GAV mapping can be written as follows:

\[ T(\bar{x}) \leftarrow B \]

with *T* as the atomic head predicate.

GAV schema mappings were already used in the 1990s in mediator systems such as TSIMMIS [30, 33] and Information Manifold [28] for the integration of heterogeneous data sources. In both systems, source facts are related to facts over the global schema by means of queries.

Sifo CQs can naturally be seen as extensions of GAV mappings, when one of the attributes of the global schema carries newly created identifiers.

For instance, the sifo CQ *Q* from Sect. 1 can express a mapping from a source schema containing two relations \( Mother \) and \( Father \) to one relation \( Family \) of a global schema, with created identifiers for families appearing in the tuples in the result of the mapping. Thus, we can also interpret *Q* as an extended GAV schema mapping.

Another important application of sifo CQs is schema mapping benchmarks allowing the users to compare and evaluate schema mapping systems. In particular, the flexibility of the arguments of the Skolem functions used for object creation has been advocated as one of the desirable features in recent benchmarks for schema mapping and information integration, such as STBenchmark [6] and iBench [9].

More precisely, in the mapping primitives of iBench [9], an extension of STBenchmark [6] that supports SO-tgds, the users can choose between two different skolemization strategies to fill the arguments of the Skolem functions: *fixed*, where the arguments of the function are pre-defined in a native mapping primitive, or *variable*, where one can further choose among the options *All, Key*, and *Random*, which generate mappings where all variables, the variables in the positions of the primary key, or a random set of variables, respectively, are used as arguments of the function.
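The three *variable* options can be pictured with a small Python sketch. It is only an illustration of the description above, not iBench's actual interface: the function names `skolem_arguments` and `sifo_cq`, the choice of head variables, and the decision to draw a non-empty random subset are all assumptions made for this example.

```python
import random

def skolem_arguments(body_vars, key_vars, strategy, rng=None):
    """Pick the arguments of the Skolem term for one generated mapping.

    body_vars: variables of the body atom, in attribute order.
    key_vars:  variables that sit in primary-key positions.
    strategy:  "All", "Key", or "Random" (option names taken from the text).
    """
    if strategy == "All":
        return list(body_vars)
    if strategy == "Key":
        return [v for v in body_vars if v in key_vars]
    if strategy == "Random":
        rng = rng or random.Random()
        # Assumption: the random set of variables is non-empty.
        size = rng.randint(1, len(body_vars))
        chosen = set(rng.sample(list(body_vars), size))
        return [v for v in body_vars if v in chosen]
    raise ValueError(f"unknown strategy: {strategy!r}")

def sifo_cq(head_pred, head_vars, body_pred, body_vars, skolem_args, symbol="f"):
    """Render a mapping as text, with the Skolem term in the last head position."""
    head_args = list(head_vars) + [f"{symbol}({', '.join(skolem_args)})"]
    return f"{head_pred}({', '.join(head_args)}) <- {body_pred}({', '.join(body_vars)})"

body = ["x", "y", "z"]   # hypothetical body atom B(x, y, z) with key attribute x
for option in ("All", "Key", "Random"):
    args = skolem_arguments(body, {"x"}, option, random.Random(0))
    print(option, "->", sifo_cq("T", ["x", "y"], "B", body, args))
```

Whatever the option, the generated mapping has a single Skolem term in its head, which is what places it within the sifo CQ format.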

These skolemization strategies can be captured by sifo CQs as follows.

*B* (option *All*). If the attribute in the position of *x* is a primary key for *B*, then the application of the option *Key* generates a mapping that can be expressed by the sifo CQ

*Random* may lead iBench to randomly select the attributes in the positions of *x* and *z* and then to generate the mapping represented by

*ADD* (copy a relation and ADD new attributes), *ADL* (copy a relation, Add and DeLete attributes in tandem), and *MA* (merge and add new attributes) contain single Skolem functions. They correspond to the following sifo CQs, respectively:

^{1} Indeed, newly created identifiers in the head of a sifo CQ can serve as generated keys, or simply as newly invented values needed to fill an attribute of a relation in the global schema. As such, sifo CQs can be seen as examples of mapping assertions from source schemas to a global ontology in ontology-based data access (OBDA) [31]. Typically, OBDA mapping assertions relate facts in relational source schemas to RDF triples in a global ontology. The newly generated IRIs^{2} in the RDF triples can be interpreted as skolemized values in the global ontology.

For each relation *r* in a database schema, Datalog-like rules can be used to generate an IRI for the relation *r* and an IRI for each attribute *a* in *r*. We take an example of a translation from a relational schema into OWL, and we show that these Datalog-like rules can in fact be viewed as sifo CQs, since they employ a single concatenation function to obtain such IRIs (exemplified as *f*). The corresponding sifo CQs are reported below:

Here, the relation names *r* and attribute names *a* come from the data dictionary of an underlying relational database, and *b* is a string representing a given *IRI base* (e.g., the string ‘http://example.edu/db’) for the same database to be translated. Thus, the first query creates a new IRI for the relation *r*, by concatenating *b* with the relation symbol *r*, while the second query returns the set of IRIs of the attributes *a* of *r*, by concatenating *b* with the relation symbol *r* and its attribute symbols *a*.

## 3 Preliminaries

In this section, we introduce our formalism for dealing with conjunctive queries and introduce the notion of object-creating CQ, adapted from the language ILOG [22].

### 3.1 Databases and conjunctive queries

From the outset, we assume a supply of *relation names*, where each relation name *R* has an associated arity \(\mathrm {ar}(R)\). We also assume an infinite domain \(\mathbf {dom}\) of atomic data elements called *constants*. A *fact* is of the form \(R(a_1,\ldots ,a_k)\) where \(a_1,\ldots , a_k\) are constants and *R* is a *k*-ary relation name. We call *R* the *predicate* of the fact.

A *database schema* \(\mathbf{S }\) is a finite set of relation names. An *instance* of \(\mathbf{S }\) is a finite set of facts with predicates from \(\mathbf{S }\). The set of all constants appearing in an instance *I* is called the *active domain* of *I* and denoted by \(\mathrm {adom}(I)\).

We further assume an infinite supply of *variables*, disjoint from \(\mathbf {dom}\). An *atom* is of the form \(R(x_1,\ldots ,x_k)\) where \(x_1,\ldots ,x_k\) are variables and *R* is a *k*-ary relation name. As with facts, we call *R* the predicate of the atom.

A *conjunctive query* (CQ) over a database schema \(\mathbf{S }\) is an expression of the form \(H \leftarrow B\), where *B* is a finite set of atoms with predicates from \(\mathbf{S }\) and *H* is an atom with a predicate not in \(\mathbf{S }\). The set *B* is called the *body*, and *H* is called the *head*. It is required that every variable occurring in the head also occurs in the body. We denote the set of variables occurring in a set of atoms *B* (or a single atom *A*) by \(\mathrm {var}(B)\) (or \(\mathrm {var}(A)\)).

The semantics of CQs is defined in terms of valuations. A *valuation* is a mapping \(\alpha : X \rightarrow \mathbf {dom}\) on some finite set of variables *X*. When *A* is an atom with \(\mathrm {var}(A) \subseteq X\), we can apply \(\alpha \) to *A* simply by applying \(\alpha \) to every variable in *A*. This results in a fact and is denoted by \(\alpha (A)\). When *B* is a set of atoms and \(\alpha \) is a valuation on \(\mathrm {var}(B)\), we can apply \(\alpha \) to *B* by applying \(\alpha \) to every atom in *B*. Formally, \(\alpha (B)\) is defined as the instance \(\{\alpha (A) \mid A \in B\}\).

When *I* is an instance and \(\alpha \) is a valuation on \(\mathrm {var}(B)\) such that \(\alpha (B) \subseteq I\), we say that \(\alpha \) is a *matching* of *B* in *I*, and denote this by \(\alpha : B \rightarrow I\). Now when *Q* is a CQ \(H \leftarrow B\) and *I* is an instance, the result of *Q* on *I* is defined as

\[ Q(I) := \{\alpha (H) \mid \alpha : B \rightarrow I\}. \]
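The semantics via matchings can be made concrete with a short Python sketch. It is a naive backtracking evaluator written for illustration only; the encoding of atoms and facts as `(predicate, argument tuple)` pairs, the head predicate `Parents`, and the constant `carl` are assumptions of this example.

```python
def matchings(body, instance):
    """Yield every valuation alpha (a dict) with alpha(body) contained in instance."""
    def extend(atoms, alpha):
        if not atoms:
            yield dict(alpha)
            return
        (pred, args), rest = atoms[0], atoms[1:]
        for fact_pred, fact_args in instance:
            if fact_pred != pred or len(fact_args) != len(args):
                continue
            new = dict(alpha)
            # Bind each variable, rejecting the fact on any conflicting binding.
            if all(new.setdefault(v, c) == c for v, c in zip(args, fact_args)):
                yield from extend(rest, new)
    yield from extend(list(body), {})

def evaluate(head, body, instance):
    """Return Q(I): the set of facts alpha(H) over all matchings alpha of B in I."""
    pred, args = head
    return {(pred, tuple(alpha[v] for v in args))
            for alpha in matchings(body, instance)}

# Hypothetical CQ: Parents(c, x, y) <- Mother(c, x), Father(c, y)
head = ("Parents", ("c", "x", "y"))
body = [("Mother", ("c", "x")), ("Father", ("c", "y"))]
instance = {
    ("Mother", ("beth", "anne")),
    ("Father", ("beth", "adam")),
    ("Mother", ("carl", "anne")),   # no Father fact for carl, so no matching
}
result = evaluate(head, body, instance)
print(result)
```

Only the valuation sending *c*, *x*, *y* to beth, anne, adam is a matching here, so the result contains a single fact.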

### 3.2 Object-creating conjunctive queries

Assume a finite vocabulary of *function symbols* of various arities. As with relation names, the arity of a function symbol *f* is denoted by \(\mathrm {ar}(f)\).

*Data terms* are syntactical expressions built up from constants using function symbols. Formally, data terms are inductively defined as follows:

1. Every constant is a data term;
2. If *f* is a *k*-ary function symbol and \(d_1,\ldots ,d_k\) are data terms, then the expression \(f(d_1,\ldots ,d_k)\) is also a data term.^{3}

An *extended fact* is defined just like a fact, except that it may contain data terms rather than only constants. Formally, an extended fact is of the form \(R(d_1,\ldots ,d_k)\), where \(d_1,\ldots ,d_k\) are data terms and *R* is a *k*-ary relation name. The active domain of an extended fact \(e = R(d_1,\ldots ,d_k)\) is defined as

\[ \mathrm {adom}(e) := \{d_1,\ldots ,d_k\}. \]

An *extended instance* is a finite set of extended facts. The active domain of an extended instance *J* is defined as

\[ \mathrm {adom}(J) := \bigcup \{\mathrm {adom}(e) \mid e \in J\}. \]

*Formula terms* are defined in the same way as data terms, but are built up from variables rather than constants. *Extended atoms* are defined like atoms, but can contain formula terms in addition to variables. If *t* is a formula term and \(\alpha \) is a valuation defined on all variables occurring in *t*, we can apply \(\alpha \) to every variable occurrence in *t*, obtaining a data term \(\alpha (t)\). Likewise, we can apply a valuation to an extended atom, resulting in an extended fact.

We are now ready to define *object-creating conjunctive queries* (oCQ). Like a classical CQ, an oCQ is of the form \(H \leftarrow B\). The only difference with a classical CQ is that *H* can be an extended atom; in particular, *B* is still a finite set of ‘flat’ atoms, not extended atoms. It is still required that \(\mathrm {var}(H) \subseteq \mathrm {var}(B)\). The result of an oCQ \(Q = H \leftarrow B\) on an instance *I* is now an extended instance, defined as

\[ Q(I) := \{\alpha (H) \mid \alpha : B \rightarrow I\}. \]

Instances used in Example 1

*Example 1* Consider the query *Q* from Sect. 1:

\[ Family(c,f(x,y)) \leftarrow Mother(c,x),\ Father(c,y) \]

If *I* is the instance consisting of the Mother and Father facts listed in Table 1, then *Q*(*I*) is the extended instance consisting of the extended Family facts listed in the same table.
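As a rough illustration of how an oCQ produces an extended instance, the Python sketch below evaluates the Family query from Sect. 1 on a small instance. The instance is not the one of Table 1: it reuses the two facts quoted in the introduction and adds a second child `bob`, which is an assumption of this example, as is the encoding of a data term as a pair `(function symbol, argument tuple)`.

```python
def matchings(body, instance):
    """Yield every valuation alpha (a dict) with alpha(body) contained in instance."""
    def extend(atoms, alpha):
        if not atoms:
            yield dict(alpha)
            return
        (pred, args), rest = atoms[0], atoms[1:]
        for fact_pred, fact_args in instance:
            if fact_pred != pred or len(fact_args) != len(args):
                continue
            new = dict(alpha)
            if all(new.setdefault(v, c) == c for v, c in zip(args, fact_args)):
                yield from extend(rest, new)
    yield from extend(list(body), {})

def apply_term(term, alpha):
    """Apply a valuation to a variable (a string) or to a formula term (symbol, args)."""
    if isinstance(term, str):
        return alpha[term]
    symbol, args = term
    return (symbol, tuple(apply_term(a, alpha) for a in args))

def evaluate_ocq(head, body, instance):
    """Return the extended instance Q(I) of an oCQ with a possibly extended head atom."""
    pred, args = head
    return {(pred, tuple(apply_term(t, alpha) for t in args))
            for alpha in matchings(body, instance)}

# Q: Family(c, f(x, y)) <- Mother(c, x), Father(c, y)
head = ("Family", ("c", ("f", ("x", "y"))))
body = [("Mother", ("c", "x")), ("Father", ("c", "y"))]
instance = {
    ("Mother", ("beth", "anne")), ("Father", ("beth", "adam")),
    ("Mother", ("bob", "anne")), ("Father", ("bob", "adam")),   # illustrative sibling
}
family = evaluate_ocq(head, body, instance)
for fact in sorted(family, key=repr):
    print(fact)
```

Both children end up linked to the same data term `("f", ("anne", "adam"))`, which plays the role of the created oid.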

*Example 2* Consider the following query *Q*:

If *I* is the instance consisting of the *R*-facts listed in Table 2, then *Q*(*I*) consists of the extended *T*-facts listed in the same table.

Instances used in Example 2

### 3.3 The single-function case

In this paper, we focus on *single-function* oCQs (sifo CQs) that have exactly one occurrence of a function symbol in the head. Without loss of generality, we always place the function term in the last position of the head.

**Definition 1**

A *sifo CQ* is an oCQ of the form

\[ Q:\ T(\bar{x},f(\bar{z})) \leftarrow B, \]

where *T* is the head predicate, *f* is a function symbol, *B* is the body, \({\bar{x}}\) is a tuple of (not necessarily distinct) variables from \(\mathrm {var}(B)\), called the *distinguished variables*, and \(\bar{z}\) is a tuple of (not necessarily distinct) variables from \(\mathrm {var}(B)\), called the *creation variables*. Some creation variables may be distinguished; the elements of \(\mathrm {var}(B)\) that are not distinguished are called the *non-distinguished variables*.

### 3.4 Comparison with ILOG

Object-creating CQs can be considered to be the conjunctive query fragment of non-recursive ILOG [22]; our syntax exposes the Skolem functions, which are normally obscured in the standard ILOG syntax, and our semantics corresponds to what is called the ‘exposed semantics’ by Hull and Yoshikawa. Nevertheless, in the following section, we will consider oid-equivalence of sifo CQs, which does correspond to what has been called ‘obscured equivalence’ [23].

## 4 Characterization of oid-equivalence for sifo CQs

### 4.1 Oid-equivalence of oCQs

The result *Q*(*I*) of an oCQ *Q* applied to an instance *I* is an extended instance. The data terms in \(\mathrm {adom}(Q(I))\) that are not constants play the role of created oids (also called invented values). Intuitively, it is clear that the actual form of the created oids does not matter.

Instance used in Example 4

*Example 4* Recall the query *Q* from Example 1:

\[ Family(c,f(x,y)) \leftarrow Mother(c,x),\ Father(c,y) \]

We formalize this intuition in the following definitions.

**Definition 2**

Let *J* be an extended instance.

- The set \(\mathrm {adom}(J) - \mathbf {dom}\) is denoted by \({\mathrm {oids}(J)}\);
- The set \(\mathrm {adom}(J) \cap \mathbf {dom}\) is denoted by \({\mathrm {consts}(J)}\).

**Definition 3**

Let *J* be an extended instance and let \(\rho \) be a mapping from \(\mathrm {adom}(J)\) to the set of data terms. For any extended fact \(e = R(d_1,\ldots ,d_k)\) in *J*, we define \(\rho (e)\) to be the extended fact \(R(\rho (d_1),\ldots ,\rho (d_k))\). We then define \(\rho (J) := \{\rho (e) \mid e \in J\}\).

**Definition 4**

Two extended instances \(J_1\) and \(J_2\) are called *oid-isomorphic* if there exists a bijection \(\rho : \mathrm {adom}(J_1) \rightarrow \mathrm {adom}(J_2)\) such that

- \(\rho \) is the identity on \({\mathrm {consts}(J_1)}\);
- \(\rho \) maps \({\mathrm {oids}(J_1)}\) to \({\mathrm {oids}(J_2)}\);
- \(\rho (J_1)=J_2\).

Such a mapping \(\rho \) is called an *oid-isomorphism* from \(J_1\) to \(J_2\).

The above definition implies that oid-isomorphic instances have the same constants. Formally, if \(J_1\) and \(J_2\) are oid-isomorphic, then \({\mathrm {consts}(J_1)} = {\mathrm {consts}(J_2)}\).
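The definition of oid-isomorphism can be checked by brute force on small extended instances. The Python sketch below is only meant to make the three conditions tangible and is exponential in the number of oids. It assumes constants are strings and oids are `(function symbol, argument tuple)` pairs; the constant `bob` and the function symbol `h` are made up for the example.

```python
from itertools import permutations

def adom(J):
    """Active domain of an extended instance: all data terms in argument positions."""
    return {d for _, args in J for d in args}

def is_oid(d):
    """In this encoding, constants are strings and created oids are tuples."""
    return isinstance(d, tuple)

def oid_isomorphic(J1, J2):
    """Search for a bijection rho that fixes constants, maps oids to oids, and has rho(J1) = J2."""
    a1, a2 = adom(J1), adom(J2)
    consts1 = {d for d in a1 if not is_oid(d)}
    consts2 = {d for d in a2 if not is_oid(d)}
    oids1 = [d for d in a1 if is_oid(d)]
    oids2 = [d for d in a2 if is_oid(d)]
    if consts1 != consts2 or len(oids1) != len(oids2):
        return False
    for image in permutations(oids2):
        rho = dict(zip(oids1, image))          # identity on constants is implicit
        mapped = {(p, tuple(rho.get(d, d) for d in args)) for p, args in J1}
        if mapped == J2:
            return True
    return False

f_oid = ("f", ("anne", "adam"))
g_oid = ("g", ("anne", "adam", "anne"))
J1 = {("Family", ("beth", f_oid)), ("Family", ("bob", f_oid))}
J2 = {("Family", ("beth", g_oid)), ("Family", ("bob", g_oid))}
J3 = {("Family", ("beth", ("h", ("beth",)))), ("Family", ("bob", ("h", ("bob",))))}

print(oid_isomorphic(J1, J2))   # same sharing pattern, different oid syntax
print(oid_isomorphic(J1, J3))   # J3 gives every child its own oid
```

The first pair differs only in the syntactic form of the oid, whereas the second pair differs in which facts share an oid, and only the former is preserved by a bijection on oids.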

**Definition 5**

Let *Q* and \(Q^{\prime }\) be two oCQs with the same head predicate and over the same database schema \(\mathbf{S }\). Then *Q* and \(Q^{\prime }\) are called *oid-equivalent* if for every instance *I* over \(\mathbf{S }\), the results *Q*(*I*) and \(Q^{\prime }(I)\) are oid-isomorphic.

*Example 5* For the instance *I* of Table 1, the oid-isomorphism from *Q*(*I*) to \(Q^{\prime }(I)\) is as follows:

Instances used in Example 6

*Example 6* Recall the query *Q* from Example 2:

*Q*:

*Q* and \(Q^{\prime }\) are not oid-equivalent, as witnessed by the simple instances in Table 4. Indeed, there cannot be an oid-isomorphism from *Q*(*I*) to \(Q^{\prime }(I)\) because *Q*(*I*) contains only one distinct oid while \(Q^{\prime }(I)\) contains two distinct oids.

Instances used in Example 7

### 4.2 Homomorphisms and containment of conjunctive queries

The characterizations we will give for oid-equivalence of sifo CQs depend on the classical notions of homomorphism and containment between conjunctive queries. Let us briefly recall these notions now [2, 13].

A *variable mapping* is a mapping *h* from a finite set *X* of variables to another finite set *Y* of variables. If *A* is an atom with variables in *X*, then we can apply *h* to each variable occurrence in *A* to obtain an atom with variables in *Y*, which we denote by *h*(*A*). If *B* is a set of atoms with \(\mathrm {var}(B) \subseteq X\), then we naturally define \(h(B):=\{h(A) \mid A \in B\}\).

For two sets *B* and \(B^{\prime }\) of atoms, a variable mapping \(h : \mathrm {var}(B) \rightarrow \mathrm {var}(B^{\prime })\) is called a *homomorphism from* *B* to \(B^{\prime }\) if \(h(B) \subseteq B^{\prime }\). This is denoted by \(h : B \rightarrow B^{\prime }\). The notion of homomorphism is extended to conjunctive queries \(Q = H \leftarrow B\) and \(Q^{\prime } = H^{\prime } \leftarrow B^{\prime }\) as follows. A homomorphism from *Q* to \(Q^{\prime }\) is a homomorphism \(h : B \rightarrow B^{\prime }\) such that \(h(H)=H^{\prime }\). This is denoted by \(h : Q \rightarrow Q^{\prime }\).

A classical result relates homomorphisms between conjunctive queries to containment. Let *Q* and \(Q^{\prime }\) be two conjunctive queries over a common database schema \(\mathbf{S }\). We say that \(Q^{\prime }\) *is contained in* *Q* if for every instance *I* of \(\mathbf{S }\), we have \(Q^{\prime }(I) \subseteq Q(I)\). The classical result states that \(Q^{\prime }\) is contained in *Q* if and only if there exists a homomorphism \(h : Q \rightarrow Q^{\prime }\).

Two queries *Q* and \(Q^{\prime }\) are *equivalent* if for every instance *I* of \(\mathbf{S }\), we have \(Q(I)=Q^{\prime }(I)\). Since equivalence amounts to containment in both directions, two conjunctive queries are equivalent if and only if there exist homomorphisms between them in both directions.
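The homomorphism criterion lends itself to a direct, if naive, implementation: freeze the body of the target query into an instance whose constants are its variables, enumerate matchings of the source body into it, and keep those that map head to head. The Python sketch below follows this idea. The queries over a binary relation `R` with head predicate `Ans` are hypothetical, and the search is exponential in general, in line with the NP-completeness of containment.

```python
def matchings(body, instance):
    """Yield every valuation alpha (a dict) with alpha(body) contained in instance."""
    def extend(atoms, alpha):
        if not atoms:
            yield dict(alpha)
            return
        (pred, args), rest = atoms[0], atoms[1:]
        for fact_pred, fact_args in instance:
            if fact_pred != pred or len(fact_args) != len(args):
                continue
            new = dict(alpha)
            if all(new.setdefault(v, c) == c for v, c in zip(args, fact_args)):
                yield from extend(rest, new)
    yield from extend(list(body), {})

def homomorphisms(source, target):
    """Yield the homomorphisms h from query `source` to query `target`."""
    (s_pred, s_args), s_body = source
    (t_pred, t_args), t_body = target
    if s_pred != t_pred or len(s_args) != len(t_args):
        return
    frozen = {(p, tuple(a)) for p, a in t_body}   # target variables read as constants
    for h in matchings(s_body, frozen):
        if tuple(h[v] for v in s_args) == tuple(t_args):
            yield h

def contained_in(q_prime, q):
    """Q' is contained in Q iff there is a homomorphism from Q to Q'."""
    return next(homomorphisms(q, q_prime), None) is not None

def equivalent(q1, q2):
    return contained_in(q1, q2) and contained_in(q2, q1)

# Hypothetical queries: Q = Ans(x) <- R(x, y) and Q' = Ans(x) <- R(x, y), R(y, y)
q = (("Ans", ("x",)), [("R", ("x", "y"))])
q_prime = (("Ans", ("x",)), [("R", ("x", "y")), ("R", ("y", "y"))])

print(contained_in(q_prime, q))   # the identity is a homomorphism from Q to Q'
print(contained_in(q, q_prime))   # R(y, y) has no image in the body of Q
print(equivalent(q, q_prime))
```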

### 4.3 A normal form for oid-equivalence problems

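Consider two sifo CQs *Q*, \(Q^{\prime }\) with the same head predicate:

\[ Q = T(\bar{x},f(\bar{z})) \leftarrow B, \qquad Q^{\prime } = T(\bar{x}^{\prime },f^{\prime }(\bar{z}^{\prime })) \leftarrow B^{\prime }. \]

In this section, we show that the oid-equivalence problem for *Q* and \(Q^{\prime }\) can be reduced to the case where the heads use identical tuples of distinguished and creation variables, that is, where \(\bar{x}=\bar{x}^{\prime }\) and \(\bar{z}=\bar{z}^{\prime }\).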

As a first lemma, we state that rearranging the creation variables of a query does not affect oid-equivalence.

**Lemma 1**

(Rearranging creation variables) Let *Q* be a sifo CQ written as above. Let \(\bar{u}\) be a tuple with exactly the same variables as \(\bar{z}\), but possibly with different repetitions and a different ordering, and let *g* be a function symbol whose arity is equal to the length of \(\bar{u}\). Then the sifo CQ \(P = T(\bar{x},g(\bar{u})) \leftarrow B\) is oid-equivalent to *Q*.

*Proof*

Let *I* be an instance. We define an oid-isomorphism from *Q*(*I*) to *P*(*I*) as follows. Any oid *o* in *Q*(*I*) is of the form \(f(\alpha (\bar{z}))\) for some matching \(\alpha :B\rightarrow I\); we define \(\rho (o) := g(\alpha (\bar{u}))\). This is well defined, i.e., independent of the choice of \(\alpha \). Indeed, if the data terms \(f(\alpha _1(\bar{z}))\) and \(f(\alpha _2(\bar{z}))\) are equal, then the tuples \(\alpha _1(\bar{z})\) and \(\alpha _2(\bar{z})\) are equal, which implies that \(\alpha _1\) and \(\alpha _2\) agree on every variable appearing in \(\bar{z}\). Since exactly the same variables appear in \(\bar{u}\), also the tuples \(\alpha _1(\bar{u})\) and \(\alpha _2(\bar{u})\) are equal, whence \(g(\alpha _1(\bar{u}))= g(\alpha _2(\bar{u}))\).

That \(\rho : {\mathrm {oids}(Q(I))} \rightarrow {\mathrm {oids}(P(I))}\) is injective is shown by an analogous argument. The surjectivity of \(\rho \), as well as the equality \(\rho (Q(I)) = P(I)\), is clear. \(\square \)

By the above lemma, we can remove all duplicates from \(\bar{z}\) and \(\bar{z}^{\prime }\) in the heads of *Q* and \(Q^{\prime }\), respectively. So, from now on we may assume \(\bar{z}\) and \(\bar{z}^{\prime }\) have no duplicates.

In the following, let *Z* equal the set of variables occurring in \(\bar{z}\), let *X* equal the set of variables occurring in \({\bar{x}}\), and let \(Z^{\prime }\) and \(X^{\prime }\) be defined similarly.

We next show that two sifo CQs can only be oid-equivalent if they have identical patterns of distinguished variables, up to renaming.

**Lemma 2**

(Renaming distinguished variables) If *Q* and \(Q^{\prime }\) are oid-equivalent, then there exists a bijective variable mapping \(\sigma : X \rightarrow X^{\prime }\) such that \(\sigma ({\bar{x}})=\bar{x}^{\prime }\).

*Proof*

Certainly, if *Q* and \(Q^{\prime }\) are oid-equivalent, then the conjunctive queries \(Q_0 = T_0({\bar{x}}) \leftarrow B\) and \(Q_0^{\prime } = T_0({\bar{x}^{\prime }})\leftarrow B^{\prime }\), where \(T_0\) is a new predicate symbol, are equivalent. So, there are homomorphisms \(h:Q_0 \rightarrow Q_0^{\prime }\) and \(h^{\prime }:Q_0^{\prime } \rightarrow Q_0\). In particular, \(h({\bar{x}})=\bar{x}^{\prime }\) and \(h^{\prime }({\bar{x}^{\prime }})=\bar{x}\). We define \(\sigma \) to be the restriction of *h* to *X*. The claim \(\sigma ({\bar{x}}) = \bar{x}^{\prime }\) and the surjectivity of \(\sigma \) are then clear. So it remains to show that \(\sigma \) is injective. Thereto, consider \(h^{\prime }(\sigma ({\bar{x}})) = h^{\prime }(h({\bar{x}})) = h^{\prime }({\bar{x}^{\prime }}) = \bar{x}\). We see that \(h^{\prime }\circ \sigma \) is the identity on *X* and thus injective. Hence, \(\sigma \) must be injective as well. \(\square \)

By the above lemma, if there does *not* exist a renaming \(\sigma \) as in the lemma, certainly *Q* and \(Q^{\prime }\) are not oid-equivalent. If there exists such a renaming, then by renaming the variables in one of the two queries, we can now assume without loss of generality that \({\bar{x}}= {\bar{x}^{\prime }}\) and in particular that \(X=X^{\prime }\).

The next step is to show that oid-equivalent queries must have the same distinguished variables among the creation variables, that is, \(X\cap Z=X \cap Z^{\prime }\).

**Lemma 3**

(Distinguished creation variables) If \(X\cap Z \ne X \cap Z^{\prime }\), then *Q* and \(Q^{\prime }\) are not oid-equivalent.

*Proof*

Either there exists some \(x \in X\cap Z\) that is not in \(Z^{\prime }\), or vice versa. By symmetry, we may assume the first possibility.

We construct an instance *I* from \(B^{\prime }\). In doing this, to keep our notation simple, we consider the variables in \(B^{\prime }\) to be constants. The instance *I* is obtained from \(B^{\prime }\) by duplicating *x* to some new element \(x_2\). Formally, consider the mapping *d* on \(\mathrm {var}(B^{\prime })\) that is the identity everywhere except that *x* is mapped to \(x_2\); then \(I = B^{\prime } \cup d(B^{\prime })\).

First, let us look at \(Q^{\prime }(I)\). Using the identity matching that maps every variable to itself, we obtain the extended fact \(T(\bar{x},f^{\prime }(\bar{z}^{\prime })) \in Q^{\prime }(I)\). Using the matching *d* defined above, we obtain the extended fact \(T(\bar{x}_2,f^{\prime }(d(\bar{z}^{\prime })))\) in \(Q^{\prime }(I)\). Here, \(\bar{x}_2\) denotes \(d({\bar{x}})\), i.e., \({\bar{x}}_2\) is obtained from \(\bar{x}\) by replacing *x* with \(x_2\). Since *x* does not belong to \(Z^{\prime }\), we have \(d(\bar{z}^{\prime }) = \bar{z}^{\prime }\), so \(T(\bar{x}_2,f^{\prime }(\bar{z}^{\prime })) \in Q^{\prime }(I)\).

On the other hand, in *Q*(*I*) consider any two extended facts \(T(\alpha _1(\bar{x}),f(\alpha _1(\bar{z})))\) and \(T(\alpha _2(\bar{x}),f(\alpha _2(\bar{z})))\), with matchings \(\alpha _1:B \rightarrow I\) and \(\alpha _2:B \rightarrow I\), such that \(\alpha _1(\bar{x})=\bar{x}\) and \(\alpha _2(\bar{x}) = \bar{x}_2\). Then in particular \(\alpha _1(x)=x\) and \(\alpha _2(x)=x_2\). Since \(\alpha _1\) and \(\alpha _2\) differ on *x* and *x* is in *Z*, also \(\alpha _1(\bar{z})\) and \(\alpha _2(\bar{z})\) are different. Hence, the two last components \(f(\alpha _1(\bar{z}))\) and \(f(\alpha _2(\bar{z}))\) are different. Thus, we see that in *Q*(*I*) it is impossible to have two extended facts \(T(\bar{x},o)\) and \(T(\bar{x}_2,o)\) with the same oid *o*. But we have seen this is possible in \(Q^{\prime }(I)\), so *Q*(*I*) and \(Q^{\prime }(I)\) are not oid-isomorphic and *Q* and \(Q^{\prime }\) cannot be oid-equivalent. \(\square \)
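The counterexample instance used in this proof can be reproduced on a toy pair of queries. In the Python sketch below, the queries \(T(x,f(x,y)) \leftarrow R(x,y)\) and \(T(x,f^{\prime }(y)) \leftarrow R(x,y)\) are hypothetical examples chosen so that *x* is a distinguished creation variable of the first query only; the helper names are likewise assumptions of this illustration.

```python
def matchings(body, instance):
    """Yield every valuation alpha (a dict) with alpha(body) contained in instance."""
    def extend(atoms, alpha):
        if not atoms:
            yield dict(alpha)
            return
        (pred, args), rest = atoms[0], atoms[1:]
        for fact_pred, fact_args in instance:
            if fact_pred != pred or len(fact_args) != len(args):
                continue
            new = dict(alpha)
            if all(new.setdefault(v, c) == c for v, c in zip(args, fact_args)):
                yield from extend(rest, new)
    yield from extend(list(body), {})

def evaluate_sifo(x_vars, symbol, z_vars, body, instance):
    """Result of T(x_vars, symbol(z_vars)) <- body, as a set of (x-tuple, oid) pairs."""
    return {(tuple(a[v] for v in x_vars), (symbol, tuple(a[v] for v in z_vars)))
            for a in matchings(body, instance)}

def duplicated_instance(body_prime, x, x2):
    """I = B' union d(B'): variables are read as constants and d renames x to x2."""
    original = {(p, tuple(args)) for p, args in body_prime}
    copy = {(p, tuple(x2 if v == x else v for v in args)) for p, args in body_prime}
    return original | copy

body = [("R", ("x", "y"))]            # shared body B = B' of the two toy queries
I = duplicated_instance(body, "x", "x2")

result_q = evaluate_sifo(("x",), "f", ("x", "y"), body, I)    # x is in X and in Z
result_qp = evaluate_sifo(("x",), "f'", ("y",), body, I)      # x is not in Z'

oids_q = {oid for _, oid in result_q}
oids_qp = {oid for _, oid in result_qp}
print(len(oids_q), len(oids_qp))
```

On this instance the second query links both *x* and its copy to one oid, while the first query necessarily creates two, so no bijection on oids can map one result onto the other.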

By the above lemma, we now assume \(X\cap Z=X\cap Z^{\prime }\). The last step is to show that \(Z -X\) and \(Z^{\prime } -X\), the sets of non-distinguished creation variables, need to have the same cardinality.

**Lemma 4**

(Non-distinguished creation variables) If \(Z -X\) and \(Z^{\prime } -X\) have different cardinality, then *Q* and \(Q^{\prime }\) are not oid-equivalent.

*Proof*

As in the proof of Lemma 3, we consider *B* as an instance, viewing variables as constants.

Let *k* and \(k^{\prime }\) be the cardinalities of \(Z -X\) and \(Z^{\prime } -X\), respectively. By symmetry, we may assume that \(k > k^{\prime }\). Now, for any natural number *n*, let \(I_n\) be the instance obtained from *B* by independently multiplying each variable \(z \in Z -X\) into *n* fresh copies \(z^{(1)}, \ldots , z^{(n)}\). Formally, for any function \(d:Z -X \rightarrow \{1,\ldots ,n\}\), let \(\hat{d}\) be the valuation on \(\mathrm {var}(B)\) that maps each \(z \in Z-X\) to \(z^{(d(z))}\) and that is the identity on all other variables. Then

\[ I_n := \bigcup \{\hat{d}(B) \mid d : Z -X \rightarrow \{1,\ldots ,n\}\}. \]

Each of the \(n^k\) functions *d* yields a matching \(\hat{d}\) of *B* in \(I_n\); all these matchings are the identity on \({\bar{x}}\) but are pairwise different on \({\bar{z}}\). Thus, there are at least \(n^k\) different extended facts in \(Q(I_n)\) of the form \(T(\bar{x},o)\).

On the other hand, consider any set *S* of valuations from \(X \cup Z^{\prime }\) to \(\mathrm {adom}(I_n)\) that are pairwise different on \(Z^{\prime } -X\) but that all agree on *X*. The cardinality of \(Z^{\prime } -X\) is \(k^{\prime }\). The cardinality of \(\mathrm {adom}(I_n)\) is *O*(*n*) (although the cardinality of \(I_n\) itself is larger). Hence, such a set *S* can be of cardinality at most \(O(n^{k^{\prime }})\). Consequently, since \(k>k^{\prime }\), for *n* large enough, \(Q^{\prime }(I_n)\) cannot possibly contain \(n^k\) different extended facts of the form \(T(\bar{x}, o)\). But we saw that this is possible in \(Q(I_n)\). So, \(Q(I_n)\) and \(Q^{\prime }(I_n)\) are not oid-isomorphic and *Q* and \(Q^{\prime }\) cannot be oid-equivalent. \(\square \)
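The counting argument can be observed on a small example. The Python sketch below builds \(I_n\) for a hypothetical body \(R(x,y), S(x,z)\) and compares a query with creation variables *y*, *z* (so \(k=2\)) against one with the single creation variable *y* (so \(k^{\prime }=1\)). Since a created oid is determined by the values of the creation variables, the sketch counts distinct creation tuples instead of building data terms; all names are assumptions made for this illustration.

```python
from itertools import product

def matchings(body, instance):
    """Yield every valuation alpha (a dict) with alpha(body) contained in instance."""
    def extend(atoms, alpha):
        if not atoms:
            yield dict(alpha)
            return
        (pred, args), rest = atoms[0], atoms[1:]
        for fact_pred, fact_args in instance:
            if fact_pred != pred or len(fact_args) != len(args):
                continue
            new = dict(alpha)
            if all(new.setdefault(v, c) == c for v, c in zip(args, fact_args)):
                yield from extend(rest, new)
    yield from extend(list(body), {})

def multiplied_instance(body, to_copy, n):
    """I_n: the union of d_hat(B) over all functions d from to_copy into {1, ..., n}."""
    to_copy = sorted(to_copy)
    instance = set()
    for choice in product(range(1, n + 1), repeat=len(to_copy)):
        d_hat = {z: f"{z}^({i})" for z, i in zip(to_copy, choice)}
        for pred, args in body:
            instance.add((pred, tuple(d_hat.get(v, v) for v in args)))
    return instance

def count_oids(x_vars, z_vars, body, instance, x_values):
    """Number of distinct oids occurring together with the binding x_values."""
    return len({tuple(a[v] for v in z_vars)
                for a in matchings(body, instance)
                if tuple(a[v] for v in x_vars) == tuple(x_values)})

body = [("R", ("x", "y")), ("S", ("x", "z"))]   # hypothetical body, X = {x}
n = 3
I_n = multiplied_instance(body, {"y", "z"}, n)

k2 = count_oids(("x",), ("y", "z"), body, I_n, ("x",))   # creation variables y, z
k1 = count_oids(("x",), ("y",), body, I_n, ("x",))       # creation variable y only
print(len(I_n), k2, k1)
```

With \(n = 3\) the first query yields \(3^2\) oids for the binding of *x* and the second only 3, and the gap widens as *n* grows.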

By the above lemma and after renaming the variables in \(Z^{\prime } -X\) and reordering the variables in \(\bar{z}^{\prime }\), we may now indeed assume that \(\bar{z}\) and \(\bar{z}^{\prime }\) are identical.

### 4.4 Characterization of oid-equivalence

By the normal form of Sect. 4.3, we may assume that *Q* and \(Q^{\prime }\) have identical tuples \(\bar{x}\) and \(\bar{z}\) of distinguished and creation variables; moreover, \(\bar{z}\) contains no variable more than once. As before, we denote the sets of distinguished and creation variables as *X* and *Z*, respectively.

We will show that *Q* and \(Q^{\prime }\) are oid-equivalent if and only if there are homomorphisms between *B* and \(B^{\prime }\) in both directions that (i) keep \(\bar{x}\) fixed and (ii) possibly permute the variables in \(\bar{z}\). To make this formal, we associate with each query a classical CQ without function symbols.

**Definition 6**

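Fix a new relation symbol \(\mathring{T}\) of arity the sum of the lengths of \(\bar{x}\) and \(\bar{z}\). The *flattening* of *Q* is the query \(\mathring{Q} = \mathring{T}(\bar{x},\bar{z}) \leftarrow B\). The query \(\mathring{Q}^{\prime }\) is defined similarly. For a permutation \(\pi \) of \(Z-X\), extended to *Z* by the identity on \(X \cap Z\), we write \(\mathring{Q}^\pi \) for the query \(\mathring{T}(\bar{x},\pi (\bar{z})) \leftarrow B\).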

**Proposition 1**

If there exists a permutation \(\pi \) of \(Z-X\) such that \(\mathring{Q}^\pi \) and \(\mathring{Q}^{\prime }\) are equivalent, then *Q* and \(Q^{\prime }\) are oid-equivalent.

*Proof*

Let *I* be an instance. We define an oid-isomorphism \(\rho \) from *Q*(*I*) to \(Q^{\prime }(I)\) as follows. Any oid *o* in *Q*(*I*) is of the form \(f(\alpha (\bar{z}))\) for some matching \(\alpha :B \rightarrow I\); we define \(\rho (o) := f^{\prime }(\alpha (\pi (\bar{z})))\). This is well defined, i.e., independent of the choice of \(\alpha \). Indeed, if the data terms \(f(\alpha _1(\bar{z}))\) and \(f(\alpha _2(\bar{z}))\) are equal, then the tuples \(\alpha _1(\bar{z})\) and \(\alpha _2(\bar{z})\) are equal, and consequently, the permuted tuples \(\alpha _1(\pi (\bar{z}))\) and \(\alpha _2(\pi (\bar{z}))\) are equal. Hence, \(f^{\prime }(\alpha _1(\pi (\bar{z})))=f^{\prime }(\alpha _2(\pi (\bar{z})))\).

The injectivity of \(\rho : {\mathrm {oids}(Q(I))} \rightarrow {\mathrm {oids}(Q^{\prime }(I))}\) is shown by an analogous argument. The surjectivity of \(\rho \) and the equality \(\rho (Q(I)) = Q^{\prime }(I)\) follow readily from the equality \(\mathring{Q}^\pi (I)=\mathring{Q}^{\prime }(I)\). \(\square \)

We next prove that the sufficient condition given by the above Proposition is actually also necessary for oid-equivalence. The key idea for proving this is to show that oid-equivalence of sifo CQs depends only on the *number* of oids generated for any binding of the distinguished variables.

Formally, for any instance *I* and any tuple \(\bar{c}\) of elements from \(\mathrm {adom}(I)\), we define

$$\begin{aligned} \#_{\bar{c}}(Q,I) := \#\left\{ o \mid T(\bar{c},o) \in Q(I)\right\} , \end{aligned}$$

that is, the number of distinct oids *o* that occur together with \(\bar{c}\) in *Q*(*I*). We will show that *Q* and \(Q^{\prime }\) are oid-equivalent if and only if \(\#_{\bar{c}}(Q,I) = \#_{\bar{c}}(Q^{\prime },I)\) for all instances *I* and tuples \(\bar{c}\). The only-if direction of this statement is obvious, but the if-direction requires more work.
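As a small illustration of the counts \(\#_{\bar{c}}(Q,I)\), the following Python sketch enumerates matchings by brute force and counts the distinct creation-tuple values per binding of the distinguished variables. The encoding (atoms as `(predicate, args)` pairs), the helper names `matchings` and `oid_counts`, and the example relation `R` are assumptions made for illustration only; they are not taken from the paper.

```python
from collections import defaultdict

def matchings(body, instance):
    """Yield every assignment of body variables to constants that maps each atom to a fact."""
    def extend(atoms, env):
        if not atoms:
            yield dict(env)
            return
        (pred, args), rest = atoms[0], atoms[1:]
        for fpred, fargs in instance:
            if fpred != pred or len(fargs) != len(args):
                continue
            new = dict(env)
            # setdefault binds a fresh variable or returns the existing binding
            if all(new.setdefault(a, c) == c for a, c in zip(args, fargs)):
                yield from extend(rest, new)
    yield from extend(list(body), {})

def oid_counts(x_bar, z_bar, body, instance):
    """Map each binding c of x_bar to the number of distinct oids f(alpha(z_bar)) paired with it."""
    oids = defaultdict(set)
    for m in matchings(body, instance):
        # an oid f(alpha(z_bar)) is identified by the tuple alpha(z_bar)
        oids[tuple(m[x] for x in x_bar)].add(tuple(m[z] for z in z_bar))
    return {c: len(s) for c, s in oids.items()}

# Hypothetical queries over a binary relation R:
#   Q  = T(x, f(x))     <- R(x, y)
#   Q' = T(x, f'(x, y)) <- R(x, y)
body = [("R", ("x", "y"))]
I = {("R", ("a", 1)), ("R", ("a", 2)), ("R", ("b", 1))}
counts_q = oid_counts(("x",), ("x",), body, I)
counts_q2 = oid_counts(("x",), ("x", "y"), body, I)
```

On this instance the two hypothetical queries disagree on the binding `a`, so by the obvious only-if direction they cannot be oid-equivalent.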

For our proof, we rely on work by Cohen [14] who studied queries with multiset variables that are evaluated under so-called combined semantics, a semantics that combines set and multiset semantics. Cohen characterized equivalence of such queries in terms of homomorphisms.

Queries with multiset variables (MV queries) have the form \(Q_0,M\) where \(Q_0\) is a standard CQ and *M* is some set of variables of \(Q_0\) that do not appear in the head of \(Q_0\). The elements of *M* are called the multiset variables. Evaluating an MV query \(Q_0,M\) on an instance *I* results in a multiset (bag) of facts, where the number of times a fact occurs is related to the number of different possible assignments of values to the multiset variables.

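Formally, let \(Q_0\) be of the form \(H_0 \leftarrow B_0\) and let *I* be an input instance. Recall that \(Q_0(I)\) according to the classical semantics equals

$$\begin{aligned} \left\{ \gamma (H_0) \mid \gamma : B_0 \rightarrow I\right\} . \end{aligned}$$

Let *W* be the set of variables appearing in \(H_0\). Then the result of evaluating the MV query \(Q_0,M\) on instance *I* is defined to be the multiset with ground set \(Q_0(I)\), where for each fact \(e \in Q_0(I)\), the multiplicity of *e* in the multiset is defined to be

$$\begin{aligned} \#\left\{ \gamma |_M \mid \gamma : B_0 \rightarrow I \text { and } \gamma (H_0) = e\right\} . \end{aligned}$$

So we do not count the *total* number of different such matchings \(\gamma \), but rather the number of different *restrictions* one obtains when restricting these matchings \(\gamma \) to *M*.^{4}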

Two MV queries are *equivalent* if they evaluate to the same multiset on every input instance. Equivalence of MV queries can be characterized using the notion of multiset homomorphism [14]. A *multiset homomorphism* from MV query \(Q_0,M\) to MV query \(Q^{\prime }_0,M^{\prime }\) is a homomorphism \(h : Q_0 \rightarrow Q_0^{\prime }\) such that *h* is injective on *M* and \(h(M) \subseteq M^{\prime }\). Cohen showed the following:

**Theorem 1**

([14, Thm 5.3]) Two MV queries are equivalent if and only if there are multiset homomorphisms between them in both directions.
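The criterion of Theorem 1 can be sketched as a brute-force search. The Python code below is only an illustration under assumed conventions: an MV query is encoded as a triple `(head_vars, body, multiset_vars)`, both queries are assumed to share the head predicate and arity, and the function names are hypothetical. It enumerates homomorphisms that map head to head and keeps those that are injective on the multiset variables and map them into the multiset variables of the target.

```python
def homs(src, dst, seed):
    """Yield every variable mapping extending `seed` that sends each atom of `src` to an atom of `dst`."""
    def extend(atoms, env):
        if not atoms:
            yield env
            return
        (pred, args), rest = atoms[0], atoms[1:]
        for dpred, dargs in dst:
            if dpred == pred and len(dargs) == len(args):
                new = dict(env)
                if all(new.setdefault(a, b) == b for a, b in zip(args, dargs)):
                    yield from extend(rest, new)
    yield from extend(list(src), dict(seed))

def multiset_hom_exists(q, q2):
    """Is there a multiset homomorphism from MV query q to MV query q2?"""
    head, body, mvars = q
    head2, body2, mvars2 = q2
    seed = {}
    for a, b in zip(head, head2):          # the head must be mapped to the head
        if seed.setdefault(a, b) != b:
            return False
    for h in homs(body, body2, seed):
        image = [h[u] for u in mvars]
        injective = len(set(image)) == len(image)
        if injective and set(image) <= set(mvars2):
            return True
    return False

def mv_equivalent(q, q2):
    """Equivalence test in the style of Theorem 1: multiset homomorphisms in both directions."""
    return multiset_hom_exists(q, q2) and multiset_hom_exists(q2, q)

# Hypothetical MV queries with head T0(x):
q_a = (("x",), [("R", ("x", "y"))], ["y"])
q_b = (("x",), [("R", ("x", "y")), ("R", ("x", "y2"))], ["y", "y2"])
q_c = (("x",), [("R", ("x", "y")), ("R", ("x", "y2"))], ["y"])
```

In this example `q_a` and `q_b` are not equivalent (the latter counts pairs of values), whereas `q_a` and `q_c` are, because `y2` is an ordinary set variable in `q_c`.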

To leverage this result on MV equivalence, we associate two MV queries with our given sifo CQs in the following way.

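**Definition 7**

Fix a new relation symbol \(T_0\) of arity the length of \(\bar{x}\), and let \(Q_0\) be the classical CQ \(T_0(\bar{x}) \leftarrow B\). The MV query \(\tilde{Q}\) is \(Q_0,Z-X\); that is, the creation variables that are not distinguished serve as the multiset variables. The MV query \(\tilde{Q}^{\prime }\) is defined similarly from \(Q_0^{\prime } = T_0(\bar{x}) \leftarrow B^{\prime }\).

The following proposition now relates oid-equivalence to MV equivalence: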

**Proposition 2**

If *Q* and \(Q^{\prime }\) are oid-equivalent, then the MV queries \(\tilde{Q}\) and \(\tilde{Q}^{\prime }\) are equivalent.

*Proof*

Let *I* be an instance. We must show that the multisets \(\tilde{Q}(I)\) and \(\tilde{Q}^{\prime }(I)\) are equal. Since *Q* and \(Q^{\prime }\) are oid-equivalent, the ground sets \(Q_0(I)\) and \(Q_0^{\prime }(I)\) of \(\tilde{Q}(I)\) and \(\tilde{Q}^{\prime }(I)\) are already equal. We must show that the element multiplicities are the same as well.

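Consider any fact \(T_0(\bar{c})\) in this common ground set. An oid \(f(\alpha (\bar{z}))\) is determined by the restriction of \(\alpha \) to *Z*, and the values of \(\alpha \) on *X* are fixed by \(\alpha (\bar{x})=\bar{c}\). Hence

$$\begin{aligned} \#_{\bar{c}}(Q,I)&= \#\left\{ \alpha |_{Z-X} \mid \alpha : B \rightarrow I \text { and } \alpha (\bar{x}) = \bar{c}\right\} ,\\ \#_{\bar{c}}(Q^{\prime },I)&= \#\left\{ \alpha ^{\prime }|_{Z-X} \mid \alpha ^{\prime } : B^{\prime } \rightarrow I \text { and } \alpha ^{\prime }(\bar{x}) = \bar{c}\right\} . \end{aligned}$$

Since *Q*(*I*) and \(Q^{\prime }(I)\) are oid-isomorphic, the left-hand sides of the above two equalities are equal. Hence, the right-hand sides are equal as well. But these are precisely the multiplicities of \(T_0(\bar{c})\) in \(\tilde{Q}(I)\) and \(\tilde{Q}^{\prime }(I)\), respectively. \(\square \)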

The following proposition further relates MV equivalence to equivalence of the flattenings up to permutation:

**Proposition 3**

If the MV queries \(\tilde{Q}\) and \(\tilde{Q}^{\prime }\) are equivalent, then there exists a permutation \(\pi \) of \(Z-X\) such that \(\mathring{Q}^\pi \) and \(\mathring{Q}^{\prime }\) are equivalent.

*Proof*

By Theorem 1, there exist a multiset homomorphism *h* from \(\tilde{Q}\) to \(\tilde{Q}^{\prime }\) and a multiset homomorphism \(h^{\prime }\) from \(\tilde{Q}^{\prime }\) to \(\tilde{Q}\). Since Theorem 1 also implies that *h* is injective on \(Z-X\) and that \(h(Z-X)\subseteq Z-X\), we can conclude that *h* acts as a permutation on \(Z-X\). Moreover, *h* is the identity on *X*. The same two properties hold for \(h^{\prime }\).

Now put \(\pi = (h|_{Z-X})^{-1}\). Then \(h : \mathring{Q}^\pi \rightarrow \mathring{Q}^{\prime }\). So it remains to find a homomorphism \(h^{\prime \prime } : \mathring{Q}^{\prime } \rightarrow \mathring{Q}^\pi \). Thereto, note that \(h^{\prime }h\) acts as a permutation on \(Z-X\). Since \(Z-X\) is finite, there exists a nonzero natural number *m* such that \((h^{\prime }h)^m\) is the identity on \(Z-X\). Equivalently, \((h^{\prime }h)^{m-1}h^{\prime }\) equals \(\pi \) on \(Z-X\). We conclude that \((h^{\prime }h)^{m-1}h^{\prime }\) is the desired homomorphism \(h^{\prime \prime }\). \(\square \)
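The last step of the proof above uses a standard fact about permutations of a finite set: some power of \(h^{\prime }h\) is the identity, and the preceding power composed with \(h^{\prime }\) inverts *h*. The following Python sketch illustrates this on the restrictions to \(Z-X\) only; the dictionary encoding and the function names are assumptions made for illustration.

```python
def compose(g, f):
    """Return the composition g after f (apply f first, then g)."""
    return {u: g[f[u]] for u in f}

def inverse_via_powers(h, h2):
    """Given permutations h and h2 of the same finite set, return (m, g) where
    m >= 1 is least with (h2 h)^m = identity and g = (h2 h)^(m-1) after h2."""
    hh = compose(h2, h)
    ident = {u: u for u in h}
    prev, power, m = ident, dict(hh), 1    # invariant: power = hh^m, prev = hh^(m-1)
    while power != ident:
        prev, power, m = power, compose(hh, power), m + 1
    return m, compose(prev, h2)

# Hypothetical action of h and h' on Z - X = {a, b, c}
h = {"a": "b", "b": "c", "c": "a"}
h2 = {"a": "a", "b": "c", "c": "b"}
m, g = inverse_via_powers(h, h2)
```

Here `g` plays the role of \((h^{\prime }h)^{m-1}h^{\prime }\) restricted to \(Z-X\), and composing it after `h` yields the identity.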

We summarize the three preceding Propositions in the following.

**Theorem 2**

Let *Q* and \(Q^{\prime }\) be sifo CQs of the forms (3) and (4), where *Q* and \(Q^{\prime }\) have identical tuples \(\bar{x}\) and \(\bar{z}\) of distinguished and creation variables and where \(\bar{z}\) contains no variable more than once. Denote the sets of distinguished and creation variables by *X* and *Z*, respectively. Then the following are equivalent:

1. The sifo CQs *Q* and \(Q^{\prime }\) are oid-equivalent;
2. The MV queries \(\tilde{Q}\) and \(\tilde{Q}^{\prime }\) are equivalent;
3. There is a permutation \(\pi \) of \(Z-X\) such that the classical CQs \(\mathring{Q}^\pi \) and \(\mathring{Q}^{\prime }\) are equivalent.
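The third statement of the theorem suggests a direct, if exponential, test. The Python sketch below tries every permutation \(\pi \) of \(Z-X\) and searches for homomorphisms in both directions between the body of *Q* and the body of \(Q^{\prime }\), with the head of \(\mathring{Q}^\pi \) taken to be \(\mathring{T}(\bar{x},\pi (\bar{z}))\). It assumes that both queries already share \(\bar{x}\) and \(\bar{z}\) and that \(\bar{z}\) has no repeated variable; the encoding and the function names are hypothetical.

```python
from itertools import permutations

def find_hom(src, dst, seed):
    """Return a mapping extending `seed` that sends every atom of `src` to an atom of `dst`, or None."""
    def extend(atoms, env):
        if not atoms:
            return env
        (pred, args), rest = atoms[0], atoms[1:]
        for dpred, dargs in dst:
            if dpred != pred or len(dargs) != len(args):
                continue
            new = dict(env)
            if all(new.setdefault(a, b) == b for a, b in zip(args, dargs)):
                found = extend(rest, new)
                if found is not None:
                    return found
        return None
    return extend(list(src), dict(seed))

def oid_equivalent(x_bar, z_bar, body, body2):
    """Brute-force check of statement 3: some permutation of Z - X makes the flattenings equivalent."""
    free = [z for z in z_bar if z not in x_bar]            # Z - X
    for image in permutations(free):
        pi = dict(zip(free, image))
        pz = [pi.get(z, z) for z in z_bar]                 # pi(z_bar), identity on X ∩ Z
        # head of the permuted flattening of Q must map onto the head of the flattening of Q'
        fwd = dict(zip(list(x_bar) + pz, list(x_bar) + list(z_bar)))
        # and conversely
        bwd = dict(zip(list(x_bar) + list(z_bar), list(x_bar) + pz))
        if find_hom(body, body2, fwd) is not None and find_hom(body2, body, bwd) is not None:
            return True
    return False

# Hypothetical example: T(x, f(z1, z2)) <- R(x, z1, z2)  vs.  T(x, f'(z1, z2)) <- R(x, z2, z1)
swapped = oid_equivalent(("x",), ("z1", "z2"),
                         [("R", ("x", "z1", "z2"))], [("R", ("x", "z2", "z1"))])
# Hypothetical non-example: T(x, f(z)) <- R(x, z)  vs.  T(x, f'(z)) <- R(z, x)
flipped = oid_equivalent(("x",), ("z",), [("R", ("x", "z"))], [("R", ("z", "x"))])
```

The first pair is accepted only through the permutation that swaps `z1` and `z2`; the identity permutation alone would reject it.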

### 4.5 Computational complexity

The results of this section imply the following:

**Corollary 1**

Testing oid-equivalence of sifo CQs is NP-complete.

*Proof*

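Consider two arbitrary sifo CQs *Q* and \(Q^{\prime }\) with the same head predicate:

$$\begin{aligned} Q&= T(\bar{x},f(\bar{z})) \leftarrow B\\ Q^{\prime }&= T(\bar{x}^{\prime },f^{\prime }(\bar{z}^{\prime })) \leftarrow B^{\prime } \end{aligned}$$

Let *X*, \(X^{\prime }\), *Z*, and \(Z^{\prime }\) denote the sets of variables occurring in \(\bar{x}\), \(\bar{x}^{\prime }\), \(\bar{z}\) and \(\bar{z}^{\prime }\), respectively.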

To test oid-equivalence, we begin by removing duplicates in \(\bar{z}\) and \(\bar{z}^{\prime }\), as justified by Lemma 1. Note that \(\bar{x}\) and \(\bar{x}^{\prime }\) have the same length *k*, because of the fixed arity of *T*. So we can write \(\bar{x}=x_1,\ldots ,x_k\) and \(\bar{x}^{\prime }=x^{\prime }_1,\ldots ,x^{\prime }_k\). Consider the mapping \(\sigma = \{(x_1,x^{\prime }_1),\ldots , (x_k,x^{\prime }_k)\}\). We test whether \(\sigma \) is a bijection from *X* to \(X^{\prime }\); if not, then *Q* and \(Q^{\prime }\) are not oid-equivalent by Lemma 2. If \(\sigma \) is a bijection, we can safely replace every variable \(x^{\prime }\) in \(X^{\prime }\) by \(\sigma ^{-1}(x^{\prime })\), which yields a sifo CQ that is oid-equivalent to \(Q^{\prime }\). Hence, from now on we may assume that \(\bar{x}=\bar{x}^{\prime }\) and in particular \(X=X^{\prime }\).

Next, we test whether \(X \cap Z = X \cap Z^{\prime }\) and whether \(Z-X\) and \(Z^{\prime }-X\) have the same cardinality; if one of the two tests fails then *Q* and \(Q^{\prime }\) are not oid-equivalent by Lemmas 3 and 4. Otherwise, we can rename the variables in \(Z^{\prime }-X\), so that we may assume that \(\bar{z}=\bar{z}^{\prime }\).

We are now left in the situation where *Q* and \(Q^{\prime }\) are in the general forms (3) and (4) from Sect. 4.4, to which Theorem 2 applies. By the third statement of this theorem, we can test oid-equivalence of *Q* and \(Q^{\prime }\) in NP by guessing a permutation \(\pi \) and two homomorphisms between \(\mathring{Q}^\pi \) and \(\mathring{Q}^{\prime }\) in both directions.

NP-hardness follows immediately because the problem has equivalence of classical CQs as a special case, which is well known to be NP-hard. Indeed, oid-equivalence of sifo CQs *Q* and \(Q^{\prime }\) in the special case where the creation functions are nullary amounts to classical equivalence when we ignore the function terms in the heads.
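The head-alignment step in the proof above (testing whether \(\sigma \) is a bijection from *X* to \(X^{\prime }\)) is easy to make concrete. The following Python sketch is an illustration only; the function name `align_heads` and the tuple encoding of \(\bar{x}\) and \(\bar{x}^{\prime }\) are assumptions.

```python
def align_heads(x_bar, x_bar2):
    """Return sigma^{-1} as a dict if sigma = {(x_i, x'_i)} is a bijection from X to X', else None."""
    sigma = {}
    for a, b in zip(x_bar, x_bar2):
        if sigma.setdefault(a, b) != b:       # sigma would not be a function
            return None
    if len(set(sigma.values())) != len(sigma):  # sigma would not be injective
        return None
    return {b: a for a, b in sigma.items()}

renaming = align_heads(("x", "y", "x"), ("u", "v", "u"))
```

When the result is not `None`, it can be used to rename the distinguished variables of \(Q^{\prime }\) so that both heads use the same tuple \(\bar{x}\).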

## 5 Logical entailment of sifo CQs interpreted as schema mappings

Consider a sifo CQ *Q* of the general form \(T(\bar{x},f(\bar{z})) \leftarrow B\) over the database schema \(\mathbf{S }\). Let \(\bar{v}\) be the sequence of all variables used in *B*. Then we may view *Q* as a second-order implicational statement over the augmented schema \(\mathbf{S } \cup \{T\}\), as follows:

$$\begin{aligned} \exists f \, \forall \bar{v} \, (B \rightarrow H) \end{aligned}$$

Here *H* is the head and *B* is conveniently used to stand for the conjunction of its elements. Note that this formula is second order because it existentially quantifies a function *f*; we denote the above formula by \(\mathrm {sotgd}(Q)\). This formula belongs to the well-known class of second-order tuple-generating dependencies (SO-tgds). More specifically, it is a *plain* SO-tgd [7].

Sifo CQs are a restricted form of plain SO-tgds:

- A plain SO-tgd may consist of multiple rules; a sifo CQ consists of a single rule.
- The head of a plain SO-tgd may consist of multiple atoms; the head of a sifo CQ consists of a single atom (this is similar to GAV mappings [12, 27], although the classical notion of GAV mapping does not use function symbols).
- There is only one function symbol, which moreover can be applied only once in the head.

When we view *Q* as an SO-tgd, the semantics becomes that of a schema mapping. Specifically, let *I* be an instance over \(\mathbf{S }\), considered as a source instance, and let *J* be an instance over \(\{T\}\), considered as a target instance. Then (*I*, *J*) together form an instance over the augmented schema \(\mathbf{S } \cup \{T\}\). Now we say that (*I*, *J*) satisfies *Q*, denoted by \((I,J) \models Q\), if the structure \((\mathrm {adom}(I) \cup \mathrm {adom}(J),I,J)\) satisfies \(\mathrm {sotgd}(Q)\) under the standard semantics of second-order logic, using \(\mathrm {adom}(I) \cup \mathrm {adom}(J)\) as the universe of the structure.
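Satisfaction under this semantics can be checked without searching over functions: a suitable *f* exists exactly when, for every value of \(\bar{z}\), a single target element works for all matchings that share that value. The Python sketch below illustrates this idea under stated assumptions: the oid is in the last position of the head, the encoding of atoms and facts is hypothetical, and the instances `I`, `J1`, `J2`, `J3` are made up for the example (they are not the instances from the paper's tables).

```python
def matchings(body, instance):
    """Yield every assignment of body variables to constants that maps each atom to a fact."""
    def extend(atoms, env):
        if not atoms:
            yield dict(env)
            return
        (pred, args), rest = atoms[0], atoms[1:]
        for fpred, fargs in instance:
            if fpred != pred or len(fargs) != len(args):
                continue
            new = dict(env)
            if all(new.setdefault(a, c) == c for a, c in zip(args, fargs)):
                yield from extend(rest, new)
    yield from extend(list(body), {})

def satisfies(x_bar, z_bar, body, I, J, target="T"):
    """Check (I, J) |= Q for Q = target(x_bar, f(z_bar)) <- body."""
    candidates = {}   # value of z_bar -> set of elements still usable as f(value)
    for m in matchings(body, I):
        key = tuple(m[z] for z in z_bar)
        xs = tuple(m[x] for x in x_bar)
        ws = {args[-1] for pred, args in J if pred == target and args[:-1] == xs}
        candidates[key] = candidates.get(key, ws) & ws
    # an empty candidate set means no value for f can serve all matchings with that key
    return all(candidates.values())

# Hypothetical query T(c, f(m)) <- P(c, m): children of the same parent share one oid
body = [("P", ("c", "m"))]
I = {("P", ("beth", "anne")), ("P", ("carl", "anne")), ("P", ("dora", "eve"))}
J1 = {("T", ("beth", 1)), ("T", ("carl", 1)), ("T", ("dora", 2))}
J2 = {("T", ("beth", 1)), ("T", ("carl", 2)), ("T", ("dora", 3))}
J3 = J2 | {("T", ("carl", 1))}
```

In this example `J1` and `J3` satisfy the hypothetical query while `J2` does not; that `J3` extends `J2` shows that adding target facts can only help.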

Table 6 Instances \(J_1\) and \(J_2\) from Example 8

Table 7 Instance \(J_3\) from Example 8

*Example 8*

Take the instance *I* consisting of the Mother and Father facts listed in Table 1, and take the instances \(J_1\) and \(J_2\) consisting of the Family facts listed in Table 6 left and right, respectively. Then both pairs \((I,J_1)\) and \((I,J_2)\) satisfy the SO-tgd. For \(J_1\), this is witnessed by the following function *f*:

*f* witnessing the truth of the formula on \((I,J_3)\). Since beth has anne as mother and adam as father, the fact

*Remark 1*

Note that, by the purely implicational nature of SO-tgds, if (*I*, *J*) satisfies an SO-tgd and \(J \subseteq J^{\prime }\), then also \((I,J^{\prime })\) satisfies the SO-tgd. Hence, continuing the previous example, for any instance \(J^{\prime }\) obtained from \(J_1\) or \(J_2\) by adding some more Family facts, the pair \((I,J^{\prime })\) would still satisfy the SO-tgd from the example.

The above example and remark show that given a source instance *I*, there are in general multiple possible target instances *J* such that \((I,J) \models Q\). This is in contrast to the semantics of *Q* as an oCQ, where *Q*(*I*) is an extended instance that is uniquely defined. Still, there is a connection between the oCQ semantics and the SO-tgd semantics. Specifically, *Q*(*I*) can be viewed as a target instance in a canonical manner, using *oid-to-constant assignments* (oc-assignments for short) defined as follows.

**Definition 8**

Let *I* be a source instance and let *J* be an extended instance over \(\{T\}\) such that \({\mathrm {consts}(J)} \subseteq \mathrm {adom}(I)\). An *oc-assignment* for *J* with respect to *I* is an injective mapping \(\rho : {\mathrm {oids}(J)} \rightarrow \mathbf {dom}\) so that the image of \(\rho \) is disjoint from \(\mathrm {adom}(I)\).

Thus, \(\rho \) assigns to each non-constant data term from *J* a different constant that is not in \(\mathrm {adom}(I)\).

We now observe the following obvious property giving a connection between the oCQ semantics and the SO-tgd semantics:

**Proposition 4**

Let *I* be a source instance and let \(\rho \) be an oc-assignment for *Q*(*I*) with respect to *I*. Then \((I,\rho (Q(I))) \models Q\).

In fact, *Q*(*I*) corresponds to what Fagin et al. [18] call the *chase of I with* \(\mathrm {sotgd}(Q)\).

### 5.1 Nested dependencies

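Let \(\bar{u}\) be the sequence of all variables used in *B*, except for the creation variables (the variables from \(\bar{z}\)). Furthermore, let *w* be a fresh variable not occurring in *B*, and let \(H^{\prime }\) be the atom \(T(\bar{x},w)\). We can now associate with *Q* the following implicational statement, denoted by \(\mathrm {ntgd}(Q)\):

$$\begin{aligned} \forall \bar{z} \, \exists w \, \forall \bar{u} \, (B \rightarrow H^{\prime }) \end{aligned}$$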

### 5.2 Logical entailment

In Sect. 4, we have shown that equivalence of sifo CQs as object-creating queries is decidable. Now that we have seen that sifo CQs can also be given a semantics as schema mappings, we may again ask if equivalence under this alternative semantics is decidable. The answer is affirmative; we have seen in the previous subsection that sifo CQ mappings belong to the class of nested dependencies, and logical implication of nested dependencies has recently been shown to be decidable [26]. When this general implication test for nested dependencies is applied specifically to sifo CQ schema mappings, it can be implemented in non-deterministic polynomial time. Hence, logical entailment (and also logical equivalence) of sifo CQ schema mappings is NP-complete.

In the present section, we present a specialized logical entailment test for sifo CQ schema mappings which is much simpler and more elegant and provides more insight into the problem by relating it to testing implication of a join dependency by a CQ (Theorem 3). Interestingly, there is a striking correspondence between the general implication test when applied to sifo CQs and the strategy we use to prove our theorem. An in-depth comparison will be given in Sect. 6, after we have stated the theorem formally and have seen its proof.

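An SO-tgd \(\mathcal{M}\) *logically entails* an SO-tgd \(\mathcal{M}^{\prime }\) if the following implication holds for every instance *I* over \(\mathbf{S }\) and every instance *J* over \(\{T\}\):

$$\begin{aligned} (I,J) \models \mathcal{M} \quad \Rightarrow \quad (I,J) \models \mathcal{M}^{\prime } \end{aligned}$$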

**Definition 9**

Let *Q* and \(Q^{\prime }\) be two sifo CQs with the same head predicate and over the same database schema. We say that *Q* *logically entails* \(Q^{\prime }\) if \(\mathrm {sotgd}(Q)\) logically entails \(\mathrm {sotgd}(Q^{\prime })\).

Table 8 Instances used in Example 9

*Example 9*

Consider *Q* and \(Q^{\prime }\) from Example 6. Here *Q* logically entails \(Q^{\prime }\). Indeed, if there exists a function *f* witnessing the truth of \(\mathrm {sotgd}(Q)\), then we can easily define a function \(f^{\prime }\) witnessing the truth of \(\mathrm {sotgd}(Q^{\prime })\) by defining \(f^{\prime }(x,y) := f(y)\).

Conversely, however, \(Q^{\prime }\) does not logically entail *Q*. Indeed, Table 8 shows (*I*, *J*) where \((I,J) \models Q^{\prime }\) but \((I,J) \not \models Q\).

*Example 10*

Consider *Q* and \(Q^{\prime }\) from Example 7. Although *Q* and \(Q^{\prime }\) are not oid-equivalent, they are logically equivalent: They logically entail each other. The logical entailment of \(Q^{\prime }\) by *Q* is again clear. To see the converse direction, assume \(f^{\prime }\) witnesses the truth of \(\mathrm {sotgd}(Q^{\prime })\). Then we define *f*(*x*) for any *x* as follows: If there exists a pair (*y*, *z*) such that *R*(*x*, *y*, *z*) holds, we fix one such pair (*y*, *z*) arbitrarily and define \(f(x) := f^{\prime }(x,y,z)\). If no such *y* and *z* exist, we may define *f*(*x*) arbitrarily. It is now clear that this *f* witnesses the truth of \(\mathrm {sotgd}(Q)\).

*Example 11*

*Q* and \(Q^{\prime }\) logically entail each other. The logical entailment of \(Q^{\prime }\) by *Q* is again clear. To see the converse direction, we can use a reasoning similar to that used in Example 10. Assume \(f^{\prime }\) witnesses the truth of \(\mathrm {sotgd}(Q^{\prime })\). Then we define \(f(z_1)\) for any \(z_1\) as follows: If there exists \(z_2\) such that \(R(z_1,z_2)\) holds, we fix one such \(z_2\) arbitrarily and define \(f(z_1) := f^{\prime }(z_1,z_2)\). If no such \(z_2\) exists, we may define \(f(z_1)\) arbitrarily. The function *f* thus defined witnesses the truth of \(\mathrm {sotgd}(Q)\).

Note that the kind of reasoning used here and in Example 10 does not work in the case of Example 9. In Theorem 3, we will characterize formally when this kind of reasoning is correct.

Example 10 shows that logical equivalence (logical entailment in both directions) does not imply oid-equivalence of sifo CQs. We will see in Theorem 4 that the other direction does hold.

### 5.3 Join dependencies and tableau queries

In our characterization of sifo CQ logical entailment, we use a number of concepts from classical relational database theory [2], which we recall here briefly.

Recall that a *relation scheme* is a finite set of elements called *attributes*. It is customary to denote the union of two relation schemes *X* and *Y* by juxtaposition, thus writing \(\textit{XY}\) for \(X \cup Y\).

A *tuple* over a relation scheme *U* is a function from *U* to \(\mathbf {dom}\). A *relation* over *U* is a finite set of tuples over *U*.

Let *t* be a tuple over *U* and let \(X \subseteq U\). The restriction of *t* to *X* is denoted by *t*[*X*]. The *projection* \(\pi _X(r)\) of a relation *r* over *U* equals \(\{\,t[X] \mid t \in r\,\}\).

We now turn to tableau queries, which are an alternative formalization of conjunctive queries so that the result of a query is a set of tuples rather than a set of facts. Let \(\mathbf{S }\) be a database schema, and let *B* be a finite set of atoms with predicates from \(\mathbf{S }\), as would be the body of a CQ over \(\mathbf{S }\). Let \(V = \mathrm {var}(B)\). For any \(U \subseteq V\), the pair (*B*, *U*) is called a *tableau query* over \(\mathbf{S }\). When applied to an instance *I* over \(\mathbf{S }\), this tableau query returns a relation over *U* in the following manner. Let \(\mathrm {Mat}(B,I)\) be the set of all matchings of *B* in *I*. Using variables for attributes, *V* can be viewed as a relation scheme. Under this view, every valuation on *V* is a tuple over *V*, and thus, \(\mathrm {Mat}(B,I)\) is a relation over *V*. We now define the result of (*B*, *U*) on input *I* to be \(\pi _U(\mathrm {Mat}(B,I))\). This result is denoted by (*B*, *U*)(*I*).

The *natural join* \(r_1 \bowtie r_2\), for relations \(r_1\) and \(r_2\) over \(U_1\) and \(U_2\), respectively, equals

$$\begin{aligned} \left\{ t \mid t \text { is a tuple over } U_1U_2,\ t[U_1] \in r_1 \text { and } t[U_2] \in r_2\right\} . \end{aligned}$$

Now consider a relation *r* over some relation scheme *U*. Let \(U_1\) and \(U_2\) be subsets of *U* (not necessarily disjoint) such that \(U = U_1U_2\). Then *r* satisfies the *join dependency* (JD) \(U_1 \bowtie U_2\) if \(r = \pi _{U_1}(r) \bowtie \pi _{U_2}(r)\). Note that the containment from left to right is trivial, so one only needs to verify the containment \(\pi _{U_1}(r) \bowtie \pi _{U_2}(r) \subseteq r\).

The logical implication of JDs by tableau queries is well understood and can be solved by the chase procedure with NP complexity [2, 25]. Formally, a tableau query \(Q=(B,U)\) over \(\mathbf{S }\) is said to *imply* a JD over *U* if for every instance *I* over \(\mathbf{S }\), the relation *Q*(*I*) satisfies this JD.
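For a single concrete relation, satisfaction of a JD \(U_1 \bowtie U_2\) can be checked directly from the definitions of projection and natural join. The Python sketch below does this, representing tuples as dictionaries from attributes to values; the function names and the example relations are assumptions for illustration, and the check presumes \(U = U_1U_2\). It tests satisfaction by one relation, not logical implication by a tableau query.

```python
def project(rel, attrs):
    """Projection of a relation (set of frozensets of attribute/value pairs) onto attrs."""
    return {frozenset((a, dict(t)[a]) for a in attrs) for t in rel}

def satisfies_jd(r, u1, u2):
    """Does relation r (iterable of dicts over U = u1 ∪ u2) satisfy the JD u1 ⋈ u2?"""
    rel = {frozenset(t.items()) for t in r}
    p1, p2 = project(rel, u1), project(rel, u2)
    common = set(u1) & set(u2)
    joined = set()
    for t1 in p1:
        for t2 in p2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in common):
                joined.add(frozenset({**d1, **d2}.items()))
    # the containment r ⊆ join is trivial, so only join ⊆ r needs checking
    return joined <= rel

r_bad = [{"A": 1, "B": 0, "C": 1}, {"A": 2, "B": 0, "C": 2}]
r_good = r_bad + [{"A": 1, "B": 0, "C": 2}, {"A": 2, "B": 0, "C": 1}]
```

The relation `r_bad` violates the JD `AB ⋈ BC` because the join recombines the two tuples through their shared `B` value, whereas `r_good` already contains all recombinations.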

### 5.4 Decidability of sifo CQ logical entailment

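Consider two sifo CQs *Q* and \(Q^{\prime }\) with the same head predicate:

$$\begin{aligned} Q&= T(\bar{x},f(\bar{z})) \leftarrow B\\ Q^{\prime }&= T(\bar{x}^{\prime },f^{\prime }(\bar{z}^{\prime })) \leftarrow B^{\prime } \end{aligned}$$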

*Remark 2*

We assume *Q* and \(Q^{\prime }\) to have their function symbol in the same position in the head (here taken to be the last position). This is justified because otherwise *Q* could never logically entail \(Q^{\prime }\). To see this, suppose the function symbol in the head of \(Q^{\prime }\) were not in the last position. Then we have a variable \(x^{\prime }\) from \(B^{\prime }\) in the last position. Now consider an instance *I* such that both *Q*(*I*) and \(Q^{\prime }(I)\) are non-empty (such an instance could be constructed by taking the disjoint union of *B* and \(B^{\prime }\) and substituting constants for variables). Let \(\rho \) be an oc-assignment for *Q*(*I*) with respect to *I*. By Proposition 4, we have \((I,\rho (Q(I))) \models Q\). In \(\rho (Q(I))\), none of the elements in the last position of a *T*-fact belongs to \(\mathrm {adom}(I)\). But then \((I,\rho (Q(I)))\) cannot satisfy \(Q^{\prime }\). Indeed, since \(Q^{\prime }(I)\) is non-empty, there is a matching \(\alpha ^{\prime }:B^{\prime }\rightarrow I\). In any \(J^{\prime }\) such that \((I,J^{\prime }) \models Q^{\prime }\), there needs to be a *T*-fact with \(\alpha ^{\prime }(x^{\prime })\) in the last position, and \(\alpha ^{\prime }(x^{\prime }) \in \mathrm {adom}(I)\). We conclude that *Q* does not logically entail \(Q^{\prime }\).

In what follows we use *X*, *Z* and \(Z^{\prime }\) to denote the sets of variables appearing in the tuples \(\bar{x}\), \(\bar{z}\) and \(\bar{z}^{\prime }\), respectively.

We establish:

**Theorem 3**

*Q* logically entails \(Q^{\prime }\) if and only if there exists a homomorphism \(h : B \rightarrow B^{\prime }\) satisfying the following conditions:

1. \(h(\bar{x}) = \bar{x}^{\prime }\);
2. \(h(X \cap Z) \subseteq Z^{\prime }\);
3. Let \(Y_h := h^{-1}(Z^{\prime })\), i.e.,

   $$\begin{aligned} Y_h = \left\{ y \in \mathrm {var}(B) \mid h(y) \in Z^{\prime }\right\} . \end{aligned}$$

   Then the tableau query \((B,\textit{XY}_hZ)\) implies the join dependency \(\textit{XY}_h \bowtie Y_hZ\).
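The three conditions of the theorem can be turned into a brute-force decision procedure. The Python sketch below enumerates all homomorphisms *h* from *B* to \(B^{\prime }\), checks the first two conditions, and tests the JD implication through a homomorphism from *B* into two renamed copies of *B* that share exactly the common attributes of the JD. The encoding, the function names, and the example queries are assumptions made for illustration; in particular the examples are hypothetical and are not the queries of Examples 9 to 11.

```python
def homs(src, dst, seed):
    """Yield every variable mapping extending `seed` that sends each atom of `src` to an atom of `dst`."""
    def extend(atoms, env):
        if not atoms:
            yield env
            return
        (pred, args), rest = atoms[0], atoms[1:]
        for dpred, dargs in dst:
            if dpred == pred and len(dargs) == len(args):
                new = dict(env)
                if all(new.setdefault(a, b) == b for a, b in zip(args, dargs)):
                    yield from extend(rest, new)
    yield from extend(list(src), dict(seed))

def implies_jd(body, X, Y, Z):
    """Does the tableau query (body, XYZ) imply the JD XY ⋈ YZ?"""
    common = Y | (X & Z)                     # attributes shared by XY and YZ
    def copy(tag):
        return [(p, tuple(u if u in common else (u, tag) for u in args)) for p, args in body]
    target = copy(0) + copy(1)               # body expressing the join of the two projections
    seed = {u: u for u in common}
    seed.update({u: (u, 0) for u in X - common})
    seed.update({u: (u, 1) for u in Z - common})
    return next(homs(body, target, seed), None) is not None

def entails(x_bar, z_bar, body, x_bar2, z_bar2, body2):
    """Brute-force test of the three conditions for Q entailing Q'."""
    seed = {}
    for a, b in zip(x_bar, x_bar2):          # condition 1: h(x_bar) = x_bar'
        if seed.setdefault(a, b) != b:
            return False
    X, Z, Z2 = set(x_bar), set(z_bar), set(z_bar2)
    for h in homs(body, body2, seed):
        if not all(h[u] in Z2 for u in X & Z):   # condition 2
            continue
        Y = {u for u in h if h[u] in Z2}          # Y_h = h^{-1}(Z')
        if implies_jd(body, X, Y, Z):             # condition 3
            return True
    return False

R = [("R", ("x", "y"))]
# T(x, f(y)) <- R(x, y) entails T(x, f'(x, y)) <- R(x, y), but not conversely
forward = entails(("x",), ("y",), R, ("x",), ("x", "y"), R)
backward = entails(("x",), ("x", "y"), R, ("x",), ("y",), R)
# T(x, f(y)) <- R(x, y) does not entail the variant with a nullary creation function
nullary = entails(("x",), ("y",), R, ("x",), (), R)
```

The search is exponential in the size of the queries, in line with the NP upper bound; it is meant only to make the conditions concrete.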

#### 5.4.1 Proof of sufficiency

Let \((I,J) \models Q\), witnessed by the function *f*. We must show \((I,J) \models Q^{\prime }\). This means finding a function \(f^{\prime }\) witnessing the truth of \(\mathrm {sotgd}(Q^{\prime })\) in (*I*, *J*).

Call any two matchings \(\alpha _1,\alpha _2 \in \mathrm {Mat}(B,I)\) equivalent if they agree on \(Y_h\). This is denoted by \(\alpha _1 \equiv \alpha _2\). Let \(\rho \) be any function from \(\mathrm {Mat}(B,I)\) to \(\mathrm {Mat}(B,I)\) with the two properties, first, that \(\rho (\alpha ) \equiv \alpha \) and, second, that \(\alpha _1 \equiv \alpha _2\) implies \(\rho (\alpha _1) = \rho (\alpha _2)\). Thus, \(\rho \) amounts to choosing a representative out of each equivalence class. We denote the application of \(\rho \) by subscripting, writing \(\rho (\alpha )\) as \(\rho _\alpha \).

Let us define \(f^{\prime }\) as follows. Take any matching \(\beta : B^{\prime } \rightarrow I\). Then we put \(f^{\prime }(\beta (\bar{z}^{\prime })) := f(\rho _{\beta \circ h}(\bar{z}))\). To see that this is well defined, recall that \(h(Y_h) \subseteq Z^{\prime }\). Hence, \(\beta _1(\bar{z}^{\prime }) = \beta _2(\bar{z}^{\prime })\) implies that \(\beta _1 \circ h \equiv \beta _2 \circ h\), so \(\rho _{\beta _1\circ h} = \rho _{\beta _2 \circ h}\).

We now show that this interpretation of \(f^{\prime }\) satisfies the requirements. Specifically, let \(\beta : B^{\prime } \rightarrow I\) be a matching. We must show that \(T(\beta (\bar{x}^{\prime }),f^{\prime }(\beta (\bar{z}^{\prime }))) \in J\). Consider the valuations \(\beta _1=\beta \circ h\) and \(\beta _2=\rho _{\beta \circ h}\), both belonging to \(\mathrm {Mat}(B,I)\), and viewed as tuples over the relation scheme \(\mathrm {var}(B)\). Since these two tuples agree on \(Y_h\), also the two restrictions \(\beta _1[Y_hX]\) and \(\beta _2[Y_hZ]\) agree on \(Y_h\). Since \(X \cap Z \subseteq Y_h\) by the second condition, the union \(\beta _1[Y_hX] \cup \beta _2[Y_hZ]\) is a well-defined tuple over \(\textit{XY}_hZ\). Since \(\pi _{\textit{XY}_hZ}(\mathrm {Mat}(B,I))\) satisfies the JD \(\textit{XY}_h \bowtie Y_hZ\) by the third condition, the union belongs to \(\pi _{\textit{XY}_hZ}(\mathrm {Mat}(B,I))\). Hence, there exists a valuation \(\gamma \in \mathrm {Mat}(B,I)\) that agrees with \(\beta \circ h\) on *X*, and with \(\rho _{\beta \circ h}\) on *Z*. Since \((I,J) \models Q\), we have \(T(\gamma (\bar{x}),f(\gamma (\bar{z}))) \in J\). By the preceding, \(\gamma (\bar{x}) = \beta (h(\bar{x})) = \beta (\bar{x}^{\prime })\) and \(f(\gamma (\bar{z})) = f(\rho _{\beta \circ h}(\bar{z})) = f^{\prime }(\beta (\bar{z}^{\prime }))\). We conclude that \(T(\beta (\bar{x}^{\prime }),f^{\prime }(\beta (\bar{z}^{\prime }))) \in J\) as desired.

#### 5.4.2 Proof of necessity

Let \(V^{\prime }=\mathrm {var}(B^{\prime })\), and let *n* be the arity of *f*. For each \(l \in \{0,1,\ldots ,n\}\) and each \(u \in V^{\prime } -Z^{\prime }\), we introduce a fresh copy of *u*, denoted by \(u^l\). We say that this fresh copy is ‘colored’ with color *l*. For each variable \(u \in Z^{\prime }\), we simply define \(u^l\) to be *u* itself. We say that the variables in \(Z^{\prime }\) are ‘colored white.’

For any tuple of variables \(\bar{u}=(u_1,\ldots ,u_p)\) in \(V^{\prime }\), we denote the tuple \((u_1^l,\ldots ,u_p^l)\) by \(\bar{u}^l\). In this tuple, all variables are colored *l* or white. We then define \(B^{\prime ^{l}} = \{\,R(\bar{u}^l) \mid R(\bar{u}) \in B^{\prime }\,\}\) and view it as an instance, i.e., the variables \(u^l\) are considered to be constants.

Now define the instance \(I = \bigcup \nolimits _{l=0}^n B^{{\prime }^{l}}\), and construct the instance \(J = Q(I)\). By Proposition 4, \((I,J) \models Q\), where we omit the oc-assignment for the sake of clarity. Since *Q* logically entails \(Q^{\prime }\), also \((I,J) \models Q^{\prime }\). Hence, there exists a function \(f^{\prime }\) such that for each color *l*, using the matching \(\mathrm {id}^l: B^{\prime } \rightarrow I\), \(u \mapsto u^l\), the fact \(T(\bar{x}^{{\prime }^{l}},f^{\prime }(\bar{z}^{{\prime }^{l}})) = T(\bar{x}^{{\prime }^{l}},f^{\prime }(\bar{z}^{\prime }))\) belongs to *J*.

Since \(J=Q(I)\), we have \(f^{\prime }(\bar{z}^{\prime }) = f(\bar{w})\) for some tuple \(\bar{w}\) of colored variables in \(V^{\prime }\). Since the arity of *f* is *n* and there are \(n+1\) distinct colors, some color does not appear in \(\bar{w}\). Without loss of generality we may assume that this is the color 0.

In particular, the fact \(T(\bar{x}^{{\prime }^{0}},f(\bar{w}))\) belongs to *J*. Like any *T*-fact in *J*, this fact has been produced by some matching \(k:B \rightarrow I\) such that \(T(\bar{x}^{{\prime }^{0}},f(\bar{w})) = k(T(\bar{x},f(\bar{z})))\), so

- (a) \(k(\bar{x})=\bar{x}^{{\prime }^{0}}\) and
- (b) \(k(\bar{z})=\bar{w}\).

Let *s* denote the mapping that removes colors, i.e., \(s(u^l)=u\) for every \(u \in V^{\prime }\) and every \(l \in \{0,1,\ldots ,n\}\). Since \(s(I) \subseteq B^{\prime }\), we have a homomorphism \(s \circ k:B \rightarrow B^{\prime }\). We now define \(h := s \circ k\) and show that it satisfies the conditions required by the Theorem. The first condition is clear since \(h(\bar{x}) = s(k(\bar{x})) = s(\bar{x}^{{\prime }^{0}}) = \bar{x}^{\prime }\).

For the second condition, let \(x \in X \cap Z\). By (a), *k*(*x*) is colored 0 or white. By (b), *k*(*x*) is colored nonzero or white. Hence, *k*(*x*) is colored white, i.e., \(k(x) \in Z^{\prime }\), so \(h(x)=s(k(x))=k(x) \in Z^{\prime }\) as desired.

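For the third condition, we must show that the tableau query \((B,\textit{XY}_hZ)\) implies the JD \(\textit{XY}_h \bowtie Y_hZ\), that is, that the join of the tableau queries \((B,\textit{XY}_h)\) and \((B,Y_hZ)\) is contained in \((B,\textit{XY}_hZ)\). This join can be expressed by the single body \(B_2 := B_0 \cup B_1\), defined as follows. The body \(B_0\) is obtained from *B* by replacing each variable *u* not in \(Y_h\) by a fresh copy \(u^0\). For each \(u \in Y_h\), we define \(u^0\) simply as *u* itself. The body \(B_1\) is obtained from *B* by replacing each variable not in \(Y_h\) by a fresh copy \(u^1\). Again, for each \(u \in Y_h\) we define \(u^1\) simply as *u* itself. To show the containment, we now must find a homomorphism *m* from *B* to \(B_2\) such that each \(u\in X - Y_h\) is mapped to \(u^0\); each \(u \in Y_h\) is mapped to *u*; and each \(u \in Z-Y_h\) is mapped to \(u^1\).

We define *m* as follows, for each variable *u* of *B*:

- if *k*(*u*) is colored 0, then \(m(u):=u^0\);
- if *k*(*u*) is colored *l* for some \(l>0\), then \(m(u):=u^1\);
- if *k*(*u*) is colored white, then \(m(u):=u\).

Let us verify that *m* is indeed a homomorphism. Let \(R(\bar{u})\) be an atom in *B*; we must show \(R(m(\bar{u})) \in B_2\). Since \(k:B \rightarrow I\), we know that \(R(k(\bar{u})) \in I\). By definition of *I*, this means that \(R(k(\bar{u})) = R(\bar{v}^l)\) for some atom \(R(\bar{v})\) in \(B^{\prime }\) and some color *l*. So, for each variable *u* in \(\bar{u}\), the color of *k*(*u*) is either *l* or white. We now distinguish two cases.

- If *k*(*u*) is colored white, then \(h(u)=k(u) \in Z^{\prime }\) so \(u \in Y_h\). Hence, in this case, \(m(u)=u=u^0=u^1\).
- If *k*(*u*) is colored *l*, then by definition \(m(u)=u^0\) when \(l=0\), and \(m(u)=u^1\) when \(l>0\).

In both cases, \(m(u)=u^0\) when \(l=0\) and \(m(u)=u^1\) when \(l>0\). Hence \(R(m(\bar{u}))\) equals \(R(\bar{u}^0) \in B_0\) or \(R(\bar{u}^1) \in B_1\), so \(R(m(\bar{u})) \in B_2\).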

It remains to verify that *m* maps the variables in \(\textit{XY}_hZ\) correctly. If \(u \in Y_h\), then \(h(u) = k(u) \in Z^{\prime }\) so *k*(*u*) is colored white and \(m(u)=u\) as desired. If \(u \in X - Y_h\), then by (a), *k*(*u*) is colored 0 so \(m(u)=u^0\) as desired. Finally, if \(u \in Z-Y_h\), then by (b), *k*(*u*) is colored \(l>0\) so \(m(u)=u^1\) as desired.
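The colored instance *I* used in the proof of necessity above is straightforward to build. The Python sketch below constructs the union of the \(n+1\) colored copies of \(B^{\prime }\), leaving the variables of \(Z^{\prime }\) uncolored ("white") so that all copies share them. The encoding of colored variables as pairs `(u, l)` and the function name are assumptions for illustration.

```python
def colored_instance(body2, z_vars2, n):
    """Union of the copies of body2 for colors 0..n; variables in z_vars2 stay white (shared)."""
    def color(u, l):
        return u if u in z_vars2 else (u, l)
    return {(pred, tuple(color(u, l) for u in args))
            for l in range(n + 1)
            for pred, args in body2}

# Hypothetical B' = {R(x, z)} with Z' = {z} and a unary creation function in Q (n = 1)
I = colored_instance([("R", ("x", "z"))], {"z"}, 1)
```

With \(n+1\) colors and a creation function of arity *n*, at least one color is absent from any single oid, which is what the pigeonhole step of the proof exploits.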

As a corollary, we obtain that the complexity of deciding logical entailment for sifo CQs is not worse than that of deciding containment for classical CQs:

**Corollary 2**

Testing logical entailment of sifo CQs is NP-complete.

*Proof*

Membership in NP follows from Theorem 3; as a witness for logical entailment, we can use a homomorphism *h* satisfying the first two conditions of the theorem, together with a homomorphism \(h_0\) from the query \((B,\textit{XY}_hZ)\) to the query \((B,\textit{XY}_h) \bowtie (B,Y_hZ)\) witnessing the third condition of the theorem. NP-hardness follows because the problem has containment of classical CQs as a special case, which is well known to be NP-hard. Indeed, logical entailment of a sifo \(Q^{\prime }\) by a sifo *Q*, in the special case where the creation functions of *Q* and \(Q^{\prime }\) are nullary, amounts to classical containment of *Q* in \(Q^{\prime }\) when we ignore the function terms in the heads. \(\square \)

### 5.5 From oid-equivalence to logical entailment

Let *Q* and \(Q^{\prime }\) be sifo CQs of the general forms (3) and (4) from Sect. 4.4. From our main Theorems 2 and 3, we can conclude the following.

**Theorem 4**

If *Q* and \(Q^{\prime }\) are oid-equivalent, then *Q* logically entails \(Q^{\prime }\).

*Proof*

By Theorem 2, there exists a permutation \(\pi \) of \(Z-X\) such that \(\mathring{Q}^\pi \) and \(\mathring{Q}^{\prime }\) are equivalent. In particular, there exists a homomorphism *h* from \(\mathring{Q}^\pi \) to \(\mathring{Q}^{\prime }\). We verify that *h* satisfies the conditions of Theorem 3, thus showing that *Q* logically entails \(Q^{\prime }\).

1. Since *h* maps the head of \(\mathring{Q}^\pi \) to the head of \(\mathring{Q}^{\prime }\), we have \(h(\bar{x})=\bar{x}\) and \(h(\pi (\bar{z})) = \bar{z}\). Since \(\bar{x}^{\prime }=\bar{x}\), we have \(h(\bar{x}) = \bar{x}^{\prime }\) as desired.
2. Since *h* is the identity on *X*, we have \(h(X \cap Z) = X \cap Z \subseteq Z = Z^{\prime }\) as desired.
3. Since \(h(\pi (\bar{z}))=\bar{z}\) and \(\pi (Z)=Z\), we have \(h(Z)=Z=Z^{\prime }\). Hence, \(Z \subseteq Y_h\). But then the join dependency \(\textit{XY}_h \bowtie Y_h Z\) becomes \(\textit{XY}_h \bowtie Y_h\) which trivially holds. \(\square \)

## 6 Discussion

The results in this paper provide an understanding of the notions of oid-equivalence and logical entailment for sifo CQs. Sifo CQs, however, form a very simple subclass of oCQs. Moreover, oCQs themselves are rather limited; for example, they consist of a single rule and the rule can have only one atom in the head. Thus, there are at least three natural directions for further research: (i) allowing more than one function in the head; (ii) allowing more than one atom in the head; (iii) allowing more than one rule.

### 6.1 Containment

Furthermore, in addition to oid-*equivalence* of oCQs, it would be natural to also investigate a notion of oid-*containment*. There are actually at least two reasonable ways to define such a notion. The situation is similar to that in research on CQs with counting or bag semantics [14, 15]. Most of the known results are for equivalence only, with the extension to containment typically an open problem. Indeed, our characterization of oid-equivalence for sifo CQs relies on equivalence of CQs with bag semantics. An extension to oid-containment will likely need a similar advance on containment of CQs with bag semantics.

### 6.2 Sifo CQs and ILOG

In the introduction, we mentioned that sifo CQs, and oCQs in general, are a fragment of ILOG without recursion [22]. Sifo CQs belong to the subclass of the class of recursion-free ILOG programs ‘with isolated oid creation’ [23]. For this class, oid-equivalence was already known to be decidable. This was shown by checking all finite instances up to some exponential size. Hence, our NP-completeness result for oid-equivalence of sifo CQs does not follow from the previous work. More generally, the decidability of oid-equivalence for general recursion-free ILOG programs, or already of oCQs for that matter, is a long-standing open question. Various interesting examples showing the intricacies of this problem have already been given by Hull and Yoshikawa [23].

### 6.3 Sifo CQs and nested dependencies

In Sect. 5.1, we also presented sifo CQs, now viewed as schema mappings, as a very simple subclass of nested tgds. The implication problem for general nested tgds was shown to be decidable by Kolaitis et al. [26] in work done independently from the present paper. Nevertheless, our characterization of implication for sifo CQs, given by Theorem 3, does not follow from the general decision procedure for nested tgds. Instead, the general procedure, when applied to two sifo CQs, is strikingly similar to our proof of necessity of our theorem. Using the notation from that proof, the general procedure applied to test implication of sifo CQ \(Q^{\prime }\) by sifo CQ *Q* would amount to testing for the existence of a homomorphism *h* from \(\{T(\bar{x}^{{\prime }^{l}},f^{\prime }(\bar{z}^{\prime })) \mid l=0,\ldots ,n\}\) to *Q*(*I*). Since \(Q(I) = \{T(\alpha (\bar{x}),f(\alpha (\bar{z}))) \mid \alpha : B \rightarrow I\}\), this can be implemented by guessing *h* and \(n+1\) matchings \(\alpha _l : B \rightarrow I\) such that \((h(\bar{x}^{{\prime }^{l}}),f^{\prime }(h(\bar{z}^{\prime }))) = (\alpha _l(\bar{x}),f(\alpha _l(\bar{z})))\) for \(l=0,\ldots ,n\). In contrast, as explained in Corollary 2, our characterization involves guessing just two homomorphisms.
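The set \(Q(I) = \{T(\alpha (\bar{x}),f(\alpha (\bar{z}))) \mid \alpha : B \rightarrow I\}\) used above can be computed by brute force, as in the following sketch. It is only an illustration under stated assumptions: the names `matchings` and `evaluate_sifo`, the encoding of atoms as `(relation, variables)` pairs, and the representation of an oid as a pair `(function_name, arguments)` are hypothetical choices, and body atoms are assumed to contain variables only.

```python
from itertools import product


def matchings(body, instance):
    """Yield every assignment alpha that maps all body atoms into the instance."""
    variables = sorted({v for _, args in body for v in args})
    domain = sorted({c for _, args in instance for c in args})
    for values in product(domain, repeat=len(variables)):
        alpha = dict(zip(variables, values))
        if all((rel, tuple(alpha[v] for v in args)) in instance
               for rel, args in body):
            yield alpha


def evaluate_sifo(body, head_vars, oid_vars, instance, fname="f"):
    """Return Q(I) as a set of pairs (alpha(x), oid).

    The oid f(alpha(z)) is modelled as the Skolem-style term (fname, alpha(z)).
    """
    result = set()
    for alpha in matchings(body, instance):
        xs = tuple(alpha[v] for v in head_vars)
        oid = (fname, tuple(alpha[v] for v in oid_vars))
        result.add((xs, oid))
    return result


# Hypothetical example instance with a single binary relation R.
I = {("R", (1, 2)), ("R", (1, 3)), ("R", (2, 3))}
body = [("R", ("x", "y"))]

# T(x, f(x)) :- R(x, y): one oid per x-value.
per_x = evaluate_sifo(body, ("x",), ("x",), I)

# T(x, f(x, y)) :- R(x, y): one oid per (x, y)-pair.
per_xy = evaluate_sifo(body, ("x",), ("x", "y"), I)

print(len(per_x), len(per_xy))
```

The enumeration is exponential in the number of body variables, so it only serves to make the definition concrete; it is not the guessing procedure discussed in the text.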

### 6.4 Sifo CQs and plain SO-tgds

*counts* of generated oids per tuple are important. Now consider the following pair of oCQs:

*f*- and *g*-oids per *x*-value, but now it also becomes important how these oids are paired. In *Q*, more pairs are generated for each *x*, and the two queries are not oid-equivalent. So, in the case of multiple functions, also the interaction between the multiple terms needs to be taken into account in some way.

*Q* (ignoring the third component in the head) logically entails the \(g_1\)-part of \(Q^{\prime }\), and likewise, the \(f_2\)-part of *Q* (ignoring the second component in the head) logically entails the \(g_2\)-part of \(Q^{\prime }\). Globally, however, *Q* does not logically entail \(Q^{\prime }\); this can be seen by the instances given in Table 9, which satisfy *Q* but not \(Q^{\prime }\).

Table 9: Instances used to illustrate logical entailment in the presence of multiple functions

*Q* and \(Q^{\prime }\) are oid-equivalent, and *Q* logically entails \(Q^{\prime }\), but \(Q^{\prime }\) does not logically entail *Q*.

## Footnotes

- 1.
- 2. IRIs stand for internationalized resource identifiers and extend the syntax of uniform resource identifiers (URIs) to a much wider repertoire of characters. They naturally embody global identifiers that refer to the same resource on the Web and can be used across different mapping assertions to refer to that resource.
- 3. Since constants are atomic data elements, no constant is allowed to be of the form \(f(d_1,\ldots ,d_k)\).
- 4. The motivation for MV queries was to model the semantics of positive SQL queries with nested EXISTS subqueries. While queries under standard SQL semantics return multisets of tuples, only the relations mentioned in the top level SQL block contribute to the multiplicities of answers, whereas relations mentioned in the subquery do not.

## Notes

### Acknowledgments

We thank the anonymous referees for their careful comments which helped improve the presentation of the paper. The work by Angela Bonifati has been partially supported by the ANR through the grant Datacert: Coq deep specification of security aware data integration (ANR-15-CE39-0009). The work by Werner Nutt has been partially supported by the grant CANDy of the Free University of Bozen-Bolzano.

### References

- 1. Abiteboul, S., Buneman, P., Suciu, D.: Data on the Web: From Relations to Semistructured Data and XML. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (2000)
- 2. Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley, Reading (1995)
- 3. Abiteboul, S., Kanellakis, P.: Object identity as a query language primitive. J. ACM **45**(5), 798–842 (1998)
- 4. Abiteboul, S., Vianu, V.: Procedural languages for database queries and updates. J. Comput. Syst. Sci. **41**(2), 181–229 (1990)
- 5. Abiteboul, S., Vianu, V.: Datalog extensions for database queries and updates. J. Comput. Syst. Sci. **43**(1), 62–124 (1991)
- 6. Alexe, B., Tan, W.C., Velegrakis, Y.: STBenchmark: towards a benchmark for mapping systems. Proc. VLDB Endow. **1**(1), 230–244 (2008)
- 7. Arenas, M., Pérez, J., Reutter, J., Riveros, C.: The language of plain SO-TGDS: composition, inversion and structural properties. J. Comput. Syst. Sci. **79**(6), 737–1002 (2013)
- 8. Arocena, P., Glavic, B., Miller, R.: Value invention in data exchange. In: Proceedings of the SIGMOD Conference, pp. 157–168. ACM (2013)
- 9. Arocena, P.C., Ciucanu, R., Glavic, B., Miller, R.J.: Gain control over your integration evaluations. Proc. VLDB Endow. **8**(12), 1960–1971 (2015)
- 10. Van den Bussche, J., Paredaens, J.: The expressive power of complex values in object-based data models. Inf. Comput. **120**, 220–236 (1995)
- 11. Van den Bussche, J., Van Gucht, D., Andries, M., Gyssens, M.: On the completeness of object-creating database transformation languages. J. ACM **44**(2), 272–319 (1997)
- 12. ten Cate, B., Kolaitis, P.: Structural characterizations of schema-mapping languages. Commun. ACM **53**(1), 101–110 (2010)
- 13. Chandra, A., Merlin, P.: Optimal implementation of conjunctive queries in relational data bases. In: Proceedings of the 9th ACM Symposium on the Theory of Computing, pp. 77–90. ACM (1977)
- 14. Cohen, S.: Equivalence of queries that are sensitive to multiplicities. VLDB J. **18**, 765–785 (2009)
- 15. Cohen, S., Nutt, W., Sagiv, Y.: Containment of aggregate queries. In: Calvanese, D., Lenzerini, M., Motwani, R. (eds.) Database Theory - ICDT 2003. Lecture Notes in Computer Science, vol. 2572, pp. 111–125. Springer, Berlin (2003)
- 16. Fagin, R., Haas, L., Hernández, M., Miller, R.J., Popa, L., Velegrakis, Y.: Clio: schema mapping creation and data exchange. In: Borgida, A., Chaudhuri, V., Giorgini, P., Yu, E. (eds.) Conceptual Modeling: Foundations and Applications. Lecture Notes in Computer Science, vol. 5600, pp. 198–236. Springer, Berlin (2009)
- 17. Fagin, R., Kolaitis, P., Nash, A., Popa, L.: Towards a theory of schema-mapping optimization. In: Proceedings of the 27th ACM Symposium on Principles of Database Systems, pp. 33–42 (2008)
- 18. Fagin, R., Kolaitis, P., Popa, L.: Composing schema mappings: second-order dependencies to the rescue. ACM Trans. Database Syst. **30**(4), 994–1055 (2005)
- 19. Feinerer, I., Pichler, R., Sallinger, E., Savenkov, V.: On the undecidability of the equivalence of second-order tuple generating dependencies. Inf. Syst. **48**, 113–129 (2015)
- 20. Friedman, M., Levy, A.Y., Millstein, T.D.: Navigational plans for data integration. In: AAAI/IAAI, pp. 67–73 (1999)
- 21. Garcia-Molina, H., Papakonstantinou, Y., Quass, D., Rajaraman, A., Sagiv, Y., Ullman, J., Vassalos, V., Widom, J.: The TSIMMIS approach to mediation: data models and languages. J. Intell. Inf. Syst. **8**(2), 117–132 (1997)
- 22. Hull, R., Yoshikawa, M.: ILOG: declarative creation and manipulation of object identifiers. In: McLeod, D., Sacks-Davis, R., Schek, H. (eds.) Proceedings of the 16th International Conference on Very Large Data Bases, pp. 455–468. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (1990)
- 23. Hull, R., Yoshikawa, M.: On the equivalence of database restructurings involving object identifiers. In: Proceedings of the 10th ACM Symposium on Principles of Database Systems, pp. 328–340. ACM Press, New York (1991)
- 24. Kifer, M., Wu, J.: A logic for programming with complex objects. J. Comput. Syst. Sci. **47**(1), 77–120 (1993)
- 25. Klug, A., Price, R.: Determining view dependencies using tableaux. ACM Trans. Database Syst. **7**, 361–380 (1982)
- 26. Kolaitis, P., Pichler, R., Sallinger, E., Savenkov, V.: Nested dependencies: structure and reasoning. In: Proceedings of the 33rd ACM Symposium on Principles of Database Systems (2014)
- 27. Lenzerini, M.: Data integration: A theoretical perspective. In: Proceedings 21st ACM Symposium on Principles of Database Systems, pp. 233–246 (2002)
- 28. Levy, A.Y., Rajaraman, A., Ordille, J.J.: Querying heterogeneous information sources using source descriptions. In: Vijayaraman, T., Buchmann, A., Mohan, C., Sarda, N. (eds.) Proceedings of the 22nd International Conference on Very Large Data Bases, pp. 251–262. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (1996)
- 29. Maier, D.: A logic for objects. In: Workshop on Foundations of Deductive Databases and Logic Programming, pp. 6–26 (1986)
- 30. Papakonstantinou, Y., Garcia-Molina, H., Widom, J.: Object exchange across heterogeneous information sources. In: ICDE, pp. 251–260 (1995)
- 31. Poggi, A., Lembo, D., Calvanese, D., De Giacomo, G., Lenzerini, M., Rosati, R.: Linking data to ontologies. J. Data Semant. **10**, 133–173 (2008)
- 32. Sequeda, J.F., Arenas, M., Miranker, D.P.: On directly mapping relational databases to RDF and OWL. In: International Conference on World Wide Web (WWW), pp. 649–658 (2012). doi:10.1145/2187836.2187924
- 33. Ullman, J.D.: Information integration using logical views. Theor. Comput. Sci. **239**(2), 189–210 (2000)