Complexity and Expressive Power of Weakly Well-Designed SPARQL
Abstract
SPARQL is the standard query language for RDF data. The distinctive feature of SPARQL is the OPTIONAL operator, which allows for partial answers when complete answers are not available due to lack of information. However, optional matching is computationally expensive: query answering is PSPACE-complete. The well-designed fragment of SPARQL achieves much better computational properties by restricting the use of optional matching: query answering becomes coNP-complete. On the downside, well-designed SPARQL captures far from all real-life queries; in fact, only about half of the queries over DBpedia that use OPTIONAL are well-designed. In the present paper, we study queries outside of well-designed SPARQL. We introduce the class of weakly well-designed queries that subsumes well-designed queries and includes most common meaningful non-well-designed queries: our analysis shows that the new fragment captures over 99% of DBpedia queries with OPTIONAL. At the same time, query answering for weakly well-designed SPARQL remains coNP-complete, and our fragment is in a certain sense maximal for this complexity. We show that the fragment’s expressive power is strictly in between well-designed and full SPARQL. Finally, we provide an intuitive normal form for weakly well-designed queries and study the complexity of containment and equivalence.
Keywords
RDF query languages SPARQL Optional matching
1 Introduction
The Resource Description Framework (RDF) [14, 18, 31] is the W3C standard for representing linked data on the Web. RDF models information in terms of labelled graphs consisting of triples of resource identifiers (IRIs). The first and last IRIs in such a triple, called subject and object, represent entity resources, while the middle IRI, called predicate, represents a relation between the two entities.
SPARQL [17, 37] is the default query language for RDF graphs. First standardised in 2008 [37], SPARQL is now recognised as a key technology for the Semantic Web. This is witnessed by a recent adoption of a new version of the standard, SPARQL 1.1 [17], as well as by active development of SPARQL query engines in academia and industry, for instance, as part of the systems AllegroGraph (http://franz.com/agraph/allegrograph/), Apache Jena (http://jena.apache.org), RDF4J (http://rdf4j.org), or OpenLink Virtuoso (http://virtuoso.openlinksw.com).
In recent years, SPARQL has been subject to a substantial amount of theoretical research, based on the foundational work by Pérez et al. [32, 33]. In particular, we now know much about evaluation [1, 3, 4, 6, 8, 20, 22, 23, 25, 28, 34, 38], optimisation [8, 9, 12, 13, 24, 27, 35], federation [10, 11], expressive power [2, 20, 21, 25, 36, 39], and provenance tracking [15, 16] for queries from various fragments and extensions of SPARQL. These studies have had a great impact in the community, in fact influencing the evolution of SPARQL as a standard.
The OPT operator accounts in a natural way for the open-world assumption and the fundamental incompleteness of the Web. However, evaluating queries that use OPT is computationally expensive: Pérez et al. [33] showed PSpace-completeness of SPARQL query evaluation, and Schmidt et al. [38] refined this result by proving PSpace-hardness even for queries using no operators besides OPT. This is not surprising given that SPARQL queries are equivalent in expressive power to first-order logic queries, and translations in both directions can be done in polynomial time [2, 25, 36].
This spurred a search for restrictions on the use of OPT that would ensure lower complexity of query evaluation. It was also recognised that queries that are difficult to evaluate are often unintuitive. For instance, they may produce less specified answers (i.e., answers with fewer bound variables) as the graph over which they are evaluated grows larger.
Pérez et al. [33] introduced the well-designed fragment of SPARQL queries by imposing a syntactic restriction on the use of variables in OPT-expressions. Roughly speaking, each variable in the optional (i.e., right) argument of an OPT-expression should either appear in the mandatory (i.e., left) argument or be globally fresh for the query, i.e., appear nowhere outside of the argument. Well-designed queries have lower complexity of query evaluation: the problem is coNP-complete (provided all the variables in the query are selected). Moreover, such queries have a more intuitive behaviour than arbitrary SPARQL queries; in particular, they enjoy the monotonicity property that we observed for query (1): each partial answer over a graph can potentially be extended to undefined variables if the graph is completed with the missing information, and the more information we have, the more specified the answers are. Well-designed queries can be efficiently transformed to an intuitive normal form allowing for a transparent graphical representation of queries as trees [27, 35]. Hence, many recent studies concentrate partially [23, 25, 27, 40, 41] or entirely [1, 8, 35] on well-designed queries.
Such success of well-designed queries may give the impression that non-well-designed SPARQL queries are just a useless side effect of the early specification. But is this impression justified by the use of SPARQL in practice? To answer this question, a comprehensive analysis of real-life queries is required. We are aware of two works that analyse the distribution of operators in SPARQL queries asked over DBpedia [7, 34]. Both studies show that OPT is used in a non-negligible number of practical queries. However, only Picalausa and Vansummeren [34] go further and analyse how many of these queries are well-designed, and the result is quite interesting: well-designed queries make up only about half of all queries with OPT. In other words, well-designed queries are common, but far from exclusive.
The main goal of this paper is to investigate SPARQL queries beyond the well-designed fragment. We wanted to see if the well-designedness condition could be extended so as to include most practical queries while preserving good computational properties. The main result of our study is very positive: we identified a new fragment of SPARQL queries, called weakly well-designed queries, that covers over 99% of queries over DBpedia and has the same complexity of query evaluation as the well-designed fragment. We also show that our fragment is in a sense maximal for this complexity.
Of course, preference patterns encountered in real-life queries are often more complex. Still, we will see that in most cases they do not increase the complexity of query evaluation.
This use of FILTER is in fact very common in real-life queries. Moreover, it is intuitive as long as FILTER is essentially the outermost operator in the query, as it is in our example. We will see that in all such cases FILTER cannot lead to an increase in complexity.
Having isolated these typical uses of non-well-designedness, we identify a new fragment of SPARQL that (a) includes all queries of the above two types, (b) subsumes well-designed queries, and (c) has the same complexity of query evaluation as well-designed queries. We call such queries weakly well-designed. They are the maximal fragment without structural restrictions on conjunctive blocks and filter conditions that has the above properties. Our analysis shows that more than 99% of DBpedia queries with OPT are weakly well-designed.
Besides low complexity of query evaluation, we establish a few more useful properties of weakly well-designed queries, which are summarised in the following outline of the paper. After introducing the syntax and semantics of SPARQL in Section 2, we formally define our new fragment in Section 3. In Section 4, we show that, similarly to the well-designed case, weakly well-designed queries can be transformed to an intuitive normal form, which allows for a natural graphical representation as constraint pattern trees. Using this representation, in Section 5, we formally show that the step from well-designed to weakly well-designed queries does not increase complexity of query evaluation; minimal relaxations of weak well-designedness, however, already lead to a complexity jump. In Section 6, we compare the expressive power of our fragment (and its extensions with additional operators) with well-designed queries and unrestricted SPARQL queries; in all cases, we show that the expressivity of weakly well-designed queries lies strictly in between well-designed and unrestricted queries. In Section 7, we study static analysis problems for weakly well-designed queries and establish \({{\Pi }_{2}^{p}}\)-completeness of equivalence and containment. Finally, in Section 8, we detail our analysis of DBpedia logs.
This article significantly extends the conference paper [19]. Besides providing full proofs of our technical claims, we have extended the analysis section and updated the evaluation to use more recent datasets. Furthermore, we have removed the erroneous claim that queries over unions of weakly well-designed patterns have the same expressive power as unrestricted SPARQL queries; on the contrary, we show that the former are strictly less expressive than the latter.
2 SPARQL Query Language
We begin by formally introducing the syntax and semantics of SPARQL that we adopt in this paper. Our formal setup mostly follows [33], which has some differences from the W3C specification [17, 37]; in particular, we use two-placed OPT and two-valued FILTER (conditional OPT and errors in FILTER evaluation as in the standard are expressible in our formalisation [2, 21]), do not consider blank nodes (their presence in RDF graphs would not change any of our results), and adopt set semantics, leaving multiset answers for future work.
RDF Graphs
An RDF graph is a labelled graph where nodes can also serve as edge labels. Formally, let I be a set of IRIs. Then an RDF triple is a tuple (s, p, o) from I × I × I, where s is called subject, p predicate, and o object. An RDF graph is a finite set of RDF triples.
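As a minimal sketch of this definition, an RDF graph can be modelled in Python as a set of 3-tuples of strings. The IRIs (`ex:alice`, `ex:knows`, and so on) and the helper names `nodes` and `edge_labels` are illustrative assumptions, not taken from the paper; the example only shows that one and the same IRI may act both as a node and as an edge label.

```python
# Sketch: an RDF graph as a finite set of (subject, predicate, object) triples.
# All IRIs below are made-up placeholders.
graph = {
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:knows", "rdf:type", "rdf:Property"),  # ex:knows appears as a subject here
}


def nodes(g):
    """IRIs occurring in subject or object position."""
    return {s for s, _, _ in g} | {o for _, _, o in g}


def edge_labels(g):
    """IRIs occurring in predicate position."""
    return {p for _, p, _ in g}


# IRIs that serve both as a node and as an edge label.
both = nodes(graph) & edge_labels(graph)
print(both)
```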
SPARQL Syntax

⊤, ?x = u, ?x = ?y, or bound(?x) for ?x, ?y in X and u ∈ I (these constraints are called atomic),

¬R _{1}, R _{1} ∧ R _{2}, or R _{1} ∨ R _{2} for filter constraints R _{1} and R _{2}.
We write \(\mathcal {U}\) for the set of all patterns. We also distinguish the fragment \(\mathcal {P}\) of \(\mathcal {U}\) that consists of all UNION-free patterns, that is, patterns that do not use the UNION operator.
Note that every pattern P can be seen as a query of the form (4) where X = vars(P). Hence, all definitions that refer to “queries” implicitly extend to patterns in the obvious way.
SPARQL Semantics
1. if B is a basic pattern, then [[B]]_{ G } = {μ : vars(B) → I ∣ μ(B) ⊆ G};
2. [[(P _{1} AND P _{2})]]_{ G } = [[P _{1}]]_{ G } ⋈ [[P _{2}]]_{ G };
3. [[(P _{1} OPT P _{2})]]_{ G } = ([[P _{1}]]_{ G } ⋈ [[P _{2}]]_{ G }) ∪ ([[P _{1}]]_{ G } ∖ [[P _{2}]]_{ G });
4. [[(P _{1} UNION P _{2})]]_{ G } = [[P _{1}]]_{ G } ∪ [[P _{2}]]_{ G };
5. [[(P ^{′} FILTER R)]]_{ G } = {μ ∣ μ ∈ [[P ^{′}]]_{ G } and μ ⊧ R}, where μ satisfies a filter constraint R, denoted by μ ⊧ R, if one of the following holds:
   - R is ⊤;
   - R is ?x = u, ?x ∈ dom(μ), and μ(?x) = u;
   - R is ?x = ?y, {?x, ?y} ⊆ dom(μ), and μ(?x) = μ(?y);
   - R is bound(?x) and ?x ∈ dom(μ);
   - R is a Boolean combination of filter constraints evaluating to true under the usual interpretation of ¬, ∧, and ∨.

Let μ_{ X } be the projection of a mapping μ to variables X, that is, μ_{ X }(?x) = μ(?x) if ?x ∈ X and μ_{ X }(?x) is undefined if ?x ∉ X. The evaluation [[Q]]_{ G } of a query Q of the form (4) is the set of all mappings μ_{ X } such that μ ∈ [[P]]_{ G }.
Finally, a solution to a query (or pattern) Q over G is a mapping μ such that μ ∈ [[Q]]_{ G }.
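The semantics above can be sketched as a small recursive evaluator. This is an illustration under stated assumptions, not the authors' implementation: patterns are encoded as nested tuples (`"BGP"`, `"AND"`, `"OPT"`, `"UNION"`, `"FILTER"`), filter constraints as tagged tuples, variables as strings starting with `?`, and mappings as frozensets of (variable, value) pairs. All of these encodings, the function names, and the sample data are hypothetical. OPT is evaluated as the join united with the difference, following the semantics of [33].

```python
def is_var(term):
    return term.startswith("?")


def eval_basic(triples, graph):
    """[[B]]_G: all mappings mu on vars(B) such that mu(B) is contained in G."""
    partial = [{}]
    for pattern in triples:
        extended = []
        for mu in partial:
            for fact in graph:
                nu = dict(mu)
                # A variable must agree with any earlier binding; an IRI must match exactly.
                if all(nu.setdefault(t, val) == val if is_var(t) else t == val
                       for t, val in zip(pattern, fact)):
                    extended.append(nu)
        partial = extended
    return {frozenset(mu.items()) for mu in partial}


def compatible(m1, m2):
    """The two mappings agree on every variable they share."""
    d2 = dict(m2)
    return all(d2.get(var, val) == val for var, val in m1)


def join(o1, o2):
    return {m1 | m2 for m1 in o1 for m2 in o2 if compatible(m1, m2)}


def diff(o1, o2):
    return {m1 for m1 in o1 if not any(compatible(m1, m2) for m2 in o2)}


def satisfies(mu, r):
    """mu |= R under the two-valued semantics of this section."""
    kind = r[0]
    if kind == "top":
        return True
    if kind == "eq_const":
        return r[1] in mu and mu[r[1]] == r[2]
    if kind == "eq_var":
        return r[1] in mu and r[2] in mu and mu[r[1]] == mu[r[2]]
    if kind == "bound":
        return r[1] in mu
    if kind == "not":
        return not satisfies(mu, r[1])
    if kind == "and":
        return satisfies(mu, r[1]) and satisfies(mu, r[2])
    if kind == "or":
        return satisfies(mu, r[1]) or satisfies(mu, r[2])
    raise ValueError(kind)


def evaluate(p, graph):
    op = p[0]
    if op == "BGP":
        return eval_basic(p[1], graph)
    if op == "FILTER":
        return {m for m in evaluate(p[1], graph) if satisfies(dict(m), p[2])}
    left, right = evaluate(p[1], graph), evaluate(p[2], graph)
    if op == "AND":
        return join(left, right)
    if op == "OPT":
        return join(left, right) | diff(left, right)
    if op == "UNION":
        return left | right
    raise ValueError(op)


# Hypothetical data: ex:a has a name and an email, ex:b only a name.
graph = {("ex:a", "ex:name", "Alice"), ("ex:a", "ex:email", "a@x"),
         ("ex:b", "ex:name", "Bob")}
names = ("BGP", [("?p", "ex:name", "?n")])
mails = ("BGP", [("?p", "ex:email", "?e")])
answers = evaluate(("OPT", names, mails), graph)
```

In this example the OPT pattern returns a full answer for `ex:a` and a partial answer (without `?e`) for `ex:b`; wrapping it in a `("bound", "?e")` filter keeps only the former.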
3 Weakly Well-Designed Patterns
We begin by recalling the notion of well-designed patterns and then formulate our generalisation. For now, we focus on the fragment \(\mathcal {P}\) of UNION-free patterns (also known as the AND-OPT-FILTER fragment of SPARQL), leaving the operators UNION and SELECT for later sections.
Note that a given pattern can occur more than once within a larger pattern. In what follows we will sometimes need to distinguish between a (sub)pattern P as a possibly repeated building block of another pattern P ^{′} and its occurrences in P ^{′}, that is, unique subtrees in the parse tree. Then, the left (right) argument of an occurrence i is the subtree rooted in the left (right) child of the root of i in the parse tree, and an occurrence i is inside an occurrence j if the root of i is a descendant of the root of j.
Definition 1
(Pérez et al. [33]) A pattern P from \(\mathcal {P}\) is well-designed (or wd-pattern, for short) if for every occurrence i of an OPT-pattern P _{1} OPT P _{2} in P the variables from vars(P _{2}) ∖ vars(P _{1}) occur in P only inside (the labels of) i.
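The variable condition of Definition 1 can be checked by a direct traversal of the parse tree, as in the following sketch. It assumes a hypothetical encoding of UNION-free patterns as nested tuples (`("BGP", triples)`, `("AND", P1, P2)`, `("OPT", P1, P2)`, `("FILTER", P, R)`) with variables written as strings starting with `?`; the function names are illustrative. The sketch tests only the condition stated in the definition and makes no attempt at efficiency.

```python
def is_var(t):
    return isinstance(t, str) and t.startswith("?")


def filter_vars(r):
    """All variables mentioned anywhere in a (nested) constraint tuple."""
    out = set()
    for part in r[1:]:
        if isinstance(part, tuple):
            out |= filter_vars(part)
        elif is_var(part):
            out.add(part)
    return out


def vars_of(p):
    """vars(P), including variables of filter constraints."""
    if p[0] == "BGP":
        return {t for triple in p[1] for t in triple if is_var(t)}
    if p[0] == "FILTER":
        return vars_of(p[1]) | filter_vars(p[2])
    return vars_of(p[1]) | vars_of(p[2])


def is_well_designed(p, outside=frozenset()):
    """`outside` holds the variables occurring in the whole pattern outside p."""
    op = p[0]
    if op == "BGP":
        return True
    if op == "FILTER":
        return is_well_designed(p[1], outside | filter_vars(p[2]))
    left, right = p[1], p[2]
    # Variables of the right argument that are not in the left one
    # must not occur anywhere outside this OPT occurrence.
    if op == "OPT" and (vars_of(right) - vars_of(left)) & outside:
        return False
    return (is_well_designed(left, outside | vars_of(right))
            and is_well_designed(right, outside | vars_of(left)))


a = ("BGP", [("?x", "ex:p", "?y")])
b = ("BGP", [("?x", "ex:q", "?z")])
c = ("BGP", [("?x", "ex:r", "?z")])
print(is_well_designed(("OPT", a, b)))              # ?z occurs only inside the OPT
print(is_well_designed(("OPT", ("OPT", a, b), c)))  # ?z of b reappears in c
```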
Source 1.
There are two substantially different ways of nesting the OPT operator in patterns:
$$P_{1}~{{\mathsf{OPT}}}~(P_{2}~{{\mathsf{OPT}}}~P_{3}), \tag{OptR}$$
$$(P_{1}~{{\mathsf{OPT}}}~P_{2})~{{\mathsf{OPT}}}~P_{3}. \tag{OptL}$$
Non-well-designed nesting of type (OptR) is responsible for the PSpace-hardness of query evaluation [33, 38]. Moreover, such nesting is not very intuitive unless well-designed. By contrast, as we saw in the introduction, non-well-designed nesting of type (OptL) can be used for prioritising some parts of patterns over others, and is indeed used in real life. As we will see later, nesting of type (OptL) cannot lead to high complexity of evaluation.
Source 2.
Well-designedness can be violated by using “dangerous” variables from the right argument of OPT in filter constraints. In particular, patterns of the form (P _{1} OPT P _{2}) FILTER R with R using a variable from vars(P _{2}) ∖ vars(P _{1}) are not well-designed, but rather frequent in practice. However, such patterns almost never occur inside the right argument of other OPT-patterns. We will see that if we restrict the usage of such filters to the “top level”, we preserve the good computational properties of wd-patterns.
Motivated by these observations, we considerably generalise the notion of wd-patterns to allow for useful queries like (2) and (3) while retaining important properties of such patterns. We start with two auxiliary notions.
Definition 2
Given a pattern P, an occurrence i _{1} in P dominates an occurrence i _{2} if there exists an occurrence j of an OPT-pattern such that i _{1} is inside the left argument of j and i _{2} is inside the right argument.
Definition 3
An occurrence i of a FILTER-pattern P ^{′} FILTER R in P is top-level if there is no occurrence j of an OPT-pattern such that i is inside the right argument of j.
We are ready to give the main definition of this paper.
Definition 4

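A pattern P from \(\mathcal {P}\) is weakly well-designed (or wwd-pattern, for short) if, for each occurrence i of an OPT-pattern P _{1} OPT P _{2} in P, the variables from vars(P _{2}) ∖ vars(P _{1}) occur in P outside i only in

- subpatterns whose occurrences are dominated by i, and

- constraints of top-level occurrences of FILTER-patterns.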
We write \(\mathcal {P}_{\text {wwd}}\) for the fragment of wwd-patterns. They extend wd-patterns by allowing variables from the right argument of an OPT-subpattern that are not “guarded” by the left argument to appear in certain positions outside of the subpattern. Note that the patterns of queries (2) and (3) are wwd-patterns. Also, patterns which allow only for OPT nesting of type (OptL) are always weakly well-designed, as is the pattern on the right-hand side of (5), which expresses DIFF. However, patterns that have subpatterns of the latter form in the right argument of OPT are not weakly well-designed. Next we give a few more examples.
Example 1
Proposition 1
Checking whether a UNION-free pattern P belongs to the fragment \(\mathcal {P}_{\text {wwd}}\) can be done in time O(|P|^{2}), where |P| is the length of the string representation of P.
Proof
Next consider the recursive procedure is_wwd in Fig. 3b, where sort(S) denotes a sorted, repetition-free list representation of a set S.
Given a UNION-free pattern P without top-level filters, it is easily seen that is_wwd(P) returns a tuple of the form (true, vs, ws) if and only if P is weakly well-designed, where ws is the sorted list of “unguarded” variables in P, that is, variables occurring in the second argument of an OPT-subpattern P ^{′} of P but not in the first argument of P ^{′}, and vs = sort(vars(P)) ∖ ws. Procedure is_wwd can be implemented in quadratic time since sort (which may take time O(n log n)) is only applied to atomic subexpressions and set operations on sorted lists take linear time. □
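For illustration, weak well-designedness can also be tested by a direct recursive check. The sketch below is not the is_wwd procedure of Fig. 3b and does not aim at the quadratic bound. It assumes the following reading of the definitions: for each OPT occurrence, the unguarded variables of its right argument may occur outside the occurrence only in subpatterns it dominates or in constraints of top-level FILTER occurrences. The tuple encoding of patterns and all function names are hypothetical. The parameter `bad` collects the variables that occur outside the current subpattern in positions that are neither dominated nor top-level filter constraints.

```python
def is_var(t):
    return isinstance(t, str) and t.startswith("?")


def filter_vars(r):
    out = set()
    for part in r[1:]:
        if isinstance(part, tuple):
            out |= filter_vars(part)
        elif is_var(part):
            out.add(part)
    return out


def vars_of(p):
    if p[0] == "BGP":
        return {t for triple in p[1] for t in triple if is_var(t)}
    if p[0] == "FILTER":
        return vars_of(p[1]) | filter_vars(p[2])
    return vars_of(p[1]) | vars_of(p[2])


def restricted_vars(p, in_right):
    """Variables of p, ignoring constraints of top-level FILTER occurrences."""
    if p[0] == "BGP":
        return vars_of(p)
    if p[0] == "FILTER":
        own = filter_vars(p[2]) if in_right else set()
        return restricted_vars(p[1], in_right) | own
    right_flag = True if p[0] == "OPT" else in_right
    return restricted_vars(p[1], in_right) | restricted_vars(p[2], right_flag)


def is_wwd(p, bad=frozenset(), in_right=False):
    """in_right: p lies inside the right argument of some OPT occurrence."""
    op = p[0]
    if op == "BGP":
        return True
    if op == "FILTER":
        # A top-level constraint may mention unguarded variables; others may not.
        own = filter_vars(p[2]) if in_right else set()
        return is_wwd(p[1], bad | own, in_right)
    left, right = p[1], p[2]
    if op == "OPT":
        if (vars_of(right) - vars_of(left)) & bad:
            return False
        # The right argument is dominated by everything inside the left one,
        # so it adds nothing to `bad` for the left argument.
        return (is_wwd(left, bad, in_right)
                and is_wwd(right, bad | restricted_vars(left, in_right), True))
    if op == "AND":
        return (is_wwd(left, bad | restricted_vars(right, in_right), in_right)
                and is_wwd(right, bad | restricted_vars(left, in_right), in_right))
    raise ValueError("expected a UNION-free pattern")


a = ("BGP", [("?x", "ex:p", "?y")])
b = ("BGP", [("?x", "ex:q", "?z")])
c = ("BGP", [("?x", "ex:r", "?z")])
prefer = ("OPT", ("OPT", a, b), c)                     # nesting of type (OptL)
top_filter = ("FILTER", ("OPT", a, b), ("bound", "?z"))
print(is_wwd(prefer), is_wwd(top_filter))
```

Under the stated reading, both sample patterns are accepted, while placing `top_filter` inside the right argument of another OPT, or conjoining `("OPT", a, b)` with `c`, is rejected.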
4 OPT-FILTER-Normal Form and Constraint Pattern Trees
One of the key properties of wd-patterns is that they can always be converted to a so-called OPT-normal form, in which all AND- and FILTER-subpatterns are OPT-free [33]. Also, FILTER-free patterns in OPT-normal form can be naturally represented as trees of a special form [27, 35], which give a good intuition for the evaluation and optimisation of such patterns. In this section, we show that these notions can be generalised to wwd-patterns.
Definition 5
1. (occurrences of) basic patterns as the bottom layer,
2. a FILTER on top of each basic pattern as the middle layer,
3. a combination of OPT and FILTER as the top layer;
Example 2
As shown by Letelier et al. [27], FILTER-free patterns in OPT-normal form can be represented by means of so-called pattern trees. We next show that this representation can be naturally extended to patterns in OF-normal form.
Definition 6
1. if B is a basic pattern then \(\mathcal {T}(B~ {{\mathsf {FILTER}}}~ R)\) is a single node v labelled by the pair (B, R);
2. if P ^{′} is not a basic pattern then \(\mathcal {T}(P^{\prime }~ {{\mathsf {FILTER}}} ~R)\) is obtained by adding a special node labelled by R as the last child of the root of \(\mathcal {T}(P^{\prime })\);
3. \(\mathcal {T}(P_{1}~{{\mathsf {OPT}}}~P_{2})\) is the tree obtained from \(\mathcal {T}(P_{1})\) and \(\mathcal {T}(P_{2})\) by adding the root of \(\mathcal {T}(P_{2})\) as the last child of the root of \(\mathcal {T}(P_{1})\).
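The three construction rules can be sketched as a recursive function. The `Node` class, the tuple encoding of patterns, and the sample pattern are assumptions made for illustration; the input is expected to consist only of `B FILTER R` leaves combined by OPT and FILTER, and anything else is rejected.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: object                      # (B, R) for ordinary nodes, R for special nodes
    special: bool = False
    children: list = field(default_factory=list)


def cpt(p):
    """Build the tree T(P) following the three rules above."""
    op = p[0]
    if op == "FILTER" and p[1][0] == "BGP":
        # Rule 1: a single node labelled by the pair (B, R).
        return Node(label=(tuple(p[1][1]), p[2]))
    if op == "FILTER":
        # Rule 2: a special node labelled R becomes the last child of the root.
        root = cpt(p[1])
        root.children.append(Node(label=p[2], special=True))
        return root
    if op == "OPT":
        # Rule 3: the root of T(P2) becomes the last child of the root of T(P1).
        root = cpt(p[1])
        root.children.append(cpt(p[2]))
        return root
    raise ValueError("unexpected operator: " + str(op))


def leaf(triples):
    """Hypothetical helper: wrap a basic pattern as B FILTER top."""
    return ("FILTER", ("BGP", triples), ("top",))


b0 = leaf([("?x", "ex:p", "?y")])
b1 = leaf([("?x", "ex:q", "?z")])
b2 = leaf([("?x", "ex:r", "?w")])
pattern = ("OPT", ("FILTER", ("OPT", b0, b1), ("bound", "?z")), b2)
tree = cpt(pattern)
print([child.special for child in tree.children])
```

In the resulting tree the root carries the label of `b0`, and its children are, in order, the node for `b1`, the special node for the filter constraint, and the node for `b2`.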
Proposition 2
Let P be a pattern in OF-normal form. Then every special node in \(\mathcal {T}(P)\) is a child of the root.
Proof
Next we show that each wwd-pattern can be converted to OF-normal form and hence can be represented by a CPT. To prove this statement we make use of a number of equivalences. Formally, a pattern P _{1} is equivalent to a pattern P _{2} (written P _{1} ≡ P _{2}) if [[P _{1}]]_{ G } = [[P _{2}]]_{ G } holds for any graph G. There are several equivalences, such as associativity and commutativity of AND, as well as filter decompositions, such as P FILTER (R _{1} ∧ R _{2}) ≡ (P FILTER R _{1}) FILTER R _{2}, which hold for all patterns (see [38] for an extensive list). Moreover, the key equivalences used in [33] for normalising wd-patterns can easily be adapted to serve our needs.
Proposition 3
Proof
Both equivalences are essentially shown in [33]. While stated for well-designed patterns, the proof only exploits the properties vars(P _{2}) ∩ vars(P _{3}) ⊆ vars(P _{1}) and vars(P _{2}) ∩ vars(R) ⊆ vars(P _{1}), which are satisfied not only by well-designed patterns, but also by weakly well-designed patterns. □
Since all the equivalences preserve weak well-designedness, we obtain the desired result.
Proposition 4
Each wwd-pattern P is equivalent to a wwd-pattern in OF-normal form of size O(|P|).
Proof

Let P ^{′} = (P _{1} OPT P _{2}) AND P _{3}. Since P is weakly well-designed and the occurrence of P _{3} is not dominated by the occurrence of P _{1} OPT P _{2}, we have vars(P _{3}) ∩ vars(P _{2}) ⊆ vars(P _{1}). Therefore, using the first equivalence in Proposition 3, we can rewrite P to a pattern S by replacing P ^{′} with (P _{1} AND P _{3}) OPT P _{2}. Moreover, we have P = S and P > S.

Let P ^{′} = (P _{1} OPT P _{2}) FILTER R where the occurrence of P ^{′} is not top-level. Since P is weakly well-designed, we then have vars(R) ∩ vars(P _{2}) ⊆ vars(P _{1}), and thus, with the second equivalence in Proposition 3, we can rewrite P to a pattern S by replacing P ^{′} with (P _{1} FILTER R) OPT P _{2}. Moreover, we have P = S and P > S.
Finally, S can be transformed to OF-normal form by replacing every occurrence of an AND-FILTER combination of basic patterns by B FILTER R where B consists of all triples in the basic patterns and R is a conjunction of all the filter conditions (if there are no filters in the combination, then R is ⊤). Clearly, this transformation is equivalence-preserving and linear in S. □
Relying on this proposition, in the rest of the paper we silently assume that all wwd-patterns are in OF-normal form and hence can be represented by CPTs.
We next transfer the notion of weak well-designedness to CPTs. Given a pattern P in OF-normal form, let ≺ be the strict topological sorting of the nodes in \(\mathcal {T}(P)\) computed by a depth-first search traversal visiting the children of a node according to their ordering (i.e., v ≺ u holds if v is visited before u).
Lemma 1
Let P be a pattern in OF-normal form and P ^{′} = P _{1} OPT P _{2} be a subpattern of P. Then v ≺ w for every two nodes v, w in \(\mathcal {T}(P)\) such that v is in the subtree of \(\mathcal {T}(P)\) corresponding to P _{1} and w is in the subtree corresponding to P _{2}.
Proof
The claim follows since \(\mathcal {T}(P^{\prime })\) is constructed by attaching \(\mathcal {T}(P_{2})\) as the last child to the root of \(\mathcal {T}(P_{1})\). □
In the following proposition, vars(u) for a node u of a CPT stands for the set of all variables in the label of u.
Proposition 5
A pattern P in OF-normal form is weakly well-designed if and only if, for each edge (v, u) with non-special u in the CPT \(\mathcal {T}(P)\), every variable ?x ∈ vars(u) ∖ vars(v) occurs only in nodes w such that v ≺ w. The pattern is well-designed if and only if for every variable ?x in P the set of all nodes v in \(\mathcal {T}(P)\) with ?x ∈ vars(v) is connected.
Proof

Let P = B FILTER R where B is basic. Then the claim is vacuous.

Let P = P _{1} FILTER R where P _{1} is not basic. By the inductive hypothesis, the claim holds for \(\mathcal {T}(P_{1})\). Moreover, \(\mathcal {T}(P)\) differs from \(\mathcal {T}(P_{1})\) only in the special node labelled with R, and the claim follows by Proposition 2.

Let P = P _{1} OPT P _{2}. By the inductive hypothesis, the claim holds for \(\mathcal {T}(P_{1})\) and \(\mathcal {T}(P_{2})\). Thus, by Lemma 1, it suffices to show that for every edge (v, u) in \(\mathcal {T}(P_{2})\) (with non-special u by definition), no variable ?x ∈ vars(u) ∖ vars(v) occurs in \(\mathcal {T}(P_{1})\). Suppose for contradiction that this property is violated for some (v, u) and ?x. Then P _{2} has a subpattern \(P^{\prime }=P^{\prime }_{1}~{{\mathsf {OPT}}}~P^{\prime }_{2}\) such that \(\mathcal {T}(P^{\prime }_{1})\) is a subtree of \(\mathcal {T}(P)\) rooted at v and \(\mathcal {T}(P^{\prime }_{2})\) is the complete subtree of \(\mathcal {T}(P)\) rooted at u. Moreover, ?x occurs in P _{1}, and thus outside P ^{′}. Since all FILTER-subpatterns in P are safe, we can assume without loss of generality that the occurrence of ?x in P _{1} is not in a filter constraint. However, this contradicts the assumption that P is weakly well-designed since the occurrence of ?x in P _{1} is not dominated by the occurrence of P ^{′}.
For the backward direction of the first claim, suppose P is not a wwd-pattern. Then P has a subpattern \(P^{\prime }=P^{\prime }_{1}~{{\mathsf {OPT}}}~ P^{\prime }_{2}\), with v the root of \(\mathcal {T}(P^{\prime })\) in \(\mathcal {T}(P)\) and u the child of v corresponding to \(\mathcal {T}(P^{\prime }_{2})\), and a variable \(?x\in {\mathsf {vars}(P^{\prime }_{2})}\setminus {\mathsf {vars}(P^{\prime }_{1})}\) such that ?x ∈ vars(u) and, for some subpattern P _{1} OPT P _{2} of P, ?x occurs in P _{1} and P ^{′} occurs in P _{2}. Since \(?x\in {\mathsf {vars}(P^{\prime }_{2})}\setminus {\mathsf {vars}(P^{\prime }_{1})}\) and ?x ∈ vars(u), we have ?x ∈ vars(u) ∖ vars(v). Thus, by Lemma 1, we have v ⊀ w, where w is a node in \(\mathcal {T}(P_{1})\) with an occurrence of ?x.
The second claim can be proved analogously. □
We conclude this section with a property that is unique to wwd-patterns: each wwd-pattern is equivalent to a pattern whose corresponding CPT has depth one.
Definition 7
To show that each wwd-pattern can be brought to this form, we exploit the following observation in [33].
Lemma 2 (Pérez et al. [33])
Let P be a pattern in \(\mathcal {P}\) , G a graph, and μ _{1} , μ _{2} two mappings in [[P]]_{ G } . Then μ _{1} ∼ μ _{2} if and only if μ _{1} = μ _{2} .
This lemma allows us to prove the following crucial equivalence.
Proposition 6
Proof

Let μ ∈ [[P _{1}]]_{ G } ⋈ ([[P _{2}]]_{ G } ⋈ [[P _{3}]]_{ G }). Then, by Lemma 2, we have [[P _{2}]]_{ G } = [[P _{2}]]_{ G } ⋈ [[P _{2}]]_{ G }. Consequently, μ ∈ ([[P _{1}]]_{ G } ⋈ [[P _{2}]]_{ G }) ⋈ ([[P _{2}]]_{ G } ⋈ [[P _{3}]]_{ G }), and the claim follows.

Let μ ∈ [[P _{1}]]_{ G } ⋈ ([[P _{2}]]_{ G } ∖ [[P _{3}]]_{ G }). Then μ = μ _{1} ∪ μ _{2} such that μ _{1} ∈ [[P _{1}]]_{ G }, μ _{2} ∈ [[P _{2}]]_{ G }, and for every μ _{3} ∈ [[P _{3}]]_{ G }, μ _{2} ≁ μ _{3}. Since every mapping in [[P _{2} AND P _{3}]]_{ G } is an extension of some mapping in [[P _{3}]]_{ G }, no mapping in [[P _{2} AND P _{3}]]_{ G } is compatible with μ _{2}, and hence with μ. Therefore, μ ∈ ([[P _{1}]]_{ G } ⋈ [[P _{2}]]_{ G }) ∖ [[P _{2} AND P _{3}]]_{ G }, and the claim follows.

Let μ ∈ [[P _{1}]]_{ G } ∖ [[P _{2} OPT P _{3}]]_{ G }. Then μ ∈ [[P _{1}]]_{ G } and is incompatible with any mapping in [[P _{2} OPT P _{3}]]_{ G }. Moreover, since vars(P _{1}) ∩ vars(P _{3}) ⊆ vars(P _{2}), μ is incompatible with any mapping in [[P _{2}]]_{ G }, and consequently also with any mapping in [[P _{2} AND P _{3}]]_{ G }. Therefore, μ ∈ ([[P _{1}]]_{ G } ∖ [[P _{2}]]_{ G }) ∖ [[P _{2} AND P _{3}]]_{ G }, and the claim follows.

Let μ ∈ ([[P _{1}]]_{ G } ⋈ [[P _{2}]]_{ G }) ⋈ ([[P _{2}]]_{ G } ⋈ [[P _{3}]]_{ G }). Then, by Lemma 2, we have μ ∈ [[P _{1}]]_{ G } ⋈ ([[P _{2}]]_{ G } ⋈ [[P _{3}]]_{ G }), and the claim follows.

Let μ ∈ ([[P _{1}]]_{ G } ⋈ [[P _{2}]]_{ G }) ∖ ([[P _{2}]]_{ G } ⋈ [[P _{3}]]_{ G }). Then μ = μ _{1} ∪ μ _{2} such that μ _{1} ∈ [[P _{1}]]_{ G }, μ _{2} ∈ [[P _{2}]]_{ G }, and μ is incompatible with every mapping in [[P _{2}]]_{ G } ⋈ [[P _{3}]]_{ G }. Since vars(P _{1}) ∩ vars(P _{3}) ⊆ vars(P _{2}), this implies that [[P _{2}]]_{ G } ⋈ [[P _{3}]]_{ G } is empty, that is, μ _{2} is incompatible with every mapping in [[P _{3}]]_{ G }. Therefore, μ _{2} ∈ [[P _{2}]]_{ G } ∖ [[P _{3}]]_{ G }, and thus μ ∈ [[P _{1}]]_{ G } ⋈ ([[P _{2}]]_{ G } ∖ [[P _{3}]]_{ G }). The claim follows.

Let μ ∈ [[P _{1}]]_{ G } ∖ [[P _{2}]]_{ G }. Since every mapping in [[P _{2} OPT P _{3}]]_{ G } extends a mapping in [[P _{2}]]_{ G }, we have that μ ∈ [[P _{1}]]_{ G } ∖ [[P _{2} OPT P _{3}]]_{ G }, and the claim follows.
Applied from left to right, equivalence (13) preserves weak well-designedness (but not well-designedness). Each such application transforms a weakly well-designed OPT nesting of type (OptR) to a nesting of type (OptL), decreasing the depth of the CPT.
Corollary 1
Every wwd-pattern is equivalent to a wwd-pattern in depth-one normal form.
5 Evaluation of wwd-Patterns
In this section, we look at the query answering problem for wwd-patterns and their extensions with union and projection. We show that in all three cases, complexity remains the same as for wd-patterns. To obtain these results, we develop several new techniques.
It is known that \(\textsc {Eval}({\mathcal {U}})\) for general patterns \(\mathcal {U}\) is PSpace-complete [33], and the result easily propagates to queries with projection (i.e., \(\mathcal {S}\)) [27]. For wd-patterns, the evaluation problem is coNP-complete, and can be solved by exploiting the following idea of [27].
Suppose we are given a wd-pattern P in OPT-normal form (for simplicity, assume that P is FILTER-free), a graph G, and a mapping μ. First, we look for a subtree of \(\mathcal {T}(P)\) that includes the root of \(\mathcal {T}(P)\), contains precisely the variables in dom(μ), and “matches” G under μ (i.e., images of all its triples under μ are contained in G). This is doable in polynomial time. If such a subtree does not exist, then μ cannot be a solution. Otherwise, the subtree witnesses that μ is a part of a solution to P. Finally, to verify that μ is a complete solution, we need to check that the subtree is maximal, that is, cannot be extended to any more nodes in \(\mathcal {T}(P)\) with a match in G. There are linearly many such nodes to check, and each check can be performed in coNP. So, the overall algorithm runs in coNP.
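The idea can be sketched as follows for FILTER-free pattern trees of well-designed patterns. The encoding of a tree as a pair `(triples, children)`, the function names, and the sample data are assumptions for illustration. The maximality test enumerates extensions by backtracking, so it may take exponential time in the worst case, which is consistent with each check being in coNP rather than known to be polynomial.

```python
def is_var(t):
    return t.startswith("?")


def extensions(mu, triples, graph):
    """Yield every mapping that extends mu and sends all `triples` into `graph`."""
    if not triples:
        yield mu
        return
    first, rest = triples[0], triples[1:]
    for fact in graph:
        nu = dict(mu)
        if all(nu.setdefault(t, val) == val if is_var(t) else t == val
               for t, val in zip(first, fact)):
            yield from extensions(nu, rest, graph)


def is_solution(tree, graph, mu):
    def node_vars(triples):
        return {t for triple in triples for t in triple if is_var(t)}

    def matched(triples):
        # All variables are bound by mu and the image of the triples lies in the graph.
        return (node_vars(triples) <= mu.keys()
                and next(extensions(mu, triples, graph), None) is not None)

    if not matched(tree[0]):
        return False
    covered, frontier, stack = set(), [], [tree]
    while stack:  # grow the largest matching subtree that contains the root
        triples, children = stack.pop()
        covered |= node_vars(triples)
        for child in children:
            (stack if matched(child[0]) else frontier).append(child)
    if covered != set(mu):
        return False
    # Maximality: no child of the subtree admits a match that extends mu.
    return all(next(extensions(mu, child[0], graph), None) is None
               for child in frontier)


# Hypothetical data: ex:a has a name and an email, ex:b only a name.
graph = {("ex:a", "ex:name", "Alice"), ("ex:a", "ex:email", "a@x"),
         ("ex:b", "ex:name", "Bob")}
tree = ([("?p", "ex:name", "?n")], [([("?p", "ex:email", "?e")], [])])
print(is_solution(tree, graph, {"?p": "ex:b", "?n": "Bob"}))
print(is_solution(tree, graph, {"?p": "ex:a", "?n": "Alice"}))
```

The first mapping is accepted because the optional node has no match for `ex:b`; the second is rejected because it can be extended with a binding for `?e`.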
Inspired by this idea, we next show that the low evaluation complexity of wd-patterns transfers to wwd-patterns by developing a coNP algorithm for Eval(\({\mathcal {P}_{\text {wwd}}}\)).
Let P be a wwd-pattern in OF-normal form. An r-subtree of \(\mathcal {T}(P)\) is a subtree containing the root of \(\mathcal {T}(P)\) and all its special children. Every r-subtree \(\mathcal {T}(P^{\prime })\) of \(\mathcal {T}(P)\) is also a CPT representing a wwd-pattern P ^{′} that can be obtained from P by dropping the right arguments of some OPT-subpatterns (a transformation known from [33]). A child of an r-subtree \(\mathcal {T}(P^{\prime })\) of \(\mathcal {T}(P)\) is a node in \(\mathcal {T}(P)\) that is not contained in \(\mathcal {T}(P^{\prime })\) but whose parent is.
Definition 8
A mapping μ is a potential partial solution (or pp-solution for short) to a wwd-pattern P over a graph G if there is an r-subtree \(\mathcal {T}(P^{\prime })\) of \(\mathcal {T}(P)\) such that dom(μ) = vars(P ^{′}), μ(triples(P ^{′})) ⊆ G, and μ ⊧ R for the constraint R of each ordinary node in \(\mathcal {T}(P^{\prime })\).
A pp-solution μ to P over G can be witnessed by several r-subtrees. However, the union of such r-subtrees is also a witness. Hence, there exists a unique maximal witnessing r-subtree, denoted \(\mathcal {T}(P_{\mu })\), with P _{ μ } being the corresponding wwd-pattern.
Potential partial solutions generalise “partial solutions” as defined in [33] for wd-patterns. There, every “partial solution” is either a solution or can be extended to one. This is not the case for wwd-patterns. While every solution is clearly a pp-solution, not every pp-solution can be extended to a real one. Real solutions may not just extend pp-solutions by assigning previously undefined variables but can also override variable bindings established in some node v of \(\mathcal {T}(P_{\mu })\) by extending to a child of \(\mathcal {T}(P_{\mu })\) that precedes v according to the order ≺.
An additional complication is the presence of non-well-designed top-level filters. Note that pp-solutions are only required to satisfy the constraints of ordinary nodes in the corresponding CPT, thus ignoring top-level filters. Indeed, requiring pp-solutions to satisfy constraints of top-level filters would be too strong since real solutions do not generally satisfy this property, as demonstrated by the following example.
Example 3
We now present a characterisation of solutions for wwd-patterns in terms of pp-solutions that (a) takes into account that not every pp-solution can be extended to a real solution and (b) ensures correct treatment of non-well-designed top-level filters. For this we need some more notation. Given a wwd-pattern P, a node v in \(\mathcal {T}(P)\), a graph G, and a pp-solution μ to P over G, let μ_{ v } be the projection μ_{ X } of μ to the set X of all variables appearing in nodes u of \(\mathcal {T}(P_{\mu })\) such that u ≺ v. A mapping μ _{1} is subsumed by a mapping μ _{2} (written \(\mu _{1} \sqsubseteq \mu _{2}\)) if μ _{1} ∼ μ _{2} and dom(μ _{1}) ⊆ dom(μ _{2}) (this notion is from [5, 33]).
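The notions of projection and subsumption can be sketched with mappings represented as Python dictionaries. The function names are illustrative, and compatibility (∼) is assumed to have its usual meaning: two mappings agree on every variable in both domains.

```python
def compatible(m1, m2):
    """The mappings agree on every variable they share."""
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())


def subsumed(m1, m2):
    """m1 is subsumed by m2: compatible, and dom(m1) is contained in dom(m2)."""
    return m1.keys() <= m2.keys() and compatible(m1, m2)


def project(mu, variables):
    """Restriction of mu to the given set of variables."""
    return {v: val for v, val in mu.items() if v in variables}


small = {"?x": "ex:a"}
big = {"?x": "ex:a", "?y": "ex:b"}
other = {"?x": "ex:c"}
print(subsumed(small, big), subsumed(big, small), compatible(big, other))
print(project(big, {"?y"}))
```

Here `small` is subsumed by `big` but not the other way round, and `other` is incompatible with both because it binds `?x` to a different value.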
Intuitively, a pp-solution μ needs to satisfy two conditions to be a real solution to a wwd-pattern P. First, μ_{ v } (as opposed to μ for wd-patterns) must be non-extendable to v for any child v of \(\mathcal {T}(P_{\mu })\). Indeed, if such an extension exists, then it is either possible to provide bindings for some variables that are undefined in μ, or some variables from dom(μ) can be assigned different values of higher “priority” than the corresponding values in μ. Second, every top-level filter R labelling a node s needs to be satisfied by μ_{ s }, which is precisely the part of μ bound by the subpattern of P that is paired with R in the FILTER-pattern. The following lemma formalises this intuition.
Lemma 3
A mapping μ is a solution to a wwd-pattern P over a graph G if and only if the following conditions hold:
 1. μ is a pp-solution to P over G;
 2. for every child v of \(\mathcal {T}(P_{\mu })\) labelled with (B, R) there is no μ ^{′} such that \(\mu _{v} \sqsubseteq \mu ^{\prime }\), μ ^{′} ⊧ R, and μ ^{′}(B) ⊆ G;
 3. μ_{ s } ⊧ R for every special node s in \(\mathcal {T}(P)\) labelled with R.
Proof
In this proof we write \(\mathcal {T}_{v}\) for the complete subtree of a CPT \(\mathcal {T}\) rooted at a node v (i.e., the subtree over all the descendants of v including v itself) and \(\mathcal {T}_{\prec v}\) for the subtree of \(\mathcal {T}\) consisting of all nodes u such that u ≺ v.
For the forward direction, suppose μ is a solution to P over G. Clearly, μ is a ppsolution to P over G, so it suffices to show that conditions 2 and 3 hold.
For condition 2, assume for contradiction that v is a child of \(\mathcal {T}(P_{\mu })\) labelled with (B, R) and μ ^{′} is a mapping such that \(\mu _{v}\sqsubseteq \mu ^{\prime }\), μ ^{′} ⊧ R, and μ ^{′}(B) ⊆ G. Moreover, without loss of generality, let dom(μ ^{′}) = dom(μ) ∪ vars(B). Let u be the parent of v in \(\mathcal {T}(P)\), and let \(\mathcal {T}\) be the largest subtree of \(\mathcal {T}(P)\) that is rooted at u and has v as the last child of u. Then [equation image missing]. Moreover, since u is contained in \(\mathcal {T}(P_{\mu })\), there is a mapping \(\mu _{1}\sqsubseteq \mu \) such that \(\mu _{1}\in [{\kern 2.3pt}[ \mathcal {T} ]{\kern 2.3pt}]_{G}\). Since v is not contained in \(\mathcal {T}(P_{\mu })\), we have \(\mu _{1}\sqsubseteq \mu _{v}\) and, since \(\mathcal {T}(P_{\mu })\) is the largest r-subtree witnessing μ, μ _{1} is not compatible with any mapping in \([{\kern 2.3pt}[ \mathcal {T}_{v} ]{\kern 2.3pt}]_{G}\). On the other hand, μ ^{′} satisfies the label of v, and thus, since \(\mathcal {T}_{v}\) contains no top-level filters, μ ^{′}_{vars(v)} can be extended to a mapping \(\mu ^{\prime \prime }\in [{\kern 2.3pt}[ \mathcal {T}_{v} ]{\kern 2.3pt}]_{G}\). Moreover, since P is weakly well-designed, \({\mathsf {vars}(\mathcal {T}_{v})}\cap {\mathsf {dom}(\mu _{v})}\subseteq {\mathsf {vars}(v)}\), and hence dom(μ ^{″}) ∩ dom(μ _{1}) ⊆ dom(μ ^{′}). Thus, since μ_{ v } is compatible with μ ^{′}, μ _{1} is compatible with μ ^{″}, in contradiction to the above observation that μ _{1} is not compatible with any mapping in \([{\kern 2.3pt}[ \mathcal {T}_{v} ]{\kern 2.3pt}]_{G}\).
For condition 3, let s be a special node in \(\mathcal {T}(P)\) labelled with R. Since μ is a solution to P, there is some μ _{1} ⊆ μ such that \(\mu _{1}\in [{\kern 2.3pt}[ \mathcal {T}(P)_{\prec s} ]{\kern 2.3pt}]_{G}\) and μ _{1} ⊧ R. Hence, it suffices to show that μ _{1} = μ_{ s }. Clearly, μ _{1} ⊆ μ_{ s } (as μ_{ s } is the largest mapping compatible with μ that can occur in \([{\kern 2.3pt}[ \mathcal {T}(P)_{\prec s} ]{\kern 2.3pt}]_{G}\)), so assume for contradiction that there is a variable ?x ∈ dom(μ_{ s }) ∖ dom(μ _{1}). Then there is a node in \(\mathcal {T}(P_{\mu _{s}})\cap \mathcal {T}(P)_{\prec s}\) that does not occur in \(\mathcal {T}(P_{\mu _{1}})\cap \mathcal {T}(P)_{\prec s}\). This yields a contradiction with \(\mu _{1}\in [{\kern 2.3pt}[ \mathcal {T}(P)_{\prec s} ]{\kern 2.3pt}]_{G}\) analogously to the case of condition 2.
For the backward direction, suppose that μ satisfies conditions 1–3. We show that μ ∈ [[P]]_{ G } by induction on the depth of \(\mathcal {T}(P_{\mu })\), that is, the maximal number of edges between the root and a leaf.

Let u be a special node labelled with R. Then it suffices to show that μ_{ u } ⊧ R, which is immediate since μ_{ u } satisfies condition 3.

Let u be an ordinary node labelled with (B, R). We know that u is not in \(\mathcal {T}(P_{\mu })\). Since v is in \(\mathcal {T}(P_{\mu })\), by condition 2 there is no mapping μ ^{′} such that (a) \(\mu _{u}\sqsubseteq \mu ^{\prime }\), (b) μ ^{′} ⊧ R, and (c) μ ^{′}(B) ⊆ G. Since R is safe, it follows that every mapping satisfying (b) and (c) is incompatible with μ_{ u }. Consequently, every mapping in \([{\kern 2.3pt}[ \mathcal {T}(P)_{u} ]{\kern 2.3pt}]_{G}\) is incompatible with μ_{ u }, and hence \(\mu =\mu _{u}\in [{\kern 2.3pt}[ \mathcal {T}(P)_{\prec u} ]{\kern 2.3pt}]_{G}\setminus [{\kern 2.3pt}[ \mathcal {T}(P)_{u} ]{\kern 2.3pt}]_{G}\), as required.

Let u be an ordinary node labelled with (B ^{′}, R ^{′}) that is contained in \(\mathcal {T}(P_{\mu })\). Then μ = μ_{ u } ∪ μ _{2} where μ _{2} is the projection of μ to the set of variables occurring in the subtree \(\mathcal {T}\) of \(\mathcal {T}(P_{\mu })\) rooted at u (i.e., \(\mathcal {T}=\mathcal {T}(P_{\mu })_{u}\)). Since u is contained in \(\mathcal {T}(P_{\mu })\) and has no special children, μ _{2} is a pp-solution to (the subpattern represented by) \(\mathcal {T}(P)_{u}\). Moreover, μ _{2} satisfies condition 3 with respect to \(\mathcal {T}(P)_{u}\) since \(\mathcal {T}(P)_{u}\) contains no special nodes. We next show that μ _{2} satisfies condition 2 with respect to \(\mathcal {T}(P)_{u}\). Let w be a child of \(\mathcal {T}\) (in \(\mathcal {T}(P)_{u}\)) labelled with (B, R), and assume for contradiction that there is some μ ^{′} such that \({\mu _{2}}_{w}\sqsubseteq \mu ^{\prime }\), μ ^{′} ⊧ R, and μ ^{′}(B) ⊆ G. Without loss of generality, dom(μ ^{′}) = dom(μ _{2}) ∪ vars(B). Thus, since P is weakly well-designed, vars(B) ∩ dom(μ_{ u }) ⊆ vars(B ^{′}) ⊆ dom(μ _{2}). Hence, μ ^{′} is compatible with μ_{ u }, and \(\mu _{w}\sqsubseteq \mu _{u}\cup \mu ^{\prime }\). Moreover, since μ ^{′} and μ_{ u } ∪ μ ^{′} coincide on vars(B) and R is safe, we have that μ_{ u } ∪ μ ^{′} ⊧ R and (μ_{ u } ∪ μ ^{′})(B) ⊆ G, contradicting the assumption for μ. Since μ _{2} satisfies conditions 1–3 with respect to \(\mathcal {T}(P)_{u}\), by the outer inductive hypothesis we obtain that \(\mu _{2}\in [{\kern 2.3pt}[ \mathcal {T}(P)_{u} ]{\kern 2.3pt}]_{G}\), and hence \(\mu \in [{\kern 2.3pt}[ \mathcal {T}(P)_{\prec u} ]{\kern 2.3pt}]_{G}\Join [{\kern 2.3pt}[ \mathcal {T}(P)_{u} ]{\kern 2.3pt}]_{G}\) (as \(\mu _{u}\in [{\kern 2.3pt}[ \mathcal {T}(P)_{\prec u} ]{\kern 2.3pt}]_{G}\) holds by the inner inductive hypothesis). The claim follows.
Checking whether a mapping μ satisfies this characterisation is feasible in coNP, and the matching lower bound follows from the coNP-hardness of evaluation of wd-patterns [33].
Theorem 1
Problem Eval \(({\mathcal {P}_{\text {wwd}}})\) is coNP-complete.
Proof
The lower bound of this statement is known from [33], and the upper bound can be obtained from Lemma 3 as follows.
First we show that testing whether μ is a pp-solution takes polynomial time, as does computing the maximal witnessing tree \(\mathcal {T}(P_{\mu })\). We proceed from the root of the tree down along the branches, adding a child v as long as μ(triples(v)) ⊆ G and μ satisfies the constraint in v, and then check that the variables in the resulting tree are exactly dom(μ). The crucial part is therefore to check condition 2 of Lemma 3, namely that μ cannot be extended to any child of \(\mathcal {T}(P_{\mu })\). There are only linearly many children, and each check can be done in coNP. Finally, the checks for top-level filters are again polynomial. □
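The polynomial-time part of this argument, the top-down computation of the maximal witnessing subtree, can be sketched as follows. This is a minimal illustration under simplifying assumptions that are not taken from the paper: the tree contains only ordinary nodes (top-level filters are ignored), variables are strings starting with `?`, and the names `Node`, `matches`, and `maximal_witness` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Mapping = Dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


@dataclass
class Node:
    """Ordinary node of a pattern tree: a basic pattern plus an optional constraint."""
    triples: List[Triple]
    constraint: Optional[Callable[[Mapping], bool]] = None
    children: List["Node"] = field(default_factory=list)


def matches(node: Node, mu: Mapping, graph: Set[Triple]) -> bool:
    """True if mu binds all variables of the node, maps its triples into the graph,
    and satisfies the node's constraint."""
    for triple in node.triples:
        if any(is_var(x) and x not in mu for x in triple):
            return False
        if tuple(mu.get(x, x) for x in triple) not in graph:
            return False
    return node.constraint is None or node.constraint(mu)


def maximal_witness(root: Node, mu: Mapping, graph: Set[Triple]) -> Optional[List[Node]]:
    """Return the nodes of the maximal subtree witnessing mu, or None if mu is not
    a pp-solution (in the simplified setting described above)."""
    if not matches(root, mu, graph):
        return None
    included, stack = [], [root]
    while stack:
        node = stack.pop()
        included.append(node)
        # A child is added only if its parent is already in the subtree.
        stack.extend(c for c in node.children if matches(c, mu, graph))
    covered = {x for n in included for t in n.triples for x in t if is_var(x)}
    return included if covered == set(mu) else None
```

The sketch visits every node at most once and performs one membership test per triple, which matches the polynomial bound claimed for this step. The coNP part (non-extendability to the children of the witnessing subtree) is deliberately not covered here.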
Pérez et al. [33] extended wd-patterns to UNION by considering unions of wd-patterns, that is, patterns of the form P _{1} UNION … UNION P _{ n } with all \(P_{i}\in \mathcal {P}_{\text {wd}}\). We denote the resulting fragment by \(\mathcal {U}_{\text {wd}}\). This syntactic restriction on the use of UNION in \(\mathcal {U}_{\text {wd}}\) is motivated by the fact that any pattern in \(\mathcal {U}\) can be equivalently expressed as a union of UNION-free patterns [33]. We denote the fragment of all queries over patterns in \(\mathcal {U}_{\text {wd}}\) by \(\mathcal {S}_{\text {wd}}\). Similarly, we write \(\mathcal {U}_{\text {wwd}}\) for unions of wwd-patterns and \(\mathcal {S}_{\text {wwd}}\) for queries over unions of wwd-patterns.
Analogously to the well-designed case, Theorem 1 extends to fragments \(\mathcal {U}_{\text {wwd}}\) and \(\mathcal {S}_{\text {wwd}}\).
Corollary 2
Problem Eval ( \({\mathcal {U}_{\text {wwd}}}\) ) is coNP-complete, and Eval ( \({\mathcal {S}_{\text {wwd}}}\) ) is \({{\Sigma }_{2}^{p}}\)-complete.
The coNP algorithm for \(\mathcal {U}_{\text {wwd}}\) is obtained simply by applying the algorithm for \(\mathcal {P}_{\text {wwd}}\) to each pattern in the union. Hardness for \(\mathcal {S}_{\text {wwd}}\) follows from the hardness of the well-designed case [27], while for membership we just guess the values of the existential variables and then call a coNP oracle for \(\mathcal {U}_{\text {wwd}}\) on the resulting mapping and the normalised body of the query.
Hence, the complexity of evaluation for wwd-patterns is the same as for wd-patterns. We next show that wwd-patterns are, in a certain sense, a maximal extension of wd-patterns that preserves coNP evaluation complexity (under the usual complexity-theoretic assumptions).

some subpatterns whose occurrences are not dominated by i, or

constraints of some non-top-level occurrences of FILTER-patterns.
For the first relaxation, the arguably simplest special case would be to allow for some non-well-designed OPT-nesting of type (OptR). Consider the fragment \(\mathcal {P}_{\text {optr}}\) of patterns of the form B _{1} OPT (B _{2} OPT B _{3}), where B _{1}, B _{2} and B _{3} are basic patterns. Intuitively, \(\mathcal {P}_{\text {optr}}\) allows for the simplest form of non-well-designed nesting of type (OptR).
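To make the shape of such patterns concrete, the following sketch evaluates B _{1} OPT (B _{2} OPT B _{3}) naively under the standard set semantics of OPT (join where possible, otherwise keep the left mapping). It is an illustration only: the evaluator is brute-force, the function names are hypothetical, and the example graph and IRIs in the usage note are made up rather than taken from the paper.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Mapping = Dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def eval_basic(pattern: List[Triple], graph: Set[Triple]) -> List[Mapping]:
    """All mappings that send every triple pattern of the basic pattern into the graph."""
    results: List[Mapping] = [{}]
    for tp in pattern:
        extended = []
        for mu in results:
            for fact in graph:
                nu, ok = dict(mu), True
                for term, value in zip(tp, fact):
                    if is_var(term):
                        # Bind the variable, or check the existing binding.
                        if nu.setdefault(term, value) != value:
                            ok = False
                            break
                    elif term != value:
                        ok = False
                        break
                if ok:
                    extended.append(nu)
        results = extended
    return results


def compatible(m1: Mapping, m2: Mapping) -> bool:
    return all(m2.get(x, v) == v for x, v in m1.items())


def opt(left: List[Mapping], right: List[Mapping]) -> List[Mapping]:
    """Left outer join of two sets of mappings (set semantics, duplicates removed)."""
    out: List[Mapping] = []
    for m1 in left:
        joined = [{**m1, **m2} for m2 in right if compatible(m1, m2)]
        for m in (joined or [m1]):
            if m not in out:
                out.append(m)
    return out


def eval_optr(b1, b2, b3, graph: Set[Triple]) -> List[Mapping]:
    """Evaluate B1 OPT (B2 OPT B3)."""
    inner = opt(eval_basic(b2, graph), eval_basic(b3, graph))
    return opt(eval_basic(b1, graph), inner)
```

As a usage example, take B1 = `[("?x", "type", "P")]`, B2 = `[("?y", "q", "?z")]`, and B3 = `[("?x", "r", "?z")]`, where `?x` occurs in B1 and B3 but not in B2. Over the graph `{("a", "type", "P"), ("b", "q", "c")}` the only answer binds `?x`, `?y`, and `?z`; after adding the triple `("d", "r", "c")` the only answer binds `?x` alone, because the inner OPT now binds `?x` to a value that conflicts with B1.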
Theorem 2
Problem Eval \(({\mathcal {P}_{\text {optr}}})\) is \({{\Pi }_2^p}\)-complete.
Proof
This theorem is a corollary of [38, Theorem 4] for their class \({\mathcal {E}}_{\leq 3}\), but without UNION. □
Theorem 3
Problem Eval ( \({\mathcal {P}_{\text {filter2}}}\) ) is \({{\Pi }_2^p}\)-complete.
Proof

If \(\mu ^{\prime }\in [{\kern 2.3pt}[ {B_{2}\cup B_{1}} ]{\kern 2.3pt}]_{G}\Join [{\kern 2.3pt}[ {B^{\prime }_{3}} ]{\kern 2.3pt}]_{G}\), then \(\mu ^{\prime }\models (?x^{\prime }_{1} = {}?x_{1}) \wedge {\dots } \wedge (?x^{\prime }_{n} = {} ?x_{n})\). Hence, \(\mu ^{\prime }_{{\mathsf {vars}(B_{2})}\cup {\mathsf {vars}(B_{3})}}\in [{\kern 2.3pt}[ {B_{2}~{{\mathsf {OPT}}}~B_{3}} ]{\kern 2.3pt}]_{G}\). Consequently, by assumption, \(\mu \not \sim \mu ^{\prime }_{{\mathsf {vars}(B_{2})}\cup {\mathsf {vars}(B_{3})}}\), and thus μ ≁ μ ^{′}.

If \(\mu ^{\prime }\in [{\kern 2.3pt}[ {B_{2}\cup B_{1}} ]{\kern 2.3pt}]_{G}\setminus [{\kern 2.3pt}[ {B^{\prime }_{3}} ]{\kern 2.3pt}]_{G}\), then \(\mu ^{\prime }_{{\mathsf {vars}(B_{2})}}\in [{\kern 2.3pt}[ {B_{2}} ]{\kern 2.3pt}]_{G}\setminus [{\kern 2.3pt}[ {B^{\prime }_{3}} ]{\kern 2.3pt}]_{G}=[{\kern 2.3pt}[ {B_{2}} ]{\kern 2.3pt}]_{G}\setminus [{\kern 2.3pt}[ {B_{3}} ]{\kern 2.3pt}]_{G}\subseteq [{\kern 2.3pt}[ {B_{2}}~{{\mathsf {OPT}}}~B_{3}]{\kern 2.3pt}]_{G}\). Therefore, by assumption, \(\mu \not \sim \mu ^{\prime }_{{\mathsf {vars}(B_{2})}}\), and hence μ ≁ μ ^{′}.

Suppose \(\mu ^{\prime }\in [{\kern 2.3pt}[ {B_{2}} ]{\kern 2.3pt}]_{G}\Join [{\kern 2.3pt}[ {B_{3}} ]{\kern 2.3pt}]_{G}\). Then there is some μ ^{″} such that \(\mu \cup \mu ^{\prime }\cup \mu ^{\prime \prime }\in [{\kern 2.3pt}[ {B_{1}} ]{\kern 2.3pt}]_{G}\Join [{\kern 2.3pt}[ {B_{2}} ]{\kern 2.3pt}]_{G}\Join [{\kern 2.3pt}[ {B^{\prime }_{3}} ]{\kern 2.3pt}]_{G} \) and \(\mu \cup \mu ^{\prime }\cup \mu ^{\prime \prime }\models (?x^{\prime }_{1} = {} ?x_{1}) \wedge {\dots } \wedge (?x^{\prime }_{n} = {} ?x_{n})\). Thus, \(\mu \cup \mu ^{\prime }\cup \mu ^{\prime \prime }\in [{\kern 2.3pt}[ {((B_{2} \cup B_{1})~{{\mathsf {OPT}}}~ B^{\prime }_{3})~{{\mathsf {FILTER}}}~R} ]{\kern 2.3pt}]_{G} \), which is a contradiction.

Suppose \(\mu ^{\prime }\in [{\kern 2.3pt}[ {B_{2}} ]{\kern 2.3pt}]_{G}\setminus [{\kern 2.3pt}[ {B_{3}} ]{\kern 2.3pt}]_{G}=[{\kern 2.3pt}[ {B_{2}} ]{\kern 2.3pt}]_{G}\setminus [{\kern 2.3pt}[ {B^{\prime }_{3}} ]{\kern 2.3pt}]_{G} \). Then \(\mu \cup \mu ^{\prime }\in [{\kern 2.3pt}[ {B_{1}\cup B_{2}} ]{\kern 2.3pt}]_{G}\setminus [{\kern 2.3pt}[ {B^{\prime }_{3}} ]{\kern 2.3pt}]_{G} \) and \(\mu \cup \mu ^{\prime }\models \neg bound(?x^{\prime }_{1})\), and hence \(\mu \cup \mu ^{\prime }\in [{\kern 2.3pt}[ {((B_{2} \cup B_{1})~{{\mathsf {OPT}}}~B^{\prime }_{3})~\mathsf {FILTER}~R} ]{\kern 2.3pt}]_{G} \), which is again a contradiction.
Theorems 2 and 3 suggest that \(\mathcal {P}_{\text {wwd}}\) is a maximal fragment of \(\mathcal {P}\) that does not impose structural restrictions on basic patterns or filter constraints and has a coNP evaluation algorithm (assuming \(\text {\textsc {coNP}} \neq {{\Pi }_2^p}\)). Hence, going beyond wwdpatterns while preserving good computational properties requires more refined restrictions, possibly in the spirit of [27, Section 4].
6 Expressivity of wwdPatterns and Their Extensions
In this section, we analyse the expressive power of our fragments.
Definition 9
A language \(\mathcal {L}_{1}\) is strictly less expressive than a language \(\mathcal {L}_{2}\) (written \(\mathcal {L}_{1}<\mathcal {L}_{2}\)) if for every query Q _{1} in \(\mathcal {L}_{1}\) there is a query Q _{2} in \(\mathcal {L}_{2}\) such that Q _{1} ≡ Q _{2}, and there is a query Q _{2} in \(\mathcal {L}_{2}\) such that Q _{1} ≢ Q _{2} for every query Q _{1} in \(\mathcal {L}_{1}\).
We begin with UNIONfree patterns, establishing that \(\mathcal {P}_{\text {wd}}<\mathcal {P}_{\text {wwd}}<\mathcal {P}\), and then proceed to unions, showing that \(\mathcal {U}_{\text {wd}}<\mathcal {U}_{\text {wwd}}<\mathcal {U}\), and queries, showing that \(\mathcal {S}_{\text {wd}}<\mathcal {S}_{\text {wwd}}<\mathcal {S}\).
Following [5, 33], a set of mappings Ω_{1} is subsumed by a set of mappings Ω_{2} (written \({\Omega }_{1} \sqsubseteq {\Omega }_{2}\)) if for every μ _{1} ∈ Ω_{1} there exists a mapping μ _{2} ∈ Ω_{2} such that \(\mu _{1} \sqsubseteq \mu _{2}\). A query Q is weakly monotone if \([{\kern 2.3pt}[ Q ]{\kern 2.3pt}]_{G_{1}} \sqsubseteq [{\kern 2.3pt}[ Q ]{\kern 2.3pt}]_{G_{2}}\) for any two graphs G _{1} and G _{2} with G _{1} ⊆ G _{2}, and a fragment \(\mathcal {L}\) is weakly monotone if it contains only weakly monotone queries. Arenas and Pérez [5] showed that, unlike \(\mathcal {P}\), the fragment \(\mathcal {P}_{\text {wd}}\) is weakly monotone, and hence \(\mathcal {P}_{\text {wd}}<\mathcal {P}\).
Example 4
Analogously, we show that \(\mathcal {P}_{\text {wd}}<\mathcal {P}_{\text {wwd}}\) by observing that \(\mathcal {P}_{\text {wwd}}\) is not weakly monotone.
Proposition 7
Fragment \(\mathcal {P}_{\text {wwd}}\) is not weakly monotone.
Proof
An alternative proof of \(\mathcal {P}_{\text {wd}}<\mathcal {P}_{\text {wwd}}\) can be obtained by adapting Theorem 3.5 in [6], which exhibits a weakly welldesigned, weakly monotone pattern that is not equivalent to any welldesigned pattern.
To distinguish \(\mathcal {P}_{\text {wwd}}\) from \(\mathcal {P}\) we need a different property.
Definition 10
A query Q is nonreducing if for any two graphs G _{1}, G _{2} such that G _{1} ⊆ G _{2} and any mapping \(\mu _{1} \in [{\kern 2.3pt}[ Q ]{\kern 2.3pt}]_{G_{1}}\) there is no \(\mu _{2} \in [{\kern 2.3pt}[ Q ]{\kern 2.3pt}]_{G_{2}}\) such that \(\mu _{2} \sqsubset \mu _{1}\) (i.e., \(\mu _{2} \sqsubseteq \mu _{1}\) and μ _{2} ≠ μ _{1}). A fragment is nonreducing if it contains only nonreducing queries.
Intuitively, for a nonreducing query extending a graph cannot result in a previously bound answer variable becoming unbound. All weakly monotone queries are nonreducing but not vice versa. Moreover, all wwdpatterns are nonreducing.
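The difference between the two properties can be checked mechanically on a single pair of answer sets. The sketch below assumes mappings are Python dictionaries and that the answer sets over G _{1} and G _{2} are already given; the function names are hypothetical. It tests one pair of graphs only and therefore can refute, but never establish, weak monotonicity or the nonreducing property of a query.

```python
from typing import Dict, List

Mapping = Dict[str, str]


def subsumed(m1: Mapping, m2: Mapping) -> bool:
    """m1 is subsumed by m2: m2 extends m1 without changing any binding."""
    return all(x in m2 and m2[x] == v for x, v in m1.items())


def set_subsumed(omega1: List[Mapping], omega2: List[Mapping]) -> bool:
    """Every mapping in omega1 is subsumed by some mapping in omega2."""
    return all(any(subsumed(m1, m2) for m2 in omega2) for m1 in omega1)


def violates_nonreducing(ans_g1: List[Mapping], ans_g2: List[Mapping]) -> bool:
    """True if some answer over the larger graph is properly subsumed by an
    answer over the smaller graph."""
    return any(subsumed(m2, m1) and m2 != m1 for m1 in ans_g1 for m2 in ans_g2)
```

For example, with answers `[{"?x": "a", "?y": "b"}]` over G _{1} and `[{"?x": "a", "?y": "c"}]` over G _{2}, `set_subsumed` fails (so this pair contradicts weak monotonicity) while `violates_nonreducing` is false: a binding changed, but no variable became unbound.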
Proposition 8
Fragment \(\mathcal {P}_{\text {wwd}}\) is nonreducing.
Proof
Let \(P\in \mathcal {P}_{\text {wwd}}\) and let G _{1}, G _{2} be two graphs such that G _{1} ⊆ G _{2}. We show that \(\mu _{2}\not \sqsubset \mu _{1}\) for any \(\mu _{1}\in [{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{{G_{1}}}\) and \(\mu _{2}\in [{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{{G_{2}}}\) by induction on the structure of P, proving, in parallel, that if all filters in P are over basic patterns, then for every mapping \(\mu _{1}\in [{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{{G_{1}}}\) there is a mapping \(\mu _{2}\in [{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{{G_{2}}}\) such that (μ _{1})_{vars(v)} = (μ _{2})_{vars(v)} for v the root of \(\mathcal {T}(P)\).
For the base case, suppose P = B FILTER R for some basic pattern B and filter constraint R. Then, P is monotone in the sense of [5], that is, satisfies \([{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{{G_{1}}}\subseteq [{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{{G_{2}}}\). Moreover, P contains no OPT, and hence every two distinct mappings in \([{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{G_{2}}\) have the same domain and are thus incompatible. These facts imply both claims.

Let \(\mu _{1}={\mu _{1}^{1}}{\cup \mu _{1}^{2}}\) where \({\mu _{1}^{i}}\in [{\kern 2.3pt}[ P_{i} ]{\kern 2.3pt}]_{G_{1}}\). Assume for contradiction that \(\mu _{2}\sqsubset \mu _{1}\) for some \(\mu _{2}\in [{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{{G_{2}}}\). We begin by showing that μ _{2} must be of the form \({\mu _{2}^{1}}{\cup \mu _{2}^{2}}\) where \({\mu _{2}^{i}}\in [{\kern 2.3pt}[ P_{i} ]{\kern 2.3pt}]_{G_{2}}\), for which it suffices to show that \({\mu _{2}}_{{\mathsf {vars}(P_{1})}}\) is compatible with some mapping in \([{\kern 2.3pt}[ P_{2} ]{\kern 2.3pt}]_{G_{2}}\). On the one hand, since \(\mu _{2}\sqsubset \mu _{1}\), \({\mu _{1}^{2}}\) is compatible with \({\mu _{2}}_{{\mathsf {vars}(P_{1})}}\). On the other hand, since all filters in P _{2} are over basic patterns, the inductive hypothesis tells us that \([{\kern 2.3pt}[ P_{2} ]{\kern 2.3pt}]_{G_{2}}\) contains a mapping μ ^{′} that coincides with \({\mu _{1}^{2}}\) on the set of variables X in the root of \(\mathcal {T}(P_{2})\); moreover, since P is weakly well-designed, dom(μ ^{′}) ∩ vars(P _{1}) ⊆ X, and hence μ ^{′} is compatible with \({\mu _{2}}_{{\mathsf {vars}(P_{1})}}\). Thus, \(\mu _{2}={\mu _{2}^{1}}{\cup \mu _{2}^{2}}\) where \({\mu _{2}^{i}}\in [{\kern 2.3pt}[ P_{i} ]{\kern 2.3pt}]_{G_{2}}\). Then, however, we must have that \({\mu _{2}^{1}}{\sqsubset \mu _{1}^{1}}\) or \({\mu _{2}^{2}}{\sqsubset \mu _{1}^{2}}\), contradicting the inductive hypothesis for P _{1} or P _{2}, respectively.

Let \(\mu _{1}\in [{\kern 2.3pt}[ P_{1} ]{\kern 2.3pt}]_{G_{1}}\setminus [{\kern 2.3pt}[ P_{2} ]{\kern 2.3pt}]_{G_{1}}\), and let μ _{2} be an arbitrary mapping in \([{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{{G_{2}}}\). Then μ _{2} extends some \(\mu ^{\prime }\in [{\kern 2.3pt}[ P_{1} ]{\kern 2.3pt}]_{G_{2}}\). By the inductive hypothesis for claim 2, we have that \(\mu ^{\prime }\not \sqsubset \mu _{1}\), and hence \(\mu _{2}\not \sqsubset \mu _{1}\).
Consider now the inductive step for the case when P = P _{1} FILTER R. Since P _{1} is not a basic pattern, we only need to show that \(\mu _{2}\not \sqsubset \mu _{1}\) for any \(\mu _{1}\in [{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{{G_{1}}}\) and \(\mu _{2}\in [{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{{G_{2}}}\). This holds by the inductive hypothesis, because \(\mu _{1}\in [{\kern 2.3pt}[ P_{1} ]{\kern 2.3pt}]_{G_{1}}\) and \(\mu _{2}\in [{\kern 2.3pt}[ P_{1} ]{\kern 2.3pt}]_{G_{2}}\) for any such μ _{1} and μ _{2}. □
In contrast to Proposition 8, patterns in \(\mathcal {P}\) do not generally satisfy nonreducibility. For instance, consider again pattern P, graphs G _{1}, G _{2}, and mappings μ _{1}, μ _{2} from Example 4. Pattern P is not nonreducing since \(\mu _{1}\in [{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{G_{1}}\) and \(\mu _{2}\in [{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{G_{2}}\) but \(\mu _{2}\sqsubset \mu _{1}\). Therefore, we have the following theorem.
Theorem 4
It holds that \(\mathcal {P}_{\text {wd}}<\mathcal {P}_{\text {wwd}}<\mathcal {P}\) .
We next compare \(\mathcal {U}_{\text {wwd}}\) to \(\mathcal {U}_{\text {wd}}\) and \(\mathcal {U}\), as well as \(\mathcal {S}_{\text {wwd}}\) to \(\mathcal {S}_{\text {wd}}\) and \(\mathcal {S}\) (note that neither UNION nor projection via SELECT can be expressed by means of the other operators [40], so adding either construct makes each fragment strictly more expressive). It is easily seen that \(\mathcal {U}_{\text {wd}}\) and \(\mathcal {S}_{\text {wd}}\) inherit weak monotonicity from \(\mathcal {P}_{\text {wd}}\) [27, 33], and hence \(\mathcal {U}_{\text {wd}}<\mathcal {U}_{\text {wwd}}\) and \(\mathcal {S}_{\text {wd}}<\mathcal {S}_{\text {wwd}}\). Nonreducibility, however, does not propagate to unions.
Example 5
We have \(\mu _{1}\in [{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{G_{1}}\) and \(\mu _{2}\in [{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{G_{2}}\) but \(\mu _{2}\sqsubset \mu _{1}\), which is due to the fact that μ _{2} is already contained in \([{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{G_{1}}\) along with μ _{1}. This is only possible in the presence of UNION since all mappings in the evaluation of a UNIONfree pattern are mutually nonsubsuming (see Lemma 2).
Thus, to account for UNION, we introduce the following, more delicate property.
Definition 11
A query Q is extension-witnessing (e-witnessing) if for any two graphs G _{1} ⊆ G _{2} and mapping \(\mu \in [{\kern 2.3pt}[ Q ]{\kern 2.3pt}]_{G_{2}}\) such that \(\mu \notin [{\kern 2.3pt}[ Q ]{\kern 2.3pt}]_{G_{1}}\) there is a triple t in Q such that vars(t) ⊆ dom(μ) and μ(t) ∈ G _{2} ∖ G _{1}. A fragment is e-witnessing if so are all of its queries.
Informally, a query Q is e-witnessing if whenever an extension of a graph leads to a new answer, this answer is justified by a triple pattern in Q which maps to the extension. Unions of wwd-patterns can be shown to be e-witnessing.
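The condition in this definition is easy to test for one fixed pair of graphs once the answers are known. The following sketch assumes that the triple patterns of the query, the two graphs, and the two answer sets are supplied by the caller; the names `witnessed` and `ewitnessing_on` are hypothetical, and passing the test on one pair of graphs says nothing about other pairs.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Mapping = Dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def witnessed(mu: Mapping, query_triples: List[Triple],
              g1: Set[Triple], g2: Set[Triple]) -> bool:
    """True if some triple pattern t of the query has vars(t) within dom(mu)
    and mu(t) lies in g2 but not in g1."""
    for t in query_triples:
        if all(x in mu for x in t if is_var(x)):
            image = tuple(mu.get(x, x) for x in t)
            if image in g2 and image not in g1:
                return True
    return False


def ewitnessing_on(query_triples: List[Triple],
                   answers_g1: List[Mapping], answers_g2: List[Mapping],
                   g1: Set[Triple], g2: Set[Triple]) -> bool:
    """Check the e-witnessing condition for the single pair of graphs g1, g2."""
    new_answers = [mu for mu in answers_g2 if mu not in answers_g1]
    return all(witnessed(mu, query_triples, g1, g2) for mu in new_answers)
```

A new answer that only uses triples already present in the smaller graph makes the check fail, which is the kind of behaviour the definition is designed to rule out.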
Proposition 9
Fragment \(\mathcal {U}_{\text {wwd}}\) is ewitnessing.
Proof
Let \(P\in \mathcal {U}_{\text {wwd}}\) and let G _{1}, G _{2} be graphs such that G _{1} ⊆ G _{2}. Let μ be a mapping in \([{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{G_{2}}\) but not in \([{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{{G_{1}}}\). We show that there is some t ∈ triples(P) such that μ(t) ∈ G _{2} ∖ G _{1}.
Since P is a union of wwdpatterns, there is some wwdpattern P ^{′} in the union such that \(\mu \in [{\kern 2.3pt}[ P^{\prime } ]{\kern 2.3pt}]_{G_{2}}\). It suffices to show \(\mu (\mathsf {triples}(P^{\prime }_{\mu }))\cap (G_{2}\setminus G_{1})\ne \emptyset \), where \(P^{\prime }_{\mu }\) is the pattern corresponding to the maximal rsubtree of P witnessing μ in G _{2} (i.e., the part of P in the image of μ, see Definition 8). We know that \(\mu (\mathsf {triples}(P^{\prime }_{\mu })) \subseteq G_{2}\). Assume, for contradiction, that \(\mu (\mathsf {triples}(P^{\prime }_{\mu }))\subseteq G_{1}\). Then μ is a ppsolution to P ^{′} over G _{1}. We next show that μ is a real solution to P ^{′} over G _{1}. By Lemma 3, it suffices to show that (a) for any child u of \(\mathcal {T}(P^{\prime }_{\mu })\) labelled with (B, R), there is no mapping μ ^{′} such that \(\mu _{u}\sqsubseteq \mu ^{\prime }\), μ ^{′} ⊧ R, and μ ^{′}(B) ⊆ G _{1}, and (b) μ_{ s } ⊧ R for any special node s in \(\mathcal {T}(P^{\prime })\) labelled with R. Claim (a) holds since \(\mu \in [{\kern 2.3pt}[ P^{\prime } ]{\kern 2.3pt}]_{G_{2}}\) and G _{1} ⊆ G _{2} while (b) holds since \(\mu \in [{\kern 2.3pt}[ P^{\prime } ]{\kern 2.3pt}]_{G_{2}}\) and the claim does not depend on the graph over which the evaluation is computed. Consequently, \(\mu \in [{\kern 2.3pt}[ P^{\prime } ]{\kern 2.3pt}]_{G_{1}}\), and hence \(\mu \in [{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{{G_{1}}}\), in contradiction to the assumption. □
On the other hand, \(\mathcal {U}\) is not ewitnessing, as can be seen on the pattern and graphs in Example 4. Hence, we obtain the following theorem.
Theorem 5
It holds that \(\mathcal {U}_{\text {wd}}<\mathcal {U}_{\text {wwd}}<\mathcal {U}\) .
Next we move to the fragments that allow for projection. As already mentioned, we have \(\mathcal {S}_{\text {wd}}<\mathcal {S}_{\text {wwd}}\) since \(\mathcal {S}_{\text {wd}}\) is weakly monotone while \(\mathcal {S}_{\text {wwd}}\) is not. However, \(\mathcal {S}_{\text {wwd}}\) is not ewitnessing, so we cannot apply the technique of Theorem 5 to establish \(\mathcal {S}_{\text {wwd}}<\mathcal {S}\); instead, we make use of the following lemma.
Lemma 4
Let Q be a query in \(\mathcal {S}_{\text {wwd}}\) and G be a graph. For every graph G _{1} with G ⊆ G _{1} and every \(\mu \in [{\kern 2.3pt}[ Q ]{\kern 2.3pt}]_{{G_{1}}}\) , there is a graph G _{2} with G ⊆ G _{2} such that \(\mu \in [{\kern 2.3pt}[ Q ]{\kern 2.3pt}]_{{G_{2}}}\) and |G _{2}| ≤ |G| + |triples(Q)|.
Proof
Let Q = SELECT X WHERE P, for P a union of wwd-patterns, and let G, G _{1} and μ be as required. Then there is a wwd-pattern P ^{′} in the union P such that \(\mu ^{\prime }\in [{\kern 2.3pt}[ P^{\prime } ]{\kern 2.3pt}]_{G_{1}}\) for some μ ^{′} with μ ^{′}_{ X } = μ. Let \(G_{2}=G\cup \mu ^{\prime }(\mathsf {triples}(P^{\prime }_{\mu ^{\prime }}))\). Clearly, |G _{2}| ≤ |G| + |triples(Q)|, so it suffices to show that \(\mu ^{\prime }\in [{\kern 2.3pt}[ P^{\prime } ]{\kern 2.3pt}]_{G_{2}}\).
By construction, μ ^{′} is a pp-solution to P ^{′} over G _{2}. Moreover, since μ ^{′} is a solution to P ^{′} over G _{1}, we have that μ ^{′}_{ s } ⊧ R for every special node s in \(\mathcal {T}(P^{\prime })\) labelled with R. Finally, suppose for contradiction that there is a child v of \(\mathcal {T}(P^{\prime }_{\mu ^{\prime }})\) labelled with (B, R) and a mapping μ ^{″} such that \(\mu ^{\prime }_{v}\sqsubseteq \mu ^{\prime \prime }\), μ ^{″} ⊧ R, and μ ^{″}(B) ⊆ G _{2}. However, since G _{2} ⊆ G _{1}, we then have μ ^{″}(B) ⊆ G _{1}, which contradicts the fact that \(\mu ^{\prime }\in [{\kern 2.3pt}[ P^{\prime } ]{\kern 2.3pt}]_{G_{1}}\). □
This lemma is the base of the last result of the section.
Theorem 6
It holds that \(\mathcal {S}_{\text {wd}}<\mathcal {S}_{\text {wwd}}<\mathcal {S}\) .
Proof
As observed before, the inclusion \(\mathcal {S}_{\text {wd}}<\mathcal {S}_{\text {wwd}}\) holds since \(\mathcal {S}_{\text {wd}}\) is weakly monotone [27, 33] and \(\mathcal {S}_{\text {wwd}}\) is not.
Suppose for contradiction there is a query Q ^{′} in \(\mathcal {S}_{\text {wwd}}\) such that Q ^{′} ≡ Q. Let n = |triples(Q ^{′})| + 2. Then, by Lemma 4, μ ∈ [[Q ^{′}]]_{ G } for some G with |G| ≤ |G _{ n }| + |triples(Q ^{′})| = |G _{ n }| + n − 2, which contradicts the above observation for Q. □
7 Static Analysis of wwdPatterns
This problem is commonly generalised to \(\textsc {Containment}({\mathcal {L}})\), in which one checks whether Q is contained in Q ^{′}, written Q ⊆ Q ^{′}, that is, whether [[Q]]_{ G } ⊆ [[Q ^{′}]]_{ G } holds for every graph G. We have Q ≡ Q ^{′} if and only if Q and Q ^{′} contain each other.
These problems have been studied for FILTER-free wd-patterns in [27, 35], establishing NP-completeness of equivalence and containment. Moreover, both problems are \({{\Pi }_{2}^{p}}\)-complete for unions of FILTER-free wd-patterns, and undecidable for fragments with projection. Finally, from the results in [41] it follows that containment is undecidable for \(\mathcal {U}\). On the other hand, nothing seems to be known so far for well-designed patterns with FILTER.
We next show that equivalence and containment are both \({{\Pi }_2^p}\)-complete for \(\mathcal {P}_{\text {wwd}}\) and \(\mathcal {U}_{\text {wwd}}\) (whereas they are undecidable for \(\mathcal {S}_{\text {wwd}}\) by the results in [35]). As the following lemma shows, the upper bound for containment follows from a small counterexample property: if P ⊈ P ^{′} for some P and P ^{′} from \(\mathcal {U}_{\text {wwd}}\), then there is a witnessing mapping and graph of size O(|P| + |P ^{′}|). Given this property, a \({{\Pi }_2^p}\) algorithm for containment is straightforward: we guess a mapping μ and a graph G of linear size, check that μ ∉ [[P ^{′}]]_{ G }, and then call a coNP oracle for checking μ ∈ [[P]]_{ G }. As a corollary, Equivalence(\({\mathcal {U}_{\text {wwd}}}\)) is also in \({{\Pi }_{2}^{p}}\).
Lemma 5
Let P and P ^{′} be two patterns from \(\mathcal {U}_{\text {wwd}}\) . If P ⊈ P ^{′} then there exists a mapping μ and a graph G of size O(|triples(P)| + |triples(P ^{′})|) such that μ ∈ [[P]]_{ G } but μ ∉ [[P ^{′}]]_{ G } .
Proof
Since P ⊈ P ^{′}, there exists a graph G ^{′}, a mapping μ, and a pattern P _{ i }, 1 ≤ i ≤ n, such that \(\mu \in [{\kern 2.3pt}[ P_{i} ]{\kern 2.3pt}]_{G^{\prime }}\), but \(\mu \notin [{\kern 2.3pt}[ P^{\prime }_{j} ]{\kern 2.3pt}]_{G^{\prime }}\) for every \(P^{\prime }_{j}\). Hence, μ is a pp-solution to P _{ i } over G ^{′} with corresponding r-subtree \(\mathcal {T}((P_{i})_{\mu })\) of the CPT \(\mathcal {T}(P_{i})\). Let G _{0} = μ(triples((P _{ i })_{ μ })). By construction, we have that G _{0} ⊆ G ^{′} and |G _{0}| ≤ |triples((P _{ i })_{ μ })| ≤ |triples(P _{ i })| ≤ |triples(P)|. Moreover, \(\mu \in [{\kern 2.3pt}[ P_{i} ]{\kern 2.3pt}]_{G_{0}}\), because all the matches and constraints, including the ones on the top level, stay unchanged. In fact, \(\mu \in [{\kern 2.3pt}[ P_{i} ]{\kern 2.3pt}]_{G^{\prime \prime }}\) for any G ^{″} such that G _{0} ⊆ G ^{″}⊆ G ^{′}.
If \(\mu \notin [{\kern 2.3pt}[ P^{\prime }_{j} ]{\kern 2.3pt}]_{G_{0}}\) for every j, then G _{0} satisfies all the properties required from G. Otherwise, there exists \(P^{\prime }_{j}\) among \(P^{\prime }_{1}, \ldots , P^{\prime }_{m}\) such that \(\mu \in [{\kern 2.3pt}[ P^{\prime }_{j} ]{\kern 2.3pt}]_{G_{0}}\). Since G _{0} ⊆ G ^{′}, μ is a pp-solution to \(P^{\prime }_{j}\) over G ^{′}. Consider the corresponding pattern \((P^{\prime }_{j})_{\mu }\) (i.e., the maximal pattern witnessing μ in G ^{′} obtained from \(P^{\prime }_{j}\) by dropping the right arguments of some OPT operators), the r-subtree \(\mathcal {T}((P^{\prime }_{j})_{\mu })\) of the CPT \(\mathcal {T}(P^{\prime }_{j})\), and the “image” \(\mu (\mathsf {triples}((P^{\prime }_{j})_{\mu }))\). Note that we may have \(\mu (\mathsf {triples}((P^{\prime }_{j})_{\mu })) \subseteq G_{0}\) or not: the latter is possible because the maximal r-subtree of \(\mathcal {T}(P^{\prime }_{j})\) witnessing μ in G _{0} may be different from \(\mathcal {T}((P^{\prime }_{j})_{\mu })\), which is maximal in G ^{′}. Let \(G^{\prime }_{1} = G_{0} \cup \mu (\mathsf {triples}((P^{\prime }_{j})_{\mu }))\). We define G _{1} depending on whether \(\mu \in [{\kern 2.3pt}[ P^{\prime }_{j} ]{\kern 2.3pt}]_{G^{\prime }_{1}}\) or not. If \(\mu \notin [{\kern 2.3pt}[ P^{\prime }_{j} ]{\kern 2.3pt}]_{G^{\prime }_{1}}\), then let \(G_{1} = G^{\prime }_{1}\). Otherwise, since \(\mu \notin [{\kern 2.3pt}[ P^{\prime }_{j} ]{\kern 2.3pt}]_{G^{\prime }}\) by assumption, there exists a child v of \(\mathcal {T}((P^{\prime }_{j})_{\mu })\) and a mapping μ _{0} such that \(\mu _{v}\sqsubset \mu _{0}\) and μ _{0}(triples(v)) ⊆ G ^{′}. Then the graph \(G_{1} = G^{\prime }_{1} \cup \mu _{0}(\mathsf {triples}(v))\) is such that \(\mu \notin [{\kern 2.3pt}[ P^{\prime }_{j} ]{\kern 2.3pt}]_{G_{1}}\). In either case, \(\mu \in [{\kern 2.3pt}[ P_{i} ]{\kern 2.3pt}]_{G_{1}}\) because G _{0} ⊆ G _{1} ⊆ G ^{′}.
Moreover, we have \(\mu \notin [{\kern 2.3pt}[ P^{\prime }_{j} ]{\kern 2.3pt}]_{G^{\prime \prime }}\) for every G ^{″} such that G _{1} ⊆ G ^{″} ⊆ G ^{′}. To see this, suppose for contradiction that \(\mu \in [{\kern 2.3pt}[ P^{\prime }_{j} ]{\kern 2.3pt}]_{G^{\prime \prime }}\) for a graph G ^{″} as above. Then there must be a child v ^{′} of \(\mathcal {T}((P^{\prime }_{j})_{\mu _{0}})\) such that v ^{′}≺ v, \({\mu _{0}}_{v^{\prime }}\sqsubset \mu \) and μ(triples(v ^{′})) ⊆ G ^{″}. Since \(\mathcal {T}((P^{\prime }_{j})_{\mu _{0}})\) and \(\mathcal {T}((P^{\prime }_{j})_{\mu })\) are identical restricted to nodes preceding v with respect to ≺, v ^{′} is a child of \(\mathcal {T}((P^{\prime }_{j})_{\mu })\). Thus, v ^{′} is not contained in \(\mathcal {T}((P^{\prime }_{j})_{\mu })\), which contradicts maximality of \(\mathcal {T}((P^{\prime }_{j})_{\mu })\) since μ(triples(v ^{′})) ⊆ G ^{″}⊆ G ^{′}.
If \(\mu \notin [{\kern 2.3pt}[ P^{\prime }_{j} ]{\kern 2.3pt}]_{G_{1}}\) for all other j as well, then \(G_{1}\) satisfies all the properties required from G. Otherwise we can extend \(G_{1}\) to a graph \(G_{2}\) on the basis of some other \(P^{\prime }_{j}\) with \(\mu \in [{\kern 2.3pt}[ P^{\prime }_{j} ]{\kern 2.3pt}]_{G_{1}}\) in the same way as \(G_{1}\) extends \(G_{0}\). We then have \(G_{2} \subseteq G^{\prime }\), \(\mu \in [{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{G_{2}}\), and \(\mu \notin [{\kern 2.3pt}[ P^{\prime }_{j} ]{\kern 2.3pt}]_{G_{2}}\) for j from both steps. Repeating the extension step until no \(P^{\prime }_{j}\) has μ as a solution on the resulting graph, we obtain a graph that satisfies all the properties required from G; in particular, for each j the number of triples added to the graph is bounded by \(|\mathsf {triples}(P^{\prime }_{j})|\). □
Hardness of equivalence is established in the following lemma by a reduction from ∀∃3SAT, while containment is \({{\Pi }_2^p}\)-hard by the results in [35]. Note that both results hold even for fragments without FILTER.
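As a reminder of the source problem of the reduction, an instance of ∀∃3SAT asks whether, for every truth assignment to the variables \(\bar{x}\), there is an assignment to the variables \(\bar{y}\) that satisfies every clause of a 3-CNF formula ψ. The following Python sketch decides this by brute force. The function name and the encoding of a literal as a `(variable, polarity)` pair are assumptions made for illustration only and are not part of the construction in the proof.

```python
from itertools import product


def holds_forall_exists(x_vars, y_vars, clauses):
    """Brute-force check of  forall x_vars exists y_vars . psi  for a CNF psi.

    clauses: iterable of clauses; each clause is a tuple of literals,
    and each literal is a pair (variable_name, polarity).
    """

    def satisfied(assignment):
        # A clause holds if at least one literal agrees with the assignment.
        return all(
            any(assignment[var] == polarity for var, polarity in clause)
            for clause in clauses
        )

    for xs in product([False, True], repeat=len(x_vars)):
        alpha = dict(zip(x_vars, xs))
        # Look for an extension of alpha to the existential variables.
        if not any(
            satisfied({**alpha, **dict(zip(y_vars, ys))})
            for ys in product([False, True], repeat=len(y_vars))
        ):
            return False  # this choice of x values cannot be extended
    return True


# (x or y or y) and (not x or not y or not y): choose y as the negation of x.
psi = [
    (("x", True), ("y", True), ("y", True)),
    (("x", False), ("y", False), ("y", False)),
]
print(holds_forall_exists(["x"], ["y"], psi))
```

The running time is exponential in the number of variables, which is expected for a \({{\Pi }_2^p}\)-complete problem; the sketch is only meant to make the quantifier alternation concrete.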
Lemma 6
Problem Equivalence(\({\mathcal {O}_{\text {wwd}}}\)) is \({{\Pi }_{2}^{p}}\)-hard for the fragment \(\mathcal {O}_{\text {wwd}}\) of FILTER-free wwd-patterns.
Proof
We next show that ϕ is true if and only if P is equivalent to \(P^{\prime }\), starting with the forward direction.
Let ϕ be true and suppose, for the sake of contradiction, that P is not equivalent to \(P^{\prime }\). Then there is a graph G and a mapping μ such that \(\mu \in [{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{G}\) but \(\mu \notin [{\kern 2.3pt}[ P^{\prime } ]{\kern 2.3pt}]_{G}\). Since the patterns P and \(P^{\prime }\) have the same root \(B_{\text {base}}\), which contains ?u as the only variable, we conclude that \(?u \in \mathsf {dom}(\mu )\). Each \(?x_{i}\) is also in \(\mathsf {dom}(\mu )\) by the construction of P, since there is a homomorphism from the corresponding leaf \(B_{i}^{\top }\) to the root \(B_{\text {base}}\). However, it is not necessary that \({(\mu (?x_{i}), iri_{x_{i}}, \top )}\) is in G because if G contains a triple of the form \({(c, iri_{x_{i}}, \bot )}\) for some IRI c, we will have \({(\mu (?x_{i}), iri_{x_{i}}, \bot )}\in G\). Note also that nothing prevents G from containing both a triple \({(c, iri_{x_{i}}, \bot )}\) and a triple \({(c, iri_{x_{i}}, \top )}\) for some i. Depending on whether \(?s \in \mathsf {dom}(\mu )\) or not, we have two cases.
Case 1
Let \(?s \in \mathsf {dom}(\mu )\), that is, there is a homomorphism from \(B_{\psi }\) to G that aligns with the previous assignment of all \(?x_{i}\). In particular, this means that \(\mathsf {dom}(\mu ) = \mathsf {vars}(P) = \mathsf {vars}(P^{\prime })\). If there is no homomorphism from \(B_{\text {valid}}\) to G, then \(\mu \in [{\kern 2.3pt}[ P^{\prime } ]{\kern 2.3pt}]_{G}\), because \(B_{\psi }\) is the last leaf of \(P^{\prime }\) as well, and nothing prevents it from matching. But this contradicts the assumption. However, even if there is a homomorphism h from \(B_{\text {valid}}\) to G, we still have a contradiction because \(\mu \in [{\kern 2.3pt}[ P^{\prime } ]{\kern 2.3pt}]_{G}\) still holds. Indeed, ?s is the only variable in \(B_{\text {valid}}\) and is essentially isolated in \(B_{\text {valid}}\), so if h is a homomorphism from \(B_{\text {valid}}\) to G, then \(h^{\prime }\), which maps ?s to μ(?s), is also such a homomorphism (in other words, since by the assumptions of this case we know that (?s, r, ?s) has a match in G, the existence of h just means that all the ground triples of \(B_{\text {valid}}\) are in G). This means, however, that nothing prevents \(B_{\psi }\) from matching in \(P^{\prime }\), implying \(\mu \in [{\kern 2.3pt}[ P^{\prime } ]{\kern 2.3pt}]_{G}\).
Case 2
Let \(?s \notin \mathsf {dom}(\mu )\). Since \(\mu \notin [{\kern 2.3pt}[ P^{\prime } ]{\kern 2.3pt}]_{G}\), there is no homomorphism from \(B_{\psi }\) to G but there is one from \(B_{\text {valid}}\) to G, that is, all ground triples of \(B_{\text {valid}}\) are in G (the nonexistence of a homomorphism from \(B_{\psi }\) to G is immediate since \(?s \notin \mathsf {dom}(\mu )\) and \(\mu \in [{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{G}\); the existence of a homomorphism from \(B_{\text {valid}}\) to G then follows since otherwise we would have \(\mu \in [{\kern 2.3pt}[ P^{\prime } ]{\kern 2.3pt}]_{G}\)). Consider now a truth assignment α of the variables \(\bar {x}\) such that if \(\alpha (x_{i})\) is true then \({(\mu (?x_{i}), iri_{x_{i}}, \top )} \in G\) and if \(\alpha (x_{i})\) is false then \({(\mu (?x_{i}), iri_{x_{i}}, \bot )} \in G\) (as we mentioned earlier, α may not be unique, but the argument does not depend on its uniqueness). Since ϕ is true, we know that α can be extended to the variables \(\bar y\) in such a way that each clause in ψ holds. Let \(\alpha ^{\prime }\) be such an extension, and let \(\mu ^{\prime }\) be an extension of μ to the variables in \(B_{\psi }\) such that, for all j, \(\mu ^{\prime }(?y_{j}) = c y^{\top }\) if \(\alpha ^{\prime }(y_{j})\) is true and \(\mu ^{\prime }(?y_{j}) = c y^{\bot }\) otherwise. Then, for every clause γ in ψ, the IRIs \(\mu ^{\prime }(?v_{\gamma }^{1} )\), \(\mu ^{\prime }(?v_{\gamma }^{2})\), \(\mu ^{\prime }(?v_{\gamma }^{3} )\) correspond to the values \(\alpha ^{\prime }(z_{1})\), \(\alpha ^{\prime }(z_{2})\), \(\alpha ^{\prime }(z_{3})\), respectively, where \(z_{1}, z_{2}, z_{3}\) are the variables in the literals \(t_{1}, t_{2}, t_{3}\) of γ. Moreover, \(\mu ^{\prime }(?c_{\gamma }) = \mathit {cl}_{\gamma }^{\ell } \) for ℓ the number of the assignment \(\alpha ^{\prime }(z_{1})\), \(\alpha ^{\prime }(z_{2})\), \(\alpha ^{\prime }(z_{3})\); this assignment makes γ true by the choice of \(\alpha ^{\prime }\) (in other words, for every γ there is some ℓ such that \({(\mu ^{\prime }(?c_{\gamma }), \mathit {lit}_{\gamma }^{i}, \mu ^{\prime }(?v_{\gamma }^{i} ))}\in B_{\gamma }^{\ell }\) for all 1 ≤ i ≤ 3). Hence, the extension \(\mu ^{\prime }\) is contained in \([{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{G}\), and hence \(\mu \notin [{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{G}\), which, however, contradicts the original assumption.
Since both cases yield a contradiction, we conclude that P is equivalent to P ^{′}.

the triple \({(u, iri_{x_{i}}, \top )}\), for each x _{ i } in \(\bar {x}\) and a fresh IRI u;

the triple \({(c_{x_{i}}, iri_{x_{i}}, \top )}\), for each x _{ i } in \(\bar {x}\) with α(x _{ i }) true, and the triple \({({c_{x_{i}}}, {iri_{x_{i}}}, {\bot })}\), for each x _{ i } in \(\bar {x}\) with α(x _{ i }) false, where \(c_{x_{1}},\dots ,c_{x_{n}}\) are fresh IRIs;

all ground triples from B _{valid}, that is, all of its triples except (?s, r, ?s);

the triple (s, r, s) for a fresh IRI s.

μ(?u) = u;

\(\mu (?x_{i}) = c_{x_{i}}\), for each x _{ i } in \(\bar {x}\).
Thus, we have shown that ϕ is true if and only if P ≡ P ^{′}. □
Theorem 7
Problems \(\textsc {Equivalence}({\mathcal {L}})\) and \(\textsc {Containment}({\mathcal {L}})\) are both \({{\Pi }_2^p}\)-complete for any \(\mathcal {L}\!\in \!\{\mathcal {P}_{\text {wwd}}, \mathcal {U}_{\text {wwd}}\}\).
Proof
The existence of a \({{\Pi }_2^p}\) algorithm for containment immediately follows from Lemma 5: to show that \(P \not \subseteq P^{\prime }\), for \(P,P^{\prime }\in \mathcal {P}_{\text {wwd}}\), we just need to guess, in NP, a graph G of linear size as well as a mapping μ, check that \(\mu \notin [{\kern 2.3pt}[ P^{\prime } ]{\kern 2.3pt}]_{G}\), and then call a coNP oracle to check that \(\mu \in [{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{G}\). The claim for patterns in \(\mathcal {U}_{\text {wwd}}\) is similar, but involves guessing a disjunct \(P_{1}\) of P with \(\mu \in [{\kern 2.3pt}[ P_{1} ]{\kern 2.3pt}]_{G}\) and checking \(\mu \notin [{\kern 2.3pt}[ {P^{\prime }_{1}} ]{\kern 2.3pt}]_{G} \) for every disjunct \(P^{\prime }_{1}\) of \(P^{\prime }\). Since \(P \equiv P^{\prime }\) if and only if containment holds in both directions, the problem \(\textsc {Equivalence}({\mathcal {U}_{\text {wwd}}})\) is also in \({{\Pi }_{2}^{p}}\).
Hardness follows by the results in [35] for containment and by Lemma 6 for equivalence. □
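The verification step of the upper-bound argument can be illustrated on small inputs. The Python sketch below checks whether a given pair (G, μ) witnesses non-containment of P in P′, that is, whether μ is a solution to P over G but not to P′. It is a sketch under assumptions, not the algorithm of the proof: the tuple encoding of patterns, the `?`-prefix convention for variables and all function names are hypothetical; only UNION- and FILTER-free patterns are handled; and the NP guess and the coNP oracle are replaced by direct evaluation under the standard set semantics of AND and OPT.

```python
def is_var(term):
    """Variables are strings starting with '?' (a convention of this sketch)."""
    return isinstance(term, str) and term.startswith("?")


def compatible(m1, m2):
    """Two mappings are compatible if they agree on all shared variables."""
    return all(m2.get(var, val) == val for var, val in m1.items())


def dedupe(mappings):
    """Set semantics: keep one copy of each mapping."""
    seen, out = set(), []
    for m in mappings:
        key = frozenset(m.items())
        if key not in seen:
            seen.add(key)
            out.append(m)
    return out


def match_triple(tp, graph):
    """All mappings sending the triple pattern tp to some triple of the graph."""
    out = []
    for triple in graph:
        m = {}
        for term, value in zip(tp, triple):
            if is_var(term):
                if m.setdefault(term, value) != value:
                    break  # repeated variable bound to two different values
            elif term != value:
                break  # constant mismatch
        else:
            out.append(m)
    return out


def join(left, right):
    """Union of every compatible pair of mappings."""
    return [{**m1, **m2} for m1 in left for m2 in right if compatible(m1, m2)]


def evaluate(pattern, graph):
    """Evaluate ("BGP", triples), ("AND", P1, P2) or ("OPT", P1, P2)."""
    op = pattern[0]
    if op == "BGP":
        result = [{}]
        for tp in pattern[1]:
            result = join(result, match_triple(tp, graph))
        return dedupe(result)
    left, right = evaluate(pattern[1], graph), evaluate(pattern[2], graph)
    if op == "AND":
        return dedupe(join(left, right))
    if op == "OPT":
        # Keep left mappings that no right mapping can extend.
        unextended = [
            m1 for m1 in left if not any(compatible(m1, m2) for m2 in right)
        ]
        return dedupe(join(left, right) + unextended)
    raise ValueError(f"unsupported operator: {op}")


def is_counterexample(p, p_prime, graph, mu):
    """True if (graph, mu) witnesses that p is not contained in p_prime."""
    return mu in evaluate(p, graph) and mu not in evaluate(p_prime, graph)


# Hypothetical example: an OPT pattern is not contained in the matching AND pattern.
P = ("OPT", ("BGP", [("?x", "a", "b")]), ("BGP", [("?x", "c", "?y")]))
P_PRIME = ("AND", ("BGP", [("?x", "a", "b")]), ("BGP", [("?x", "c", "?y")]))
G = {("1", "a", "b")}
print(is_counterexample(P, P_PRIME, G, {"?x": "1"}))  # True
```

In the example, the graph contains no triple with predicate `c`, so the optional part of `P` is skipped and `{"?x": "1"}` is a solution to `P`, whereas `P_PRIME` has no solution at all.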
Hence, for UNION- and FILTER-free patterns, the step from well-designed to weakly well-designed OPT incurs a complexity jump for containment and equivalence. However, for the fragments with UNION or projection, complexity remains the same in both cases. As far as we are aware, these are the first decidability results on query equivalence and related problems for SPARQL fragments with OPT and FILTER.
8 Analysis of DBpedia Logs
In this section, we present an analysis of query logs over DBpedia, which suggests that the step from wd-patterns to wwd-patterns makes a dramatic difference in real life: while only about half of the queries with OPT have well-designed patterns, almost all of these patterns fall into the weakly well-designed fragment.
DBpedia [26] is a project providing access to RDF data extracted from Wikipedia via a SPARQL endpoint. DBpedia query logs are well suited for analysing the structure of real-life SPARQL queries as they contain a large amount of general-purpose knowledge base queries, generated both manually and automatically. DBpedia query logs have been analysed by Picalausa and Vansummeren [34], who reported that, over a period in 2010, about 46.38% of a total of 1344K distinct DBpedia queries used OPT. However, only 47.80% of the queries with OPT had well-designed patterns. Another analysis of DBpedia logs from the USEWOD 2011 data set performed by Arias Gallego et al. [7] concluded that 16.61% of about 5166K queries contained OPT; however, the detailed structure of the queries was not analysed.
We considered query logs over DBpedia 3.9 from USEWOD 2015 [30] and USEWOD 2016 [29]. The USEWOD 2015 DBpedia dataset is a random selection of almost 14M queries from the first half of 2014, while the USEWOD 2016 dataset contains 35M queries from the second half of 2015. We removed syntactically incorrect queries as well as queries outside of \(\mathcal {S}\) (in particular, queries using operators specific to SPARQL 1.1). Also, we rewrote the patterns of the remaining queries to unions of UNION-free patterns as proposed in [33] and eliminated duplicates, which left us with 6.6M queries in USEWOD 2015 and 9.1M queries in USEWOD 2016. Finally, we isolated queries involving OPT and counted how many of their patterns were in \(\mathcal {U}_{\text {wd}}\) and in \(\mathcal {U}_{\text {wwd}}\).
Structure of query patterns in DBpedia logs from USEWOD 2015 and 2016

| | USEWOD 2015: unique patterns | USEWOD 2015: fraction of total | USEWOD 2015: fraction of patterns with OPT | USEWOD 2016: unique patterns | USEWOD 2016: fraction of total | USEWOD 2016: fraction of patterns with OPT |
|---|---|---|---|---|---|---|
| Total | 6 606 201 | 100% | | 9 119 492 | 100% | |
| Patterns with OPT | 1 147 704 | 17.37% | 100% | 1 582 698 | 17.36% | 100% |
| Unions of wd-patterns | 500 676 | 7.58% | 43.62% | 816 276 | 8.95% | 51.57% |
| Unions of wwd-patterns | 1 147 135 | 17.36% | 99.95% | 1 582 339 | 17.35% | 99.98% |
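
The shares of patterns with OPT reported in the table can be recomputed from the raw counts. The short Python sketch below does this; the dictionary layout and function names are illustrative assumptions, while the counts themselves are copied from the table.

```python
# Counts of unique patterns, copied from the table above.
COUNTS = {
    "USEWOD 2015": {"total": 6_606_201, "opt": 1_147_704, "wd": 500_676, "wwd": 1_147_135},
    "USEWOD 2016": {"total": 9_119_492, "opt": 1_582_698, "wd": 816_276, "wwd": 1_582_339},
}


def percent(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimal places."""
    return round(100 * part / whole, 2)


def summarise(counts):
    """Shares among patterns with OPT, and the number of patterns outside the wwd fragment."""
    return {
        "wd_share_of_opt": percent(counts["wd"], counts["opt"]),
        "wwd_share_of_opt": percent(counts["wwd"], counts["opt"]),
        "opt_outside_wwd": counts["opt"] - counts["wwd"],
    }


for name, counts in COUNTS.items():
    print(name, summarise(counts))
```

In absolute terms, only 569 of the unique patterns with OPT in USEWOD 2015 and 359 in USEWOD 2016 fall outside unions of wwd-patterns.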
9 Conclusion and Future Work
In this paper, we introduced a new fragment of SPARQL patterns called weakly well-designed patterns. This fragment extends the widely studied well-designed fragment by allowing variables from the optional side of an OPT-subpattern that are not “guarded” by the mandatory side to occur in certain positions outside of the subpattern. We showed that queries with wwd-patterns enjoy the same low complexity of evaluation as well-designed queries but cover almost all real-life queries. Moreover, our fragment is the maximal coNP fragment that does not impose structural restrictions on basic patterns and filter conditions. We studied the expressive power of the fragment and the complexity of its query optimisation problems.
For future work, we want to extend wwd-patterns to allow for non-top-level occurrences of UNION and projection. As we have seen in the previous section, this promises to be a challenging task since a naive extension of our definitions to such constructs is likely to increase reasoning complexity. Also, we want to take into account features of SPARQL 1.1 [17] such as GRAPH, NOT EXISTS and property paths. Finally, we would like to implement our ideas in a prototype system and compare its performance with existing SPARQL engines.
Notes
Acknowledgements
This work was supported by the EPSRC projects Score!, DBOnto, and ED3.
References
1. Ahmetaj, S., Fischl, W., Pichler, R., Simkus, M., Skritek, S.: Towards reconciling SPARQL and certain answers. In: Gangemi, A., Leonardi, S., Panconesi, A. (eds.) Proceedings of the 24th International Conference on World Wide Web, WWW 2015, pp. 23–33. ACM (2015)
2. Angles, R., Gutierrez, C.: The expressive power of SPARQL. In: Sheth, A.P., Staab, S., Dean, M., Paolucci, M., Maynard, D., Finin, T.W., Thirunarayan, K. (eds.) ISWC 2008, LNCS, vol. 5318, pp. 114–129. Springer (2008)
3. Arenas, M., Conca, S., Pérez, J.: Counting beyond a Yottabyte, or how SPARQL 1.1 property paths will prevent adoption of the standard. In: Mille, A., Gandon, F.L., Misselis, J., Rabinovich, M., Staab, S. (eds.) Proceedings of the 21st World Wide Web Conference, WWW 2012, pp. 629–638. ACM (2012)
4. Arenas, M., Gottlob, G., Pieris, A.: Expressive languages for querying the semantic web. In: Hull, R., Grohe, M. (eds.) Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2014, pp. 14–26. ACM (2014)
5. Arenas, M., Pérez, J.: Querying Semantic Web Data with SPARQL. In: Lenzerini, M., Schwentick, T. (eds.) Proceedings of the 30th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2011, pp. 305–316. ACM (2011)
6. Arenas, M., Ugarte, M.: Designing a query language for RDF: marrying open and closed worlds. In: Milo, T., Tan, W. (eds.) Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2016, pp. 225–236. ACM (2016)
7. Arias Gallego, M., Fernández, J.D., Martínez-Prieto, M.A., de la Fuente, P.: An empirical study of real-world SPARQL queries. In: Proceedings of the 1st International Workshop on Usage Analysis and the Web of Data, USEWOD 2011. arXiv:1103.5043 (2011)
8. Barceló, P., Pichler, R., Skritek, S.: Efficient Evaluation and Approximation of Well-Designed Pattern Trees. In: Milo, T., Calvanese, D. (eds.) Proceedings of the 34th ACM Symposium on Principles of Database Systems, PODS 2015, pp. 131–144. ACM (2015)
9. Bischof, S., Krötzsch, M., Polleres, A., Rudolph, S.: Schema-agnostic query rewriting in SPARQL 1.1. In: Mika, P., Tudorache, T., Bernstein, A., Welty, C., Knoblock, C.A., Vrandecic, D., Groth, P.T., Noy, N.F., Janowicz, K., Goble, C.A. (eds.) ISWC 2014, Part I, LNCS, vol. 8796, pp. 584–600. Springer (2014)
10. Buil Aranda, C., Arenas, M., Corcho, Ó., Simperl, E.P.B.: Semantics and optimization of the SPARQL 1.1 federation extension. In: Antoniou, G., Grobelnik, M., Parsia, B., Plexousakis, D., Leenheer, P.D., Pan, J.Z. (eds.) ESWC 2011, Part II, LNCS, vol. 6644, pp. 1–15. Springer (2011)
11. Buil Aranda, C., Polleres, A., Umbrich, J., Knoblock, C.A., Vrandecic, D.: Strategies for executing federated queries in SPARQL 1.1. In: Mika, P., Tudorache, T., Bernstein, A., Welty, C., Groth, P.T., Noy, N.F., Janowicz, K., Goble, C.A. (eds.) ISWC 2014, Part II, LNCS, vol. 8797, pp. 390–405. Springer (2014)
12. Chekol, M.W., Euzenat, J., Genevès, P., Layaïda, N.: SPARQL query containment under RDFS entailment regime. In: Gramlich, B., Miller, D., Sattler, U. (eds.) IJCAR 2012, LNCS, vol. 7364, pp. 134–148. Springer (2012)
13. Chekol, M.W., Euzenat, J., Genevès, P., Layaïda, N.: SPARQL query containment under SHI Axioms. In: Hoffmann, J., Selman, B. (eds.) Proceedings of the 26th AAAI Conference on Artificial Intelligence, AAAI 2012, pp. 10–16. AAAI Press (2012)
14. Cyganiak, R., Wood, D., Lanthaler, M.: RDF 1.1 concepts and abstract syntax. W3C recommendation, W3C. http://www.w3.org/TR/rdf11-concepts/ (2014)
15. Geerts, F., Unger, T., Karvounarakis, G., Fundulaki, I., Christophides, V.: Algebraic structures for capturing the provenance of SPARQL queries. J. ACM 63(1), 7:1–7:63 (2016)
16. Halpin, H., Cheney, J.: Dynamic Provenance for SPARQL Updates. In: Mika, P., Tudorache, T., Bernstein, A., Welty, C., Knoblock, C.A., Vrandecic, D., Groth, P.T., Noy, N.F., Janowicz, K., Goble, C.A. (eds.) ISWC 2014, Part I, LNCS, vol. 8796, pp. 425–440. Springer (2014)
17. Harris, S., Seaborne, A.: SPARQL 1.1 query language. W3C recommendation, W3C. http://www.w3.org/TR/sparql11-query/ (2013)
18. Hayes, P.J., Patel-Schneider, P.F.: RDF 1.1 semantics. W3C recommendation, W3C. http://www.w3.org/TR/rdf11-mt/ (2014)
19. Kaminski, M., Kostylev, E.V.: Beyond well-designed SPARQL. In: Martens, W., Zeume, T. (eds.) Proceedings of the 19th International Conference on Database Theory, ICDT 2016, LIPIcs, vol. 48, pp. 5:1–5:18. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2016)
20. Kaminski, M., Kostylev, E.V., Cuenca Grau, B.: Semantics and expressive power of subqueries and aggregates in SPARQL 1.1. In: Bourdeau, J., Hendler, J., Nkambou, R., Horrocks, I., Zhao, B.Y. (eds.) Proceedings of the 25th International Conference on World Wide Web, WWW 2016, pp. 227–238. ACM (2016)
21. Kontchakov, R., Kostylev, E.V.: On expressibility of non-monotone operators in SPARQL. In: Baral, C., Delgrande, J.P., Wolter, F. (eds.) Proceedings of the 15th International Conference on Principles of Knowledge Representation and Reasoning, KR 2016, pp. 369–379. AAAI Press (2016)
22. Kontchakov, R., Rezk, M., Rodriguez-Muro, M., Xiao, G., Zakharyaschev, M.: Answering SPARQL queries over databases under OWL 2 QL entailment regime. In: Mika, P., Tudorache, T., Bernstein, A., Welty, C., Knoblock, C.A., Vrandecic, D., Groth, P.T., Noy, N.F., Janowicz, K., Goble, C.A. (eds.) ISWC 2014, Part I, LNCS, vol. 8796, pp. 552–567. Springer (2014)
23. Kostylev, E.V., Cuenca Grau, B.: On the semantics of SPARQL queries with optional matching under entailment regimes. In: Mika, P., Tudorache, T., Bernstein, A., Welty, C., Knoblock, C.A., Vrandecic, D., Groth, P.T., Noy, N.F., Janowicz, K., Goble, C.A. (eds.) ISWC 2014, Part II, LNCS, vol. 8797, pp. 374–389. Springer (2014)
24. Kostylev, E.V., Reutter, J.L., Romero, M., Vrgoc, D., Staab, S.: SPARQL with property paths. In: Arenas, M., Corcho, Ó., Simperl, E., Strohmaier, M., d’Aquin, M., Srinivas, K., Groth, P.T., Dumontier, M., Heflin, J., Thirunarayan, K. (eds.) ISWC 2015, Part I, LNCS, vol. 9366, pp. 3–18. Springer (2015)
25. Kostylev, E.V., Reutter, J.L., Ugarte, M.: CONSTRUCT Queries in SPARQL. In: Arenas, M., Ugarte, M. (eds.) Proceedings of the 18th International Conference on Database Theory, ICDT 2015, LIPIcs, vol. 31, pp. 212–229. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2015)
26. Lehmann, J., Isele, R., Jakob, M., Jentzsch, A., Kontokostas, D., Mendes, P.N., Hellmann, S., Morsey, M., van Kleef, P., Auer, S., Bizer, C.: DBpedia - a large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web 6(2), 167–195 (2015)
27. Letelier, A., Pérez, J., Pichler, R., Skritek, S.: Static analysis and optimization of semantic web queries. ACM Trans. Database Syst. 38(4), 25 (2013)
28. Losemann, K., Martens, W.: The complexity of evaluating path expressions in SPARQL. In: Benedikt, M., Krötzsch, M., Lenzerini, M. (eds.) Proceedings of the 31st ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2012, pp. 101–112. ACM (2012)
29. Luczak-Rösch, M., Aljaloud, S., Berendt, B., Hollink, L.: USEWOD 2016 research dataset. doi: 10.5258/SOTON/385344 (2016)
30. Luczak-Rösch, M., Berendt, B., Hollink, L.: USEWOD 2015 research dataset. doi: 10.5258/SOTON/379407 (2015)
31. Manola, F., Miller, E., McBride, B.: RDF 1.1 primer. W3C working group note, W3C. http://www.w3.org/TR/rdf11-primer/ (2014)
32. Pérez, J., Arenas, M., Gutierrez, C.: Semantics and complexity of SPARQL. In: Cruz, I.F., Decker, S., Allemang, D., Preist, C., Schwabe, D., Mika, P., Uschold, M., Aroyo, L. (eds.) ISWC 2006, LNCS, vol. 4273, pp. 30–43. Springer (2006)
33. Pérez, J., Arenas, M., Gutierrez, C.: Semantics and complexity of SPARQL. ACM Trans. Database Syst. 34(3), 16:1–16:45 (2009)
34. Picalausa, F., Vansummeren, S.: What are real SPARQL queries like? In: Virgilio, R.D., Giunchiglia, F., Tanca, L. (eds.) Proceedings of the 3rd International Workshop on Semantic Web Information Management, SWIM 2011, pp. 7:1–7:6. ACM (2011)
35. Pichler, R., Skritek, S.: Containment and equivalence of well-designed SPARQL. In: Hull, R., Grohe, M. (eds.) Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2014, pp. 39–50. ACM (2014)
36. Polleres, A., Wallner, J.P.: On the relation between SPARQL 1.1 and answer set programming. J. Appl. Non-Classical Log. 23(1–2), 159–212 (2013)
37. Prud’hommeaux, E., Seaborne, A.: SPARQL query language for RDF. W3C recommendation, W3C. http://www.w3.org/TR/rdf-sparql-query/ (2008)
38. Schmidt, M., Meier, M., Lausen, G.: Foundations of SPARQL query optimization. In: Segoufin, L. (ed.) Proceedings of the 13th International Conference on Database Theory, ICDT 2010, pp. 4–33. ACM (2010)
39. Zhang, X., Van den Bussche, J.: On the power of SPARQL in expressing navigational queries. Comput. J. 58(11), 2841–2851 (2015)
40. Zhang, X., Van den Bussche, J.: On the primitivity of operators in SPARQL. Inf. Process. Lett. 114(9), 480–485 (2014)
41. Zhang, X., Van den Bussche, J., Picalausa, F.: On the satisfiability problem for SPARQL patterns. J. Artif. Intell. Res. (JAIR) 56, 403–428 (2016)
Copyright information
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.