Abstract
SPARQL is the standard query language for RDF data. The distinctive feature of SPARQL is the OPTIONAL operator, which allows for partial answers when complete answers are not available due to lack of information. However, optional matching is computationally expensive: query answering is PSPACE-complete. The well-designed fragment of SPARQL achieves much better computational properties by restricting the use of optional matching, so that query answering becomes coNP-complete. On the downside, well-designed SPARQL captures far from all real-life queries; in fact, only about half of the queries over DBpedia that use OPTIONAL are well-designed. In the present paper, we study queries outside of well-designed SPARQL. We introduce the class of weakly well-designed queries that subsumes well-designed queries and includes most common meaningful non-well-designed queries: our analysis shows that the new fragment captures over 99% of DBpedia queries with OPTIONAL. At the same time, query answering for weakly well-designed SPARQL remains coNP-complete, and our fragment is in a certain sense maximal for this complexity. We show that the fragment’s expressive power is strictly in between well-designed and full SPARQL. Finally, we provide an intuitive normal form for weakly well-designed queries and study the complexity of containment and equivalence.
1 Introduction
The Resource Description Framework (RDF) [14, 18, 31] is the W3C standard for representing linked data on the Web. RDF models information in terms of labelled graphs consisting of triples of resource identifiers (IRIs). The first and last IRIs in such a triple, called subject and object, represent entity resources, while the middle IRI, called predicate, represents a relation between the two entities.
SPARQL [17, 37] is the default query language for RDF graphs. First standardised in 2008 [37], SPARQL is now recognised as a key technology for the Semantic Web. This is witnessed by a recent adoption of a new version of the standard, SPARQL 1.1 [17], as well as by active development of SPARQL query engines in academia and industry, for instance, as part of the systems AllegroGraph (http://franz.com/agraph/allegrograph/), Apache Jena (http://jena.apache.org), RDF4J (http://rdf4j.org), or OpenLink Virtuoso (http://virtuoso.openlinksw.com).
In recent years, SPARQL has been subject to a substantial amount of theoretical research, based on the foundational work by Pérez et al. [32, 33]. In particular, we now know much about evaluation [1, 3, 4, 6, 8, 20, 22, 23, 25, 28, 34, 38], optimisation [8, 9, 12, 13, 24, 27, 35], federation [10, 11], expressive power [2, 20, 21, 25, 36, 39], and provenance tracking [15, 16] for queries from various fragments and extensions of SPARQL. These studies have had a great impact in the community, in fact influencing the evolution of SPARQL as a standard.
A distinctive feature of SPARQL as compared to SQL is the OPTIONAL operator (abbreviated as OPT in this paper). This operator was introduced to “not reject (solutions) because some part of the query pattern does not match” [37]. For instance, consider the SPARQL query
which retrieves all person IDs from the graph together with their names; names, however, are optional—if the graph does not contain information about the name of a person, the person ID is still retrieved but the variable ?n is left undefined in the answer. For instance, query (1) has two answers over the graph G in Fig. 1a, where the second answer is partial (see Fig. 1b). However, if we extend G with a triple supplying a name for P2, the second answer will include this name.
The OPT operator accounts in a natural way for the open world assumption and the fundamental incompleteness of the Web. However, evaluating queries that use OPT is computationally expensive: Pérez et al. [33] showed PSpace-completeness of SPARQL query evaluation, and Schmidt et al. [38] refined this result by proving PSpace-hardness even for queries using no operators besides OPT. This is not surprising given that SPARQL queries are equivalent in expressive power to first-order logic queries, and translations in both directions can be done in polynomial time [2, 25, 36].
This spurred a search for restrictions on the use of OPT that would ensure lower complexity of query evaluation. It was also recognised that queries that are difficult to evaluate are often unintuitive. For instance, they may produce less specified answers (i.e., answers with fewer bound variables) as the graph over which they are evaluated grows larger.
Pérez et al. [33] introduced the well-designed fragment of SPARQL queries by imposing a syntactic restriction on the use of variables in OPT-expressions. Roughly speaking, each variable in the optional (i.e., right) argument of an OPT-expression should either appear in the mandatory (i.e., left) argument or be globally fresh for the query, i.e., appear nowhere outside of the argument. Well-designed queries have lower complexity of query evaluation: the problem is coNP-complete (provided all the variables in the query are selected). Moreover, such queries have a more intuitive behaviour than arbitrary SPARQL queries; in particular, they enjoy the monotonicity property that we observed for query (1): each partial answer over a graph can potentially be extended to undefined variables if the graph is completed with the missing information, and the more information we have the more specified are the answers. Well-designed queries can be efficiently transformed to an intuitive normal form allowing for a transparent graphical representation of queries as trees [27, 35]. Hence, many recent studies concentrate partially [23, 25, 27, 40, 41] or entirely [1, 8, 35] on well-designed queries.
Such a success of well-designed queries may lead to the impression that non-well-designed SPARQL queries are just a useless side effect of the early specification. But is this impression justified by the use of SPARQL in practice? To answer this question, a comprehensive analysis of real-life queries is required. We are aware of two works that analyse the distribution of operators in SPARQL queries asked over DBpedia [7, 34]. Both studies show that OPT is used in a non-negligible number of practical queries. However, only Picalausa and Vansummeren [34] go further and analyse how many of these queries are well-designed, and the result is quite interesting: well-designed queries make up only about half of all queries with OPT. In other words, well-designed queries are common, but far from exclusive.
The main goal of this paper is to investigate SPARQL queries beyond the well-designed fragment. We wanted to see if the well-designedness condition could be extended so as to include most practical queries while preserving good computational properties. The main result of our study is very positive: we identified a new fragment of SPARQL queries, called weakly well-designed queries, that covers over 99% of queries over DBpedia and has the same complexity of query evaluation as the well-designed fragment. We also show that our fragment is in a sense maximal for this complexity.
We next describe our results and techniques in more detail. Our first step was to identify typical real-life queries that are not well-designed. We analysed DBpedia query logs in recent USEWOD research datasets [29, 30] and found two interesting types of non-well-designed queries. The first type is exemplified by the following query:
This query is clearly not well-designed because variable ?n, binding the name of a person, appears in two different unrelated optional parts. Let us analyse answers to this query over different graphs. On graph G in Fig. 1a the result is exactly the same as for query (1), shown in Fig. 1b, simply because the IRI v_card:name is not present in G, and so cannot be matched against the second optional part of the query. Similarly, on graph G ^{′} in Fig. 1c, where the source of the name and the name itself are different, the result is as in Fig. 1d. In this case, the first optional part in the query does not match anything in the graph so the variable ?n is left unbound at this point; then the second optional part is matched, and the variable is assigned the name from v_card. More interestingly, query (2) evaluated over the graph G ∪ G ^{′} once again yields the result in Fig. 1b. Indeed, in this case, the first optional part has a match again and ?n is assigned the value Ana; then, this variable is already bound and there is no match for the second optional part that agrees with this value, meaning that the alternative v_card name is disregarded by the query. To summarise, query (2) is once again looking for person IDs and, optionally, their names. Now, however, names are collected from two different sources, foaf and v_card, where the first source is given preference over the second (maybe because it is considered more reliable or more informative, or for some other reason). In other words, if we know the foaf name of a person, it is returned as part of the answer regardless of their v_card name; however, if there is no foaf name, then the v_card name is also acceptable and should be returned; variable ?n is left unbound only if the name cannot be extracted from either source.
Of course, preference patterns encountered in real-life queries are often more complex. Still, we will see that in most cases they do not increase the complexity of query evaluation.
Our second example query is as follows:
The query uses FILTER, a standard SPARQL operator that admits only answers conforming to a specified constraint. Again, this query is not well-designed because the FILTER constraint mentions the variable ?n, which occurs in the optional part of the query but not in the mandatory part. However, the intention of the query is quite clear: it searches for people whose names are not known to be Ana, including people whose names are unknown.
This use of FILTER is in fact very common in real-life queries. Moreover, it is intuitive as long as FILTER is essentially the outermost operator in the query, as it is in our example. We will see that in all such cases FILTER cannot lead to an increase in complexity.
Having isolated these typical uses of non-well-designedness, we identify a new fragment of SPARQL that (a) includes all queries of the above two types, (b) subsumes well-designed queries, and (c) has the same complexity of query evaluation as well-designed queries. We call such queries weakly well-designed. They are the maximal fragment without structural restrictions on conjunctive blocks and filter conditions that has the above properties. Our analysis shows that more than 99% of DBpedia queries with OPT are weakly well-designed.
Besides low complexity of query evaluation, we establish a few more useful properties of weakly well-designed queries, which are summarised in the following outline of the paper. After introducing the syntax and semantics of SPARQL in Section 2, we formally define our new fragment in Section 3. In Section 4, we show that, similarly to the well-designed case, weakly well-designed queries can be transformed to an intuitive normal form, which allows for a natural graphical representation as constraint pattern trees. Using this representation, in Section 5, we formally show that the step from well-designed to weakly well-designed queries does not increase complexity of query evaluation; minimal relaxations of weak well-designedness, however, already lead to a complexity jump. In Section 6, we compare the expressive power of our fragment (and its extensions with additional operators) with well-designed queries and unrestricted SPARQL queries; in all cases, we show that the expressivity of weakly well-designed queries lies strictly in between well-designed and unrestricted queries. In Section 7, we study static analysis problems for weakly well-designed queries and establish \({\Pi }_{2}^{p}\)-completeness of equivalence and containment. Finally, in Section 8, we detail our analysis of DBpedia logs.
This article significantly extends the conference paper [19]. Besides providing full proofs of our technical claims, we have extended the analysis section and updated the evaluation to use more recent datasets. Furthermore, we have removed the erroneous claim that queries over unions of weakly well-designed patterns have the same expressive power as unrestricted SPARQL queries; on the contrary, we show that the former are strictly less expressive than the latter.
2 SPARQL Query Language
We begin by formally introducing the syntax and semantics of SPARQL that we adopt in this paper. Our formal setup mostly follows [33], which has some differences from the W3C specification [17, 37]; in particular, we use two-placed OPT and two-valued FILTER (conditional OPT and errors in FILTER evaluation as in the standard are expressible in our formalisation [2, 21]), do not consider blank nodes (their presence in RDF graphs would not change any of our results), and adopt set semantics, leaving multiset answers for future work.
RDF Graphs
An RDF graph is a labelled graph where nodes can also serve as edge labels. Formally, let I be a set of IRIs. Then an RDF triple is a tuple (s, p, o) from I × I × I, where s is called subject, p predicate, and o object. An RDF graph is a finite set of RDF triples.
SPARQL Syntax
Let X be an infinite set {?x, ?y, …} of variables, disjoint from I. Filter constraints are conditions of the form

⊤, ?x = u, ?x = ?y, or bound(?x) for ?x, ?y in X and u ∈ I (these constraints are called atomic),

¬R _{1}, R _{1} ∧ R _{2}, or R _{1} ∨ R _{2} for filter constraints R _{1} and R _{2}.
A basic pattern is a possibly empty set of triples from (I ∪ X) × (I ∪ X) × (I ∪ X) (to avoid notational clutter, in examples we will often omit braces when writing singleton basic patterns, e.g., we will write (?x, u, ?y) instead of {(?x, u, ?y)}). Then, SPARQL (graph) patterns P are defined by the grammar
P ::= B | (P AND P) | (P OPT P) | (P UNION P) | (P FILTER R),
where B ranges over basic patterns and R over filter constraints. Additionally, we require all filter constraints to be safe, that is, vars(R) ⊆ vars(P) for every pattern (P FILTER R), where vars(S) is the set of all variables in S (which can be a pattern, constraint, etc.). When needed, we distinguish between patterns by their top-level operator; e.g., we write OPT-pattern or FILTER-pattern.
We write \(\mathcal {U}\) for the set of all patterns. We also distinguish the fragment \(\mathcal {P}\) of \(\mathcal {U}\) that consists of all UNION-free patterns, that is, patterns that do not use the UNION operator.
Projection is realised in SPARQL by means of queries with select result form, or queries for short, which are expressions of the form
SELECT X WHERE P,    (4)
where X is a set of variables and P is a graph pattern. We write \(\mathcal {S}\) for the set of all queries. The set of all triples in basic patterns of a query Q is denoted triples(Q).
Note that every pattern P can be seen as a query of the form (4) where X = vars(P). Hence, all definitions that refer to “queries” implicitly extend to patterns in the obvious way.
SPARQL Semantics
The semantics of graph patterns is defined in terms of mappings, that is, partial functions from variables to IRIs. The domain dom(μ) of a mapping μ is the set of variables on which μ is defined. Two mappings μ _{1} and μ _{2} are compatible (written μ _{1} ∼ μ _{2}) if μ _{1}(?x) = μ _{2}(?x) for all variables ?x ∈ dom(μ _{1}) ∩ dom(μ _{2}). If μ _{1} ∼ μ _{2}, then μ _{1} ∪ μ _{2} constitutes a mapping with domain dom(μ _{1}) ∪ dom(μ _{2}) that coincides with μ _{1} on dom(μ _{1}) and with μ _{2} on dom(μ _{2}). Given two sets of mappings Ω_{1} and Ω_{2}, we define their join, union and difference as follows:
Ω_{1} ⋈ Ω_{2} = {μ _{1} ∪ μ _{2} ∣ μ _{1} ∈ Ω_{1}, μ _{2} ∈ Ω_{2}, and μ _{1} ∼ μ _{2}},
Ω_{1} ∪ Ω_{2} = {μ ∣ μ ∈ Ω_{1} or μ ∈ Ω_{2}},
Ω_{1} ∖ Ω_{2} = {μ _{1} ∣ μ _{1} ∈ Ω_{1} and μ _{1} ≁ μ _{2} for all μ _{2} ∈ Ω_{2}}.
Based on these, the left outer join operation is defined as
Ω_{1} ⟕ Ω_{2} = (Ω_{1} ⋈ Ω_{2}) ∪ (Ω_{1} ∖ Ω_{2}).
Given a graph G, the evaluation [[P]]_{ G } of a graph pattern P over G is defined as follows:

1. if B is a basic pattern, then [[B]]_{ G } = {μ : vars(B) → I ∣ μ(B) ⊆ G};
2. [[(P _{1} AND P _{2})]]_{ G } = [[P _{1}]]_{ G } ⋈ [[P _{2}]]_{ G };
3. [[(P _{1} OPT P _{2})]]_{ G } = [[P _{1}]]_{ G } ⟕ [[P _{2}]]_{ G } (the left outer join);
4. [[(P _{1} UNION P _{2})]]_{ G } = [[P _{1}]]_{ G } ∪ [[P _{2}]]_{ G };
5. [[(P ^{′} FILTER R)]]_{ G } = {μ ∣ μ ∈ [[P ^{′}]]_{ G } and μ ⊧ R}, where μ satisfies a filter constraint R, denoted by μ ⊧ R, if one of the following holds:

    R is ⊤;

    R is ?x = u, ?x ∈ dom(μ), and μ(?x) = u;

    R is ?x = ?y, {?x, ?y} ⊆ dom(μ), and μ(?x) = μ(?y);

    R is bound(?x) and ?x ∈ dom(μ);

    R is a Boolean combination of filter constraints evaluating to true under the usual interpretation of ¬, ∧, and ∨.

Let μ_{ X } be the projection of a mapping μ to variables X, that is, μ_{ X }(?x) = μ(?x) if ?x ∈ X and μ_{ X }(?x) is undefined if ?x ∉ X. The evaluation [[Q]]_{ G } of a query Q of the form (4) is the set of all mappings μ_{ X } such that μ ∈ [[P]]_{ G }.
Finally, a solution to a query (or pattern) Q over G is a mapping μ such that μ ∈ [[Q]]_{ G }.
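The semantics above can be sketched as a small evaluator. This is only an illustration under stated assumptions: the tuple encoding of patterns and constraints, all function names, and the two example graphs are hypothetical (the graphs merely imitate the informal description of the introductory examples and are not taken from the figures).

```python
# Hypothetical encoding (not from the paper):
#   patterns:    ("BGP", [triples]), ("AND", P1, P2), ("OPT", P1, P2),
#                ("UNION", P1, P2), ("FILTER", P, R)
#   constraints: ("top",), ("eq", "?x", term), ("bound", "?x"),
#                ("not", R), ("and", R1, R2), ("or", R1, R2)
# A mapping is a frozenset of (variable, IRI) pairs, so that it is hashable.

def is_var(term):
    return term.startswith("?")

def compatible(m1, m2):
    """m1 ~ m2: the mappings agree on every shared variable."""
    d2 = dict(m2)
    return all(d2[x] == v for x, v in m1 if x in d2)

def join(o1, o2):
    return {a | b for a in o1 for b in o2 if compatible(a, b)}

def difference(o1, o2):
    return {a for a in o1 if not any(compatible(a, b) for b in o2)}

def left_outer_join(o1, o2):
    return join(o1, o2) | difference(o1, o2)

def match_triple(triple, graph):
    """All mappings sending one triple pattern onto some triple of the graph."""
    result = set()
    for fact in graph:
        mu = {}
        if all((mu.setdefault(t, f) == f) if is_var(t) else t == f
               for t, f in zip(triple, fact)):
            result.add(frozenset(mu.items()))
    return result

def satisfies(mu, R):
    op = R[0]
    if op == "top":
        return True
    if op == "bound":
        return R[1] in mu
    if op == "eq":
        x, y = R[1], R[2]
        if is_var(y):
            return x in mu and y in mu and mu[x] == mu[y]
        return x in mu and mu[x] == y
    if op == "not":
        return not satisfies(mu, R[1])
    if op == "and":
        return satisfies(mu, R[1]) and satisfies(mu, R[2])
    if op == "or":
        return satisfies(mu, R[1]) or satisfies(mu, R[2])
    raise ValueError(op)

def evaluate(P, graph):
    op = P[0]
    if op == "BGP":
        result = {frozenset()}          # the empty basic pattern has one solution
        for triple in P[1]:
            result = join(result, match_triple(triple, graph))
        return result
    if op == "FILTER":
        return {m for m in evaluate(P[1], graph) if satisfies(dict(m), P[2])}
    left, right = evaluate(P[1], graph), evaluate(P[2], graph)
    if op == "AND":
        return join(left, right)
    if op == "OPT":
        return left_outer_join(left, right)
    if op == "UNION":
        return left | right
    raise ValueError(op)

# Hypothetical graphs and patterns in the spirit of the introduction.
person = ("BGP", [("?i", "rdf:type", "foaf:Person")])
foaf_name = ("BGP", [("?i", "foaf:name", "?n")])
vcard_name = ("BGP", [("?i", "v_card:name", "?n")])
G = {("P1", "rdf:type", "foaf:Person"), ("P1", "foaf:name", "Ana"),
     ("P2", "rdf:type", "foaf:Person")}
G2 = {("P1", "rdf:type", "foaf:Person"), ("P1", "v_card:name", "Anna")}
optional_name = ("OPT", person, foaf_name)
preferred_name = ("OPT", optional_name, vcard_name)
not_ana = ("FILTER", optional_name, ("not", ("eq", "?n", "Ana")))
```

With these definitions, `evaluate(preferred_name, G | G2)` coincides with `evaluate(optional_name, G)`, which mirrors the preference behaviour described in the introduction, and `evaluate(not_ana, G)` keeps only the person whose name is unknown.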
3 Weakly Well-Designed Patterns
We begin by recalling the notion of well-designed patterns and then formulate our generalisation. For now, we focus on the fragment \(\mathcal {P}\) of UNION-free patterns (also known as the AND-OPT-FILTER fragment of SPARQL), leaving the operators UNION and SELECT for later sections.
Note that a given pattern can occur more than once within a larger pattern. In what follows we will sometimes need to distinguish between a (sub)pattern P as a possibly repeated building block of another pattern P ^{′} and its occurrences in P ^{′}, that is, unique subtrees in the parse tree. Then, the left (right) argument of an occurrence i is the subtree rooted in the left (right) child of the root of i in the parse tree, and an occurrence i is inside an occurrence j if the root of i is a descendant of the root of j.
Definition 1
(Pérez et al. [33]) A pattern P from \(\mathcal {P}\) is well-designed (or wd-pattern, for short) if for every occurrence i of an OPT-pattern P _{1} OPT P _{2} in P the variables from vars(P _{2}) ∖ vars(P _{1}) occur in P only inside (the labels of) i.
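Definition 1 can be checked directly on the parse tree. The following is a minimal sketch, assuming a hypothetical tuple encoding of UNION-free patterns in which an occurrence is identified by its path from the root; the function names and example patterns are illustrative and not part of the paper.

```python
# Hypothetical encoding: ("BGP", [triples]), ("AND", P1, P2), ("OPT", P1, P2),
# ("FILTER", P, R); constraints: ("top",), ("eq", "?x", term), ("bound", "?x"),
# ("not", R), ("and", R1, R2), ("or", R1, R2).

def is_var(term):
    return term.startswith("?")

def constraint_vars(R):
    if R[0] == "top":
        return set()
    if R[0] in ("eq", "bound"):
        return {t for t in R[1:] if is_var(t)}
    return set().union(*(constraint_vars(r) for r in R[1:]))

def pattern_vars(P):
    if P[0] == "BGP":
        return {t for triple in P[1] for t in triple if is_var(t)}
    if P[0] == "FILTER":
        return pattern_vars(P[1]) | constraint_vars(P[2])
    return pattern_vars(P[1]) | pattern_vars(P[2])

def occurrences(P, path=()):
    """Yield (path, subpattern) for every occurrence in the parse tree."""
    yield path, P
    if P[0] != "BGP":
        yield from occurrences(P[1], path + (1,))
    if P[0] in ("AND", "OPT"):
        yield from occurrences(P[2], path + (2,))

def variable_positions(P):
    """Pairs (path, variable) for variables in basic patterns and constraints."""
    positions = []
    for path, Q in occurrences(P):
        if Q[0] == "BGP":
            positions += [(path, x) for x in pattern_vars(Q)]
        elif Q[0] == "FILTER":
            positions += [(path, x) for x in constraint_vars(Q[2])]
    return positions

def is_well_designed(P):
    positions = variable_positions(P)
    for path, Q in occurrences(P):
        if Q[0] != "OPT":
            continue
        unguarded = pattern_vars(Q[2]) - pattern_vars(Q[1])
        # A position lies inside the occurrence iff its path extends `path`.
        if any(x in unguarded and pos[:len(path)] != path
               for pos, x in positions):
            return False
    return True

# Illustrative patterns shaped like the queries discussed in the introduction.
person = ("BGP", [("?i", "rdf:type", "foaf:Person")])
foaf_name = ("BGP", [("?i", "foaf:name", "?n")])
vcard_name = ("BGP", [("?i", "v_card:name", "?n")])
optional_name = ("OPT", person, foaf_name)
preferred_name = ("OPT", optional_name, vcard_name)
not_ana = ("FILTER", optional_name, ("not", ("eq", "?n", "Ana")))
```

Under this encoding, `optional_name` passes the check, while `preferred_name` and `not_ana` fail it because ?n is used outside the OPT occurrence that introduces it. The sketch recomputes variable sets at every occurrence and makes no attempt at efficiency.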
We write \(\mathcal {P}_{\text {wd}}\) for the fragment of wd-patterns. Such patterns comply with the basic intuition for optional matching in SPARQL: “do not reject (solutions) because some part of the query pattern does not match” [37]; indeed, our canonical use case (1) is clearly well-designed. Evaluation of wd-patterns, that is, checking if μ ∈ [[P]]_{ G } for a mapping μ, graph G and pattern \(P\in \mathcal {P}_{\text {wd}}\), is coNP-complete (in combined complexity), as opposed to PSpace-completeness for \(\mathcal {P}\) [33, 38]. The high complexity of unrestricted patterns is partially due to the fact that unrestricted combinations of OPT and FILTER make it possible to express nesting of the difference operator DIFF with semantics [[P _{1} DIFF P _{2}]]_{ G } = [[P _{1}]]_{ G } ∖ [[P _{2}]]_{ G } (unless P _{1} or P _{2} are empty basic patterns, see [21] for details):
where ?x, ?y and ?z do not occur in vars(P _{1}) ∪ vars(P _{2}). This property is well known [2, 21, 33], and has usually been believed to be an important source of non-well-designed patterns in practice. We challenge this belief by answering differently the question on the prevalent structure of real-life queries beyond the well-designed fragment. This question is not just of theoretical interest: as previous studies [34] show (and our analysis confirms), about half of queries with OPT asked over DBpedia are not well-designed.
Next we discuss two sources of non-well-designedness in patterns as revealed by the example queries (2) and (3) in the introduction, one based on OPT and another one on FILTER.
Source 1. There are two substantially different ways of nesting the OPT operator in patterns:

$$\begin{array}{@{}rcl@{}} P_{1}~ {{\mathsf{OPT}}}~ (P_{2}~ {{\mathsf{OPT}}}~ P_{3}), \end{array} $$ (OptR)

$$\begin{array}{@{}rcl@{}} (P_{1} ~{{\mathsf{OPT}}}~ P_{2})~{{\mathsf{OPT}}}~P_{3}. \end{array} $$ (OptL)

Non-well-designed nesting of type (OptR) is responsible for the PSpace-hardness of query evaluation [33, 38]. Moreover, such nesting is not very intuitive unless well-designed. On the contrary, as we saw in the introduction, non-well-designed nesting of type (OptL) can be used for prioritising some parts of patterns over others, and is indeed used in real life. As we will see later, nesting of type (OptL) cannot lead to high complexity of evaluation.

Source 2. Well-designedness can be violated by using “dangerous” variables from the right argument of OPT in filter constraints. In particular, patterns of the form (P _{1} OPT P _{2}) FILTER R with R using a variable from vars(P _{2}) ∖ vars(P _{1}) are not well-designed, but rather frequent in practice. However, such patterns almost never occur inside the right argument of other OPT-patterns. We will see that if we restrict the usage of such filters to the “top level”, we preserve the good computational properties of wd-patterns.

Motivated by these observations, we considerably generalise the notion of wd-patterns to allow for useful queries like (2) and (3) while retaining important properties of such patterns. We start with two auxiliary notions.
Definition 2
Given a pattern P, an occurrence i _{1} in P dominates an occurrence i _{2} if there exists an occurrence j of an OPT-pattern such that i _{1} is inside the left argument of j and i _{2} is inside the right argument.
Definition 3
An occurrence i of a FILTER-pattern P ^{′} FILTER R in P is top-level if there is no occurrence j of an OPT-pattern such that i is inside the right argument of j.
We are ready to give the main definition of this paper.
Definition 4
A pattern \(P \in \mathcal {P}\) is weakly well-designed (or wwd-pattern, for short) if, for each occurrence i of an OPT-subpattern P _{1} OPT P _{2}, the variables in vars(P _{2}) ∖ vars(P _{1}) appear outside i only in

subpatterns whose occurrences are dominated by i, and

constraints of top-level occurrences of FILTER-patterns.
We write \(\mathcal {P}_{\text {wwd}}\) for the fragment of wwd-patterns. They extend wd-patterns by allowing variables from the right argument of an OPT-subpattern that are not “guarded” by the left argument to appear in certain positions outside of the subpattern. Note that the patterns of queries (2) and (3) are wwd-patterns. Also, patterns which allow only for OPT nesting of type (OptL) are always weakly well-designed, as is the pattern on the right-hand side of (5), which expresses DIFF. However, patterns that have subpatterns of the latter form in the right argument of OPT are not weakly well-designed. Next we give a few more examples.
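Definitions 2 to 4 can be turned into a direct (and deliberately naive) check. The sketch below assumes a hypothetical tuple encoding of UNION-free patterns with occurrences identified by their paths in the parse tree; being inside an argument is read as including the root of that argument, which is the reading needed for nesting of type (OptL) to qualify. All names and example patterns are illustrative.

```python
# Hypothetical encoding: ("BGP", [triples]), ("AND", P1, P2), ("OPT", P1, P2),
# ("FILTER", P, R); constraints: ("top",), ("eq", "?x", term), ("bound", "?x"),
# ("not", R), ("and", R1, R2), ("or", R1, R2).

def is_var(term):
    return term.startswith("?")

def constraint_vars(R):
    if R[0] == "top":
        return set()
    if R[0] in ("eq", "bound"):
        return {t for t in R[1:] if is_var(t)}
    return set().union(*(constraint_vars(r) for r in R[1:]))

def pattern_vars(P):
    if P[0] == "BGP":
        return {t for triple in P[1] for t in triple if is_var(t)}
    if P[0] == "FILTER":
        return pattern_vars(P[1]) | constraint_vars(P[2])
    return pattern_vars(P[1]) | pattern_vars(P[2])

def occurrences(P, path=()):
    yield path, P
    if P[0] != "BGP":
        yield from occurrences(P[1], path + (1,))
    if P[0] in ("AND", "OPT"):
        yield from occurrences(P[2], path + (2,))

def is_weakly_well_designed(P):
    nodes = dict(occurrences(P))

    def enters_right_of_opt(path):
        # True iff the occurrence at `path` is not top-level (Definition 3).
        return any(nodes[path[:k]][0] == "OPT" and path[k] == 2
                   for k in range(len(path)))

    def dominated_by(i, pos):
        # Definition 2: some OPT occurrence j has i in its left argument
        # and the occurrence at `pos` in its right argument.
        return any(nodes[i[:k]][0] == "OPT" and i[k] == 1
                   and pos[:k + 1] == i[:k] + (2,)
                   for k in range(len(i)))

    positions = []  # (path, variable, is_filter_constraint)
    for path, Q in nodes.items():
        if Q[0] == "BGP":
            positions += [(path, x, False) for x in pattern_vars(Q)]
        elif Q[0] == "FILTER":
            positions += [(path, x, True) for x in constraint_vars(Q[2])]

    for i, Q in nodes.items():
        if Q[0] != "OPT":
            continue
        unguarded = pattern_vars(Q[2]) - pattern_vars(Q[1])
        for pos, x, in_filter in positions:
            if x not in unguarded or pos[:len(i)] == i:
                continue            # other variable, or position inside i
            if dominated_by(i, pos):
                continue            # allowed: dominated occurrence
            if in_filter and not enters_right_of_opt(pos):
                continue            # allowed: top-level filter constraint
            return False
    return True

person = ("BGP", [("?i", "rdf:type", "foaf:Person")])
foaf_name = ("BGP", [("?i", "foaf:name", "?n")])
vcard_name = ("BGP", [("?i", "v_card:name", "?n")])
optional_name = ("OPT", person, foaf_name)
preferred_name = ("OPT", optional_name, vcard_name)          # type (OptL)
not_ana = ("FILTER", optional_name, ("not", ("eq", "?n", "Ana")))
opt_r = ("OPT", ("BGP", [("?x", "a", "?y")]),                 # type (OptR)
         ("OPT", ("BGP", [("?x", "b", "?z")]), ("BGP", [("?y", "c", "?w")])))
```

In this sketch `preferred_name` and `not_ana` are accepted, whereas `opt_r` is rejected because ?y is unguarded in the inner OPT occurrence and also occurs in the non-dominated left argument of the outer one. The running time is far from the quadratic bound of Proposition 1; the point is only to make the definition concrete.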
Example 1
Consider the following patterns and their parse trees in Fig. 2 (we write ?x ≠ ?y for ¬(?x = ?y)):
Pattern (6) is not well-designed because of variable ?z, but is weakly well-designed since the occurrence of (?y, c, ?z) dominates (?x, d, ?z). However, the similar pattern (7) is not weakly well-designed because the occurrence of the inner OPT-pattern with the second occurrence of ?z does not dominate the first. Pattern (8) is weakly well-designed since the FILTER-pattern (which is not dominated by the inner OPT-pattern) is top-level, but pattern (9) is not, because of variable ?w in a non-top-level FILTER.
Proposition 1
Checking whether a UNION-free pattern P belongs to the fragment \(\mathcal {P}_{\text {wwd}}\) can be done in time O(|P|^{2}), where |P| is the length of the string representation of P.
Proof
First note that a UNION-free pattern P is weakly well-designed if and only if so is the pattern rm_toplevel_filters(P), which is obtained from P by removing all top-level occurrences of filters. The operation rm_toplevel_filters can be implemented in linear time by the recursive procedure in Fig. 3a.
Next consider the recursive procedure is_wwd in Fig. 3b, where sort(S) denotes a sorted, repetition-free list representation of a set S.
Given a UNION-free pattern P without top-level filters, it is easily seen that is_wwd(P) returns a tuple of the form (true, vs, ws) if and only if P is weakly well-designed, where ws is the sorted list of “unguarded” variables in P, that is, variables occurring in the second argument of an OPT-subpattern P ^{′} of P but not in the first argument of P ^{′}, and vs = sort(vars(P)) ∖ ws. Procedure is_wwd can be implemented in quadratic time since sort (which may take time O(n log n)) is only applied to atomic subexpressions and set operations on sorted lists take linear time. □
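The first step of the proof, removing top-level filters, admits a short recursive sketch. The code below is not the procedure of Fig. 3a, which is not reproduced here; it is one plausible reading of the description, written against a hypothetical tuple encoding of UNION-free patterns.

```python
# Hypothetical encoding: ("BGP", [triples]), ("AND", P1, P2),
# ("OPT", P1, P2), ("FILTER", P, R).

def rm_toplevel_filters(P):
    """Drop every FILTER occurrence that is not inside the right argument of an OPT."""
    op = P[0]
    if op == "BGP":
        return P
    if op == "FILTER":
        # A filter reached by this recursion is top-level: discard the constraint.
        return rm_toplevel_filters(P[1])
    if op == "AND":
        return ("AND", rm_toplevel_filters(P[1]), rm_toplevel_filters(P[2]))
    if op == "OPT":
        # Filters in the right argument are not top-level and stay untouched.
        return ("OPT", rm_toplevel_filters(P[1]), P[2])
    raise ValueError("unexpected operator: %r" % (op,))

# Illustrative pattern with one top-level and one non-top-level filter.
a = ("BGP", [("?x", "p", "?y")])
b = ("BGP", [("?y", "q", "?z")])
example = ("FILTER", ("OPT", a, ("FILTER", b, ("bound", "?z"))), ("bound", "?y"))
stripped = rm_toplevel_filters(example)
```

Each occurrence is visited at most once, which is consistent with the linear-time claim in the proof.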
4 OPT-FILTER-Normal Form and Constraint Pattern Trees
One of the key properties of wd-patterns is that they can always be converted to a so-called OPT-normal form, in which all AND- and FILTER-subpatterns are OPT-free [33]. Also, FILTER-free patterns in OPT-normal form can be naturally represented as trees of a special form [27, 35], which give a good intuition for the evaluation and optimisation of such patterns. In this section, we show that these notions can be generalised to wwd-patterns.
Definition 5
A pattern \(P\in \mathcal {P}\) is in OPT-FILTER-normal form (or OF-normal form for short) if it adheres to the grammar
where B ranges over basic patterns and R over filter constraints.
In other words, the parse tree of a pattern in OF-normal form can be stratified as follows:

1. (occurrences of) basic patterns as the bottom layer,
2. a FILTER on top of each basic pattern as the middle layer,
3. a combination of OPT and FILTER as the top layer;

moreover, each occurrence of a FILTER-pattern in the top layer is top-level (according to Definition 3). Note that our normal form is AND-free: all conjunctions are expressed via basic patterns.
Example 2
None of the four patterns in Example 1 are in OF-normal form. However, the first three of them can be easily normalised by replacing each triple t with t ^{⊤}, where P ^{⊤} is an abbreviation of P FILTER ⊤ for a pattern P. Also, compare the pattern
which is in OF-normal form, with the very similar pattern
which is not, because the outer FILTER is in the right argument of the outermost OPT.
As shown by Letelier et al. [27], FILTER-free patterns in OPT-normal form can be represented by means of so-called pattern trees. We next show that this representation can be naturally extended to patterns in OF-normal form.
Definition 6
Let P be a pattern in OF-normal form. The constraint pattern tree (CPT) \(\mathcal {T}(P)\) of P is the directed, ordered, labelled, rooted tree recursively constructed as follows (in this definition we abuse notation and confuse patterns and their occurrences; strictly speaking, we create a fresh subtree for each occurrence, so the resulting object is always a tree):

1. if B is a basic pattern then \(\mathcal {T}(B~ {{\mathsf {FILTER}}}~ R)\) is a single node v labelled by the pair (B, R);
2. if P ^{′} is not a basic pattern then \(\mathcal {T}(P^{\prime }~ {{\mathsf {FILTER}}} ~R)\) is obtained by adding a special node labelled by R as the last child of the root of \(\mathcal {T}(P^{\prime })\);
3. \(\mathcal {T}(P_{1}~{{\mathsf {OPT}}}~P_{2})\) is the tree obtained from \(\mathcal {T}(P_{1})\) and \(\mathcal {T}(P_{2})\) by adding the root of \(\mathcal {T}(P_{2})\) as the last child of the root of \(\mathcal {T}(P_{1})\).
By definition, there is a one-to-one correspondence between patterns in OF-normal form and CPTs. Hence, such trees can be seen as a convenient representation of patterns in OF-normal form. Unlike parse trees, which represent the syntactic shape of patterns, CPTs show the semantic structure of OPT and FILTER nesting. Figure 4 shows how OPT nestings of types (OptR) and (OptL) are represented in both formats. Note that CPTs treat different FILTER-subpatterns differently: if the filter is over a basic pattern, the constraint of the FILTER is paired with this pattern; however, if the filter is over an OPT-subpattern, then the constraint is represented by a separate special node. Moreover, since in the second case the FILTER-pattern must be top-level, special nodes can only occur in CPTs as children of the root. For instance, the CPT of the example pattern (10) is given in Fig. 5a.
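The three cases of Definition 6 translate almost literally into a recursive construction. The sketch below assumes a hypothetical tuple encoding of patterns and a hypothetical `Node` class; it does not verify that the input really is in OF-normal form beyond rejecting operators that cannot occur there.

```python
from dataclasses import dataclass, field

# Hypothetical encoding: ("BGP", [triples]), ("OPT", P1, P2), ("FILTER", P, R).

@dataclass
class Node:
    label: tuple                      # (B, R) for ordinary nodes, R for special ones
    special: bool = False
    children: list = field(default_factory=list)

def cpt(P):
    """Build the constraint pattern tree of a pattern assumed to be in OF-normal form."""
    op = P[0]
    if op == "FILTER" and P[1][0] == "BGP":
        # Case 1: a filter over a basic pattern becomes a single node (B, R).
        return Node(label=(tuple(P[1][1]), P[2]))
    if op == "FILTER":
        # Case 2: a filter over a non-basic pattern adds a special last child.
        root = cpt(P[1])
        root.children.append(Node(label=P[2], special=True))
        return root
    if op == "OPT":
        # Case 3: the root of T(P2) becomes the last child of the root of T(P1).
        root = cpt(P[1])
        root.children.append(cpt(P[2]))
        return root
    raise ValueError("pattern is not in OF-normal form")

def shape(node):
    """Nested-list view of the tree structure, ignoring labels."""
    return [shape(child) for child in node.children]

def leaf(triple):
    return ("FILTER", ("BGP", [triple]), ("top",))

p1, p2, p3 = leaf(("?x", "a", "?y")), leaf(("?x", "b", "?z")), leaf(("?x", "c", "?w"))
opt_r = ("OPT", p1, ("OPT", p2, p3))      # nesting of type (OptR)
opt_l = ("OPT", ("OPT", p1, p2), p3)      # nesting of type (OptL)
```

Here `shape(cpt(opt_r))` is a chain of three nodes, while `shape(cpt(opt_l))` is a root with two children, which is the contrast between the two nesting types discussed above.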
Proposition 2
Let P be a pattern in OF-normal form. Then every special node in \(\mathcal {T}(P)\) is a child of the root.
Proof
Let v be a special node in \(\mathcal {T}(P)\). Then v is obtained from a subpattern P ^{′} FILTER R where P ^{′} is not basic. Hence, by definition of the OF-normal form, P must have the form
(for some n ≥ 0) where S _{1}, …, S _{ n } contain only FILTER-subpatterns over basic patterns. Thus, the root of \(\mathcal {T}(P^{\prime })\) is also the root of \(\mathcal {T}(P)\), and the claim follows. □
Next we show that each wwd-pattern can be converted to OF-normal form and hence can be represented by a CPT. To prove this statement we make use of a number of equivalences. Formally, a pattern P _{1} is equivalent to a pattern P _{2} (written P _{1} ≡ P _{2}) if [[P _{1}]]_{ G } = [[P _{2}]]_{ G } holds for any graph G. There are several equivalences, such as associativity and commutativity of AND, as well as filter decompositions, such as P FILTER (R _{1} ∧ R _{2}) ≡ (P FILTER R _{1}) FILTER R _{2}, which hold for all patterns (see [38] for an extensive list). Moreover, the key equivalences used in [33] for normalising wd-patterns can easily be adapted to serve our needs.
Proposition 3
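Let P _{1}, P _{2}, P _{3} be patterns and R a filter constraint such that vars(P _{2}) ∩ vars(P _{3}) ⊆ vars(P _{1}) and vars(P _{2}) ∩ vars(R) ⊆ vars(P _{1}). Then the following equivalences hold:
(P _{1} OPT P _{2}) AND P _{3} ≡ (P _{1} AND P _{3}) OPT P _{2},
(P _{1} OPT P _{2}) FILTER R ≡ (P _{1} FILTER R) OPT P _{2}.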
Proof
Both equivalences are essentially shown in [33]. While stated for well-designed patterns, the proof only exploits the properties vars(P _{2}) ∩ vars(P _{3}) ⊆ vars(P _{1}) and vars(P _{2}) ∩ vars(R) ⊆ vars(P _{1}), which are satisfied not only by well-designed patterns, but also by weakly well-designed patterns. □
Since all the equivalences preserve weak well-designedness, we obtain the desired result.
Proposition 4
Each wwd-pattern P is equivalent to a wwd-pattern in OF-normal form of size O(|P|).
Proof
We call a pattern P prenormal if it adheres to the grammar that is the same as the one in Definition 5 except that the category F is extended as follows:
Given a pattern P, let ‖P‖ be the sum of the sizes of all AND-subpatterns and all FILTER-subpatterns of P (where different occurrences of each pattern are counted separately), and let |P| denote the size of P. Consider a wwd-pattern P that is not prenormal. Then P contains a subpattern P ^{′} of one of the following two forms (modulo commutativity of AND): (P _{1} OPT P _{2}) AND P _{3} and (P _{1} OPT P _{2}) FILTER R with P ^{′} not top-level. In both cases we can rewrite P to a pattern S without increasing |·| and strictly decreasing ‖·‖ as follows.

Let P ^{′} = (P _{1} OPT P _{2}) AND P _{3}. Since P is weakly well-designed and the occurrence of P _{3} is not dominated by the occurrence of P _{1} OPT P _{2}, we have vars(P _{3}) ∩ vars(P _{2}) ⊆ vars(P _{1}). Therefore, using the first equivalence in Proposition 3, we can rewrite P to a pattern S by replacing P ^{′} with (P _{1} AND P _{3}) OPT P _{2}. Moreover, we have |P| = |S| and ‖P‖ > ‖S‖.

Let P ^{′} = (P _{1} O P T P _{2}) F I L T E R R where the occurrence of P ^{′} is not toplevel. Since P is weakly welldesigned, we then have v a r s(R) ∩ v a r s(P _{2}) ⊆ v a r s(P _{1}), and thus, with the second equivalence in Proposition 3, we can rewrite P to a pattern S by replacing P ^{′} with (P _{1} F I L T E R R) O P T P _{2}. Moreover, we have P = S and P > S.
Since this rewriting strictly decreases ⋅, its repeated application to P terminates and yields a prenormal pattern S equivalent to P with S = P.
Finally, S can be transformed to O Fnormal form by replacing every occurrence of an A N D F I L T E R combination of basic patterns by B F I L T E R R where B consists of all triples in the basic patterns and R is a conjunction of all the filter conditions (if there are no filters in the combination, then R is ⊤). Clearly, this transformation is equivalencepreserving and linear in S. □
Relying on this proposition, in the rest of the paper we silently assume that all wwd-patterns are in OF-normal form and hence can be represented by CPTs.
We next transfer the notion of weak well-designedness to CPTs. Given a pattern P in OF-normal form, let ≺ be the strict topological sorting of the nodes in \(\mathcal{T}(P)\) computed by a depth-first search traversal that visits the children of a node according to their ordering (i.e., v ≺ u holds if v is visited before u).
Lemma 1
Let P be a pattern in OF-normal form and P′ = P_1 OPT P_2 be a subpattern of P. Then v ≺ w for every two nodes v, w in \(\mathcal{T}(P)\) such that v is in the subtree of \(\mathcal{T}(P)\) corresponding to P_1 and w is in the subtree corresponding to P_2.
Proof
The claim follows since \(\mathcal{T}(P^{\prime})\) is constructed by attaching \(\mathcal{T}(P_{2})\) as the last child to the root of \(\mathcal{T}(P_{1})\). □
In the following proposition, vars(u) for a node u of a CPT stands for the set of all variables in the label of u.
Proposition 5
A pattern P in OF-normal form is weakly well-designed if and only if, for each edge (v, u) with non-special u in the CPT \(\mathcal{T}(P)\), every variable ?x ∈ vars(u) ∖ vars(v) occurs only in nodes w such that v ≺ w. The pattern is well-designed if and only if, for every variable ?x in P, the set of all nodes v in \(\mathcal{T}(P)\) with ?x ∈ vars(v) is connected.
Proof
For the forward direction of the first statement, suppose P is weakly well-designed. We proceed by induction on the structure of P and consider the following cases.

Let P = B FILTER R where B is basic. Then the claim is vacuous.

Let P = P_1 FILTER R where P_1 is not basic. By the inductive hypothesis, the claim holds for \(\mathcal{T}(P_{1})\). Moreover, \(\mathcal{T}(P)\) differs from \(\mathcal{T}(P_{1})\) only in the special node labelled with R, and the claim follows by Proposition 2.

Let P = P_1 OPT P_2. By the inductive hypothesis, the claim holds for \(\mathcal{T}(P_{1})\) and \(\mathcal{T}(P_{2})\). Thus, by Lemma 1, it suffices to show that for every edge (v, u) in \(\mathcal{T}(P_{2})\) (with non-special u by definition), no variable ?x ∈ vars(u) ∖ vars(v) occurs in \(\mathcal{T}(P_{1})\). Suppose for contradiction that this property is violated for some (v, u) and ?x. Then P_2 has a subpattern \(P^{\prime}=P^{\prime}_{1}~\mathsf{OPT}~P^{\prime}_{2}\) such that \(\mathcal{T}(P^{\prime}_{1})\) is a subtree of \(\mathcal{T}(P)\) rooted at v and \(\mathcal{T}(P^{\prime}_{2})\) is the complete subtree of \(\mathcal{T}(P)\) rooted at u. Moreover, ?x occurs in P_1, and thus outside P′. Since all FILTER-subpatterns in P are safe, we can assume without loss of generality that the occurrence of ?x in P_1 is not in a filter constraint. However, this contradicts the assumption that P is weakly well-designed since the occurrence of ?x in P_1 is not dominated by the occurrence of P′.
For the backward direction of the first claim, suppose P is not a wwd-pattern. Then P has a subpattern \(P^{\prime}=P^{\prime}_{1}~\mathsf{OPT}~P^{\prime}_{2}\), with v the root of \(\mathcal{T}(P^{\prime})\) in \(\mathcal{T}(P)\) and u the child of v corresponding to \(\mathcal{T}(P^{\prime}_{2})\), and a variable \(?x\in\mathsf{vars}(P^{\prime}_{2})\setminus\mathsf{vars}(P^{\prime}_{1})\) such that ?x ∈ vars(u) and, for some subpattern P_1 OPT P_2 of P, ?x occurs in P_1 and P′ occurs in P_2. Since \(?x\in\mathsf{vars}(P^{\prime}_{2})\setminus\mathsf{vars}(P^{\prime}_{1})\) and ?x ∈ vars(u), we have ?x ∈ vars(u) ∖ vars(v). Thus, by Lemma 1, we have v ⊀ w, where w is a node in \(\mathcal{T}(P_{1})\) with an occurrence of ?x.
The second claim can be proved analogously. □
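Both conditions of Proposition 5 are easy to test on a given tree. The following Python sketch is only an illustration under stated assumptions: the class `Node` and its fields are hypothetical names, a node label is abstracted to its set of variables, and the order ≺ is obtained from a pre-order traversal that visits children left to right.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based hashing, so nodes can be dictionary keys
class Node:
    vars: frozenset                 # variables occurring in the node label
    special: bool = False           # True for nodes carrying a top-level filter
    children: list = field(default_factory=list)


def preorder(root):
    """Nodes in the order ≺: depth-first, children visited left to right."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(reversed(node.children))
    return order


def is_weakly_well_designed(root):
    order = preorder(root)
    rank = {node: i for i, node in enumerate(order)}
    for v in order:
        for u in v.children:
            if u.special:
                continue
            for x in u.vars - v.vars:
                # x may occur only in nodes w with v ≺ w
                if any(x in w.vars and rank[w] < rank[v] for w in order):
                    return False
    return True


def is_well_designed(root):
    order = preorder(root)
    parent = {child: node for node in order for child in node.children}
    for x in frozenset().union(*(node.vars for node in order)):
        # In a tree, the nodes containing x are connected iff exactly one of
        # them has no parent that also contains x.
        tops = [n for n in order
                if x in n.vars and (n not in parent or x not in parent[n].vars)]
        if len(tops) != 1:
            return False
    return True


V = frozenset
# Root {x, y} with two children {y, z} and {x, z}: an (OptL)-style nesting.
opt_l = Node(V({"x", "y"}), children=[Node(V({"y", "z"})), Node(V({"x", "z"}))])
# Root {x}, child {y}, grandchild {x, y}: an (OptR)-style nesting.
opt_r = Node(V({"x"}), children=[Node(V({"y"}), children=[Node(V({"x", "y"}))])])
print(is_weakly_well_designed(opt_l), is_well_designed(opt_l))   # True False
print(is_weakly_well_designed(opt_r), is_well_designed(opt_r))   # False False
```

The two example trees are made up for this sketch; the first is weakly well-designed but not well-designed, while the second violates both conditions because the grandchild reuses a root variable that its parent does not mention.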
Note that if a pattern is FILTER-free, its OF-normal form coincides with the OPT-normal form in [33] (modulo tautological filters), and its CPT is the pattern tree from [27, 35]. In fact, the second part of Proposition 5 generalises an observation from [27] to the case with filters. An important difference to pattern trees is that in our case the order of children of a node is semantically relevant since wwd-patterns do not satisfy the equivalence

\((P_{1}~\mathsf{OPT}~P_{2})~\mathsf{OPT}~P_{3} \equiv (P_{1}~\mathsf{OPT}~P_{3})~\mathsf{OPT}~P_{2}.\)

This equivalence, established in [32], holds whenever vars(P_2) ∩ vars(P_3) ⊆ vars(P_1), which is always the case for wd-patterns but not for wwd-patterns, as can be seen on query (2).
We conclude this section with a property that is unique to wwd-patterns: each wwd-pattern is equivalent to a pattern whose corresponding CPT has depth one.
Definition 7
A pattern in \(\mathcal{P}\) is in depth-one normal form if it has the structure

\((\cdots((B~op_{1}~S_{1})~op_{2}~S_{2})\cdots)~op_{n}~S_{n},\)

where B is a basic pattern and each op_i S_i, 1 ≤ i ≤ n, is either OPT (B_i FILTER R_i) with B_i a basic pattern and R_i a filter constraint, or just FILTER R_i.
To show that each wwd-pattern can be brought to this form, we exploit the following observation from [33].
Lemma 2 (Pérez et al. [33])
Let P be a pattern in \(\mathcal{P}\), G a graph, and μ_1, μ_2 two mappings in [[P]]_G. Then μ_1 ∼ μ_2 if and only if μ_1 = μ_2.
This lemma allows us to prove the following crucial equivalence.
Proposition 6
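For patterns P_1, P_2 and P_3 in \(\mathcal{P}\) with vars(P_1) ∩ vars(P_3) ⊆ vars(P_2) it holds that

\(P_{1}~\mathsf{OPT}~(P_{2}~\mathsf{OPT}~P_{3}) \equiv (P_{1}~\mathsf{OPT}~P_{2})~\mathsf{OPT}~(P_{2}~\mathsf{AND}~P_{3}). \qquad (13)\)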
Proof
We first show that any solution to the left-hand side is also a solution to the right-hand side. Let G be a graph and let μ ∈ [[P_1 OPT (P_2 OPT P_3)]]_G. We distinguish three cases.

Let μ ∈ [[P_1]]_G ⋈ ([[P_2]]_G ⋈ [[P_3]]_G). Then, by Lemma 2, we have [[P_2]]_G = [[P_2]]_G ⋈ [[P_2]]_G. Consequently, μ ∈ ([[P_1]]_G ⋈ [[P_2]]_G) ⋈ ([[P_2]]_G ⋈ [[P_3]]_G), and the claim follows.

Let μ ∈ [[P_1]]_G ⋈ ([[P_2]]_G ∖ [[P_3]]_G). Then μ = μ_1 ∪ μ_2 such that μ_1 ∈ [[P_1]]_G, μ_2 ∈ [[P_2]]_G, and for every μ_3 ∈ [[P_3]]_G, μ_2 ≁ μ_3. Since every mapping in [[P_2 AND P_3]]_G is an extension of some mapping in [[P_3]]_G, no mapping in [[P_2 AND P_3]]_G is compatible with μ_2, and hence with μ. Therefore, μ ∈ ([[P_1]]_G ⋈ [[P_2]]_G) ∖ [[P_2 AND P_3]]_G, and the claim follows.

Let μ ∈ [[P_1]]_G ∖ [[P_2 OPT P_3]]_G. Then μ ∈ [[P_1]]_G and is incompatible with any mapping in [[P_2 OPT P_3]]_G. Moreover, since vars(P_1) ∩ vars(P_3) ⊆ vars(P_2), μ is incompatible with any mapping in [[P_2]]_G, and consequently also with any mapping in [[P_2 AND P_3]]_G. Therefore, μ ∈ ([[P_1]]_G ∖ [[P_2]]_G) ∖ [[P_2 AND P_3]]_G, and the claim follows.
For the other direction, suppose μ ∈ [[(P_1 OPT P_2) OPT (P_2 AND P_3)]]_G. We distinguish three cases.

Let μ ∈ ([[P_1]]_G ⋈ [[P_2]]_G) ⋈ ([[P_2]]_G ⋈ [[P_3]]_G). Then, by Lemma 2, we have μ ∈ [[P_1]]_G ⋈ ([[P_2]]_G ⋈ [[P_3]]_G), and the claim follows.

Let μ ∈ ([[P_1]]_G ⋈ [[P_2]]_G) ∖ ([[P_2]]_G ⋈ [[P_3]]_G). Then μ = μ_1 ∪ μ_2 such that μ_1 ∈ [[P_1]]_G, μ_2 ∈ [[P_2]]_G, and μ is incompatible with every mapping in [[P_2]]_G ⋈ [[P_3]]_G. Since vars(P_1) ∩ vars(P_3) ⊆ vars(P_2), this implies that {μ_2} ⋈ [[P_3]]_G is empty, that is, μ_2 is incompatible with every mapping in [[P_3]]_G. Therefore, μ_2 ∈ [[P_2]]_G ∖ [[P_3]]_G, and thus μ ∈ [[P_1]]_G ⋈ ([[P_2]]_G ∖ [[P_3]]_G). The claim follows.

Let μ ∈ [[P_1]]_G ∖ [[P_2]]_G. Since every mapping in [[P_2 OPT P_3]]_G extends a mapping in [[P_2]]_G, we have that μ ∈ [[P_1]]_G ∖ [[P_2 OPT P_3]]_G, and the claim follows.
□
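The equivalence of Proposition 6 can also be checked experimentally on small inputs. The Python sketch below is an illustration only: it restricts P_1, P_2, P_3 to basic patterns, represents a mapping as a frozenset of (variable, value) pairs, and uses function names (`match_bgp`, `join`, `diff`, `opt`, `both_sides`) and example data that are assumptions of this sketch. It evaluates both sides of the equivalence under the set semantics used in the proof.

```python
def match_bgp(bgp, graph):
    """[[B]]_G for a basic pattern B, as a set of frozensets of (variable, value) pairs."""
    sols = [{}]
    for pattern in bgp:
        step = []
        for m in sols:
            for triple in graph:
                ext = dict(m)
                # A variable must be bound consistently; a constant must match exactly.
                if all(ext.setdefault(p, t) == t if p.startswith("?") else p == t
                       for p, t in zip(pattern, triple)):
                    step.append(ext)
        sols = step
    return {frozenset(m.items()) for m in sols}


def compatible(a, b):
    """Two mappings are compatible if they agree on all shared variables."""
    da = dict(a)
    return all(da.get(x, v) == v for x, v in b)


def join(o1, o2):
    return {a | b for a in o1 for b in o2 if compatible(a, b)}


def diff(o1, o2):
    return {a for a in o1 if not any(compatible(a, b) for b in o2)}


def opt(o1, o2):
    return join(o1, o2) | diff(o1, o2)


def both_sides(p1, p2, p3, graph):
    e1, e2, e3 = (match_bgp(p, graph) for p in (p1, p2, p3))
    lhs = opt(e1, opt(e2, e3))              # P1 OPT (P2 OPT P3)
    rhs = opt(opt(e1, e2), join(e2, e3))    # (P1 OPT P2) OPT (P2 AND P3)
    return lhs, rhs


graph = {("a", "p", "b"), ("b", "q", "c"), ("c", "r", "d")}
p1 = [("?x", "p", "?y")]
p2 = [("?y", "q", "?z")]
p3 = [("?z", "r", "?y")]        # vars(P1) ∩ vars(P3) = {?y} ⊆ vars(P2)
p3_bad = [("?z", "r", "?x")]    # vars(P1) ∩ vars(P3) = {?x} ⊄ vars(P2)

lhs, rhs = both_sides(p1, p2, p3, graph)
print(lhs == rhs)               # True
bad_lhs, bad_rhs = both_sides(p1, p2, p3_bad, graph)
print(bad_lhs == bad_rhs)       # False
```

On this graph the second pair of patterns violates the variable condition and the two sides differ: the left-hand side yields {?x ↦ a, ?y ↦ b}, whereas the right-hand side additionally binds ?z to c.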
Applied from left to right, equivalence (13) preserves weak well-designedness (but not well-designedness). Each such application transforms a weakly well-designed OPT nesting of type (OptR) to a nesting of type (OptL), decreasing the depth of the CPT.
Corollary 1
Every wwd-pattern is equivalent to a wwd-pattern in depth-one normal form.
For instance, pattern (10) is equivalent to the pattern
represented by the CPT in Fig. 5b. Such “flat” patterns are attractive in practice because of their regular structure. However, “flattening” a pattern can incur an exponential blowup in size. Hence, in the rest of the paper we consider arbitrary wwd-patterns in OF-normal form rather than restricting our attention to patterns in depth-one normal form.
5 Evaluation of wwd-Patterns
In this section, we look at the query answering problem for wwd-patterns and their extensions with union and projection. We show that in all three cases, complexity remains the same as for wd-patterns. To obtain these results, we develop several new techniques.
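Formally, we look at the following decision problem for a given SPARQL fragment \(\mathcal{L}\).

\(\textsc{Eval}(\mathcal{L})\). Input: a graph G, a query \(Q\in\mathcal{L}\), and a mapping μ. Question: does μ ∈ [[Q]]_G hold?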
It is known that \(\textsc{Eval}(\mathcal{U})\) for general patterns \(\mathcal{U}\) is PSpace-complete [33], and the result easily propagates to queries with projection (i.e., \(\mathcal{S}\)) [27]. For wd-patterns, the evaluation problem is coNP-complete, and can be solved by exploiting the following idea of [27].
Suppose we are given a wd-pattern P in OPT-normal form (for simplicity, assume that P is FILTER-free), a graph G, and a mapping μ. First, we look for a subtree of \(\mathcal{T}(P)\) that includes the root of \(\mathcal{T}(P)\), contains precisely the variables in dom(μ), and “matches” G under μ (i.e., images of all its triples under μ are contained in G). This is doable in polynomial time. If such a subtree does not exist, then μ cannot be a solution. Otherwise, the subtree witnesses that μ is a part of a solution to P. Finally, to verify that μ is a complete solution, we need to check that the subtree is maximal, that is, cannot be extended to any more nodes in \(\mathcal{T}(P)\) with a match in G. There are linearly many such nodes to check, and each check can be performed in coNP. So, the overall algorithm runs in coNP.
Inspired by this idea, we next show that the low evaluation complexity of wd-patterns transfers to wwd-patterns by developing a coNP algorithm for \(\textsc{Eval}(\mathcal{P}_{\text{wwd}})\).
Let P be a wwd-pattern in OF-normal form. An r-subtree of \(\mathcal{T}(P)\) is a subtree containing the root of \(\mathcal{T}(P)\) and all its special children. Every r-subtree \(\mathcal{T}(P^{\prime})\) of \(\mathcal{T}(P)\) is also a CPT representing a wwd-pattern P′ that can be obtained from P by dropping the right arguments of some OPT-subpatterns (a transformation known from [33]). A child of an r-subtree \(\mathcal{T}(P^{\prime})\) of \(\mathcal{T}(P)\) is a node in \(\mathcal{T}(P)\) that is not contained in \(\mathcal{T}(P^{\prime})\) but whose parent is.
Definition 8
A mapping μ is a potential partial solution (or pp-solution for short) to a wwd-pattern P over a graph G if there is an r-subtree \(\mathcal{T}(P^{\prime})\) of \(\mathcal{T}(P)\) such that dom(μ) = vars(P′), μ(triples(P′)) ⊆ G, and μ ⊧ R for the constraint R of each ordinary node in \(\mathcal{T}(P^{\prime})\).
A pp-solution μ to P over G can be witnessed by several r-subtrees. However, the union of such r-subtrees is also a witness. Hence, there exists a unique maximal witnessing r-subtree, denoted \(\mathcal{T}(P_{\mu})\), with P_μ being the corresponding wwd-pattern.
Potential partial solutions generalise “partial solutions” as defined in [33] for wd-patterns. There, every “partial solution” is either a solution or can be extended to one. This is not the case for wwd-patterns. While every solution is clearly a pp-solution, not every pp-solution can be extended to a real one. Real solutions may not just extend pp-solutions by assigning previously undefined variables but can also override variable bindings established in some node v of \(\mathcal{T}(P_{\mu})\) by extending to a child of \(\mathcal{T}(P_{\mu})\) that precedes v according to the order ≺.
An additional complication is the presence of non-well-designed top-level filters. Note that pp-solutions are only required to satisfy the constraints of ordinary nodes in the corresponding CPT, thus ignoring top-level filters. Indeed, requiring pp-solutions to satisfy constraints of top-level filters would be too strong since real solutions do not generally satisfy this property, as demonstrated by the following example.
Example 3
Consider the graph G = {(1, a, 1), (3, a, 3)} and the wwd-pattern
The mapping μ = {?x ↦ 1, ?y ↦ 3} is a solution to P over G, but μ ⊭ ¬bound(?y).
We now present a characterisation of solutions for wwd-patterns in terms of pp-solutions that (a) takes into account that not every pp-solution can be extended to a real solution and (b) ensures correct treatment of non-well-designed top-level filters. For this we need some more notation. Given a wwd-pattern P, a node v in \(\mathcal{T}(P)\), a graph G, and a pp-solution μ to P over G, let μ_v be the projection μ_X of μ to the set X of all variables appearing in nodes u of \(\mathcal{T}(P_{\mu})\) such that u ≺ v. A mapping μ_1 is subsumed by a mapping μ_2 (written \(\mu_{1}\sqsubseteq\mu_{2}\)) if μ_1 ∼ μ_2 and dom(μ_1) ⊆ dom(μ_2) (this notion is from [5, 33]).
Intuitively, a pp-solution μ needs to satisfy two conditions to be a real solution to a wwd-pattern P. First, μ_v (as opposed to μ for wd-patterns) must be non-extendable to v for any child v of \(\mathcal{T}(P_{\mu})\). Indeed, if such an extension exists, then it is either possible to provide bindings for some variables that are undefined in μ, or some variables from dom(μ) can be assigned different values of higher “priority” than the corresponding values in μ. Second, every top-level filter R labelling a node s needs to be satisfied by μ_s, which is precisely the part of μ bound by the subpattern of P that is paired with R in the FILTER-pattern. The following lemma formalises this intuition.
Lemma 3
A mapping μ is a solution to a wwd-pattern P over a graph G if and only if

1. μ is a pp-solution to P over G;

2. for every child v of \(\mathcal{T}(P_{\mu})\) labelled with (B, R) there is no μ′ such that \(\mu_{v}\sqsubseteq\mu^{\prime}\), μ′ ⊧ R, and μ′(B) ⊆ G;

3. μ_s ⊧ R for every special node s in \(\mathcal{T}(P)\) labelled with R.
Proof
In this proof we write \(\mathcal{T}_{v}\) for the complete subtree of a CPT \(\mathcal{T}\) rooted at a node v (i.e., the subtree over all the descendants of v including v itself) and \(\mathcal{T}_{\prec v}\) for the subtree of \(\mathcal{T}\) consisting of all nodes u such that u ≺ v.
For the forward direction, suppose μ is a solution to P over G. Clearly, μ is a pp-solution to P over G, so it suffices to show that conditions 2 and 3 hold.
For condition 2, assume for contradiction that v is a child of \(\mathcal{T}(P_{\mu})\) labelled with (B, R) and μ′ a mapping such that \(\mu_{v}\sqsubseteq\mu^{\prime}\), μ′ ⊧ R, and μ′(B) ⊆ G. Moreover, without loss of generality, let dom(μ′) = dom(μ) ∪ vars(B). Let u be the parent of v in \(\mathcal{T}(P)\), and let \(\mathcal{T}\) be the largest subtree of \(\mathcal{T}(P)\) that is rooted at u and has v as the last child of u. Then . Moreover, since u is contained in \(\mathcal{T}(P_{\mu})\), there is a mapping \(\mu_{1}\sqsubseteq\mu\) such that \(\mu_{1}\in[\![\mathcal{T}]\!]_{G}\). Since v is not contained in \(\mathcal{T}(P_{\mu})\), we have \(\mu_{1}\sqsubseteq\mu_{v}\) and, since \(\mathcal{T}(P_{\mu})\) is the largest r-subtree witnessing μ, μ_1 is not compatible with any mapping in \([\![\mathcal{T}_{v}]\!]_{G}\). On the other hand, μ′ satisfies the label of v, and thus, since \(\mathcal{T}_{v}\) contains no top-level filters, μ′_{vars(v)} can be extended to a mapping \(\mu^{\prime\prime}\in[\![\mathcal{T}_{v}]\!]_{G}\). Moreover, since P is weakly well-designed, \(\mathsf{vars}(\mathcal{T}_{v})\cap\mathsf{dom}(\mu_{v})\subseteq\mathsf{vars}(v)\), and hence dom(μ″) ∩ dom(μ_1) ⊆ dom(μ′). Thus, since μ_v is compatible with μ′, μ_1 is compatible with μ″, in contradiction to the above observation that μ_1 is not compatible with any mapping in \([\![\mathcal{T}_{v}]\!]_{G}\).
For condition 3, let s be a special node in \(\mathcal{T}(P)\) labelled with R. Since μ is a solution to P, there is some μ_1 ⊆ μ such that \(\mu_{1}\in[\![\mathcal{T}(P)_{\prec s}]\!]_{G}\) and μ_1 ⊧ R. Hence, it suffices to show that μ_1 = μ_s. Clearly, μ_1 ⊆ μ_s (as μ_s is the largest mapping compatible with μ that can occur in \([\![\mathcal{T}(P)_{\prec s}]\!]_{G}\)), so assume for contradiction that there is a variable ?x ∈ dom(μ_s) ∖ dom(μ_1). Then there is a node in \(\mathcal{T}(P_{\mu_{s}})\cap\mathcal{T}(P)_{\prec s}\) that does not occur in \(\mathcal{T}(P_{\mu_{1}})\cap\mathcal{T}(P)_{\prec s}\). This yields a contradiction with \(\mu_{1}\in[\![\mathcal{T}(P)_{\prec s}]\!]_{G}\) analogously to the case of condition 2.
For the backward direction, suppose that μ satisfies conditions 1–3. We show that μ ∈ [[P]]_G by induction on the depth of \(\mathcal{T}(P_{\mu})\), that is, the maximal number of edges between the root and a leaf.
For the basis of the induction, let the depth of \(\mathcal{T}(P_{\mu})\) be 0, that is, the root v of \(\mathcal{T}(P)\) be the only node of \(\mathcal{T}(P_{\mu})\). We prove the claim by induction on the number n of children of v in \(\mathcal{T}(P)\). If n = 0, then P = B FILTER R for some basic pattern B and filter constraint R, and the claim follows since μ is a pp-solution to P over G. For the inductive step, suppose the claim holds for all wwd-patterns P′ and mappings μ′ satisfying 1–3 provided \(\mathcal{T}(P^{\prime}_{\mu^{\prime}})\) has depth 0 and n − 1 children in \(\mathcal{T}(P^{\prime})\). Let P and μ be such that \(\mathcal{T}(P_{\mu})\) has depth 0 and n children in \(\mathcal{T}(P)\). Let u be the last child of (the root v of) \(\mathcal{T}(P_{\mu})\). Then μ_u is a pp-solution to \(\mathcal{T}(P)_{\prec u}\) that satisfies conditions 2 and 3 since (μ_u)_w = μ_w for every w ≺ u. Hence, by the inductive hypothesis for the pattern corresponding to \(\mathcal{T}(P)_{\prec u}\) and the mapping μ_u, we have \(\mu_{u}\in[\![\mathcal{T}(P)_{\prec u}]\!]_{G}\). We distinguish two cases.

Let u be a special node labelled with R. Then it suffices to show that μ_u ⊧ R, which is immediate since μ_u satisfies condition 3.

Let u be an ordinary node labelled with (B, R). We know that u is not in \(\mathcal{T}(P_{\mu})\). Since v is in \(\mathcal{T}(P_{\mu})\), by condition 2 there is no mapping μ′ such that (a) \(\mu_{u}\sqsubseteq\mu^{\prime}\), (b) μ′ ⊧ R, and (c) μ′(B) ⊆ G. Since R is safe, it follows that every mapping satisfying (b) and (c) is incompatible with μ_u. Consequently, every mapping in \([\![\mathcal{T}(P)_{u}]\!]_{G}\) is incompatible with μ_u, and hence \(\mu=\mu_{u}\in[\![\mathcal{T}(P)_{\prec u}]\!]_{G}\setminus[\![\mathcal{T}(P)_{u}]\!]_{G}\), as required.
For the outer inductive step, let the claim hold for all P′ and μ′ with \(\mathcal{T}(P^{\prime}_{\mu^{\prime}})\) of depth d − 1, for some d > 0. Once again, we show the claim for P and μ with \(\mathcal{T}(P_{\mu})\) of depth d by induction on the number n of children of the root v of \(\mathcal{T}(P)\). The basis is vacuous as v cannot have 0 children while \(\mathcal{T}(P_{\mu})\) has positive depth. The inductive step is the same as for depth 0, except that we have an additional case for the last child u of the root v.

Let u be an ordinary node labelled with (B′, R′) that is contained in \(\mathcal{T}(P_{\mu})\). Then μ = μ_u ∪ μ_2 where μ_2 is the projection of μ to the set of variables occurring in the subtree \(\mathcal{T}\) of \(\mathcal{T}(P_{\mu})\) rooted at u (i.e., \(\mathcal{T}=\mathcal{T}(P_{\mu})_{u}\)). Since u is contained in \(\mathcal{T}(P_{\mu})\) and has no special children, μ_2 is a pp-solution to (the subpattern represented by) \(\mathcal{T}(P)_{u}\). Moreover, μ_2 satisfies condition 3 with respect to \(\mathcal{T}(P)_{u}\) since \(\mathcal{T}(P)_{u}\) contains no special nodes. We next show that μ_2 satisfies condition 2 with respect to \(\mathcal{T}(P)_{u}\). Let w be a child of \(\mathcal{T}\) (in \(\mathcal{T}(P)_{u}\)) labelled with (B, R), and assume for contradiction that there is some μ′ such that \({\mu_{2}}_{w}\sqsubseteq\mu^{\prime}\), μ′ ⊧ R, and μ′(B) ⊆ G. Without loss of generality, dom(μ′) = dom(μ_2) ∪ vars(B). Thus, since P is weakly well-designed, vars(B) ∩ dom(μ_u) ⊆ vars(B′) ⊆ dom(μ_2). Hence, μ′ is compatible with μ_u, and \(\mu_{w}\sqsubseteq\mu_{u}\cup\mu^{\prime}\). Moreover, since μ′ and μ_u ∪ μ′ coincide on vars(B) and R is safe, we have that μ_u ∪ μ′ ⊧ R and (μ_u ∪ μ′)(B) ⊆ G, contradicting the assumption for μ. Since μ_2 satisfies conditions 1–3 with respect to \(\mathcal{T}(P)_{u}\), by the outer inductive hypothesis we obtain that \(\mu_{2}\in[\![\mathcal{T}(P)_{u}]\!]_{G}\), and hence \(\mu\in[\![\mathcal{T}(P)_{\prec u}]\!]_{G}\Join[\![\mathcal{T}(P)_{u}]\!]_{G}\) (as \(\mu_{u}\in[\![\mathcal{T}(P)_{\prec u}]\!]_{G}\) holds by the inner inductive hypothesis). The claim follows.
□
Checking whether a mapping μ satisfies this characterisation is feasible in coNP, and the matching lower bound follows from the coNP-hardness of evaluation of wd-patterns [33].
Theorem 1
Problem \(\textsc{Eval}(\mathcal{P}_{\text{wwd}})\) is coNP-complete.
Proof
The lower bound of this statement is known from [33], and the upper bound can be obtained from Lemma 3 as follows.
First, testing whether μ is a pp-solution takes polynomial time, as does computing the maximal witnessing tree \(\mathcal{T}(P_{\mu})\). We proceed from the root of the tree down along the branches, stopping at every child v for which μ(triples(v)) is not contained in G or μ violates the constraint of v, and then check that the variables in the resulting tree are exactly dom(μ). So, the crucial part is to check that \(\mathcal{T}(P_{\mu})\) is not extendable to any of its children. But there are only linearly many children, and each check can be done in coNP. Finally, the checks for top-level filters are again polynomial. □
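The procedure from this proof can be sketched in Python under simplifying assumptions that are not made in the text: the CPT contains only ordinary nodes (so condition 3 of Lemma 3 is vacuous), constraints are given as Python predicates over mappings, and the non-extendability check of condition 2 is done by brute-force enumeration of matches, which may take exponential time and is therefore not itself a coNP procedure. The names `Node`, `match_bgp` and `is_solution` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    """Ordinary CPT node labelled (B, R); special nodes are left out of this sketch."""
    triples: list
    constraint: Callable = lambda m: True   # assumed safe: mentions only variables of B
    children: list = field(default_factory=list)

    def vars(self):
        return {t for triple in self.triples for t in triple if t.startswith("?")}


def compatible(m1, m2):
    return all(m2.get(x, v) == v for x, v in m1.items())


def match_bgp(triples, graph):
    """All mappings m with dom(m) = vars(B) and m(B) ⊆ G."""
    sols = [{}]
    for pattern in triples:
        step = []
        for m in sols:
            for triple in graph:
                ext = dict(m)
                if all(ext.setdefault(p, t) == t if p.startswith("?") else p == t
                       for p, t in zip(pattern, triple)):
                    step.append(ext)
        sols = step
    return sols


def is_solution(root, graph, mu):
    """Conditions 1 and 2 of Lemma 3 for a CPT without special nodes."""
    def matched(node):
        return (node.vars() <= set(mu)
                and all(tuple(mu.get(t, t) for t in triple) in graph
                        for triple in node.triples)
                and node.constraint(mu))

    seen = set()   # variables of the nodes of T(P_mu) visited so far, in the order ≺

    def visit(node):
        if matched(node):                       # node belongs to T(P_mu)
            seen.update(node.vars())
            return all(visit(child) for child in node.children)
        # node is a child of T(P_mu): mu restricted to `seen` must not extend to it
        mu_v = {x: mu[x] for x in seen}
        return not any(compatible(mu_v, m) and node.constraint(m)
                       for m in match_bgp(node.triples, graph))

    return matched(root) and visit(root) and seen == set(mu)


root = Node([("?x", "p", "?y")], children=[
    Node([("?y", "q", "?z")]),
    Node([("?x", "r", "?z")]),
])
graph = {("1", "p", "2"), ("2", "q", "3"), ("1", "r", "4")}
print(is_solution(root, graph, {"?x": "1", "?y": "2", "?z": "3"}))   # True
print(is_solution(root, graph, {"?x": "1", "?y": "2", "?z": "4"}))   # False
```

In the second call the mapping is a pp-solution witnessed by the root and its second child, but its restriction to the root variables extends to the first child, so it is rejected. This is the situation in which the check must use μ_v rather than μ.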
Pérez et al. [33] extended wd-patterns to UNION by considering unions of wd-patterns, that is, patterns of the form P_1 UNION … UNION P_n with all \(P_{i}\in\mathcal{P}_{\text{wd}}\). We denote the resulting fragment by \(\mathcal{U}_{\text{wd}}\). This syntactic restriction on the use of UNION in \(\mathcal{U}_{\text{wd}}\) is motivated by the fact that any pattern in \(\mathcal{U}\) can be equivalently expressed as a union of UNION-free patterns [33]. We denote the fragment of all queries over patterns in \(\mathcal{U}_{\text{wd}}\) by \(\mathcal{S}_{\text{wd}}\). Similarly, we write \(\mathcal{U}_{\text{wwd}}\) for unions of wwd-patterns and \(\mathcal{S}_{\text{wwd}}\) for queries over unions of wwd-patterns.
Analogously to the well-designed case, Theorem 1 extends to fragments \(\mathcal{U}_{\text{wwd}}\) and \(\mathcal{S}_{\text{wwd}}\).
Corollary 2
Problem \(\textsc{Eval}(\mathcal{U}_{\text{wwd}})\) is coNP-complete, and \(\textsc{Eval}(\mathcal{S}_{\text{wwd}})\) is \(\Sigma_{2}^{p}\)-complete.
The coNP algorithm for \(\mathcal{U}_{\text{wwd}}\) is obtained simply by applying the algorithm for \(\mathcal{P}_{\text{wwd}}\) to each pattern in the union. Hardness for \(\mathcal{S}_{\text{wwd}}\) follows from the hardness of the well-designed case [27], while for membership we just guess the values of the existential variables and then call a coNP oracle for \(\mathcal{U}_{\text{wwd}}\) on the resulting mapping and the normalised body of the query.
Hence, the complexity of evaluation for wwd-patterns is the same as for wd-patterns. We next show that wwd-patterns are, in a certain sense, a maximal extension of wd-patterns that preserves coNP evaluation complexity (under the usual complexity-theoretic assumptions).
The definition of weakly well-designed patterns suggests two intuitive ways in which it could be relaxed. Given an occurrence i of an OPT-subpattern P_1 OPT P_2, one could allow variables in vars(P_2) ∖ vars(P_1) to occur in

some subpatterns whose occurrences are not dominated by i, or

constraints of some non-top-level occurrences of FILTER-patterns.
We next show that either relaxation immediately makes the evaluation problem \(\Pi_{2}^{p}\)-hard.
For the first relaxation, the arguably simplest special case would be to allow for some non-well-designed OPT-nesting of type (OptR). Consider the fragment \(\mathcal{P}_{\text{optr}}\) of patterns of the form B_1 OPT (B_2 OPT B_3), where B_1, B_2 and B_3 are basic patterns. Intuitively, \(\mathcal{P}_{\text{optr}}\) allows for the simplest form of non-well-designed nesting of type (OptR).
Theorem 2
Problem \(\textsc{Eval}(\mathcal{P}_{\text{optr}})\) is \(\Pi_{2}^{p}\)-complete.
Proof
This theorem is a corollary of [38, Theorem 4] for their class \(\mathcal{E}_{\leq 3}\), but without UNION. □
Now suppose we allow for some non-well-designed non-top-level filters, as suggested by the second relaxation. As we will see next, even a very restricted fragment of patterns allowing for such filters is \(\Pi_{2}^{p}\)-complete. This implies that the requirement that special nodes be children of the root, while it may look somewhat ad hoc, cannot be substantially relaxed. Consider the fragment \(\mathcal{P}_{\text{filter2}}\) of patterns of the form

\(B_{1}~\mathsf{OPT}~((B_{2}~\mathsf{OPT}~B_{3})~\mathsf{FILTER}~R),\)

where B_1, B_2 and B_3 are basic patterns such that vars(B_3) ∩ vars(B_1) ⊆ vars(B_2), and R is a filter constraint. Intuitively, \(\mathcal{P}_{\text{filter2}}\) allows for the simplest form of “second-level” filters.
Theorem 3
Problem \(\textsc{Eval}(\mathcal{P}_{\text{filter2}})\) is \(\Pi_{2}^{p}\)-complete.
Proof
This problem allows for a reduction from a restriction of \(\textsc{Eval}(\mathcal{P}_{\text{optr}})\). Indeed, from the proof of [38, Theorem 4] it follows that it is already \(\Pi_{2}^{p}\)-hard to check whether μ ∈ [[P]]_G for P of the form B_1 OPT (B_2 OPT B_3) with dom(μ) = vars(B_1) and vars(B_2) ∖ vars(B_1) ≠ ∅. Let P and μ be such a pattern and such a mapping, respectively. Consider the pattern

\(P^{\prime}=B_{1}~\mathsf{OPT}~(((B_{2}\cup B_{1})~\mathsf{OPT}~B^{\prime}_{3})~\mathsf{FILTER}~R)\)

with \(B^{\prime}_{3}\) a basic pattern obtained from B_3 by replacing all the variables ?x_1, …, ?x_n in (vars(B_3) ∩ vars(B_1)) ∖ vars(B_2) by their fresh copies \(?x_{1}^{\prime},\ldots,?x_{n}^{\prime}\) (if no such variables exist, that is, if the original pattern is well-designed, we just set R to ⊤). Clearly, \(P^{\prime}\in\mathcal{P}_{\text{filter2}}\), so it suffices to show, for every G and μ with dom(μ) = vars(B_1), that μ ∈ [[P]]_G if and only if μ ∈ [[P′]]_G.
For the forward direction, suppose dom(μ) = vars(B_1) and μ ∈ [[P]]_G. Since vars(B_2) ∖ vars(B_1) ≠ ∅, we must have μ ∈ [[B_1]]_G ∖ [[B_2 OPT B_3]]_G. Thus, μ ∈ [[B_1]]_G and for every μ′ ∈ [[B_2 OPT B_3]]_G we have μ ≁ μ′. Since μ ∈ [[B_1]]_G, to show μ ∈ [[P′]]_G it suffices to verify that μ is not compatible with any \(\mu^{\prime}\in[\![((B_{2}\cup B_{1})~\mathsf{OPT}~B^{\prime}_{3})~\mathsf{FILTER}~R]\!]_{G}\), for which we distinguish the following two cases.

If \(\mu^{\prime}\in[\![B_{2}\cup B_{1}]\!]_{G}\Join[\![B^{\prime}_{3}]\!]_{G}\), then \(\mu^{\prime}\models(?x^{\prime}_{1}={}?x_{1})\wedge\dots\wedge(?x^{\prime}_{n}={}?x_{n})\). Hence, \(\mu^{\prime}_{\mathsf{vars}(B_{2})\cup\mathsf{vars}(B_{3})}\in[\![B_{2}~\mathsf{OPT}~B_{3}]\!]_{G}\). Consequently, by assumption, \(\mu\not\sim\mu^{\prime}_{\mathsf{vars}(B_{2})\cup\mathsf{vars}(B_{3})}\), and thus μ ≁ μ′.

If \(\mu^{\prime}\in[\![B_{2}\cup B_{1}]\!]_{G}\setminus[\![B^{\prime}_{3}]\!]_{G}\), then \(\mu^{\prime}_{\mathsf{vars}(B_{2})}\in[\![B_{2}]\!]_{G}\setminus[\![B^{\prime}_{3}]\!]_{G}=[\![B_{2}]\!]_{G}\setminus[\![B_{3}]\!]_{G}\subseteq[\![B_{2}~\mathsf{OPT}~B_{3}]\!]_{G}\). Therefore, by assumption, \(\mu\not\sim\mu^{\prime}_{\mathsf{vars}(B_{2})}\), and hence μ ≁ μ′.
For the backward direction, suppose dom(μ) = vars(B_1) and \(\mu\in[\![P^{\prime}]\!]_{G}\). Again, since vars(B_2) ∖ vars(B_1) ≠ ∅, we have \(\mu\in[\![B_{1}]\!]_{G}\setminus[\![((B_{2}\cup B_{1})~\mathsf{OPT}~B^{\prime}_{3})~\mathsf{FILTER}~R]\!]_{G}\). Thus, μ ∈ [[B_1]]_G and μ ≁ μ′ for every \(\mu^{\prime}\in[\![((B_{2}\cup B_{1})~\mathsf{OPT}~B^{\prime}_{3})~\mathsf{FILTER}~R]\!]_{G}\). Since μ ∈ [[B_1]]_G, it follows that \([\![((B_{2}\cup B_{1})~\mathsf{OPT}~B^{\prime}_{3})~\mathsf{FILTER}~R]\!]_{G}=\emptyset\). To show μ ∈ [[P]]_G it suffices to verify that μ is not compatible with any \(\mu^{\prime}\in[\![B_{2}~\mathsf{OPT}~B_{3}]\!]_{G}\). Assume for the sake of contradiction that this is not the case and there is a compatible μ′. We distinguish the following two cases.

Suppose \(\mu^{\prime}\in[\![B_{2}]\!]_{G}\Join[\![B_{3}]\!]_{G}\). Then there is some μ″ such that \(\mu\cup\mu^{\prime}\cup\mu^{\prime\prime}\in[\![B_{1}]\!]_{G}\Join[\![B_{2}]\!]_{G}\Join[\![B^{\prime}_{3}]\!]_{G}\) and \(\mu\cup\mu^{\prime}\cup\mu^{\prime\prime}\models(?x^{\prime}_{1}={}?x_{1})\wedge\dots\wedge(?x^{\prime}_{n}={}?x_{n})\). Thus, \(\mu\cup\mu^{\prime}\cup\mu^{\prime\prime}\in[\![((B_{2}\cup B_{1})~\mathsf{OPT}~B^{\prime}_{3})~\mathsf{FILTER}~R]\!]_{G}\), which is a contradiction.

Suppose \(\mu^{\prime}\in[\![B_{2}]\!]_{G}\setminus[\![B_{3}]\!]_{G}=[\![B_{2}]\!]_{G}\setminus[\![B^{\prime}_{3}]\!]_{G}\). Then \(\mu\cup\mu^{\prime}\in[\![B_{1}\cup B_{2}]\!]_{G}\setminus[\![B^{\prime}_{3}]\!]_{G}\) and \(\mu\cup\mu^{\prime}\models\neg\mathsf{bound}(?x^{\prime}_{1})\), and hence \(\mu\cup\mu^{\prime}\in[\![((B_{2}\cup B_{1})~\mathsf{OPT}~B^{\prime}_{3})~\mathsf{FILTER}~R]\!]_{G}\), which is again a contradiction.
□
Theorems 2 and 3 suggest that \(\mathcal {P}_{\text {wwd}}\) is a maximal fragment of \(\mathcal {P}\) that does not impose structural restrictions on basic patterns or filter constraints and has a coNP evaluation algorithm (assuming \(\text {\textsc {coNP}} \neq {{\Pi }_2^p}\)). Hence, going beyond wwd-patterns while preserving good computational properties requires more refined restrictions, possibly in the spirit of [27, Section 4].
6 Expressivity of wwd-Patterns and Their Extensions
In this section, we analyse the expressive power of our fragments.
Definition 9
A language \(\mathcal {L}_{1}\) is strictly less expressive than a language \(\mathcal {L}_{2}\) (written \(\mathcal {L}_{1}<\mathcal {L}_{2}\)) if for every query Q _{1} in \(\mathcal {L}_{1}\) there is a query Q _{2} in \(\mathcal {L}_{2}\) such that Q _{1} ≡ Q _{2}, and there is a query Q _{2} in \(\mathcal {L}_{2}\) such that Q _{1} ≢ Q _{2} for every query Q _{1} in \(\mathcal {L}_{1}\).
We begin with UNION-free patterns, establishing that \(\mathcal {P}_{\text {wd}}<\mathcal {P}_{\text {wwd}}<\mathcal {P}\), and then proceed to unions, showing that \(\mathcal {U}_{\text {wd}}<\mathcal {U}_{\text {wwd}}<\mathcal {U}\), and queries, showing that \(\mathcal {S}_{\text {wd}}<\mathcal {S}_{\text {wwd}}<\mathcal {S}\).
Following [5, 33], a set of mappings Ω_{1} is subsumed by a set of mappings Ω_{2} (written \({\Omega }_{1} \sqsubseteq {\Omega }_{2}\)) if for every μ _{1} ∈ Ω_{1} there exists a mapping μ _{2} ∈ Ω_{2} such that \(\mu _{1} \sqsubseteq \mu _{2}\). A query Q is weakly monotone if \([{\kern 2.3pt}[ Q ]{\kern 2.3pt}]_{G_{1}} \sqsubseteq [{\kern 2.3pt}[ Q ]{\kern 2.3pt}]_{G_{2}}\) for any two graphs G _{1} and G _{2} with G _{1} ⊆ G _{2}, and a fragment \(\mathcal {L}\) is weakly monotone if it contains only weakly monotone queries. Arenas and Pérez [5] showed that, unlike \(\mathcal {P}\), the fragment \(\mathcal {P}_{\text {wd}}\) is weakly monotone, and hence \(\mathcal {P}_{\text {wd}}<\mathcal {P}\).
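The subsumption relation on mappings and on sets of mappings can be made concrete with a small sketch. The representation of a mapping as a Python dictionary and the function names `subsumed` and `set_subsumed` are our own assumptions for illustration; they do not come from [5, 33].

```python
def subsumed(mu1, mu2):
    """mu1 is subsumed by mu2: mu2 binds every variable of mu1 to the same value."""
    return all(var in mu2 and mu2[var] == val for var, val in mu1.items())


def set_subsumed(omega1, omega2):
    """Every mapping of omega1 is subsumed by some mapping of omega2."""
    return all(any(subsumed(m1, m2) for m2 in omega2) for m1 in omega1)


# Illustrative sets of mappings (hypothetical data).
omega_small = [{"?x": 1}]
omega_large = [{"?x": 1, "?y": 2}]

print(set_subsumed(omega_small, omega_large))
print(set_subsumed(omega_large, omega_small))
```

Under this reading, a query is weakly monotone exactly when `set_subsumed` holds between its answers over G _{1} and over G _{2} for every pair of graphs G _{1} ⊆ G _{2}; a single failing pair suffices to refute weak monotonicity.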
Example 4
(Pérez et al. [33]) Consider the non-well-designed pattern
as well as graphs G _{1} = {(1, a, 1), (2, a, 2)} and G _{2} = G _{1} ∪ {(3, a, 3)}. Then μ _{1} = {?x ↦ 1, ?y ↦ 2} is the only mapping in \([{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{G_{1}}\) while μ _{2} = {?x↦1} is the only mapping in \([{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{G_{2}}\). Hence \([{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{G_{1}}\not \sqsubseteq [{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{G_{2}}\), meaning that P is not weakly monotone.
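The evaluation in this example can be replayed with a minimal evaluator for AND and OPT. Everything in the sketch is an assumption made for illustration: the tagged-tuple encoding of patterns, the helper names, and in particular the concrete pattern `P`, which we chose only because it reproduces the answers stated above over G _{1} and G _{2}.

```python
def match_triple(pattern, graph):
    """All mappings that send one triple pattern onto a triple of the graph."""
    results = []
    for triple in graph:
        mu, ok = {}, True
        for term, value in zip(pattern, triple):
            if isinstance(term, str) and term.startswith("?"):
                if mu.get(term, value) != value:
                    ok = False
                    break
                mu[term] = value
            elif term != value:
                ok = False
                break
        if ok:
            results.append(mu)
    return results


def compatible(m1, m2):
    """Two mappings are compatible if they agree on all shared variables."""
    return all(m2[v] == val for v, val in m1.items() if v in m2)


def evaluate(p, graph):
    """Evaluate a pattern built from ("T", triple), ("AND", p1, p2), ("OPT", p1, p2)."""
    if p[0] == "T":
        return match_triple(p[1], graph)
    left, right = evaluate(p[1], graph), evaluate(p[2], graph)
    joined = [{**m1, **m2} for m1 in left for m2 in right if compatible(m1, m2)]
    if p[0] == "AND":
        return joined
    if p[0] == "OPT":
        alone = [m1 for m1 in left if not any(compatible(m1, m2) for m2 in right)]
        return joined + alone
    raise ValueError(p[0])


# Assumed pattern: (?x, a, 1) OPT ((?y, a, 2) OPT (?x, a, 3)).
P = ("OPT", ("T", ("?x", "a", 1)),
     ("OPT", ("T", ("?y", "a", 2)), ("T", ("?x", "a", 3))))
G1 = {(1, "a", 1), (2, "a", 2)}
G2 = G1 | {(3, "a", 3)}

print(evaluate(P, G1))
print(evaluate(P, G2))
```

Over G _{1} the sketch yields the single mapping binding both ?x and ?y, and over G _{2} the single mapping binding only ?x, in line with the example.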
Analogously, we show that \(\mathcal {P}_{\text {wd}}<\mathcal {P}_{\text {wwd}}\) by observing that \(\mathcal {P}_{\text {wwd}}\) is not weakly monotone.
Proposition 7
Fragment \(\mathcal {P}_{\text {wwd}}\) is not weakly monotone.
Proof
Consider a wwd-pattern
as well as graphs G _{1} = {(1, a,1),(3, a,3)} and G _{2} = G _{1} ∪{(2, a,2)}. Then \([{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{{G_{1}}}=\{\{?x\mapsto 1,?y\mapsto 3\}\}\not \sqsubseteq \{\{?x\mapsto 1,?y\mapsto 2\}\}=[{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{{G_{2}}}\). □
An alternative proof of \(\mathcal {P}_{\text {wd}}<\mathcal {P}_{\text {wwd}}\) can be obtained by adapting Theorem 3.5 in [6], which exhibits a weakly well-designed, weakly monotone pattern that is not equivalent to any well-designed pattern.
To distinguish \(\mathcal {P}_{\text {wwd}}\) from \(\mathcal {P}\) we need a different property.
Definition 10
A query Q is nonreducing if for any two graphs G _{1}, G _{2} such that G _{1} ⊆ G _{2} and any mapping \(\mu _{1} \in [{\kern 2.3pt}[ Q ]{\kern 2.3pt}]_{G_{1}}\) there is no \(\mu _{2} \in [{\kern 2.3pt}[ Q ]{\kern 2.3pt}]_{G_{2}}\) such that \(\mu _{2} \sqsubset \mu _{1}\) (i.e., \(\mu _{2} \sqsubseteq \mu _{1}\) and μ _{2} ≠ μ _{1}). A fragment is nonreducing if it contains only nonreducing queries.
Intuitively, for a nonreducing query, extending a graph cannot result in a previously bound answer variable becoming unbound. All weakly monotone queries are nonreducing but not vice versa. Moreover, all wwd-patterns are nonreducing.
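The difference between the two properties can be checked directly on the answer sets reported in Example 4 and in the proof of Proposition 7. The sketch below is only an illustration: the dictionary encoding of mappings and the names `strictly_subsumed` and `reducing_pairs` are our own assumptions, and a check on one pair of graphs can refute nonreducibility but never establish it.

```python
def strictly_subsumed(mu2, mu1):
    """mu2 is strictly subsumed by mu1: mu1 extends mu2 and differs from it."""
    return mu2 != mu1 and all(v in mu1 and mu1[v] == val for v, val in mu2.items())


def reducing_pairs(answers_g1, answers_g2):
    """Pairs (mu1, mu2) violating nonreducibility for one fixed pair G1, G2."""
    return [(m1, m2) for m1 in answers_g1 for m2 in answers_g2
            if strictly_subsumed(m2, m1)]


# Answer sets as stated in Example 4 (pattern outside the wwd fragment).
example4_g1, example4_g2 = [{"?x": 1, "?y": 2}], [{"?x": 1}]
# Answer sets as stated in the proof of Proposition 7 (a wwd-pattern).
prop7_g1, prop7_g2 = [{"?x": 1, "?y": 3}], [{"?x": 1, "?y": 2}]

print(reducing_pairs(example4_g1, example4_g2))
print(reducing_pairs(prop7_g1, prop7_g2))
```

The first call returns one violating pair, while the second returns none: in Proposition 7 the binding of ?y changes but does not disappear.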
Proposition 8
Fragment \(\mathcal {P}_{\text {wwd}}\) is nonreducing.
Proof
Let \(P\in \mathcal {P}_{\text {wwd}}\) and let G _{1}, G _{2} be two graphs such that G _{1} ⊆ G _{2}. We show that \(\mu _{2}\not \sqsubset \mu _{1}\) for any \(\mu _{1}\in [{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{{G_{1}}}\) and \(\mu _{2}\in [{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{{G_{2}}}\) by induction on the structure of P, proving, in parallel, that if all filters in P are over basic patterns, then for every mapping \(\mu _{1}\in [{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{{G_{1}}}\) there is a mapping \(\mu _{2}\in [{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{{G_{2}}}\) such that μ _{1}|_{vars(v)} = μ _{2}|_{vars(v)} for v the root of \(\mathcal {T}(P)\).
For the base case, suppose P = B FILTER R for some basic pattern B and filter constraint R. Then, P is monotone in the sense of [5], that is, satisfies \([{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{{G_{1}}}\subseteq [{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{{G_{2}}}\). Moreover, P contains no OPT, and hence every two distinct mappings in \([{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{G_{2}}\) have the same domain and are thus incompatible. These facts imply both claims.
For the inductive step, suppose first that P = P _{1} OPT P _{2} and both claims hold for P _{1} and P _{2}. Let \(\mu _{1}\in [{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{G_{1}}\). We first prove that \(\mu _{2}\not \sqsubset \mu _{1}\) for any \(\mu _{2}\in [{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{{G_{2}}}\). We distinguish two cases.

Let \(\mu _{1}={\mu _{1}^{1}}\cup {\mu _{1}^{2}}\) where \({\mu _{1}^{i}}\in [{\kern 2.3pt}[ P_{i} ]{\kern 2.3pt}]_{G_{1}}\). Assume for contradiction that \(\mu _{2}\sqsubset \mu _{1}\) for some \(\mu _{2}\in [{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{{G_{2}}}\). We begin by showing that μ _{2} must be of the form \({\mu _{2}^{1}}\cup {\mu _{2}^{2}}\) where \({\mu _{2}^{i}}\in [{\kern 2.3pt}[ P_{i} ]{\kern 2.3pt}]_{G_{2}}\), for which it suffices to show that \(\mu _{2}|_{{\mathsf {vars}(P_{1})}}\) is compatible with some mapping in \([{\kern 2.3pt}[ P_{2} ]{\kern 2.3pt}]_{G_{2}}\). On the one hand, since \(\mu _{2}\sqsubset \mu _{1}\), \({\mu _{1}^{2}}\) is compatible with \(\mu _{2}|_{{\mathsf {vars}(P_{1})}}\). On the other hand, since all filters in P _{2} are over basic patterns, the inductive hypothesis tells us that \([{\kern 2.3pt}[ P_{2} ]{\kern 2.3pt}]_{G_{2}}\) contains a mapping μ ^{′} that coincides with \({\mu _{1}^{2}}\) on the set of variables X in the root of \(\mathcal {T}(P_{2})\); moreover, since P is weakly well-designed, dom(μ ^{′}) ∩ vars(P _{1}) ⊆ X, and hence μ ^{′} is compatible with \(\mu _{2}|_{{\mathsf {vars}(P_{1})}}\). Thus, \(\mu _{2}={\mu _{2}^{1}}\cup {\mu _{2}^{2}}\) where \({\mu _{2}^{i}}\in [{\kern 2.3pt}[ P_{i} ]{\kern 2.3pt}]_{G_{2}}\). Then, however, we must have that \({\mu _{2}^{1}}\sqsubset {\mu _{1}^{1}}\) or \({\mu _{2}^{2}}\sqsubset {\mu _{1}^{2}}\), contradicting the inductive hypothesis for P _{1} or P _{2}, respectively.

Let \(\mu _{1}\in [{\kern 2.3pt}[ P_{1} ]{\kern 2.3pt}]_{G_{1}}\setminus [{\kern 2.3pt}[ P_{2} ]{\kern 2.3pt}]_{G_{1}}\), and let μ _{2} be an arbitrary mapping in \([{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{{G_{2}}}\). Then μ _{2} extends some \(\mu ^{\prime }\in [{\kern 2.3pt}[ P_{1} ]{\kern 2.3pt}]_{G_{2}}\). By the inductive hypothesis for claim 2, we have that \(\mu ^{\prime }\not \sqsubset \mu _{1}\), and hence \(\mu _{2}\not \sqsubset \mu _{1}\).
Suppose now that all filters in P are over basic patterns. We need to prove that there is \(\mu _{2}\in [{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{{G_{2}}}\) such that μ _{1}|_{vars(v)} = μ _{2}|_{vars(v)}. We know that μ _{1} extends some \(\mu ^{\prime }_{1}\in [{\kern 2.3pt}[ P_{1} ]{\kern 2.3pt}]_{G_{1}}\). Thus, by the inductive hypothesis, there is some \(\mu ^{\prime }_{2}\in [{\kern 2.3pt}[ P_{1} ]{\kern 2.3pt}]_{G_{2}}\) that coincides with \(\mu ^{\prime }_{1}\) on the variables in the root of \(\mathcal {T}(P_{1})\). The claim follows since \(\mu ^{\prime }_{2}\) can be extended to a mapping μ _{2} for P that coincides with \(\mu ^{\prime }_{2}\) on the variables in the root of \(\mathcal {T}(P_{1})\), and, by construction, the root of \(\mathcal {T}(P_{1})\) and the root of \(\mathcal {T}(P)\) have the same label.
Consider now the inductive step for the case when P = P _{1} FILTER R. Since P _{1} is not a basic pattern, we only need to show that \(\mu _{2}\not \sqsubset \mu _{1}\) for any \(\mu _{1}\in [{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{{G_{1}}}\) and \(\mu _{2}\in [{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{{G_{2}}}\). This holds by the inductive hypothesis, because \(\mu _{1}\in [{\kern 2.3pt}[ P_{1} ]{\kern 2.3pt}]_{G_{1}}\) and \(\mu _{2}\in [{\kern 2.3pt}[ P_{1} ]{\kern 2.3pt}]_{G_{2}}\) for any such μ _{1} and μ _{2}. □
In contrast to Proposition 8, patterns in \(\mathcal {P}\) do not generally satisfy nonreducibility. For instance, consider again pattern P, graphs G _{1}, G _{2}, and mappings μ _{1}, μ _{2} from Example 4. Pattern P is not nonreducing since \(\mu _{1}\in [{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{G_{1}}\) and \(\mu _{2}\in [{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{G_{2}}\) but \(\mu _{2}\sqsubset \mu _{1}\). Therefore, we have the following theorem.
Theorem 4
It holds that \(\mathcal {P}_{\text {wd}}<\mathcal {P}_{\text {wwd}}<\mathcal {P}\) .
We next compare \(\mathcal {U}_{\text {wwd}}\) to \(\mathcal {U}_{\text {wd}}\) and \(\mathcal {U}\), as well as \(\mathcal {S}_{\text {wwd}}\) to \(\mathcal {S}_{\text {wd}}\) and \(\mathcal {S}\) (note that neither UNION nor projection via SELECT can be expressed by means of the other operators [40], so adding either construct makes each fragment strictly more expressive). It is easily seen that \(\mathcal {U}_{\text {wd}}\) and \(\mathcal {S}_{\text {wd}}\) inherit weak monotonicity from \(\mathcal {P}_{\text {wd}}\) [27, 33], and hence \(\mathcal {U}_{\text {wd}}<\mathcal {U}_{\text {wwd}}\) and \(\mathcal {S}_{\text {wd}}<\mathcal {S}_{\text {wwd}}\). Nonreducibility, however, does not propagate to unions.
Example 5
Consider the following \(\mathcal {U}_{\text {wd}}\)-pattern with G _{1}, G _{2} and μ _{1}, μ _{2} from Example 4:
We have \(\mu _{1}\in [{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{G_{1}}\) and \(\mu _{2}\in [{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{G_{2}}\) but \(\mu _{2}\sqsubset \mu _{1}\), which is due to the fact that μ _{2} is already contained in \([{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{G_{1}}\) along with μ _{1}. This is only possible in the presence of UNION since all mappings in the evaluation of a UNION-free pattern are mutually non-subsuming (see Lemma 2).
Thus, to account for UNION, we introduce the following, more delicate property.
Definition 11
A query Q is extension-witnessing (e-witnessing) if for any two graphs G _{1} ⊆ G _{2} and mapping \(\mu \!\in \![{\kern 2.3pt}[ Q ]{\kern 2.3pt}]_{G_{2}}\) such that \(\mu \!\notin \![{\kern 2.3pt}[ Q ]{\kern 2.3pt}]_{G_{1}}\!\) there is a triple t in Q such that vars(t) ⊆ dom(μ) and μ(t) ∈ G _{2} ∖ G _{1}. A fragment is e-witnessing if so are all of its queries.
Informally, a query Q is e-witnessing if whenever an extension of a graph leads to a new answer, this answer is justified by a triple pattern in Q which maps to the extension. Unions of wwd-patterns can be shown e-witnessing.
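The witnessing condition for a single new answer is easy to test mechanically. The following sketch assumes that triples are Python tuples, that variables are strings starting with "?", and that the three query triples listed are those of a pattern compatible with the graphs of Example 4; all of these are illustrative assumptions, and the function names are hypothetical.

```python
def variables(triple):
    """Variables (strings starting with '?') occurring in a triple pattern."""
    return {t for t in triple if isinstance(t, str) and t.startswith("?")}


def instantiate(triple, mu):
    """Apply a mapping to a triple pattern whose variables are all bound by mu."""
    return tuple(mu[t] if isinstance(t, str) and t.startswith("?") else t
                 for t in triple)


def witnesses(query_triples, mu, g1, g2):
    """Triples t of the query with vars(t) within dom(mu) and mu(t) in g2 minus g1."""
    added = g2 - g1
    return [t for t in query_triples
            if variables(t) <= set(mu) and instantiate(t, mu) in added]


g1 = {(1, "a", 1), (2, "a", 2)}
g2 = g1 | {(3, "a", 3)}

# Assumed triples of a pattern in the style of Example 4.
assumed_triples = [("?x", "a", 1), ("?y", "a", 2), ("?x", "a", 3)]
print(witnesses(assumed_triples, {"?x": 1}, g1, g2))

# A query whose new answer is justified by the added triple.
print(witnesses([("?x", "a", "?x")], {"?x": 3}, g1, g2))
```

For the first call no witness exists, although the mapping binding only ?x is a new answer over the larger graph; the second call returns the single triple pattern that maps onto the added triple.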
Proposition 9
Fragment \(\mathcal {U}_{\text {wwd}}\) is e-witnessing.
Proof
Let \(P\in \mathcal {U}_{\text {wwd}}\) and let G _{1}, G _{2} be graphs such that G _{1} ⊆ G _{2}. Let μ be a mapping in \([{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{G_{2}}\) but not in \([{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{{G_{1}}}\). We show that there is some t ∈ triples(P) such that μ(t) ∈ G _{2} ∖ G _{1}.
Since P is a union of wwd-patterns, there is some wwd-pattern P ^{′} in the union such that \(\mu \in [{\kern 2.3pt}[ P^{\prime } ]{\kern 2.3pt}]_{G_{2}}\). It suffices to show \(\mu (\mathsf {triples}(P^{\prime }_{\mu }))\cap (G_{2}\setminus G_{1})\ne \emptyset \), where \(P^{\prime }_{\mu }\) is the pattern corresponding to the maximal r-subtree of P witnessing μ in G _{2} (i.e., the part of P in the image of μ, see Definition 8). We know that \(\mu (\mathsf {triples}(P^{\prime }_{\mu })) \subseteq G_{2}\). Assume, for contradiction, that \(\mu (\mathsf {triples}(P^{\prime }_{\mu }))\subseteq G_{1}\). Then μ is a pp-solution to P ^{′} over G _{1}. We next show that μ is a real solution to P ^{′} over G _{1}. By Lemma 3, it suffices to show that (a) for any child u of \(\mathcal {T}(P^{\prime }_{\mu })\) labelled with (B, R), there is no mapping μ ^{′} such that \(\mu _{u}\sqsubseteq \mu ^{\prime }\), μ ^{′} ⊧ R, and μ ^{′}(B) ⊆ G _{1}, and (b) μ_{ s } ⊧ R for any special node s in \(\mathcal {T}(P^{\prime })\) labelled with R. Claim (a) holds since \(\mu \in [{\kern 2.3pt}[ P^{\prime } ]{\kern 2.3pt}]_{G_{2}}\) and G _{1} ⊆ G _{2} while (b) holds since \(\mu \in [{\kern 2.3pt}[ P^{\prime } ]{\kern 2.3pt}]_{G_{2}}\) and the claim does not depend on the graph over which the evaluation is computed. Consequently, \(\mu \in [{\kern 2.3pt}[ P^{\prime } ]{\kern 2.3pt}]_{G_{1}}\), and hence \(\mu \in [{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{{G_{1}}}\), in contradiction to the assumption. □
On the other hand, \(\mathcal {U}\) is not e-witnessing, as can be seen on the pattern and graphs in Example 4. Hence, we obtain the following theorem.
Theorem 5
It holds that \(\mathcal {U}_{\text {wd}}<\mathcal {U}_{\text {wwd}}<\mathcal {U}\) .
Next we move to the fragments that allow for projection. As already mentioned, we have \(\mathcal {S}_{\text {wd}}<\mathcal {S}_{\text {wwd}}\) since \(\mathcal {S}_{\text {wd}}\) is weakly monotone while \(\mathcal {S}_{\text {wwd}}\) is not. However, \(\mathcal {S}_{\text {wwd}}\) is not e-witnessing, so we cannot apply the technique of Theorem 5 to establish \(\mathcal {S}_{\text {wwd}}<\mathcal {S}\); instead, we make use of the following lemma.
Lemma 4
Let Q be a query in \(\mathcal {S}_{\text {wwd}}\) and G be a graph. For every graph G _{1} with G ⊆ G _{1} and every \(\mu \in [{\kern 2.3pt}[ Q ]{\kern 2.3pt}]_{{G_{1}}}\) , there is a graph G _{2} with G ⊆ G _{2} such that \(\mu \in [{\kern 2.3pt}[ Q ]{\kern 2.3pt}]_{{G_{2}}}\) and |G _{2}| ≤ |G| + |triples(Q)|.
Proof
Let Q = SELECT X WHERE P, for P a union of wwd-patterns, and let G, G _{1} and μ be as required. Then there is a wwd-pattern P ^{′} in the union P such that \(\mu ^{\prime }\in [{\kern 2.3pt}[ P^{\prime } ]{\kern 2.3pt}]_{G_{1}}\) for some μ ^{′} with μ ^{′}|_{ X } = μ. Let \(G_{2}=G\cup \mu ^{\prime }(\mathsf {triples}(P^{\prime }_{\mu ^{\prime }}))\). Clearly, |G _{2}| ≤ |G| + |triples(Q)|, so it suffices to show that \(\mu ^{\prime }\in [{\kern 2.3pt}[ P^{\prime } ]{\kern 2.3pt}]_{G_{2}}\).
By construction, μ ^{′} is a pp-solution to P ^{′} over G _{2}. Moreover, since μ ^{′} is a solution to P ^{′} over G _{1}, we have that μ_{ s } ⊧ R for every special node s in \(\mathcal {T}(P^{\prime })\) labelled with R. Finally, suppose for contradiction that there is a child v of \(\mathcal {T}(P^{\prime }_{\mu ^{\prime }})\) labelled with (B, R) and a mapping μ ^{″} such that \(\mu ^{\prime }_{v}\sqsubseteq \mu ^{\prime \prime }\), μ ^{″} ⊧ R, and μ ^{″}(B) ⊆ G _{2}. However, since G _{2} ⊆ G _{1}, we then have μ ^{″}(B) ⊆ G _{1}, which contradicts the fact that \(\mu ^{\prime }\in [{\kern 2.3pt}[ P^{\prime } ]{\kern 2.3pt}]_{G_{1}}\). □
This lemma is the basis for the last result of the section.
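The graph built in this proof, namely the base graph together with the image of the witnessed triples under the mapping, can be sketched as follows. The tuple encoding, the helper names and the sample data are our own assumptions; the sketch only illustrates the construction and the resulting size bound, not the correctness argument.

```python
def instantiate(triple, mu):
    """Apply a mapping to a triple pattern whose variables are all bound by mu."""
    return tuple(mu[t] if isinstance(t, str) and t.startswith("?") else t
                 for t in triple)


def small_extension(base_graph, mu, witnessed_triples):
    """Base graph plus the image under mu of the triples witnessed by mu."""
    return set(base_graph) | {instantiate(t, mu) for t in witnessed_triples}


# Hypothetical data: a base graph, a mapping, and the triples of the
# subpattern that the mapping matches.
base = {(1, "a", 1)}
mu_prime = {"?x": 1, "?y": 2}
matched = [("?x", "a", 1), ("?y", "a", 2)]

extension = small_extension(base, mu_prime, matched)
print(sorted(extension, key=repr))
print(len(extension) <= len(base) + len(matched))
```

At most one triple is added per matched triple pattern, which is where the additive bound in terms of the number of triples of the query comes from.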
Theorem 6
It holds that \(\mathcal {S}_{\text {wd}}<\mathcal {S}_{\text {wwd}}<\mathcal {S}\) .
Proof
As observed before, the inclusion \(\mathcal {S}_{\text {wd}}<\mathcal {S}_{\text {wwd}}\) holds since \(\mathcal {S}_{\text {wd}}\) is weakly monotone [27, 33] and \(\mathcal {S}_{\text {wwd}}\) is not.
As for the second inclusion, consider the family of graphs
for pairwise distinct IRIs a, b, c, d _{1}, …, d _{ n }, and the query
By equivalence (5) in Section 3, the operator DIFF can be expressed via OPT, AND and FILTER, so we can assume that \(Q \in \mathcal {S}\). On the other hand, it is easily seen that \(Q\notin \mathcal {S}_{\text {wwd}}\). The mapping μ = {?x ↦ a} is an answer to Q over G _{1} but not an answer over any G _{ n } with n ≥ 2. Moreover, it is easily seen that any extension G of G _{ n } such that μ ∈ [[Q]]_{ G } requires the addition of at least n − 1 triples, namely {(d _{2}, c, c), …, (d _{ n }, c, c)}. Consequently, μ ∈ [[Q]]_{ G } implies |G| ≥ |G _{ n }| + n − 1.
Suppose for contradiction there is a query Q ^{′} in \(\mathcal {S}_{\text {wwd}}\) such that Q ^{′} ≡ Q. Let n = |triples(Q ^{′})| + 2. Then, by Lemma 4, μ ∈ [[Q ^{′}]]_{ G } for some G with |G| ≤ |G _{ n }| + |triples(Q ^{′})| = |G _{ n }| + n − 2, which contradicts the above observation for Q. □
7 Static Analysis of wwd-Patterns
In this section, we look at the general static analysis problems of query equivalence and containment. Formally, equivalence for a language \(\mathcal {L}\) is defined as follows.
This problem is commonly generalised to \(\textsc {Containment}({\mathcal {L}})\), in which one checks whether Q is contained in Q ^{′}, written Q ⊆ Q ^{′}, that is, whether [[Q]]_{ G } ⊆ [[Q ^{′}]]_{ G } holds for every graph G. We have Q ≡ Q ^{′} if and only if Q and Q ^{′} contain each other.
These problems have been studied for FILTER-free wd-patterns in [27, 35], establishing NP-completeness of equivalence and containment. Moreover, both problems are \({{\Pi }_{2}^{p}}\)-complete for unions of FILTER-free wd-patterns, and undecidable for fragments with projection. Finally, from the results in [41] it follows that containment is undecidable for \(\mathcal {U}\). On the other hand, nothing seems to be known so far for well-designed patterns with FILTER.
We next show that equivalence and containment are both \({{\Pi }_2^p}\)-complete for \(\mathcal {P}_{\text {wwd}}\) and \(\mathcal {U}_{\text {wwd}}\) (whereas they are undecidable for \(\mathcal {S}_{\text {wwd}}\) by the results in [35]). As the following lemma shows, the upper bound for containment follows from a small counterexample property: if P ⊈ P ^{′} for some P and P ^{′} from \(\mathcal {U}_{\text {wwd}}\), then there is a witnessing mapping and graph of size O(|P| + |P ^{′}|). Given this property, a \({{\Pi }_2^p}\) algorithm for containment is straightforward: we guess a mapping μ and a graph G of linear size, check that μ ∉ [[P ^{′}]]_{ G }, and then call a coNP oracle for checking μ ∈ [[P]]_{ G }. As a corollary, Equivalence(\({\mathcal {U}_{\text {wwd}}}\)) is also in \({{\Pi }_{2}^{p}}\).
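The guess-and-check procedure can be simulated deterministically by exhaustive search, which may help to see what the small counterexample property buys. The sketch is a toy illustration under several assumptions of ours: the evaluators are passed in as plain functions, the candidate triples are drawn from a finite universe supplied by the caller (the lemma bounds only the size of the graph), and the enumeration is exponential rather than a \({{\Pi }_2^p}\) procedure.

```python
from itertools import combinations


def find_counterexample(eval_p, eval_q, universe, max_size):
    """Search graphs of at most max_size triples from universe for a mapping
    that is an answer to P but not to Q; return (graph, mapping) or None."""
    candidates = sorted(universe, key=repr)
    for size in range(max_size + 1):
        for triples in combinations(candidates, size):
            graph = set(triples)
            answers_q = eval_q(graph)
            for mu in eval_p(graph):
                if mu not in answers_q:
                    return graph, mu
    return None


# Toy evaluators for two basic patterns (hypothetical examples).
def eval_loop(graph):
    """Answers to the single triple pattern (?x, a, ?x)."""
    return [{"?x": s} for (s, p, o) in graph if p == "a" and s == o]


def eval_one(graph):
    """Answers to the single triple pattern (?x, a, 1)."""
    return [{"?x": s} for (s, p, o) in graph if p == "a" and o == 1]


universe = {(1, "a", 1), (2, "a", 2)}
print(find_counterexample(eval_loop, eval_one, universe, max_size=2))
print(find_counterexample(eval_one, eval_loop, universe, max_size=2))
```

The second call finds nothing only because the chosen universe contains no triple such as (2, a, 1); the result of such a search is meaningful only relative to the universe it is given.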
Lemma 5
Let P and P ^{′} be two patterns from \(\mathcal {U}_{\text {wwd}}\) . If P ⊈ P ^{′} then there exists a mapping μ and a graph G of size O(|triples(P)| + |triples(P ^{′})|) such that μ ∈ [[P]]_{ G } but μ ∉ [[P ^{′}]]_{ G } .
Proof
Without loss of generality, let us assume that
where all P _{ i } and \(P^{\prime }_{j}\) are wwd-patterns in OF-normal form.
Since P ⊈ P ^{′}, there exists a graph G ^{′}, a mapping μ, and a pattern P _{ i }, 1 ≤ i ≤ n, such that \(\mu \in [{\kern 2.3pt}[ P_{i} ]{\kern 2.3pt}]_{G^{\prime }}\), but \(\mu \notin [{\kern 2.3pt}[ P^{\prime }_{j} ]{\kern 2.3pt}]_{G^{\prime }}\) for every \(P^{\prime }_{j}\). Hence, μ is a pp-solution to P _{ i } over G ^{′} with corresponding r-subtree \(\mathcal {T}((P_{i})_{\mu })\) of the CPT \(\mathcal {T}(P_{i})\). Let G _{0} = μ(triples((P _{ i })_{ μ })). By construction, we have that G _{0} ⊆ G ^{′} and |G _{0}| ≤ |triples((P _{ i })_{ μ })| ≤ |triples(P _{ i })| ≤ |triples(P)|. Moreover, \(\mu \in [{\kern 2.3pt}[ P_{i} ]{\kern 2.3pt}]_{G_{0}}\), because all the matches and constraints, including the ones on the top level, stay unchanged. In fact, \(\mu \in [{\kern 2.3pt}[ P_{i} ]{\kern 2.3pt}]_{G^{\prime \prime }}\) for any G ^{″} such that G _{0} ⊆ G ^{″} ⊆ G ^{′}.
If \(\mu \notin [{\kern 2.3pt}[ P^{\prime }_{j} ]{\kern 2.3pt}]_{G_{0}}\) for every j, then G _{0} satisfies all the properties required from G. Otherwise, there exists \(P^{\prime }_{j}\) among \(P^{\prime }_{1}, \ldots , P^{\prime }_{m}\) such that \(\mu \in [{\kern 2.3pt}[ P^{\prime }_{j} ]{\kern 2.3pt}]_{G_{0}}\). Since G _{0} ⊆ G ^{′}, μ is a pp-solution to \(P^{\prime }_{j}\) over G ^{′}. Consider the corresponding pattern \((P^{\prime }_{j})_{\mu }\) (i.e., the maximal pattern witnessing μ in G ^{′} obtained from \(P^{\prime }_{j}\) by dropping the right arguments of some OPT operators), the r-subtree \(\mathcal {T}((P^{\prime }_{j})_{\mu })\) of the CPT \(\mathcal {T}(P^{\prime }_{j})\), and the “image” \(\mu (\mathsf {triples}((P^{\prime }_{j})_{\mu }))\). Note that we may have \(\mu (\mathsf {triples}((P^{\prime }_{j})_{\mu })) \subseteq G_{0}\) or not: the latter is possible because the maximal r-subtree of \(\mathcal {T}(P^{\prime }_{j})\) witnessing μ in G _{0} may be different from \(\mathcal {T}((P^{\prime }_{j})_{\mu })\), which is maximal in G ^{′}. Let \(G^{\prime }_{1} = G_{0} \cup \mu (\mathsf {triples}((P^{\prime }_{j})_{\mu }))\). We define G _{1} depending on whether \(\mu \in [{\kern 2.3pt}[ P^{\prime }_{j} ]{\kern 2.3pt}]_{G^{\prime }_{1}}\) or not. If \(\mu \notin [{\kern 2.3pt}[ P^{\prime }_{j} ]{\kern 2.3pt}]_{G^{\prime }_{1}}\), then let \(G_{1} = G^{\prime }_{1}\). Otherwise, since \(\mu \notin [{\kern 2.3pt}[ P^{\prime }_{j} ]{\kern 2.3pt}]_{G^{\prime }}\) by assumption, there exists a child v of \(\mathcal {T}((P^{\prime }_{j})_{\mu })\) and a mapping μ _{0} such that \(\mu _{v}\sqsubset \mu _{0}\) and μ _{0}(triples(v)) ⊆ G ^{′}. Then the graph \(G_{1} = G^{\prime }_{1} \cup \mu _{0}(\mathsf {triples}(v))\) is such that \(\mu \notin [{\kern 2.3pt}[ P^{\prime }_{j} ]{\kern 2.3pt}]_{G_{1}}\). In either case, \(\mu \in [{\kern 2.3pt}[ P_{i} ]{\kern 2.3pt}]_{G_{1}}\) because G _{0} ⊆ G _{1} ⊆ G ^{′}.
Moreover, we have \(\mu \notin [{\kern 2.3pt}[ P^{\prime }_{j} ]{\kern 2.3pt}]_{G^{\prime \prime }}\) for every G ^{″} such that G _{1} ⊆ G ^{″} ⊆ G ^{′}. To see this, suppose for contradiction that \(\mu \in [{\kern 2.3pt}[ P^{\prime }_{j} ]{\kern 2.3pt}]_{G^{\prime \prime }}\) for a graph G ^{″} as above. Then there must be a child v ^{′} of \(\mathcal {T}((P^{\prime }_{j})_{\mu _{0}})\) such that v ^{′} ≺ v, \(\mu _{0}|_{v^{\prime }}\sqsubset \mu \) and μ(triples(v ^{′})) ⊆ G ^{″}. Since \(\mathcal {T}((P^{\prime }_{j})_{\mu _{0}})\) and \(\mathcal {T}((P^{\prime }_{j})_{\mu })\) are identical restricted to nodes preceding v with respect to ≺, v ^{′} is a child of \(\mathcal {T}((P^{\prime }_{j})_{\mu })\). Thus, v ^{′} is not contained in \(\mathcal {T}((P^{\prime }_{j})_{\mu })\), which contradicts maximality of \(\mathcal {T}((P^{\prime }_{j})_{\mu })\) since μ(triples(v ^{′})) ⊆ G ^{″} ⊆ G ^{′}.
If \(\mu \notin [{\kern 2.3pt}[ P^{\prime }_{j} ]{\kern 2.3pt}]_{G_{1}}\) for all other j as well, then G _{1} satisfies all the properties required from G. Otherwise we can extend G _{1} to a graph G _{2} on the basis of some other \(P^{\prime }_{j}\) with \(\mu \in [{\kern 2.3pt}[ P^{\prime }_{j} ]{\kern 2.3pt}]_{G_{1}}\) in the same way as G _{1} extends G _{0}. We then have G _{2} ⊆ G ^{′}, \(\mu \in [{\kern 2.3pt}[ P ]{\kern 2.3pt}]_{G_{2}}\), and \(\mu \notin [{\kern 2.3pt}[ P^{\prime }_{j} ]{\kern 2.3pt}]_{G_{2}}\) for j from both steps. Repeating the extension step until there are no \(P^{\prime }_{j}\) having μ as a solution on the resulting graph, we obtain a graph that satisfies all the properties required from G; in particular, for each j the number of triples added to the graph is bounded by \(|\mathsf {triples}(P^{\prime }_{j})|\). □
Hardness of equivalence is established in the following lemma by a reduction from ∀∃3SAT, while containment is \({{\Pi }_2^p}\)-hard by the results in [35]. Note that both results hold even for fragments without FILTER.
Lemma 6
Problem Equivalence(\({\mathcal {O}_{\text {wwd}}}\)) is \({{\Pi }_{2}^{p}}\)-hard for the fragment \(\mathcal {O}_{\text {wwd}}\) of FILTER-free wwd-patterns.
Proof
We proceed by reduction of the ∀∃3SAT problem, that is, the problem of checking whether a formula of the form
holds for a conjunction ψ of clauses t _{1} ∨ t _{2} ∨ t _{3} with t _{ i } propositional literals, that is, propositional variables from \(\bar {x} \cup \bar y\) or their negations. Without loss of generality, we assume that ψ contains no tautologous clauses and no clauses with duplicate literals. Let ϕ be a formula of the form (14). Starting from ϕ, we construct FILTER-free wwd-patterns P and P ^{′} in OF-normal form, and then show that ϕ is true if and only if P ≡ P ^{′}. Let \(\bar {x} = x_{1}, \ldots , x_{n}\) and \(\bar y = y_{1}, \ldots , y_{m}\).
For each clause γ = t _{1} ∨ t _{2} ∨ t _{3}, there are exactly 7 assignments to the variables in t _{1}, t _{2}, t _{3} making γ true, and exactly one assignment making γ false (since γ is assumed to be non-tautologous and contain no duplicate literals). Let, for each such γ in ψ, each ℓ, 1 ≤ ℓ ≤ 7, and each j, 1 ≤ j ≤ 3, val(γ, j, ℓ) = ⊤ if the variable of literal t _{ j } evaluates to true in the ℓ’th assignment making γ true, and val(γ, j, ℓ) = ⊥, otherwise; here ⊤ and ⊥ are fresh IRIs. Let also, for every clause γ in ψ, \(\mathit {cl}_{\gamma }^{1} ,\dots ,\mathit {cl}_{\gamma }^{7} \) and \(\mathit {lit}_{\gamma }^{1} \), \(\mathit {lit}_{\gamma }^{2} \), \(\mathit {lit}_{\gamma }^{3} \) be fresh IRIs. We define, for each γ and 1 ≤ ℓ ≤ 7, a basic pattern
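For intuition about the source problem of the reduction, the truth of a ∀∃3SAT instance can be decided by brute force on small inputs. The sketch assumes the usual reading in which the universally quantified variables are \(\bar {x}\) and the existentially quantified ones are \(\bar y\); the encoding of a literal as a pair (variable, polarity) and the function names are our own choices.

```python
from itertools import product


def holds(clauses, assignment):
    """True if every clause has at least one literal satisfied by the assignment."""
    return all(any(assignment[var] == positive for var, positive in clause)
               for clause in clauses)


def forall_exists(xs, ys, clauses):
    """For every assignment to xs, is there an assignment to ys satisfying all clauses?"""
    for x_values in product([False, True], repeat=len(xs)):
        alpha = dict(zip(xs, x_values))
        if not any(holds(clauses, {**alpha, **dict(zip(ys, y_values))})
                   for y_values in product([False, True], repeat=len(ys))):
            return False
    return True


xs, ys = ["x1"], ["y1", "y2"]

# (x1 or y1 or y2) and (not x1 or not y1 or y2): choose y2 true for any x1.
true_instance = [[("x1", True), ("y1", True), ("y2", True)],
                 [("x1", False), ("y1", False), ("y2", True)]]

# With x1 false, the four clauses below exclude every assignment to y1, y2.
false_instance = [[("x1", True), ("y1", a), ("y2", b)]
                  for a in (True, False) for b in (True, False)]

print(forall_exists(xs, ys, true_instance))
print(forall_exists(xs, ys, false_instance))
```

The exponential enumeration over both blocks of variables mirrors the two quantifier alternations that place the problem at the second level of the polynomial hierarchy.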
and a basic pattern
(note that these patterns do not have any variables).
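The count of seven satisfying assignments per clause, and the table of values val(γ, j, ℓ), can be reproduced with a short enumeration. The clause encoding and the order in which the satisfying assignments are numbered are assumptions of this sketch; the construction only requires that some fixed numbering from 1 to 7 is used.

```python
from itertools import product


def satisfying_assignments(clause):
    """Assignments to the three (distinct) variables of a clause that make it true."""
    names = [var for var, _ in clause]
    rows = []
    for values in product([True, False], repeat=len(names)):
        assignment = dict(zip(names, values))
        if any(assignment[var] == positive for var, positive in clause):
            rows.append(assignment)
    return rows


def val_table(clause):
    """Row l, column j holds the truth value of the variable of literal t_j
    in the l-th satisfying assignment, written with the symbols ⊤ and ⊥."""
    return [["⊤" if row[var] else "⊥" for var, _ in clause]
            for row in satisfying_assignments(clause)]


# Hypothetical clause: x1 or (not y1) or y2.
gamma = [("x1", True), ("y1", False), ("y2", True)]

for line in val_table(gamma):
    print(line)
print(len(val_table(gamma)))
```

The only assignment missing from the table is the one that falsifies every literal, here x1 false, y1 true and y2 false.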
Let, for each propositional variable \(z \in \bar {x} \cup \bar y\), iri _{ z } be a fresh IRI and ?z be a fresh SPARQL variable. For each γ, let also ?c _{ γ }, \(?v_{\gamma }^{1} \), \(?v_{\gamma }^{2} \), \(?v_{\gamma }^{3} \) be fresh variables. Let
where each \(\mathit {var}_{\gamma }^{j} \) and \(\mathit {iri}_{\gamma }^{j} \), 1 ≤ j ≤ 3, are the variable and the IRI corresponding to the variable of literal t _{ j } in γ; that is, \(\mathit {var}_{\gamma }^{j} ={?z}\) and \(\mathit {iri}_{\gamma }^{j} = iri_{z}\) if t _{ j } = z or t _{ j } = ¬z.
Let ?u and ?s be fresh variables and r, cy ^{⊥}, cy ^{⊤} fresh IRIs. We define
For example, a visualisation of these patterns for
is shown in Fig. 6.
Finally, let
and
be two FILTER-free wwd-patterns in OF-normal form.
We next show that ϕ is true if and only if P is equivalent to P ^{′}, starting with the forward direction.
Let ϕ be true, yet, for the sake of contradiction, P is not equivalent to P ^{′}. Then there is a graph G and mapping μ such that μ ∈ [[P]]_{ G }, but μ ∉ [[P ^{′}]]_{ G }. Since patterns P and P ^{′} have the same root B _{base}, which contains ?u as the only variable, we conclude that ?u ∈ dom(μ). Each ?x _{ i } is also in dom(μ) by the construction of P, since there is a homomorphism from the corresponding leaf \(B_{i}^{\top }\) to the root B _{base}. However, it is not necessary that \({(\mu (?x_{i}), iri_{x_{i}}, \top )}\) is in G because if G contains a triple of the form \({(c, iri_{x_{i}}, \bot )}\) for some IRI c, we will have \({(\mu (?x_{i}), iri_{x_{i}}, \bot )}\in G\). Note also that nothing prevents G from containing both a triple \({(c, iri_{x_{i}}, \bot )}\) and a triple \({(c, iri_{x_{i}}, \top )}\) for some i. Depending on whether ?s ∈ dom(μ) or not, we have two cases.
Case 1
Let ?s ∈ dom(μ), that is, there is a homomorphism from B _{ ψ } to G that aligns with the previous assignment of all ?x _{ i }. In particular, this means that dom(μ) = vars(P) = vars(P ^{′}). If there is no homomorphism from B _{valid} to G, then μ ∈ [[P ^{′}]]_{ G }, because B _{ ψ } is the last leaf of P ^{′} as well, and nothing prevents it from matching. But this contradicts the assumption. However, even if there is a homomorphism h from B _{valid} to G, we still have a contradiction because \(\mu \in [{\kern 2.3pt}[ P^{\prime } ]{\kern 2.3pt}]_{G}\) still holds. Indeed, ?s is the only variable in B _{valid} and is essentially isolated in B _{valid}, so if h is a homomorphism from B _{valid} to G, then h ^{′}, which maps ?s to μ(?s), is also such a homomorphism (in other words, since by the assumptions of this case, we know that (?s, r, ?s) has a match in G, the existence of h just means that all the ground triples of B _{valid} are in G). This means, however, that nothing prevents B _{ ψ } from matching in P ^{′}, implying μ ∈ [[P ^{′}]]_{ G }.
Case 2
Let ?s ∉ dom(μ). Since μ ∉ [[P ^{′}]]_{ G }, there is no homomorphism from B _{ ψ } to G but there is one from B _{valid} to G, that is, all ground triples of B _{valid} are in G (the nonexistence of a homomorphism from B _{ ψ } to G is immediate since ?s ∉ dom(μ) and μ ∈ [[P]]_{ G }; the existence of a homomorphism from B _{valid} to G then follows since otherwise we would have \(\mu \in [{\kern 2.3pt}[ P^{\prime } ]{\kern 2.3pt}]_{G}\)). Consider now a truth assignment α of variables \(\bar {x}\) such that if α(x _{ i }) is true then \({(\mu (?x_{i}), iri_{x_{i}}, \top )} \in G\) and if α(x _{ i }) is false then \({(\mu (?x_{i}), iri_{x_{i}}, \bot )} \in G\) (as we mentioned earlier, α may be not unique, but the argument does not depend on its uniqueness). Since ϕ is true, we know that α can be extended to the variables \(\bar y\) in such a way that each clause in ψ holds. Let α ^{′} be such an extension, and let μ ^{′} be an extension of μ to the variables in B _{ ψ } such that, for all j, μ ^{′}(?y _{ j }) = cy ^{⊤} if α ^{′}(y _{ j }) is true and μ ^{′}(?y _{ j }) = cy ^{⊥} otherwise. Then, for every clause γ in ψ, the IRIs \(\mu ^{\prime }(?v_{\gamma }^{1} )\), \(\mu ^{\prime }(?v_{\gamma }^{2})\), \(\mu ^{\prime }(?v_{\gamma }^{3} )\) correspond to the values α ^{′}(z _{1}), α ^{′}(z _{2}), α ^{′}(z _{3}), respectively, where z _{1}, z _{2}, z _{3} are the variables in the literals t _{1}, t _{2}, t _{3} of γ. Moreover, \(\mu ^{\prime }(?c_{\gamma }) = \mathit {cl}_{\gamma }^{\ell } \) for ℓ the number of the assignment α ^{′}(z _{1}), α ^{′}(z _{2}), α ^{′}(z _{3}); this assignment makes γ true by the choice of α ^{′} (in other words, for every γ there is some ℓ such that \({(\mu ^{\prime }(?c_{\gamma }), \mathit {lit}_{\gamma }^{i}, \mu ^{\prime }(?v_{\gamma }^{i} ))}\in B_{\gamma }^{\ell }\) for all 1 ≤ i ≤ 3). Hence, the extension μ ^{′} is contained in [[P]]_{ G }, and hence μ ∉ [[P]]_{ G }, which, however, contradicts the original assumption.
Since both cases yield a contradiction, we conclude that P is equivalent to P ^{′}.
We continue with the backward direction of the equivalence. Suppose that P ≡ P ^{′}, yet, for the sake of contradiction, ϕ is false. Then, there is a truth assignment α of the variables \(\bar {x}\) such that for each extension α ^{′} of α to the variables \(\bar y\) there is a clause γ in ψ that evaluates to false under α ^{′}. Fix such an α and consider the graph G consisting of

the triple \({(u, iri_{x_{i}}, \top )}\), for each x _{ i } in \(\bar {x}\) and a fresh IRI u;

the triple \({(c_{x_{i}}, iri_{x_{i}}, \top )}\), for each x _{ i } in \(\bar {x}\) with α(x _{ i }) true, and the triple \({({c_{x_{i}}}, {iri_{x_{i}}}, {\bot })}\), for each x _{ i } in \(\bar {x}\) with α(x _{ i }) false, where \(c_{x_{1}},\dots ,c_{x_{n}}\) are fresh IRIs;

all ground triples from B _{valid}, that is, all of its triples except (?s, r, ?s);

the triple (s, r, s) for a fresh IRI s.
Consider also the mapping μ such that

μ(?u) = u;

\(\mu (?x_{i}) = c_{x_{i}}\), for each x _{ i } in \(\bar {x}\).
Clearly, μ is a pp-solution to both P and P ^{′} over G. However, by the construction of G, the mapping μ ^{′} = μ ∪ {?s ↦ s} is a pp-solution to P ^{′} over G as well. Thus, μ is not a solution to P ^{′} over G, and, since P ⊆ P ^{′}, we also have μ ∉ [[P]]_{ G }. Consequently, it must be possible to further extend μ ^{′} to a mapping μ ^{″} that is both in [[P]]_{ G } and in [[P ^{′}]]_{ G }, and is defined on all variables in B _{ ψ }. Essentially, this means that there is a homomorphism from B _{ ψ } to B _{valid} that preserves ?s. Consider now the extension α ^{′} of α to the variables \(\bar y\) such that μ ^{″}(?y _{ j }) = cy ^{⊤} if α ^{′}(y _{ j }) is true and μ ^{″}(?y _{ j }) = cy ^{⊥} otherwise. By the construction of B _{valid}, this assignment validates all clauses in ψ, which, however, contradicts the assumption that ϕ is false.
Thus, we have shown that ϕ is true if and only if P ≡ P ^{′}. □
Theorem 7
Problems \(\textsc {Equivalence}({\mathcal {L}})\) and \(\textsc {Containment}({\mathcal {L}})\) are both \({{\Pi }_2^p}\)-complete for any \(\mathcal {L}\!\in \!\{\mathcal {P}_{\text {wwd}}, \mathcal {U}_{\text {wwd}}\}\).
Proof
The existence of a \({{\Pi }_2^p}\) algorithm for containment immediately follows from Lemma 5: to show that P ⊈ P ^{′}, for \(P,P^{\prime }\in \mathcal {P}_{\text {wwd}}\), we just need to guess, in NP, a graph G of linear size as well as a mapping μ, check that μ ∉ [[P ^{′}]]_{ G }, and then call a coNP oracle to check that μ ∈ [[P]]_{ G }. The claim for patterns in \(\mathcal {U}_{\text {wwd}}\) is similar, but involves guessing a disjunct P _{1} of P with μ ∈ [[P _{1}]]_{ G } and checking \(\mu \notin [{\kern 2.3pt}[ {P^{\prime }_{1}} ]{\kern 2.3pt}]_{G} \) for every disjunct \(P^{\prime }_{1}\) of P ^{′}. Since P ≡ P ^{′} if and only if containment holds in both directions, the problem \(\textsc {Equivalence}({\mathcal {U}_{\text {wwd}}})\) is also in \({{\Pi }_{2}^{p}}\).
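The verification step of this argument can be illustrated by a small sketch. It is not the algorithm from the proof: the guessing of G and μ is left to the caller, and the coNP oracle is replaced by a naive, exponential evaluator for the standard semantics of triple patterns, AND, OPT and UNION. The tagged-tuple encoding of patterns and all function names are assumptions made for this illustration only.

```python
# Naive check that a guessed pair (G, mu) witnesses non-containment P ⊈ P'.
# Patterns are tagged tuples (an encoding assumed for this sketch):
#   ("T", s, p, o) | ("AND", P1, P2) | ("OPT", P1, P2) | ("UNION", P1, P2)
# Variables are strings starting with "?"; mappings are plain dicts.

def compatible(m1, m2):
    """Two mappings are compatible if they agree on shared variables."""
    return all(m2[v] == val for v, val in m1.items() if v in m2)

def dedupe(mappings):
    seen, out = set(), []
    for m in mappings:
        key = frozenset(m.items())
        if key not in seen:
            seen.add(key)
            out.append(m)
    return out

def match(tp, graph):
    """All mappings sending the triple pattern tp to a triple of graph."""
    results = []
    for triple in graph:
        mu = {}
        for term, value in zip(tp, triple):
            if term.startswith("?"):
                if mu.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(mu)
    return results

def evaluate(pattern, graph):
    """[[pattern]]_graph under the standard set semantics."""
    tag = pattern[0]
    if tag == "T":
        return dedupe(match(pattern[1:], graph))
    left, right = evaluate(pattern[1], graph), evaluate(pattern[2], graph)
    if tag == "UNION":
        return dedupe(left + right)
    joined = [{**a, **b} for a in left for b in right if compatible(a, b)]
    if tag == "AND":
        return dedupe(joined)
    if tag == "OPT":
        # Keep left mappings that no right mapping can extend.
        alone = [a for a in left if not any(compatible(a, b) for b in right)]
        return dedupe(joined + alone)
    raise ValueError(f"unknown operator {tag!r}")

def is_counterexample(p, p_prime, graph, mu):
    """True iff mu is in [[p]]_graph but not in [[p_prime]]_graph."""
    return mu in evaluate(p, graph) and mu not in evaluate(p_prime, graph)

# Hypothetical instance: P is a single triple pattern, P' extends it with OPT.
P = ("T", "?x", "p", "?y")
P_PRIME = ("OPT", P, ("T", "?y", "q", "?z"))
G = {("a", "p", "b"), ("b", "q", "c")}
MU = {"?x": "a", "?y": "b"}

if __name__ == "__main__":
    print(is_counterexample(P, P_PRIME, G, MU))
```

On this instance the optional part matches, so μ is extended in [[P ^{′}]]_{ G } and no longer appears there itself; the pair (G, μ) therefore certifies P ⊈ P ^{′}. Dropping the triple (b, q, c) from G removes the witness.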
Hardness follows by the results in [35] for containment and by Lemma 6 for equivalence. □
Hence, for UNION- and FILTER-free patterns, the step from well-designed to weakly well-designed OPT incurs a complexity jump for containment and equivalence. However, for the fragments with UNION or projection, the complexity remains the same in both cases. As far as we are aware, these are the first decidability results on query equivalence and related problems for SPARQL fragments with OPT and FILTER.
8 Analysis of DBpedia Logs
In this section, we present an analysis of query logs over DBpedia, which suggests that the step from wd-patterns to wwd-patterns makes a dramatic difference in real life: while only about half of the queries with OPT have well-designed patterns, almost all of these patterns fall into the weakly well-designed fragment.
DBpedia [26] is a project providing access to RDF data extracted from Wikipedia via a SPARQL endpoint. DBpedia query logs are well suited for analysing the structure of real-life SPARQL queries as they contain a large amount of general-purpose knowledge base queries, generated both manually and automatically. DBpedia query logs have been analysed by Picalausa and Vansummeren [34], who reported that, over a period in 2010, about 46.38% of a total of 1344K distinct DBpedia queries used OPT. However, only 47.80% of the queries with OPT had well-designed patterns. Another analysis of DBpedia logs from the USEWOD 2011 data set performed by Arias Gallego et al. [7] concluded that 16.61% of about 5166K queries contained OPT; however, the detailed structure of the queries was not analysed.
We considered query logs over DBpedia 3.9 from USEWOD 2015 [30] and USEWOD 2016 [29]. The USEWOD 2015 DBpedia dataset is a random selection of almost 14M queries from the first half of 2014 while the USEWOD 2016 dataset contains 35M queries from the second half of 2015. We removed syntactically incorrect queries as well as queries outside of \(\mathcal {S}\) (in particular, queries using operators specific to SPARQL 1.1). Also, we rewrote the patterns of the remaining queries to unions of UNION-free patterns as proposed in [33] and eliminated duplicates, which left us with 6.6M queries in USEWOD 2015 and 9.1M queries in USEWOD 2016. Finally, we isolated queries involving OPT and counted how many of their patterns were in \(\mathcal {U}_{\text {wd}}\) and in \(\mathcal {U}_{\text {wwd}}\).
The results are given in Table 1. They confirm that a non-negligible number of DBpedia queries use OPT (over 17%). However, far from all queries with OPT are well-designed (only about 44% for USEWOD 2015 and 52% for USEWOD 2016), which is consistent with the results in [34]. On the other hand, almost all of the patterns with OPT (over 99.9% in both cases) are weakly well-designed, which we consider the main practical justification for wwd-patterns.
What about the remaining 0.05% of queries with OPT? We looked at a number of such queries and identified what we believe to be the three most common sources of non-weakly-well-designedness in query patterns. The first and seemingly most common such source is joins between an OPT subpattern and another pattern on a variable that only occurs in the right argument of the OPT subpattern. The following query is an example of such a join:
We believe that the vast majority, if not all, of such queries are erroneous as they are highly unlikely to yield meaningful answers in case the optional part fails to match. Intuitively, one would expect an answer to query (15) to contain the label of an object in variable ?ℓ together with one of its supertypes in variable ?u. And indeed, this is the answer returned by the query on graph G in Fig. 7a (see Fig. 7b). However, if an object in ?s is not assigned any type, which is explicitly allowed by the use of OPT, the query does not just return its label in ?ℓ leaving ?u unbound, as one would expect; instead, it returns the cross product of the label and all types in the graph, which are, of course, completely unrelated to the object in ?s (see Fig. 7c and d).
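The cross-product effect can be reproduced with a small sketch. The pattern and graph below are hypothetical stand-ins, not query (15) or the graph of Fig. 7; the predicate names, the tagged-tuple encoding of patterns and the evaluator itself are assumptions made for illustration. The pattern joins an OPT subpattern with a triple pattern on ?t, a variable that occurs only in the optional side.

```python
# Join on a variable (?t) bound only by the optional side of OPT.
# Patterns: ("T", s, p, o) | ("AND", P1, P2) | ("OPT", P1, P2); mappings are dicts.

def compatible(m1, m2):
    return all(m2[v] == val for v, val in m1.items() if v in m2)

def match(tp, graph):
    """All mappings sending the triple pattern tp to a triple of graph."""
    results = []
    for triple in sorted(graph):
        mu = {}
        for term, value in zip(tp, triple):
            if term.startswith("?"):
                if mu.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(mu)
    return results

def evaluate(pattern, graph):
    tag = pattern[0]
    if tag == "T":
        return match(pattern[1:], graph)
    left, right = evaluate(pattern[1], graph), evaluate(pattern[2], graph)
    joined = [{**a, **b} for a in left for b in right if compatible(a, b)]
    if tag == "AND":
        return joined
    if tag == "OPT":
        # Left mappings with no compatible right mapping stay unextended.
        return joined + [a for a in left
                         if not any(compatible(a, b) for b in right)]
    raise ValueError(f"unknown operator {tag!r}")

# Hypothetical pattern: label of ?s, optionally its type ?t, joined with
# the supertype ?u of ?t.
PATTERN = ("AND",
           ("OPT", ("T", "?s", "label", "?l"), ("T", "?s", "type", "?t")),
           ("T", "?t", "subClassOf", "?u"))

# Hypothetical graph: o1 has a type, o2 has none.
GRAPH = {
    ("o1", "label", "L1"), ("o1", "type", "C1"),
    ("o2", "label", "L2"),
    ("C1", "subClassOf", "D1"), ("C2", "subClassOf", "D2"),
}

ANSWERS = evaluate(PATTERN, GRAPH)

if __name__ == "__main__":
    for answer in ANSWERS:
        print(answer)
```

For o1 the optional part matches, ?t is bound to C1, and the join selects only the supertype of C1. For o2 the optional part fails, ?t stays unbound, and the mapping is compatible with every subClassOf triple, so o2 is paired with types it has no relation to.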
A second source of non-weakly-well-designed patterns is joins between an OPT subpattern and another pattern where the left argument of the OPT subpattern is empty. The following query illustrates this source:
Intuitively, query (16) computes the join between the answers to the pattern (?b, height, ?h) and those to (?b, city, ?c), provided the second set is nonempty; otherwise, the query just returns the answers to (?b, height, ?h). Hence, query (16) is equivalent to the following query in \(\mathcal {S}_{\text {wwd}}\):
We conclude that, while queries such as (16) may make sense in practice, they can be easily and intuitively restated using wwd-patterns.
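The behaviour described for query (16) can be traced through the standard semantics of OPT. The derivation below is a sketch under assumptions: it takes the pattern to have the shape A AND ({} OPT B) with A = (?b, height, ?h) and B = (?b, city, ?c), writes {} for the empty pattern and \(\mu_\emptyset\) for the empty mapping, and uses \(\setminus\) for the set of mappings on the left that are compatible with no mapping on the right. This notation is chosen here for illustration and may differ from the one used earlier in the paper.

```latex
% Requires amsmath. Assumed shape of query (16): A AND ({} OPT B).
\begin{align*}
[\![\{\}]\!]_G &= \{\mu_\emptyset\}, \\
[\![\{\} \mathbin{\mathrm{OPT}} B]\!]_G
  &= \bigl([\![\{\}]\!]_G \bowtie [\![B]\!]_G\bigr)
     \cup \bigl([\![\{\}]\!]_G \setminus [\![B]\!]_G\bigr) \\
  &= \begin{cases}
       [\![B]\!]_G & \text{if } [\![B]\!]_G \neq \emptyset, \\
       \{\mu_\emptyset\} & \text{otherwise,}
     \end{cases} \\
[\![A \mathbin{\mathrm{AND}} (\{\} \mathbin{\mathrm{OPT}} B)]\!]_G
  &= \begin{cases}
       [\![A]\!]_G \bowtie [\![B]\!]_G & \text{if } [\![B]\!]_G \neq \emptyset, \\
       [\![A]\!]_G & \text{otherwise.}
     \end{cases}
\end{align*}
```

The second step uses that \(\mu_\emptyset\) is compatible with every mapping: it joins with each element of the answer set of B when that set is nonempty, and it survives the difference only when that set is empty.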
Our final, and most interesting, source of real-life non-wwd-patterns is UNION in the right argument of OPT. For instance, consider the pattern
The pattern is quite intuitive and it is easy to imagine similar patterns being useful in various applications. However, the normalisation algorithm in [33], which “pushes” unions outside, converts (17) to a pattern that is inherently non-well-designed (due to the two occurrences of ?a in the third disjunct):
We believe that this behaviour is unavoidable in general as we expect query answering to become \({{\Pi }_2^p}\)-hard over patterns that contain UNION in the right argument of OPT. Yet, for certain classes of patterns, generalising (17), one may be able to obtain a coNP evaluation algorithm by accounting for UNION natively rather than relying on the normalisation in [33]. A detailed study of such patterns, however, is outside the scope of the present paper.
9 Conclusion and Future Work
In this paper, we introduced a new fragment of SPARQL patterns called weakly well-designed patterns. This fragment extends the widely studied well-designed fragment by allowing variables from the optional side of an OPT-subpattern that are not “guarded” by the mandatory side to occur in certain positions outside of the subpattern. We showed that queries with wwd-patterns enjoy the same low complexity of evaluation as well-designed queries but cover almost all real-life queries. Moreover, our fragment is the maximal coNP fragment that does not impose structural restrictions on basic patterns and filter conditions. We studied the expressive power of the fragment and the complexity of its query optimisation problems.
For future work, we want to extend wwd-patterns to allow for non-top-level occurrences of UNION and projection. As we have seen in the previous section, this promises to be a challenging task since a naive extension of our definitions to such constructs is likely to increase reasoning complexity. Also, we want to take into account features of SPARQL 1.1 [17] such as GRAPH, NOT EXISTS and property paths. Finally, we would like to implement our ideas in a prototype system and compare its performance with existing SPARQL engines.
References
Ahmetaj, S., Fischl, W., Pichler, R., Simkus, M., Skritek, S.: Towards reconciling SPARQL and certain answers. In: Gangemi, A., Leonardi, S., Panconesi, A. (eds.) Proceedings of the 24th International Conference on World Wide Web, WWW 2015, pp. 23–33. ACM (2015)
Angles, R., Gutierrez, C.: The expressive power of SPARQL. In: Sheth, A.P., Staab, S., Dean, M., Paolucci, M., Maynard, D., Finin, T.W., Thirunarayan, K. (eds.) ISWC 2008, LNCS, vol. 5318, pp. 114–129. Springer (2008)
Arenas, M., Conca, S., Pérez, J.: Counting beyond a Yottabyte, or how SPARQL 1.1 property paths will prevent adoption of the standard. In: Mille, A., Gandon, F.L., Misselis, J., Rabinovich, M., Staab, S. (eds.) Proceedings of the 21st World Wide Web Conference, WWW 2012, pp. 629–638. ACM (2012)
Arenas, M., Gottlob, G., Pieris, A.: Expressive languages for querying the semantic web. In: Hull, R., Grohe, M. (eds.) Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2014, pp. 14–26. ACM (2014)
Arenas, M., Pérez, J.: Querying Semantic Web Data with SPARQL. In: Lenzerini, M., Schwentick, T. (eds.) Proceedings of the 30th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2011, pp. 305–316. ACM (2011)
Arenas, M., Ugarte, M.: Designing a query language for RDF: marrying open and closed worlds. In: Milo, T., Tan, W. (eds.) Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2016, pp. 225–236. ACM (2016)
Arias Gallego, M., Fernández, J.D., Martínez-Prieto, M.A., de la Fuente, P.: An empirical study of real-world SPARQL queries. In: Proceedings of the 1st International Workshop on Usage Analysis and the Web of Data, USEWOD 2011. arXiv:1103.5043 (2011)
Barceló, P., Pichler, R., Skritek, S.: Efficient Evaluation and Approximation of Well-Designed Pattern Trees. In: Milo, T., Calvanese, D. (eds.) Proceedings of the 34th ACM Symposium on Principles of Database Systems, PODS 2015, pp. 131–144. ACM (2015)
Bischof, S., Krötzsch, M., Polleres, A., Rudolph, S.: Schema-agnostic query rewriting in SPARQL 1.1. In: Mika, P., Tudorache, T., Bernstein, A., Welty, C., Knoblock, C.A., Vrandecic, D., Groth, P.T., Noy, N.F., Janowicz, K., Goble, C.A. (eds.) ISWC 2014, Part I, LNCS, vol. 8796, pp. 584–600. Springer (2014)
Buil Aranda, C., Arenas, M., Corcho, Ó., Simperl, E.P.B.: Semantics and optimization of the SPARQL 1.1 federation extension. In: Antoniou, G., Grobelnik, M., Parsia, B., Plexousakis, D., Leenheer, P.D., Pan, J.Z. (eds.) ESWC 2011, Part II, LNCS, vol. 6644, pp. 1–15. Springer (2011)
Buil Aranda, C., Polleres, A., Umbrich, J., Knoblock, C.A., Vrandecic, D.: Strategies for executing federated queries in SPARQL 1.1. In: Mika, P., Tudorache, T., Bernstein, A., Welty, C., Groth, P.T., Noy, N.F., Janowicz, K., Goble, C.A. (eds.) ISWC 2014, Part II, LNCS, vol. 8797, pp. 390–405. Springer (2014)
Chekol, M.W., Euzenat, J., Genevès, P., Layaïda, N.: SPARQL query containment under RDFS entailment regime. In: Gramlich, B., Miller, D., Sattler, U. (eds.) IJCAR 2012, LNCS, vol. 7364, pp. 134–148. Springer (2012)
Chekol, M.W., Euzenat, J., Genevès, P., Layaïda, N.: SPARQL query containment under SHI Axioms. In: Hoffmann, J., Selman, B. (eds.) Proceedings of the 26th AAAI Conference on Artificial Intelligence, AAAI 2012, pp. 10–16. AAAI Press (2012)
Cyganiak, R., Wood, D., Lanthaler, M.: RDF 1.1 concepts and abstract syntax. W3C recommendation, W3C. http://www.w3.org/TR/rdf11concepts/ (2014)
Geerts, F., Unger, T., Karvounarakis, G., Fundulaki, I., Christophides, V.: Algebraic structures for capturing the provenance of SPARQL queries. J. ACM 63(1), 7:1–7:63 (2016)
Halpin, H., Cheney, J.: Dynamic Provenance for SPARQL Updates. In: Mika, P., Tudorache, T., Bernstein, A., Welty, C., Knoblock, C.A., Vrandecic, D., Groth, P.T., Noy, N.F., Janowicz, K., Goble, C.A. (eds.) ISWC 2014, Part I, LNCS, vol. 8796, pp. 425–440. Springer (2014)
Harris, S., Seaborne, A.: SPARQL 1.1 query language. W3C recommendation, W3C. http://www.w3.org/TR/sparql11query/ (2013)
Hayes, P.J., Patel-Schneider, P.F.: RDF 1.1 semantics. W3C recommendation, W3C. http://www.w3.org/TR/rdf11mt/ (2014)
Kaminski, M., Kostylev, E.V.: Beyond well-designed SPARQL. In: Martens, W., Zeume, T. (eds.) Proceedings of the 19th International Conference on Database Theory, ICDT 2016, LIPIcs, vol. 48, pp. 5:1–5:18. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2016)
Kaminski, M., Kostylev, E.V., Cuenca Grau, B.: Semantics and expressive power of subqueries and aggregates in SPARQL 1.1. In: Bourdeau, J., Hendler, J., Nkambou, R., Horrocks, I., Zhao, B.Y. (eds.) Proceedings of the 25th International Conference on World Wide Web, WWW 2016, pp. 227–238. ACM (2016)
Kontchakov, R., Kostylev, E.V.: On expressibility of non-monotone operators in SPARQL. In: Baral, C., Delgrande, J.P., Wolter, F. (eds.) Proceedings of the 15th International Conference on Principles of Knowledge Representation and Reasoning, KR 2016, pp. 369–379. AAAI Press (2016)
Kontchakov, R., Rezk, M., Rodríguez-Muro, M., Xiao, G., Zakharyaschev, M.: Answering SPARQL queries over databases under OWL 2 QL entailment regime. In: Mika, P., Tudorache, T., Bernstein, A., Welty, C., Knoblock, C.A., Vrandecic, D., Groth, P.T., Noy, N.F., Janowicz, K., Goble, C.A. (eds.) ISWC 2014, Part I, LNCS, vol. 8796, pp. 552–567. Springer (2014)
Kostylev, E.V., Cuenca Grau, B.: On the semantics of SPARQL queries with optional matching under entailment regimes. In: Mika, P., Tudorache, T., Bernstein, A., Welty, C., Knoblock, C.A., Vrandecic, D., Groth, P.T., Noy, N.F., Janowicz, K., Goble, C.A. (eds.) ISWC 2014, Part II, LNCS, vol. 8797, pp. 374–389. Springer (2014)
Kostylev, E.V., Reutter, J.L., Romero, M., Vrgoc, D., Staab, S.: SPARQL with property paths. In: Arenas, M., Corcho, Ó., Simperl, E., Strohmaier, M., d’Aquin, M., Srinivas, K., Groth, P.T., Dumontier, M., Heflin, J., Thirunarayan, K. (eds.) ISWC 2015, Part I, LNCS, vol. 9366, pp. 3–18. Springer (2015)
Kostylev, E.V., Reutter, J.L., Ugarte, M.: CONSTRUCT Queries in SPARQL. In: Arenas, M., Ugarte, M. (eds.) Proceedings of the 18th International Conference on Database Theory, ICDT 2015, LIPIcs, vol. 31, pp. 212–229. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2015)
Lehmann, J., Isele, R., Jakob, M., Jentzsch, A., Kontokostas, D., Mendes, P.N., Hellmann, S., Morsey, M., van Kleef, P., Auer, S., Bizer, C.: DBpedia - a large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web 6(2), 167–195 (2015)
Letelier, A., Pérez, J., Pichler, R., Skritek, S.: Static analysis and optimization of semantic web queries. ACM Trans. Database Syst. 38(4), 25 (2013)
Losemann, K., Martens, W.: The complexity of evaluating path expressions in SPARQL. In: Benedikt, M., Krötzsch, M., Lenzerini, M. (eds.) Proceedings of the 31st ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2012, pp. 101–112. ACM (2012)
Luczak-Rösch, M., Aljaloud, S., Berendt, B., Hollink, L.: USEWOD 2016 research dataset. doi:10.5258/SOTON/385344 (2016)
Luczak-Rösch, M., Berendt, B., Hollink, L.: USEWOD 2015 research dataset. doi:10.5258/SOTON/379407 (2015)
Manola, F., Miller, E., McBride, B.: RDF 1.1 primer. W3C working group note, W3C. http://www.w3.org/TR/rdf11primer/ (2014)
Pérez, J., Arenas, M., Gutierrez, C.: Semantics and complexity of SPARQL. In: Cruz, I.F., Decker, S., Allemang, D., Preist, C., Schwabe, D., Mika, P., Uschold, M., Aroyo, L. (eds.) ISWC 2006, LNCS, vol. 4273, pp. 30–43. Springer (2006)
Pérez, J., Arenas, M., Gutierrez, C.: Semantics and complexity of SPARQL. ACM Trans. Database Syst. 34(3), 16:1–16:45 (2009)
Picalausa, F., Vansummeren, S.: What are real SPARQL queries like? In: Virgilio, R.D., Giunchiglia, F., Tanca, L. (eds.) Proceedings of the 3rd International Workshop on Semantic Web Information Management, SWIM 2011, pp. 7:1–7:6. ACM (2011)
Pichler, R., Skritek, S.: Containment and equivalence of well-designed SPARQL. In: Hull, R., Grohe, M. (eds.) Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2014, pp. 39–50. ACM (2014)
Polleres, A., Wallner, J.P.: On the relation between SPARQL 1.1 and answer set programming. J. Appl. Non-Classical Log. 23(1–2), 159–212 (2013)
Prud’hommeaux, E., Seaborne, A.: SPARQL query language for RDF. W3C recommendation, W3C. http://www.w3.org/TR/rdfsparqlquery/ (2008)
Schmidt, M., Meier, M., Lausen, G.: Foundations of SPARQL query optimization. In: Segoufin, L. (ed.) Proceedings of the 13th International Conference on Database Theory, ICDT 2010, pp. 4–33. ACM (2010)
Zhang, X., Van den Bussche, J.: On the power of SPARQL in expressing navigational queries. Comput. J. 58(11), 2841–2851 (2015)
Zhang, X., Van den Bussche, J.: On the primitivity of operators in SPARQL. Inf. Process. Lett. 114(9), 480–485 (2014)
Zhang, X., Van den Bussche, J., Picalausa, F.: On the satisfiability problem for SPARQL patterns. J. Artif. Intell. Res. (JAIR) 56, 403–428 (2016)
Acknowledgements
This work was supported by the EPSRC projects Score!, DBOnto, and ED3.
This article is part of the Topical Collection on Special Issue on Database Theory
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Kaminski, M., Kostylev, E.V. Complexity and Expressive Power of Weakly WellDesigned SPARQL. Theory Comput Syst 62, 772–809 (2018). https://doi.org/10.1007/s0022401798029
DOI: https://doi.org/10.1007/s0022401798029
Keywords
 RDF query languages
 SPARQL
 Optional matching