Complexity and Expressive Power of Weakly Well-Designed SPARQL

SPARQL is the standard query language for RDF data. The distinctive feature of SPARQL is the OPTIONAL operator, which allows for partial answers when complete answers are not available due to lack of information. However, optional matching is computationally expensive—query answering is PSPACE-complete. The well-designed fragment of SPARQL achieves much better computational properties by restricting the use of optional matching—query answering becomes coNP-complete. On the downside, well-designed SPARQL captures far from all real-life queries—in fact, only about half of the queries over DBpedia that use OPTIONAL are well-designed. In the present paper, we study queries outside of well-designed SPARQL. We introduce the class of weakly well-designed queries that subsumes well-designed queries and includes most common meaningful non-well-designed queries: our analysis shows that the new fragment captures over 99% of DBpedia queries with OPTIONAL. At the same time, query answering for weakly well-designed SPARQL remains coNP-complete, and our fragment is in a certain sense maximal for this complexity. We show that the fragment’s expressive power is strictly in-between well-designed and full SPARQL. Finally, we provide an intuitive normal form for weakly well-designed queries and study the complexity of containment and equivalence.


Introduction
The Resource Description Framework (RDF) [14,18,31] is the W3C standard for representing linked data on the Web. RDF models information in terms of labelled graphs consisting of triples of resource identifiers (IRIs). The first and last IRIs in such a triple, called subject and object, represent entity resources, while the middle IRI, called predicate, represents a relation between the two entities.
SPARQL [17,37] is the default query language for RDF graphs. First standardised in 2008 [37], SPARQL is now recognised as a key technology for the Semantic Web. This is witnessed by the recent adoption of a new version of the standard, SPARQL 1.1 [17], as well as by active development of SPARQL query engines in academia and industry, for instance, as part of the systems AllegroGraph (http://franz.com/agraph/allegrograph/), Apache Jena (http://jena.apache.org), RDF4J (http://rdf4j.org), or OpenLink Virtuoso (http://virtuoso.openlinksw.com).
A distinctive feature of SPARQL as compared to SQL is the OPTIONAL operator (abbreviated as OPT in this paper). This operator was introduced to "not reject (solutions) because some part of the query pattern does not match" [37]. For instance, consider the SPARQL query

SELECT ?i, ?n WHERE (?i, rdf:type, foaf:person) OPT (?i, foaf:name, ?n), (1)

which retrieves all person IDs from the graph together with their names. Names, however, are optional: if the graph does not contain information about the name of a person, the person ID is still retrieved but the variable ?n is left undefined in the answer. For instance, query (1) has two answers over the graph G in Fig. 1a, where the second answer is partial (see Fig. 1b). However, if we extend G with a triple supplying a name for P2, the second answer will include this name.
The OPT operator accounts in a natural way for the open world assumption and the fundamental incompleteness of the Web. However, evaluating queries that use OPT is computationally expensive: Pérez et al. [33] showed PSPACE-completeness of SPARQL query evaluation, and Schmidt et al. [38] refined this result by proving PSPACE-hardness even for queries using no operators besides OPT. This is not surprising given that SPARQL queries are equivalent in expressive power to first-order logic queries, and translations in both directions can be done in polynomial time [2,25,36].
This spurred a search for restrictions on the use of OPT that would ensure lower complexity of query evaluation. It was also recognised that queries that are difficult to evaluate are often unintuitive. For instance, they may produce less specified answers (i.e., answers with fewer bound variables) as the graph over which they are evaluated grows larger. Pérez et al. [33] introduced the well-designed fragment of SPARQL queries by imposing a syntactic restriction on the use of variables in OPT-expressions. Roughly speaking, each variable in the optional (i.e., right) argument of an OPT-expression should either appear in the mandatory (i.e., left) argument or be globally fresh for the query, i.e., appear nowhere outside of the argument. Well-designed queries have lower complexity of query evaluation: the problem is coNP-complete (provided all the variables in the query are selected). Moreover, such queries behave more intuitively than arbitrary SPARQL queries; in particular, they enjoy the monotonicity property that we observed for query (1): each partial answer over a graph can potentially be extended to undefined variables if the graph is completed with the missing information, and the more information we have, the more specified the answers are. Well-designed queries can be efficiently transformed to an intuitive normal form allowing for a transparent graphical representation of queries as trees [27,35]. Hence, many recent studies concentrate partially [23,25,27,40,41] or entirely [1,8,35] on well-designed queries.
The success of well-designed queries may give the impression that non-well-designed SPARQL queries are just a useless side effect of the early specification. But is this impression justified by the use of SPARQL in practice? To answer this question, a comprehensive analysis of real-life queries is required. We are aware of two works that analyse the distribution of operators in SPARQL queries asked over DBpedia [7,34]. Both studies show that OPT is used in a non-negligible number of practical queries. However, only Picalausa and Vansummeren [34] go further and analyse how many of these queries are well-designed, and the result is quite interesting: well-designed queries make up only about half of all queries with OPT. In other words, well-designed queries are common, but far from the only kind used in practice.
The main goal of this paper is to investigate SPARQL queries beyond the well-designed fragment. We wanted to see if the well-designedness condition could be extended so as to include most practical queries while preserving good computational properties. The main result of our study is very positive: we identified a new fragment of SPARQL queries, called weakly well-designed queries, that covers over 99% of queries over DBpedia and has the same complexity of query evaluation as the well-designed fragment. We also show that our fragment is in a sense maximal for this complexity.
We next describe our results and techniques in more detail. Our first step was to identify typical real-life queries that are not well-designed. We analysed DBpedia query logs in recent USEWOD research datasets [29,30] and found two interesting types of non-well-designed queries. The first type is exemplified by the following query:

SELECT ?i, ?n WHERE ((?i, rdf:type, foaf:person) OPT (?i, foaf:name, ?n)) OPT (?i, vcard:name, ?n). (2)

This query is clearly not well-designed because variable ?n, binding the name of a person, appears in two different unrelated optional parts. Let us analyse answers to this query over different graphs. On graph G in Fig. 1a the result is exactly the same as for query (1), shown in Fig. 1b, simply because the IRI vcard:name is not present in G, and so cannot be matched against the second optional part of the query. Similarly, on graph G′ in Fig. 1c, where the source of the name and the name itself are different, the result is as in Fig. 1d. In this case, the first optional part in the query does not match anything in the graph, so the variable ?n is left unbound at this point; then the second optional part is matched, and the variable is assigned the name from vcard. More interestingly, query (2) evaluated over the graph G ∪ G′ once again yields the result in Fig. 1b. Indeed, in this case, the first optional part has a match again and ?n is assigned the value Ana; then, this variable is already bound and there is no match for the second optional part that agrees with this value, meaning that the alternative vcard name is disregarded by the query. To summarise, query (2) is once again looking for person IDs and, optionally, their names. Now, however, names are collected from two different sources, foaf and vcard, where the first source is given preference over the second (maybe because it is considered more reliable or more informative, or for some other reason). In other words, if we know the foaf name of a person, it is returned as part of the answer regardless of their vcard name; however, if there is no foaf name, then the vcard name is also acceptable and should be returned; variable ?n is left unbound only if the name cannot be extracted from either source.
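The preference behaviour of query (2) can be replayed in a few lines of Python. This is only an illustrative sketch: the figures are not reproduced here, so the concrete triples (including the vcard name "Anne") are assumptions standing in for the graphs of Fig. 1a, Fig. 1c and their union; mappings are modelled as plain dictionaries, and `left_join`, `persons`, `names` and `query2` are hypothetical helper names.

```python
def compatible(m1, m2):
    """Two mappings are compatible if they agree on every shared variable."""
    return all(m2[x] == v for x, v in m1.items() if x in m2)

def left_join(left, right):
    """OPT: extend each left mapping by its compatible right mappings, else keep it."""
    out = []
    for a in left:
        extensions = [{**a, **b} for b in right if compatible(a, b)]
        out.extend(extensions if extensions else [a])
    return out

def persons(graph):
    """Solutions of the basic pattern (?i, rdf:type, foaf:person)."""
    return [{"?i": s} for (s, p, o) in sorted(graph)
            if (p, o) == ("rdf:type", "foaf:person")]

def names(graph, predicate):
    """Solutions of the basic pattern (?i, predicate, ?n)."""
    return [{"?i": s, "?n": o} for (s, p, o) in sorted(graph) if p == predicate]

def query2(graph):
    """((person) OPT (foaf name)) OPT (vcard name), as in query (2)."""
    with_foaf = left_join(persons(graph), names(graph, "foaf:name"))
    return left_join(with_foaf, names(graph, "vcard:name"))

# Assumed example data: two persons, one name from each source.
g_foaf = {("P1", "rdf:type", "foaf:person"), ("P2", "rdf:type", "foaf:person"),
          ("P1", "foaf:name", "Ana")}
g_vcard = {("P1", "rdf:type", "foaf:person"), ("P2", "rdf:type", "foaf:person"),
           ("P1", "vcard:name", "Anne")}

print(query2(g_foaf))            # foaf name is used
print(query2(g_vcard))           # vcard name is used as a fallback
print(query2(g_foaf | g_vcard))  # foaf name wins over the vcard name
```

On the merged graph the second OPT finds no vcard mapping compatible with the already bound ?n, so the foaf name is kept, which is exactly the preference reading described above.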
Of course, preference patterns encountered in real-life queries are often more complex. Still, we will see that in most cases they do not increase the complexity of query evaluation.
The second type of non-well-designed queries is exemplified by query (3), which uses FILTER, a standard SPARQL operator that admits only answers conforming to a specified constraint. Again, this query is not well-designed because the FILTER constraint mentions the variable ?n, which occurs in the optional part of the query but not in the mandatory part. However, the intention of the query is quite clear: it searches for people whose names are not known to be Ana, including people whose names are unknown.
This use of FILTER is in fact very common in real-life queries. Moreover, it is intuitive as long as FILTER is essentially the outermost operator in the query, as it is in our example. We will see that in all such cases FILTER cannot lead to an increase in complexity.
Having isolated these typical sources of non-well-designedness, we identify a new fragment of SPARQL that (a) includes all queries of the above two types, (b) subsumes well-designed queries, and (c) has the same complexity of query evaluation as well-designed queries. We call such queries weakly well-designed. They are the maximal fragment without structural restrictions on conjunctive blocks and filter conditions that has the above properties. Our analysis shows that more than 99% of DBpedia queries with OPT are weakly well-designed.
Besides low complexity of query evaluation, we establish a few more useful properties of weakly well-designed queries, which are summarised in the following outline of the paper. After introducing the syntax and semantics of SPARQL in Section 2, we formally define our new fragment in Section 3. In Section 4, we show that, similarly to the well-designed case, weakly well-designed queries can be transformed to an intuitive normal form, which allows for a natural graphical representation as constraint pattern trees. Using this representation, in Section 5, we formally show that the step from well-designed to weakly well-designed queries does not increase complexity of query evaluation; minimal relaxations of weak well-designedness, however, already lead to a complexity jump. In Section 6, we compare the expressive power of our fragment (and its extensions with additional operators) with well-designed queries and unrestricted SPARQL queries; in all cases, we show that the expressivity of weakly well-designed queries lies strictly in-between well-designed and unrestricted queries. In Section 7, we study static analysis problems for weakly well-designed queries and establish Π^p_2-completeness of equivalence and containment. Finally, in Section 8, we detail our analysis of DBpedia logs.
This article significantly extends the conference paper [19]. Besides providing full proofs of our technical claims, we have extended the analysis section and updated the evaluation to use more recent datasets. Furthermore, we have removed the erroneous claim that queries over unions of weakly well-designed patterns have the same expressive power as unrestricted SPARQL queries; on the contrary, we show that the former are strictly less expressive than the latter.

SPARQL Query Language
We begin by formally introducing the syntax and semantics of SPARQL that we adopt in this paper. Our formal setup mostly follows [33], which has some differences from the W3C specification [17,37]; in particular, we use two-placed OPT and two-valued FILTER (conditional OPT and errors in FILTER evaluation as in the standard are expressible in our formalisation [2,21]), do not consider blank nodes (their presence in RDF graphs would not change any of our results), and adopt set semantics, leaving multiset answers for future work.

Throughout, I denotes the set of IRIs and X the set of variables. Filter constraints are expressions of the following forms:
- ⊤ (the trivially true constraint), ?x = u, ?x = ?y, or bound(?x) for ?x, ?y in X and u ∈ I (these constraints are called atomic),
- ¬R1, R1 ∧ R2, or R1 ∨ R2 for filter constraints R1 and R2.

RDF Graphs
A basic pattern is a possibly empty set of triples from (I ∪ X) × (I ∪ X) × (I ∪ X) (to avoid notational clutter, in examples we will often omit braces when writing singleton basic patterns, e.g., we will write (?x, u, ?y) instead of {(?x, u, ?y)}). Then, SPARQL (graph) patterns P are defined by the grammar

$$P ::= B \mid (P\ \mathrm{AND}\ P) \mid (P\ \mathrm{OPT}\ P) \mid (P\ \mathrm{UNION}\ P) \mid (P\ \mathrm{FILTER}\ R),$$

where B ranges over basic patterns and R over filter constraints. Additionally, we require all filter constraints to be safe, that is, vars(R) ⊆ vars(P) for every pattern (P FILTER R), where vars(S) is the set of all variables in S (which can be a pattern, constraint, etc.). When needed, we distinguish between patterns by their top-level operator; e.g., we write OPT-pattern or FILTER-pattern.
We write U for the set of all patterns. We also distinguish the fragment P of U that consists of all UNION-free patterns, that is, patterns that do not use the UNION operator.
Projection is realised in SPARQL by means of queries with select result form, or queries for short, which are expressions of the form

$$\mathrm{SELECT}\ X\ \mathrm{WHERE}\ P, \qquad (4)$$

where X is a set of variables and P is a graph pattern. We write S for the set of all queries. The set of all triples in basic patterns of a query Q is denoted triples(Q). Note that every pattern P can be seen as a query of the form (4) where X = vars(P). Hence, all definitions that refer to "queries" implicitly extend to patterns in the obvious way.

SPARQL Semantics
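The semantics of graph patterns is defined in terms of mappings, that is, partial functions from variables to IRIs. The domain $\mathrm{dom}(\mu)$ of a mapping $\mu$ is the set of variables on which $\mu$ is defined. Two mappings $\mu_1$ and $\mu_2$ are compatible (written $\mu_1 \sim \mu_2$) if $\mu_1(?x) = \mu_2(?x)$ for all variables $?x \in \mathrm{dom}(\mu_1) \cap \mathrm{dom}(\mu_2)$. If $\mu_1 \sim \mu_2$, then $\mu_1 \cup \mu_2$ constitutes a mapping with domain $\mathrm{dom}(\mu_1) \cup \mathrm{dom}(\mu_2)$ that coincides with $\mu_1$ on $\mathrm{dom}(\mu_1)$ and with $\mu_2$ on $\mathrm{dom}(\mu_2)$. Given two sets of mappings $\Omega_1$ and $\Omega_2$, we define their join, union and difference as follows:

$$\Omega_1 \bowtie \Omega_2 = \{\mu_1 \cup \mu_2 \mid \mu_1 \in \Omega_1,\ \mu_2 \in \Omega_2,\ \mu_1 \sim \mu_2\},$$
$$\Omega_1 \cup \Omega_2 = \{\mu \mid \mu \in \Omega_1 \text{ or } \mu \in \Omega_2\},$$
$$\Omega_1 \setminus \Omega_2 = \{\mu_1 \mid \mu_1 \in \Omega_1,\ \mu_1 \not\sim \mu_2 \text{ for all } \mu_2 \in \Omega_2\}.$$

Based on these, the left outer join operation is defined as

$$\Omega_1 ⟕ \Omega_2 = (\Omega_1 \bowtie \Omega_2) \cup (\Omega_1 \setminus \Omega_2).$$

Given a graph $G$, the evaluation $\llbracket P \rrbracket_G$ of a graph pattern $P$ over $G$ is defined as follows:
1. $\llbracket B \rrbracket_G = \{\mu : \mathrm{vars}(B) \to I \mid \mu(B) \subseteq G\}$ for a basic pattern $B$,
2. $\llbracket P_1\ \mathrm{AND}\ P_2 \rrbracket_G = \llbracket P_1 \rrbracket_G \bowtie \llbracket P_2 \rrbracket_G$,
3. $\llbracket P_1\ \mathrm{OPT}\ P_2 \rrbracket_G = \llbracket P_1 \rrbracket_G ⟕ \llbracket P_2 \rrbracket_G$,
4. $\llbracket P_1\ \mathrm{UNION}\ P_2 \rrbracket_G = \llbracket P_1 \rrbracket_G \cup \llbracket P_2 \rrbracket_G$,
5. $\llbracket P\ \mathrm{FILTER}\ R \rrbracket_G = \{\mu \mid \mu \in \llbracket P \rrbracket_G \text{ and } \mu \models R\}$,

where $\mu$ satisfies a filter constraint $R$, denoted by $\mu \models R$, if one of the following holds:
- $R$ is $\top$;
- $R$ is $?x = u$, $?x \in \mathrm{dom}(\mu)$, and $\mu(?x) = u$;
- $R$ is $?x = ?y$, $\{?x, ?y\} \subseteq \mathrm{dom}(\mu)$, and $\mu(?x) = \mu(?y)$;
- $R$ is $\mathrm{bound}(?x)$ and $?x \in \mathrm{dom}(\mu)$;
- $R$ is a Boolean combination of filter constraints evaluating to true under the usual interpretation of $\neg$, $\wedge$, and $\vee$.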
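Let $\mu|_X$ be the projection of a mapping $\mu$ to variables $X$, that is, $\mu|_X(?x) = \mu(?x)$ for $?x \in X \cap \mathrm{dom}(\mu)$, and $\mu|_X$ is undefined on all other variables. The evaluation $\llbracket Q \rrbracket_G$ of a query $Q$ of the form (4) is the set of all mappings $\mu|_X$ such that $\mu \in \llbracket P \rrbracket_G$.
Finally, a solution to a query (or pattern) $Q$ over $G$ is a mapping $\mu$ such that $\mu \in \llbracket Q \rrbracket_G$.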
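The set-based semantics of this section translates almost literally into code. The following Python sketch is one possible reading of it, not an implementation taken from the paper: patterns are encoded as nested tuples such as `("OPT", P1, P2)`, filter constraints as tuples such as `("eq", "?x", "u")`, and all function names are hypothetical. The example graph and the filter modelled on the informal description of query (3) are assumptions as well.

```python
def is_var(term):
    return term.startswith("?")

def compatible(m1, m2):
    return all(m2[x] == v for x, v in m1.items() if x in m2)

def dedup(mappings):
    """Set semantics: drop repeated mappings while keeping a stable order."""
    seen, out = set(), []
    for m in mappings:
        key = frozenset(m.items())
        if key not in seen:
            seen.add(key)
            out.append(m)
    return out

def join(o1, o2):
    return dedup([{**a, **b} for a in o1 for b in o2 if compatible(a, b)])

def diff(o1, o2):
    return [a for a in o1 if not any(compatible(a, b) for b in o2)]

def match_basic(triples, graph):
    """All mappings sending every triple of the basic pattern into the graph."""
    sols = [{}]
    for pattern in triples:
        extended = []
        for m in sols:
            for fact in graph:
                ext = dict(m)
                if all(ext.setdefault(t, v) == v if is_var(t) else t == v
                       for t, v in zip(pattern, fact)):
                    extended.append(ext)
        sols = extended
    return dedup(sols)

def satisfies(m, r):
    op = r[0]
    if op == "true":
        return True
    if op == "bound":
        return r[1] in m
    if op == "eq":    # ?x = u
        return r[1] in m and m[r[1]] == r[2]
    if op == "eqv":   # ?x = ?y
        return r[1] in m and r[2] in m and m[r[1]] == m[r[2]]
    if op == "not":
        return not satisfies(m, r[1])
    if op == "and":
        return satisfies(m, r[1]) and satisfies(m, r[2])
    if op == "or":
        return satisfies(m, r[1]) or satisfies(m, r[2])
    raise ValueError(f"unknown constraint {op!r}")

def evaluate(p, graph):
    op = p[0]
    if op == "BGP":
        return match_basic(p[1], graph)
    if op == "AND":
        return join(evaluate(p[1], graph), evaluate(p[2], graph))
    if op == "OPT":   # left outer join = join plus difference
        left, right = evaluate(p[1], graph), evaluate(p[2], graph)
        return dedup(join(left, right) + diff(left, right))
    if op == "UNION":
        return dedup(evaluate(p[1], graph) + evaluate(p[2], graph))
    if op == "FILTER":
        return [m for m in evaluate(p[1], graph) if satisfies(m, p[2])]
    raise ValueError(f"unknown operator {op!r}")

def select(variables, p, graph):
    """Projection: restrict every solution of p to the selected variables."""
    return dedup([{x: v for x, v in m.items() if x in variables}
                  for m in evaluate(p, graph)])

# Assumed example data.
graph = [("P1", "rdf:type", "foaf:person"), ("P1", "foaf:name", "Ana"),
         ("P2", "rdf:type", "foaf:person"),
         ("P3", "rdf:type", "foaf:person"), ("P3", "foaf:name", "Bob")]
q1 = ("OPT", ("BGP", [("?i", "rdf:type", "foaf:person")]),
             ("BGP", [("?i", "foaf:name", "?n")]))
not_ana = ("FILTER", q1, ("not", ("eq", "?n", "Ana")))
print(select({"?i", "?n"}, q1, graph))
print(select({"?i", "?n"}, not_ana, graph))
```

Because `?x = u` is false on unbound variables in the two-valued reading used here, the negated constraint keeps the person without a name, which matches the "not known to be Ana" intuition.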

Weakly Well-Designed Patterns
We begin by recalling the notion of well-designed patterns and then formulate our generalisation. For now, we focus on the fragment P of UNION-free patterns (also known as the AND-OPT-FILTER fragment of SPARQL), leaving the operators UNION and SELECT for later sections.
Note that a given pattern can occur more than once within a larger pattern. In what follows we will sometimes need to distinguish between a (sub-)pattern $P'$ as a possibly repeated building block of another pattern $P$ and its occurrences in $P$, that is, unique subtrees in the parse tree. Then, the left (right) argument of an occurrence $i$ is the subtree rooted in the left (right) child of the root of $i$ in the parse tree, and an occurrence $i$ is inside an occurrence $j$ if the root of $i$ is a descendant of the root of $j$.

Definition 1 (Pérez et al. [33]) A pattern $P$ from $\mathcal{P}$ is well-designed (or wd-pattern, for short) if for every occurrence $i$ of an OPT-pattern $P_1\ \mathrm{OPT}\ P_2$ in $P$ the variables from $\mathrm{vars}(P_2) \setminus \mathrm{vars}(P_1)$ occur in $P$ only inside (the labels of) $i$.
We write $\mathcal{P}_{wd}$ for the fragment of wd-patterns. Such patterns comply with the basic intuition for optional matching in SPARQL: "do not reject (solutions) because some part of the query pattern does not match" [37]; indeed, our canonical use case (1) is clearly well-designed. Evaluation of wd-patterns, that is, checking whether $\mu \in \llbracket P \rrbracket_G$ for a mapping $\mu$, graph $G$ and pattern $P \in \mathcal{P}_{wd}$, is coNP-complete (in combined complexity), as opposed to PSPACE-completeness for $\mathcal{P}$ [33,38]. The high complexity of unrestricted patterns is partially due to the fact that unrestricted combinations of OPT and FILTER can express nesting of the difference operator DIFF with semantics $\llbracket P_1\ \mathrm{DIFF}\ P_2 \rrbracket_G = \llbracket P_1 \rrbracket_G \setminus \llbracket P_2 \rrbracket_G$ (unless $P_1$ or $P_2$ are empty basic patterns, see [21] for details):

$$P_1\ \mathrm{DIFF}\ P_2 \equiv (P_1\ \mathrm{OPT}\ (P_2\ \mathrm{AND}\ (?x, ?y, ?z)))\ \mathrm{FILTER}\ \neg\mathrm{bound}(?x), \qquad (5)$$

where $?x$, $?y$ and $?z$ do not occur in $\mathrm{vars}(P_1) \cup \mathrm{vars}(P_2)$. This property is well-known [2,21,33], and has usually been believed to be an important source of non-well-designed patterns in practice. We challenge this belief by giving a different answer to the question of what the prevalent structure of real-life queries beyond the well-designed fragment is. This question is not just of theoretical interest: as previous studies [34] show (and our analysis confirms), about half of the queries with OPT asked over DBpedia are not well-designed. Next we discuss two sources of non-well-designedness in patterns as revealed by the example queries (2) and (3) in the introduction, one based on OPT and the other on FILTER.

Source 1. There are two substantially different ways of nesting the OPT operator in patterns:

$$P_1\ \mathrm{OPT}\ (P_2\ \mathrm{OPT}\ P_3), \qquad \text{(Opt-R)}$$
$$(P_1\ \mathrm{OPT}\ P_2)\ \mathrm{OPT}\ P_3. \qquad \text{(Opt-L)}$$

Non-well-designed nesting of type (Opt-R) is responsible for the PSPACE-hardness of query evaluation [33,38]. Moreover, such nesting is not very intuitive unless well-designed. On the contrary, as we saw in the introduction, non-well-designed nesting of type (Opt-L) can be used for prioritising some parts of patterns over others, and is indeed used in real life. As we will see later, nesting of type (Opt-L) cannot lead to high complexity of evaluation.

Source 2. Well-designedness can be violated by using "dangerous" variables from the right argument of OPT in filter constraints. In particular, patterns of the form $(P_1\ \mathrm{OPT}\ P_2)\ \mathrm{FILTER}\ R$ with $R$ using a variable from $\mathrm{vars}(P_2) \setminus \mathrm{vars}(P_1)$ are not well-designed, but rather frequent in practice. However, such patterns almost never occur inside the right argument of other OPT-patterns. We will see that if we restrict the usage of such filters to the "top level", we preserve the good computational properties of wd-patterns.
Motivated by these observations, we considerably generalise the notion of wd-patterns to allow for useful queries like (2) and (3) while retaining important properties of such patterns. We start with two auxiliary notions.

Definition 2 Given a pattern $P$, an occurrence $i_1$ in $P$ dominates an occurrence $i_2$ if there exists an occurrence $j$ of an OPT-pattern such that $i_1$ is inside the left argument of $j$ and $i_2$ is inside the right argument.

Definition 3
An occurrence i of a FILTER-pattern P FILTER R in P is top-level if there is no occurrence j of an OPT-pattern such that i is inside the right argument of j .
We are ready to give the main definition of this paper.
Definition 4 A pattern $P \in \mathcal{P}$ is weakly well-designed (or wwd-pattern, for short) if, for each occurrence $i$ of an OPT-subpattern $P_1\ \mathrm{OPT}\ P_2$, the variables in $\mathrm{vars}(P_2) \setminus \mathrm{vars}(P_1)$ appear outside $i$ only in
- subpatterns whose occurrences are dominated by $i$, and
- constraints of top-level occurrences of FILTER-patterns.
We write $\mathcal{P}_{wwd}$ for the fragment of wwd-patterns. They extend wd-patterns by allowing variables from the right argument of an OPT-subpattern that are not "guarded" by the left argument to appear in certain positions outside of the subpattern. Note that the patterns of queries (2) and (3) are wwd-patterns. Also, patterns which allow only for OPT nesting of type (Opt-L) are always weakly well-designed, as is the pattern on the right-hand side of (5), which expresses DIFF. However, patterns that have subpatterns of the latter form in the right argument of OPT are not weakly well-designed. Next we give a few more examples.
Pattern (6) is not well-designed because of variable ?z, but is weakly well-designed since the occurrence of (?y, c, ?z) dominates (?x, d, ?z). However, the similar pattern (7) is not weakly well-designed because the occurrence of the inner OPT-pattern with the second occurrence of ?z does not dominate the first. Pattern (8) is weakly well-designed since the FILTER-pattern (which is not dominated by the inner OPT-pattern) is top-level, but pattern (9) is not, because of variable ?w in a non-top-level FILTER.

Proposition 1
Checking whether a UNION-free pattern $P$ belongs to the fragment $\mathcal{P}_{wwd}$ can be done in time $O(|P|^2)$, where $|P|$ is the length of the string representation of $P$.
Proof First note that a UNION-free pattern $P$ is weakly well-designed if and only if the pattern rm_toplevel_filters($P$), which is obtained from $P$ by removing all top-level occurrences of filters, is weakly well-designed. The operation rm_toplevel_filters can be implemented in linear time by the recursive procedure in Fig. 3a.
Next consider the recursive procedure is_wwd in Fig. 3b, where sort($S$) denotes a sorted, repetition-free list representation of a set $S$. Given a UNION-free pattern $P$ without top-level filters, it is easily seen that is_wwd($P$) returns a tuple of the form (true, vs, ws) if and only if $P$ is weakly well-designed, where ws is the sorted list of "unguarded" variables in $P$, that is, variables occurring in the second argument of an OPT-subpattern $P'$ of $P$ but not in the first argument of $P'$, and vs = sort(vars($P$)) \ ws. Procedure is_wwd can be implemented in quadratic time since sort (which may take time $O(n \log n)$) is only applied to atomic subexpressions and set operations on sorted lists take linear time.
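For readers without access to Fig. 3, the two definitions can also be checked naively, directly from Definitions 1 and 4. The Python sketch below does exactly that; it is not the procedure of Fig. 3 and makes no claim to the quadratic bound. Patterns are assumed to be UNION-free nested tuples (`("BGP", triples)`, `("AND", P1, P2)`, `("OPT", P1, P2)`, `("FILTER", P, R)`), occurrences are identified by their path in the parse tree, and all names are hypothetical.

```python
def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def constraint_vars(r):
    found = set()
    for part in r[1:]:
        if isinstance(part, tuple):
            found |= constraint_vars(part)
        elif is_var(part):
            found.add(part)
    return found

def pattern_vars(p):
    if p[0] == "BGP":
        return {t for triple in p[1] for t in triple if is_var(t)}
    if p[0] == "FILTER":
        return pattern_vars(p[1]) | constraint_vars(p[2])
    return pattern_vars(p[1]) | pattern_vars(p[2])  # AND, OPT

def occurrences(p, path=()):
    """Yield (path, subpattern) for every node of the parse tree."""
    yield path, p
    if p[0] == "FILTER":
        yield from occurrences(p[1], path + (1,))
    elif p[0] in ("AND", "OPT"):
        yield from occurrences(p[1], path + (1,))
        yield from occurrences(p[2], path + (2,))

def check(p, weak):
    occ = dict(occurrences(p))
    # Places where variables are written: basic patterns and filter constraints.
    sites = [(q, pattern_vars(n) if n[0] == "BGP" else constraint_vars(n[2]), n[0])
             for q, n in occ.items() if n[0] in ("BGP", "FILTER")]

    def top_level(q):  # Definition 3: not inside the right argument of any OPT
        return not any(occ[q[:k]][0] == "OPT" and q[k] == 2 for k in range(len(q)))

    def dominated_by(q, i):  # Definition 2, via the point where the paths diverge
        k = 0
        while k < min(len(q), len(i)) and q[k] == i[k]:
            k += 1
        return (k < len(q) and k < len(i) and occ[i[:k]][0] == "OPT"
                and i[k] == 1 and q[k] == 2)

    for i, node in occ.items():
        if node[0] != "OPT":
            continue
        unguarded = pattern_vars(node[2]) - pattern_vars(node[1])
        for q, vs, kind in sites:
            if q[:len(i)] == i or not (vs & unguarded):
                continue  # site lies inside i, or uses no unguarded variable
            if not weak:
                return False
            if not (dominated_by(q, i) or (kind == "FILTER" and top_level(q))):
                return False
    return True

def is_wd(p):
    return check(p, weak=False)

def is_wwd(p):
    return check(p, weak=True)

person = ("BGP", [("?i", "rdf:type", "foaf:person")])
q1 = ("OPT", person, ("BGP", [("?i", "foaf:name", "?n")]))
q2 = ("OPT", q1, ("BGP", [("?i", "vcard:name", "?n")]))
q3 = ("FILTER", q1, ("not", ("eq", "?n", "Ana")))  # assumed shape of query (3)
opt_r = ("OPT", ("BGP", [("?x", "a", "?z")]),
         ("OPT", ("BGP", [("?x", "b", "?y")]), ("BGP", [("?y", "c", "?z")])))
for name, q in [("q1", q1), ("q2", q2), ("q3", q3), ("opt_r", opt_r)]:
    print(name, is_wd(q), is_wwd(q))
```

The last pattern is a hypothetical (Opt-R) nesting in which ?z is shared between the outermost left argument and the innermost right argument; it fails both checks because the first occurrence of ?z is not dominated by the inner OPT.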

OPT-FILTER-Normal Form and Constraint Pattern Trees
One of the key properties of wd-patterns is that they can always be converted to a so-called OPT-normal form, in which all AND- and FILTER-subpatterns are OPT-free [33]. Also, FILTER-free patterns in OPT-normal form can be naturally represented as trees of a special form [27,35], which give a good intuition for the evaluation and optimisation of such patterns. In this section, we show that these notions can be generalised to wwd-patterns.
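Definition 5 A pattern in $\mathcal{P}$ is in OPT-FILTER-normal form (or OF-normal form, for short) if it adheres to the grammar

$$P ::= F \mid (P\ \mathrm{FILTER}\ R) \mid (P\ \mathrm{OPT}\ S), \qquad S ::= F \mid (S\ \mathrm{OPT}\ S), \qquad F ::= (B\ \mathrm{FILTER}\ R),$$

where B ranges over basic patterns and R over filter constraints.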
In other words, the parse tree of a pattern in OF-normal form can be stratified as follows:
1. (occurrences of) basic patterns as the bottom layer,
2. a FILTER on top of each basic pattern as the middle layer,
3. a combination of OPT and FILTER as the top layer; moreover, each occurrence of a FILTER-pattern in the top layer is top-level (according to Definition 3).
Note that our normal form is AND-free: all conjunctions are expressed via basic patterns.
Example 2 None of the four patterns in Example 1 are in OF-normal form. However, the first three of them can be easily normalised by replacing each triple t with t FILTER ⊤, where ⊤ is the trivially true constraint. Also, compare the pattern which is in OF-normal form, with the very similar pattern which is not, because the outer FILTER is in the right argument of the outermost OPT.
As shown by Letelier et al. [27], FILTER-free patterns in OPT-normal form can be represented by means of so-called pattern trees. We next show that this representation can be naturally extended to patterns in OF-normal form.

Definition 6
Let $P$ be a pattern in OF-normal form. The constraint pattern tree (CPT) $T(P)$ of $P$ is the directed, ordered, labelled, rooted tree recursively constructed as follows (in this definition we abuse notation and confuse patterns and their occurrences; strictly speaking, we create a fresh sub-tree for each occurrence, so the resulting object is always a tree):
1. $T(B\ \mathrm{FILTER}\ R)$, for a basic pattern $B$, is a single node labelled by the pair $(B, R)$;
2. if $P'$ is not a basic pattern then $T(P'\ \mathrm{FILTER}\ R)$ is obtained by adding a special node labelled by $R$ as the last child of the root of $T(P')$;
3. $T(P_1\ \mathrm{OPT}\ P_2)$ is the tree obtained from $T(P_1)$ and $T(P_2)$ by adding the root of $T(P_2)$ as the last child of the root of $T(P_1)$.
By definition, there is a one-to-one correspondence between patterns in OF-normal form and CPTs. Hence, such trees can be seen as a convenient representation of patterns in OF-normal form. Unlike parse trees, which represent the syntactic shape of patterns, CPTs show the semantic structure of OPT and FILTER nesting. Figure 4 shows how OPT nestings of types (Opt-R) and (Opt-L) are represented in both formats. Note that CPTs treat different FILTER-subpatterns differently: if the filter is over a basic pattern, the constraint of the FILTER is paired with this pattern; however, if the filter is over an OPT-subpattern, then the constraint is represented by a separate special node. Moreover, since in the second case the FILTER-pattern must be top-level, special nodes can only occur in CPTs as children of the root. For instance, the CPT of the example pattern (10) is given in Fig. 5a.
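The recursive construction of CPTs is short enough to sketch in Python. The code below is an illustration under assumptions rather than the authors' implementation: patterns in OF-normal form are encoded as nested tuples (`("FILTER", ("BGP", triples), R)` for a filtered basic pattern, `("OPT", P1, P2)`, and `("FILTER", P, R)` for a filter over a non-basic pattern), and `Node`, `cpt`, `depth` and `leaf` are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: object                      # (basic pattern, constraint) or a constraint
    special: bool = False              # True for filters over OPT-subpatterns
    children: list = field(default_factory=list)

def cpt(p):
    """Build the constraint pattern tree of a pattern in OF-normal form."""
    op = p[0]
    if op == "FILTER" and p[1][0] == "BGP":
        # Filter over a basic pattern: one node labelled by the pair (B, R).
        return Node(label=(p[1][1], p[2]))
    if op == "FILTER":
        # Filter over a non-basic pattern: special node as last child of the root.
        root = cpt(p[1])
        root.children.append(Node(label=p[2], special=True))
        return root
    if op == "OPT":
        # Root of T(P2) becomes the last child of the root of T(P1).
        root = cpt(p[1])
        root.children.append(cpt(p[2]))
        return root
    raise ValueError("pattern is not in OF-normal form")

def depth(node):
    return 0 if not node.children else 1 + max(depth(c) for c in node.children)

def leaf(predicate):
    """A basic pattern with one triple and the trivial filter (example data)."""
    return ("FILTER", ("BGP", [("?x", predicate, "?y")]), ("true",))

a, b, c = leaf("a"), leaf("b"), leaf("c")
opt_l = cpt(("OPT", ("OPT", a, b), c))     # (Opt-L): two children of the root
opt_r = cpt(("OPT", a, ("OPT", b, c)))     # (Opt-R): a chain of depth two
filtered = cpt(("FILTER", ("OPT", a, b), ("bound", "?y")))
print(depth(opt_l), len(opt_l.children))
print(depth(opt_r), len(opt_r.children))
print([child.special for child in filtered.children])
```

In this encoding an (Opt-L) nesting flattens into siblings under one root while an (Opt-R) nesting becomes a path, and the top-level filter shows up as a special last child of the root.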

Proposition 2 Let P be a pattern in OF-normal form. Then every special node in T (P ) is a child of the root.
Proof Let $v$ be a special node in $T(P)$. Then $v$ is obtained from a subpattern $P'\ \mathrm{FILTER}\ R$ where $P'$ is not basic. Hence, by definition of the OF-normal form, $P'$ must have the form (13) (for some $n \geq 0$), where $S_1, \ldots, S_n$ contain only FILTER-subpatterns over basic patterns. Thus, the root of $T(P')$ is also the root of $T(P)$, and the claim follows.

Fig. 5 (a) CPT of pattern (10) and (b) equivalent pattern in "flat" form
Next we show that each wwd-pattern can be converted to OF-normal form and hence can be represented by a CPT. To prove this statement we make use of a number of equivalences. Formally, a pattern $P_1$ is equivalent to a pattern $P_2$ (written $P_1 \equiv P_2$) if $\llbracket P_1 \rrbracket_G = \llbracket P_2 \rrbracket_G$ holds for any graph $G$. There are several equivalences, such as associativity and commutativity of AND, as well as filter decompositions, such as $P\ \mathrm{FILTER}\ (R_1 \wedge R_2) \equiv (P\ \mathrm{FILTER}\ R_1)\ \mathrm{FILTER}\ R_2$, which hold for all patterns (see [38] for an extensive list). Moreover, the key equivalences used in [33] for normalising wd-patterns can easily be adapted to serve our needs.

Proposition 3 Let $P_1$, $P_2$, $P_3$ be patterns and $R$ a filter constraint such that $\mathrm{vars}(P_2) \cap \mathrm{vars}(P_3) \subseteq \mathrm{vars}(P_1)$ and $\mathrm{vars}(P_2) \cap \mathrm{vars}(R) \subseteq \mathrm{vars}(P_1)$. Then the following equivalences hold:

$$(P_1\ \mathrm{OPT}\ P_2)\ \mathrm{AND}\ P_3 \equiv (P_1\ \mathrm{AND}\ P_3)\ \mathrm{OPT}\ P_2,$$
$$(P_1\ \mathrm{OPT}\ P_2)\ \mathrm{FILTER}\ R \equiv (P_1\ \mathrm{FILTER}\ R)\ \mathrm{OPT}\ P_2.$$

Proof Both equivalences are essentially shown in [33]. While stated for well-designed patterns, the proof only exploits the properties $\mathrm{vars}(P_2) \cap \mathrm{vars}(P_3) \subseteq \mathrm{vars}(P_1)$ and $\mathrm{vars}(P_2) \cap \mathrm{vars}(R) \subseteq \mathrm{vars}(P_1)$, which are satisfied not only by well-designed patterns, but also by weakly well-designed patterns.
Since all the equivalences preserve weak well-designedness, we obtain the desired result.

Proposition 4 Each wwd-pattern P is equivalent to a wwd-pattern in OF-normal form of size O(|P |).
Proof We call a pattern $P$ pre-normal if it adheres to the grammar that is the same as the one in Definition 5 except that the category $F$ is extended as follows:

$$F ::= B \mid (F\ \mathrm{AND}\ F) \mid (F\ \mathrm{FILTER}\ R).$$

Given a pattern $P$, let $\|P\|$ be the sum of the sizes of all AND-subpatterns and all FILTER-subpatterns of $P$ (where different occurrences of each pattern are counted separately). Consider a wwd-pattern $P$ that is not pre-normal. Then $P$ contains a subpattern $P'$ of one of the following two forms (modulo commutativity of AND): $(P_1\ \mathrm{OPT}\ P_2)\ \mathrm{AND}\ P_3$, or $(P_1\ \mathrm{OPT}\ P_2)\ \mathrm{FILTER}\ R$ with $P'$ not top-level. In both cases we can rewrite $P$ to a pattern $S$ not increasing $|\cdot|$ and strictly decreasing $\|\cdot\|$ as follows.
- Let $P' = (P_1\ \mathrm{OPT}\ P_2)\ \mathrm{AND}\ P_3$. Since $P$ is weakly well-designed and the occurrence of $P_3$ is not dominated by the occurrence of $P_1\ \mathrm{OPT}\ P_2$, we have $\mathrm{vars}(P_3) \cap \mathrm{vars}(P_2) \subseteq \mathrm{vars}(P_1)$. Therefore, using the first equivalence in Proposition 3, we can rewrite $P$ to a pattern $S$ by replacing $P'$ with $(P_1\ \mathrm{AND}\ P_3)\ \mathrm{OPT}\ P_2$. Moreover, we have $|P| = |S|$ and $\|P\| > \|S\|$.
- Let $P' = (P_1\ \mathrm{OPT}\ P_2)\ \mathrm{FILTER}\ R$ with $P'$ not top-level. Since $P$ is weakly well-designed, we then have $\mathrm{vars}(R) \cap \mathrm{vars}(P_2) \subseteq \mathrm{vars}(P_1)$, and thus, with the second equivalence in Proposition 3, we can rewrite $P$ to a pattern $S$ by replacing $P'$ with $(P_1\ \mathrm{FILTER}\ R)\ \mathrm{OPT}\ P_2$. Moreover, we have $|P| = |S|$ and $\|P\| > \|S\|$.
Since this rewriting strictly decreases $\|\cdot\|$, its repeated application to $P$ terminates and yields a pre-normal pattern $S$ equivalent to $P$ with $|S| = |P|$. Finally, $S$ can be transformed to OF-normal form by replacing every occurrence of an AND-FILTER combination of basic patterns by $B\ \mathrm{FILTER}\ R$ where $B$ consists of all triples in the basic patterns and $R$ is a conjunction of all the filter conditions (if there are no filters in the combination, then $R$ is the trivial constraint $\top$). Clearly, this transformation is equivalence-preserving and linear in $|S|$.
Relying on this proposition, in the rest of the paper we silently assume that all wwd-patterns are in OF-normal form and hence can be represented by CPTs.
We next transfer the notion of weak well-designedness to CPTs. Given a pattern $P$ in OF-normal form, let $\prec$ be the strict topological sorting of the nodes in $T(P)$ computed by a depth-first search traversal visiting the children of a node according to their ordering (i.e., $v \prec u$ holds if $v$ is visited before $u$).

Lemma 1 Let $P$ be a pattern in OF-normal form and $P' = P_1\ \mathrm{OPT}\ P_2$ be a subpattern of $P$. Then $v \prec w$ for every two nodes $v$, $w$ in $T(P)$ such that $v$ is in the subtree of $T(P)$ corresponding to $P_1$ and $w$ is in the subtree corresponding to $P_2$.
Proof The claim follows since T (P ) is constructed by attaching T (P 2 ) as the last child to the root of T (P 1 ).
In the following proposition, vars(u) for a node u of a CPT stands for the set of all variables in the label of u.

Proposition 5 A pattern P in OF-normal form is weakly well-designed if and only if, for each edge (v, u) with non-special u in the CPT
Proof For the forward direction of the first statement, suppose P is weakly well-designed. We proceed by induction on the structure of P and consider the following cases.
Then the claim is vacuous. -Let P = P 1 FILTER R where P 1 is not basic. By the inductive hypothesis, the claim holds for T (P 1 ). Moreover, T (P ) differs from T (P 1 ) only in the special node labelled with R, and the claim follows by Proposition 2. -Let P = P 1 OPT P 2 . By the inductive hypothesis, the claim holds for T (P 1 ) and T (P 2 ). Thus, by Lemma 1, it suffices to show that for every edge . Suppose for contradiction that this property is violated for some (v, u) and ?x.
Then $P_2$ has a subpattern $P' = P'_1\ \mathrm{OPT}\ P'_2$ such that $T(P'_1)$ is a subtree of $T(P)$ rooted at $v$ and $T(P'_2)$ is the complete subtree of $T(P)$ rooted at $u$. Moreover, $?x$ occurs in $P_1$, and thus outside $P'$. Since all FILTER-subpatterns in $P$ are safe, we can assume without loss of generality that the occurrence of $?x$ in $P_1$ is not in a filter constraint. However, this contradicts the assumption that $P$ is weakly well-designed since the occurrence of $?x$ in $P_1$ is not dominated by the occurrence of $P'$.
For the backward direction of the first claim, suppose P is not a wwd-pattern. Then P has a subpattern P = P 1 OPT P 2 , with v the root of T (P ) in T (P ) and u the child of v corresponding to T (P 2 ), and a variable ?x ∈ vars(P 2 ) \ vars(P 1 ) such that ?x ∈ vars(u) and, for some subpattern P 1 OPT P 2 of P , ?x occurs in P 1 and P occurs in P 2 . Since ?x ∈ vars(P 2 ) \ vars(P 1 ) and ?x ∈ vars(u), we have ?x ∈ vars(u) \ vars(v). Thus, by Lemma 1, we have v ≺ w, where w is a node in T (P 1 ) with an occurrence of ?x.
The second claim can be proved analogously.
Note that if a pattern is FILTER-free, its OF-normal form coincides with the OPT-normal form in [33] (modulo tautological filters), and its CPT is the pattern tree from [27,35]. In fact, the second part of Proposition 5 generalises an observation from [27] to the case with filters. An important difference to pattern trees is that in our case the order of children of a node is semantically relevant since wwd-patterns do not satisfy the equivalence
(P1 OPT P2) OPT P3 ≡ (P1 OPT P3) OPT P2.
This equivalence, established in [32], holds whenever (vars(P2) ∩ vars(P3)) ⊆ vars(P1), which is always the case for wd-patterns but not for wwd-patterns, as can be seen on query (2).
We conclude this section with a property that is unique to wwd-patterns: each wwd-pattern is equivalent to a pattern whose corresponding CPT has depth one.

Definition 7 A pattern in
where B is a basic pattern and each with B i a basic pattern and R i a filter constraint, or just FILTER R i .
To show that each wwd-pattern can be brought to this form, we exploit the following observation in [33].
Lemma 2 (Pérez et al. [33]) Let P be a pattern in P, G a graph, and μ1, μ2 two mappings in ⟦P⟧_G. If μ1 ∼ μ2, then μ1 = μ2.
This lemma allows us to prove the following crucial equivalence.

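Proposition 6 For patterns P1, P2 and P3 with vars(P1) ∩ vars(P3) ⊆ vars(P2), it holds that
P1 OPT (P2 OPT P3) ≡ (P1 OPT P2) OPT (P2 AND P3). (13)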
Proof We first show that any solution to the left-hand side is also a solution to the right-hand side. Let G be a graph and let μ ∈ ⟦P1 OPT (P2 OPT P3)⟧_G. We distinguish three cases.
- Let μ ∈ ⟦P1⟧_G ⋈ (⟦P2⟧_G ⋈ ⟦P3⟧_G). Then, by Lemma 2, we have μ ∈ (⟦P1⟧_G ⋈ ⟦P2⟧_G) ⋈ ⟦P2 AND P3⟧_G, and the claim follows.
- Let μ ∈ ⟦P1⟧_G ⋈ (⟦P2⟧_G \ ⟦P3⟧_G). Then μ = μ1 ∪ μ2 such that μ1 ∈ ⟦P1⟧_G, μ2 ∈ ⟦P2⟧_G, and for every μ3 ∈ ⟦P3⟧_G, μ2 ≁ μ3. Since every mapping in ⟦P2 AND P3⟧_G is an extension of some mapping in ⟦P3⟧_G, no mapping in ⟦P2 AND P3⟧_G is compatible with μ2, and hence with μ. Therefore, μ ∈ (⟦P1⟧_G ⋈ ⟦P2⟧_G) \ ⟦P2 AND P3⟧_G, and the claim follows.
- Let μ ∈ ⟦P1⟧_G \ ⟦P2 OPT P3⟧_G. Then μ ∈ ⟦P1⟧_G and is incompatible with any mapping in ⟦P2 OPT P3⟧_G. Moreover, since vars(P1) ∩ vars(P3) ⊆ vars(P2), μ is incompatible with any mapping in ⟦P2⟧_G, and consequently also with any mapping in ⟦P2 AND P3⟧_G. Therefore, μ ∈ (⟦P1⟧_G \ ⟦P2⟧_G) \ ⟦P2 AND P3⟧_G, and the claim follows.
For the other direction, suppose μ ∈ ⟦(P1 OPT P2) OPT (P2 AND P3)⟧_G. We distinguish three cases.
Applied from left to right, equivalence (13) preserves weak well-designedness (but not well-designedness). Each such application transforms a weakly well-designed OPT nesting of type (Opt-R) to a nesting of type (Opt-L), decreasing the depth of the CPT.
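The equivalence between P1 OPT (P2 OPT P3) and (P1 OPT P2) OPT (P2 AND P3) can be checked on small inputs at the level of the mapping algebra. The following Python sketch is only an illustration: the function names and the hand-picked answer sets for P1, P2 and P3 are assumptions made for the example, not part of the formal development. The second call uses answer sets in which ?x is shared by P1 and P3 but missing from P2, so the condition vars(P1) ∩ vars(P3) ⊆ vars(P2) used in the proof fails.

```python
from itertools import product

def compatible(m1, m2):
    """Mappings are compatible if they agree on all shared variables."""
    return all(m2[var] == val for var, val in m1.items() if var in m2)

def join(omega1, omega2):
    """Unions of all compatible pairs (duplicates removed)."""
    result = []
    for m1, m2 in product(omega1, omega2):
        if compatible(m1, m2) and {**m1, **m2} not in result:
            result.append({**m1, **m2})
    return result

def minus(omega1, omega2):
    """Mappings of omega1 that are incompatible with all of omega2."""
    return [m1 for m1 in omega1
            if not any(compatible(m1, m2) for m2 in omega2)]

def left_join(omega1, omega2):
    """Algebraic counterpart of OPT."""
    return join(omega1, omega2) + minus(omega1, omega2)

def same_answers(omega1, omega2):
    """Order-insensitive comparison of two lists of mappings."""
    return (all(m in omega2 for m in omega1)
            and all(m in omega1 for m in omega2))

def both_sides(a1, a2, a3):
    """Evaluate both sides of the equivalence on given answer sets."""
    lhs = left_join(a1, left_join(a2, a3))            # P1 OPT (P2 OPT P3)
    rhs = left_join(left_join(a1, a2), join(a2, a3))  # (P1 OPT P2) OPT (P2 AND P3)
    return lhs, rhs

# Hypothetical answer sets; vars(P1) and vars(P3) are disjoint here.
lhs, rhs = both_sides(
    [{"?x": 1}, {"?x": 2}, {"?x": 3}],
    [{"?x": 1, "?y": 10}, {"?x": 2, "?y": 20}],
    [{"?y": 10, "?z": 100}],
)
print(same_answers(lhs, rhs))          # True

# Hypothetical answer sets violating the side condition on ?x.
bad_lhs, bad_rhs = both_sides([{"?x": 1}], [{"?y": 10}], [{"?x": 2, "?y": 10}])
print(same_answers(bad_lhs, bad_rhs))  # False
```

In the second case the left-hand side yields {?x → 1} while the right-hand side yields {?x → 1, ?y → 10}, which shows why the side condition cannot simply be dropped.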

Corollary 1 Every wwd-pattern is equivalent to a wwd-pattern in depth-one normal form.
For instance, pattern (10) is equivalent to the pattern ( (((?x, a, a) OPT (?x, b, ?y) ) OPT (?x, b, ?z) represented by the CPT in Fig. 5b. Such "flat" patterns are attractive in practice because of their regular structure. However, "flattening" a pattern can incur an exponential blow-up in size. Hence, in the rest of the paper we consider arbitrary wwd-patterns in OF-normal form rather than restricting our attention to patterns in depth-one normal form.

Evaluation of wwd-Patterns
In this section, we look at the query answering problem for wwd-patterns and their extensions with union and projection. We show that in all three cases, complexity remains the same as for wd-patterns. To obtain these results, we develop several new techniques. Formally, we look at the following decision problem for a given SPARQL fragment L.

EVAL(L)
Input: Graph G, query Q ∈ L, and mapping μ.
Question: Does μ belong to ⟦Q⟧_G?
It is known that EVAL(U) for general patterns U is PSPACE-complete [33], and the result easily propagates to queries with projection (i.e., S) [27]. For wd-patterns, the evaluation problem is coNP-complete, and can be solved by exploiting the following idea of [27].
Suppose we are given a wd-pattern P in OPT-normal form (for simplicity, assume that P is FILTER-free), a graph G, and a mapping μ. First, we look for a subtree of T(P) that includes the root of T(P), contains precisely the variables in dom(μ), and "matches" G under μ (i.e., images of all its triples under μ are contained in G). This is doable in polynomial time. If such a subtree does not exist, then μ cannot be a solution. Otherwise, the subtree witnesses that μ is a part of a solution to P. Finally, to verify that μ is a complete solution, we need to check that the subtree is maximal, that is, cannot be extended to any more nodes in T(P) with a match in G. There are linearly many such nodes to check, and each check can be performed in coNP. So, the overall algorithm runs in coNP.
Inspired by this idea, we next show that the low evaluation complexity of wd-patterns transfers to wwd-patterns by developing a coNP algorithm for EVAL(P wwd).
Let P be a wwd-pattern in OF-normal form. An r-subtree of T(P) is a subtree containing the root of T(P) and all its special children. Every r-subtree T(P′) of T(P) is also a CPT representing a wwd-pattern P′ that can be obtained from P by dropping the right arguments of some OPT-subpatterns (a transformation known from [33]). A child of an r-subtree T(P′) of T(P) is a node in T(P) that is not contained in T(P′) but whose parent is.

Definition 8
A mapping μ is a potential partial solution (or pp-solution for short) to a wwd-pattern P over a graph G if there is an r-subtree T(P′) of T(P) such that dom(μ) = vars(P′), μ(triples(P′)) ⊆ G, and μ |= R for the constraint R of each ordinary node in T(P′).
A pp-solution μ to P over G can be witnessed by several r-subtrees. However, the union of such r-subtrees is also a witness. Hence, there exists a unique maximal witnessing r-subtree, denoted T (P μ ), with P μ being the corresponding wwd-pattern.
Potential partial solutions generalise "partial solutions" as defined in [33] for wd-patterns. There, every "partial solution" is either a solution or can be extended to one. This is not the case for wwd-patterns. While every solution is clearly a pp-solution, not every pp-solution can be extended to a real one. Real solutions may not just extend pp-solutions by assigning previously undefined variables but can also override variable bindings established in some node v of T(Pμ) by extending to a child of T(Pμ) that precedes v according to the order ≺.
An additional complication is the presence of non-well-designed top-level filters. Note that pp-solutions are only required to satisfy the constraints of ordinary nodes in the corresponding CPT, thus ignoring top-level filters. Indeed, requiring pp-solutions to satisfy constraints of top-level filters would be too strong since real solutions do not generally satisfy this property, as demonstrated by the following example: consider the graph G = {(1, a, 1), (3, a, 3)} and the wwd-pattern P = (((?x, a, 1) OPT (?y, a, 2)) FILTER ¬bound(?y)) OPT (?y, a, 3). The only solution to P over G is {?x → 1, ?y → 3}, which does not satisfy ¬bound(?y).
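The example with G = {(1, a, 1), (3, a, 3)} and the top-level filter ¬bound(?y) can be replayed with a small evaluator. The sketch below is a minimal illustration that assumes the standard set semantics of OPT and FILTER; the tuple encoding of patterns and all function names are hypothetical choices made for this example.

```python
def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def compatible(m1, m2):
    """Mappings are compatible if they agree on all shared variables."""
    return all(m2[v] == val for v, val in m1.items() if v in m2)

def match_basic(triples, graph):
    """All mappings over vars(triples) that send every triple into graph."""
    results = [{}]
    for pattern in triples:
        extended = []
        for mu in results:
            for fact in graph:
                candidate = dict(mu)
                # Bind variables position by position; constants must coincide.
                if all(candidate.setdefault(t, f) == f if is_var(t) else t == f
                       for t, f in zip(pattern, fact)):
                    if candidate not in extended:
                        extended.append(candidate)
        results = extended
    return results

def evaluate(pattern, graph):
    """Evaluate a pattern built from ("BGP", ...), ("OPT", ...), ("FILTER", ...)."""
    kind = pattern[0]
    if kind == "BGP":
        return match_basic(pattern[1], graph)
    if kind == "FILTER":
        return [mu for mu in evaluate(pattern[1], graph) if pattern[2](mu)]
    if kind == "OPT":
        left = evaluate(pattern[1], graph)
        right = evaluate(pattern[2], graph)
        joined = [{**m1, **m2} for m1 in left for m2 in right
                  if compatible(m1, m2)]
        kept = [m1 for m1 in left
                if not any(compatible(m1, m2) for m2 in right)]
        return joined + kept
    raise ValueError(f"unknown operator: {kind}")

def not_bound_y(mu):
    """The filter constraint ¬bound(?y)."""
    return "?y" not in mu

G = {(1, "a", 1), (3, "a", 3)}
P = ("OPT",
     ("FILTER",
      ("OPT", ("BGP", [("?x", "a", 1)]), ("BGP", [("?y", "a", 2)])),
      not_bound_y),
     ("BGP", [("?y", "a", 3)]))

solutions = evaluate(P, G)
print(solutions)                   # [{'?x': 1, '?y': 3}]
print(not_bound_y(solutions[0]))   # False
```

The filter is satisfied at the moment it is applied, when only ?x is bound, but the final OPT then binds ?y, so the resulting solution no longer satisfies the constraint.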
We now present a characterisation of solutions for wwd-patterns in terms of pp-solutions that (a) takes into account that not every pp-solution can be extended to a real solution and (b) ensures correct treatment of non-well-designed top-level filters. For this we need some more notation. Given a wwd-pattern P, a node v in T(P), a graph G, and a pp-solution μ to P over G, let μ|v be the projection μ|X of μ to the set X of all variables appearing in nodes u of T(Pμ) such that u ≺ v. A mapping μ1 is subsumed by a mapping μ2 (written μ1 ⊑ μ2) if μ1 ∼ μ2 and dom(μ1) ⊆ dom(μ2) (this notion is from [5,33]).
Intuitively, a pp-solution μ needs to satisfy two conditions to be a real solution to a wwd-pattern P . First, μ| v (as opposed to μ for wd-patterns) must be non-extendable to v for any child v of T (P μ ). Indeed, if such an extension exists, then it is either possible to provide bindings for some variables that are undefined in μ, or some variables from dom(μ) can be assigned different values of higher "priority" than the corresponding values in μ. Second, every top-level filter R labelling a node s needs to be satisfied by μ| s , which is precisely the part of μ bound by the subpattern of P that is paired with R in the FILTER-pattern. The following lemma formalises this intuition.

Lemma 3 A mapping μ is a solution to a wwd-pattern P over a graph G if and only if
1. μ is a pp-solution to P over G;
2. for every child v of T(Pμ) labelled with (B, R) there is no μ′ such that μ|v ⊑ μ′, μ′ |= R, and μ′(B) ⊆ G;
3. μ|s |= R for every special node s in T(P) labelled with R.
Proof In this proof we write T v for the complete subtree of a CPT T rooted at a node v (i.e., the subtree over all the descendants of v including v itself) and T ≺v for the subtree of T consisting of all nodes u such that u ≺ v.
For the forward direction, suppose μ is a solution to P over G. Clearly, μ is a pp-solution to P over G, so it suffices to show that conditions 2 and 3 hold.
For condition 2, assume for contradiction that v is a child of T (P μ ) labelled with (B, R) and μ a mapping such that μ| v μ , μ |= R, and μ (B) ⊆ G. Moreover, without loss of generality, let dom(μ ) = dom(μ) ∪ vars(B). Let u be the parent of v in T (P ), and let T be the largest subtree of T (P ) that is rooted at u and has v as the last child of u. Then . Moreover, since u is contained in T (P μ ), there is a mapping μ 1 μ such that μ 1 ∈ T G . Since v is not contained in T (P μ ), we have μ 1 μ| v and, since T (P μ ) is the largest r-subtree witnessing μ, μ 1 is not compatible with any mapping in T v G . On the other hand, μ satisfies the label of v, and thus, since T v contains no top-level filters, μ | vars(v) can be extended to a mapping of μ ∈ T v G . Moreover, since P is weakly well-designed, vars(T v ) ∩ dom(μ| v ) ⊆ vars(v), and hence dom(μ ) ∩ dom(μ 1 ) ⊆ dom(μ ). Thus, since μ| v is compatible with μ , μ 1 is compatible with μ , in contradiction to the above observation that μ 1 is not compatible with any mapping in T v G .
For condition 3, let s be a special node in T (P ) labelled with R. Since μ is a solution to P , there is some μ 1 ⊆ μ such that μ 1 ∈ T (P ) ≺s G and μ 1 |= R. Hence, it suffices to show that μ 1 = μ| s . Clearly, μ 1 ⊆ μ| s (as μ| s is the largest mapping compatible with μ that can occur in T (P ) ≺s G ), so assume for contradiction that there is a variable ?x ∈ dom(μ| s ) \ dom(μ 1 ). Then there is a node in T (P μ| s ) ∩ T (P ) ≺s that does not occur in T (P μ 1 ) ∩ T (P ) ≺s . This yields a contradiction with μ 1 ∈ T (P ) ≺s G analogously to the case of condition 2.
For the backward direction, suppose that μ satisfies conditions 1-3. We show that μ ∈ P G by induction on the depth of T (P μ ), that is, the maximal number of edges between the root and a leaf.
For the basis of the induction, let the depth of T (P μ ) be 0, that is, the root v of T (P ) be the only node of T (P μ ). We prove the claim by induction on the number n of children of v in T (P ). If n = 0, then P = B FILTER R for some basic pattern B and filter constraint R, and the claim follows since μ is a pp-solution to P over G. For the inductive step, suppose the claim holds for all wwd-patterns P and mappings μ satisfying 1-3 provided T (P μ ) has depth 0 and n − 1 children in T (P ). Let P and μ be such that T (P μ ) has depth 0 and n children in T (P ). Let u be the last child of (the root v of) T (P μ ). Then μ| u is a pp-solution to T (P ) ≺u that satisfies conditions 2 and 3 since (μ| u )| w = μ| w for every w ≺ u. Hence, by the inductive hypothesis for the pattern corresponding to T (P ) ≺u and the mapping μ| u , we have μ| u ∈ T (P ) ≺u G . We distinguish two cases.
-Let u be a special node labelled with R. Then it suffices to show that μ| u |= R, which is immediate since μ| u satisfies condition 3. -Let u be an ordinary node labelled with (B, R). We know that u is not in T (P μ ).
Since v is in T (P μ ), by condition 2 there is no mapping μ such that (a) μ| u μ , (b) μ |= R, and (c) μ (B) ⊆ G. Since R is safe, it follows that every mapping satisfying (b) and (c) is incompatible with μ| u . Consequently, every mapping in T (P ) u G is incompatible with μ| u , and hence μ = μ| u ∈ T (P ) ≺u G \ T (P ) u G , as required.
For the outer inductive step, let the claim hold for all P and μ with T (P μ ) of depth d −1, for some d > 0. Once again, we show the claim for P and μ with T (P μ ) of depth d by induction on the number n of children of the root v of T (P ). The basis is vacuous as v cannot have 0 children while T (P μ ) has positive depth. The inductive step is the same as for depth 0, except that we have an additional case for the last child u of the root v.
-Let u be an ordinary node labelled with (B , R ) that is contained in T (P μ ).
Then μ = μ| u ∪ μ 2 where μ 2 is the projection of μ to the set of variables occurring in the subtree T of T (P μ ) rooted at u (i.e., T = T (P μ ) u ). Since u is contained in T (P μ ) and contains no special children, μ 2 is a pp-solution to (the subpattern represented by) T (P ) u . Moreover, μ 2 satisfies condition 3 with respect to T (P ) u since T (P ) u contains no special nodes. We next show that μ 2 satisfies condition 2 with respect to T (P ) u . Let w be a child of T (in T (P ) u ) labelled with (B, R), and assume for contradiction that there is some μ such that μ 2 | w μ , μ |= R, and μ (B) ⊆ G. Without loss of generality, dom(μ ) = dom(μ 2 ) ∪ vars(B). Thus, since P is weakly well-designed, vars(B) ∩ dom(μ| u ) ⊆ vars(B ) ⊆ dom(μ 2 ). Hence, μ is compatible with μ| u , and μ| w μ| u ∪ μ . Moreover, since μ and μ| u ∪ μ coincide on vars(B) and R is safe, we have that μ| u ∪ μ |= R and (μ| u ∪ μ )(B) ⊆ G, contradicting the assumption for μ. Since μ 2 satisfies conditions 1-3 with respect to T (P ) u , by the outer inductive hypothesis we obtain that μ 2 ∈ T (P ) u G , and hence μ ∈ T (P ) ≺u G T (P ) u G (as μ| u ∈ T (P ) ≺u G holds by the inner inductive hypothesis). The claim follows.
Checking whether a mapping μ satisfies this characterisation is feasible in coNP, and the matching lower bound follows from the coNP-hardness of evaluation of wd-patterns [33].

Theorem 1 Problem EVAL(P wwd) is coNP-complete.
Proof The lower bound of this statement is known from [33], and the upper bound can be obtained from Lemma 3 as follows.
First we show that testing whether μ is a pp-solution takes polynomial time, as does computing the maximal witnessing tree T(Pμ). We proceed from the root of the tree down along the branches until we cannot find a match μ(triples(v)) in G for the basic pattern in the child v which satisfies the condition in the node, and then check that the variables in the resulting tree are exactly dom(μ). So, the crucial part is to check that T(Pμ) is not extendable to any of its children. But there are only linearly many children, and each check can be done in coNP. Finally, the checks for top-level filters are again polynomial.
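To make the three checks of Lemma 3 concrete, the following Python sketch tests them for the special case of a CPT of depth one, that is, a root with an ordered list of ordinary and special children. It is an illustration under several assumptions: the data layout and function names are hypothetical, the order ≺ is taken to be "root first, then children from left to right", and non-extendability is tested by brute-force enumeration over the terms of the graph rather than by a coNP procedure.

```python
from itertools import product

def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def vars_of(triples):
    return {t for triple in triples for t in triple if is_var(t)}

def node_holds(mu, triples, cond, graph):
    """True if mu binds all variables of the node, maps its triples into
    the graph, and satisfies the node's (safe) filter."""
    needed = vars_of(triples)
    if not needed <= set(mu):
        return False
    local = {v: mu[v] for v in needed}
    image = {tuple(local.get(t, t) for t in triple) for triple in triples}
    return image <= graph and cond(local)

def extendable(prefix, triples, cond, graph):
    """True if some mapping extending `prefix` matches the node in the graph."""
    free = sorted(vars_of(triples) - set(prefix))
    terms = {t for fact in graph for t in fact}
    for values in product(terms, repeat=len(free)):
        candidate = {**prefix, **dict(zip(free, values))}
        if node_holds(candidate, triples, cond, graph):
            return True
    return False

def is_solution(mu, root, children, graph):
    """Check conditions 1-3 of Lemma 3 for a depth-one CPT."""
    if not node_holds(mu, *root, graph):
        return False
    bound = set(vars_of(root[0]))            # variables of T(P_mu) seen so far
    for child in children:
        prefix = {v: mu[v] for v in bound}   # plays the role of mu|_v
        if child[0] == "special":            # condition 3
            if not child[1](prefix):
                return False
        elif node_holds(mu, child[1], child[2], graph):
            bound |= vars_of(child[1])       # child belongs to T(P_mu)
        elif extendable(prefix, child[1], child[2], graph):
            return False                     # condition 2 is violated
    return bound == set(mu)                  # condition 1: dom(mu) = vars(P_mu)

always = lambda m: True
graph = {(1, "a", 1), (3, "a", 3)}
# Depth-one CPT of (((?x,a,1) OPT (?y,a,2)) FILTER not bound(?y)) OPT (?y,a,3).
root = ([("?x", "a", 1)], always)
children = [
    ("ordinary", [("?y", "a", 2)], always),
    ("special", lambda m: "?y" not in m),
    ("ordinary", [("?y", "a", 3)], always),
]

print(is_solution({"?x": 1, "?y": 3}, root, children, graph))  # True
print(is_solution({"?x": 1}, root, children, graph))           # False
```

The mapping {?x → 1} is a pp-solution (it is witnessed by the root alone) but fails condition 2, since it can be extended to the last child; this is the kind of pp-solution that is not a real solution.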
Pérez et al. [33] extended wd-patterns to UNION by considering unions of wd-patterns, that is, patterns of the form P 1 UNION . . . UNION P n with all P i ∈ P wd . We denote the resulting fragment by U wd . This syntactic restriction on the use of UNION in U wd is motivated by the fact that any pattern in U can be equivalently expressed as a union of UNION-free patterns [33]. We denote the fragment of all queries over patterns in U wd by S wd . Similarly, we write U wwd for unions of wwd-patterns and S wwd for queries over unions of wwd-patterns.
Analogously to the well-designed case, Theorem 1 extends to fragments U wwd and S wwd .

Corollary 2 Problem EVAL(U wwd) is coNP-complete, and EVAL(S wwd) is Σ^p_2-complete.
The coNP algorithm for U wwd is obtained simply by applying the algorithm for P wwd to each pattern in the union. Hardness for S wwd follows from the hardness of the well-designed case [27], while for membership we just guess the values of the existential variables and then call a coNP oracle for U wwd on the resulting mapping and the normalised body of the query.
Hence, the complexity of evaluation for wwd-patterns is the same as for wd-patterns. We next show that wwd-patterns are, in a certain sense, a maximal extension of wd-patterns that preserves coNP evaluation complexity (under the usual complexity-theoretic assumptions).
The definition of weakly well-designed patterns suggests two intuitive ways in which it could be relaxed. Given an occurrence i of an OPT-subpattern P1 OPT P2, one could allow variables in vars(P2) \ vars(P1) to occur in
- some subpatterns whose occurrences are not dominated by i, or
- constraints of some non-top-level occurrences of FILTER-patterns.
We next show that either relaxation immediately makes the evaluation problem Π^p_2-hard.
For the first relaxation, the arguably simplest special case would be to allow for some non-well-designed OPT-nesting of type (Opt-R). Consider the fragment P opt-r of patterns of the form B 1 OPT (B 2 OPT B 3 ), where B 1 , B 2 and B 3 are basic patterns. Intuitively, P opt-r allows for the most simple form of non-well-designed nesting of type (Opt-R).
Proof This theorem is a corollary of [38, Theorem 4] for their class E ≤3 , but without UNION.
Now suppose we allow for some non-well-designed non-top-level filters, as suggested by the second relaxation. As we will see next, even a very restricted fragment of patterns allowing for such filters is Π^p_2-complete. This implies that the requirement that special nodes be children of the root, while it may look somewhat ad hoc, cannot be substantially relaxed. Consider the fragment P filter-2 of patterns of the form
B1 OPT ((B2 OPT B3) FILTER R),
where B1, B2 and B3 are basic patterns such that vars(B3) ∩ vars(B1) ⊆ vars(B2), and R is a filter constraint. Intuitively, P filter-2 allows for the simplest form of "second-level" filters.

Theorem 3 Problem EVAL(P filter-2) is Π^p_2-complete.
Proof This problem allows for a reduction from a restriction of EVAL(P opt-r ). Indeed, from the proof of [38,Theorem 4] it follows that it is already p 2 -hard to check whether μ ∈ P G for P of the form B 1 OPT (B 2 OPT B 3 ) with dom(μ) = vars(B 1 ) and vars(B 2 ) \ vars(B 1 ) = ∅. Let P and μ be such a pattern and such a mapping, respectively. Consider the pattern with B 3 a basic pattern obtained from B 3 by replacing all the variables ?x 1 , . . . , ?x n in (vars(B 3 ) ∩ vars(B 1 )) \ vars(B 2 ) by their fresh copies ?x 1 , . . . , ?x n (if no such variables exist, that is, if the original pattern is well-designed, we just set R to ). Clearly, P ∈ P filter-2 , so it suffices to show, for every G and μ with dom(μ) = vars(B 1 ), that μ ∈ P G if and only if μ ∈ P G .
For the forward direction, suppose dom(μ) = vars(B 1 ) and μ ∈ P G . Since vars(B 2 ) \ vars(B 1 ) = ∅, we must have μ ∈ B 1 G \ B 2 OPT B 3 G . Thus, μ ∈ B 1 G and for every μ ∈ B 2 OPT B 3 G we have μ ∼ μ . Since μ ∈ B 1 G , to show μ ∈ P G it suffices to verify that μ is not compatible with any μ ∈ ((B 2 ∪ B 1 ) OPT B 3 ) FILTER R G , for which we distinguish the following two cases.
For the backward direction, suppose dom(μ) = vars(B 1 ) and μ ∈ P G . Again, To show μ ∈ P G it suffices to verify that μ is not compatible with any μ ∈ B 2 OPT B 3 G . Assume for the sake of contradiction that this is not the case and there is a compatible μ . We distinguish the following two cases.
Theorems 2 and 3 suggest that P wwd is a maximal fragment of P that does not impose structural restrictions on basic patterns or filter constraints and has a coNP evaluation algorithm (assuming coNP ≠ Π^p_2). Hence, going beyond wwd-patterns while preserving good computational properties requires more refined restrictions, possibly in the spirit of [27, Section 4].

Expressivity of wwd-Patterns and Their Extensions
In this section, we analyse the expressive power of our fragments.

Definition 9
A language L1 is strictly less expressive than a language L2 (written L1 < L2) if for every query Q1 in L1 there is a query Q2 in L2 such that Q1 ≡ Q2, and there is a query Q2 in L2 such that Q1 ≢ Q2 for every query Q1 in L1.
We begin with UNION-free patterns, establishing that P wd < P wwd < P, and then proceed to unions, showing that U wd < U wwd < U , and queries, showing that S wd < S wwd < S.
Following [5,33], a set of mappings Ω1 is subsumed by a set of mappings Ω2 (written Ω1 ⊑ Ω2) if for every μ1 ∈ Ω1 there exists a mapping μ2 ∈ Ω2 such that μ1 ⊑ μ2. A query Q is weakly monotone if ⟦Q⟧_{G1} ⊑ ⟦Q⟧_{G2} for any two graphs G1 and G2 with G1 ⊆ G2, and a fragment L is weakly monotone if it contains only weakly monotone queries. Arenas and Pérez [5] showed that, unlike P, the fragment P wd is weakly monotone, and hence P wd < P.
Analogously, we show that P wd < P wwd by observing that P wwd is not weakly monotone.
To distinguish P wwd from P we need a different property.

Definition 10
A query Q is non-reducing if for any two graphs G1, G2 such that G1 ⊆ G2 and any mapping μ1 ∈ ⟦Q⟧_{G1} there is no μ2 ∈ ⟦Q⟧_{G2} such that μ2 ⊏ μ1 (i.e., μ2 ⊑ μ1 and μ2 ≠ μ1). A fragment is non-reducing if it contains only non-reducing queries.
Intuitively, for a non-reducing query extending a graph cannot result in a previously bound answer variable becoming unbound. All weakly monotone queries are non-reducing but not vice versa. Moreover, all wwd-patterns are non-reducing.
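The difference between weak monotonicity and non-reducibility can be illustrated directly on answer sets. In the Python sketch below, `before` and `after` stand for the answers of one query over graphs G1 ⊆ G2; the answer sets are hypothetical and hand-picked for the illustration, and the function names are our own.

```python
def subsumed(m1, m2):
    """m1 is subsumed by m2: m2 is defined on dom(m1) and agrees with m1 there."""
    return all(v in m2 and m2[v] == val for v, val in m1.items())

def set_subsumed(omega1, omega2):
    """Every mapping of omega1 is subsumed by some mapping of omega2."""
    return all(any(subsumed(m1, m2) for m2 in omega2) for m1 in omega1)

def reducing_pair(before, after):
    """True if some answer in `after` is strictly subsumed by one in `before`."""
    return any(subsumed(m2, m1) and m2 != m1
               for m1 in before for m2 in after)

# Answers only grow: compatible with weak monotonicity and non-reducibility.
grow = ([{"?x": 1}], [{"?x": 1, "?y": 2}])
# An answer disappears: violates weak monotonicity, yet nothing is reduced.
vanish = ([{"?x": 1, "?y": 3}], [])
# A bound variable becomes unbound: violates both properties.
shrink = ([{"?x": 1, "?y": 3}], [{"?x": 1}])

for name, (before, after) in [("grow", grow), ("vanish", vanish), ("shrink", shrink)]:
    print(name, set_subsumed(before, after), reducing_pair(before, after))
# grow True False
# vanish False False
# shrink False True
```

The "vanish" pair shows the extra freedom that non-reducing queries have over weakly monotone ones: an answer may be lost when the graph grows, as long as no answer loses a binding.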

Proposition 8 Fragment P wwd is non-reducing.
Proof Let P ∈ P wwd and let G1, G2 be two graphs such that G1 ⊆ G2. We show that μ2 ⊏ μ1 holds for no μ1 ∈ ⟦P⟧_{G1} and μ2 ∈ ⟦P⟧_{G2} by induction on the structure of P, proving, in parallel, that if all filters in P are over basic patterns, then for every mapping μ1 ∈ ⟦P⟧_{G1} there is a mapping μ2 ∈ ⟦P⟧_{G2} such that μ1|vars(v) = μ2|vars(v) for v the root of T(P).
For the base case, suppose P = B FILTER R for some basic pattern B and filter constraint R. Then, P is monotone in the sense of [5], that is, satisfies P G 1 ⊆ P G 2 . Moreover, P contains no OPT, and hence every two distinct mappings in P G 2 have the same domain and are thus incompatible. These facts imply both claims.
For the inductive step, suppose first that P = P 1 OPT P 2 and both claims hold for P 1 and P 2 . Let μ 1 ∈ P G 1 . We first prove that μ 2 μ 1 for any μ 2 ∈ P G 2 . We distinguish two cases.
Suppose now that all filters in P are over basic patterns. We need to prove that there is . We know that μ 1 extends some μ 1 ∈ P 1 G 1 . Thus, by the inductive hypothesis, there is some μ 2 ∈ P 1 G 2 that coincides with μ 1 on the variables in the root of T (P 1 ). The claim follows since μ 2 can be extended to a mapping μ 2 for P that coincides with μ 2 on the variables in the root of T (P 1 ), and, by construction, the root of T (P 1 ) and the root of T (P ) have the same label. Consider now the inductive step for the case when P = P 1 FILTER R. Since P 1 is not a basic pattern, we only need to show that μ 2 μ 1 for any μ 1 ∈ P G 1 and μ 2 ∈ P G 2 . This holds by the inductive hypothesis, because μ 1 ∈ P 1 G 1 and μ 2 ∈ P 1 G 2 for any such μ 1 and μ 2 .
In contrast to Proposition 8, patterns in P do not generally satisfy non-reducibility. For instance, consider again pattern P, graphs G1, G2, and mappings μ1, μ2 from Example 4. Pattern P is not non-reducing since μ1 ∈ ⟦P⟧_{G1} and μ2 ∈ ⟦P⟧_{G2} but μ2 ⊏ μ1. Therefore, we have the following theorem.

Theorem 4 It holds that P wd < P wwd < P.
We next compare U wwd to U wd and U , as well as S wwd to S wd and S (note that neither UNION nor projection via SELECT can be expressed by means of the other operators [40], so adding either construct makes each fragment strictly more expressive). It is easily seen that U wd and S wd inherit weak monotonicity from P wd [27,33], and hence U wd < U wwd and S wd < S wwd . Non-reducibility, however, does not propagate to unions.
We have μ1 ∈ ⟦P⟧_{G1} and μ2 ∈ ⟦P⟧_{G2} but μ2 ⊏ μ1, which is due to the fact that μ2 is already contained in ⟦P⟧_{G1} along with μ1. This is only possible in the presence of UNION since all mappings in the evaluation of a UNION-free pattern are mutually non-subsuming (see Lemma 2).
Thus, to account for UNION, we introduce the following, more delicate property.

Definition 11
A query Q is extension-witnessing (e-witnessing) if for any two graphs G1 ⊆ G2 and mapping μ ∈ ⟦Q⟧_{G2} such that μ ∉ ⟦Q⟧_{G1} there is a triple t in Q such that vars(t) ⊆ dom(μ) and μ(t) ∈ G2 \ G1. A fragment is e-witnessing if so are all of its queries.
Informally, a query Q is e-witnessing if whenever an extension of a graph leads to a new answer, this answer is justified by a triple pattern in Q which maps to the extension. Unions of wwd-patterns can be shown e-witnessing.
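The witness required by this property is easy to search for once the graphs and the new answer are given. The following Python sketch is a hypothetical helper written for illustration: it takes the triple patterns of a query, a mapping μ, and graphs G1 ⊆ G2, and returns a triple pattern t with vars(t) ⊆ dom(μ) whose image lies in G2 but not in G1, if one exists. The inputs are hand-picked: the triples are those of the pattern (((?x, a, 1) OPT (?y, a, 2)) FILTER ¬bound(?y)) OPT (?y, a, 3), and μ = {?x → 1, ?y → 3} is, under the standard semantics, an answer over G2 but not over G1.

```python
def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def witness_triple(query_triples, mu, g1, g2):
    """Return a triple pattern t of the query such that all variables of t
    are bound by mu and mu(t) is in g2 but not in g1; None if there is none."""
    for t in query_triples:
        if all(term in mu for term in t if is_var(term)):
            image = tuple(mu[term] if is_var(term) else term for term in t)
            if image in g2 and image not in g1:
                return t
    return None

triples = [("?x", "a", 1), ("?y", "a", 2), ("?y", "a", 3)]
g1 = {(1, "a", 1)}
g2 = {(1, "a", 1), (3, "a", 3)}
mu = {"?x": 1, "?y": 3}

print(witness_triple(triples, mu, g1, g2))   # ('?y', 'a', 3)
```

Here the new answer is justified by the triple pattern (?y, a, 3), whose image (3, a, 3) belongs to the extension G2 \ G1.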

Proposition 9 Fragment U wwd is e-witnessing.
Proof Let P ∈ U wwd and let G1, G2 be graphs such that G1 ⊆ G2. Let μ be a mapping in ⟦P⟧_{G2} but not in ⟦P⟧_{G1}. We show that there is some t ∈ triples(P) such that μ(t) ∈ G2 \ G1.
Since P is a union of wwd-patterns, there is some wwd-pattern P′ in the union such that μ ∈ ⟦P′⟧_{G2}. It suffices to show μ(triples(P′μ)) ∩ (G2 \ G1) ≠ ∅, where P′μ is the pattern corresponding to the maximal r-subtree of P′ witnessing μ in G2 (i.e., the part of P′ in the image of μ, see Definition 8). We know that μ(triples(P′μ)) ⊆ G2. Assume, for contradiction, that μ(triples(P′μ)) ⊆ G1. Then μ is a pp-solution to P′ over G1. We next show that μ is a real solution to P′ over G1. By Lemma 3, it suffices to show that (a) for any child u of T(P′μ) labelled with (B, R), there is no mapping μ′ such that μ|u ⊑ μ′, μ′ |= R, and μ′(B) ⊆ G1, and (b) μ|s |= R for any special node s in T(P′) labelled with R. Claim (a) holds since μ ∈ ⟦P′⟧_{G2} and G1 ⊆ G2 while (b) holds since μ ∈ ⟦P′⟧_{G2} and the claim does not depend on the graph over which the evaluation is computed. Consequently, μ ∈ ⟦P′⟧_{G1}, and hence μ ∈ ⟦P⟧_{G1}, in contradiction to the assumption.
On the other hand, U is not e-witnessing, as can be seen on the pattern and graphs in Example 4. Hence, we obtain the following theorem.

Theorem 5 It holds that U wd < U wwd < U.
Next we move to the fragments that allow for projection. As already mentioned, we have S wd < S wwd since S wd is weakly monotone while S wwd is not. However, S wwd is not e-witnessing, so we cannot apply the technique of Theorem 5 to establish S wwd < S; instead, we make use of the following lemma.

Lemma 4 Let Q be a query in S wwd and G be a graph. For every graph G1 with G ⊆ G1 and every μ ∈ ⟦Q⟧_{G1}, there is a graph G2 with G ⊆ G2 such that μ ∈ ⟦Q⟧_{G2} and |G2| ≤ |G| + |triples(Q)|.
Proof Let Q = SELECT X WHERE P, for P a union of wwd-patterns, and let G, G1 and μ be as required. Then there is a wwd-pattern P′ in the union P such that μ′ ∈ ⟦P′⟧_{G1} for some μ′ with μ′|X = μ. Let G2 = G ∪ μ′(triples(P′μ′)). Clearly, |G2| ≤ |G| + |triples(Q)|, so it suffices to show that μ′ ∈ ⟦P′⟧_{G2}.
By construction, μ′ is a pp-solution to P′ over G2. Moreover, since μ′ is a solution to P′ over G1, we have that μ′|s |= R for every special node s in T(P′) labelled with R. Finally, suppose for contradiction that there is a child v of T(P′μ′) labelled with (B, R) and a mapping μ″ such that μ′|v ⊑ μ″, μ″ |= R, and μ″(B) ⊆ G2. However, since G2 ⊆ G1, we then have μ″(B) ⊆ G1, which contradicts the fact that μ′ ∈ ⟦P′⟧_{G1}.
This lemma is the base of the last result of the section.
Theorem 6 It holds that S wd < S wwd < S.
Proof As observed before, the inclusion S wd < S wwd holds since S wd is weakly monotone [27,33] and S wwd is not.
By equivalence (5) in Section 3, the operator DIFF can be expressed via OPT, AND and FILTER, so we can assume that Q ∈ S. On the other hand, it is easily seen that Q ∉ S wwd. The mapping μ = {?x → a} is an answer to Q over G 1 but not an answer over any G n with n ≥ 2. Moreover, it is easily seen that any extension G of G n such that μ ∈ ⟦Q⟧_G requires the addition of at least n − 1 triples, namely {(d 2 , c, c), . . . , (d n , c, c)}. Consequently, μ ∈ ⟦Q⟧_G implies |G| ≥ |G 1 | + n − 1.
Suppose for contradiction there is a query Q′ in S wwd such that Q′ ≡ Q. Let n = |triples(Q′)| + 2. Then, by Lemma 4, μ ∈ ⟦Q′⟧_G for some G with |G| ≤ |G n | + |triples(Q′)| = |G n | + n − 2, which contradicts the above observation for Q.

Static Analysis of wwd-Patterns
In this section, we look at the general static analysis problems of query equivalence and containment. Formally, equivalence for a language L is defined as follows.

EQUIVALENCE(L)
Input: Queries Q and Q′ in L.
Question: Is Q ≡ Q′?
This problem is commonly generalised to CONTAINMENT(L), in which one checks whether Q is contained in Q′, written Q ⊆ Q′, that is, whether ⟦Q⟧_G ⊆ ⟦Q′⟧_G holds for every graph G. We have Q ≡ Q′ if and only if Q and Q′ contain each other. These problems have been studied for FILTER-free wd-patterns in [27,35], establishing NP-completeness of equivalence and containment. Moreover, both problems are Π^p_2-complete for unions of FILTER-free wd-patterns, and undecidable for fragments with projection. Finally, from the results in [41] it follows that containment is undecidable for U. On the other hand, nothing seems to be known so far for well-designed patterns with FILTER.
We next show that equivalence and containment are both Π^p_2-complete for P wwd and U wwd (whereas they are undecidable for S wwd by the results in [35]). As the following lemma shows, the upper bound for containment follows from a small counterexample property: if P ⊈ P′ for some P and P′ from U wwd, then there is a witnessing mapping and graph of size O(|P| + |P′|). Given this property, a Π^p_2 algorithm for containment is straightforward: we guess a mapping μ and a graph G of linear size, check that μ ∉ ⟦P′⟧_G, and then call a coNP oracle for checking μ ∈ ⟦P⟧_G. As a corollary, EQUIVALENCE(U wwd) is also in Π^p_2.

Lemma 5
Let P and P′ be two patterns from U wwd. If P ⊈ P′ then there exists a mapping μ and a graph G of size O(|triples(P)| + |triples(P′)|) such that μ ∈ ⟦P⟧_G but μ ∉ ⟦P′⟧_G.
Proof Without loss of generality, let us assume that P = P1 UNION . . . UNION Pn and P′ = P′1 UNION . . . UNION P′m, where all Pi and P′j are wwd-patterns in OF-normal form.
Since P ⊈ P′, there exists a graph G′, a mapping μ, and a pattern Pi, 1 ≤ i ≤ n, such that μ ∈ ⟦Pi⟧_{G′}, but μ ∉ ⟦P′j⟧_{G′} for every P′j. Hence, μ is a pp-solution to Pi over G′ with corresponding r-subtree T((Pi)μ) of the CPT T(Pi). Let G0 = μ(triples((Pi)μ)). By construction, we have that G0 ⊆ G′ and |G0| ≤ |triples((Pi)μ)| ≤ |triples(Pi)| ≤ |triples(P)|. Moreover, μ ∈ ⟦Pi⟧_{G0}, because all the matches and constraints, including the ones on the top level, stay unchanged. In fact, μ ∈ ⟦Pi⟧_{G″} for any G″ such that G0 ⊆ G″ ⊆ G′.
If μ / ∈ P j G 0 for every j , then G 0 satisfies all the properties required from G. Otherwise, there exists P j among P 1 , . . . , P m such that μ ∈ P j G 0 . Since G 0 ⊆ G , μ is a pp-solution to P j over G . Consider the corresponding pattern (P j ) μ (i.e., the maximal pattern witnessing μ in G obtained from P j by dropping the right arguments of some OPT operators), the r-subtree T ((P j ) μ ) of the CPT T (P j ), and the "image" μ(triples((P j ) μ )). Note that we may have μ(triples((P j ) μ )) ⊆ G 0 or not: the latter is possible because the maximal r-subtree of T (P j ) witnessing μ in G 0 may be different from T ((P j ) μ ), which is maximal in G . Let G 1 = G 0 ∪ μ(triples((P j ) μ )). We define G 1 depending on whether μ ∈ P j G 1 or not. If μ / ∈ P j G 1 , then let G 1 = G 1 . Otherwise, since μ / ∈ P j G by assumption, there exists a child v of T ((P j ) μ ) and a mapping μ 0 such that μ| v μ 0 and μ 0 (triples(v)) ⊆ G . Then the graph G 1 = G 1 ∪ μ 0 (triples(v)) is such that μ / ∈ P j G 1 . In either case, μ ∈ P i G 1 because G 0 ⊆ G 1 ⊆ G . Moreover, we have μ / ∈ P j G for every G such that G 1 ⊆ G ⊆ G . To see this, suppose for contradiction that μ ∈ P j G for a graph G as above. Then there must be a child v of T ((P j ) μ 0 ) such that v ≺ v, μ o | v μ and μ(triples(v )) ⊆ G . Since T ((P j ) μ 0 ) and T ((P j ) μ ) are identical restricted to nodes preceding v with respect to ≺, v is a child of T ((P j ) μ ). Thus, v is not contained in T ((P j ) μ ), which contradicts maximality of T ((P j ) μ ) since μ(triples(v )) ⊆ G ⊆ G .
If $\mu \notin \llbracket P'_j \rrbracket_{G_1}$ for all other $j$ as well, then $G_1$ satisfies all the properties required from $G$. Otherwise we can extend $G_1$ to a graph $G_2$ on the basis of some other $P'_j$ with $\mu \in \llbracket P'_j \rrbracket_{G_1}$ in the same way as $G_1$ extends $G_0$. We then have $G_2 \subseteq G'$, $\mu \in \llbracket P \rrbracket_{G_2}$, and $\mu \notin \llbracket P'_j \rrbracket_{G_2}$ for the $j$ from both steps. Repeating the extension step until there are no $P'_j$ having $\mu$ as a solution on the resulting graph, we obtain a graph that satisfies all the properties required from $G$; in particular, for each $j$ the number of triples added to the graph is bounded by $|\mathrm{triples}(P'_j)|$.
Hardness of equivalence is established in the following lemma by a reduction of ∀∃3SAT, while containment is $\Pi^p_2$-hard by the results in [35]. Note that both results hold even for fragments without FILTER.
Proof We proceed by reduction of the ∀∃3SAT problem, that is, the problem of checking whether a formula of the form

$\forall \bar{x}\, \exists \bar{y}\; \psi \qquad (14)$

holds for a conjunction $\psi$ of clauses $t_1 \vee t_2 \vee t_3$ with $t_i$ propositional literals, that is, propositional variables from $\bar{x} \cup \bar{y}$ or their negations. Without loss of generality, we assume that $\psi$ contains no tautologous clauses and no clauses with duplicate literals. Let $\phi$ be a formula of the form (14). Starting from $\phi$, we construct FILTER-free wwd-patterns $P$ and $P'$ in OF-normal form, and then show that $\phi$ is true if and only if $P \equiv P'$.

Let $\bar{x} = x_1, \ldots, x_n$ and $\bar{y} = y_1, \ldots, y_m$. For each clause $\gamma = t_1 \vee t_2 \vee t_3$, there are exactly 7 assignments to the variables in $t_1, t_2, t_3$ making $\gamma$ true, and exactly one assignment making $\gamma$ false (since $\gamma$ is assumed to be non-tautologous and to contain no duplicate literals). Let, for each such $\gamma$ in $\psi$, each $\ell$, $1 \le \ell \le 7$, and each $j$, $1 \le j \le 3$, $val(\gamma, \ell, j) = \top$ if the variable of literal $t_j$ evaluates to true in the $\ell$'th assignment making $\gamma$ true, and $val(\gamma, \ell, j) = \bot$ otherwise; here $\top$ and $\bot$ are fresh IRIs. Let also, for every clause $\gamma$ in $\psi$, $cl^1_\gamma, \ldots, cl^7_\gamma$ and $lit^1_\gamma, lit^2_\gamma, lit^3_\gamma$ be fresh IRIs. We define, for each $\gamma$ and $1 \le \ell \le 7$, a basic pattern

$B^\ell_\gamma = \{(cl^\ell_\gamma, lit^1_\gamma, val(\gamma, \ell, 1)),\ (cl^\ell_\gamma, lit^2_\gamma, val(\gamma, \ell, 2)),\ (cl^\ell_\gamma, lit^3_\gamma, val(\gamma, \ell, 3))\}$,

and a basic pattern $B^*_\gamma = B^1_\gamma \cup \cdots \cup B^7_\gamma$ (note that these patterns do not have any variables).
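The counting step above can be made concrete with a small sketch. The following Python fragment enumerates the seven satisfying assignments of one clause and builds the corresponding ground patterns $B^\ell_\gamma$ and their union $B^*_\gamma$. The IRI spellings (`cl_1_g1`, `lit_1_g1`, `top`, `bot`), the function name `clause_patterns`, and the order in which the assignments are numbered are assumptions of this sketch; the construction itself only requires fresh IRIs and some fixed numbering.

```python
from itertools import product

TOP, BOT = "top", "bot"  # stand-ins for the fresh IRIs written ⊤ and ⊥ in the text


def clause_patterns(name, literals):
    """Build the seven ground basic patterns for one clause.

    `literals` is a list of three (variable, is_positive) pairs; the clause is
    assumed to mention three distinct variables (no tautology, no duplicates).
    """
    variables = [var for var, _ in literals]
    patterns = []
    for values in product([True, False], repeat=3):
        assignment = dict(zip(variables, values))
        # Keep only the assignments that make at least one literal true.
        if any(assignment[var] == positive for var, positive in literals):
            ell = len(patterns) + 1  # numbering of assignments: an assumption
            patterns.append({
                (f"cl_{ell}_{name}", f"lit_{j}_{name}",
                 TOP if assignment[var] else BOT)  # value of the variable, not the literal
                for j, (var, _) in enumerate(literals, start=1)
            })
    return patterns


# Clause g1 = (not x1) or y1 or y2
patterns = clause_patterns("g1", [("x1", False), ("y1", True), ("y2", True)])
b_star = set().union(*patterns)
print(len(patterns), len(b_star))
```

The only assignment left out is the falsifying one (here $x_1$ true, $y_1$ and $y_2$ false), so a match into $B^*_\gamma$ that uses a single $cl^\ell_\gamma$ for all three literal positions encodes a satisfying assignment of the clause.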
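For example, a visualisation of these patterns for the example formula $\forall x_1, x_2\, \exists y_1, y_2\; \gamma_1 \wedge \gamma_2$ with $\gamma_1 = \neg x_1 \vee y_1 \vee y_2$ and $\gamma_2 = \neg y_1 \vee \neg y_2 \vee x_2$ is shown in Fig. 6. Finally, let be two FILTER-free wwd-patterns in OF-normal form. We next show that $\phi$ is true if and only if $P$ is equivalent to $P'$, starting with the forward direction.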
Let $\phi$ be true, yet, for the sake of contradiction, suppose that $P$ is not equivalent to $P'$. Then there is a graph $G$ and a mapping $\mu$ such that $\mu \in \llbracket P \rrbracket_G$, but $\mu \notin \llbracket P' \rrbracket_G$. Since patterns $P$ and $P'$ have the same root $B_{base}$, which contains $?u$ as the only variable, we conclude that $?u \in \mathrm{dom}(\mu)$. Each $?x_i$ is also in $\mathrm{dom}(\mu)$ by the construction of $P$, since there is a homomorphism from the corresponding leaf $B_i$ to the root $B_{base}$. However, it is not necessary that $(\mu(?x_i), iri_{x_i}, \top)$ is in $G$, because if $G$ contains a triple of the form $(c, iri_{x_i}, \bot)$ for some IRI $c$, we will have $(\mu(?x_i), iri_{x_i}, \bot) \in G$. Note also that nothing prevents $G$ from containing both a triple $(c, iri_{x_i}, \bot)$ and a triple $(c, iri_{x_i}, \top)$ for some $i$. Depending on whether $?s \in \mathrm{dom}(\mu)$ or not, we have two cases.
[Fig. 6, panel (e): $B_{valid}$, used in the proof of Lemma 6 on the example formula $\forall x_1, x_2\, \exists y_1, y_2\; \gamma_1 \wedge \gamma_2$ with $\gamma_1 = \neg x_1 \vee y_1 \vee y_2$ and $\gamma_2 = \neg y_1 \vee \neg y_2 \vee x_2$]

Case 1 Let $?s \in \mathrm{dom}(\mu)$, that is, there is a homomorphism from $B_\psi$ to $G$ that aligns with the previous assignment of all $?x_i$. In particular, this means that $\mathrm{dom}(\mu) = \mathrm{vars}(P) = \mathrm{vars}(P')$. If there is no homomorphism from $B_{valid}$ to $G$, then $\mu \in \llbracket P' \rrbracket_G$, because $B_\psi$ is the last leaf of $P'$ as well, and nothing prevents it from matching. But this contradicts the assumption. However, even if there is a homomorphism $h$ from $B_{valid}$ to $G$, we still have a contradiction because $\mu \in \llbracket P' \rrbracket_G$ still holds. Indeed, $?s$ is the only variable in $B_{valid}$ and is essentially isolated in $B_{valid}$, so if $h$ is a homomorphism from $B_{valid}$ to $G$, then $h'$, which maps $?s$ to $\mu(?s)$, is also such a homomorphism (in other words, since by the assumptions of this case we know that $(?s, r, ?s)$ has a match in $G$, the existence of $h$ just means that all the ground triples of $B_{valid}$ are in $G$). This means, however, that nothing prevents $B_\psi$ from matching in $P'$, implying $\mu \in \llbracket P' \rrbracket_G$.

Case 2 Let $?s \notin \mathrm{dom}(\mu)$. Since $\mu \notin \llbracket P' \rrbracket_G$, there is no homomorphism from $B_\psi$ to $G$ but there is one from $B_{valid}$ to $G$, that is, all ground triples of $B_{valid}$ are in $G$ (the non-existence of a homomorphism from $B_\psi$ to $G$ is immediate since $?s \notin \mathrm{dom}(\mu)$ and $\mu \in \llbracket P \rrbracket_G$; the existence of a homomorphism from $B_{valid}$ to $G$ then follows since otherwise we would have $\mu \in \llbracket P' \rrbracket_G$). Consider now a truth assignment $\alpha$ of the variables $\bar{x}$ such that if $\alpha(x_i)$ is true then $(\mu(?x_i), iri_{x_i}, \top) \in G$ and if $\alpha(x_i)$ is false then $(\mu(?x_i), iri_{x_i}, \bot) \in G$ (as we mentioned earlier, $\alpha$ may not be unique, but the argument does not depend on its uniqueness). Since $\phi$ is true, we know that $\alpha$ can be extended to the variables $\bar{y}$ in such a way that each clause in $\psi$ holds.
Let $\alpha'$ be such an extension, and let $\mu'$ be an extension of $\mu$ to the variables in $B_\psi$ such that, for all $j$, $\mu'(?y_j) = cy_\top$ if $\alpha'(y_j)$ is true and $\mu'(?y_j) = cy_\bot$ otherwise. Then, for every clause $\gamma$ in $\psi$, the IRIs $\mu'(?v^1_\gamma)$, $\mu'(?v^2_\gamma)$, $\mu'(?v^3_\gamma)$ correspond to the values $\alpha'(z_1)$, $\alpha'(z_2)$, $\alpha'(z_3)$, respectively, where $z_1, z_2, z_3$ are the variables in the literals $t_1, t_2, t_3$ of $\gamma$. Moreover, $\mu'(?c_\gamma) = cl^\ell_\gamma$ for the number $\ell$ of the assignment $\alpha'(z_1), \alpha'(z_2), \alpha'(z_3)$; this assignment makes $\gamma$ true by the choice of $\alpha'$ (in other words, for every $\gamma$ there is some $\ell$ such that $(\mu'(?c_\gamma), lit^i_\gamma, \mu'(?v^i_\gamma)) \in B^\ell_\gamma$ for all $1 \le i \le 3$). Hence, the extension $\mu'$ is contained in $\llbracket P \rrbracket_G$, and hence $\mu \notin \llbracket P \rrbracket_G$, which, however, contradicts the original assumption.
Since both cases yield a contradiction, we conclude that $P$ is equivalent to $P'$. We continue with the backward direction of the equivalence. Suppose that $P \equiv P'$, yet, for the sake of contradiction, $\phi$ is false. Then, there is a truth assignment $\alpha$ of the variables $\bar{x}$ such that for each extension $\alpha'$ of $\alpha$ to the variables $\bar{y}$ there is a clause $\gamma$ in $\psi$ that evaluates to false under $\alpha'$. Fix such an $\alpha$ and consider the graph $G$ consisting of the following triples:
- the triple $(u, iri_{x_i}, \top)$, for each $x_i$ in $\bar{x}$ and a fresh IRI $u$;
- the triple $(c_{x_i}, iri_{x_i}, \top)$, for each $x_i$ in $\bar{x}$ with $\alpha(x_i)$ true, and the triple $(c_{x_i}, iri_{x_i}, \bot)$, for each $x_i$ in $\bar{x}$ with $\alpha(x_i)$ false, where $c_{x_1}, \ldots, c_{x_n}$ are fresh IRIs;
- all ground triples from $B_{valid}$, that is, all of its triples except $(?s, r, ?s)$;
- the triple $(s, r, s)$ for a fresh IRI $s$.
Consider also the mapping $\mu$ such that

Clearly, $\mu$ is a pp-solution to both $P$ and $P'$ over $G$. However, by the construction of $G$, the mapping $\mu' = \mu \cup \{?s \mapsto s\}$ is a pp-solution to $P'$ over $G$ as well. Thus, $\mu$ is not a solution to $P'$ over $G$, and, since $P \subseteq P'$, we also have $\mu \notin \llbracket P \rrbracket_G$. Consequently, it must be possible to further extend $\mu'$ to a mapping $\mu''$ that is both in $\llbracket P \rrbracket_G$ and in $\llbracket P' \rrbracket_G$, and is defined on all variables in $B_\psi$. Essentially, this means that there is a homomorphism from $B_\psi$ to $B_{valid}$ that preserves $?s$. Consider now the extension $\alpha'$ of $\alpha$ to the variables $\bar{y}$ such that $\mu''(?y_j) = cy_\top$ if $\alpha'(y_j)$ is true and $\mu''(?y_j) = cy_\bot$ otherwise. By the construction of $B_{valid}$, this assignment validates all clauses in $\psi$, which, however, contradicts the assumption that $\phi$ is false.
Thus, we have shown that $\phi$ is true if and only if $P \equiv P'$.

Theorem 7
Problems EQUIVALENCE($L$) and CONTAINMENT($L$) are both $\Pi^p_2$-complete for any $L \in \{P_{wwd}, U_{wwd}\}$.
Proof The existence of a $\Pi^p_2$ algorithm for containment immediately follows from Lemma 5: to show that $P \not\subseteq P'$, for $P, P' \in P_{wwd}$, we just need to guess, in NP, a graph $G$ of linear size as well as a mapping $\mu$, check that $\mu \notin \llbracket P' \rrbracket_G$, and then call a coNP oracle to check that $\mu \in \llbracket P \rrbracket_G$. The claim for patterns in $U_{wwd}$ is similar, but involves guessing a disjunct $P_1$ of $P$ with $\mu \in \llbracket P_1 \rrbracket_G$ and checking $\mu \notin \llbracket P'_1 \rrbracket_G$ for every disjunct $P'_1$ of $P'$. Since $P \equiv P'$ if and only if containment holds in both directions, the problem EQUIVALENCE($U_{wwd}$) is also in $\Pi^p_2$.
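The "check" half of this guess-and-check argument can be illustrated on toy inputs. The Python sketch below verifies that a given pair of a graph and a mapping witnesses non-containment between two UNION- and FILTER-free patterns. It evaluates the patterns directly under the usual set semantics of AND and OPT, which takes exponential time in general and merely stands in for the NP check and the coNP oracle; the tuple encoding of patterns and all names (`evaluate`, `is_counterexample`, the IRIs `a`, `b`, `c`, `p`, `q`) are assumptions of the sketch.

```python
def compatible(m1, m2):
    """Two mappings are compatible if they agree on their shared variables."""
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())


def eval_bgp(triple_patterns, graph):
    """All mappings sending every triple pattern to a triple of the graph."""
    solutions = [{}]
    for pattern in triple_patterns:
        extended = []
        for mu in solutions:
            for triple in graph:
                candidate, ok = dict(mu), True
                for p, t in zip(pattern, triple):
                    ok = (candidate.setdefault(p, t) == t) if p.startswith("?") else (p == t)
                    if not ok:
                        break
                if ok:
                    extended.append(candidate)
        solutions = extended
    return solutions


def evaluate(pattern, graph):
    """Evaluate ("BGP", triples), ("AND", P1, P2) or ("OPT", P1, P2)."""
    if pattern[0] == "BGP":
        return eval_bgp(pattern[1], graph)
    left, right = evaluate(pattern[1], graph), evaluate(pattern[2], graph)
    if pattern[0] == "AND":
        return [{**a, **b} for a in left for b in right if compatible(a, b)]
    result = []  # OPT: keep a itself only if no compatible extension exists
    for a in left:
        result.extend([{**a, **b} for b in right if compatible(a, b)] or [a])
    return result


def is_counterexample(p, p_prime, graph, mu):
    """True if (graph, mu) witnesses that p is not contained in p_prime."""
    return mu in evaluate(p, graph) and mu not in evaluate(p_prime, graph)


P = ("BGP", [("?x", "p", "?y")])
P_PRIME = ("OPT", P, ("BGP", [("?y", "q", "?z")]))
G = [("a", "p", "b"), ("b", "q", "c")]
G0 = [("a", "p", "b")]
mu = {"?x": "a", "?y": "b"}
print(is_counterexample(P, P_PRIME, G, mu), is_counterexample(P, P_PRIME, G0, mu))
```

On `G` the optional part of `P_PRIME` matches, so `mu` is an answer to `P` but not to `P_PRIME`; on the smaller graph `G0` it is an answer to both and hence no witness.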
Hardness follows by the results in [35] for containment and by Lemma 6 for equivalence.
Hence, for UNION- and FILTER-free patterns, the step from well-designed to weakly well-designed OPT incurs a complexity jump for containment and equivalence. However, for the fragments with UNION or projection, complexity remains the same in both cases. As far as we are aware, these are the first decidability results on query equivalence and related problems for SPARQL fragments with OPT and FILTER.

Analysis of DBpedia Logs
In this section, we present an analysis of query logs over DBpedia, which suggests that the step from wd-patterns to wwd-patterns makes a dramatic difference in real life: while only about half of the queries with OPT have well-designed patterns, almost all of these patterns fall into the weakly well-designed fragment.
DBpedia [26] is a project providing access to RDF data extracted from Wikipedia via a SPARQL endpoint. DBpedia query logs are well suited for analysing the structure of real-life SPARQL queries as they contain a large amount of general-purpose knowledge base queries, generated both manually and automatically. DBpedia query logs have been analysed by Picalausa and Vansummeren [34], who reported that, over a period in 2010, about 46.38% of a total of 1344K distinct DBpedia queries used OPT. However, only 47.80% of the queries with OPT had well-designed patterns. Another analysis of DBpedia logs from the USEWOD 2011 data set performed by Arias Gallego et al. [7] concluded that 16.61% of about 5166K queries contained OPT; however, detailed structure of queries was not analysed.
We considered query logs over DBpedia 3.9 from USEWOD 2015 [30] and USEWOD 2016 [29]. The USEWOD 2015 DBpedia dataset is a random selection of almost 14M queries from the first half of 2014, while the USEWOD 2016 dataset contains 35M queries from the second half of 2015. We removed syntactically incorrect queries as well as queries outside of S (in particular, queries using operators specific to SPARQL 1.1). Also, we rewrote the patterns of the remaining queries to unions of UNION-free patterns as proposed in [33] and eliminated duplicates, which left us with 6.6M queries in USEWOD 2015 and 9.1M queries in USEWOD 2016. Finally, we isolated queries involving OPT and counted how many of their patterns were in $U_{wd}$ and in $U_{wwd}$.
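As an illustration of the last counting step, the following Python sketch tests the standard well-designedness condition on a UNION- and FILTER-free pattern: for every subpattern $P_1$ OPT $P_2$, each variable of $P_2$ that also occurs outside the subpattern must occur in $P_1$. This is a minimal sketch and not the tool used for the analysis; it does not cover filter safety or the weakly well-designed condition, and the tuple encoding of patterns and the names `variables` and `is_well_designed` are assumptions.

```python
def variables(pattern):
    """All variables (terms starting with '?') occurring in a pattern."""
    if pattern[0] == "BGP":
        return {term for triple in pattern[1] for term in triple
                if term.startswith("?")}
    return variables(pattern[1]) | variables(pattern[2])


def is_well_designed(pattern, outside=frozenset()):
    """Check the OPT condition; `outside` holds the variables that occur in
    the overall pattern outside the subpattern currently being inspected."""
    if pattern[0] == "BGP":
        return True
    op, left, right = pattern
    # Variables of the optional side that leak outside without being guarded.
    if op == "OPT" and (variables(right) & outside) - variables(left):
        return False
    return (is_well_designed(left, outside | variables(right))
            and is_well_designed(right, outside | variables(left)))


optional_type = ("OPT",
                 ("BGP", [("?s", "label", "?l")]),
                 ("BGP", [("?s", "type", "?t")]))
# Joining on ?t, which is bound only by the optional side, breaks the condition.
join_on_optional = ("AND", optional_type,
                    ("BGP", [("?t", "subClassOf", "?u")]))

print(is_well_designed(optional_type), is_well_designed(join_on_optional))
```

A log analysis along these lines would apply such a test to each UNION-free disjunct obtained after normalisation and count a query as well-designed only if all of its disjuncts pass.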
The results are given in Table 1. They confirm that a non-negligible number of DBpedia queries use OPT (over 17%). However, far from all queries with OPT are well-designed (only about 44% for USEWOD 2015 and 52% for USEWOD 2016), which is consistent with the results in [34]. On the other hand, almost all of the patterns with OPT (over 99.9% in both cases) are weakly well-designed, which we consider the main practical justification for wwd-patterns.
What about the remaining 0.05% of queries with OPT? We looked at a number of such queries and identified what we believe to be the three most common sources of non-weakly-well-designedness in query patterns. The first and seemingly most common such source is joins between an OPT subpattern and another pattern on a variable that only occurs in the right argument of the OPT subpattern. The following query is an example of such a join:

SELECT ?ℓ, ?u WHERE ((?s, label, ?ℓ) OPT (?s, type, ?t)) AND (?t, subClassOf, ?u). (15)
We believe that the vast majority, if not all, of such queries are erroneous, as they are highly unlikely to yield meaningful answers in case the optional part fails to match. Intuitively, one would expect an answer to query (15) to contain the label of an object in variable ?ℓ together with one of its supertypes in variable ?u. And indeed, this is the answer returned by the query on graph G in Fig. 7a (see Fig. 7b). However, if an object in ?s is not assigned any type, which is explicitly allowed by the use of OPT, the query does not just return its label in ?ℓ leaving ?u unbound, as one would expect; instead, it returns the cross product of the label and all types in the graph, which are, of course, completely unrelated to the object in ?s (see Fig. 7c and d).
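The cross-product effect can be reproduced with a few lines of Python. The sketch below evaluates a pattern of the shape of query (15) under the usual set semantics of AND and OPT over a small hypothetical graph; the graph is not the one from Fig. 7, and the tuple encoding of patterns, the function names, and the IRIs are assumptions made for illustration (`?l` plays the role of ?ℓ).

```python
def compatible(m1, m2):
    """Two mappings are compatible if they agree on their shared variables."""
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())


def eval_bgp(triple_patterns, graph):
    """All mappings sending every triple pattern to a triple of the graph."""
    solutions = [{}]
    for pattern in triple_patterns:
        extended = []
        for mu in solutions:
            for triple in graph:
                candidate, ok = dict(mu), True
                for p, t in zip(pattern, triple):
                    ok = (candidate.setdefault(p, t) == t) if p.startswith("?") else (p == t)
                    if not ok:
                        break
                if ok:
                    extended.append(candidate)
        solutions = extended
    return solutions


def evaluate(pattern, graph):
    """Evaluate ("BGP", triples), ("AND", P1, P2) or ("OPT", P1, P2)."""
    if pattern[0] == "BGP":
        return eval_bgp(pattern[1], graph)
    left, right = evaluate(pattern[1], graph), evaluate(pattern[2], graph)
    if pattern[0] == "AND":
        return [{**a, **b} for a in left for b in right if compatible(a, b)]
    result = []  # OPT: keep a itself only if no compatible extension exists
    for a in left:
        result.extend([{**a, **b} for b in right if compatible(a, b)] or [a])
    return result


# Hypothetical data: object a has a type, object b does not.
G = [("a", "label", "Alice"), ("a", "type", "Person"),
     ("b", "label", "Bob"),
     ("Person", "subClassOf", "Agent"), ("Robot", "subClassOf", "Machine")]

query = ("AND",
         ("OPT", ("BGP", [("?s", "label", "?l")]), ("BGP", [("?s", "type", "?t")])),
         ("BGP", [("?t", "subClassOf", "?u")]))

# Projection to ?l and ?u, as in the SELECT clause.
answers = sorted({(mu["?l"], mu["?u"]) for mu in evaluate(query, G)})
print(answers)
```

For the typed object only the expected pair is returned, whereas the untyped object leaves ?t unbound after the OPT step, so its label is joined with every subClassOf triple in the graph.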
Intuitively, query (16) computes the join between the answers to the pattern (?b, height, ?h) and those to (?b, city, ?c), provided the second set is non-empty; otherwise, the query just returns the answers to (?b, height, ?h). Hence, query (16) is equivalent to the following query in $S_{wwd}$:

We conclude that, while queries such as (16) may make sense in practice, they can be easily and intuitively restated using wwd-patterns.

Our final, and most interesting, source of real-life non-wwd-patterns is UNION in the right argument of OPT. For instance, consider the pattern

(?p, type, person) OPT ((?p, son, ?a) UNION (?p, daughter, ?a)). (17)

The pattern is quite intuitive and it is easy to imagine similar patterns being useful in various applications. However, the normalisation algorithm in [33], which "pushes" unions outside, converts (17) to a pattern that is inherently non-well-designed (due to the two occurrences of ?a in the third disjunct):

FILTER ¬bound(?x 1 ) ∧ ¬bound(?x 2 )).
We believe that this behaviour is unavoidable in general, as we expect query answering to become $\Pi^p_2$-hard over patterns that contain UNION in the right argument of OPT. Yet, for certain classes of patterns generalising (17), one may be able to obtain a coNP evaluation algorithm by accounting for UNION natively rather than relying on the normalisation in [33]. A detailed study of such patterns, however, is outside the scope of the present paper.

Conclusion and Future Work
In this paper, we introduced a new fragment of SPARQL patterns called weakly well-designed patterns. This fragment extends the widely studied well-designed fragment by allowing variables from the optional side of an OPT-subpattern that are not "guarded" by the mandatory side to occur in certain positions outside of the subpattern. We showed that queries with wwd-patterns enjoy the same low complexity of evaluation as well-designed queries but cover almost all real-life queries. Moreover, our fragment is the maximal coNP fragment that does not impose structural restrictions on basic patterns and filter conditions. We studied the expressive power of the fragment and the complexity of its query optimisation problems.
For future work, we want to extend wwd-patterns to allow for non-top-level occurrences of UNION and projection. As we have seen in the previous section, this promises to be a challenging task since a naive extension of our definitions to such constructs is likely to increase reasoning complexity. Also, we want to take into account features of SPARQL 1.1 [17] such as GRAPH, NOT EXISTS and property paths. Finally, we would like to implement our ideas in a prototype system and compare its performance with existing SPARQL engines.