Characterizing Tractability of Simple Well-Designed Pattern Trees with Projection

We study the complexity of evaluating well-designed pattern trees, a query language extending conjunctive queries with the possibility to define parts of the query to be optional. This possibility of optional parts is important for obtaining meaningful results over incomplete data sources, as is common in semantic web settings. Recently, a structural characterization of the classes of well-designed pattern trees that can be evaluated in polynomial time was shown. However, projection, a central feature of many query languages, was not considered in this study. We work towards closing this gap by giving a characterization of all tractable classes of simple well-designed pattern trees with projection (under some common complexity-theoretic assumptions). Since well-designed pattern trees correspond to the fragment of well-designed {AND, OPTIONAL}-SPARQL queries, this gives a complete description of the tractable classes of queries with projection in this fragment that can be characterized by the underlying graph structures of the queries. For non-simple pattern trees, the tractability criteria for simple pattern trees do not capture all tractable classes. We thus extend the characterization to the non-simple case in order to capture some additional tractable cases.


Introduction
Well-designed pattern trees (wdPTs) are a query formalism well suited to deal with the ever-increasing amount of incomplete data. Well-designed pattern trees over SPARQL triple patterns are equivalent to the class of well-designed {AND, OPTIONAL}-SPARQL queries, see Pérez et al. [21], and were in fact originally introduced as a formalism to more easily study SPARQL queries. By replacing triple patterns with relational atoms, wdPTs can also be seen as an extension of Conjunctive Queries (CQs): a wdPT is a rooted tree where each node represents a conjunction of atoms, and the tree structure represents a nesting of optional matching. The idea is to start evaluating the CQ at the root and to iteratively extend the retrieved results as much as possible by the results of the CQs in the other nodes. This allows wdPTs to return partial answers in cases where mapping the complete query into the database is impossible, unlike CQs, which return no answer in such a situation.
Well-designed pattern trees and the corresponding SPARQL fragment represent an important class of SPARQL queries and have been studied intensively within the last decade, see Pérez et al. [21], Letelier et al. [17], Arenas and Pérez [1], Pichler and Skritek [23], Picalausa and Vansummeren [22], Kostylev et al. [15], Barceló et al. [3], Arenas et al. [2], Romero [24]. Thus, many properties of and problems related to these queries are now well understood. For example, the evaluation problem for wdPTs (i.e., given a wdPT, a database, and a mapping, is this mapping an answer to the wdPT over the database?) is coNP-complete for projection-free wdPTs [21] and $\Sigma^P_2$-complete in the presence of projection [17]. However, certain tractable classes of wdPTs have been identified [3]. The main idea there is to extend known tractability conditions for CQs to wdPTs. However, the question of characterizing exactly the classes of wdPTs for which tractable query evaluation is possible (and thus the question of how suitable the approach of extending tractability conditions of CQs to wdPTs is for describing the space of tractable classes of wdPTs) has been largely ignored. Only very recently was this question addressed for wdPTs without projection, and a characterization of the classes for which query evaluation is in PTIME was given by Romero [24]. Notably, as also observed for Boolean Conjunctive Queries by Grohe et al. [13] and Grohe [12], for wdPTs without projection these classes coincide with the ones for which evaluation is in FPT. However, Romero [24] does not consider projection, an essential and central feature of query languages. Thus, the question "What are all tractable classes of wdPTs with projection?" remains open. We work towards closing this gap.
One observation consistently made in all aforementioned work on wdPTs is that problems become much more complex once projection is included. This is true for the computational complexity of the problems (e.g., as mentioned, for the evaluation problem it increases from coNP- to $\Sigma^P_2$-completeness; for classical query containment, the NP-complete problem even becomes undecidable [23]) as well as for establishing these results. This is due to the particular semantics of well-designed SPARQL with projection. For wdPTs without projection, given some database, the set of answers consists of all variable mappings such that there exists a subtree of the wdPT satisfying the following conditions: first, it must contain the root node of the tree. Second, the set of variables occurring in the subtree must be the same as the domain of the mapping. Third, the mapping must map each atom in the subtree into the database. Fourth, no extension of the mapping may map all atoms of any node outside the subtree that is a child node of a node in the subtree into the database. This is illustrated by the following example (a precise definition is given in Section 2).
Example 1 Fig. 1 shows a wdPT p with four nodes r, t 1 , t 2 , and t 3 where r is the root node.
At its root node, the query is looking for information on flights: the airline operating the flight, the flight number, origin, and destination. This information shall be extended by some contact information (t1) and by information on available seats on the flight (t2), in case any of this information is available. If, in addition to the information on available seats, some ratings of the free seats are available (t3), these shall be returned as well. Observe that the extensions to t1 and t2 are independent of each other. An equivalent SPARQL query (replacing relational atoms by triple patterns, and abbreviating variable and predicate names) would be
For the database instance D also shown in the figure, the mapping μ defined as μ(airline) = "XYZ", μ(flight number) = 1, μ(origin) = "LHR", μ(destination) = "LIS", and μ(contact details) = "e@ma.il" is an answer to p over D. This is because of the subtree of p consisting of the nodes r and t1: it can be checked that this subtree satisfies all four conditions mentioned earlier. For the fourth condition, just observe that there exists no extension of μ that maps available seats(flight number, seat number) into D. Because of the fourth condition, the mapping ν with ν(airline) = "XYZ", ν(flight number) = 2, ν(origin) = "LHR", ν(destination) = "LIS", and ν(contact details) = "e@ma.il" is not a solution, because this mapping can be extended by ν(seat number) = "1A" in a way that also maps available seats(flight number, seat number) into D.

Theory of Computing Systems (2021) 65:

Without projection, the only hard part in deciding whether some mapping is a solution is to check for the existence of an extension. This requires a homomorphism test, which is well known to be NP-hard.
However, for wdPTs with projection, a mapping is a solution if there exists an extension of this mapping to some subset of the existential variables in the tree, such that the extended mapping is a solution to the wdPT considered without projection.
Example 2 Consider the wdPT from Example 1, but now assume that the variables flight number and airline are not part of the output but projected away. Then the mapping μ with μ(origin) = "LHR", μ(destination) = "LIS", and μ(contact details) = "e@ma.il" is a solution because of the extension μ(flight number) = 1 and μ(airline) = "XYZ". Observe that the extension μ(airline) = "XYZ" and μ(flight number) = 2 does not witness that μ is a solution since, as already discussed, this extended mapping is not maximal. As a result, both μ and its extension ν with ν(seat number) = "1A" (and otherwise defined like μ) are solutions in this case.
As a consequence, besides testing some mapping for maximality, as a second source of hardness, different mappings on the existential variables have to be taken into account. In addition to the increased complexity of the evaluation problem, this also has the effect that the classes of wdPTs with projection for which query evaluation is in PTIME and in FPT no longer coincide, as observed by Kröll et al. [16]. Thus, in this setting, the choice of the tractability notion makes a difference when describing all tractable classes.
We choose to study the complexity of query evaluation in the model of parameterized complexity where, as usual, we take the size of the query as the parameter. As already argued by Papadimitriou and Yannakakis [20], this model allows for a more fine-grained analysis than the classical perspectives of data and query complexity. In parameterized complexity, query answering is considered tractable, formally in FPT, if, after a preprocessing that only depends on the query, the actual evaluation can be done in polynomial time [10,11]. This allows for potentially costly preprocessing on the generally small query, while the dependency on the generally far bigger database is polynomial with an exponent independent of the query. Parameterized complexity has found many applications in the complexity of query evaluation problems, see, e.g., Grohe et al. [13], Grohe [12], Marx [18], Chen [4], Romero [24].
In our efforts to better understand the tractability frontier for wdPTs, we provide a complete characterization of the tractable classes of simple wdPTs, i.e., wdPTs where no two atoms share the same relation symbol. Because of the relationship between wdPTs and well-designed {AND, OPTIONAL}-SPARQL queries, this immediately gives a complete description of the tractable classes of well-designed {AND, OPTIONAL}-SPARQL queries with projection that can be characterized by only considering the graph structures of the queries, similar, e.g., to the work of Grohe et al. [13] and Chen [4]. We note that the results showing the existence of classes of wdPTs for which the evaluation problem is NP-hard but in FPT can be easily extended to simple wdPTs.
Our tractability criteria are not restricted to simple wdPTs. In fact, the same tractability criteria can also directly be applied to give tractable classes of non-simple wdPTs, i.e., wdPTs in which the same relation symbol may appear several times. However, in this case, there are classes of queries that do not satisfy our tractability criteria for simple queries and are still tractable. This shows that the restriction to simple wdPTs is crucial for the lower bounds. To extend the applicability of our techniques to non-simple queries, we generalize our criteria by incorporating the notion of cores into well-designed pattern trees, as was done in the projection-free case by Romero [24] and for conjunctive queries by Dalmau et al. [7]. While this allows us to show tractability for more classes of wdPTs, we do not achieve a full dichotomy in this setting.

Summary of Results and Organization of the Paper
We study the following decision problem: Given a wdPT, a database, and a mapping, is the mapping a solution of the wdPT over the database? This is the standard formulation of the evaluation problem usually studied, cf. Letelier et al. [17], Kaminski and Kostylev [14], Romero [24], Barceló et al. [3]. It reveals the influence of the optional query parts on the evaluation problem, which is lost, e.g., when considering Boolean queries. Instead of just SPARQL triple patterns, we consider the more general case of wdPTs with arbitrary relational atoms where we always assume that the classes of queries we consider have bounded arity. Our main result is a characterization of the classes of simple wdPTs with projection that allow fixed-parameter tractable query evaluation.
After some preliminaries in Section 2, we define two tractability conditions in Section 3. By comparing these conditions with the tractability criterion given by Romero [24] for the projection free case, we discuss how they describe the additional complexity introduced by projection. Note that some of the conditions provided here have precursors in Barceló et al. [3] and Kröll et al. [16] that had to be carefully refined to provide a fine-grained complexity analysis.
In Section 4 we prove that the two tractability conditions imply FPT membership of the evaluation problem by presenting an algorithm that exploits these conditions.
In Section 5, we then show that both tractability conditions are indeed necessary for a class of simple wdPTs to be tractable. That is, we show that if either of them is not satisfied by a class of wdPTs, the evaluation problem for this class is either W[1]-hard or coW[1]-hard.
In Section 6, we show how to extend our tractability criteria to the non-simple case by incorporating the notion of cores, and how this allows us to capture more tractable classes. Besides a generalization of the tractability criteria from Section 3, we also introduce a generalization of the homomorphism problem and completely characterize its tractable classes.
In Section 7, we discuss our results and potential extensions to conclude the paper. This article is an extended version of the conference paper [19]. The main additional contribution compared to the conference version consists of Section 6, which is completely new. In addition, several minor improvements have been made throughout the article, including the extension of existing and addition of new examples.

Preliminaries
Basics Let Const and Var be two disjoint countably infinite sets of constants and variables, respectively. A relational schema σ is a set {R1, ..., Rn} of relation symbols Ri, each having an assigned arity ri ≥ 0. A relational atom Ri(v) over σ consists of a relation symbol Ri ∈ σ and a tuple $v \in (\mathrm{Const} \cup \mathrm{Var})^{r_i}$. For an atom τ = Ri(v), let dom(τ) denote the set of variables and constants occurring in v. This extends to sets R = {τ1, ..., τm} of atoms as $\mathrm{dom}(R) = \bigcup_{i=1}^{m} \mathrm{dom}(\tau_i)$. Furthermore, var(τ) = dom(τ) ∩ Var and var(R) = dom(R) ∩ Var. Observe that, by slight abuse of notation, we use the operators ∪, ∩, \ also between sets V and tuples v of variables and constants. For example, var(τ) = v ∩ Var. We call a set of atoms simple if no relation symbol appears more than once in it.
Similarly, for a mapping μ we denote with dom(μ) the set of elements on which μ is defined. For a mapping μ and a set V ⊆ Var, we use μ| V to describe the restriction of μ to the variables in dom(μ) ∩ V. We say that a mapping μ is an extension of a mapping ν if μ| dom(ν) = ν, and that two mappings are compatible if they agree on the shared variables.
For a set 𝒜 of atoms and a set A ⊆ dom(𝒜), we write 𝒜 \ A to denote the restriction $\mathcal{A} \setminus A = \{R_s(v') \mid R(v) \in \mathcal{A}\}$, where v′ is obtained from v by removing the elements of A, and s is the list of the removed positions and their values.
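To make the restriction concrete, the following Python sketch drops the elements of A from every atom and folds the removed positions and values into the relation symbol, so that the pair (R, s) plays the role of the indexed symbol R_s. The encoding of atoms as (relation, argument-tuple) pairs and the function name `restrict` are illustrative assumptions, not notation from the paper.

```python
from typing import Set, Tuple

# Hypothetical encoding: an atom is (relation_symbol, argument_tuple).
Atom = Tuple[object, Tuple[str, ...]]


def restrict(atoms: Set[Atom], removed: Set[str]) -> Set[Atom]:
    """Restriction of `atoms` by `removed`: drop every argument that lies in
    `removed` and remember which positions were dropped (and with which
    values) by turning the relation symbol R into the indexed symbol (R, s)."""
    result = set()
    for rel, args in atoms:
        kept = tuple(a for a in args if a not in removed)
        # s lists the (position, value) pairs of the removed arguments
        s = tuple((i, a) for i, a in enumerate(args) if a in removed)
        result.add(((rel, s), kept))
    return result


print(restrict({("R", ("a", "x")), ("R", ("b", "x"))}, {"a", "b"}))
```

Recording the removed values in the symbol keeps, e.g., R(a, x) and R(b, x) apart after removing {a, b}: they become atoms over two different symbols instead of collapsing into a single atom R(x).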
A database D over σ is a finite set of atoms over σ with var(D) = ∅. For a database D and a relation symbol R, we denote by R^D the set of all atoms in D with relation symbol R.

Homomorphisms and Cores
A homomorphism h between two sets A and B of atoms over σ is a mapping h: dom(A) → dom(B) such that for all atoms R(v) ∈ A we have R(h(v)) ∈ B, and such that h(x) ≠ x is only allowed if x ∈ var(A) (throughout the article, when defining homomorphisms we therefore only state the mapping on var(A), and assume the extension to constants via the identity mapping to be implicit). We write h: A → B to denote a homomorphism h from A to B.
Let A be a set of atoms. A minimal subset A′ ⊆ A such that there is a homomorphism A → A′ is called a core of A. We recall that all cores of A are unique up to isomorphism, and we thus speak of the core of A, which we denote by core(A).
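A brute-force illustration of homomorphisms and cores, assuming atoms are encoded as (relation, argument-tuple) pairs and that the set of variables is passed explicitly (every other term is treated as a constant and must be mapped to itself). `find_hom` is a plain backtracking search, exponential in the worst case, in line with the NP-hardness of the homomorphism test; `core` shrinks a set of atoms by retractions: as long as the current set maps into itself minus some atom, it is replaced by its image, a proper subset that it still maps onto. Both names are hypothetical helpers.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]


def find_hom(source: Iterable[Atom], target: Set[Atom], variables: Set[str],
             partial: Optional[Dict[str, str]] = None) -> Optional[Dict[str, str]]:
    """Backtracking search for a homomorphism source -> target extending `partial`.
    Terms outside `variables` are constants and must be mapped to themselves."""
    atoms = list(source)

    def extend(i: int, h: Dict[str, str]) -> Optional[Dict[str, str]]:
        if i == len(atoms):
            return h
        rel, args = atoms[i]
        for rel2, vals in target:
            if rel2 != rel or len(vals) != len(args):
                continue
            new, ok = dict(h), True
            for term, val in zip(args, vals):
                ok = (new.setdefault(term, val) == val) if term in variables else (term == val)
                if not ok:
                    break
            if ok:
                found = extend(i + 1, new)
                if found is not None:
                    return found
        return None

    return extend(0, dict(partial or {}))


def core(atoms: Set[Atom], variables: Set[str]) -> Set[Atom]:
    """Replace the current set by its image under a homomorphism into the set
    minus one atom, until no such homomorphism exists."""
    current = set(atoms)
    changed = True
    while changed:
        changed = False
        for atom in list(current):
            h = find_hom(current, current - {atom}, variables)
            if h is not None:
                # the image h(current) is a proper subset that current maps onto
                current = {(rel, tuple(h.get(t, t) for t in args)) for rel, args in current}
                changed = True
                break
    return current


print(core({("E", ("x", "y")), ("E", ("y", "x")), ("E", ("x", "x"))}, {"x", "y"}))
```

The loop returns a core because it stops only at a subset B ⊆ A that admits no homomorphism into a proper subset of itself; since A → B and B ⊆ A, any smaller subset that A maps into would also be a target for B.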

Conjunctive Queries
We write conjunctive queries (short CQs) q as Ans(x) ← B, where the body B = {R 1 (v 1 ), . . . , R m (v m )} is a set of atoms and x are the free variables. A Boolean CQ (BCQ) is a CQ with no free variables. We define var(q) = var(B). The existential variables are implicitly given by var(B) \x. The result q(D) of q over a database D is the set of tuples {h(x) | h : B → D}, i.e., every homomorphism mapping the body of the query into the database is projected onto the values assigned to the free variables.
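The definition q(D) = {h(x) | h: B → D} translates directly into code: enumerate all homomorphisms from the body into the database and project each of them onto the free variables. The sketch below assumes the same (relation, argument-tuple) encoding and that every free variable occurs in the body; it is exponential in the size of the query and only meant to illustrate the semantics. The names `homomorphisms` and `evaluate_cq` are hypothetical.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]


def homomorphisms(body: List[Atom], db: Set[Atom], variables: Set[str],
                  h: Optional[Dict[str, str]] = None) -> Iterator[Dict[str, str]]:
    """Yield every homomorphism from the atom list `body` into `db` extending h."""
    h = dict(h or {})
    if not body:
        yield h
        return
    (rel, args), rest = body[0], body[1:]
    for rel2, vals in db:
        if rel2 != rel or len(vals) != len(args):
            continue
        new, ok = dict(h), True
        for term, val in zip(args, vals):
            ok = (new.setdefault(term, val) == val) if term in variables else (term == val)
            if not ok:
                break
        if ok:
            yield from homomorphisms(rest, db, variables, new)


def evaluate_cq(free: Tuple[str, ...], body: List[Atom], db: Set[Atom],
                variables: Set[str]) -> Set[Tuple[str, ...]]:
    """q(D): project each homomorphism from the body into D onto the free tuple."""
    return {tuple(h[x] for x in free) for h in homomorphisms(list(body), db, variables)}


db = {("E", ("1", "2")), ("E", ("2", "3")), ("E", ("3", "1"))}
print(evaluate_cq(("x", "z"), [("E", ("x", "y")), ("E", ("y", "z"))], db, {"x", "y", "z"}))
```

For a Boolean CQ, `free` is the empty tuple, so the result is {()} if the body maps into D and the empty set otherwise.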
Graphs We consider only undirected, simple graphs G = (V, E) with standard notation. We sometimes write t ∈ G to refer to a node t ∈ V(G). A graph G2 is a subgraph of a graph G1 if V(G2) ⊆ V(G1) and E(G2) ⊆ E(G1). A tree is a connected, acyclic graph. A subtree is a connected, acyclic subgraph. A rooted tree T is a tree with one node r ∈ T marked as its root. Given two nodes t, t′ ∈ T, we say that t′ is an ancestor of t if t′ lies on the path from r to t. Likewise, t′ is the parent node of t (and t is a child of t′) if t′ is an ancestor of t and {t′, t} ∈ E(T). For a subtree T′ of T, a node t ∈ T is a child of T′ if t ∉ T′ and t′ ∈ T′ for the parent node t′ of t. We write ch(T′) for the set of all children of T′. For a node t ∈ T, the set of nodes on the path from the root r to the parent node of t is denoted by branch(t). Moreover, cbranch(t) = branch(t) ∪ {t}.
For a set A of atoms, the Gaifman graph of A is the graph with vertex set var(A) that has an edge {vi, vj} for every pair of distinct variables vi and vj that occur together in some atom in A.
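A minimal sketch of the Gaifman graph construction, assuming the (relation, argument-tuple) encoding of atoms used in this section (only terms in `variables` become vertices, so constants are ignored). Edges are stored as two-element frozensets, matching the undirected, simple graphs considered here; `gaifman_graph` is a hypothetical helper name.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]


def gaifman_graph(atoms: Iterable[Atom],
                  variables: Set[str]) -> Tuple[Set[str], Set[FrozenSet[str]]]:
    """Vertices: variables of the atoms; edges: pairs of distinct variables
    that occur together in some atom (no self-loops)."""
    atoms = list(atoms)
    vertices = {t for _, args in atoms for t in args if t in variables}
    edges: Set[FrozenSet[str]] = set()
    for _, args in atoms:
        vs = sorted({t for t in args if t in variables})
        edges.update(frozenset(pair) for pair in combinations(vs, 2))
    return vertices, edges


print(gaifman_graph([("R", ("x", "y", "z")), ("S", ("z", "c"))], {"x", "y", "z"}))
```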

Tree Decompositions and Treewidth
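A tree decomposition of a graph G = (V, E) is a pair (T, ν), where T is a tree and ν: V(T) → 2^V, that satisfies the following:
1. For each u ∈ V, the set {s ∈ V(T) | u ∈ ν(s)} is a connected subset of V(T), and
2. each edge of E is contained in at least one of the sets ν(s), for s ∈ V(T).
The width of (T, ν) is $\max_{s \in V(T)} |\nu(s)| - 1$, and the treewidth of G is the minimum width over all tree decompositions of G.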
The treewidth of a set of atoms is the treewidth of its Gaifman graph.
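The two conditions of a tree decomposition can be checked mechanically. The following sketch assumes that `tree_edges` already forms a tree over the keys of `bags` and, as is usual, additionally requires every vertex to occur in some bag; `width` uses the standard definition (largest bag size minus one). Computing the treewidth itself, i.e., the minimum width, is a different and harder task that the sketch does not attempt. All names are hypothetical.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, Set, Tuple


def is_tree_decomposition(tree_edges: Iterable[Tuple[Hashable, Hashable]],
                          bags: Dict[Hashable, Set[str]],
                          vertices: Set[str],
                          edges: Set[FrozenSet[str]]) -> bool:
    """Check conditions 1 and 2 for (T, nu), where T is given by `tree_edges`
    over the keys of `bags` (assumed to form a tree) and nu(s) = bags[s]."""
    # Condition 2: every edge of the graph lies inside some bag.
    if not all(any(e <= bag for bag in bags.values()) for e in edges):
        return False
    adj: Dict[Hashable, Set[Hashable]] = {s: set() for s in bags}
    for a, b in tree_edges:
        adj[a].add(b)
        adj[b].add(a)
    # Condition 1: the tree nodes whose bag contains u are connected in T.
    for u in vertices:
        nodes = {s for s, bag in bags.items() if u in bag}
        if not nodes:  # usual convention: every vertex occurs in some bag
            return False
        start = next(iter(nodes))
        seen, stack = {start}, [start]
        while stack:
            for nxt in (adj[stack.pop()] & nodes) - seen:
                seen.add(nxt)
                stack.append(nxt)
        if seen != nodes:
            return False
    return True


def width(bags: Dict[Hashable, Set[str]]) -> int:
    """Standard width of a decomposition: largest bag size minus one."""
    return max(len(bag) for bag in bags.values()) - 1


path_edges = {frozenset({"x", "y"}), frozenset({"y", "z"})}
bags = {1: {"x", "y"}, 2: {"y", "z"}}
print(is_tree_decomposition([(1, 2)], bags, {"x", "y", "z"}, path_edges), width(bags))  # True 1
```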

Well-Designed Pattern Trees (wdPTs)
A pattern tree (short: PT) p over a relational schema σ is a tuple (T, λ, X), where T is a rooted tree and λ maps each node t ∈ T to a set of relational atoms over σ. We may write ((T, r), λ, X) to emphasize that r is the root node of T. The set X of variables denotes the free variables of the PT. For a PT (T, λ, X) and a subtree T′ of T, let $\lambda(T') = \bigcup_{t \in V(T')} \lambda(t)$. We may write var(t) instead of var(λ(t)), and var(T′) instead of var(λ(T′)). We further define fvar(t) = var(t) ∩ X as the free variables in t. Again, this definition extends naturally to subtrees T′ of T. We call a PT (T, λ, X) projection free if X = var(T) and may write (T, λ) to emphasize that a PT is projection free. The size |p| of a pattern tree is
Well-designed PTs restrict the distribution of variables among their nodes.
Definition 1 (Well-Designed Pattern Tree (wdPT)) A PT (T, λ, X) is well-designed if for every variable y ∈ var(T), the set of nodes of T where y appears is connected.
As an immediate consequence of this restriction, in a wdPT p = (T, λ, X), for every variable y ∈ var(T) there exists a unique node t ∈ T such that y ∈ var(t) and all nodes t′ ∈ T with y ∈ var(t′) are descendants of t.
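Definition 1 can be tested directly. The sketch below (hypothetical helper `is_well_designed`; `labels` maps nodes to atom lists and `parent` maps every non-root node to its parent) uses the fact stated just above: a nonempty set of nodes in a rooted tree is connected exactly when a single node of the set, its topmost node, has its parent outside the set. The usage lines encode a version of the flight query of Example 1 with predicate names of our own choosing, since Fig. 1 is not reproduced here.

```python
from typing import Dict, List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]


def is_well_designed(labels: Dict[str, List[Atom]], parent: Dict[str, str],
                     variables: Set[str]) -> bool:
    """Check that, for every variable, the nodes mentioning it are connected."""
    occurrences: Dict[str, Set[str]] = {}
    for t, atoms in labels.items():
        for _, args in atoms:
            for a in args:
                if a in variables:
                    occurrences.setdefault(a, set()).add(t)
    for nodes in occurrences.values():
        # connected iff exactly one node has its parent outside the set
        tops = [t for t in nodes if parent.get(t) not in nodes]
        if len(tops) != 1:
            return False
    return True


labels = {
    "r": [("flight", ("airline", "fno", "origin", "dest"))],
    "t1": [("contact", ("airline", "contact"))],
    "t2": [("seats", ("fno", "seat"))],
    "t3": [("rating", ("fno", "seat", "rating"))],
}
parent = {"t1": "r", "t2": "r", "t3": "t2"}
VARS = {"airline", "fno", "origin", "dest", "contact", "seat", "rating"}
print(is_well_designed(labels, parent, VARS))  # True
print(is_well_designed({**labels, "t3": [("review", ("airline", "rating"))]}, parent, VARS))  # False
```

The second call mirrors the modification discussed in Example 3: airline would then occur in r, t1, and t3 but not in t2.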
Evaluating a wdPT p with free variables X over a database D returns a set p(D) of mappings μ: V → dom(D) with V ⊆ X. We follow the characterization of p(D) in terms of maximal subtrees introduced by Letelier et al. [17], but borrow the term pp-solution from Kaminski and Kostylev [14].
Definition 2 (pp-solution) For a wdPT p = ((T, r), λ) and a database D, a mapping μ: V → dom(D) (with V ⊆ var(T)) is a potential partial solution (pp-solution) to p over D if there is a subtree T′ of T containing r such that μ: λ(T′) → D.
The semantics of wdPTs can now be defined in terms of maximal pp-solutions.
Definition 3 (Semantics of wdPTs) Let p = (T, λ, X) be a wdPT, let p′ = (T, λ, var(T)), i.e., the projection-free wdPT obtained from p by considering all of its variables as free, and let D be a database. The set p′(D) contains all pp-solutions μ to p′ over D such that there exists no pp-solution μ′ to p′ over D that is a proper extension of μ.
The set p(D) is then defined as $p(D) = \{\mu|_X \mid \mu \in p'(D)\}$.
Example 3 Consider the PT p = (T, λ, X) depicted in Fig. 2, where k may be any integer with k ≥ 2, and X = {x1, x2, x3, x4, x5}.
All variable occurrences in p are connected; thus, p is well-designed. If, for example, the atom node2(y2) was missing in the root node, the tree would not be well-designed because of the occurrences of y2 in both t1 and t2. Similarly, the wdPT in Fig. 1 is also well-designed. If, in node t3 of that tree, one asked for a review of the airline instead of the seat (i.e., λ(t3) = review(airline, rating)), the resulting tree would no longer be well-designed, since the variable airline does not occur in t2.
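Definitions 2 and 3, together with the restriction of maximal pp-solutions to the free variables described in the introduction, can be sketched by brute force: enumerate all subtrees containing the root (node sets closed under taking parents), collect every homomorphism from their atoms into D as a pp-solution, keep the maximal ones, and restrict them to X. This is exponential and only illustrates the semantics. The database below is a small hypothetical instance consistent with the description of Example 1 (the actual instance of Fig. 1 is not reproduced here), and all predicate and function names are our own.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterator, List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]
Solution = FrozenSet[Tuple[str, str]]


def homomorphisms(body: List[Atom], db: Set[Atom], variables: Set[str],
                  h: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Yield all homomorphisms from the atom list `body` into `db` extending h."""
    if not body:
        yield h
        return
    (rel, args), rest = body[0], body[1:]
    for rel2, vals in db:
        if rel2 != rel or len(vals) != len(args):
            continue
        new, ok = dict(h), True
        for term, val in zip(args, vals):
            ok = (new.setdefault(term, val) == val) if term in variables else (term == val)
            if not ok:
                break
        if ok:
            yield from homomorphisms(rest, db, variables, new)


def rooted_subtrees(parent: Dict[str, str], root: str) -> Iterator[Set[str]]:
    """All node sets that contain the root and are closed under taking parents."""
    others = list(parent)
    for k in range(len(others) + 1):
        for chosen in combinations(others, k):
            nodes = {root, *chosen}
            if all(parent[t] in nodes for t in chosen):
                yield nodes


def evaluate_wdpt(labels, parent, root, variables, free, db) -> Set[Solution]:
    # pp-solutions (Definition 2): homomorphisms from lambda(T') for rooted subtrees T'
    pp: Set[Solution] = set()
    for nodes in rooted_subtrees(parent, root):
        body = [atom for t in nodes for atom in labels[t]]
        for h in homomorphisms(body, db, variables, {}):
            pp.add(frozenset(h.items()))
    # p'(D) (Definition 3): keep the pp-solutions without a proper extension
    maximal = [m for m in pp if not any(m < other for other in pp)]
    # p(D): restrict every maximal pp-solution to the free variables X
    return {frozenset((x, v) for x, v in m if x in free) for m in maximal}


labels = {
    "r": [("flight", ("airline", "fno", "origin", "dest"))],
    "t1": [("contact", ("airline", "contact"))],
    "t2": [("seats", ("fno", "seat"))],
    "t3": [("rating", ("fno", "seat", "rating"))],
}
parent = {"t1": "r", "t2": "r", "t3": "t2"}
VARS = {"airline", "fno", "origin", "dest", "contact", "seat", "rating"}
db = {
    ("flight", ("XYZ", "1", "LHR", "LIS")),
    ("flight", ("XYZ", "2", "LHR", "LIS")),
    ("contact", ("XYZ", "e@ma.il")),
    ("seats", ("2", "1A")),
}
full = evaluate_wdpt(labels, parent, "r", VARS, VARS, db)
projected = evaluate_wdpt(labels, parent, "r", VARS, {"origin", "dest", "contact", "seat"}, db)
print(len(full))  # 2
for m in projected:
    print(sorted(m))
```

Without projection, the flight-1 answer (no seat) and the flight-2 answer extended by seat "1A" are the two maximal pp-solutions; the flight-2 mapping without a seat is discarded as non-maximal, as in Example 1. Projecting away airline and flight number yields the two solutions described in Example 2.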

Parameterized Complexity
We only give a bare-bones introduction to parameterized complexity and refer the reader to [9] for more details. Let Σ be a finite alphabet. A parameterization of Σ* is a polynomial-time computable mapping κ: Σ* → ℕ. A parameterized problem over Σ is a pair (L, κ), where L ⊆ Σ* and κ is a parameterization of Σ*. We refer to the strings x ∈ Σ* as the instances of a problem, and to the numbers κ(x) as the parameters.
A parameterized problem E = (L, κ) belongs to the class FPT of fixed-parameter tractable problems if there is an algorithm A deciding L, a polynomial pol, and a computable function f: ℕ → ℕ such that the running time of A on every input x ∈ Σ* is at most f(κ(x)) · pol(|x|).
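In this paper, for classes P of wdPTs, we study the problem p-EVAL(P) defined as follows.
p-EVAL(P)
Input: A wdPT p ∈ P, a database D, and a mapping μ.
Parameter: |p|.
Question: Is μ ∈ p(D)?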
We always assume that the arity of all atoms of the queries in P is bounded by a constant, i.e., that there is a constant c (possibly depending on P) such that no atom in the queries in P has an arity of more than c.
Let E = (L, κ) and E′ = (L′, κ′) be parameterized problems over Σ and Σ′, respectively. An FPT-reduction from E to E′ is a mapping R: Σ* → (Σ′)* such that (1) for all x ∈ Σ*, we have x ∈ L if and only if R(x) ∈ L′, (2) there is a computable function f and a polynomial pol such that R(x) can be computed in time f(κ(x)) · pol(|x|), and (3) there is a computable function g: ℕ → ℕ such that κ′(R(x)) ≤ g(κ(x)) for all x ∈ Σ*.
Of the rich hardness theory for parameterized problems, we will only use the classes W[1] and coW[1]. To keep this introduction short, we define a parameter-
It is generally conjectured that FPT ≠ W[1], and thus, in particular, W[1]-hard problems and coW[1]-hard problems are not in FPT. We will take the hardness results for problems (L′, κ′) from the literature. One important such problem is the homomorphism problem p-HOM(C) for a class C of sets of atoms, defined as follows.
p-HOM(C)
Input: A set A ∈ C of atoms and a set B of atoms.
Parameter: |A|.
Question: Is there a homomorphism h: A → B?
Theorem 1 (Grohe [12]) Let C be a decidable class of sets of atoms. Then p-HOM(C) is in FPT if there exists some constant c such that the treewidth of the core of each set in C is bounded by c, and W[1]-hard otherwise.
Since simple sets of atoms are their own core, this immediately implies that for classes C of simple sets of atoms, p-HOM(C) is in FPT if the treewidth of each set in C is bounded by some constant c, and W[1]-hard otherwise. We remark that this result was in fact shown by Grohe et al. [13] in a predecessor paper of Grohe [12].

Tractability Conditions for Simple wdPTs
In this section we will introduce our tractability conditions for simple wdPTs. While, as mentioned, these criteria also apply to arbitrary wdPTs, they are not optimal in this case as we will see in Section 6. There we will also show generalizations of the conditions we give here that, for general wdPTs, describe more tractable classes. However, to simplify the presentation, in this section we tailor the definitions towards simple wdPTs.
We start by recalling that in simple pattern trees, no relation symbol is allowed to occur more than once. Our overall idea for solving p-EVAL(P) is as follows: given a wdPT p, a database D, and a mapping μ, construct a set of CQs q with free variables x and associated databases D′ such that μ ∈ p(D) if and only if, for at least one of these CQs q, the tuple μ(x) is in q(D′). We give two tractability criteria ensuring that this algorithm is in FPT. Intuitively, one condition guarantees that deciding μ(x) ∈ q(D′) is in PTIME, while both conditions in combination guarantee that D′ can be computed efficiently.
We will state the tractability conditions with respect to a class P of wdPTs. So in the remainder of this section let P be an arbitrary but fixed class of wdPTs.
We start with some additional notation and results. First of all, as already observed by Letelier et al. [17], some nodes of a wdPT may not be relevant for the answers returned by the query, in the following sense.
Definition 4 (Relevant Nodes) Let p = (T, λ, X) be a wdPT. A node t ∈ T is relevant if there exists a database D such that p(D) ≠ p′(D), where p′ is constructed from p by removing from T the subtree rooted in t. We use relv(T) to denote the set of relevant nodes in T.

Example 4 Recall the wdPT from Fig. 2, and consider the node t3. Given any mapping μ on at least var(λ(r) ∪ λ(t1)), whether this mapping can be extended to t3 or not has no effect on the result, since an extension to t3 does not include any new free variable. Note, however, that due to the clique of fresh existential variables s1, ..., sk in λ(t3), deciding whether a mapping can be extended to t3 is actually expensive. Thus, nodes like t3 that do not influence any solution can safely be omitted.
Letelier et al. [17] introduced a normal form excluding non-relevant nodes. Here, in order to make our results more explicit, we do not follow this approach but allow wdPTs to contain non-relevant nodes. Luckily, it follows from Letelier et al. [17] that these nodes can be easily detected.
Proposition 1 (Letelier et al. [17]¹) Let p = (T, λ, X) be a wdPT. Then a node t ∈ T is relevant if and only if fvar(T′) \ fvar(t′) ≠ ∅, where T′ is the subtree of T rooted in t and t′ is the parent node of t.
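Proposition 1 yields a purely syntactic test. A sketch under the same encoding (hypothetical helper `relevant_nodes`): for each non-root node t, collect the free variables of the subtree rooted in t and check whether any of them is missing from the parent node; as in Example 4, a node whose subtree brings no new free variable is dropped. The root has no parent and is simply kept. The flight query with hypothetical predicate names serves as input.

```python
from typing import Dict, List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]


def relevant_nodes(labels: Dict[str, List[Atom]], parent: Dict[str, str],
                   root: str, free: Set[str]) -> Set[str]:
    """Non-root nodes whose subtree contains a free variable not in the parent."""
    children: Dict[str, List[str]] = {}
    for t, p in parent.items():
        children.setdefault(p, []).append(t)

    def fvar_node(t: str) -> Set[str]:
        return {a for _, args in labels[t] for a in args if a in free}

    def fvar_subtree(t: str) -> Set[str]:
        out = fvar_node(t)
        for c in children.get(t, []):
            out |= fvar_subtree(c)
        return out

    # The root has no parent; we simply keep it.
    return {root} | {t for t, p in parent.items() if fvar_subtree(t) - fvar_node(p)}


labels = {
    "r": [("flight", ("airline", "fno", "origin", "dest"))],
    "t1": [("contact", ("airline", "contact"))],
    "t2": [("seats", ("fno", "seat"))],
    "t3": [("rating", ("fno", "seat", "rating"))],
}
parent = {"t1": "r", "t2": "r", "t3": "t2"}
print(relevant_nodes(labels, parent, "r", {"origin", "dest", "contact", "seat"}))
```

With rating projected away, t3 introduces no new free variable and is reported as non-relevant.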
When testing whether a mapping can be extended to a node t in a wdPT, the variables shared between t and its parent node play a crucial role. First, they describe the relevant domain of the mapping to be extended. Second, the values for these variables are already determined. This not only reduces the number of variables in t for which a value must be found, but may also allow us to partition the atoms in t and then test each partition separately instead of all atoms in t at once. We call these shared variables the interface of t.
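Definition 5 (Interface) Let p = (T, λ, X) be a wdPT and let t ∈ T be a node with parent node t′. The interface of t is I(t) = var(t) ∩ var(t′). The interface of the root node r is I(r) = ∅.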
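Computing interfaces takes one set intersection per node. The sketch below (hypothetical helper `interfaces`, same encoding as before) intersects the variables of each non-root node with those of its parent and assigns the empty set to the root.

```python
from typing import Dict, List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]


def interfaces(labels: Dict[str, List[Atom]], parent: Dict[str, str],
               variables: Set[str]) -> Dict[str, Set[str]]:
    """I(t) = var(t) ∩ var(parent of t); the root gets the empty set."""
    def var(t: str) -> Set[str]:
        return {a for _, args in labels[t] for a in args if a in variables}

    iface: Dict[str, Set[str]] = {t: set() for t in labels}
    for t, p in parent.items():
        iface[t] = var(t) & var(p)
    return iface


labels = {
    "r": [("flight", ("airline", "fno", "origin", "dest"))],
    "t1": [("contact", ("airline", "contact"))],
    "t2": [("seats", ("fno", "seat"))],
    "t3": [("rating", ("fno", "seat", "rating"))],
}
parent = {"t1": "r", "t2": "r", "t3": "t2"}
VARS = {"airline", "fno", "origin", "dest", "contact", "seat", "rating"}
print(interfaces(labels, parent, VARS))
```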
It was already remarked, e.g., by Barceló et al. [3] and Kröll et al. [16], that restrictions on the number of variables shared between different sets of nodes can be used to define tractable classes. The above definition, however, differs slightly from the notion of interfaces in these works. E.g., in Kröll et al. [16], the interface of a node describes the set of variables shared between the node and any of its neighbors, while here it is restricted to the variables shared with its parent node.
Restrictions on the size of node interfaces turn out to be quite coarse, and we provide more fine-grained tractability criteria here. To this end, we implement the idea of partitioning the set of atoms in a node using its interface. For this, we recall the notion of S-components from Durand and Mengel [8]. Originally, they were defined for graphs and then extended to sets of atoms. Since we will only use S-components of sets of atoms, we provide their definition directly, omitting the graph case. Let A be a set of atoms and S ⊆ var(A) a set of variables. Consider the dual graph
Definition 7 (Kröll et al. [16]; Node Components) Let p = (T, λ, X) be a wdPT and t ∈ T. The set of node components NC(t) of t is a set of sets of atoms, defined as the union of:
1. The set {{τ} | τ ∈ λ(t) and var(τ) ⊆ I(t)} consisting of singleton sets for every atom τ ∈ λ(t) that contains only "interface variables", i.e., variables from I(t).

The set of all I(t)-components of λ(t).
In the following, node components of type (1) are the singleton sets of condition 1 and node components of type (2) are the I(t)-components of condition 2.
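The following sketch computes NC(t) under the reading of I(t)-components suggested by Example 5: atoms using only interface variables become singleton components of type (1), and the remaining atoms are grouped by a union-find structure whenever they share a variable outside I(t), yielding the components of type (2). Treat this as an illustration under that assumption rather than the exact definition of Durand and Mengel [8]. The function name is hypothetical, and the atom names in the usage lines imitate those of Example 5 but form a small made-up node, not the node t2 of Fig. 3.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]


def node_components(atoms: List[Atom], interface: Set[str],
                    variables: Set[str]) -> Set[FrozenSet[Atom]]:
    def inner(atom: Atom) -> Set[str]:
        # variables of the atom that are not interface variables
        return {a for a in atom[1] if a in variables and a not in interface}

    # Type (1): singleton components for atoms using only interface variables.
    components = {frozenset([a]) for a in atoms if not inner(a)}

    # Type (2): link the other atoms through shared non-interface variables.
    rest = [a for a in atoms if inner(a)]
    link = list(range(len(rest)))

    def find(i: int) -> int:
        while link[i] != i:
            link[i] = link[link[i]]  # path halving
            i = link[i]
        return i

    owner: Dict[str, int] = {}
    for i, atom in enumerate(rest):
        for v in inner(atom):
            if v in owner:
                link[find(i)] = find(owner[v])
            else:
                owner[v] = i

    groups: Dict[int, Set[Atom]] = {}
    for i, atom in enumerate(rest):
        groups.setdefault(find(i), set()).add(atom)
    return components | {frozenset(g) for g in groups.values()}


atoms = [
    ("yclique12", ("y1", "y2")),
    ("edge1", ("y1", "r1")),
    ("edge2", ("y2", "r2")),
    ("b1", ("x1", "u1")),
    ("b2", ("u1", "u2")),
    ("b3", ("u2", "y1")),
]
names = {"y1", "y2", "r1", "r2", "x1", "u1", "u2"}
for comp in node_components(atoms, {"y1", "y2"}, names):
    print(sorted(rel for rel, _ in comp))
```

As in Example 5, y1 occurs in both edge1 and b3 without merging them, because y1 is an interface variable, while b1, b2, and b3 end up in one component through u1 and u2.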
Example 5 Fig. 3 shows again the wdPT p from Example 3, but omitting the non-relevant node t3. In addition, at each node, the node components are depicted by dotted boxes. To emphasize the difference between interface and non-interface variables, and to highlight which variables connect atoms to node components, the interface variables at each node are grayed. Consider the node t2, which has the following node components: each atom yclique_ij(y_i, y_j) forms a node component of type (1), since all variables y_i also occur in r. In addition, there are several node components of type (2): each atom edge_i(y_i, r_i) is such a node component (these atoms contain a variable in var(t2) \ I(t2) but do not share such a variable with another atom), and there are the two node com-
Observe that these two sets are connected by y1, but y1 ∈ I(t2) separates them into two node components. In contrast, the atoms b1(x1, u1), b2(u1, u2), and b3(u2, y1) are connected by u1 and u2, which are not in I(t2).
Also, note the effect of considering interface variables and node components instead of looking at the complete node at once. For example, the Gaifman graph of λ(t 2 ) has a treewidth of k. Thus, finding, e.g., values for r 1 , . . . , r k is a hard problem. However, by taking into account interface variables and node components, each of the resulting sets has a treewidth of 1. Therefore, the existence of an extension of some mapping on the interface variables to each node component independently can be decided efficiently.
To understand why node components are essential for our results, recall that solutions to wdPTs must be maximal, i.e., they map some subtree into the database, but cannot be extended to map some child node of this subtree into the database as well. But such an extension to a node exists if and only if the mapping can be extended to all of its node components. Thus, instead of testing extensions to the complete node at once (which might be intractable), we test the maximality of a mapping independently for each node component (which might be tractable). This is possible because, for all variables shared between any two node components, the values are already determined by the mapping to be extended. Extensions to different node components are thus independent of each other.

Fig. 3 The well-designed pattern tree from Fig. 2, with the non-relevant node t3 omitted and the interface variables at each node grayed to emphasize the node components, which are depicted by the dotted boxes (see Example 5)
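For node components, we are in particular interested in the contained interface variables. For a node component S ∈ NC(t), let I(S, t) = var(S) ∩ I(t), and let I∃(S, t) = I(S, t) \ X denote the existential interface variables of S.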
We are now ready to formulate our first tractability condition. Intuitively, it guarantees that for each node component S, given some mapping on the variables in its interface, one can decide in polynomial time whether this mapping can be extended to map all atoms in the node component into a given database. It exploits (and formalizes) the fact that once a mapping on I(S, t) is given, these variables can be treated as constants.

Example 6
To demonstrate the situation after introducing condition (a), let pk be the wdPT p from Example 5 parameterized by k, let P = {pk | k ≥ 2}, and let T′ = T[{r, t1, t2}]. Assume, for some k ≥ 2, that in order to test whether some mapping is a solution to pk, we would like to verify whether some mapping μ′ with dom(μ′) = var(T′) is a maximal pp-solution. Deciding whether it is a pp-solution is easy, and because P satisfies tractability condition (a), testing whether there exists, in both t4 and t5, a node component to which μ′ cannot be extended is feasible in polynomial time as well. In fact, NC(t4 }} (this holds for every pk ∈ P). However, there may exist an exponential number of pp-solutions on T′ (T′ contains 2k + 4 existential variables). Thus, testing the mappings on var(T′) one after the other is not feasible.
One way to overcome the problem sketched in the example is to not have two separate tests for being a pp-solution and being maximal. To combine these tests, we first compute for each node component the set of mappings on its interface variables that do not extend to the component, and then require pp-solutions to be consistent with these mappings.
It turns out that this idea can be encoded as an evaluation problem for CQs. One important step in this encoding is to introduce new relations, one for each node component, that store those mappings on the interface variables that cannot be extended to the component. In order for the resulting CQ evaluation problem to be in polynomial time, we require two properties. First, the resulting CQs must be from some tractable fragment of CQ evaluation, and, second, the size of the newly added relations must be at most polynomial. One way to achieve the second goal is to restrict the arity of these relations. Towards the first goal, since the bodies of the resulting CQs are simple sets of atoms, tractability for the evaluation problem holds if these queries have bounded treewidth. As it turns out, a bound on the treewidth also implies a bound on the arity of the new relations, and thus represents our second tractability condition.
To formalize the construction, we introduce the notion of a component interface atom. For a wdPT (T , λ, X ), a subtree T' of T , a node t ∈ ch(T'), and a node component S ∈ N C(t), let the component interface atom be an atom R(v) where v contains exactly the variables in I ∃ (S, t) and R is a fresh relation symbol. For a node component S, we use cia(S) to refer to the corresponding interface atom R(v). Observe that these definitions imply The intuition for cia(S) is that for each node component, we get one atom that covers exactly the variables in I ∃ (S, t). The free variables in the interface can be excluded from consideration since a fixed value is provided for them as part of the input.

Example 7
Recall again the wdPT depicted in Fig. 3 and consider the node t 1 . It contains two node components: Observe that whether S 2 can be mapped into some database D is completely independent of the interface variables. Thus, cia(S 2 ) = R S 2 (). However, for S 1 the values of y 1 and y 2 influence whether S 1 may be mapped. Thus, we get cia(S 1 ) = R S 1 (y 1 , y 2 ). In the case of the node component {c 3 (x 1 , u 3 , u 1 )} of t 4 , observe that we can assume some fixed value μ(x 1 ) for x 1 and can thus reduce the atom c 3 (μ(x 1 ), u 3 , u 1 ) according to this value. We therefore get R c 3 (u 3 , u 1 ) as interface atom, with the corresponding relation being computed from c 3 and μ(x 1 ).

Since we are looking for one pp-solution that cannot be extended to any child node, combining the two tests as sketched means that we must test all children simultaneously instead of individually. However, since each CQ tests only one node component for each child, we need one CQ for each possible combination, leading to our second tractability condition.
The following example breaks down condition (b) to demonstrate its intuition.
Example 8 Recall the setting in Example 6, as well as the database instance D from Example 3. Let μ be a mapping with dom(μ) = {x 1 , x 2 } and μ(x 1 ) = μ(x 2 ) = 1. Assume that, in order to show that μ ∈ p k (D) for any k ≥ 2, we are looking for a pp-solution for T' that extends neither to the node component S 1 = {c 1 (x 3 , u 1 )} of t 4 (and thus not to λ(t 4 )) nor to the node component S 2 = {d 1 (y 2 , x 5 ), d 2 (x 5 , u 2 )} of t 5 (and thus not to λ(t 5 )). The idea is to construct a CQ …, where R 1 (u 1 ) and R 2 (y 2 , u 2 ) are the component interface atoms for S 1 and S 2 , respectively. Observe that this query has exactly the structure described in condition (b), illustrating the motivation for the definition of this condition. Also, note that because of the yclique ij -atoms, P does not satisfy tractability condition (b).
The query is evaluated over the instance D extended by relations for R 1 and R 2 . As mentioned, these relations contain all values that cannot be extended to map S 1 and S 2 into D, respectively. For S 1 , we get {R 1 (2)} (since c 1 (1, 1) ∈ D, a mapping assigning 1 to u 1 could be extended to map S 1 into D), and for S 2 we get …

Consider the mapping μ' with dom(μ') = var(λ({r, t 1 , t 2 , t 3 })) and μ'(x) = 1 for all x ∈ dom(μ') except for u 1 and u 2 , for which μ'(u i ) = 2. The mapping μ' witnesses the fact that (1, 1) is an answer to the query over the extended instance, and thus μ 2 is a maximal pp-solution also witnessing μ ∈ p k (D).
The main result of this paper is that the conditions (a) and (b) characterize exactly the classes of simple wdPTs which can be evaluated efficiently.

Theorem 2
Assume that FPT ≠ W[1], and let P be a decidable class of simple wdPTs of bounded arity. Then the following statements are equivalent.

1. The tractability conditions (a) and (b) hold for P.
2. p-EVAL(P) is in FPT.
We will show the upper bound of Theorem 2 in Section 4, where we describe how the different ideas described so far can be combined into an FPT algorithm, while the lower bounds will be shown in Section 5.
But before we turn to the proof of Theorem 2, let us interpret the result in the setting without projections to better understand the influence of projection. First note that in that case, by Definition 8, we have I ∃ (S, t) = ∅ for every t ∈ T and every S ∈ N C(t). Thus, all atoms cia(S) (for any node component S) are of arity 0, as are all atoms in $(\lambda(T') \cup \bigcup_{i=1}^{n}\{\mathrm{cia}(S_i)\}) \setminus \mathrm{fvar}(T')$. Tractability condition (b) is therefore void in this setting, leaving (a) as the only useful condition in the projection-free case. This immediately implies the following corollary.

Corollary 1 Assume that FPT ≠ W[1], and let P be a decidable class of simple wdPTs of bounded arity without projections. Then p-EVAL(P) is in FPT if and only if tractability condition (a) holds for P.
We remark that Corollary 1 could also be inferred as a special case of the main result of Romero [24]. Stating the corollary explicitly here lets us better understand the role of projection for our problem: in fact, the role of condition (a) is essentially to deal with the complexity that we already have without projection, while condition (b) is necessary to deal with the additional source of hardness that is introduced by projections and does not appear without them.
Since it will simplify the discussion in the upcoming sections, we conclude the section by explicitly working out a third tractability condition already mentioned above in the discussion towards tractability condition (b). As described there, at some point we extend a given database by relations for the atoms cia(S) that contain for the corresponding node component all mappings on its existential interface that cannot be extended to a mapping on the whole node component. To guarantee that all these relations are of polynomial size, we restrict the number of variables in the existential interfaces of the node components by some constant c. We formalize this notion in terms of a suitable width measure.
For a node t ∈ T , the component width of t is the maximum width over all node components S of t. The component width of p is the maximum component width over all t ∈ relv(T ).
By the definition of the treewidth of a set of atoms (specifically, by the fact that in the Gaifman graph all variables occurring together in an atom form a clique) and the fact that the variables of the existential interface of each node component occur together in some cia(S)-atom, the number of variables in any existential interface is bounded by the treewidth of the CQs defined in condition (b) plus one. Thus, we get the following corollary.
Corollary 2 Let P be a class of wdPTs that satisfies tractability condition (b) for some constant c. Then, for every p ∈ P, the component width of p is at most c + 1.
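The bound in Corollary 2 can be spelled out as a short chain of implications. Here q stands for one of the CQs $(\lambda(T') \cup \bigcup_{i=1}^{n}\{\mathrm{cia}(S_i)\}) \setminus \mathrm{fvar}(T')$ from condition (b) that contains cia(S), assumed to have treewidth at most c; the middle step uses the standard fact that every clique of a graph is contained in some bag of each of its tree decompositions.

```latex
\begin{align*}
\mathrm{cia}(S) = R(v_1,\dots,v_m) \in q
  &\implies \{v_1,\dots,v_m\} \text{ is a clique in the Gaifman graph of } q\\
  &\implies \text{every tree decomposition of } q \text{ has a bag} \supseteq \{v_1,\dots,v_m\}\\
  &\implies |I_\exists(S,t)| = m \le \mathrm{tw}(q) + 1 \le c + 1.
\end{align*}
```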

The FPT Algorithm
Having defined the tractability conditions, we now show how they are used in the FPT-algorithm for p-EVAL(P) outlined in Algorithm 1.
The missing ingredient of Algorithm 1 that we have not yet introduced is stop(S, D) for a node component S and a database D, which we explain now. Recall that the purpose of node components is to verify the maximality of a mapping not by testing for extensions to the complete node, but by performing these tests on smaller, independent units.
The idea of how to realize this is to store in D, for each node component S, those assignments ν to the variables in its existential interface such that there exists no homomorphism ν' : S → D extending ν ∪ μ. These are the values stored in stop(S, D).
In more detail, for a wdPT ((T , r), λ, X ), a subtree T' of T containing r, a child node t ∈ ch(T'), a node component S ∈ N C(t), a database D, and a mapping μ : fvar(T') → dom(D), consider the set extend(S, D) = {η : I ∃ (S, t) → dom(D) | there exists a homomorphism η' : S → D extending η and μ| var(S) }. So extend(S, D) contains exactly those mappings on I ∃ (S, t) that can be extended in a way that is compatible with μ and maps S into D. We thus set stop(S, D) = {ν : I ∃ (S, t) → dom(D) | ν ∉ extend(S, D)}.

With this in place, we describe the idea of Algorithm 1. Recall that, given μ, we have to find a mapping μ' extending μ that is (1) a pp-solution, and (2) maximal. Because of the existential variables, there may be exponentially many subtrees T' of T containing r with fvar(T') = dom(μ), each being a potential candidate for witnessing (1) and (2). After removing all irrelevant nodes in line 1 (they might make evaluation unnecessarily hard, cf. Example 4), we thus check each of these subtrees (line 2).
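A brute-force Python sketch of extend(S, D) and stop(S, D), assuming atoms are (relation, argument-tuple) pairs, a database is a dict from relation names to sets of tuples, and dom(D) is the active domain. The helpers `extends` and `extend_and_stop` are hypothetical names; the naive backtracking test merely stands in for the polynomial-time procedure that condition (a) guarantees, and it is exponential in general. The candidate set has |dom(D)|^|I∃(S, t)| elements, which is why a bound on the existential interface keeps it polynomial.

```python
from itertools import product


def extends(atoms, db, fixed):
    """Backtracking search: can `fixed` be extended to send every atom into db?"""
    def solve(i, mapping):
        if i == len(atoms):
            return True
        rel, args = atoms[i]
        for row in db.get(rel, ()):
            if len(row) != len(args):
                continue
            new, ok = dict(mapping), True
            for var, val in zip(args, row):
                if new.setdefault(var, val) != val:  # clash with an earlier binding
                    ok = False
                    break
            if ok and solve(i + 1, new):
                return True
        return False
    return solve(0, dict(fixed))


def extend_and_stop(component, ex_interface, db, mu):
    """Return (extend(S, D), stop(S, D)) as sets of tuples over sorted(ex_interface)."""
    ex = sorted(ex_interface)
    dom = {val for rows in db.values() for row in rows for val in row}
    # mu restricted to var(S): free interface variables act as constants
    fixed = {v: mu[v] for _, args in component for v in args if v in mu}
    candidates = set(product(dom, repeat=len(ex)))  # |dom(D)| ** |ex| tuples
    ext = {nu for nu in candidates
           if extends(component, db, {**fixed, **dict(zip(ex, nu))})}
    return ext, candidates - ext


# Hypothetical toy data: S = {b1(x1, u1)}, existential interface {u1}, mu(x1) = 1.
db = {"b1": {(2, 2), (1, 3)}}
ext, stp = extend_and_stop([("b1", ("x1", "u1"))], {"u1"}, db, {"x1": 1})
print(sorted(ext), sorted(stp))  # [(3,)] [(1,), (2,)]
```

The tuple (2,) lands in stop although b1(2, 2) exists, because it is incompatible with the fixed value μ(x1) = 1; an empty existential interface yields a nullary relation that is either {()} or empty.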
If the required mapping μ' exists, then, as discussed earlier, for each child node of T' there exists at least one node component to which μ' cannot be extended. Not knowing which node components these are, the algorithm iterates over all possible combinations (line 4). In lines 5-7, the algorithm then checks whether there exists an extension of μ that maps all of λ(T') into D (ensured by adding λ(T') to q), but none of the node components S 1 , . . . , S n . The latter property is equivalent to requiring that μ' assign to the existential interface variables of each S i a value that cannot be extended. This is guaranteed by adding the atoms cia(S i ) to q and providing in D exactly the values from stop(S i , D). Observe that the CQ q in line 5 contains all x ∈ fvar(T'), and that replacing them by μ(x) is only part of the evaluation strategy in line 7.
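To make the interplay of the subtree loop, the combination loop, and the final CQ test concrete, here is a self-contained Python sketch of the strategy just described. It is a brute-force illustration, not the FPT algorithm: homomorphism tests use naive backtracking instead of exploiting bounded treewidth, irrelevant nodes are assumed to be removed already, and node components are computed under an assumed definition (atoms of a child are grouped when they share a variable outside the child's interface). All function names are hypothetical, and the line numbers in the comments refer to Algorithm 1 as described in the text.

```python
from itertools import product


def vars_of(atoms):
    return {v for _, args in atoms for v in args}


def hom(atoms, db, fixed):
    """Backtracking: is there a mapping extending `fixed` sending all atoms into db?"""
    def solve(i, m):
        if i == len(atoms):
            return True
        rel, args = atoms[i]
        for row in db.get(rel, ()):
            if len(row) != len(args):
                continue
            new, ok = dict(m), True
            for var, val in zip(args, row):
                if new.setdefault(var, val) != val:
                    ok = False
                    break
            if ok and solve(i + 1, new):
                return True
        return False
    return solve(0, dict(fixed))


def components(atoms, interface):
    """Assumed node components: atoms linked via shared non-interface variables."""
    groups = []
    for atom in atoms:
        local = set(atom[1]) - interface
        linked = [g for g in groups if local & (vars_of(g) - interface)]
        groups = [g for g in groups if g not in linked]
        groups.append([a for g in linked for a in g] + [atom])
    return groups


def rooted_subtrees(node, children):
    """All subtrees containing `node` (dropping a child drops its descendants too)."""
    options = [[frozenset()] + rooted_subtrees(c, children) for c in children.get(node, [])]
    return [frozenset([node]).union(*pick) for pick in product(*options)]


def stop(comp, ex, db, mu):
    """Tuples over `ex` that cannot be extended, together with mu, to map comp into db."""
    dom = {val for rows in db.values() for row in rows for val in row}
    fixed = {v: mu[v] for v in vars_of(comp) if v in mu}
    return {nu for nu in product(dom, repeat=len(ex))
            if not hom(comp, db, {**fixed, **dict(zip(ex, nu))})}


def evaluate(mu, root, children, labels, free, db):
    """Decide mu in p(D), assuming irrelevant nodes were already removed (line 1)."""
    for sub in rooted_subtrees(root, children):                 # line 2
        sub_atoms = [a for n in sub for a in labels[n]]
        if vars_of(sub_atoms) & free != set(mu):
            continue
        kids = [c for n in sub for c in children.get(n, []) if c not in sub]
        choices = []
        for c in kids:
            # by well-designedness, variables shared with T' are shared with branch(c)
            interface = vars_of(labels[c]) & vars_of(sub_atoms)
            choices.append([(comp, sorted((vars_of(comp) & interface) - free))
                            for comp in components(labels[c], interface)])
        for combo in product(*choices):                         # line 4
            query, ext_db = list(sub_atoms), dict(db)
            for i, (comp, ex) in enumerate(combo):              # build q and D' (lines 5-7)
                ext_db[f"__stop{i}"] = stop(comp, ex, db, mu)
                query.append((f"__stop{i}", tuple(ex)))
            if hom(query, ext_db, mu):                          # line 7
                return True
    return False


# Hypothetical wdPT: root r = {R(x, y)}, child t = {S(y, z)}, free variables {x, z}.
children = {"r": ["t"]}
labels = {"r": [("R", ("x", "y"))], "t": [("S", ("y", "z"))]}
db = {"R": {(1, 10), (1, 11), (2, 20)}, "S": {(10, 5)}}
print(evaluate({"x": 1}, "r", children, labels, {"x", "z"}, db))  # True (via y = 11)
```

In the usage example, {x ↦ 1} is an answer because the binding y ↦ 11 lies in the stop relation of the single component {S(y, z)}, whereas y ↦ 10 could be extended to t.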
Most of the central components and ideas of the algorithm have already been demonstrated in isolation in the examples throughout Section 3. The next example brings together all these ideas by illustrating how the algorithm works.
Example 9 Consider again the wdPT p = (T , λ, {x 1 , . . . , x 5 }) depicted in Fig. 2, but without the atoms yclique ij (y i , y j ) for 1 ≤ i < j ≤ k. The resulting tree now satisfies conditions (a) and (b). Assume we want to decide μ ∈ p(D) for the mapping μ with μ(x 1 ) = 1 and μ(x 2 ) = 2 and In a first step, the algorithm now reduces T by removing the non-relevant node t 3 . Thus, from now on T corresponds to the wdPT shown in Fig. 3 (again, without the yclique ij (y i , y j ) atoms in t 2 ).
Next, there exist two subtrees T' of T with fvar(T') = dom(μ) = {x 1 , x 2 }, namely T 1 = T [{r, t 1 }] and T 2 = T [{r, t 1 , t 2 }]. Starting with T 1 , we get ch(T 1 ) = {t 2 }. Without the yclique ij -atoms, the node t 2 has k + 2 node components. Since t 2 is the only child of T 1 , the algorithm simply iterates over these k + 2 node components (line 4). We distinguish four different kinds of node components in t 2 .
For components S of the form edge i (y i , r i ), the component interface atoms are in principle a mapping ν(y 1 ) = 2 can be extended to a mapping on S. However, because of μ(x 1 ) = 1, such a mapping would not be compatible with μ, since mapping b 1 (x 1 , u 1 ) to b 1 (2, 2) is not an option. As a result, again (1, 2) ∉ q(D'). We omit the similar case for the last node component S = {d 4 (u 3 , y 1 )}. Hence, T 1 does not witness μ being a solution, and therefore T 2 is tested next (line 2).

The children of T 2 are t 4 and t 5 . As already laid out in Example 6, For the combination (S 2 , S 4 ), we get q as Ans( Observe that D 2 and D 4 do not contain a compatible value for u 2 . Thus, q(D') is again empty, showing that the node components of different nodes must be checked together and cannot be checked in isolation.
For (S 2 , S 5 ), D 2 and D 5 now contain a compatible value for u 2 , namely 2. However, if u 2 is mapped to 2, there is no way to map t 2 into D such that x 1 is mapped to 1. Thus, also in this case (1, 2) ∉ q(D'). For (S 3 , S 4 ), again (1, 2) ∉ q(D'): mapping all existential variables to 1 clearly maps neither R 3 (u 3 , u 1 ) nor R 4 (y 2 , u 2 ) into D'. The only other way to map λ(t 2 ) into D is by mapping y 1 to 1 and u 2 to 3. However, such a mapping cannot map R 4 (y 2 , u 2 ) into D'.
In order to see that this indeed gives an FPT algorithm in case tractability conditions (a) and (b) are satisfied, note that condition (b) ensures that the arity of each of the new relations for the atoms cia(S) is at most c + 1 (cf. Corollary 2). Thus, the size of these relations (and thus the number of possible mappings in stop(S, D)) is at most $|\mathrm{dom}(D)|^{c+1}$. Next, condition (a) ensures that for each mapping ν : I ∃ (S, t) → dom(D), deciding membership in stop(S, D) is in PTIME. Observe that the variables in I(S, t) are not considered in the computation of the treewidth since a fixed value is provided for them; thus they can be treated as constants. Finally, condition (b) also ensures that the test in line 7 is feasible in polynomial time. Again, since a fixed value is provided for the domain of μ, these variables can be treated as constants.
We note that the algorithm is an extension and refinement of the FPT algorithm of Kröll et al. [16]. An inspection of that paper reveals that the conditions provided there imply our tractability conditions (a) and (b), but not vice versa. In fact, our conditions explicitly describe the crucial properties of their restrictions that ensure the problem is in FPT. From Algorithm 1 we thus derive the following result.

The correctness of the algorithm follows immediately from the previous discussion. For the runtime, in addition to what was already discussed, the number of loop iterations in lines 2 and 4 is bounded by a function of the size of p, which is the parameter of the problem.

Optimality of the Tractability Conditions
We now show that both tractability criteria are necessary, and thus finish the proof of Theorem 2. We provide individual results for both conditions. In addition, we show that the bound on the component width is necessary (and not just a side effect), which will turn out to be a useful result for proving that tractability condition (b) is necessary.

Lemma 2 Let P be a decidable class of simple wdPTs of bounded arity such that tractability condition (a) is not satisfied. Then p-EVAL(P) is coW[1]-hard.
Proof For a wdPT p ∈ P, let the relevant components set rcs(p) contain all the sets S \ I(S, t) as defined in tractability condition (a). Moreover, let $\mathrm{rcs}(P) = \bigcup_{p \in P} \mathrm{rcs}(p)$. We will reduce, via an FPT-reduction, p-HOM(rcs(P)) to the complement of p-EVAL(P). The result then follows from Theorem 1, as rcs(P) does not have bounded treewidth by assumption.
Consider an instance E, F of p-HOM(rcs(P)). As a first step, find p = ((T , r), λ, X ) ∈ P, a node t ∈ relv(T ) such that t ≠ r, and a node component S ∈ N C(t) such that E = S \ I(S, t). They exist by assumption and, since P is decidable, can be computed as follows: enumerate all possible candidate triples p, t, and S, check whether p ∈ P, and if so, verify the conditions on p, t, and S. Since p exists by assumption and all other steps are computable, this procedure eventually yields the desired triple.
Since t is relevant, either for t' = t or for some descendant t' of t we have fvar(t') \ fvar(branch(t)) ≠ ∅. Among all possible candidates, pick some t' at a minimal distance to t.
We next define a database D over the set of relation symbols in p. For the description of D, for all relation symbols R occurring in any atom R(v) ∈ λ(T ), we will assume that v contains only variables, i.e., elements from Var. We implicitly assume that for all positions where v contains a constant, all atoms in R D contain the same constant as in v. Recall that we deal with simple wdPTs, thus each relation symbol occurs at most once within λ(T ). Also recall that, for any node t̄ ∈ T , branch(t̄) contains the nodes on the path from r to the parent node of t̄, while cbranch(t̄) in addition includes t̄ itself. In the following, let d ∈ Const be some fresh value not occurring in dom(F).
For each relation symbol R mentioned outside of λ(cbranch(t )), let R D = ∅.
For each relation symbol R mentioned in S, observe that there exists a relation symbol R' in E that was derived from R when computing S \ I(S, t). The idea is now to use R D to simulate the atoms with relation symbol R' in F by padding the additional fields with d. Thus, let k be the arity of R, let m be the arity of R', let  R(b 1 , . . . , b k

Finally, we define the mapping μ as μ(x) = d for all x ∈ fvar(branch(t )). With the description of the reduction complete, we claim that μ ∈ p(D) if and only if there is no homomorphism from E to F. We prove this property in two steps. First, we show that μ ∈ p(D) only depends on whether μ can be extended to t or not. After this, we show that such an extension of μ exists if and only if there is a homomorphism h : E → F.
First, observe that the only possible extension μ' of μ such that μ'(τ) ∈ D for every τ ∈ λ(branch(t)) is the mapping μ' sending every variable in var(branch(t )) to d. Moreover, for all nodes t'' ≠ t in ch(branch(t)), the mapping μ' cannot be extended to λ(t''), since for all relation symbols R mentioned in λ(t'') we have R D = ∅. Thus, μ' is a pp-solution, and it is a maximal pp-solution if and only if there exists no extension μ'' of μ' with μ''(τ) ∈ D for all τ ∈ λ(t).
Clearly, if μ' is a maximal pp-solution, then μ ∈ p(D). To see that μ ∉ p(D) if μ' is not a maximal pp-solution, assume that the above-mentioned extension μ'' of μ' exists. Then μ'' can obviously be extended to a mapping μ''' with μ'''(τ) ∈ D for all τ ∈ cbranch(t ), since for all atoms on (cbranch(t ) \ cbranch(t )) ∪ {t }, every possible atom over dom(D) is contained in D. Since dom(μ''') contains at least one free variable not in dom(μ), this shows μ ∉ p(D). It thus remains to show that the extension μ'' of μ' exists if and only if there is a homomorphism h : E → F. To see that this is the case, observe that by construction every such homomorphism h in combination with μ' gives a homomorphism from S into D, and vice versa, every homomorphism μ'' : S → D restricted to dom(E) gives the desired homomorphism. For the remaining atoms in λ(t) \ S, observe that every possible mapping sends them into D, since D again contains every possible atom for these relations.
To simplify the proof that tractability condition (b) is necessary, we first show that having bounded component width is a necessary condition on its own.

Lemma 3 Let P be a decidable class of simple wdPTs of bounded arity. If there does not exist some constant c such that for every p ∈ P the component width is bounded by c, then p-EVAL(P) is coW[1]-hard.
Proof The proof is an FPT-reduction from the problem of model checking FO sentences φ k of the form $\varphi_k = \forall x_1 \ldots \forall x_k \, \exists y \bigwedge_{i=1}^{k} E_i(x_i, y)$. Model checking for this class of sentences, parameterized by their size, is W[1]-hard [5]. Thus, consider a formula φ k and a database E.
First, compute an arbitrary wdPT p = (T , λ, X ) ∈ P with a component width of at least k. Such a wdPT exists by assumption (otherwise the component width would be bounded by k). W.l.o.g. we assume that p contains only binary atoms: since we assume a bounded arity, binary atoms can easily be encoded into atoms of higher arity. Consider some relevant node t ∈ T and a node component S ∈ N C(t) such that the component width of S is at least k (they exist by construction). Since we assume relations to be of some bounded arity, S cannot be of type (1) (Definition 7). W.l.o.g. we thus assume that S is of type (2).
Since t is relevant, either for t' = t or for some descendant t' of t we have fvar(t') \ fvar(branch(t)) ≠ ∅. Choose one such t' at a minimal distance to t.
As in the proof of Lemma 2, for the description of D we assume for all R(v) ∈ λ(T ) that v contains only variables. Recall that we are dealing with simple wdPTs, thus each relation symbol R occurs at most once within λ(T ).
For each relation symbol R mentioned outside of λ(cbranch(t )), let R D = ∅.
For each relation symbol R mentioned in λ(cbranch(t )) \ S, let be the arity of

For the relation symbols R mentioned in S, proceed as follows. Choose k interface variables v 1 , . . . , v k ∈ I(S, t). Let L = var(S) \ var(branch(t )) be the "local variables" of S. Observe that S being a node component of type (2) implies L ≠ ∅. For the same reason, for each of the variables v i , there must exist at least one atom R i (v i , z i ) or R i (z i , v i ) for some z i ∈ L. We will assume R i (v i , z i ) in the following; the other case is analogous. Now, for each v i , fix one such atom. Based on this, we define the following atoms to be contained in D: For each of the selected atoms

Finally, μ is an arbitrary mapping fvar(branch(t )) → dom(E). It now follows by the same arguments as in the proof of Lemma 2 that μ ∉ p(D) if and only if for every extension μ' of μ to var(branch(t )), there exists an extension ν of μ' such that ν(τ) ∈ D for all τ ∈ S.
To complete this proof, we thus only need to show that such an extension exists if and only if φ k is satisfied. First, assume that φ k is satisfied. Then, for all z ∈ L, define ν(z) to be the value of y that witnesses φ k for the values that μ' assigns to v 1 , . . . , v k . This clearly maps S into D. Next, assume that φ k is not satisfied. Then there exists some assignment to x 1 , . . . , x k such that no suitable value for y exists. Then, for the mapping μ' assigning exactly those values to the selected interface variables v 1 , . . . , v k , there exists no extension of μ' to S. This is because L defines a connected component in the Gaifman graph and because the definition of D forces all variables in L that occur together in some atom in S to be mapped to the same value by any such extension. Thus, such an extension would have to map all "local variables" in S to the same value, which would provide a suitable value for y, leading to a contradiction. This concludes the proof.
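For intuition about the source problem of this reduction, the following Python sketch evaluates φ k by brute force over a finite domain, with the relations E i given as (hypothetical) sets of pairs; `satisfies_phi` is an illustrative name, not part of the construction. The outer loop ranges over all |domain|^k assignments to x 1 , . . . , x k ; in the reduction these correspond to the values μ' places on the k chosen interface variables, and the common witness y corresponds to the single value that D forces onto the local variables in L.

```python
from itertools import product


def satisfies_phi(edges, domain):
    """Brute-force check of phi_k = forall x1..xk exists y: E_1(x1, y) and ... and E_k(xk, y).

    `edges` is a list [E_1, ..., E_k] of sets of pairs; k = len(edges).
    """
    k = len(edges)
    return all(
        any(all((xs[i], y) in edges[i] for i in range(k)) for y in domain)
        for xs in product(domain, repeat=k)  # all |domain| ** k universal assignments
    )


domain = {0, 1}
print(satisfies_phi([{(0, 0), (1, 0)}, {(0, 0), (1, 0)}], domain))  # True: y = 0 always works
print(satisfies_phi([{(0, 0), (1, 1)}, {(0, 0), (1, 0)}], domain))  # False: fails for x1 = 1, x2 = 0
```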

Lemma 4 Let P be a decidable class of simple wdPTs of bounded arity that does not satisfy tractability condition (b). Then p-EVAL(P) is either coW[1]- or W[1]-hard.
Proof First, assume that there exists some constant that is, for all p ∈ P, an upper bound on the component width; otherwise, p-EVAL(P) is coW[1]-hard by Lemma 3. In particular, we may thus assume that all relations in all instances of $(\lambda(T') \cup \bigcup_{i=1}^{n}\{\mathrm{cia}(S_i)\}) \setminus \mathrm{fvar}(T')$ for all p ∈ P are of bounded arity. Let solcheck(P) be the class of all sets of atoms $(\lambda(T') \cup \bigcup_{i=1}^{n}\{\mathrm{cia}(S_i)\}) \setminus \mathrm{fvar}(T')$ for P as defined in tractability condition (b). We reduce the problem p-HOM(solcheck(P)) to p-EVAL(P) via an FPT-reduction. The result then follows directly from Theorem 1, since if (b) does not hold, then solcheck(P) has unbounded treewidth. The rest of this proof gives the desired reduction.
Let E, F be an instance of p-HOM(solcheck(P)). We construct a wdPT p, a database D, and a mapping μ such that μ ∈ p(D) if and only if there is a homomorphism from E to F.
Next, the goal is to define a database D and a mapping μ such that μ ∈ p(D) if and only if a homomorphism from E to F exists. Before providing the formal construction, we sketch the idea first. Observe that E contains two types of atoms: λ(T ) and the interface atoms cia(S i ). The idea is to define D in such a way that there is a homomorphism μ : λ(T ) → D if and only if there exists h : λ(T ) → F, and that μ cannot be extended to any child node of T if and only if h can be extended to all atoms cia(S i ). Consider the depiction of a possible wdPT p in Fig. 4.
To implement the overall idea, we proceed as follows. We use atoms with the relation symbols in λ(T') to simulate F. For each child node t i of T', we distinguish between the relation symbols in S i and those not in S i . For those not in S i , we provide all possible atoms over the domain in D, i.e., every homomorphism on λ(T') can always be extended to the atoms not in S i . Whether it can be extended to all of λ(t i ) thus only depends on S i . For the atoms in S i , we provide values that are only compatible with a homomorphism on λ(T') if this homomorphism cannot be extended to cia(S i ). However, given a mapping μ on fvar(T'), for μ ∉ p(D) it is not sufficient that every homomorphism μ' : λ(T') → D can be extended to at least one child node of T': this child node must also contain a free variable not occurring in T'. If this is not the case for some child t i , we pick one descendant s i of t i that contains a new free variable (since t i is relevant, s i exists), and for all relation symbols on the path from t i to s i , we let D contain all possible atoms over the domain, thus making sure that if μ' can be extended to t i , it can also be extended all the way to s i .
We continue the formal definition of the reduction. To define a database D and a mapping μ such that μ ∈ p(D) if and only if a homomorphism from E to F exists, we first need to define the following sets of nodes. Let K = ch(T') ∩ relv(T) = {t 1 , . . . , t n }. For each t i ∈ K, we define the set N i of nodes as follows. If fvar(t i ) \ fvar(branch(t i )) ≠ ∅ (i.e., t i contains some "new" free variable), then N i = ∅. Otherwise, let s i ∈ T be a descendant of t i such that fvar(s i ) \ fvar(branch(t i )) ≠ ∅ and such that this property holds for no other node on the path from t i to s i . Then N i = cbranch(s i ) \ cbranch(t i ). (E.g., in Fig. 4, N 1 contains the two nodes with bold borders.) Finally, let $N = \bigcup_{i=1}^{n} N_i$.

We can now describe the database D. While doing so, we implicitly assume that for all positions where an atom R(v) of p contains a constant, all the atoms in R D contain the same constant as v. I.e., we describe only the values for "variable positions" of v. Recall that we are dealing with simple wdPTs, thus each relation symbol R occurs at most once within λ(T).

For all atoms R(y) ∈ λ(T) \ (λ(T') ∪ λ(K) ∪ λ(N)), let R D = ∅, i.e., for all atoms neither in T' nor in any of the relevant child nodes of T' (or their extensions to some node with a "new" free variable), no matching values exist in the database.

Fig. 4 Illustration of the different parts of a wdPT distinguished in the proof of Lemma 4. The "slots" in the nodes t 1 , t 2 , and t 3 represent the node components of these nodes. While we assume t 2 and t 3 to contain a free variable not occurring in T', t 1 does not contain such a variable. The node s 1 is one possible descendant of t 1 with a free variable not in T'
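The sets N i can be computed by a breadth-first search below t i . In the following Python sketch, a tree is given by hypothetical `parent` and `children` maps, `labels` maps each node to its (relation, arguments) atoms, and the helper names are illustrative only. Breadth-first order guarantees that the first node found with a new free variable has no ancestor below t i with the same property, which is exactly the minimality required of s i .

```python
from collections import deque


def fvar(nodes, labels, free):
    """Free variables occurring in the labels of the given nodes."""
    return {v for n in nodes for _, args in labels[n] for v in args} & free


def branch(node, parent):
    """Nodes on the path from the root to the parent of `node`."""
    out = set()
    while node in parent:
        node = parent[node]
        out.add(node)
    return out


def compute_n_i(t, children, labels, free, parent):
    """N_i = cbranch(s_i) minus cbranch(t_i) for a relevant child t_i (empty if t_i is new)."""
    old = fvar(branch(t, parent), labels, free)
    if fvar([t], labels, free) - old:
        return set()                      # t_i itself has a "new" free variable
    queue = deque(children.get(t, []))
    while queue:                          # BFS visits ancestors before descendants
        s = queue.popleft()
        if fvar([s], labels, free) - old:
            return (branch(s, parent) | {s}) - (branch(t, parent) | {t})
        queue.extend(children.get(s, []))
    raise ValueError("t is not relevant")


# Hypothetical tree: r -> t1 -> a -> s and r -> t2.
parent = {"t1": "r", "a": "t1", "s": "a", "t2": "r"}
children = {"r": ["t1", "t2"], "t1": ["a"], "a": ["s"]}
labels = {"r": [("R", ("x1", "y"))], "t1": [("A", ("y", "u"))],
          "a": [("B", ("u", "w"))], "s": [("C", ("w", "x2"))],
          "t2": [("D", ("y", "x3"))]}
free = {"x1", "x2", "x3"}
print(sorted(compute_n_i("t1", children, labels, free, parent)))  # ['a', 's']
print(sorted(compute_n_i("t2", children, labels, free, parent)))  # []
```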
For all atoms R(y) ∈ λ(T'), we want to use them to simulate in D the relations in F. Observe that for each such atom, there exists an atom R'(z) ∈ E that was derived from R(y) by removing the free variables fvar(T'). Thus, for each atom of the form R'(a) (i.e., atoms with relation symbol R') in F, we add one atom R(b) to R D , where b contains a fixed domain value d ∈ dom(F) at all positions where y contains a free variable, and the value from a at those positions where the variable still occurs in R'(z). I.e., R D is designed in such a way that all variables x ∈ fvar(T') can only be mapped to d.
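The padding step can be sketched in Python as follows, assuming (as the proof does) that R(y) contains only variables and that R'(z) keeps the non-free positions of y in their original order; `simulate_relation` and the sample data are hypothetical.

```python
def simulate_relation(atom_args, free, f_rows, d):
    """Build R^D from R'^F: value d at free-variable positions, F's values elsewhere.

    atom_args: the variables y of R(y); R'(z) keeps the non-free positions in order.
    """
    kept = [i for i, v in enumerate(atom_args) if v not in free]
    rows = set()
    for a in f_rows:
        row = [d] * len(atom_args)      # free positions are pinned to d
        for pos, val in zip(kept, a):
            row[pos] = val
        rows.add(tuple(row))
    return rows


# Hypothetical atom R(x1, y1, y2) with free x1, so R'(y1, y2); F contains two R'-facts.
r_db = simulate_relation(("x1", "y1", "y2"), {"x1"}, {(1, 2), (2, 3)}, d=1)
print(sorted(r_db))  # [(1, 1, 2), (1, 2, 3)]
```

Dropping the padded positions again recovers the R'-facts of F, so homomorphisms of λ(T') into D that send x1 to d correspond one-to-one to homomorphisms of R'(y1, y2) into F.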
The definition for the atoms in K is more involved. Consider some t i ∈ K. Let v contain the existential interface variables of the node component S i ∈ N C(t i ) selected for the construction of E, and assume cia( For the atoms in S i , we distinguish between S i being of type (1) or of type (2).
where a v is the restriction of a to those positions in y with variables from v (and thus not containing variables from fvar(T )).
If S i is of type (2), we distinguish two types of variables: those that occur in I(S i , t), and those that do not appear in any node t ∈ branch(t i ). We call these latter variables new variables and use as their domain the set $\mathrm{dom}(F)^{|v|}$, i.e., the set of all possible assignments of values from F to the variables in v from R cia (v). We assume that the encoding of the assignments a ∈ $\mathrm{dom}(F)^{|v|}$ is such that we can look up the value that is given to a variable v i ∈ v by a. For the remaining variables in var(S i ), we will use values from dom(F). For each atom R(y) ∈ S i , the values in R D are defined as follows: Then R D contains an atom for each tuple satisfying all of the following four properties.
1. All variables in fvar(T') get the value d.
2. All the variables in z \ fvar(T') get assigned the same value. Denote this value by a, and recall that a represents an assignment a ∈ $\mathrm{dom}(F)^{|v|}$.
3. For all v i ∈ v ∩ y, the value of v i is consistent with the vector a represented by a.

Finally, let μ be the mapping defined on all variables x ∈ fvar(T') as μ(x) = d. To prove that μ ∈ p(D) if and only if a homomorphism E → F exists, we first show the following claim.
Claim Let R cia (v) be an interface atom and S i the corresponding node component. Then, for a mapping μ' : v → dom(F), we have that R cia (μ'(v)) ∈ F if and only if there is no extension μ'' of μ ∪ μ' to var(λ(t i )) that maps all atoms in S i into D (since S i ⊆ λ(t i ), this implies that there exists no extension μ'' of μ ∪ μ' that maps λ(t i ) into D).
Proof (of the Claim) For node components of type (1), the claim is immediate. So let us assume for the rest of the proof that S i is of type (2).
First let R cia (μ'(v)) ∈ F. Let us assume an arbitrary extension μ'' of μ ∪ μ' to var(S i ). If μ'' violates one of the conditions 1., 2., and 3. for some R(y) ∈ S i , then clearly for this particular atom there exists no atom in D to which it can be mapped by μ''. We may thus assume that μ'' satisfies the first three conditions for all atoms R(y) ∈ S i . Then all variables in var(S i ) \ (v ∪ fvar(T')) take the same value under μ'', and this value corresponds exactly to the tuple μ'(v). But then μ'' does not satisfy condition 4. for any R(y) ∈ S i , since R cia (μ'(v)) ∈ R F cia by assumption. Thus, R D does not contain any atom onto which μ'' could map R(y), and hence μ'' cannot exist, which completes the first direction.
For the other direction, assume that no extension μ'' of μ ∪ μ' maps all atoms in S i into D. Then this is in particular true for those extensions satisfying conditions 1., 2., and 3. Note that every such extension maps all variables in z \ fvar(T') to the same value, representing a mapping on v. Also, μ''| v = μ'. Since μ'' fails to map S i into D because of 4., we get that R cia (μ'(v)) ∈ R F cia , which completes the proof of the claim.
We continue the proof that μ ∈ p(D) if and only if a homomorphism E → F exists. First observe that μ ∈ p(D) if and only if, on the one hand, there is an extension μ' of μ to var(T') that maps all atoms in λ(T') into D (of course, in general every subtree T'' of T containing the root node with fvar(T'') = dom(μ) is a potential candidate, but given the construction of D, the subtree T' is the only possible candidate) and, on the other hand, for all t i ∈ K, there does not exist an extension of μ' onto λ(t i ) ∪ λ(N i ). (In fact, extending the mapping to any descendant of t i that contains some additional free variable would work. However, the only nodes with non-empty relations in D are those mentioned in N.) By the construction of D for atoms in λ(N), for every t i ∈ K it follows immediately that there exists an extension of μ' onto λ(t i ) ∪ λ(N i ) if and only if there exists an extension to λ(t i ). This is because for the atoms in λ(N) the database D contains all possible atoms, thus every extension μ'' of μ' onto λ(t i ) can be further extended to all atoms in λ(N i ).
Note that the non-existence of an extension of μ' onto λ(t i ) is, by the Claim shown above, equivalent to μ' sending R cia (v) into F. Hence, μ ∈ p(D) if and only if there is a homomorphism from E into F. This completes the proof.

Tractability Conditions for Non-simple wdPTs
As already mentioned at the beginning of Section 3, the tractability conditions presented there are not restricted to simple wdPTs but apply to arbitrary wdPTs, i.e., deciding p-EVAL(P) is in FPT also for classes P of non-simple wdPTs that satisfy them. However, if the same relation symbol may occur in more than one atom of the query, the tractability criteria stated in the previous section miss important tractable classes of wdPTs. In the following, we discuss more general tractability notions.
The key difference for non-simple wdPTs is that in this setting, a set of atoms is not necessarily its own core. Since the complexity of the homomorphism problem is determined not by the treewidth of the set of atoms but by the treewidth of its core, as shown by Grohe [12] and recalled in Theorem 1, cores must also be taken into account in the tractability conditions. To do so, we revisit both the notion of cores and the homomorphism problem, study variations better suited to our setting, and then apply these results to extend the definition of tractable classes.

Extension Cores
In this section, we introduce a variant of cores, which we call the extension core, that will turn out to be the right notion for wdPTs. The reason is that while the core is the suitable concept for the homomorphism problem (cf. Theorem 1), evaluating wdPTs actually involves a variant of the homomorphism problem. To further motivate the idea of extension cores, recall that when deciding p-EVAL(P) for some wdPT p = (T, λ, X), one important step is to check, given some subtree T' of T, a mapping μ, and some child node of T', whether μ can be extended to this child. While this problem can be correctly stated as an instance of HOM, such a formulation cannot adequately express all available information. In fact, compared to HOM, the problem at hand contains both additional constraints (for some variables in the child node, the value is already determined by μ) and additional information (we know that μ maps λ(T') into the database). A formulation as an instance of HOM cannot exploit all of this information, and as a result the problem may look harder than it actually is.
Example 10 Consider the (non-simple) wdPT p_k = (T, λ_k, {x_1, x_2}) (parameterized by k) depicted in Fig. 5, consisting of two nodes. Clearly, the class P = {p_k | k ≥ 2} of wdPTs satisfies neither tractability condition (a) nor (b). In both cases, the reason for the violation is the clique of boss_of(y_i, y_j) atoms in t_1, which resides in a single node component.
Nevertheless, deciding p-EVAL(P) is in FPT. As mentioned, the reason is that the tractability of the homomorphism problem depends on the treewidth of the core of the structure rather than on the treewidth of the structure itself.
To see this, consider tractability condition (b) first. It ensures that the CQ in line 7 of Algorithm 1 can be evaluated efficiently. Condition (b) is violated because of the "subtree" of T that contains all of T. In this case, evaluating the CQ q_k(a, b) (where q_k(a, b) is the query q_k with x_1 and x_2 in λ_k(T) replaced by a and b, respectively) is tractable. This is because core(λ_k(T)) = {emp(x_1), boss_of(x_1, x_1), works_in(x_1, d_1), leads(x_2, d_1), part_of(d_1, d_2)} due to the homomorphism h with h(y_i) = x_1 for 1 ≤ i ≤ k. The treewidth of this core is clearly bounded by a constant, and thus deciding the problem is tractable (cf. Theorem 1).
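As a small illustration of this folding argument, the following Python sketch computes a core by brute force: it repeatedly searches for an endomorphism whose image avoids some element and replaces the set of atoms by that image; once no element can be avoided, every endomorphism is bijective and the set is a core. The search is exponential and only meant for toy inputs, and the helper names (`dom`, `find_hom`, `core`) are our own. Since Fig. 5 is not reproduced here, `lam_T` is a hypothetical stand-in for λ_k(T) with k = 4, chosen to be consistent with the core stated above.

```python
# Brute-force homomorphisms and cores for small sets of atoms (toy sizes only).
# An atom is a tuple (relation, arg_1, ..., arg_k); every argument is treated
# as an element that a homomorphism may map (no special handling of constants).


def dom(atoms):
    """All elements occurring in some atom."""
    return {x for atom in atoms for x in atom[1:]}


def find_hom(source, target, fixed=None):
    """Return one homomorphism source -> target extending `fixed`, or None."""
    by_rel = {}
    for atom in target:
        by_rel.setdefault((atom[0], len(atom)), []).append(atom)
    # Try atoms with few candidate images first to prune early.
    atoms = sorted(source, key=lambda a: len(by_rel.get((a[0], len(a)), [])))

    def extend(i, h):
        if i == len(atoms):
            return h
        atom = atoms[i]
        for cand in by_rel.get((atom[0], len(atom)), []):
            new = dict(h)
            # setdefault binds unbound elements and returns existing bindings.
            if all(new.setdefault(a, b) == b for a, b in zip(atom[1:], cand[1:])):
                result = extend(i + 1, new)
                if result is not None:
                    return result
        return None

    return extend(0, dict(fixed or {}))


def core(atoms):
    """Fold away elements via endomorphisms until no element can be avoided."""
    current = set(atoms)
    shrunk = True
    while shrunk:
        shrunk = False
        for v in sorted(dom(current), key=str):
            without_v = {a for a in current if v not in a[1:]}
            h = find_hom(current, without_v)
            if h is not None:  # the image avoids v, so it is strictly smaller
                current = {(a[0],) + tuple(h[x] for x in a[1:]) for a in current}
                shrunk = True
                break
    return current


# Hypothetical lambda_k(T) for k = 4 (Fig. 5 is not reproduced here).
k = 4
lam_T = {("emp", "x1"), ("boss_of", "x1", "x1"), ("works_in", "x1", "d1"),
         ("leads", "x2", "d1"), ("part_of", "d1", "d2"), ("emp", "y1")}
lam_T |= {("boss_of", f"y{i}", f"y{j}")
          for i in range(1, k + 1) for j in range(i + 1, k + 1)}

print(sorted(core(lam_T)))
# [('boss_of', 'x1', 'x1'), ('emp', 'x1'), ('leads', 'x2', 'd1'),
#  ('part_of', 'd1', 'd2'), ('works_in', 'x1', 'd1')]
```

Increasing `k` only enlarges the clique of `boss_of` atoms in `lam_T`; the computed core stays the same five atoms, in line with the bounded treewidth argued above, while the clique on its own is already a core.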
While adapting tractability condition (b) is rather straightforward, the situation is more interesting for condition (a). Recall that the idea of condition (a) is to ensure that deciding whether a mapping on the interface variables of a node component can be extended to this node component is tractable. Consider the node component S in t_1 that contains the set of boss_of(y_i, y_j) atoms. Clearly, for increasing k, the treewidth of this component is unbounded, and so is the treewidth of its core (since it is already its own core). Nevertheless, deciding whether a mapping on d_1 (the only interface variable) can be extended to this component is tractable.
The reason for this is that it is only necessary to consider mappings μ on d_1 that have an extension mapping λ_k(r) into a given database. Thus, when deciding the extension to t_1, we can assume the existence of such a mapping on λ_k(r). In our case, we thus know that the database contains a target for emp(x_1), boss_of(x_1, x_1), and works_in(x_1, d_1). Hence, instead of considering just the core of S, we can consider the core of S ∪ λ_k(r). Again by the same homomorphism as before, that is, h(y_i) = x_1 for all 1 ≤ i ≤ k, we can now fold parts of S into λ_k(r). Next, we can remove all atoms of λ_k(r) again, since for them the existence of a mapping into the database is already known, and finally only need to decide whether μ extends to the remaining set of atoms, which, again, is tractable for P.
To formalize this idea of "folding" parts of node components into the set of atoms on the branch to this component, we introduce the notion of an extension core. We then introduce the problem EXT(C) which captures the idea of finding extensions to a given mapping, and characterize its tractable classes. Using this problem, we then introduce refined versions of tractability conditions (a) and (b) based on the notion of the extension core, and show that these improved conditions indeed guarantee tractability for p-EVAL(P).
Towards the definition of the extension core, for a set A ⊆ Const ∪ Var of elements, let Fix_A be the set of atoms Fix_A = {R_a(a) | a ∈ A}, where each R_a is a fresh relation symbol used nowhere else. The extension core of a pair (A, B) of sets of atoms is then extcore(A, B) = core(A ∪ B ∪ Fix_{dom(A)}) \ dom(A), where \ dom(A) denotes the restriction of a set of atoms defined in the Appendix. Said differently, the extension core is constructed by introducing a new relation for every domain element in A, computing the core, and then removing dom(A). That is, the extension core accounts, on the one hand, for the possibility that parts of B might be folded into A (and thus the extension of the homomorphism to these parts is guaranteed) and, on the other hand, for the fact that the mapping on dom(A) is fixed. Removing dom(A) is then possible because the mapping is already provided for these values.
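The construction described above (one fresh unary relation per element of dom(A), then the core, then removing dom(A) via the restriction from the Appendix) can be prototyped directly. The Python sketch below is brute force and meant for toy sizes only; all names are ours, and a renamed symbol R_{i_1=a_{i_1},...} is encoded as the pair (R, ((i_1, a_{i_1}), ...)). The sets `A` and `S` are a hypothetical rendering of λ_k(r) and of the node component S from Example 10, since Fig. 5 is not reproduced here; the second call moves emp(y_1) into A, as in the second part of Example 12.

```python
# Sketch: extcore(A, B) = core(A | B | Fix_dom(A)) restricted by dom(A),
# computed by brute force. Atoms are tuples (relation, arg_1, ..., arg_k).


def dom(atoms):
    return {x for atom in atoms for x in atom[1:]}


def find_hom(source, target):
    """Return one homomorphism source -> target, or None (backtracking)."""
    by_rel = {}
    for atom in target:
        by_rel.setdefault((atom[0], len(atom)), []).append(atom)
    atoms = sorted(source, key=lambda a: len(by_rel.get((a[0], len(a)), [])))

    def extend(i, h):
        if i == len(atoms):
            return h
        for cand in by_rel.get((atoms[i][0], len(atoms[i])), []):
            new = dict(h)
            if all(new.setdefault(a, b) == b for a, b in zip(atoms[i][1:], cand[1:])):
                result = extend(i + 1, new)
                if result is not None:
                    return result
        return None

    return extend(0, {})


def core(atoms):
    """Replace the set by an endomorphic image avoiding some element, while possible."""
    current = set(atoms)
    while True:
        for v in sorted(dom(current), key=str):
            h = find_hom(current, {a for a in current if v not in a[1:]})
            if h is not None:
                current = {(a[0],) + tuple(h[x] for x in a[1:]) for a in current}
                break
        else:  # no element can be folded away: current is a core
            return current


def restrict(atoms, V):
    """S \\ V from the Appendix; the new symbol is encoded as (R, pinned positions)."""
    out = set()
    for rel, *args in atoms:
        pinned = tuple((i, a) for i, a in enumerate(args, start=1) if a in V)
        out.add(((rel, pinned),) + tuple(a for a in args if a not in V))
    return out


def extcore(A, B):
    dA = dom(A)
    fix = {(("Fix", a), a) for a in dA}  # Fix_dom(A): one fresh unary symbol per element
    return restrict(core(set(A) | set(B) | fix), dA)


k = 4
clique = {("boss_of", f"y{i}", f"y{j}") for i in range(1, k + 1) for j in range(i + 1, k + 1)}
A = {("emp", "x1"), ("boss_of", "x1", "x1"), ("works_in", "x1", "d1")}  # lambda_k(r)
S = clique | {("emp", "y1"), ("works_in", "y1", "d1")}  # hypothetical component
print(len(dom(core(S))), len(dom(extcore(A, S))))  # 5 0

A2 = A | {("emp", "y1")}  # emp(y1) moved into lambda(r)
S2 = clique | {("works_in", "y1", "d1")}
print(sorted(dom(extcore(A2, S2))))  # ['y2', 'y3', 'y4']
```

In the first call, the plain core of S keeps all k + 1 elements, whereas its extension core has an empty domain because every y_i folds onto x_1. In the second call, fixing y_1 blocks the folding, and a (k − 1)-clique on y_2, ..., y_k survives.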

The Extension Problem
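One task within the evaluation of wdPTs is to test whether a mapping on some subtree is maximal or can be extended to some child node. We formalize this by the following problem, where C is a class of pairs of sets of atoms.
EXT(C)
Input: A pair (A, B) ∈ C, a database D, and a homomorphism h: A → D.
Question: Can h be extended to a homomorphism that maps B into D?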
The parameterized problem p-EXT(C) is the problem EXT(C) parameterized by the size of (A, B).
The main difference between HOM and EXT is that, in addition to a set of atoms, the input of EXT contains another set of atoms together with a homomorphism that is already guaranteed to map this additional set of atoms into the target. When looking for tractable classes, this additional input has to be taken into account, which is exactly the role of the extension cores defined in the previous section.
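Following the description above (a pair (A, B), a database D, and a homomorphism h that maps A into D), the decision problem can be stated operationally. The Python sketch below is exponential and only fixes the input/output behaviour; it is not the FPT algorithm behind Theorem 3, and the names `extend_hom` and `ext` as well as the toy database are hypothetical.

```python
# Brute-force decision procedure for EXT (exponential; illustration only).
# Atoms and database facts are tuples (relation, arg_1, ..., arg_k).


def extend_hom(source, target, fixed):
    """Return a homomorphism source -> target that agrees with `fixed`, or None."""
    by_rel = {}
    for atom in target:
        by_rel.setdefault((atom[0], len(atom)), []).append(atom)
    atoms = list(source)

    def go(i, h):
        if i == len(atoms):
            return h
        for cand in by_rel.get((atoms[i][0], len(atoms[i])), []):
            new = dict(h)
            if all(new.setdefault(a, b) == b for a, b in zip(atoms[i][1:], cand[1:])):
                result = go(i + 1, new)
                if result is not None:
                    return result
        return None

    return go(0, dict(fixed))


def ext(A, B, D, h):
    """Decide whether h (mapping A into D) extends to a homomorphism mapping B into D."""
    image = {(a[0],) + tuple(h[x] for x in a[1:]) for a in A}
    if not image <= set(D):  # the promise of the problem is violated
        raise ValueError("h does not map A into D")
    return extend_hom(B, D, h) is not None


D = {("works_in", "ann", "sales"), ("works_in", "bob", "hr"), ("leads", "bob", "hr")}
A = {("works_in", "x", "d")}
B = {("leads", "z", "d")}
print(ext(A, B, D, {"x": "ann", "d": "sales"}))  # False: nobody leads sales
print(ext(A, B, D, {"x": "bob", "d": "hr"}))     # True: z -> bob
```

The value of d is fixed by h, so only z is searched for; this is the kind of additional constraint that plain HOM cannot express.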
While we use the extension problem here to define tractable fragments of the evaluation problem of wdPTs with projection, we note that it also allows for an alternative formulation of the tractable classes of projection-free wdPTs defined by Romero [24].
With these definitions settled, we next use extension cores to provide an exact characterization of the tractable classes C of the extension problem EXT(C). To this end, we define the treewidth of extcore(C) to be the maximum of the treewidth of extcore (A, B) for (A, B) ∈ C if this maximum exists and ∞ otherwise.

Theorem 3
Assume that FPT ≠ W[1] and let C be a decidable class of pairs of sets of atoms. Then the following statements are equivalent:
1. The treewidth of extcore(C) is bounded by a constant.
2. The problem EXT(C) is in PTIME.
3. The problem p-EXT(C) is in FPT.
Theorem 3 is shown using a sequence of lemmas given below. First, we state an easy but important observation that we use tacitly throughout this section.

Observation 4
- Extension cores are unique up to isomorphism.
- For all sets A, B of atoms, we have core(extcore(A, B)) = extcore(A, B).
The first lemma in the sequence describes a crucial property of extension cores that will be used several times in the remainder of this section. Both restricting a set of atoms and projecting pairs of sets of atoms are well-known techniques. However, they are interpreted in slightly different ways throughout the literature. To avoid any ambiguity, we therefore provide full formal definitions of both; since these are technical definitions of common notions, they are given in the Appendix.
The following observation is not only an immediate consequence of the above definitions, but also describes the intuition behind projecting a pair of sets of atoms, and it will be used in the proof of Lemma 6. Now the treewidth of EXT is bounded by c, and therefore also the treewidth of EXT (taking subgraphs does not increase the treewidth). As a result, the existence of a homomorphism EXT → F can be decided in polynomial time [7], which proves the lemma.
With this, the upper bounds in Theorem 3 are shown. We now turn to the lower bounds, for which the following property will turn out to be important.

Lemma 7 Let A and B be sets of atoms and let C be the set of atoms core(A ∪ B ∪ Fix_{dom(A)}) from the definition of the extension core. If h is a homomorphism from C to itself, then h is bijective.

Proof Since C is, by definition, a core, any homomorphism from C to itself is an isomorphism.
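The property used in Lemma 7, that every endomorphism of a core is bijective, is easy to check by brute force on small examples. The Python sketch below (helper names are ours, toy sizes only) enumerates all homomorphisms from a set of atoms to itself: for a directed 3-cycle, which is a core, all of them are bijective, whereas a set that can be folded onto a proper subset has a non-bijective endomorphism.

```python
# Enumerate all endomorphisms of a small set of atoms and test bijectivity.
# Atoms are tuples (relation, arg_1, ..., arg_k).


def all_homs(source, target):
    """Yield every homomorphism source -> target (brute-force backtracking)."""
    atoms = list(source)

    def go(i, h):
        if i == len(atoms):
            yield h
            return
        rel, *args = atoms[i]
        for cand in target:
            if cand[0] == rel and len(cand) == len(atoms[i]):
                new = dict(h)
                if all(new.setdefault(a, b) == b for a, b in zip(args, cand[1:])):
                    yield from go(i + 1, new)

    yield from go(0, {})


def endomorphisms_bijective(atoms):
    """True iff every homomorphism atoms -> atoms is a bijection on the domain."""
    elems = {x for atom in atoms for x in atom[1:]}
    # On a finite domain, a map whose image is all of elems is a bijection.
    return all(set(h.values()) == elems for h in all_homs(atoms, atoms))


cycle = {("E", 1, 2), ("E", 2, 3), ("E", 3, 1)}                 # a core
foldable = {("E", "a", "b"), ("E", "b", "a"), ("E", "c", "a")}  # c can be sent to b
print(endomorphisms_bijective(cycle), len(list(all_homs(cycle, cycle))))  # True 3
print(endomorphisms_bijective(foldable))  # False
```

The three endomorphisms of the cycle are its rotations; for `foldable`, mapping c to b (and fixing a and b) is an endomorphism that is not injective.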
The next result establishes the lower bounds in Theorem 3 by means of the characterization of the tractable classes (for both PTIME and FPT) of p-HOM(C) provided by Grohe [12].
Lemma 8 Let C be a decidable class of pairs of sets of atoms and let extcore(C) be the class of extension cores of the pairs in C. Then p-HOM(extcore(C)) ≤_FPT p-EXT(C).
Proof Let C be a decidable class of pairs of sets of atoms, and let (L, T) be an instance of p-HOM(extcore(C)). We reduce this problem to an instance of p-EXT(C).
Borrowing from the notation of databases introduced in Section 2, for an arbitrary set A of atoms and a relation symbol R, throughout this proof we write R A to denote the set of all atoms in A with relation symbol R.
In the first step, we compute some pair (A, B) ∈ C such that extcore(A, B) = L. By assumption, such a pair exists and, because C is decidable, can be computed. Next  ((d 1 , a 1 ), . . . , (d k , a k )) to D. (Observe that by slight abuse of notation, in order to simplify the description we denote the positions in d 1  These are all the tuples in D. It is worth pointing out that in case R is not part of the schema of T or (R ) T is empty, then by this definition also R D is empty. The resulting instance will therefore be a simple "no" instance, because R E is non-empty. However, in this case we also have that (R ) L is nonempty, and therefore also (L, T) is a trivial "no" instance. Finally, we define the mapping h : dom(A) → dom(D) as h(a) = (d, a) for some arbitrary but fixed element d ∈ dom(T). Since D contains one atom R ((d 1 , a 1 ), . . . , (d k , a k )) for every atom R(a 1 , . . . , a k ) ∈ A and every combination of d 1 , . . . , d k ∈ dom(T), clearly h is a homomorphism h : A → D.
It remains to prove that there exists a homomorphism g: L → T if and only if h can be extended to a homomorphism ĥ: E → D.
First assume that g exists. Then define an extension ĥ of h to dom(E) by ĥ(a) = (g(a), a) for all a ∈ dom(E) \ dom(A). The mapping g is indeed defined on all these elements, since dom(E) \ dom(A) = dom(extcore(A, B)) = dom(L) because extcore(A, B) = L. For a ∈ dom(A), we need not define ĥ, since h is already defined on these elements and ĥ extends h. It now follows immediately from the construction of D that ĥ is indeed the required homomorphism.
For the other direction, assume that ĥ exists. First, observe that D projected onto the second component of its domain elements gives E. Let π_2 be the projection to the second coordinate. Then π_2 ∘ ĥ is a homomorphism from E to itself and hence, by Lemma 7, a bijection, i.e., an automorphism of E. Since E is finite, there is an n ∈ ℕ such that (π_2 ∘ ĥ)^n = id (where id denotes the identity mapping). Consequently, we may assume w.l.o.g. that π_2 ∘ ĥ = id: otherwise, we replace ĥ by ĥ ∘ (π_2 ∘ ĥ)^(n−1), which is again a homomorphism E → D and still extends h, because π_2 ∘ ĥ is the identity on dom(A). For every a ∈ dom(L) = dom(extcore(A, B)), define g(a) to be the value d such that ĥ(a) = (d, a). Then, again by the definition of D, it follows immediately that for all relation symbols R and tuples a ∈ R^L, we have g(a) ∈ R^T.
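The step from "π_2 ∘ ĥ is an automorphism of E" to "(π_2 ∘ ĥ)^n = id for some n" only uses that every permutation of a finite set has finite order, namely the least common multiple of its cycle lengths. A small Python sketch of this fact follows; it assumes Python 3.9+ for `math.lcm`, and the dictionary representation of the permutation as well as the helper names are our own choice.

```python
from math import lcm


def order(perm):
    """Least n >= 1 with perm^n = id, for a permutation given as a dict."""
    seen, n = set(), 1
    for start in perm:
        if start in seen:
            continue
        length, x = 0, start
        while x not in seen:  # walk once around the cycle through `start`
            seen.add(x)
            x = perm[x]
            length += 1
        n = lcm(n, length)
    return n


def power(perm, n):
    """perm composed with itself n times (the identity for n = 0)."""
    result = {x: x for x in perm}
    for _ in range(n):
        result = {x: perm[result[x]] for x in perm}
    return result


sigma = {"a": "b", "b": "a", "c": "d", "d": "e", "e": "c"}  # cycles of length 2 and 3
n = order(sigma)
print(n, power(sigma, n) == {x: x for x in sigma})  # 6 True
```

Composing with the (n − 1)-st power is then exactly what turns π_2 ∘ ĥ into the identity in the w.l.o.g. step.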
Observing that all constructions can be done efficiently completes the proof.

Tractability Conditions for Arbitrary wdPTs
With the notion of extension cores, we now have a tool to adapt tractability conditions (a) and (b) so that they also account for the core of sets of atoms and make use of the knowledge that, when looking for extensions, the existence of certain mappings can be assumed. Recall the intuition described at the beginning of Section 6.1 and Example 10. In this situation, the maximality test towards a single node component can easily be expressed as an instance of EXT(C). More precisely, given a wdPT (T, λ, X), a database D, a subtree T' of T, and a node component S ∈ NC(t) for some node t ∈ ch(T'), testing whether some mapping μ can be extended to S is the instance ((λ(T'), S), D, μ) of EXT(C). One way to adapt tractability condition (a) (recall that, intuitively, this is the condition ensuring that the test for maximality is tractable) would thus be to associate with each class P of wdPTs the class C of all relevant pairs (λ(T'), S), and to require EXT(C) to be in FPT. However, using the simple characterization in Theorem 3, we state the refined variant of tractability condition (a) directly in terms of extension cores.
Three comments are in order. First, observe that for simple wdPTs, tractability condition (a') is equivalent to tractability condition (a). Second, note that unlike in the above discussion, condition (a') mentions branch(t') instead of T'. This is because the condition has to be satisfied for all subtrees T' with t ∈ ch(T'). Among these, branch(t') is the minimal one. Thus, for all subtrees T' containing branch(t'), the treewidth of extcore(λ(T'), S) is at most the treewidth of extcore(λ(branch(t')), S). Third, recall that the intuition used above for motivating tractability condition (a') does not match the actual idea implemented in Algorithm 1. In fact, rather than testing maximality for one candidate mapping on a subtree T' after the other, Algorithm 1 merges this test with finding mappings on the existential variables in T', which was a crucial step in the development of the algorithm. We will later describe the changes necessary for the algorithm to exploit the additional information given by the existence of a mapping that maps λ(T') into the database, but first we reconsider tractability condition (b).
Recall that tractability condition (b) ensures that the CQs for finding suitable, maximal mappings on the existential variables are tractable. By Grohe [12], this is the case if and only if the cores of these CQs have bounded treewidth. However, to account for the fact that a mapping for some free variables is provided as part of the input, these variables must be mapped onto themselves when computing the core. This requirement is again naturally expressed in terms of extension cores.
Analogously to tractability condition (a'), for simple wdPTs tractability condition (b') is equivalent to condition (b). However, there exist classes of non-simple wdPTs that satisfy conditions (a') and (b') but not (a) and (b); an example of such a class was described in Example 10. Moreover, the two tractability conditions are independent of each other, as illustrated by the following example.
Example 12 For a class of wdPTs that satisfies condition (a') but not (b'), consider the class containing the wdPTs from Fig. 3 for all k ≥ 1. For any subtree containing the node t_2, condition (b') is not satisfied because of the k-clique of yclique_ij(y_i, y_j) atoms. However, condition (a') is satisfied, since all variables in this clique are interface variables.
For the opposite case, recall the wdPT in Fig. 5, but assume that the atom emp(y_1) is part of λ(r) instead of λ(t_1). Clearly, the core of λ(T) remains unchanged and thus has bounded treewidth. Also, the boss_of atoms still belong to a single node component S. Now, when computing the extension core of (λ(r), S), y_1 is part of λ(r) and thus must be mapped onto itself. As a result, all y_i must be mapped onto themselves. After removing the variables of λ(r), a (k − 1)-clique remains, and thus condition (a') is not satisfied.
Conditions (a') and (b') therefore properly extend the collection of classes of wdPTs described by conditions (a) and (b). Next, we discuss why query evaluation is indeed tractable for these classes.
In fact, Algorithm 1 already describes a correct FPT algorithm even for classes of arbitrary wdPTs satisfying conditions (a') and The reason for this is that replacing S by extcore(λ(branch(t )), S) implies that the mapping h that shall be extended to S is (or can be extended to) a homomorphism branch(t ) → D. Obviously, in the example this is not the case for any mapping that maps y 2 to either 0 or 1.
As we will show below, this difference has no effect on the output of Algorithm 1, since the additional values in stop(S, D) according to the original definition are never involved in any solution anyway.
By combining all of this, we obtain the following result as a proper extension of Lemma 1.

Proof First of all, observe that under these conditions Algorithm 1 still runs in FPT time: with the exception of evaluating the query q (line 7) and computing the set stop(S, D) (line 6), all the arguments from Section 4 still apply.
For computing stop(S, D), first observe that since the component interface atoms contain unique relation symbols and no variables from fvar(T'), they occur unchanged in the core. Thus, the component width of P is bounded by some constant, and therefore there are at most polynomially many candidate mappings to be included in stop(S, D). Furthermore, testing each of these mappings is in FPT. To see this, recall that the test amounts to deciding the instance ((∅, extcore(λ(branch(t')), S)), h, D) of EXT(C). By Theorem 3, this is in FPT if extcore(∅, extcore(λ(branch(t')), S)) = extcore(λ(branch(t')), S) has bounded treewidth, which is guaranteed by tractability condition (a').
It thus remains to show the correctness of the algorithm, which follows by the same arguments as in Section 4; the only point that needs to be shown is that the presented computation of stop(S, D) is correct. First, it follows immediately that stop(S, D) as defined via the extension core is a subset of stop(S, D) as originally defined in Section 4. The converse inclusion does not hold in general, as shown above; instead, we show that the additional mappings in stop(S, D) according to the original definition have no effect on the result of Algorithm 1.
Towards this, first assume that for a mapping ν ∈ stop(S, D) according to the definition in Section 4, there exists an extension ν': λ(branch(t)) → D. Then, since all variables shared between S and λ(branch(t)) occur in I(S, t), ν can be extended to a homomorphism S → D if and only if ((λ(branch(t)), S), ν', D) is a positive instance of EXT(C). That is, for such ν we still have ν ∈ stop(S, D) under the new definition, and thus the test is correct. Next, assume that no such extension ν' exists. In this case, in line 7 of the algorithm, q(D') = q(D' \ {R_i(ν(v_i))}), since λ(branch(t)) is contained in the body of the query, and thus ν cannot be part of any solution mapping. Hence, ν ∉ stop(S, D) still yields a correct result.

Relationship with SPARQL and Conclusion
Our results give a fine-grained understanding of the tractable classes of wdPTs in the presence of projection. In particular, they identify the different sources of hardness. As laid out in the introduction, there is a strong relationship between well-designed SPARQL queries and wdPTs: for every well-designed SPARQL query, an equivalent well-designed pattern tree can be computed in polynomial time, and vice versa, in a purely syntactic way.
Note that our characterization of tractable classes in Theorem 2 unfortunately cannot be translated directly to well-designed SPARQL queries, because it only applies to classes of simple well-designed pattern trees. However, in the relational model, RDF triples and SPARQL triple patterns are usually represented with a single (ternary) relation. Thus, there is no direct translation to and from simple (well-designed) pattern trees. As a consequence, our result does not imply an immediate characterization of the tractable classes of well-designed {AND, OPTIONAL}-SPARQL queries.
Nevertheless, our results also give interesting insights into SPARQL with projection. First, Algorithm 1 directly applies to queries in which relation symbols appear several times, and thus in particular to well-designed pattern trees resulting from the translation of well-designed SPARQL queries. Moreover, our result completely determines the tractable classes that can be characterized by analyzing only the underlying graph structure of the queries, i.e., the Gaifman graph. Indeed, since simple queries can simulate all other queries sharing the same Gaifman graph by duplicating relations, Gaifman-graph-based techniques have exactly the same limits as simple queries. Thus, our work gives significant information on the limits of tractability for SPARQL queries, in the same way as, e.g., Grohe et al. [13], Chen [4], and Chen and Dalmau [5] did in similar contexts.
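The simulation by duplicating relations can be illustrated on plain CQs (without optional parts). The Python sketch below, with hypothetical helper names and brute-force evaluation, gives every atom its own relation symbol, copies the corresponding relation of the database into it, and checks that both the Gaifman graph and the answers are unchanged.

```python
# Making a CQ simple by duplicating relations (illustration for plain CQs).
# Query atoms and database facts are tuples (relation, arg_1, ..., arg_k).


def gaifman_edges(atoms):
    """Edges {a, b} for distinct elements a, b occurring together in some atom."""
    return {frozenset((a, b)) for _, *args in atoms for a in args for b in args if a != b}


def all_homs(query, db):
    """Yield every mapping that sends all query atoms into db (brute force)."""
    def go(i, h):
        if i == len(query):
            yield h
            return
        rel, *args = query[i]
        for fact in db:
            if fact[0] == rel and len(fact) == len(args) + 1:
                new = dict(h)
                if all(new.setdefault(a, b) == b for a, b in zip(args, fact[1:])):
                    yield from go(i + 1, new)

    yield from go(0, {})


def answers(query, db, free):
    return {tuple(h[x] for x in free) for h in all_homs(query, db)}


def make_simple(query, db):
    """Give the i-th atom a fresh symbol R#i and copy the facts of R into R#i."""
    simple_query, simple_db = [], set()
    for i, (rel, *args) in enumerate(query):
        fresh = f"{rel}#{i}"
        simple_query.append((fresh, *args))
        simple_db |= {(fresh, *fact[1:]) for fact in db
                      if fact[0] == rel and len(fact) == len(args) + 1}
    return simple_query, simple_db


q = [("E", "x", "y"), ("E", "y", "z")]  # non-simple: E is used twice
db = {("E", 1, 2), ("E", 2, 3), ("E", 3, 1)}
sq, sdb = make_simple(q, db)  # atoms E#0 and E#1, each backed by a copy of E
print(sorted(answers(q, db, ("x", "z"))))                           # [(1, 3), (2, 1), (3, 2)]
print(answers(sq, sdb, ("x", "z")) == answers(q, db, ("x", "z")))  # True
print(gaifman_edges(sq) == gaifman_edges(q))                       # True
```

The database grows by at most a factor equal to the number of query atoms, which is why hardness for simple queries carries over to any technique that only looks at the Gaifman graph.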
As we have seen, by incorporating cores, we can also describe larger tractable classes in the non-simple case, and thus in particular for well-designed pattern trees resulting from the translation of well-designed SPARQL queries. However, in this case we do not obtain a dichotomy result.
Let us mention one major stumbling block towards a characterization of non-simple well-designed pattern trees with projection: in the proof of Lemma 3, we used a reduction from quantified conjunctive queries. Unfortunately, the tractable classes of the non-simple fragment of that problem are not well understood, which limits our result to simple queries, since we rely on the corresponding results by Chen and Dalmau [5]. Note that we might have been able to give a more fine-grained result in sorted logics by using the work of Chen and Marx [6], but since this would, in our opinion, not have been very natural in our setting, we did not pursue this direction. Thus, a better understanding of non-simple pattern trees would require either progress on quantified conjunctive queries or a reduction from another problem that is better understood.
One prominent operator of SPARQL that we did not consider is UNION, whose counterpart for pattern trees are sets of pattern trees, so-called pattern forests. While the extension to simple pattern forests is immediate (since no two trees share any relation symbols), it is not clear how to handle, in combination with projection, the possible repetition of relation symbols across different trees in forests of simple pattern trees.
Finally, another interesting class of queries are weakly well-designed pattern trees. While the tractability conditions can be easily adapted to provide FPT algorithms for these queries, providing a characterization of the tractable classes is much harder due to the fact that relevant nodes need not have a descendant introducing a "new" free variable.

Funding Information Open access funding provided by TU Wien (TUW).
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Appendix: Additional Definitions
In Sections 2 and 6, we introduced the notions of restricting a set S of atoms by a subset V ⊆ dom(S) and of projecting a pair of sets of atoms under some mapping. As mentioned in Section 6, we provide full formal definitions of these notions below to avoid any ambiguities.

Restricting a Set of Atoms (S \ V)
A complete definition was already given in Section 2. In the following, we clarify the names of the newly introduced relation symbols. For a set S of atoms over some relational schema σ and V ⊆ dom(S), S \ V is the following set of atoms over the relational schema σ', which is defined implicitly by the newly introduced atoms described below. For every atom R(a_1, ..., a_k) ∈ S, let o_1, ..., o_ℓ be those positions of (a_1, ..., a_k) that do not contain values from V, i.e., a_{o_j} ∉ V for 1 ≤ j ≤ ℓ, and let {i_1, ..., i_m} = {1, ..., k} \ {o_1, ..., o_ℓ} be those positions such that a_{i_j} ∈ V for 1 ≤ j ≤ m. Then the set S \ V contains the atom R_{i_1=a_{i_1},...,i_m=a_{i_m}}(a_{o_1}, ..., a_{o_ℓ}). These are all the atoms in S \ V.
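The renaming can be sketched in Python as follows. Relation names are plain strings, and the new symbol R_{i_1=a_{i_1},...,i_m=a_{i_m}} is encoded as a string of exactly that shape; this assumes, for illustration only, that such strings do not clash with existing relation names. The function name `restrict` is hypothetical.

```python
def restrict(atoms, V):
    """Compute S \\ V: drop positions holding elements of V and record them in the name."""
    result = set()
    for rel, *args in atoms:
        pinned = [(i, a) for i, a in enumerate(args, start=1) if a in V]
        kept = tuple(a for a in args if a not in V)  # positions o_1, ..., o_l in order
        name = rel
        if pinned:  # R becomes R_{i1=a_i1,...,im=a_im}
            name = rel + "_{" + ",".join(f"{i}={a}" for i, a in pinned) + "}"
        result.add((name,) + kept)
    return result


S = {("R", "a", "b", "c"), ("S", "b", "b"), ("T", "c")}
print(sorted(restrict(S, {"b"})))
# [('R_{2=b}', 'a', 'c'), ('S_{1=b,2=b}',), ('T', 'c')]
```

An atom whose arguments all lie in V becomes a nullary atom, and atoms avoiding V keep their original relation symbol.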

Projecting a Pair of Sets of Atoms Under a Mapping
Let (Q, D) be a pair of sets of atoms over the same relational schema σ and let h be a mapping on (a subset of) dom(Q). We define the projection of (Q, D) under h as the pair (Q', D') of sets of atoms over the schema σ' as follows (the relational schema σ' is defined implicitly).
We start by defining Q'. For every relation symbol R ∈ σ and every atom R(a_1, ..., a_k) ∈ Q, we distinguish two cases.
We need to take care of one special case: assume that, for a pair (Q, D) and a homomorphism h such that Q is already a projection of some set L of atoms under a set V ⊆ dom(L), we want to obtain the projection of (Q, D) under h. That is, the schema of Q already contains relation symbols of the form R_{i_1=b_1,...,i_ℓ=b_ℓ}. If b_j ∈ dom(h) for some j with 1 ≤ j ≤ ℓ, then in the resulting schema we replace b_j in the name of the resulting atom by h(b_j). In certain situations, this renaming of the relation symbols ensures that the resulting sets of atoms are over the same relational schema, which is a prerequisite for finding homomorphisms.