All-Instances Restricted Chase Termination for Linear TGDs

The chase procedure is a fundamental algorithmic tool in database theory with a variety of applications. A key problem concerning the chase procedure is all-instances chase termination: for a given set of tuple-generating dependencies (TGDs), is it the case that the chase terminates for every input database? In view of the fact that this problem is, in general, undecidable, it is natural to ask whether well-behaved classes of TGDs, introduced in different contexts, ensure decidability. It has been recently shown that the problem is decidable for the restricted (a.k.a. standard) version of the chase, and linear TGDs, a prominent class of TGDs that has been introduced in the context of ontological query answering, under the assumption that only one atom appears in TGD-heads. We provide an alternative proof for this result based on Monadic Second-Order Logic, which we believe is simpler than the ones obtained from the literature.


Introduction
The chase procedure (or simply chase) is a fundamental algorithmic tool that has been successfully applied to several database problems such as computing data exchange solutions [14], query answering under constraints [9], containment of queries under constraints [1], and checking logical implication of constraints [5,22], to name a few. It accepts as input a database D and a set T of constraints (which, for this work, are tuple-generating dependencies (TGDs) of the form ∀x̄∀ȳ(φ(x̄,ȳ) → ∃z̄ ψ(x̄,z̄)), with φ and ψ being conjunctions of atoms) and, if it terminates, its result is a finite instance D_T that is a universal model of D and T, i.e., a model that can be homomorphically mapped into every other model of D and T. This is the reason for the ubiquity of the chase in database theory. Indeed, many key database problems can be solved by simply exhibiting a universal model. And this is not only in theory. Despite the fact that the instance constructed by the chase can be very large, efficient implementations of the chase procedure have been successfully applied during the last few years in many different contexts [6,20,25,26].
Given a database D and a set T of TGDs, roughly speaking, the chase adds new atoms to D (possibly involving null values that act as witnesses for the existentially quantified variables) until the final result satisfies T. Here is a simple example of how the chase procedure works.
Example 1 Given the database D = {R(c)}, and the TGDs
R(x) → ∃y P(x, y)
P(x, y) → R(y),
the database atom triggers the first TGD, and the chase adds to D the atom P(c, ⊥_1), which in turn triggers the second TGD and R(⊥_1) is added, where ⊥_1 is a labeled null representing some unknown value.
However, the atom R(⊥ 1 ) triggers again the first TGD, and the chase adds the atom P(⊥ 1 , ⊥ 2 ) , which triggers again the second TGD.

The Challenge of Non-termination
As said above, there are nowadays efficient implementations of the chase that allow us to solve central database problems by adopting a materialization-based approach [6,20,25,26]. But, of course, for this to be feasible in practice we need a guarantee that the chase terminates, which, as shown by Example 1, is not always the case. This fact motivated a long line of research on identifying fragments of TGDs that ensure the termination of the chase procedure for every input database. A prime example is the class of weakly-acyclic TGDs [14], the standard language for data exchange purposes, that guarantees the termination of the semi-oblivious and restricted (a.k.a. standard) chase. A similar formalism, called constraints with stratified-witness, has been proposed in [13]. Inspired by weak-acyclicity, the notion of rich-acyclicity has been proposed in [19], which guarantees the termination of the oblivious chase. Many other sufficient conditions can be found in the literature; see, for example, [12,13,17,18,23,24]. At this point, let us note that the restricted chase applies a TGD only if it is necessary, i.e., only if the TGD is violated, while the (semi-)oblivious chase applies TGDs whenever the body is satisfied, without checking whether the head is satisfied.
With so much effort spent on identifying sufficient conditions for the termination of the chase procedure, the question that comes up is whether a sufficient condition that is also necessary exists. In other words, given a set T of TGDs, is it possible to decide whether, for every database D, the chase on D and T terminates? This has been studied in [15], where it has been shown that the answer is negative, no matter which version of the chase we consider, namely the oblivious, semi-oblivious or restricted chase.
The undecidability proof in [15] relies on a sophisticated set of TGDs that goes beyond existing well-behaved classes of TGDs that enjoy certain syntactic properties, which in turn ensure useful model-theoretic properties. Such well-behaved classes of TGDs have been proposed in the context of ontological reasoning. The two main paradigms that led to robust TGD-based languages are guardedness [2,9,10] and stickiness [11]. A TGD is guarded if the left-hand side of the implication, known as the body of the TGD, has an atom that contains (or "guards") all the universally quantified variables. If a TGD has only one body-atom, which is trivially a guard, then it is called linear; the class of linear TGDs is actually the main concern of the present work. On the other hand, sticky sets of TGDs are inherently unguarded. The key idea underlying stickiness can be described as follows: variables that appear more than once in the body of a TGD should be inductively propagated (or "stick") to every atom in the right-hand side (the head) of the TGD. Observe that the set of TGDs given in Example 1 is both guarded (actually linear) and sticky; notice that stickiness holds trivially since every body-variable occurs only once.
The fact that the set of TGDs given in the undecidability proof of [15] is far from being guarded (and therefore linear) or sticky raised the following question: is the chase termination problem, as described above, decidable for linear, guarded or sticky sets of TGDs? This question is rather well-understood for the (semi-)oblivious chase. In the case of linear TGDs, the problem is PSPACE-complete, and becomes 2EXPTIME-complete for guarded TGDs [7]. The sticky case has been recently addressed in [8], where it is shown that the problem is PSPACE-complete. On the other hand, when it comes to the more subtle case of the restricted chase, the problem has been studied only for single-head TGDs, i.e., TGDs with only one atom in the head, while the general case remains open. It has been recently shown that the problem is decidable for single-head guarded (and hence linear) and sticky TGDs [16]. The same result for single-head linear TGDs has been independently shown in [21].

Our Main Objective
In this work, we concentrate on single-head linear TGDs, and provide an alternative proof for the decidability of the restricted chase termination problem, which we believe is simpler than the ones obtained from the literature [16,21]. More precisely, we focus on the following problem: given a set T of single-head linear TGDs, is it the case that for every database D, every restricted chase derivation of D w.r.t. T is finite? Note that, in general, it might be the case that some derivations are finite and some others are not, depending on the order that the TGDs are being triggered, which is not the case for the (semi-)oblivious chase. The reason for this nondeterministic behavior is the fact that, as explained above, the restricted chase applies a TGD only if it is necessary, whereas the (semi-)oblivious chase applies TGDs whenever the body is satisfied, without checking whether the head is satisfied, which ensures a deterministic behavior.
As mentioned above, the decidability of this problem has been recently shown independently in [16,21]. In fact, [16] shows that the problem is decidable for the class of single-head guarded TGDs, which generalizes single-head linear TGDs. This is done via a reduction to the satisfiability problem of Monadic Second-Order Logic (MSOL) over infinite trees of bounded degree. On the other hand, [21] concentrates on the class of single-head linear TGDs, and the decidability of the restricted chase termination problem is shown by relying on derivation trees, a notion that was originally introduced in the context of ontological query answering [3]. Let us also say that the proof given in [16] for single-head sticky TGDs, which is via a reduction to the emptiness problem of deterministic Büchi automata, can be converted into a proof for single-head linear TGDs.
Although several different proofs for the decidability of the restricted chase termination problem for single-head linear TGDs can be obtained from the literature, we strongly believe that a proof based on MSOL that directly exploits the linearity of the TGDs is the natural way to go. This will provide a neat solution to the problem in question via standard means, which is simpler than the existing ones. The main objective of this work is to provide such a proof.

Preliminaries
We consider the disjoint countably infinite sets C, N, and V of constants, (labeled) nulls, and variables, respectively. We refer to constants, nulls and variables as terms. For n > 0, we may write [n] for the set {1, …, n}.
Relational Databases. A schema S is a finite set of relation symbols (or predicates) with associated arity. We write R/n to denote that R has arity n > 0; we may also write ar(R) for n. A position of S is a pair (R, i), where R/n ∈ S and i ∈ [n], that identifies the i-th argument of R. An atom over S is an expression of the form R(t̄), where R/n ∈ S and t̄ is an n-tuple of terms. A fact is an atom whose arguments consist only of constants. We write R(t̄)[i] for the term of R(t̄) at position (R, i), i.e., the i-th element of t̄. An instance over S is a (possibly infinite) set of atoms over S that contain constants and nulls, while a database over S is a finite set of facts over S. The active domain of an instance I, denoted dom(I), is the set of all terms in I.
Substitutions and Homomorphisms. A substitution from a set of terms T to a set of terms T′ is a function h : T → T′ defined as follows: ∅ is a substitution, and if h is a substitution, then h ∪ {t ↦ t′}, where t ∈ T and t′ ∈ T′, is a substitution. A homomorphism from a set of atoms A to a set of atoms B is a substitution h from the terms occurring in A to the terms occurring in B such that (i) h is the identity on constants, and (ii) R(t_1, …, t_n) ∈ A implies R(h(t_1), …, h(t_n)) ∈ B.
Single-Head Tuple-Generating Dependencies. A single-head tuple-generating dependency σ is a constant-free first-order sentence of the form
∀x̄∀ȳ(φ(x̄,ȳ) → ∃z̄ R(x̄,z̄)),
where x̄, ȳ, z̄ are tuples of variables of V, φ(x̄,ȳ) is a conjunction of atoms, and R(x̄,z̄) is a single atom. For brevity, we write σ as φ(x̄,ȳ) → ∃z̄ R(x̄,z̄), and use comma instead of ∧ for joining atoms. We refer to φ(x̄,ȳ) and R(x̄,z̄) as the body and head of σ, denoted body(σ) and head(σ), respectively. Henceforth, we simply say tuple-generating dependency (TGD) instead of single-head TGD. The frontier of the TGD σ, denoted fr(σ), is the set of variables x̄, i.e., the variables that appear both in the body and in the head of σ. Note that, by abuse of notation, we sometimes treat a tuple of variables as a set of variables. The schema of a set T of TGDs, denoted sch(T), is the set of predicates occurring in T, and we write ar(T) for the maximum arity over all those predicates.
An instance I satisfies a TGD σ, written I ⊧ σ, if the following holds: whenever there is a homomorphism h such that h(body(σ)) ⊆ I, then there exists an extension h′ of the restriction of h to fr(σ) such that h′(head(σ)) ∈ I. By abuse of notation, we may treat a conjunction of atoms as a set. The instance I satisfies a set T of TGDs, written I ⊧ T, if I ⊧ σ for each σ ∈ T.
Linearity. A TGD σ is called linear if body(σ) consists of a single atom [10]. The class of linear TGDs, denoted 𝕃, is the family of all possible finite sets of linear TGDs.
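To make the notions of homomorphism, frontier, satisfaction and linearity concrete, the following is a minimal Python sketch. The representation is an assumption made only for illustration: an atom is a pair (predicate, tuple of terms), a TGD is a pair (list of body atoms, head atom) whose arguments are all variables, and the names `extend`, `homomorphisms`, `frontier`, `satisfies` and `is_linear` are hypothetical helpers, not notation from the paper.

```python
def extend(h, pattern, atom):
    """Try to extend the mapping h so that it sends `pattern` onto `atom`."""
    (p, args), (q, terms) = pattern, atom
    if p != q or len(args) != len(terms):
        return None
    g = dict(h)
    for var, term in zip(args, terms):
        if g.setdefault(var, term) != term:
            return None  # the same variable would need two different images
    return g


def homomorphisms(atoms, instance, h=None):
    """Enumerate all mappings that send every atom of `atoms` into `instance`."""
    h = {} if h is None else h
    if not atoms:
        yield h
        return
    for atom in instance:
        g = extend(h, atoms[0], atom)
        if g is not None:
            yield from homomorphisms(atoms[1:], instance, g)


def frontier(body, head):
    """Variables occurring both in the body and in the head."""
    return {v for _, args in body for v in args} & set(head[1])


def satisfies(instance, tgd):
    """Check I |= sigma for a single-head TGD sigma = (body, head)."""
    body, head = tgd
    fr = frontier(body, head)
    for h in homomorphisms(body, instance):
        restricted = {v: h[v] for v in fr}
        # The head must be matched by some extension of h restricted to the frontier.
        if not any(True for _ in homomorphisms([head], instance, restricted)):
            return False
    return True


def is_linear(tgds):
    """A set of TGDs is linear if every body consists of a single atom."""
    return all(len(body) == 1 for body, _ in tgds)
```

For instance, with the TGD R(x) → ∃y P(x, y), the instance {R(c), P(c, n1)} satisfies the TGD, whereas {R(c)} does not.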

The Restricted Chase Procedure
The chase procedure accepts as input a database D and a set T of TGDs, and constructs an instance that contains D and satisfies T . Central notions in this context are the notion of trigger, and the notion of trigger application.

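Definition 1 (Chase Trigger) A trigger for a set T of TGDs on an instance I is a pair (σ, h), where σ ∈ T and h is a homomorphism from body(σ) to I. The result of (σ, h), denoted result(σ, h), is the atom v(head(σ)), where v is a mapping from the variables of head(σ) to C ∪ N defined as v(x) = h(x) if x ∈ fr(σ), and v(x) = ⊥^x_{σ,h} otherwise. The trigger (σ, h) is active if there is no extension h′ of the restriction of h to fr(σ) such that h′(head(σ)) ∈ I. An application of (σ, h) to I returns the instance J = I ∪ {result(σ, h)}, and such an application is denoted as I⟨σ, h⟩J. ◻
In the definition of result(σ, h), each existentially quantified variable x occurring in head(σ) is mapped by v to a "fresh" null value of N whose name is uniquely determined by the trigger (σ, h) and x itself. Thus, given a trigger (σ, h), we can unambiguously write down the atom result(σ, h). The main idea of the restricted chase is, starting from a database D, to apply active triggers for the given set T of TGDs on the instance constructed so far, and keep doing this until a fixpoint is reached. This is formalized as follows. Consider a database D and a set T of TGDs. We distinguish the two cases where the chase is terminating or not:
- A finite sequence (I_i)_{0≤i≤n} of instances, with D = I_0 and n ≥ 0, is a restricted chase derivation of D w.r.t. T if, for each 0 ≤ i < n, there is an active trigger (σ, h) for T on I_i with I_i⟨σ, h⟩I_{i+1}, and there is no active trigger for T on I_n.
- An infinite sequence (I_i)_{i≥0} of instances, with D = I_0, is a restricted chase derivation of D w.r.t. T if, for each i ≥ 0, there is an active trigger (σ, h) for T on I_i with I_i⟨σ, h⟩I_{i+1}. Such a derivation is fair if, for each i ≥ 0 and each active trigger (σ, h) for T on I_i, there is j > i such that (σ, h) is not an active trigger for T on I_j.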
In a fair chase derivation all the active triggers will eventually be deactivated, which is not true for unfair ones.
A restricted chase derivation is called valid if it is finite, or infinite and fair. Infinite but unfair restricted chase derivations are not valid since they do not serve the main purpose of the chase procedure, i.e., build an instance that satisfies the given set of TGDs. Since we deal only with the restricted chase, in the rest of the paper we may simply say chase derivation meaning restricted chase derivation.
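The restricted chase for linear single-head TGDs can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a TGD is a pair (body atom, head atom) with variable arguments, an atom is a pair (predicate, tuple of terms), nulls are named by a global counter rather than by the trigger, triggers are picked in first-found order (one particular derivation among many), and `max_steps` is an artificial budget since the procedure need not terminate. The names `match`, `is_active` and `restricted_chase` are hypothetical.

```python
from itertools import count


def match(pattern, atom):
    """Return the homomorphism (as a dict) from a pattern atom to an atom, or None."""
    (p, args), (q, terms) = pattern, atom
    if p != q or len(args) != len(terms):
        return None
    h = {}
    for var, term in zip(args, terms):
        if h.setdefault(var, term) != term:
            return None
    return h


def is_active(tgd, h, instance):
    """A trigger is active if no atom of the instance already witnesses the head."""
    body, head = tgd
    frontier = set(body[1]) & set(head[1])
    for atom in instance:
        g = match(head, atom)
        if g is not None and all(g[x] == h[x] for x in frontier):
            return False
    return True


def restricted_chase(database, tgds, max_steps):
    """Apply active triggers one at a time.

    Returns (instance, True) if a fixpoint is reached within the budget,
    and (instance, False) if the step budget is exhausted first.
    """
    instance = list(database)
    fresh = count(1)
    for _ in range(max_steps):
        trigger = next(
            ((tgd, h) for atom in instance for tgd in tgds
             for h in [match(tgd[0], atom)]
             if h is not None and is_active(tgd, h, instance)),
            None)
        if trigger is None:
            return instance, True
        (body, head), h = trigger
        for var in head[1]:
            if var not in h:  # existentially quantified variable: invent a null
                h[var] = f"_n{next(fresh)}"
        instance.append((head[0], tuple(h[var] for var in head[1])))
    return instance, False
```

On the database {R(c)} with the TGDs R(x) → ∃y P(x, y) and P(x, y) → R(y), the budget is always exhausted, whereas with the single TGD R(x) → ∃y R(y) the database is already a fixpoint, because the trigger on R(c) is not active.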

Chase Termination Problem
It is well-known that, due to the existentially quantified variables, a valid chase derivation may be infinite. This is true even for very simple settings: it is easy to verify that the only valid chase derivation of the database D w.r.t. the set of TGDs given in Example 1 is infinite. The key question is, given a set T of TGDs, can we check whether, for every database D, every valid chase derivation of D w.r.t. T is finite? Before formalizing this problem, let us recall a central class of TGDs:
ℂ𝕋^res_∀∀ = {T | for every database D, every valid restricted chase derivation of D w.r.t. T is finite}.
The superscript res in ℂ𝕋^res_∀∀ indicates that we concentrate on restricted chase derivations, while the subscript ∀∀ indicates that we consider every database, and every valid chase derivation. The main problem tackled in this work is defined as follows, where ℂ is a class of TGDs:
PROBLEM: CT^res_∀∀(ℂ)
INPUT: A set T ∈ ℂ of TGDs.
QUESTION: Is it the case that T ∈ ℂ𝕋^res_∀∀?
The above decision problem is, in general, undecidable. In fact, assuming that 𝕋𝔾𝔻 is the class of arbitrary (single-head) TGDs, we have the following undecidability result:
Theorem 1 CT^res_∀∀(𝕋𝔾𝔻) is undecidable.
Note that the undecidability of CT^res_∀∀(𝕋𝔾𝔻) has been originally shown in [15] for schemas with binary and ternary predicates. The undecidability for schemas with binary predicates has been recently shown in [4] by adapting the proof of [15]. On the other hand, when it comes to the class of linear TGDs, we know that the above problem is decidable:
Theorem 2 CT^res_∀∀(𝕃) is decidable.
The above result has been recently shown independently in [16,21]. In fact, [16] shows that the problem is decidable for the class of guarded TGDs, which generalizes linear TGDs. This is done via a reduction to the satisfiability problem of Monadic Second-Order Logic (MSOL) over infinite trees of bounded degree. On the other hand, [21] concentrates on the class of linear TGDs, and the decidability of the chase termination problem is shown by relying on derivation trees, a notion that was originally introduced in the context of ontological query answering [3].
The goal of the present work is to provide an alternative proof for Theorem 2 that relies on standard means, and is simpler than the ones obtained from [16,21]. This is done by exploiting MSOL.

Dealing With Fairness
As one might expect, to prove the decidability of CT^res_∀∀(𝕃), we focus on its complement and show that, for a set T of linear TGDs, we can decide whether there exists a database D such that there is a fair infinite chase derivation of D w.r.t. T. This is precisely how Theorem 2 is shown in [16,21]. However, as already observed in [16,21], the difficulty is to ensure fairness. Interestingly, we know the following:
Theorem 3 Consider a database D and a set T of single-head TGDs. If there exists an infinite chase derivation of D w.r.t. T, then there exists a fair infinite chase derivation of D w.r.t. T.
The above has been independently shown in [16,21]. Notice, however, that the proof of [21] applies only to linear TGDs, while [16] shows, via a more sophisticated proof, that the above holds for arbitrary single-head (not necessarily linear) TGDs. At this point, let us stress that Theorem 3 does not hold for TGDs that can have a conjunction of atoms in the head. This is illustrated by the following example:

Example 2 Consider the set T of TGDs consisting of
R(x, y, y) → ∃z R(x, z, y), R(z, y, y)
R(x, y, z) → R(z, z, z).
There is an infinite restricted chase derivation of {R(a, b, b)} w.r.t. T; apply only the first TGD. However, every valid chase derivation of {R(a, b, b)} w.r.t. T is finite. Indeed, once the second TGD adds the atom R(b, b, b), every trigger that involves the first TGD and maps y to b is no longer active, since z can be mapped to b. ◻
The above discussion reveals the subtlety of the restricted chase, and explains why Theorem 2 is stated only for single-head TGDs. The decidability status of CT^res_∀∀(𝕃^∧), where 𝕃^∧ is the class of arbitrary linear TGDs, where the head can be a conjunction of atoms, remains an open problem.
From Theorem 3, we get the following useful corollary:
Corollary 1 Let T ∈ 𝕃. The following are equivalent:
1. T ∉ ℂ𝕋^res_∀∀.
2. There is a database D such that there exists an infinite chase derivation of D w.r.t. T.
Therefore, the complement of CT^res_∀∀(𝕃) boils down to the problem of checking whether there is a database D such that there exists an infinite chase derivation of D w.r.t. the given set T ∈ 𝕃, without having to ensure that it is fair.

Plan of Attack
Our proof for Theorem 2 consists of two main steps:
1. We first establish, by relying on Corollary 1, that for a set T ∈ 𝕃, T ∉ ℂ𝕋^res_∀∀ iff there exists a so-called chase path for T, which essentially encodes a path-like infinite chase derivation of a singleton database w.r.t. T.
2. We then show that chase paths are MSOL-definable, i.e., we can devise an MSOL sentence Φ_T over infinite paths that is satisfiable iff a chase path for T exists.
The rest of the paper is devoted to giving further details concerning the above two steps.

Non-termination via Chase Paths
We start by introducing the notion of chase path. Given a trigger (σ, h) for T on some instance I, and an atom α, we say that α stops (σ, h) if there exists a homomorphism h′ that extends the restriction of h to fr(σ) such that h′(head(σ)) = α.

Roughly speaking, the fact that α stops (σ, h) means that, in the presence of the atom α, the atom result(σ, h) is superfluous, in the sense that the trigger (σ, h) for T on an instance that contains α is not active due to the presence of α. This is summarized in the following fact, which is easy to verify:
Fact 1 Let T ∈ 𝕃, and (σ, h) be a trigger for T on some instance I over sch(T). The following are equivalent:
1. (σ, h) is active.
2. There is no atom α ∈ I such that α stops (σ, h).
We are now ready to introduce the notion of chase path.
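Definition 2 (Chase Path) Let T ∈ 𝕃. A chase path for T is an infinite sequence (α_i)_{i≥0} of atoms over sch(T), which contain constants and nulls, such that:
1. α_0 is a fact, i.e., it contains only constants.
2. For each i > 0, there exists a trigger (σ, h) for T on {α_{i−1}} such that (a) α_i = result(σ, h), and (b) there is no 0 ≤ j < i such that α_j stops (σ, h). ◻

Our goal is to show Lemma 1 given below. But first, we need to recall a useful notion known as the chase relation [11], which essentially describes how the atoms generated during the chase depend on each other. Consider a chase derivation δ = (I_i)_{i≥0} of a database D w.r.t. a set T of TGDs such that, for i ≥ 0, I_i⟨σ_i, h_i⟩I_{i+1}. The chase relation of δ, denoted ≺_δ, is the binary relation over ⋃_{i≥0} I_i such that α ≺_δ β iff there exists i ≥ 0 with α ∈ h_i(body(σ_i)) and β = result(σ_i, h_i).
Lemma 1 Let T ∈ 𝕃. The following are equivalent:
1. T ∉ ℂ𝕋^res_∀∀.
2. There exists a chase path for T.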
Proof By Corollary 1, it suffices to show that the following statements are equivalent: 1. There exists a database D such that there is an infinite chase derivation of D w.r.t. T . 2. There exists a chase path for T .
We first show that (1) ⇒ (2). Observe that it suffices to show that (1) implies that there is a fact α such that there is an infinite chase derivation δ = (I_i)_{i≥0} of {α} w.r.t. T that enjoys the following property:
(⋆) there are atoms α_0, α_1, α_2, …, with α_0 = α, such that, for each i ≥ 0, I_i = {α_0, …, α_i} and α_i ≺_δ α_{i+1}.
Indeed, in this case, for each i > 0, there is a trigger (σ, h) for T on {α_{i−1}} such that (a) α_i = result(σ, h), and (b) (σ, h) is an active trigger for T on {α_0, …, α_{i−1}}, and thus, by Fact 1, there is no 0 ≤ j < i such that α_j stops (σ, h). Therefore, (α_i)_{i≥0} is a chase path for T, which in turn implies that (1) ⇒ (2), as needed. It remains to show that (1) implies the existence of δ as above.
By hypothesis, there is an infinite chase derivation δ = (I_i)_{i≥0} of D w.r.t. T. Since the TGDs of T are linear, that is, they have only one atom in the body, it is easy to see that the chase relation ≺_δ is essentially a forest with its roots being the atoms of D, i.e., for each γ ∈ ⋃_{i≥0} I_i, α ≺_δ γ and α′ ≺_δ γ implies α = α′. This allows us, for each α ∈ D, to extract from δ a (finite or infinite) chase derivation δ′ of {α} w.r.t. T. Moreover, since δ is infinite and D is finite, there exists at least one α ∈ D such that δ′ is infinite. Therefore, δ′ is an infinite chase derivation of {α} w.r.t. T. However, δ′ does not necessarily enjoy (⋆). Note that ≺_δ′ is essentially a tree 𝒯 rooted at α. It is not difficult to verify that the out-degree of 𝒯 is bounded by |T| ⋅ ar(T)^ar(T), and therefore finite since T is finite. Since 𝒯 is infinite, by König's Lemma, we get that in 𝒯 there is an infinite path α_0, α_1, α_2, …, where α_0 = α. Clearly, α_i ≺_δ′ α_{i+1}, for each i ≥ 0. Therefore, δ″ = ({α_0, …, α_i})_{i≥0} is an infinite chase derivation of {α} w.r.t. T that enjoys (⋆), and the claim follows.
Let us now show that (2) ⇒ (1). By hypothesis, there is a chase path (α_i)_{i≥0} for T. Let δ = (I_i)_{i≥0}, where I_i = {α_0, …, α_i}. It is clear that, for each i > 0, there is a trigger (σ, h) for T on I_{i−1} such that I_{i−1}⟨σ, h⟩I_i. Moreover, there is no α ∈ {α_0, …, α_{i−1}} such that α stops (σ, h); thus, by Fact 1, (σ, h) is active. This implies that δ is an infinite chase derivation of the singleton database {α_0} w.r.t. T. ◻
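The two conditions that define a chase path can be checked mechanically on any finite prefix. The Python sketch below does this under illustrative assumptions: a linear TGD is a pair (body atom, head atom) with variable arguments, nulls are strings starting with an underscore whose name is built from the trigger and the existential variable, and the helper names `match`, `result`, `stops` and `is_chase_path_prefix` are hypothetical. A finite check of this kind can refute that a sequence starts a chase path, but it obviously cannot establish the existence of an infinite one.

```python
def match(pattern, atom):
    """Return the homomorphism (as a dict) from a pattern atom to an atom, or None."""
    (p, args), (q, terms) = pattern, atom
    if p != q or len(args) != len(terms):
        return None
    h = {}
    for var, term in zip(args, terms):
        if h.setdefault(var, term) != term:
            return None
    return h


def result(index, tgd, h):
    """Atom produced by the trigger (tgd, h); nulls are named after the trigger."""
    body, head = tgd
    trigger_id = f"{index}|" + ",".join(f"{v}={h[v]}" for v in sorted(h))
    return (head[0], tuple(h.get(v, f"_{v}<{trigger_id}>") for v in head[1]))


def stops(atom, tgd, h):
    """True if `atom` already witnesses the head of the trigger (tgd, h)."""
    body, head = tgd
    g = match(head, atom)
    frontier = set(body[1]) & set(head[1])
    return g is not None and all(g[x] == h[x] for x in frontier)


def is_chase_path_prefix(atoms, tgds):
    """Check the two chase-path conditions on a finite sequence of atoms."""
    if any(term.startswith("_") for term in atoms[0][1]):
        return False  # condition 1: the first atom must be a fact
    for i in range(1, len(atoms)):
        witnessed = False
        for index, tgd in enumerate(tgds):
            h = match(tgd[0], atoms[i - 1])
            if h is None or result(index, tgd, h) != atoms[i]:
                continue  # condition 2(a) fails for this TGD
            if not any(stops(a, tgd, h) for a in atoms[:i]):
                witnessed = True  # condition 2(b): no earlier atom stops the trigger
        if not witnessed:
            return False
    return True
```

For the TGDs R(x) → ∃y P(x, y) and P(x, y) → R(y), alternately applying `result` starting from R(c) yields prefixes that pass the check, in line with the non-termination observed for that set.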

Chase Paths are MSOL-definable
We proceed to show that the existence of a chase path for a set T ∈ 𝕃 can be checked via an MSOL sentence. In other words, we are going to argue that there exists an MSOL sentence Φ_T such that Φ_T is satisfiable over infinite paths iff there is a chase path for T. Instead of giving the rather long and tedious sentence Φ_T, we describe what it does, and it will be apparent that it is indeed expressible in MSOL.

Atom Encoding
One may think that, for a set T ∈ 𝕃, our MSOL sentence Φ_T could directly talk about a chase path for T. But this is not going to work for the simple reason that a chase path for T consists of infinitely many atoms. We therefore need something similar to a chase path, i.e., a structure that encodes a chase path for T as a labeled path, but much more parsimonious with respect to the labeling function. To this end, we need a convenient encoding for atoms. We proceed to define a finite alphabet Λ_T that provides such an encoding. We write EQ_T for the set of all equivalence relations on {f, m} × {1, 2, …, ar(T)}. The desired alphabet is defined as the set of triples
Λ_T = sch(T) × (T ∪ {ε}) × EQ_T.
Here is the idea underlying this encoding:
- The first element of each triple is a predicate; it simply tells us the predicate of the encoded atom.
- Concerning the second element, ε indicates that the encoded atom is the starting atom of the chase path. If an atom is an intermediate one, then the second element of the triple tells us from which TGD of T it was generated.
- Finally, for the third element, note that f and m stand for "father" and "me". The idea is that, for example, the pair ((m, i), (m, j)) says that the encoded atom has the same term at its i-th and j-th position, while the pair ((m, i), (f, j)) says that the term at the i-th position in the atom in question is the same as the term at the j-th position of its father in the chase path.
For brevity, given a triple λ = (x, y, z) ∈ Λ_T, we write pred(λ) for the predicate x, org(λ) for y, i.e., the origin of the encoded atom, and eq(λ) for the equivalence relation z.
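The following Python sketch illustrates how an atom, together with its father in the path, could be abstracted into such a triple. It is a sketch under assumptions: the equivalence relation is stored as a set of equivalence classes over pairs such as ("m", 1) and ("f", 2), positions beyond the arity of an atom are placed in singleton classes, and the function name `encode` and the string used for the origin are hypothetical.

```python
def encode(atom, father, origin, max_arity):
    """Abstract an atom (and its father, if any) into a triple (pred, origin, eq).

    The third component is a partition of {"f", "m"} x {1, ..., max_arity}:
    two positions are in the same class iff they carry the same term.
    """
    pred, terms = atom
    slots = {("m", i + 1): t for i, t in enumerate(terms)}
    if father is not None:
        slots.update({("f", i + 1): t for i, t in enumerate(father[1])})
    classes = {}
    for who in ("f", "m"):
        for i in range(1, max_arity + 1):
            pos = (who, i)
            # Positions that carry no term end up in their own singleton class.
            key = ("term", slots[pos]) if pos in slots else ("unused", pos)
            classes.setdefault(key, set()).add(pos)
    return pred, origin, frozenset(frozenset(c) for c in classes.values())
```

The point of the encoding is visible on two different steps of the same derivation: P(c, n1) with father R(c) and P(n1, n2) with father R(n1) are abstracted into the same triple, which is why finitely many labels suffice even though infinitely many nulls occur.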

Abstract Chase Paths
It should be intuitively clear that there exists a chase path for T iff the infinite path v_0, v_1, … can be labeled with triples from Λ_T in such a way that:
1. The label of v_0 is of the form (x, ε, z) for some predicate x ∈ sch(T), and an equivalence relation z ∈ EQ_T.
2. For each i > 0, it holds that (a) the atom encoded by the label of v_i can be obtained via an application of a trigger for T on the atom encoded by the label of v_{i−1}, and (b) there is no j < i such that the atom encoded by the label of v_j stops the atom encoded by the label of v_i.
Such a Λ_T-labeled infinite path is a structure that encodes a chase path for T using finitely many labels; we call it an abstract chase path, and it is a structure that our MSOL sentence can talk about. We proceed to formalize the above discussion.
We first need to make precise when a triple λ′ is a successor of some triple λ, which means that the atom encoded by λ′ can be obtained via an application of a trigger for T on the atom encoded by λ. Consider a triple λ ∈ Λ_T. A successor of λ is a triple λ′ ∈ Λ_T, with org(λ′) = σ for some σ ∈ T, such that the predicates and the equivalence relations of λ and λ′ are consistent with an application of σ, i.e., the atom encoded by λ′ is the result of a trigger that involves σ on the atom encoded by λ. We also need to formalize when a triple λ stops a triple λ′, which essentially means that the atom encoded by λ stops the atom encoded by λ′. This relies on the notion of correct coloring for a pair of triples. Consider two triples λ, λ′ ∈ Λ_T, with pred(λ) = R/n, pred(λ′) = R′/m, and org(λ′) = σ for some σ ∈ T. A pair of colors (i, j), where i ∈ [n] and j ∈ [m], is a correct coloring for (λ, λ′) if σ is of the form R(x_1, …, x_n) → ∃z̄ R′(y_1, …, y_m) with x_i = y_j, i.e., σ propagates the variable at position (R, i) in body(σ) to the position (R′, j) in head(σ). Given a sequence of triples from Λ_T of the form s = λ_0, …, λ_n, for n > 0, a locally correct coloring for s is a tuple of colors (i_0, …, i_n) such that, for each 0 ≤ j < n, (i_j, i_{j+1}) is a correct coloring for (λ_j, λ_{j+1}).
The fact that λ_0 stops λ_n w.r.t. s is then captured by two conditions that rely on locally correct colorings for s. The first condition ensures that there exists a homomorphism h from the atom encoded by λ_n to the atom encoded by λ_0, while the second condition ensures that h is the identity on the witnesses for the variables in fr(σ).
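Correct and locally correct colorings depend only on the TGDs that label the path, so they are easy to compute. The sketch below assumes that a linear TGD is a pair (body atom, head atom) with variable arguments, that positions are 1-based, and that the list `tgds` holds the origins of λ_1, …, λ_n in order; the names `correct_coloring` and `locally_correct_colorings` are hypothetical.

```python
def correct_coloring(tgd, i, j):
    """(i, j) is correct if the TGD propagates body position i to head position j."""
    (_, body_args), (_, head_args) = tgd
    return body_args[i - 1] == head_args[j - 1]


def locally_correct_colorings(tgds, start):
    """All locally correct colorings (i_0, ..., i_n) with i_0 = start.

    tgds[k] is the origin of the (k+1)-th triple of the sequence, so the
    colouring of consecutive triples k and k+1 is checked against tgds[k].
    """
    tuples = [(start,)]
    for tgd in tgds:
        body_arity, head_arity = len(tgd[0][1]), len(tgd[1][1])
        tuples = [t + (j,)
                  for t in tuples
                  for j in range(1, head_arity + 1)
                  if t[-1] <= body_arity and correct_coloring(tgd, t[-1], j)]
    return tuples
```

For example, the TGD R(x, y) → ∃z R(y, z) propagates position 2 to position 1 and then loses the term at the next step, so a coloring starting at 2 survives one application but not two, whereas R(x, y) → ∃z R(x, z) keeps position 1 alive forever.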
We now have all the ingredients needed to formally define the notion of abstract chase path:
Definition 3 (Abstract Chase Path) Let T ∈ 𝕃. An abstract chase path for T is an infinite Λ_T-labeled path (v_i)_{i≥0}, with μ being the labeling function, such that:
1. org(μ(v_0)) = ε.
2. For each i > 0, (a) μ(v_i) is a successor of μ(v_{i−1}), and (b) there is no 0 ≤ j < i such that μ(v_j) stops μ(v_i) w.r.t. the sequence μ(v_j), μ(v_{j+1}), …, μ(v_i). ◻
It should not be difficult to observe the correspondence between the two conditions in the definition of chase path (Definition 2) and the two conditions in the definition of abstract chase path. This allows us to show the following: The following are equivalent: 1. There exists a chase path for T. 2. There exists an abstract chase path for T.

Proof
The proof relies on the obvious translation of a chase path for T into an abstract chase path for T , and vice-versa. We proceed to make this explicit.
(1) ⇒ (2). By hypothesis, there exists a chase path (α_i)_{i≥0} for T. By definition, for each i > 0, there exists a trigger (σ_i, h_i) for T on {α_{i−1}} such that α_i = result(σ_i, h_i), and there is no 0 ≤ j < i such that α_j stops (σ_i, h_i). We proceed to construct an infinite Λ_T-labeled path p = (v_i)_{i≥0} as follows:
- The node v_0 is labeled with (x, ε, z), where x is the predicate of α_0, and z is the smallest equivalence relation on {f, m} × {1, …, ar(T)} such that, for every k, ℓ ∈ [ar(x)], α_0[k] = α_0[ℓ] implies ((m, k), (m, ℓ)) ∈ z.
- For each i > 0, the node v_i is labeled with (x, σ_i, z), where x is the predicate of α_i, and z is the smallest equivalence relation on {f, m} × {1, …, ar(T)} such that the following hold; we assume that P is the predicate of α_{i−1}: for every k, ℓ ∈ [ar(x)], α_i[k] = α_i[ℓ] implies ((m, k), (m, ℓ)) ∈ z, and, for every k ∈ [ar(P)] and ℓ ∈ [ar(x)], α_{i−1}[k] = α_i[ℓ] implies ((f, k), (m, ℓ)) ∈ z.
This completes the construction of p. It is now not difficult to see that p is an abstract chase path for T. Indeed, conditions 1 and 2a of Definition 3 can be easily verified. Concerning condition 2b, by contradiction, assume there exists i > 0 and 0 ≤ j < i such that (with μ being the labeling function of p) μ(v_j) stops μ(v_i) w.r.t. s = μ(v_j), μ(v_{j+1}), …, μ(v_i). But this implies that α_j stops (σ_i, h_i), which is a contradiction.
(2) ⇒ (1). By hypothesis, there exists an abstract chase path p = (v_i)_{i≥0} for T; μ is the labeling function of p. By definition, for each i > 0, μ(v_i) is a successor of μ(v_{i−1}), and there is no 0 ≤ j < i such that μ(v_j) stops μ(v_i) w.r.t. s = μ(v_j), μ(v_{j+1}), …, μ(v_i). We inductively construct an infinite sequence (α_i)_{i≥0} of atoms over sch(T), which contain constants and nulls, as follows:
- The atom α_0 is of the form P(c_1, …, c_n), with c_1, …, c_n being constants, where P = pred(μ(v_0)), and, for each i, j ∈ [n], c_i = c_j iff ((m, i), (m, j)) ∈ eq(μ(v_0)).
- Assume that we have constructed α_0, α_1, …, α_i; let R be the predicate of α_i. We define α_{i+1} as an atom of the form P(t_1, …, t_n), with t_1, …, t_n being constants and nulls, where P = pred(μ(v_{i+1})), and, with σ = org(μ(v_{i+1})), for each j ∈ [n]: if ((f, k), (m, j)) ∈ eq(μ(v_{i+1})) for some k ∈ [ar(R)], then α_{i+1}[j] = α_i[k]; otherwise, if the variable at the j-th position of head(σ) is an existentially quantified variable z, then α_{i+1}[j] is the null ⊥^z_{σ,h}, where h is the homomorphism from body(σ) to α_i, which exists since μ(v_{i+1}) is a successor of μ(v_i).
This completes the construction of (α_i)_{i≥0}. It is not difficult to see that (α_i)_{i≥0} is a chase path for T. Indeed, conditions 1 and 2a of Definition 2 can be easily verified. Concerning condition 2b, by contradiction, assume that there exists i > 0 and 0 ≤ j < i such that α_j stops the trigger that generates α_i. But this implies that μ(v_j) stops μ(v_i) w.r.t. s = μ(v_j), μ(v_{j+1}), …, μ(v_i), which is a contradiction, and the claim follows. ◻

Abstract Chase Paths are MSOL-definable
Our last task is to show the following: There is an MSOL sentence Φ T such that, for an infinite Λ T -labeled path p, it holds that p ⊧ Φ T iff p is an abstract chase path for T .
The sentence Φ_T has to check whether an infinite Λ_T-labeled path p = (V, <), with μ being the labeling function, enjoys the two conditions of Definition 3. Let us assume, for the moment, that we have available the following formulas:
- desc(x, y) states that x is a descendant of y.
- stop(x, y) means that μ(x) stops μ(y) w.r.t. s, with s being the sequence of triples that label the subpath from x to y.
By exploiting the above formulas, we can devise Φ_T as the conjunction φ_1 ∧ φ_2a ∧ φ_2b, where each conjunct checks for the corresponding condition of Definition 3. Let
Λ_T^fact = {λ = (x, y, z) ∈ Λ_T | y = ε},
i.e., the set of triples of Λ_T that correspond to facts, and
Λ_T^succ = {(λ, λ′) ∈ Λ_T × Λ_T | λ′ is a successor of λ}.
We also use unary predicates of the form L_λ, where λ ∈ Λ_T, to state that the label of a node should be λ. The sentences φ_1, φ_2a and φ_2b are then defined from these sets, the predicates L_λ, and the formulas desc and stop in the expected way. We proceed to give more details about the auxiliary formulas used in Φ_T. The formula desc(x, y) comes from the general MSOL toolbox. It simply states that any set of nodes that contains x and is closed under < also contains y.
The formula stop(x, y) can be easily devised provided that we have available a formula φ^=_{i,j}(x, y), for each i, j ∈ [ar(T)], that states the following: the term in the atom encoded by the label of x at position i is equal to the term in the atom encoded by the label of y at position j. This can be expressed in MSOL as follows: there exists a subset A of the nodes in the Λ_T-labeled path p such that:
1. A is the path with x and y being its ends, i.e., A is finite, x, y have exactly one neighbor in A, and any other node in A has exactly two neighbors, and
2. A is a disjoint union of the sets A_1, …, A_{ar(T)} such that x ∈ A_i, y ∈ A_j, and, for all pairs z, w ∈ A with (z, w) being an edge in the path p, z ∈ A_k and w ∈ A_ℓ, it holds that ((f, k), (m, ℓ)) ∈ eq(μ(w)).
Note that above we need to check whether the set of nodes A is finite. This can be done by exploiting a formula that comes from the general MSOL toolbox. It simply states that p has