Languages ordered by the subword order

We consider a language together with the subword relation, the cover relation, and regular predicates. For such structures, we consider the extension of first-order logic by threshold- and modulo-counting quantifiers. Depending on the language, the used predicates, and the fragment of the logic, we determine four new combinations that yield decidable theories. These results extend earlier ones where only the language of all words without the cover relation and fragments of first-order logic were considered.


Introduction
The subword relation (sometimes called scattered subword relation) is one of the simplest nontrivial examples of a well-quasi ordering [6]. This property allows its prominent use in the verification of infinite state systems [3]. The subword relation can be understood as embeddability of one word into another. This embeddability relation has been considered for other classes of structures like trees, posets, semilattices, lattices, graphs etc. [12,14,13,7,8,9,10,19,20].
In this paper, we study logical properties of a set of words ordered by the subword relation. We are mainly interested in general situations where we get a decidable logical theory. Regarding first-order logic, we already have a rather precise picture about the border between decidable and undecidable fragments: For the subword order alone, the ∃ * -theory is decidable [15] and the ∃ * ∀ * -theory is undecidable [11]. For the subword order together with regular predicates, the two-variable theory is decidable [11] and the three-variable theory [11] as well as the ∃ * -theory are undecidable [5] (these two undecidabilities already hold if we only consider singleton predicates, i.e., constants).
Thus, to get a decidable theory, one has to restrict the expressiveness of first-order logic considerably. For instance, neither in the ∃ * -, nor in the two-variable fragment of first-order logic, one can express the cover relation ⊏ · (i.e., "u is a proper subword of v and there is no word properly between these two"). As another example, one cannot express threshold properties like "there are at most k subwords with a given property" in any of these two logics (for k > 2).
In this paper, we refine the analysis of logical properties of the subword order in three aspects: We restrict the universe from the set of all words to a given language L.
Besides the subword order, we also consider the cover relation ⊏ ·.
We add threshold and modulo counting quantifiers to the logic.
Also as before, we may or may not add regular predicates or constants to the structure. In other words, we consider reducts of the structure (L, ⊑, ⊏·, (K ∩ L)_{K regular}, (w)_{w∈L}) with L some language and fragments of the logic C+MOD that extends first-order logic by threshold- and modulo-counting quantifiers.
In this spectrum, we identify four new cases of decidable theories:
1. The C+MOD-theory of the whole structure is decidable provided L is bounded and context-free (Theorem 4). A rather special case of this result follows from [5, Theorem 4.1]: If L = (a_1^* a_2^* ⋯ a_m^*)^ℓ and if only regular predicates of the form (a_1^* a_2^* ⋯ a_m^*)^k are used, then the FO-theory is decidable (with Σ = {a_1, ..., a_m}).
2. The C+MOD²-theory (i.e., the 2-variable fragment of the C+MOD-theory) of the whole structure is decidable whenever L is regular (Corollary 17). The decidability of the FO²-theory without the cover relation is [11, Theorem 5.5].
3. The Σ_1-theory of the structure (L, ⊑) is decidable provided L is regular (Theorem 18). For L = Σ*, this is [15, Prop. 2.2].
4. The Σ_1-theory of the structure (L, ⊑, (w)_{w∈L}) is decidable provided L is regular and almost every word from L contains a non-negligible number of occurrences of every letter (see below for a precise definition, Theorem 25). Note that, by [5, Theorem 3.3], this theory is undecidable if L = Σ*.
Our first result is shown by an interpretation of the structure in (N, +). Four ingredients are essential here: Parikh's theorem [16], the rationality of the subword relation [11], Nivat's theorem characterising rational relations [2], and the decidability of the C+MOD-theory of (N, +) [1,18,4] (note that this decidability does not follow directly from Presburger's result since in his logic, one cannot make statements like "the number of witnesses x ∈ N satisfying . . . is even").
Our second result extends a result from [11] that shows decidability of the FO 2 -theory of the structure (Σ * , ⊑, (L) L regular ) . They provide a quantifier elimination procedure which relies on two facts: The class of regular languages is closed under images of rational relations. The proper subword relation and the incomparability relation are rational. Here, we follow a similar proof strategy. But while that proof had to handle the existential quantifier, only, we also have to deal with counting quantifiers. This requires us to develop a theory of counting images under rational relations, e.g., the set of words u such that there are at least two words v in the regular language K with (u, v) in the rational relation R. We show that the class of regular languages is closed under such counting images provided the rational relation R is unambiguous, a proof that makes heavy use of weighted automata [17]. To apply this to the subword and the cover relation, this also requires us to show that the proper subword, the cover, and the incomparability relations are unambiguous rational.
Our third result extends the decidability of the Σ 1 -theory of (Σ * , ⊑) from [15]. The main point there was to prove that every finite partial order can be embedded into (Σ * , ⊑) if |Σ| ≥ 2. This is certainly false if we restrict the universe, e.g., to L = a * . However, such bounded regular languages are already covered by the first result, so we only have to handle unbounded regular languages L. In that case, we prove nontrivial combinatorial results regarding primitive words in regular languages and prefix-maximal subwords. These considerations then allow us to prove that, indeed, every finite partial order embeds into (L, ⊑). Then, the decidability of the Σ 1 -theory follows as in [15].
Regarding our fourth result, we know from [11] that decidability of the Σ 1 -theory of (L, ⊑, (w) w∈L ) does not hold for every regular L. Therefore, we require that a certain fraction of the positions in a word carries the letter a (for almost all words from the language and for all letters). This allows us to conclude that every finite partial order embeds into (L, ⊑) above each word. The second ingredient is that, for such languages, any Σ 1 -sentence is effectively equivalent to such a sentence where constants are only used to express that all variables take values above a certain word w. These two properties together with some combinatorial arguments from the theory of well-quasi orders then yield the decidability.
In summary, we identify four classes of decidable theories related to the subword order. In this paper, we concentrate on these positive results, i.e., we do not try to find new undecidable theories. It would, in particular, be nice to understand what properties of the regular language L determine the decidability of the Σ_1-theory of the structure (L, ⊑, (w)_{w∈L}) (it is undecidable for L = Σ* [5] and decidable for, e.g., L = {ab, baa}* ∪ bb{abb}* by our fourth result). Another open question concerns the complexity of our decidability results.

Preliminaries
Throughout this paper, let Σ be some alphabet.
A word u = a_1 a_2 ⋯ a_n with a_i ∈ Σ is a subword of a word v ∈ Σ* if there are words v_0, v_1, ..., v_n ∈ Σ* with v = v_0 a_1 v_1 a_2 ⋯ a_n v_n. In that case, we write u ⊑ v; if, in addition, u ≠ v, then we write u ⊏ v and call u a proper subword of v. If u, w ∈ Σ* such that u ⊏ w and there is no word v with u ⊏ v ⊏ w, then we say that w is a cover of u and write u ⊏· w. This is equivalent to saying u ⊑ w and |u| + 1 = |w| where |u| is the length of the word u. If, for two words u and v, neither u is a subword of v nor vice versa, then the words u and v are incomparable and we write u ∥ v. For instance, aa ⊏ babbba, aa ⊏· aba, and aba ∥ aabb.
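The three relations just introduced can be checked mechanically on concrete words. The following is a minimal Python sketch, not part of the paper's development; the function names are hypothetical and chosen only for illustration. It uses the length characterisation of the cover relation stated above and reproduces the three examples at the end of the paragraph.

```python
def is_subword(u: str, v: str) -> bool:
    """Return True iff u is a (scattered) subword of v."""
    remaining = iter(v)
    # Greedy left-to-right matching: every letter of u has to be found
    # in the part of v that has not been consumed yet.
    return all(letter in remaining for letter in u)


def is_proper_subword(u: str, v: str) -> bool:
    """u is a subword of v and differs from v."""
    return u != v and is_subword(u, v)


def is_cover(u: str, w: str) -> bool:
    """w covers u: u is a subword of w and w is exactly one letter longer."""
    return len(u) + 1 == len(w) and is_subword(u, w)


def incomparable(u: str, v: str) -> bool:
    """Neither word is a subword of the other."""
    return not is_subword(u, v) and not is_subword(v, u)
```

For example, `is_cover("aa", "aba")` holds, while `is_cover("aa", "babbba")` does not because the length difference is larger than one.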
Let S = (L, (R_i)_{i∈I}, (w_j)_{j∈J}) be a structure, i.e., L is a set, R_i ⊆ L^{n_i} is a relation of arity n_i (for all i ∈ I), and w_j ∈ L for all j ∈ J. Then, formulas of the logic C+MOD are built from the atomic formulas s = t for s, t variables or constants w_j and R_i(s_1, s_2, ..., s_{n_i}) for i ∈ I and s_1, s_2, ..., s_{n_i} variables or constants w_j by the following formation rules:
1. If α and β are formulas, then so are ¬α and α ∧ β.
2. If α is a formula and x a variable, then ∃x α is a formula.
3. If α is a formula, x a variable, and k ∈ N, then ∃^{≥k} x α is a formula.
4. If α is a formula, x a variable, and p, q ∈ N with p < q, then ∃^{p mod q} x α is a formula.
We call ∃^{≥k} a threshold counting quantifier and ∃^{p mod q} a modulo counting quantifier. The semantics of these quantifiers is defined as follows:
$$S \models \exists^{\ge k} x\, \alpha \iff |\{w \in L \mid S \models \alpha(w)\}| \ge k$$
$$S \models \exists^{p \bmod q} x\, \alpha \iff |\{w \in L \mid S \models \alpha(w)\}| \in p + q\mathbb{N}$$
For instance, ∃^{0 mod 2} x α expresses that the number of elements of the structure satisfying α is even. Then ∃^{0 mod 2} x α ∨ ∃^{1 mod 2} x α holds iff only finitely many elements of the structure satisfy α. The fragment FO+MOD of C+MOD comprises all formulas not containing any threshold counting quantifier ∃^{≥k}. First-order logic FO is the set of formulas from C+MOD not mentioning any counting quantifier, i.e., neither ∃^{≥k} nor ∃^{p mod q}. Let Σ_1 denote the set of first-order formulas of the form ∃x_1 ∃x_2 ... ∃x_n : ψ where ψ is quantifier-free; these formulas are also called existential.
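On a finite universe, the two counting quantifiers can be evaluated by simply counting witnesses. The sketch below assumes the usual reading (at least k witnesses, respectively a number of witnesses congruent to p modulo q), which matches the parity example above; the helper names and the toy universe are hypothetical. Infinite universes, where a modulo quantifier fails as soon as there are infinitely many witnesses, are not covered by this sketch.

```python
from itertools import product


def count_witnesses(universe, alpha):
    """Number of elements of the finite universe that satisfy alpha."""
    return sum(1 for w in universe if alpha(w))


def exists_at_least(universe, alpha, k):
    """Threshold counting quantifier: at least k witnesses."""
    return count_witnesses(universe, alpha) >= k


def exists_mod(universe, alpha, p, q):
    """Modulo counting quantifier: the number of witnesses lies in p + qN."""
    assert 0 <= p < q
    return count_witnesses(universe, alpha) % q == p


# Toy universe (an assumption for illustration): all words over {a, b}
# of length at most 2, i.e. "", a, b, aa, ab, ba, bb.
universe = ["".join(t) for n in range(3) for t in product("ab", repeat=n)]


def contains_a(w):
    """The letter a is a subword of w."""
    return "a" in w
```

Here four words (a, aa, ab, ba) contain the letter a, so the threshold quantifier holds for k = 4 but not for k = 5, and the number of witnesses is even.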

Note that the formulas
$$\exists^{\ge k} x\colon x = x \qquad (1)$$
and
$$\exists x_1\, \exists x_2 \cdots \exists x_k\colon \bigwedge_{1 \le i < j \le k} x_i \ne x_j \qquad (2)$$
are equivalent (they both express that the structure contains at least k elements). Generalising this, the threshold quantifier ∃^{≥k} can be expressed using the existential quantifier, only. Consequently, the logics FO+MOD and C+MOD are equally expressive. The situation changes when we restrict the number of variables that can be used in a formula: Let FO² and C+MOD² denote the set of formulas from FO and C+MOD, respectively, that use the variables x and y, only. Note that the formula from (1) belongs to C+MOD², but the equivalent formula from (2) does not belong to FO+MOD².
◮ Remark. Let L_1 be a set with two elements and let L_2 be a set with k · q + 2 elements (where k > 2). Furthermore, let ϕ ∈ FO+MOD² be a formula such that all moduli appearing in ϕ are divisors of q. By induction on the construction of the formula ϕ, one can show the following for any x, y ∈ L_1 and x′, y′ ∈ L_2: If x = y and x′ = y′, then L_1 ⊨ ϕ(x, y) ⟺ L_2 ⊨ ϕ(x′, y′). Consequently, there is no FO+MOD²-formula expressing that the number of elements of a structure is ≥ k.
In this paper, we will consider the following structures: The largest one is (L, ⊑, ⊏·, (K ∩ L)_{K regular}, (w)_{w∈L}) for some L ⊆ Σ*. The universe of this structure is the language L, we have two binary predicates (⊑ and ⊏·), a unary predicate K ∩ L for every regular language K, and we can use every word from L as a constant. The other extreme is the structure (L, ⊑) for some L ⊆ Σ* where we consider only the binary predicate ⊑. Finally, we will also prove results on the intermediate structure (L, ⊑, (w)_{w∈L}) that has a binary relation and any word from the language as a constant. For any structure S and any of the above logics ℒ, we call {ϕ | ϕ ∈ ℒ sentence and S ⊨ ϕ} the ℒ-theory of S.
A language L ⊆ Σ * is bounded if there are a number n ∈ N and words w 1 , w 2 , . . . , w n ∈ Σ * such that L ⊆ w * 1 w * 2 · · · w * n . Otherwise, it is unbounded. For an alphabet Γ, a word w ∈ Γ * , and a letter a ∈ Γ, let |w| a denote the number of occurrences of the letter a in the word w. The Parikh vector of w is the tuple Ψ Γ (w) = (|w| a ) a∈Γ ∈ N Γ . Note that Ψ Γ is a homomorphism from the free monoid Γ * onto the additive monoid (N Γ , +).
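The Parikh vector and its homomorphism property are easy to illustrate. This is a small Python sketch with hypothetical helper names; the alphabet is passed as a string so that the order of the components is fixed.

```python
from collections import Counter


def parikh(w: str, gamma: str) -> tuple:
    """Parikh vector of w with respect to the ordered alphabet gamma."""
    counts = Counter(w)
    return tuple(counts[a] for a in gamma)


def add(m: tuple, n: tuple) -> tuple:
    """Componentwise addition, i.e. the operation of the monoid (N^Gamma, +)."""
    return tuple(x + y for x, y in zip(m, n))
```

The homomorphism property says that the Parikh vector of a concatenation uv equals the sum of the Parikh vectors of u and v, e.g. for u = ab and v = bba over the alphabet {a, b}.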

The FO+MOD-theory with regular predicates
The aim of this section is to prove that the full FO+MOD-theory of the structure S = (L, ⊑, ⊏·, (K ∩ L)_{K regular}, (w)_{w∈L}) is decidable for L bounded and context-free. This is achieved by interpreting this structure in (N, +), i.e., in Presburger arithmetic whose FO+MOD-theory is known to be decidable [1,18,4]. We start with three preparatory lemmas.
◮ Lemma 1. Let w_1, ..., w_n ∈ Σ*, let g : N^n → Σ* be defined by g(m) = w_1^{m_1} w_2^{m_2} ⋯ w_n^{m_n} for every tuple m = (m_1, ..., m_n), and let K ⊆ Σ* be context-free. Then the set g^{-1}(K) ⊆ N^n is effectively semilinear.
Proof. Let Γ = {a_1, a_2, ..., a_n} be an alphabet and define the monoid homomorphism f : Γ* → Σ* by f(a_i) = w_i for all i ∈ [1, n]. Since the class of context-free languages is effectively closed under inverse homomorphisms, the language K_1 = f^{-1}(K) is effectively context-free. Since a_1^* a_2^* ⋯ a_n^* is regular, also the language K_2 = K_1 ∩ a_1^* a_2^* ⋯ a_n^* is effectively context-free. By Parikh's theorem [16], the Parikh image Ψ_Γ(K_2) ⊆ N^n of this intersection is effectively semilinear. Now let m ∈ N^n. Then m ∈ Ψ_Γ(K_2) iff there exists a word u ∈ K_1 ∩ a_1^* a_2^* ⋯ a_n^* with Parikh image m. But the only word from a_1^* a_2^* ⋯ a_n^* with this Parikh image is a_1^{m_1} a_2^{m_2} ⋯ a_n^{m_n}, i.e., m ∈ Ψ_Γ(K_2) iff a_1^{m_1} a_2^{m_2} ⋯ a_n^{m_n} ∈ K_1. Since f(a_1^{m_1} a_2^{m_2} ⋯ a_n^{m_n}) = g(m), this is equivalent to g(m) ∈ K. Thus, the semilinear set Ψ_Γ(K_2) equals the set g^{-1}(K) from the lemma. ◭
◮ Lemma 2. Let w_1, ..., w_n ∈ Σ* and g : N^n → Σ* be defined by g(m) = w_1^{m_1} w_2^{m_2} ⋯ w_n^{m_n}. Then the set {(m, n) ∈ N^n × N^n | g(m) ⊑ g(n)} is semilinear.
Proof. Let Γ = {a_1, a_2, ..., a_n} be an alphabet and define the monoid homomorphism f : Γ* → Σ* by f(a_i) = w_i for all i ∈ [1, n]. In this proof, we construct an alphabet ∆ and homomorphisms g, h_1, h_2, p_1, and p_2 such that the diagrams (for i ∈ {1, 2}) from Fig. 1 commute. In addition, we will construct a regular language R ⊆ ∆*.
Figure 1: Commuting diagram for the proof of Lemma 2.
The subword relation on Σ* is rational [11]. Since the class of rational relations is closed under inverse homomorphisms [2], also the relation S_1 = {(u, v) ∈ Γ* × Γ* | f(u) ⊑ f(v)} is rational. While the class of rational relations is not closed under intersections, it is at least closed under intersections with direct products of regular languages. Hence also S_2 = S_1 ∩ (a_1^* a_2^* ⋯ a_n^* × a_1^* a_2^* ⋯ a_n^*) is rational. By Nivat's theorem [2], there are a regular language R over some alphabet ∆ and two homomorphisms h_1, h_2 : ∆* → Γ* with S_2 = {(h_1(w), h_2(w)) | w ∈ R}. Since R is regular, its Parikh image Ψ_∆(R) is semilinear [16].
Since all words appearing in S 2 belong to a * 1 a * 2 · · · a * n , this last statement is equivalent to saying But this is equivalent to saying Note that g(m) = f (a m1 1 a m2 2 · · · a mn n ) and similarly g(n) = f (a n1 1 a n2 2 · · · a nn n ). Thus, the last claim is equivalent to g(m) ⊑ g(n).
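In summary, we showed that the semilinear set H is the set from the lemma. ◭
◮ Lemma 3. Let w_1, ..., w_n ∈ Σ*, let g : N^n → Σ* be defined by g(m) = w_1^{m_1} w_2^{m_2} ⋯ w_n^{m_n}, and let L ⊆ w_1^* w_2^* ⋯ w_n^* be context-free. Then there exists a semilinear set U ⊆ N^n such that g maps U bijectively onto L.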
Proof. By Lemma 1, the set g^{-1}(L) is semilinear and satisfies, by its definition, g^{-1}(L) = {m ∈ N^n | g(m) ∈ L}. Let T ⊆ N^n × N^n denote the semilinear set of all pairs (m, n) with g(m) ⊑ g(n) from Lemma 2. Then let U denote the set of n-tuples m ∈ g^{-1}(L) such that the following holds for all n ∈ N^n: If (m, n) ∈ T and (n, m) ∈ T, then m is lexicographically smaller than or equal to n. This set U is semilinear since the class of semilinear relations is closed under first-order definitions. Now let u ∈ L. Since g maps g^{-1}(L) onto L, there is m ∈ g^{-1}(L) with g(m) = u. Since, on g^{-1}(L) ⊆ N^n, the lexicographic order is a well-order, there is a lexicographically minimal such tuple m. This tuple belongs to U and it is the only tuple from U mapped to u. ◭
Now we can prove the main result of this section.
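Before turning to the theorem, the choice of a unique representative in the proof of Lemma 3 can be illustrated on a toy instance. The sketch below is only a brute-force illustration under stated assumptions: it takes g(m) to be the product of the powers w_i^{m_i}, compares tuples lexicographically from left to right, and searches exponents up to |u| only; the function names are hypothetical.

```python
from itertools import product


def g(words, m):
    """The word w_1^{m_1} w_2^{m_2} ... w_n^{m_n}."""
    return "".join(w * k for w, k in zip(words, m))


def minimal_representative(words, u):
    """Lexicographically least tuple m with g(m) == u, or None if there is none.

    Exponents above len(u) are not needed for a least tuple, so the search
    space is finite. itertools.product enumerates the tuples in
    lexicographic order, hence the first hit is the least one.
    """
    bound = len(u)
    for m in product(range(bound + 1), repeat=len(words)):
        if g(words, m) == u:
            return m
    return None
```

With w_1 = a and w_2 = aa, the word aa has the two preimages (2, 0) and (0, 1); the representative selected by the lexicographic criterion is (0, 1).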
◮ Theorem 4. Let L ⊆ Σ* be context-free and bounded. Then the FO+MOD-theory of S = (L, ⊑, ⊏·, (K ∩ L)_{K regular}, (w)_{w∈L}) is decidable.
Proof. Since L is bounded, there are words w_1, ..., w_n ∈ Σ* with L ⊆ w_1^* w_2^* ⋯ w_n^*; let g : N^n → Σ* be defined by g(m) = w_1^{m_1} w_2^{m_2} ⋯ w_n^{m_n}.
1. By Lemma 3, there is a semilinear set U ⊆ N^n that is mapped by g bijectively onto L. From this semilinear set, we obtain a first-order formula λ(x) in the language of (N, +) such that, for any m ∈ N^n, we have (N, +) ⊨ λ(m) ⟺ m ∈ U.
2. By Lemma 2, the set {(m, n) ∈ N^n × N^n | g(m) ⊑ g(n)} is semilinear. From this semilinear set, we obtain a first-order formula σ(x, y) in the language of (N, +) such that (N, +) ⊨ σ(m, n) ⟺ g(m) ⊑ g(n).
3. For any regular language K ⊆ Σ* the set {m ∈ N^n | g(m) ∈ K} ⊆ N^n is effectively semilinear by Lemma 1. From this semilinear set, we can compute a first-order formula κ_K(x) in the language of (N, +) such that (N, +) ⊨ κ_K(m) ⟺ g(m) ∈ K.
We now define, from an FO+MOD-formula ϕ(x_1, ..., x_k) in the language of S, an FO+MOD-formula ϕ′(x_1, ..., x_k) in the language of (N, +) that expresses the same property on the tuples representing the words. The interesting case is ϕ = ∃^{p mod q} x ψ. Intuitively, one is tempted to set ϕ′ = ∃^{p mod q} x : λ(x) ∧ ψ′, but this is not a valid formula since x is not a single variable, but a tuple of variables. To rectify this, we define FO+MOD-formulas α_p^k for p ∈ [0, q − 1] and k ∈ [0, n − 1] as follows: By induction, one obtains Recall that g maps the tuples satisfying λ bijectively onto L. Hence, the above is equivalent to |{w ∈ L | ∃m_{k+1}, m_{k+2}, ..., m_n ∈ N : w = w_1^{m_1} ⋯ w_n^{m_n} ∈ L and S ⊨ ψ(w)}| ∈ p + qN.
Setting ϕ′ = α_p^0 therefore solves the problem. Consequently, any sentence ϕ from FO+MOD in the language of S is translated into an equivalent sentence ϕ′ in the language of (N, +). By [1,18,4], validity of the sentence ϕ′ in (N, +) is decidable. ◭

The C+MOD²-theory with regular predicates
By [11], the FO²-theory of (Σ*, ⊑, (L)_{L regular}) is decidable. This two-variable fragment of first-order logic has a restricted expressive power since, e.g., the following two properties cannot be expressed: that one word is a cover of another, and that a regular language L contains at least three words. To make the first property accessible, we add the cover relation to the structure. The logic C+MOD² allows one to express the second property with only two variables by ∃^{≥3} x : x ∈ L (in addition, it can express that the regular language L contains an even number of elements, which is not expressible in first-order logic at all). It is the aim of this section to show that the C+MOD²-theory of the structure (L, ⊑, ⊏·, (K ∩ L)_{K regular}, (w)_{w∈L}) is decidable for every regular language L. This decidability proof extends the proof from [11] for the decidability of the FO²-theory of (Σ*, ⊑, (L)_{L regular}). That proof provides a quantifier-elimination procedure that relies on two facts, namely
1. that the class of regular languages is closed under images under rational relations and
2. that the proper subword relation and the incomparability relation are rational.
Similarly, our more general result also provides a quantifier-elimination procedure that relies on the following extensions of these two properties:
1. The class of regular languages is closed under counting images under unambiguous rational relations (Section 4.2) and
2. the proper subword, the cover, and the incomparability relation are unambiguous rational (Section 4.1).
The actual quantifier elimination is then presented in Section 4.3.

Unambiguous rational relations
Recall that, by Nivat's theorem [2], a relation R ⊆ Σ* × Σ* is rational if there exist an alphabet Γ, a homomorphism h : Γ* → Σ* × Σ*, and a regular language S ⊆ Γ* such that h maps S surjectively onto R. We call R an unambiguous rational relation if, in addition, h maps S injectively (and therefore bijectively) onto R.
Note that the intersection R_1 ∩ R_2 is not even rational, while the union R = R_1 ∪ R_2 is rational (since the union of rational relations is always rational) [2]. But this union is not unambiguous rational: If it were unambiguous rational, then the set {u ∈ a*ba* | ∃^{≥2} v : (u, v) ∈ R} = {a^m b a^n | m = n} would be regular by Prop. 14 below.
Proof. There are disjoint alphabets Γ 1 and Γ 2 , regular languages S i ⊆ Γ i and homomorphisms for a ∈ Γ. Then h maps the regular language S bijectively onto R 1 ∪ R 2 . ◭ ◮ Lemma 7. For any alphabet Σ, the cover relation ⊏ · and the relation ⊏ \⊏ · are unambiguous rational.
Since the factorization of v is unique, we have that proj maps Sub bijectively onto the subword relation ⊑.
Let S denote the intersection of Sub with Σ 2 Σ 1 * Σ 2 Σ 2 Σ 1 * , i.e., the regular language of words from Sub with precisely one more occurrence of letters from Σ 2 than from Σ 1 . Then S is mapped bijectively onto the relation ⊏ ·, hence this relation is unambiguous rational.
Similarly, let S ′ denote the regular language of all words from Sub with at least two more occurrences of letters from Σ 2 than from Σ 1 . It is mapped bijectively onto the relation ⊏ \⊏ ·. Hence this relation is unambiguous rational. ◭ ◮ Lemma 8. For any alphabet Σ, the incomparability relation is unambiguous rational.
Proof. Note that the set is the disjoint union of the following three relations: As in the previous proof, let Σ i = Σ × {i} and Γ = Σ 1 ∪ Σ 2 . Furthermore, let the homomorphism proj i : Γ * → Σ * be defined by proj i (a, i) = a and proj i (a, 3 − i) = ε for all a ∈ Σ.
We prove that the relations R 1 , R 2 , and R 3 are all unambiguous rational. From Lemma 6, we then get that R is unambiguous rational since it is the disjoint union of these three relations.
We start with the simple case: R 2 . Consider the regular language This is the set of sequences of words of the form (a, 1)(b, 2) such that, at least once, a = b. Hence, proj maps the regular language Inc 2 bijectively onto R 2 . Next, we handle the relation R 1 ∪ R 2 . Correcting [11, Lemma 5.2] slightly, we learn that (u, v) ∈ R 1 ∪ R 2 if, and only if, the number ℓ, the letters a 1 , a 2 , . . . , a ℓ , and the words u ′ and v ′ are unique. Define By the above characterisation of R 1 ∪ R 2 , the homomorphism proj maps Inc 1,2 bijectively onto R 1 ∪ R 2 .
Since the class of unambiguous rational relations is closed under inverses, also R 3 = R −1 1 is unambiguous rational. ◭

Closure properties of the class of regular languages
Let R ⊆ Σ* × Σ* be an unambiguous rational relation and L ⊆ Σ* a regular language. We want to show that the languages of all words u ∈ Σ* with ∃^{≥k} v ∈ L : (u, v) ∈ R and with ∃^{p mod q} v ∈ L : (u, v) ∈ R, respectively, are effectively regular for all k ∈ N and all 0 ≤ p < q. For these proofs, we need the following classical concepts. Let S be a semiring. A function r : Σ* → S is realizable over S if there are n ∈ N, λ ∈ S^{1×n}, a homomorphism µ : Σ* → S^{n×n}, and ν ∈ S^{n×1} with r(w) = λ · µ(w) · ν for all w ∈ Σ*. The triple (λ, µ, ν) is a presentation or a weighted automaton for r.
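To make the notion of a presentation concrete, the following Python sketch evaluates r(w) = λ · µ(w) · ν over the natural numbers extended by ∞, with the convention 0 · ∞ = 0. The function names and the example presentation (which counts the occurrences of the letter a) are assumptions made for illustration only; they do not appear in the paper.

```python
import math

INF = math.inf  # stands for the additional element "infinity"


def mul(x, y):
    """Multiplication with the convention 0 * infinity = 0."""
    return 0 if x == 0 or y == 0 else x * y


def mat_mul(A, B):
    """Matrix product over the semiring (N with infinity, +, *)."""
    return [[sum(mul(A[i][k], B[k][j]) for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def evaluate(lam, mu, nu, w):
    """Value lam * mu(w) * nu, where mu maps each letter to an n x n matrix."""
    M = identity(len(nu))
    for letter in w:
        M = mat_mul(M, mu[letter])
    row = mat_mul([lam], M)                       # 1 x n row vector
    return mat_mul(row, [[x] for x in nu])[0][0]  # times the n x 1 column


# Hypothetical two-state presentation whose value on w is the number of a's in w.
lam = [1, 0]
nu = [0, 1]
mu = {"a": [[1, 1], [0, 1]], "b": [[1, 0], [0, 1]]}
```

In this example, the matrix µ(a)^k has the entry k in its upper right corner, so `evaluate(lam, mu, nu, w)` returns the number of occurrences of a in w.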
In the following, we consider the semiring N_∞, i.e., the set N ∪ {∞} together with the commutative operations + and · (with x + ∞ = ∞ for all x ∈ N ∪ {∞}, x · ∞ = ∞ for all x ∈ (N ∪ {∞}) \ {0}, and 0 · ∞ = 0). On this set, we define (in a natural way) an infinite sum Σ_{i∈I} x_i for any family (x_i)_{i∈I} with entries in N_∞. Our first aim in this section is to prove the following.
Footnote 1. In the literature, a realizable function is often called recognizable formal power series. Since, in this paper, we will not encounter any operations on formal power series (like addition, Cauchy product etc.), we use the (in this context) more intuitive notion of a "realizable function".
◮ Proposition 10. Let Γ and Σ be alphabets, f : Γ* → Σ* a homomorphism, and χ : Γ* → N_∞ a realizable function over N_∞. Then the function r : Σ* → N_∞ with r(u) = Σ_{w ∈ f^{-1}(u)} χ(w) is effectively realizable over N_∞.
For σ ∈ Σ ∪ {ε}, let Since f is non-expanding, Γ is the disjoint union of these subalphabets. Furthermore, let M ∈ (N ∞ ) n×n be the matrix defined by for all i, j ∈ [1, n].
To define a presentation for the function r, we first define a homomorphism µ′ : Σ* → (N_∞)^{n×n} by for all a ∈ Σ. Setting λ′ = λ · M and ν′ = ν defines the presentation (λ′, µ′, ν′) of dimension n. Now let u = a_1 a_2 ... a_m ∈ Σ* with a_i ∈ Σ for all 1 ≤ i ≤ m. Then we get Hence, (λ′, µ′, ν′) is a presentation for the function r, i.e., r is realizable. It remains to be shown that the presentation (λ′, µ′, ν′) is computable from the presentation (λ, µ, ν) and the homomorphism f. For this, it suffices to construct the matrix M effectively, i.e., to compute the infinite sum in Eq. (5). Using a pumping argument, one first shows the equivalence of the following two statements for all i, j ∈ {1, 2, ..., n}:
(a) There are infinitely many words w ∈ Γ_ε^* with µ(w)_{ij} > 0.
(b) There is a word w ∈ Γ_ε^* with n < |w| ≤ 2n and µ(w)_{ij} > 0.
Since statement (b) is decidable, we can evaluate Eq. (5) effectively. ◭
◮ Lemma 12. Let Γ and Σ be alphabets, f : Γ* → Σ* a non-erasing homomorphism, and χ : Γ* → N_∞ a realizable function over N_∞. Then the function
Proof. Since χ is realizable, it can be constructed from functions s : Γ* → N_∞ with s(w) ≠ 0 for at most one w ∈ Γ* using addition, Cauchy product, and iteration applied to functions t with t(ε) = 0 [17, Theorem 3.11]. Replacing, in this construction, the basic function s by yields a construction of r. Since f is non-erasing, also in this construction, iteration is only applied to functions t with t(ε) = 0. Hence, by [17, Theorem 3.1] again, r is realizable.
Analysing the proof of that theorem, one even obtains that a presentation for r can be computed from f and a presentation of χ. is effectively realizable. By Lemma 12, also Let R ⊆ Σ * × Σ * be an unambiguous rational relation and L ⊆ Σ * be regular. Then the function is effectively realizable.

Proof. Since R is unambiguous rational, there are an alphabet Γ, homomorphisms f, g : Γ* → Σ*, and a regular language S ⊆ Γ* such that the map (f, g) : w ↦ (f(w), g(w)) maps S bijectively onto the relation R. Let S_L = S ∩ g^{-1}(L). Since L is regular, the language S_L is effectively regular. Furthermore, (f, g) maps S_L bijectively onto R ∩ (Σ* × L). Since S_L is regular, the characteristic function χ : Γ* → N_∞ of S_L is effectively realizable over N_∞. By Proposition 10, also the function r′ : Σ* → N_∞ with r′(u) = Σ_{w ∈ f^{-1}(u)} χ(w) is effectively realizable over N_∞. Note that, for u ∈ Σ*, we get r′(u) = |{w ∈ S_L | f(w) = u}| = |{v ∈ L | (u, v) ∈ R}|. In other words, the effectively realizable function r′ is the function r from the statement of the lemma. ◭
◮ Proposition 14. Let R ⊆ Σ* × Σ* be an unambiguous rational relation and L ⊆ Σ* be regular.

Quantifier elimination for C+MOD 2
Our decision procedure employs a quantifier elimination procedure, i.e., we will transform an arbitrary formula into an equivalent one that is quantifier-free. As usual, the heart of this procedure handles formulas ψ = Qy ϕ where Q is a quantifier and ϕ is quantifier-free. Since the logic C+MOD² has only two variables, any such formula ψ has at most one free variable. In other words, it defines a language K. The following lemma shows that this language is effectively regular, such that ψ is equivalent to the quantifier-free formula x ∈ K.
◮ Lemma 15 . Let ϕ(x, y) be a quantifier-free formula from C+MOD 2 . Then the sets {x ∈ Σ * | S |= ∃ ≥k y ϕ} and {x ∈ Σ * | S |= ∃ p mod q y ϕ} are effectively regular for all k ∈ N and all p, q ∈ N with p < q.
Proof. Without changing the meaning of the formula ϕ, we can do the following replacements of atomic formulas: x = y can be replaced by x ⊑ y ∧ y ⊑ x, x ⊑ x and y ⊑ y by x ∈ Σ * , and x ⊏ · x and y ⊏ · y by x ∈ ∅. Since ϕ is quantifier-free, we can therefore assume that it is a Boolean combination of formulas of the form x ∈ K for some regular language K, y ∈ L for some regular language L, x ⊑ y, y ⊑ x, x ⊏ · y, and y ⊏ · x.
We define the following formulas θ i (x, y) for 1 ≤ i ≤ 6: Note that any pair of words x and y satisfies precisely one of these six formulas. Hence ϕ is equivalent to In this formula, any occurrence of ϕ appears in conjunction with precisely one of the formulas θ i . Depending on this formula θ i , we can simplify ϕ to ϕ i by replacing the atomic subformulas that compare x and y as follows: If i ∈ {1, 2, 3}, we replace x ⊑ y by the valid formula ⊤ = (x ∈ Σ * ). If i ∈ {1, 4, 5}, we replace y ⊑ x by ⊤.
If i = 2, we replace x ⊏ · y by ⊤.
If i = 4, we replace y ⊏ · x by ⊤.
All remaining comparisons are replaced by ⊥ = (x ∈ ∅). As a result, the formula ϕ is equivalent to where the formulas ϕ_i are Boolean combinations of formulas of the form x ∈ K and y ∈ L for some regular languages K and L. Now let k ∈ N. Since the formulas θ_i are mutually exclusive (i.e., where the disjunction (*) extends over all tuples (k_1, ..., k_6) of natural numbers with Hence it suffices to show that is effectively regular for all 1 ≤ i ≤ 6, all k ∈ N, and all Boolean combinations ϕ of formulas of the form x ∈ K and y ∈ L where K and L are regular languages. Since the class of regular languages is closed under Boolean operations, we can find regular languages K_i and L_i such that ϕ is equivalent to Note that this formula is equivalent to Since this disjunction is exclusive (i.e., any pair of words (x, y) satisfies at most one of the cases), the set from (6) equals the union of the sets for M ⊆ {1, 2, ..., n}. Observe that for k = 0, this set equals Σ* and we are done. So let us assume k ≥ 1 from now on. Note that in that case, the set from (7) equals In any case, it is effectively regular by Prop. 14, Lemma 7, and Lemma 8. Since the language from the claim of the lemma is a Boolean combination of such languages, the first claim is demonstrated.
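The six-way case distinction used in this proof can be made concrete. The sketch below is a reconstruction under assumptions: the six cases are read off from the replacement rules above (θ_1: x = y, θ_2: x ⊏· y, θ_3: x ⊏ y but not x ⊏· y, θ_4: y ⊏· x, θ_5: y ⊏ x but not y ⊏· x, θ_6: x and y incomparable), and the function names are hypothetical. The code checks on all short words over {a, b} that every pair satisfies exactly one case.

```python
from itertools import product


def is_subword(u: str, v: str) -> bool:
    """Return True iff u is a (scattered) subword of v."""
    remaining = iter(v)
    return all(letter in remaining for letter in u)


def thetas(x: str, y: str) -> list:
    """Truth values of the six assumed case formulas theta_1, ..., theta_6."""
    sub_xy, sub_yx = is_subword(x, y), is_subword(y, x)
    cov_xy = sub_xy and len(x) + 1 == len(y)  # y covers x
    cov_yx = sub_yx and len(y) + 1 == len(x)  # x covers y
    return [
        sub_xy and sub_yx,                     # theta_1: x = y
        cov_xy,                                # theta_2: y covers x
        sub_xy and not sub_yx and not cov_xy,  # theta_3: proper subword, no cover
        cov_yx,                                # theta_4: x covers y
        sub_yx and not sub_xy and not cov_yx,  # theta_5: proper superword, no cover
        not sub_xy and not sub_yx,             # theta_6: incomparable
    ]


def case_of(x: str, y: str) -> int:
    """Index i in 1..6 of the unique case satisfied by the pair (x, y)."""
    return thetas(x, y).index(True) + 1


# All words over {a, b} of length at most 3, used for an exhaustive check.
words = ["".join(t) for n in range(4) for t in product("ab", repeat=n)]
```

Because the cases are mutually exclusive and exhaustive, the witnesses y of a formula split into six disjoint groups, which is what allows a threshold k to be distributed over tuples (k_1, ..., k_6).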
To also demonstrate the regularity of the second language, let p, q ∈ N with p < q. Then ∃ p mod q y ϕ is equivalent to the disjunction of all formulas of the form Proof. Let ϕ ∈ C+MOD 2 be a sentence. By the previous theorem, the set {x ∈ L | S |= ϕ} is regular. Hence ϕ holds iff this set is nonempty, which is decidable. ◭

The Σ_1-theory
Let L be regular and bounded. Then, by Theorem 4, we obtain in particular that the Σ_2-theory of (L, ⊑) is decidable. Note that the regular language L = {a, b}* is not covered by this result since it is unbounded. And, indeed, the Σ_2-theory of ({a, b}*, ⊑) is undecidable [5].
On the positive side, we know that the Σ 1 -theory of ({a, b} * , ⊑) is decidable [15]. In this section, we generalize this positive result to arbitrary regular languages, i.e., we prove the following result: The proof for the case L = {a, b} * in [15] essentially relies on the fact that each order (N k , ≤), and thus every finite partial order, embeds into ({a, b} * , ⊑).
In the general case here, the situation is more involved. Take, for example, L = {ab, ba}*. Then, orders as simple as (N², ≤) do not embed into (L, ⊑): This is because the downward closure of any infinite subset of L contains all of L, but N² contains a downwards closed infinite chain. Nevertheless, we will show, perhaps surprisingly, that every finite partial order embeds into (L, ⊑). In fact, this holds whenever L is an unbounded regular language. The latter requires two propositions that we shall prove only later. Recall that a word w ∈ Σ⁺ is called primitive if there is no r ∈ Σ⁺ with w ∈ rr⁺.
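Primitivity of a concrete word is easy to test. The following Python sketch uses the classical fact that a nonempty word w is a proper power of a shorter word exactly if w occurs inside ww at a position strictly between 0 and |w|; the function name is hypothetical.

```python
def is_primitive(w: str) -> bool:
    """Return True iff the nonempty word w is not of the form r^k with k >= 2."""
    if not w:
        return False  # primitive words are nonempty by definition
    # Searching from index 1, the word w always occurs in ww at index len(w);
    # an earlier occurrence exists exactly if w is a proper power.
    return (w + w).find(w, 1) == len(w)
```

For instance, ab and aab are primitive, whereas abab = (ab)² and aa = a² are not.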
Proof of Theorem 18. By Theorem 4, we may assume that L is unbounded.
Hence ϕ = ∃x 1 , x 2 , . . . , x n : ψ with ψ quantifier-free holds in (L, ⊑) iff it holds in some finite partial order whose size can be bounded by n. Since there are only finitely many such partial orders, the result follows. ◭ The first proposition used in the above proof deals with the existence of certain primitive words for every unbounded regular language.
Proof. Since L is unbounded and regular, there are words x, y, p, q ∈ Σ* with |p| = |q|, p ≠ q, and x{p, q}*y ⊆ L. Set r = pq and s = pp.
Then |r| = |s| and x{r, s}*y ⊆ x{p, q}*y ⊆ L. Suppose r and s are conjugate. Since s = p², this implies r = yxyx with p = yx, i.e., r is the square of some word yx of length |p| = |q|. But this contradicts r = pq and p ≠ q. Hence r and s are not conjugate.
Next let n = |r|, u = rs^{n−1} and v = s^n. By contradiction, we show that uv is primitive.
Since we assume uv = rs^{2n−1} not to be primitive, there is a word w ∈ Σ* with rs^{2n−1} ∈ ww⁺. Observe that there is a t ∈ N such that n ≤ |w^t| ≤ n²: If |w| ≥ n, we can choose t = 1 since |w| ≤ ½ |rs^{2n−1}| = n², and if |w| < n, we can take t = n. Observe that r and w^t are prefixes of uv = rs^{2n−1} of length n and ≥ n, respectively. Hence r is a prefix of w^t.
On the other hand, v = s^n and w^t are suffixes of uv of length n² and ≤ n², respectively. Hence w^t is a suffix of v = s^n.
Taking these two facts together, we obtain that r is a factor of s^n. Since r and s are not conjugate, this implies pq = r = s = pp which contradicts p ≠ q. ◭
The second proposition used above talks about the embeddability of every finite partial order into certain regular languages of the form {u, v}* where the words u and v originate from the previous proposition. The proof of this embeddability requires a good deal of preparation that deals with the combinatorics of subwords, more precisely with the properties of "prefix-maximal subwords". Let x = a_1 a_2 ... a_m and y = b_1 b_2 ... b_n with a_i, b_j ∈ Σ. An embedding of x into y is a mapping α : {1, 2, ..., m} → {1, 2, ..., n} with a_i = b_{α(i)} and i < j ⟺ α(i) < α(j) for all i, j ∈ {1, 2, ..., m}. Note that x ⊑ y iff there exists an embedding of x into y. This embedding is called initial if α(1) = 1, i.e., if the left-most position in x hits the left-most position in y. Symmetrically, the embedding α is terminal if α(m) = n, i.e., if the right-most position in x hits the right-most position in y.
We write $x \hookrightarrow y$ if $x \sqsubseteq y$ and every embedding of $x$ into $y$ is terminal. This is equivalent to saying that $x$, but no word $xa$ with $a \in \Sigma$, is a subword of $y$. In other words, $x \hookrightarrow y$ if $x$ is a prefix-maximal subword of $y$.

◮ Lemma 20. Let $w$ be primitive and $n > |w|$. Then, every embedding of $w^n$ into $w^{n+1}$ is either initial or terminal.
Proof. Let $\alpha$ be an embedding of $w^n$ into $w^{n+1}$ that is neither initial nor terminal. Consider the $n$ copies of $w$ in the word $w^n$. We call such a copy gapless if its image in $w^{n+1}$ under $\alpha$ is contiguous. Since the length difference between $w^n$ and $w^{n+1}$ is only $|w| < n$, there has to be at least one gapless copy of $w$, say the $i$th copy. The image of this copy is a contiguous subword of $w^{n+1}$ that spells $w$ and occurs at some position $i \cdot |w| + j$ with $j \in \{0, \dots, |w|\}$. If $j = 0$, then $\alpha$ is initial and if $j = |w|$, then $\alpha$ is terminal. This means $j \in \{1, \dots, |w| - 1\}$. However, since $w$ is primitive, it can occur as a contiguous subword in $w^{n+1}$ only at positions that are divisible by $|w|$, which is a contradiction. ◭

◮ Lemma 21. The ordering $\hookrightarrow$ is multiplicative: If $x, x', y, y' \in \Sigma^*$ with $x \hookrightarrow y$ and $x' \hookrightarrow y'$, then $xx' \hookrightarrow yy'$.
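Both lemmas can be sanity-checked on small instances. The sketch below is a brute-force experiment, not a proof. It assumes the alphabet {a, b}, uses the characterization "x, but no word xa, is a subword of y" for the prefix-maximal relation, and reads multiplicativity as: x ↪ y and x′ ↪ y′ imply xx′ ↪ yy′. All function names are hypothetical.

```python
from itertools import combinations, product

def is_subword(x: str, y: str) -> bool:
    """True iff x is a (scattered) subword of y."""
    it = iter(y)
    return all(ch in it for ch in x)

def embeddings(x: str, y: str):
    """Yield all embeddings of x into y as tuples of 1-based positions."""
    for pos in combinations(range(1, len(y) + 1), len(x)):
        if all(x[i] == y[p - 1] for i, p in enumerate(pos)):
            yield pos

def lemma20_holds(w: str, n: int) -> bool:
    """Check: every embedding of w^n into w^(n+1) is initial or terminal."""
    x, y = w * n, w * (n + 1)
    return all(a[0] == 1 or a[-1] == len(y) for a in embeddings(x, y))

def prefix_maximal(x: str, y: str, alphabet: str = "ab") -> bool:
    """x is a subword of y, but no one-letter extension xa is."""
    return is_subword(x, y) and not any(is_subword(x + a, y) for a in alphabet)

def multiplicative_up_to(max_len: int, alphabet: str = "ab") -> bool:
    """Check multiplicativity for all words of length <= max_len."""
    words = ["".join(t) for k in range(max_len + 1)
             for t in product(alphabet, repeat=k)]
    pairs = [(x, y) for x in words for y in words
             if prefix_maximal(x, y, alphabet)]
    return all(prefix_maximal(x + x2, y + y2, alphabet)
               for x, y in pairs for x2, y2 in pairs)

if __name__ == "__main__":
    print(lemma20_holds("ab", 3))    # primitive w, n > |w|
    print(lemma20_holds("abab", 5))  # w = (ab)^2 is not primitive
    print(multiplicative_up_to(3))
```

The second call illustrates why primitivity matters in Lemma 20: for w = abab, the word w^5 = (ab)^10 occurs in w^6 = (ab)^12 shifted by two positions, which gives an embedding that is neither initial nor terminal.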
Finally, for claim (iii), suppose $\alpha$ is an embedding of $(uv)^{1+\ell} v (uv)^{n-\ell-2}$ into $v(uv)^n$. Since $v(uv)^n = v(uv)^{1+\ell}(uv)^{n-\ell-1}$, $\alpha$ induces an embedding $\beta$ of $(uv)^{n-\ell-2}$ into $(uv)^{n-\ell-1}$. Again, $\beta$ cannot be initial because otherwise, $\alpha$ would embed $(uv)^{1+\ell} v$ into $v(uv)^{1+\ell}$, but these are distinct words of equal length. Thus, Lemma 20 tells us that $\beta$ must be terminal and hence also $\alpha$.

The $\Sigma_1$-theory with constants
By Theorem 18, the $\Sigma_1$-theory of $(L, \sqsubseteq)$ is decidable for all regular languages $L$. If $L$ is bounded, then even the $\Sigma_1$-theory of $(L, \sqsubseteq, (w)_{w \in L})$ is decidable (Theorem 4). This result does not extend to all regular languages since, e.g., the $\Sigma_1$-theory of $(\Sigma^*, \sqsubseteq, (w)_{w \in \Sigma^*})$ is undecidable [5]. In this section, we present another class of regular languages $L$ (besides the bounded ones) such that $S = (L, \sqsubseteq, (w)_{w \in L})$ has a decidable $\Sigma_1$-theory.
Let $L \subseteq \Sigma^*$ be some language. We say that almost all words from $L$ have a non-negligible number of occurrences of every letter if there exists a positive real number $\varepsilon$ such that for all $a \in \Sigma$ and all but finitely many words $w \in L$, we have $\frac{|w|_a}{|w|} > \varepsilon$.
An example of such a regular language is $\{ab, ba\}^*$ (this class contains all finite languages, is closed under union and concatenation, and is closed under iteration provided every word of the iterated language contains every letter). For $w \in \Sigma^*$, let $w{\uparrow}$ denote the set of superwords of $w$, i.e., the upward closure of $\{w\}$ in $(\Sigma^*, \sqsubseteq)$.
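To illustrate the letter-ratio condition, the following Python sketch computes the smallest relative letter frequency of a word and evaluates it on the example language {ab, ba}*. The helper names `letter_ratio` and `iterate` are hypothetical, and the enumeration only covers words built from a bounded number of blocks, so it illustrates the definition rather than proving anything about the whole language.

```python
from itertools import product

def letter_ratio(w: str, alphabet: str) -> float:
    """Minimum over all letters a of |w|_a / |w| (w must be nonempty)."""
    return min(w.count(a) for a in alphabet) / len(w)

def iterate(blocks, max_blocks: int):
    """Yield all concatenations of at most max_blocks words from `blocks`."""
    for k in range(max_blocks + 1):
        for choice in product(blocks, repeat=k):
            yield "".join(choice)

if __name__ == "__main__":
    # Every nonempty word of {ab, ba}* has as many a's as b's.
    ratios = {letter_ratio(v, "ab") for v in iterate(["ab", "ba"], 4) if v}
    print(ratios)
    # In contrast, the words a^k b have ratio 1/(k+1), which tends to 0.
    print([letter_ratio("a" * k + "b", "ab") for k in range(1, 5)])
```

In {ab, ba}*, each block contributes exactly one a and one b, so any ε below 1/2 witnesses the condition. The language a*b, by contrast, admits no such ε.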
The basic idea is, as in the proof of Theorem 18, to embed every finite partial order into $(L, \sqsubseteq)$. The following lemma refines this embeddability. Furthermore, it shows that $L \setminus w{\uparrow}$ is finite in this case.
◮ Lemma 24. Let $L \subseteq \Sigma^*$ be an unbounded regular language such that almost all words from $L$ have a non-negligible number of occurrences of every letter. Let $w \in \Sigma^*$. Then every finite partial order $(P, \le)$ can be embedded into $(L \cap w{\uparrow}, \sqsubseteq)$. Furthermore, the set $L \setminus w{\uparrow}$ is finite.
Note that $L \cap w{\uparrow}$ is regular, but not necessarily unbounded (it could even be finite). Hence the first claim is not an obvious consequence of Propositions 23 and 19.
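The finiteness claim of Lemma 24 can be observed directly on the example L = {ab, ba}*. The sketch below lists the words of L, up to a chosen number of blocks, that are not superwords of a given w. The function names and the block bound are assumptions of this illustration. For this particular L, a word consisting of at least |w| blocks contains w as a subword, since the i-th block can supply the i-th letter of w; hence only words with fewer than |w| blocks can lie outside the upward closure of w.

```python
from itertools import product

def is_subword(x: str, y: str) -> bool:
    """True iff x is a (scattered) subword of y."""
    it = iter(y)
    return all(ch in it for ch in x)

def iterate(blocks, max_blocks: int):
    """Yield all concatenations of at most max_blocks words from `blocks`."""
    for k in range(max_blocks + 1):
        for choice in product(blocks, repeat=k):
            yield "".join(choice)

def outside_upward_closure(w: str, blocks, max_blocks: int):
    """Words of the iterated language (up to max_blocks blocks) not above w."""
    found = {v for v in iterate(blocks, max_blocks) if not is_subword(w, v)}
    return sorted(found, key=lambda v: (len(v), v))

if __name__ == "__main__":
    print(outside_upward_closure("aab", ["ab", "ba"], 5))
```

Raising the block bound beyond |w| does not add further words to the result, in line with the finiteness of the set difference stated in the lemma.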

Open questions
We did not consider complexity issues. In particular, from [11], we know that the $\mathrm{FO}^2$-theory of the structure $(\Sigma^*, \sqsubseteq, (w)_{w \in \Sigma^*})$ can be decided in elementary time. We are currently working out the details of extending this result to the $\mathrm{C{+}MOD}^2$-theory of the structure $(L, \sqsubseteq, (w)_{w \in L})$ for $L$ regular. We reduced the FO+MOD-theory of the full structure (for $L$ context-free and bounded) to the FO+MOD-theory of $(\mathbb{N}, +)$, which is known to be decidable in elementary time [4]. Unfortunately, our reduction increases the formula size exponentially because it has to handle statements of the form "there is an even number of pairs $(x, y) \in \mathbb{N}^2$ such that ...". It should be checked whether the proof from [4] can be extended to handle such statements in FO+MOD for $(\mathbb{N}, +)$ directly.
Finally, we did not give any new undecidability results. For example, we know that the $\Sigma_1$-theory of $(L, \sqsubseteq, (w)_{w \in L})$ is undecidable for $L = \Sigma^*$ [5] and decidable for $L = \{ab, ba\}^*$ (Theorem 25). To narrow the gap between decidable and undecidable cases, one should find more undecidable cases.