1 Introduction

A formula \(\phi (\bar{x})\) in some theory \(\mathcal {L}\) is monadically decomposable if it is \(\mathcal {L}\)-equivalent to a Boolean combination of monadic predicates in \(\mathcal {L}\), i.e., to a monadic decomposition of \(\phi \). Monadic decomposability measures how tightly the free variables in \(\phi \) are coupled. For example, \(x = y\) is not monadically decomposable in any (finitary) logic over an infinite domain, but \(x + y \ge 2\) can be decomposed, in Presburger arithmetic over natural numbers, since it can be written as \(x \ge 2 \vee (x \ge 1 \wedge y \ge 1) \vee y \ge 2\).

Veanes et al. [24] initiated the study of monadic decomposability in the setting of Satisfiability Modulo Theories, wherein formulas are required to be quantifier-free. Monadic decomposability has many applications, including symbolic transducers [11] and string analysis [24]. Although the problem was shown to be in general undecidable, a generic semi-algorithm for outputting monadic decompositions (if decomposable) was provided. A termination check could in fact be added if the input formula belongs to a theory for which monadic decomposability is decidable, e.g., linear arithmetic, Tarski’s Real-Closed Field, and the theory of uninterpreted functions. Hitherto, not much is known about the computational complexity of monadic decomposability problems for many first-order theories (in particular, quantifier-free theories), and about practical algorithms. This was an open problem raised by Veanes et al. in [24].

Monadic decomposability is intimately connected to the variable partition problem, first studied by Libkin [19] nearly 20 years ago. In particular, a monadic decomposition gives rise to a partition of the free variables \(\bar{x}\) of a formula \(\phi (\bar{x})\), wherein each part consists of a single variable. More precisely, take a partition \(\varPi = \{Y_1,\ldots ,Y_m\}\) of \(\bar{x}\) into sets \(Y_i\) of variables, with linearizations \(\overline{y}_i\). The formula \(\phi (\bar{x})\) is \(\varPi \)-decomposable (in some theory \(\mathcal {L}\)) if it is \(\mathcal {L}\)-equivalent to a boolean combination of formulas of the form \(\varDelta (\overline{y}_i)\). As suggested in [19], such variadic decompositions of \(\phi (\bar{x})\) have potential applications in optimization of database query processing and quantifier elimination. The author gave a general condition for the decidability of variable independence in first-order theories. This result is unfortunately not easily applicable in the SMT setting for at least two reasons: (i) the full first-order theory might be undecidable (e.g. theory of uninterpreted functions), and (ii) even for a first-order theory that admits decidable monadic decompositions, the complexity of the algorithm obtained from [19] could be too prohibitive for the quantifier-free fragment. One example that epitomizes (ii) is the problem of determining whether a given relation \(R \subseteq (\varSigma ^*)^k\) over strings represented by a regular transducer could be expressed as a boolean combination of monadic predicates. The result of [19] would give a double exponential-time algorithm for monadic decomposability, whereas it was recently shown in [5] to be solvable in polynomial-time (resp. polynomial-space) when the transducer is given as a deterministic (resp. nondeterministic) machine.

Contributions. First, we determine the complexity of deciding monadic decomposability and outputting monadic decompositions (if they exist) for the theory of integer linear arithmetic in the setting of SMT. Our result is summarized in Theorem 1.

Theorem 1 (Monadic Decomposability)

Given a quantifer-free formula \(\phi \) of Presburger Arithmetic, it is coNP-complete to decide if \(\phi \) is monadically decomposable. This is efficiently reducible to unsatisfiability of quantifier-free Presburger formulas. Moreover, if a decomposition exists, it can be constructed in exponential time.

We show a new application of monadic decomposability in integer linear arithmetic for SMT over strings, which is currently a very active research area, e.g., see [1,2,3, 6, 9, 12, 16, 18, 20, 22, 23]. One problem that makes string constraint solving difficult is the presence of additional length constraints, which forces the lengths of the strings in the solutions to satisfy certain linear arithmetic constraints. Whereas satisfiability of string equations with regular constraints is PSPACE-complete (e.g. see [13, 17]), it is a long-standing open problem [7, 14] whether word equations with length constraints are decidable. Length constraints are omnipresent in Kaluza [22], arguably the first serious string constraint benchmarks obtained from real-world JavaScript applications. Using our monadic decomposability solver, we show that 90% of the Kaluza benchmarks are in fact in a decidable fragment of string constraints, since occurring length constraints can be completely removed by means of decomposition.

Next we extend our result to variadic decomposability (cf. [19]).

Theorem 2 (Variadic Decomposability)

It is coNP-complete to decide if \(\phi \) is \(\varPi \)-decomposable, given a quantifer-free formula \(\phi (\bar{x})\) of Presburger Arithmetic and a partition \(\varPi = \{Y_1,\ldots ,Y_n\}\) of \(\bar{x}\). This is efficiently reducible to unsatisfiability of quantifier-free Presburger formulas. Moreover, if a decomposition exists, it can be constructed in exponential time.

We show how this could be applied to quantifier elimination. In particular, we show that if a formula \(\phi (\bar{y}) = \exists \bar{x} . \, \psi (\bar{x},\bar{y})\), where \(\psi \) is quantifier-free, is \(\{X, Y\}\)-decomposable—where \(\bar{x}\) and \(\bar{y}\) are linearizations of the variables in X and Y—then we can compute in exponential time a formula \(\theta (\bar{y})\) such that \(\langle \mathbb {N},+\rangle \models \theta \leftrightarrow \phi \), i.e., avoiding the standard double-exponential blow-up (cf. [25]).

Organization. Preliminaries are in Sect. 2. Results on monadic (resp. variadic) decomposition are in Sect. 3 (resp. Sect. 4) and applications appear in Sect. 5.

2 Preliminaries

2.1 Presburger Syntax

In this paper we study the problem of monadic decomposition for formulas in linear integer arithmetic. All of our results are presented for Presburger arithmetic over natural numbers, but they can be adapted easily to all integers.

Definition 1 (Fragments of Presburger Arithmetic)

A formula \(\phi \) of Presburger arithmetic is a formula of the form \(\mathcal {Q}_1x_1\cdots \mathcal {Q}_nx_n . \, \psi \) where \(\mathcal {Q}_i \in \{\forall ,\exists \}\) and \(\psi \) is a quantifier-free Presburger formula:

$$ \psi := \sum _ia_ix_i \sim b\ |\ ax\equiv _{k} by\ |\ x\equiv _{k} c\ |\ \phi _1 \wedge \phi _2 \ |\ \lnot \phi $$

where \(a_i, a, b\in \mathbb {Z}\), \(k, c\in \mathbb {N}\) with \(0 \le c< k\), variables \(x_i, x, y\) range over \(\mathbb {N}\), and \({\sim } \in \left\{ \le , \ge \right\} \). The operator \(\equiv _{k}\) denotes equality modulo \(k\), i.e., \(s \equiv _{k} t\) whenever \(s - t\) is a multiple of \(k\). Formulas of the shape \(\sum _ia_ix_i \sim b\), \(ax\equiv _{k} by\), or \(x\equiv _{k} c\) are called atoms.

Existential Presburger formulas are formulas of the form \(\exists x_1,\ldots ,x_n . \, \psi \) for some quantifier-free Presburger formula \(\psi \). We let \(\text {QF}(\mathbb {N})\) (resp. \(\exists ^*(\mathbb {N})\)) denote the set of all quantifier-free (resp., existential) Presburger formulas.

Let \(\overline{x}= \left( {x_1, \ldots , x_n}\right) \) be a tuple of integer variables. We write \({f}\mathord {\left( {\overline{x}}\right) } = \sum _ia_ix_i\) for a linear sum over \(\overline{x}\). Let \(\overline{y}= \left( {y_1, \ldots , y_m}\right) \). By slight abuse of notation, we may also write \({\phi }\mathord {\left( {\overline{x}, \overline{y}}\right) }\) to denote a \(\text {QF}(\mathbb {N})\) formula over the variables \(x_1, \ldots , x_n, y_1, \ldots , y_m\).

2.2 Monadic Decomposability

A quantifier-free formula \(\phi \) is called monadic if every atom in \(\phi \) contains at most one variable, and it is called monadically decomposable if \(\phi \) is equivalent to a monadic formula \(\phi '\). In this case, \(\phi '\) is also called a decomposition of \(\phi \). For our main results we use a slightly refined notion of a formula being decomposable:

Definition 2

(Monadically Decomposable on \(x\)). Fix a logic \(\mathcal {L}\) (e.g. \(\text {QF}(\mathbb {N})\) or \(\exists ^*(\mathbb {N})\)). We say a formula \({\phi }\mathord {\left( {x_1, \ldots , x_n}\right) }\) in \(\mathcal {L}\) is monadically decomposable on \(x_i\) whenever

$$ {\phi }\mathord {\left( {x_1, \ldots , x_n}\right) } \equiv \bigvee \limits _{j} {\varDelta _j}\mathord {\left( {x_i}\right) } \wedge {\psi _j}\mathord {\left( {x_1, \ldots , x_{i-1}, x_{i+1}, \ldots , x_n}\right) } $$

for some formulas \(\varDelta _j\) and \(\psi _j\) in \(\mathcal {L}\).

It can be observed that a formula is monadically decomposable if and only if it is monadically decomposable on all variables occurring in the formula (cf. Lemma 1). We expand on this for the variadic case below.

We recall the following characterization of monadic decomposability for formulas \(\phi (x,y)\) with two free variables (cf. [5, 8, 19, 24]), which holds regardless of the theory under consideration. This can be extended easily to formulas with k variables, but is not needed in this paper. Given a formula \(\phi (x,y)\), define the formula \(\sim \) as follows:

$$ x \sim x' := \forall y, y'.\ ( \phi (x,y) \wedge \phi (x',y') \rightarrow (\phi (x',y) \wedge \phi (x,y'))) $$

Proposition 1

The relation \(\sim \) is an equivalence relation. Furthermore, \(\phi (x,y)\) is monadically decomposable iff \(\sim \) has a finite index (i.e. the number of \(\sim \)-equivalence classes is finite).

Using this proposition, it is easy to show that over a structure with an infinite domain (e.g. integer linear arithmetic) the formula \(x=y\) is not monadically decomposable. As was noted already in [19], to check monadic decomposability of a formula \(\phi \) in Presburger Arithmetic in general, we may simply check if there is an upper bound B on the smallest representation of every \(\sim \)-equivalence class, i.e.,

$$\begin{aligned} \exists B. \forall x. \exists x_s.\ ( x_s \le B \wedge x_s \sim x ). \end{aligned}$$

However, to derive tight complexity bounds for checking monadic decomposability, this approach is problematic, since the above characterisation has multiple quantifier alternations. Using known results (e.g. [15]), one would only obtain an upper bound in the weak exponential hierarchy [15], which only admits double-exponential time algorithms.

2.3 Variadic Decomposability

The notion of a variadic decomposition generalises monadic decomposition by considering partitions of the occurring variables.

Definition 3

(\(\varPi \)-Decomposable). Fix a logic \(\mathcal {L}\) (e.g. \(\text {QF}(\mathbb {N})\) or \(\exists ^*(\mathbb {N})\)). Take a formula \({\phi }\mathord {\left( {x_1, \ldots , x_n}\right) }\) in \(\mathcal {L}\) and a partition \(\varPi = \{Y_1,\ldots ,Y_m\}\) of \(x_1, \ldots , x_n\). We say \(\phi \) is \(\varPi \)-decomposable whenever

$$ {\phi }\mathord {\left( {x_1, \ldots , x_n}\right) } \equiv \bigvee \limits _{i} {\varDelta ^1_i}\mathord {\left( {\overline{y}_1}\right) } \wedge \cdots \wedge {\varDelta ^m_i}\mathord {\left( {\overline{y}_m}\right) } $$

for some formulas \(\varDelta ^j_i\) in \(\mathcal {L}\) and linearizations \(\overline{y}_j\) of \(Y_j\).

Observe that a formula \({\phi }\mathord {\left( {x_1, \ldots , x_n}\right) }\) is monadically decomposable on \(x_i\) iff it is \(\varPi \)-decomposable with \(\varPi = \{ \{x_i\}, \{x_1, \ldots , x_{i-1}, x_{i+1}, \ldots , x_n\} \}\). Moreover, we say a formula \(\phi \) over the set of variables X is variadic decomposable on Y whenever it is \(\varPi \)-decomposable with \(\varPi = \{Y, X \setminus Y\}\).

General \(\varPi \)-decompositions can be computed by decomposing on binary partitions \(\{Y, X \setminus Y\}\), which is why we focus on this binary case in the rest of the paper. We argue why this is the case below.

Let a formula \(\phi \) and \(\varPi = \{Y_1, \ldots , Y_m\}\) be given. We can first decompose separately on each \(\{Y_i, Y\}\) where \(Y = Y_1 \cup \cdots \cup Y_{i-1} \cup Y_{i+1} \cup \cdots \cup Y_{m}\). Using the algorithm in Sect. 4 we obtain for each \(i\) a decomposition of a specific form:

$$ \bigvee \limits _{j} {\varDelta ^i_j}\mathord {\left( {\overline{y}_i}\right) } \wedge {\phi }\mathord {\left( {\overline{y}_1, \ldots , \overline{y}_{i-1}, \overline{c}^i_j, \overline{y}_{i+1}, \ldots , \overline{y}_{m}}\right) }. $$

Note, these decompositions can be performed independently using the algorithm in Sect. 4 and the second conjunct of each disjunct is \(\phi \) with \(y_i\) replaced by fixed constants \(\overline{c}_j\). Additionally, each \(\varDelta ^i_j\) is polynomial in size and each \(\overline{c}^i_j\) can be represented with polynomially many bits. We note also that our algorithm ensures that each \(\varDelta ^i_j\) is satisfiable.

Given such decompositions, we can recursively decompose \(\phi \) on \(\varPi \). We first use the above decomposition for \(i= 1\) and obtain

$$ \bigvee \limits _{j} {\varDelta ^1_j}\mathord {\left( {\overline{y}_1}\right) } \wedge {\phi }\mathord {\left( {\overline{c}^1_j, \overline{y}_2, \ldots , \overline{y}_{m}}\right) }. $$

Next, we use the decomposition for \(i= 2\) to decompose the copies of \(\phi \) in the decomposition above. We obtain

$$ \bigvee \limits _{j_1} \left( { {\varDelta ^1_{j_1}}\mathord {\left( {\overline{y}_1}\right) } \wedge \bigvee \limits _{j_2} {\varDelta ^2_{j_2}}\mathord {\left( {\overline{y}_2}\right) } \wedge {\phi }\mathord {\left( {\overline{c}^1_{j_1}, \overline{c}^2_{j_2}, \overline{y}_3, \ldots , \overline{y}_{m}}\right) } }\right) . $$

This process repeats until all \(Y_i\) have been considered. If \(\phi \) is \(\varPi \)-decomposable, we find a decomposition. If \(\phi \) is not \(\varPi \)-decomposable, then it would not be possible to do the independent decompositions for each \(i\). Thus, for \(\varPi = \{Y_1,\ldots , Y_m\}\), we can use variadic decompositions on \(Y_i\) to compute \(\varPi \)-decompositions.

The above algorithm runs in exponential time due both to the exponential size of the decompositions and the branching caused by the disjuncts. If we are only interested in whether a formula is \(\varPi \)-decomposable, it is enough to ask whether it is decomposable on \(Y_i\) for each \(i\). In particular, a formula \(\phi (\bar{x})\) is monadically decomposable iff \(\phi \) is decomposable for each variable \(y \in \bar{x}\). Since the complexity class coNP is closed under intersection, we obtain the following:

Lemma 1

A coNP upper bound for monadic decomposability on a given variable y implies a coNP upper bound for monadic decomposability. Likewise, a coNP upper bound for variadic decomposability on a given subset Y of variables implies a coNP upper bound for \(\varPi \)-decomposability.

2.4 Example

Consider the formula \({\phi }\mathord {\left( {x, y, z}\right) }\) given by \(z= x+ 2 y\wedge z< 5\). This formula is monadically decomposable, which means, it is \(\varPi \)-decomposable for \(\varPi = \{\{x\}, \{y\}, \{z\}\}\).

Our algorithm will first take a decomposition on \(x\) and might obtain \(\bigvee ^4_{i= 0} {\varDelta ^1_i}\mathord {\left( {x}\right) } \wedge {\phi }\mathord {\left( {i, y, z}\right) }\) where \({\varDelta ^1_i}\mathord {\left( {x}\right) } = (x= i)\) and \({\phi }\mathord {\left( {i, y, z}\right) } = (z= i+ 2y) \wedge z< 5\). Next, we use a decomposition on \(y\). For each \({\phi }\mathord {\left( {i, y, z}\right) }\) we substitute \( \bigvee ^{ 2 - \lceil \frac{i}{2} \rceil }_{j= 0} y= j\wedge {\phi }\mathord {\left( {i, j, z}\right) } \), and as the final decomposition we get

$$\begin{aligned} \bigvee \limits ^4_{i= 0} \bigvee \limits ^{ 2 - \lceil \frac{i}{2} \rceil }_{j= 0} x= i\wedge y= j\wedge z= i+ 2 j. \end{aligned}$$

3 Monadic Decomposability

3.1 Lower Bounds

We first show that unsatisfiability of Boolean formulas can be reduced to monadic decomposability of formulas with only two variables, directly implying coNP-hardness:

Lemma 2 (coNP-Hardness)

Deciding whether a formula \(\phi (x,y)\) in \(\text {QF}(\mathbb {N})\) is monadic decomposable is coNP-hard.

Proof

We reduce from unsatisfiability of propositional formulas to monadic decomposability of \(\phi (x,y)\). Take a propositional formula \({S}\mathord {\left( {x_1, \ldots , x_n}\right) }\). Let \(p_1,\ldots ,p_n\) be the first \(n\) primes. Let \(\psi (x)\) be the formula obtained from \(S\) by replacing each occurrence of \(x_i\) by \(x \equiv _{p_i} 0\). Given an assignment \(\nu : \{x_1,\ldots ,x_n\} \rightarrow \{0,1\}\), we let

$$ H_\nu = \{ m \in \mathbb {N}\mid \forall 1 \le i \le n.\ ( m \equiv _{p_i} 0 \leftrightarrow \nu (x_i) = 1) \}. $$

Thanks to the Chinese Remainder Theorem, \(H_\nu \) is non-empty and periodic with period \(p = \prod _{i=1}^np_i\), which implies that \(H_\nu \) is infinite for every \(\nu \). We also have that \(\nu \models S\) iff, for each \(n \in H_\nu \), \(\psi (n)\) is true.

Now define \(\phi (x,y) = (\psi (x) \wedge x=y)\). If \(S\) is unsatisfiable, then \(\psi \) is unsatisfiable and so it is decomposable. Conversely, if \(S\) can be satisfied by some assignment \(\nu \), then \(\phi (m,m)\) is true for all (infinitely many) \(m \in H_\nu \). Since all solutions to \(\phi (x,y)\) imply that \(x=y\), by Proposition 1 we have that \(\phi \) is not monadically decomposable.    \(\square \)

We next provide exponential lower bounds for decompositions in either disjunctive normal form (DNF) or conjunctive normal form (CNF). DNF has been frequently used to represent monadic decompositions by previous papers (e.g. [5, 8, 19]), and it is most suitable for applications in quantifier elimination.

Lemma 3 (Size of Decomposition)

There exists a family \(\{\phi _n(x, y)\}_{n \in \mathbb {N}}\) of formulas in \(\text {QF}(\mathbb {N})\) such that \(\phi _n\) grows linearly in n, while the smallest decomposition on \(x\) in DNF/CNF is exponential in n.

Proof

Consider the formulas \({\phi _n}\mathord {\left( {x, y}\right) } = (x+ y\le 2^n)\). Using a binary encoding of constants, the size of the formulas is linear in n. We show that decompositions in DNF/CNF must be exponential in size.

Disjunctive: Suppose \( {\psi _n}\mathord {\left( {x, y}\right) } = \bigvee _i \psi _i^x(x) \wedge \psi _i^y(y) \) is a monadic decomposition in DNF. Each disjunct \(\psi _i^x(x) \wedge \psi _i^y(y)\), if it is satisfiable at all, has an upper right corner \((x_i, y_i)\) such that \(\psi _i^x(x_i) \wedge \psi _i^y(y_i)\) holds, but \(\psi _i^x(x) \wedge \psi _i^y(y) \Rightarrow x \le x_i \wedge y \le y_i\). This immediately implies that exponentially many disjuncts are needed to cover the exponentially many points on the line \(x + y = 2^n\).

Conjunctive: Suppose \(\psi _n(x, y)\) is a succinct monadic decomposition of \(\phi _n\) in CNF. Since \(\lnot \psi _n(x, y) \equiv 2^n+1 \le x + y \equiv (2^n-x+1)+(2^n-y) \le 2^n\), it follows that \(\lnot \psi _n(2^n - x + 1, 2^n - y) \equiv (2^n-(2^n-x+1)+1)+(2^n-(2^n-y)) \le 2^n \equiv x+y \le 2^n\). Therefore, \(\lnot \psi _n(2^n - x + 1, 2^n - y)\) is a succinct decomposition of \(\phi _n\) in DNF, contradicting the lower bound for DNFs.   \(\square \)

3.2 Upper Bound

We prove Theorem 1. Following Lemma 1, it suffices to show that testing decomposability on a variable \(x\) is in coNP and that a decomposition can be computed in exponential time. Assume without loss of generality that we have \({\phi }\mathord {\left( {x, \overline{y}}\right) }\) where \(\overline{y}= \left( {y_1, \ldots , y_n}\right) \), and that we are decomposing on the first variable x.

We claim that \(\phi \) is monadically decomposable on \(x\) iff

$$ \forall x_1, x_2 \ge B. \forall \overline{y}.~~ {\mathrm {SameDiv}}\mathord {\left( {x_1, x_2, \overline{y}}\right) } \Rightarrow \left( { {\phi }\mathord {\left( {x_1, \overline{y}}\right) } \iff {\phi }\mathord {\left( {x_2, \overline{y}}\right) } }\right) $$

where \(B\) is a bound exponential in the size of \(\phi \) and \(\mathrm {SameDiv}\) is a formula asserting that \(x_1\) and \(x_2\) satisfy the same divisibility constraints. This bound is computable in polynomial time and is described in Sect. 3.4. To define \(\mathrm {SameDiv}\), let \(\mathrm {Divs}\) be the set of all divisibility constraints \(az_1 \equiv _{k} bz_2\) or \(z_1 \equiv _{k} c\) appearing (syntactically) in \(\phi \). Assume without loss of generality that \(x\) always appears on the left-hand side of a divisibility constraint (i.e., in the \(z_1\) position of \(az_1 \equiv _{k} bz_2\)). We then define

$$ {\mathrm {SameDiv}}\mathord {\left( {x_1, x_2, \overline{y}}\right) } ~=~~ \left( {\begin{array}{c} \bigwedge \limits _{ ax \equiv _{k} bz\in \mathrm {Divs}} \left( {ax_1 \equiv _{k} bz}\right) \iff \left( {ax_2 \equiv _{k} bz}\right) \\ \wedge \\ \bigwedge \limits _{ x \equiv _{k} c\in \mathrm {Divs}} \left( {x_1 \equiv _{k} c}\right) \iff \left( {x_2 \equiv _{k} c}\right) \end{array}}\right) . $$

We prove the claim in the following sections and simultaneously show how to construct the decomposition. Once we have established the above, we can test non-decomposability on \(x\) by checking

$$ \exists x_1, x_2 \ge B. \, \exists \overline{y}. ~~ {\mathrm {SameDiv}}\mathord {\left( {x_1, x_2, \overline{y}}\right) } \wedge {\phi }\mathord {\left( {x_1, \overline{y}}\right) } \wedge \lnot {\phi }\mathord {\left( {x_2, \overline{y}}\right) } $$

which is decidable in NP. Thus we obtain a coNP decision procedure because the above formula is polynomial in the size of \(\phi \).

Example. We consider some examples. First consider the formula \(x= y\) that cannot be decomposed on \(x\). Since there are no divisibility constraints, \(\mathrm {SameDiv}\) is simply \( true \). It is straightforward to see that, \(\forall B. \exists x_1, x_2 \ge B. \exists y.~ true \wedge x_1 = y\wedge x_2 \ne y\), for example by setting \(x_1 = B\), \(x_2 = B+ 1\), and \(y= B\).

Now consider the monadically decomposable formula

$$ {\phi }\mathord {\left( {x, y, z}\right) } = x+ 2 y\ge 5 \wedge z< 5 \wedge x\equiv _{2} y. $$

In this case \({\mathrm {SameDiv}}\mathord {\left( {x_1, x_2, y, z}\right) } =~ (x_1 \equiv _{2} y\iff x_2 \equiv _{2} y)\). We can verify

$$ \forall x_1, x_2 \ge B. \forall y, z.~~ {\mathrm {SameDiv}}\mathord {\left( {x_1, x_2, y, z}\right) } \Rightarrow \left( { {\phi }\mathord {\left( {x_1, y, z}\right) } \iff {\phi }\mathord {\left( {x_2, y, z}\right) } }\right) $$

holds, as it will be the case that \(5 < B\) and for all \(x> 5\) the formula \(\phi \) will hold whenever \(x\equiv _{2} y\) holds and \(z< 5\). The precondition \(\mathrm {SameDiv}\) ensures that the if and only if holds. We will construct the decomposition in the next section.

Expanded Divisibility Constraints. Observe that divisibility constraints are always decomposable. In particular, \(az_1 \equiv _{k} bz_2\) is equivalent to a finite disjunction of clauses \(z_1 \equiv _{k'} c\wedge z_2 \equiv _{k'} c\) where \(k'\) and \(c\) are bounded by a multiple of \(a, b\) and \(k\). The expansion is exponential in size, since the values up to \(k'\) have to be enumerated explicitly.

We define \(\mathrm {XDivs}\) be the set of all constraints of the form \(x\equiv _{k} c'\) where \(0 \le c' < k\) and \(x\equiv _{k} c\) appears directly in \(\phi \) or in the expansion of the divisibility constraints of \(\phi \). This set will be used in the next sections.

3.3 Soundness

We show that if

$$ \forall x_1, x_2 \ge B. \forall \overline{y}.~~ {\mathrm {SameDiv}}\mathord {\left( {x_1, x_2, \overline{y}}\right) } \Rightarrow \left( { {\phi }\mathord {\left( {x_1, \overline{y}}\right) } \iff {\phi }\mathord {\left( {x_2, \overline{y}}\right) } }\right) $$

then \(\phi \) is decomposable on \(x\). We do this by constructing the decomposition.

Although there are doubly exponentially many subsets \(D\subseteq \mathrm {XDivs}\), there are only exponentially many maximal consistent subsets. We implicitly restrict \(D\) to such subsets. This is because, for any \(k\), there is no value of \(x\) such that \(x\equiv _{k} c\) and \(x\equiv _{k} c'\) both hold with \(c\ne c'\) but \(c, c' \in \{0, \ldots , k-1\}\). For any maximal consistent set \(D\subseteq \mathrm {XDivs}\), let \(c_{D}\) be the smallest integer greater than or equal to \(B\) satisfying all constraints in \(D\). Note, since \(D\) is maximal, a value that satisfies all constraints in \(D\) also does not satisfy an constraints not in \(D\). The number \(c_{D}\) can be represented using polynomially many bits.

We can now decompose \(\phi \) into

$$ \left( {\begin{array}{c} \left( {x= 0 \wedge {\phi }\mathord {\left( {0, \overline{y}}\right) }}\right) \\ \vee \cdots \vee \\ \left( {x= B- 1 \wedge {\phi }\mathord {\left( {B- 1, \overline{y}}\right) }}\right) \end{array}}\right) \vee \bigvee \limits _{ D\subseteq \mathrm {XDivs}}\left( { x\ge B\wedge \bigwedge \limits _{ x\equiv _{k} c\in D} x\equiv _{k} c\wedge {\phi }\mathord {\left( {c_{D}, \overline{y}}\right) } }\right) . $$

This formula is exponential in the size of \(\phi \) if D only ranges over the maximal consistent subsets of \(\mathrm {XDivs}\). For values of \(x\) less than \(B\), equivalence with the original formula is immediate. For larger values, we use the fact that, from our original assumption, for any values \(x_1\) and \(x_2\) that satisfy the same divisibility constraints, we have \({\phi }\mathord {\left( {x_1, \overline{y}}\right) }\) iff \({\phi }\mathord {\left( {x_2, \overline{y}}\right) }\). Hence, we can substitute the values \(c_{D}\) in these cases.

Example. We return to \({\phi }\mathord {\left( {x, y, z}\right) } = x+ 2 y\ge 5 \wedge z< 5 \wedge x\equiv _{2} y\) and compute the decomposition on \(x\). Assuming \(B\) is odd, the decomposition will be as follows. In our presentation we slightly simplify the formula. Strictly speaking \(x\equiv _{2} y\) should be expanded to \((x\equiv _{2} 0 \wedge y\equiv _{2} 0) \vee (x\equiv _{2} 1 \wedge y\equiv _{2} 1)\). We simplify these to \(y\equiv _{2} 0\) and \(y\equiv _{2} 1\), respectively, when instantiated with concrete values of \(x\).

$$ \begin{array}{c} \left( { x = 0 \wedge \left( {0 + 2 y\ge 5 \wedge z< 5 \wedge y\equiv _{2} 0}\right) }\right) \vee \\ \left( { x = 1 \wedge \left( {1 + 2 y\ge 5 \wedge z< 5 \wedge y\equiv _{2} 1}\right) }\right) \\ \vee \cdots \vee \\ \left( { x = B- 1 \wedge \left( {B- 1 + 2 y\ge 5 \wedge z< 5 \wedge y\equiv _{2} 0}\right) }\right) \vee \\ \left( { \left( {x\equiv _{2} 0 \wedge x\ge B}\right) \wedge \left( {B+ 1 + 2 y\ge 5 \wedge z< 5 \wedge y\equiv _{2} 0}\right) }\right) \vee \\ \left( { \left( {x\equiv _{2} 1 \wedge x\ge B}\right) \wedge \left( {B+ 2 y\ge 5 \wedge z< 5 \wedge y\equiv _{2} 1}\right) }\right) \end{array} $$

3.4 Completeness

We now show that every formula \(\phi \) decomposable on \(x\) satisfies

$$ \forall x_1, x_2 \ge B. \forall \overline{y}.~~ {\mathrm {SameDiv}}\mathord {\left( {x_1, x_2, \overline{y}}\right) } \Rightarrow \left( { {\phi }\mathord {\left( {x_1, \overline{y}}\right) } \iff {\phi }\mathord {\left( {x_2, \overline{y}}\right) } }\right) . $$

We first show that some \(B\) must exist. Once the existence has been established, we can argue that it must be at most exponential in \(\phi \).

Existence of the Bound. If \({\phi }\mathord {\left( {x, \overline{y}}\right) }\) is decomposable on \(x\), then there is an equivalent formula \(\bigvee _{i} {\varDelta _i}\mathord {\left( {x}\right) } \wedge {\psi _i}\mathord {\left( {\overline{y}}\right) }\). It is known that every formula \({\varDelta }\mathord {\left( {x}\right) }\) is satisfied by a finite union of arithmetic progressions \(a+ jb\). Let \(B\) be larger than the largest value of \(a\) in the arithmetic progressions satisfying the \({\varDelta _i}\mathord {\left( {x}\right) }\).

We show when \({\mathrm {SameDiv}}\mathord {\left( {x_1, x_2, \overline{y}}\right) }\) then \({\phi }\mathord {\left( {x_1, \overline{y}}\right) }\) iff \({\phi }\mathord {\left( {x_2, \overline{y}}\right) }\) for all values \(x_1, x_2 \ge B\) and \(\overline{y}\). Assume towards a contradiction that we have values \(x_1, x_2\) and a tuple of values \(\overline{y}\) such that \({\mathrm {SameDiv}}\mathord {\left( {x_1, x_2, \overline{y}}\right) }\) and \({\phi }\mathord {\left( {x_1, \overline{y}}\right) }\), but not \({\phi }\mathord {\left( {x_2, \overline{y}}\right) }\).

Let \(k\) be the product of all \(k'\) appearing in some divisibility constraint \(x \equiv _{k'} c\) in \(\mathrm {XDivs}\). We know that there is some disjunct of the monadic decomposition such that \({\varDelta }\mathord {\left( {x_1}\right) } \wedge {\psi }\mathord {\left( {\overline{y}}\right) }\) holds. Moreover, let \(x_1\) belong to the arithmetic progression \(a+ jb\). Since \(x_1 \ge B> a\) we know that \({\varDelta }\mathord {\left( {x'_1}\right) } \wedge {\psi }\mathord {\left( {\overline{y}}\right) }\) also holds for any \(x'_1 = x_1 + j'bk\). That is, we can pump \(x_1\) by adding a multiple of \(bk\), while staying in the same arithmetic progression and satisfying the same divisibility constraints.

Similarly, let \(d\) be the product of all \(b\) appearing in the (finite number of) arithmetic progressions that define the monadic decomposition of \(\phi \), limited to disjuncts such that \({\psi _i}\mathord {\left( {\overline{y}}\right) }\) holds. Since \({\phi }\mathord {\left( {x_2, \overline{y}}\right) }\) does not hold, then \({\phi }\mathord {\left( {x'_2, \overline{y}}\right) }\) also does not hold for any \(x'_2 = x_2 + jdk\). This means that we can pump \(x_2\) staying outside of the arithmetic progressions defining permissible values of \(x\) for the given values \(\overline{y}\), whilst additionally satisfying the same divisibility constraints.

Now, for each value of \(x'_1\) satisfying \({\phi }\mathord {\left( {x'_1, \overline{y}}\right) }\) we can consider the disjunctive normal form of \(\phi \). By expanding the divisibility constraints, a disjunct becomes a conjunction of terms of the form, where \(f\) represents some linear function on \(\overline{y}\),

  1. 1.

    \(ax+ {f}\mathord {\left( {\overline{y}}\right) } \le c\) or \(ax+ {f}\mathord {\left( {\overline{y}}\right) } \ge c\), or

  2. 2.

    \(y_i\equiv _{k'} c\) or \(x\equiv _{k'} c\).

Since there are infinitely many \(x'_1\), we can choose one disjunct satisfied by infinitely many \(x'_1\). This means that for constraints of the form \(ax+ {f}\mathord {\left( {\overline{y}}\right) } \le c\) or \(ax+ {f}\mathord {\left( {\overline{y}}\right) } \ge c\) with a non-zero \(a\), then \(a\) must be negative or positive respectively (or zero). Otherwise, only a finite number of values of \(x\) would be permitted.

We know that \(x'_2\) and \(\overline{y}\) do not satisfy the disjunct. We argue that this is a contradiction by considering each term in turn. Since there are infinitely many \(x'_2\) we can assume without loss of generality that \(x'_2 > x'_1\).

  1. 1.

    If \(ax+ {f}\mathord {\left( {\overline{y}}\right) } \le c\) (resp. \(ax+ {f}\mathord {\left( {\overline{y}}\right) } \ge c\)) appears and is satisfied by \(x'_1\), then \(a\) must be negative or zero (resp. positive or zero) and \(x'_2\) will also satisfy the atom.

  2. 2.

    Atoms of the form \(y_i\equiv _{k'} c\) do not distinguish values of \(x\) and thus are satisfied for both \(x'_1\) and \(x'_2\). We cannot have \(x'_1 \equiv _{k'} c\) but not \(x'_2 \equiv _{k'} c\) since \(x'_1\) and \(x'_2\) satisfy the same divisibility constraints.

Thus, it cannot be the case that \(x'_1\) satisfies the disjunct, while \(x'_2\) does not. This is our required contradiction. Hence, for all \(x_1, x_2 \ge B\) and \(\overline{y}\) such that \({\mathrm {SameDiv}}\mathord {\left( {x_1, x_2, \overline{y}}\right) }\) it must be the case that \({\phi }\mathord {\left( {x_1, \overline{y}}\right) }\) iff \({\phi }\mathord {\left( {x_2, \overline{y}}\right) }\). We have thus established the existence of a bound \(B\).

Size of the Bound. We now argue that this bound is exponential in the size of \(\phi \), and can thus be encoded in a polynomial number of bits.

Consider the formula that is essentially the negation of our property.

$$ {\chi }\mathord {\left( {x_1, x_2, \overline{y}}\right) } = {\mathrm {SameDiv}}\mathord {\left( {x_1, x_2}\right) } \wedge {\phi }\mathord {\left( {x_1, \overline{y}}\right) } \wedge \lnot {\phi }\mathord {\left( {x_2, \overline{y}}\right) } . $$

There is some computable bound \(B'\) exponential in the size of \(\chi \) (and thus \(\phi \)) such that, if there exists \(x_1, x_2 \ge B'\) and some \(\overline{y}\) such that \({\chi }\mathord {\left( {x_1, x_2, \overline{y}}\right) }\) holds, then there are infinitely many \(x'_1\) and \(x'_2\) such that for some \(\overline{y}'\) we have that \({\chi }\mathord {\left( {x'_1, x'_2, \overline{y}'}\right) }\) holds. An argument for the existence of this bound is given in the full version In short, we first convert the formula above into a disjunction of conjunctions of linear equalities, using a linear number of slack variables to encode inequalities and divisibility constraints. Then, using a result of Chistikov and Haase [10], we set \(B' = 2^{dnm+ 3}\) where \(d\) is the number of bits needed to encode the largest constant in the converted formula (polynomially related to the size of the formula above), \(n\) is the maximum number of linear equalities in any disjunct, and \(m\) is the number of variables (including slack variables).

Now, assume that the smallest \(B\) is larger than \(B'\). That is

$$ \forall x_1, x_2 \ge B. \forall \overline{y}.~~ {\mathrm {SameDiv}}\mathord {\left( {x_1, x_2, \overline{y}}\right) } \Rightarrow \left( { {\phi }\mathord {\left( {x_1, \overline{y}}\right) } \iff {\phi }\mathord {\left( {x_2, \overline{y}}\right) } }\right) $$

holds, but it does not hold that

$$ \forall x_1, x_2 \ge B' . \forall \overline{y}.~~ {\mathrm {SameDiv}}\mathord {\left( {x_1, x_2, \overline{y}}\right) } \Rightarrow \left( { {\phi }\mathord {\left( {x_1, \overline{y}}\right) } \iff {\phi }\mathord {\left( {x_2, \overline{y}}\right) } }\right) $$

This implies there exists some \(x_1, x_2 \ge B'\) and \(\overline{y}\) such that \({\chi }\mathord {\left( {x_1, x_2, \overline{y}}\right) }\) holds. Thus, there are infinitely many such \(x'_1\) and \(x'_2\), contradicting the fact that all \(x'_1, x'_2 \ge B\) do not satisfy the property. Thus, we take \(B'\) as the value of \(B\). It is computable in polynomial time, exponential in size, and representable in a polynomial number of bits.

4 Variadic Decomposability

We consider decomposition along several variables instead of just one. In this section, we assume without loss of generality that \(\phi \) is given in positive normal form and all (in)equalities rearranged into the form \(\sum _ia_ix_i \ge b\). We may use negation \(\lnot \phi \) as a shorthand. We require this form because later we use the set of all linear equations in the DNF of a formula. Since negation alters the linear equations, it is more convenient to assume that negation has already been eliminated.

4.1 \(\varPi \)-Decomposability

As described in Sect. 2.3, we refine the notion of \(\varPi \)-decomposability to separate only a single set \(Y_i\) in \(\varPi = \{Y_1, \ldots , Y_n\}\). Without loss of generality, we assume we are given a formula \({\phi }\mathord {\left( {\overline{x}, \overline{y}}\right) }\) and we separate the variables in \(\overline{x}\) from \(\overline{y}\).

In particular, given a formula \({\phi }\mathord {\left( {\overline{x}, \overline{y}}\right) }\) we aim to decompose the formula into \( {\phi }\mathord {\left( {\overline{x}, \overline{y}}\right) } \equiv \bigvee \limits _{j} {\varDelta _j}\mathord {\left( {\overline{x}}\right) } \wedge {\psi _j}\mathord {\left( {\overline{y}}\right) } \) for some \(\text {QF}(\mathbb {N})\) formulas \(\varDelta _i\) and \(\psi _i\).

4.2 Decomposition

We show that testing whether a given formula \(\phi \) is variadic decomposable on \(\overline{x}\) is in coNP. This proves Theorem 2 as the coNP lower bound follows from the monadic case.

Lemma 4

(Decomposing on \(\overline{x}\)). Given a \(\text {QF}(\mathbb {N})\) formula \({\phi }\mathord {\left( {\overline{x}, \overline{y}}\right) }\) there is a coNP algorithm to decide if \(\phi \) is variadic decomposable on \(\overline{x}\). Moreover, if a decomposition exists, it can be constructed in exponential-time and is exponential in size.

Let \(F\) be the set of all \(f\) such that \({f}\mathord {\left( {\overline{x}}\right) } + {g}\mathord {\left( {\overline{y}}\right) } \ge b\) is a linear inequality appearing in \(\phi \). Our approach will divide the points of \(\overline{x}\) into regions where all points within a region can be paired with the same values of \(\overline{y}\) to satisfy the formula. These regions are given by a bound \(B\). If \({f}\mathord {\left( {\overline{x}}\right) }\) is within the bound, then two points \(\overline{x}_1\) and \(\overline{x}_2\) are in the same region if \({f}\mathord {\left( {\overline{x}_1}\right) } = {f}\mathord {\left( {\overline{x}_2}\right) }\). If two points are outside the bound, then by a pumping argument we can show that we have \({\phi }\mathord {\left( {\overline{x}_1, \overline{y}}\right) }\) iff \({\phi }\mathord {\left( {\overline{x}_2, \overline{y}}\right) }\).

Let \({\hat{r}}= \left( {\mathrm {UB}, \mathrm {EQ}}\right) \) be a partition of \(F\) into unbounded and bounded functions (where \(\mathrm {EQ}\) refers to equality being asserted over bounded functions as shown below). Define for each \({\hat{r}}= \left( {\mathrm {UB}, \mathrm {EQ}}\right) \)

$$ {\mathrm {Region}_{{\hat{r}}}}\mathord {\left( {\overline{x}_1, \overline{x}_2}\right) } \triangleq \left( { \bigwedge \limits _{f\in \mathrm {EQ}} {f}\mathord {\left( {\overline{x}_1}\right) } = {f}\mathord {\left( {\overline{x}_2}\right) } }\right) . $$

Note, this formula intentionally does not say anything about the unbounded functions. This is important when we need to derive a bound—such a derivation cannot use a pre-existing bound.

We also need to extend \(\mathrm {SameDiv}\) to account for \(\overline{x}_1\) and \(\overline{x}_2\) being vectors. This is a straightforward extension asserting that each variable in \(\overline{x}_1\) satisfies the same divisibility constraints as its counterpart in \(\overline{x}_2\). Again, let \(\mathrm {Divs}\) be the set of all divisibility constraints \(az_1 \equiv _{k} bz_2\) appearing (syntactically) in \(\phi \). Let \(x_i\), \(x^1_i\) and \(x^2_i\) denote the \(i\)th variable of \(\overline{x}\), \(\overline{x}_1\), and \(\overline{x}_2\) respectively. Assume without loss of generality that variables in \(\overline{x}\) always either appear on the left-hand side of a divisibility constraint (i.e. in the \(z_1\) position) or on both sides. Define

$$\begin{aligned}&\quad {\mathrm {SameDiv}}\mathord {\left( {x_1, x_2, \overline{y}}\right) } = \\&\bigwedge \limits _{\begin{array}{c} ax_i\equiv _{k} bz\in \mathrm {Divs}, \\ z\ne x_j \end{array}} \left( {\begin{array}{c} \left( {ax^1_i\equiv _{k} bz}\right) \\ \iff \\ \left( {ax^2_i\equiv _{k} bz}\right) \end{array}}\right) \wedge \bigwedge \limits _{\begin{array}{c} ax_i\equiv _{k} bx_j\\ \in \\ \mathrm {Divs} \end{array}} \left( {\begin{array}{c} \left( {ax^1_i\equiv _{k} bx^1_j}\right) \\ \iff \\ \left( {ax^2_i\equiv _{k} bx^2_j}\right) \end{array}}\right) \wedge \bigwedge \limits _{\begin{array}{c} x_i\equiv _{k} c\\ \in \\ \mathrm {Divs} \end{array}}\left( {\begin{array}{c} \left( {x^1_i\equiv _{k} c}\right) \\ \iff \\ \left( {x^2_i\equiv _{k} c}\right) \end{array}}\right) . \end{aligned}$$

Next, we introduce an operator for comparing a vector of variables with a bound. For \(a\in \mathbb {Z}\) let \({\mathrm {abs}}\mathord {\left( {a}\right) }\) denote the absolute value of \(a\). Given a bound \(B\) and some \({\hat{r}}= \left( {\mathrm {UB}, \mathrm {EQ}}\right) \) let

$$ \left( {\overline{x}\ge _{{\hat{r}}} B}\right) \triangleq \bigwedge \limits _{f\in \mathrm {UB}} {\mathrm {abs}}\mathord {\left( {{f}\mathord {\left( {\overline{x}}\right) }}\right) } \ge B\wedge \bigwedge \limits _{f\in \mathrm {EQ}} {\mathrm {abs}}\mathord {\left( {{f}\mathord {\left( {\overline{x}}\right) }}\right) } < B. $$

We claim there is an exponential bound \(B\) such that \(\phi \) is variadic decomposable iff for all \({\hat{r}}\) we have

$$\begin{aligned} \forall \overline{x}_1, \overline{x}_2 \ge _{{\hat{r}}} B.\ \forall \overline{y}.\ \left( {\begin{array}{c} {\mathrm {Region}_{{\hat{r}}}}\mathord {\left( {\overline{x}_1, \overline{x}_2}\right) } \\ \wedge \\ {\mathrm {SameDiv}}\mathord {\left( {\overline{x}_1, \overline{x}_2, \overline{y}}\right) } \end{array}}\right) \Rightarrow \left( {\begin{array}{c} {\phi }\mathord {\left( {\overline{x}_1, \overline{y}}\right) } \\ \iff \\ {\phi }\mathord {\left( {\overline{x}_2, \overline{y}}\right) } \end{array}}\right) \qquad \qquad \qquad \text {(DC-}{\hat{r}}{\text{) }} \end{aligned}$$

Note, unsatisfiability can be tested in NP. First guess \({\hat{r}}\), then guess \(\overline{x}_1, \overline{x}_2, \overline{y}\).

We prove soundness of the claim in the next section. Completeness is an extension of the argument for the monadic case and is given in the full version. In the monadic case, we were able to take some values of \(x_1, x_2 > B\) such that both satisfied the same divisibility constraints, but one value satisfied the formula while the other did not. Since these values were large, we derived an infinite number of such value pairs with increasing values. We then used these growing solutions to show that it was impossible for the value of \(x_1\) to satisfy the formula, while the value of \(x_2\) does not, as they were both beyond the distinguishing power of the linear inequalities. The argument for the variadic case is similar, with the values of \(x_1\) and \(x_2\) being replaced by the values of \({f}\mathord {\left( {\overline{x}_1}\right) }\) and \({f}\mathord {\left( {\overline{x}_2}\right) }\).

4.3 Soundness

Assume there is an exponential bound \(B\) such that for each \({\hat{r}}\), Equation DC-\({\hat{r}}\) holds. We show how to produce a decomposition.

As in the monadic case (Sect. 3.3), let \(\mathrm {XDivs}\) be the set of all constraints of the form \(\overline{x}_i\equiv _{k} c\) in the expansion of the divisibility constraints of \(\phi \). Observe again that there are only exponentially many maximal consistent subsets \(D\subseteq \mathrm {XDivs}\). For each \(D\) fix a vector of values \(\overline{c}_D\) that satisfies all constraints in \(D\) and is encodable in a polynomial number of bits. Furthermore, we define

$$ {\mathrm {Div}_{D}}\mathord {\left( {\overline{z}}\right) } \triangleq \bigwedge \limits _{ x_i\equiv _{k} c\in D} z_i\equiv _{k} c. $$

For each \({\hat{r}}\) and \(D\) we can define an equivalence relation over values of \(\overline{x}\) such that \(\overline{x}\ge _{{\hat{r}}} B\) and \({\mathrm {Div}_{D}}\mathord {\left( {\overline{x}}\right) }\).

$$ \left( {\overline{x}_1 =^{D}_{{\hat{r}}} \overline{x}_2}\right) \triangleq \left( { \overline{x}_1 \ge _{{\hat{r}}} B\wedge \overline{x}_2 \ge _{{\hat{r}}} B\wedge {\mathrm {Region}_{{\hat{r}}}}\mathord {\left( {\overline{x}_1, \overline{x}_2}\right) } \wedge {\mathrm {Div}_{D}}\mathord {\left( {\overline{x}_1}\right) } \wedge {\mathrm {Div}_{D}}\mathord {\left( {\overline{x}_2}\right) } }\right) . $$

Observe each equivalence relation has an exponential number of equivalence classes depending on the values of the bounded \(f\). Let \(C^{D}_{{\hat{r}}}\) be a set of minimal representatives from each equivalence class such that each representative is representable in a polynomial number of bits. These can be computed by solving an existential Presburger constraint for each set of values of the bounded \(f\). In particular, for each \({\hat{r}}= \left( {\mathrm {UB}, \mathrm {EQ}}\right) \) and assignments \({\mathrm {abs}}\mathord {\left( {c_f}\right) } < B\) for each \(f\in \mathrm {EQ}\), we select a solution to the equation

$$ \overline{x}\ge _{{\hat{r}}} B\wedge \bigwedge \limits _{f\in \mathrm {EQ}} {f}\mathord {\left( {\overline{x}}\right) } = c_f\wedge {\mathrm {Div}_{D}}\mathord {\left( {\overline{x}}\right) } $$

if such a solution exists. If no such solution exists, the assignment can be ignored.

The decomposition is

$$ \bigvee \limits _{{\hat{r}}} \bigvee \limits _{D} \bigvee \limits _{\overline{c}\in C^{D}_{{\hat{r}}}} \left( { \overline{x}\ge _{{\hat{r}}} B\wedge {\mathrm {Region}_{{\hat{r}}}}\mathord {\left( {\overline{x}, \overline{c}}\right) } \wedge {\mathrm {Div}_{D}}\mathord {\left( {\overline{x}}\right) } \wedge {\phi }\mathord {\left( {\overline{c}, \overline{y}}\right) } }\right) . $$

The correctness of this decomposition follows from the Equations DC-\({\hat{r}}\). For any values \(\overline{c}_{\overline{x}}\) and \(\overline{c}_{\overline{y}}\) of \(\overline{x}\) and \(\overline{y}\), first assume \({\phi }\mathord {\left( {\overline{c}_{\overline{x}}, \overline{c}_{\overline{y}}}\right) }\) holds. Since there is some disjunct in the decomposition for which it holds that \(\overline{c}_{\overline{x}} \ge _{{\hat{r}}} B\wedge {\mathrm {Region}_{{\hat{r}}}}\mathord {\left( {\overline{c}_{\overline{x}}, \overline{c}}\right) } \wedge {\mathrm {Div}_{D}}\mathord {\left( {\overline{x}}\right) }\) then, by applying Equation DC-\({\hat{r}}\) we get \({\phi }\mathord {\left( {\overline{c}, \overline{c}_{\overline{y}}}\right) }\) as required. Conversely, if some disjunct of the decomposition holds, we can apply Equation DC-\({\hat{r}}\) and obtain \({\phi }\mathord {\left( {\overline{c}_{\overline{x}}, \overline{c}_{\overline{y}}}\right) }\).

5 Applications of Decomposition

5.1 Monadic Decomposition in String Solving

The development of effective techniques for solving string constraints has received a lot of attention over the last years, motivated by applications ranging from program verification [2, 16] and security analysis [22, 23] to the analysis of access policies of cloud services [4]. Strings give rise to a rich theory that may combine, depending on the studied fragment, (i) word equations, i.e., equations over the free monoid generated by some finite (but often large) alphabet, (ii) regular expression constraints, (iii) transduction, i.e., constraints described by finite-state automata with multiple tracks, (iv) conversion functions, e.g. between integer variables and strings encoding numbers in binary or decimal notation, (v) length constraints, i.e., arithmetic constraints on the length of strings.

Table 1. Statistics about the Kaluza benchmarks [22]. It should be noted (and is well-known [18]) that the categories “sat” and “unsat” do not (always) imply the status of the benchmarks, they only represent the way the benchmarks were organised by the Kaluza authors.

The handling of length constraints has turned out to be particularly challenging in this context, both practically and theoretically. Even for the combination of word equations (or even just quadratic word equations) with length constraints, decidability of the (quantifier-free) theory is a long-standing open problem [21]. At the same time, length constraints are quite frequently used in applications; they are needed, for instance, when encoding operations like indexof or substring, or also when splitting a string into the parts separated by some delimiter. In standard benchmark libraries for string constraints, like the Kaluza set [22], benchmarks with length constraints occur in large numbers.

The notion of monadic decomposition is in this setting important, since any monadic length constraint (in Presburger arithmetic) can be reduced to a Boolean combination of regular expression constraints, and is therefore easier to handle than the general case.

Proposition 2

Satisfiability of a quantifier-free formula \(\phi = \phi _{\text {eq}} \wedge \phi _{\text {regex}} \wedge \phi _{\text {len}}\) consisting of word equations, regular expression constraints, and monadically decomposable length constraints is decidable.

Proof

Suppose \(w_1, \ldots , w_n\) are the string variables occurring in \(\phi \), and \(|w_1|, \ldots , |w_n|\) the terms representing their length. A decision procedure can first compute a monadic representation \(\phi '_{\text {len}}\) of \(\phi _{\text {len}}\) over lengths \(|w_1|, \ldots , |w_n|\), and then turn each atom \(\varDelta (|w_i|)\) in \(\phi '_{\text {len}}\) into an equivalent regular membership constraint \(w_i \in \mathcal{L}_\varDelta \). This is possible because the Presburger formula \(\varDelta \) can be represented as a semi-linear set, which can directly be translated to a regular expression. Decidability follows from the decidability of word equations combined with regular expression constraints [13].    \(\square \)

This motivates the use of monadic decomposition as a standard pre-processing step in string solvers, transforming away those length constraints that can be turned into monadic form. To evaluate the effectiveness of such an optimisation, we implemented the decomposition check defined in Sect. 3.2, and used it within the string SMT solver OSTRICH [9] to determine the number of Kaluza benchmarks with monadic decomposable length constraints.Footnote 1 The results are summarised in Table 1:

  • Of altogether 47 284 benchmarks, 21 226 contain the str.len function, and therefore length constraints. This number was determined by a simple textual analysis of the benchmarks.

  • Running our decomposition check in OSTRICH, in 13 205 of the 21 226 cases length constraints were found that could be analysed. The remaining 8 021 problems were proven unsatisfiable without ever reaching the string theory solver in OSTRICH, i.e., as a result of pre-processing the input formula, or because Boolean reasoning discovered obvious inconsistencies in the problems.

  • In 8 492 of the 13 205 cases, all analysed length constraints were found to be monadically decomposable; 4 713 of the benchmarks contained length constraints that could not be decomposed.

This means that 42 571 of the Kaluza benchmarks (slightly more than 90%) do in principle not require support for length constraints in a string solver, either because there are no length constraints, or because length constraints can be decomposed and then turned into regular expression constraints.

Even with a largely unoptimised implementation, the time required to check whether length constraints can be decomposed was negligible in case of the Kaluza benchmarks, with the longest check requiring 2.1 s (on an AMD Opteron 2220 SE machine). The maximum number of variables in a length constraint was 140.

5.2 Variadic Decomposition in Quantifier Elimination

A second natural application of decomposition is quantifier elimination, i.e., the problem of deriving an equivalent quantifier-free formula \(\phi '\) for a given formula \(\phi \) with quantifiers. In Presburger arithmetic, for a formula \(\phi = \exists x_1, \ldots , x_n.\, \psi \) with n quantifiers but no quantifier alternations, quantifier elimination in the worst case causes a doubly-exponential increase in formula size [25].

Variadic decomposition can be used to eliminate quantifiers with a smaller worst-case increase in size, provided that the matrix of a quantifier formula can be decomposed. Suppose \(\phi = \exists \bar{x}.\, \psi (\bar{x}, \bar{y})\) is given and \(\psi \) is variadic decomposable on \(\bar{x}\), i.e.,

$$\begin{aligned} {\psi }\mathord {\left( {\bar{x}, \bar{y}}\right) } ~\equiv ~ \bigvee \limits _{j} {\varDelta _j}\mathord {\left( {\bar{x}}\right) } \wedge {\psi _j}\mathord {\left( {\bar{y}}\right) } \end{aligned}$$

This means that the existential quantifiers can be distributed over the disjunction, and their elimination turns into a simpler satisfiability check:

$$\begin{aligned} \exists \bar{x}.\, {\psi }\mathord {\left( {\bar{x}, \bar{y}}\right) } ~\equiv ~ \bigvee \limits _{j} \exists \bar{x}.\, {\varDelta _j}\mathord {\left( {\bar{x}}\right) } \wedge {\psi _j}\mathord {\left( {\bar{y}}\right) } ~\equiv ~ \bigvee \limits _{j:~ {\varDelta _j}\mathord {\left( {\bar{x}}\right) } \text {~is sat}} {\psi _j}\mathord {\left( {\bar{y}}\right) } \end{aligned}$$

Universal quantifiers can be handled in a similar way by negating the matrix first.

Proposition 3

Take a formula \(\phi (\bar{y}) = \exists \bar{x}.\, \psi (\bar{x}, \bar{y})\) in Presburger arithmetic in which \(\psi \) is quantifier-free and variadic decomposable on \(\bar{x}\). Then there is a quantifier-free formula \(\phi '(\bar{y})\) that is equivalent to \(\phi \) and at most singly-exponentially bigger than \(\phi \).

Checking whether a formula can be decomposed is therefore a simple optimisation that can be added to any quantifier elimination procedure for Presburger arithmetic.

6 Conclusion and Future Work

We have shown that the monadic and variadic decomposability problem for \(\text {QF}(\mathbb {N})\) is coNP-complete. Moreover, when a decomposition exists, it is at most exponential in size and can be computed in exponential time. This formula size is tight for decompositions presented in either disjunctive or conjunctive normal form.

We gave two applications of our results. The first was in string constraint solving. In program analysis, string constraints are often mixed with numerical constraints on the lengths of the strings (for example, via the indexOf function). Length constraints significantly complicate the analysis of strings. However, if the string constraints permit a monadic decomposition, they may be reduced to regular constraints and thus eliminated. We analysed the well-known Kaluza benchmarks and showed that less than 10% of the benchmarks contained length constraints that could not be decomposed.

For the second application, we showed that the doubly exponential blow-up caused by quantifier elimination can be limited to a singly exponential blow up whenever the formula is decomposable on the quantified variables. Thus, variadic decomposition can form an optimisation step in a quantifier elimination algorithm.

Interesting problems are opened up by our results. It would be interesting to study lower bounds for general boolean formulas. If smaller decompositions are possible, they would be useful for applications in string solving.

Second, we may consider variadic decomposition where a partition \(\varPi \) is not given as part of the input. Instead, one must check whether a \(\varPi \)-decomposition exists for some non-trivial \(\varPi \). This variant of the problem has a simple \(\varSigma ^P_2\) algorithm that first guesses some \(\varPi \) and then verifies \(\varPi \)-decomposability. However, the only known lower bound is coNP, which follows the same argument as monadic decomposability. A better algorithm would not improve the worst-case complexity for our quantifier elimination application, but it might provide a way to quickly identify a subset of a block of quantifiers that can be eliminated quickly with \(\varPi \)-decompositions.