IJCAR 2020: Automated Reasoning pp 3-10

# Efficient Automated Reasoning About Sets and Multisets with Cardinality Constraints

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12166)

## Abstract

When reasoning about container data structures that can hold duplicate elements, multisets are the obvious choice for representing the data structure abstractly. However, the decidability and complexity of constraints on multisets has been much less studied and understood than for constraints on sets. In this presentation, we outline an efficient decision procedure for reasoning about multisets with cardinality constraints. We describe how to translate, in linear time, multisets constraints to constraints in an extension of quantifier-free linear integer arithmetic, which we call LIA*. LIA* extends linear integer arithmetic with unbounded sums over values satisfying a given linear arithmetic formula. We show how to reduce a LIA* formula to an equisatisfiable linear integer arithmetic formula. However, this approach requires an explicit computation of semilinear sets and in practice it scales poorly even on simple benchmarks. We then describe a recent more efficient approach for checking satisfiability of LIA*. The approach is based on the use of under- and over-approximations of LIA* formulas. This way we avoid the space overhead and explicitly computing semilinear sets. Finally, we report on our prototype tool which can efficiently reason about sets and multisets formulas with cardinality constraints.

## Keywords

Multisets Cardinality constraints Linear interger arithmetic

## 1 Introduction

In the verification of container data structures one often needs to reason about sets of objects – for example, abstracting the content of a container data structure as a set. The need for cardinality constraints naturally arises in order to reason about the number of the elements in the data structure. We have all witnessed to the success of the BAPA logic [4, 5] that was, among others, used for verification of distributed algorithms [1].

Similarly, when reasoning about container data structures that can hold duplicate elements, multisets are the obvious choice of an abstraction. Multisets are collections of objects where an element can occur several times. They can be seen as “sets with counting”. Although multisets are interesting mathematical objects that can be widely used in verification, there was no efficient reasoner for multisets and sets with cardinality constraints until recently [6]. Moreover, for a long time it was not known if the logic of multisets with cardinality constraints is even decidable [7]. Nevertheless, researchers have recognized the importance of this logic and they have been studied multisets in combination with other theories.

Zarba [13] investigated decision procedures for quantifier-free multisets but without the cardinality operator. He showed how to reduce a multiset formula to a quantifier-free defining each multiset operation pointwise on the elements of the set. Adding the cardinality operator makes such a reduction impossible.

Lugiez studied multiset constraints in the context of a more general result on multitree automata [7] and proved the decidability of multiset formulas with a weaker form of cardinality operator that counts only distinct elements in a multiset.

### 1.1 Multisets with Cardinality Constraints

In this paper we revive the first decision procedure for multisets with cardinality constraints [9, 10]. We represent multisets (bags) with their characteristic functions. A multiset m is a function $$\mathbb {E}\rightarrow \mathbb {N}$$, where $$\mathbb {E}$$ is the universe used for populating multisets and $$\mathbb {N}$$ is the set of non-negative integers. The value m(e) is the multiplicity (the number of occurrences) of an element e in a multiset m. We assume that the domain $$\mathbb {E}$$ is fixed and finite but of an unknown size. We consider the logic of multisets constraints with the cardinality operator (MAPA), given in Fig. 1. An atom in MAPA is either a multiset comparison, or it is a standard quantifier-free linear integer arithmetic atom, or it is a quantified formula ($$\forall e. \mathsf {F}^\textsf {in}$$), or it is a collecting sum formula. We allow only universal quantification over all elements of $$\mathbb {E}$$. This way we can express, for example, that for a multiset k it holds $$\forall e. k(e)=0 \vee k(e)=1$$ – in other words, k is a set. A collecting sum atom is used to group several formulas involving sums into a single atom. This is needed for the next step of the decision procedure. The sums are used in the definition of the cardinality operator:
$$|m| = \sum \limits _{e \in \mathbb {E}} m(e)$$

Piskac and Kuncak [9] showed that every MAPA formula can be translated to an equisatisfiable $$\mathrm {LIA}^\star$$ formula. The translation is linear and described in [9]. This way reasoning about MAPA formulas reduces to reasoning about $$\mathrm {LIA}^\star$$ formulas.

### 1.2 Reasoning About $$\mathrm {LIA}^\star$$ Formulas

The $$\mathrm {LIA}^\star$$ logic [10] is a standard linear integer arithmetic extended with a new operator: the star operator, which is defined over a set of integer vectors as follows:
\begin{aligned} S^\star&\quad \triangleq \quad&\left\{ \sum _{i = 1}^n \varvec{s}_i \quad \mid \ \forall i. 1 \le i \le n. \ \varvec{s}_i \in S\right\} \end{aligned}
(1)
The result of the star operator applied to set S is a set if all linear additive combinations of vectors from S. Its syntax is given in Fig. 2.

To check a satisfiability of a $$\mathrm {LIA}^\star$$ formula, we use the semilinear set characterization of solutions of integer linear arithmetic formulas.

### Definition 1 (Semilinear sets)

A linear set $$LS(\varvec{a}, B)$$ is defined by an integer vector $$\varvec{a}$$ and a finite set of integer vectors $$B = \{\varvec{b_1},\ldots , \varvec{b_n}\}$$, all of the same dimension, as follows:
\begin{aligned} LS(\varvec{a}, B)&\quad \triangleq \quad&\left\{ \varvec{a} + \sum _{i = 1}^n \lambda _i \varvec{b_i} \quad \mid \quad \bigwedge _{i = 1}^n \lambda _i \ge 0 \right\} \end{aligned}
(2)
Sometimes, as a shorthand, we use $$\varvec{\lambda } B = \sum _{i = 1}^n \lambda _i \varvec{b_i}$$.

A semilinear set $$SLS(ls_1, \ldots , ls_n)$$ is a finite union of linear sets $$ls_1,\ldots ,ls_n$$, i.e., $$SLS(ls_1, \ldots , ls_n) = \bigcup _{i=1}^n ls_i$$.

Ginsburg and Spanier showed (Theorem 1.3 in [3]) that a solution set for every Presburger arithmetic formula is a semilinear set, and we use that result to eliminate the star operator.

### Theorem 1

(Lemmas 2 and 3 in [10]). Given a $$\mathrm {LIA}^\star$$ atom $$\varvec{x_1} \in \{ \varvec{x_2} \mid F_2\}^\star$$, let $$SLS(LS(\varvec{a_1}, B_1),\ldots ,LS(\varvec{a_k}, B_k))$$ be a semilinear set describing the set of the solutions of formula $$F_2$$. The atom $$\varvec{x_1} \in \{ \varvec{x_2} \mid F_2\}^\star$$ is equisatisfiable to the following $$\mathrm {LIA}$$ formula:
\begin{aligned} \exists \mu _1 \ge 0,&\ldots , \mu _k \ge 0, \varvec{\lambda }_1 \ge 0, \ldots \varvec{\lambda }_k \ge 0. \\&\varvec{x_1} = \sum _{i=1}^k \mu _i \varvec{a}_i + \varvec{\lambda }_i B_i \wedge \bigwedge _{i=1}^k (\mu _i = 0 \rightarrow \varvec{\lambda }_i = 0) \end{aligned}

By applying Theorem 1, checking satisfiability of a $$\mathrm {LIA}^\star$$ formula reduces to reasoning about linear integer arithmetic. Note, however, that this approach results in automatically constructing a formula might be really large, depending on the size of a semilinear set. In addition, this approach relies on computing semilinear sets explicitly, both of which make it scale poorly even on simple benchmarks.

## 2 Illustrating Example

We illustrate now how is a decision procedure for MAPA working on the following simple multiset formula: for two multisets X and Y, the size of their disjoint union is the sum of their respective sizes. In other words, we need to prove the validity of the following formula
\begin{aligned} |X \uplus Y | = |X| + |Y| \end{aligned}
As usual, we prove the unsatisfiability of the formula $$|X \uplus Y | \ne |X| + |Y|$$. The first step is to reduce this formula into an equisatisfiable $$\mathrm {LIA}^\star$$ formula. To do that, we perform a sequence of steps that resemble the purification step in the Nelson-Oppen combination procedure [8]. In a nutshell, we introduce a new variable for every non-terminal expression.
We first introduce a multiset variable M defining multiset expression $$X \uplus Y$$ and then we introduce integer variables $$k_1, k_2, k_3$$ for each of the cardinality expressions. This way the formula becomes:
\begin{aligned} k_1 \ne k_2 + k_3 \wedge k_1 = |M| \wedge k_2 = |X| \wedge k_3 = |Y|\wedge M = X \uplus Y \end{aligned}
We next apply the point-wise definitions of the cardinality and $$\uplus$$ operators and we obtain the following formula:
\begin{aligned} k_1 \ne k_2 + k_3 \wedge k_1 = \sum \limits _{e \in \mathbb {E}} M(e) \wedge k_2 = \sum \limits _{e \in \mathbb {E}} X(e) \wedge k_3 = \sum \limits _{e \in \mathbb {E}} Y(e) \qquad \qquad \quad \;\; \\ \wedge \forall e. M(e) = X(e) + Y(e) \end{aligned}
Grouping all the sum expressions together results in the formula:
\begin{aligned} k_1 \ne k_2 + k_3 \wedge (k_1, k_2, k_3) = \sum \limits _{e \in \mathbb {E}} (M(e) , X(e), Y(e)) \wedge \forall e. M(e) = X(e) + Y(e) \end{aligned}
Piskac and Kuncak have shown in [9] that every multiset formula can be reduced to this form. They call it the sum normal form. It consists of three conjuncts. One is a pure $$\mathrm {LIA}$$ formula, the other is the summation and the third part is a universally quantified formula. By applying Theorem 2 from [9], the above MAPA formula is translated into an equisatisfiable $$\mathrm {LIA}^\star$$ formula, where mx and y are non-negative integer variables:
\begin{aligned} k_1 \ne k_2 + k_3 \wedge (k_1, k_2, k_3) \in \{(m , x, y) | m = x + y \}^\star \end{aligned}
To check the satisfiability of this formula, we first need to eliminate the star operator, which is done by computing a semilinear set describing the set of solutions of $$m = x + y$$. In this particular case, the semilinear set is actually a linear set, consisting of the zero vector and two vectors defining linear combinations:
\begin{aligned} \{(m , x, y) | m = x + y \} = LS((0,0,0), \{(1,1,0), (1,0,1)\}) \end{aligned}
Having the semilinear set representation, we can apply Theorem 1. In particular, only one linear set and the zero vector can significantly simplify the corresponding equisatisfiable formula. As the result of applying Theorem 1, we obtain that formula $$(k_1, k_2, k_3) \in \{(m , x, y) | m = x + y \}^\star$$ is equisatisfiable to the formula $$(k_1, k_2, k_3) = \varvec{\lambda } \{(1,1,0), (1,0,1)\} \Leftrightarrow (k_1, k_2, k_3) = \lambda _1 (1,1,0) + \lambda _2 (1,0,1)$$.
This way we have eliminated the star operator from the given $$\mathrm {LIA}^\star$$ formula. It is now reduced to an equisatisfiable linear integer arithmetic formula:
\begin{aligned} k_1 \ne k_2 + k_3 \wedge k_1 = \lambda _1 + \lambda _2 \wedge k_2 = \lambda _1 \wedge k_3 = \lambda _2 \end{aligned}
The resulting $$\mathrm {LIA}$$ formula is unsatisfiable.

## 3 Efficient Reasoning About $$\mathrm {LIA}^\star$$ Formulas

The described decision procedure is sound and complete. However, its crucial component is a computation of semilinear sets. While it is possible to compute Hilbert basis using the z3 [2] SMT solver, to the best of our knowledge there are no efficient tools for computing semilinear sets. Moreover, Pottier [12] showed that a semilinear set might contain an exponential number of vectors. To overcome the explicit computation of semilinear sets, Piskac and Kuncak [10] developed a new decision procedure for $$\mathrm {LIA}^\star$$ which eliminates the star operator from the atom $$\varvec{x_1} \in \{ \varvec{x_2} \mid F\}^\star$$ by showing that $$\varvec{x_1}$$ is a linear combination of $$\mathcal {O} (n^2 \log n)$$ solution vectors of F, where n is the size of the input formula. Although this new decision procedure avoids computing semilinear sets, it instantly produces a very large formula that could not be solved in practice by existing tools, not even for the most simple benchmarks.

Levatich et al. [6] used those insights to develop a new efficient and scalable approach for solving $$\mathrm {LIA}^\star$$ formulas. The approach is based on the use of under- and over-approximations of $$\mathrm {LIA}^\star$$ formulas. This way one avoids the space overhead and explicitly computing semilinear sets.

The key insight of their approach is to construct a solution or a proof of unsatisfiability “on demand”. Given a $$\mathrm {LIA}^\star$$ formula $$F_1 (\varvec{x_1}) \wedge \varvec{x_1} \in \{ \varvec{x_2} \mid F_2 ( \varvec{x_2})\}^\star$$, we first find any solution vector for formula $$F_2$$, let us name it $$\varvec{u_1}$$. We next check if formula $$F_1 (\varvec{x_1}) \wedge \varvec{x_1} = \lambda _1 * \varvec{u_1}$$ is satisfiable. If this is the case, the given $$\mathrm {LIA}^\star$$ formula is satisfiable as well. However, if this is not the case, we cannot conclude anything about the satisifiability of the given $$\mathrm {LIA}^\star$$ formula, so we find a new different solution of formula $$F_2$$, denoted by $$\varvec{u_2}$$: $$F_2(\varvec{u_2}) \wedge \varvec{u_1} \ne \varvec{u_2}$$. Next, we check if the vector $$\varvec{x_1}$$ is a linear combination of those two solution vectors: $$F_1 (\varvec{x_1}) \wedge \varvec{x_1} = \lambda _1 * \varvec{u_1} + \lambda _2 * \varvec{u_2}$$. If this newly constructed formula is satisfiable, so is the original $$\mathrm {LIA}^\star$$ formula, otherwise we repeat the process. This way, by finding and checking solution vectors of $$F_2$$, we construct underapproximations of the set $$\{ \varvec{x_2} \mid F_2 ( \varvec{x_2})\}^\star$$. Moreover, we know that this process will terminate once we check sufficiently many solution vectors, as shown in [10].

However, if the given $$\mathrm {LIA}^\star$$ formula is unsatisfiable, this approach will result in an equally large formula as in [10], and again it does not scale. Therefore, in parallel to finding an under-approximation of the set $$\{ \varvec{x_2} \mid F_2 ( \varvec{x_2})\}^\star$$, we are also constructing a sequence of its over-approximation. The properties, that such an overapproximation should have, are encoded as a set of Constraint Horn Clauses and we use existing solvers to compute them. Such an overapproximation, if exists, is an interpolant that separates two conjuncts in the given $$\mathrm {LIA}^\star$$ formula, proving this way that the formula is unsatisfiable.

Finally, we have implemented the presented decision procedure and the tool is publicly available at https://github.com/mlevatich/sls-reachability. Because there were no MAPA benchmarks available, we had to create our own benchmarks. In addition, we also treated 240 BAPA benchmarks about sets, available in [1], as MAPA benchmarks. While the full report on the empirical results is available in [6], our general assessment is that the presented algorithm is effective on both SAT and UNSAT benchmarks. Our tool solved 83% of benchmarks in less than 50 seconds, and over 75% of those in under 3 seconds. We believe that this tool is the first efficient reasoner for multisets and sets with cardinality constraints.

## 4 Conclusions

The presented work describes a sequence of decision procedures that has lead towards an efficient reasoner for multisets and sets with cardinality constraints. We noticed that some constraints arising in formal verification of protocols and data structures could have been expressed more succinctly and naturally, were they using multisets as the underlying abstract datatype in the specification. Nevertheless, due to the lack of tool support they use sets, resulting in more complex constraints. While there was an older tool for reasoning about multisets with cardinality constraints [11], that tool served mainly as a proof of concept and was evaluated only on a handful of manually written formulas. We here presented a recent tool for reasoning about sets and multisets and we showed empirically that this tool scales well and can easily reason about complex multiset formulas. We hope that this work will lead to a renewed research interest in multisets and encourage their use in software analysis and verification.

## Notes

### Acknowledgments

This presentation is based on the previously published results on reasoning about multisets with cardinality constraints [6, 9, 10, 11]. We sincerely thank the collaborators on these projects: Nikolaj Bjørner, Maxwell Levatich, Viktor Kunčak and Sharon Shoham, without whom this work would not be possible.

## References

1. 1.
Berkovits, I., Lazić, M., Losa, G., Padon, O., Shoham, S.: Verification of threshold-based distributed algorithms by decomposition to decidable logics. In: Dillig, I., Tasiran, S. (eds.) CAV 2019. LNCS, vol. 11562, pp. 245–266. Springer, Cham (2019).
2. 2.
de Moura, L.M., Bjørner, N.: Z3: an efficient SMT solver. In: Ramakrishnan, C.R., Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 337–340. Springer, Heidelberg (2008).
3. 3.
Ginsburg, S., Spanier, E.H.: Semigroups, Presburger formulas, and languages. Pacific J. Math. 16(2), 285–296 (1966)
4. 4.
Kuncak, V., Nguyen, H.H., Rinard, M.C.: An algorithm for deciding BAPA: Boolean algebra with Presburger arithmetic. In: Nieuwenhuis, R. (ed.) CADE 2005. LNCS (LNAI), vol. 3632, pp. 260–277. Springer, Heidelberg (2005).
5. 5.
Kuncak, V., Nguyen, H.H., Rinard, M.C.: Deciding Boolean algebra with Presburger arithmetic. J. Autom. Reason. 36(3), 213–239 (2006)
6. 6.
Levatich, M., Bjørner, N., Piskac, R., Shoham, S.: Solving $${\rm LIA}^\star$$ using approximations. In: Beyer, D., Zufferey, D. (eds.) VMCAI 2020. LNCS, vol. 11990, pp. 360–378. Springer, Cham (2020).
7. 7.
Lugiez, D.: Multitree automata that count. Theor. Comput. Sci. 333(1–2), 225–263 (2005)
8. 8.
Nelson, G., Oppen, D.C.: Fast decision procedures based on congruence closure. J. ACM 27(2), 356–364 (1980)
9. 9.
Piskac, R., Kuncak, V.: Decision procedures for multisets with cardinality constraints. In: Logozzo, F., Peled, D.A., Zuck, L.D. (eds.) VMCAI 2008. LNCS, vol. 4905, pp. 218–232. Springer, Heidelberg (2008).
10. 10.
Piskac, R., Kuncak, V.: Linear arithmetic with stars. In: Gupta, A., Malik, S. (eds.) CAV 2008. LNCS, vol. 5123, pp. 268–280. Springer, Heidelberg (2008).
11. 11.
Piskac, R., Kuncak, V.: MUNCH - automated reasoner for sets and multisets. In: Giesl, J., Hähnle, R. (eds.) IJCAR 2010. LNCS (LNAI), vol. 6173, pp. 149–155. Springer, Heidelberg (2010).
12. 12.
Pottier, L.: Minimal solutions of linear diophantine systems: bounds and algorithms. In: Book, R.V. (ed.) RTA 1991. LNCS, vol. 488, pp. 162–173. Springer, Heidelberg (1991).
13. 13.
Zarba, C.G.: Combining multisets with integers. In: Voronkov, A. (ed.) CADE-18. LNCS (LNAI), vol. 2392, pp. 363–376. Springer, Heidelberg (2002).

© Springer Nature Switzerland AG 2020

## Authors and Affiliations

1. 1.Yale UniversityNew HavenUSA