# Efficient Automated Reasoning About Sets and Multisets with Cardinality Constraints


## Abstract

When reasoning about container data structures that can hold duplicate elements, multisets are the obvious choice for representing the data structure abstractly. However, the decidability and complexity of constraints on multisets have been much less studied and understood than for constraints on sets. In this presentation, we outline an efficient decision procedure for reasoning about multisets with cardinality constraints. We describe how to translate, in linear time, multiset constraints to constraints in an extension of quantifier-free linear integer arithmetic, which we call LIA*. LIA* extends linear integer arithmetic with unbounded sums over values satisfying a given linear arithmetic formula. We show how to reduce a LIA* formula to an equisatisfiable linear integer arithmetic formula. However, this approach requires an explicit computation of semilinear sets and in practice it scales poorly even on simple benchmarks. We then describe a recent, more efficient approach for checking satisfiability of LIA*. The approach is based on the use of under- and over-approximations of LIA* formulas. This way we avoid the space overhead and the explicit computation of semilinear sets. Finally, we report on our prototype tool, which can efficiently reason about set and multiset formulas with cardinality constraints.

## Keywords

Multisets, Cardinality constraints, Linear integer arithmetic

## 1 Introduction

In the verification of container data structures one often needs to reason about sets of objects – for example, abstracting the content of a container data structure as a set. The need for cardinality constraints naturally arises in order to reason about the number of elements in the data structure. We have all witnessed the success of the BAPA logic [4, 5], which was, among others, used for the verification of distributed algorithms [1].

Similarly, when reasoning about container data structures that can hold duplicate elements, multisets are the obvious choice of an abstraction. Multisets are collections of objects where an element can occur several times. They can be seen as “sets with counting”. Although multisets are interesting mathematical objects that can be widely used in verification, there was no efficient reasoner for multisets and sets with cardinality constraints until recently [6]. Moreover, for a long time it was not known whether the logic of multisets with cardinality constraints is even decidable [7]. Nevertheless, researchers have recognized the importance of this logic and they have studied multisets in combination with other theories.

Zarba [13] investigated decision procedures for quantifier-free multisets but without the cardinality operator. He showed how to reduce a multiset formula to a quantifier-free formula by defining each multiset operation pointwise on the elements of the multiset. Adding the cardinality operator makes such a reduction impossible.

Lugiez studied multiset constraints in the context of a more general result on multitree automata [7] and proved the decidability of multiset formulas with a weaker form of cardinality operator that counts only distinct elements in a multiset.

### 1.1 Multisets with Cardinality Constraints

We represent multisets (also called *bags*) with their characteristic functions. A multiset *m* is a function \(\mathbb {E}\rightarrow \mathbb {N}\), where \(\mathbb {E}\) is the universe used for populating multisets and \(\mathbb {N}\) is the set of non-negative integers. The value *m*(*e*) is the multiplicity (the number of occurrences) of an element *e* in a multiset *m*. We assume that the domain \(\mathbb {E}\) is fixed and finite but of an unknown size. We consider the logic of multiset constraints with the cardinality operator (MAPA), given in Fig. 1. An atom in MAPA is either a multiset comparison, a standard quantifier-free linear integer arithmetic atom, a quantified formula (\(\forall e. \mathsf {F}^\textsf {in}\)), or a collecting sum formula. We allow only universal quantification over all elements of \(\mathbb {E}\). This way we can express, for example, that for a multiset *k* it holds \(\forall e. k(e)=0 \vee k(e)=1\) – in other words, *k* is a set. A collecting sum atom is used to group several formulas involving sums into a single atom. This is needed for the next step of the decision procedure. The sums are used in the definition of the cardinality operator:

\[ |m| = \sum _{e \in \mathbb {E}} m(e) \]
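As a small illustration of the characteristic-function view, the following sketch models a multiset with Python's `collections.Counter`, reading the cardinality as the sum of all multiplicities. The helper names `cardinality`, `is_set` and `disjoint_union` are hypothetical and chosen only for this example; they are not part of MAPA or of any tool discussed here.

```python
from collections import Counter

def cardinality(m: Counter) -> int:
    """|m|: the sum of the multiplicities m(e) over all elements e."""
    return sum(m.values())

def is_set(m: Counter) -> bool:
    """A multiset is a set when every multiplicity is 0 or 1."""
    return all(count in (0, 1) for count in m.values())

def disjoint_union(a: Counter, b: Counter) -> Counter:
    """Pointwise sum of multiplicities (Counter addition)."""
    return a + b

# Two example multisets over the universe {"a", "b"} (illustrative data).
k = Counter({"a": 1, "b": 1})   # every multiplicity is 1, so k is a set
m = Counter({"a": 2, "b": 1})   # "a" occurs twice, so m is not a set

print(cardinality(m), is_set(k), is_set(m))
print(cardinality(disjoint_union(k, m)))
```

A missing key in a `Counter` has multiplicity 0, which matches the convention that *m*(*e*) = 0 for elements that do not occur in *m*.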

Piskac and Kuncak [9] showed that every MAPA formula can be translated to an equisatisfiable \(\mathrm {LIA}^\star \) formula. The translation is linear and described in [9]. This way reasoning about MAPA formulas reduces to reasoning about \(\mathrm {LIA}^\star \) formulas.

### 1.2 Reasoning About \(\mathrm {LIA}^\star \) Formulas

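\(\mathrm {LIA}^\star \) extends linear integer arithmetic with the star operator: for a set of integer vectors *S*, the set \(S^\star \) is the set of all linear additive combinations of vectors from *S*. The syntax of \(\mathrm {LIA}^\star \) is given in Fig. 2.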

To check the satisfiability of a \(\mathrm {LIA}^\star \) formula, we use the semilinear set characterization of the solutions of linear integer arithmetic formulas.

### Definition 1 (Semilinear sets)

A semilinear set \(SLS(ls_1, \ldots , ls_n)\) is a finite union of linear sets \(ls_1,\ldots ,ls_n\), i.e., \(SLS(ls_1, \ldots , ls_n) = \bigcup _{i=1}^n ls_i\).
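Definition 1 builds on linear sets. Assuming the standard definition (the symbols \(\varvec{b_j}\), \(\lambda _j\) and *q* are notation chosen here for illustration), a linear set \(LS(\varvec{a}, B)\) is given by a base vector \(\varvec{a}\) and a finite set of step vectors \(B = \{\varvec{b_1}, \ldots , \varvec{b_q}\}\) of the same dimension:

\[ LS(\varvec{a}, B) = \Big \{ \varvec{a} + \sum _{j=1}^{q} \lambda _j \varvec{b_j} \;\Big |\; \lambda _1, \ldots , \lambda _q \in \mathbb {N} \Big \} \]

For instance, \(LS((1,0), \{(1,1)\}) = \{(1,0), (2,1), (3,2), \ldots \}\).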

Ginsburg and Spanier showed (Theorem 1.3 in [3]) that a solution set for every Presburger arithmetic formula is a semilinear set, and we use that result to eliminate the star operator.

### Theorem 1

**(Lemmas 2 and 3 in [10]).** Given a \(\mathrm {LIA}^\star \) atom \(\varvec{x_1} \in \{ \varvec{x_2} \mid F_2\}^\star \), let \(SLS(LS(\varvec{a_1}, B_1),\ldots ,LS(\varvec{a_k}, B_k))\) be a semilinear set describing the set of the solutions of formula \(F_2\). The atom \(\varvec{x_1} \in \{ \varvec{x_2} \mid F_2\}^\star \) is equisatisfiable to the following \(\mathrm {LIA}\) formula:
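One way to write such a formula, derived here from the definitions of linear sets and of the star operator rather than quoted from [10], uses the assumed notation \(B_i = \{\varvec{b_{i1}}, \ldots , \varvec{b_{iq_i}}\}\), with all \(\mu _i\) and \(\lambda _{ij}\) ranging over non-negative integers:

\[ \exists \mu _1, \ldots , \mu _k, \lambda _{11}, \ldots , \lambda _{kq_k}.\ \ \varvec{x_1} = \sum _{i=1}^{k} \Big ( \mu _i \varvec{a_i} + \sum _{j=1}^{q_i} \lambda _{ij} \varvec{b_{ij}} \Big ) \ \wedge \ \bigwedge _{i=1}^{k} \Big ( \mu _i = 0 \rightarrow \sum _{j=1}^{q_i} \lambda _{ij} = 0 \Big ) \]

Here \(\mu _i\) counts how many summands are taken from the *i*-th linear set. The implication states that step vectors from \(B_i\) may be used only if at least one summand comes from \(LS(\varvec{a_i}, B_i)\).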

By applying Theorem 1, checking satisfiability of a \(\mathrm {LIA}^\star \) formula reduces to reasoning about linear integer arithmetic. Note, however, that the automatically constructed formula might be very large, depending on the size of the semilinear set. In addition, this approach relies on computing semilinear sets explicitly. Both issues make it scale poorly even on simple benchmarks.

## 2 Illustrating Example

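To illustrate the approach, consider proving that for two multisets *X* and *Y*, the size of their disjoint union is the sum of their respective sizes. In other words, we need to prove the validity of the following formula:

\[ |X \uplus Y| = |X| + |Y| \]

To do so, we show that its negation is unsatisfiable. We first introduce a new multiset variable *M* defining the multiset expression \(X \uplus Y \), and then we introduce integer variables \(k_1, k_2, k_3\) for each of the cardinality expressions. This way the formula becomes:

\[ k_1 \ne k_2 + k_3 \ \wedge \ k_1 = |M| \ \wedge \ k_2 = |X| \ \wedge \ k_3 = |Y| \ \wedge \ M = X \uplus Y \]

We next rewrite this formula into the *sum normal form*. It consists of three conjuncts: a pure \(\mathrm {LIA}\) formula, a summation, and a universally quantified formula:

\[ k_1 \ne k_2 + k_3 \ \wedge \ (k_1, k_2, k_3) = \sum _{e \in \mathbb {E}} (M(e), X(e), Y(e)) \ \wedge \ \forall e.\ M(e) = X(e) + Y(e) \]

By applying Theorem 2 from [9], the above MAPA formula is translated into an equisatisfiable \(\mathrm {LIA}^\star \) formula, where *m*, *x* and *y* are non-negative integer variables:

\[ k_1 \ne k_2 + k_3 \ \wedge \ (k_1, k_2, k_3) \in \{ (m, x, y) \mid m = x + y \}^\star \]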
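The intuition behind the unsatisfiability of this formula can be checked on small values. The sketch below assumes that the star atom ranges over vectors (*m*, *x*, *y*) with *m* = *x* + *y* and that the remaining conjunct requires \(k_1 \ne k_2 + k_3\). It enumerates sums of a few such vectors within an arbitrary bound and looks for a counterexample. This is a bounded brute-force experiment, not a proof and not the decision procedure itself; `BOUND`, `star_up_to` and the other names are hypothetical.

```python
from itertools import product

BOUND = 4  # assumed search bound, for illustration only

# Solution vectors (m, x, y) of m = x + y over small non-negative integers.
generators = [(x + y, x, y) for x, y in product(range(BOUND), repeat=2)]

def star_up_to(vectors, max_terms):
    """All sums of at most max_terms vectors from `vectors` (repetition allowed)."""
    zero = (0, 0, 0)
    reached = {zero}
    frontier = {zero}
    for _ in range(max_terms):
        # Extend every newly reached vector by one more generator.
        frontier = {tuple(a + b for a, b in zip(v, g))
                    for v in frontier for g in vectors} - reached
        reached |= frontier
    return reached

candidates = star_up_to(generators, max_terms=3)

# A counterexample would be a vector (k1, k2, k3) in the star set with k1 != k2 + k3.
counterexamples = [(k1, k2, k3) for (k1, k2, k3) in candidates if k1 != k2 + k3]
print(len(candidates), counterexamples)
```

No counterexample can appear, because the equation *m* = *x* + *y* is preserved under vector addition. This is the kind of invariant that an over-approximation of the star set can capture.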

## 3 Efficient Reasoning About \(\mathrm {LIA}^\star \) Formulas

The described decision procedure is sound and complete. However, its crucial component is the computation of semilinear sets. While it is possible to compute a Hilbert basis using the Z3 [2] SMT solver, to the best of our knowledge there are no efficient tools for computing semilinear sets. Moreover, Pottier [12] showed that a semilinear set might contain an exponential number of vectors. To avoid the explicit computation of semilinear sets, Piskac and Kuncak [10] developed a new decision procedure for \(\mathrm {LIA}^\star \) which eliminates the star operator from the atom \(\varvec{x_1} \in \{ \varvec{x_2} \mid F\}^\star \) by showing that \(\varvec{x_1}\) is a linear combination of \(\mathcal {O} (n^2 \log n)\) solution vectors of *F*, where *n* is the size of the input formula. Although this new decision procedure avoids computing semilinear sets, it immediately produces a very large formula that could not be solved in practice by existing tools, not even for the simplest benchmarks.

Levatich et al. [6] used those insights to develop a new efficient and scalable approach for solving \(\mathrm {LIA}^\star \) formulas. The approach is based on the use of under- and over-approximations of \(\mathrm {LIA}^\star \) formulas. This way one avoids the space overhead and the explicit computation of semilinear sets.

The key insight of their approach is to construct a solution or a proof of unsatisfiability “on demand”. Given a \(\mathrm {LIA}^\star \) formula \(F_1 (\varvec{x_1}) \wedge \varvec{x_1} \in \{ \varvec{x_2} \mid F_2 ( \varvec{x_2})\}^\star \), we first find any solution vector for formula \(F_2\); let us name it \(\varvec{u_1}\). We next check if formula \(F_1 (\varvec{x_1}) \wedge \varvec{x_1} = \lambda _1 * \varvec{u_1}\) is satisfiable. If this is the case, the given \(\mathrm {LIA}^\star \) formula is satisfiable as well. However, if this is not the case, we cannot conclude anything about the satisfiability of the given \(\mathrm {LIA}^\star \) formula, so we find a new, different solution of formula \(F_2\), denoted by \(\varvec{u_2}\): \(F_2(\varvec{u_2}) \wedge \varvec{u_1} \ne \varvec{u_2}\). Next, we check if the vector \(\varvec{x_1}\) is a linear combination of those two solution vectors: \(F_1 (\varvec{x_1}) \wedge \varvec{x_1} = \lambda _1 * \varvec{u_1} + \lambda _2 * \varvec{u_2}\). If this newly constructed formula is satisfiable, so is the original \(\mathrm {LIA}^\star \) formula; otherwise we repeat the process. This way, by finding and checking solution vectors of \(F_2\), we construct under-approximations of the set \(\{ \varvec{x_2} \mid F_2 ( \varvec{x_2})\}^\star \). Moreover, we know that this process will terminate once we check sufficiently many solution vectors, as shown in [10].
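The loop described above can be sketched in a few lines. The version below is only an illustration under strong assumptions: bounded brute-force enumeration stands in for the SMT solver calls, \(F_1\) and \(F_2\) are passed as Python predicates over non-negative integer tuples, and the names `under_approximate`, `solutions` and `combine` as well as the bounds are hypothetical. A result of `None` means only that the bounded search gave up, not that the formula is unsatisfiable.

```python
from itertools import product

def solutions(pred, dim, bound):
    """Enumerate vectors in [0, bound)^dim satisfying pred (stand-in for a solver call)."""
    return (v for v in product(range(bound), repeat=dim) if pred(v))

def combine(vectors, coeffs, dim):
    """The vector sum_i coeffs[i] * vectors[i]."""
    return tuple(sum(c * v[i] for c, v in zip(coeffs, vectors)) for i in range(dim))

def under_approximate(f1, f2, dim, bound=4, max_vectors=5):
    """Grow a list of solution vectors of f2 until some non-negative integer
    combination of them satisfies f1; return (x1, basis, coeffs) or None."""
    basis = []
    for u in solutions(f2, dim, bound):
        basis.append(u)  # a new solution vector of F2, distinct from earlier ones
        # Check F1(x1) and x1 = lambda_1 * u_1 + ... + lambda_n * u_n within the bound.
        for coeffs in product(range(bound), repeat=len(basis)):
            x1 = combine(basis, coeffs, dim)
            if f1(x1):
                return x1, list(basis), coeffs
        if len(basis) >= max_vectors:
            break
    return None

# Illustrative instance: F2 holds for vectors whose coordinates sum to 2.
f2 = lambda v: v[0] + v[1] == 2
witness = under_approximate(lambda x: x == (3, 1), f2, dim=2)
no_witness = under_approximate(lambda x: x[0] + x[1] == 3, f2, dim=2)
print(witness, no_witness)
```

In the first call the target vector is found once enough solution vectors of \(F_2\) have been collected. In the second call no combination can work, since every sum of such vectors has an even coordinate sum, and the under-approximation alone cannot establish this.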

However, if the given \(\mathrm {LIA}^\star \) formula is unsatisfiable, this approach will result in a formula as large as the one in [10], and again it does not scale. Therefore, in parallel with finding an under-approximation of the set \(\{ \varvec{x_2} \mid F_2 ( \varvec{x_2})\}^\star \), we also construct a sequence of its over-approximations. The properties that such an over-approximation should have are encoded as a set of Constrained Horn Clauses, and we use existing solvers to compute it. Such an over-approximation, if it exists, is an interpolant that separates the two conjuncts in the given \(\mathrm {LIA}^\star \) formula, thereby proving that the formula is unsatisfiable.
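The exact Horn clauses used in [6] are not reproduced here. As a hedged illustration, one natural set of sufficient conditions for a candidate over-approximation *I* is: *I* contains the zero vector, *I* is closed under adding any solution of \(F_2\), and *I* is disjoint from \(F_1\). The first two conditions imply that *I* contains the whole star set, and the third then rules out any solution. The sketch below checks these assumed conditions by brute force within a bound; the function name and the bound are hypothetical, and a bounded check is evidence, not a proof.

```python
from itertools import product

def is_separating_invariant(inv, f1, f2, dim, bound=6):
    """Bounded check of three assumed conditions on a candidate over-approximation inv."""
    box = list(product(range(bound), repeat=dim))
    zero = (0,) * dim
    # (1) The empty sum belongs to the star set, so inv must hold at zero.
    contains_zero = inv(zero)
    # (2) Adding a solution of F2 to a vector in inv stays inside inv.
    closed = all(inv(tuple(a + b for a, b in zip(x, u)))
                 for x in box if inv(x)
                 for u in box if f2(u))
    # (3) No vector in inv satisfies F1.
    disjoint = not any(inv(x) and f1(x) for x in box)
    return contains_zero and closed and disjoint

# Illustrative instance: star set of {(m, x, y) | m = x + y}, F1 is k1 != k2 + k3.
f1 = lambda k: k[0] != k[1] + k[2]
f2 = lambda v: v[0] == v[1] + v[2]

good = is_separating_invariant(lambda k: k[0] == k[1] + k[2], f1, f2, dim=3)
too_weak = is_separating_invariant(lambda k: k[0] >= k[1], f1, f2, dim=3)
print(good, too_weak)
```

The candidate \(k_1 = k_2 + k_3\) passes all three checks, while the weaker candidate \(k_1 \ge k_2\) is closed under addition but still overlaps with \(F_1\), so it does not separate the two conjuncts.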

Finally, we have implemented the presented decision procedure and the tool is publicly available at https://github.com/mlevatich/sls-reachability. Because there were no MAPA benchmarks available, we had to create our own benchmarks. In addition, we also treated 240 BAPA benchmarks about sets, available in [1], as MAPA benchmarks. While the full report on the empirical results is available in [6], our general assessment is that the presented algorithm is effective on both SAT and UNSAT benchmarks. Our tool solved 83% of benchmarks in less than 50 seconds, and over 75% of those in under 3 seconds. We believe that this tool is the first efficient reasoner for multisets and sets with cardinality constraints.

## 4 Conclusions

The presented work describes a sequence of decision procedures that has led towards an efficient reasoner for multisets and sets with cardinality constraints. We noticed that some constraints arising in formal verification of protocols and data structures could have been expressed more succinctly and naturally had they used multisets as the underlying abstract datatype in the specification. Nevertheless, due to the lack of tool support, such specifications use sets, resulting in more complex constraints. While there was an older tool for reasoning about multisets with cardinality constraints [11], that tool served mainly as a proof of concept and was evaluated only on a handful of manually written formulas. Here we presented a recent tool for reasoning about sets and multisets, and we showed empirically that this tool scales well and can easily reason about complex multiset formulas. We hope that this work will lead to a renewed research interest in multisets and encourage their use in software analysis and verification.

## Notes

### Acknowledgments

This presentation is based on the previously published results on reasoning about multisets with cardinality constraints [6, 9, 10, 11]. We sincerely thank the collaborators on these projects: Nikolaj Bjørner, Maxwell Levatich, Viktor Kunčak and Sharon Shoham, without whom this work would not be possible.

## References

- 1. Berkovits, I., Lazić, M., Losa, G., Padon, O., Shoham, S.: Verification of threshold-based distributed algorithms by decomposition to decidable logics. In: Dillig, I., Tasiran, S. (eds.) CAV 2019. LNCS, vol. 11562, pp. 245–266. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-25543-5_15
- 2. de Moura, L.M., Bjørner, N.: Z3: an efficient SMT solver. In: Ramakrishnan, C.R., Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 337–340. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-78800-3_24
- 3. Ginsburg, S., Spanier, E.H.: Semigroups, Presburger formulas, and languages. Pacific J. Math. **16**(2), 285–296 (1966)
- 4. Kuncak, V., Nguyen, H.H., Rinard, M.C.: An algorithm for deciding BAPA: Boolean algebra with Presburger arithmetic. In: Nieuwenhuis, R. (ed.) CADE 2005. LNCS (LNAI), vol. 3632, pp. 260–277. Springer, Heidelberg (2005). https://doi.org/10.1007/11532231_20
- 5. Kuncak, V., Nguyen, H.H., Rinard, M.C.: Deciding Boolean algebra with Presburger arithmetic. J. Autom. Reason. **36**(3), 213–239 (2006)
- 6. Levatich, M., Bjørner, N., Piskac, R., Shoham, S.: Solving \({\rm LIA}^\star \) using approximations. In: Beyer, D., Zufferey, D. (eds.) VMCAI 2020. LNCS, vol. 11990, pp. 360–378. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-39322-9_17
- 7. Lugiez, D.: Multitree automata that count. Theor. Comput. Sci. **333**(1–2), 225–263 (2005)
- 8. Nelson, G., Oppen, D.C.: Fast decision procedures based on congruence closure. J. ACM **27**(2), 356–364 (1980)
- 9. Piskac, R., Kuncak, V.: Decision procedures for multisets with cardinality constraints. In: Logozzo, F., Peled, D.A., Zuck, L.D. (eds.) VMCAI 2008. LNCS, vol. 4905, pp. 218–232. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-78163-9_20
- 10. Piskac, R., Kuncak, V.: Linear arithmetic with stars. In: Gupta, A., Malik, S. (eds.) CAV 2008. LNCS, vol. 5123, pp. 268–280. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-70545-1_25
- 11. Piskac, R., Kuncak, V.: MUNCH - automated reasoner for sets and multisets. In: Giesl, J., Hähnle, R. (eds.) IJCAR 2010. LNCS (LNAI), vol. 6173, pp. 149–155. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14203-1_13
- 12. Pottier, L.: Minimal solutions of linear diophantine systems: bounds and algorithms. In: Book, R.V. (ed.) RTA 1991. LNCS, vol. 488, pp. 162–173. Springer, Heidelberg (1991). https://doi.org/10.1007/3-540-53904-2_94
- 13. Zarba, C.G.: Combining multisets with integers. In: Voronkov, A. (ed.) CADE-18. LNCS (LNAI), vol. 2392, pp. 363–376. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45620-1_30