Proving Termination of Programs with Bitvector Arithmetic by Symbolic Execution

Hensel, Jera; Giesl, Jürgen; Frohn, Florian; Ströder, Thomas

doi:10.1007/978-3-319-41591-8_16

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9763)

Included in the following conference series:

International Conference on Software Engineering and Formal Methods

Abstract

In earlier work, we developed an approach for automated termination analysis of C programs with explicit pointer arithmetic, which is based on symbolic execution. However, similar to many other termination techniques, this approach assumed the program variables to range over mathematical integers instead of bitvectors. This eases mathematical reasoning but is unsound in general. In this paper, we extend our approach in order to handle fixed-width bitvector integers. Thus, we present the first technique for termination analysis of C programs that covers both byte-accurate pointer arithmetic and bit-precise modeling of integers. We implemented our approach in the automated termination prover AProVE and evaluate its power by extensive experiments.

Supported by the DFG grant GI 274/6-1.

Notes

1. See http://sv-comp.sosy-lab.org/.
2. In C, adding 1 to the maximal unsigned integer results in 0. In contrast, for signed integers, adding 1 to the maximal signed integer results in undefined behavior. However, most C implementations return the minimal signed integer as the result.
3. This LLVM program corresponds to the code obtained from g with the Clang compiler [3]. To ease readability, we wrote variables without “%” in front (i.e., we wrote “j” instead of “%j” as in proper LLVM) and added line numbers.
4. Of course, $\langle {a} \rangle _{ FO }$ can be extended by more formulas, e.g., on the connection between $v_2$ and $v_2'$ if $(v_1 \hookrightarrow _{{\texttt {i{n}}},u}v_2), (v_1 \hookrightarrow _{{\texttt {i{m}}},u}v_2') \in PT $ for $n < m$. Then we can also handle programs which load an ${\texttt {i{n}}}$ integer from an address where an ${\texttt {i{m}}}$ integer was stored.
5. As usual, mod is defined as follows: For any $m \in \mathbb {Z}$ and $n \in \mathbb {N}_{> 0}$, we have $t = m~\mathrm{mod}~n$ iff $t \in [0,n-1]$ and there exists a $k \in \mathbb {Z}$ such that $t = k \cdot n + m$.
6. Then we would have to check first whether $ LV _{\!\!s, size (\mathtt{{ty}})}(t_1) < 0$ and $ LV _{\!\!s, size (\mathtt{{ty}})}(t_2) \ge 0$. In that case, “icmp ugt ty $t_1, t_2$” is true, since the most significant bits of $t_1$ and $t_2$ are 1 and 0, respectively. The other cases are $ LV _{\!\!s, size (\mathtt{{ty}})}(t_1) \ge 0 \wedge LV _{\!\!s, size (\mathtt{{ty}})}(t_2) < 0$, and the two cases where $ LV _{\!\!s, size (\mathtt{{ty}})}(t_1)$ and $ LV _{\!\!s, size (\mathtt{{ty}})}(t_2)$ have the same sign and either $ LV _{\!\!s, size (\mathtt{{ty}})}(t_1) > LV _{\!\!s, size (\mathtt{{ty}})}(t_2)$ or $ LV _{\!\!s, size (\mathtt{{ty}})}(t_1) \le LV _{\!\!s, size (\mathtt{{ty}})}(t_2)$.
7. If $\mathtt{{y}}, \mathtt{{z}}\in [0,2^{n}-1]$, then $\mathtt{{y}}\cdot \mathtt{{z}}\in [0,2^{2\cdot n} -2^{n+1} + 1]$. So there are $\mathcal {O}(2^n)$ many potential intervals of size $2^n$ for the result, i.e., we would have to consider $\mathcal {O}(2^n)$ many cases.
8. However, there is not yet any paper describing Ultimate’s adaptation to bitvectors.
9. Outside of termination analysis, there exist several tools for overflow detection. However, we cannot easily apply such external tools in our approach, since we want to use the result of potential overflows to continue our symbolic execution and analysis.
10. We use “$\hookrightarrow $” instead of “$\mapsto $” in separation logic, since $ mem \models n_1 \mapsto n_2$ would imply that $ mem (n)$ is undefined for all $n \ne n_1$. This would be inconvenient in our formalization, since $ PT $ usually only contains information about a part of the allocated memory.
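
The arithmetic facts behind Notes 2, 5, and 6 can be checked exhaustively on a small bit width. The following Python sketch is only an illustration: the helper names `wrap_unsigned`, `to_signed`, and `ugt_via_signed` are hypothetical and are not taken from the paper or from AProVE. It models n-bit values with Python's unbounded integers, wraps unsigned results modulo 2^n, and decides the unsigned "greater than" comparison purely from the signed readings of both operands, following the case analysis of Note 6.

```python
def wrap_unsigned(value: int, n: int) -> int:
    """Reduce a mathematical integer to its n-bit unsigned representative.

    Python's % with a positive modulus always yields a result in
    [0, 2**n - 1], which agrees with the definition of mod in Note 5.
    """
    return value % (1 << n)


def to_signed(u: int, n: int) -> int:
    """Read an n-bit pattern (given as an unsigned value) as two's complement."""
    return u - (1 << n) if u >= (1 << (n - 1)) else u


def ugt_via_signed(s1: int, s2: int) -> bool:
    """Unsigned 'greater than', decided from the signed readings s1 and s2."""
    if s1 < 0 and s2 >= 0:
        return True   # most significant bits are 1 and 0, respectively
    if s1 >= 0 and s2 < 0:
        return False  # most significant bits are 0 and 1, respectively
    return s1 > s2    # same sign: signed and unsigned order coincide
```

For 8-bit values, comparing `ugt_via_signed(to_signed(a, 8), to_signed(b, 8))` with `a > b` for all 65536 pairs is a quick way to convince oneself that the four cases of Note 6 are exhaustive and correct.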

References

1. AProVE. http://aprove.informatik.rwth-aachen.de/eval/Bitvectors/
2. Chen, H.Y., David, C., Kroening, D., Schrammel, P., Wächter, B.: Synthesising interprocedural bit-precise termination proofs. In: Cohen, M.B., Grunske, L., Whalen, M. (eds.) ASE 2015, pp. 53–64. IEEE (2015)
3. Clang compiler. http://clang.llvm.org
4. Cook, B., Kroening, D., Rümmer, P., Wintersteiger, C.: Ranking function synthesis for bit-vector relations. Formal Methods Syst. Des. 43(1), 93–120 (2013)
5. David, C., Kroening, D., Lewis, M.: Unrestricted termination and non-termination arguments for bit-vector programs. In: Vitek, J. (ed.) ESOP 2015. LNCS, vol. 9032, pp. 183–204. Springer, Heidelberg (2015)
6. Dutertre, B., de Moura, L.M.: The Yices SMT solver (2006). Tool paper at http://yices.csl.sri.com/tool-paper.pdf
7. Falke, S., Kapur, D., Sinz, C.: Termination analysis of C programs using compiler intermediate languages. In: Schmidt-Schauß, M. (ed.) RTA 2011. LIPIcs, vol. 10, pp. 41–50 (2011)
8. Falke, S., Kapur, D., Sinz, C.: Termination analysis of imperative programs using bitvector arithmetic. In: Joshi, R., Müller, P., Podelski, A. (eds.) VSTTE 2012. LNCS, vol. 7152, pp. 261–277. Springer, Heidelberg (2012)
9. Giesl, J., et al.: Proving termination of programs automatically with AProVE. In: Demri, S., Kapur, D., Weidenbach, C. (eds.) IJCAR 2014. LNCS, vol. 8562, pp. 184–191. Springer, Heidelberg (2014)
10. Heizmann, M., Hoenicke, J., Leike, J., Podelski, A.: Linear ranking for linear lasso programs. In: Van Hung, D., Ogawa, M. (eds.) ATVA 2013. LNCS, vol. 8172, pp. 365–380. Springer, Heidelberg (2013)
11. Kroening, D., Sharygina, N., Tsitovich, A., Wintersteiger, C.M.: Termination analysis with compositional transition invariants. In: Touili, T., Cook, B., Jackson, P. (eds.) CAV 2010. LNCS, vol. 6174, pp. 89–103. Springer, Heidelberg (2010)
12. Lattner, C., Adve, V.S.: LLVM: a compilation framework for lifelong program analysis & transformation. In: CGO 2004, pp. 75–88. IEEE (2004)
13. de Moura, L., Bjørner, N.S.: Z3: an efficient SMT solver. In: Ramakrishnan, C.R., Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 337–340. Springer, Heidelberg (2008)
14. Ströder, T., Giesl, J., Brockschmidt, M., Frohn, F., Fuhs, C., Hensel, J., Schneider-Kamp, P.: Proving termination and memory safety for programs with pointer arithmetic. In: Demri, S., Kapur, D., Weidenbach, C. (eds.) IJCAR 2014. LNCS, vol. 8562, pp. 208–223. Springer, Heidelberg (2014)

Acknowledgments

We are grateful to M. Heizmann, D. Kroening, M. Lewis, and P. Schrammel for their help with the experiments.

Author information

Authors and Affiliations

LuFG Informatik 2, RWTH Aachen University, Aachen, Germany
Jera Hensel, Jürgen Giesl, Florian Frohn & Thomas Ströder

Corresponding author

Correspondence to Jürgen Giesl.

Editor information

Editors and Affiliations

IMT - School for Advanced Studies, Lucca, Italy
Rocco De Nicola
Institute of Computer Languages, TU Wien, Wien, Austria
Eva Kühn

Appendices

A Separation Logic Semantics of Abstract States

To formalize the semantics of an abstract state $a$, in [14] we introduced a separation logic formula $\langle {a} \rangle _{ SL }$, which extends $\langle {a} \rangle _{ FO }$ by information about the memory (i.e., about $ AL $ and $ PT $). In $\langle {a} \rangle _{ SL }$, we combine the elements of $ AL $ with the separating conjunction “$*$” to express that different allocated memory blocks are disjoint. As usual, $\varphi _1 * \varphi _2$ means that $\varphi _1$ and $\varphi _2$ hold for disjoint parts of the memory. In contrast, the elements of $ PT $ are combined by the ordinary conjunction “$\wedge $”. So $(v_1 \hookrightarrow _{{\mathtt {ty}},i}v_2) \in PT $ does not imply that $v_1$ is different from other addresses in $ PT $. Similarly, we also combine the two formulas resulting from $ AL $ and $ PT $ by “$\wedge $”, as both express different properties of the same addresses.

Definition 8 ($ SL $ Formulas for States). For $v_1,v_2 \in \mathcal {V}_{ sym }$, let $\langle {[\![v_1,\,v_2]\!]} \rangle _{SL} = (\forall x. \exists y. \; (v_1 \le x \le v_2) \Rightarrow (x \hookrightarrow y))$. For any LLVM type ${\mathtt {ty}}$, we define

$$ \langle v_1 \hookrightarrow _{\mathtt {ty},u} v_2\rangle _{SL} = \langle v_1 \hookrightarrow _{ size (\mathtt {ty})} v_2\rangle _{SL}. $$

To handle the two’s complement representation of signed integers, we define $\langle {v_1 \hookrightarrow _{{\mathtt {ty}},s}v_2} \rangle _{SL} =$

where $v_3 \in \mathcal {V}_{ sym }$ is fresh. We assume a little-endian data layout (where the least significant bytes are stored at the lowest addresses). Hence, we define $\langle v_{1}\hookrightarrow _0 v_3\rangle _{SL} = true$ and $\langle v_1 \hookrightarrow _{n+8} v_3\rangle _{SL} = (v_1 \hookrightarrow (v_3\ \mathrm{mod}\ 2^8)) \wedge \langle (v_1 + 1) \hookrightarrow _n (v_3\ \mathrm{div}\ 2^8)\rangle _{SL}$.
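
The recursive definition of $\langle v_1 \hookrightarrow _n v_3\rangle _{SL}$ can be read as a byte-by-byte store. The Python sketch below illustrates this under stated assumptions: memory is modeled as a dictionary from addresses to bytes, the names `store_unsigned`, `store_signed`, and `load_unsigned` are hypothetical, and the signed case uses the standard two's complement encoding (a negative value $v$ is stored as $v + 2^n$), which is an assumption here and not a quotation of the definition.

```python
def store_unsigned(mem: dict, addr: int, value: int, bits: int) -> None:
    """Write a `bits`-wide unsigned value at `addr`, least significant byte first.

    Mirrors the recursion: the cell at `addr` holds value mod 2**8, and the
    remaining bits (value div 2**8) are stored starting at addr + 1.
    A width of 0 stores nothing, matching the base case "true".
    """
    if bits % 8 != 0:
        raise ValueError("bit width must be a multiple of 8")
    if not 0 <= value < (1 << bits):
        raise ValueError("value does not fit into the given width")
    while bits > 0:
        mem[addr] = value % 256
        value //= 256
        addr += 1
        bits -= 8


def store_signed(mem: dict, addr: int, value: int, bits: int) -> None:
    """Store a signed value via its two's complement encoding (assumed)."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError("value does not fit into the given width")
    encoded = value if value >= 0 else value + (1 << bits)
    store_unsigned(mem, addr, encoded, bits)


def load_unsigned(mem: dict, addr: int, bits: int) -> int:
    """Reassemble the unsigned value from consecutive little-endian bytes."""
    return sum(mem[addr + i] << (8 * i) for i in range(bits // 8))
```

Storing `0x12345678` as a 32-bit value at address 100 places `0x78` at address 100 and `0x12` at address 103, which is the same byte order that Python's built-in `int.to_bytes(4, "little")` produces.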

A state $a= (p, LV , KB , AL , PT )$ is represented in separation logic by

We use interpretations $( as , mem )$ for the semantics of separation logic (Sect. 2).
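
Following the description at the start of this appendix (the elements of $ AL $ are joined by "$*$", the elements of $ PT $ by "$\wedge $", and both parts are conjoined with $\langle {a} \rangle _{ FO }$), the state formula can plausibly be written as shown below. This is a reconstruction from that description and should be read as an assumption, not as the original display.

$$ \langle a \rangle _{SL} \;=\; \langle a \rangle _{FO} \;\wedge\; \Bigl( \mathop{*}_{\varphi \in AL} \langle \varphi \rangle _{SL} \Bigr) \;\wedge\; \Bigl( \bigwedge _{\varphi \in PT} \langle \varphi \rangle _{SL} \Bigr) $$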

Definition 9 (Semantics of Separation Logic). Let $ as \!:\!\mathcal {V}_{\mathcal {P}}\!\rightarrow \!\mathbb {Z}$ be an assignment, $ mem : \mathbb {N}_{> 0}\rightharpoonup \{0,\ldots ,\mathsf {umax}_{8}\}$, and $\varphi $ be a formula. Let $ as (\varphi )$ result from replacing all local variables $\mathtt{x}$ in $\varphi $ by the value $ as (\mathtt{x})$. By construction, local variables $\mathtt{x}$ are never quantified in our formulas. Then we define $( as , mem ) \models \varphi $ iff $ mem \models as (\varphi )$.

We now define $ mem \models \psi $ for formulas $\psi $ that may contain symbolic variables from $\mathcal {V}_{ sym }$. As usual, all free variables $v_1,\ldots ,v_n$ in $\psi $ are implicitly universally quantified, i.e., $ mem \models \psi $ iff $ mem \models \forall v_1,\ldots , v_n. \, \psi $. The semantics of arithmetic operations and predicates as well as of first-order connectives and quantifiers are as usual. In particular, we define $ mem \models \forall v. \, \psi $ iff $ mem \models \sigma (\psi )$ holds for all instantiations $\sigma $ where $\sigma (v) \in \mathbb {Z}$ and $\sigma (w) = w$ for all $w \in \mathcal {V}_{ sym }\setminus \{v\}$.

We still have to define the semantics of $\hookrightarrow $ and $*$ for variable-free formulas. For $n_1,n_2 \in \mathbb {Z}$, let $ mem \models {n_1}\hookrightarrow {n_2}$ hold iff $ mem (n_1) = n_2$ (see Note 10). The semantics of $*$ is defined as usual in separation logic: For two partial functions $ mem _1, mem _2 : \mathbb {N}_{> 0}\rightharpoonup \mathbb {Z}$, we write $ mem _1 \bot mem _2$ to indicate that the domains of $ mem _1$ and $ mem _2$ are disjoint. If $ mem _1 \bot mem _2$, then $ mem _1 \uplus mem _2$ denotes the union of $ mem _1$ and $ mem _2$. Now $ mem \models \varphi _1 * \varphi _2$ holds iff there exist $ mem _1 \bot mem _2$ such that $ mem = mem _1 \uplus mem _2$ where $ mem _1 \models \varphi _1$ and $ mem _2 \models \varphi _2$.
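
The semantics of $\hookrightarrow $ and $*$ on variable-free formulas can be made concrete for small memories. The Python sketch below is a brute-force illustration, not an efficient decision procedure: a memory is a dictionary (a finite partial function), a formula is modeled as a predicate on memories, and the names `points_to`, `star`, `conj`, and `disjoint` are hypothetical. The check for `star` simply enumerates every split of the memory into two disjoint parts, so it is exponential in the number of defined addresses.

```python
from itertools import combinations
from typing import Callable, Dict

Memory = Dict[int, int]             # finite partial function: address -> value
Formula = Callable[[Memory], bool]  # a formula is a predicate on memories


def points_to(n1: int, n2: int) -> Formula:
    """mem |= n1 "hookrightarrow" n2 iff mem(n1) = n2.

    Other addresses may be defined as well (unlike the strict "mapsto").
    """
    return lambda mem: mem.get(n1) == n2


def disjoint(mem1: Memory, mem2: Memory) -> bool:
    """True iff the domains of mem1 and mem2 do not overlap."""
    return mem1.keys().isdisjoint(mem2.keys())


def conj(phi1: Formula, phi2: Formula) -> Formula:
    """Ordinary conjunction: both formulas hold on the same memory."""
    return lambda mem: phi1(mem) and phi2(mem)


def star(phi1: Formula, phi2: Formula) -> Formula:
    """Separating conjunction: some split of mem into disjoint mem1, mem2
    satisfies phi1 on mem1 and phi2 on mem2."""
    def holds(mem: Memory) -> bool:
        addrs = list(mem)
        for k in range(len(addrs) + 1):
            for chosen in combinations(addrs, k):
                mem1 = {a: mem[a] for a in chosen}
                mem2 = {a: mem[a] for a in addrs if a not in chosen}
                if phi1(mem1) and phi2(mem2):
                    return True
        return False
    return holds
```

On the memory `{1: 7, 2: 9}`, the formula `star(points_to(1, 7), points_to(1, 7))` does not hold because address 1 cannot belong to both parts, whereas `conj(points_to(1, 7), points_to(1, 7))` does. This is the distinction that motivates joining the entries of $ AL $ with "$*$" but the entries of $ PT $ with "$\wedge $".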

B Proofs

Proof of Theorem 5. Since the result of “mod $2^n$” is always in the interval $[0,2^n-1]$, we immediately obtain $\mathsf {sig}_n(t) = ((t + 2^{n-1}) \;\text {mod}\;2^n) - 2^{n-1} \in [0-2^{n-1},2^n-1 - 2^{n-1}] = [-2^{n-1},2^{n-1}-1] = [\mathsf {smin}_{n},\mathsf {smax}_{n}]$. Moreover, we have

$$\begin{aligned} \begin{array}{ll} &{} t \;\text {mod}\;2^n\\ =&{} (t + 2^{n-1} - 2^{n-1}) \;\text {mod}\;2^n\\ =&{} (((t + 2^{n-1}) \;\text {mod}\;2^n) - 2^{n-1}) \;\text {mod}\;2^n\\ = &{} \mathsf {sig}_n(t) \;\text {mod}\;2^n. \end{array} \end{aligned}$$

$\square $
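
The two facts established in this proof, namely that $\mathsf {sig}_n(t)$ lies in $[\mathsf {smin}_{n},\mathsf {smax}_{n}]$ and that it is congruent to $t$ modulo $2^n$, can be spot-checked numerically. The Python sketch below is a sanity check under assumptions, not a substitute for the proof: the function names `sig` and `check_sig_properties` are hypothetical, and Python's `%` operator is used because, for a positive modulus, it returns a value in $[0, 2^n - 1]$ as the definition of mod requires.

```python
def sig(t: int, n: int) -> int:
    """sig_n(t) = ((t + 2^(n-1)) mod 2^n) - 2^(n-1)."""
    half = 1 << (n - 1)
    return ((t + half) % (1 << n)) - half


def check_sig_properties(n: int, lo: int, hi: int) -> bool:
    """Check both facts from the proof for every integer t in [lo, hi]."""
    smin, smax = -(1 << (n - 1)), (1 << (n - 1)) - 1
    modulus = 1 << n
    return all(
        smin <= sig(t, n) <= smax and sig(t, n) % modulus == t % modulus
        for t in range(lo, hi + 1)
    )
```

For instance, with $n = 8$ the value 128 is mapped to $-128$ and 255 to $-1$, while values already inside $[-128, 127]$ are left unchanged.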

Proof of Theorem 6. We consider three cases.

Clearly, $u < \ell $ implies $u- \ell < 0$. Moreover, we also have $u - \ell \ge \mathsf {min}- \mathsf {max}= -2^n + 1$, which together implies

$$\begin{aligned} -2^n< u - \ell < 0. \end{aligned} \tag{2}$$

Thus, we have:

This entails $ u + 1 \le \ell - 1$, i.e., $u - \ell + 1 < 0$. Moreover, we also have $u - \ell + 1 \ge \mathsf {min}- \mathsf {max}+ 1 = -2^n + 2$, which together implies

$$\begin{aligned} -2^n< u - \ell + 1 < 0. \end{aligned} \tag{3}$$

Note that $\mathsf {max}- \ell \ge 0$ and moreover, $\mathsf {max}- \ell < \mathsf {max}- \mathsf {min}= 2^{n} - 1$, i.e.,

$$\begin{aligned} 0 \le \mathsf {max}- \ell < 2^n. \end{aligned} \tag{4}$$

In addition, we have

$$\begin{aligned} \mathsf {max}= \mathsf {min}+ 2^n - 1 \le u + 2^n -1. \end{aligned} \tag{5}$$

Here, we obtain:

$\square $

Proof of Theorem 7. The proof is identical to the proofs of Theorems 10 and 13 in [14]. It relies on the fact that our symbolic execution rules correspond to the actual execution of LLVM when they are applied to concrete states (this also holds for the new bitvector rules of the current paper). So if a concrete state $c$ is represented in the symbolic execution graph, then every LLVM evaluation of $c$ corresponds to a path in the graph. The ITS is generated from the graph in such a way that termination of the ITS implies that the graph has no infinite path of this kind. As all integers in the symbolic execution graphs and in the ITSs are still mathematical integers, the construction of ITSs has not changed in the current paper, i.e., the corresponding proof of [14] directly carries over to the present setting.

Cite this paper

Hensel, J., Giesl, J., Frohn, F., Ströder, T. (2016). Proving Termination of Programs with Bitvector Arithmetic by Symbolic Execution. In: De Nicola, R., Kühn, E. (eds) Software Engineering and Formal Methods. SEFM 2016. Lecture Notes in Computer Science, vol 9763. Springer, Cham. https://doi.org/10.1007/978-3-319-41591-8_16

DOI: https://doi.org/10.1007/978-3-319-41591-8_16
Published: 23 June 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41590-1
Online ISBN: 978-3-319-41591-8
eBook Packages: Computer Science, Computer Science (R0)