Abstract
In earlier work, we developed an approach for automated termination analysis of C programs with explicit pointer arithmetic, which is based on symbolic execution. However, similar to many other termination techniques, this approach assumed the program variables to range over mathematical integers instead of bitvectors. This eases mathematical reasoning but is unsound in general. In this paper, we extend our approach in order to handle fixed-width bitvector integers. Thus, we present the first technique for termination analysis of C programs that covers both byte-accurate pointer arithmetic and bit-precise modeling of integers. We implemented our approach in the automated termination prover AProVE and evaluate its power by extensive experiments.
Supported by the DFG grant GI 274/6-1.
Notes
- 1.
- 2.
In C, adding 1 to the maximal unsigned integer results in 0. In contrast, for signed integers, adding 1 to the maximal signed integer results in undefined behavior. However, most C implementations return the minimal signed integer as the result.
- 3.
This LLVM program corresponds to the code obtained from g with the Clang compiler [3]. To ease readability, we wrote variables without “%” in front (i.e., we wrote “j” instead of “% j” as in proper LLVM) and added line numbers.
- 4.
Of course, \(\langle {a} \rangle _{ FO }\) can be extended by more formulas, e.g., on the connection between \(v_2\) and \(v_2'\) if \((v_1 \hookrightarrow _{{\texttt {i{n}}},u}v_2), (v_1 \hookrightarrow _{{\texttt {i{m}}},u}v_2') \in PT \) for \(n < m\). Then we can also handle programs which load an \({\texttt {i{n}}}\) integer from an address where an \({\texttt {i{m}}}\) integer was stored.
- 5.
As usual, mod is defined as follows: For any \(m \in \mathbb {Z}\) and \(n \in \mathbb {N}_{> 0}\), we have \(t = m~\mathrm{mod}~n\) iff \(t \in [0,n-1]\) and there exists a \(k \in \mathbb {Z}\) such that \(t = k \cdot n + m\).
- 6.
Then we would have to check first whether \( LV _{\!\!s, size (\mathtt{{ty}})}(t_1) < 0\) and \( LV _{\!\!s, size (\mathtt{{ty}})}(t_2) \ge 0\). In that case, “icmp ugt ty \(t_1, t_2\)” is true, since the most significant bits of \(t_1\) and \(t_2\) are 1 and 0, respectively. The other cases are \( LV _{\!\!s, size (\mathtt{{ty}})}(t_1) \ge 0 \wedge LV _{\!\!s, size (\mathtt{{ty}})}(t_2) < 0\), and the two cases where \( LV _{\!\!s, size (\mathtt{{ty}})}(t_1)\) and \( LV _{\!\!s, size (\mathtt{{ty}})}(t_2)\) have the same sign and either \( LV _{\!\!s, size (\mathtt{{ty}})}(t_1) > LV _{\!\!s, size (\mathtt{{ty}})}(t_2)\) or \( LV _{\!\!s, size (\mathtt{{ty}})}(t_1) \le LV _{\!\!s, size (\mathtt{{ty}})}(t_2)\).
- 7.
If \(\mathtt{{y}}, \mathtt{{z}}\in [0,2^{n}-1]\), then \(\mathtt{{y}}\cdot \mathtt{{z}}\in [0,2^{2\cdot n} -2^{n+1} + 1]\). So there are \(\mathcal {O}(2^n)\) many potential intervals of size \(2^n\) for the result, i.e., we would have to consider \(\mathcal {O}(2^n)\) many cases.
- 8.
However, there is not yet any paper describing Ultimate’s adaptation to bitvectors.
- 9.
Outside of termination analysis, there exist several tools for overflow detection. However, we cannot easily apply such external tools in our approach, since we want to use the result of potential overflows to continue our symbolic execution and analysis.
- 10.
We use “\(\hookrightarrow \)” instead of “\(\mapsto \)” in separation logic, since \( mem \models n_1 \mapsto n_2\) would imply that \( mem (n)\) is undefined for all \(n \ne n_1\). This would be inconvenient in our formalization, since \( PT \) usually only contains information about a part of the allocated memory.
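The arithmetic facts used in Notes 2, 5, and 6 can be checked with a small sketch. It is only an illustration over mathematical integers, not part of the approach itself. The function names `mod`, `to_signed`, `add_unsigned`, and `ugt_via_signed` are hypothetical, and the same-sign case of the comparison is our reading of the case analysis in Note 6.

```python
def mod(m: int, n: int) -> int:
    """Note 5: the unique t in [0, n-1] with t = k*n + m for some integer k."""
    if n <= 0:
        raise ValueError("n must be positive")
    # For n > 0, Python's % already yields a result in [0, n-1].
    return m % n


def to_signed(u: int, bits: int) -> int:
    """Read an unsigned bit pattern in [0, 2^bits - 1] as a two's complement value."""
    return u - 2**bits if u >= 2**(bits - 1) else u


def add_unsigned(x: int, y: int, bits: int) -> int:
    """Note 2: unsigned addition wraps around modulo 2^bits."""
    return mod(x + y, 2**bits)


def ugt_via_signed(s1: int, s2: int) -> bool:
    """Note 6: decide unsigned 'greater than' from the signed readings s1 and s2."""
    if s1 < 0 and s2 >= 0:
        return True   # most significant bits are 1 and 0, respectively
    if s1 >= 0 and s2 < 0:
        return False  # most significant bits are 0 and 1, respectively
    return s1 > s2    # same sign: signed and unsigned order coincide
```

For instance, `add_unsigned(2**32 - 1, 1, 32)` yields 0, matching the wrap-around of the maximal unsigned integer described in Note 2.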
References
AProVE. http://aprove.informatik.rwth-aachen.de/eval/Bitvectors/
Chen, H.Y., David, C., Kroening, D., Schrammel, P., Wächter, B.: Synthesising interprocedural bit-precise termination proofs. In: Cohen, M.B., Grunske, L., Whalen, M. (eds.) ASE 2015, pp. 53–64. IEEE (2015)
Clang compiler. http://clang.llvm.org
Cook, B., Kroening, D., Rümmer, P., Wintersteiger, C.: Ranking function synthesis for bit-vector relations. Formal Methods Syst. Des. 43(1), 93–120 (2013)
David, C., Kroening, D., Lewis, M.: Unrestricted termination and non-termination arguments for bit-vector programs. In: Vitek, J. (ed.) ESOP 2015. LNCS, vol. 9032, pp. 183–204. Springer, Heidelberg (2015)
Dutertre, B., de Moura, L.M.: The Yices SMT solver (2006). Tool paper at http://yices.csl.sri.com/tool-paper.pdf
Falke, S., Kapur, D., Sinz, C.: Termination analysis of C programs using compiler intermediate languages. In: Schmidt-Schauß, M. (ed.) RTA 2011. LIPIcs, vol. 10, pp. 41–50 (2011)
Falke, S., Kapur, D., Sinz, C.: Termination analysis of imperative programs using bitvector arithmetic. In: Joshi, R., Müller, P., Podelski, A. (eds.) VSTTE 2012. LNCS, vol. 7152, pp. 261–277. Springer, Heidelberg (2012)
Giesl, J., et al.: Proving termination of programs automatically with AProVE. In: Demri, S., Kapur, D., Weidenbach, C. (eds.) IJCAR 2014. LNCS, vol. 8562, pp. 184–191. Springer, Heidelberg (2014)
Heizmann, M., Hoenicke, J., Leike, J., Podelski, A.: Linear ranking for linear lasso programs. In: Van Hung, D., Ogawa, M. (eds.) ATVA 2013. LNCS, vol. 8172, pp. 365–380. Springer, Heidelberg (2013)
Kroening, D., Sharygina, N., Tsitovich, A., Wintersteiger, C.M.: Termination analysis with compositional transition invariants. In: Touili, T., Cook, B., Jackson, P. (eds.) CAV 2010. LNCS, vol. 6174, pp. 89–103. Springer, Heidelberg (2010)
Lattner, C., Adve, V.S.: LLVM: a compilation framework for lifelong program analysis & transformation. In: CGO 2004, pp. 75–88. IEEE (2004)
de Moura, L., Bjørner, N.S.: Z3: an efficient SMT solver. In: Ramakrishnan, C.R., Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 337–340. Springer, Heidelberg (2008)
Ströder, T., Giesl, J., Brockschmidt, M., Frohn, F., Fuhs, C., Hensel, J., Schneider-Kamp, P.: Proving termination and memory safety for programs with pointer arithmetic. In: Demri, S., Kapur, D., Weidenbach, C. (eds.) IJCAR 2014. LNCS, vol. 8562, pp. 208–223. Springer, Heidelberg (2014)
Acknowledgments
We are grateful to M. Heizmann, D. Kroening, M. Lewis, and P. Schrammel for their help with the experiments.
Appendices
A Separation Logic Semantics of Abstract States
To formalize the semantics of an abstract state \(a\), in [14] we introduced a separation logic formula \(\langle {a} \rangle _{ SL }\), which extends \(\langle {a} \rangle _{ FO }\) by information about the memory (i.e., about \( AL \) and \( PT \)). In \(\langle {a} \rangle _{ SL }\), we combine the elements of \( AL \) with the separating conjunction “\(*\)” to express that different allocated memory blocks are disjoint. As usual, \(\varphi _1 * \varphi _2\) means that \(\varphi _1\) and \(\varphi _2\) hold for disjoint parts of the memory. In contrast, the elements of \( PT \) are combined by the ordinary conjunction “\(\wedge \)”. So \((v_1 \hookrightarrow _{{\mathtt {ty}},i}v_2) \in PT \) does not imply that \(v_1\) is different from other addresses in \( PT \). Similarly, we also combine the two formulas resulting from \( AL \) and \( PT \) by “\(\wedge \)”, as both express different properties of the same addresses.
Definition 8
( \( SL \) Formulas for States). For \(v_1,v_2 \in \mathcal {V}_{ sym }\), let \(\langle {[\![v_1,\,v_2]\!]} \rangle _{SL} = (\forall x. \exists y. \; (v_1 \le x \le v_2) \Rightarrow (x \hookrightarrow y))\). For any LLVM type \({\mathtt {ty}}\), we define
To handle the two’s complement representation of signed integers, we define \(\langle {v_1 \hookrightarrow _{{\mathtt {ty}},s}v_2} \rangle _{SL} =\)
where \(v_3 \in \mathcal {V}_{ sym }\) is fresh. We assume a little-endian data layout (where the least significant bytes are stored at the lowest addresses). Hence, we define \(\langle v_{1}\hookrightarrow _0 v_3\rangle _{SL} = true\) and \(\langle v_1 \hookrightarrow _{n+8} v_3\rangle _{SL} = (v_1 \hookrightarrow (v_3\ \mathrm{mod}\ 2^8)) \wedge \langle (v_1 + 1) \hookrightarrow _n (v_3\ \mathrm{div}\ 2^8)\rangle _{SL}\).
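The recursive little-endian definition above can be mirrored by a short sketch that computes which byte ends up at which address. It is meant only as an illustration for concrete, non-negative values. The function name `byte_cells` and the dictionary representation of memory cells are assumptions and do not appear in the formalization.

```python
from typing import Dict


def byte_cells(addr: int, value: int, bits: int) -> Dict[int, int]:
    """Map each address to the byte stored there for a `bits`-wide unsigned value.

    Mirrors the recursion: the case bits = 0 contributes nothing (the formula
    is `true`), and the case n + 8 stores `value mod 2^8` at `addr` and
    continues with `value div 2^8` at `addr + 1`.
    """
    if bits < 0 or bits % 8 != 0:
        raise ValueError("bits must be a non-negative multiple of 8")
    if not 0 <= value < 2**bits:
        raise ValueError("value must fit into the given bit width")
    if bits == 0:
        return {}
    rest = byte_cells(addr + 1, value // 2**8, bits - 8)
    return {addr: value % 2**8, **rest}
```

As a usage example, `byte_cells(100, 0x12345678, 32)` returns `{100: 0x78, 101: 0x56, 102: 0x34, 103: 0x12}`, i.e., the least significant byte is placed at the lowest address.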
A state \(a= (p, LV , KB , AL , PT )\) is represented in separation logic by
We use interpretations \(( as , mem )\) for the semantics of separation logic (Sect. 2).
Definition 9
(Semantics of Separation Logic). Let \( as \!:\!\mathcal {V}_{\mathcal {P}}\!\rightarrow \!\mathbb {Z}\) be an assignment, \( mem : \mathbb {N}_{> 0}\rightharpoonup \{0,\ldots ,\mathsf {umax}_{8}\}\), and \(\varphi \) be a formula. Let \( as (\varphi )\) result from replacing all local variables \(\mathtt{x}\) in \(\varphi \) by the value \( as (\mathtt{x})\). By construction, local variables \(\mathtt{x}\) are never quantified in our formulas. Then we define \(( as , mem ) \models \varphi \) iff \( mem \models as (\varphi )\).
We now define \( mem \models \psi \) for formulas \(\psi \) that may contain symbolic variables from \(\mathcal {V}_{ sym }\). As usual, all free variables \(v_1,\ldots ,v_n\) in \(\psi \) are implicitly universally quantified, i.e., \( mem \models \psi \) iff \( mem \models \forall v_1,\ldots , v_n. \, \psi \). The semantics of arithmetic operations and predicates as well as of first-order connectives and quantifiers are as usual. In particular, we define \( mem \models \forall v. \, \psi \) iff \( mem \models \sigma (\psi )\) holds for all instantiations \(\sigma \) where \(\sigma (v) \in \mathbb {Z}\) and \(\sigma (w) = w\) for all \(w \in \mathcal {V}_{ sym }\setminus \{v\}\).
We still have to define the semantics of \(\hookrightarrow \) and \(*\) for variable-free formulas. For \(n_1,n_2 \in \mathbb {Z}\), let \( mem \models {n_1}\hookrightarrow {n_2}\) hold iff \( mem (n_1) = n_2\) (see Note 10). The semantics of \(*\) is defined as usual in separation logic: For two partial functions \( mem _1, mem _2 : \mathbb {N}_{> 0}\rightharpoonup \mathbb {Z}\), we write \( mem _1 \bot mem _2\) to indicate that the domains of \( mem _1\) and \( mem _2\) are disjoint. If \( mem _1 \bot mem _2\), then \( mem _1 \uplus mem _2\) denotes the union of \( mem _1\) and \( mem _2\). Now \( mem \models \varphi _1 * \varphi _2\) holds iff there exist \( mem _1 \bot mem _2\) such that \( mem = mem _1 \uplus mem _2\) where \( mem _1 \models \varphi _1\) and \( mem _2 \models \varphi _2\).
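To make the difference between “\(*\)” and “\(\wedge \)” concrete, the following sketch evaluates variable-free formulas over a small finite memory. It is an illustration under simplifying assumptions: memories are finite Python dictionaries, formulas are modeled as predicates on memories, and the names `points_to`, `conj`, and `star` are hypothetical. The split search enumerates all subsets of the domain, so it is exponential and only suitable for tiny examples.

```python
from itertools import combinations
from typing import Callable, Dict

Mem = Dict[int, int]              # partial function from addresses to values
Formula = Callable[[Mem], bool]   # a variable-free formula, as a predicate on memories


def points_to(n1: int, n2: int) -> Formula:
    """mem |= n1 ↪ n2 iff mem(n1) = n2; other cells of mem are unconstrained."""
    return lambda mem: n1 in mem and mem[n1] == n2


def conj(phi1: Formula, phi2: Formula) -> Formula:
    """Ordinary conjunction: both formulas hold for the same memory."""
    return lambda mem: phi1(mem) and phi2(mem)


def star(phi1: Formula, phi2: Formula) -> Formula:
    """Separating conjunction: mem splits into disjoint mem1, mem2 with
    mem1 |= phi1 and mem2 |= phi2."""
    def holds(mem: Mem) -> bool:
        addrs = list(mem)
        for k in range(len(addrs) + 1):
            for part in combinations(addrs, k):
                mem1 = {a: mem[a] for a in part}
                mem2 = {a: v for a, v in mem.items() if a not in mem1}
                if phi1(mem1) and phi2(mem2):
                    return True
        return False
    return holds
```

For `mem = {1: 5, 2: 7}`, the formula `star(points_to(1, 5), points_to(2, 7))` holds, whereas `star(points_to(1, 5), points_to(1, 5))` does not, since address 1 can belong to only one of the two disjoint parts. In contrast, `conj(points_to(1, 5), points_to(1, 5))` holds, which reflects why the elements of \( PT \) are combined by “\(\wedge \)”.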
B Proofs
Proof of Theorem 5. Since the result of “mod \(2^n\)” is always in the interval \([0,2^n-1]\), we immediately obtain \(\mathsf {sig}_n(t) = ((t + 2^{n-1}) \;\text {mod}\;2^n) - 2^{n-1} \in [0-2^{n-1},2^n-1 - 2^{n-1}] = [-2^{n-1},2^{n-1}-1] = [\mathsf {smin}_{n},\mathsf {smax}_{n}]\). Moreover, we have
\(\square \)
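The range argument in the proof of Theorem 5 can be sanity-checked numerically. The sketch below implements \(\mathsf {sig}_n\) exactly as in the formula above and is intended as an illustration, not as a replacement for the proof. The congruence check in the usage note is our own addition; it follows directly from the formula, since adding and subtracting \(2^{n-1}\) cancels modulo \(2^n\).

```python
def smin(n: int) -> int:
    """Minimal signed n-bit value, -2^(n-1)."""
    return -2**(n - 1)


def smax(n: int) -> int:
    """Maximal signed n-bit value, 2^(n-1) - 1."""
    return 2**(n - 1) - 1


def sig(t: int, n: int) -> int:
    """sig_n(t) = ((t + 2^(n-1)) mod 2^n) - 2^(n-1).

    Python's % returns a value in [0, 2^n - 1] for a positive modulus,
    which matches the mathematical `mod` used in the paper.
    """
    return ((t + 2**(n - 1)) % 2**n) - 2**(n - 1)
```

For example, `sig(127, 8)` is 127, `sig(128, 8)` is -128, and `sig(255, 8)` is -1. For any integer `t`, the result lies in `[smin(n), smax(n)]` and differs from `t` by a multiple of `2**n`.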
Proof of Theorem 6. We consider three cases.
Clearly, \(u < \ell \) implies \(u- \ell < 0\). Moreover, we also have \(u - \ell \ge \mathsf {min}- \mathsf {max}= -2^n + 1\), which together implies
Thus, we have:
This entails \( u + 1 \le \ell - 1\), i.e., \(u - \ell + 1 < 0\). Moreover, we also have \(u - \ell + 1 \ge \mathsf {min}- \mathsf {max}+ 1 = -2^n + 2\), which together implies
Note that \(\mathsf {max}- \ell \ge 0\) and moreover, \(\mathsf {max}- \ell < \mathsf {max}- \mathsf {min}= 2^{n} - 1\), i.e.,
In addition, we have
Here, we obtain:
\(\square \)
Proof of Theorem 7. The proof of Theorem 7 is identical to the proofs of Theorems 10 and 13 in [14]. It relies on the fact that our symbolic execution rules correspond to the actual execution of LLVM when they are applied to concrete states (this also holds for the new bitvector rules of the current paper). So if a concrete state \(c\) is represented in the symbolic execution graph, then every LLVM evaluation of \(c\) corresponds to a path in the graph. The generation of an ITS from the graph is done in such a way that termination of the ITS implies that there is no such infinite path in the graph. As all integers in the symbolic execution graphs and in the ITSs are still mathematical integers, the construction of ITSs has not changed in the current paper, i.e., the corresponding proof of [14] directly carries over to the present setting.
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Hensel, J., Giesl, J., Frohn, F., Ströder, T. (2016). Proving Termination of Programs with Bitvector Arithmetic by Symbolic Execution. In: De Nicola, R., Kühn, E. (eds) Software Engineering and Formal Methods. SEFM 2016. Lecture Notes in Computer Science(), vol 9763. Springer, Cham. https://doi.org/10.1007/978-3-319-41591-8_16
DOI: https://doi.org/10.1007/978-3-319-41591-8_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41590-1
Online ISBN: 978-3-319-41591-8
eBook Packages: Computer Science (R0)