Abstract
Formal verification of compiler correctness requires substantial effort. A particular challenge is lack of modularity and automation. Any change or update to the compiler can render existing proofs obsolete and cause considerable manual proof effort. We propose a framework for automatically proving the correctness of compilation rules based on simultaneous symbolic execution for the source and target language. The correctness of the whole system follows from the correctness of each compilation rule. To support a new source or target language it is sufficient to formalize that language in terms of symbolic execution, while the corresponding formalization of its counterpart can be re-used. The correctness of translation rules can be checked automatically. Our approach is based on a reduction of correctness assertions to formulas in a program logic capable of symbolic execution of abstract programs. We instantiate the framework for compilation from Java to LLVM IR and provide a symbolic execution system for a subset of LLVM IR.
This work was funded by the Hessian LOEWE initiative within the Software-Factory 4.0 project.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
Compiler verification with interactive proof assistants requires this as well, even separate formalizations for the target and source languages.
- 2.
When using Theorem 1 to automatically prove the soundness of translation rules with a deductive verification system such as , this aspect can and must be precisely modeled. This can be done by tracking for each abstract program an (abstract) set of variables it depends on.
References
Ahrendt, W., Beckert, B., Bubel, R., Hähnle, R., Schmitt, P.H., Ulbrich, M.: Deductive Software Verification - The KeY Book. LNCS, vol. 10001. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-49812-6
Augustsson, L.: A compiler for lazy ML. In: Proceedings of LFP 1984. ACM (1984)
Breebaart, L.: Rule-based compilation of data parallel programs. Ph.D. thesis, Delft University of Technology (2003)
Bubel, R., Hähnle, R., Pelevina, M.: Fully abstract operation contracts. In: Margaria, T., Steffen, B. (eds.) ISoLA 2014. LNCS, vol. 8803, pp. 120–134. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-45231-8_9
Burstall, R.M.: Program proving as hand simulation with a little induction. In: Information Processing, pp. 308–312. Elsevier (1974)
Cadar, C., Dunbar, D., et al.: KLEE: unassisted and automatic generation of high-coverage tests for complex systems programs. In: 8th USENIX Conference on OSDI, pp. 209–224. USENIX Association, Berkeley (2008)
Hähnle, R., Schaefer, I., Bubel, R.: Reuse in software verification by abstract method calls. In: Bonacina, M.P. (ed.) CADE 2013. LNCS (LNAI), vol. 7898, pp. 300–314. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38574-2_21
Ji, R.: Sound program transformation based on symbolic execution and deduction. Ph.D. thesis, Technische Universität Darmstadt (2014)
Ji, R., Bubel, R.: Program transformation and compilation. In: Ahrendt, W., Beckert, B., Bubel, R., Hähnle, R., Schmitt, P.H., Ulbrich, M. (eds.) Deductive Software Verification – The KeY Book. LNCS, vol. 10001, pp. 473–492. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-49812-6_14
Klein, G., Nipkow, T.: A machine-checked model for a Java-like language, virtual machine, and compiler. ACM Trans. PLS 28(4), 619–695 (2006)
Lattner, C., Adve, V.: LLVM: a compilation framework for lifelong program analysis & transformation. In: Proceedings of CGO 2004, p. 75. IEEE Computer Society (2004)
Le, V., Afshari, M., et al.: Compiler validation via equivalence modulo inputs. In: Proceedings of 35th ACM SIGPLAN Conference on PLDI, pp. 216–226. ACM (2014)
Leroy, X.: Formal verification of a realistic compiler. Commun. ACM 52(7), 107–115 (2009)
Menendez, D., Nagarakatte, S., Gupta, A.: Alive-FP: automated verification of floating point based peephole optimizations in LLVM. In: Rival, X. (ed.) SAS 2016. LNCS, vol. 9837, pp. 317–337. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-53413-7_16
Saabas, A., Uustalu, T.: A compositional natural semantics and Hoare logic for low-level languages. Theor. Comput. Sci. 373(3), 273–302 (2007)
Scheurer, D., Hähnle, R., Bubel, R.: A general lattice model for merging symbolic execution branches. In: Ogata, K., Lawford, M., Liu, S. (eds.) ICFEM 2016. LNCS, vol. 10009, pp. 57–73. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-47846-3_5
Steinhöfel, D., Wasser, N.: A new invariant rule for the analysis of loops with non-standard control flows. In: Polikarpova, N., Schneider, S. (eds.) IFM 2017. LNCS, vol. 10510, pp. 279–294. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66845-1_18
Tan, Y.K., Myreen, M.O., et al.: A new verified compiler backend for CakeML. In: Proceedings of 21st International Conference on Functional Programming, pp. 60–73. ACM (2016)
Zhao, J., Nagarakatte, S., et al.: Formalizing the LLVM intermediate representation for verified program transformations. In: Proceedings of 39th ACM SIGPLAN-SIGACT Symposium on POPL, pp. 427–440. ACM (2012)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Appendices
Appendix
This appendix provides additional material for the reader, including a sketch of the integration of state merging techniques into our approach, an extended compilation example, and a proof of Theorem 1.
A State Merging
Our SE transition relation defined in Sect. 3 outputs tree structures: all states, but the root, have exactly one predecessor. State Merging [16] has originally been invented to mitigate the state explosion problem of SE. It permits to merge symbolic states with identical program counter as arising, for example, after the execution of the then and else branch of an statement, thus reducing the state space. Definition 2 can be extended to transition relations with state merging by applying the same conditions reversely for “merging” transitions:
Definition 8
(Soundness of SE with State Merging). Let \(\delta _m\) be an SE transition relation with state merging (some states have more than one predecessor). We call \(\delta _m\) sound iff the transition relation
is sound in the sense of Definition 2.
Translation rule mergeTransl (Fig. 7) merges two branches with identical program counters. Without it, compiling programs with branching statements leads to duplicated code, for example, in rule ifElseTransl. The double horizontal line below the rule’s premise indicates its special status: it is the only rule with more than one conclusion. For constructing the merged symbolic store, there is more than one option [16]. In the simplest case, we create a fully precise value summary by means of an if-then-else term.
B Factorial Example
The program in Fig. 8 (p. x) computes the factorial of variable x, contained in variable res after termination. Figure 12 shows the SET of the dual SE states corresponding to the translation of the Java to LLVM IR code. Constructing the tree works by first symbolically executing the Java program. The Java SET has the same structure, without the LLVM IR parts and observable variables sets.
In the second phase, we apply translation rules from the leaves to the root, obtaining the dual SE states. The root of the tree contains the compiled LLVM IR program \(q_ result \) depicted in Fig. 9 (with dropped loop scopes). The tree is slightly simplified: we sometimes abbreviate programs with “\(\dots \)” and combine several translation rules into one, for example, in the translation of Java’s , which in the course of SE would be simplified to SSA form first. Two subtrees have been factored out for better readability (Figures 10 and 11).
The example also shows an artifact occurring due to our rule-based compilation: The basic block with label is never reached since it is not targeted by jumps, and the block before jumps outside the loop (this is the translation of the instruction). The instruction has been added because of the general definition of the translation rule; if the else block did not jump out of the loop, it would also make sense. Removal of such dead code, as well as other optimizations, are subject to subsequent post-processing steps and not within the scope of this paper.
C Proofs
Proof
(Theorem 1). We must prove that a translation rule is sound (Definition 6) if \(\mathcal {F}=\big (\bigwedge _{i=1,\dots ,m} F ' {( pr_i )}\big )\rightarrow F ' {(c)}\) is valid. By Proposition 1, a dual SE state is valid iff its justifying formula is valid, and the shape of \(\mathcal {F}\) directly encodes Definition 6. Hence, it suffices to show that the validity of justifying formulas \( F ' {(s')}\) for abstract states \(s'\) implies the validity of \( F {(s)} \) for all concrete instantiations s of \(s'\). Let be an abstract SE state where , \(C^a\) and are symbolic and \(p_j^ abs \), \(p_ IR ^ abs \) contain abstract placeholder symbols. An instantiation s has the shape , where \(p_j\) and \(p_ IR \) result from \(p_j^ abs \) and \(p_ IR ^ abs \) by replacing placeholders with concrete programs. We have to show the following implication:
Since the validity of implies the validity of , it suffices to show the following, stronger property:
Since abstract execution is sound (Observation 1), the premise of (*) holds for all concrete contracts substituted for the abstract ones induced by the placeholders in the programs; this means that in particular, we can substitute \(p_j\)/\(p_ IR \) for \(p_j^ abs \)/\(p_ IR ^ abs \). Let , be the updates resulting from SE of the concrete programs (if SE splits, we can obtain summaries by state merging, see Appendix A).
Now there are two possibilities: Either, is an atom, or it is a guarded conjunction (and s constitutes the conclusion of a dual SE rule application). We focus on the more complicated second case. Let be the (concrete) set of observable variables of s, \(C_i\) the path conditions of the premisses of the rule that has s in its conclusion, and the variables occurring in any symbolic store or path condition of those premises. Then, after instantiation, (*) expands to:
Generally, can contain variables not occurring in , due to over-approximation. On the other hand, if contains variables not in , those are not in the scope of the currently investigated rule, and are therefore taken out of consideration here.Footnote 2 Additionally, we perform a strengthening by dropping the updates in the conclusion (this can be regarded as an abstraction step, since we discharge information). W.l.o.g., consider two different variables x, y and ; their right-hand sides in the updates and are , , etc. Under the assumption that x and y do not occur in the \(C_i\) (see Remark 1), the problem simplifies to:
For simplicity, we assume that the terms do not contain program variables. The equivalence in (\(\triangle \)) is valid iff for all interpretations \(\mathcal {I}\), it holds that
Let \(\mathcal {I} _0\) be an arbitrary interpretation; we show that (\(\Delta ^{ sem }\)) holds for \(\mathcal {I} _0\). All \(C_i\) are mutually exclusive due to the restriction built into our semantics (Definition 2), i.e. \(C_i\leftrightarrow \bigwedge _{k\ne i}\lnot C_k\) for all \(i=1,\dots ,n\). Since (\(\dagger \)) is valid, for any interpretation \(\mathcal {I}\) there is exactly one i such that
We choose an interpretation \(\mathcal {I} _1\) that (i) interprets in the same way as \(\mathcal {I} _0\), (ii) satisfies and, at the same time, (iii) satisfies (\(\dagger ^{{sem}}\)) for some i. By definition of \(\mathcal {I} _1\) we have that implies , and, similarly, for , . Hence, (\(\Delta ^{ sem }\)) holds in \(\mathcal {I} _0\). \(\square \)
Rights and permissions
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Steinhöfel, D., Hähnle, R. (2018). Modular, Correct Compilation with Automatic Soundness Proofs. In: Margaria, T., Steffen, B. (eds) Leveraging Applications of Formal Methods, Verification and Validation. Modeling. ISoLA 2018. Lecture Notes in Computer Science(), vol 11244. Springer, Cham. https://doi.org/10.1007/978-3-030-03418-4_25
Download citation
DOI: https://doi.org/10.1007/978-3-030-03418-4_25
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03417-7
Online ISBN: 978-3-030-03418-4
eBook Packages: Computer ScienceComputer Science (R0)