Modular, Correct Compilation with Automatic Soundness Proofs

Steinhöfel, Dominic; Hähnle, Reiner

doi:10.1007/978-3-030-03418-4_25

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 11244))

Included in the following conference series:

International Symposium on Leveraging Applications of Formal Methods

1152 Accesses
7 Citations

Abstract

Formal verification of compiler correctness requires substantial effort. A particular challenge is lack of modularity and automation. Any change or update to the compiler can render existing proofs obsolete and cause considerable manual proof effort. We propose a framework for automatically proving the correctness of compilation rules based on simultaneous symbolic execution for the source and target language. The correctness of the whole system follows from the correctness of each compilation rule. To support a new source or target language it is sufficient to formalize that language in terms of symbolic execution, while the corresponding formalization of its counterpart can be re-used. The correctness of translation rules can be checked automatically. Our approach is based on a reduction of correctness assertions to formulas in a program logic capable of symbolic execution of abstract programs. We instantiate the framework for compilation from Java to LLVM IR and provide a symbolic execution system for a subset of LLVM IR.

This work was funded by the Hessian LOEWE initiative within the Software-Factory 4.0 project.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
Compiler verification with interactive proof assistants requires this as well, even separate formalizations for the target and source languages.
2.
When using Theorem 1 to automatically prove the soundness of translation rules with a deductive verification system such as , this aspect can and must be precisely modeled. This can be done by tracking for each abstract program an (abstract) set of variables it depends on.

References

Ahrendt, W., Beckert, B., Bubel, R., Hähnle, R., Schmitt, P.H., Ulbrich, M.: Deductive Software Verification - The KeY Book. LNCS, vol. 10001. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-49812-6
Book Google Scholar
Augustsson, L.: A compiler for lazy ML. In: Proceedings of LFP 1984. ACM (1984)
Google Scholar
Breebaart, L.: Rule-based compilation of data parallel programs. Ph.D. thesis, Delft University of Technology (2003)
Google Scholar
Bubel, R., Hähnle, R., Pelevina, M.: Fully abstract operation contracts. In: Margaria, T., Steffen, B. (eds.) ISoLA 2014. LNCS, vol. 8803, pp. 120–134. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-45231-8_9
Chapter Google Scholar
Burstall, R.M.: Program proving as hand simulation with a little induction. In: Information Processing, pp. 308–312. Elsevier (1974)
Google Scholar
Cadar, C., Dunbar, D., et al.: KLEE: unassisted and automatic generation of high-coverage tests for complex systems programs. In: 8th USENIX Conference on OSDI, pp. 209–224. USENIX Association, Berkeley (2008)
Google Scholar
Hähnle, R., Schaefer, I., Bubel, R.: Reuse in software verification by abstract method calls. In: Bonacina, M.P. (ed.) CADE 2013. LNCS (LNAI), vol. 7898, pp. 300–314. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38574-2_21
Chapter Google Scholar
Ji, R.: Sound program transformation based on symbolic execution and deduction. Ph.D. thesis, Technische Universität Darmstadt (2014)
Google Scholar
Ji, R., Bubel, R.: Program transformation and compilation. In: Ahrendt, W., Beckert, B., Bubel, R., Hähnle, R., Schmitt, P.H., Ulbrich, M. (eds.) Deductive Software Verification – The KeY Book. LNCS, vol. 10001, pp. 473–492. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-49812-6_14
Chapter Google Scholar
Klein, G., Nipkow, T.: A machine-checked model for a Java-like language, virtual machine, and compiler. ACM Trans. PLS 28(4), 619–695 (2006)
Google Scholar
Lattner, C., Adve, V.: LLVM: a compilation framework for lifelong program analysis & transformation. In: Proceedings of CGO 2004, p. 75. IEEE Computer Society (2004)
Google Scholar
Le, V., Afshari, M., et al.: Compiler validation via equivalence modulo inputs. In: Proceedings of 35th ACM SIGPLAN Conference on PLDI, pp. 216–226. ACM (2014)
Google Scholar
Leroy, X.: Formal verification of a realistic compiler. Commun. ACM 52(7), 107–115 (2009)
Article Google Scholar
Menendez, D., Nagarakatte, S., Gupta, A.: Alive-FP: automated verification of floating point based peephole optimizations in LLVM. In: Rival, X. (ed.) SAS 2016. LNCS, vol. 9837, pp. 317–337. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-53413-7_16
Chapter Google Scholar
Saabas, A., Uustalu, T.: A compositional natural semantics and Hoare logic for low-level languages. Theor. Comput. Sci. 373(3), 273–302 (2007)
Article MathSciNet Google Scholar
Scheurer, D., Hähnle, R., Bubel, R.: A general lattice model for merging symbolic execution branches. In: Ogata, K., Lawford, M., Liu, S. (eds.) ICFEM 2016. LNCS, vol. 10009, pp. 57–73. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-47846-3_5
Chapter Google Scholar
Steinhöfel, D., Wasser, N.: A new invariant rule for the analysis of loops with non-standard control flows. In: Polikarpova, N., Schneider, S. (eds.) IFM 2017. LNCS, vol. 10510, pp. 279–294. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66845-1_18
Chapter Google Scholar
Tan, Y.K., Myreen, M.O., et al.: A new verified compiler backend for CakeML. In: Proceedings of 21st International Conference on Functional Programming, pp. 60–73. ACM (2016)
Google Scholar
Zhao, J., Nagarakatte, S., et al.: Formalizing the LLVM intermediate representation for verified program transformations. In: Proceedings of 39th ACM SIGPLAN-SIGACT Symposium on POPL, pp. 427–440. ACM (2012)
Google Scholar

Download references

Author information

Authors and Affiliations

Department of Computer Science, TU Darmstadt, Darmstadt, Germany
Dominic Steinhöfel & Reiner Hähnle

Authors

Dominic Steinhöfel
View author publications
You can also search for this author in PubMed Google Scholar
Reiner Hähnle
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Dominic Steinhöfel .

Editor information

Editors and Affiliations

University of Limerick, Limerick, Ireland
Tiziana Margaria
TU Dortmund, Dortmund, Germany
Bernhard Steffen

Appendices

Appendix

This appendix provides additional material for the reader, including a sketch of the integration of state merging techniques into our approach, an extended compilation example, and a proof of Theorem 1.

A State Merging

Our SE transition relation defined in Sect. 3 outputs tree structures: all states, but the root, have exactly one predecessor. State Merging [16] has originally been invented to mitigate the state explosion problem of SE. It permits to merge symbolic states with identical program counter as arising, for example, after the execution of the then and else branch of an statement, thus reducing the state space. Definition 2 can be extended to transition relations with state merging by applying the same conditions reversely for “merging” transitions:

Definition 8

(Soundness of SE with State Merging). Let $\delta _m$ be an SE transition relation with state merging (some states have more than one predecessor). We call $\delta _m$ sound iff the transition relation

$$ \begin{aligned} \delta ':=~&\{(i,o)\in \delta _m:\lnot \exists {}i'\ne {}i,(i',o)\in \delta _m\}\,\cup \\&\{(o,i):(i,o)\in \delta _m\wedge \exists {}i'\ne {}i,(i',o)\in \delta _m\} \end{aligned} $$

is sound in the sense of Definition 2.

Translation rule mergeTransl (Fig. 7) merges two branches with identical program counters. Without it, compiling programs with branching statements leads to duplicated code, for example, in rule ifElseTransl. The double horizontal line below the rule’s premise indicates its special status: it is the only rule with more than one conclusion. For constructing the merged symbolic store, there is more than one option [16]. In the simplest case, we create a fully precise value summary by means of an if-then-else term.

B Factorial Example

The program in Fig. 8 (p. x) computes the factorial of variable x, contained in variable res after termination. Figure 12 shows the SET of the dual SE states corresponding to the translation of the Java to LLVM IR code. Constructing the tree works by first symbolically executing the Java program. The Java SET has the same structure, without the LLVM IR parts and observable variables sets.

In the second phase, we apply translation rules from the leaves to the root, obtaining the dual SE states. The root of the tree contains the compiled LLVM IR program $q_ result $ depicted in Fig. 9 (with dropped loop scopes). The tree is slightly simplified: we sometimes abbreviate programs with “$\dots $” and combine several translation rules into one, for example, in the translation of Java’s , which in the course of SE would be simplified to SSA form first. Two subtrees have been factored out for better readability (Figures 10 and 11).

The example also shows an artifact occurring due to our rule-based compilation: The basic block with label is never reached since it is not targeted by jumps, and the block before jumps outside the loop (this is the translation of the instruction). The instruction has been added because of the general definition of the translation rule; if the else block did not jump out of the loop, it would also make sense. Removal of such dead code, as well as other optimizations, are subject to subsequent post-processing steps and not within the scope of this paper.

C Proofs

Proof

(Theorem 1). We must prove that a translation rule is sound (Definition 6) if $\mathcal {F}=\big (\bigwedge _{i=1,\dots ,m} F ' {( pr_i )}\big )\rightarrow F ' {(c)}$ is valid. By Proposition 1, a dual SE state is valid iff its justifying formula is valid, and the shape of $\mathcal {F}$ directly encodes Definition 6. Hence, it suffices to show that the validity of justifying formulas $ F ' {(s')}$ for abstract states $s'$ implies the validity of $ F {(s)} $ for all concrete instantiations s of $s'$. Let be an abstract SE state where , $C^a$ and are symbolic and $p_j^ abs $, $p_ IR ^ abs $ contain abstract placeholder symbols. An instantiation s has the shape , where $p_j$ and $p_ IR $ result from $p_j^ abs $ and $p_ IR ^ abs $ by replacing placeholders with concrete programs. We have to show the following implication:

Since the validity of implies the validity of , it suffices to show the following, stronger property:

Since abstract execution is sound (Observation 1), the premise of (*) holds for all concrete contracts substituted for the abstract ones induced by the placeholders in the programs; this means that in particular, we can substitute $p_j$/$p_ IR $ for $p_j^ abs $/$p_ IR ^ abs $. Let , be the updates resulting from SE of the concrete programs (if SE splits, we can obtain summaries by state merging, see Appendix A).

Now there are two possibilities: Either, is an atom, or it is a guarded conjunction (and s constitutes the conclusion of a dual SE rule application). We focus on the more complicated second case. Let be the (concrete) set of observable variables of s, $C_i$ the path conditions of the premisses of the rule that has s in its conclusion, and the variables occurring in any symbolic store or path condition of those premises. Then, after instantiation, (*) expands to:

Generally, can contain variables not occurring in , due to over-approximation. On the other hand, if contains variables not in , those are not in the scope of the currently investigated rule, and are therefore taken out of consideration here.^{Footnote 2} Additionally, we perform a strengthening by dropping the updates in the conclusion (this can be regarded as an abstraction step, since we discharge information). W.l.o.g., consider two different variables x, y and ; their right-hand sides in the updates and are , , etc. Under the assumption that x and y do not occur in the $C_i$ (see Remark 1), the problem simplifies to:

For simplicity, we assume that the terms do not contain program variables. The equivalence in ($\triangle $) is valid iff for all interpretations $\mathcal {I}$, it holds that

Let $\mathcal {I} _0$ be an arbitrary interpretation; we show that ($\Delta ^{ sem }$) holds for $\mathcal {I} _0$. All $C_i$ are mutually exclusive due to the restriction built into our semantics (Definition 2), i.e. $C_i\leftrightarrow \bigwedge _{k\ne i}\lnot C_k$ for all $i=1,\dots ,n$. Since ($\dagger $) is valid, for any interpretation $\mathcal {I}$ there is exactly one i such that

We choose an interpretation $\mathcal {I} _1$ that (i) interprets in the same way as $\mathcal {I} _0$, (ii) satisfies and, at the same time, (iii) satisfies ($\dagger ^{{sem}}$) for some i. By definition of $\mathcal {I} _1$ we have that implies , and, similarly, for , . Hence, ($\Delta ^{ sem }$) holds in $\mathcal {I} _0$. $\square $

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Steinhöfel, D., Hähnle, R. (2018). Modular, Correct Compilation with Automatic Soundness Proofs. In: Margaria, T., Steffen, B. (eds) Leveraging Applications of Formal Methods, Verification and Validation. Modeling. ISoLA 2018. Lecture Notes in Computer Science(), vol 11244. Springer, Cham. https://doi.org/10.1007/978-3-030-03418-4_25

Download citation

DOI: https://doi.org/10.1007/978-3-030-03418-4_25
Published: 29 October 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03417-7
Online ISBN: 978-3-030-03418-4
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics