Abstract Interpretation of LLVM with a Region-Based Memory Model

Gurfinkel, Arie; Navas, Jorge A.

doi:10.1007/978-3-030-95561-8_8

Arie Gurfinkel¹² &
Jorge A. Navas¹³

Part of the book series: Lecture Notes in Computer Science ((LNPSE,volume 13124))

Included in the following conference series:

573 Accesses
5 Citations

Abstract

Static analysis of low-level programs (C or LLVM) requires modeling memory. To strike a good balance between precision and performance, most static analyzers rely on the C memory model in which a pointer is a numerical offset within a memory object. Finite partitioning of the address space is a common abstraction. For instance, the allocation-site abstraction creates partitions by merging all objects created at the same allocation site. Recency abstraction refines the allocation-site abstraction by distinguishing the most recent allocated memory object from the previous ones. Unfortunately, these abstractions are not often precise enough to infer invariants that are expressed over the contents of dynamically allocated data-structures such as linked lists. In those cases, more expensive abstractions such as shapes that consider connectivity patterns between memory locations are often needed.

Instead of resorting to expensive memory abstractions, we propose a new memory model, called region-based memory model (RBMM). RBMM is a refinement of the C memory model in which pointers have an extra component called regions. Thus, a memory object can spawn multiple regions which can greatly limit aliasing since regions are pairwise disjoint. Since RBMM requires that each memory instruction refers explicitly to a region, we first present a new intermediate representation (IR) based on regions which is the input of our abstract interpreter Crab. Second, we show how abstractions such as allocation-site and recency can be easily adapted to RBMM. Third, we evaluate Crab using our new IR and a simple allocation-site abstraction on widely-used C projects.

Jorge A. Navas has been supported in part by NSF grant 1816936. We also acknowledge the support of the Natural Sciences and Engineering Research Council of Canada (NSERC).

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 44.99; Price excludes VAT (USA)

Softcover Book: USD 59.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
In this paper, we use the terms smashing and allocation-site interchangeably.
2.
Usually, the term reference is used as an alias for an address that precludes pointer arithmetic. We do not place such a restriction. We use reference as an alternative to pointer, to stress that a reference belongs to a statically known region.
3.
Memory is word-addressable if all reads and writes access to the same, fixed amount of bytes (the CPU word length). Note that any C program can be ported to word-addressable memory by adding extra instructions that fill a word with empty bytes before a memory write or extract some bytes from a word after a memory read.
4.
Informally, casting any pointer to a char* type is allowed but the opposite is not. Moreover, signedness and constness do not affect the strict aliasing rules.
5.
Crab proves the four properties using smashing abstraction and unrolling the loop one iteration. Finite loop unrolling would not help Infer and IKOS.
6.
For instance, integer casts and casts between Boolean and integers are omitted.
7.
Note that the cell \(\langle 0,0 \rangle \) can either mean the null reference or the integer 0.
8.
We abuse notation and use \([\![.]\!](.)\) to evaluate integer and Boolean expressions defined as usual: \([\![c]\!](\sigma ) = c\) for constant c, \([\![v]\!](\langle \_,\_,numEnv,\_,\_,\_\rangle ) = numEnv(v)\) for variable v, \([\![e_1 + e_2]\!](\sigma ) = [\![e_1]\!](\sigma ) + [\![e_2]\!](\sigma )\) for expressions \(e_1\) and \(e_2\), etc.
9.
We use the notation \(\lambda x. \bot \) to represent the undefined function.
10.
Publicly available at https://github.com/seahorn/crab.
11.
SeaDsa, Clam, and Crab are also integrated into the SeaHorn verification framework [13].
12.
Publicly available at https://github.com/seahorn/clam.
13.
https://github.com/google/brotli.

References

Balakrishnan, G., Reps, T.: Recency-abstraction for heap-allocated storage. In: Yi, K. (ed.) SAS 2006. LNCS, vol. 4134, pp. 221–239. Springer, Heidelberg (2006). https://doi.org/10.1007/11823230_15
Chapter Google Scholar
Blanchet, B., et al.: A static analyzer for large safety-critical software. In: PLDI, pp. 196–207 (2003)
Google Scholar
Blazy, S., Bühler, D., Yakobowski, B.: Structuring abstract interpreters through state and value abstractions. In: Bouajjani, A., Monniaux, D. (eds.) VMCAI 2017. LNCS, vol. 10145, pp. 112–130. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-52234-0_7
Chapter MATH Google Scholar
Bouillaguet, Q., Bobot, F., Sighireanu, M., Yakobowski, B.: Exploiting pointer analysis in memory models for deductive verification. In: Enea, C., Piskac, R. (eds.) VMCAI 2019. LNCS, vol. 11388, pp. 160–182. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-11245-5_8
Chapter MATH Google Scholar
Brat, G., Navas, J.A., Shi, N., Venet, A.: IKOS: a framework for static analysis based on abstract interpretation. In: Giannakopoulou, D., Salaün, G. (eds.) SEFM 2014. LNCS, vol. 8702, pp. 271–277. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10431-7_20
Chapter Google Scholar
Calcagno, C., Distefano, D.: Infer: an automatic program verifier for memory safety of C programs. In: Bobaru, M., Havelund, K., Holzmann, G.J., Joshi, R. (eds.) NFM 2011. LNCS, vol. 6617, pp. 459–465. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-20398-5_33
Chapter Google Scholar
Chow, F., Chan, S., Liu, S.-M., Lo, R., Streich, M.: Effective representation of aliases and indirect memory operations in SSA form. In: Gyimóthy, T. (ed.) CC 1996. LNCS, vol. 1060, pp. 253–267. Springer, Heidelberg (1996). https://doi.org/10.1007/3-540-61053-7_66
Chapter Google Scholar
Conway, C.L., Dams, D., Namjoshi, K.S., Barrett, C.: Pointer analysis, conditional soundness, and proving the absence of errors. In: Alpuente, M., Vidal, G. (eds.) SAS 2008. LNCS, vol. 5079, pp. 62–77. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-69166-2_5
Chapter MATH Google Scholar
Cornish, J.R.M., Gange, G., Navas, J.A., Schachte, P., Søndergaard, H., Stuckey, P.J.: Analyzing array manipulating programs by program transformation. In: Proietti, M., Seki, H. (eds.) LOPSTR 2014. LNCS, vol. 8981, pp. 3–20. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-17822-6_1
Chapter MATH Google Scholar
Cuoq, P., Kirchner, F., Kosmatov, N., Prevosto, V., Signoles, J., Yakobowski, B.: Frama-C: a software analysis perspective. In: Eleftherakis, G., Hinchey, M., Holcombe, M. (eds.) SEFM 2012. LNCS, vol. 7504, pp. 233–247. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33826-7_16
Chapter Google Scholar
Filliâtre, J.-C., Marché, C.: Multi-prover verification of C programs. In: Davies, J., Schulte, W., Barnett, M. (eds.) ICFEM 2004. LNCS, vol. 3308, pp. 15–29. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-30482-1_10
Chapter Google Scholar
Gopan, D.: Numeric program analysis techniques with applications to array analysis and library summarization. Ph.D. thesis, University of Wisconsin (2007)
Google Scholar
Gurfinkel, A., Kahsai, T., Komuravelli, A., Navas, J.A.: The SeaHorn verification framework. In: Kroening, D., Păsăreanu, C.S. (eds.) CAV 2015. LNCS, vol. 9206, pp. 343–361. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-21690-4_20
Chapter Google Scholar
Gurfinkel, A., Navas, J.A.: A context-sensitive memory model for verification of C/C++ programs. In: Ranzato, F. (ed.) SAS 2017. LNCS, vol. 10422, pp. 148–168. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66706-5_8
Chapter Google Scholar
Illous, H., Lemerre, M., Rival, X.: A relational shape abstract domain. In: Barrett, C., Davies, M., Kahsai, T. (eds.) NFM 2017. LNCS, vol. 10227, pp. 212–229. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-57288-8_15
Chapter Google Scholar
Jourdan, J., Laporte, V., Blazy, S., Leroy, X., Pichardie, D.: A formally-verified C static analyzer. In: POPL, pp. 247–259 (2015)
Google Scholar
Journault, M., Miné, A., Monat, R., Ouadjaout, A.: Combinations of reusable abstract domains for a multilingual static analyzer. In: Chakraborty, S., Navas, J.A. (eds.) VSTTE 2019. LNCS, vol. 12031, pp. 1–18. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-41600-3_1
Chapter Google Scholar
Journault, M., Miné, A., Ouadjaout, A.: Modular static analysis of string manipulations in C programs. In: Podelski, A. (ed.) SAS 2018. LNCS, vol. 11002, pp. 243–262. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-99725-4_16
Chapter Google Scholar
Kuderski, J., Navas, J.A., Gurfinkel, A.: Unification-based pointer analysis without oversharing. In: FMCAD, pp. 37–45 (2019)
Google Scholar
Miné, A.: Field-sensitive value analysis of embedded C programs with union types and pointer arithmetics. In: LCTES, pp. 54–63 (2006)
Google Scholar
Moy, Y.: Automatic modular static safety checking for C programs. Ph.D. thesis, Université Paris-Sud (2009)
Google Scholar
Rakamarić, Z., Emmi, M.: SMACK: decoupling source language details from verifier implementations. In: Biere, A., Bloem, R. (eds.) CAV 2014. LNCS, vol. 8559, pp. 106–113. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08867-9_7
Chapter Google Scholar
Rakamarić, Z., Hu, A.J.: A scalable memory model for low-level code. In: Jones, N.D., Müller-Olm, M. (eds.) VMCAI 2009. LNCS, vol. 5403, pp. 290–304. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-93900-9_24
Chapter Google Scholar
Sui, Y., Xue, J.: SVF: interprocedural static value-flow analysis in LLVM. In: CC, pp. 265–266 (2016)
Google Scholar
Venet, A.: A scalable nonuniform pointer analysis for embedded programs. In: Giacobazzi, R. (ed.) SAS 2004. LNCS, vol. 3148, pp. 149–164. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-27864-1_13
Chapter MATH Google Scholar
Wang, W., Barrett, C., Wies, T.: Cascade 2.0. In: McMillan, K.L., Rival, X. (eds.) VMCAI 2014. LNCS, vol. 8318, pp. 142–160. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-54013-4_9
Chapter Google Scholar
Warren, R., Hermenegildo, M.V., Debray, S.K.: On the practicality of global flow analysis of logic programs. In: ICLP, pp. 684–699 (1988)
Google Scholar

Download references

Author information

Authors and Affiliations

University of Waterloo, Waterloo, Canada
Arie Gurfinkel
SRI International, Menlo Park, USA
Jorge A. Navas

Authors

Arie Gurfinkel
View author publications
You can also search for this author in PubMed Google Scholar
Jorge A. Navas
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding authors

Correspondence to Arie Gurfinkel or Jorge A. Navas .

Editor information

Editors and Affiliations

Graz University of Technology, Graz, Austria
Roderick Bloem
CISPA, Helmholtz Center for Information Security, Saarbrücken, Germany
Rayna Dimitrova
Massachusetts Institute of Technology, Cambridge, MA, USA
Chuchu Fan
Faculty of Informatics, Università della Svizzera italiana, Lugano, Switzerland
Natasha Sharygina

Appendices

A Motivating Example from Sect. 2 in CrabIR

In this section, we show the translation of our motivating example from Sect. 2 to CrabIR. For readability purposes, variable names have been renamed to match those names used in the C program. We have also omitted types when they can be inferred from the context. We show two functions: mk_list in Fig. 11 and main in Fig. 12. The function init is translated in a very similar manner.

B Auxiliary Functions for Concrete Semantics

(See Fig. 14).

C Concrete Semantics of reftoint and inttoref

The concrete semantics of reftoint and inttoref is shown in Fig. 15. It is straightforward by delegating to the functions \(\mathsf {cell2Addr} \) and \(\mathsf {addr2Cell} \). The most interesting part occurs in \(ref := \texttt {inttoref}(rgn, x)\) where the program state (\(rgnAddrs\)) must add the new reference ref to the set of owned addresses by the region rgn.

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Gurfinkel, A., Navas, J.A. (2022). Abstract Interpretation of LLVM with a Region-Based Memory Model. In: Bloem, R., Dimitrova, R., Fan, C., Sharygina, N. (eds) Software Verification. NSV VSTTE 2021 2021. Lecture Notes in Computer Science(), vol 13124. Springer, Cham. https://doi.org/10.1007/978-3-030-95561-8_8

Download citation

DOI: https://doi.org/10.1007/978-3-030-95561-8_8
Published: 22 February 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-95560-1
Online ISBN: 978-3-030-95561-8
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Abstract Interpretation of LLVM with a Region-Based Memory Model