Skip to main content

Abstract Interpretation of LLVM with a Region-Based Memory Model

  • Conference paper
  • First Online:
Software Verification (NSV 2021, VSTTE 2021)

Abstract

Static analysis of low-level programs (C or LLVM) requires modeling memory. To strike a good balance between precision and performance, most static analyzers rely on the C memory model in which a pointer is a numerical offset within a memory object. Finite partitioning of the address space is a common abstraction. For instance, the allocation-site abstraction creates partitions by merging all objects created at the same allocation site. Recency abstraction refines the allocation-site abstraction by distinguishing the most recent allocated memory object from the previous ones. Unfortunately, these abstractions are not often precise enough to infer invariants that are expressed over the contents of dynamically allocated data-structures such as linked lists. In those cases, more expensive abstractions such as shapes that consider connectivity patterns between memory locations are often needed.

Instead of resorting to expensive memory abstractions, we propose a new memory model, called region-based memory model (RBMM). RBMM is a refinement of the C memory model in which pointers have an extra component called regions. Thus, a memory object can spawn multiple regions which can greatly limit aliasing since regions are pairwise disjoint. Since RBMM requires that each memory instruction refers explicitly to a region, we first present a new intermediate representation (IR) based on regions which is the input of our abstract interpreter Crab. Second, we show how abstractions such as allocation-site and recency can be easily adapted to RBMM. Third, we evaluate Crab using our new IR and a simple allocation-site abstraction on widely-used C projects.

Jorge A. Navas has been supported in part by NSF grant 1816936. We also acknowledge the support of the Natural Sciences and Engineering Research Council of Canada (NSERC).

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 44.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 59.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    In this paper, we use the terms smashing and allocation-site interchangeably.

  2. 2.

    Usually, the term reference is used as an alias for an address that precludes pointer arithmetic. We do not place such a restriction. We use reference as an alternative to pointer, to stress that a reference belongs to a statically known region.

  3. 3.

    Memory is word-addressable if all reads and writes access to the same, fixed amount of bytes (the CPU word length). Note that any C program can be ported to word-addressable memory by adding extra instructions that fill a word with empty bytes before a memory write or extract some bytes from a word after a memory read.

  4. 4.

    Informally, casting any pointer to a char* type is allowed but the opposite is not. Moreover, signedness and constness do not affect the strict aliasing rules.

  5. 5.

    Crab proves the four properties using smashing abstraction and unrolling the loop one iteration. Finite loop unrolling would not help Infer and IKOS.

  6. 6.

    For instance, integer casts and casts between Boolean and integers are omitted.

  7. 7.

    Note that the cell \(\langle 0,0 \rangle \) can either mean the null reference or the integer 0.

  8. 8.

    We abuse notation and use \([\![.]\!](.)\) to evaluate integer and Boolean expressions defined as usual: \([\![c]\!](\sigma ) = c\) for constant c, \([\![v]\!](\langle \_,\_,numEnv,\_,\_,\_\rangle ) = numEnv(v)\) for variable v, \([\![e_1 + e_2]\!](\sigma ) = [\![e_1]\!](\sigma ) + [\![e_2]\!](\sigma )\) for expressions \(e_1\) and \(e_2\), etc.

  9. 9.

    We use the notation \(\lambda x. \bot \) to represent the undefined function.

  10. 10.

    Publicly available at https://github.com/seahorn/crab.

  11. 11.

    SeaDsa, Clam, and Crab are also integrated into the SeaHorn verification framework [13].

  12. 12.

    Publicly available at https://github.com/seahorn/clam.

  13. 13.

    https://github.com/google/brotli.

References

  1. Balakrishnan, G., Reps, T.: Recency-abstraction for heap-allocated storage. In: Yi, K. (ed.) SAS 2006. LNCS, vol. 4134, pp. 221–239. Springer, Heidelberg (2006). https://doi.org/10.1007/11823230_15

    Chapter  Google Scholar 

  2. Blanchet, B., et al.: A static analyzer for large safety-critical software. In: PLDI, pp. 196–207 (2003)

    Google Scholar 

  3. Blazy, S., Bühler, D., Yakobowski, B.: Structuring abstract interpreters through state and value abstractions. In: Bouajjani, A., Monniaux, D. (eds.) VMCAI 2017. LNCS, vol. 10145, pp. 112–130. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-52234-0_7

    Chapter  MATH  Google Scholar 

  4. Bouillaguet, Q., Bobot, F., Sighireanu, M., Yakobowski, B.: Exploiting pointer analysis in memory models for deductive verification. In: Enea, C., Piskac, R. (eds.) VMCAI 2019. LNCS, vol. 11388, pp. 160–182. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-11245-5_8

    Chapter  MATH  Google Scholar 

  5. Brat, G., Navas, J.A., Shi, N., Venet, A.: IKOS: a framework for static analysis based on abstract interpretation. In: Giannakopoulou, D., Salaün, G. (eds.) SEFM 2014. LNCS, vol. 8702, pp. 271–277. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10431-7_20

    Chapter  Google Scholar 

  6. Calcagno, C., Distefano, D.: Infer: an automatic program verifier for memory safety of C programs. In: Bobaru, M., Havelund, K., Holzmann, G.J., Joshi, R. (eds.) NFM 2011. LNCS, vol. 6617, pp. 459–465. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-20398-5_33

    Chapter  Google Scholar 

  7. Chow, F., Chan, S., Liu, S.-M., Lo, R., Streich, M.: Effective representation of aliases and indirect memory operations in SSA form. In: Gyimóthy, T. (ed.) CC 1996. LNCS, vol. 1060, pp. 253–267. Springer, Heidelberg (1996). https://doi.org/10.1007/3-540-61053-7_66

    Chapter  Google Scholar 

  8. Conway, C.L., Dams, D., Namjoshi, K.S., Barrett, C.: Pointer analysis, conditional soundness, and proving the absence of errors. In: Alpuente, M., Vidal, G. (eds.) SAS 2008. LNCS, vol. 5079, pp. 62–77. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-69166-2_5

    Chapter  MATH  Google Scholar 

  9. Cornish, J.R.M., Gange, G., Navas, J.A., Schachte, P., Søndergaard, H., Stuckey, P.J.: Analyzing array manipulating programs by program transformation. In: Proietti, M., Seki, H. (eds.) LOPSTR 2014. LNCS, vol. 8981, pp. 3–20. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-17822-6_1

    Chapter  MATH  Google Scholar 

  10. Cuoq, P., Kirchner, F., Kosmatov, N., Prevosto, V., Signoles, J., Yakobowski, B.: Frama-C: a software analysis perspective. In: Eleftherakis, G., Hinchey, M., Holcombe, M. (eds.) SEFM 2012. LNCS, vol. 7504, pp. 233–247. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33826-7_16

    Chapter  Google Scholar 

  11. Filliâtre, J.-C., Marché, C.: Multi-prover verification of C programs. In: Davies, J., Schulte, W., Barnett, M. (eds.) ICFEM 2004. LNCS, vol. 3308, pp. 15–29. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-30482-1_10

    Chapter  Google Scholar 

  12. Gopan, D.: Numeric program analysis techniques with applications to array analysis and library summarization. Ph.D. thesis, University of Wisconsin (2007)

    Google Scholar 

  13. Gurfinkel, A., Kahsai, T., Komuravelli, A., Navas, J.A.: The SeaHorn verification framework. In: Kroening, D., Păsăreanu, C.S. (eds.) CAV 2015. LNCS, vol. 9206, pp. 343–361. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-21690-4_20

    Chapter  Google Scholar 

  14. Gurfinkel, A., Navas, J.A.: A context-sensitive memory model for verification of C/C++ programs. In: Ranzato, F. (ed.) SAS 2017. LNCS, vol. 10422, pp. 148–168. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66706-5_8

    Chapter  Google Scholar 

  15. Illous, H., Lemerre, M., Rival, X.: A relational shape abstract domain. In: Barrett, C., Davies, M., Kahsai, T. (eds.) NFM 2017. LNCS, vol. 10227, pp. 212–229. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-57288-8_15

    Chapter  Google Scholar 

  16. Jourdan, J., Laporte, V., Blazy, S., Leroy, X., Pichardie, D.: A formally-verified C static analyzer. In: POPL, pp. 247–259 (2015)

    Google Scholar 

  17. Journault, M., Miné, A., Monat, R., Ouadjaout, A.: Combinations of reusable abstract domains for a multilingual static analyzer. In: Chakraborty, S., Navas, J.A. (eds.) VSTTE 2019. LNCS, vol. 12031, pp. 1–18. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-41600-3_1

    Chapter  Google Scholar 

  18. Journault, M., Miné, A., Ouadjaout, A.: Modular static analysis of string manipulations in C programs. In: Podelski, A. (ed.) SAS 2018. LNCS, vol. 11002, pp. 243–262. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-99725-4_16

    Chapter  Google Scholar 

  19. Kuderski, J., Navas, J.A., Gurfinkel, A.: Unification-based pointer analysis without oversharing. In: FMCAD, pp. 37–45 (2019)

    Google Scholar 

  20. Miné, A.: Field-sensitive value analysis of embedded C programs with union types and pointer arithmetics. In: LCTES, pp. 54–63 (2006)

    Google Scholar 

  21. Moy, Y.: Automatic modular static safety checking for C programs. Ph.D. thesis, Université Paris-Sud (2009)

    Google Scholar 

  22. Rakamarić, Z., Emmi, M.: SMACK: decoupling source language details from verifier implementations. In: Biere, A., Bloem, R. (eds.) CAV 2014. LNCS, vol. 8559, pp. 106–113. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08867-9_7

    Chapter  Google Scholar 

  23. Rakamarić, Z., Hu, A.J.: A scalable memory model for low-level code. In: Jones, N.D., Müller-Olm, M. (eds.) VMCAI 2009. LNCS, vol. 5403, pp. 290–304. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-93900-9_24

    Chapter  Google Scholar 

  24. Sui, Y., Xue, J.: SVF: interprocedural static value-flow analysis in LLVM. In: CC, pp. 265–266 (2016)

    Google Scholar 

  25. Venet, A.: A scalable nonuniform pointer analysis for embedded programs. In: Giacobazzi, R. (ed.) SAS 2004. LNCS, vol. 3148, pp. 149–164. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-27864-1_13

    Chapter  MATH  Google Scholar 

  26. Wang, W., Barrett, C., Wies, T.: Cascade 2.0. In: McMillan, K.L., Rival, X. (eds.) VMCAI 2014. LNCS, vol. 8318, pp. 142–160. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-54013-4_9

    Chapter  Google Scholar 

  27. Warren, R., Hermenegildo, M.V., Debray, S.K.: On the practicality of global flow analysis of logic programs. In: ICLP, pp. 684–699 (1988)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding authors

Correspondence to Arie Gurfinkel or Jorge A. Navas .

Editor information

Editors and Affiliations

Appendices

A Motivating Example from Sect. 2 in CrabIR

In this section, we show the translation of our motivating example from Sect. 2 to CrabIR. For readability purposes, variable names have been renamed to match those names used in the C program. We have also omitted types when they can be inferred from the context. We show two functions: mk_list in Fig. 11 and main in Fig. 12. The function init is translated in a very similar manner.

Fig. 11.
figure 11

Simplified version of CrabIR code for mk_list function.

Fig. 12.
figure 12

Simplified version of CrabIR code for main.

B Auxiliary Functions for Concrete Semantics

(See Fig. 14).

Fig. 13.
figure 13

Conversion between addresses and cells.

Fig. 14.
figure 14

Helpers for concrete semantics.

C Concrete Semantics of reftoint and inttoref

The concrete semantics of reftoint and inttoref is shown in Fig. 15. It is straightforward by delegating to the functions \(\mathsf {cell2Addr} \) and \(\mathsf {addr2Cell} \). The most interesting part occurs in \(ref := \texttt {inttoref}(rgn, x)\) where the program state (\(rgnAddrs\)) must add the new reference ref to the set of owned addresses by the region rgn.

Fig. 15.
figure 15

Concrete Semantics for reftoint and inttoref.

Rights and permissions

Reprints and permissions

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Gurfinkel, A., Navas, J.A. (2022). Abstract Interpretation of LLVM with a Region-Based Memory Model. In: Bloem, R., Dimitrova, R., Fan, C., Sharygina, N. (eds) Software Verification. NSV VSTTE 2021 2021. Lecture Notes in Computer Science(), vol 13124. Springer, Cham. https://doi.org/10.1007/978-3-030-95561-8_8

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-95561-8_8

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-95560-1

  • Online ISBN: 978-3-030-95561-8

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics