1 Introduction

Memory safety, and the vulnerabilities that follow from its absence [43], are common concerns. So what is it, exactly? Intuitions abound, but translating them into satisfying formal definitions is surprisingly difficult [20].

In large part, this difficulty stems from the prominent role that informal, everyday intuition assigns, in discussions of memory safety, to a range of errors related to memory misuse—buffer overruns, double frees, etc. Characterizing memory safety in terms of the absence of these errors is tempting, but this falls short for two reasons. First, there is often disagreement on which behaviors qualify as errors. For example, many real-world C programs intentionally rely on unrestricted pointer arithmetic [28], though it may yield undefined behavior according to the language standard [21, Sect. 6.5.6]. Second, from the perspective of security, the critical issue is not the errors themselves, but rather the fact that, when they occur in unsafe languages like C, the program’s ensuing behavior is determined by obscure, low-level factors such as the compiler’s choice of run-time memory layout, often leading to exploitable vulnerabilities. By contrast, in memory-safe languages like Java, programs can attempt to access arrays out of bounds, but such mistakes lead to sensible, predictable outcomes.

Rather than attempting a definition in terms of bad things that cannot happen, we aim to formalize memory safety in terms of reasoning principles that programmers can soundly apply in its presence (or conversely, principles that programmers should not naively apply in unsafe settings, because doing so can lead to serious bugs and vulnerabilities). Specifically, to give an account of memory safety, as opposed to more inclusive terms such as “type safety,” we focus on reasoning principles that are common to a wide range of stateful abstractions, such as records, tagged or untagged unions, local variables, closures, arrays, call stacks, objects, compartments, and address spaces.

What sort of reasoning principles? Our inspiration comes from separation logic [36], a variant of Hoare logic designed to verify complex heap-manipulating programs. The power of separation logic stems from local reasoning about state: to prove the correctness of a program component, we must argue that its memory accesses are confined to a footprint, a precise region demarcated by the specification. This discipline allows proofs to ignore regions outside of the footprint, while ensuring that arbitrary invariants for these regions are preserved during execution.

The locality of separation logic is deeply linked to memory safety. Consider a hypothetical jpeg decoding procedure that manipulates image buffers. We might expect its execution not to interfere with the integrity of an unrelated window object in the program. We can formalize this requirement in separation logic by proving a specification that includes only the image buffers, but not the window, in the decoder’s footprint. Showing that the footprint is respected would amount to checking the bounds of individual buffer accesses, thus enforcing memory safety; conversely, if the decoder is not memory safe, a simple buffer overflow might suffice to tamper with the window object, thus violating locality and potentially paving the way to an attack.

Our aim is to extend this line of reasoning beyond conventional separation logic, encompassing settings such as ML, Java, or Lisp that enforce memory safety automatically without requiring complete correctness proofs, which can be prohibitively expensive for large code bases, especially in the presence of third-party libraries or plugins over which we have little control. The key observation is that memory safety forces code to respect a natural footprint: the set of its reachable memory locations (reachable with respect to the variables it mentions). Suppose that the jpeg decoder above is written in Java. Though we may not know much about its input-output behavior, we can still assert that it cannot have any effect on the window object simply by replacing the detailed reasoning demanded by separation logic with a simple inaccessibility check.

Our first contribution is to formalize local reasoning principles supported by an ideal notion of memory safety, using a simple language (Sect. 2) to ground our discussion. We show three results (Theorems 1, 3 and 4) that explain how the execution of a piece of code is affected by extending its initial heap. These results lead to a noninterference property (Corollary 1), ensuring that code cannot affect or be affected by unreachable memory. In Sect. 3.3, we show how these results yield a variant of the frame rule of separation logic (Theorem 6), which embodies its local reasoning capabilities. The two variants have complementary strengths and weaknesses: the original rule applies to unsafe settings like C but requires comprehensively verifying individual memory accesses, whereas our variant does not require proving that every access is correct but demands a stronger notion of separation between memory regions. These results have been verified with the Coq proof assistant.

Our second contribution (Sect. 4) is to evaluate pragmatically motivated relaxations of the ideal notion above, exploring various trade-offs between safety, performance, flexibility, and backwards compatibility. These variants can be broadly classified into two groups according to the reasoning principles they support. The stronger group gives up on some secrecy guarantees, but still ensures that pieces of code cannot modify the contents of unreachable parts of the heap. The weaker group, on the other hand, leaves gaps that completely invalidate reachability-based reasoning.

Our third contribution (Sect. 5) is to demonstrate how our characterization applies to more realistic settings, by analyzing a heap-safety monitor for machine code [5, 15]. We prove that the abstract machine that it implements also satisfies a noninterference property, which can be transferred to the monitor via refinement, modulo memory exhaustion issues discussed in Sect. 4. These proofs are also done in Coq.

We discuss related work on memory safety and stronger reasoning principles in Sect. 6, and conclude in Sect. 7. While memory safety has seen prior formal investigation (e.g. [31, 41]), our characterization is the first phrased in terms of reasoning principles that are valid when memory safety is enforced automatically. We hope that these principles can serve as good criteria for formally evaluating such enforcement mechanisms in practice. Moreover, our definition is self-contained and does not rely on additional features such as full-blown capabilities, objects, module systems, etc. Since these features tend to depend on some form of memory safety anyway, we could see our characterization as a common core of reasoning principles that underpin all of them.

2 An Idealized Memory-Safe Language

Our discussion begins with a concrete case study: a simple imperative language with manual memory management. It features several mechanisms for controlling the effects of memory misuse, ranging from the most conventional, such as bounds checking for spatial safety, to more uncommon ones, such as assigning unique identifiers to every allocated block for ensuring temporal safety.

Choosing a language with manual memory management may seem odd, since safety is often associated with garbage collection. We made this choice for two reasons. First, most discussions on memory safety are motivated by its absence from languages like C that also rely on manual memory management. There is a vast body of research that tries to make such languages safer, and we would like our account to apply to it. Second, we wanted to stress that our characterization does not depend fundamentally on the mechanisms used to enforce memory safety, especially because they might have complementary advantages and shortcomings. For example, manual memory management can lead to more memory leaks; garbage collectors can degrade performance; and specialized type systems for managing memory [37, 41] are more complex. After a brief overview of the language, we explore its reasoning principles in Sect. 3.

Fig. 1. Syntax, states and values

Figure 1 summarizes the language syntax and other basic definitions. Expressions e include variables \(x \in \mathsf {var}\), numbers \(n \in \mathbb {Z}\), booleans \(b \in \mathbb {B}\), an invalid pointer \(\mathsf {nil}\), and various operations, both binary (arithmetic, logic, etc.) and unary (extracting the offset of a pointer). We write [e] for dereferencing the pointer denoted by e.

Programs operate on states consisting of two components: a local store, which maps variables to values, and a heap, which maps pointers to values. Pointers are not bare integers, but rather pairs \((i, n)\) of a block identifier \(i \in \mathbb {I}\) and an offset \(n \in \mathbb {Z}\). The offset is relative to the corresponding block, and the identifier i need not bear any direct relation to the physical address that might be used in a concrete implementation on a conventional machine. (That is, we can equivalently think of the heap as mapping each identifier to a separate array of heap cells.) Similar structured memory models are widely used in the literature, as in the CompCert verified C compiler [26] and other models of the C language [23], for instance.

We write \([\![c ]\!](s)\) to denote the outcome of running a program c in an initial state s, which can be either a successful final state \(s'\) or a fatal run-time error. Note that \([\![c ]\!]\) is partial, to account for non-termination. Similarly, \([\![e ]\!](s)\) denotes the result of evaluating the expression e on the state s (expression evaluation is total and has no side effects). The formal definition of these functions is left to the Appendix; we just single out a few aspects that have a crucial effect on the security properties discussed later.

Illegal Memory Accesses Lead to Errors. The language controls the effect of memory misuse by raising errors that stop execution immediately. This contrasts with typical C implementations, where such errors lead to unpredictable undefined behavior. The main errors are caused by reads, writes, and frees to the current memory m using invalid pointers—that is, pointers p such that m(p) is undefined. Such pointers typically arise by offsetting an existing pointer out of bounds or by freeing a structure on the heap (which turns all other pointers to that block in the program state into dangling ones). In common parlance, this discipline ensures both spatial and temporal memory safety.
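The following Python sketch illustrates this error discipline on a block-structured heap. It is our own illustration, not the paper's Coq definition: pointers are modeled as `Ptr(block, offset)` values, the heap as a dictionary from identifiers to Python lists, and the fatal run-time error as a hypothetical `MemError` exception. Out-of-bounds offsets (spatial errors) and accesses through pointers to freed blocks (temporal errors) are rejected by the same validity check.

```python
from dataclasses import dataclass


class MemError(Exception):
    """Hypothetical stand-in for the language's fatal run-time error."""


@dataclass(frozen=True)
class Ptr:
    block: int   # block identifier i (opaque to programs)
    offset: int  # offset n, relative to the start of block i


class Heap:
    """Sketch of a heap that maps each identifier to a separate array of cells."""

    def __init__(self):
        self.blocks = {}  # block identifier -> list of cell values

    def _cells(self, p):
        # Valid only if the block is still allocated and the offset is in bounds;
        # otherwise the access fails immediately instead of touching another block.
        cells = self.blocks.get(p.block)
        if cells is None or not 0 <= p.offset < len(cells):
            raise MemError(f"invalid pointer {p}")
        return cells

    def load(self, p):
        return self._cells(p)[p.offset]

    def store(self, p, v):
        self._cells(p)[p.offset] = v

    def free(self, p):
        # Removing the whole block turns every other pointer into it into a dangling one.
        self._cells(p)
        del self.blocks[p.block]
```

Because `free` deletes the entry for the whole block, a later `load` through any pointer into it fails the same check as an out-of-bounds access, rather than reading whatever the memory now holds.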

Block Identifiers are Capabilities. Pointers can only be used to access memory corresponding to their identifiers, which effectively act as capabilities. Identifiers are set at allocation time, where they are chosen to be fresh with respect to the entire current state (i.e., the new identifier is not associated with any pointers defined in the current memory, stored in local variables, or stored on the heap). Once assigned, identifiers are immutable, making it impossible to fabricate a pointer to an allocated block out of thin air. This can be seen, for instance, in the semantics of addition, which allows pointer arithmetic but does not affect identifiers:

$$\begin{aligned}{}[\![e_1 + e_2 ]\!](s)&\triangleq {\left\{ \begin{array}{ll} n_1 + n_2 &{} \text {if }[\![e_i]\!](s) = n_i \\ (i,n_1+n_2) &{} \text {if }[\![e_1]\!](s) = (i,n_1)\text { and }[\![e_2]\!](s) = n_2 \\ \mathsf {nil}&{} \text {otherwise} \end{array}\right. } \end{aligned}$$

For simplicity, nonsensical combinations such as adding two pointers simply result in the \(\mathsf {nil}\) value. A real implementation might represent identifiers with hardware tags and use an increasing counter to generate identifiers for new blocks (as done by Dhawan et al. [15]; see Sect. 5.1); if enough tags are available, every identifier will be fresh.
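As a sanity check on the displayed equation, here is a direct transcription of its three cases into Python. It assumes a hypothetical representation in which pointers are `Ptr(block, offset)` frozen dataclasses and `None` encodes \(\mathsf {nil}\).

```python
from dataclasses import dataclass

NIL = None  # hypothetical encoding of the invalid pointer nil


@dataclass(frozen=True)
class Ptr:
    block: int
    offset: int


def is_int(v):
    # Python's bool is a subclass of int, so booleans must be excluded explicitly.
    return isinstance(v, int) and not isinstance(v, bool)


def eval_add(v1, v2):
    """The three cases of [[e1 + e2]](s), applied to already-evaluated operands."""
    if is_int(v1) and is_int(v2):
        return v1 + v2
    if isinstance(v1, Ptr) and is_int(v2):
        # Pointer arithmetic moves the offset but never changes the identifier.
        return Ptr(v1.block, v1.offset + v2)
    return NIL  # nonsensical combinations, e.g. adding two pointers
```

Read literally, the equation only produces a pointer for pointer plus integer in that order; integer plus pointer falls into the last case and yields \(\mathsf {nil}\), as in the sketch. In no case does the result carry an identifier other than that of an operand.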

Block Identifiers Cannot be Observed. Because of the freshness condition above, identifiers can reveal information about the entire program state. For example, if they are chosen according to an increasing counter, knowing what identifier was assigned to a new block tells us how many allocations have been performed. A concrete implementation would face similar issues related to the choice of physical addresses for new allocations. (Such issues are commonplace in systems that combine dynamic allocation and information-flow control [12].) For this reason, our language keeps identifiers opaque and inaccessible to programs; they can only be used to reference values in memory, and nothing else. We discuss a more permissive approach in Sect. 4.2.

Note that hiding identifiers doesn’t mean we have to hide everything associated with a pointer: besides using pointers to access memory, programs can also safely extract their offsets and test if two pointers are equal (which means equality for both offsets and identifiers). Our Coq development also shows that it is sound to compute the size of a memory block via a valid pointer.

New Memory is Always Initialized. Whenever a memory block is allocated, all of its contents are initialized to 0. (The exact value does not matter, as long as it is some constant that is not a previously allocated pointer.) This is important for ensuring that allocation does not leak secrets present in previously freed blocks; we return to this point in Sect. 4.3.
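The freshness requirement of the capability discipline and the zero-initialization rule can be combined into a single allocation function. The sketch below is hypothetical: it picks the smallest identifier not mentioned anywhere in the state (one arbitrary way to satisfy freshness, unlike the counter-based scheme mentioned earlier), and it scans local variables and heap contents as well as allocated blocks, so a dangling pointer still prevents reuse of its identifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ptr:
    block: int
    offset: int


def ids_of_value(v):
    return {v.block} if isinstance(v, Ptr) else set()


def ids_of_state(store, heap):
    """All identifiers mentioned anywhere in the state: allocated blocks,
    pointers held in local variables, and pointers stored on the heap."""
    ids = set(heap)
    for v in store.values():
        ids |= ids_of_value(v)
    for cells in heap.values():
        for v in cells:
            ids |= ids_of_value(v)
    return ids


def alloc(store, heap, size):
    """Allocate `size` cells, all initialized to 0, under a fresh identifier."""
    used = ids_of_state(store, heap)
    fresh = 0
    while fresh in used:  # hypothetical policy: smallest unused identifier
        fresh += 1
    heap[fresh] = [0] * size  # no contents survive from any earlier block
    return Ptr(fresh, 0)
```

For example, if block 0 was freed but a local variable still holds `Ptr(0, 0)`, the next allocation receives identifier 1, so the stale pointer cannot suddenly refer to the new block.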

3 Reasoning with Memory Safety

Having presented our language, we now turn to the reasoning principles that it supports. Intuitively, these principles allow us to analyze the effect of a piece of code by restricting our attention to a smaller portion of the program state. A first set of frame theorems (1, 3, and 4) describes how the execution of a piece of code is affected by extending the initial state on which it runs. These in turn imply a noninterference property, Corollary 1, guaranteeing that program execution is independent of inaccessible memory regions—that is, those that correspond to block identifiers that a piece of code does not possess. Finally, in Sect. 3.3, we discuss how the frame theorems can be recast in the language of separation logic, leading to a new variant of its frame rule (Theorem 6).

Fig. 2. Basic notation

3.1 Basic Properties of Memory Safety

Figure 2 summarizes basic notation used in our results. By permutation, we mean a function \(\pi : \mathbb {I}\rightarrow \mathbb {I}\) that has a two-sided inverse \(\pi ^{-1}\); that is, \(\pi \circ \pi ^{-1} = \pi ^{-1} \circ \pi = \mathsf {id}_{\mathbb {I}}\). Some of these operations are standard and omitted for brevity.

The first frame theorem states that, if a program terminates successfully, then we can extend its initial state almost without affecting execution.

Theorem 1

(Frame OK). Let c be a command, and \(s_1\), \(s_1'\), and \(s_2\) be states. Suppose that \([\![c]\!](s_1) = s_1'\), \(\mathsf {vars}(c) \subseteq \mathsf {vars}(s_1)\), and \(\mathsf {blocks}(s_1) \mathrel {\#}\mathsf {blocks}(s_2)\). Then there exists a permutation \(\pi \) such that \([\![c]\!](s_1 \cup s_2) = \pi \cdot s_1' \cup s_2\) and \(\mathsf {blocks}(\pi \cdot s_1') \mathrel {\#}\mathsf {blocks}(s_2)\).

The second premise, \(\mathsf {vars}(c) \subseteq \mathsf {vars}(s_1)\), guarantees that all the variables needed to run c are already defined in \(s_1\), implying that their values do not change once we extend that initial state with \(s_2\). The third premise, \(\mathsf {blocks}(s_1) \mathrel {\#}\mathsf {blocks}(s_2)\), means that the memories of \(s_1\) and \(s_2\) store disjoint regions. Finally, the conclusion of the theorem states that (1) the execution of c does not affect the extra state \(s_2\) and (2) the rest of the result is almost the same as \(s_1'\), except for a permutation of block identifiers.

Permutations are needed to avoid clashes between identifiers in \(s_2\) and those assigned to regions allocated by c when running on \(s_1\). For instance, suppose that the execution of c on \(s_1\) allocated a new block, and that this block was assigned some identifier \(i \in \mathbb {I}\). If the memory of \(s_2\) already had a block corresponding to i, c would have to choose a different identifier \(i'\) for allocating that block when running on \(s_1 \cup s_2\). This change requires replacing all occurrences of i by \(i'\) in the result of the first execution, which can be achieved with a permutation that swaps these two identifiers.
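Concretely, applying a permutation to a state means renaming every identifier it contains: the keys of the heap, and the identifiers inside pointers held in local variables or stored on the heap. Offsets are untouched. A minimal Python sketch, under our own state encoding and using a hypothetical helper `swap(i, j)` (a transposition, which is its own inverse) as the permutation:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ptr:
    block: int
    offset: int


def swap(i, j):
    """The transposition exchanging identifiers i and j; it is its own inverse."""
    return lambda k: j if k == i else i if k == j else k


def rename_value(pi, v):
    return Ptr(pi(v.block), v.offset) if isinstance(v, Ptr) else v


def rename_state(pi, store, heap):
    """pi . s: rename identifiers in local variables, heap keys and heap contents."""
    new_store = {x: rename_value(pi, v) for x, v in store.items()}
    # pi is a bijection, so distinct heap keys stay distinct after renaming.
    new_heap = {pi(i): [rename_value(pi, v) for v in cells] for i, cells in heap.items()}
    return new_store, new_heap
```

If the run on \(s_1\) allocated block 0 and stored a pointer to it in x, but \(s_2\) already owns block 0, the run on \(s_1 \cup s_2\) must use another identifier, say 1; `rename_state(swap(0, 1), ...)` then maps the first result onto the second.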

The proof of Theorem 1 relies crucially on the facts that programs cannot inspect identifiers, that memory can grow indefinitely (a common assumption in formal models of memory), and that memory operations fail on invalid pointers. Because of the permutations, we also need to show that permuting the initial state s of a command c with any permutation \(\pi \) yields the same outcome, up to some additional permutation \(\pi '\) that again accounts for different choices of fresh identifiers.

Theorem 2

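(Renaming states). Let s be a state, c a command, and \(\pi \) a permutation. There exists \(\pi '\) such that:

$$ [\![c]\!](\pi \cdot s) = {\left\{ \begin{array}{ll} \bot &{} \text {if }[\![c]\!](s) = \bot \\ \mathsf {error} &{} \text {if }[\![c]\!](s) = \mathsf {error} \\ \pi ' \cdot s' &{} \text {if }[\![c]\!](s) = s' \end{array}\right. } $$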

A similar line of reasoning yields a second frame theorem, which says that we cannot make a program terminate just by extending its initial state.

Theorem 3

(Frame Loop). Let c be a command, and \(s_1\) and \(s_2\) be states. If \([\![c]\!](s_1) = \bot \), \(\mathsf {vars}(c) \subseteq \mathsf {vars}(s_1)\), and \(\mathsf {blocks}(s_1) \mathrel {\#}\mathsf {blocks}(s_2)\), then \([\![c]\!](s_1 \cup s_2) =~\bot \).

The third frame theorem shows that extending the initial state also preserves erroneous executions. Its statement is similar to the previous ones, but with a subtle twist. In general, by extending the state of a program with a block, we might turn an erroneous execution into a successful one—if the error was caused by accessing a pointer whose identifier matches that new block. To avoid this, we need a different premise (\(\mathsf {ids}(s_1) \mathrel {\#}\mathsf {blocks}(s_2)\)) preventing any pointers in the original state \(s_1\) from referencing the new blocks in \(s_2\)—which is only useful because our language prevents programs from forging pointers to existing regions. Since \(\mathsf {blocks}(s) \subseteq \mathsf {ids}(s)\), this premise is stronger than the analogous ones in the preceding results.

Theorem 4

(Frame Error). Let c be a command, and \(s_1\) and \(s_2\) be states. If \([\![c]\!](s_1) = \mathsf {error}\), \(\mathsf {vars}(c)\subseteq \mathsf {vars}(s_1)\), and \(\mathsf {ids}(s_1) \mathrel {\#}\mathsf {blocks}(s_2)\), then \([\![c]\!](s_1 \cup s_2) = \mathsf {error}\).
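The difference between the premises of Theorems 1 and 4 shows up as soon as \(s_1\) contains a dangling pointer. The sketch below computes \(\mathsf {blocks}\) and \(\mathsf {ids}\) for states encoded (our assumption) as a store dictionary plus a heap dictionary. With a dangling pointer into block 7 in \(s_1\), adding a block 7 in \(s_2\) satisfies the weaker premise but not the stronger one. This is exactly the situation where an erroneous read through that pointer could become a successful one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ptr:
    block: int
    offset: int


def blocks(heap):
    """blocks(s): identifiers of the blocks actually allocated in the heap."""
    return set(heap)


def ids(store, heap):
    """ids(s): allocated identifiers plus those of every pointer in the state,
    including dangling pointers whose block is not allocated."""
    result = set(heap)
    values = list(store.values()) + [v for cells in heap.values() for v in cells]
    result.update(v.block for v in values if isinstance(v, Ptr))
    return result


def disjoint(a, b):
    return a.isdisjoint(b)
```

Since \(\mathsf {blocks}(s) \subseteq \mathsf {ids}(s)\) always holds, any \(s_2\) satisfying the Theorem 4 premise also satisfies the premise of Theorems 1 and 3, but not conversely.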

3.2 Memory Safety and Noninterference

The consequences of memory safety analyzed so far are intimately tied to the notion of noninterference [19]. In its most widely understood sense, noninterference is a secrecy guarantee: varying secret inputs has no effect on public outputs. Sometimes, however, it is also used to describe integrity guarantees: low-integrity inputs have no effect on high-integrity outputs. In fact, both guarantees apply to unreachable memory in our language, since such memory and code execution do not affect each other; that is, execution (1) cannot modify these inaccessible regions (preserving their integrity), and (2) cannot learn anything meaningful about them, not even their presence (preserving their secrecy).

Corollary 1

(Noninterference). Let \(s_1\), \(s_{21}\), and \(s_{22}\) be states and c be a command. Suppose that \(\mathsf {vars}(c) \subseteq \mathsf {vars}(s_1)\), that \(\mathsf {ids}(s_1) \mathrel {\#}\mathsf {blocks}(s_{21})\) and that \(\mathsf {ids}(s_1) \mathrel {\#}\mathsf {blocks}(s_{22})\). When running c on the extended states \(s_1 \cup s_{21}\) and \(s_1 \cup s_{22}\), exactly one of the following three possibilities holds: (1) both executions loop (\([\![c]\!](s_1 \cup s_{21}) = [\![c]\!](s_1 \cup s_{22}) = \bot \)); (2) both executions terminate with an error (\([\![c]\!](s_1 \cup s_{21}) = [\![c]\!](s_1 \cup s_{22}) = \mathsf {error}\)); or (3) both executions successfully terminate without interfering with the inaccessible portions \(s_{21}\) and \(s_{22}\) (formally, there exists a state \(s_1'\) and permutations \(\pi _1\) and \(\pi _2\) such that \([\![c]\!](s_1 \cup s_{2i}) = \pi _i \cdot s_1' \cup s_{2i}\) and \(\mathsf {ids}(\pi _i \cdot s_1') \mathrel {\#}\mathsf {blocks}(s_{2i})\), for \(i = 1, 2\)).

Noninterference is often formulated using an indistinguishability relation on states, which expresses that one state can be obtained from the other by varying its secrets. We could have equivalently phrased the above result in a similar way. Recall that the hypothesis \(\mathsf {ids}(s_1) \mathrel {\#}\mathsf {blocks}(s_2)\) means that memory regions stored in \(s_2\) are unreachable via \(s_1\). Then, we could call two states “indistinguishable” if the reachable portions are the same (except for a possible permutation). In Sect. 4, the connection with noninterference will provide a good benchmark for comparing different flavors of memory safety.

3.3 Memory Safety and Separation Logic

We now explore the relation between the principles identified above, especially regarding integrity, and the local reasoning facilities of separation logic. Separation logic targets specifications of the form \(\{p\}\; c\; \{q\}\), where p and q are predicates over program states (subsets of \(\mathcal {S}\)). For our language, this could roughly mean

$$\begin{aligned} \forall s \in p,&\, \mathsf {vars}(c) \subseteq \mathsf {vars}(s) \Rightarrow [\![c]\!](s) \in q \cup \{\bot \}. \end{aligned}$$

That is, if we run c in a state satisfying p, it will either diverge or terminate in a state that satisfies q, but it will not trigger an error. Part of the motivation for precluding errors is that in unsafe settings like C they yield undefined behavior, destroying all hope of verification.

Local reasoning in separation logic is embodied by the frame rule, a consequence of Theorems 1 and 3. Roughly, it says that a verified program can only affect a well-defined portion of the state, with all other memory regions left untouched.

Theorem 5

Let p, q, and r be predicates over states and c be a command. The rule

$$ \frac{\mathsf {independent}(r, \mathsf {modvars}(c)) \qquad \{p\}\; c\; \{q\}}{\{p * r\}\; c\; \{q * r\}} $$

is sound, where \(\mathsf {modvars}(c)\) is the set of local variables modified by c, \(\mathsf {independent}(r, V)\) means that the assertion r does not depend on the set of local variables V, that is,

$$ \forall l_1\,l_2\,m, (\forall x \notin V, \; l_1(x) = l_2(x)) \Rightarrow (l_1, m) \in r \Rightarrow (l_2, m) \in r, $$

and \(p * r\) denotes the separating conjunction of p and r:

$$ \{ (l, m_1 \cup m_2) \mid (l, m_1) \in p, (l, m_2) \in r, \mathsf {blocks}(l, m_1) \mathrel {\#}\mathsf {blocks}(l, m_2) \}. $$

As useful as it is, precluding errors during execution makes it difficult to use separation logic for partial verification: proving any property, no matter how simple, of a nontrivial program requires detailed reasoning about its internals. Even the following vacuous rule is unsound in separation logic:

$$ \{p\}\; c\; \{\mathsf {true}\} \qquad (\textsf {Taut}) $$

For a counterexample, take p to be \(\mathsf {true}\) and c to be some arbitrary memory read \(x \leftarrow [y]\). If we run c on an empty heap, which trivially satisfies the precondition, we obtain an error, contradicting the specification.

Fortunately, our memory-safe language—in which errors have a sensible, predictable semantics, as opposed to wild undefined behavior—supports a variant of separation logic that allows looser specifications of the form \(\{p\}\; c\; \{q\}_e\), defined as

$$\begin{aligned} \forall s \in p,&\, \mathsf {vars}(c) \subseteq \mathsf {vars}(s) \Rightarrow [\![c]\!](s) \in q \cup \{\bot , \mathsf {error}\}. \end{aligned}$$

These specifications are weaker than their conventional counterparts, leading to a subsumption rule:

$$ \frac{\{p\}\; c\; \{q\}}{\{p\}\; c\; \{q\}_e} $$

Because errors are no longer prevented, the Taut rule \(\{p\}\; c\; \{\mathsf {true}\}_e\) becomes sound, since the \(\mathsf {true}\) postcondition now means that any outcome whatsoever is acceptable. Unfortunately, there is a price to pay for allowing errors: they compromise the soundness of the frame rule. The reason, as hinted in the introduction, is that preventing run-time errors has an additional purpose in separation logic: it forces programs to act locally, that is, to access only the memory delimited by their pre- and postconditions. To see why, consider the same program c as above, \(x \leftarrow [y]\). This program clearly yields an error when run on an empty heap, implying that the triple \(\{\mathsf {emp}\}\; c\; \{x = 0\}_e\) is valid, where the predicate \(\mathsf {emp}\) holds of any state with an empty heap and \(x = 0\) holds of states whose local store maps x to 0. Now consider what happens if we try to apply an analog of the frame rule to this triple using the frame predicate \(y \mapsto 1\), which holds in states where y contains a pointer to the only allocated location on the heap, and that location stores the value 1. After some simplification, we arrive at the specification \(\{y \mapsto 1\}\; c\; \{x = 0 \wedge y \mapsto 1\}_e\), which clearly does not hold, since executing c on a state satisfying the precondition leads to a successful final state mapping x to 1.
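The counterexample can be replayed mechanically. The Python sketch below hard-codes the command \(x \leftarrow [y]\) as a function `read` over a (store, heap) pair, returns the string `"error"` for a fatal error, and checks the error-tolerant postcondition on single states. It tests individual instances of the triples rather than proving them, and its encodings of \(x = 0\) and \(y \mapsto 1\) are our own.

```python
from dataclasses import dataclass

ERROR = "error"  # hypothetical encoding of the error outcome


@dataclass(frozen=True)
class Ptr:
    block: int
    offset: int


def read(store, heap):
    """Run x <- [y] on the state (store, heap)."""
    p = store.get("y")
    cells = heap.get(p.block) if isinstance(p, Ptr) else None
    if cells is None or not 0 <= p.offset < len(cells):
        return ERROR  # invalid pointer: fatal run-time error
    return {**store, "x": cells[p.offset]}, heap


def holds_e(outcome, post):
    """Check one outcome against an error-tolerant postcondition {post}_e."""
    return outcome == ERROR or post(*outcome)


def x_is_0(store, heap):
    return store["x"] == 0


def y_points_to_1(store, heap):
    # y points to the only allocated location, and that location stores 1.
    p = store["y"]
    return isinstance(p, Ptr) and p.offset == 0 and heap == {p.block: [1]}
```

On an empty heap the read fails, so the weak postcondition \(x = 0\) is vacuously met; framing in \(y \mapsto 1\) makes the read succeed with x bound to 1, breaking the framed postcondition.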

For the frame rule to be recovered, it needs to take errors into account. The solution lies in the reachability properties of memory safety: instead of enforcing locality by preventing errors, we can use the fact that memory operations in a safe language are automatically local, in particular, local to the identifiers that the program possesses.

Theorem 6

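Under the same assumptions as Theorem 5, the following rule is sound

$$ \frac{\mathsf {independent}(r, \mathsf {modvars}(c)) \qquad \{p\}\; c\; \{q\}_e}{\{p \mathrel {\triangleright } r\}\; c\; \{q \mathrel {\triangleright } r\}_e} $$

where \(p \mathrel {\triangleright } r\) denotes the isolating conjunction of p and r, defined as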

$$ \{ (l, m_1 \cup m_2) \mid (l, m_1) \in p, (l, m_2) \in r, \mathsf {ids}(l, m_1) \mathrel {\#}\mathsf {blocks}(l, m_2) \}. $$

The proof is similar to the one for the original rule, but it relies additionally on Theorem 4. This explains why the isolating conjunction is needed, since it ensures that the fragment satisfying r is unreachable from the rest of the state.
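The side conditions of the separating and isolating conjunctions differ only in which identifiers of the first part are counted. Because both parts share the same local store \(l\), a pointer held in a variable counts against isolation even though it is not stored in any heap block. A sketch under our own encoding of states as a store dictionary and heap dictionaries:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ptr:
    block: int
    offset: int


def ids(store, heap):
    """Allocated identifiers plus those of every pointer in the store or heap."""
    result = set(heap)
    for v in list(store.values()) + [v for cells in heap.values() for v in cells]:
        if isinstance(v, Ptr):
            result.add(v.block)
    return result


def separating_ok(store, heap1, heap2):
    """Side condition of p * r: blocks(l, m1) # blocks(l, m2)."""
    return set(heap1).isdisjoint(heap2)


def isolating_ok(store, heap1, heap2):
    """Side condition of p |> r (isolating): ids(l, m1) # blocks(l, m2)."""
    return ids(store, heap1).isdisjoint(heap2)
```

A split in which a local variable, or a cell of \(m_1\), points into \(m_2\) passes the first check but fails the second; only the latter guarantees that the framed part is unreachable.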

3.4 Discussion

As hinted by their connection with the frame rule, the theorems of Sect. 3.1 are a form of local reasoning: to reason about a command, it suffices to consider its reachable state; how this state is used bears no effect on the unreachable portions. In a more realistic language, reachability might be inferred from additional information such as typing. But even here it can probably be accomplished by a simple check of the program text.
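For the idealized language, one way to perform such a check is to compute the blocks reachable from the variables a piece of code mentions by following pointers through the heap. The hypothetical helper `reachable_blocks` below does a breadth-first traversal over states encoded as a store dictionary and a heap dictionary. If \(s_1\) keeps the mentioned variables and the reachable blocks, and \(s_2\) gets everything else, then \(\mathsf {ids}(s_1) \mathrel {\#}\mathsf {blocks}(s_2)\) holds, as the frame theorems require.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Ptr:
    block: int
    offset: int


def reachable_blocks(store, heap, roots):
    """Allocated blocks reachable from the variables in `roots` via pointers."""
    seen = set()
    work = deque(store[x].block for x in roots if isinstance(store.get(x), Ptr))
    while work:
        i = work.popleft()
        if i in seen or i not in heap:  # skip visited and dangling identifiers
            continue
        seen.add(i)
        work.extend(v.block for v in heap[i] if isinstance(v, Ptr))
    return seen
```

For a decoder that mentions only `img`, a window block reachable solely through another variable `win` is absent from the result, so it can be placed in the unreachable part \(s_2\).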

For example, consider the hypothetical jpeg decoder from Sect. 1. We would like to guarantee that the decoder cannot tamper with an unreachable object—a window object, a whitelist of trusted websites, etc. The frame theorems give us a means to do so, provided that we are able to show that the object is indeed unreachable; additionally, they imply that the jpeg decoder cannot directly extract any information from this unreachable object, such as passwords or private keys.

Many real-world attacks involve direct violations of these reasoning principles. For example, consider the infamous Heartbleed attack on OpenSSL, which used out-of-bounds reads from a buffer to leak data from completely unrelated parts of the program state and to steal sensitive information [16]. Given that the code fragment that enabled that attack was just manipulating an innocuous array, a programmer could easily be fooled into believing (as many probably have) that the snippet could not possibly access sensitive information, allowing that vulnerability to remain unnoticed for years.

Finally, our new frame rule only captures the fact that a command cannot influence the heap locations that it cannot reach, while our noninterference result (Corollary 1) captures not just this integrity aspect of memory safety, but also a secrecy aspect. We hope that future research will explore the connection between the secrecy aspect of memory safety and (relational) program logics.

4 Relaxing Memory Safety

So much for formalism. What about reality? Strictly speaking, the security properties we have identified do not hold of any real system. This is partly due to fundamental physical limitations: real systems run with finite memory, and interact with users in various ways that transcend inputs and outputs, notably through time and other side channels. A more interesting reason is that real systems typically do not impose all the restrictions required for the proofs of these properties. Languages that aim for safety generally offer relatively benign glimpses of their implementation details (such as access to the contents of uninitialized memory, extraction of physical addresses from pointers, or comparison of pointers for ordering) in return for significant flexibility or performance gains. In other systems, the concessions are more fundamental, to the extent that it is harder to clearly delimit what part of a program is unsafe: the SoftBound transformation [31], for example, adds bounds checks for C programs, but does not protect against memory-management bugs; a related transformation, CETS [32], is required for temporal safety.

In this section, we enumerate common relaxed models of memory safety and evaluate how they affect the reasoning principles and security guarantees of Sect. 3. Some relaxations, such as allowing pointers to be forged out of thin air, completely give up on reachability-based reasoning. Others, however, retain strong guarantees for integrity while giving up on some secrecy, allowing aspects of the global state of a program to be observed. For example, a system with finite memory (Sect. 4.5) may leak some information about its memory consumption, and a system that allows pointer-to-integer casts (Sect. 4.2) may leak information about its memory layout. Naturally, the distinction between integrity and secrecy should be taken with a grain of salt, since the former often depends on the latter; for example, if a system grants privileges to access some component when given the right password, a secrecy violation can escalate to an integrity violation!

4.1 Forging Pointers

Many real-world C programs use integers as pointers. If this idiom is allowed without restrictions, then local reasoning is compromised, as every memory region may be reached from anywhere in the program. It is not surprising that languages that strive for safety either forbid this kind of pointer forging or confine it to clear unsafe fragments.

More insidiously, and perhaps surprisingly, similar dangers lurk in the stateful abstractions of some systems that are widely regarded as “memory safe.” JavaScript, for example, allows code to access arbitrary global variables by indexing an associative array with a string, a feature that enables many serious attacks [1, 18, 29, 44]. One might argue that global variables in JavaScript are “memory unsafe” because they fail to validate local reasoning: even if part of a JavaScript program does not explicitly mention a given global variable, it might still change this variable or the objects it points to. Re-enabling local reasoning requires strong restrictions on the programming style [1, 9, 18].

4.2 Observing Pointers

The language of Sect. 2 maintains a complete separation between pointers and other values. In reality, this separation is often only enforced in one direction. For example, some tools for enforcing memory safety in C [13, 31] allow pointer-to-integer casts [23] (a feature required by many low-level idioms [10, 28]); and the default implementation of hashCode() in Java leaks address information. To model such features, we can extend the syntax of expressions with a form \(\mathsf {cast}(e)\), the semantics of which are defined with some function \([\![\mathsf {cast}]\!]: \mathbb {I}\times \mathbb {Z}\rightarrow \mathbb {Z}\) for converting a pointer to an integer:

$$\begin{aligned}{}[\![\mathsf {cast}(e)]\!](s)&= [\![\mathsf {cast}]\!]([\![e]\!](s)) \qquad \text { if }[\![e]\!](s) \in \mathbb {I}\times \mathbb {Z}\end{aligned}$$

Note that the original language already included an operator for extracting the offset of a pointer. The two definitions are similar, but they have crucially different consequences: while offsets do not depend on the identifier, allocation order, or other low-level details of the language implementation (such as the choice of physical addresses when allocating a block), all of these could be relevant when defining the semantics of \(\mathsf {cast}\). The three frame theorems (1, 3, and 4) are thus lost, because the state of unreachable parts of the heap may influence the integers observed by the program. An important consequence is that secrecy is weakened in this language: an attacker could exploit pointers as a side channel to learn secrets about data it should not access.
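To make the contrast concrete, here is a small Python sketch (a toy model, not the paper's Coq development). We assume a hypothetical allocator that picks the smallest unused identifier and a hypothetical fixed encoding for \(\mathsf{cast}\). The program's own view of the heap is identical in both runs, yet the integer it observes through \(\mathsf{cast}\) differs because of blocks it cannot reach, while the offset stays the same.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ptr:
    ident: int   # block identifier
    offset: int  # offset within the block


def alloc(heap, n):
    # Hypothetical allocator: picks the smallest identifier not in use, so its
    # choice depends on every block in the heap, reachable or not.
    i = 0
    while i in heap:
        i += 1
    heap[i] = [0] * n
    return Ptr(i, 0)


def offset(p):
    # Depends only on the pointer itself.
    return p.offset


def cast(p):
    # Assumed fixed encoding I x Z -> Z; what leaks is the identifier it receives.
    return p.ident * 2**32 + p.offset


def program(heap):
    # The program holds no pointers into the existing heap; it only sees its own block.
    x = alloc(heap, 1)
    return offset(x), cast(x)


# Same reachable state, different unreachable portions s2.
small = program({0: ["secret"]})
large = program({0: ["secret"], 1: ["more"], 2: ["data"]})
print(small, large)
```

Both runs agree on the offset, but the cast result reveals how many unreachable blocks exist.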

Nevertheless, integrity is not affected: if a block is unreachable, its contents are unchanged at the end of execution. (This result was also proved in Coq.)

Theorem 7

(Integrity-only Noninterference). Let \(s_1\), \(s_2\), and \(s'\) be states and c a command such that \(\mathsf {vars}(c) \subseteq \mathsf {vars}(s_1)\), \(\mathsf {ids}(s_1) \mathrel {\#}\mathsf {blocks}(s_2)\), and \([\![c]\!](s_1 \cup s_2) = s'\). Then we can find \(s_1' \in \mathcal {S}\) such that \(s' = s_1' \cup s_2\) and \(\mathsf {ids}(s_1') \mathrel {\#}\mathsf {blocks}(s_2)\).

The stronger noninterference result of Corollary 1 showed that, if pointer-to-integer casts are prohibited, changing the contents of the unreachable portion \(s_2\) has no effect on the reachable portion, \(s_1'\). In contrast, Theorem 7 allows changes in \(s_2\) to influence \(s_1'\) in arbitrary ways in the presence of these casts: not only can the contents of this final state change, but the execution can also loop forever or terminate in an error.

To see why, suppose that the JPEG decoder of Sect. 1 is part of a web browser, but that it does not have the pointers required to learn which website the user is currently visiting. Suppose further that there is some relation between the program’s memory consumption and that website, and some correlation between memory consumption and the identifier assigned to a new block. Then, by allocating a block and converting its pointer to an integer, the decoder might be able to infer useful information about the visited website [22]. Thus, if \(s_2\) denoted the part of the state where the website’s address is stored, changing its contents would have a nontrivial effect on \(s_1'\), the part of the state that the decoder does have access to. We could speculate that, in a reasonable system, this channel can only reveal information about the layout of unreachable regions, and not their contents. Indeed, we conjecture this for the language of this subsection.

Finally, it is worth noting that simply excluding casts might not suffice to prevent this sort of vulnerability. Recall that our language takes both offsets and identifiers into account for equality tests. For performance reasons, we could have chosen a different design that only compares physical addresses, completely discarding identifiers. If attackers know the address of a pointer in the program—which could happen, for instance, if they have access to the code of the program and of the allocator—they can use pointer arithmetic (which is generally harmless and allowed in our language) to find the address of other pointers. If x holds the pointer they control, they can run, for instance,

$$\begin{aligned} y \leftarrow \mathsf {alloc}(1); \mathsf {if}\; x + 1729 = y \;\mathsf {then}\; \ldots \;\mathsf {else}\; \ldots , \end{aligned}$$

to learn the location assigned to y and draw conclusions about the global state.
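A minimal sketch of this attack, assuming a hypothetical bump allocator that hands out increasing physical addresses and an equality test that compares addresses only. The victim's allocation size plays the role of the global state; the attacker never holds a pointer into the victim's blocks, and the constant 1729 is taken from the program above.

```python
def bump_allocator(start=0x1000):
    # Hypothetical allocator: returns physical addresses in increasing order.
    next_addr = start

    def alloc(n):
        nonlocal next_addr
        addr = next_addr
        next_addr += n
        return addr

    return alloc


def victim(alloc, secret_bit):
    # Another component whose memory use depends on a secret.
    alloc(1728 if secret_bit else 100)


def attacker_guess(secret_bit):
    alloc = bump_allocator()
    x = alloc(1)               # the pointer the attacker controls
    victim(alloc, secret_bit)  # blocks the attacker cannot reach
    y = alloc(1)
    # Address-only equality: x + 1729 = y exactly when the victim used 1728 words.
    return x + 1729 == y
```

The comparison succeeds only when the victim took the secret-dependent branch, so the attacker learns the bit without ever dereferencing the victim's memory.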

4.3 Uninitialized Memory

Safe languages typically initialize new variables and objects. But this can degrade performance, leading to cases where this feature is dropped—including standard C implementations, safer alternatives [13, 31], OCaml’s Bytes.create primitive, or Node.js’s Buffer.allocUnsafe, for example.

The problem with this concession is that the entire memory becomes relevant to execution, and local reasoning becomes much harder. By inspecting old values living in uninitialized memory, an attacker can learn about parts of the state they shouldn’t access and violate secrecy. This issue would become even more severe in a system that allowed old pointers or other capabilities to occur in re-allocated memory in a way that the program can use, since they could yield access to restricted resources directly, leading to potential integrity violations as well. (The two examples given above—OCaml and Node.js—do not suffer from this issue, because any preexisting pointers in re-allocated memory are treated as bare bytes that cannot be used to access memory.)
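The following Python sketch, under simplifying assumptions, shows the secrecy problem: a hypothetical first-fit allocator over a flat array reuses a freed range without clearing it, so the next component that allocates can read the previous contents. The `zero` flag models the initializing alternative; the names `RawHeap` and `leak` are illustrative only.

```python
class RawHeap:
    """Hypothetical allocator over a flat array of words that does not clear
    reused memory unless asked to."""

    def __init__(self, size):
        self.mem = [0] * size
        self.free_list = [(0, size)]  # (start, length) ranges

    def alloc(self, n, zero=False):
        # First fit: take the first free range that is large enough.
        for k, (start, length) in enumerate(self.free_list):
            if length >= n:
                self.free_list[k] = (start + n, length - n)
                if zero:
                    self.mem[start:start + n] = [0] * n
                return start
        raise MemoryError("out of memory")

    def free(self, start, n):
        # Freed ranges go to the front, so they are reused first.
        self.free_list.insert(0, (start, n))


def leak(zero):
    heap = RawHeap(16)
    p = heap.alloc(4)
    heap.mem[p:p + 4] = [ord(c) for c in "pwd!"]  # secret written by one component
    heap.free(p, 4)
    q = heap.alloc(4, zero=zero)                  # another component allocates
    return heap.mem[q:q + 4]
```

Without zeroing, `leak(False)` returns the character codes of the old secret; with zeroing, the new block contains only zeros.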

4.4 Dangling Pointers and Freshness

Another crucial issue is the treatment of dangling pointers—references to previously freed objects. Dangling pointers are problematic because there is an inherent tension between giving them a sensible semantics (for instance, one that validates the properties of Sect. 3) and obtaining good performance and predictability. Languages with garbage collection avoid the issue by forbidding dangling pointers altogether—heap storage is freed only when it is unreachable. In the language of Sect. 2, besides giving a well-defined behavior to the use of dangling pointers (signaling an error), we imposed strong freshness requirements on allocation, mandating not only that the new identifier not correspond to any existing block, but also that it not be present anywhere else in the state.

To see how the results of Sect. 3 are affected by weakening freshness, suppose we run the program \(x \leftarrow \mathsf {alloc}(1); z \leftarrow (y = x)\) on a state where y holds a dangling pointer. Depending on the allocator and the state of the memory, the pointer assigned to x could be equal to y. Since this outcome depends on the entire state of the system, not just the reachable memory, Theorems 1, 3 and 4 now fail. Furthermore, an attacker with detailed knowledge of the allocator could launder secret information by testing pointers for equality. Weakening freshness can also have integrity implications, since it becomes harder to ensure that blocks are properly isolated. For instance, a newly allocated block might be reachable through a dangling pointer controlled by an attacker, allowing them to access that block even if they were not supposed to.
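As a sketch of how weak freshness makes the result of \(z \leftarrow (y = x)\) depend on the allocator, the hypothetical allocator below either never reissues an identifier (a conservative stand-in for the freshness requirement of Sect. 2) or reuses any identifier that is not currently live.

```python
def make_allocator(fresh):
    live, ever = set(), set()

    def alloc():
        # fresh=True: never reissue an identifier that was ever handed out.
        # fresh=False: reuse any identifier that is not currently live.
        taken = ever if fresh else live
        i = 0
        while i in taken:
            i += 1
        live.add(i)
        ever.add(i)
        return i

    def free(i):
        live.discard(i)

    return alloc, free


def program(fresh):
    alloc, free = make_allocator(fresh)
    y = alloc()
    free(y)       # y is now a dangling pointer
    x = alloc()   # x <- alloc(1)
    return y == x # z <- (y = x)
```

With strong freshness the test is always false; with identifier reuse, the outcome depends on what the allocator did elsewhere in the state.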

Some practical solutions for memory safety use mechanisms similar to our language’s, where each memory location is tagged with an identifier describing the region it belongs to [11, 15]. Pointers are tagged similarly, and when a pointer is used to access memory, a violation is detected if its identifier does not match the location’s. However, for performance reasons, the number of possible identifiers might be limited to a relatively small number, such as 2 or 4 [11] or 16 [46]. In addition to the problems above, since multiple live regions can share the same identifier in such schemes, it might be possible for buffer overflows to lead to violations of secrecy and integrity as well.
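A rough model of such a scheme, with hypothetical names: each region draws its tag from a small pool in round-robin order, and stores check only that the pointer's tag matches the location's tag. With two tags, an overflow that skips over one region lands in a region with the same tag and goes undetected. The pool sizes are illustrative parameters, not a model of any particular system.

```python
class TaggedMemory:
    """Hypothetical tag-checking memory with a small pool of tags."""

    def __init__(self, size, num_tags):
        self.data = [0] * size
        self.tags = [None] * size
        self.next_addr = 0
        self.next_tag = 0
        self.num_tags = num_tags

    def alloc(self, n):
        tag = self.next_tag % self.num_tags  # tags wrap around after num_tags regions
        self.next_tag += 1
        base = self.next_addr
        self.next_addr += n
        for a in range(base, base + n):
            self.tags[a] = tag
        return (tag, base)                   # pointer = (tag, address)

    def store(self, ptr, index, value):
        tag, base = ptr
        addr = base + index
        if self.tags[addr] != tag:
            raise RuntimeError("tag mismatch")
        self.data[addr] = value


def overflow_detected(num_tags, regions_between):
    mem = TaggedMemory(64, num_tags)
    buf = mem.alloc(4)
    for _ in range(regions_between):
        mem.alloc(4)
    victim = mem.alloc(4)
    try:
        # Write past the end of buf into the first word of victim.
        mem.store(buf, victim[1] - buf[1], 42)
        return False
    except RuntimeError:
        return True
```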

Although we framed our discussion in terms of identifiers, the issue of freshness can manifest itself in other ways. For example, many systems for spatial safety work by adding base and bounds information to pointers. In some of these [13, 31], dangling pointers are treated as an orthogonal issue, and it is possible for the allocator to return a new memory region that overlaps with the range of a dangling pointer, in which case the new region will not be properly isolated from the rest of the state.

Finally, dangling pointers can have disastrous consequences for overall system security, independently of the freshness issues just described: freeing a pointer more than once can break allocator invariants, enabling attacks [43].

4.5 Infinite Memory

Our idealized language allows memory to grow indefinitely. But real languages run on finite memory, and allocation fails when programs run out of space. Besides enabling denial-of-service attacks, finite memory has consequences for secrecy. Corollary 1 does not hold in a real programming language as is, because an increase in memory consumption can cause a previously successful allocation to fail. By noticing this difference, a piece of code might learn something about the entire state of the program. How problematic this is in practice will depend on the particular system under consideration.

A potential solution is to force programs that run out of memory to terminate immediately. Though this choice might be bad from an availability standpoint, it is probably the most benign in terms of secrecy. We should be able to prove an error-insensitive variant of Corollary 1, where the only significant effect that unreachable memory can have is to turn a successful execution or infinite loop into an error. Similar issues arise for IFC mechanisms, which often cannot prevent secrets from influencing program termination, leading to termination-insensitive notions of noninterference.

Unfortunately, even an error-insensitive result might be too strong for real systems, which often make it possible for attackers to extract multiple bits of information about the global state of the program—as previously noted in the IFC literature [4]. Java, for example, does not force termination when memory runs out, but triggers an exception that can be caught and handled by user code, which is then free to record the event and probe the allocator with a different test. And most languages do not operate in batch mode like ours does, merely producing a single answer at the end of execution; rather, their programs continuously interact with their environment through inputs and outputs, allowing them to communicate the exact amount of memory that caused an error.
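To illustrate how a catchable allocation failure can leak more than one bit, the sketch below (a toy model; `FiniteHeap`, `probe_free_space`, and `observe` are hypothetical) binary-searches for the largest allocation that still succeeds, recovering the exact amount of memory used by another component. It assumes the attacker knows the total capacity.

```python
class FiniteHeap:
    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0

    def alloc(self, n):
        if self.used + n > self.capacity:
            # Catchable failure, in the spirit of the exception described above.
            raise MemoryError("allocation failed")
        self.used += n

    def free(self, n):
        self.used -= n


def probe_free_space(heap):
    # Invariant: an allocation of size lo succeeds; none larger than hi does.
    lo, hi = 0, heap.capacity
    while lo < hi:
        mid = (lo + hi + 1) // 2
        try:
            heap.alloc(mid)
        except MemoryError:
            hi = mid - 1
        else:
            heap.free(mid)
            lo = mid
    return lo


def observe(secret_usage, capacity=1024):
    heap = FiniteHeap(capacity)
    heap.alloc(secret_usage)  # memory consumed by another component
    return heap.capacity - probe_free_space(heap)
```

Each caught failure yields one more bit, so about \(\log_2\) of the capacity probes suffice to recover the other component's usage exactly.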

This discussion suggests that, if size vulnerabilities are a real concern, they need to be treated with special care. One approach would be to limit the amount of memory an untrusted component can allocate [47], so that exhausting the memory allotted to that component does not reveal information about the state of the rest of the system (which would also prevent global denial-of-service attacks). A more speculative idea is to develop quantitative versions [6, 39] of the noninterference results discussed here that apply only if the total memory used by the program is below a certain limit.
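A minimal sketch of the quota idea, assuming a hypothetical allocator that gives each component a fixed private budget and that the underlying memory is large enough to honor all quotas at once. The untrusted component's probe then returns the same answer regardless of how much the trusted component allocates.

```python
class QuotaHeap:
    """Hypothetical allocator with a fixed, private quota per component."""

    def __init__(self, quotas):
        self.quota = dict(quotas)
        self.used = {c: 0 for c in quotas}

    def alloc(self, component, n):
        # Failure depends only on this component's own usage.
        if self.used[component] + n > self.quota[component]:
            raise MemoryError(f"{component} exceeded its quota")
        self.used[component] += n

    def free(self, component, n):
        self.used[component] -= n


def untrusted_probe(heap):
    # Largest single allocation that succeeds (linear search for brevity).
    best = 0
    for n in range(1, 2049):
        try:
            heap.alloc("untrusted", n)
        except MemoryError:
            break
        heap.free("untrusted", n)
        best = n
    return best


def observe(secret_usage):
    heap = QuotaHeap({"trusted": 1024, "untrusted": 512})
    heap.alloc("trusted", secret_usage)
    return untrusted_probe(heap)
```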

5 Case Study: A Memory-Safety Monitor

To demonstrate the applicability of our characterization, we use it to analyze a tag-based monitor proposed by Dhawan et al. to enforce heap safety for low-level code [15]. In prior work [5], we and others showed that an idealized model of the monitor correctly implements a higher-level abstract machine with built-in memory safety—a bit more formally, every behavior of the monitor is also a behavior of the abstract machine. Building upon this work, we prove that this abstract machine satisfies a noninterference property similar to Corollary 1. We were also able to prove that a similar result holds for a lower-level machine that runs a so-called “symbolic” representation of the monitor—although we had to slightly weaken the result to account for memory exhaustion (cf. Sect. 4.5), since the machine that runs the monitor has finite memory, while the abstract machine has infinite memory. If we had a verified machine-code implementation of this monitor, it would be possible to prove a similar result for it as well.

5.1 Tag-Based Monitor

We content ourselves with a brief overview of Dhawan et al.’s monitor [5, 15], since the formal statements of the reasoning principles it supports are more complex than those for the abstract machine of Sect. 5.2, on which we will focus. Following a proposal by Clause et al. [11], Dhawan et al.’s monitor enforces memory safety for heap-allocated data by checking and propagating metadata tags. Every memory location receives a tag that uniquely identifies the allocated region to which that location belongs (akin to the identifiers in Sect. 2), and pointers receive the tag of the region they are allowed to reference. The monitor assigns these tags to new regions by storing a monotonic counter in protected memory that is bumped on every call to malloc; with a large number of possible tags, it is possible to avoid the freshness pitfalls discussed in Sect. 4.4. When a memory access occurs, the monitor checks whether the tag on the pointer matches the tag on the location. If they do, the operation is allowed; otherwise, execution halts. The monitor instruments the allocator to set up tags correctly. Its implementation achieves good performance using the PUMP, a hardware extension that accelerates such micro-policies for metadata tagging [15].
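The checks described above can be approximated in a few lines of Python. This is a hypothetical software model, not the PUMP implementation: pointers are pairs of a tag and a flat address, a counter supplies a fresh tag per `malloc`, and we additionally assume that `free` clears the tags of the freed region so that later accesses through stale pointers halt.

```python
class TagMonitor:
    """Hypothetical model of tag checks on a flat memory."""

    def __init__(self, size):
        self.mem = [0] * size          # flat memory of words
        self.loc_tag = [None] * size   # tag on each location (None = unallocated)
        self._counter = 0              # monotonic counter in "protected" state
        self._next = 0                 # naive bump allocation, no reuse
        self._regions = {}             # tag -> (base, length), protected metadata

    def malloc(self, n):
        if self._next + n > len(self.mem):
            raise MemoryError("out of memory")
        self._counter += 1             # fresh tag for every new region
        tag, base = self._counter, self._next
        self._next += n
        self._regions[tag] = (base, n)
        for a in range(base, base + n):
            self.loc_tag[a] = tag
        return (tag, base)             # pointer: (tag, address)

    def free(self, ptr):
        tag, _ = ptr
        if tag not in self._regions:
            raise RuntimeError("invalid free: execution halts")
        base, n = self._regions.pop(tag)
        for a in range(base, base + n):
            self.loc_tag[a] = None     # assumption: freed locations lose their tag

    def _check(self, ptr):
        tag, addr = ptr
        if not 0 <= addr < len(self.mem) or self.loc_tag[addr] != tag:
            raise RuntimeError("tag mismatch: execution halts")
        return addr

    def load(self, ptr):
        return self.mem[self._check(ptr)]

    def store(self, ptr, value):
        self.mem[self._check(ptr)] = value
```

Pointer arithmetic keeps the tag and moves the address, so an overflow from one region into the next is caught because the adjacent region carries a different tag.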

5.2 Abstract Machine

The memory-safe abstract machine [5] operates on two kinds of values: machine words w, or pointers \((i, w)\), which are pairs of an identifier \(i \in \mathbb {I}\) and an offset w. We use \(\mathcal {W}\) to denote the set of machine words, and \(\mathcal {V}\) to denote the set of values. Machine states are triples \((m, rs , pc )\), where (1) \(m \in \mathbb {I}\rightharpoonup _\mathrm {fin}\mathcal {V}^*\) is a memory mapping identifiers to lists of values; (2) \( rs \in \mathcal {R}\rightharpoonup _\mathrm {fin}\mathcal {V}\) is a register bank, mapping register names to values; and (3) \( pc \in \mathcal {V}\) is the program counter.

The execution of an instruction is specified by a step relation \(s \rightarrow s'\). If there is no \(s'\) such that \(s \rightarrow s'\), we say that s is stuck, which means that a fatal error occurred during execution. On each instruction, the machine checks if the current program counter is a pointer and, if so, tries to fetch the corresponding value in memory. The machine then ensures that this value is a word that correctly encodes an instruction and, if so, acts accordingly. The instructions of the machine, representative of typical RISC architectures, allow programs to perform binary and logical operations, move values to and from memory, and branch. The machine is in fact fairly similar to the language of Sect. 2. Some operations are overloaded to manipulate pointers; for example, adding a pointer to a word is allowed, and the result is obtained by adjusting the pointer’s offset accordingly. Accessing memory causes the machine to halt when the corresponding position is undefined.
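As an illustration of the overloaded operations, here is a hypothetical Python rendering of two pieces of the step relation: addition, where adding a word to a pointer adjusts its offset, and a memory load that gets stuck on an undefined position. Machine words are modeled as unbounded Python integers (no wrap-around), and treating the sum of two pointers as stuck is our assumption.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Ptr:
    ident: str   # block identifier i
    offset: int  # machine-word offset w


Value = Union[int, Ptr]  # machine words modeled as Python ints


class Stuck(Exception):
    """No step is possible: a fatal error occurred."""


def add(v1: Value, v2: Value) -> Value:
    # Overloaded addition: word + word, or pointer + word (in either order).
    if isinstance(v1, int) and isinstance(v2, int):
        return v1 + v2
    if isinstance(v1, Ptr) and isinstance(v2, int):
        return Ptr(v1.ident, v1.offset + v2)
    if isinstance(v1, int) and isinstance(v2, Ptr):
        return Ptr(v2.ident, v2.offset + v1)
    raise Stuck("cannot add two pointers")  # assumption


def load(m: dict, v: Value) -> Value:
    # The machine halts when the accessed position is undefined.
    if not isinstance(v, Ptr):
        raise Stuck("load through a non-pointer")
    block = m.get(v.ident)
    if block is None or not 0 <= v.offset < len(block):
        raise Stuck("undefined memory position")
    return block[v.offset]
```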

In addition to these basic instructions, the machine possesses a set of special monitor services that can be invoked as regular functions, using registers to pass in arguments and return values. There are two services \(\mathsf {alloc}\) and \(\mathsf {free}\) for managing memory, and one service \(\mathsf {eq}\) for testing whether two values are equal. The reason for using separate monitor services instead of special instructions is to keep the abstract machine’s semantics closer to that of the more concrete machine that implements it. Although the instruction set includes an equality test, that instruction cannot replace the \(\mathsf {eq}\) service, since it only takes physical addresses into account. As argued in Sect. 4.2, such comparisons can be turned into a side channel. To prevent this, testing two pointers for equality directly using the corresponding machine instruction results in an error if the pointers have different block identifiers.
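The difference between the two comparisons can be sketched as follows, again as a hypothetical model. The `eq` service compares identifiers and offsets; the instruction gets stuck on pointers with different identifiers. How the instruction treats a pointer compared with a plain word is not specified above, so treating it as an error is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ptr:
    ident: str   # block identifier
    offset: int  # offset within the block


class Stuck(Exception):
    """No step is possible: the machine halts with a fatal error."""


def eq_service(v1, v2):
    # Monitor service: pointers are equal only if identifiers and offsets match.
    return v1 == v2


def eq_instruction(v1, v2):
    # Plain instruction: allowed on words, or on pointers into the same block.
    if isinstance(v1, Ptr) and isinstance(v2, Ptr):
        if v1.ident != v2.ident:
            raise Stuck("pointers into different blocks")
        return v1.offset == v2.offset
    if isinstance(v1, Ptr) or isinstance(v2, Ptr):
        raise Stuck("comparing a pointer with a word")  # assumption
    return v1 == v2
```

Because the instruction never answers for pointers into different blocks, it cannot be used to compare layouts across blocks.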

5.3 Verifying Memory Safety

The proof of memory safety for this abstract machine mimics the one carried out for the language in Sect. 3. We use similar notations as before: \(\pi \cdot s\) means renaming every identifier that appears in s according to the permutation \(\pi \), and \(\mathsf {ids}(s)\) is the finite set of all identifiers that appear in the state s. A simple case analysis on the possible instructions yields analogs of Theorems 1, 2 and 4 (we do not include an analog of Theorem 3 because we consider individual execution steps, where loops cannot occur).

Theorem 8

Let \(\pi \) be a permutation, and s and \(s'\) be two machine states such that \(s \rightarrow s'\). There exists another permutation \(\pi '\) such that \(\pi \cdot s \rightarrow \pi ' \cdot s'\).

Theorem 9

Let \((m_1, rs , pc )\) be a state of the abstract machine, and \(m_2\) a memory. Suppose that \(\mathsf {ids}(m_1, rs , pc ) \mathrel {\#}\mathsf {dom}(m_2)\), and that \((m_1, rs , pc ) \rightarrow (m', rs ', pc ')\). Then, there exists a permutation \(\pi \) such that \(\mathsf {ids}(\pi \cdot m', \pi \cdot rs ', \pi \cdot pc ') \mathrel {\#}\mathsf {dom}(m_2)\) and \((m_2 \cup m_1, rs , pc ) \rightarrow (m_2 \cup \pi \cdot m', \pi \cdot rs ', \pi \cdot pc ')\).

Theorem 10

Let \((m_1, rs , pc )\) be a machine state, and \(m_2\) a memory. If \(\mathsf {ids}(m_1, rs , pc ) \mathrel {\#}\mathsf {dom}(m_2)\), and \((m_1, rs , pc )\) is stuck, then \((m_2 \cup m_1, rs , pc )\) is also stuck.
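Theorems 9 and 10 can be sanity-checked (not proved) on a toy step. The sketch below assumes a single hypothetical load instruction that does not allocate, so the permutation \(\pi\) can be taken to be the identity, and it omits the program counter from \(\mathsf{ids}\) for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ptr:
    ident: str
    offset: int


class Stuck(Exception):
    pass


def ids_of(m, rs):
    # Identifiers in the memory's domain, in its contents, or in registers.
    found = set(m)
    values = list(rs.values()) + [v for block in m.values() for v in block]
    for v in values:
        if isinstance(v, Ptr):
            found.add(v.ident)
    return found


def step_load(m, rs, dst, src):
    # One hypothetical step: rs[dst] <- m[p.ident][p.offset], where p = rs[src].
    p = rs.get(src)
    if not isinstance(p, Ptr) or p.ident not in m or not 0 <= p.offset < len(m[p.ident]):
        raise Stuck()
    return m, {**rs, dst: m[p.ident][p.offset]}


def frame_check(m1, rs, m2, dst, src):
    if not ids_of(m1, rs).isdisjoint(m2):
        raise ValueError("precondition: ids(m1, rs) # dom(m2)")
    extended = {**m2, **m1}
    try:
        m1_after, rs_after = step_load(m1, rs, dst, src)
    except Stuck:
        # Theorem 10: stuck in the small state implies stuck in the extended one.
        try:
            step_load(extended, rs, dst, src)
        except Stuck:
            return True
        return False
    # Theorem 9 (identity permutation): extended run ends in m2 ∪ m1'.
    m_big, rs_big = step_load(extended, rs, dst, src)
    return m_big == {**m2, **m1_after} and rs_big == rs_after
```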

Once again, we can combine these properties to obtain a proof of noninterference. Our Coq development includes a complete statement.

5.4 Discussion

The reasoning principles supported by the memory-safety monitor have an important difference compared to the ones of Sect. 3. In the memory-safe language, reachability is relative to a program’s local variables. If we want to argue that part of the state is isolated from some code fragment, we just have to consider that fragment’s local variables—other parts of the program are still allowed to access the region. The memory-safety monitor, on the other hand, does not have an analogous notion: an unreachable memory region is useless, since it remains unreachable by all components forever.

It seems that, from the standpoint of noninterference, heap memory safety taken in isolation is much weaker than the guarantees it provides in the presence of other language features, such as local variables. Nevertheless, the properties studied above suggest several avenues for strengthening the mechanism and making its guarantees more useful. The most obvious one would be to use the mechanism as the target of a compiler for a programming language that provides other (safe) stateful abstractions, such as variables and a stack for procedure calls. A more modest approach would be to add other state abstractions to the mechanism itself. Besides variables and call stacks, if the mechanism made code immutable and separate from data, a simple check would suffice to tell whether a code segment stored in memory references a given privileged register. If the register is the only means of reaching a memory region, we should be able to soundly infer that that code segment is independent of that region.

On a last note, although the abstract machine we verified is fairly close to our original language, the dynamic monitor that implements it using tags is quite different (Sect. 5.1). In particular, the monitor works on a machine that has a flat memory model, and keeps track of free and allocated memory using a protected data structure that stores block metadata. It was claimed that reasoning about this base and bounds information was the most challenging part of the proof that the monitor implements the abstract machine [5]. For this reason, we believe that this proof can be adapted to other enforcement mechanisms that rely solely on base and bounds information—for example, fat pointers [13, 25] or SoftBound [31]—while keeping a similar abstract machine as their specification, and thus satisfying a similar noninterference property. This gives us confidence that our memory safety characterization generalizes to other settings.

6 Related Work

The present work lies at the intersection of two areas of previous research: one on formal characterizations of memory safety, the other on reasoning principles for programs. We review the most closely related work in these areas.

Characterizing Memory Safety. Many formal characterizations of memory safety originated in attempts to reconcile its benefits with low-level code. Generally, these works claim that a mechanism is safe by showing that it prevents or catches typical temporal and spatial violations. Examples in the literature include: Cyclone [41], a language with a type system for safe manual memory management; CCured [33], a program transformation that adds temporal safety to C by refining its pointer type with various degrees of safety; Ivory [17], an embedding of a similar “safe-C variant” into Haskell; SoftBound [31], an instrumentation technique for C programs for spatial safety, including the detection of bounds violations within an object; CETS [32], a compiler pass for preventing temporal safety violations in C programs, including accessing dangling pointers into freed heap regions and stale stack frames; the memory-safety monitor for the PUMP [5, 15], which formed the basis of our case study in Sect. 5; and languages like Mezzo [35] and Rust [45], whose guarantees extend to preventing data races [7]. Similar models appear in formalizations of C [24, 26], which need to rigorously characterize its sources of undefined behavior, in particular instances of memory misuse.

Either explicitly or implicitly, these works define memory errors as attempts to use a pointer to access a location that it was not meant to access, for example an out-of-bounds or freed one. This was noted by Hicks [20], who, inspired by SoftBound, proposed to define memory safety as an execution model that tracks what part of memory each pointer can access. Our characterization is complementary to these accounts, in that it is extensional: its data isolation properties allow us to reason directly about the observable behavior of the program. Furthermore, as demonstrated by our application to the monitor of Sect. 5 and the discussions in Sect. 4, it can be adapted to various enforcement mechanisms and variations of memory safety.

Reasoning Principles. Separation logic [36, 48] has been an important source of inspiration for our work. The logic’s frame rule enables its local reasoning capabilities and imposes restrictions that are similar to those mandated by memory-safe programming guidelines. As discussed in Sect. 3.3, our reasoning principles are reminiscent of the frame rule, but use reachability to guarantee locality in settings where memory safety is enforced automatically. In separation logic, by contrast, locality needs to be guaranteed for each program individually by comprehensive proofs.

Several works have investigated similar reasoning principles for a variety of program analyses, including static, dynamic, manual, or a mixture of those. Some of these are formulated as expressive logical relations, guaranteeing that programs are compatible with the framing of state invariants; representative works include: L\({}^{{\small 3}}\) [3], a linear calculus featuring strong updates and aliasing control; the work of Benton and Tabareau [8] on a compiler for a higher-order language; and the work of Devriese et al. [14] on object capabilities for a JavaScript-like language. Other developments are based on proof systems reminiscent of separation logic; these include Yarra [38], an extension of C that allows programmers to protect the integrity of data structures marked as critical; the work of Agten et al. [2], which allows mixing unverified and verified components by instrumenting the program to check that required assertions hold at interfaces; and the logic of Swasey et al. [42] for reasoning about object capabilities.

Unlike our work, these developments do not propose reachability-based isolation as a general definition of memory safety, nor do they attempt to analyze how their reasoning principles are affected by common variants of memory safety. Furthermore, many of these other works (especially the logical relations) rely on encapsulation mechanisms such as closures, objects, or modules that go beyond plain memory safety. Memory safety alone can only provide complete isolation, whereas encapsulation provides finer control, allowing some interaction between components while guaranteeing the preservation of certain state invariants. In this sense, one can see memory-safety reasoning as a special case of encapsulation reasoning. Nevertheless, it is a practically relevant special case that is interesting on its own, since when reasoning about an encapsulated component, one must argue explicitly that the invariants of interest are preserved by the private operations of that component; memory safety, on the other hand, guarantees that any invariant on unreachable parts of the memory is automatically preserved.

Perhaps closer to our work, Maffeis et al. [27] show that their notion of “authority safety” guarantees isolation, in the sense that a component’s actions cannot influence the actions of another component with disjoint authority. Their notion of authority behaves similarly to the set of block identifiers accessible by a program in our language; however, they do not attempt to connect their notion of isolation to the frame rule, noninterference, or traditional notions of memory safety.

Morrisett et al. [30] state a correctness criterion for garbage collection based on program equivalence. Some of the properties they study are similar to the frame rule, describing the behavior of code running in an extended heap. However, they use this analysis to justify the validity of deallocating objects, rather than studying the possible interactions between the extra state and the program in terms of integrity and secrecy.

7 Conclusions and Future Work

We have explored the consequences of memory safety for reasoning about programs, formalizing intuitive principles that, we argue, capture the essential distinction between memory-safe systems and memory-unsafe ones. We showed how the reasoning principles we identified apply to a recent dynamic monitor for heap memory safety.

The systems studied in this paper have a simple storage model: the language of Sect. 2 has just global variables and flat, heap-allocated arrays, while the monitor of Sect. 5 doesn’t even have variables or immutable code. Realistic programming platforms, of course, offer much richer stateful abstractions, including, for example, procedures with stack-allocated local variables as well as structured objects with contiguously allocated sub-objects. In terms of memory safety, these systems have a richer vocabulary for describing resources that programs can access, and programmers could benefit from isolation-based local reasoning involving these resources.

For example, in typical safe languages with procedures, the behavior of a procedure should depend only on its arguments, the global variables it uses, and the portions of the state that are reachable from these values; if the caller of that procedure has a private object that is not passed as an argument, it should not affect or be affected by the call. Additionally, languages such as C allow for objects consisting of contiguously allocated sub-objects for improved performance. Some systems for spatial safety [13, 31] allow capability downgrading—that is, narrowing the range of a pointer so that it can’t access outside of a sub-object’s bounds. It would be interesting to refine our model to take these features into account. In the case of the monitor of Sect. 5, such considerations could lead to improved designs or to the integration of the monitor inside a secure compiler. Conversely, it would be interesting to derive finer security properties for relaxations like the ones discussed in Sect. 4. Some inspiration could come from the IFC literature, where quantitative noninterference results provide bounds on the probability that some secret is leaked, the rate at which it is leaked, how many bits are leaked, etc. [6, 39].

The main goal of this work was to understand, formally, the benefits of memory safety for informal and partial reasoning, and to evaluate a variety of weakened forms of memory safety in terms of which reasoning principles they preserve. However, our approach may also suggest ways to improve program verification. One promising idea is to leverage the guarantees of memory safety to obtain proofs of program correctness modulo unverified code that could have errors, in contexts where complete verification is too expensive or not possible (e.g., for programs with a plugin mechanism).