A Formal C Memory Model for Separation Logic

The core of a formal semantics of an imperative programming language is a memory model that describes the behavior of operations on the memory. Defining a memory model that matches the description of C in the C11 standard is challenging because C allows both high-level (by means of typed expressions) and low-level (by means of bit manipulation) memory accesses. The C11 standard has restricted the interaction between these two levels to make more effective compiler optimizations possible, at the expense of making the memory model complicated. We describe a formal memory model of the (non-concurrent part of the) C11 standard that incorporates these restrictions, and at the same time describes low-level memory operations. This formal memory model includes a rich permission model to make it usable in separation logic and supports reasoning about program transformations. The memory model and essential properties of it have been fully formalized using the Coq proof assistant.


Introduction
A memory model is the core of a semantics of an imperative programming language. It models the memory states and describes the behavior of memory operations. The main operations described by a C memory model are reading and storing a value at a given address, and allocating and deallocating objects. Formalizing the C11 memory model faithfully is challenging because C features both low-level and high-level data access. Low-level data access involves unstructured and untyped byte representations, whereas high-level data access involves typed abstract values such as arrays, structs and unions.
This duality makes the memory model of C more complicated than the memory model of nearly any other programming language. For example, more mathematically oriented languages such as Java and ML feature only high-level data access, in which case the memory can be modeled in a relatively simple and structured way, whereas assembly languages feature only low-level data access, in which case the memory can be modeled as an array of bits.
The situation becomes more complicated as the C11 standard allows compilers to perform optimizations based on a high-level view of data access that are inconsistent with the traditional low-level view of data access. This complication has led to numerous ambiguities in the standard text related to aliasing, uninitialized memory, end-of-array pointers and type-punning that cause problems for C code when compiled with widely used compilers. See for example the message [42] on the standard committee's mailing list, Defect Reports #236, #260, and #451 [26], and the various examples in this paper.
Contribution This paper describes the CH2O memory model, which is part of the CH2O project [30][31][32][33][34][35][36][37]. CH2O provides an operational, executable and axiomatic semantics in Coq for a large part of the non-concurrent fragment of C, based on the official description of C as given by the C11 standard [27].
The key features of the CH2O memory model are as follows:
- Close to C11. CH2O is faithful to the C11 standard in order to be compiler independent. When one proves something about a given program with respect to CH2O, it should behave that way with any C11 compliant compiler (possibly restricted to certain implementation-defined choices).
- Static type system. Given that C is a statically typed language, CH2O does not only capture the dynamic semantics of C11 but also its type system. We have established properties such as type preservation of the memory operations.
- Proof infrastructure. All parts of the CH2O memory model and semantics have been formalized in Coq (without axioms). This is essential for its application to program verification in proof assistants. Also, considering the significant size of CH2O and its memory model, proving metatheoretical properties of the language would have been intractable without the support of a proof assistant. Despite our choice to use Coq, we believe that nearly all parts of CH2O could be formalized in any proof assistant based on higher-order logic.
- Executable. To obtain more confidence in the accuracy of CH2O with respect to C11, the CH2O memory model is executable. An executable memory model allows us to test the CH2O semantics on example programs and to compare the behavior with that of widely used compilers [33,37].
- Separation logic. In order to reason about concrete C programs, one needs a program logic. To that end, the CH2O memory model incorporates a complex permission model suitable for separation logic. This permission system, as well as the memory model itself, forms a separation algebra.
- Memory refinements. CH2O has an expressive notion of memory refinements that relates memory states. All memory operations are proven invariant under this notion. Memory refinements form a general way to validate many common-sense properties of the memory model in a formal way.
They also open the door to reasoning about program transformations, which is useful if one were to use the memory model as part of a verified compiler. Consider the following function: int f(int *p, int *q) { int z = *q; *p = 10; return z; } When f is called with pointers p and q that are aliased, the assignment to *p also affects *q. As a result, one cannot transform the function body of f into the shorter *p = 10; return (*q);. The shorter function will return 10 in case p and q are aliased, whereas the original f will always return the original value of *q.
Unlike this example, there are many situations in which pointers can be assumed not to alias. It is essential for an optimizing compiler to determine where aliasing cannot occur, and use this information to generate faster code. The technique of determining whether pointers can alias or not is called alias analysis.
In type-based alias analysis, type information is used to determine whether pointers can alias or not. Consider the following example: short g(int *p, short *q) { short z = *q; *p = 10; return z; } Here, a compiler is allowed to assume that p and q are not aliased because they point to objects of different types. The compiler is therefore allowed to transform the function body of g into the shorter *p = 10; return (*q);.
The peculiar thing is that the C type system does not statically enforce the property that pointers to objects of different types are not aliased. A union type can be used to create aliased pointers to different types: union int_or_short { int x; short y; } u = { .y = 3 }; int *p = &u.x; // p points to the x variant of u short *q = &u.y; // q points to the y variant of u return g(p, q); // g is called with aliased pointers p and q The above program is valid according to the rules of the C11 type system, but has undefined behavior during execution of g. This is caused by the standard's notion of effective types [27, 6.5p6-7] (also called strict-aliasing restrictions) that assigns undefined behavior to incorrect usage of aliased pointers to different types.
To indicate the incorrect usage of aliased pointers during the execution of the example, consider inlining part of the function body of g: the read *q then amounts to reading the variant y of u while the current variant of u is x, which has undefined behavior.
Approach Most existing C formalizations (most notably Norrish [45], Leroy et al. [39,40] and Ellison and Roşu [19]) use an unstructured untyped memory model where each object in the formal memory model consists of an array of bytes. These formalizations therefore cannot assign undefined behavior to violations of the rules for effective types, among other things.
In order to formalize the interaction between low-level and high-level data access, and in particular effective types, we represent the formal memory state as a forest of well-typed trees whose structure corresponds to the structure of data types in C. The leaves of these trees consist of bits to capture low-level aspects of the language.
The key concepts of our memory model are as follows:
- Memory trees (Sect. 6.3) are used to represent each object in memory. They are abstract trees whose structure corresponds to the shape of C data types. Consider the memory tree of struct S { short x, *r; } s = { 33, &s.x } (the precise shape and the bit representations are implementation defined). The leaves of memory trees contain permission-annotated bits (Sect. 6.2). Bits are represented symbolically: the integer value 33 is represented as its binary representation 1000010000000000 (least significant bit first), the padding bytes as symbolic indeterminate bits E (whose actual value should not be used), and the pointer &s.x as a sequence of symbolic pointer bits. The memory itself is a forest of memory trees. Memory trees are explicit about type information (in particular the variants of unions) and thus give rise to a natural formalization of effective types.
- Pointers (Sect. 6.1) are formalized using paths through memory trees. Since we represent pointers as paths, the formal representation contains detailed information about how each pointer has been obtained (in particular which variants of unions were used). A detailed formal representation of pointers is essential to describe effective types.
- Abstract values (Definition 6.4) are trees whose structure is similar to memory trees, but they have base values (mathematical integers and pointers) on their leaves. The abstract value of struct S { short x, *r; } s = { 33, &s.x } thus consists of the integer 33 and the pointer &s.x. Abstract values hide internal details of the memory such as permissions, padding and object representations. They are therefore used in the external interface of the memory model and throughout the operational semantics.
Memory trees, abstract values and bits with permissions can be converted into each other. These conversions are used to define operations internal to the memory model. However, none of these conversions are bijective because different information is materialized in these three data types.

Notations

Definition 2.3
We let option A denote the option type over A, whose elements are inductively defined as either ⊥ or x for some x ∈ A. We implicitly lift operations to operate on the option type, and often omit cases of definitions that yield ⊥. This is formally described using the option monad in the Coq formalization.

Definition 2.4
A partial function f from A to B is a function f : A → option B.

Definition 2.5
A partial function f is called a finite partial function or a finite map if its domain dom f := {x | ∃y ∈ B . f x = y} is finite. The type of finite partial functions is denoted as A →fin B. The operation f [x := y] yields f with the value y for argument x.

Definition 2.6
We let A × B denote the product of types A and B. Given a pair (x, y) ∈ A × B, we let (x, y)₁ := x and (x, y)₂ := y denote the first and second projection of (x, y).

Definition 2.7
We let list A denote the list type over A, whose elements are inductively defined as either the empty list ε or x x⃗ for some x ∈ A and x⃗ ∈ list A. We let x⃗ᵢ ∈ A denote the ith element of a list x⃗ ∈ list A (we count from 0). Lists are sometimes denoted as [x₀, . . . , xₙ₋₁] ∈ list A for x₀, . . . , xₙ₋₁ ∈ A.
We use the following operations on lists:
- We often implicitly lift a function f : A₀ → · · · → Aₙ point-wise to the function f : list A₀ → · · · → list Aₙ. The resulting list is truncated to the length of the shortest input list in case n > 1.
- We often implicitly lift a predicate P : A₀ → · · · → Aₙ₋₁ → Prop to the predicate P : list A₀ → · · · → list Aₙ₋₁ → Prop that guarantees that P holds for all (pairs of) elements of the list(s). The lifted predicate requires all lists to have the same length in case n > 1.
- We let |x⃗| ∈ N denote the length of x⃗ ∈ list A.
- We let x⃗[i, j) ∈ list A denote the sublist x⃗ᵢ . . . x⃗ⱼ₋₁ of x⃗ ∈ list A.
- We let xⁿ ∈ list A denote the list consisting of n copies of x ∈ A.
- We let (x⃗ y^∞)[i, j) ∈ list A denote the sublist x⃗ᵢ . . . x⃗ⱼ₋₁ of x⃗ ∈ list A, which is padded with y ∈ A in case x⃗ is too short.
- Given lists x⃗ ∈ list A and y⃗ ∈ list B with |x⃗| = |y⃗|, we let x⃗ y⃗ ∈ list (A × B) denote the point-wise pairing of x⃗ and y⃗.

Challenges
This section illustrates a number of subtle forms of underspecification in C by means of example programs, the bizarre behaviors they exhibit with widely used C compilers, and their treatment in CH2O. Many of these examples involve delicacies due to the interaction between the following two ways of accessing data:
- In a high-level way using arrays, structs and unions.
- In a low-level way using unstructured and untyped byte representations.
The main problem is that compilers use a high-level view of data access to perform optimizations whereas both programmers and traditional memory models expect data access to behave in a concrete low-level way.

Byte-Level Operations and Object Representations
Apart from high-level access to objects in memory by means of typed expressions, C also allows low-level access by means of byte-wise manipulation. Each object of type τ can be interpreted as an unsigned char array of length sizeof(τ), which is called the object representation [27, 6.2.6.1p4]. Let us consider: struct S { short x; short *r; } s1 = { 10, &s1.x }; unsigned char *p = (unsigned char*)&s1; On 32-bit computing architectures such as x86 (with _Alignof(short*) = 4), the object representation of s1 contains a hole due to alignment of objects: the bytes in between the two fields belong to neither of them. The bytes belonging to such holes are called padding bytes.
Alignment is the way objects are arranged in memory. In modern computing architectures, accesses to addresses that are a multiple of a word-sized chunk (often a multiple of 4 bytes on a 32-bit computing architecture) are significantly faster due to the way the processor interacts with the memory. For that reason, the C11 standard has put restrictions on the addresses at which objects may be allocated [27, 6.2.8]. For each type τ, there is an implementation-defined integer constant _Alignof(τ), and objects of type τ are required to be allocated at addresses that are a multiple of that constant. In case _Alignof(short*) = 4, there are thus two bytes of padding in between the fields of struct S.

The object representation of s1 can be copied byte-wise into an array: unsigned char c[sizeof(struct S)]; for (size_t i = 0; i < sizeof(struct S); i++) c[i] = ((unsigned char*)&s1)[i];
In the above code, size_t is an unsigned integer type, which is able to hold the results of the sizeof operator [27, 7.19p2].
Manipulation of object representations of structs also involves access to padding bytes, which are not part of the high-level representation. In particular, in the example the padding bytes are also being copied. The problematic part is that padding bytes have indeterminate values, whereas in general, reading an indeterminate value has undefined behavior (for example, reading from an uninitialized int variable is undefined). The C11 standard provides an exception for unsigned char [27, 6.2.6.1p5], and the above example thus has defined behavior.
Our memory model uses a symbolic representation of bits (Definition 6.19) to distinguish determinate and indeterminate memory. This way, we can precisely keep track of the situations in which access to indeterminate memory is permitted.

Padding of Structs and Unions
The following excerpt from the C11 standard points out another challenge with respect to padding bytes [27, 6.2.6.1p6]: When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values.
Let us illustrate this difficulty by an example: struct S { char x; char y; char z; }; void f(struct S *p) { p->x = 0; p->y = 0; p->z = 0; } On architectures with sizeof(struct S) = 4, objects of type struct S have one byte of padding at the end of their object representation. Instead of compiling the function f to three store instructions, one for each field of the struct, the C11 standard allows a compiler to use a single instruction that stores zeros to the entire struct. This will of course affect the padding byte. Consider: struct S s = { 1, 1, 1 }; ((unsigned char*)&s)[3] = 10; f(&s); return ((unsigned char*)&s)[3]; Now, the assignments to the fields of s by the function f also affect the padding bytes of s, including the byte ((unsigned char*)&s)[3] that we have assigned to. As a consequence, the returned value is unspecified.
From a high-level perspective this behavior makes sense. Padding bytes are not part of the abstract value of a struct, so their actual value should not matter. However, from a low-level perspective it is peculiar. An assignment to a specific field of a struct affects the object representation of parts not assigned to.
None of the currently existing C formalizations describes this behavior correctly. In our tree-based memory model we enforce that padding bytes always have an indeterminate value, and in turn we obtain the desired behavior implicitly. Note that even if the function call f(&s) were removed, the behavior of the example program would remain unchanged in CH2O.

Type-Punning
Despite the rules for effective types, it is under certain conditions nonetheless allowed to access a union through another variant than the current one. Accessing a union through another variant is called type-punning. For example: union int_or_short { int x; short y; } u = { .x = 3 }; printf("%d\n", u.y); This code will reinterpret the bit representation of the int value 3 of u.x as a value of type short. The reinterpreted value that is printed is implementation-defined (on architectures where short has no trap representations).
Since C11 is ambiguous about the exact conditions under which type-punning is allowed, we follow the interpretation of the GCC documentation [20]: type-punning is allowed, provided the memory is accessed through the union type.
According to this interpretation the above program indeed has implementation-defined behavior because the variant y is accessed via the expression u.y that involves the variable u of the corresponding union type.
However, according to this interpretation, type-punning via a pointer to a specific variant of a union type yields undefined behavior. This is in agreement with the rules for effective types. For example, the following program has undefined behavior.
union int_or_short { int x; short y; } u = { .x = 3 }; short *p = &u.y; printf("%d\n", *p); We formalize the interpretation of C11 by GCC by decorating pointers and l-values to subobjects with annotations (Definition 6.4). When a pointer to a variant of a union is stored in memory, or used as the argument of a function, the annotations are changed to ensure that type-punning no longer has defined behavior via that pointer. In Sect. 7.1 we formally establish that this approach is correct by showing that a compiler can perform type-based alias analysis (Theorem 7.2 on p. 51).

Indeterminate Memory and Pointers
A pointer value becomes indeterminate when the object it points to has reached the end of its lifetime [27, 6.2.4] (it has gone out of scope, or has been deallocated). Dereferencing an indeterminate pointer has of course undefined behavior because it no longer points to an actual value. However, not many people are aware that using an indeterminate pointer in pointer arithmetic and pointer comparisons also yields undefined behavior. Consider: int *p = malloc(sizeof(int)); assert (p != NULL); free(p); int *q = malloc(sizeof(int)); assert (q != NULL); if (p == q) { // undefined, p is indeterminate due to the free *q = 10; *p = 14; printf("%d\n", *q); // p and q alias, expected to print 14 } In this code malloc(sizeof(int)) yields a pointer to a newly allocated memory area that may hold an integer, or yields a NULL pointer in case no memory is available. The function free deallocates memory allocated by malloc. In the example we assert that both calls to malloc succeed.
After execution of the second call to malloc it may happen that the memory area of the first call to malloc is reused: we have used free to deallocate it after all. Both GCC (version 4.9.2) and Clang (version 3.5.0) use the fact that p and q are obtained via different calls to malloc as a license to assume that p and q do not alias. As a result, the value 10 of *q is inlined, and the program prints the value 10 instead of the naively expected value 14.
The situation becomes more subtle because when the object a pointer points to has been deallocated, not just the argument of free becomes indeterminate, but also all other copies of that pointer. This is therefore yet another example where high-level representations interact subtly with their low-level counterparts.
In our memory model we represent pointer values symbolically (Definition 6.4), and keep track of memory areas that have been previously deallocated. The behavior of operations like == depends on the memory state, which allows us to accurately capture the described undefined behaviors.

End-of-Array Pointers
The way the C11 standard deals with pointer equality is subtle. Consider the following excerpt [27, 6.5.9p6]: Two pointers compare equal if and only if […] or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.
End-of-array pointers are peculiar because they cannot be dereferenced; they do not point to any value after all. Nonetheless, end-of-array pointers are commonly used when looping through arrays.
int a[4] = { 0, 1, 2, 3 }; int *p = a; while (p < a + 4) { *p += 1; p += 1; } The pointer p initially refers to the first element of the array a. The value p points to, as well as p itself, is increased as long as p is before the end-of-array pointer a + 4. This code thus increases the values of the array a. End-of-array pointers can also be used in a way where the result of a comparison is not well-defined. In the example below, the printf is executed only if x and y are allocated adjacently in the address space (typically the stack). int x, y; if (&x + 1 == &y) printf("x and y are allocated adjacently\n"); Based on the aforementioned excerpt of the C11 standard [27, 6.5.9p6], one would naively say that the value of &x + 1 == &y is uniquely determined by the way x and y are allocated in the address space. However, the GCC implementers disagree. They claim that Defect Report #260 [26] allows them to take the derivation of a pointer value into account.
In the example, the pointers &x + 1 and &y are derived from unrelated objects (the local variables x and y). As a result, the GCC developers claim that &x + 1 and &y may compare unequal despite being allocated adjacently. Consider: int compare(int *p, int *q) { // some code to confuse the optimizer return p == q; } int main() { int x, y; if (&x + 1 == &y) printf("x and y are adjacent\n"); if (compare(&x + 1, &y)) printf("x and y are still adjacent\n"); } When compiled with GCC (version 4.9.2), we have observed that the string x and y are still adjacent is being printed, whereas x and y are adjacent is not being printed. This means that the value of &x + 1 == &y is not consistent among different occurrences of the comparison.
Due to these discrepancies we assign undefined behavior to questionable uses of end-of-array pointers while assigning the correct defined behavior to pointer comparisons involving end-of-array pointers when looping through arrays (such as in the first example above). Our treatment is similar to our extension of CompCert [34].

Sequence Point Violations and Non-determinism
Instead of having to follow a specific execution order, the execution order of expressions is unspecified in C. This is a common cause of portability problems because a compiler may use an arbitrary execution order for each expression, and each time that expression is executed. Hence, to ensure correctness of a C program with respect to an arbitrary compiler, one has to verify that each possible execution order is free of undefined behavior and gives the correct result.
In order to make more effective optimizations possible (for example, delaying of side-effects and interleaving), the C standard does not allow an object to be modified more than once during the execution of an expression. If an object is modified more than once, the program has undefined behavior. We call this requirement the sequence point restriction.
Note that this is not a static restriction, but a restriction on valid executions of the program. Let us consider an example: int x, y = (x = 3) + (x = 4); printf("%d %d\n", x, y); By considering all possible execution orders, one would naively expect this program to print 4 7 or 3 7, depending on whether the assignment x = 3 or x = 4 is executed first. However, x is modified twice within the same expression, and thus both execution orders have undefined behavior. The program is thereby allowed to exhibit any behavior. Indeed, when compiled with gcc -O2 (version 4.9.2), the compiled program prints 4 8, which does not correspond to any of the execution orders.
Our approach to non-determinism and sequence points is inspired by Norrish [44] and Ellison and Roşu [19]. Each bit in memory carries a permission (Definition 5.5) that is set to a special locked permission when a store has been performed. The memory model prohibits any access (read or store) to objects with locked permissions. At the next sequence point, the permissions of locked objects are changed back into their original permission, making future accesses possible again.
It is important to note that we do not have non-determinism in the memory model itself, and have set up the memory model in such a way that all non-determinism is on the level of the small-step operational semantics.

Types in C
This section describes the types used in the CH2O memory model. We support integer, pointer, function pointer, array, struct, union and void types. More complicated types such as enum types and typedefs are defined by translation [33,37].
This section furthermore describes an abstract interface, called an implementation environment, that describes properties such as size and endianness of integers, and the layout of structs and unions. The entire CH2O memory model and semantics will be parameterized by an implementation environment.

Integer Representations
This section describes the part of implementation environments corresponding to integer types and the encoding of integer values as bits. Integer types consist of a rank (char, short, int …) and a signedness (signed or unsigned). The set of available ranks as well as many of their properties are implementation-defined. We therefore abstract over the ranks in the definition of integer types.

Definition 4.2
An integer coding environment with ranks K consists of a total order (K, ⊆) of integer ranks having at least the following ranks: char ⊂ short ⊂ int ⊂ long ⊂ long long and ptr_rank.
It moreover has the functions endianize k and deendianize k (permuting the bits of an integer of rank k between little-endian order and the implementation's order) and rank_size (giving the byte size of integers of each rank). Here, endianize k and deendianize k should be inverses, endianize k should be a permutation, rank_size should be (non-strictly) monotone, and rank_size char = 1.

Definition 4.3
The judgment x : τ i describes that x ∈ Z has integer type τ i .
The rank ptr_rank is the rank of the integer types size_t and ptrdiff_t, which are defined in the standard library header files [27, 7.19p2]. The type ptrdiff_t is a signed integer type used to represent the result of subtracting two pointers, and the type size_t is an unsigned integer type used to represent sizes of types.
An integer coding environment can have an arbitrary number of integer ranks apart from the standard ones char, short, int, long, long long, and ptr_rank. This way, additional integer types like those described in [27, 7.20] can easily be included.
The function rank_size gives the byte size of an integer of a given rank. Since we require rank_size to be monotone rather than strictly monotone, integer types with different ranks can have the same size [27, 6.3.1.1p1]. For example, on many implementations int and long have the same size, but are in fact different.
The C11 standard allows implementations to use either sign-magnitude, 1's complement or 2's complement signed integer representations. It moreover allows integer representations to contain padding or parity bits [27, 6.2.6.2]. However, since all current machine architectures use 2's complement representations, these alternatives are more of a historic artifact. Current machine architectures use 2's complement representations because these do not suffer from positive and negative zeros and thus enjoy unique representations of each integer. Hence, CH2O restricts itself to implementations that use 2's complement signed integer representations.
Integer representations in CH2O can solely differ with respect to endianness (the order of the bits). The function endianize takes a list of bits in little-endian order and permutes them accordingly. We allow endianize to yield an arbitrary permutation, and thus we support not just big- and little-endian, but also mixed-endian variants. We have (x : τi) τi = x and |x : τi| = rank_size τi, provided that x : τi.

Definition of Types
We support integer, pointer, function pointer, array, struct, union and void types. The translation described in [33,37] translates more complicated types, such as typedefs and enums, into these simplified types. This translation also accounts for other simplifications in our definition of types, such as the use of unnamed struct and union fields. Floating point types and type qualifiers like const and volatile are not supported.
All definitions in this section are implicitly parameterized by an integer coding environment with ranks K (Definition 4.2).
The three mutually inductive parts of types correspond to the different components of the memory model. Addresses and pointers have point-to types (Definitions 6.8 and 6.10), base values have base types (Definition 6.40), and memory trees and values have full types (Definitions 6.25 and 6.46).
The void type of C is used for two entirely unrelated purposes: void is used for functions without return type and void* is used for pointers to objects of unspecified type. In CH2O this distinction is explicit in the syntax of types. The type void is used for functions without return value. Like the mathematical unit type it has one value, called nothing (Definition 6.39). The type any* is used for pointers to objects of unspecified type.
Unlike more modern programming languages, C does not provide first-class functions. Instead, C provides function pointers, which are just addresses of executable code in memory rather than closures. Function pointers can be used in a way similar to ordinary pointers: they can be used as arguments and return values of functions, they can be part of structs, unions and arrays, etc.
The C language sometimes allows function types to be used as shorthands for function pointers, for example: void sort(int *p, int len, int compare(int,int)); The third argument of sort is a shorthand for int (*compare)(int,int) and is thus in fact a function pointer instead of a function. We only have function pointer types, and the third argument of the type of the function sort thus contains an additional *: [(signed int)*, signed int, ([signed int, signed int] → signed int)*] → void.
Struct and union types consist of just a name, and do not contain the types of their fields. An environment is used to assign fields to structs and unions, and to assign argument and return types to function names:

Γ ∈ env := (tag →fin list type) × (funname →fin (list type × type))

The first component gives the types of struct/union fields, and the second the types of functions. The functions dom tag : env → Pfin(tag) and dom funname : env → Pfin(funname) yield the declared structs and unions, respectively the declared functions. We implicitly treat environments as functions tag →fin list type and funname →fin (list type × type) that correspond to the underlying finite partial functions.
Struct and union names on the one hand, and function names on the other, have their own name space, in accordance with the C11 standard [27, 6.2.3p1].

Notation 4.9
We often write an environment as a mixed sequence of struct and union declarations t : τ⃗ and function declarations f : (τ⃗, τ). This is possible because environments are finite.
Since we represent the fields of structs and unions as lists, fields are nameless. For example, the C type struct S1 { int x; struct S1 *p; } is translated into the environment S1 : [ signed int, struct S1 * ].
Although structs and unions are semantically very different (products versus sums, respectively), environments do not keep track of whether a tag has been used for a struct or a union type. Struct and union types with the same tag are thus allowed. The translator in [33,37] forbids the same name being used to declare both a struct and a union type.
Although our mutually inductive syntax of types already forbids many incorrect types, such as functions returning functions (instead of function pointers), some ill-formed types such as int[0] are still syntactically valid. We also have to ensure that cyclic structs and unions are only allowed when the recursive definition is guarded through pointers. Guardedness by pointers ensures that the sizes of types are finite and statically known. Consider the following types: struct list1 { int hd; struct list1 tl; }; /* illegal */ struct list2 { int hd; struct list2 *p_tl; }; /* legal */ The type declaration struct list1 is illegal because it contains an unguarded reference to itself. In the type declaration struct list2 the self-reference is guarded through a pointer type, and therefore legal. Of course, this generalizes to mutually recursive types like: struct tree { int hd; struct forest *p_children; }; struct forest { struct tree *p_hd; struct forest *p_tl; }; Definition 4.10 The following judgments are defined by mutual induction: -The judgment Γ ⊢* τp describes points-to types τp to which a pointer may point; in particular Γ ⊢* struct t and Γ ⊢* union t hold for all tags t. -The judgment Γ ⊢ τ of validity of types, whose rules for Γ ⊢ struct t and Γ ⊢ union t require the premise t ∈ dom_tag Γ.
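The guardedness condition can be seen in running C code. A minimal sketch of the legal struct list2 from the text (the helper sum and the local names a, b are ours):

```c
#include <stddef.h>

struct list2 { int hd; struct list2 *p_tl; };   /* legal: guarded by a pointer */
/* struct list1 { int hd; struct list1 tl; };      illegal: would have infinite size */

/* sum the heads of a NULL-terminated list */
int sum(const struct list2 *l) {
    int s = 0;
    for (; l != NULL; l = l->p_tl) s += l->hd;
    return s;
}

int sum_example(void) {
    struct list2 b = { 2, NULL };
    struct list2 a = { 1, &b };
    return sum(&a);
}
```

Because the self-reference is a pointer, sizeof(struct list2) is finite and statically known.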

Definition 4.11
The judgment ⊢ Γ describes well-formed environments Γ. It is defined inductively. Note that Γ ⊢ τ does not imply ⊢ Γ. Most results therefore have ⊢ Γ as a premise. These premises are left implicit in this paper.
In order to support (mutually) recursive struct and union types, pointers to incomplete struct and union types are permitted in the judgment Γ ⊢* τp that describes types to which pointers are allowed, but forbidden in the judgment Γ ⊢ τ of validity of types. Let us consider the following type declarations: struct S2 { struct S2 x; }; /* illegal */ struct S3 { struct S3 *p; }; /* legal */ Well-formedness of the environment Γ := S3 : [ struct S3 * ] can be derived using the judgments ∅ ⊢* struct S3, ∅ ⊢b struct S3 *, ∅ ⊢ struct S3 *, and thus ⊢ Γ. The environment S2 : [ struct S2 ] is ill-formed because we do not have ∅ ⊢ struct S2.
The typing rule for function pointer types is slightly more delicate. This is best illustrated by an example: union U { int i; union U (*f) (union U); }; This example displays a recursive self-reference to a union type through a function type, which is legal in C because function types are in fact pointer types. For this reason, the premises of Γ ⊢* τ⃗ → τ are Γ ⊢* τ⃗ and Γ ⊢* τ instead of Γ ⊢ τ⃗ and Γ ⊢ τ. Well-formedness of the above union type can be derived accordingly. In order to define operations by recursion over the structure of well-formed types (see for example Definition 6.45, which turns a sequence of bits into a value), we often need to perform recursive calls on the types of the fields of structs and unions. In Coq we have defined a custom recursor and induction principle using well-founded recursion. In this paper, we use these implicitly.
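The legality of such a self-reference can be checked with any C11 compiler. A minimal sketch (the function bump and the helper call_through_union are illustrative names, not from CH₂O):

```c
/* A recursive reference to union U through a *function type* is legal:
   the member f stores only a (pointer-sized) function pointer, so the
   union does not nest itself infinitely. */
union U { int i; union U (*f)(union U); };

static union U bump(union U u) { u.i += 1; return u; }

int call_through_union(void) {
    union U u;
    u.f = bump;                              /* store a function pointer */
    union U v = u.f((union U){ .i = 41 });   /* call through the union member */
    return v.i;
}
```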
Affeldt et al. [1,2] have formalized non-cyclicity of types using a complex constraint on paths through types. Our definition of validity of environments (Definition 4.11) follows the structure of type environments, and is therefore well-suited to implement the aforementioned recursor and induction principle.
There is a close correspondence between array and pointer types in C. Arrays are not first-class types, and except for special cases such as initialization, manipulation of arrays is achieved via pointers. We consider arrays as first-class types so as to avoid having to treat arrays as a special case throughout.
For this reason, more types are valid in CH₂O than in C11. The translator in [33,37] resolves exceptional cases for arrays. For example, a function parameter of array type acts like a parameter of pointer type in C11 [27, 6.7.6.3]: void f(int a [10]); The corresponding type of the function f is thus (signed int) * → void. Note that the type (signed int) [10] → void is also valid, but entirely different, and never generated by the translator in [33,37].
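The adjustment of array parameters to pointer parameters can be observed with sizeof. A small sketch (param_size and decays_to_pointer are illustrative helpers, not CH₂O definitions):

```c
#include <stddef.h>

/* The parameter "int a[10]" is adjusted to "int *a" (C11 6.7.6.3p7). */
size_t param_size(int a[10]) {
    return sizeof a;   /* size of the adjusted pointer, not 10 * sizeof(int) */
}

int decays_to_pointer(void) {
    int arr[10] = {0};
    return param_size(arr) == sizeof(int *);
}
```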

Implementation Environments
We finish this section by extending integer coding environments to describe implementation-defined properties related to the layout of struct and union types. The author's PhD thesis [33] also considers the implementation-defined behavior of integer operations (such as addition and division) and defines inhabitants of this interface corresponding to actual computing architectures.

Definition 4.12
An implementation environment with ranks K consists of an integer coding environment with ranks K and functions sizeof, alignof, and fieldsizes, which should satisfy certain laws. Here, we let offsetof τ⃗ i denote Σ_{j<i} (fieldsizes τ⃗)_j. The functions sizeof, alignof, and fieldsizes should be closed under weakening of Γ. We let sizeof τ specify the number of bytes out of which the object representation of an object of type τ consists. Objects of type τ should be allocated at addresses that are a multiple of alignof τ. We will prove that our abstract notion of addresses satisfies this property (see Lemma 6.18). The functions sizeof and alignof correspond to the sizeof and _Alignof operators [27, 6.5.3.4], and offsetof corresponds to the offsetof macro [27, 7.19p3]. The list fieldsizes τ⃗ specifies the layout of a struct type with fields τ⃗.
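The correspondence with C's own operators can be illustrated concretely. A sketch using an illustrative struct S (not taken from CH₂O), checking properties analogous to the laws of an implementation environment:

```c
#include <stddef.h>
#include <stdalign.h>

struct S { char c; double d; };

int layout_ok(void) {
    /* each field lives at an offset that respects its alignment */
    return offsetof(struct S, d) % alignof(double) == 0
        /* fields do not overlap: d starts after c */
        && offsetof(struct S, d) >= sizeof(char)
        /* the object representation covers all fields */
        && sizeof(struct S) >= offsetof(struct S, d) + sizeof(double);
}
```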

Permissions and Separation Algebras
Permissions control whether memory operations such as reading or storing are allowed. In order to obtain the highest level of precision, we tag each individual bit in memory with a corresponding permission. In the operational semantics, permissions have two main purposes: -Permissions are used to formalize the sequence point restriction, which assigns undefined behavior to programs in which an object in memory is modified more than once in between two sequence points. -Permissions are used to distinguish objects in memory that are writable from those that are read-only (const-qualified in C terminology).
In the axiomatic semantics based on separation logic, permissions play an important role for share accounting. We use share accounting for subdivision of permissions among multiple subexpressions to ensure that: -Writable objects are unique to each subexpression.
-Read-only objects may be shared between subexpressions.
This distinction is originally due to Dijkstra [16] and is essential in separation logic with permissions [11]. The novelty of our work is to use separation logic with permissions for non-determinism in expressions in C. Share accounting gives rise to a natural treatment of C's sequence point restriction.
Separation algebras, as introduced by Calcagno et al. [13], abstractly capture the common structure of subdivision of permissions. We present a generalization of separation algebras that is well-suited for C verification in Coq and use this generalization to build the permission system and memory model compositionally. The permission system will be constructed as a telescope of separation algebras, where Q is the separation algebra of fractional permissions, C is a functor that extends a separation algebra with a counting component, and L is a functor that extends a separation algebra with a lockable component (used for the sequence point restriction). This section explains these functors and their purposes.

Separation Logic and Share Accounting
Before we go into the details of the CH₂O permission system, we briefly introduce separation logic. Separation logic [47] is an extension of Hoare logic that provides better means to reason about imperative programs that use mutable data structures and pointers.
The key feature of separation logic is the separating conjunction P ∗ Q, which allows one to subdivide the memory into two disjoint parts: a part described by P and another part described by Q. The separating conjunction is most prominent in the frame rule, which allows one to derive {P ∗ R} s {Q ∗ R} from {P} s {Q}.
This rule enables local reasoning: given a Hoare triple {P} s {Q}, it allows one to derive that the triple also holds when the memory is extended with a disjoint part described by R. The frame rule shows its merits when reasoning about functions. There it allows one to consider a function in the context of the memory the function actually uses, instead of in the context of the entire program's memory. However, the use of the frame rule can already be demonstrated in derivations of small programs. The singleton assertion a ↦ v denotes that the memory consists of exactly one object with value v at address a. The assignments are not considered in the context of the entire memory, but just in the part of the memory that is used.
The key observation that led to our separation logic for C, see also [31,33], is the correspondence between non-determinism in expressions and a form of concurrency. Inspired by the rule for the parallel composition [46], we have rules for each operator that are of the following shape.
The intuitive idea of this rule is that if the memory can be subdivided into two parts in which the subexpressions e₁ and e₂ can be executed safely, then the whole expression can be executed safely in the whole memory. Non-interference of the side-effects of e₁ and e₂ is guaranteed by the separating conjunction: it ensures that the parts of the memory described by P₁ and P₂ do not have overlapping areas that will be written to. We thus effectively rule out expressions with undefined behavior such as (x = 3) + (x = 4) (see Sect. 3.6 for discussion).
Subdividing the memory into multiple parts is not a simple operation. In order to illustrate this, let us consider a shallow embedding of assertions of separation logic P, Q : mem → Prop (think of mem as the set of finite partial functions from some set of object identifiers to some set of objects; the exact definition in the context of CH₂O is given in Definition 6.26). In such a shallow embedding, one would define the separating conjunction as (P ∗ Q) m := ∃ m₁ m₂. m₁ ⊥ m₂ ∧ m = m₁ ∪ m₂ ∧ P m₁ ∧ Q m₂. The operation ∪ is not the disjoint union of finite partial functions, but a more fine-grained operation, for two reasons. Firstly, subdivision of memories should allow for partial overlap, as long as writable objects are unique to a single part. For example, the expression x + x has defined behavior, but the expressions x + (x = 4) and (x = 3) + (x = 4) do not.
We use separation logic with permissions [11] to deal with partial overlap of memories. That means we equip the singleton assertion a ↦γ v with a permission γ. The essential property of the singleton assertion is that given a writable permission γw there is a readable permission γr with a ↦γw v equivalent to a ↦γr v ∗ a ↦γr v. The above property is an instance of a slightly more general property: we consider a binary operation ∪ on permissions so we can write a ↦γ1∪γ2 v as a ↦γ1 v ∗ a ↦γ2 v. Secondly, it should be possible to subdivide array, struct and union objects into subobjects corresponding to their elements. For example, in the case of an array int a [2], the expression (a[0] = 1) + (a[1] = 4) has defined behavior, and we should be able to prove so. The corresponding essential property of the singleton assertion allows an array value [ y₀, . . . , y_{n−1} ] at address a to be split into singleton assertions for its elements. This paper does not describe the CH₂O separation logic and its shallow embedding of assertions; these are described in the author's PhD thesis [33]. Instead, we consider just the operations ∪ on permissions and memories.

Separation Algebras
As shown in the previous section, the key operation needed to define a shallow embedding of separation logic with permissions is a binary operation ∪ on memories and permissions. Calcagno et al. introduced the notion of a separation algebra [13] so as to capture common properties of the ∪ operation. A separation algebra (A, ∅, ∪) is a partial cancellative commutative monoid (see Definition 5.1 for our actual definition). A prototypical instance is the set of finite partial functions, where ∅ is the empty finite partial function and ∪ is the disjoint union of finite partial functions.
Separation algebras are also closed under various constructs (such as products and finite functions), and complex instances can thus be built compositionally.
When formalizing separation algebras in the Coq proof assistant, we quickly ran into some problems: -Dealing with partial operations such as ∪ is cumbersome, see Sect. 8.3.
-Dealing with subset types (modeled as -types) is inconvenient.
-Operations such as the difference operation \ cannot be defined constructively from the laws of a separation algebra.
In order to deal with the issue of partiality, we turn ∪ into a total operation. Only in case x and y are disjoint, notation x ⊥ y, do we require x ∪ y to satisfy the laws of a separation algebra. Instead of using subsets, we equip separation algebras with a predicate valid : A → Prop that explicitly describes a subset of the carrier A. Lastly, we explicitly add a difference operation \. Definition 5.1 A separation algebra consists of a type A with an element ∅ ∈ A, a predicate valid : A → Prop, relations ⊥ and ⊆, and binary operations ∪ and \, satisfying the following laws. Laws 1-4 describe the traditional laws of a separation algebra: identity, commutativity, associativity and cancellativity. Law 5 ensures that valid is closed under the ∪ operation. Law 6 describes positivity. Laws 7 and 8 fully axiomatize the ⊆ relation and \ operation. Using the positivity and cancellation laws, we obtain that ⊆ is a partial order which ∪ preserves and respects.
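As a concrete instance of Definition 5.1, the fractional separation algebra of Definition 5.3 can be sketched with total operations on pre-terms. Doubles stand in for Q here purely for illustration; the values used below are dyadic, so the arithmetic is exact:

```c
/* Fractional permissions: pre-terms are arbitrary doubles, and `valid`
   carves out the interval [0, 1]. All operations are total; only on
   disjoint arguments are the separation algebra laws required to hold. */
int    frac_valid(double x)              { return 0.0 <= x && x <= 1.0; }
int    frac_disjoint(double x, double y) { return frac_valid(x) && frac_valid(y)
                                                  && x + y <= 1.0; }
double frac_empty(void)                  { return 0.0; }
double frac_union(double x, double y)    { return x + y; }  /* may "overflow" */
double frac_diff(double x, double y)     { return x - y; }  /* total as well  */
```

The tests below check the identity and commutativity laws, law 8 (y ∪ (x \ y) = x), and that 0.5 ⊥ 0.5 holds while 1.0 ⊥ 0.5 does not.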
In case of permissions, the ∅ element is used to split objects of compound types (arrays and structs) into multiple parts. We thus use separation algebras instead of permission algebras [47], which are a variant of separation algebras without an ∅ element.

Definition 5.2
The Boolean separation algebra bool is defined with ∅ := false, x ⊥ y := ¬(x ∧ y), and ∪ := ∨. In the case of fractional permissions [0, 1] ∩ Q the problem of partiality and subset types already clearly appears: the ∪ operation (here +) can 'overflow'. We remedy this problem by having all operations operate on pre-terms (here Q), while the predicate valid describes validity of pre-terms (here 0 ≤ _ ≤ 1).

Definition 5.3
The fractional separation algebra Q is defined with valid x := (0 ≤ x ≤ 1), x ⊥ y := (valid x ∧ valid y ∧ x + y ≤ 1), ∅ := 0, ∪ := +, and \ := −. The version of separation algebras by Klein et al. [29] in Isabelle also models ∪ as a total operation and uses a relation ⊥. There are some differences: -We include a predicate valid to prevent having to deal with subset types. -They have weaker premises for associativity (law 3), namely x ⊥ y, y ⊥ z and x ⊥ z instead of x ⊥ y and x ∪ y ⊥ z. Ours are more natural: for fractional permissions one has 0.5 ⊥ 0.5 but not 0.5 + 0.5 ⊥ 0.5, and it thus makes no sense to require 0.5 ∪ (0.5 ∪ 0.5) = (0.5 ∪ 0.5) ∪ 0.5 to hold. -Since Coq (without axioms) does not have a choice operator, the \ operation cannot be defined in terms of ∪; Isabelle does have a choice operator.
Dockins et al. [17] have formalized a hierarchy of different separation algebras in Coq. They have dealt with the issue of partiality by treating ∪ as a relation instead of a function. This is unnatural, because equational reasoning becomes impossible and one has to name all auxiliary results.
Bengtson et al. [6] have formalized separation algebras in Coq to reason about object-oriented programs. They have treated ∪ as a partial function, and have not defined any complex permission systems.

Permissions
In this section we define the CH₂O permission system and show that it forms a separation algebra. We furthermore define permission kinds, which are used to classify the abilities of permissions.

Definition 5.4 The lattice of permission kinds (pkind, ⊆) is defined as:
The order k₁ ⊆ k₂ expresses that k₁ has fewer abilities than k₂. This organization of permissions is inspired by that of Leroy et al. [40]. The intuitive meaning of the above permission kinds is as follows: -Writable. Writable permissions allow reading and writing.
-Readable. Read-only permissions allow solely reading.
-Existing. Existence permissions [11] are used for objects that are known to exist but whose value cannot be used. Existence permissions are used to model that C only allows pointer arithmetic on pointers that refer to objects that have not been previously deallocated (see Sect. 3.4 for discussion). -Locked. Locked permissions are used to formalize the sequence point restriction. When an object is modified during the execution of an expression, it is temporarily given a locked permission to forbid any read/write accesses until the next sequence point. For example, in (x = 3) + *p; the assignment x = 3 locks the permissions of the object x. Since future read/write accesses to x are forbidden, accessing *p results in undefined behavior in case p points to x. At the sequence point ";", the original permission of x is restored. Locked permissions are different from existence permissions because the operational semantics can change writable permissions into locked permissions and vice versa, but cannot do so with existence permissions. -⊥. Empty permissions allow no operations.
In the CH₂O separation logic we not only control which operations are allowed, but also have to deal with share accounting.
-We need to subdivide objects with writable or read-only permission into multiple parts with read-only permission. For example, in the expression x + x, both subexpressions require x to have at least read-only permission. -We need to subdivide objects with writable permission into a part with existence permission and a part with writable permission. For example, in the expression *(p + 1) = (*p = 1), the subexpression *p = 1 requires *p to have writable permission, and the subexpression *(p + 1) requires *p to have at least existence permission in order to perform pointer arithmetic on p.
We combine fractional permissions with counting permissions to support both kinds of share accounting. Counting permissions were originally introduced by Bornat et al. [11]. The author's PhD thesis [33] gives the exact definition of the separation algebra structure of permissions by defining it one by one for the counting separation algebra C, the lockable separation algebra L, and the separation algebra on sums +. We omit the formal definitions of these separation algebras in this paper.
We have three sorts of permissions: -Unlocked permissions ♦(x, y), where x ∈ Q counts the number of existence permissions and y ∈ Q is a fractional permission accounting for the read/write share. Permissions ♦(x, 0) with x < 0 are existence permissions (see also Definitions 5.6 and 5.9). Note that the counter x is not a fractional permission and is thus not restricted to the interval [0, 1]. -Locked permissions (x, y), where x ∈ Q again counts the number of existence permissions and y ∈ Q is a fractional permission accounting for the read/write share. -Const permissions γ ∈ Q, which are used for const-qualified objects. Modifying an object with const permissions results in undefined behavior. Const permissions do not have a locked variant or an existence counter as they do not allow writing.
The areas marked green in Fig. 1 indicate the definition of the valid predicate on permissions. The figure furthermore visualizes how the permissions are projected onto their kinds, which is defined formally below.
The locking operations lock, unlock : perm → perm are defined in the expected way. The lock operation should only be used on permissions γ with Writable ⊆ kind γ; in other cases it produces a dummy value. Likewise, unlock should only be used on permissions γ with kind γ = Locked, and produces a dummy value otherwise.
The operation ∪ on permissions is defined as pointwise addition of the counting permission and the fractional permission, and the operation \ is defined as pointwise subtraction. The exact definitions can be found in [33]. Apart from the common separation algebra connectives, we define an operation ½ to subdivide a writable or read-only permission into two read-only permissions.
Existence permissions are used to subdivide objects with writable permission into a part with existence permission and a part with writable permission. For example, in *(p + 1) = (*p = 1), the subexpression *p = 1 requires writable permission of *p, and *(p + 1) requires an existence permission of *p to perform pointer arithmetic. Subdivision is achieved using the \ operation, which can be used to split a writable permission γ into an existence permission token and writable permission γ \ token. Law 8 of separation algebras guarantees that combining the subdivided permissions gives back the original permission, i.e. token ∪ (γ \ token) = γ. Note that because tokens can be combined using ∪ and subdivided using ½, the counter x in the permissions ♦(x, y) and (x, y) is an arbitrary rational number. As ensured by Definition 6.58, only objects with the full ♦(0, 1) permission can be deallocated, whereas objects with γ \ token permission cannot. This is to model that expressions such as (p == p) + (free(p),0) have undefined behavior.
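The pointwise arithmetic on unlocked permissions can be sketched directly. Taking token = ♦(−1, 0) is our illustrative choice of an existence permission; the text only states that existence permissions are of the form ♦(x, 0) with x < 0:

```c
/* Unlocked CH2O permissions ♦(x, y): a counter x and a fractional share y,
   with ∪ and \ as pointwise addition and subtraction, as stated in the text.
   Doubles stand in for Q; all values used are exactly representable. */
typedef struct { double x, y; } perm;   /* ♦(x, y) */

perm perm_union(perm a, perm b) { return (perm){ a.x + b.x, a.y + b.y }; }
perm perm_diff(perm a, perm b)  { return (perm){ a.x - b.x, a.y - b.y }; }

static const perm full  = {  0.0, 1.0 };  /* ♦(0, 1): full, deallocatable */
static const perm token = { -1.0, 0.0 };  /* an existence permission      */

/* law 8: token ∪ (γ \ token) = γ, for γ = ♦(0, 1) */
int token_roundtrip(void) {
    perm rest = perm_diff(full, token);   /* ♦(1, 1) */
    perm back = perm_union(token, rest);
    return back.x == 0.0 && back.y == 1.0;
}
```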
The following lemma shows that the operations on permissions interact accordingly and respect the permission kinds.

Lemma 5.10
Permissions satisfy the following properties:

Extended Separation Algebras
We extend the notion of a separation algebra with a split operation ½ and predicates unmapped and exclusive that associate, in an abstract way, an intended semantics to elements of a separation algebra. Recall that the split operation ½ plays an important role in the CH₂O separation logic [31,33] to subdivide objects with writable or read-only permission into multiple parts with read-only permission.

Definition 5.11
An extended separation algebra extends a separation algebra with: -Predicates splittable, unmapped, exclusive : A → Prop -A unary operation ½ : A → A Satisfying the following laws: 11. If splittable y and x ⊆ y, then splittable x. 12. If x ⊥ y and splittable (x ∪ y), then ½(x ∪ y) = ½x ∪ ½y. 13. unmapped ∅, and if unmapped x, then valid x. 14. If unmapped y and x ⊆ y, then unmapped x. 15. If x ⊥ y, unmapped x and unmapped y, then unmapped (x ∪ y). 16. exclusive x iff valid x and for all y with x ⊥ y we have unmapped y. 17. Not both exclusive x and unmapped x. 18. There exists an x with valid x and not unmapped x. Note that ½ is described by a total function whose result ½x is only meaningful if splittable x holds. This is to account for locked permissions, which cannot be split. Law 11 ensures that splittable permissions are infinitely splittable, and law 12 ensures that ½ distributes over ∪.
The predicates unmapped and exclusive associate an intended semantics to the elements of a separation algebra in an abstract way. The predicate unmapped describes whether the permission allows its content to be used, as will become clear in the definition of the tagged separation algebra (Definition 5.13). The predicate exclusive is the dual of unmapped. Let us consider the separation algebra of fractional permissions to describe the intended meaning of these predicates.

Definition 5.12
The fractional separation algebra Q is extended with ½x := x/2, unmapped x := (x = 0), and exclusive x := (x = 1). Remember that permissions will be used to annotate each individual bit in memory. Unmapped permissions are at the bottom: they do not allow their bit to be used in any way. Exclusive permissions are at the top: they are the sole owner of a bit and can do anything to that bit without affecting disjoint bits.
Fractional permissions have exactly one unmapped element and exactly one exclusive element, but CH₂O permissions have more structure, and their elements are classified accordingly. In order to formalize the intuitive meaning of the unmapped predicate and to abstractly describe bits annotated with permissions, we introduce the tagged separation algebra T_T^t(A). In the memory model it is instantiated as T_bit^E(perm) (Definition 6.21). Its elements (γ, b) consist of a permission γ ∈ perm and a bit b ∈ bit. We use the symbolic bit E, which represents indeterminate storage, to ensure that bits with unmapped permissions have no usable value.

Definition 5.13
Given a separation algebra A and a set of tags T with default tag t ∈ T, the tagged separation algebra T_T^t(A) is defined componentwise. The definitions of the omitted relations and operations are as expected.

The Memory Model
This section defines the CH₂O memory model, whose external interface consists of operations with the following types. Many of these operations depend on the typing environment Γ, which assigns fields to structs and unions (Definition 4.8). This dependency is required because these operations need to be aware of the layout of structs and unions.
The operation m a yields the value stored at address a in memory m. It fails with ⊥ if the permissions are insufficient, effective types are violated, or a is an end-of-array address. Reading from (the abstract) memory is not a pure operation: although it does not affect the memory contents, it may affect the effective types [27, 6.5p6-7]. This happens, for example, when type-punning is performed (see Sect. 3.3). This impurity is factored out by the operation force a m.
The operation m a := v stores the value v at address a in memory m. A store is only permitted in case permissions are sufficient, effective types are not violated, and a is not an end-of-array address. The proposition writable a m describes the side-conditions necessary to perform a store.
After a successful store, the operation lock a m is used to lock the object at address a in memory m. The lock operation temporarily reduces the permissions to Locked so as to prohibit future accesses to a. Locking yields a formal treatment of the sequence point restriction (which states that modifying an object more than once between two sequence points results in undefined behavior, see Sect. 3.6).
The operational semantics accumulates a set Ω ∈ lockset of addresses that have been written to (Definition 6.54) and uses the operation unlock Ω m at the subsequent sequence point (which may be at the semicolon that terminates a full expression). The operation unlock Ω m restores the permissions of the addresses in Ω and thereby makes future accesses to the addresses in Ω possible again. The author's PhD thesis [33] describes in detail how sequence points and locks are treated in the operational semantics.
The operation alloc o v μ m allocates a new object with value v in memory m. The object has object identifier o ∉ dom m, which is non-deterministically chosen by the operational semantics. The Boolean μ expresses whether the new object has been allocated by malloc.
Accompanying alloc , the operation free o m deallocates a previously allocated object with object identifier o in memory m. In order to deallocate dynamically obtained memory via free, the side-condition freeable a m describes that the permissions are sufficient for deallocation, and that a points to a malloced object.
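A toy model of alloc and free illustrates the role of the malloc flag μ; the fixed-size table, the int values, and the scenario helper are our simplifications, not CH₂O definitions:

```c
#include <stdbool.h>

#define MAX_OBJS 8
typedef struct { bool live; bool malloced; int value; } object;
static object mem[MAX_OBJS];   /* toy map: object identifier -> object */

void alloc_obj(int o, int v, bool mu) { mem[o] = (object){ true, mu, v }; }

/* freeable: the object is live and was allocated by malloc */
bool freeable(int o) { return mem[o].live && mem[o].malloced; }

bool free_obj(int o) {
    if (!freeable(o)) return false;   /* freeing a non-malloced object fails */
    mem[o].live = false;              /* the identifier stays dead forever */
    return true;
}

int scenario(void) {
    alloc_obj(0, 7, true);            /* obtained via malloc */
    alloc_obj(1, 7, false);           /* a local/global variable */
    if (!freeable(0)) return 0;
    if (freeable(1))  return 0;
    if (!free_obj(0)) return 0;
    if (free_obj(0))  return 0;       /* double free is rejected */
    return 1;
}
```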

Representation of Pointers
Adapted from CompCert [40,41], we represent memory states as finite partial functions from object identifiers to objects. Each local, global and static variable, as well as each invocation of malloc, is associated with a unique object identifier of a separate object in memory. This approach separates unrelated objects by construction, and is therefore well-suited for reasoning about memory transformations.
We improve on CompCert by modeling objects as structured trees instead of arrays of bytes to keep track of padding bytes and the variants of unions. This is needed to faithfully describe C11's notion of effective types (see page 4 of Sect. 1 for an informal description). This approach allows us to describe various undefined behaviors of C11 that have not been considered by others (see Sects. 3.1 and 3.3).
In the CompCert memory model, pointers are represented as pairs (o, i), where o is an object identifier and i is a byte offset into the object with identifier o. Since we represent objects as trees instead of arrays of bytes, we represent pointers as paths through these trees rather than as byte offsets. Definition 6.2 Object identifiers o ∈ index are elements of a fixed countable set. In the Coq development we use binary natural numbers, but since we do not rely on any properties apart from countability, we keep the representation opaque.
We first introduce a typing environment to relate the shape of paths representing pointers to the types of objects in memory.

An object identifier o is alive, notation
Memory typing environments evolve during program execution. The code below is annotated with the corresponding memory environments in red.
Here, o 1 is the object identifier of the variable x, o 2 is the object identifier of the variable p and o 3 is the object identifier of the storage obtained via malloc.
Memory typing environments also keep track of objects that have been deallocated. Although one cannot directly create a pointer to a deallocated object, existing pointers to such objects remain in memory after deallocation (see the pointer p in the above example). These pointers, also called dangling pointers, cannot actually be used. Definition 6.4 References, addresses and pointers are inductively defined. References are paths from the top of an object in memory to some subtree of that object. The shape of references matches the structure of types: for instance, a reference segment is used to select the ith element of a τ-array of length n. References can describe most pointers in C but cannot account for end-of-array pointers and pointers to individual bytes. We have therefore defined the richer notion of addresses. An address (o : τ, r⃗, i)_{σ>∗σp} consists of: -An object identifier o with type τ. -A reference r⃗ to a subobject of type σ in the entire object of type τ. -An offset i to a particular byte in the subobject of type σ (note that one cannot address individual bits in C). -The type σp to which the address is cast. We use a points-to type in order to account for casts to the anonymous void* pointer, which is represented as the points-to type any. This information is needed to define, for example, pointer arithmetic, which is sensitive to the type of the address.
In turn, pointers extend addresses with a NULL pointer NULL σ p for each type σ p , and function pointers f τ →τ which contain the name and type of a function.
Let us consider the following global variable declaration: struct S { union U { int x[2]; int y; } u; void *p; } s; The formal representation of the pointer (void*)(s.u.x + 2) refers to o_s, the object identifier associated with the variable s of type struct S. The annotation any describes that the pointer has been cast to type void*.
The annotations q ∈ {◦, •} on union reference segments describe whether type-punning is allowed or not. The annotation • means that type-punning is allowed, i.e. accessing another variant than the current one has defined behavior. The annotation ◦ means that type-punning is forbidden. A pointer whose annotations are all of the shape ◦, and that thereby does not allow type-punning at all, is called frozen. Definition 6.5 The freeze function |_|◦ : refseg → refseg replaces each annotation • by ◦. A reference segment r is frozen, notation frozen r, if |r|◦ = r. Both |_|◦ and frozen are lifted to references, addresses, and pointers in the expected way.
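The freeze function can be sketched by modeling a sequence of annotations as a string, with '*' standing for • (punning allowed) and 'o' for ◦ (punning forbidden); this encoding is ours, chosen purely for illustration:

```c
#include <string.h>

/* |r|◦ : replace every "punning allowed" annotation by "forbidden" */
void freeze(char *r) {
    for (; *r; r++) if (*r == '*') *r = 'o';
}

/* frozen r iff |r|◦ = r, i.e. no '*' annotation remains */
int frozen(const char *r) {
    for (; *r; r++) if (*r == '*') return 0;
    return 1;
}

int freeze_example(void) {
    char r[] = "*o*";
    if (frozen(r)) return 0;   /* not yet frozen */
    freeze(r);
    return frozen(r) && strcmp(r, "ooo") == 0;
}
```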
Pointers stored in memory are always in frozen shape. Definitions 6.32 and 6.41 describe the formal treatment of effective types and frozen pointers, but for now we reconsider the example from Sect. 3.3, with its pointers &u.y and p. These pointers are likely to have the same object representation on actual computing architectures. However, due to effective types, &u.y may be used for type-punning but p may not. It is thus important that we distinguish these pointers in the formal memory model.
The additional structure of pointers is also needed to determine whether pointer subtraction has defined behavior. The behavior is only defined if the given pointers both point to an element of the same array object [27, 6.5.6]. Given the declaration struct S { int a[3]; int b[3]; } s, the pointers s.a + 3 and s.b have different representations in the CH₂O memory model. The author's PhD thesis [33] gives the formal definition of pointer subtraction.
We will now define typing judgments for references, addresses and pointers. The judgment for references r : τ σ states that σ is a subobject type of τ which can be obtained via the reference r (see also Definition 7.1). For example, int [2] is a subobject type of struct S { int x [2]; int y [3]; } via struct S −−−→ 0.

Definition 6.6 The judgment
r : τ σ describes that r is a valid reference from τ to σ . It is inductively defined as: The typing judgment for addresses is more involved than the judgment for references. Let us first consider the following example: int a [4]; Assuming the object a has object identifier o a , the end-of-array pointer a+4 could be represented in at least the following ways (assuming sizeof (signed int) = 4): In order to ensure canonicity of pointer representations, we let the typing judgment for addresses ensure that the reference r of (o : τ, r , i) σ > * σ p always refers to the first element of an array subobject. This renders the second representation illegal.

Definition 6.7
The relation τ > * σ p , type τ is pointer castable to σ p , is inductively defined by τ > * τ , τ > * unsigned char, and τ > * any. Definition 6.8 The judgment ⊢ * a : σ p describes that the address a refers to type σ p . It is inductively defined as: Here, the helper functions offset, size : ref → N are defined as: We use an intrinsic encoding of syntax, which means that terms contain redundant type annotations from which types can be read off. Functions to read off types are named typeof and will not be defined explicitly. Type annotations make it more convenient to define operations that depend on types (such as offset and size in Definition 6.8). As usual, typing judgments ensure that type annotations are consistent.
The premises i ≤ sizeof σ · size r and sizeof σ p | i of the typing rule ensure that the byte offset i is aligned and within range. The inequality i ≤ sizeof σ · size r is non-strict so as to allow end-of-array pointers. Definition 6.9 An address a = (o : τ, r , i) σ > * σ p is called strict, notation a strict, in case it satisfies i < sizeof σ · size r .
The judgment τ > * σ p does not describe the typing restriction of cast expressions. Instead, it defines the invariant that each address (o : τ, r , i) σ > * σ p should satisfy. Since C is not type safe, pointer casting has τ > * σ p as a run-time side-condition: int x, *p = &x; void *q = (void*)p; // OK, signed int > * any int *q1 = (int*)q; // OK, signed int > * signed int short *q2 = (short*)p; // Statically ill-typed short *q3 = (short*)q; // Undefined behavior, signed int > * signed short Definition 6.10 The judgment ⊢ * p : σ p describes that the pointer p refers to type σ p . It is inductively defined as: Here, the function setoffset : N → ref → ref is defined as: Let us display the above definition graphically. Given an address (o : τ, r , i) σ > * σ p , the normalized reference and normalized byte offset are as follows: For end-of-array addresses the normalized reference is ill-typed because references cannot be end-of-array. For strict addresses the normalized reference is well-typed.

Definition 6.12 The judgment
p alive describes that the pointer p is alive. It is inductively defined as: The judgment o alive on object identifiers is defined in Definition 6.3.
For many operations we have to distinguish addresses that refer to an entire object from addresses that refer to an individual byte of an object. We call addresses of the latter kind byte addresses. For example: int x, *p = &x; // p is not a byte address unsigned char *q = (unsigned char*)&x; // q is a byte address Definition 6.13 gives the formal definition of byte addresses. To express that memory operations commute (see for example Lemma 6.36), we need to express that addresses are disjoint, meaning they do not overlap. Addresses do not overlap if they belong to different objects or take a different branch at an array or struct. Let us consider an example: The pointers &u1 and &u2 are disjoint because they point to separate memory objects. Writing to one does not affect the value of the other and vice versa. Likewise, &u1.s.x and &u1.s.y are disjoint because they point to different fields of the same struct, and as such do not affect each other. The pointers &u1.s.x and &u1.z are not disjoint because they point to overlapping objects and thus do affect each other. Definition 6.14 Disjointness of references r 1 and r 2 , notation r 1 ⊥ r 2 , is inductively defined as: Note that we do not require a special case for | r 1 | • = | r 2 | • . Such a case is implicit because disjointness is defined in terms of prefixes.

Definition 6.15
Disjointness of addresses a 1 and a 2 , notation a 1 ⊥ a 2 , is inductively defined as: The first inference rule accounts for addresses whose object identifiers are different, the second rule accounts for addresses whose references are disjoint, and the third rule accounts for addresses that point to different bytes of the same subobject (in which case both a 1 and a 2 are byte addresses). Disjointness implies non-overlapping bit-offsets, but the reverse implication does not always hold because references to different variants of unions are not disjoint. For example, given the declaration union { struct { int x, y; } s; int z; } u, the pointers corresponding to &u.s.y and &u.z are not disjoint.

Representation of Bits
As shown in Sect. 3.1, each object in C can be interpreted as an unsigned char array called the object representation. On actual computing architectures, the object representation consists of a sequence of concrete bits (zeros and ones). However, so as to accurately describe all undefined behaviors, we need a special treatment for the object representations of pointers and indeterminate memory in the formal memory model. To that end, CH2O represents the bits belonging to the object representations of pointers and indeterminate memory symbolically. A bit is either a concrete bit 0 or 1, the ith fragment bit (ptr p) i of a pointer p, or the indeterminate bit E. Integers are represented using concrete sequences of bits, and pointers as sequences of fragment bits. Assuming bitsizeof (signed int * ) = 32, a pointer p to a signed int will be represented as the bit sequence (ptr p) 0 . . . (ptr p) 31 , and assuming bitsizeof (signed int) = 16 on a little-endian architecture, the integer 33 : signed int will be represented as the bit sequence 1000010000000000.
The approach using a combination of symbolic and concrete bits is similar to Leroy et al. [40] and has the following advantages: -Symbolic bit representations for pointers avoid the need to clutter the memory model with subtle, implementation-defined, and run-time dependent operations to decode and encode pointers as concrete bit sequences. -We can precisely keep track of memory areas that are uninitialized. Since these memory areas consist of arbitrary concrete bit sequences on actual machines, most operations on them have undefined behavior. -While reasoning about program transformations one has to relate the memory states during the execution of the source program to those during the execution of the target program. Program transformations can, among other things, make more memory defined (that is, transform some indeterminate E bits into determinate bits) and relabel the memory. Symbolic bit representations make it easy to deal with such transformations (see Sect. 7.2).
-It vastly decreases the amount of non-determinism, making it possible to evaluate the memory model as part of an executable semantics [33,37]. -The use of concrete bit representations for integers still gives a semantics to many lowlevel operations on integer representations.
A small difference with Leroy et al. [40] is that the granularity of our memory model is on the level of bits rather than bytes. Currently we do not make explicit use of this granularity, but it allows us to support bit-fields more faithfully with respect to the C11 standard in future work.
Objects in our memory model are annotated with permissions. We use permission annotations on the level of individual bits, rather than on the level of bytes or entire objects, to obtain the most precise way of permission handling.

Definition 6.21
Permission annotated bits are defined as: In the above definition, T is the tagged separation algebra that has been defined in Definition 5.13. We have spelled out its definition for clarity's sake.

Definition 6.22
The judgment , b describes that a permission annotated bit b is valid. It is inductively defined as:

Representation of the Memory
Memory trees are abstract trees whose structure corresponds to the shape of data types in C. They are used to describe individual objects (base values, arrays, structs, and unions) in memory. The memory is a forest of memory trees.

Definition 6.23
Memory trees are inductively defined as: The structure of memory trees is close to the structure of types (Definition 4.7) and thus reflects the expected semantics of types: arrays are lists, structs are tuples, and unions are sums. Let us consider the following example: The memory tree representing the object s with object identifier o s may be as follows (permissions are omitted for brevity's sake, and integer encoding and padding are subject to implementation-defined behavior): The representation of unions requires some explanation. We considered two kinds of memory trees for unions: -The memory tree union t (i, w, b) represents a union whose variant is i. Unions of variant i can only be accessed through a pointer to variant i. This is essential for effective types. The list b represents the padding after the element w. -The memory tree union t b represents a union whose variant is yet unspecified. Whenever the union is accessed through a pointer to variant i, the list b will be interpreted as a memory tree of the type belonging to the ith variant.
The reason that we consider unions union t b with unspecified variant at all is that in some cases the variant cannot be known. Unions that have not been initialized do not have a variant yet. Also, when a union object is constructed byte-wise through its object representation, the variant cannot be known.
Although unions are tagged in the formal memory, actual compilers implement untagged unions. Information about variants should thus be internal to the formal memory model. In Sect. 7.2 we prove that this is indeed the case.
The additional structure of memory trees, namely type annotations, variants of unions, and structured information about padding, can be erased by flattening. Flattening just appends the bytes on the leaves of the tree.

Definition 6.24
The flatten operation (_) : mtree → list pbit is defined as: The flattened version of the memory tree representing the object s in the previous example is as follows: 10000100 01000100 EEEEEEEE EEEEEEEE (ptr p) 0 (ptr p) 1 . . . (ptr p) 31
Definition 6.25 The judgment ⊢ w : τ describes that the memory tree w has type τ . It is inductively defined as: Although padding bits should be kept indeterminate (see Sect. 3.1), padding bits are explicitly stored in memory trees for uniformity's sake. The typing judgment ensures that the value of each padding bit is E and that the padding thus only has a permission. Storing a value in padding is a no-op (see Definition 6.35).
The side-condition ¬unmapped (w b) in the typing rule for a union union t (i, w, b) of a specified variant ensures canonicity. Unions whose permissions are unmapped cannot be accessed and should therefore be in an unspecified variant. This condition is essential for the separation algebra structure, see Sect. 7.4.

Definition 6.26
Memories are defined as: Each object (w, μ) in memory is annotated with a Boolean μ to describe whether it has been allocated using malloc (in case μ = true) or as a block scope local, static, or global variable (in case μ = false). The types of deallocated objects are kept to ensure that dangling pointers (which may remain in memory, but cannot be used) have a unique type. The judgment o alive on object identifiers is defined in Definition 6.3.

Definition 6.28
The minimal memory typing environment m ∈ memenv of a memory m is defined as: We let ⊢ m denote that m is well-typed in its own minimal memory typing environment.
Many of the conditions of the judgment ⊢ m ensure that the types of m match up with the types in the memory environment (see Definition 6.3). One may of course wonder why we do not define the judgment ⊢ m directly, and even consider typing of a memory in an arbitrary memory environment. Consider: Using an assertion of separation logic we can describe the memory induced by the above program as x − → 10 * p − → &x. The separation conjunction * describes that the memory can be subdivided into two parts, a part for x and another part for p. When considering p − → &x in isolation, which is common in separation logic, we have a pointer that refers outside the part itself. This isolated part is thus not typeable by itself, but it is typeable in the context of the memory environment corresponding to the whole memory. See also Lemma 7.26.
In the remaining part of this section we will define various auxiliary operations that will be used to define the memory operations in Sect. 6.5. We give a summary of the most important auxiliary operations. As an example of their use, one can poke some bytes into the object representation of u and interpret these as a memory tree of type short.
We have already defined the flatten operation w that takes a memory tree w and yields its bit representation in Definition 6.24. We now define the operation that goes in the opposite direction, called the unflatten operation.

Definition 6.30
The unflatten operation (_) τ : list pbit → mtree is defined as: Flattening and unflattening enjoy cancellation properties; in Sect. 7.2 we prove weaker variants of these cancellation properties that are sufficient for proofs about program transformations. Definition 6.31 Given a permission γ ∈ perm, the operation new γ : type → mtree that yields the indeterminate memory tree is defined as: The memory tree new γ τ that consists of indeterminate bits with permission γ is used for objects with indeterminate value. We have defined new γ τ in terms of the unflattening operation for simplicity's sake. This definition enjoys desirable structural properties such as new γ (τ [n]) = (new γ τ ) n .
We will now define the lookup operation m[a] that yields the subtree at address a in the memory m. The lookup function is partial: it fails in case a is end-of-array or violates effective types. We first define the counterpart of lookup on memory trees and then lift it to memories.
The lookup operation uses the annotations q ∈ {◦, •} on union segments: -The annotation q = ◦ allows a union to be accessed via a reference whose variant is unequal to the current one. This is called type-punning. -The annotation q = • allows a union to be accessed only via a reference whose variant is equal to the current one. That is, it rules out type-punning.
Failure of type-punning is captured by partiality of the lookup operation. The behavior of type-punning of union t ( j, w, b) via a reference to variant i is described by the conversion ((w b) [0, bitsizeof τ i ) ) τ i . The memory tree w is converted into bits and reinterpreted as a memory tree of type τ i .  m (index a) = (w, μ). In omitted cases the result is ⊥. In this definition we let i := byte a · char_bits and j := (byte a + 1) · char_bits.
We have to take special care of addresses that refer to individual bytes rather than whole objects. Consider: In this code, we obtain the first byte ((unsigned char*)&s)[0] of the struct s. This is formalized by flattening the entire memory tree of the struct s, and selecting the appropriate byte.
The C11 standard's description of effective types [27, 6.5p6-7] states that an access (which is either a read or a store) affects the effective type of the accessed object. This means that although reading from memory does not affect the memory contents, it may still affect the effective types. Let us consider an example where it is indeed the case that effective types are affected by a read: In this code, the variant of the union u is initially unspecified. The read *q in g forces its variant to y, making the assignment *p to variant x undefined. Note that it is important that we also assign undefined behavior to this example: a compiler may assume that p and q do not alias, regardless of how g is called.
We factor these side-effects out using a function force : addr → mem → mem that updates the effective types (that is, the variants of unions) after a successful lookup. The force function, as defined in Definition 6.5, can be described in terms of the alter operation m[a/ f ] that applies the function f : mtree → mtree to the object at address a in the memory m and updates variants of unions according to a. To define force we let f be the identity.
In the last two cases we have t = τ , s :  (w, μ). In this definition we let: where i := byte a · char_bits and j := (byte a + 1) · char_bits.
The lookup and alter operations enjoy various properties; they preserve typing and satisfy laws about their interaction. We list some for illustration. A variant of Lemma 6.37 for byte addresses is more subtle because a byte address can be used to modify padding. Since modifications of padding are masked, a successive lookup may yield a memory tree with more indeterminate bits. In Sect. 7.2 we present an alternative lemma that covers this situation.
We conclude this section with a useful helper function that zips a memory tree and a list. It is used in for example Definitions 6.58 and 7.22.

Definition 6.38
Given a function f : pbit → B → pbit, the operation f̂ : mtree → list B → mtree that zips the leaves is defined as: where n := | w | and s i :

Representation of Values
Memory trees (Definition 6.23) are still rather low-level and expose permissions and implementation specific properties such as bit representations. In this section we define abstract values, which are like memory trees but have mathematical integers and pointers instead of bit representations as leaves. Abstract values are used in the external interface of the memory model.

Definition 6.39
Base values are inductively defined as: While performing byte-wise operations (for example, byte-wise copying a struct containing pointer values), abstraction is broken, and pointer fragment bits have to reside outside of memory. The value byte b is used for this purpose.
The side-conditions of the typing rule for byte b ensure canonicity of representations of base values. They ensure that the construct byte b is only used if b cannot be represented as an integer int unsigned char x or as indet (unsigned char).
In Definition 6.44 we define abstract values by extending base values with constructs for arrays, structs and unions. In order to define the operations to look up and store values in memory, we define conversion operations between abstract values and memory trees. Recall that the leaves of memory trees, which represent base values, are just sequences of bits. We therefore first define operations that convert base values to and from bits. These operations are called flatten and unflatten.

Definition 6.41
The flatten operation (_) : baseval → list bit is defined as: The operation (_) τ i : Z → list bool is defined in Definition 4.4.

Definition 6.42
The unflatten operation (_) τ b : list bit → baseval is defined as: The operation (_) τ i : list bool → Z is defined in Definition 4.4.
The encoding of pointers is an important aspect of the flatten operation related to our treatment of effective types. Pointers are encoded as sequences of frozen pointer fragment bits (ptr | p | • ) i (see Definition 6.5 for the definition of frozen pointers). Recall that the flatten operation is used to store base values in memory, whereas the unflatten operation is used to retrieve them. This means that whenever a pointer p is stored and read back, the frozen variant | p | • is obtained.
Freezing formally describes the situations in which type-punning is allowed since a frozen pointer cannot be used to access a union of another variant than its current one (Definition 6.32). Let us consider an example: union U { int x; short y; } u = { .x = 3 }; short *p = &u.y; // a frozen version of the pointer &u.y is stored printf("%d", *p); // type-punning via a frozen pointer -> undefined Here, an attempt at type-punning is performed via the frozen pointer p, which is formally represented as: The lookup operation on memory trees (which will be used to obtain the value of *p from memory, see Definitions 6.32 and 6.58) will fail. The annotation • prevents a union from being accessed through an address to another variant than its current one. In the example below type-punning is allowed: union U { int x; short y; } u = { .x = 3 }; printf("%d", u.y); Here, type-punning is allowed because it is performed directly via u.y, which has not been stored in memory, and thus has not been frozen.
The abstract value union t v represents a union whose variant is unspecified. The values v correspond to interpretations of all variants of union t. Consider: union U { int x; short y; int *p; } u; for (size_t i = 0; i < sizeof(u); i++) ((unsigned char*)&u)[i] = 0; Here, the object representation of u is initialized with zeros, and its variant thus remains unspecified. The abstract value of u is then a union value union t v with unspecified variant. Recall that the variants of a union occupy a single memory area, so the sequence v of a union value union t v cannot be arbitrary. There should be a common bit sequence representing it. This is not the case in: The typing judgment for abstract values guarantees that v can be represented by a common bit sequence. In order to express this property, we first define the unflatten operation that converts a bit sequence into an abstract value.
This example already illustrates that so as to obtain the common bit sequence v of v we have to insert padding bits and "join" the padded bit representations.

Definition 6.47
The join operation on bits : bit → bit → bit is defined as: Definition 6.48 The flatten operation (_) : val → list bit is defined as: where t = τ , n := | τ |, and z i := bitoffsetof τ i The operation ofval : list perm → val → mtree, which converts a value v of type τ into a memory tree ofval γ v, is, albeit technical, fairly straightforward. In principle it is just a recursive definition that uses the flatten operation v b for base values v b and the flatten operation union t v for unions union t v of an unspecified variant.
The technicality is that abstract values do not contain permissions, so we have to merge the given value with permissions. The sequence γ with | γ | = bitsizeof τ represents a flattened sequence of permissions. In the definition of the memory store m a := v (see Definition 6.58), we convert v into the stored memory tree ofval γ v where γ constitutes the old permissions of the object at address a.

Definition 6.49
The operation ofval : list perm → val → mtree is defined as: where s := bitsizeof τ and n := | v |. Converting a memory tree into a value is as expected: permissions are removed and unions are interpreted as values corresponding to each variant.

Definition 6.50
The operation toval : mtree → val is defined as: toval (union t (i, w, b)) := union t (i, toval w) The function toval is an inverse of ofval up to freezing of pointers. Freezing is intentional: it makes indirect type-punning illegal. Lemma 6.51 Given ⊢ v : τ , and let γ be a flattened sequence of permissions with | γ | = bitsizeof τ , then we have: The other direction does not hold because invalid bit representations will become indeterminate values.

Memory Operations
Now that we have all primitive definitions in place, we can compose these to implement the actual memory operations as described in the beginning of this section. The last part that is missing is a data structure to keep track of objects that have been locked. Intuitively, this data structure should represent a set of addresses, but up to overlapping addresses.

Definition 6.54
Locksets are defined as: Elements of locksets are pairs (o, i) where o ∈ index describes the object identifier and i ∈ N a bit-offset in the object described by o. We introduce a typing judgment to describe that the structure of locksets matches up with the memory layout. The lookup operation m a uses the lookup operation m[a] that yields a memory tree w (Definition 6.33), and then converts w into the value toval w. The operation m[a] already yields ⊥ in case effective types are violated or a is an end-of-array address. The additional condition of m a ensures that the permissions allow for a read access. Performing a lookup affects the effective types of the object at address a. This is factored out by the operation force a m which applies the identity function to the subobject at address a in the memory m. Importantly, this does not change the memory contents, but merely changes the variants of the involved unions.

The store operation m a := v uses the alter operation m[a/λw . ofval (w 1 ) v] on memories (Definition 6.35) to apply λw . ofval (w 1 ) v to the subobject at address a. The stored value v is converted into a memory tree while retaining the permissions w 1 of the previously stored memory tree w at address a.
The definition of lock a m is straightforward. In the Coq development we use a map operation on memory trees to apply the function lock (Definition 5.5) to the permission of each bit of the memory tree at address a.
The operation unlock Ω m unlocks a whole lockset Ω, rather than an individual address, in memory m. For each memory tree w at object identifier o, it converts Ω to a Boolean vector y = ((o, 0) ∈ Ω) . . . ((o, bitsizeof (typeof w) − 1) ∈ Ω) and merges w with y (using Definition 6.38) to apply unlock (Definition 5.5) to the permissions of bits that should be unlocked in w. We show some lemmas to illustrate that the operations for locking and unlocking enjoy the intended behavior: Provided o ∉ dom m, allocation alloc o v μ m extends the memory with a new object holding the value v and full permissions ♦(0, 1). Typically we use v = new τ for some τ , but global and static variables are allocated with a specific value v.
The operation free o m deallocates the object o in m, and keeps track of the type of the deallocated object. In order to deallocate dynamically obtained memory via free, the side-condition freeable a m describes that the permissions are sufficient for deallocation, and that a points to the first element of a malloced array.
All operations preserve typing and satisfy the expected laws about their interaction. We list some for illustration.
Storing a value v in memory and then retrieving it does not necessarily yield the same value v. It intentionally yields the value | v | • whose pointers have been frozen. Note that the above result does not hold for byte addresses, which may store a value in a padding byte, in which case the resulting value is indeterminate. Lemma 6.64 (Stores and lookups commute) If ⊢ m and a 1 ⊥ a 2 and ⊢ a 2 : τ 2 and writable a 2 m and ⊢ v 2 : τ 2 , then we have: These results follow from Lemmas 6.36, 6.37 and 6.51.

Type-Based Alias Analysis
The purpose of C11's notion of effective types [27, 6.5p6-7] is to make it possible for compilers to perform type-based alias analysis. Consider: short g(int *p, short *q) { short x = *q; *p = 10; return x; } Here, a compiler should be able to assume that p and q are not aliased because they point to objects with different types (although the integer types signed short and signed int may have the same representation, they have different integer ranks, see Definition 4.2, and are thus different types). If g is called with aliased pointers, execution of the function body should have undefined behavior in order to allow a compiler to soundly assume that p and q are not aliased.
From the C11 standard's description of effective types it is not immediate that calling g with aliased pointers results in undefined behavior. We prove an abstract property of our memory model that shows that this is indeed a consequence, and that indicates a compiler can perform type-based alias analysis. This also shows that our interpretation of effective types of the C11 standard, in line with the interpretation from the GCC documentation [20], is sensible. Definition 7.1 A type τ is a subobject type of σ , notation τ ⊆ σ , if there exists some reference r with r : σ τ .
For example, int [2] is a subobject type of struct S { int x [2]; int y [3]; } and int [2][2], but not of struct S { short x [2]; }, nor of int(*) [2]. Theorem 7.2 (Strict-aliasing) Given ⊢ m, frozen addresses a 1 and a 2 with ⊢ a 1 : σ 1 and ⊢ a 2 : σ 2 and σ 1 , σ 2 ≠ unsigned char, then either: 3. Accessing a 1 after accessing a 2 and vice versa fails. That means: This theorem implies that accesses to addresses of disjoint type are either non-overlapping or have undefined behavior. Fact 6.61 accounts for a store after a lookup. Using this theorem, a compiler can optimize the generated code in the example based on the assumption that p and q are not aliased. Reconsider: short g(int *p, short *q) { short x = *q; *p = 10; return x; } If p and q are aliased, then calling g yields undefined behavior because the assignment *p = 10 violates effective types. Let m be the initial memory while executing g, and let a p and a q be the addresses corresponding to p and q, then the condition writable a p (force a q m) does not hold by Theorem 7.2 and Fact 6.61.

Memory Refinements
This section defines the notion of memory refinements that allows us to relate memory states. The author's PhD thesis [33] shows that the CH2O operational semantics is invariant under this notion. Memory refinements form a general way to validate many common-sense properties of the memory model in a formal way. For example, they show that the memory is invariant under relabeling. More interestingly, they show that symbolic information (such as variants of unions) cannot be observed.
Memory refinements also open the door to reason about program transformations. We demonstrate their usage by proving soundness of constant propagation and by verifying an abstract version of memcpy.
Memory refinements are a variant of Leroy and Blazy's notion of memory extensions and injections [41]. A memory refinement is a relation m 1 f m 2 between a source memory state m 1 and a target memory state m 2 . The renaming function f : index → option (index × ref) is the core of all refinement judgments. It is used to rename object identifiers and to coalesce multiple source objects into subobjects of a single compound target object. Injectivity of renaming functions guarantees that distinct source objects are coalesced into disjoint target subobjects. In the case of Blazy and Leroy, the renaming functions have type index → option (index × N), but we replaced the natural number by a reference since our memory model is structured using trees.
Since memory refinements rearrange the memory layout, addresses should be rearranged accordingly. The judgment a 1 f : 1 → 2 a 2 : τ p describes how a 2 is obtained by renaming a 1 according to the renaming f , and moreover allows frozen union annotations • in a 1 to be changed into unfrozen ones ◦ in a 2 . The annotation τ p in the judgment a 1 f : 1 → 2 a 2 : τ p corresponds to the type of a 1 and a 2 .
The judgment for addresses is lifted to the judgment for pointers in the obvious way. The judgment for bits is inductively defined as: The last two rules allow indeterminate bits E, as well as pointer fragment bits (ptr a) i belonging to deallocated storage, to be replaced by arbitrary bits b.
The judgment is lifted to memory trees following the tree structure and using the following additional rule: This rule allows a union that has a specific variant in the source to be replaced by a union with an unspecified variant in the target. The direction seems counter intuitive, but keep in mind that unions with an unspecified variant allow more behaviors.
This lemma is useful because it removes the need for simultaneous inductions on both typing and refinement judgments.
The above definition ensures that objects are renamed, and possibly coalesced into subobjects of a compound object, as described by the renaming function f.
In order to reason about program transformations modularly, we show that memory refinements can be composed.
All memory operations are preserved by memory refinements. This property is not only useful for reasoning about program transformations, but also indicates that the memory interface does not expose internal details (such as variants of unions) that are unavailable in the memory of a (concrete) machine. As shown in Lemma 6.63, storing a value v in memory and then retrieving it does not necessarily yield the same value v. First, in the case of a byte address, the value may have been stored in padding and therefore have become indeterminate. Second, retrieval intentionally yields the value | v |• in which all pointers are frozen. Nonetheless, the widely used compiler optimization of constant propagation, which substitutes values of known constants at compile time, is still valid in our memory model. Copying an object w by an assignment converts it to a value toval w and back; this conversion makes invalid representations of base values indeterminate. Copying an object w byte-wise converts it to its sequence of bits and back; this conversion makes all variants of unions unspecified. The following theorem shows that a copy by assignment can be transformed into a byte-wise copy.
Unused reads cannot be removed unconditionally in the CH2O memory model because they have a side-effect: they use the force operation, which updates effective types. We show that uses of force can be removed for frozen addresses.

Reasoning About Disjointness
In order to prove soundness of the CH2O axiomatic semantics, we often needed to reason about preservation of disjointness under memory operations [33]. This section describes some machinery to ease such reasoning. We show that this machinery, as originally developed in [31], extends to any separation algebra.

Definition 7.13
Disjointness of a list x, notation ⊥ x, is defined as: Notice that ⊥ x is stronger than having x i ⊥ x j for each i ≠ j. For example, using fractional permissions, we do not have ⊥ [ 0.5, 0.5, 0.5 ], whereas 0.5 ⊥ 0.5 clearly holds. Using disjointness of lists we can, for example, state the associativity law (law 3 of Definition 5.1) in a symmetric way: We define a relation x 1 ≡⊥ x 2 that expresses that x 1 and x 2 behave equivalently with respect to disjointness.

Definition 7.15
Equivalence of lists x 1 and x 2 with respect to disjointness, notation x 1 ≡⊥ x 2, is defined as: It is straightforward to show that ≤⊥ is reflexive and transitive, is respected by concatenation of lists, and is preserved by list containment. Hence, ≡⊥ is an equivalence relation, a congruence with respect to concatenation of lists, and is preserved by permutations. The following results (on arbitrary separation algebras) allow us to reason algebraically about disjointness.
Theorem 7.18 We have the following algebraic properties: In Sect. 7.4 we show that the specific operations of our memory model enjoy similar properties to the above.

The Memory as a Separation Algebra
We show that the CH2O memory model is a separation algebra, and that the separation algebra operations interact appropriately with the memory operations that we have defined in Sect. 6.
In order to define the separation algebra relations and operations on memories, we first define these on memory trees. Memory trees do not form a separation algebra themselves due to the absence of a unique ∅ element (memory trees have a distinct identity element new τ for each type τ , see Definition 6.31). The separation algebra of memories is then defined by lifting the definitions on memory trees to memories (which are basically finite functions to memory trees).

Definition 7.19
The predicate valid : mtree → Prop is inductively defined as:
Fact 7.20 If Γ, Δ ⊢ w : τ, then valid w.
The valid predicate specifies the subset of memory trees on which the separation algebra structure is defined. The definition basically lifts the valid predicate from the leaves to the trees. The side-condition ¬unmapped (w b) on union t (i, w, b) memory trees ensures canonicity: unions whose permissions are unmapped cannot be accessed and are therefore kept in an unspecified variant. Unmapped unions union t b can be combined with other unions using ∪. The rationale for doing so will become clear in the context of the separation logic in the author's PhD thesis [33].

Definition 7.22
The operation ∪ : mtree → mtree → mtree is defined as: In the last two clauses, w ∪ b is a modified version of the memory tree w in which the elements on the leaves of w are zipped with b using the ∪ operation on permission-annotated bits (see Definitions 6.38 and 5.13).
The definitions of valid, ⊥ and ∪ on memory trees satisfy all laws of a separation algebra (see Definition 5.1) apart from those involving ∅. We prove the cancellation law explicitly since it involves the aforementioned side-conditions on unions.

Lemma 7.23
If w 3 ⊥ w 1, w 3 ⊥ w 2 and w 3 ∪ w 1 = w 3 ∪ w 2, then w 1 = w 2. Proof By induction on the derivations of w 3 ⊥ w 1 and w 3 ⊥ w 2. We consider one case: here, we have w 3 b 3 ∪ w 1 b 1 = w 3 b 3 ∪ b 2 by assumption, and therefore w 1 b 1 = b 2 by the cancellation law of a separation algebra. However, by assumption we also have ¬unmapped (w 1 b 1) and unmapped b 2, which contradicts w 1 b 1 = b 2.

Definition 7.24
The separation algebra of memories is defined as: The definitions of the omitted relations and operations are as expected.
The emptiness conditions ensure canonicity: objects that consist solely of indeterminate bits with ∅ permission are meaningless and should not be kept at all. These conditions are needed for cancellativity. Notice that the memory typing environment is not subdivided among m 1 and m 2. Consider the memory state corresponding to int x = 10, *p = &x: here, w is the memory tree that represents the integer value 10. The pointer on the right-hand side is well-typed in the memory environment of the whole memory, which contains both o x and o p, but not in the environment of the part of the memory containing o p alone.
We prove some essential properties about the interaction between the separation algebra operations and the memory operations. These properties have been used in the soundness proof of the separation logic in the author's PhD thesis [33]. Memory trees and memories can be generalized to contain elements of an arbitrary separation algebra as leaves instead of just permission annotated bits [32]. These generalized memories form a functor that lifts the separation algebra structure on the leaves to entire trees. We have taken this approach in the Coq development, but for brevity's sake, we have refrained from doing so in this paper.

Formalization in Coq
Real-world programming languages have a large number of features that require large formal descriptions. As this paper has shown, the C programming language is no different in this regard. On top of that, the semantics of C is very subtle due to an abundance of delicate corner cases. Designing a semantics for C, and proving properties about such a semantics, therefore inevitably requires computer support.
For these reasons, we have used Coq [15] to formalize all results in this paper. Although Coq does not guarantee the absence of mistakes in our definitions, it provides a rigorous set of checks, for example through type checking of our definitions. On top of that, we have used Coq to prove all metatheoretical results stated in this paper. Last but not least, using Coq's program extraction facility we have extracted an exploration tool to test our memory model on small example programs [33,37]. Despite our choice to use Coq, we believe that nearly all parts of CH2O could be formalized in any proof assistant based on higher-order logic.

Overloaded Typing Judgments
Type classes are used to overload notations for typing judgments (we have 25 different typing judgments). The class Valid is used for judgments without a type, such as ⊢ Γ and Γ ⊢ m. We use product types to represent judgments with multiple environments, such as Γ, Δ ⊢ m. The notation { }* is used to lift a judgment to lists. The class Typed is used for judgments such as Γ, Δ ⊢ v : τ and Γ, Δ, τ ⊢ e : τ lr.

Implementation-Defined Behavior
Type classes are used to parameterize the whole Coq development by implementation-defined parameters such as integer sizes. For example, Lemma 6.51 looks like: The parameter EnvSpec K is a type class describing an implementation environment with integer ranks K (Definition 4.12). Just as in this paper, the type K of integer ranks is a parameter of the inductive definition of types (see Definition 4.1) and is propagated through all syntax. The definition of the type class EnvSpec is based on the approach of Spitters and van der Weegen [55]. We have a separate class Env containing the operations; it is an implicit parameter of the EnvSpec class and of all lemmas.

Partial Functions
Although many operations in CH2O are partial, we have formalized many such operations as total functions that assign an appropriate default value. We followed the approach presented in Sect. 5.2, where operations are combined with a validity predicate that describes in which cases they may be used. For example, in part (2), m1 ⊥ m2 is the side-condition of m1 ∪ m2, and mem_writable a1 m1 the side-condition of <[a1:=v1]{ }>m1. Alternative approaches include using the option monad or dependent types, but our approach proved more convenient. In particular, since most validity predicates are given by an inductive definition, various proofs could be done by induction on the structure of the validity predicate. The cases one has to consider correspond exactly to the domain of the partial function.
Admissible side-conditions, such as in the above example <[a1:=v1]{ }>m1 ⊥ m2 and mem_writable a1 (m1 ∪ m2), do not have to be stated explicitly and follow from the side-conditions that are already there. By avoiding the need to state admissible side-conditions, we avoid a blow-up in the number of side-conditions of many lemmas. We thus reduce the proof effort needed to use such a lemma.

Automation
The proof style used in the CH2O development combines interactive proofs with automated proofs. In this section we describe some tactics and forms of proof automation used in the development.
Small inversions Coq's inversion tactic has two serious shortcomings on inductively defined predicates with many constructors: it is rather slow, and its control over the names of variables and hypotheses is deficient. Hence, we often used the technique of small inversions by Monin and Shi [43], which improves on both shortcomings.
Solving disjointness We have used Coq's setoid machinery [54] to enable rewriting using the relations ≤⊥ and ≡⊥ (Definition 7.15). Using this machinery, we have implemented a tactic that automatically solves entailments of the form: where x and x i (for i < n) are arbitrary Coq expressions built from ∅ and ∪. This tactic works roughly as follows:
1. Simplify hypotheses using Theorem 7.18.
2. Solve side-conditions by simplification using Theorem 7.18 and a solver for list containment (implemented by reflection).
3. Repeat these steps until no further simplification is possible.
4. Finally, solve the goal by simplification using Theorem 7.18 and list containment.
This tactic is not implemented using reflection; doing so is future work that should improve its performance.
First-order logic Many side-conditions we have encountered involve simple entailments of first-order logic, such as distributing logical quantifiers combined with some propositional reasoning. Coq does not provide a solver for first-order logic apart from the firstorder tactic, whose performance is insufficient even on small goals.
We have used Ltac to implement an ad-hoc solver called naive_solver, which performs a simple breadth-first proof search. Although this tactic is inherently incomplete and suffers from some limitations, it has turned out to be sufficient to solve many uninteresting side-conditions (without the need for classical axioms).

Overview of the Coq Development
The Coq development of the memory model, which is entirely constructive and axiom free, consists of the following parts:

Related Work
The idea of using a memory model based on trees instead of arrays of plain bits, and the idea of using pointers based on paths instead of offsets, has already been used for object oriented languages. It goes back at least to Rossie and Friedman [51], and has been used by Ramananandro et al. [48] for C++. Furthermore, many researchers have considered connections between unstructured and structured views of data in C [2,14,21,56] in the context of program logics. However, a memory model that combines an abstract tree based structure with low-level object representations in terms of bytes has not been explored before. In this section we will describe other formalizations of the C memory model. Norrish (1998) Norrish has formalized a significant fragment of the C89 standard using the proof assistant HOL4 [44,45]. He was the first to describe non-determinism and sequence points formally. Our treatment of these features has partly been based on his work. Norrish's formalization of the C type system has some similarities with our type system: he has also omitted features that can be desugared and has proven type preservation.
Contrary to our work, Norrish used an unstructured memory model based on sequences of bytes. Since he considered the C89 standard, in which effective types (and similar notions) had not yet been introduced, this choice is appropriate. For C99 and beyond, a more detailed memory model like ours is needed; see also Sect. 3 and Defect Reports #260 and #451 [26].
Another interesting difference is that Norrish represents abstract values (integers, pointers and structs) as sequences of bytes instead of mathematical values. Due to this, padding bytes retain their value while structs are copied, which is not faithful to the C99 standard and beyond. Leroy et al. [38,39] have formalized a significant part of C in Coq. Their fragment of C, which is called CompCert C, covers most major features of C and can be compiled into assembly (PowerPC, ARM and x86) using a compiler written in Coq. Their compiler, called CompCert, has been proven correct with respect to the CompCert C and assembly semantics.
The goal of CompCert is essentially different from that of CH2O: what can be proven with respect to the CompCert semantics does not have to hold for any C11 compiler, it just has to hold for the CompCert compiler. The CompCert semantics is therefore allowed to restrict implementation-defined behaviors to be very specific (for example, it uses 32-bit ints since it targets only 32-bit computing architectures) and to give a defined semantics to various undefined behaviors (such as sequence point violations, violations of effective types, and certain uses of dangling pointers). The CompCert memory model is used by all languages (from C down to assembly) of the CompCert compiler [40,41]. The CompCert memory is a finite partial function from object identifiers to objects. Each local, global and static variable, and each invocation of malloc, is associated with a unique object identifier of a separate object in memory. We have used the same approach in CH2O, but there are some important differences. The paragraphs below discuss the relation of CH2O to the first and second versions of the CompCert memory model.
Leroy and Blazy (2008) In the first version of the CompCert memory model [41], objects were represented as arrays of type-annotated fragments of base values. Examples of bytes are thus "the 2nd byte of the short 13" or "the 3rd byte of the pointer (o, i)". Pointers were represented as pairs (o, i) where o is an object identifier and i the byte offset into the object o.
Since bytes are annotated with types and can only be retrieved from memory using an expression of matching type, effective types on the level of base types are implicitly described. However, this does not match the C11 standard. For example, Leroy and Blazy do assign the return value 11 to the following program:

  struct S1 { int x; };
  struct S2 { int y; };

  int f(struct S1 *p, struct S2 *q) {
    p->x = 10; q->y = 11; return p->x;
  }

  int main() {
    union U { struct S1 s1; struct S2 s2; } u;
    printf("%d\n", f(&u.s1, &u.s2));
  }

This code strongly resembles example [27, 6.5.2.3p9] from the C11 standard, which is stated to have undefined behavior. GCC and Clang optimize this code to print 10, which differs from the value assigned by Leroy and Blazy.
Apart from assigning too much defined behavior, Leroy and Blazy's treatment of effective types also prohibits any form of "bit twiddling".
Leroy and Blazy have introduced the notion of memory injections in [41]. This notion allows one to reason about memory transformations in an elegant way. Our notion of memory refinements (Sect. 7.2) generalizes the approach of Leroy and Blazy to a tree-based memory model.

Leroy et al. (2012)
The second version of the CompCert memory model [40] is entirely untyped and is extended with permissions. Symbolic bytes are only used for pointer values and indeterminate storage, whereas integer and floating point values are represented as numerical bytes (integers between 0 and 2^8 − 1).
We have extended this approach by analogy to bit-representations, representing indeterminate storage and pointer values using symbolic bits, and integer values using concrete bits. This choice is detailed in Sect. 6.2.
As an extension of CompCert, Robert and Leroy have formally proven soundness of an alias analysis [50]. Their alias analysis is untyped and operates on the RTL intermediate language of CompCert.
Beringer et al. [7] have developed an extension of CompCert's memory injections to reason about program transformations in the case of separate compilation. The issues of separate compilation are orthogonal to those that we consider.

Appel et al. (2014)
The Verified Software Toolchain (VST) by Appel et al. provides a higher-order separation logic for Verifiable C, which is a variant of CompCert's intermediate language Clight [3].
The VST is intended to be used together with the CompCert compiler, in which case it gives very strong guarantees: the soundness proof of the VST in conjunction with the correctness proof of the CompCert compiler ensures that the proven properties also hold for the generated assembly.
In case the verified program is compiled with a compiler other than CompCert, trust in the program is still increased, but no full guarantees can be given. This is because CompCert's intermediate language Clight uses a specific evaluation order and assigns defined behavior to many behaviors that the C11 standard leaves undefined. For example, Clight assigns defined behavior to violations of effective types and to sequence point violations. The VST inherits these defined behaviors from CompCert and allows one to use them in proofs.
Since the VST is linked to CompCert, it uses CompCert's coarse permission system on the level of the operational semantics. Stewart and Appel [3, Chapter 42] have introduced a way to use a more fine-grained permission system at the level of the separation logic without having to modify the Clight operational semantics. Their approach shows its merits when used for concurrency, in which case the memory model contains ghost data related to the conditions of locks [23,24].
Besson et al. (2014) In the memory model of Besson et al. [8], objects consist of lazily evaluated values described by symbolic expressions. These symbolic expressions are used to delay the evaluation of operations on uninitialized memory and pointer values. Only when a concrete value is needed (for example, in the case of the controlling expression of an if-then-else, for, or while statement) is the symbolic expression normalized. Consider:

  int x, *p = &x;
  int y = ((unsigned char*)p)[1] | 1;  // y has symbolic value "2nd pointer byte of p" | 1
  if (y & 1) printf("one\n");          // unique normalization -> OK
  if (y & 2) printf("two\n");          // no unique normalization -> bad

The value of ((unsigned char*)p)[1] | 1 is not evaluated eagerly. Instead, the assignment to y stores a symbolic expression denoting this value. During the execution of the first if statement, the actual value of y & 1 is needed. In this case, y & 1 has the value 1 for any possible numerical value of ((unsigned char*)p)[1]. As a result, the string one is printed.
The semantics of Besson et al. is deterministic by definition. Normalization of a symbolic expression has defined behavior if and only if the expression normalizes to a unique value under any choice of numerical values for pointer representations and uninitialized storage. In the second if statement this is not the case.
The approach of Besson et al. gives a semantics to some programming techniques that rely on the numerical representations of pointers and uninitialized memory. For example, it gives an appropriate semantics to pointer tagging in which unused bits of a pointer representation are used to store additional information.
However, as already observed by Kang et al. [28], Besson et al. do not give a semantics to many other useful cases. For example, printing the object representation of a struct, or computing the hash of a pointer value, is inherently non-deterministic. The approach of Besson et al. assigns undefined behavior to these use cases.
The goal of Besson et al. is inherently different from ours. Our goal is to describe the C11 standard faithfully, whereas Besson et al. focus on de facto versions of C. They intentionally assign defined behavior to many constructs involving uninitialized memory that are clearly undefined according to the C11 standard, but that are nonetheless faithfully compiled by specific compilers.

Ellison and Roşu (2012) Ellison and Roşu [18,19] have developed an executable semantics of the C11 standard using the K-framework. Their semantics is very comprehensive and describes all features of a freestanding C implementation [27, 4p6], including some parts of the standard library. It has furthermore been thoroughly tested against test suites (such as the GCC torture test suite), and has been used as an oracle for compiler testing [49].
Ellison and Roşu support more C features than we do, but they do not have infrastructure for formal proofs, and thus have not established any metatheoretical properties about their semantics. Their semantics, despite being written in a formal framework, should rather be seen as a debugger, a state space search tool, or possibly a model checker. It is unlikely to be of practical use in proof assistants because it is defined on top of a large C abstract syntax and uses a rather ad-hoc execution state that contains over 90 components.
Similar to our work, Ellison and Roşu's goal is to describe the C11 standard exactly. However, for some programs their semantics is less precise than ours, mainly because their memory model is less principled. Their memory model is based on CompCert's: it is essentially a finite map of objects consisting of unstructured arrays of bytes.

Hathhorn et al. (2015)
Hathhorn et al. [22] have extended the work of Ellison and Roşu to handle more underspecification of C11. Most importantly, the memory model has been extended and support for the type qualifiers const, restrict and volatile has been added.
Hathhorn et al. have extended the original memory model (which was based on CompCert's) with decorations to handle effective types, restrictions on padding, and the restrict qualifier. Effective types are modeled by a map that associates a type with each object. Their approach is less fine-grained than ours and is unable to account for active variants of unions. It thus does not assign undefined behavior to important violations of effective types and in turn does not allow compilers to perform optimizations based on type-based alias analysis. For example:

  // Undefined behavior in case f is called with aliased
  // pointers due to effective types
  int f(short *p, int *q) { *p = 10; *q = 11; return *p; }

  int main() {
    union { short x; int y; } u = { .y = 0 };
    return f(&u.x, &u.y);
  }

The above program has undefined behavior due to a violation of effective types. This is captured by our tree-based memory model, but Hathhorn et al. require the program to return the value 11. When compiled with GCC or Clang at optimization level -O2, the compiled program returns the value 10.
Hathhorn et al. handle restrictions on padding bytes in the case of unions, but not in the case of structs. For example, one of their example programs returns the value 1 according to their semantics, whereas it has unspecified behavior according to the C11 standard [27, 6.2.6.1p6] (see also Sect. 3). The restrictions on padding bytes are implicit in our memory model based on structured trees, and are thus handled correctly. The above examples provide evidence that a structured approach, especially combined with metatheoretical results, is more reliable than depending on ad-hoc decorations.

Kang et al. (2015)
Kang et al. [28] have proposed a memory model that gives a semantics to pointer to integer casts. Their memory model uses a combination of numerical and symbolic representations of pointer values (whereas CompCert and CH2O always represent pointer values symbolically). Initially each pointer is represented symbolically, but whenever the numerical representation of a pointer is needed (due to a pointer to integer cast), it is nondeterministically realized.
The memory model of Kang et al. gives a semantics to pointer to integer casts while allowing common compiler optimizations that are invalid in a naive low-level memory model. In their motivating example, a function f with a local variable a calls an unknown function g. In a concrete memory model, there is the possibility that g is able to guess the numerical representation of &a, and thereby access or even modify a. This is undesirable, because it prevents the widely used optimization of constant propagation, which would optimize the variable a out.
In the CompCert and CH2O memory models, where pointers are represented symbolically, it is guaranteed that f has exclusive control over a: since &a has not been leaked, it is impossible for g to access a. In the memory model of Kang et al., a pointer is only given a numerical representation when it is cast to an integer. In the example, no such casts appear, and g cannot access a.
The goal of Kang et al. is to give an unambiguous mathematical model for pointer to integer casts, but not necessarily to comply with C11 or existing compilers. Although we think that their model is a reasonable choice, it is unclear whether it is faithful to the C11 standard in the context of Defect Report #260 [26]. Consider:

  int x = 0, *p = 0;
  for (uintptr_t i = 0; ; i++) {
    if (i == (uintptr_t)&x) { p = (int*)i; break; }
  }
  *p = 15;
  printf("%d\n", x);

Here we loop through the range of integers of type uintptr_t until we have found the integer representation i of &x, which we then assign to the pointer p.
When compiled with gcc -O2 (version 4.9.2), the generated assembly no longer contains a loop, and the pointers p and &x are assumed not to alias. As a result, the program prints the old value of x, namely 0. In the memory model of Kang et al., the pointer obtained via the cast (int*)i is exactly the same as &x. In their model the program thus has defined behavior and is required to print 15.
We have reported this issue to the GCC bug tracker, but it is unclear whether the GCC developers consider it a bug. Some developers seem to believe that this program has undefined behavior and that GCC's optimizations are thus justified. Note that the cast (intptr_t)&x is already forbidden by the type system of CH2O.

Conclusion
In this paper we have given a formal description of a significant part of the non-concurrent C11 memory model. This formal description has been used in [33,37] as part of an operational, executable and axiomatic semantics of C. On top of this formal description, we have provided a comprehensive collection of metatheoretical results. All of these results have been formalized using the Coq proof assistant.
It would be interesting to investigate whether our memory model can be used to help the standards committee improve future versions of the standard, for example its prose description of effective types. As indicated in Sect. 1, the standard's description is not only ambiguous, but also does not cover its intent to enable type-based alias analysis. The description of our memory model is unambiguous and allows one to express intended consequences formally. We have formally proven soundness of an abstract version of type-based alias analysis with respect to our memory model (Theorem 7.2).
An obvious direction for future work is to extend the memory model with additional features. We give an overview of some features of C11 that are absent.
- Floating point arithmetic One could restrict to IEEE-754 floating point arithmetic, which has a clear specification [25] and a comprehensive formalization in Coq [10]. Boldo et al. have taken this approach in the context of CompCert [9] and we see no fundamental problems applying it to CH2O as well. Alternatively, one could consider formalizing all implementation-defined aspects of the description of floating point arithmetic in the C11 standard.
- Bitfields Bitfields are fields of struct types that occupy individual bits [27, 6.7.2.1p9]. We do not foresee fundamental problems adding bitfields to CH2O, as bits already constitute the smallest unit of storage in our memory model.
- Untyped malloc CH2O supports dynamic memory allocation via an operator alloc τ e close to C++'s new operator. The alloc τ e operator yields a τ* pointer to storage for a τ-array of length e. This is different from C's malloc function, which yields a void* pointer to storage of unknown type [27, 7.22.3.4]. Dynamic memory allocation via the untyped malloc function is closely related to unions and effective types: only when dynamically allocated storage is actually used does it receive an effective type. We expect one could treat malloced objects as unions that range over all possible types that fit.
- Restrict qualifiers The restrict qualifier can be applied to any pointer type to express that the pointers do not alias. Since the description in the C11 standard [27, 6.7.3.1] is ambiguous (most notably, it is unclear how it interacts with nested pointers and data types), formalization and metatheoretical proofs may provide prospects for clarification.
- Volatile qualifiers The volatile qualifier can be applied to any type to indicate that its value may be changed by an external process. It is meant to prevent compilers from optimizing away data accesses or reordering them [27, footnote 134]. Volatile accesses should thus be considered a form of I/O.
- Concurrency and atomics Shared-memory concurrency and atomic operations are the main omission from the C11 standard in the CH2O semantics. Although shared-memory concurrency is a relatively new addition to the C and C++ standards, there is already a large body of ongoing work in this direction, see for example [4,5,52,53,57]. These works have led to improvements of the standard text. There are still important open problems in the area of concurrent memory models even for small sublanguages of C [4]. Current memory models for these sublanguages involve just features specific to threads and atomic operations, whereas we have focused on structs, unions, effective types and indeterminate memory. We hope that both directions are largely orthogonal and will eventually merge into a fully fledged C11 memory model and semantics.