## Abstract

The aliasing question (can two reference expressions point, during an execution, to the same object?) is both one of the most critical in practice, for applications ranging from compiler optimization to program verification, and one of the most heavily researched, with many hundreds of publications over several decades. One might then expect that good off-the-shelf solutions are widely available, ready to be plugged into a compiler or verifier. This is not the case. In practice, efficient and precise alias analysis remains an open problem. We present a practical tool, AutoAlias, which can be used to perform automatic alias analysis for object-oriented programs. Based on the theory of “duality semantics”, an application of Abstract Interpretation ideas, it is directed at object-oriented languages and has been implemented for Eiffel as an addition to the EiffelStudio environment. It offers variable-precision analysis, controllable through the choice of a constant that governs the number of fixpoint iterations: a higher number means better precision and higher computation time. All the source code of AutoAlias, as well as detailed results of analyses reported in this article, is publicly available. Practical applications so far have covered a library of data structures and algorithms and a library for GUI creation. For the former, AutoAlias achieves a precision appropriate for practical purposes and execution times on the order of 25 s for about 8000 lines of intricate code. For the GUI library, AutoAlias produces the alias analysis in around 232 s for about 150,000 lines of intricate code.

## Introduction

One of the most interesting questions that can be asked about a program is the aliasing question: can two given path expressions, say first_element.next.next and last_element.previous.previous.previous, denote the same object in the run-time object structure? Alias analysis can be a key step in many applications, from compiler optimization to verification of object-oriented programs and even deadlock analysis [13].

Alias analysis has correspondingly produced an abundant literature, of which the “Related Work” section cites a small part. The contributions of AutoAlias, the approach described in the present work, are: a context- and flow-sensitive alias analysis technique applicable to object-oriented languages, based on a general theory of object structures; good precision, matching or exceeding the results of previous authors; an efficient implementation for object-oriented programs, currently available for Eiffel [9].

One of the applications of alias analysis, presented in Refs. [6, 11, 12], is change analysis, also known as “frame inference”: what properties can an operation change? The reason alias analysis plays a key role for frame inference in object-oriented languages is that the basic property-changing operation, assignment x := e, changes not only x and any path expression starting with x, such as x.a, x.a.b etc., but also y.x, y.x.a and so on for any y that is aliased to the current object (“this”). Expanding on the original work in Refs. [6, 11, 12], we have implemented automatic frame analysis in the AutoFrame tool, based on AutoAlias. The frame analysis effort is the topic of a companion paper [18].

The basic practical results are as follows, with detail in “AutoAlias: A Graph-Based Implementation for the Alias Calculus”. We applied AutoAlias to a library of data structures and algorithms, EiffelBase 2, of about 8000 lines of code and 45 classes, and a significantly larger (150K-LOC) graphical (GUI) library, EiffelVision. For EiffelBase 2, to obtain a precision appropriate for practical purposes, AutoAlias takes about 25 s. For EiffelVision, it takes a little less than 4 min. In both cases, the results permit detailed alias and change analysis.

The entire source code of AutoAlias is available in a public repository at [16]. The repository also contains detailed results of analyses performed in AutoAlias and reported in this article.

Some elements of this article, particularly in “The Mathematical Basis: Object Diagrams” and “The Alias Calculus” sections, will at first sight appear similar to the corresponding presentations in the earlier work cited above. One of the reasons is simply to make the presentation self-contained rather than requiring the reader to go to the earlier work. More fundamentally, however, the similarity of form should not mask the fundamental differences. The mathematical model has been profoundly refined, and the implementation is completely new. The previous work is best viewed as a prototype for the present version.

The “Related Work” section presents the previous work. “The Mathematical Basis: Object Diagrams” and “The Alias Calculus” sections show the mathematical basis and theory on which AutoAlias relies. The “AutoAlias: A Graph-Based Implementation for the Alias Calculus” section presents the implementation of AutoAlias, its evaluation, and its results. The “Future Work and Conclusion” section summarizes the work.

## Related Work

The work presented here is a continuation of the original work in Refs. [6, 11, 12]. The main difference is that we present a graph-based approach to alias analysis, whereas the previous works used a relation-based approach. The immediate advantage is in performance.

There is a considerable literature on alias analysis, in particular for compiler optimization. We only consider work that is directly comparable to the present approach. For the overall problem of alias analysis in its full generality, good surveys exist, in particular two recent ones: [21, 23]. AutoAlias belongs to the fairly rarefied class of approaches that are (according to the standard terminology in the field, discussed in these surveys) both:

- *Context-sensitive*, meaning that it differentiates between executions of a given instruction in different contexts. In particular, AutoAlias is *call-site-sensitive*, meaning that it does not coalesce the effects of different calls to the same routine, such as f (a1) and f (a2), where the routine f (a) performs b := a, and a context-insensitive analysis could deduce that this may alias both a1 and a2 to b and hence (wrongly) to each other.
- *Flow-sensitive*, meaning that it accounts for control flow: in **if** c **then** a := x **else** b := x **end**, the standard flow-insensitive analysis would report that a and b can get aliased to x, but flow-sensitive analysis reports that exactly one of them will.

Both these requirements place a much higher demand on the analysis technique.

Andersen [1] presents an efficient, inter-procedural pointer analysis for the C programming language. The analysis approximates, for every variable of pointer type, the set of objects it may point to during program execution. This approach addresses C or languages of that level; the present work has been applied to a full-fledged object-oriented language. In an O–O context, some of the instructions become unnecessary. In particular, there is no notion of plain pointers.

A specialization of context sensitivity is call-site sensitivity. References [19, 20] pioneered the use of call sites as context. Whenever a routine gets called, the context under which the called method gets analyzed is a sequence of call sites. Other specializations of context sensitivity are object sensitivity [15] and type sensitivity [22]. These approaches use object abstractions or type information as contexts. Specifically, the analysis qualifies a routine’s local variables with the allocation site of the receiver object of the method call. AutoAlias follows the same spirit; however, it also uses a flow-sensitive approach, allowing a better precision of the analysis. An example of flow-sensitive analysis is Ref. [5], but it too introduces imprecision, in particular in handling assignments.

## The Mathematical Basis: Object Diagrams

An object diagram is a graph, where nodes represent possible objects at execution time and edges represent reference variables. Let *N* be an enumerable set of potential nodes and *T* a set of names. An object diagram \(D \langle N, T, O, R, S \rangle\) is defined by

\(O \subseteq N\) Set of objects.

\(R \subseteq O\) Set of roots.

\(S: T \rightarrow O \rightarrow \mathbb {P}(O)\) Successors (references),

where *R* is the set of root nodes: in O-O computations, every operation is applied to a specific object (commonly known as *Current*, *this* or *self*). As an example, \(\texttt {x} >\mathtt {10}\) states a property of a variable that belongs to the current object.

**Definition 1**

An alias diagram is an object diagram *G*, such that

- *O* is finite;
- \(\mid R\mid \ge 1\), that is, there exists at least one root.

A path of edges \(a,b,\ldots\) on an alias diagram is associated with the expression \(a.b.\ldots\) in O–O. For a path expression \(e = a.b.\ldots\), \(e_G\) is the (possibly empty) set of end nodes of paths with edges \(a, b, \ldots\) from any root in *G*.

An empty path is represented by **Current** (**Current** represents the current object in O–O computations, also known as “this” or “self”). A single-element path is written as *a*, two or more elements as \(a.b.c\ldots\). We let “.” also represent concatenation, e.g., if *p* and *q* are paths, then *p*.*q*, *a*.*q*, and *p*.*a* are also paths (their path concatenations). Both **Current**.p and p.**Current** mean p.

**Definition 2**

*E* is the set of expressions appearing in the program, together with their prefixes (the set of all paths in *G*).

**Definition 3**

For any path *p* in *G*, \(compl (p) \subseteq E\) is the set of completion paths of *p*:

The semantics of paths is defined by the value set *V*(*p*). The value set \(V(p) \subseteq O\) of a path p is the set of nodes reachable from a root through *p*. In other words:

\(V(\mathbf{Current}) = R\).

\(V(p.a) = S(a)(V (p))\).

**Definition 4**

For any path *p* in *G*, the set \(alias_G (p) \subseteq E\) is the set of all paths that are aliased to *p* in *G*:

The “Examples” section illustrates these definitions, as well as the operations described in the next section.

### Operations on Alias Diagrams

This section describes a set of operations on an alias diagram *G*. Operations assume \(X \subseteq O\), \(t \in T\), and lists of the same size \(l_1\), \(l_2\) of expressions. All the operations are implicitly subscripted by the name of the diagram, e.g., **link** is really \(\mathbf{link}_G\); the subscript will be omitted in the absence of ambiguity.

#### (Un)Linking Nodes

The set of operations shown in Table 1 is used to compute the alias diagram when analyzing, for example, the most basic instruction for aliasing: assignment (see “The Alias Calculus” section). The effect on *G* of an assignment is to link and unlink some of its edges.

The last operation is particularly useful to link actual arguments to formal arguments in a feature call (for more details, see rule AC-UQCall in “The Alias Calculus” section).

#### Rooting Nodes

The operation shown in Table 2 is used to compute the effect on *G* when analyzing qualified feature calls (for more details, see “The Alias Calculus” section, rule AC-QCall). In O–O computations, calls are applied to a specific object. For instance, a call to the feature set_x (3) is applied to the current object (also known as *this* or *self*) and its effect might change the state of that object. A call to the qualified feature y.set_x (3) might change the state of the object attached to *y*. When a qualified call occurs, it is necessary to change the root of *G*, so that the effect of executing the feature modifies the corresponding object.

This operation (along with dot distribution—see “Generalization of Dot Distribution over Alias Diagrams” section) allows the analysis to be call-site sensitive.

#### Including Nodes

The operation shown in Table 3 is used to compute the effect on *G* when analyzing a creation instruction in the code (for more details, see “The Alias Calculus” section, rule AC-New). The operation adds a new object that does not currently exist in *G*.

#### New Alias Diagrams

The operations shown in Table 4 are used to compute the effect on *G* when analyzing a conditional or loop instruction in the code.

Rules AC-Cond and AC-Loop use these operations to create new alias diagrams for each branch (either in conditionals or loops). This allows the analysis to be flow-sensitive when analyzing conditionals and loops (see “The Alias Calculus” section for more details).

### Generalization of Dot Distribution over Alias Diagrams

The basic mechanism of object-oriented computations is the feature call. All computations are achieved by calling certain features on certain objects. Consider x.f: this call means *apply feature* f *to the object attached to* x. Alias diagrams are built upon this mechanism. The authors of [14] introduce the notion of “distributed dot”, which distributes the period of O–O programming over a list, a set or a relation; for example, \(x\bullet [u,v,w]\) denotes the list [x.u,x.v,x.w]. We extend the mechanism to dot distribution over alias diagrams.

**Definition 5**

For an alias diagram G, \(x\bullet G\) adds a back-pointer \(x'\) from *x* to each element in *R*, the roots of *G*. In other words, the effect of \(x\bullet G\) is \(S \cup \{(x', o, R) \mid o \in V(x)\}\).

Figure 1 depicts the effect of performing dot distribution over a graph.

This rule enables the analysis of alias diagrams to transpose the context of a call to the context of the caller, since it may depend and act on values and properties that are set by the object that launched the current call.

### Examples

Figure 2 shows the graphical representation of possible alias diagrams. The set of objects for *G* (see Fig. 2b) is \(O_G = \{\underline{n_0}, n_1, n_2\}\) and the successors are \(S_G = \{(a, \underline{n_0}, \{n_1\}), (d, \underline{n_0}, \{n_1\}), (c, \underline{n_0}, \{n_2\}), (b, n_1, \{n_2\})\}\). Alias diagrams have at least one root; here \(R_G = \{n_0\}\) (we use underlined nodes to graphically represent the set of roots).

The set of all expressions in the graph is \(E_G = \{a, b, c, d, a.b, d.b\}\). This set is used to obtain the completions of an expression. Completion is not particularly interesting for aliasing; however, it is an important notion for the framing problem: the problem of inferring all program locations that might change. As an example, the set of completion paths of *d* is \(compl_G (d) = \{d.b\}\) and that of *a*.*b* is \(compl_G (a.b) = \emptyset\). Set *E* is also used to compute aliasing, see Definition 4. As an example, the expressions aliased to *c* are \(alias_G (c) = \{c, a.b, d.b\}\).
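The values quoted in this example can be checked mechanically. The following Python sketch encodes the diagram *G* of Fig. 2b as (name, source, target) triples and evaluates value sets, completions, and aliases. The encoding and the function names are illustrative only, and the two set definitions are assumptions chosen to agree with the values stated above: a completion of *p* is taken to be a member of *E* that properly extends *p*, and a path is taken to be aliased to *p* when its value set shares at least one node with \(V(p)\).

```python
# Hypothetical encoding of the diagram G of Fig. 2b as (name, source, target) triples.
ROOTS = {"n0"}
EDGES = {("a", "n0", "n1"), ("d", "n0", "n1"), ("c", "n0", "n2"), ("b", "n1", "n2")}
# E_G exactly as listed in the text; paths are dot-separated strings.
E = ["a", "b", "c", "d", "a.b", "d.b"]


def value(path):
    """V(p): nodes reached from the roots by following the names of p in order."""
    nodes = set(ROOTS)  # V(Current) = R
    for name in path.split("."):
        # V(p.a) = S(a)(V(p)): follow every edge labelled `name` out of `nodes`.
        nodes = {t for (n, s, t) in EDGES if n == name and s in nodes}
    return nodes


def compl(path):
    """Assumed reading: the members of E that properly extend `path`."""
    return {q for q in E if q.startswith(path + ".")}


def alias(path):
    """Assumed reading: the members of E whose value set meets V(path)."""
    return {q for q in E if value(q) & value(path)}


print(sorted(compl("d")))    # ['d.b']
print(sorted(compl("a.b")))  # []
print(sorted(alias("c")))    # ['a.b', 'c', 'd.b']
```

Note that *b* belongs to *E* but has an empty value set here, since no edge labelled *b* leaves the root; under the assumed reading it is therefore aliased to nothing.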

Table 5 shows the application of the different operations defined in the “Operations on Alias Diagrams” section on the alias diagrams *G* and \(G_1\) depicted in Fig. 2.

## The Alias Calculus

The alias calculus is a set of rules defining the effect of executing an instruction on the aliasing that may exist between expressions. Each of these rules gives, for an instruction *p* of a given kind and an alias diagram *G* that holds in the initial state, the value of \(G \gg p\), the alias diagram that holds after the execution of *p*.

### The Programming Language

The programming language figuring in the rules of the alias calculus given below is a common-core subset of modern object-oriented languages, including the fundamental constructs found, with varying syntax and other details, in Java, C#, Eiffel, C++, and others: respectively, reference assignment, composition (sequencing), object creation (**new**), conditional, loop, unqualified call, and qualified call. As a result, the present work applies to any O–O language, with possible fine-tuning to account for individual differences, and so potentially does AutoAlias, although so far, we have applied it to Eiffel only.

Following the earlier work, the programming language does not have a real conditional instruction **if** c **then** p **else** q **end**, but only a non-deterministic choice written **then** p **else** q **end**, which executes either p or q. The loop construct similarly does not list a condition: **loop** p **end** executes p any number of times including zero. Ignoring conditions causes a potential loss of precision; as a trivial example, ignoring the condition in **if** n \(> \texttt {n} + 1\) **then** a := b **else** a := c **end** leads to concluding wrongly (that is to say, soundly but with a loss of precision) that a may become aliased to b. The alias calculus only knows about the object diagram and its reference structure; other properties, such as arithmetic properties in this example, are beyond its reach. Unlike the previous version of this work, however, AutoAlias can now deal with a limited set of properties which indeed pertain to the object structure. For that reason, the language now includes a construct

**if** c: p

with the semantics of doing nothing (“skip”) if it is known for sure that c does not hold, and otherwise (that is to say, if the analysis can deduce that c holds, or cannot draw a conclusion) of executing p. Therefore, the standard conditional instruction of programming languages can be handled in the alias calculus as **then** (**if** c: p) **else** (**if** \(\lnot\) c: q) **end**. At present, AutoAlias has semantics for simple conditions on references such as e = f (equality) and e /= f (inequality) for path expressions (\(a.b.c\ldots\)) *e* and *f*.

### The Calculus

Rules of the calculus are shown in Table 6. The table shows the name of the rule, the rule \(G \gg p\), where *G* is an alias diagram and *p* is an instruction, and its semantics, the effect of executing *p* on the aliasing that may exist between expressions.

#### Assignments

Rule AC-Assg deals with assignments: the main instruction that creates aliasing. Figure 3 shows an example of how the rule is applied to the instruction a := b on the alias diagram *G* in Fig. 3a. The semantics for \(G \gg\) (a := b) is **relink** \(a:V_G(b)\), which is shorthand for applying **unlink** *a* (Fig. 3b) then **link** \(a:V_G(b)\) (Fig. 3c).
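As a rough illustration of this rule, the Python sketch below implements an assignment as an unlink followed by a link on a diagram stored as (name, source, target) triples. The representation, the function names, and the two-variable diagram are assumptions made for the example (they are not the diagram of Fig. 3), and the target of the assignment is restricted to a single name.

```python
def value(roots, edges, path):
    """V(path): nodes reached from `roots` by following the names of the path."""
    nodes = set(roots)
    for name in path.split("."):
        nodes = {t for (n, s, t) in edges if n == name and s in nodes}
    return nodes


def unlink(roots, edges, name):
    """Remove every edge labelled `name` that leaves a root."""
    return {(n, s, t) for (n, s, t) in edges if not (n == name and s in roots)}


def link(roots, edges, name, targets):
    """Add an edge labelled `name` from every root to every node in `targets`."""
    return edges | {(name, r, t) for r in roots for t in targets}


def assign(roots, edges, target, source):
    """G >> (target := source), read as relink target : V_G(source)."""
    new_targets = value(roots, edges, source)  # evaluated in G, before unlinking
    return link(roots, unlink(roots, edges, target), target, new_targets)


# Hypothetical starting diagram: a and b attached to distinct objects.
roots = {"n0"}
edges = {("a", "n0", "n1"), ("b", "n0", "n2")}
after = assign(roots, edges, "a", "b")
print(sorted(after))  # [('a', 'n0', 'n2'), ('b', 'n0', 'n2')]
```

Evaluating \(V_G(b)\) before unlinking matters for assignments such as a := a.next, where the source path starts with the variable being reassigned.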

#### Composition

Rule AC-Comp deals with a compound of instructions (e.g., the sequence of instructions in a routine). Figure 4 shows an example of how the rule is applied to the instruction a := x; b := x on the alias diagram *G* in Fig. 4a. The semantics for \(G \gg\) (a := x; b := x) is \((G \gg\) a := x\() \gg\) b := x: the second assignment is analyzed on the diagram produced by the first.

#### Creation

Rule AC-New deals with the creation of new objects. Figure 5 shows an example of how the rule is applied to the instruction **create** x (also known as x = **new** T(); in some programming languages) on the alias diagram *G* in Fig. 5a. The semantics for \(G \gg\) (**create** x) is **include** \(n_4\) (Fig. 5b, where \(n_4\) is simply a new node of *G*) then **relink** \(x:\{n_4\}\) (Fig. 5c).

#### Conditionals

Rule AC-Cond deals with conditionals. The rule does not take the condition into consideration; rather, it treats the instruction as a non-deterministic choice. The rule assumes the command-query separation principle [10]: asking a question should not change the answer. In other words, the rule assumes that functions being called in the condition are pure. Figure 6 shows an example of how the rule is applied to the instruction **then** a := x **else** b := x **end** on the alias diagram *G* in Fig. 6a. The semantics for \(G \gg\) (**then** a := x **else** b := x **end**) is

The semantics of rule AC-Cond is sound, but adds imprecision. It also negatively affects the performance of the computation. Some improvements to the rule can be introduced: each branch of a conditional changes only a small part of the diagram; thus, there is no need to clone the common parts, and the **clone** operation can be changed to clone only the source node. Figure 7 depicts the result of the union of the alias diagrams in Fig. 6b, d when applying this optimization of operation **clone**.

The AC-Cond rule (and its optimization) is sound. The example shown in Fig. 6 (or Fig. 7) elucidates the *flow-sensitive* approach of the analysis: the resulting alias diagram in Fig. 6e (or Fig. 7) reports that either *x* may be aliased to *a* or may be aliased to *b* but not both. Furthermore, the diagrams also report that *a* may not be aliased to *b* as a result of executing the instruction.

#### Loops

**loop** p **end** is the instruction that executes *p* any number of times including none. AC-Loop captures this semantics by unioning the results of applying *p* to *G* *i* times, for every *i*: it can produce *G* (when \(i=0\)), or \(G \gg p\) (when \(i=1\)), or \(((G \gg p) \gg p)\) (when \(i=2\)), or \((((G \gg p) \gg p) \gg p)\) (when \(i=3\)) and so on. Figure 8 shows an example of how the rule is applied to the instruction **loop** l := l.right **end** on the alias diagram *G* in Fig. 8a. Consider a common example, where l is a linked list that contains a reference to its right element. Figure 8b shows the result of applying \(G \gg\) l := l.right and Fig. 8c shows the result of \((G \gg\) l := l.right\() \gg\) l := l.right. Figure 8d shows the final result: \(\bigcup \nolimits _{i \in {\mathbb {N}}}\) (\(G_i \gg\) l := l.right).

Rule AC-Loop introduces imprecision, but retains soundness. An optimization of the rule is to consider the loop condition. In the general case, determining loop termination is undecidable, but there are specific cases that can be decided, e.g., the approach might be able to determine whether two variables v and w are already aliased, as in **until** v = w **loop** p **end** (the same idea can be applied to the condition in rule AC-Cond).

#### Unqualified Calls

In rule AC-UQCall, l and \(f^\bullet\) are the lists of actual and formal arguments of routine f, respectively, and \(\mid f\mid\) is its body. The rule deals with unqualified calls (calls to routines of the Current object). Figure 9 shows an example of how the rule is applied to the instruction set_x (a) on the alias diagram *G* in Fig. 9a. set_x is a routine defined as set_x (v: T) **do** x := v **end**: it receives an argument v of an arbitrary type T and assigns it to variable x. The semantics for \(G \gg\) (**call** set_x (a)) is

The analysis is context-sensitive, meaning that it differentiates between executions of a given instruction in different contexts. In particular, the analysis is call-site-sensitive, meaning that it does not coalesce the effects of different calls to the same routine. Figure 10 depicts the process of performing \(G \gg (\)**call** set_x (a); **call** set_x (b)). A context-insensitive analysis could deduce that this may alias both *a* and *b* to *x* and hence (wrongly) to each other. As shown in Fig. 10c, only *b* is aliased to *x*.

#### Qualified Calls

Rule AC-QCall deals with qualified calls (calls to routines on a different object from Current). Figure 11 shows an example of how the rule is applied to the instruction a.set_x (b) on the alias diagram *G* in Fig. 11a. The semantics of a.set_x (b) is to apply routine set_x to the object attached to a (routine set_x has the same definition as before). The semantics for \(G \gg\) (a. **call** set_x (b)) is

Figure 11b performs dot distribution over *G* (see “Generalization of Dot Distribution over Alias Diagrams” section) and re-roots the graph to *V*(*a*) (the set of nodes reachable from the root through *a*, in this case \(\{n_1\}\)). Figure 11c depicts the resulting alias diagram after applying rule AC-UQCall on the alias diagram in Fig. 11b. Figure 11c shows the importance of shifting the context: operations are applied to *G* with \(n_1\) (the object attached to *a*) as the root. Finally, Fig. 11d re-roots the graph to its initial roots. This operation allows the analysis to support any number of nested calls. Each qualified call will (i) perform dot distribution, allowing the analysis to have access to those variables of the source (e.g., variable b when analyzing the call a.set_x (b)); (ii) shift the context (as in an O–O computation), allowing the analysis to perform operations on the right object; and (iii) shift the roots back, allowing the analysis to continue the normal operation.
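The three steps above can be sketched in Python for the call a.set_x (b). This is an illustrative reading, not the AutoAlias implementation: the triple-based graph encoding, the helper names, the way the formal argument v is bound and later dropped, and the starting diagram (a attached to \(n_1\), b attached to a hypothetical \(n_2\)) are all assumptions.

```python
def value(roots, edges, path):
    """V(path) evaluated from `roots`: follow each name of the path in turn."""
    nodes = set(roots)
    for name in path.split("."):
        nodes = {t for (n, s, t) in edges if n == name and s in nodes}
    return nodes


def relink(roots, edges, name, targets):
    """Drop the `name` edges leaving `roots`, then point `name` at `targets`."""
    kept = {(n, s, t) for (n, s, t) in edges if not (n == name and s in roots)}
    return kept | {(name, r, t) for r in roots for t in targets}


def qualified_set_x(roots, edges, target, actual):
    """Sketch of G >> target.set_x (actual), with set_x (v) do x := v end."""
    back = target + "'"  # name of the back-pointer, e.g. a'
    callee_roots = value(roots, edges, target)
    # (i) Dot distribution: each object in V(target) points back to the caller's roots.
    edges = edges | {(back, o, r) for o in callee_roots for r in roots}
    # (ii) Shift the context to V(target); the actual argument is seen there as a'.b.
    seen = value(callee_roots, edges, back + "." + actual)
    edges = relink(callee_roots, edges, "v", seen)  # bind the formal v
    edges = relink(callee_roots, edges, "x", value(callee_roots, edges, "v"))  # x := v
    # (iii) Drop the formal and the back-pointer; the caller's roots apply again.
    return {(n, s, t) for (n, s, t) in edges
            if not (n in ("v", back) and s in callee_roots)}


roots = {"n0"}
edges = {("a", "n0", "n1"), ("b", "n0", "n2")}
after = qualified_set_x(roots, edges, "a", "b")
# After the call, a.x and b denote the same node when read from the caller's root.
print(value(roots, after, "a.x") == value(roots, after, "b"))
```

Without the back-pointer, the path b would be evaluated from \(n_1\) after the context shift and would yield nothing, which is why dot distribution has to come first.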

## AutoAlias: A Graph-Based Implementation for the Alias Calculus

AutoAlias is a graph-based implementation of the alias calculus. The sources of the tool are available in Ref. [16], and the results can be checked in Ref. [17].

One of the main concerns of a graph-based approach with respect to the relation-based one (the approach of Refs. [6, 11, 12]) is the performance of the computation, especially when dealing with conditionals and loops (including recursion). This can be seen in rules AC-Cond and AC-Loop from “The Alias Calculus” section: both rules perform union operations on graphs. “Handling Conditionals” and “Handling Loops” sections show the techniques used in such cases. “Dynamic Binding” section explains how dynamic binding (together with inheritance and polymorphism), an important property of O–O computations, is handled.

### Handling Conditionals

The non-deterministic choice instruction has the form

**then**

\(branch_1\)

**elseif**

\(branch_2\)

...

**else**

\(branch_{n}\)

**end**

According to the AC-Cond rule, each branch of the conditional (\({ branch}_1\) ...\({ branch}_n\)) is analyzed with a starting alias diagram *G* that holds initially. Then, the resulting diagrams are cloned and unioned. When processing \({ branch}_i\), where \(i \in 1 \ldots n\), the implementation maintains two sets: \(A_i \in T \rightarrow O \rightarrow O\) (insertions of references in the alias diagram, *A* for additions) and \(D_i \in T \rightarrow O \rightarrow O\) (deletions of references, *D* for deletions). Both sets contain triples \(({ name, source, target})\).

At the end of processing branch *i*, the implementation removes all the elements of \(A_i\) and adds all the elements of \(D_i\) to the alias graph (so as to get back to the starting state).
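A minimal Python sketch of this bookkeeping follows. It processes each branch in place while logging the triples it inserts and removes, then undoes the branch by removing its additions and restoring its deletions. The function names, the triple encoding, and the three-variable diagram are assumptions for illustration; branches are limited to assignments between simple names.

```python
def assign_logged(roots, edges, target, source, added, deleted):
    """Apply `target := source` in place, logging every edge inserted or removed."""
    new_targets = {t for (n, s, t) in edges if n == source and s in roots}
    for edge in [e for e in edges if e[0] == target and e[1] in roots]:
        edges.discard(edge)
        deleted.add(edge)  # goes into D_i
    for r in roots:
        for t in new_targets:
            if (target, r, t) not in edges:
                edges.add((target, r, t))
                added.add((target, r, t))  # goes into A_i


def rollback(edges, added, deleted):
    """Return to the starting diagram: remove A_i first, then re-insert D_i."""
    edges.difference_update(added)
    edges.update(deleted)


roots = {"n0"}
start = {("a", "n0", "n1"), ("b", "n0", "n2"), ("x", "n0", "n3")}
graph = set(start)
logs = []
for target in ("a", "b"):  # branch 1: a := x, branch 2: b := x
    added, deleted = set(), set()
    assign_logged(roots, graph, target, "x", added, deleted)
    rollback(graph, added, deleted)
    logs.append((added, deleted))
print(graph == start)  # True: every branch starts from the same diagram
```

Removing the additions before restoring the deletions keeps the rollback correct even when a branch deletes an edge and then re-inserts the same edge.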

At the end of processing the conditional (that is, after \({ branch}_{n}\)), the implementation:

- (i) **clone**s the root of the diagram \(n-1\) times;
- (ii) for all \(b \in 2 \ldots n\) and for all \((n,s,t) \in D_b\), adds \((n, R^b, t)\) to the alias diagram;
- (iii) for all \(b \in 2 \ldots n\) and for all \((n,s,t) \in A_b\), adds \(({ names} (s, t), R^b, t)\) to the alias diagram, where \({ names} ({ source, target})\) is a function returning the set of names from \({ source}\) to \({ target}\) in the alias diagram;
- (iv) changes the corresponding clone root in sets *A* and *D*;
- (v) inserts the union of the \(A_i\) and removes the union of the \(D_i\) in the alias diagram.

Consider the alias diagram in Fig. 12. The initial process of applying \(G \gg\) (**then** a := x ** else** b := x ** end**) is depicted in Fig. 13.

Figure 13a applies \(G \gg\) a := x. The implementation maintains sets *A* and *D*:

Figure 13b shows the result of removing all elements of \(A_{branch_1}\) and adding all elements of \(D_{branch_1}\) (to get back to the starting state). Figure 13c applies \(G \gg\) b := x, maintaining sets *A* and *D*:

The alias diagram is restored by removing all elements of \(A_{{branch}_2}\) and adding all elements of \(D_{{branch}_2}\) (to get back to the starting state—as depicted in Fig. 12). At the end of processing the conditional, the implementation: (i) **clone**s the root of the diagram \(n-1\) times. In this case, only once, as shown in Fig. 14a; then, (ii) for all \(b \in 2 \ldots n\) and for all \((n,s,t) \in D_b\), adds \((n, R^b, t)\) to the alias diagram, see Fig. 14b; the implementation then (iii) for all \(b \in 2 \ldots n\) and for all \((n,s,t) \in A_b\), adds \(({ names} (s, t), R^b, t)\) to the alias diagram, as depicted in Fig. 14c; *(iv)* changes the corresponding clone root in sets *A* and *D*

and finally, (v) the resulting alias diagram is the result of inserting the union of \(A_i\) and removing the union of the \(D_i\) (Fig. 14d).

The process is an optimization of the rules. Each branch of a conditional changes only a small part of the diagram; the implementation exploits this by copying only the edges that the program modifies.

### Handling Loops

Consider the instruction \(G \gg\) **loop** p **end** as executing the instruction p any number of times (*i*) including none, and unioning the resulting alias diagrams, so it can produce *G* (when \(i=0\)), or \(G \gg p\) (when \(i=1\)), or \(((G \gg p) \gg p)\) (when \(i=2\)), or \((((G \gg p) \gg p) \gg p)\) (when \(i=3\)) and so on.

Then, the mechanism to handle loops gives the following process:

- Use a single *D* (deletion) set. Here, there is no need for *A* sets.
- Process the loop body (p) repeatedly, at each iteration adding deleted references to *D*.
- At the end of each iteration, nothing special needs to be done.
- Stop when reaching a fixpoint.
- At the end of the process, re-insert the elements of *D*.
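The fixpoint iteration can be sketched in Python on the loop **loop** l := l.right **end**. The sketch takes a deliberate shortcut relative to the steps above: instead of deleting references and re-inserting the *D* set at the end, it never deletes, so each iteration can only add edges. The encoding, the function names, and the three-cell list are assumptions for illustration.

```python
def value(roots, edges, path):
    """V(path): nodes reached from `roots` by following the names of the path."""
    nodes = set(roots)
    for name in path.split("."):
        nodes = {t for (n, s, t) in edges if n == name and s in nodes}
    return nodes


def advance(roots, edges):
    """Edges that l := l.right would insert (no deletion in this simplified variant)."""
    return {("l", r, t) for r in roots for t in value(roots, edges, "l.right")}


def loop_fixpoint(roots, edges, body):
    """Apply `body` repeatedly until the edge set stops growing."""
    edges = set(edges)
    iterations = 0
    while True:
        grown = edges | body(roots, edges)
        if grown == edges:  # fixpoint: another pass adds nothing new
            return edges, iterations
        edges = grown
        iterations += 1


# Hypothetical list: l is attached to n1, whose right neighbours are n2 then n3.
roots = {"n0"}
edges = {("l", "n0", "n1"), ("right", "n1", "n2"), ("right", "n2", "n3")}
final, iterations = loop_fixpoint(roots, edges, advance)
print(sorted(value(roots, final, "l")), iterations)  # ['n1', 'n2', 'n3'] 2
```

Because the edge set only grows and the graph is finite, the iteration must stop; on a cyclic list it stops as soon as every cell has been reached.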

The general idea can be applied to recursion.

- Maintain a single *D* (deletion) set, as well as a stack with each call (and its target object).
- For each call, update the stack (so as to handle the different forms of recursion, e.g., direct or indirect recursion).
- Process the feature body repeatedly, at each call adding deleted references to *D*.
- Stop when reaching a fixpoint using the call stack.
- At the end of the process, re-insert the elements of *D*.

*Termination* Termination of fixpoint computations.

**Lemma 1**

*If the analysis starts from an existing graph and the program does not perform any object creations, then the iteration process* (*for loops*) *reaches a fixpoint finitely.*

*Proof of the Lemma*

The graph is finite; each iteration does not remove any nodes or edges, and can only insert edges. This cannot go on forever. \(\square\)

Hence the following process to guarantee termination. It assumes that we associate with every creation instruction **create** x a positive integer *N* (in the simplest variant, N = 1) and a fresh variable fx.

- S1: The first *N* times the instruction is processed, apply the normal rule (remove all edges labeled x from the root, create a new node, create an edge labeled x from the root to that node).
- S2: The *N*th time the instruction is processed, after doing S1, add the label fx to the new edge (i.e., alias fx to x).
- S3: Every subsequent time the instruction is processed (starting with the \(N+1\)st), treat it not through the creation instruction rule but as if it were the assignment x := fx.

With this policy, after some number of iterations, no new node will ever be created. Therefore, the lemma applies and the fixpoint process terminates.
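A possible rendering of steps S1 to S3 in Python is shown below. The `CreationSite` class, its field names, the node-naming scheme, and the triple encoding of the graph are hypothetical; only the counting policy itself comes from the description above.

```python
class CreationSite:
    """Bookkeeping for one `create x` instruction (hypothetical helper)."""

    def __init__(self, name, limit=1):
        self.name = name         # the variable x
        self.fresh = "f" + name  # the fresh variable fx
        self.limit = limit       # the positive integer N
        self.seen = 0            # how many times the instruction was processed


def create(roots, edges, nodes, site):
    """Process `create x` once and return the new edge set."""
    site.seen += 1
    kept = {(n, s, t) for (n, s, t) in edges if not (n == site.name and s in roots)}
    if site.seen <= site.limit:
        # S1: normal rule, a new node with an edge x from the root to it.
        node = "new%d" % len(nodes)
        nodes.add(node)
        kept |= {(site.name, r, node) for r in roots}
        if site.seen == site.limit:
            # S2: on the N-th time, also label the new edge with fx.
            kept |= {(site.fresh, r, node) for r in roots}
        return kept
    # S3: from the (N+1)-st time on, behave like the assignment x := fx.
    targets = {t for (n, s, t) in edges if n == site.fresh and s in roots}
    return kept | {(site.name, r, t) for r in roots for t in targets}


roots, nodes, edges = {"n0"}, {"n0"}, set()
site = CreationSite("x", limit=2)
for _ in range(4):
    edges = create(roots, edges, nodes, site)
print(len(nodes), sorted(edges))
```

With N = 2, four passes over the instruction create only two nodes; the third and fourth passes leave x attached to the node that fx designates, so the node set stops growing.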

### Dynamic Binding

One of the main mechanisms of O–O programming is inheritance. It enables users to create ‘is-a’ relations between different classes: considering A and B as types, if B inherits from A, whenever an instance of A is required, an instance of B will be acceptable. This mechanism enables entities to be polymorphic: an entity is polymorphic if at run-time, its type differs from its static type.

Dynamic binding is the property that any execution of a feature call will use the version of the feature best adapted to the type of the target object; versions might differ thanks to polymorphism. It is important to mention how AutoAlias handles this property: since AutoAlias analyzes the source code statically, it cannot determine the run-time type of a specific entity. Consider, as an example, the classes depicted in Fig. 15. Class T1, in Fig. 15a, defines two variables c and b. Class T2, in Fig. 15b, inherits from class T1 (using the keyword **inherit**) and redefines routine set (as indicated by the keyword **redefine**).

A must-aliasing approach will yield, after executing feature call_set in Fig. 15c, a result that depends on the dynamic type of the entity t: if during execution t is attached to an object of type T1, the result would be t.c is aliased to a and if during execution, t is attached to an object of type T2, the result would be t.b is aliased to a.

A may-aliasing approach (such as the one adopted by AutoAlias) yields that t.c may be aliased to a or that t.b may be aliased to a. AutoAlias implements this by treating the call as a conditional; in this case:

**then**

t.set (a) *—considering* t *attached to* T1

**else**

t.set (a) *—considering* t *attached to* T2

**end**

AutoAlias considers one branch for each heir of T1 that redefines the called feature, in addition to the branch for T1 itself. For this particular case, the alias diagram resulting after executing \(G \gg (\) **call** call_set) is depicted in Fig. 16.

The mechanism introduces imprecision but retains soundness. Notice that the result records that t.c may be aliased to a and that t.b may be aliased to a, but not that t.b may be aliased to t.c.
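The treatment of a dynamically bound call as a conditional can be illustrated with a small Python model. It is only a sketch under assumptions: Fig. 15 is not reproduced here, so the effect of each version of set (T1's version attaches c, T2's redefinition attaches b) is inferred from the discussion above, alias relations are simplified to sets of unordered pairs rather than alias diagrams, and the names `pair`, `SET_VERSIONS` and `analyze_dynamic_call` are hypothetical.

```python
def pair(p, q):
    """An unordered may-alias pair."""
    return frozenset((p, q))


# Assumed effect of each version of `set` on the alias relation, inferred from
# the discussion of Fig. 15: T1's version attaches c, T2's redefinition attaches b.
SET_VERSIONS = {
    "T1": lambda target, arg: {pair(target + ".c", arg)},
    "T2": lambda target, arg: {pair(target + ".b", arg)},
}


def analyze_dynamic_call(target, arg, versions):
    """Treat `target.set (arg)` as a conditional with one branch per version
    and return the union of the alias relations of all branches."""
    result = set()
    for effect in versions.values():
        result |= effect(target, arg)
    return result


aliases = analyze_dynamic_call("t", "a", SET_VERSIONS)
print(pair("t.c", "a") in aliases)    # t.c may be aliased to a
print(pair("t.b", "a") in aliases)    # t.b may be aliased to a
print(pair("t.b", "t.c") in aliases)  # no branch aliases t.b to t.c
```

Because the may-alias relation is not closed under transitivity, taking the union of the branches does not introduce a pair relating t.b and t.c.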

### Using AutoAlias

#### AutoFrame

AutoFrame is a companion tool [18] that uses AutoAlias. It statically analyzes the source code of a routine and produces the set of locations that the routine is allowed to change, relying on AutoAlias to determine possible aliasing. The most relevant results of AutoFrame so far are: (i) the automatic reconstruction of the exact frame clauses, a total of 169 clauses, for an 8000+ line library of data structures and algorithms, where frame inference takes about 25 s on an ordinary laptop computer; (ii) the automatic generation of frame conditions for a 150,000-line library for building GUIs, where frame inference takes about 232 s.

#### Precision of AutoAlias

Deutsch [4] compares the precision of several alias analysis algorithms (including his own) on a structure-copying program that creates two lists whose elements are pairwise aliased. The aim was to address an open problem: how to improve the accuracy of alias analysis in the presence of recursive pointer data structures. We ran AutoAlias on this program to evaluate the accuracy of our approach. Figure 17a shows the algorithm used in Ref. [4]. Since AutoAlias takes Eiffel code as input, Fig. 17b shows the corresponding Eiffel implementation.

Figure 17a defines the algorithm as a C-like program. List is a structure containing two pointers: to the head (char *hd) and to the tail (List *tl) of the list. Copy is a procedure that returns a copy of the list passed as its argument. In Fig. 17b, the List structure is implemented as a class (LST) that contains two references: to the head (hd) and to the tail (tl: LST) of the list. The routine copy_ is an implementation of Copy. The return value of a routine in Eiffel is set by assigning it to the local variable **Result**, whose type is the return type declared in the signature of the routine (LST in this case). Hence, there is no need to create the local variable p as in Fig. 17a; we use **Result** directly. In Eiffel, class attributes are read-only from outside the class: they can be changed only through procedures. This protects encapsulation and the consistency of the object. Hence, the instruction **Result**.tl := copy_ (t1) is not permitted unless the proper **assigner** routines are declared. For lack of space, Fig. 17b does not show the corresponding setter routines.

Deutsch [4] defines five program properties and compares the precision of five different algorithms (including his own) for alias analysis. Table 7 shows that comparison, extended with the results of AutoAlias.

AutoAlias is at least as precise as the other approaches: our implementation meets all five properties. What is interesting is that other approaches fail to capture the fact that, after the execution of routine Copy, the heads of *X* and *Y* might not be aliased at all. This is the case when the argument passed to Copy is **null**. According to Deutsch’s [4] approach, the set of aliases of the algorithm in Fig. 17a is \(\{(X\rightarrow (tl\rightarrow )^ihd, Y\rightarrow (tl\rightarrow )^jhd)\mid i=j\}\). For \(i=0\), the set is \(\{(X\rightarrow hd, Y\rightarrow hd)\}\), ruling out the possibility of no aliasing. AutoAlias captures this fact thanks to rule AC-Cond, which analyzes each branch of the conditional and unions the resulting alias diagrams; one of these diagrams yields no aliasing. Figure 18 depicts the respective alias graphs at different program points of the algorithm being analyzed.
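The pairwise aliasing and the null case can be observed on a concrete run. The following Python rendering of the copying routine is a sketch reconstructed from the prose description of Fig. 17 (the figure itself is not reproduced here); it executes the program rather than analyzing it statically, and the helper names `heads` and `head_aliases` are hypothetical.

```python
class Lst:
    """A list cell with a head object and a reference to the tail."""

    def __init__(self, hd, tl=None):
        self.hd = hd
        self.tl = tl


def copy_(l):
    """Return a new list whose cells share their heads with those of `l`."""
    if l is None:
        return None                     # the branch that yields no aliasing
    return Lst(l.hd, copy_(l.tl))


def heads(l):
    """The head objects reached through hd, tl.hd, tl.tl.hd, ..."""
    result = []
    while l is not None:
        result.append(l.hd)
        l = l.tl
    return result


def head_aliases(x, y):
    """Pairs (i, j) such that X.tl^i.hd and Y.tl^j.hd are the same object."""
    return {(i, j)
            for i, hx in enumerate(heads(x))
            for j, hy in enumerate(heads(y))
            if hx is hy}


x = Lst(object(), Lst(object(), Lst(object())))
y = copy_(x)
print(sorted(head_aliases(x, y)))       # only pairs with i = j
print(head_aliases(None, copy_(None)))  # empty: no aliasing for a null argument
```

On a non-empty list the heads are aliased exactly when i = j, while a null argument produces no alias pair at all, which is the case a sound and precise analysis has to keep possible.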

## Future Work and Conclusion

A widely available, widely applicable, easy-to-integrate, and fast tool for alias analysis would immediately and immensely benefit many tasks of programming language implementation and verification. AutoAlias does not yet fulfill all these criteria, but provides, in our opinion, a significant step forward. The examples to which we have applied the tool so far, while still limited, provide encouraging evidence of the solidity and scalability of the approach. The application to change analysis, described in the companion paper, is currently the showcase, but many others are potentially open, of interest to both tool developers (in particular developers of compilers and verification tools) and application programmers.

We realize the extent of the work that remains ahead, including the following: taking into account tricky language mechanisms such as exceptions and function objects (closures in Java, delegates in C#, agents in Eiffel); taking into account calls to external software mechanisms, e.g., system calls, which can potentially put the soundness of alias analysis into question, since objects then go into the big bad world out there, where anything can happen to them (but can we still reason about them without having to adopt the worst-case disaster scenario in which nothing can be assumed any longer?); refining the analysis and improving its precision further by taking into account ever more sophisticated patterns in conditional instructions and loops.

It is our hope, however, that the present state of the work, as described in this article, advances the search for general and effective techniques of automatic alias analysis.

## References

1. Andersen LO. Program analysis and specialization for the C programming language. Technical report (1994).
2. Chase DR, Wegman M, Zadeck FK. Analysis of pointers and structures. SIGPLAN Not. 1990;25(6):296–310.
3. Choi J-D, Burke M, Carini P. Efficient flow-sensitive interprocedural computation of pointer-induced aliases and side effects. In: Proceedings of the 20th ACM SIGPLAN-SIGACT symposium on principles of programming languages, POPL ’93. New York: ACM; 1993. p. 232–45.
4. Deutsch A. Interprocedural may-alias analysis for pointers: beyond k-limiting. SIGPLAN Not. 1994;29(6):230–41.
5. Hardekopf B, Lin C. Flow-sensitive pointer analysis for millions of lines of code. In: Proceedings of the 9th annual IEEE/ACM international symposium on code generation and optimization, CGO ’11. IEEE Computer Society: Washington, DC; 2011. p. 289–98.
6. Kogtenkov A, Meyer B, Velder S. Alias calculus, change calculus and frame inference. Sci Comput Program. 2015;97(P1):163–72.
7. Landi W, Ryder BG. A safe approximate algorithm for interprocedural aliasing. In: Proceedings of the ACM SIGPLAN 1992 conference on programming language design and implementation, PLDI ’92. ACM: New York; 1992. p. 235–248.
8. Larus JR, Hilfinger PN. Detecting conflicts between structure accesses. SIGPLAN Not. 1988;23(7):24–31.
9. Meyer B. Eiffel: a language and environment for software engineering. J Syst Softw. 1988;8(3):199–246.
10. Meyer B. Object-oriented software construction. 2nd ed. Upper Saddle River: Prentice-Hall; 1997.
11. Meyer B. Towards a theory and calculus of aliasing. J Object Technol. 2010;9(2):37–74 (column).
12. Meyer B. Framing the frame problem. In: Pretschner A, Broy M, Irlbeck M, editors. Dependable software systems. New York: Springer; 2014. p. 174–85.
13. Meyer B. An automatic technique for static deadlock prevention. In: Voronkov A, Virbitskaite I, editors. Perspectives of system informatics. Berlin: Springer; 2015. p. 45–58.
14. Meyer B, Kogtenkov A. Negative variables and the essence of object-oriented programming. Berlin: Springer; 2014. p. 171–87.
15. Milanova A, Rountev A, Ryder BG. Parameterized object sensitivity for points-to analysis for Java. ACM Trans Softw Eng Methodol. 2005;14(1):1–41.
16. Rivera V. Autoalias and autoframe implementations; 2019. https://github.com/varivera/alias_graph_based/tree/master/autoframe commit:25d20fc529151d19760f12a3566681fd0c79b1ed.
17. Rivera V. Autoalias and autoframe results. https://varivera.github.io/autoalias.html. Accessed Apr 2019.
18. Rivera V, Meyer B. Autoframe: automatic frame inference for object-oriented languages. Companion paper to this one, under submission (pre-print available at https://arxiv.org/pdf/1808.08751.pdf); 2019.
19. Sharir M, Pnueli A. Two approaches to interprocedural data flow analysis, chapter 7. Englewood Cliffs: Prentice-Hall; 1981. p. 189–234.
20. Shivers OG. Control-flow analysis of higher-order languages or taming lambda. Ph.D. thesis, Pittsburgh. UMI Order No. GAX91-26964; 1991.
21. Smaragdakis Y, Balatsouras G. Pointer analysis. Found Trends Program Lang. 2015;2(1):1–69.
22. Smaragdakis Y, Bravenboer M, Lhoták O. Pick your contexts well: understanding object-sensitivity. SIGPLAN Not. 2011;46(1):17–30.
23. Sridharan M, Chandra S, Dolby J, Fink SJ, Yahav E. Aliasing in object-oriented programming. Chapter alias analysis for object-oriented programs. Berlin: Springer; 2013. p. 196–232.

## Acknowledgements

We are indebted to colleagues who collaborated on the previous iterations of the alias calculus work, in particular Sergey Velder (ITMO University) for many important suggestions regarding the theory, Alexander Kogtenkov (Eiffel Software, also then ETH Zurich), who implemented an earlier version of the Change Calculus, and Marco Trudel (then ETH Zurich). We thank members of the Software Engineering Laboratory at Innopolis University, particularly Manuel Mazzara and Alexander Naumchev, for many fruitful discussions.

## Ethics declarations

### Conflict of interest

The authors declare that they have no conflict of interest.

## About this article

### Cite this article

Rivera, V., Meyer, B. AutoAlias: Automatic Variable-Precision Alias Analysis for Object-Oriented Programs. *SN COMPUT. SCI.* **1**, 12 (2020). https://doi.org/10.1007/s42979-019-0012-1

### Keywords

- Alias analysis
- Object-oriented programming
- Points-to
- Program verification