# Automated Synthesis of Secure Platform Mappings

- 1 Citations
- 5.9k Downloads

## Abstract

System development often involves decisions about how a high-level design is to be implemented using primitives from a low-level platform. Certain decisions, however, may introduce undesirable behavior into the resulting implementation, possibly leading to a violation of a desired property that has already been established at the design level. In this paper, we introduce the problem of *synthesizing a property-preserving platform mapping*: synthesize a set of implementation decisions ensuring that a desired property is preserved from a high-level design into a low-level platform implementation. We formalize this synthesis problem and propose a technique for generating a mapping based on symbolic constraint search. We describe our prototype implementation, and two real-world case studies demonstrating the applicability of our technique to the synthesis of secure mappings for the popular web authorization protocols OAuth 1.0 and 2.0.

## 1 Introduction

When building a complex software system, one may begin by coming up with an abstract design, and then construct an implementation that conforms to this design. In practice, there are rarely enough time and resources available to build an implementation from scratch, and so this process often involves reuse of an existing *platform*—a collection of generic components, data structures, and libraries that are used to build an application in a particular domain.

The benefits of reuse also come with potential risks. A typical platform exhibits its own complex behavior, including subtle interactions with the environment that may be difficult to anticipate and reason about. Typically, the developer must work with the platform as it exists, and is rarely given the luxury of being able to modify it and remove unwanted features. For example, when building a web application, a developer must work with a standard browser and take into account all its features and security vulnerabilities. As a result, achieving an implementation that perfectly conforms to the design—in the traditional notion of behavioral refinement [20]—may be too difficult in practice. Worse, the resulting implementation may not necessarily preserve desirable properties that have already been established at the level of design.

These risks are especially evident in applications where security is a major concern. For example, OAuth 2.0, a popular authorization protocol subjected to rigorous and formal analysis at an abstract level [9, 33, 42], has been shown to be vulnerable to attacks when implemented on a web browser or a mobile device [10, 39, 41]. Many of these vulnerabilities are not due to simple programming errors: They arise from logical flaws that involve a subtle interaction between the protocol logic and the details of the underlying platform. Unfortunately, OAuth itself does not explicitly guard against these flaws, since it is intended to be a *generic*, *abstract* protocol that deliberately omits details about potential platforms. On the other hand, anticipating and mitigating against these risks require an in-depth understanding of the platform and security expertise, which many developers do not possess.

This paper proposes an approach to help developers overcome these risks and achieve an implementation that preserves desired properties. In particular, we formulate this task as the problem of automatically synthesizing a *property-preserving platform mapping*: A set of implementation decisions ensuring that a desired property is preserved from a high-level design into a low-level platform implementation.

Our approach builds on the prior work of Kang et al. [28], which proposes a modeling and verification framework for reasoning about security attacks across multiple levels of abstraction. The central notion in this framework is that of a *mapping*, which captures a developer’s decisions about how abstract system entities are to be realized in terms of their concrete counterparts. In this paper, we fix a bug in the formalization of mapping in [28] and extend the framework of [28] with the novel problem of synthesizing a property-preserving mapping. In addition, we present an algorithmic technique for performing this synthesis task. Our technique, inspired by the highly successful paradigms of *sketching* and *syntax-guided synthesis* [3, 26, 37, 38], takes a *constraint generalization* approach to (1) quickly prune the search space and (2) produce a solution that is *maximal* (i.e., a largest set of mappings that preserve a given property).

We have built a prototype implementation of the synthesis technique. Our tool accepts a high-level design model, a desired system property (both specified by the developer), and a model of a low-level platform (built and maintained separately by a domain expert). The tool then produces a maximal set of mappings (if one exists) that would ensure that the resulting platform implementation preserves the given property. We have successfully applied our tool to synthesize property-preserving mappings for two non-trivial case studies: the authentication protocols OAuth 1.0 and 2.0 implemented on top of HTTP. Our results are promising: The implementation decisions captured by our synthesized mappings describe effective mitigations against some of the common vulnerabilities that have been found in deployed OAuth implementations [39, 41].

The contributions of this paper include: a formal treatment of mapping, including a correction in the original definition [28] (Sect. 2); a formulation of the *mapping synthesis problem*, a novel approach for ensuring the preservation of a property between a high-level design and its platform implementation (Sect. 3); a technique for automatically synthesizing mappings based on symbolic constraint search (Sect. 4); and a prototype implementation of the synthesis technique along with a real-world case study demonstrating the feasibility of this approach (Sect. 5). We conclude with a discussion of related work (Sect. 6).

## 2 Mapping Composition

*sets of labels*, and includes a new composition operator based on the notion of

*mappings*, which relate event labels from one abstraction layer to another. In this section, we present the essential elements of this framework.

**Running Example.** Consider a simple example involving communication of messages among a set of processes. In our modeling approach, the communication of a message is represented by labels of the form *sender*.*receiver*.*message*. For example, label \(\textsf {\small {a.e.p}}\) represents Alicesending Eve a public, non-secret message. Similarly, \(\textsf {\small {a.b.s}}\) represents Alicesending a secret message to another process (\(\textsf {\small {b}}\) for Bob, for example). In this system, Aliceis unwilling to share its secret with Eve; in Fig. 1(a), this is modeled by the absence of any transition on event \(\{ \textsf {\small {a.e.s}} \}\) in the Aliceprocess.

Eve is a malicious character whose goal is to learn Alice’s secret. Beside \(\textsf {\small {a.e.p}}\) and \(\textsf {\small {a.e.s}}\), Eve is associated with two additional labels, \(\textsf {\small {u.e.p}}\) and \(\textsf {\small {u.e.s}}\), which represent receiving a public or secret message, respectively, through some *unknown* sender \(\textsf {\small {u}}\). Conceptually, these two latter labels can be regarded as *side channels* [30] that Eve uses to obtain information.

A desirable property of this abstract communication system is that Eve should never be able to learn Alice’s secret^{1}. In this case, it can be easily observed that the property holds, since Alice, by design, never sends the secret to Eve.

The model in Fig. 1(b) describes communication over a low-level public channel that is shared among all processes. A message sent over this channel may be encrypted using a key, as captured by labels of the form *message*.*key*. For instance, \(\textsf {\small {p.x}}\) and \(\textsf {\small {s.x}}\) represent the transmission of a public and secret message, respectively, using key \(\textsf {\small {x}}\). A message may also be sent in plaintext by omitting an encryption key (e.g., label \(\textsf {\small {s}}\) represents the plaintext transmission of a secret). Each receiver on the public channel is assumed to have knowledge of only a single key; for instance, \(\textsf {\small {Recv}}_\textsf {\small {X}}\) only knows key \(\textsf {\small {x}}\) and thus cannot receive messages that are encrypted using key \(\textsf {\small {y}}\) (i.e., labels \(\textsf {\small {p.y}}\) and \(\textsf {\small {s.y}}\) do not appear in events of \(\textsf {\small {Recv}}_\textsf {\small {X}}\)).

Suppose that we wish to reason about the behavior of the abstract communication system from Fig. 1(a) when it is implemented over the public channel in Fig. 1(b). In particular, in the low-level implementation, Eve and other processes (e.g., Bob) are required to share the same channel, no longer benefitting from the separation provided by the abstraction in Fig. 1(a). Does the property of the abstract communication hold in every possible implementation? If not, which decisions ensure that Alice’s secret remains protected from Eve? We formulate these questions as the problem of synthesizing a *property-preserving mapping* between a pair of high-level and low-level models.

**Events, Traces, and Processes.** Let *L* be a potentially infinite set of labels. An *event* *e* is a finite, non-empty set of labels: \(e \in E(L)\), where *E*(*L*) is the set of all finite subsets of *L* except the empty set \(\emptyset \). Let \(S^*\) be the set of all finite sequences of elements of set *S*. A *trace* *t* is a finite sequence of events: \(t\in T(L)\), where *T*(*L*) is the set of all traces over *L* (i.e., \(T(L) = (E(L))^*\)). The empty trace is denoted by Open image in new window , and the trace consisting of a sequence of events \(e_1,e_2,...\) is denoted Open image in new window . If *t* and \(t'\) are traces, then \(t\cdot t'\) is the trace obtained by concatenating *t* and \(t'\). Note that Open image in new window for any trace *t*.

*t*be a trace over set of labels

*L*, and let \(A \subseteq L\) be a subset of

*L*. The

*projection of t onto A*, denoted Open image in new window , is defined as follows: For example, if Open image in new window , then Open image in new window and Open image in new window .

A *process* *P* is defined as a triple \(( L _P, E _P, T _P)\). The *labels* of process *P*, \( L _P \subseteq L\), is the set of all labels appearing in *P*, and \( E _P \subseteq E( L )\) is the set of events that *may* appear in traces of *P*, which are denoted by \( T _P \subseteq T( L )\). We assume traces in every process *P* to be *prefix-closed*; i.e., Open image in new window and for every non-empty trace Open image in new window , \(t \in T _P\).

**Parallel Composition.** A pair of processes *P* and *Q* synchronize with each other by performing events \(e_1\) and \(e_2\), respectively, if these two events share at least one label. In their parallel composition, denoted \(P \parallel Q\), this synchronization is represented by a new event \(e'\) that is constructed as the union of \(e_1\) and \(e_2\) (i.e., \(e' = e_1 \cup e_2\)).

*eventCond*is defined as

*syncCond*, is defined as

The definition of \( T _{P \parallel Q}\) states that if we take a trace *t* in the composite process and ignore labels that appear only in *Q*, then the resulting trace must be a valid trace of *P* (and symmetrically for *Q*). The condition (**Cond. 1**) is imposed on every event appearing in \( T _{P \parallel Q}\) to ensure that an event performed together by *P* and *Q* contains at least one common label shared by both processes.

*sets*of labels. That is, the CSP parallel composition is the special case of the composition of

**Def. 1**where every event is a singleton (i.e., it contains exactly one label). Note that if event

*e*contains exactly one label

*a*, then

*a*must belong to the alphabet of

*P*or that of

*Q*, which means

*syncCond*(

*e*) always evaluates to true. The resulting expression in that case is equivalent to the definition of parallel composition in CSP [21, Sec. 2.3.3].

**Mapping Composition.** A *mapping m over set of labels L* is a partial function Open image in new window . Informally, \(m(a) = b\) stipulates that every event that contains *a* as a label is to be assigned *b* as an additional label. We sometimes use the notations \(a \mapsto _m b\) or \((a,b)\in m\) as alternatives to \(m(a) = b\). When we write \(m(a)=b\) we mean that *m*(*a*) is defined and is equal to *b*. The *empty* mapping, denoted \(m=\emptyset \), is the partial function \(m:L\rightarrow L\) which is undefined for all \(a\in L\).

*Mapping composition*allows a pair of processes to interact with each other over distinct labels. Formally, consider two processes \(P = ( L _P, E _P, T _P)\) and \(Q = ( L _Q, E _Q, T _Q)\), and let \(L = L _P\cup L _Q\). Given mapping Open image in new window , the

*mapping composition*\(P \Vert _mQ\) is defined as follows: where \( L _{P \Vert _mQ} = L _P \cup L _Q\), and \(syncCond'(e)\) and

*mapCond*(

*e*,

*m*) are defined as:

*dom*(

*m*) is the domain of function

*m*. Compared to

**Def. 1**, the additional disjunct in \(syncCond'(e)\) allows

*P*and

*Q*to synchronize even when they do not share any label, if at least one pair of their labels are mapped to each other in

*m*. The predicate

*mapCond*ensures that if an event

*e*contains a label

*a*and

*m*is defined over

*a*, then

*e*also contains the label that

*a*is mapped to.

Note that **Def. 2** is different from the definition of mapping composition in [28], and corrects a flaw in the latter. In particular, the definition in [28] omits condition \(syncCond'\), which permits the undesirable case in which events \(e_1\) and \(e_2\) from *P* and *Q* are synchronized into union \(e=e_1\cup e_2\) even when the events do not share any label.

**Example.**Let

*P*and

*Q*be the abstract and public channel communication models from Fig. 1(a) and (b), respectively. The property that Eve never learns Alice’s secret can be stated as follows:

**Def. 2**), the following event may appear in a trace of the overall composite process:Note that this trace is a violation of the above property (i.e., \(I_{m_1} \not \models \varPhi \)). This can be seen as an example of

*abstraction violation*: As a result of decisions in \(m_1\), \(\textsf {\small {a.b.s}}\) and \(\textsf {\small {u.e.s}}\) now share the same underlying representation (\(\textsf {\small {s}}\)), and \(\textsf {\small {Eve}} \) is able to engage in an event with a label (a.b.s) that was not previously available to it in the abstract model.

**Properties of the Mapping Composition Operator.** Mapping composition is a generalization of parallel composition: The latter is a special case of mapping composition where the given mapping is empty:

### Lemma 1

Given a pair of processes *P* and *Q*, if \(m = \emptyset \) then \(P \Vert _mQ = P \parallel Q\).

**Commutativity.** The proposed mapping composition operator is commutative: i.e., Open image in new window . This property can be inferred from the fact that **Def. 2** is symmetric with respect to *P* and *Q*. It follows that by being a special case of mapping composition, the parallel composition operator is also commutative.

**Associativity.** The mapping composition operator is associative under the following conditions on the alphabets of involved processes and mappings:

### Theorem 1

Given processes *P*, *Q*, and *R*, let Open image in new window and Open image in new window . If \( E _X = E _Y\), then \(X=Y\).

### Proof

Available in the extended version of this paper [27].

## 3 Synthesis Problems

The *mapping verification problem* is to check, given processes *P* and *Q*, mapping *m*, and specification \(\varPhi \), whether \((P \Vert _mQ) \models \varPhi \). This problem was studied by Kang et al. [28]. In this paper, we introduce and study, for the first time to our knowledge, the problem of *mapping synthesis*. We begin with a simple formulation of the problem and then generalize it. We will not define what exactly the specification \(\varPhi \) may be, neither the satisfaction relation \(\models \), as the mapping synthesis problems defined below are generic and can work with any type of specification or satisfaction relation. In Sect. 5.1, we discuss how this generic framework is instantiated in our implementation.

### Problem 1

* (Mapping Synthesis).* Given processes

*P*and

*Q*, and specification \(\varPhi \), find, if it exists, a mapping

*m*such that \((P \Vert _mQ) \models \varPhi \). We call such an

*m*a

*valid*mapping.

Note that if \(\varPhi \) is a *trace* property [2, 29], this problem can be stated as a \(\exists \forall \) problem; that is, finding a witness *m* to the formula \(\exists m : \forall t \in T_{P \Vert _mQ} : t \in \varPhi \).

Instead of synthesizing *m* from scratch, the developer may wish to express their partial system knowledge as a given *constraint*, and ask the synthesis tool to generate a mapping that adheres to this constraint. For instance, given labels \(a, b, c \in L\), one may express a constraint that *a* must be mapped to either *b* or *c* as part of every valid mapping; this gives rise to two possible candidate mappings, \(m_{1}\) and \(m_{2}\), where \(m_{1}(a) = b\) and \(m_{2}(a) = c\). Formally, let *M* be the set of all possible mappings between labels *L*. A *mapping constraint* \(C \subseteq M\) is a set of mappings that are considered legal candidates for a final, synthesized valid mapping. Then, the problem of synthesizing a mapping given a constraint can be formulated as follows:

### Problem 2

* (Generalized Mapping Synthesis).* Given processes

*P*and

*Q*, specification \(\varPhi \), and mapping constraint

*C*, find, if it exists, a valid mapping

*m*such that \(m \in C\).

Note that Problem 1 is a special case of Problem 2 where \(C = M\). The synthesis problem can be further generalized to one that involves synthesizing a constraint that contains a *set* of valid mappings:

### Problem 3

* (Mapping Constraint Synthesis).* Given processes

*P*and

*Q*, specification \(\varPhi \), and mapping constraint

*C*, generate, if it exists, a non-empty set of valid mappings \(C'\) such that \(C' \subseteq C\). We call such a \(C'\) valid with respect to

*P*,

*Q*, \(\varPhi \) and

*C*.

A procedure for solving Problem 3 can be used to solve Problem 2: Having generated constraint \(C'\), we can pick any mapping \(m \in C'\). Such an *m* is guaranteed to be valid and also to belong in *C*.

In practice, it is desirable for \(C'\) to be as large as possible while still being valid, as it provides more implementation choices (i.e., possible mappings). In particular, we say that a mapping constraint \(C'\) is *maximal* with respect to *P*, *Q*, \(\varPhi \), and *C* if and only if (1) \(C'\) is valid with respect to *P*, *Q*, \(\varPhi \), and *C*, and (2) there exists no other constraint \(C''\) such that \(C''\) is also valid w.r.t. *P*, *Q*, \(\varPhi \), *C*, and \(C' \subseteq C''\). Then, our final synthesis problem can be stated as follows:

### Problem 4

* (Maximal Constraint Synthesis).* Given processes

*P*and

*Q*, property \(\varPhi \), and constraint

*C*, generate, if it exists, a maximal constraint \(C'\) with respect to

*P*,

*Q*, \(\varPhi \),

*C*.

If found, \(C'\) is a *local* optimal solution. In general, there may be multiple maximal constraints for given *P*, *Q*, \(\varPhi \), and *C*.

**Example.**Back to our running example, an alternative implementation of the abstract communication model over the public channel involves encrypting messages sent by Aliceto Bob using a key (y) that Eve does not possess; this decision can be encoded as the following

*valid*mapping \(m_2\):

## 4 Synthesis Technique

**Mapping Representation.** In our approach, mappings are represented *symbolically* as logical expressions over variables that correspond to labels being mapped. The symbolic representation has the following advantages over an explicit one (where the entries of mapping *m* are enumerated explicitly): (1) it provides a succinct representation of implementation decisions to the developer (which is especially important as the size of the mapping grows large) and (2) it allows the user to specify partial implementation decisions (i.e., given constraint *C*) in a *declarative* manner.

*syntactic*approach where the space of candidate mapping constraints is restricted to expressions that can be constructed from a given grammar. Our grammar is specified as follows:where

*Var*is a set of variables that represent parameters inside a label, and

*Const*is the set of constant values. Intuitively, this grammar captures implementation decisions that involve assignments of parameters in an abstract label to their counterparts in a concrete label (represented by the equality operator “\(=\)”). A logical implication is used to construct a conditional assignment of a parameter.

*symbolic*labels

*abs*and

*conc*, where

*abs*represents the label being mapped to

*conc*. The body of each predicate is constructed as an expression from the above grammar. For example, let

*abs*\(=\) \(\textsf {\small {a.b}}.msg\) be a symbolic encoding of labels that represent Alicecommunicating to Eve, with variable

*msg*corresponding to the message being sent; similarly, let \(conc = msg'.key\) be a symbolic label in the public channel model, where \(msg'\) and

*key*correspond to the message being transmitted and the key used to encrypt it (if any). Then, the expression

*msg*in the abstract label must be equal to that in the concrete label (i.e., the message being transmitted must be preserved during the mapping) and (2) if the message is a secret, key \(\textsf {\small {y}}\) must be used to encrypt it in the implementation.

*m*is allowed by \(\mathcal {X}(abs, conc)\) if and only if for each label

*abs*, (1)

*m*is defined over

*abs*if and only if there exists some label

*conc*for which \(\mathcal {X}(abs,conc)\) evaluates to true, and (2)

*m*maps

*abs*to such a label

*conc*.

**Algorithmic Considerations.** To ensure that the algorithm terminates, the set of expressions that may be constructed using the given grammar is restricted to a finite set, by bounding the domains of data types (e.g., distinct messages and keys in our running example) and the size of expressions. We also assume the existence of a verifier that is capable of checking whether a candidate mapping satisfies a given specification \(\varPhi \). The verifier implements function \(verify(C,P,Q,\varPhi )\) which returns *OK* if and only if every mapping allowed by constraint *C* is valid with respect to \(P,Q,\varPhi \).

**Generalization Algorithm.** Once we limit the number of candidate expressions to be finite, we can use a brute-force algorithm to enumerate and check those candidates one by one. However, this naive algorithm is likely to suffer from scalability issues. Thus, we present an algorithm that takes a generalization-based approach to identify and prune undesirable parts of the search space. A key insight is that only a few implementation decisions—captured by some *minimal subset* of the entries in a mapping—may be sufficient to imply that the resulting implementation will be invalid. Thus, given some invalid mapping, the algorithm attempts to identify this minimal subset and construct a larger constraint \(C_{bad}\) that is guaranteed to contain only invalid mappings.

The outline of the algorithm is shown in Fig. 2. The function *synthesize* takes four inputs: processes *P* and *Q*, specification \(\varPhi \), and a user-specified mapping constraint *C*. It also maintains a set of constraints *X*, which keeps track of “bad” regions of the search space that do not contain any valid mappings.

In each iteration, the algorithm selects some mapping *m* from *C* (line 3) and checks whether it belongs to one of the constraints in *X* (meaning, the mapping is guaranteed to result in an invalid implementation). If so, it is simply discarded (lines 4–5).

*m*is valid with respect to \(\varPhi \) (line 7). If so, then

*generalize*is invoked to produce a maximal mapping constraint \(C_{maximal}\), which represents the largest set that contains \(\{m\}\), is contained in

*C*, and is valid with respect to \(P,Q,\varPhi \) (line 9). If, on the other hand,

*m*is invalid (i.e., it fails to preserve \(\varPhi \)), then

*generalize*is invoked to compute the largest superset \(C_{bad}\) of \(\{m\}\) that contains only invalid mappings (i.e., those that satisfy \(\lnot \varPhi \)). The set \(C_{bad}\) is then added to

*X*and used to prune out subsequent, invalid candidates (line 13).

**Constraint Generalization.** The function \(generalize(C',P,Q,\varPhi ,C)\) computes a maximal set that contains \(C'\), is contained within *C*, and only permits mappings that satisfy \(\varPhi \). This function is used in two different ways: (1) to identify an undesirable region of the candidate space that should be avoided, and (2) to produce a maximal version of a valid mapping constraint.

The procedure works by incrementally growing \(C'\) into a larger set \(C_{relaxed}\) and stopping when \(C_{relaxed}\) contains at least one mapping that violates \(\varPhi \). Suppose that constraint \(C'\) is represented by a symbolic expression \(\mathcal {X}\), which itself is a conjunction of *n* subexpressions \(k_1 \wedge k_2 \wedge ... \wedge k_n\), where each \(k_i\) for \(1 \le i \le n\) represents a (possibly conditional) assignment of a variable or a constant to some label parameter. The function \(decompose(C')\) takes the given constraint and returns the set of such subexpressions. The function \(relax(C',k_i)\) then computes a new constraint by removing *k* from \(C'\); this new constraint, \(C_{relaxed}\), is a larger set of mappings that subsumes \(C'\).

The verifier is then used to check \(C_{relaxed}\) against \(\varPhi \) (line 22). If \(C_{relaxed}\) is still valid with respect to \(\varPhi \), then the implementation decision encoded by *k* is irrelevant for \(\varPhi \), meaning we can safely remove *k* from the final synthesized constraint \(C'\) (line 24). If not, *k* is retained as part of \(C'\), and the algorithm moves onto the next subexpression *k* as a candidate for removal (line 20). On line 23, we also make sure that \(C_{relaxed}\) does not violate the predefined user constraints *C*.

**Example.**Let \(abs = \textsf {\small {a.e}}.msg\) be a symbolic label that represents Alice sending a message (

*msg*) to Eve, and \(conc = msg'.key\) be its corresponding label in the public channel model. Then, one candidate constraint \(C'\) for mappings from the high-level to low-level labels can be specified as the following expression:

*P*,

*Q*and \(\varPhi \). Next, the generalization procedure removes the subexpression \(k_1\) \(\equiv \) \((msg = \textsf {\small {p}}\) \(\implies \) \(key = \textsf {\small {x}})\) from \(C'\), resulting in constraint \(C_{relaxed}\) that is represented as:

## 5 Implementation and Case Studies

### 5.1 Implementation

We have built a prototype implementation^{2} of the synthesis algorithm described in Sect. 4. Our tool uses the Alloy Analyzer [25] as the underlying modeling and verification engine. Alloy’s flexible, declarative relational logic is convenient for encoding the semantics of the mapping composition as well as specifying mapping constraints. The analysis engine for Alloy uses an off-the-shelf SAT solver to perform *bounded* verification [25]. In particular, our current prototype is capable of synthesizing mappings to preserve the following types of properties: *reachability* and *safety* properties, which can be expressed in either of the forms \(\exists t : t \in T_P \wedge t\in \phi \) (reachability) and \(\lnot \exists t : t \in T_P \wedge t\not \in \phi \) (safety) for some process *P* and property \(\phi \).

However, our synthesis approach does not prescribe the use of a particular modeling and verification engine, and can be implemented using other tools as well (such as an SMT solver [11, 12]).

### 5.2 Case Studies: OAuth Protocols

As two major case studies, we took on the problem of synthesizing valid mappings for *OAuth 1.0 and OAuth 2.0*, two real-world protocols used for *third-party authorization* [24]. The purpose of the OAuth protocol family in general is to allow an application (called a *client* in the OAuth terminology) to access a resource from another application (an *authorization server*) without needing the credentials of the resource owner (a *user*). For example, a gaming application may initiate an OAuth process to obtain a list of friends from a particular user’s Facebook account, provided that the user has authorized Facebook to release this resource to the client.

OAuth 2.0 is the newer version of the protocol, while OAuth 1.0 is an older version. Although OAuth 2.0 is intended to be a replacement for OAuth 1.0, there has been much contention within the developer community about whether it actually improves over its predecessor in terms of security [17]. Since both protocols are designed to provide the same security guarantees (i.e., both share common properties), our goal was to apply our synthesis approach to systematically compare what developers would be required to do in order to construct secure web-based implementations of the two.

### 5.3 Formal Modeling

For our case studies, we constructed the following set of Alloy models: (1) model \(P_{1.0}\) representing OAuth 1.0; (2) model \(P_{2.0}\) representing OAuth 2.0; (3) model *Q* representing generic HTTP interactions between a browser and a server, as well as the behavior of a web-based attacker; (4) specification \(\varPhi \) describing desired protocol properties (same for both OAuth 1.0 and 2.0); and (5) mapping constraints \(C_{1.0}\) and \(C_{2.0}\) representing initial, user-specified partial mappings for OAuth 1.0 and 2.0, respectively. The complete models are approximately 1800 lines of Alloy code in total, and took around 4 man-months to build. These models were then provided as inputs to our tool to solve two instances of Problem 4 from Sect. 3. In particular, we synthesized a maximal mapping constraint \(C_{1.0}'\) such that every \(m \in C_{1.0}'\) ensures that \(P_{1.0} \Vert _mQ \models \varPhi \). and a maximal mapping constraint \(C_{2.0}'\) such that every \(m \in C_{2.0}'\) ensures that \(P_{2.0} \Vert _mQ \models \varPhi \).

**OAuth Models** (\(P_{1.0}\), \(P_{2.0}\)). We constructed Alloy models of OAuth 1.0 and 2.0 based on the official protocol specifications [23, 24]. Due to limited space, we give only a brief overview of the models. Each model consists of four processes: Client, AuthServer, and two users, Aliceand Eve (the latter with a malicious intent to access Alice’s resources).

A typical OAuth 2.0 workflow, shown in Fig. 3(a), begins with a user (Aliceor Eve) initiating a new protocol session with Client (initiate). The user is then asked to prove their own identity to AuthServer (by providing a user ID and a password) and officially authorize the client to access their resources (authorize). Given the user’s authorization, the server then allocates a unique code for the user, and then redirects their back to the client. The user forwards the code to the client (forward), which then can exchange the code for an access token to their resources (getToken).

Like in OAuth 2.0, a typical workflow in OAuth 1.0 (depicted in Fig. 3(b)) begins with a user initiating a new session with Client (initiate). Instead of immediately directing the user to AuthServer, however, Client first obtains a *request token* from AuthServer and associates it with the current session (getReqToken). The user is then asked to present the same request token to AuthServer and authorize Client to access their resources (authorize). Once notified by the user that the authorization step has taken place (notify), Client exchanges the request token for an access token that can be used subsequently to access their resources (getAccessToken).

**Specification**(\(\varPhi \)). There are two desirable properties of OAuth protocols in general: (1)

**Authenticity**: When the client receives an access token, it must correspond to the user who initiated the current protocol session. (2)

**Completion**: There exists at least one trace in which the protocol interactions are carried out to completion in the order of steps described in Fig. 3. Authenticity is a safety property while completion is a reachability property. The input specification \(\varPhi \) consists of these two properties. Completion is essential for ruling out mappings that over-constrain the resulting implementation and prevent certain steps of the protocol from being performed.

**HTTP Platform Model**(

*Q*). Our goal was to explore and synthesize

*web-based*implementations of OAuth. For this purpose, we constructed a formal model depicting interactions between a generic HTTP server and web browser. The model contains two types of processes, Server and Browser (which may be instantiated into multiple processes representing different servers and browsers). They interact with each other over HTTP requests, which share the following signature:

*generic*,

*application-independent*HTTP interactions. In particular, each Browser process is a machine that constructs, at each communication step with Server, an arbitrary HTTP request by non-deterministically selecting a value for each parameter of the request. The processes, however, follow a

*platform-specific*logic; for instance, when given a response from Server that instructs a browser cookie to be stored at a particular URL, Browser will include this cookie along with every subsequent request directed at that URL. In addition, the model includes a process that depicts the behavior of a web attacker, who may operate their own malicious server and exploit weaknesses in a browser to manipulate the user into sending certain HTTP requests.

**Mapping Constraint** (\(C_{1.0}, C_{2.0}\)). Building a web-based implementation of OAuth involves decisions about how abstract protocol operations are to be realized in terms of HTTP requests. As an input to the synthesizer, we specified an initial set of constraints that describe partial implementation decisions for both OAuth protocols; the ones for OAuth 2.0 are shown in Fig. 4. These decisions include a designation of fixed host and path names inside URLs for various OAuth operations (e.g., http:/client.com/initiate for the OAuth initiate event), and how certain parameters are transmitted as part of an HTTP request (ret_session as a return cookie in initiate). It is reasonable to treat these constraints as given, since they describe decisions that are common across typical web-based OAuth implementations.

**Insecure Mapping for OAuth 2.0.** Let us now give an example of an insecure mapping that satisfies the user-given constraint in Fig. 4 but could introduce a security vulnerability into the resulting implementation. Later in Sect. 5.4, we describe how our tool can be used to synthesize a secure mapping that prevents this vulnerability.

*code*and

*session*of the abstract event label are encoded using their concrete counterparts in an HTTP request. A number of choices is available. In one possible implementation, the authorization code may be transmitted as a query parameter inside the URL, and the session as a browser cookie, as described by the following constraint expression, \(\mathcal {X}_1\):

*l*[

*i*] refers to

*i*-th element of list

*l*.

*session swapping*[39]).

### 5.4 Results

Our synthesis tool was able to generate valid mapping constraints for both OAuth protocols. In particular, the constraints describe mitigations against attacks that exploit an interaction between the OAuth logic and security vulnerabilities in a web browser.

**OAuth 2.0.** The synthesized symbolic mapping constraint for OAuth 2.0 consists of 39 conjuncts in total, each capturing a (conditional) assignment of a concrete HTTP parameter to a constant (e.g., *b*.*url*.*path* \(=\) \(\textsf {\small {forward}}\)) or an abstract OAuth parameter (e.g., *b*.*url*.*queries*[0] \(=\) *a*.*code*). In particular, the constraint captures mitigations against *session swapping* [39] and *covert redirect* [16]. Due to limited space, we omit the full constraint, but instead describe how the vulnerability described at the end of Sect. 5.3 can be mitigated by our synthesized mapping.

^{3}. In particular, \(\mathcal {X}_2\) stipulates that every forward request must include an additional value (nonce) as an argument besides the code and the session, and that this nonce be unique for each session value. \(\mathcal {X}_2\) ensures that the resulting implementation satisfies the desired properties of OAuth 2.

**OAuth 1.0.** The synthesized symbolic mapping constraint for OAuth 1.0 consists of 48 conjuncts in total, capturing how the abstract parameters of the five OAuth 1.0 operations are related to concrete HTTP parameters. The constraint synthesized by our tool for OAuth 1.0 encodes a mitigation against the *session fixation* [15] attack; in short, this mitigation involves strengthening the notify operation with unique nonces (similar to the way the forward operation in OAuth 2.0 was fixed above) to prevent the attacker from violating the authenticity property.

**Performance.**Figure 5 shows experimental results for the two OAuth protocols

^{4}. Overall, the synthesizer took approximately 17.6 and 24.0 min to synthesize the constraints for 1.0 and 2.0, respectively. In both cases, the tool spent a considerable amount of time on the generalization step to learn the invalid regions of the search space. Note that generalization is effective at identifying and discarding a very large number of invalid candidates; it was able to skip 2184 out of 2465 candidates for OAuth 1.0 (\(\approx \)88.6%) and 1292 out of 1453 for OAuth 2.0 (\(\approx \)88.9%). Our generalization technique was particularly effective for the OAuth protocols, since a significant percentage of the candidate constraints would result in an implementation that violates the completion property (i.e., it prevents Aliceor Eve from completing a protocol session in an expected order). Often, the decisions contributing to this violation could be localized to a small subset of entries in a mapping (for example, attempting to send a cookie to a mismatched URL, which is inconsistent with the behavior of the browser process). By identifying this subset, our algorithm was able to discover and eliminate a large number of invalid mappings.

## 6 Related Work

Our approach has been inspired by the success of recent synthesis paradigms such as *sketching* [36, 37, 38], *oracle-guided synthesis* [26] and *syntax-guided synthesis* [3]. Our technique shares many similarities with these approaches in that (1) it allows the user to provide a partial specification of the artifact to be synthesized (in the form of constraints or examples), therefore having the underlying engine *complete* the remaining parts; (2) it relies on an interaction between the verifier, which checks candidate solutions, and the synthesizer, which prunes that search space based on previous invalid candidates. Our work also differs in a number of aspects. First, we synthesize mappings from high-level models to low-level execution platforms, which to our knowledge has not been considered before. Second, our approach leverages constraint generalization to not only prune the search space, but also to produce a constraint capturing a (locally) maximal set of valid mappings. Third, our application domain is in security protocols.

A large body of literature exists on *refinement-based* methods to system construction [4, 20]. These approaches involve building an implementation *Q* that is a behavioral refinement of *P*; such *Q*, by construction, would satisfy the properties of *P*. In comparison, we start with an assumption that *Q* is a *given* platform, and that the developer may not have the luxury of being able to modify or build *Q* from scratch. Thus, instead of behavioral refinement (which may be too challenging to achieve), we aim to preserve some critical property \(\phi \) when *P* is implemented using *Q*.

The task of synthesizing a valid mapping can be seen as a type of the *model merging* problem [8]. This problem has been studied in various contexts, including architectural views [31], behavioral models [6, 32, 40], and database schemas [34]. Among these, our work is most closely related to merging of partial behavioral models [6, 40]. In these works, given a pair of models \(M_1\) and \(M_2\), the goal is to construct \(M'\) that is a behavioral refinement of both \(M_1\) and \(M_2\). The approach proposed in this paper differs in that (1) the mapping composition involves merging a pair of events with distinct alphabet labels into a single event that retains all of those labels, and (2) the composed process \((P \Vert _mQ)\) need not be a behavioral refinement of *P* or *Q*, as long as it satisfies property \(\phi \).

Bhargavan and his colleagues presents a compiler that takes a high-level program written using *session types* [22] and automatically generates a low-level implementation [7]. This technique is closer to compilation than to synthesis in that it uses a fixed translation scheme from high-level to low-level operations in a specific language environment (.NET), without searching a space of possible translations. Synthesizing a low-level implementation from a high-level specification has also been studied in the context of data structures [18, 19], although their underlying representation (relational algebra for data schema specification) is very different from ours (process algebra).

A significant contribution of our work is the production of formal models for real-world protocols such as OAuth and HTTP. There have been similar efforts by other researchers in building reusable models of the web for security analysis [1, 5, 14]. As far as we know, however, none of these models has been used for synthesis.

## 7 Conclusions

In this paper, we have proposed a novel system design methodology centered around the notion of *mappings*. We have presented novel *mapping synthesis problems* and an algorithm for efficiently synthesizing symbolic maximal valid mappings. In addition, we have validated our approach on realistic case studies involving the OAuth protocols.

Future directions include performance improvements (e.g., exploiting the fact that our generalization-based algorithm is easily parallelizable), combining our generalization-based synthesis method with a counter-example guided approach, and application of our synthesis approach to other domains beside security (e.g., platform-based design and embedded systems [35]).

## Footnotes

- 1.
A formalization of this property is provided later in this section.

- 2.
The tool, along with the models used in our case studies, is available at https://github.com/eskang/MappingSynthesisTool.

- 3.
A nonce is a unique piece of string intended to be used once in communication.

- 4.
The experiments were performed on a Mac OS X 2.7 GHz laptop with 8G RAM and MiniSat [13] as the underlying SAT solver employed by the Alloy Analyzer.

## References

- 1.Akhawe, D., Barth, A., Lam, P.E., Mitchell, J.C., Song, D.: Towards a formal foundation of web security. In: CSF, pp. 290–304 (2010)Google Scholar
- 2.Alpern, B., Schneider, F.B.: Defining liveness. Inf. Process. Lett.
**21**(4), 181–185 (1985)MathSciNetCrossRefGoogle Scholar - 3.Alur, R., et al.: Syntax-guided synthesis. In: FMCAD, pp. 1–8 (2013)Google Scholar
- 4.Back, R.-J.: A calculus of refinements for program derivations. Acta Inf.
**25**(6), 593–624 (1988)MathSciNetCrossRefGoogle Scholar - 5.Bansal, C., Bhargavan, K., Maffeis, S.: Discovering concrete attacks on website authorization by formal analysis. In: CSF, pp. 247–262 (2012)Google Scholar
- 6.Ben-David, S., Chechik, M., Uchitel, S.: Merging partial behaviour models with different vocabularies. In: CONCUR, pp. 91–105 (2013)Google Scholar
- 7.Bhargavan, K., Corin, R., Deniélou, P.-M., Fournet, C., Leifer, J.J.: Cryptographic protocol synthesis and verification for multiparty sessions. In: CSF, pp. 124–140 (2009)Google Scholar
- 8.Brunet, G., Chechik, M., Easterbrook, S., Nejati, S., Niu, N., Sabetzadeh, M.: A manifesto for model merging. In: Proceedings of the 2006 International Workshop on Global Integrated Model Management, pp. 5–12 (2006)Google Scholar
- 9.Chari, S., Jutla, C.S., Roy, A.: Universally Composable Security Analysis of OAuth v2.0. IACR Cryptology ePrint Archive, 2011, 526 (2011)Google Scholar
- 10.Chen, E.Y., Pei, Y., Chen, S., Tian, Y., Kotcher, R., Tague, P.: OAuth Demystified for Mobile Application Developers. In: CCS, pp. 892–903 (2014)Google Scholar
- 11.de Moura, L., Bjørner, N.: Z3: an efficient SMT solver. In: Ramakrishnan, C.R., Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 337–340. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-78800-3_24CrossRefGoogle Scholar
- 12.Dutertre, B.: Yices 2.2. In: CAV, pp. 737–744 (2014)Google Scholar
- 13.Eén, N., Sörensson, N.: An extensible sat-solver. In: SAT, pp. 502–518 (2003)Google Scholar
- 14.Fett, D., Küsters, R., Schmitz, G.: An expressive model for the web infrastructure: definition and application to the browser ID SSO system. In: 2014 IEEE Symposium on Security and Privacy, SP 2014, Berkeley, 18–21 May 2014, pp. 673–688 (2014)Google Scholar
- 15.OAuth Working Group. OAuth Security Advisory: 2009.1 “Session Fixation”. https://oauth.net/advisories/2009-1 (2009)
- 16.OAuth Working Group. OAuth Security Advisory: 2014.1 “Covert Redirect”. https://oauth.net/advisories/2014-1-covert-redirect (2014)
- 17.Hanmer. E.: OAuth 2.0 and the Road to Hell. https://hueniverse.com/2012/07/26/oauth-2-0-and-the-road-to-hell (2012)
- 18.Hawkins, P., Aiken, A., Fisher, K., Rinard, M.C., Sagiv, M.: Data representation synthesis. In: PLDI, pp. 38–49 (2011)Google Scholar
- 19.Hawkins, P., Aiken, A., Fisher, K., Rinard, M.C., Sagiv, M.: Concurrent data representation synthesis. In: PLDI, pp. 417–428 (2012)Google Scholar
- 20.Hoare, C.A.R.: Proof of correctness of data representations. Acta Inf.
**1**, 271–281 (1972)CrossRefGoogle Scholar - 21.Hoare, C.A.R.: Communicating sequential processes. Commun. ACM
**21**(8), 666–677 (1978)CrossRefGoogle Scholar - 22.Honda, K., Yoshida, N., Carbone, M.: Multiparty asynchronous session types. In: POPL, pp. 273–284 (2008)Google Scholar
- 23.Internet Engineering Task Force. The OAuth 1.0 Protocol. https://tools.ietf.org/html/rfc5849 (2010)
- 24.Internet Engineering Task Force. OAuth Authorization Framework. http://tools.ietf.org/html/rfc6749 (2014)
- 25.Jackson, D.: Software Abstractions: Logic, Language, and Analysis. MIT Press, Cambridge (2006)Google Scholar
- 26.Jha, S., Gulwani, S., Seshia, S.A., Tiwari, A.: Oracle-guided component-based program synthesis. In: ICSE, pp. 215–224 (2010)Google Scholar
- 27.Kang, E., Lafortune, S., Tripakis, S.: Synthesis of property-preserving mappings. CoRR, abs/1705.03618 (2017)Google Scholar
- 28.Kang, E., Milicevic, A., Jackson, D.: Multi-representational security analysis. In: FSE (2016)Google Scholar
- 29.Lamport, L.: Proving the correctness of multiprocess programs. IEEE Trans. Software Eng.
**3**(2), 125–143 (1977)MathSciNetCrossRefGoogle Scholar - 30.Lampson, B.W.: A note on the confinement problem. Commun. ACM
**16**(10), 613–615 (1973)CrossRefGoogle Scholar - 31.Maoz, S., Ringert, J.O., Rumpe, B.: Synthesis of component and connector models from crosscutting structural views. In: ESEC/FSE, pp. 444–454 (2013)Google Scholar
- 32.Nejati, S., Sabetzadeh, M., Chechik, M., Easterbrook, S.M., Zave, P.: Matching and merging of statecharts specifications. In: ICSE, pp. 54–64 (2007)Google Scholar
- 33.Pai, S., Sharma, Y., Kumar, S., Pai, R.M., Singh, S.: Formal verification of OAuth 2.0 using Alloy framework. In: Communication Systems and Network Technologies (CSNT), pp. 655–659. IEEE (2011)Google Scholar
- 34.Pottinger, R., Bernstein, P.A.: Merging models based on given correspondences. In: VLDB, pp. 826–873 (2003)Google Scholar
- 35.Sangiovanni-Vincentelli, A.L., Martin, G.: Platform-based design and software design methodology for embedded systems. IEEE Design Test Comput.
**18**(6), 23–33 (2001)CrossRefGoogle Scholar - 36.Solar-Lezama, A.: The sketching approach to program synthesis. In: Hu, Z. (ed.) APLAS 2009. LNCS, vol. 5904, pp. 4–13. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-10672-9_3CrossRefGoogle Scholar
- 37.Solar-Lezama, A., Rabbah, R., Bodík, R., Ebcioğlu, K.: Programming by sketching for bit-streaming programs. In: Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2005, pp. 281–294. ACM (2005)Google Scholar
- 38.Solar-Lezama, A., Tancau, L., Bodík, R., Seshia, S.A., Saraswat, V.A.: Combinatorial sketching for finite programs. In: ASPLOS, pp. 404–415 (2006)Google Scholar
- 39.Sun, S.-T., Beznosov, K.: The devil is in the (implementation) details: an empirical analysis of OAuth SSO systems. In: CCS, pp. 378–390 (2012)Google Scholar
- 40.Uchitel, S., Chechik, M.: Merging partial behavioural models. In: SIGSOFT FSE, pp. 43–52 (2004)Google Scholar
- 41.Wang, R., Chen, S., Wang, X.: Signing me onto your accounts through facebook and google: a traffic-guided security study of commercially deployed single-sign-on web services. In: IEEE Symposium on Security and Privacy, pp. 365–379 (2012)Google Scholar
- 42.Xu, X., Niu, L., Meng, B.: Automatic verification of security properties of OAuth 2.0 protocol with cryptoverif in computational model. Inf. Technol. J.
**12**(12), 2273 (2013)CrossRefGoogle Scholar

## Copyright information

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.