# White-Box Security Notions for Symmetric Encryption Schemes

## Abstract

White-box cryptography has attracted a growing interest from researchers in the last decade. Several white-box implementations of standard block-ciphers (DES, AES) have been proposed but they have all been broken. On the other hand, neither evidence of existence nor proofs of impossibility have been provided for this particular setting. This might be in part because it is still quite unclear what white-box cryptography really aims to achieve and which security properties are expected from white-box programs in applications. This paper builds a first step towards a practical answer to this question by translating folklore intuitions behind white-box cryptography into concrete security notions. Specifically, we introduce the notion of white-box compiler that turns a symmetric encryption scheme into randomized white-box programs, and we capture several desired security properties such as one-wayness, incompressibility and traceability for white-box programs. We also give concrete examples of white-box compilers that already achieve some of these notions. Overall, our results open new perspectives on the design of white-box programs that securely implement symmetric encryption.

## Keywords

White-box cryptography · Security notions · Attack models · Security games · Traitor tracing

## 1 Introduction

Traditionally, to prove the security of a cryptosystem, cryptographers consider attack scenarios where an adversary is only given *black-box* access to the cryptographic system, namely to the inputs and outputs of its underlying algorithms. Security notions are built on the standard paradigm that the algorithms are known and that computing platforms can be trusted to effectively protect the secrecy of the private key.

However, attacks on *implementations* of cryptographic primitives have become a major threat due to side-channel information leakage (see for example [18, 28]) such as execution time, power consumption or electromagnetic emanations. More generally, the increasing penetration of cryptographic applications onto untrusted platforms (the end points being possibly controlled by a malicious party) makes the black-box model too restrictive to guarantee the security of *programs* implementing cryptographic primitives.

White-box cryptography was introduced in 2002 by Chow, Eisen, Johnson and van Oorschot [10, 11] as the ultimate, *worst-case* attack model. This model considers an attacker far more powerful than in the classical black-box model (and thus more representative of real-world attackers); namely the attacker is given full knowledge and full control on both the algorithm and its execution environment. However, even such powerful capabilities should not allow her to e.g. extract the embedded key^{1}. White-box cryptography can hence be seen as a restriction of general obfuscation where the function to protect belongs to some narrower class of cryptographic functions indexed by a secret key. From that angle, the ultimate goal of a white-box implementation is to leak nothing more than what a black-box access to the function would reveal. An implementation achieving this strong property would be as secure as in the black-box model, in particular it would resist *all existing and future* side-channel and fault-based attacks. Although we know that general obfuscation of any function is impossible to achieve [1], there is no known impossibility result for white-box cryptography and positive examples have even been discovered [7, 15]. On the other hand, the work of Chow *et al.* gave rise to several proposals for white-box implementations of symmetric ciphers, specifically DES [10, 21, 32] and AES [6, 11, 19, 33], even though all these proposals have been broken [3, 13, 16, 20, 22, 23, 24, 31].

Our belief is that the dearth of promising white-box implementations is also a consequence of the absence of well-understood security goals to achieve. A first step towards a theoretical model was proposed by Saxena, Wyseur and Preneel [29], and subsequently extended by Wyseur in his PhD thesis [30]. These results show how to translate any security notion in the black-box model into a security notion in the white-box model. They introduce the *white-box property* for an obfuscator as the ability to turn a program (modeled as a polynomial Turing machine) which is secure with respect to some black-box notion into a program secure with respect to the corresponding white-box notion. The authors then give an example of obfuscator for a symmetric encryption scheme achieving the white-box equivalent of semantic security. In other words, the symmetric encryption scheme is turned into a secure asymmetric encryption scheme. While these advances describe a generic model to translate a given notion from the black-box to the white-box setting, our aim in this paper is to define explicit security notions that white-box cryptography should realize in practice. As a matter of fact, some of our security notions are not black-box notions that one would wish to preserve in the white-box setting, but arise from new features potentially introduced by the white-box compilation. Note that although we use a different formalism and pursue different goals, our work and those in [29, 30] are not in contradiction but rather co-exist in a wider framework.

**Our Contributions.** We formalize the notion of *white-box compilers* for a symmetric encryption scheme and introduce several security notions for such compilers. As traditionally done in provable security (e.g. [2]), we consider separately various adversarial goals (e.g. decrypt some ciphertext) and attack models (e.g. chosen ciphertext attack), and then obtain distinct security definitions by pairing a particular goal with a particular attack model. We consider four different attack models in the white-box context: the chosen plaintext attack, the chosen ciphertext attack, the recompilation attack and the chosen ciphertext and recompilation attack. We formalize the main security objective of white-box cryptography which is to protect the secret key as a notion of *unbreakability*. We show that additional security notions should be considered in applications and translate folklore intuitions behind white-box cryptography into concrete security notions; namely the *one-wayness*, *incompressibility* and *traceability* of white-box programs. For the first two notions, we show an example of a simple symmetric encryption scheme over an RSA group for which an efficient white-box compiler exists that provably achieves both notions. We finally show that white-box programs are efficiently traceable by simple means assuming that functional perturbations can be hidden in them. Overall, our positive results shed more light on the different aspects of white-box security and provide concrete constructions that achieve them in a provable fashion.

## 2 Preliminaries

**Symmetric Encryption.** A symmetric encryption scheme is a tuple \({\mathcal {E}}= \left( {\mathsf {K}}, {\mathsf {M}}, {\mathsf {C}}, K, E, D\right) \) where \({\mathsf {K}}\) is the key space, \({\mathsf {M}}\) is the plaintext (or message) space, \({\mathsf {C}}\) is the ciphertext space, \(K\) is a probabilistic algorithm that returns a key \(k\in {\mathsf {K}} = {\mathsf {range}}\left( {K()}\right) \), \(E\) is a deterministic encryption function mapping elements of \({\mathsf {K}}\times {\mathsf {M}}\) to elements of \({\mathsf {C}}\), \(D\) is a deterministic decryption function mapping elements of \({\mathsf {K}}\times {\mathsf {C}}\) to elements of \({\mathsf {M}}\).

We require that for any \(k\in {\mathsf {K}}\) and any \(m\in {\mathsf {M}}\), \(D(k, E(k, m)) = m\). Most typically, \({\mathcal {E}}\) refers to a block-cipher in which case all sets are made of binary strings of determined length and \({\mathsf {C}} = {\mathsf {M}}\).
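As an illustration of this syntax, the following Python sketch models a scheme as a triple of callables and checks the correctness condition \(D(k, E(k, m)) = m\) on a toy example. The `Scheme` container and the XOR-based toy cipher are illustrative assumptions, not constructions from this paper, and the toy cipher offers no security whatsoever.

```python
import secrets
from dataclasses import dataclass
from typing import Callable, Iterable

BLOCK_BITS = 16  # toy block size (assumption)


@dataclass(frozen=True)
class Scheme:
    """Hypothetical container for the algorithms (K, E, D) of a scheme."""
    keygen: Callable[[], int]            # probabilistic K()
    encrypt: Callable[[int, int], int]   # deterministic E(k, m)
    decrypt: Callable[[int, int], int]   # deterministic D(k, c)


def toy_keygen() -> int:
    return secrets.randbits(BLOCK_BITS)


def toy_encrypt(k: int, m: int) -> int:
    return m ^ k  # insecure stand-in for a block cipher


def toy_decrypt(k: int, c: int) -> int:
    return c ^ k


TOY = Scheme(toy_keygen, toy_encrypt, toy_decrypt)


def is_correct(scheme: Scheme, k: int, messages: Iterable[int]) -> bool:
    """Check D(k, E(k, m)) == m on the given messages."""
    return all(scheme.decrypt(k, scheme.encrypt(k, m)) == m for m in messages)
```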

**Programs.** A program is a word in the language-theoretic sense and is interpreted in the explicit context of a programming model and an execution model, the details of which we want to keep as abstracted away as possible. Programs differ from remote oracles in the sense that their code can be executed locally, read, copied and modified at will. Successive executions are inherently stateless and all the “system calls” that a program makes to external resources such as a random source or a system clock can be captured and responded arbitrarily. Execution can be interrupted at any moment and all the internal variables identified by the program’s instructions can be read and modified arbitrarily by the party that executes the program.

**Other Notations.** If \({\mathsf {A}}\) is some set, \(|{\mathsf {A}}|\) denotes its cardinality. If \(\mathbb A\) is some generator i.e. a random source with some prescribed output range \({\mathsf {A}}\), \(H(\mathbb A)\) denotes the output entropy of \(\mathbb A\) as a source. Abusing notations, we may also denote it by \(H(a)\) for \(a\leftarrow \mathbb A(\cdots )\). Finally, when we write \({\mathcal {O}}(\cdot )=\epsilon \), we mean that \({\mathcal {O}}\) is the oracle which, on any input, returns the empty string \(\epsilon \).

## 3 White-Box Compilers

In this paper, we consider that a *white-box implementation* of the scheme \({\mathcal {E}}\) is a program produced by a publicly known compiling function \({\mathbf {C}}_{{\mathcal {E}}}\) which takes as arguments a key \(k\in {\mathsf {K}}\) and possibly a diversifying nonce \(r\in {\mathsf {R}}\) drawn from some randomness space \({\mathsf {R}}\). We will denote the compiled program by \([E_{{k}}^{{r}}]\) (or \([E_{{k}}]\) when the random nonce \(r\) is implicit or does not exist), namely \([E_{{k}}^{{r}}] = {\mathbf {C}}_{{\mathcal {E}}}(k,r)\).

A compiler \({\mathbf {C}}_{{\mathcal {E}}}\) for \({\mathcal {E}}\) is *sound* when for any \((k,r)\in {\mathsf {K}}\times {\mathsf {R}}\), \([E_{{k}}^{{r}}]\) exactly implements the function \(E(k,\cdot )\) (i.e. it is fully functional). Therefore \([E_{{k}}^{{r}}]\) accepts as input any \(m\in {\mathsf {M}}\) and always returns the correct encryption \(c = E(k,m)\). At this stage, we only care about sound compilers.

### *Remark 1*

In the above definition, we consider white-box compilers for the encryption function. However, since we focus on deterministic encryption, with \(E(k,\cdot )\) and \(D(k,\cdot )\) being inverses of one another, we can swap their roles without loss of generality and get compilers for the decryption procedure. This is precisely what we do in Sect. 7.

Note again that \([E_{{k}}]\) differs in nature from \(E(k,\cdot )\). \(E(k, \cdot )\) is a mapping from \({\mathsf {M}}\) to \({\mathsf {C}}\), whereas \([E_{{k}}]\) is a word in some programming language (the details of which we want to abstract away) and has to fulfill some semantic consistency rules. Viewed as a binary string, it has a certain bitsize \({\mathsf {size}}\left( {[E_{{k}}]}\right) \in {\mathbb {N}}\). Even though \(E(k,\cdot )\) is deterministic, nothing forbids \([E_{{k}}]\) from collecting extra bits from a random tape and behaving probabilistically. For an input \(m\in {\mathsf {M}}\) and random tape \(\rho \in \{0,1\}^*\), \([E_{{k}}](m, \rho )\) takes a certain time \({\mathsf {time}}\left( {[E_{{k}}](m, \rho )}\right) \in {\mathbb {N}}\) to complete execution.
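To make the distinction between the mapping \(E(k,\cdot )\) and the word \([E_{{k}}^{{r}}]\) concrete, here is a minimal Python sketch in which the compiled program is literally a source string. The XOR cipher, the way the nonce splits the key into two masks, and the helper names `compile_program`, `load`, `is_sound` and `size_in_bits` are all assumptions made for illustration; this toy compiler is sound but trivially breakable.

```python
BLOCK_BITS = 16  # toy block size (assumption)


def encrypt(k: int, m: int) -> int:
    """Toy E(k, m): an insecure stand-in for a block cipher."""
    return m ^ k


def compile_program(k: int, r: int) -> str:
    """Toy compiler C(k, r): returns the program as source text (a word)."""
    a, b = k ^ r, r  # the nonce splits the key into two masks with a ^ b == k
    return f"def program(m):\n    return (m ^ {a}) ^ {b}\n"


def load(source: str):
    """Interpret the word in an execution model (here: the Python interpreter)."""
    namespace = {}
    exec(source, namespace)
    return namespace["program"]


def is_sound(k: int, r: int, messages) -> bool:
    """Soundness check: the program must agree with E(k, .) on every message."""
    program = load(compile_program(k, r))
    return all(program(m) == encrypt(k, m) for m in messages)


def size_in_bits(source: str) -> int:
    """size([E_k]) when the program is viewed as a binary string."""
    return 8 * len(source.encode("utf-8"))
```

Two nonces yield two different words that implement the same function, which is the sense in which the nonce diversifies the compiled programs.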

### 3.1 Attack Models

The first step in specifying new security notions for white-box cryptography is to classify the threats. This section introduces four distinct attack models for an adversary \({\mathcal {A}}\) in the white-box model: the *chosen plaintext attack* (\({\mathsf {CPA}}\)), the *chosen ciphertext attack* (\({\mathsf {CCA}}\)), the *recompilation attack* (\({\mathsf {RCA}}\)) and the *chosen ciphertext and recompilation attack* (\({\mathsf {CCA}}\)+\({\mathsf {RCA}}\)). In all of these, we assume that the compiler \({\mathbf {C}}_{{\mathcal {E}}}\) is public, i.e. at any point in time, the adversary \({\mathcal {A}}\) can select any key \(k\in {\mathsf {K}}\) and nonce \(r\in {\mathsf {R}}\) of her choosing and generate a white-box implementation \([E_{{k}}^{{r}}] = {\mathbf {C}}_{{\mathcal {E}}}(k, r)\) by herself.

In a *chosen plaintext attack* (\({\mathsf {CPA}}\)) the adversary can encrypt plaintexts of her choice under \(E(k,\cdot )\). Indeed, even though the encryption scheme \({\mathcal {E}}\) is a symmetric primitive, the attacks are defined with respect to the compiler that generates white-box programs implementing \(E(k,\cdot )\): given any one of these programs, the adversary can always evaluate it on arbitrary plaintexts at will. So clearly, chosen plaintext attacks cannot be avoided, very much like in the public-key encryption setting.

In a *chosen ciphertext attack* (\({\mathsf {CCA}}\)), in addition to the challenge white-box implementation \([E_{{k}}^{{r}}]\), we give \({\mathcal {A}}\) access to a decryption oracle \(D(k, \cdot )\), i.e. she can send decryption queries \(c_1,\dots ,c_q\in {\mathsf {C}}\) adaptively to the oracle and be returned the corresponding plaintexts \(m_1,\dots ,m_q\in {\mathsf {M}}\) where \(m_i = D(k,c_i)\). Notice that this attack includes the \({\mathsf {CPA}}\) attack when \(q=0\).

In a *recompilation attack* (\({\mathsf {RCA}}\)), in addition to the challenge white-box implementation \([E_{{k}}^{{r}}]\), we give \({\mathcal {A}}\) access to a recompiling oracle \({\mathbf {C}}_{{\mathcal {E}}}(k,{\mathsf {R}})\) that generates other programs \([E_{{k}}^{{r'}}]\) with key \(k\) for adversarially unknown random nonces \(r'\overset{\scriptscriptstyle {\$}}{\leftarrow }{\mathsf {R}}\). In other words, we give \({\mathcal {A}}\) the ability to observe other programs compiled with the same key and different nonces.

In a *chosen ciphertext and recompilation attack* (\({\mathsf {CCA}}\)+\({\mathsf {RCA}}\)) we give \({\mathcal {A}}\) (the challenge white-box implementation \([E_{{k}}^{{r}}]\) and) simultaneous access to a decryption oracle \(D(k, \cdot )\) and a recompiling oracle \({\mathbf {C}}_{{\mathcal {E}}}(k,{\mathsf {R}})\), both parametrized with the same key \(k\).
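The four attack models differ only in the oracles handed to the adversary alongside the challenge program. The following Python sketch, under the assumption that oracles are represented as a dictionary of closures over the secret key, shows one way to express this; the function name `make_oracles` and its parameters are hypothetical.

```python
ATTACK_MODELS = ("CPA", "CCA", "RCA", "CCA+RCA")


def make_oracles(atk, k, decrypt, compile_program, sample_nonce):
    """Return the oracles available to the adversary in attack model `atk`.

    `decrypt(k, c)`, `compile_program(k, r)` and `sample_nonce()` are supplied
    by the caller; the secret key k never leaves the closures.
    """
    if atk not in ATTACK_MODELS:
        raise ValueError(f"unknown attack model: {atk!r}")
    oracles = {}
    if atk in ("CCA", "CCA+RCA"):
        # decryption oracle D(k, .)
        oracles["decrypt"] = lambda c: decrypt(k, c)
    if atk in ("RCA", "CCA+RCA"):
        # recompiling oracle C(k, R): fresh nonce unknown to the adversary
        oracles["recompile"] = lambda: compile_program(k, sample_nonce())
    return oracles  # empty for CPA: the challenge program alone suffices
```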

### *Remark 2*

We emphasize that the recompilation attack model is *not* artificial when dealing with white-box cryptography. Indeed, it seems reasonable to assume that user-related values can be embedded in the random nonce \(r\in {\mathsf {R}}\) used to compile a (user-specific) white-box implementation. Thus a coalition of malicious users can be modeled as a single adversary with (possibly limited) access to a recompiling oracle producing white-box implementations under fresh random nonces \(r'\in {\mathsf {R}}\).

### 3.2 The Prime Goal: Unbreakability

Chow *et al.* stated in [10, 11] that the first security objective of white-box cryptography is, given a program \([E_{{k}}]\), to preserve the privacy of the key \(k\) embedded in the program (see also [17, Q1] and [30, Definition 2]). We define the following game to capture that intuition:

1. randomly generate a key \(k\leftarrow K()\) and a nonce \(r\overset{\scriptscriptstyle {\$}}{\leftarrow }{\mathsf {R}}\),
2. the adversary \({\mathcal {A}}\) is run on input \([E_{{k}}^{{r}}]= {\mathbf {C}}_{{\mathcal {E}}}(k,r)\),
3. \({\mathcal {A}}\) returns a guess \(\hat{k}\in {\mathsf {K}}\),
4. \({\mathcal {A}}\) succeeds if \(\hat{k} = k\).
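The four steps of this game can be sketched in Python as follows, in the chosen plaintext setting (no extra oracle). The XOR cipher and the toy compiler are assumptions chosen only so that the example runs; the adversary `evaluate_at_zero` shows that this particular toy compiler is not unbreakable, since a single evaluation of the program already reveals the key.

```python
import secrets

BLOCK_BITS = 16  # toy block size (assumption)


def keygen() -> int:
    return secrets.randbits(BLOCK_BITS)


def compile_program(k: int, r: int):
    """Toy compiler: splits k into two nonce-dependent masks (insecure)."""
    a, b = k ^ r, r
    return lambda m: (m ^ a) ^ b


def ubk_cpa_game(adversary) -> bool:
    """Run one instance of the unbreakability game; True if the adversary wins."""
    k = keygen()                       # step 1: random key ...
    r = secrets.randbits(BLOCK_BITS)   # ... and random nonce
    program = compile_program(k, r)    # step 2: challenge program
    guess = adversary(program)         # step 3: adversary outputs a guess
    return guess == k                  # step 4: success iff the guess is k


def evaluate_at_zero(program) -> int:
    """Breaks the toy compiler: E(k, 0) = 0 XOR k = k."""
    return program(0)
```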

Let us define more concisely and precisely the notion of unbreakability with respect to the attack model \({\mathsf {ATK}}\) (\({\mathsf {CPA}}\), \({\mathsf {CCA}}\), \({\mathsf {RCA}}\) or \({\mathsf {CCA}}\)+\({\mathsf {RCA}}\)).

### **Definition 1 (Unbreakability)**

Note that in our setting, a total break requires the adversary to output the whole key \(k\) embedded into \([E_{{k}}^{{r}}]\). Basing \({\mathsf {UBK}}\) on the semantic security of \(k\) makes no sense here since it is straightforward to ascertain, for some guess \(\hat{k}\), that \(\hat{k} = k\) by just checking whether the value returned by \([E_{{k}}^{{r}}](m)\) is equal to \(E(\hat{k}, m)\) for sufficiently many plaintext(s) \(m\in {\mathsf {M}}\). In other words, the distributions \(\{k, [E_{{k}}^{{r}}]\}_{k\in {\mathsf {K}},r\in {\mathsf {R}}}\) and \(\{k',[E_{{k}}^{{r}}]\}_{(k,k')\in {\mathsf {K}}^2,r\in {\mathsf {R}}}\) are computationally distinguishable. As a result, one cannot prevent some information leakage about \(k\) from \([E_{{k}}^{{r}}]\), whatever the specification of the compiler \({\mathbf {C}}_{{\mathcal {E}}}\).

### *Remark 3*

Although not required in the above definition, for a white-box compiler to be cryptographically sound, one would require that there exist some security parameter \(\lambda \) such that \(\varepsilon /\tau \) be exponentially small in \(\lambda \) and \({\mathsf {size}}\left( {[E_{{k}}]}\right) \) and \({\mathsf {time}}\left( {[E_{{k}}](\cdot )}\right) \) be polynomial in \(\lambda \). In other words, one aims to get a negligible \(\varepsilon /\tau \) while keeping \({\mathsf {size}}\left( {[E_{{k}}]}\right) \) and \({\mathsf {time}}\left( {[E_{{k}}](\cdot )}\right) \) reasonable.

### 3.3 Security Notions Really Needed in Applications

When satisfied, unbreakability ensures that an adversary cannot extract the secret key of a randomly generated white-box implementation. Therefore any party has to execute the program rather than simulate it with the secret key. While this property is the very least that can be expected from white-box cryptography, it is rather useless on its own. Indeed, knowing the white-box program amounts to knowing the key in some sense since it allows one to process the encryption without restriction. As discussed in [30, Sect. 3.1.3], an attacker only needs to isolate the cryptographic code in the implementation. This is a common threat in DRM applications, which is known as *code lifting*. Although some countermeasures can make code lifting a tedious task, it is reasonable to assume that a motivated attacker would eventually recover the cryptographic code. That is why, in order to make the white-box compilation useful, the availability of the white-box program should restrict the adversary's capabilities compared to the availability of the secret key.

**One-Wayness.** A natural restriction is that although the white-box implementation allows one to encrypt at will, it should not enable decryption. In other words, it should be difficult to invert the program computations. In that case, the program is said to be *one-way*, to keep consistency with the notion of one-wayness (for a function or a cryptosystem) traditionally used in cryptography. As already noted in [17], a white-box compiler achieving one-wayness is of great interest as it turns a symmetric encryption scheme into a public-key encryption scheme. This is also one of the many motivations to design methods for general obfuscation [1, 14].

**Incompressibility of Programs.** Another argument often heard in favor of white-box cryptography is that a white-box program is less convenient to store and exchange than a mere secret key due to its bigger size. As formulated in [30, Sect. 3.1.3], white-box cryptography allows one to “hide a key in an even bigger key”. For instance, the implementation of AES by Chow *et al.* [11] makes use of 800 KB of look-up tables, which represents a significant overhead compared to a \(128\)-bit key. Suppose this implementation were unbreakable in the sense of Definition 1 (which we know to be false [3]); the question that would then arise is: what is the computationally achievable minimum size of a program functionally equivalent to this implementation? When a program is hard to compress beyond a certain prescribed size, we shall say that this program is *incompressible*. Section 6 shows an example of computationally incompressible programs for symmetric encryption.

**Traceability of Programs.** It is often heard that white-box compilation can provide traceability (see for instance [30, Sect. 5.5.1]). Specifically, white-box compilation should enable one to derive several functionally equivalent versions of the same encryption (or decryption) program. A typical use case for such a system is the distribution of protected digital content where every legitimate user gets a different version of some decryption software. If a malicious user shares its own program (e.g. over the Internet), then one can trace the so-called *traitor* by identifying its unique copy of the program. However, in a white-box context, a user can easily transform its version of the program while keeping the same functionality. Therefore to be effective, the tracing should be robust to such transformations, even in the case where several malicious users collude to produce an untraceable software. We show in Sect. 7 how to achieve such a robust tracing from a compiler that can *hide* functional perturbations in a white-box program. Accordingly, we define new security notions for such a white-box compiler. Combined with our tracing scheme, a compiler achieving these security notions is shown to provide traceable white-box programs.

## 4 One-Wayness

1. randomly select a key \(k\leftarrow K()\) and a nonce \(r\overset{\scriptscriptstyle {\$}}{\leftarrow }{\mathsf {R}}\),
2. generate the white-box program \([E_{{k}}^{{r}}]= {\mathbf {C}}_{{\mathcal {E}}}(k,r)\),
3. randomly select a plaintext \(m\overset{\scriptscriptstyle {\$}}{\leftarrow }{\mathsf {M}}\),
4. compute its encryption \(c = E(k,m)\),
5. the adversary \({\mathcal {A}}\) is run on inputs \([E_{{k}}^{{r}}]\) and \(c\),
6. \({\mathcal {A}}\) returns a guess \(\hat{m}\),
7. \({\mathcal {A}}\) succeeds if \(\hat{m} = m\).
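The seven steps above translate directly into a small Python harness, sketched here in the chosen plaintext setting. The XOR cipher and toy compiler are illustrative assumptions, not constructions from this paper; the adversary `invert_xor` demonstrates that this toy compiler is not one-way, because the program itself leaks enough to decrypt.

```python
import secrets

BLOCK_BITS = 16  # toy block size (assumption)


def encrypt(k: int, m: int) -> int:
    return m ^ k  # insecure stand-in for E(k, m)


def compile_program(k: int, r: int):
    """Toy compiler: splits k into two nonce-dependent masks (insecure)."""
    a, b = k ^ r, r
    return lambda m: (m ^ a) ^ b


def ow_cpa_game(adversary) -> bool:
    """Run one instance of the one-wayness game; True if the adversary wins."""
    k = secrets.randbits(BLOCK_BITS)   # step 1: key ...
    r = secrets.randbits(BLOCK_BITS)   # ... and nonce
    program = compile_program(k, r)    # step 2: white-box program
    m = secrets.randbits(BLOCK_BITS)   # step 3: random plaintext
    c = encrypt(k, m)                  # step 4: challenge ciphertext
    guess = adversary(program, c)      # steps 5 and 6
    return guess == m                  # step 7


def invert_xor(program, c: int) -> int:
    """Inverts the toy program: program(0) equals k, and m = c XOR k."""
    return c ^ program(0)
```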

Let us define more precisely the notion of one-wayness with respect to the attack model \({\mathsf {ATK}}\).

### **Definition 2 (One-Wayness)**

## 5 Incompressibility of White-Box Programs

In this section, we formalize the notion of incompressibility for a white-box compiler. What we mean by incompressibility here is the hardness, given a (large) compiled program \([E_{{k}}]\), of coming up with a significantly smaller program functionally close to \(E(k,\cdot )\). A typical example is when a content provider distributes a large encryption program (e.g. \(100\) GB or more) and wants to make sure that no smaller yet equivalent program can be redistributed by subscribers to illegitimate third parties. The content provider cannot prevent the original program from being shared e.g. over the Internet; however, if compiled programs are provably incompressible then redistribution may be somewhat discouraged by the size of transmissions.

1. randomly select \(k\leftarrow K()\) and \(r\overset{\scriptscriptstyle {\$}}{\leftarrow }{\mathsf {R}}\),
2. compile \([E_{{k}}^{{r}}]= {\mathbf {C}}_{{\mathcal {E}}}(k,r)\),
3. run \({\mathcal {A}}\) on input \([E_{{k}}^{{r}}]\),
4. \({\mathcal {A}}\) returns some program \(P\),
5. \({\mathcal {A}}\) succeeds if \(\varDelta (P, E(k,\cdot )) \le \delta \) and \({\mathsf {size}}\left( {P}\right) < \lambda \).
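A rough Python sketch of this game is given below. It rests on several assumptions that are not part of the text above: programs are represented as source strings, \({\mathsf {size}}\) is the bit length of that string, and \(\varDelta \) is taken to be the fraction of plaintexts on which \(P\) disagrees with \(E(k,\cdot )\), estimated here by random sampling rather than computed exactly. The toy compiler merely pads a short program with a nonce-dependent comment, so the adversary `strip_comments` compresses it trivially.

```python
import secrets

BLOCK_BITS = 16        # toy block size (assumption)
PAD_HEX_DIGITS = 1024  # size of the padding added by the toy compiler


def encrypt(k: int, m: int) -> int:
    return m ^ k  # insecure stand-in for E(k, m)


def compile_program(k: int, r: int) -> str:
    """Toy compiler: a short program inflated by a nonce-dependent comment."""
    pad = format(r, "x").zfill(PAD_HEX_DIGITS)
    return f"# {pad}\ndef program(m):\n    return m ^ {k}\n"


def load(source: str):
    namespace = {}
    exec(source, namespace)
    return namespace["program"]


def size_bits(source: str) -> int:
    return 8 * len(source.encode("utf-8"))


def distance(program, k: int, samples: int = 256) -> float:
    """Sampled estimate of the fraction of plaintexts encrypted incorrectly."""
    wrong = 0
    for _ in range(samples):
        m = secrets.randbits(BLOCK_BITS)
        try:
            ok = program(m) == encrypt(k, m)
        except Exception:
            ok = False  # a crashing program counts as an incorrect output
        wrong += not ok
    return wrong / samples


def inc_cpa_game(adversary, lam: int, delta: float) -> bool:
    k = secrets.randbits(BLOCK_BITS)          # step 1
    r = secrets.randbits(4 * PAD_HEX_DIGITS)
    source = compile_program(k, r)            # step 2
    candidate = adversary(source)             # steps 3 and 4
    close = distance(load(candidate), k) <= delta
    return close and size_bits(candidate) < lam  # step 5


def strip_comments(source: str) -> str:
    """Compresses the toy program by dropping its padding comment."""
    kept = [line for line in source.splitlines() if not line.startswith("#")]
    return "\n".join(kept) + "\n"
```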

### **Definition 3 (\((\lambda ,\delta )\)-Incompressibility)**

Notice that for some values of \(\lambda \) and \(\delta \), the \((\lambda ,\delta )\)-incompressibility may be trivially broken. For example, the problem is trivial for \(\delta =1\) as the user can always construct any program smaller than \(\lambda \) bits with outputs unrelated to \(E(k,\cdot )\). Even though the definition allows any \(\delta \in [0,1]\), the notion makes more sense (and surely is harder to break) when \(\delta \) is taken small enough. In that case, the adversary has to output a program which correctly encrypts nearly all plaintexts (or at least a significant fraction).

## 6 A Provably One-Way and Incompressible White-Box Compiler

In this section, we give an example of a symmetric encryption scheme for which there exists an efficient one-way and incompressible white-box compiler. This example is a symmetric-key variant of the RSA cryptosystem [27]. The one-wayness and incompressibility properties of the compiler are provably achieved based on standard hardness assumptions related to the integer factoring problem.

**One-Way Compilers from Public-Key Encryption.** It is worth noting that any *one-way public-key* encryption scheme straightforwardly gives rise to a symmetric encryption scheme for which a one-way compiler exists. The symmetric key is defined as the secret key of the asymmetric encryption scheme, and symmetric encryption first derives the public key from the secret key and then applies the asymmetric encryption procedure under that public key. The white-box compiler then simply produces a program evaluating the encryption algorithm with the public key embedded in it. The one-wayness of the compiler comes directly from the one-wayness of the asymmetric scheme. Such an example of a one-way compiler is given in [29, Theorem 3] and [30, Sect. 4.8.2].

We present hereafter another compiler obtained from the RSA cryptosystem and whose one-wayness straightforwardly holds by construction. The main interest of our example is to further satisfy \((\lambda ,0)\)-incompressibility for any arbitrary \(\lambda \). We first recall some background on RSA groups.

### 6.1 RSA Groups

*RSA group*. A typical construction for \(\mathcal {G}\) is to take the group of invertible integers modulo a composite number or a carefully chosen elliptic curve over a ring. Practical RSA groups are known to be efficiently samplable in the sense that there exists a group generation algorithm \({\mathbb {G}}\) which, given a security parameter \(n\in {\mathbb {N}}\), outputs the public description \({\mathsf {desc}}\left( {\mathcal {G}}\right) \) of a random group \(\mathcal {G}\) together with its order \(\omega \). Efficient means that the random selection

^{2} The group descriptor \({\mathsf {desc}}\left( {\mathcal {G}}\right) \) is intended to contain all the necessary parameters for performing group operations. Obviously \(\omega \) is excluded from the group description.

In the following, we shall make the usual hardness assumptions for RSA group generators. Namely, we assume that the groups sampled by \({\mathbb {G}}\) have the following properties (formal definitions for these security notions are provided in the full version of this paper [12]):

**Unbreakability –** \({\mathsf {UBK}}[{\mathbb {G}}]\):

It is hard to compute the secret order \(\omega \) of \(\mathcal {G}\) from \({\mathsf {desc}}\left( {\mathcal {G}}\right) \).

**Hardness of Extracting Orders –** \({\mathsf {ORD}}[{\mathbb {G}}]\):

It is hard to compute the order of a random group element \(x \overset{\scriptscriptstyle {\$}}{\leftarrow }\mathcal {G}\) (or a multiple thereof) from \({\mathsf {desc}}\left( {\mathcal {G}}\right) \).

**Hardness of Extracting Roots –** \({\mathsf {RSA}}[{\mathbb {G}}]\):

For a random integer \(e \in [0,\omega )\) such that \(\gcd (e,\omega ) = 1\), it is hard to compute the \(e\)-th root of a random group element \(x \in \mathcal {G}\) from \(e\) and \({\mathsf {desc}}\left( {\mathcal {G}}\right) \).

### 6.2 The White-Box Compiler

The symmetric encryption scheme \({\mathcal {E}}\) is defined as follows:

1. \({\mathcal {E}}\) makes use of a security parameter \(n\in {\mathbb {N}}\),
2. \(K()\) randomly selects a group \(({\mathsf {desc}}\left( {\mathcal {G}}\right) , \omega )\leftarrow {\mathbb {G}}(1^n)\) and a public exponent \(e \in [0,\omega )\) such that \(\gcd (e,\omega ) = 1\), and returns \(k = ({\mathsf {desc}}\left( {\mathcal {G}}\right) , \omega , e)\),
3. plaintexts and ciphertexts are group elements i.e. \({\mathsf {M}} = {\mathsf {C}} = \mathcal {G}\),
4. given a key \(k = ({\mathsf {desc}}\left( {\mathcal {G}}\right) , \omega , e)\) and a plaintext \(m\in \mathcal {G}\), \(E(k,m)\) computes \(m^{e\mod \omega }\) in the group and returns that value,
5. given a key \(k = ({\mathsf {desc}}\left( {\mathcal {G}}\right) , \omega , e)\) and a ciphertext \(c\in \mathcal {G}\), \(D(k,c)\) computes \(c^{\frac{1}{e}\mod \omega }\) in the group and returns that value.

The white-box compiler \({\mathbf {C}}_{{\mathcal {E}}}\) for \({\mathcal {E}}\) is defined as follows:

1. \({\mathbf {C}}_{{\mathcal {E}}}\) makes use of an additional security parameter \(h\in {\mathbb {N}}\),
2. the randomness space \({\mathsf {R}}\) is the integer set \([0,2^{h}/\omega )\),
3. we define the *blinded exponent* \(f\) with respect to the public exponent \(e\) and a random nonce \(r \in {\mathsf {R}}\) as the integer \(f = e + r \cdot \omega \),
4. given a key \(k = ({\mathsf {desc}}\left( {\mathcal {G}}\right) , \omega , e) \in {\mathsf {K}}\), and a random nonce \(r \in {\mathsf {R}}\), our white-box compiler \({\mathbf {C}}_{{\mathcal {E}}}\) generates a program \([E_{{k}}]\) which simply embeds \({\mathsf {desc}}\left( {\mathcal {G}}\right) \) and \(f\) and computes \(m^{f}\) for any input \(m\in \mathcal {G}\).
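The scheme and its compiler can be exercised with a short Python sketch. It instantiates \(\mathcal {G}\) as the group of invertible integers modulo \(N = pq\), so that \(\omega = (p-1)(q-1)\) and \({\mathsf {desc}}\left( {\mathcal {G}}\right) \) is simply \(N\). The function names and the tiny hard-coded primes are assumptions made for readability; such parameters provide no security and a real instantiation would sample large random primes.

```python
import math
import secrets


def keygen(p: int = 1009, q: int = 1013):
    """Toy K(): returns k = (desc(G), omega, e) for G = (Z/NZ)*, N = p*q."""
    modulus = p * q
    omega = (p - 1) * (q - 1)          # secret group order
    while True:
        e = secrets.randbelow(omega)   # e in [0, omega)
        if math.gcd(e, omega) == 1:
            return (modulus, omega, e)


def encrypt(k, m: int) -> int:
    modulus, omega, e = k
    return pow(m, e % omega, modulus)  # E(k, m) = m^(e mod omega)


def decrypt(k, c: int) -> int:
    modulus, omega, e = k
    d = pow(e, -1, omega)              # 1/e mod omega (Python 3.8+)
    return pow(c, d, modulus)          # D(k, c) = c^(1/e mod omega)


def sample_nonce(k, h: int) -> int:
    """Draw r uniformly from R = [0, 2^h / omega)."""
    omega = k[1]
    return secrets.randbelow(2 ** h // omega)


def compile_program(k, r: int):
    """C(k, r): the program embeds only desc(G) and the blinded exponent f."""
    modulus, omega, e = k
    f = e + r * omega
    return (modulus, f)


def run(program, m: int) -> int:
    modulus, f = program
    return pow(m, f, modulus)          # m^f = m^e since m^omega = 1 in G
```

The size of the compiled program is governed by \(h\) rather than by the key: \(f\) is about \(h\) bits long, yet it reduces to \(e\) modulo the secret order \(\omega \).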

### **Theorem 1**

The white-box compiler \({\mathbf {C}}_{{\mathcal {E}}}\) is \({\mathsf {UBK}}\)-\({\mathsf {CPA}}\) secure under the assumption that \({\mathsf {UBK}}[{\mathbb {G}}]\) is hard, and \({\mathsf {OW}}\)-\({\mathsf {CPA}}\) secure under the assumption that \({\mathsf {RSA}}[{\mathbb {G}}]\) is hard.

### 6.3 Proving Incompressibility Under Chosen Plaintext Attacks

We now show that \({\mathbf {C}}_{{\mathcal {E}}}\) is \((\lambda ,0)\)-\({\mathsf {INC}}\)-\({\mathsf {CPA}}\) secure under \({\mathsf {UBK}}[{{\mathbb {G}}}]\) as long as the security parameter \(h\) is slightly greater than \(\lambda \). We actually show a slightly weaker result: our reduction assumes that the program \(P\) output by the adversary is *algebraic*. An algebraic program \(P\) (see [5, 26]) with respect to group \(\mathcal {G}\) has the property that each and every group element \(y\in \mathcal {G}\) output by \(P\) is computed as a linear combination of all the group elements \(x_1,\dots ,x_t\) that were given to \(P\) as input in the same execution. Relying on the definition of [26], \(P\) must then admit an efficient extractor Extract (running in time \(\tau _{\mathsf {Ex}}\)) which, given the code of \(P\) as well as all its inputs and random tape for some execution, returns the coefficients \(\alpha _i\) such that \(y=x_1^{\alpha _1}\cdots x_t^{\alpha _t}\).

### **Theorem 2**

The proof of Theorem 2 is provided in the full version of the paper [12].

### *Remark 4*

The white-box compiler can also be shown to be \((\lambda , 0)\)-\({\mathsf {INC}}\)-\({\mathsf {CCA}}\) secure under the (gap) assumption that \({\mathsf {ORD}}[{{\mathbb {G}}}]\) remains hard when \({\mathsf {RSA}}[{{\mathbb {G}}}]\) is easy. The reduction would work similarly but with an oracle solving \({\mathsf {RSA}}[{{\mathbb {G}}}]\) that it would use to simulate decryption queries.

## 7 Traceability of White-Box Programs

One of the main applications of white-box cryptography is the secure distribution of valuable content through applications enforcing digital rights management (DRM). Namely, some digital content is distributed in encrypted form to legitimate users. A service user may then recover the content in clear using her own private white-box-secure decryption software.

However, by sharing their decryption software, users may collude and try to produce a pirate decryption software i.e. a non-registered utility capable of decrypting premium content. Traitor tracing schemes [4, 8, 9, 25] were specifically designed to fight copyright infringement, by enabling a designated authority to recover the identity of at least one of the traitors in the malicious coalition who constructed the rogue decryption software. In this section, we show how to apply some of these techniques to ensure the full traceability of programs assuming that slight perturbations of the program's functionality by the white-box compiler can remain *hidden* from an adversary.

As opposed to previous sections, we interchange the roles of encryption and decryption, considering that for our purpose, user programs would implement decryption rather than encryption.

### 7.1 Programs with Hidden Perturbations

A program can be made traceable by unnoticeably modifying its functionality. The basic idea is to *perturbate* the program such that it returns an incorrect output for a small set of unknown inputs (which remains a negligible fraction of the input domain). The set of so-called *tracing inputs* varies according to the identity of end users so that running the decryption program over inputs from different sets and checking the returned outputs efficiently reveals the identity of a traitor. We consider tracing schemes that follow this approach to make programs traceable in the presence of pirate coalitions. Of course, one must consider collusions of several users aiming to produce an untraceable program from their own legitimate programs. A tracing scheme that resists such collusions is said to be *collusion-resistant*.

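A white-box compiler \({\mathbf {C}}_{{\mathcal {E}}}\) that *supports perturbation* takes as additional input an ordered list of dysfunctional ciphertexts \(\varvec{c} = \langle {c_1, \dots , c_u} \rangle \in {{\mathsf {C}}}^u\) and returns a program \([D_{{k,\varvec{c}}}^{{r}}] = {\mathbf {C}}_{{\mathcal {E}}}(k, r ; \varvec{c})\).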

We say that the compiler *hides* functional perturbations when, given a program instance \(P = [D_{{k,\varvec{c}}}^{{r}}]\), an adversary cannot extract enough information about the dysfunctional input-output pairs to be able to correct \(P\) back to its original functionality. It is shown later that perturbated programs can be made traceable assuming that it is hard to recover the correct output of dysfunctional inputs. This is formalized by the following game:

- 1. randomly select \(k\leftarrow K()\), \(m \overset{\scriptscriptstyle {\$}}{\leftarrow }{{\mathsf {M}}}\) and \(r\overset{\scriptscriptstyle {\$}}{\leftarrow }{\mathsf {R}}\),
- 2. compile \([D_{{k, \langle {c } \rangle }}^{{r}}] = {\mathbf {C}}_{{\mathcal {E}}}(k, r ; \langle {c } \rangle )\) with \(c = E(k, m)\),
- 3. run \({\mathcal {A}}\) on input \((c, [D_{{k, \langle {c } \rangle }}^{{r}}])\),
- 4. \({\mathcal {A}}\) returns some message \(\hat{m}\),
- 5. \({\mathcal {A}}\) succeeds if \(\hat{m} = m\).
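The game above can be sketched in a few lines of Python. Everything concrete here is an assumption made for illustration: the 32-bit XOR "cipher" standing in for \(E\), and the closure-based `compile_perturbed` standing in for \({\mathbf {C}}_{{\mathcal {E}}}\). This toy compiler hides nothing, which the second adversary exploits; the sketch only shows how the game is played and what winning it means.

```python
import secrets

BITS = 32  # toy message/key/ciphertext size (an assumption of this sketch)


def keygen():
    return secrets.randbits(BITS)


def encrypt(k, m):
    return m ^ k  # toy stand-in for E; NOT a secure cipher


def compile_perturbed(k, r, bad_ciphertexts):
    """Hypothetical compiler: decrypts correctly except on bad_ciphertexts."""
    bad = set(bad_ciphertexts)
    mask = r % (2**BITS - 1) + 1  # non-zero, so junk never equals the plaintext

    def program(c):
        m = c ^ k
        return m ^ mask if c in bad else m

    return program


def pvh_game(adversary):
    k = keygen()                             # step 1: sample k, m, r
    m = secrets.randbits(BITS)
    r = secrets.randbits(BITS)
    c = encrypt(k, m)                        # step 2: compile with <c> dysfunctional
    program = compile_perturbed(k, r, [c])
    m_hat = adversary(c, program)            # steps 3-4: run the adversary
    return m_hat == m                        # step 5: success condition


def naive_adversary(c, program):
    # Simply runs the program on its dysfunctional input and gets junk.
    return program(c)


def key_recovery_adversary(c, program):
    # Breaks the toy scheme: one functional input-output pair leaks the XOR key.
    other = c ^ 1
    k = other ^ program(other)
    return c ^ k
```

The naive adversary always loses, whereas the key-recovery adversary always wins, so this toy compiler is not PVH-secure. A compiler meeting the definition would have to prevent any efficient adversary from doing noticeably better than guessing \(m\).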

### **Definition 4 (Perturbation-Value Hiding)**

A second security notion is *perturbation-index hiding*. We formalize this notion with the following game, where \(n>1\) and \(v\in [1,n-1]\) are fixed parameters:

- 1. randomly select \(k\leftarrow K()\),
- 2. for \(i\in [1,n]\), randomly select \(m_i \overset{\scriptscriptstyle {\$}}{\leftarrow }{{\mathsf {M}}}\) and set \(c_i = E(k, m_i)\),
- 3. for \(i\in [1,n]\) with \(i\ne v\), randomly select \(r_i\overset{\scriptscriptstyle {\$}}{\leftarrow }{\mathsf {R}}\) and generate \(P_i = {\mathbf {C}}_{{\mathcal {E}}}(k, r_i ; \langle {c_1,\dots , c_i} \rangle )\),
- 4. randomly pick \(b\overset{\scriptscriptstyle {\$}}{\leftarrow }\{0,1\}\),
- 5. run \({\mathcal {A}}\) on inputs \(P_1,\dots ,P_{v-1},P_{v+1},\dots , P_n\) and \((m_{v+b},c_{v+b})\),
- 6. \({\mathcal {A}}\) returns a guess \(\hat{b}\) and succeeds if \(\hat{b} = b\).
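As a rough illustration of this game, the sketch below plays it against a toy compiler. The XOR "cipher", the closure-based `compile_perturbed` and the helper `functional_on` are all assumptions introduced for this example, and messages are sampled without repetition to keep the toy deterministic. The point it illustrates is that \(c_v\) and \(c_{v+1}\) are treated identically by every program the adversary receives, so input-output behaviour alone does not reveal \(b\).

```python
import secrets

BITS = 32  # toy size (assumption)
rng = secrets.SystemRandom()


def encrypt(k, m):
    return m ^ k  # toy stand-in for E; NOT a secure cipher


def compile_perturbed(k, r, bad_ciphertexts):
    """Hypothetical compiler: decrypts correctly except on bad_ciphertexts."""
    bad = set(bad_ciphertexts)
    mask = r % (2**BITS - 1) + 1  # non-zero, so junk never equals the plaintext

    def program(c):
        m = c ^ k
        return m ^ mask if c in bad else m

    return program


def pih_instance(n, v, b):
    """Steps 1-3 and the adversary's inputs of step 5, for a given bit b."""
    assert n > 1 and 1 <= v <= n - 1
    k = secrets.randbits(BITS)
    ms = rng.sample(range(2**BITS), n)  # distinct messages (a simplification)
    cs = [encrypt(k, m) for m in ms]
    # P_i is perturbed on c_1..c_i; the program P_v is withheld.
    programs = {i: compile_perturbed(k, secrets.randbits(BITS), cs[:i])
                for i in range(1, n + 1) if i != v}
    return programs, (ms[v + b - 1], cs[v + b - 1])


def pih_game(adversary, n, v):
    b = secrets.randbits(1)                   # step 4
    programs, pair = pih_instance(n, v, b)
    return adversary(programs, pair) == b     # step 6


def functional_on(programs, pair):
    """Indices of the given programs that decrypt the challenge pair correctly."""
    m, c = pair
    return {i for i, p in programs.items() if p(c) == m}
```

For \(n = 6\) and \(v = 3\), `functional_on` yields \(\{1, 2\}\) for both values of \(b\): only \(P_v\), which the adversary never sees, would tell the two challenges apart.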

### **Definition 5 (Perturbation-Index Hiding)**

Note that in a PIH-secure white-box compiler, all entries in the list of its dysfunctional inputs can be permuted with no (non-negligible) impact on the security of the compiler.

### 7.2 A Generic Tracing Scheme

We now give an example of a tracing scheme \({\mathcal {T}}\) for programs generated by a white-box compiler \({\mathbf {C}}_{{\mathcal {E}}}\) that supports hidden perturbations. We formally prove that the identification of at least one traitor is computationally enforced assuming that \({\mathbf {C}}_{{\mathcal {E}}}\) is secure in the sense of PVH and PIH, independently of the total number \(n\) of issued programs. Under these assumptions, \({\mathcal {T}}\) therefore resists collusions of up to \(n\) users i.e. is maximally secure. As usual in traitor-tracing schemes, \({\mathcal {T}}\) is composed of a setup algorithm \({\mathcal {T}}.\mathsf{setup}\) and a tracing algorithm \({\mathcal {T}}.\mathsf{trace}\). These algorithms are defined as follows.

**Setup Algorithm.** A random key \(k\overset{\scriptscriptstyle {\$}}{\leftarrow }K()\) is generated as well as \(n\) random input-output pairs \((m_i, c_i)\) where \(m_i\overset{\scriptscriptstyle {\$}}{\leftarrow }{\mathsf {M}}\) and \(c_i = E(k, m_i)\) for \(i\in [1, n]\). \({\mathcal {T}}\) keeps \(\mathsf {perturbations}= \left( (m_1, c_1), \dots , (m_n, c_n)\right) \) as private information for later tracing. For \(i\in [1, n]\), user \(i\) is (securely) given the \(i\)-perturbated program \(P_i = {\mathbf {C}}_{{\mathcal {E}}}(k, r_i; \langle {c_1,\dots , c_i } \rangle )\) where \(r_i\overset{\scriptscriptstyle {\$}}{\leftarrow }{\mathsf {R}}\). It is easily seen that all \(P_i\)’s correctly decrypt any \(c\not \in \{c_i, i\in [1,n]\}\). However when \(c = c_{i}\), user programs \(P_{i}, \dots , P_n\) return junk while \(P_1, \dots , P_{i-1}\) remain functional. Therefore \({\mathcal {T}}\) implements a private linear broadcast encryption (PLBE) scheme in the sense of [4].

**Tracing Algorithm.** Given a rogue decryption program \(Q\) constructed from a set of user programs \(\{P_j \mid j\in T \subseteq [1, n]\}\), \({\mathcal {T}}.\mathsf{trace}\) uses its knowledge of \(k\) and \(\mathsf {perturbations}\) to identify a traitor \(j\in T\) in \(O(\log n)\) evaluations of \(Q\) as follows. Since \(Q\) is just a program and is therefore stateless, the general tracing techniques of [4, 25] are applicable. \({\mathcal {T}}.\mathsf{trace}\) makes use of two probability estimators as subroutines:

- 1. a probability estimator \(\widehat{p_0}\) which intends to measure the actual probability $$ p_0 = {\mathrm{{Pr}}}\left[ {m\overset{\scriptscriptstyle {\$}}{\leftarrow }{\mathsf {M}}\,; \quad c = E(k, m) : Q(c) = m}\right] $$ when all calls \(Q\) makes to an external random source are fed with a perfect source. Since the pirate decryption program is assumed to be fully or almost fully functional, \(p_0\) must be significantly close to \(1\). It is classical to require from \(Q\) that \(p_0 \ge 1/2\).
- 2. a probability estimator \(\widehat{p_v}\) which, given \(v\in [1, n]\), estimates the actual probability $$ p_v = {\mathrm{{Pr}}}\left[ {Q(c_{v}) = m_{v}}\right] $$ where \(Q\) is run over a perfect random source again.

### **Theorem 3**

Assume \({\mathbf {C}}_{{\mathcal {E}}}\) is secure in the sense of both PVH and PIH. Then for any subset of traitors \(T\subseteq [1, n]\), \({\mathcal {T}}.\mathsf{trace}\) correctly returns a traitor \(j\in T\) with overwhelming probability after \(O(\log n)\) executions of the pirate decryption program \(Q\).

This result validates the folklore intuition according to which cryptographic programs can be made efficiently traceable when properly obfuscated and assuming that slight alterations can be securely inserted in them. It also identifies clearly which sufficient security properties must be fulfilled by the white-box compiler to achieve traceability even when all users collude i.e., in the context of total piracy.

## Footnotes

- 1. Quoting [10], the “choice of the implementation is the sole remaining line of defense and is precisely what is pursued in white-box cryptography”.
- 2. In practice, it is well known how to generate such groups. For instance, the multiplicative group \(\mathbb {Z}_{pq}^{*}\) with \(p\) and \(q\) being *safe primes* has order \(\omega = (p-1)(q-1)\) with \(\varphi (\omega ) \approx \frac{1}{2} \omega \).

## Notes

### Acknowledgements

This work has been financially supported by the French national FUI12 project MARSHAL+. The authors would like to thank Jean-Sébastien Coron and Louis Goubin for interesting discussions and suggestions.

## References

- 1. Barak, B., Goldreich, O., Impagliazzo, R., Rudich, S., Sahai, A., Vadhan, S., Yang, K.: On the (im)possibility of obfuscating programs. In: Kilian, J. (ed.) CRYPTO 2001. LNCS, vol. 2139, pp. 1–18. Springer, Heidelberg (2001)
- 2. Bellare, M., Desai, A., Pointcheval, D., Rogaway, P.: Relations among notions of security for public-key encryption schemes. In: Krawczyk, H. (ed.) CRYPTO 1998. LNCS, vol. 1462, pp. 26–45. Springer, Heidelberg (1998)
- 3. Billet, O., Gilbert, H., Ech-Chatbi, C.: Cryptanalysis of a white box AES implementation. In: Handschuh, H., Hasan, M.A. (eds.) SAC 2004. LNCS, vol. 3357, pp. 227–240. Springer, Heidelberg (2005)
- 4. Boneh, D., Sahai, A., Waters, B.: Fully collusion resistant traitor tracing with short ciphertexts and private keys. In: Vaudenay, S. (ed.) EUROCRYPT 2006. LNCS, vol. 4004, pp. 573–592. Springer, Heidelberg (2006)
- 5. Boneh, D., Venkatesan, R.: Breaking RSA may not be equivalent to factoring. In: Nyberg, K. (ed.) EUROCRYPT 1998. LNCS, vol. 1403, pp. 59–71. Springer, Heidelberg (1998)
- 6. Bringer, J., Chabanne, H., Dottax, E.: White box cryptography: another attempt. Cryptology ePrint Archive, Report 2006/468 (2006). http://eprint.iacr.org/
- 7. Chandran, N., Chase, M., Vaikuntanathan, V.: Functional re-encryption and collusion-resistant obfuscation. In: Cramer, R. (ed.) TCC 2012. LNCS, vol. 7194, pp. 404–421. Springer, Heidelberg (2012)
- 8. Chor, B., Fiat, A., Naor, M.: Tracing traitors. In: Desmedt, Y.G. (ed.) CRYPTO 1994. LNCS, vol. 839, pp. 257–270. Springer, Heidelberg (1994)
- 9. Chor, B., Fiat, A., Naor, M., Pinkas, B.: Tracing traitors. IEEE Trans. Inf. Theory **46**(3), 893–910 (2000)
- 10. Chow, S., Eisen, P., Johnson, H., van Oorschot, P.C.: A white-box DES implementation for DRM applications. In: Feigenbaum, J. (ed.) DRM 2002. LNCS, vol. 2696, pp. 1–15. Springer, Heidelberg (2003)
- 11. Chow, S., Eisen, P., Johnson, H., van Oorschot, P.C.: White-box cryptography and an AES implementation. In: Nyberg, K., Heys, H. (eds.) SAC 2002. LNCS, vol. 2595, pp. 250–270. Springer, Heidelberg (2003)
- 12. Delerablée, C., Lepoint, T., Paillier, P., Rivain, M.: White-box security notions for symmetric encryption schemes. Cryptology ePrint Archive (2013). http://eprint.iacr.org/
- 13. Goubin, L., Masereel, J.-M., Quisquater, M.: Cryptanalysis of white box DES implementations. In: Adams, C., Miri, A., Wiener, M. (eds.) SAC 2007. LNCS, vol. 4876, pp. 278–295. Springer, Heidelberg (2007)
- 14. Hofheinz, D., Malone-Lee, J., Stam, M.: Obfuscation for cryptographic purposes. J. Cryptol. **23**(1), 121–168 (2010)
- 15. Hohenberger, S., Rothblum, G.N., Shelat, A., Vaikuntanathan, V.: Securely obfuscating re-encryption. In: Vadhan, S.P. (ed.) TCC 2007. LNCS, vol. 4392, pp. 233–252. Springer, Heidelberg (2007)
- 16. Jacob, M., Boneh, D., Felten, E.: Attacking an obfuscated cipher by injecting faults. In: Feigenbaum, J. (ed.) DRM 2002. LNCS, vol. 2696, pp. 16–31. Springer, Heidelberg (2003)
- 17. Joye, M.: On white-box cryptography. In: Preneel, B., Elçi, A., Ors, S.B. (eds.) Security of Information and Networks, pp. 7–12. Trafford Publishing (2008)
- 18. Joye, M.: Basics of side-channel analysis. In: Koç, Ç.K. (ed.) Cryptographic Engineering, pp. 365–380. Springer, New York (2009)
- 19. Karroumi, M.: Protecting white-box AES with dual ciphers. In: Rhee, K.-H., Nyang, D. (eds.) ICISC 2010. LNCS, vol. 6829, pp. 278–291. Springer, Heidelberg (2011)
- 20. Lepoint, T., Rivain, M., De Mulder, Y., Roelse, P., Preneel, B.: Two attacks on a white-box AES implementation. In: Lange, T., Lauter, K., Lisonek, P. (eds.) SAC 2013. LNCS. Springer (2013)
- 21. Link, H.E., Neumann, W.D.: Clarifying obfuscation: improving the security of white-box DES. In: ITCC 2005, vol. 1, pp. 679–684 (2005)
- 22. Michiels, W., Gorissen, P., Hollmann, H.D.L.: Cryptanalysis of a generic class of white-box implementations. In: Avanzi, R.M., Keliher, L., Sica, F. (eds.) SAC 2008. LNCS, vol. 5381, pp. 414–428. Springer, Heidelberg (2009)
- 23. De Mulder, Y., Roelse, P., Preneel, B.: Cryptanalysis of the Xiao-Lai white-box AES implementation. In: Knudsen, L.R., Wu, H. (eds.) SAC 2012. LNCS, vol. 7707, pp. 34–49. Springer, Heidelberg (2013)
- 24. De Mulder, Y., Wyseur, B., Preneel, B.: Cryptanalysis of perturbated white-box AES implementation. In: Gong, G., Gupta, K.C. (eds.) INDOCRYPT 2010. LNCS, vol. 6498, pp. 292–310. Springer, Heidelberg (2010)
- 25. Naor, D., Naor, M., Lotspiech, J.: Revocation and tracing schemes for stateless receivers. In: Kilian, J. (ed.) CRYPTO 2001. LNCS, vol. 2139, pp. 41–62. Springer, Heidelberg (2001)
- 26. Paillier, P., Vergnaud, D.: Discrete-log-based signatures may not be equivalent to discrete log. In: Roy, B. (ed.) ASIACRYPT 2005. LNCS, vol. 3788, pp. 1–20. Springer, Heidelberg (2005)
- 27. Rivest, R.L., Shamir, A., Adleman, L.M.: A method for obtaining digital signatures and public-key cryptosystems. Commun. ACM **21**(2), 120–126 (1978)
- 28. Rohatgi, P.: Improved techniques for side-channel analysis. In: Koç, Ç.K. (ed.) Cryptographic Engineering, pp. 381–406. Springer, New York (2009)
- 29. Saxena, A., Wyseur, B., Preneel, B.: Towards security notions for white-box cryptography. In: Samarati, P., Yung, M., Martinelli, F., Ardagna, C.A. (eds.) ISC 2009. LNCS, vol. 5735, pp. 49–58. Springer, Heidelberg (2009)
- 30. Wyseur, B.: White-box cryptography. Ph.D. thesis, Katholieke Universiteit Leuven (2009)
- 31. Wyseur, B., Michiels, W., Gorissen, P., Preneel, B.: Cryptanalysis of white-box DES implementations with arbitrary external encodings. In: Adams, C., Miri, A., Wiener, M. (eds.) SAC 2007. LNCS, vol. 4896, pp. 264–277. Springer, Heidelberg (2007)
- 32. Wyseur, B., Preneel, B.: Condensed white-box implementations. In: Proceedings of the 26th Symposium on Information Theory in the Benelux, pp. 296–301 (2005)
- 33. Xiao, Y., Lai, X.: A secure implementation of white-box AES. In: CSA 2009, pp. 1–6 (2009)