
1 Introduction

Traditionally, to prove the security of a cryptosystem, cryptographers consider attack scenarios where an adversary is only given a black-box access to the cryptographic system, namely to the inputs and outputs of its underlying algorithms. Security notions are built on the standard paradigm that the algorithms are known and that computing platforms can be trusted to effectively protect the secrecy of the private key.

However, attacks on implementations of cryptographic primitives have become a major threat due to side-channel information leakage (see for example [18, 28]) such as execution time, power consumption or electromagnetic emanations. More generally, the increasing deployment of cryptographic applications on untrusted platforms (the end points being possibly controlled by a malicious party) makes the black-box model too restrictive to guarantee the security of programs implementing cryptographic primitives.

White-box cryptography was introduced in 2002 by Chow, Eisen, Johnson and van Oorschot [10, 11] as the ultimate, worst-case attack model. This model considers an attacker far more powerful than in the classical black-box model (and thus more representative of real-world attackers); namely, the attacker is given full knowledge of and full control over both the algorithm and its execution environment. However, even such powerful capabilities should not allow her to, e.g., extract the embedded key. White-box cryptography can hence be seen as a restriction of general obfuscation where the function to protect belongs to some narrower class of cryptographic functions indexed by a secret key. From that angle, the ultimate goal of a white-box implementation is to leak nothing more than what black-box access to the function would reveal. An implementation achieving this strong property would be as secure as in the black-box model; in particular, it would resist all existing and future side-channel and fault-based attacks. Although we know that general obfuscation of any function is impossible to achieve [1], there is no known impossibility result for white-box cryptography and positive examples have even been discovered [7, 15]. On the other hand, the work of Chow et al. gave rise to several proposals for white-box implementations of symmetric ciphers, specifically DES [10, 21, 32] and AES [6, 11, 19, 33], even though all these proposals have been broken [3, 13, 16, 20, 22-24, 31].

Our belief is that the dearth of promising white-box implementations is also a consequence of the absence of well-understood security goals to achieve. A first step towards a theoretical model was proposed by Saxena, Wyseur and Preneel [29], and subsequently extended by Wyseur in his PhD thesis [30]. These results show how to translate any security notion in the black-box model into a security notion in the white-box model. They introduce the white-box property for an obfuscator as the ability to turn a program (modeled as a polynomial Turing machine) which is secure with respect to some black-box notion into a program secure with respect to the corresponding white-box notion. The authors then give an example of an obfuscator for a symmetric encryption scheme achieving the white-box equivalent of semantic security. In other words, the symmetric encryption scheme is turned into a secure asymmetric encryption scheme. While these advances describe a generic model to translate a given notion from the black-box to the white-box setting, our aim in this paper is to define explicit security notions that white-box cryptography should realize in practice. As a matter of fact, some of our security notions are not black-box notions that one would wish to preserve in the white-box setting, but arise from new features potentially introduced by the white-box compilation. Note that although we use a different formalism and pursue different goals, our work and those in [29, 30] are not in contradiction but rather co-exist in a wider framework.

Our Contributions. We formalize the notion of white-box compilers for a symmetric encryption scheme and introduce several security notions for such compilers. As traditionally done in provable security (e.g. [2]), we consider separately various adversarial goals (e.g. decrypt some ciphertext) and attack models (e.g. chosen ciphertext attack), and then obtain distinct security definitions by pairing a particular goal with a particular attack model. We consider four different attack models in the white-box context: the chosen plaintext attack, the chosen ciphertext attack, the recompilation attack and the chosen ciphertext and recompilation attack. We formalize the main security objective of white-box cryptography which is to protect the secret key as a notion of unbreakability. We show that additional security notions should be considered in applications and translate folklore intuitions behind white-box cryptography into concrete security notions; namely the one-wayness, incompressibility and traceability of white-box programs. For the first two notions, we show an example of a simple symmetric encryption scheme over an RSA group for which an efficient white-box compiler exists that provably achieves both notions. We finally show that white-box programs are efficiently traceable by simple means assuming that functional perturbations can be hidden in them. Overall, our positive results shed more light on the different aspects of white-box security and provide concrete constructions that achieve them in a provable fashion.

2 Preliminaries

Symmetric Encryption. A symmetric encryption scheme is a tuple \({\mathcal {E}}= \left( {\mathsf {K}}, {\mathsf {M}}, {\mathsf {C}}, K, E, D\right) \) where \({\mathsf {K}}\) is the key space, \({\mathsf {M}}\) is the plaintext (or message) space, \({\mathsf {C}}\) is the ciphertext space, \(K\) is a probabilistic algorithm that returns a key \(k\in {\mathsf {K}} = {\mathsf {range}}\left( {K()}\right) \), \(E\) is a deterministic encryption function mapping elements of \({\mathsf {K}}\times {\mathsf {M}}\) to elements of \({\mathsf {C}}\), \(D\) is a deterministic decryption function mapping elements of \({\mathsf {K}}\times {\mathsf {C}}\) to elements of \({\mathsf {M}}\).

We require that for any \(k\in {\mathsf {K}}\) and any \(m\in {\mathsf {M}}\), \(D(k, E(k, m)) = m\). Most typically, \({\mathcal {E}}\) refers to a block-cipher in which case all sets are made of binary strings of determined length and \({\mathsf {C}} = {\mathsf {M}}\).
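As a minimal sketch of this definition, the tuple and its correctness condition can be written down for a toy 8-bit scheme. The names `Scheme`, `toy_enc`, `toy_dec` and `is_correct` are hypothetical, and the toy cipher is an assumption chosen for illustration only; it offers no security.

```python
import secrets
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Scheme:
    key_space: range                     # K
    message_space: range                 # M
    ciphertext_space: range              # C
    keygen: Callable[[], int]            # probabilistic key generation K()
    enc: Callable[[int, int], int]       # deterministic E(k, m)
    dec: Callable[[int, int], int]       # deterministic D(k, c)

def toy_enc(k, m):
    """Toy cipher (assumption): c = 5 * (m XOR k) + 17 mod 256."""
    return (5 * (m ^ k) + 17) % 256

def toy_dec(k, c):
    """Inverse of toy_enc: 205 is the inverse of 5 modulo 256."""
    return (((c - 17) * 205) % 256) ^ k

toy = Scheme(range(256), range(256), range(256),
             lambda: secrets.randbelow(256), toy_enc, toy_dec)

def is_correct(scheme):
    """Check D(k, E(k, m)) = m for every key and every plaintext."""
    return all(scheme.dec(k, scheme.enc(k, m)) == m
               for k in scheme.key_space for m in scheme.message_space)

print(is_correct(toy))
```

Exhaustive checking is only feasible here because the toy key and message spaces are tiny; for a real block cipher the condition is a property of the specification rather than something one enumerates.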

Programs. A program is a word in the language-theoretic sense and is interpreted in the explicit context of a programming model and an execution model, the details of which we want to keep as abstracted away as possible. Programs differ from remote oracles in the sense that their code can be executed locally, read, copied and modified at will. Successive executions are inherently stateless and all the “system calls” that a program makes to external resources such as a random source or a system clock can be captured and responded arbitrarily. Execution can be interrupted at any moment and all the internal variables identified by the program’s instructions can be read and modified arbitrarily by the party that executes the program.

For some function \(f\) mapping some set \({\mathsf {A}}\) to some set \({\mathsf {B}}\), we denote by \({\mathsf {prog}}\left( {f}\right) \) the set of all programs implementing \(f\). A program \(P\in {\mathsf {prog}}\left( {f}\right) \) is said to be fully functional with respect to \(f\) when for any \(a\in {\mathsf {A}}\), \(P(a)\) returns \(f(a)\) with probability \(1\). \(P\) is said to be \(\delta \)-functional (with respect to \(f\)) when \(P\) is at distance at most \(\delta \in [0,1]\) from \(f\), i.e.

$$\begin{aligned} \varDelta (P, f) \mathop {=}\limits ^{\scriptscriptstyle {\mathrm {def}}}{\mathrm{{Pr}}}[a\overset{\scriptscriptstyle {\$}}{\leftarrow }{\mathsf {A}} \,; \quad b\leftarrow P(a): b\ne f(a)] \le \delta \;. \end{aligned}$$

The set of \(\delta \)-functional programs implementing \(f\) is denoted \({\delta }\text{- }{\mathsf {prog}}\left( {{f}}\right) \). Obviously \({0}\text{- }{\mathsf {prog}}\left( {{f}}\right) = {\mathsf {prog}}\left( {f}\right) \).
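When \({\mathsf {A}}\) is small and the program is deterministic, the distance \(\varDelta (P, f)\) can be computed exactly by enumeration. The following sketch illustrates this under those assumptions; the helper names `distance` and `is_delta_functional`, as well as the toy function `f`, are hypothetical.

```python
from fractions import Fraction

def distance(program, f, domain):
    """Exact Delta(P, f): fraction of inputs a with P(a) != f(a)."""
    domain = list(domain)
    errors = sum(1 for a in domain if program(a) != f(a))
    return Fraction(errors, len(domain))

def is_delta_functional(program, f, domain, delta):
    """True when P is at distance at most delta from f."""
    return distance(program, f, domain) <= delta

# Toy function on A = {0, ..., 255} and a program that errs on inputs 0..3.
A = range(256)
f = lambda a: (a * 7 + 3) % 256
P = lambda a: f(a) if a >= 4 else (f(a) + 1) % 256

print(distance(P, f, A))                              # 1/64
print(is_delta_functional(P, f, A, Fraction(1, 64)))  # True
print(is_delta_functional(P, f, A, 0))                # False
```

For a probabilistic program or a large domain, one would instead estimate the probability by sampling, which only gives an approximation of \(\varDelta (P, f)\).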

Other Notations. If \({\mathsf {A}}\) is some set, \(|{\mathsf {A}}|\) denotes its cardinality. If \(\mathbb A\) is some generator i.e. a random source with some prescribed output range \({\mathsf {A}}\), \(H(\mathbb A)\) denotes the output entropy of \(\mathbb A\) as a source. Abusing notations, we may also denote it by \(H(a)\) for \(a\leftarrow \mathbb A(\cdots )\). Finally, when we write \({\mathcal {O}}(\cdot )=\epsilon \), we mean that \({\mathcal {O}}\) is the oracle which, on any input, returns the empty string \(\epsilon \).

3 White-Box Compilers

In this paper, we consider that a white-box implementation of the scheme \({\mathcal {E}}\) is a program produced by a publicly known compiling function \({\mathbf {C}}_{{\mathcal {E}}}\) which takes as arguments a key \(k\in {\mathsf {K}}\) and possibly a diversifying nonce \(r\in {\mathsf {R}}\) drawn from some randomness space \({\mathsf {R}}\). We will denote the compiled program by \([E_{{k}}^{{r}}]\) (or \([E_{{k}}]\) when the random nonce \(r\) is implicit or does not exist), namely \([E_{{k}}^{{r}}] = {\mathbf {C}}_{{\mathcal {E}}}(k,r)\).

A compiler \({\mathbf {C}}_{{\mathcal {E}}}\) for \({\mathcal {E}}\) is sound when for any \((k,r)\in {\mathsf {K}}\times {\mathsf {R}}\), \([E_{{k}}^{{r}}]\) exactly implements the function \(E(k,\cdot )\) (i.e. it is fully functional). Therefore \([E_{{k}}^{{r}}]\) accepts as input any \(m\in {\mathsf {M}}\) and always returns the correct encryption \(c = E(k,m)\). At this stage, we only care about sound compilers.
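To make the soundness requirement concrete, here is a minimal sketch with a hypothetical toy compiler `compile_wb` that stores a nonce-masked lookup table instead of the key. Both the toy cipher `E` and the masking strategy are assumptions made for illustration; they are not a secure white-box design.

```python
def E(k, m):
    """Toy 8-bit cipher (assumption, not secure)."""
    return (5 * (m ^ k) + 17) % 256

def compile_wb(k, r):
    """Hypothetical compiler C_E(k, r): returns a program computing E(k, .)."""
    masked = [E(k, m) ^ r for m in range(256)]   # table depends on k and r
    def program(m):
        return masked[m] ^ r                      # unmask at lookup time
    return program

def is_sound(keys, nonces):
    """Check that every compiled program agrees with E(k, .) on all of M."""
    for k in keys:
        for r in nonces:
            program = compile_wb(k, r)
            if any(program(m) != E(k, m) for m in range(256)):
                return False
    return True

print(is_sound(range(256), range(0, 256, 51)))
```

Different nonces yield different internal tables, yet every compiled program computes the same function, which is exactly what soundness demands.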

Remark 1

In the above definition, we consider white-box compilers for the encryption function. However, since we focus on deterministic encryption, \(E(k,\cdot )\) and \(D(k,\cdot )\) being inverses of one another, we can swap roles without loss of generality and get compilers for the decryption procedure. We will do precisely this in Sect. 7.

Note again that \([E_{{k}}]\) differs in nature from \(E(k,\cdot )\). \(E(k, \cdot )\) is a mapping from \({\mathsf {M}}\) to \({\mathsf {C}}\), whereas \([E_{{k}}]\) is a word in some programming language (the details of which we want to keep away from) and has to fulfill some semantic consistency rules. Viewed as a binary string, it has a certain bitsize \({\mathsf {size}}\left( {[E_{{k}}]}\right) \in {\mathbb {N}}\). Even though \(E(k,\cdot )\) is deterministic, nothing prevents \([E_{{k}}]\) from collecting extra bits from a random tape and behaving probabilistically. For an input \(m\in {\mathsf {M}}\) and random tape \(\rho \in \{0,1\}^*\), \([E_{{k}}](m, \rho )\) takes a certain time \({\mathsf {time}}\left( {[E_{{k}}](m, \rho )}\right) \in {\mathbb {N}}\) to complete execution.

3.1 Attack Models

The first step in specifying new security notions for white-box cryptography is to classify the threats. This section introduces four distinct attack models for an adversary \({\mathcal {A}}\) in the white-box model: the chosen plaintext attack (\({\mathsf {CPA}}\)), the chosen ciphertext attack (\({\mathsf {CCA}}\)), the recompilation attack (\({\mathsf {RCA}}\)) and the chosen ciphertext and recompilation attack (\({\mathsf {CCA}}\)+\({\mathsf {RCA}}\)). In all of these, we assume that the compiler \({\mathbf {C}}_{{\mathcal {E}}}\) is public, i.e. at any point in time, the adversary \({\mathcal {A}}\) can select any key \(k\in {\mathsf {K}}\) and nonce \(r\in {\mathsf {R}}\) of her choosing and generate a white-box implementation \([E_{{k}}^{{r}}] = {\mathbf {C}}_{{\mathcal {E}}}(k, r)\) by herself.

In a chosen plaintext attack (\({\mathsf {CPA}}\)) the adversary can encrypt plaintexts of her choice under \(E(k,\cdot )\). Indeed, even though the encryption scheme \({\mathcal {E}}\) is a symmetric primitive, the attacks are defined with respect to the compiler that generates white-box programs implementing \(E(k,\cdot )\): given any one of these programs, the adversary can always evaluate it on arbitrary plaintexts at will. So clearly, chosen plaintext attacks cannot be avoided, very much like in the public-key encryption setting.

In a chosen ciphertext attack (\({\mathsf {CCA}}\)), in addition to the challenge white-box implementation \([E_{{k}}^{{r}}]\), we give \({\mathcal {A}}\) access to a decryption oracle \(D(k, \cdot )\), i.e. she can send decryption queries \(c_1,\dots ,c_q\in {\mathsf {C}}\) adaptively to the oracle and be returned the corresponding plaintexts \(m_1,\dots ,m_q\in {\mathsf {M}}\) where \(m_i = D(k,c_i)\). Notice that this attack includes the \({\mathsf {CPA}}\) attack when \(q=0\).

In a recompilation attack (\({\mathsf {RCA}}\)), in addition to the challenge white-box implementation \([E_{{k}}^{{r}}]\), we give \({\mathcal {A}}\) access to a recompiling oracle \({\mathbf {C}}_{{\mathcal {E}}}(k,{\mathsf {R}})\) that generates other programs \([E_{{k}}^{{r'}}]\) with key \(k\) for adversarially unknown random nonces \(r'\overset{\scriptscriptstyle {\$}}{\leftarrow }{\mathsf {R}}\). In other words, we give \({\mathcal {A}}\) the ability to observe other programs compiled with the same key and different nonces.

In a chosen ciphertext and recompilation attack (\({\mathsf {CCA}}\)+\({\mathsf {RCA}}\)) we give \({\mathcal {A}}\) (the challenge white-box implementation \([E_{{k}}^{{r}}]\) and) simultaneous access to a decryption oracle \(D(k, \cdot )\) and a recompiling oracle \({\mathbf {C}}_{{\mathcal {E}}}(k,{\mathsf {R}})\), both parametrized with the same key \(k\).

Remark 2

We emphasize that the recompilation attack model is not artificial when dealing with white-box cryptography. Indeed, it seems reasonable to assume that user-related values can be embedded in the random nonce \(r\in {\mathsf {R}}\) used to compile a (user-specific) white-box implementation. Thus a coalition of malicious users can be modeled as a single adversary with (possibly limited) access to a recompiling oracle producing white-box implementations under fresh random nonces \(r'\in {\mathsf {R}}\).

3.2 The Prime Goal: Unbreakability

Chow et al. stated in [10, 11] that the first security objective of white-box cryptography is, given a program \([E_{{k}}]\), to preserve the privacy of the key \(k\) embedded in the program (see also [17, Q1] and [30, Definition 2]). We define the following game to capture that intuition:

  1. randomly generate a key \(k\leftarrow K()\) and a nonce \(r\overset{\scriptscriptstyle {\$}}{\leftarrow }{\mathsf {R}}\),

  2. the adversary \({\mathcal {A}}\) is run on input \([E_{{k}}^{{r}}]= {\mathbf {C}}_{{\mathcal {E}}}(k,r)\),

  3. \({\mathcal {A}}\) returns a guess \(\hat{k}\in {\mathsf {K}}\),

  4. \({\mathcal {A}}\) succeeds if \(\hat{k} = k\).

Notice that at Step 2, the adversary may have access to the decryption oracle \(D(k, \cdot )\) or to the recompiling oracle \({\mathbf {C}}_{{\mathcal {E}}}(k,{\mathsf {R}})\), or both, depending on the attack model.

Let us define more concisely and precisely the notion of unbreakability with respect to the attack model \({\mathsf {ATK}}\) (\({\mathsf {CPA}}\), \({\mathsf {CCA}}\), \({\mathsf {RCA}}\) or \({\mathsf {CCA}}\)+\({\mathsf {RCA}}\)).

Definition 1 (Unbreakability)

Let \({\mathcal {E}}\) be a symmetric encryption scheme as above, \({\mathbf {C}}_{{\mathcal {E}}}\) a white-box compiler for \({\mathcal {E}}\) and let \({\mathcal {A}}\) be an adversary. For \({\mathsf {ATK}}\in \{{\mathsf {CPA}}, {\mathsf {CCA}}, {\mathsf {RCA}},{\mathsf {CCA}}+{\mathsf {RCA}}\}\), we define

$$\begin{aligned} {{\mathsf{Succ}}}_{{\mathcal {A}},{\mathbf {C}}_{{\mathcal {E}}}}^{{\mathsf {UBK}}-{\mathsf {ATK}}} \mathop {=}\limits ^{\scriptscriptstyle {\mathrm {def}}}{\mathrm{{Pr}}}\left[ k\leftarrow K() \,; \quad r\overset{\scriptscriptstyle {\$}}{\leftarrow }{\mathsf {R}} \,; \quad [E_{{k}}^{{r}}]= {\mathbf {C}}_{{\mathcal {E}}}(k,r) \,; \quad \hat{k} \leftarrow {\mathcal {A}}^{{\mathcal {O}}}([E_{{k}}^{{r}}]) : \hat{k}=k\right] \end{aligned}$$

where

$$ \begin{array}{ll} {\mathcal {O}}(\cdot )=\epsilon &{} {\qquad } if~{\mathsf {ATK}}={\mathsf {CPA}}\\ {\mathcal {O}}(\cdot )=D(k, \cdot ) &{} {\qquad } if~{\mathsf {ATK}}={\mathsf {CCA}}\\ {\mathcal {O}}(\cdot )={\mathbf {C}}_{{\mathcal {E}}}(k,{\mathsf {R}}) &{} {\qquad } if~{\mathsf {ATK}}={\mathsf {RCA}}\\ {\mathcal {O}}(\cdot )=\left\{ D(k, \cdot ),{\mathbf {C}}_{{\mathcal {E}}}(k,{\mathsf {R}})\right\} &{} {\qquad } if~{\mathsf {ATK}}={\mathsf {CCA}}+{\mathsf {RCA}}\;. \end{array} $$

We say that \({\mathbf {C}}_{{\mathcal {E}}}\) is \((\tau ,\varepsilon )\)-secure in the sense of \({\mathsf {UBK}}\)-\({\mathsf {ATK}}\) if for any adversary \({\mathcal {A}}\) running in time at most \(\tau \), \({{\mathsf{Succ}}}_{{\mathcal {A}},{\mathbf {C}}_{{\mathcal {E}}}}^{{\mathsf {UBK}}-{\mathsf {ATK}}}\le \varepsilon \).

Note that in our setting, a total break requires the adversary to output the whole key \(k\) embedded into \([E_{{k}}^{{r}}]\). Basing \({\mathsf {UBK}}\) on the semantic security of \(k\) makes no sense here since it is straightforward to ascertain, for some guess \(\hat{k}\), that \(\hat{k} = k\) by just checking whether the value returned by \([E_{{k}}^{{r}}](m)\) is equal to \(E(\hat{k}, m)\) for sufficiently many plaintext(s) \(m\in {\mathsf {M}}\). In other words, the distributions \(\{k, [E_{{k}}^{{r}}]\}_{k\in {\mathsf {K}},r\in {\mathsf {R}}}\) and \(\{k',[E_{{k}}^{{r}}]\}_{(k,k')\in {\mathsf {K}}^2,r\in {\mathsf {R}}}\) are computationally distinguishable. As a result, one cannot prevent some information leakage about \(k\) from \([E_{{k}}^{{r}}]\), whatever the specification of the compiler \({\mathbf {C}}_{{\mathcal {E}}}\).
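The unbreakability game and the key-checking observation above can be sketched as a small test harness. Everything here is an illustrative assumption: the toy 8-bit cipher, the table-based compiler `compile_wb`, the oracle dictionary and the adversary `exhaustive_adversary`. With an 8-bit key space the adversary always wins, because each key defines a distinct function; the point is only to show how the game and the oracles of each attack model fit together.

```python
import secrets

def E(k, m):
    """Toy 8-bit cipher, an assumption for illustration only (not secure)."""
    return (5 * (m ^ k) + 17) % 256

def D(k, c):
    """Inverse of E(k, .): 205 is the inverse of 5 modulo 256."""
    return (((c - 17) * 205) % 256) ^ k

def compile_wb(k, r):
    """Hypothetical compiler C_E(k, r): a nonce-masked lookup table."""
    masked = [E(k, m) ^ r for m in range(256)]
    return lambda m: masked[m] ^ r

def ubk_game(adversary, atk="CPA"):
    """Run the UBK-ATK game once; return True when the adversary wins."""
    k = secrets.randbelow(256)            # k <- K()
    r = secrets.randbelow(256)            # r <-$ R
    program = compile_wb(k, r)            # challenge [E_k^r]
    oracles = {}
    if "CCA" in atk:
        oracles["decrypt"] = lambda c: D(k, c)
    if "RCA" in atk:
        oracles["recompile"] = lambda: compile_wb(k, secrets.randbelow(256))
    return adversary(program, oracles) == k

def exhaustive_adversary(program, oracles):
    """Checks each candidate key against the program; ignores the oracles."""
    for k_hat in range(256):
        if all(program(m) == E(k_hat, m) for m in (0, 1, 2, 3)):
            return k_hat
    return None

for atk in ("CPA", "CCA", "RCA", "CCA+RCA"):
    wins = sum(ubk_game(exhaustive_adversary, atk) for _ in range(100))
    print(atk, wins, "wins out of 100")
```

The inner test `program(m) == E(k_hat, m)` is precisely the check that rules out a semantic-security style definition for the key: any guess can be validated against the program itself.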

Remark 3

Although not required in the above definition, for a white-box compiler to be cryptographically sound, one would require that there exist some security parameter \(\lambda \) such that \(\varepsilon /\tau \) be exponentially small in \(\lambda \) and \({\mathsf {size}}\left( {[E_{{k}}]}\right) \) and \({\mathsf {time}}\left( {[E_{{k}}](\cdot )}\right) \) be polynomial in \(\lambda \). Otherwise said, one aims to get a negligible \(\varepsilon /\tau \) while keeping fair \({\mathsf {size}}\left( {[E_{{k}}]}\right) \) and \({\mathsf {time}}\left( {[E_{{k}}](\cdot )}\right) \).

3.3 Security Notions Really Needed in Applications

When satisfied, unbreakability ensures that an adversary cannot extract the secret key of a randomly generated white-box implementation. Therefore, any party has to execute the program rather than simulate it with the secret key. While this property is the very least that can be expected from white-box cryptography, it is rather useless on its own. Indeed, knowing the white-box program amounts to knowing the key in some sense since it allows one to process the encryption without restriction. As discussed in [30, Sect. 3.1.3], an attacker only needs to isolate the cryptographic code in the implementation. This is a common threat in DRM applications, which is known as code lifting. Although some countermeasures can make code lifting a tedious task, it is reasonable to assume that a motivated attacker would eventually recover the cryptographic code. That is why, in order to make the white-box compilation useful, the availability of the white-box program should restrict the adversary's capabilities compared to the availability of the secret key.

One-Wayness. A natural restriction is that although the white-box implementation allows one to encrypt at will, it should not enable decryption. In other words, it should be difficult to invert the program computations. In that case, the program is said to be one-way, to keep consistency with the notion of one-wayness (for a function or a cryptosystem) traditionally used in cryptography. As already noted in [17], a white-box compiler achieving one-wayness is of great interest as it turns a symmetric encryption scheme into a public-key encryption scheme. This is also one of the many motivations to design methods for general obfuscation [1, 14].

Incompressibility of Programs. Another argument often heard in favor of white-box cryptography is that a white-box program is less convenient to store and exchange than a mere secret key due to its bigger size. As formulated in [30, Sect. 3.1.3], white-box cryptography allows one to “hide a key in an even bigger key”. For instance, the implementation of AES by Chow et al. [11] makes use of 800 KB of look-up tables, which represents a significant overhead compared to a \(128\)-bit key. Supposing this implementation were unbreakable in the sense of Definition 1 (which we know to be false [3]), the question that would arise is: what is the computationally achievable minimum size of a program functionally equivalent to this implementation? When a program is hard to compress beyond a certain prescribed size, we shall say that this program is incompressible. Section 6 shows an example of computationally incompressible programs for symmetric encryption.

Traceability of Programs. It is often heard that white-box compilation can provide traceability (see for instance [30, Sect. 5.5.1]). Specifically, white-box compilation should enable one to derive several functionally equivalent versions of the same encryption (or decryption) program. A typical use case for such a system is the distribution of protected digital content where every legitimate user gets a different version of some decryption software. If a malicious user shares their own program (e.g. over the Internet), then one can trace the so-called traitor by identifying their unique copy of the program. However, in a white-box context, a user can easily transform their version of the program while keeping the same functionality. Therefore, to be effective, the tracing should be robust to such transformations, even in the case where several malicious users collude to produce untraceable software. We show in Sect. 7 how to achieve such a robust tracing from a compiler that can hide functional perturbations in a white-box program. Accordingly, we define new security notions for such a white-box compiler. Combined with our tracing scheme, a compiler achieving these security notions is shown to provide traceable white-box programs.

4 One-Wayness

An adversarial goal of interest in white-box cryptography consists, given a white-box implementation \([E_{{k}}^{{r}}]\), in recovering the plaintext of a given ciphertext with respect to the embedded key \(k\). This security notion is even essential when white-box implementations are deployed as an asymmetric primitive [17, Q4]. We define the following security game to capture that intuition:

  1. randomly select a key \(k\leftarrow K()\) and a nonce \(r\overset{\scriptscriptstyle {\$}}{\leftarrow }{\mathsf {R}}\),

  2. generate the white-box program \([E_{{k}}^{{r}}]= {\mathbf {C}}_{{\mathcal {E}}}(k,r)\),

  3. randomly select a plaintext \(m\overset{\scriptscriptstyle {\$}}{\leftarrow }{\mathsf {M}}\),

  4. compute its encryption \(c = E(k,m)\),

  5. the adversary \({\mathcal {A}}\) is run on inputs \([E_{{k}}^{{r}}]\) and \(c\),

  6. \({\mathcal {A}}\) returns a guess \(\hat{m}\),

  7. \({\mathcal {A}}\) succeeds if \(\hat{m} = m\).

Notice that at Step 5, the adversary may have access to the decryption oracle \(D(k, \cdot )\) or to the recompiling oracle \({\mathbf {C}}_{{\mathcal {E}}}(k,{\mathsf {R}})\) (or both) depending on the attack model. When \({\mathcal {A}}\) is given access to the decryption oracle, the challenge ciphertext \(c\) itself shall be rejected by the oracle.

Let us define more precisely the notion of one-wayness with respect to the attack model \({\mathsf {ATK}}\).

Definition 2 (One-Wayness)

Let \({\mathcal {E}}\) be a symmetric encryption scheme as above, \({\mathbf {C}}_{{\mathcal {E}}}\) a white-box compiler for \({\mathcal {E}}\) and \({\mathcal {A}}\) an adversary. For \({\mathsf {ATK}}\in \{{\mathsf {CPA}},{\mathsf {CCA}}, {\mathsf {RCA}},{\mathsf {CCA}}+{\mathsf {RCA}}\}\), let

$${{\mathsf{Succ}}}_{{\mathcal {A}},{\mathbf {C}}_{{\mathcal {E}}}}^{{\mathsf {OW}}-{\mathsf {ATK}}} \mathop {=}\limits ^{\scriptscriptstyle {\mathrm {def}}}{\mathrm{{Pr}}}\bigg [\begin{array}{c} k\leftarrow K() \,; \quad r\overset{\scriptscriptstyle {\$}}{\leftarrow }{\mathsf {R}} \,; \quad [E_{{k}}^{{r}}] = {\mathbf {C}}_{{\mathcal {E}}}(k,r) \,; \quad \\ m\overset{\scriptscriptstyle {\$}}{\leftarrow }{\mathsf {M}} \,; \quad c = E(k,m) \,; \quad \hat{m} \leftarrow {\mathcal {A}}^{{\mathcal {O}}}([E_{{k}}^{{r}}],c) \end{array}:\hat{m}=m \bigg ]$$

where

$$ \begin{array}{ll} {\mathcal {O}}(\cdot )=\epsilon &{} \qquad if~{\mathsf {ATK}}={\mathsf {CPA}}\\ {\mathcal {O}}(\cdot )=D(k, \cdot ) &{} \qquad if~{\mathsf {ATK}}={\mathsf {CCA}}\\ {\mathcal {O}}(\cdot )={\mathbf {C}}_{{\mathcal {E}}}(k,{\mathsf {R}}) &{} \qquad if~{\mathsf {ATK}}={\mathsf {RCA}}\\ {\mathcal {O}}(\cdot )=\left\{ D(k, \cdot ),{\mathbf {C}}_{{\mathcal {E}}}(k,{\mathsf {R}})\right\} &{} \qquad if~{\mathsf {ATK}}={\mathsf {CCA}}+{\mathsf {RCA}}\;. \end{array} $$

We say that \({\mathbf {C}}_{{\mathcal {E}}}\) is \((\tau ,\varepsilon )\)-secure in the sense of \({\mathsf {OW}}\)-\({\mathsf {ATK}}\) if for any adversary \({\mathcal {A}}\) running in time at most \(\tau \), \({{\mathsf{Succ}}}_{{\mathcal {A}},{\mathbf {C}}_{{\mathcal {E}}}}^{{\mathsf {OW}}-{\mathsf {ATK}}}\le \varepsilon \).

Similarly to the unbreakability notion, it is obvious that any incorrect guess \(\hat{m}\) on \(m\) can be rejected by comparing the value returned by \([E_{{k}}^{{r}}](\hat{m})\) with \(c\). In other words, the two distributions

$$ \left\{ [E_{{k}}^{{r}}], E(k,m), m \right\} _{k\in {\mathsf {K}},r\in {\mathsf {R}},m\in {\mathsf {M}}} \text{ and } \left\{ [E_{{k}}^{{r}}], E(k,m), m'\right\} _{k\in {\mathsf {K}},r\in {\mathsf {R}},m,m'\in {\mathsf {M}}} $$

are easily distinguishable. Moreover, there is an easy reduction from \({\mathsf {OW}}\)-\({\mathsf {ATK}}\) to \({\mathsf {UBK}}\)-\({\mathsf {ATK}}\). Clearly, extracting \(k\) from \([E_{{k}}]\) enables one to use it and the challenge as inputs to the (publicly available) decryption function \(D(\cdot , \cdot )\) and thus to recover \(m\).
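The reduction can be sketched in a few lines: any key extractor is wrapped into an inverter that runs the public decryption function on the challenge. The sketch below covers the \({\mathsf {CPA}}\) case only and relies on assumptions made for illustration: a toy 8-bit cipher, a table-based compiler `compile_wb`, and an exhaustive-search extractor `toy_extractor`.

```python
import secrets

def E(k, m):
    """Toy 8-bit cipher (assumption, not secure)."""
    return (5 * (m ^ k) + 17) % 256

def D(k, c):
    """Public decryption function; 205 is the inverse of 5 modulo 256."""
    return (((c - 17) * 205) % 256) ^ k

def compile_wb(k, r):
    """Hypothetical compiler C_E(k, r): a nonce-masked lookup table."""
    masked = [E(k, m) ^ r for m in range(256)]
    return lambda m: masked[m] ^ r

def ow_game(adversary):
    """One run of the OW-CPA game; True when the adversary recovers m."""
    k, r = secrets.randbelow(256), secrets.randbelow(256)
    program = compile_wb(k, r)
    m = secrets.randbelow(256)
    c = E(k, m)
    return adversary(program, c) == m

def inverter_from_extractor(extract_key):
    """Reduction: turn a UBK adversary into an OW adversary."""
    def ow_adversary(program, c):
        k_hat = extract_key(program)      # total break of the program
        return D(k_hat, c)                # then decrypt the challenge
    return ow_adversary

def toy_extractor(program):
    """Exhaustive key search, feasible only because the key has 8 bits."""
    return next(k for k in range(256)
                if all(program(m) == E(k, m) for m in (0, 1, 2, 3)))

wins = sum(ow_game(inverter_from_extractor(toy_extractor)) for _ in range(100))
print(wins, "wins out of 100")
```

The wrapper adds only one call to \(D\), so the inverter succeeds whenever the extractor does and runs in essentially the same time.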

5 Incompressibility of White-Box Programs

In this section, we formalize the notion of incompressibility for a white-box compiler. What we mean by incompressibility here is the hardness, given a (large) compiled program \([E_{{k}}]\), of coming up with a significantly smaller program functionally close to \(E(k,\cdot )\). A typical example is when a content provider distributes a large encryption program (e.g. \(100\) GB or more) and wants to make sure that no smaller yet equivalent program can be redistributed by subscribers to illegitimate third parties. The content provider cannot prevent the original program from being shared e.g. over the Internet; however, if compiled programs are provably incompressible then redistribution may be somewhat discouraged by the size of transmissions.

We define \((\lambda ,\delta )\)-\({\mathsf {INC}}\) as the adversarial goal that consists, given a compiled program \([E_{{k}}]\) with \({\mathsf {size}}\left( {[E_{{k}}]}\right) \gg \lambda \), in building a smaller program \(P\) that remains satisfactorily functional, i.e. such that

$$ {\mathsf {size}}\left( {P}\right) < \lambda \qquad \text{ and }\qquad P\in {\delta }\text{- }{\mathsf {prog}}\left( {{E(k,\cdot )}}\right) \;. $$

This is formalized by the following game:

  1. randomly select \(k\leftarrow K()\) and \(r\overset{\scriptscriptstyle {\$}}{\leftarrow }{\mathsf {R}}\),

  2. compile \([E_{{k}}^{{r}}]= {\mathbf {C}}_{{\mathcal {E}}}(k,r)\),

  3. run \({\mathcal {A}}\) on input \([E_{{k}}^{{r}}]\),

  4. \({\mathcal {A}}\) returns some program \(P\),

  5. \({\mathcal {A}}\) succeeds if \(\varDelta (P, E(k,\cdot )) \le \delta \) and \({\mathsf {size}}\left( {P}\right) < \lambda \).

Definition 3 ( \(\mathbf{(}\lambda ,\delta \mathbf{)}\) -Incompressibility)

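Let \({\mathcal {E}}\) be a symmetric encryption scheme, \({\mathbf {C}}_{{\mathcal {E}}}\) a white-box compiler for \({\mathcal {E}}\) and \({\mathcal {A}}\) an adversary. For \({\mathsf {ATK}}\in \{{\mathsf {CPA}},{\mathsf {CCA}},{\mathsf {RCA}},{\mathsf {CCA}}+{\mathsf {RCA}}\}\), let

$${\mathsf{Adv}}_{{\mathcal {A}},{\mathbf {C}}_{{\mathcal {E}}}}^{(\lambda ,\delta )-{\mathsf {INC}}-{\mathsf {ATK}}} \mathop {=}\limits ^{\scriptscriptstyle {\mathrm {def}}}{\mathrm{{Pr}}}\bigg [\begin{array}{c} k\leftarrow K() \,; \quad r\overset{\scriptscriptstyle {\$}}{\leftarrow }{\mathsf {R}} \,; \quad [E_{{k}}^{{r}}] = {\mathbf {C}}_{{\mathcal {E}}}(k,r) \,; \quad \\ P \leftarrow {\mathcal {A}}^{{\mathcal {O}}}([E_{{k}}^{{r}}]) \end{array}: \varDelta (P, E(k,\cdot )) \le \delta \ \wedge \ {\mathsf {size}}\left( {P}\right) < \lambda \bigg ]$$

where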

$$ \begin{array}{ll} {\mathcal {O}}(\cdot )=\epsilon &{} \qquad if~{\mathsf {ATK}}={\mathsf {CPA}}\\ {\mathcal {O}}(\cdot )=D(k, \cdot ) &{} \qquad if~{\mathsf {ATK}}={\mathsf {CCA}}\\ {\mathcal {O}}(\cdot )={\mathbf {C}}_{{\mathcal {E}}}(k,{\mathsf {R}}) &{} \qquad if~{\mathsf {ATK}}={\mathsf {RCA}}\\ {\mathcal {O}}(\cdot )=\left\{ D(k, \cdot ),{\mathbf {C}}_{{\mathcal {E}}}(k,{\mathsf {R}})\right\} &{} \qquad if~{\mathsf {ATK}}={\mathsf {CCA}}+{\mathsf {RCA}}\;. \end{array} $$

We say that \({\mathbf {C}}_{{\mathcal {E}}}\) is \((\tau ,\varepsilon )\)-secure in the sense of \((\lambda ,\delta )\)-\({\mathsf {INC}}\)-\({\mathsf {ATK}}\) if having \({\mathcal {A}}\) running in time at most \(\tau \) implies that \({\mathsf{Adv}}_{{\mathcal {A}},{\mathbf {C}}_{{\mathcal {E}}}}^{(\lambda ,\delta )-{\mathsf {INC}}-{\mathsf {ATK}}} \le \varepsilon \).

Notice that for some values of \(\lambda \) and \(\delta \), the \((\lambda ,\delta )\)-incompressibility may be trivially broken. For example, the problem is trivial for \(\delta =1\) since the adversary can always output an arbitrary program of fewer than \(\lambda \) bits whose outputs are unrelated to \(E(k,\cdot )\). Even though the definition allows any \(\delta \in [0,1]\), the notion makes more sense (and surely is harder to break) when \(\delta \) is taken small enough. In that case, the adversary has to output a program which correctly encrypts nearly all plaintexts (or at least a significant fraction).

It seems natural to hope that a reduction exists from \({\mathsf {INC}}\)-\({\mathsf {ATK}}\) to \({\mathsf {UBK}}\)-\({\mathsf {ATK}}\): intuitively, extracting \(k\) from \([E_{{k}}]\) enables one to build a small program that implements \(E(k,\cdot )\). Let \(\lambda (k)\) be the size of that program; it is easily seen that \(\lambda (k)\) is lower-bounded by

$$ \lambda _0 = H(k) + {\mathsf {size}}\left( {P_E}\right) $$

where \(H(k)\) is the average number of bits needed to represent the key \(k\) and \(P_E\) is the smallest known program that implements the generic encryption function \(E(\cdot ,\cdot )\) that takes \(k,m\) as inputs and returns \(E(k,m)\). When \(\lambda _0\le \lambda \), a total break (i.e. recovering the key \(k\)) allows the adversary to break \((\lambda ,0)\)-incompressibility by outputting a program \(P\) composed of \(P_E\) and a string representing \(k\), which will be of size at most \(\lambda _0\) (\(\le \lambda \)).

On the other hand, denoting

$$\begin{aligned} \lambda ^+ = \sup _{k\in {\mathsf {K}},r\in {\mathsf {R}}} {\mathsf {size}}\left( {[E_{{k}}^{{r}}]}\right) ~~~\text {and}~~~ \lambda ^- = \inf _{k\in {\mathsf {K}},r\in {\mathsf {R}}} {\mathsf {size}}\left( {[E_{{k}}^{{r}}]}\right) \;, \end{aligned}$$

we also see that when \(\lambda \ge \lambda ^+\), the challenge program \([E_{{k}}^{{r}}]\) given to \({\mathcal {A}}\) already satisfies the conditions of a satisfactorily compressed program and \({\mathcal {A}}\) may then return \(P = [E_{{k}}^{{r}}]\) as a solution. \((\lambda ,\delta )\)-\({\mathsf {INC}}\) is therefore trivial to break in that case. However, \((\lambda ,\delta )\)-incompressibility for \(\lambda \le \lambda ^-\) may not be trivial to break. To conclude, the \((\lambda ,\delta )\)-incompressibility notion makes sense in practice for parameters \(\lambda \in (\lambda _0,\lambda ^-)\) and \(\delta \) close to \(0\).
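The winning condition of the incompressibility game and its trivial regimes can be illustrated on a toy example. In the sketch below, a program is modelled by a hypothetical `Program` record whose `code` length stands in for \({\mathsf {size}}\left( {P}\right) \); the toy cipher, the table-based compiler and the thresholds 1024 and 4096 bits are all assumptions chosen for illustration, and the key-based program counts only the key bits, ignoring \({\mathsf {size}}\left( {P_E}\right) \).

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Callable

@dataclass
class Program:
    code: bytes                  # encoding of the program; defines size(P)
    run: Callable[[int], int]    # behaviour of the program on a plaintext

def size_bits(p):
    return 8 * len(p.code)

def E(k, m):
    """Toy 8-bit cipher (assumption, not secure)."""
    return (5 * (m ^ k) + 17) % 256

def compile_wb(k, r):
    """Hypothetical compiler: 256-byte masked table plus the nonce."""
    table = bytes(E(k, m) ^ r for m in range(256))
    return Program(code=table + bytes([r]), run=lambda m: table[m] ^ r)

def distance(p, k):
    """Exact Delta(P, E(k, .)) over the 256 plaintexts."""
    errors = sum(1 for m in range(256) if p.run(m) != E(k, m))
    return Fraction(errors, 256)

def inc_wins(p, k, lam, delta):
    """Winning condition of the (lam, delta)-INC game."""
    return size_bits(p) < lam and distance(p, k) <= delta

k, r = 0x3C, 0xA7
challenge = compile_wb(k, r)                                  # 2056 bits
key_based = Program(code=bytes([k]), run=lambda m: E(k, m))   # key bits only
junk = Program(code=b"", run=lambda m: 0)                     # unrelated output

print(inc_wins(challenge, k, 1024, 0))   # False: the challenge is too large
print(inc_wins(key_based, k, 1024, 0))   # True: a total break compresses
print(inc_wins(junk, k, 1024, 1))        # True: delta = 1 is trivial
print(inc_wins(challenge, k, 4096, 0))   # True: lambda above the program size
```

The four calls correspond, in order, to the meaningful regime, the consequence of a key recovery, the trivial case \(\delta = 1\), and the trivial case \(\lambda \ge \lambda ^+\).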

6 A Provably One-Way and Incompressible White-Box Compiler

In this section, we give an example of a symmetric encryption scheme for which there exists an efficient one-way and incompressible white-box compiler. This example is a symmetric-key variant of the RSA cryptosystem [27]. The one-wayness and incompressibility properties of the compiler are provably achieved based on standard hardness assumptions related to the integer factoring problem.

One-Way Compilers from Public-Key Encryption. It is worth noting that any one-way public-key encryption scheme straightforwardly gives rise to a symmetric encryption scheme for which a one-way compiler exists. The symmetric key is defined as the secret key of the asymmetric encryption scheme, and encryption is defined as the function deriving the public key from the secret key composed with the encryption procedure. The white-box compiler then simply produces a program evaluating the encryption algorithm with the public key embedded in it. The one-wayness of the compiler comes directly from the one-wayness of the asymmetric scheme. Such an example of a one-way compiler is given in [29, Theorem 3], [30, Sect. 4.8.2].

We present hereafter another compiler obtained from the RSA cryptosystem and whose one-wayness straightforwardly holds by construction. The main interest of our example is to further satisfy \((\lambda ,0)\)-incompressibility for any arbitrary \(\lambda \). We first recall some background on RSA groups.

6.1 RSA Groups

We consider a (multiplicative) group \(\mathcal {G}\) of unknown order \(\omega \), also called an RSA group. A typical construction for \(\mathcal {G}\) is to take the group of invertible integers modulo a composite number or a carefully chosen elliptic curve over a ring. Practical RSA groups are known to be efficiently samplable in the sense that there exists a group generation algorithm \({\mathbb {G}}\) which, given a security parameter \(n\in {\mathbb {N}}\), outputs the public description \({\mathsf {desc}}\left( {\mathcal {G}}\right) \) of a random group \(\mathcal {G}\) together with its order \(\omega \). Efficient means that the random selection

$$ ({\mathsf {desc}}\left( {\mathcal {G}}\right) ,\omega ) \leftarrow {\mathbb {G}}(1^n) $$

takes time polynomial in \(n\). The parameter \(n\) determines the size of the returned order (i.e. \(|\omega |=n\)) and hence tunes the hardness of breaking the group. For security reasons, we require the returned order \(\omega \) to have a low smoothness. More specifically, we require that it satisfy \(\varphi (\omega ) \ge \frac{1}{3} \omega \), where \(\varphi \) denotes Euler’s totient function (see Footnote 2). The group descriptor \({\mathsf {desc}}\left( {\mathcal {G}}\right) \) is intended to contain all the necessary parameters for performing group operations. Obviously \(\omega \) is excluded from the group description.

In the following, we shall make the usual hardness assumptions for RSA group generators. Namely, we assume that the groups sampled by \({\mathbb {G}}\) have the following properties (formal definitions for these security notions are provided in the full version of this paper [12]):

Unbreakability – \({\mathsf {UBK}}[{\mathbb {G}}]\):

   It is hard to compute the secret order \(\omega \) of \(\mathcal {G}\) from \({\mathsf {desc}}\left( {\mathcal {G}}\right) \).

Hardness of Extracting Orders – \({\mathsf {ORD}}[{\mathbb {G}}]\):

   It is hard to compute the order of a random group element \(x \overset{\scriptscriptstyle {\$}}{\leftarrow }\mathcal {G}\) (or a multiple thereof) from \({\mathsf {desc}}\left( {\mathcal {G}}\right) \).

Hardness of Extracting Roots – \({\mathsf {RSA}}[{\mathbb {G}}]\):

   For a random integer \(e \in [0,\omega )\) such that \(\gcd (e,\omega ) = 1\), it is hard to compute the \(e\)-th root of a random group element \(x \in \mathcal {G}\) from \(e\) and \({\mathsf {desc}}\left( {\mathcal {G}}\right) \).  

6.2 The White-Box Compiler

We consider the symmetric encryption scheme \({\mathcal {E}}= ({\mathsf {K}}, {\mathsf {M}}, {\mathsf {C}}, K, E, D)\) where:

  1. \({\mathcal {E}}\) makes use of a security parameter \(n\in {\mathbb {N}}\),

  2. \(K()\) randomly selects a group \(({\mathsf {desc}}\left( {\mathcal {G}}\right) , \omega )\leftarrow {\mathbb {G}}(1^n)\) and a public exponent \(e \in [0,\omega )\) such that \(\gcd (e,\omega ) = 1\), and returns \(k = ({\mathsf {desc}}\left( {\mathcal {G}}\right) , \omega , e)\),

  3. plaintexts and ciphertexts are group elements i.e. \({\mathsf {M}} = {\mathsf {C}} = \mathcal {G}\),

  4. given a key \(k = ({\mathsf {desc}}\left( {\mathcal {G}}\right) , \omega , e)\) and a plaintext \(m\in \mathcal {G}\), \(E(k,m)\) computes \(m^{e\mod \omega }\) in the group and returns that value,

  5. given a key \(k = ({\mathsf {desc}}\left( {\mathcal {G}}\right) , \omega , e)\) and a ciphertext \(c\in \mathcal {G}\), \(D(k,c)\) computes \(c^{\frac{1}{e}\mod \omega }\) in the group and returns that value.

It is clear that \(D(k,E(k,m)) = m\) for any \(k\in {\mathsf {K}}\) and \(m\in {\mathsf {M}}\). Our white-box compiler \({\mathbf {C}}_{{\mathcal {E}}}\) is then defined as follows:

  1. \({\mathbf {C}}_{{\mathcal {E}}}\) makes use of an additional security parameter \(h\in {\mathbb {N}}\),

  2. the randomness space \({\mathsf {R}}\) is the integer set \([0,2^{h}/\omega )\),

  3. we define the blinded exponent \(f\) with respect to the public exponent \(e\) and a random nonce \(r \in {\mathsf {R}}\) as the integer \(f = e + r \cdot \omega \),

  4. given a key \(k = ({\mathsf {desc}}\left( {\mathcal {G}}\right) , \omega , e) \in {\mathsf {K}}\), and a random nonce \(r \in {\mathsf {R}}\), our white-box compiler \({\mathbf {C}}_{{\mathcal {E}}}\) generates a program \([E_{{k}}]\) which simply embeds \({\mathsf {desc}}\left( {\mathcal {G}}\right) \) and \(f\) and computes \(m^{f}\) for any input \(m\in \mathcal {G}\).

According to the above definition, we clearly have that the white-box program \([E_{{k}}]\) is a functional program with respect to the encryption function \(E(k,\cdot )\). Moreover, we state (see proof in the full version [12]):
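The scheme and compiler described above can be sketched in Python as follows. This is a toy illustration under stated assumptions, not a secure implementation: the group is taken to be \(\mathbb{Z}_N^*\) for a small hard-coded modulus \(N = pq\), and the names `toy_group`, `keygen`, `encrypt`, `decrypt` and `compile_whitebox` are hypothetical helpers that do not appear in the paper.

```python
import math
import secrets

def toy_group(p=1019, q=1187):
    """Toy stand-in for the generator G(1^n): the group Z_N^* with N = p*q.

    Returns (desc, omega), where desc is the modulus N and omega = (p-1)(q-1)
    is the group order. The default primes are an assumption made for
    readability only; real parameters would be far larger.
    """
    return p * q, (p - 1) * (q - 1)

def keygen():
    """K(): sample a group and an exponent e in [0, omega) coprime to omega."""
    desc, omega = toy_group()
    while True:
        e = secrets.randbelow(omega)
        if math.gcd(e, omega) == 1:
            return desc, omega, e

def encrypt(k, m):
    """E(k, m) = m^(e mod omega) in the group."""
    desc, omega, e = k
    return pow(m, e % omega, desc)

def decrypt(k, c):
    """D(k, c) = c^(1/e mod omega) in the group (needs Python 3.8+)."""
    desc, omega, e = k
    return pow(c, pow(e, -1, omega), desc)

def compile_whitebox(k, h):
    """Return a program that embeds desc and f = e + r*omega, but not omega."""
    desc, omega, e = k
    # The number of integers in [0, 2^h / omega) is ceil(2^h / omega).
    r = secrets.randbelow(-(-(2 ** h) // omega))
    f = e + r * omega

    def program(m):
        return pow(m, f, desc)

    return program
```

For any invertible \(m\), the compiled program agrees with \(E(k,\cdot )\) because \(m^{f} = m^{e}\cdot (m^{\omega })^{r} = m^{e}\) in a group of order \(\omega \); for instance, `compile_whitebox(k, 64)(m) == encrypt(k, m)` holds for a key `k = keygen()`.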

Theorem 1

The white-box compiler \({\mathbf {C}}_{{\mathcal {E}}}\) is \({\mathsf {UBK}}\)-\({\mathsf {CPA}}\) secure under the assumption that \({\mathsf {UBK}}[{\mathbb {G}}]\) is hard, and \({\mathsf {OW}}\)-\({\mathsf {CPA}}\) secure under the assumption that \({\mathsf {RSA}}[{\mathbb {G}}]\) is hard.

6.3 Proving Incompressibility Under Chosen Plaintext Attacks

We now show that \({\mathbf {C}}_{{\mathcal {E}}}\) is \((\lambda ,0)\)-\({\mathsf {INC}}\)-\({\mathsf {CPA}}\) secure under \({\mathsf {ORD}}[{{\mathbb {G}}}]\) as long as the security parameter \(h\) is slightly greater than \(\lambda \). We actually show a slightly weaker result: our reduction assumes that the program \(P\) output by the adversary is algebraic. An algebraic program \(P\) (see [5, 26]) with respect to group \(\mathcal {G}\) has the property that each and every group element \(y\in \mathcal {G}\) output by \(P\) is computed as a linear combination of all the group elements \(x_1,\dots ,x_t\) that were given to \(P\) as input in the same execution. Relying on the definition of [26], \(P\) must then admit an efficient extractor Extract (running in time \(\tau _{\mathsf {Ex}}\)) which, given the code of \(P\) as well as all its inputs and random tape for some execution, returns the coefficients \(\alpha _i\) such that \(y=x_1^{\alpha _1}\cdots x_t^{\alpha _t}\).

Theorem 2

For every \(h > \lambda + \log _2(3)\), the compiler \({\mathbf {C}}_{{\mathcal {E}}}\) is \((\tau _{\mathcal {A}}, \varepsilon _{\mathcal {A}})\)-secure in the sense of \((\lambda , 0)\)-\({\mathsf {INC}}\)-\({\mathsf {CPA}}\) under the assumption that \({\mathsf {ORD}}[{{\mathbb {G}}}]\) is \((\tau , \varepsilon )\)-hard, with

$$\tau _{\mathcal {A}}= \tau - \tau _{\mathsf {Ex}} ~~~\mathrm{and}~~~ \varepsilon _{\mathcal {A}}< \frac{3}{1 - 3 \cdot 2^{\lambda -h}} \, \varepsilon ~. $$

The proof of Theorem 2 is provided in the full version of the paper [12].
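To get a feel for the security loss in Theorem 2, the factor \(3/(1 - 3 \cdot 2^{\lambda -h})\) multiplying \(\varepsilon \) can be evaluated numerically. The helper below is only an illustrative sketch; the function name `inc_loss_factor` is hypothetical and not part of the paper.

```python
import math

def inc_loss_factor(lam, h):
    """Return the factor c such that Theorem 2 gives eps_A < c * eps.

    The theorem only applies when h > lam + log2(3); otherwise the
    denominator 1 - 3 * 2^(lam - h) is not positive.
    """
    if h <= lam + math.log2(3):
        raise ValueError("Theorem 2 requires h > lambda + log2(3)")
    return 3 / (1 - 3 * 2.0 ** (lam - h))
```

With \(h = \lambda + 2\) the factor equals 12, while taking \(h = \lambda + 10\) already brings it down to about 3.009, so a few extra bits in \(h\) suffice to make the reduction lose only a factor close to 3.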

Remark 4

The white-box compiler can also be shown to be \((\lambda , 0)\)-\({\mathsf {INC}}\)-\({\mathsf {CCA}}\) secure under the (gap) assumption that \({\mathsf {ORD}}[{{\mathbb {G}}}]\) remains hard when \({\mathsf {RSA}}[{{\mathbb {G}}}]\) is easy. The reduction would work similarly but with an oracle solving \({\mathsf {RSA}}[{{\mathbb {G}}}]\) that it would use to simulate decryption queries.

7 Traceability of White-Box Programs

One of the main applications of white-box cryptography is the secure distribution of valuable content through applications enforcing digital rights management (DRM). Namely, some digital content is distributed in encrypted form to legitimate users. A service user may then recover the content in clear using her own private white-box-secure decryption software.

However, by sharing their decryption software, users may collude and try to produce a pirate decryption software i.e. a non-registered utility capable of decrypting premium content. Traitor tracing schemes [4, 8, 9, 25] were specifically designed to fight copyright infringement, by enabling a designated authority to recover the identity of at least one of the traitors in the malicious coalition who constructed the rogue decryption software. In this section, we show how to apply some of these techniques to ensure the full traceability of programs assuming that slight perturbations of a program's functionality by the white-box compiler can remain hidden from an adversary.

As opposed to previous sections, we interchange the roles of encryption and decryption, considering that for our purpose, user programs would implement decryption rather than encryption.

7.1 Programs with Hidden Perturbations

A program can be made traceable by unnoticeably modifying its functionality. The basic idea is to perturbate the program such that it returns an incorrect output for a small set of unknown inputs (which remains a negligible fraction of the input domain). The set of so-called tracing inputs varies according to the identity of end users so that running the decryption program over inputs from different sets and checking the returned outputs efficiently reveals the identity of a traitor. We consider tracing schemes that follow this approach to make programs traceable in the presence of pirate coalitions. Of course, one must consider collusions of several users aiming to produce an untraceable program from their own legitimate programs. A tracing scheme that resists such collusions is said to be collusion-resistant.

In the context of deterministic symmetric encryption schemes, one can generically describe functional perturbations with the following formalism. Consider a symmetric encryption scheme \({\mathcal {E}}= \left( {\mathsf {K}}, {\mathsf {M}}, {\mathsf {C}}, K, E, D\right) \) under the definition of Sect. 2. A white-box compiler \({\mathbf {C}}_{{\mathcal {E}}}\) with respect to \({\mathcal {E}}\) that supports perturbation takes as additional input an ordered list of dysfunctional ciphertexts \(\varvec{c} = \langle {c_1, \dots , c_u} \rangle \in {{\mathsf {C}}}^u\) and returns a program

$$[D_{{k,\varvec{c}}}^{{r}}] = {\mathbf {C}}_{{\mathcal {E}}}(k,r ; \varvec{c})$$

such that \([D_{{k,\varvec{c}}}^{{r}}](c) = D(k, c)\) for any \(c\in {\mathsf {C}}\setminus \varvec{c}\) and for \(i\in [1, u]\), \([D_{{k,\varvec{c}}}^{{r}}](c_i)\) returns some incorrect plaintext randomly chosen at compilation. We will say that \({\mathbf {C}}_{{\mathcal {E}}}\) hides functional perturbations when, given a program instance \(P = [D_{{k,\varvec{c}}}^{{r}}]\), an adversary cannot extract enough information about the dysfunctional input-output pairs to be able to correct \(P\) back to its original functionality. It is shown later that perturbated programs can be made traceable assuming that it is hard to recover the correct output of dysfunctional inputs. This is formalized by the following game:

  1. randomly select \(k\leftarrow K()\), \(m \overset{\scriptscriptstyle {\$}}{\leftarrow }{{\mathsf {M}}}\) and \(r\overset{\scriptscriptstyle {\$}}{\leftarrow }{\mathsf {R}}\),

  2. compile \([D_{{k, \langle {c } \rangle }}^{{r}}] = {\mathbf {C}}_{{\mathcal {E}}}(k, r ; \langle {c } \rangle )\) with \(c = E(k, m)\),

  3. run \({\mathcal {A}}\) on input \((c, [D_{{k, \langle {c } \rangle }}^{{r}}])\),

  4. \({\mathcal {A}}\) returns some message \(\hat{m}\),

  5. \({\mathcal {A}}\) succeeds if \(\hat{m} = m\).

Definition 4 (Perturbation-Value Hiding)

Let \({\mathcal {E}}\) be a symmetric encryption scheme, \({\mathbf {C}}_{{\mathcal {E}}}\) a white-box compiler for \({\mathcal {E}}\) that supports perturbations, and let \({\mathcal {A}}\) be an adversary. Let

$$ {{\mathsf{Succ}}}_{{\mathcal {A}},{\mathbf {C}}_{{\mathcal {E}}}}^{\mathsf {PVH}} \mathop {=}\limits ^{\scriptscriptstyle {\mathrm {def}}}{\mathrm{{Pr}}}\left[ \begin{array}{c} k\leftarrow K() \,; \quad m \overset{\scriptscriptstyle {\$}}{\leftarrow }{{\mathsf {M}}} \,; \quad c = E(k, m) \,; \quad \\ r\overset{\scriptscriptstyle {\$}}{\leftarrow }{\mathsf {R}} \,; \quad [D_{{k, \langle {c } \rangle }}^{{r}}] = {\mathbf {C}}_{{\mathcal {E}}}(k,r; \langle {c } \rangle ) \,; \quad \\ \hat{m} \leftarrow {\mathcal {A}}^{\mathcal {O}}(c, [D_{{k, \langle {c } \rangle }}^{{r}}]) \end{array} : \hat{m} = m\right] \;. $$

where \({\mathcal {O}}\) is a recompiling oracle \({\mathcal {O}}(\cdot ) \mathop {=}\limits ^{\scriptscriptstyle {\mathrm {def}}}{\mathbf {C}}_{{\mathcal {E}}}(k, {\mathsf {R}}; \langle {c , \cdot } \rangle )\) that takes as input a list of dysfunctional inputs containing \(c\) and returns a perturbated program accordingly, under adversarially unknown randomness. The white-box compiler \({\mathbf {C}}_{{\mathcal {E}}}\) is said to be \((\tau ,\varepsilon )\)-secure in the sense of PVH if \({\mathcal {A}}\) running in time at most \(\tau \) implies \({{\mathsf{Succ}}}_{{\mathcal {A}},{\mathbf {C}}_{{\mathcal {E}}}}^{\mathsf {PVH}}\le \varepsilon \).
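The PVH experiment can be made concrete with a small Python harness. The sketch below is purely illustrative and rests on loud assumptions: the cipher is a toy XOR cipher, the compiler `compile_perturbed` is a hypothetical lookup-table construction, and neither is claimed to be secure. Only the structure of the game (key, challenge pair, perturbed program, recompiling oracle) follows Definition 4.

```python
import random
import secrets

BITS = 16  # toy message/ciphertext size (assumption)

def encrypt(k, m):
    return m ^ k  # toy stand-in for E(k, .)

def decrypt(k, c):
    return c ^ k  # toy stand-in for D(k, .)

def compile_perturbed(k, r, bad):
    """Toy compiler: decrypts correctly except on the ciphertexts in `bad`,
    where it returns an incorrect plaintext fixed at compilation time."""
    rng = random.Random(r)
    junk = {}
    for c in bad:
        wrong = rng.randrange(2 ** BITS)
        while wrong == decrypt(k, c):
            wrong = rng.randrange(2 ** BITS)
        junk[c] = wrong
    return lambda c: junk.get(c, decrypt(k, c))

def pvh_game(adversary):
    """Run one PVH experiment; return True if the adversary recovers m."""
    k = secrets.randbits(BITS)
    m = secrets.randbits(BITS)
    c = encrypt(k, m)
    program = compile_perturbed(k, secrets.randbits(64), [c])

    def oracle(extra):
        # Recompiles under fresh randomness, always keeping c dysfunctional.
        return compile_perturbed(k, secrets.randbits(64), [c, *extra])

    return adversary(c, program, oracle) == m

def naive(c, program, oracle):
    # Simply trusts the program, so it receives the junk value.
    return program(c)

def key_recovery(c, program, oracle):
    # Recovers the toy key from one functional input, then decrypts c.
    other = c ^ 1
    return c ^ (program(other) ^ other)
```

Here `pvh_game(naive)` always fails, since the program is dysfunctional on \(c\), whereas `pvh_game(key_recovery)` always succeeds. The second adversary shows that this toy compiler is not PVH-secure: hiding a perturbation requires, at the very least, that the key cannot be extracted from the program.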

A second security notion that we will make use of for our tracing construction relates to the intuition that all perturbations should be equally hidden by the white-box compiler. Namely, it should not matter in which order the dysfunctional inputs are given to the compiler: they should all appear equally hard to recover to an adversary. When this property is realized, we say that the compiler achieves perturbation-index hiding. We formalize this notion with the following game, where \(n>1\) and \(v\in [1,n-1]\) are fixed parameters:

  1. randomly select \(k\leftarrow K()\),

  2. for \(i\in [1,n]\), randomly select \(m_i \overset{\scriptscriptstyle {\$}}{\leftarrow }{{\mathsf {M}}}\) and set \(c_i = E(k, m_i)\),

  3. for \(i\in [1,n]\) with \(i\ne v\), randomly select \(r_i\overset{\scriptscriptstyle {\$}}{\leftarrow }{\mathsf {R}}\) and generate \(P_i = {\mathbf {C}}_{{\mathcal {E}}}(k, r_i ; \langle {c_1,\dots , c_i} \rangle )\),

  4. randomly pick \(b\overset{\scriptscriptstyle {\$}}{\leftarrow }\{0,1\}\),

  5. run \({\mathcal {A}}\) on inputs \(P_1,\dots ,P_{v-1},P_{v+1},\dots , P_n\) and \((m_{v+b},c_{v+b})\),

  6. \({\mathcal {A}}\) returns a guess \(\hat{b}\) and succeeds if \(\hat{b} = b\).

Definition 5 (Perturbation-Index Hiding)

Let \({\mathcal {E}}\) be a symmetric encryption scheme, \({\mathbf {C}}_{{\mathcal {E}}}\) a white-box compiler for \({\mathcal {E}}\) that supports perturbations, and let \({\mathcal {A}}\) be an adversary. Let

$$\begin{aligned} {\mathsf{Adv}}_{{\mathcal {A}},{\mathbf {C}}_{{\mathcal {E}}}}^{\mathsf {PIH}} \mathop {=}\limits ^{\scriptscriptstyle {\mathrm {def}}}\left| {\mathrm{{Pr}}}\left[ \begin{array}{c} k\leftarrow K() \,; \quad m_i \overset{\scriptscriptstyle {\$}}{\leftarrow }{{\mathsf {M}}} \,; \quad c_i = E(k, m_i) \text{ for } i \in [1,n] \\ r_i\overset{\scriptscriptstyle {\$}}{\leftarrow }{\mathsf {R}} \,; \quad P_i = {\mathbf {C}}_{{\mathcal {E}}}(k, r_i ; \langle {c_1,\dots , c_i} \rangle ) \text{ for } i \in [1,n], i\ne v\\ b\overset{\scriptscriptstyle {\$}}{\leftarrow }\{0, 1\} \,; \quad \hat{b} \leftarrow {\mathcal {A}}(\{P_i\}_{i\ne v}, m_{v+b}, c_{v+b}) \end{array} : \hat{b} = b\right] - \frac{1}{2} \right| \;. \end{aligned}$$

The white-box compiler \({\mathbf {C}}_{{\mathcal {E}}}\) is said to be \((\tau ,\varepsilon )\)-secure in the sense of PIH if \({\mathcal {A}}\) running in time at most \(\tau \) implies \({\mathsf{Adv}}_{{\mathcal {A}},{\mathbf {C}}_{{\mathcal {E}}}}^{\mathsf {PIH}}\le \varepsilon \).

Note that in a PIH-secure white-box compiler, all entries in the list of its dysfunctional inputs can be permuted with no (non-negligible) impact on the security of the compiler.

7.2 A Generic Tracing Scheme

We now give an example of a tracing scheme \({\mathcal {T}}\) for programs generated by a white-box compiler \({\mathbf {C}}_{{\mathcal {E}}}\) that supports hidden perturbations. We formally prove that the identification of at least one traitor is computationally enforced assuming that \({\mathbf {C}}_{{\mathcal {E}}}\) is secure in the sense of PVH and PIH, independently of the total number \(n\) of issued programs. Under these assumptions, \({\mathcal {T}}\) therefore resists collusions of up to \(n\) users i.e. is maximally secure. As usual in traitor-tracing schemes, \({\mathcal {T}}\) is composed of a setup algorithm \({\mathcal {T}}.\mathsf{setup}\) and a tracing algorithm \({\mathcal {T}}.\mathsf{trace}\). These algorithms are defined as follows.

Setup Algorithm. A random key \(k\overset{\scriptscriptstyle {\$}}{\leftarrow }K()\) is generated as well as \(n\) random input-output pairs \((m_i, c_i)\) where \(m_i\overset{\scriptscriptstyle {\$}}{\leftarrow }{\mathsf {M}}\) and \(c_i = E(k, m_i)\) for \(i\in [1, n]\). \({\mathcal {T}}\) keeps \(\mathsf {perturbations}= \left( (m_1, c_1), \dots , (m_n, c_n)\right) \) as private information for later tracing. For \(i\in [1, n]\), user \(i\) is (securely) given the \(i\)-perturbated program \(P_i = {\mathbf {C}}_{{\mathcal {E}}}(k, r_i; \langle {c_1,\dots , c_i } \rangle )\) where \(r_i\overset{\scriptscriptstyle {\$}}{\leftarrow }{\mathsf {R}}\). It is easily seen that all \(P_i\)’s correctly decrypt any \(c\not \in \{c_i, i\in [1,n]\}\). However when \(c = c_{i}\), user programs \(P_{i}, \dots , P_n\) return junk while \(P_1, \dots , P_{i-1}\) remain functional. Therefore \({\mathcal {T}}\) implements a private linear broadcast encryption (PLBE) scheme in the sense of [4].

Tracing Algorithm. Given a rogue decryption program \(Q\) constructed from a set of user programs \(\{P_j \mid j\in T \subseteq [1, n]\}\), \({\mathcal {T}}.\mathsf{trace}\) uses its knowledge of \(k\) and \(\mathsf {perturbations}\) to identify a traitor \(j\in T\) in \(O(\log n)\) evaluations of \(Q\) as follows. Since \(Q\) is just a program and is therefore stateless, the general tracing techniques of [4, 25] are applicable. \({\mathcal {T}}.\mathsf{trace}\) makes use of two probability estimators as subroutines:

  1. a probability estimator \(\widehat{p_0}\) which intends to measure the actual probability

     $$ p_0 = {\mathrm{{Pr}}}\left[ {m\overset{\scriptscriptstyle {\$}}{\leftarrow }{\mathsf {M}}\,; \quad c = E(k, m) : Q(c) = m}\right] $$

     when all calls \(Q\) makes to an external random source are fed with a perfect source. Since the pirate decryption program is assumed to be fully or almost fully functional, \(p_0\) must be significantly close to \(1\). It is classical to require from \(Q\) that \(p_0 \ge 1/2\).

  2. a probability estimator \(\widehat{p_v}\) which, given \(v\in [1, n]\), estimates the actual probability

     $$ p_v = {\mathrm{{Pr}}}\left[ {Q(c_{v}) = m_{v}}\right] $$

     where \(Q\) is run over a perfect random source again.

To estimate \(p_v\) for \(v\in [0,n]\), \(Q\) is executed \(\theta \) times (on fresh random tapes), where \(\theta \) is an accuracy parameter. Then, one counts how many times, say \(\nu \), the returned output is as expected and \(\widehat{p_v}\) is set to \(\nu /\theta \). Finally, \({\mathcal {T}}.\mathsf{trace}\) implements a dichotomic search as shown in Fig. 1.
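The estimation step and a dichotomic search of this kind can be sketched in Python. This is not a transcription of Fig. 1: it is a generic gap-halving binary search written under assumptions, namely a toy XOR cipher, a deterministic toy pirate `toy_pirate`, and the convention (ours) that index \(n+1\) plays the role of \(p_0\). With the setup described above, user \(v\) is the only one whose program behaves differently on \(c_v\) and \(c_{v+1}\), so a gap between \(\widehat{p_v}\) and \(\widehat{p_{v+1}}\) points to user \(v\).

```python
import secrets

BITS = 32  # toy message/ciphertext size (assumption)

def encrypt(k, m):
    return m ^ k  # toy stand-in for E(k, .)

def estimate(Q, draw_pair, theta):
    """Run Q theta times on pairs (m, c) from draw_pair(); return nu / theta."""
    hits = 0
    for _ in range(theta):
        m, c = draw_pair()
        hits += Q(c) == m
    return hits / theta

def trace(Q, k, perturbations, theta=64):
    """Return an index v such that the estimates jump between v and v + 1."""
    n = len(perturbations)

    def fresh_pair():
        m = secrets.randbits(BITS)
        return m, encrypt(k, m)

    def p_hat(v):
        # Index n + 1 stands for p_0 (random ciphertexts); 1..n are tracing inputs.
        if v == n + 1:
            return estimate(Q, fresh_pair, theta)
        return estimate(Q, lambda: perturbations[v - 1], theta)

    lo, hi = 1, n + 1
    p_lo, p_hi = p_hat(lo), p_hat(hi)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        p_mid = p_hat(mid)
        # Keep the half that retains the larger share of the gap.
        if p_hi - p_mid >= p_mid - p_lo:
            lo, p_lo = mid, p_mid
        else:
            hi, p_hi = mid, p_mid
    return lo

def toy_pirate(k, perturbations, coalition):
    """Deterministic pirate behaving like the lowest-indexed traitor's program."""
    j = min(coalition)
    blocked = {c for _, c in perturbations[:j]}
    return lambda c: None if c in blocked else c ^ k
```

The search computes \(2 + \lceil \log _2 n\rceil \) estimates at most, each costing \(\theta \) runs of \(Q\). For a real pirate the estimates are noisy, so \(\theta \) has to be chosen large enough for the gap to remain detectable; the sketch does not address that analysis.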

Fig. 1. Dichotomic search implemented by \({\mathcal {T}}.\mathsf{trace}\)

We state (see proof in the full version [12]):

Theorem 3

Assume \({\mathbf {C}}_{{\mathcal {E}}}\) is secure in the sense of both PVH and PIH. Then for any subset of traitors \(T\subseteq [1, n]\), \({\mathcal {T}}.\mathsf{trace}\) correctly returns a traitor \(j\in T\) with overwhelming probability after \(O(\log n)\) executions of the pirate decryption program \(Q\).

This result validates the folklore intuition according to which cryptographic programs can be made efficiently traceable when properly obfuscated and assuming that slight alterations can be securely inserted in them. It also identifies clearly which sufficient security properties must be fulfilled by the white-box compiler to achieve traceability even when all users collude i.e., in the context of total piracy.