1 Introduction

1.1 Background

The beautiful concept of non-malleable codes [24] has recently emerged at the intersection of cryptography and information theory. Given a function family \(\mathcal {F}\), such codes allow one to encode a \(k\)-bit value \(s\) into an \(n\)-bit codeword \(c\), such that, for each \(f\in \mathcal {F}\), it is unlikely that \(f(c)\) encodes a value \(\tilde{s}\) that is related to \(s\). On the theoretical side, since non-malleability is a weaker guarantee than error correction/detection, it is achievable for very rich families \(\mathcal {F}\); on the practical side, non-malleable codes have interesting applications to cryptography.

Continuous Non-malleability. In the original definition of non-malleable codes, the property of non-malleability is guaranteed as long as a single, possibly adversarial, function \(f\in \mathcal {F}\) is applied to a target codeword. All bets are off, instead, if an adversary can tamper multiple times with the same codeword. While “one-time” non-malleability is already sufficient in some cases, it comes with some shortcomings: for instance, in applications, after each decoding takes place, we always need to re-encode the message using fresh randomness; this can be problematic, as such a re-encoding procedure needs to take place in a tamper-proof environment.

Motivated by these limitations, Faust et al. [29] introduced a natural extension of non-malleable codes where the adversary is allowed to tamper with a target codeword by specifying polynomially-many functions \(f_j\in \mathcal {F}\); in case the functions can be chosen adaptively, depending on the outcome of previous queries, we speak of adaptive tampering, and otherwise we say that tampering is non-adaptive. As argued in [29], such continuously non-malleable codes overcome several limitations of one-time non-malleable codes, and further yield new applications where continuous non-malleability is essential [17, 19, 29, 30].

Bit-Wise and Split-State Tampering. Since non-malleable codes do not involve secret keys, it is impossible to achieve (even one-time) non-malleability against all efficient families of functions \(\mathcal {F}\). (In fact, whenever the encoding and decoding algorithms belong to \(\mathcal {F}\), it is always possible to decode the target codeword, obtain the message, and encode a related value.) For this reason, research on non-malleable codes has focused on obtaining (continuous) non-malleability for limited, yet interesting, particular families. Two prominent examples, which are also the focus of this work, are described below:

  • Bit-wise independent tampering: Here, each function \(f\in \mathcal {F}_\mathsf{bit}^n\) is specified as a tuple \(f:= (f_1,\ldots ,f_n)\), where each \(f_i\) is an arbitrary map determining whether the i-th bit of the codeword should be kept, flipped, set to zero, or set to one. Continuously non-malleable codes for bit-wise independent tampering, with information-theoretic security, exist in the plain model [19] (i.e., without assuming a trusted setup).

  • Split-state tampering: Here, each function \(f\in \mathcal {F}_\mathsf{split}^{n_0,n_1}\) is specified as a pair \(f:= (f_0,f_1)\), where \(n= n_0 + n_1\), and \(f_0\) and \(f_1\) are arbitrary functions to be applied, respectively, to the first \(n_0\) bits and to the last \(n_1\) bits of the codeword. Continuously non-malleable codes for split-state tampering, with computational security, were constructed in the common reference string (CRS) model [26, 29] (i.e., assuming a trusted setup), and very recently in the plain model [43] (assuming injective one-way functions).

It is well known that continuous non-malleability is impossible in the split-state model with information-theoretic security, even for non-adaptive tampering [29]. Furthermore, even non-adaptive continuous non-malleability for both of the above families requires a special “self-destruct” capability, instructing the decoding algorithm to always output the symbol \(\bot \) (meaning “decoding error”) after the first invalid codeword is decoded, as otherwise generic attacks are possible [29, 32].

An important parameter of non-malleable codes is their rate, defined as the asymptotic ratio of the length of the message to the length of its encoding, as the message length goes to infinity. The optimal rate is one, whereas a code has rate zero if the length of the codeword is super-linear in the length of the message. Non-malleable codes with optimal rate for bit-wise independent tampering [8] (with information-theoretic security) and for split-state tampering [1] (with computational security) were recently constructed. To the best of our knowledge, however, the achievable rate of continuously non-malleable codes for the same families is poorly understood.

1.2 Our Contributions

In this paper, we make significant progress towards characterizing the achievable rate of continuously non-malleable codes in the bit-wise independent and split-state tampering models.

Split-State Tampering. In Sect. 3, we give three constructions of continuously non-malleable codes in the split-state model, with a natural trade-off in terms of efficiency, security, and assumptions. In particular, we show:

Theorem 1

(Informal). There exists a continuously non-malleable code in the split-state model in the following settings:

  1. (i) With rate 1 and with computational security against non-adaptive tampering in the common reference string model, assuming collision-resistant hash functions and non-interactive zero-knowledge proofs.

  2. (ii) With rate 1/2 and with computational security against adaptive tampering in the common reference string model, assuming collision-resistant hash functions and non-interactive zero-knowledge proofs.

  3. (iii) With rate 1 and with computational security against adaptive tampering in the non-programmable random oracle model.

Recall that computational security is inherent for non-adaptive continuous non-malleability in the split-state model, even in the random oracle model.

Bit-Wise Independent Tampering. In Sect. 4, we show a similar result for the case of bit-wise independent tampering, unconditionally:

Theorem 2

(Informal). There exists a rate-one continuously non-malleable code against bit-wise independent tampering, achieving information-theoretic security against adaptive tampering in the plain model.

From a technical perspective, the above theorems are proved by exhibiting so-called rate compilers. A rate compiler is a black-box transformation from a rate-zero non-malleable code \(\varSigma \) for some family \(\mathcal {F}\) into a non-malleable code \(\varSigma '\) for the same family and with improved rate. In fact, we show that the rate compilers constructed in [1, 8] already work, with some tweaks, in the continuous case. We stress, however, that while the constructions we analyze are similar to previous work, our security proofs differ significantly from the non-continuous case, and require several new ideas. We refer the reader directly to Sects. 3 and 4 for an overview of the main technical challenges we had to overcome.

1.3 Related Work

Several constructions of non-malleable codes for bit-wise [4, 7, 8, 17, 19, 24] and split-state [1, 2, 3, 5, 6, 12, 16, 20, 23, 24, 25, 26, 29, 39, 40, 43] tampering appear in the literature; out of those, only a few achieve continuous non-malleability [6, 17, 19, 26, 29, 37, 43]. Non-malleable codes also exist for a plethora of alternative models, including bit-wise tampering composed with permutations [7, 8, 16], circuits of polynomial size [15, 24, 31], constant-state tampering [4, 14, 38], block-wise tampering [11], functions with few fixed points and high entropy [37], space-bounded algorithms [10, 28], and bounded-depth circuits [9, 13].

The capacity (i.e., the best achievable rate) of information-theoretic non-malleable coding was first studied by Cheraghchi and Guruswami [15], who established that \(1-\alpha \) is the maximum rate for function families which are only allowed to tamper the first \(\alpha n\) bits of the codeword. This translates into an upper bound of 1/2 for the case of split-state tampering, and we also know that computational assumptions, in particular one-way functions, are necessary to go beyond the 1/2 barrier [1].

Non-malleable codes find applications to cryptography, in particular for protecting arbitrary cryptographic primitives against related-key attacks [24]. In this context, continuous non-malleability is a plus [29, 30]. Additional applications include constructions of non-malleable commitments [34], interactive proof systems [33], and domain extenders for public-key non-malleable encryption [17, 19, 41] and commitments [7].

2 Preliminaries

2.1 Notation

For a string x, we denote its length by |x| and its i-th bit by \(x_i\); if \(\mathcal {X}\) is a set, \(|\mathcal {X}|\) represents the number of elements in \(\mathcal {X}\). When x is chosen uniformly at random in \(\mathcal {X}\), we write \(x\leftarrow \mathcal {X}\). When \(\mathsf {A}\) is an algorithm, we write \(y\leftarrow \mathsf {A}(x)\) to denote a run of \(\mathsf {A}\) on input x with output y; if \(\mathsf {A}\) is randomized, then y is a random variable and \(\mathsf {A}(x;r)\) denotes a run of \(\mathsf {A}\) on input x and randomness r. An algorithm \(\mathsf {A}\) is probabilistic polynomial-time (PPT) if \(\mathsf {A}\) is randomized and for any input \(x,r\in \{0,1\}^*\) the computation of \(\mathsf {A}(x;r)\) terminates in a polynomial number of steps (in the size of the input). Given two strings \(x,y\in \{0,1\}^n\), we define the Hamming distance \(\Delta (x,y):= \sum _{i\in [n]} (x_i + y_i \mod 2)\), where the sum is over the integers.

Negligible Functions. We denote with \(\lambda \in \mathbb {N}\) the security parameter. A function \(\nu :\mathbb {N}\rightarrow [0,1]\) is negligible in the security parameter (or simply negligible) if it vanishes faster than the inverse of any polynomial in \(\lambda \). We sometimes write \(\nu (\lambda )\in \mathtt {negl}(\lambda )\) to denote that \(\nu (\lambda )\) is negligible.

Random Variables. For a random variable \(\mathbf {X}\), we write \(\mathbb {P}\left[ \mathbf {X} = x\right] \) for the probability that \(\mathbf {X}\) takes on a particular value \(x\in \mathcal {X}\) (with \(\mathcal {X}\) being the set where \(\mathbf {X}\) is defined). The statistical distance between two random variables \(\mathbf {X}\) and \(\mathbf {X}'\) defined over the same set \(\mathcal {X}\) is defined as \(\mathbb {SD}\left( \mathbf {X};\mathbf {X}'\right) = \tfrac{1}{2}\sum _{x\in \mathcal {X}}|\mathbb {P}\left[ \mathbf {X} = x\right] - \mathbb {P}\left[ \mathbf {X}' = x\right] |\).
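To make the definition concrete, the following Python sketch (the helper name `statistical_distance` is ours) computes \(\mathbb {SD}\) for two distributions with finite support, given as dictionaries mapping outcomes to probabilities:

```python
def statistical_distance(p, q):
    """SD(X; X') = 1/2 * sum over x of |P[X = x] - P[X' = x]|."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)

# A uniform bit versus a biased bit: 1/2 * (|0.5 - 0.75| + |0.5 - 0.25|) = 0.25.
uniform = {0: 0.5, 1: 0.5}
biased = {0: 0.75, 1: 0.25}
print(statistical_distance(uniform, biased))  # 0.25
```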

Given two ensembles \(\mathbf {X} = \{\mathbf {X}_\lambda \}_{\lambda \in \mathbb {N}}\) and \(\mathbf {Y} = \{\mathbf {Y}_\lambda \}_{\lambda \in \mathbb {N}}\), we write \(\mathbf {X} \equiv \mathbf {Y}\) to denote that they are identically distributed, \(\mathbf {X} \approx _s \mathbf {Y}\) to denote that they are statistically close, i.e., \(\mathbb {SD}\left( \mathbf {X}_\lambda ;\mathbf {Y}_\lambda \right) \in \mathtt {negl}(\lambda )\), and \(\mathbf {X} \approx _c \mathbf {Y}\) to denote that they are computationally indistinguishable, i.e., for all PPT distinguishers \(\mathsf {D}\):

$$ \left| \mathbb {P}\left[ \mathsf {D}(\mathbf {X}_\lambda ) = 1\right] - \mathbb {P}\left[ \mathsf {D}(\mathbf {Y}_\lambda ) = 1\right] \right| \in \mathtt {negl}(\lambda ). $$

2.2 Non-malleable Codes

We start by recalling the standard notion of a coding scheme in the common reference string (CRS) model.

Definition 1

(Coding scheme). Let \(k(\lambda ) = k\in \mathbb {N}\) and \(n(\lambda ) = n\in \mathbb {N}\) be functions of the security parameter \(\lambda \in \mathbb {N}\). A \((k,n)\)-code is a tuple of algorithms \(\varSigma = (\mathsf {Init},\mathsf {Enc},\mathsf {Dec})\) specified as follows: (1) The randomized algorithm \(\mathsf {Init}\) takes as input the security parameter \(\lambda \in \mathbb {N}\), and outputs a CRS \(\omega \in \{0,1\}^{p(\lambda )}\), where \(p(\lambda )\in \mathtt {poly}(\lambda )\); (2) The randomized algorithm \(\mathsf {Enc}\) takes as input the CRS \(\omega \) and a value \(s\in \{0,1\}^k\), and outputs a codeword \(c\in \{0,1\}^n\); (3) The deterministic decoding algorithm \(\mathsf {Dec}\) takes as input the CRS \(\omega \) and a codeword \(c\in \{0,1\}^n\), and outputs a value \(s\in \{0,1\}^k\cup \{\bot \}\) (where \(\bot \) denotes an invalid codeword).

We say that \(\varSigma \) satisfies correctness if for all \(\omega \in \{0,1\}^{p(\lambda )}\) output by \(\mathsf {Init}(1^\lambda )\), and for all values \(s\in \{0,1\}^k\) the following holds: \(\mathbb {P}[\mathsf {Dec}(\omega ,\mathsf {Enc}(\omega ,s))=s] = 1\).

An important parameter of a coding scheme is its rate, i.e. the asymptotic ratio of the length of a message to the length of its encoding (in bits), as the message length increases to infinity. More formally, \(\rho (\varSigma ) := \inf _{\lambda \in \mathbb {N}}\lim _{k\rightarrow \infty }\tfrac{k(\lambda )}{n(\lambda )}\). The best rate possible is 1; if the length of the encoding is super-linear in the length of the message, the rate is 0.

Non-malleability. Let \(\mathcal {F}\) be a family of functions \(\mathcal {F}:= \{f:\{0,1\}^n\rightarrow \{0,1\}^n\}\). The notion of \(\mathcal {F}\)-non-malleability [24] captures the intuition that any modification of a given target encoding via functions \(f\in \mathcal {F}\) yields a codeword that either decodes to the same message as the original codeword, or to a completely unrelated value.

The definition below formalizes the above intuition in a more general setting where non-malleability is required to hold against (fully adaptive) adversaries that can maul the original encoding several times. This is often referred to as continuous non-malleability [29]. Roughly speaking, security is defined by comparing two experiments (cf. Fig. 1). In the “real experiment”, the adversary tampers continuously with a target encoding of a chosen message (possibly dependent on the CRS); for each tampering attempt, represented by a function \(f\in \mathcal {F}\), the adversary learns the outcome corresponding to the decoding of the modified codeword. In the “simulated experiment”, the view of the adversary is faked by a simulator which is completely oblivious of the message being encoded; importantly, the simulator is allowed to return a special symbol \(\diamond \) meaning that (it believes) the tampering function yields a modified codeword which decodes to the original message. Both experiments self-destruct upon the first occurrence of \(\bot \), i.e., they answer all subsequent queries by \(\bot \).

Fig. 1. Experiments defining continuously non-malleable codes. The self-destruct command causes the tamper oracles \(\mathcal {O}_\mathsf{maul}\) and \(\mathcal {O}_\mathsf{sim}\) to return \(\bot \) on all subsequent queries.

Definition 2

(Continuous non-malleability). Let \(\varSigma = (\mathsf {Init},\mathsf {Enc},\mathsf {Dec})\) be a \((k,n)\)-code in the CRS model. We say that \(\varSigma \) is continuously \(\mathcal {F}\)-non-malleable if for all PPT adversaries \(\mathsf {A}:= (\mathsf {A}_0,\mathsf {A}_1)\) there exists a simulator \(\mathsf {S}:= (\mathsf {S}_0,\mathsf {S}_1)\) such that

$$ \left\{ \mathbf {Real}_{\varSigma ,\mathsf {A},\mathcal {F}}(\lambda )\right\} _{\lambda \in \mathbb {N}} \approx _{\text {c}}\left\{ \mathbf {Simu}_{\mathsf {S},\mathsf {A},\mathcal {F}}(\lambda )\right\} _{\lambda \in \mathbb {N}}, $$

where the experiments \(\mathbf {Real}_{\varSigma ,\mathsf {A},\mathcal {F}}(\lambda )\) and \(\mathbf {Simu}_{\mathsf {S},\mathsf {A},\mathcal {F}}(\lambda )\) are defined in Fig. 1.

Remark 1

(Non-adaptive tampering). We model non-adaptive tampering by allowing the adversary \(\mathsf {A}_1\) to submit a single query \((f_j)_{j\in [q]}\) to the oracle \(\mathcal {O}_\mathsf{maul}\), for some polynomial \(q(\lambda )\in \mathtt {poly}(\lambda )\). Upon input such a query, the oracle computes \(\tilde{c}_j = f_j(c)\), and returns \(\tilde{s}_j = \mathsf {Dec}(\omega ,\tilde{c}_j)\) for all \(j\in [q]\) (up to self-destruct). In this case, we say that \(\varSigma \) is non-adaptively continuously \(\mathcal {F}\)-non-malleable.

Tampering Families. We are particularly interested in the following tampering families.

  • Split-state tampering: This is the family of functions \(\mathcal {F}_\mathsf{split}^{n_0,n_1} := \{(f_0,f_1):f_0:\{0,1\}^{n_0}\rightarrow \{0,1\}^{n_0},f_1:\{0,1\}^{n_1}\rightarrow \{0,1\}^{n_1}\}\), for some fixed \(n_0(\lambda ) = n_0\in \mathbb {N}\) and \(n_1(\lambda ) = n_1\in \mathbb {N}\) such that \(n_0 + n_1 = n\). Given an input codeword \(c= (c_0,c_1)\), tampering with a function \((f_0,f_1)\in \mathcal {F}_\mathsf{split}^{n_0,n_1}\) results in a modified codeword \(\tilde{c} = (f_0(c_0),f_1(c_1))\), where \(c_0\) (resp., \(c_1\)) consists of the first \(n_0\) (resp., the last \(n_1\)) bits of \(c\).

  • Bit-wise independent tampering: This is the family of functions \(\mathcal {F}_\mathsf{bit}^n:= \{(f_{1},\ldots ,f_{n}):\forall i\in [n], f_{i}:\{0,1\}\rightarrow \{0,1\}\}\). Given an input codeword \(c= (c_{1},\ldots ,c_{n})\), tampering with a function \(f\in \mathcal {F}_\mathsf{bit}^n\) results in a modified codeword \(\tilde{c} = (f_{1}(c_{1}),\ldots ,f_{n}(c_{n}))\), where each \(f_{i}\) is any of the following functions: (i) \(f_{i}(x) = x\) (\(\mathtt {keep}\)); (ii) \(f_{i}(x) = 1\oplus x\) (\(\mathtt {flip}\)); (iii) \(f_{i}(x) = 0\) (\(\mathtt {zero}\)); (iv) \(f_{i}(x) = 1\) (\(\mathtt {one}\)).
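The two families can be illustrated with a small Python sketch over bit-tuples; the helper names (`apply_split_state`, `apply_bitwise`) and the lambda encodings of \(\mathtt {keep}\), \(\mathtt {flip}\), \(\mathtt {zero}\), \(\mathtt {one}\) are ours, not the paper's:

```python
# Codewords are tuples of bits. Helper names are illustrative, not the paper's.

def apply_split_state(c, n0, f0, f1):
    """Apply (f0, f1) independently to the first n0 and the last n - n0 bits."""
    c0, c1 = c[:n0], c[n0:]
    return tuple(f0(c0)) + tuple(f1(c1))

# The four allowed per-bit maps of F_bit^n.
KEEP = lambda x: x
FLIP = lambda x: 1 ^ x
ZERO = lambda x: 0
ONE = lambda x: 1

def apply_bitwise(c, fs):
    """Apply the i-th map f_i to the i-th bit of c."""
    return tuple(f(x) for f, x in zip(fs, c))

c = (1, 0, 1, 1)
print(apply_bitwise(c, (KEEP, FLIP, ZERO, ONE)))                # (1, 1, 0, 1)
print(apply_split_state(c, 2, lambda h: h[::-1], lambda h: h))  # (0, 1, 1, 1)
```

Note that split-state functions may act arbitrarily within each half (here the left half is reversed), whereas bit-wise tampering is restricted to the four per-bit maps.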

2.3 Authenticated Encryption

A secret-key encryption (SKE) scheme is a tuple of algorithms \(\varPi := (\mathsf {KGen},\mathsf {AEnc},\mathsf {ADec})\) specified as follows: (1) The randomized algorithm \(\mathsf {KGen}\) takes as input the security parameter \(\lambda \in \mathbb {N}\), and outputs a uniform key \(\kappa \in \{0,1\}^d\); (2) The randomized algorithm \(\mathsf {AEnc}\) takes as input a key \(\kappa \in \{0,1\}^d\) and a message \(\mu \in \{0,1\}^k\), and outputs a ciphertext \(\gamma \in \{0,1\}^m\); (3) The deterministic algorithm \(\mathsf {ADec}\) takes as input a key \(\kappa \in \{0,1\}^d\) and a ciphertext \(\gamma \in \{0,1\}^m\), and outputs a value \(\mu \in \{0,1\}^k\cup \{\bot \}\) (where \(\bot \) denotes an invalid ciphertext). The values \(d(\lambda ),k(\lambda ),m(\lambda )\) are all polynomials in the security parameter \(\lambda \in \mathbb {N}\), and sometimes we call \(\varPi \) a \((d,k,m)\)-SKE scheme.

We say that \(\varPi \) meets correctness if for all \(\kappa \in \{0,1\}^d\), and all messages \(\mu \in \{0,1\}^k\), we have that \(\mathbb {P}\left[ \mathsf {ADec}(\kappa ,\mathsf {AEnc}(\kappa ,\mu )) = \mu \right] = 1\) (over the randomness of \(\mathsf {AEnc}\)). As for security, an authenticated SKE scheme should satisfy two properties (see below for formal definitions). The first property, usually known as semantic security, says that it is hard to distinguish the encryptions of any two (adversarially chosen) messages. The second property, usually called authenticity, says that, without knowing the secret key, it is hard to produce a valid ciphertext (i.e., a ciphertext that does not decrypt to \(\bot \)).
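As an illustration of the two properties, here is a minimal one-time authenticated SKE sketch built from encrypt-then-MAC; the key split and the SHA-256-based stream are illustrative standard-library choices of ours, not the instantiation the paper has in mind:

```python
import hashlib, hmac, secrets

# One-time authenticated SKE sketch (encrypt-then-MAC). The key split and the
# SHA-256 counter-mode stream are illustrative stdlib choices, not the paper's.

def kgen():
    return secrets.token_bytes(64)  # kappa = (32-byte enc key, 32-byte mac key)

def _stream(k_enc, n):
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(k_enc + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n]

def aenc(kappa, mu):
    k_enc, k_mac = kappa[:32], kappa[32:]
    body = bytes(a ^ b for a, b in zip(mu, _stream(k_enc, len(mu))))
    return body + hmac.new(k_mac, body, hashlib.sha256).digest()

def adec(kappa, gamma):
    k_enc, k_mac = kappa[:32], kappa[32:]
    body, tag = gamma[:-32], gamma[-32:]
    if not hmac.compare_digest(tag, hmac.new(k_mac, body, hashlib.sha256).digest()):
        return None  # bot: invalid ciphertext (authenticity check failed)
    return bytes(a ^ b for a, b in zip(body, _stream(k_enc, len(body))))

kappa = kgen()
gamma = aenc(kappa, b"secret message")
assert adec(kappa, gamma) == b"secret message"
assert adec(kappa, gamma[:-1] + bytes([gamma[-1] ^ 1])) is None
```

Flipping any ciphertext bit invalidates the HMAC tag, so decryption returns \(\bot \) (here `None`); since the stream is deterministic, a key must encrypt only one message, matching the one-time security notions above.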

Fig. 2. Experiments defining security of SKE.

Definition 3

(Security of SKE). We say that \(\varPi = (\mathsf {KGen},\mathsf {AEnc},\mathsf {ADec})\) is a secure authenticated SKE scheme if the following holds for the games of Fig. 2:

  • For all PPT adversaries \(\mathsf {A}\), we have \(\mathbb {P}\left[ \mathbf {G}^\mathsf{auth}_{\varPi ,\mathsf {A}}(\lambda ) = 1\right] \in \mathtt {negl}(\lambda )\);

  • \(\left\{ \mathbf {G}^\mathsf{ind}_{\varPi ,\mathsf {A}}(\lambda ,0)\right\} _{\lambda \in \mathbb {N}} \approx _{\text {c}}\left\{ \mathbf {G}^\mathsf{ind}_{\varPi ,\mathsf {A}}(\lambda ,1)\right\} _{\lambda \in \mathbb {N}}\).

Note that since both authenticity and semantic security are one-time properties, in principle, information-theoretic constructions with such properties are possible when \(d\ge k\). However, we are interested in constructions where \(k> d\), for which the existence of one-way functions is a necessary assumption.

2.4 Error-Correcting Sharing Schemes

Intuitively, an error-correcting sharing scheme is an error-correcting code satisfying some form of privacy.

Definition 4

(Error-correcting sharing scheme). A \((k,n,T,D)\) error correcting sharing scheme (ECSS) is a triple of algorithms \((\mathsf {Enc},\mathsf {Dec},\mathsf {ECorr})\), where \(\mathsf {Enc}: \{0,1\}^k\rightarrow \{0,1\}^n\) is probabilistic, \(\mathsf {Dec}: \{0,1\}^n\rightarrow \{0,1\}^k\), and \(\mathsf {ECorr}: \{0,1\}^n\rightarrow \{0,1\}^n\cup \{\bot \}\), with the following properties:

  • Correctness: For all \(s \in \{0,1\}^k\), \(\mathsf {Dec}(\mathsf {Enc}(s)) = s\) with probability 1 (over the randomness of \(\mathsf {Enc}\)).

  • Privacy: For all \(s \in \{0,1\}^k\), any subset of up to \(T\) bits of \(\mathsf {Enc}(s)\) are distributed uniformly and independently (over the randomness of \(\mathsf {Enc}\)).

  • Distance: Any two codewords in the range of \(\mathsf {Enc}\) have Hamming distance at least \(D\).

  • Error correction: For any codeword c in the range of \(\mathsf {Enc}\) and any \(\tilde{c} \in \{0,1\}^n\), \(\mathsf {ECorr}(\tilde{c}) = c\) if their Hamming distance is less than \(D/2\), and \(\mathsf {ECorr}(\tilde{c}) = \bot \) otherwise.
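As a toy instance of Definition 4, the following sketch realizes a \((1, 2t, 1, t)\)-ECSS by additively sharing a single bit \(s\) as \((r, r\oplus s)\) and repeating each share \(t\) times; the parameters and helper names are ours, chosen only to make the four properties easy to check by brute force:

```python
from itertools import product

# Toy (k=1, n=2t, T=1, D=t) ECSS: additively share the bit s as (r, r XOR s),
# then repeat each share t times. Parameters and names are illustrative.

T_REP = 7  # repetition factor t; minimum distance D = t

def ecss_enc(s, r):
    # Any single output bit equals r or r XOR s, hence is uniform over r (T = 1).
    return (r,) * T_REP + (r ^ s,) * T_REP

def ecss_dec(c):
    return c[0] ^ c[T_REP]

def _valid_codewords():
    return [ecss_enc(s, r) for s, r in product((0, 1), repeat=2)]

def ecss_ecorr(c_tilde):
    """Return the valid codeword within Hamming distance < D/2, else None (bot)."""
    for c in _valid_codewords():
        if sum(a != b for a, b in zip(c, c_tilde)) < T_REP / 2:
            return c
    return None

c = ecss_enc(1, 0)        # shares (0, 1), each repeated 7 times
noisy = (1, 1) + c[2:]    # 2 flipped bits, fewer than D/2 = 3.5
assert ecss_ecorr(noisy) == c and ecss_dec(ecss_ecorr(noisy)) == 1
```

Two distinct valid codewords differ in at least one whole block of \(t\) repetitions, which gives the distance property; the brute-force `ecss_ecorr` is viable here only because the codeword space of this toy instance is tiny.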

3 Split-State Tampering

In this section, we study several rate-optimizing compilers for continuously non-malleable codes in the split-state setting. As a starting point, in Sect. 3.1, we prove that, under certain assumptions on the initial rate-zero code, the compiler of Aggarwal et al. [1] actually achieves continuous security against non-adaptive tampering. Unfortunately, as we show in the full version [18], the limitation of non-adaptive security is inherent for this particular construction.

Motivated by this limitation, we propose two variants of the rate compiler from [1] that guarantee continuous security in the presence of adaptive tampering attacks. The first variant, which is described in Sect. 3.2, achieves rate 1/2. The second variant, which is described in Sect. 3.3, achieves rate one in the (non-programmable) random oracle model.

3.1 Rate-One Compiler (Non-adaptive Tampering)

Let \(\varSigma = (\mathsf {Init},\mathsf {Enc},\mathsf {Dec})\) be a rate-zero \((d,n)\)-code, and \(\varPi = (\mathsf {KGen},\mathsf {AEnc},\mathsf {ADec})\) be a \((d,k,m)\)-SKE scheme. Consider the following construction of a \((k,n')\)-code \(\varSigma ' = (\mathsf {Init}',\mathsf {Enc}',\mathsf {Dec}')\), where \(n' := m+n\).

  • \(\mathsf {Init}'(1^\lambda )\): Upon input \(\lambda \in \mathbb {N}\), return the same as \(\mathsf {Init}(1^\lambda )\).

  • \(\mathsf {Enc}'(\omega ,s)\): Upon input \(\omega \) and a value \(s\in \{0,1\}^k\), sample a uniform key \(\kappa \in \{0,1\}^d\), compute \(c\leftarrow \mathsf {Enc}(\omega ,\kappa )\) and \(\gamma \leftarrow \mathsf {AEnc}(\kappa ,s)\); return \(c' := c||\gamma \).

  • \(\mathsf {Dec}'(\omega ,c')\): Parse \(c' := c||\gamma \), and let \(\tilde{\kappa } = \mathsf {Dec}(\omega ,c)\). If \(\tilde{\kappa } = \bot \), return \(\bot \) and self-destruct; else let \(\tilde{s}= \mathsf {ADec}(\tilde{\kappa },\gamma )\). If \(\tilde{s}=\bot \), return \(\bot \) and self-destruct; else return \(\tilde{s}\).

Roughly speaking, the compiler uses the underlying (rate-zero) code to encode a uniform key for the authenticated encryption scheme; this key is then used to encrypt the message, and the resulting ciphertext is appended to the encoding of the key. The decoding algorithm first decodes the encoding of the key, and then uses the recovered key to decrypt the ciphertext.
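The data flow of \(\mathsf {Enc}'\)/\(\mathsf {Dec}'\), including the self-destruct behaviour, can be sketched as follows; the inner rate-zero non-malleable code is replaced by a trivial identity placeholder (a real instantiation would use an augmented continuously non-malleable split-state code), the AE is a toy one-time encrypt-then-MAC, and all names are ours:

```python
import hashlib, hmac, secrets

# Sketch of Enc'/Dec'. The inner code is a trivial identity placeholder purely
# to show the data flow; the AE is a toy one-time encrypt-then-MAC.

C_LEN = 32  # length of the inner encoding of kappa, in bytes

def inner_enc(omega, kappa):   # placeholder for Enc(omega, kappa)
    return kappa

def inner_dec(omega, c):       # placeholder for Dec(omega, c)
    return c if len(c) == C_LEN else None

def aenc(kappa, s):
    pad = hashlib.sha256(b"enc" + kappa).digest()
    body = bytes(a ^ b for a, b in zip(s, pad))
    return body + hmac.new(kappa, body, hashlib.sha256).digest()

def adec(kappa, gamma):
    body, tag = gamma[:-32], gamma[-32:]
    if not hmac.compare_digest(tag, hmac.new(kappa, body, hashlib.sha256).digest()):
        return None
    pad = hashlib.sha256(b"enc" + kappa).digest()
    return bytes(a ^ b for a, b in zip(body, pad))

def enc_prime(omega, s):
    kappa = secrets.token_bytes(C_LEN)  # sample a uniform AE key
    c = inner_enc(omega, kappa)         # encode the key, not the message
    gamma = aenc(kappa, s)              # encrypt the message under kappa
    return c + gamma                    # c' := c || gamma

class DecPrime:
    """Stateful decoder implementing the self-destruct behaviour of Dec'."""
    def __init__(self, omega):
        self.omega, self.dead = omega, False

    def __call__(self, c_prime):
        if self.dead:
            return None                 # after self-destruct, always bot
        c, gamma = c_prime[:C_LEN], c_prime[C_LEN:]
        kappa = inner_dec(self.omega, c)
        s = adec(kappa, gamma) if kappa is not None else None
        if s is None:
            self.dead = True            # first bot triggers self-destruct
        return s

omega = b""
dec = DecPrime(omega)
cp = enc_prime(omega, b"short message")
assert dec(cp) == b"short message"
assert dec(cp[:-1] + bytes([cp[-1] ^ 1])) is None  # tampered tag -> bot
assert dec(cp) is None                             # self-destruct persists
```

The last assertion shows the point of self-destruct: once a single invalid codeword is decoded, even the original, untampered codeword is answered with \(\bot \).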

Augmented Continuous Non-malleability. Assume that \(\varSigma \) is non-malleable in the split-state setting, where the encoding \(c\) is split in two halves \(c_0\) and \(c_1\) (consisting of \(n_0\) and \(n_1\) bits, respectively) that can be modified arbitrarily (yet independently). Intuitively, we would like to show that \(\varSigma '\) is continuously non-malleable against the class of split-state functions that modifies \(c_0' := c_0\) and \(c_1' := (c_1,\gamma )\) independently.

The difficulty, originally observed in [1], is that, although \((c_0,c_1)\) is a non-malleable encoding of \(\kappa \) (as long as \(c_0\) and \(c_1\) are mauled independently), the adversary could attempt to (independently) modify \(c_1'\) and \(c_0'\), yielding shares \(\tilde{c}_1' := (\tilde{c}_1,\tilde{\gamma })\) and \(\tilde{c}_0'\) such that \((\tilde{c}_0,\tilde{c}_1)\) decodes to a key \(\tilde{\kappa }\) which is unrelated to \(\kappa \), yet decrypting \(\tilde{\gamma }\) with \(\tilde{\kappa }\) results in a message \(\tilde{s}\) that is related to \(s\).

A similar difficulty, of course, appears in the continuous setting. In order to overcome this obstacle, inspired by the approach taken in [1], we define a notion of augmented continuous non-malleability. This notion is a stronger form of continuous non-malleability where, in the “real experiment”, after \(\mathsf {A}\) is done with its tampering queries, it is additionally given one share of the original encoding (say, \(c_1\)). In turn, the “ideal experiment” features a sort of “canonical” simulator \(\mathsf {S}\) that at the beginning of the simulation computes an encoding \(\hat{c} := (\hat{c}_0,\hat{c}_1)\) of, say, the all-zero string; the dummy encoding \(\hat{c}\) is then used to answer tampering queries from \(\mathsf {A}\), and, after the adversary is done with its tampering queries, the simulator returns \(\hat{c}_1\) to \(\mathsf {A}\). The formal definition appears below.

Fig. 3. Experiments defining augmented continuously non-malleable codes.

Definition 5

(Augmented continuous non-malleability). Let \(\varSigma = (\mathsf {Init},\mathsf {Enc},\mathsf {Dec})\) be a \((k,n)\)-code in the CRS model, and let \(n_0(\lambda ) = n_0\in \mathbb {N}\) and \(n_1(\lambda )=n_1\in \mathbb {N}\) be such that \(n= n_0+n_1\). We say that \(\varSigma \) is augmented continuously \(\mathcal {F}_\mathsf{split}^{n_0,n_1}\)-non-malleable if for all PPT adversaries \(\mathsf {A}:= (\mathsf {A}_0,\mathsf {A}_1,\mathsf {A}_2)\) there exists a simulator \(\mathsf {S}:= (\mathsf {S}_0,\mathsf {S}_1)\) such that

$$ \left\{ \mathbf {Real}_{\varSigma ,\mathsf {A},\mathcal {F}_\mathsf{split}^{n_0,n_1}}^{+}(\lambda )\right\} _{\lambda \in \mathbb {N}} \approx _{\text {c}}\left\{ \mathbf {Simu}_{\mathsf {S},\mathsf {A},\mathcal {F}_\mathsf{split}^{n_0,n_1}}^{+}(\lambda )\right\} _{\lambda \in \mathbb {N}}, \qquad (1) $$

where the experiments \(\mathbf {Real}_{\varSigma ,\mathsf {A},\mathcal {F}_\mathsf{split}^{n_0,n_1}}^{+}\) and \(\mathbf {Simu}_{\mathsf {S},\mathsf {A},\mathcal {F}_\mathsf{split}^{n_0,n_1}}^{+}\) are defined in Fig. 3.

Security Analysis. In the full version [18], we prove the following result.

Theorem 3

Assume that \(\varSigma \) is an augmented continuously \(\mathcal {F}_\mathsf{split}^{n_0,n_1}\)-non-malleable \((d,n)\)-code, and that \(\varPi \) is a secure authenticated \((d,k,m)\)-SKE scheme. Then \(\varSigma '\) as defined in Sect. 3.1 is a non-adaptively continuously \(\mathcal {F}_\mathsf{split}^{n_0,n_1+m}\)-non-malleable \((k,m+n)\)-code.

Remark 2

Similarly to [1], the analysis actually shows that the code \(\varSigma '\) also preserves augmented continuous non-malleability (and not just continuous non-malleability). However, since our goal is to construct continuously non-malleable codes (in the standard sense), we do not give the proof for the augmented case.

We also stress that it suffices to start from an augmented code \(\varSigma \) that is non-adaptively continuously non-malleable. However, we rely on the stronger assumption of full adaptivity in order to simplify the exposition, and because, looking ahead, our instantiation from Sect. 5.1 achieves this property.

Proof Intuition. We sketch the main ideas behind the security proof. We need to describe a simulator \(\mathsf {S}'\) that can emulate arbitrary non-adaptive split-state tampering with a target encoding \(c' := (c_0,(c_1,\gamma ))\) of a message \(s\), without knowing \(s\). Roughly, \(\mathsf {S}'\) does the following.

  • At the beginning, run the simulator \(\mathsf {S}_0^+\) of the underlying augmented non-malleable code, obtaining a fake CRS \(\omega \) and a simulated right share \(\hat{c}_1\).

  • Sample a key \(\kappa \) for the authenticated encryption scheme, and define \(\gamma \) as an encryption of \(0^k\) under the sampled key.

  • Upon receiving a sequence of non-adaptive tampering queries \((f'_{0,j},f'_{1,j})_{j\in [q]}\), behave as follows for each \(j\in [q]\):

    • Invoke the simulator \(\mathsf {S}_1^+\) of the underlying augmented non-malleable code upon \((f'_{0,j},f'_{1,j},\hat{c}_1)\), obtaining a simulated decoded key \(\tilde{\kappa }_j\in \{\diamond ,\bot \}\cup \{0,1\}^d\).

    • Compute the mauled ciphertext \(\tilde{\gamma }_j\) by applying \(f'_{1,j}\) on \((\hat{c}_1,\gamma )\).

  • For each key \(\tilde{\kappa }_j\):

    • If \(\tilde{\kappa }_j = \bot \) set \(\tilde{s}_j := \bot \).

    • Else if \(\tilde{\kappa }_j = \diamond \), set \(\tilde{s}_j := \bot \) in case \(\tilde{\gamma }_j\) is different from the original ciphertext \(\gamma \), and otherwise set \(\tilde{s}_j := \diamond \).

    • Else set \(\tilde{s}_j\) as the decryption of \(\tilde{\gamma }_j\) under \(\tilde{\kappa }_j\).

  • Simulate a self-destruct by taking the minimum index \(j^*\) such that \(\tilde{s}_{j^*} = \bot \), and overwriting all values \(\tilde{s}_{j^*+1},\ldots ,\tilde{s}_q\) with \(\bot \).

  • Return \(\tilde{s}_1,\ldots ,\tilde{s}_q\).
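The post-processing performed by \(\mathsf {S}'\) on the simulated keys \(\tilde{\kappa }_j\) and ciphertexts \(\tilde{\gamma }_j\) is simple enough to sketch directly; the encodings of \(\bot \) and \(\diamond \) as `None` and `"diamond"`, and the `decrypt` callback standing in for \(\mathsf {ADec}\), are our choices:

```python
BOT, SAME = None, "diamond"  # our encodings of the symbols bot and diamond

def simulate_outputs(keys, ciphertexts, gamma, decrypt):
    """Map the simulated keys kappa~_j and ciphertexts gamma~_j to outputs s~_j."""
    outs = []
    for k, g in zip(keys, ciphertexts):
        if k is BOT:
            outs.append(BOT)
        elif k == SAME:
            # diamond survives only if the ciphertext was left untouched.
            outs.append(SAME if g == gamma else BOT)
        else:
            outs.append(decrypt(k, g))
    # Simulate the self-destruct: everything after the first bot becomes bot.
    if BOT in outs:
        j = outs.index(BOT)
        outs[j + 1:] = [BOT] * (len(outs) - j - 1)
    return outs

# Toy decrypt: a "ciphertext" is a tuple that is valid iff it starts with the key.
toy_dec = lambda k, g: g[1:] if isinstance(g, tuple) and g[0] == k else BOT
print(simulate_outputs([SAME, 7, BOT, SAME],
                       ["G", (7, "m"), "x", "G"], "G", toy_dec))
# ['diamond', ('m',), None, None]
```

Note how the final query is answered \(\bot \) even though its key and ciphertext are well-formed: the third query already triggered the self-destruct.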

In order to prove that the above simulation is indeed correct, we define a sequence of hybrid experiments starting with the real experiment (where the adversary \(\mathsf {A}'\) tampers non-adaptively with a target encoding computed using \(\varSigma '\)) and ending with the ideal experiment (where the above simulator is used to answer \(\mathsf {A}'\)’s tampering queries). In the first hybrid, we change the way a non-adaptive tampering query \((f'_{0,j},f'_{1,j})_{j\in [q]}\) is answered. In particular, given each \((f'_{0,j},f'_{1,j})\), we run the augmented simulator \(\mathsf {S}_1^+\) upon \((f_{0,j},f_{1,j})\), where \(f_{0,j}\) is identical to \(f'_{0,j}\), whereas \(f_{1,j}\) is obtained by hard-wiring the ciphertext \(\gamma \) (encrypting the real message \(s\)) into \(f'_{1,j}\). This allows us to get a mauled key \(\tilde{\kappa }_j\) that is then used to decrypt the ciphertext \(\tilde{\gamma }_j\) defined by applying the function \(f'_{1,j}\) on \((\hat{c}_1,\gamma )\), where \(\hat{c}_1\) is the right share of an encoding produced at the beginning of the experiment by running the augmented simulator \(\mathsf {S}_0^+\).

The most interesting part of the proof is to show that the real experiment and the above hybrid are computationally indistinguishable; here, the augmented non-malleability of the underlying code \(\varSigma \) plays a crucial role. For the purpose of this proof sketch, we only focus on this particular step of the proof, and refer the reader to the full proof for the analysis of the other hybrids. The main challenge is to reduce the attacker \(\mathsf {A}'\) against \(\varSigma '\) to an attacker \(\mathsf {A}\) against \(\varSigma \). In fact, the attacker \(\mathsf {A}'\) expects to attack a target encoding of the form \((c_0,(c_1,\gamma ))\), whereas the attacker \(\mathsf {A}\) can only tamper with \((c_0,c_1)\). This issue is resolved by having \(\mathsf {A}\) encrypt the value \(s\) chosen by \(\mathsf {A}'\) under a uniformly random key \(\kappa \) for the authenticated encryption, and by mapping each pair of tampering functions \((f'_{0,j},f'_{1,j})\) into a pair \((f_{0,j},f_{1,j})\) such that \(f_{0,j} := f'_{0,j}\) and \(f_{1,j}(\cdot ) := f'_{1,j}(\cdot ,\gamma )\) (i.e., the ciphertext \(\gamma \) is hard-wired into the right tampering function).

The above trick allows the reduction to obtain a mauled key \(\tilde{\kappa }_j\in \{\diamond ,\bot \}\cup \{0,1\}^d\) that is either distributed as in the real experiment (where decoding takes place) or as in the hybrid experiment (where the augmented simulator \(\mathsf {S}_1^+\) is used). Unfortunately, this information alone is not sufficient to complete the simulation; in fact, the reduction would need to use the key \(\tilde{\kappa }_j\) to decrypt the mauled ciphertext \(\tilde{\gamma }_j\), which is obtained by applying the function \(f'_{1,j}\) upon input the ciphertext \(\gamma \) and either the real share \(c_1\) (in the real experiment) or the simulated share \(\hat{c}_1\) (in the hybrid experiment). Now, if \(\mathsf {A}'\) were fully adaptive, the reduction would get to know the right share of the encoding only after the last tampering query, which makes it difficult to complete the reduction. This is where we rely on the fact that tampering is non-adaptive: in this case \(\mathsf {A}'\) specifies all functions \((f'_{0,j},f'_{1,j})_{j\in [q]}\) in one go, which in turn allows \(\mathsf {A}\) to specify \((f_{0,j},f_{1,j})_{j\in [q]}\) as defined above, obtain all values \((\tilde{\kappa }_j)_{j\in [q]}\) together with the right share (i.e., either \(c_1\) or \(\hat{c}_1\)), compute the ciphertexts \((\tilde{\gamma }_j)_{j\in [q]}\), and finally complete the simulation.

3.2 Rate-1/2 Compiler (Adaptive Tampering)

We now explain how to slightly modify the compiler from Sect. 3.1 in order to get adaptive security, at the price of reducing the rate of the compiled code to 1/2. The main difference is that the authenticated ciphertext \(\gamma \) is stored in both halves of the target codeword, i.e., a codeword is now a tuple \((c_0||\gamma _0,c_1||\gamma _1)\) where \(\gamma _0=\gamma _1:=\gamma \), and the decoding algorithm additionally checks that, indeed, the two ciphertexts \(\gamma _0,\gamma _1\) are the same.
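A minimal executable sketch of this modification (Python; the inner rate-0 code and the authenticated encryption are passed in as abstract callables, and `None` stands for the symbol \(\bot \)):

```python
import secrets

def encode_half_rate(inner_encode, aead_encrypt, s):
    """Sketch of the modified encoder: encrypt s under a fresh key,
    encode the key with the inner rate-0 code, and store the SAME
    ciphertext gamma in both halves of the codeword."""
    kappa = secrets.token_bytes(16)
    gamma = aead_encrypt(kappa, s)
    c0, c1 = inner_encode(kappa)
    return (c0, gamma), (c1, gamma)        # (c0||gamma_0, c1||gamma_1)

def decode_half_rate(inner_decode, aead_decrypt, left, right):
    (c0, gamma0), (c1, gamma1) = left, right
    if gamma0 != gamma1:                   # the additional equality check
        return None                        # None stands for ⊥
    kappa = inner_decode(c0, c1)
    if kappa is None:
        return None
    return aead_decrypt(kappa, gamma0)
```

In a real instantiation `aead_encrypt` would be an authenticated encryption scheme and `inner_encode` a leakage-resilient continuously non-malleable code; here they are placeholders for the syntax only.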

Intuitively, an adaptive adversary cannot store useful information about the inner encoding \(c_1\) in the part of the codeword that stores \(\gamma _1\). The idea is that any such information would have to be guessed on the other side and written into \(\tilde{\gamma }_0\), as otherwise the decoding algorithm would output \(\bot \) with consequent self-destruct; but then the adversary could have guessed this information directly, even without the need of a tampering oracle.

Note that the adversary might still be able to learn some partial information about the inner encoding; however, we show that this is not a problem as long as the underlying rate-0 continuously non-malleable code satisfies the additional property of being leakage resilient [5, 29, 40]. (Augmented non-malleability is not required here.) We defer the formal analysis to the full version of this paper [18].

3.3 Rate-One Compiler (Adaptive Tampering)

We give yet another twist of the rate-optimizing compiler from Sect. 3.1, in order to achieve optimal rate in the (non-programmable) random oracle model. The main idea is to store the ciphertext \(\gamma \) on one share of the codeword, say the right share, as before, and to add the hash of \(\gamma \) on the left share. Specifically, a codeword is now a tuple \((c_0\Vert h,c_1\Vert \gamma )\) where \(h=H(\gamma )\), and the decoding additionally checks that indeed the value \(h\) is equal to \(H(\gamma )\). The intuition is that having \(H(\gamma )\) in one share is equivalent to having \(\gamma \) itself, as in the random oracle model the value \(H(\gamma )\) can be seen as a “handle” for the value \(\gamma \).
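The encoding/decoding structure can be sketched as follows (Python; SHA-256 stands in for the random oracle \(H\), the inner code and the authenticated encryption are abstract callables, and `None` stands for \(\bot \)):

```python
import hashlib
import secrets

def encode_rate_one(inner_encode, aead_encrypt, s):
    """Sketch: the left share carries only the short hash h = H(gamma),
    while the full ciphertext gamma lives on the right share."""
    kappa = secrets.token_bytes(16)
    gamma = aead_encrypt(kappa, s)
    h = hashlib.sha256(gamma).digest()       # H modeled by SHA-256 here
    c0, c1 = inner_encode(kappa)
    return (c0, h), (c1, gamma)              # (c0||h, c1||gamma)

def decode_rate_one(inner_decode, aead_decrypt, left, right):
    (c0, h), (c1, gamma) = left, right
    if hashlib.sha256(gamma).digest() != h:  # check h = H(gamma)
        return None                          # None stands for ⊥
    kappa = inner_decode(c0, c1)
    if kappa is None:
        return None
    return aead_decrypt(kappa, gamma)
```

Since \(h\) has fixed length, only the right share grows with the message, which is what yields rate one.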

Non-malleability in the Random Oracle Model. We start by explaining what it means to construct a continuously non-malleable code in the (non-programmable) random oracle model. First, the construction itself might make use of the random oracle, so that a code is now a tuple \(\varSigma =(\mathsf {Init}^H,\mathsf {Enc}^H,\mathsf {Dec}^H)\) where all algorithms can additionally make random-oracle queries (as in the code sketched above). Second, the adversary \(\mathsf {A}\) is allowed to make random-oracle queries, and to specify split-state tampering functions of the form \(f:= (f_0,f_1)\), such that \(f_0\) and \(f_1\) can additionally query the random oracle.

When defining non-malleability in the random oracle model, we also assume that the simulator can query the random oracle. We restrict to simulators that simply observe the random-oracle queries made by the tampering functions, but do not program the random oracle, i.e., we work in the so-called non-programmable random oracle model.

Proof Intuition. We now give an informal argument for the security of the above construction. We do so by showing a reduction to the continuous non-malleability of the code from Sect. 3.2; in order to simplify the exposition, we sketch the analysis in the programmable random oracle model, where the reduction/simulator is further allowed to program the random oracle. In the full version of this paper [18], we give a (slightly more complicated) direct proof that does not require to program the random oracle.

Let \(\mathsf {A}\) be an adaptive adversary against the security of the rate-one code; we build an adversary \(\mathsf {B}\) against the security of the rate-1/2 code. Adversary \(\mathsf {B}\) simply emulates \(\mathsf {A}\), keeping a list \(\mathcal {Q}_{H,\mathsf {A}}\) of all the random-oracle queries made by \(\mathsf {A}\). Upon input a split-state tampering query \((f_0,f_1)\) from \(\mathsf {A}\), adversary \(\mathsf {B}\) specifies its own tampering function \((f'_0,f'_1)\) as follows:

  • Compute \(h = H(\gamma _0)\), then execute \(f_0(c_0\Vert h)\).

  • Keep a list \(\mathcal {Q}_{H,f}\) of all the queries made by \(f_0\) to the random oracle.

  • When \(f_0\) eventually outputs \((\tilde{c}_0\Vert \tilde{h})\), try to find a value \(\tilde{\gamma }\in \mathcal {Q}_{H,\mathsf {A}} \cup \mathcal {Q}_{H,f}\) such that \(H(\tilde{\gamma })=\tilde{h}\); if such a value is found, output \((\tilde{c}_0\Vert \tilde{\gamma })\), else output \(\bot \).

  • Run \(f_1(c_1\Vert \gamma _1)\).
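The steps above for the left tampering function can be sketched as follows (Python; SHA-256 stands in for \(H\), and the query lists and tampering function are modeled abstractly):

```python
import hashlib

def left_wrapper(f0, queries_A, c0, gamma0):
    """Sketch of f'_0: recompute h = H(gamma_0), run f0 while recording
    its oracle queries in Q_{H,f}, then try to invert the tampered hash
    by searching Q_{H,A} ∪ Q_{H,f}."""
    queries_f = []                            # the list Q_{H,f}
    def H(x):                                 # recording oracle wrapper
        queries_f.append(x)
        return hashlib.sha256(x).digest()
    h = hashlib.sha256(gamma0).digest()
    c0_t, h_t = f0(c0, h, H)                  # f0 may query the oracle
    for cand in queries_A + queries_f:        # search both query lists
        if hashlib.sha256(cand).digest() == h_t:
            return c0_t, cand                 # output (c~_0 || gamma~)
    return None                               # no preimage found: ⊥
```

If the tampered hash was never produced by an oracle query, no candidate is found and the wrapper outputs \(\bot \), matching the last bullet above.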

One can show that \(\mathsf {B}\) simulates almost perfectly the tampering experiment with \(\mathsf {A}\). In fact, the only bad event is when the hash of \(\tilde{\gamma }\) as computed by \(f_1\) is equal to \(\tilde{h}\), but \(\tilde{\gamma }\) has never been queried to \(H\). However, if neither the adversary \(\mathsf {A}\) nor the tampering function \(f_0\) queries the random oracle with \(\tilde{\gamma }\), then the bad event happens only with probability \(2^{-\lambda }\).

In the above description, we did not specify how the reduction treats random-oracle queries asked by the tampering functions \(f_0\) and \(f_1\). The latter can be done by replacing the random oracle \(H\) with the evaluation of a pseudorandom function \(F\) (with random key \(\kappa '\) sampled by the reduction), which we can hard-code in the description of \((f'_0,f'_1)\). This allows the reduction to simulate random-oracle queries consistently, but requires programming the random oracle.
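This replacement step can be sketched as follows (Python; HMAC-SHA256 plays the role of the PRF \(F\), and the tampering function is an abstract callable taking an oracle as a parameter):

```python
import hashlib
import hmac

def wrap_with_prf(f, kappa_prime):
    """Sketch: answer the oracle queries of a tampering function f via a
    PRF F (HMAC-SHA256 here) whose key kappa' is sampled by the
    reduction and hard-coded into the wrapped function's description."""
    def F(x):
        return hmac.new(kappa_prime, x, hashlib.sha256).digest()
    def wrapped(share):
        return f(share, F)                 # f's oracle queries go to F
    return wrapped
```

Because the same key \(\kappa '\) is hard-coded into both wrapped functions, all their oracle queries are answered consistently.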

4 Bit-Wise Tampering

The compiler from Sect. 3 automatically implies a rate-optimizing compiler for continuously non-malleable codes tolerating bit-wise independent tampering (as \(\mathcal {F}_\mathsf{bit}^n\subset \mathcal {F}_\mathsf{split}^{n_0,n_1,n}\)) in the computational setting. However, since continuously non-malleable codes for bit-wise tampering also exist unconditionally [19], it might be possible to obtain such codes with optimal rate in the information-theoretic setting. This section shows that this is indeed possible, by extending the analysis of the compiler from Agrawal et al. [8] to the continuous case.

4.1 Description of the Compiler

The compiler combines a low-rate continuously non-malleable code (CNMC) \(\varSigma '\) against \(\mathcal {F}_\mathsf{bit}^n\) with an error-correcting secret-sharing scheme (ECSS) \(\varPi \) with high rate (cf. Sect. 2.4). The main idea of the compiler is to carefully introduce random errors into an encoding of a message s under \(\varPi \) and record these errors in a tag \(\tau \), which is encoded with \(\varSigma '\).

Specifically, let \(\varPi = (\mathsf {Enc},\mathsf {Dec},\mathsf {ECorr})\) be a \((k,n,T,D)\)-ECSS and \(\varSigma ' = (\mathsf {Init}',\mathsf {Enc}',\mathsf {Dec}')\) be a continuously \(\mathcal {F}_\mathsf{bit}^{n'}\)-non-malleable \((k',n')\)-code. Let \(E\le n\) be a parameter to be set later. Consider the following construction (Footnote 4) of a \((k,n'')\)-code \(\varSigma '' = (\mathsf {Init}'',\mathsf {Enc}'',\mathsf {Dec}'')\), where \(n'' := n+n'\).

  • \(\mathsf {Init}''(1^\lambda )\): Upon input \(\lambda \in \mathbb {N}\), return \(\mathsf {Init}'(1^\lambda )\).

  • \(\mathsf {Enc}''(\omega ,s)\): Upon input \(\omega \) and a message \(s\in \{0,1\}^k\):

    (a)

      Choose a set \(\mathcal I= \{ i_1,\dots ,i_E\} \subseteq [n]\) of cardinality \(E\) and a string \(\xi = (\xi _{i_1},\dots ,\xi _{i_E}) \in \{0,1\}^E\) uniformly at random, and let \(\tau =(\mathcal I,\xi )\) (Footnote 5).

    (b)

      Compute \(a = \mathsf {Enc}(s)\) and, for \(i \in [n]\), let \(c^{(1)}_i = \xi _i\) if \(i \in \mathcal I\), and \(c^{(1)}_i = a_i\) otherwise.

    (c)

      Compute \(c^{(2)} = \mathsf {Enc}'(\omega ,\tau )\) and return \(c = (c^{(1)},c^{(2)})\).

  • \(\mathsf {Dec}''(\omega ,\tilde{c})\): Upon input \(\omega \) and \(\tilde{c}= (\tilde{c}^{(1)},\tilde{c}^{(2)})\),

    (a)

      Compute \(\tau ^* = \mathsf {Dec}'(\omega ,\tilde{c}^{(2)})\). If \(\tau ^* = \bot \), return \(\bot \).

    (b)

      Let \(a^* = \mathsf {ECorr}(\tilde{c}^{(1)})\). If \(a^*=\bot \), return \(\bot \).

    (c)

      Let \(\tau ^* = (\mathcal I^*,\xi ^*)\) with \(\mathcal I^* = \{ i^*_1,\dots ,i^*_E\}\) and \(\xi ^* = (\xi ^*_{i^*_1},\dots ,\xi ^*_{i^*_E})\). Define \(c^* = (c^*_1,\ldots ,c^*_n)\) as

      $$\begin{aligned} c^*_i \ = \ {\left\{ \begin{array}{ll} \xi _i^* &{} \text {if } i \in \mathcal I^*, \\ a_i^* &{} \text {otherwise.} \\ \end{array}\right. } \end{aligned}$$
      (2)

      If \(c^* \ne \tilde{c}^{(1)}\), output \(\bot \).

    (d)

      Return \(\mathsf {Dec}(a^*)\).
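A minimal executable sketch of \(\mathsf {Enc}''/\mathsf {Dec}''\) (Python; \(\varPi \) and \(\varSigma '\) are passed in as abstract callables, bits are modeled as Python ints, and `None` stands for \(\bot \)):

```python
import secrets

def enc2(pi_enc, sigma_enc, s, E):
    """Enc'': encode s under the ECSS, overwrite E random positions
    with fresh random bits, and encode the tag tau = (I, xi)."""
    a = list(pi_enc(s))
    I = sorted(secrets.SystemRandom().sample(range(len(a)), E))
    xi = [secrets.randbelow(2) for _ in I]
    c1 = a[:]
    for pos, bit in zip(I, xi):
        c1[pos] = bit                      # introduce the random errors
    return c1, sigma_enc((I, xi))          # (c^(1), c^(2))

def dec2(pi_ecorr, pi_dec, sigma_dec, c1, c2):
    """Dec'': steps (a)-(d) above."""
    tau = sigma_dec(c2)
    if tau is None:
        return None
    a = pi_ecorr(c1)                       # error-corrected codeword a*
    if a is None:
        return None
    I, xi = tau
    errs = dict(zip(I, xi))
    cstar = [errs.get(i, a[i]) for i in range(len(a))]   # Eq. (2)
    if cstar != list(c1):                  # consistency check
        return None
    return pi_dec(a)
```

The usage below instantiates \(\varPi \) with a toy repetition code (majority-vote error correction, no privacy) and \(\varSigma '\) with the identity, purely to exercise the control flow; neither toy has the security properties required by the theorem.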

4.2 Security Analysis

In the full version [18], we prove the following result (cf. also Sect. 5.2 for a concrete instantiation).

Theorem 4

Let \(\varPi \) be a \((k,n,T,D)\)-ECSS with rate \(\rho = k/n\) and \(T= \omega (\log n)\), and let \(\varSigma '\) be a continuously \(\mathcal {F}_\mathsf{bit}^{n'}\)-non-malleable code with rate \(\rho '\). Then, for any \(E\) satisfying

$$ \frac{n\cdot \omega (\log n)}{D} \ \le \ E\ < \ \frac{D}{4}, $$

\(\varSigma ''\) is a continuously \(\mathcal {F}_\mathsf{bit}^{n+n'}\)-non-malleable code with rate \(\rho '' = \tfrac{k}{\rho ^{-1} k+ 2 \rho '^{-1} E}\).

Proof Intuition. We start with the real security experiment for code \(\varSigma ''\) and consider a series of hybrid experiments \(\mathbf {H}_{1},\mathbf {H}_{2},\mathbf {H}_{3}\) such that a simulation strategy for the ideal experiment is immediately apparent in \(\mathbf {H}_{3}\).

The first hybrid \(\mathbf {H}_{1}\) changes the way the tampered tag \(\tau ^*\) is computed when \(\mathcal {O}_\mathsf{maul}\) answers a tamper query \(f\): Instead of computing it from a tampered encoding \(f^{(2)}(c^{(2)})\), the simulator \(\mathsf {S}_{1}'\) for the underlying non-malleable code \(\varSigma '\) is invoked to determine the outcome of applying \(f\). The indistinguishability of the real experiment and \(\mathbf {H}_{1}\) follows directly from the security of \(\varSigma '\).

Once the switch to \(\mathbf {H}_{1}\) has been made, the right part \(f^{(2)}\) of a tamper function \(f = (f^{(1)},f^{(2)})\) can have one of three effects on the tag \(\tau ^*\), which lead to the definition of the second hybrid \(\mathbf {H}_{2}\):

  1.

    \(\tau ^* = \bot \), in which case the outcome of tampering with \(f\) is \(\bot \) as well.

  2.

    \(\tau ^*\) is equal to the original tag \(\tau \). Thus, if the attacker changes too many bits of the left-hand side encoding \(c^{(1)}\), the result will almost surely be \(\bot \), since the changes are independent of the tag and hence likely to be inconsistent with the parts of \(c^{(1)}\) recorded in it. Correspondingly, \(\mathbf {H}_{2}\) is defined to always answer such tamper queries by \(\bot \). If there are only few changes on the left-hand side, \(\mathbf {H}_{2}\) proceeds as \(\mathbf {H}_{1}\).

  3.

    \(\tau ^*\) is independent of the original tag. Thus, if the attacker overrides too few bits of \(c^{(1)}\), the random errors in \(c^{(1)}\) are highly unlikely to match the corresponding bits in \(\tau ^*\) or to go undetected by the error correction. Correspondingly, \(\mathbf {H}_{2}\) is defined to always answer such tamper queries by \(\bot \). If there are many overrides on the left-hand side, \(\mathbf {H}_{2}\) proceeds as \(\mathbf {H}_{1}\).

To show that hybrids \(\mathbf {H}_{1}\) and \(\mathbf {H}_{2}\) are indistinguishable, one first argues, drawing on an idea from [17], that for every adaptive strategy there is an equally good non-adaptive one (Footnote 6). The advantage of non-adaptive attackers is then bounded via a simple concentration bound, arguing that queries of the types described above are caught with overwhelming probability by comparing the left-hand side to the tag or by performing error correction.

Returning to the case distinction above, it remains to consider the two cases where \(\mathbf {H}_{1}\) was not changed:

  1.

    Suppose \(\tau ^*\) is equal to the original tag \(\tau \) and the tamper function changes only a few bits on the left-hand side. In such a case, it can be shown that the result of the tampering is either the original message s or \(\bot \). The key observation here is that in order to determine which is the case, one needs merely to find out whether the tamper function “guesses” the bits of \(c^{(1)}\) it overrides correctly.

  2.

    Suppose \(\tau ^*\) is independent of the original tag and the tamper function overrides most of the bits on the left-hand side. In this case, it can be argued that the outcome of the tampering is either \(\bot \) or a unique message, stemming from a unique encoding \(\tilde{a}\). To see which is the case, one need only determine if the positions that are not overridden by the tampering function match \(\tilde{a}\).

This process can be abstracted as a guessing game for a randomly generated encoding \(a\) of \(s\), where the game ends in a self-destruct as soon as an incorrect guess is made. The self-destruct property allows to argue that the guessing game for an encoding of \(s\) is indistinguishable from the guessing game for an encoding of, say, the all-zero message (by privacy of the ECSS). Correspondingly, hybrid \(\mathbf {H}_{3}\) is defined to work as \(\mathbf {H}_{2}\), except that it works on an encoding of the all-zero message. The indistinguishability of the hybrids follows directly from the indistinguishability of the guessing games. Since hybrid \(\mathbf {H}_{3}\) is independent of the originally encoded message, it is straightforward to design a simulation strategy.
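The guessing-game abstraction can be sketched as a small stateful oracle (Python; a hypothetical toy interface, with `None` standing for \(\bot \)):

```python
class GuessingGame:
    """Self-destruct guessing game abstracting the two cases above: the
    attacker guesses bits of the hidden encoding a, and the oracle goes
    dead (answers None, i.e. ⊥) after the first wrong guess."""
    def __init__(self, a):
        self._a = list(a)
        self._dead = False
    def guess(self, positions, bits):
        if self._dead:
            return None
        if all(self._a[i] == b for i, b in zip(positions, bits)):
            return True
        self._dead = True                  # wrong guess: self-destruct
        return None
```

The self-destruct rule is what limits the attacker to a single wrong guess, which is the crux of the indistinguishability argument above.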

5 Instantiating the Compilers

5.1 Split-State Model

Rate-One Code (Non-adaptive Tampering). In order to instantiate the compiler from Sect. 3.1, we need to exhibit an augmented continuously non-malleable code in the split-state model. Below, we give a short description of such a code, highlighting the main technical challenges. We assume the reader is familiar with the concept of zero-knowledge proofs.

The Code. The encoding scheme is a variation of the code from [29]. Given a \(k\)-bit string \(s\), its encoding has the form \((c_0,c_1) = ((c_0',h_1,\pi _1),(c_1',h_0,\pi _0))\), where \(h_0\) (resp. \(h_1\)) is a collision-resistant hash of \(c_0'\) (resp. \(c_1'\)), \(\pi _0\) (resp. \(\pi _1\)) is a NIZK proof of knowledge of a pre-image of the hash value \(h_0\) (resp. \(h_1\)), and \((c_0',c_1')\) is a leakage-resilient encoding [21] of the input (Footnote 7). The decoding algorithm first checks the validity of the proofs locally on the left and right share, and then it makes sure that \(h_0\) (resp. \(h_1\)) is indeed the hash of \(c_0'\) (resp. \(c_1'\)); if any of the checks fails, it returns \(\bot \), and else it decodes \((c_0',c_1')\) using the decoding procedure of the leakage-resilient code.

The security proof differs significantly from that of [29]. In particular, we exploit the following additional properties of the leakage-resilient code: (1) It should tolerate so-called noisy leakage [22, 27, 42], meaning that the parameter \(\ell \) is an upper bound on the average min-entropy gap induced by the leakage (and not its bit-length). (2) Indistinguishability should hold even if the distinguisher is given one of the two shares of the target codeword, at the end of the experiment; this property is the one that allows to show augmented non-malleability. (3) For all messages, the distributions corresponding to the two shares \(c_0',c_1'\) of an encoding are almost independent. Property (2) was already used in [29], whereas properties (1) and (3) are easily seen to be met by known constructions.

Simulator. The (augmented) code simulator roughly works as follows. It starts by sampling a dummy encoding \((c_0',c_1')\) of the message \(0^k\) under the leakage-resilient code; then it computes the hash values \(h_0,h_1\) and simulates the zero-knowledge proofs \(\pi _0,\pi _1\); this defines a simulated codeword \((c_0,c_1) = ((c_0',h_1,\pi _1),(c_1',h_0,\pi _0))\). Then, given a tampering query \((f_0,f_1)\), we design a special simulation strategy that outputs a candidate decoded message acting only either on \((f_0,(c_0',h_1,\pi _1))\) or on \((f_1,(c_1',h_0,\pi _0))\). Let \(\tilde{s}_0\) and \(\tilde{s}_1\) be such candidate messages. Finally, as long as \(\tilde{s}_0 = \tilde{s}_1\) the simulator outputs \(\tilde{s}_0\); otherwise it outputs \(\bot \) and self-destructs.

Intuitively, we want to make a reduction to the security of the leakage-resilient code in order to switch the dummy encoding of \(0^k\) with an encoding of the real message. In such a reduction, the values \(\tilde{s}_0\) and \(\tilde{s}_1\) are obtained via leakage queries, and thus the main challenge is to argue that such leakage is allowed. Take for instance the left share. The main observation is that, as long as \(\tilde{s}_0 = \tilde{s}_1\), the leakage on \(c'_0\) reveals no additional information beyond what is revealed by \(c_1'\) and the hash of \(c_0'\). In fact, since \(\tilde{s}_0 = \tilde{s}_1\), the leakage performed on \(c_0'\) could also have been performed on \(c_1'\) (as the leaked values are the same!), and furthermore, by property (3) above and by the fact that the hash is short, those values do not reduce the min-entropy of \(c_0'\) by too much. On the other hand, if \(\tilde{s}_0 \ne \tilde{s}_1\), the amount of leakage can be naively bounded (Footnote 8) by \(2k\), but notice that this happens only once, since the simulator self-destructs after the first \(\bot \) is obtained.

Further Optimizations. Along the way, we were also able to improve the parameters w.r.t. the original proof given in [29]. In particular, the leakage parameter we require from the underlying leakage-resilient code is \(\ell '\in O(\lambda )\) instead of \(\ell '\in \Omega (\lambda \log \lambda )\) as in the original proof. This improvement also yields better efficiency in terms of computational complexity for the zero-knowledge proof system (e.g., when using the Groth-Sahai proof system [35, 36]). The details are deferred to the full version of this paper.

Putting it Together. Summarizing the above discussion, assuming collision-resistant hash functions and non-interactive zero-knowledge proofs, we have obtained a rate-optimal continuously non-malleable code with computational security against non-adaptive split-state tampering in the common reference string model, as stated in item (i) of Theorem 1.

Rate-1/2 Code (Adaptive Tampering). In order to instantiate the compiler from Sect. 3.2, we need a leakage-resilient continuously non-malleable code in the split-state model. Luckily, the above construction inherits leakage resilience from the underlying leakage-resilient code.

Hence, assuming collision-resistant hash functions and non-interactive zero-knowledge proofs, we have obtained a rate-1/2 continuously non-malleable code with computational security against adaptive split-state tampering in the common reference string model, as stated in item (ii) of Theorem 1.

Rate-One Code (Adaptive Tampering). Finally, we can instantiate the compiler from Sect. 3.3 under the same assumptions as the previous code, i.e., all we need is a leakage-resilient continuously non-malleable code in the split-state model. Here, we can further simplify the above construction by relying on the random oracle heuristic, and consider codewords of the form \((c_0,c_1) = ((c_0',h_1),(c_1',h_0))\), where \(h_0\) (resp. \(h_1\)) is computed by hashing \(c_0'\) (resp. \(c_1'\)) via a random oracle. One can prove that this construction achieves (computational) continuous non-malleability in the split-state model.

Hence, we have obtained a rate-optimal continuously non-malleable code with computational security against adaptive split-state tampering in the (non-programmable) random oracle model, as stated in item (iii) of Theorem 1.

5.2 Bit-Wise Independent Model

The ECSS for the \(\mathcal {F}_\mathsf{bit}^n\)-compiler can be instantiated using share packing, as shown in [8]. This results in a \((k,n,T,D)\)-ECSS with \(T= D= \tilde{\Theta }(n^{3/4})\) and \(n= (1+o(1))k\), which in turn allows to choose, e.g., \(E= n^{1/4+\gamma }\) for any \(\gamma > 0\).
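As a numeric sanity check of the window for \(E\) in Theorem 4 under this instantiation, one can plug in illustrative values (Python; polylogarithmic factors hidden in \(\tilde{\Theta }\) are dropped, the \(\omega (\log n)\) term is modeled by \(\log ^2 n\), and \(n\) and \(\gamma \) are arbitrary choices for illustration only):

```python
import math

# Share-packing instantiation: T = D ~ n^{3/4} and E = n^{1/4 + gamma}.
# n must be very large for the asymptotic window to open up.
n = 2.0 ** 400
gamma = 0.05
D = n ** 0.75
E = n ** (0.25 + gamma)

# Lower end of the window: n * omega(log n) / D ~ n^{1/4} * polylog(n)
lower = n * math.log(n) ** 2 / D

assert lower < E < D / 4   # E falls in the admissible window
```

Since \(E\) is sublinear in \(n\) while \(k = (1-o(1))n\), the rate \(\rho '' = k/(\rho ^{-1}k + 2\rho '^{-1}E)\) from Theorem 4 tends to 1 as \(n\) grows.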

The low-rate CNMC \(\varSigma '\) can be instantiated, e.g., by the codes of [16, 19]. Note that such codes are in the plain model (i.e., algorithm \(\mathsf {Init}'\) returns the empty string), and thus Theorem 4 yields a rate-optimal continuously non-malleable code with information-theoretic security against adaptive bit-wise independent tampering, and without trusted setup, as stated in Theorem 2.

6 Conclusions

We have provided several constructions of rate-optimizing compilers for continuously non-malleable codes in the bit-wise independent and split-state tampering models. While in the former case our compiler is optimal both in terms of rate and assumptions (in fact, the result is unconditional), in the latter case we only get rate-optimal codes either for non-adaptive tampering (assuming trusted setup) or in the random oracle model. Thus, the main problem left open by our work is whether rate-one continuously non-malleable codes for the split-state model, with adaptive security and without random oracles, actually exist (with or without trusted setup).