A trivial debiasing scheme for Helper Data Systems

We introduce a debiasing scheme that solves the 'more noise than entropy' problem, which can occur in Helper Data Systems when the source is very biased. We perform a condensing step, similar to Index-Based Syndrome coding, that reduces the size of the source space in such a way that some source entropy is lost, while the noise entropy is greatly reduced. In addition, our method allows for even more entropy extraction by means of a ‘spamming’ technique. Our method outperforms solutions based on the one-pass and two-pass von Neumann algorithms.


Helper Data Systems
The past decade has seen a lot of interest in the field of security with noisy data. In several security applications it is necessary to reproducibly extract secret data from noisy measurements on a physical system. One such application is read-proof storage of cryptographic keys using physical unclonable functions (PUFs) [5,16,18–20]. Another application is the privacy-preserving storage of biometric data.
Storage of keys in nonvolatile digital memory can often be considered insecure because of the vulnerability to physical attacks. (For instance, fuses can be optically inspected with a microscope; flash memory may be removed and read out.) PUFs provide an alternative way to store keys, namely in analog form, which allows the designer to exploit the inscrutability of analog physical behavior. Keys stored in this way are sometimes referred to as Physically Obfuscated Keys (POKs) [12]. In both the biometrics and the PUF/POK case, one faces the problem that some form of error correction has to be performed, but under the constraint that the redundancy data, which are visible to attackers, do not endanger the secret extracted from the physical measurement. This problem is solved by a special security primitive, the Helper Data System (HDS). A HDS in its most general form is shown in Fig. 1. The Gen procedure takes as input a measurement X. It outputs a secret S and (public) Helper Data W. The helper data are stored. In the reconstruction phase, a fresh measurement X′ is obtained. Typically, X′ is a noisy version of X, close to X (in terms of, e.g., Euclidean distance or Hamming distance) but not necessarily identical. The Rec (reconstruction) procedure takes X′ and W as input. It outputs Ŝ, an estimate of S. If X′ is not too noisy then Ŝ = S.
Boris Škorić (b.skoric@tue.nl), TU Eindhoven, Eindhoven, Netherlands
Two special cases of the general HDS are the Secure Sketch (SS) and the fuzzy extractor (FE) [10]. The Secure Sketch has S = X (and Ŝ = X̂, an estimator for X). If X is not uniformly distributed, then S is not uniform. The SS is suitable for privacy-preserving biometrics, where high entropy of S (given W) is required, but not uniformity. The fuzzy extractor is required to have a (nearly) uniform S given W. The FE is typically used for extracting keys from PUFs and POKs.

The code offset method (COM)
By way of example we briefly present the code offset method [4,9,10,14,21], the oldest and most well-known HDS. The COM makes use of a linear code C with k-bit messages and n-bit codewords. The syndrome function is denoted as $\mathrm{Syn}: \{0,1\}^n \to \{0,1\}^{n-k}$. The code supports a syndrome decoding algorithm $\mathrm{SynDec}: \{0,1\}^{n-k} \to \{0,1\}^n$ that maps syndromes to error patterns. Use is also made of a key derivation function $\mathrm{KeyDeriv}: \{0,1\}^n \times \{0,1\}^* \to \{0,1\}^\ell$, with $\ell \le k$. Below we show how the COM can be used as a FE for either a uniform or a non-uniform source X.

Enrollment: The enrollment measurement gives $X \in \{0,1\}^n$. The helper data are computed as W = Syn X. The key is computed as S = KeyDeriv(X, R), where R is (optional) public randomness. The W and R are stored.
Reconstruction: A fresh measurement of the PUF gives $X' \in \{0,1\}^n$. The estimator for X is computed as $\hat{X} = X' \oplus \mathrm{SynDec}(W \oplus \mathrm{Syn}\,X')$, and the reconstructed key is Ŝ = KeyDeriv(X̂, R).
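The enrollment and reconstruction steps can be illustrated with a toy example. The sketch below is not taken from the paper: it assumes the [7,4] Hamming code as the linear code, builds SynDec as a coset-leader lookup table, and uses SHA-256 as a stand-in for KeyDeriv. All function names are hypothetical.

```python
import hashlib
from itertools import product

# Parity-check matrix of the [7,4] Hamming code (illustrative choice, not from the paper).
# Column i (1-based) is the binary expansion of i, so each single-bit error
# produces a distinct nonzero syndrome.
H = [[(col >> row) & 1 for col in range(1, 8)] for row in range(3)]

def syn(x):
    """Syn: {0,1}^n -> {0,1}^(n-k), computed as H x over GF(2)."""
    return tuple(sum(h * b for h, b in zip(row, x)) % 2 for row in H)

def xor(a, b):
    return tuple(i ^ j for i, j in zip(a, b))

# SynDec as a lookup table: syndrome -> lowest-weight error pattern (coset leader).
SYNDEC = {}
for e in sorted(product([0, 1], repeat=7), key=sum):
    SYNDEC.setdefault(syn(e), e)

def key_deriv(x, r=b""):
    """Stand-in for KeyDeriv: hash of the bit string and the public randomness R."""
    return hashlib.sha256(bytes(x) + r).hexdigest()

def enroll(x, r=b""):
    """Return (helper data W, key S) for the enrollment measurement x."""
    return syn(x), key_deriv(x, r)

def reconstruct(x_noisy, w, r=b""):
    """Estimate X from the fresh measurement and W, then re-derive the key."""
    error = SYNDEC[xor(w, syn(x_noisy))]   # W xor Syn X' is the syndrome of the noise
    x_hat = xor(x_noisy, error)
    return key_deriv(x_hat, r)
```

With this code, any single bit flip in the fresh measurement is corrected, so `reconstruct` returns the enrolled key; two or more flips exceed the correction capability of this toy code.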

The problem of bias
After seeing the helper data W as specified above, an attacker has uncertainty H(X|W) about X. As W is a function of X, H(X|W) = H(X) − H(Syn X). Let us consider the simplest possible noise model, the binary symmetric channel (BSC). In the case of a BSC with bit error rate β, the code's redundancy has to satisfy n − k ≥ nh(β) in order for the error correction to work. (Here, h denotes the binary entropy function, h(β) = −β log β − (1 − β) log(1 − β), with 'log' the base 2 logarithm.) More generally, H(Syn X) has to be at least equal to the entropy of the noise. Hence, the COM helper data in the BSC case leak at least nh(β) bits of information about X. This becomes a problem when X itself does not have much entropy, which occurs for instance if the bits in X are highly biased [13,15]. Note that the problem is not fundamental: if the bias parameter (see Sect. 2) is denoted as p, the secrecy capacity is nh(p + β − 2pβ) − nh(β), which is positive.
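To make the 'more noise than entropy' situation concrete, the following sketch evaluates the three quantities named above: the source entropy nh(p), the minimum COM leakage nh(β), and the secrecy capacity nh(p + β − 2pβ) − nh(β). The function names and the example values are illustrative assumptions.

```python
from math import log2

def h(x):
    """Binary entropy function in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)

def com_budget(n, p, beta):
    """Return (source entropy, minimum COM leakage, secrecy capacity) in bits."""
    source = n * h(p)
    leakage = n * h(beta)
    capacity = n * h(p + beta - 2 * p * beta) - n * h(beta)
    return source, leakage, capacity

# Illustrative values with p < beta: the leakage exceeds the source entropy,
# yet the secrecy capacity is still positive.
source, leakage, capacity = com_budget(n=640, p=0.1, beta=0.15)
print(f"nh(p) = {source:.1f}, nh(beta) = {leakage:.1f}, capacity = {capacity:.1f}")
```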
A solution was proposed by Maes et al. [17]. Their approach is to combine debiasing with error correction. For the debiasing they use the von Neumann algorithm in a single-pass or multi-pass manner. Their helper data comprise a selection 'mask' that helps the reconstruction algorithm to identify the locations where von Neumann should yield output.
In this paper we follow a simpler approach similar to the Index-Based Syndrome (IBS) method [22]. In IBS the helper data consist of an ordered list of pointers to locations in X; the content of X in those locations together forms a codeword.
Footnote 1: The notation H stands for Shannon entropy. For information-theoretic concepts, see, e.g., [8].

Contributions and outline
We introduce an alternative solution to the bias problem in helper data systems. We follow the condense-then-fuzzy-extract philosophy proposed by Canetti et al. [7] as one of the available options when faced with 'more noise than entropy' scenarios. Condensing means mapping the source variable X to a smaller space such that most of the entropy of X is retained, but the noise entropy is greatly reduced. Our way of condensing the source is to restrict X to the bit positions U, with U ⊂ {1, …, n} a random subset containing all the rare symbols. The set U becomes part of the helper data.
Our U bears some similarity to the von Neumann mask in [17], but there are important differences. (i) The size of U is tunable. (ii) We can extract source information based on the legitimate party's ability to distinguish U from fake instances of U when a 'spamming' technique similar to [21] is applied.
The outline of this paper is as follows. Section 2 gives the details of the scheme. Section 3 analyzes the extractable entropy and the practicality of the 'spamming' option. Section 4 compares the scheme with other debiasing schemes. In Sect. 5 we make some remarks on the use of min-entropy. Section 6 summarizes and suggests future work.

Debiasing based on subset selection
We will use the following notation. The set {1, …, n} is written as [n]. The notation $X_U$ means X restricted to the positions specified in U. Set difference is written as '\'. A string consisting of n zeroes is written as $0^n$. The Hamming weight of X is denoted as w(X). We will consider a source $X \in \{0,1\}^n$ made up of i.i.d. bits $X_i$ following a Bernoulli distribution with parameter p, i.e., $\Pr[X_i = 1] = p$. Without loss of generality we take $p \in (0, \tfrac{1}{2})$. In particular we are interested in the case p < β, where direct application of the COM fails. The notation 'log' stands for the base-2 logarithm. Information distance (Kullback-Leibler divergence) is denoted as $D(p\|q) = p \log\frac{p}{q} + (1-p)\log\frac{1-p}{1-q}$ for p, q ∈ (0, 1).

The scheme
Below we present a barebones version of our proposed scheme. We omit details concerning the protection of the stored data. There are well-known ways to protect helper data, using either Public Key Infrastructure or one-way functions [6]. We also omit details that have to do with the verification of the reconstructed key. These details can be trivially added.

System setup
The following system parameters are fixed. An integer u satisfying np < u < n, representing the size of U; a list length L; a pseudorandom generator f that takes as input a seed σ and a counter j, and outputs a subset f(σ, j) ⊂ [n] such that |f(σ, j)| = u; a Secure Sketch (Gen, Rec) that acts on a source in $\{0,1\}^u$ and is able to handle bit error rate β; a key derivation function $\mathrm{KDF}: \{0,1\}^u \times [L] \to \{0,1\}^\ell$. All these parameters are public.
We will typically consider u ≥ 2np. With an exponentially small probability it may occur that w(X) ≥ u, leading to $X_U = 1^u$ in step E2. Even if such an exceptional PUF is encountered, the scheme still works.
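The condensing idea, a random size-u subset U that contains all the rare symbols, can be sketched as follows. This is an illustrative reading of the description in the text, not the paper's step-by-step procedure; the function names are hypothetical and positions are 0-based rather than in [n].

```python
import random

def choose_subset(x, u, rng=random):
    """Pick a size-u set of positions that contains every 1 of x, padded with random 0-positions.

    If x has u or more ones (the exceptional case w(X) >= u), only ones are selected.
    """
    ones = [i for i, bit in enumerate(x) if bit == 1]
    zeros = [i for i, bit in enumerate(x) if bit == 0]
    if len(ones) >= u:
        chosen = rng.sample(ones, u)
    else:
        chosen = ones + rng.sample(zeros, u - len(ones))
    # Sorting removes any ordering that would reveal which positions hold the ones.
    return sorted(chosen)

def condense(x, subset):
    """Restrict x to the positions in subset (the string X_U)."""
    return [x[i] for i in subset]
```

For a typical source string, `condense(x, choose_subset(x, u))` has the same Hamming weight as `x` but only u positions, which is what reduces the noise entropy from nh(β) to uh(β).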

Explanation of the scheme
The effect of steps E4, E5 is to create a list of U-candidates, of which only entry z is correct. To an attacker (who knows u but does not know X or X′) the L candidates are indistinguishable. Steps R3-R5 allow for a quick search to identify the index z of the correct U-candidate. Note that the reconstruction algorithm compares a permuted M to L-entries instead of M to permuted L-entries; this improves speed. To further optimize for speed, steps R4 and R5 can be combined in a loop to select good z values on the fly as soon as a new $L_j$ is generated.
Note that extremely fast pseudorandom generators exist which produce more than 8 bits per clock cycle [1,2]. This makes it practical to work with large values of L, as long as not too many plausible z-candidates are generated. See Sect. 3.3. Even on CPU-constrained devices (clock speed of MHz order) it should be possible to achieve $L = 2^{10}$.
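The on-the-fly search for plausible indices can be sketched as a single loop over the list. The sketch below is a simplification under stated assumptions: it leaves out the permutation π, uses a hypothetical `candidate_subset` generator seeded by the index alone, and accepts a candidate when the noisy measurement restricted to it has Hamming weight of at least a threshold θ, as discussed in Sect. 3.3.

```python
import random

def candidate_subset(j, n, u):
    """Hypothetical stand-in for the j-th U-candidate: a pseudorandom size-u subset of range(n)."""
    return sorted(random.Random(j).sample(range(n), u))

def plausible_indices(x_noisy, n, u, list_len, theta, candidate=candidate_subset):
    """Return every index j whose candidate subset holds at least theta ones of x_noisy."""
    hits = []
    for j in range(list_len):
        subset = candidate(j, n, u)               # generate candidate j on the fly
        weight = sum(x_noisy[i] for i in subset)  # Hamming weight of x_noisy on the candidate
        if weight >= theta:                       # keep it only if it looks like the real U
            hits.append(j)
    return hits
```

Each index returned would then cost one key reconstruction and one verification, so the threshold should be chosen such that the list of hits stays short.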
We did not explicitly specify how to map a seed to a size-u subset of [n]. A very straightforward algorithm would be to pick u pseudorandom locations in [n].
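One way to realise the straightforward mapping is shown below. It is only a sketch: the name `f`, the use of SHA-256 to combine seed and counter, and the use of Python's `random.Random` (a Mersenne Twister, which is not cryptographically secure) are all assumptions made for illustration. A real implementation would substitute a vetted pseudorandom generator.

```python
import hashlib
import random

def f(seed: bytes, j: int, n: int, u: int):
    """Map (seed, counter j) to a pseudorandom size-u subset of {0, ..., n-1}."""
    digest = hashlib.sha256(seed + j.to_bytes(8, "big")).digest()
    # Illustrative generator only; random.Random is NOT suitable for cryptographic use.
    rng = random.Random(int.from_bytes(digest, "big"))
    return sorted(rng.sample(range(n), u))
```

The mapping is deterministic, so the enrolling and the reconstructing party obtain the same list of candidates from the stored seed.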
We did not specify an algorithm for determining the permutation π, nor did we specify in which form π is stored. These are minor details and have no impact on the overall efficiency of the scheme, since steps E5 and R3 are performed only once. The computational bottleneck is R4, R5. For details about permutations we refer to [3,11].
Note that inputting z into the key derivation function increases the entropy of S by log L bits. If the PUF has ample entropy then L = 1 suffices, and one can skip all steps involving the seed σ and the permutation π; the U itself is stored as helper data. This yields a scheme that is very fast and implementable on resource-constrained devices.

Entropy after condensing
The Hamming weight of X carries very little information. Let us assume for the moment that T = w(X) ∈ {0, …, u} is known to the attacker, just to simplify the analysis.
Even if the attacker knows t and U (i.e., z), there are $\binom{u}{t}$ equally probable possibilities for Y. Hence, $H(Y|Z=z, T=t) = \log\binom{u}{t}$, (2) which can be bounded from below by means of Stirling's approximation. As (2) does not depend on z, the entropy H(Y|T = t) is also given by (2). A lower bound on H(Y|T) is obtained by taking the expectation over t. This turns out to be rather messy, since the distribution of t is a truncated binomial. (It is given that t ≤ u, while originally w(X) ∈ {0, …, n}.) As t equals approximately np on average, the result is $H(Y|T) \approx u\,h(np/u)$. A more precise lower bound is given in Theorem 1 below.
Theorem 1 Let δ be defined as Then, the entropy of Y given T can be lower bounded as
Proof The proof is rather tedious and can be found in 'Appendix A.'
The entropy of Y is obtained as follows, H(Y) = H(T) + H(Y|T). The H(T) is the entropy of the truncated binomial distribution.
Theorem 2 Let $q_t = \binom{n}{t} p^t (1-p)^{n-t}$ denote the probabilities in the full binomial distribution. Let δ be defined as in Theorem 1.
Proof Follows directly from Theorems 1 and 2 by adding the two bounds.
The complicated expression (6) can be well approximated by uh(np/u) − u/(np). Note that the difference between the original source entropy H(X) = nh(p) and the condensed form H(Y) ≈ uh(np/u) − u/(np) is considerable. For example, setting n = 640, p = 0.1, u = 128 yields H(Y) ≈ 126 and H(X) ≈ 300. A small part of this huge difference can be regained using the trick with the entropy of Z. The practical aspects are discussed in Sect. 3.3. Note also that our scheme outperforms simple von Neumann debiasing by a factor of at least 2. The von Neumann algorithm takes n/2 pairs of bits; each pair has a probability 2p(1 − p) of generating a (uniform) output bit; hence, the extracted entropy is np(1 − p) < np. In our scheme, if we set u ≈ 2np we get H(Y) ≈ 2np − 2. Furthermore, H(Y) is an increasing function of u.
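The numbers quoted in this comparison can be checked with a few lines of arithmetic. The sketch below uses only the approximation H(Y) ≈ uh(np/u) − u/(np) stated in the text, not the exact bound; the function names are illustrative.

```python
from math import log2

def h(x):
    """Binary entropy function in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)

def source_entropy(n, p):
    return n * h(p)                            # H(X) = n h(p)

def condensed_entropy_approx(n, p, u):
    return u * h(n * p / u) - u / (n * p)      # approximation of H(Y) from the text

def von_neumann_entropy(n, p):
    return n * p * (1 - p)                     # n/2 pairs, each yielding a bit w.p. 2p(1-p)

n, p, u = 640, 0.1, 128
print(round(source_entropy(n, p)),
      round(condensed_entropy_approx(n, p, u)),
      round(von_neumann_entropy(n, p), 1))
```

For these parameters the condensed entropy is more than twice the von Neumann output, in line with the factor-of-2 claim for u ≈ 2np.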

Fuzzy Extraction after condensing
The code offset method applied to $Y \in \{0,1\}^u$ leaks at least uh(β) bits of information about Y. In case of a 'perfect' error-correcting code, the length of a noise-robust key reconstructed with the COM is H(Y) − uh(β). In Fig. 2 we plot H(Y) − uh(β) for some example parameter settings. (Note that in all cases shown here β ≥ p holds; the COM acting on the original source X would be unable to extract any entropy.) Clearly, there is an optimal u for given (n, p, β). In practice one is given p and β and has to find (n, u) such that H(Y) − uh(β) is large enough.
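The existence of an optimal u can be explored numerically. The sketch below is an assumption-laden illustration: it replaces H(Y) by the approximation uh(np/u) − u/(np) from Sect. 3.1 rather than the bound used for Fig. 2, and it searches all integers u with np < u < n by brute force.

```python
from math import floor, log2

def h(x):
    """Binary entropy function in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)

def key_length_approx(n, p, beta, u):
    """Approximate H(Y) - u h(beta), with H(Y) ~ u h(np/u) - u/(np)."""
    return u * h(n * p / u) - u / (n * p) - u * h(beta)

def best_subset_size(n, p, beta):
    """Brute-force search for the u in (np, n) that maximises the approximate key length."""
    candidates = range(floor(n * p) + 1, n)
    return max(candidates, key=lambda u: key_length_approx(n, p, beta, u))

u_star = best_subset_size(n=640, p=0.1, beta=0.1)
print(u_star, round(key_length_approx(640, 0.1, 0.1, u_star), 1))
```

Increasing u retains more source entropy but also increases the leakage uh(β), which is why the maximum lies strictly between np and n.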

The list size L
We briefly discuss how large L can be made before the reconstruction procedure sketched in Sect. 2 starts to produce too many candidates for z. We define p′ = p + β − 2pβ.
On the one hand, there is the number of '1' symbols in $X'_U$ for the correct U. The number of 1's in $X_U$ is on average np. Of these, on average np(1 − β) will be a '1' in X′. The (u − np) zeroes in $X_U$ will generate on average (u − np)β 1's in X′. Hence, the number of 1's in $X'_U$ is expected to be around np(1 − β) + (u − np)β = np + (u − 2np)β.
On the other hand, there is the number of 1's for incorrect U-candidates. The number of 1's in X′ is approximately np′. We pretend that np′ is integer, for notational simplicity. We denote by A ∈ {0, …, np′} the number of 1's in $X'_V$ for a randomly chosen subset V, with |V| = u. The A follows a hypergeometric probability distribution
$$\Pr[A = a] = \frac{\binom{np'}{a}\binom{n-np'}{u-a}}{\binom{n}{u}} = \frac{\binom{u}{a}\binom{n-u}{np'-a}}{\binom{n}{np'}}. \quad (7)$$
The first expression looks at the process of selecting u out of n positions with exactly a 1's hitting the np′ existing 1's in X′; the second expression looks at the process of selecting np′ positions in X′ such that exactly a of them lie in V.
Fig. 2 The key length H(Y) − uh(β) that the code offset method can reproducibly extract after the condensing step, plotted as a function of u for various parameter values. For H(Y) the bound in Theorem 3 is used
We have $E_a\,a = up'$ and $E_a (a - up')^2 = up'(1-p')(n-u)/(n-1) < up'$. In other words, a is sharply peaked around up′. We can put a threshold θ somewhere in the gap between up′ and np + (u − 2np)β, and declare a U-candidate V to be bad if the Hamming weight of $X'_V$ is lower than θ. Let us set $\theta = np + (u - 2np)\beta - c\sqrt{np}$ with c sufficiently large to avoid false negatives (i.e., missing the correct U). In a way analogous to (3), we can bound the false positive probability Pr[A > θ]. These bounds are obtained by applying (3) and replacing u → θ, n → u, p → p′. It is important to note that it is perfectly possible to make L extremely large. Then, many false positives occur, but this is not a fundamental problem. It requires extra work: one key reconstruction and one verification (e.g., of a key hash) per false positive. Depending on the available n, the computing platform, etc., this may be a viable option.
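The false positive probability Pr[A > θ] can also be obtained by direct numerical summation of the hypergeometric distribution. The sketch below does this with exact integer binomials; rounding np′ to the nearest integer mirrors the 'pretend np′ is integer' simplification, and the parameter choice β = p, u = 2.25np, c = 4 is taken from the Fig. 3 caption. Function names are hypothetical.

```python
from math import comb, log2, sqrt

def hypergeom_pmf(a, n, ones, u):
    """Pr[A = a]: a random size-u subset of n positions hits exactly a of the `ones` 1-positions."""
    return comb(ones, a) * comb(n - ones, u - a) / comb(n, u)

def false_positive_tail(n, p, beta, u, c):
    """Return (theta, Pr[A > theta]) for a randomly chosen (incorrect) U-candidate."""
    p_noisy = p + beta - 2 * p * beta            # p' = probability of a 1 in X'
    ones = round(n * p_noisy)                    # pretend n p' is an integer
    theta = n * p + (u - 2 * n * p) * beta - c * sqrt(n * p)
    tail = sum(hypergeom_pmf(a, n, ones, u)
               for a in range(min(ones, u) + 1) if a > theta)
    return theta, tail

n, p = 640, 0.1
theta, tail = false_positive_tail(n, p, beta=p, u=round(2.25 * n * p), c=4)
print(f"theta = {theta:.1f}, extractable bits ~ {-log2(tail):.1f}")
```

The quantity −log Pr[A > θ] printed at the end is the approximate number of bits b = log L that the index z can contribute before false positives become likely.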

Comparison to other debiasing schemes
The main algorithms to compare against are given in the CHES 2015 paper by Maes et al. [17]. They introduce several schemes that perform debiasing in the context of a helper data system: Classic Von Neumann (CVN), Pair-Output (2O-VN) and Multi-Pass Tuple Output (MP-TO-VN). The following figures of merit are important: (i) the amount of entropy retained in Y, from the original nh(p) contained in X; (ii) the amount of work required during the reconstruction phase to derive Ŷ from X′.
Here, we will not include the additional entropy obtained from Z in our scheme.The procedure for reconstructing ẑ has no equivalent in [17], which makes comparison very difficult.

[Table: columns Scheme, Retained entropy, Reconstruction of Ŷ; surviving entry under Reconstruction of Ŷ: 'Take subset of X′']
The reconstruction effort is practically the same in all these schemes and is very low.
The entropy estimates are obtained as follows. The result np(1 − p) for the original von Neumann algorithm is discussed in Sect. 3.1. The CVN and 2O-VN retain exactly this amount. In the second VN pass as described in [17], there are n/2 − np(1 − p) bit pairs left after the first pass, and in each bit pair the two bits have the same value. In the second pass, this gives rise to $\tfrac{1}{2}[\tfrac{n}{2} - np(1-p)]$ von Neumann comparisons. Each comparison yields an output bit with probability $2\cdot\frac{p^2}{p^2+(1-p)^2}\cdot\frac{(1-p)^2}{p^2+(1-p)^2}$. Hence, the expected number of output bits in the second pass is $\frac{n}{2}\cdot\frac{p^2(1-p)^2}{p^2+(1-p)^2}$. Note that the two-pass MP-TO-VN adds at most 25% to the CVN entropy, while trivial debiasing adds slightly more than 100%.
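The 25% figure can be verified numerically. The sketch below assumes that a leftover pair equals 11 with probability p²/(p² + (1 − p)²) and 00 otherwise, which follows from conditioning on both bits being equal; the function names are illustrative.

```python
def cvn_bits(n, p):
    """Expected output of one von Neumann pass: n/2 pairs, each yielding a bit w.p. 2p(1-p)."""
    return n * p * (1 - p)

def second_pass_bits(n, p):
    """Expected output of the second pass over the pairs left after the first pass."""
    same = p ** 2 + (1 - p) ** 2                 # probability that a pair is 00 or 11
    pairs_left = n / 2 - n * p * (1 - p)         # pairs that produced no output in pass one
    comparisons = pairs_left / 2                 # leftover pairs are compared two at a time
    prob_output = 2 * (p ** 2 / same) * ((1 - p) ** 2 / same)
    return comparisons * prob_output

for p in (0.05, 0.1, 0.2, 0.3, 0.5):
    gain = second_pass_bits(640, p) / cvn_bits(640, p)
    print(f"p = {p}: second pass adds {100 * gain:.1f}% to the CVN output")
```

The relative gain equals p(1 − p) / (2(p² + (1 − p)²)), which is largest at p = 1/2, where it is exactly 25%; for a strongly biased source it is much smaller.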

Some remarks on min-entropy
One could take the point of view that the relevant quantity to study is min-entropy instead of Shannon entropy, since we are deriving cryptographic keys. The min-entropy of the source is $H_{\min}(X) = n \log (1-p)^{-1}$, corresponding to the all-zero string. For small p this is significantly smaller than the Shannon entropy H(X) = nh(p). On the other hand, the entropy loss is also smaller when computed in terms of min-entropy.
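The gap between the two entropy measures is easy to quantify for a given bias. The following sketch compares them for the example parameters used earlier in the paper (n = 640, p = 0.1); the function names are illustrative.

```python
from math import log2

def shannon_entropy(n, p):
    """H(X) = n h(p) for n i.i.d. Bernoulli(p) bits."""
    return n * (-p * log2(p) - (1 - p) * log2(1 - p))

def min_entropy(n, p):
    """H_min(X) = n log(1/(1-p)), set by the most likely string (all zeroes) for p < 1/2."""
    return n * log2(1 / (1 - p))

n, p = 640, 0.1
print(f"Shannon: {shannon_entropy(n, p):.1f} bits, min-entropy: {min_entropy(n, p):.1f} bits")
```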
Theorem 4 Consider a linear binary code with message length k and codeword length n that is able to correct t errors. Let $X \in \{0,1\}^n$ consist of i.i.d. bits that are Bernoulli-distributed with parameter $p < \tfrac{1}{2}$.
The proof is given in 'Appendix C.' For codes that are far from perfect, the last term in (9) is negligible. However, there are strong arguments against using min-entropy in the context of biased PUFs. A situation where X has a Hamming weight far below the typical value np can be seen as a hardware error and is likely to occur only when the chip itself is malfunctioning. If we condition all our probabilities on the premise that the hardware is functioning correctly, then we are back in the typical regime; there min-entropy is almost identical to Shannon entropy.

Summary
We have introduced a method for source debiasing that can be used in Helper Data Systems to solve the 'more noise than entropy' problem. Our method applies the condense-then-fuzzy-extract idea [7] in a particularly simple way: the space $\{0,1\}^n$ is condensed to $\{0,1\}^u$ in such a way that all the rare symbols are kept; meanwhile, the noise entropy is reduced from nh(β) to uh(β). Theorem 3 gives a lower bound on the retained entropy H(Y). Furthermore, there is the option of extracting additional entropy from the index z, which points at the real subset U among the fakes. Even in its bare form, without the fake subsets, our method outperforms basic von Neumann debiasing by a factor of at least 2.
Figure 2 shows that after the condensing step the code offset method can extract significant entropy in a situation where the bare COM fails. It also shows the trade-off between the reduction of source entropy and noise entropy as u varies.
In Sect. 3.3 we did a very preliminary analysis of the practicality of extracting information from the index z. More work is needed to determine how this works out for real-world parameter values n, p, β and to see how the computations in steps R4 and R5 can be optimized for speed.
The entropy analysis can be improved and extended in various ways, e.g., by considering different noise models such as asymmetric noise.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

A Proof of Theorem 1
We start with a number of definitions and supporting lemmas. We define the binomial probability $q_t = \binom{n}{t} p^t (1-p)^{n-t}$. We define $\Delta = \Pr[w(X) > u] = \sum_{t=u+1}^{n} q_t$ and $\pi_t = q_t/(1-\Delta)$ for t ≤ u, such that the vector $(\pi_t)$ is the probability distribution of t from the attacker's point of view. The notation $E_t$ will refer to the binomial distribution $(q_t)_{t=0}^{n}$, while $\tilde{E}_t$ will refer to the truncated binomial $(\pi_t)_{t=0}^{u}$.

Lemma 3 It holds that
In the last step, we used Lemma 2.

Lemma 4 It holds that
In the last step, we used Lemma 2.

Lemma 5 It holds that
In the last step, we used Lemma 2.

Lemma 7 Let δ < p and u
Finally, we bound $\tilde{E}_t\,t^2$ using Lemma 5 and we bound $\tilde{E}_t\,t$ using Lemma 3.

C Proof of Theorem 4
The $\max_x$ selects the smallest weight w(x). Among the strings $x \in \{0,1\}^n$ that have syndrome σ (a coset), the one with the lowest Hamming weight is called the coset leader. For each weight a, there are possibly multiple cosets whose leader has weight a. The coset leader weight enumerator, denoted as $c_a$, counts how many cosets have a leader of weight a. The σ summation in the expression above can be written in terms of the $c_a$, (26) The code can correct t errors, so for a ≤ t it holds that $c_a = \binom{n}{a}$. A perfect code has $c_a = 0$ for a > t. We consider codes that are far from perfect. We split $\sum_\sigma$, which has $2^{n-k}$ terms, into a part with coset leader weights ≤ t and a part with weights > t. In the latter part, we write $\left(\frac{p}{1-p}\right)^{w(x)} \le \left(\frac{p}{1-p}\right)^{t+1}$. This yields

Figure 3 shows how many bits of entropy (b ≈ −log Pr[A > θ]) can be obtained from z without running into false positives in step R5. To extract b bits of entropy, a list length $L = 2^b$ is needed.

Fig. 3 Plots of −log Pr[A > θ] as a function of n for β = p, u = 2.25np, and $\theta = np + (u - 2np)\beta - c\sqrt{np}$ with c = 4. The Pr[A > θ] was obtained by numerical summation of (7). The vertical axis represents approximately the number of bits b = log L that can be extracted from z without generating false positives
When we take the expectation $\tilde{E}_t$, the term linear in $t - \tilde{E}_t t$ disappears. We use $\tilde{E}_t t < u/2$ to bound the second occurrence of h( Ẽt • • ) < 1 and to use h > 0. For the first occurrence of $h(\tilde{E}_t t/u)$ we use that h is an increasing function and apply Lemma 3.