1 Introduction

Multimedia fingerprinting codes are used to protect digital content from illegal copying and redistribution.

The key idea of this technique is to embed a unique signal, called a watermark, into every copy, so that the copy can be traced to its buyer [10, 11]. Watermarks should protect the dealer from a collusion attack, in which a coalition of dishonest users (pirates) constructs a new file, for example, by averaging their copies of the same content. By gathering a big enough coalition it is possible to sufficiently decrease the impact of each individual fingerprint, which makes it hard for the dealer to identify the pirates. In papers [2, 3] the authors propose to use separable (or signature) codes to track all members of the coalition.

A model of multimedia fingerprinting with an adversarial noise was proposed in [6], i.e. the coalition of dishonest users can add some noise to the content in order to hide their fingerprints. In [8] it was shown that there are no multimedia codes resistant to a general linear attack and an adversarial noise. However, in [7] the authors proved that for the most common case of averaging attack one can construct multimedia codes with a non-vanishing rate. We continue their research and prove a new lower bound on the rate, which has the same order as an upper bound. A detailed survey of state-of-the-art results can be found in [5].

The rest of the paper is structured as follows. In Sect. 2 we introduce the required notation and definitions and formally describe the problem. Our main result is proved in Sect. 3. Sect. 4 concludes the paper and discusses some open problems.

2 Problem statement

Vectors are denoted by bold letters, such as \(\varvec{x}\), and the ith entry is referred to as \(x_i\). The set of integers \(\{1,\,2,\,\ldots ,\,M\}\) is abbreviated by [M]. The sign \(\left\| \cdot \right\| \) stands for the Euclidean norm. The support \(supp(\varvec{x})\) of a vector \(\varvec{x}\) is the set of coordinates i such that \(x_i\ne 0\). The scalar (dot) product of vectors \(\varvec{x}\) and \(\varvec{y}\) is denoted by \(\langle \varvec{x},\,\varvec{y}\rangle \), and the greatest common divisor of integers a and b is denoted by \((a, b)\). For a given binary \(n\times M\) matrix H with columns \(\varvec{h}_1,\,\ldots ,\,\varvec{h}_M\) and a set \(I\subset [M]\), we introduce the following notation for the result of an averaging attack

$$\begin{aligned} \sigma (H\mid I)=|I |^{-1}\sum \limits _{i\in I}\varvec{h}_i. \end{aligned}$$
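As a small illustration of the notation \(\sigma (H\mid I)\), the sketch below averages the columns of a toy matrix with exact rational arithmetic. The function name `sigma`, the example matrix and the 0-based user indices are assumptions made for this example only; the text itself indexes users from 1.

```python
from fractions import Fraction

def sigma(H, I):
    """Average the columns of H indexed by the coalition I (0-based indices)."""
    I = sorted(I)
    if not I:
        raise ValueError("coalition must be non-empty")
    # One exact average per row of H.
    return [Fraction(sum(row[i] for i in I), len(I)) for row in H]

# Hypothetical 3 x 4 binary matrix; column j is the fingerprint of user j.
H = [
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
]
avg = sigma(H, {0, 2})  # users 0 and 2 average their columns
```

Exact fractions are used so that two averaged vectors can later be compared for equality without floating-point tolerance.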

A binary entropy function h(x) is defined as follows

$$\begin{aligned} h(x)=-x\log _2x - (1-x)\log _2(1-x). \end{aligned}$$
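A minimal numerical sketch of the binary entropy function follows. The convention \(h(0)=h(1)=0\) is the usual continuous extension and is an assumption here, since the formula above is only defined for \(0<x<1\); the name `binary_entropy` is hypothetical.

```python
import math

def binary_entropy(x):
    """h(x) = -x log2 x - (1-x) log2(1-x), extended by h(0) = h(1) = 0."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    if x in (0.0, 1.0):
        return 0.0  # limit value, assumed by convention
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)
```

The function is symmetric around \(x=1/2\), where it attains its maximum value 1.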

Suppose that multimedia content is represented by a vector \(\varvec{x}\in \mathbb {R}^N\), which is being sold to M users. Vector \(\varvec{x}\) is often called a host signal. To protect the content from unauthorized copying the dealer constructs a set of watermarks \(\varvec{w}_1, \,\ldots , \,\varvec{w}_M\), which are also called fingerprints. The dealer fixes n orthonormal vectors \(\varvec{f}_1, \,\ldots , \,\varvec{f}_n\) of length \(N,\,\varvec{f}_i\in \mathbb {R}^N\) and forms watermarks \(\varvec{w}_i\) as linear combinations of \(\varvec{f}_j\) with binary coefficients \(h_{ij}\in \{0,\,1\}\)

$$\begin{aligned} \varvec{w}_i=\sum \limits _{j=1}^n h_{ij}\,\varvec{f}_j\text { for }i\in [M]. \end{aligned}$$
(1)

Then watermarks are added to the host signal to obtain a final copy \(\varvec{y}_i\) for the ith user

$$\begin{aligned} \varvec{y}_i=\varvec{x}+\varvec{w}_i. \end{aligned}$$

We assume that \(\left\| \varvec{w}_i\right\| \ll \left\| \varvec{x}\right\| \), so the added watermark doesn’t change the content much.

A coalition of dishonest users \(I \subset [M]\) may come together to forge a new copy and redistribute it among other users. They can apply a linear attack, i.e., create a new copy \(\varvec{y}\) as a linear combination of their copies. In addition, they may add a noise vector \(\varvec{\varepsilon }\), \(\left\| \varvec{\varepsilon }\right\| \ll \left\| \varvec{x}\right\| \), to make it harder for the dealer to identify them.

$$\begin{aligned} \varvec{y}= \sum \limits _{i\in I}\lambda _i\,\varvec{y}_i+\varvec{\varepsilon }, \end{aligned}$$

where \(\lambda _i\in \mathbb {R}\), \(\lambda _i>0\) for every \(i\in I\), so that each dishonest user in I actually participates in the attack, and \(\sum _{i\in I}\lambda _i=1\), so that the multimedia content \(\varvec{x}\) is not changed. In particular, for an averaging attack \(\lambda _i = 1/|I |\) for every \(i \in I\). The condition \(\sum _{i\in I}\lambda _i=1\) implies that

$$\begin{aligned} \varvec{y}= \sum \limits _{i\in I}\lambda _i\,\varvec{y}_i +\varvec{\varepsilon }= \varvec{x}+\sum \limits _{i\in I}\lambda _i\,\varvec{w}_i+\varvec{\varepsilon }. \end{aligned}$$

Note that

$$\begin{aligned} \left\| \varvec{y}-\varvec{x}\right\| = \left\| \sum \limits _{i\in I}\lambda _i\,\varvec{w}_i+\varvec{\varepsilon }\right\| \le \max \left\| \varvec{w}_i\right\| +\left\| \varvec{\varepsilon }\right\| \ll \left\| \varvec{x}\right\| , \end{aligned}$$

therefore, \(\varvec{y}\) is close enough to the original signal \(\varvec{x}\).

In order to find the coalition of dishonest users based on the forged copy \(\varvec{y}\), the dealer evaluates

$$\begin{aligned} s_k&=\langle \varvec{y}-\varvec{x}, \varvec{f}_k\rangle \\&= \left\langle \sum \limits _{i\in I}\lambda _i\,\sum \limits _{j=1}^nh_{ij}\,\varvec{f}_j+\varvec{\varepsilon }, \varvec{f}_k\right\rangle = \sum \limits _{i\in I}\lambda _i\,h_{ik}+e_k, \end{aligned}$$

where \(e_k=\langle \varvec{\varepsilon }, \varvec{f}_k\rangle \), and forms a syndrome vector \(\varvec{S}=(s_1,\,\ldots ,\,s_n)\). The syndrome vector \(\varvec{S}\) can be equivalently defined through the matrix equation

$$\begin{aligned} \varvec{S}=H \varvec{ \Lambda }^T+\varvec{e}, \end{aligned}$$

where \(\Lambda =(\lambda _1,\,\ldots ,\,\lambda _M)\), \(\lambda _i=0\) for \(i\notin I\), and \(\varvec{e}=(e_1,\,\ldots ,\,e_n)\), \(\left\| \varvec{e}\right\| \le \left\| \varvec{\varepsilon }\right\| \).
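The embedding and detection steps can be traced on a toy instance. The sketch below assumes that the orthonormal vectors \(\varvec{f}_k\) are the first n standard basis vectors of \(\mathbb {R}^N\) and that the attack is a noiseless averaging attack; all names (`watermark`, `syndrome`) and numerical values are hypothetical. Here `H[k][i]` is the coefficient of \(\varvec{f}_k\) in the watermark of user i, so column i of H is the fingerprint of user i.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def watermark(H, i, F):
    """w_i = sum_k H[k][i] * f_k, where F[k] is the k-th orthonormal vector."""
    N = len(F[0])
    return [sum(H[k][i] * F[k][c] for k in range(len(F))) for c in range(N)]

def syndrome(y, x, F):
    """s_k = <y - x, f_k> for every basis vector f_k."""
    diff = [a - b for a, b in zip(y, x)]
    return [dot(diff, f) for f in F]

# Toy parameters (all hypothetical): N = 5, n = 3, M = 4.
N, n = 5, 3
F = [[1.0 if c == k else 0.0 for c in range(N)] for k in range(n)]  # standard basis
H = [[1, 0, 1, 0],
     [0, 1, 1, 0],
     [0, 0, 1, 1]]
x = [10.0, -7.0, 3.0, 8.0, -2.0]  # host signal
copies = [[a + b for a, b in zip(x, watermark(H, i, F))] for i in range(4)]

coalition = [0, 2]
eps = [0.0] * N  # noiseless attack, for clarity
y = [sum(copies[i][c] for i in coalition) / len(coalition) + eps[c]
     for c in range(N)]
S = syndrome(y, x, F)  # equals the average of columns 0 and 2 of H
```

With these values the syndrome coincides with \(\sigma (H\mid I)\) for the coalition of users 0 and 2, as the matrix equation predicts when \(\varvec{e}=0\).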

The dealer wants to design a matrix H in such a way, that by observing \(\varvec{S}\) he always can find the support \(supp(\varvec{ \Lambda })\) if the size of the coalition I is at most t. The following definition for a noiseless scenario was introduced in [6].

Definition 1

A binary \(n\times M\) matrix H is called a t-multimedia digital fingerprinting code with complete traceability (t-MDF code for short) if for any two distinct coalitions I, \(I'\), \(|I |, |I' |\le t\), we have

$$\begin{aligned} H\varvec{ \Lambda }^T\ne H\varvec{ \Lambda }'^T \end{aligned}$$

for any real vectors \(\varvec{ \Lambda }=(\lambda _1,\,\ldots ,\,\lambda _M)\) and \(\varvec{ \Lambda }'=(\lambda _1',\,\ldots ,\,\lambda _M') \), such that \(\lambda _i\ge 0\), \(\lambda _i'\ge 0\), \(\sum \limits _{i=1}^M\lambda _i=\sum \limits _{i=1}^M\lambda _i'=1\), \(supp(\varvec{ \Lambda })=I\), \(supp(\varvec{ \Lambda }')=I'\).

Denote the maximal cardinality and the maximal rate of a t-MDF code of length n by \(M(n, t)\) and \(R(n, t)=n^{-1}\log _2M(n,t)\), respectively. Denote by \(R^*(t)\) and \(R_*(t)\) the upper and lower limits of \(R(n, t)\) as \(n\rightarrow \infty \). It is known that

$$\begin{aligned} \Omega \left( \frac{\log _2t}{t}\right) \le R_*(t)\le R^*(t)\le \frac{\log _2t}{2t}(1+o(1)). \end{aligned}$$
(2)

The upper bound of (2) can be derived from an upper bound for a binary adder channel from [4]. The lower bound is based on the following observation from [6]. If any 2t columns of a binary matrix H are linearly independent over the field of real numbers \(\mathbb {R}\), then H is a t-MDF code. Since parity check matrices of binary codes with a distance \(d>2t\) possess this property, application of Goppa or BCH codes gives an explicit construction with a rate \(R_*(t)\ge 1/t\) [6]. An improved lower bound \(\Omega \left( \frac{\log _2t}{t}\right) \) can be derived from the results of the paper [1], where the authors proved the existence of binary \(n\times M\) matrices, \(n^{-1}\log _2M=\Omega (\log _2t/t)\), such that any 2t columns are linearly independent over the field \(\mathbb {Z}_p\), \(p>2t\). We note that the latter result was proved with a probabilistic method, i.e. it’s not explicit.

Now we discuss a noisy scenario. In [8] the authors defined \((t, \delta )\)-light complete traceability multimedia digital fingerprinting codes and proved that they don’t exist. Informally, if some coefficient \(\lambda _i\) is sufficiently small, then the noise can compensate for the signal of the ith user, so that it is impossible to identify this user. However, for the case of averaging attacks, when all non-zero coefficients \(\lambda _i\) are equal, the situation is different. Let us give the corresponding definition from [7].

Definition 2

A binary \(n\times M\) matrix H is called a (Euclidean) \((t, \delta )\)-light complete traceability code if for any two distinct coalitions \(I_1\), \(I_2\), \(|I_1 |, |I_2 |\le t\), we have

$$\begin{aligned} \sigma (H\mid I_1)+\varvec{e}_1\ne \sigma (H\mid I_2)+\varvec{e}_2, \end{aligned}$$

for any real vectors \(\varvec{e}_1, \varvec{e}_2 \in \mathbb {R}^n\), \(\left\| \varvec{e}_1\right\| , \left\| \varvec{e}_2\right\| \le \delta \).

In other words, Euclidean distance between vectors \(\sigma (H\mid I_1)\) and \(\sigma (H\mid I_2)\), generated by different coalitions \(I_1\) and \(I_2\), \(|I_1 |, |I_2 |\le t\), should be big, i.e.

$$\begin{aligned} \left\| \sigma (H\mid I_1)-\sigma (H\mid I_2)\right\| >2\delta . \end{aligned}$$
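For very small matrices the distance condition can be checked by exhaustive search. The following sketch, with hypothetical names `coalitions` and `min_euclidean_distance`, computes the smallest Euclidean distance between averaged vectors of distinct coalitions of size at most t; the matrix is then a Euclidean \((t, \delta )\)-light complete traceability code exactly for those \(\delta \) that are smaller than half of this distance. The search is exponential in t and is meant only as an illustration.

```python
from fractions import Fraction
from itertools import combinations
import math

def sigma(H, I):
    """Exact average of the columns of H indexed by I."""
    return [Fraction(sum(row[i] for i in I), len(I)) for row in H]

def coalitions(M, t):
    """All non-empty subsets of {0, ..., M-1} of size at most t."""
    for size in range(1, t + 1):
        yield from combinations(range(M), size)

def min_euclidean_distance(H, t):
    M = len(H[0])
    best = None
    for I1, I2 in combinations(list(coalitions(M, t)), 2):
        sq = sum((a - b) ** 2 for a, b in zip(sigma(H, I1), sigma(H, I2)))
        best = sq if best is None else min(best, sq)
    return math.sqrt(best)

# Toy example: the 3 x 3 identity matrix with t = 2.
H = [[1, 0, 0],
     [0, 1, 0],
     [0, 0, 1]]
d_min = min_euclidean_distance(H, 2)  # sqrt(1/2), e.g. {0} versus {0, 1}
```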

Remark 1

Although an averaging attack is very restrictive for the coalition, many papers consider only averaging attacks instead of general linear attacks. One of the arguments is that an averaging attack is the fairest choice, since all the members of a coalition contribute the same proportion of data to a forged copy [2, 11]. However, in future research it may be reasonable to study a model with different coefficients \(\lambda _i\), which are lower bounded by some constant.

We now define codes for the case of noise vectors whose support has bounded cardinality.

Definition 3

A binary \(n\times M\) matrix H is called a Hamming \((t, T)\)-light complete traceability code if for any two distinct coalitions \(I_1\), \(I_2\), \(|I_1 |, |I_2 |\le t\), we have

$$\begin{aligned} \sigma (H\mid I_1)+\varvec{e}_1\ne \sigma (H\mid I_2)+\varvec{e}_2, \end{aligned}$$

for any real vectors \(\varvec{e}_1, \varvec{e}_2 \in \mathbb {R}^n\), \(|supp(\varvec{e}_1) |, |supp(\varvec{e}_2) |\le T\).

Equivalently, the number of different coordinates of vectors \(\sigma (H\mid I_1)\) and \(\sigma (H\mid I_2)\), generated by distinct coalitions \(I_1\) and \(I_2\), \(|I_1 |, |I_2 |\le t\), should be big, i.e.

$$\begin{aligned} |supp(\sigma (H\mid I_1)-\sigma (H\mid I_2)) |>2T. \end{aligned}$$
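The support condition can likewise be verified by brute force on toy examples. The sketch below counts, for every pair of distinct coalitions of size at most t, the coordinates in which the averaged vectors differ; the helper names `min_hamming_distance` and `is_hamming_code` are hypothetical, and the search is feasible only for very small M and t.

```python
from fractions import Fraction
from itertools import combinations

def sigma(H, I):
    """Exact average of the columns of H indexed by I."""
    return [Fraction(sum(row[i] for i in I), len(I)) for row in H]

def min_hamming_distance(H, t):
    """Smallest number of coordinates in which sigma(H|I1) and sigma(H|I2) differ."""
    M = len(H[0])
    all_coalitions = [c for size in range(1, t + 1)
                      for c in combinations(range(M), size)]
    return min(
        sum(a != b for a, b in zip(sigma(H, I1), sigma(H, I2)))
        for I1, I2 in combinations(all_coalitions, 2)
    )

def is_hamming_code(H, t, T):
    """True if every pair of distinct coalitions differs in more than 2T rows."""
    return min_hamming_distance(H, t) > 2 * T

# Toy example: three stacked copies of the 3 x 3 identity matrix (n = 9, M = 3).
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
H = identity * 3
```

For this toy matrix and t = 2 the minimum is 6, so it tolerates noise supported on at most T = 2 coordinates but not T = 3.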

Denote the maximal cardinality of Euclidean and Hamming light complete traceability codes of length n by \(M_E(n, t, \delta )\) and \(M_H(n, t, T)\) respectively. Define the rates of these codes as follows

$$\begin{aligned} R_E(n, t, \delta )= & {} \frac{\log _2M_E(n, t, \delta )}{n}, \\ R_H(n, t, T)= & {} \frac{\log _2M_H(n, t, T)}{n}. \end{aligned}$$

In the following proposition we show an obvious connection between these two families of codes.

Proposition 1

  1. A Hamming \((t, T)\)-light complete traceability code H is a Euclidean \((t, \delta )\)-light complete traceability code for \(\delta =\sqrt{2T}/(2t(t-1))\).

  2. A Euclidean \((t, \delta )\)-light complete traceability code H is a Hamming \((t, T)\)-light complete traceability code for \(T=\lfloor 2\delta ^2\rfloor \).

  3. The rates of these codes are connected as follows

    $$\begin{aligned}&R_E(n, t, \sqrt{2T}/(2t(t-1)))\ge R_H(n, t, T),\\&R_H(n, t, \lfloor 2\delta ^2\rfloor )\ge R_E(n, t, \delta ). \end{aligned}$$

Proof

1. Assume that a Hamming \((t, T)\)-light complete traceability code H is not a Euclidean \((t, \sqrt{2T}/(2t(t-1)))\)-light complete traceability code, i.e. there exist two coalitions \(I_1\) and \(I_2\), such that

$$\begin{aligned} \left\| \varvec{ \Delta }\right\| \le 2\delta ,\text { where }\varvec{ \Delta }=\sigma (H\mid I_1)-\sigma (H\mid I_2), \;\delta =\sqrt{2T}/(2t(t-1)). \end{aligned}$$

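Every coordinate \(\Delta _i\) has the form \(a/q-b/r\) with \(q=|I_1 |\), \(r=|I_2 |\), \(q, r\le t\) and integers a, b. If such a coordinate is nonzero, its absolute value is at least \(1/(qr)\ge 1/(t(t-1))\) for \(q\ne r\) and at least \(1/q\ge 1/t\) for \(q=r\). Since the absolute value of every nonzero coordinate \(\Delta _i\) is therefore at least \(1/(t(t-1))\), we conclude that there are at most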

$$\begin{aligned} 4\delta ^2t^2(t-1)^2= 2T \end{aligned}$$

coordinates, in which \(\sigma (H\mid I_1)\) and \(\sigma (H\mid I_2)\) are different. Hence, there are two vectors \(\varvec{u}_1\), \(\varvec{u}_2\), \(|supp(\varvec{u}_1) |, |supp(\varvec{u}_2) |\le T\), such that

$$\begin{aligned} \sigma (H\mid I_1)+\varvec{u}_1=\sigma (H\mid I_2)+\varvec{u}_2. \end{aligned}$$

Therefore, H is not a Hamming \((t, T)\)-light complete traceability code. This contradiction proves the first claim.

2. Assume that a Euclidean \((t, \delta )\)-light complete traceability code H is not a Hamming \((t, \lfloor 2\delta ^2\rfloor )\)-light complete traceability code, i.e. there exist two coalitions \(I_1\) and \(I_2\), such that

$$\begin{aligned} |supp(\varvec{ \Delta }) | \le 2T, \text { where } \varvec{ \Delta }= \sigma (H\mid I_1)-\sigma (H\mid I_2), \;T=\lfloor 2\delta ^2\rfloor . \end{aligned}$$

Since the absolute value of every coordinate of the vector \(\varvec{ \Delta }\) is at most 1, we have

$$\begin{aligned} \left\| \varvec{ \Delta }\right\| \le \sqrt{2T}\le 2\delta , \end{aligned}$$

which contradicts the definition of Euclidean \((t, \delta )\)-light complete traceability codes.

3. Claim 3 is an obvious corollary of claims 1 and 2. \(\square \)
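The first claim of the proof relies on the fact that a nonzero coordinate of \(\varvec{ \Delta }\) cannot be smaller than \(1/(t(t-1))\) in absolute value. As a sanity check rather than a proof, the sketch below enumerates all differences \(a/q-b/r\) with \(q, r\le t\) for small t; the function name `min_nonzero_gap` is hypothetical.

```python
from fractions import Fraction

def min_nonzero_gap(t):
    """Smallest nonzero |a/q - b/r| over 1 <= q, r <= t, 0 <= a <= q, 0 <= b <= r."""
    best = None
    for q in range(1, t + 1):
        for r in range(1, t + 1):
            for a in range(q + 1):
                for b in range(r + 1):
                    gap = abs(Fraction(a, q) - Fraction(b, r))
                    if gap != 0 and (best is None or gap < best):
                        best = gap
    return best

# For t >= 2 the minimum is attained, e.g., by 1/(t-1) - 1/t = 1/(t(t-1)).
gaps = {t: min_nonzero_gap(t) for t in range(2, 7)}
```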

In [7] it was proved that \(\liminf _{n\rightarrow \infty }R_E(n, t, \delta )\ge \Omega (1/t)\) for constant \(\delta \). An upper bound is the same as in the noiseless case, \(\limsup _{n\rightarrow \infty }R_E(n, t, \delta )\le \frac{\log _2t}{2t}(1+o(1))\), since the proof works for an averaging attack. Therefore, there is a \(\Theta (\log _2t)\) gap between the lower and upper bound. We eliminate this gap in the next section.

3 Lower bound on the rate of light complete traceability codes

In this section we prove

Theorem 1

For \(\tau <1/4\)

$$\begin{aligned} \liminf _{n\rightarrow \infty }R_H(n, t, \lfloor \tau n\rfloor )\ge \frac{(1-2\tau )\log _2t}{6t}(1+o(1)),\;\;t\rightarrow \infty . \end{aligned}$$
(3)

Combining Theorem 1 and Proposition 1 we obtain the following

Corollary 1

For \(\delta ^2=\alpha n\), \(\alpha <1/(8t^2(t-1)^2)\), we have

$$\begin{aligned} \liminf _{n\rightarrow \infty }R_E(n, t, \sqrt{\alpha n})\ge \frac{(1-2\tau )\log _2t}{6t}(1+o(1)),\;\;t\rightarrow \infty , \end{aligned}$$

where \(\tau =2\alpha t^2(t-1)^2.\)

For the case of small noise \(\delta =o(\sqrt{n})\) and \(n\rightarrow \infty \) a new lower bound has the following form

$$\begin{aligned} \liminf _{n\rightarrow \infty }R_E(n, t, \delta ) \ge \lim \limits _{\alpha \rightarrow 0} \liminf _{n\rightarrow \infty }R_E(n, t, \sqrt{\alpha n}) \ge \frac{\log _2t}{6t}(1+o(1)). \end{aligned}$$

It improves the previous lower bound \(\Omega (1/t)\) and has the same order \(\Theta (\log _2t/t)\) as the upper bound. However, the new bound is not explicit, i.e. the proof does not provide an efficient encoding or decoding algorithm for the new code.

Proof of Theorem 1

Consider a random \(n\times M\) matrix H, \(M=2^{Rn}\), in which every entry is chosen independently and equals 1 with a probability 1/2. The value of R will be specified later. Fix two coalitions \(I_1\) and \(I_2\), \(|I_1 |, |I_2 |\le t\). Call a row r good, if

$$\begin{aligned} \frac{\sum \limits _{i_1\in I_1} h_{r,i_1}}{|I_1 |}\ne \frac{\sum \limits _{i_2\in I_2} h_{r,i_2}}{|I_2 |}. \end{aligned}$$

Otherwise, we call a row bad. Call a pair of coalitions good, if there are at least \(2T+1\) good rows for them. Otherwise, call such a pair bad. Then the condition that H is a Hamming \((t, T)\)-light complete traceability code is equivalent to the absence of bad pairs of coalitions.

We say that a bad pair of coalitions \(I_1\) and \(I_2\) is minimal, if there is no other bad pair of coalitions \(I_1'\) and \(I_2'\) with \(I_1'\cup I_2'\subset I_1\cup I_2\). For example, a bad pair of intersecting coalitions \(I_1\) and \(I_2\) with \(|I_1 |=|I_2 |\) can’t be minimal, since it contains another bad pair \(I_1\setminus I_2\) and \(I_2\setminus I_1\). Obviously, to prove that H is a Hamming \((t, T)\)-light complete traceability code it is enough to check that there are no minimal bad pairs of coalitions.
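The example of intersecting coalitions of equal size can be checked directly: removing the common users changes both column sums by the same amount, so a row is bad for the original pair exactly when it is bad for the reduced pair. The sketch below verifies this over all binary rows for one hypothetical pair of coalitions; `row_is_bad` is a name introduced only for this illustration.

```python
from itertools import product

def row_is_bad(row, I1, I2):
    """A row is bad when both coalitions have the same average in it."""
    # Cross-multiplied form of sum1/|I1| == sum2/|I2|, avoiding fractions.
    return sum(row[i] for i in I1) * len(I2) == sum(row[i] for i in I2) * len(I1)

I1, I2 = {0, 1, 2}, {2, 3, 4}  # hypothetical coalitions sharing user 2
J1, J2 = I1 - I2, I2 - I1      # reduced pair {0, 1} and {3, 4}

# Compare the two pairs on every possible binary row over 5 users.
same_bad_rows = all(
    row_is_bad(row, I1, I2) == row_is_bad(row, J1, J2)
    for row in product((0, 1), repeat=5)
)
```

Since the sets of bad rows coincide, the reduced pair has the same number of good rows and is bad whenever the original pair is.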

We are going to prove that the mathematical expectation of the number of minimal bad pairs of coalitions tends to zero as \(n\rightarrow \infty \). By Markov’s inequality this implies that for big enough n there exists a Hamming \((t, T)\)-light complete traceability code with the rate R and \(T=\lfloor \tau n\rfloor \).

Now we estimate the probability that a row is bad for coalitions \(I_1\) and \(I_2\).

Lemma 2

The probability that a row is bad for coalitions \(I_1\) and \(I_2\), \(|I_1 |=q\), \(|I_2 |=r\), \(q>r\), is upper bounded by \(p(q)=q^{-1/3+o(1)}\), \(q\rightarrow \infty \). For non-intersecting coalitions \(I_1\) and \(I_2\), \(|I_1 |=|I_2 |=q\), the probability that a row is bad is upper bounded by \(p(q)=q^{-1/2+o(1)}\). Moreover, \(p(q)\le 1/2\) for all q.

Proof of Lemma 2

For the case \(q=r\), \(I_1\cap I_2=\emptyset \), probability of a bad row is equal to

$$\begin{aligned} 2^{-2q}\sum \limits _{i=0}^q\left( {\begin{array}{c}q\\ i\end{array}}\right) \left( {\begin{array}{c}q\\ i\end{array}}\right) =2^{-2q}\left( {\begin{array}{c}2q\\ q\end{array}}\right) =O(q^{-0.5}), \end{aligned}$$

which is not greater than 1/2 for all q.
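The closed form above follows from the Vandermonde identity \(\sum _i\left( {\begin{array}{c}q\\ i\end{array}}\right) ^2=\left( {\begin{array}{c}2q\\ q\end{array}}\right) \). A short exact check for small q is sketched below; the function name is hypothetical and the range of q is arbitrary.

```python
from fractions import Fraction
from math import comb

def equal_size_bad_probability(q):
    """Exact probability that two independent Bin(q, 1/2) variables coincide."""
    return Fraction(sum(comb(q, i) ** 2 for i in range(q + 1)), 4 ** q)

# Compare with the closed form C(2q, q) / 4^q and with the bound 1/2.
results = [
    (equal_size_bad_probability(q), Fraction(comb(2 * q, q), 4 ** q))
    for q in range(1, 30)
]
```

The value 1/2 is attained only at q = 1, and the probability decreases as q grows.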

Now assume that \(q>r\). Denote the cardinality of the intersection of \(I_1\) and \(I_2\) as k. Consider two cases \(q-k>s\) and \(q-k\le s\), \(s=q^{2/3}\).

The first case \(q-k>s\). Note that for any distribution of zeroes and ones in columns from \(I_2\) there exists at most one number of ones in the columns from \(I_1\setminus I_2\) which makes the row bad. Hence the probability of obtaining a bad row is upper bounded by

$$\begin{aligned} \max \limits _{l}\frac{\left( {\begin{array}{c}q-k\\ l\end{array}}\right) }{2^{q-k}}\le 1/2. \end{aligned}$$

For \(q\rightarrow \infty \) this bound looks as follows

$$\begin{aligned} \max \limits _{l}\frac{\left( {\begin{array}{c}q-k\\ l\end{array}}\right) }{2^{q-k}}<\frac{1+o(1)}{\sqrt{\pi (q-k)/2}}<\frac{1+o(1)}{\sqrt{\pi s/2}}=O(q^{-1/3}), \end{aligned}$$

where in the first inequality a Stirling’s approximation \(\left( {\begin{array}{c}q-k\\ (q-k)/2\end{array}}\right) \sim \frac{2^{q-k}}{\sqrt{\pi (q-k)/2}}\) for a maximal binomial coefficient was used.

The second case \(q-k\le s\). Observe that the greatest common divisor \(d=(q,r)\) is at most s, since \(d\le q - r\le q - k\le s\). For a bad row i, let \(s_1\) and \(s_2\) denote the ith coordinates of the sums \(\sum \limits _{j\in I_1}\varvec{h}_j\) and \(\sum \limits _{j\in I_2}\varvec{h}_j\). Since \(s_1/q=s_2/r\) implies \((q/d) \mid s_1\) and \((r/d) \mid s_2\), these coordinates must be divisible by q/d and r/d respectively. Therefore, the probability of a bad row can be upper bounded by the probability P that a binomial random variable \(\xi \sim Bin(q, 1/2)\) is divisible by \(q/d\ge q^{1/3}\). One can see that \(P\le 1/2\) for \(q/d>1\). Now we prove that for \(q\rightarrow \infty \) the probability P is at most \(q^{-1/3+o(1)}\).

By Hoeffding’s inequality [9]

$$\begin{aligned} \Pr \left( |\xi -q/2 |>\sqrt{q\ln q}\right) \le 2e^{-2\ln q}=O\left( q^{-2}\right) . \end{aligned}$$

Define \(S=[\lfloor \frac{q/2-\sqrt{q\ln q}}{q/d}\rfloor , \lfloor \frac{q/2+\sqrt{q\ln q}}{q/d}\rfloor ]\). Then we can estimate P as follows

$$\begin{aligned} P&= \sum \limits _{l=0}^{d}\Pr (\xi =l\cdot q/d)\\&\le \sum \limits _{l\in S} \Pr (\xi =l\cdot q/d) + \Pr \left( |\xi -q/2 |>\sqrt{q\ln q}\right) \\&\le \max \limits _x \Pr (\xi =x)\cdot \left\lceil \frac{2\sqrt{q\ln q}+1}{q/d}\right\rceil +O\left( q^{-2}\right) \\&\le \frac{1}{\sqrt{q}}\cdot \left\lceil \frac{2\sqrt{q\ln q}+1}{q/d}\right\rceil +O\left( q^{-2}\right) \\&\le O(q^{-1/3}\sqrt{\ln q}) =q^{-1/3+o(1)}. \end{aligned}$$

\(\square \)
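For small coalition sizes the probability of a bad row can be computed exactly and compared with the statement of the lemma. The sketch below assumes, as in the proof, that the entries of the row are independent uniform bits, and splits the users into the k shared ones and the two private parts; the function name `bad_row_probability` is hypothetical. It only illustrates the bound \(p(q)\le 1/2\) and says nothing about the asymptotic exponent.

```python
from fractions import Fraction
from math import comb

def bad_row_probability(q, r, k):
    """Exact probability that a uniformly random binary row is bad for
    coalitions of sizes q and r that share exactly k users."""
    if not 0 <= k <= min(q, r):
        raise ValueError("need 0 <= k <= min(q, r)")
    favourable = 0
    for c in range(k + 1):              # ones among the k shared users
        for a in range(q - k + 1):      # ones among I1 \ I2
            for b in range(r - k + 1):  # ones among I2 \ I1
                # Bad row: (a + c) / q == (b + c) / r, cross-multiplied.
                if (a + c) * r == (b + c) * q:
                    favourable += comb(k, c) * comb(q - k, a) * comb(r - k, b)
    return Fraction(favourable, 2 ** (q + r - k))

# All cases q > r with q <= 8 and every admissible intersection size k.
worst = max(
    bad_row_probability(q, r, k)
    for q in range(2, 9) for r in range(1, q) for k in range(r + 1)
)
```

In this range the maximum equals 1/2 and is attained, for instance, by q = 2, r = 1, k = 1, where the smaller coalition is contained in the larger one.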

To estimate a mathematical expectation E of the number of minimal bad pairs of coalitions we iterate over all \(< M^{q+r}\) pairs of coalitions having sizes q and r, \(q>r\), all pairs of non-intersecting coalitions of size q, and over all possible amounts \(L<2T+1\) of good rows.

$$\begin{aligned} E&<\sum \limits _{0<r\le q\le t}M^{q+r}\sum \limits _{L=0}^{2T}\left( {\begin{array}{c}n\\ L\end{array}}\right) (1-p(q))^Lp(q)^{n-L}\\&{\mathop {<}\limits ^{a)}}\sum \limits _{q=1}^{t} qM^{2q}(2T+1)\left( {\begin{array}{c}n\\ 2T\end{array}}\right) (1-p(q))^{2T}p(q)^{n-2T}\\&=\sum \limits _{q=1}^{t} 2^{2qRn}p(q)^n2^{(h(2\tau ) + o(1))n}\left( \frac{1-p(q)}{p(q)}\right) ^{2\tau n}\\&=\sum \limits _{q=1}^{t}2^{A(q)n}, \end{aligned}$$

where

$$\begin{aligned} A(q)=2qR +\log _2p(q) + h(2\tau ) + 2\tau \log _2\left( \frac{1-p(q)}{p(q)}\right) . \end{aligned}$$

In inequality a) we used the fact that

$$\begin{aligned} \left( {\begin{array}{c}n\\ L\end{array}}\right) (1-p(q))^Lp(q)^{n-L}\le \left( {\begin{array}{c}n\\ 2T\end{array}}\right) (1-p(q))^{2T}p(q)^{n-2T}, \end{aligned}$$

since \(2\tau <1/2\le 1-p(q)\) and \(2T\le (1-p(q))n\).
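The monotonicity used in inequality a) can be checked numerically: the terms \(\left( {\begin{array}{c}n\\ L\end{array}}\right) (1-p)^Lp^{n-L}\) increase in L as long as \(L\le (1-p)n\). The sketch below tests this with exact rational arithmetic for a few hypothetical values of n, T and p, and also shows that the inequality can fail once \(2T>(1-p)n\); the function names are introduced only for this illustration.

```python
from fractions import Fraction
from math import comb

def term(n, L, p):
    """C(n, L) (1-p)^L p^(n-L), the summand bounded in inequality a)."""
    return comb(n, L) * (1 - p) ** L * p ** (n - L)

def largest_term_is_last(n, T, p):
    """True if the term with L = 2T dominates all terms with L <= 2T."""
    return all(term(n, L, p) <= term(n, 2 * T, p) for L in range(2 * T + 1))

n, T = 40, 9  # hypothetical values with 2T = 18 <= n/2
checks = [largest_term_is_last(n, T, Fraction(1, d)) for d in (2, 3, 5, 10)]

# With p = 9/10 we get (1-p)n = 4 < 2T, and the inequality no longer holds.
fails_outside_range = not largest_term_is_last(n, T, Fraction(9, 10))
```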

Let \(\hat{R}=\min \limits _{q\in [1,t]}-\frac{\log _2p(q) + h(2\tau ) + 2\tau \log _2\left( \frac{1-p(q)}{p(q)}\right) }{2q}\). Note that since \(2\tau <1-p(q)\) for all q then \(\hat{R}>0\). For \(R<\hat{R}\) the condition \(A(q)<0\) holds for every q, hence, \(E\rightarrow 0\) as \(n\rightarrow \infty \), which implies that any rate \(R<\hat{R}\) is achievable. For \(t\rightarrow \infty \) the minimum would be attained at q, which tends to \(\infty \), so

$$\begin{aligned} \hat{R}=\frac{(1-2\tau )\log _2t}{6t}(1+o(1)),\quad t\rightarrow \infty . \end{aligned}$$

Theorem 1 is proved. \(\square \)

4 Conclusion

In this paper we proved a new lower bound on the rate of Euclidean \((t, \delta )\)-light complete traceability codes, which shows that the optimal rate has order \(\Theta (\log _2t/t)\). However, the proof uses probabilistic arguments and does not provide an explicit construction with efficient encoding and decoding algorithms. A natural open problem is to design a code with an optimal rate and efficient decoding algorithm.

The coefficient \(\lambda _i\) shows what proportion of the original content was contributed by user i to an illegal copy. It is natural that if the contribution of user i is very small, then it is hard for the dealer to identify such a user. So, another open task is to design a code that, under an adversarial noise and a linear attack whose coefficients \(\lambda _i\) are lower bounded by some constant, is capable of finding all members of a coalition, i.e. all users whose contribution was big enough.