1 Introduction

The concept of frameproof codes was introduced in [1], where they are used to offer protection against illegal redistribution of digital goods. In that scenario, frameproof codes serve as underlying codes in fingerprinting schemes. In these schemes, a distributor prevents illegal redistribution of his/her goods by making delivered copies different. This is achieved by embedding a unique mark in each copy. Having distinguishable copies clearly rules out plain redistribution. Unfortunately, this strategy is weak against collusion attacks, where a coalition of malicious users compare their copies in order to detect the positions in which the embedded marks differ. The goal of these traitor users is to change the detected positions in order to create a new pirate copy that masks their identities. This new pirate copy is the one that will be illegally redistributed. On the other hand, the goal of the distributor is to design the set of marks to be embedded in such a way that, given a pirate copy, traitors can be traced back.

Separable codes were proposed in [2] for the case of multimedia fingerprinting, where traitor users can perform averaging attacks. The connection between separable codes and frameproof codes was discussed in [2, 3]. See also the work in [4]. Multimedia fingerprinting codes are also related to the family of \(B_2\) codes, as was stated in [5]. The concept of \(B_2\) codes [6] (known as 2-signature codes in multiple access communications) has its origins in the work of Sidon [7].

1.1 Our contribution

For the state of the art on bounds for separable codes, \(B_2\) codes and frameproof codes, we refer the reader to the elegant expositions in [8] and [9]. In this paper, by means of the Lovász Local Lemma, we present proofs that match the already known lower bounds in [8] and [9]. We insist on proving bounds through the Lovász Local Lemma because we can then use the approach that Giotis et al. [10] took to the variable framework developed by Moser and Tardos [11, 12], to devise an algorithm that constructs codes.

Therefore, the main contribution of this paper, along with the alternative proofs of the lower bounds, is the construction of separable codes, \(B_2\) codes, and frameproof codes. In this sense it is a follow-up companion to [8] and [9], where only existence results were discussed. We stress that we also extend the work in [10], in order to establish the computational complexity of the algorithmic construction in a more precise manner.

2 Previous results

We adopt the notation of [8]. Let \(\mathcal {Q}\) be an alphabet of size q. A code \(\mathcal {C}\) of length n and size M is a subset of n-tuples in \(\mathcal {Q}^n\), i.e. \(\mathcal {C}=\{c_1,\ldots ,c_M\}\). The n-tuples \(c_i = (c_i(1),\dots ,c_i(n))\in \mathcal {C}\), \(1\le i \le M\), are called code words. The distance between two code words \(c_i\) and \(c_j\) is the number of positions in which they differ, \(|\{l\in \{ 1, \ldots , n\}: c_i(l) \ne c_j(l)\}|\). The minimum distance of a code is the smallest distance between any two distinct code words, and will be denoted by d. In this case we say that \(\mathcal {C}\) is an \((n,M,d)_q\) code or, for short, an \((n,M)_q\) code.

Given an \((n,M)_q\) code \(\mathcal {C}\), we form an \(M \times n\) matrix C by writing the M code words as its rows. We denote the row set of C as \(\{c_1,\ldots ,c_M\}\). A set U of t rows \(\{c_{i_1},\ldots ,c_{i_t}\}\) will be denoted as \(U^{t}\).

If \(\mathcal {Q}\) is the finite field \(\mathbb {F}_q\), we can take \(\mathcal {C}\) to be a vector subspace of \(\mathbb {F}_q^n\). Then the size of the code is \(M=q^k\), where k is the dimension of the subspace. In this case we say we have an \([n,k,d]_q\) linear code.

The rate of an \((n,M)_q\) code, which is an important parameter, is defined as

$$\begin{aligned} R = \frac{\log _q M}{n}. \end{aligned}$$
(1)

In order to obtain families of codes with good asymptotic rate, we will make use of Forney’s idea of code concatenation [13]. We take an inner code defined over a small alphabet of size q, say \(\mathcal {C}_i = (n,M_i)_q\), and we take an outer code \(\mathcal {C}_o=(N,M_o)_{Q}\). The size of the alphabet of the outer code \(\mathcal {Q}_o\) is taken to be equal to the cardinality of the inner code, that is \(|\mathcal {Q}_o |=Q=M_i\). Thus, we can define a bijection between the outer code alphabet and the code words of the inner code, \(\phi : \mathcal {Q}_o \rightarrow \mathcal {C}_i\). Applying this bijection to all code words of the outer code, we obtain an \((nN,M_o)_q\) code over the small alphabet of size q, which we denote by \(\mathcal {C}_i\circ \mathcal {C}_o\). If the rates of the inner and outer code are \(R_i\) and \(R_o\) respectively, then the rate of the concatenated code \(\mathcal {C}_i\circ \mathcal {C}_o\) is \(R_iR_o\).
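As a toy illustration of the construction just described (our sketch; the inner and outer codes below are illustrative examples, not codes from the paper), concatenation amounts to substituting each outer symbol by its inner code word via the bijection \(\phi\):

```python
def concatenate(inner, outer_alphabet, outer_code):
    """Concatenate codes: fix a bijection phi from the outer alphabet Q_o to
    the inner code words, then expand each outer code word symbol by symbol."""
    assert len(inner) == len(outer_alphabet)        # |Q_o| = Q = M_i
    phi = dict(zip(outer_alphabet, inner))          # the bijection phi
    return [sum((phi[sym] for sym in cw), ()) for cw in outer_code]

# Toy (2,2)_2 inner code and (2,2)_2 outer code over Q_o = {'a','b'} (ours):
inner = [(0, 0), (1, 1)]
outer = [('a', 'b'), ('b', 'a')]
print(concatenate(inner, ['a', 'b'], outer))
# [(0, 0, 1, 1), (1, 1, 0, 0)] -- an (nN, M_o)_q = (4, 2)_2 code
```

Here both toy codes have rate 1/2, and the concatenated code has rate \(\log_2 2/4 = 1/4 = R_iR_o\), as stated above.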

Given \(U \subseteq \mathcal {C},\) we define the ith projection set of U as

$$\begin{aligned} U(i) = \{ u_j(i) \in \mathcal {Q}: u_j\in U\}, \ 1\le i \le n. \end{aligned}$$
(2)

Also, we define the descendant set of U as

$$\begin{aligned} \textrm{desc}(U) = U(1)\times \cdots \times U(n)= \{ z \in \mathcal {Q}^n : z(i)\in U(i),\ 1\le i \le n\}. \end{aligned}$$
(3)
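The projection and descendant sets of (2) and (3) can be computed directly; the following is a minimal sketch (function names are ours):

```python
from itertools import product

def projection(U, i):
    """U(i) as in (2): the symbols appearing at position i among the words of U."""
    return {u[i] for u in U}

def desc(U):
    """desc(U) = U(1) x ... x U(n) as in (3)."""
    n = len(U[0])
    return set(product(*[projection(U, i) for i in range(n)]))

U = [(0, 1, 1), (1, 0, 1)]
print(sorted(desc(U)))  # [(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
```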

Definition 1

Let \(t\ge 2\) be an integer. We say that an \((n,M)_q\) code \(\mathcal {C}\) is:

  1. 1.

    a t-frameproof code, t-FP, if for every \(U\subset \mathcal {C}\) with \(|U|\le t\), we have \(\textrm{desc}(U)\cap \mathcal {C}= U\).

  2. 2.

    a \(\bar{t}\)-separable code, \(\bar{t}\)-SC, if for all distinct \(U,V \subseteq \mathcal {C}\) with \(|U|\le t\) and \(|V |\le t\), we have \(\textrm{desc}(U)\ne \textrm{desc}(V)\).

  3. 3.

    a \(B_2\) code if all sums \(u_i+u_j\), \(1\le i\le j\le M\), are different, where the operation \(+\) takes place in the field of real numbers.

In a t-FP code, given a subset U of code words of size at most t, there are no other code words in \(\textrm{desc}(U)\) than the code words in U. This is the weakest form of tracing. The reason to study the families of t-FP, \(\bar{t}\)-SC, and \(B_2\) codes together is that they are closely related, as the following lemma shows.
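Definition 1.1 can be checked by brute force on small codes; the following sketch (our helper, exponential in t and M, intended only for illustration) does exactly that:

```python
from itertools import combinations, product

def is_frameproof(code, t):
    """Brute-force check of Definition 1.1: for every U with |U| <= t,
    desc(U) contains no code word outside U."""
    codeset = set(code)
    n = len(code[0])
    for size in range(1, t + 1):
        for U in combinations(code, size):
            cols = [{u[i] for u in U} for i in range(n)]
            descendants = set(product(*cols))
            if descendants & codeset != set(U):
                return False
    return True

# The code {000, 111} is 2-FP ...
print(is_frameproof([(0, 0, 0), (1, 1, 1)], 2))               # True
# ... but adding 011 breaks it: 011 lies in desc({000, 111}).
print(is_frameproof([(0, 0, 0), (1, 1, 1), (0, 1, 1)], 2))    # False
```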

Lemma 2

[2, 5] Let n, M, q, and t be positive integers greater than or equal to 2. We have the following relationships:

  1. 1.

    A t-FP code is a \(\bar{t}\)-SC.

  2. 2.

    A \(\bar{t}\)-SC is a \((t-1)\)-FP code.

  3. 3.

    In the binary case, a \(\bar{2}\)-\(SC(n,M)_2\) is a \(B_2(n,M)_2\) code and vice versa.

We will denote the largest cardinality of t-FP, \(\bar{t}\)-SC, and \(B_2\) codes as F(t, n, q), \(S(\bar{t},n,q)\) and B(n, q), respectively.
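Statement 3 of Lemma 2 can be verified exhaustively on small binary codes; the checkers below are our brute-force renderings of Definition 1, not algorithms from the paper:

```python
from itertools import combinations, combinations_with_replacement, product

def desc(U):
    return set(product(*[{u[i] for u in U} for i in range(len(U[0]))]))

def is_sep2(code):
    """2-bar-separable: distinct subsets of size at most 2 have distinct descendant sets."""
    subs = [U for r in (1, 2) for U in combinations(code, r)]
    return all(desc(U) != desc(V) for U, V in combinations(subs, 2))

def is_b2(code):
    """B_2: all sums u_i + u_j, i <= j, taken over the reals, are distinct."""
    sums = [tuple(a + b for a, b in zip(u, v))
            for u, v in combinations_with_replacement(code, 2)]
    return len(sums) == len(set(sums))

# Exhaustive check of Lemma 2(3) over all binary codes with n = 3, M = 3:
words = list(product((0, 1), repeat=3))
print(all(is_sep2(c) == is_b2(c) for c in combinations(words, 3)))  # True
```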

Let \(R_q(n,t)\) be the optimal rate of an \((n,M)_q\) code. As is customary in information theory, we are interested in the asymptotic rate

$$\begin{aligned} \underline{R}_{q}(t)=\limsup _{n\rightarrow \infty } R_q(n,t). \end{aligned}$$
(4)

We will use the following notation for the asymptotic rate of t-FP, \(\bar{t}\)-SC, and \(B_2\) codes:

$$\begin{aligned} f(t,q)&= \limsup _{n\rightarrow \infty }\frac{\log _q F(t,n,q)}{n},\end{aligned}$$
(5)
$$\begin{aligned} s(\bar{t},q)&= \limsup _{n\rightarrow \infty }\frac{\log _q S(\bar{t},n,q)}{n},\end{aligned}$$
(6)
$$\begin{aligned} b(q)&= \limsup _{n\rightarrow \infty }\frac{\log _q B(n,q)}{n}. \end{aligned}$$
(7)

2.1 Algebraic geometric codes

Algebraic geometric (AG) codes were developed by V.D. Goppa [14], building on the theory of algebraic curves. Let \(\mathcal C\) be a code from a curve of genus g, over the field \(\mathbb {F}_q\), with N points. If d denotes the minimum distance of the code, it has been shown that the rate of the code satisfies:

$$\begin{aligned} R \ge 1- \frac{d}{N}-\frac{g}{N}. \end{aligned}$$
(8)

Since we are aiming for codes with the highest possible rate, we need g/N to be small. In that regard, Drinfeld and Vlădut [15] gave the following lower bound for g/N:

$$\begin{aligned} \liminf _{g\rightarrow \infty }\frac{g}{N}\ge \frac{1}{\sqrt{q}-1}. \end{aligned}$$
(9)

Several research efforts [16, 17] show that there are explicitly described sequences of curves that achieve the Drinfeld-Vlădut bound. Also, by the results in [18], AG codes having asymptotic rate

$$\begin{aligned} R \ge 1- \frac{d}{N}-\frac{1}{\sqrt{q}-1} \end{aligned}$$
(10)

can be constructed with polynomial complexity.

2.2 Lovász Local Lemma

We start with a sample space \(\Omega \). We will define independent random variables, say \(\mathcal {V}=\{V_1,\ldots ,V_m\}\), taking values in \(\Omega \). In this context we will have a set of events \(\{E_1,\ldots ,E_b\}\) that will be considered “bad”. We will assume that all events are defined based on \(V_1,\ldots ,V_m\). The scope \({\mathrm sc}(E)\) of an event E is the minimal subset of random variables in \(\mathcal {V}\) that determines its occurrence. In the sequel, these events are going to be ordered. The most commonly used version of the LLL is the symmetric one:

Lemma 3

(Symmetric Lovász Local Lemma) Let \(E_1,\ldots ,E_b\) be a set of (typically bad) events such that for each \(E_j\):

  • \(Pr[E_j]\le p\in (0,1)\).

  • \(E_j\) is mutually independent of the set of all but at most s of the other events.

If

$$\begin{aligned} ep(s+1)\le 1, \end{aligned}$$
(11)

then

$$\begin{aligned} \Pr \left[ \bigcap _{i=1}^{b} \overline{E}_i \right] > 0. \end{aligned}$$

That is, all bad events can be avoided.

Intuitively, a valid configuration exists if the probability of each bad event is not too large, and if each event depends on relatively few of the others.
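Condition (11) is a one-line check; as an illustration (the k-SAT style numbers below are ours, not from the paper):

```python
import math

def symmetric_lll_holds(p, s):
    """Condition (11) of the symmetric LLL: e * p * (s + 1) <= 1."""
    return math.e * p * (s + 1) <= 1

# Illustrative k-SAT style numbers: a clause on k = 7 variables is violated
# with probability p = 2**-7; condition (11) then tolerates s up to 46.
p = 2.0 ** -7
print(symmetric_lll_holds(p, s=45))  # True:  e * 2^-7 * 46 ~ 0.98
print(symmetric_lll_holds(p, s=47))  # False: e * 2^-7 * 48 ~ 1.02
```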

Observe that, in Lemma 3, a given event E has to be mutually independent of all but at most s events. Let us introduce some machinery, developed in [19], that will allow us to establish claims of mutual independence throughout the paper. We start by defining mutual independence.

Definition 4

[19] An event E is mutually independent of a set of events \(\mathcal {E}\) if for every \(E_1, \ldots , E_r \in \mathcal {E}\), \(\Pr [E \mid E_1 \cap \cdots \cap E_r]=\Pr [E]\).

As Molloy and Reed state in [19], for mutual independence, “looks can often be deceiving”, therefore in their own words: “we appeal to the following fact nearly every time we wish to establish mutual independence”.

The Mutual Independence Principle [19]. Let \(\mathcal {V}\) be a set of independent random variables. Suppose that \( E_1,\ldots , E_r\in \mathcal {E}\) is a set of events, where each \( E_i\) is determined by \(F_i \subset \mathcal {V}\). If \(F_i \cap (F_{i_1} \cup \cdots \cup F_{i_k}) = \emptyset \), then \(E_i\) is mutually independent of \(\{ E_{i_1}, \ldots , E_{i_k}\}\).

With the Mutual Independence Principle at hand, we can define a dependency graph. The vertices are the events \(E_1,\ldots ,E_b\), and there is an edge between \(E_i\) and \(E_j\) if they share at least one random variable. The neighborhood of a vertex \(E_j\) will be denoted by \(\Gamma _j\). We will consider that no vertex belongs to its neighborhood. We denote by \(s\ge 1\) the maximum degree of the graph. Therefore, \(|\Gamma _j|\le s\) for \(j=1,\ldots ,b\). Now, according to the Mutual Independence Principle, an event \(E_j\) is considered to be mutually independent of all events not in \(\Gamma _j\).
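Computing s from the scopes is straightforward; a minimal sketch (our helper, with toy scopes over five variables):

```python
from itertools import combinations

def max_dependency_degree(scopes):
    """Maximum degree s of the dependency graph of Section 2.2: vertices are
    events, with an edge whenever two scopes share at least one variable."""
    degree = [0] * len(scopes)
    for (i, Fi), (j, Fj) in combinations(enumerate(scopes), 2):
        if Fi & Fj:                 # scopes intersect: E_i and E_j are adjacent
            degree[i] += 1
            degree[j] += 1
    return max(degree)

# Toy scopes (ours): E_1 ~ {0,1}, E_2 ~ {1,2}, E_3 ~ {3}, E_4 ~ {0,4}.
print(max_dependency_degree([{0, 1}, {1, 2}, {3}, {0, 4}]))  # 2 (E_1 meets E_2 and E_4)
```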

Intuitively, the symmetric version of the Lovász Local Lemma is appropriate when all bad events are “alike” in terms of error probability and dependence. If this is not the case, then one must resort to the general version, where each event is treated individually. However, if the bad events can be grouped into sets of events that are “alike”, and the number of sets is relatively small, the following version can be very useful.

Lemma 5

Let \(\mathcal {E}=\{E_1,\ldots , E_b\}\) be a set of (typically bad) events such that each \(E_j\) is mutually independent of \( \mathcal {E}\setminus (\mathcal {S}_j \cup \{E_j\}) \), for some \(\mathcal {S}_j \subset \mathcal {E}\). If for all \(j=1, \ldots , b\), the following conditions are satisfied:

  1. 1.

    \(\Pr [E_j]\le 1/4\),

  2. 2.

    \(\sum _{E_i \in \mathcal {S}_j}\Pr [E_i]\le \frac{1}{4},\)

then, with positive probability, none of the events in \(\mathcal {E}\) occur.

2.3 The variable framework

Since the appearance of the Lovász Local Lemma, a lot of work has been done in order to develop an algorithm to explicitly obtain the combinatorial objects whose existence was established by the lemma. The algorithmic version, that became a reality with the work in [11, 12], is usually called the variable framework. In this section, we discuss the variable framework approach of [10]. The main idea in [10] is to show that the probability that the variable framework algorithm performs more than N iterations is inverse exponential in N. We make a succinct presentation, and only highlight the aspects in [10] we will need in Section 5, where we extend the work in [10], and obtain an explicit upper bound on the number of “iterations” that the algorithm performs.

As said before, we can represent a code of size M and length n as an \(M\times n\) matrix. We start by outlining a general algorithm that constructs a matrix that represents a code whose code words avoid a set of bad events (see Algorithm 1).

Algorithm 1
(pseudocode figure not reproduced: Body samples all the random variables and, while some bad event in \(\mathcal {E}\) occurs, issues a root Resample call on it; Resample \((E_i)\) resamples the variables in \({\mathrm sc}(E_i)\) and, while some occurring bad event shares variables with \(E_i\), recursively resamples it.)
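A possible rendering of Algorithm 1 in the variable framework is sketched below (our code, under the assumption that each bad event is given as an indicator function together with its scope; names and the toy instance are ours, not the paper's):

```python
import random

def body(variables, events, sample, rng):
    """Sketch of Algorithm 1: sample everything, then fix bad events.
    Each event is a pair (occurs, scope): occurs(variables) -> bool, and
    scope is the set of variable indices determining the event."""
    for v in range(len(variables)):
        variables[v] = sample(rng)                       # initial sampling
    while True:                                          # Body, line 2
        bad = next((i for i, (occ, _) in enumerate(events)
                    if occ(variables)), None)
        if bad is None:
            return variables                             # no bad event occurs
        resample(bad, variables, events, sample, rng)    # root call (line 3)

def resample(i, variables, events, sample, rng):
    occ_i, scope_i = events[i]
    for v in scope_i:                                    # resample sc(E_i)
        variables[v] = sample(rng)
    while True:                                          # Resample, line 2
        bad = next((j for j, (occ, sc) in enumerate(events)
                    if sc & scope_i and occ(variables)), None)
        if bad is None:
            return                                       # E_i fixed, progress kept
        resample(bad, variables, events, sample, rng)    # recursive call (line 3)

# Toy run: 6 binary variables; bad event E_a = "positions a, a+1, a+2 all equal 1".
rng = random.Random(1)
events = [(lambda V, a=a: V[a] == V[a + 1] == V[a + 2] == 1, {a, a + 1, a + 2})
          for a in range(4)]
result = body([0] * 6, events, lambda r: r.randint(0, 1), rng)
print(all(not occ(result) for occ, _ in events))  # True: no bad event remains
```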

Observe that, if and when Algorithm 1 terminates, it produces a code in which none of the bad events occur. This is true because, by the condition in the loop of line 2 of Body, if and when the Algorithm terminates, no bad event in \(\mathcal {E}\) occurs. Note also that, by line 2 of the Resample \((E_i)\) procedure, whenever a Resample call returns, there is no occurring bad event that shares random variables with \(E_i\).

Let us show how it is proven in [10] that this algorithm terminates fast with positive probability. A Resample call made in line 3 of the Body is a root call, and one made in line 3 of the Resample \((E_i)\) routine is a recursive call. The computational complexity discussion we will make is based on the number of Resample calls. Specifically, the authors of [10] prove the following result (we have adapted the notation to our context):

Theorem 6

[10] Let \(X_{i,j}\), \(1\le i\le M\), \(1\le j\le n\) be distinct random variables (arranged as an \(M\times n\) matrix), taking values in an alphabet \(\mathcal {Q}\). Let \(\mathcal {E}=\{E_1,\ldots ,E_b\}\) be a set of bad events, where each event is associated with a subset of the random variables \(X_{i,j}\). Let \(p\ge \Pr [E_i]\), \(\forall 1\le i \le b\), and let s be the maximum number of events whose scopes intersect the scope of a given event. Suppose the bad events satisfy \(ep(s+1)<1\). Then, the probability that Algorithm 1 executes for at least N rounds is inverse exponential in N and, upon termination, the algorithm outputs an \(M\times n\) matrix in which no bad event in \(\mathcal {E}\) occurs.

Proof

(Sketch) First, note that Algorithm 1 makes progress after each Resample call. More precisely, take an arbitrary call, say Resample(\(E_{i}\)).

  1. (i)

    If and when Resample(\(E_{i}\)) terminates, then \(E_{i}\) no longer occurs.

  2. (ii)

    Suppose \(E_{j}\) does not occur at the start of Resample(\(E_{i}\)). If and when Resample(\(E_{i}\)) ends, then \(E_{j}\) still does not occur.

In other words, after a Resample call returns, all progress made up to that point is maintained, and there is at least one more bad event that is “fixed”.

To continue, let us define a graph in order to represent executions of Algorithm 1. More precisely, the graph is a labeled rooted forest whose components are rooted trees with labeled vertices. In our scenario the labels of the vertices of the trees are from the set of events \(\mathcal {E}\). A witness forest of an execution of Algorithm 1, making at least N Resample calls, is a representation of the execution detailed in the following way:

  1. 1.

    A node labeled as \(E_i\) depicts a \(\textsc {Resample}(E_i)\) call.

  2. 2.

    The labels of the roots correspond to root Resample calls (line 3 of \(\textsc {Body}\)).

  3. 3.

    A recursive \(\textsc {Resample}(E_j)\) call done in line 3 of \(\textsc {Resample}(E_i)\) is associated to a child labeled as \(E_j\) of the node labeled as \(E_i\).

Let \(\mathcal {T}\) denote a witness forest having N nodes. The next lemma will be key to proving our result.

\(\square \)

Lemma 7

Let p denote an upper bound on the probability of bad events in the sense of Lemma 3. Let \(\textrm{P}_N\) denote the probability that Algorithm 1 executes at least N Resample calls. Then

$$\begin{aligned} \textrm{P}_N\le \sum _{\mathcal {T}: |\mathcal {T}|=N}p^{N}. \end{aligned}$$
(12)

The sum is over all witness forests with N nodes.

See [20].\(\square \)

Since the sum in (12) is over all witness forests, it follows that to compute \(\textrm{P}_N\) we have to count witness forests.

We call a forest feasible if:

  1. (i)

    the labels of the roots are pairwise distinct,

  2. (ii)

    the labels of the children of a node are pairwise distinct,

  3. (iii)

    if a vertex labeled by \(E_{j}\) is a child of a vertex labeled by \(E_{i}\), then \({\mathrm sc}(E_i)\cap {\mathrm sc}(E_j)\ne \emptyset \).

It can be seen that the class of feasible forests includes witness forests. So it will be enough for our purposes to deal with the former. As a matter of fact, as shown in [20], it turns out we can deal with full unlabeled ordered rooted planar forests. Informally, to convert a feasible forest into a full unlabeled ordered rooted planar forest, one has to:

  • add root nodes labeled conveniently so that the set of labels of roots is the set of bad events,

  • add leaves to every node so that the set of labels of its children is the set of bad events whose scopes intersect the scope of the event labeling it. Thus, each node will have at most \(s+1\) children, since the scope of an event intersects itself,

  • then remove all the labels.

Consequently, it suffices to count the number of full \((s+1)\)-ary rooted planar trees with a given number of nodes, say \(v\). This number, which we denote by \(T_v\), has already been computed (see for instance [21, Theorem 5.13]):

$$\begin{aligned} T_v=\frac{1}{sv+1}\left( {\begin{array}{c}(s+1)v\\ v\end{array}}\right) , \end{aligned}$$
(13)

and we have the upper bound

$$\begin{aligned} T_v< A \Bigg (\Big (1+\frac{1}{s}\Big )^s (s+1)\Bigg )^v, \end{aligned}$$
(14)

where A is a constant depending only on s.
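Formula (13) and the bound (14) are easy to check numerically (our sketch; for \(s=1\), \(T_v\) reduces to the Catalan numbers):

```python
from math import comb

def T(v, s):
    """T_v from (13): full (s+1)-ary rooted planar trees with v internal nodes."""
    return comb((s + 1) * v, v) // (s * v + 1)   # the division is exact

# For s = 1 these are the Catalan numbers:
print([T(v, 1) for v in range(5)])               # [1, 1, 2, 5, 14]
# The bound (14) holds here already with A = 1 (the paper later notes A < 1):
s = 3
base = (1 + 1 / s) ** s * (s + 1)
print(all(T(v, s) < base ** v for v in range(1, 12)))  # True
```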

The number \(F_N\) of rooted planar forests with N internal nodes that are composed of \(|\mathcal {E}|\,\) \((s+1)\)-ary rooted planar trees is:

$$\begin{aligned} F_N = \sum _{\begin{array}{c} N_1+\cdots +N_{|\mathcal {E}|}=N \\ N_1,...,N_{|\mathcal {E}|}\ge 0 \end{array}} T_{N_1}\cdots T_{N_{|\mathcal {E}|}}. \end{aligned}$$
(15)

Now, from (15) and (14) we get:

$$\begin{aligned} F_N< (AN)^{|\mathcal {E}|}\Bigg (\Big (1+\frac{1}{s}\Big )^s (s+1)\Bigg )^N<(AN)^{|\mathcal {E}|}(e(s+1))^N.\end{aligned}$$
(16)

And therefore, (12) together with (16) yield

$$\begin{aligned} P_N < (AN)^{|\mathcal {E}|}(ep(s+1))^N. \end{aligned}$$
(17)

Now Theorem 6 follows since \(ep(s+1)<1\) by assumption.

Remark 8

Note that, according to the Mutual Independence Principle, in Algorithm 1, \(\textsc {Resample}(E_{i})\), line 2, the condition \({\mathrm sc}(E_i)\cap {\mathrm sc}(E_j)\ne \emptyset \) means that the events \(E_j\) being resampled are precisely those not considered to be mutually independent of \(E_i\).

3 Sketch of proof

Since we are going to provide constructions for t-frameproof codes, \(\bar{2}\)-separable codes and \(B_2\) codes, using the variable framework of the LLL, the outline of the proofs will be similar. In this section we provide an outline of our strategy in order to clarify our discussion.

Our proof reasoning will observe the following guidelines:

  • Existence

    • Definition of random variables and events. Typically we will arrange a set of random variables as a matrix. A given assignment to these random variables will represent a code of length equal to the number of columns of the matrix and size equal to the number of rows of the same matrix. The events will be defined in terms of subsets of rows of the matrix.

    • Computing the probability of an event being bad. This probability will depend only on the number of columns of the matrix (code length).

    • Computing the number of events not mutually independent with a given event. This number will depend only on the number of rows of the matrix (code size).

    • Apply the LLL. Since the LLL relates probability with mutual independence, we will obtain a relationship between number of columns and number of rows, so that there exists a matrix that avoids all bad events.

  • Construction

    • Particularize the variable framework algorithm to the code to be constructed. We will clearly establish the input parameters of the algorithm.

    Impose further restrictions on the LLL condition. This will allow us to find closed expressions for an upper bound on the computational complexity. Nevertheless, this complexity will be exponential in the code length.

    • Describe a concatenated construction. Establish outer code parameters, so the overall construction is polynomial in the code length.

4 Lower bounds

The key issue in using the variable framework of the LLL is deciding how to define the random variables and events that are going to be used to represent the object one wishes to obtain. The choice has a direct impact on the sampling procedure, which in turn affects the computational complexity.

4.1 Frameproof codes

Since we wish to obtain a code of size M and length n, we take Mn independent random variables \(\mathcal {X}=\{X_1,\ldots ,X_{Mn}\}\), and arrange them as an \(M \times n\) matrix C. If assignments to the rows of the matrix represent code words, then entry \(X_{ij}\) is associated with position j of code word i. Let us take a row c of C, and a set of t rows \(U^t\) of C, such that \(c\not \in U^t\). Abusing notation, we also denote the assignment to these random variables as c and \(U^t\) respectively. To define bad events, we will use Definition 1. So, for frameproof codes, a bad event is an assignment for which \(c\in \textrm{desc}(U^t)\). We denote such an event as \(E(c,U^t)\).

Observe that, according to the Mutual Independence Principle, an event \(E(c,U^t)\) is mutually independent of all events whose scope does not intersect \({\mathrm sc}(E(c,U^t))\). In this case, this means that \(E(c,U^t)\) is mutually independent of all events that do not share with it the variables of at least one row of C. Thus, given \(E(c,U^t)\), we need to compute the number s of events, different from \(E(c,U^t)\), whose scopes intersect \({\mathrm sc}(E(c,U^t))\). In Section 2.2, s was defined as the maximum degree of the dependency graph. Therefore, the number s corresponding to a given event \(E(c,U^t)\) is the total number of events, minus the number of events that involve neither c nor any row of \(U^t\), minus 1 (corresponding to the event \(E(c,U^t)\) itself).

$$\begin{aligned} s=\left( {\begin{array}{c}M\\ 1\end{array}}\right) \left( {\begin{array}{c}M-1\\ t\end{array}}\right) -\left( {\begin{array}{c}M-(t+1)\\ 1\end{array}}\right) \left( {\begin{array}{c}M-(t+2)\\ t\end{array}}\right) -1. \end{aligned}$$
(18)

According to Pascal’s rule,

$$\begin{aligned} \left( {\begin{array}{c}n\\ k\end{array}}\right) = \left( {\begin{array}{c}n-1\\ k-1\end{array}}\right) +\left( {\begin{array}{c}n-1\\ k\end{array}}\right) ,\qquad \left( {\begin{array}{c}n-1\\ k\end{array}}\right) = \left( {\begin{array}{c}n-2\\ k-1\end{array}}\right) +\left( {\begin{array}{c}n-2\\ k\end{array}}\right) . \end{aligned}$$
(19)

Therefore,

$$\begin{aligned} \left( {\begin{array}{c}n\\ k\end{array}}\right) -\left( {\begin{array}{c}n-2\\ k\end{array}}\right) = \left( {\begin{array}{c}n-1\\ k-1\end{array}}\right) +\left( {\begin{array}{c}n-2\\ k-1\end{array}}\right) . \end{aligned}$$
(20)

By repeated application, we have

$$\begin{aligned} \left( {\begin{array}{c}n\\ k\end{array}}\right) -\left( {\begin{array}{c}n-3\\ k\end{array}}\right)&= \left( {\begin{array}{c}n-1\\ k-1\end{array}}\right) +\left( {\begin{array}{c}n-2\\ k-1\end{array}}\right) +\left( {\begin{array}{c}n-3\\ k-1\end{array}}\right) \nonumber \\&\;\;\vdots \nonumber \\ \left( {\begin{array}{c}n\\ k\end{array}}\right) -\left( {\begin{array}{c}n-s\\ k\end{array}}\right)&= \left( {\begin{array}{c}n-1\\ k-1\end{array}}\right) +\left( {\begin{array}{c}n-2\\ k-1\end{array}}\right) +\cdots +\left( {\begin{array}{c}n-s\\ k-1\end{array}}\right) = \sum _{j=1}^{s}\left( {\begin{array}{c}n-j\\ k-1\end{array}}\right) \end{aligned}$$

Substituting \(M-1\) for n, t for k, and \(t+1\) for s, we obtain

$$\begin{aligned} \left( {\begin{array}{c}M-1\\ t\end{array}}\right) - \left( {\begin{array}{c}M-1 - (t+1)\\ t\end{array}}\right) = \sum _{j=1}^{t+1}\left( {\begin{array}{c}M-1-j\\ t-1\end{array}}\right) . \end{aligned}$$
(21)

Therefore, from (18) we have

$$\begin{aligned} s+1&= M\left( {\begin{array}{c}M-1\\ t\end{array}}\right) - (M-(t+1))\left( {\begin{array}{c}M- (t+2)\\ t\end{array}}\right) \\&= M\left( {\begin{array}{c}M-1\\ t\end{array}}\right) - M\left( {\begin{array}{c}M- (t+2)\\ t\end{array}}\right) + (t+1)\left( {\begin{array}{c}M- (t+2)\\ t\end{array}}\right) \\&= M\left[ \left( {\begin{array}{c}M-1\\ t\end{array}}\right) - \left( {\begin{array}{c}M- (t+2)\\ t\end{array}}\right) \right] + (t+1)\left( {\begin{array}{c}M- (t+2)\\ t\end{array}}\right) \\&= M\left[ \sum _{j=1}^{t+1}\left( {\begin{array}{c}M-1-j\\ t-1\end{array}}\right) \right] + (t+1)\left( {\begin{array}{c}M- (t+2)\\ t\end{array}}\right) \\&< M(t+1)\left( {\begin{array}{c}M-2\\ t-1\end{array}}\right) +(t+1)\left( {\begin{array}{c}M- (t+2)\\ t\end{array}}\right) \\&< M(t+1)\frac{M^{t-1}}{(t-1)!}+(t+1)\frac{M^t}{t!}=\frac{(t+1)^2}{t!}M^t\le \frac{9}{2} M^t. \end{aligned}$$

In the previous calculation we have used the fact that \( \displaystyle \left( {\begin{array}{c}n\\ k\end{array}}\right) \le \frac{n^k}{k!}\).

Therefore, the maximum degree of the dependency graph, s, satisfies

$$\begin{aligned} s+1< \frac{9}{2} M^t, \quad \forall t\ge 2. \end{aligned}$$
(22)
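Both (18) and the bound (22) can be sanity-checked by enumeration for small M and t (our sketch, placing the fixed event on rows \(0,\ldots ,t\) without loss of generality):

```python
from itertools import combinations
from math import comb

def s_plus_1(M, t):
    """s + 1 from (18): all events minus those disjoint from a fixed E(c, U^t)."""
    return M * comb(M - 1, t) - (M - (t + 1)) * comb(M - (t + 2), t)

def s_plus_1_brute(M, t):
    """Count, by enumeration, the events E(c', U') whose row sets meet the
    fixed rows {0, ..., t} playing the role of c and U^t."""
    fixed = set(range(t + 1))
    count = 0
    for c in range(M):
        for U in combinations([r for r in range(M) if r != c], t):
            if ({c} | set(U)) & fixed:
                count += 1
    return count

for M, t in [(8, 2), (9, 3), (10, 2)]:
    assert s_plus_1(M, t) == s_plus_1_brute(M, t)
    assert s_plus_1(M, t) < 4.5 * M ** t        # the bound (22)
print("(18) and (22) verified on small cases")
```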

A first approach to the problem is to consider a uniform distribution for the random variables. When doing so, we obtain the following theorem. This result will later be improved by using a refined distribution.

Theorem 9

Let \(\mathcal {Q}=\{ 0,\cdots , q-1\}\) be an alphabet of size \(q\ge 2\). If \(t\ge q\), then there exists a t-frameproof code of length n and size:

$$\begin{aligned} F(t,n,q)\ge \left\lfloor \frac{1}{(9e/2)^{\frac{1}{t}}(1-(1-\frac{1}{q} )^t)^{\frac{n}{t}}} \right\rfloor . \end{aligned}$$
(23)

Proof

Using the above notation, we take the distribution of the random variables \(X_{ij}\), \(1\le i \le M\), \(1\le j \le n\), to be

$$\begin{aligned} \Pr (X_{ij}=0)=\cdots = \Pr (X_{ij}=q-1)=\frac{1}{q}. \end{aligned}$$
(24)

Then, the probability of a bad event \(E(c_i,U^t)\) is

$$\begin{aligned} \Pr [E(c_i,U^t)]= \left( 1-\left( 1-\frac{1}{q}\right) ^t \right) ^n \end{aligned}$$
(25)

Indeed, for each position l, \(\Pr [c_i(l) = \alpha ] = \dfrac{1}{q}\) for every \(\alpha \in \mathcal {Q}\), so \(\Pr [c_i(l)\in U^t(l)] = 1-\Pr [u(l)\ne c_i(l) \hbox { for all } u\in U^t] = 1-\left( 1-\frac{1}{q}\right) ^t\), and (25) follows from the independence of the random variables \(X_{ij}\) by multiplying over the n positions.

Having computed both the probability of a bad event, and an upper bound on the value of s (22), according to the Lovász Local Lemma, all bad events can be avoided if

$$\begin{aligned} e\left( 1-\left( 1-\dfrac{1}{q}\right) ^t\right) ^n 9M^{t}/2\le 1 \end{aligned}$$
(26)

Therefore,

$$\begin{aligned} M\le \frac{1}{(9e/2)^{\frac{1}{t}}(1-(1-\frac{1}{q} )^t)^{\frac{n}{t}}}, \end{aligned}$$
(27)

and the theorem follows.
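For concreteness, the bound (23) and the condition (26) can be evaluated numerically (our sketch, with illustrative parameters):

```python
import math

def fp_lll_bound(t, n, q):
    """The lower bound (23) on F(t, n, q): uniform sampling plus the
    symmetric LLL condition (26)."""
    p = (1 - (1 - 1 / q) ** t) ** n        # bad-event probability (25)
    return math.floor((9 * math.e / 2 * p) ** (-1 / t))

M = fp_lll_bound(t=2, n=100, q=2)
p = (1 - (1 - 1 / 2) ** 2) ** 100
print(M > 1)                               # True: a nontrivial binary 2-FP code exists
print(math.e * p * 4.5 * M ** 2 <= 1)      # True: condition (26) is satisfied
```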

The versatility of the Lovász Local Lemma allows us to accommodate alternative distributions of the random variables. To improve the result in the previous theorem, we are going to follow the approach in [9], where the authors use an interesting non-uniform distribution. Note that, in a related context, a similar approach was used in [22]. The proof of the following theorem follows along the lines of the previous one, and is given in the Appendix.

Theorem 10

Let \(\mathcal {Q}=\{ 0,\cdots , q-1\}\) be an alphabet of size \(q\ge 2\). If \(t+1\ge q\), then there exists a t-frameproof code of length n and size:

$$\begin{aligned} F(t,n,q)\ge \left\lfloor \frac{1}{(9e/2)^{\frac{1}{t}}} \frac{1}{\left( 1-(1-\frac{q-1}{t+1})(\frac{q-1}{t+1})^t -\frac{q-1}{t+1}(\frac{t}{t+1})^t\right) ^{\frac{n}{t}} } \right\rfloor . \end{aligned}$$
(28)

Remark 11

According to [9], Theorem 10 is an improvement on Theorem 9 when \(q\le \frac{t}{2}+1\), and \(t\ge 8\).

4.2 Separable codes

For the family of separable codes, and in view of Lemma 2, we focus on the interesting and already non-trivial case \(t=2\). This is also the main focus in [8]. To establish a lower bound for \(\bar{2}\)-separable codes, we again take Mn random variables \(X_1,\ldots ,X_{Mn}\), and arrange them as an \(M \times n\) matrix C, with rows \(\{c_1,\dots ,c_M\}\), \(c_i=( X_{i1},\ldots ,X_{in})\). As before, this is our representation of a code of size M and length n, where code words correspond to assignments to the random variables associated with a row; that is, position j of code word \(c_i\) corresponds to matrix entry \(X_{ij}\). We obtain the following result, whose proof is given in the Appendix.

Theorem 12

Let \(\mathcal {Q}=\{ 0,\cdots , q-1\}\) be an alphabet of size \(q\ge 2\). If \(n\ge 2\), then there exist \(\bar{2}\)-separable codes of size

$$\begin{aligned} S(\bar{2},n,q)\ge \left\lfloor \frac{1}{(16)^\frac{1}{3}}\left( \frac{q^3}{2q-1}\right) ^{n/3}\right\rfloor . \end{aligned}$$
(29)

The following corollary states the asymptotic rate.

Corollary 13

Let \(\mathcal {Q}=\{ 0,\cdots , q-1\}\) be an alphabet of size \(q\ge 2\). If \(n\ge 2\), then there exist \(\bar{2}\)-separable codes of rate

$$\begin{aligned} s(\bar{2},q) \ge 1-\frac{\log _q (2q-1)}{3} \end{aligned}$$
(30)

Proof

Just apply the definition of \(s(\bar{2},q)\) in (6).

Note that this is the same rate obtained in Corollary 4 of [8].

4.3 B2 codes

For \(B_2\) codes we have the following result, whose proof will be given in the Appendix.

Theorem 14

Let \(\mathcal {Q}=\{ 0,\cdots , q-1\}\) be an alphabet of size \(q\ge 2\). If \(n\ge 2\), then there exist \(B_2\) codes of size

$$\begin{aligned} B(n,q)\ge \left\lfloor \frac{q^{n/3}}{2}\right\rfloor . \end{aligned}$$
(31)

Our result and the one in Theorem 8 of [8] are asymptotically identical, as the following corollary shows.

Corollary 15

Let \(\mathcal {Q}=\{ 0,\cdots , q-1\}\) be an alphabet of size \(q\ge 2\). If \(n\ge 2\), then there exist \(B_2\) codes of rate

$$\begin{aligned} b(q) \ge \frac{1}{3}. \end{aligned}$$
(32)

Proof

The proof is immediate using (31) in (7).

5 Combinatorial constructions using the variable framework

In this section we provide probabilistic combinatorial constructions for t-frameproof codes, \(\bar{2}\)-separable codes, and \(B_2\) codes. The constructions are obtained as the output of an algorithm. First, we study the complexity of Algorithm 1 (see Section 2.3).

5.1 Expected number of iterations

We extend the work in [10], and deal with the expected number of Resample calls made in line 3 of the Body, and line 3 of the Resample routine. Let us first give the explicit expression of A in (14). The work in [23] proves that

$$\begin{aligned} \frac{ \sqrt{2\pi } e^{-n} n^{n+1}}{\sqrt{n}}< n!< \frac{ \sqrt{2\pi } e^{-n} n^{n+1}}{\sqrt{n-1}}, \end{aligned}$$

and therefore,

$$\begin{aligned} \left( {\begin{array}{c}(s+1)v\\ v\end{array}}\right)&< \frac{ \dfrac{\sqrt{2\pi }e^{-(s+1)v}((s+1)v)^{(s+1)v+1}}{\sqrt{(s+1)v-1}} }{ \dfrac{\sqrt{2\pi }e^{-sv}(sv)^{sv+1}}{\sqrt{sv}} \dfrac{\sqrt{2\pi }e^{-v}v^{v+1}}{\sqrt{v}} }\end{aligned}$$
(33)
$$\begin{aligned}&=\frac{ \sqrt{s}(s+1)^v }{ \sqrt{2\pi }\sqrt{(s+1)v-1} } \frac{ (s+1)^{sv+1} }{ s^{sv+1} }\end{aligned}$$
(34)
$$\begin{aligned}&=f(s,v)\Bigg (\Big (1+\frac{1}{s}\Big )^s (s+1)\Bigg )^v,\end{aligned}$$
(35)
$$\begin{aligned}&\hbox { where } f(s,v)=\frac{s+1}{\sqrt{2\pi }\sqrt{s}\sqrt{(s+1)v-1}}. \end{aligned}$$
(36)

Now, according to (13),

$$\begin{aligned} T_v=\frac{1}{sv+1}\left( {\begin{array}{c}(s+1)v\\ v\end{array}}\right)< \frac{f(s,v)}{sv+1}\, e^v(s+1)^v<\frac{s+1}{\sqrt{2\pi }s^2v}e^v(s+1)^v<\frac{e^v(s+1)^v}{\sqrt{2\pi }(s-1)v}. \end{aligned}$$
(37)

Observe that the constant A in (14) corresponds to taking \(v=1\) in \(f(s,v)\), which makes A depend only on s; in particular, \(A<1\).
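As a numerical sanity check on (37), the Fuss–Catalan count \(T_v\) from (13) can be compared against the bound (a sketch with hypothetical function names):

```python
import math

def fuss_catalan(s, v):
    """T_v from (13): the number of full (s+1)-ary rooted planar trees
    with v internal nodes (a Fuss-Catalan number, always an integer)."""
    return math.comb((s + 1) * v, v) // (s * v + 1)

def bound_37(s, v):
    """Right-hand side of (37)."""
    return math.e**v * (s + 1)**v / (math.sqrt(2 * math.pi) * (s - 1) * v)
```

For example, \(T_3 = \binom{9}{3}/7 = 12\) for \(s=2\), comfortably below the bound.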

Take a given event, say \(E_{i}\), and let \(\mathbb {E}[E_{i}]\) be the expected number of times \(E_{i}\) is resampled. According to the reasoning in the proof of Theorem 6, a full \((s+1)\)-ary rooted planar tree with \(v\) internal nodes can be used to analyze a Resample call and its recursion.

Let \(\mathcal {T}_{i}\) be the set of witness trees rooted in \(E_{i}\), and let \(\mathcal {T}_{i}^v\) be the set of witness trees rooted in \(E_{i}\) with \(v\) nodes. Finally, let \(P_T\) be the probability that a tree T is a witness tree in an execution of Algorithm 1. Then, by the reasoning done in Section 2.3,

$$\begin{aligned} \mathbb {E}[E_{i}] \le \sum _{T\in \mathcal {T}_{i}} P_T = \sum _{v=1}^{\infty }\Bigg (\sum _{T\in \mathcal {T}_{i}^v} P_T\Bigg ). \end{aligned}$$
(38)

Using Lemma 7, the expectation in (38) can be bounded as follows:

$$\begin{aligned} \sum _{v=1}^{\infty }\Bigg (\sum _{T\in \mathcal {T}_{i}^v} P_T\Bigg )<&\sum _{v=1}^\infty \Big (T_vp^v\Big )\nonumber \\ =&\sum _{v=1}^{\infty }\Bigg (\frac{1}{sv+1}\left( {\begin{array}{c}(s+1)v\\ v\end{array}}\right) p^v\Bigg )\nonumber \\ <&\frac{ 1 }{ \sqrt{2\pi }(s-1) } \sum _{v=1}^{\infty }\frac{1}{v} \Big (ep(s+1)\Big )^v\nonumber \\ =&\frac{ 1 }{ \sqrt{2\pi }(s-1) } (-\ln (1-ep(s+1))). \end{aligned}$$
(39)

We have used the well-known Taylor expansion \(\,\ln (1-x)=-\sum _{v=1}^{\infty }\frac{x^{v}}{v}, \, \forall |x|<1\), and the fact that every witness tree is a feasible tree, so the sum can be bounded by summing over all feasible trees. Finally, summing over the total number of possible bad events, we obtain the following proposition:

Proposition 16

Let \(E_1,\ldots ,E_m\) be bad events that are to be avoided. The expected number of Resample calls (both in the main body and recursive routine) in an execution of Algorithm 1 is at most:

$$\begin{aligned} \sum _{j=1}^m \mathbb {E}[E_{j}] < \frac{1}{\sqrt{2\pi }} \frac{m}{(s-1)} (-\ln (1-ep(s+1))),\quad \forall s\ge 2. \end{aligned}$$
(40)

Proof

The proposition follows from (38) and (39).

Corollary 17

Let \(E_1,\ldots ,E_m\) be the bad events that are to be avoided. If \(ep(s+1)\le 2/3\) and \(s\ge 2\), then the expected number of Resample calls (both in the main body and recursive routine) in an execution of Algorithm 1 is at most \(\frac{m}{s}\).

Proof

Since \(-\ln (1-x)\le x+x^2\), \(\forall x\in [0,2/3]\), taking \(x=ep(s+1)\le 2/3\) gives \(x+x^2\le 10/9\); in view of (40), an upper bound on the expected number of steps is

$$\begin{aligned} \frac{ 10 m }{ 9\sqrt{2\pi }(s-1) }<\frac{m}{s}, \quad \forall s\ge 2. \end{aligned}$$
(41)
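The two elementary facts used in the proof can be checked numerically (a quick sanity script, not part of the argument):

```python
import math

# Fact 1: -ln(1 - x) <= x + x**2 on (0, 2/3] (grid check).
for k in range(1, 1001):
    x = (2 / 3) * k / 1000
    assert -math.log(1 - x) <= x + x * x

# Fact 2: the resulting constant satisfies
# 10 / (9 * sqrt(2*pi) * (s - 1)) < 1 / s for all s >= 2.
for s in range(2, 1000):
    assert 10 / (9 * math.sqrt(2 * math.pi) * (s - 1)) < 1 / s
```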

Remark 18

In their paper, Moser and Tardos [12] also discussed the expected number of times an event is resampled. Adapted to Lemma 5, their result states that

$$\begin{aligned} \sum _{\mathrm {events\ } E_{i} } \mathbb {E}[E_{i}]\le \sum _{\mathrm {events\ } E_{i} } \frac{2\Pr [E_i]}{1-2\Pr [E_i]}. \end{aligned}$$
(42)

We will have occasion to use (42) when we discuss the complexity of constructions for separable and \(B_2\) codes.

5.2 Frameproof codes

We begin by giving constructions of t-frameproof codes over an alphabet of size q. We will deal with the interesting case \(q=2\), but to understand the impact of the alphabet size, we start by discussing the case of a larger alphabet \(q=t\). Observe that \(q=t\) minimizes the denominator of (23).

5.2.1 Plain algorithmic construction

For \(q=t\), and in order to obtain compact expressions, it is more convenient to use Theorem 9; for \(q=2\) we will use the improvement given in Theorem 10. Let us impose a restriction stronger than (26),

$$\begin{aligned} e\left( 1-\left( 1-\dfrac{1}{t}\right) ^t\right) ^n \frac{9}{2}M^{t}\le \frac{2}{3}, \end{aligned}$$
(43)

which leads to

$$\begin{aligned} n\ge \frac{\ln (27eM^t/4)}{-\ln (1-(1-\frac{1}{t})^t)}. \end{aligned}$$
(44)

Since \(x< -\ln (1- x)\), \(\forall x\in (0,1)\), and \((1-1/t)^t\ge 1/4\), \(\forall t\ge 2\), then

$$\begin{aligned} \frac{\ln (27eM^t/4)}{-\ln (1-(1-\frac{1}{t})^t)} < 4\ln (27eM^t/4). \end{aligned}$$
(45)

Finally, we observe that \(4\ln (27eM^t/4)<6t\ln M -1\), \(\, \forall t\ge 3, \, \forall M\ge 8\). If \(t=2\), we can directly check that equation (44) is satisfied for \(n\ge 6t\ln M -1.\) Then,

$$\begin{aligned} n\ge \lfloor 6t\ln t \log _t M \rfloor ,\, \forall t\ge 2. \end{aligned}$$
(46)
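The closed form (46) can be compared numerically against the exact requirement (44), for a few moderate parameter values (a sanity sketch with a hypothetical function name):

```python
import math

def sufficient_length(t, M):
    """Exact requirement from (44) and the closed form from (46)."""
    exact = math.log(27 * math.e * M**t / 4) / -math.log(1 - (1 - 1 / t)**t)
    closed = math.floor(6 * t * math.log(t) * math.log(M, t))
    return exact, closed
```

Note that \(6t\ln t \log _t M = 6t\ln M\), so the closed form grows linearly in both \(t\) and \(\ln M\).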

Proposition 19

Let \(t\ge 2\), \(M\ge 8\), and \( n= \lfloor 6t\ln t \log _t M \rfloor \). Using Algorithm 1, t-frameproof \((n,M)_t\) codes can be constructed, with an expected number of Resample calls less than \(\frac{M}{t}\).

Proof

The codes can be obtained using Algorithm 1 with the following input:

  • Integers \(t\ge 2\), \( M\ge 8\) and \(n=\left\lfloor { 6t\ln t \log _t M} \right\rfloor \). Alphabet \(\mathcal {Q}\) of size t, \(\mathcal {Q}:=\{0,\dots ,t-1\}\).

  • Random variables: \(\{X_{ij}: \mathcal {Q}\rightarrow \mathcal {Q},\ 1\le i\le M,\, 1\le j \le n\}\).

  • Probability mass function: \(\Pr (X_{ij}=0)=\cdots = \Pr (X_{ij}=t-1) =\frac{1}{t}\).

  • Bad events:

    • Arrange the r.v. \(X_{ij}\) as an \(M\times n\) matrix C, with \((C)_{ij}=X_{ij}\).

      • Let \({C}_{row}\) be the set of rows of C.

      • Define the set \(\{(r,T)\ \mid \ T\subset {C}_{row}, \, |T |=t, \, r\in {C}_{row}\setminus T \}\).

      • Order the previous set and denote it by \(\mathcal {R}\).

      • Let \(E(r,T)\) be the bad event \(r\in \textrm{desc}(T)\), in the sense of Section 4.1.

    • Define the ordered set \(\mathcal {E} := \{E(r,T) \ \mid \ (r,T) \in \mathcal {R} \} \).

The existence of the code is guaranteed by the reasoning leading to (46). It only remains to prove the statement about the expected number of Resample calls. Since we have imposed (43), then according to Corollary 17 we have to find an upper bound on \(\frac{m}{s}\).

Since the total number of events is \(m=\left( {\begin{array}{c}M\\ 1\end{array}}\right) \left( {\begin{array}{c}M-1\\ t\end{array}}\right) \), and \(s+1=\left( {\begin{array}{c}M\\ 1\end{array}}\right) \left( {\begin{array}{c}M-1\\ t\end{array}}\right) -\left( {\begin{array}{c}M-(t+1)\\ 1\end{array}}\right) \left( {\begin{array}{c}M-(t+2)\\ t\end{array}}\right) \), we have

$$\begin{aligned} \frac{m}{s+1}&< \frac{ M\left( {\begin{array}{c}M-1\\ t\end{array}}\right) }{ M\left( {\begin{array}{c}M-1\\ t\end{array}}\right) -M\left( {\begin{array}{c}M-(t+2)\\ t\end{array}}\right) } = \frac{ 1 }{ 1- \frac{ \left( {\begin{array}{c}M-(t+2)\\ t\end{array}}\right) }{ \left( {\begin{array}{c}M-1\\ t\end{array}}\right) } } < \frac{M-t}{t+1}. \end{aligned}$$
(47)

This is because

$$\begin{aligned} \frac{\left( {\begin{array}{c}M-(t+2)\\ t\end{array}}\right) }{\left( {\begin{array}{c}M-1\\ t\end{array}}\right) }&= \frac{(M-(t+2))(M-(t+3))\cdots (M-(2t+1))}{(M-1)(M-2)\cdots (M-t)}\end{aligned}$$
(48)
$$\begin{aligned}&= \prod _{k=1}^t \frac{M-(t+1+k)}{M-k} = \prod _{k=1}^t \left( 1- \frac{t+1}{M-k} \right) \end{aligned}$$
(49)
$$\begin{aligned}&< 1- \frac{t+1}{M-t}. \end{aligned}$$
(50)
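The product bound (48)–(50) can be spot-checked numerically (a sanity script; the condition \(M\ge 2t+2\) keeps both binomials nonzero):

```python
import math

# Spot-check (48)-(50): the binomial ratio stays strictly below
# 1 - (t+1)/(M-t) whenever M >= 2t + 2.
for t in range(2, 7):
    for M in range(2 * t + 2, 80):
        ratio = math.comb(M - (t + 2), t) / math.comb(M - 1, t)
        assert ratio < 1 - (t + 1) / (M - t)
```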

Now,

$$\begin{aligned} (M-t)\frac{s+1}{s} \le M\frac{t+1}{t} \, \Longrightarrow \, \frac{m}{s}=\frac{m}{s+1} \frac{s+1}{s}\le \frac{M}{t} ,\end{aligned}$$

and in view of (18), the inequality on the left holds because \(s\ge t\).
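The input list in the proof of Proposition 19 can be turned into a runnable sketch. The code below is a simplified, non-recursive variant of Algorithm 1 (it resamples any currently bad event instead of following the recursive Resample routine), with hypothetical names; it uses the uniform sampling and the events \(E(r,T)\) described above.

```python
import random
from itertools import combinations

def is_framed(r, T):
    """r is in desc(T): at every position, r agrees with some row of T."""
    return all(any(c[j] == r[j] for c in T) for j in range(len(r)))

def build_frameproof(M, t, n, q, seed=0):
    """Draw an M x n matrix uniformly over {0,...,q-1}; while some bad
    event E(r,T) holds, resample the rows it depends on."""
    rng = random.Random(seed)
    C = [[rng.randrange(q) for _ in range(n)] for _ in range(M)]
    resamples = 0
    while True:
        bad = next(((i, T) for T in combinations(range(M), t)
                    for i in range(M)
                    if i not in T and is_framed(C[i], [C[k] for k in T])),
                   None)
        if bad is None:
            return C, resamples
        i, T = bad
        for row in (i, *T):  # resample every row the event depends on
            C[row] = [rng.randrange(q) for _ in range(n)]
        resamples += 1
```

For \(t=q=2\), \(M=8\) and \(n=24\) (the length suggested by (46)), the loop terminates after very few resamples, and the output matrix has no row in the descendant set of any other pair of rows.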

Analogously, for binary codes we have the following result:

Proposition 20

Let \(t\ge 2\), \(M\ge 8\), and \(n=\left\lfloor 3t(t+1) \log _2 M\right\rfloor \). Using Algorithm 1, binary t-frameproof \((n,M)_2\) codes can be constructed, with an expected number of Resample calls less than \(\frac{M}{t}\).

Proof

For \(q=2\), we again impose a stronger restriction, in this case on (63):

$$\begin{aligned} e \left[ 1-(1-\frac{q-1}{t+1})(\frac{q-1}{t+1})^t -\frac{q-1}{t+1}(\frac{t}{t+1})^t\right] ^n \frac{9}{2}M^t \le \frac{2}{3}. \end{aligned}$$
(51)

Therefore, the code length n has to satisfy

$$\begin{aligned} n&\ge \frac{\ln (27eM^t/4)}{-\ln \left( 1-\frac{t}{t+1}\left( \frac{1}{t+1}\right) ^t-\frac{1}{t+1}\left( \frac{1}{1+\frac{1}{t}}\right) ^t\right) }. \end{aligned}$$
(52)

Since

$$\begin{aligned} -\ln \left( 1-\frac{t}{t+1}\left( \frac{1}{t+1}\right) ^t-\frac{1}{t+1}\left( \frac{1}{1+\frac{1}{t}}\right) ^t\right)> -\ln \left( 1-\frac{1}{e(t+1)}\right) >\frac{1}{e(t+1)}, \end{aligned}$$

then

$$\begin{aligned} \frac{\ln (27eM^t/4)}{-\ln \left( 1-\frac{t}{t+1}\left( \frac{1}{t+1}\right) ^t-\frac{1}{t+1}\left( \frac{1}{1+\frac{1}{t}}\right) ^t\right) }< e(t+1) \ln (27e M^t/4). \end{aligned}$$

Finally, since \(e(t+1) \ln (27e M^t/4)<3t(t+1)\log _2 M -1\), for all \(t\ge 3, \, M\ge 8\), we can safely take \(n\ge \lfloor 3t(t+1)\log _2 M\rfloor \). The case \(t=2\) can be checked directly from equation (52).

The codes can be constructed using Algorithm 1 with the following input:

  • Integers \(t\ge 2\), \(M\ge 8\), and \(n= \lfloor 3t(t+1)\log _2 M\rfloor \). Alphabet \(\mathcal {Q}\) of size 2, \(\mathcal {Q}:=\{0,1\}\).

  • Random variables: \(\mathcal {C} = \{X_{ij}: \mathcal {Q}\rightarrow \mathcal {Q},\ 1\le i\le M, 1\le j \le n\}\).

  • Probability mass function: \(\Pr (X_{ij}=0)=\frac{t}{t+1}, \Pr (X_{ij}=1) =\frac{1}{t+1}\).

  • Bad events:

    • Arrange the r.v. \(X_{ij}\) as an \(M\times n\) matrix C, with \((C)_{ij}=X_{ij}\).

      • Let \({C}_{row}\) be the set of rows of C.

      • Define the set \(\{(r,T)\ \mid \ T\subset {C}_{row}, \, |T |=t, \, r\in {C}_{row}\setminus T \}\).

      • Order the previous set and denote it by \(\mathcal {R}\).

      • Let \(E(r,T)\) be the bad event \(r\in \textrm{desc}(T)\), in the sense of Section 4.1.

    • Define the ordered set \(\mathcal {E} := \{E(r,T) \ \mid \ (r,T) \in \mathcal {R} \} \).

The claim about the number of Resample calls is proved in the same manner as in Proposition 19, so we omit the details.

Remark 21

We would like to point out that performing the same analysis leading to (46), using (26) instead of (43), would lead to a value of n of the same order of magnitude.

Remark 22

Observe that, according to Propositions 19 and 20, we can take \(M\le t^{n/(7t\ln t)}\) and \(M\le 2^{n/(3t(t+1))}\) for codes with alphabet size t and 2, respectively. This means that the number of Resample calls is exponential in the code length. Moreover, consider line 2 in \(\textsc {Body}\) of Algorithm 1: in the worst case, to find the least indexed bad event the algorithm needs to go over all events in \(\mathcal {E}\) and check whether each one is bad, and there are approximately \(M^{t}\) events in \(\mathcal {E}\). Similarly, in line 2 of Resample, the algorithm must check all events in the neighborhood of the event that has been resampled. In both cases, given the bound we have proved, the number of events that need to be checked is exponentially large in n. We deal with this situation in the following section.

5.2.2 Polynomial complexity constructions

Let us overcome the drawback stated in Remark 22 and construct codes with complexity polynomial in the code length. To do so, we resort to concatenated constructions.

The concept of frameproof codes goes back to the work of Boneh and Shaw [1], where the following result is stated:

Lemma 23

[1] Let \(\mathcal {C}\) be an \((n,M,d)_Q\) code. If \(d>n-\frac{n}{t}\), then \(\mathcal {C}\) is a t-frameproof code.

Now, using (10) we obtain the following result.

Lemma 24

An AG t-frameproof \((n,M,d)_{Q}\) code can be constructed, for rates

$$\begin{aligned} R+\epsilon = \frac{1}{t}-\frac{1}{\sqrt{Q}-1}, \end{aligned}$$
(53)

and polynomial complexity \(O((n\log _{Q}n)^3)\).

Proof

From Lemma 23 we have that \(\frac{d}{n} > 1-\frac{1}{t}\) is a sufficient condition for the frameproof property. With this in mind, the lemma is a consequence of (10). The result about the complexity is stated in [18].

Proposition 25

[1] If \(\mathcal {C}_i\) is a t-FP code of rate \(R_i\), and \(\mathcal {C}_o\) is a t-FP code of rate \(R_o\), then \(\mathcal {C}_i\circ \mathcal {C}_o\) is a t-FP code of rate \(R_iR_o\).
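The concatenation \(\mathcal {C}_i\circ \mathcal {C}_o\) simply replaces each symbol of every outer codeword by the corresponding inner codeword, so the rates multiply. A minimal sketch (hypothetical helper name; outer symbols are assumed to index the inner code, so \(M_i = Q_o\)):

```python
def concatenate(outer, inner):
    """C_i o C_o: replace each symbol of every outer codeword by the
    corresponding inner codeword; the rate multiplies as R_i * R_o."""
    return [[s for sym in word for s in inner[sym]] for word in outer]
```

The resulting code has length \(n_o n_i\) and size \(M_o\).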

In view of the previous proposition, for the inner code we take the constructions for t-frameproof codes (alphabet sizes \(q=2\) and \(q=t\)) that we have presented in Propositions 19 and 20. In both cases, for the outer code we take an alphabet \(\mathcal {Q}\) of size \(|\mathcal {Q}|=Q\ge t^{\beta }\), with \(\beta >2\). Then, from (53) we have:

$$\begin{aligned} R_o+\epsilon \ge \frac{1}{t}-\frac{1}{t^{\beta /2} -1}, \hbox { i.e., }\, R_o=\frac{1}{t}\left( 1-o(1)\right) . \end{aligned}$$
(54)

Now we are in a position to state the following theorem:

Theorem 26

Using the variable framework, with \(t\ge 2\), we can construct t-frameproof codes, over an alphabet of size t, of rate

$$\begin{aligned} R=\frac{1}{6t^2\ln t }(1-o(1)), \end{aligned}$$
(55)

with polynomial complexity in the code length.

Proof

For the outer code we take an AG \((n_o,M_o,d_o)_{Q_o}\) code as given by Lemma 24, over an alphabet \(\mathcal {Q}_o\) of size \(|\mathcal {Q}_o |=Q_o\ge t^{\beta }\), \(\beta > 2\). Note that \(Q_o\) has to be a prime or a prime power, therefore we can choose \(Q_o\le 2\lceil t^\beta \rceil \), according to Bertrand’s postulate.

Now, the concatenated construction requires an inner code of size \(M_i=Q_o\). According to Proposition 19, we can construct such a code using Algorithm 1, with an expected number of Resample calls less than \(M_i/t=Q_o/t\le 2\lceil t^{\beta }\rceil /t < 2t^{\beta }\). From Lemma 23, we have that \(t< n_o\), and therefore the expected number of Resample calls is less than \(2n_o^{\beta }\), i.e., polynomial in the code length \(n=n_on_i\). Moreover, from Lemma 24, the complexity of constructing the outer code is also polynomial in the code length. Finally, the claim about the rate is straightforward from (54), taking an inner code of rate

$$\begin{aligned} R_i =\frac{1}{6t\ln t}, \end{aligned}$$
(56)

(which is again possible by Proposition 19), and then applying Proposition 25.

For the binary case and \(t\ge 3\), using Proposition 20, we have the following theorem, whose proof is analogous to the previous one.

Theorem 27

Using the variable framework, for \(t\ge 3\), we can construct t-frameproof binary codes of rate

$$\begin{aligned} R=\frac{1}{3t^2(t+1)}(1-o(1)), \end{aligned}$$
(57)

with polynomial complexity in the code length.

5.3 Separable codes and B2 codes

For the case of \(\overline{2}\)-separable codes and \(B_2\) codes we have the following proposition, whose proof is given in the Appendix.

Proposition 28

Using the variable framework, we can construct \(\overline{2}\)-SC \((\displaystyle \left\lfloor \frac{4\log _q M}{3-\log _q(2q-1)} \right\rfloor ,\)\( M)_q \) codes, with an expected number of Resample calls less than \(\frac{M}{9}\), \(\forall M\ge 16\).

For the binary case we have:

Corollary 29

By means of the variable framework, \(\overline{2}\)-SC \((\displaystyle \left\lfloor \frac{4\log _2 M}{3-\log _2 3} \right\rfloor ,M)_2\), binary codes can be constructed. The expected number of Resample calls is less than M/9,  \(\forall M\ge 16\).
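The \(\overline{2}\)-separability property can be verified by brute force for small codes. The sketch below (hypothetical helper names) uses the coordinate-wise value-set characterization of descendant sets; the classic collision \(\textrm{desc}(\{00,11\})=\textrm{desc}(\{01,10\})\) shows why the full binary code of length 2 fails.

```python
from itertools import combinations

def descendant_signature(words):
    """Tuple of coordinate-wise value sets, i.e. desc() column by column."""
    n = len(words[0])
    return tuple(frozenset(w[j] for w in words) for j in range(n))

def is_2bar_separable(code):
    """Brute force: all subsets of size at most 2 must have pairwise
    distinct descendant signatures."""
    subsets = [s for k in (1, 2) for s in combinations(code, k)]
    sigs = [descendant_signature(s) for s in subsets]
    return len(sigs) == len(set(sigs))
```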

Again, as in the case of frameproof codes, Remark 22 applies: since M is exponential in the code length n, so is the expected number of Resample calls. In order to construct codes with polynomial complexity in the code length, we use code concatenation again. For the outer code we have, by Lemma 2, that a t-FP code is a \(\overline{t}\)-SC code, so we can use Lemma 24 to obtain the following result:

Theorem 30

Using the variable framework, we can construct \(\bar{2}\)-\(SC(n,M)_q\) codes, of rate R satisfying

$$\begin{aligned} R+\epsilon = \frac{1}{8} -\frac{\log _q(2q-1)}{24}, \end{aligned}$$
(58)

with polynomial complexity in the code length.

Proof

For the outer code we take an AG \((n_o,M_o,d_o)_{Q_o}\) code as given by Lemma 24, with \(t=2\), over an alphabet of size \(Q_o\ge 2^\beta \), \(\beta \ge 4\). Again, according to Bertrand’s postulate, we can choose \(2^\beta \le Q_o\le 2 \lceil 2^\beta \rceil \). According to (54), the rate \(R_o\) of this code satisfies

$$\begin{aligned} R_o+\epsilon \ge \frac{1}{2}-\frac{1}{2^{\beta /2}-1}\ge \frac{1}{6}, \, \forall \beta \ge 4. \end{aligned}$$
(59)

Now, the concatenated construction requires an inner code of size \(M_i=Q_o\), over an alphabet of size \(q<Q_o\). According to Proposition 28, we can construct such an inner code using Algorithm 1, with an expected number of Resample calls less than \(\displaystyle \frac{2\lceil 2^\beta \rceil }{9}\). From Lemma 23, we have that \(t=2\le n_o-1\), and therefore the expected number of Resample calls is less than \(2n_o^\beta /9\), i.e., polynomial in the code length \(n=n_on_i\). Moreover, as before, from Lemma 24, the complexity of constructing the outer code is also polynomial in the code length. Finally, the claim about the rate is straightforward from (59), if we take an inner code of rate

$$\begin{aligned} R_i= \frac{3-\log _q(2q-1)}{4} , \end{aligned}$$
(60)

(which is again possible by Proposition 28), and then applying Proposition 25.

Corollary 31

We can construct \(\bar{2}\)-\(SC(n,M)_2\) codes, of rate R satisfying

$$\begin{aligned} R+\epsilon = \frac{1}{8} -\frac{\log _2 3}{24}, \quad \forall \epsilon >0, \end{aligned}$$
(61)

with polynomial complexity in the code length.

Observe that in the previous theorem we have also constructed \(B_2(n,M)_2\) codes, since a \(\bar{2}\)-\(SC(n,M)_2\) is a \(B_2(n,M)_2\) code and vice versa, according to Lemma 2.

6 Conclusions

In this paper we have presented constructions for t-frameproof codes, \(\bar{2}\)-separable codes, and \(B_2\) codes, along with lower bounds for the respective code rates. Although the bounds were already known, our proof strategy leads to probabilistic constructions of such codes in polynomial time with respect to the code length.

Frameproof codes first appeared in the work of Boneh and Shaw [1]. The reader familiar with [1] will have noticed that our approach is completely different from the original one. Whereas in [1], in order to obtain asymptotically good codes, a random outer code is concatenated with a structured inner code, we concatenate a structured outer code with an unstructured inner code. The reason for doing so is that, in this way, our codes, as opposed to the codes in [1], are decodable with polynomial complexity in the code length, using algebraic list decoding algorithms (see for instance [24]). Moreover, the complexity of constructing our codes is polynomial in the code length. For these reasons, the two approaches are not directly comparable.

Separable codes were developed in order to make fingerprinting schemes for multimedia contents resistant to averaging collusion attacks [3]. There is a vast amount of work in the literature on both upper and lower bounds. On the other hand, actual constructions are scarce and mostly restricted to codes of short length. We have also presented the first known constructions for the already non-trivial case \(t=2\). Moreover, the codes obtained have asymptotically positive rate. These constructions readily give \(B_2\) codes.