Secure code design against collusion attacks for protecting digital content rights

This paper proposes a group-based collusion-secure code design technique for protecting copyrights on digital content. Collusion-secure codes must make collusion attacks detectable in content carrying inserted forensic marks, which has made them difficult to design and to produce in large numbers. In addition, unique user-specific codes could not be issued, which made these codes hard to apply to actual services. This paper proposes group anti-collusion codes (ACCs), made by introducing the concept of groups to existing balanced incomplete block design (BIBD) matrices. This reduces the complexity of code design and increases the number of collusion-secure codes. For the group ACCs, problems in securing the uniqueness and verification of codes were overcome using set operations. Compared with existing collusion-secure codes, group ACCs reduce code complexity and increase the number of codes available for a given code length.

Research on protection and management technologies designed to solve problems in relation to the production, distribution, and reprocessing of digital content has been conducted intensively to protect content, and demand for these technologies has been gradually increasing in related areas [4,5].
In particular, watermark technology can insert ownership information into digital content and extract that information from illegally reproduced content. This technology can prove original ownership by comparing data; however, it can only identify illegally reproduced content and cannot identify distributors or their routes. On the other hand, forensic mark technology inserts information that combines owner and buyer data into the content. Therefore, the locations of forensic marks can be determined using the inserted information, which varies by content [3,6,12].
Collusion-secure technologies used for forensic marks make content robust against attacks that remove forensic marks or create new ones in order to distribute the content illegally. These technologies include the c-secure codes proposed by Boneh [1], the d-detecting codes proposed by Dittmann [2], and the ACC using balanced incomplete block design (BIBD) matrices, as proposed by Trappe and Wu [9]. However, these collusion-secure technologies are difficult to design, and their codes cannot be created in large quantities.
Collusion-secure technologies, which allow illegal use of content to be detected through reverse combination after the content has been released, suffer from code lengths that increase exponentially as the number of users increases. For these reasons, large numbers of collusion-secure codes could not be produced [7]. This paper proposes a code design algorithm for creating collusion-secure forensic marks to protect copyrights. Following this introduction to the current state of copyright technologies, Chapter 2 discusses collusion attacks and collusion-secure codes. Chapter 3 outlines collusion-secure code designs that prevent attacks on digital content, together with the verification of codes. Finally, Chapter 4 analyzes the utility of the proposed algorithm and describes future study directions and expected effects.
2 Related studies

2.1 Collusion attack

Forensically marked contents have slightly different data values for each buyer because different buyer information is inserted into each copy. For this reason, several buyers may collude to locate the forensically marked positions using the relative differences between their copies, and then hide the identities of the conspirators by erasing the forensic marks or by inserting new forensic marks before redistribution. Attacks of this type are called collusion attacks [10].
Collusion attacks mainly work by reducing correlation coefficient values, because forensic marks are extracted using correlation coefficients. Types of collusion attack include averaging attacks, max/min attacks, negative-correlation attacks, zero-correlation attacks, and mosaic attacks. Figure 1 is a concept map of forensic marks and collusion attacks.

Averaging attack
Averaging attacks average the data values at the same position across several forensically marked copies to create new forensically marked content.
In Expression 1, $\hat{e}_j$ is the content obtained through an averaging attack on K pieces of forensically marked content. In the same expression, $p_j$ is the coefficient value of the forensically marked content, and $w_{i,j}$ and $\widehat{w}_j$ represent the forensic mark information inserted as a watermark and that changed by the averaging attack, respectively.

Fig. 1 System structure

In the content created by the averaging attack of Expression 1, the signal of each forensic mark $w_{i,j}$ is reduced by a factor of $1/K$. Expression 2 gives the correlation coefficient of the forensic marks of content colluded by an averaging attack. From Expression 2, it can be seen that the correlation coefficient value of colluded content decreases proportionally to $\sqrt{K}$.
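The averaging attack described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's Expressions 1 and 2: the marks are assumed to be random ±1 sequences added to the host coefficients, the detector is assumed to know the host (non-blind detection), and the names `embed`, `averaging_attack`, and `detection_statistic` are hypothetical.

```python
import random


def embed(host, mark):
    """Add a forensic mark to the host coefficients (assumed additive embedding)."""
    return [p + w for p, w in zip(host, mark)]


def averaging_attack(copies):
    """Average the K marked copies position by position."""
    k = len(copies)
    return [sum(values) / k for values in zip(*copies)]


def detection_statistic(suspect, host, mark):
    """Inner product of the residual (suspect - host) with one mark, divided by the length."""
    residual = [s - p for s, p in zip(suspect, host)]
    return sum(r * w for r, w in zip(residual, mark)) / len(mark)


rng = random.Random(0)   # fixed seed so the run is repeatable
n, k = 10_000, 4         # coefficients per copy, number of colluders
host = [rng.gauss(0.0, 10.0) for _ in range(n)]
marks = [[rng.choice((-1.0, 1.0)) for _ in range(n)] for _ in range(k)]
copies = [embed(host, m) for m in marks]

colluded = averaging_attack(copies)
single = detection_statistic(copies[0], host, marks[0])   # one buyer, no collusion
after = detection_statistic(colluded, host, marks[0])     # after K-way averaging
print(f"without collusion: {single:.3f}, after averaging {k} copies: {after:.3f}")
```

With independent marks, the statistic for any one colluder drops from about 1 to about 1/K, which mirrors the statement that each mark signal is reduced by 1/K.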

Max/min attack
This is an attack method proposed by Stone that obtains the minimum and maximum values from the forensically marked contents that have participated in collusion. It creates new content using the average of the minimum and maximum values [8].
In Expression 3, $\max_{i=1}^{K}(w_{i,j})$ indicates the maximum value among the coefficients of the K pieces of forensically marked content that have participated in collusion, and $\min_{i=1}^{K}(w_{i,j})$ indicates the minimum value. From Expression 4, it can be seen that the correlation coefficient of the forensic mark information $w_{i,j}$ of the content created by Expression 3 decreases proportionally to $\sqrt{K}$.
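As a minimal sketch of the max/min attack described above, the following Python function takes the midpoint of the largest and smallest value at each position. The function name `minmax_attack` and the toy coefficient values are illustrative assumptions.

```python
def minmax_attack(copies):
    """For each position, output the average of the maximum and minimum values."""
    return [(max(values) + min(values)) / 2 for values in zip(*copies)]


# Three colluding copies with three coefficients each (toy values).
copies = [
    [10.0, 21.0, 29.0],
    [12.0, 19.0, 31.0],
    [11.0, 19.0, 30.0],
]
print(minmax_attack(copies))  # [11.0, 20.0, 30.0]
```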

Negative-correlation attack
Negative-correlation attacks, proposed by Stone [8], drive the correlation coefficients to negative values so that forensic marks are difficult to extract when extraction relies on correlation coefficients. Colluded content is produced as shown in Expression 5.
In Expression 5, max(⋅), min(⋅), and med(⋅) represent the maximum value, the minimum value, and the median value, respectively. α is a coefficient used to adjust the max(⋅) and min(⋅) values, and it generally has a value of 0.5. The maximum, minimum, and median values are obtained from the contents that have participated in collusion. If the average of the maximum and minimum values is smaller than the median value, the minimum value will be taken. If not, the maximum value will be taken. This will reverse the polarity of the forensic mark information to turn the value of the correlation coefficient into a negative number.
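The selection rule described above can be sketched as follows. Expression 5 itself is not reproduced here, so the way `alpha` enters the threshold (as `alpha * (max + min)`, which equals the average when `alpha` is 0.5) is an assumption, as is the function name.

```python
from statistics import median


def negative_correlation_attack(copies, alpha=0.5):
    """Pick the minimum when the weighted max/min value is below the median, else the maximum."""
    out = []
    for values in zip(*copies):
        hi, lo, med = max(values), min(values), median(values)
        threshold = alpha * (hi + lo)  # assumption: alpha = 0.5 gives the max/min average
        out.append(lo if threshold < med else hi)
    return out


# Position 0: most copies are high (median 9 > 5.5), so the minimum is chosen.
# Position 1: most copies are low (median 2 < 5.5), so the maximum is chosen.
copies = [
    [1.0, 1.0],
    [9.0, 2.0],
    [10.0, 10.0],
]
print(negative_correlation_attack(copies))  # [1.0, 10.0]
```

In both positions the output sits on the opposite side from the majority of the colluding copies, which is the polarity reversal the text describes.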

Zero-correlation attack
Although Stone's negative-correlation attacks induce correlation coefficients into negative numbers, this does not mean that the forensic mark information has been erased. Zero-correlation attacks, proposed by Wahadaniah et al. [10], induce correlation coefficients to become close to zero so that forensic mark information cannot be detected. Expression 6 is a method of inducing correlation coefficients to become close to zero.
Here, $w_{i,j}$ represents a content item that participated in collusion and serves as the targeted content. Unlike negative-correlation attacks, these attacks compare the maximum and minimum values with the forensic mark information of a certain content item that participated in collusion, rather than with the median value, to create collusion content with a polarity opposite to that of the targeted content. The created collusion content is not correlated with other forensically marked content, either. That is, correlation coefficients are maintained close to zero.

Mosaic attack
Unlike the abovementioned attacks, which make the correlation coefficient values small using the maximum and minimum values of the content that participated in collusion, this attack divides forensically marked content into many small geometric figures to create new content. Because forensic mark information is extracted from whole files, it cannot easily be extracted from content divided into many pieces.
To insert forensic mark information that is strong against mosaic attacks, the forensic mark information should be minimized, insertion areas should be in the smallest units, and the information should be inserted repeatedly so that the forensic mark information can never be removed after any form of mosaic attack.

Collusion-secure code
In general, forensic marks are made using random permutations so that those for individual buyers are not correlated with each other, providing some robustness against collusion attacks. However, there is a problem in that as the number of conspirators increases, the number of necessary codes increases exponentially. To solve this problem, codes that share common parts at positions that differ by buyer are being designed and studied.

C-secure code
Boneh proposed c-secure codes that are robust against collusion attacks using redundancy among codes [1]. This technique defines the codes to be assigned to different buyers as (l, n)-codes consisting of n words of length l.
In Expression 7, $\Sigma^l$ represents words of length l, and Γ represents the set of marks that are to be inserted as forensic marks. That is, word $w^{(i)}$ is assigned to each buyer. The technique proposed by Boneh limited the range of collusion by presenting an insertion assumption: colluders can detect inserted marks only at positions where their marks differ from each other, and marks that are not detected cannot be changed without causing damage to the content. Therefore, according to this assumption, the set (F) that may be colluded is as shown by Expression 8.
In this expression, C represents the buyer collusion, u represents the indices of the conspirators that participated in the collusion, and R represents the positions of marks that have not been detected. {?} represents the values of marks at positions not detected. That is, based on the insertion assumption, the sets that may be colluded include all new collusion codes that necessarily keep the values that have not been detected because they overlap among the codes.
To eliminate cases where collusion-created codes include the codes of buyers that did not participate in collusion, Boneh defines c-frameproof codes as those that satisfy F(W) ∩ Γ = W for all forensic mark sets W ⊂ Γ. For instance, (n, n)-codes are c-frameproof codes. If $\Gamma_0(n)$ is assumed to be the n-bit binary codes that contain exactly one '1', the $\Gamma_0(3)$ codes for three buyers are 100, 010, and 001. The codes that may be colluded and can be created from the three codes again become $\Gamma_0(3)$ codes, according to the definition of sets that may be colluded. Therefore, the three conspirators become unable to put the blame on other buyers using new, different codes created by collusion.

Based on these assumptions, Boneh proposed c-secure codes and a tracking algorithm that can be used to trace participating conspirators. The $\Gamma_0(4, 3)$ codes for four buyers are as follows, where four represents the number of codes and three represents the number of repetitions.

buyer1: 111111111
buyer2: 000111111
buyer3: 000000111
buyer4: 000000000

Each of the assigned codes is randomly recombined and inserted into the content. When the recombined codes have been extracted, the originally inserted codes are found through reverse combination. Once collusion has been carried out, conspirators can be tracked based on the number of '1' bits at each position. Although Boneh's method creates codes that are robust against collusion attacks in environments with limited numbers of conspirators, it is not suitable for content of limited size: the codes are simple enough to be guessed easily, and the necessary code length increases exponentially as the number of conspirators increases.
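The staircase pattern of the four buyer codes listed above, and the insertion assumption that colluders can only spot positions where their copies differ, can be illustrated with a short sketch. The generation rule in `gamma0` is inferred from the four listed codewords, and both function names are hypothetical.

```python
def gamma0(n, d):
    """Codewords for n buyers: buyer i gets (i - 1) * d zeros followed by ones."""
    length = (n - 1) * d
    return ["0" * ((i - 1) * d) + "1" * (length - (i - 1) * d) for i in range(1, n + 1)]


def undetectable_positions(codewords):
    """0-based positions where all colluders hold the same bit, so comparison reveals nothing."""
    return [j for j, bits in enumerate(zip(*codewords)) if len(set(bits)) == 1]


codes = gamma0(4, 3)
for buyer, word in enumerate(codes, start=1):
    print(f"buyer{buyer}: {word}")

# Buyers 2 and 3 compare their copies: only the middle block of three bits differs.
print(undetectable_positions([codes[1], codes[2]]))  # [0, 1, 2, 6, 7, 8]
```

Under the insertion assumption, the colluders can alter only the three middle positions, so the unchanged outer blocks still narrow down who took part.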

D-detecting code
As another method of creating collusion-secure codes, there are the d-detecting codes proposed by Dittmann [2]. To detect two conspirators, these codes compose forensic marks based on finite projective geometry. Figure 2 shows forensic marks composed in a Fano plane, which is a representative finite projective geometry. As shown in the figure, a Fano plane is composed of points and lines. A Fano plane composed of seven points and lines can create four forensic marks. That is, as shown in the figure, the three lines and the line that forms the circle in the center become forensic marks robust enough to protect against collusion attacks [9].
Code vectors indicate the bit values at the positions of the points that constitute a line as '1' and those of the other points as '0'. For instance, when the code of Forensic Mark 1 has been assigned to User 1, the code vector indicates only the position bits of points 1, 2, and 3 as '1' and the remaining four bits as '0'. That is, the code vector becomes {1110000}, and this becomes the forensic mark of User 1. In the same way, the forensic mark vector of User 2 becomes {1000110} and that of User 3 becomes {0011100}. As shown in Fig. 2, forensic marks composed in a Fano plane always intersect at one point. Two conspirators can be tracked based on the position of this point of intersection.
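The intersection property can be checked directly on the three code vectors quoted above. The sketch below is illustrative only; `common_points` and `trace_pair` are hypothetical helper names, and only the three users named in the text are included.

```python
MARKS = {"User 1": "1110000", "User 2": "1000110", "User 3": "0011100"}


def common_points(a, b):
    """1-based positions where both code vectors carry a '1'."""
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x == y == "1"]


def trace_pair(point):
    """Return the pair of users whose code vectors intersect exactly at `point`."""
    users = sorted(MARKS)
    for i, u in enumerate(users):
        for v in users[i + 1:]:
            if common_points(MARKS[u], MARKS[v]) == [point]:
                return (u, v)
    return None


print(common_points(MARKS["User 1"], MARKS["User 2"]))  # [1]
print(trace_pair(5))  # ('User 2', 'User 3')
```

Each pair of users shares exactly one point, and that point differs for every pair, which is what allows two conspirators to be traced.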
Although Dittmann's 2-detecting codes show some robustness against collusion attacks by two conspirators by creating commonality between codes at different positions, they have weaknesses in that the number of detectable conspirators is too limited and the codes are so simple that attackers can easily guess them. In addition, applying the code-creating method based on finite projective geometry to environments with more than two conspirators is impracticable, because the length of codes increases as an infinite series when the number of colluding users increases. Furthermore, composing finite projective geometry in Internet environments, where many users must be supported, is very difficult.

Anti-Collusion Code (ACC)
Trappe and Wu proposed ACCs for multimedia data using BIBD to enable detecting at least one conspirator [9,11]. ACCs constitute code vectors that enable detecting k conspirators from the positions of bit value '1' in the bit string obtained by applying the logical AND operator to the forensic mark vectors $C = \{c_1, c_2, \ldots, c_n\}$ of the k conspirators. The basic idea is to create unique codes for individual conspirators and track the conspirators based on the positions of bit value '1' in the codes. To design these codes, BIBD matrices are first used. Individual columns of BIBD matrices maintain unique bit patterns; when some column vectors are overlapped using logical AND operations, the positions that have bit value '1' are unique. The original owners of the colluded content can be identified from these unique positions. Each BIBD is described by five parameters (v, b, r, k, λ). The five parameters satisfy the relations under Expressions 9 through 12, and a BIBD code of size v × b is an incidence matrix M, of which the internal values are determined by Expression 13.
The column vectors of M become the forensic marks given to b users, and they can be used as collusion-secure codes. If the bit sets of the column vectors of M are assigned to individual users, forensic marks that retain '1' at unique positions under logical AND operations can be made from up to (k-1) column vectors. For instance, the following matrix M shows a set of forensic marks using (7, 4, 1)-BIBD.
This code set can create and distribute 7-bit forensic marks to seven users and can identify the conspirators in collusion attacks on any two code vectors. If an averaging attack has been attempted on contents in which two forensic marks have been inserted, the forensic marks can be extracted from the colluded contents, and the conspirators can be identified by finding the code vectors that have a value of '1' at the same positions as the extracted bit string, using Expression 14. For instance, the first two columns in the above matrix were inserted into the contents bought by User 1 and User 2, respectively.
If an averaging attack on these two codes is modeled using logical AND operations, the extracted code vector becomes $(-1, 0, 0, 0, 1, 0, 1)$, from $-s_1 + 0 + 0 + 0 + s_5 + 0 + s_7$. Here, User 1 and User 2 can be identified as the conspirators from the fifth and seventh bit values of '1'. That is, only the code vectors of User 1 and User 2 have '1' as the value of both the fifth and seventh bits. However, the method using BIBD matrices is not suitable for the current environment either, because codes are difficult to compose and the code length increases as the number of conspirators increases.
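The two-user example above can be reproduced with a short sketch. The four code vectors are the ones quoted in Example 1 later in the text; treating them as columns of M is an assumption, although the first two do give exactly the extracted vector quoted above. The mapping of bits to ±1 before averaging and all function names are also assumptions made for illustration.

```python
# Code vectors x1..x4 quoted in Example 1 of the text (assumed to be columns of M).
CODES = {1: "0010111", 2: "0101101", 3: "0111010", 4: "1001011"}


def to_antipodal(code):
    """Map bit '1' to +1 and bit '0' to -1."""
    return [1 if bit == "1" else -1 for bit in code]


def average_collusion(users):
    """Average the antipodal code vectors of the colluding users."""
    vectors = [to_antipodal(CODES[u]) for u in users]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def surviving_ones(averaged):
    """1-based positions that are still +1 after averaging (all colluders had '1')."""
    return [i + 1 for i, value in enumerate(averaged) if value == 1]


def trace(averaged):
    """Users whose code vector has '1' at every surviving position."""
    ones = surviving_ones(averaged)
    return [u for u, code in CODES.items() if all(code[p - 1] == "1" for p in ones)]


averaged = average_collusion([1, 2])
print(averaged)                  # [-1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0]
print(surviving_ones(averaged))  # [5, 7]
print(trace(averaged))           # [1, 2]
```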
3 Secure code design using group

3.1 Group ACC design
Collusion-secure forensic marks make it possible to extract the data of at least two conspirators from contents that were colluded and illegally distributed. However, these technologies suffer from impractical code lengths, difficulty in securing large quantities of codes, and difficulty in composing codes. Therefore, they cannot be applied to systems in Internet environments, where the number of users is not known in advance.
To solve these problems, diverse algorithms have been proposed, including some that expand existing BIBD matrices and some that use polynomial expressions. However, these proposals cannot be applied to systems that support unspecified numbers of users. Therefore, this paper proposes a technique that composes group-based forensic mark sets using existing BIBD matrices, thereby improving efficiency.
The basic idea for extracting the data of at least one conspirator is to create unique codes and trace the conspirators based on the locations of the bit value '1' in the codes. To design these codes, the BIBD matrices from the paper written by Trappe [11] were used. Each row of the BIBD matrix maintains a unique bit pattern. When logical AND operations are performed by overlapping row vectors in a matrix, the location that has the bit value '1' is unique. BIBD matrices are used to enable identification of the original owner of colluded contents. However, BIBD matrices have a drawback in that only seven collusion-secure codes can be created using matrix M as conspirator-tracing codes. In addition, relatively complicated operations are required to compose collusion-secure codes.
In this paper, the concept of groups was introduced so that a collusion-secure code set W having seven elements can be made using the row vectors of matrix M. The existing (7,4,1) BIBD matrix is used to make groups named after the elements of set W, and seven member codes are assigned per group, for a total of 49 collusion-secure codes. Given set W and set CW, a copy of W, let b be an element of W and cb an element of CW. The combined set C is obtained by applying the concatenation function CAT(x, y) = x||y to each element p = (b, cb) of the product set P = W × CW. Therefore, sets can be created by grouping the ACC using the elements of W and CW, as shown in Table 1.
However, this composition causes the problem shown in Example 1.

$x_1 \cap x_2 \,\|\, y_3 \cap y_4 = 0010111 \cap 0101101 \,\|\, 0111010 \cap 1001011 = 0000101 \,\|\, 0001010$

In the resultant value, the 5th and 7th bits of the group part are '1', and the elements of set W that have '1' at both positions are the 1st and 2nd elements. Therefore, the colluded groups are Groups 1 and 2. Likewise, the 4th and 6th bits of the member part are '1', which correspond to the 3rd and 4th elements of set W, so the conspirators are Members 3 and 4. However, in this example the conspirators cannot be identified, because it is unknown which member belongs to which group, and this violates the collusion resistance that forensic mark technologies require.
To solve the problem as shown by the results of Example 1, it is assumed that an AND attack has been made on the element pair in a set, as shown by the following Expression 15.

$\{A \| B,\; C \| D\}, \quad \{A \| D,\; C \| B\} \qquad (15)$
The resultant value from AND operations is shown by the following Expression 16.
In this case, the commutative law of Boolean algebra creates the same resultant values for both pairs. Therefore, a technique to prevent this is necessary. In this study, the problem was solved by adding verification codes. As principles for the verification codes, the theorems and laws of Boolean algebra for logical operations in Table 2 were used. Although the associative and commutative laws hold by the definition of Boolean algebra, in the case of AND attacks and OR attacks their effect can be prevented by using XOR operations to form the verification bits. In the case of XOR attacks, verification bits can be added using AND operations or OR operations. For instance, according to the resultant values of Expressions 17 and 18, if AND-operation verification bits are added against AND-operation attacks, the verification bits will not distinguish the pairs because of the commutative law.
However, in the case of AND attacks, if XOR operations are used to create the verification values, the pairs are distinguished by the resultant values of Expressions 19 and 20.
On the contrary, in the case of XOR attacks, AND or OR operations can be used for the verification values, because of Expressions 21 and 22.
Verification bits can be added to the individual codes of Example 1 as follows. The collusion code that can be obtained from the first subset is as follows.

0000101 0001010 0100100 0001001

The collusion code that can be obtained from the second subset is as follows.

0000101 0001010 0010100 0000000

Because the collusion code created from the first subset is identical to the collusion code created by the members, the conspirators can be detected.
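The effect of the XOR verification block under an AND attack can be reproduced with the code vectors of Example 1. The sketch below covers only the first three 7-bit blocks of each collusion code (group part, member part, XOR verification part); the fourth block is left out, and the helper names are hypothetical.

```python
X = {1: "0010111", 2: "0101101"}  # group codes x1, x2 from Example 1
Y = {3: "0111010", 4: "1001011"}  # member codes y3, y4 from Example 1


def and_bits(a, b):
    """Bitwise AND of two equal-length bit strings."""
    return "".join("1" if p == q == "1" else "0" for p, q in zip(a, b))


def xor_bits(a, b):
    """Bitwise XOR of two equal-length bit strings."""
    return "".join("1" if p != q else "0" for p, q in zip(a, b))


def with_check(x, y):
    """Blocks of one code: group part, member part, XOR verification part."""
    return (x, y, xor_bits(x, y))


def and_collusion(code_a, code_b):
    """AND attack applied block by block."""
    return tuple(and_bits(p, q) for p, q in zip(code_a, code_b))


first = and_collusion(with_check(X[1], Y[3]), with_check(X[2], Y[4]))
second = and_collusion(with_check(X[1], Y[4]), with_check(X[2], Y[3]))
print(first)   # ('0000101', '0001010', '0100100')
print(second)  # ('0000101', '0001010', '0010100')
```

The first two blocks are identical for both pairings, which is the ambiguity of Example 1, while the XOR verification block differs and so separates the two cases.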
The results in Example 2 follow the theorem of Boolean algebra and were obtained using the conditions specifying that associative and commutative laws are valid in OR and AND operations, though invalid in XOR operations. This can be generalized as per Theorem 1.

Theorem 1: Group ACC set theorem
CAT(x, y) = x ∥ y is a concatenation function
XOR(x, y) = x ⨁ y is an XOR operation function
AND(x, y) = x ∧ y is an AND operation function
W = {x_1, x_2, …, x_n} is a collusion-secure code set
CW = {y_1, y_2, …, y_n} is a copied collusion-secure code set
P = W × CW = {(a, b) | a ∊ W, b ∊ CW} is the product set of set W and set CW
Set CXOR and set CAND are obtained using the given functions and sets.
CXOR = {XOR(a, b) | a ∊ W, b ∊ W}
CAND = {AND(a, b) | a ∊ W, b ∊ W}
A group-based copy prevention code set, group ACC, can be obtained by substituting the individual obtained sets into the CAT function.
group ACC = {CAT(s, q, r) | s ∊ P, q ∊ CXOR, r ∊ CAND}

A set with 49 collusion-secure codes can be created by obtaining seven collusion-secure codes from the existing BIBD matrix and applying Theorem 1. In addition, the concept of groups was introduced to the seven codes to obtain seven groups and collusion-secure code sets with seven members per group. Therefore, if group-based collusion-secure code sets are used, the number of collusion-secure codes grows as the square of the original number. Using n collusion-secure codes, n × n codes can be made with the proposed technique, whereas the number of collusion-secure codes that can be obtained from a bit length of 4n using existing techniques is 4n.
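The construction in Theorem 1 can be sketched as follows. Two points are assumptions rather than statements from the paper: the verification blocks q and r are taken to be the XOR and AND of the same pair (a, b) that forms s, and only the first four code vectors are quoted in the text (Example 1), so the last three entries of `W` are filled in here so that every pair of codes still shares exactly two '1' bits. Function names are hypothetical.

```python
from itertools import combinations, product

# x1..x4 are quoted in Example 1; the last three are an ASSUMPTION (not given in the text).
W = ["0010111", "0101101", "0111010", "1001011", "1011100", "1100110", "1110001"]
CW = list(W)  # copied collusion-secure code set


def and_bits(a, b):
    """Bitwise AND of two equal-length bit strings."""
    return "".join("1" if p == q == "1" else "0" for p, q in zip(a, b))


def xor_bits(a, b):
    """Bitwise XOR of two equal-length bit strings."""
    return "".join("1" if p != q else "0" for p, q in zip(a, b))


def build_group_acc(w, cw):
    """Map (group, member) to x || y || XOR(x, y) || AND(x, y)."""
    codes = {}
    for (g, x), (m, y) in product(enumerate(w, 1), enumerate(cw, 1)):
        codes[(g, m)] = x + y + xor_bits(x, y) + and_bits(x, y)
    return codes


group_acc = build_group_acc(W, CW)
print(len(group_acc))          # 49 codes from 7 base codes
print(len(group_acc[(1, 3)]))  # 28 bits, four times the base length of 7
print(group_acc[(1, 3)])       # 0010111011101001011010010010
```

The 49 codes are distinct because their first two blocks already differ, and each is 4n = 28 bits long, which matches the fourfold length increase discussed in the performance evaluation.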

2-Detecting using group ACCs
Colluders using group ACCs can be detected by tracing the code creation of Theorem 1 in reverse. The detecting algorithm is as follows. For instance, if it is assumed that User 1 and User 2 have been provided with Group 2-Member 3 codes and Group 3-Member 4 codes, respectively, and User 1 and User 2 made a collusion attack using AND operations, a collusion code will be produced as shown in Example 3. If conspirators are traced using the algorithm, Group 2 and Group 3 that have '1' as the fourth and sixth values will be identified in the case of groups, and Member 3 and Member 4 that have '1' as the second and fourth values will be identified in the case of members. However, two cases of collusion may occur as follows because Member 3 and Member 4 may be in Group 2 or Group 3. If the collusion code in Example 3 is compared to the collusion codes in Case 1 and Case 2, Case 1 will be found to be identical to Example 3, because the third 7 bits are checked for AND operation attacks. Therefore, the fact that Group 2-Member 3 and Group 3-Member 4 colluded with each other, as shown in Case 1, can be identified, and this indicates that AND operation attacks are accurately traced.
In the case of XOR attacks, a collusion code as shown in Example 4 may be created. If conspirators are traced using the collusion code, Group 2 and Group 3 that have '0' as the fourth and sixth values will be identified in the case of groups, and Member 3 and Member 4 that have '0' as the second and fourth values will be identified in the case of members. However, two cases of collusion may occur, as follows, because Member 3 and Member 4 may be in Group 2 or Group 3.
If the collusion code in Example 4 is compared to the collusion codes in Case 3 and Case 4, Case 3 will be found to be identical to Example 4, because the third 7 bits are checked for XOR operation attacks. Therefore, the fact that Group 2-Member 3 and Group 3-Member 4 colluded with each other, as shown in Case 3, can be identified, and this indicates that XOR attacks are also accurately traced.
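The tracing steps for an AND attack can be sketched as below. This is not the paper's detecting algorithm, which is not reproduced in the text; it is a hedged illustration that uses the four code vectors quoted in Example 1, keeps only the XOR verification block, and handles only the situation discussed above (two colluders from different groups with different member numbers). All function names are hypothetical.

```python
from itertools import combinations

CODES = {1: "0010111", 2: "0101101", 3: "0111010", 4: "1001011"}  # x1..x4 from Example 1


def and_bits(a, b):
    """Bitwise AND of two equal-length bit strings."""
    return "".join("1" if p == q == "1" else "0" for p, q in zip(a, b))


def xor_bits(a, b):
    """Bitwise XOR of two equal-length bit strings."""
    return "".join("1" if p != q else "0" for p, q in zip(a, b))


def issue(group, member):
    """Code for one user: group part, member part, XOR verification part."""
    x, y = CODES[group], CODES[member]
    return (x, y, xor_bits(x, y))


def and_attack(a, b):
    """AND attack applied block by block."""
    return tuple(and_bits(p, q) for p, q in zip(a, b))


def find_pair(block):
    """Pair of code indices whose AND equals the block, or None."""
    for i, j in combinations(CODES, 2):
        if and_bits(CODES[i], CODES[j]) == block:
            return (i, j)
    return None


def trace(colluded):
    """Identify the two (group, member) colluders, using the third block to pick the case."""
    groups, members = find_pair(colluded[0]), find_pair(colluded[1])
    if groups is None or members is None:
        return None
    (g1, g2), (m1, m2) = groups, members
    for case in (((g1, m1), (g2, m2)), ((g1, m2), (g2, m1))):
        if and_attack(issue(*case[0]), issue(*case[1])) == colluded:
            return case
    return None


colluded = and_attack(issue(2, 3), issue(3, 4))
print(trace(colluded))  # ((2, 3), (3, 4))
```

The first two blocks narrow the suspects to Groups 2 and 3 and Members 3 and 4, and the comparison on the third block selects between the two possible pairings.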

Performance evaluation
The algorithm proposed in this paper uses group ACCs, and it introduces the concept of grouping to create a number of codes equal to the square of the number of existing codes. Using group ACCs relieves the complexity of existing ACC code compositions using BIBD and improves the efficiency of codes. The efficiency of codes can be expressed with Expression 23 below.
Here, v is the number of bits necessary to represent the code of one user, and b is the total number of users. The efficiency of forensic marks means the number of users that can be accommodated by each code vector. The number and efficiency of forensic marks using BIBD and those of forensic marks using group ACC, which is the method presented in this paper, can be compared, as shown in Fig. 3. In Fig. 3, the x-axis shows the number of code vectors that represent all users and the y-axis shows the number of user codes that can be represented by the number of code vectors. When the number of code vectors is seven, where BIBD can create seven codes, the proposed system can create 49 codes.
When the number of code vectors is 33, where BIBD can create 33 codes, the proposed system can create 1,089 codes. Likewise, when the number of code vectors is 72, where BIBD can create 72 codes, the proposed system can create 5,184 codes. Therefore, although the numbers of codes that can be created differ little when the number of code vectors is small, the difference grows as the number of code vectors increases, because the number of codes that can be created by the proposed system increases as the square. Because the group ACC algorithm of the proposed system creates forensic marks using the existing BIBD, the number of collusion-secure forensic marks becomes the square of b, which is the number of collusion-secure forensic marks that can be created by the existing BIBD. The code efficiency of the algorithm proposed in this paper can be expressed by Expression 24.

Fig. 3 Comparison of the numbers of collusion-secure codes
Fig. 4 The efficiency of collusion-secure codes
The code efficiency of the proposed system can be identified through Fig. 4. The x-axis shows the number of code vectors and the y-axis shows the efficiency. When the number of vectors is 72, where BIBD's efficiency is 100 %, the proposed system's efficiency is much higher, at 7,200 %.
However, the proposed algorithm has a weakness in that, although the number of elements v remains the same, the length of forensic marks increases fourfold because group information and verification values for AND and XOR attacks are added. The efficiency of collusion-secure codes is calculated from the number of code vectors in relation to code lengths, as shown in Fig. 5.
As can be seen in Fig. 5, when code lengths are below four, the efficiency of the existing BIBD is higher, and when code lengths are four or larger, the efficiency of the proposed algorithm is higher. This indicates that, although code lengths increase in the case of the proposed algorithm, the efficiency of collusion-secure code composition increases.
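The figures quoted above can be checked with a small calculation. Expressions 23 and 24 are not reproduced in the text, so the efficiency formula used here (number of codes divided by number of bits, as a percentage) is inferred from the description and from the 100 % and 7,200 % values; the length-adjusted variant divides by the fourfold code length mentioned above. Function names are hypothetical.

```python
def bibd_codes(n):
    """Codes available from n base code vectors with the existing BIBD approach."""
    return n


def group_acc_codes(n):
    """Codes available from n base code vectors with group ACCs (n groups of n members)."""
    return n * n


def efficiency(num_codes, bits):
    """Assumed efficiency measure: codes per bit, expressed as a percentage."""
    return 100.0 * num_codes / bits


for n in (7, 33, 72):
    print(
        n,
        bibd_codes(n),
        group_acc_codes(n),
        efficiency(bibd_codes(n), n),            # existing BIBD
        efficiency(group_acc_codes(n), n),       # proposed, per base vector length
        efficiency(group_acc_codes(n), 4 * n),   # proposed, per actual 4n-bit code
    )
```

Under this reading, the length-adjusted efficiency of the proposed scheme is n/4 of the BIBD value, so it breaks even at n = 4, which agrees with the crossover described for Fig. 5.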

Conclusion
In this paper, a collusion-secure forensic mark design technique that introduces the concept of groups was proposed for the protection of digital contents. The proposed group ACC algorithm, which is more expandable than existing collusion-secure forensic mark technologies, expands sets whose elements are the column vectors of given BIBD matrices. It creates group-based collusion-secure code sets, separately creates verification sets to resolve the two ambiguous cases that arise when group-based collusion-secure codes are introduced, and obtains group ACC sets from the product and verification sets. Each element of a group ACC set becomes a forensic mark. This relieves the difficulty of composing collusion-secure codes and solves the problem related to the number of collusion-secure codes, so that codes applicable to diverse content protection environments can be created. Using group ACCs was found to be more efficient than existing methods in terms of the number of codes and code composition.
Further studies are considered necessary on the construction of integrated forensic mark infrastructures that will enable the extraction of forensic marks by insertion algorithms.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Youngmo Kim received his Ph.D. degree in Computer Engineering from Daejeon University, Daejeon, Korea in 2011. He is currently an adjunct professor at Soongsil University and a senior researcher at the Korea Copyright Commission. He is also working on several standardization activities and national projects. His research interests are security, computer forensics, DRM (Digital Rights Management), and fingerprinting.