Secure Primitive for Big Data Utilization

In this chapter, we describe two security primitives for big data utilization. The first is privacy-preserving data integration among databases distributed across different organizations. This primitive integrates the records that the databases have in common while keeping every record held by only one organization secret from the other organizations. The second is privacy-preserving classification. This primitive applies the server's classification rule to the client's input database and outputs only the result to the client, while keeping the client's input database secret from the server and the server's classification rule secret from the client. These primitives can be executed not only independently but also jointly. That is, after integrating databases from distributed organizations by executing the privacy-preserving data integration, we can execute the privacy-preserving classification.

occurrence of similar accidents can be attributed to a particular defective product. Such defective products should be identified as quickly as possible. However, the databases related to accidents are maintained separately by different organizations. Thus, investigating the causes of accidents is often time-consuming. For example, assume child A has broken a leg at school, but it is not clear whether the accident was caused by defective equipment. In this case, information relating to A's injury, such as the patient's name and type of injury, is stored in hospital database $S_1$. Information pertaining to A's accident, such as the child's name and the location of the swing at the school, is stored in database $S_2$, which is held by the fire department. Finally, information relating to the insurance claim following A's accident, such as the name and medical costs, is maintained in the insurance company's database, $S_3$. Computing the intersection of these databases, $S_1 \cap S_2 \cap S_3$, without compromising privacy would enable us to combine the separate sets of information, which may allow the cause of the accident to be identified. Let us consider another situation. Several clinics, denoted as $P_i$, maintain separate databases, represented as $S_i$. The clinics wish to know the patients they have in common to enable them to share treatment details; however, $P_i$ should not be able to access any information about patients not stored in its own dataset. In this case, computing the intersection of the sets must not reveal private information.
These examples illustrate the need for the Multiparty Private Set Intersection (MPSI) protocol [1][2][3][4]. MPSI is executed by multiple parties who jointly compute the intersection of their private datasets. Ultimately, only designated parties can access this intersection. Previous protocols are impractical because the bulk of the computation depends on the number of players. One previous study required the size of the datasets maintained by the different players to be equal [1,2]. Another study [3] computed only the approximate number of intersections, whereas other researchers [4] required more than two trusted third-parties.
In this section, we propose a practical MPSI with the following features:
1. The size of the dataset maintained by each party is independent of those maintained by the other parties.
2. The computational complexity for each party is independent of the number of parties.
The second feature is accomplished by introducing an outsourcing provider, O. All computations that depend on the number of parties are carried out by O, so the cost for each party does not depend on the number of parties.

Preliminaries
In this section, we summarize the DDH assumption, the Bloom filter, and ElGamal encryption. We consider security in the honest-but-curious model [5]: all players act according to their prescribed actions in the protocol. A protocol that is secure in the honest-but-curious model does not allow any player to gain information about other players' private input sets beyond what can be deduced from the result of the protocol. Note that the term adversary here refers to insiders, i.e., protocol participants. Outsider adversaries are not considered, because their behavior can be mitigated via standard network security techniques.
A Bloom filter [6], denoted by BF, is a space-efficient probabilistic data structure consisting of an array of m bits. A BF encodes a set S of at most w elements and can check whether an element x is included in S. The encoded Bloom filter of S is denoted by BF(S).
The BF uses a set of k independent uniform hash functions $H = \{H_0, \ldots, H_{k-1}\}$, where $H_i : \{0,1\}^* \to \{0, 1, \ldots, m-1\}$ for $0 \le i \le k-1$. The BF consists of two functions: Const embeds a given set S into BF(S), and ElementCheck checks whether an element x is included in S. SetCheck, an extension of ElementCheck, checks whether each element x in a set $S'$ is in $S \cap S'$ (see Algorithm 3.3). In Const (see Algorithm 3.1), BF(S) is constructed for a given set S by first setting all bits in the array to 0. To embed an element $x \in S$ into the filter, the element is hashed using the k hash functions to obtain k index numbers, and the bits at these indexes are set to 1, i.e., set $BF[H_i(x)] = 1$ for $0 \le i \le k-1$. In ElementCheck (see Algorithm 3.2), we check all locations where x is hashed; x is considered to be not in S if any bit at these locations is 0; otherwise, x is probably in S.
Some false positive matches may occur, i.e., it is possible that all $BF[H_i(y)]$ are set to 1, but y is not in S. The false positive rate FPR is given by $\mathrm{FPR} = \left(1 - \left(1 - \frac{1}{m}\right)^{kw}\right)^{k} \approx \left(1 - e^{-kw/m}\right)^{k}$ [7]. However, false negatives are not possible, and so Bloom filters have a 100% recall rate.

Algorithm 3.1 Const(S)
      if BF[j] = 0 then
         go to next x.
      end if
   end for
   add x to the set S_∩
end for
output S_∩. stop.
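The three Bloom filter functions described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the class name `BloomFilter`, the helper `_indexes`, and the use of counter-prefixed SHA-256 as a stand-in for the k hash functions $H_0, \ldots, H_{k-1}$ are all assumptions made for this example.

```python
import hashlib


class BloomFilter:
    """Bit array of length m with k hash functions (illustrative names)."""

    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = [0] * m  # all bits start at 0

    def _indexes(self, x):
        # Assumed stand-in for H_0..H_{k-1}: SHA-256 of "i|x", reduced mod m.
        return [
            int.from_bytes(hashlib.sha256(f"{i}|{x}".encode()).digest(), "big") % self.m
            for i in range(self.k)
        ]

    @classmethod
    def const(cls, items, m, k):
        """Const: embed every element of the set into a fresh filter."""
        bf = cls(m, k)
        for x in items:
            for j in bf._indexes(x):
                bf.bits[j] = 1
        return bf

    def element_check(self, x):
        """ElementCheck: False means x is certainly absent, True means probably present."""
        return all(self.bits[j] == 1 for j in self._indexes(x))


def set_check(bf, candidates):
    """SetCheck: keep the candidates that pass ElementCheck against BF(S)."""
    return {x for x in candidates if bf.element_check(x)}
```

Because false negatives cannot occur, every element that was embedded is always reported; an element that was not embedded may still be reported with probability FPR.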
Homomorphic encryption under addition is useful for processing encrypted data. A typical additively homomorphic encryption scheme was proposed by Paillier [8]. However, because Paillier encryption cannot reduce the order of a composite group, it is computationally expensive compared with the following ElGamal encryption. Our protocol requires matching without revealing the original messages, for which exponential ElGamal encryption (exElGamal) is sufficient [9]. In fact, the decrypted results of exElGamal encryption reveal whether two messages $m_1$ and $m_2$ are equal, although the exElGamal scheme cannot recover the messages themselves. Furthermore, exElGamal can be used in (n, n)-threshold distributed decryption [10], where the decryption must be performed by all players acting together. An exElGamal encryption with (n, n)-threshold distributed decryption consists of three functions:
Key generation: Let $F_p$ be a finite field and let $g \in F_p$ have prime order q. Each player $P_i$ chooses $x_i \in Z_q$ at random and computes $y_i = g^{x_i} \pmod p$. Then, $y = \prod_{i=1}^{n} y_i \pmod p$ is the public key, and each $x_i$ is the share that player $P_i$ uses to decrypt a ciphertext.
Encryption: Let $m \in Z_q$ be a message. Choose $r \in Z_q$ at random, and compute both $u = g^r \pmod p$ and $v = g^m y^r \pmod p$ for the input message m and the public key y. The ciphertext is (u, v).
Decryption: Each player $P_i$ computes $z_i = u^{x_i} \pmod p$, and the players combine these values into $z = \prod_{i=1}^{n} z_i \pmod p$. Finally, each player can decrypt the ciphertext as $g^m = v/z \pmod p$.
ExElGamal encryption with (n, n)-threshold decryption has the following features:
(1) homomorphic under addition: $Enc(m_1)Enc(m_2) = Enc(m_1 + m_2)$ for messages $m_1, m_2 \in Z_q$.
(2) homomorphic under scalar operations: $Enc(m)^k = Enc(km)$ for a message m and $k \in Z_q$.
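The key generation, encryption, joint decryption, and the two homomorphic features can be checked on a toy example. The sketch below is only an illustration under stated assumptions: the parameters p = 2039, q = 1019, g = 4 are a deliberately tiny group chosen for readability (far too small for any real security), and the function names are hypothetical.

```python
import secrets

# Toy parameters (assumption): P = 2*Q + 1 with P, Q prime; G = 4 has order Q in Z_P^*.
P, Q, G = 2039, 1019, 4


def keygen(n):
    """Each player draws a share x_i; the public key is the product of g^{x_i}."""
    shares = [secrets.randbelow(Q - 1) + 1 for _ in range(n)]
    y = 1
    for x_i in shares:
        y = y * pow(G, x_i, P) % P
    return shares, y


def enc(y, m):
    """exElGamal encryption: (u, v) = (g^r, g^m * y^r)."""
    r = secrets.randbelow(Q - 1) + 1
    return pow(G, r, P), pow(G, m % Q, P) * pow(y, r, P) % P


def joint_dec(shares, ct):
    """(n, n)-threshold decryption: every share is needed; the result is g^m, not m."""
    u, v = ct
    z = 1
    for x_i in shares:
        z = z * pow(u, x_i, P) % P  # z_i = u^{x_i}
    return v * pow(z, -1, P) % P


def ct_mul(c1, c2):
    """Component-wise product: Enc(m1) * Enc(m2) = Enc(m1 + m2)."""
    return c1[0] * c2[0] % P, c1[1] * c2[1] % P


def ct_pow(c, k):
    """Component-wise power: Enc(m)^k = Enc(k * m)."""
    return pow(c[0], k, P), pow(c[1], k, P)
```

Note that `joint_dec` returns $g^m$ only; comparing it with 1 tests whether m = 0, which is the kind of equality matching the protocol needs.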

Previous Work
This section summarizes prior works on PSI between a server and a client and MPSI among n players. In PSI, let $S = \{s_1, \ldots, s_v\}$ and $C = \{c_1, \ldots, c_w\}$ be the server and client datasets, respectively, where |S| = v and |C| = w. In MPSI [1], we assume that each player holds a dataset of the same size.
PSI protocol based on polynomial representation: The main idea is to represent the elements in C as the roots of a polynomial. The encrypted polynomial is sent to the server, where it is evaluated on the elements in S, as originally proposed by Freedman [11]. This is secure against honest-but-curious adversaries under secure public key encryption. The computational complexity is O(vw) exponentiations, and the communication overhead is O(v + w). The computational complexity can be reduced to O(v log log w) exponentiations using the balanced allocation technique [12]. Kissner and Song extended this protocol to MPSI [1], which requires $O(nw^2)$ exponentiations and O(nw) communication overhead. The MPSI version is secure against honest-but-curious and malicious adversaries (in the random oracle model) using generic zero-knowledge proofs.
PSI protocol based on DH-key agreement: The main objective here is to apply the DH-key agreement protocol [13]: after representing the server and client datasets as hash values $\{h(s_i)\}$ and $\{h(c_i)\}$, respectively, the client encrypts the dataset as $\{h(c_i)^{r_i}\}$ using a random number $r_i$ and sends the encrypted set to the server. The server encrypts the client set $\{h(c_i)^{r_i}\}$ and the server set $\{h(s_i)\}$ using a random number $r'$, which gives $\{h(c_i)^{r' r_i}\}$ and $\{h(s_i)^{r'}\}$, respectively, and returns these sets to the client. Finally, the client evaluates $S \cap C$ by decrypting $\{h(c_i)^{r' r_i}\}$ to $\{h(c_i)^{r'}\}$ and comparing the result with $\{h(s_i)^{r'}\}$. This is secure against honest-but-curious adversaries under the DDH assumption. The total computational complexity is O(v + w) exponentiations, and the total communication overhead is O(v + w). The security of this approach can be enhanced against malicious adversaries in the random oracle model [14] by using a blind signature. However, no extensions to MPSI based on the DH-key agreement protocol have been proposed.
PSI protocol based on BF: This protocol was originally proposed in [4]. As the Bloom filter itself reveals information about the other player's dataset, the set of players is separated into two groups: input players who have datasets and privacy players who perform private computations under shared secret information. In [15], the privacy of each player's dataset is protected by encrypting each array position of the Bloom filter using Goldwasser-Micali encryption [16]. In an honest-but-curious version, the computational complexity is O(kw) hash operations and O(m) public key operations, and the communication overhead is O(m), where m and k are the number of array positions and hash functions, respectively, used in the Bloom filter. A Bloom filter is also used together with the oblivious transfer extension [17,18] and the newly constructed garbled Bloom filter [19]. The main novelty in the garbled Bloom filter is that each array position holds λ bits rather than the single bit needed for the conventional Bloom filter. To embed an element $x \in S$ into a garbled Bloom filter, x is split into k shares of λ bits using XOR-based secret sharing ($x = x_1 \oplus \cdots \oplus x_k$). The share $x_i$ is then stored at index $H_i(x)$. An element y is queried by subjecting all bit strings at $H_i(y)$ to an XOR operation. If the result is y, then y is in S; otherwise, y is not in S. The client uses a Bloom filter BF(C), and the server uses a garbled Bloom filter GBF(S). If x is in $C \cap S$, then for every position i it hashes to, BF(C)[i] must be 1 and GBF(S)[i] must be $x_i$. Thus, the client can compute $C \cap S$. The computational complexity of this method is O(kw) hash operations and O(m) public key operations, and the communication overhead is O(m). The number of public key operations can be reduced to O(λ) using the oblivious transfer extension. This is secure against honest-but-curious adversaries if the oblivious transfer protocol is secure.
Finally, some researchers have computed the approximate number of multiparty set unions [3].
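The message flow of the DH-key-agreement-based PSI summarized above can be illustrated with a short Python sketch. It is not taken from [13]: the modulus $2^{127} - 1$, the hash-to-group map `h`, and all function names are assumptions chosen so that the example runs with the standard library only. In particular, the multiplicative group modulo a Mersenne prime is used purely to show the algebra and is not a group in which the DDH assumption is claimed to hold.

```python
import hashlib
import secrets
from math import gcd

P = 2**127 - 1  # toy prime modulus (assumption, for illustration only)


def h(x):
    """Assumed hash into [1, P-1]."""
    return int.from_bytes(hashlib.sha256(x.encode()).digest(), "big") % (P - 1) + 1


def rand_exp():
    """Random exponent that is invertible modulo P - 1."""
    while True:
        r = secrets.randbelow(P - 2) + 1
        if gcd(r, P - 1) == 1:
            return r


def client_blind(client_set):
    """Client: send h(c_i)^{r_i} with a fresh r_i per element."""
    items = sorted(client_set)
    rs = [rand_exp() for _ in items]
    return items, rs, [pow(h(c), r, P) for c, r in zip(items, rs)]


def server_respond(server_set, blinded):
    """Server: return h(c_i)^{r' r_i} and the set of h(s_i)^{r'}."""
    r_prime = rand_exp()
    doubled = [pow(b, r_prime, P) for b in blinded]
    tags = {pow(h(s), r_prime, P) for s in server_set}
    return doubled, tags


def client_finish(items, rs, doubled, tags):
    """Client: strip r_i to obtain h(c_i)^{r'} and compare with the server tags."""
    result = set()
    for c, r, d in zip(items, rs, doubled):
        if pow(d, pow(r, -1, P - 1), P) in tags:
            result.add(c)
    return result
```

A possible usage: the client calls `client_blind`, passes the blinded list to `server_respond`, and feeds the reply to `client_finish` to obtain $S \cap C$.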

Practical MPSI
This section presents a practical MPSI that is secure under the honest-but-curious model.

Notation and Privacy Definition
In the remainder of this paper, the following notations are used.
is the sum of all players' arrays
We introduce an outsourcing provider O to reduce the computational burden on all players. The dealer has no information regarding the elements of any player's set. The privacy issues faced by MPSI with an outsourcing provider can be informally written as follows.
• $P_i$ does not learn anything about the elements of other players' datasets except for the elements that $P_i$ originally possesses.
• The outsourcing provider O does not learn anything about the elements of any player's set.

Proposed MPSI
Our MPSI comprises four phases: (i) initialization, (ii) Bloom filter construction and the encryption of $P_i$'s data, (iii) O's randomization of $\mathrm{thrEnc}(\mathrm{IBF}(\cap S_i) - n)$, and (iv) the computation of $\cap S_i$. The computation of $\cap S_i$ consists of three steps: (a) joint decryption of an (n, n)-threshold exElGamal among n players, (b) Bloom filter check, and (c) output of the intersection. Figure 3.1 shows an overview of our protocol after the initialization phase. The system parameters of a finite field $F_p$ and a basepoint $g \in F_p$ with order q for an for $r = [r_0, \ldots, r_{m-1}] \in Z_q^m$. Our protocol proceeds as follows.

Initialization:
1. $P_i$ generates $x_i \in Z_q$, computes $y_i = g^{x_i} \pmod p$, and publishes $y_i$ to the other players as a public key, where the corresponding secret key is $x_i$.
2. $P_i$ computes $y = \prod_i y_i$, where y is the n-player public key. Note that no player knows the corresponding secret key $x = \sum_i x_i$ before executing the joint decryption.
Construction and encryption of $BF(S_i) - 1$: where y is an n-player public key.

Randomization of thrEnc(IBF(∩S i ) − n):
1. O encrypts $\mathrm{IBF}(\cap S_i) - n$ without knowing $\mathrm{IBF}(\cap S_i)$ using the additive homomorphic feature and multiplying by $\mathrm{thrEnc}_y(\mathrm{BF}(S_i) - 1)$ as follows:
$$\mathrm{thrEnc}_y(\mathrm{IBF}(\cap S_i) - n) = \prod_{i=1}^{n} \mathrm{thrEnc}_y(\mathrm{BF}(S_i) - 1).$$
Computation of $\cap S_i$:
The above protocol satisfies the correctness requirement. This is because each array position of $\mathrm{thrEnc}_y(r(\mathrm{IBF}(\cap S_i) - n))$ to which an element $x \in \cap S_i$ is mapped by a hash function is decrypted to 1, whereas each array position to which only elements $x \notin \cap S_i$ are mapped is decrypted to a random value.
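The four phases can be traced end to end on a toy instance. The following Python sketch is a hedged illustration rather than the authors' implementation: it runs all roles in one process, uses the tiny group p = 2039, q = 1019, g = 4 (an assumption chosen for readability, with no real security), and replaces the hash functions with counter-prefixed SHA-256. All function names are hypothetical.

```python
import hashlib
import secrets

P, Q, G = 2039, 1019, 4  # toy group (assumption): G has prime order Q in Z_P^*


def positions(x, m, k):
    # Assumed stand-in for H_0..H_{k-1}.
    return [
        int.from_bytes(hashlib.sha256(f"{j}|{x}".encode()).digest(), "big") % m
        for j in range(k)
    ]


def bloom(items, m, k):
    bf = [0] * m
    for x in items:
        for pos in positions(x, m, k):
            bf[pos] = 1
    return bf


def enc(y, msg):
    # exElGamal ciphertext (g^r, g^msg * y^r).
    r = secrets.randbelow(Q - 1) + 1
    return pow(G, r, P), pow(G, msg % Q, P) * pow(y, r, P) % P


def mpsi(sets, m=1024, k=4):
    n = len(sets)
    # (i) Initialization: shares x_i and the n-player public key y.
    shares = [secrets.randbelow(Q - 1) + 1 for _ in range(n)]
    y = 1
    for x_i in shares:
        y = y * pow(G, x_i, P) % P
    # (ii) Each P_i encrypts BF(S_i) - 1 position by position.
    sent = [[enc(y, b - 1) for b in bloom(s, m, k)] for s in sets]
    # (iii) O multiplies the ciphertexts (giving IBF - n) and randomizes each position.
    randomized = []
    for j in range(m):
        u = v = 1
        for ct in sent:
            u, v = u * ct[j][0] % P, v * ct[j][1] % P
        r_j = secrets.randbelow(Q - 1) + 1
        randomized.append((pow(u, r_j, P), pow(v, r_j, P)))
    # (iv-a) Joint decryption with all n shares.
    opened = []
    for u, v in randomized:
        z = 1
        for x_i in shares:
            z = z * pow(u, x_i, P) % P
        opened.append(v * pow(z, -1, P) % P)
    # (iv-b, c) Each P_i keeps its elements whose positions all decrypt to 1.
    return [
        {x for x in s if all(opened[pos] == 1 for pos in positions(x, m, k))}
        for s in sets
    ]
```

A position decrypts to 1 exactly when all n filters have a 1 there, because the exponent $r_j(\mathrm{IBF}_j - n)$ is zero modulo q only in that case (given n < q and $r_j \neq 0$).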

Security Proof
The security of our MPSI protocol is as follows.

Theorem 3.1 For any coalition of fewer than n players, the MPSI is player-private against an honest-but-curious adversary under the DDH assumption.
Proof The views of $P_i$ and O are shown to be indistinguishable from a random vector $r = [r_0, \ldots, r_{m-1}] \in Z_q^m$. Assume that a polynomial-time distinguisher D outputs 0 when the views are presented as a random vector and outputs 1 when they are constructed in MPSI. We show that a simulator SIM that solves the DDH problem can be constructed as follows.
Upon receiving a DDH challenge $(g, g^\alpha, g^\beta, g^\gamma)$, SIM executes the following:
1. Set the n-player public key $y = g^\beta$ and choose random numbers $d_0, \ldots, d_{m-1}$ and
If $(g, g^\alpha, g^\beta, g^\gamma)$ is a DH tuple, i.e., $\gamma = \alpha\beta$, then $\mathrm{thrEnc}_y(BF_{m,k}(S_i))$ is distributed in the same way as when constructed by the MPSI scheme. Thus, D must output 1. If $(g, g^\alpha, g^\beta, g^\gamma)$ is not a DH tuple, then $\mathrm{thrEnc}_y(BF_{m,k}(S_i))$ is randomly distributed, and D has to output 0. Therefore, SIM can use the output of D to respond to the DDH challenge correctly. Under the DDH assumption, it follows that D can answer correctly with only negligible advantage over random guessing. Furthermore, as all inputs of each player are encrypted until the decryption is performed, and decryption cannot be performed by fewer than n players, nothing can be learned by any player prior to decryption. As for the views of $\mathrm{thrEnc}_y(r(IBF_{m,k}(\cap S_i) - n))$, the same argument holds. Therefore, for any coalition of fewer than n players, MPSI is player-private under the honest-but-curious model.
Next, we present d-and-over MPSI. The procedures of d-and-over MPSI are the same as those of MPSI until O computes thrEnc y (IBF(∩S i )). Thus, we describe the procedure after O computes thrEnc y (IBF(∩S i )).

Encryption of ℓ-subtraction of $\mathrm{IBF}(\cap S_i)$:
O executes the following:
d-and-over MPSI computation:
$P_i$ executes the following:
Let CBF be an m-array for $d \le \ell \le n$, where an array position is set to 1 if and only if the corresponding array position of $r(\mathrm{IBF}(\cap S_i) - \ell)$ is 1, and others are set to 0.
The correctness of d-and-over MPSI follows from the fact that if an element x is in the intersection of ℓ sets for some $d \le \ell \le n$, the corresponding array locations in $\mathrm{IBF}(\cap S_i) - j$ for some $\ell \le j \le n$, where x is mapped by the k hashes, are encryptions of 0, which are decrypted to 1; otherwise, they are encryptions of randomized values.
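The counting argument behind d-and-over MPSI can be illustrated without any encryption. The sketch below works on plaintext filters only, so it shows the logic of the CBF check and nothing about privacy; the function names and the SHA-256-based hash functions are assumptions made for this example.

```python
import hashlib


def positions(x, m, k):
    # Assumed stand-in for H_0..H_{k-1}.
    return [
        int.from_bytes(hashlib.sha256(f"{j}|{x}".encode()).digest(), "big") % m
        for j in range(k)
    ]


def d_and_over(sets, d, m=1024, k=4):
    n = len(sets)
    # IBF: position-wise sum of all players' Bloom filters.
    ibf = [0] * m
    for s in sets:
        bf = [0] * m
        for x in s:
            for pos in positions(x, m, k):
                bf[pos] = 1
        ibf = [a + b for a, b in zip(ibf, bf)]
    # CBF[j] = 1 iff IBF[j] - l = 0 for some l with d <= l <= n.
    cbf = [int(any(count - l == 0 for l in range(d, n + 1))) for count in ibf]
    # Each player keeps its elements whose k positions are all marked in CBF.
    return [
        {x for x in s if all(cbf[pos] for pos in positions(x, m, k))}
        for s in sets
    ]
```

In the protocol itself, the test `count - l == 0` is replaced by checking whether the randomized ciphertext of $\mathrm{IBF}(\cap S_i) - \ell$ decrypts to 1.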

Efficiency
Although many PSI protocols have been proposed, to the best of our knowledge, relatively few consider the multiparty scenario [1][2][3][4]. Our target is multiparty private set intersection, and the final result must be obtained by all players acting together, without a trusted third party (TTP). Among previous MPSI protocols, the approach in [3] computes only the approximate number of intersections, and that in [4] requires more than two TTPs. In contrast, [2] follows almost the same method as [1] and thus has a similar complexity; the only difference lies in the security model. Hence, we only compare our scheme with that of [1]. The computational and communication efficiency of the proposed protocol and [1] are compared in Table 3.1. These approaches are secure against honest-but-curious adversaries without a TTP under exElGamal encryption (DDH security) and Paillier encryption (Decisional Composite Residuosity (DCR) security), respectively. The Bloom filter parameters (m, k) used in our protocol are set as follows: k = 80 and $m = 80\omega / \ln 2$, where $\omega = \max_i \omega_i$ is the maximum of the set sizes $|S_i| = \omega_i$. Then, the probability of false positives is given by $p = 2^{-80}$.
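The parameter choice above can be checked numerically with the standard approximation $(1 - e^{-k\omega/m})^k$ for the false positive rate. The helper name `bloom_params` is hypothetical, and rounding m up to an integer is an assumption of this sketch.

```python
import math


def bloom_params(omega, k=80):
    """Return (m, fpr) for m = k * omega / ln 2, rounded up to an integer."""
    m = math.ceil(k * omega / math.log(2))
    fpr = (1 - math.exp(-k * omega / m)) ** k
    return m, fpr
```

For ω = 10,000 entries, as in the measurement reported later, this gives a filter of about 1.15 million positions with a false positive rate of essentially $2^{-80}$.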
Our MPSI uses the Bloom filter for the computations performed by $P_i$ and the integration performed by O. The use of a Bloom filter eliminates the restriction on set size. Thus, in our MPSI, the set size of each player is flexible. $P_i$'s computations consist of Bloom filter construction, joint decryption, and Bloom filter check. Neither the computations related to the Bloom filter nor the joint decryption depends on the number of players, as shown in Sect. 3.1.2. In summary, the computational complexity of operations performed by $P_i$ is $O(\omega_i)$. All player-dependent data are sent to O, who integrates the n players' ciphertexts into $\mathrm{thrEnc}_y(\mathrm{IBF}(\cap S_i))$ without decryption. Therefore, the computational complexity of operations performed by O is $O(n\omega)$.

System and Performance
PSI and MPSI implicitly assume that every attendee can provide data, any attendee can retrieve data from the shared data, and all attendees can communicate with each other. If PSI or MPSI is implemented straightforwardly, the implementation becomes a system like a peer-to-peer (P2P) network system. Although a fully distributed system like a P2P network has attractive features, such as high availability and scalability, it also has some unfavorable features.
Network address and port translation (NAPT) is a major obstacle for P2P network systems. Modern P2P network systems use NAPT traversal technologies to overcome NAPT, but these make the architecture complex and costly. The absence of a trusted node is also an obstacle for attendee or group management: reaching consensus on a P2P network system is difficult or highly costly. Additionally, unpredictable joining and leaving of nodes makes P2P network systems complex. To avoid the complexities of P2P networks, we designed a system based on the client-server model.
Next, we discuss the design of the client-server model for PSI or MPSI. There are two main functionalities of PSI or MPSI: (1) data sharing, which shares data among attendees, and (2) data retrieval, which retrieves data from the shared data. Any attendee can retrieve data from the shared data, but the retrieval avoids collecting privacy-sensitive data by using the privacy-preserving techniques described above.
However, we do not assume that every attendee both provides and retrieves data. Imagine an incident analysis situation in which data are provided by several organizations that employ workers and operate machines, and a research institute collects the data from the organizations and analyzes them. In such a situation, data providers do not need the data retrieval functionality, and data analysts do not need the data sharing functionality. Therefore, we define three roles for our MPSI application design as follows.
• Parties: entities for data providing
• Clients: entities for data retrieving
• Dealer: an entity for forwarding requests between parties and clients
From the perspective of privilege separation, defining and separating roles are significant. Figure 3.2 shows a P2P network model and our client-server model. As shown in this figure, every P2P network node is connected to every other node and can provide and retrieve data, whereas in the client-server model parties only provide data and clients only retrieve data. The dealer forwards requests from parties and clients and provides other functionalities that are not specified by PSI or MPSI. For example, attendee or group management, user authentication, and data logging should be performed by the dealer. Figure 3.3 shows an example sequence diagram of our MPSI application. In this figure, there are 2 parties, 1 client, and 1 dealer. First, parties 1 and 2 join the dealer (join p1 and p2). A party must join before providing data, and joining is performed only once, at initialization. After that, the client sends a data retrieval request to the dealer (cl req), and the parties send a request to confirm whether the dealer has data retrieval requests from clients (new-req p1 and p2). Then, the parties and the dealer generate keys, share the keys, encrypt data, and decrypt data (gpk p1 and p2, enc p1 and p2, and dec p1 and p2). Finally, the client gets the result from the dealer.
We measured the performance of our MPSI application, written in Python, on an Amazon EC2 server (2.4 GHz CPU, 1 GB memory). Figure 3.4 shows the results for 2 to 4 parties, each providing data with 10,000 entries. The results show that it takes approximately 280 s to accomplish data retrieval and that the computational cost does not depend on the number of parties.

Classification
In this section, we present a secure classification protocol, which is a type of secure computation protocol. We assume two participants in the protocol, Alice and Bob. Alice has private data x, and Bob has a classification model C. The task is for Alice to learn C(x) at the end of the protocol while preserving the privacy of x and C. That is, Alice learns only C(x), and Bob learns nothing. Our construction is based on a code-based public-key encryption scheme called HQC [20], which is a candidate in NIST's Post-Quantum Cryptography standardization [21].

Error-Correcting Code
We start with several fundamental notions for error-correcting codes.

Definition 3.3 (Linear code)
A code C such that $c_1 + c_2 \in C$ always holds for any codewords $c_1, c_2 \in C$ is called a linear code. A linear code C of code length n and information bit number k is described as an [n, k] code.

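Definition 3.4 (Generator matrix)
A matrix $G \in F^{k \times n}$ that satisfies $C = \{ mG : m \in F^k \}$ is called a generator matrix. The rows of the generator matrix form a basis of the linear code, and the matrix generates all codewords.
Definition 3.5 (Parity check matrix) A matrix $H \in F^{(n-k) \times n}$ that satisfies $C = \{ c \in F^n : H c^\top = 0 \}$ is called a parity check matrix.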
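Definition 3.6 (Cyclic matrix) When $x = (x_1, \ldots, x_n) \in F^n$, the circulant matrix for x is defined as
$$\mathrm{rot}(x) = \begin{pmatrix} x_1 & x_n & \cdots & x_2 \\ x_2 & x_1 & \cdots & x_3 \\ \vdots & \vdots & \ddots & \vdots \\ x_n & x_{n-1} & \cdots & x_1 \end{pmatrix} \in F^{n \times n}.$$
In addition, the multiplication of two polynomials x, y has the following property: $x \cdot y = x \times \mathrm{rot}(y)^\top$.
If, for every codeword $c = (c_0, \ldots, c_{s-1}) \in C$, the vector $(\sigma(c_0), \ldots, \sigma(c_{s-1}))$ obtained by applying a cyclic shift σ to each block is also in C, then C is called an s-quasi-cyclic code. In particular, when s = 1, C is called a cyclic code.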
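The relation between the circulant matrix and polynomial multiplication can be verified on a small example over $F_2$. In this sketch, vectors are Python lists indexed from 0, the product is taken in $F_2[X]/(X^n - 1)$, and the function names are chosen for illustration.

```python
def rot(x):
    """Circulant matrix of x: column j is x cyclically shifted down by j."""
    n = len(x)
    return [[x[(i - j) % n] for j in range(n)] for i in range(n)]


def ring_mul(x, y):
    """Product of x and y as polynomials in F_2[X]/(X^n - 1)."""
    n = len(x)
    out = [0] * n
    for i, xi in enumerate(x):
        if xi:
            for j, yj in enumerate(y):
                if yj:
                    out[(i + j) % n] ^= 1
    return out


def vec_times_rot_t(x, y):
    """Row vector x times the transpose of rot(y), with arithmetic mod 2."""
    n = len(x)
    r = rot(y)
    # Entry k of x * rot(y)^T is sum_j x[j] * rot(y)[k][j].
    return [sum(x[j] * r[k][j] for j in range(n)) % 2 for k in range(n)]
```

For x = 1 + X and y = X + X^3 with n = 5, both functions return the coefficients of X + X^2 + X^3 + X^4.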

Security Assumptions
As mentioned above, the security of the public-key cryptosystem HQC is based on the computational difficulty of the quasi-cyclic syndrome decoding problem. More specifically, its security is proved under the following decisional quasi-cyclic syndrome decoding assumption.
$\leftarrow F^{sn-n}$ of a random systematic quasi-cyclic code are given, every efficient algorithm can distinguish only with negligible probability whether it is drawn from the quasi-cyclic syndrome decoding distribution or from the uniform distribution over $F^{(sn-n) \times sn} \times F^{(sn-n)}$.
As will be described later, since the security of the secure computation protocol proposed in this section is reduced to the security of HQC, the secure computation protocol of this section is proved to be secure under this assumption as well as under HQC.

Security Requirements for 2PC
Secure two-party computation (2PC) is a subproblem of secure multi-party computation. It has been studied by many researchers because it is closely related to many cryptographic protocols. The purpose of 2PC is to construct a general-purpose protocol with which two parties can jointly compute an arbitrary function without revealing their input values to each other. One of the best-known examples of 2PC is Yao's millionaires' problem [22], in which Alice and Bob decide who is richer without revealing how much money they have. Specifically, suppose that Alice has a yen and Bob has b yen. The problem is to decide whether a ≥ b while keeping a and b secret from each other. Generally speaking, the security requirement of 2PC is that the function is computed by a protocol without leaking either input to the other party, and only the computation result becomes known.
A two-party linear function evaluation is a kind of 2PC that satisfies the 2PC security requirements. In other words, the participants perform the evaluation without notifying the other party of their input. In addition, the function of the protocol is the evaluation of linear functions. Specifically, linear function secure computation protocol computes f (m) = a · m + b. The participants in the protocol are called Alice and Bob. Alice's input is m, and Bob's input is linear function parameters a, b. Alice gets only the result of f (m) = a · m + b through the protocol, and Bob gets nothing.
Below we define the security requirements for two-party linear function secure computation. Let $f = (f_A, f_B)$ be a probabilistic polynomial-time functionality, and let π be a two-party protocol for computing f. Let the view of A in an execution of π on inputs (x, y) with security parameter n be $\mathrm{view}^\pi_A(x, y, n)$, and let the view of B be $\mathrm{view}^\pi_B(x, y, n)$. The output of A is $\mathrm{output}^\pi_A(x, y, n)$ and the output of B is $\mathrm{output}^\pi_B(x, y, n)$. In addition, the joint output of the two is denoted as $\mathrm{output}^\pi(x, y, n) = (\mathrm{output}^\pi_A(x, y, n), \mathrm{output}^\pi_B(x, y, n))$. For semi-honest adversaries, we say that the protocol π securely computes the function f if there are probabilistic polynomial-time algorithms $S_A$ and $S_B$ that satisfy the following equations. For any x, y that satisfy |x| = |y| = n, n ∈ N, the following holds:
$$\{(S_A(1^n, x, f_A(x, y)), f(x, y))\}_{x,y,n} \stackrel{c}{\equiv} \{(\mathrm{view}^\pi_A(x, y, n), \mathrm{output}^\pi(x, y, n))\}_{x,y,n},$$
$$\{(S_B(1^n, y, f_B(x, y)), f(x, y))\}_{x,y,n} \stackrel{c}{\equiv} \{(\mathrm{view}^\pi_B(x, y, n), \mathrm{output}^\pi(x, y, n))\}_{x,y,n}.$$

HQC Encryption Scheme
The protocols proposed in this section are based on the Hamming Quasi-Cyclic (HQC) cryptosystem of Gaborit et al. [20], which is a public-key cryptosystem based on the quasi-cyclic syndrome decoding problem. In this cryptosystem, two kinds of codes are used: a quasi-cyclic code and an error-correcting code C. The error-correcting code C is an arbitrary linear code (such as a BCH code) with sufficient error correction capability, used for message encoding and decoding. The quasi-cyclic code is used for the security requirement of this public-key cryptosystem, to generate noise that an adversary cannot remove. The participants of the HQC cryptosystem are Alice (A) and Bob (B), and B aims to send the input message m securely to A. The cryptosystem is performed as follows:
1. Global parameter settings: Parameters param = $(n, k, \delta, w_x, w_r, w_e)$ and the generator matrix $G \in F^{k \times n}$ of the code C.
2. Key generation: A generates a random $h \leftarrow R$. Furthermore, $(x, y) \leftarrow R^2$ is generated, where the Hamming weight of x and y is $w_x$. Secret information: sk = (x, y). Public information: pk = (h, s = x + h · y). A sends the public information pk to B.
3. Encryption: B generates $e, r_1, r_2 \in R$, where the Hamming weight of e is $w_e$ and the Hamming weight of $r_1$ and $r_2$ is $w_r$. Then, B computes $u = r_1 + h \cdot r_2$ and $v = m \cdot G + s \cdot r_2 + e$ on input m. B sends the ciphertext (u, v) to A.
4. Decryption: A uses the decoding function $C.\mathrm{Decode}(v - u \cdot y)$ of the error-correcting code C to recover the message m of B.
In the HQC cryptosystem, a term derived from the public information s is added to the message m encoded by the error-correcting code when it is encrypted. Since s is noise with a large Hamming weight generated by the quasi-cyclic code, security is guaranteed by the decisional quasi-cyclic syndrome decoding assumption introduced above. In the decryption stage, A can use the secret key to remove most of the noise caused by s from the ciphertext. However, the noise $x \cdot r_2 - r_1 \cdot y + e$ remains. If the weight of this noise is at most the maximum number of correctable errors δ of the error-correcting code, correct decoding is possible. The Hamming weights $w, w_r, w_e = O(\sqrt{n})$ are assumed in the analysis. Moreover, Gaborit et al. show that the probability that $\omega(x \cdot r_2 + e - y \cdot r_1) \le \delta$ increases as the code length n becomes larger. In addition, the HQC cryptosystem is IND-CPA secure under the decisional quasi-cyclic syndrome decoding assumption.
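The structure of key generation, encryption, and decryption, and the role of the residual noise, can be seen in a toy instance. The sketch below is an illustration under strong assumptions and not the HQC specification: the ring is $F_2[X]/(X^n - 1)$ with n = 101, the code C is a length-n repetition code carrying a single bit (so δ = 50), and all weights are 3, which bounds the residual noise weight by 3·3 + 3·3 + 3 = 21. These parameters offer no security, and the function names are hypothetical.

```python
import secrets

rng = secrets.SystemRandom()
N, W, WR, WE = 101, 3, 3, 3  # toy length and Hamming weights (assumptions)


def sparse(weight):
    """Random vector of the given Hamming weight."""
    v = [0] * N
    for i in rng.sample(range(N), weight):
        v[i] = 1
    return v


def add(*vs):
    """Sum in F_2^N (subtraction is the same operation)."""
    return [sum(bits) % 2 for bits in zip(*vs)]


def mul(a, b):
    """Product in F_2[X]/(X^N - 1)."""
    out = [0] * N
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                if bj:
                    out[(i + j) % N] ^= 1
    return out


def encode(bit):
    return [bit] * N  # m * G for the repetition code G = (1, ..., 1)


def decode(word):
    return int(sum(word) > N // 2)  # majority vote corrects up to 50 errors


def keygen():
    h = [rng.getrandbits(1) for _ in range(N)]
    x, y = sparse(W), sparse(W)
    return (x, y), (h, add(x, mul(h, y)))  # sk = (x, y), pk = (h, s)


def encrypt(pk, bit):
    h, s = pk
    r1, r2, e = sparse(WR), sparse(WR), sparse(WE)
    return add(r1, mul(h, r2)), add(encode(bit), mul(s, r2), e)


def decrypt(sk, ct):
    x, y = sk
    u, v = ct
    return decode(add(v, mul(u, y)))  # v - u*y = m*G + x*r2 - r1*y + e
```

Since the noise weight never exceeds 21 with these parameters, the majority decoder always recovers the bit in this toy setting.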

Linear Function Evaluation
We introduce the secure evaluation protocol of the linear functions between two parties.
We use two codes, a quasi-cyclic code and an arbitrary error-correcting code C, based on the HQC cryptosystem of Gaborit et al. The participants in the protocol are Alice (A) and Bob (B). A's input is $m \in F_2$, B's input is $a, b \in F_2$, B's output is nothing, and A's output is $a \cdot m + b$. The protocol is given in Protocol 3.2.5.1.

Protocol Linear function evaluation protocol
1. Global parameter param = $(n, k, \delta, w_x, w_r, w_e)$ and the generator matrix $G \in F^{k \times n}$ of the code C are chosen.
Here, the Hamming weight of $r_B$ is $w_r$, and the Hamming weight of $e_u$ and $e_v$ is $w_e$. B computes $u' = a \cdot u + h \cdot r_B + e_u$ and $v' = a \cdot v + b \cdot G + s \cdot r_B + e_v$. B sends $u', v'$ back to A.
5. A uses $C.\mathrm{Decode}(v' - u' \cdot y)$ to decode the error-correcting code C, and recovers $a \cdot m + b$ by taking the first bit of the result.

A generates the random h
First, we set global parameters. n is the code length of the code, k is the number of information bits, δ is the maximum number of correctable errors in the error-correcting code, and $w_x, w_r, w_e$ are Hamming weights set in advance. For example, it is half the weight of O(
$$s = x + h \cdot y = (x\ \ y)\,(I_n\ \ \mathrm{rot}(h))^\top. \tag{3.6}$$
It can be converted to and can be reduced to the quasi-cyclic syndrome decoding problem. Then, A sets secret information sk as (x, y) and public information pk as (h, s).
A pads the input m with 0, making m = (m, 0, ..., 0) with dimension k. A generates $r_A, r_u, r_v \leftarrow R$, encodes the value of m with the error-correcting code, and re-randomizes it. A generates the ciphertext pair $(u = h \cdot r_A + r_u,\ v = m \cdot G + s \cdot r_A + r_v)$ and sends it to B. From B's point of view, v contains noise derived from s that cannot be decoded, and B has no secret information with which to remove it, so B cannot learn m. As shown by Eq. (3.9), $v' - u' \cdot y$ is the result of removing h and s. Taking the first bit of the decoded result makes $a \cdot m + b$ available to A.
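The message flow of the linear function evaluation can be sketched on a toy instance. As an assumption for illustration, the code C is a length-101 repetition code carrying one bit (so k = 1 and no padding is needed), the ring is $F_2[X]/(X^n - 1)$, and all weights are 3, which keeps the residual noise weight at most 3·6 + 3·6 + 6 = 42, below the 50 errors that majority decoding corrects. The parameters give no security, and the function names are hypothetical.

```python
import secrets

rng = secrets.SystemRandom()
N, W, WR, WE = 101, 3, 3, 3  # toy length and Hamming weights (assumptions)


def sparse(weight):
    v = [0] * N
    for i in rng.sample(range(N), weight):
        v[i] = 1
    return v


def add(*vs):
    return [sum(bits) % 2 for bits in zip(*vs)]  # sum in F_2^N


def mul(a, b):
    out = [0] * N  # product in F_2[X]/(X^N - 1)
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                if bj:
                    out[(i + j) % N] ^= 1
    return out


def encode(bit):
    return [bit] * N  # repetition code


def decode(word):
    return int(sum(word) > N // 2)  # majority vote


def alice_keygen():
    h = [rng.getrandbits(1) for _ in range(N)]
    x, y = sparse(W), sparse(W)
    return (x, y), (h, add(x, mul(h, y)))  # sk = (x, y), pk = (h, s)


def alice_encrypt(pk, m):
    h, s = pk
    r_a, r_u, r_v = sparse(WR), sparse(WR), sparse(WR)
    return add(mul(h, r_a), r_u), add(encode(m), mul(s, r_a), r_v)  # (u, v)


def bob_evaluate(pk, ct, a, b):
    h, s = pk
    u, v = ct
    r_b, e_u, e_v = sparse(WR), sparse(WE), sparse(WE)
    zero = [0] * N
    u2 = add(u if a else zero, mul(h, r_b), e_u)             # u' = a*u + h*r_B + e_u
    v2 = add(v if a else zero, encode(b), mul(s, r_b), e_v)  # v' = a*v + b*G + s*r_B + e_v
    return u2, v2


def alice_decrypt(sk, ct2):
    x, y = sk
    u2, v2 = ct2
    return decode(add(v2, mul(u2, y)))  # Decode(v' - u'*y) = a*m + b
```

In a run, A calls `alice_keygen` and `alice_encrypt`, B answers with `bob_evaluate`, and A obtains a · m + b from `alice_decrypt`.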

Correctness and Security of the Proposed Protocol
The correctness of the two-party linear function evaluation protocol proposed in this study depends on the decoding ability of the code C. Specifically, assuming that C.Decode decodes $v' - u' \cdot y$ correctly, the following equation is satisfied:
$$\mathrm{Decrypt}(sk, \mathrm{Encrypt}(pk, a \cdot m + b)) = a \cdot m + b. \tag{3.10}$$
Also, let ε be the error of $v' - u' \cdot y$. The error is
$$\epsilon = \begin{cases} x r_B - y e_u + e_v & (\text{in the case of } a = 0) \\ x(r_A + r_B) - y(r_u + e_u) + (r_v + e_v) & (\text{in the case of } a = 1) \end{cases} \tag{3.11}$$
for the error correction capability of the code C. In the paper of Gaborit et al., C.Decode can work correctly when $\omega(x \cdot r_2 + e - y \cdot r_1) \le \delta$ is satisfied, and $w_r$ and $w_e$ have the same value when actually evaluated. If the Hamming weight of $r_0, r_1, r_u, r_v$,
The security requirements of the proposed protocol are described above. In this section, we prove the security against semi-honest adversaries.

Theorem 3.2 Under the quasi-cyclic syndrome decoding assumption, the 2PC protocol securely computes linear functions against semi-honest adversaries.
Proof First, consider the semi-honest adversary A. With the global parameters omitted, the view of A is $\mathrm{view}_A = (m, x, y; h, r_A, r_u, r_v; u', v')$. We construct a simulator $S_A(m, x, y)$ as follows:
1. Generate $\tilde{h}, \tilde{r}_A, \tilde{r}_u, \tilde{r}_v, \tilde{u}', \tilde{v}' \xleftarrow{\$} R$. Here, the Hamming weight of $\tilde{r}_A, \tilde{r}_u, \tilde{r}_v$ is $w_r$.
2. Output $(m, x, y; \tilde{h}, \tilde{r}_A, \tilde{r}_u, \tilde{r}_v; \tilde{u}', \tilde{v}')$.

Since $h, r_A, r_u, r_v$ and $\tilde{h}, \tilde{r}_A, \tilde{r}_u, \tilde{r}_v$ follow the same distribution, the following equation holds:

$$(m, x, y; h, r_A, r_u, r_v; u', v') \equiv_s (m, x, y; \tilde{h}, \tilde{r}_A, \tilde{r}_u, \tilde{r}_v; u', v').$$

Moreover, a probabilistic polynomial-time adversary cannot distinguish $(h \cdot r_B + e_u,\; s \cdot r_B + e_v)$ from uniform random values under the 3-quasi-cyclic syndrome decoding assumption for quasi-cyclic codes. Since $u'$ and $v'$ also fall under the 3-quasi-cyclic syndrome decoding decision assumption, the adversary cannot distinguish $u'$ and $v'$ from uniform random values. Thus, the distribution of $u'$ and $v'$ is indistinguishable from uniform and satisfies the following equation:

$$(m, x, y; \tilde{h}, \tilde{r}_A, \tilde{r}_u, \tilde{r}_v; u', v') \equiv_c (m, x, y; \tilde{h}, \tilde{r}_A, \tilde{r}_u, \tilde{r}_v; \tilde{u}', \tilde{v}'). \tag{3.14}$$

Thus, the distributions of the view $\mathrm{view}_A$ of A and the output of the simulator $S_A$ are indistinguishable to polynomial-time adversaries:

$$\mathrm{view}_A \equiv_c S_A(m, x, y). \tag{3.15}$$

Next, consider the semi-honest adversary B. With the global parameters omitted, the view of B is $\mathrm{view}_B = (a, b; h, s, u, v, r_B, e_u, e_v)$. We construct the simulator $S_B(a, b)$ as follows:
1. Randomly generate $\tilde{h}, \tilde{s}, \tilde{u}, \tilde{v}, \tilde{r}_B, \tilde{e}_u, \tilde{e}_v \xleftarrow{\$} R$. Here, the Hamming weight of $\tilde{r}_B$ is $w_r$, and the Hamming weight of $\tilde{e}_u$ and $\tilde{e}_v$ is $w_e$.
2. Output $(a, b; \tilde{h}, \tilde{s}, \tilde{u}, \tilde{v}, \tilde{r}_B, \tilde{e}_u, \tilde{e}_v)$.

Since $h, r_B, e_u, e_v$ and $\tilde{h}, \tilde{r}_B, \tilde{e}_u, \tilde{e}_v$ follow the same distribution, the following equation holds:

$$(a, b; h, s, u, v, r_B, e_u, e_v) \equiv_s (a, b; \tilde{h}, s, u, v, \tilde{r}_B, \tilde{e}_u, \tilde{e}_v). \tag{3.16}$$

Note that the indistinguishability of $s$ rests on the 2-quasi-cyclic syndrome decoding decision assumption, so a polynomial-time adversary cannot distinguish its distribution from uniform random values. Therefore, the following equation is satisfied:

$$(a, b; \tilde{h}, s, u, v, \tilde{r}_B, \tilde{e}_u, \tilde{e}_v) \equiv_c (a, b; \tilde{h}, \tilde{s}, u, v, \tilde{r}_B, \tilde{e}_u, \tilde{e}_v). \tag{3.17}$$

Moreover, since $u$ and $v$ are masked by $(h \cdot r_A + r_u,\; s \cdot r_A + r_v)$, which a probabilistic polynomial-time adversary cannot distinguish from uniform random values under the quasi-cyclic syndrome decoding assumption, the following holds:

$$(a, b; \tilde{h}, \tilde{s}, u, v, \tilde{r}_B, \tilde{e}_u, \tilde{e}_v) \equiv_c (a, b; \tilde{h}, \tilde{s}, \tilde{u}, \tilde{v}, \tilde{r}_B, \tilde{e}_u, \tilde{e}_v). \tag{3.18}$$

Therefore, the distributions of the view $\mathrm{view}_B$ of B and the output of the simulator $S_B$ cannot be distinguished by a polynomial-time adversary:

$$\mathrm{view}_B \equiv_c S_B(a, b). \tag{3.19}$$

The above protocol works over $\mathbb{F}_2$, but it can easily be extended to a larger field $\mathbb{F}_q$ by using appropriate error-correcting linear codes over $\mathbb{F}_q$.

Secure Comparison
The two-party secure comparison protocol proposed in this section is based on the comparison method used in the secure decision tree classification protocol of Wu et al. [23]. We use the criterion given in Proposition 3.1 for the comparison.

Proposition 3.1 For $t$-bit integers $x$ and $y$, if there is an $i \in [t]$ such that the following expression holds, then $x < y$.
In this section, we introduce the proposed two-party secret comparison protocol. The protocol uses a quasi-cyclic code and an arbitrary error-correcting code (for example, a Reed-Solomon code) over $\mathbb{F}_q$. The participants in the protocol are Alice (A) and Bob (B). The input of A is $c \in \mathbb{N}$, and the input of B is $d \in \mathbb{N}$. The output of A is the result of the comparison between $c$ and $d$, and B receives no output.
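The flow of the two-party secret comparison is as follows.

Protocol Two-party secret comparison protocol
1. A and B express their inputs as $l$-bit strings $c = (c_1, \ldots, c_l)$ and $d = (d_1, \ldots, d_l)$. In addition, they set the global parameters param = $(n, k, \delta, w_x, w_r)$ and the generator matrix $G \in \mathbb{F}_q^{k \times n}$ of the code $C$.
2. A generates a random $h$ and $x, y$ with Hamming weight $w_x$. The private key is $sk = (x, y)$, and the public key is $pk = (h, s = x + h \cdot y)$.
3. A generates random $r_{Ai}, r_{ui}, r_{vi}$ with Hamming weight $w_r$ for each $i \in [l]$, encrypts each bit $c_i$ (padded to a message $m_i$ of dimension $k$) as $u_i = h \cdot r_{Ai} + r_{ui}$ and $v_i = m_i \cdot G + s \cdot r_{Ai} + r_{vi}$, and sends the $l$ pairs $(u_i, v_i)$ to B.
4. B sets $a_{1i}, a_{2i}, \ldots, a_{li}, b_i$ and computes $u'_i = a_{1i} \cdot u_1 + \cdots + a_{li} \cdot u_l + h \cdot r_{Bi} + e_{ui}$ and $v'_i = a_{1i} \cdot v_1 + \cdots + a_{li} \cdot v_l + b_i \cdot G + s \cdot r_{Bi} + e_{vi}$ for the $l$ pairs. Then, B randomly permutes the $l$ pairs $(u'_i, v'_i)$ and sends them to A.
5. A decodes $v'_i - u'_i \cdot y$ for each pair with the error-correcting code and outputs the result of the comparison.

In step 1, the global parameters are set: $n$ is the code length, $k$ is the number of information bits, $\delta$ is the maximum number of errors that can be corrected by the error-correcting code, and $w_x$ and $w_r$ are Hamming weights fixed in advance. The public parameter $G$ is the generator matrix (for example, a Reed-Solomon code generator matrix) of the error-correcting code $C$, which maps messages to codewords as $\mathbb{F}_q^k \to \mathbb{F}_q^n$.
In step 2, A generates a private key and a public key for the HQC encryption scheme.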
In other words, $c < d$ if there exists an $i \in [l]$ such that Eq. (3.20) holds. In particular, since B holds the plaintext $d_i$ and the encrypted $c_i$, Eq. (3.20) can be regarded as an equation with $c_i$ as the unknown and can be computed on ciphertexts. In addition, B can transform the XOR operation $x_i \oplus y_i$ into the arithmetic form of Eq. (3.21). Therefore, the XOR operation requires only the additive homomorphism of the HQC encryption scheme. That is, B substitutes the plaintext $d_i$, $i \in [l]$, into the above equation, sets the appropriate $a_{1i}, a_{2i}, \ldots, a_{li}, b_i$, and computes as follows:

$$u'_i = a_{1i} \cdot u_1 + \cdots + a_{li} \cdot u_l + h \cdot r_{Bi} + e_{ui}, \tag{3.22}$$

$$v'_i = a_{1i} \cdot v_1 + \cdots + a_{li} \cdot v_l + b_i \cdot G + s \cdot r_{Bi} + e_{vi}. \tag{3.23}$$

Here, the Hamming weight of $r_{Bi}, e_{ui}, e_{vi}$, $i \in [l]$, is $w_r^*$. Furthermore, so as not to leak to A which bits differ, B must randomly permute the computed pairs $(u'_i, v'_i)$.
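How B might choose the coefficients $a_{1i}, \ldots, a_{li}, b_i$ can be sketched at the plaintext level. The Python code below is a hedged illustration, not the chapter's construction: the displayed criterion is not legible in this text, so the sketch assumes the DGK-style condition commonly associated with the comparison of Wu et al., namely $c < d$ if and only if some $z_i = c_i - d_i + 1 + 3 \sum_{j<i} (c_j \oplus d_j)$ equals 0, with bits indexed from the most significant one. It also assumes the standard identity $c_j \oplus d_j = (1 - 2 d_j)\, c_j + d_j$ for a known bit $d_j$, which makes each $z_i$ an affine function of A's bits. Arithmetic is done over the integers (a field $\mathbb{F}_q$ with $q$ large enough to avoid wrap-around would behave the same), no encryption is involved, and all function names are hypothetical.

```python
import random


def to_bits(value, l):
    """Return the l-bit representation of value, most significant bit first."""
    return [(value >> (l - 1 - j)) & 1 for j in range(l)]


def affine_coefficients(d_bits):
    """For each i, return (a_i, b_i) such that z_i = sum_j a_i[j] * c_j + b_i.

    Uses c_j XOR d_j = (1 - 2*d_j) * c_j + d_j, which is affine in c_j
    because d_j is a known plaintext bit on B's side.
    """
    l = len(d_bits)
    rows = []
    for i in range(l):
        a = [0] * l
        for j in range(i):
            a[j] = 3 * (1 - 2 * d_bits[j])   # contribution of 3 * (c_j XOR d_j)
        a[i] = 1                             # contribution of c_i
        b = 1 - d_bits[i] + 3 * sum(d_bits[:i])
        rows.append((a, b))
    return rows


def compare(c, d, l, rng=random):
    """Return True iff c < d, using only affine functions of c's bits."""
    c_bits = to_bits(c, l)
    rows = affine_coefficients(to_bits(d, l))
    z = [sum(a_j * c_j for a_j, c_j in zip(a, c_bits)) + b for a, b in rows]
    rng.shuffle(z)                           # hide which position produced the 0
    return any(value == 0 for value in z)
```

In the protocol, the sums in `compare` would be evaluated on ciphertexts through Eqs. (3.22) and (3.23), and the shuffle corresponds to B permuting the pairs before returning them to A.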

In step 5, A computes $v'_i - u'_i \cdot y$ for each $i \in [l]$. The result is

$$v'_i - u'_i \cdot y = (a_{1i} \cdot m_1 + \cdots + a_{li} \cdot m_l + b_i) \cdot G + x \cdot (a_{1i} \cdot r_{A1} + \cdots + a_{li} \cdot r_{Al} + r_{Bi}) - y \cdot (a_{1i} \cdot r_{u1} + \cdots + a_{li} \cdot r_{ul} + e_{ui}) + (a_{1i} \cdot r_{v1} + \cdots + a_{li} \cdot r_{vl} + e_{vi}). \tag{3.24}$$

Then, the evaluation result is decoded by the error-correcting code. A takes the first bit of each of the $l$ decoding results and outputs $c < d$ if any of them is 0. If none is 0, A outputs $c \ge d$.

Correctness
First, we explain how the weight $w_r^*$ used in step 4 is chosen. The Hamming weight of the polynomial coefficient vectors $x, y$ is $w_x$, and the Hamming weight of $r_{Ai}, r_{ui}, r_{vi}$, $i \in [l]$, is $w_r$. Since each is selected uniformly and independently, the probability of each bit value of the vectors is expressed as follows:

$$x_j = y_j = \begin{cases} 0 & \text{w.p. } 1 - p_x \\ 1 & \text{w.p. } p_x = \frac{w_x}{n} \end{cases} \tag{3.25}$$

Similarly,

$$r_{Ai,j} = r_{ui,j} = r_{vi,j} = \begin{cases} 0 & \text{w.p. } 1 - p_r \\ 1 & \text{w.p. } p_r = \frac{w_r}{n} \end{cases} \tag{3.26}$$

Let $L$ be the set of nonzero coefficients among $a_{1i}, a_{2i}, \ldots, a_{li}$ in each sum $a_{1i} \cdot r_{A1} + a_{2i} \cdot r_{A2} + \cdots + a_{li} \cdot r_{Al}$ for $i \in [l]$, and let $|L|$ be the number of elements in $L$. The Hamming weight $w_r^*$ of $r_{Bi}, e_{ui}, e_{vi}$ is then set according to $|L|$. Thus, the value of each $w_r^*$ is determined by the number of nonzero entries in $a_i = (a_{1i}, \ldots, a_{li})$ for $i \in [l]$.
Next, we analyze the correctness of the proposed protocol.
The correctness of the proposed two-party secure linear function computation protocol depends on the decoding capability of $C$. Let $\epsilon$ be the error of $v'_i - u'_i \cdot y$. With respect to the error correction capability of the code $C$, the error is

$$\epsilon = x \cdot (a_{1i} \cdot r_{A1} + \cdots + a_{li} \cdot r_{Al} + r_{Bi}) - y \cdot (a_{1i} \cdot r_{u1} + \cdots + a_{li} \cdot r_{ul} + e_{ui}) + (a_{1i} \cdot r_{v1} + \cdots + a_{li} \cdot r_{vl} + e_{vi}). \tag{3.27}$$

In other words, if the Hamming weight of $\epsilon$ satisfies $\omega(\epsilon) \le \delta$, decoding is successful. Here, $\delta$ is the maximum number of errors that can be corrected by the error-correcting code $C$. In addition, to analyze the correctness of the proposed protocol, we generalize the correctness analysis of the HQC encryption scheme proved by Gaborit et al. [20].
The following proposition holds for the Hamming weight of the error.
Therefore, the distributions of B's view $\mathrm{view}_B$ and of the simulator $S_B$ are indistinguishable to polynomial-time adversaries.

Support Vector Machine from Secure Linear Function Evaluation and Secure Comparison
We can construct a code-based protocol for a support vector machine from the protocols for the evaluation of linear functions and for comparison described above. Note that the result of the secure evaluation of a linear function is in $\mathbb{F}_q$, while the secure comparison operates on bit strings. Therefore, we need a secure bit-decomposition protocol. Bit-decomposition protocols have already been well studied in the area of secure computation, and indeed we can use the bit-decomposition protocol given in [24] together with a secure computation protocol based on threshold homomorphic encryption [25]. (It is straightforward to construct a threshold version of the HQC scheme by setting $sk_A = (x_1, y_1)$ and $sk_B = (x_2, y_2)$ as distributed decryption keys for A and B. Then, the encryption key is $(h, (x_1 + x_2) + h \cdot (y_1 + y_2))$.)
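The algebra behind the threshold key can be checked with a small sketch. The Python code below is an illustration under stated assumptions, not a secure threshold scheme: it works over $\mathbb{F}_2[X]/(X^n - 1)$ with the hypothetical toy values `N = 101` and `W = 3`, uses a repetition code with majority decoding in place of $C$, and lets each party contribute the unmasked share $u \cdot y_i$. A real protocol would have to protect these shares, since they reveal information about $y_i$. The function names are hypothetical.

```python
import random

N = 101                 # toy code length (assumption)
W = 3                   # toy Hamming weight for all sparse vectors (assumption)
MASK = (1 << N) - 1     # polynomials over F_2 are stored as N-bit integers


def rand_weight(w, rng):
    """Return a random element of F_2[X]/(X^N - 1) with Hamming weight w."""
    value = 0
    for position in rng.sample(range(N), w):
        value |= 1 << position
    return value


def mul(a, b):
    """Cyclic product a * b in F_2[X]/(X^N - 1)."""
    result, shift = 0, 0
    while a:
        if a & 1:
            result ^= ((b << shift) | (b >> (N - shift))) & MASK
        a >>= 1
        shift += 1
    return result


def threshold_demo(m, seed=0):
    """Encrypt bit m under the joint key and decrypt it from two key shares."""
    rng = random.Random(seed)
    h = rng.getrandbits(N)
    x1, y1 = rand_weight(W, rng), rand_weight(W, rng)   # sk_A = (x1, y1)
    x2, y2 = rand_weight(W, rng), rand_weight(W, rng)   # sk_B = (x2, y2)
    s = (x1 ^ x2) ^ mul(h, y1 ^ y2)                     # joint public value

    # Encryption under (h, s); the repetition code stands in for C.
    r, r_u, r_v = (rand_weight(W, rng) for _ in range(3))
    u = mul(h, r) ^ r_u
    v = (MASK if m else 0) ^ mul(s, r) ^ r_v

    # Each party contributes u * y_i; their sum equals u * (y1 + y2).
    share_a = mul(u, y1)
    share_b = mul(u, y2)
    noisy = v ^ share_a ^ share_b

    # Remaining error (x1+x2)*r + (y1+y2)*r_u + r_v has weight at most 39.
    return 1 if bin(noisy).count("1") > N // 2 else 0
```

The sketch relies only on the linearity of the decryption map in $y$, which is what allows the key to be split additively between A and B.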
We describe an overview of the protocol below. For simplicity, we denote by $[m]$ the ciphertext of $m$ under the HQC encryption scheme over $\mathbb{F}_q$.

Input
A : $m \in \mathbb{F}_q$
B : $a, b, t \in \mathbb{F}_q$
Output
A : whether $a \cdot m + b > t$ or not
B : $\perp$

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.