PrivacyPreserving Integration of Medical Data
 1.7k Downloads
 7 Citations
Abstract
Medical data are often maintained by different organizations. However, detailed analyses sometimes require these datasets to be integrated without violating patient or commercial privacy. Multiparty Private Set Intersection (MPSI), which is an important privacypreserving protocol, computes an intersection of multiple private datasets. This approach ensures that only designated parties can identify the intersection. In this paper, we propose a practical MPSI that satisfies the following requirements: The size of the datasets maintained by the different parties is independent of the others, and the computational complexity of the dataset held by each party is independent of the number of parties. Our MPSI is based on the use of an outsourcing provider, who has no knowledge of the data inputs or outputs. This reduces the computational complexity. The performance of the proposed MPSI is evaluated by implementing a prototype on a virtual private network to enable parallel computation in multiple threads. Our protocol is confirmed to be more efficient than comparable existing approaches.
Keywords
Medical data Privacypreserving data integration Private set intersectionIntroduction
Medical organizations often store the data accumulated through medical analyses. However, detailed data analysis sometimes requires separate datasets to be integrated without violating patient or commercial privacy. Consider the scenario in which the occurrence of similar accidents can be attributed to a particular defective product. Such defective products should be identified as quickly as possible. However, the databases related to accidents are maintained separately by different organizations. Thus, investigating the causes of accidents is often timeconsuming. For example, suppose child A has broken her/his leg at school, but it is not clear whether the accident was caused by defective equipment. In this case, information relating to A’s injury, such as the patient’s name and type of injury, are stored in hospital database S _{1}. Information pertaining to A’s accident, such as their name and the location of the swing at the school, are stored in database S _{2}, which is held by the fire department. Finally, information relating to the insurance claim following A’s accident, such as the name and medical costs, is maintained in the insurance company’s database, S _{3}. Computing the intersection of these databases, S _{1}∩S _{2}∩S _{3}, without compromising privacy would enable us to combine the separate sets of information, which may allow the cause of the accident to be identified. Let us consider another situation. Several clinics, denoted as P_{ i }, maintain separate databases, represented as S _{ i }. The clinics wish to know the patients they have in common to enable them to share treatment details; however, P_{ i } should not be able to access any information about patients not stored in their own dataset. In this case, the intersection of the set must not reveal private information.
These examples illustrate the need for the Multiparty Private Set Intersection (MPSI) protocol [11, 17, 18, 21]. MPSI is executed by multiple parties who jointly compute the intersection of their private datasets. Ultimately, only designated parties can access the intersection. Previous protocols are impractical, because the bulk of the computation is a function of the number of players. One previous study required the size of the datasets maintained by the different players to be equal [17, 21]. Another study [11] computed only the approximate number of intersections, whereas other researchers [18] required more than two trusted thirdparties.
 1.
The size of the datasets maintained by each party is independent of those maintained by the other parties.
 2.
The computational complexity for each party is independent of the number of parties. This is accomplished by introducing an outsourcing provider, \(\phantom {\dot {i}\!}\mathcal {O}\). In fact, all computations related to the number of parties are carried out by \(\phantom {\dot {i}\!}\mathcal {O}\). Thus, the number of parties is irrelevant.
The remainder of this paper is organized as follows. Previous results that are used to develop the proposed protocol are summarized in “Preliminaries”. “Previous work” then introduces some related studies. We propose the new MPSI in “Practical MPSI”, and present the results of its implementation in “Implementation results”.
Preliminaries
In this section, we summarize the DDH assumption, Bloom filter, and ElGamal encryption. We consider security according to the honestbutcurious model [13]: all players act according to their prescribed actions in the protocol. A protocol that is secure in an honestbutcurious model does not allow any player to gain information about other players’ private input sets, besides that which can be deduced from the result of the protocol. Note that the term adversary here refers to insiders, i.e., protocol participants. Outsider adversaries are not considered. In fact, behavior by outsider adversaries can be mitigated via standard network security techniques.
Our protocol is based on the following security assumption.
Definition 1
(DDH Assumption) Let t be a security parameter. A decisional Diffie–Hellman (DDH) parameter generator \(\phantom {\dot {i}\!}\mathcal {I}\mathcal {G}\) is a probabilistic polynomial time (ppt) algorithm that takes input 1^{ k } and outputs a description of a finite field \(\phantom {\dot {i}\!}\mathbb {F}_{p}\) and a basepoint \(\phantom {\dot {i}\!}g \in \mathbb {F}_{p}\) with prime order q. We say that \(\phantom {\dot {i}\!}\mathcal {I}\mathcal {G}\) satisfies the DDH assumption if p _{1}−p _{2} is negligible (in K) for all ppt algorithms A, where \(\phantom {\dot {i}\!}p_{1}= \text {Pr} [ (\mathbb {F}_{p}, g) \leftarrow \mathcal {I}\mathcal {G}(1^{K}); y_{1}=g^{x_{1}}, y_{2}= g^{x_{2}} \leftarrow \mathbb {F}_{p}: A(\mathbb {F}_{p}, g, y_{1}, y_{2}, g^{x_{1}x_{2}}) = 0]\) and \(p_{2}={\text {Pr}} [ (\mathbb {F}_{p}, g) \leftarrow \mathcal {I}\mathcal {G}(1^{K}); y_{1}=g^{x_{1}}, y_{2}= g^{x_{2}}, z \leftarrow \mathbb {F}_{p}: A(\mathbb {F}_{p}, g, y_{1}, y_{2}, z) = 0]\).
A Bloom filter [3], denoted by B F, consists of m arrays and has a spaceefficient probabilistic data structure. The B F can check whether an element x is included in a set S by encoding S with at most w elements. The encoded Bloom filter of S is denoted by B F(S).
The B F uses a set of k independent uniform hash functions \(\phantom {\dot {i}\!}\mathcal {H} = \left \{ H_{0}, ..., H_{k1} \right \}\), where H _{ i }:{0,1}^{∗}→{0,1,⋯ ,m−1} for 0≤∀i≤k−1. The B F consists of two functions: C o n s t embeds a given set S into B F(S), and E l e m e n t C h e c k checks whether an element x is included in S. S e t C h e c k, an extension of E l e m e n t C h e c k, checks whether an element x in S ^{ ′ } is in S ^{ ′ }∩S (see Algorithm 3). In C o n s t (see Algorithm 1), B F(S) is constructed for a given set S by first setting all bits in the array to 0. To embed an element x∈S into the filter, the element is hashed using k hash functions to obtain k index numbers, and the bits at these indexes are set to 1, i.e., set B F[H _{ i }(x)]=1 for 0≤i≤k−1. In E l e m e n t C h e c k (see Algorithm 2), we check all locations where x is hashed; x is considered to be not in S if any bit at these locations is 0; otherwise, x is probably in S.
Homomorphic encryption under addition is useful for processing encrypted data. A typical homomorphic encryption under addition was proposed by Paillier [19]. However, because Paillier encryption cannot reduce the order of a composite group, it is computationally expensive compared with the following ElGamal encryption. Our protocol requires matching without revealing the original messages, for which exponential ElGamal encryption (exElGamal) is sufficient [5]. In fact, the decrypted results of exElGamal encryption can distinguish whether two messages m _{1} and m _{2} are equal, although the exElGamal scheme cannot decrypt messages itself. Furthermore, exElGamal can be used in (n,n)threshold distributed decryption [9], where decryption must be performed by all players acting together. An exElGamal encryption with (n,n)threshold distributed decryption consists of three functions:
Key generation
Let \(\mathbb {F}_{p}\) be a finite field, \(\phantom {\dot {i}\!}g \in \mathbb {F}_{p}\), with prime order q. Each player P_{ i } chooses \(\phantom {\dot {i}\!}x_{i} \in \mathbb {Z}_{q}\) at random and computes \(\phantom {\dot {i}\!}y_{i}=g^{x_{i}} (\text {mod } p)\). Then, \(\phantom {\dot {i}\!}y={\prod }_{i=1}^{n}y_{i} (\text {mod } p)\) is a public key and each x _{ i } is a share for each player to decrypt a ciphertext.
Encryption
t h r E n c [ m ]→(u,v) Choose \(r \in \mathbb {Z}_{q}\) at random, and compute both u=g ^{ r }(modp) and v=g ^{ m } y ^{ r }(modp) for the input message \(\phantom {\dot {i}\!}m \in \mathbb {Z}_{q}\) and a public key y. Output (u,v) as a ciphertext of m.
Decryption
t h r D e c[(u,v)]→g ^{ m } Each player P_{ i } computes z _{ i }=u ^{ x } _{ i }(modp). All players then compute \(\phantom {\dot {i}\!}z = {\prod }_{i=1}^{n} z_{i} (\text {mod}{p})\) jointly.^{1} Finally, each player can decrypt the ciphertext as g ^{ m }=v/z(modp).

(1) homomorphic under addition: Enc (m _{1})Enc (m _{2})=Enc (m _{1}+m _{2}) for messages \(\phantom {\dot {i}\!}m_{1}, m_{2} \in \mathbb {Z}_{p}\).

(2) homomorphic under scalar operations: Enc (m)^{ k }=Enc (k m) for a message m and \(\phantom {\dot {i}\!}k \in \mathbb {Z}_{q}\).
Previous work
This section summarizes prior works on PSI between a server and a client and MPSI among n players. In PSI, let S={s _{1},...,s _{ v }} and C={c _{1},...,c _{ w }} be server and client datasets, where S=v and C=w. In MPSI [17], we assume that each player holds the same number of datasets.
PSI protocol based on polynomial representation
The main idea is to represent the elements in C as the roots of a polynomial. The encrypted polynomial is sent to the server, where it is evaluated on the elements in S, as originally proposed by Freedman [12]. This is secure against honestbutcurious adversaries under secure public key encryption. The computational complexity is O(v w) exponentiations, and the communication overhead is O(v+w). The computational complexity can be reduced to O(vloglogw) exponentiations using the balanced allocation technique [1]. Kissner and Song extended this protocol to MPSI [17], which requires O(n w ^{2}) exponentiations and O(n w) communication overhead. The MPSI version is secure against honestbutcurious and malicious adversaries (in the random oracle model) using generic zeroknowledge proofs.
PSI protocol based on DHkey agreement
The main objective here is to apply the DHkey agreement protocol [7]: after representing the server and client datasets as hash values {h(s _{ i })} and {h(c _{ i })}, respectively, the client encrypts the dataset as \(\phantom {\dot {i}\!}\{h(c_{i})^{r_{i}}\}\) using a random number r _{ i } and sends the encrypted set to the server. The server encrypts the client set \(\phantom {\dot {i}\!}\{h(c_{i})^{r_{i}}\}\) and the server set {h(s _{ i })} using a random number r, which gives \(\phantom {\dot {i}\!}\{h(c_{i})^{rr_{i}}\}\) and {h(s _{ i })^{ r }}, respectively, and returns these sets to the client. Finally, the client evaluates S∩C by decrypting to {h(c _{ i })^{ r }}. This is secure against honestbutcurious adversaries under the DDH assumption. The total computational complexity is O(v+w) exponentiations and the total communication overhead is O(v+w). The security of this approach can be enhanced against malicious adversaries in the random oracle model [6] by using a blind signature. However, no extensions to MPSI based on the DHkey agreement protocol have been proposed.
PSI protocol based on BF
This protocol was originally proposed in [18]. As the Bloom filter itself reveals information about the other player’s dataset, the set of players is separated into two groups: input players who have datasets and privacy players who perform private computations under shared secret information. In [16], the privacy of each player’s dataset is protected by encrypting each array of the Bloom filter using Goldwasser–Micali encryption [14]. In an honestbutcurious version, the computational complexity is O(k w) hash operations and O(m) public key operations, and the communication overhead is O(m), where m and k are the number of arrays and hash functions, respectively, used in the Bloom filter. The Bloom filter is used in the Oblivious transfer extension [15, 20] and the newly constructed garbled Bloom filter [10]. The main novelty in the garbled Bloom filter is that each array requires λ bits, rather than the single bit needed for the conventional Bloom filter. To embed an element x∈S to a garbled Bloom filter, x is split into k shares with λ bits using XORbased secret sharing \(\phantom {\dot {i}\!}(x=x_{1} \bigoplus ... \bigoplus x_{k})\). The x _{ i } are then mapped to an index of H _{ i }(x). An element y is queried by subjecting all bit strings at H _{ i }(y) to an XOR operation. If the result is y, then y is in S; otherwise, y is not in S. The client uses a Bloom filter B F(C) and the server uses a garbled Bloom filter G B F(S). If x is in C∩S, then for every position i it hashes to, B F(C)[i] must be 1 and G B F(S)[i] must be x _{ i }. Thus, the client can compute C∩S. The computational complexity of this method is O(k w) hash operations and O(m) public key operations, and the communication overhead is O(m). The number of public key operations can be changed to O(λ) using the Oblivious transfer extension. This is secure against honestbutcurious adversaries if the Oblivious transfer protocol is secure. Finally, some researchers have computed the approximate number of multiparty set unions [11].
Practical MPSI
This section presents a practical MPSI that is secure under the honestbutcurious model.
Notation and privacy definition

P _{ i }: ith player, i=1,⋯ ,n

\(\phantom {\dot {i}\!}\mathcal {O}\): outsourcing provider with no knowledge of the inputs or outputs

\(\phantom {\dot {i}\!}S_{i} = \{ s_{i,1}, s_{i, 2},\cdots , s_{i, w_{i}} \}\): dataset held by P _{ i }, where S _{ i }=ω _{ i }

∩S _{ j }: intersection of all n players

t h r E n c and t h r D e c: (n,n)threshold exElGamal encryption and decryption, respectively

m and k: number of arrays and hashes used in B F

ℓ=[ℓ,⋯ ,ℓ] ( 1≤ℓ≤n): an mdimensional array, where all strings in the array are set to ℓ

B F(S _{ i })=[B F _{ i }[0],⋯ ,B F _{ i }[m−1]]: Bloom filter applied to a set S _{ i }

\(\mathsf {IBF}(\cup S_{i})=[ {\sum }_{i=1}^{n} \mathsf {BF}_{i}[0], \cdots , {\sum }_{i=1}^{n} \mathsf {BF}_{i}[m1]]\) : integrated Bloom filter of n sets {S _{ i }}, where \({\sum }_{i=1}^{n} \mathsf {BF}_{i}[j]\) is the sum of all players’ arrays.
We introduce an outsourcing provider \(\phantom {\dot {i}\!}\mathcal {O}\) to reduce the computational burden on all players. The dealer has no information about the elements of any player’s set. The privacy issues faced by MPSI with an outsourcing provider can be informally written as follows.
Definition 2

P _{ i } does not learn anything about the elements of other players’ datasets except for the elements in ∩S _{ j }.

the outsourcing provider \(\phantom {\dot {i}\!}\mathcal {O}\) does not learn anything about the elements of any player’s set.
Proposed MPSI
Our protocol proceeds as follows.
Initialization:
 1.
P _{ i } generates \(\phantom {\dot {i}\!}x_{i} \in \mathbb {Z}_{q}\), computes \(\phantom {\dot {i}\!}y_{i}=g^{x_{i}} \in \mathbb {Z}_{q}\), and publishes y _{ i } to the other players as a public key, where the corresponding secret key is x _{ i }.
 2.
P _{ i } computes \(\phantom {\dot {i}\!}y={\prod }_{i} y_{i}\), where y is an nplayer public key. Note that no player knows the corresponding secret key \(x = \sum x_{i}\) before executing the joint decryption.
Construction and encryption of BF(S _{ i }) ⋯ 1:
 1.
P _{ i } executes C o n s t(S _{ i })→B F(S _{ i }) =[B F _{ i }[0],⋯ ,B F _{ i }[m−1]] (Algorithm 1).
 2.P _{ i } encrypts B F(S _{ i })−1 using t h r E n c _{ y }:where y is an nplayer public key.$$\begin{array}{@{}rcl@{}} &&\mathsf{thrEnc}_{\textit{y}}(\mathsf{BF}(S_{\textit{i}})  \mathbf{1}) \\ &&=\![\mathsf{thrEnc}_{\textit{y}}(\mathsf{BF}_{\textit{i}}[0] \,\,1), \cdots\!,\ \mathsf{thrEnc}_{\textit{y}}(\mathsf{BF}_{\textit{i}}[\textit{m}\,\,1]\,\,1)], \end{array} $$
 3.
P _{ i } sends t h r E n c _{ y }(B F(S _{ i })−1) to \(\phantom {\dot {i}\!}\mathcal {O}\).
Randomization of thrEnc(IBF(∪S _{ i })−n):
 1.\(\phantom {\dot {i}\!}\mathcal {O}\) encrypts I B F(∪S _{ i })−n without knowing I B F(∪S _{ i }) using an additive homomorphic feature and multiplying by t h r E n c _{ y }(B F(S _{ i })−1) as follows:$$\mathsf{thrEnc}_{\textit{y}}(\mathsf{IBF}(\cup S_{\textit{i}}) \mathbf{n}) = {\prod}_{i=1}^{n} \mathsf{thrEnc}_{\textit{y}}(\mathsf{BF}(S_{\textit{i}}) \mathbf{1}). $$
 2.\(\phantom {\dot {i}\!}\mathcal {O}\) randomizes t h r E n c _{ y }(I B F(∪S _{ i })−n) as \(\phantom {\dot {i}\!}\mathbf {r}\ = [r_{0}, \cdots , r_{\textit {m}1}] \in \mathbb {Z}_{\textit {q}}^{\textit {m}}\):$$\mathsf{thrEnc}_{\textit{y}}(\mathbf{r} (\mathsf{IBF}(\cup S_{\textit{i}})  \mathbf{n})) =(\mathsf{thrEnc}_{\textit{y}}(\mathsf{IBF}(\cup S_{\textit{i}})  \mathbf{n}))^{\mathbf{r}}. $$
 3.
\(\phantom {\dot {i}\!}\mathcal {O}\) broadcasts t h r E n c _{ y }(r(I B F(∪S _{ i })−n)) to P _{ i }.
Computation of ∩P _{ i }:
 1.
All players decrypt t h r E n c _{ y }(r(I B F(∪S _{ i })−n)) jointly.
 2.
P _{ i } computes S e t C h e c k(r(I B F(∪S _{ i })−n),S _{ i }) and obtains ∩S _{ i }.
The above protocol satisfies the correctness requirement. This is because each array position of t h r E n c _{ y }(r(I B F(∪S _{ i })−n)) is decrypted to 1, where x∈∩S _{ i } is embedded by each hash function; however, each array position for which x∉∩S _{ i } is embedded by each hash function is decrypted to a random value.
Security Proof
The security of our MPSI protocol is as follows.
Theorem 1
For any coalition of fewer than n players, MPSI is playerprivate against an honestbutcurious adversary under the DDH assumption.
Proof
 1.
Set nplayer public key \(\phantom {\dot {i}\!}y = \overline {g}^{\beta }\) and choose random numbers d _{0},...,d _{ m−1} and r _{1},...,r _{ m−1} from \(\phantom {\dot {i}\!}\mathbb {Z}_{q}\).
 2.Sendas \(\phantom {\dot {i}\!}\overline {\mathsf {thrEnc}_{\textit {y}}({\mathsf {BF}}_{\textit {m,k}}(S_{i}))}\) to \(\phantom {\dot {i}\!}\mathcal {D}\).$$\begin{array}{@{}rcl@{}} [(\overline{g}^{\alpha}, \overline{g}^{d_{0}}_{\cdot} \overline{g}^{r}), (\overline{g}^{\alpha})^{r_{1}},\overline{g}^{d_{1}} \cdot (\overline{g}^{\gamma})^{r_{1}}, {\kern.5pt}\cdots{\kern.1pt},{\kern.5pt} \overline{g}^{d_{m1}} \cdot (\overline{g}^{\gamma})^{r_{m1}} ] \end{array} $$
As for the views of t h r E n c _{ y }(r(I B F _{ m,k }(∪S _{ i })∖n)), the same argument holds. Therefore, for any coalition of fewer than n players, MPSI is playerprivate under the honestbutcurious model. □
Efficiency
Although many PSI protocols have been proposed, to the best of our knowledge, relatively few have considered the multiparty scenario [11, 17, 18, 21]. Our target is multiparty private set intersection, and the final result must be obtained by all players acting together, without a trusted thirdparty (TTP). Among previous MPSI protocols, the approach in [11] computes only the approximate number of intersections, and that in [18] requires more than two TTPs. In contrast, [21] follows almost the same method as [17] and thus has a similar complexity. The only difference exists in the security model. Hence, we only compare our scheme with that of [17].
Efficiency of [17] and the proposed protocol
[17]  Ours  

Computational complexity  O(n ω ^{2})  P _{ i }:O(ω _{ i }),O:O(n ω) 
Communication overhead  O(n ω)  P _{ i }:O(ω+n),O:O(n ω) 
Restriction on set size  S _{1}=…=S _{ n }  none 
Protected values  S _{ i }(∀i∈[1,n])  S _{ i },S _{ i }(∀i∈[1,n]) 
Our MPSI uses the Bloom filter for the computations performed by P _{ i } and the integrations performed by the \(\phantom {\dot {i}\!}\mathcal {O}\). The use of a Bloom filter eliminates the restriction on set size. Thus, in our MPSI, the set size of each player is flexible. However, P _{ i }’s computations consist of Bloom filter construction, joint decryption, and Bloom filter check. Neither the computations related to the Bloom filter nor the joint decryption depends on the number of players, as shown in “Preliminaries”. In summary, the computational complexity of operations performed by P _{ i } is O(ω _{ i }). All playerdependent data are sent to \(\phantom {\dot {i}\!}\mathcal {O}\), who integrates \({\prod }_{i=1}^{n} \mathsf {thrEnc}_{y}(\mathsf {IBF}(\cup S_{i}))\) without decryption. As a result, the computational complexity of operations performed by \(\phantom {\dot {i}\!}\mathcal {O}\) is O(n ω).
Implementation results
Implementation
To investigate the behavior and performance of our MPSI protocol, we implemented a prototype in C++ using the GNU MultiPrecision (GMP) library (version 5.1.3) and OpenSSL (version 1.0.1f). GMP is used for largeinteger arithmetic and random number generation in the exElGamal encryption. To instantiate hash functions for the Bloom filter, we used SHA1 in OpenSSL: H _{ i }(x):=sha1(s _{ i } ∥ x) mod m, where s _{ i } is a unique salt. This truncation of the hash functions is based on the recommendation of the National Institute of Standards and Technology (NIST) [8]. Each executable communicates through TCP. We used Boost.Asio C++ 1.54.0 for the TCP socket.
The C++ prototype has two executables: one for the players and one for the outsourcing provider. The prototype can work in either pipeline or parallel mode. In pipeline mode, the computation and communication threads are separated. Thus, computation and data transmission are processed in parallel when possible. Pipeline mode allows each executable to start immediately without waiting for the completion of all previous computations. Parallel mode extends the pipeline mode by multiplying the number of computation threads in each executable. The most expensive process of our protocol is Bloom filter encryption and decryption. In parallel mode, the encryption and decryption computation is conducted in multiple threads. This significantly improves the performance of our protocol.
Evaluation
All experiments were performed on the Google Compute Engine (GCE). GCE is a cloud computing system that delivers virtual machines running in Google’s data centers. In our experiments, each executable was calculated on a single virtual machine. We used the Ubuntu 14.04 LTE operating system with Intel Xeon 2.50 GHz CPUs. Each CPU core was assigned 3.75 GB of memory. Every virtual machine was connected to a virtual private network. The bandwidth between two virtual machines was approximately 2.0 Gbps, although our protocol used less than 10 Mbps.
Security parameter and group size
security parameter  p  q 

80  1024  160 
112  2048  224 
128  3072  256 
192  7680  384 
256  15360  512 
Pipeline mode performance (80bit security)
n  exe  Set size  

2^{6}  2^{8}  2^{10}  2^{12}  2^{14}  
4  \(\phantom {\dot {i}\!}\mathcal {O}\)  0.65  2.69  10.4  36.7  151 
P  0.82  3.39  13.4  54.1  214  
8  \(\phantom {\dot {i}\!}\mathcal {O}\)  0.76  2.95  12.4  44.4  178 
P  0.90  3.75  15.7  60.3  241  
16  \(\phantom {\dot {i}\!}\mathcal {O}\)  0.90  3.64  15.8  56.4  225 
P  1.30  4.71  19.2  76.1  307 
Pipeline mode performance (set size = 2^{6})
n  exe  Security parameter (bit)  

80  112  128  192  256  
4  \(\phantom {\dot {i}\!}\mathcal {O}\)  0.61  2.74  8.29  57.2  275 
P  0.87  4.28  11.1  85.7  417  
8  \(\phantom {\dot {i}\!}\mathcal {O}\)  0.72  2.95  7.84  58.1  277 
P  1.43  4.38  10.8  86.9  417  
16  \(\phantom {\dot {i}\!}\mathcal {O}\)  0.90  3.41  9.09  61.4  284 
P  1.30  5.18  12.0  91.8  433 
Breakdown of runtime (set size = 2^{6}, n=4)
exe  Process  Security parameter (bit)  

80  112  128  192  256  
\(\phantom {\dot {i}\!}\mathcal {O}\)  (A)  0.61  2.74  8.29  57.2  275 
P  (B)  0.50  2.67  6.79  55.8  275 
(C)  0.37  1.60  4.35  29.9  142  
(D)  ∼ 0.01  ∼ 0.01  ∼ 0.01  ∼ 0.01  ∼ 0.01 
Breakdown of runtime (set size = 2^{6}, Security parameter = 80)
exe  Process  Number of Players  

4  8  16  
\(\phantom {\dot {i}\!}\mathcal {O}\)  (A)  0.55  0.67  0.82 
P  (B)  0.45  0.44  0.44 
(C)  0.34  0.43  0.67  
(D)  ∼ 0.01  ∼ 0.01  ∼ 0.01 

Outsourcing provider

(A) Randomization of t h r E n c(I B F(∪S _{ i })−n)

Player

(B) Construction and encryption of B F(S _{ i })−1

(C) Joint decryption of t h r E n c _{ y }(r(I B F _{ m,k }(∪S _{ i })−n))

(D) S e t C h e c k(r(I B F(∪S _{ i })−n),S _{ i }) and obtains ∩S _{ i }
Clearly, the time consumption is dominated by the encryption and decryption of the Bloom filter array.
Parallel mode performance (80bit security)
CPU core  Set size  

2^{6}  2^{8}  2^{10}  2^{12}  2^{14}  
1  1.02  3.89  15.0  82.9  297 
2  1.49  2.83  8.72  33.0  131 
4  1.33  2.22  6.14  22.6  87.1 
Comparison
Performance comparison (80bit security)
Protocol  Set size  

2^{6}  2^{8}  2^{10}  2^{12}  2^{14}  
Kissner and Song’s ( n=4)  0.50  3.06  50.6  1051  N/A 
Our protocol ( n=4)  1.02  3.89  15.0  82.9  297 
Kissner and Song’s ( n=8)  0.92  6.41  92.0  1491  N/A 
Our protocol ( n=8)  1.50  3.05  19.4  83.2  355 
Kissner and Song’s ( n=16)  2.10  13.9  190  3246  N/A 
Our protocol ( n=16)  1.98  7.29  28.7  112  450 
The results show that our protocol is faster than Kissner and Song’s MSPI protocol when n=4 and the set size is greater than 2^{8}, when n=8 and the set size is greater than 2^{6}, and when n=16 and the set size is greater than 2^{4}. Furthermore, although Kissner and Song’s MSPI protocol crashed with a set size of 2^{14}, these results reveal that the time consumption of their protocol is approximately proportional to the square of the set size. As in our protocol, Kissner and Song’s MSPI protocol uses the (n,n)threshold scheme, so it does not require a conspiracy assumption. However, their protocol is not scalable with respect to either the set size or number of players.
Conclusion
This paper has described a practical MPSI in which some of the computations are outsourced to a thirdparty. As none of the information of S _{ i },S _{ i }(∀i∈[1,n]) is revealed to the thirdparty, this function can be safely outsourced. Our scheme satisfies that the following requirements: any restrictions on the sets are eliminated, meaning that the set size of each player can be flexibly chosen; and the computational burden on each player is independent of the number of players.
Importantly, our scheme can be applied to the efficient integration of medical and related data maintained by different organizations without violating any privacy constraints. We confirmed that the computational complexity is independent of the number of organizations from which data are being integrated.
Footnotes
 1.
The computational complexity of z for each player can be made independent of the number of players in various ways. For example, set z=1. P _{1} computes z=z⋅z _{1} and sends z to P _{2}, P _{2} computes z=z⋅z _{2} and sends z to P _{3}, and, finally, P _{ n } computes z=z⋅z _{ n } and shares z among all players. If we place all players in a binary tree, the communication complexity can be reduced, but each player’s computational complexity is still independent of the number of players.
Notes
Acknowledgments
The authors express our gratitude to anonymous referees for invaluable comments. This work is supported in part by a GrantinAid for Scientific Research (C)(15K00183) and (15K00189) and the Japan Science and Technology Agency, CREST, and Infrastructure Development for Promoting International S&T Cooperation.
References
 1.Azar, Y., Broder, A. Z., Karlin, A. R., Upfal, E, Balanced allocations. SIAM J. Comput. 29(1): 180–200, 1999.CrossRefGoogle Scholar
 2.Barker, E., Barker, W., Burr, W., Polk, W., Smid, M.: Nist special publication 80057: Recommendation for key management – part 1: General(revision 3). Technical report, National Institute of Standards and Technology (NIST) (2012)Google Scholar
 3.Bloom, B. H., Space/time tradeoffs in hash coding with allowable errors. Commun. ACM 13(7):422–426, 1970.CrossRefGoogle Scholar
 4.Broder, A., and Mitzenmacher, M., Network applications of bloom filters: A survey. Internet Math. 1(4): 485– 509, 2004.CrossRefGoogle Scholar
 5.Cramer, R., Gennaro, R., Schoenmakers, B., A secure and optimally efficient multiauthority election scheme. Eur. Trans. Telecommun. 8(5):481–490, 1997.CrossRefGoogle Scholar
 6.De Cristofaro, E., Kim, J., Tsudik, G.: Linearcomplexity private set intersection protocols secure in malicious model. In: ASIACRYPT 2010, volume 6477 of LNCS, pages 213–231. Springer (2010)Google Scholar
 7.De Cristofaro, E., and Tsudik, G.: Practical private set intersection protocols with linear complexity. In: FC 2010, volume 6052 of LNCS, pages 143–159. Springer (2010)Google Scholar
 8.Dang, Q., Nist special publication 800107: Recommendation for applications using approved hash algorithms(revision 1). Technical report, National Institute of Standards and Technology (NIST), 2012.Google Scholar
 9.Desmedt, Y., and Frankel, Y.: Threshold cryptosystems. In: CRYPTO 1989, volume 1462 of LNCS, pages 307–315. Springer (1989)Google Scholar
 10.Dong, C., Chen, L., Wen, Z.: When private set intersection meets big data: An efficient and scalable protocol. In: ACMCCS 2013, pages 789–800. ACM (2013)Google Scholar
 11.Egert, R., Fischlin, M., Gens, D., Jacob, S., Senker, M., Tillmanns, J.: Privately computing setunion and setintersection cardinality via bloom filters. In: ACISP 2015, volume 9144 of LNCS, pages 413–430. Springer (2015)Google Scholar
 12.Freedman, M. J., Nissim, K., Pinkas, B.: Efficient private matching and set intersection. In: EUROCRYPT 2004, volume 3027 of LNCS, pages 1–19. Springer (2004)Google Scholar
 13.Goldreich, O., Secure multiparty computation. Manuscript. Preliminary version, 1998.Google Scholar
 14.Goldwasser, S., and Micali, S., Probabilistic encryption. J. Comput. Syst. Sci. 28(2):270–299, 1984.CrossRefGoogle Scholar
 15.Ishai, Y., Kilian, J., Nissim, K., Petrank, E.: Extending oblivious transfers efficiently. In: CRYPTO 2003, volume 2729 of LNCS, pages 145–161. Springer (2003)Google Scholar
 16.Kerschbaum, F.: Outsourced private set intersection using homomorphic encryption. In: ACMCCS 2012, pages 85–86. ACM (2012)Google Scholar
 17.Kissner, L., and Song, D.: Privacypreserving set operations. In: CRYPTO 2005, volume 3621 of LNCS, pages 241–257. Springer (2005)Google Scholar
 18.Many, D., Burkhart, M., Dimitropoulos, X., Fast private set operations with sepia. Tech. Rep.,345, 2012.Google Scholar
 19.Paillier, P.: Publickey cryptosystems based on composite degree residuosity classes. In: EUROCRYPT 1999, volume 1592 of LNCS, pages 223–238. Springer (1999)Google Scholar
 20.Rabin, M. O., How to exchange secrets with oblivious transfer. Tech. Memo, TR81, 1981.Google Scholar
 21.Sang, Y., and Shen, H., Efficient and secure protocols for privacypreserving set operations. ACM Trans. Inf. Syst. Secur. 13(1):9:1–9:35, 2009.CrossRefGoogle Scholar
Copyright information
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0), which permits use, duplication, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.