1 Introduction

Private set intersection (PSI) protocols allow several mutually distrustful parties \(P_1,P_2,\dots ,P_n\) each holding a private set \(S_1,S_2,\dots ,S_n\) respectively to jointly compute the intersection \(I=\bigcap _{i=1}^n S_i\) without revealing any other information. PSI has numerous privacy-preserving applications, e.g., DNA testing and pattern matching [TPKC07], remote diagnostics [BPSW07], botnet detection [NMH+10], online advertising [IKN+17, MPR+20]. Over the last years enormous progress has been made towards realizing this functionality efficiently [HFH99, FNP04, KS05, DCT10, DCW13, PSZ14, PSSZ15, KKRT16, OOS16, RR17, KMP+17, HV17, PSWW18, PRTY19, GN19, PRTY20, CM20] in the two-party, multi-party, and server-aided settings with both semi-honest and malicious security.

Threshold PSI. In certain scenarios, the standard PSI functionality is not sufficient. In particular, the parties may only be willing to reveal the intersection if they have a large intersection. For example, in privacy-preserving data mining and machine learning [MZ17] where the data is vertically partitioned among multiple parties (that is, each party holds different features of the same object), the parties may want to learn the intersection of their datasets and start their collaboration only if their common dataset is sufficiently large. If their common dataset is too small, in which case they are not interested in collaboration, it is undesirable to let them learn the intersection. In privacy-preserving ride sharing [HOS17], multiple users only want to share a ride if large parts of their trajectories on a map intersect. In this case, the users may be interested in the intersection of their routes, but only when the intersection is large. This problem can be formalized as threshold private set intersection, where, roughly speaking, the parties only learn the intersection if their sets differ by at most T elements.

Many works [FNP04, HOS17, PSWW18, ZC18, GN19] achieve this functionality by first computing the cardinality of the intersection and then checking if this is sufficiently large. The communication complexity of these protocols scales at least linearly in the size of the smallest input set. Notice that Freedman et al. [FNP04] proved a lower bound of \(\varOmega (m)\) on the communication complexity of any private set intersection protocol, where m is the size of the smallest input set. This lower bound directly extends to protocols that only compute the cardinality of the intersection, which constitutes a fundamental barrier to the efficiency of the above protocols.

Recently, the beautiful work of Ghosh and Simkin [GS19a] revisited the communication complexity of two-party threshold PSI and demonstrated that the \(\varOmega (m)\) lower bound can be circumvented by performing a private intersection cardinality testing (i.e., testing whether the intersection is sufficiently large) instead of computing the actual cardinality. After passing the cardinality testing, their protocol allows each party to learn the set difference, where the communication complexity only grows with T, which could be sublinear in m. Specifically, [GS19a] proved a communication lower bound of \(\varOmega (T)\) for two-party threshold PSI and presented a protocol achieving a matching upper bound O(T) based on fully homomorphic encryption (FHE). They also showed a computationally more efficient protocol with communication complexity of \(\widetilde{O}(T^2)\) based on weaker assumptions, namely additively homomorphic encryption (AHE).

In this work, we investigate the communication complexity of multi-party threshold PSI. In particular, we ask the question of whether sublinear lower and upper bounds can also be achieved in the multi-party setting.

1.1 Our Contributions

We first identify and formalize the definition of multi-party threshold private set intersection. We put forth and study two functionalities that are in fact equivalent in the two-party case but are vastly different in the multi-party scenario. Assume there are n parties \(P_1,P_2,\dots ,P_n\), and each party \(P_i\) holds a private set \(S_i\) of size m. The first functionality allows the parties to learn the intersection \(I=\bigcap _{i=1}^n S_i\) only if \(\forall i, |S_i \setminus I| \le T\), or equivalently, \(|I| \ge m - T\). In the second functionality, the parties can learn the intersection I only if \(|\left( \bigcup _{i=1}^n S_i\right) \setminus I| \le T\).

We briefly discuss the difference between the two functionalities. The first functionality focuses on whether the intersection is sufficiently large, hence we call it \(\mathcal {F}_{{\mathsf {TPSI}\text {-}\mathsf {int}}}\). The second functionality focuses on whether the set difference is sufficiently small, thus we call it \(\mathcal {F}_{\mathsf {TPSI}\text {-}\mathsf {diff}}\). In the two-party case, we have the guarantee that \(|\left( \bigcup _{i=1}^n S_i\right) \setminus I|=2 \cdot |S_i \setminus I|\), so we do not have to differentiate between these two functionalities. However, in the multi-party case, we only know that \(\frac{n}{n-1}\cdot |S_i \setminus I| \le |\left( \bigcup _{i=1}^n S_i\right) \setminus I|\le n \cdot |S_i \setminus I|\) (every element outside I belongs to at least one and at most \(n-1\) of the sets), hence the two functionalities could lead to very different outcomes. Which functionality to choose and what threshold to set in practice highly depend on the actual application.
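To make the gap between the two conditions concrete, the following minimal sketch computes both quantities in the clear for a small three-party instance. The helper name `threshold_quantities` and the sample sets are illustrative assumptions, and nothing here is private; it only shows that the same threshold T can satisfy one condition and violate the other.

```python
from functools import reduce

def threshold_quantities(sets):
    """Return ([|S_i minus I| for each i], |union minus I|) for a list of sets."""
    intersection = reduce(set.intersection, sets)
    union = reduce(set.union, sets)
    per_party = [len(s - intersection) for s in sets]
    return per_party, len(union - intersection)

# Three parties, each holding m = 4 elements (illustrative values only).
sets = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 6}]
per_party, union_diff = threshold_quantities(sets)

T = 2
int_condition = all(d <= T for d in per_party)  # condition checked by F_TPSI-int
diff_condition = union_diff <= T                # condition checked by F_TPSI-diff
print(per_party, union_diff, int_condition, diff_condition)
```

In this instance every party misses the intersection by one element, yet the union exceeds it by three, so with T = 2 only the first condition holds.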

Sublinear Communication. The core contribution of this work is demonstrating sublinear (in the set sizes) communication lower and upper bounds for both functionalities. We summarize our results in Table 1. For the lower bound, we prove that both functionalities require at least \(\varOmega (nT)\) bits of communication. For the upper bound, we present protocols for both functionalities achieving a matching upper bound of O(nT) based on n-out-of-n threshold fully homomorphic encryption (TFHE) [BGG+18]. We also give a computationally more efficient protocol based on weaker assumptions, namely n-out-of-n threshold additively homomorphic encryption (TAHE) [Ben94, Pai99], with communication complexity of \(\widetilde{O}(nT)\) that almost matches the lower bound. All these protocols achieve semi-honest security where up to \((n-1)\) parties could be corrupted.

Table 1. Communication lower and upper bounds for multi-party threshold PSI.

Our Protocols. As summarized in Table 1, we present three protocols for upper bounds, one for \(\mathcal {F}_{{\mathsf {TPSI}\text {-}\mathsf {int}}}\) and two for \(\mathcal {F}_{{\mathsf {TPSI}\text {-}\mathsf {diff}}}\). At a high level, all three protocols compute their functionality in two phases. In the first phase, they perform a multi-party private intersection cardinality testing where the parties jointly decide whether their intersection is sufficiently large. In particular, for \(\mathcal {F}_{{\mathsf {TPSI}\text {-}\mathsf {int}}}\), the cardinality testing, which we call \(\mathcal {F}_{{\mathsf {CTest}\text {-}\mathsf {int}}}\), allows all the parties to learn whether \(|I| \ge (m - T)\). For \(\mathcal {F}_{\mathsf {TPSI}\text {-}\mathsf {diff}}\), the cardinality testing, which we call \(\mathcal {F}_{{\mathsf {CTest}\text {-}\mathsf {diff}}}\), allows all the parties to learn whether \(|\left( \bigcup _{i=1}^n S_i\right) \setminus I| \le T\). The communication complexity of our protocols for \(\mathcal {F}_{{\mathsf {CTest}\text {-}\mathsf {int}}}\) and \(\mathcal {F}_{{\mathsf {CTest}\text {-}\mathsf {diff}}}\) is summarized in Table 2. In particular, for \(\mathcal {F}_{{\mathsf {CTest}\text {-}\mathsf {int}}}\), we present a protocol with communication complexity O(nT) based on TFHE. For \(\mathcal {F}_{{\mathsf {CTest}\text {-}\mathsf {diff}}}\), we show a TFHE-based construction with communication complexity O(nT) and a TAHE-based construction with communication complexity \(\widetilde{O}(nT)\).

Table 2. Communication complexity of our protocols for multi-party private cardinality testing.

If the intersection is sufficiently large, namely it passes the cardinality testing, then the parties start the second phase of our protocols, which allows each party \(P_i\) to learn their set difference \(S_i \setminus I\). We present a single protocol for the second phase, which works for both \(\mathcal {F}_{{\mathsf {TPSI}\text {-}\mathsf {int}}}\) and \(\mathcal {F}_{\mathsf {TPSI}\text {-}\mathsf {diff}}\). The second-phase protocol is based on TAHE and has communication complexity of O(nT). Thus, to construct a protocol for multi-party threshold PSI, we combine the first-phase protocols summarized in Table 2 with the second-phase one described above. Doing so, we achieve the communication upper bounds in Table 1.

This modular design enables our constructions to minimize the use of TFHE as it is not needed in the second phase. Moreover, it allows future work to focus on improving Table 2. In particular, to design a protocol for \(\mathcal {F}_{{\mathsf {TPSI}\text {-}\mathsf {int}}}\) from assumptions weaker than TFHE, future work could focus on building protocols for \(\mathcal {F}_{{\mathsf {CTest}\text {-}\mathsf {int}}}\) and directly plug in our second phase protocol after that.

Communication Topology. All our protocols are designed in the so-called star network topology, where a designated party communicates with every other party. An added benefit of this topology is that not all parties must be online at the same time. Our communication lower bounds are proved in point-to-point fully connected networks, which are a generalization of the star network.

For networks with broadcast channels, we prove another communication lower bound of \(\varOmega (T \log n+ n)\) for \(\mathcal {F}_{{\mathsf {TPSI}\text {-}\mathsf {int}}}\) in the full version and leave further exploration in the broadcast model for future work.

1.2 Other Implications

Two-Party Threshold PSI. Recall that in the two-party case, both functionalities \(\mathcal {F}_{{\mathsf {TPSI}\text {-}\mathsf {int}}}\) and \(\mathcal {F}_{{\mathsf {TPSI}\text {-}\mathsf {diff}}}\) are identical. Ghosh and Simkin [GS19a] built a two-party threshold PSI protocol from AHE with communication complexity \(\widetilde{O}(T^2)\). They left it as an open problem to build a two-party threshold PSI protocol with communication complexity \(\widetilde{O}(T)\) from assumptions weaker than FHE. Observe that for the special case of \(n=2\), we can achieve a two-party threshold PSI protocol with communication complexity \(\widetilde{O}(T)\) from AHE thereby solving this open problem (refer to Sect. 6 and Sect. 7 for more details).

Sublinear Communication PSI. Our multi-party threshold PSI protocols for both \(\mathcal {F}_{{\mathsf {TPSI}\text {-}\mathsf {int}}}\) and \(\mathcal {F}_{\mathsf {TPSI}\text {-}\mathsf {diff}}\) can also be used to achieve multi-party “regular” PSI where the communication complexity only grows with the size of the set difference and is independent of the input set sizes. In particular, if we run a sequence of multi-party threshold PSI protocols on \(T=2^0,2^1,2^2,\dots \) until hitting the smallest \(T=2^k\) where the protocol outputs the intersection, then we can achieve multi-party PSI. The communication complexity of the resulting protocol is a factor \(\log T\) times that of a single instance but still independent of the input set sizes. Therefore, when the intersection is very large, namely the set difference is significantly smaller than the set sizes, this new approach achieves the first multi-party PSI with sublinear (in the set sizes) communication complexity.
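The doubling strategy can be sketched as a simple loop around a threshold PSI oracle. In the sketch below, `ideal_threshold_psi` is a hypothetical, non-private stand-in that evaluates the \(\mathcal {F}_{\mathsf {TPSI}\text {-}\mathsf {diff}}\) condition in the clear; in an actual deployment each call would be one run of a threshold PSI protocol. The function names and sample sets are assumptions made for illustration.

```python
from functools import reduce

def ideal_threshold_psi(sets, T):
    """Non-private stand-in: return the intersection iff |union minus I| <= T."""
    intersection = reduce(set.intersection, sets)
    union = reduce(set.union, sets)
    return intersection if len(union - intersection) <= T else None

def psi_by_doubling(sets, threshold_psi=ideal_threshold_psi):
    """Try T = 1, 2, 4, ... and return (intersection, final T, number of runs)."""
    T, runs = 1, 0
    while True:
        runs += 1
        result = threshold_psi(sets, T)
        if result is not None:
            return result, T, runs
        T *= 2

sets = [{1, 2, 3, 10, 11}, {1, 2, 3, 12, 13}, {1, 2, 3, 14, 15}]
intersection, final_T, runs = psi_by_doubling(sets)
print(intersection, final_T, runs)
```

Here the union exceeds the intersection by six elements, so the runs with T = 1, 2, 4 reveal nothing and the run with T = 8 outputs the intersection.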

Compact MPC. It is an open problem to construct a compact MPC protocol in the plain model where the communication complexity does not grow with the output length of the function. Prior works [HW15, BFK+19] construct compact MPC for general functions in the presence of a trusted setup (CRS, random oracle) from strong computational assumptions such as obfuscation. Our multi-party threshold PSI protocols have communication complexity independent of the output size (the set intersection). To the best of our knowledge, ours are the first compact MPC protocols for any non-trivial function in the plain model. The only prior compact protocol in the plain model we are aware of is the two-party threshold PSI protocol [GS19a].

1.3 Concurrent and Independent Work

Concurrent to our work, a recent update to the full version of the paper by Ghosh and Simkin [GS19b] extends the two-party threshold PSI protocol to the multi-party setting and considers the functionality \(\mathcal {F}_{{\mathsf {TPSI}\text {-}\mathsf {int}}}\). They do not consider the functionality \(\mathcal {F}_{{\mathsf {TPSI}\text {-}\mathsf {diff}}}\) that we additionally consider in our work. For \(\mathcal {F}_{{\mathsf {TPSI}\text {-}\mathsf {int}}}\), [GS19b] also first constructs a TFHE-based protocol for the intersection cardinality testing \(\mathcal {F}_{{\mathsf {CTest}\text {-}\mathsf {int}}}\) with communication complexity O(nT). Then in the second phase for computing the intersection, they use an MPC protocol to compute the evaluations of a random polynomial, where the communication complexity depends on how the MPC is instantiated, which is not discussed.

Another concurrent work by Branco, Döttling, and Pu [BDP21] studies multi-party private intersection cardinality testing with the functionality \(\mathcal {F}_{{\mathsf {CTest}\text {-}\mathsf {int}}}\) and presents a TAHE-based protocol with communication complexity \(\widetilde{O}(nT^2)\), which complements our Table 2. They also do not consider the other functionality \(\mathcal {F}_{{\mathsf {CTest}\text {-}\mathsf {diff}}}\).

1.4 Roadmap

We describe some notations and definitions in Sect. 2, a technical overview in Sect. 3, and the lower bound in Sect. 4. We present the TFHE-based protocols for \(\mathcal {F}_{{\mathsf {CTest}\text {-}\mathsf {int}}}\) and \(\mathcal {F}_{\mathsf {CTest}\text {-}\mathsf {diff}}\) in Sect. 5 and the TAHE-based protocol for \(\mathcal {F}_{\mathsf {CTest}\text {-}\mathsf {diff}}\) in Sect. 6. We present the second-phase protocol to compute the actual intersection in Sect. 7.

2 Preliminaries

In this section, we introduce some notations and define our ideal functionalities. See the full version for the remaining definitions.

2.1 Notations

We use \(\lambda \) to denote the security parameter. By \(\mathsf {poly}(\lambda )\) we denote a polynomial function in \(\lambda \). By \(\mathsf {negl}(\lambda )\) we denote a negligible function, that is, a function f such that \(f(\lambda ) < 1/p(\lambda )\) holds for any polynomial \(p(\cdot )\) and sufficiently large \(\lambda \). We use \([\![{ x }]\!]\) to denote an encryption of x. We use \(\widetilde{O}(x)\) to ignore any \(\mathsf {polylog}\) factor, namely \(\widetilde{O}(x)=O(x \cdot \mathsf {polylog}(x))\).

2.2 Multi-party Threshold Private Set Intersection

Setting. Consider n parties \(P_1,\ldots ,P_n\) with input sets \(S_1,\ldots ,S_n\) respectively. Throughout the paper, we consider all the sets to be of equal size m. We assume that the set elements come from a field \(\mathbb {F}_p\), where p is a \(\varTheta (\lambda )\)-bit prime. Also, throughout the paper, we focus only on the point-to-point network channels. For the lower bounds, we consider a setting where every pair of parties has a point-to-point channel between them. For the upper bounds, we consider a more restrictive model – the star network, where only one central party has a point-to-point channel with every other party and the other parties cannot communicate with each other.

The goal of the parties is to run an MPC protocol \(\varPi \) at the end of which each party learns the intersection I of all the sets if certain conditions hold. In the definition of two-party threshold PSI, both parties \(P_1\) and \(P_2\) learn the intersection I if the size of their set difference is small, namely \(|(S_1\setminus S_2) \cup (S_2 \setminus S_1)| \le 2T\). In the multi-party case, we consider two different functionalities, each of which might be better suited to different applications.

Functionalities. In the first definition, we consider functionality \(\mathcal {F}_{{\mathsf {TPSI}\text {-}\mathsf {int}}}\), in which each party \(P_i\) learns the intersection I if the size of its own set minus the intersection is small, namely \(\left| S_i \setminus I \right| \le T\) for some threshold T. Recall that we consider all the sets to be of equal size, hence either all the parties learn the output or all of them don’t. In the second definition, we consider a functionality \(\mathcal {F}_{{\mathsf {TPSI}\text {-}\mathsf {diff}}}\), where each party learns the intersection I if the size of the union of all the sets minus the intersection is small, namely \(\left| \left( \bigcup _{i=1}^n S_i\right) \setminus I \right| \le T\). The formal definitions of the two ideal functionalities are shown in Fig. 1 and Fig. 2.

Fig. 1. Ideal functionality \(\mathcal {F}_{\mathsf {TPSI}\text {-}\mathsf {int}}\) for multi-party threshold PSI.

Fig. 2. Ideal functionality \(\mathcal {F}_{{\mathsf {TPSI}\text {-}\mathsf {diff}}}\) for multi-party threshold PSI.

2.3 Multi-party Private Intersection Cardinality Testing

An important building block in our multi-party threshold PSI protocols is a multi-party protocol for private intersection cardinality testing which we define below. Consider n parties \(P_1,\ldots ,P_n\) with input sets \(S_1,\ldots ,S_n\) respectively of equal size m. Their goal is to run an MPC protocol \(\varPi \) at the end of which each party learns whether the size of the intersection I of all the sets is sufficiently large. As before, we consider two functionalities. In the first functionality \(\mathcal {F}_{{\mathsf {CTest}\text {-}\mathsf {int}}}\), each party \(P_i\) learns whether \(\left| S_i \setminus I \right| \le T\). In the second functionality \(\mathcal {F}_{{\mathsf {CTest}\text {-}\mathsf {diff}}}\), each party learns whether \(\left| \left( \bigcup _{i=1}^n S_i\right) \setminus I \right| \le T\). The formal definitions of the two ideal functionalities are presented in Fig. 3 and Fig. 4.

Fig. 3. Ideal functionality \(\mathcal {F}_{{\mathsf {CTest}\text {-}\mathsf {int}}}\) for multi-party intersection cardinality test.

Fig. 4. Ideal functionality \(\mathcal {F}_{{\mathsf {CTest}\text {-}\mathsf {diff}}}\) for multi-party intersection cardinality test.

3 Technical Overview

We now give an overview of the techniques used in our work. We denote \(P_1\) as the designated party that can communicate with all the other parties.

3.1 TFHE-Based Protocol for \(\mathcal {F}_{{\mathsf {CTest}\text {-}\mathsf {int}}}\)

In Sect. 5.1 we construct a protocol for \(\mathcal {F}_{{\mathsf {CTest}\text {-}\mathsf {int}}}\) from TFHE. Our starting point is the two-party protocol of [GS19a]. Recall that there are two parties Alice and Bob with sets \(S_A=\{a_1,\dots , a_m\}\) and \(S_B=\{b_1,\dots , b_m\}\) respectively. These sets define two polynomials \(\mathsf {p}_A(\text {x}):=\prod _{i=1}^m (\text {x}-a_i)\) and \(\mathsf {p}_B(\text {x}):=\prod _{i=1}^m (\text {x}-b_i)\). Let \(I := S_A \cap S_B\) be the intersection. A key observation in [MTZ03, GS19a] is that \(\mathsf {p}(\text {x}) := \frac{\mathsf {p}_B(\text {x})}{\mathsf {p}_A(\text {x})} = \frac{\mathsf {p}_{B\setminus I}(\text {x})}{\mathsf {p}_{A\setminus I}(\text {x})}.\) Both the numerator and denominator of \(\mathsf {p}\) have degree \(m-|I|\). If \(m-|I|=|S_A \setminus I|\le T\), then \(\mathsf {p}(\text {x})\) has degree at most 2T and can be recovered from \(2T+1\) evaluations by rational function interpolation. Given \(\mathsf {p}(\text {x})\), the elements in \(S_A \setminus I\) are simply the roots of the polynomial in the denominator.
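The cancellation behind this observation can be checked numerically. The sketch below works over the rationals (via Python's `fractions.Fraction`) rather than over \(\mathbb {F}_p\), purely for readability; the sets, the evaluation points, and the helper name `eval_roots_poly` are illustrative assumptions. It only verifies that the two rational functions agree on \(2T+1\) points and does not perform the interpolation itself.

```python
from fractions import Fraction
from math import prod

def eval_roots_poly(roots, x):
    """Evaluate the monic polynomial prod_{r in roots} (x - r) at the point x."""
    return prod(x - r for r in roots)

S_A = {1, 2, 3, 4, 5}
S_B = {1, 2, 3, 6, 7}
I = S_A & S_B                      # common roots cancel in p_B / p_A

T = 2                              # here |S_A minus I| = 2 <= T
points = [10, 11, 12, 13, 14]      # 2T + 1 public points, none of them in S_A

full = [Fraction(eval_roots_poly(S_B, x), eval_roots_poly(S_A, x)) for x in points]
reduced = [Fraction(eval_roots_poly(S_B - I, x), eval_roots_poly(S_A - I, x))
           for x in points]
print(full == reduced)
```

Since the reduced numerator and denominator each have degree \(|S_A\setminus I| = 2\), the five evaluations are enough to determine the rational function.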

Two-Party Protocol. At a high level, the two-party protocol [GS19a] works as follows. First, Alice and Bob evaluate their own polynomials on \(2T+1\) publicly known distinct points \(\{\alpha _1,\dots , \alpha _{2T+1}\}\) to obtain \(\{\mathsf {p}_A(\alpha _1),\dots ,\mathsf {p}_A(\alpha _{2T+1})\}\) and \(\{\mathsf {p}_B(\alpha _1),\dots ,\mathsf {p}_B(\alpha _{2T+1})\}\), respectively. Then, Alice generates a public-secret key pair for FHE and sends Bob the FHE public key, encrypted evaluations \(\{[\![{ \mathsf {p}_A(\alpha _1) }]\!],\dots ,[\![{ \mathsf {p}_A(\alpha _{2T+1}) }]\!]\}\), a uniformly random z and encrypted evaluation \([\![{ \mathsf {p}_A(z) }]\!]\). Bob can homomorphically interpolate the rational function \([\![{ \mathsf {p}(\text {x}) }]\!]\) from \(\{[\![{ \mathsf {p}_A(\alpha _1) }]\!],\dots ,[\![{ \mathsf {p}_A(\alpha _{2T+1}) }]\!]\}\) and \(\{\mathsf {p}_B(\alpha _1),\dots ,\mathsf {p}_B(\alpha _{2T+1})\}\), and then homomorphically compute \([\![{ \mathsf {p}(z) }]\!]\). Bob can also compute \(\mathsf {p}_B(z)\) and homomorphically compute \(\frac{\mathsf {p}_B(z)}{[\![{ \mathsf {p}_A(z) }]\!]}\). We know that \(\mathsf {p}(z) = \frac{\mathsf {p}_B(z)}{{\mathsf {p}_A(z)}}\) if and only if the degree of \(\mathsf {p}(\text {x})\) is \(\le 2T\). Therefore Bob homomorphically computes an encryption of the predicate \([\![{ b }]\!] := \left( [\![{ \mathsf {p}(z) }]\!]\overset{?}{=} \frac{\mathsf {p}_B(z)}{[\![{ \mathsf {p}_A(z) }]\!]}\right) \) and sends the encryption \([\![{ b }]\!]\) back to Alice. Finally Alice decrypts and learns b.

Multi-party Protocol. For n parties, a natural idea is to consider

$$\begin{aligned} \mathsf {p}(\text {x}) := \frac{\mathsf {p}_2(\text {x})+\dots +\mathsf {p}_n(\text {x})}{\mathsf {p}_1(\text {x})} = \frac{\mathsf {p}_{2\setminus I}(\text {x})+\dots +\mathsf {p}_{n\setminus I}(\text {x})}{\mathsf {p}_{1\setminus I}(\text {x})}, \end{aligned} \tag{1}$$

where \(\mathsf {p}_i(\text {x})\) encodes the set \(S_i=\{a^i_{1},\dots ,a^i_m\}\) as \(\mathsf {p}_i(\text {x}):= \prod _{j=1}^m (\text {x}-a^i_{j})\). The n parties first jointly generate the TFHE keys. Each party \(P_i\) sends encrypted evaluations \(\{[\![{ \mathsf {p}_i(\alpha _1) }]\!],\dots , [\![{ \mathsf {p}_i(\alpha _{2T+1}) }]\!],\) \([\![{ \mathsf {p}_i(z) }]\!]\}\) to \(P_1\). Now \(P_1\) can interpolate \([\![{ \mathsf {p}(\text {x}) }]\!]\) from \(2T+1\) evaluations and compute an encryption \([\![{ b }]\!] := \left( [\![{ \mathsf {p}(z) }]\!]\overset{?}{=} \frac{[\![{ \mathsf {p}_2(z) }]\!]+\dots +[\![{ \mathsf {p}_n(z) }]\!]}{\mathsf {p}_1(z)}\right) \). Finally the parties jointly decrypt \([\![{ b }]\!]\).

Unexpected Degree Reduction. This seemingly correct protocol has a subtle issue. Intuitively, we want to argue that \(\mathsf {p}(\text {x})\) in Eq. 1 has degree \(\le 2T\) if and only if \(|S_1 \setminus I|\le T\). However, this is not true because elements not in the intersection might be accidentally canceled out, which results in a lower degree than the intersection cardinality would imply. As a concrete example, consider three sets with distinct elements \(S_1 = \{a\}\), \(S_2 = \{b\}\), \(S_3 = \{c\}\), where \(b+c=2\cdot a\). The intersection \(I=\emptyset \). Ideally we hope the rational polynomial \(\mathsf {p}(\text {x})\) has degree 1 in both the numerator and denominator because \(|S_1\setminus I|=1\). However,

$$\begin{aligned} \mathsf {p}(\text {x}) = \frac{(\text {x}-b)+(\text {x}-c)}{\text {x}-a} = \frac{2\text {x}-(b+c)}{\text {x}-a}=\frac{2\text {x}-2a}{\text {x}-a}=2. \end{aligned}$$
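The collapse to a constant is easy to confirm numerically. The following sketch instantiates the example with the illustrative values a = 5, b = 2, c = 8 (any distinct values with b + c = 2a would do) and evaluates the rational function over the rationals at several points.

```python
from fractions import Fraction

a, b, c = 5, 2, 8  # distinct elements with b + c == 2 * a (illustrative values)

def p(x):
    """p(x) = ((x - b) + (x - c)) / (x - a) from the three-party example."""
    return Fraction((x - b) + (x - c), x - a)

values = [p(x) for x in range(10, 20)]
print(values)
```

Every evaluation equals 2, so the interpolated function has degree 0 even though \(|S_1\setminus I| = 1\).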

Randomness to the Rescue. On first thought, this approach seems fundamentally flawed as additional roots can always be created if we add polynomials in the numerator. To solve this problem, we add a random multiplicative term \((\text {x}-r_i)\) to each polynomial \(\mathsf {p}_i\) and set a new polynomial \(\mathsf {p}'_i(\text {x}):= \mathsf {p}_i(\text {x}) \cdot (\text {x}-r_i)\) for a random \(r_i\) chosen by party \(P_i\). Now, consider the rational polynomial

$$\begin{aligned} \mathsf {p}'(\text {x}) := \frac{\mathsf {p}'_2(\text {x})+\dots +\mathsf {p}'_n(\text {x})}{\mathsf {p}'_1(\text {x})} = \frac{\mathsf {p}'_{2\setminus I}(\text {x})+\dots +\mathsf {p}'_{n\setminus I}(\text {x})}{\mathsf {p}'_{1\setminus I}(\text {x})}. \end{aligned}$$

At a high level, the terms \((\text {x}-r_i)\) will randomize the roots of the numerator sufficiently to ensure that these roots are unlikely to coincide with the roots of the denominator.
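For the three-set example above, the effect of the random factors can be worked out directly: the factor \((\text {x}-a)\) cancels only if the new numerator \((\text {x}-b)(\text {x}-r_2)+(\text {x}-c)(\text {x}-r_3)\) vanishes at \(\text {x}=a\), and with \(b+c=2a\) this value equals \((c-a)(r_3-r_2)\). The sketch below checks this identity modulo a prime; the modulus, the concrete values, and the helper name `numerator_at_a` are assumptions made for illustration, and the sketch covers only this one example rather than the general argument.

```python
import random

P = 2**61 - 1  # prime modulus standing in for the field F_p (assumption)

def numerator_at_a(a, b, c, r2, r3, p=P):
    """Value at x = a of (x - b)(x - r2) + (x - c)(x - r3), reduced mod p."""
    return ((a - b) * (a - r2) + (a - c) * (a - r3)) % p

a, b, c = 5, 2, 8  # the colliding example: b + c == 2 * a
rng = random.Random(0)
for _ in range(5):
    r2, r3 = rng.randrange(P), rng.randrange(P)
    # With b + c = 2a the value collapses to (c - a) * (r3 - r2) mod p.
    assert numerator_at_a(a, b, c, r2, r3) == (c - a) * (r3 - r2) % P

same = numerator_at_a(a, b, c, 11, 11)       # r2 == r3: (x - a) cancels again
different = numerator_at_a(a, b, c, 11, 17)  # r2 != r3: no cancellation at x = a
print(same, different)
```

In this example the unwanted cancellation therefore requires \(r_2 = r_3\), which happens with probability 1/p when the two values are drawn independently and uniformly from \(\mathbb {F}_p\).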

3.2 TFHE-Based Protocol for \(\mathcal {F}_{{\mathsf {CTest}\text {-}\mathsf {diff}}}\)

In Sect. 5.2 we present a TFHE-based protocol for \(\mathcal {F}_{{\mathsf {CTest}\text {-}\mathsf {diff}}}\). In summary, party \(P_1\) tries to homomorphically interpolate

$$ \widetilde{\mathsf {p}}_i(\text {x}) = \frac{\mathsf {p}_i(\text {x})}{\mathsf {p}_1(\text {x})}=\frac{\mathsf {p}_{i\setminus 1}(\text {x})}{\mathsf {p}_{1\setminus i}(\text {x})} $$

from \((2T+1)\) evaluations and computes encrypted \(D_{1,i} = S_1\setminus S_i\) as well as \(D_{i,1} = S_i\setminus S_1\) for every other party \(P_i\). Note that if \(|\left( \bigcup _{i=1}^n S_i\right) \setminus I| \le T\), then \(|S_i\setminus I| \le T\) for all i and the degree of each \(\widetilde{\mathsf {p}}_i(\text {x})\) is at most 2T, hence \(P_1\) can interpolate it using \((2T+1)\) evaluations. Observe that \(\left( \bigcup _{i=1}^n S_i\right) \setminus I = \bigcup _{i=2}^n \left( D_{1,i}\cup D_{i,1}\right) \), because each element \(a \in \left( \bigcup _{i=1}^n S_i\right) \setminus I\) falls into one of two cases: (1) \(a\in S_1\) and \(a\notin S_i\) for some i (i.e., \(a\in D_{1,i}\)), or (2) \(a \notin S_1\) and \(a\in S_i\) for some i (i.e., \(a\in D_{i,1}\)). Therefore, party \(P_1\) can homomorphically compute an encryption of \(\left( \bigcup _{i=1}^n S_i\right) \setminus I\) and an encryption of the predicate \(b = \left( \left| \left( \bigcup _{i=1}^n S_i\right) \setminus I\right| \overset{?}{\le } T\right) \). Finally, as before, the n parties jointly decrypt \([\![{ b }]\!]\) to learn the output.
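The set identity used here can be checked in the clear. The sketch below compares the union minus the intersection with the union of the pairwise differences against \(S_1\), on one hand-picked instance and on a few random ones. The function names are illustrative assumptions, and the real protocol performs the corresponding computation under encryption.

```python
import random
from functools import reduce

def union_minus_intersection(sets):
    """Compute (S_1 u ... u S_n) minus (S_1 n ... n S_n) directly."""
    return reduce(set.union, sets) - reduce(set.intersection, sets)

def via_pairwise_differences(sets):
    """Union over i >= 2 of D_{1,i} and D_{i,1}, where D_{x,y} = S_x minus S_y."""
    s1 = sets[0]
    result = set()
    for si in sets[1:]:
        result |= (s1 - si) | (si - s1)
    return result

sets = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 4, 6}]
direct = union_minus_intersection(sets)
pairwise = via_pairwise_differences(sets)

rng = random.Random(1)
random_instances = [[set(rng.sample(range(12), 6)) for _ in range(4)]
                    for _ in range(20)]
all_agree = all(union_minus_intersection(s) == via_pairwise_differences(s)
                for s in random_instances)
print(direct, pairwise, all_agree)
```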

3.3 TAHE-Based Protocol for \(\mathcal {F}_{{\mathsf {CTest}\text {-}\mathsf {diff}}}\)

Section 6 presents our protocol for \(\mathcal {F}_{{\mathsf {CTest}\text {-}\mathsf {diff}}}\) based on TAHE. This protocol reduces the communication complexity in the two-party case from \(\widetilde{O}(T^2)\) to \(\widetilde{O}(T)\) and generalizes to the multi-party case with communication \(\widetilde{O}(nT)\).

Two-Party Protocol. For two parties Alice and Bob with private sets \(S_A\) and \(S_B\), if we encode their elements into two polynomials \(\mathsf {p}_A(\text {x})=\sum _{i=1}^m \text {x}^{a_i}\) and \(\mathsf {p}_B(\text {x})=\sum _{i=1}^m \text {x}^{b_i}\), then the number of monomials in the polynomial \(\mathsf {p}(\text {x}) := \mathsf {p}_A(\text {x}) - \mathsf {p}_B(\text {x})\) is exactly \(|(S_A \setminus S_B) \cup (S_B \setminus S_A)|\). Now the problem of cardinality testing (i.e., determining if \(|(S_A \setminus S_B) \cup (S_B \setminus S_A)| \le 2T\)) has been reduced to determining whether the number of monomials in \(\mathsf {p}(\text {x})\) is \(\le 2T\). Using the polynomial sparsity test of Grigorescu et al. [GJR10], we can further reduce the problem to determining whether the Hankel matrix below is singular or not:

$$ H= \left[ \begin{array}{cccc} \mathsf {p}(u^0) & \mathsf {p}(u^1) & \dots & \mathsf {p}(u^{2T})\\ \mathsf {p}(u^1) & \mathsf {p}(u^2) & \dots & \mathsf {p}(u^{2T+1})\\ \vdots & \vdots & \ddots & \vdots \\ \mathsf {p}(u^{2T}) & \mathsf {p}(u^{2T+1}) & \dots & \mathsf {p}(u^{4T})\\ \end{array} \right], $$

where u is chosen uniformly at random. In the two-party protocol, Alice generates a public-secret key pair for AHE and sends Bob the public key, a uniformly random u along with the encrypted Hankel matrix for \(\mathsf {p}_A\). Then Bob can homomorphically compute the encrypted Hankel matrix for \(\mathsf {p}\). Now Alice holds the secret key and Bob holds an encryption of matrix H. They need to jointly perform a secure matrix singularity test to determine if the matrix is singular, which can be done using the protocol of Kiltz et al. [KMWF07] with communication \(\widetilde{O}(T^2)\).
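The reduction from set difference to Hankel singularity can be illustrated in the clear. The sketch below builds the sparse polynomial \(\mathsf {p}_A - \mathsf {p}_B\), forms the \((2T+1)\times (2T+1)\) Hankel matrix over a prime field, and tests singularity with plain Gaussian elimination. The modulus, the fixed value u = 2 (chosen for reproducibility, whereas the protocol draws u at random), and all function names are assumptions of this sketch; no encryption is involved, and the elimination is a textbook method rather than the secure test of [KMWF07].

```python
P = 2**61 - 1  # a Mersenne prime standing in for the field F_p (assumption)

def difference_polynomial(S_A, S_B):
    """Sparse p_A - p_B as {exponent: coefficient}; common elements cancel."""
    coeffs = {}
    for a in S_A:
        coeffs[a] = coeffs.get(a, 0) + 1
    for b in S_B:
        coeffs[b] = coeffs.get(b, 0) - 1
    return {e: c for e, c in coeffs.items() if c != 0}

def hankel_matrix(coeffs, T, u, p=P):
    """(2T+1) x (2T+1) matrix whose (i, j) entry is p(u^(i+j)) mod p."""
    def evaluate(x):
        return sum(c * pow(x, e, p) for e, c in coeffs.items()) % p
    evals = [evaluate(pow(u, k, p)) for k in range(4 * T + 1)]
    size = 2 * T + 1
    return [[evals[i + j] for j in range(size)] for i in range(size)]

def is_singular(matrix, p=P):
    """Gaussian elimination over F_p; True iff some column has no pivot."""
    m = [row[:] for row in matrix]
    n = len(m)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return True
        m[col], m[pivot] = m[pivot], m[col]
        inv = pow(m[col][col], -1, p)
        for r in range(col + 1, n):
            factor = m[r][col] * inv % p
            for k in range(col, n):
                m[r][k] = (m[r][k] - factor * m[col][k]) % p
    return False

T, u = 1, 2
close = difference_polynomial({1, 2}, {1, 3})  # 2 monomials, within 2T
far = difference_polynomial({1, 2}, {3, 4})    # 4 monomials, exceeds 2T
close_singular = is_singular(hankel_matrix(close, T, u))
far_singular = is_singular(hankel_matrix(far, T, u))
print(len(close), close_singular, len(far), far_singular)
```

With T = 1 the matrix is 3 × 3: a polynomial with at most two monomials always yields a singular matrix, while the four-monomial polynomial in this instance yields determinant 483840, which is nonzero modulo the chosen prime.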

Our Approach. Our key observation is that the protocol of Kiltz et al. [KMWF07] can be used to perform singularity testing for arbitrary matrices, while we are only interested in testing the singularity of Hankel matrices. Since a Hankel matrix has only a linear (in its dimension) number of distinct entries, there is a more efficient way to test its singularity. In particular, the work of Brent et al. [BGY80] demonstrates an elegant connection between the problem of testing singularity of a Hankel matrix and the so-called “half-GCD” problem, which can be solved in quasi-linear time. Thus, testing singularity of the Hankel matrix H only takes \(\widetilde{O}(T)\) computation. In our scenario, we can first let Alice and Bob learn an additive share of H, and then engage in a two-party computation (using AHE or Yao’s garbled circuits) to jointly test if H is singular or not. The important point to note here is that both communication and computation are only quasi-linear in the dimension of H. This is already an improvement over the quadratic cost of the protocol in [KMWF07] and solves the open problem posed by Ghosh and Simkin [GS19a].

Multi-party Protocol. In designing a multi-party protocol, our strategy is to first find a polynomial where the number of monomials equals the size of the set difference \(\left| \left( \bigcup _{i=1}^n S_i\right) \setminus I\right| \). Furthermore, the polynomial should only involve linear operations among the parties, which allows the parties to obtain additive secret shares of the Hankel matrix for the polynomial. Then, the parties perform an MPC protocol to test singularity of the Hankel matrix.

3.4 Computing Set Intersection

In Sect. 7 we present a single construction that computes the concrete set intersection for both \(\mathcal {F}_{\mathsf {TPSI}\text {-}\mathsf {int}}\) and \(\mathcal {F}_{\mathsf {TPSI}\text {-}\mathsf {diff}}\) after the cardinality testing.

Two-Party Protocol. For two parties Alice and Bob, we use the first encoding method to encode the elements into two polynomials \(\mathsf {p}_A(\text {x})=\prod _{i=1}^m (\text {x}-a_i)\) and \(\mathsf {p}_B(\text {x})=\prod _{i=1}^m (\text {x}-b_i)\). After the cardinality testing, we already know that the rational polynomial \(\mathsf {p}(\text {x}) := \frac{\mathsf {p}_B(\text {x})}{\mathsf {p}_A(\text {x})} = \frac{\mathsf {p}_{B\setminus I}(\text {x})}{\mathsf {p}_{A\setminus I}(\text {x})}\) has degree at most 2T. If Alice learns the evaluation of \(\mathsf {p}_B(\cdot )\) on \(2T+1\) distinct points \(\{\alpha _1,\dots , \alpha _{2T+1}\}\), then she can evaluate \(\mathsf {p}_A\) on those points by herself and compute \(\{\mathsf {p}(\alpha _1),\dots , \mathsf {p}(\alpha _{2T+1})\}\). Using these evaluations of \(\mathsf {p}(\cdot )\), Alice can recover \(\mathsf {p}(\text {x})\) by rational polynomial interpolation, and then learn the set difference \(S_A \setminus I\) from the denominator of \(\mathsf {p}(\text {x})\). However, \(\mathsf {p}(\text {x})\) also allows Alice to learn \(S_B\setminus I\), which breaks security. Instead of letting Alice learn the evaluations of \(\mathsf {p}_B(\cdot )\), the two-party protocol of [GS19a] enables Alice to learn the evaluations of a “noisy” polynomial \(\mathsf {V}(\text {x}) := \mathsf {p}_A(\text {x})\cdot \mathsf {R}_1(\text {x}) + \mathsf {p}_B(\text {x}) \cdot \mathsf {R}_2(\text {x})\), where \(\mathsf {R}_1\) and \(\mathsf {R}_2\) are uniformly random polynomials of degree T. Note that

$$\begin{aligned} \mathsf {p}'(\text {x}) :=\frac{\mathsf {V}(\text {x})}{\mathsf {p}_A(\text {x})} = \frac{\mathsf {p}_{A\setminus I}(\text {x})\cdot \mathsf {R}_1(\text {x}) + \mathsf {p}_{B\setminus I}(\text {x}) \cdot \mathsf {R}_2(\text {x})}{\mathsf {p}_{A\setminus I}(\text {x})} \end{aligned}$$

has degree at most 3T. Given \(3T+1\) evaluations of \(\mathsf {V}(\cdot )\), Alice can interpolate \(\mathsf {p}'(\text {x})\) and figure out the denominator, but now the numerator is sufficiently random and does not leak any other information about \(S_B\).
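The recovery step described above can be sketched in plain Python, with no encryption at all. This is a minimal illustration under assumptions made here and not in the original: the field is \(\mathbb {F}_P\) for the prime \(P=2^{61}-1\), the sets differ in exactly T elements, the evaluation points avoid Alice's elements, and the helper names (`kernel_vector`, `recover_difference`) are hypothetical. Rational interpolation is done by solving the linear system \(N(\alpha ) - \mathsf {p}'(\alpha )\cdot D(\alpha ) = 0\) for the coefficients of a numerator N of degree at most 2T and a denominator D of degree at most T.

```python
import random

P = 2**61 - 1  # prime modulus standing in for the field F (assumption)

def poly_from_roots(roots):
    """Coefficients, lowest degree first, of prod (x - r) over F_P."""
    coeffs = [1]
    for r in roots:
        nxt = [0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] = (nxt[k] - r * c) % P
            nxt[k + 1] = (nxt[k + 1] + c) % P
        coeffs = nxt
    return coeffs

def poly_eval(coeffs, x):
    """Horner evaluation over F_P."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

def kernel_vector(rows):
    """Return a nonzero v with rows * v = 0 over F_P (needs more columns than rows)."""
    m = [row[:] for row in rows]
    ncols = len(m[0])
    pivots = []                      # pivots[i] is the pivot column of row i
    for col in range(ncols):
        top = len(pivots)
        hit = next((r for r in range(top, len(m)) if m[r][col]), None)
        if hit is None:
            continue                 # no pivot available: col stays a free column
        m[top], m[hit] = m[hit], m[top]
        inv = pow(m[top][col], -1, P)
        m[top] = [c * inv % P for c in m[top]]
        for r in range(len(m)):
            if r != top and m[r][col]:
                f = m[r][col]
                m[r] = [(x - f * y) % P for x, y in zip(m[r], m[top])]
        pivots.append(col)
    free = next(c for c in range(ncols) if c not in pivots)
    v = [0] * ncols
    v[free] = 1                      # set one free variable, solve for the pivots
    for i, col in enumerate(pivots):
        v[col] = (-m[i][free]) % P
    return v

def recover_difference(S_A, points, v_values, T):
    """Alice's side: interpolate p' = V / p_A and read S_A minus I off the denominator."""
    p_A = poly_from_roots(S_A)
    rows = []
    for a, v in zip(points, v_values):
        y = v * pow(poly_eval(p_A, a), -1, P) % P        # p'(a) = V(a) / p_A(a)
        pw = [pow(a, k, P) for k in range(2 * T + 1)]
        # unknowns: numerator (degree <= 2T) followed by denominator (degree <= T)
        rows.append(pw + [(-y * c) % P for c in pw[:T + 1]])
    den = kernel_vector(rows)[2 * T + 1:]
    return sorted(s for s in S_A if poly_eval(den, s) == 0)

# Toy run: T = 2, the sets share {1, 2, 3}.
T = 2
S_A, S_B = [1, 2, 3, 4, 5], [1, 2, 3, 8, 9]
rng = random.Random(7)
R1 = [rng.randrange(P) for _ in range(T + 1)]            # random degree-T masks
R2 = [rng.randrange(P) for _ in range(T + 1)]
p_A, p_B = poly_from_roots(S_A), poly_from_roots(S_B)
points = list(range(100, 100 + 3 * T + 1))               # 3T + 1 points outside S_A
v_values = [(poly_eval(p_A, a) * poly_eval(R1, a)
             + poly_eval(p_B, a) * poly_eval(R2, a)) % P for a in points]
print(recover_difference(S_A, points, v_values, T))
```

In this toy run Alice recovers her own difference set \(\{4,5\}\) from the denominator, while the numerator she obtains is masked by \(\mathsf {R}_1\) and \(\mathsf {R}_2\). The three-argument `pow` with exponent -1 requires Python 3.8 or later.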

Multi-party Protocol. For n parties, we first encode each set \(S_i=\{a^i_{1},\dots ,a^i_m\}\) as a polynomial \(\mathsf {p}_i(\text {x}):= \prod _{j=1}^m (\text {x}-a^i_{j})\), and then define

$$\begin{aligned} \mathsf {V}(\text {x}) :=&\, \mathsf {p}_1(\text {x})\cdot \mathsf {R}_1(\text {x}) + \dots + \mathsf {p}_n(\text {x}) \cdot \mathsf {R}_n(\text {x})\\ :=&\, \mathsf {p}_1(\text {x})\cdot \left( \mathsf {R}_{1,1}(\text {x}) + \dots +\mathsf {R}_{n,1}(\text {x})\right) + \dots + \mathsf {p}_n(\text {x}) \cdot \left( \mathsf {R}_{1,n}(\text {x}) +\dots + \mathsf {R}_{n,n}(\text {x})\right) , \end{aligned}$$

where \(\left( \mathsf {R}_{i,1},\dots ,\mathsf {R}_{i,n}\right) \) are random polynomials of degree T generated by party \(P_i\). Different from the two-party protocol, it is crucial that each party \(P_i\) contributes a random term in every polynomial \(\mathsf {R}_1,\dots , \mathsf {R}_n\). For both functionalities \(\mathcal {F}_{\mathsf {TPSI}\text {-}\mathsf {int}}\) and \(\mathcal {F}_{\mathsf {TPSI}\text {-}\mathsf {diff}}\), if the protocol passes the cardinality testing, then

$$\begin{aligned} \mathsf {p}'(\text {x}) := \frac{\mathsf {V}(\text {x})}{\mathsf {p}_1(\text {x})} = \frac{\mathsf {p}_{1\setminus I}(\text {x})\cdot \mathsf {R}_1(\text {x}) + \dots + \mathsf {p}_{n\setminus I}(\text {x}) \cdot \mathsf {R}_n(\text {x})}{\mathsf {p}_{1\setminus I}(\text {x})} \end{aligned}$$

has degree at most 3T. If \(P_1\) learns \(3T+1\) evaluations of \(\mathsf {V}(\cdot )\), then it can interpolate \(\mathsf {p}'(\text {x})\) and recover \(S_1\setminus I\) from the denominator while the numerator does not leak any other information. Since \(\mathsf {V}(\cdot )\) can be broken down to linear operations among the parties, it can be securely evaluated by TAHE.
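To see why \(\mathsf {V}(\cdot )\) only needs linear operations, the following sketch regroups one evaluation \(\mathsf {V}(\alpha )\) by contributing party. It is a plaintext stand-in and not the TAHE protocol itself: the modulus, the toy sets, and the names `R`, `contrib`, `V_def`, and `V_lin` are assumptions made for illustration. Each party \(P_j\) only multiplies the values \(\mathsf {p}_i(\alpha )\) by scalars it knows, namely \(\mathsf {R}_{j,i}(\alpha )\), and adds the results.

```python
import random

P = 2**61 - 1                      # prime modulus standing in for F (assumption)
rng = random.Random(1)

def eval_poly(coeffs, x):
    """Horner evaluation over F_P, coefficients lowest degree first."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

def eval_set_poly(S, x):
    """Evaluate p_i(x) = prod_{a in S} (x - a) over F_P."""
    acc = 1
    for a in S:
        acc = acc * (x - a) % P
    return acc

sets = [[1, 2, 3, 4], [1, 2, 3, 9], [1, 2, 3, 7]]
n, T = len(sets), 1
# R[j][i] is the degree-T polynomial that party P_j contributes to mask p_i.
R = [[[rng.randrange(P) for _ in range(T + 1)] for _ in range(n)] for _ in range(n)]

alpha = 12345
p_at = [eval_set_poly(S, alpha) for S in sets]     # p_i(alpha), known to P_i

# Definition: V(alpha) = sum_i p_i(alpha) * (R_{1,i}(alpha) + ... + R_{n,i}(alpha)).
V_def = sum(p_at[i] * sum(eval_poly(R[j][i], alpha) for j in range(n))
            for i in range(n)) % P

# Regrouped: party P_j computes sum_i p_i(alpha) * R_{j,i}(alpha), using only
# scalar multiplications and additions on the (encrypted) values p_i(alpha).
contrib = [sum(p_at[i] * eval_poly(R[j][i], alpha) for i in range(n)) % P
           for j in range(n)]
V_lin = sum(contrib) % P

print(V_def == V_lin)
```

In this grouping each of the n parties touches all n values \(\mathsf {p}_i(\alpha )\), which is one way to read the quadratic per-evaluation cost discussed next.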

Communication Blow-Up. However, this protocol requires \(O(n^2)\) communication complexity per evaluation, and the total communication complexity is \(O(n^2T)\) for \((3T+1)\) evaluations. Observe that the bottleneck of the communication in this approach is that every party \(P_i\) needs to contribute n randomizing polynomials \(\left( \mathsf {R}_{i,1},\dots ,\mathsf {R}_{i,n}\right) \). Through a careful analysis we demonstrate that it is sufficient for each party to only contribute two randomizing polynomials. The first is used to randomize their own polynomial while the second randomizes the polynomials from the other parties. Nevertheless, there is a subtle issue of unexpected degree reduction, similar to what we have seen in the TFHE-based protocol for \(\mathcal {F}_{\mathsf {CTest}\text {-}\mathsf {int}}\). We follow the same approach as in the TFHE-based protocol by adding additional randomness to the polynomial, which reduces the communication complexity to O(nT).

3.5 Lower Bounds

We briefly discuss the communication lower bound for multi-party threshold PSI. To prove a lower bound in the point-to-point network, we perform a reduction from two-party threshold PSI (for which [GS19a] showed a lower bound of \(\varOmega (T)\)) to multi-party threshold PSI. We first prove that the total “communication complexity of any party”, which denotes the sum of all the bits exchanged by that party (both sent and received), is \(\varOmega (T)\). As a corollary, the total communication complexity of any multi-party threshold PSI protocol is \(\varOmega (nT)\). We refer to Sect. 4 for more details about the reduction.

To prove a lower bound in the broadcast model, we rely on the communication lower bound of the multi-party set disjointness problem shown by Braverman and Oshman [BO15]. We reduce the problem of multi-party set disjointness to multi-party threshold PSI \(\mathcal {F}_{\mathsf {TPSI}\text {-}\mathsf {int}}\) and prove a lower bound \(\varOmega (T \log n + n)\) for any multi-party threshold PSI protocol in the broadcast network. We refer to the full version for more details about the reduction.

4 Communication Lower Bound

In this section, we prove communication lower bounds for multi-party threshold PSI protocols in the point-to-point network model. Recall that we consider all parties to have sets of the same size m. We show that any secure protocol must have communication complexity at least \(\varOmega (n\cdot T)\) for both functionalities \(\mathcal {F_{{\mathsf {TPSI}\text {-}\mathsf {int}}}}\) and \(\mathcal {F_{{\mathsf {TPSI}\text {-}\mathsf {diff}}}}\). We prove the lower bound for \(\mathcal {F_{{\mathsf {TPSI}\text {-}\mathsf {int}}}}\) and defer the proof for \(\mathcal {F_{{\mathsf {TPSI}\text {-}\mathsf {diff}}}}\) to the full version. Before proving the lower bound, we first prove another related theorem below.

Theorem 1

For any multi-party threshold PSI protocol for functionality \(\mathcal {F_{{\mathsf {TPSI}\text {-}\mathsf {int}}}}\) that is secure against a semi-honest adversary that can corrupt up to \((n-1)\) parties, for every party \(P_i\), the communication complexity of \(P_i\) is \(\varOmega (T)\).

Proof

Suppose this is not true. That is, suppose there exists a secure multi-party threshold PSI protocol \(\varPi \) for functionality \(\mathcal {F_{{\mathsf {TPSI}\text {-}\mathsf {int}}}}\) in which for some party \(P_{i^*}\), \(\mathsf {CC}(P_{i^*}) = o(T)\) where \(\mathsf {CC}(\cdot )\) denotes the communication complexity. We will now use this protocol \(\varPi \) as a subroutine to design a secure two-party threshold PSI protocol which has communication complexity o(T).

Consider two parties \(Q_1\) and \(Q_2\) with input sets \(X_1\) and \(X_2\) (of the same size m) who wish to run a secure two-party threshold PSI protocol for the following functionality: both parties learn the output if \(|(X_1\setminus X_2) \cup (X_2 \setminus X_1)| \le 2 \cdot T\). We invoke the multi-party threshold PSI protocol \(\varPi \) with threshold T as follows: \(Q_1\) emulates the role of party \(P_{i^*}\) with input set \(S_{i^*} = X_1\) and \(Q_2\) emulates the role of all the other \((n-1)\) parties with each of their input sets as \(X_2\). From the definition of the functionality \(\mathcal {F_{{\mathsf {TPSI}\text {-}\mathsf {int}}}}\), \(Q_1\) learns the output at the end of the protocol if and only if \(|X_1 \setminus I| \le T\). Similarly, \(Q_2\) learns the output at the end of the protocol if and only if \(|X_2 \setminus I| \le T\). Notice that since \(|X_1| = |X_2|\) and \(I = X_1 \cap X_2\), \(|X_1 \setminus I| = |X_2 \setminus I|\). Thus, the parties learn the output if and only if \((|X_1 \setminus I|) + (|X_2 \setminus I|) \le 2 \cdot T\), namely \(|(X_1\setminus X_2) \cup (X_2 \setminus X_1)| \le 2\cdot T\), which is the functionality of the two-party threshold PSI. Therefore, correctness is easy to observe. For security, notice that if \(Q_1\) is corrupt, we can simulate it by considering only a corrupt \(P_{i^*}\) in the underlying protocol \(\varPi \) and if \(Q_2\) is corrupt, we can simulate it by considering all parties except \(P_{i^*}\) to be corrupt in the underlying protocol \(\varPi \).

Finally, notice that the communication complexity of the two-party protocol is exactly the same as \(\mathsf {CC}(P_{i^*})\) in the multi-party protocol \(\varPi \), which is o(T). However, recall from the work of Ghosh and Simkin [GS19a] that any two-party threshold PSI protocol for this functionality has communication complexity \(\varOmega (T)\), leading to a contradiction. Thus, the assumption that there exists a secure multi-party threshold PSI protocol \(\varPi \) in which for some party \(P_{i^*}\), \(\mathsf {CC}(P_{i^*}) = o(T)\) is wrong and this completes the proof of the theorem.
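The counting step in the reduction can be checked mechanically on small inputs. The sketch below is only a sanity check of the predicate equivalence, not of security or communication; the function names and the toy universe are assumptions made here. It compares the two-party predicate on \((X_1, X_2)\) with the \(\mathcal {F_{{\mathsf {TPSI}\text {-}\mathsf {int}}}}\) predicate on the emulated n-party input \((X_1, X_2, \dots , X_2)\).

```python
import itertools

def two_party_predicate(X1, X2, T):
    """Two-party functionality: output iff the symmetric difference has size <= 2T."""
    return len(X1 ^ X2) <= 2 * T

def multi_party_predicate(sets, T):
    """F_TPSI-int predicate: output iff |S_i minus I| <= T for every party."""
    I = set.intersection(*sets)
    return all(len(S - I) <= T for S in sets)

def reduced_predicate(X1, X2, T, n):
    """Q_1 plays P_{i*} with X1; Q_2 plays the other n - 1 parties, all holding X2."""
    return multi_party_predicate([X1] + [X2] * (n - 1), T)

def check_all(universe=6, m=3, n=4):
    """Compare both predicates on all pairs of m-subsets and all thresholds 0..m."""
    subsets = [set(c) for c in itertools.combinations(range(universe), m)]
    return all(
        two_party_predicate(X1, X2, T) == reduced_predicate(X1, X2, T, n)
        for X1 in subsets for X2 in subsets for T in range(m + 1)
    )

print(check_all())
```

The check relies on both sets having the same size m, exactly as in the proof: this is what makes \(|X_1 \setminus I| = |X_2 \setminus I|\).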

It is easy to observe that as a corollary of the above theorem, in a setting with only point-to-point channels (which also includes the star network), the overall communication complexity of the protocol must be at least n times the minimum communication complexity that each party is involved in, giving the lower bound of \(\varOmega (n \cdot T)\). Formally,

Corollary 1

For any multi-party threshold PSI protocol for functionality \(\mathcal {F_{{\mathsf {TPSI}\text {-}\mathsf {int}}}}\) that is secure against a semi-honest adversary that can corrupt up to \((n-1)\) parties, the communication complexity is \(\varOmega (n \cdot T)\).

5 TFHE-Based Private Intersection Cardinality Testing

In this section, we present two protocols for private intersection cardinality testing, one for functionality \(\mathcal {F}_{\mathsf {CTest}\text {-}\mathsf {int}}\) (described in Fig. 3) and the other for \(\mathcal {F}_{{\mathsf {CTest}\text {-}\mathsf {diff}}}\) (described in Fig. 4). Both protocols are based on n-out-of-n threshold fully homomorphic encryption with distributed setup. The former functionality states that the intersection must be of size at least \((m-T)\) where m is the size of each set. The latter functionality requires the difference between the union of all the sets and the intersection be of size at most T. Due to the possibility of elements appearing in a strict subset of the sets, these two functionalities are not equivalent.

5.1 Protocol for Functionality \(\mathcal {F}_{{\mathsf {CTest}\text {-}\mathsf {int}}}\)

In this protocol, we compute the cardinality predicate b where \(b=1\) if and only if \(\forall i, \left| S_i \setminus I \right| \le T\). The communication complexity of this protocol involves sending O(nT) TFHE ciphertexts and performing a single decryption of the result. We briefly describe the approach below.

Each party \(P_i\) first encodes their set \(S_i\) as a polynomial \(\mathsf {p}_{i}(\text {x}):=\prod _{a\in S_i}(\text {x}-a) \in \mathbb {F}[\text {x}]\). Each of these polynomials is then randomized as \(\mathsf {p}_{i}'(\text {x}):=\mathsf {p}_{i}(\text {x})\cdot (\text {x}-r_i)\) where \(P_i\) uniformly samples \(r_i{\mathop {\leftarrow }\limits ^{\$}}\mathbb {F}\). The central party also picks a random \(z {\mathop {\leftarrow }\limits ^{\$}}\mathbb {F}\) which is sent to every other party. Each party \(P_i\) then computes \(e_{i,j}:=\mathsf {p}_{i}'(j)\) for \(j\in [2T+3]\) and \(e'_{i}:=\mathsf {p}_{i}'(z)\). \(P_i\) sends the ciphertexts \([\![{ e_{i,j} }]\!]:=\mathsf {TFHE.Enc}(\mathsf {pk}, e_{i,j})\) and \([\![{ e'_i }]\!]:=\mathsf {TFHE.Enc}(\mathsf {pk}, e'_{i})\) to \(P_1\). Party \(P_1\) considers the rational polynomial

$$\begin{aligned} \mathsf {p}'(\text {x})=\frac{\mathsf {p}'_{2}(\text {x})+\dots +\mathsf {p}'_{n}(\text {x})}{\mathsf {p}'_{1}(\text {x})} \end{aligned}$$

and homomorphically computes \(2T+3\) encrypted evaluations

$$\begin{aligned} \left( j,[\![{ \frac{e_{2,j}+\dots +e_{n,j}}{e_{1,j}} }]\!]\right) \end{aligned}$$

for \(j\in [2T+3]\). Using these encrypted evaluations, \(P_1\) homomorphically computes an encrypted rational polynomial \([\![{ \mathsf {p}^*(\text {x}) }]\!]\) using rational polynomial interpolation. Note that \(\mathsf {p}^*(\text {x}) = \mathsf {p}'(\text {x})\) if \(\mathsf {p}'(\text {x})\) has degree at most \(2T+2\). Furthermore, \(P_1\) can homomorphically compute an encryption of the predicate \(b:=\left( \mathsf {p}^*(z) \overset{?}{=} \frac{e_{2}'+\dots +e_{n}'}{e_{1}'}\right) \). Finally, the parties jointly perform a threshold decryption of \([\![{ b }]\!]\) and party \(P_1\) learns the output, which it sends to every other party. The full protocol is detailed in Fig. 5.

Fig. 5. Multi-party private intersection cardinality testing protocol \(\varPi _{{\mathsf {TFHE}\text {-}\mathsf {CTest}\text {-}\mathsf {int}}}\) for \(\mathcal {F}_{{\mathsf {CTest}\text {-}\mathsf {int}}}\).
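The arithmetic that \(P_1\) performs under encryption can be mirrored on plaintexts. The following is a minimal sketch under assumptions made here and not in the original: all values are in the clear, the field is \(\mathbb {F}_P\) with \(P=2^{61}-1\), set elements are chosen larger than \(2T+3\) so that no evaluation point is a root of \(\mathsf {p}'_1\), and the helper names are hypothetical. The interpolated \(\mathsf {p}^*\) is any nonzero pair (numerator, denominator) of degree at most \(T+1\) each that matches the \(2T+3\) evaluations, and the final comparison at z is cross-multiplied to avoid a division.

```python
import random

P = 2**61 - 1  # prime modulus standing in for the field F (assumption)

def masked_eval(S, r, x):
    """Evaluate p'_i(x) = (x - r) * prod_{a in S} (x - a) over F_P."""
    acc = (x - r) % P
    for a in S:
        acc = acc * (x - a) % P
    return acc

def poly_eval(coeffs, x):
    """Horner evaluation over F_P, coefficients lowest degree first."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

def kernel_vector(rows):
    """Return a nonzero v with rows * v = 0 over F_P (needs more columns than rows)."""
    m = [row[:] for row in rows]
    ncols = len(m[0])
    pivots = []                      # pivots[i] is the pivot column of row i
    for col in range(ncols):
        top = len(pivots)
        hit = next((r for r in range(top, len(m)) if m[r][col]), None)
        if hit is None:
            continue
        m[top], m[hit] = m[hit], m[top]
        inv = pow(m[top][col], -1, P)
        m[top] = [c * inv % P for c in m[top]]
        for r in range(len(m)):
            if r != top and m[r][col]:
                f = m[r][col]
                m[r] = [(x - f * y) % P for x, y in zip(m[r], m[top])]
        pivots.append(col)
    free = next(c for c in range(ncols) if c not in pivots)
    v = [0] * ncols
    v[free] = 1
    for i, col in enumerate(pivots):
        v[col] = (-m[i][free]) % P
    return v

def cardinality_test(sets, T, rng):
    """Plaintext analogue of the predicate b; sets[0] plays the role of P_1."""
    r = [rng.randrange(P) for _ in sets]       # one random root r_i per party
    z = rng.randrange(P)                       # the extra check point

    def ratio(x):                              # p'(x) = (p'_2 + ... + p'_n)(x) / p'_1(x)
        vals = [masked_eval(S, ri, x) for S, ri in zip(sets, r)]
        return sum(vals[1:]) * pow(vals[0], -1, P) % P

    deg = T + 1
    rows = []
    for j in range(1, 2 * T + 4):              # the 2T + 3 points j = 1, ..., 2T + 3
        y = ratio(j)
        pw = [pow(j, k, P) for k in range(deg + 1)]
        rows.append(pw + [(-y * c) % P for c in pw])   # N(j) - y * D(j) = 0
    vec = kernel_vector(rows)
    num, den = vec[:deg + 1], vec[deg + 1:]
    return poly_eval(num, z) == ratio(z) * poly_eval(den, z) % P

similar = [[1001, 1002, 1003, 1004, 1005],
           [1001, 1002, 1003, 1004, 2000],
           [1001, 1002, 1003, 1004, 3000]]
different = [[1001, 1002, 1003, 1004, 1005],
             [1001, 1002, 1003, 1004, 2000],
             [1001, 1002, 1003, 3000, 3001]]
print(cardinality_test(similar, 1, random.Random(0)))
print(cardinality_test(different, 1, random.Random(0)))
```

With \(T=1\), the first input has \(|S_i\setminus I| = 1\) for every party and passes, while the second has \(|S_i\setminus I| = 2\) and fails except with probability roughly \(m/P\) over the choice of z.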

Theorem 2

Assuming threshold FHE with distributed setup, protocol \(\varPi _{{\mathsf {TFHE}\text {-}\mathsf {CTest}\text {-}\mathsf {int}}}\) (Fig. 5) securely realizes \(\mathcal {F}_{\mathsf {CTest}\text {-}\mathsf {int}}\) (Fig. 3).

Proof

Correctness. We first prove the protocol is correct. By the correctness of the \(\mathsf {TFHE}\) scheme, we only need to show that the computed predicate \(b=1\) if and only if \(\forall i,|S_i \setminus I|\le T\). First consider the case where the protocol should output \(\textsf {similar}\). Since

$$\begin{aligned} \mathsf {p}'(\text {x})=\frac{\mathsf {p}_{2}'(\text {x})+\dots +\mathsf {p}_{n}'(\text {x})}{\mathsf {p}_{1}'(\text {x})}=\frac{\mathsf {p}_{2\setminus I}(\text {x})\cdot (\text {x}-r_2)+\dots +\mathsf {p}_{n\setminus I}(\text {x})\cdot (\text {x}-r_n)}{\mathsf {p}_{1\setminus I}(\text {x})\cdot (\text {x}-r_1)}, \end{aligned}$$

the degree of each term \(\mathsf {p}_{i\setminus I}(\text {x})\cdot (\text {x}-r_i)\) is at most \(T+1\) and therefore the rational polynomial interpolation requires a total of \((2T+3)\) evaluation points. Therefore \(\mathsf {p}^*(\text {x})=\mathsf {p}'(\text {x})\) and \(\mathsf {p}^*(z)=\mathsf {p}'(z)=\frac{e_2'+\dots +e_n'}{e_{1}'}\). Thus \(b=1\) as required.

Now consider the case where the protocol should output \(\textsf {different}\), namely when \(|I|< m-T\). Observe that \(\gcd (\mathsf {p}_{1\setminus I},\cdots ,\mathsf {p}_{n\setminus I})=1\) by construction and therefore

$$\begin{aligned} \gcd \left( \mathsf {p}_{2\setminus I}'(\text {x})+\dots +\mathsf {p}_{n\setminus I}'(\text {x}), \mathsf {p}_{1\setminus I}'(\text {x})\right) = 1 \end{aligned}$$

except with negligible probability, where \(\mathsf {p}_{i\setminus I}'(\text {x}):=\mathsf {p}_{i\setminus I}(\text {x})\cdot (\text {x}-r_i)\). The algebraic proof is deferred to the full version.

Assuming \(\gcd \left( \mathsf {p}_{2\setminus I}'(\text {x})+\dots +\mathsf {p}_{n\setminus I}'(\text {x}), \mathsf {p}_{1\setminus I}'(\text {x})\right) = 1\), it then follows that the degree of the rational polynomial \(\mathsf {p}'(\text {x})\) is the degree of \(\mathsf {p}_{2\setminus I}'(\text {x})+\dots +\mathsf {p}_{n\setminus I}'(\text {x})\) plus the degree of \(\mathsf {p}'_{1\setminus I}(\text {x})\). The former must have a leading term with degree \((m-|I|+1) > (T+1)\). Similarly, the latter also has degree \((m-|I|+1)> T+1\). Hence the degree of \(\mathsf {p}'(\text {x})\) is at least \(2T+4\). The probability of \(b=1\) is \(\Pr _z[\mathsf {p}'(z)=\mathsf {p}^*(z)]\) where \(\mathsf {p}^*(\text {x})\) is the polynomial interpolated by \(P_1\) using \((2T+3)\) evaluations. However, since the degree of \(\mathsf {p}'(\text {x})\) is at least \(2T+4\), \(\Pr _z[\mathsf {p}'(z)=\mathsf {p}^*(z)] \le \mathsf {negl}(\lambda )\).
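As a small numerical illustration of this degree count (the parameters are chosen here and do not appear in the original), take \(m=5\), \(T=1\), and \(|I|=3\), so that \(|I| < m-T = 4\). Then

$$\begin{aligned} \deg \left( \mathsf {p}_{2\setminus I}'+\dots +\mathsf {p}_{n\setminus I}'\right) = \deg \left( \mathsf {p}_{1\setminus I}'\right) = m-|I|+1 = 3 > T+1 = 2, \qquad \deg \left( \mathsf {p}'\right) = 3+3 = 6 \ge 2T+4 = 6. \end{aligned}$$

The \(2T+3 = 5\) evaluations available to \(P_1\) therefore only determine a rational polynomial of degree at most \(2T+2 = 4\), which cannot coincide with \(\mathsf {p}'(\text {x})\).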

Communication Cost. Each party sends \((2T+4)\) TFHE encryptions and one partial decryption to \(P_1\) where each plaintext is a field element. \(P_1\) sends one ciphertext to every other party. The size of each encryption and each partial decryption is \(\mathsf {poly}(\lambda )\). Thus, the overall communication complexity is \(O(n\cdot T \cdot \mathsf {poly}(\lambda ))\) in a star network and the protocol runs in O(1) rounds.

Security. Consider an environment \(\mathcal {Z}\) who corrupts a set \(\mathcal {S}^*\) of \(n^*\) parties where \(n^*<n\). The simulator \(\mathsf {Sim}\) has output \(w\in \{\textsf {similar},\textsf {different}\}\) from the ideal functionality. \(\mathsf {Sim}\) sets a bit \(b^*=1\) if \(w=\textsf {similar}\) and \(b^*=0\) otherwise. Also, for each corrupt party \(P_i\), \(\mathsf {Sim}\) has as input the tuple \((S_i, r_i)\) indicating the party’s input and randomness for the protocol. The strategy of the simulator \(\mathsf {Sim}\) for our protocol is described below.

  1. \(\mathsf {Sim}\) runs the distributed key generation algorithm \(\mathsf {TFHE.DistSetup}(1^\lambda , i)\) of the TFHE scheme honestly on behalf of each honest party \(P_i\) as in the real world. Note that \(\mathsf {Sim}\) also knows \(\{\mathsf {sk}_i\}_{i \in \mathcal {S}^*}\) as it knows the randomness for the corrupt parties.

  2. In Steps 2–4 of the protocol, \(\mathsf {Sim}\) plays the role of the honest parties exactly as in the real world, except that whenever an honest party \(P_i\) has to send a ciphertext, \(\mathsf {Sim}\) instead computes \([\![{ 0 }]\!] = \mathsf {TFHE.Enc}(0)\) using fresh randomness and sends it on behalf of \(P_i\).

  3. In Step 5, on behalf of each honest party \(P_i\), instead of sending the value \([\![{ b : \mathsf {sk}_i }]\!]\) by running the honest \(\mathsf {TFHE.PartialDec}\) algorithm as in the real world, \(\mathsf {Sim}\) computes the partial decryptions by running the simulator \(\mathsf {TFHE.Sim}\) as follows: \(\{[\![{ b : \mathsf {Sim}_i }]\!]\}_{i\in [n]\setminus \mathcal {S}^*}\leftarrow \mathsf {TFHE.Sim}(\mathsf {C}, b^*,[\![{ b }]\!],\{\mathsf {sk}_i\}_{i\in \mathcal {S}^*})\) where the circuit \(\mathsf {C}\) denotes the whole computation done by \(P_1\) in the real world to evaluate bit b. On behalf of the honest party \(P_i\) the simulator sends \([\![{ b : \mathsf {Sim}_i }]\!]\). This corresponds to the ideal world.

We now show that the above simulation strategy is successful against all environments \(\mathcal {Z}\) that corrupt parties in a semi-honest manner. We will show this via a series of computationally indistinguishable hybrids where the first hybrid \(\mathsf {Hybrid}_0\) corresponds to the real world and the last hybrid \(\mathsf {Hybrid}_2\) corresponds to the ideal world.

  • \(\mathsf {Hybrid}_0\) - Real World: In this hybrid, consider a simulator \(\mathsf {SimHyb}\) that plays the role of the honest parties as in the real world.

  • \(\mathsf {Hybrid}_1\) - Simulate Partial Decryptions: In this hybrid, in Step 5, \(\mathsf {SimHyb}\) simulates the partial decryptions generated by the honest parties as done in the ideal world. That is, the simulator calls \(\{[\![{ b : \mathsf {Sim}_i }]\!]\}_{i\in [n]\setminus \mathcal {S}^*}\leftarrow \mathsf {TFHE.Sim}(\mathsf {C}, b^*,[\![{ b }]\!],\{\mathsf {sk}_i\}_{i\in \mathcal {S}^*})\). On behalf of the honest party \(P_i\) the simulator sends \([\![{ b : \mathsf {Sim}_i }]\!]\) instead of \([\![{ b : \mathsf {sk}_i }]\!]\).

  • \(\mathsf {Hybrid}_2\) - Switch Encryptions: In this hybrid, \(\mathsf {SimHyb}\) now computes every ciphertext generated on behalf of any honest party as encryptions of 0 as done by \(\mathsf {Sim}\) in the ideal world. This hybrid corresponds to the ideal world.

We show that every pair of consecutive hybrids is computationally indistinguishable in the full version.

5.2 Protocol for Functionality \(\mathcal {F}_{{\mathsf {CTest}\text {-}\mathsf {diff}}}\)

This protocol will compute the cardinality predicate b where \(b=1\) if and only if \(\left| \left( \bigcup _{i=1}^n S_i\right) \setminus I \right| \le T\). The core idea behind the protocol is that \(P_1\) (the star of the network) and \(P_i\) first run a protocol to compute an encryption (via TFHE) of their set differences \(D_{1,i}=S_1\setminus S_i\) and \(D_{i,1}=S_i\setminus S_1\) with O(T) communication complexity if \(|S_1\setminus S_i| \le T\). Before we describe how this is achieved, notice that at this point, the protocol enables \(P_1\) to reconstruct an encryption of \(\left( \bigcup _{i=1}^n S_i\right) \setminus I =\bigcup _{i\in [n]\setminus \{1\}} (D^*_{1,i}\cup D^*_{i,1})\) and a predicate b where \(b=1\) if and only if \(\left| \left( \bigcup _{i=1}^n S_i\right) \setminus I \right| \le T\). \(P_1\) can then send this encryption to all parties to run threshold decryption.

We now describe in more detail how the encryptions of \(D_{1,i}\) and \(D_{i,1}\) are computed. The idea follows from the two-party protocol of Ghosh and Simkin [GS19a]. Each party \(P_i\) encodes their set \(S_i\) as \(\mathsf {p}_{i}(\text {x}):=\prod _{a\in S_i}(\text {x}-a)\in \mathbb {F}[\text {x}]\). \(P_i\) then computes \(e_{i,j}:=\mathsf {p}_{i}(j)\) for \(j\in [2T+1]\) and \(e'_{i}:=\mathsf {p}_{i}(z)\) on a special random point \(z\in \mathbb {F}\) (picked uniformly at random by \(P_1\)). Party \(P_i\) encrypts these values as \([\![{ e_{i,j} }]\!], [\![{ e_i' }]\!]\) and sends them to \(P_1\). Party \(P_1\) considers the rational polynomial

$$\begin{aligned} \widetilde{\mathsf {p}}_i(\text {x})=\frac{\mathsf {p}_{i}(\text {x})}{\mathsf {p}_{1}(\text {x})}=\frac{\mathsf {p}_{i \setminus 1}(\text {x})}{\mathsf {p}_{1\setminus i}(\text {x})} \end{aligned}$$

and homomorphically computes \(2T+1\) encrypted evaluations \(\left( j,[\![{ \frac{e_{i,j}}{e_{1,j}} }]\!]\right) \) for \(j\in [2T+1]\). Using these encrypted evaluations, \(P_1\) homomorphically computes an encrypted rational polynomial \([\![{ \widetilde{\mathsf {p}}_i^*(\text {x}) }]\!]\) using rational polynomial interpolation. \(P_1\) then homomorphically reconstructs the roots of \(\mathsf {p}_{i\setminus 1}(\text {x})\) and \(\mathsf {p}_{1\setminus i}(\text {x})\) from \(\widetilde{\mathsf {p}}_i^*\) to obtain \([\![{ D^*_{i,1} }]\!],[\![{ D^*_{1,i} }]\!]\). Note that \(\widetilde{\mathsf {p}}_i^*(\text {x}) = \widetilde{\mathsf {p}}_i(\text {x})\) if \(\widetilde{\mathsf {p}}_i(\text {x})\) has degree at most 2T, in which case \(D^*_{i,1} = D_{i,1}\) and \(D^*_{1,i} = D_{1,i}\).

In the final protocol, \(P_1\) homomorphically computes encrypted predicates \(b_i\) where \(b_i=1\) iff \(\widetilde{\mathsf {p}}_i^*(z) = \frac{e_{i}'}{e_1'}\) for each \(i\in [n]\setminus \{1\}\) and an encrypted predicate \(b'\) where \(b'=1\) iff \(\left| \bigcup _{i\in [n]\setminus \{1\}} (D^*_{1,i}\cup D^*_{i,1})\right| \le T\). The output predicate b is homomorphically computed as \([\![{ b }]\!]=[\![{ b' \cdot \prod _{i\in [n]\setminus \{1\}} b_i }]\!]\) and jointly decrypted by all the parties. The protocol is formally described in Fig. 6.

Fig. 6. Multi-party private intersection cardinality testing protocol \(\varPi _{{\mathsf {TFHE}\text {-}\mathsf {CTest}\text {-}\mathsf {diff}}}\) for \(\mathcal {F}_{{\mathsf {CTest}\text {-}\mathsf {diff}}}\).
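The set identity that lets \(P_1\) assemble \(\left( \bigcup _{i=1}^n S_i\right) \setminus I\) from the pairwise differences can be illustrated on plaintext sets. This sketch uses exact set differences in place of the interpolated \(D^*_{1,i}\) and \(D^*_{i,1}\) (which coincide with them only when the degree bound holds); the function names and toy sets are assumptions made here.

```python
def pairwise_difference(sets):
    """Union over i != 1 of (S_1 minus S_i) and (S_i minus S_1); sets[0] is S_1."""
    first = sets[0]
    out = set()
    for other in sets[1:]:
        out |= (first - other) | (other - first)
    return out

def direct_difference(sets):
    """(Union of all sets) minus (intersection of all sets)."""
    return set.union(*sets) - set.intersection(*sets)

def ctest_diff(sets, T):
    """Plaintext version of the predicate b' computed from the pairwise differences."""
    return len(pairwise_difference(sets)) <= T

demo = [{1, 2, 3, 4}, {1, 2, 3, 9}, {1, 2, 4, 9}]
print(sorted(pairwise_difference(demo)), sorted(direct_difference(demo)))
print(ctest_diff(demo, 3), ctest_diff(demo, 2))
```

An element outside I either lies in \(S_1\) and is missing from some \(S_i\), or lies in some \(S_i\) and is missing from \(S_1\), which is why comparing every set against \(S_1\) alone suffices.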

Theorem 3

Assuming threshold FHE with distributed setup, protocol \(\varPi _{{\mathsf {TFHE}\text {-}\mathsf {CTest}\text {-}\mathsf {diff}}}\) (Fig. 6) securely realizes \(\mathcal {F}_{\mathsf {CTest}\text {-}\mathsf {diff}}\) (Fig. 4).

Proof

Correctness. We first prove the protocol is correct. By the correctness of the \(\mathsf {TFHE}\) scheme, we only need to show that the computed predicate \(b=1\) if and only if \(\left| \left( \bigcup _{i=1}^n S_i\right) \setminus I \right| \le T\). First consider the case where the protocol should output \(\textsf {similar}\). Since

$$\begin{aligned} \widetilde{\mathsf {p}}_i(\text {x})=\frac{\mathsf {p}_{i}(\text {x})}{\mathsf {p}_{1}(\text {x})}=\frac{\mathsf {p}_{i\setminus 1}(\text {x})}{\mathsf {p}_{1\setminus i}(\text {x})}, \end{aligned}$$

both the numerator and denominator have degree at most T and therefore the rational polynomial interpolation requires at most \((2T+1)\) evaluation points. Hence \(\widetilde{\mathsf {p}}_i^*(\text {x})=\widetilde{\mathsf {p}}_i(\text {x})\) and \(\widetilde{\mathsf {p}}_i^*(z)=\widetilde{\mathsf {p}}_i(z)=\frac{e_i'}{e_{1}'}\), thus \(b_i=1\). Since the roots of \(\mathsf {p}_{i\setminus 1}\) are exactly the elements of the set difference \(D_{i,1}=S_i\setminus S_1\), we have \(D^*_{i,1}=D_{i,1}=S_i\setminus S_1\). Similarly \(D^*_{1,i}=S_1\setminus S_i\). Since \(\left| \bigcup _{i\in [n]\setminus \{1\}} (D^*_{1,i}\cup D^*_{i,1})\right| =\left| \left( \bigcup _{i=1}^n S_i\right) \setminus I \right| \le T\), we have \(b'=1\). Hence the protocol will output \(b=1\).

Now consider the case where the protocol should output \(\textsf {different}\), namely \(\left| \left( \bigcup _{i=1}^n S_i\right) \setminus I \right| > T\). There are two possible cases. In the first case, \(|S_i \setminus S_1|>T\) for some i. Then \(\widetilde{\mathsf {p}}_i\) has degree at least \(2T+2\) but \(\widetilde{\mathsf {p}}^*_i\) is interpolated from \(2T+1\) evaluation points, hence \(b_i=0\) with all but negligible probability. In the second case, \(|S_i \setminus S_1|\le T\) for all \(i\in [n]\setminus \{1\}\). Then \(D^*_{i,1}=D_{i,1}=S_i\setminus S_1\), \(D^*_{1,i}=S_1\setminus S_i\), and \(b_i=1\) for all i. Since \(\left| \left( \bigcup _{i=1}^n S_i\right) \setminus I \right| > T\), \(b'=0\). In both cases, we have \(b=b'\cdot \prod _{i\in [n]\setminus \{1\}} b_i = 0\) with all but negligible probability.

Communication Cost. Each party sends \((2T+2)\) TFHE encryptions and one partial decryption to \(P_1\) where each plaintext is a field element. \(P_1\) sends one ciphertext to every other party. The size of each encryption and each partial decryption is \(\mathsf {poly}(\lambda )\). Thus, the overall communication complexity is \(O(n\cdot T \cdot \mathsf {poly}(\lambda ))\) in a star network and the protocol runs in O(1) rounds.

Security. The proof of security is identical to the proof of Theorem 2. We defer the formal proof to the full version.

6 TAHE-Based Protocol for \(\mathcal {F}_{{\mathsf {CTest}\text {-}\mathsf {diff}}}\)

In this section, we present a multi-party protocol for private intersection cardinality testing for functionality \(\mathcal {F_{{\mathsf {CTest}\text {-}\mathsf {diff}}}}\) based on threshold additive homomorphic encryption with distributed setup. That is, the parties learn whether their sets satisfy \(\left| \left( \bigcup _{i=1}^n S_i\right) \setminus I \right| \le T\). Our protocol works in the star network communication model where \(P_1\) is the central party.

In our construction, we need a secure multi-party computation (MPC) protocol that tests the singularity of a specific Hankel matrix (defined later), which we discuss in Sect. 6.1. Using this, we present our complete protocol in Sect. 6.2.

6.1 Singularity Testing of Hankel Matrices

In Sect. 6.2, we will see that intersection cardinality testing can be reduced to determining whether the determinant of a specific matrix is 0 or not. The latter problem can be reduced to computing the so-called “Half-GCD” of two specific polynomials. In this section, we present a summary of the various results that go into these reductions and refer the reader to the cited works for further details.

Half-GCD Problem. Consider the ring of polynomials \(\mathbb {F}[\text {x}]\). Note that since \(\mathbb {F}[\text {x}]\) is a Euclidean domain, Euclid’s GCD algorithm can be applied to polynomials as well. Consider \(\mathsf {p}_{0}, \mathsf {p}_{1} \in \mathbb {F}[\text {x}]\) with \(d = \mathrm {deg}(\mathsf {p}_{0}) > \mathrm {deg}(\mathsf {p}_{1}) \ge 0\). The Euclidean algorithm can be viewed as a sequence of transformations of 2-vectors as below:

$$\begin{aligned} \left( \begin{array}{c} \mathsf {p}_{0} \\ \mathsf {p}_{1} \end{array} \right) \overset{M_{1}}{\longrightarrow } \left( \begin{array}{c} \mathsf {p}_{1} \\ \mathsf {p}_{2} \end{array} \right) \overset{M_{2}}{\longrightarrow } \ldots \overset{M_{h - 1}}{\longrightarrow } \left( \begin{array}{c} \mathsf {p}_{h - 1} \\ \mathsf {p}_{h} \end{array} \right) \overset{M_{h}}{\longrightarrow } \left( \begin{array}{c} \mathsf {p}_{h} \\ 0 \end{array} \right) \end{aligned} \tag{2}$$

Here, \(M_{1}, \ldots , M_{h}\) are \(2 \times 2\) matrices and \(\mathsf {p}_{2}, \ldots , \mathsf {p}_{h} \in \mathbb {F}[\text {x}]\). For vectors U, V and a matrix M, we write \(U \overset{M}{\longrightarrow } V\) to denote \(U = MV\).

Equation 2 can be correctly interpreted if we define

$$M_{i} = \left( \begin{array}{cc} \mathsf {q}_{i} &{} 1 \\ 1 &{} 0 \end{array} \right) .$$

We call such matrices elementary matrices, where \(\mathsf {q}_{i}\) is a polynomial of positive degree. We also refer to \(\mathsf {q}_{i}\) as the partial quotient in \(M_{i}\). A regular matrix M is a product of zero or more elementary matrices, namely

$$M = M_{1}M_{2}\ldots M_{k} \quad (k \ge 0)$$

where if \(k = 0\), then M is defined to be the identity matrix of order 2.

We define the half-GCD (HGCD) problem for the polynomial ring \(\mathbb {F}[\text {x}]\) as follows. Given \(\mathsf {p}_{0}, \mathsf {p}_{1} \in \mathbb {F}[\text {x}]\) with \(d = \mathrm {deg}(\mathsf {p}_{0}) > \mathrm {deg}(\mathsf {p}_{1}) \ge 0\), compute a regular matrix

$$\begin{aligned} M = \mathtt {HGCD}(\mathsf {p}_{0}, \mathsf {p}_{1}) \end{aligned}$$

such that if

$$\left( \begin{array}{c} \mathsf {p}_{0} \\ \mathsf {p}_{1} \end{array} \right) \overset{M}{\longrightarrow } \left( \begin{array}{c} \mathsf {p}_{2} \\ \mathsf {p}_{3} \end{array} \right) ,$$

then

$$\begin{aligned} \mathrm {deg}( \mathsf {p}_{2} ) \ge d/2 > \mathrm {deg}( \mathsf {p}_{3}). \end{aligned}$$
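The definition can be made concrete with a naive, quadratic-time computation that simply runs Euclid's algorithm and stops halfway. This is a sketch for illustration only and is not the fast algorithm whose complexity is recalled below; the prime \(P=101\), the list-based polynomial representation (lowest degree first), and the function names are assumptions made here.

```python
P = 101  # small prime field for readability (assumption)

def trim(a):
    """Reduce coefficients mod P and drop leading zeros; [] is the zero polynomial."""
    a = [c % P for c in a]
    while a and a[-1] == 0:
        a.pop()
    return a

def deg(a):
    return len(a) - 1              # the zero polynomial gets degree -1 here

def add(a, b):
    n = max(len(a), len(b))
    return trim([(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
                 for i in range(n)])

def mul(a, b):
    if not a or not b:
        return []
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % P
    return trim(out)

def divmod_poly(a, b):
    """Quotient and remainder of a by a nonzero b over F_P."""
    a, b = trim(a), trim(b)
    q = [0] * max(len(a) - len(b) + 1, 0)
    inv = pow(b[-1], -1, P)
    while len(a) >= len(b):
        shift = len(a) - len(b)
        c = a[-1] * inv % P
        q[shift] = c
        for i, y in enumerate(b):
            a[shift + i] = (a[shift + i] - c * y) % P
        a = trim(a)
    return trim(q), a

def hgcd_naive(p0, p1):
    """Return (M, p2, p3) with (p0, p1)^T = M (p2, p3)^T and deg p2 >= d/2 > deg p3."""
    d = deg(p0)
    M = [[[1], []], [[], [1]]]     # 2x2 identity with polynomial entries
    u, v = trim(p0), trim(p1)
    while 2 * deg(v) >= d:
        q, r = divmod_poly(u, v)   # u = q * v + r, so (u, v)^T = [[q, 1], [1, 0]] (v, r)^T
        M = [[add(mul(M[0][0], q), M[0][1]), M[0][0]],
             [add(mul(M[1][0], q), M[1][1]), M[1][0]]]
        u, v = v, r
    return M, u, v

p0 = [2, 3, 0, 0, 1]               # x^4 + 3x + 2
p1 = [5, 1, 0, 1]                  # x^3 + x + 5
M, p2, p3 = hgcd_naive(p0, p1)
print(deg(p2), deg(p3))
```

Each loop iteration multiplies M on the right by one elementary matrix, so the returned M is regular in the sense defined above.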

We now recall the result of Thull and Yap [TY90] on the computational complexity of HGCD.

Imported Theorem 4

Consider the polynomial ring \(\mathbb {F}[x ]\) and the polynomials \(\mathsf {p}_{0}, \mathsf {p}_{1} \in \mathbb {F}[x ]\) with \(d = \mathrm {deg}( \mathsf {p}_{0}) > \mathrm {deg}( \mathsf {p}_{1}) \ge 0\). The computational complexity of the HGCD problem is \(O(d\log ^{2} d)\).

Singularity Testing of Hankel Matrices. Next, we proceed to outline the results that enable us to use the HGCD problem to test singularity of Hankel matrices. A Hankel matrix is a matrix in which each ascending skew-diagonal from left to right is constant. We will be working with square Hankel matrices. In particular, a \((k + 1) \times (k + 1)\) Hankel matrix takes the form

$$H = \left( \begin{array}{cccc} a_{0} &{} a_{1} &{} \ldots &{} a_{k} \\ a_{1} &{} a_{2} &{} \ldots &{} a_{k + 1} \\ \vdots &{} \vdots &{} \vdots &{} \vdots \\ a_{k} &{} a_{k + 1} &{} \ldots &{} a_{2k} \end{array} \right) $$

where the \(2k + 1\) entries \(a_{0}, a_1, \ldots , a_{2k}\) define H. Define the two polynomials

$$\begin{aligned}&\mathsf {p}_{0}(\text {x}) = \text {x}^{2k + 1}\\&\mathsf {p}_{1}(\text {x}) = a_{0} + a_{1}\text {x}+ a_{2}\text {x}^{2} + \ldots + a_{2k}\text {x}^{2k} \end{aligned}$$

where \(\mathsf {p}_{0}, \mathsf {p}_{1} \in \mathbb {F}[\text {x}]\). Let \(M = \mathtt {HGCD}(\mathsf {p}_{0}, \mathsf {p}_{1})\) and

$$\left( \begin{array}{c} \mathsf {p}_{0} \\ \mathsf {p}_{1} \end{array} \right) \overset{M}{\longrightarrow } \left( \begin{array}{c} \mathsf {p}_{2} \\ \mathsf {p}_{3} \end{array} \right) .$$

Then we have

$$\begin{aligned} \mathrm {deg}( \mathsf {p}_{2}) \ge k + 1 > \mathrm {deg}(\mathsf {p}_{3}). \end{aligned}$$

We recall the setting and results of Brent, Gustavson and Yun [BGY80] that elegantly connect the singularity of H with the HGCD of \(\mathsf {p}_{0}(\text {x})\) and \(\mathsf {p}_{1}(\text {x})\).

Imported Theorem 5

The Hankel matrix H is singular iff \(\mathrm {deg}( \mathsf {p}_{3}) < k\).

Putting Imported Theorems 4 and 5 together, we have the following theorem.

Imported Theorem 6

The computational complexity of testing singularity of a \((k+1) \times (k+1)\) Hankel matrix is \({O}(k \log ^2 k)\).
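The criterion of Imported Theorem 5 can be cross-checked against a direct determinant computation on small instances. The sketch below is a naive illustration under assumptions made here: it uses the plain Euclidean remainder sequence instead of a fast HGCD routine, works over the tiny field \(\mathbb {F}_5\) so that singular matrices occur often, treats the zero polynomial as having degree \(-1\), and uses hypothetical function names.

```python
import itertools

P = 5  # tiny prime so that singular matrices are common (assumption for the demo)

def trim(a):
    """Reduce coefficients mod P and drop leading zeros; [] is the zero polynomial."""
    a = [c % P for c in a]
    while a and a[-1] == 0:
        a.pop()
    return a

def poly_rem(a, b):
    """Remainder of a divided by a nonzero b over F_P (lists, lowest degree first)."""
    a, b = trim(a), trim(b)
    inv = pow(b[-1], -1, P)
    while len(a) >= len(b):
        c = a[-1] * inv % P
        shift = len(a) - len(b)
        for i, y in enumerate(b):
            a[shift + i] = (a[shift + i] - c * y) % P
        a = trim(a)
    return a

def singular_by_euclid(a):
    """a = [a_0, ..., a_2k]; apply the criterion 'H is singular iff deg(p3) < k'."""
    k = (len(a) - 1) // 2
    u = [0] * (2 * k + 1) + [1]            # p0 = x^(2k+1)
    v = trim(a)                            # p1 = a_0 + a_1 x + ... + a_2k x^(2k)
    while len(v) - 1 >= k + 1:             # stop once deg(p2) >= k + 1 > deg(p3)
        u, v = v, poly_rem(u, v)
    return len(v) - 1 < k

def hankel(a):
    k = (len(a) - 1) // 2
    return [[a[i + j] for j in range(k + 1)] for i in range(k + 1)]

def det_mod(mat):
    """Determinant over F_P by Gaussian elimination."""
    m = [row[:] for row in mat]
    n, det = len(m), 1
    for c in range(n):
        hit = next((r for r in range(c, n) if m[r][c] % P), None)
        if hit is None:
            return 0
        if hit != c:
            m[c], m[hit] = m[hit], m[c]
            det = -det
        det = det * m[c][c] % P
        inv = pow(m[c][c], -1, P)
        for r in range(c + 1, n):
            f = m[r][c] * inv % P
            m[r] = [(x - f * y) % P for x, y in zip(m[r], m[c])]
    return det % P

def agrees_with_determinant(k):
    """Exhaustively compare both singularity tests for all (k+1) x (k+1) Hankel matrices."""
    return all(singular_by_euclid(list(a)) == (det_mod(hankel(a)) == 0)
               for a in itertools.product(range(P), repeat=2 * k + 1))

print([agrees_with_determinant(k) for k in range(3)])
```

The exhaustive comparison covers every Hankel matrix of size up to \(3\times 3\) over \(\mathbb {F}_5\); it says nothing about running time, for which the quasi-linear bound above relies on a fast HGCD algorithm.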

Multi-party Singularity Testing. Looking ahead, in our multi-party intersection cardinality testing protocol, we will need to test for the singularity of a Hankel matrix H which the parties have additive shares of, and the parties will run a secure multi-party computation (MPC) protocol to jointly test for the singularity of H. The ideal functionality \(\mathcal {F}_{\mathsf {SingTest}}\) for multi-party singularity testing is defined in Fig. 7. We will need an MPC protocol that realizes \(\mathcal {F}_{\mathsf {SingTest}}\) with communication complexity at most \(\widetilde{O}(k \cdot n\cdot \mathsf {poly}(\lambda ))\). Any such protocol suffices, and we denote by \(\varPi _\mathsf {SingTest}\) the MPC protocol realizing \(\mathcal {F}_{\mathsf {SingTest}}\).

Fig. 7. Ideal functionality \(\mathcal {F}_{\mathsf {SingTest}}\) for multi-party singularity testing of a Hankel matrix.

Here we describe two such protocols with communication complexity \(\widetilde{O}(k \cdot n \cdot \mathsf {poly}(\lambda ))\) based on TAHE. In the first protocol, after the TAHE setup, each party \(P_i\) sends \([\![{ H_{i} }]\!]\) to \(P_1\) and \(P_1\) homomorphically computes \([\![{ H }]\!]\). Afterwards \(P_1\) can homomorphically evaluate a circuit C that computes the predicate \(b := (\mathrm {det}(H) \overset{?}{=} 0)\), following the ideas from [FH96, CDN01]. Finally the parties jointly decrypt the encrypted output. Since the size and depth of C are both \(O(k \log ^{2} k)\) by Imported Theorem 6, the total communication complexity of this protocol is \(O(k\log ^{2} k \cdot n \cdot \mathsf {poly}(\lambda ))\) and the round complexity is \(O(k\log ^{2} k)\).

As a second protocol, the parties jointly compute another circuit \(C'\) that takes H and a random PRF key r as input and outputs a Yao garbled circuit [Yao86] that computes C. This approach is inspired by the work of Damgård et al. [DIK+08]. Since both H and r are additively shared among all the parties, this MPC can be carried out as in the previous protocol, namely \(P_1\) first obtains \([\![{ H }]\!]\) and \([\![{ r }]\!]\) and then homomorphically evaluates \(C'\). Since the size of \(C'\) is \(\widetilde{O}(k\cdot \mathsf {poly}(\lambda ))\) and the depth of \(C'\) is constant assuming the PRG is computable by a circuit in \(\text {NC}^1\) [AIK05], the total communication complexity of this protocol is \(\widetilde{O}(k \cdot n \cdot \mathsf {poly}(\lambda ))\) and the round complexity is O(1).

Two-Party Case. Notice that for two parties, \(\mathcal {F}_{\mathsf {SingTest}}\) can be instantiated via Yao’s garbled circuits with communication complexity \(\widetilde{O}(k \cdot \mathsf {poly}(\lambda ))\).

6.2 Our Protocol

In this section we present our multi-party private intersection cardinality testing protocol. That is, the parties learn whether their sets satisfy \(\left| \left( \bigcup _{i=1}^n S_i\right) \setminus I \right| \le T\).

At a high level, our protocol first encodes each party \(P_i\)’s set as a polynomial \(\mathsf {p}_i(\text {x})=\sum _{j=1}^m \text {x}^{a^i_j}\) and defines \(\mathsf {p}(\text {x}) := (n-1)\mathsf {p}_1(\text {x}) - \sum _{i=2}^n \mathsf {p}_i(\text {x})\). Notice that a term \(\text {x}^a\) cancels out in the polynomial \(\mathsf {p}\) if and only if the element a is in the set intersection I. Therefore, the number of monomials in \(\mathsf {p}\) is exactly \(\left| \left( \bigcup _{i=1}^n S_i\right) \setminus I \right| \).
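The cancellation argument can be checked on a toy instance. The following is a minimal sketch, not part of the protocol: it works over the integers rather than a field, and the helper name `difference_polynomial` and the example sets are hypothetical.

```python
from collections import Counter

def difference_polynomial(sets):
    """Nonzero coefficients of p(x) = (n-1)*p_1(x) - sum_{i>=2} p_i(x),
    returned as {exponent: coefficient} (integer arithmetic, for illustration)."""
    n = len(sets)
    coeffs = Counter()
    for a in sets[0]:          # P_1 contributes (n-1) * x^a
        coeffs[a] += n - 1
    for s in sets[1:]:         # every other party contributes -x^a
        for a in s:
            coeffs[a] -= 1
    return {a: c for a, c in coeffs.items() if c != 0}

sets = [{1, 2, 3, 4}, {2, 3, 4, 5}, {3, 4, 6}]
p = difference_polynomial(sets)
union = set().union(*sets)
inter = set.intersection(*sets)
print(sorted(p), len(p), len(union - inter))
```

In this instance the surviving exponents are exactly the elements of the union that lie outside the intersection, so counting monomials of \(\mathsf {p}\) counts \(\left| \left( \bigcup _{i=1}^n S_i\right) \setminus I \right| \).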

To determine if the number of monomials in \(\mathsf {p}\) is \(\le T\), we can apply the polynomial sparsity test of Grigorescu et al. [GJR10] similarly as in [GS19a]. In particular, pick a field \(\mathbb {F}_q\), sample \(u {\mathop {\leftarrow }\limits ^{\$}}\mathbb {F}_q\) uniformly at random, and compute the Hankel matrix

$$ H= \left[ \begin{array}{cccccc} \mathsf {p}(u^0) &{} \mathsf {p}(u^1) &{} \dots &{} \mathsf {p}(u^{T})\\ \mathsf {p}(u^1) &{} \mathsf {p}(u^2) &{} \dots &{} \mathsf {p}(u^{T+1})\\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ \mathsf {p}(u^T) &{} \mathsf {p}(u^{T+1}) &{} \dots &{} \mathsf {p}(u^{2T})\\ \end{array} \right] .$$

Determining if the number of monomials in \(\mathsf {p}\) is \(\le T\) can be reduced to testing the singularity of H. In particular, we take the following theorem from [GJR10, Theorem 3] and [GS19a, Theorem 1].

Imported Theorem 7

Let \(q> T(T+1)(p-1)2^{\kappa }\) be a prime. If the number of monomials in \(\mathsf {p}\) is \(\le T\), then \(\Pr [\det (H)=0]=1\), and if the number of monomials in \(\mathsf {p}\) is \(> T\), then \(\Pr [\det (H)=0]\le 2^{-\kappa }\), where the probability is over the choice of u.
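To make the sparsity test concrete, here is a plaintext sketch (no secret sharing, no MPC) that builds H from a sparse polynomial and tests singularity by Gaussian elimination over \(\mathbb {F}_q\). The modulus `Q`, the helper names, and the cubic-time determinant are illustrative assumptions; `Q` is not chosen to satisfy the bound of the theorem, and the \(O(k \log ^2 k)\) algorithm of Imported Theorem 6 is not implemented here.

```python
import secrets

Q = 2**61 - 1  # a Mersenne prime, used only for illustration

def det_mod(matrix, q):
    """Determinant over F_q (q prime) via Gaussian elimination."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] % q != 0), None)
        if pivot is None:
            return 0                      # no pivot in this column: singular
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det                    # a row swap flips the sign
        det = det * m[col][col] % q
        inv = pow(m[col][col], -1, q)     # modular inverse (Python 3.8+)
        for r in range(col + 1, n):
            factor = m[r][col] * inv % q
            for c in range(col, n):
                m[r][c] = (m[r][c] - factor * m[col][c]) % q
    return det % q

def hankel(coeffs, u, T, q):
    """(T+1)x(T+1) matrix with H[r][c] = p(u^(r+c)), p given as {exponent: coeff}."""
    evals = [sum(c * pow(u, j * a, q) for a, c in coeffs.items()) % q
             for j in range(2 * T + 1)]
    return [[evals[r + c] for c in range(T + 1)] for r in range(T + 1)]

def at_most_T_monomials(coeffs, T, q=Q, u=None):
    """True if H is singular, i.e. the test reports at most T monomials."""
    if u is None:
        u = secrets.randbelow(q)          # u sampled uniformly from F_q
    return det_mod(hankel(coeffs, u, T, q), q) == 0

print(at_most_T_monomials({1: 2, 2: 1}, T=2, u=3))         # 2 monomials
print(at_most_T_monomials({1: 2, 2: 1, 5: -1}, T=2, u=3))  # 3 monomials
```

With at most T monomials, H is a sum of at most T rank-one matrices and is therefore always singular. With exactly \(T+1\) monomials, H factors through a Vandermonde matrix and is non-singular whenever the values \(u^{a}\) are distinct, which holds for the fixed `u=3` used above.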

In our multi-party private intersection cardinality testing protocol, the parties will first compute additive shares of H and then run a multi-party minimal polynomial computation protocol to jointly test the singularity of H. The protocol is presented in Fig. 8.

Fig. 8. Multi-party private intersection cardinality testing protocol \(\varPi _{\mathsf {CTest}\text {-}\mathsf {diff}}\).

Theorem 8

Let \(q> T(T+1)(p-1)2^{\kappa }\) be a prime. Assuming a threshold additive homomorphic encryption scheme with distributed setup, the protocol \(\varPi _{\mathsf {CTest}\text {-}\mathsf {diff}}\) (Fig. 8) securely realizes \(\mathcal {F}_{\mathsf {CTest}\text {-}\mathsf {diff}}\) in the \(\mathcal {F}_\mathsf {SingTest}\)-hybrid model.

Proof

Correctness. By the correctness of \(\mathcal {F}_\mathsf {SingTest}\), in Step 2 all the parties learn a bit b and \(b=0\) if and only if H is singular, where H is the Hankel matrix \(H = \sum _{i = 1}^{n}H_{i}\) and each Hankel matrix \(H_{i}\) is defined by the inputs of party \(P_{i}\) as

$$H_{i} = \left( \begin{array}{cccc} a_{0, i} &{} a_{1, i} &{} \ldots &{} a_{T, i} \\ a_{1, i} &{} a_{2, i} &{} \ldots &{} a_{T + 1, i} \\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ a_{T, i} &{} a_{T + 1, i} &{} \ldots &{} a_{2T, i} \end{array} \right) $$

for \(i = 1, \ldots , n\). By Imported Theorem 7, \(b=0\) if and only if \(\left| \left( \bigcup _{i=1}^n S_i\right) \setminus I \right| \le T\) with all but negligible probability. Therefore the protocol is correct with all but negligible probability.
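The additive structure \(H = \sum _{i=1}^{n} H_i\) can be illustrated in the clear. The sketch below assumes, consistently with the definition of \(\mathsf {p}\), that \(P_1\)'s local values are \(a_{j,1} = (n-1)\,\mathsf {p}_1(u^j)\) and that every other party uses \(a_{j,i} = -\mathsf {p}_i(u^j)\); the exact definition of \(a_{j,i}\) is given in Fig. 8 and may differ. The modulus `Q` and the helper names are hypothetical.

```python
Q = 2**61 - 1  # illustrative prime modulus (assumption, not from the paper)

def local_share(index, party_set, n, u, T, q=Q):
    """Hankel share H_i built from the 2T+1 local values a_{j,i}."""
    # Assumed weights: P_1 contributes (n-1)*p_1, every other party -p_i.
    weight = n - 1 if index == 1 else -1
    a = [weight * sum(pow(u, j * e, q) for e in party_set) % q
         for j in range(2 * T + 1)]
    return [[a[r + c] for c in range(T + 1)] for r in range(T + 1)]

def add_shares(shares, q=Q):
    """Entry-wise sum of the shares, i.e. H = H_1 + ... + H_n over F_q."""
    size = len(shares[0])
    return [[sum(s[r][c] for s in shares) % q for c in range(size)]
            for r in range(size)]

sets = [{1, 2, 3}, {2, 3, 4}, {2, 3, 5}]
n, T, u = len(sets), 2, 3
H = add_shares([local_share(i + 1, s, n, u, T) for i, s in enumerate(sets)])

# Direct evaluation of p(x) = 2x - x^4 - x^5 (the terms for 2 and 3 cancel).
evals = [(2 * pow(u, j, Q) - pow(u, 4 * j, Q) - pow(u, 5 * j, Q)) % Q
         for j in range(2 * T + 1)]
H_direct = [[evals[r + c] for c in range(T + 1)] for r in range(T + 1)]
print(H == H_direct)
```

Each party needs only its own set and the public point u to compute its share, which is why the shares can be formed locally before the singularity test.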

Communication Cost. The communication cost is the same as that of the protocol \(\varPi _{\mathsf {SingTest}}\). In particular, the round complexity is O(1) in a star network and the total communication complexity is \(\widetilde{O}(T\cdot n \cdot \mathsf {poly}(\lambda ))\).

Security. We construct a PPT \(\mathsf {Sim}\) which simulates the view of the corrupted parties. The simulator \(\mathsf {Sim}\) gets the output \(w\in \{\mathsf {similar},\mathsf {different}\}\) from the ideal functionality. Consistently with the correctness argument, where \(b=0\) indicates a singular H, \(\mathsf {Sim}\) sets a bit \(b^*=0\) if \(w=\textsf {similar}\) and \(b^*=1\) otherwise. Also, for each corrupt party \(P_i\), \(\mathsf {Sim}\) has as input the tuple \((S_i, r_i)\) indicating the party’s input and randomness for the protocol. The strategy of the simulator \(\mathsf {Sim}\) for our protocol is described below.

  1. Invoke the corrupted parties with their corresponding inputs and randomness.

  2. Play the role of the honest parties as follows: Run the protocol honestly. Note that \(P_{1}\) is the only party that ever sends a message, so this step in the simulation is trivial.

  3. In Step 2, play the role of \(\mathcal {F}_{\mathsf {SingTest}}\) and respond \(b^*\).

  4. Finally, output the view of the corrupted parties.

Next we argue that the view of the corrupted parties generated by \(\mathsf {Sim}\) is computationally indistinguishable from their view in the real world from \(\mathcal {Z}\)’s point of view. The only difference between the real and ideal worlds is that in the ideal world, the output from \(\mathcal {F}_{\mathsf {SingTest}}\) is replaced by 0 if \(\left| \left( \bigcup _{i=1}^n S_i\right) \setminus I \right| \le T\) and 1 otherwise. This is computationally indistinguishable from the real world because of the correctness of the protocol.

Corollary 2

Assuming TAHE with distributed setup, protocol \(\varPi _{\mathsf {CTest}\text {-}\mathsf {diff}}\) (Fig. 8) securely realizes \(\mathcal {F}_{\mathsf {CTest}\text {-}\mathsf {diff}}\) in the star network communication model with communication complexity \(\widetilde{O}(n\cdot T \cdot \mathsf {poly}(\lambda ))\) and round complexity O(1).

7 Threshold PSI for \(\mathcal {F_{{\mathsf {TPSI}\text {-}\mathsf {diff}}}}\)

Recall that in a multi-party threshold PSI protocol for functionality \(\mathcal {F_{{\mathsf {TPSI}\text {-}\mathsf {diff}}}}\) defined in Fig. 2, each party wishes to learn the intersection of all their sets if \(\left| \left( \bigcup _{i=1}^n S_i\right) \setminus I \right| \le T\), that is, if the size of the union of all their sets minus the intersection is at most the threshold T. In this section, we describe our multi-party threshold PSI protocol based on any protocol for multi-party private intersection cardinality testing. We rely on TAHE with distributed setup.

Theorem 9

Assuming threshold additive homomorphic encryption with distributed setup, protocol \(\varPi _{\mathsf {TPSI}\text {-}\mathsf {diff}}\) (Fig. 9) securely realizes \(\mathcal {F_{{\mathsf {TPSI}\text {-}\mathsf {diff}}}}\) in the \(\mathcal {F}_{{\mathsf {CTest}\text {-}\mathsf {diff}}}\)-hybrid model in the star network communication model. Our protocol is secure against a semi-honest adversary that can corrupt up to \((n-1)\) parties.

The protocol runs in a constant number of rounds and the communication complexity is \(O(n\cdot T \cdot \mathsf {poly}(\lambda ))\) in the \(\mathcal {F}_{{\mathsf {CTest}\text {-}\mathsf {diff}}}\)-hybrid model. We then instantiate the \(\mathcal {F}_{{\mathsf {CTest}\text {-}\mathsf {diff}}}\)-hybrid with the two protocols from the previous sections: one based on TFHE from Sect. 5.2 that has round complexity O(1) and \(O(n\cdot T \cdot \mathsf {poly}(\lambda ))\) communication complexity and the other based on TAHE from Sect. 6 that has round complexity O(1) and communication complexity \(\widetilde{O}(n\cdot T \cdot \mathsf {poly}(\lambda ))\). Formally, we get the following corollaries:

Corollary 3

Assuming TFHE (resp. TAHE) with distributed setup, protocol \(\varPi _{\mathsf {TPSI}\text {-}\mathsf {diff}}\) (Fig. 9) securely realizes \(\mathcal {F_{{\mathsf {TPSI}\text {-}\mathsf {diff}}}}\) in the star network communication model with communication complexity \(O(n\cdot T \cdot \mathsf {poly}(\lambda ))\) (resp. \(\widetilde{O}(n\cdot T \cdot \mathsf {poly}(\lambda ))\)) and round complexity O(1).

Our threshold PSI protocol for functionality \(\mathcal {F_{{\mathsf {TPSI}\text {-}\mathsf {int}}}}\) is almost identical and we defer the details to the full version.

7.1 Protocol

Consider n parties \(P_1,\ldots ,P_n\) with input sets \(S_1,\ldots ,S_n\) of size m and a star network where the central party is \(P_1\). The parties first run a private intersection cardinality testing protocol for functionality \(\mathcal {F}_{{\mathsf {CTest}\text {-}\mathsf {diff}}}\) from the previous sections and proceed only if \(\left| \left( \bigcup _{i=1}^n S_i\right) \setminus I \right| \le T\). Then, each party \(P_i\) encodes its set as a polynomial \(\mathsf {p}'_i(\text {x}) = (\text {x}-r_i)\cdot \prod _{j=1}^{m}(\text {x}-a^i_{j})\) where \(r_i\) is picked uniformly at random. The parties then compute \((3T+4)\) evaluations of the following polynomial \(\mathsf {V}(\cdot )\) on points \(1,\ldots ,(3T+4)\) using threshold additive homomorphic encryption: \(\mathsf {V}(\text {x}) = \sum _{i=1}^{n} \left( \mathsf {p}'_i(\text {x})\cdot \mathsf {R}_i(\text {x}) \right) \) where each \(\mathsf {R}_i(\cdot )\) is a uniformly random polynomial of degree T that is computed as the sum of n random polynomials, one generated by each party. Then, each party \(P_i\) interpolates the degree \((3T+3)\) rational polynomial \(\frac{\mathsf {V}(\cdot )}{\mathsf {p}'_i(\cdot )}\) using the \((3T+4)\) evaluations. Finally, each party outputs the intersection as \(S_i \setminus D_i\) where \(D_i\) denotes the roots of the denominator of the above interpolated rational polynomial. Our protocol is formally described in Fig. 9.

Two-Party Case. For two parties Alice and Bob, we can rely on AHE alone, where Alice holds the secret key. In particular, define \(\mathsf {V}(\text {x}) := \mathsf {p}_A(\text {x})\cdot \left( \mathsf {R}^A_1(\text {x}) + \mathsf {R}^B_1(\text {x})\right) + \mathsf {p}_B(\text {x}) \cdot \left( \mathsf {R}^A_2(\text {x}) +\mathsf {R}^B_2(\text {x})\right) \), where \((\mathsf {R}^A_1, \mathsf {R}^A_2)\) and \((\mathsf {R}^B_1, \mathsf {R}^B_2)\) are uniformly random polynomials of degree T generated by Alice and Bob, respectively. To obtain an evaluation of \(\mathsf {V}(x)\), Alice first sends an encryption of \(\mathsf {p}_A(x)\) and \(\mathsf {R}^A_2(x)\) to Bob. Then Bob homomorphically computes an encryption of \(r=\mathsf {p}_A(x)\cdot \mathsf {R}^B_1(x) + \mathsf {p}_B(x) \cdot \left( \mathsf {R}^A_2(x) +\mathsf {R}^B_2(x)\right) \) and sends it back. Alice can decrypt \([\![{ r }]\!]\) and compute \(\mathsf {V}(x)=\mathsf {p}_A(x) \cdot \mathsf {R}^A_1(x)+r\). The communication complexity is \(O(T \cdot \mathsf {poly}(\lambda ))\).
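The message flow of the two-party case can be traced with the encryption layer stripped away. The sketch below is plaintext arithmetic only: values that would travel as AHE ciphertexts are passed in the clear, so it checks the algebra of the exchange and provides no privacy. It assumes the root encoding \(\mathsf {p}_A(\text {x}) = \prod _{a \in S_A}(\text {x}-a)\) without the extra random root; the modulus `Q` and all function names are hypothetical.

```python
import secrets

Q = 2**61 - 1  # illustrative prime modulus (assumption)

def root_poly(roots, x, q=Q):
    """Evaluate prod_{a in roots} (x - a) mod q."""
    out = 1
    for a in roots:
        out = out * (x - a) % q
    return out

def rand_poly(degree, q=Q):
    """Coefficients (low to high) of a uniformly random polynomial."""
    return [secrets.randbelow(q) for _ in range(degree + 1)]

def ev(coeffs, x, q=Q):
    """Horner evaluation mod q."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % q
    return acc

def two_party_eval(S_A, S_B, T, x, q=Q):
    RA1, RA2 = rand_poly(T), rand_poly(T)   # Alice's masking polynomials
    RB1, RB2 = rand_poly(T), rand_poly(T)   # Bob's masking polynomials
    pA, pB = root_poly(S_A, x), root_poly(S_B, x)
    # Alice -> Bob: [[p_A(x)]] and [[R^A_2(x)]] (sent in the clear here).
    msg_pA, msg_RA2 = pA, ev(RA2, x)
    # Bob: linear combination using his own plaintext scalars; returns [[r]].
    r = (msg_pA * ev(RB1, x) + pB * (msg_RA2 + ev(RB2, x))) % q
    # Alice: decrypts r and adds her own term p_A(x) * R^A_1(x).
    V = (pA * ev(RA1, x) + r) % q
    # Reference value computed directly from the definition of V.
    direct = (pA * (ev(RA1, x) + ev(RB1, x))
              + pB * (ev(RA2, x) + ev(RB2, x))) % q
    return V, direct

V, direct = two_party_eval({1, 2, 3}, {2, 3, 4}, T=1, x=10)
print(V == direct)
```

Bob only ever multiplies received values by scalars he knows and adds the results, which is why an additively homomorphic scheme suffices for his step.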

Fig. 9. Multi-party threshold PSI protocol \(\varPi _{\mathsf {TPSI}\text {-}\mathsf {diff}}\) for functionality \(\mathcal {F_{{\mathsf {TPSI}\text {-}\mathsf {diff}}}}\).

7.2 Security Proof

Correctness. If \(\left| \left( \bigcup _{i=1}^n S_i\right) \setminus I \right| > T\), then the protocol terminates after the first step – private intersection cardinality testing. If, on the other hand, \(\left| \left( \bigcup _{i=1}^n S_i\right) \setminus I \right| \le T\), observe that polynomial \(\mathsf {V}(\text {x})\) can be rewritten as \(\sum _{i=1}^{n} \mathsf {p}'_i(\text {x})\cdot U_{i}(\text {x})\) where each \(U_i\) is a uniformly random polynomial of degree at most \(T+1\). Now, from the correctness of the TAHE scheme, each party \(P_i\) learns \(3T+4\) evaluations of the rational polynomial:

$$\begin{aligned} \mathsf {q}_i(\text {x}) = \frac{\mathsf {V}(\text {x})}{\mathsf {p}'_i(\text {x})} = \frac{\sum _{j=1}^{n} \mathsf {p}'_j(\text {x})\cdot U_{j}(\text {x})}{\mathsf {p}'_i(\text {x})} = \frac{\sum _{j=1}^n \mathsf {p}_{j\setminus I}(\text {x})\cdot (\text {x}-r_j) \cdot U_{j}(\text {x})}{\mathsf {p}_{i\setminus I}(\text {x})\cdot (\text {x}-r_i)}. \end{aligned}$$

Since \(|S_i \setminus I| \le T\) for each \(i\in [n]\), the numerator is a polynomial of degree at most \(2T+2\) and the denominator is a polynomial of degree at most \(T+1\). Further, since each \(U_i\) is uniformly random, we can show that the numerator is a random degree \(2T+2\) polynomial, and that the gcd of the polynomials in the numerator and denominator is 1 and hence no other terms will get canceled out. The algebraic proofs are deferred to the full version. Therefore, each party \(P_i\) can interpolate this rational polynomial using \(3T+4\) evaluation points and thereby learn the numerator and denominator. Finally, observe that for each party \(P_i\), the roots of the denominator are the elements of \(S_i\setminus I\) together with the random \(r_i\), from which \(P_i\) can easily compute the intersection I.
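The interpolation step can be sketched as a linear system over \(\mathbb {F}_q\): with a numerator of degree at most \(2T+2\) and a monic denominator of degree \(T+1\), there are \((2T+3)+(T+1)=3T+4\) unknown coefficients, matching the number of evaluation points. The example below is a toy under stated assumptions: the numerator is a fixed stand-in polynomial rather than the random one produced by the protocol, the denominator has degree exactly \(T+1\), and `Q`, `solve_mod`, and `rational_interpolate` are hypothetical names.

```python
Q = 2**61 - 1  # illustrative prime modulus (assumption)

def ev(coeffs, x, q=Q):
    """Horner evaluation of a coefficient list (low to high) mod q."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % q
    return acc

def solve_mod(A, b, q=Q):
    """Solve the square system A z = b over F_q (raises StopIteration if singular)."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col])
        M[col], M[pivot] = M[pivot], M[col]
        inv = pow(M[col][col], -1, q)
        M[col] = [v * inv % q for v in M[col]]
        for r in range(n):
            if r != col and M[r][col]:
                f = M[r][col]
                M[r] = [(vr - f * vc) % q for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

def rational_interpolate(xs, ys, dn, dd, q=Q):
    """Find N (deg <= dn) and monic D (deg dd) with N(x) = y * D(x) at all points."""
    A, b = [], []
    for x, y in zip(xs, ys):
        row = [pow(x, j, q) for j in range(dn + 1)]          # coefficients of N
        row += [-y * pow(x, j, q) % q for j in range(dd)]    # non-leading coeffs of D
        A.append(row)
        b.append(y * pow(x, dd, q) % q)                      # leading term of D
    z = solve_mod(A, b, q)
    return z[:dn + 1], z[dn + 1:] + [1]

T = 1
S_i, r_i = {10, 11, 12}, 20             # here I = {10, 11}, so S_i \ I = {12}
den_true = [240, -32 % Q, 1]            # (x - 12)(x - 20)
num_true = [7, 3, 0, 0, 1]              # stand-in numerator x^4 + 3x + 7
xs = list(range(1, 3 * T + 5))          # evaluation points 1, ..., 3T+4
ys = [ev(num_true, x) * pow(ev(den_true, x), -1, Q) % Q for x in xs]

num, den = rational_interpolate(xs, ys, 2 * T + 2, T + 1)
D_i = {a for a in S_i if ev(den, a) == 0}   # roots of the denominator inside S_i
intersection = S_i - D_i
print(sorted(intersection))
```

The party never needs to factor the denominator: it only tests which of its own elements are roots, which takes one evaluation per element of \(S_i\). Handling a denominator of degree strictly less than \(T+1\) requires extra care that this sketch omits.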

Communication Cost. The first phase of the protocol, namely private intersection cardinality testing, has a communication complexity of \(O(n\cdot T \cdot \mathsf {poly}(\lambda ))\) when instantiated with the TFHE-based scheme in Sect. 5.2 and a communication complexity of \(\widetilde{O}(n\cdot T\cdot \mathsf {poly}(\lambda ))\) when instantiated with the TAHE-based scheme in Sect. 6.

We now analyze the communication cost for the second phase where the parties compute the concrete intersection. The TAHE key generation is independent of the set sizes and the threshold T and has a communication complexity of only \(O(n \cdot \mathsf {poly}(\lambda ))\). The bottleneck of the protocol is in Step 3, that is, evaluating the random polynomial. In Steps 3b, 3d, and 3f, every party sends \(3T+4\) encryptions or partial decryptions to \(P_1\) hence the cost for these steps is \(O(n \cdot T \cdot \mathsf {poly}(\lambda ))\). In Steps 3c, 3e, and 3g, \(P_1\) sends \(3T+4\) ciphertexts or plaintexts to every other party so the cost of these steps is \(O(n\cdot T \cdot \mathsf {poly}(\lambda ))\). Finally, the last stage, namely computing the set intersection, does not involve any communication. Thus, the overall communication cost for computing the intersection is \(O(n\cdot T \cdot \mathsf {poly}(\lambda ))\).

Therefore, when the private intersection cardinality testing protocol is instantiated with the TFHE-based protocol, the overall communication complexity is \(O(n \cdot T \cdot \mathsf {poly}(\lambda ))\), and when it is instantiated with the TAHE-based scheme, the overall communication complexity is \(\widetilde{O}(n \cdot T \cdot \mathsf {poly}(\lambda ))\), for some a priori fixed polynomial \(\mathsf {poly}(\cdot )\). In both cases the cost is independent of the size m of each input set.

Security. Consider an environment \(\mathcal {Z}\) who corrupts a set \(\mathcal {S}^*\) of \(n^*\) parties where \(n^*<n\). The simulator \(\mathsf {Sim}\) has the output of the functionality \(\mathcal {F}_{{\mathsf {TPSI}\text {-}\mathsf {diff}}}\), namely the intersection set I or \(\bot \). \(\mathsf {Sim}\) sets \(w=\mathsf {similar}\) if the output is I and \(w=\mathsf {different}\) if the output is \(\bot \). In addition, \(\mathsf {Sim}\) has the tuple \((S_i, r_i)\) for each corrupt party \(P_i\) indicating the party’s input and randomness for the protocol. The strategy of the simulator \(\mathsf {Sim}\) for our multi-party threshold PSI protocol is described below.

  • (a) Private Intersection Cardinality Testing: \(\mathsf {Sim}\) plays the role of the ideal functionality \(\mathcal {F}_{{\mathsf {CTest}\text {-}\mathsf {diff}}}\) and responds with w.

  • (b) TAHE Key Generation: \(\mathsf {Sim}\) runs the distributed key generation algorithm \(\mathsf {TAHE.DistSetup}(1^\lambda ,i)\) of the TAHE scheme honestly on behalf of each honest party \(P_i\) as in the real world. Note that \(\mathsf {Sim}\) also knows \(\{\mathsf {sk}_i\}_{i \in \mathcal {S}^*}\) as it knows the randomness for the corrupt parties.

  • (c) Evaluations of Random Polynomial: \(\mathsf {Sim}\) does the following:

  1. Encode the intersection set \(I = \{b_1,\ldots ,b_{|I|}\}\) as a polynomial as follows: \(\mathsf {p}_I(\text {x}) = \prod _{i=1}^{|I|}(\text {x}-b_{i})\).

  2. Pick a random polynomial \(U(\cdot )\) of degree \(2T+2\) and set the polynomial \(\mathsf {V}(\text {x})\) as follows: \(\mathsf {V}(\text {x}) = \mathsf {p}_I(\text {x}) \cdot U(\text {x})\).

  3. In Steps 3b-3e, on behalf of every honest party \(P_i\), whenever \(P_i\) has to send any ciphertext, send \([\![{ 0 }]\!]\) using fresh randomness.

  4. For each \(x \in [3T+4]\), let \([\![{ v_x }]\!]\) denote the ciphertext that is sent to all the parties at the end of Step 3e.

  5. In Step 3f, for each \(x \in [3T+4]\), on behalf of each honest party \(P_i\), instead of computing \(\{[\![{ v_x : \mathsf {sk}_i }]\!]\}\) by running the honest \(\mathsf {TAHE.PartialDec}\) algorithm as in the real world, \(\mathsf {Sim}\) computes the partial decryptions by running the simulator \(\mathsf {TAHE.Sim}\) as follows: \(\{[\![{ v_x : \mathsf {sk}_i }]\!]\}\leftarrow \mathsf {TAHE.Sim}(\mathsf {C},\mathsf {V}(x),\) \([\![{ v_x }]\!],\{\mathsf {sk}_i\}_{i \in \mathcal {S}^*})\), where \(\mathsf {C}\) is the public linear circuit that \(P_1\) evaluates to compute \(\mathsf {V}(x)\).

  6. Finally, in Step 3g, if \(P_1\) is honest, send the evaluations of polynomial \(\mathsf {V}(x)\) as in the real world description.

Hybrids. We now show that the above simulation strategy is successful against all environments \(\mathcal {Z}\) that corrupt parties in a semi-honest manner. That is, the view of the corrupt parties along with the output of the honest parties is computationally indistinguishable in the real and ideal worlds. We will show this via a series of computationally indistinguishable hybrids where the first hybrid \(\mathsf {Hybrid}_0\) corresponds to the real world and the last hybrid \(\mathsf {Hybrid}_4\) corresponds to the ideal world.

  • \(\mathsf {Hybrid}_0\) - Real World: In this hybrid, consider a simulator \(\mathsf {SimHyb}\) that plays the role of the honest parties as in the real world.

  • \(\mathsf {Hybrid}_1\) - Private Intersection Cardinality Testing: In this hybrid, \(\mathsf {SimHyb}\) plays the role of the ideal functionality \(\mathcal {F}_{{\mathsf {CTest}\text {-}\mathsf {diff}}}\) and responds with \(\mathsf {similar}\) if \(\left| \left( \bigcup _{i=1}^n S_i\right) \setminus I \right| \le T\) and \(\mathsf {different}\) otherwise.

  • \(\mathsf {Hybrid}_2\) - Simulate Partial Decryptions: In this hybrid, in the evaluations of random polynomial, \(\mathsf {SimHyb}\) simulates the partial decryptions generated by the honest parties in Step 3f as done in the ideal world. That is, for each \(x \in [3T+4]\), \(\mathsf {SimHyb}\) computes the partial decryptions as \(\{[\![{ v_x : \mathsf {sk}_i }]\!]\} \leftarrow \mathsf {TAHE.Sim}(\mathsf {C},\mathsf {V}(x),\) \([\![{ v_x }]\!],\{\mathsf {sk}_i\}_{i \in \mathcal {S}^*})\). Observe that the polynomial \(\mathsf {V}(\cdot )\) is still computed as in the real world (and in \(\mathsf {Hybrid}_1\)).

  • \(\mathsf {Hybrid}_3\) - Switch Polynomial Computation: In this hybrid, the polynomial \(\mathsf {V}(\cdot )\) is no longer computed as in the real world. Instead, \(\mathsf {SimHyb}\) now picks a random polynomial \(U(\cdot )\) of degree \(2T+2\) and sets the polynomial \(\mathsf {V}(\cdot )\) as follows: \(\mathsf {V}(\text {x}) = \mathsf {p}_I(\text {x}) \cdot U(\text {x})\).

  • \(\mathsf {Hybrid}_4\) - Switch Encryptions: In this hybrid, in the evaluations of random polynomial, \(\mathsf {SimHyb}\) now computes every ciphertext generated on behalf of any honest party as encryptions of 0 as done by \(\mathsf {Sim}\) in the ideal world. This hybrid corresponds to the ideal world.

We show that every pair of consecutive hybrids is computationally indistinguishable in the full version.