MPClan: Protocol Suite for Privacy-Conscious Computations

The growing volume of data being collected, and its analysis to provide better services, is creating worries about digital privacy. To address privacy concerns and provide practical solutions, the literature has relied on secure multiparty computation techniques. However, recent research over rings has mostly focused on the small-party honest-majority setting of up to four parties tolerating a single corruption, citing efficiency concerns. In this work, we extend these strategies to support higher resiliency in the honest-majority setting, with the efficiency of the online phase at centre stage. Our semi-honest protocol improves the online communication of the protocol of Damgård and Nielsen (CRYPTO'07) without inflating the overall communication. It also allows shutting down almost half of the parties in the online phase, thereby saving up to 50% in the system's operational costs. Our maliciously secure protocol enjoys similar benefits and requires only half of the parties, except for a one-time verification towards the end, and provides security with fairness. To showcase the practicality of the designed protocols, we benchmark popular applications such as deep neural networks, graph neural networks, genome sequence matching, and biometric matching using prototype implementations. Our protocols, in addition to improved communication, bring up to 60–80% savings in monetary cost over prior work.


I. INTRODUCTION
Today's world is seeing a visible transition from offline services to a heavy dependency on online platforms for banking, socializing, healthcare, etc. This is leading to an increased user presence online, which leaves a trail of online activity and personal data over the Internet. The availability of such user-specific data opens up possibilities for its misuse. For instance, much concern has been raised regarding advertisement service providers such as Google and Facebook breaching user privacy for targeted advertisement services [34]. In the process of providing enhanced targeted advertisement services, service providers allegedly learn more information about their users than they are entitled to (e.g., a user's shopping activity, browsing history) from various data collection entities. These entities collect user data via website cookies, loyalty cards, etc. [65]. While such targeted advertisements offer a personalized online experience, they may come at the cost of revealing unauthorized user data to these service providers. Such a challenge is also encountered in the healthcare sector. Collaborative analysis among healthcare institutes over patient data is known to facilitate better diagnosis and improved treatment. However, laws such as GDPR, which prevent the sharing of patient records, hinder such collaborations, thereby re-emphasizing the need for mechanisms that enable privacy-preserving computations.
Such privacy-preserving computations can be facilitated via several privacy-enhancing technologies such as homomorphic encryption [16], [41], differential privacy [36], and secure multiparty computation [89], [11], [43], to name a few. We focus on secure multiparty computation (MPC) as it has been the cornerstone of research lately, showcasing its effectiveness in various applications such as privacy-preserving machine learning [56], [68], [85], secure collaborative analytics [75], secure genome matching [79], [5], etc. Essentially, it offers a solution to the potential privacy issues that may arise in collaborative computation scenarios such as the targeted advertisements described earlier. MPC allows mutually distrusting parties to perform computations on their private inputs such that they learn nothing beyond the output of the computation. The distrust among the parties is captured by the notion of a centralized adversary, which is said to corrupt up to t out of the n participating parties. Depending on its behaviour, the adversary can be categorized as either semi-honest or malicious [42]. A semi-honest adversary models the corruption scenario where the corrupt parties are restricted to following the protocol, whereas under the stronger notion of malicious corruption they may deviate from it arbitrarily.
MPC with an honest majority, where only a minority of the parties are corrupt, enables the construction of efficient protocols for multiple parties [13], [31], [1], [48], [12], [75]. The recent concretely efficient protocols have only considered a small number of parties [56], [74], [22], [85], [68], [28], [66], [84], which restricts the number of corruptions to at most one (t = 1). Although the small-party setting has found application in the outsourced computation paradigm too, the generic multiparty setting is a better fit for real-world deployments due to its resiliency to a higher number of corruptions (t < n/2). Thus, for larger n, the number of corruptions that can be tolerated is also higher, thereby increasing the trust in the system. Moreover, the multiparty setting allows for privacy-conscious computations even in a non-outsourced deployment scenario, such as in providing targeted advertisement services (described in Fig. 1 and elaborated below), when outsourcing the computation is not feasible/preferable. Hence, to design efficient protocols, we focus on honest-majority multiparty computation.

a) Use Case: Consider the scenario of targeted advertisement services depicted in Fig. 1(a). Typically, data collection entities track a user's online activities via website cookies while the user browses the Internet (1). Also known as cookie profiling, such data collection allows the entities to create a "profile" for each user, which may contain information such as browsing habits, gender, marital status, and age, to name a few, as shown in (2). These profiles can facilitate targeted advertisements via specialized algorithms (4), which is leveraged by advertisement service providers such as Google and Facebook. While such services offer a personalized experience, they come at the expense of users' private data being revealed to the service providers, as indicated in (3). A feasible solution (Fig. 1(b)) instead is to place a solution box at the interface between these service providers and the data collection entities, such that it provides mechanisms to ensure the privacy of user data while also facilitating the required computations over the same (to provide targeted advertisement services). MPC, being a technology that supports privacy-preserving computations, lends itself well to such tasks. Instead of the data collection entities directly revealing the user data to the advertisement service providers, they can engage in an instance of an MPC protocol (3) which securely runs the required algorithm on the user data while maintaining its privacy. Moreover, such a computation does not require the data collection entities to reveal their data to each other, thus offering a viable solution. Furthermore, as studied in [49], the effectiveness of targeted advertisements can greatly benefit from the use of machine learning algorithms. In particular, neural networks, and more recently graph neural networks [24], [72], [90], [62], [88], have shown the potential to better analyse the data available via user profiles, in turn allowing for a refined personalized experience. We thus focus on protocols for securely evaluating standard neural networks such as VGG16 [82] (a deep neural network) and a graph neural network, and provide benchmarks for the same in Section V.

A. Related work
We restrict the related work to MPC protocols in the honest-majority setting. Besides the interest in MPC for a small population [4], [3], [39], [23], [1], [21], [74], [22], [18], [56], [85], [28], MPC protocols for an arbitrary number of parties have also been studied extensively [38], [31], [48], [6], [8], [10], [15], [13], [17], [78], [2], [19], [12], [45]. In the honest-majority (t < n/2) semi-honest setting, [31], [40] form the state-of-the-art MPC protocols over fields in the information-theoretic setting. This was further optimized in the computational setting in [13] using a one-time setup for correlated randomness. We will often refer to this optimized honest-majority semi-honest protocol of [31] as DN07. In the information-theoretic setting, the work of [46] improves upon the communication and round complexity of [31]. The work of [38] recently demonstrates MPC protocols in the honest-majority setting in the preprocessing model with malicious security, which require communicating 3t field elements in the online as well as the preprocessing phase. We observe that the semi-honest protocol derived from this requires communicating 2t elements in the online and 3t elements in the preprocessing phase. The recent works of [12], [6] provide semi-honest MPC protocols which require each party to communicate roughly t elements per multiplication gate, resulting in communication quadratic in the number of parties. DN07 has served as the basis for obtaining malicious security for free (i.e., an amortized communication cost of 3t elements per multiplication gate) in the computational setting [13], [15] as well as in the information-theoretic setting [48], [46]. Both [48] and [15] follow the approach of executing a semi-honest protocol, followed by a verification phase to check the correctness of multiplications, which involves heavy polynomial interpolation operations. As mentioned earlier, the recent work of [38] focuses on maliciously secure protocols for the honest-majority setting in the preprocessing model. Their protocol relies on an instantiation of [48] in the preprocessing phase that requires communicating 3t elements, while requiring another 3t elements of communication in the online phase. However, their protocol is inefficient due to a consistency check required after each level of multiplication, which introduces a depth-dependent overhead in communication complexity. The absence of this check results in a privacy breach, as described in [47] and elaborated in §B-B0c.

B. Towards practically efficient protocols
Before stating our contributions, we elaborate on the choices made in designing a practically efficient protocol.
2. Algebraic structure. To further enhance efficiency by utilizing the underlying CPU architecture, several protocols work over rings [68], [56], [22], [85], [57], [66]. We follow this approach and design MPC protocols operating over the ring Z_{2^ℓ}, relying on replicated secret sharing (RSS). Note that the use of RSS inherently results in an exponential blow-up in the number of shares for an arbitrary number of parties. Hence, it is well suited for practically-oriented scenarios comprising a constant number of parties [13], [15], to which we restrict ourselves when benchmarking our protocols.
3. Masked evaluation. To make our protocols efficient in the preprocessing paradigm, we use the masked evaluation paradigm, a variant of the replicated secret sharing scheme. Here, the secret data is masked using a masking value, and the mask is RSS-shared. The computation is then carried out on the publicly available masked values and the shared masks. This technique was first introduced in the context of circuit garbling schemes (see [64], [87]), and was then adapted to secret sharing-based protocols in the dishonest-majority setting (see [51], [9]). It was later applied to small-population honest-majority settings such as [44], [21], [74], [28], [56] and [57] to aid in the development of practically efficient protocols.
4. Adversarial strategy. Based on the deployment scenario, different levels of security may be desired. While semi-honest security suffices for several applications, as shown in [4], [59], [21], [70], [5], [79], [20], [83], malicious security is always desirable. Thus, to cater to different scenarios, our protocols are designed to provide both semi-honest and malicious security, where each security goal has its merit.
5. Monetary cost. To reduce the operational costs in the online phase, several recent works [74], [56], [22], [57] reduce the number of (online) computing parties. This is useful in long computations such as those involved in privacy-preserving machine learning (PPML) applications, which span several days or even weeks. Reducing the number of online parties is especially advantageous for protocols deployed in the secure outsourced computation (SOC) setting, since one has to pay for the up-time of every hired server. Shutting down even a single server significantly helps in reducing the monetary cost [67], [57] of the system. We thus focus on ensuring the participation of a minimal number of parties during the online computation in our protocols. This is achieved for the first time in generic n-party protocols. Specifically, all the protocols for the semi-honest setting in our framework benefit from using only t + 1 parties in the online phase. The protocols in the malicious setting also enjoy this benefit, except that the remaining t parties are required to come online for a short verification phase at the end. The reduction in online parties improves the operational cost of the framework by almost 50%. This is unlike prior works [31], [13], [15], [48], [46], which require active participation from all parties throughout the computation.

C. Our Contributions
We begin with a quick overview of the contributions of this work, followed by the details.
• We construct an n-party semi-honest protocol in the preprocessing paradigm which offers a faster online phase than the decade-old state-of-the-art protocol of [31], without inflating its total cost. Moreover, our protocol reduces the number of active parties in the online phase, thereby improving the system's operational cost when deployed in the SOC setting.
• We extend our semi-honest protocol to the malicious setting, while retaining the benefit of requiring a reduced number of parties in the online phase for the majority of the computation. Over the state-of-the-art protocol of [38], we offer the stronger security guarantee of fairness, and an O(d) improvement in round complexity, where d denotes the depth of the circuit to be evaluated.
• We provide support for 3- and 4-input multiplication at the same online complexity as that of 2-input multiplication. In addition to improving the communication cost over the approach of sequential multiplications, multi-input multiplication offers a 2× improvement in round complexity, which is beneficial in high-latency networks.
• We design building blocks for a range of applications such as deep neural networks, graph neural networks, genome sequence matching, and biometric matching. When the applications are benchmarked, our semi-honest protocol witnesses savings of up to 69% in monetary cost, and 3.5× to 4.6× improvements in online run time and throughput over [31]. Interestingly, our maliciously secure protocol outperforms the semi-honest protocol of [31] in terms of online run time and throughput for the applications under consideration, achieving the goal of a fast online phase.
We now elaborate on the contributions and highlight the technical details and novelty of our work.

Fig. 2: Hierarchy of primitives in our 3-tier framework

Our protocol suite follows a 3-tier architecture (Fig. 2) to attain the final goal of privacy-conscious computations. The first tier comprises fundamental primitives such as input sharing, reconstruction, multiplication (with truncation), and multi-input multiplication. The second tier includes building blocks such as dot product, matrix multiplication, conversion between the Boolean and arithmetic worlds, comparison, equality, and non-linear activation functions, to name a few, as required in the applications considered. Finally, the third tier is the applications. Our main contribution lies in Tier I and is detailed below.

1) Tier I -MPC protocols:
Our goal is to design protocols with a fast online phase. Thus, working over Z_{2^ℓ} and relying on RSS, we design a semi-honest MPC protocol in the computational setting, assuming a one-time shared-key setup for correlated randomness.
Note that the straightforward extension of the semi-honest multiplication protocol of [31] to the preprocessing model, which can also be derived from the recent work of [38], incurs a communication of 3t elements in the preprocessing phase while communicating 2t elements in the online phase. This amounts to a 1.6× overhead in total cost over [31]. Our contribution lies in ensuring a fast online phase without inflating the total communication cost of the protocol. Specifically, our protocol requires communicating only 2t ring elements in the online phase and t in the preprocessing phase, per multiplication gate. We are the first to achieve a communication cost of 2t in the online phase (unlike 3t in the prior works [31], [40]) without incurring any overhead in the total cost, i.e., our total cost still matches that of the best known (optimized) semi-honest honest-majority protocol [31], [40].
We extend our protocol to provide malicious security with fairness at the cost of additionally communicating t elements in the online phase and 2t in the preprocessing phase. Although the (abort-secure) protocol of [38] has the same communication as our maliciously secure protocol, we achieve the stronger security notion of fairness. Moreover, [38] requires an additional round of communication for consistency checks after each level, the absence of which results in a privacy breach (described in [47] and elaborated in §B-B0c), and necessitates participation from all parties. By relying on a variant of RSS, our protocol instead avoids the consistency check after each level of circuit evaluation while ensuring privacy. Notably, we only require participation from all parties for a one-time verification at the end of evaluation, thus reducing the number of rounds by d (where d denotes the circuit depth).
3- and 4-input multiplications: Following [73], [71], [57], to reduce the online communication cost and round complexity, we design protocols that enable the multiplication of 3 and 4 inputs in a single shot. Compared to the naive approach of performing sequential multiplications to multiply 3/4 inputs, the multi-input multiplication protocol enjoys the benefit of having the same online phase complexity as that of the 2-input multiplication protocol. This brings in a 2× improvement in the online round complexity and improves the online communication cost. Support for multi-input multiplication enables the use of optimized adder circuits [73] for secure comparison and Boolean addition, thereby resulting in a faster online phase. The recent work of [46] also proposes a method to improve the round complexity of circuit evaluation by evaluating all gates in two consecutive layers of a circuit in parallel. We observe that their method can be viewed as a variant of multi-input multiplication with 3 and 4 inputs. Thus, our protocols need not be limited to facilitating faster comparison and Boolean addition alone (as described above), but can be used to reduce the round and communication complexity of any general circuit evaluation. Note that [46] only improves the round complexity (2×) without inflating the communication cost when compared to [31]. We, however, improve the round complexity (2×) as well as the online communication by trading off an increase in the preprocessing.
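To illustrate why a 3-input product needs only a single online reconstruction, the expansion of abc in terms of the public masked values and the (preprocessed) cross-terms of the masks can be checked with plain modular arithmetic. The sketch below uses our own variable names (not the paper's code), writing m_x = x + λ_x:

```python
import random

# Minimal sketch: expanding abc = (m_a - la)(m_b - lb)(m_c - lc) shows that
# the online phase only needs the public m-values and (preprocessed) additive
# shares of la*lb, lb*lc, la*lc, and la*lb*lc -- hence one online round.
MOD = 1 << 64
a, b, c = 3, 5, 7
la, lb, lc = (random.randrange(MOD) for _ in range(3))
m_a, m_b, m_c = (a + la) % MOD, (b + lb) % MOD, (c + lc) % MOD

abc = (m_a * m_b * m_c
       - m_a * m_b * lc - m_a * m_c * lb - m_b * m_c * la
       + m_a * lb * lc + m_b * la * lc + m_c * la * lb
       - la * lb * lc) % MOD
assert abc == (a * b * c) % MOD
```

The quadratic and cubic mask terms (la*lb, la*lb*lc, etc.) are exactly what the modified preprocessing must additively share, which is why the online complexity matches that of a 2-input multiplication.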
2) Tier II - Building Blocks: We design efficient protocols for several building blocks in the semi-honest and malicious settings, which are stepping stones for the Tier III applications. These are extensions from the small-party setting [68], [74], [56], [73], and hence we defer the details to §C-A (semi-honest) and §C-B (malicious).
3) Tier III - Applications: To showcase the practicality of our framework and the improvements of our protocols, we benchmark a range of applications such as neural networks (NN), including the popular deep NN VGG16 [82], graph neural networks, genome sequence matching, and biometric matching, considered for the first time in the n-party honest-majority setting. We benchmark the applications in the WAN setting using Google Cloud instances. As mentioned, owing to the inherent restrictions of RSS and keeping the focus on practical scenarios, we showcase the performance of our protocols for n = 5, 7, and 9, and compare with the state-of-the-art semi-honest protocol of [31].
1. Deep neural networks. We benchmark the inference phase of deep neural networks such as LeNet [60] and VGG16 [82]. We observe savings of up to 69% in monetary cost, and improvements of up to 4.3× in online run time and throughput, in comparison to [31].
2. Graph neural network. We benchmark the inference phase of a graph neural network [35], [81] on the MNIST [61] data set. In comparison to [31], our protocol improves online run time by up to 3.5×, and sees up to 15% savings in monetary cost.
3. Genome sequence matching. We demonstrate an efficient protocol for similar sequence queries (SSQ), which can be used to perform secure genome matching. Our protocol is based on the protocol of [79], which works for 2 parties and uses an edit distance approximation [5]. We extend and optimize the protocol for the multiparty setting. In comparison to [31], we witness improvements of up to 4× in online run time and throughput, and savings of 66% in monetary cost.
4. Biometric matching. We propose efficient protocols for computing the Euclidean distance (ED), which forms the basis for biometric matching. Continuing the trend, we witness a 4.6× improvement in online run time and throughput compared to [31], and savings of up to 85% in monetary cost.

II. PRELIMINARIES
We cast our protocols in the (function-dependent) preprocessing paradigm to enable a fast online phase. Parties rely on a one-time shared-key setup (see §A) [74], [22], [56], [68], [4], [18] to enable non-interactive generation of correlated randomness. Our protocols are designed for rings (Z_{2^ℓ}). We use the fixed-point arithmetic (FPA) [70], [68], [21], [22], [74], [56] representation to operate over decimal values. Here, a decimal value is represented as an ℓ-bit integer in signed 2's complement representation. The most significant bit (msb) represents the sign bit, and the d least significant bits are reserved for the fractional part. The ℓ-bit integer is then treated as an element of Z_{2^ℓ}, and operations are performed modulo 2^ℓ. We let ℓ = 64 and d = 13, with ℓ − d − 1 bits for the integer part. This work considers both semi-honest and malicious adversarial models with static and at most t < n/2 corruptions. The security of the constructions is proved using the real-world/ideal-world simulation paradigm [63], and the details are provided in §E. Let P = {P_1, P_2, ..., P_n} denote the set of n parties, connected by pair-wise private and authentic channels in a synchronous network. Set E = {P_1, P_2, ..., P_{t+1}}, termed the evaluator set, comprises parties that are active during the online phase. Set D = {P_{t+2}, P_{t+3}, ..., P_n}, termed the helper set, comprises parties that help in the preprocessing phase, and in the online verification in the malicious setting. Parties agree on a P_king ∈ E. Without loss of generality, let P_king = P_{t+1}.

a) Sharing semantics: We use the following sharing semantics, based on the RSS and additive sharing schemes, which facilitate a fast online phase.
• [[·]]-sharing: This denotes the replicated secret sharing (RSS) of a value with threshold t. A value a ∈ Z_{2^ℓ} is said to be RSS-shared with threshold t if for every subset T ⊂ P of n − t parties there exists a_T ∈ Z_{2^ℓ} possessed by all P_i ∈ T such that a = Σ_T a_T. Alternatively, for every set of t parties, the residual h = n − t parties, forming the set T, hold the share a_T. Let T_1, T_2, ..., T_q ⊂ P be the distinct subsets of size h, where q = (n choose h) represents the total number of shares. Since P_i belongs to (n−1 choose h−1) such sets, the tuple of shares {a_T} that it possesses is denoted as [[a]]_i.
• [·]-sharing: A value a ∈ Z_{2^ℓ} is said to be [·]-shared (additively shared) among a set of parties if each party P_i in the set holds a share [a]_i such that a = Σ_i [a]_i. We refer to this sharing scheme as (t + 1)-additive sharing, and use [a]^E to denote such a sharing among parties in E.
(Table fragment, helper primitives: Π_pRand takes as input the identity of a party P_s and outputs a [[·]]-sharing of a random value r ∈ Z_{2^ℓ} such that P_s learns all shares.)
• ⟨·⟩-sharing: A value a ∈ Z_{2^ℓ} is said to be ⟨·⟩-shared in the semi-honest setting if there exist values λ_a, m_a ∈ Z_{2^ℓ} such that m_a = a + λ_a, where λ_a is [[·]]-shared among P and every P_i ∈ E holds m_a. We denote the shares of P_i ∈ D by ⟨a⟩_i = [[λ_a]]_i and those of P_i ∈ E by ⟨a⟩_i = (m_a, [[λ_a]]_i). In the malicious setting, m_a is held by all parties, and ⟨a⟩_i = (m_a, [[λ_a]]_i) for all P_i ∈ P.
It is trivial to see that all the sharing schemes mentioned above are linear. This allows parties to compute linear operations, such as addition and multiplication with constants, locally. The Boolean world operates over Z_2, and we denote the corresponding Boolean sharing with a superscript B. Notations are summarized in Table II.
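To illustrate the linearity claim, a minimal sketch (illustrative Python over the 64-bit ring; for brevity the mask λ is additively shared rather than RSS-shared, which does not affect linearity) of computing c1·a + c2·b locally on ⟨·⟩-shares:

```python
import random

# Sketch: linearity of <.>-sharing. With m_a = a + lambda_a held publicly
# and lambda_a shared, c1*a + c2*b is computed with no interaction.
MOD = 1 << 64

def share(a, n):
    """Return (m_a, additive shares of lambda_a) for a fresh random mask."""
    parts = [random.randrange(MOD) for _ in range(n)]
    lam = sum(parts) % MOD
    return (a + lam) % MOD, parts

n, a, b, c1, c2 = 5, 11, 7, 3, 10
m_a, la = share(a, n)
m_b, lb = share(b, n)

m_c = (c1 * m_a + c2 * m_b) % MOD                        # public part, local
lc = [(c1 * x + c2 * y) % MOD for x, y in zip(la, lb)]   # per-party, local
# m_c - lambda_c reconstructs c1*a + c2*b
assert (m_c - sum(lc)) % MOD == (c1 * a + c2 * b) % MOD
```

Since every party only scales and adds values it already holds, linear gates are free of communication, which is what makes the function-dependent preprocessing worthwhile.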
Table II (notation summary): n denotes the total number of parties, with t corrupt and h = t + 1 honest; T_1, ..., T_q are the q = (n choose h) distinct subsets of P with t + 1 parties each; q is the number of replicated secret shares (RSS) of a value. We additionally use helper primitives from [13], [15], [74], [26] in our protocols, and their details are deferred to §A-A. The Boolean variants of the corresponding primitives are denoted with a superscript B.
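The share counts in Table II can be sanity-checked with a short sketch (plain Python, not the paper's implementation) that RSS-shares a ring element and verifies how many shares each party holds:

```python
from itertools import combinations
import math
import random

MOD = 1 << 64

def rss_share(a, n, t):
    """Split a into RSS shares: one share a_T per subset T of size h = n - t."""
    h = n - t
    subsets = list(combinations(range(1, n + 1), h))        # T_1, ..., T_q
    shares = {T: random.randrange(MOD) for T in subsets[:-1]}
    shares[subsets[-1]] = (a - sum(shares.values())) % MOD  # a = sum_T a_T
    return shares

n, t = 5, 2
shares = rss_share(42, n, t)
assert len(shares) == math.comb(n, n - t)            # q = C(n, h) = 10 shares
held = [sum(1 for T in shares if i in T) for i in range(1, n + 1)]
assert held == [math.comb(n - 1, n - t - 1)] * n     # each party holds C(n-1, h-1)
assert sum(shares.values()) % MOD == 42              # shares reconstruct a
```

For n = 5, t = 2 this gives 10 shares in total with 6 held per party, which also illustrates the exponential blow-up of RSS for larger n noted earlier.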

III. MPCLAN PROTOCOL
This section details the semi-honest MPC protocol execution, performed over the ring Z_{2^ℓ}, which comprises three phases: input sharing, evaluation (linear operations and multiplication), and output reconstruction.

a) Input Sharing and Output Reconstruction: To enable P_s ∈ P to ⟨·⟩-share a value v ∈ Z_{2^ℓ}, parties first non-interactively sample the [[·]]-shares of λ_v, relying on the shared-key setup, such that P_s learns all these shares in the clear (via Π_pRand). This enables P_s to compute and send m_v = v + λ_v to parties in E, thereby generating ⟨v⟩.
To reconstruct v towards all parties given ⟨v⟩, parties in E non-interactively generate its additive shares. These parties send their additive shares to P_king, who computes and sends v to all parties. Reconstruction towards a single party, say P_s, proceeds similarly, except that the protocol terminates after parties in E send their additive shares of v to P_king = P_s, who then computes v.

b) Evaluation: Evaluation comprises the linear operations of addition and multiplication with a public constant, and non-linear operations such as multiplication. Parties can non-interactively compute linear operations owing to the linearity of the ⟨·⟩-sharing. Concretely, given ⟨a⟩, ⟨b⟩ and public constants c_1, c_2, parties can non-interactively compute ⟨c_1 a + c_2 b⟩ as c_1⟨a⟩ + c_2⟨b⟩.
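The P_king-based reconstruction above can be sketched as follows (illustrative Python; the function name is ours, not the paper's):

```python
import random

# Sketch: reconstruction of v via P_king. Parties in E hold (t + 1)-additive
# shares of v; each sends its share to P_king, who sums them and forwards v.
MOD = 1 << 64

def king_reconstruct(add_shares):
    # P_king locally sums the received additive shares to recover v.
    return sum(add_shares) % MOD

v, t = 2024, 2
shares = [random.randrange(MOD) for _ in range(t)]
shares.append((v - sum(shares)) % MOD)    # (t + 1)-additive sharing of v
assert king_reconstruct(shares) == v
```

The same star-shaped communication pattern (everyone-to-P_king, then P_king-to-everyone) reappears in the multiplication protocol below, which is what keeps the online communication linear in t.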
To compute ⟨·⟩-shares of non-linear operations such as multiplication, say z = ab given ⟨a⟩, ⟨b⟩, parties proceed as follows. At a high level, the approach is to enable generation of ⟨z − r⟩ and ⟨r⟩ for a random r ∈ Z_{2^ℓ}, which enables parties to non-interactively compute ⟨z⟩ = ⟨z − r⟩ + ⟨r⟩. Observe that ⟨r⟩ can be generated non-interactively by locally sampling each of its shares. To generate ⟨z − r⟩, we let parties in E obtain z − r, following which ⟨z − r⟩ can be generated non-interactively (all parties set their shares of λ_{z−r} to 0, and parties in E set m_{z−r} = z − r). Observe that z remains private while revealing z − r to parties in E, since r is a random mask not known to the adversary.
To enable parties in E to obtain z − r, we let z − r = D + E, where D is additively shared among parties in D while E is additively shared among parties in E (D and E are defined below). Thus, to reconstruct z − r towards parties in E, parties send their respective additive shares of D or E to P_king. P_king reconstructs D, E, and sends z − r = D + E to parties in E. Elaborately, as seen in [21], [57], writing Λ_ab = λ_a λ_b and M_ab = m_a m_b, z − r can be computed as

z − r = ab − r = (m_a − λ_a)(m_b − λ_b) − r = M_ab − m_a λ_b − m_b λ_a + Λ_ab − r,

where D collects the additive shares of Λ_ab − r held by parties in D, and E collects M_ab together with the terms −m_a λ_b − m_b λ_a and the additive shares of Λ_ab − r held by parties in E. We next detail the steps in the multiplication protocol; its schematic representation is provided in Fig. 3.

Step 2: This step involves computing additive shares of Λ_ab − r among all parties. For this, parties non-interactively generate the required additive shares of Λ_ab − r.

Step 3: Parties in E generate additive shares of λ_a, λ_b among themselves.

Step 4: Parties in D send their additive shares of D (as defined in Step 2) to P_king, who reconstructs D.

Step 5: Note that it suffices for only one designated party in E to add M_ab to its share of E, and without loss of generality we let this designated party be P_king (= P_{t+1} in our case). Parties in E send their additive shares of E to P_king, who reconstructs E and sends z − r = D + E to parties in E.
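The decomposition z − r = D + E can be checked locally with plain modular arithmetic. The sketch below uses our own names (Lab for Λ_ab) and folds the E-parties' λ-share contributions directly for brevity:

```python
import random

MOD = 1 << 64
n, t = 5, 2                      # |E| = t + 1 = 3, |D| = t = 2
a, b = 123, 456
la, lb, r = (random.randrange(MOD) for _ in range(3))
m_a, m_b = (a + la) % MOD, (b + lb) % MOD
Lab = la * lb % MOD              # Lambda_ab, available from preprocessing

# Step 2: additive shares of Lambda_ab - r among all n parties.
parts = [random.randrange(MOD) for _ in range(n - 1)]
parts.append((Lab - r - sum(parts)) % MOD)
E_parts, D_parts = parts[: t + 1], parts[t + 1:]

# Step 4: P_king reconstructs D from the D-parties' shares.
D = sum(D_parts) % MOD
# Step 5: the E-parties contribute -m_a*lambda_b - m_b*lambda_a via their
# additive shares of the masks; P_king alone adds M_ab = m_a * m_b.
E = (m_a * m_b - m_a * lb - m_b * la + sum(E_parts)) % MOD

z_minus_r = (D + E) % MOD
assert z_minus_r == (a * b - r) % MOD    # z - r = D + E, with z = ab
```

Since D depends only on preprocessed values, the D-parties' contribution can be completed before the inputs arrive, which is exactly the observation the analysis below exploits.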
(Fig. 4, Π_mult, preprocessing: else, invoke Π_dsBits(P, 1) (Fig. 22) to generate ⟨r⟩, ⟨r^d⟩.)

Lemma III.1. Protocol Π_mult (Fig. 4) incurs a communication of t elements in the preprocessing phase and 2t elements in 2 rounds in the online phase for multiplication when isTr = 0.
Analysis: Observe that the communication towards P_king in Steps 4 and 5 can be performed in parallel, resulting in an overall round complexity of two. Further, a communication of t elements is required in Step 4 and 2t elements in Step 5 (since P_king ∈ E), yielding a total communication of 3t ring elements. This complexity resembles that of DN07. However, our sharing semantics enables us to push some of the steps mentioned above to a preprocessing phase, resulting in a fast online phase, which is non-trivial to achieve in the case of DN07. Elaborately, since r, λ_a, λ_b are independent of the input (owing to our sharing semantics), the computation involving these terms in Steps 1 to 4 can be moved to a preprocessing phase. This improves the online communication complexity by slashing the inward communication towards P_king by half. Thus, the online phase requires only 2t ring elements of communication, while offloading t elements of communication to the preprocessing phase.
Note that a straightforward extension of the semi-honest multiplication of [31] to the preprocessing model, which can be derived from [38], does not provide an efficient solution. Although such a protocol has the same online complexity (2t elements) as ours, it has the drawback of inflating the overall communication cost by a factor of 1.6× over [31]. Elaborately, the online communication cost of 2t elements can be attained by appropriately defining the sharing semantics and using the P_king approach, similar to our protocol. However, this requires parties to generate the sharing of Λ_ab = λ_a · λ_b from the shares of λ_a and λ_b during the preprocessing phase, which requires a full-fledged multiplication, incurring a cost of 3t elements. This yields a protocol with a total cost of 5t elements, in comparison to the 3t cost of the all-online DN07 protocol. Thus, departing from this approach, the novelty of our protocol lies in leveraging the interplay between the sharing semantics and a redesigned communication pattern among the parties to ensure that the total cost of 3t does not change.
Furthermore, our protocol design allows parties in D to remain shut down in the online phase, thereby reducing the system's operational load. This is because parties in D only contribute towards the computation of D, which can be completed in the preprocessing phase. However, the preprocessing phase becomes function-dependent due to the linear gates, for which the λ value of the output wire cannot be chosen randomly. Concretely, if c is the output of a linear gate, say addition, with inputs a, b, then λ_c cannot be chosen randomly and should be defined as λ_c = λ_a + λ_b.

The ideal functionality F_n-PC for evaluating a function f in the n-party setting with semi-honest security appears in Fig. 5.
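For a single addition gate, the mask constraint above can be verified directly (illustrative Python):

```python
# Sketch: function-dependent masks. For an addition gate c = a + b, the mask
# must satisfy lambda_c = lambda_a + lambda_b so that m_c = m_a + m_b holds.
MOD = 1 << 64
a, b, la, lb = 10, 20, 9999, 123456
m_a, m_b = (a + la) % MOD, (b + lb) % MOD

lc = (la + lb) % MOD       # fixed by the gate, not sampled at random
m_c = (m_a + m_b) % MOD    # computed locally in the online phase

assert (m_c - lc) % MOD == (a + b) % MOD   # <c> correctly encodes a + b
```

Because λ_c is determined by λ_a and λ_b, the preprocessing must follow the circuit's wiring, which is what makes it function-dependent.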
F_n-PC interacts with the parties in P and the adversary S_sh. Let f denote the function to be computed, x_s the input of party P_s, and y_s the corresponding output, i.e., ({y_s}_{s=1}^n) = f({x_s}_{s=1}^n).
Step 1: F_n-PC receives (Input, x_s) from every P_s ∈ P and computes ({y_s}_{s=1}^n) = f({x_s}_{s=1}^n).
Step 2: F_n-PC sends (Output, y_s) to every P_s ∈ P.

Fig. 5: Semi-honest: ideal functionality F_n-PC for function f

c) Incorporating truncation: To retain the FPA semantics, it is required to truncate the result of a multiplication, z = ab, which ends up having 2d bits in the fractional part, by d bits, i.e., to compute z^d = z/2^d. For this, we extend the probabilistic truncation technique of [68], [56], [57], proposed in the small-party domain, to the n-party setting. Given an (r, r^d)-pair with r^d = r/2^d, the truncated value of z can be obtained as z^d = (z − r)^d + r^d. The accuracy and correctness of this method follow from [68], [66]. Thus, truncation additionally requires (i) generating (⟨r⟩, ⟨r^d⟩), and (ii) computing (z − r)^d + r^d, instead, in Step 6. For (i), we rely on the ideal functionality F_TrGen (Fig. 6) for computing ⟨r⟩, ⟨r^d⟩. F_TrGen can be instantiated using an appropriate MPC protocol, which is used as a black box in our multiplication. Thus, improvements in the MPC protocol that realizes F_TrGen are inherited by our multiplication protocol.
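A local check of the truncation identity (illustrative Python; we sample r below z to sidestep the wrap-around case, which in the actual technique occurs only with small probability for suitably bounded values):

```python
import random

# Sketch: probabilistic truncation with an (r, r^d) pair, r^d = r / 2^d.
# The truncated product is recovered as z^d = (z - r)^d + r^d.
L, D = 64, 13
MOD = 1 << L

z = ((1234 << D) * (56 << D)) % MOD   # product carries 2d fractional bits
r = random.randrange(z)               # avoid wrap-around for this local check
z_d = (((z - r) % MOD) >> D) + (r >> D)

# Matches the exact truncation z >> d up to 1 in the last place.
assert (z >> D) - z_d in (0, 1)
```

The off-by-one slack is the "probabilistic" part: the carry from the low d bits of z − r and r is lost, contributing at most one unit of error in the last fractional place.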
In our work, we instantiate F_TrGen using Π_dsBits (Fig. 22), which is a slightly modified version of the doubly-shared random bit generation protocol of [29], adapted to our n-party setting. Concretely, Π_dsBits generates several doubly-shared random bits at once, instead of a single bit as done in the protocol of [29].
Here, a doubly-shared random bit is a bit that is both arithmetic and Boolean shared. We defer the details of Π_dsBits to §B-A since it follows easily from the protocol of [29]. With respect to (ii), observe that it is a local operation, and hence performing truncation does not incur any additional overhead in the online phase.

d) Dot product: Given ⟨·⟩-shares of vectors x and y of size n, the dot product outputs z = x ⊙ y = Σ_{k=1}^n x_k y_k, where ⊙ denotes the dot product operation. The design of our multiplication protocol enables an easy extension to support dot product computation without incurring any overhead. Concretely, similar to multiplication, each of the n product terms in z − r can be generated as in the multiplication protocol and locally summed up before being sent towards P_king. Due to this simple extension, we defer the formal dot product protocol (Fig. 23) to §B-A. Looking ahead, for matrix multiplication, each element of the resultant matrix can be computed via a dot product.

e) Multi-input multiplication: 3-input and 4-input multiplication protocols have showcased their wide applicability in improving the online phase complexity [57], [73], [71]. Concretely, computing z = abc (3-input) or z = abcd (4-input) naively requires at least two sequential invocations of the 2-input multiplication protocol in the online phase. Instead, the 3-input and 4-input multiplication protocols, respectively, enable performing this computation with the same online complexity as a single 2-input multiplication. Thus, we design 3-input and 4-input multiplication protocols by extending the techniques of [73], [57] to the n-party setting. Designing these protocols requires modifications in the preprocessing steps. Consider 3-input multiplication, where the goal is to generate ⟨z⟩ for z = abc. We follow an approach closely related to 2-input multiplication, with the difference being that parties additionally require generating the additive sharing of Λ_bc
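The truncation identity above can be sanity-checked in plain Python. This is a toy sketch on public values: the signed interpretation, parameter choices, and one-bit error tolerance follow the cited probabilistic-truncation technique, not this paper's exact shares.

```python
# Sketch of probabilistic truncation z^d = (z - r)^d + r^d over Z_{2^64},
# with d fractional bits; ring elements are interpreted as signed values.
import secrets

K, d = 64, 13
MOD = 1 << K

def signed(x):                        # lift x in Z_{2^64} to [-2^63, 2^63)
    return x - MOD if x >= MOD // 2 else x

def trunc(x):                         # arithmetic right-shift by d bits
    return (signed(x) >> d) % MOD

# Preprocessing: a random r together with its truncation r^d = r / 2^d.
r = secrets.randbelow(MOD)
r_d = trunc(r)

# Fixed-point product 2.5 * 3.0 carries 2d fraction bits; d must be dropped.
z = ((5 << d) // 2) * (3 << d) % MOD
z_d = (trunc((z - r) % MOD) + r_d) % MOD

expected = (15 << d) // 2             # 7.5 with d fraction bits
assert abs(signed(z_d) - expected) <= 1   # off by at most 1 ulp (whp)
```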
, Λ_ac, and Λ_abc during preprocessing. Given these sharings, parties proceed with an online phase similar to that of Π_mult to compute the 3-input multiplication without inflating the online cost. Similarly, for 4-input multiplication, parties need to generate the additive sharings of Λ_ad, Λ_bd, Λ_cd, Λ_abd, Λ_acd, Λ_bcd, and Λ_abcd in addition to those required for 3-input multiplication. The generation of these sharings follows an approach similar to 2-input multiplication, and the details are deferred to §B-A. The recent work of [46] provides a method to reduce the round complexity of circuit evaluation. They group (distinct) consecutive layers of the circuit into pairs and perform a parallel evaluation of all gates in the two layers of a group. Consider a multiplication gate with inputs x, y (obtained as outputs from a previous layer) and output z. Their approach considers three cases: (i) neither x nor y is the output of a multiplication gate, (ii) exactly one among x, y is the output of a multiplication gate, and (iii) both x and y are outputs of multiplication gates. We observe that cases (ii) and (iii) in their approach resemble multi-input multiplication, which allows evaluating the second layer of multiplication (z = x · y) non-interactively, thereby saving on rounds.

Fig. 7: 4-input multiplication
For instance, consider a 2-layer sub-circuit as in Fig. 7, where x = a · b and y = c · d are outputs of multiplication gates that are fed as inputs to a multiplication gate in the next level. The approach of [46] evaluates these two layers in one shot, which is equivalent to computing z via a 4-input multiplication in our case. Similarly, when only one of the inputs (either x or y) is the output of a multiplication, the computation of z = x · y resembles a 3-input multiplication. Thus, cases (i), (ii), and (iii) correspond to 2-input, 3-input, and 4-input multiplication, respectively, in our work, and these suffice to reduce the round complexity of any circuit evaluation by half. Hence, we restrict our focus to 3- and 4-input multiplication.
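The extra preprocessed terms can be motivated by expanding the 4-input product. The sketch below (illustrative names, plaintext values rather than shares) checks that the subset expansion of (m_a − λ_a)(m_b − λ_b)(m_c − λ_c)(m_d − λ_d) reproduces abcd, which is why a Λ term for every subset of size at least 2 must be generated in preprocessing.

```python
# Why 4-input multiplication needs Λ_S for every subset S of {a,b,c,d}, |S|>=2:
# expanding the masked product leaves only public m-values and these Λ terms.
from itertools import combinations
import secrets

MOD = 2**64
val = {v: secrets.randbelow(MOD) for v in "abcd"}          # the inputs
lam = {v: secrets.randbelow(MOD) for v in "abcd"}          # the masks
m = {v: (val[v] + lam[v]) % MOD for v in "abcd"}           # masked values

# Preprocessed Λ terms: the product of the masks in each subset of size >= 2.
Lam = {}
for size in range(2, 5):
    for S in combinations("abcd", size):
        p = 1
        for v in S:
            p = p * lam[v] % MOD
        Lam[S] = p

# Local evaluation of the expansion: sign (-1)^{|S|} per subset S.
z = m["a"] * m["b"] % MOD * m["c"] % MOD * m["d"] % MOD    # |S| = 0 term
for v in "abcd":                                           # |S| = 1 terms
    rest = [m[u] for u in "abcd" if u != v]
    z = (z - lam[v] * rest[0] * rest[1] * rest[2]) % MOD
for S, LamS in Lam.items():                                # |S| >= 2 terms
    rest = 1
    for u in "abcd":
        if u not in S:
            rest = rest * m[u] % MOD
    z = (z + (-1) ** len(S) * LamS * rest) % MOD

a, b, c, d = (val[v] for v in "abcd")
assert z == a * b % MOD * c % MOD * d % MOD
```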

IV. EXTENDING TO MALICIOUS SECURITY
Using standard approaches [74], [56], [38], it is straightforward to adapt semi-honest protocols such as input sharing and output reconstruction to the malicious setting. The details are provided in §B-B for completeness. Hence, in this section, we focus on the challenges encountered in obtaining a maliciously secure multiplication protocol and their resolutions.
Note that although a maliciously secure multiplication protocol can be obtained by compiling our semi-honest protocol using compiler techniques such as [1], [15], the resultant protocol has an expensive online phase. For instance, using the compiler of [1] yields a protocol that requires computation over extended rings and communicating 4t extended-ring elements in the online phase. This is not favourable compared to working over plain rings, especially in the online phase. Further, compilers such as that of [15] require heavy computational machinery, like reliance on zero-knowledge proofs, in the online phase, which is also not desirable. Thus, to attain a computation- and communication-efficient online phase, departing from the aforementioned compiler-based approaches, we design a maliciously secure multiplication protocol that requires communicating only 3t ring elements in each phase. It is worth noting that we do so while retaining the benefit of requiring only t + 1 parties in the online phase (for most of the computation). The remaining t parties are required to come online only for a short one-time verification phase that is deferred to the end of the computation. Deferring verification may result in a privacy breach [47]; however, we describe later why this breach does not arise in our protocol.
To enable generation of ⟨z⟩ for z = ab from ⟨a⟩ and ⟨b⟩, we retain the high-level ideas of the semi-honest protocol. Our task reduces to (i) generating additive shares of Λ_ab among the parties in E (i.e., [Λ_ab]_E) from λ_a and λ_b in the preprocessing phase, and (ii) reconstructing z − r in the online phase. Given (i), computing [z − r]_E in the online phase is a local operation. Given (ii), parties can generate ⟨z − r⟩ and compute ⟨z⟩ = ⟨z − r⟩ + ⟨r⟩, where ⟨r⟩ is generated in the preprocessing phase, as discussed in the semi-honest case.
F_MulPre interacts with the parties in P and the adversary S. Let T_i denote the set of honest parties.
Input: F_MulPre receives the ⟨·⟩-shares of a, b from the parties. It also receives the ⟨·⟩-shares of z = ab of the corrupt parties from S. S is also allowed to send a special command, (abort, P), which indicates that parties in P with indices in P should abort.
F_MulPre proceeds as follows.
-Reconstruct a, b using the shares received from honest parties, and compute z = ab.
-Compute the ⟨·⟩-share of z to be held by the set of honest parties as the difference between z and the sum of the ⟨·⟩-shares of z received from the corrupt parties.
-Let y_s denote the ⟨·⟩-shares of z for party P_s ∈ P. If (abort, P) was received from S, set y_s = abort for each P_s with s ∈ P.
Output: Send (Output, y_s) to every P_s ∈ P.

Functionality F_MulPre

Fig. 8: Ideal functionality F_MulPre

For task (i), our idea from the semi-honest case, of making parties in D send their shares to P_king, does not work in the presence of a malicious adversary. To address this, we make black-box use of a maliciously secure multiplication protocol, abstracted as the functionality F_MulPre in Fig. 8, that computes ⟨Λ_ab⟩ from ⟨λ_a⟩, ⟨λ_b⟩. In this work, we instantiate F_MulPre with the state-of-the-art multiplication protocol of [15], which provides security with abort. Note that although the protocol of [15] relies on zero-knowledge proofs, this computation is carried out in the preprocessing phase of our multiplication protocol. Moreover, since preprocessing is done for many instances in one shot, the zero-knowledge proof can benefit from amortization. The parties thus invoke F_MulPre to obtain ⟨Λ_ab⟩. Looking ahead, Λ_ab also aids in performing the online verification check.
For task (ii), in the online phase, we retain the idea of parties in E optimistically reconstructing z − r from their additive shares ([·]_E-shares), so that only the parties in E remain active for most of the computation. Moreover, this optimistic reconstruction requires communicating only O(t) elements, rather than the O(t²) required for reconstruction from ⟨·⟩-shares (which will be used later for verification, albeit for only one such reconstruction). Thus, similar to the semi-honest protocol, parties in E optimistically reconstruct z − r towards P_king, who then sends the reconstructed value to the parties in E. In the malicious setting, this approach requires additional care, since a malicious party may send a wrong [·]_E-share of z − r to P_king, or a malicious P_king may send an incorrectly reconstructed (inconsistent) z − r to the parties. To account for these behaviours, the protocol is augmented with a short one-off verification phase that checks the consistency and correctness of z − r. This phase is executed at the end of the protocol and requires the presence of all parties, and hence the possession of z − r by all. This is in contrast to the semi-honest protocol, where z − r is given only to parties in E. To keep D disengaged for most of the online phase, sending z − r to them is deferred till the end of the protocol. This send is one-off and can be combined across all multiplication gates. Details of the verification protocol Π_Vrfy (Fig. 9) are given next.
Verification comprises two checks: a consistency check to first verify that P_king has indeed sent the same z − r to all parties, followed by a correctness check to verify the correctness of z − r. For the former, parties perform a hash-based consistency check of z − r and abort in case of any inconsistency. If z − r is consistent, parties verify its correctness. The high-level idea for verifying correctness is to robustly reconstruct z − r, but now from its ⟨·⟩-shares (which can be computed given ⟨λ_a⟩, ⟨λ_b⟩, ⟨Λ_ab⟩, generated in the preprocessing phase). Parties can then verify whether this reconstructed value equals the value received from P_king. Concretely, this is equivalent to robustly reconstructing ⟨Ω⟩ = ⟨z − r⟩ − (z − r)′, where (z − r)′ is the value received from P_king, and verifying whether Ω = 0. For the robust reconstruction of ⟨Ω⟩, every party sends its ⟨·⟩-share to every other party that misses this share, and aborts in case of inconsistencies in the received values. Elaborately, reconstruction of Ω towards P_s ∈ P proceeds as follows. For each ⟨·⟩-share of Ω missing at P_s, each of the t + 1 parties holding this share sends it to P_s. P_s uses this share for reconstruction if all the t + 1 received values are consistent, and aborts otherwise. The presence of at least one honest party among the t + 1 guarantees that an inconsistency, if any, is detected. Since each share of Ω is held by t + 1 parties, comprising at least one honest party, any cheating by up to t corrupt parties is guaranteed to be detected. Note that this reconstruction requires communicating O(t²) ring elements to verify the correct computation of a single multiplication gate, the cost of which can be optimized using standard techniques [1], [23]. Concretely, the correctness of z − r for several multiplication gates can be verified with a single reconstruction, by reconstructing a random linear combination of the Ω values of these gates and checking equality with 0.
Thus, only one robust reconstruction from ⟨·⟩-shares is required for several multiplication gates, and its cost gets amortized across the gates being verified.
-Correctness Check. Repeat the following κ times:
-Generate random θ_1, …, θ_m ∈ Z_{2^ℓ} and compute Ω = Σ_{j=1}^m θ_j Ω_j, where Ω_j corresponds to the j-th multiplication gate.
-For each ⟨·⟩-share of Ω, the t + 1 parties possessing this share send it to every party that misses this share. If a recipient receives inconsistent values for any missing share, it aborts.
-Reconstruct Ω and abort if Ω ≠ 0.
Fig. 9: Malicious: Verification protocol for all multiplication gates

It is worth noting that this random linear combination technique does not trivially work over rings. This is due to the existence of zero divisors, which may cause the linear combination to be 0 despite cheating, with probability up to 1/2 (the cheating probability of the adversary) [1]. Hence, to obtain the desired security, the verification check is repeated κ times, where κ is the security parameter. This bounds the cheating probability of the adversary by 1/2^κ. Another approach is to perform the verification over extended rings [13], [14]. Specifically, verification operations are carried out over the ring Z_{2^ℓ}[x]/f(x), the ring of all polynomials with coefficients in Z_{2^ℓ} modulo a degree-d polynomial f(x) that is irreducible over Z_2. Each element of Z_{2^ℓ} is lifted to a degree-d polynomial in Z_{2^ℓ}[x]/f(x), which increases the communication required for verification by a factor of d.
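A toy simulation of the repeated check (assumed names; plaintext Ω values rather than shares) illustrates why κ repetitions are needed over a ring:

```python
# Batched correctness check over Z_{2^64}: combine the per-gate differences
# Omega_j with random coefficients and compare with 0.  Because of zero
# divisors, a single trial can miss a cheat with probability up to 1/2, so
# the check is repeated kappa times, bounding the miss probability by 2^-kappa.
import secrets

MOD, kappa = 2**64, 40

def batch_check(omegas):
    """Return True iff every trial reconstructs a zero combination."""
    for _ in range(kappa):
        theta = [secrets.randbelow(MOD) for _ in omegas]
        if sum(t * w for t, w in zip(theta, omegas)) % MOD != 0:
            return False          # nonzero combination: abort
    return True

assert batch_check([0, 0, 0, 0])      # honest run: every Omega_j = 0
# 2**63 is a zero divisor: theta * 2**63 = 0 mod 2**64 for every even theta,
# so each individual trial catches this cheat only with probability 1/2.
assert not batch_check([0, 2**63, 0, 0])
```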
To summarize, the maliciously secure multiplication protocol (see Fig. 27) can be broken down into the following.
-Preprocessing phase, which involves the generation of ⟨Λ_ab⟩ by invoking F_MulPre.
-Online phase, which involves the optimistic reconstruction of z − r via P_king; malicious behaviour here, if any, will be caught in the verification phase. The crucial point to note is that this requires the presence of only the parties in E. This is followed by the non-interactive generation of ⟨z − r⟩, from which ⟨z⟩ is computed as ⟨z⟩ = ⟨z − r⟩ + ⟨r⟩, where ⟨r⟩ is generated during preprocessing.
-Finally, to catch malicious behaviour in the online phase, if any, the verification phase checks the correctness of the generated z, simultaneously for every z that is the output of a multiplication gate. This is done by invoking Π_Vrfy. Note that before this verification begins, P_king sends the z − r values corresponding to all multiplication gates to the parties in D in a single shot.
As pointed out in [47], deferring the correctness check may result in a privacy breach when using a secret-sharing scheme that allows for redundancy (such as RSS or Shamir sharing). The details are elaborated in §B-B0c. The crucial point, however, is that although we rely on a variant of RSS, which introduces redundancy, the reconstruction towards P_king uses the [·]_E-sharing of z − r, which is a (t + 1)-party additive sharing. The use of additive sharing for the reconstruction towards P_king eliminates any redundancy in the sharing scheme and thus overcomes this subtle privacy breach, as also shown in [47]. This privacy breach persists in [38], as discussed in §B-B0c.
Lemma IV.1. Protocol Π^M_mult (Fig. 27) incurs a communication of 3t elements in the preprocessing phase and 3t elements in 2 rounds in the online phase for multiplication when isTr = 0.
The ideal functionality F_n−PC for evaluating a function f in the n-party setting while providing malicious security (with abort) appears in Fig. 10.
F_n−PC interacts with the parties in P and the adversary S_mal. Let f denote the function to be computed. Let x_s be the input of party P_s, and y_s be the corresponding output, i.e., ({y_s}_{s=1}^n) = f({x_s}_{s=1}^n). S_mal is also allowed to send a special command, (abort, P), which indicates that honest parties in P with indices in P should abort.
Step 1: F_n−PC receives (Input, x_s) from every P_s ∈ P. If (Input, ∗) was already received from P_s, the current message is ignored; otherwise, x_s is recorded internally.
Step 2: Compute ({y_s}_{s=1}^n) = f({x_s}_{s=1}^n) and send the output y_s for every corrupt P_s to S_mal.
Step 3: If (Signal, abort, P) was received from S_mal, set y_s = abort for each P_s with s ∈ P. Send (Output, y_s) to every honest P_s ∈ P.

Functionality F^mal_n−PC

Fig. 10: Malicious: Ideal functionality for evaluating function f

a) Multiplication with truncation: Similar to the semi-honest protocol, truncation can be incorporated in the malicious multiplication as well, without inflating the online communication. For this, we rely on the maliciously secure ideal functionality F^M_TrGen (Fig. 28) to generate the ⟨·⟩-shares of (r, r^d); it is instantiated using the Π^M_dsBits [29] protocol in our work. At a high level, the semi-honest versions of interactive operations such as multiplication and reconstruction in Π_dsBits are replaced with their maliciously secure counterparts in Π^M_dsBits; more details are provided in §B-B.

b) Dot product: Similar to the maliciously secure multiplication protocol, which relied on F_MulPre to generate ⟨·⟩-shares of the multiplicative term Λ_ab in the preprocessing phase, the maliciously secure dot product protocol invokes F_DotPPre (Fig. 29) to generate ⟨·⟩-shares of the multiplicative term Σ_{k=1}^n Λ_{x_k y_k} required to compute the dot product as per equation (2). Given these shares, the online phase proceeds similarly to that of multiplication.
Observe that a trivial realization of F_DotPPre reduces to n instances of multiplication. However, we extend the ideas of [56] and rely on a distributed zero-knowledge proof [15] to eliminate the vector-size dependency in the preprocessing phase. Concretely, we instantiate F_DotPPre using a semi-honest dot product protocol [48] whose cost matches that of semi-honest multiplication [31] (and is thus independent of the vector size), followed by a verification phase to check the correctness of the dot product computation. For the verification, we extend the verification technique for multiplication of [15] to dot products, such that the verification cost can be amortized over multiple dot products, thereby resulting in vector-size-independent preprocessing. Details of this extension are deferred to §B-B.

c) Multi-input multiplication: This protocol is similar to its semi-honest counterpart, with the difference that the preprocessing phase relies on invoking F_MulPre to generate the required multiplicative terms. The details are deferred to §B-B0d.

V. APPLICATIONS & BENCHMARKS
To evaluate the performance of our protocols, we benchmark some popular applications such as deep neural networks (NN), graph neural networks (GNN), similar sequence queries (SSQ), and biometric matching, where MPC is used to achieve privacy. While these applications have been studied in the small-party setting [70], [56], [81], [5], [79], [85], [73], [68], we believe the n-party setting is a better fit for the reasons described in the introduction. To the best of our knowledge, we are the first to benchmark these applications in the multiparty honest-majority setting for more than four parties.

a) Benchmark environment: The performance of our protocols is analyzed using a prototype implementation built over the ENCRYPTO library [27] in C++17.
We chose the 64-bit ring (Z_{2^64}) for our arithmetic world, and the operations over the extended ring were carried out using the NTL library. Since the correctness and accuracy of the applications considered have already been established in the secure computation setting, our benchmark aims to demonstrate our protocols' performance and is not fully functional. Moreover, we believe that incorporating state-of-the-art code optimizations such as GPU-assisted computing can further enhance the efficiency of our protocols, which is left as future work. Since there is no defined way to capture an adversary's misbehaviour, following standard practice [68], [56], [28], we benchmark honest executions of the protocols, which also include the steps performed for verification in the malicious case.

b) Benchmark parameters: We report the run-time and communication of the online phase and the total (= preprocessing + online). To capture the effect of online round complexity and communication in one go, we also report the online throughput (TP) [4], [68], [56], which denotes the number of operations that can be performed in one minute. Finally, when deployed in the outsourced setting, one pays for the communication and up-time of the hired servers. To demonstrate how our protocols fare in this scenario, we additionally report the monetary cost (Cost) [67], [57] for the applications considered. This cost is estimated using Google Cloud Platform [80] pricing, where 1 GB of communication and 1 hour of up-time cost USD 0.08 and USD 3.04, respectively.
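Under this pricing model, the reported monetary cost is, in essence, the following simple function (billing up-time per active server is an assumption of this sketch):

```python
# Illustrative monetary-cost estimate following the pricing in the text:
# USD 0.08 per GB of communication, USD 3.04 per hour of up-time per server.
def monetary_cost(gb_sent: float, hours: float, n_online: int) -> float:
    return 0.08 * gb_sent + 3.04 * hours * n_online

# e.g. a 5-party semi-honest online phase keeps only t + 1 = 3 parties up.
assert round(monetary_cost(gb_sent=2.0, hours=0.5, n_online=3), 2) == 4.72
```

This is also why shutting down the t parties in D directly translates into savings: the up-time term scales with the number of active servers.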

A. Comparison with DN07
In this section, we benchmark our semi-honest and malicious protocols over synthetic circuits comprising one million multiplications with varying depths of 1, 100, and 1000, and compare against the optimized ring variant of DN07 [13]. The gates are distributed equally across the levels of the circuit.

a) Communication: The communication cost for 1 million multiplications is tabulated in Table V for the 5-, 7-, and 9-party settings. As can be observed, the online phase of our semi-honest protocol enjoys the benefit of pushing 33% of the communication to the preprocessing phase compared to DN07. The observed values corroborate the claimed improvement in the online complexity of our protocol. Our malicious protocol retains the online communication cost of DN07 while incurring a similar overhead in the preprocessing. Note that pushing communication to the preprocessing phase has several benefits. First, communication for several instances can happen in a single shot and leverage the benefit of serialization. Second, for resource-constrained devices such as mobile phones, the preprocessing communication can occur whenever they have access to a high-bandwidth Wi-Fi network (for instance, when the device is at home overnight). These benefits facilitate a fast online phase, as observed, which may happen over a low-bandwidth network.
b) Run-time: The time taken to evaluate circuits of different depths appears in Table VI. Since the times for the 5-, 7-, and 9-party settings vary within the range [0, 0.5] seconds, we report values only for the 7-party setting in Table VI. With respect to the online run-time, our semi-honest protocol's time is expected to be similar to that of DN07. However, DN07 demonstrates around 1.5× higher run-time. This difference can be attributed to the asymmetry in the round-trip time (rtt) among the parties, which vanishes when benchmarking over a symmetric-rtt setting. Compared to the semi-honest protocol, the malicious variant incurs a minimal overhead of less than one second in the online run-time due to the one-time verification phase. However, the overhead is higher for the overall run-time. Concretely, it is around 10 seconds and is due to the distributed zero-knowledge proof computation in the preprocessing phase. Note that this overhead is independent of the circuit depth and gets amortized for deeper circuits, as evident from Table VI.

c) Monetary cost: Another key highlight of our protocols is their improved monetary cost, as evident from Fig. 12.
Concretely, for 9 parties (semi-honest), we observe a saving of 17% over DN07 for a depth-1 circuit, which increases up to 72% for circuits of depth 1000. This is primarily due to the reduction in the number of online parties compared to DN07. Comparing our semi-honest and malicious variants, the latter has an overhead of 8× for the depth-1 circuit, which reduces to 1.14× for the depth-1000 circuit. This is justified because the verification cost is amortized for deeper circuits, as mentioned earlier. Interestingly, our malicious variant outperforms even semi-honest DN07 upon reaching circuit depths of 100 and above. A similar analysis holds in the symmetric-rtt setting as well, where the saving is up to 56% (for depth 1000). Regarding throughput, part of our gain over DN07 stems from the rtt asymmetry affecting DN07, which vanishes in the symmetric-rtt setting. However, recall that our protocol requires only t + 1 active parties in the online phase, which leaves several channels among the parties underutilized. Hence, we can leverage a load-balancing technique where the parties' roles are interchanged across various parallel executions. For instance, one approach is to make every party act as P_king: in 5PC, in one execution, P_king = P_1, E = {P_1, P_2, P_3}, D = {P_4, P_5}, while in another execution, P_king = P_2, E = {P_2, P_3, P_4}, D = {P_5, P_1}, and so on. To analyse the effect of load balancing, we performed experiments with similar rtt among the parties and observed a 1.5× improvement in our semi-honest variant over DN07. This is justified as we communicate over four channels among the parties, as opposed to six in DN07. We note that when enhancing the security from semi-honest to malicious, we observe a significant drop in TP, of about 3× for the depth-1 circuit. This is primarily due to the increased run-time owing to the online verification in the malicious setting. However, this drop tends to zero for deeper circuits (as the verification cost gets amortized), making the online phase of our maliciously secure protocol on par with the semi-honest one.

B. Deep Neural Networks (DNN) and Graph Neural Networks (GNN)
We benchmark three different neural networks (NN) [68], [74], [85] with an increasing number of parameters: (i) NN-1, a 3-layer fully connected network from [70]; (ii) NN-2, the LeNet [60] architecture; and (iii) NN-3, the VGG16 [82] architecture (further details are deferred to §D-A0a). We benchmark the inference phase of these NNs, which comprises computing activation matrices, followed by applying an activation function or pooling operation, depending on the network architecture. NN-1 and NN-2 are benchmarked over the MNIST dataset [61], while NN-3 is benchmarked using the CIFAR-10 dataset [58]. We also benchmark GNN inference, for which we use the simplified architecture of [35] given in [81]. This architecture (§D-A0b) is shown to achieve an accuracy of more than 99% on MNIST classification [81]. To analyse the improvement of our protocols, we also benchmark (semi-honest) DN07 for these applications by adapting our building blocks to their setting.
The semi-honest benchmarks for the different NNs and the GNN appear in Table X (§D-A0a), while the malicious ones appear in Table XI (§D-A0a). Fig. 13 gives a pictorial view of the trends observed when comparing the semi-honest variants, described next. We incur a very minimal overhead in the run-time of our protocols when moving from five to nine parties over all the networks considered; hence, we use ±δ to denote this variation in the table. The trends witnessed in the synthetic circuit benchmarks (§V-A) carry forward to neural networks as well, for the reasons discussed previously. For instance, the improvement in the online run-time of our semi-honest variant is up to 4.3× over DN07. The effect of reduced run-time and improved communication results in a significant improvement in the online throughput of our protocol over DN07; concretely, the gain ranges up to 4.3×. Further, the improved run-time, coupled with the reduced number of online parties in our case, brings a saving of up to 69% in monetary cost for NN-1. However, the improvement drops to 33% for the deep network NN-3. The reduction in savings is due to the improved run-time being nullified by the increased communication from NN-1 to NN-3, making communication the dominant factor in determining the monetary cost.
Observe that, unlike the case of synthetic circuits (Table V), the total communication here is an order of magnitude higher. This is primarily due to the higher communication cost incurred for performing the truncation operation, specifically, the generation of the doubly-shared bits (Π_dsBits, Fig. 22) in the preprocessing phase. It is worth noting that Π_dsBits is used as a black box, and an improved instantiation of it would lower the communication. Similar trends are observed for the GNN as well, where the online run-time of DN07 is up to 3.5× higher than that of our semi-honest protocol. This is reflected in the throughput, where we gain up to 3.4×. Further, we observe savings of up to 15% in monetary cost due to the reduced number of active parties and lower run-time.
Moving to the malicious setting, we incur an overhead of up to 3% in online run-time, 6% in communication, and 13% in monetary cost over the semi-honest counterpart. Details are deferred to §D-A0a.

C. Genome Sequence Matching
Given a genome sequence as a query, genome matching aims to identify the most similar sequence from a database of sequences. This task is also known as similar sequence queries (SSQ). It requires the computation of the edit distance (ED), which quantifies how different two sequences are via the minimum number of additions, deletions, and substitutions required to transform one sequence into the other. To compute the ED, we extend the (2-party) protocol of [79], which builds on the approximation of [5], to the n-party setting. The details of the approximation algorithm for ED computation appear in §D-C; its accuracy and correctness follow from [5]. Of the two phases of the ED algorithm, the first is non-interactive; hence, we focus on and benchmark only the second phase, which requires interaction. The benchmarks for genome sequence matching appear in Table VII and Table XIII (§D-C). Following [79], we consider three cases with a different number of sequences in the database (m) and different block lengths (ω). The benchmarks for m = 2000, ω = 30 are reported in Table VII, while those for m = 1000, ω = 25 and m = 4000, ω = 35 appear in Table XIII. We witness similar trends here, where our semi-honest protocol shows improvements of up to 4× in both online run-time and throughput over DN07. Our malicious variant incurs a minimal overhead in the range of 5-6% in online run-time and total communication over the semi-honest counterpart. For the monetary cost (Fig. 14), our semi-honest protocol has up to 66% savings over DN07, and the malicious variant has around 42%-54% overhead over its semi-honest counterpart.
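For reference, the plaintext metric that the SSQ pipeline approximates and computes securely is the classic edit-distance dynamic program; the approximation of [5] itself is not reproduced in this sketch.

```python
# Plaintext edit distance (Levenshtein) with a rolling one-row DP table.
def edit_distance(s: str, t: str) -> int:
    dp = list(range(len(t) + 1))          # distance from "" to each prefix of t
    for i, cs in enumerate(s, 1):
        prev, dp[0] = dp[0], i            # prev holds dp[i-1][j-1]
        for j, ct in enumerate(t, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                     dp[j - 1] + 1,      # insertion
                                     prev + (cs != ct))  # substitution
    return dp[-1]

assert edit_distance("kitten", "sitting") == 3
assert edit_distance("GATTACA", "GATTACA") == 0
```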

D. Biometric Matching
We extend support for biometric matching, which finds application in many real-world tasks such as face recognition [37] and fingerprint matching [50]. The goal of such a computation is to identify the sample from a database of m samples that is "closest" to a sample u held by a user. We follow the general trend and reduce the biometric matching problem to finding the sample in the database with the least Euclidean distance (EuD) to the user's sample u. Details of the protocol are deferred to §D-B.
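The plaintext computation being emulated is a simple argmin over (squared) Euclidean distances; a minimal sketch:

```python
# Reference computation for biometric matching: index of the database sample
# with the least (squared) Euclidean distance to the query u.
def closest_match(db, u):
    def eud2(v):                       # squared distance; the square root is
        return sum((vi - ui) ** 2      # monotone, so it can be skipped
                   for vi, ui in zip(v, u))
    return min(range(len(db)), key=lambda i: eud2(db[i]))

db = [[1, 2, 3], [4, 5, 6], [1, 2, 4]]
assert closest_match(db, [1, 2, 3]) == 0
```

In the secure version, each squared distance is a dot product of a difference vector with itself, which is why the vector-size-independent dot product protocol above pays off here.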
The benchmarks for biometric matching appear in Table VIII and Table XII (§D-B). The former considers databases with 1024 and 65536 samples, while the latter considers 4096 and 16384 samples. As is evident from Table VIII, our semi-honest protocol witnesses a 4.6× improvement over DN07 in both online run-time and throughput. Further, in terms of monetary cost, we observe a saving of around 85%. With respect to our maliciously secure protocol, we incur a minimal overhead of around 9.5% in total communication and around 4% in online throughput over our semi-honest variant. We note that our malicious variant outperforms semi-honest DN07 in both online run-time and throughput, thereby achieving our goal of a fast online phase.

CONCLUSION
This work improves the practical efficiency of n-party honest-majority protocols using function-dependent preprocessing. While our first construction achieves a fast online phase compared to the semi-honest protocol DN07, the second enhances security by tolerating malicious adversaries with minimal overhead in the online phase. A major highlight of both constructions is that only about half of the parties need to participate actively in the online phase; this reduction in online parties translates into monetary benefits in real-world deployments.

APPENDIX A PRELIMINARIES
a) Shared-key setup: F_setup [4], [68], [74] enables the establishment of common random keys for a pseudo-random function (PRF) F among the parties. This aids in non-interactively generating correlated randomness. Here, F : {0, 1}^κ × {0, 1}^κ → X is a secure PRF, with co-domain X being Z_{2^ℓ}. The semi-honest functionality F_setup appears in Fig. 15. The functionality for the malicious case is similar, except that the adversary now has the capability to abort.
To sample a random value r ∈ Z_{2^ℓ} among a set of t + 1 parties T = {P_1, ..., P_{t+1}} non-interactively, each P_i ∈ T invokes F_{k_T}(id_T) and obtains r. Here, id_T denotes a counter maintained by the parties in T, and is updated after every PRF invocation. The appropriate key used to sample is implicit from the context, i.e., from the identities of the parties that sample.
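As an illustration, non-interactive common sampling can be sketched as follows, with SHA-256 standing in for the PRF F; the key contents and ℓ = 64 are assumptions of the sketch, not the paper's instantiation:

```python
import hashlib

def prf_sample(key: bytes, counter: int, ell: int = 64) -> int:
    # Parties holding the same key k_T and counter id_T derive the same
    # pseudo-random element of Z_{2^ell} without any interaction.
    digest = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[: ell // 8], "big")

k_T = b"common-key-of-set-T"
assert prf_sample(k_T, 0) == prf_sample(k_T, 0)  # same key/counter: same r
assert prf_sample(k_T, 0) != prf_sample(k_T, 1)  # updated counter: fresh r
```

Updating the counter id_T after every invocation is what keeps successive samples independent.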
Fsetup interacts with the parties in P and the adversary S. Fsetup picks a random key k_T for every set T ⊆ P of t + 1 parties, and a random key k'_T for every set T' ⊆ P of t + 2 parties. Fsetup also picks a random key k_ij for every pair of parties P_i, P_j ∈ P with i < j. Output: Send the respective keys to every P_s ∈ P.

Functionality Fsetup
Fig. 15: Ideal functionality for shared-key setup
b) Collision-Resistant Hash Function: A family of hash functions [77] {H : K × M → Y} is said to be collision resistant if for all PPT adversaries A, given the hash function H_k for k ∈_R K, the probability Pr[(x, x') ← A(k) : (x ≠ x') ∧ (H_k(x) = H_k(x'))] is negligible in κ, where x, x' ∈ {0,1}^m, m = poly(κ), and κ is the security parameter.
c) Commitment Scheme: Let Com(x) denote the commitment of a value x [69]. The commitment scheme Com(x) possesses two properties: hiding and binding. The former ensures privacy of the value x given its commitment Com(x), while the latter prevents a corrupt party from opening the commitment to a different value x' ≠ x.

A. Helper primitives
(1) Π_[0] → [0] (Fig. 16): To generate [·]-shares of 0, each party non-interactively samples two values, one with each of its neighbouring parties. A party's share of 0 is defined as the difference between these values.
1. P_i, P_{i+1}, for i ∈ {1, ..., n − 1}, sample a random value r_i ∈_R Z_{2^ℓ}, while P_1, P_n sample a random value r_n ∈_R Z_{2^ℓ}, using their respective common PRF keys. (2) Π_rand → ⟨r⟩ (Fig. 17): To generate ⟨·⟩-shares of a random r ∈ Z_{2^ℓ}, every set of t + 1 parties non-interactively samples a random value using the keys established during the setup phase, and r is defined to be the sum of these values.
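The zero-sharing idea can be sketched in the clear (ring size 2^64 is an assumption of the sketch): each pair of neighbouring parties samples a common value, and each party's share of 0 is the difference of its two common values, so the shares telescope to zero.

```python
import random

MOD = 2**64  # ring Z_{2^ell} with ell = 64 (assumed for the sketch)

def zero_shares(n):
    # r[i] models the value jointly sampled by a pair of neighbouring
    # parties; party i's share of 0 is the difference of its two values
    r = [random.randrange(MOD) for _ in range(n)]
    return [(r[i] - r[i - 1]) % MOD for i in range(n)]

shares = zero_shares(5)
assert sum(shares) % MOD == 0  # the differences telescope to 0
```

No interaction is needed beyond the one-time key setup, since each pairwise value comes from a common PRF key.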

Define r = Σ_{j=1}^{q} r_{T_j}.
Protocol Π_rand
Fig. 17: Generating ⟨·⟩-shares of a random value
(3) Π_pRand(P_s) → ⟨r⟩ (Fig. 18): This protocol generates ⟨·⟩-shares of a random value r such that P_s learns all the shares. Every set of t + 1 parties non-interactively samples a random value together with P_s, using the keys established (for every set of t + 2 parties) during the setup phase.

Define r = Σ_{j=1}^{q} r_{T_j}.
Protocol Π_pRand(P_s)
Fig. 18: Generating ⟨·⟩-shares of a random value along with P_s
(4) Π_{·→⟨·⟩}(a) → ⟨a⟩: This protocol generates ⟨a⟩ when a ∈ Z_{2^ℓ} is held by at least t + 1 parties, say the parties in E. For this, each P_i ∈ E sets m_a = a and its ⟨·⟩-shares of λ_a to 0. To generate ⟨a⟩ in the malicious case, where all parties hold a, we likewise let parties set m_a = a and all shares of λ_a to 0.
(5) Π_{⟨·⟩→E[·]} (Fig. 19): This protocol enables the parties in E = {E_1, E_2, ..., E_{t+1}} to generate E[a] from ⟨a⟩. To generate E[a]_i, the idea is to sum up the shares a_{T_1}, ..., a_{T_q} while ensuring that every share is accounted for and no share is incorporated more than once. Concretely, for the share a_{T_j} held by the parties in T_j, j ∈ {1, ..., q}, set E[a]_i = Σ_{j=1}^{q} e^i_j · a_{T_j}, where e^i_j = 1 if E_i has the least index in T_j, and 0 otherwise.
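The least-index assignment can be sketched in the clear; the party labels and set structure below are illustrative, and the online set stands in for E:

```python
def to_additive(share_vals, share_sets, online):
    # Each share a_{T_j} is assigned to exactly one online party: the
    # member of T_j with the least index. Summing the assigned shares
    # per party yields additive shares of a with no double counting.
    out = {p: 0 for p in online}
    for a_Tj, Tj in zip(share_vals, share_sets):
        owner = min(p for p in Tj if p in online)
        out[owner] += a_Tj
    return out

# toy example: a = 10 + 20 + 30, each share held by a set of parties
vals = [10, 20, 30]
sets = [{1, 2, 3}, {1, 4, 5}, {3, 4, 5}]
add = to_additive(vals, sets, online={1, 2, 3})
assert sum(add.values()) == 60  # additive shares reconstruct a
```

Since each T_j has t + 1 members and only t parties are offline, every share set intersects the online set, so every share finds an owner.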
(13) Π_⟨·⟩(P_s, a) → ⟨a⟩: To enable P_s to generate ⟨a⟩, parties generate a_{T_j} for j ∈ {1, ..., q − 1} using Π_pRand, with P_s learning each a_{T_j} (i.e., a_{T_j} is sampled using the common key amongst t + 2 parties). P_s sets a_{T_q} = a − Σ_{j=1}^{q−1} a_{T_j} and sends a_{T_q} to the parties in T_q. In the malicious case, this is followed by invoking Π_agree(P, {a_{T_q}}) to check the consistency of the value sent by P_s.

APPENDIX B MPCLAN PROTOCOLS
A. Semi-honest protocols
a) Input sharing: The protocol for input sharing appears in Fig. 21.
Online: P_s computes and sends m_a = a + λ_a to all P_i ∈ E.
Protocol Π_Sh(P_s, a)
Fig. 21: Semi-honest: Input sharing protocol
b) Truncation - Instantiating F_TrGen: We rely on a modified version of the doubly shared random bit (a bit that is both arithmetic- and Boolean-shared) generation protocol of [29], extended to our n-party setting, to generate ⟨r⟩, ⟨r^d⟩ as required to perform truncation. Here, r^d represents the truncated (by d bits) version of r ∈ Z_{2^ℓ}. The resulting protocol is referred to as Π_dsBits (Fig. 22).

5. Let c_i be the smallest root of e_i modulo 2^{ℓ+2}, and compute c_i^{-1}. Set b^R_{i_j} and b^B_{i_j} as the ℓ least significant bits and the least significant bit of b^{ℓ+2}_{i_j}, respectively.
Fig. 22: Semi-honest: Doubly shared bits
At a high level, the generation of doubly shared bits relies on the property that every non-zero quadratic residue has exactly one smallest root when working over fields. The work of [29], operating over rings, shows that something similar holds over rings as well. Concretely, according to Lemma 4.1 of [29]: if a is such that a² ≡_{ℓ+2} 1, then a is congruent modulo 2^{ℓ+2} to either 1, −1, −1 + 2^{ℓ+1}, or 1 + 2^{ℓ+1}. Thus, the doubly shared bit generation protocol of [29] proceeds as follows. Generate ⟨a²⟩ for a ∈ Z_{2^{ℓ+2}} such that a² ≡_{ℓ+2} 1, and compute its smallest root c modulo 2^{ℓ+2}. Compute c^{-1}a; by Lemma 4.1 of [29] it follows that c^{-1}a ∈ {±1, ±1 + 2^{ℓ+1}}, i.e., c^{-1}a is congruent to ±1 modulo 2^{ℓ+1}. Thus, d = c^{-1}a + 1 is congruent to 0 or 2 modulo 2^{ℓ+1} with equal probability. Hence, setting b = d/2 outputs bit b = 0 or bit b = 1 with equal probability. Observe that the computation has to be performed over Z_{2^{ℓ+2}}. Hence, in the protocol description, we use ℓ+2 in the superscript to distinguish shares of x over Z_{2^{ℓ+2}} from its shares over Z_{2^ℓ}.
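The root-based bit extraction above can be checked in the clear with a small sketch (ℓ = 8 here; the secret-sharing structure is omitted and only the arithmetic of Lemma 4.1 is exercised):

```python
import random

ELL = 8
M = 2 ** (ELL + 2)  # computation happens over Z_{2^{ell+2}}

def bit_from_square(a):
    # a is odd, so e = a^2 has exactly the four roots
    # +-a and +-a + 2^{ell+1} modulo 2^{ell+2}; c is the smallest root
    e = (a * a) % M
    roots = [a % M, (-a) % M, (a + M // 2) % M, (-a + M // 2) % M]
    c = min(roots)
    c_inv = pow(c, -1, M)        # c is odd, hence invertible mod 2^{ell+2}
    d = (c_inv * a + 1) % M      # congruent to 0 or 2 mod 2^{ell+1}
    return (d // 2) % (2 ** ELL)

for _ in range(100):
    a = random.randrange(M) | 1  # random odd a
    assert bit_from_square(a) in (0, 1)
```

Whether b comes out 0 or 1 depends on which of the four roots a itself is, which is uniform for a random a; this is what makes the extracted bit random.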
The main change in Π_dsBits from the protocol in [29] is that, to generate ⟨r⟩, ⟨r^d⟩, Π_dsBits generates ℓ random doubly shared bits b_0, ..., b_{ℓ−1} ∈ Z_{2^ℓ} instead of a single one, composes these bits to generate r = Σ_{i=0}^{ℓ−1} 2^i b_i, and composes the higher ℓ − d bits to generate r^d = Σ_{i=d}^{ℓ−1} 2^{i−d} b_i.
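The composition step, in the clear (ℓ = 4 and d = 2 chosen purely for illustration):

```python
def compose(bits, d):
    # r from all ell bits; r^d, the d-bit truncation of r, from the
    # higher ell - d bits re-based at position 0
    r = sum(b << i for i, b in enumerate(bits))
    r_d = sum(b << (i - d) for i, b in enumerate(bits) if i >= d)
    return r, r_d

bits = [1, 0, 1, 1]        # b_0 .. b_3, so r = 13
r, r_d = compose(bits, 2)
assert (r, r_d) == (13, 3) and r_d == r >> 2
```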
Looking ahead, Π_dsBits can also be used to generate only a single doubly shared random bit, which finds use in other building blocks such as bit-to-arithmetic conversion and arithmetic-to-Boolean conversion. Thus, to distinguish the case when (⟨r⟩, ⟨r^d⟩) has to be generated from the case when only a single doubly shared bit is required, Π_dsBits takes a bit isTr as input and gives as output a doubly shared bit b^R, b^B if isTr = 0, and (⟨r⟩, ⟨r^d⟩) otherwise. The protocol appears in Fig. 22.
A final thing to note is that the computation in Π_dsBits proceeds over secret-shared data. Thus, to generate shares of the doubly shared bit b, one should be able to divide each share of d by 2, which necessitates d and its shares to be even. This holds true since, writing the odd value a as 2u + 1, we have d = c^{-1}a + 1 = 2c^{-1}u + (c^{-1} + 1). Here, 2c^{-1}u is even due to the multiplication by 2, while c^{-1} + 1 is even since c^{-1} is odd by definition.
c) Dot product: As described before, a dot product can be viewed as n instances of multiplication such that the communication for all the instances is aggregated and performed in a single shot to eliminate the vector-size dependency. Consequently, the dot product protocol follows along the lines of multiplication, and the formal details appear in Fig. 23.

Preprocessing:
1. Invoke Π_rand to generate ⟨r⟩ where r ∈ Z_{2^ℓ}, followed by ...

The goal of 3-input multiplication (Fig. 24) is to generate the ⟨·⟩-sharing of z = abc given ⟨a⟩, ⟨b⟩, ⟨c⟩, in a single shot. Observe that if E[Λ_ab], E[Λ_ac], E[Λ_bc] and E[Λ_abc + r] can be generated in the preprocessing among the parties in E, parties can proceed with an online phase similar to that of Π_mult to compute the 3-input multiplication without inflating the online cost. With respect to the preprocessing phase, [Λ_ac] and [Λ_bc] are first generated (via Π_{⟨·⟩⟨·⟩→[·]}, Fig. 20). Following this, parties in D communicate their shares of [Λ_ac] and [Λ_bc] to P_king, each masked with a random [·]-sharing of 0 (generated using Π_[0], Fig. 16). This establishes E[Λ_ac] and E[Λ_bc]. For generating E[Λ_ab], a slightly different approach is taken, where parties first generate ⟨Λ_ab⟩ using ⟨λ_a⟩, ⟨λ_b⟩ (as explained later), followed by non-interactively converting it to E[Λ_ab] (Fig. 19). The reason for generating ⟨Λ_ab⟩ (instead of directly generating E[Λ_ab]) is that ⟨Λ_ab⟩ is further required for computing Λ_abc. Similarly, for the 4-input multiplication, to obtain the ⟨·⟩-sharing of z = abcd given the ⟨·⟩-sharings of a, b, c, d, we can write z + r analogously, where E[Λ_abcd + r] is generated similarly to E[Λ_abc + r] in Π_3-mult. We omit the formal details of the 4-input multiplication protocol, Π_4-mult, as it is very close to Π_3-mult.

Preprocessing: 1. Invoke Π_rand to generate ⟨r⟩ and ⟨γ⟩ where r, γ ∈ Z_{2^ℓ}. Invoke Π ... Online: ...

B. Malicious protocols
a) Input sharing: The malicious input sharing protocol (Fig. 25) is similar to the semi-honest one: to enable P_s to generate ⟨a⟩, parties generate λ_a such that P_s learns λ_a, followed by P_s sending the masked value m_a = a + λ_a to all. However, note that a corrupt P_s can cause inconsistency among the honest parties by sending different masked values. To ensure the same value is received by all, parties perform a hash-based consistency check, denoted by Π_agree (§II), where each party sends a hash of the received masked value(s) to every other party and aborts if it receives inconsistent hashes. Note that this check can be combined for all the inputs, thereby amortizing the cost.
Online: P_s computes and sends m_a = a + λ_a to all P_i ∈ P.

Protocol Π^M_Sh(P_s, a)
Fig. 25: Malicious: Input sharing protocol
b) Reconstruction: To reconstruct a ⟨·⟩-shared value a towards P_s ∈ P, observe that each share that P_s misses is held by t + 1 other parties. Each of these parties sends the missing share to P_s. If the received values for a share are consistent, P_s uses this value to perform reconstruction, and aborts otherwise. As an optimization, when reconstructing several values, one party can send the missing share while the other t send its hash.

Preprocessing: 1. Invoke Π_rand to generate ⟨λ_z⟩ where λ_z ∈_R Z_{2^ℓ}.
2. For j ∈ {1, ..., q}: • Each P_i ∈ T_j generates commitments on λ_z^{T_j} using the common randomness, and sends them to all other parties.
• P_i ∉ T_j aborts if the commitments for λ_z^{T_j} are inconsistent.
Online: 1. Parties broadcast an alive bit, indicating that they did not abort.
2. If all parties are alive, P_i ∈ P sends the decommitments of the shares in ⟨λ_z⟩_i to the respective parties.

Fairness is a stronger security notion than security with abort: during reconstruction, either all parties learn the output or none do. For fair reconstruction, we extend the techniques in [74] to the n-party setting, where commitments are generated on each share of the mask (required to reconstruct z) by t + 1 parties in the preprocessing phase. During the online phase, these are decommitted towards the respective parties if all parties are alive (did not abort). Since there is at least one honest party among every set of t + 1 parties, if all honest parties are alive, then parties are guaranteed to obtain the correct decommitment of the missing share from the honest party, and all honest parties can reconstruct the output. Else, none of the parties obtains the output.
c) Multiplication: The maliciously secure multiplication protocol (Π^M_mult) appears in Fig. 27.
Overcoming the privacy breach described in [47]: We elaborate on the privacy breach that arises due to deferring the correctness check and how it is overcome in our case. We first explain the attack that a malicious adversary can launch if reconstruction towards P_king is performed by relying on RSS (or Shamir sharing) naively, and further justify why it is bypassed in our protocol. Consider a circuit with two sequential multiplication gates, with the output of the first gate, say a, going as input to the second gate. Let b denote the other input to the second multiplication gate, and z denote its output. In a P_king-based approach for multiplication, t parties send their respective (RSS/Shamir) shares of a masked value to P_king. In particular, for the first multiplication gate in the circuit mentioned above, t parties send their corresponding shares of a − r_a to P_king, who reconstructs it and sends it back to all. Delaying the verification allows a malicious P_king to send an inconsistent value of a − r_a to the parties, using which it can learn the private input b, as follows. Suppose P_king sends the correct a − r_a to all but one of the remaining t online parties, to which it sends a − r_a + δ. Owing to this, for the next multiplication gate, P_king receives the shares of z − r_z from the former t − 1 parties and a share of (a + δ)b − r_z = z + δb − r_z from the latter party. Having obtained these, and additionally using the shares of z − r_z and z + δb − r_z corresponding to the t corrupt parties including itself, a malicious P_king can reconstruct z − r_z as well as z + δb − r_z, thus learning b in the clear. The crux of this attack lies in the fact that a malicious adversary corrupting t parties including P_king already possesses t shares each of z − r_z and z + δb − r_z. Thus, an additional share of these obtained from the online parties allows it to carry out the attack successfully. However, the same does not hold for the case of additive (E[·]) sharing. (In Fig. 27, isTr = 1 denotes that truncation is required and isTr = 0 denotes otherwise.)

P_king reconstructs ζ, computes and sends
Verification for all multiplication gates: Invoke Π_Vrfy on ⟨·⟩-shares of (a_1, b_1, z_1), ..., (a_m, b_m, z_m), which denote the inputs and outputs of the m multiplication gates whose correctness is to be verified. (z − r is sent to the parties in E during the online phase computation, whereas it is sent to the parties in D in a single shot before verification begins.) Notice that in our protocol, during reconstruction towards P_king, any redundancy due to ⟨·⟩-sharing is eliminated by parties switching to E[·]-sharing (additive sharing among parties in E). Due to this, even if P_king sends inconsistent values to the parties, the E[·]-share of z − r_z or z + δb − r_z that it receives corresponds to an additive share defined with respect to the parties in E. Hence, this additionally received additive share cannot be combined with the shares held by the t corrupt parties to perform the reconstruction. Thus, the earlier strategy of P_king, of using these additional shares in conjunction with the t corrupt shares to reconstruct z − r_z and z + δb − r_z, no longer works. The primary reason which prevents the attack is the elimination of redundancy in the sharing scheme by switching to (t+1)-out-of-(t+1) additive sharing (E[·]-sharing) for the set of parties in E, which is known to withstand this attack [47].
Discussion about [38]: The above attack can be circumvented by making P_king broadcast the reconstructed value to all the parties, as discussed in [38]. To further optimize the protocol by requiring only t + 1 parties to be active in the online phase, they rely on broadcast with abort, which comprises two phases: (i) send, where P_king sends the value to the recipients, and (ii) verification, where the recipients exchange a hash of the received value among themselves and abort in case of inconsistency. However, for amortization, they defer the verification (even with respect to broadcast) towards the end of the protocol, thus making their protocol susceptible to the aforementioned attack. We observe that one fix is to perform the verification with respect to broadcast after each level in the circuit. This, however, requires all the parties to be online. An optimization is to let only the t + 1 parties in the online phase perform this verification after each level, thereby allowing the remaining t parties to be shut off. Specifically, this involves performing verification where the online parties exchange the hash of the received value and abort in case of inconsistency. When the remaining t (offline) parties come online towards the end of the protocol to verify the correctness of the multiplication gates, this verification should be preceded by first verifying the consistency of the values broadcast by P_king to the offline parties (which involves the participation of all n parties). Since the online phase involves broadcasting the reconstructed value to t other online parties, this amounts to an exchange of O(t²) hashes after each level, thereby incurring a circuit-depth-dependent overhead in the communication cost as well as in the rounds. In order for the communication cost to get amortized, it is required that the circuit have O(t²) gates at each level. However, the overhead in terms of the number of rounds persists.
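The hash-based consistency check of the verification phase can be sketched as follows (SHA-256 as the hash is an assumption of the sketch; in the protocol each party sends its digest to every other party and compares):

```python
import hashlib

def agree(received_per_party):
    # each party hashes the (concatenated) values it received; parties
    # abort iff the exchanged digests are not all identical
    digests = {
        hashlib.sha256(repr(vals).encode()).hexdigest()
        for vals in received_per_party
    }
    return len(digests) == 1  # True: consistent, False: abort

assert agree([[5, 7], [5, 7], [5, 7]])      # everyone received the same values
assert not agree([[5, 7], [5, 7], [5, 8]])  # an inconsistent send is detected
```

Hashing the concatenation of many received values at once is what lets a single exchange amortize the check over all of them.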

Multiplication with truncation - Instantiating F^M_TrGen with a maliciously secure doubly shared bits generation protocol: As mentioned earlier, F^M_TrGen (Fig. 28) can be realized using the maliciously secure variant of Π_dsBits, denoted as Π^M_dsBits. This protocol is similar to the semi-honest protocol, except for the following differences that account for malicious behaviour. The ⟨·⟩-shares of e_i = a_i² are generated by invoking Π_multPre instead of relying on its semi-honest counterpart. This ensures the generation of correct ⟨·⟩-shares of e_i, and malicious behaviour, if any, leads to an abort. Following this, e_i is either correctly reconstructed towards all parties, or the parties abort. This ensures that an adversary cannot cause the reconstruction of an incorrect e_i. Concretely, for reconstruction, similar to multiplication, every party sends its ⟨·⟩-share to every other party, and aborts in case of inconsistencies in the received values^5. The rest of the protocol steps (which are non-interactive) remain unchanged, and hence a formal protocol description is omitted.

F^M_TrGen interacts with the parties in P and the adversary S.
Input: F^M_TrGen optionally receives a special command, (abort, P), from S, indicating that the honest parties in P with indices in P should abort.

The malicious variant of the multi-input multiplication protocol can, at a high level, be viewed as an amalgamation of the semi-honest multi-input multiplication and the malicious multiplication protocol. For the case of 3-input multiplication, recall that the semi-honest protocol to compute ⟨z⟩ given ⟨a⟩, ⟨b⟩ and ⟨c⟩, where z = abc, requires parties to obtain E[Λ_ab], E[Λ_ac], E[Λ_bc] and E[Λ_abc] in the preprocessing phase, which are then used to reconstruct m_z in the online phase.
Since parties in E are required to hold the correct E[·]-sharings before the online phase begins, as in the case of multiplication, the techniques from the semi-honest protocol fail in this setting. Hence, our protocol uses 4 instances of a maliciously secure multiplication protocol in the preprocessing phase, one each to compute ⟨Λ_ab⟩, ⟨Λ_ac⟩, ⟨Λ_bc⟩ and ⟨Λ_abc⟩. Each of these ⟨·⟩-sharings is further converted to E[·]-sharing using Π_{⟨·⟩→E[·]} to ensure the active participation of only t + 1 parties in the online phase for the reconstruction of z − r. Further, to detect malicious behaviour during the reconstruction of z − r, a verification check similar to the one in the multiplication protocol is performed, such that parties abort if the check fails.

For the dot product, while the conversion to E[·]-sharing (via Π_{⟨·⟩→E[·]}, Fig. 19) remains as before, the computation of the preprocessing data differs significantly from the semi-honest protocol. For this, we extend the ideas from SWIFT [56] and generate ⟨σ⟩ by executing a maliciously secure dot product protocol Π_dotPre (abstracted as a functionality F_DotPPre in Fig. 29). Specifically, parties invoke Π_dotPre on the ⟨·⟩-shares of λ_x = (λ_x1, ..., λ_xn) and λ_y = (λ_y1, ..., λ_yn) to compute ⟨σ⟩, followed by an invocation of Π_{⟨·⟩→E[·]}. Having computed the necessary preprocessing data, the online phase proceeds similarly to the semi-honest protocol, where parties reconstruct z − r via P_king. To account for misbehaviour, the protocol is augmented with a verification phase similar to that in the malicious multiplication.
Observe that a trivial realization of F_DotPPre can be reduced to n instances of multiplication. However, we extend the ideas from [14], [15], [56] to eliminate the vector-size dependency in the preprocessing phase. For this, we instantiate Π_dotPre using a semi-honest dot product protocol [48], whose cost matches that of semi-honest multiplication [31], followed by a verification phase whose cost can be amortized away over multiple dot products, thereby resulting in vector-size independent preprocessing.
Elaborately, the semi-honest dot product protocol of [48] takes as input ⟨x⟩, ⟨y⟩, where x, y are vectors of size n, and outputs ⟨z⟩ = ⟨x ⊙ y⟩. For this, parties invoke Π_{⟨·⟩⟨·⟩→[·]} on each pair of elements in x, y and sum the results to generate [ρ] = [x ⊙ y]. These shares are randomized by summing with [r] (converted from ⟨r⟩) for a random r, and the sum z + r = (x ⊙ y) + r is reconstructed towards P_king, who sends the reconstructed z + r to the parties in E. All parties then non-interactively generate ⟨z + r⟩ by setting one of its shares to z + r and the others to 0. Given ⟨z + r⟩, ⟨r⟩, parties can compute ⟨z⟩ = ⟨z + r⟩ − ⟨r⟩. Observe that communicating [z + r] to P_king requires 2t elements, while communicating z + r to the parties in E requires t elements, resulting in a cost of 3t elements, matching that required for semi-honest multiplication [31].
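To illustrate why the dot product costs the same as a single multiplication, the following clear-text sketch uses 3-party replicated sharing (t = 1) in place of the paper's general ⟨·⟩-sharing; each party locally aggregates its product shares over all vector components, so only one reconstruction is needed regardless of the vector size:

```python
import random

MOD = 2**64  # assumed ring size for the sketch

def share3(v):
    # replicated 3-party sharing: v = v1+v2+v3, party i holds (v_i, v_{i+1})
    s = [random.randrange(MOD), random.randrange(MOD)]
    s.append((v - s[0] - s[1]) % MOD)
    return [(s[i], s[(i + 1) % 3]) for i in range(3)]

def dot_shares(xsh, ysh):
    # each party computes ONE additive share of x.y locally, summing its
    # per-component product shares: communication is vector-size independent
    z = [0, 0, 0]
    for xj, yj in zip(xsh, ysh):
        for i in range(3):
            (a, a2), (b, b2) = xj[i], yj[i]
            z[i] = (z[i] + a * b + a * b2 + a2 * b) % MOD
    return z  # one element per party suffices to reconstruct x.y

x, y = [3, 1, 4, 1, 5], [9, 2, 6, 5, 3]
xsh = [share3(v) for v in x]
ysh = [share3(v) for v in y]
z = dot_shares(xsh, ysh)
assert sum(z) % MOD == sum(a * b for a, b in zip(x, y)) % MOD
```

Each cross term of the product is covered exactly once across the three parties, which is what makes the local aggregation correct.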
F_DotPPre interacts with the parties in P and the adversary S. Let T_i be the set of the honest parties.
Input: F_DotPPre receives the ⟨·⟩-shares of the vectors a = (a_1, ..., a_n) and b = (b_1, ..., b_n) from the parties. F_DotPPre also receives the ⟨·⟩-shares of z = a ⊙ b of the corrupt parties from S. S is also allowed to send a special command, (abort, P), which indicates that the parties in P with indices in P should abort.

To verify the correctness of this dot product computation, we extend the verification technique for multiplication in [15] to the dot product. We give a high-level idea of how the verification of m dot product triples (⟨x_1⟩, ⟨y_1⟩, ⟨z_1⟩), ..., (⟨x_m⟩, ⟨y_m⟩, ⟨z_m⟩) can be performed. Correctness of the dot product triples can be verified by taking a random linear combination β = Σ_{k=1}^{m} θ_k (z_k − x_k ⊙ y_k), where {θ_k}_{k=1}^{m} is randomly chosen by all the parties, and checking if β = 0. Given ⟨·⟩-shares of x_k, y_k, z_k for k ∈ {1, ..., m}, parties can compute an additive share of β. However, since [·]-sharing does not allow for robust reconstruction, the approach is to generate ⟨β⟩, then robustly reconstruct it and check equality with 0. To generate ⟨β⟩, parties first ⟨·⟩-share their respective [·]-shares of ψ. Let ψ^i denote the [·]-share of ψ held by P_i. Given ⟨ψ^i⟩ for i ∈ {1, ..., n}, parties can compute ⟨β⟩ and reconstruct β. It is, however, required to ensure that every party P_i ⟨·⟩-shares the correct ψ^i. To check the correctness of ψ^i, parties need to verify if ψ^i = Σ_{k=1}^{m} θ_k Σ_{j=1}^{n} x^i_{kj} y^i_{kj} (Eq. 5), where x^i_{kj}, y^i_{kj} denote the [·]-shares of x_{kj}, y_{kj} held by P_i. Note that, following along the lines of Π_{⟨·⟩→[·]}, parties can generate these [·]-shares of x_{kj}, y_{kj} from the ⟨·⟩-shares of x_{kj}, y_{kj} non-interactively. Now, setting ã_{kj} = θ_k x^i_{kj}, b̃_{kj} = y^i_{kj}, c̃ = ψ^i, for k ∈ {1, ..., m}, Eq. (5) takes the form c̃ = Σ_{l=1}^{mn} ã_l b̃_l (Eq. 6). The correctness of Eq. (6) can be verified by invoking F^abort_proveDeg2Rel (see Section 3 of [15] for the definition and its instantiation), which takes as input ⟨·⟩-shares of ã_l, b̃_l, c̃ for l ∈ {1, ..., mn}, which are known in the clear to party P_i, and verifies if Eq. (6) holds. The protocol realizing F^abort_proveDeg2Rel for all n parties requires communicating O(n log(mn) + n) extended ring elements per party. Further, since the steps other than F^abort_proveDeg2Rel require sharing and reconstructing one element, they add a small constant cost, resulting in a communication cost of O(n log(mn) + n) extended ring elements per party for verifying m dot products of vector size n.
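The random-linear-combination idea behind the batch check can be sketched in the clear; a prime modulus is used here so that the catch probability is easy to state, whereas the paper works over rings via an extended-ring check:

```python
import random

P = 2**61 - 1  # a prime modulus (assumption of this sketch)

def batch_check(triples):
    # beta = sum_k theta_k * (z_k - x_k . y_k); beta == 0 whenever every
    # triple is correct, while a wrong z_k makes beta != 0 w.h.p.
    beta = 0
    for x, y, z in triples:
        theta = random.randrange(1, P)
        err = (z - sum(a * b for a, b in zip(x, y))) % P
        beta = (beta + theta * err) % P
    return beta == 0

good = [([1, 2], [3, 4], 11), ([5, 6], [7, 8], 83)]
bad = [([1, 2], [3, 4], 12)]
assert batch_check(good)
assert not batch_check(bad)
```

Checking a single combined value β is what lets the verification cost amortize over all m dot products.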

APPENDIX C BUILDING BLOCKS
For completeness, we discuss the building blocks used in our framework. These blocks are known from the literature [57], [73], and we show how they can be extended to the n-party setting.
g) Maxpool / Minpool: Maxpool allows parties to compute the ⟨·⟩-share of the maximum value x_max among a vector of values x = (x_1, ..., x_n). For this, we proceed along the lines of [57]. Observe that the maximum of two values x_i, x_j can be computed by first using the secure comparison protocol to obtain ⟨b⟩^B such that b = 0 if x_i ≥ x_j and b = 1 otherwise. Following this, parties can compute b(x_j − x_i) + x_i using the bit injection protocol to obtain the maximum value as the output. To compute the maximum among a vector of values, parties follow the standard binary-tree-based approach, where consecutive pairs of values are compared in a level-by-level manner. We refer to the resulting protocol as Π_max. A protocol Π_min for minpool can be worked out similarly. Given ⟨b⟩^B and ⟨v⟩, ReLU is computed using Π_BitInj.
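The tournament structure of Π_max can be sketched in the clear, with the secure comparison and bit injection abstracted as plain functions (this is an illustration of the control flow, not the secure protocol):

```python
def tree_max(vals, cmp_lt):
    # level-by-level pairwise maximum: b = cmp_lt(x, y) models the secure
    # comparison (b = 0 if x >= y, else 1); b*(y - x) + x selects the max
    layer = list(vals)
    while len(layer) > 1:
        nxt = []
        for i in range(0, len(layer) - 1, 2):
            x, y = layer[i], layer[i + 1]
            b = cmp_lt(x, y)
            nxt.append(b * (y - x) + x)
        if len(layer) % 2:           # odd element advances unchanged
            nxt.append(layer[-1])
        layer = nxt
    return layer[0]

cmp_lt = lambda x, y: 0 if x >= y else 1
assert tree_max([3, 7, 2, 9, 5], cmp_lt) == 9
```

The tree shape gives log-depth rounds, which is why consecutive pairs are compared level by level rather than scanning linearly.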

B. Malicious blocks
Note that the malicious variants of the building blocks, such as bit-to-arithmetic, Boolean-to-arithmetic, and arithmetic-to-Boolean conversion, bit extraction, secure comparison, secure equality check, ReLU, maxpool, and convolutions, follow along similar lines to the semi-honest protocols, with the difference that the underlying protocols used are replaced with their maliciously secure variants. Moreover, for steps that involve opening values via P_king, the reconstructed values are sent to all and are accompanied by a verification check similar to the one in the multiplication protocol.

C. Communication cost
Table IX summarises the communication cost and online round complexity of the semi-honest and maliciously secure protocols.

APPENDIX D ADDITIONAL BENCHMARKS
A. Deep NN and GNN
a) NN architecture: Among the NNs, the first, NN-1, is a 3-layer fully connected network with ReLU activation after each layer, as considered in [68], [74], [56]. The second, NN-2, is the LeNet [60] architecture, which contains two convolutional layers and two fully connected layers with ReLU activation after each layer; each convolutional layer is additionally followed by a maxpool operation. Finally, NN-3 is the VGG16 [82] architecture, which comprises 16 layers in total, including fully connected, convolutional, ReLU activation, and maxpool layers. The last two NNs were considered in [85].
b) GNN architecture: The goal of spectral-based GNNs [35], [55] is to learn a function of signals x_1, ..., x_m, each of length n, on a graph G = (V, E, M), where V is the set of n vertices of the graph, E is the set of edges, and M is the graph description in terms of an n × n adjacency matrix. The j-th component of every signal x_i corresponds to the j-th node of the graph. Training data is used to compute the graph description M, which is common for all signals considered. The approximation of graph filters using a truncated expansion in terms of Chebyshev polynomials was put forth in [35]. Chebyshev polynomials are recursively defined as T_0(x) = 1, T_1(x) = x, and T_k(x) = 2x T_{k−1}(x) − T_{k−2}(x). The inference phase for an n × c signal matrix X with f feature maps, where c represents the dimension of the feature vector of each node, with a K-localized filter matrix Θ_k, can be performed as Y = Σ_{k=0}^{K−1} T_k(L̃) X Θ_k, where L̃ = 2L/λ_max − I_n, λ_max is the largest eigenvalue of the normalized graph Laplacian L, Y is an n × f dimensional matrix, and the trainable parameter for the k-th layer Θ_k is of dimension c × f. We use the simplified architecture of [35] given in [81]. The GNN architecture in the latter uses one graph convolution layer without a pooling operation, instead of the original model with two graph convolution layers, each of which is followed by a pooling operation. Further, K is set to 5 instead of 25. This architecture is shown to achieve an accuracy of more than 99% on MNIST classification in [81].
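A minimal plain-text sketch of the inference formula Y = Σ_k T_k(L̃) X Θ_k, using pure-Python matrix helpers so the recursion is explicit (shapes and names are illustrative):

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matadd(A, B):
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(A, B)]

def scale(A, c):
    return [[c * x for x in row] for row in A]

def eye(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def cheb_inference(X, L, thetas, lmax):
    # Y = sum_k T_k(L~) X Theta_k with L~ = 2L/lmax - I and the Chebyshev
    # recursion T_0 = I, T_1 = L~, T_k = 2 L~ T_{k-1} - T_{k-2}
    n = len(L)
    Lt = matadd(scale(L, 2.0 / lmax), scale(eye(n), -1.0))
    T_prev, T_cur = eye(n), Lt
    Y = matmul(X, thetas[0])                        # k = 0 term: T_0 = I
    if len(thetas) > 1:
        Y = matadd(Y, matmul(matmul(Lt, X), thetas[1]))
    for k in range(2, len(thetas)):
        T_prev, T_cur = T_cur, matadd(scale(matmul(Lt, T_cur), 2.0),
                                      scale(T_prev, -1.0))
        Y = matadd(Y, matmul(matmul(T_cur, X), thetas[k]))
    return Y                                        # n x f output
```

The recursion keeps only two previous polynomials of L̃, so a K-localized filter needs K matrix products rather than an eigendecomposition.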
Compared to our semi-honest variant for evaluating NNs, the malicious variant incurs a 2× higher online communication cost for NN-1 and NN-2. However, this difference narrows for deeper NNs, with the communication being 1.5× for NN-3. The drop in the difference can be attributed to the one-time cost of verification required in the malicious variant, which gets amortized over deeper circuits. For the same reason, in comparison to the semi-honest case, the malicious variant has an overhead of around 1 second in online run-time, which in turn reflects in the reduced throughput. Similar to the semi-honest evaluation of NNs, the overall communication is an order of magnitude higher than the online communication due to the cost incurred for truncation during preprocessing. Also, analogous to the trend observed for synthetic circuits, the overhead in overall run-time is approximately 11 seconds, owing to the distributed zero-knowledge proof verification required in the preprocessing phase. For GNN, the trend follows closely that of NN-3, where the malicious variant incurs 1.5× higher communication than its semi-honest counterpart.

B. Biometric Matching
Given a database of m biometric samples (⟨s_1⟩, ..., ⟨s_m⟩), each of size n, and a user holding its sample u, the goal of biometric matching is to identify the sample from the database that is "closest" to u. The notion of "closeness" can be formalized by various distance metrics, of which Euclidean distance (EuD) is the most widely used. Following the general trend, we reduce our biometric matching problem to that of finding the sample from the database which has the least EuD to the user's sample u. We follow [70], [73], where the (squared) EuD between two vectors x, y of length n is given as EuD(x, y) = z ⊙ z, where z = ((x_1 − y_1), ..., (x_n − y_n)). (7) To achieve this goal of performing biometric matching securely, each s_i, for all i ∈ {1, ..., m}, in the database is ⟨·⟩-shared among the n parties participating in the computation. Specifically, each component s_ij, for all j ∈ {1, ..., n}, is ⟨·⟩-shared among all the parties. Similarly, the user also ⟨·⟩-shares its sample u. The parties compute a ⟨·⟩-shared distance vector DV of size m, where the i-th component corresponds to the EuD between u and s_i. For this, each party locally obtains ⟨z_i⟩ = ⟨s_i⟩ − ⟨u⟩ and computes ⟨DV_i⟩ according to Eq. (7) using the dot product operation. The final step is then to identify the minimum of these m components of DV, which can be performed using the protocol Π_min for the minpool operation. Table XII tabulates the benchmarks when the database has 4096 and 16384 samples.
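In the clear, the matching pipeline above amounts to the following plain Python sketch (the secret sharing is ignored; only the distance-then-minpool structure is shown):

```python
def closest_sample(db, u):
    # DV_i = EuD(s_i, u) computed as z . z with z = s_i - u;
    # the answer is the index minimizing DV (the minpool step)
    def dist2(s):
        z = [sj - uj for sj, uj in zip(s, u)]
        return sum(zj * zj for zj in z)
    dv = [dist2(s) for s in db]
    return min(range(len(db)), key=dv.__getitem__)

db = [[5, 5], [3, 4], [1, 1]]
assert closest_sample(db, [1, 0]) == 2   # (1, 1) is nearest to (1, 0)
```

Expressing each distance as a single dot product is what lets the secure version pay one reconstruction per sample instead of n.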
The trend observed for 4096 and 16384 samples adheres to that observed in Table VIII for the case of 1024 and 65536 samples. Specifically, these settings also enjoy a 4.6× improvement in online run-time and throughput, and around 83% saving in monetary cost compared to DN07. Moreover, similar to the prior case, the malicious variant incurs a minimal overhead of 4% in online throughput and 9.5% in total communication compared to our semi-honest setting.

C. Genome Sequence Matching
Given a genome sequence as a query, genome matching aims to identify the most similar sequence from a database of sequences. This task is also commonly referred to as Similar Sequence Query (SSQ) identification and has implications in the advancing field of medical science. An SSQ algorithm on two sequences s and q requires the computation of the Edit Distance (ED), which quantifies how different two sequences are by identifying the minimum number of additions, deletions, and substitutions needed to transform one sequence into the other. To compute the ED, we extend the (2-party) protocol from [79], which builds on top of the approximation from [5], to the n-party setting. We describe the high-level idea of the approximation algorithm for ED computation for a query sequence q against a database of sequences {s_1, ..., s_m}.
1. For i = 1 to ω:
• For j = 1 to m:
- Invoke Π_Eq on ⟨LUT_s[i][j]⟩ and ⟨q[i]⟩ to generate ⟨b_j⟩^B.
- Invoke Π_bit2A on ⟨b_j⟩^B to generate ⟨b_j⟩.
Protocol Π_ED(P, ⟨LUT_s⟩, ⟨q⟩)
Fig. 33: Edit distance between query q and sequence s with respect to a database of m sequences and ω blocks
The ED approximation algorithm has a non-interactive phase, during which the database owner, holding the sequences s_1, ..., s_m, generates a Look-Up Table (LUT) for each sequence. These LUTs are then secret-shared among all the parties. To generate the LUTs, the sequences in the database are aligned with respect to a common reference genome sequence (using the Wagner-Fischer algorithm [86]) and divided into blocks of a fixed, predetermined size. Based on the most frequently occurring block sequences in the database, an LUT is constructed consisting of these block values and their distances from each other. Specifically, for a database of m sequences {s_1, ..., s_m}, each of length ω blocks, an LUT is constructed for each s_i. Each LUT has m columns, one corresponding to each sequence in the database, and ω rows, one corresponding to each block of a sequence, where LUT_s[i][j] corresponds to the ED between block i of the sequence s and s_j. This completes the non-interactive phase of the ED approximation algorithm.
Given the LUTs, when a new query q has to be processed, its ED from every sequence s in the database must be computed. For this, similar to the non-interactive phase, the query is first aligned with the reference sequence and broken down into blocks of the same fixed size. Then, the i-th block of the query is matched against the i-th block of each sequence via the LUT for a sequence s. If the block values match, the precomputed distance is taken as the output for that block; otherwise, the output is taken to be 0. Finally, the sum of the distances over all blocks is taken as the approximate ED between q and the sequence s. Computing the ED to all such sequences s in the database then allows identifying the most similar sequence for the query via the minpool operation. Algorithms for ED computation between two sequences and for SSQ appear in Fig. 33 and Fig. 34, respectively, where accuracy and correctness follow from [5].

Protocol Π_SSQ(P, {⟦LUT_s⟧}_{s=1}^m, ⟦q⟧)
For s = 1 to m:
• Invoke Π_ED on ⟦LUT_s⟧ and ⟦q⟧ to generate ⟦d_s⟧.

Fig. 34: Similar sequence queries

Since the generation of LUTs happens non-interactively, we only focus on, and benchmark, the computation of the ED with respect to the new query q, which requires interaction. Table XIII provides the benchmarks when the database consists of m = 1000 and 4000 sequences for block lengths ω = 25 and 35, respectively. As expected, the observations tabulated for the varying sequence lengths closely follow those for the case of m = 2000 and ω = 30 given in Table VII.
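The blockwise approximation and the minpool step can be sketched in cleartext Python. All function names here are illustrative; the protocol evaluates the same steps over secret shares via Π_Eq, Π_bit2A, and a secure minpool.

```python
def edit_distance(a, b):
    """Exact edit distance via the Wagner-Fischer DP (used on short blocks)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def build_lut(s_blocks, candidate_blocks):
    """Non-interactive phase: for each block position i of s, map each
    frequent candidate block value to its precomputed ED from block i of s."""
    return [{c: edit_distance(c, blk) for c in candidate_blocks}
            for blk in s_blocks]

def approx_ed(lut_s, q_blocks):
    """Blockwise approximation: precomputed distance on a match, else 0."""
    return sum(lut_s[i].get(blk, 0) for i, blk in enumerate(q_blocks))

def ssq(luts, q_blocks):
    """Most similar sequence = argmin (minpool) over the approximate EDs."""
    dists = [approx_ed(lut, q_blocks) for lut in luts]
    return min(range(len(dists)), key=dists.__getitem__)
```

In the protocol, the dictionary lookup becomes an equality test per LUT entry whose result bit, converted to arithmetic form, selects the precomputed distance.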

APPENDIX E SECURITY PROOFS
Security proofs are given in the real-world/ideal-world simulation-based paradigm [63]. Let A^sh, A^mal denote the real-world semi-honest and malicious adversary, respectively, corrupting at most t parties in P, denoted by C. Let S^sh, S^mal denote the corresponding ideal-world semi-honest and malicious adversary, respectively. Security proofs are given in the (F_setup, F_TrGen)-hybrid model (and the (F^M_TrGen, F_MulPre, F_DotPPre)-hybrid model for the malicious setting). For modularity, we provide simulation steps for each protocol separately.
The following is the strategy for simulating the computation of a function f (represented by a circuit ckt). The simulator S^sh knows the input and output of the adversary A^sh, and sets the inputs of the honest parties to 0. S^sh emulates F_setup and gives the respective keys to A^sh. Knowing all the inputs and randomness, S^sh can compute all the intermediate values for each building block in the clear. Thus, S^sh proceeds to simulate each building block in topological order using the aforementioned values (input and output of A^sh, randomness, and intermediate values). We provide the simulation steps for each of the sub-protocols separately for modularity. When carried out in the respective order, these steps result in the simulation of the entire computation. To distinguish the simulators for the various protocols, we use the corresponding protocol name as the subscript of S^sh.

a) Sharing and Reconstruction: Simulation for input sharing (Fig. 21) and reconstruction appears in Fig. 35 and Fig. 36, respectively.

Preprocessing:
- Emulate F_setup and give the respective shared keys to A^sh.
- Sample the shares of λ_a commonly held with A^sh using the respective PRF keys, while the other shares are sampled randomly.

Online:
- If P_s ∈ C, receive m_a from A^sh on behalf of the honest parties in E. Else, set a = 0, m_a = λ_a, and send m_a to A^sh on behalf of P_s if there exists a corrupt party in E. Send the shares of the honest parties in E to A^sh.

Simulator S^sh_Sh
Fig. 35: Semi-honest: Simulation for input sharing

- If P_king is honest, send the output a to A^sh on behalf of P_king.

Simulator S^sh_Rec
Fig. 36: Semi-honest: Simulation for reconstruction

b) Multiplication: Simulation steps for multiplication (Fig. 4) are provided in Fig. 37. Observe that the adversary's view in the simulation is indistinguishable from its view in the real world, since it receives only random values in each step of the protocol.

Preprocessing:
- If isTr = 0: Sample the ⟨·⟩-shares of r commonly held with A^sh using the respective shared keys, while the other shares are sampled randomly.
- Else if isTr = 1: Emulate F_TrGen to generate ⟨r⟩, ⟨r^d⟩.
- On behalf of every honest P_i ∈ D, send a random value for [Λ_ab − r]_i to A^sh if P_king ∈ C.

Online:
- If P_king ∈ C, send a random value for E[ζ]_i to A^sh on behalf of each honest P_i ∈ E.
- If P_king ∉ C, send a random z − r to A^sh if there exists a corrupt party in E.
Simulator S^sh_mult
Fig. 37: Semi-honest: Simulation for Π_mult

c) Other building blocks: Simulation steps for the remaining building blocks can be obtained analogously by simulating the steps of the respective underlying protocols in their order of invocation.

A. Malicious security
The following is the strategy for simulating the computation of a function f (represented by a circuit ckt). The simulator emulates F_setup and gives the respective keys to the malicious adversary A^mal. This is followed by the input sharing phase, in which S^mal extracts the input of A^mal using the known keys, and sets the inputs of the honest parties to 0. Knowing all the inputs, S^mal can compute all the intermediate values for each building block in the clear. Further, S^mal invokes F^mal_n-PC and obtains the function output in the clear. S^mal proceeds to simulate each building block in topological order using the aforementioned values (inputs of A^mal, intermediate values, and function output). As before, we provide the simulation steps for each of the sub-protocols separately for modularity. When carried out in the respective order, these steps result in the simulation of the entire computation. To distinguish the simulators for the various protocols, the corresponding protocol name appears as the subscript of S^mal.

a) Sharing: Simulation for sharing appears in Fig. 38.

Preprocessing:
- Emulate F_setup and give the respective shared keys to A^mal.
- Sample the shares of λ_a commonly held with A^mal using the respective PRF keys, while the other shares are sampled randomly.

Online:
- For P_s ∈ C, receive m_a from A^mal on behalf of the honest parties in E, and obtain a = m_a − λ_a (since S^mal knows all the PRF keys, it knows λ_a). Invoke F^mal_n-PC with (Input, a) on behalf of A^mal.
- On behalf of an honest P_s, set its input a = 0, m_a = λ_a, and send m_a to A^mal if there exists a corrupt party in E.

Verification: Send H(m_a) to A^mal on behalf of the honest parties. If inconsistent m_a's were received with respect to a corrupt party, invoke F^mal_n-PC with (Signal, abort).

Fig. 3: Steps of the multiplication protocol.
Step 1: Parties non-interactively generate ⟨r⟩ by locally sampling each of its shares (via Π_rand). Parties locally compute [r] and ⟦r⟧ from ⟨r⟩ using Π_⟨·⟩→[·] and Π_⟨·⟩→⟦·⟧, respectively. Looking ahead, [r] aids in generating additive shares of D, E, while ⟦r⟧ aids in computing ⟦z⟧ from z − r.
Step 2: This step involves computing additive shares of Λ_ab − r among all parties. For this, parties non-interactively generate [Λ_ab] from ⟨λ_a⟩, ⟨λ_b⟩ (via Π_⟨·⟩⟨·⟩→[·]). P_i ∈ P sets its additive share of Λ_ab − r as [Λ_ab − r]_i = [Λ_ab]_i − [r]_i. Observe that the shares [Λ_ab − r]_i of P_i ∈ D define the additive shares of D = (Λ_ab − r)_D among the parties in D. Similarly, the shares [Λ_ab − r]_i of P_i ∈ E define the additive shares of (Λ_ab − r)_E among the parties in E (i.e., E[(Λ_ab − r)_E]).
Step 3: Parties in E generate additive shares of λ_a, λ_b among themselves (E[·]-shares, via Π_⟨·⟩→E[·]). Looking ahead, E[λ_a], E[λ_b] aid in generating additive shares of E among E.
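The algebra these steps rely on can be sanity-checked in the clear over Z_{2^64}, assuming the standard masked-evaluation relation z = (m_a − λ_a)(m_b − λ_b). The sampling below is a plain mock of the protocol's randomness, not its communication pattern.

```python
import random

MOD = 1 << 64  # the ring Z_{2^64}

def check_mult_identity(a, b):
    """Check that z - r, assembled from the masked values and the
    preprocessed term Lambda_ab, reconstructs z = ab once r is added back."""
    lam_a, lam_b, r = (random.randrange(MOD) for _ in range(3))
    m_a, m_b = (a + lam_a) % MOD, (b + lam_b) % MOD   # masked inputs
    Lam_ab = (lam_a * lam_b) % MOD                    # preprocessed product
    # z - r = m_a*m_b - m_a*lam_b - m_b*lam_a + Lambda_ab - r
    z_minus_r = (m_a * m_b - m_a * lam_b - m_b * lam_a + Lam_ab - r) % MOD
    return (z_minus_r + r) % MOD == (a * b) % MOD
```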

Functionality F_TrGen:
- Sample a random r ∈ Z_{2^ℓ}, and compute r^d = r/2^d.
- Generate ⟨·⟩-shares of r, r^d and set the output share for P_s ∈ P as y_s = {⟨r⟩_s, ⟨r^d⟩_s}.
Output: Send (Output, y_s) to every P_s ∈ P.
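A minimal cleartext sketch of how such a pair is used for probabilistic truncation, assuming ℓ = 64 and d = 13 for illustration, with inputs bounded so that no ring wrap occurs; the result is off by at most one unit in the last place.

```python
import random

ELL, D = 64, 13          # ring size and truncation parameter (illustrative)
MOD = 1 << ELL

def f_trgen():
    """Sample a truncation pair (r, r^d) with r^d = floor(r / 2^d)."""
    r = random.randrange(MOD)
    return r, r >> D

def truncate(z):
    """Reveal z - r, shift the clear value, and add back r^d.
    The result equals floor(z / 2^d) up to an additive error of 1."""
    r, r_d = f_trgen()
    return ((z - r) >> D) + r_d   # Python's >> floors, also for negatives
```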

Fig. 12: Monetary cost (in USD) for evaluating circuits (1000 instances) of various depths (d) for n = 9 parties. The values are reported in log_2 scale. Bars in solid colors denote computation over the network given in Fig. 11, while the crosshatched area denotes the additional cost incurred in the symmetric RTT setting (356 ms).

Fig. 13: Comparison for GNN and deep NN between our semi-honest protocol and DN07 (values plotted are logarithmic in base 2)

(9) Π_⟨·⟩→⟦·⟧(⟨a⟩) → ⟦a⟧: To convert ⟨a⟩ to ⟦a⟧, set m_a = 0 and set ⟨λ_a⟩ = −⟨a⟩.
(10) Π_⟦·⟧→⟨·⟩(⟦a⟧) → ⟨a⟩: To convert ⟦a⟧ to ⟨a⟩, set ⟨a⟩_{T_j} = −⟨λ_a⟩_{T_j} for j ∈ {1, ..., q − 1} and ⟨a⟩_{T_q} = m_a − ⟨λ_a⟩_{T_q}, where T_q = E.
(11) Π_⟨·⟩⟨·⟩→[·](⟨a⟩, ⟨b⟩) → [ab] (Fig. 20): Given ⟨a⟩, ⟨b⟩, parties non-interactively compute [ab] as follows. Observe that [ab] = Σ_{j=1}^q [⟨a⟩_{T_j} · b]. To generate [⟨a⟩_{T_j} · b], the idea is to generate T_j[⟨a⟩_{T_j} · b] and perform a conversion. Parties in T_j generate T_j[⟨a⟩_{T_j} · b] as ⟨a⟩_{T_j} · T_j[b]. To obtain [⟨a⟩_{T_j} · b] from T_j[⟨a⟩_{T_j} · b], P_i ∈ P sets [⟨a⟩_{T_j} · b]_i = T_j[⟨a⟩_{T_j} · b]_i if P_i ∈ T_j, and [⟨a⟩_{T_j} · b]_i = 0 otherwise.
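The decomposition underlying Π_⟨·⟩⟨·⟩→[·], namely that multiplying each additive chunk of a by b yields additive shares of ab, can be checked in the clear (a toy model with q chunks, ignoring which group holds which chunk):

```python
import random

MOD = 1 << 64  # the ring Z_{2^64}

def additive_share(x, q):
    """Split x into q additive shares over Z_{2^64}."""
    shares = [random.randrange(MOD) for _ in range(q - 1)]
    shares.append((x - sum(shares)) % MOD)
    return shares

def local_product_shares(a_chunks, b):
    """Each chunk a_{T_j} is multiplied by b locally; the results are
    additive shares of a*b, since ab = sum_j a_{T_j} * b."""
    return [(c * b) % MOD for c in a_chunks]
```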

Fig. 20: ⟨a⟩, ⟨b⟩ to [ab]

(12) Π_agree(P, {v^1, ..., v^n}) → continue/abort: Allows the parties to check whether they hold the same set of values v = (v_1, ..., v_m); the parties continue if the values are the same and abort otherwise. We denote the version of v held by P_i ∈ P as v^i. To check the consistency of v, each party computes the hash H = H(v_1 || ... || v_m) of the concatenation of all values v_1, ..., v_m, and the parties exchange H among themselves. If any party receives inconsistent hashes, it aborts; else it continues.
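A sketch of Π_agree with SHA-256; the hash choice and the length-prefixing of values are our assumptions, since the protocol only requires a collision-resistant hash of the concatenation.

```python
import hashlib

def agree_digest(values):
    """H(v_1 || ... || v_m); each value is length-prefixed so that the
    concatenation is unambiguous (an implementation choice, not mandated)."""
    h = hashlib.sha256()
    for v in values:
        h.update(len(v).to_bytes(4, "big"))
        h.update(v)
    return h.hexdigest()

def pi_agree(views):
    """views[i] is the list of values held by P_i; every party exchanges
    its digest and continues only if all digests agree."""
    return "continue" if len({agree_digest(v) for v in views}) == 1 else "abort"
```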

Fig. 23: Semi-honest: Dot product protocol

and [r], which closely follows the preprocessing phase of the 2-input multiplication. Specifically, parties can generate [Λ_abc] using Π_⟨·⟩⟨·⟩→[·] (Fig. 20) on ⟨Λ_ab⟩, ⟨λ_c⟩, followed by the parties in D communicating their [Λ_abc] shares, masked with a [·]-sharing of a random r, to P_king. This generates the E[Λ_abc + r]-sharing required during the online phase.
- Regarding the generation of ⟨Λ_ab⟩, all parties generate a ⟨·⟩-sharing of a random γ ∈ Z_{2^ℓ} non-interactively and convert it to [γ]. Parties then compute [Λ_ab + γ] by computing [Λ_ab] from ⟨λ_a⟩, ⟨λ_b⟩, followed by summing it up with [γ]. Parties reconstruct this value towards P_king, who then generates ⟨Λ_ab + γ⟩, from which parties compute

Fig. 24: Semi-honest: 3-input multiplication protocol

B. Malicious protocols

a) Input sharing: This protocol (Π^M_Sh(P_s, a)) is similar to the semi-honest one, where to enable P_s to generate ⟦a⟧, the parties generate ⟨λ_a⟩ such that P_s learns λ_a, followed by P_s sending the masked value m_a = a + λ_a to all. However, note that a corrupt P_s can cause inconsistency among the honest parties by sending different masked values. To ensure that the same value is received by all, the parties perform a hash-based consistency check, denoted by Π_agree (§II), where each party sends a hash of the received masked value(s) to every other party and aborts if it receives inconsistent hashes. Note that this check can be combined for all the inputs, thereby amortizing the cost.

Fig. 28: Ideal functionality F^M_TrGen

d) Multi-input multiplication: The malicious variant of the multi-input multiplication protocol can, at a high level, be viewed as an amalgamation of the semi-honest multi-input multiplication and the malicious multiplication protocol. For the case of 3-input multiplication, recall that the semi-honest protocol to compute ⟦z⟧ given ⟦a⟧, ⟦b⟧ and ⟦c⟧, where z = abc, requires the parties to obtain E[Λ_ab], E[Λ_ac], E[Λ_bc] and E[Λ_abc] in the preprocessing phase, which are then used to reconstruct m_z in the online phase.

For 4-input multiplication, parties obtain the ⟦·⟧-sharing of z = abcd using z − r = (m_a − λ_a)(m_b − λ_b)(m_c − λ_c)(m_d − λ_d) − r. The protocol proceeds in a manner similar to the 3-input case by delegating the computation of the product terms to the preprocessing phase.

e) Dot product: To generate ⟦z⟧ for z = x ⊙ y, where x and y are vectors of size n and are ⟦·⟧-shared, the protocol proceeds similarly to the semi-honest variant. That is, in the preprocessing phase, the parties in E obtain E[·]-shares of σ = Σ_{k=1}^n λ_{x_k} λ_{y_k} and of λ_{x_k}, λ_{y_k} for k ∈ {1, ..., n}. Although the latter two can be computed by the parties locally with an invocation of Π_⟨·⟩→E[·] (Fig.
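Expanding the 4-input product makes explicit which terms can be delegated to preprocessing: each of the 16 monomials picks m_i or −λ_i per factor, and every product of λ's is input-independent. A cleartext check of the expansion over Z_{2^64}:

```python
import random
from itertools import product

MOD = 1 << 64  # the ring Z_{2^64}

def expand4(ms, lams):
    """Expand prod_i (m_i - lam_i) into its 16 monomials. Each monomial
    multiplies a subset of lambdas (preprocessing material) with the
    remaining public masked values m_i, with sign (-1)^{#lambdas picked}."""
    total = 0
    for picks in product((0, 1), repeat=4):
        term = (-1) ** sum(picks)
        for m, l, p in zip(ms, lams, picks):
            term *= l if p else m
        total += term
    return total % MOD
```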

F_DotPPre proceeds as follows.
- Reconstruct a_k, b_k for k ∈ {1, ..., n} using the shares received from the honest parties, and compute z = Σ_{k=1}^n a_k · b_k.
- Compute the ⟨·⟩-share of z to be held by the set of honest parties as the difference between z and the sum of the ⟨·⟩-shares of z received from the corrupt parties.
- Let y_s denote the ⟨·⟩-share of z for party P_s ∈ P. If (abort, P) was received from S, set y_s = abort for P_s, where s ∈ P.
Output: Send (Output, y_s) to every P_s ∈ P.

Functionality F_DotPPre

Fig. 29: Ideal functionality for Π_dotPre

h) ReLU: The ReLU function, ReLU(v) = max(0, v), can be written as ReLU(v) = b̄ · v, where the bit b = 1 if v < 0 and b = 0 otherwise. Here, b̄ denotes the complement of b. Given ⟦v⟧, the parties invoke Π_bitext on ⟦v⟧ to obtain ⟦b⟧^B. The ⟦·⟧^B-sharing of b̄ is then computed non-interactively by setting m_b̄ = 1 ⊕ m_b. Given ⟦b̄⟧^B
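In the clear, with ring elements interpreted in two's complement (ℓ = 64 assumed for illustration), the identity ReLU(v) = b̄ · v reads:

```python
ELL = 64
MOD = 1 << ELL

def to_ring(x):
    """Encode a signed integer into Z_{2^64} (two's complement)."""
    return x % MOD

def msb(v):
    """b = 1 iff the ring element v encodes a negative value."""
    return (v >> (ELL - 1)) & 1

def relu(v):
    """ReLU(v) = (1 - b) * v over the ring, b the sign bit of v."""
    return ((1 - msb(v)) * v) % MOD
```

In the protocol, the sign bit is obtained via Π_bitext and the product b̄ · v is computed on shares rather than on clear values.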

TABLE I: Description of helper primitives; all primitives are non-interactive, except Π_agree (see §A-A for details)

TABLE II: Notations used in this work

b) Helper primitives: We use the primitives described in Table I from the literature. Table III compares the cost of computing z = abc via 2-input multiplications performed sequentially vs. a single 3-input multiplication, and of computing z = abcd via 2-input multiplications vs. a 4-input multiplication.

TABLE III: Semi-honest: Communication and round complexity for computing multi-input multiplications

Table IV compares the cost of computing multi-input multiplication via 2-input multiplications performed sequentially vs. the multi-input multiplication protocol.

TABLE IV: Malicious: Communication and round complexity for computing multi-input multiplications

TABLE VI: Latency in seconds (Preprocessing, Online) for varying depth (d) circuits with 1 million multiplications for n = 7

TABLE VIII: Benchmarks for biometric matching.
(a) TP denotes throughput; (b) monetary cost in USD

TABLE X: Semi-honest: Benchmarks for deep NN and GNN.

TABLE XI: Malicious: Benchmarks for deep NN and GNN.

TABLE XII: Benchmarks for biometric matching for varying number of sequences in the database.
(a) communication in MB; (b) TP denotes throughput; (c) communication in GB; (d) monetary cost in USD

TABLE XIII: Benchmarks for genome sequence matching for varying number of sequences (m) and block length (ω).