First, we describe the implementation details of HomoESM in Sect. 5.1. We then describe the five benchmark implementations in Sects. 5.2, 5.3, 5.4, 5.5, and 5.6.
Implementation of HomoESM
We developed HomoESM on top of the WSO2 Stream Processor (WSO2 SP) software stack. As described earlier, WSO2 SP internally uses Siddhi, which is a complex event processing library [20]. Siddhi lets WSO2 SP users run queries written in an SQL-like query language in order to get notifications on interesting real-time events.
A high-level view of the system implementation is shown in Fig. 3. Input events are received by the ‘Event Publisher.’ A Java object is created for each incoming event and put into a queue. The Event Publisher thread picks those Java objects from the queue according to the configured period. Next, it decides whether the picked event needs to be sent to the private or the public Siddhi server, according to the configured load transfer percentage and threshold values. If the event needs to be sent to the private Siddhi server, the Event Publisher marks the time and delegates the event to a thread pool which handles sending to private Siddhi. If the event needs to be sent to the public Siddhi server, the Event Publisher marks the time and puts the event into the queue which is processed asynchronously by the Encrypt Master.
The Encrypt Master thread (see Fig. 4a) periodically checks the queue which holds the events that need to be sent to the public cloud. This queue is maintained by the ‘Event Publisher’ (see Fig. 5a). If the queue size is greater than or equal to the composite event size, the Encrypt Master creates a list of events whose length equals the composite event size. Next, it delegates the event encryption and composite event creation task to the ‘Composite Event Encode Worker’ (see Fig. 4b).
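The batching check performed by the Encrypt Master can be sketched as follows. This is a minimal illustration, not the HomoESM code: the function name `drain_batch` and the value `COMPOSITE_EVENT_SIZE = 10` are assumptions made for the example.

```python
from collections import deque

# Assumed value for illustration; the real size is a configuration parameter.
COMPOSITE_EVENT_SIZE = 10


def drain_batch(public_queue, batch_size=COMPOSITE_EVENT_SIZE):
    """Return a list of exactly batch_size events taken from the head of the
    queue, or None when fewer than batch_size events are waiting."""
    if len(public_queue) < batch_size:
        return None
    return [public_queue.popleft() for _ in range(batch_size)]


if __name__ == "__main__":
    queue = deque(range(25))
    print(drain_batch(queue))  # first ten queued events
    print(len(queue))          # events still waiting
```

Events that do not yet fill a complete batch simply stay in the queue until the next periodic check.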
The Composite Event Encode Worker is a thread pool which handles event encryption and composite event creation. First, it joins the nonoperational fields of the plain events in the list using the pre-defined separator. Then, it converts the operational fields into binary form and combines them together. Next, it pads the operational fields with zeros so that they can be encrypted using the HElib API. Finally, it encrypts those operational fields and puts the newly created composite event into a queue which is processed by the ‘Encrypted Events Publisher’ thread (see Fig. 5b).
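The encoding steps above (join, convert to binary, pad) can be sketched as follows. The sketch stops before encryption, which in HomoESM is done through HElib and is not reproduced here. The separator `"|"`, the per-field width `FIELD_BITS`, the `slots` parameter, and the function names are all assumptions made for the example.

```python
SEPARATOR = "|"  # hypothetical separator; the paper only says "pre-defined"
FIELD_BITS = 8   # hypothetical fixed width of one operational field


def to_bits(value, width=FIELD_BITS):
    """Encode a non-negative integer as `width` bits, most significant first."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"{value} does not fit into {width} bits")
    return [(value >> shift) & 1 for shift in range(width - 1, -1, -1)]


def build_composite(events, nonop_keys, op_keys, slots):
    """Combine a list of plain events (dicts) into one composite event.

    Nonoperational fields are joined with SEPARATOR. Operational fields are
    converted to bits, concatenated, and zero-padded to `slots` entries.
    The padded bit vectors are what would be handed to the encryption step.
    """
    composite = {}
    for key in nonop_keys:
        composite[key] = SEPARATOR.join(str(event[key]) for event in events)
    for key in op_keys:
        bits = [bit for event in events for bit in to_bits(event[key])]
        if len(bits) > slots:
            raise ValueError("operational field does not fit into the slots")
        composite[key] = bits + [0] * (slots - len(bits))
    return composite
```

A real implementation would also have to ensure that the separator cannot occur inside a nonoperational field value, which this sketch does not check.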
Events are fired into the public VM asynchronously. The number of events sent to the public Siddhi server is determined by the percentage configured initially. However, the public Siddhi server’s publishing flow has a maximum limit of 1500 TPS (tuples per second). If the Event Publisher receives more than this maximum TPS, the excess events are routed back to the private Siddhi server’s VM.
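The routing rule can be sketched as follows. This is a minimal illustration under stated assumptions: the class `Router`, the random selection by a configured fraction, and the one-second counting window used to enforce the cap are hypothetical choices, since the text does not describe how the limit is measured.

```python
import random

MAX_PUBLIC_TPS = 1500  # limit of the public publishing flow stated in the text


class Router:
    """Send a configured fraction of events to the public server, but never
    more than max_public_tps events within one (assumed) one-second window."""

    def __init__(self, public_fraction, max_public_tps=MAX_PUBLIC_TPS, rng=None):
        self.public_fraction = public_fraction
        self.max_public_tps = max_public_tps
        self.rng = rng or random.Random()
        self.window = None
        self.sent_public = 0

    def route(self, now_seconds):
        """Return "public" or "private" for an event arriving at now_seconds."""
        window = int(now_seconds)
        if window != self.window:
            # A new one-second window starts: reset the public counter.
            self.window = window
            self.sent_public = 0
        wants_public = self.rng.random() < self.public_fraction
        if wants_public and self.sent_public < self.max_public_tps:
            self.sent_public += 1
            return "public"
        # Either not selected for the public server or the cap is reached.
        return "private"
```

With `public_fraction=1.0` and 2000 events arriving within the same second, the first 1500 go to the public server and the remaining 500 fall back to the private one.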
The ‘Encrypted Events Publisher’ thread periodically checks the encrypted queue for encrypted events, which are placed there by the ‘Composite Event Encode Worker’ at the end of the composite event creation and encryption process (see Fig. 4b). If there are encrypted events, it picks all of them at once and sends them to the public Siddhi server. In summary, the encryptor module batches events into composite events and encrypts each composite message using homomorphic encryption. The encrypted events are sent to the public cloud, where the Homomorphic CEP Engine module conducts the evaluation.
In order to perform HE operations on the operational fields of a composite event, each HE function first encrypts its operand(s) and builds the corresponding composite operand field(s). For example, in the case of the E-mail Filter benchmark, the Homomorphic CEP Engine, which supports homomorphic evaluations, first converts the constant operand into an integer (int) buffer of size 40 with the necessary zero padding. Then, it replicates the integer buffer ten times and encrypts the result using HElib [14]. Finally, the encrypted value and the relevant field in the composite event are passed to the relevant HElib operation (e.g., comparison, addition, subtraction, or multiplication), which is evaluated homomorphically. The result replaces the relevant field in the composite event, and the composite event is sent to the receiver without any decryption.
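The preparation of the constant operand can be sketched as follows, up to the point where the buffer would be encrypted. The sizes 40 and 10 come from the text. Representing each character by its code point is an assumption of this sketch, as are the function names.

```python
BUFFER_SIZE = 40   # size of the integer buffer, as stated in the text
REPLICATIONS = 10  # the buffer is replicated ten times, as stated in the text


def operand_buffer(operand, size=BUFFER_SIZE):
    """Convert a string operand into `size` integers, zero-padded on the right.
    Using character code points is an assumption made for illustration."""
    codes = [ord(ch) for ch in operand]
    if len(codes) > size:
        raise ValueError("operand is longer than the buffer")
    return codes + [0] * (size - len(codes))


def replicated_operand(operand, size=BUFFER_SIZE, copies=REPLICATIONS):
    """Repeat the padded buffer once per event of the composite event, so that
    it lines up with the corresponding composite field."""
    return operand_buffer(operand, size) * copies
```

For the address lynn.blair@enron.com this yields 400 integers: ten identical blocks of 40, each holding the character codes followed by zeros.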
The received encrypted information is decrypted and decomposed to extract the relevant plain events. The latency measurement happens at the end of this flow. The ‘Event Receiver’ thread checks whether the event received from the Siddhi server is encrypted with homomorphic encryption. If so, it delegates the composite event to the ‘Composite Event Decode Worker.’ If not, it reads the payload data and calculates the latency (see Fig. 6a).
After receiving a composite event from the Event Receiver, the Composite Event Decode Worker handles all decomposition and decryption of the composite event (see Fig. 6b). It first splits the nonoperational fields of the composite event by the pre-defined separator. Second, it decrypts the operational fields using the HElib API and splits the decrypted fields into fixed-length strings. Then, it creates plain events using the split fields. Next, it checks each operational field in the plain event to see whether it contains zeros and then processes the events. Finally, it calculates the latency of the decoded events.
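The decomposition steps can be sketched as follows, starting from operational fields that have already been decrypted (the HElib decryption itself is not shown). The separator, the use of character codes, and the reading of the zero check as removal of trailing zero padding are assumptions of this sketch, and all names are hypothetical.

```python
def split_fixed(values, width):
    """Split a flat list into consecutive chunks of exactly `width` items."""
    if len(values) % width:
        raise ValueError("length is not a multiple of the field width")
    return [values[i:i + width] for i in range(0, len(values), width)]


def strip_padding(chunk):
    """Drop trailing zeros, which this sketch treats as padding."""
    end = len(chunk)
    while end > 0 and chunk[end - 1] == 0:
        end -= 1
    return chunk[:end]


def decode_composite(nonop, decrypted_op, width, separator="|"):
    """Rebuild plain events from a composite event.

    nonop maps a field name to its separator-joined string. decrypted_op maps
    a field name to a flat list of integers, `width` per event.
    """
    columns = {key: joined.split(separator) for key, joined in nonop.items()}
    for key, values in decrypted_op.items():
        columns[key] = [
            "".join(chr(code) for code in strip_padding(chunk))
            for chunk in split_fixed(values, width)
        ]
    if not columns:
        return []
    count = len(next(iter(columns.values())))
    return [{key: col[i] for key, col in columns.items()} for i in range(count)]
```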
Note that we implement the homomorphic comparison of values following the work by Togan et al. [30]. We used Togan et al.’s methodology for the homomorphic comparison operation because it provides an \(O(\log _2(n))\) solution for evaluating the comparison. Furthermore, according to the authors, their approach provides good results compared to previous approaches [1]. For two single-bit numbers x and y, Togan et al. [30] have shown that the following equations (see Eq. 3) express the greater-than and equality operations, respectively.
$$\begin{aligned} \begin{aligned} x&> y \Leftrightarrow xy + x = 1 \\ x&= y \Leftrightarrow x + y + 1 = 1 \end{aligned} \end{aligned}$$
(3)
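Equation 3 can be checked exhaustively by reading the product as AND and the sum as XOR, i.e., arithmetic modulo 2. The following sketch works on plain bits and involves no encryption; the function names are ours.

```python
from itertools import product


def gt1(x, y):
    """x > y for single bits: x*y + x over GF(2)."""
    return (x & y) ^ x


def eq1(x, y):
    """x == y for single bits: x + y + 1 over GF(2)."""
    return x ^ y ^ 1


def truth_table():
    """Return (x, y, gt1, eq1) for all four combinations of input bits."""
    return [(x, y, gt1(x, y), eq1(x, y)) for x, y in product((0, 1), repeat=2)]


if __name__ == "__main__":
    for row in truth_table():
        print(row)
```

The greater-than expression evaluates to 1 only for x = 1 and y = 0, and the equality expression evaluates to 1 exactly when both bits agree.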
Togan et al. have created comparison functions for n-bit numbers using a divide-and-conquer methodology. In our case, we derived the two-bit number comparisons as follows, where \(x_1x_0\) and \(y_1y_0\) are the two two-bit numbers (see Eq. 4). Here, every ‘+’ operator denotes an XOR gate operation and every ‘\(\cdot\)’ operator denotes an AND gate operation.
$$\begin{aligned} \begin{aligned}&x_1x_0> y_1y_0 \Leftrightarrow (x_1> y_1) + (x_1 = y_1)\cdot (x_0> y_0) = 1 \\&\Leftrightarrow (x_1\cdot y_1 + x_1) + (x_1 + y_1 + 1)\cdot (x_0\cdot y_0 + x_0) = 1\\&\Leftrightarrow x_1\cdot y_1 + x_1 + x_1\cdot x_0\cdot y_0 + x_1\cdot x_0 +\\&y_1\cdot x_0\cdot y_0 + y_1\cdot x_0 + x_0\cdot y_0 + x_0 = 1 \\&x_1x_0 == y_1y_0 \Leftrightarrow (x_0 + y_0 + 1)\cdot (x_1 + y_1 + 1) = 1\\&\Leftrightarrow x_0\cdot x_1 + x_0\cdot y_1 + x_0 + y_0\cdot x_1 + y_0\cdot y_1 + y_0 + x_1 + y_1 + 1 = 1 \\&x_1x_0 < y_1y_0 \Leftrightarrow (x_1x_0 > y_1y_0) + (x_1x_0 == y_1y_0) + 1 = 1\\&\Leftrightarrow (x_1\cdot y_1 + x_1 + x_1\cdot x_0\cdot y_0 + x_1\cdot x_0 + y_1\cdot x_0\cdot y_0+\\&y_1\cdot x_0 + x_0\cdot y_0 + x_0) + (x_0\cdot x_1 + x_0\cdot y_1 +\\&x_0 + y_0\cdot x_1 + y_0\cdot y_1 + y_0 + x_1 + y_1 + 1) + 1 = 1 \end{aligned} \end{aligned}$$
(4)
The reason we build comparison functions for two-bit numbers is to demonstrate how homomorphic encryption and evaluation can be applied within the CEP engine. Even for two-bit number comparisons, a considerable number of XOR and AND gate evaluations are required, as shown above.
The individual HE operations are evaluated at the public SP, while the filtering based on the results of those operations happens at the private SP. Boolean conditions are evaluated on encrypted operands using HE, subject to the above limitations on the input number range. The ‘NOT,’ ‘AND,’ and ‘OR’ gate operations are evaluated at the private SP after decrypting and decoding the events that arrive from the public SP following the HE evaluations.
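The two-bit comparison functions can be verified on plain bits with a short exhaustive check. The sketch uses the factored forms of Eq. 4, with XOR for ‘+’ and AND for ‘\(\cdot\)’. It involves no encryption, and the function names are ours.

```python
from itertools import product


def gt1(x, y):
    """Single-bit x > y: x*y + x over GF(2)."""
    return (x & y) ^ x


def eq1(x, y):
    """Single-bit x == y: x + y + 1 over GF(2)."""
    return x ^ y ^ 1


def gt2(x1, x0, y1, y0):
    """Two-bit x1x0 > y1y0: (x1 > y1) + (x1 == y1)*(x0 > y0)."""
    return gt1(x1, y1) ^ (eq1(x1, y1) & gt1(x0, y0))


def eq2(x1, x0, y1, y0):
    """Two-bit equality: both bit positions must agree."""
    return eq1(x0, y0) & eq1(x1, y1)


def lt2(x1, x0, y1, y0):
    """Two-bit x1x0 < y1y0: neither greater than nor equal."""
    return gt2(x1, x0, y1, y0) ^ eq2(x1, x0, y1, y0) ^ 1


def check_all():
    """Compare the gate formulas with integer comparison for all 16 inputs."""
    for x1, x0, y1, y0 in product((0, 1), repeat=4):
        x, y = 2 * x1 + x0, 2 * y1 + y0
        if gt2(x1, x0, y1, y0) != int(x > y):
            return False
        if eq2(x1, x0, y1, y0) != int(x == y):
            return False
        if lt2(x1, x0, y1, y0) != int(x < y):
            return False
    return True
```

Counting the operators gives a feel for the cost: `gt2` alone needs three AND gates and three XOR gates, and each of these becomes a homomorphic operation when the bits are encrypted.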
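The division of work can be illustrated with a small sketch of the private-side step. It assumes that the public SP has returned, for each event, one result bit per individual comparison, and that these bits have already been decrypted. The function name and the choice of an OR condition are hypothetical.

```python
def select_events(events, match_a, match_b):
    """Keep the events for which at least one of two comparison bits is 1.

    match_a and match_b hold one decrypted result bit per event. The OR gate
    is applied here in plaintext, i.e., on the private side.
    """
    if not (len(events) == len(match_a) == len(match_b)):
        raise ValueError("one result bit per event is required")
    return [
        event
        for event, a, b in zip(events, match_a, match_b)
        if a or b
    ]
```

Applying such gates after decryption avoids evaluating them homomorphically, at the price of sending all events of the composite event back to the private SP.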
We have evaluated the HomoESM’s functionality using five benchmark applications developed using two datasets. Next, in order to ensure the completeness of this section, we describe the implementation details of these benchmarks.
E-mail Filter Benchmark
E-mail Filter is a benchmark we developed based on the canonical Enron e-mail dataset [21]. The dataset has 517,417 e-mails with an average body size of 1.8 KB, the largest being 1.92 MB. The E-mail Filter benchmark contains only a filter operation and was used to compare filtering performance against the EDGAR Filter benchmark, which is described in the next subsection. The architecture of the E-mail Filter benchmark is shown in Fig. 7. The events in the input e-mail stream had eight fields (iij_timestamp, fromAddress, toAddresses, ccAddresses, bccAddresses, subject, body, and regexstr), all of which were strings except iij_timestamp, which was of type long. We formatted the toAddresses and ccAddresses fields to contain only a single e-mail address each in order to support HElib evaluations. The filtering criterion was based on the e-mail addresses lynn.blair@enron.com and richard.hanagriff@enron.com. The filtering SiddhiQL statement is given in Listing 2.
EDGAR Filter Benchmark
We developed another benchmark based on an HTTP log dataset published by the Division of Economic and Risk Analysis (DERA) [11]. The data provide details of the usage of publicly accessible EDGAR company filings in a simple but extensive manner [11]. Each record in the dataset consists of 16 different fields; hence, each event sent to the benchmark had 16 fields (iij_timestamp, ip, date, time, zone, cik, accession, extension, code, size, idx, norefer, noagent, find, crawler, and browser). Similar to the E-mail Filter benchmark, all of the fields except iij_timestamp were strings. Of these fields, we extended the noagent field by appending a string of 1024 characters to the existing value in order to increase the event size. (Note that we have done the same for all the EDGAR benchmarks described in this paper.) The architecture of the EDGAR Filter benchmark is shown in Fig. 8.
The EDGAR Filter benchmark was developed with the aim of implementing filtering support. The basic criterion was to filter out the EDGAR logs that satisfy the conditions shown in Listing 3.
Most of the EDGAR log events were the same, and the logs did not inherently have any data rate variation. Therefore, we introduced a varying data rate by publishing events at different TPS values according to a custom-defined function.
EDGAR Comparison Benchmark
Using the same EDGAR dataset, we developed the EDGAR Comparison benchmark to evaluate the performance [10] of the homomorphic comparison operation. In the EDGAR Comparison benchmark, we changed the input format of the zone and find fields to integer (Int) in order to perform comparison operations. Since we perform only bitwise operations, we limited the HElib message space to 2 so that only 0s and 1s are used. With a message space of 2, the maximum length of an encrypted field was 168; therefore, we used a composite event size of 168 when sending events to the public Siddhi server. The architecture of the EDGAR Comparison benchmark is similar to the topology shown in Fig. 7. The basic criterion is to filter out the EDGAR logs that satisfy the following conditions (see Listing 4).
EDGAR Add/Subtract Benchmark
In the EDGAR Add/Subtract benchmark, we changed the input format of the code, idx, norefer, and find fields to Integer in order to support add/subtract operations. The corresponding Siddhi query, which depicts the addition and subtraction operations conducted by this benchmark, is shown in Listing 5.
The architecture of the EDGAR Add/Subtract benchmark is shown in Fig. 9. Note that the EDGAR Multiply benchmark has a similar architecture, although its Q2 and Q5 operators conduct multiply operations instead.
EDGAR Multiply Benchmark
In the EDGAR Multiply benchmark, we changed the input format of the ‘code’ and ‘idx’ fields to Integer. As in the EDGAR Filter benchmark, we append a string of 1024 characters to the existing value of the ‘noagent’ field in order to increase the event size. We multiply the code field by 2 and the idx field by 3. The corresponding Siddhi query, which depicts the multiplication operations performed by this benchmark, is shown in Listing 6. The architecture of the EDGAR Multiply benchmark is shown in Fig. 10.