International Journal of Information Security

, Volume 14, Issue 6, pp 531–548 | Cite as

Secure floating point arithmetic and private satellite collision analysis

Open Access
Regular Contribution

Abstract

In this paper, we show that it is possible and, indeed, feasible to use secure multiparty computation (SMC) for calculating the probability of a collision between two satellites. For this purpose, we first describe basic floating point arithmetic operators (addition and multiplication) for multiparty computations. The operators are implemented on the \({\textsc {Sharemind}}\) SMC engine. We discuss the implementation details, provide methods for evaluating example elementary functions (inverse, square root, exponentiation of \(e\), error function). Using these primitives, we implement a satellite conjunction analysis algorithm and give benchmark results for the primitives as well as the conjunction analysis itself.

Keywords

Secure multi-party computation Secure floating point arithmetic  Satellite collision analysis Benchmark results 

1 Introduction

The Earth is orbited by nearly 7,000 spacecraft,1 orbital debris larger than 10 cm are routinely tracked and their number exceeds 21,000.2 It is understandable that countries do not want to reveal orbital information about their more strategic satellites. On the other hand, satellites are a big investment, and it is in every satellite owner’s interest to keep their property intact. Currently, approximate information about the whereabouts and orbits of satellites is available. These data can be analyzed to predict collisions and hopefully react to the more critical results.

However, in 2009, two communications satellites belonging to the US and Russia collided in orbit [23]. Based on the publicly available orbital information, the satellites were not supposed to come closer than half a kilometer to each other at the of the collision. As there are many orbital objects and satellite operators, an all-to-all collaboration between parties is infeasible. However, public data are precise enough to perform a prefilter and exclude all pairs of objects that cannot collide at least within a given period of time (e.g., the next 24 h). Once the satellite pairs with a sufficiently high collision risk have been found, the satellite operators should exchange more detailed information and determine if a collision is imminent and decide if the trajectory of either object should be modified. This is similar to today’s process, where the Space Data Center performs collision predictions on 300 satellite pairs twice a day [16].

We show that secure multiparty computation (SMC) can be used as a possible solution for this problem. We propose a secure method that several satellite operators can jointly use for determining if the orbital objects operated by them will collide with each other. SMC allows satellite operators to secret share data about their satellites so that no party can see the individual values, but collision analysis can still be conducted. The solution provides cryptographic guarantees for the confidentiality of the input trajectories.

To be able to implement the collision analysis algorithm in a privacy-preserving way, we first provide floating point arithmetic for general multiparty computations and present an implementation of floating point operations in an SMC setting based on secret sharing. We build the basic arithmetic primitives (addition and multiplication), develop example elementary functions (inverse, square root, exponentiation of \(e\), error function), and present benchmarks for the implementations. As a concrete instantiation of the underlying SMC engine, we have chosen Sharemind [7], as it provides both excellent performance and convenient development tools [17]. However, all the algorithms developed are applicable for general SMC platforms in which the necessary technical routines are implemented. Using these available resources, we finally implemented one possible collision probability computation algorithm, and we give the benchmark results for this implementation.

2 Preliminaries

2.1 Secure multiparty computation

In this subsection, we give an overview of SMC.

Secure multiparty computation (SMC) with \(n\) parties \(P_1,\ldots ,P_n\) is defined as the computation of a function \(f(x_1,\ldots ,x_n) = (y_1,\ldots ,y_n)\) so that the result is correct and the input values of all the parties are kept private. Every party \(P_i\) will provide an input value \(x_i\) and learn only the result value \(y_i\). For some functions \(f\) and some input values, it might be possible to deduce other parties’ inputs from the result, but in the case of data aggregation algorithms, it is generally not possible to learn the inputs of other parties from the result.

In this paper, we concentrate on SMC methods based on secret sharing—also called share computing techniques. Share computing uses secret sharing for the storage of data.

Definition 1

Let \(s\) be the secret value. An algorithm \(\mathbf {S}\) defines a \(k\)-out-of-\(n\)threshold secret sharing scheme, if it computes \(\mathbf {S}(s) = (s_1, \dots , s_n)\), and the following conditions hold [27]:
  1. 1.

    Correctness:\(s\) is uniquely determined by any \(k\) shares from \(\{s_1, \dots , s_n\}\), and there exists an algorithm \(\mathbf {S^\prime }\) that efficiently computes \(s\) from these \(k\) shares.

     
  2. 2.

    Privacy: Having access to any \(k - 1\) shares from \(\{s_1, \dots , s_n\}\) gives no information about the value of \(s\), i.e., the probability distribution of \(k - 1\) shares is independent of \(s\).

     

We use \([\![s]\!]\) to denote the shared value, i.e., the tuple \((s_1,\ldots ,s_n)\).

The data storage process with three SMC parties is illustrated in Fig. 1. Data are collected from data donors and sent to the three SMC parties called miners. A data donor distributes the data into \(n\) shares using secret sharing and sends one share of each value to a single miner. This separation of nodes into input nodes (donors) and SMC nodes (miners) is useful, as it does not force every party in the information system to run SMC protocols. This reduces the complexity of participating in the computation and makes the technology more accessible.
Fig. 1

The input nodes connect to all SMC nodes and store their data using secret sharing

After the data have been stored, the SMC nodes can perform computations on the shared data. Notice that none of the SMC nodes can reconstruct the input values thanks to the properties of secret sharing. We need to preserve this security guarantee during computations. This is achieved by using SMC protocols that specify which messages the SMC nodes should exchange in order to compute new shares of a value that corresponds to the result of an operation with the input data.

Figure 2 shows the overall structure of SMC. Assume, that the SMC nodes have shares of input values \(u\) and \(v\) and want to compute \(w = u \odot v\) for some operation \(\odot \). They run the SMC protocol for operation \(\odot \) which gives each SMC node one share of the result value \(w\). Note that the protocols do not leak information about the input values \(u\) and \(v\). For details and examples on how this can be achieved, see the classical works on SMC [5, 9, 13, 30].
Fig. 2

The SMC nodes can run SMC protocols to compute useful results from secret-shared data

After computations have been finished, the results are published to the client of the computation. The SMC nodes send the shares of the result values to the client node that reconstructs the real result from the shares, see Fig. 3. Note that it is important not to publish the results when they could still leak information about the inputs. The algorithms running in the SMC nodes must be audited to ensure that they do not publish private inputs. However, note that even if some nodes publish their result shares too early, they cannot leak data unless their number exceeds the threshold \(k\).
Fig. 3

Results are published after the computation is complete

Several SMC frameworks have implementations [14, 21, 28, 29]. While the majority of implementations are research oriented, it is expected that more practical systems will be created as the technology matures.

2.2 The Sharemind platform

\({\textsc {Sharemind}}\) [7] is an implementation of privacy-preserving computation technology based on SMC. In the most recent version of \({\textsc {Sharemind}}\), the users can choose which underlying SMC method suits them best. For each available method, some basic operations (e.g., addition, multiplication) have been implemented. More complex operations (e.g., division) often rely on these basic operations. If the basic operations are universally composable, it is possible to use them to implement the complex operations without additional effort. Knowing this, we can design protocols for these complex operations without having a specific SMC method in mind. Hence, the operations can be used with any kind of underlying method as long as it supports the basic operations that we need for the protocol. Some complex protocols, on the other hand, require us to know what the underlying schemas are and must, therefore, be redesigned for each underlying SMC method.

The default protocol suite in \({\textsc {Sharemind}}\) relies on the three-party additive secret sharing scheme working on \(n\)-bit integers, \(n \in \{8, 16, 32, 64\}\), i.e., each value \(x\) is split into three shares \(x_1,x_2,x_3\) so that \(x_1+x_2+x_3=x\,\hbox {mod}\, 2^{n}\) and \([\![x]\!]=(x_1,x_2,x_3)\). In this article, this is the underlying scheme we use for testing the protocols and for benchmarking results. This is also the scheme for which we have implemented targeted optimizations that are dependent on the format in which the data are stored.

The default protocol suite of \({\textsc {Sharemind}}\) has formal security proofs in the passive security model allowing one corrupt miner. This ensures that the security of inputs is guaranteed as long as the miner servers do not deviate from the protocols and do not disclose the contents of their private database. The protocols are also proven to be universally composable, meaning that we can run them sequentially or in parallel without losing the privacy guarantees. For more details on the security model of \({\textsc {Sharemind}}\), please refer to the PhD thesis [6]. Note that since then, the primitive protocols of \({\textsc {Sharemind}}\) have been updated to be more efficient [8].

3 Collaborative satellite collision probability analysis

3.1 Motivation and state of the art

The first confirmed and well-studied collision between a satellite and another orbital object was the collision of the French reconnaissance satellite Cerise with a fragment of an Ariane rocket body in 1996 [25].

The first accidental collision between two intact satellites happened on February 10th, 2009 when the US communications satellite Iridium 33 collided with a decommissioned Russian communications satellite Kosmos 2251 [23]. The two orbital planes intersected at a nearly right angle, resulting in a collision velocity of more than 11 km/s. Not only did the US lose their working satellite, the collision left two distinct debris clouds floating in Earth’s orbit. Even though the number of tracked debris is already high, an even larger amount is expected in case of a body-to-body hit. Preliminary assessments indicate that the orbital lifetimes of many of the debris are likely to be tens of years, and as the debris gradually form a shell about the Earth, further collisions with new satellites may happen.

This event was the fourth known accidental collision between two cataloged objects and produced more than 1,000 pieces of debris [18]. The previous collisions were between a spacecraft and a smaller piece of debris and did not produce more than four pieces of new debris each.

The US Space Surveillance Network (part of the US Strategic Command) is maintaining a catalog of orbital objects, like satellites and space debris, since the launch of early satellites like the Sputnik. Some of the information is available on the Space-Track.org3 web page for registered users. However, publicly available orbital data are often imprecise. For example, an analysis based on the publicly available orbital data determined that the Iridium 33 and Kosmos 2251 would come no closer to each other than half a kilometer at the time of the collision. This was not among the top predicted close approaches for that report nor was it the top predicted close approach for any of the Iridium satellites for the coming week [18].

This, however, was not a fault in the analysis software but rather the quality of the data. As a result of how the orbital data are generated, they are of low fidelity, and are, therefore, of limited value during analysis. If more precise data were acquired directly from satellite operators and governments, and the collision analysis process were agreed upon, these kinds of collisions could be avoided [19].

3.2 Underlying math

Algorithm 1 describes how to compute the probability of collision between two spherical objects and is based on [2, 3, 16]. As input, the satellite operators give the velocity vectors \(\mathbf{v} \in \mathbb {R}^{3}\), covariance matrices \({\mathbf {C}} \in \mathbb {R}^{3 \times 3}\), position vectors \(\mathbf{p} \in \mathbb {R}^{3}\) and radii \(R \in \mathbb {R}\) of two space objects. First, on line 2, we define an orthogonal coordinate system in terms of the velocity vectors and their difference. The resulting \(\mathbf{j}\)-\(\mathbf{k}\) plane is called the conjunction plane. We consider space objects as spheres but as all the uncertainty in the problem is restricted to the conjunction plane, we can project this sphere onto the conjunction plane (line 4) and treat the problem as 2-dimensional [2].

Next, we diagonalize the matrix \(C\) to choose a basis for the conjunction plane where the basis vectors are aligned with the principal axes of the ellipse (lines 5–9). We go on to project the center of the hardbody (\(\mathbf{p}_\mathbf{b}-\mathbf{p}_\mathbf{a}\)) onto the encounter plane and then put it in the eigenvector basis (line 10) to gain the coordinates \((x_m, y_m)\). Finally, the probability of collision is computed on line 12 [3].

3.2.1 Vector and matrix operations

To implement the analysis given in Algorithm 1, we need to compute the unit vector, dot product and the cross product for vectors of length \(3\). Furthermore, we need matrix multiplication for two-dimensional matrices and, also, the computation of the eigensystem for \(2 \times 2\) symmetric matrices. We only need to consider the eigensystem calculation for symmetric \(2 \times 2\) matrices, so specializing the general method [22] for \(2\) dimensions is sufficient for our needs. These operations require multiplication, addition, division and the computation of square root.

3.2.2 Computing integrals

In addition to common vector and matrix operations, Algorithm 1 also contains the two-dimensional probability equation for the combined spherical object (line 12)
$$\begin{aligned} p&\leftarrow \frac{1}{2 \pi \sigma _{x} \sigma _{y}} \int _{-R}^{R}\int _{-\sqrt{R^{2} - x^{2}}}^{\sqrt{R^{2} - x^{2}}}\\&\text {exp}\left[ \frac{-1}{2} \left[ \left( \frac{x-x_m}{\sigma _x}\right) ^{2} + \left( \frac{y-y_m}{\sigma _y}\right) ^{2} \right] \right] dydx, \end{aligned}$$
where \(R\) is the combined radius of the bodies, \((x_m, y_m)\) is the center of the hardbody in the encounter plane, and \(\sigma _x\) and \(\sigma _y\) are the lengths of the semi-principle axes. Similarly to the article [3], we use Simpson’s one-third rule for numerical integration of this double integral. As a result, we get the following equation:
$$\begin{aligned} p \approx \frac{\varDelta x}{3 \sqrt{8 \pi } \sigma _{x}}&\left[ f(0) + f(R) + \sum ^{n}_{i = 1} 4f((2i - 1)\varDelta x)\right. \\&\quad \left. + \sum ^{n-1}_{i = 1} 2f(2i\varDelta x) \right] , \end{aligned}$$
where \(\varDelta x = \frac{R}{2n}\) and the integrand is
$$\begin{aligned} f(x)&= \!\left[ \text {erf}\left( \frac{y_m \!+\! \sqrt{R^{2} \!-\! x^{2}}}{\sqrt{2} \sigma _y} \right) + \text {erf}\left( \frac{\!-\!y_m \!+\! \sqrt{R^{2} - x^{2}}}{\sqrt{2} \sigma _y} \right) \right] \\&\times \left[ \text {exp}\left( \frac{-(x+x_m)^{2}}{2\sigma ^{2}_x} \right) + \text {exp}\left( \frac{-(-x+x_m)^{2}}{2\sigma ^{2}_x} \right) \right] . \end{aligned}$$
For further details, see [3]. To compute the collision probability, we require multiplication, addition, division, square root, exponentiation of \(e\) and the computation of the error function.

3.3 Privacy-preserving solution using secure multiparty computation

The locations and orbits of communications satellites are sensitive information and governments or private companies might not be willing to reveal these data. One possible solution is to use a trusted third party who will gather all the data and perform the analysis. This, however, still requires the satellite owners to reveal their information to an outside party. While this could be done within one country, it is doubtful that different governments are willing to disclose the orbits of their satellites to a single third party that is inevitably connected with at least one country.

We propose using SMC instead of a trusted third party. Figure 4 describes the general idea of our solution. Firstly, the satellite operators choose three independent parties as the data hosts. These organizations will either host the secure computation system themselves or outsource it to a secure hosting provider. The hosts must be chosen such that the probability of collision between them is negligible. Next, the operators secret share their data and upload the shares to the three hosts. Collision analysis is performed in collaboration between the tree hosting parties, and the satellite operators can query the results of the analysis.
Fig. 4

Privacy-preserving collision analysis using secure multiparty computation

We need the entire trajectory of the satellite to compute the collision probability. So, for example, sharing only the difference between the publicly available trajectory and the precise one does not reduce the amount of work we have to perform, in fact it is likely to increase it as we would need to combine the imprecise public data with the precise secure difference and only then start the computation.

4 Secure floating point operations

4.1 Related work

Integer data domains are not well suited for scientific and engineering computations, where rational and real numbers allow achieving a much greater precision. Catrina et al. [10, 11, 12] took the first steps in adapting real number arithmetic to secure multi-party computation by developing secure fixed point arithmetic and applying it to linear programming. In 2011, Franz and Katzenbeisser [15] proposed a solution for the implementation of floating point arithmetic for secure signal processing. However, their approach relies on two-party computations working over garbled circuits, and they do not provide the actual implementation nor any benchmarking results. In 2012, Aliasgari et al. [4] designed and evaluated floating point computation techniques and protocols for square root, logarithm and exponentiation in a standard linear secret sharing framework.

4.2 Representation of floating point numbers

The most widely used standard for floating point arithmetic is IEEE 754 [1]. To reach our goal of implementing the arithmetic, we will also make use of Donald Knuth’s more generic presentation given in Sect. 4.2 of the book [20] and ideas used in FPGA implementations [24].

The majority of the available approaches split the representation of a floating point number \(N\) into several parts.
  • Radix (base) \(b\) (usually \(2\) or \(10\)).

  • Sign\(s\) (e.g., having the value \(0\) or \(1\) corresponding to signs plus and minus, respectively).

  • Significand\(f\) representing the significant digits of \(N\). This number is assumed to belong to a fixed interval like \([1,2)\) or \([1/b,1)\).

  • Exponent\(e\) showing the number of places the radix point of the significand needs to be shifted in order to produce the number \(N\).

  • Bias (excess) \(q\) is a fixed number used to make the representation of \(e\) nonnegative; hence, one would actually use \(E\) such that \(e=E-q\) for storing the exponent. For example, for IEEE 754 double-precision arithmetic, it is defined that \(q=1{,}023\), \(e\in [-1{,}022,1{,}023]\) and \(E\in [1,2{,}046]\). The value \(E=0\) is reserved for representing \(N=0\) and some other very small values. The value \(E=2{,}047\) is reserved for special quantities like infinity and Not a Number (NaN) occurring as a result of illegal operations (e.g., \(0/0\) or \(\sqrt{-1}\)).

Using these notations, we have a generic representation for floating point numbers
$$\begin{aligned} N=(-1)^s\cdot f \cdot b^{E-q}. \end{aligned}$$
The IEEE 754 standard is clearly targeted toward explicitly available data, where the computing entity has access to all the bits of the representation. This allows for several optimizations, e.g., putting the sign, significand and exponent together into one machine word, or leaving the leading digit of the significand out of the representation, as it can be assumed to be \(1\) and, hence, carries no information. Before real computations on such numbers can start, a special procedure called unpacking is required, accompanied by its counterpart packing to restore the representation after the computation is over.

In case of a secret-shared representation; however, access to the bits is nontrivial and involves a lot of computation for bit extraction [7, 8]. An alternative would be storing the values shared bitwise, but this would render the underlying processor arithmetic unusable, and we would need to reimplement all the basic arithmetic operations on these shares. At this point, we do not rule this possibility out completely, but this approach needs good justification before actual implementation.

In order to save time on bit extraction in the course of packing and unpacking, we will store the crucial parts of the floating point numbers (sign, significand and exponent) in separately shared values \(s\), \(e\) and \(f\) that have \(8\), \(16\) and \(32\) bits, respectively. We also allow double-precision floating point values, where the sign and exponent are the same as for \(32\)-bit floating point values, but the significand is stored using \(64\) bits.

The (unsigned) significand \(f\) will represent the real number \(\overline{f}=f/2^{32}\). In other words, the highest bit of \(f\) corresponds to \(1/2\), the second one to \(1/4\), etc. In order to obtain real floating point behavior, we require that the representation is normalized, i.e., that the highest bit of \(f\) is always \(1\) (unless the number to be represented is zero, in which case \(f=0\), \(s\) corresponds to the sign of the value, and the biased exponent \(e = 0\)). Hence, for all nonzero floats, we can assume that \(f\in [1/2,1)\). With our single-precision floating point values, we achieve \(32\)-bit significand precision, which is somewhere in between IEEE single and double precision. The double-precision floating point values achieve \(64\)-bit significand precision, which is more than the IEEE double precision.

The sign \(s\) can have two values: \(1\) and \(0\) denoting plus and minus, respectively. Note that we are using a convention opposite to the standard one to implement specific optimizations. As we are using a \(16\)-bit shared variable \(e\) to store the exponent, we do not need to limit the exponent to \(11\) bits like in the standard representation. Instead we will use \(15\) bits (we do not use all \(16\) because we want to keep the highest bit free), and we introduce a bias of \(2^{14} - 1\).

In a typical implementation of floating point arithmetic, care is taken to deal with exponent under- and overflows. In general, all kinds of exceptional cases (including under- and overflows, division by zero, \(\sqrt{-1}\)) are very problematic to handle with secret-shared computations, as issuing any kind of error message as a part of control flow leaks information (e.g., the division by zero error reveals the divisor).

We argue, that in many practical cases, it is possible to analyze the format of the data, based on knowledge about their contents. This allows the analysts to know that the data fall within certain limits and construct their algorithms accordingly. This makes the control flow of protocols independent of the inputs and preserves privacy in all possible cases. For example, consider the scenario, where the data are gathered as \(32\)-bit integers and then the values are cast to floating point numbers. Thus, the exponent cannot exceed \(32\), which means that we can multiply \(2^{9}\) of such values without overflowing the maximal possible exponent \(2^{14}\). This approach to exceptions is basically the same as Strategy 3 used in the article [15].

4.3 Multiplication

The simplest protocol for floating point arithmetic is multiplication, as the underlying representation of floating point numbers is multiplicative. The routine for multiplying two secret-shared floating point numbers is presented in Algorithm 2. In the following, we will use the notation \([\![N]\!]\) to denote the triple \(([\![s]\!],[\![f]\!],[\![e]\!])\). Throughout this and several following protocols, we will make use of primitives described in [8]. Recall, that we do not check for any exponent under- or overflows during multiplication.
The general idea of the algorithm is natural. The sign of the product is determined as an equality bit of the arguments (line 1) and can be computed as
$$\begin{aligned} {\mathsf {Eq}}{([\![s]\!],[\![s']\!])}&= [\![(1-s)\cdot (1-s')]\!]+[\![s \cdot s']\!] \\&= 1-[\![s]\!]-[\![s']\!]+2\cdot [\![s\cdot s']\!]. \end{aligned}$$
The exponent of the result is the sum of the exponents of the arguments (line 2), and the significands are multiplied. However, if we simply multiplied the representations of the significands, we would obtain a wrong result. For example, our reference framework \({\textsc {Sharemind}}\) uses \(n\)-bit integers (\(n \in \left\{ 32,64\right\} \)) and arithmetic modulo \(2^{n}\). Hence, out of the \(2n\) product bits, we obtain the lowest \(n\) after implicit modular reduction. These, however, are the least significant bits for out floating point significand representation, and we actually need the highest \(n\) bits instead. Therefore, we need to cast our additively shared \(n\)-bit values temporarily to \(2n\) bits and truncate the product to \(n\) bits again. Both of these operations are nontrivial, as we cannot ignore the overflow bits while casting up nor can we just throw away the last bits when truncating, and these cases are rather troublesome to handle obliviously.

For the casting on line 3, keep in mind that as Sharemind uses modular arithmetic, then while casting the significand up, we need to account for the overflow error generated by the increase in the modulus. For instance, if we are dealing with \(32\)-bit floats, the significand is \(32\) bits. When we cast it to \(64\) bits, we might get an overflow of up to 2 bits. To deal with this problem, we check obliviously whether our resulting \(64\) bit float is larger than \(2^{32}\) and \(2^{33}\) and adjust the larger value accordingly.

Truncation on line 4 can be performed by deleting the lower \(n\) bits of each share. This way we will lose the possible overflow that comes from adding the lower bits. This will result in a rounding error, so we check whether the lower \(n\) bits actually generate overflow. We do this similarly to the up-casting process—by checking whether the sum of the three shares of the lower \(n\) bits is larger than \(2^{32}\) and \(2^{33}\). Then, we truncate the product of the significands and obliviously add \(0\), \(1\) or \(2\) depending on the overflow we received as the result of the comparison.

For nonzero input values, we have \(f,f'\in [1/2,1)\), and hence \(f''=f\cdot f'\in [1/4,1)\) (note that precise truncation also prevents us from falling out of this interval). If the product is in the interval \([1/4,1/2)\), we need to normalize it. We detect the need for normalization by the highest bit of \(f''\) which is \(0\) exactly if \(f\in [1/4,1/2)\). If so, we need to shift the significand one position left (line 7) and reduce the exponent by \(1\) (line 8).

The \({\textsc {Sharemind}}\) system does not hide the execution flow of protocols, so whenever if-then statements have secret-shared values in the condition, an oblivious choice is used in the implementation. For example, the statement on line 6 is implemented using the following oblivious choice:
$$\begin{aligned}&\!\!\![\![f'']\!] \leftarrow [\![\lambda \cdot f'']\!] + [\![(1-\lambda )\cdot (f'' \ll 1)]\!], \\&\!\!\![\![e'']\!] \leftarrow [\![e'']\!] - 1 +[\![\lambda ]\!]. \end{aligned}$$
A general way for obliviously choosing between two inputs is given in Algorithm 3. In case of floating point numbers, we perform three oblivious choices separately for each component \(s\), \(f\) and \(e\). This is faster and more precise than multiplying floating point numbers with one and zero.

4.4 Addition

Because of the multiplicative representation of floating point numbers, addition is more difficult to implement than multiplication and requires us to consider more special cases.

The overall structure of the addition routine is presented in Algorithm 4. In order to add two numbers, their decimal points need to be aligned. The size of the required shift is determined by the difference of the exponents of the inputs. We perform an oblivious switch of the inputs based on the bit \([\![e']\!]\mathop {\ge }\limits ^{?}[\![e]\!]\), so that in the following, we can assume that \([\![e']\!]\ge [\![e]\!]\) (line 1). Our reference architecture Sharemind uses \(n\)-bit integers to store the significands, hence if \([\![e']\!]-[\![e]\!]\ge n\), adding \(N\) is efficiently as good as adding \(0\), and we can just output \([\![N']\!]\) (line 2). Unfortunately, in the implementation, we cannot halt the computation here, as the control flow must be data independent, but this check is needed for the following routines to work correctly. Instead of returning, the result of the private comparison is stored, and an oblivious choice is performed at the end of the protocol to make sure that the correct value is returned.

On line 5, we shift one of the significands to align the decimal points and adjust the exponent. Moving on to the if-else statement on lines 6–22, the operation performed on the significands depends on the sign bits. If they are equal (lines 6–13), we need to add the significands, and otherwise subtract them. When adding the significands, the result may exceed \(1\), so we may need to shift it back by one position. In order to achieve this functionality, we use temporary casting to \(2n\) bits similarly to the process in Algorithm 2 for multiplication.

If the sign bits differ, we need to subtract the smaller significand from the larger one. As a result, we may end up with a value \(<1/2\), in which case we need to normalize the value (lines 15–22). The if-else statement on lines 17–20 can be computed as
$$\begin{aligned}{}[\![f'']\!]&= [\![b \cdot (f'-f)]\!]+[\![(1-b) \cdot (f-f')]\!] \\&= 2\cdot [\![b \cdot (f'-f)]\!]+[\![f]\!]-[\![f']\!]. \end{aligned}$$
Note that the right shift used on line 5 of Algorithm 4 must be done obliviously as the value has to be shifted by a private number of bits. Oblivious shift right can be accomplished by using Algorithm 5. First, we compute all the possible \(n\) right-shifted values in parallel. Next, we extract the bits of the shift using the \({\mathsf {BitExtr}}\) routine from [8]. Recall that \(p\in [0,n-1]\), so we obtain \(\log _2(n)\) bits. Next, based on these \(\log _2(n)\) bits we compose a characteristic vector consisting of \(n\) shared bits, of which most have value \(0\) and exactly one has value \(1\). The value \(1\) occurs in the \(p\)-th position of the vector. Finally, the final result can be obtained by computing the dot product between the vector of shifted values and the characteristic vector.

Normalization of the significand and the exponent on line 22 of Algorithm 4 can be performed by using Algorithm 6. It works with a similar principle as Algorithm 5 and performs an oblivious selection from a vector of elements containing all the possible output values. The selection vector is computed using the \({\mathsf {MSNZB}}\) (most significant nonzero bit) routine from [8] which outputs \(n\) additively shared bits, at most one being equal to \(1\) in the position corresponding to the highest nonzero bit of the input. The vectors from which we obliviously select consist of all the possible shifts of the significands (lines 2 and 3) and the corresponding adjustments of the exponents (line 4).

5 Elementary functions

In this section, we present algorithms for four elementary functions, namely inversion, square root, exponentiation of \(e\) and error function. All of them are built using Taylor series expansion, and we also give a version using the Chebyshev polynomial for inversion and the exponentiation of \(e\). Polynomial evaluations are sufficiently robust for our purposes. They allow us to compute the functions using only additions and multiplications, and do so in a data-independent manner. Taylor series expansion also gives us a convenient way to have several trade-offs between speed and precision.

5.1 Inversion

We implement division by first taking the inverse of the divisor and then multiplying the result with the dividend. We compute the inverse by using Taylor series:
$$\begin{aligned} \frac{1}{x} = \sum _{n=0}^{\infty }{(1 - x)^{n}} . \end{aligned}$$
Algorithm 7 describes how to take the inverse of a shared \(n\)-bit float using Taylor series. As the inverse of a value \(N\) converges best in the interval \((0,1]\), we separate the significand \(f \in [\frac{1}{2},1)\) from the given floating point number and use the following equation:
$$\begin{aligned} \frac{1}{f \cdot 2^{e}} = \frac{1}{f} \cdot 2^{-e} . \end{aligned}$$
(1)
To achieve this, we compose a new floating point value \([\![S]\!]\) based on the significand \([\![f]\!]\) on line 2. Note that we make this new float always positive for optimization purposes and, at the end of the protocol, we reintroduce the sign \(s\) from the input float \([\![N]\!]\) (line 6). When we have evaluated the Taylor series for inverse, we correct the exponent based on equality (1) by subtracting the previous exponent from the exponent of the sum \([\![S']\!]\) (line 5).

As Sharemind works fastest with parallelised computations, we do as much of the Taylor series evaluation in parallel as possible. This parallelization is described in Algorithm 8. Variations of this algorithm can be adopted for the specific polynomial evaluations where needed, but this algorithm gives a general idea of how the process works. We use Algorithm 8 to parallelise polynomial evaluation in all elementary functions we implement.

We need to compute powers of \([\![x]\!]\) from \([\![x^{1}]\!]\) to \([\![x^{p}]\!]\). To do this in parallel, we fill vector \([\![\mathbf{u}]\!]\) with all the powers we have already computed \([\![x^{1}]\!], \ldots , [\![x^{h}]\!]\) and the second vector \([\![\mathbf{v}]\!]\) with the highest power of \([\![x]\!]\) we have computed \([\![x^{h}]\!]\). When we multiply these vectors element-wise, we get powers of \([\![x]\!]\) from \([\![x^{h+1}]\!]\) to \([\![x^{2h}]\!]\) (lines 6–16). If the given precision \(p\) is not a power of \(2\); however, we also need to do one extra parallel multiplication to get the rest of the powers from \([\![x^{h}]\!]\) to \([\![x^{p}]\!]\). This is done on lines 18–29. If the precision is \(<1.5\) times larger than the highest power, it is not necessary to add elements to the vector \([\![\mathbf{u}]\!]\) (checked on line 20).

The if-else statements in Algorithm 8 are based on public values, so there is no need to use oblivious choice to hide which branch is taken. The summation of vector elements on line 35 is done in parallel as much as possible.

As the Taylor series evaluation requires at least \(32\) summation terms to achieve the precision we need, we also offer an alternative algorithm for computing inverse using the corresponding approximated Chebyshev polynomial [26]. Algorithm 9 is similar to Algoritm 7, but instead of evaluating Taylor series, we compute the Chebyshev polynomial using the parallelization from Algorithm 8.

5.2 Square root

Square root is the most complex of the elementary functions we implemented. The Taylor series is given by the equation:
$$\begin{aligned} \sqrt{x} = \sum _{n=1}^{\infty }{\frac{(-1)^{n-1} (2n-2)!}{(1-2n-2)((n-1)!)^{2} 4^{n-1}}\cdot (x-1)^{n-1}} . \end{aligned}$$
The convergence interval of the series for square root is \((0,2)\). Thus, we will compute the square root of the significand and multiply it with a correcting exponential term:
$$\begin{aligned} \sqrt{f \cdot 2^{e}} = \sqrt{f} \cdot {2^{e/2}} . \end{aligned}$$
(2)
Algorithm 10 shows how square root is computed based on Eq. (2). We start by creating a float containing the absolute value of the significand \([\![f]\!]\) of the original value \(N\) and go on by evaluating the Taylor series for \([\![f]\!] - 1\). Line 5 marks the beginning of the computation of the exponent. We first divide the exponent by \(2\) by shifting the exponent right once and adding it to the exponent \([\![e_{S}]\!]\) we received as a result of the Taylor series evaluation. On line 7, we find the lowest bit of the exponent to determine whether it is even or odd. Now, we still have to compensate for the bit we lost during division (lines 8–14). If the exponent is even, we do nothing, otherwise, we multiply the intermediate result \([\![N']\!]\) by \(\sqrt{2}\) or \(\sqrt{\frac{1}{2}}\) depending on the sign of the exponent. The if-then statements on lines 8–14 and 9–13 need to be performed obliviously, as both \([\![b]\!]\) and \([\![e]\!]\) are secret-shared values.

5.3 Exponentiation of \(e\)

Algorithm 11 describes the exponentiation of \(e\) using the Taylor series:
$$\begin{aligned} e^{x} = \sum _{n=0}^{\infty }{\frac{1}{n!} \cdot x^{n}} . \end{aligned}$$

Unfortunately, the larger \([\![N]\!]\) is, the more we have to increase precision \(p\) to keep the result of the Taylor series evaluation from deviating from the real result too much. This, however, increases the working time of the algorithm significantly. To solve this problem, we implemented another algorithm using the separation of the integer and fractional parts of the power \([\![N]\!]\) and using the Chebyshev polynomial to approximate the result [26]. This solution is described in Algorithm 12.

In this case, we exponentiate \(e\) using the knowledge that \(e^{x} = (2^{\log _2 e})^{x} = 2^{(\log _2 e)\cdot x}\). Knowing this, we can first compute \(y = (\log _2 e)\cdot x \approx 1.44269504 \cdot x\) (line 1) and then \(2^{y}\). Notice, that \(2^{y} = 2^{[y] + \{y\}} = 2^{[y]} \cdot 2^{\{y\}}\), where \([y]\) denotes the integer part and \(\{y\}\) the fractional part of \(y\).

First, we need to separate the integer part from the shared value \([\![y]\!] = ([\![s]\!], [\![f]\!], [\![e]\!])\). Let \(n\) be the number of bits of \(f\). In this case, \([\![[|y|]]\!] = [\![f \gg (n - e)]\!]\) (line 3) is true if and only if \([\![e]\!] \le n\). On the other hand, if \([\![e]\!] > n\), then \([\![y]\!] \ge 2^{[\![e]\!] - 1} > 2^{n - 1}\) would also hold, and we would be trying to compute a value that is not smaller than \(2^{2^{n - 1}}\). Hence, we consider the requirement \([\![e]\!] \le n\) to be reasonable.

Note, however, that if we want to use the Chebyshev polynomial to compute \(2^{\{y\}}\), we need to use two different polynomials for the positive and negative case. We would like to avoid this if possible for optimization, hence, we want to keep \({\{y\}}\) always positive. This is easily done if \(y\) is positive, but causes problems if \(y\) is negative. So if \(y\) is negative and we want to compute the integer part of \(y\), we compute it in the normal way and add \(1\) (line 3) making its absolute value larger than that of \(y\). Now, when we subtract the gained integer to gain the fraction on line 4, we always get a positive value. This allows us to only evaluate one Chebyshev polynomial, thus giving us a slight performance improvement.

We construct the floating point value \(2^{[\![[y]]\!]}\) by setting the corresponding significand to be \(100\ldots 0\) and the exponent to be \([\![[y]]\!]+1\). As \([\![\{y\}]\!] \in [0,1]\) holds for the fractional part of \(y\), and we have made sure that the fraction is always positive, we can compute \(2^{[\![\{y\}]\!]}\) using the Chebyshev polynomial on line 7. In this domain, these polynomials will give a result that is accurate to the fifth decimal place. Finally, we multiply \(2^{[\![[y]]\!]} \cdot 2^{[\![\{y\}]\!]}\) to gain the result.

5.4 Error function

We implemented the error function by using the following Taylor series:
$$\begin{aligned} \text {erf}(x) = \frac{2}{\sqrt{\pi }}\sum _{n=0}^{\infty }{\frac{(-1)^{n}}{n!(2n + 1)} \cdot x^{2n + 1}} . \end{aligned}$$
Algorithm 13 describes how to compute the error function using Taylor series.
Note, that the Taylor series for the error function works on odd powers of \(x\). To use the parallelization trick from Algorithm 8, we first compute \(y = x^{2}\) and we get the following equation:
$$\begin{aligned} \text {erf}(x)&= \frac{2}{\sqrt{\pi }}\sum _{n=0}^{\infty }{\frac{(-1)^{n}}{n!(2n + 1)} \cdot x^{2n + 1}} \\&= \frac{2}{\sqrt{\pi }}\cdot x \sum _{n=0}^{\infty }{\frac{(-1)^{n}}{n!(2n + 1)} \cdot x^{2n}} \\&= \frac{2}{\sqrt{\pi }}\cdot x \sum _{n=0}^{\infty }{\frac{(-1)^{n}}{n!(2n + 1)} \cdot y^{n}}. \end{aligned}$$
From this, it is easy to see that we can evaluate the Taylor series for \(y\) and then multiply the result by \(x\).

Note that, in addition, we check whether the exponent is larger or smaller than \(3\), which shows us whether \(x \in [-2, 2]\) and based on the private comparison, we obliviously set the result as \(\pm 1\) according to the original sign \([\![s]\!]\) (line 2). We perform this approximation because at larger values, more cycles are needed for a sufficiently accurate result.

6 Performance of floating point operations

6.1 Benchmark results

We implemented the secure floating point operations described in this paper on the \({\textsc {Sharemind}}\) 3 secure computation system. We built the implementation using the three-party protocol suite already implemented in the \({\textsc {Sharemind}}\) system. Several low-level protocols were developed and optimized to simplify the use and conversion of shared values with different bit sizes. Secure floating point operations are programmed as compositions of low-level integer protocols.

To measure the performance of the floating point operations, we deployed the developed software on three servers connected with fast network connections. More specifically, each of the servers used contains two Intel X5670 2.93 GHz CPUs with 6 cores and 48 GB of memory. The network connections are point-to-point 1 Gb/s connections. We note that during the experiments, peak resource usage for each machine did not exceed 2 cores and 1 GB of RAM. The peak network usage varied during the experiments, but did not exceed 50 Mb/s. Therefore, we believe that the performance can be improved with further optimizations.

Integer operations on \({\textsc {Sharemind}}\) perform more efficiently on multiple inputs [8]. Therefore, we also implemented floating point operations as vector operations. Unary operations take a single input vector and binary operations take two input vectors of an equal size. Both kinds of operations produce a vector result. We performed the benchmarks on vectors of various sizes to estimate how much the size of the vector affects the processing efficiency.

For each operation and input size, we performed 10 iterations of the operation and computed the average. The only exceptions to this rule were operations based on Taylor series or Chebyshev polynomials running on the largest input vector. Their running time was extensively long so we performed only a few tests for each of them. In the presentation, such results are marked with an asterisk.

Tables 1 and 2 present the benchmarking results for floating point numbers with 32-bit significands and 64-bit significands, respectively. We see that with smaller input sizes, there is no difference in the running time of single and double-precision operations. However, for larger input vector sizes, the additional communication starts to slow down the computation as the network channel gets saturated.
Table 1

Performance of single-precision floating point operations

  

Input size in elements

1

10

100

1,000

10,000

Private \(x\)

Add

0.73 ops

7.3 ops

69 ops

540 ops

610 ops

Private \(y\)

Multiply

2.3 ops

23 ops

225 ops

1,780 ops

6,413 ops

Divide (TS)

0.08 ops

0.81 ops

6.8 ops

25.5 ops

38 ops\(^{*}\)

Divide (CP)

\(0.16\) ops

\(1.6\) ops

\(14\) ops

\(59\) ops

\(79\) ops\(^{*}\)

Private \(x\)

Multiply

2.3 ops

23 ops

220 ops

1,736 ops

6,321 ops

Public \(y\)

Multiply by \(2^{k}\)

\(5.9\times 10^{4}\) ops

\(6.1\times 10^{5}\) ops

\(4.2\times 10^{6}\) ops

\(1.1\times 10^{7}\) ops

\(1.4\times 10^{7}\) ops

Divide

2.3 ops

23 ops

221 ops

1,727 ops

6,313 ops

Divide by \(2^{k}\)

\(6.4\times 10^{4}\) ops

\(6.1\times 10^{5}\) ops

\(4.1\times 10^{6}\) ops

\(1.2\times 10^{7}\) ops

\(1.4\times 10^{7}\) ops

Private \(x\)

Find \(e^{x}\) (TS)

0.09 ops

0.9 ops

7.3 ops

23 ops

30 ops\(^{*}\)

Find \(e^{x}\) (CP)

\(0.12\) ops

\(1.2\) ops

\(11\) ops

\(71\) ops

\(114\) ops

Invert (TS)

0.09 ops

0.84 ops

6.8 ops

14.7 ops

35.7 ops\(^{*}\)

Invert (CP)

\(0.17\) ops

\(1.7\) ops

\(15.3\) ops

\(55.2\) ops

\(66.4\) ops

Square root

0.09 ops

0.85 ops

7 ops

24 ops

32 ops\(^{*}\)

Error function

\(0.1\) ops

0.97 ops

8.4 ops

30 ops

39 ops

All performance values are in operations per second (ops). Operations marked with TS and CP use Taylor series or Chebyshev polynomials, respectively. For the exponent function, square root, inversion and division, the number of elements in the Taylor series is 32. For the error function, 16 elements are computed

Table 2

Performance of double-precision floating point operations

  

Input size in elements

1

10

100

1,000

10,000

Private \(x\)

Add

0.68 ops

6.8 ops

60 ops

208 ops

244 ops

Private \(y\)

Multiply

2.1 ops

21 ops

195 ops

1,106 ops

2,858 ops

Divide (TS)

0.08 ops

0.7 ops

3 ops

5.2 ops

12.4 ops\(^{*}\)

Divide (CP)

\(0.15\) ops

\(1.5\) ops

\(12\) ops

\(23\) ops

\(52\) ops\(^{*}\)

Private \(x\)

Multiply

2.1 ops

21 ops

193 ops

1263 ops

2667 ops

Public \(y\)

Multiply by \(2^{k}\)

\(3.3\times 10^{4}\) ops

\(3.2\times 10^{5}\) ops

\(2.1\times 10^{6}\) ops

\(8.5\times 10^{6}\) ops

\(1.3\times 10^{7}\) ops

Divide

2.1 ops

21 ops

194 ops

1223 ops

2573 ops

Divide by \(2^{k}\)

\(6.4\times 10^{4}\) ops

\(6.1\times 10^{5}\) ops

\(4\times 10^{6}\) ops

\(1\times 10^{7}\) ops

\(1.2\times 10^{7}\) ops

Private \(x\)

Find \(e^{x}\) (TS)

0.09 ops

0.8 ops

3.1 ops

4.5 ops

6.2 ops\(^{*}\)

Find \(e^{x}\) (CP)

\(0.11\) ops

\(1.1\) ops

\(9.7\) ops

\(42\) ops

\(50\) ops

Invert (TS)

0.08 ops

0.75 ops

3.6 ops

4.8 ops

10.7 ops\(^{*}\)

Invert (CP)

\(0.16\) ops

\(1.5\) ops

\(11.1\) ops

\(29.5\) ops

\(47.2\) ops

Square root

0.08 ops

0.76 ops

4.6 ops

9.7 ops

10.4 ops\(^{*}\)

Error function

\(0.09\) ops

0.89 ops

5.8 ops

16 ops

21 ops\(^{*}\)

All performance values are in operations per second (ops). Operations marked with TS and CP use Taylor series or Chebyshev polynomials, respectively. For the exponent function, square root, inversion and division, the number of elements in the Taylor series is 32. For the error function, 16 elements are computed

Addition and multiplication are the primitive operations with addition being about five times slower than multiplication. With single-precision floating point numbers, both can achieve the speed of a thousand operations per second if the input vectors are sufficiently large. Based on the multiplication primitive, we also implemented multiplication and division by a public constant. For multiplication, there was no gain in efficiency, because we currently simply classify the public factor and use the private multiplication protocol.

However, for division by a public constant, we can have serious performance gains as we just invert the divisor publicly and then perform a private multiplication. Furthermore, as our secret floating point representation is based on powers of two, we were able to make a really efficient protocol for multiplying and dividing with public powers of two by simply modifying the exponent. As this is a local operation, it requires no communication, and is, therefore, very fast.

Although we isolated the network during benchmarking, there was enough fluctuation caused by the flow control to slightly vary the speeds of similar operations. Although private multiplication, multiplication and division by constant are performed using the same protocol, their speeds vary due to effects caused by networking.

The unary operations are implemented using either Taylor series or Chebyshev polynomials. When available, we benchmarked both cases. In all such cases, Chebyshev polynomials provide better performance, because smaller polynomials are evaluated to compute the function. In general, all these operations have similar performance.

As mentioned in Sect. 5.1, division between two private floating point values is implemented by inverting the second parameter and performing a private multiplication. As inversion is significantly slower than multiplication, the running time of division is mostly dictated by the running time of inversion.

6.2 Difference in the precision of Taylor series and Chebyshev polynomials

In Sect. 5, we give two algorithms for computing inverse and powers of \(e\) based on Taylor series and Chebyshev polynomials. It is easy to see from Tables 1 and 2 that using the Chebyshev polynomial versions to approximate the results gives us better performance.

Figure 5 shows the difference in the precision of these two polynomials. Panel (a) describes the absolute error that computing the inverse using both algorithms produces in the interval \([1/2, 1]\). As can be seen from the figure, the absolute error produced by the evaluation of \(32\) terms of the Taylor series is far \(<\! 10^{-6}\). The maximum error produced by the evaluation of \(8\) terms of the Chebyshev polynomial is not larger than \(2.1\times 10^{-6}\). The algorithm separates the exponent, then works on the significand, and, finally, subtracts the exponent. This means that the absolute error of the Chebyshev polynomial will grow as the value being inverted grows. The error of the Taylor series evaluation is so small that it will start showing itself significantly later.
Fig. 5

Comparison of the precision of the implemented Taylor series and Chebyshev polynomials

Panel (b) shows similar reasoning for exponentiation with the different polynomials. As \(32\) elements of Taylor series for exponentiation will not be enough to produce a precise result after \(x = 13\) and will produce an absolute error of more than \(2\times 10^{6}\) at \(x = 20\), we give this plot in the interval \([0, 15]\) and show the relative error instead of the absolute error. As can be seen from the figure, the relative error produced by evaluation of 5 terms of the Chebyshev polynomial always stays between \(-2.9\times 10^{-6}\) and \(3.4\times 10^{-6}\). Taylor series evaluation is more precise until \(x = 13\) after which the relative error increases. This point can be pushed further by adding more terms to the Taylor series evaluation at the cost of performance.

As expected, Fig. 5 indicates that using the Taylor series evaluation gives us a more precise result but, in larger algorithms, it can be a bottleneck. We propose that the choice of the algorithm be done based on the requirements of the problem being solved—if precision is important, Taylor series evaluation should be used, if speed is important, Chebyshev polynomials are the wiser choice. If the problem requires both speed and precision, it is possible to add terms to the Chebyshev polynomial to make it more precise. In addition, as the error in exponentiation using Taylor series evaluation becomes significant when \(x\) grows, Chebyshev polynomials should be used if it is expected that \(x\) will exceed \(10\). Another option is to raise the number of terms in Taylor series but this will, in turn, increase the running time of the algorithm.

6.3 Benefits of parallelization

The benchmarking results confirm that our implementation of secure floating point operations is more efficient on many simultaneous inputs. The amortized cost of processing a single input or a pair of inputs is reduced as the input size grows. We performed an additional round of experiments to measure the extent of this behavior. We ran the addition and multiplication operations with a wider range of steps to determine the optimal number of parallel additions and multiplications. We fitted the results with two linear regressions and plotted the result. The resulting plot for single-precision floating point operations is in Fig. 6. The corresponding plot for double-precision numbers is in Fig. 7.
Fig. 6

The efficiency of single-precision addition and multiplication in relation to input size

Fig. 7

The efficiency of double-precision addition and multiplication in relation to input size

Note that the running time of the secure computation protocols grows very little until a certain point, when it starts growing nearly linearly in the size of the inputs. At this point, the secure computation protocols saturate the network link and are working at their maximum efficiency. Addition is a more complex operation with more communication and; therefore, we can perform less additions before the network link becomes saturated. Addition takes even more communication with double-precision numbers so it is predictable that the saturation point occurs earlier than for single-precision values. Multiplication uses significantly less network communication, so its behavior is roughly similar with both single- and double-precision numbers.

Based on these observations, we decided to use vector operations in our collision probability computation for both primitives and the main algorithm. To validate this direction, we measured the performance of the matrix product function with several sequential executions versus one or more parallel executions. In the sequential case, a number of matrix multiplications were performed one after the other. In the parallel case, they were performed simultaneously, using the efficiency of vector operations as much as possible. The results are given in Table 3.
Table 3

Sequential and parallel performance of the matrix product function

  

Number of matrix multiplications

1 (s)

10 (s)

100 (s)

1,000 (s)

\(3\times 3\)

Sequential

3.2

31.9

318.5

3,212.3

Matrix

Parallel

3.2

3.3

5.2

45.7

\(5\times 5\)

Sequential

4.6

46.3

466

4,641.4

Matrix

Parallel

4.6

5.1

20.4

91.9

\(10\times 10\)

Sequential

6.4

65.9

674.6

6,624.2

matrix

Parallel

6.6

15.7

152

1,189.2

Matrix multiplication was performed using single-precision operations. For each matrix size, 10 iterations were performed and the average of the results was computed. Durations are given in seconds

We see that, for a single input, there is no significant difference between the running time of the sequential and parallel version. This is logical, as there is nothing to vectorize. However, as the number of multiplications grows, parallel execution becomes dramatically more efficient. However, when the network saturates, the performance of the parallel case also become more linear in the number of inputs, as illustrated by the \(10\times 10\) case. Even then, the parallel case is much faster than the sequential case. This justifies our decision to use vectorization to the highest possible degree. We note that this increase in efficiency comes at the cost of needing more memory. However, we can control the size of our inputs and balance vectorization and memory use.

7 Implementation of collision analysis

7.1 Reusable components

We implemented the collision analysis described in Algorithm 1 for Sharemind in its language SecreC. This allows us to easily do parallel operations on vectors as it has syntax for element-wise operations on vectors. We first implemented the algorithm and its component methods for one satellite pair using as much parallelization as possible and then went on to parallelise the solution further for \(n\) satellite pairs.

One of the most important values of our code is its reusability. The vector and matrix operations we have implemented can later be used in other algorithms similarly to the modular design of software solutions that is the norm in the industry. This makes it convenient to implement other algorithms that make use of vector and matrix operations. In addition, the operations have already been parallelised so future developers will not have to put time into this step of the coding process in SecreC.

7.2 Vector and matrix operations

We implemented and parallelised a number of vector and matrix operations:
  • Vector length, unit vector and dot product for vectors of any size;

  • Cross product for vectors of size \(3\);

  • Matrix product for two-dimensional matrices; and

  • Eigensystem computation for \(2 \times 2\) matrices.

We implemented simple and parallel methods for computing the unit vector and dot product for vectors of any length. We also implemented the cross product for vectors of length \(3\) as we are working in three-dimensional space.

Similarly, we implemented simple and parallel methods for matrix multiplication for two-dimensional matrices. We also provide a slight optimization for the case of diagonal matrices as the covariance matrices we work with are diagonal. This does not provide a very large speedup in the case of one satellite pair but if many pairs are analyzed in parallel, then the benefit is worth the implementation of such a conditional method. We also implemented the eigensystem computation for \(2 \times 2\) matrices.

7.2.1 Dot product

Let us take a closer look at the code of the dot product. We use this example to show the difference between the normal and the parallelised versions of the algorithm. Figure 8 shows the code for a simple dot product of two vectors \([\![\mathbf{x}]\!]\) and \([\![\mathbf{y}]\!]\). Firstly, the vectors \([\![\mathbf{x}]\!]\) and \([\![\mathbf{y}]\!]\) are multiplied element-wise and then the elements of the resulting vector \([\![\mathbf{product}]\!]\) are summed together.
Fig. 8

Simple code for dot product

The method from Fig. 9 gets as input two matrices \([\![{\mathbf {x}}]\!]\) and \([\![{\mathbf {y}}]\!]\). We assume that these matrices are vectors that contain the data of several vectors, and we assume that they have the same dimensions. The function shape returns a vector the length of which is determined by the dimension of the array it receives as input. In our case, the variable \(\mathbf{xShape}\) is a vector of length \(2\). Let the dimensions of \([\![{\mathbf {x}}]\!]\) and \([\![{\mathbf {y}}]\!]\) be \(m\) and \(n\), then \(xShape = (m,n)\), where \(n\) is the number of elements in the vectors and \(m\) is the number of vectors. As the result, we expect a vector of \(m\) dot products. It is easy to see, that the dimensions of the input and output values of the parallel version are higher by one as compared to the simple version.
Fig. 9

Parallelised code for dot product

We start the parallel dot product computation similarly to the simple version—we multiply the matrices together element-wise. The SecreC language allows us to do this easily by using the multiplication sign on two arrays of the same dimensions. As the result, we receive the matrix \([\![{\mathbf {product}}]\!]\) that has the same dimensions as the input matrices. To use the function sum that allows us to sum together vector elements, we first reshape the matrix \([\![{\mathbf {product}}]\!]\) to a vector. This is another array manipulation method which SecreC provides us. Finally, we use the method that lets us sum vector elements together. However, in the parallel case, we use a different version of the method sum that lets us sum elements together by groups, giving us \(xShape[0] = m\) sums as a result. The second parameter of the method sum must be a factor of the size of the vector, the elements of which are being summed.

The benefit in using the parallel version of the method is in the multiplication and summing sub methods. If we computed the dot product for each satellite pair as a cycle, we would get the products within one pair \([\![{\mathbf{x}_\mathbf{1}} \cdot {\mathbf{y}_\mathbf{1}}]\!], [\![{\mathbf{x}_\mathbf{2}} \cdot {\mathbf{y}_\mathbf{2}}]\!], [\![{\mathbf{x}_\mathbf{3}} \cdot {\mathbf{y}_\mathbf{3}}]\!]\) in parallel but would not be able to use the parallelization for computing many pairs at once. As the multiplication and summation of floating point values is quite time-consuming, parallelization is essential.

7.3 Benchmark results

Finally, we measured the performance of the full privacy-preserving collision analysis implementation. We performed two kinds of experiments. First, we measured the running time of the algorithm on various input sizes to know if it is feasible in practice and if there are efficiency gains in evaluating many satellite pairs in parallel. Second, we profiled the execution of a single instance of collision analysis to study, what the bottleneck operations in its execution are.

The running time measurements are given in Table 4. We differentiated the results based on the precision of the floating point operations and the kind of approximation used for complex operations.
Table 4

Collision analysis performance

  

Number of satellite pairs

1

2

4

8

16

Single

TS

258 (258) s

350 (175) s

488 (122) s

792 (99) s

1,348 (84) s

Precision

CP

204 (204) s

255 (128) s

355 (89) s

557 (70) s

967 (60) s

Double

TS

337 (337) s

447 (223) s

724 (181) s

1127 (141) s

2,064 (129) s

Precision

CP

246 (246) s

340 (170) s

450 (113) s

724 (91) s

1,272 (80) s

Three iterations were performed for each experiment. First, we give the total running time and then the running time per satellite pair in parentheses. We computed 40 iterations in the integral approximation. TS means that only Taylor series were used for elementary functions. CP means that operations based on Chebyshev polynomials were used where available

The running time for a single collision probability computation is 4–5 min and it is reduced 1–2 min with parallel executions. Consider the scenario reported in [16], where the Space Data Center computes the collision probabilities twice a day for 300 satellite pairs. By extrapolating our performance measurements (an average of 90 s per satellite pair), we see that a secure run would be shorter than 8  h. Also, given that we can balance the computation load to multiple secure computation clusters, we can scale the system to process a much larger number of satellites. Therefore, today’s performance is suitable for practical use and there is room for scalability through further optimizations.

We used the built-in profiling features of Sharemind to analyze the breakdown of the execution time. We computed the collision probability of a single satellite pair and measured the duration of each floating point operation in the execution. The results are given in Figs. 10 and 11. Addition also includes the running time of subtraction, because they are implemented using the same protocol. Multiplication also includes division by a public value for the same reason.
Fig. 10

Breakdown of secure floating point operation runtime in collision analysis with Taylor series

Fig. 11

Breakdown of secure floating point operation runtime in collision analysis with Chebyshev polynomials

When we compute collision probabilities using operations based on Taylor series, we see that division and square root are the most expensive operations, followed by the exponent function and addition. Using Chebyshev polynomials greatly reduces the importance of division and the exponent function, leaving square root as the most time-consuming operation in the algorithm.

8 Conclusions and further work

In this paper, we showed how to perform satellite collision analysis in a secure multiparty setting. For this purpose, we first presented routines for implementing floating point arithmetic operations on top of a SMC engine. The general approach is generic, but in order to obtain the efficient real implementation, platform-specific design decisions have been made. In the current paper, we based our decisions on the architecture of \({\textsc {Sharemind}}\) which served as a basic platform for our benchmarks. We also showed how to implement several elementary functions and benchmarked their performance. However, by replacing the specific optimizations with generic algorithms or other specific optimizations, these floating point algorithms can be ported to other secure computation systems. We concluded by implementing the algorithm for computing the probability of satellite collision and benchmarked the performance of this implementation.

The largest impact of this work is the conclusion that it is possible and feasible to perform satellite collision analysis in a privacy-preserving manner.

Footnotes

  1. 1.

    NSSDC Master Catalog, NASA. http://nssdc.gsfc.nasa.gov/nmc/, last accessed May 16, 2013.

  2. 2.

    Orbital Debris Program Office, NASA. http://orbitaldebris.jsc.nasa.gov/faqs.html, last accessed May 16, 2013.

  3. 3.

    Space-Track.org, https://www.space-track.org, last accessed May 16, 2013.

Notes

Acknowledgments

The authors would like to thank Galois Inc for the use of their prototype that works on public data. The authors also thank Nigel Smart for suggesting the use of Chebyshev polynomials, Dan Bogdanov for his help with benchmarking and Sven Laur for coming up with a neat trick for slightly speeding up exponentiation.

References

  1. 1.
    754-2008 IEEE Standard for Floating-Point Arithmetic (2008)Google Scholar
  2. 2.
    Akella, M.R., Alfriend, K.T.: Probability of collision between space objects. J. Guid. Control Dyn. 23(5), 769–772 (2000)CrossRefGoogle Scholar
  3. 3.
    Alfano, S.: A numerical implementation of spherical object collision probability. J. Astronaut. Sci. 53(1), 103 (2005)Google Scholar
  4. 4.
    Aliasgari, M., Blanton, M., Zhang, Y., Steele, A.: Secure computation on floating point numbers. Cryptology ePrint Archive, Report 2012/405. http://eprint.iacr.org/ (2012)
  5. 5.
    Ben-Or, M., Goldwasser, S., Wigderson, A.: Completeness theorems for non-cryptographic fault-tolerant distributed computation. In: STOC ’88: Proceedings of the Twentieth Annual ACM Symposium on Theory of Computing, pp. 1–10 (1988)Google Scholar
  6. 6.
    Bogdanov, D.: Sharemind: programmable secure computations with practical applications. PhD thesis, University of Tartu (2013)Google Scholar
  7. 7.
    Bogdanov, D., Laur, S., Willemson, J.: Sharemind: A framework for fast privacy-preserving computations. In: Computer Security—ESORICS 2008, 13th European Symposium on Research in Computer Security, Málaga, Spain, October 6–8, 2008 Proceedings of LNCS, vol. 5283, pp. 192–206. Springer (2008)Google Scholar
  8. 8.
    Bogdanov, D., Niitsoo, M., Toft, T., Willemson, J.: High-performance secure multi-party computation for data mining applications. Int. J. Inf. Secur. 11(6), 403–418 (2012)CrossRefGoogle Scholar
  9. 9.
    Canetti, R.: Security and composition of multiparty cryptographic protocols. J. Cryptol. 13(1), 143–202 (2000)MATHMathSciNetCrossRefGoogle Scholar
  10. 10.
    Catrina, O., De Hoogh, S.: Improved primitives for secure multiparty integer computation. In: Proceedings of the 7th International Conference on Security and Cryptography for Networks SCN’10, pp. 182–199. Springer, Berlin, Heidelberg (2010)Google Scholar
  11. 11.
    Catrina, O., De Hoogh, S.: Secure multiparty linear programming using fixed-point arithmetic. In: Proceedings of the 15th European Conference on Research in Computer Security ESORICS’10, pp. 134–150. Springer, Berlin, Heidelberg (2010)Google Scholar
  12. 12.
    Catrina, O., Saxena, A.: Secure computation with fixed-point numbers. In: Proceedings of the 14th International Conference on Financial Cryptography and Data Security FC’10, pp. 35–50. Springer, Berlin, Heidelberg (2010)Google Scholar
  13. 13.
    Chaum, D., Crépeau, C., Damgård, I.: Multiparty unconditionally secure protocols. In: STOC ’88: In: Proceedings of the Twentieth Annual ACM Symposium on Theory of Computing, pp. 11–19 (1988)Google Scholar
  14. 14.
    Damgård, I., Pastro, V., Smart, N.P., Zakarias, S.: Multiparty computation from somewhat homomorphic encryption. In: Safavi-Naini, R., Canetti, R. (eds.) CRYPTO, volume 7417 of lecture notes in computer science, pp. 643–662. Springer, New York (2012)Google Scholar
  15. 15.
    Franz, M., Katzenbeisser, S.: Processing encrypted floating point signals. In: Proceedings of the Thirteenth ACM Multimedia Workshop on Multimedia and Security MM&Sec ’11, ACM, pp. 103–108. New York, NY, USA (2011)Google Scholar
  16. 16.
    Hemenway, B., Welser IV, W., Baiocchi, D.: Achieving higher-fidelity conjunction analyses using cryptography to improve information sharing. RAND Corporation,Technical report (2014)Google Scholar
  17. 17.
    Jagomägis, R.: SecreC: a privacy-aware programming language with applications in data mining. Master’s thesis, Institute of Computer Science, University of Tartu (2010)Google Scholar
  18. 18.
    Kelso, T.: Sean: Initial Analysis of Iridium 33-Cosmos 2251 Collision. Center for Space Standards & Innovation, Technical report (2009)Google Scholar
  19. 19.
    Kelso, T.S., Vallado, D.A., Chan, J., Buckwalter, B.: Improved conjunction analysis via collaborative space situational awareness. Technical report (2008)Google Scholar
  20. 20.
    Knuth, D.E.: The art of computer programming, vol. 2, 3rd edn. Addison-Wesley, Boston (1998)Google Scholar
  21. 21.
    Malkhi, D., Nisan, N., Pinkas, B., Sella, Y.: Fairplay—a secure two-party computation system. In: Proceedings of the 13th USENIX Security Symposium (2004), pp. 287–302 (2004)Google Scholar
  22. 22.
    McQuarrie, D.A.: Mathematical methods for scientists and engineers. University Science Books, Mill Valley (2003)MATHGoogle Scholar
  23. 23.
    NASA: Satellite collision leaves significant debris clouds. Orbital Debris Q. News 13(2), 1–2 (2009)Google Scholar
  24. 24.
    Paschalakis, S., Lee, P.: Double precision floating-point arithmetic on FPGAs. In: 2003 IEEE International Conference on Field-Programmable Technology (FPT), 2003, IEEE, pp. 352–358 (2003)Google Scholar
  25. 25.
    Payne, T.P.: First “confirmed” natural collision between two cataloged satellites. In: Proceedings of the Second European Conference on Space Debris, ESA, vol. ESA-SP 393, pp. 597–600 (1997)Google Scholar
  26. 26.
    Popov, B.A., Tesler, G.S.: Open image in new window Open image in new window Naukova dumka (1984)Google Scholar
  27. 27.
    Shamir, A.: How to share a secret. Commun. ACM 22, 612–613 (1979)MATHMathSciNetCrossRefGoogle Scholar
  28. 28.
    Sharemind development team The Sharemind framework. http://sharemind.cyber.ee (2007)
  29. 29.
    VIFF development team The virtual ideal functionality framework. http://viff.dk (2007)
  30. 30.
    Yao, A.C.-C.: Protocols for secure computations (extended abstract). In: Processidings of FOCS ’82, pp. 160–164 (1982)Google Scholar

Copyright information

© The Author(s) 2014

Open AccessThis article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.

Authors and Affiliations

  1. 1.CyberneticaTartuEstonia
  2. 2.Institute of Computer ScienceUniversity of TartuTartuEstonia

Personalised recommendations