Abstract
This paper gives an overview of my Invited Plenary Lecture at the International Congress of Industrial and Applied Mathematics (ICIAM) in Valencia in July 2019.
1 Motivation: Privacy in Artificial Intelligence
These days more and more people are taking advantage of cloud-based artificial intelligence (AI) services on their smartphones to get useful predictions such as weather, directions, or nearby restaurant recommendations based on their location and other personal information and preferences. The AI revolution that we are experiencing in the high-tech industry is based on the following value proposition: you input your private data and agree to share it with the cloud service in exchange for some useful prediction or recommendation. In some cases the data may contain extremely personal information, such as your sequenced genome, your health record, or your minute-to-minute location.
This quid pro quo may lead to the unwanted disclosure of sensitive information or an invasion of privacy. Examples from around the time of ICIAM 2019 include the Strava fitness app, which revealed the locations of U.S. military bases worldwide, and the city of Los Angeles suing IBM’s weather company over deceptive use of location data. It is hard to quantify the potential harm from loss of privacy, but employment discrimination or loss of employment due to a confidential health or genomic condition are potential undesirable outcomes. Corporations also need to protect their confidential customer and operations data while storing, using, and analyzing it.
To protect privacy, one option is to lock down personal information by encrypting it before uploading it to the cloud. However, traditional encryption schemes do not allow any computation to be done on encrypted data. In order to make useful predictions, we need a new kind of encryption which maintains the structure of the data when encrypting it, so that meaningful computation is possible. Homomorphic encryption allows us to switch the order of encryption and computation: after decryption, we get the same result whether we first encrypt and then compute, or first compute and then encrypt.
The first solution for a homomorphic encryption scheme which can process any circuit was proposed in 2009 by Gentry [21]. Since then, many researchers in cryptography have worked hard to find schemes which are both practical and also based on well-known hard math problems. In 2011, my team at Microsoft Research collaborated on the homomorphic encryption schemes [8, 9] and many practical applications and improvements [30] which are now widely used in applications of Homomorphic Encryption. Then in 2016, we had a surprise breakthrough at Microsoft Research with the now widely cited CryptoNets paper [22], which demonstrated for the first time that evaluation of neural network predictions was possible on encrypted data.
Thus began our Private AI project, the topic of my Invited Plenary Lecture at the International Congress of Industrial and Applied Mathematics in Valencia in July 2019. Private AI refers to our Homomorphic Encryption-based tools for protecting the privacy of enterprise, customer, or patient data, while doing Machine Learning (ML)-based AI, both learning classification models and making valuable predictions based on such models.
You may ask, “What is Privacy?” Preserving “Privacy” can mean different things to different people or parties. Researchers in many fields including social science and computer science have formulated and discussed definitions of privacy. My favorite definition of privacy is: a person or party should be able to control how and when their data is used or disclosed. This is exactly what Homomorphic Encryption enables.
1.1 Real-World Applications
In 2019, the British Royal Society released a report, “Protecting privacy in practice: Privacy Enhancing Technologies in data analysis.” The report covers Homomorphic Encryption (HE) and Secure Multi-Party Computation (MPC), as well as technologies not built with cryptography, including Differential Privacy (DP) and hybrid solutions based on secure hardware. Our homomorphic encryption project was featured as a way to protect “Privacy as a human right” at the Microsoft Build worldwide developers conference in 2018 [39]. Private AI forms one of the pillars of Responsible ML in our collection of Responsible AI research, and Private Prediction notebooks were released in Azure ML at Build 2020.
Over the last 8 years, my team has created demos of Private AI in action, running private analytics services in the Azure cloud. I showed a few of these demos in my talk at ICIAM in Valencia. Our applications include an encrypted fitness app: a cloud service that processes all your workout, fitness, and location data in the cloud in encrypted form, and displays your summary statistics on your phone after the results of the analysis are decrypted locally. Another application is an encrypted weather prediction app, which takes your encrypted zip code and returns an encrypted version of the weather at your location, to be decrypted and displayed on your phone. The cloud service never learns your location or what weather data was returned to you. Finally, I showed a private medical diagnosis application, which uploads an encrypted version of your chest X-ray image; the medical condition is diagnosed by running image recognition algorithms on the encrypted image in the cloud, and the diagnosis is returned in encrypted form to the doctor.
Over the years, my team has developed other Private AI applications, enabling private predictions such as sentiment analysis in text, cat/dog image classification, heart attack risk based on personal health data, neural net image recognition of hand-written digits, flowering time based on the genome of a flower, and pneumonia mortality risk using intelligible models. All of these operate on encrypted data in the cloud to make predictions, and return encrypted results in a fraction of a second.
Many of these demos and applications have been inspired by collaborations with researchers in Medicine, Genomics, Bioinformatics, and Machine Learning. We have worked together with finance experts and pharmaceutical companies to demonstrate a range of ML algorithms operating on encrypted data. The UK Financial Conduct Authority (FCA) ran an international Hackathon in August 2019 to combat money laundering with encryption technologies by allowing banks to share confidential information with each other. Since 2015, the annual iDASH competition has attracted teams from around the world to submit solutions to the Secure Genome Analysis Competition. Participants include researchers at companies such as Microsoft and IBM, start-up companies, and academics from the U.S., Korea, Japan, Switzerland, Germany, France, etc. The results give the medical research community benchmarks for the performance of encryption tools that preserve the privacy of health and genomic data.
2 What Is Homomorphic Encryption?
I could say, “Homomorphic Encryption is encryption which is homomorphic.” But that is not very helpful without further explanation. Encryption is one of the building blocks of cryptography: encryption protects the confidentiality of information. In mathematical language, encryption is just a map which transforms plaintexts (unencrypted data) into ciphertexts (encrypted data), according to some recipe. Examples of encryption include block ciphers, which take sequences of bits and process them in blocks, passing them through an S-box which scrambles them, and iterating that process many times. A more mathematical example is RSA encryption, which raises a message to a certain power modulo a large integer N whose prime factorization \(N=p \cdot q\) is secret, where p and q are large primes of equal size with certain properties.
A map which is homomorphic preserves the structure, in the sense that an operation on plaintexts should correspond to an operation on ciphertexts. In practice, this means that switching the order of operations preserves the outcome after decryption: encrypt-then-compute and compute-then-encrypt give the same answer. This property is captured by the commutative diagram in Fig. 1.
Starting with two pieces of data, a and b, the functional outcome should be the same when following the arrows in either direction, across and then down (compute-then-encrypt), or down and then across (encrypt-then-compute): \(E(a+b) \sim E(a) + E(b)\). If this diagram holds for two operations, addition and multiplication, then any circuit of addition and multiplication gates can be evaluated on data encrypted under the encryption map E; on bits, addition and multiplication modulo 2 are the XOR and AND gates, from which (together with the constant 1) any Boolean circuit can be built. It is important to note that homomorphic encryption solutions provide randomized encryption, which is an important property to protect against so-called dictionary attacks. This means that new randomness is used each time a value is encrypted, and it should not be computationally feasible to detect whether two ciphertexts are the encryption of the same plaintext or not. Thus the ciphertexts in the bottom right corner of the diagram need to be decrypted in order to detect whether they are equal.
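The diagram referred to above can be written out as a commutative square. The rendering below is a reconstruction from the description in this section rather than the original figure: the top row computes on plaintexts, the bottom row computes on ciphertexts, and the vertical arrows apply the encryption map E.

```latex
\[
\begin{array}{ccc}
a,\ b & \xrightarrow{\quad + \quad} & a + b \\[4pt]
\big\downarrow E & & \big\downarrow E \\[4pt]
E(a),\ E(b) & \xrightarrow{\quad + \quad} & E(a) + E(b) \;\sim\; E(a+b)
\end{array}
\]
```

Because encryption is randomized, the two ciphertexts in the bottom right corner are in general different bit strings; "\(\sim\)" means that they decrypt to the same plaintext.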
The above description gives a mathematical explanation of homomorphic encryption by defining its properties. To return to the motivation of Private AI, another way to describe homomorphic encryption is to explain the functionality that it enables. Figure 2 shows Homer-morphic encryption, where Homer Simpson is a jeweler tasked with making jewelry given some valuable gold. Here the gold represents some private data, and making jewelry is analogous to analyzing the data by applying some AI model. Instead of accessing the gold directly, the gold remains in a locked box, and the owner keeps the key to unlock the box. Homer can only handle the gold through gloves inserted in the box (analogous to handling only encrypted data). When Homer completes his work, the locked box is returned to the owner who unlocks the box to retrieve the jewelry.
To connect to Fig. 1 above, outsourcing sensitive work to an untrusted jeweler (cloud) is like following the arrows down, across, and then up. First the data owner encrypts the data and uploads it to the cloud, then the cloud operates on the encrypted data, then the cloud returns the output to the data owner to decrypt.
2.1 History
More than four decades ago, we already had an example of encryption which is homomorphic for one operation: the RSA encryption scheme [36]. A message m is encrypted by raising it to the power e modulo N for fixed integers e and N. Thus the product of the encryptions of two messages \(m_1\) and \(m_2\) is \(m_1^e m_2^e \equiv (m_1 m_2)^e \pmod N\), which is the encryption of \(m_1 m_2\). It was an open problem for more than thirty years to find an encryption scheme which was homomorphic with respect to two (ring) operations, allowing for the evaluation of any circuit. Boneh-Goh-Nissim [3] proposed a scheme allowing for unlimited additions and one multiplication, using the group of points on an elliptic curve over a finite field, along with the Weil pairing map to the multiplicative group of a finite field.
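A toy illustration of this multiplicative property in Python, using textbook RSA with tiny parameters chosen only for readability (the values p = 61, q = 53, e = 17 are illustrative assumptions; real RSA uses far larger moduli and randomized padding, which removes this homomorphism):

```python
# Textbook RSA with tiny, insecure toy parameters (illustration only).
p, q = 61, 53
N = p * q                  # public modulus: 3233
phi = (p - 1) * (q - 1)    # 3120, kept secret
e = 17                     # public exponent, coprime to phi
d = pow(e, -1, phi)        # private exponent via modular inverse (Python 3.8+): 2753


def encrypt(m: int) -> int:
    return pow(m, e, N)


def decrypt(c: int) -> int:
    return pow(c, d, N)


m1, m2 = 42, 55
c1, c2 = encrypt(m1), encrypt(m2)

# Multiplying ciphertexts multiplies the underlying plaintexts modulo N.
c_prod = (c1 * c2) % N
assert decrypt(c_prod) == (m1 * m2) % N    # 42 * 55 = 2310 < N

# Textbook RSA is deterministic: the same message always gives the same
# ciphertext, unlike the randomized schemes described in Sect. 2.
assert encrypt(m1) == c1
```

The same trick does not work for addition: the sum of two textbook RSA ciphertexts does not, in general, decrypt to the sum of the messages, which is why RSA is homomorphic for only one operation.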
In 2009, Gentry [21] proposed the first fully homomorphic encryption scheme, allowing in theory for evaluation of arbitrary circuits on encrypted data. However, it took several years before researchers found schemes which were implementable, relatively practical, and based on known hard mathematical problems. Today all the major homomorphic encryption libraries worldwide implement schemes based on the hardness of lattice problems. A lattice can be thought of as a discrete analogue of a linear subspace of Euclidean space: the set of all integer linear combinations of a set of linearly independent basis vectors, with the operations of vector addition, multiplication by integers, and inner product. Its dimension, n, is the number of basis vectors.
2.2 Lattice-Based Solutions
The high-level idea behind current solutions for homomorphic encryption is as follows. Building on an old and fundamental method of encryption, each message is blinded by adding a random inner product to it: the inner product of a secret vector with a randomly generated vector. Historically, blinding a message with fresh randomness was the idea behind encryption via one-time pads, but those did not satisfy the homomorphic property. Taking inner products of vectors is a linear operation, so if homomorphic encryption involved only addition of the inner product, it would be easy to break using linear algebra. Instead, the encryption must also add some freshly generated noise to each blinded message, making it difficult to separate the noise from the secret inner product. The noise, or error, is selected from a fairly narrow Gaussian distribution. Thus the hard problem to solve becomes a noisy decoding problem in a linear space, essentially Bounded Distance Decoding (BDD) or a Closest Vector Problem (CVP) in a lattice. Decryption is possible with the secret key, because the decryptor can subtract the secret inner product, leaving the message plus a small amount of noise, which is easy to remove.
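The blinding-plus-noise idea can be sketched as a toy secret-key scheme in the style of Learning With Errors (LWE). This is an illustrative assumption rather than one of the schemes discussed in this paper: the dimension, modulus, scaling factor, and noise width 3.2 are toy choices that are far too small to be secure. The message is placed in the high-order bits by multiplying it by a scaling factor, so that the small noise can be rounded away once the secret inner product has been subtracted.

```python
import random

# Toy secret-key LWE-style scheme; sizes are far too small to be secure.
N_DIM = 16          # dimension n of the secret vector
Q = 2**16           # ciphertext modulus q
T = 16              # plaintext modulus t (messages live in Z_t)
DELTA = Q // T      # scaling factor: puts the message in the high-order bits
rng = random.Random(2019)


def inner(a, s):
    """Inner product <a, s> modulo Q."""
    return sum(x * y for x, y in zip(a, s)) % Q


def keygen():
    return [rng.randrange(Q) for _ in range(N_DIM)]    # secret vector s


def encrypt(s, m):
    a = [rng.randrange(Q) for _ in range(N_DIM)]       # fresh random vector per encryption
    e = round(rng.gauss(0, 3.2))                       # small Gaussian noise
    b = (inner(a, s) + e + DELTA * m) % Q              # blinded, noisy message
    return a, b


def decrypt(s, ct):
    a, b = ct
    blinded = (b - inner(a, s)) % Q                    # = DELTA * m + e (mod Q)
    return round(blinded / DELTA) % T                  # rounding removes the small noise


def add(ct1, ct2):
    """Adding ciphertexts adds the messages (mod T) and adds the noise terms."""
    (a1, b1), (a2, b2) = ct1, ct2
    return [(x + y) % Q for x, y in zip(a1, a2)], (b1 + b2) % Q


s = keygen()
c5, c5_again, c9 = encrypt(s, 5), encrypt(s, 5), encrypt(s, 9)
assert c5 != c5_again                      # randomized: same plaintext, different ciphertexts
assert decrypt(s, add(c5, c9)) == 14       # 5 + 9
assert decrypt(s, add(c9, c9)) == 2        # 9 + 9 = 18 = 2 (mod 16)
```

Each addition adds the noise terms as well as the messages; decryption stays correct only while the accumulated noise is smaller than DELTA / 2.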
Although the above high-level description was formulated in terms of lattices, in fact the structure that we use in practice is a polynomial ring. A vector in a lattice of n dimensions can be thought of as a polynomial of degree less than n, where the coordinates of the vector are the coefficients of the polynomial. The rings used here are quotients of \({\mathbb Z}[x]\), the polynomial ring with integer coefficients, by a monic irreducible polynomial f(x) of degree n. The ring can be thought of as a lattice in \({\mathbb R}^n\) when embedded into Euclidean space via the canonical embedding. To make all objects finite, we consider these polynomial rings modulo a large integer q, which is often called the ciphertext modulus.
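To make the correspondence concrete, here is a minimal sketch of arithmetic in \(({\mathbb Z}/q{\mathbb Z})[x]/(x^n+1)\), the choice \(f(x) = x^n + 1\) used by the BFV scheme below. A vector of n coordinates is stored as its list of n coefficients; addition is coordinate-wise, and multiplication is polynomial multiplication using the rule \(x^n = -1\), all modulo q. The function names are illustrative.

```python
def ring_add(a, b, q):
    """Add two elements of Z_q[x]/(x^n + 1), coefficient by coefficient."""
    return [(x + y) % q for x, y in zip(a, b)]


def ring_mul(a, b, q):
    """Multiply in Z_q[x]/(x^n + 1); coefficient lists are lowest degree first."""
    n = len(a)
    assert len(b) == n
    out = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                out[k] += a[i] * b[j]
            else:
                out[k - n] -= a[i] * b[j]    # x^n = -1, so x^k = -x^(k - n)
    return [c % q for c in out]


# n = 4, q = 17: x^3 * x = x^4 = -1, stored as 16 modulo 17.
assert ring_mul([0, 0, 0, 1], [0, 1, 0, 0], 17) == [16, 0, 0, 0]
# (1 + x)(1 + x^3) = 1 + x + x^3 + x^4 = x + x^3.
assert ring_mul([1, 1, 0, 0], [1, 0, 0, 1], 17) == [0, 1, 0, 1]
```

The wrap-around with a sign change is what distinguishes this ring from ordinary polynomial multiplication.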
2.3 Encoding Data
When thinking about practical applications, it becomes clear that real data first has to be embedded into the mathematical structure that the encryption map is applied to, the plaintext space, before it is encrypted. This encoding procedure must also be homomorphic in order to achieve the desired functionality. The encryption will be applied to the polynomial ring with integer coefficients modulo q, so real data must be embedded into this polynomial ring.
In a now widely cited 2011 paper, “Can Homomorphic Encryption be Practical?” ([30, Sect. 4.1]), we introduced a new way of encoding real data in the polynomial space which allowed for efficient arithmetic operations on real data, opening up a new direction of research focusing on practical applications and computations. The encoding technique was simple: embed an integer m as a polynomial whose ith coefficient is the ith bit of the binary expansion of m (ordering the bits so that the least significant bit is encoded as the constant term of the polynomial), so that evaluating the polynomial at x = 2 recovers m. This allows for direct multiplication of integers, represented as polynomials, instead of encoding and encrypting data bit-by-bit, which requires a deep circuit just to evaluate simple integer multiplication. When using this approach, it is important to keep track of the growth of the coefficients in the output of the computation. To ensure correct decryption, these coefficients must stay below the plaintext modulus t. Each coefficient is a single bit to start with, and a sum of k of them grows to at most k; for example, each coefficient of the product of two encoded k-bit integers is a sum of at most k products of bits. We obtain the correct decryption and decoding as long as \(q> t > k\), so that the result does not wrap around modulo t.
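A sketch of this encoding on plain integer polynomials (no encryption involved) may help. It multiplies 13 and 11 through their encodings and shows that decoding is correct only when every output coefficient stays below t; the function names and the choices t = 4 and t = 2 are illustrative.

```python
def encode(m):
    """Encode a non-negative integer: coefficient i is bit i of m (LSB = constant term)."""
    if m < 0:
        raise ValueError("this sketch handles non-negative integers only")
    return [int(bit) for bit in reversed(bin(m)[2:])]


def poly_mul(a, b):
    """Ordinary polynomial product in Z[x] (no reduction modulo x^n + 1)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def reduce_mod(p, t):
    """Plaintext coefficients live modulo the plaintext modulus t."""
    return [c % t for c in p]


def decode(p):
    """Evaluate the polynomial at x = 2 to recover the integer."""
    return sum(c * 2**i for i, c in enumerate(p))


prod = poly_mul(encode(13), encode(11))        # [1, 1, 1, 3, 1, 1, 1]
assert decode(reduce_mod(prod, 4)) == 143      # all coefficients < 4: correct
assert decode(reduce_mod(prod, 2)) == 127      # coefficient 3 wraps to 1: wrong
```

The sketch ignores the reduction modulo \(x^n + 1\) used in the actual plaintext ring, so it assumes the product has degree less than n.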
This encoding of integers as polynomials has two important implications, for performance and for storage overhead. In addition to enabling multiplication of floating point numbers via direct multiplication of ciphertexts (rather than requiring deep circuits to multiply data encoded bit wise), this technique also saves space by packing a large floating point number into a single ciphertext, reducing the storage overhead. These encoding techniques help to squash the circuits to be evaluated, and make the size expansion reasonable. However, they limit the possible computations in interesting ways, and so all computations need to be expressed as polynomials. The key factor in determining the efficiency is the degree of the polynomial to be evaluated.
2.4 Brakerski/Fan-Vercauteren Scheme (BFV)
For completeness, I will describe one of the most widely used homomorphic encryption schemes, the Brakerski/Fan-Vercauteren Scheme (BFV) [7, 20], using the language of polynomial rings.
2.4.1 Parameters and Notation
Let \(q \gg t\) be positive integers and n a power of 2. Denote \(\Delta = \lfloor q/t \rfloor \). Define
\[ R = {\mathbb Z}[x]/(x^n+1), \qquad R_q = ({\mathbb Z}/q{\mathbb Z})[x]/(x^n+1), \]
and \(R_t = ({\mathbb Z}/t{\mathbb Z})[x]/(x^n+1)\), where \({\mathbb Z}[x]\) is the set of polynomials with integer coefficients and \(({\mathbb Z}/q{\mathbb Z})[x]\) is the set of polynomials with integer coefficients in the range \([0,q)\).
In the BFV scheme, plaintexts are elements of \(R_t\), and ciphertexts are elements of \(R_q \times R_q\). Let \(\chi \) denote a narrow (centered) discrete Gaussian error distribution. In practice, most implementations of homomorphic encryption use a Gaussian distribution with standard deviation \(\sigma [\chi ] \approx 3.2\). Finally, let \(U_k\) denote the uniform distribution on \({\mathbb Z}\cap [-k/2, k/2)\).
2.4.2 Key Generation
To generate a public key, \(\texttt {pk}\), and a corresponding secret key, \(\texttt {sk}\), sample \(s \leftarrow U_3^n\), \(a \leftarrow U_q^n\), and \(e\leftarrow \chi ^n\). Each of s, a, and e is treated as an element of \(R_q\), where the n coefficients are sampled independently from the given distributions. To form the public key–secret key pair, let
\[ \texttt{pk} = \big([-(as + e)]_q,\; a\big) \in R_q^2, \qquad \texttt{sk} = s, \]
where \([\cdot ]_q\) denotes the (coefficient-wise) reduction modulo q.
2.4.3 Encryption
Let \(m\in R_t\) be a plaintext message. To encrypt m with the public key \(\texttt {pk} = (p_0, p_1)\in R_q^2\), sample \(u \leftarrow U_3^n\) and \(e_1, e_2 \leftarrow \chi ^n\). Consider u and \(e_i\) as elements of \(R_q\) as in key generation, and create the ciphertext
\[ \texttt{ct} = \big([\Delta m + p_0 u + e_1]_q,\; [p_1 u + e_2]_q\big) \in R_q^2. \]
2.4.4 Decryption
To decrypt a ciphertext \(\texttt {ct} = (c_0, c_1)\) given a secret key \(\texttt {sk} = s\), write
\[ \frac{t}{q}(c_0 + c_1 s) = m + v + bt, \]
where \(c_0 + c_1 s\) is computed as an integer coefficient polynomial, and scaled by the rational number t/q. The polynomial b has integer coefficients, m is the underlying message, and v satisfies \(\Vert v\Vert _\infty \ll 1/2\). Thus decryption is performed by evaluating
\[ m = \left[\left\lfloor \frac{t}{q}(c_0 + c_1 s) \right\rceil\right]_t, \]
where \(\left\lfloor \cdot \right\rceil \) denotes rounding to the nearest integer.
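Putting key generation, encryption, and decryption together, the following is a minimal Python sketch of the BFV equations above. It is an illustration under stated assumptions, not the Microsoft SEAL implementation: the parameters n = 16, q = 2^20, t = 16 are toy values with no security, the discrete Gaussian \(\chi\) is approximated by rounding a continuous Gaussian, and multiplication uses the schoolbook method rather than a number-theoretic transform.

```python
import random

# Toy, insecure parameters; q is a multiple of t only to keep the sketch simple.
n, q, t = 16, 2**20, 16
delta = q // t                      # Delta = floor(q / t)
rng = random.Random(2019)


def ring_mul(a, b):
    """Multiply in Z_q[x]/(x^n + 1); coefficient lists are lowest degree first."""
    out = [0] * n
    for i in range(n):
        for j in range(n):
            sign = 1 if i + j < n else -1          # x^n = -1
            out[(i + j) % n] += sign * a[i] * b[j]
    return [c % q for c in out]


def ring_add(*polys):
    return [sum(cs) % q for cs in zip(*polys)]


def uniform(k):
    """U_k^n: integers in [-k/2, k/2), stored modulo q."""
    return [rng.randrange(-(k // 2), k - k // 2) % q for _ in range(n)]


def gaussian(sigma=3.2):
    """chi^n, approximated here by rounding a continuous Gaussian."""
    return [round(rng.gauss(0, sigma)) % q for _ in range(n)]


def keygen():
    s, a, e = uniform(3), uniform(q), gaussian()
    p0 = [(-x) % q for x in ring_add(ring_mul(a, s), e)]    # [-(a s + e)]_q
    return (p0, a), s                                       # pk = (p0, p1 = a), sk = s


def encrypt(pk, m):
    p0, p1 = pk
    u, e1, e2 = uniform(3), gaussian(), gaussian()
    c0 = ring_add([delta * mi for mi in m], ring_mul(p0, u), e1)   # [Delta m + p0 u + e1]_q
    c1 = ring_add(ring_mul(p1, u), e2)                             # [p1 u + e2]_q
    return c0, c1


def decrypt(s, ct):
    c0, c1 = ct
    x = ring_add(c0, ring_mul(c1, s))       # c0 + c1 s mod q (multiples of q vanish mod t)
    # [round(t/q * x)]_t in integer arithmetic: floor((t*x + q/2) / q) mod t
    return [((t * xi + q // 2) // q) % t for xi in x]


pk, sk = keygen()
message = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]   # an element of R_t (t = 16)
ct = encrypt(pk, message)
assert decrypt(sk, ct) == message
```

Reducing \(c_0 + c_1 s\) modulo q before scaling is harmless here: a multiple of q becomes a multiple of t after scaling by t/q, and the final reduction modulo t removes it.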
2.4.5 Homomorphic Computation
Next we see how to enable addition and multiplication of ciphertexts. Addition is easy: we define an operation \(\oplus \) between two ciphertexts \(\texttt {ct}_1 = (c_0, c_1)\) and \(\texttt {ct}_2 = (d_0, d_1)\) as follows:
\[ \texttt{ct}_1 \oplus \texttt{ct}_2 = \big([c_0 + d_0]_q,\; [c_1 + d_1]_q\big) \in R_q^2. \]
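Denote this homomorphic sum by \(\texttt {ct}_\text {sum} = (c^\text {sum}_0, c^\text {sum}_1)\), and note that if
\[ \frac{t}{q}(c_0 + c_1 s) = m_1 + v_1 + b_1 t \quad \text{and} \quad \frac{t}{q}(d_0 + d_1 s) = m_2 + v_2 + b_2 t, \]
then
\[ \frac{t}{q}\big(c^\text{sum}_0 + c^\text{sum}_1 s\big) = [m_1 + m_2]_t + v_1 + v_2 + b_\text{sum}\, t \]
for some polynomial \(b_\text{sum}\) with integer coefficients.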
As long as \(\Vert v_1 + v_2 \Vert _\infty < 1/2\), the ciphertext \(\texttt {ct}_\text {sum}\) is a correct encryption of \([m_1 + m_2]_t\).
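Continuing the same kind of toy setup (repeated so the block runs on its own, with the same insecure parameters and the same rounded-Gaussian stand-in for \(\chi\)), the homomorphic sum \(\oplus\) is coefficient-wise addition modulo q, and the result decrypts to \([m_1 + m_2]_t\):

```python
import random

n, q, t = 16, 2**20, 16            # toy, insecure parameters; t divides q
delta = q // t
rng = random.Random(7)


def ring_mul(a, b):
    out = [0] * n
    for i in range(n):
        for j in range(n):
            sign = 1 if i + j < n else -1            # x^n = -1 in Z_q[x]/(x^n + 1)
            out[(i + j) % n] += sign * a[i] * b[j]
    return [c % q for c in out]


def ring_add(*polys):
    return [sum(cs) % q for cs in zip(*polys)]


def uniform(k):
    return [rng.randrange(-(k // 2), k - k // 2) % q for _ in range(n)]


def gaussian(sigma=3.2):
    return [round(rng.gauss(0, sigma)) % q for _ in range(n)]


def keygen():
    s, a, e = uniform(3), uniform(q), gaussian()
    return ([(-x) % q for x in ring_add(ring_mul(a, s), e)], a), s


def encrypt(pk, m):
    (p0, p1), u = pk, uniform(3)
    return (ring_add([delta * mi for mi in m], ring_mul(p0, u), gaussian()),
            ring_add(ring_mul(p1, u), gaussian()))


def decrypt(s, ct):
    x = ring_add(ct[0], ring_mul(ct[1], s))
    return [((t * xi + q // 2) // q) % t for xi in x]


def add(ct1, ct2):
    """ct1 (+) ct2 = ([c0 + d0]_q, [c1 + d1]_q)."""
    (c0, c1), (d0, d1) = ct1, ct2
    return ring_add(c0, d0), ring_add(c1, d1)


pk, sk = keygen()
m1 = [rng.randrange(t) for _ in range(n)]
m2 = [rng.randrange(t) for _ in range(n)]
ct_sum = add(encrypt(pk, m1), encrypt(pk, m2))
assert decrypt(sk, ct_sum) == [(a + b) % t for a, b in zip(m1, m2)]   # [m1 + m2]_t
```

No key material is needed to compute \(\oplus\): whoever holds the two ciphertexts can add them, which is exactly what the cloud does in the outsourcing scenario.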
Similarly, there is an operation \(\otimes \) between two ciphertexts that results in a ciphertext decrypting to \([m_1 m_2]_t\), as long as \(\Vert v_1\Vert _\infty \) and \(\Vert v_2\Vert _\infty \) are small enough. Since \(\otimes \) is more difficult to describe than \(\oplus \), we refer the reader to [20] for details.
2.4.6 Noise
In the decryption formula presented above, the polynomial v with rational coefficients is assumed to have infinity-norm less than 1/2. Otherwise, the plaintext output by decryption will be incorrect. Given a ciphertext \(\texttt {ct} = (c_0, c_1)\) which is an encryption of a plaintext m, let \(v \in \mathbb {Q}[x]/(x^n + 1)\) be such that
\[ \frac{t}{q}(c_0 + c_1 s) = m + v + bt \]
for some polynomial b with integer coefficients.
The infinity norm of the polynomial v is called the noise, and the ciphertext decrypts correctly as long as the noise is less than 1/2.
When operations such as addition and multiplication are applied to encrypted data, the noise in the result may be larger than the noise in the inputs. This noise growth is very small in homomorphic additions, but substantially larger in homomorphic multiplications. Thus, given a specific set of encryption parameters \((n, q, t, \chi )\), one can only evaluate computations of a bounded size (or bounded multiplicative depth).
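In the toy setting, the noise can be measured directly from its defining equation. The self-contained sketch below uses the same insecure parameters as before and assumes t divides q, so that \(t\Delta/q = 1\) and \(v = (t/q)\,w\), where w is the centered remainder of \(c_0 + c_1 s - \Delta m\) modulo q. It adds 50 fresh ciphertexts and compares the noise of the sum with the noise of the inputs.

```python
import random

n, q, t = 16, 2**20, 16             # toy, insecure parameters; t divides q
delta = q // t
rng = random.Random(42)


def ring_mul(a, b):
    out = [0] * n
    for i in range(n):
        for j in range(n):
            sign = 1 if i + j < n else -1            # x^n = -1 in Z_q[x]/(x^n + 1)
            out[(i + j) % n] += sign * a[i] * b[j]
    return [c % q for c in out]


def ring_add(*polys):
    return [sum(cs) % q for cs in zip(*polys)]


def uniform(k):
    return [rng.randrange(-(k // 2), k - k // 2) % q for _ in range(n)]


def gaussian(sigma=3.2):
    return [round(rng.gauss(0, sigma)) % q for _ in range(n)]


def keygen():
    s, a, e = uniform(3), uniform(q), gaussian()
    return ([(-x) % q for x in ring_add(ring_mul(a, s), e)], a), s


def encrypt(pk, m):
    (p0, p1), u = pk, uniform(3)
    return (ring_add([delta * mi for mi in m], ring_mul(p0, u), gaussian()),
            ring_add(ring_mul(p1, u), gaussian()))


def decrypt(s, ct):
    x = ring_add(ct[0], ring_mul(ct[1], s))
    return [((t * xi + q // 2) // q) % t for xi in x]


def add(ct1, ct2):
    return ring_add(ct1[0], ct2[0]), ring_add(ct1[1], ct2[1])


def noise(s, ct, m):
    """||v||_inf where t/q (c0 + c1 s) = m + v + b t, valid because t divides q."""
    x = ring_add(ct[0], ring_mul(ct[1], s))
    w = [(xi - delta * mi) % q for xi, mi in zip(x, m)]
    w = [wi - q if wi > q // 2 else wi for wi in w]      # centered representative
    return max(abs(wi) for wi in w) * t / q


pk, sk = keygen()
msgs = [[rng.randrange(t) for _ in range(n)] for _ in range(50)]
cts = [encrypt(pk, m) for m in msgs]
fresh = [noise(sk, ct, m) for ct, m in zip(cts, msgs)]

acc, total = cts[0], msgs[0]
for ct, m in zip(cts[1:], msgs[1:]):
    acc = add(acc, ct)                                   # homomorphic additions
    total = [(a + b) % t for a, b in zip(total, m)]      # [sum of messages]_t
summed = noise(sk, acc, total)

print(f"largest fresh noise: {max(fresh):.6f}, noise after 49 additions: {summed:.6f}")
assert summed <= sum(fresh) + 1e-12    # additive growth: at most the sum of input noises
assert summed < 0.5 and decrypt(sk, acc) == total
```

Because the noise terms simply add, the noise after additions is bounded by the sum of the input noises. Homomorphic multiplication, which is not implemented here, makes the noise grow much faster, and that is what bounds the multiplicative depth for fixed parameters.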
A precise estimate of the noise growth for the YASHE scheme was given in [4] and these estimates were used in [5] to give an algorithm for selecting secure parameters for performing any given computation. Although the specific noise growth estimates needed for this algorithm do depend on which homomorphic encryption scheme is used, the general idea applies to any scheme.
2.5 Other Homomorphic Encryption Schemes
In 2011, researchers at Microsoft Research and the Weizmann Institute published the BV/BGV homomorphic encryption scheme [8, 9], which is used by teams around the world today. In 2013, IBM released HElib, a homomorphic encryption library for research purposes, which implemented the BGV scheme. HElib is written in C++ and uses the NTL mathematical library. The Brakerski/Fan-Vercauteren (BFV) scheme described above was proposed in 2012. Alternative schemes with different security and error-growth properties were proposed in 2012 by Lopez-Alt, Tromer, and Vaikuntanathan (LTV [33]), and in 2013 by Bos, Lauter, Loftus, and Naehrig (YASHE [4]). The Cheon-Kim-Kim-Song (CKKS [14]) scheme was introduced in 2016, enabling approximate computation on ciphertexts.
Other schemes [16, 19], which operate on encrypted data bit-by-bit, are more efficient for logical tasks such as comparison. Current research attempts to make it practical to switch between such schemes to enable both arithmetic and logical operations efficiently ([6]).
2.6 Microsoft SEAL
Early research prototype libraries were developed by the Microsoft Research (MSR) Cryptography group to demonstrate the performance numbers for initial applications such as those developed in [4, 5, 23, 29]. But due to requests from the biomedical research community, it became clear that it would be very valuable to develop a well-engineered library which would be widely usable by developers to enable privacy solutions. The Simple Encrypted Arithmetic Library (SEAL) [37] was developed in 2015 by the MSR Cryptography group with this goal in mind, and is written in C++. Microsoft SEAL was publicly released in November 2015, and was released open source in November 2018 for commercial use. It has been widely adopted by teams worldwide and is freely available online (http://sealcrypto.org).
Microsoft SEAL aims to be easy to use for non-experts, and at the same time powerful and flexible for expert use. SEAL strikes a delicate balance between usability and performance, and is extremely fast due to high-quality engineering. SEAL is extensively documented, and has no external dependencies. Other publicly available libraries include HElib from IBM, PALISADE by Duality Technologies, and HEAAN from Seoul National University.
2.7 Standardization of Homomorphic Encryption [1]
When new public key cryptographic primitives are introduced, historically there has been roughly a 10-year lag in adoption across the industry. In 2017, Microsoft Research Outreach and the MSR Cryptography group launched HomomorphicEncryption.org, a consortium for advancing the standardization of homomorphic encryption technology, together with our academic partners, researchers from government and military agencies, and partners and customers from various industries. The first workshop was hosted at Microsoft in July 2017, and developers of all the existing implementations around the world were invited to demo their libraries.
At the July 2017 workshop, we worked in groups to draft three white papers on Security, Applications, and APIs. We then worked with all relevant stakeholders of the HE community to revise the Security white paper [11] into the first draft standard for homomorphic encryption [1]. The Homomorphic Encryption Standard (HES) specifies secure parameters for the use of homomorphic encryption. The draft standard was initially approved by the HomomorphicEncryption.org community at the second workshop at MIT in March 2018, and then was finalized and made publicly available at the third workshop in October 2018 at the University of Toronto [1]. A study group was initiated in 2020 at ISO, the International Organization for Standardization, to consider next steps for standardization.
3 What Kind of Computation Can We Do?
3.1 Statistical Computations
In early work, we focused on demonstrating the feasibility of statistical computations on health and genomic data, because privacy concerns are obvious in the realm of health and genomic data, and statistical computations are an excellent fit for efficient HE because they have very low depth. We demonstrated HE implementations and performance numbers for statistical computations in genomics such as the chi-square test, the Cochran-Armitage Test for Trend, and the Expectation-Maximization (EM) algorithm for haplotype estimation [29]. Next, we focused on string matching, using the Smith-Waterman algorithm for edit distance [15], another task which is frequently performed for genome sequencing and the study of genomic disease.
3.2 Heart Attack Risk
To demonstrate operations on health data, in 2013 we developed a live demo predicting the risk of having a heart attack based on six health characteristics [5]. We evaluated predictive models developed over decades in the Framingham Heart Study, using the Cox proportional hazards model. I showed the demo live to news reporters at the 2014 AAAS meeting, and our software computed my risk of a heart attack in the cloud, operating on encrypted data, in a fraction of a second.
In 2016, we started a collaboration with Merck to demonstrate the feasibility of evaluating such models on large patient populations. Inspired by our published work on heart attack risk prediction [5], they used SEAL to demonstrate running the heart attack risk prediction on one million patients from an affiliated hospital. Their implementation returned the results for all patients in about 2 h, compared to 10 min for the same computation on unencrypted patient data.
3.3 Cancer Patient Statistics
In 2017, we began a collaboration with Crayon, a Norwegian company that develops health record systems. The goal of this collaboration was to demonstrate the value of SEAL in a real-world working environment. Crayon reproduced all computations in the 2016 Norwegian Cancer Report using SEAL, operating on encrypted inputs. The report processed the cancer statistics of all cancer patients in Norway, collected over roughly the last five decades.
3.4 Genomic Privacy
Engaging with a community of researchers in bioinformatics and biostatistics who were concerned with patient privacy issues led to a growing interdisciplinary community interested in developing a range of cryptographic techniques for privacy problems in the health and biological sciences [18]. One measure of the growth of this community over the last five years has been participation in the iDASH Secure Genome Analysis Competition, a series of annual international competitions funded by the National Institutes of Health (NIH) in the U.S. The iDASH competition included a track on Homomorphic Encryption in each of the five years 2015–2019, and our team from MSR submitted winning solutions for the competition in 2015 ([27]) and 2016 ([10]). The tasks were: chi-square test, modified edit distance, database search, training logistic regression models, and genotype imputation. Each year, roughly 5–10 teams from research groups around the world submitted solutions for the task, which were benchmarked by the iDASH team. These results provide the biological data science community and NIH with real and evolving measures of the performance and capability of homomorphic encryption to protect the privacy of genomic data sets while in use. Summaries of the competitions are published in [38, 40].
3.5 Machine Learning: Training and Prediction
The 2013 “ML Confidential” paper [23] was the first to propose training ML algorithms on homomorphically encrypted data and to show initial performance numbers for simple models such as linear means classifiers and gradient descent. Training is inherently challenging because of the large and unknown amount of data to be processed.
Prediction tasks, on the other hand, process an input and a model of known size, so many of them can be processed efficiently. For example, in 2016 we developed a demo using SEAL to predict the flowering time for a flower. The model processed 200,000 SNPs from the genome of the flower, and evaluated a Fast Linear Mixed Model (LMM). Including the round-trip communication time with the cloud running the demo as a service in Azure, the prediction was obtained in under a second.
Another demo developed in 2016 using SEAL predicted the mortality risk for pneumonia patients based on 46 characteristics from the patient’s medical record. The model in this case is an example of an intelligible model and consists of 46 degree-4 polynomials to be evaluated on the patient’s data. Data from 4,096 patients can be batched together, and the predictions for all 4,096 patients were returned by the cloud service in a few seconds (in 2016).
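The structure of such a model can be sketched in plaintext, assuming (as an illustration) that the score is a sum of one low-degree polynomial per feature. Batching stores one patient per slot, so each vector operation below would correspond to a single homomorphic addition or multiplication on a batched ciphertext. The coefficients and feature values are hypothetical, and a real model's real-valued coefficients would first need to be encoded, for example by scaling to integers.

```python
# Each list plays the role of one batched ciphertext: slot p holds patient p's value.
def vec_add(u, v):
    return [a + b for a, b in zip(u, v)]


def vec_mul(u, v):
    return [a * b for a, b in zip(u, v)]


def eval_poly_slotwise(coeffs, xs):
    """Evaluate one polynomial (coefficients lowest degree first) in every slot via Horner's rule."""
    acc = [0] * len(xs)
    for c in reversed(coeffs):
        acc = vec_add(vec_mul(acc, xs), [c] * len(xs))
    return acc


def risk_scores(features, shape_polys):
    """Sum one polynomial per feature; features[i] is the slot vector for feature i."""
    scores = [0] * len(features[0])
    for coeffs, xs in zip(shape_polys, features):
        scores = vec_add(scores, eval_poly_slotwise(coeffs, xs))
    return scores


# Two hypothetical features, three patients, degree-4 polynomials.
features = [[0, 1, 2], [1, 2, 0]]
shape_polys = [[1, 2, 0, 0, 1],      # 1 + 2x + x^4
               [0, -1, 3, 0, 0]]     # -x + 3x^2
assert risk_scores(features, shape_polys) == [3, 14, 21]
```

Horner's rule, used here for brevity, chains the multiplications sequentially (multiplicative depth 3 for degree 4 when the leading coefficient is a plaintext). Computing \(x^2\), \(x^3 = x \cdot x^2\), and \(x^4 = (x^2)^2\) instead gives depth 2, at the cost of extra multiplications.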
These two demos evaluated models which were represented by shallow circuits, linear in the first case and degree 4 in the second case. Other models such as deep neural nets (DNNs) are inherently more challenging because the circuits are so deep. To enable efficient solutions for such tasks requires a blend of cryptography and ML research, aimed at designing and testing ways to process data which allow for efficient operations on encrypted data while maintaining accuracy. An example of that was introduced in CryptoNets [22], showing that the activation function in the layers of the neural nets can be approximated with a low-degree polynomial function (\(x^2\)) without significant loss of accuracy.
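A plaintext sketch of the idea: a small network whose activation is the square function, so the whole forward pass is a polynomial in the inputs built only from additions and multiplications. The weights are hypothetical integers chosen for readability; the sketch does not reproduce the CryptoNets architecture.

```python
def dense(W, b, x):
    """Affine layer W x + b with plaintext weights."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def square(v):
    """Polynomial activation x -> x^2, standing in for a non-polynomial one such as ReLU."""
    return [vi * vi for vi in v]


def forward(x, layers):
    """Alternate affine layers and square activations (no activation after the last layer)."""
    for idx, (W, b) in enumerate(layers):
        x = dense(W, b, x)
        if idx < len(layers) - 1:
            x = square(x)
    return x


layers = [([[1, -2], [3, 1]], [0, 1]),   # hidden layer: 2 inputs -> 2 units
          ([[1, 1]], [-5])]              # output layer: 2 units -> 1 score
assert forward([2, 1], layers) == [59]   # hidden [0, 8] -> squared [0, 64] -> 64 - 5
```

With plaintext weights, only the squarings multiply two encrypted values, so each activation layer adds one level of multiplicative depth and doubles the degree of the polynomial computed. This is why low-degree activations matter for deep networks.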
The CryptoNets paper was the first to show the evaluation of neural network predictions on encrypted data, using the techniques introduced there to classify hand-written digits from the MNIST [31] data set. Many teams have since worked on improving the performance of CryptoNets, either with hybrid schemes or other optimizations [17, 25, 35]. In 2018, in collaboration with Median Technologies, we demonstrated deep neural net predictions for a medical image recognition task: classification of liver tumors based on medical images.
Returning to the challenge of training ML algorithms, the 2017 iDASH contest task required the teams to train a logistic regression model on encrypted data. The data set provided for the competition was very simple and did not require many iterations to train an effective model (the winning solution used only 7 iterations [26, 28]). The MSR solution [12] computed over 300 iterations and was fully scalable to any arbitrary number of iterations. We also applied our solution to a simplified version of the MNIST data set to demonstrate the performance numbers.
Performance numbers for all computations described here were published at the time of discovery. They would need to be updated now with the latest version of SEAL, or can be estimated. Hardware acceleration techniques using state-of-the-art FPGAs can be used to improve the performance further ([34]).
4 How Do We Assess Security?
The security of all homomorphic encryption schemes described in this article is based on the mathematics of lattice-based cryptography and the hardness of well-known lattice problems in high dimensions, problems which have been studied for more than 25 years. Compare this to the age of other public key systems such as RSA (1977) or Elliptic Curve Cryptography (ECC, 1985). Lattice-based cryptosystems were first proposed by Hoffstein, Pipher, and Silverman [24] in 1996, which led them to launch the company NTRU. New hard problems such as LWE were proposed in the period 2004–2010, but were reduced to older problems which had already been studied for several decades: the Approximate Shortest Vector Problem (SVP) and Bounded Distance Decoding.
The best known algorithms for attacking the Shortest Vector Problem or the Closest Vector Problem are lattice basis reduction algorithms, which have a history of more than 30 years, including the LLL algorithm [32]. LLL runs in polynomial time, but it only finds an approximation to the shortest vector that is exponentially bad in the dimension. More recent improvements, such as BKZ 2.0 [13], use exponential-time subroutines such as sieving and enumeration. The Hard Lattice Challenges, created by TU Darmstadt, are publicly available online: anyone can attempt to solve hard lattice problems of increasing size and set new records.
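To make lattice basis reduction concrete, here is a minimal textbook LLL sketch in Python using exact rational arithmetic. It follows the standard loop of size reduction plus the Lovász condition with \(\delta = 3/4\). For readability it recomputes the Gram-Schmidt data after every change, which is far slower than production implementations such as fplll. The function names are illustrative, and the input rows are assumed to be linearly independent.

```python
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(basis):
    """Gram-Schmidt vectors b*_i and coefficients mu[i][j], as exact rationals."""
    n = len(basis)
    bstar = []
    mu = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        v = [Fraction(x) for x in basis[i]]
        for j in range(i):
            mu[i][j] = dot(basis[i], bstar[j]) / dot(bstar[j], bstar[j])
            v = [a - mu[i][j] * b for a, b in zip(v, bstar[j])]
        bstar.append(v)
    return bstar, mu

def lll_reduce(basis, delta=Fraction(3, 4)):
    """Return an LLL-reduced basis (list of integer rows) for the same lattice."""
    b = [[int(x) for x in row] for row in basis]
    n = len(b)
    bstar, mu = gram_schmidt(b)
    k = 1
    while k < n:
        # Size reduction: make |mu[k][j]| <= 1/2 for all j < k.
        for j in range(k - 1, -1, -1):
            q = round(mu[k][j])
            if q != 0:
                b[k] = [x - q * y for x, y in zip(b[k], b[j])]
                bstar, mu = gram_schmidt(b)
        # Lovasz condition: advance if it holds, otherwise swap and step back.
        lhs = dot(bstar[k], bstar[k])
        rhs = (delta - mu[k][k - 1] ** 2) * dot(bstar[k - 1], bstar[k - 1])
        if lhs >= rhs:
            k += 1
        else:
            b[k], b[k - 1] = b[k - 1], b[k]
            bstar, mu = gram_schmidt(b)
            k = max(k - 1, 1)
    return b

print(lll_reduce([[1, 1, 1], [-1, 0, 2], [3, 5, 6]]))  # [[0, 1, 0], [1, 0, 1], [-1, 0, 2]]
```

In this small example the output contains a vector of length 1; in the high dimensions used for homomorphic encryption, LLL's exponential approximation factor is far too weak, which is why attacks turn to BKZ-style algorithms.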
Homomorphic Encryption scheme parameters are set so that the best known attacks take exponential time (exponential in the dimension n of the lattice, meaning roughly \(2^n\) time). There are no known polynomial-time quantum attacks on these schemes, which makes them good candidates for Post-Quantum Cryptography (PQC).
Lattice-based cryptography is currently under consideration for standardization in NIST's ongoing five-year PQC competition. Most Homomorphic Encryption deployments use small secrets as an optimization, so it is important to understand the concrete security when the secret is sampled from a small, non-uniform distribution. Numerous heuristics are used to estimate the running time and output quality of lattice reduction algorithms such as BKZ 2.0. The Homomorphic Encryption Standard [1] recommends parameters based on the heuristic running time of the best known attacks, as estimated by the online LWE Estimator [2].
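The kind of heuristic referred to above can be illustrated with two common back-of-the-envelope formulas: the root-Hermite factor \(\delta\) that BKZ with block size \(\beta\) is expected to reach (the asymptotic estimate from Chen's thesis), and the "core-SVP" cost model, which charges about \(2^{0.292\beta}\) classical operations for one sieving call in dimension \(\beta\). The sketch below is a deliberate simplification under those assumptions, with hypothetical function names; it is not the LWE Estimator's code, which combines many more attack models and refinements.

```python
import math

def root_hermite_factor(beta):
    """Heuristic root-Hermite factor reached by BKZ with block size beta
    (asymptotic formula from Chen's thesis; only meaningful for beta >~ 50)."""
    base = (beta / (2 * math.pi * math.e)) * (math.pi * beta) ** (1 / beta)
    return base ** (1 / (2 * (beta - 1)))

def core_svp_bits(beta, exponent=0.292):
    """log2 of the cost of one SVP call in dimension beta under the classical
    sieving 'core-SVP' model, i.e. 2^(0.292 * beta) operations."""
    return exponent * beta

def smallest_block_size(target_delta, max_beta=1000):
    """Smallest beta >= 50 whose predicted root-Hermite factor is <= target_delta."""
    for beta in range(50, max_beta + 1):
        if root_hermite_factor(beta) <= target_delta:
            return beta
    return None

for beta in (100, 200, 400):
    print(beta, round(root_hermite_factor(beta), 5), round(core_svp_bits(beta), 1))
```

A smaller \(\delta\) means a shorter vector can be found, but reaching it requires a larger block size \(\beta\), and the cost grows exponentially in \(\beta\). Parameter selection works backwards from this trade-off: an attack is estimated to need some \(\delta\), which fixes \(\beta\), which in turn fixes the security level in bits.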
5 Conclusion
Homomorphic Encryption is a technology that allows meaningful computation on encrypted data and provides a tool to protect the privacy of data in use. A primary application of Homomorphic Encryption is secure and confidential outsourced storage and computation in the cloud (that is, in a data center). A client encrypts their data locally, keeps the encryption key(s) locally, and uploads the encrypted data to the cloud for long-term storage and analysis. The cloud processes the encrypted data without decrypting it and returns encrypted answers to the client for decryption. The cloud learns nothing about the data other than the size of the encrypted data and the size of the computation. The cloud can carry out Machine Learning or Artificial Intelligence (ML or AI) computations, either to make predictions based on known models or to train new models, while preserving the client's privacy.
Current HE solutions are implemented in five or six major open-source libraries worldwide. The Homomorphic Encryption Standard [1] for using HE securely was approved in 2018 by HomomorphicEncryption.org, an international consortium of researchers in industry, government, and academia.
Today, applied Homomorphic Encryption remains an exciting direction in cryptography research. Several big and small companies, government contractors, and academic research groups are enthusiastic about the possibilities of this technology. With new algorithmic improvements, new schemes, an improved understanding of concrete use-cases, and an active standardization effort, wide-scale deployment of homomorphic encryption seems possible within the next 2–5 years. Small-scale deployment is already happening.
Computational performance, memory overhead, and the limited set of operations available in most libraries remain the main challenges. Most homomorphic encryption schemes are inherently parallelizable, and exploiting this parallelism is important for good performance. Thus, easily parallelizable arithmetic computations currently seem the most amenable to homomorphic encryption, and it seems plausible that initial wide-scale deployment will be in applications of Machine Learning that enable Private AI.
Notes
1. My collaborators on the SEAL team include: Kim Laine, Hao Chen, Radames Cruz, Wei Dai, Ran Gilad-Bachrach, Yongsoo Song, Shabnam Erfani, Sreekanth Kannepalli, Jeremy Tieman, Tarun Singh, Hamed Khanpour, Steven Chith, James French, with substantial contributions from interns Gizem Cetin, Kyoohyung Han, Zhicong Huang, Amir Jalali, Rachel Player, Peter Rindal, and Yuhou Xia.
References
Albrecht, M., Chase, M., Chen, H., Ding, J., Goldwasser, S., Gorbunov, S., Halevi, S., Hoffstein, J., Laine, K., Lauter, K., Lokam, S., Micciancio, D., Moody, D., Morrison, T., Sahai, A., Vaikuntanathan, V.: Homomorphic encryption security standard. Technical report, HomomorphicEncryption.org, Toronto, Canada, Nov 2018. https://eprint.iacr.org/2019/939
Albrecht, M., Player, R., Scott, S.: On the concrete hardness of learning with errors. J. Math. Cryptol. 9(3), 169–203 (2015)
Boneh, D., Goh, E., Nissim, K.: Evaluating 2-DNF formulas on ciphertexts. In: TCC'05: Proceedings of the Second International Conference on Theory of Cryptography, vol. 3378. Lecture Notes in Computer Science, pp. 325–341. Springer, Berlin (2005)
Bos, J.W., Lauter, K., Loftus, J., Naehrig, M.: Improved security for a ring-based fully homomorphic encryption scheme. In: Cryptography and Coding, pp. 45–64. Springer, Berlin (2013)
Bos, J.W., Lauter, K., Naehrig, M.: Private predictive analysis on encrypted medical data. J. Biomed. Inform. 50, 234–243 (2014)
Boura, C., Gama, N., Georgieva, M., Jetchev, D.: Chimera: combining ring-LWE-based fully homomorphic encryption schemes. Cryptology ePrint Archive. https://eprint.iacr.org/2018/758
Brakerski, Z.: Fully homomorphic encryption without modulus switching from classical GapSVP. In: Advances in Cryptology–CRYPTO 2012, pp. 868–886. Springer, Berlin (2012)
Brakerski, Z., Gentry, C., Vaikuntanathan, V.: (Leveled) fully homomorphic encryption without bootstrapping. In: Proceedings of ITCS, pp. 309–325. ACM (2012)
Brakerski, Z., Vaikuntanathan, V.: Efficient fully homomorphic encryption from (standard) LWE. In: 2011 IEEE 52nd Annual Symposium on Foundations of Computer Science, pp. 97–106, Oct 2011
Cetin, G.S., Chen, H., Laine, K., Lauter, K., Rindal, P., Xia, Y.: Private queries on encrypted genomic data. BMC Med. Genomics 10(45) (2017)
Chase, M., Chen, H., Ding, J., Goldwasser, S., Gorbunov, S., Hoffstein, J., Lauter, K., Lokam, S., Moody, D., Morrison, T., Sahai, A., Vaikuntanathan, V.: Security of homomorphic encryption. HomomorphicEncryption.org, Redmond WA, Technical report (2017)
Chen, H., Gilad-Bachrach, R., Han, K., Huang, Z., Jalali, A., Laine, K., Lauter, K.: Logistic regression over encrypted data from fully homomorphic encryption. BMC Med. Genomics 11(81) (2018)
Chen, Y., Nguyen, P.Q.: BKZ 2.0: better lattice security estimates. In: Lee, D.H., Wang, X. (eds.) Advances in Cryptology—ASIACRYPT 2011, pp. 1–20. Springer, Berlin (2011)
Cheon, J.H., Kim, A., Kim, M., Song, Y.: Homomorphic encryption for arithmetic of approximate numbers. In: International Conference on the Theory and Application of Cryptology and Information Security, pp. 409–437. Springer, Berlin (2017)
Cheon, J.H., Kim, M., Song, Y.: Homomorphic computation of edit distance. In: International Conference on Financial Cryptography and Data Security, pp. 194–212. Springer, Berlin (2015)
Chillotti, I., Gama, N., Georgieva, M., Izabachène, M.: TFHE: fast fully homomorphic encryption over the torus. J. Cryptol. 33, 34–91 (2020)
Dathathri, R., Saarikivi, O., Chen, H., Laine, K., Lauter, K., Maleki, S., Musuvathi, M., Mytkowicz, T.: CHET: an optimizing compiler for fully-homomorphic neural-network inferencing. In: Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 142–156. ACM (2019)
Dowlin, N., Gilad-Bachrach, R., Laine, K., Lauter, K., Naehrig, M., Wernsing, J.: Manual for using homomorphic encryption for bioinformatics. Proc. IEEE 105(3), 552–567 (2017)
Ducas, L., Micciancio, D.: FHEW: bootstrapping homomorphic encryption in less than a second. In: Annual International Conference on the Theory and Applications of Cryptographic Techniques, pp. 617–640. Springer, Berlin (2015)
Fan, J., Vercauteren, F.: Somewhat practical fully homomorphic encryption. In: IACR Cryptology ePrint Archive 144 (2012). https://eprint.iacr.org/2012/144. Accessed on 9 April 2018
Gentry, C.: A fully homomorphic encryption scheme. Stanford University (2009)
Gilad-Bachrach, R., Dowlin, N., Laine, K., Lauter, K., Naehrig, M., Wernsing, J.: Cryptonets: applying neural networks to encrypted data with high throughput and accuracy. In: International Conference on Machine Learning, pp. 201–210 (2016)
Graepel, T., Lauter, K., Naehrig, M.: ML confidential: Machine learning on encrypted data. In: International Conference on Information Security and Cryptology, pp. 1–21. Springer, Berlin (2012)
Hoffstein, J., Pipher, J., Silverman, J.H.: NTRU: a ring-based public key cryptosystem. In: Algorithmic Number Theory (Portland, OR, 1998), vol. 1423. Lecture Notes in Computer Science, pp. 267–288. Springer, Berlin (1998)
Juvekar, C., Vaikuntanathan, V., Chandrakasan, A.: GAZELLE: a low latency framework for secure neural network inference. In: 27th USENIX Security Symposium (USENIX Security 18), pp. 1651–1669 (2018)
Kim, A., Song, Y., Kim, M., Lee, K., Cheon, J.-H.: Logistic regression model training based on the approximate homomorphic encryption. Cryptology ePrint Archive, Report 2018/254 (2018). https://eprint.iacr.org/2018/254
Kim, M., Lauter, K.: Private genome analysis through homomorphic encryption. BMC Med. Inform. Decis. Making 15(Suppl 5), S3 (2015)
Kim, M., Song, Y., Wang, S., Xia, Y., Jiang, X.: Secure logistic regression based on homomorphic encryption. Cryptology ePrint Archive, Report 2018/074 (2018). https://eprint.iacr.org/2018/074
Lauter, K., López-Alt, A., Naehrig, M.: Private computation on encrypted genomic data. In: International Conference on Cryptology and Information Security in Latin America, pp. 3–27. Springer, Berlin (2014)
Lauter, K., Naehrig, M., Vaikuntanathan, V.: Can homomorphic encryption be practical? In: Proceedings of the 3rd ACM Workshop on Cloud Computing Security Workshop (CCSW ’11), New York, NY, USA, pp. 113–124. ACM (2011)
LeCun, Y., Cortes, C., Burges, C.J.C.: The MNIST database of handwritten digits (1998). http://yann.lecun.com/exdb/mnist/
Lenstra, A.K., Lenstra, H.W., Lovász, L.: Factoring polynomials with rational coefficients. Mathematische Annalen 261(4), 515–534 (1982)
Lopez-Alt, A., Tromer, E., Vaikuntanathan, V.: On-the-fly multiparty computation on the cloud via multikey fully homomorphic encryption. In: Proceedings of STOC, pp. 1219–1234. IEEE Computer Society (2012)
Sadegh Riazi, M., Laine, K., Pelton, B., Dai, W.: HEAX: high-performance architecture for computation on homomorphically encrypted data in the cloud. arXiv preprint arXiv:1909.09731 (2019)
Sadegh Riazi, M., Samragh, M., Chen, H., Laine, K., Lauter, K., Koushanfar, F.: XONN: Xnor-based oblivious deep neural network inference. In: 28th USENIX Security Symposium (USENIX Security 19), Santa Clara, CA, pp. 1501–1518. USENIX Association, Aug 2019
Rivest, R., Shamir, A., Adleman, L.: A method for obtaining digital signatures and public-key cryptosystems. Commun. ACM 21(2), 120–126 (1978)
Microsoft SEAL (release 3.2). https://github.com/Microsoft/SEAL. Microsoft Research, Redmond, WA, Nov 2018
Tang, H., Jiang, X., Wang, X., Wang, S., Sofia, H., Fox, D., Lauter, K., Malin, B., Telenti, A., Li, Xi., Ohno-Machado, L.: Protecting genomic data analytics in the cloud: state of the art and opportunities. BMC Med. Genomics 9(63) (2016)
Vanian, J.: 4 Big Takeaways from Satya Nadella’s talk at Microsoft Build (2018). https://fortune.com/2018/05/07/microsoft-satya-nadella-build/
Wang, S., Jiang, X., Tang, H., Wang, X., Bu, D., Carey, K., Dyke, S.O.M., Fox, D., Jiang, C., Lauter, K., Malin, B., Sofia, H., Telenti, A., Wang, L., Wang, W., Ohno-Machado, L.: A community effort to protect genomic data sharing, collaboration and outsourcing. NPJ Genomic Med. 2(33) (2017)
Acknowledgements
I would like to gratefully acknowledge the contributions of many people to the achievements, software, demos, standards, assets, and impact described in this article. First and foremost, none of this software or these applications would exist without my collaborators on the SEAL team, including Kim Laine, Hao Chen, Radames Cruz, Wei Dai, Ran Gilad-Bachrach, Yongsoo Song, and John Wernsing, with substantial contributions from interns Gizem Cetin, Kyoohyung Han, Zhicong Huang, Amir Jalali, Rachel Player, Peter Rindal, and Yuhou Xia. The demos described here were developed largely by our partner engineering team in Foundry 99: Shabnam Erfani, Sreekanth Kannepalli, Steven Chith, James French, Hamed Khanpour, Tarun Singh, and Jeremy Tieman. I launched the Homomorphic Encryption Standardization process in collaboration with Kim Laine from my team, with Roy Zimmermann and the support of Microsoft Outreach, and with collaborators Kurt Rohloff, Vinod Vaikuntanathan, Shai Halevi, and Jung Hee Cheon; collectively we now form the Steering Committee of HomomorphicEncryption.org. Finally, I would like to thank the organizers of ICIAM 2019 for the invitation to speak and to write this article.
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2022 The Author(s)
About this paper
Cite this paper
Lauter, K. (2022). Private AI: Machine Learning on Encrypted Data. In: Chacón Rebollo, T., Donat, R., Higueras, I. (eds) Recent Advances in Industrial and Applied Mathematics. SEMA SIMAI Springer Series, vol 1. Springer, Cham. https://doi.org/10.1007/978-3-030-86236-7_6
DOI: https://doi.org/10.1007/978-3-030-86236-7_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86235-0
Online ISBN: 978-3-030-86236-7
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)